Published July 31, 2023 (author: )

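Keying Hash Functions for Message Authentication

The use of cryptographic hash functions such as MD5 or SHA for message authentication has become a standard approach in many Internet applications and protocols. Although these mechanisms are very easy to implement, they are usually based on ad hoc techniques that lack a sound security analysis. We present new, simple, and practical constructions of message authentication schemes based on a cryptographic hash function. Our schemes, NMAC and HMAC, are proven to be secure as long as the underlying hash function has some reasonable cryptographic strength. Moreover, the security of our schemes is tightly related to that of the hash function. In addition, our schemes are efficient and practical: their performance is essentially that of the underlying hash function, and they use the hash function (or its compression function) as a black box, so that widely available library code or hardware can be used to implement them in a simple way.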

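1 Introduction

This paper addresses the authentication of information using cryptographic hash functions such as MD5 and SHA, an issue brought to the fore by the security needs of the Internet.

Hash functions are a major part of cryptography. They are the workhorses that we cryptographers rely on for all kinds of tasks, and yet of all the cryptographic primitives they are the ones we understand least.

A hash function takes a variable-length input string and produces a fixed-length result. That “hash” of the input is a kind of shorthand, much like a person’s initials. You can refer to the input by its hash, and because the hash is a fixed-length string, we can use the hash value as a reference to the actual string itself.

Because a hash function takes a long string and turns it into a short one, it is inevitable that two strings will produce the same result; cryptographers call this a “collision.” As an example, the names Jon Callas and Jane Cannoy both hash to the initials JC. Collisions are an important part of understanding hash functions, and we will say more about them shortly.

Although initials are a simple way to describe the idea, they make a poor hash function for cryptographic purposes. A cryptographic hash function has a number of further properties that make it useful in cryptography.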

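● A hash function should be hard to reverse. Knowing only the hash, there should be no good way to find the string that produced it. Because hash functions lose data, this property is relatively easy to achieve. Initials share it: given JC and nothing else, you cannot recover my name. Is it Jon Callas? Jane Cannoy? Someone else?

● Given a hash value, it should be hard to identify a possible source string. This is a property that initials lack: it is easy to look at a set of initials and tell whether a name matches. With a cryptographic hash function, we want the relationship between the source and the result to be as opaque as possible.

● Given one source string, it should be hard to find a second string with the same hash. It should be especially hard to change a string in a useful way and get a collision; for example, it should be hard to change “I agree to pay $100” into “I agree to pay $500” and have the two collide. Note that these two strings differ in only a single bit.

● It should also be hard to find any two strings whose hash values collide.

These requirements give us very flexible functions that are used for many different things. Here are some examples: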

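● When you type a passphrase into PGP software, we use a hash function to turn it into a key. The hash function is typically applied over and over so that a brute-force attack becomes more expensive.

● The random number generator in PGP software is continually updated with incoming data such as your keystrokes and mouse movements. This keeps an observer uncertain about its state and ensures the random numbers are never fixed. We use a hash function to smooth out the unevenness in that observed data.

● Random number generators often use hash functions to produce their output. The one in PGP software does.

● File integrity systems use hash functions as quick checks on files. For example, you can keep a list of the hashes of the files on your computer; if a file’s hash no longer matches the value in that database, you know the file has changed. Software distribution sites also often list the hash of the distributed file.

● Complex cryptographic systems that provide data integrity use hash functions as a component. We will talk more about them later.

Note that almost all of these uses assume that there will be no collisions. If two passphrases collide in their hash, either one can decrypt the file. If two software packages have the same hash value, one can be mistaken for the other.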
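The passphrase item in the list above describes applying a hash many times so that each brute-force guess costs more. A minimal sketch of that idea follows. It uses PBKDF2 from the Python standard library as a stand-in; the text does not say which construction PGP software uses, and the function name `derive_key`, the salt size, and the iteration count are illustrative assumptions.

```python
import hashlib
import os


def derive_key(passphrase: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Stretch a passphrase into a 32-byte key by iterating a keyed hash.

    PBKDF2-HMAC-SHA256 is an assumption made for illustration; it is not
    claimed to be the scheme PGP software uses.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


salt = os.urandom(16)  # stored alongside the encrypted data, not secret
key = derive_key("correct horse battery staple", salt)
```

Raising `iterations` makes every guess by an attacker proportionally slower, at the cost of the same slowdown for the legitimate user.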

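Commonly Used Hash Functions

Table 1 lists some commonly used hash functions, especially those used in PGP software.

Table 1: Commonly Used Hash Functions

Name | Size (bits) | Description
MD5 | 128 | MD5 was the only hash function PGP software used before PGP 5.0. Weaknesses in MD5 first appeared in 1996. MD5 is an improvement on MD4, which PGP software never used and which was the first common hash function to be broken.
SHA-1 | 160 | SHA-1 is an improvement on MD5, created by NIST to fix problems in MD5, and has been widely used since.
RIPE-MD/160 | 160 | RIPE-MD/160 is a hash function similar to SHA-1. It was created as an improvement over MD5, but it came out of the European RIPE (RACE Integrity Primitives Evaluation) project rather than the US NIST. We expect its security to be similar to that of SHA-1.
SHA-256 | 256 | SHA-256 is one of the newest hash functions designed by the US NIST and belongs to the “SHA-2” family. Its internal structure differs from the others, but it follows the same basic construction as the other hash functions in this table.
SHA-512 | 512 | Another member of the “SHA-2” family, similar to SHA-256.
SHA-384 | 384 | SHA-384 is a variant of SHA-512 with a smaller output. It is not commonly used because it has no advantage other than size; if we need something stronger than SHA-256, we go directly to SHA-512. Similarly, SHA-224 is a truncated version of SHA-256.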

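Difficulties with Hash Functions

I’m sure you noticed that I used the vague word “about.” That is because the answer is not exactly the square root, only close to it. In general, the probability of a collision is

$$\mathrm{Prob}(\mathit{pigeons}, \mathit{holes}) = 1 - \frac{\mathit{holes}!}{(\mathit{holes} - \mathit{pigeons})! \cdot \mathit{holes}^{\mathit{pigeons}}}$$

Here Prob is simply the name of the probability function, and pigeons and holes are the names of its input variables: the number of pigeons and the number of pigeonholes.

Save yourself the math. If you solve for the number of pigeons at which the probability of two pigeons sharing a hole is $1/2$, it works out to about $1.2\sqrt{\mathit{holes}}$, and for our purposes we can treat it as simply $\sqrt{\mathit{holes}}$. Especially when we are dealing with very large numbers, this approximation is convenient, and it is the kind of simplification that is routine in theoretical mathematics.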

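So if we have an n-bit hash function, there are even odds of a collision once we have hashed $2^{n/2}$ strings. In other words, a 160-bit hash function has only 80 bits of security. $2^{80}$ is a very large number, about twice Avogadro’s number. Avogadro’s number is the number of molecules in a mole or, in more convenient terms, the number of water molecules in a tablespoon of water. That is a big number.

When Xiaoyun Wang brought her report to the Crypto 2004 conference, she shook the cryptographic community. She did not have a paper showing how to create collisions; she merely had a number of them. Because collisions are supposed to be hard to find, merely possessing collisions on several 128-bit hash functions means that something is going on. For cryptanalysts, the main question was, “What does she know that we don’t?” Six months later, her techniques were extended to attack the main 160-bit hash function.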

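Here is a summary of what we have learned over the last two years:

● Wang is an excellent cryptanalyst. She does not have any fundamental mathematical insights that other mathematicians lack; she is simply, by a wide margin, the world’s best hash function cryptanalyst.

● Some other theoretical work that was not particularly practical is now receiving a lot of thought.

● There are a number of proposals for modifying existing functions to withstand Wang’s attacks. They are all very good, but the obvious question is, “What attack will appear next year that this fix does not cover?” Of course, that question cannot be answered; we cannot protect our functions against unknown attacks. Nevertheless, many of these proposals are practical. Simple techniques such as doubling every byte as you hash (hashing AABBCC instead of ABC), inserting a zero byte after every four bytes, or adding random data in front of the data to be hashed protect against the known problems.

● We are starting to get an idea of how to design a good hash function. In October 2005, NIST hosted a workshop on hash functions, and cryptographers are beginning to understand better how to build good ones. A second workshop is planned for August 2006. There is also support for a competition similar to the AES competition to produce a new hash function.

● There are also good ideas from an engineering standpoint. At PGP Corporation, we have been leading this initiative.

At PGP Corporation, we started moving away from MD5 back in 1997. PGP 5.0 began migrating from MD5 to SHA-1, keeping MD5 solely for backwards compatibility. PGP 8.0.3 introduced support for reading, but not generating, SHA-256, SHA-384, and SHA-512. PGP 9.0 began migrating from SHA-1 to SHA-256.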

Keying Hash Functions for Message Authentication

(Preliminary version)

Mihir Bellare, Ran Canetti, Hugo Krawczyk

January 25, 1996

The use of cryptographic hash functions like MD5 or SHA for message authentication has become a standard approach in many Internet applications and protocols. Though very easy to implement, these mechanisms are usually based on ad hoc techniques that lack a sound security analysis.

We present new, simple, and practical constructions of message authentication schemes based on a cryptographic hash function. Our schemes, NMAC and HMAC, are proven to be secure as long as the underlying hash function has some reasonable cryptographic strengths. Moreover, the security of our schemes is tightly related to that of the hash function.

In addition, our schemes are efficient and practical. Their performance is essentially that of the underlying hash function. Moreover, they use the hash function (or its compression function) as a black box, so that widely available library code or hardware can be used to implement them in a simple way.
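To illustrate the black-box claim, here is a minimal sketch of HMAC built only from calls to an off-the-shelf hash. The abstract does not spell out the construction; the padding constants 0x36 and 0x5C and the key handling follow the later RFC 2104 description, and the name `hmac_sketch` is an assumption made for this example.

```python
import hashlib
import hmac


def hmac_sketch(key: bytes, message: bytes, hash_name: str = "sha256") -> bytes:
    """Compute HMAC using the hash function purely as a black box."""
    block_size = hashlib.new(hash_name).block_size
    # Keys longer than one block are hashed first, then zero-padded to a block.
    if len(key) > block_size:
        key = hashlib.new(hash_name, key).digest()
    key = key.ljust(block_size, b"\x00")
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.new(hash_name, inner_pad + message).digest()
    return hashlib.new(hash_name, outer_pad + inner).digest()


key = b"demo key"
message = b"I agree to pay $100"
tag = hmac_sketch(key, message)
# Cross-check against the standard library implementation.
matches_stdlib = hmac.compare_digest(
    tag, hmac.new(key, message, hashlib.sha256).digest()
)
```

In practice the standard `hmac` module should be used directly; the sketch only shows that nothing beyond two ordinary hash calls is required.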

1 Introduction

This paper is about the authentication of information using cryptographic hash functions like MD5 and SHA, an issue brought to the fore by the security needs of the Internet.

Hash functions are an important part of cryptography. They are the workhorses that we cryptographers use and abuse for all sorts of things, and yet we understand them least of all the cryptographic primitives.

A hash function takes a variable-length input string and creates a fixed-length output. That “hash” of the input is a shortcut, not unlike a person’s initials. You can refer to the input string by its hash. The fact that it is a fixed-length string allows us to easily use the hash value as a reference to the actual string itself.

Because a hash function takes a long string and reduces it to a short one, it is inevitable that there will be two strings that hash to the same value, or collide in cryptographer-speak. For example, the names Jon Callas and Jane Cannoy collide with initials to the hash of JC. Collisions are important in the understanding of hash functions, and we’ll talk more about them in a bit.
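A short sketch of both ideas: the toy “initials” hash collides on the two names from the text, while a real hash function returns the same output length no matter how long the input is. The helper name `initials` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def initials(name: str) -> str:
    """Toy 'hash': the first letter of each word in the name."""
    return "".join(word[0].upper() for word in name.split())


# Two different inputs, one output: a collision.
jon = initials("Jon Callas")
jane = initials("Jane Cannoy")

# A real hash gives a fixed-length output for inputs of any length.
short_digest = hashlib.sha256(b"JC").hexdigest()
long_digest = hashlib.sha256(b"x" * 1_000_000).hexdigest()
print(jon, jane, len(short_digest), len(long_digest))
```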

Although initials are an easy way to describe the basic concept, initials make a bad hash function for cryptographic purposes. A cryptographic hash function has a number of other properties that make it useful cryptographically.

• It should be hard to reverse a hash function. Knowing the hash, there should be no good way to find the input string that generated it. Given that (typically) hash functions lose data, this is a relatively easy property to create. The same property is also true for initials: knowing JC and nothing else, there is no good way to get to my name.

• Given a hash value, it should be hard to identify a possible source string. This property is one that initials lack. It is very easy to look at a set of initials and know if a name matches it. With a cryptographic hash function, however, we want the relationship between a source and a result to be as opaque as possible.

• Given one source string, it should be hard to find a second string that collides with its hash. It should be especially hard to change a string usefully and get a collision. In an extreme case, it should be hard to change “I agree to pay $100” to “I agree to pay $500” and have that collide. Note that the difference between the two strings is only a single bit.

• It should also be hard to find two strings that collide in their hash values.

These requirements give us very flexible functions that are used for lots of different things. Here are some examples:

• Random number generators themselves often use hash functions to produce their output. The one in PGP software does.

• File integrity systems use hash functions as quick checks on the files. For example, you can keep a list of the hashes of the files on your computer, and you can see if a file has changed by comparing the hash of the file on disk to the one in the database. Software distribution sites also often list the hash value of the distributed file so that people who want to see if they have the right file can compute and compare hashes.

• Complex cryptographic systems that create data integrity use hash functions as a component. We’ll talk more about them later.

Note that for almost all of these uses, there’s an assumption that there won’t be collisions. If two passphrases collide in their hash, either can decrypt a file. If two software packages hash to the same value, then one can be mistaken for the other.
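The file integrity use in the list above can be sketched as a stored manifest of digests that is later compared with freshly computed ones. The function names `file_digest` and `changed_files`, the manifest layout, and the use of SHA-256 are assumptions made for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(manifest: dict, root: Path) -> list:
    """Return the names whose current digest differs from the stored one."""
    return [name for name, digest in manifest.items()
            if file_digest(root / name) != digest]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").write_bytes(b"original contents")
    manifest = {"notes.txt": file_digest(root / "notes.txt")}
    unchanged = changed_files(manifest, root)   # nothing modified yet
    (root / "notes.txt").write_bytes(b"tampered contents")
    changed = changed_files(manifest, root)     # the edit is detected
```

The check is only as trustworthy as the manifest itself, and it relies on the no-collision assumption discussed above.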

Commonly Used Hash Functions

Table 1 lists some commonly used hash functions, especially the ones we presently use in PGP software.

Name | Size | Description
MD5 | 128 bits | MD5 was the sole hash function that PGP software used prior to PGP 5.0. Weaknesses in MD5 first showed up in 1996. MD5 is itself an improvement on MD4, which was never used in PGP software and was the first common hash function to be fully broken.
SHA-1 | 160 bits | SHA-1 appeared in PGP 5.0, and also in OpenPGP. SHA-1 is an improvement on MD5 that was created by NIST to be wider and also to correct problems in MD5.
RIPE-MD/160 | 160 bits | RIPE-MD/160 is a hash function similar to SHA-1. RIPE-MD/160 was created to be an improvement over MD5. However, it came out of the European RIPE (RACE Integrity Primitives Evaluation) project rather than the US NIST. We expect it has similar security characteristics to SHA-1.
SHA-256 | 256 bits | SHA-256 is one of a new family of hashes created by the US NIST that are collectively called the “SHA-2” family. It has a different internal structure, but comes from the same basic construction as the other hash functions in this table.
SHA-512 | 512 bits | This is another member of the “SHA-2” family, along with SHA-256.
SHA-384 | 384 bits | SHA-384 is a variant of SHA-512 that has a smaller output. In general, SHA-384 is not used, because it has no advantages over SHA-512 except for the hash size. It runs at the same speed as SHA-512, so usually if we need something stronger than SHA-256, we go directly to SHA-512. There is also a SHA-224, which is a similar truncation of SHA-256.

Table 1: Commonly Used Hash Functions
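The output sizes in Table 1 can be checked against a standard library. This sketch uses Python's `hashlib` and leaves out RIPE-MD/160, because whether it is available depends on the underlying OpenSSL build; MD5 may likewise be disabled on some hardened systems.

```python
import hashlib

names = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")

# digest_size is reported in bytes; the table lists bits.
digest_bits = {name: hashlib.new(name).digest_size * 8 for name in names}

for name, bits in digest_bits.items():
    print(f"{name:8s} {bits} bits")
```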

Difficulties with Hash Functions

Presently (mid-2006), we know the suite of hash functions we have been using is not perfect, and some of them are quite imperfect. These problems came to light in the summer of 2004 when Xiaoyun Wang announced that she and her team had produced collisions in a number of hash functions [WANG04]. Adi Shamir, the “S” in the RSA algorithm, said at the time, “Last week, I thought that hash functions were the component we understood best. Now I see that they are the component we understand least.” In early 2005, Wang’s attacks were extended to SHA-1, which had survived her first work [WANG05].

We are still coping with these problems, all of which revolve around hash function collisions, two strings producing the same hash value. One of the axioms of the branch of mathematics called combinatorics is called the Pigeonhole Principle. At its simplest, the Pigeonhole Principle states that if you have thirteen pigeons and only twelve pigeonholes, then at least one hole must contain at least two pigeons. Pretty obvious, isn’t it? That’s why it’s an axiom.

If you apply this principle to hash functions, consider sixteen-byte hashes. Also consider the entire set of seventeen-byte strings. According to the Pigeonhole Principle, there are going to be at least two original strings that produce the same sixteen-byte hash. As a matter of fact, there have to be a whole lot of them. The collisions are the equivalent of pigeons lumping themselves together. There are $2^{136}$ seventeen-byte strings and only $2^{128}$ sixteen-byte hash values, so if the collisions are evenly distributed (and thus the hash function perfect), then there will be 256 colliding strings per hash value, and according to the Pigeonhole Principle, there has to be at least one hash with at least 256 collisions.
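The counting argument can be checked on a scaled-down version: a one-byte hash applied to every two-byte string. The ratio is again 256 inputs per output, so some hash value must receive at least 256 inputs. Using the first byte of SHA-256 as the `tiny_hash` is an assumption made purely to keep the experiment small.

```python
import hashlib
from collections import Counter

# Full-size case from the text: 17-byte inputs, 16-byte hashes.
inputs_per_hash = 2 ** (8 * 17) // 2 ** (8 * 16)


def tiny_hash(data: bytes) -> int:
    """A deliberately tiny 8-bit hash: the first byte of SHA-256."""
    return hashlib.sha256(data).digest()[0]


# Scaled-down case: all 65,536 two-byte strings into 256 possible values.
counts = Counter(tiny_hash(i.to_bytes(2, "big")) for i in range(2 ** 16))
fullest_bucket = max(counts.values())
print(inputs_per_hash, fullest_bucket)
```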

Finding a collision ought to be no better than guessing, but how hard is that? Answering that question raises another interesting mathematical problem called The Birthday Problem, which we first saw when talking about block sizes. The probability that a given person has the same birthday as Alice is about 1/365. But if you have a room full of people, what is the chance that there will be a collision on their birthday? Specifically, with how many people are there even odds that there will be two people in the room who share the same birthday?

The general case answer to this question is the same as finding collisions in a hash function. We can think of a birthday as yet another hash function with perhaps better properties than initials, but still nowhere near perfect. Nonetheless, birthdays are fairly randomly distributed. For birthdays, it turns out that the odds of a birthday collision are even at about 23 people. In the general case, the odds are even at about the square root of the number of options.
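The birthday figures above can be reproduced with a few lines of arithmetic. This sketch assumes every “hole” (birthday) is equally likely, and the function names are illustrative.

```python
import math


def collision_probability(pigeons: int, holes: int) -> float:
    """Chance that at least two of `pigeons` uniform picks share a hole."""
    p_no_collision = 1.0
    for i in range(pigeons):
        p_no_collision *= (holes - i) / holes
    return 1.0 - p_no_collision


def pigeons_for_even_odds(holes: int) -> int:
    """Smallest number of pigeons giving at least a 50% collision chance."""
    p_no_collision = 1.0
    pigeons = 0
    while 1.0 - p_no_collision < 0.5:
        p_no_collision *= (holes - pigeons) / holes
        pigeons += 1
    return pigeons


people = pigeons_for_even_odds(365)
print(people, round(collision_probability(people, 365), 4), math.sqrt(365))
```

For 365 days the threshold is 23 people, a little above the square root of 365 (about 19), which is why the text says “about.”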

I’m sure you noticed my use of the weasel-word “about.” This is because the answer isn’t exactly the square root, but close to it. In the general case, the chance of collisions is

$$\mathrm{Prob}(\mathit{pigeons}, \mathit{holes}) = 1 - \frac{\mathit{holes}!}{(\mathit{holes} - \mathit{pigeons})! \cdot \mathit{holes}^{\mathit{pigeons}}}$$

So, if we have an n-bit hash function, there are even odds of a collision when we have $2^{n/2}$ strings we’ve hashed. Thus, we say that a 160-bit hash function should have 80 bits of security.
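The square-root rule for an n-bit hash can be observed directly on a hash that is short enough to attack. This sketch truncates SHA-256 to 32 bits (an assumption made only to keep the search fast) and hashes a counter until two inputs collide, which typically happens after roughly $2^{16}$ attempts rather than $2^{32}$.

```python
import hashlib
from itertools import count


def find_truncated_collision(bits: int = 32):
    """Hash 0, 1, 2, ... until two inputs share the same truncated digest.

    Returns the two colliding messages and the number of hashes computed.
    """
    seen = {}
    nbytes = bits // 8
    for i in count():
        message = str(i).encode()
        digest = hashlib.sha256(message).digest()[:nbytes]
        if digest in seen:
            return seen[digest], message, i + 1
        seen[digest] = message


first, second, attempts = find_truncated_collision(32)
print(first, second, attempts)
```

The same search against a full 160-bit output would need on the order of $2^{80}$ hashes, which is the sense in which such a function has 80 bits of security.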

$2^{80}$ is a very large number. It’s about twice Avogadro’s Number, which is the number of molecules in a mole, or to put it in convenient terms, the number of molecules in a rounded tablespoon of water. It’s a big number.

When Wang came to Crypto 2004 with [WANG04] in hand, it shook cryptographers up. She didn’t have a paper that showed how to create collisions; she merely had a lot of them. As you can see, because collisions are supposed to be hard to find, merely possessing a handful of collisions on each of a handful of 128-bit hash functions means that something is up. For cryptographers, the main question was, “What does she know that we don’t?” Six months later, her techniques were extended to attack the prime 160-bit hash function, SHA-1.

Here is a summary of what we’ve learned in the last two years:

• Wang is an excellent cryptanalyst. She doesn’t have any fundamental mathematical insights that other mathematicians don’t have; she’s merely the world’s best hash function cryptanalyst by leaps and bounds.

• Some other theoretical work that wasn’t particularly practical is getting a lot of thought. For example, a few months before [WANG04], John Kelsey and Bruce Schneier showed in [KSHASH04] that when looking for a SHA-1 collision of a given string, you could do it in $2^{106}$ work instead of $2^{160}$, but you need to have messages $2^{60}$ long to be able to do so. Before Wang showed flaws in how we were doing things, this was interesting but not practical. Now some of us wonder if this impractical flaw is an indication of a structural problem. We don’t know, yet.

• There are a number of proposals on how to modify existing functions to withstand Wang’s attacks. They’re all very good, but the obvious follow-on question is, “What new attack will happen next year, that this fix doesn’t account for?” Of course, this question is unanswerable. We just can’t protect against unknown attacks. However, a number of these proposed solutions are practical to implement. Simple techniques like doubling every byte as you hash (instead of hashing ABC, hash AABBCC), or inserting a zero byte after every four bytes [SY05], or adding in some random data at the front of the data to be hashed protect against these known problems.

• We’re starting to get an idea of how to design hash functions better. In October 2005, NIST hosted a workshop on hash functions, and cryptographers are starting to get a better idea of how to make good hash functions. A second workshop is planned for August 2006. There is also growing support for a competition similar to the AES competition to produce new hash functions.

• There are also good ideas on how to proceed from an engineering standpoint. At PGP Corporation, we have been leading this initiative.
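The three simple preprocessing techniques mentioned in the list above can be sketched as follows. The text gives only one-line descriptions, so the details here are assumptions: how a trailing group of fewer than four bytes is handled, the 16-byte length of the random prefix, and all function names. SHA-1 appears only because the proposals target existing functions; this is not a recommendation to use it.

```python
import hashlib
import os


def double_bytes(data: bytes) -> bytes:
    """Repeat every byte: ABC becomes AABBCC."""
    return bytes(b for byte in data for b in (byte, byte))


def insert_zero_every_four(data: bytes) -> bytes:
    """Append a zero byte after each complete group of four bytes."""
    out = bytearray()
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]
        out += chunk
        if len(chunk) == 4:  # assumption: a short final group gets no padding
            out.append(0)
    return bytes(out)


def randomized_hash(data: bytes, prefix=None):
    """Hash random prefix + data; the prefix must be kept to verify later."""
    prefix = os.urandom(16) if prefix is None else prefix
    return prefix, hashlib.sha1(prefix + data).digest()


doubled = hashlib.sha1(double_bytes(b"ABC")).hexdigest()
spaced = hashlib.sha1(insert_zero_every_four(b"ABCDEFGH")).hexdigest()
prefix, digest = randomized_hash(b"I agree to pay $100")
```

Each transform is applied to the message before hashing, so the hash function itself is used unchanged.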

At PGP Corporation, we started migrating away from MD5 back in 1997. PGP 5.0 started a gentle migration away from MD5 toward SHA-1, keeping MD5 solely for backwards compatibility. PGP 8.0.3 introduced support for reading but not generating SHA-256, SHA-384, and SHA-512. PGP 9.0 started a gentle migration from SHA-1 to SHA-256.
