CRC 32 vs CRC 64 vs CRC 128 vs MD5

Discussion:

(too old to reply)

zhengyu

2003-07-20 04:36:40 UTC

hello,

I got a few questions on one way hashing function.

I was investigating CRC32 and found that it wasn't particularly good at
doing signatures. Then I looked at CRC64, it was much collision-safer and
computationally it is much better than MD5, now, if I do CRC128, would it be
just as safe as MD5?? If so CRC128 is probably a lot more computationally
efficient than MD5.

Could someone help me on this please?

Jimmy

Francois Grieu

2003-07-20 05:21:59 UTC

Permalink

if I do CRC128, would it be just as safe as MD5?

Hint: with a generic CRC using a degree n polynomial, the n+1 bits
sequence consisting of the polynomial coefficients gives null CRC,
as well as the sequence of n+1 zero bits.

François Grieu

Bill Unruh

2003-07-20 15:52:04 UTC

Permalink

"zhengyu" <***@comcast.net> writes:

]hello,

] I got a few questions on one way hashing function.

] I was investigating CRC32 and found that it wasn't particularly good at
]doing signatures. Then I looked at CRC64, it was much collision-safer and
]computationally it is much better than MD5, now, if I do CRC128, would it be
]just as safe as MD5?? If so CRC128 is probably a lot more computationally
]efficient than MD5.

For what purpose? For cryptographic purposes? No, no crc is
cryptographically safe. Ie it is easy for an attacker to find another
message which has the same hash as your message. If you want a
dictionary hash, or a database hash, go ahead and use crc.

John E. Hadstate

2003-07-20 16:15:17 UTC

Permalink

Post by zhengyu
hello,
I got a few questions on one way hashing function.
I was investigating CRC32 and found that it wasn't particularly good at
doing signatures. Then I looked at CRC64, it was much collision-safer and
computationally it is much better than MD5, now, if I do CRC128, would it be
just as safe as MD5??

Safe for what? As Paul pointed out, the CRCs (and other error detection
codes) have certain properties that guarantee bounds on the probabilities of
detecting various patterns of errors. It is assumed that these errors are
introduced by random sources. In other words, they are not being introduced
deliberately.

Post by zhengyu
If so CRC128 is probably a lot more computationally
efficient than MD5.

Since the two don't do comparable things, and are not interchangeable,
comparing computational efficiencies is meaningless.

If detection of corruption by a random adversary is your goal, you could
probably use a truncated cryptographic hash in place of a CRC-32 and be
safer than if you tried to use a CRC-32 to protect against a determined
adversary. However, a CRC *guarantees* detection of *all* single-bit
errors. I doubt that there are similar guarantees for a truncated
cryptographic hash.

For example, I can envision a case where you could change 1 bit of the
message text and change 80 bits in the cryptographic hash, but none of those
bits are included in the truncated portion of the hash.

zhengyu

2003-07-20 17:53:32 UTC

Permalink

I wasn't doing cryptography or error correction.

I need a hardware implementation of hashing function for random strings
(such as URL) that doesn't
colide very often.

If above is the goal, would CRC64 better suited than CRC32??

Post by John E. Hadstate

Post by John E. Hadstate
be

Post by zhengyu
just as safe as MD5??

Post by zhengyu
If so CRC128 is probably a lot more computationally
efficient than MD5.

Since the two don't do comparable things, and are not interchangeable,
comparing computational efficiencies is meaningless.
If detection of corruption by a random adversary is your goal, you could
probably use a truncated cryptographic hash in place of a CRC-32 and be
safer than if you tried to use a CRC-32 to protect against a determined
adversary. However, a CRC *guarantees* detection of *all* single-bit
errors. I doubt that there are similar guarantees for a truncated
cryptographic hash.
For example, I can envision a case where you could change 1 bit of the
message text and change 80 bits in the cryptographic hash, but none of those
bits are included in the truncated portion of the hash.

John E. Hadstate

2003-07-20 23:54:30 UTC

Permalink

Post by zhengyu
I wasn't doing cryptography or error correction.

Gee, think how much time we could have saved if you'd said this at the
beginning.

Post by zhengyu
I need a hardware implementation of hashing function for random strings
(such as URL) that doesn't
colide very often.
If above is the goal, would CRC64 better suited than CRC32??

CRC-64 will have a much lower probability of collision (by a factor of about
2**16) than CRC-32.

That said, let's think about what a collision is likely to mean in your
application. If you're using the hash to index a hash table, you are
probably not using 32 bits of the CRC-32 and you're certainly not using 64
bits of the CRC-64. In both cases, you are truncating the hash.

If, when you compute the hash, you find that the indexed row in the table is
used by a URL that is not equal to the URL on which you just computed the
hash (collision), you will initiate a sequential search from that point to
find an unused row. The collision potential needs to be calculated based on
the number of rows in your hash table, not the size of your hash.

Jimmy Zhang

2003-07-21 19:58:02 UTC

Permalink

What if the string I deal with are mostly text and not very long.
Is CRC 64 the best choice for avoid colision?

Jimmy

Post by John E. Hadstate

Post by zhengyu
I wasn't doing cryptography or error correction.

Gee, think how much time we could have saved if you'd said this at the
beginning.

Post by zhengyu
I need a hardware implementation of hashing function for random strings
(such as URL) that doesn't
colide very often.
If above is the goal, would CRC64 better suited than CRC32??

CRC-64 will have a much lower probability of collision (by a factor of about
2**16) than CRC-32.
That said, let's think about what a collision is likely to mean in your
application. If you're using the hash to index a hash table, you are
probably not using 32 bits of the CRC-32 and you're certainly not using 64
bits of the CRC-64. In both cases, you are truncating the hash.
If, when you compute the hash, you find that the indexed row in the table is
used by a URL that is not equal to the URL on which you just computed the
hash (collision), you will initiate a sequential search from that point to
find an unused row. The collision potential needs to be calculated based on
the number of rows in your hash table, not the size of your hash.

John E. Hadstate

2003-07-21 21:02:36 UTC

Permalink

Post by Jimmy Zhang
What if the string I deal with are mostly text and not very long.
Is CRC 64 the best choice for avoid colision?
Jimmy

From what you have said so far, I do not think so.

There are two types of collision here. The first type is when two URLs map
to the same CRC. To prevent that type of collision, CRC-64 is likely better
than CRC-32.

The second type of collision occurs when you try to map hash values into
hash table indices. I don't know how you plan to map hash values to hash
table indices, but I think the probability of collisions is going to be
governed by the number of rows in your hash table. If you have less than
4,294,967,296 rows in your hash table, you can use CRC-32 and it will be
just as effective as if you had used CRC-64.

Jimmy Zhang

2003-07-22 23:55:29 UTC

Permalink

What if I don't use the hash value as index?
What if I only store the hash value and do brutal force comparison?

Post by John E. Hadstate

Post by Jimmy Zhang
What if the string I deal with are mostly text and not very long.
Is CRC 64 the best choice for avoid colision?
Jimmy

From what you have said so far, I do not think so.
There are two types of collision here. The first type is when two URLs map
to the same CRC. To prevent that type of collision, CRC-64 is likely better
than CRC-32.
The second type of collision occurs when you try to map hash values into
hash table indices. I don't know how you plan to map hash values to hash
table indices, but I think the probability of collisions is going to be
governed by the number of rows in your hash table. If you have less than
4,294,967,296 rows in your hash table, you can use CRC-32 and it will be
just as effective as if you had used CRC-64.

Bill Unruh

2003-07-23 03:53:06 UTC

Permalink

"Jimmy Zhang" <***@comcast.net> writes:

]What if I don't use the hash value as index?
]What if I only store the hash value and do brutal force comparison?

Why? That seems a totally useless use of a hash. Each use requires a
search that is of order the size of the database and that many hash
computations.
Maybe you had better tell us again what you want the hash for?

]"John E. Hadstate" <***@null.nil> wrote in message
]news:3f1c518e$***@news.vic.com...
]>
]> "Jimmy Zhang" <***@comcast.net> wrote in message
]> news:eBXSa.115836$***@sccrnsc02...
]> > What if the string I deal with are mostly text and not very long.
]> > Is CRC 64 the best choice for avoid colision?
]> >
]> > Jimmy
]> >
]>
]> From what you have said so far, I do not think so.
]>
]> There are two types of collision here. The first type is when two URLs
]map
]> to the same CRC. To prevent that type of collision, CRC-64 is likely
]better
]> than CRC-32.
]>
]> The second type of collision occurs when you try to map hash values into
]> hash table indices. I don't know how you plan to map hash values to hash
]> table indices, but I think the probability of collisions is going to be
]> governed by the number of rows in your hash table. If you have less than
]> 4,294,967,296 rows in your hash table, you can use CRC-32 and it will be
]> just as effective as if you had used CRC-64.
]>
]>
]>
]>

Mark Wooding

2003-07-23 08:31:02 UTC

Permalink

Post by Bill Unruh
Why? That seems a totally useless use of a hash. Each use requires a
search that is of order the size of the database and that many hash
computations.

If your search keys are hopelessly redundant, and you adjoin a hash to
them, you can speed things up by a a logarithmic factor. I'm not saying
it's sensible...

Post by Bill Unruh
Maybe you had better tell us again what you want the hash for?

Yup.

-- [mdw]

John E. Hadstate

2003-07-23 13:58:17 UTC

Permalink

Post by Jimmy Zhang
What if I don't use the hash value as index?
What if I only store the hash value and do brutal force comparison?

<English lesson>
That's "brute force".
</English lesson>

To reiterate, CRC-64 is less likely to have collisions (given random data,
which you don't have) than CRC-32 by a factor of about 65536.

At this point, I can't tell you what to do. If I were doing a "hardware
implementation" of some application like you've hinted at, the hardware
would include a P-IV processor, 128 Megs of RAM, a 192 Meg Silicon Disk and
2 Ethernet ports all happily ensconced on a VME form factor board set and I
would not be considering either CRC-32 or CRC-64 for my hash algorithm!

Jimmy Zhang

2003-07-24 00:05:50 UTC

Permalink

John,

Actually my goal is to check the uniqueness of every member for a given
string pool of a fixed size (less than 256) so I can actually afford to
save the hash key value in registers. The reason I am using CRC64 is that
the signature is considerably shorter than the max string length.

Thanks,
Jimmy

Post by John E. Hadstate

Post by Jimmy Zhang
What if I don't use the hash value as index?
What if I only store the hash value and do brutal force comparison?

<English lesson>
That's "brute force".
</English lesson>
To reiterate, CRC-64 is less likely to have collisions (given random data,
which you don't have) than CRC-32 by a factor of about 65536.
At this point, I can't tell you what to do. If I were doing a "hardware
implementation" of some application like you've hinted at, the hardware
would include a P-IV processor, 128 Megs of RAM, a 192 Meg Silicon Disk and
2 Ethernet ports all happily ensconced on a VME form factor board set and I
would not be considering either CRC-32 or CRC-64 for my hash algorithm!

Terry Ritter

2003-07-21 03:36:29 UTC

Permalink

Post by zhengyu
I wasn't doing cryptography or error correction.
I need a hardware implementation of hashing function for random strings
(such as URL) that doesn't
colide very often.
If above is the goal, would CRC64 better suited than CRC32??

CRC is ideal for hardware implementation. In general,
the advantage of a CRC increases with size. We would
expect a 32-bit CRC to start to have collisions after
about 2**16 or 65k items. So if you expect many more
than that, a 64-bit CRC would be better for you. And
if you expect more than 2**32 items, a 128-bit CRC would
be better yet.

In any case, you do have to be able to handle a collision
if, unexpectedly, that does occur.

---
Terry Ritter http://www.ciphersbyritter.com/
Crypto Glossary http://www.ciphersbyritter.com/GLOSSARY.HTM