The Kember Identity
Posted May 2nd, 2009 by ConorThis morning on twitter I came across something called The Kember Identity. Elliott Kember seems to have thought of something that simply hasn’t occured to the rest of the world. That thing has been aptly named “The Kember Identity”. It’s homepage is located here.
Anyway the Kember Identity is an md5 hash. Each md5 hash contains 32 characters so the KID is the theoretical hash that when it is encrypted it returns itself. At the moment this is a theory as it has not been proven. So Elliott has started a competition to see if anyone can figure out if the KID actually exists and if so what it is.
I have chosen to enter this competition with a PHP script that I wrote just a few minutes ago. It’s fairly simple and I have added a html table in there to clean it up a bit. Instead of computing endlessly this script displays all failed equations as it goes. When it finds the KID it will stop and show it in bright green. I ran the script for a few minutes and got to 200,000 hashes with no luck. Also FireFox froze a few times while running it. The best thing to do is just leave it to do it’s job and don’t do anything else while it is working!
You can view it here: http://files.macaoidh.name/php/the-kember-identity/md5.php
<?php
function encript($str){ return md5($str); }
function generateSum(){ return md5(mt_rand()); }
function loKIDe(){
$s=generateSum();
$md5=encript($s);
echo '<tr><td>'.$s.'</td><td>'.$md5.'</td>';
if($s!=$md5) return false;
else{
echo '<tr style="color:green"><td>'.$s.'</td><td>'.$md5.'</td></tr>';
return true;
}
}
echo '
<h1>Finding the Kember Identity...</h1>
<table>
<tr>
<th>32 Digit Number</th>
<th>Md5 Of Num</th>
<th>Count</th>
</tr>
';
$num='';
while(loKIDe()==false){
$num++;
echo '<td>'.$num.'</td></tr>';
}
echo '</table><br/><br/> If you are seeing this then the line above is equal';
?>
This script should, in theory, find the KID. But it is really a matter of time. I am going to keep it running all night tonight and see if I have any luck. Kae is thinking of writing a JavaScript version that will spread the load between multiple computers thus reducing calculation time. So the race is on! I hope that I am the first to find this magical, mysterious number!
Hi Conor,
32 characters = 256 bits
that means to do a brute force search like you’re doing means doing 2^256 searches
try measuring the time it takes to do one search and then multiply by 2^256
i think you will find that it is a huge number and the universe will have ended by the time you have finished your search
approaches to speeding up the search are using C or even assembler to code the search
but the best approach is to choose better candidates likely to return an identity
this is likely to require extensive mathematic analysis
-D.
it’s 32 hex characters, each of which represents a nibble, the search space is only 2^128. Hashes are designed to be enormously sensitive to input. But even if this is perfect, then every possible string has a probability of 1/^2^128 of being a particular hash (in this case itself). The odds are therefore pretty good, with the probability of there being at least one match (1 – 1/2^128). You become more likely than not to have found under half-way through the set, so the true search space is
smaller again.
You can get a rough idea of how quickly your hardware can perform MD5s with the “openssl speed” command:
colmmacc@minerva (~) $ openssl speed md5
Doing md5 for 3s on 16 size blocks: 4567334 md5′s in 3.00s
Doing md5 for 3s on 64 size blocks: 3846399 md5′s in 3.00s
Doing md5 for 3s on 256 size blocks: 2627654 md5′s in 2.98s
Doing md5 for 3s on 1024 size blocks: 1152486 md5′s in 3.00s
Doing md5 for 3s on 8192 size blocks: 182798 md5′s in 3.00s
OpenSSL 0.9.8g 19 Oct 2007
built on: Thu Mar 26 21:30:51 UTC 2009
options:bn(64,64) md2(int) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(ptr2)
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -O3 -Wa,–noexecstack -g -Wall -DMD32_REG_T=int -DMD5_ASM
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The ‘numbers’ are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md5 24359.11k 82056.51k 225731.35k 393381.89k 499160.41k
If we say 4 million a second. It will probably take about 1,348,785,383,850,751,773,621,316 CPU years to find the answer.
That’s a long time. I will be dead before that happens.
But there is an upside. I have been informed that an organisation, not to be named, is attempting to crack the md5 encryption method by creating an archive of all possible md5 values. This is an enormous task. I think that it is a bad idea because when it is completed there will no longer be any use for md5 encryption. sha1 is the future!
Anyway my eventual point is by undergoing this process they will recover the Kember Identity, but it might take a while until they get to the 32nd character since they have only done 8 characters in the last few years. They also have a few of the worlds most powerfull super computers on their side.
SO i’m going to sit back and wait for those guys. But I am sure that the KID does exist! :L
Ok, firstly MD5 is not an encryption algorithim, well – it’s not a cipher – it’s a hash. Even if the full list of MD5s is created (and I wonder where they would store such a list, as there are fewer particles in the universe) it wouldn’t render MD5 totally useless – it would still hash things just fine.
Secondly, MD5 has /already/ been broken, there are published collisions. SHA-256 is now, and there is an contest under way to replace it, because the general class of hash functions is weak.
Thirdly, although there are 2^128 hashes, there is an infinite amount of inputs that could generate each and every one of those hashes. There’s no reason at all why the Kember identity would need to be discovered as part of a process to identify an input for each hash.
Lastly – supercomputers aren’t very good at calculating MD5′s. It’s not a FLOP heavy process, and no coordination is neccessary. It’s better to distribute the load SETI style, or put the task on a cloud.
The KID may exist, the probability is very high, but it’s not a sureity.
I suppose saying the KID exists is like saying God exists. People think that it exists but no one as yet has been able to prove it
One idea a friend of mine had was to write it in AJAX and PHP. Using those techniques he would be able to spread the load onto multiple computers.
But it would still take ages!
Actually, the probability that the KID exists, presuming MD5 is randomly distributed over its range (it should be) is 1 – ((2 ^ 128 – 1) / (2 ^ 128)) ^ (2 ^ 128). Which is 0.63. Not all that high.
Also FireFox froze a few times while running it. The best thing to do is just leave it to do it’s job and don’t do anything else while it is working!
ha ha!! thats why you dont use a web language for number crunching and then try to view it in your browser