[A51] The call of Kraken

Sascha Krissler sascha.krissler at web.de
Sun Jul 25 23:11:40 CEST 2010

In theory the math works thusly:

40 tables.
8   round functions per table
-> 320 candidate chains per keystream segment
64 keystream segments per burst
1   disk access per candidate chain
-> 20480 disk accesses per burst
4 KB per disk access
-> 80 MB per burst
-> about 200 seconds with a 10 ms HD, about 10 seconds with a 500 usec SSD
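
For anyone who wants to replay the disk arithmetic above with their own numbers, here is a minimal sketch. The function names are made up for illustration, "4 KB" is read as 4096 bytes, and the `disks` parameter assumes that accesses spread evenly over identical disks, which is an assumption and not something measured.

```python
# Back-of-envelope disk cost per burst, using the figures from the post.
TABLES = 40
ROUND_FUNCTIONS_PER_TABLE = 8
SEGMENTS_PER_BURST = 64
BYTES_PER_ACCESS = 4 * 1024  # assumption: "4 KB" read as 4096 bytes


def accesses_per_burst():
    # One disk access per candidate chain.
    chains_per_segment = TABLES * ROUND_FUNCTIONS_PER_TABLE
    return chains_per_segment * SEGMENTS_PER_BURST


def megabytes_per_burst():
    return accesses_per_burst() * BYTES_PER_ACCESS / 2**20


def seek_seconds(latency_us, disks=1):
    # Assumption: accesses are spread evenly over `disks` identical disks.
    total_us = accesses_per_burst() * latency_us
    return total_us / (1_000_000 * disks)


print(accesses_per_burst())    # 20480
print(megabytes_per_burst())   # 80.0
print(seek_seconds(10_000))    # 204.8  (10 ms HD)
print(seek_seconds(500))       # 10.24  (500 usec SSD)
```

Only random-access latency is modelled here; transfer time for the 4 KB reads is ignored.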

All 20480 chains have to be processed in parallel on the GPU; the chain length is 2^15.
Half of the chains do not produce a hit in the tables, so their length is 2^14 (which
means after the initial chain regeneration there is no need to check the start value
from the table, because you will not find one).
A single SIMD engine can produce around 32M A5/1 chain links per second, or about 1000 chains
of length 2^15. The 5830M can thus produce about 10k chains per second and needs 1.5 seconds
per burst to compute the candidate chains. Let's assume 3 seconds there, because
the lookup code is not as efficient as the generation code.
So unless you plan to use 5 SSDs, the 5830M will not be too small.
Also, the 5830M is saturated at 81920 chains, which is 4 bursts in parallel; below that
it will be just as fast as a 5870 (minus the clock difference).
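
The GPU estimate can be replayed the same way. This is only a sketch under stated assumptions: "around 32M" links per second is read as 2^25 so the numbers come out round, the count of 10 SIMD engines is inferred from the step from 1000 to 10k chains per second, and the `efficiency` factor is a hypothetical knob standing in for the slower lookup code.

```python
# Back-of-envelope GPU cost per burst, using the figures from the post.
CHAIN_LEN_HIT = 2**15             # full chain length
CHAIN_LEN_MISS = 2**14            # chains that produce no table hit
CHAINS_PER_BURST = 20480
SATURATION_CHAINS = 81920         # chains needed to saturate the 5830M
LINKS_PER_SEC_PER_SIMD = 2**25    # assumption: "around 32M" read as 2^25
SIMD_ENGINES = 10                 # assumption: inferred from 1000 -> 10k chains/s


def links_per_burst():
    # Half of the chains miss and only cost 2^14 links each.
    misses = CHAINS_PER_BURST // 2
    hits = CHAINS_PER_BURST - misses
    return hits * CHAIN_LEN_HIT + misses * CHAIN_LEN_MISS


def gpu_seconds_per_burst(efficiency=1.0):
    # `efficiency` is a hypothetical factor: 0.5 models lookup code
    # that runs at half the speed of the generation code.
    links_per_sec = SIMD_ENGINES * LINKS_PER_SEC_PER_SIMD * efficiency
    return links_per_burst() / links_per_sec


print(gpu_seconds_per_burst())                 # 1.5
print(gpu_seconds_per_burst(efficiency=0.5))   # 3.0
print(SATURATION_CHAINS // CHAINS_PER_BURST)   # 4 bursts in parallel
```

With the 3 second figure, a single 500 usec SSD at roughly 10 seconds per burst stays the bottleneck, which is the point made above.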

Note that this is theoretical: you would implicitly sign up as a beta tester if you bought hardware now,
but the order of magnitude is accurate.

>On Sun, 2010-07-25 at 21:21 +0200, Fabio Pietrosanti (naif) wrote:
>> Regarding the computer hardware setup, do you think it would be
>> feasible, based on your testing environment, to use a notebook?
>> Some notebooks dedicated to gaming have a very strong ATI
>> Mobility Radeon HD 5800 series card:
>> http://www.amd.com/us/products/notebook/graphics/ati-mobility-hd-5800/Pages/ati-mobility-hd-5800.aspx
>> http://www.notebookcheck.net/ATI-Mobility-Radeon-HD-5830.24733.0.html
>Cracking is relatively light on the GPU - a decent card will be nice
>to keep the latency down, but one needn't go all out on the graphics
>card either.
>> Also, the disk I/O bottleneck is important, so I did some
>> scouting for notebooks that have, besides a powerful ATI card and
>> a multicore CPU, eSATA and USB3 interfaces. I found this HP Envy 15:
>> http://hpfansite.com/hp-envy/hp-2010-envy-15-review-ati-mobility-radeon-5830-core-i7-quad/
>>       * 1.73 GHz Intel Core i7 820M quad core processor
>>       * 8 GB of DDR3 RAM (2 X 4 slots)
>>       * 1 GB ATI Mobility Radeon 5830 graphics
>>       * 2 USB ports, USB + eSata, HDMI
>> SATA3 has a bandwidth of up to 600MB/s, eSATA can do up to 300MB/s, and
>> USB3 can reach up to 625MB/s:
>> http://en.wikipedia.org/wiki/List_of_device_bit_rates
>> Under such conditions, what would be the best-performing hard drive
>> setup for lookup speed?
>> I am thinking about a possible layout like this:
>> - 400GB on internal disk drive
>> - 1TB on USB3 external drive (625MB/s maximum)
>> - 600GB on eSata external drive (300MB/s maximum)
>> Does this sound reasonable for a mobile setup of the airprobe/kraken
>> environment?
>> I mean, what disk I/O bandwidth are you currently experiencing
>> with the 1.5 minute timing?
>This sounds reasonable, but once you bring a GPU to the party, disk
>access time immediately becomes your new bottleneck, up to the point
>where decompressing the table and the final bit of search swamps the
>CPU. I suspect you need SSD storage, or an insanely fast disk array to
>see this happen.
>I have done some tuning on the code, and the time it takes to analyze
>one single burst is down to 60~70 seconds, and this is bound by the
>slowest disks that only deliver around 60 random reads per second, while
>the faster disks are considerably faster.
>For instance on a laptop, the internal drive is probably slower than
>many external drives. This difference in performance can be addressed by
>storing fewer tables on the slower disks.
>Another way to get around the speed difference could be to have kraken
>accept 4-8 bursts, and process them all in parallel, and when a key is
>found the work queue is flushed. In this scenario it is more likely that
>the faster disks will manage to find a key before they run out of work.
>In short, one needs to optimize for random reads across 1.7TB* of
>storage. IO bandwidth is not that much of an issue, more disks and
>faster disks are always better.
>* Tables were originally computed at 2TB, but through better compression
>they only need 1.7TB of storage.