2009 WRI World Congress on Computer Science and Information Engineering
Ordered Minimal Perfect Hash of the Human Genome and Implications for Duplicate Finding
Los Angeles, California USA
March 31-April 2, 2009
ISBN: 978-0-7695-3507-4
Hashing long strings is difficult, especially when the alphabet is small. Chess and Go game board hashing has almost always been done by using (letter, position) pairs to index into a table of random numbers, which are exclusive-OR'd together to create the hash value. The table of random numbers can be a huge source of different hash functions, because varying any bit of any random number yields a new function. Algorithms are developed here that can find hashes that are perfect, minimal, and even ordered for very large cases. The Human Genome is a rich source of long strings over a small alphabet, so it is used as the test case here. An algorithm is presented that can solve for an ordered minimal perfect hash of the Genome. It can also solve the lesser cases of minimal perfect and perfect hashing at higher speed. A statistical criterion is derived for obtaining the ordered minimal perfect hash with high probability. The algorithm and the statistical criterion lead to a duplicate-finding algorithm that might prove to be the fastest for important cases.
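The (letter, position) XOR scheme described above is the Zobrist hashing construction. The following is a minimal Python sketch, assuming the four-letter nucleotide alphabet `ACGT` and fixed-length keys. The helper names (`make_table`, `zobrist_hash`, `is_minimal_perfect`, `find_minimal_perfect`, `flip_bit`) are hypothetical. The seed-by-seed search is a naive illustration for tiny key sets only; it is not the paper's algorithm, which targets genome-scale inputs and ordered hashes.

```python
import random

ALPHABET = "ACGT"  # assumption: nucleotide alphabet for genome strings


def make_table(length, seed=0):
    """One random 64-bit number for every (position, letter) pair."""
    rng = random.Random(seed)
    return [{c: rng.getrandbits(64) for c in ALPHABET} for _ in range(length)]


def zobrist_hash(s, table):
    """XOR together the table entries selected by each (position, letter) pair."""
    h = 0
    for pos, c in enumerate(s):
        h ^= table[pos][c]
    return h


def is_minimal_perfect(keys, table):
    """True if hash mod n sends the n distinct keys onto 0..n-1 with no collisions."""
    n = len(keys)
    slots = {zobrist_hash(k, table) % n for k in keys}
    return len(slots) == n


def find_minimal_perfect(keys, length, max_tries=10_000):
    """Hypothetical brute force: try table seeds until one is minimal perfect."""
    for seed in range(max_tries):
        table = make_table(length, seed)
        if is_minimal_perfect(keys, table):
            return table
    return None


def flip_bit(table, pos, letter, bit):
    """Derive a new hash function by flipping one bit of one table entry."""
    table[pos][letter] ^= 1 << bit


if __name__ == "__main__":
    keys = ["ACGT", "AGGT", "TTCA", "CATG"]
    table = find_minimal_perfect(keys, length=4)
    for k in keys:
        print(k, zobrist_hash(k, table) % len(keys))
```

Because the hash is a plain XOR of table entries, `flip_bit` changes the hash only of keys that contain `letter` at `pos`, and each of those changes by exactly that bit. This is the "huge source of different hash functions" the abstract mentions. A random function is a bijection on n slots with probability n!/n^n, so blind retrying like this is practical only for very small n.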
Citation:
Albert Lindsey Zobrist, "Ordered Minimal Perfect Hash of the Human Genome and Implications for Duplicate Finding," csie, vol. 4, pp. 106-111, 2009 WRI World Congress on Computer Science and Information Engineering, 2009