HashDB is a community-sourced library of hashing algorithms used in malware.
HashDB can be used as a standalone hashing library, but it also feeds the HashDB Lookup Service run by OALabs. This service allows analysts to reverse hashes and retrieve hashed API names and string values.
HashDB can be cloned and used in your reverse engineering scripts like any standard Python module. Some example code follows.
>>> import hashdb
>>> hashdb.list_algorithms()
['crc32']
>>> hashdb.algorithms.crc32.hash(b'test')
3632233996
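The same hash functions can drive a small local lookup table when a full wordlist is not needed. The following is a minimal sketch, not part of HashDB itself: it uses `zlib.crc32` from the standard library as a stand-in for `hashdb.algorithms.crc32.hash` (it produces the same value for `b'test'` as the session above) so that it runs without cloning the repository. The names `crc32_hash`, `build_table`, and `WORDS` are hypothetical and exist only for this example.

```python
import zlib


def crc32_hash(data: bytes) -> int:
    """Stand-in for hashdb.algorithms.crc32.hash (assumed to be standard CRC-32)."""
    return zlib.crc32(data) & 0xFFFFFFFF


def build_table(words):
    """Map each word's hash back to the word (hypothetical helper)."""
    return {crc32_hash(word): word for word in words}


# A tiny illustrative word list; a real one would hold many thousands of names.
WORDS = [b"LoadLibraryA", b"GetProcAddress", b"VirtualAlloc", b"test"]
TABLE = build_table(WORDS)

# 3632233996 is the CRC-32 of b'test', as in the interactive session above.
print(TABLE.get(3632233996))
# Hashes that are not in the table return None.
print(TABLE.get(0))
```

Swapping `crc32_hash` for any other function with the same `bytes -> int` shape should give a reverse table for that algorithm.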
OALabs run a free HashDB Lookup Service that can be used to query a hash table for any hash listed in the HashDB library. Included in the hash tables are the complete set of Windows APIs as well as many common strings used in malware. You can even add your own strings!
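A lookup against the service might be scripted roughly as follows. This is a sketch under assumptions rather than documented usage: the base URL and the `/hash/<algorithm>/<hash>` path layout are not stated in this README and should be confirmed against the service itself, and `build_lookup_url` and `lookup_hash` are hypothetical helper names. Only the URL is built when the snippet runs; `lookup_hash` performs the network request and is left for the reader to call.

```python
import json
import urllib.request

# Assumption: public base URL of the lookup service; confirm before relying on it.
ASSUMED_BASE_URL = "https://hashdb.openanalysis.net"


def build_lookup_url(algorithm, hash_value, base_url=ASSUMED_BASE_URL):
    """Build a lookup URL of the assumed form <base>/hash/<algorithm>/<hash>."""
    return f"{base_url}/hash/{algorithm}/{hash_value}"


def lookup_hash(algorithm, hash_value, timeout=10.0):
    """Fetch and decode the JSON reply for one hash (performs a network request)."""
    url = build_lookup_url(algorithm, hash_value)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Only the URL is built here; call lookup_hash() to actually query the service.
print(build_lookup_url("crc32", 3632233996))
```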
The HashDB lookup service has an IDA Pro plugin that can be used to automate hash lookups directly from IDA! The client can be downloaded from GitHub here.
HashDB relies on community support to keep our hash library current! Our goal is to have contributors spend no more than five minutes adding a new hash, from first commit to PR. To achieve this goal, we offer the following streamlined process.
- Make sure the hash algorithm doesn’t already exist. We know that seems silly, but just double check.
- Create a branch with a descriptive name.
- Add a new Python file to the /algorithms directory with the name of your hash algorithm. Try to use the official name of the algorithm, or if it is unique, use the name of the malware that it is unique to.
- Use the following template to set up your new hash algorithm. All fields are mandatory and case sensitive.

    #!/usr/bin/env python

    DESCRIPTION = "your hash description here"
    # Type can be either 'unsigned_int' (32bit) or 'unsigned_long' (64bit)
    TYPE = 'unsigned_int'
    # Test must match the exact hash of the string 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
    TEST_1 = hash_of_string_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789


    def hash(data):
        # your hash code here

- Double check your Python style; we use Flake8 on Python 3.9. You can try the following lint commands locally from the root of the git repository.

    pip install flake8
    flake8 ./algorithms --count --exit-zero --max-complexity=15 --max-line-length=127 --statistics --show-source

- Test your code locally using our test suite. Run the following commands locally from the root of the git repository. Note that you must run pytest as a module rather than directly or it won't pick up our test directory.

    pip install pytest
    python -m pytest

- Issue a pull request. Your new algorithm will be automatically queued for testing and, if successful, it will be merged.
That’s it! Not only will your new hash be available in the HashDB library but a new hash table will be generated for the HashDB Lookup Service and you can start reversing hashes immediately!
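To make the template from the steps above concrete, here is a filled-in sketch. It uses a deliberately trivial additive checksum, not an algorithm taken from any real malware or from the HashDB library, so that the `TEST_1` value can be checked by hand.

```python
#!/usr/bin/env python

DESCRIPTION = "Toy additive checksum: sum of all bytes, kept to 32 bits (example only)"
# Type can be either 'unsigned_int' (32bit) or 'unsigned_long' (64bit)
TYPE = 'unsigned_int'
# Hash of 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789':
# 2015 (A-Z) + 2847 (a-z) + 525 (0-9) = 5387
TEST_1 = 5387


def hash(data):
    # Add up the byte values and keep the result in the unsigned 32-bit range.
    total = 0
    for byte in data:
        total = (total + byte) & 0xFFFFFFFF
    return total
```

A real submission would replace the body of `hash` with the algorithm recovered from the sample and set `TEST_1` to that algorithm's output for the test string.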
PRs with changes outside of the /algorithms directory are not part of our automated CI and will be subjected to extra scrutiny.
All hashes must have a valid description in the DESCRIPTION field.
All hashes must have a type of either unsigned_int or unsigned_long in the TYPE field. HashDB currently only accepts unsigned 32-bit or 64-bit hashes.
All hashes must have the hash of the string ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 in the TEST_1 field.
All hashes must include a function hash(data) that accepts a byte string and returns a hash of the string.
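The rules above can be expressed as a small self-check to run before opening a PR. This is an illustrative sketch only, not the project's actual test suite: `check_algorithm`, `ALLOWED_TYPES`, and the `example` object are hypothetical names, and `types.SimpleNamespace` stands in for an imported algorithm module.

```python
import types

TEST_STRING = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
# Bit width implied by each accepted TYPE value.
ALLOWED_TYPES = {"unsigned_int": 32, "unsigned_long": 64}


def check_algorithm(module):
    """Return a list of rule violations for a module-like object (empty if none)."""
    problems = []
    description = getattr(module, "DESCRIPTION", None)
    if not isinstance(description, str) or not description.strip():
        problems.append("DESCRIPTION must be a non-empty string")
    hash_type = getattr(module, "TYPE", None)
    if hash_type not in ALLOWED_TYPES:
        problems.append("TYPE must be 'unsigned_int' or 'unsigned_long'")
    hash_func = getattr(module, "hash", None)
    if not callable(hash_func):
        problems.append("hash(data) function is missing")
    elif not hasattr(module, "TEST_1"):
        problems.append("TEST_1 is missing")
    else:
        value = hash_func(TEST_STRING)
        if value != module.TEST_1:
            problems.append("TEST_1 does not match the hash of the test string")
        elif hash_type in ALLOWED_TYPES and not 0 <= value < 2 ** ALLOWED_TYPES[hash_type]:
            problems.append("hash value does not fit the declared TYPE")
    return problems


# Stand-in for an imported algorithm module, using a toy byte-sum hash.
example = types.SimpleNamespace(
    DESCRIPTION="Toy byte-sum hash (example only)",
    TYPE="unsigned_int",
    TEST_1=sum(TEST_STRING) & 0xFFFFFFFF,
    hash=lambda data: sum(data) & 0xFFFFFFFF,
)
print(check_algorithm(example))
```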
Some hash algorithms hash the module name and API separately and combine the hashes to create a single module+API hash. An example of this is the standard Metasploit ROR13 hash. These algorithms will not work with the standard wordlist and require a custom wordlist that includes both the module name and API. To handle these we allow custom algorithms that will only return a valid hash for some words.
Adding a custom API hash requires the following additional components.
- The TEST_1 field must be set to 4294967294 (0xFFFFFFFE).
- The hash algorithm must return the value 4294967294 for all invalid hashes.
- An additional TEST_API_DATA_1 field must be added with an example word that is valid for the algorithm.
- An additional TEST_API_1 field must be added with the hash of the TEST_API_DATA_1 field.
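The additional components listed above might fit together as in the sketch below. It is a toy, not the Metasploit ROR13 algorithm: the hash simply packs two byte sums into one 32-bit value so the test values can be verified by hand. The `!` separator between module and API, the use of a byte string for `TEST_API_DATA_1`, and the `INVALID_HASH` helper constant are all assumptions made for this example; check an existing custom algorithm in the repository for the real conventions.

```python
#!/usr/bin/env python

DESCRIPTION = "Toy module+API hash: byte sums of module and API combined (example only)"
TYPE = 'unsigned_int'
# Value returned for any word that is not a valid module+API pair
INVALID_HASH = 4294967294
# The standard test string has no separator, so it hashes to the invalid value
TEST_1 = INVALID_HASH
# Assumption: custom words join the module name and API name with '!'
TEST_API_DATA_1 = b'a.dll!B'
# sum(b'a.dll') = 459 = 0x1CB and sum(b'B') = 66 = 0x42
TEST_API_1 = 0x01CB0042


def hash(data):
    parts = data.split(b'!')
    # Reject anything that is not exactly one module and one API name.
    if len(parts) != 2 or not parts[0] or not parts[1]:
        return INVALID_HASH
    module_hash = sum(parts[0]) & 0xFFFF
    api_hash = sum(parts[1]) & 0xFFFF
    # Module hash in the high 16 bits, API hash in the low 16 bits.
    return (module_hash << 16) | api_hash
```

Because the standard test string contains no separator, it is treated as invalid, which is why `TEST_1` holds the invalid value rather than a real hash.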
A big shout out to the FLARE team for their efforts with shellcode_hashes. Many years ago this project set the bar for quick and easy malware hash reversing and it’s still an extremely useful tool. So why duplicate it?
Frankly, it’s all about the wordlist and accessibility. We have seen a dramatic shift towards using hashes for all sorts of strings in malware now, and the old method of hashing all the Windows DLL exports just isn’t good enough. We wanted a solution that could continuously process millions of registry keys and values, filenames, and process names. And we wanted that data available via a REST API so that we could use it in our automation workflows, not just our static analysis tools. That being said, we wouldn’t exist without shellcode_hashes, so credit where credit is due 🙌