MySQL native function implementation of FNV hash (and others)

MySQL does not support HASH index for several storage engines. The common practice is to emulate them by creating an additional row and maintain it by triggers.
For reasons described below I’ve re-implemented FNV1 and FNV1A 64bit for MySQL – as native functions (performance of MySQL UDFs vs. native functions at kazuho’s blog). To round out the mix I’ve added MurmurHash2, 64A and 64B.
Download the MySQL native functions here.
Michael J. Radwin has written a MySQL UDF for the FNVs as did Baron Schwartz for FNV1 and MurmurHash64A – but unlike the first the latter gives other hashes than that what is regarded as the reference implementation by Landon Curt Noll. This might become an issue as soon as you begin creating these hashes in other places than the one DBMS – e.g., decentrally in the application, or if you want something compatible with implementations found in i.e., memcached. Utilize the first UDF or the native functions.
After you’ve compiled MySQL with them you can invoke the hash functions by FNV1_64, FNV1A_64 and MURMURHASH2 or MURMUR_HASH2 (naming to not interfere with Maatkit’s UDFs). They return 64bit / 4 byte integers which fit into UNSIGNED BIGINT(21) columns.
I’ve done some queries comparing time but was unable to measure any significant performance difference – due to near-practice but bad test design, though. So allow me omit the test system and give you the times for an first impression.
The aggregates SUM and MAX are only to skip MySQL deliver any large datasets which might tamper with query time. Seconds in braces are for the entire data of about 22M rows:
SELECT SQL_NO_CACHE SUM(LENGTH(resourceURI)) FROM `resources` WHERE resource BETWEEN 1 AND 1000000; -- yields 51'945'833 -- yields 1'221'834'418 w/o range -- 3.0055 sec (72.3431 sec) SELECT SQL_NO_CACHE MAX(FNV1A_64(resourceURI)) FROM `resources` WHERE resource BETWEEN 1 AND 1000000; -- 3.0033 sec (75.6182 sec) SELECT SQL_NO_CACHE MAX(FNV1_64(resourceURI)) FROM `resources` WHERE resource BETWEEN 1 AND 1000000; -- 3.0050 sec (74.6193 sec) SELECT SQL_NO_CACHE MAX(MURMURHASH2(resourceURI)) FROM `resources` WHERE resource BETWEEN 1 AND 1000000; -- 2.9256 sec (76.2182 sec) Maatkit UDF: 5.0234 sec














The benchmark results basically say that hashing a string is not expensive.