base36 unique IDs with Python
While implementing double opt-in and opt-out for mailing lists, I needed “magic tokens”, i.e., strings that are unique for every email address in our databases. A widely used approach is the MD5 hash of a formatted time string, such as the current date with microseconds. Alternatively, such a timestamp is used to seed a random number generator whose output is then hashed with MD5 or SHA1.
As the result is a long integer, it is classically displayed as hexadecimal and stored as such without any further conversion, which is IMHO a waste of space. Why not use all 36 letters and digits?
In Python you can generate a unique ID with:
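To put a rough number on that space saving, here is a minimal sketch that counts how many digits the largest value of a given bit width needs in a given base. The helper name `digits_needed` is made up for this illustration; 128 and 160 are the output sizes of MD5 and SHA1 in bits.

```python
def digits_needed(bits, base):
    """Count the digits of the largest `bits`-bit integer written in `base`."""
    largest = (1 << bits) - 1
    count = 0
    while largest > 0:
        largest //= base
        count += 1
    return count

# Columns: bit width, hexadecimal digits, base36 digits.
for bits in (128, 160):
    print(bits, digits_needed(bits, 16), digits_needed(bits, 36))
# 128 32 25
# 160 40 31
```

So, assuming the value is stored as text, base36 saves a bit more than a fifth of the characters.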
import uuid
uuid.uuid1()
# UUID('3208c170-743b-11dd-a60f-000e354e9618')
uuid.uuid4()
# UUID('cb7b64ca-068f-4590-9886-cf375d26f796')
Converting each hexadecimal (base16) part to an integer, which Python displays in decimal (base10), is simple:
int('ff', 16)
# 255
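The built-in `int()` accepts any base from 2 to 36, so the same call can also read base36 strings back into integers. A small sketch with arbitrary sample values:

```python
# int(text, base) parses text as a number written in the given base (2 to 36).
print(int('ff', 16))  # 255
print(int('73', 36))  # 255, since 7 * 36 + 3 == 255
print(int('zz', 36))  # 1295, since 35 * 36 + 35 == 1295
```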
Luckily, Aloysio Figueiredo and Kip Bryan have published a one-liner to convert from base10 to any other radix up to 36:
def baseN(num, b):
    return ((num == 0) and "0") or (baseN(num // b, b).lstrip("0") + "0123456789abcdefghijklmnopqrstuvwxyz"[num % b])
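The one-liner recurses once per output digit, which is fine for hash-sized numbers. If you prefer a loop, the following is a minimal iterative sketch of the same conversion. The name `to_base` and the error checks are additions made for this sketch, not part of the published one-liner.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base(num, b):
    """Return the non-negative integer `num` as a string in base `b` (2 to 36)."""
    if not 2 <= b <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if num < 0:
        raise ValueError("num must be non-negative")
    if num == 0:
        return "0"
    out = []
    while num > 0:
        num, remainder = divmod(num, b)
        out.append(DIGITS[remainder])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(out))

print(to_base(255, 16))  # ff
print(to_base(255, 36))  # 73
```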
Putting it together, I got base36 unique IDs for my tokens with this Python code:
import uuid

def baseN(num, b):
    return ((num == 0) and "0") or (baseN(num // b, b).lstrip("0") + "0123456789abcdefghijklmnopqrstuvwxyz"[num % b])

def uuid1_base36():
    # Convert each dash-separated hex group of a UUID1 to base36.
    return '-'.join(baseN(int(p, 16), 36) for p in str(uuid.uuid1()).split('-'))
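A possible variation, sketched here as an assumption rather than as part of the code above: instead of converting each dash-separated group, convert the UUID's whole 128-bit integer (available as its `int` attribute) and pad the result to 25 characters, which is the most a 128-bit value needs in base36. The function names `uuid4_base36` and `base36_to_uuid` are hypothetical, and `uuid4` is used only as an example.

```python
import uuid

def baseN(num, b):
    return ((num == 0) and "0") or (baseN(num // b, b).lstrip("0") + "0123456789abcdefghijklmnopqrstuvwxyz"[num % b])

def uuid4_base36():
    # Encode the UUID's 128-bit integer as a fixed-width, 25-character token.
    return baseN(uuid.uuid4().int, 36).zfill(25)

def base36_to_uuid(token):
    # int() parses base36 directly, and UUID(int=...) rebuilds the UUID.
    return uuid.UUID(int=int(token, 36))

token = uuid4_base36()
print(token, len(token))      # a random token, followed by 25
print(base36_to_uuid(token))  # the same UUID in the usual hex form
```

Such a token is 25 characters long, compared with 36 for the dashed hexadecimal form or 32 without dashes.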
Do you see any benefit from displaying and storing hashes in hexadecimal?

