A couple of weeks ago, while browsing the dark web, I came across an interesting password dump. It contained 45GB of data, or about 1.5 billion usernames and passwords in clear text. The dump was XOR ciphered with the key “00000000”, which made decrypting and decompressing the file child’s play. The directory structure was also different from what I’m used to seeing for large dumps like this. I learned it is called a Trie, and GeeksforGeeks has a really good write-up on how to implement search and insert functions for this kind of structure. If you’re too lazy to read about it, here’s a rundown:
The top-level folder is named for the first letter of the email address, and inside it is one file for each second letter. That file contains all of the email/password combinations that start with those two letters. This makes searching extremely quick, since a program can look at the first two characters of an address and know exactly which folder and file the string you’re looking for is in.
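To make the layout concrete, here is a minimal sketch of that two-character lookup. The function names, the lowercase folder and file names, and the `email:password` line format are all assumptions for illustration; the actual dump may differ on every one of them.

```python
from pathlib import Path
from typing import List


def shard_path(root: Path, email: str) -> Path:
    """Map an email to <root>/<first char>/<second char>.

    Assumes folder and file names are lowercase (hypothetical).
    """
    key = email.strip().lower()
    if len(key) < 2:
        raise ValueError("need at least two characters to pick a shard")
    return root / key[0] / key[1]


def find_entries(root: Path, email: str, sep: str = ":") -> List[str]:
    """Return the lines in the matching shard that begin with `email`.

    Assumes one `email<sep>password` record per line (hypothetical).
    Only one file is ever opened, which is where the speed comes from.
    """
    path = shard_path(root, email)
    if not path.is_file():
        return []
    prefix = email.strip().lower() + sep
    with path.open(encoding="utf-8", errors="replace") as handle:
        return [
            line.rstrip("\n")
            for line in handle
            if line.lower().startswith(prefix)
        ]
```

The trade-off is that each shard is still scanned line by line, so this is only two levels of a Trie rather than a full one; deeper nesting would shrink the files further at the cost of more directories.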
I started working on this during the week that Facebook leaked 87 million users’ information, and my initial thought was to scrape my friend list for email addresses and see how many of them had passwords associated with the breach. The whole “selling your data” thing irked some of my friends, but really, how did you not see that coming?
Unfortunately, Facebook restricted apps from accessing contact information after the initial reports of abuse by Cambridge Analytica. Instead, I pulled down my Gmail contact list and my frequently contacted email addresses. They all got dumped into a CSV file, and I wrote a search script to locate these emails in the data dump. I quickly ran into a problem: not all of my contacts were in the dump, so I was wasting a lot of time on emails that wouldn’t return any results. I wrote another script that checks a file of email addresses against the Have I Been Pwned (HIBP) database. It writes the emails confirmed to be involved in breaches to another text file, which gets passed to my Trie search function. There is even an option to show which breaches each address was involved in. The initial search revealed 182 email addresses in my contacts list that were involved in breaches.
Then I fired up my search function again, this time looping through the output of the HIBP search script to search for known breached emails.
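The filtering step can be sketched roughly as follows. This is not my actual script: the `email` column name and both function names are assumptions, and the breach check is passed in as a plain callable so the sketch stays offline and does not pretend to know the details of any real API.

```python
import csv
from typing import Callable, Iterable, List, TextIO


def read_contact_emails(handle: TextIO, column: str = "email") -> List[str]:
    """Read a contacts CSV and return unique, lowercased addresses in order.

    Assumes a header row with a column named `column` (hypothetical).
    """
    seen = set()
    emails = []
    for row in csv.DictReader(handle):
        email = (row.get(column) or "").strip().lower()
        if "@" in email and email not in seen:
            seen.add(email)
            emails.append(email)
    return emails


def keep_breached(
    emails: Iterable[str], is_breached: Callable[[str], bool]
) -> List[str]:
    """Keep only the addresses that the supplied checker flags.

    `is_breached` stands in for whatever breach lookup is used.
    """
    return [email for email in emails if is_breached(email)]
```

Running the cheap filter first means the slower search only ever sees addresses that are already known to appear in some breach.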
This gave me 83 passwords for breached accounts, including one of my own! A subset of that data is below, with some of the weaker passwords still visible.
- Avoid sharing your personal email if you can help it. If a company is giving you a service for free (Facebook, Gmail, Twitter, etc.), YOU are the product.
- Password reuse should be punishable by death. Use a unique password for each site, and get a password manager like LastPass or 1Password to help you stay secure.
- Use two-factor authentication EVERYWHERE. It’s slightly inconvenient, but would you rather take 5 seconds longer to log in or have your personal information downloaded by people like me?
- Check your email address on Have I Been Pwned to see if your information was leaked in a data breach. If it was, contact me and I can let you know which of your passwords were stolen.