One of the largest data breaches yet did not come from a company in the US, and GDPR doesn’t apply because the data originated outside the European Union. On December 28 last year, Bob Diachenko from Hacken.io found 202,730,434 private resumes of Chinese citizens in a database with no authentication protecting it.
This database ran on a MongoDB instance hosted on Amazon Web Services (AWS). The breach revealed everything you would put on a resume, plus some items you wouldn’t normally see there. It included mobile phone numbers, email addresses, marital status, number of children, political leanings, height, weight, driver's license details, literacy levels, and salary expectations.
The structure of the data in the database matched that of a web scraping script later found on GitHub, which was removed shortly after its discovery. Other scripts by the same user are written in Chinese, further indicating that a user in China targeted Chinese residents. The script appeared to target Chinese classifieds sites like bj.58.com. According to Diachenko, many of the resumes in the leaked database carry marks indicating they are private. bj.58.com denied any breach after performing its own internal investigation, so the actual source of the data is still unknown.
While investigating, Diachenko found that “a dozen” IP addresses had already accessed the database. Even after accounting for the original creator of the database, the two search engines Diachenko used to identify the database, and Diachenko himself, that still leaves around eight IP addresses unaccounted for. Any one of these could have downloaded the database, or at least a portion of it. While the data is no longer accessible at its original location, we may see it surface elsewhere in the future.
As the researcher published more details over the last week, the original sources were quickly taken down. The original database no longer responds, and we can no longer access the scraping script that updated it. This suggests the database creator is following the story closely.
This database was likely set up recently. Diachenko found the leak when two external search engines reported the database to him on the same day. These search engines send him reports at regular intervals, and they had not seen this database before. The database admin likely created the database recently or recently changed its authentication settings, leaving it open. We see Amazon S3 buckets, MongoDB instances, and other databases left open far too often. Database admins must lock down any database before it holds data; otherwise, someone may find it and steal the data within hours of it being left open.
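As a rough illustration of what "locking down" means for MongoDB, the sketch below checks an already-parsed `mongod.conf` for two settings: `security.authorization` and `net.bindIp` / `net.bindIpAll`. The function name `audit_mongod_config` and the sample configurations are hypothetical, made up for this example. We have no information about how the leaked database was actually configured, and a real review should also cover firewall rules and user roles.

```python
# Minimal sketch: flag risky settings in a parsed mongod.conf.
# audit_mongod_config and the sample dicts are hypothetical examples;
# they do not describe the configuration of the leaked database.

LOOPBACK = {"127.0.0.1", "localhost", "::1"}


def audit_mongod_config(config: dict) -> list:
    """Return a list of warnings for a mongod.conf loaded into a dict."""
    warnings = []
    security = config.get("security") or {}
    net = config.get("net") or {}

    # Without authorization enabled, any client that can reach the
    # port can read the data without credentials.
    if security.get("authorization") != "enabled":
        warnings.append("security.authorization is not 'enabled'")

    # bindIp is a comma-separated list; entries starting with "/" are
    # Unix socket paths and are not reachable over the network.
    # Assumption: a missing bindIp means the localhost default.
    bind_ip = str(net.get("bindIp", "localhost"))
    addresses = {
        a.strip() for a in bind_ip.split(",")
        if a.strip() and not a.strip().startswith("/")
    }
    if net.get("bindIpAll") is True or addresses - LOOPBACK:
        warnings.append("mongod listens on a non-loopback address")

    return warnings


if __name__ == "__main__":
    open_config = {"net": {"bindIp": "0.0.0.0", "port": 27017}}
    for warning in audit_mongod_config(open_config):
        print("WARNING:", warning)
```

Listening on a non-loopback address is not a problem by itself, since a database usually has to accept remote clients. The dangerous case is the combination of both warnings: a server reachable from the internet with no authentication.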
For more information on this see the original post here.