The unprotected 854.8-gigabyte database was stored in an instance of MongoDB, a high-performance, cross-platform, document-oriented NoSQL database, hosted by an American server hosting company.
In total, the database contained 202,730,434 records about job candidates from China, including personal information such as full name, date of birth, phone number, email address, marital status, and driver’s license information, along with their professional experience and job expectations.
Bob Diachenko, director of cyber risk research at Hacken.io and bug bounty platform HackenProof, discovered the database two weeks ago; it was secured shortly after he reported it on Twitter.
However, it is worth noting that “MongoDB log showed at least a dozen IPs who might have accessed the data before it was taken offline,” Diachenko said.
Though the source of the data is still unknown, Diachenko believes someone might have used an old resume scraping tool called “data-import” to collect all these job seekers’ resumes from different Chinese classified websites, like bj.58.com.
Diachenko believes so because the format of the leaked database exactly matches the way the scraping tool stores the information it collects.
Diachenko also communicated with the BJ.58.com team, who then told him that the leaked data did not originate from its website, but suggested that it could have been leaked from a third party that collects data from many CV websites.
“We have searched all over the database of us and investigated all the other storage, turned out that the sample data is not leaked from us,” BJ.58.com told Diachenko.
This isn’t the first time MongoDB instances have been found exposed to the Internet. In recent years, we have published several similar reports in which unprotected MongoDB servers exposed billions of records.
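
For administrators wondering whether their own servers are at risk, the short Python sketch below illustrates the two settings that most often decide whether a MongoDB instance is "unprotected." It is only an illustration: how the database in this report was actually configured is not known. The function name `audit_mongod_config` is hypothetical, and the input is assumed to be a `mongod.conf` file that has already been parsed into a dictionary. The option names `net.bindIp`, `net.bindIpAll`, and `security.authorization` are standard MongoDB configuration options.

```python
def audit_mongod_config(config: dict) -> list:
    """Return warnings for settings that can leave a MongoDB instance exposed.

    `config` is assumed to be a mongod.conf file already parsed into a dict
    (hypothetical helper, not part of MongoDB or any of its drivers).
    """
    warnings = []

    # net.bindIp lists the addresses mongod listens on; 0.0.0.0 (IPv4) or
    # :: (IPv6) means every interface. net.bindIpAll has the same effect.
    net = config.get("net", {})
    bind_ip = str(net.get("bindIp", "localhost"))
    addresses = [address.strip() for address in bind_ip.split(",")]
    if net.get("bindIpAll") or any(a in ("0.0.0.0", "::") for a in addresses):
        warnings.append("listens on all network interfaces")

    # security.authorization must be set to "enabled" for access control;
    # when it is absent, MongoDB treats it as disabled.
    security = config.get("security", {})
    if security.get("authorization", "disabled") != "enabled":
        warnings.append("access control (authorization) is not enabled")

    return warnings


# Example: a configuration reachable from any network, with no authentication.
exposed = {"net": {"bindIp": "0.0.0.0", "port": 27017}}
for warning in audit_mongod_config(exposed):
    print("WARNING:", warning)
```

A server that triggers both warnings and is reachable through its firewall can be read by anyone who finds its address, so a check like this is a starting point rather than a full security audit.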