The Problem: The Labor Market Data-Sharing Ecosystem is Broken


Organizations have made significant progress in their ability to manage data regarding employees’ education, skills, training, workplace performance, engagement, and motivational drivers. By applying analytics and AI to these types of data, companies hope to match employees to the right jobs more accurately and effectively, staff projects, and drive personal development, while personalizing people management practices, rewards, and recognition.


This vision, however, faces behavioral and cultural obstacles, along with a younger workforce that frequently changes jobs or opts into portfolio careers (multiple part-time or freelance jobs at once rather than one full-time job). As a result, the data footprint an individual generates while working in any one organization shrinks, possibly to the point where it is insufficient for making valuable algorithmic predictions and driving value through analytics and AI. We are approaching a reality where most of the relevant talent data is generated and held outside the organization, divided among various labor market data aggregators (e.g., job boards, LinkedIn, gig platforms), each maintaining and controlling pieces of the holistic employee picture.


To add to this complexity, GDPR sets a standard for data privacy legislation globally and states: “personal data should be processed on the basis of the consent of the data subject concerned or some other legitimate basis.” Obtaining consent is cumbersome in itself; moreover, the purpose of the data processing must be made clear, and consent must cover that purpose. A further obstacle is that individuals have the right to have their personal data erased from any platform, including backups and archives. It is therefore reasonable to expect that employees and alumni will increasingly ask employers to limit the processing of their records.



The past two decades have been marked by an emerging market for third-party job search and gig platforms. These entities cultivate some of the richest labor market information and individuals’ career records. Yet third parties can also impose friction on the reuse of this aggregated data, largely on account of intellectual property issues and the way the data is normalized and organized. Proprietary restrictions can make accessing such real-time data either difficult or prohibitively expensive. This, in turn, limits the data’s usefulness to others, while the firms that own it naturally seek to maximize profits from selling it.


Restrictions on the use of this data can stymie the development of new products, services, or analysis while serving as a competitive moat for existing platforms and aggregators; they also pose a barrier to innovation in HR. This barrier is compounded by the fact that third-party platforms have grown fairly concentrated, with a limited number of players aggregating a substantial portion of the relevant data. To illustrate this, to date: has processed over 100 million searchable resumes; ZipRecruiter over 430 million job applications; LinkedIn 500 million active profiles. And, Upwork and together have 40 million self-employed professionals using their platforms to manage their gig businesses.

There is also an issue of interoperability and standardization of this data. A central feature of the labor market is that workers and jobs vary greatly. This heterogeneity among skills, industries, and credentials makes it naturally more difficult to maintain a consistent level of data quality. Without a consistent format or structure, making sense of the data algorithmically becomes harder. This constrains interoperability and ultimately leads to siloed data.



It has been widely recognized that individuals’ resumes, LinkedIn profiles, and other self-reported career records cannot be taken as trustworthy sources of information. According to HireRight’s 2018 employment screening benchmark report, 84% of employers have identified lies or misrepresentations on applicants’ resumes, up dramatically from 2012, when 66% reported finding fabrications¹. The study found some comical fabrications, such as a high school principal who claimed two degrees from a university that had closed years before she supposedly graduated, and a government appointee who withdrew his nomination after allegations of resume padding; the nominee blamed the discrepancies on a tornado that hit his prior employer.


A study by Automatic Data Processing (ADP) of some 5.5 million background investigations showed discrepancies in 46% of educational, employment, and reference checks². This may explain why 73% of U.S. employers perform pre-hire employment checks and 51% verify education credentials and certifications, and why the employment background check market is expected to grow from USD 3.74 billion in 2016 to USD 5.46 billion by 2025³.


On top of this, regulations regarding candidate screening raise compliance concerns for HR professionals. In the U.S., for example, an employer is responsible for verifying the background and references of any job applicant before hiring, and can be held accountable for failing to do so. Any injured party can bring a claim against an employer on the theory that the employer should have known about the employee’s background, which, had it been known, would have indicated a dangerous or untrustworthy character.


When it comes to data privacy, we often refer to rules set by centralized platforms that determine who has permission to access data and who gets informed when that happens. Yet the real threat to online privacy lies in the fact that we freely give our information to multiple players, each of which stores it in a centralized database that becomes an easy target for hackers. The answer does not lie in giving users the ability to control how platforms use their information; it requires a fundamentally new technological approach that handles information in a different way.



Recently, people’s concerns about social media have intensified with the revelations of privacy violations by Cambridge Analytica and evidence of Russian-produced fake news undermining the electoral process in the U.S., UK, Italy, and Germany. There is also heightened worry about data-sharing between platforms and top device makers without users’ consent. A recent study surveying over 33,000 respondents in 28 countries shows declining user trust in social media and other tech providers with regard to privacy measures and data collection: 84% agree that the protection of privacy and personal information is one of the most important responsibilities of social media platforms, while only 40% (less than half) indicate that they actually trust social media platforms to behave responsibly with user data. In addition, 40% responded that they had deleted at least one social media account in the past year because they didn’t trust the platform to treat their personal information properly⁴.


Personal data is becoming one of the most valuable resources in society. Every day, internet users provide 2.5 quintillion bytes of data⁵ to companies for free. The centralized platforms that determine who has permission to access this data use this totally free resource to make billions in profits, support business decisions, sell to marketing companies, and draw people onto their websites, while the users become “economically disenfranchised”.


Given the above, it is clear that we need a new data-sharing architecture that ensures users control their own data. The idea of self-sovereign personal data is not new. Now, with the emergence of new technologies, all interested stakeholders are converging to make it happen.



[1] Employment Screening Benchmark Report, HireRight, 2018
[2] Annual Screening Index, ADP, 2009
[3] The Employment Screening Service Market to 2025, February 2018
[4] The 2019 Edelman Trust Barometer, Edelman, 2019
[5] How Much Data Do We Create Every Day?, Forbes, 2018