By Dr Lindsey Zuloaga, Chief Data Scientist at HireVue – the global leader in video interviewing, assessments, chatbot and recruiting automation technology
Machine learning is an exciting field that is contributing to a new era of advanced automation by constantly learning from new data and enabling systems to evolve and improve over time. However, in a year that has been anything but ‘normal’, Data Scientists face a significant issue: how can they counteract the impact of abnormal data caused by the pandemic?
The impact of COVID-19 on Data Science as a profession
The pandemic led to widespread cutbacks across many companies, stalling the development of talent in numerous industries. Data Scientists have been fortunate to work in a high-demand field and to be able to do most of their jobs remotely. That said, Data Science teams that had not proven their value or did not play a central role within their companies were at much higher risk of being cut.
The pandemic shook the industry in other ways as well, with COVID-19 proving hugely disruptive to Data Scientists’ models. Across industries and datasets, the patterns and behaviors that Data Scientists were tracking were very likely to be heavily distorted, presenting challenges to decision making. While some models may have been able to self-correct, many others were rendered useless. Unable to train models on historical data and expect them to perform as they did before the pandemic, Data Scientists had to develop new processes to keep deployed models responsive.
So, if the model is only as good as the data, what do we do when the data coming in does not reflect normal behaviors or situations? For many teams, the simple answer has been to shift away from automatic learning to a lot of manual intervention.
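One way to decide when manual intervention is warranted is to measure how far incoming data has moved from the data a model was trained on. The sketch below uses the Population Stability Index (PSI), a common drift measure, as an illustration only; it is not a description of any particular team's process. The function names, the ten equal-width bins, and the 0.25 cut-off (a widely quoted rule of thumb) are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float],
    recent: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a baseline sample and a recent sample of one numeric feature."""
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline has no spread; PSI is undefined")
    width = (hi - lo) / n_bins

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = max(0, min(n_bins - 1, int((v - lo) / width)))
            counts[idx] += 1
        total = len(values)
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / total, eps) for c in counts]

    base_shares = bin_shares(baseline)
    recent_shares = bin_shares(recent)
    return sum(
        (r - b) * math.log(r / b) for b, r in zip(base_shares, recent_shares)
    )


def needs_manual_review(psi: float, threshold: float = 0.25) -> bool:
    """Flag a feature for human review; the threshold is an assumed rule of thumb."""
    return psi >= threshold
```

A feature whose recent values match the baseline scores close to zero, while a large shift pushes the score well past the threshold. A flag here is only a prompt for a person to look at the data, not a verdict on the model.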
Naturally, in the hiring space, we saw a surge of hiring in certain areas and a decline in others. Some of our biggest customers, such as those in hospitality, put a complete pause on all hiring, while hiring at supermarkets and grocery stores, for example, surged. Trends show that a lot of people used this time to reevaluate their careers and change direction, and we saw major changes in how companies hire.
Our predictions for 2021 couldn’t rely entirely on Machine Learning tools. We had to look at the recovery trends and take into account a lot more contextual information that we were getting on the ground through customer relationships, economic trends, and so on. We learned to expect the opposite of what Machine Learning models might have told us: a recovery rather than a downward spiral.
What COVID-19 reminded us about bias and being good stewards of data
The staggering differences in how ethnic minorities were adversely affected by the pandemic couldn’t be ignored in public health responses, and they shouldn’t be ignored across industries as we move forward in our recovery. As Data Scientists, we always need to start with curiosity and ask systems-level questions about the trends, anomalies, and patterns we’re seeing. These questions help us solve prediction problems by showing us when to intervene, particularly when we see demographic differences in outcomes that could become baked into models and propagate bias.
The approach to handling biased data will vary depending on the application. As an example of bias mitigation, at HireVue we build models to assess job candidates by predicting job-related outcomes from video interviews and games. Following the Equal Employment Opportunity Commission (EEOC) guidelines, we strive to minimize demographic group differences in outcomes. Our models are therefore optimized both to predict the outcome accurately and to minimize group differences, which means that input data found to drive these differences is de-weighted or ignored. We carefully and rigorously audit and test for bias related to age, gender, or ethnicity before, during, and after the development of every assessment model. Models are regularly re-tested and re-trained to ensure bias does not creep in as the customer’s data and the requirements of the job evolve.
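To make the idea of auditing for group differences concrete, here is a minimal sketch of one widely used check: the "four-fifths" rule of thumb from the Uniform Guidelines on Employee Selection Procedures, under which a group whose selection rate falls below 80% of the highest group's rate is generally treated as showing evidence of adverse impact. This is an illustration chosen for this article, not a description of HireVue's actual audit procedure; the function names and the example data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def selection_rates(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of candidates in each group who passed the assessment."""
    totals: Counter = Counter()
    selected: Counter = Counter()
    for group, passed in records:
        totals[group] += 1
        if passed:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def adverse_impact_ratios(rates: Dict[str, float]) -> Dict[str, float]:
    """Each group's selection rate divided by the highest group's rate."""
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group has any selections")
    return {group: rate / best for group, rate in rates.items()}


def flagged_groups(ratios: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Groups whose ratio falls below the four-fifths threshold."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical outcomes: (group label, passed assessment?)
records = (
    [("A", True)] * 60 + [("A", False)] * 40
    + [("B", True)] * 40 + [("B", False)] * 60
)
rates = selection_rates(records)
ratios = adverse_impact_ratios(rates)
flagged = flagged_groups(ratios)
```

In this made-up data, group B passes at 40% against group A's 60%, a ratio of about 0.67, so B is flagged. A flag like this is a starting point for investigation rather than a conclusion, and small samples call for statistical testing alongside the ratio.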
Moving forward, my team is on the lookout for data anomalies stemming from the “shecession.” Women were 1.8 times more likely than men to have lost their jobs, and they have dramatically increased their unpaid caring roles for children and older relatives. Is this showing up in interview take-rates? Perhaps in the length of answers, because women were interviewing with kids in the next room? Once we understand the issues women are facing in the job market right now, we can determine whether we can counteract bias through math alone. The remedy may be quite different depending on what we learn. It may be that changes to the candidate experience, such as encouragement to continue or to re-record answers, serve as a substantial remedy.
Learning from the anomalies and moving forward
From a data perspective alone, the past year has been a fascinating and instructive reminder about the value of Machine Learning. This totally unprecedented situation has also shown that not every problem requires a black-and-white, mathematical solution. Data Science teams must stay informed about macro-level sociological trends and how they’ll show up in data sets, then determine whether automated learning, manual intervention, or a combination of both is appropriate.
As we slowly emerge from the pandemic and head into recovery, there might continue to be a chasm between what we see in the training data and the “real-world” data, but if, as a profession, we remain both vigilant and agile, we can ensure our models continue to function as intended.