By: Jon Farland
It’s been almost a decade since the Harvard Business Review proclaimed that “Data Scientist” is the sexiest job of the 21st century. Since then, there has been an explosion of job opportunities and university degree programs claiming to give students all of the skills they need to excel in the field of data science. Yet the scarcity of battle-hardened data science talent is as evident today as it was ten years ago.
This scarcity is certainly not for lack of interest: a quick scan of the “r/datascience” and “r/machinelearning” forums on Reddit reveals how many employees with any sort of technical background are keenly interested in ditching their own industries just to “get into data science”. In fact, these two subreddits have approximately 754k and 2.4M members, respectively.
Low pay is assuredly not to blame either: according to glassdoor.com, the median salary for a data scientist with between 0 and 1 year of experience in the San Francisco area is about $128k. This relatively high level of compensation for junior talent is not limited to areas of the US known for their focus on technology and innovation. The same estimate for a junior data scientist’s salary in Boise, Idaho is $126k, only about 1.5% less than in San Francisco. For comparison, a review of the US Bureau of Labor Statistics 2020 earnings data shows that even holders of a doctoral degree earn only about $98k a year, or about 22% less than an entry-level data scientist in Boise.
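For readers who want to check the comparisons above, here is a minimal sketch of the arithmetic in Python. It uses only the rounded figures quoted in this article; the helper name `pct_less` is made up for illustration. Because the inputs are rounded to the nearest $1k, the Boise versus San Francisco gap comes out slightly above the 1.5% quoted, which was presumably computed from unrounded estimates.

```python
def pct_less(value, reference):
    """Percent by which `value` falls short of `reference`."""
    return (reference - value) / reference * 100


# Rounded figures quoted in the article (USD per year).
sf_junior = 128_000      # junior data scientist, San Francisco area
boise_junior = 126_000   # junior data scientist, Boise, Idaho
doctoral = 98_000        # doctoral degree holders, BLS 2020 earnings data

boise_vs_sf = pct_less(boise_junior, sf_junior)
doctoral_vs_boise = pct_less(doctoral, boise_junior)

print(f"Boise vs. San Francisco: {boise_vs_sf:.2f}% less")
print(f"Doctoral earnings vs. Boise junior salary: {doctoral_vs_boise:.2f}% less")
```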
One reason for this apparent misalignment between supply and demand in the labor market might be the general miscommunication of, and confusion surrounding, the skills required to be an effective data scientist. Like a chef turning raw ingredients into a delicious meal, an effective data scientist applies advanced analytical techniques to potentially large and disparate sets of collected data in order to drive value for their organization. Typically, this value is clearly defined in a business case demonstrating a quantifiable return on investment (ROI). While the work is technical by its very nature, it nevertheless remains critically dependent upon clear communication to stakeholders and decision makers.
Of course, the job of a data scientist doesn’t stop at achieving high accuracy from their favorite model, or at minimizing the rates of false positives and false negatives; it remains mission-critical to demonstrate the ongoing value of that great analytical or predictive model over time and across new data.
Unfortunately, most data science candidates are led to believe that littering their resumes with high-accuracy projects or being an expert on every single modeling technique is what they need to be successful in data science. A model that shows great performance on data that doesn’t practically reflect the reality of the business is not useful and does not drive value for the organization. Additionally, can that model hold up under regulatory scrutiny? Are the predictions explainable even if the model is complex? Is it biased with respect to protected characteristics such as race, gender or religion? What happens if the data currently being captured begins to drift and reflect a completely different reality from the data the model was trained on? Unlike a research scientist, a data scientist’s job doesn’t stop when the experiment is done and the report is written. Great data scientists and their teams generate a living, breathing animal that is constantly providing transparent and consistent value to its organization.
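To make the drift question concrete, here is a minimal sketch of one common way to check whether a numeric feature has shifted since training: the Population Stability Index (PSI). This is an illustrative choice, not a method prescribed in this article; the function name, the bin count and the synthetic data are all assumptions, and the often-cited rule of thumb that a PSI above roughly 0.25 signals a significant shift is a convention rather than a hard rule.

```python
import math
import random


def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time sample and a current sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the training sample is constant

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the training range
            counts[idx] += 1
        # A small floor avoids log(0) when a bucket is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data for illustration only: same distribution vs. a shifted one.
random.seed(0)
training = [random.gauss(0.0, 1.0) for _ in range(5000)]
stable = [random.gauss(0.0, 1.0) for _ in range(5000)]
drifted = [random.gauss(1.5, 1.0) for _ in range(5000)]

psi_stable = population_stability_index(training, stable)
psi_drifted = population_stability_index(training, drifted)
print(f"PSI, same distribution:    {psi_stable:.3f}")
print(f"PSI, shifted distribution: {psi_drifted:.3f}")
```

A check like this would typically run on a schedule against fresh production data, so that a rising score prompts a closer look at the model before its business value quietly erodes.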
From the business’s perspective, creating and retaining an all-star data science team is also mission-critical. The largest cost to any business is almost invariably labor, and thus retaining the team that built that living, breathing, value-generating machine should be a priority. It’s no secret that retaining good data science talent takes at least two things: empowering the team with the tools they need to be successful and providing the coaching required for growth. H2O.ai has worked with thousands of data science teams across the globe and, in our experience, the tools needed to be successful have some clearly identifiable properties:
- Provide for experimentation, repeatability and documentation;
- Facilitate trust with transparency;
- Reflect the dynamic nature of the world we live in;
- Scale to meet the needs of the organization.
In a world where everyone is trying to be the smartest one in the room, emotional intelligence is often overlooked when hiring for leadership in data science. But it is exactly that quality, the ability to appreciate where a data scientist is in their career, understand where they want to go, and provide the coaching needed to get there, that keeps a team together. There will always be another company that will offer more money, but it’s rare to find a leader in data science who generates the trust and vision required to grow a team past the current quarter or fiscal year.