(This is the first of a three-part series on hiring data scientists.)

Big data is transforming every aspect of our culture and work environment. We are tracking everything, a trend that is creating a critical need for specialists who can mine and interpret data. McKinsey projects that by 2018, the U.S. will need as many as 180,000 data scientists, those with the technical savvy and analytic brain power to derive meaning from all that data.

But the quest for a data scientist starts with one question: Do you even need to add a data scientist to your team? Despite what the IT hype machine might tell you, not all organizations have data or problems of the size and complexity that require the unique skills of these data wizards. Data scientists usually hold doctoral degrees, possess a broad set of technical and quantitative skills, are in high demand, and command high salaries. Further, since companies are competing fiercely for the limited number of qualified candidates, data scientists are also difficult to retain. An experienced analyst with a B.S. or M.S. in a quantitative discipline might suit your needs in the short term.

Given this information, we believe there are three essential questions to consider before you begin the recruiting and hiring process for a data scientist:

1. What specific problems or issues warrant the services of a data scientist?

Hilary Mason, CEO and Founder of Fast Forward Labs, told Credit Suisse’s Financialist newsletter that if “you can make better decisions than you are based on available data,” then you might be ready for a data scientist. A company should compare the problems it faces with the problems that fall within the domain of data science. Data science is concerned with using data assets to influence strategic decision making and develop advanced capabilities. Some examples of problems a data scientist might address include:

  • Predicting future events or outcomes
  • Optimizing processes or resource use (prescriptive)
  • Developing a 360-degree view of consumers and their behavior
  • Fraud detection
  • Automated decision engines
  • Processing and interpreting language from social media posts, reviews, news, internet, publications, etc.

Solving these problems requires the ability to build complex models, drawing on both math and computer science expertise, to discover patterns and insights in huge sets of traditional “spreadsheet-like” data as well as free-form text, audio, video, and machine data.

2. How will your organization use the data?

A company will also want to think about how it intends to deploy the data scientist and use that person's output. Organizations don’t want to assign a data scientist to a six-month project targeted at a single narrow problem (e.g., reducing returned mail). Data scientists are better suited to problems that will lead to reusable results, a new product offering, or both. Mason calls this Level 2. Level 2 is “taking your business in a direction that was never possible without the data.” So instead of simply reducing returned mail rates, the data scientist might design matching algorithms that not only identify bad addresses but can also be applied to identifying fraudulent accounts and to matching consumers to their social media profiles and assets. It is easy to see how this internal benefit might be packaged into a data product and marketed outside the organization as well.

3. Do you have the infrastructure to support a data scientist?

Once you determine that you need a data scientist, another important question is: Are you ready for one? If your existing data infrastructure can’t support the type of analysis and experiments the data scientist will be performing, that person will either end up idle while you bring your infrastructure up to speed or, worse yet, get frustrated by the lack of tools and leave. In the same vein, your data itself has to be ready for a data scientist. While data scientists are more than capable of cleaning and munging data, this goes back to a “highest and best use” argument. Let your existing lower-level resources clean the raw data and improve its overall quality before passing it to the data scientist. That way the data scientist can go straight to the insight discovery phase.

Now that you have some guidelines regarding how to assess your need and readiness for a data scientist, in the next part of this three-part series, we will describe the qualifications of an ideal data scientist and how to go about finding one.