This is the final post in a three-part series on hiring data scientists. In the first post, we explored whether or not your business needs a data scientist. In the second post, we covered how to write the job description and where to post the job to maximize your chances of finding the best possible candidates. Today, we focus on the end game of the interview process and what you need to do once you’ve signed the deal with your new data scientist.


The resumes are pouring in, you have screened several candidates, and you’ve invited them for the onsite interview. Because the role of a data scientist requires such a broad set of skills and traits, how do you conduct an interview that allows candidates to highlight those skills? Most likely they are nervous about the interview, which inhibits their ability to show their underlying skills and traits. It is therefore necessary to work ‘behind the scenes’ and conduct the interview in a way that allows you, the interviewer, to identify those skills and traits, rather than leaving that burden on the candidate.

After all, it is the interviewer’s responsibility, not the interviewee’s, to find the best data scientist for the position.

For purposes of this post, let us assume the interview is a full day, while recognizing that this might not always be feasible.

A good approach is to divide the interview process into three parts:

  1. Questioning
  2. Testing
  3. Case Study

It is important to approach each part with a strategy intended to extract the unseen or unspoken information from every candidate. Doing this well helps you find the best candidate for the position and reduces uncertainty in the hiring process.


The questioning part follows the typical interview format. The strategic approach, however, is to ask questions that target the candidate’s:

  • Theoretical problem solving: A good approach (originally borrowed from large consulting firms and banks) is to have the candidate solve several problems on a whiteboard. The problems should focus on estimation, probability, quantitative reasoning, and creative thinking.
  • Basic business knowledge: Ask basic business questions to confirm that the candidate can communicate in the language of business and understands how businesses operate.
  • Subject matter expertise: Ask for specific definitions, or pose mini cases or scenarios that one might encounter in business (especially if the candidate’s previous work is similar to the position they are interviewing for). One example could be “What would you do to increase sales at your local pizza shop?”
  • Personality: Personality questions probe whether this person is a problem solver, informative, easy to teach, and self-motivated (the personality traits of a data scientist are covered in the second post). Some examples are “Describe how you have mastered something” and “Describe something you have built or created.”


The testing stage consists of administering tests on the applications the candidate will use heavily. Each test should emulate the working environment (i.e., the candidate sits in a room at a workstation), restrict access to the internet (have them leave their phone in a secure place), and be timed. This design works for testing most applications and tools and, time permitting, the interviewer can give multiple tests. Some examples of what to test include:

  • Database applications (e.g. SQL)
  • Primary statistics package
  • Data visualization tools
  • A programming language or two

Testing should involve a sample dataset and require the candidate to operate on that data, such as:

  • Writing SQL queries and producing output
  • Running common statistical analyses like regressions or decision trees and interpreting the output
  • Creating a sample dashboard with a variety of chart types
  • Writing a simple program to play a game like tic-tac-toe
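
As one illustration of such a test, here is a minimal sketch of a SQL exercise built on a tiny sample dataset, using Python’s built-in sqlite3 module. Everything specific in it is an assumption made for illustration: the `orders` table, its columns and values, the task wording, and the helper names `build_sample_db`, `run_query`, and `matches_reference` are all hypothetical, and your own test would use data and tools from your environment.

```python
import sqlite3

# Hypothetical sample dataset for a timed SQL exercise (all values invented).
SAMPLE_ORDERS = [
    (1, "north", 120.0),
    (2, "north", 80.0),
    (3, "south", 200.0),
    (4, "south", 50.0),
    (5, "west", 75.0),
]

# Example task given to the candidate (hypothetical):
# "Return total sales per region, largest total first."
REFERENCE_QUERY = """
    SELECT region, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY total DESC
"""


def build_sample_db():
    """Create an in-memory database holding the sample dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", SAMPLE_ORDERS)
    conn.commit()
    return conn


def run_query(conn, sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


def matches_reference(conn, candidate_sql):
    """Check whether the candidate's output equals the reference output."""
    return run_query(conn, candidate_sql) == run_query(conn, REFERENCE_QUERY)


if __name__ == "__main__":
    conn = build_sample_db()
    for region, total in run_query(conn, REFERENCE_QUERY):
        print(region, total)
```

An automated comparison like this only checks the output. You would still want to read the candidate’s query and ask them to interpret the result, since the goal of the test is to see how they work with the data.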


The case study presents the candidate with an actual problem you are working on or have solved in the past. It could also come from outside your organization, but an internal problem is ideal. Give the candidate just enough time and resources to generate their own solution. The goal is to observe how well they gather the information needed, analyze the data, and use both internal and external resources to solve the problem. This is also an opportunity to confirm that they can use the tools at the level indicated in the testing phase. Accuracy, functionality, and creativity are good measures of their result. In other words: did they solve the problem, is their solution complete and does it work, and how innovative is it?

One or more candidates will survive the process, but be prepared for the likelihood that no single candidate will ace every portion.


What comes next? The typical negotiation and acceptance of an offer will occur, and then you should immediately set your sights on retention. You have invested significant time and resources in bringing this individual on board, and the last thing you want is a flight risk. What drives data scientists is the challenge of solving problems, not the glory that follows. Therefore, to successfully retain your new, valuable resource, be sure to:

  • Continually present them with a stream of difficult and unique problems
  • Assist in their development
  • Avoid assigning maintenance tasks or incremental improvements
  • Do not ignore salary, benefits, and satisfaction

Recall the personality traits described in the second post of this series. Those traits give data scientists a natural passion for solving problems, which in turn drives them to seek new knowledge and skills that help them solve problems better. The ultimate key to retention, therefore, is keeping them challenged and actively involved in solving worthwhile problems.

Hopefully, this series of posts has provided a good roadmap to finding, hiring, and retaining your first and subsequent data scientists. Happy hunting!