The critical components we need from candidates include:
# Fully owning the end-to-end modeling lifecycle (data ingestion, feature engineering/selection, model training/testing, model inference, scoring)
- That means experience in all phases of the lifecycle, not just one or two of them. For Associates, that may not be as doable, and that’s OK.
- The more senior the candidate, the more experience they should have (different domains, various problem statements, different types of models, etc.)
- The years of experience requested in the JD refer to actual data science experience, not total work experience; 4 to 15 years of work experience.
# Look for project examples beyond NLP or classification models. A candidate who has only ever built classification models is unlikely to be a fit.
- NLP and Keras are fine, but it is better if their projects also include regression, neural nets, ensembles, random forest, XGBoost, SVM, etc.
- Building chatbots is of little interest or need, so if that’s all they highlight in their projects, don’t bother passing them along
- I’ve seen a lot of resumes that feature data extraction from documents – again, nice experience but not critical
# Experience with AWS is highly valuable (we’ve recently decided to stay with AWS and not move to GCP), and Python is a must
- Building and deploying in SageMaker is a plus
# Proven experience using documentation and code review systems – ideally GitHub – for pull requests (PRs) is a huge plus
- My team places particular importance on data scientists being able to write organized, clean code, because so much of what we do needs to be efficient and scalable.