An ordinary Machine Learning Team

Photo by Ian Schneider on Unsplash

A machine learning project starts with a lot of questions: What is the project about? Which problem will it solve? Which pain will it ease? How should we plan it? What are the expected results? Of all of them, the most vital question is how to build a robust team with suitable members. A dream team will not only answer these questions but also make the transition from ideas to a production-ready solution butter-smooth.

Key Roles

  • Data Analysts
  • Data Engineers
  • Data Scientists
  • Research Scientists
  • ML Engineers
  • Developers

There are also other supporting roles:

  • QA (Quality Assurance)
  • TA (Test Automation)
  • Annotators

Data Analyst

Data analysts are the magnifying glass that finds the golden needle in the haystack. Their duties are:

  • Get insights from user data with:
      • Descriptive statistics: summarize what the data looks like. Examples include customer demographics, landing page conversion rates, and loyalty and retention rates.
      • Inferential statistics: deduce the characteristics of users as a population. Examples include user trends and statistical hypothesis testing (e.g. A/B testing).
  • Cooperate with the product-business team and create the product roadmap from these insights
  • Define the model evaluation procedure and acceptance criteria
  • Analyze feedback data from the deployed model
  • Tools: Excel, SQL, Tableau, Power BI …
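To make the A/B testing mentioned above more concrete, here is a minimal sketch of a two-proportion z-test using only the Python standard library. The visitor and conversion counts are made up for illustration, and the function name `two_proportion_z_test` is my own; a real analysis would also plan sample size and check the test's assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 120 of 2400 visitors converted on page A, 156 of 2400 on page B.
z, p_value = two_proportion_z_test(120, 2400, 156, 2400)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the two landing pages is unlikely to be pure chance; whether it is large enough to matter for the product is a separate business question.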

Data Engineer

  • Build and maintain the infrastructure used to collect, transform, and store data; an example is the ETL (Extract, Transform, Load) process
  • Develop annotation tools that help collect labeled data
  • Manage and orchestrate the pipelines that ingest data and move it across different storage systems
  • As the volume of data increases, they need skills in distributed computing and storage (a.k.a. big data)
  • Tools: Data storage, Message brokers, Pipeline management tools, Data warehouse …
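As a toy illustration of the ETL process named above, the sketch below extracts rows from CSV text, transforms them, and loads them into an in-memory SQLite table. The column names, the sample data, and the `purchases` table are assumptions made for this example; production pipelines typically rely on dedicated orchestration and storage tools.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from logs, an API, or a file.
RAW_CSV = """user_id,country,amount
1,vn,10.5
2,FR,
3,us,7.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount and normalize types and casing."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append((int(row["user_id"]), row["country"].upper(), float(row["amount"])))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases (user_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
stored = conn.execute(
    "SELECT user_id, country, amount FROM purchases ORDER BY user_id"
).fetchall()
print(stored)
```

Keeping the three steps as separate functions makes each one easy to test and to swap out when the data source or the storage changes.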

Data Scientist

  • Analyze, process, and interpret data
  • Find features and insights in data with statistical methods (feature engineering)
  • Communicate findings to the business and product teams and other stakeholders
  • Build machine learning models that serve as prototypes or are deployed in production
  • Tools: statistics, databases, machine learning, machine learning frameworks …
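Feature engineering can be as simple as aggregating raw events into per-user numbers that a model can consume. The sketch below is one possible illustration; the event log and the feature names (`n_purchases`, `mean_amount`, `weekend_share`) are invented for the example.

```python
from datetime import datetime
from statistics import mean

# Hypothetical purchase log: (user_id, ISO timestamp, amount).
events = [
    ("u1", "2023-01-07T09:30:00", 20.0),
    ("u1", "2023-01-09T18:00:00", 40.0),
    ("u2", "2023-01-08T12:15:00", 15.0),
]


def build_features(rows):
    """Aggregate raw events into one feature dict per user."""
    per_user = {}
    for user_id, ts, amount in rows:
        per_user.setdefault(user_id, []).append((datetime.fromisoformat(ts), amount))

    features = {}
    for user_id, items in per_user.items():
        amounts = [amount for _, amount in items]
        # weekday() returns 5 for Saturday and 6 for Sunday.
        weekend = [when.weekday() >= 5 for when, _ in items]
        features[user_id] = {
            "n_purchases": len(items),
            "mean_amount": mean(amounts),
            "weekend_share": sum(weekend) / len(weekend),
        }
    return features


features = build_features(events)
print(features)
```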


Research Scientist

This role involves developing new algorithms for product-related fields. This can lead to breakthroughs and a competitive edge over other companies.

ML Engineer

  • Build and maintain tools and infrastructure to deploy, serve, monitor, and update models
  • Develop prediction interfaces (client-side or cloud service endpoints)
  • Handle scalability with containers and orchestration platforms like Kubernetes
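A prediction interface is essentially a thin layer that validates a request, calls the model, and returns a structured response. The sketch below shows that layer as a plain function, independent of any web framework. `ThresholdModel`, `handle_predict`, and the JSON shape are all assumptions for illustration, not a specific serving API.

```python
import json


class ThresholdModel:
    """Stand-in for a trained model; a real one would be loaded from storage."""

    def predict(self, features):
        return 1 if sum(features) > 1.0 else 0


def handle_predict(request_body, model):
    """Turn a JSON request body into a (status_code, JSON response body) pair."""
    try:
        payload = json.loads(request_body)
        features = [float(x) for x in payload["features"]]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing key, or non-numeric features.
        return 400, json.dumps({"error": 'expected JSON like {"features": [numbers]}'})
    return 200, json.dumps({"prediction": model.predict(features)})


model = ThresholdModel()
print(handle_predict('{"features": [0.7, 0.6]}', model))
print(handle_predict("not json", model))
```

The same function could be wired into a cloud endpoint or a client-side wrapper; keeping it free of framework code makes it easy to test.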


Developer

  • Integrate the machine learning product with the main application
  • Abstract the machine learning predictions behind user-friendly features

The Jack-of-all-trades Full Stack Data Scientist

Finding staff for all these specific roles is sometimes challenging and not cost-effective in a small company. In that case, one or a few data scientists have to handle several roles at the same time: they are Full Stack Data Scientists, and they need a wider range of knowledge and skills.


QA (Quality Assurance)

  • Evaluate the deployed model against the acceptance criteria
  • Perform regression tests to ensure the model matches real use cases
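One way to picture these two duties is a small release check: measure the candidate model against the agreed acceptance criteria, and flag every case that the currently deployed model handled correctly but the candidate gets wrong. The thresholds, labels, and the name `evaluate_release` below are hypothetical, and this is only one possible reading of "regression test" for a model.

```python
# Hypothetical acceptance criteria agreed on before deployment.
ACCEPTANCE = {"min_accuracy": 0.90, "max_regressions": 0}


def evaluate_release(y_true, old_pred, new_pred, criteria=ACCEPTANCE):
    """Compare a candidate model with the deployed one on a fixed test set."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, new_pred)) / n
    # A regression is a case the old model got right and the new one gets wrong.
    regressions = [
        i
        for i, (t, o, p) in enumerate(zip(y_true, old_pred, new_pred))
        if o == t and p != t
    ]
    accepted = (
        accuracy >= criteria["min_accuracy"]
        and len(regressions) <= criteria["max_regressions"]
    )
    return {"accuracy": accuracy, "regressions": regressions, "accepted": accepted}


y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
old_pred = [1, 0, 0, 1, 0, 1, 0, 1, 1, 1]
new_pred = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
report = evaluate_release(y_true, old_pred, new_pred)
print(report)
```

In this made-up data the candidate reaches the accuracy threshold but breaks one case that used to work, so it is rejected; the list of regressed indices tells QA exactly which use cases to inspect.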


TA (Test Automation)

  • Work closely with the ML Engineers to define automated testing scenarios
  • Schedule and run performance tests on the model deployed on servers


Annotators

They collect labeled data according to the data requirements, using third-party tools or tools built by the in-house data engineers. They can be:

  • Qualified in-house annotators
  • Contract annotators
  • Outsourced annotation services

Data quality and quantity vary depending on the group of annotators. In-house annotators produce the best-quality data, but often not enough of it for the application at hand. Outsourced labeling can deliver a large amount of data but needs quality control.
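A common way to do this kind of quality control is to have two annotators label the same sample and measure how much they agree beyond chance. The sketch below computes Cohen's kappa with the standard library; the labels are made up, and using kappa here is my suggestion rather than a procedure described in this article.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for the same eight items from two annotator groups.
in_house = ["cat", "dog", "dog", "cat", "dog", "cat", "cat", "dog"]
outsourced = ["cat", "dog", "cat", "cat", "dog", "cat", "dog", "dog"]
kappa = cohen_kappa(in_house, outsourced)
print(f"kappa = {kappa:.2f}")
```

A kappa close to 1 means the two groups label almost identically, while a value near 0 means their agreement is about what chance alone would give, which is a signal to review the labeling guidelines.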


Having all these key team members will hopefully help us achieve the shiny production-ready solution that attracts eager clients. The rest is finger-crossing and the efforts of the other teams in your company.




I'm a data scientist and a tinkerer, I love to do practical projects but also care about the underlying theories

Nguyen Hoang Nguyen
