"If it cannot be measured, it cannot be managed"
Peter Drucker
One of the primary reasons data science projects fail is the lack of collaboration between data scientists, IT, and operations teams. This creates a bottleneck when moving a model hypothesis from experimentation to production quickly, or when iterating through many versions of a model in production. To deal with this situation, the industry borrowed the idea of DevOps from traditional software projects and repurposed it for data science and machine learning projects under the name MLOps.
Although MLOps has helped to streamline the process to a great extent, it is not enough to ensure the success of a data science project unless the project is adequately managed. Compared to managers of software projects, data science project managers are still working without standard project management KPIs, and before they realize it, their plans are spiralling out of control towards failure.
Let us first understand the current issues and challenges in data science project management.
Challenges of Data Science Project Management
Data Science Project KPIs
Indeed, we cannot simply reuse all the KPIs of software projects, but we can tailor them for data science projects, much as MLOps was spun off from DevOps. Here we list some useful KPIs that you can leverage for your own data science projects.
1. Clear Goal and Vision
A well-articulated goal is essential for measuring the success of the overall project. The goal should not be ambiguous; instead, it should be quantifiable and measurable so that you can track the project’s progress against it.
An example of an ambiguous goal from the business is – “We want to prevent our customers from leaving our service”. You cannot validate your project outcome against this goal.
Rather, a quantifiable goal looks like this – “We want to reduce customer churn rate by 10% in the next financial year”. This has a measurable target and a specific timeline against which you can drive your data science project.
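To make the quantifiable goal concrete, here is a minimal Python sketch of how such a target could be checked. The function names and the figures are illustrative assumptions, not part of any standard. It also assumes that "by 10%" means a relative reduction against the baseline churn rate rather than 10 percentage points; agree on that reading with the business before tracking it.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of customers lost during a period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


def churn_goal_met(baseline_rate: float, current_rate: float,
                   target_reduction: float = 0.10) -> bool:
    """True if churn fell by at least target_reduction relative to the baseline.

    Assumption: the goal is a relative reduction (10% of the baseline rate),
    not a drop of 10 percentage points.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    relative_reduction = (baseline_rate - current_rate) / baseline_rate
    return relative_reduction >= target_reduction


# Made-up numbers for illustration only.
last_year = churn_rate(customers_at_start=10_000, customers_lost=2_000)
this_year = churn_rate(customers_at_start=10_000, customers_lost=1_700)
print(churn_goal_met(last_year, this_year))
```

The ambiguous goal from the previous paragraph cannot be turned into a check like this, which is exactly why it is hard to validate a project against it.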
2. User Stories Delivered
We can also define clear goals and timelines for the smaller phases of the data science project. To do this, you can adopt the agile methodology: create user stories for your project, assign them to team members, and schedule them in sprints of 2–3 weeks. Below are some possible examples of well-defined stories for a typical data science project.
● Story 1 — “Collect data from the MongoDB and Oracle database and create a combined dataset.”
● Story 2 — “Perform detailed exploratory data analysis and prepare a report.”
● Story 3 — “Create a hypothesis model with a base accuracy of at least 60%.”
● Story 4 — “Prepare a business report of our insights with visualizations.”
Notice that each of these stories has a well-defined, tangible deliverable, such as “dataset”, “report”, “model”, or “business report”.
By tracking the user stories, you can arrive at the following two metrics:
- The outcome of each story at the end of the sprint is determined by whether its deliverable was completed successfully, which is useful for gauging the productivity of an individual. If an individual fails to deliver stories across too many sprints, then you should take appropriate action.
- The success of each sprint can be determined by how many stories were closed successfully. This helps you understand the overall productivity of the team and the health of the whole project. If you see back-to-back sprints with only a few successful stories, it is a sign that things are not under control, and you need to evaluate the gap.
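The two metrics above can be derived from the same story log. The following is a rough sketch, assuming each story is recorded with its sprint number, owner, and whether it was delivered; the `Story` class, the `completion_rate_by` helper, and the sample names are hypothetical and would normally be replaced by an export from your issue tracker.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Story:
    sprint: int       # sprint number the story was planned in
    owner: str        # team member the story was assigned to
    delivered: bool   # True if the deliverable was completed in that sprint


def completion_rate_by(stories, key):
    """Share of delivered stories, grouped by a Story attribute
    ("owner" for the individual metric, "sprint" for the team metric)."""
    totals = defaultdict(int)
    done = defaultdict(int)
    for story in stories:
        group = getattr(story, key)
        totals[group] += 1
        done[group] += int(story.delivered)
    return {group: done[group] / totals[group] for group in totals}


# Hypothetical story log for two sprints.
log = [
    Story(sprint=1, owner="asha", delivered=True),
    Story(sprint=1, owner="ben", delivered=False),
    Story(sprint=2, owner="asha", delivered=True),
    Story(sprint=2, owner="ben", delivered=True),
]
print(completion_rate_by(log, "owner"))
print(completion_rate_by(log, "sprint"))
```

A completion rate is only a starting point for a conversation: a low number may reflect poorly sized stories or blocked dependencies rather than individual performance.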
3. Reusable Artefacts
Reusability is always desired in software projects. Certain artefacts can be created with reusability in mind, which not only improves productivity for the current project but can also benefit other projects. Similarly, if you can leverage existing reusable artefacts in your project, you can save a lot of time.
In a data science project, you can create reusable artefacts like data scraping or collection tools, frameworks, ML models, etc. To give a perspective, TensorFlow and PyTorch were created by Google and Facebook respectively because they wanted frameworks that could be reused internally across projects, and these were later open-sourced for reuse by the broader community. The number of such artefacts you produce or reuse is a useful metric to indicate the productivity yielded by the project.
4. No. of Production Deployments
No matter how many experiments and POCs you run while creating a machine learning model, the effort cannot be justified unless you are able to deploy the model in production. And once a model is deployed, it rarely performs perfectly; hence multiple iterations and enhancements of the model are needed in production.
Some projects deploy after each sprint or on a predefined cycle, but the idea is to deploy smaller changes to production quite often. If the number of production deployments over a period of time is low, it indicates that you are taking too long to deliver an idea into production. It is then time to identify the bottleneck in the end-to-end process or in the MLOps pipeline.
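As an illustration, deployment frequency can be computed from nothing more than a list of deployment dates. The sketch below is one possible way to do it; the function names and the threshold of two deployments per month are assumptions chosen for the example, not a recommended benchmark.

```python
from collections import Counter
from datetime import date


def deployments_per_month(deploy_dates):
    """Count production deployments per (year, month)."""
    return Counter((d.year, d.month) for d in deploy_dates)


def slow_months(deploy_dates, min_per_month=2):
    """Months with fewer deployments than min_per_month.

    Limitation: a month with no deployments at all does not appear in the
    input, so it is not reported here and must be checked separately.
    """
    counts = deployments_per_month(deploy_dates)
    return sorted(month for month, n in counts.items() if n < min_per_month)


# Hypothetical deployment history.
history = [date(2024, 1, 5), date(2024, 1, 19), date(2024, 2, 9)]
print(deployments_per_month(history))
print(slow_months(history))
```

Months flagged this way are the ones where it is worth looking for a bottleneck in the end-to-end process or the MLOps pipeline.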
5. Actionable Insights Delivered
The key outputs of a data science project to the business are actionable insights from its advanced analytics or machine learning models. These actionable insights are usually different kinds of business optimization suggestions to improve processes like operations, sales, inventory, etc.
An efficient data science project should produce many actionable insights over a period of time. This can be tracked on a monthly or quarterly basis and is a crucial KPI to highlight how much business value your project is providing. If few insights are produced over a period, then you should check the other KPIs listed above to identify the issue.
6. Return on Investment (ROI)
When a company invests in a data science project, it ultimately boils down to whether the project can help the company maximize revenue or minimize loss. This is the pinnacle of success for a data science project. How much your project gives back to the company on top of its investment is known as ROI, and it is the ultimate KPI for you to keep an eye on.
If, even after many months or years, the data science project is not moving towards break-even for the organization, then it is worth reassessing the project from top to bottom. On the other hand, if you deliver a significant ROI to the company, then congratulations: your data science project is a success.
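For reference, ROI is commonly computed as the net gain divided by the cost. Here is a small sketch under that assumption; the function names and the figures are illustrative, and in practice the hard part is agreeing on how to attribute benefit to the project.

```python
def roi(total_benefit: float, total_cost: float) -> float:
    """Return on investment as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


def has_broken_even(total_benefit: float, total_cost: float) -> bool:
    """The project has broken even once ROI is zero or positive."""
    return roi(total_benefit, total_cost) >= 0


# Made-up figures: 100,000 invested, 150,000 of attributable benefit.
print(f"ROI: {roi(150_000, 100_000):.0%}")
print(has_broken_even(150_000, 100_000))
```

Tracking this number at regular intervals shows whether the project is trending towards break-even or drifting away from it.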
Conclusion
Currently, there are no standard, proven KPIs for data science project management. But with the passing of time and the lessons learned from project failures, this industry will eventually produce a robust project management framework, just as the software industry matured over the years. Meanwhile, in this article, we borrowed KPI ideas from software projects and showed how you could leverage them for your data science project.