1. Poorly labeled dataset
I assume you have data; you would not even start otherwise. You have probably also taken care to make your dataset reasonably balanced. But have you checked the labels? Seriously, badly labeled datasets are common. Even popular publicly available visual datasets like ImageNet or COCO contain many poorly annotated images.
If you have your own dataset, are you sure the people who labeled it followed your guidelines precisely? Do all the annotators apply the same best practices? Do not treat labeling mistakes as "a way to make my AI more robust." Instead, check your labels and make sure they are correct. You can save a lot of time and money that way.
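One lightweight way to check label quality is to have two annotators label the same subset and measure their agreement, for example with Cohen's kappa. Here is a minimal plain-Python sketch; the annotator lists and class names are made up for illustration:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Inter-annotator agreement: 1.0 = perfect, ~0.0 = chance level."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by pure chance, given each annotator's label frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two (hypothetical) annotators labeling the same 8 images
ann_1 = ["cat", "cat", "dog", "cat", "dog", "dog", "cat", "dog"]
ann_2 = ["cat", "cat", "dog", "dog", "dog", "dog", "cat", "cat"]
print(cohens_kappa(ann_1, ann_2))  # → 0.5
```

A kappa well below ~0.8 on a labeling task that should be unambiguous is a strong hint that your guidelines are unclear or that some annotators are not following them.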
2. Too broad an approach to the task you want to solve
It may be tempting to create an ultimate, universal AI solution that handles a given task regardless of the data characteristics and working conditions. You may think, "I will train that model once, and then I will use it in my 10 similar tasks," or "I can use data from a single customer to make a product that fits everyone." Well, I wish you luck. In practice, such a solution will probably be far less robust, and more vulnerable to small changes in the data distribution, than you expect. It is better to build a solution that works well in a single scenario than to chase a one-for-all AI model.
3. Unexpected data drift
I hope all of your AI solutions and initiatives succeed and live long enough to be exposed to data drift. But it is best to assume it will happen and be prepared. It will occur sooner or later, because many factors affect the data distribution and change it over time. So monitor the data distribution in the model's working environment; prepare a procedure for gathering new data, labeling it, and retraining the model; and account for it in your AI project plan and budget.
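A simple, framework-free way to monitor drift is to compare the binned feature distribution seen in production against the one seen at training time, for example with the Population Stability Index (PSI). The bins and thresholds below are illustrative assumptions, not universal constants:

```python
import math

def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population Stability Index between two binned distributions.
    Common rule of thumb (assumed): < 0.1 stable, 0.1-0.25 moderate
    drift, > 0.25 severe drift."""
    total = 0.0
    for e, a in zip(expected_fracs, actual_fracs):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total

# Hypothetical binned feature distribution: training time vs. production
train_bins = [0.25, 0.50, 0.25]
prod_bins  = [0.10, 0.40, 0.50]
print(psi(train_bins, prod_bins))  # ~0.33: severe drift, time to retrain
```

Running such a check on a schedule against each important input feature gives you an early warning long before the model's accuracy visibly degrades.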
4. Too narrow an approach to data and a biased dataset
It's easy to forget about corner cases or rare circumstances when you create your AI based on data collected in controlled conditions. For example, do you use video, image, or audio machine learning algorithms to analyze people? If so, keep in mind that people are different: they look different, speak differently, and behave differently. Make your AI inclusive.
And remember that the weather is not always warm and sunny. Obstacles can obscure the view, and noise can drown out speech. Furthermore, the input data quality in real-life scenarios may be far from the clean samples you gathered in your lab to build the PoC. Be aware of that.
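One cheap sanity check along these lines is to audit how well each capture or recording condition is represented in your dataset. The condition tags and the 5% threshold below are made-up assumptions for illustration:

```python
from collections import Counter

def coverage_report(sample_conditions, min_fraction=0.05):
    """Return conditions whose share of the dataset falls below an
    (assumed) minimum fraction, so they can be flagged for more data."""
    counts = Counter(sample_conditions)
    total = sum(counts.values())
    return {cond: round(n / total, 3)
            for cond, n in counts.items() if n / total < min_fraction}

# Hypothetical recording-condition tags for an audio dataset
conditions = ["quiet"] * 900 + ["street_noise"] * 80 + ["rain"] * 20
print(coverage_report(conditions))  # → {'rain': 0.02}
```

The same idea applies to demographic attributes, lighting conditions, camera types, and any other axis along which your real users will vary.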
5. Risk of overfitting
I know it's a cliché, but it is so common that it must be on this list. It is not unusual for project stakeholders to push to ignore the risks and best practices in order to get "good-looking" solutions that can be shown or sold. Sometimes it's hard to explain to them what overfitting is. And let's be honest: machine learning researchers and engineers could solve almost every problem on earth if only their training set were their validation set. Deliberately overfitting is a useful way to check that your AI pipeline and training procedure are implemented correctly. In any other scenario, do not let it sneak into your solution.
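A cheap way to see why "training set as validation set" lies to you: a model that memorizes its data, such as 1-nearest-neighbour, scores perfectly on its training set even when the labels are pure noise. The toy data below is made up for illustration:

```python
import random

def knn_predict(train_x, train_y, x):
    """1-nearest-neighbour: a model that simply memorizes its training set."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

random.seed(0)
# Random features with purely random labels: there is nothing real to learn
xs = [random.random() for _ in range(200)]
ys = [random.randint(0, 1) for _ in range(200)]
train_x, val_x = xs[:100], xs[100:]
train_y, val_y = ys[:100], ys[100:]

train_acc = sum(knn_predict(train_x, train_y, x) == y
                for x, y in zip(train_x, train_y)) / len(train_x)
val_acc = sum(knn_predict(train_x, train_y, x) == y
              for x, y in zip(val_x, val_y)) / len(val_x)
print(train_acc)  # → 1.0: a "perfect" score on memorized noise
print(val_acc)    # near chance level on held-out data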
6. Missed Non-Functional Requirements
It seems natural to track your machine learning models' performance metrics during experiments and choose the one with the highest score. But it is equally important to track and analyze other model characteristics, so that the AI solution fulfills all of its non-functional requirements. For example, what is the size of the model? How many FLOPs does it need? How much power does it consume? Is inference fast enough to be usable in real-life scenarios?
Monitor such metrics to make AI that fits its planned working conditions.
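As a minimal sketch of such monitoring (assuming a Python deployment and some `predict()` callable; `dummy_predict` below is a made-up stand-in for a real model), inference latency can be tracked alongside accuracy like this:

```python
import time

def latency_stats(predict, sample, n_runs=100, warmup=10):
    """Rough single-sample inference latency in milliseconds (p50 / p95)."""
    for _ in range(warmup):                      # warm up caches before timing
        predict(sample)
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    return {"p50_ms": timings[n_runs // 2],
            "p95_ms": timings[min(int(n_runs * 0.95), n_runs - 1)]}

# Hypothetical stand-in for a real model's predict() function
def dummy_predict(features):
    return sum(v * v for v in features)

stats = latency_stats(dummy_predict, list(range(1000)))
print(stats)
```

Reporting the tail (p95) as well as the median matters: a model that is fast on average but occasionally stalls can still violate a real-time requirement.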
Sometimes things just work well enough. Your baseline, or a slightly improved baseline, may already fulfill the performance objectives. Unfortunately, it is easy to spend an enormous amount of time improving your solution's metrics "by another 0.05%". Building a real-life AI solution is not a Kaggle.com challenge. Notice when you have achieved your metrics goal, and do not ruin your AI's return on investment with unnecessary work. You can spend that effort better, for example on creating a better dataset and a more robust AI.
To sum up
This AI risk list is certainly not finished. But I hope these points will help you monitor and mitigate the risks in your AI projects.
Do you know other AI risks that may occur during a machine learning project? Please share them in the comments!