Here at T-Mobile, when a new account is opened, fraud checks occur both pre- and post-activation. Fraud that is missed tends to end in first payment default, where it looks like an ordinary delinquent new account. The objective of this project was to investigate newly created accounts headed toward delinquency to find additional fraud.
To keep this project sustainable over the long term, we wanted to implement it as an end-to-end automated solution for building and productionizing models, one that included multiple modeling techniques and hyperparameter tuning.
We wanted to use MLflow for model comparison and graduation to production, and Hyperopt for parallel hyperparameter tuning. To achieve this goal, we created multiple machine learning notebooks, each of which tuned one type of model with its own specific parameters. The tuned models were saved into a training MLflow experiment, after which the best-performing model from each notebook was saved to a model comparison MLflow experiment.
In the second experiment, the newly built models were compared with each other as well as with the models currently and previously in production. Once the best-performing model was identified, it was saved to the MLflow Model Registry to be graduated to production.
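The step from the training experiment to the comparison experiment can be sketched in plain Python. This is a minimal illustration, not our production code: the run records are hypothetical dictionaries standing in for MLflow runs (in practice they might be fetched with something like `mlflow.search_runs`), and the notebook names, parameters, and AUC values are made up for the example.

```python
# Hypothetical run records standing in for runs in the training experiment.
# All names and metric values below are invented for illustration only.
training_runs = [
    {"notebook": "xgboost", "params": {"max_depth": 4}, "auc": 0.81},
    {"notebook": "xgboost", "params": {"max_depth": 8}, "auc": 0.84},
    {"notebook": "random_forest", "params": {"n_estimators": 200}, "auc": 0.79},
    {"notebook": "random_forest", "params": {"n_estimators": 500}, "auc": 0.80},
]


def best_run_per_notebook(runs, metric="auc"):
    """Return the highest-scoring run for each notebook (higher is better)."""
    best = {}
    for run in runs:
        key = run["notebook"]
        # Keep this run if it is the first seen for the notebook or beats the current best.
        if key not in best or run[metric] > best[key][metric]:
            best[key] = run
    return best


# One candidate per notebook moves on to the model comparison experiment.
comparison_candidates = best_run_per_notebook(training_runs)
```

Each notebook contributes exactly one candidate, so the comparison experiment stays small no matter how many tuning trials were run.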
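The comparison against production models can be sketched as a simple selection rule. This is an assumption-laden illustration rather than our actual logic: the `Candidate` class, the model names, and the scores are hypothetical, and the rule that a new model must strictly beat every existing production model is one possible policy, not something prescribed by MLflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    source: str   # "new", "production", or "previous_production" (hypothetical labels)
    score: float  # evaluation metric on a shared holdout set; higher is better


def select_model_to_promote(candidates):
    """Return the best new model, or None if no new model beats the existing ones."""
    new = [c for c in candidates if c.source == "new"]
    existing = [c for c in candidates if c.source != "new"]
    if not new:
        return None
    best_new = max(new, key=lambda c: c.score)
    # Assumed policy: ties go to the incumbent, so promotion requires a strict improvement.
    if existing and best_new.score <= max(c.score for c in existing):
        return None
    return best_new


# Invented example values, for illustration only.
candidates = [
    Candidate("xgboost_new", "new", 0.84),
    Candidate("random_forest_new", "new", 0.80),
    Candidate("xgboost_current", "production", 0.82),
    Candidate("logreg_previous", "previous_production", 0.78),
]
winner = select_model_to_promote(candidates)
```

In an automated pipeline, a non-`None` result would be the trigger for registering the model; a `None` result leaves the current production model in place.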
We were able to execute the multi-notebook solution above as part of a regularly scheduled Azure Data Factory pipeline, making model building and selection completely hands-off.
Every data science project has its nuances; the key is to leverage the available tools in a customized approach that fits your needs. We hope to give the audience a view into our advanced, custom approach to using the MLflow infrastructure and leveraging these tools through automation.