Dr. Steven Gustafson is Noonum’s CTO and Artificial Intelligence Scientist, passionate about solving tough problems while having fun and building great teams.
Having developed many end-to-end machine learning (ML) and artificial intelligence (AI) systems as an AI scientist, AI product owner and chief scientist, I have seen how those responsible for the software engineering often fail to consider the nuances of ML systems. Being the CTO of a startup for the past three years has allowed me to explore the integration of ML into core engineering design patterns. By working with a CTO or lead engineer who understands both AI and software, the best architectural and engineering management models can extend beyond traditional software applications, such as databases and web applications, to better control and optimize an ML factory.
Continuous integration and deployment
Reduce the risk of releasing broken apps: always build with unit and component tests and deploy with verification tests, using code that is itself under version control. Once you commit to a development branch, the system deploys to a development environment.
Once all manual and end-to-end smoke testing is complete, a manual action deploys the release to production. ML models are included in this process, along with the pipelines that run them. Reference data is used to verify that ML models and pipelines are accurate. Rolling back to the right versions of software and database releases includes rolling back to the matching ML models and their data, all of which can be integrated and deployed automatically.
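As a minimal sketch of the verification step described above, a release can be scored against versioned reference data before it is allowed through. The `predict` function and `REFERENCE_DATA` here are stand-ins, not part of any real system; in practice you would load the deployed model artifact and its versioned reference set.

```python
def predict(features):
    """Stand-in for the deployed model's prediction function."""
    return 1 if sum(features) > 0 else 0

# Reference data is versioned alongside the model: (features, expected label).
REFERENCE_DATA = [
    ([1.0, 2.0], 1),
    ([-3.0, 1.0], 0),
    ([0.5, 0.5], 1),
    ([-1.0, -1.0], 0),
]

def verify_release(min_accuracy=0.95):
    """Block the release if accuracy on reference data drops below tolerance."""
    correct = sum(1 for x, y in REFERENCE_DATA if predict(x) == y)
    accuracy = correct / len(REFERENCE_DATA)
    if accuracy < min_accuracy:
        raise RuntimeError(f"Release blocked: accuracy {accuracy:.2f} < {min_accuracy}")
    return accuracy
```

Because the check is plain code, it lives under version control and runs in the same pipeline as any other verification test.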
Infrastructure as code
You should also avoid deploying or configuring the infrastructure incorrectly. Use code to specify the infrastructure, and run scripts to recreate and verify the infrastructure the system needs. Similarly, the infrastructure required to build and test ML models, as well as to run them in production, must be defined as code. Once all the infrastructure associated with developing and deploying ML models is specified as code, it can be updated to accommodate changes to the ML models or their usage, or restored to the last working ML model if needed.
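One way to sketch the "verify the infrastructure" idea is a check that compares a declared resource list against what is actually running. The resource names below are invented for illustration, and `live_inventory` is a stand-in for a real cloud-provider SDK call.

```python
# Resources the ML system needs, declared as code (names are illustrative).
REQUIRED = {
    "feature-store-db": "database",
    "model-training-queue": "queue",
    "inference-service": "service",
}

def live_inventory():
    """Stand-in for querying the cloud provider for deployed resources."""
    return {"feature-store-db": "database", "inference-service": "service"}

def missing_resources(required=REQUIRED):
    """Return declared resources that are not actually running."""
    live = live_inventory()
    return sorted(name for name in required if name not in live)
```

A check like this can run on every deploy, so drift between the declared and actual infrastructure surfaces immediately rather than when a pipeline fails.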
Smoke testing
Manual smoke testing is always useful, and keeping tests fresh and up to date with new features, use cases and data is an ongoing task. ML model predictions are no different. If any part of the application is served by an ML model's recommendations, identify assertions that can be made: for example, that there are at least five suggestions in the application, that an email alert is constructed correctly, or that the model handles missing data as expected. A faulty ML model or pipeline should not be allowed to be released and produce bad results in the application.
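The assertions above can be automated as a small smoke test. `get_recommendations` is a hypothetical stand-in for the real service call; the assertions mirror the examples in the text.

```python
def get_recommendations(user_profile):
    """Stand-in for the deployed recommendation service."""
    if user_profile is None:
        return []  # the model should degrade gracefully on missing data
    return ["item-%d" % i for i in range(5)]

def smoke_test():
    # A normal request should return at least five suggestions.
    recs = get_recommendations({"id": 42})
    assert len(recs) >= 5, "too few recommendations"
    # Missing input data must not crash the pipeline.
    assert get_recommendations(None) == [], "missing data not handled"
    return True
```

Running this against each candidate release gives the gate described above: a faulty model fails the smoke test before it reaches users.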
Process alarms and notifications
In an ML system, there is incoming data, models are executed, model output is stored and analyzed, and application tables are created. All of these processes run on regularly scheduled tasks or are part of a queue or event system.
Whenever a script fails, log the error, push it to an alarm dashboard (to debug later) and notify employees via email, Slack or another method. When a notification is superfluous, adjust the alarm: every alarm should be an occasion that requires action. Just as you would track latency in an application, store and analyze the results of the ML model so you can spot data changes, infrastructure capacity pressures or simply unexpected drifts in the types of predictions.
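A minimal sketch of that failure-handling pattern, assuming a hypothetical alarm store and notification hook (the real ones would be your dashboard backend and an email or Slack integration):

```python
import logging

logger = logging.getLogger("ml-pipeline")
ALARMS = []  # stand-in for an alarm dashboard's backing store

def notify(message):
    """Stand-in for an email/Slack notification hook."""
    pass

def handle_failure(step, error, actionable=True):
    """Log the error, record it for later debugging, and alert only when action is required."""
    logger.error("step %s failed: %s", step, error)
    ALARMS.append({"step": step, "error": str(error)})
    if actionable:  # every alarm should require action; mute the rest
        notify(f"{step} failed: {error}")
```

The `actionable` flag is where tuning happens: when a notification proves superfluous, that class of failure is recorded on the dashboard but no longer pages anyone.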
Model testing and version control
Unit testing and version control are the standards for most software, but not for ML model development or its underlying data. ML models are known to create unexpected results due to new data. First, apply version control to the code used to generate the model from a specific set of data, which must also be under its own version control. The data and the model must be aligned for replicability and restoration purposes.
To deploy a new model, model version changes can be made in what is called a "commit," similar to any repository upgrade, and the new model is then checked out and placed into the development pipeline. The deployment process should run prediction tests on the validation data (used only in this step) to ensure that the expected level of quality has been maintained, and warnings should be raised if accuracy decreases or falls below a minimum tolerance.
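The quality check in that deployment step can be sketched as a simple gate; the tolerance value here is illustrative, and the accuracies would come from scoring the candidate and production models on the held-out validation data.

```python
def deployment_gate(candidate_acc, production_acc, tolerance=0.01):
    """Return True if the candidate model may be promoted.

    Blocks promotion when validation accuracy falls more than
    `tolerance` below the current production model's accuracy.
    """
    if candidate_acc < production_acc - tolerance:
        return False  # quality regressed beyond tolerance: block and warn
    return True
```

Because the model, its training data and this gate are all versioned together, a blocked or bad release can be rolled back to a known-good model and matching data.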
Functional style architecture
ML systems require a significant amount of data processing and transformation. When ML systems are developed, it is usually done in stages and ends up being very procedural: data is compiled into a file, cleaned and loaded into a table, processed and placed into a cluster, run through a model and stored in a database. This initial design is useful for building the model, understanding the data and controlling performance. The deployed application, however, can become very complex and difficult to maintain if this pattern is replicated.
Instead, a functional-style approach of performing discrete transformations of the data and passing the results to the next step can better optimize and manage processes, reducing memory and storage requirements while increasing efficiency. To avoid overwhelming a data system by storing ML system predictions, queuing and messaging systems are used to manage the volume of events. Additionally, since elastic cloud systems can lose nodes from time to time, queuing and messaging can buffer data entering the ML system and ensure everything gets processed.
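As a toy illustration of the functional style, each stage below is a pure function over the data, and stages compose instead of writing intermediate state to shared tables. The specific steps are invented for the example.

```python
from functools import reduce

def clean(rows):
    """Drop missing records."""
    return [r for r in rows if r is not None]

def transform(rows):
    """Illustrative feature transformation."""
    return [r * 2 for r in rows]

def score(rows):
    """Illustrative model scoring step."""
    return [1 if r > 2 else 0 for r in rows]

def pipeline(rows, steps=(clean, transform, score)):
    """Pass the output of each discrete transformation to the next step."""
    return reduce(lambda data, step: step(data), steps, rows)
```

Because each stage is independent, any step can be swapped, tested or retried in isolation, and the only persisted state is the final output.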
By supporting the creation, updates, and releases of ML models to be part of the entire software development operation process, a more robust overall system is created. This allows you to better serve customers and users while making the lives of your engineers and scientists more satisfying and efficient.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs, and technology executives.