Organizations are fully adopting artificial intelligence (AI) and proving its value. Enterprises are looking for the valuable AI use cases that abound in their industries and functional areas in order to reap more benefits. Organizations are responding to opportunities and threats, improving sales, and lowering costs. They are also recognizing the special requirements of AI workloads and enabling them with purpose-built infrastructure that supports the consolidated demands of multiple teams across the organization. Organizations that adopt a shift-left paradigm, planning for good governance early in the AI process, will minimize the effort spent on data movement and accelerate model development.
In an era of rapidly evolving AI, data scientists should choose platforms that provide flexibility, collaboration, and governance to maximize adoption and productivity. Let's dive into the world of workflow automation and pipeline orchestration, where two prominent terms have recently emerged in artificial intelligence and machine learning: MLOps and LLMOps.
MLOps (Machine Learning Operations) is a set of practices and technologies for standardizing and streamlining the construction and deployment of machine learning systems. It covers the entire lifecycle of a machine learning application, from data collection to model management, and it provisions for large workloads to accelerate time-to-value. MLOps principles build on DevOps principles to manage applications with ML (machine learning) built in.
An ML model is created by applying an algorithm to a mass of training data, and that data affects the behavior of the model in different environments. Machine learning is not just code; its workflows involve three key assets: code, model, and data.
Figure 1: An ML solution comprises data, code, and model
These assets in the development environment have the least restrictive access controls and the weakest quality guarantees, while those in production are of the highest quality and tightly controlled. In production, data comes from the real world, where you cannot control how it changes, and this raises several challenges that need to be resolved. For example, the distribution of incoming data can drift away from the distribution the model was trained on, silently degrading prediction quality; a minimal drift check is sketched below.
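As a concrete illustration, here is a minimal sketch of such a drift check using a two-sample Kolmogorov-Smirnov test from SciPy to compare a training-time feature distribution against production data. The feature arrays and threshold are hypothetical stand-ins for real pipeline values.

```python
# Minimal sketch: detecting data drift between training and production
# feature distributions with a two-sample Kolmogorov-Smirnov test.
# The feature arrays here are stand-ins for real pipeline data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # distribution seen in training
prod_feature = rng.normal(loc=0.3, scale=1.2, size=5_000)   # shifted distribution in production

statistic, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.01:  # the alert threshold is a project-specific choice
    print(f"Drift detected (KS={statistic:.3f}, p={p_value:.2e}); consider retraining.")
else:
    print("No significant drift detected.")
```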
Resolving these types of issues requires combining practices from DevOps and data engineering with practices unique to machine learning.
Figure 2: MLOps is the intersection of Machine Learning, DevOps, and Data Engineering - LLMOps rooted in MLOps
Hence, MLOps is a set of practices combining machine learning, DevOps, and data engineering that aims to deploy and maintain ML systems in production reliably and efficiently.
The recent rise of generative AI, most commonly in the form of large language models (LLMs), prompts us to consider how MLOps processes should be adapted to this new class of AI-powered applications.
LLMOps (Large Language Model Operations) is a specialized subset of MLOps tailored to the efficient development and deployment of large language models. By providing infrastructure and tools, LLMOps ensures that model quality remains high and that data quality is maintained throughout data science projects.
Use a consolidated MLOps and LLMOps platform to enable close interaction between data science and IT DevOps, increasing productivity and deploying a greater number of models into production faster. Both MLOps and LLMOps bring agility to AI innovation.
LLMOps tools include MLOps tools and platforms, LLMs that offer LLMOps capabilities, and other tools that can help with fine-tuning, testing, and monitoring. Explore more on LLMOps tools.
MLOps and LLMOps employ different processes and techniques for their primary tasks. Table 1 compares a few key tasks across the two methodologies:
Table 1: Key tasks of MLOps and LLMOps methodologies
Adapting MLOps for LLMs requires minimal changes to existing tools and processes. Moreover, many aspects do not change:
Table 2: Key properties of LLMs and implications for MLOps
An ML solution comprises data, code, and models. These assets are developed, tested, and moved to production through deployments. Each of these stages also needs to operate within an execution environment, and data, code, models, and execution environments are each ideally divided into development, staging, and production.
Two major patterns can be used to manage model deployment.
In the deploy code pattern (Figure 3), the training code that produces the model is developed in the dev environment, tested in staging using a subset of data, and then promoted to the production environment.
In the deploy model pattern (Figure 4), the packaged model is promoted through the different environments and finally to production. Model training is executed in the dev environment, and the produced model artifact is moved to the staging environment for model validation checks before the model is deployed to the production environment. Ancillary components such as inference and monitoring code still follow a separate deploy code path, in which that code is tested in staging and then deployed to production. This pattern is typically used when deploying a one-off model, or when model training is expensive and read access to production data from the development environment is possible.
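As an illustrative sketch of the deploy model pattern, the following promotes a trained artifact through stages with the MLflow Model Registry. The model name and training data are hypothetical, and the stage-based API shown is the classic registry interface (newer MLflow versions also offer aliases).

```python
# Minimal sketch of the "deploy model" pattern with the MLflow Model Registry.
import mlflow
from mlflow.tracking import MlflowClient
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

with mlflow.start_run() as run:  # training happens in the dev environment
    model = LogisticRegression(max_iter=1000).fit(X, y)
    mlflow.sklearn.log_model(model, artifact_path="model")

# Register the trained artifact; the artifact itself, not the code, is promoted.
version = mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name="churn-classifier",  # hypothetical model name
)

client = MlflowClient()
# Promote to Staging for validation checks, then to Production once they pass.
client.transition_model_version_stage("churn-classifier", version.version, stage="Staging")
# ... run validation checks against the Staging version here ...
client.transition_model_version_stage("churn-classifier", version.version, stage="Production")
```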
The choice of pattern also depends on the business use case, the maturity of the machine learning infrastructure, compliance and security guidelines, available resources, and what is most likely to succeed for that particular use case. It is therefore a good idea to use standardized project templates and strict workflows. Your decisions about packaging ML logic as version-controlled code versus registered models will help inform the choice between the deploy models, deploy code, and hybrid architectures.
With LLMs, it is common to package machine learning logic in new forms, such as prompt templates, chains that combine LLM calls with pre- and post-processing logic, and lightweight wrappers around third-party model APIs. A sketch of packaging such logic as a model follows.
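Here is a minimal sketch of this idea, assuming MLflow's pyfunc flavor: the "model" being logged is really a prompt template plus glue code, and call_llm is a hypothetical stand-in for a real provider client.

```python
# Minimal sketch: packaging a prompt template plus an LLM call as an
# MLflow pyfunc model, so the "model" asset is prompt + glue code.
import mlflow
import mlflow.pyfunc

SUMMARY_PROMPT = "Summarize the following support ticket in one sentence:\n\n{text}"

def call_llm(prompt: str) -> str:
    # Placeholder: in practice this would call a hosted or self-hosted LLM.
    return f"[LLM response for prompt of {len(prompt)} chars]"

class PromptedSummarizer(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # model_input is expected to be a pandas DataFrame with a "text" column.
        return [call_llm(SUMMARY_PROMPT.format(text=t)) for t in model_input["text"]]

with mlflow.start_run():
    mlflow.pyfunc.log_model(artifact_path="summarizer", python_model=PromptedSummarizer())
```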
Figure 5 shows a machine learning operations architecture and process that uses Azure Databricks.
The field of LLMOps is evolving quickly. Here are the key components and considerations to bear in mind. A single LLM-based application may use some, but not necessarily all, of the following approaches, and any of them can be taken to leverage your data with LLMs.
RAG (retrieval-augmented generation) LLMs typically use two systems to obtain external data: vector databases, which index document embeddings and retrieve the most relevant ones at query time, and feature stores, which serve structured, precomputed features. A minimal retrieval sketch is shown below.
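The following sketch assumes the sentence-transformers library for embeddings; a brute-force cosine search over an in-memory list stands in for a real vector database.

```python
# Minimal RAG retrieval sketch: embed documents, retrieve the most relevant
# one for a query, and assemble an augmented prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "MLOps combines machine learning, DevOps, and data engineering.",
    "LLMOps adapts MLOps practices for large language models.",
    "Vector databases store embeddings for fast similarity search.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

query = "How does LLMOps relate to MLOps?"
query_vec = encoder.encode([query], normalize_embeddings=True)[0]

# Cosine similarity reduces to a dot product on normalized vectors.
scores = doc_vecs @ query_vec
top_doc = docs[int(np.argmax(scores))]

prompt = f"Answer using this context:\n{top_doc}\n\nQuestion: {query}"
print(prompt)  # the augmented prompt is then sent to the LLM
```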
A good rule of thumb is to start with the simplest approach possible, such as prompt engineering with a third-party LLM API, to establish a baseline; prompt engineering offers quick, on-the-fly model guidance. Once this baseline is in place, you can incrementally integrate more sophisticated strategies like RAG or fine-tuning to refine and optimize performance. Standard MLOps tools such as MLflow are equally crucial in LLM applications for tracking performance over different approach iterations, as sketched below.
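Here is a minimal sketch of such tracking with MLflow: each prompt variant is logged as a run so that iterations can be compared side by side. The prompt variants and scoring function are hypothetical placeholders for a real evaluation harness.

```python
# Minimal sketch: tracking prompt-engineering iterations with MLflow so that
# different prompt variants can be compared over time.
import mlflow

prompt_variants = {
    "v1": "Summarize: {text}",
    "v2": "Summarize the text below in one sentence for an executive:\n{text}",
}

def score_prompt(prompt_template: str) -> float:
    # Placeholder: run the prompt against an eval set and compute a quality score.
    return 0.5 + 0.1 * len(prompt_template) / 100

for name, template in prompt_variants.items():
    with mlflow.start_run(run_name=f"prompt-{name}"):
        mlflow.log_param("prompt_template", template)
        mlflow.log_metric("quality_score", score_prompt(template))
```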
Evaluating LLMs is a challenging and evolving domain, primarily because LLMs often demonstrate uneven capabilities across different tasks. They can be sensitive to prompt variations, demonstrating high proficiency in one task but faltering with slight deviations in the prompt. Since most LLMs output natural language, it is very difficult to evaluate their outputs with traditional natural language processing metrics. For domain-specific fine-tuned LLMs, popular generic benchmarks may not capture nuanced capabilities; such models are tailored for specialized tasks, making generic metrics less relevant. Moreover, LLM performance is often evaluated in domains where text is scarce or where subject matter expert knowledge is required, and in such scenarios evaluating LLM output can be costly and time-consuming.
Some prominent benchmarks used to evaluate LLM performance include MMLU, BIG-bench, HELM, and the EleutherAI LM Evaluation Harness.
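When generic benchmarks fall short for a domain-specific model, a small task-specific evaluation set is a common fallback. Here is a minimal exact-match sketch, where ask_llm and the evaluation items are hypothetical stand-ins for the model under test and a real curated eval set.

```python
# Minimal sketch: a task-specific exact-match evaluation for an LLM, useful
# when generic benchmarks do not capture domain-specific behavior.
eval_set = [
    {"question": "What does MLOps combine?",
     "answer": "machine learning, devops, and data engineering"},
    {"question": "What is a vector database used for?",
     "answer": "similarity search"},
]

def ask_llm(question: str) -> str:
    # Placeholder: call the deployed model or API here.
    return "machine learning, devops, and data engineering"

correct = sum(
    ask_llm(item["question"]).strip().lower() == item["answer"] for item in eval_set
)
print(f"Exact-match accuracy: {correct / len(eval_set):.2%}")
```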
A well-defined LLMOps architecture is essential for managing machine learning workflows and operationalizing models in production environments.
Below is the reference production architecture for LLM-based applications, with key adjustments relative to the traditional MLOps reference architecture:
Figure 7: RAG workflow using a self-hosted fine-tuned model (Image Source: Databricks)
AI workloads are variable and intensive, and automating them helps fill the gap between the data science team and the IT operations team. Planning for good governance early in the AI process minimizes the effort spent on data movement and accelerates model development. The emergence of LLMOps highlights the rapid advancement and specialized needs of generative AI, yet LLMOps remains rooted in the foundational principles of MLOps.
In this article, we looked at key components, practices, tools, and reference architectures, with examples spanning MLOps deployment patterns, LLMOps approaches such as prompt engineering, RAG, and fine-tuning, and LLM evaluation.