Foundation Models: The 5-Stage Workflow to Build AI Faster

Deep learning has enabled us to build highly specialized AI models, provided we gather enough data, label it, and use it to train and deploy those models, whether they are customer service chatbots or fraud detection systems in banking.

In the past, building a new model for a specialized task, like predictive maintenance in manufacturing, required starting from scratch with data selection, curation, labeling, model development, training, and validation. However, foundation models are fundamentally changing that paradigm.

What is a Foundation Model?

A foundation model is a versatile base model created through a single, focused, centralized training effort. Through a process known as fine-tuning, this base foundation model can be adapted into more specialized models. For instance, if you need an AI model for programming language translation, you can start with a foundation model and then fine-tune it with specific programming language data. This process of fine-tuning and adapting base foundation models significantly accelerates AI model development.

So, how does this work in practice? Let's explore the five key stages of the workflow for creating an AI model using this approach.

Stage 1: Prepare the Data

The first stage involves preparing the data needed to train the AI model. This requires a massive amount of data, potentially petabytes, spanning numerous domains. The dataset can be a combination of available open-source data and proprietary information.

This stage involves several critical data processing tasks:

  • Categorization: This step tags the data with descriptive metadata. For example, identifying which data is in English or German, or which pertains to Ansible versus Java.
  • Filtering: Various filters are applied to the data. This allows for the removal of undesirable content like hate speech, profanity, and abuse, ensuring the model isn't trained on it. Other filters can flag copyrighted material or private and sensitive information.
  • Deduplication: Duplicate data is also identified and removed from the dataset.

The result of this stage is a "base data pile." This curated collection of data can be versioned and tagged, which is crucial for governance. It allows teams to document precisely what data the AI model was trained on and which filters were applied.
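As a rough Python sketch of how these steps fit together (the document structure, filter logic, and function names here are simplified placeholders for illustration, not a specific toolkit's API):

```python
import hashlib

# Placeholder blocklist standing in for production-grade content filters.
BLOCKED_TERMS = {"<hate-speech-term>", "<profanity-term>"}

def categorize(doc):
    # Attach descriptive metadata; a real pipeline would use trained classifiers
    # to detect natural language, programming language, domain, and so on.
    doc["language"] = doc.get("language", "unknown")
    doc["domain"] = doc.get("domain", "unknown")
    return doc

def passes_filters(doc):
    # Drop undesirable content and flag material that needs special handling.
    text = doc["text"].lower()
    if any(term in text for term in BLOCKED_TERMS):
        return False
    doc["pii_flag"] = "ssn:" in text  # crude stand-in for a PII detector
    return True

def deduplicate(docs):
    # Remove exact duplicates by hashing the raw text.
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

def build_data_pile(raw_docs, version):
    categorized = [categorize(d) for d in raw_docs]
    filtered = [d for d in categorized if passes_filters(d)]
    pile = deduplicate(filtered)
    # Version and tag the pile so every training run can be traced back to
    # exactly this data and these filters.
    return {"version": version, "filters": ["blocklist", "pii"], "docs": pile}
```

At production scale these steps run as distributed jobs over petabytes of data, but the shape of the pipeline, categorize, filter, deduplicate, then version the result, stays the same.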

Stage 2: Train the Model

With the base data pile prepared, the next stage is to train the model. This begins by selecting the type of foundation model you want to build. There are many to choose from, including generative models, encoder-only models, lightweight models, and high-parameter models. The choice depends on the intended application. Are you building a chatbot or a classifier? Pick the foundation model type that best matches your use case and pair it with the corresponding data pile.

Next, the data pile is tokenized. Foundation models operate on tokens rather than raw words, and a large data pile can result in trillions of tokens. The training process then begins, using these tokens.
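To make the token step concrete, here is a minimal sketch using the Hugging Face transformers library; the checkpoint name and the data-pile structure are illustrative assumptions carried over from the earlier sketch.

```python
# Minimal tokenization sketch using the Hugging Face transformers library.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any compatible checkpoint

def tokenize_pile(data_pile):
    total_tokens = 0
    for doc in data_pile["docs"]:
        token_ids = tokenizer(doc["text"])["input_ids"]
        doc["input_ids"] = token_ids
        total_tokens += len(token_ids)
    # A production-scale pile would yield trillions of tokens and be streamed
    # rather than held in memory; this loop only shows the shape of the step.
    return data_pile, total_tokens
```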

Note: This process can be time-consuming, depending on the model's size. Training large-scale foundation models can take several months and require thousands of GPUs. However, once this intensive phase is complete, the heaviest computational cost has been paid.

Stage 3: Validate the Model

When training is finished, the model must be benchmarked. This involves running the model and assessing its performance against a set of established benchmarks that help define its quality. From this validation, a "model card" can be created, which documents the trained model and the benchmark scores it achieved.
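A validation harness might look like the sketch below. The benchmark names are real public benchmarks, but `run_benchmark` is a hypothetical placeholder for whatever evaluation framework you use, and the model-card format is illustrative.

```python
import json

def run_benchmark(model, benchmark_name):
    # Placeholder: wire in your evaluation harness here and return its score.
    return 0.0

def validate(model, model_name, data_pile_version):
    benchmarks = ["mmlu", "hellaswag", "arc_challenge"]  # illustrative choices
    scores = {name: run_benchmark(model, name) for name in benchmarks}

    # The model card documents what was trained, on which data pile,
    # and how it scored.
    model_card = {
        "model": model_name,
        "data_pile_version": data_pile_version,
        "benchmark_scores": scores,
    }
    with open(f"{model_name}_model_card.json", "w") as f:
        json.dump(model_card, f, indent=2)
    return model_card
```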

Stage 4: Fine-Tune the Model

Up to this point, the primary role involved has been the data scientist. In the tuning stage, the application developer comes into play. This individual does not need to be an AI expert. Their role is to engage with the model, for example by crafting prompts that elicit high-quality responses. They can also provide additional local data to fine-tune the model and further improve its output. This stage is significantly faster than building a model from the ground up, often taking only hours or days.
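As a sketch of what that fine-tuning pass can look like using the Hugging Face Trainer API (the checkpoint name, local data file, and hyperparameters are illustrative assumptions, not a prescribed recipe):

```python
# Fine-tuning sketch using Hugging Face transformers and datasets.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

checkpoint = "gpt2"  # stand-in for your base foundation model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Local, domain-specific examples supplied by the application developer.
dataset = load_dataset("json", data_files="local_examples.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="finetuned-model",
    num_train_epochs=1,              # hours or days of tuning, not months
    per_device_train_batch_size=4,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("finetuned-model")
```

The point of the sketch is the scale: a single local dataset, a handful of epochs, and commodity hardware, rather than the months-long, thousands-of-GPUs effort of the base training run.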

Stage 5: Deploy the Model

Finally, the model is ready for deployment. It can be run as a service offering on a public cloud or embedded directly into an application that operates closer to the edge of the network. Regardless of the deployment strategy, the model can be continuously iterated upon and improved over time.
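As a sketch of the service-offering path, the tuned model could sit behind a small HTTP endpoint. FastAPI is shown here purely as one common option, and the model path mirrors the hypothetical fine-tuning example above.

```python
# Minimal serving sketch using FastAPI; the endpoint shape and model path
# are illustrative assumptions, not a prescribed deployment architecture.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="finetuned-model")  # tuned checkpoint

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 64

@app.post("/generate")
def generate(prompt: Prompt):
    result = generator(prompt.text, max_new_tokens=prompt.max_new_tokens)
    return {"completion": result[0]["generated_text"]}

# Run with, e.g.: uvicorn serve:app --port 8000 (assuming this file is serve.py)
```

An edge deployment would swap the HTTP layer for an embedded runtime, but the core idea is the same: the tuned model is packaged once and then iterated on over time.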

A New Era for AI Development

Overall, foundation models are transforming how we build specialized AI models. This five-stage workflow empowers teams to create sophisticated AI and AI-derived applications with greater efficiency, dramatically speeding up the development lifecycle.