
Build AI-Powered Applications with Microsoft.Extensions.AI

Integrating LLMs into your applications is getting easier every day. A few months ago, Microsoft released a set of core libraries for AI building blocks, designed as a unified layer of C# abstractions for interacting with AI services. Let's dive into LLMs by integrating Microsoft.Extensions.AI into a Minimal API and see how easy it is to supercharge our applications with generative AI!

[Image: an old-fashioned typewriter typing the text "Microsoft.Extensions.AI"]

With the rise of generative AI and its growing capabilities, I had to try integrating some AI in a .NET API myself. While Microsoft.Extensions.AI is still in preview, and the documentation is a bit lacking, I was surprised at how easy it was to get good results quickly!


Getting Started with LLMs

When working with LLMs, you have two main options:

  • Using an AI cloud provider (such as Azure, OpenAI, Anthropic, etc.)
    • As of this preview, Microsoft only supports Azure (both OpenAI and Azure Model Inference) and OpenAI.
  • Using a local LLM (with Ollama, for example)

We’ll focus on using a local LLM with Ollama since it’s an easy, no-cost way to experiment with AI. Even if you plan to use a cloud provider, using a local LLM during development is an easier and more cost-effective way to develop AI-powered applications.

There are a few ways to start using Ollama:

Installing Ollama Locally

The simplest way to use Ollama is by installing the client on your computer from Ollama's official download page.

Once installed, you can interact with Ollama using the CLI or by calling its REST API.

Before you can start, you'll need to pull a model:

```bash
ollama pull llama3.2
ollama run llama3.2
```

Or, using the REST API:

```bash
curl http://localhost:11434/api/pull -d '{
  "model": "llama3.2"
}'
```

After that, you can begin chatting using the CLI or by making API calls:

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ]
}'
```

Running Ollama in Docker

You can also use Ollama with Docker.

```bash
# run Ollama with CPU only
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# run Ollama with support for Nvidia GPUs
# (requires the Nvidia container toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installation)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

Now you can execute commands inside the container with docker exec -it <container-name> like:

```bash
docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama run llama3.2 # to immediately talk to the LLM inside the container
```

Integrating Ollama using Aspire

Finally, you can easily set up Ollama using .NET Aspire.

This is my preferred method since it's extremely easy to configure, add or swap multiple LLM models, and reference them in any project that needs LLM access.

To add Ollama to your Aspire setup, install the CommunityToolkit.Aspire.Hosting.Ollama package and add the following to your AppHost/Program.cs.

```csharp
var ollama = builder.AddOllama("ollama")
    .WithDataVolume()
    .WithContainerRuntimeArgs("--gpus=all");

var llamaModel = ollama.AddModel("llama3.2");
```

Now you can add a reference to any project with .WithReference(llamaModel). I personally prefer adding a connection name to the reference so I don’t need to modify my API code when swapping LLM models, like this: .WithReference(llamaModel, "some_connectionName")
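As a rough sketch of the AppHost side (the project name MyApi and the connection name "chat-model" are placeholders I made up, not part of the original setup):

```csharp
// AppHost/Program.cs — hypothetical project and connection names
var api = builder.AddProject<Projects.MyApi>("api")
    .WithReference(llamaModel, "chat-model"); // the API resolves the model via the "chat-model" connection
```

In the referenced project, the Ollama connection details then surface through configuration under ConnectionStrings:chat-model, so swapping llama3.2 for another model only requires touching the AppHost.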

Make sure to add WithDataVolume; otherwise, Ollama will re-download your LLM models every time you restart Aspire. Depending on your internet speed and model size, this can take anywhere from a few minutes to several hours!

Integrating the IChatClient into a .NET API

Microsoft.Extensions.AI is a base abstraction for AI features in .NET. It provides many utility functions and classes and is designed as a foundational layer for working with LLMs. But for now, we’re only interested in IChatClient, which allows us to perform chat completion with our chosen LLM.

Before we can set up a basic implementation, we need to install Microsoft.Extensions.AI and Microsoft.Extensions.AI.Ollama.

Your First LLM-Powered API

A minimal setup looks something like this:

```csharp
var builder = WebApplication.CreateBuilder(args);

var ollamaUrl = "http://localhost:11434";
var ollamaModel = "llama3.2";

builder.Services.AddChatClient(new OllamaChatClient(ollamaUrl, ollamaModel));

var app = builder.Build();

app.MapGet("/hello/{name}", async (IChatClient chat, string name) =>
{
    var response = await chat.CompleteAsync($"Congratulate {name} on adding an LLM to their API!");
    return response.Message.Text;
});

app.Run();
```

Now you can call the endpoint and receive a friendly message congratulating you on integrating an LLM into your API!

Using Multiple LLMs Simultaneously

Your application will likely use more than one LLM model. It's common to select different models based on task complexity and goals. Not all models are created equal, and using cheaper models for simpler tasks helps reduce costs.

Microsoft.Extensions.AI includes an AddKeyedChatClient method, which adds IChatClient as a keyed service to the DI container (keyed services were introduced in .NET 8):

```csharp
builder.Services.AddKeyedChatClient("llama", new OllamaChatClient(ollamaUrl, ollamaModel));
builder.Services.AddKeyedChatClient("qwen", new OllamaChatClient(ollamaUrl, "qwen2:1.5b"));

var app = builder.Build();

app.MapGet("/hello/{name}", async (
    [FromKeyedServices("llama")] IChatClient llamaChat,
    [FromKeyedServices("qwen")] IChatClient qwenChat,
    string name) =>
{
    var nameResponse =
        await qwenChat.CompleteAsync($"Turn the name {name} into an epic name for a great hero of the ages!");
    var epicName = nameResponse.Message.Text;

    var plot = await qwenChat.CompleteAsync(
        $"Generate a one-sentence plot for an epic medieval fantasy story about our hero {epicName}!");

    var response = await llamaChat.CompleteAsync($"""
        Write a short story about our epic hero {epicName}!

        {epicName} was a hero of legend, known far and wide for their bravery and cunning.
        {plot.Message.Text}
        """);
    return response.Message.Text;
});

app.Run();
```

In this example, we use two IChatClient instances: one for Llama 3.2 and another for the smaller Qwen 2 1.5B. The smaller Qwen model generates quick additional information, which is then used in a more complex prompt executed by the Llama model.

Empowering LLMs with Function Invocation

A key feature of IChatClient is the ability to add middleware. Similar to middleware in WebAPI or Minimal APIs, IChatClient middleware allows us to enhance its functionality.

One of the more interesting middleware options is UseFunctionInvocation. Many LLMs can call external functions when we make them available. Normally, this requires manual work: the LLM responds with a message describing which function it intends to call and with which arguments, and we, as developers, are expected to invoke that function and feed the result back to the LLM as part of the chat history.

With UseFunctionInvocation, a middleware intercepts the function invocation intent, executes the function, and automatically returns the result to the LLM. This means that our call to CompleteAsync completes only after the LLM has finished generating a message, even if it involves invoking one or more functions.

Teaching Your LLM All 1025 Pokémon

One exciting use case is giving our LLM access to APIs. In this example, we’ll create a tool that allows the LLM to call the free PokeAPI.

The AddChatClient method returns a ChatClientBuilder, which lets us attach middleware to the injected IChatClient, like this:

```csharp
builder.Services.AddChatClient(new OllamaChatClient(ollamaUrl, ollamaModel))
    .UseLogging()
    .UseFunctionInvocation();
```

Next, we need to provide IChatClient with the tools it can use by passing a ChatOptions object. While we could define this within the ChatClientBuilder pipeline using ConfigureOptions, it’s worth noting that IChatClient cannot resolve services during function invocation. Since we use HttpClient to call the PokeAPI, it’s better to construct ChatOptions in the route itself to ensure proper HttpClient injection:

```csharp
app.MapGet("/compare/{pokemonOne}/{pokemonTwo}", async (
    IChatClient llamaChat,
    IPokemonClient client,
    string pokemonOne,
    string pokemonTwo) =>
{
    var chatOptions = new ChatOptions
    {
        Tools =
        [
            AIFunctionFactory.Create(
                client.GetPokemonSpeciesAsync,
                new AIFunctionFactoryCreateOptions
                {
                    Name = "get_pokemon_by_name",
                    Description = "Get detailed information about a Pokemon based on its name"
                })
        ]
    };

    var pokemonOneResponse = await llamaChat.CompleteAsync(
        $"""
        Compare the Pokemon {pokemonOne} and {pokemonTwo} and tell me which one is better!
        Phrase your answer in the Pokedex style.
        """, chatOptions);

    return pokemonOneResponse.Message.Text;
});
```

Several things happen here:

  • We use AIFunctionFactory.Create to build an AIFunction, representing a function that the AI can call. The factory collects metadata such as argument names, types, function name, description, and return type.
  • We provide our own Name and Description. Just like in prompt engineering, clear and explicit instructions matter. Software function names don’t always convey enough context, so it’s best to supply LLM-friendly names.
  • Finally, we pass the ChatOptions to the CompleteAsync method, letting the LLM know what tools it can use.

With this setup, our LLM now has access to all 1025 Pokémon via the PokeAPI.
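For completeness, here is one possible shape for the IPokemonClient used above. The route handler only shows the interface being injected, so treat this as a hedged sketch: the PokemonClient class, the choice of the pokemon-species endpoint, and the typed-HttpClient registration are my assumptions, not the original implementation.

```csharp
// Hypothetical implementation of the IPokemonClient injected into the route above.
public interface IPokemonClient
{
    Task<string> GetPokemonSpeciesAsync(string name);
}

public class PokemonClient(HttpClient httpClient) : IPokemonClient
{
    // Calls the PokeAPI species endpoint and returns the raw JSON for the LLM to interpret.
    public async Task<string> GetPokemonSpeciesAsync(string name) =>
        await httpClient.GetStringAsync($"pokemon-species/{name.ToLowerInvariant()}");
}

// Registered as a typed HttpClient so the route handler can resolve IPokemonClient.
builder.Services.AddHttpClient<IPokemonClient, PokemonClient>(client =>
    client.BaseAddress = new Uri("https://pokeapi.co/api/v2/"));
```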

Conclusion

Building simple AI-powered applications has become almost trivial with Microsoft.Extensions.AI. It only takes a few lines of code to start using LLMs in your API.

Adding extra features like chat history is just as straightforward. CompleteAsync supports an IList<ChatMessage>, making chat history easy to implement.
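As a rough sketch of what that can look like, assuming the same IChatClient registration as in the earlier examples:

```csharp
// Minimal chat-history sketch: keep the running conversation in a list and resend it each turn.
var history = new List<ChatMessage>
{
    new(ChatRole.System, "You are a concise assistant."),
    new(ChatRole.User, "Why is the sky blue?")
};

var completion = await chat.CompleteAsync(history);
history.Add(completion.Message);                               // remember the assistant's answer
history.Add(new(ChatRole.User, "Now explain it to a five-year-old."));

var followUp = await chat.CompleteAsync(history);              // the model now sees the full conversation
```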

It is worth repeating that Microsoft.Extensions.AI is mainly meant as a core library that provides a unified layer of C# abstractions for interacting with AI services. Getting it up and running is easy, but if you have more complex requirements you either have to get your hands dirty or upgrade your setup to something like Semantic Kernel.

Do play around with Microsoft.Extensions.AI, and if you have an idea for (or are maintaining) a library targeting AI services, make sure to make it compatible with Microsoft.Extensions.AI!
