Build AI-Powered Applications with Microsoft.Extensions.AI
Integrating LLMs into your applications is getting easier every day. A few months ago, Microsoft released a set of core libraries for AI building blocks, designed as a unified layer of C# abstractions for interacting with AI services. Let's dive into LLMs by integrating Microsoft.Extensions.AI into a Minimal API and see how easy it is to supercharge our applications with generative AI!

With the rise of generative AI and its growing capabilities, I had to try integrating some AI into a .NET API myself. While Microsoft.Extensions.AI is still in preview and the documentation is a bit lacking, I was surprised at how easy it was to get good results quickly!
Getting Started with LLMs
When working with LLMs, you have two main options:
- Using an AI cloud provider (such as Azure, OpenAI, Anthropic, etc.)
  - As of this preview, Microsoft only supports Azure (both OpenAI and Azure Model Inference) and OpenAI.
- Using a local LLM (with Ollama, for example)
We’ll focus on using a local LLM with Ollama since it’s an easy, no-cost way to experiment with AI. Even if you plan to use a cloud provider, a local LLM during development is an easier and more cost-effective way to build AI-powered applications.
There are a few ways to start using Ollama:
Installing Ollama Locally
The simplest way to use Ollama is by installing the client on your computer from Ollama's official download page.
Once installed, you can interact with Ollama using the CLI or by calling its REST API.
Before you can start, you'll need to pull a model:
```shell
ollama pull llama3.2
ollama run llama3.2
```
```shell
curl http://localhost:11434/api/pull -d '{
  "model": "llama3.2"
}'
```
After that, you can begin chatting using the CLI or by making API calls:
```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ]
}'
```
Running Ollama in Docker
You can also use Ollama with Docker.
```shell
# run Ollama with CPU only
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# run Ollama with support for Nvidia GPUs (using the Nvidia container toolkit:
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installation)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```
Now you can execute commands inside the container with docker exec -it, like:

```shell
docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama run llama3.2 # to immediately talk to the LLM inside the container
```
Integrating Ollama using Aspire
Finally, you can easily set up Ollama using .NET Aspire.
This is my preferred method since it's extremely easy to configure, add or swap multiple LLM models, and reference them in any project that needs LLM access.
To add Ollama to your Aspire setup, install the CommunityToolkit.Aspire.Hosting.Ollama package and add the following to your AppHost project:
```csharp
var ollama = builder.AddOllama("ollama")
    .WithDataVolume()
    .WithContainerRuntimeArgs("--gpus=all");

var llamaModel = ollama.AddModel("llama3.2");
```
Now you can reference the model from any project that needs LLM access using WithReference. I personally prefer giving the model a connection name in that reference so I don’t need to modify my API code when swapping LLM models.
Make sure to add WithDataVolume, otherwise Ollama will download your LLM models every time you restart Aspire. Depending on your internet speed and model size, this can take anywhere from a few minutes to several hours!
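Putting the pieces together, the AppHost wiring might look like the following sketch. The project name Projects.Api is a placeholder for your own API project; the rest follows the Aspire patterns shown above.

```csharp
var builder = DistributedApplication.CreateBuilder(args);

// Host Ollama as an Aspire resource; the data volume keeps pulled models
// across restarts so they are not re-downloaded every time.
var ollama = builder.AddOllama("ollama")
    .WithDataVolume();

var llamaModel = ollama.AddModel("llama3.2");

// "Projects.Api" is a placeholder; Aspire generates this type
// from the project reference in your AppHost.
builder.AddProject<Projects.Api>("api")
    .WithReference(llamaModel);

builder.Build().Run();
```

With this in place, the API project receives the model's endpoint through its configuration instead of a hard-coded URL.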
Integrating the IChatClient into a .NET API
Microsoft.Extensions.AI is a base abstraction for AI features in .NET. It provides many utility functions and classes and is designed as a foundational layer for working with LLMs. But for now, we’re only interested in IChatClient, which allows us to perform chat completion with our chosen LLM.
Before we can set up a basic implementation, we need to install the Microsoft.Extensions.AI and Microsoft.Extensions.AI.Ollama packages.
Your First LLM-Powered API
A minimal setup looks something like this:
```csharp
var builder = WebApplication.CreateBuilder(args);

var ollamaUrl = "http://localhost:11434";
var ollamaModel = "llama3.2";

builder.Services.AddChatClient(new OllamaChatClient(ollamaUrl, ollamaModel));

var app = builder.Build();

app.MapGet("/hello/{name}", async (IChatClient chat, string name) =>
{
    var response = await chat.CompleteAsync($"Congratulate {name} on adding an LLM to their API!");
    return response.Message.Text;
});

app.Run();
```
Now you can call the endpoint and receive a friendly message congratulating you on integrating an LLM into your API!
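If you would rather show tokens as they are generated instead of waiting for the full completion, IChatClient also exposes a streaming variant. A minimal sketch, assuming the same OllamaChatClient registration as above (the /hello-stream route is just an example name):

```csharp
app.MapGet("/hello-stream/{name}", (IChatClient chat, string name) =>
{
    // Yield each partial update's text as the LLM produces it.
    async IAsyncEnumerable<string> Stream()
    {
        await foreach (var update in chat.CompleteStreamingAsync(
            $"Congratulate {name} on adding an LLM to their API!"))
        {
            yield return update.Text ?? string.Empty;
        }
    }

    return Stream();
});
```

This pairs well with Server-Sent Events or a streaming frontend, since users see output immediately rather than after the whole message is done.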
Using Multiple LLMs Simultaneously
Your application will likely use more than one LLM model. It's common to select different models based on task complexity and goals. Not all models are created equal, and using cheaper models for simpler tasks helps reduce costs.
Microsoft.Extensions.AI includes an AddKeyedChatClient method, which adds IChatClient as a keyed service to the DI container (keyed services were introduced in .NET 8):
```csharp
builder.Services.AddKeyedChatClient("llama", new OllamaChatClient(ollamaUrl, ollamaModel));
builder.Services.AddKeyedChatClient("qwen", new OllamaChatClient(ollamaUrl, "qwen2:1.5b"));

var app = builder.Build();

app.MapGet("/hello/{name}", async (
    [FromKeyedServices("llama")] IChatClient llamaChat,
    [FromKeyedServices("qwen")] IChatClient qwenChat,
    string name) =>
{
    var nameResponse =
        await qwenChat.CompleteAsync($"Turn the name {name} into an epic name for a great hero of the ages!");
    var epicName = nameResponse.Message.Text;

    var plot = await qwenChat.CompleteAsync(
        $"Generate a one-sentence plot for an epic medieval fantasy story about our hero {epicName}!");

    var response = await llamaChat.CompleteAsync($"""
        Write a short story about our epic hero {epicName}!

        {epicName} was a hero of legend, known far and wide for their bravery and cunning.
        {plot.Message.Text}
        """);
    return response.Message.Text;
});

app.Run();
```
In this example, we use two IChatClient instances: one for 'Llama 3.2 3B' and another for 'Qwen 2 1.5B'. The smaller Qwen model generates quick additional information, which is then used in a more complex prompt executed by the Llama model.
Empowering LLMs with Function Invocation
A key feature of IChatClient is the ability to add middleware. Similar to middleware in WebAPI or Minimal APIs, IChatClient middleware allows us to enhance its functionality.
One of the more interesting middleware options is UseFunctionInvocation. Many LLMs can call external functions when they are made available. Normally, this requires a lot of manual work: the LLM responds with a message containing the intent and arguments of the function to call, and we, as developers, are then expected to invoke that function and return the result to the LLM as part of the chat history.
With UseFunctionInvocation, a middleware intercepts the function invocation intent, executes the function, and automatically returns the result to the LLM. This means that our call to CompleteAsync completes only after the LLM has finished generating a message, even if that involves invoking one or more functions.
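To make this concrete before the larger example below, here is a minimal sketch with a trivially simple tool. The get_current_time function and the /time route are hypothetical illustrations, not part of the library; the AIFunctionFactory usage mirrors the pattern used later in this article.

```csharp
builder.Services.AddChatClient(new OllamaChatClient(ollamaUrl, ollamaModel))
    .UseFunctionInvocation();

app.MapGet("/time", async (IChatClient chat) =>
{
    // A hypothetical tool the LLM may decide to call.
    var chatOptions = new ChatOptions
    {
        Tools =
        [
            AIFunctionFactory.Create(
                () => DateTime.UtcNow.ToString("HH:mm"),
                new AIFunctionFactoryCreateOptions
                {
                    Name = "get_current_time",
                    Description = "Gets the current time in UTC"
                })
        ]
    };

    // The middleware runs the tool call and feeds the result back to the LLM
    // before this await completes.
    var response = await chat.CompleteAsync("What time is it right now?", chatOptions);
    return response.Message.Text;
});
```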
Teaching Your LLM All 1025 Pokémon
One exciting use case is giving our LLM access to APIs. In this example, we’ll create a tool that allows the LLM to call the free PokeAPI.
The AddChatClient method returns a ChatClientBuilder, which lets us attach middleware to the injected IChatClient, like this:
```csharp
builder.Services.AddChatClient(new OllamaChatClient(ollamaUrl, ollamaModel))
    .UseLogging()
    .UseFunctionInvocation();
```
Next, we need to provide IChatClient with the tools it can use by passing a ChatOptions object. While we could define this within the ChatClientBuilder pipeline using ConfigureOptions, it’s worth noting that IChatClient cannot resolve services during function invocation. Since we use HttpClient to call the PokeAPI, it’s better to construct ChatOptions in the route itself to ensure proper HttpClient injection:
```csharp
app.MapGet("/compare/{pokemonOne}/{pokemonTwo}", async (
    IChatClient llamaChat,
    IPokemonClient client,
    string pokemonOne,
    string pokemonTwo) =>
{
    var chatOptions = new ChatOptions
    {
        Tools =
        [
            AIFunctionFactory.Create(
                client.GetPokemonSpeciesAsync,
                new AIFunctionFactoryCreateOptions
                {
                    Name = "get_pokemon_by_name",
                    Description = "Get detailed information about a Pokemon based on its name"
                })
        ]
    };

    var pokemonOneResponse = await llamaChat.CompleteAsync(
        $"""
        Compare the Pokemon {pokemonOne} and {pokemonTwo} and tell me which one is better!
        Phrase your answer in the Pokedex style.
        """, chatOptions);

    return pokemonOneResponse.Message.Text;
});
```
Several things happen here:
- We use AIFunctionFactory.Create to build an AIFunction, representing a function that the AI can call. The factory collects metadata such as argument names, types, function name, description, and return type.
- We provide our own Name and Description. Just like in prompt engineering, clear and explicit instructions matter. Software function names don’t always convey enough context, so it’s best to supply LLM-friendly names.
- Finally, we pass the ChatOptions to the CompleteAsync method, letting the LLM know what tools it can use.
With this setup, our LLM now has access to all 1025 Pokémon via the PokeAPI.
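The IPokemonClient injected above is not shown in the snippets; a minimal sketch of what it might look like follows. The interface, class, and registration are assumptions for illustration, not part of Microsoft.Extensions.AI.

```csharp
// Hypothetical typed client for the PokeAPI; register it with something like
// builder.Services.AddHttpClient<IPokemonClient, PokemonClient>().
public interface IPokemonClient
{
    Task<string> GetPokemonSpeciesAsync(string name);
}

public class PokemonClient(HttpClient http) : IPokemonClient
{
    public async Task<string> GetPokemonSpeciesAsync(string name)
    {
        // Returns the raw JSON; a real client would deserialize into a
        // trimmed-down DTO so the LLM only sees relevant fields.
        return await http.GetStringAsync(
            $"https://pokeapi.co/api/v2/pokemon-species/{name.ToLowerInvariant()}");
    }
}
```

Keeping the tool's return value small matters in practice: everything it returns is appended to the chat history the LLM has to process.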
Conclusion
Building simple AI-powered applications has become almost trivial with Microsoft.Extensions.AI. It only takes a few lines of code to start using LLMs in your API.
Adding extra features like chat history is just as straightforward: CompleteAsync supports an IList&lt;ChatMessage&gt;, making chat history easy to implement.
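As a sketch of what that might look like (the in-memory history list is an assumption for illustration; a real application would persist it per conversation):

```csharp
// Keep the running conversation and pass the full list on every turn.
var history = new List<ChatMessage>
{
    new(ChatRole.System, "You are a concise and helpful assistant.")
};

history.Add(new ChatMessage(ChatRole.User, "What is an LLM?"));
var response = await chat.CompleteAsync(history);

// Append the assistant's reply so the follow-up question has full context.
history.Add(response.Message);
history.Add(new ChatMessage(ChatRole.User, "Can you say that in one sentence?"));
var followUp = await chat.CompleteAsync(history);
```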
It is worth repeating that Microsoft.Extensions.AI is mainly meant as a core library that provides a unified layer of C# abstractions for interacting with AI services. Getting it up and running is easy, but if you have more complex requirements, you either have to get your hands dirty or upgrade your setup to something like Semantic Kernel.

Do play around with Microsoft.Extensions.AI, and if you have an idea or are maintaining a library targeting AI services, make sure you make it compatible with Microsoft.Extensions.AI!
What to read next:
I really hope you enjoyed this article. If you did, you might want to check out some of these articles I've written on similar topics.
- .NET Aspire & Next.js: The Dev Experience You Were Missing — 7 min read
- Improve Signalr and React Performance User Experience — 5 min read
- Sync React with SignalR Events — 8 min read