Local LLMs: Host and Deploy Your Own LLM with Ollama

Introduction

If you're like me, you've often wanted to test ideas without spending money. Using ChatGPT in my apps usually means jumping through hoops because the API is paid, so I thought it would be nice to have a lightweight LLM running locally for personal use. I found a cool project called Ollama. The concept is simple: it lets you download and host publicly available models, including Llama, Mistral, Gemma, and their fine-tunes from Hugging Face. The best part is that Ollama exposes an OpenAI-compatible API, so you can keep your existing codebase and just switch out the endpoint to work seamlessly with your own local LLM. Let's walk through the installation and get our hands dirty with some code.

Installation

The installation is straightforward. Visit their GitHub page, where you'll find installers for macOS, Windows, and Linux. They also have a Docker image if you prefer that. Once installed, you can access it from your CLI using ollama <command>.

Basic Usage

Let's go through some basic commands:

  1. To serve the Ollama API, run ollama serve, which starts the local server. We will discuss client usage in a bit.
  2. To install a model, use ollama pull <model name>. You can check out the entire registry of valid models and their different size variants here.
  3. Use ollama list to list all installed models.
  4. To chat with any model, run ollama run <model name>, which will initiate a ChatGPT-like conversational flow.

REST API

As mentioned earlier, you can easily use the Ollama service inside your applications. First, ensure the Ollama service is active by running ollama serve. Its REST API will feel familiar if you've worked with OpenAI's. Here's an example of initiating a chat with a locally installed Llama3 model through Ollama:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ]
}'
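
If you would rather call the endpoint straight from your application code without the dedicated client covered next, here is a minimal sketch using Python's requests library. It assumes the server is running on the default port 11434 and that llama3 has already been pulled; setting "stream": False asks the server for a single JSON object instead of streamed chunks.

import requests

# Call Ollama's native chat endpoint on the default local port.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",  # assumes `ollama pull llama3` was run
        "messages": [
            {"role": "user", "content": "Why is the sky blue?"}
        ],
        # The API streams JSON chunks by default; disable streaming
        # to get one complete response object back.
        "stream": False,
    },
)
print(response.json()["message"]["content"])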

Python Client

You can also use the Python client ollama-python. To install it, run:

pip install ollama

The usage pattern will feel familiar if you've used OpenAI's Python client. Here's an example:

import ollama
response = ollama.chat(model='llama3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])
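
The client can also stream the response as it is generated, which is handy for chat-style interfaces. Here is a minimal sketch, again assuming llama3 is installed locally:

import ollama

# Passing stream=True returns an iterator of partial responses.
stream = ollama.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries a fragment of the assistant's message.
    print(chunk['message']['content'], end='', flush=True)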

There is also a JavaScript client, which you can read more about here.

Easy Transition

Since Ollama exposes an OpenAI-compatible API, it makes managing a codebase easier. If you later decide to use the ChatGPT API instead, you typically only need to swap the endpoint and model name rather than rewrite any logic, making for a nice developer experience and a seamless transition. Conversely, you can easily switch a ChatGPT-powered app over to your own self-hosted LLM without major code changes or overly complicated wrappers. A sketch of this is shown below.
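
To make this concrete, here is a hedged sketch of the switch in the other direction: the official OpenAI Python client pointed at a local Ollama server. It assumes a recent Ollama version that serves the OpenAI-compatible endpoint under /v1; the API key is required by the client but ignored by Ollama, so any placeholder works. Switching back to ChatGPT is then just a matter of changing base_url, api_key, and the model name.

from openai import OpenAI

# Point the OpenAI client at the local Ollama server instead of api.openai.com.
client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

response = client.chat.completions.create(
    model='llama3',  # assumes the model is pulled locally
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
)
print(response.choices[0].message.content)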

Conclusion

Overall, Ollama provides a simple and efficient way to host and deploy your own LLMs locally. It offers a smooth setup process and integrates well with existing codebases thanks to its compatibility with OpenAI's API. This flexibility allows for cost-effective testing and development without sacrificing the ability to switch back to paid services if needed. With numerous community integrations, including web UIs and RAG applications, Ollama is a promising solution for developers looking to leverage LLMs in their projects. For more information on these integrations, check out the full list here.