Ollama has become the standard for running Large Language Models (LLMs) locally. In this tutorial, I want to show you the most important things you should know about Ollama.
Watch on YouTube: Ollama Full Tutorial: https://youtu.be/AGAETsxjg0o
Ollama is an open-source platform for running and managing large language models (LLMs) entirely on your local machine. It bundles model weights, configuration, and data into a single package defined by a Modelfile. Ollama offers a command-line interface (CLI), a REST API, and Python/JavaScript libraries, allowing you to download models, run them offline, and even call user-defined functions. Running models locally gives you privacy, removes network latency, and keeps your data on your own device.
Visit the official website, https://ollama.com/, to download Ollama. It's available for macOS, Windows, and Linux.
Linux:
curl -fsSL https://ollama.com/install.sh | sh
macOS:
brew install ollama
Windows: download the .exe installer and run it.
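To confirm the installation worked, print the version from a terminal:

ollama --version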
Before running models, it is essential to understand quantization. Ollama typically runs models quantized to 4 bits (q4_0), which significantly reduces memory usage with minimal loss in quality. A rough memory estimate follows the hardware list below.
Recommended Hardware:
7B models (e.g., Llama 3, Mistral): ~8 GB RAM (runs on most modern laptops).
13B–30B models: 16–32 GB RAM.
70B+ models: 64 GB+ RAM or dual GPUs.
GPU: an NVIDIA GPU or Apple Silicon (M1/M2/M3) is highly recommended for speed.
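To see roughly where these numbers come from, here is a back-of-the-envelope estimate in Python. It only accounts for the quantized weights; the real footprint is higher because of the context (KV) cache and runtime overhead:

def approx_weight_memory_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    # Memory for the quantized weights alone, ignoring KV cache and overhead
    bytes_per_weight = bits_per_weight / 8
    return params_billions * 1e9 * bytes_per_weight / (1024 ** 3)

print(f"{approx_weight_memory_gb(7):.1f} GB")   # ~3.3 GB for a 4-bit 7B model
print(f"{approx_weight_memory_gb(70):.1f} GB")  # ~32.6 GB for a 4-bit 70B model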
Go to the Ollama website, click on "Models", and select the model you want to test.
After that, click on the model name and copy the terminal command:
Then, open the terminal window and paste the command:
It will allow you to download and chat with a model immediately.
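For example, for Llama 3 the copied command looks like this:

ollama run llama3

The first run downloads the weights; after that, the model starts from the local copy.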
Ollama’s CLI is central to model management. Common commands include:
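ollama pull llama3      # download a model without running it
ollama run llama3       # download (if needed) and start an interactive chat
ollama list             # list the models installed locally
ollama ps               # show which models are currently loaded in memory
ollama show llama3      # print a model's details, parameters, and template
ollama rm llama3        # remove a model and free the disk space

You can see the full list at any time with ollama --help.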
You can customize a model's personality and constraints using a Modelfile, which works much like a Dockerfile. Note that this is not real fine-tuning of the weights; it packages a system prompt and parameters on top of an existing model.
# 1. Base the model on an existing one
FROM llama3

# 2. Set the creative temperature (0.0 = precise, 1.0 = creative)
PARAMETER temperature 0.7

# 3. Set the context window size (default is 4096 tokens)
PARAMETER num_ctx 4096

# 4. Define the System Prompt (the AI's "brain")
SYSTEM """
You are a Senior Python Backend Engineer.
Only answer with code snippets and brief technical explanations.
Do not be conversational.
"""
FROM defines the base model
SYSTEM sets a system prompt
PARAMETER controls inference behavior
After that, you need to build the model by using this command:
ollama create [change-to-your-custom-name] -f Modelfile
This wraps the model + prompt template together into a reusable package.
Then run it:
ollama run [change-to-your-custom-name]
Ollama can run as a local server that apps can call. To start the server use the command:
ollama serve
It listens on http://localhost:11434 by default.
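A quick way to confirm the server is reachable is to list your installed models through the API:

curl http://localhost:11434/api/tags

Then you can send chat requests to it from any language. For example, from Python with the requests library: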
import requests

r = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [{"role": "user", "content": "Hello Ollama"}],
        "stream": False,  # return one JSON object instead of a stream of chunks
    },
)
print(r.json()["message"]["content"])
This lets you embed Ollama into apps or services.
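Besides /api/chat, there is also an /api/generate endpoint for single-prompt completions without a chat history:

import requests

r = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Explain quantization in one sentence.",
        "stream": False,  # get the full completion as one JSON object
    },
)
print(r.json()["response"])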
Use Ollama inside Python applications with the official library. Run these commands:
Create and activate a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
Install the official library:
pip install ollama
Use this simple Python code:
import ollama

# This sends a message to the model 'gemma:2b'
response = ollama.chat(model='gemma:2b', messages=[
    {
        'role': 'user',
        'content': 'Write a short poem about coding.',
    },
])

# Print the AI's reply
print(response['message']['content'])
This works over the local API automatically when Ollama is running.
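The library also supports streaming, which is useful for showing tokens as they are generated. Here is a minimal sketch, assuming the stream=True flag returns an iterator of partial responses:

import ollama

# Stream the reply chunk by chunk instead of waiting for the full response
stream = ollama.chat(
    model='gemma:2b',
    messages=[{'role': 'user', 'content': 'Write a short poem about coding.'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
print()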
If you prefer not to add a dependency, you can also call the local REST API directly with requests, exactly as shown in the server section above.
Ollama also supports cloud models — useful when your machine can’t run very large models.
First, create an account on https://ollama.com/cloud and sign in. Then, inside the Models page, click on the cloud link and select any model you want to test.
In the models list, you will see models with the -cloud suffix, which means they are available in the Ollama cloud.
Click on it and copy the CLI command. Then, inside the terminal, use:
ollama signin
to sign in to your Ollama account. Once you are signed in, you can run cloud models:
ollama run nemotron-3-nano:30b-cloud
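Cloud models work through the same CLI and local API as local ones; the model is simply executed on Ollama's servers instead of your machine. For example, the earlier Python snippet only needs the model name changed (this assumes you have already run ollama signin on the same machine):

import ollama

# The -cloud model runs on Ollama's servers, but the code is identical to the local case
response = ollama.chat(
    model='nemotron-3-nano:30b-cloud',
    messages=[{'role': 'user', 'content': 'Hello from the cloud!'}],
)
print(response['message']['content'])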
While Ollama is local-first, Ollama Cloud allows you to push your custom models (the ones you built with Modelfiles) to the web to share with your team or use across devices.
ollama push your-username/change-to-your-custom-model-name
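One thing to keep in mind: a model can only be pushed if its name starts with your Ollama username. If you built it under a plain local name, copy it into your namespace first (the model names below are placeholders):

ollama cp my-custom-model your-username/my-custom-model
ollama push your-username/my-custom-model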
That is the complete overview of Ollama! It is a powerful tool that gives you total control over AI. If you like this tutorial, please like it and share your feedback in the section below.
Cheers! ;)