Complete Ollama Tutorial (2026) – LLMs via CLI, Cloud & Python

2026/01/05 13:09

Ollama has become the standard for running Large Language Models (LLMs) locally. In this tutorial, I want to show you the most important things you should know about Ollama.

https://youtu.be/AGAETsxjg0o?embedable=true

Watch on YouTube: Ollama Full Tutorial

What is Ollama?

Ollama is an open-source platform for running and managing large language models (LLMs) entirely on your local machine. It bundles model weights, configuration, and data into a single package defined by a Modelfile. Ollama offers a command-line interface (CLI), a REST API, and Python and JavaScript libraries, so you can download models, run them offline, and even let models call user-defined functions. Running models locally gives you privacy, avoids network round trips, and keeps data on your own device.

Install Ollama

Visit the official website, https://ollama.com/, to download Ollama. It is available for macOS, Windows, and Linux.

Linux:

curl -fsSL https://ollama.com/install.sh | sh

macOS:

brew install ollama

Windows: download the .exe installer and run it.

How to Run Ollama

Before running models, it helps to understand quantization. Ollama typically runs models quantized to 4 bits (for example, q4_0), which significantly reduces memory usage with only a small loss in quality.

Recommended Hardware:

  • 7B to 8B models (e.g., Mistral 7B, Llama 3 8B): require ~8 GB RAM (they run on most modern laptops).

  • 13B to 30B models: require 16 GB to 32 GB RAM.

  • 70B+ models: require 64 GB+ RAM or dual GPUs.

  • GPU: An NVIDIA GPU or Apple Silicon (M1/M2/M3) is highly recommended for speed.
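
To see where these numbers come from, here is a rough back-of-the-envelope sketch. It assumes the weights dominate memory use and that each weight takes exactly the stated number of bits. The function name `estimate_weight_gb` is made up for this example. Real 4-bit formats store some extra scaling data, and the runtime needs additional memory for the context window, so treat the results as lower bounds.

```python
def estimate_weight_gb(params_billions: float, bits_per_weight: float = 4.0) -> float:
    """Approximate size of the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Compare unquantized 16-bit weights with 4-bit quantized weights.
for size in (7, 13, 30, 70):
    fp16 = estimate_weight_gb(size, 16)
    q4 = estimate_weight_gb(size, 4)
    print(f"{size}B parameters: ~{fp16:.1f} GB at 16-bit, ~{q4:.1f} GB at 4-bit")
```

A 7B model drops from roughly 14 GB to roughly 3.5 GB of weights, which is why it fits comfortably on a machine with 8 GB of RAM.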

Go to the Ollama website, click on “Models”, and select the model you want to test.

After that, click on the model name and copy the terminal command shown on its page.

Then open a terminal window and paste the command.

It downloads the model and lets you chat with it immediately.

Ollama CLI — Core Commands

Ollama’s CLI is central to model management. Common commands include:

  • ollama pull <model>: download a model
  • ollama run <model>: run a model interactively
  • ollama list (or ollama ls): list downloaded models
  • ollama rm <model>: remove a model
  • ollama create <name> -f Modelfile: create a custom model from a Modelfile
  • ollama serve: start the Ollama API server
  • ollama ps: show running models
  • ollama stop <model>: stop a running model
  • ollama help: show help
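
If you want to script these commands, one option is a thin wrapper around Python's standard `subprocess` module. This is only a sketch: `build_command` and `run_cli` are hypothetical helper names, not part of Ollama, and the wrapper simply returns `None` when the `ollama` binary is missing or the call fails.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(subcommand: str, *args: str) -> List[str]:
    """Return the argument list for an `ollama` CLI call, e.g. ['ollama', 'pull', 'llama3']."""
    return ["ollama", subcommand, *args]


def run_cli(subcommand: str, *args: str) -> Optional[str]:
    """Run the command and return its stdout, or None if the CLI is missing or the call fails."""
    if shutil.which("ollama") is None:
        return None
    result = subprocess.run(
        build_command(subcommand, *args),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    print(build_command("pull", "llama3"))
    output = run_cli("list")
    print(output if output is not None else "ollama CLI not available")
```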

Advanced Customization: Custom model with Modelfiles

You can customize a model’s personality and constraints using a Modelfile. This does not retrain the model’s weights; it only configures how the base model is run. The format is similar to a Dockerfile.

  • Create a file named Modelfile
  • Add the following configuration:

# 1. Base the model on an existing one
FROM llama3

# 2. Set the creative temperature (0.0 = precise, 1.0 = creative)
PARAMETER temperature 0.7

# 3. Set the context window size (default is 4096 tokens)
PARAMETER num_ctx 4096

# 4. Define the System
SYSTEM """
You are a Senior Python Backend Engineer.
Only answer with code snippets and brief technical explanations.
Do not be conversational.
"""

FROM defines the base model

SYSTEM sets a system prompt

PARAMETER controls inference behavior

After that, you need to build the model by using this command:

ollama create [change-to-your-custom-name] -f Modelfile

This packages the base model, parameters, and system prompt into a reusable model.

Then run it:

ollama run [change-to-your-custom-name]
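
If you maintain several custom models, you may prefer to generate the Modelfile from a script. The sketch below builds the same three kinds of instructions (FROM, PARAMETER, SYSTEM) as text. `render_modelfile` is a hypothetical helper written for this example, and it assumes the system prompt does not itself contain triple quotes.

```python
from typing import Dict


def render_modelfile(base: str, system: str, parameters: Dict[str, object]) -> str:
    """Build Modelfile text from a base model, a system prompt, and PARAMETER values."""
    lines = [f"FROM {base}"]
    for name, value in parameters.items():
        lines.append(f"PARAMETER {name} {value}")
    lines.append(f'SYSTEM """\n{system.strip()}\n"""')
    return "\n".join(lines) + "\n"


text = render_modelfile(
    base="llama3",
    system="You are a Senior Python Backend Engineer.",
    parameters={"temperature": 0.7, "num_ctx": 4096},
)
print(text)
```

Write the returned text to a file named Modelfile and build it with the ollama create command.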


Ollama Server (Local API)

Ollama can run as a local server that apps can call. To start the server use the command:

ollama serve

It listens on http://localhost:11434 by default.

Raw HTTP

import requests

r = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [{"role": "user", "content": "Hello Ollama"}],
        "stream": False,  # return one JSON object instead of a stream of chunks
    },
)
r.raise_for_status()
print(r.json()["message"]["content"])

This lets you embed Ollama into apps or services.
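
By default, the chat endpoint streams its answer as newline-delimited JSON, one chunk per line, with a final chunk marked "done": true. The sketch below shows one way to stitch such chunks back together. `collect_stream` is a hypothetical helper, and the sample chunks are hand-written to mimic the shape of the stream so the example runs without a server.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[bytes]) -> str:
    """Concatenate the assistant text from newline-delimited JSON chunks of /api/chat."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines between chunks
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


# Offline demo with hand-written chunks in the shape the streaming endpoint returns.
sample = [
    b'{"message": {"role": "assistant", "content": "Hello"}, "done": false}\n',
    b'{"message": {"role": "assistant", "content": " there"}, "done": false}\n',
    b'{"message": {"role": "assistant", "content": ""}, "done": true}\n',
]
print(collect_stream(sample))  # Hello there
```

Against a live server, you could pass the line iterator of a streamed `requests` response (for example, `requests.post(..., stream=True).iter_lines()`) to the same function.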

Python Integration

Use Ollama inside Python applications with the official library.

Create and activate a virtual environment:

python3 -m venv .venv
source .venv/bin/activate

Install the official library:

pip install ollama

Use this simple Python code:

import ollama

# This sends a message to the model 'gemma:2b'
response = ollama.chat(
    model='gemma:2b',
    messages=[
        {'role': 'user', 'content': 'Write a short poem about coding.'},
    ],
)

# Print the AI's reply
print(response['message']['content'])

This works over the local API automatically when Ollama is running.
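
Each chat call is stateless, so a multi-turn conversation has to resend the earlier messages. Here is a minimal sketch of one way to keep that history. The `Conversation` class and `ollama_send` function are hypothetical names for this example. The real library is only imported when the default backend is used, so the demo at the bottom runs with a stand-in backend and no server.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def ollama_send(model: str, messages: List[Message]) -> str:
    """Default backend: calls the official library (needs `pip install ollama` and a running server)."""
    import ollama  # imported lazily so the rest of the sketch runs without it

    response = ollama.chat(model=model, messages=messages)
    return response["message"]["content"]


class Conversation:
    """Keeps the message history so each request includes the earlier turns."""

    def __init__(self, model: str, send: Callable[[str, List[Message]], str] = ollama_send):
        self.model = model
        self.send = send
        self.messages: List[Message] = []

    def ask(self, text: str) -> str:
        self.messages.append({"role": "user", "content": text})
        reply = self.send(self.model, self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Offline demo: the stand-in backend reports how many messages it received.
demo = Conversation("gemma:2b", send=lambda model, msgs: f"{len(msgs)} message(s) seen")
print(demo.ask("Hi"))
print(demo.ask("And again"))
```

With Ollama running, `Conversation("gemma:2b").ask("...")` would use the real backend instead.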

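You can also call the local server directly over HTTP:

import requests

r = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [{"role": "user", "content": "Hello Ollama"}],
        "stream": False,  # return one JSON object instead of a stream of chunks
    },
)
r.raise_for_status()
print(r.json()["message"]["content"])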

Using Ollama Cloud

Ollama also supports cloud models, which are useful when your machine can’t run very large models.

First, create an account on https://ollama.com/cloud and sign in. Then, on the Models page, click on the cloud link and select any model you want to test.

In the models list, cloud models carry the -cloud suffix, which means they are available in Ollama Cloud.

Click on it and copy the CLI command. Then, in the terminal, sign in to your Ollama account:

ollama signin

Once you are signed in, run the cloud model:

ollama run nemotron-3-nano:30b-cloud
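
In an application, you might decide between a local and a cloud model based on available memory. The following is a small sketch of that decision only; `choose_model` is a hypothetical helper, and the memory figures are illustrative values you would supply yourself.

```python
def choose_model(available_ram_gb: float, local_model: str, local_needs_gb: float, cloud_model: str) -> str:
    """Return the local model when it fits in memory, otherwise the cloud model."""
    return local_model if available_ram_gb >= local_needs_gb else cloud_model


# Illustrative numbers: 16 GB of RAM and a 70B model that needs about 64 GB.
model = choose_model(16, "llama3:70b", 64, "nemotron-3-nano:30b-cloud")
print(model)  # nemotron-3-nano:30b-cloud
```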

Your Own Model in the Cloud

While Ollama is local-first, Ollama Cloud allows you to push your custom models (the ones you built with Modelfiles) to the web to share with your team or use across devices.

  • Create an account at ollama.com.
  • Add your public key (found in ~/.ollama/id_ed25519.pub).
  • Push your custom model:

ollama push your-username/change-to-your-custom-model-name

Conclusion

That is the complete overview of Ollama! It is a powerful tool that gives you total control over AI. If you like this tutorial, please like it and share your feedback in the section below.

Cheers! ;)
