TL;DR: Ollama is a tool that lets you download, run, and manage AI models on your own computer. It works on macOS, Linux, and Windows and exposes a local API so you can use those models in scripts, apps, and workflows.

What is Ollama?

Ollama is a platform for running LLMs locally. It gives you a simple command-line interface, a local server, and a REST API, all running on your machine by default. That means you can download a model, start it from the terminal, and send prompts to it without building the runtime stack yourself.

So, how does Ollama work behind the scenes? 

After installation, you pull a model from the Ollama library. Ollama stores the model locally, loads it when needed, and makes it available through the terminal or the local REST API. By default, models stay in memory for 5 minutes after use, reducing startup delay for repeated prompts.
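
In practice, that whole loop is a couple of commands. A minimal sketch, using Llama 3.2 as an example model from the Ollama library:

ollama pull llama3.2   # download the model weights locally

ollama run llama3.2    # start an interactive session with the model

ollama run llama3.2 "Summarize what a REST API is."   # or pass a one-off prompt

The five-minute keep-alive mentioned above can be overridden per request with the API's keep_alive parameter.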

This local approach has clear trade-offs. You get more control over privacy, latency, and deployment. At the same time, performance depends on your own CPU, GPU, RAM, and disk space. Ollama supports Apple GPUs via Metal, NVIDIA GPUs on Windows and Linux, AMD Radeon GPUs via ROCm, and additional Vulkan-based support on Windows and Linux. 

Key Features of Ollama

Ollama has grown popular because it keeps the setup simple while still offering useful control. These are the features that matter most in day-to-day use:

  • Local model execution
  • Cross-platform support
  • Straightforward CLI commands (see the sketch after this list)
  • Local REST API
  • Model customization through Modelfiles (also shown below)
  • Broad model coverage
  • Optional cloud features

Did You Know? The Ollama library supports over 1,700 different local LLMs. It has over 140,000 GitHub stars and 11,500 forks. (Source: Splunk)

How to Install Ollama on Mac/Linux/Windows?

If you are looking up how to install Ollama, the good news is that the process is simple across all major desktop platforms. The exact method depends on your operating system. 

macOS

Ollama is available on macOS, and Apple Silicon systems get GPU acceleration through Metal. You can download the app from Ollama’s site, or install the CLI through Homebrew:

brew install ollama

Linux

On Linux, Ollama provides an install script, and the docs also cover service-based startup with systemctl. This is useful if you want Ollama running as a background service. 

curl -fsSL https://ollama.com/install.sh | sh

sudo systemctl start ollama

sudo systemctl status ollama
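
Once the service is up, a quick way to confirm the local API is reachable is to hit the version endpoint; this assumes the default port, 11434:

curl http://localhost:11434/api/version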

Windows

Ollama runs as a native Windows application. The official docs state that it works on Windows 10 22H2 or newer, supports NVIDIA and AMD Radeon GPUs, and installs in your home directory by default without requiring Administrator access. To install it, download OllamaSetup.exe from https://ollama.com/download and run the installer.

Ollama vs Cloud-Based LLMs

Here is a practical comparison of Ollama and cloud-hosted large language model platforms.

| Factor | Ollama | Cloud-Based LLMs |
|---|---|---|
| Where does the inference run? | On your device | On provider servers |
| Internet dependence | Needed to download models; local inference can stay offline | Usually needed for every request |
| Data path | Local prompts can stay on your machine | Prompts are sent to a remote API |
| Setup effort | Requires installation and local resources | Faster to start with an API key |
| Cost model | No per-call API fee, but you bear hardware costs | Usage-based pricing is common |
| Scalability | Limited by local hardware | Easier to scale across users and workloads |
| Best fit | Private experiments, local assistants, offline-friendly tools | Large apps, high concurrency, managed infrastructure |

Supported Models List

The Ollama platform supports diverse model families across general language tasks, coding, reasoning, vision, and embeddings. The pull commands after this list show how to fetch a few of them.

  • Llama 3.1 and Llama 3.2: Strong general-purpose model families from Meta, with sizes ranging from smaller local-friendly options to much larger variants. Llama 3.2 also includes text-focused multilingual models, and the broader Llama collection is one of the most common reasons people search for ollama vs llama.
  • Gemma 3: Google’s lightweight family with multimodal support, a 128K context window, and support for more than 140 languages.
  • Qwen 3 and Qwen 2.5: Strong multilingual model families with multiple sizes and broad general-purpose coverage.
  • Qwen2.5-Coder and Qwen3-Coder: Coding-focused models aimed at code generation, code reasoning, and code fixing.
  • DeepSeek-R1: A reasoning-oriented family available in many sizes, from small local variants to very large versions.
  • LLaVA: A multimodal model for image-and-text understanding, useful for visual reasoning tasks.
  • nomic-embed-text: An embedding model for retrieval, search, and semantic similarity workflows. It is designed only for embeddings, not chat.
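
Fetching any of these is the same one-liner; the tags below are the names these families use in the Ollama library (size variants such as gemma3:12b also exist):

ollama pull gemma3
ollama pull qwen2.5-coder
ollama pull deepseek-r1
ollama pull nomic-embed-text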

Real-World Use Cases of Ollama

Ollama fits best in situations where local control matters. 

  • A developer can use it to build a private coding assistant
  • A team can use it to test prompts without sending internal text to a remote model provider
  • A student can use it to study model behavior, compare open models, and experiment with prompts on their own machine

Ollama API Integration

Ollama exposes a local REST API after installation. By default, the base URL is: http://localhost:11434/api

For Ollama’s cloud-hosted models, the same API pattern is available through: https://ollama.com/api

That split is important. It shows that Ollama can support both local and cloud workflows, but its identity is still rooted in local inference. 

A simple request looks like this:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Explain recursion in simple terms."
}'
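
By default, /api/generate streams the answer back as a sequence of JSON objects, one per chunk of the response. If you prefer a single JSON object instead, the API accepts a stream flag:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Explain recursion in simple terms.",
  "stream": false
}'

There is also a /api/chat endpoint that takes a messages array for multi-turn conversations.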

Key Takeaways

  • Ollama helps you install, run, customize, and integrate models without having to build everything from scratch
  • Ollama is especially useful for people who care about local execution, privacy, repeatable workflows, and hands-on experimentation
  • Ollama does not replace every cloud AI workflow, but it gives developers, learners, and teams a serious local option 

FAQs

1. What is Ollama and its purpose?

Ollama is used to run large language models locally on your machine. It helps developers build AI apps, test models, and work with LLMs without relying on cloud-based services.

2. Is Ollama like ChatGPT?

Not exactly. ChatGPT is a hosted AI service, while Ollama is a tool to run models locally. Ollama lets you use similar models but with more control and privacy.

3. Does Ollama cost money?

Ollama itself is free to use. However, some models or advanced use cases may involve costs depending on how they are sourced or deployed.

4. What models does Ollama support?

Ollama supports models like Llama, Mistral, Gemma, and other open-source LLMs. The available models depend on what the Ollama library provides for local deployment.

5. Is Ollama open source?

Ollama’s core runtime and CLI are open source under the MIT license, though some official components, such as the desktop apps, are not fully open. Either way, it is built around running open-source models locally and gives developers tools to work with them.
