Using Ollama | Coco Docs

Ollama lets you run large language models locally on your machine, which gives you privacy, offline capability, and no API costs. This guide covers installing Ollama, configuring it for coco, choosing a model, and troubleshooting.

What is Ollama?

Ollama is a tool that makes it easy to run large language models locally. It provides:

Privacy: Your code never leaves your machine
No API Costs: Run models without paying per request
Offline Capability: Work without an internet connection once models are downloaded
Performance: Direct access to your hardware (CPU/GPU)
Model Variety: Access to many open-source models

Installation

1. Install Ollama

macOS:

bash

# Using Homebrew
brew install ollama

# Or download from https://ollama.ai/

Linux:

bash

# Install script
curl -fsSL https://ollama.ai/install.sh | sh

# Or using a package manager, if your distribution packages Ollama
# Ubuntu/Debian
sudo apt install ollama

# Arch Linux
yay -S ollama

Windows:

bash

# Download installer from https://ollama.ai/
# Or use Windows Subsystem for Linux (WSL)

2. Start Ollama Service

bash

# Start the Ollama service in the foreground
ollama serve

# Or run it as a background service
sudo systemctl start ollama      # Linux (systemd)
brew services start ollama       # macOS (Homebrew install)
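
After starting the service, scripts often need to wait until the API actually answers before pulling or using a model. The following is a minimal sketch of such a wait loop, assuming the default endpoint and the `/api/version` route used later in this guide. `wait_for_ollama` is a hypothetical helper name, not part of Ollama or coco, and the 30-second default timeout is an arbitrary choice.

```bash
# Poll the Ollama HTTP API until it answers or the timeout expires.
# wait_for_ollama is a hypothetical helper, not an Ollama or coco command.
wait_for_ollama() {
  wfo_endpoint="${1:-http://localhost:11434}"
  wfo_timeout="${2:-30}"   # seconds; arbitrary default
  wfo_elapsed=0
  while [ "$wfo_elapsed" -lt "$wfo_timeout" ]; do
    # -f makes curl fail on HTTP errors, -sS keeps it quiet except for errors
    if curl -fsS "$wfo_endpoint/api/version" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    wfo_elapsed=$((wfo_elapsed + 1))
  done
  echo "Ollama did not respond at $wfo_endpoint within ${wfo_timeout}s" >&2
  return 1
}
```

A typical use would be `ollama serve &` followed by `wait_for_ollama && ollama pull qwen2.5-coder:7b`, which avoids guessing a fixed sleep duration.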

3. Pull a Model

bash

# Recommended models for code generation
ollama pull qwen2.5-coder:7b      # Best balance of speed/quality
ollama pull llama3.1:8b           # Good general purpose model
ollama pull codellama:13b         # Specialized for code

# Smaller models for faster responses
ollama pull qwen2.5-coder:1.5b    # Very fast, good for simple commits
ollama pull llama3.2:3b           # Fast and capable

# Larger models for better quality (requires more RAM)
ollama pull qwen2.5-coder:32b     # Highest quality code model
ollama pull llama3.1:70b          # Excellent but requires 40GB+ RAM

Quick Setup with Coco

Using `coco init`

The easiest way to configure Ollama with coco:

bash

# Run the setup wizard
coco init

# Select "ollama" when prompted for provider
# Choose from your installed models
# Wizard will configure everything automatically

Manual Configuration

Create or update your .coco.config.json:

json

{
  "service": {
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "endpoint": "http://localhost:11434",
    "authentication": {
      "type": "None"
    }
  }
}

Recommended Models

For Code Generation (Recommended)

Qwen2.5-Coder Series (Best for coco):

bash

ollama pull qwen2.5-coder:1.5b    # 1.5B params - Very fast, 2GB RAM
ollama pull qwen2.5-coder:3b      # 3B params - Fast, 4GB RAM
ollama pull qwen2.5-coder:7b      # 7B params - Balanced, 8GB RAM ⭐ Recommended
ollama pull qwen2.5-coder:14b     # 14B params - High quality, 16GB RAM
ollama pull qwen2.5-coder:32b     # 32B params - Highest quality, 32GB RAM

CodeLlama Series:

bash

ollama pull codellama:7b          # 7B params - Good for code, 8GB RAM
ollama pull codellama:13b         # 13B params - Better quality, 16GB RAM
ollama pull codellama:34b         # 34B params - High quality, 32GB RAM

For General Use

Llama 3.1/3.2 Series:

bash

ollama pull llama3.2:1b          # 1B params - Very fast, 2GB RAM
ollama pull llama3.2:3b          # 3B params - Fast and capable, 4GB RAM
ollama pull llama3.1:8b          # 8B params - Excellent balance, 8GB RAM ⭐ Recommended
ollama pull llama3.1:70b         # 70B params - Top quality, 40GB+ RAM

DeepSeek R1 Series (Latest):

bash

ollama pull deepseek-r1:1.5b     # 1.5B params - Very fast reasoning
ollama pull deepseek-r1:8b       # 8B params - Good reasoning, 8GB RAM
ollama pull deepseek-r1:32b      # 32B params - Excellent reasoning, 32GB RAM

Configuration Options

Basic Configuration

json

{
  "service": {
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "endpoint": "http://localhost:11434",
    "tokenLimit": 2048,
    "temperature": 0.4,
    "maxConcurrent": 1,
    "authentication": {
      "type": "None"
    }
  }
}

Advanced Configuration

json

{
  "service": {
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "endpoint": "http://localhost:11434",
    "tokenLimit": 4096,
    "temperature": 0.3,
    "maxConcurrent": 1,
    "maxParsingAttempts": 5,
    "requestOptions": {
      "timeout": 120000,
      "maxRetries": 3
    },
    "authentication": {
      "type": "None"
    },
    "fields": {
      "numCtx": 4096,
      "numPredict": 2048,
      "repeatPenalty": 1.1,
      "topK": 40,
      "topP": 0.9,
      "seed": -1,
      "stop": ["\n\n", "```"]
    }
  }
}

Configuration Parameters Explained

| Parameter | Description | Default | Recommended |
| --- | --- | --- | --- |
| `model` | Ollama model name | - | `qwen2.5-coder:7b` |
| `endpoint` | Ollama server URL | `http://localhost:11434` | Default |
| `tokenLimit` | Max tokens per request | `2048` | `2048-4096` |
| `temperature` | Randomness (0.0-1.0) | `0.4` | `0.3-0.4` |
| `maxConcurrent` | Concurrent requests | `1` | `1` (Ollama limitation) |
| `numCtx` | Context window size | `2048` | `4096` |
| `numPredict` | Max tokens to generate | `128` | `1024-2048` |
| `repeatPenalty` | Repetition penalty | `1.1` | `1.1` |
| `topK` | Top-K sampling | `40` | `40` |
| `topP` | Top-P sampling | `0.9` | `0.9` |
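
To see how these sampling parameters affect a model independently of coco, they can be sent straight to Ollama's `/api/generate` endpoint. The sketch below builds such a request body. The snake_case option names (`num_ctx`, `num_predict`, and so on) are Ollama's own; that coco's camelCase `fields` correspond to them one-to-one is an assumption, and `generate_payload` is a hypothetical helper name.

```bash
# Print a JSON body for Ollama's /api/generate endpoint.
# generate_payload is a hypothetical helper. It does no JSON escaping, so the
# prompt must not contain double quotes, backslashes, or newlines.
generate_payload() {
  gp_model="$1"
  gp_prompt="$2"
  printf '{"model":"%s","prompt":"%s","stream":false,"options":{"temperature":0.3,"num_ctx":4096,"num_predict":1024,"repeat_penalty":1.1,"top_k":40,"top_p":0.9}}\n' \
    "$gp_model" "$gp_prompt"
}
```

For example, `generate_payload qwen2.5-coder:7b "Write a commit message for adding authentication" | curl -s http://localhost:11434/api/generate -d @-` sends the request to a local server, which makes it easy to compare outputs while changing one option at a time.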

Performance Optimization

Hardware Requirements

Minimum Requirements:

RAM: 8GB (for 7B models)
Storage: 10GB free space
CPU: Modern multi-core processor

Recommended Setup:

RAM: 16GB+ (for better performance)
GPU: NVIDIA GPU with 8GB+ VRAM (optional but faster)
Storage: SSD for faster model loading
CPU: 8+ cores for better inference speed

GPU Acceleration

NVIDIA GPU (CUDA):

bash

# Ollama automatically uses GPU if available
# Verify GPU usage
ollama ps

# Check GPU memory usage
nvidia-smi

Apple Silicon (M1/M2/M3):

bash

# Ollama automatically uses Metal acceleration
# Monitor with Activity Monitor

AMD GPU (ROCm - Linux only):

bash

# Install ROCm drivers first
# Ollama will detect and use AMD GPU

Model Selection by Hardware

8GB RAM (alternative: `llama3.2:3b`):

json

{
  "service": {
    "model": "qwen2.5-coder:3b"
  }
}

16GB RAM (alternative: `llama3.1:8b`):

json

{
  "service": {
    "model": "qwen2.5-coder:7b"
  }
}

32GB+ RAM (alternative: `codellama:13b`):

json

{
  "service": {
    "model": "qwen2.5-coder:14b"
  }
}
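
The tiers above can be turned into a small lookup, sketched here under a few assumptions: `suggest_model` and `total_ram_gb` are hypothetical helper names, the fallback to `qwen2.5-coder:1.5b` below 8GB is inferred from the model table earlier in this guide rather than stated in this section, and RAM is rounded to the nearest whole GB because the operating system usually reports slightly less than the nominal size.

```bash
# Map total RAM (whole GB) to the model tier suggested in this section.
# suggest_model is a hypothetical helper; the sub-8GB fallback is an assumption.
suggest_model() {
  sm_ram_gb="$1"
  if [ "$sm_ram_gb" -ge 32 ]; then
    echo "qwen2.5-coder:14b"
  elif [ "$sm_ram_gb" -ge 16 ]; then
    echo "qwen2.5-coder:7b"
  elif [ "$sm_ram_gb" -ge 8 ]; then
    echo "qwen2.5-coder:3b"
  else
    echo "qwen2.5-coder:1.5b"
  fi
}

# Report total RAM rounded to the nearest GB.
# Linux exposes it in /proc/meminfo (kB); macOS via sysctl hw.memsize (bytes).
total_ram_gb() {
  if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 + 0.5 }' /proc/meminfo
  else
    sysctl -n hw.memsize | awk '{ printf "%d\n", $1 / 1024 / 1024 / 1024 + 0.5 }'
  fi
}
```

Running `ollama pull "$(suggest_model "$(total_ram_gb)")"` would then pull the tier that matches the current machine. Treat the result as a starting point, since other running applications also compete for memory.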

Remote Ollama Setup

Running Ollama on Another Machine

Server Setup:

bash

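# On the server machine: listen on all interfaces for this one run
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# Or export the variable for the current shell session
# (add the export line to your shell profile to make it permanent)
export OLLAMA_HOST=0.0.0.0:11434
ollama serve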

Client Configuration:

json

{
  "service": {
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "endpoint": "http://192.168.1.100:11434",
    "authentication": {
      "type": "None"
    }
  }
}
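
Before pointing coco at a remote server, it can help to confirm that the endpoint is reachable and that the configured model exists there. The sketch below lists model names from Ollama's `/api/tags` route without requiring `jq`. `remote_models` is a hypothetical helper name, and the text-based extraction assumes the response keeps each model name in a `"name"` key, so it is a quick check rather than a robust JSON parser.

```bash
# List the model names a given Ollama endpoint reports via /api/tags.
# remote_models is a hypothetical helper; prints nothing if the request fails.
remote_models() {
  rm_endpoint="${1:-http://localhost:11434}"
  curl -fsS "$rm_endpoint/api/tags" \
    | grep -o '"name":[[:space:]]*"[^"]*"' \
    | sed 's/^"name":[[:space:]]*"//; s/"$//'
}
```

For the client configuration above, `remote_models http://192.168.1.100:11434` should include `qwen2.5-coder:7b` in its output. An empty result suggests the server is unreachable or has no models pulled.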

Docker Setup

Run Ollama in Docker:

bash

# CPU only
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

# Or with NVIDIA GPU support (requires the NVIDIA Container Toolkit);
# run one of the two commands, since both use the container name "ollama"
docker run -d \
  --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

# Pull models
docker exec -it ollama ollama pull qwen2.5-coder:7b

Troubleshooting

Common Issues

1. Ollama Service Not Running

bash

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama service
ollama serve

# Or as system service (Linux)
sudo systemctl start ollama
sudo systemctl enable ollama

2. Model Not Found

bash

# List installed models
ollama list

# Pull the model if missing
ollama pull qwen2.5-coder:7b

# Check model name in coco config matches exactly
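
The exact-match check can be scripted. This is a minimal sketch, assuming `ollama list` prints a header row followed by one model per line with the name in the first column. `model_installed` is a hypothetical helper name.

```bash
# Succeed only if the given name exactly matches an installed model.
# model_installed is a hypothetical helper, not an Ollama or coco command.
model_installed() {
  # Skip the header row, keep the NAME column, then require a whole-line match
  ollama list | awk 'NR > 1 { print $1 }' | grep -Fxq -- "$1"
}
```

For example, `model_installed qwen2.5-coder:7b || ollama pull qwen2.5-coder:7b` pulls the model only when it is missing. Because the match is exact, the tag matters: a model pulled without a tag is listed as `name:latest`, so the value in the coco config has to include the same tag.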

3. Connection Refused

bash

# Check Ollama endpoint
curl http://localhost:11434/api/version

# Verify that the endpoint (including the port) in .coco.config.json matches:
#   "service": { "endpoint": "http://localhost:11434" }

4. Out of Memory Errors

bash

# Use a smaller model
ollama pull qwen2.5-coder:3b

# Or reduce the context size in .coco.config.json (for example from 4096 to 2048):
#   "service": { "fields": { "numCtx": 2048 } }

5. Slow Performance

bash

# Check system resources
htop  # or Activity Monitor on macOS

# Use a smaller model for faster responses
ollama pull qwen2.5-coder:1.5b

# Reduce token limits in .coco.config.json:
#   "service": { "tokenLimit": 1024, "fields": { "numPredict": 512 } }

Debugging Commands

bash

# Check Ollama status
ollama ps

# Test model directly
ollama run qwen2.5-coder:7b "Write a commit message for adding authentication"

# Check Ollama logs (Linux)
journalctl -u ollama -f

# Verbose coco output
coco commit --verbose

Best Practices

1. Model Selection

Start small: Begin with 3B-7B models, upgrade if needed
Code-specific models: Use qwen2.5-coder or codellama for better code understanding
Match hardware: Don't use models larger than your RAM can handle

2. Configuration Tuning

json

{
  "service": {
    "temperature": 0.3,
    "maxParsingAttempts": 5,
    "tokenLimit": 2048,
    "fields": {
      "numCtx": 4096,
      "repeatPenalty": 1.1
    }
  }
}

temperature: Lower values give more consistent commits
maxParsingAttempts: Set higher for Ollama, whose output parses less reliably
tokenLimit: Balances context against speed
numCtx: A larger context gives better understanding
repeatPenalty: Reduces repetitive output

3. Performance Tips

Keep models loaded: Ollama keeps a recently used model in memory (5 minutes by default), so back-to-back requests skip the load time
Use SSD storage: Faster model loading
Monitor resources: Watch RAM/CPU usage during inference
Batch operations: Process multiple commits together when possible

4. Privacy and Security

Local processing: All data stays on your machine
No internet required: Works completely offline
Secure by default: No API keys or external services
Audit trail: Full control over model and data

Integration Examples

Team Setup

Shared Model Configuration:

json

{
  "service": {
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "endpoint": "http://team-ollama-server:11434",
    "temperature": 0.3,
    "tokenLimit": 2048
  },
  "conventionalCommits": true,
  "mode": "interactive"
}

CI/CD Integration

GitHub Actions Example:

yaml

name: Generate Commit Messages
on: [push]

jobs:
  commit-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Ollama
        run: |
          curl -fsSL https://ollama.ai/install.sh | sh
          ollama serve &
          sleep 10
          ollama pull qwen2.5-coder:3b

      - name: Install Coco
        run: npm install -g git-coco

      - name: Generate Commit Message
        run: coco --verbose commit

Development Workflow

Pre-commit Hook:

bash

#!/bin/sh
# .git/hooks/pre-commit

# Generate commit message suggestion
echo "Suggested commit message:"
coco commit

# Git does not attach the terminal to a hook's stdin, so read from /dev/tty
echo "Continue with commit? (y/n)"
read -r response < /dev/tty
if [ "$response" != "y" ]; then
    exit 1
fi

Comparison with Cloud APIs

| Feature | Ollama | OpenAI API | Anthropic API |
| --- | --- | --- | --- |
| Privacy | ✅ Local | ❌ Cloud | ❌ Cloud |
| Cost | ✅ Free | 💰 Pay per use | 💰 Pay per use |
| Offline | ✅ Yes | ❌ No | ❌ No |
| Speed | ⚡ Hardware dependent | ⚡ Fast | ⚡ Fast |
| Quality | 📊 Model dependent | 📊 Excellent | 📊 Excellent |
| Setup | 🔧 More complex | 🔧 Simple | 🔧 Simple |
| Updates | 🔄 Manual | 🔄 Automatic | 🔄 Automatic |

Advanced Use Cases

Custom Model Fine-tuning

bash

# Write a Modelfile for your codebase. It sets parameters and a system
# prompt on top of the base model; it does not retrain any weights.
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:7b
PARAMETER temperature 0.3
PARAMETER top_p 0.9
SYSTEM "You are a commit message generator for a TypeScript React project. Focus on conventional commits format."
EOF

# Create the custom model from the Modelfile
ollama create my-coco-model -f Modelfile

Multi-Model Setup

json

{
  "service": {
    "provider": "ollama",
    "model": "qwen2.5-coder:7b"
  }
}

Switch models based on context:

bash

# For complex changes, use larger model
COCO_SERVICE_MODEL=qwen2.5-coder:14b coco commit

# For simple changes, use faster model
COCO_SERVICE_MODEL=qwen2.5-coder:1.5b coco commit
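
The choice between the two models can be automated from the size of the staged diff. The following is a sketch only: `staged_changed_lines` and `pick_model` are hypothetical helper names, and the 200-line default threshold is an arbitrary assumption to tune for your repository.

```bash
# Count added plus deleted lines in the staged changes.
# Binary files report "-" in --numstat, which awk treats as 0.
staged_changed_lines() {
  git diff --cached --numstat | awk '{ total += $1 + $2 } END { print total + 0 }'
}

# Choose the larger model when the change exceeds the threshold.
# pick_model is a hypothetical helper; the default of 200 lines is arbitrary.
pick_model() {
  pm_lines="$1"
  pm_threshold="${2:-200}"
  if [ "$pm_lines" -gt "$pm_threshold" ]; then
    echo "qwen2.5-coder:14b"
  else
    echo "qwen2.5-coder:1.5b"
  fi
}
```

With both helpers defined, `COCO_SERVICE_MODEL="$(pick_model "$(staged_changed_lines)")" coco commit` selects the model per commit. Line count is a rough proxy for complexity, so a lower threshold may suit repositories with small but intricate changes.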

This comprehensive guide provides everything needed to successfully use Ollama with coco, from basic setup to advanced configurations and troubleshooting.
