Author: AcrossWP

  • Run AI Locally: Complete Guide to llama.cpp + WordPress

    Set up a powerful, private AI assistant on your WordPress site using local models. No API keys. No subscriptions. Full control.

    Published: April 2026 | 12 min read | For Developers


    Why Run AI Locally? The Privacy-First Revolution

    Every time you use a cloud AI API, your data travels across the internet to someone else’s servers. OpenAI, Anthropic, Google—they all log requests. Your content, your users’ interactions, your business logic—it’s all third-party data.

    What if you could run a powerful AI model on your own hardware, keeping everything private, offline-capable, and completely under your control?

    That’s exactly what llama.cpp makes possible.

    💡 What you’ll learn: This guide covers three deployment scenarios—from your laptop to a GPU rig on your LAN to a production cloud server. Pick the one that fits your setup.

    The Case for Local AI

    • Privacy: Data never leaves your infrastructure
    • Cost: No per-request fees; one-time hardware investment
• Latency: Sub-100ms per-token generation on modern hardware, with no network round-trip
    • Ownership: Full control over model behavior and updates
    • Offline: Works without internet (for local deployments)

    The catch? You need to run the models yourself. But that’s where llama.cpp comes in—it makes that easy.


    Installation & Setup

    What is llama.cpp?

    llama.cpp is a lightweight C++ inference engine that runs language models locally. It’s blazingly fast, supports GPU acceleration, and requires minimal dependencies. Think of it as the “Apache for AI models.”

    Install llama.cpp on macOS

    The fastest way is via Homebrew:

    brew install llama.cpp

    Verify the installation:

    llama-server --help

    Install on Linux

    Build from source (takes 5–10 minutes):

    git clone https://github.com/ggml-org/llama.cpp.git
    cd llama.cpp
    cmake -B build
    cmake --build build --config Release

    The resulting binary is at build/bin/llama-server.

    For details, visit the official llama.cpp downloads page.


    Downloading & Choosing Models

    What is GGUF?

    GGUF is the binary format that llama.cpp uses. It’s optimized for inference speed and memory efficiency. Models are hosted on Hugging Face in multiple quantization levels.

Quantization   Size       Quality   Speed      Best For
Q2_K           Smallest   Lower     Fastest    Devices with <2GB RAM
Q4_K_M         Small      Good      Fast       Laptops, modest servers
Q5_K_M         Medium     Better    Moderate   Better quality, still efficient
Q8_0           Largest    Best      Slowest    High-end GPUs, unlimited RAM

    Download a Model

    Install the Hugging Face CLI:

    pip install -U huggingface_hub

    Download TinyLlama (636 MB, great for testing):

    huggingface-cli download \
      TheBloke/TinyLlama-1.1B-Chat-GGUF \
      tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
      --local-dir ~/models \
      --local-dir-use-symlinks False

    Verify the download:

    ls -lh ~/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

    Recommended Starter Models

Model                       Size     Use Case
TinyLlama 1.1B (Q4_K_M)     636 MB   Testing, ultra-low resource
Phi-3 Mini 3.8B (Q4_K_M)    2.2 GB   Fast & practical
Mistral 7B (Q4_K_M)         4.1 GB   High quality
Llama 3 8B (Q4_K_M)         4.7 GB   Best-in-class, needs GPU or 16GB+ RAM

    Scenario 1: Same Machine (Localhost)

    This is the simplest setup. WordPress and llama.cpp both run on your laptop or desktop.

    🖥️ Localhost Setup

    Best for: Local development, prototyping, single-user testing

    Requirements: One machine with enough RAM for your chosen model

    Complexity: ⭐ (Easiest)

    Step 1: Start the llama.cpp Server

    llama-server --models-dir ~/models

    The server will start on http://127.0.0.1:8080. You’ll see output confirming the model loaded successfully.

    Step 2: Open the Web UI (Optional)

    Open your browser to http://127.0.0.1:8080/ and start chatting with your model immediately. This verifies everything works before WordPress integration.
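Prefer to script the check? llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint, so a few lines of plain PHP (no WordPress required) can confirm the server answers. A minimal sketch, assuming the default address from Step 1:

<?php
// Smoke test against the local llama-server.
$payload = json_encode([
    'messages'   => [['role' => 'user', 'content' => 'Say hello in five words.']],
    'max_tokens' => 32,
]);

$context = stream_context_create([
    'http' => [
        'method'  => 'POST',
        'header'  => "Content-Type: application/json\r\n",
        'content' => $payload,
        'timeout' => 60,
    ],
]);

$body = file_get_contents('http://127.0.0.1:8080/v1/chat/completions', false, $context);
$data = json_decode($body, true);
echo $data['choices'][0]['message']['content'] ?? 'No response';

A short greeting back means the server is ready for the WordPress side.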

    Step 3: Configure the WordPress Plugin

    1. Go to Settings → AI Provider for llama.cpp
    2. Set the Server URL to http://127.0.0.1:8080
    3. Click Save

    The plugin will auto-detect your available models. WordPress now has local AI powers.

    ✅ What you get: Instant inference on your machine. No internet needed. Responses in 1–3 seconds. Perfect for writing assistance, content generation, and brainstorming.


    Scenario 2: Local Network (LAN)

    Run llama.cpp on one machine (e.g., a dedicated GPU rig) and access it from WordPress on another machine on the same network.

    🌐 Local Network Setup

    Best for: Dedicated inference machine, multi-user teams, leveraging a GPU rig

    Requirements: Two machines on the same WiFi/Ethernet network

    Complexity: ⭐⭐ (Easy, with networking basics)

    Step 1: Start the Server with Network Access

    On the machine with the model, start the server with --host 0.0.0.0 to accept network connections:

    llama-server \
      --models-dir ~/models \
      --host 0.0.0.0 \
      --port 8080

    Step 2: Find the Server’s Local IP

    On macOS:

    ipconfig getifaddr en0

    On Linux:

    hostname -I

    You’ll see something like 192.168.1.50. Note this down.

    Step 3: Test from Your WordPress Machine

    curl http://192.168.1.50:8080/v1/models

    If you get a JSON response listing your models, you’re connected. If not, check:

    • Both machines are on the same network (WiFi/Ethernet)
    • Firewall isn’t blocking port 8080
    • The IP address is correct (try ping 192.168.1.50)

    Step 4: Configure WordPress

    1. Go to Settings → AI Provider for llama.cpp
    2. Set the Server URL to http://192.168.1.50:8080 (replace with your IP)
    3. Click Save

    💡 Pro tip: Assign a static IP in your router settings so the inference machine’s address doesn’t change.


    Scenario 3: Remote Server (Internet)

    Expose llama.cpp to the internet so you can access it from anywhere. This requires a secure tunnel.

    ☁️ Remote Server Setup

    Best for: Cloud servers, production deployments, multi-location teams

    Requirements: Cloud VM (AWS, DigitalOcean, etc.) and a tunnel service

    Complexity: ⭐⭐⭐ (Most involved, but straightforward)

    Step 1: Start with Authentication

    llama-server \
      --models-dir ~/models \
      --host 0.0.0.0 \
      --api-key your-secret-key-here

    ⚠️ Security Critical: Never run a public llama.cpp server without an API key. Anyone on the internet could make requests and consume your resources. Always use --api-key.

    Step 2: Create a Tunnel with Cloudflare (Recommended)

    Install cloudflared:

    On macOS:

    brew install cloudflared

    On Linux (Debian/Ubuntu):

    curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb -o cloudflared.deb
    sudo dpkg -i cloudflared.deb

    Option A: Quick Tunnel (No Account Needed)

    cloudflared tunnel --url http://localhost:8080

    You’ll get a public HTTPS URL like https://something-random.trycloudflare.com. This URL changes when you restart.

    Option B: Named Tunnel (Stable URL)

    For production, use a named tunnel with a permanent URL. First, sign up for a free Cloudflare account and add your domain.

    Set up (one time):

    cloudflared tunnel login
    cloudflared tunnel create llama
    cloudflared tunnel route dns llama llama.yourdomain.com

    Run it:

    cloudflared tunnel run --url http://localhost:8080 llama

    Now your server is available at https://llama.yourdomain.com—permanently, with automatic HTTPS.

    ✨ Why Cloudflare? Free tier includes unlimited tunnels, automatic HTTPS, DDoS protection, and no bandwidth limits. Perfect for self-hosted AI.

    Step 3: Configure WordPress

    Set the Server URL to your tunnel URL (e.g., https://llama.yourdomain.com or https://something-random.trycloudflare.com).

    🚀 You’re live: Your WordPress site now has remote AI inference. All encrypted with HTTPS.


    Security & Best Practices

    API Keys & Authentication

    • Always use --api-key for remote servers. Without it, anyone can abuse your inference.
    • Use a strong, random key: openssl rand -hex 32
    • Rotate keys periodically (every 90 days in production)
    • Store keys in environment variables or a secrets manager, never in code
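A common pattern is to pull the key from the environment in wp-config.php so it never lands in code or the database. A minimal sketch (LLAMA_API_KEY is a hypothetical name used only for illustration):

// In wp-config.php: read the key from the environment at runtime.
// LLAMA_API_KEY is an illustrative constant name, not a plugin requirement.
define( 'LLAMA_API_KEY', getenv( 'LLAMA_API_KEY' ) ?: '' );

// When calling the server yourself, send the key as a Bearer token,
// which is how llama-server's OpenAI-compatible endpoints authenticate.
$args = [
    'headers' => [
        'Authorization' => 'Bearer ' . LLAMA_API_KEY,
        'Content-Type'  => 'application/json',
    ],
];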

    Network Security

    • LAN: Use --host 0.0.0.0 only on private networks. Firewalls should block external access to port 8080.
    • Remote: Always use HTTPS (Cloudflare Tunnel provides this). Never expose HTTP to the internet.
    • Both: Consider a reverse proxy (nginx, Caddy) for rate limiting and additional authentication

    Performance Optimization

    Choose the Right Model Size

    • 1–3B models: 500ms–2s per response (fast interactions)
    • 7–8B models: 2–10s per response (high quality, but slower)
    • 13B+ models: 10–60s per response (production-grade, needs serious hardware)

    For WordPress, start with a 3–7B model. It’s the sweet spot for quality and speed.

    Quantization Impact

    • Q2_K → Fastest speed, lower quality
    • Q4_K_M → Best balance (recommended)
    • Q8_0 → Best quality, slower inference

    Advanced Flags for Speed

    llama-server \
      --models-dir ~/models \
      -t 8 \
      -b 256 \
      -c 2048 \
      --n-gpu-layers 99

    Breakdown:

• -t 8 — Use 8 CPU threads (match your physical core count)
• -b 256 — Batch size for prompt processing (how many tokens are evaluated per step)
• -c 2048 — Context window in tokens (smaller = faster, but less context)
• --n-gpu-layers 99 — Offload model layers to the GPU when one is available (often an order-of-magnitude speedup)

    Ready to Get Started?

    You now have everything you need to run AI locally on WordPress. No cloud subscriptions. No API costs. Just pure, private, offline-first AI.

    Download the AI Provider for llama.cpp plugin:

    WordPress.org Plugin Directory

  • Turn Off AI Features — A Kill Switch for WordPress AI (Now Live)

    AI is becoming a first-class citizen in WordPress. With new capabilities and the emerging Connectors model, more features are starting to depend on AI being available.

    That’s useful—but not always desirable.

    I’ve released a lightweight plugin that introduces a kill switch for AI in WordPress, giving you full control over whether AI is enabled on your site.

    👉 Plugin: https://wordpress.org/plugins/turn-off-ai-features/


    Why a Kill Switch for AI?

    As AI capabilities expand, so do the scenarios where you may want to disable them:

    • Compliance & privacy requirements
    • Editorial control (no generated content)
    • Performance considerations
    • Consistency across environments (dev/staging/prod)
    • Avoiding unintended third-party integrations

    Instead of disabling features plugin by plugin, you get a centralized control point.


    What This Plugin Does

    Turn Off AI Features acts as a global kill switch.

At runtime, it hooks into WordPress’ AI capability check and forces it off whenever the kill switch is enabled.

add_filter('wp_supports_ai', function ($supported) {
    // '1' means the kill switch is on: report AI as unsupported.
    return get_option('toaif_disable_ai', '0') === '1' ? false : $supported;
}, 1000); // Late priority so the kill switch has the final word.
    

    This ensures:

    • AI is disabled system-wide
    • No need to patch individual plugins
    • Clean, predictable behavior

    Features

    🔌 Global AI Kill Switch

    A simple checkbox located in:

    Settings → General

    Toggle it on, and AI features are disabled across the site.


    ⚙️ WP-CLI Support

    Control the kill switch via CLI:

    wp toaif disable
    wp toaif enable
    wp toaif status
    

    Ideal for:

    • CI/CD pipelines
    • Automated deployments
    • Environment-based toggling
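Under the hood, the command wiring is small. Here’s a sketch of what the toaif namespace could look like (the class name TOAIF_Disable_CLI and the toaif_disable_ai option come from the plugin’s own prefixing scheme; the method bodies are illustrative):

class TOAIF_Disable_CLI {
    // wp toaif disable — turn the kill switch on.
    public function disable() {
        update_option( 'toaif_disable_ai', '1' );
        WP_CLI::success( 'AI features disabled.' );
    }

    // wp toaif enable — turn the kill switch off.
    public function enable() {
        update_option( 'toaif_disable_ai', '0' );
        WP_CLI::success( 'AI features enabled.' );
    }

    // wp toaif status — report the current state.
    public function status() {
        $off = get_option( 'toaif_disable_ai', '0' ) === '1';
        WP_CLI::line( $off ? 'AI features: disabled' : 'AI features: enabled' );
    }
}

if ( defined( 'WP_CLI' ) && WP_CLI ) {
    WP_CLI::add_command( 'toaif', 'TOAIF_Disable_CLI' );
}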

    🪶 Lightweight & Safe

    • No external dependencies
    • No tracking or data collection
    • Fully uses WordPress APIs
    • Clean, prefixed architecture (toaif_)

    Built for the Future (Connectors & AI Integrations)

    With WordPress moving toward AI Connectors, sites will increasingly rely on external AI services.

    This plugin gives you a fail-safe control layer:

    If AI should not run — it won’t.

    No ambiguity. No hidden behavior.


    Compatibility Note

    WordPress 7.0 is not released yet, so the plugin is built with a stable compatibility baseline:

    • Uses safe feature detection (function_exists)
    • Gracefully degrades on older versions
    • Does not break if AI APIs are unavailable
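In code, that guard is a one-liner. A minimal sketch, assuming wp_supports_ai() is the capability function the filter belongs to (on older versions the check fails and nothing gets hooked):

add_action( 'plugins_loaded', function () {
    // Older WordPress versions have no AI API, so there is nothing to turn off.
    if ( ! function_exists( 'wp_supports_ai' ) ) {
        return;
    }
    add_filter( 'wp_supports_ai', function ( $supported ) {
        return get_option( 'toaif_disable_ai', '0' ) === '1' ? false : $supported;
    }, 1000 );
} );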

    Who Should Use This

    • Agencies managing multiple client sites
    • Enterprise teams with governance requirements
    • Developers testing AI vs non-AI environments
    • Site owners who want predictable behavior

    Installation

    1. Install from WordPress.org
    2. Go to Settings → General
    3. Enable “Turn off AI features”

    That’s it.


    Final Thoughts

    AI in WordPress is evolving quickly. Having a kill switch is not about rejecting AI—it’s about control, predictability, and flexibility.

    This plugin is intentionally minimal:

    • No bloat
    • No assumptions
    • Just a clean way to turn AI off when needed

    👉 Try it here:
    https://wordpress.org/plugins/turn-off-ai-features/


    If you’re building on top of AI features or planning around the Connectors ecosystem, this gives you a reliable baseline to work from.

  • From “Disable AI Toolkit” to Approval: Lessons from a WordPress Plugin Review

    Submitting a plugin to the WordPress.org directory is rarely a one-shot process. My recent experience building a simple plugin to turn off AI features surfaced a few non-obvious constraints that are worth documenting—especially if you’re working on modern features like AI connectors or core integrations.

    This post walks through what went wrong, what I changed, and how to align with the Plugin Review Team’s expectations without unnecessary back-and-forth.


    The Initial Idea

    The plugin itself is straightforward:

    • Add a toggle in Settings → General
    • Hook into wp_supports_ai
    • Allow CLI control via WP-CLI
    • Provide a simple on/off mechanism for AI features

    Conceptually simple. Practically, the review process surfaced issues in naming, prefixing, and scope clarity.


    Issue 1: Naming Is More Than Semantics

    The original name was:

    Disable AI Toolkit

This failed for three reasons:

    • “Disable AI” is a saturated pattern
    • “Toolkit” is considered generic padding
    • It implied broader functionality than implemented

    The key takeaway:

    WordPress reviewers evaluate similarity patterns, not just exact matches.

    What worked better

    Instead of trying to tweak the same phrase, I moved to:

    Turn Off AI Features

    It’s:

    • More descriptive
    • Focused on features, not “AI” as a whole
    • Less likely to collide with existing plugins

    Issue 2: Prefixing (This Is Critical)

    This was the biggest technical blocker.

    Problem

Storing the option under a generic, unprefixed name:

    update_option('disable_ai_toolkit', '1');
    

    This fails because:

    • disable is a common word
    • Not a unique namespace
    • High collision risk

    Solution

    Introduce a distinct prefix:

    update_option('toaif_disable_ai', '1');
    

    Where:

    • toaif = Turn Off AI Features

    Applied consistently across:

    • Options
    • Settings
    • Functions
    • Classes
    • CLI commands

    Issue 3: WP-CLI Namespace Collisions

    Initial command:

    wp ai disable
    

    Problem:

    • ai is too generic
    • Potential collision with future core commands or plugins

    Fix

    wp toaif disable
    

    And:

    WP_CLI::add_command('toaif', 'TOAIF_Disable_CLI');
    

    Issue 4: Slug and Text Domain Coupling

    This is easy to overlook.

    If your slug is:

    turn-off-ai-features
    

    Then:

    Text Domain: turn-off-ai-features
    

    Mismatch here will trigger warnings during review.
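In the plugin’s main file, that means the header carries the matching value (description and version here are placeholders):

<?php
/**
 * Plugin Name: Turn Off AI Features
 * Description: A global kill switch for AI features in WordPress.
 * Version:     1.0.0
 * Text Domain: turn-off-ai-features
 */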


    Issue 5: Hardcoded Slug in Hooks

    This pattern is fragile:

    add_filter(
      'plugin_action_links_turn-off-ai-features/turn-off-ai-features.php',
      ...
    );
    

    Better approach

    add_filter(
      'plugin_action_links_' . plugin_basename(__FILE__),
      ...
    );
    

    This ensures:

    • No breakage if slug changes
    • Cleaner implementation

    Final Plugin Architecture

    Core toggle

    add_filter('wp_supports_ai', function ($supported) {
        return get_option('toaif_disable_ai', '0') === '1' ? false : $supported;
    }, 1000);
    

    Settings

    • Stored via register_setting
    • Rendered in General Settings
    • Sanitized to '0' | '1'
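A sketch of that registration, with the sanitizer collapsing any input to '0' or '1':

register_setting( 'general', 'toaif_disable_ai', [
    'type'              => 'string',
    'default'           => '0',
    'sanitize_callback' => function ( $value ) {
        // Anything other than an explicit '1' is stored as '0'.
        return '1' === $value ? '1' : '0';
    },
] );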

    CLI

    wp toaif disable
    wp toaif enable
    wp toaif status
    

    What the Review Team Actually Cares About

    Based on this process, priorities are clear:

    1. Collision Safety

    • Unique prefixes everywhere
    • No generic identifiers

    2. Naming Distinction

    • Avoid “pattern reuse” (e.g., Disable X Toolkit)
    • Prefer clear + specific phrasing

    3. Accuracy of Scope

    • Don’t oversell features in the name

    4. Consistency

    • Slug = text domain
    • Prefix applied everywhere

    Practical Checklist Before Resubmitting

    • Plugin name is distinct, not pattern-based
    • Slug updated and requested via email
    • All options prefixed (toaif_)
    • No generic prefixes (disable_, ai_)
    • CLI namespace is unique
    • Text domain matches slug
    • No hardcoded plugin paths

    Final Thoughts

    The plugin review process is not just about passing checks—it’s about enforcing ecosystem stability at scale.

    Once you align with:

    • naming uniqueness
    • prefix discipline
    • realistic scope

    …the approval process becomes predictable.

    If you’re building around upcoming features like AI connectors or core integrations, getting these fundamentals right early will save multiple review cycles.


    If you’re working on something similar or want to standardize your plugin boilerplate for approval readiness, it’s worth investing in a reusable structure that enforces these rules from day one.

  • Introducing AI Provider for llama.cpp: Local AI for WordPress


    AI is becoming a core part of the WordPress ecosystem, but most solutions today rely on external APIs. That often means recurring costs, latency, and data leaving your server.

    To address this, I’ve released a new plugin:
    👉 https://wordpress.org/plugins/ai-provider-for-llamacpp

    AI Provider for llama.cpp enables WordPress to connect directly to a locally hosted llama.cpp server, allowing you to run AI models without external dependencies.


    Why Use Local AI in WordPress?

    Running AI locally gives you more control and flexibility:

    • No API costs
    • Better data privacy
    • Faster response times (depending on setup)
    • Full control over models and infrastructure

    This plugin bridges WordPress with llama.cpp, making local AI practical inside your site.


    Key Features

    Seamless Integration with WordPress AI Client

    The plugin integrates directly with the WordPress AI Client, making it easy to use AI features within your workflows.

    Works Without API Keys

    For local setups, no API key is required. Just run your llama.cpp server and connect.

    Automatic Model Discovery

    Available models are fetched automatically from your server—no manual setup needed.

    OpenAI-Compatible API Support

    Since llama.cpp uses an OpenAI-compatible API, it fits naturally into existing AI workflows.
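To make that concrete, here’s a minimal sketch of a raw request from WordPress to a local server. The plugin does this plumbing for you; the sketch just shows how standard the wire format is (URL and prompt are examples):

$response = wp_remote_post( 'http://127.0.0.1:8080/v1/chat/completions', [
    'headers' => [ 'Content-Type' => 'application/json' ],
    'body'    => wp_json_encode( [
        'messages' => [
            [ 'role' => 'user', 'content' => 'Write a two-sentence product blurb.' ],
        ],
    ] ),
    'timeout' => 120,
] );

if ( ! is_wp_error( $response ) ) {
    $data = json_decode( wp_remote_retrieve_body( $response ), true );
    // Responses follow the familiar OpenAI shape.
    echo esc_html( $data['choices'][0]['message']['content'] ?? '' );
}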

    Simple Configuration

    Set your server URL from:
    Settings → llama.cpp

    (Default: http://127.0.0.1:8080)


    How It Works

    The plugin acts as a connector between WordPress and your AI model:

    1. WordPress sends a request via the AI Client
    2. The plugin forwards it to your llama.cpp server
    3. The model processes the request
    4. The response is returned to WordPress

    Getting Started

    1. Install the plugin from WordPress.org
    2. Run your llama.cpp server
    3. Go to Settings → llama.cpp and set your server URL
    4. Check Settings → Connectors to confirm it’s active

    That’s it—you’re ready to use local AI inside WordPress.


    Use Cases

    You can use this plugin for:

    • AI-powered content generation
    • Internal tools with private data
    • Experimenting with local LLMs
    • Reducing dependency on paid AI APIs

    Looking Ahead

    This is the initial release, and there’s more planned:

    • Support for additional providers (like Ollama)
    • Better UI for managing models
    • Performance improvements
    • More developer hooks

    Try It Out

    👉 https://wordpress.org/plugins/ai-provider-for-llamacpp

    If you test it, I’d really appreciate feedback—especially around setup, usability, and compatibility.


    Final Thoughts

    Local AI is becoming increasingly practical, and WordPress is a strong platform to build on top of it.

    This plugin is a step toward making AI:

    • More accessible
    • More private
    • More flexible for developers

    More updates coming soon.

  • The Only Local Dev Tools I Use: LocalWP vs. WordPress Studio

    For a long time, LocalWP has been my go-to tool for local WordPress development. It’s fantastic for setting up big projects, managing databases, and doing a lot of things at once. But recently, I started using WordPress Studio for my smaller projects, and it’s been a game changer.


    What I Use LocalWP For

    LocalWP is like a full-featured workshop. It’s perfect for when I need to work on a big client site. I love how it lets me easily switch between PHP versions, set up an SSL certificate, and get a public link to share my work. The user interface is clean, and the ability to spin up a new site with a single click is a massive time-saver. For any project that needs a lot of different features or has a complex setup, LocalWP is the clear winner.


    What I Use WordPress Studio For

    On the other hand, WordPress Studio is like a quick, lightweight tool. It’s built on WebAssembly, which means it’s incredibly fast. I use it for two main things:

    • Quick tests and experiments: If I want to see if a plugin works with a new version of WordPress or just test a small idea, I can get a site running in a second.
    • Creating new add-ons: When I’m building a new plugin or theme, I use Studio because it’s so simple. I don’t need to worry about complex settings; I can just focus on the code.

    While it’s not as powerful as LocalWP, its speed and simplicity are what make it great for these tasks.


    The Only Tricky Part: The Database

    The one difference that took some getting used to is the database. LocalWP uses MySQL, which is what I’ve used for years. Studio, however, uses SQLite. This is an easy problem to solve because there are great tools available.

    Personally, I use DBeaver because it’s a powerful tool I’m already familiar with. It connects to the SQLite database file and gives me a clean way to view, edit, and manage everything I need.
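If you’d rather inspect it from code, PHP’s built-in PDO SQLite driver opens the same file. A quick sketch (the .ht.sqlite filename is the usual default for WordPress’s SQLite integration; adjust the path to wherever your Studio site lives):

// Open the site's SQLite database directly with PDO.
$db = new PDO( 'sqlite:/path/to/site/wp-content/database/.ht.sqlite' );

foreach ( $db->query( 'SELECT option_name, option_value FROM wp_options LIMIT 5' ) as $row ) {
    echo $row['option_name'] . ' => ' . $row['option_value'] . PHP_EOL;
}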

    I’ve also heard great things about the SQLite Viewer for Studio repo. It was built specifically for this purpose and automatically finds the database file, which is very helpful.

If you want to try a Model Context Protocol (MCP) server for SQLite, there’s a reference implementation at https://github.com/modelcontextprotocol/servers-archived/tree/main/src/sqlite. It lets you read, modify, and even delete data using plain-language commands. Just a heads-up: that repository is archived, so use it with caution.

    Both of these tools solve the database problem, so you can pick the one that fits your workflow.


    My Final Verdict

    I don’t think one tool is better than the other. Instead, they work perfectly together. LocalWP is my main tool for big, complex projects, while WordPress Studio is my quick-start tool for small tests and new add-ons. Together, they cover everything I need, making my workflow faster and more efficient than ever.