Author: AcrossWP

  • Run AI Locally: Complete Guide to llama.cpp + WordPress

    Set up a powerful, private AI assistant on your WordPress site using local models. No API keys. No subscriptions. Full control.

    Published: April 2026 | 12 min read | For Developers


    Why Run AI Locally? The Privacy-First Revolution

    Every time you use a cloud AI API, your data travels across the internet to someone else’s servers. OpenAI, Anthropic, Google—they all log requests. Your content, your users’ interactions, your business logic—it’s all third-party data.

    What if you could run a powerful AI model on your own hardware, keeping everything private, offline-capable, and completely under your control?

    That’s exactly what llama.cpp makes possible.

    💡 What you’ll learn: This guide covers three deployment scenarios—from your laptop to a GPU rig on your LAN to a production cloud server. Pick the one that fits your setup.

    The Case for Local AI

    • Privacy: Data never leaves your infrastructure
    • Cost: No per-request fees; one-time hardware investment
• Latency: Sub-100ms per-token generation on modern hardware, with no network round-trip
    • Ownership: Full control over model behavior and updates
    • Offline: Works without internet (for local deployments)

    The catch? You need to run the models yourself. But that’s where llama.cpp comes in—it makes that easy.


    Installation & Setup

    What is llama.cpp?

    llama.cpp is a lightweight C++ inference engine that runs language models locally. It’s blazingly fast, supports GPU acceleration, and requires minimal dependencies. Think of it as the “Apache for AI models.”

    Install llama.cpp on macOS

    The fastest way is via Homebrew:

    brew install llama.cpp

    Verify the installation:

    llama-server --help

    Install on Linux

    Build from source (takes 5–10 minutes):

    git clone https://github.com/ggml-org/llama.cpp.git
    cd llama.cpp
    cmake -B build
    cmake --build build --config Release

    The resulting binary is at build/bin/llama-server.

    For details, visit the official llama.cpp downloads page.


    Downloading & Choosing Models

    What is GGUF?

    GGUF is the binary format that llama.cpp uses. It’s optimized for inference speed and memory efficiency. Models are hosted on Hugging Face in multiple quantization levels.

Quantization   Size       Quality   Speed      Best For
Q2_K           Smallest   Lower     Fastest    Devices with <2GB RAM
Q4_K_M         Small      Good      Fast       Laptops, modest servers
Q5_K_M         Medium     Better    Moderate   Better quality, still efficient
Q8_0           Largest    Best      Slowest    High-end GPUs, unlimited RAM

    Download a Model

    Install the Hugging Face CLI:

    pip install -U huggingface_hub

    Download TinyLlama (636 MB, great for testing):

    huggingface-cli download \
      TheBloke/TinyLlama-1.1B-Chat-GGUF \
      tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
      --local-dir ~/models \
      --local-dir-use-symlinks False

    Verify the download:

    ls -lh ~/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

    Recommended Starter Models

Model                       Size     Use Case
TinyLlama 1.1B (Q4_K_M)     636 MB   Testing, ultra-low resource
Phi-3 Mini 3.8B (Q4_K_M)    2.2 GB   Fast & practical
Mistral 7B (Q4_K_M)         4.1 GB   High quality
Llama 3 8B (Q4_K_M)         4.7 GB   Best-in-class, needs GPU or 16GB+ RAM

    Scenario 1: Same Machine (Localhost)

    This is the simplest setup. WordPress and llama.cpp both run on your laptop or desktop.

    🖥️ Localhost Setup

    Best for: Local development, prototyping, single-user testing

    Requirements: One machine with enough RAM for your chosen model

    Complexity: ⭐ (Easiest)

    Step 1: Start the llama.cpp Server

    llama-server --models-dir ~/models

    The server will start on http://127.0.0.1:8080. You’ll see output confirming the model loaded successfully.

    Step 2: Open the Web UI (Optional)

    Open your browser to http://127.0.0.1:8080/ and start chatting with your model immediately. This verifies everything works before WordPress integration.
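Prefer to script the check? llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint, so a few lines of plain PHP (no WordPress required) can confirm the server answers. A minimal sketch, assuming the default address from Step 1:

<?php
// Smoke test against the local llama-server.
$payload = json_encode([
    'messages'   => [['role' => 'user', 'content' => 'Say hello in five words.']],
    'max_tokens' => 32,
]);

$context = stream_context_create([
    'http' => [
        'method'  => 'POST',
        'header'  => "Content-Type: application/json\r\n",
        'content' => $payload,
        'timeout' => 60,
    ],
]);

$body = file_get_contents('http://127.0.0.1:8080/v1/chat/completions', false, $context);
$data = json_decode($body, true);
echo $data['choices'][0]['message']['content'] ?? 'No response';

A short greeting back means the server is ready for the WordPress side.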

    Step 3: Configure the WordPress Plugin

    1. Go to Settings → AI Provider for llama.cpp
    2. Set the Server URL to http://127.0.0.1:8080
    3. Click Save

    The plugin will auto-detect your available models. WordPress now has local AI powers.

    ✅ What you get: Instant inference on your machine. No internet needed. Responses in 1–3 seconds. Perfect for writing assistance, content generation, and brainstorming.


    Scenario 2: Local Network (LAN)

    Run llama.cpp on one machine (e.g., a dedicated GPU rig) and access it from WordPress on another machine on the same network.

    🌐 Local Network Setup

    Best for: Dedicated inference machine, multi-user teams, leveraging a GPU rig

    Requirements: Two machines on the same WiFi/Ethernet network

    Complexity: ⭐⭐ (Easy, with networking basics)

    Step 1: Start the Server with Network Access

    On the machine with the model, start the server with --host 0.0.0.0 to accept network connections:

    llama-server \
      --models-dir ~/models \
      --host 0.0.0.0 \
      --port 8080

    Step 2: Find the Server’s Local IP

    On macOS:

    ipconfig getifaddr en0

    On Linux:

    hostname -I

    You’ll see something like 192.168.1.50. Note this down.

    Step 3: Test from Your WordPress Machine

    curl http://192.168.1.50:8080/v1/models

    If you get a JSON response listing your models, you’re connected. If not, check:

    • Both machines are on the same network (WiFi/Ethernet)
    • Firewall isn’t blocking port 8080
    • The IP address is correct (try ping 192.168.1.50)

    Step 4: Configure WordPress

    1. Go to Settings → AI Provider for llama.cpp
    2. Set the Server URL to http://192.168.1.50:8080 (replace with your IP)
    3. Click Save

    💡 Pro tip: Assign a static IP in your router settings so the inference machine’s address doesn’t change.


    Scenario 3: Remote Server (Internet)

    Expose llama.cpp to the internet so you can access it from anywhere. This requires a secure tunnel.

    ☁️ Remote Server Setup

    Best for: Cloud servers, production deployments, multi-location teams

    Requirements: Cloud VM (AWS, DigitalOcean, etc.) and a tunnel service

    Complexity: ⭐⭐⭐ (Most involved, but straightforward)

    Step 1: Start with Authentication

    llama-server \
      --models-dir ~/models \
      --host 0.0.0.0 \
      --api-key your-secret-key-here

    ⚠️ Security Critical: Never run a public llama.cpp server without an API key. Anyone on the internet could make requests and consume your resources. Always use --api-key.

    Step 2: Create a Tunnel with Cloudflare (Recommended)

    Install cloudflared:

    On macOS:

    brew install cloudflared

    On Linux (Debian/Ubuntu):

    curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb -o cloudflared.deb
    sudo dpkg -i cloudflared.deb

    Option A: Quick Tunnel (No Account Needed)

    cloudflared tunnel --url http://localhost:8080

    You’ll get a public HTTPS URL like https://something-random.trycloudflare.com. This URL changes when you restart.

    Option B: Named Tunnel (Stable URL)

    For production, use a named tunnel with a permanent URL. First, sign up for a free Cloudflare account and add your domain.

    Set up (one time):

    cloudflared tunnel login
    cloudflared tunnel create llama
    cloudflared tunnel route dns llama llama.yourdomain.com

    Run it:

    cloudflared tunnel run --url http://localhost:8080 llama

    Now your server is available at https://llama.yourdomain.com—permanently, with automatic HTTPS.

    ✨ Why Cloudflare? Free tier includes unlimited tunnels, automatic HTTPS, DDoS protection, and no bandwidth limits. Perfect for self-hosted AI.

    Step 3: Configure WordPress

    Set the Server URL to your tunnel URL (e.g., https://llama.yourdomain.com or https://something-random.trycloudflare.com).

    🚀 You’re live: Your WordPress site now has remote AI inference. All encrypted with HTTPS.


    Security & Best Practices

    API Keys & Authentication

    • Always use --api-key for remote servers. Without it, anyone can abuse your inference.
    • Use a strong, random key: openssl rand -hex 32
    • Rotate keys periodically (every 90 days in production)
    • Store keys in environment variables or a secrets manager, never in code
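A common pattern is to pull the key from the environment in wp-config.php so it never lands in code or the database. A minimal sketch (LLAMA_API_KEY is a hypothetical name used only for illustration):

// In wp-config.php: read the key from the environment at runtime.
// LLAMA_API_KEY is an illustrative constant name, not a plugin requirement.
define( 'LLAMA_API_KEY', getenv( 'LLAMA_API_KEY' ) ?: '' );

// When calling the server yourself, send the key as a Bearer token,
// which is how llama-server's OpenAI-compatible endpoints authenticate.
$args = [
    'headers' => [
        'Authorization' => 'Bearer ' . LLAMA_API_KEY,
        'Content-Type'  => 'application/json',
    ],
];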

    Network Security

    • LAN: Use --host 0.0.0.0 only on private networks. Firewalls should block external access to port 8080.
    • Remote: Always use HTTPS (Cloudflare Tunnel provides this). Never expose HTTP to the internet.
    • Both: Consider a reverse proxy (nginx, Caddy) for rate limiting and additional authentication

    Performance Optimization

    Choose the Right Model Size

    • 1–3B models: 500ms–2s per response (fast interactions)
    • 7–8B models: 2–10s per response (high quality, but slower)
    • 13B+ models: 10–60s per response (production-grade, needs serious hardware)

    For WordPress, start with a 3–7B model. It’s the sweet spot for quality and speed.

    Quantization Impact

    • Q2_K → Fastest speed, lower quality
    • Q4_K_M → Best balance (recommended)
    • Q8_0 → Best quality, slower inference

    Advanced Flags for Speed

    llama-server \
      --models-dir ~/models \
      -t 8 \
      -b 256 \
      -c 2048 \
      --n-gpu-layers 99

    Breakdown:

• -t 8 — Use 8 CPU threads (match your physical core count)
• -b 256 — Batch size for prompt processing (how many tokens are evaluated per step)
• -c 2048 — Context window in tokens (smaller = faster, but less context)
• --n-gpu-layers 99 — Offload model layers to the GPU when one is available (often an order-of-magnitude speedup)

    Ready to Get Started?

    You now have everything you need to run AI locally on WordPress. No cloud subscriptions. No API costs. Just pure, private, offline-first AI.

    Download the AI Provider for llama.cpp plugin:

    WordPress.org Plugin Directory

  • Turn Off AI Features — A Kill Switch for WordPress AI (Now Live)

    AI is becoming a first-class citizen in WordPress. With new capabilities and the emerging Connectors model, more features are starting to depend on AI being available.

    That’s useful—but not always desirable.

    I’ve released a lightweight plugin that introduces a kill switch for AI in WordPress, giving you full control over whether AI is enabled on your site.

    👉 Plugin: https://wordpress.org/plugins/turn-off-ai-features/


    Why a Kill Switch for AI?

    As AI capabilities expand, so do the scenarios where you may want to disable them:

    • Compliance & privacy requirements
    • Editorial control (no generated content)
    • Performance considerations
    • Consistency across environments (dev/staging/prod)
    • Avoiding unintended third-party integrations

    Instead of disabling features plugin by plugin, you get a centralized control point.


    What This Plugin Does

    Turn Off AI Features acts as a global kill switch.

At runtime, it hooks into WordPress’ AI capability check and forces it off whenever the kill switch is enabled.

add_filter('wp_supports_ai', function ($supported) {
    // '1' means the kill switch is on: report AI as unsupported.
    return get_option('toaif_disable_ai', '0') === '1' ? false : $supported;
}, 1000); // Late priority so the kill switch has the final word.
    

    This ensures:

    • AI is disabled system-wide
    • No need to patch individual plugins
    • Clean, predictable behavior

    Features

    🔌 Global AI Kill Switch

    A simple checkbox located in:

    Settings → General

    Toggle it on, and AI features are disabled across the site.


    ⚙️ WP-CLI Support

    Control the kill switch via CLI:

    wp toaif disable
    wp toaif enable
    wp toaif status
    

    Ideal for:

    • CI/CD pipelines
    • Automated deployments
    • Environment-based toggling
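Under the hood, the command wiring is small. Here’s a sketch of what the toaif namespace could look like (the class name TOAIF_Disable_CLI and the toaif_disable_ai option come from the plugin’s own prefixing scheme; the method bodies are illustrative):

class TOAIF_Disable_CLI {
    // wp toaif disable — turn the kill switch on.
    public function disable() {
        update_option( 'toaif_disable_ai', '1' );
        WP_CLI::success( 'AI features disabled.' );
    }

    // wp toaif enable — turn the kill switch off.
    public function enable() {
        update_option( 'toaif_disable_ai', '0' );
        WP_CLI::success( 'AI features enabled.' );
    }

    // wp toaif status — report the current state.
    public function status() {
        $off = get_option( 'toaif_disable_ai', '0' ) === '1';
        WP_CLI::line( $off ? 'AI features: disabled' : 'AI features: enabled' );
    }
}

if ( defined( 'WP_CLI' ) && WP_CLI ) {
    WP_CLI::add_command( 'toaif', 'TOAIF_Disable_CLI' );
}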

    🪶 Lightweight & Safe

    • No external dependencies
    • No tracking or data collection
    • Fully uses WordPress APIs
    • Clean, prefixed architecture (toaif_)

    Built for the Future (Connectors & AI Integrations)

    With WordPress moving toward AI Connectors, sites will increasingly rely on external AI services.

    This plugin gives you a fail-safe control layer:

    If AI should not run — it won’t.

    No ambiguity. No hidden behavior.


    Compatibility Note

    WordPress 7.0 is not released yet, so the plugin is built with a stable compatibility baseline:

    • Uses safe feature detection (function_exists)
    • Gracefully degrades on older versions
    • Does not break if AI APIs are unavailable
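In code, that guard is a one-liner. A minimal sketch, assuming wp_supports_ai() is the capability function the filter belongs to (on older versions the check fails and nothing gets hooked):

add_action( 'plugins_loaded', function () {
    // Older WordPress versions have no AI API, so there is nothing to turn off.
    if ( ! function_exists( 'wp_supports_ai' ) ) {
        return;
    }
    add_filter( 'wp_supports_ai', function ( $supported ) {
        return get_option( 'toaif_disable_ai', '0' ) === '1' ? false : $supported;
    }, 1000 );
} );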

    Who Should Use This

    • Agencies managing multiple client sites
    • Enterprise teams with governance requirements
    • Developers testing AI vs non-AI environments
    • Site owners who want predictable behavior

    Installation

    1. Install from WordPress.org
    2. Go to Settings → General
    3. Enable “Turn off AI features”

    That’s it.


    Final Thoughts

    AI in WordPress is evolving quickly. Having a kill switch is not about rejecting AI—it’s about control, predictability, and flexibility.

    This plugin is intentionally minimal:

    • No bloat
    • No assumptions
    • Just a clean way to turn AI off when needed

    👉 Try it here:
    https://wordpress.org/plugins/turn-off-ai-features/


    If you’re building on top of AI features or planning around the Connectors ecosystem, this gives you a reliable baseline to work from.

  • From “Disable AI Toolkit” to Approval: Lessons from a WordPress Plugin Review

    Submitting a plugin to the WordPress.org directory is rarely a one-shot process. My recent experience building a simple plugin to turn off AI features surfaced a few non-obvious constraints that are worth documenting—especially if you’re working on modern features like AI connectors or core integrations.

    This post walks through what went wrong, what I changed, and how to align with the Plugin Review Team’s expectations without unnecessary back-and-forth.


    The Initial Idea

    The plugin itself is straightforward:

    • Add a toggle in Settings → General
    • Hook into wp_supports_ai
    • Allow CLI control via WP-CLI
    • Provide a simple on/off mechanism for AI features

    Conceptually simple. Practically, the review process surfaced issues in naming, prefixing, and scope clarity.


    Issue 1: Naming Is More Than Semantics

    The original name was:

    Disable AI Toolkit

This failed for three reasons:

    • “Disable AI” is a saturated pattern
    • “Toolkit” is considered generic padding
    • It implied broader functionality than implemented

    The key takeaway:

    WordPress reviewers evaluate similarity patterns, not just exact matches.

    What worked better

    Instead of trying to tweak the same phrase, I moved to:

    Turn Off AI Features

    It’s:

    • More descriptive
    • Focused on features, not “AI” as a whole
    • Less likely to collide with existing plugins

    Issue 2: Prefixing (This Is Critical)

    This was the biggest technical blocker.

    Problem

Storing the option under a generic, unprefixed name:

    update_option('disable_ai_toolkit', '1');
    

    This fails because:

    • disable is a common word
    • Not a unique namespace
    • High collision risk

    Solution

    Introduce a distinct prefix:

    update_option('toaif_disable_ai', '1');
    

    Where:

    • toaif = Turn Off AI Features

    Applied consistently across:

    • Options
    • Settings
    • Functions
    • Classes
    • CLI commands

    Issue 3: WP-CLI Namespace Collisions

    Initial command:

    wp ai disable
    

    Problem:

    • ai is too generic
    • Potential collision with future core commands or plugins

    Fix

    wp toaif disable
    

    And:

    WP_CLI::add_command('toaif', 'TOAIF_Disable_CLI');
    

    Issue 4: Slug and Text Domain Coupling

    This is easy to overlook.

    If your slug is:

    turn-off-ai-features
    

    Then:

    Text Domain: turn-off-ai-features
    

    Mismatch here will trigger warnings during review.
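In the plugin’s main file, that means the header carries the matching value (description and version here are placeholders):

<?php
/**
 * Plugin Name: Turn Off AI Features
 * Description: A global kill switch for AI features in WordPress.
 * Version:     1.0.0
 * Text Domain: turn-off-ai-features
 */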


    Issue 5: Hardcoded Slug in Hooks

    This pattern is fragile:

    add_filter(
      'plugin_action_links_turn-off-ai-features/turn-off-ai-features.php',
      ...
    );
    

    Better approach

    add_filter(
      'plugin_action_links_' . plugin_basename(__FILE__),
      ...
    );
    

    This ensures:

    • No breakage if slug changes
    • Cleaner implementation

    Final Plugin Architecture

    Core toggle

    add_filter('wp_supports_ai', function ($supported) {
        return get_option('toaif_disable_ai', '0') === '1' ? false : $supported;
    }, 1000);
    

    Settings

    • Stored via register_setting
    • Rendered in General Settings
    • Sanitized to '0' | '1'
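A sketch of that registration, with the sanitizer collapsing any input to '0' or '1':

register_setting( 'general', 'toaif_disable_ai', [
    'type'              => 'string',
    'default'           => '0',
    'sanitize_callback' => function ( $value ) {
        // Anything other than an explicit '1' is stored as '0'.
        return '1' === $value ? '1' : '0';
    },
] );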

    CLI

    wp toaif disable
    wp toaif enable
    wp toaif status
    

    What the Review Team Actually Cares About

    Based on this process, priorities are clear:

    1. Collision Safety

    • Unique prefixes everywhere
    • No generic identifiers

    2. Naming Distinction

    • Avoid “pattern reuse” (e.g., Disable X Toolkit)
    • Prefer clear + specific phrasing

    3. Accuracy of Scope

    • Don’t oversell features in the name

    4. Consistency

    • Slug = text domain
    • Prefix applied everywhere

    Practical Checklist Before Resubmitting

    • Plugin name is distinct, not pattern-based
    • Slug updated and requested via email
    • All options prefixed (toaif_)
    • No generic prefixes (disable_, ai_)
    • CLI namespace is unique
    • Text domain matches slug
    • No hardcoded plugin paths

    Final Thoughts

    The plugin review process is not just about passing checks—it’s about enforcing ecosystem stability at scale.

    Once you align with:

    • naming uniqueness
    • prefix discipline
    • realistic scope

    …the approval process becomes predictable.

    If you’re building around upcoming features like AI connectors or core integrations, getting these fundamentals right early will save multiple review cycles.


    If you’re working on something similar or want to standardize your plugin boilerplate for approval readiness, it’s worth investing in a reusable structure that enforces these rules from day one.

  • Introducing AI Provider for llama.cpp: Local AI for WordPress


    AI is becoming a core part of the WordPress ecosystem, but most solutions today rely on external APIs. That often means recurring costs, latency, and data leaving your server.

    To address this, I’ve released a new plugin:
    👉 https://wordpress.org/plugins/ai-provider-for-llamacpp

    AI Provider for llama.cpp enables WordPress to connect directly to a locally hosted llama.cpp server, allowing you to run AI models without external dependencies.


    Why Use Local AI in WordPress?

    Running AI locally gives you more control and flexibility:

    • No API costs
    • Better data privacy
    • Faster response times (depending on setup)
    • Full control over models and infrastructure

    This plugin bridges WordPress with llama.cpp, making local AI practical inside your site.


    Key Features

    Seamless Integration with WordPress AI Client

    The plugin integrates directly with the WordPress AI Client, making it easy to use AI features within your workflows.

    Works Without API Keys

    For local setups, no API key is required. Just run your llama.cpp server and connect.

    Automatic Model Discovery

    Available models are fetched automatically from your server—no manual setup needed.

    OpenAI-Compatible API Support

    Since llama.cpp uses an OpenAI-compatible API, it fits naturally into existing AI workflows.
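To make that concrete, here’s a minimal sketch of a raw request from WordPress to a local server. The plugin does this plumbing for you; the sketch just shows how standard the wire format is (URL and prompt are examples):

$response = wp_remote_post( 'http://127.0.0.1:8080/v1/chat/completions', [
    'headers' => [ 'Content-Type' => 'application/json' ],
    'body'    => wp_json_encode( [
        'messages' => [
            [ 'role' => 'user', 'content' => 'Write a two-sentence product blurb.' ],
        ],
    ] ),
    'timeout' => 120,
] );

if ( ! is_wp_error( $response ) ) {
    $data = json_decode( wp_remote_retrieve_body( $response ), true );
    // Responses follow the familiar OpenAI shape.
    echo esc_html( $data['choices'][0]['message']['content'] ?? '' );
}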

    Simple Configuration

    Set your server URL from:
    Settings → llama.cpp

    (Default: http://127.0.0.1:8080)


    How It Works

    The plugin acts as a connector between WordPress and your AI model:

    1. WordPress sends a request via the AI Client
    2. The plugin forwards it to your llama.cpp server
    3. The model processes the request
    4. The response is returned to WordPress

    Getting Started

    1. Install the plugin from WordPress.org
    2. Run your llama.cpp server
    3. Go to Settings → llama.cpp and set your server URL
    4. Check Settings → Connectors to confirm it’s active

    That’s it—you’re ready to use local AI inside WordPress.


    Use Cases

    You can use this plugin for:

    • AI-powered content generation
    • Internal tools with private data
    • Experimenting with local LLMs
    • Reducing dependency on paid AI APIs

    Looking Ahead

    This is the initial release, and there’s more planned:

    • Support for additional providers (like Ollama)
    • Better UI for managing models
    • Performance improvements
    • More developer hooks

    Try It Out

    👉 https://wordpress.org/plugins/ai-provider-for-llamacpp

    If you test it, I’d really appreciate feedback—especially around setup, usability, and compatibility.


    Final Thoughts

    Local AI is becoming increasingly practical, and WordPress is a strong platform to build on top of it.

    This plugin is a step toward making AI:

    • More accessible
    • More private
    • More flexible for developers

    More updates coming soon.

  • The Only Local Dev Tools I Use: LocalWP vs. WordPress Studio

    For a long time, LocalWP has been my go-to tool for local WordPress development. It’s fantastic for setting up big projects, managing databases, and doing a lot of things at once. But recently, I started using WordPress Studio for my smaller projects, and it’s been a game changer.


    What I Use LocalWP For

    LocalWP is like a full-featured workshop. It’s perfect for when I need to work on a big client site. I love how it lets me easily switch between PHP versions, set up an SSL certificate, and get a public link to share my work. The user interface is clean, and the ability to spin up a new site with a single click is a massive time-saver. For any project that needs a lot of different features or has a complex setup, LocalWP is the clear winner.


    What I Use WordPress Studio For

    On the other hand, WordPress Studio is like a quick, lightweight tool. It’s built on WebAssembly, which means it’s incredibly fast. I use it for two main things:

    • Quick tests and experiments: If I want to see if a plugin works with a new version of WordPress or just test a small idea, I can get a site running in a second.
    • Creating new add-ons: When I’m building a new plugin or theme, I use Studio because it’s so simple. I don’t need to worry about complex settings; I can just focus on the code.

    While it’s not as powerful as LocalWP, its speed and simplicity are what make it great for these tasks.


    The Only Tricky Part: The Database

    The one difference that took some getting used to is the database. LocalWP uses MySQL, which is what I’ve used for years. Studio, however, uses SQLite. This is an easy problem to solve because there are great tools available.

    Personally, I use DBeaver because it’s a powerful tool I’m already familiar with. It connects to the SQLite database file and gives me a clean way to view, edit, and manage everything I need.
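If you’d rather inspect it from code, PHP’s built-in PDO SQLite driver opens the same file. A quick sketch (the .ht.sqlite filename is the usual default for WordPress’s SQLite integration; adjust the path to wherever your Studio site lives):

// Open the site's SQLite database directly with PDO.
$db = new PDO( 'sqlite:/path/to/site/wp-content/database/.ht.sqlite' );

foreach ( $db->query( 'SELECT option_name, option_value FROM wp_options LIMIT 5' ) as $row ) {
    echo $row['option_name'] . ' => ' . $row['option_value'] . PHP_EOL;
}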

    I’ve also heard great things about the SQLite Viewer for Studio repo. It was built specifically for this purpose and automatically finds the database file, which is very helpful.

If you want to try a Model Context Protocol (MCP) server for SQLite, there’s a reference implementation at https://github.com/modelcontextprotocol/servers-archived/tree/main/src/sqlite. It lets you read, modify, and even delete data using plain-language commands. Just a heads-up: that repository is archived, so use it with caution.

    Both of these tools solve the database problem, so you can pick the one that fits your workflow.


    My Final Verdict

    I don’t think one tool is better than the other. Instead, they work perfectly together. LocalWP is my main tool for big, complex projects, while WordPress Studio is my quick-start tool for small tests and new add-ons. Together, they cover everything I need, making my workflow faster and more efficient than ever.