Jean Galea

AI, Investing, Health, and Building Businesses

  • Start Here
  • AI & Tech
    • AI
    • Tech
    • Modern Web Stack
    • Business
  • Investing
    • Investing Basics
    • Crypto
    • Stocks
    • P2P Lending
    • Real Estate
    • Calculators
    • Dividends
    • FIRE & Early Retirement
    • European Investing Hub
  • Life
    • Essays
    • Barcelona
    • Padel
    • Health & Fitness
    • Hobbies
    • Family
  • About
    • My Story
    • Projects
    • AI Consultancy
  • Blog
  • Community
  • Search

Which Local AI App Do You Actually Need? AnythingLLM, Hermes, LM Studio, and the Rest

Published: June 17, 2026Leave a Comment

Diagram showing local AI tools split into two layers: a surface layer with AnythingLLM, Open WebUI and Hermes sitting on top of an engine layer with Ollama, LM Studio, Jan, GPT4All and Msty

This afternoon I pointed Hermes, an AI agent running on my Mac, at a language model running on the same machine. No API key, no per-token bill. The change took three lines in a config file, and the model answered in under a second.

Setting it up sent me down a rabbit hole. Hermes gets sold with the same line as a dozen other tools: run AI privately, on your own machine. AnythingLLM uses it. So does Ollama, the engine Hermes was now talking to. That line stretches across about eight genuinely different tools, and people keep installing the wrong one because every homepage reads the same.

So here is the map I wish someone had handed me. Once you see it, the whole category stops being a soup of similar-sounding names and becomes two short lists.

The two layers hiding under “local AI”

Every tool in this space sits in one of two layers.

The bottom layer is the engine: it loads the model weights and runs inference on your hardware. The top layer is the surface: the app you actually sit in and work with. Some products are only an engine. Some are only a surface. A few are both, which is exactly why the category feels confusing.

The reason this matters: you usually want to choose one engine and then choose a surface for the job in front of you. Mixing them up leads to running three apps that each bundle their own copy of the same model, or paying for a polished agent product when a free runner would have done the job.

Ollama is the clearest example of a pure engine. It pulls and serves models through a local API and a command line, with barely any interface of its own. That minimalism is the point. It became the substrate that almost everything else points at. When AnythingLLM, Open WebUI, or Hermes connect to “a local model,” they are usually connecting to Ollama.

The engines: tools that actually run the model

These five load weights and do inference on your machine. Four of them also ship a chat window, which is why they get mistaken for surfaces.

Ollama

A headless backend with a command line and a stable local API, including an OpenAI-compatible endpoint that other apps plug into. It runs models through llama.cpp under the hood. There is now a basic graphical app, but the identity is still the engine you share across every other tool. MIT licensed, free. If you only install one thing from this article, install this.

LM Studio

The most polished local runner with a real interface. It runs GGUF models and, on Apple Silicon, Apple’s MLX format, and it exposes its own OpenAI-compatible server so other apps can borrow it. It does light document chat and acts as an MCP client, so a local model can call tools mid-conversation. The catch is the license: LM Studio is closed source. It is free for personal and internal business use, but you cannot resell it or build a hosted service on top of it.

Jan

Effectively the open-source answer to LM Studio. It runs models locally, connects out to cloud providers when you want, and ships under a permissive Apache 2.0 license. If the closed-source part of LM Studio bothers you, Jan is the swap. It leans toward chat and running models rather than heavy document work.

GPT4All and Msty

Two ends of one shelf. GPT4All from Nomic AI is the simple, zero-fuss local runner with private document chat built in, and it is MIT licensed. Its last release as of writing was February 2025, so treat it as low-velocity rather than actively evolving. Msty is the opposite bet: closed source and paid for the good features, but slicker, with knowledge bases and an agent mode. You reach for Msty if you will happily pay for polish, and GPT4All if you want something that just opens and works.

The surfaces: where you actually work

These three connect to an engine or a cloud API. None of them run a model themselves. The difference between them is the job they are built for.

AnythingLLM

The turnkey “private ChatGPT over my documents.” You drop in PDFs and docs, AnythingLLM embeds them into a workspace, and you chat with that knowledge base. It adds a no-code agent builder and full MCP support on top. It connects to Ollama, LM Studio, or any commercial API, runs on Mac, Windows, Linux and Docker, and the desktop app is MIT licensed and free with no account. This is the one to hand a non-technical team that wants a knowledge base without wiring anything together.

Open WebUI

The self-hosted web surface for people who want depth and control. Open WebUI is the most serious document tool of the group, with support for nine vector databases, several extraction engines, and inline document references in chat, plus Python functions you can add as tools. It runs as a self-hosted web app, through Docker or Kubernetes, rather than a native desktop app. It pairs most naturally with Ollama. One honest note: it is often called open source, but its license carries a branding-preservation clause, so it is source-available rather than open source in the strict sense.

Hermes Agent

The odd one out, and the most interesting. Hermes Agent from Nous Research is not a chat-with-docs tool. It is an autonomous agent: it runs persistently, schedules its own work through a built-in cron, spawns subagents for parallel tasks, keeps a memory across sessions, and reaches you through Telegram, Discord, Slack, WhatsApp, and a native desktop app. It connects to any endpoint, including a local model on Ollama, which is the setup I ran this afternoon. MIT licensed and free; Nous sells an optional model subscription separately.

One naming trap worth clearing up: Hermes Agent the app is not the same thing as Hermes 4 the language model. Nous Research makes both. If a friend who knows the Hermes model line hears you say “Hermes,” they will picture the model. The app is a separate product that happens to share the name. This piece is about the app.

The two words that cause most of the confusion

Two terms get stamped on almost every one of these tools, and both have stopped meaning anything useful on their own.

The first is RAG, the document-chat feature where the app reads your files and answers from them. Five of these tools do it now. It has become a checkbox, not a category, so “has RAG” tells you nothing. What separates them is depth. Open WebUI and AnythingLLM are built for real knowledge bases. LM Studio and GPT4All do the lighter version where you attach a file to a message. If serious document work is your goal, that distinction is the whole decision.

The second is “agentic,” which hides two very different things. One is tool-calling inside a chat: the model can reach a calculator or a web search while you talk to it. AnythingLLM, LM Studio, Jan, Open WebUI, and Msty all do versions of this. The other is a true autonomous agent that runs on its own, takes actions on your machine, and keeps working when you are not watching. Only Hermes is built for that. Putting “AnythingLLM has agents” in the same bucket as Hermes overstates the first and undersells the second. They are different scales of the same word.

The whole landscape on one screen

Tool Layer Built for Runs models itself? License
Ollama Engine Headless model backend Yes MIT, free
LM Studio Engine + chat Polished local runner Yes Closed, free to use
Jan Engine + chat Open runner and chat Yes Apache 2.0, free
GPT4All Engine + chat Simple local chat Yes MIT, free
Msty Engine + chat Polished runner, paid features Yes Closed, freemium
AnythingLLM Surface Knowledge base over your docs No MIT, free
Open WebUI Surface Deep self-hosted RAG and tools No Source-available
Hermes Agent Surface Autonomous agent No MIT, free

So which one do you need?

Work it as two decisions, in order.

First, pick your engine. If you want one backend that every other app can share, run Ollama and forget about it. If you would rather have a runner with a good built-in window, use LM Studio, or Jan if you want that experience under a fully open license. Pay for Msty only if polish and built-in knowledge bases are worth the subscription to you. GPT4All is the path of least resistance if you just want a single app that opens and chats, with the caveat that it is barely moving.

Then pick your surface by the job.

If the job is simply chatting with a local model, you may not need a surface at all. The runner’s own window is enough.

If the job is chatting with your own documents or building a knowledge base, install AnythingLLM for the turnkey version or Open WebUI for the deeper self-hosted one, and point either at your engine.

If the job is an agent that works on its own schedule and reaches you in your messaging apps, that is Hermes, pointed at whatever model you like. This is the local-machine cousin of the AI layer I built for myself instead of installing OpenClaw, and it is worth understanding what an AI agent actually is before you wire one up.

A word on the license column, because “open source” is doing a lot of unspoken work across these homepages. Genuinely open, in the standard sense, are AnythingLLM, Jan, Ollama, GPT4All, and Hermes. Open WebUI is source-available with a branding clause. LM Studio and Msty are closed source. If the freedom to fork, self-host without restriction, or build on top matters to you, that line is not a footnote. It is part of the decision, and it is exactly the kind of thing worth checking before you commit, the same way I evaluate any AI tool before paying for it.

One thing this article deliberately leaves out: where the models run when they are not on your machine. That is the cloud side, the hosted GPUs and model marketplaces, and I covered it separately in AI Model Hosting in 2026. Local and cloud are two halves of the same stack.

What I actually run

My own setup is deliberately small. Ollama is the engine, always running, serving one model (Llama 3.1 8B) to anything that asks. As of this afternoon, Hermes is the surface, the agent layer, pointed at that local model instead of a cloud API. That is the whole stack: one engine, one surface.

Making that switch is what made all of this click. Moving Hermes from a cloud model to a local one was three lines: the model name, the provider set to Ollama, and a base URL pointing at the local server. The agent didn’t care. It kept its memory and its tools, and simply stopped sending my data anywhere. The surface and the engine are separate, swappable parts, and once you hold them apart in your head, the eight-tool soup turns into a short shopping list: one engine you like, and a surface for each job you actually have.

If you have been meaning to try local AI and bounced off the wall of identical homepages, start with Ollama today and add exactly one surface for the job you have this week. You can always add the next one when a real need shows up, which is the only good reason to add software at all. The rest of my current toolset lives in my AI tech stack if you want to see how these pieces fit alongside everything else.

Related

Docker container engine logo — the container platform that powers tools like wp-env, DDEV, and DevKinsta for WordPress development
Local WP vs Docker: When to Use Each for WordPress Development
etoro buy orders
Best Stock Trading Apps of 2026 – My Top Picks
Developer workspace with a colorful code editor on a screen lit by ambient blue and red lighting
The Complete Guide to Local WordPress Development and Testing
Oladoctor homepage showing online doctor consultations, prescriptions and medical certificates across Spain, with a Trustpilot 4.8 rating
Healthcare for Expats in Spain: A Real Look at Oladoctor vs Doctoralia, Top Doctors, and the Insurer Apps
Best Mobile Apps to Use in Barcelona in 2026
Ing direct
Best Commission-Free Banks in Spain (Updated 2026)

Filed under: General

About Jean Galea

I build things on the internet and write about AI, investing, health, and how to live well. Founder of AgentVania and the Good Life Collective.

Leave a Reply Cancel reply

Thanks for choosing to leave a comment. Please keep in mind that all comments are moderated according to our comment policy, and your email address will NOT be published. Please Do NOT use keywords or links in the name field.

Jean Galea

Investor | Dad | Global Citizen | Athlete

Follow @jeangalea

  • My Padel Journey
  • Affiliate Disclaimer
  • Cookies
  • Contact

Copyright © 2006 - 2026