The LLMs in Your Home Are Good Enough
Local models with tooling can be just as powerful and useful as their frontier counterparts.
I’m writing this on my Framework Desktop with 128GB of RAM, a consumer desktop that costs about as much as an Apple MacBook Pro. On one monitor is my writing app and on the other is Jarvis, my local assistant written in Python. He consumes a massive markdown file where I store all my thoughts, ideas and tasks. He also communicates with Jones, my deep researcher agent, who performs long-running research online. I have another agent in the works called Scrooge who analyzes my finances.
These agents are focused and private, and they all run on my desktop. This wasn’t possible one or two years ago. The models weren’t good enough. The hardware wasn’t good enough. That changed dramatically at the beginning of 2026, and improvements to local models keep coming.
What I’m realizing as I work with these models more is that they simply do not need to be trillion-parameter frontier models in order to accomplish the majority of my tasks. In fact, a mediocre model with the right tooling is often better than a frontier model working from prompts alone.
How tools are used
When prompted “What are my tasks for today?”, Jarvis, using gemma4:26b, scans the tools accessible to him. Each tool is a small Python function that accomplishes one goal. The model knows it needs to find “today” among the date blocks in my main markdown file, so it calls the fetch_dates tool, which returns a sorted array of all my content grouped by date. Once it finds the date block associated with today, it calls fetch_tasks_from_dates(today_block), which returns an array of all the tasks in that block.
Why do we need explicit code to perform this action? Can’t something like Claude or ChatGPT handle this with only prompts? Probably. Frontier models are exceedingly good at complex tasks. When I attempted to do this locally with only prompts, I kept getting inconsistent results. The model would trip up on showing only incomplete or only completed tasks because the prompt was requesting tasks across multiple dimensions (state, date). Since I introduced tools that accomplish the task with code, the local model has not failed once.
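A minimal sketch of what these two tools might look like. Only the names fetch_dates and fetch_tasks_from_dates come from the description above; the file layout (a "## YYYY-MM-DD" heading per date block, "- [ ]" and "- [x]" checkboxes for tasks), the sample notes and the optional done parameter are assumptions made for illustration.

```python
import re
from datetime import date

# Hypothetical notes file: one "## YYYY-MM-DD" heading per date block.
NOTES = """\
## 2026-01-15
- [ ] Review budget
- [x] Water plants
Some free-form thought.

## 2026-01-14
- [x] Renew passport
- [ ] Call the plumber
"""

DATE_HEADING = re.compile(r"^## (\d{4}-\d{2}-\d{2})\s*$")
TASK_LINE = re.compile(r"^- \[( |x)\] (.+)$")


def fetch_dates(text):
    """Return (date, block_lines) pairs sorted by date."""
    blocks = {}
    current = None
    for line in text.splitlines():
        match = DATE_HEADING.match(line)
        if match:
            current = date.fromisoformat(match.group(1))
            blocks[current] = []
        elif current is not None:
            blocks[current].append(line)
    return sorted(blocks.items())


def fetch_tasks_from_dates(block_lines, done=None):
    """Return the tasks in one block.

    done=True or done=False filters by state; done=None returns every task.
    """
    tasks = []
    for line in block_lines:
        match = TASK_LINE.match(line)
        if not match:
            continue  # skip anything that is not a checkbox line
        is_done = match.group(1) == "x"
        if done is None or is_done == done:
            tasks.append({"text": match.group(2), "done": is_done})
    return tasks


today = date(2026, 1, 15)  # fixed so the example is reproducible
today_block = dict(fetch_dates(NOTES))[today]
print(fetch_tasks_from_dates(today_block, done=False))
# [{'text': 'Review budget', 'done': False}]
```

With the date lookup and the state filter both handled in code, the model only has to choose which function to call and with which arguments.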
Using frontier models for mundane tasks can sometimes feel like driving a Ferrari to get your groceries.
Where it gets very interesting (and dangerous!) is allowing Jarvis to write his own tools. Since Python is interpreted at runtime, I could even allow him to execute tools on the fly through eval(). In layman’s terms, Jarvis could create the Python tool and run it without ever saving it; the code is created and executed on the fly. I have not done this yet, mostly because I don’t think the local models are good enough, but I expect they will reach that capability within the next year.
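To make the idea concrete, here is a small sketch of running a tool that exists only as a string. One detail worth noting: Python’s eval() evaluates a single expression, so defining a whole function from text needs exec() instead. The helper run_generated_tool and the generated count_open_tasks function are hypothetical names for illustration, and nothing here is sandboxed, which is exactly why this is dangerous with untrusted model output.

```python
def run_generated_tool(source, func_name, *args):
    """Define a function from source text in a fresh namespace, then call it.

    WARNING: exec() runs arbitrary code with the permissions of this process.
    """
    namespace = {}
    exec(source, namespace)  # executes the def statement inside `namespace`
    return namespace[func_name](*args)


# Pretend the model produced this tool as plain text.
generated = '''
def count_open_tasks(lines):
    return sum(1 for line in lines if line.startswith("- [ ]"))
'''

result = run_generated_tool(
    generated, "count_open_tasks", ["- [ ] a", "- [x] b", "- [ ] c"]
)
print(result)  # 2
```

The tool is never written to disk; it lives only in the namespace dictionary for as long as the call needs it.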
Multiple models and a classifier
Jarvis is recursive. He also classifies every message I send using a tiny gemma model, then routes accordingly. If I ask a medical question, he will call himself but bypass the default model and select medgemma:27b, an open-source medical model that can answer medical questions and analyze medical imagery. If I ask him a deep reasoning question, he will do the same but call qwen3.6. This allows me to have small, focused models that are excellent at specific tasks.
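The routing step can be sketched as a lookup from category to model. The model names are the ones mentioned above; the classify function is only a keyword-matching stand-in for the tiny classifier model, and the category labels and keywords are assumptions made for illustration.

```python
DEFAULT_MODEL = "gemma4:26b"

# Category -> specialised model. Categories are hypothetical labels.
ROUTES = {
    "medical": "medgemma:27b",
    "reasoning": "qwen3.6",
}


def classify(message):
    """Stand-in for the small classifier model: plain keyword matching."""
    text = message.lower()
    if any(word in text for word in ("symptom", "dosage", "x-ray")):
        return "medical"
    if any(word in text for word in ("prove", "trade-off", "why")):
        return "reasoning"
    return "general"


def pick_model(message):
    """Return the model that should handle this message."""
    return ROUTES.get(classify(message), DEFAULT_MODEL)


print(pick_model("What dosage of ibuprofen is safe?"))  # medgemma:27b
print(pick_model("What are my tasks for today?"))       # gemma4:26b
```

In a real setup, classify would be a call to the small model and pick_model would feed the recursive call, but the lookup itself stays this simple.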
Why do I need frontier again?
I still use Claude Code for work, but I’m finding myself increasingly uninterested in frontier models for anything outside my professional life. I rarely send personal information out of my local network, so I’ve never been able to use frontier models for things like medical questions or analysis of personal information. Now I can just do all of this with my local setup. Here’s a non-exhaustive list of what I’m currently doing with local models:
Jarvis will use mental models like inversion or second-order thinking to debate me on topics and force me to think through issues I’m dealing with
If I ask a question that requires deeper research, Jarvis will call Jones and wait for them to finish, then present me with the synthesized report
If I add “QUESTION: ….” in my main markdown file, Jarvis will periodically scan the file for any questions that are unanswered, then attempt to find the answer and add it to the file below as “ANSWER: …”
Jarvis is both a CLI tool and a full TUI. I can call him from the command line and get a response, or load a full-fledged interface
When I open my mail, I process it with a ScanSnap scanner and discard the physical document. It’s then uploaded to Paperless-ngx, processed with glm-ocr to extract the text via OCR, categorized and titled by gemma4:26b, and then associated with the correct correspondent.
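The QUESTION/ANSWER scan from the list above could be sketched roughly as follows. The exact file conventions are assumptions: here a question counts as unanswered when the line right after it does not start with “ANSWER:”, and find_answer is a hypothetical callable standing in for whatever model or research call produces the answer.

```python
def answer_open_questions(text, find_answer):
    """Insert an ANSWER line under every QUESTION line that lacks one."""
    lines = text.splitlines()
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if not line.startswith("QUESTION:"):
            continue
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if next_line.startswith("ANSWER:"):
            continue  # already answered, leave it alone
        question = line[len("QUESTION:"):].strip()
        out.append(f"ANSWER: {find_answer(question)}")
    return "\n".join(out)


notes = (
    "QUESTION: What is the boiling point of water at sea level?\n"
    "QUESTION: Who wrote Dune?\n"
    "ANSWER: Frank Herbert"
)
# A placeholder answerer; a real one would call a model or a research agent.
updated = answer_open_questions(notes, lambda q: "(placeholder answer)")
print(updated)
```

Because answered questions are skipped, running the scan periodically over the same file does not add duplicate answers.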
Frontier models will always have their place, especially in enterprise settings where speed matters and in specific domains where broad general knowledge is important. But for personal knowledge management or routine tasks around the home, I cannot imagine using anything other than the models I have under my control.
Local models will keep getting better and cheaper. In the past few months Apple shipped an (albeit expensive) laptop that can run very large models as well as my desktop can. The hardware I had to buy to do this today will be obsolete in a year or two. I believe companies like Anthropic and OpenAI are moving aggressively into the enterprise space because they need to lock in contracts with big companies and provide guardrails that appeal to businesses; consumers will eventually just rely on local models provided by Google or Apple on their personal hardware, without needing to pay a hefty subscription fee to the big AI companies.



