# I Built a Local-First VS Code Code Mentor with Gemma 4 — Your Code Never Leaves Your Machine

*Enny Rodríguez · DEV Community*

*This is a submission for the Gemma 4 Challenge: Build with Gemma 4.*

Most AI coding tools ask for the same tradeoff: "Give me your code, and I'll give you help." I wanted to try the opposite. What if a coding mentor lived inside VS Code, understood your repository, helped with real developer tasks, and kept your code on your own machine by default?

So I built **Gemma Local Code Mentor**.

## What I Built

Gemma Local Code Mentor is a local-first VS Code extension powered by Gemma 4. It can:

- Explain selected code
- Suggest refactors
- Generate tests
- Summarize files
- Summarize repository architecture
- Answer questions about the repo
- Run through a local FastAPI backend
- Use Ollama as the default local model runtime
- Keep Local Only Mode enabled by default

No telemetry.

I built a VS Code extension plus a Dockerized FastAPI backend for developers who want AI help without sending private code to a remote API.

The workflow is simple:

1. Select code in VS Code.
2. Run a Gemma: command.
3. The extension sends context to 127.0.0.1:8765.
4. The backend builds a task-specific prompt.
5. Gemma 4 responds through a local provider.
6. The result appears in a VS Code side panel.

The extension currently includes these commands:

- Gemma: Explain Selection
- Gemma: Refactor Selection
- Gemma: Generate Tests
- Gemma: Summarize File
- Gemma: Summarize Architecture
- Gemma: Ask Repository
- Gemma: Toggle Local Only Mode
- Gemma: Open Panel

This is not just a chat box glued into an editor. The backend has structured prompt builders, response parsing, provider routing, tests, repository context handling, and privacy checks.

## Why Local-First

There are many AI coding assistants now, but the privacy model often feels backwards. For open source code, cloud tools are usually fine. For client code, internal company projects, security-sensitive prototypes, or early startup ideas, uploading code somewhere else can be a blocker.

I wanted a coding assistant with different defaults:

| Feature | Typical Cloud Assistant | Gemma Local Code Mentor |
| --- | --- | --- |
| Runs in VS Code | Yes | Yes |
| Explains code | Yes | Yes |
| Generates tests | Yes | Yes |
| Refactors code | Yes | Yes |
| Sends code to the cloud | Often | No by default |
| Works with local models | Usually no | Yes |
| Has a local-only switch | Rare | Yes |
| Hackable by contributors | Limited | Fully open source |

The goal is not to beat every commercial coding assistant. The goal is to prove that a useful AI coding mentor can be local-first from day one.

## Demo Flow

Suggested demo flow:

1. Open a real code file in VS Code.
2. Select a function.
3. Run Gemma: Explain Selection.
4. Run Gemma: Generate Tests.
5. Ask a repository-level question.
6. Show the side panel with Local Only Mode: ON.
7. Show the backend running locally.

## The Repository

Repository: [ennydev-2026 / GemmaLocalCodeMentor](https://github.com/ennydev-2026/GemmaLocalCodeMentor)

Gemma Local Code Mentor is a local-first VSCode extension and Dockerized FastAPI backend for explaining, refactoring, testing, and summarizing code with local Gemma models.

### What It Does

The project runs on the developer's machine:

- VSCode extension in TypeScript.
- Local FastAPI backend on 127.0.0.1:8765.
- Ollama as the default local model runtime.
- Local sample provider for development and tests without installed models.
- Double-model routing (a minimal sketch follows this list):
  - Fast model for short explanations and lightweight chat.
  - Deep model for refactors, tests, architecture, and larger context.
- Local Only Mode enabled by default.
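To make the double-model routing above concrete, here is a minimal sketch of how a router like this could pick between the two models. The function name, task labels, model tags, and context-size threshold are illustrative assumptions on my part, not the repository's actual code.

```python
# Minimal sketch of double-model routing (illustrative only; names,
# thresholds, and model tags are assumptions, not the project's real code).

FAST_MODEL = "gemma-fast"   # e.g. a small Gemma tag served by Ollama
DEEP_MODEL = "gemma-deep"   # e.g. a larger Gemma tag served by Ollama

DEEP_TASKS = {"refactor", "generate_tests", "summarize_architecture", "ask_repo"}

def choose_model(task: str, context_chars: int, mode: str = "auto") -> str:
    """Pick a model by explicit mode first, then task type, then context size."""
    if mode == "fast":
        return FAST_MODEL
    if mode == "deep":
        return DEEP_MODEL
    # auto: deep reasoning tasks or large context go to the deep model
    if task in DEEP_TASKS or context_chars > 4_000:
        return DEEP_MODEL
    return FAST_MODEL

print(choose_model("explain_selection", 350))      # gemma-fast
print(choose_model("generate_tests", 350))         # gemma-deep
print(choose_model("explain_selection", 12_000))   # gemma-deep
```

The real rules may differ; the point is that routing is a small, testable decision rather than a single hardcoded model call.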
### Architecture

```mermaid
flowchart LR
    A["VSCode Extension"] --> B["FastAPI Backend :8765"]
    B --> C["Prompt Orchestrator"]
    B --> D["Repo Context Builder"]
    B --> E["Local Index Store"]
    B --> F["Model Router"]
    F --> G["Fast Gemma Model"]
    F --> H["Deep Gemma Model"]
    G --> I["Ollama"]
    H --> I
    B --> J["Response Parser"]
    J --> A
```

### Commands

- gemma.explainSelection
- gemma.refactorSelection
- gemma.generateTests
- gemma.summarizeFile
- gemma.summarizeArchitecture
- gemma.askRepo
- gemma.togglePrivacyMode
- gemma.openPanel

…

Direct link: https://github.com/ennydev-2026/GemmaLocalCodeMentor

## How I Used Gemma 4

I used Gemma 4 as the reasoning layer behind the local code mentor. The project is designed around two model roles:

- Gemma 4 E4B for fast tasks like short explanations and lightweight chat
- Gemma 4 31B Dense for deeper tasks like refactoring, test generation, architecture summaries, and larger context

That choice was intentional. A code mentor should not use the largest model for every single request. If I ask what a small function does, I want a fast answer. If I ask for tests, architecture, or a refactor, I want deeper reasoning.

So the backend includes a model router:

- fast mode uses the fast model
- deep mode uses the deep model
- auto mode chooses based on task type and context size

This makes Gemma 4 feel more like a practical local development tool instead of a single hardcoded model call.

```mermaid
flowchart LR
    A["VS Code Extension"] --> B["FastAPI Backend on 127.0.0.1:8765"]
    B --> C["Prompt Builders"]
    B --> D["Repository Context Builder"]
    B --> E["Model Router"]
    E --> F["Gemma 4 E4B Fast Model"]
    E --> G["Gemma 4 31B Dense Deep Model"]
    F --> H["Ollama"]
    G --> H
    B --> I["JSON Response Parser"]
    I --> A
```

## The Stack

- VS Code extension in TypeScript
- FastAPI backend in Python
- Ollama as the default local runtime
- Docker support
- Mock provider for development and tests
- .gemmaignore support
- Local URL safety checks
- Backend test coverage with pytest

## Local-First Is a Product Decision

The privacy layer is not just a README promise. The repo includes:

- Local Only Mode enabled by default
- Backend URL validation
- No telemetry
- No cloud fallback
- No external API calls while local-only is enabled
- .gemmaignore for excluding sensitive files
- Mock mode so contributors can work without installing a model first

That matters because local AI changes who can safely use these tools. A freelancer can use it on client code.

## Try It

Backend:

```bash
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --host 127.0.0.1 --port 8765 --reload
```

Mock mode, no model required:

```bash
cd backend
GEMMA_PROVIDER=mock uvicorn app.main:app --host 127.0.0.1 --port 8765 --reload
```

Extension:

```bash
cd extension
npm install
npm run compile
```

Then open the project in VS Code, press F5, and run any Gemma: command.

## Contributing

This is where I want the community involved. I would love contributors for:

- Better repository indexing
- Smarter prompt templates
- More language-aware code analysis
- Inline code actions
- Diff previews before applying refactors
- Local embeddings for repo search
- Better test framework detection
- llama.cpp provider support
- MLX provider support
- A polished marketplace-ready VSIX
- UI improvements for the side panel

If you care about local AI, open models, privacy-respecting devtools, or VS Code extensions, jump in. Fork it.

You can install the extension directly in VS Code using this identifier:

```text
ennydev-2026.gemma-local-code-mentor
```
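Under the hood, every Gemma: command comes down to a small HTTP call that never leaves your machine. The sketch below shows roughly what such a request could look like; the endpoint path and payload field names are my own illustrative assumptions, not the project's documented API.

```python
# Illustrative only: the "/task" path and the field names below are assumptions,
# not the extension's actual API. The point is that the whole round trip
# stays on 127.0.0.1.
import requests

BACKEND = "http://127.0.0.1:8765"

payload = {
    "task": "explain_selection",   # which Gemma: command was run
    "language": "python",          # language of the selected code
    "code": "def add(a, b):\n    return a + b",
    "local_only": True,            # Local Only Mode flag
}

response = requests.post(f"{BACKEND}/task", json=payload, timeout=120)
response.raise_for_status()
print(response.json())             # parsed answer shown in the side panel
```

Because the backend binds to 127.0.0.1 and Local Only Mode is the default, nothing in this round trip touches anything beyond the loopback interface.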
## Final Thought

AI coding tools are becoming part of the daily developer workflow. That means defaults matter.

The default should not always be:

> "Upload your code first."

Sometimes the best place for your code is exactly where it already is: **on your machine.**

What would you add to a local-first VS Code code mentor?