Using Gemini CLI with a Local LLM

Introduction to Gemini CLI and Local LLM

Gemini CLI is an open-source AI agent that allows interaction with Gemini models from the terminal. Normally, it connects to Google’s API endpoint, but by redirecting the API destination, you can use a locally running LLM as its backend.

Architecture Overview

Setting the GOOGLE_GEMINI_BASE_URL environment variable redirects Gemini CLI’s API requests to an arbitrary endpoint. Here that endpoint is LiteLLM Proxy, which exposes Gemini API-compatible endpoints and relays incoming requests to a local model running on Ollama.
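As a rough illustration of the redirection, the sketch below models how a request target changes once GOOGLE_GEMINI_BASE_URL is set. The default host, the v1beta/models/...:generateContent path shape, the proxy address http://localhost:4000, the example model name, and the helper request_url are all assumptions made for illustration; Gemini CLI's actual request construction may differ.

```python
# Illustrative only: models how a base-URL override changes the request target.
DEFAULT_BASE_URL = "https://generativelanguage.googleapis.com"  # assumed default host


def request_url(model: str, env: dict[str, str]) -> str:
    """Return the URL a generateContent call would target (hypothetical helper)."""
    base = env.get("GOOGLE_GEMINI_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    # Assumed path shape of a Gemini API generateContent request.
    return f"{base}/v1beta/models/{model}:generateContent"


# Without the override, the request targets Google's endpoint.
print(request_url("gemini-2.5-pro", {}))

# With the override, the same request targets the local proxy (assumed port 4000).
print(request_url("gemini-2.5-pro", {"GOOGLE_GEMINI_BASE_URL": "http://localhost:4000"}))
```

The client itself is unchanged in this model; only the base URL differs, which is why the proxy has to accept Gemini-style requests.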

Setup

The setup has three steps. First, install Ollama and pull a model. Then, install LiteLLM and configure LiteLLM Proxy. Finally, start the proxy, set the environment variables, and start Gemini CLI.

Installing Ollama and Pulling a Model

Install Ollama via Homebrew and start it as a service. Then pull a model whose Ollama template supports tool calling, such as qwen2.5:3b.
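One rough way to check a candidate model is to look at its template text for a reference to tools. The sketch below is a heuristic under stated assumptions, not an official check: it assumes tool-capable templates reference a .Tools field, the helper template_mentions_tools is hypothetical, and the two template fragments are made up for illustration rather than copied from any real model.

```python
def template_mentions_tools(template: str) -> bool:
    """Heuristic (assumption): tool-capable Ollama templates reference `.Tools`."""
    return ".Tools" in template


# Hypothetical template fragments, not taken from any real model.
with_tools = "{{- if .Tools }}Available tools: {{ .Tools }}{{ end }}{{ .Prompt }}"
without_tools = "{{ .System }} {{ .Prompt }}"

print(template_mentions_tools(with_tools))     # True
print(template_mentions_tools(without_tools))  # False
```

In practice the template text would come from the model itself, for example from the output of ollama show for that model. A match only suggests the template handles tools; it does not guarantee the model uses them well.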

Installing and Configuring LiteLLM

Create a Python virtual environment and install LiteLLM. Then configure LiteLLM Proxy by creating a litellm_config.yaml file that points at the Ollama model and defines a model_group_alias entry for each model name Gemini CLI requests.
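To make the shape of that file concrete, here is a minimal sketch that renders a candidate litellm_config.yaml as text. Everything specific in it is an assumption for illustration: the group name local-model, the ollama_chat/ model prefix, the Ollama address http://localhost:11434, the placement of model_group_alias under router_settings, and the two Gemini model names. Check the names your Gemini CLI version actually requests before relying on it.

```python
def render_config(
    ollama_model: str,
    aliases: list[str],
    group: str = "local-model",                   # hypothetical group name
    api_base: str = "http://localhost:11434",     # assumed Ollama address
) -> str:
    """Render a candidate litellm_config.yaml as text (layout is an assumption)."""
    lines = [
        "model_list:",
        f"  - model_name: {group}",
        "    litellm_params:",
        f"      model: ollama_chat/{ollama_model}",
        f"      api_base: {api_base}",
        "router_settings:",
        "  model_group_alias:",
    ]
    # Map every model name the client may request onto the single local group.
    lines += [f'    "{name}": "{group}"' for name in aliases]
    return "\n".join(lines) + "\n"


# Example model names are placeholders, not a confirmed list.
config_text = render_config("qwen2.5:3b", ["gemini-2.5-pro", "gemini-2.5-flash"])
print(config_text)
```

Writing config_text to litellm_config.yaml, for instance with pathlib.Path.write_text, would produce the file the proxy reads at startup.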

Starting the Proxy and Gemini CLI

Start the proxy with the config file. Then set the environment variables, including GOOGLE_GEMINI_BASE_URL pointed at the proxy, and start Gemini CLI. Gemini CLI should now return responses generated by the local LLM.
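The start-up sequence can be sketched as a small launcher. This is an illustration under assumptions rather than a recommended script: it assumes the proxy is started with the litellm command and its --config and --port options, that it listens on port 4000, that Gemini CLI is invoked as gemini, and that a placeholder GEMINI_API_KEY is enough for the CLI when requests go to the proxy. The helper names are hypothetical.

```python
import os
import subprocess

PROXY_URL = "http://localhost:4000"  # assumed proxy address


def proxy_command(config_path: str = "litellm_config.yaml", port: int = 4000) -> list[str]:
    """Build the command line that starts the proxy with the config file."""
    return ["litellm", "--config", config_path, "--port", str(port)]


def gemini_env(base_env: dict[str, str], proxy_url: str = PROXY_URL) -> dict[str, str]:
    """Copy the environment and point Gemini CLI at the proxy."""
    env = dict(base_env)
    env["GOOGLE_GEMINI_BASE_URL"] = proxy_url
    # Assumption: a placeholder key is sufficient when talking to the proxy.
    env.setdefault("GEMINI_API_KEY", "dummy-key")
    return env


def launch() -> None:
    """Start the proxy, run Gemini CLI against it, then stop the proxy."""
    proxy = subprocess.Popen(proxy_command())
    try:
        # A real script would wait here until the proxy is accepting connections.
        subprocess.run(["gemini"], env=gemini_env(dict(os.environ)), check=False)
    finally:
        proxy.terminate()
```

Calling launch() runs both processes; the two helper functions can also be used on their own to print the command and variables for manual use in two terminals.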

Gotchas During Setup

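Two issues are easy to hit. First, missing model_group_alias entries can cause 500 errors: a request fails when Gemini CLI asks for a model name that the proxy has no alias for. Monitor the proxy logs for requested model names and add any missing entries to model_group_alias as they appear. Second, the model must support tool calling in its Ollama template.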
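The alias bookkeeping can be sketched as a simple set difference. The helper missing_aliases, the group name local-model, and the model names are hypothetical; how you collect the requested names from the proxy logs depends on your LiteLLM version and log settings, so that step is left out.

```python
def missing_aliases(requested: set[str], aliases: dict[str, str]) -> list[str]:
    """Return requested model names that have no model_group_alias entry yet."""
    return sorted(requested - set(aliases))


# Hypothetical data: aliases already configured and names seen in the proxy logs.
aliases = {"gemini-2.5-pro": "local-model"}
requested = {"gemini-2.5-pro", "gemini-2.5-flash"}

# Print candidate lines to add under model_group_alias.
for name in missing_aliases(requested, aliases):
    print(f'    "{name}": "local-model"')
```

Rerunning a check like this after a Gemini CLI update is a cheap way to catch newly requested model names before they surface as 500 errors.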

Conclusion

Combining LiteLLM Proxy and Ollama allows you to swap Gemini CLI’s backend for a local LLM. While the setup is relatively straightforward, there are a few things to watch for, such as changes in the model names Gemini CLI requests and whether a given Ollama model supports tool calling.