On-the-Fly Document Context with RamaLama

Over the past year, RamaLama has been steadily adding features to improve the developer experience of building and testing retrieval-augmented generation (RAG) applications in containerized environments. One persistent request has been the ability to supply local document context directly to the chat endpoint, without needing to deploy a full RAG pipeline or stand up additional infrastructure.

Developers experimenting with LLMs often want to quickly ask questions about local documentation, code, or reports, or validate that a model can answer queries against real-world project files. In practice, spinning up a vector store or RAG stack just to get a one-off answer slows down iteration and makes debugging harder. In many cases, it’s overkill for the problem at hand.

What’s New

RamaLama now supports a simple --rag flag on the chat command. It lets you inject document context into a model session by passing a file or directory path.

For example:

ramalama chat --rag /path/to/README.md

This attaches the content of the README file to the chat session, allowing you to ask questions like “summarize the readme” or “what is the minimum Python version required by this project?” without any additional setup.

The same applies to directories:

ramalama chat --rag ./docs/

The CLI will recursively collect documents (and images, if present), making their content available as direct context to the model for that session. There’s no database, indexing, or persistent state; all context is transient, and nothing leaves the local environment.
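For instance, given a hypothetical project tree like the one below (the file names are purely illustrative), a single invocation makes every readable file, including the image, available to the model for the duration of that session:

docs/
├── architecture.md
├── api-reference.md
└── diagrams/
    └── overview.png

ramalama chat --rag ./docs/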

Example Use Cases

  • Documentation QA: Validate or summarize local docs before pushing changes.
  • Codebase Exploration: Ask about project structure, dependencies, or code snippets directly from source files.
  • Ad-hoc Analysis: Quickly query research PDFs or meeting notes with zero setup.
  • Multi-modal Preview: Pass images alongside text for basic multimodal questions (with support expanding).
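Most of these boil down to a single invocation; the paths below are illustrative:

ramalama chat --rag ./docs/release-notes/     # documentation QA
ramalama chat --rag ./src/                    # codebase exploration
ramalama chat --rag ./notes/meeting-notes.md  # ad-hoc analysis of a single file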

Why This Matters

Most developers prototyping RAG-style features or testing document QA want tight feedback loops. Large language models are increasingly context-hungry, and there’s an expectation that they should “just work” with local files—especially in isolated or containerized environments where privacy and reproducibility matter.

By enabling on-the-fly document context, RamaLama reduces the friction for quick experiments, debugging, and team onboarding. It also serves as a natural stepping stone towards more robust retrieval workflows: developers can start by passing files directly, then later swap in a proper retriever or indexer as needed, all within the same containerized deployment.

Moving to Full RAG Workflows

When your workflow grows beyond ad-hoc questions and you’re ready to build a reusable, production-grade RAG pipeline, RamaLama makes the transition straightforward. The ramalama rag command preprocesses your documents and packages them as an image:

ramalama rag /path/to/docs quay.io/myrepository/ragdata

This command converts and indexes your documents, packaging the result as a self-contained container image that can be reused across environments, shared via container registries, and deployed anywhere your models are running. You get the main benefits of a full RAG system, namely fast retrieval, persistent indexes, and reproducible builds, while keeping the operational simplicity of containerization.
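Once built, the image can be attached to a model at run or serve time. The sketch below assumes the --rag option on ramalama run and ramalama serve accepts the OCI image reference produced above, and “granite” stands in for whichever model you actually use:

ramalama run --rag quay.io/myrepository/ragdata granite     # interactive session backed by the packaged index
ramalama serve --rag quay.io/myrepository/ragdata granite   # serve the same setup as a REST API endpoint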

Getting Started

The feature is available now. To use it, just update RamaLama and run:

ramalama chat --rag /path/to/your/docs

This approach is simple, fast, and lets you validate RAG ideas before investing in infrastructure. As always, feedback and issue reports are welcome.