Building a Personal RAG Chatbot in a Few Days: Learning by Engineering

How I built a small personal RAG chatbot using FastAPI, PostgreSQL, and Docker as a practical engineering exercise.

Recently, I built a small personal Retrieval-Augmented Generation (RAG) chatbot.

It was not a long research project.

It was not something I spent weeks architecting.

And it definitely was not built from years of prior AI engineering experience.

It was a compact engineering exercise built over a few days, mostly through reading, experimenting, and applying engineering fundamentals to a domain I had not worked in before.

That is exactly why I wanted to write about it.

This project reminded me of something I strongly believe:

Engineering is rarely about already knowing the exact technology.
It is about being able to decompose unfamiliar systems fast enough to build useful solutions.

The goal was simple.

I wanted a chatbot that could answer questions using my own technical writing and project documentation instead of relying purely on generic model knowledge.


The Problem

Large language models are powerful.

But they have an obvious limitation when you want domain-specific answers.

If someone asks:

“Tell me about your backend engineering experience”

a general model can generate something plausible.

But plausibility is not accuracy.

I wanted responses grounded in:

  • My project documentation
  • Technical notes
  • Markdown-based writing
  • Structured knowledge I control

This meant I needed retrieval.

Instead of expecting the model to already know my data, I wanted it to fetch relevant context dynamically.

That naturally led to Retrieval-Augmented Generation.


Why RAG?

The obvious alternative was fine-tuning.

At first glance, that sounds attractive.

Train the model directly on your data and let it internalize your knowledge.

But for this use case, it would have introduced unnecessary complexity:

  • Longer experimentation cycles
  • Retraining after updates
  • Higher compute cost
  • Harder debugging

RAG offered something much simpler.

It separates knowledge storage from generation.

That means updating the system is as simple as updating documents and re-indexing.

No retraining required.

For a lightweight personal knowledge system, that architectural simplicity mattered.


System Architecture

The system follows a simple pipeline:

Markdown Documents → Document Parser → Chunking → Embeddings → PostgreSQL Storage → Semantic Retrieval → Prompt Construction → LLM Response

Each layer has one responsibility.

This separation made iteration much easier.

If responses were weak, I could inspect retrieval.

If retrieval was weak, I could inspect chunking.

Keeping concerns isolated made debugging straightforward.


Why FastAPI

Coming from a Flask and Django background, FastAPI felt like the right tool.

It provided:

  • Strong request validation
  • Async support
  • Clean structure
  • Explicit typing

A typical request model looked like:

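from pydantic import BaseModel


class ChatRequest(BaseModel):
    message: str    # the user's question text
    user_hash: str  # per-user identifier (presumably a hash)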

FastAPI also made the project easy to organize:

app/
  api/
  services/
  models/
  retrieval/
  config/

That modularity became very useful as the system evolved.


Why PostgreSQL

A common question is:

Why not use a dedicated vector database?

Tools like Pinecone and Weaviate are excellent.

But this project had different priorities.

I wanted:

  • Low operational complexity
  • Minimal infrastructure overhead
  • Familiar tooling
  • Easy deployment

PostgreSQL offered the right balance.

This project was about understanding retrieval mechanics, not building hyperscale search infrastructure.

It reinforced an important engineering lesson:

The best tool is often the simplest one that solves the actual problem.
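
The post does not show the storage layer, so what follows is only a sketch of one common way to run vector search inside PostgreSQL: the pgvector extension, which may or may not be what this project used. The table name `chunks`, its columns, the 384-dimension size, the psycopg-style `%s` placeholders, and both helper functions are hypothetical.

```python
from typing import Sequence

# Assumed schema (hypothetical names; requires the pgvector extension).
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id        BIGSERIAL PRIMARY KEY,
    source    TEXT NOT NULL,
    content   TEXT NOT NULL,
    embedding vector(384) NOT NULL
);
"""

# `<=>` is pgvector's cosine-distance operator; smaller means more similar.
SEARCH_SQL = """
SELECT source, content, embedding <=> %s::vector AS distance
FROM chunks
ORDER BY distance
LIMIT %s;
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Format floats in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def search_params(query_embedding: Sequence[float], top_k: int = 5) -> tuple[str, int]:
    """Build the parameter tuple to pass alongside SEARCH_SQL."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return (to_vector_literal(query_embedding), top_k)
```

With a driver such as psycopg, the call would look roughly like `cursor.execute(SEARCH_SQL, search_params(embedding))`. Keeping the SQL as plain strings means the whole retrieval step stays inspectable with ordinary PostgreSQL tooling.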


The Real Challenge: Chunking

One of the most interesting lessons was how important chunking is.

At first, I tried fixed-length chunking.

It worked, but retrieval quality was inconsistent.

Why?

Because semantic meaning often spans logical sections.

Breaking content purely by character count often destroys context.

A better approach was preserving:

  • Section boundaries
  • Paragraph grouping
  • Topic continuity

This dramatically improved retrieval quality.

It quickly became clear that many “model quality” problems are actually retrieval preparation problems.
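
To make the section-aware idea concrete, here is a minimal sketch, not the project's actual chunker. It assumes ATX-style Markdown headings (`#` to `######`), treats blank lines as paragraph breaks, and does not special-case fenced code blocks. The function name `chunk_markdown` and the `max_chars` default are made up for illustration.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_markdown(text: str, max_chars: int = 800) -> list[dict]:
    """Split Markdown into chunks that never cross a heading boundary.

    Paragraphs under one heading are grouped until max_chars is reached.
    A single paragraph longer than max_chars is kept whole, not cut.
    """
    chunks: list[dict] = []
    heading = ""
    paragraphs: list[str] = []  # paragraphs collected for the current chunk
    lines: list[str] = []       # lines of the paragraph being read

    def end_chunk() -> None:
        if paragraphs:
            chunks.append({"heading": heading, "text": "\n\n".join(paragraphs)})
            paragraphs.clear()

    def end_paragraph() -> None:
        if lines:
            paragraph = "\n".join(lines)
            used = sum(len(p) for p in paragraphs)
            # Start a new chunk if adding this paragraph would exceed the budget.
            if paragraphs and used + len(paragraph) > max_chars:
                end_chunk()
            paragraphs.append(paragraph)
            lines.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # A heading always closes the current chunk.
            end_paragraph()
            end_chunk()
            heading = match.group(1).strip()
        elif not line.strip():
            end_paragraph()
        else:
            lines.append(line.rstrip())
    end_paragraph()
    end_chunk()
    return chunks
```

Each chunk carries its heading, so the heading can be stored alongside the text or prepended to it before embedding. That keeps some topic context attached even when a long section is split into several chunks.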


Retrieval Flow

When a user sends a query, the system follows this process:

1. Validate request

The API receives and normalizes the query.

2. Generate query embeddings

The query is converted into a vector representation.

3. Search semantically similar chunks

PostgreSQL retrieves the closest matches.

4. Construct prompt context

Relevant chunks are assembled into the context window.

5. Generate response

The model responds using retrieved context.

Conceptually simple.

In practice, the challenge is tuning each stage well enough that the final context stays useful.
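
Steps 1 to 3 can be sketched in plain Python. This is an illustration under loud assumptions: `toy_embed` is a word-count stand-in for a real embedding model, and the in-memory list stands in for the PostgreSQL search. All function names here are hypothetical.

```python
import math
import re


def normalize_query(query: str) -> str:
    """Collapse whitespace and reject empty input (step 1)."""
    cleaned = " ".join(query.split())
    if not cleaned:
        raise ValueError("query must not be empty")
    return cleaned


def toy_embed(text: str, vocabulary: list[str]) -> list[float]:
    """Stand-in for an embedding model: counts vocabulary words (step 2)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [float(words.count(term)) for term in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def retrieve(
    query_vector: list[float],
    index: list[tuple[str, list[float]]],
    top_k: int = 3,
) -> list[tuple[float, str]]:
    """Return the top_k (score, chunk) pairs, best first (step 3)."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    vocab = ["fastapi", "postgresql", "docker"]
    docs = ["FastAPI handles validation", "PostgreSQL stores vectors"]
    index = [(doc, toy_embed(doc, vocab)) for doc in docs]
    query = normalize_query("  how is   postgresql used ")
    print(retrieve(toy_embed(query, vocab), index, top_k=1))
```

Returning the score alongside each chunk makes it easier to see whether a weak answer came from weak retrieval, which is the kind of stage-by-stage inspection described in the architecture section.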


Prompt Engineering

Retrieval alone is not enough.

The model still needs behavioral constraints.

My early prompts were too permissive.

That caused:

  • Unsupported assumptions
  • Overconfident answers
  • Context stretching

The solution was stronger grounding rules:

Answer only using retrieved context.
If information is unavailable, explicitly say so.
Do not fabricate details.

That single change noticeably improved trustworthiness.
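
As an illustration of how such rules might be wired into prompt construction, here is a small sketch. The grounding rules are quoted from above, but the prompt layout, the `build_prompt` name, the character budget, and the fallback text are assumptions, not the project's actual template.

```python
GROUNDING_RULES = (
    "Answer only using retrieved context.\n"
    "If information is unavailable, explicitly say so.\n"
    "Do not fabricate details."
)


def build_prompt(question: str, chunks: list[str], max_context_chars: int = 4000) -> str:
    """Assemble rules, numbered context chunks, and the question into one prompt.

    Chunks are added in the order given (assumed best-first) until the
    character budget is used up; the rest are dropped.
    """
    parts: list[str] = []
    used = 0
    for number, chunk in enumerate(chunks, start=1):
        entry = f"[{number}] {chunk.strip()}"
        if used + len(entry) > max_context_chars:
            break
        parts.append(entry)
        used += len(entry)

    # Say so explicitly when nothing was retrieved, so the model can decline.
    context = "\n\n".join(parts) if parts else "(no relevant context found)"
    return (
        f"{GROUNDING_RULES}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

Numbering the chunks is optional, but it gives the model something to cite and makes it easier to trace an answer back to the chunk it came from.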


Dockerized Deployment

I wanted consistency between local development and deployment.

Docker solved that cleanly.

It provided:

  • Environment reproducibility
  • Dependency isolation
  • Easier deployment workflows

This reduced friction significantly during iteration.


Challenges I Hit

Even for a small project, several practical challenges appeared.

Retrieval tuning

Small chunking changes produced very different behavior.

This required several iterations.

Prompt strictness

Loose prompts made the system sound smarter than it actually was.

Tighter constraints improved reliability.

Deployment details

Reverse proxy routing, container orchestration, and infrastructure edge cases still needed attention.

These are often the least glamorous parts of engineering, but they are what make systems usable.


What This Reinforced

The biggest takeaway was not about RAG itself.

It was about engineering.

This project reminded me that learning unfamiliar domains is usually not about waiting until you feel ready.

It is about:

  • Understanding system boundaries
  • Breaking problems into layers
  • Reading enough to move forward
  • Building fast feedback loops
  • Iterating quickly

I had never built a retrieval-augmented system before.

That did not matter.

The same engineering process applied.

And that is exactly what makes software engineering transferable.


What’s Next

If I keep iterating on this project, I would like to explore:

  • Hybrid search
  • Reranking layers
  • Streaming responses
  • Conversational memory
  • Retrieval benchmarking
  • Better observability

There is still a lot to improve.

But as a compact engineering exercise, it achieved exactly what I wanted.

It helped me learn by building.


Final Thoughts

This project was intentionally small.

It was built quickly.

It was exploratory.

And that is what made it valuable.

Sometimes the fastest way to learn an unfamiliar technical domain is not endless theory.

It is building something real.

Even if small.

This chatbot was exactly that.

A practical example of learning by engineering.


Source Code

https://github.com/jeyem/personal-chatbot