Building RAG Applications From Scratch: The No-BS Guide for ML Engineers and Data Scientists
GenAI Series Part 3: Hands-On Code and Optimization Tips for Production Systems
Hey folks! Today, we will be diving into the world of Retrieval-Augmented Generation (RAG) - arguably the coolest advancement in AI applications right now. If you've been anywhere near the LLM space, you've heard this term thrown around constantly. But I've noticed a lot of the content out there is either too theoretical, too scattered, or too light on the practical steps to actually build something useful.
So let's fix that!
In this post, I'll break down what RAG is, why it matters, and most importantly, show you how to build a working RAG system from scratch using open-source tools. We'll create a movie and series recommendation engine, and I'll share every bit of code you need to get it running.
Throughout the post, I'll also suggest alternative tools at each step. The RAG ecosystem is evolving rapidly, and there's often more than one way to implement each component, which can get confusing. Knowing your tools helps you customize your approach based on your specific needs, constraints, or preferences.
Also, just for your info: in the last few posts on GenAI, I talked about the GenAI architectural journey, prompt engineering fundamentals, and Vibe Coding. Do take a look at them as well.
Are you ready to level up your LLM skills? Check out the Generative AI Engineering with LLMs Specialization on Coursera! This comprehensive program takes you from LLM basics to advanced production engineering, covering RAG, fine-tuning, and building complete AI systems. Want to go from dabbling to deployment? This is your next step!
All code for this post can be found in my GenAI repo.
Why RAG Is a Game-Changer
So, let's start with the obvious question: why should you care about RAG?
Think about it - Large Language Models (LLMs) like GPT-4 are great, but they suffer from four major problems:
Knowledge cutoffs - They only know what they were trained on
No access to your data - They can't see your private documents
Hallucinations - They can make up plausible-sounding but completely wrong answers
No citations - They can't tell you where their information came from
"Wait," I hear some of you saying, "but don't models like Gemini and Claude already have search capabilities and citation features?"
That's true! Modern flagship LLMs have evolved to integrate with search engines and can provide citations from web sources. But there are a few ways in which custom RAG still wins:
Private data access - General LLMs with search can only access public web content, not your internal documents, databases, or proprietary information
Customized retrieval - With your own RAG pipeline, you control exactly how the retrieval works, what sources are prioritized, and how context is formatted
Specialized knowledge domains - For technical, medical, legal, or other niche information, a custom RAG with curated sources normally beats a generic web search
Cost control - Running searches through LLMs can get expensive at scale, while custom RAG can be optimized for your specific use case
No connectivity required - A custom RAG system can work entirely offline once set up
What RAG provides us with is a way to "augment" the LLM with external knowledge. Instead of the LLM trying to answer everything from its own parameters or a general web search, we retrieve relevant information from our own data sources and include it in the prompt.
And the best thing is that you don't need to fine-tune or train models yourself. This makes RAG incredibly cost-effective compared to full model fine-tuning, while giving you complete control over your information sources.
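The retrieve-then-augment idea is small enough to show in a few lines. Here's a minimal sketch: the keyword-overlap retriever below is a toy stand-in I'm using for illustration (a real system would use embeddings and a vector store, as we'll build later), but the prompt packaging is the essence of RAG.

```python
# Toy illustration of the core RAG idea: retrieve relevant snippets from
# our own data and include them in the prompt, instead of asking the LLM
# to answer purely from its parameters.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many query words they contain (a placeholder
    for real embedding similarity) and return the top-k matches."""
    words = set(query.lower().split())
    ranked = sorted(documents, key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Package the retrieved context and the user question for the LLM."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using ONLY the context below.\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )

docs = [
    "Inception is a sci-fi thriller about dreams within dreams.",
    "The Office is a mockumentary sitcom set in a paper company.",
    "Interstellar is a sci-fi epic about space travel and time dilation.",
]
query = "recommend a sci-fi movie about dreams"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)
```

The LLM now answers grounded in the retrieved snippets, which is exactly where the hallucination and citation benefits come from.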
Core Components of a RAG System
So, when we get into it, a RAG system generally has three main parts:
Document processing pipeline - Ingests, cleans, and chunks your data
Retrieval system - Creates embeddings and finds relevant information
Generation component - Packages everything for the LLM to create the final response
Let's see how these components work together:
Simple concept, but the magic is in the implementation details. So, let's get practical.
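Before we dive into the real implementation, here's a sketch of the three components wired end to end. Everything in it is a toy stand-in under stated assumptions: word-overlap scoring instead of embeddings, a returned prompt string instead of an actual LLM call, and made-up example documents. The rest of the post swaps real open-source tools into each stage.

```python
# The three RAG components as one tiny pipeline, with toy stand-ins
# at each stage.

class ToyRAGPipeline:
    def __init__(self, raw_docs: list[str], chunk_size: int = 40):
        # 1) Document processing: split each document into
        #    fixed-size word chunks.
        self.chunks = []
        for doc in raw_docs:
            words = doc.split()
            for i in range(0, len(words), chunk_size):
                self.chunks.append(" ".join(words[i:i + chunk_size]))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # 2) Retrieval: rank chunks by word overlap with the query
        #    (a placeholder for embedding similarity search).
        q = set(query.lower().split())
        ranked = sorted(self.chunks,
                        key=lambda c: -len(q & set(c.lower().split())))
        return ranked[:k]

    def generate(self, query: str) -> str:
        # 3) Generation: package retrieved context + question into the
        #    final prompt (an LLM call would go here).
        context = "\n".join(f"- {c}" for c in self.retrieve(query))
        return f"Context:\n{context}\n\nQuestion: {query}"

pipeline = ToyRAGPipeline([
    "Blade Runner 2049 is a neo-noir sci-fi film about replicants.",
    "Parks and Recreation is a sitcom about local government.",
])
print(pipeline.generate("suggest a sci-fi film"))
```

Each method maps one-to-one onto the components listed above, so as we upgrade each stage (proper chunking, embeddings, a vector database, a real LLM), the overall shape of the system stays the same.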