Get Started With LangChain
If you are a developer with FOMO about the current AI wave because you don't know how to collect data and train your own "AI" model, you don't have to sit on the sidelines, because you don't need to reinvent the wheel!
Hello LangChain! 👋
LangChain is a framework designed to simplify creating and managing applications that use language models. It helps developers build complex applications by combining components like input processing, model querying, and output handling.
If that definition left you perplexed, it's not your fault; blame ChatGPT. Don't worry, you have me 😉
Let’s take the pragmatic approach
So, you are the tech lead at a big car company. You want to integrate a chatbot into your website that can help people decide which car they should buy. How would you make this chatbot useful?

You could use the ChatGPT API, but that won't give you the kind of results you want, because the model may or may not have detailed information about your cars.

So one way to go is to assemble a team to collect all the data about your cars in PDF format, and then train an AI model that takes in all that data and provides meaningful responses.
This may sound simple, but it’s NOT. So what do you do then?
Here comes LangChain. You ask the sales team to give you all the data about the cars in PDF format, and instead of training your own model, you just use an existing model from Meta, Google, or OpenAI. You give it your data and a prompt, and it responds based on the data you provided. Sounds easy, right? Compared to the first approach, it's super easy. You've just saved months of work and tons of cash.

With this example, you should have an idea of what LangChain is all about. It's a framework that aims to help developers create their own AI solutions based on language models. The "chain" part of the name means you can connect multiple tasks, like fetching data, processing data, handling prompts, performing actions, and showing outputs.

Okay, now that the basic idea and use case are covered, let's move on to some concepts you need to get started.
Concepts
Models: Models are the LLMs (Large Language Models) that the application uses to generate relevant results from your data.
e.g. GPT-3.5 Turbo, Llama 2, Gemma
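To make this concrete, here's a minimal sketch of querying a model through LangChain. It assumes the langchain-openai package is installed and an OPENAI_API_KEY is set in your environment; the question is just an example:

```python
from langchain_openai import ChatOpenAI

# Wrap an existing model instead of training your own.
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

response = llm.invoke("Which sedan has the best fuel economy?")
print(response.content)
```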
Chains: Chains are sequences of operations that handle a task step by step, with each step's output feeding the next.
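For example, here's a sketch of a small chain built with LangChain's pipe syntax (LCEL), assuming langchain-core and langchain-openai are installed; the prompt wording is just an illustration:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Summarize the key selling points of the {car} in two sentences."
)

# Each step's output flows into the next: prompt -> model -> output parser.
chain = prompt | ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser()

print(chain.invoke({"car": "2023 hatchback"}))
```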
Prompts: Prompts are the specific instructions or questions given to the model to get the desired response. This is the same as when you ask ChatGPT or Gemini a question or instruct them to do some task.
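In LangChain, prompts are usually written as reusable templates with placeholders. A minimal sketch (the wording is made up for the car-chatbot example):

```python
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(
    "You are a car-buying assistant. The customer asks: {question}"
)

# The same template can be filled in with different questions.
print(template.format(question="Which SUV is best for a family of five?"))
```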
Agents: Agents are components within LangChain that can decide what actions to take based on user input and context. As a simple example, an agent can decide whether to look up data from a web source if it's not present in the given PDFs.
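Here's a sketch of that idea using LangChain's classic initialize_agent helper (newer versions favor other agent constructors); the inventory lookup is a hypothetical stand-in for a real data source:

```python
from langchain.agents import AgentType, initialize_agent
from langchain.tools import Tool
from langchain_openai import ChatOpenAI

def search_inventory(query: str) -> str:
    # Hypothetical lookup; a real tool would query your own data.
    return "The 2023 hatchback is in stock at $18,500."

tools = [
    Tool(
        name="inventory_search",
        func=search_inventory,
        description="Look up cars in the dealership inventory.",
    )
]

# The agent decides on each step whether to call a tool or answer directly.
agent = initialize_agent(
    tools,
    ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

print(agent.run("Do you have any hatchbacks under $20,000?"))
```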
Embeddings and Vector Stores: This is one of the most important concepts in LangChain. Embeddings are numerical representations of text (or other data types) that capture semantic meaning: they convert words, sentences, or documents into high-dimensional vectors (lists of numbers) that language models can process and compare. Vector Stores are specialized databases designed to store and manage these embeddings, allowing efficient search and retrieval of vectors based on their similarity.
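A minimal sketch of both ideas together, assuming langchain-openai, langchain-community, and faiss-cpu are installed (the car descriptions are made up):

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

texts = [
    "The hatchback seats five and gets 38 mpg on the highway.",
    "The pickup truck tows up to 10,000 lbs.",
    "The electric sedan has a 300-mile range.",
]

# Each text is converted into a high-dimensional vector and indexed.
store = FAISS.from_texts(texts, OpenAIEmbeddings())

# Similarity search returns the texts whose vectors are closest to the query's.
for doc in store.similarity_search("fuel-efficient family car", k=1):
    print(doc.page_content)
```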
These concepts become clearer once you see them in action.
What is the flow of a LangChain application?
At first glance, you can see a lot of new terminology here. Let’s break it down one by one without making it too baroque.
RAG
Retrieval Augmented Generation (RAG) is a technique that stores your data in a searchable form, retrieves the pieces relevant to a user's question, and feeds them to the language model as context so it can generate a grounded answer.
The first step in RAG is to provide it with data. Step by step, the flow looks like this:

Load Data: we give it PDFs, Word docs, code, or whatever is relevant.

Embed: the data is broken down into small chunks, and each chunk is converted into a high-dimensional vector. This process is called Embedding.

Store: all these vectors are saved in a Vector Store, a database that can store and manage high-dimensional vectors efficiently.

Retrieve: when the user asks a question, the query is embedded the same way, and a Retriever searches the Vector Store for the chunks whose vectors are most similar to the query vector.

Generate: the retriever passes the user's question and the retrieved data to the LLM, along with the prompt we have written. The LLM treats the retrieved data as context and the prompt as instructions, and generates an adequate response.
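Putting the whole flow together, here is a sketch of a small RAG pipeline over our hypothetical car brochure (cars.pdf is a made-up file name; assumes langchain, langchain-openai, langchain-community, faiss-cpu, and pypdf are installed):

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load Data and split it into small chunks.
docs = PyPDFLoader("cars.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# 2. Embed the chunks and store the vectors.
store = FAISS.from_documents(chunks, OpenAIEmbeddings())
retriever = store.as_retriever()

# 3. Retrieve relevant chunks and hand them to the LLM as context.
prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-3.5-turbo")
    | StrOutputParser()
)

print(chain.invoke("Which car is best for long road trips?"))
```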
Conclusion
With this article, I aimed to familiarize you with the concept of LangChain and the general workflow of the framework. I hope it has taught you something new and useful. If it added any value, give it a clap. Thanks for your time!
If you found LangChain useful and want to get hands-on practice, I'd suggest you look at this free playlist on YouTube.