Introduction
Generative AI, especially generative Large Language Models, has taken the world by storm since its arrival. This was only possible because these models can be integrated with many different applications, from generating working code to powering fully AI-managed chat support systems. However, most of the Large Language Models in the Generative AI space are closed to the public and have not been open-sourced. While a few open-source models do exist, they are nowhere near the closed-source Large Language Models. Recently, though, Falcon AI, an LLM that topped the OpenLLM Leaderboard, was released and made open source. In this guide, we will use this model to create a chat application with Falcon AI, LangChain, and Chainlit.
Learning Objectives
- To leverage Falcon Model in Generative AI Applications
- To build UI for Large Language Models with Chainlit
- To work with Inference API to access pre-trained models in Hugging Face
- To chain Large Language Models and Prompt Templates with LangChain
- To integrate LangChain Chains with Chainlit for building UI Applications
This article was published as a part of the Data Science Blogathon.
What is Falcon AI?
In the Generative AI field, Falcon AI is one of the more recently introduced Large Language Models, known for taking first place on the OpenLLM Leaderboard. Falcon AI was introduced by the UAE’s Technology Innovation Institute (TII), and its architecture is optimized for inference. When it was first introduced, Falcon AI topped the OpenLLM Leaderboard, moving ahead of state-of-the-art models such as LLaMA and models from Anthropic and DeepMind. The model was trained on the AWS Cloud with 384 GPUs running continuously for two months.
Currently, Falcon comes in two sizes, Falcon 40B (40 billion parameters) and Falcon 7B (7 billion parameters). Importantly, the makers of Falcon AI have stated that the model is open source, allowing developers to use it for commercial purposes without restrictions. Falcon AI also provides Instruct models, Falcon-7B-Instruct and Falcon-40B-Instruct, with which we can quickly get started building chat applications. In this guide, we will work with the Falcon-7B-Instruct model.
What is Chainlit?
The Chainlit library is similar to Python’s Streamlit library, but its intended purpose is to quickly build chat applications around Large Language Models, i.e., to create a UI similar to ChatGPT. With the Chainlit package, it is possible to develop conversational chat applications within minutes. The library integrates seamlessly with LangFlow and LangChain (the library for building applications with Large Language Models), which is what we will do later in this guide.
Chainlit even allows for visualizing multi-step reasoning; it lets us see the intermediate results to understand how the Large Language Model reached the output for a question. So you can clearly see the model’s chain of thought through the UI itself and understand how the LLM arrived at its conclusion. Chainlit is not restricted to text conversation; it also allows sending and receiving images to and from the respective Generative AI models. It even lets us update the Prompt Template in the UI instead of going back to the code and changing it.
Generating HuggingFace Inference API
There are two ways to work with the Falcon-7B-Instruct model. One is the traditional way, where we download the model to the local machine and use it directly. But because this is a Large Language Model, it needs a large amount of GPU memory to run. Hence, we go with the other option: calling the model directly through the Inference API. The Inference API is a hosted HuggingFace service; with a HuggingFace API token, we can use it to access the transformer models hosted on HuggingFace.
To obtain this token, we need to create an account on HuggingFace, which we can do by going to the official HuggingFace website. After logging in/signing up with your details, go to your profile and click on the Settings section. The process from there is as follows:
In Settings, go to Access Tokens. Here you will create a new token, which we need in order to work with the Falcon-7B-Instruct model. Click on New Token to create the new token, enter a name for the token, and set the Role option to Write. Now click on Generate to generate our new token. With this token, we can access the Falcon-7B-Instruct model and build applications.
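Before wiring anything into LangChain, it is worth checking that the token actually works. Below is a minimal sketch that calls the hosted Falcon-7B-Instruct model directly through the HuggingFace Inference API; it assumes the requests library is available and that YOUR_HF_TOKEN is replaced with the token generated above.
import requests

# Standard HuggingFace Inference API endpoint for the Falcon-7B-Instruct model
API_URL = "https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}  # paste your token here

# Send a simple prompt and print the raw JSON response from the model
response = requests.post(API_URL, headers=headers,
                         json={"inputs": "What is Generative AI?"})
print(response.json())
If the token is valid, the response contains the generated text; otherwise, an authorization error is returned.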
Preparing the Environment
Before we dive into our application, we will set up an ideal environment for the code to work. For this, we need to install the necessary Python libraries. First, we will install the libraries that support the model by running a pip install of the libraries below.
$ pip install huggingface_hub
$ pip install transformers
These commands will install the HuggingFace Hub and Transformers libraries, which we use to call the Falcon-7B-Instruct model hosted on HuggingFace. Next, we will install the LangChain library for Python.
$ pip install langchain
This will install the LangChain package for Python, which we will use to create our chat application with the Falcon Large Language Model. Finally, a conversational application is not complete without a UI, so we will install the Chainlit library.
$ pip install chainlit
This will install the Chainlit library for Python. With the help of this library, we will be building the UI for our conversational chat application. After installing chainlit, we need to test the package. For this, use the below command in the terminal.
$ chainlit hello
After entering this command, a new window will open at localhost on port 8000, and the UI will be visible. This confirms that the Chainlit library is installed properly and is ready to work with the other libraries in Python.
Creating the Chat Application
In this section, we will start building our application. We have all the necessary libraries to build our very own conversational chat application. The first thing we will do is import the libraries and store the HuggingFace Inference API token in an environment variable.
import os
import chainlit as cl
from langchain import HuggingFaceHub, PromptTemplate, LLMChain
os.environ['API_KEY'] = 'Your API Key'
- So we start by importing the os, chainlit and langchain libraries.
- From langchain, we have imported the HuggingFaceHub. This HuggingFaceHub will let us call the Falcon-7B-Instruct model through the Inference API and receive the responses generated by the model.
- The PromptTemplate is one of the elements of LangChain, necessary for building applications based on the Large Language Model. It defines how the model should interpret the user’s questions and in what context it should answer them.
- Finally, we even import the LLMChain from LangChain. LLMChain is the module that chains different LangChain components together. Here we will be chaining our Falcon-7B-Instruct Large Language Model with the PromptTemplate.
- Then we store our HuggingFace Inference API token in an environment variable, that is, os.environ['API_KEY']. A slightly safer way to load this key is shown in the sketch after this list.
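Hardcoding the key in the script works for a quick demo, but a safer pattern is to keep it in a local .env file and load it at runtime. This is a minimal sketch, assuming the python-dotenv package is installed (pip install python-dotenv) and the .env file contains a line like API_KEY=hf_xxx.
import os
from dotenv import load_dotenv

# Read key-value pairs from a local .env file into the process environment
load_dotenv()

api_key = os.environ['API_KEY']  # same variable name used throughout this guide
This keeps the token out of the source code, which is useful if you plan to share or version-control the application.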
Instruct the Falcon Model
Now we will call the Falcon Instruct model through the HuggingFaceHub module. For this, we must first provide the path to the model on Hugging Face. The code for this will be
model_id = 'tiiuae/falcon-7b-instruct'
falcon_llm = HuggingFaceHub(huggingfacehub_api_token=os.environ['API_KEY'],
                            repo_id=model_id,
                            model_kwargs={"temperature": 0.8, "max_new_tokens": 2000})
- First, we must give the id of the model we will work with. For us, it will be the Falcon-7B-Instruct model. The id of this model can be found directly on the HuggingFace website, which will be ‘tiiuae/falcon-7b-instruct’.
- Now we call the HuggingFaceHub module, where we pass the API token, assigned to an environment variable, and even the repo_id, i.e., the id of the model we will be working with.
- Also, we provide the model parameters, like the temperature and the maximum number of new tokens. Temperature controls how creative the model should be: a value near 1 means more creativity, while 0 means no creativity (more deterministic answers).
Now we have clearly defined what model we will be working with. And the HuggingFace API will let us connect to this model and run our queries to start building our application.
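Before adding a prompt or a chain, we can sanity-check the connection by calling the LLM wrapper directly. In the LangChain version used here, an LLM instance can be called like a function with a plain string; the question below is just an example.
# Quick sanity check: send a raw prompt straight to the hosted model
print(falcon_llm("What is the capital of France?"))
If this prints a sensible answer, the Inference API token and the model id are both set up correctly.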
Prompt Template
After selecting the model, the next step is defining the Prompt Template. The Prompt Template tells the model how it should behave: how it should interpret the question provided by the user, and how it should arrive at the output for the user’s query. The code for defining our Prompt Template is
template = """
You are an AI assistant that provides helpful answers to user queries.
{question}
"""
prompt = PromptTemplate(template=template, input_variables=['question'])
The template variable above defines and sets the context of the Prompt Template for the Falcon model. The context here is simple: the AI needs to provide helpful answers to user queries, followed by the input variable {question}. This template, along with the variables defined in it, is then passed to PromptTemplate, and the result is assigned to a variable. This variable now holds the Prompt Template, which will later be chained together with the model.
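To see exactly what text will be sent to the model, we can fill the template with a sample question using the format method of PromptTemplate; the question here is just an illustration.
# Fill the {question} placeholder to inspect the final prompt string
print(prompt.format(question="What are the colors in the Rainbow?"))
This prints the full prompt, i.e., the assistant instruction followed by the question, which is what the chain will pass to the Falcon model.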
Chain Both Models
Now we have both the Falcon LLM and the Prompt Template ready. The final step is chaining them together. For this, we will work with the LLMChain object from the LangChain library. The code for this will be
falcon_chain = LLMChain(llm=falcon_llm,
                        prompt=prompt,
                        verbose=True)
With the help of LLMChain, we have chained the Falcon-7B-Instruct model with the PromptTemplate we created. We have also set verbose=True, which is helpful for seeing what happens while the code runs. Now let’s test the model by giving it a query.
print(falcon_chain.run("What are the colors in the Rainbow?"))
Here, we have asked the model what the rainbow colors are. A rainbow contains the VIBGYOR colors (Violet, Indigo, Blue, Green, Yellow, Orange, and Red). The output generated by the Falcon-7B-Instruct model is spot on for the question asked. Setting the verbose option lets us see the prompt after formatting and shows where the chain starts and ends. Finally, we are ready to create a UI for our conversational chat application.
Chainlit – UI for Large Language Models
In this section, we will work with the Chainlit package to create the UI for our application. Chainlit is a Python library that lets us build chat interfaces for Large Language Models in minutes. It is integrated with LangFlow and LangChain, the library we worked with previously. Creating the chat interface with Chainlit is simple; we only have to write the following code:
@cl.langchain_factory(use_async=False)
def factory():
    prompt = PromptTemplate(template=template, input_variables=['question'])
    falcon_chain = LLMChain(llm=falcon_llm,
                            prompt=prompt,
                            verbose=True)
    return falcon_chain
Steps
- First, we start with Chainlit’s decorator for LangChain, @cl.langchain_factory.
- Then we define a factory function that contains the LangChain code. The code we need here is the Prompt Template and the LLMChain module of LangChain, which builds and chains our Falcon LLM.
- Finally, the return value must be a LangChain instance. Here, we return the final chain created, i.e., the LLMChain instance, falcon_chain.
- use_async=False tells the code not to use the async implementation for the LangChain agent.
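Putting the pieces from the previous sections together, the complete app.py that we will run in the next step looks roughly like this. It is simply a consolidated sketch of the code shown above; remember to replace the placeholder API key with your own token.
import os
import chainlit as cl
from langchain import HuggingFaceHub, PromptTemplate, LLMChain

os.environ['API_KEY'] = 'Your API Key'  # HuggingFace Inference API token

# Falcon-7B-Instruct hosted on HuggingFace, accessed through the Inference API
model_id = 'tiiuae/falcon-7b-instruct'
falcon_llm = HuggingFaceHub(huggingfacehub_api_token=os.environ['API_KEY'],
                            repo_id=model_id,
                            model_kwargs={"temperature": 0.8, "max_new_tokens": 2000})

template = """
You are an AI assistant that provides helpful answers to user queries.
{question}
"""

@cl.langchain_factory(use_async=False)
def factory():
    # Build the prompt and chain it with the Falcon LLM for each user session
    prompt = PromptTemplate(template=template, input_variables=['question'])
    falcon_chain = LLMChain(llm=falcon_llm,
                            prompt=prompt,
                            verbose=True)
    return falcon_chain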
Let’s Run the Code!
That’s it. Now when we run the code, a chat interface will be visible. But how is this possible? The thing is, Chainlit takes care of everything. Behind the scenes, it manages the websocket connections and is responsible for creating a separate LangChain instance (Chain, Agent, etc.) for each user who visits the site. To run our application, we type the following in the terminal.
$ chainlit run app.py -w
The -w flag enables auto-reload whenever we make live changes to our application code. After entering this, a new tab opens at localhost:8000.
This is the opening page, i.e., the welcome screen of Chainlit. We see that Chainlit builds an entire Chat Interface for us just with a single decorator. Let’s try interacting with the Falcon Model through this UI
We see that the UI and the Falcon Instruct model are working perfectly fine. The model provides swift answers to the questions asked. It even tried to explain the second question based on the user’s context (explain to a 5-year-old). This is just the beginning of what we can achieve with these open-source Generative AI models. With a few modifications, we can create much more problem-oriented, real-scenario-based applications.
As the chat interface is a website, it is entirely possible to host it on any cloud platform. We can containerize the application and then deploy it to a container-based service on Google Cloud, AWS, Azure, or another cloud provider. With that, we can share our application with the outside world.
Conclusion
In this walkthrough, we have seen how to build a simple Chat Application with the new Open Source Falcon Large Language Model, LangChain, and Chainlit. We have leveraged these three packages and have interconnected them to create a full-fledged solution from Code to Working Application. We have even seen how to obtain the HuggingFace Inference API Key to access thousands of pre-trained models from the HuggingFace library. With the help of LangChain, we chained the LLM with custom Prompt Templates. Finally, with Chainlit, we could create a Chat Application Interface around our LangChain Falcon model within minutes.
Some of the key takeaways from this guide include:
- Falcon is an open-source model and one of the most powerful LLMs, presently at the top of the OpenLLM Leaderboard
- With Chainlit, it is possible to create a UI for an LLM within minutes
- The Inference API lets us connect to many different models hosted on HuggingFace
- LangChain helps in building custom Prompt Templates for Large Language Models
- Chainlit’s seamless integration with LangChain lets us build LLM applications quickly and with fewer errors
Frequently Asked Questions
Q1. What is the HuggingFace Inference API?
A. The Inference API is created by HuggingFace and allows you to access thousands of pre-trained models in the HuggingFace library. With this API, you can access a variety of models, including Generative AI models, Natural Language Processing models, Audio Classification models, and Computer Vision models.
Q2. Are the Falcon models really that powerful?
A. They are, especially the Falcon 40B (40 billion parameters) model. It has surpassed other state-of-the-art models, such as LLaMA and models from DeepMind, and acquired the top position on the OpenLLM Leaderboard.
Q3. What is Chainlit?
A. Chainlit is a Python library developed for creating UIs. With Chainlit, it is possible to create ready-to-work chat interfaces for Large Language Models within minutes. The Chainlit package integrates seamlessly with LangFlow and LangChain, other packages used to create applications with Large Language Models.
Q4. Are the Falcon models open source?
A. Yes. Falcon 40B (40 billion parameters) and Falcon 7B (7 billion parameters) are open-sourced, which means anyone can work with these models to create commercial applications without restrictions.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.