PandasAI: Data Analysis with AI-Powered Simplicity
Learn how to improve your data adventure with AI-powered Pandas.
PandasAI combines pandas, the Python library for working with data, with AI (artificial intelligence). It adds features that make everyday data work easier. Whether your data is numbers, text, or anything else, PandasAI is here to help.
So, get ready as we explore PandasAI and improve your data adventures.
What is PandasAI?
PandasAI is an intelligent helper for pandas, the Python data library. It uses AI to make pandas even better. But remember, it doesn’t replace the original pandas; it’s more like a handy sidekick.
With PandasAI, you can work with your data by asking questions in everyday language, so you don’t have to be a coding expert.
PandasAI is all about making data work smoother and smarter, without replacing what you already know.
Let’s begin with PandasAI
We start by installing and setting up PandasAI. There are two approaches: the LangChain model and the direct implementation.
a. LangChain Model
To use this model, we begin by installing the LangChain package. The code below also imports pandasai, which is installed in approach b.
pip install langchain
Next, we create a LangChain LLM object with our API key and pass it to PandasAI.
# instantiate a LangChain object
from pandasai import PandasAI
from langchain.llms import OpenAI
# inserting an API key
langchain_llm = OpenAI(openai_api_key="my-openai-api-key")
pandasai = PandasAI(llm=langchain_llm)
Now we’re all set to use PandasAI with the LangChain model. In the code above, PandasAI takes a LangChain LLM (large language model) and converts it into a PandasAI LLM for us.
b. Direct Implementation
We start by installing the package as well.
pip install pandasai
This command will download and install the PandasAI package on your computer.
In this guide, we’ll use the Direct Implementation method to work with PandasAI.
Before we move on to importing our libraries and dataset, it’s crucial to get an OpenAI API key. To get this key, create an account with OpenAI, and then generate a key from the API keys page of your account.
Remember to save a copy of the API key because the website won’t allow you to copy it again after the first time.
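Because the key can only be copied once, it helps to store it somewhere safe and keep it out of your code. The sketch below shows one common approach: reading the key from an environment variable at run time. The variable name OPENAI_API_KEY and the helper load_api_key are illustrative assumptions, not something PandasAI requires.

```python
import os


def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    `var_name` is an assumed name; use whichever variable you exported.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export it before starting Python."
        )
    return key
```

The returned string can then be passed wherever this guide hard-codes the key, for example OpenAI(api_token=load_api_key()).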
Importing Libraries and Loading Data
In this section, we begin by importing the pandas library. Then, we import the PandasAI library. Afterwards, we load our dataset into a DataFrame. Finally, we complete the process by setting up an LLM, adding the API key, and instantiating PandasAI.
# Importing pandas
import pandas as pd
# Importing PandasAI
from pandasai import PandasAI
# Loading the DataFrame
df = pd.read_excel('xxxxxxxxxxxxxx', sheet_name='xxxxxxxxxxx')
# Instantiating an LLM
from pandasai.llm.openai import OpenAI
# Assigning API key
llm = OpenAI(api_token="INSERT_YOUR_API_KEY_HERE")
# Calling PandasAI
pandas_ai = PandasAI(llm)
In this article, we will work with a dataset called “supply_chain_data.” The dataset comprises various attributes related to the fashion and makeup product supply chain. These attributes provide valuable information for understanding the flow of products from suppliers to customers. You can also check out my supply chain analysis project, which uses the same dataset.
Now that we’ve installed our package and imported the necessary libraries and dataset, let’s see how PandasAI can make data exploration, analysis, and visualization easier. Plus, we’ll show how PandasAI can handle multiple DataFrames.
Before we continue, it’s important to know that PandasAI provides a data privacy option. You can activate it by instantiating PandasAI with “enforce_privacy = True.” This ensures that only column names, not the actual data, are sent to the LLM.
Data Exploration with PandasAI
In this section, let's see how we can use everyday language to explore our dataset with PandasAI.
First, let’s use a text prompt to check the five-point summary of our data.
pandas_ai(df, prompt='five point summary of the data')
The above code will return the five-point summary of the data. [Output image not reproduced]
Next, we’ll use a prompt to check the data type of each column in our dataset.
pandas_ai(df, prompt='data type for each column')
The above code will return the data type of each column. [Output image not reproduced]
Lastly, we’ll use a prompt to check the shape of the data.
pandas_ai(df, prompt='what is shape of the data')
The above code will return the shape of the data. [Output image not reproduced]
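Since the answers come from an LLM, it can be worth cross-checking them against plain pandas. The sketch below runs the same three checks on a tiny made-up DataFrame; the column names and values are assumptions for illustration and are not taken from the supply chain dataset.

```python
import pandas as pd

# Tiny made-up frame standing in for the real dataset (hypothetical columns).
sample = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
    "stock": [5, 3, 8, 1, 9],
    "product_type": ["skincare", "haircare", "skincare", "cosmetics", "haircare"],
})

# Five-point summary: minimum, quartiles, and maximum of each numeric column.
five_point = sample.describe().loc[["min", "25%", "50%", "75%", "max"]]

# Data type of each column.
column_types = sample.dtypes

# Shape as a (rows, columns) tuple.
shape = sample.shape

print(five_point)
print(column_types)
print(shape)
```

Replacing sample with your own df gives reference values to compare against what PandasAI reports.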
Data Analysis With PandasAI
In this section, we will query our dataset with PandasAI to gain some insights.
Let’s start by writing a text prompt to find the total revenue generated for each product type and round up the results.
pandas_ai(
    df,
    prompt='what is the total revenue generated for each product type and round up the result'
)
The above code will return the total revenue for each product type. [Output image not reproduced]
Next, we check for the total revenue generated by each shipping carrier and round up the result.
pandas_ai(
    df,
    prompt='total revenue generated from each shipping carriers and round up the result'
)
The above code will return the total revenue for each shipping carrier. [Output image not reproduced]
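For comparison, the two revenue questions above correspond to a group-and-sum in plain pandas. This is a minimal sketch on made-up rows; the column names product_type, shipping_carrier, and revenue are assumptions and may not match the names in the real dataset.

```python
import pandas as pd

# Made-up rows; the column names are assumptions, not the real dataset's.
orders = pd.DataFrame({
    "product_type": ["skincare", "haircare", "skincare", "cosmetics"],
    "shipping_carrier": ["Carrier A", "Carrier B", "Carrier A", "Carrier C"],
    "revenue": [100.25, 200.6, 50.5, 75.4],
})

# Total revenue per product type, rounded to the nearest whole number.
revenue_by_product = orders.groupby("product_type")["revenue"].sum().round()

# Total revenue per shipping carrier, rounded the same way.
revenue_by_carrier = orders.groupby("shipping_carrier")["revenue"].sum().round()

print(revenue_by_product)
print(revenue_by_carrier)
```

Note that "round up" in a prompt is ambiguous. The sketch rounds to the nearest whole number; if you really want to round upwards, you would apply a ceiling function instead, and it is worth checking which one the LLM chose.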
Handling Multiple DataFrames with PandasAI
As mentioned earlier, PandasAI can query multiple DataFrames using everyday language. In this section, we’ll use sample data to show how this works.
First, let's create our sample DataFrames.
# DataFrame 1
df1 = pd.DataFrame({
    'sales': [100, 200, 300],
    'store': ['SuperStore', 'ShopRite', 'JustRite']
})
# DataFrame 2
df2 = pd.DataFrame({
    'revenue': [400, 500, 600],
    'store': ['SuperStore', 'ShopRite', 'JustRite'],
    'location': ['Lagos', 'Delta', 'Sokoto']
})
# DataFrame 3
df3 = pd.DataFrame({
    'profit': [700, 800, 900],
    'location': ['Lagos', 'Delta', 'Sokoto'],
    'employees': [10, 15, 20]
})
Next, let’s use a text prompt to query our new dataset.
pandas_ai([df1, df2, df3], prompt='How many employees work at ShopRite?')
15
pandas_ai([df1, df2, df3], prompt='What is the location of the JustRite store?')
Sokoto
This is valuable for our data analysis when our data are scattered across several data frames.
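To see why those answers are right, here is a sketch of the joins a person would write by hand: the three sample DataFrames are linked through their shared store and location columns. This is only a manual cross-check and is not a description of how PandasAI works internally.

```python
import pandas as pd

df1 = pd.DataFrame({
    "sales": [100, 200, 300],
    "store": ["SuperStore", "ShopRite", "JustRite"],
})
df2 = pd.DataFrame({
    "revenue": [400, 500, 600],
    "store": ["SuperStore", "ShopRite", "JustRite"],
    "location": ["Lagos", "Delta", "Sokoto"],
})
df3 = pd.DataFrame({
    "profit": [700, 800, 900],
    "location": ["Lagos", "Delta", "Sokoto"],
    "employees": [10, 15, 20],
})

# Join on the shared keys: store links df1 to df2, location links df2 to df3.
combined = df1.merge(df2, on="store").merge(df3, on="location")

# Look up the two facts asked about in the prompts.
shoprite_employees = combined.loc[combined["store"] == "ShopRite", "employees"].iloc[0]
justrite_location = combined.loc[combined["store"] == "JustRite", "location"].iloc[0]

print(shoprite_employees)  # 15
print(justrite_location)   # Sokoto
```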
Data Visualization with PandasAI
It's exciting to know that PandasAI can also generate plots and graphs.
To create plots and graphs with PandasAI, you’ll need a paid API Key. Using a free API Key is likely to lead to a RateLimitError.
Don’t worry, when you create a new account, OpenAI provides a free $5 trial credit. You can use that for experimenting while considering a paid plan.
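If you run into occasional rate-limit errors, one general-purpose pattern is to retry the call after a pause that doubles each time. The helper below is a hypothetical sketch, not part of PandasAI or the OpenAI client, and it assumes the error reaches your code as a raised exception. It will not help if your quota is used up.

```python
import time


def call_with_backoff(task, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run task() and retry with exponentially growing pauses on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            # Assumption: in real use, narrow this to the rate-limit
            # exception class that your client library raises.
            if attempt == max_attempts:
                raise
            # Pause 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

A call from this guide could then be wrapped as call_with_backoff(lambda: pandas_ai(df, prompt='...')).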
So, let's plot our first chart that shows the sales based on each product type using PandasAI.
pandas_ai(
    df,
    prompt='Plot a chart showing sales based on different product types'
)
The above code will return a chart of sales by product type. [Chart image not reproduced]
Next, let’s plot another chart showing the total revenue generated from shipping carriers.
pandas_ai(
    df,
    prompt='Plot a chart showing total revenue generated from shipping carriers'
)
The above code will return a chart of total revenue by shipping carrier. [Chart image not reproduced]
You can save any charts generated in PandasAI by setting the “save_charts” parameter to “True” when you call PandasAI.
pandas_ai = PandasAI(llm, save_charts=True)
Saved charts are found in the ./pandasai/exports/charts directory.
Current limitations of using PandasAI
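To check what has been saved so far, you can list the files in that directory. The helper below is a small sketch using only the standard library. It assumes the charts are image files with a .png extension, which may differ in your version, and list_saved_charts is a name made up for this example.

```python
from pathlib import Path


def list_saved_charts(directory="./pandasai/exports/charts", suffix=".png"):
    """Return saved chart files under `directory`, sorted by path.

    The .png suffix is an assumption; pass another suffix if needed.
    """
    root = Path(directory)
    if not root.is_dir():
        # Nothing has been exported yet (or the path is different).
        return []
    # Search subfolders too, in case charts are grouped per run.
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


for chart in list_saved_charts():
    print(chart)
```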
PandasAI has proven to be a valuable tool for enhancing our data tasks. However, it’s important to be aware that it comes with some limitations:
Using PandasAI for sensitive data isn’t advised because it sends data to OpenAI’s servers. Even though there’s an option to protect privacy by not sending the head of the DataFrame to the servers, there are still worries about potential privacy issues.
Using PandasAI for large DataFrames is not a good idea. Every query goes through a cloud-hosted LLM, which can slow down your work and use up a lot of resources when dealing with big datasets.
Using PandasAI isn’t free, and extensive usage could lead to high costs.
Conclusion
PandasAI is a tool that mixes pandas and AI to make basic data tasks a breeze. It’s great for using everyday language to explore, analyze, and visualize your data. However, be careful with sensitive data because it might raise privacy concerns. Also, be mindful that using it on big datasets can be costly and resource-intensive. Despite its limits, PandasAI can be handy for folks who aren’t experts in coding, making data tasks more manageable.