In this article, you will learn Python pandas for IP Class 12. As per the revised CBSE Class 12 Informatics Practices curriculum, this part covers DataFrame basics and creating a DataFrame. So let us begin:
Introduction to DataFrame – Python pandas IP class 12
Observe the following picture:
You are already familiar with the pandas Series from the previous post; the DataFrame is another important pandas data structure.
Data can be represented in a one-dimensional or a two-dimensional data structure. A pandas Series represents a one-dimensional data structure; similarly, a DataFrame represents a two-dimensional data structure.
When you think of two dimensions, an MS Excel worksheet is the best example of two-dimensional data representation: it stores data in tabular form, in rows and columns.
The word DataFrame can be divided into two simple words: i) Data and ii) Frame. So we can say that the data is held in a frame of rows and columns. A DataFrame can store data of any type, and each column can have its own data type. DataFrames are widely used for data analysis.
The image above represents a 2D array, whose size is given by m x n, where m is the number of rows and n is the number of columns. So, in the example above, we have a 3 x 5 2D array with 15 elements.
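To connect the m x n idea with pandas, here is a minimal sketch that builds a 3 x 5 DataFrame from a list of lists. The values 1 to 15 are made up for illustration; the shape and size attributes report the dimensions and the total number of elements.

```python
import pandas as pd

# A 3 x 5 table: 3 rows (m) and 5 columns (n), filled with made-up values 1..15
data = [[1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15]]
df = pd.DataFrame(data)

print(df.shape)  # (rows, columns): (3, 5)
print(df.size)   # total number of elements, m x n: 15
```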
In the next section of Python pandas IP Class 12, we will discuss the characteristics of a DataFrame.
Characteristics of DataFrame
- A DataFrame has two indexes (axes), i.e. a row index and a column index
- In a DataFrame, indexes can be numbers, letters or strings
- A DataFrame can hold columns of different data types
- A DataFrame is value mutable, i.e. its values can be changed
- A DataFrame is also size mutable, i.e. rows and columns can be added or deleted at any time
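The characteristics above can be seen in a small sketch. The student names, marks, grades and row labels S1 to S3 below are hypothetical sample data; loc, drop() and column assignment are standard pandas operations.

```python
import pandas as pd

# String row index (S1, S2) and columns of different data types (text and numbers)
df = pd.DataFrame({'Name': ['Ankit', 'Mohit'], 'Marks': [72, 60]},
                  index=['S1', 'S2'])

# Value mutable: change an existing value
df.loc['S2', 'Marks'] = 65

# Size mutable: add a new column, then a new row (hypothetical data)
df['Grade'] = ['B', 'C']
df.loc['S3'] = ['Shreya', 80, 'A']

# Size mutable: delete a column
df = df.drop(columns='Grade')
print(df)
```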
Now that you are familiar with DataFrames, in the next section of Python pandas IP Class 12 we will see how to create a DataFrame:
Creating DataFrame
To create a DataFrame, the following module should be imported:
import pandas as pd
Syntax:
dfo = pandas.DataFrame(<2D DataStructure>, <columns=column_sequence>,<index=index_sequence>,<dtype=data_type>,<copy=bool>)
Where
dfo refers to the variable that holds the newly created DataFrame object
pandas refers to the imported pandas module; in programs it is generally imported with the alias pd
DataFrame() is a function that creates a DataFrame
2D DataStructure: This is the first parameter of the DataFrame() function. It can be a list, a Series, a dictionary, a NumPy ndarray or any other 2D data structure. It can be omitted to create an empty DataFrame.
columns: It is an optional parameter that specifies the column labels (names) used in the DataFrame. If it is not given, the columns are labelled 0, 1, 2 and so on.
index: It is also an optional parameter that specifies the row labels used in the DataFrame. If it is not given, the rows are labelled 0, 1, 2 and so on.
dtype: It is an optional parameter that specifies the data type of the DataFrame elements. If dtype is not specified (its default is None), pandas infers the data type from the data.
copy: It is an optional Boolean parameter that specifies whether the input data should be copied.
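As a rough illustration of the optional parameters, the sketch below passes columns, index and dtype together. The labels 'A', 'B', 'r1' and 'r2' and the values are placeholders chosen for this example.

```python
import pandas as pd

# 2D data: a list of lists (placeholder values)
data = [[1, 2], [3, 4]]

df = pd.DataFrame(data,
                  columns=['A', 'B'],   # column labels instead of 0, 1
                  index=['r1', 'r2'],   # row labels instead of 0, 1
                  dtype=float)          # store every element as a float
print(df)

# Without columns and index, the default labels 0, 1, ... are used
plain = pd.DataFrame(data)
print(plain.columns.tolist())  # [0, 1]
```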
Now, in the next section of Python pandas IP Class 12, we will see how to create a DataFrame with various options:
Creating empty DataFrame & Display
To create an empty DataFrame, call the DataFrame() function without passing any parameter; to display the DataFrame, use the print() function as follows:
import pandas as pd
df = pd.DataFrame()
print(df)
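An empty DataFrame has no rows and no columns. A quick way to confirm this is the empty attribute together with shape:

```python
import pandas as pd

df = pd.DataFrame()
print(df.empty)  # True: the DataFrame has no elements
print(df.shape)  # (0, 0): zero rows and zero columns
```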
Creating DataFrame from List and Display (Single Column)
A DataFrame can be created from a list, with a single column or with multiple columns. To create a single-column DataFrame, declare and define a list and then pass that list object to the DataFrame() function as follows:
import pandas as pd
l =[5,10,15,20,25]
df = pd.DataFrame(l)
print(df)
Output:
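To see how pandas labels the single-column DataFrame created above, this sketch inspects it: the list becomes column 0 and the rows are labelled 0 to 4.

```python
import pandas as pd

l = [5, 10, 15, 20, 25]
df = pd.DataFrame(l)

# A plain list becomes one column labelled 0, with rows labelled 0 to 4
print(df.shape)             # (5, 1)
print(df.columns.tolist())  # [0]
print(df[0].tolist())       # the column's values: [5, 10, 15, 20, 25]
```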
Next, let us look at creating a DataFrame with multiple columns from a list and displaying it.
Creating DataFrame from List and Display (Multiple Columns)
Let’s have a look at the following code, which creates a multiple-column DataFrame from a list of lists (each inner list becomes one row):
import pandas as pd
l=[['Ankit',72,65,78],['Mohit',60,67,65],['Shreya',80,86,83]]
df=pd.DataFrame(l)
print(df)
Output:
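Using the same list of lists, the sketch below checks how rows and columns are labelled when no names are given: each inner list is a row and the columns are numbered 0 to 3.

```python
import pandas as pd

l = [['Ankit', 72, 65, 78], ['Mohit', 60, 67, 65], ['Shreya', 80, 86, 83]]
df = pd.DataFrame(l)

# Each inner list becomes one row; columns get the default labels 0 to 3
print(df.shape)        # (3, 4)
print(df.loc[1, 0])    # row 1, column 0: Mohit
print(df[1].tolist())  # column 1 holds the first marks of every student: [72, 60, 80]
```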
In the next section of Python pandas IP Class 12, we will cover specifying column names using the columns parameter.
Specifying column names
To specify column names, pass the names of the columns to the columns parameter of the DataFrame() function as follows:
import pandas as pd
l=[['Ankit',72,65,78],['Mohit',60,67,65],['Shreya',80,86,83]]
df=pd.DataFrame(l,columns=['Name','English','Maths','Physics'])
print(df)
Output:
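Column names and row labels can be given together. In this sketch the roll numbers 101, 102 and 103 are hypothetical values passed through the index parameter, so rows can be looked up by roll number instead of 0, 1, 2.

```python
import pandas as pd

l = [['Ankit', 72, 65, 78], ['Mohit', 60, 67, 65], ['Shreya', 80, 86, 83]]

# Hypothetical roll numbers used as row labels via the index parameter
df = pd.DataFrame(l,
                  columns=['Name', 'English', 'Maths', 'Physics'],
                  index=[101, 102, 103])

print(df.loc[102, 'Maths'])    # Mohit's Maths marks: 67
print(df['English'].tolist())  # [72, 60, 80]
```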
Creating DataFrame from series
As you learned in an earlier post about Series, a DataFrame can also be created from Series objects. In the following example, two Series objects store different player statistics, and they are passed to the DataFrame() function as a dictionary: the dictionary keys become the column names, and the rows are matched by the Series index (the player names). Have a look:
import pandas as pd
player_matches = pd.Series({'V.Kohli':200,'K.Rahul':74,'R.Sharma':156,'H.Pandya':80})
player_runs = pd.Series({'V.Kohli':9587,'K.Rahul':3612,'R.Sharma':7863,'H.Pandya':2530})
# Dictionary keys become column names; rows are matched by the Series index
df = pd.DataFrame({'Matches':player_matches,'Runs':player_runs})
print(df)
Output:
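Because the Series are matched by their index, a label present in only one Series produces a missing value. The sketch below is an assumption-based example: 'Player X' and its match count are made up to show the NaN that appears.

```python
import pandas as pd

# 'Player X' is hypothetical and appears only in the matches Series
matches = pd.Series({'V.Kohli': 200, 'K.Rahul': 74, 'Player X': 30})
runs = pd.Series({'V.Kohli': 9587, 'K.Rahul': 3612})

df = pd.DataFrame({'Matches': matches, 'Runs': runs})
print(df)
# Rows are matched by index label; 'Player X' has no runs entry, so that cell is NaN
```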
Creating DataFrame from Dictionaries
A dictionary can also be passed to the DataFrame() function. When the dictionary values are lists (or Series), each key becomes a column name and its values fill that column. You can also create a DataFrame from a list of dictionaries or from a nested dictionary.
The following example displays a DataFrame created from a dictionary of lists:
import pandas as pd
player_stats={'Name':['V.Kohli','K.Rahul','R.Sharma','H.Pandya'],'Matches':[200,74,156,80],'Runs':[9587,3612,7863,2530]}
df = pd.DataFrame(player_stats)
print(df)
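When a dictionary is passed, the columns parameter can also select and order its keys. In this sketch only 'Name' and 'Runs' are kept from the same player_stats data.

```python
import pandas as pd

player_stats = {'Name': ['V.Kohli', 'K.Rahul', 'R.Sharma', 'H.Pandya'],
                'Matches': [200, 74, 156, 80],
                'Runs': [9587, 3612, 7863, 2530]}

# columns picks which dictionary keys to use and in what order; 'Matches' is left out
df = pd.DataFrame(player_stats, columns=['Name', 'Runs'])
print(df)
```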
In the next section of Python pandas IP Class 12, we will discuss creating a DataFrame using a list of dictionaries.
Creating DataFrame using a list of dictionaries
A list of dictionaries is a list containing multiple dictionary objects. Each dictionary becomes one row and its keys become the column names. If a key is missing from any dictionary, the corresponding cell is filled with NaN (Not a Number) in the output. Let’s take a look at the following example:
import pandas as pd
players=[{'V.Kohli':107,'K.Rahul':120,'R.Sharma':78,'H.Pandya':30},\
{'V.Kohli':35,'R.Sharma':175,'H.Pandya':58},\
{'V.Kohli':60,'K.Rahul':32,'H.Pandya':30}]
df = pd.DataFrame(players)
print(df)
Output:
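To see which cells became NaN, a short sketch can count the missing values in each column with isna() and sum(). It reuses the players list from the example above.

```python
import pandas as pd

players = [{'V.Kohli': 107, 'K.Rahul': 120, 'R.Sharma': 78, 'H.Pandya': 30},
           {'V.Kohli': 35, 'R.Sharma': 175, 'H.Pandya': 58},
           {'V.Kohli': 60, 'K.Rahul': 32, 'H.Pandya': 30}]
df = pd.DataFrame(players)

# Count the missing (NaN) cells in each column
counts = df.isna().sum()
print(counts)
```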
Creating DataFrame using nested dictionary
When a DataFrame is created from a dictionary, the keys of the dictionary are used as the column index. If you also want to assign the row index from a dictionary, use a nested dictionary: the outer keys become the column labels and the inner keys become the row labels.
Take a look at the following example and observe the output:
import pandas as pd
score={2018:{'Virat Kohli':2345,'Rohit Sharma':2205},
2019:{'Virat Kohli':1987,'Rohit Sharma':1876}}
df=pd.DataFrame(score)
print(df)
Output:
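The sketch below checks the labels produced by the nested dictionary and, as an alternative, uses DataFrame.from_dict() with orient='index' so that the years become rows instead of columns.

```python
import pandas as pd

score = {2018: {'Virat Kohli': 2345, 'Rohit Sharma': 2205},
         2019: {'Virat Kohli': 1987, 'Rohit Sharma': 1876}}

df = pd.DataFrame(score)
print(df.columns.tolist())          # outer keys become columns: [2018, 2019]
print(df.index.tolist())            # inner keys (player names) become rows
print(df.loc['Virat Kohli', 2019])  # 1987

# orient='index' swaps the roles: the outer keys (years) become rows
df_years = pd.DataFrame.from_dict(score, orient='index')
print(df_years)
```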
Creating DataFrame from ndArrays
To create a DataFrame from an ndarray, the ndarray should first be created using the NumPy module. Let’s have a look at the following example:
import pandas as pd
import numpy as np
a = np.array([[10,20,30],[77,66,55]],np.int32)
df = pd.DataFrame(a)
print(df)
Output:
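An ndarray supplies only values, so the default labels 0, 1, 2 are used unless columns and index are given. In this sketch the labels 'A', 'B', 'C', 'row1' and 'row2' are hypothetical; the columns keep the array's int32 type.

```python
import numpy as np
import pandas as pd

a = np.array([[10, 20, 30], [77, 66, 55]], np.int32)

# Hypothetical labels: the ndarray holds only values, so names come from columns/index
df = pd.DataFrame(a, columns=['A', 'B', 'C'], index=['row1', 'row2'])
print(df)
print(df.dtypes)  # each column keeps the array's int32 type
```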
So here we covered the DataFrame basics and the ways of creating a DataFrame given in your revised syllabus for Python pandas IP Class 12.