This article consists of the Chapter-wise term 1 class 12 IP revision notes as a comprehensive guide. There are two units for the Term I syllabus.
Topics Covered
Distribution of Marks
Unit No | Unit Name | Marks |
1 | Data handling using pandas and data visualization | 25 |
2 | Database Query using SQL | 25 |
3 | Introduction to Computer Networks | 10 |
4 | Societal Impacts | 10 |
Total | 70 |
So let’s start with this unit 1 data handling using pandas for Term 1 Class 12 IP revision notes:
Data handling using pandas
- Python supports a number of libraries for working with data
- Python libraries, many of them written in C, provide modules that give access to functions for I/O operations, complex problem-solving, data science, GUI interface design, etc.
- In other words, libraries are collections of modules and packages that fulfil specific needs or applications.
- Some commonly used libraries are python standard library, NumPy library, SciPy library, Tkinter library, Pandas library, Matplotlib library etc.
Introduction to Pandas – Term 1 Class 12 IP revision notes
- The name Pandas is derived from "panel data"
- It was developed by Wes McKinney
- It is an open-source Python library that makes data science and data analysis easy and effective
- It provides flexible and powerful functions and properties for 1D and 2D data structures
- It provides high-performance data analysis tools
- It is used in major fields, both academic and commercial, such as finance, economics, statistics and analytics
Difference between NumPy and Pandas – Term 1 Class 12 IP revision notes
Key Point | NumPy | Pandas |
Data | Requires homogeneous data | Can have heterogeneous data |
Effectiveness | Very effective for collections of the same kind of data | Provides a simple interface for operations like select, access, plot, join and group by |
Kind of data | It is a handy tool for numeric data | It is a handy tool for data processing in the tabular form of data |
Memory | Consumes less memory | Consumes more memory |
Indexing | Indexing is very quick | Indexing is slow compared to NumPy |
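As a small illustration of the "Data" row of the table (the sample names and marks are made up for this sketch), a NumPy array converts mixed input to one common type, while a dataframe keeps a separate type for each column:

```python
import numpy as np
import pandas as pd

# NumPy: mixed int/float input is converted to a single dtype
arr = np.array([1, 2.5, 3])
print(arr.dtype)

# pandas: each column keeps its own dtype (text and integers here)
df = pd.DataFrame({'name': ['Asha', 'Ravi'], 'marks': [81, 92]})
print(df.dtypes)
```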
Features of Pandas – Term 1 Class 12 IP revision
- Efficient to read different types of data like integer, float, double etc.
- In a data frame rows and columns can be added, deleted or modified anytime
- Support group by, aggregate functions, joining, merging
- Capable to pull data from MySQL database and CSV files and vice-versa
- Can extract data from large data set and combine multiple tabular data structures in a single unit
- Can find and fill missing data
- Reshaping and reindexing can be done in various forms
- Can be used for future prediction from received data
- Provides functions for data visualization using matplotlib and seaborn
Installing Pandas – Term 1 Class 12 IP revision
- Pandas can be installed using the pip command.
- Open cmd prompt to use pip commands
- The following commands can be useful for installation with pip installer:
- Checking whether pandas is installed or not – pip list
- Installing pandas – pip install pandas
- To uninstall pandas – pip uninstall pandas
Importing Pandas for a program
To import pandas follow this command
import pandas as pd
Data Structures in Pandas – Term 1 Class 12 IP revision notes
- The way of storing, organizing, and maintaining data for appropriate applications is known as a data structure
- Can help in extracting information easily
- Pandas provide the following data structures:
- Series:
- It is a 1-dimensional data structure
- Stores homogeneous data
- Its values are mutable, but its size is immutable
- Dataframe:
- It is 2 dimensional data structure
- Stores heterogeneous data
- It is data mutable as well as size mutable
- Panel
- It is a 3-dimensional data structure (it has been removed from recent versions of pandas)
Working with series
- A series is like an ordered dictionary: each value is associated with an index
- The default index is the numeric position, starting at 0
- Users can also assign their own labels as the index of a series
- The pd.Series() constructor is used to create a series
- Syntax:
import pandas as pd
<series_object> = pd.Series(data=None, index=None, dtype=None, name=None, copy=False)
Parameters
- data :
- sequences, array, iterable, dictionary, list, or scalar value
- Contains data stored in Series.
- If data is a dict, argument order is maintained.
- index: array-like or Index (1d)
- Values must be hashable and have the same length as data.
- Non-unique index values are allowed.
- Will default to RangeIndex (0, 1, 2, …, n) if not provided.
- If data is dict-like and index is None, then the keys in the data are used as the index.
- If the index is not None, the resulting Series is reindexed with the index values.
- dtype: str, numpy.dtype, or ExtensionDtype, optional
- Data type for the output Series. If not specified, this will be inferred from data. See the user guide for more usage.
- name: str, optional
- The name to give to the Series.
- copy: bool, default False
- Copy input data. Only affects Series or 1d ndarray input.
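The index rules above can be sketched with a dictionary as data. The variable names are chosen only for this illustration: without an index the dictionary keys are used, and with an index the series is reindexed, so a label with no matching key gets NaN.

```python
import pandas as pd

marks = {'Sachin': 45, 'Kapil': 67, 'Bhavin': 89}

# index=None: the dictionary keys become the index
s1 = pd.Series(marks, name='Marks')

# index given: the series is reindexed; 'Mahesh' has no value, so it becomes NaN
s2 = pd.Series(marks, index=['Kapil', 'Mahesh'], dtype='float64')

print(s1)
print(s2)
```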
Creating an empty series
import pandas as pd
s=pd.Series()
print(s)
Creating series using a sequence (List)
import pandas as pd
s=pd.Series([23,45,67,78])
print(s)
Creating series using a sequence (List) and Assigning index
import pandas as pd
s=pd.Series([23,45,67,78],index=['JAN','FEB','MARCH','APRIL'])
print(s)
Creating series using multiple lists
import pandas as pd
sales=[23,45,67,78]
mon=['Jan','Feb','March','April']
s=pd.Series(data=sales, index=mon)
print(s)
Creating series using range() function
import pandas as pd
s=pd.Series(range(5))
print(s)
Creating series using range and assigning index using for loop
import pandas as pd
s=pd.Series(range(5), index=[i for i in 'pqrst'])
print(s)
Creating series using missing values
import pandas as pd
import numpy as np # for NaN value
s=pd.Series([23,np.nan,67,np.nan])
print(s)
Creating series using scalar value
import pandas as pd
s = pd.Series(7,range(5))
print(s)
Creating series using numpy array
import pandas as pd
import numpy as np
ar=np.array([22,33,44,55])
s=pd.Series(ar)
print(s)
Creating series using dictionary
import pandas as pd
d={'Sachin':45,'Kapil':67,'Bhavin':89,'Mahesh':78}
s=pd.Series(d)
print(s)
Creating series using mathematics expression
import pandas as pd
import numpy as np
ar=np.arange(11,16)
s=pd.Series(ar,index=ar*3)
print(s)
Select/Access elements of a series
There are certain ways to select and access elements of a series. The most popular ways are indexing and slicing.
Indexing
- can be used with series with label index or positional index
- the positional index always starts with 0 and labelled index will be the index assigned by user
- Example of positional index
import pandas as pd
s=pd.Series([45,67,87,11,23])
#accessing the positional index
print(s[1])
#accessing multiple index with positional index
print(s[[1,3]])
- Example of labelled index
import pandas as pd
s=pd.Series([45,67,87,11,23],index=['Jan','Feb','Mar','Apr','May'])
#accessing the single label index
print("Accessing the single label index:",s['Feb'])
#accessing multiple indexes with labelled index
print("Accessing multiple indexes with labelled index",s[['Feb','Mar']])
Changing the index using reset_index() function
import pandas as pd
s=pd.Series([45,67,87,11,23],index=['Jan','Feb','Mar','Apr','May'])
s.reset_index(inplace=True,drop=True)
print(s)
Accessing series using Slicing
- used to extract elements from the series
- slice can be done using [start:stop:step]
- when positional indexes are used, the value at the stop position is excluded
- when labelled indexes are used, the value at the stop label is included
- Example
import pandas as pd
s=pd.Series([45,67,87,11,23],index=['Jan','Feb','Mar','Apr','May'])
print("Example 1 with position slicing, excludes the value at the 4th index")
print(s[1:4])
print("Example 2 with label slicing, includes all the labels")
print(s['Jan':'Apr'])
print("Example 3 Reverse order")
print(s[::-1])
Modifying values using Slice
import pandas as pd
s=pd.Series([45,67,87,11,23],index=['Jan','Feb','Mar','Apr','May'])
#the position slice changes Feb and Mar; the label slice with step 3 changes Jan and Apr
s[1:3]=30
s['Jan':'Apr':3]=20
print(s)
Attributes of Series
- The attributes are also known as properties
- The syntax for accessing attributes/properties is as follows
- <series_object>.<property>
Properties (Attribute) | Use | Example |
index | Returns the index of the series | s.index Output: Index(['Jan', 'Feb', 'Mar', 'Apr', 'May'], dtype='object') |
index.name | Assigns a name to the index | s.index.name='Month' Output: Month Jan 45 Feb 67 Mar 87 Apr 11 May 23 dtype: int64 |
name | Assigns a name to the series | s.name='Monthly Data' print(s.name) Output: Monthly Data |
values | Returns the values of the series as an array | s.values Output: [45 67 87 11 23] |
dtype | Returns the data type of the series | s.dtype Output: int64 |
shape | Returns the number of rows and columns in tuple form, as series is only 1D data structure so it returns the only number of rows | s.shape Output: (5,) |
nbytes | Returns the no. of bytes from the series | s.nbytes Output: 40 |
ndim | Returns the dimension of given series, which is always 1 | s.ndim Output: 1 |
size | Returns the number of elements from the series | s.size Output: 5 |
itemsize | Returns the size of the specified item from the series | s[2].itemsize Output: 8 |
hasnans | Returns True if the series contains NaN value | s.hasnans Output: False s=pd.Series([2,None,3]) s.hasnans Output: True |
empty | Returns True if the series is empty | s.empty Output: False s=pd.Series() s.empty Output: True |
Methods of Series
Certain methods are used for series manipulation. Some methods require parameters to be passed inside the brackets. These methods are as follows:
Method | Use | Example |
head() | Returns top 5 rows from the series by default otherwise specified rows from series. | s.head(2) Output: Jan 45.0 Feb 67.0 Name: Monthly Data, dtype: float64 s.head() Output: Jan 45.0 Feb 67.0 Mar 87.0 Apr NaN May 23.0 Name: Monthly Data, dtype: float64 |
count() | Count the Non-NaN values in the series | s.count() Output: 4 |
tail() | Returns bottom 5 rows from the series by default otherwise specified rows from the series. | s.tail(2) Output Apr NaN May 23.0 Name: Monthly Data, dtype: float64 s.tail() Output Jan 45.0 Feb 67.0 Mar 87.0 Apr NaN May 23.0 Name: Monthly Data, dtype: float64 |
len() | This function is used to return the length of the given series. | len(s) Output: 5 |
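The methods in the table can be tried together on one series. This sketch assumes a series with one missing value, matching the outputs shown in the table:

```python
import pandas as pd

s = pd.Series([45, 67, 87, None, 23],
              index=['Jan', 'Feb', 'Mar', 'Apr', 'May'],
              name='Monthly Data')

print(s.head(2))   # first two rows
print(s.tail(2))   # last two rows
print(s.count())   # counts only the non-NaN values
print(len(s))      # counts every element, NaN included
```

The difference between count() and len() is the missing value at 'Apr'.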
Mathematical Operations
- Mathematical operations such as addition, subtraction, multiplication and division can be performed on multiple series.
- While performing mathematical operations, the series are aligned on their index labels.
- Any label that is missing from one of the series, or whose value is missing, gives NaN in the result.
- Example:
import pandas as pd
s=pd.Series([45,67,87,None,23],index=['Jan','Feb','Mar','Apr','May'])
s1=pd.Series([21,22,23,24,25],index=['Jan','Mar','Apr','May','June'])
s2=s+s1
print(s2)
The calculation will be done as follows:
index | s | s1 | s + s1 |
Jan | 45 | 21 | 66 |
Feb | 67 | NaN | NaN |
Mar | 87 | 22 | 109 |
Apr | NaN | 23 | NaN |
May | 23 | 24 | 47 |
June | NaN | 25 | NaN |
Output:
Apr NaN
Feb NaN
Jan 66.0
June NaN
Mar 109.0
May 47.0
dtype: float64
- add() function can be also used for the addition.
- Add function also supports fill_value parameter to fill the NaN value.
- Example: s=s.add(s1,fill_value=0)
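A short sketch of how fill_value changes the result for the two series used above: a value missing on one side is treated as 0, so no NaN remains here. A label would still give NaN only if the value were missing in both series.

```python
import pandas as pd

s = pd.Series([45, 67, 87, None, 23], index=['Jan', 'Feb', 'Mar', 'Apr', 'May'])
s1 = pd.Series([21, 22, 23, 24, 25], index=['Jan', 'Mar', 'Apr', 'May', 'June'])

# Missing values on either side are replaced by 0 before adding
total = s.add(s1, fill_value=0)
print(total)
```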
Now in the next section of Term 1 Class 12 IP revision notes we will see the data frame portion.
DataFrame
- It is 2D data structure of Pandas.
- It processes the data in tabular form.
- It has row indexes and column labels.
- Each column can hold values of a different data type.
- pd.DataFrame() method is used to create a data frame.
- The syntax to create a dataframe is as follows:
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
- Parameters
- data: ndarray (structured or homogeneous), Iterable, dict, or DataFrame. A dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion order. If data is a list of dicts, column order also follows insertion order (changed in version 0.25.0).
- index: Index or array-like. Index to use for the resulting frame. Defaults to RangeIndex if the input data carries no indexing information and no index is provided.
- columns: Index or array-like. Column labels to use for the resulting frame when data does not have them, defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels, column selection is performed instead.
- dtype: dtype, default None. Data type to force. Only a single dtype is allowed. If None, it is inferred.
- copy: bool or None, default None. Copy data from inputs. For dict data, the default of None behaves like copy=True. For DataFrame or 2d ndarray input, the default of None behaves like copy=False.
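The note on the columns parameter can be sketched as follows. When the data already carries column labels (a dictionary here), columns= selects and orders them rather than renaming them. The temperature values are reused from the examples in these notes.

```python
import pandas as pd

data = {'Mon': [30, 41], 'Tue': [40, 28], 'Wed': [44, 42]}

# data already has column labels, so columns= picks 'Wed' and 'Mon' in that order
df = pd.DataFrame(data, index=['Ahmedabad', 'Baroda'], columns=['Wed', 'Mon'])
print(df)
```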
Creating Empty DataFrame
import pandas as pd
df=pd.DataFrame()
print(df)
Output:
Empty DataFrame
Columns: []
Index: []
Creating data frame using numpy array
import pandas as pd
import numpy as np
a1=np.array([11,22,33,44])
a2=np.array([20,30,40,50])
a3=np.array([2,4,6,8])
cols=['A','B','C','D']
df=pd.DataFrame([a1,a2,a3],columns=cols)
print(df)
If the cols list is not supplied, default integer column labels are assigned to the dataframe.
Creating dataframe using list of dictionaries
import pandas as pd
d=[{'Mon':30,'Tue':40,'Wed':44},{'Mon':41,'Tue':28,'Wed':42}]
df=pd.DataFrame(d,index=['Ahmedabad','Baroda'])
print(df)
Creating dataframe using dictionary of lists
import pandas as pd
df=pd.DataFrame({'Team':['Australia','India','England'],'Rank':['II','I','III'],'Points':[123,137,120]})
print(df)
Creating dataframe from series
import pandas as pd
df=pd.DataFrame({'KL Rahul':pd.Series([2,21,48],index=['Pak','NZ','AFG']),
'Rohit Sharma':pd.Series([0,17,53],index=['Pak','NZ','AFG']),
'Virat Kohli':pd.Series([57,32,10],index=['Pak','NZ','AFG'])})
print(df)
Iteration on dataframe
- Iteration can be done in two ways: iterate over rows, iterate over columns
- Pandas provides two functions for iteration: iterrows() for rows and items() for columns (items() was called iteritems() in pandas versions before 2.0)
Iterating by rows
import pandas as pd
d=[{'Mon':30,'Tue':40,'Wed':44},{'Mon':41,'Tue':28,'Wed':42}]
df=pd.DataFrame(d,index=['Ahmedabad','Baroda'])
for (ri,s) in df.iterrows():
    print("~"*50)
    print("City:",ri)
    print("~"*50)
    print("\nTemperature Record:")
    print(s)
Output:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
City: Ahmedabad
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Temperature Record:
Mon 30
Tue 40
Wed 44
Name: Ahmedabad, dtype: int64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
City: Baroda
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Temperature Record:
Mon 41
Tue 28
Wed 42
Name: Baroda, dtype: int64
Iterate over dataframe by columns
import pandas as pd
d=[{'Mon':30,'Tue':40,'Wed':44},{'Mon':41,'Tue':28,'Wed':42}]
df=pd.DataFrame(d,index=['Ahmedabad','Baroda'])
#items() replaces iteritems(), which was removed in pandas 2.0
for (ci,s) in df.items():
    print("~"*50)
    print("Day:",ci)
    print("~"*50)
    print("\nTemperature Record:")
    print(s)
Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Day: Mon
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Temperature Record:
Ahmedabad 30
Baroda 41
Name: Mon, dtype: int64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Day: Tue
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Temperature Record:
Ahmedabad 40
Baroda 28
Name: Tue, dtype: int64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Day: Wed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Temperature Record:
Ahmedabad 44
Baroda 42
Name: Wed, dtype: int64
Now in the next section of Term 1 Class 12 IP revision notes we are going to discuss some operations on dataframe. These operations are add/delete or insert/remove rows/columns from dataframe, select/access rows/columns from dataframe, rename, head and tail function, indexing using labels and boolean indexing.
Add columns in dataframe
#Method 1 – Specify a column label that doesn't exist
- To add a column, assign data to a new column label
- Example
df['Thurs']=[33,32]
Output
Mon Tue Wed Thurs
Ahmedabad 30 40 44 33
Baroda 41 28 42 32
If the column label is given which is already present in the dataframe then it will update the data in the same column.
Add multiple Columns
df[['Thurs','Fri']]=[[32,41],[33,43]]
#Method 2 Insert method
Syntax: df.insert(loc, column, value, allow_duplicates)
Parameters:
- loc: the position (index) where the column is going to be inserted
- column: the label of the column to insert
- value: the data to insert; provide a list of values, or a single value that is repeated for all rows
- allow_duplicates: whether a column label that already exists may be inserted again; can be True or False
Example:
df.insert(2,"Thurs",[20,22],False)
Output:
Mon Tue Thurs Wed
Ahmedabad 30 40 20 44
Baroda 41 28 22 42
#Method 3 assign() function
- assign() function will create a new dataframe with the newly added column
- store the result in a dataframe object; the new columns and their data are passed as keyword arguments to assign()
Syntax:df=df.assign(new_column=[data_list])
Parameters:
- new_column: the name of the column to be added
- data_list: the list of values for the new column; specify None for NaN
Example:
df=df.assign(Thurs=[20,24])
Output:
Mon Tue Wed Thurs
Ahmedabad 30 40 44 20
Baroda 41 28 42 24
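#Method 4 By using dictionary
- Create a dictionary that maps each row label to its value
- Convert it to a series and assign it to the column, so that the values are matched with the row labels
df['Fri']=pd.Series({'Ahmedabad':30,'Baroda':35})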
Add row in dataframe
- You can add a row in the following two ways
- pd.concat() (the older append() method was removed in pandas 2.0)
- loc
The concat() Function
df=pd.concat([df,pd.DataFrame([{'Mon':34,'Tue':37,'Wed':42}])],ignore_index=True)
The loc[] method
df.loc[3]=[34,56,78]
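Both ways of adding a row can be sketched on the temperature dataframe used earlier. The row labels 'Surat' and 'Rajkot' are hypothetical, added only for this illustration. pd.concat() is used because recent pandas versions no longer provide append().

```python
import pandas as pd

df = pd.DataFrame([{'Mon': 30, 'Tue': 40, 'Wed': 44},
                   {'Mon': 41, 'Tue': 28, 'Wed': 42}],
                  index=['Ahmedabad', 'Baroda'])

# loc with a new label adds a row to the existing dataframe
df.loc['Surat'] = [34, 56, 78]

# pd.concat builds a new dataframe from the old rows plus the new row
new_row = pd.DataFrame([{'Mon': 34, 'Tue': 37, 'Wed': 42}], index=['Rajkot'])
df2 = pd.concat([df, new_row])
print(df2)
```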
Select/Access data from dataframe
- There are various methods to select/access data from dataframe.
- Data can be extracted in following ways:
- All rows, all columns
- All rows, limited columns
- Limited rows, limited columns
- Limited rows, All columns
- Extracting data based on label indexing
- the label-based indexing can be done using loc method
print(df.loc['Ahmedabad'])
- Extracting data based on positional indexing
- positional indexing is done using the iloc method
print(df.iloc[1])
- The above method will extract data from a particular row as specified in the square bracket.
- The loc can be also used to modify the content.
- Passing a single column label and returning the column as a series.
Syntax: df.loc[:, column]
Example:
print(df.loc[:,'Mon'])
- Passing range of column labels and returning the range of columns from dataframe
Syntax: df.loc[:, column1:column2]
Example:
print(df.loc[:,'Mon':'Tue'])
- Passing range of columns labels and row labels and returning the data
Syntax:df.loc[row1:rown,Col1:Coln]
Example:
print(df.loc['Ahmedabad':'Baroda','Mon':'Tue'])
- Specifying a list of row indexes and returning data:
Syntax: df.loc[[row1, row2, …]]
Example:
print(df.loc[['Ahmedabad','Baroda']])
- Boolean Indexing
- Used to filter records from the dataframe
- A condition is required
- The condition returns True or False for each row
- Indexes can also be created with True and False, or with 0 and 1
- Example:
#Creating DataFrame with boolean indexing
#Method1
df=pd.DataFrame({'R.No':[2,4,5,7,9,11],
'name':['Divya','Deepu','Sanjay','Mohan','Chirag','Dhara']},
index=[True,False,True,True,False,False])
#Accessing boolean indexes
print(df.loc[True])
print(df.iloc[1])
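Filtering with a condition, as described in the bullets above, can be sketched like this. The condition R.No > 5 is chosen only for illustration; it produces one True/False value per row, and only the True rows are kept.

```python
import pandas as pd

df = pd.DataFrame({'R.No': [2, 4, 5, 7, 9, 11],
                   'name': ['Divya', 'Deepu', 'Sanjay', 'Mohan', 'Chirag', 'Dhara']})

mask = df['R.No'] > 5      # a boolean series, one True/False per row
print(mask)

selected = df[mask]        # keeps only the rows where mask is True
print(selected)
```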
DataFrame Properties
Property/Attribute | Purpose | Example |
df.index | Display list of row labels | print(df.index) Output: Index(['Ahmedabad', 'Baroda'], dtype='object') |
df.columns | Display list of columns | print(df.columns) Output: Index(['Mon', 'Tue', 'Wed'], dtype='object') |
df.axes | Display the list of row labels and column labels | print(df.axes) Output: [Index(['Ahmedabad', 'Baroda'], dtype='object'), Index(['Mon', 'Tue', 'Wed'], dtype='object')] |
df.dtypes | Returns the data types of dataframe | print(df.dtypes) Output: Mon int64 Tue int64 Wed int64 dtype: object |
df.size | Fetch the size of dataframe which is the product of no. of rows and columns | print(df.size) Output: 6 |
df.shape | It displays the tuple of no. of rows and columns. | print(df.shape) Output: (2, 3) |
df.ndim | Returns the dimension of dataframe which is always 2. | print(df.ndim) Output: 2 |
df.values | Returns the list of values from dataframe. | print(df.values) Output: [[30 40 44] [41 28 42]] |
df.T | Transpose the dataframe. | print(df.T) |
df.count | Counts the number of non-NaN values in the dataframe along the axis passed by the user. The possible arguments are as follows: 1. 0 – Rows 2. 1 – Columns 3. axis='index' – Rows 4. axis='columns' – Columns | df.count(0) df.count(1) df.count(axis='index') df.count(axis='columns') |
df.empty | Checks whether the dataframe is empty or not. If it’s empty returns True otherwise false. | print(df.empty) Output: False |
Fetching records from dataframe using different conditions
Consider the following dataframe and write a command to do the following:
Team | Year2018 | Year2019 | Year2020 | Year2021 |
India | 44 | 35 | 25 | 22 |
Pakistan | 41 | 32 | 22 | 18 |
England | 40 | 30 | 20 | 15 |
Australia | 38 | 28 | 18 | 12 |
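To try the commands that follow, the table can first be built as a dataframe. This is one possible construction, with the team names as the row index:

```python
import pandas as pd

# Matches played per team, one column per year
df = pd.DataFrame({'Year2018': [44, 41, 40, 38],
                   'Year2019': [35, 32, 30, 28],
                   'Year2020': [25, 22, 20, 18],
                   'Year2021': [22, 18, 15, 12]},
                  index=['India', 'Pakistan', 'England', 'Australia'])
print(df)
```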
- Fetch the records on India, Pakistan and Australia for the year 2019 and 2021.
print(df.loc[df.index.isin(['India','Pakistan','Australia']),['Year2019','Year2021']])
- Display the matches played by England in 2021.
print(df.loc[df.index=='England','Year2021'])
- Display the records for 2020 year for all the teams
print(df['Year2020'])
print(df.loc[:,'Year2020'])
print(df.loc[:,df.columns.isin(['Year2020'])])
- Display the records of the team which played matches between 40 to 50 in 2018.
#method1
df1=df[df['Year2018'].between(40,50)]
print(df1['Year2018'])
#method2
df1=df[(df['Year2018']>=40)&(df['Year2018']<=50)]
print(df1['Year2018'])
#method 3
df1=df.query('40 <= Year2018 <= 50')
print(df1['Year2018'])
- Display records in reverse order
print(df[::-1])
- Display bottom 3 records
print(df.tail(3))
- Display the records for 2019 for the teams that played more than 30 matches
#Method1
df1=df.loc[df.Year2019>30]
print(df1['Year2019'])
#Method2
df1=df.query('Year2019>30')
print(df1['Year2019'])
#Method3
df1=df[df.Year2019>30]
print(df1['Year2019'])
Delete Columns
- There are three ways to delete columns
- Delete using drop() function
- Delete using label
- Delete using column property
- Delete using drop() function
- The syntax of drop method is as follows
df=df.drop(column_list,axis=1)
Example:
df=df.drop(['Year2018','Year2019'],axis=1)
- Delete column using column label
- The syntax of drop method using column label is as follows
df=df.drop(columns=column_list)
Example:
df=df.drop(columns=['Year2018','Year2019'])
- Delete column using columns property
- The syntax is as follows
df=df.drop(df.columns[columnindex],axis=1)
Example:
df=df.drop(df.columns[[1,2]],axis=1)
Delete Rows
There are certain ways to delete rows from dataframe. They are:
- Using Index Name
- Using Drop Method
- Using .index
Using Index Name
#Method 1
df=df.drop('Pakistan')
#Method 2
df=df.drop(index='Pakistan')
#Multiple rows
df=df.drop(['Pakistan','England'])
df.drop(['Pakistan','England'], inplace=True)
#Method 3
df=df.drop(df.index[[1,3]])
Rename column names in Dataframe
#Method1
df=df.rename({'Year2018':2018,'Year2019':2019},axis=1)
#method2
df.rename(columns={'Year2018':2018,'Year2019':2019},inplace=True)
head() Function
- The head() function is used to display top/first n rows from the dataframe.
- If no parameter is supplied to the head() function, it will display 5 records by default.
- Example:
df.head(3)
- The tail() function displays bottom/last n rows from the dataframe.
- It is opposite of head() function.
Binary Operations
- Binary operations can be done on multiple dataframes
- These operations can be addition, subtraction, multiplication and division.
- For these operations, indexes should be matched.
- If indexes are not matching then it will return NaN values.
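A minimal sketch of addition on two small dataframes whose row indexes only partly match. The labels and values are made up for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['x', 'y'])
df2 = pd.DataFrame({'A': [10, 20], 'B': [30, 40]}, index=['y', 'z'])

total = df1 + df2                      # only row 'y' exists in both, others become NaN
filled = df1.add(df2, fill_value=0)    # a missing row is treated as 0 instead

print(total)
print(filled)
```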
CSV File
- It stands for Comma Separated Value file
- Each value is separated by a comma by default
- It is known as a separator or delimiter character
- It is a common file format to store tabular data
- It can be operated or opened by text editor (Notepad) or spreadsheet software (MS Excel)
- Two operations can be performed between a CSV file and a dataframe
- Write Data into CSV – to_csv() function is used to write
- Load Data from CSV – read_csv() function is used to read data
Writing Data from pandas to csv using to_csv() function
- to_csv() function accepts the following parameters:
- path_or_buf: the path of the CSV file to write
- sep: the separator character to use instead of the default comma
- na_rep: the text written in place of NaN. The default is an empty string.
- float_format: the format string for floating-point numbers, for example '%.2f' to keep two digits after the decimal point.
- header: whether to write the column headers into the CSV. It can be True or False. By default it is True.
- columns: the columns to write. By default it is None, which writes all columns.
- index: whether to write the row labels. By default it is True.
import pandas as pd
df=pd.DataFrame({'Name':['Akash','Lucky','Nirav','Sameer'],
'Sales':[200,130,189,176],
'Comm':[500,470,495,444]})
df.to_csv("E:\\data.csv")
- read_csv() function accepts the following parameters:
- filepath_or_buffer: the path of the CSV file to read, similar to the first parameter of to_csv()
- sep: the separator character, similar to the sep parameter of to_csv()
- index_col: the column to use as the row index
- header: the row number to use as the column header
import pandas as pd
df=pd.read_csv("E:\\Python Programs\\CSV1.csv")
print(df)
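The parameters of both functions can be sketched without touching the disk by writing to an in-memory buffer. The buffer is an assumption of this example; a file path works the same way. The sample data is shortened from the example above.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Akash', 'Lucky'], 'Sales': [200.456, None]})

buffer = io.StringIO()    # in-memory stand-in for a CSV file on disk
df.to_csv(buffer, sep=';', na_rep='NA', float_format='%.1f', index=False)
text = buffer.getvalue()
print(text)

buffer.seek(0)            # go back to the start before reading
df2 = pd.read_csv(buffer, sep=';', index_col='Name')
print(df2)
```

Here sep changes the delimiter, na_rep replaces the missing value, float_format shortens the decimal part and index=False leaves out the row numbers.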
Data Visualization
- Helps to understand data in a better way
- It is a process of representing data in graphics or pictures
- It can use various charts and graphs to show trends, relationships between variables and comparisons
- It provides an effective way to communicate information to intended users
- Some popular examples are traffic symbols, ultrasound reports, atlas book of maps, the speedometer of a vehicle etc.
- It is effectively used in many fields like health, finance, science, mathematics, engineering etc.
Plotting using matplotlib – installing and importing matplotlib
- The matplotlib library is used to plot data on chart or graph
- It can be installed using pip install command – pip install matplotlib
- You can import the matplotlib package using – import matplotlib.pyplot
- pyplot module contains a collection of functions used to plot data
- Matplotlib provides control over every aspect of a figure
- It offers interactive and non-interactive plotting and can save images in different formats
- It was written by John D. Hunter and is now developed by a large community
- It is distributed under a BSD-Style License
- the plot() function will create necessary figures and axes to achieve the desired plot
Basic components of a chart
The matplotlib chart has the following components:
- Figure: The surrounding or outline area of a plot is called a figure
- Axes: The region where the data is plotted. A plot has two or three axis lines. The axes contain the title, x-label and y-label.
- Artist: Everything present in the figure is called an artist, generally consisting of text objects, line objects and collection objects.
- Labels: These indicate what data is plotted.
- Title: It is used to specify the title for the plot
- Legend: It shows different types of sets of data plotted in different colours or marks in the chart
Line plot
- Line plot plots data on straight lines
- The styles of lines can be modified easily using markers and line styles
- It requires X-axes and Y-axes data
- It is mostly used to visualize the trend in data over an interval of time
- The important functions used for the line chart are as following:
- plot(x,y,color,others): Draw lines as per specified lines
- xlabel(“label”): For label to x-axis
- ylabel(“label”): For label to y-axis
- title(“Title”): For title of the axes
- legend(): For displaying legends
- show() : Display the graph
import matplotlib.pyplot as mpp
mpp.plot(['Mayank','Shiv','Rani'],[220,190,194],'Red')
mpp.xlabel('Employee')
mpp.ylabel('Sales')
mpp.title('Progress Report Chart')
mpp.show()
Customizing line chart
Matplotlib provides various functions for customizing a chart. The following functions are used to customize the chart.
Function | Use |
grid() | Shows the grid lines on plot figure |
legend() | Display the legends |
savefig() | Save the current figure |
xticks() | Set the current tick location for the x-axis |
yticks() | Set the current tick location for the y-axis |
Changing line colour and line style
- Matplotlib provides different styles and colours for lines. The following tables are showing styles and colours.
- The colours can be given with the abbreviations b, c, g, k, m, r, w, y, which stand for blue, cyan, green, black, magenta, red, white and yellow respectively.
- The styles '-', '--', '-.' and ':' stand for solid line, dashed line, dash-dot line and dotted line respectively.
mpp.plot(['Mayank','Shiv','Rani'],[220,190,194],'m',linestyle='--')
Changing marker, marker size and linewidth
- The marker, marker size and linewidth can be changed as and when needed.
- The values for the marker are as follows:
Marker | Description |
---|---|
‘o’ | Circle |
‘*’ | Star |
‘.’ | Point |
‘,’ | Pixel |
‘x’ | X |
‘X’ | X (filled) |
‘+’ | Plus |
‘P’ | Plus (filled) |
‘s’ | Square |
‘D’ | Diamond |
‘d’ | Diamond (thin) |
‘p’ | Pentagon |
‘H’ | Hexagon |
‘h’ | Hexagon |
‘v’ | Triangle Down |
‘^’ | Triangle Up |
‘<‘ | Triangle Left |
‘>’ | Triangle Right |
‘1’ | Tri Down |
‘2’ | Tri Up |
‘3’ | Tri Left |
‘4’ | Tri Right |
‘|’ | Vline |
‘_’ | Hline |
- Marker size can be any numeric value for the marker with the parameter markersize or ms.
- You can set the line width using linewidth parameter to change the width of line
mpp.plot(['Mayank','Shiv','Rani'],[220,190,194],color='g', linestyle='-.', marker='o', linewidth=3)
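The customizing functions and the line, marker and width options can be combined in one short program. This sketch makes two assumptions that are not part of the notes: it selects the non-interactive 'Agg' backend so it runs without a display, and it saves the chart to an in-memory buffer instead of a file.

```python
import io
import matplotlib
matplotlib.use('Agg')     # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

names = ['Mayank', 'Shiv', 'Rani']
sales = [220, 190, 194]

# magenta dashed line with star markers of size 10 and line width 2
lines = plt.plot(names, sales, color='m', linestyle='--',
                 marker='*', markersize=10, linewidth=2, label='Sales')
plt.yticks([180, 200, 220])   # tick locations on the y-axis
plt.grid(True)                # show grid lines
plt.legend()                  # uses the label given in plot()
plt.title('Progress Report Chart')

buf = io.BytesIO()            # in-memory stand-in for an image file
plt.savefig(buf, format='png')
```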
Plotting Bar Chart
- Bar charts are used to show the comparison between data
- The bar() function or kind=’bar’ parameter of plot function is used to plot the bar chart
- It shows the bar chart with rectangular bars
- The rectangular bar has a height up to the corresponding value
- It requires two data series
- The bars can be plotted vertically or horizontally
- The parameters for the bar() function are as follows:
x | sequence of scalars representing the x coordinates of the bars. align controls if x is the bar center (default) or left edge. |
height | scalar or sequence of scalars. the height(s) of the bars |
width | scalar or array-like, optional. the width(s) of the bars, default 0.8 |
bottom | scalar or array-like, optional. the y coordinate(s) of the bottom of the bars, default 0 |
histtype | applies to hist(), not bar(): {'bar', 'barstacked', 'step', 'stepfilled'}, default 'bar' |
orientation | applies to hist(): {'horizontal', 'vertical'}, default 'vertical'; for horizontal bars use barh() |
align | {'center', 'edge'}, optional, default 'center' |
import matplotlib.pyplot as plt
courses=['C','C++','Java','DotNet','Python','Perl']
no_of_std=[20,22,25,30,45,21]
plt.bar(courses,no_of_std)
plt.xlabel('Courses')
plt.ylabel('Strength')
plt.title('Strength per course')
plt.show()
Customizing Bar Chart
You can customize the bar chart as a line chart.
Observe the following code:
plt.bar(courses,no_of_std,width=.5,color=['r','g','b'],label='Courses', edgecolor='k',linewidth=3,linestyle='-.')
Plotting Histogram
- It is a powerful technique for data visualization
- It is a graphical display of frequencies
- It is an approximate graphical representation of the distribution of numerical data
- It was introduced by Karl Pearson
- It plots the quantitative variable
- It shows what portion of data set falls into each category specified as non-overlapping intervals called bins
- To make a histogram the data is sorted into “bins” and the number of data points in each bin is counted
- The height of each column in the histogram is then proportional to the number of data points its bin contains
- df.plot(kind=’hist’) will create a histogram
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name':['Ankush', 'Divya', 'Priya', 'Manu', 'Tanu','Bhavin'],
'Height' : [60,61,63,65,61,60],
'Weight' : [53,65,52,58,50,51]}
df=pd.DataFrame(data)
df.plot(kind='hist',bins=5,edgecolor='red',linewidth=3,
color=['m','y','k'],linestyle=':',fill=False,hatch='o')
plt.show()
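To see how the sorting into bins described above works, the counts can be computed directly without drawing anything. The sketch below uses NumPy's histogram() function on the Weight values from the example; the bin edges 50, 55, 60 and 65 are chosen only for illustration.

```python
import numpy as np

weights = [53, 65, 52, 58, 50, 51]
bin_edges = [50, 55, 60, 65]   # three bins: 50-55, 55-60, 60-65 (edges assumed)

# histogram() returns the number of values in each bin and the bin edges
counts, edges = np.histogram(weights, bins=bin_edges)

# Print each bin with the number of data points it contains
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {n}")

# counts → [4, 1, 1]
```

Each bin includes its left edge but not its right edge, except the last bin, which includes both. That is why the value 65 is counted in the 60-65 bin.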
- The values for the hatch parameter include: '/', '\', '|', '-', '+', 'x', '*', 'o', 'O', '.'
- The hist() function of pyplot is used to create a histogram
plt.hist([df['Height'],df['Weight']],bins=[50,55,60,65],edgecolor='red',linewidth=3,
color=['m','y'],linestyle=':',fill=False,hatch='o')
Here is another example of a histogram:
import matplotlib.pyplot as m
english=[77,66,88,99,55,44,33,79,68,83]
maths=[56,89,70,50,60,65,90,80,47,82]
m.hist([english,maths], orientation='horizontal', histtype="stepfilled", cumulative=True)
m.show()
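The cumulative=True parameter used above makes each bar show the running total of all bins up to and including that bin, rather than the count of that bin alone. A minimal sketch of the idea, using NumPy and bin edges assumed only for this illustration:

```python
import numpy as np

english = [77, 66, 88, 99, 55, 44, 33, 79, 68, 83]
edges = [30, 50, 70, 90, 100]   # assumed bin edges for this sketch

# Ordinary histogram: number of marks in each bin
counts, _ = np.histogram(english, bins=edges)

# Cumulative histogram: running total of the counts
running_total = np.cumsum(counts)

print(counts)          # counts per bin → [2 3 4 1]
print(running_total)   # running totals → [ 2  5  9 10]
```

The last cumulative value always equals the total number of data points, here 10.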
Societal Impacts
- Nowadays digital technologies surround us everywhere
- In this world, almost everything is interconnected in one way or another
- A network is a group of devices connected together for sharing resources and information
- The following are examples of different networks:
  - Social Media Network
  - Mobile Network
  - Computer Networks
  - Various networks like airlines, groups of schools, groups of colleges, hospitals etc.
- The main purpose of the network is to share data and resources as well as establish a connection for communication
- The size of the network may vary from small to large
- A network consists of different hosts like servers, desktops, laptops and smartphones and some network devices such as switches, hubs, routers, modems etc.
- Data packets are the small units into which data is divided for communication
- The devices can be connected through a wired or wireless medium
- A single computer or device connected to a network that receives, creates, stores or sends data is called a node
- A computer that controls and manages the resources, users, files and databases in the network is called a server
Digital Footprint
- While surfing the internet, we leave behind a trail of data that reflects the actions we perform online; this trail is called a digital footprint
- A digital footprint can be created knowingly or unknowingly
- It includes the following:
- Websites visited
- Sent Emails
- Online forms
- IP address
- Location information
- Device information
- The information left as a digital footprint could be used for advertising or misused or exploited
- So be aware of what you are uploading, writing, downloading, filling in the form etc. online
- There are two kinds of digital footprints:
  - Active Digital Footprint
    - Data submitted intentionally online
    - It includes emails, responses, and posts written on different online platforms
  - Passive Digital Footprint
    - Data submitted unintentionally online
    - It includes data generated online when a website is visited, using a mobile app, browsing the internet etc.
- A person who uses the internet may have a digital footprint
- When you examine the browser settings, you will find that it stores browsing history, cookies, passwords, auto-fill data etc.
- Besides browsers, most of the digital footprints are stored on the servers
- You cannot access this data or erase it, and you do not have any control over how it is used
- Even if you delete the data from your end, it remains on the servers
- There is no guarantee that digital footprint will be deleted from the internet completely
- These can be used to track the user, their location, device and other usage details
Net and communication etiquette
- While using the internet, users need to be aware of how to conduct themselves, behave properly with others online, follow some ethics, morals and maintain some values online
- Anyone who uses digital technology and the internet is a digital citizen or netizen
- Everyone who uses the internet should practice safe, ethical and legal use of digital technology
- A digital citizen must abide by net etiquette, communication etiquette and social media etiquette
Follow these links for further topics:
- Net and communication etiquette
- Data and Net protection IPR
- Cyber Crime and IT Act
- Ewaste Hazards and Management
So I hope you have enjoyed Term 1 Class 12 IP revision notes. If you have any concerns related to any topic or any other doubts related to any topic from the Term 1 Class 12 IP revision notes, feel free to ask in the comment section. Like and share this article with your classmates and friends. Thank you for reading this article, TATA!!!!