Notes Chapter 11 Python Pandas II Dataframes And Other Operations
Introduction
- In the last chapter, we covered the basics of Python pandas, its data structures and Series. A Series cannot handle the 2D or multidimensional data that real-world problems often involve.
- For such tasks, pandas provides other data structures such as DataFrames and Panels.
- DataFrame objects of pandas can store 2D heterogeneous data.
- Panel objects of pandas, on the other hand, can store 3D heterogeneous data. (Panel has been removed from recent pandas releases.)
- In this chapter, we will discuss them.
DataFrame Data Structure
- A DataFrame is a pandas data structure which stores data in 2D form.
- It is a two-dimensional labeled array: an ordered collection of columns, where different columns can store different kinds of data.
- A 2D array is a collection of rows and columns, where each row and each column has a definite index starting from 0.
- In the given diagram, there are 5 rows and 5 columns. The row and column indexes both run from 0 to 4.
- Each cell has an address such as A[2][1] or A[1][4], as shown in the diagram.
Characteristics of DataFrame
Characteristics of a DataFrame are as follows-
- It has two indexes, or two axes: a row index (axis 0) and a column index (axis 1).
- It is somewhat like a spreadsheet, where the row index is called the index and the column index is called the column name.
- Index labels can be numbers, strings or letters.
- Columns can hold any kind of data.
- Its values are mutable and can be changed at any time.
- The size of a DataFrame is also mutable, i.e. the number of rows and columns can be increased or decreased at any time.
Creation and presentation of DataFrame
- A DataFrame object can be created by passing data in 2D format.
- import pandas as pd
- <dataFrameObject> = pd.DataFrame(<a 2D data structure>, [columns=<column sequence>], [index=<index sequence>])
- You can create a DataFrame in various ways, by passing data values such as:
- 2D dictionaries
- 2D ndarrays
- Series type object
- Another DataFrame object
Creation of DataFrame from 2D Dictionary
A. Creation of DataFrame from a dictionary of lists or ndarrays
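The original example for this case is not available, so here is a minimal sketch. The names and marks are hypothetical sample data: each dictionary key becomes a column name and each list supplies that column's values.

```python
import pandas as pd

# Hypothetical sample data: keys become column names, lists become column values.
data = {
    "name": ["Asha", "Ravi", "Meena"],
    "marks": [82, 67, 91],
}

df = pd.DataFrame(data)                                     # default row index 0, 1, 2
df_labeled = pd.DataFrame(data, index=["r1", "r2", "r3"])   # custom row labels

print(df)
print(df_labeled)
```

All lists must have the same length, and if an index is passed, it must also match that length.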
B. Creation of DataFrame from a dictionary of dictionaries
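A minimal sketch of this case, using hypothetical sales figures: the outer keys become column names and the inner keys become row labels.

```python
import pandas as pd

# Hypothetical data: outer keys -> columns, inner keys -> row index.
sales = {
    "2019": {"Qtr1": 100, "Qtr2": 120},
    "2020": {"Qtr1": 110, "Qtr2": 130},
}

df = pd.DataFrame(sales)
print(df)
```

If the inner dictionaries have different keys, pandas uses the union of those keys as the row index and fills the gaps with NaN.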
Creation of DataFrame from 2D ndarray
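A short sketch of building a DataFrame from a NumPy 2D array. The array values and the labels are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 3 array; each inner list becomes one row.
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# Column and row labels are optional; without them pandas uses 0, 1, 2, ...
df = pd.DataFrame(arr, columns=["a", "b", "c"], index=["x", "y"])
print(df)
```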
Creation of DataFrame from 2D Dictionary of Series Objects
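A minimal sketch, assuming two hypothetical Series that share the same index. Each Series becomes one column and the shared index becomes the row index.

```python
import pandas as pd

# Hypothetical Series objects with a common index.
staff = pd.Series([20, 36, 44], index=["p1", "p2", "p3"])
salaries = pd.Series([166960, 246000, 563000], index=["p1", "p2", "p3"])

# Dictionary keys become column names.
df = pd.DataFrame({"staff": staff, "salaries": salaries})
print(df)
```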
Creation of DataFrame from object of other DataFrame
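A small sketch of passing an existing DataFrame to the constructor. The frame `original` is hypothetical. Whether the new object shares memory with the old one depends on the pandas version and settings, so the sketch uses `copy()` when an independent copy is wanted.

```python
import pandas as pd

original = pd.DataFrame({"a": [1, 2], "b": [3, 4]})   # hypothetical source frame

second = pd.DataFrame(original)     # new DataFrame object built from the old one
independent = original.copy()       # explicit, independent copy

independent.loc[0, "a"] = 99        # does not affect `original`
print(original)
print(independent)
```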
Displaying DataFrame Object
DataFrame Attributes
- Once a DataFrame object is created, information about it such as its size and data types can be accessed through attributes.
- <DataFrame Object>.<attribute name>
- Some attributes are index, columns, dtypes, shape, size, ndim, empty and T.
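The sketch below prints a few commonly used DataFrame attributes on a hypothetical frame. The selection of attributes is an illustration, not a complete list.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi", "Meena"], "marks": [82, 67, 91]})

print(df.index)     # row labels
print(df.columns)   # column labels
print(df.dtypes)    # data type of each column
print(df.shape)     # (rows, columns)
print(df.size)      # total number of elements
print(df.ndim)      # number of axes
print(df.empty)     # True only if the DataFrame has no elements
print(df.T)         # transpose: rows and columns swapped
```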
Selecting and Accessing from DataFrame
Selection of subset from DataFrame
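A hedged sketch of common ways to select a subset. The city data is hypothetical, and the calls shown (`[]`, `loc`, `iloc`) are standard pandas indexers rather than necessarily the ones used in the original examples.

```python
import pandas as pd

df = pd.DataFrame(
    {"population": [100, 200, 300], "hospitals": [5, 8, 12], "schools": [20, 30, 45]},
    index=["Delhi", "Mumbai", "Chennai"],
)

one_col = df["population"]                  # single column -> Series
two_cols = df[["population", "schools"]]    # list of columns -> DataFrame
row = df.loc["Mumbai"]                      # single row by label -> Series

# Label-based slices with loc include both end points.
block = df.loc["Delhi":"Mumbai", "population":"hospitals"]

# Position-based slices with iloc exclude the end position.
by_pos = df.iloc[0:2, 0:2]

print(block)
print(by_pos)
```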
Accessing and modifying values in DataFrame
a) Syntax to add or change a column:
<DFObject>[<ColName>] = <new value>
b) Syntax to add or change a row:
<DFObject>.at[<RowName>, :] = <new value>
<DFObject>.loc[<RowName>, :] = <new value>
(loc is the general form; at is intended mainly for single values.)
c) Syntax to change a single value:
<DFObject>.<ColName>[<RowName/Label>] = <new value> or
<DFObject>.at[<RowName>, <ColName>] = <new value>
d) Syntax for column deletion:
del <DFObject>[<ColName>] or
<DFObject>.drop([<Col1Name>, <Col2Name>, ...], axis=1)
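The operations in this section can be sketched together as follows. The frame and its values are hypothetical, and `loc`/`at` are used for row and single-value changes, since they work directly on the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"marks": [82, 67]}, index=["Asha", "Ravi"])

# a) add a new column, then change an existing one
df["grade"] = ["A", "B"]
df["marks"] = df["marks"] + 1

# b) add a row (a new label enlarges the DataFrame)
df.loc["Meena", :] = [91, "A+"]

# c) change a single value
df.at["Ravi", "marks"] = 70

# d) delete columns: `del` works in place, drop() returns a new DataFrame
df["bonus"] = 0
del df["bonus"]
trimmed = df.drop(["grade"], axis=1)

print(df)
print(trimmed)
```

Note that `drop()` leaves `df` unchanged unless its result is assigned back.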
Iteration in DataFrame
- Sometimes we need to iterate over a complete DataFrame, and writing code to access each value separately is cumbersome. In such cases, use one of the following methods:
- <DFObject>.iterrows() yields the DataFrame row by row, as (row index, Series) pairs.
- <DFObject>.items() yields the DataFrame column by column, as (column name, Series) pairs. Older pandas versions also offered this as iteritems(), which was removed in pandas 2.0.
Use of DataFrame.iterrows() function
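A minimal sketch of row-wise iteration on a hypothetical frame. Each iteration yields the row label and the row's data as a Series.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi"], "marks": [82, 67]})

total = 0
for row_label, row in df.iterrows():
    # `row` is a Series indexed by column name
    print(row_label, row["name"], row["marks"])
    total += row["marks"]

print("Total marks:", total)
```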
Use of DataFrame.items() function (iteritems() in older pandas)
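A minimal sketch of column-wise iteration on a hypothetical frame. It uses `items()`, which is available in current pandas; very old code may show `iteritems()` in the same role.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi"], "marks": [82, 67]})

col_names = []
for col_name, col in df.items():
    # `col` is a Series holding one whole column
    col_names.append(col_name)
    print(col_name, col.tolist())
```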
Program for iteration
- Write a program to iterate over a DataFrame containing names and marks, calculate grades from the marks (as per the guideline below) and add them to a grade column.
Marks >= 90 Grade A+
Marks 70 – 90 Grade A
Marks 60 – 70 Grade B
Marks 50 – 60 Grade C
Marks 40 – 50 Grade D
Marks < 40 Grade F
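One possible solution is sketched below. The names and marks are hypothetical, `grade_for` is a helper introduced here for illustration, and the sketch assumes that the lower bound of each range is inclusive (for example, exactly 70 is Grade A and exactly 90 is Grade A+).

```python
import pandas as pd

def grade_for(marks):
    """Map marks to a grade; lower bounds are assumed inclusive."""
    if marks >= 90:
        return "A+"
    elif marks >= 70:
        return "A"
    elif marks >= 60:
        return "B"
    elif marks >= 50:
        return "C"
    elif marks >= 40:
        return "D"
    return "F"

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "John", "Sara", "Imran", "Tina"],
    "marks": [95, 90, 72, 65, 50, 41, 39],
})

df["grade"] = ""   # create the grade column first
for row_label, row in df.iterrows():
    df.at[row_label, "grade"] = grade_for(row["marks"])

print(df)
```

For large DataFrames, `df["marks"].apply(grade_for)` would give the same result without an explicit loop; the loop is used here because the exercise asks for iteration.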
Binary Operations in a DataFrame
It is possible to perform addition, subtraction, multiplication and division operations on DataFrames.
To add: +, add() or radd()
To subtract: -, sub() or rsub()
To multiply: * or mul()
To divide: / or div()
We will perform these operations on two DataFrames:
Addition
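A short sketch of addition, using two small hypothetical DataFrames with matching labels. `radd()` is the reflected form, so `df1.radd(df2)` computes `df2 + df1`.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [10, 20], "b": [30, 40]})

total = df1 + df2            # element-wise, matched on row and column labels
same = df1.add(df2)          # method form of +
reflected = df1.radd(df2)    # df2 + df1

print(total)
```

Where a label exists in only one of the two DataFrames, the result holds NaN at that position.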
Subtraction
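A short sketch of subtraction on two hypothetical DataFrames. Order matters here, so `sub()` and `rsub()` give different results.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [10, 20], "b": [30, 40]})

diff = df1.sub(df2)     # df1 - df2
rdiff = df1.rsub(df2)   # df2 - df1

print(diff)
print(rdiff)
```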
Multiplication
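A short sketch of element-wise multiplication on two hypothetical DataFrames. Note that this is not matrix multiplication.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [10, 20], "b": [30, 40]})

prod = df1 * df2          # each cell multiplied by the matching cell
doubled = df1.mul(2)      # a scalar applies to every cell

print(prod)
print(doubled)
```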
Division
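A short sketch of element-wise division on two hypothetical DataFrames. The result holds floating-point values even when the inputs are integers.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [10, 20], "b": [30, 40]})

quot = df2 / df1          # each cell of df2 divided by the matching cell of df1
same = df2.div(df1)       # method form of /

print(quot)
```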
Other important functions
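Other important functions of DataFrame are as follows:
<DF>.info() prints a concise summary: index, columns, data types and non-null counts.
<DF>.describe() returns descriptive statistics such as count, mean, min and max.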
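A minimal sketch of both functions on a hypothetical frame. `info()` prints its summary directly, while `describe()` returns a new DataFrame of statistics (by default for numeric columns only).

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi", "Meena"], "marks": [82, 67, 91]})

df.info()                # prints the summary; nothing useful is returned

stats = df.describe()    # statistics for the numeric column "marks"
print(stats)
```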
<DF>.head([n=<n>]) returns the first n rows; the default value of n is 5.
<DF>.tail([n=<n>]) returns the last n rows; the default value of n is also 5.
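A minimal sketch of `head()` and `tail()` on a hypothetical ten-row frame.

```python
import pandas as pd

df = pd.DataFrame({"n": range(1, 11)})   # rows holding 1 to 10

first_five = df.head()     # default n=5
first_three = df.head(3)
last_two = df.tail(2)

print(first_three)
print(last_two)
```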
Cumulative Calculations Functions
For the cumulative sum of a DataFrame, the function is:
<DF>.cumsum([axis=None]) where the axis argument is optional. By default the sum runs down each column.
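A short sketch of `cumsum()` along both axes, using a hypothetical frame.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

down = df.cumsum()          # running total down each column
across = df.cumsum(axis=1)  # running total across each row

print(down)
print(across)
```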
Index of Maximum and Minimum Values
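The original example is not available. A common way to get these index labels is with `idxmax()` and `idxmin()`, sketched here on hypothetical marks data; treating these as the functions this section refers to is an assumption.

```python
import pandas as pd

df = pd.DataFrame(
    {"maths": [82, 67, 91], "science": [75, 88, 60]},
    index=["Asha", "Ravi", "Meena"],
)

top = df.idxmax()             # row label of the maximum in each column
bottom = df.idxmin()          # row label of the minimum in each column
best_subject = df.idxmax(axis=1)   # column label of the maximum in each row

print(top)
print(bottom)
print(best_subject)
```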
Handling of Missing Data
- Values that carry no computational significance are called missing values. pandas usually represents them as NaN or None.
- Methods for handling missing values:
- Dropping missing data
- Filling missing data (imputation)
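Both approaches can be sketched with `dropna()` and `fillna()`. The frame is hypothetical, and filling with 0 is only one possible imputation choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

print(df.isnull())         # True where a value is missing

dropped = df.dropna()      # drop every row that has at least one missing value
filled = df.fillna(0)      # replace every missing value with 0

print(dropped)
print(filled)
```

Both calls return new DataFrames and leave `df` unchanged.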
Comparison of Pandas Objects
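A minimal sketch of one common pitfall when comparing pandas objects, on hypothetical data. Element-wise `==` treats NaN as unequal to NaN, whereas `equals()` treats NaN values in the same position as equal and returns a single True or False.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, np.nan]})
b = a.copy()

elementwise = (a == b)    # DataFrame of booleans; the NaN position compares False
whole = a.equals(b)       # single boolean; NaNs in matching positions count as equal

print(elementwise)
print(whole)
```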