Notes Chapter 10 Introducing Python Pandas
Introduction
• Pandas or Python Pandas is a library of Python which is used for data analysis.
• The term Pandas is derived from “Panel data system”, an econometrics term for multidimensional, structured data sets.
• Nowadays, Pandas has become a popular option for data analysis.
• Pandas provides various tools for data analysis in simpler form.
• Pandas is an open-source, BSD-licensed library built for the Python programming language.
• Pandas offers high-performance, easy-to-use data structures and data analysis tools.
• The main author of Pandas is Wes McKinney.
• In this chapter, we will learn about Pandas.
Installing Pandas
• The “pip” command is used to install Pandas. Open the Command Prompt (cmd) and go to the location in Windows where the pip file is stored. Look at the following screen-

• Command window will look like-

• Run the command- “pip install pandas”

• The following screen appears when the command finishes, and Pandas is installed successfully.

Using Pandas
• Before proceeding, we need to first import the Pandas.
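The import step can be sketched as follows, assuming Pandas is already installed. The alias pd is only a widely used convention, not a requirement.

```python
import pandas as pd  # "pd" is the conventional alias; any valid name works

# Printing the installed version is a quick check that the import succeeded.
print(pd.__version__)
```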

Features of Pandas
Pandas is the most popular library in the Scientific Python ecosystem for doing data analysis. Pandas is capable of many tasks, including-
1. It can read or write in many different data formats (integer, float, double, etc.).
2. It can perform calculations across all the ways data is organized (for example, across rows and down columns).
3. It can easily select subsets of data from bulky data sets and even combine multiple data sets together.
4. It has functionality to find and fill missing data.
5.It allows you to apply operations to independent groups within the data.
6.It supports reshaping of data into different forms.
7. It supports advanced time-series functionality (which is the use of a model to predict future values based on previously observed values).
8. It supports visualization by integrating with libraries such as matplotlib and seaborn.
Pandas is best at handling huge tabular data sets comprising different data formats.
NumPy Arrays
• Before proceeding to the Pandas data structures, let us briefly review NumPy arrays because-
1. Some Pandas functions return their results as NumPy arrays.
2. It will give you a head start with the data structures.
• NumPy (“Numerical Python” or “Numeric Python”) is an open-source module of Python that provides functions for fast mathematical computation on arrays and matrices.
• To use NumPy, you need to import it. The syntax is-
>>> import numpy as np
(here np is an alias for numpy, which is optional)

• NumPy arrays come in two forms-
* 1-D arrays – also known as vectors.
* Multidimensional arrays – also known as matrices.
2D NumPy Arrays
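As a small illustration of the two forms, the sketch below builds one 1-D and one 2-D array. The variable names and values are made up for this example.

```python
import numpy as np

vector = np.array([10, 20, 30, 40])         # 1-D array (vector)
matrix = np.array([[1, 2, 3], [4, 5, 6]])   # 2-D array: 2 rows, 3 columns

print(vector.ndim, vector.shape)   # number of dimensions and shape of the vector
print(matrix.ndim, matrix.shape)   # number of dimensions and shape of the matrix
print(matrix[1, 2])                # element at row 1, column 2 (counting from 0)
```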

NumPy Arrays Vs Python Lists
• Although a NumPy array holds elements like a Python list, NumPy arrays are a different data structure from Python lists. The key differences are-
• Once a NumPy array is created, you cannot change its size. You will have to create a new array or overwrite the existing one.
• A NumPy array contains elements of a homogeneous type, unlike Python lists.
• An equivalent NumPy array occupies much less space than a Python list.
• NumPy arrays support vectorized operations, i.e. a function or operator is applied to every element at once without you writing a loop over the items one by one, which is not the case with Python lists.
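The differences above can be seen in a short sketch. The names and values are illustrative only.

```python
import numpy as np

prices_list = [10, 20, 30]
prices_array = np.array(prices_list)

# A list needs an explicit loop (here a comprehension) to double each item.
doubled_list = [p * 2 for p in prices_list]

# The array applies the operation to every element in one expression.
doubled_array = prices_array * 2

# A list may mix types; an array converts all items to one common dtype.
mixed = np.array([1, 2.5, 3])

print(doubled_list)
print(doubled_array)
print(mixed.dtype)
```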

NumPy Data Types
NumPy supports the following data types-
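As a hedged illustration (not the complete list), a few commonly used NumPy data types can be requested with the dtype argument and inspected through the dtype attribute:

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int32)      # 32-bit integers
floats = np.array([1, 2, 3], dtype=np.float64)  # 64-bit floating point
flags = np.array([1, 0, 1], dtype=np.bool_)     # booleans

print(ints.dtype, floats.dtype, flags.dtype)

# astype() returns a new array converted to another data type.
as_float = ints.astype(np.float32)
print(as_float.dtype)
```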

Ways to Create NumPy Arrays
• The empty() function can be used to create an empty (uninitialized) array of a specified shape and dtype.
numpy.empty(shape, [dtype=<datatype>,] [order=‘C’ or ‘F’])
Where:
dtype: the Python or NumPy data type of the array elements (the values themselves are left uninitialized).
shape: the dimensions of the array.
order: ‘C’ means the data is arranged row-wise (C-like).
order: ‘F’ means the data is arranged column-wise (Fortran-like).
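A minimal sketch of empty() with the arguments described above. The shapes and dtype are arbitrary choices for illustration. The values are not printed because an uninitialized array holds whatever was already in memory.

```python
import numpy as np

# Default dtype is float64 and default order is 'C' (row-wise).
a = np.empty((2, 3))

# Integer elements stored column-wise ('F', Fortran-like).
b = np.empty((2, 3), dtype=np.int32, order="F")

print(a.shape, a.dtype)
print(b.shape, b.dtype)
print(a.flags["C_CONTIGUOUS"], b.flags["F_CONTIGUOUS"])
```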


1. The arange() function creates an array from a range of values defined by a start, a stop and a step.

2. The linspace() function creates an array of a given number of evenly spaced values between a start and a stop value.
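Both functions can be sketched side by side. The ranges chosen here are only examples.

```python
import numpy as np

# arange(start, stop, step): the stop value itself is excluded.
evens = np.arange(0, 10, 2)

# linspace(start, stop, num): num evenly spaced values, stop included by default.
points = np.linspace(0, 1, 5)

print(evens)    # 0, 2, 4, 6, 8
print(points)   # 0.0, 0.25, 0.5, 0.75, 1.0
```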

Pandas Data Structure
A data structure is a particular way of storing and organizing data in a computer so that it can be accessed and worked with in appropriate ways. For example-
– If you want to store similar types of data items together and process them in an identical way, an array is the solution.
– If you want to store data so that you access the most recently inserted item first, a stack is the solution.
– If you want to store data so that the item inserted first is accessed first, a queue is the solution.
There are many other types of data structures suited to different kinds of functionality.
Next, we will learn about the Series and DataFrame data structures of Pandas.
Series Data Structure
–Series is a data structure of pandas. It represents a 1D array of indexed data.
–It has two main components-
• An array of actual data.
• An associated array of indexes or data labels.
–Both components are 1D arrays with the same length.
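The two components can be inspected directly. In this sketch the names and marks are invented for illustration.

```python
import pandas as pd

marks = pd.Series([85, 92, 78], index=["Asha", "Ravi", "Meena"])

print(marks.values)   # the array of actual data
print(marks.index)    # the associated array of labels
print(marks["Ravi"])  # a label is used to look up its value
```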

Creation of Series Objects
There are many ways to create a Series object –
1. Using Series() –
<Series Object> = pandas.Series()   (this creates an empty Series)

2. Non-empty Series creation –
import pandas as pd
<Series Object> = pd.Series(data, index=idx) where data can be a Python sequence, an ndarray, a Python dictionary or a scalar value.
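A minimal sketch of both forms. The explicit dtype on the empty Series is an addition here, used because some pandas versions warn when an empty Series is created without one.

```python
import numpy as np
import pandas as pd

# Empty Series; dtype is given explicitly (an assumption, see the note above).
empty = pd.Series(dtype="float64")

# From a Python sequence: the default index is 0, 1, 2, ...
from_list = pd.Series([3, 5, 7])

# From an ndarray with explicit labels.
from_array = pd.Series(np.arange(1, 4), index=["a", "b", "c"])

print(len(empty))
print(from_list)
print(from_array)
```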

Series Objects creation
1. Creation of series with Dictionary-

2. Creation of series with Scalar value-
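The two cases above might look like this. The month and label names are illustrative.

```python
import pandas as pd

# Dictionary: keys become the index, values become the data.
from_dict = pd.Series({"Jan": 31, "Feb": 28, "Mar": 31})

# Scalar: the single value is repeated once for every index label.
from_scalar = pd.Series(10, index=["x", "y", "z"])

print(from_dict)
print(from_scalar)
```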

Creation of Series Objects –Additional functionality
1. When it is needed to create a series with missing values, this can be achieved by filling missing data with a NaN (“Not a Number”) value.

2. Index can also be given as-

3. A dtype can also be passed along with the data and index-

4. Mathematical function/Expression can also be used-
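The four options above can be sketched together. The values are invented, and np.nan is used as the NaN marker.

```python
import numpy as np
import pandas as pd

# 1. Missing values are represented by NaN.
with_gap = pd.Series([6.5, np.nan, 2.3])

# 2. The index can come from any sequence, for example a range.
ranged = pd.Series([10, 20, 30], index=range(1, 4))

# 3. dtype can be passed along with data and index.
typed = pd.Series([1, 2, 3], index=["a", "b", "c"], dtype=np.float64)

# 4. data (and index) can be the result of an expression.
base = np.arange(1, 5)
squares = pd.Series(data=base ** 2, index=base)

print(with_gap, ranged, typed, squares, sep="\n")
```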

Series Object Attributes
Some common attributes
<series object>.<AttributeName>
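A few attributes that pandas Series objects provide are shown below as a sketch. This is not necessarily the full set covered in the chapter.

```python
import pandas as pd

s = pd.Series([4, 8, 15], index=["a", "b", "c"])

print(s.index)    # the index labels
print(s.values)   # the data as a NumPy array
print(s.dtype)    # data type of the elements
print(s.shape)    # shape as a tuple
print(s.ndim)     # number of dimensions (always 1 for a Series)
print(s.size)     # number of elements
print(s.hasnans)  # True if any element is NaN
print(s.empty)    # True if the Series has no elements
```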


Accessing Series Object

To slice a Series object, use the following syntax-
<objectName>[<start>:<stop>:<step>]
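Access by label and slicing by position can be sketched as follows, assuming a Series with string labels. The labels and values are illustrative.

```python
import pandas as pd

s = pd.Series([11, 22, 33, 44, 55], index=["a", "b", "c", "d", "e"])

first = s["a"]         # access a single element by its label
part = s[1:4]          # slice by position: start included, stop excluded
every_other = s[::2]   # a step of 2 picks alternate elements

print(first)
print(part)
print(every_other)
```

For position-based access, s.iloc[1:4] is an equivalent and more explicit spelling.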
Operations on Series Object
Elements modification-
<series object>[<index>] = <new_data_value>

It is also possible to change the indexes-
<series object>.index = <new_index_array>
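Both operations, changing a value and replacing the index, can be sketched with made-up data:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

s["b"] = 25                 # modify the element labelled "b"
s.index = ["x", "y", "z"]   # replace all labels; the length must match

print(s)
```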

head() and tail () Function
1. The head(<n>) function fetches the first n rows from a pandas object. If you do not provide a value for n, it returns the first 5 rows.
2. The tail(<n>) function fetches the last n rows from a pandas object. If you do not provide a value for n, it returns the last 5 rows.
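A short sketch of both functions on a ten-element Series. The data is arbitrary.

```python
import pandas as pd

s = pd.Series(range(1, 11))   # values 1 to 10

first_five = s.head()     # no argument: first 5 rows
first_three = s.head(3)   # first 3 rows
last_five = s.tail()      # no argument: last 5 rows
last_two = s.tail(2)      # last 2 rows

print(first_three)
print(last_two)
```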

Series Objects – Vector Operations
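As a hedged illustration of vector operations, an operator applied to a Series acts on every element at once. The values are invented.

```python
import pandas as pd

s = pd.Series([5, 10, 15], index=["a", "b", "c"])

plus_two = s + 2      # adds 2 to every element
squared = s ** 2      # squares every element
above_seven = s > 7   # compares every element, giving booleans

print(plus_two)
print(squared)
print(above_seven)
```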

Series Objects – Arithmetic Operations
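Arithmetic between two Series objects can be sketched as below, assuming both share the same index labels. Values with the same label are combined.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["x", "y", "z"])

total = a + b     # values with the same label are added
product = a * b   # values with the same label are multiplied

print(total)
print(product)
```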

Entries Filtering
<seriesObject>[<boolean expression on the series>]
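Filtering can be sketched in two steps: the boolean expression produces a True/False Series, and indexing with it keeps only the True entries. The data is illustrative.

```python
import pandas as pd

s = pd.Series([12, 45, 7, 30], index=["a", "b", "c", "d"])

mask = s > 10          # boolean Series: True where the condition holds
filtered = s[s > 10]   # keeps only the entries where the mask is True

print(mask)
print(filtered)
```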

Other feature

Difference between NumPy array and Series objects
1. For ndarrays, a vector operation is possible only when the arrays have compatible shapes. For Series objects, the operands are aligned on matching index labels, and NaN is returned for labels that do not match.

2. In an ndarray, the index is always numeric and always starts from 0. In a Series, the index can be of any type, including numbers, and need not start from 0.
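Both differences can be seen in a small sketch. The labels and values are invented, and the shape mismatch is caught so that the example runs to completion.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Series align on labels: "b" and "c" match, "a" and "d" become NaN.
aligned = s1 + s2
print(aligned)

# ndarrays of different lengths cannot be added; NumPy raises ValueError.
try:
    np.array([1, 2, 3]) + np.array([1, 2])
    shapes_ok = True
except ValueError:
    shapes_ok = False
print(shapes_ok)
```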
