Series is one of the core data structures in Pandas. It is like a cross between a Python list and a dictionary: the items are stored in order, and each item has a label with which you can retrieve it. An easy way to visualize a Series is as two columns of data.
The first column holds the index, the label assigned to every item stored in the Series, a lot like the keys of a dictionary. The second column holds the actual data. It's important to note that the data column can carry a label of its own, which is available through the .name attribute. Dictionaries have nothing comparable, and the name is useful when it comes to merging multiple columns of data.
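To make the two-column picture concrete, here is a minimal sketch. The scores, the student names used as labels, and the name "score" are made-up values for illustration only.

```python
import pandas as pd

# Hypothetical data: three scores labelled by student name.
scores = pd.Series([82, 91, 77], index=["Ash", "May", "Brock"], name="score")

print(scores.index.tolist())   # the "first column": the labels
print(scores.tolist())         # the "second column": the data
print(scores.name)             # the label of the data column itself
print(scores["May"])           # retrieve by label, like a dictionary -> 91
print(scores.iloc[0])          # retrieve by position, like a list -> 82
```

Label-based and position-based access both work on the same object, which is what makes a Series feel like a list and a dictionary at once.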
Why do we require Pandas Series?
Below are some specific use cases for the Pandas Series object.
Storing and analyzing financial data: Pandas Series can be used to store and analyze financial data, such as stock prices, currency exchange rates, and commodity prices.
Working with time series data: Pandas Series is well-suited for working with time series data, such as sensor readings, temperature data, or sales data.
Performing machine learning tasks: Pandas Series can be used to prepare data for machine learning tasks, such as feature engineering and data scaling.
Visualizing data: Pandas Series can be used to create data visualizations, such as line charts, bar charts, and histograms.
Overall, Pandas Series is a powerful and versatile tool for data analysis and manipulation in Python. It is widely used in a variety of industries, including finance, healthcare, and science.
Real-Life Applications Of Pandas Series
Here we list a few real-life applications of Pandas Series in different industries.
A hedge fund manager uses Pandas Series to analyze stock market data and make investment decisions.
A healthcare analyst uses Pandas Series to analyze patient records and identify trends in disease prevalence and treatment outcomes.
A climate scientist uses Pandas Series to analyze temperature data and identify patterns in climate change.
A software engineer uses Pandas Series to prepare data for a machine learning model that predicts customer churn.
Moving ahead, let's look at a few basic operations on Series in Pandas. But before that, let's import the two libraries we are going to use quite often.
import pandas as pd
import numpy as np
Creating A Series
There are various ways to create a Pandas Series. A few of them are listed below.
By Passing A List To The Series Constructor
Let's first see the syntax for this task:
<variable> = pd.Series(<list>)
Let's see the code where we implement this.
# Pass a list to create a Series
sampleList=['Ash','May','Brock']
SeriesStructure1=pd.Series(sampleList)
print(SeriesStructure1)
The output looks as follows:
0      Ash
1      May
2    Brock
dtype: object
As you can see above, when we pass a list to the Series constructor, Pandas assigns every element a label, starting from zero.
Another thing to observe in the output is that Pandas has automatically determined and printed the datatype of the newly created Series. Every element of the list was a string, and Pandas stores Python strings under the general-purpose object dtype (which can hold strings, lists, and any other Python object), so the Series formed from this list has the dtype object.
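The labels and the dtype can also be inspected directly instead of being read off the printed output. The snippet below is a small sketch; the variable name `names` is just for illustration, and the exact dtype reported for strings depends on the pandas version (object on the versions this post describes, while newer releases may report a dedicated string dtype instead).

```python
import pandas as pd

names = pd.Series(["Ash", "May", "Brock"])

print(names.index.tolist())   # default labels generated by pandas: [0, 1, 2]
print(names.dtype)            # dtype inferred for the strings (version dependent)
print(names[1])               # look up the element labelled 1 -> May
```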
Let's see another example for a better understanding. This time, instead of a list of strings, we are going to pass a list of numbers.
# Example 2: this time we pass numbers instead of strings.
sampleList=[90,3,45]
SeriesStructure2=pd.Series(sampleList)
print(SeriesStructure2)
The output is as follows:
0    90
1     3
2    45
dtype: int64
As you can see, this time the dtype of the Series is int64, because Pandas infers a 64-bit integer type for a list of Python integers.
Also note that Pandas is built on top of NumPy, which provides the arrays that hold numeric data like this.
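The link to NumPy is easy to see for yourself. Here is a minimal sketch, reusing the same three numbers, that pulls the underlying array out of a Series and applies a NumPy function to the Series directly. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([90, 3, 45])

arr = numbers.to_numpy()      # the data as a plain NumPy array
print(type(arr))              # a numpy.ndarray
print(arr.dtype)              # an integer dtype

print(numbers.mean())         # 46.0
roots = np.sqrt(numbers)      # NumPy functions accept a Series...
print(type(roots))            # ...and hand back a Series with the same labels
```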
Important
In Python, when we want to indicate a lack of data we generally use the None keyword. Now let's say we pass a list of strings to the Series constructor and one of its elements is None. When that happens, Pandas does not convert None into anything else: it assigns it an index label like any other element and stores the None object as is. This works because the object dtype used for the strings can hold any Python object, so the dtype of the Series stays object. Below we see the code demonstrating it.
sampleList=['Ashutosh','7',None]
SeriesConstruct3=pd.Series(sampleList)
print(SeriesConstruct3)
The output is as follows:
0    Ashutosh
1           7
2        None
dtype: object
On the other hand, when we create a list of numbers, whether integers or floats, and add a None element to it, Pandas converts this None into a special floating-point value called NaN, which stands for Not a Number. Below we see it in action.
sampleList=[4,6,8,None]
SeriesConstruct=pd.Series(sampleList)
print(SeriesConstruct)
The output is as follows:
0    4.0
1    6.0
2    8.0
3    NaN
dtype: float64
One more important thing to note here is that we passed in a list of integers with a None value, yet in the output Series Pandas has set the dtype to float64. This is because of NaN: it is a floating-point value, so Pandas casts the integer elements to float as well, and float64 becomes the common dtype.
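The two cases can be compared side by side with a short sketch. One assumption to flag: dtype=object is passed explicitly for the string list so that the result does not depend on how a particular pandas version infers string data. The variable names are illustrative.

```python
import pandas as pd

# Assumption: dtype=object is forced here so the behaviour is the same
# regardless of how the installed pandas version infers string data.
text = pd.Series(["Ashutosh", "7", None], dtype=object)
nums = pd.Series([4, 6, 8, None])

print(text[2] is None)        # True: the None object itself is kept
print(nums.dtype)             # float64: None became NaN, integers became floats
print(text.isna().tolist())   # [False, False, True]
print(nums.isna().tolist())   # [False, False, False, True]
```

Whichever way the missing value is stored, isna() reports it as missing, so it is usually the safer way to look for gaps in the data.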
Important
NaN is NOT equivalent to None: when we compare the two for equality, the result is False. It can be seen below.
print(np.nan == None)
The output is as follows:
False
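As a further check, NaN is not even equal to itself, so equality tests are the wrong tool for finding missing values. The sketch below shows the comparison next to the helper functions NumPy and Pandas provide for this purpose.

```python
import numpy as np
import pandas as pd

print(np.nan == None)      # False
print(np.nan == np.nan)    # False: NaN is not equal even to itself

# Use the dedicated checks instead of ==
print(np.isnan(np.nan))    # True
print(pd.isna(np.nan))     # True
print(pd.isna(None))       # True: pandas treats None as missing too
```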
Creating A Series Via Dictionary
Until now we have been creating Series from lists. When we create a Pandas Series from a list, the labels (indexes) are generated by Pandas, while the data those labels point to comes from the list.
Another way of creating a Pandas Series is from a dictionary. Here the keys of the dictionary become the labels (indexes) and the values of the dictionary become the data those labels point to.
This lets us choose the labels of the Series elements ourselves, which we did not do when passing a plain list.
Let's see the code below.
sampleDictionary={'A':'Amanda','B':'Cole','C':'Dwayne'}
newConstruct=pd.Series(sampleDictionary)
print(newConstruct)
The output is as follows:
A    Amanda
B      Cole
C    Dwayne
dtype: object
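Because the dictionary keys are now the labels, elements can be looked up by key. The following is a minimal sketch using the same data; the variable name `people` is illustrative.

```python
import pandas as pd

people = pd.Series({"A": "Amanda", "B": "Cole", "C": "Dwayne"})

print(people["B"])              # Cole
print("C" in people)            # True: membership tests look at the labels
print("Cole" in people)         # False: the values are not searched
print(people.index.tolist())    # ['A', 'B', 'C']
```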
The index Attribute And Creating A Series With The index Parameter
The labels (indexes) of an existing Series can be accessed via the index attribute.
The general syntax for using the index attribute is as follows:
<name of Series>.index
Let's see the index attribute in use:
print(newConstruct.index)
The output is as follows:
Index(['A', 'B', 'C'], dtype='object')
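The index attribute can be written to as well as read. As a small sketch, the labels X, Y and Z below are made up for illustration: assigning a list of the same length relabels the elements while the data stays where it is.

```python
import pandas as pd

people = pd.Series({"A": "Amanda", "B": "Cole", "C": "Dwayne"})
print(people.index.tolist())    # ['A', 'B', 'C']

# Hypothetical new labels; the list must have the same length as the Series.
people.index = ["X", "Y", "Z"]
print(people.index.tolist())    # ['X', 'Y', 'Z']
print(people["Y"])              # Cole
```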
The index can also be set explicitly when creating a Series, by passing a list of labels to the index parameter of the Series constructor. Its implementation is shown below.
s = pd.Series(['Physics', 'Chemistry', 'English'], index=['Alice', 'Jack', 'Molly'])
print(s)
The output of the code is as follows:
Alice      Physics
Jack     Chemistry
Molly      English
dtype: object
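The index parameter can be combined with a dictionary too. In the sketch below, "Sam" is a hypothetical label that is not a key of the dictionary. Pandas keeps only the labels that were asked for, in the order given, and fills any label it cannot find with a missing value.

```python
import pandas as pd

subjects = {"Alice": "Physics", "Jack": "Chemistry", "Molly": "English"}

# "Sam" is a made-up label with no matching key in the dictionary.
s = pd.Series(subjects, index=["Alice", "Molly", "Sam"])

print(s.index.tolist())    # ['Alice', 'Molly', 'Sam']
print(s["Alice"])          # Physics
print(s.isna().tolist())   # [False, False, True]: "Sam" has no data
print("Jack" in s)         # False: "Jack" was not in the index list
```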
Well, that brings us to the end of this blog. I have tried my best to introduce you to the Pandas Series. We have discussed what a Series is, why we need the Pandas Series object, and where it is used. Most importantly, we have seen how to create a Pandas Series.
This blog is Part 1 of a two-part blog series on the Pandas Series object.
If you liked the content and it really helped you out, do consider subscribing to my blog.
You can also connect with me on LinkedIn and Twitter, should you wish to do so.