August 2, 2024

Pandas Library: most important aspects

Pandas is an open source Python library specialized in the management and analysis of data structures. Development began in 2008, and by the end of 2009 it had been released as open source, allowing programmers from all over the world to contribute improvements to the library.

What does the Pandas library contain?

The Pandas library offers a number of useful tools for reading and writing data between in-memory data structures and different formats, including CSV and text files, Excel (xlsx), SQL databases, and HDF5. It also has functions for data alignment and for reshaping and pivoting data sets of all types, among other things.

Let’s look at some of the features of Pandas through an exercise:

import pandas as pd
example1 = pd.read_csv("./data/ex2data1.txt", sep=",", header=None, names=["x1", "x2", "label"])
example1.head()
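If the ex2data1.txt file is not at hand, the same call can be tried against an in-memory string. This is only a minimal sketch: the three rows are made-up values rather than the real dataset, and the names sample_text and sample are chosen here for illustration.

```python
import io

import pandas as pd

# Made-up rows that mimic a headerless, comma-separated file
# (assumption: two numeric features followed by an integer label).
sample_text = "1.5,2.0,0\n3.2,4.1,1\n5.0,6.3,1\n"

# header=None tells pandas there is no header row in the data;
# names supplies the column labels instead.
sample = pd.read_csv(
    io.StringIO(sample_text),
    sep=",",
    header=None,
    names=["x1", "x2", "label"],
)

print(sample.head())
```

Swapping io.StringIO(sample_text) for a file path gives the call used in the exercise.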

Describe

example1.describe()

What describe does is return summary statistics for the variables that support this kind of information, that is, the numerical variables.

In this case, all the variables are numerical, so calling describe on this example returns information for all 3 columns.

If any of those columns were non-numeric, they would not be in the list.

So what does describe tell us? We get the count, that is, the number of values; the mean; the standard deviation, or std; the minimum value, min, and the maximum, max; and the quartiles (25%, 50%, 75%).

While this information is useful, it is not used very often. However, it is worth keeping in mind in case it comes up in future exercises or you have the opportunity to use it.
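To see how describe treats numeric and non-numeric columns, here is a small sketch on a hypothetical dataframe (the column names and values are invented for this illustration):

```python
import pandas as pd

# Hypothetical data: two numeric columns and one text column.
df = pd.DataFrame(
    {
        "x1": [1.0, 2.0, 3.0, 4.0],
        "label": [0, 1, 1, 0],
        "city": ["Madrid", "Lima", "Quito", "Bogota"],
    }
)

summary = df.describe()

# By default only the numeric columns appear in the summary.
print(list(summary.columns))

# Each row of the summary is one statistic (count, mean, std, ...).
print(list(summary.index))
```

The text column city is left out of the result, while x1 and label each get their own column of statistics.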

shape

shape is an attribute of a Pandas dataframe that returns its number of rows and columns as a tuple. Because it is an attribute and not a function, it is written without parentheses:

example1.shape

(100, 3)
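Since the result is an ordinary tuple, it can be unpacked directly. A minimal sketch on a hypothetical dataframe:

```python
import pandas as pd

# Hypothetical data with 3 rows and 2 columns.
df = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "x2": [4.0, 5.0, 6.0]})

# shape is read without parentheses and unpacks as (rows, columns).
n_rows, n_cols = df.shape
print(n_rows, n_cols)
```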

Tail

tail is the opposite of head in the Pandas library. This means that tail returns the last n rows (5 by default):

example1.tail()
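The number of rows can also be passed explicitly. A short sketch with a hypothetical 10-row dataframe:

```python
import pandas as pd

# Hypothetical data: a single column holding the values 0 to 9.
df = pd.DataFrame({"x1": range(10)})

last_five = df.tail()   # no argument: the last 5 rows
last_two = df.tail(2)   # explicit n: the last 2 rows

print(last_two)
```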

Dtypes

dtypes is an attribute of a Pandas dataframe that returns the data type of each column. This is very useful when we have to analyze unfamiliar data sets, that is, when we don't know what we are dealing with:

example1.dtypes

x1       float64
x2       float64
label      int64
dtype: object
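When exploring an unfamiliar dataset, the column types can also be used to filter columns. The sketch below uses a hypothetical dataframe and select_dtypes, a related Pandas method that this article does not otherwise cover:

```python
import pandas as pd

# Hypothetical data with a float, an integer and a boolean column.
df = pd.DataFrame(
    {"x1": [0.5, 1.5], "label": [0, 1], "passed": [True, False]}
)

# dtypes holds one entry per column.
print(df.dtypes)

# select_dtypes keeps only the columns of the requested kind;
# boolean columns do not count as "number".
numeric_only = df.select_dtypes(include="number")
print(list(numeric_only.columns))
```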

Isnull

isnull lets us detect missing values.

In the Pandas library, isnull returns a dataframe with the same dimensions as the original, but holding a True/False boolean in each cell that indicates whether the value is missing or not.

example1.isnull()

Since this dataframe is unwieldy, we will add .any() after isnull() to check whether any of the columns contains a null value and, if so, show which one.

example1.isnull().any()

x1       False
x2       False
label    False
dtype: bool

The table shows us that there are none.
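The two steps can be seen separately in this small sketch, which uses a hypothetical dataframe with no missing values:

```python
import pandas as pd

# Hypothetical complete data: no value is missing.
df = pd.DataFrame({"x1": [1.0, 2.0], "x2": [3.0, 4.0], "label": [0, 1]})

# Same shape as df, with True wherever a value is missing.
mask = df.isnull()

# Collapses the mask to one boolean per column.
per_column = mask.any()

print(per_column)
```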

To verify that this really works, let's put isnull().any() to the test.

We will delete a value from the dataset and see whether the output of our commands changes. After deleting the value, we save the changes and reload the exercise.

We load the table again and see that x2 now appears as True, because this column has a missing value, which is the one we previously removed from the dataset.

In fact, if we run describe, it tells us that there are now 99 values, not 100.
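The same experiment can be reproduced without editing a file. In this sketch the rows are made up, and the second row has an empty x2 field to stand in for the deleted value:

```python
import io

import pandas as pd

# Made-up rows; the empty field in the second row is loaded as NaN.
text = "1.0,2.0,0\n3.0,,1\n5.0,6.0,1\n"
df = pd.read_csv(io.StringIO(text), header=None, names=["x1", "x2", "label"])

# x2 is reported as True because it contains a missing value.
print(df.isnull().any())

# The count row of describe skips missing values, so x2 has one fewer.
print(df.describe().loc["count"])
```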
