pandas.DataFrame.value_counts
Return a Series containing counts of unique rows in the DataFrame.
New in version 1.1.0.
subset label or list of labels, optional
Columns to use when counting unique combinations.
normalize bool, default False
Return proportions rather than frequencies.
sort bool, default True
Sort by frequencies.
ascending bool, default False
Sort in ascending order.
dropna bool, default True
Don’t include counts of rows that contain NA values.
New in version 1.3.0.
See also Series.value_counts: equivalent method on Series.
The returned Series will have a MultiIndex with one level per input column. By default, rows that contain any NA values are omitted from the result. By default, the resulting Series will be in descending order so that the first element is the most frequently-occurring row.
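As a rough illustration of the behaviour described above, the sketch below counts unique rows in a small frame. The column names and values are made up for the example.

```python
import pandas as pd

# Small illustrative frame; column names and values are hypothetical.
df = pd.DataFrame(
    {"num_legs": [2, 4, 4, 6], "num_wings": [2, 0, 0, 0]},
    index=["falcon", "dog", "cat", "ant"],
)

# Count unique rows. The result is a Series whose MultiIndex has
# one level per column, sorted by count in descending order.
row_counts = df.value_counts()
print(row_counts)

# Restrict the count to a subset of the columns.
leg_counts = df.value_counts(subset=["num_legs"])
print(leg_counts)
```

Here the row (4, 0) appears twice, so it comes first in the result.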
Python | Pandas Series.value_counts()
Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric python packages. Pandas is one of those packages and makes importing and analyzing data much easier.
A pandas Series is a one-dimensional ndarray with axis labels. The labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index.
8 Python Pandas Value_counts() tricks that make your work more efficient
Before you start any data project, take a step back and look at the dataset. Exploratory Data Analysis (EDA) is as important as any other part of data analysis, because real datasets are messy and a lot can go wrong if you don’t know your data. The Pandas library is equipped with several handy functions for this very purpose, and value_counts is one of them. Pandas value_counts returns an object containing counts of unique values, sorted by count. However, most users overlook that the function can do more than its defaults. In this article, I’ll show you how to get more value from value_counts by changing the default parameters, along with a few additional tricks that will save you time.
What is value_counts() function?
The value_counts() function is used to get a Series containing counts of unique values. The resulting object will be in descending order so that the first element is the most frequently-occurring element. Excludes NA values by default.
Syntax
df['your_column'].value_counts() returns the count of each unique value in the specified column.
Note that this article calls value_counts on a pandas Series, which is why we select the column with a single pair of brackets, df['your_column'], rather than df[['your_column']], which returns a DataFrame. DataFrame.value_counts also exists (since pandas 1.1.0), but it counts unique rows and does not accept the bins parameter.
Parameters
- normalize (bool, default False): If True, the returned object contains the relative frequencies of the unique values.
- sort (bool, default True): Sort by frequencies.
- ascending (bool, default False): Sort in ascending order.
- bins (int, optional): Rather than count values, group them into half-open bins. This is a convenience for pd.cut and only works with numeric data.
- dropna (bool, default True): Don’t include counts of NaN.
Loading a dataset for live demo
Let’s see the basic usage of this method using a dataset. I’ll be using the Coursera Course Dataset from Kaggle for the live demo. I have also published an accompanying notebook on git, in case you want to get my code.
Let’s start by importing the required libraries and the dataset, a fundamental step in every data analysis process, and then review the dataset in a Jupyter notebook.
Loading the dataset
This tells us that we have 891 records in our dataset and that we don’t have any NA values.
1.) value_counts() with default parameters
Now we are ready to use the value_counts function. Let’s begin with the basic application of the function.
Syntax: df['your_column'].value_counts()
We will get counts for the column course_difficulty from our dataframe.
basic use of value_counts function
The value_counts function returns the count of each unique value in the given column, in descending order and without null values. We can quickly see that most courses have Beginner difficulty, followed by Intermediate and Mixed, and then Advanced.
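A minimal sketch of the default call follows. The small dataframe is a hypothetical stand-in for the course dataset, so the counts are made up and only the ordering mirrors the description above.

```python
import pandas as pd

# Hypothetical stand-in for the course dataset (values are made up).
df = pd.DataFrame(
    {
        "course_difficulty": ["Beginner"] * 4
        + ["Intermediate"] * 3
        + ["Mixed"] * 2
        + ["Advanced"]
    }
)

# Default call: counts of each unique value, most frequent first.
counts = df["course_difficulty"].value_counts()
print(counts)
```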
Now that we understand the basic use of the function, it is time to figure out what parameters do.
2.) value_counts() in ascending order
The series returned by value_counts() is in descending order by default. We can reverse the order by setting the ascending parameter to True.
Syntax: df['your_column'].value_counts(ascending=True)
value_counts in ascending order
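As a small sketch of the ascending option, using hypothetical data in place of the course dataset:

```python
import pandas as pd

# Hypothetical stand-in for the course_difficulty column.
difficulty = pd.Series(
    ["Beginner"] * 4 + ["Intermediate"] * 3 + ["Mixed"] * 2 + ["Advanced"]
)

# ascending=True puts the least frequent value first.
counts_asc = difficulty.value_counts(ascending=True)
print(counts_asc)
```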
3.) value_counts() sorted alphabetically
In some cases you need to display your value_counts in alphabetical order. This is easily done by chaining sort_index(ascending=True) after value_counts().
Default value_counts() for column «course_difficulty» sorts values by counts:
normal value_counts()
value_counts() with sort_index(ascending=True) sorts by index (the values of the column that you are running value_counts() on):
Value_counts() sorted alphabetically
If you want to list value_counts() in reverse alphabetical order, change ascending to False: sort_index(ascending=False).
Value_counts() ordered in reverse alphabetical order
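The two index sorts described in this section might look like the following sketch, again on hypothetical data:

```python
import pandas as pd

# Hypothetical stand-in for the course_difficulty column.
difficulty = pd.Series(
    ["Beginner"] * 4 + ["Intermediate"] * 3 + ["Mixed"] * 2 + ["Advanced"]
)
counts = difficulty.value_counts()

# Sort by the index labels instead of by the counts.
alphabetical = counts.sort_index(ascending=True)
reverse_alphabetical = counts.sort_index(ascending=False)

print(alphabetical)
print(reverse_alphabetical)
```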
4.) Pandas value_counts(): sort by value, then alphabetically
Let’s use a slightly different dataframe for this example.
Here we want the output sorted first by the value counts, then alphabetically by the name of the fruit. This can be done by combining value_counts() with sort_index(ascending=False) and sort_values(ascending=False).
Value_counts() sorted by value then alphabetically
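One way to make the tie-breaking explicit is sketched below. Note that this is a different technique from chaining sort_index and sort_values: it moves the counts into a dataframe and sorts on both columns at once. The fruit data and the column names "fruit" and "count" are assumptions made for the example.

```python
import pandas as pd

# Hypothetical fruit data with ties in the counts.
fruits = pd.Series(["banana", "apple", "date", "cherry", "apple", "banana"])

ordered = (
    fruits.value_counts()
    .rename_axis("fruit")          # name the index so it becomes a column
    .reset_index(name="count")     # Series -> two-column dataframe
    # Highest count first; ties broken alphabetically by fruit name.
    .sort_values(["count", "fruit"], ascending=[False, True])
    .reset_index(drop=True)
)
print(ordered)
```

Sorting on both keys in a single call avoids relying on how a chained sort treats equal counts.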
5.) value_counts() percentage counts or relative frequencies of the unique values
Sometimes, getting a percentage count is better than the normal count. By setting normalize=True, the object returned will contain the relative frequencies of the unique values. The normalize parameter is set to False by default.
Syntax: df['your_column'].value_counts(normalize=True)
value_counts as percentages
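A short sketch of normalize=True on hypothetical data. The result holds fractions that sum to 1, and multiplying by 100 is one way to present them as percentages.

```python
import pandas as pd

# Hypothetical stand-in for the course_difficulty column.
difficulty = pd.Series(
    ["Beginner"] * 4 + ["Intermediate"] * 3 + ["Mixed"] * 2 + ["Advanced"]
)

# Relative frequencies instead of raw counts.
shares = difficulty.value_counts(normalize=True)

# Optional: express the fractions as percentages.
percentages = (shares * 100).round(1)

print(shares)
print(percentages)
```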
6.) value_counts() to bin continuous data into discrete intervals
This is one great hack that is commonly under-utilised. value_counts() can bin continuous data into discrete intervals with the help of the bins parameter. This option works only with numerical data and is similar to the pd.cut function. Let’s see how it works by grouping the counts for the course_rating column into 4 bins.
Syntax: df['your_column'].value_counts(bins=number_of_bins)
value_counts default parameters
value_counts binned
Binning makes it easy to understand the idea being conveyed. We can easily see that most courses are rated above 4.5, with just a few outliers below 4.15 (only 7 courses are rated lower than 4.15).
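The binning step could be sketched as follows. The ratings are made-up values rather than the dataset's course_rating column, so the bin edges and counts differ from the figures quoted above.

```python
import pandas as pd

# Hypothetical ratings (not the real course_rating column).
ratings = pd.Series([4.1, 4.5, 4.6, 4.7, 4.8, 4.8, 4.9, 5.0])

# Split the observed range into 4 equal-width intervals and count
# how many ratings fall into each one.
binned = ratings.value_counts(bins=4)
print(binned)
```

The index of the result holds the intervals, and the fullest bin is listed first because the result is still sorted by count.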
7.) value_counts() displaying the NaN values
By default, the count of null values is excluded from the result, but it can be displayed easily by setting the dropna parameter to False. Since our dataset does not have any null values, setting dropna would not make a difference here. It is useful on datasets that do have null values, so keep it in mind.
Syntax: df['your_column'].value_counts(dropna=False)
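Because the course dataset has no missing values, the sketch below uses a small hypothetical Series with one missing entry to show the difference.

```python
import pandas as pd

# Hypothetical column with one missing value.
levels = pd.Series(["Beginner", "Mixed", None, "Beginner"])

without_na = levels.value_counts()             # missing value ignored
with_na = levels.value_counts(dropna=False)    # missing value counted too

print(without_na)
print(with_na)
```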
8.) value_counts() as dataframe
As mentioned at the beginning of the article, value_counts returns a series, not a dataframe. If you want your counts as a dataframe, call .to_frame() after .value_counts().
We can convert the series to a dataframe as follows:
Syntax: df['your_column'].value_counts().to_frame()
normal value_counts & value_counts as df
If you need to name the index column and rename the counts column in the dataframe, you can convert to a dataframe in a slightly different way.
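Both conversions might look like the sketch below. The data is hypothetical, and rename_axis followed by reset_index(name=...) is only one possible way to name both columns explicitly. It is used here because the column name that to_frame() produces can differ between pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for the course_difficulty column.
df = pd.DataFrame(
    {"course_difficulty": ["Beginner"] * 3 + ["Mixed"] * 2 + ["Advanced"]}
)
counts = df["course_difficulty"].value_counts()

# Simple conversion: one column of counts, unique values as the index.
as_frame = counts.to_frame()

# Explicitly named columns: name the index, then turn it into a column
# and call the counts column "count".
tidy = counts.rename_axis("course_difficulty").reset_index(name="count")

print(as_frame)
print(tidy)
```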
9.) Group by and value_counts
This is one of my favourite uses of the value_counts() function and an underutilized one too. Groupby is a very powerful pandas method. You can group by one column and, for each of its values, count the values of another column using value_counts.
Syntax: df.groupby('your_column_1')['your_column_2'].value_counts()
Using groupby and value_counts we can count the number of certificate types for each type of course difficulty.
Group by course difficulty and value counts for course certificate type
The result has a MultiIndex, a pandas feature that allows several levels of index hierarchy. In this case, the course difficulty is level 0 of the index and the certificate type is level 1.
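A minimal sketch of the groupby pattern follows. The column name certificate_type and all values are assumptions made for the example, not the dataset's actual schema.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "course_difficulty": ["Beginner", "Beginner", "Beginner", "Advanced", "Advanced"],
        "certificate_type": ["COURSE", "COURSE", "SPECIALIZATION", "COURSE", "SPECIALIZATION"],
    }
)

# Count certificate types separately within each difficulty level.
per_difficulty = df.groupby("course_difficulty")["certificate_type"].value_counts()
print(per_difficulty)

# Look up one combination through the two index levels.
beginner_courses = per_difficulty.loc[("Beginner", "COURSE")]
print(beginner_courses)
```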
10.) Pandas Value Counts With a Constraint
When working with a dataset, you may need value_counts() to return only those counts that satisfy a constraint.
Syntax: df['your_column'].value_counts().loc[lambda x: x > 1]
This one-liner keeps only the values that occur more than once in the specified column and drops those that occur exactly once.
Let’s demonstrate this by limiting course rating to be greater than 4.
value_counts with a constraint
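The filtering pattern can be sketched on hypothetical ratings as follows. The lambda receives the Series of counts, so the condition applies to how often each value occurs, not to the values themselves.

```python
import pandas as pd

# Hypothetical ratings (not the real course_rating column).
ratings = pd.Series([4.8, 4.8, 4.8, 4.7, 4.7, 4.5])

# Keep only the ratings that occur more than once.
frequent = ratings.value_counts().loc[lambda x: x > 1]
print(frequent)
```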
Hence, we can see that value counts is a handy tool, and we can do some interesting analysis with this single line of code.
Pandas – Count occurrences of value in a column
In this tutorial, we will look at how to count the occurrences of a value in a pandas dataframe column with the help of some examples.
Pandas value_counts() function
You can use the pandas series value_counts() function to count occurrences of each value in a pandas column. The syntax is df['column'].value_counts(), which returns a pandas series containing the counts of unique values.
Let’s look at some examples of using the value_counts() function to get the count of occurrences of values in a column.
First, we will create a sample dataframe that we will be using throughout this tutorial.
We have created a dataframe storing the information on the Olympics performance of the legendary sprinter Usain Bolt.
Count occurrences of each unique value in the Column
Apply the pandas value_counts() function on the desired column. For example, let’s get the value counts in the “Event” column of the dataframe df. This will show the different events and their counts where Usain Bolt won an Olympics medal.
We get a pandas series with each unique value and its respective count in the “Event” column. You can see that Usain Bolt won three medals each in the “100 m” and the “200 m” event and two medals in the “4×100 m” event at the Olympics. Note that all these medals are gold medals.
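The sketch below rebuilds a dataframe from the medal counts stated above (three “100 m”, three “200 m”, two “4×100 m”, all gold). The exact columns of the tutorial's dataframe are not shown, so the layout here is an assumption.

```python
import pandas as pd

# Reconstructed from the counts described in the text; the column
# layout is an assumption.
df = pd.DataFrame(
    {
        "Event": ["100 m", "200 m", "100 m", "200 m", "4×100 m",
                  "100 m", "200 m", "4×100 m"],
        "Medal": ["Gold"] * 8,
    }
)

# Count how often each event appears.
event_counts = df["Event"].value_counts()
print(event_counts)
```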
Count occurrences of values in terms of proportion
At times you may want to know the proportion of each value in the column. For example, what proportion of Usain Bolt’s medals at the Olympics came from the “100 m” event? Pass normalize=True to the value_counts() function.
We now get the counts normalized as proportions.
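A sketch of the normalized counts, using a Series rebuilt from the medal counts given in the text (an assumption about the underlying data):

```python
import pandas as pd

# Event column reconstructed from the counts described in the text.
events = pd.Series(["100 m"] * 3 + ["200 m"] * 3 + ["4×100 m"] * 2, name="Event")

# Proportion of medals per event; the values sum to 1.
event_shares = events.value_counts(normalize=True)
print(event_shares)
```

With eight medals in total, each of the two individual events accounts for 3/8 of them and the relay for 2/8.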
Count occurrences of a specific value in a column
The return value from the pandas value_counts() function is a pandas series from which you can access individual counts. For example, you can index the result with “200 m” to count just the occurrences of “200 m” in the “Event” column.
Here, we get the count of medals won in the “200 m” category by Usain Bolt as 3.
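Looking up a single count might be sketched like this, again on a Series rebuilt from the counts in the text. The boolean-mask alternative is an addition for comparison and is not part of the tutorial.

```python
import pandas as pd

# Event column reconstructed from the counts described in the text.
events = pd.Series(["100 m"] * 3 + ["200 m"] * 3 + ["4×100 m"] * 2, name="Event")

# Index the value_counts result by label to get one count.
count_200m = events.value_counts()["200 m"]

# Alternative: compare against the value and sum the resulting booleans.
count_200m_mask = (events == "200 m").sum()

print(count_200m, count_200m_mask)
```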
For more on the pandas value_counts() function, refer to its documentation.
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a Python (version 3.8.3) kernel and pandas version 1.0.5.