8 Things to Know to Master Value Sorting in Pandas - Deepstash

Curated from: towardsdatascience.com


1. Sort by a Single Column

In this article, we’ll be using the flights dataset, which records monthly passenger numbers from 1949 to 1960. For the purpose of this tutorial, we’ll work with a random subset of its rows.

When we want to sort the data by a single column, we pass the column name as the first argument of sort_values (the by parameter). As a side note, I use head a lot, just to show the top rows without wasting space.
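A minimal sketch of a single-column sort. The small frame below is a hand-made stand-in shaped like a random subset of the flights data (the values and index labels are illustrative), and the variable names are assumptions for this example.

```python
import pandas as pd

# Hand-made stand-in for a random subset of the flights data (illustrative values).
df = pd.DataFrame(
    {
        "year": [1952, 1949, 1960, 1955, 1949, 1958],
        "month": ["Mar", "Nov", "Jul", "Jan", "Feb", "Dec"],
        "passengers": [193, 104, 622, 242, 118, 337],
    },
    index=[38, 10, 138, 72, 1, 119],
)

# The first positional argument is `by`; the call returns a new, sorted DataFrame.
by_passengers = df.sort_values("passengers")

# head() limits the display to the top rows.
print(by_passengers.head(3))
```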


2. Sort Values Inplace

In the previous example, you may have noticed that the sort_values method creates a new DataFrame object rather than modifying the original.

To avoid creating a new DataFrame, you can request that the sorting be done in place by setting inplace=True. Note that in this case sort_values returns None.
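The difference between the two modes can be checked directly. This is a small sketch on an illustrative frame; the variable names are assumptions.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"year": [1952, 1949, 1960], "passengers": [193, 104, 622]})

# Default behavior: a new object is returned and `df` keeps its original order.
sorted_df = df.sort_values("passengers")
print(sorted_df is df)  # False

# inplace=True: `df` itself is reordered and the call returns None.
result = df.sort_values("passengers", inplace=True)
print(result)  # None
```

Because the in-place call returns None, assigning its result back to df would discard the data.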


3. Reset Index After Sorting

In the previous examples, you may have noticed that each row keeps its original index label after sorting, which sometimes puzzles me when I want the sorted DataFrame to have an ordered index. In this case, you can either reset the index after sorting or simply set the ignore_index parameter to True (available in pandas 1.0.0+).
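Both options can be sketched side by side. The frame and its index labels are illustrative, and reset_index(drop=True) is used here so that the old labels are discarded rather than kept as a column.

```python
import pandas as pd

# Illustrative data with non-sequential index labels.
df = pd.DataFrame(
    {"year": [1952, 1949, 1960], "passengers": [193, 104, 622]},
    index=[38, 10, 138],
)

# Plain sort: the original labels travel with their rows.
kept = df.sort_values("passengers")

# Option 1: sort, then reset the index and drop the old labels.
reset = df.sort_values("passengers").reset_index(drop=True)

# Option 2: let sort_values relabel the rows 0..n-1 itself.
ignored = df.sort_values("passengers", ignore_index=True)

print(kept.index.tolist())
print(ignored.index.tolist())
```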


4. Sort by Multiple Columns

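We don’t always sort by just one column. In many cases, we need to sort the data frame by multiple columns. This is also simple with sort_values, because the by parameter accepts either a single column name or a list of column names, with no special syntax. The columns are applied in the order listed: rows are sorted by the first column, and ties are broken by the next.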
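A short sketch of sorting by two columns, using an illustrative frame. Rows are ordered by year first and then by passengers within each year.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame(
    {
        "year": [1950, 1949, 1950, 1949],
        "month": ["Feb", "Mar", "Jan", "Jan"],
        "passengers": [126, 132, 115, 112],
    }
)

# Pass a list to `by`: sort by year, then by passengers within each year.
by_year_then_passengers = df.sort_values(["year", "passengers"])
print(by_year_then_passengers)
```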


5. Sort in Descending Order

So far, every sort has used ascending order, which is the default behavior. However, we often want the data sorted in descending order. To do that, we set the ascending parameter to False.

What should we do if we sort by multiple columns and have different ordering requirements for these columns? In this case, we can pass a list of Boolean values, one for each column in by.
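Both uses of ascending can be sketched on an illustrative frame: a single Boolean for one column, and a list of Booleans that lines up with the list passed to by.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame(
    {
        "year": [1950, 1949, 1950, 1949],
        "month": ["Feb", "Mar", "Jan", "Jan"],
        "passengers": [126, 132, 115, 112],
    }
)

# Single column, largest values first.
desc = df.sort_values("passengers", ascending=False)

# Year ascending, passengers descending within each year.
mixed = df.sort_values(["year", "passengers"], ascending=[True, False])

print(desc)
print(mixed)
```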


6. Sort by Custom Functions

When we sort by month, the result isn’t what we expect: the month names are sorted as strings, so they are not in calendar order. To fix this, we can use the key parameter of the sort_values method, to which we pass a custom function for sorting, much like the key of Python’s built-in sorted function. Two points are worth noting:

  • The key parameter takes a callable, and we use a custom function here. This parameter is only available in pandas 1.1.0+.
  • Unlike the key parameter of sorted(), which is called once per element, the key function in sort_values receives an entire column as a Series and must return a result of the same length. It is applied independently to each of the sorting columns.
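One possible key function is sketched below, assuming three-letter month abbreviations. The names MONTHS, month_rank, month_key, and month_aware_key are hypothetical helpers for this example. Since the function is called once per sorting column, the multi-column variant checks the column name and leaves other columns unchanged.

```python
import pandas as pd

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
month_rank = {name: position for position, name in enumerate(MONTHS)}

# Illustrative data only.
df = pd.DataFrame(
    {
        "year": [1950, 1949, 1949, 1950],
        "month": ["Feb", "Mar", "Jan", "Dec"],
        "passengers": [126, 132, 112, 140],
    }
)

# Plain string sort: alphabetical, not calendar order.
alphabetical = df.sort_values("month")


def month_key(column: pd.Series) -> pd.Series:
    # Receives the whole column; returns each month's calendar position.
    return column.map(month_rank)


calendar = df.sort_values("month", key=month_key)


def month_aware_key(column: pd.Series) -> pd.Series:
    # Called once per sorting column; only the month column is transformed.
    return column.map(month_rank) if column.name == "month" else column


by_year_and_month = df.sort_values(["year", "month"], key=month_aware_key)

print(alphabetical["month"].tolist())
print(calendar["month"].tolist())
```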


7. Sort Lexicographically Unordered Columns After Casting to Categorical

Sorting with the key parameter can be confusing to some people. Is there a cleaner way? Pandas is arguably the most versatile library for data processing, so you can expect something neat for this relatively common problem: converting these lexicographically unordered columns to categorical data.

  • We define a CategoricalDtype by specifying the order of the months.
  • We cast the month column to the newly defined categorical type.
  • When we sort by month, pandas uses the order of the months in the category definition.
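The three steps above can be sketched as follows. The frame is illustrative, the month abbreviations are an assumption about how the column is encoded, and month_type is a name chosen for this example.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Illustrative data only.
df = pd.DataFrame(
    {
        "year": [1950, 1949, 1949, 1950],
        "month": ["Feb", "Mar", "Jan", "Dec"],
        "passengers": [126, 132, 112, 140],
    }
)

# Step 1: define an ordered categorical type listing the months in calendar order.
month_type = CategoricalDtype(categories=MONTHS, ordered=True)

# Step 2: cast the month column to that type.
df["month"] = df["month"].astype(month_type)

# Step 3: sorting now follows the category order, not the alphabet.
calendar = df.sort_values("month")
print(calendar["month"].tolist())
```

One thing to watch for with this approach: a value that is not listed in the categories becomes NaN when the column is cast.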


8. Don’t Forget about NaNs

It’s important to remember that your datasets can always contain NaNs. Unless you’ve examined your data quality and know that there are no NaNs, you should pay attention to them. When we sort values, these NaNs are placed after all the valid values by default. If we want to change this default behavior, we set the na_position parameter.

  • We first inject one NaN into the DataFrame object.
  • When we leave na_position at its default ("last"), the NaN value is placed at the end of the sorted rows.
  • When we set na_position to "first", the NaN value appears at the top.
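The steps above can be sketched on an illustrative frame. The passengers column is created as floats here so that a NaN can be assigned without changing its dtype.

```python
import numpy as np
import pandas as pd

# Illustrative data only; floats so the column can hold NaN.
df = pd.DataFrame(
    {
        "year": [1952, 1949, 1960, 1955],
        "passengers": [193.0, 104.0, 622.0, 242.0],
    }
)

# Inject one NaN into the DataFrame.
df.loc[1, "passengers"] = np.nan

# Default (na_position="last"): the NaN row comes after all valid values.
nan_last = df.sort_values("passengers")

# na_position="first": the NaN row appears at the top.
nan_first = df.sort_values("passengers", na_position="first")

print(nan_last)
print(nan_first)
```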


IDEAS CURATED BY

Mausam Adhikari's ideas are part of this journey:

Machine Learning With Google
