Pandas ufuncs Tips and Tricks

Sat, 26 Aug 2017 11:16:25 +0800

Why are pandas ufuncs recommended over apply?

Pandas has an apply function that lets you run an arbitrary function on every value in a column. The catch is that apply calls that Python function once per element, so it is only marginally faster than a plain Python loop. That’s why pandas’ built-in ufuncs are the preferred choice for column preprocessing.
Strictly speaking, “ufunc” is NumPy’s name for its element-wise universal functions; here the term is used loosely for pandas’ vectorized methods, which are built on top of numpy and run mostly in compiled C code rather than in a Python-level loop, which is why they’re so fast. Below we introduce several examples: .diff, .shift, .cumsum, .cumcount (on grouped data), .str commands (for strings), and .dt commands (for dates).
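As a quick illustration of the methods listed above, here is a minimal sketch on a made-up DataFrame. The column names and values are hypothetical and are not taken from the original post; the last lines compare a .str chain with the equivalent apply call to show that both give the same result.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "user": ["a", "a", "b", "a", "b"],
    "amount": [10, 15, 7, 30, 9],
    "city": [" taipei", "Tokyo ", "osaka", "Seoul", " busan "],
    "when": pd.to_datetime(
        ["2017-08-01", "2017-08-03", "2017-08-03", "2017-08-10", "2017-08-26"]
    ),
})

df["change"] = df["amount"].diff()                     # difference from the previous row (first row is NaN)
df["previous"] = df["amount"].shift(1)                 # previous row's value (first row is NaN)
df["running_total"] = df["amount"].cumsum()            # cumulative sum down the column
df["nth_for_user"] = df.groupby("user").cumcount()     # 0-based counter within each user
df["city_clean"] = df["city"].str.strip().str.title()  # string methods via .str
df["weekday"] = df["when"].dt.dayofweek                # date parts via .dt (Monday is 0)

# The same string cleanup written with apply: one Python call per element.
city_via_apply = df["city"].apply(lambda s: s.strip().title())
same_result = city_via_apply.equals(df["city_clean"])

print(df)
```

Both versions produce identical output; the vectorized form is usually the better habit on large columns, although the actual speed difference depends on the method and on the data.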

Pandas cut and qcut Functions

Sat, 05 Aug 2017 11:46:56 +0800

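When we have continuous numerical values, we can discretize them using cut and qcut. The cut function bins values by numeric intervals, while qcut bins them by quantiles. In other words, when asked for a number of bins, cut produces bins of equal width (the number of values in each bin may differ widely), while qcut produces bins that each hold roughly the same number of values (the bin widths may differ widely).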
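The difference is easiest to see on skewed data. The sketch below uses a made-up list with one outlier (the values are an assumption, chosen only for illustration) and counts how many values land in each of four bins.

```python
import numpy as np
import pandas as pd

# Hypothetical skewed data: seven small values and one outlier.
values = [1, 2, 3, 4, 5, 6, 7, 100]

# labels=False returns the integer bin index of each value instead of intervals.
width_codes = pd.cut(values, bins=4, labels=False)    # 4 bins of equal width
quantile_codes = pd.qcut(values, q=4, labels=False)   # 4 bins split at the quartiles

# Count the members of each bin, keeping empty bins in the result.
width_counts = np.bincount(width_codes, minlength=4)
quantile_counts = np.bincount(quantile_codes, minlength=4)

print(width_counts)     # [7 0 0 1]
print(quantile_counts)  # [2 2 2 2]
```

With cut, the outlier stretches the range, so almost everything falls into the first bin and two bins stay empty; with qcut, the bin edges move so that every bin holds two values.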

The cut function

Suppose we have the ages of a group of people:
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32, 101]
If we want to discretize this list into “18 to 25”, “25 to 35”, “35 to 60”, and “60 and above”, we can use the cut function:
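The excerpt stops before showing the call, so what follows is a minimal sketch rather than the original post's code. It assumes an open-ended upper edge of float("inf") for “60 and above” (so that 101 is included) and uses hypothetical label strings. By default pd.cut uses right-closed intervals, so an age of exactly 25 falls into the “18 to 25” group.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32, 101]

# Bin edges; the infinite upper edge is an assumption standing in for "60 and above".
bins = [18, 25, 35, 60, float("inf")]
# Hypothetical label names, one per interval: (18, 25], (25, 35], (35, 60], (60, inf].
labels = ["18 to 25", "25 to 35", "35 to 60", "60 and above"]

groups = pd.cut(ages, bins=bins, labels=labels)

# Count how many people fall into each group, in bin order.
counts = pd.Series(groups).value_counts().reindex(labels)
print(counts)
```

Passing right=False to pd.cut would make the intervals left-closed instead, which moves the age 25 into the “25 to 35” group.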

