top of page
  • Writer's pictureAlibek Jakupov

Python pandas: flatten headers

Updated: Nov 19, 2021


In this short blog post we are going to see how to flatten your pandas dataframe after aggregation operation.



Why flatten your columns?


Imagine working with your dataframe as you usually do on SQL Server: you apply different operations, like join, aggregate, select etc. However after running an aggregation function on your pandas dataframe, you have multilevel column headers which are difficult to manipulate. After looking through StackOverflow questions, I've finally created a simple and flexible function which allows me to convert my aggregated dataset back into the initial format. So if this code snippet saves you a couple of hours I will be completely satisfied. Up we go!



Why this happens?


Let's have a look at this example. Here we have a dataframe, which we want to aggregate.

pe_odds[ [ 'EVENT_ID', 'SELECTION_ID', 'ODDS' ] ]

-------------------------------------------------------------------

Out[67]: 
     EVENT_ID  SELECTION_ID   ODDS
0   100429300       5297529  18.001   100429300       5297529  20.002   100429300       5297529  21.003   100429300       5297529  22.004   100429300       5297529  23.005   100429300       5297529  24.006   100429300       5297529  25.00

And then when you group by Event Id and Selection Id, here's what you've got, a frame with mulit-index columns

pe_odds.groupby( [ 'EVENT_ID', 'SELECTION_ID' ] )[ 'ODDS' ].agg( [ np.min, np.max ] )

-------------------------------------------------------------------

Out[68]: 
                         amin   amax
EVENT_ID  SELECTION_ID              
100428417 5490293        1.71   1.715881623        1.14   1.355922296        2.00   2.005956692        2.00   2.02100428419 603721         2.44   2.904387436        4.30   6.204398859        1.23   1.354574687        1.35   1.464881396       14.50  19.006032606        2.94   4.206065580        2.70   5.806065582        2.42   3.65100428421 5911426        2.22   2.52

Pretty ugly, right?


Quote from Pandas:

Hierarchical / Multi-level indexing is very exciting as it opens the door to some quite sophisticated data analysis and manipulation, especially for working with higher dimensional data. In essence, it enables you to store and manipulate data with an arbitrary number of dimensions in lower dimensional data structures like Series (1d) and DataFrame (2d).

Undoubtedly, it is exciting, but then it gets relatively complicated to manipalte my dataframe, whereas I simply want to manipulate it as I used to do via good old SQL commands (this is definitely due to the lack of my knowledge, I admit, but usually the goal of our work is to get the things done, with the least effort).


All the suggestions on StackOverflow was either irrelevant, or required too much modifications to my initial dataframe, or were not sufficiently flexible. If we make a summary of what have been suggested on the Internet, the workarounds are:

  1. Set the columns to the top level (using DataFrame.get_level_values(0))

  2. Use to_flat_index() which has been released starting from pandas 0.24.0 (still there may be some issues with column naming)

  3. Converting columns to list and then joining them all together

And in my case, it was even worse as I had two aggregation operations, mean and std. So I created simple function, to get rig of this noisy Multi-index columns (again, they are great, it's just that they were useless in my project). Here you are:

def flatten_columns(columns):
 return [first if second == '' else second for first, second in columns.values]

And inside the main body

# e.g construct a correlation between columns
correlation_dataframe = generate_correlation_dataframe(unpivoted_dataframe, show_status)

# then aggregate using mean and standard deviation
aggregated_dataframe = correlation_dataframe.groupby(['Year', 'Sequence', 'Variable'],as_index=False).agg({'Correlation':['mean', 'std']})

# and flatten
aggregated_dataframe.columns = flatten_columns(aggregated_dataframe.columns)

It worked out for me, but if this does not work for you, I would be very grateful if you provided your experience of flattening the pandas Multi-index rows here.


 

Again, hope you will find it useful


May the force be with you!


1,059 views0 comments
bottom of page