The Python Book
 
pandas aggregation groupby
20160606

Dataframe aggregation fun

Load the city dataframe into dataframe df.

Summary statistic of 1 column

df.population.describe()

count    2.261100e+04
mean     1.113210e+05
std      4.337739e+05
min      0.000000e+00
25%      2.189950e+04
50%      3.545000e+04
75%      7.402450e+04
max      1.460851e+07

Summary statistic per group

Load the city dataframe into df, then:

t1=df[['country','population']].groupby(['country'])
t2=t1.agg( ['min','mean','max','count'])
t2.sort_values(by=[ ('population','count') ],ascending=False).head(20)

Output:

        population                               
               min           mean       max count
country                                          
US           15002   62943.294138   8175133  2900
IN           15007  109181.708924  12691836  2398
BR               0  104364.320502  10021295  1195
DE               0   57970.979716   3426354   986
RU           15048  101571.065195  10381222   951
CN           15183  357967.030457  14608512   788
JP           15584  136453.906915   8336599   752
IT             895   49887.442136   2563241   674
GB           15024   81065.611200   7556900   625
FR           15009   44418.920455   2138551   616
ES           15006   65588.432282   3255944   539
MX           15074  153156.632735  12294193   501
PH           15066  100750.534884  10444527   430
TR           15058  142080.305263  11174257   380
ID           17504  170359.848901   8540121   364
PL           15002   64935.379421   1702139   311
PK           15048  160409.378641  11624219   309
NL           15071   53064.727626    777725   257
UA           15012  103468.816000   2514227   250
NG           15087  205090.336207   9000000   232

Note on selecting a multilevel column

Eg. select 'min' via tuple ('population','min').

t2[ t2[('population','min')]>50000 ]

        population                             
               min          mean      max count
country                                        
BB           98511  9.851100e+04    98511     1
CW          125000  1.250000e+05   125000     1
HK          288728  3.107000e+06  7012738     3
MO          520400  5.204000e+05   520400     1
MR           72337  3.668685e+05   661400     2
MV          103693  1.036930e+05   103693     1
SB           56298  5.629800e+04    56298     1
SG         3547809  3.547809e+06  3547809     1
ST           53300  5.330000e+04    53300     1
TL          150000  1.500000e+05   150000     1
 
Notes by Willem Moors. Generated on momo:/home/willem/sync/20151223_datamungingninja/pythonbook at 2019-07-31 19:22