The Python Book
 
dataframe csv
20141203

Add records to a dataframe in a for loop

The easiest way to get csv data into a dataframe is:

pd.read_csv('in.csv')

But how to do it if you need to massage the data a bit, or your input data is not comma separated ?

cols= ['entrydate','sorttag','artist','album','doi','tag' ] 
df=pd.DataFrame( columns= cols )

for ..:
    data = .. a row of data-fields separated by |
              with each field still to be stripped 
              of leading & trailing spaces
    df.loc[len(df)]=map(str.strip,data.split('|'))
pandas csv
20141019

Read/write csv

Read:

pd.read_csv('in.csv')

Write:

<yourdataframe>.to_csv('out.csv',header=False, index=False ) 

Load a csv file

Load the following csv file. Difficulty: the date is spread over 3 fields.

    2014, 8, 5, IBM, BUY, 50,
    2014, 10, 9, IBM, SELL, 20 ,
    2014, 9, 17, PG, BUY, 10,
    2014, 8, 15, PG, SELL, 20 ,

The way I implemented it:

10
11
12
13
14
15
16
17
18
19
20
21
22
# my way
ls_order_col= [ 'year', 'month', 'day', 'symbol', 'buy_sell', 'number','dummy' ]
df_mo=pd.read_csv(s_filename, sep=',', names=ls_order_col, skipinitialspace=True, index_col=False)

# add column of type datetime 
df_mo['date']=pd.to_datetime(df_mo.year*10000+df_mo.month*100+df_mo.day,format='%Y%m%d')

# drop some columns
df_mo.drop(['dummy','year','month','day'], axis=1, inplace=True)

# order by datetime
df_mo.sort(columns='date',inplace=True )
print df_mo

An alternative way,... it's better because the date is converted on reading, and the dataframe is indexed by the date.

 
Notes by Willem Moors. Generated on momo:/home/willem/sync/20151223_datamungingninja/pythonbook at 2019-07-31 19:22