The Python Book

fold split kfold
20160122

# Check the indexes of a k-fold split

Suppose you divide a list of n words into k=5 splits. What are the start and end indexes of each split?

Pseudo-code:

```
for i in 0..k-1:
    start = n*i/k        (integer division)
    end   = n*(i+1)/k    (integer division; end is exclusive)
```
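The pseudo-code can be wrapped in a small helper. This is only a sketch: the name `fold_bounds` is made up for illustration, and it assumes integer (floor) division and an exclusive `end`, so that each pair can be used directly as slice bounds.

```python
def fold_bounds(n, k):
    """Return (start, end) index pairs for k contiguous splits of n items.

    `end` is exclusive, so data[start:end] selects one split.
    """
    # Floor division (//) keeps the indexes as ints.
    return [(n * i // k, n * (i + 1) // k) for i in range(k)]


# 10 items in 3 splits: the last split absorbs the remainder.
print(fold_bounds(10, 3))  # [(0, 3), (3, 6), (6, 10)]
```

Because the `end` of one split equals the `start` of the next, no item is skipped or counted twice.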

## Double check

Double check the index formulas above using a word list in which all words in a split start with the same letter (for easy validation).

```python
#!/usr/bin/env python3

# 35 words, 7 per starting letter, so each of the 5 splits
# should contain words with a single first letter.
data = ['argot', 'along', 'addax', 'azans', 'aboil', 'aband', 'ayelp',
        'erred', 'ester', 'ekkas', 'entry', 'eldin', 'eruvs', 'ephas',
        'imino', 'islet', 'inurn', 'iller', 'idiom', 'izars', 'iring',
        'oches', 'outer', 'odist', 'orbit', 'ofays', 'outed', 'owned',
        'unlaw', 'upjet', 'upend', 'urged', 'urent', 'uncus', 'updry']

n = len(data)
k = 5  # split into 5

for i in range(k):
    # Floor division (//) keeps the slice bounds as ints in Python 3.
    start = n * i // k
    end = n * (i + 1) // k
    fold = data[start:end]
    print("Split {} of {}, length {} : {}".format(i, k, len(fold), fold))
```

Output:

```
Split 0 of 5, length 7 : ['argot', 'along', 'addax', 'azans', 'aboil', 'aband', 'ayelp']
Split 1 of 5, length 7 : ['erred', 'ester', 'ekkas', 'entry', 'eldin', 'eruvs', 'ephas']
Split 2 of 5, length 7 : ['imino', 'islet', 'inurn', 'iller', 'idiom', 'izars', 'iring']
Split 3 of 5, length 7 : ['oches', 'outer', 'odist', 'orbit', 'ofays', 'outed', 'owned']
Split 4 of 5, length 7 : ['unlaw', 'upjet', 'upend', 'urged', 'urent', 'uncus', 'updry']
```
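The word list above has 35 entries, which divides evenly by 5, so it does not exercise the case where n is not a multiple of k. As a further check, the sketch below computes only the split sizes for a few values of n and asserts that they add up to n and differ by at most one. The helper name `fold_sizes` and the chosen values of n are assumptions made for illustration.

```python
def fold_sizes(n, k):
    """Sizes of the k splits given by start = n*i//k, end = n*(i+1)//k."""
    return [n * (i + 1) // k - n * i // k for i in range(k)]


for n in (35, 37, 3):
    sizes = fold_sizes(n, 5)
    # Every item lands in exactly one split.
    assert sum(sizes) == n
    # Splits are balanced: sizes differ by at most one.
    assert max(sizes) - min(sizes) <= 1
    print(n, sizes)
```

Note that when n is smaller than k (the n=3 case), some splits are empty.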

Notes by Willem Moors. Generated on momo:/home/willem/sync/20151223_datamungingninja/pythonbook at 2019-07-31 19:22