The Python Book
 
fold split kfold
20160122

Check the indexes of a k-fold split

Suppose you split a list of n words into k=5 folds. What are the start and end indexes of each fold?

Pseudo-code:

for i in 0..k-1:
    start = floor(n*i/k)
    end   = floor(n*(i+1)/k)    # end is exclusive
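
As a minimal sketch, the pseudo-code can be written as a small Python 3 function. The name `fold_bounds` is an assumption made for illustration and does not appear in the original notes. Floor division (`//`) keeps the indexes integers, and each `end` is exclusive, so it can be used directly in a slice.

```python
def fold_bounds(n, k):
    """Return a list of (start, end) index pairs, one per fold.

    Hypothetical helper: start = n*i//k, end = n*(i+1)//k (end exclusive).
    """
    return [(n * i // k, n * (i + 1) // k) for i in range(k)]


# 35 items in 5 folds: every fold covers 7 consecutive indexes.
print(fold_bounds(35, 5))
# [(0, 7), (7, 14), (14, 21), (21, 28), (28, 35)]
```

Because the end of fold i is computed with the same expression as the start of fold i+1, consecutive folds never overlap and never leave a gap.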

Double check

Double check the index formulas above with a word list in which all words of a fold start with the same letter, so that a wrong boundary is easy to spot.

    #!/usr/bin/env python3

    data = ['argot', 'along', 'addax', 'azans', 'aboil', 'aband', 'ayelp',
            'erred', 'ester', 'ekkas', 'entry', 'eldin', 'eruvs', 'ephas',
            'imino', 'islet', 'inurn', 'iller', 'idiom', 'izars', 'iring',
            'oches', 'outer', 'odist', 'orbit', 'ofays', 'outed', 'owned',
            'unlaw', 'upjet', 'upend', 'urged', 'urent', 'uncus', 'updry']

    n = len(data)
    k = 5         # split into 5 folds

    for i in range(k):
        start = n * i // k        # floor division keeps the index an integer
        end = n * (i + 1) // k    # exclusive end index
        fold = data[start:end]
        print("Split {} of {}, length {} : {}".format(i, k, len(fold), fold))

Output:

Split 0 of 5, length 7 : ['argot', 'along', 'addax', 'azans', 'aboil', 'aband', 'ayelp']
Split 1 of 5, length 7 : ['erred', 'ester', 'ekkas', 'entry', 'eldin', 'eruvs', 'ephas']
Split 2 of 5, length 7 : ['imino', 'islet', 'inurn', 'iller', 'idiom', 'izars', 'iring']
Split 3 of 5, length 7 : ['oches', 'outer', 'odist', 'orbit', 'ofays', 'outed', 'owned']
Split 4 of 5, length 7 : ['unlaw', 'upjet', 'upend', 'urged', 'urent', 'uncus', 'updry']
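
All folds above have length 7 because 35 is divisible by 5. As a further check, the sketch below applies the same formulas to a length that does not divide evenly. The helper name `fold_sizes` and the choice of n=37 are assumptions made for illustration, not part of the original notes.

```python
def fold_sizes(n, k):
    """Sizes of the k folds given by start = n*i//k, end = n*(i+1)//k.

    Hypothetical helper used only to inspect the uneven case.
    """
    return [n * (i + 1) // k - n * i // k for i in range(k)]


sizes = fold_sizes(37, 5)
print(sizes)                    # [7, 7, 8, 7, 8]
print(sum(sizes))               # 37, so no item is lost or duplicated
print(max(sizes) - min(sizes))  # 1, fold sizes differ by at most one
```

The two extra items are spread over the folds rather than all landing in the last one, which is the main reason to prefer these formulas over a fixed fold length of n//k.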
 
Notes by Willem Moors. Generated on momo:/home/willem/sync/20151223_datamungingninja/pythonbook at 2019-07-31 19:22