Posted on: 14 Aug 2021


NumPy Data Cleaning - Multiple columns to key-value pairs


In the 'practical example' of the Data Processing with NumPy course, we learned how to change categorical data to numeric data.

There we used code to convert the subgrades (A1-H1) to the numeric values 1-36.

In that example, the strategy was applied to a single column. While working on my own exercise, I wanted to write code that loops over several columns at once, builds a dictionary of key-value pairs for each one, and converts each column to numeric values.

This is the code that changes the column at index position 1 of my ndarray "brf_np" from categorical data to numeric values. It follows the course exactly and works perfectly:

keys = list(np.unique(brf_np[:,1]))
values = list(range(1, np.unique(brf_np[:,1]).shape[0]+1))
# If we don't add +1 at the end, we would have one too few values
dict_subheader = dict(zip(keys,values))

for i in np.unique(brf_np[:,1]):
    brf_np[:,1] = np.where(brf_np[:,1] == i,


However, I would like to apply the exact same procedure to the columns at index positions 1 and 2 at once, not only to column 1. I wrote a loop, and it executes, but when I try to print the dictionaries they are not defined, although the line dictionaries[j] = dict(zip(keys,values)) in the code below should define them?

dictionaries = ['dict_subheader','dict_line']
columns = [1,2]

for j in range(0,2):
    keys = list(np.unique(brf_np[:,columns[j]]))
    values = list(range(1, np.unique(brf_np[:,columns[j]]).shape[0]+1))
    dictionaries[j] = dict(zip(keys,values))
    for i in np.unique(brf_np[:,columns[j]]):
        brf_np[:,0] = np.where(brf_np[:,columns[j]] == i,
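
One possible reading of the behaviour, offered as a sketch rather than a definitive answer: `dictionaries[j] = ...` replaces the string stored at position `j` of the list with a dictionary. It does not create a variable named `dict_subheader` or `dict_line`, so printing those names fails. Keeping the mappings in a dictionary keyed by name avoids this. In the example below, `demo` is a hypothetical stand-in for "brf_np", and writing the codes back to the same column (the posted loop writes to column 0) is an assumption about the intent.

```python
import numpy as np

# Hypothetical stand-in for brf_np: categorical data in columns 1 and 2.
demo = np.array([["r1", "B2", "x"],
                 ["r2", "A1", "y"],
                 ["r3", "B2", "x"],
                 ["r4", "C3", "z"]])

names = ["dict_subheader", "dict_line"]
columns = [1, 2]
dictionaries = {}  # name -> mapping, instead of a list of name strings

for name, col in zip(names, columns):
    keys = np.unique(demo[:, col]).tolist()
    mapping = dict(zip(keys, range(1, len(keys) + 1)))
    dictionaries[name] = mapping
    # demo is a string array, so the codes are written back as strings.
    demo[:, col] = [str(mapping[k]) for k in demo[:, col].tolist()]

print(dictionaries["dict_subheader"])
print(dictionaries["dict_line"])
```

Each mapping is then available as `dictionaries["dict_subheader"]` and `dictionaries["dict_line"]`. Because the lookup reads the whole column before anything is written back, a code that happens to equal a not-yet-processed category label cannot be remapped by accident.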

Thankful for any advice,
Kind regards,
