Posted on: 14 Aug 2021

Numpy Data Cleaning - Multiple columns to key-value pairs

Hello,

In the 'practical example' of the Data Processing with Numpy course, we learned how to convert categorical data to numeric data.

There, we used code to convert the subgrades (A1-H1) to the numeric values 1-36.

In that example, we applied this strategy to only one column. While working on my own exercise, I wanted to write code that loops over multiple columns at once, creates the appropriate dictionary of key-value pairs for each column, and converts the columns to numeric values.

This is the code that converts the column at index position 1 of my ndarray "brf_np" from categorical data to numeric values. It is exactly as in the course and works perfectly.


import numpy as np

# Map each unique category in column 1 to an integer code starting at 1
keys = list(np.unique(brf_np[:,1]))
values = list(range(1, np.unique(brf_np[:,1]).shape[0]+1))
# If we don't add +1 at the end, we would have one too few values
dict_subheader = dict(zip(keys,values))
dict_subheader

# Replace each category in column 1 with its code
for i in np.unique(brf_np[:,1]):
    brf_np[:,1] = np.where(brf_np[:,1] == i,
                           dict_subheader[i],
                           brf_np[:,1])

brf_np[:,1]



However, I would like to apply the exact same procedure to the columns at index positions 1 and 2 at once, not only to column 1. I tried to write a loop, and the loop executes, but when I try to print the dictionaries they are not defined, although in the code below the line dictionaries[j] = dict(zip(keys,values)) should define them?


dictionaries = ['dict_subheader','dict_line']
columns = [1,2]

for j in range(0,2):
    keys = list(np.unique(brf_np[:,columns[j]]))
    values = list(range(1, np.unique(brf_np[:,columns[j]]).shape[0]+1))
    dictionaries[j] = dict(zip(keys,values))
 
    for i in np.unique(brf_np[:,columns[j]]):
        # Write the codes back to the same column they were read from
        brf_np[:,columns[j]] = np.where(brf_np[:,columns[j]] == i,
                                        dictionaries[j][i],
                                        brf_np[:,columns[j]])
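
A possible explanation, with a minimal sketch rather than a definitive answer: the list dictionaries only holds the strings 'dict_subheader' and 'dict_line', so dictionaries[j] = dict(...) replaces the string at position j with a dict and never creates a variable with that name. The mappings are still reachable as dictionaries[0] and dictionaries[1]. If access by name is wanted, one option is to store them in a dict keyed by those names. The sketch below assumes brf_np is a 2-D array of strings; the small array here is a hypothetical stand-in for the real data, and the np.where loop is swapped for np.unique(..., return_inverse=True), which returns the position of each element among the sorted unique values.

```python
import numpy as np

# Hypothetical stand-in for brf_np: a string array with two categorical columns.
brf_np = np.array([
    ["r1", "B", "x"],
    ["r2", "A", "y"],
    ["r3", "B", "x"],
    ["r4", "C", "z"],
])

columns = [1, 2]
names = ["dict_subheader", "dict_line"]

# Keep the mappings in a dict keyed by name instead of creating variables.
dictionaries = {}

for name, col in zip(names, columns):
    # keys: sorted unique categories; inverse: index of each row's category in keys
    keys, inverse = np.unique(brf_np[:, col], return_inverse=True)
    # Codes start at 1, as in the single-column version.
    dictionaries[name] = {str(k): code for code, k in enumerate(keys, start=1)}
    # The array holds strings, so the codes are converted to strings explicitly.
    brf_np[:, col] = (inverse + 1).astype(str)

print(dictionaries["dict_subheader"])
print(dictionaries["dict_line"])
print(brf_np[:, columns])
```

Because the array holds strings, the codes end up stored as strings such as '2'; turning the result into a numeric array would be a separate step.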



Thankful for any advice,
Kind regards,
