Use .loc vs. not using .loc?
Let's take a look at the following code:
# First Code
temp = df_purchase_predictors[df_purchase_predictors['Brand'] == 1]
temp.loc[:, 'Revenue Brand 1'] = temp['Price_1'] * temp['Quantity']
temp
versus:
# Second Code
temp1 = df_purchase_predictors[df_purchase_predictors['Brand'] == 1]
temp1['Revenue Brand 1'] = temp1['Price_1'] * temp1['Quantity']
temp1
The only difference is whether .loc is used, and both return exactly the same table.
Is there any difference?
The key difference between the two lines of code lies in how the target column is referenced:
- temp.loc[:, 'Revenue Brand 1']: This explicitly selects all rows (:) and only the 'Revenue Brand 1' column. It is useful when you want to write to a specific column (or a specific set of rows in that column) of an existing DataFrame. If the column does not exist yet, .loc creates it as well, so for a new column it behaves like plain indexing.
- temp1['Revenue Brand 1']: This directly references (or creates) the 'Revenue Brand 1' column in temp1. If it doesn't exist, it will be created.
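A minimal, self-contained sketch of the two snippets from the question, assuming a small made-up DataFrame in place of df_purchase_predictors (the values are hypothetical). The .copy() after the boolean filter is an addition not present in the original snippets; it makes each filtered result an independent DataFrame, so adding a column to it cannot affect the original frame.

```python
import pandas as pd

# Hypothetical stand-in for df_purchase_predictors (made-up values).
df_purchase_predictors = pd.DataFrame({
    "Brand": [1, 2, 1, 3],
    "Price_1": [1.5, 1.6, 1.4, 1.5],
    "Quantity": [2, 1, 3, 4],
})

# First code: .loc with all rows (:) and one column label.
# .copy() makes the filtered frame independent of df_purchase_predictors.
temp = df_purchase_predictors[df_purchase_predictors["Brand"] == 1].copy()
temp.loc[:, "Revenue Brand 1"] = temp["Price_1"] * temp["Quantity"]

# Second code: plain [] column assignment.
temp1 = df_purchase_predictors[df_purchase_predictors["Brand"] == 1].copy()
temp1["Revenue Brand 1"] = temp1["Price_1"] * temp1["Quantity"]

# Both forms create the same new column with the same values and dtype.
print(temp.equals(temp1))  # True
```

For a brand-new column on a whole DataFrame, the two forms give the same result, which matches what the question observed.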
Both approaches work for creating a new column, but .loc is more flexible in complex scenarios, such as modifying only selected rows of a DataFrame (a slice) in a single step.
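As a rough illustration of that flexibility, assuming the same hypothetical columns as in the question, a single .loc call can combine a boolean row mask with a column label and write directly into the original DataFrame:

```python
import pandas as pd

# Hypothetical data; column names mirror the question.
df = pd.DataFrame({
    "Brand": [1, 2, 1, 3],
    "Price_1": [1.5, 1.6, 1.4, 1.5],
    "Quantity": [2, 1, 3, 4],
})

# One .loc call selects rows (boolean mask) and a column together.
# Only the Brand 1 rows of df itself get a value; the new column is NaN elsewhere.
mask = df["Brand"] == 1
df.loc[mask, "Revenue Brand 1"] = df.loc[mask, "Price_1"] * df.loc[mask, "Quantity"]

print(df)
```

The plain-bracket equivalent, df[mask]['Revenue Brand 1'] = ..., is chained assignment: it writes to a temporary copy and leaves df unchanged, which pandas usually flags with a warning.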
Given that this is a new column, option 2 seems better than option 1 to me.