Super learner
This user is a Super Learner. To become a Super Learner, you need to reach Level 8.
Last answered:

29 Dec 2022

Posted on:

13 Dec 2022

Error in implementing the Isolation Forest Method

I tried using the Isolation Forest method on the popular IBM employee attrition dataset and keep getting the error below. Kindly help, please.

Code :>>
from sklearn.ensemble import IsolationForest

features = data.columns

X = data[features]
X_train = X[:1000]
X_test = X[1000:]

#Fit Model
clf = IsolationForest(n_estimators=50, max_samples=100)
clf.fit(X_train)

#Get Scores
data['scores'] = clf.decision_function(X_train)
data['anomaly'] = clf.predict(X)

#Get Anomalies
outliers = data.loc[data['anomaly'] == -1]

outliers

Error:>>

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_13032\4171220808.py in <module>
     18 
     19 #Get Scores
---> 20 data['scores'] = clf.decision_function(X_train)
     21 data['anomaly'] = clf.predict(X)
     22 

~\anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   3653         else:
   3654             # set column
-> 3655             self._set_item(key, value)
   3656 
   3657     def _setitem_slice(self, key: slice, value):

~\anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   3830         ensure homogeneity.
   3831         """
-> 3832         value = self._sanitize_column(value)
   3833 
   3834         if (

~\anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, value)
   4536 
   4537         if is_list_like(value):
-> 4538             com.require_length_match(value, self.index)
   4539         return sanitize_array(value, self.index, copy=True, allow_2d=True)
   4540 

~\anaconda3\lib\site-packages\pandas\core\common.py in require_length_match(data, index)
    555     """
    556     if len(data) != len(index):
--> 557         raise ValueError(
    558             "Length of values "
    559             f"({len(data)}) "

ValueError: Length of values (1000) does not match length of index (1470)
2 answers (0 marked as helpful)
Instructor
Posted on:

28 Dec 2022

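Hey! It looks like the problem is this line:

`data['scores'] = clf.decision_function(X_train)`

It computes scores for `X_train` only (1000 rows) and assigns them to a column of `data`, which holds all 1470 rows. Make sure the values you assign have the same length as `data`, for example by scoring `X` instead of `X_train`.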
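Here is a minimal sketch of one possible fix. It assumes the features are already numeric and uses a synthetic 1470-row DataFrame (with hypothetical column names) as a stand-in for the attrition data, so treat it as an illustration and not the exact code for your notebook. The only real change is scoring `X` instead of `X_train`, so the result has one value per row of `data`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical stand-in for the (already numeric) attrition data: 1470 rows.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(1470, 4)), columns=["a", "b", "c", "d"])

# Freeze the feature list before adding any new columns to data.
features = list(data.columns)
X = data[features]
X_train = X[:1000]

# Fit on the first 1000 rows, as in the question.
clf = IsolationForest(n_estimators=50, max_samples=100, random_state=0)
clf.fit(X_train)

# Score and predict on ALL rows, so each result has len(data) values.
data["scores"] = clf.decision_function(X)
data["anomaly"] = clf.predict(X)

# Rows flagged as anomalies are labelled -1.
outliers = data.loc[data["anomaly"] == -1]
```

If you only want scores for the training rows, another option is to assign them to those rows alone, e.g. `data.loc[X_train.index, "scores"] = clf.decision_function(X_train)`, which leaves the remaining rows as NaN.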

Super learner
Posted on:

29 Dec 2022

Ohh! That makes more sense now. I'll make the corrections and give feedback. Thanks.
