Error in implementing the Isolation Forest Method
I tried using the Isolation Forest method on the popular IBM employee attrition dataset and keep getting this error. Kindly help, please.
Code :>>
from sklearn.ensemble import IsolationForest
features = data.columns
X = data[features]
X_train = X[:1000]
X_test = X[1000:]
#Fit Model
clf = IsolationForest(n_estimators=50, max_samples=100)
clf.fit(X_train)
#Get Scores
data['scores'] = clf.decision_function(X_train)
data['anomaly'] = clf.predict(X)
#Get Anomalies
outliers = data.loc[data['anomaly'] == -1]
outliers
Error:>>
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_13032\4171220808.py in <module>
18
19 #Get Scores
---> 20 data['scores'] = clf.decision_function(X_train)
21 data['anomaly'] = clf.predict(X)
22
~\anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
3653 else:
3654 # set column
-> 3655 self._set_item(key, value)
3656
3657 def _setitem_slice(self, key: slice, value):
~\anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
3830 ensure homogeneity.
3831 """
-> 3832 value = self._sanitize_column(value)
3833
3834 if (
~\anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, value)
4536
4537 if is_list_like(value):
-> 4538 com.require_length_match(value, self.index)
4539 return sanitize_array(value, self.index, copy=True, allow_2d=True)
4540
~\anaconda3\lib\site-packages\pandas\core\common.py in require_length_match(data, index)
555 """
556 if len(data) != len(index):
--> 557 raise ValueError(
558 "Length of values "
559 f"({len(data)}) "
ValueError: Length of values (1000) does not match length of index (1470)
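Hey! It looks like the problem is this line:
`data['scores'] = clf.decision_function(X_train)`
`X_train` holds only the first 1000 rows, but `data` holds all 1470, so pandas can't fit 1000 scores into a 1470-row column. Make sure whatever you score has the same number of rows as `data`, e.g. pass `X` instead of `X_train`.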
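Here is a rough sketch of what the corrected version could look like. I don't have your dataset, so it uses a made-up numeric DataFrame with 1470 rows as a stand-in (the column names `a`, `b`, `c` and the `random_state` values are my own assumptions, not from your code). The only real change is scoring `X` rather than `X_train`:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical stand-in for the attrition data: 1470 rows, 3 numeric columns.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(1470, 3)), columns=["a", "b", "c"])

# Capture the feature names before any result columns are added to data.
features = list(data.columns)
X = data[features]
X_train = X[:1000]

# Fit on the first 1000 rows only, as in the original code.
clf = IsolationForest(n_estimators=50, max_samples=100, random_state=0)
clf.fit(X_train)

# Score and label every row, so each result has the same length as data.
data["scores"] = clf.decision_function(X)
data["anomaly"] = clf.predict(X)

# Rows the model labels as anomalies.
outliers = data.loc[data["anomaly"] == -1]
print(len(data), len(outliers))
```

If you only want scores for the training rows, something like `data.loc[X_train.index, 'scores'] = clf.decision_function(X_train)` should also work and leaves the remaining rows as NaN.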
Ohh! That makes more sense now. I'll make the corrections and report back. Thanks.