How do I perform cross-validation on time-based split data?
What is the effect of splitting data by time in time series analysis?
Since observations in a time series depend on the ones that came before them, we can’t shuffle the data. A shuffled split would let the model train on observations that come later than the ones it is evaluated on. Hence, when we want to split the data into a training set, testing set and validation set, we need to do so in consecutive chunks of time. Thus, if we’re using a 70-20-10 split, we’ll use the chronologically first 70% of the data for the training set, the next 20% for the testing set and the last 10% for the validation set.
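A minimal sketch of such a chronological split, assuming the observations are already sorted by time and held in a plain Python list. The function name chronological_split and its parameters are made up for illustration, not taken from any library.

```python
def chronological_split(series, train_frac=0.7, test_frac=0.2):
    """Split a time-ordered sequence into consecutive train/test/validation chunks.

    Assumes `series` is already sorted from oldest to newest.
    Whatever is left after the train and test chunks becomes the validation chunk.
    """
    n = len(series)
    # round() avoids floating-point surprises such as int(8.999999999999998) == 8
    n_train = round(n * train_frac)
    n_test = round(n * test_frac)
    train = series[:n_train]
    test = series[n_train:n_train + n_test]
    validation = series[n_train + n_test:]
    return train, test, validation


# Toy example: ten time steps, oldest first
series = list(range(10))
train, test, validation = chronological_split(series)
print(train, test, validation)
```

No shuffling happens anywhere, so every observation in the testing chunk is later than every training observation, and every validation observation is later still.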
Then, we can check how well the model that best fits the training set performs on the testing and validation sets.
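As a rough illustration of that last step, the sketch below fits a model on the first 70% only and then scores it on the two later chunks. The straight-line trend model, the synthetic noise-free series, and the helper names fit_line and mse are all assumptions chosen to keep the example small; any model could take their place.

```python
def fit_line(ts, ys):
    """Ordinary least squares fit of y = slope * t + intercept."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    return slope, mean_y - slope * mean_t


def mse(model, ts, ys):
    """Mean squared error of the fitted line on the given points."""
    slope, intercept = model
    return sum((slope * t + intercept - y) ** 2 for t, y in zip(ts, ys)) / len(ts)


# Synthetic series for illustration only: a noise-free linear trend
ts = list(range(100))
ys = [2.0 * t + 1.0 for t in ts]

# 70-20-10 split in chronological order, no shuffling
n_train, n_test = 70, 20
train = slice(0, n_train)
test = slice(n_train, n_train + n_test)
val = slice(n_train + n_test, None)

# Fit on the earliest chunk only, then score on the later chunks
model = fit_line(ts[train], ys[train])
test_mse = mse(model, ts[test], ys[test])
val_mse = mse(model, ts[val], ys[val])
print("test MSE:", test_mse, "validation MSE:", val_mse)
```

On real data the errors on the later chunks will generally be larger than the training error, and the size of that gap shows how well the model carries forward in time.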