ARIMA LLR_test and start_ar_lags

In the course code for the LLR_test function, both models are fitted with start_ar_lags = 11. To avoid spending the time to refit every time I want to run the test, I calculate the LLR p-value from results I have already obtained, which were fitted with the minimum value for start_ar_lags. When I do that, I get different p-values. Does start_ar_lags have to be the same for both fits for the result to be reliable? Or is my approach equally valid?

Course code:

def LLR_test(mod_1, mod_2, DF = 1):
    L1 = mod_1.fit(start_ar_lags = 11).llf
    L2 = mod_2.fit(start_ar_lags = 11).llf
    LR = (2*(L2-L1))    
    p = chi2.sf(LR, DF).round(3)
    return p

Resulting in:

0.018
0.117

My code:

def LLR_test_results(res_1, res_2, DF = 1):
    L1 = res_1.llf
    L2 = res_2.llf
    LR = (2*(L2-L1))
    p = chi2.sf(LR, DF).round(3)
    return p

Resulting in:

print(LLR_test_results(results[1, 1, 3], results[6, 1, 3], DF=5))
print(LLR_test_results(results[5, 1, 1], results[6, 1, 3], DF=3))
0.003
0.018

 

1 Answer

365 Team

Hey Freek, 
 
Recently, we discovered that fitting the same model several times in Python can give inconsistent outputs, so reusing the stored results (as you have done) is the better approach. 
 
"Does the start_ar_lags have to be the same for both fits in order to have a result that is reliable?"
Well, the starting lags essentially set aside a part of the data, which is used to compute the initial residuals. Hence, different values of start_ar_lags produce different residual sequences, and therefore different log-likelihoods and p-values, which would explain why your numbers differ from the ones in the course. However, when you don't have to specify these starting parameters, you use more of the available data, so the results should be more representative. In other words, had I known at the time that fitting a model several times gives inconsistent outputs, I would have used the results variables instead. Thus, I believe your approach is equally (if not more) valid. 
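
To make the results-based approach concrete, here is a minimal sketch of the test computed from two log-likelihood values that are already stored (for example res_1.llf and res_2.llf from earlier fits). The names chi2_sf and llr_p_value are hypothetical and not part of the course code. It assumes the smaller model is nested in the larger one and that DF is the difference in the number of estimated parameters. To keep the sketch free of dependencies it uses a closed-form chi-square survival function for integer DF in place of scipy's chi2.sf; with scipy installed, chi2.sf(LR, DF) as in the code above should give the same value.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1.

    Stand-in for scipy.stats.chi2.sf, restricted to integer df.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    df = int(df)
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y^i / i!
        total = sum(y ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) plus half-integer gamma terms
    tail = sum(
        y ** (i - 0.5) / math.gamma(i + 0.5)
        for i in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def llr_p_value(llf_small, llf_large, df=1):
    """Log-likelihood ratio test from two stored log-likelihoods.

    llf_small: log-likelihood of the simpler (nested) model
    llf_large: log-likelihood of the more complex model
    df: difference in the number of estimated parameters (assumed)
    """
    lr = 2.0 * (llf_large - llf_small)
    return chi2_sf(lr, df)


# Made-up log-likelihoods, purely for illustration (not course results)
print(round(llr_p_value(-100.0, -97.0, df=2), 3))
```

Because the function only reads numbers that were saved after fitting, nothing is refitted when the test is repeated. Both log-likelihoods should still come from fits on the same series for the comparison to be meaningful.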
 
Best, 
365 Vik
 
