Assumption 5 - Independence of observations #3

Open
OskarBienko opened this issue Feb 12, 2023 · 0 comments

Hi @kennethleungty,

In my opinion, plotting residuals vs. index (or time) is usually misleading. I prefer to check the independence assumption with the Ljung-Box test for autocorrelation. I've written a function that does this for me.

from statsmodels.genmod.generalized_linear_model import GLMResultsWrapper
from statsmodels.stats.diagnostic import acorr_ljungbox


def check_independence(model: GLMResultsWrapper, order: int) -> None:
    
    '''
    1. Perform the Ljung-Box test to check if residuals are autocorrelated
    2. Print both the null hypothesis and the p-values
    '''
    
    # When lags is an integer, it is taken as the largest lag included; the test result is reported for every lag from 1 up to that value
    ljungbox_pvalues = acorr_ljungbox(x=model.resid_deviance.values, lags=order)['lb_pvalue']
    # Compare the unrounded p-values with the threshold; round only for display
    boolean_mask = ljungbox_pvalues > 0.05
    
    if not ljungbox_pvalues.empty:
        print(f'The null hypothesis of the Ljung-Box test is that there is no autocorrelation in the residuals at any lag up to {order}.')
        print('p-value = P(test statistic at least as extreme as observed | H0 true)')
        print(f'p-values of Ljung-Box test are: {ljungbox_pvalues.round(2).tolist()}')
        print(f'p-values > 0.05, thus H0 is not rejected for the joint tests up to lags {ljungbox_pvalues[boolean_mask].index.tolist()}')
        print(f'p-values <= 0.05, thus H0 is rejected (the residuals are autocorrelated) for the joint tests up to lags {ljungbox_pvalues[~boolean_mask].index.tolist()}')

In your example there is some autocorrelation - check out the output below.

check_independence(model=logit_results, order=10)
The null hypothesis of the Ljung-Box test is that there is no autocorrelation in the residuals at any lag up to 10.
p-value = P(test statistic at least as extreme as observed | H0 true)
p-values of Ljung-Box test are: [0.36, 0.6, 0.62, 0.71, 0.02, 0.01, 0.02, 0.04, 0.04, 0.07]
p-values > 0.05, thus H0 is not rejected for the joint tests up to lags [1, 2, 3, 4, 10]
p-values <= 0.05, thus H0 is rejected (the residuals are autocorrelated) for the joint tests up to lags [5, 6, 7, 8, 9]