
• # Hypothesis Testing with Python

1 Member · 2 Posts
• ### Datura

Member
January 31, 2021 at 5:46 pm
```python
### 3) Test on means of Sales or other numeric variables by Student t-test and ANOVA.
import pandas as pd
import scipy.stats as stats

### Two-sample t-test
summary = impute.pivot_table('sales',
                             columns=['credit'],
                             aggfunc='mean',
                             margins=False)

tStat, p_value = stats.ttest_ind(impute[impute['credit'] == 0]['sales'],
                                 impute[impute['credit'] == 1]['sales'],
                                 equal_var=False,
                                 nan_policy='omit')  # run independent-sample t-test
print("t-Stat: {0}, p-value: {1}".format(tStat, p_value))
# We cannot reject H0, therefore we conclude the sales are the same
# for good- and bad-credit people.

### Multiple-sample ANOVA test. We look into the deactivated people only.
summary = impute.pivot_table('sales',
                             columns=['reason'],
                             aggfunc='mean',
                             margins=False)

### We need to drop null values on the analysis variable,
### otherwise we will get NaN results.
impute.isnull().sum()
sub = impute[(impute['reason'] != ' ') & (pd.notnull(impute['sales']))]

FStat, p_value = stats.f_oneway(sub[sub['reason'] == 'COMP']['sales'],
                                sub[sub['reason'] == 'DEBT']['sales'],
                                sub[sub['reason'] == 'MOVE']['sales'],
                                sub[sub['reason'] == 'NEED']['sales'],
                                sub[sub['reason'] == 'TECH']['sales'])
print("F-Stat: {0}, DOF={1}, p-value: {2}".format(FStat, len(sub) - 1, p_value))
# Since p-value < 0.05, we reject H0 and conclude that mean sales
# differ significantly among people with different deactivation reasons.

### Note: The ANOVA above tells us that the treatment differences are statistically
### significant, but it does not tell us which treatments differ from each other.
### To find the pairs of significantly different treatments, we perform multiple
### pairwise comparisons (post-hoc analysis) using the Tukey HSD test.

####################  Tukey's multi-comparison method #########################
# This method tests at p < 0.05, correcting for the fact that multiple comparisons
# are being made (which would otherwise inflate the probability of a significant
# difference being identified). A result of 'reject = True' means that a
# significant difference has been observed for that pair.
from statsmodels.stats.multicomp import MultiComparison

### class statsmodels.sandbox.stats.multicomp.MultiComparison(data, groups, group_order=None)
sub.info()

MultiComp = MultiComparison(sub['sales'],                      # analysis variable
                            sub['reason'].replace(' ', ' NA'), # group variable
                            group_order=None)  # desired order for the group mean results
results = MultiComp.tukeyhsd().summary()
results
############################ End of Program. #################################
```
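The steps above can be tried end to end on synthetic data; the column names `sales`, `credit`, and `reason` here are assumptions that mirror the poster's `impute` data frame, and the numbers are made up for illustration:

```python
# Sketch of the t-test -> ANOVA -> Tukey HSD workflow on synthetic data.
# Column names mirror the poster's `impute` frame; values are fabricated.
import numpy as np
import pandas as pd
import scipy.stats as stats
from statsmodels.stats.multicomp import MultiComparison

rng = np.random.default_rng(0)
reasons = ['COMP', 'DEBT', 'MOVE', 'NEED', 'TECH']
impute = pd.DataFrame({
    'sales': rng.normal(100, 15, 500),
    'credit': rng.integers(0, 2, 500),
    'reason': rng.choice(reasons, 500),
})
impute.loc[impute['reason'] == 'DEBT', 'sales'] += 20  # make one group differ

# Two-sample Welch t-test: sales by credit class
tStat, p_value = stats.ttest_ind(impute.loc[impute['credit'] == 0, 'sales'],
                                 impute.loc[impute['credit'] == 1, 'sales'],
                                 equal_var=False, nan_policy='omit')

# One-way ANOVA across deactivation reasons
groups = [impute.loc[impute['reason'] == r, 'sales'] for r in reasons]
FStat, p_anova = stats.f_oneway(*groups)

# Tukey HSD post-hoc test: which pairs of reasons differ
tukey = MultiComparison(impute['sales'], impute['reason']).tukeyhsd()
print(tukey.summary())
```

Because the `DEBT` group was shifted upward, the ANOVA comes out significant and the Tukey table flags the pairs involving `DEBT` with `reject = True`.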
• ### Datura

Member
January 31, 2021 at 5:36 pm

The code below gives the general method for doing a t-test, ANOVA, and Chi-Square test with Python. Please feel free to ask questions. Note that `impute` is the input data frame.

```python
################### Part IV: Statistical Analyses. ###########################
### 1) Test the association between account status and different factors
###    by Chi-Square test.
import pandas as pd
import scipy
from scipy.stats import chi2
from scipy.stats import chi2_contingency  # import SciPy's built-in function

impute.head()
impute.info()

############## For reporting, the summary includes grand totals. ##############
### However, for the Chi-Square test, the contingency table excludes grand totals.
summary = impute.pivot_table('acctno',            # analysis variable
                             index=['credit'],    # rows
                             columns=['active'],  # columns
                             aggfunc='nunique',
                             margins=True)
summary

df = impute.pivot_table('acctno',            # analysis variable
                        index=['credit'],    # rows
                        columns=['active'],  # columns
                        aggfunc='nunique',
                        margins=False)

######### Use the chi2_contingency() function to do the Chi-Square test. #########
### Degrees of freedom are calculated with the formula DOF = (r-1)(c-1),
### where r and c are the numbers of levels of the treatment and outcome variables.
chi2, p_value, DOF, expected = chi2_contingency(df, correction=False)  # correction=False: no Yates' correction
results = pd.DataFrame({'Chi-Square': round(chi2, 2),
                        'Degrees of Freedom': DOF,
                        'p_value': p_value}, index=['Chi-Square Test Output'])
### p-value < 0.05, so we reject H0 and conclude that there is an association
### between credit and account status.

########  Do the same thing on DealerType/RatePlan vs. Status, etc. ########
### 2) Test the association between account status and tenure segment.
summary = impute.pivot_table('acctno',
                             index=['tenure_bin'],
                             columns=['active'],
                             aggfunc='nunique',
                             margins=True)

df = impute.pivot_table('acctno',
                        index=['tenure_bin'],
                        columns=['active'],
                        aggfunc='nunique',
                        margins=False)
chi2, p_value, DOF, expected = chi2_contingency(df, correction=False)
results = pd.DataFrame({'Chi-Square': round(chi2, 2),
                        'Degrees of Freedom': DOF,
                        'p_value': p_value}, index=['Chi-Square Test Output'])
### p-value < 0.05, so we reject H0 and conclude that there is an association
### between tenure and account status.
```
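The Chi-Square step can be tried on a small hand-built contingency table; the `good_credit`/`bad_credit` and `active`/`inactive` labels here are assumptions mirroring the poster's data, and the counts are fabricated:

```python
# Sketch of chi2_contingency() on a synthetic 2x2 contingency table.
# Labels mirror the poster's credit/active variables; counts are made up.
import pandas as pd
from scipy.stats import chi2_contingency

# Rows: credit class; columns: account status; cells: account counts
ct = pd.DataFrame({'active': [120, 60], 'inactive': [30, 90]},
                  index=['good_credit', 'bad_credit'])

chi2_stat, p_value, DOF, expected = chi2_contingency(ct, correction=False)
print("Chi-Square: {0:.2f}, DOF: {1}, p-value: {2:.4g}".format(chi2_stat, DOF, p_value))
# DOF = (r-1)(c-1) = (2-1)(2-1) = 1 for a 2x2 table
```

With these counts the expected cells are all 90 or 60, giving a Chi-Square statistic of 50 on 1 degree of freedom, so H0 (no association) is rejected.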
