Forum Replies Created

Viewing 16 - 30 of 32 posts
  • Datura

    Member
    November 15, 2020 at 5:33 pm

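    (3) Analysis of Variance (ANOVA) Test

    There are many other types of t-test, such as one-sample, two-sample, and paired t-tests; the differences will be covered later. A t-test can only compare two groups of (continuous) data. With multiple variables or more than two groups it falls short, and this is where the powerful ANOVA steps onto the stage.

    Take the chemical reaction above: the experiment is run at 6 different reaction temperatures, giving 6 groups of reaction yields, one per temperature. How do we judge whether these groups differ? Just follow the same way.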

    H0: u1 = u2 = u3 = ... = u6

    H1: u1, u2, ..., u6 are not all equal.

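    Load the experimental data into Excel or SAS and run the ANOVA calculation. Then:

    1. If p > alpha, we cannot reject H0. That is, these groups may not differ significantly; the observed differences may just be due to random error.

    2. If p < alpha, we can reject H0. That is, the 6 groups differ significantly: at least one group is different from the others.

    ANOVA can be used to evaluate almost any comparison of group data, and it also appears in regression analysis, so it is very powerful. Also note that there are both one-way and two-way ANOVA.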
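
For anyone working in Python rather than Excel or SAS, here is a minimal sketch of the same decision rule for a one-way ANOVA. It assumes SciPy is installed and uses `scipy.stats.f_oneway`; the yields, the three temperature groups (instead of six, to keep it short), and alpha = 0.05 are made-up values for illustration only.

```python
from scipy.stats import f_oneway

# Made-up reaction yields (%) at three temperatures, for illustration only.
yield_t1 = [70.1, 71.3, 69.8, 70.6]
yield_t2 = [72.0, 71.5, 72.8, 72.3]
yield_t3 = [78.2, 77.5, 79.1, 78.4]

alpha = 0.05  # assumed significance level

# One-way ANOVA: H0 says all group means are equal.
f_stat, p_value = f_oneway(yield_t1, yield_t2, yield_t3)

# Reject H0 when p < alpha: at least one group mean differs from the others.
reject_h0 = p_value < alpha
print(f"F = {f_stat:.2f}, p = {p_value:.4g}, reject H0: {reject_h0}")
```

Rejecting H0 only says that the means are not all equal; it does not say which group is the different one.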

    In digital marketing, Student's t-test or the chi-squared test is called A/B testing. In fact, these are just the hypothesis tests introduced here. Don't fail to recognize them just because they are wearing a different outfit!

  • Datura

    Member
    November 12, 2020 at 6:43 pm

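    I think that when you use the query() function, bare names in the expression are looked up as columns of the Loan data frame. x is an external Python object, so query() cannot find it unless it is prefixed with @ (for example, @x).

    Alternatively, try other ways to subset the data, such as boolean indexing.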
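
One thing worth checking: pandas lets a `query()` expression refer to an outside Python variable when the name is prefixed with `@`. A minimal sketch, assuming a small stand-in for the Loan data frame; the column names `amount` and `grade` and the threshold are hypothetical, not taken from the original question.

```python
import pandas as pd

# Hypothetical stand-in for the Loan data frame; column names are made up.
loan = pd.DataFrame({"amount": [500, 1500, 2500], "grade": ["A", "B", "C"]})
x = 1000  # external Python variable, not a column

# '@x' tells query() to look x up in the surrounding Python scope.
via_query = loan.query("amount > @x")

# The same subset with plain boolean indexing, no query() needed.
via_mask = loan[loan["amount"] > x]
```

Both approaches should return the same rows, so the choice is mostly about readability.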

  • Datura

    Member
    November 11, 2020 at 11:08 pm

    The X axis is the same, but the Y axis is different.

    In a cumulative gains chart, we want to show the gain in target capture from using a predictive model compared with random selection. For the cumulative distributions of the Good and Bad classes, however, we want to see the separation power of the model.
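
To make the difference concrete, here is a rough sketch that computes both sets of Y values over one shared X axis. The scores and outcomes are made up, and the column names are my own; it assumes the X axis is the cumulative share of records ranked from highest to lowest score.

```python
import pandas as pd

# Made-up scores and outcomes (1 = bad/target, 0 = good), for illustration only.
df = pd.DataFrame({
    "score": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05],
    "bad":   [1,   1,   0,   1,   0,   0,   1,   0,   0,   0],
})
df = df.sort_values("score", ascending=False).reset_index(drop=True)

# Shared X axis: cumulative share of the population, ranked by score.
df["pct_population"] = (df.index + 1) / len(df)

# Gains chart Y axis: share of all bads captured so far.
# Random selection would capture pct_population of the bads at each point.
df["cum_bad"] = df["bad"].cumsum() / df["bad"].sum()
df["gain_over_random"] = df["cum_bad"] - df["pct_population"]

# Class distribution Y axes: cumulative share of bads and of goods.
df["cum_good"] = (1 - df["bad"]).cumsum() / (1 - df["bad"]).sum()

# Separation power: largest gap between the two class curves (KS statistic).
ks = (df["cum_bad"] - df["cum_good"]).abs().max()
```

The gains curve is compared against the diagonal (random selection), while the Good and Bad curves are compared against each other.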

    Did I answer your questions?

  • Datura

    Member
    November 11, 2020 at 12:04 pm

    Generally, it is a work process. We need to understand it in context.

  • Datura

    Member
    November 9, 2020 at 7:20 pm

    Interesting. Too good to be true, very questionable!

    Let me look into the data when I get time.

  • Datura

    Member
    November 9, 2020 at 7:52 am

    That’s OK. In this case we may get only 8 or 9 bins rather than 10. One bin may hold a big chunk, say 20% of the records rather than 10%. It happens in real work, and it is fine.
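
A small sketch of how this can look in pandas, assuming the cause is a large number of tied scores. The data is made up: 30 of 100 records share the score 600. With `duplicates="drop"`, `pd.qcut()` merges the repeated bin edges instead of raising an error, so fewer than 10 bins come back.

```python
import pandas as pd

# Made-up scores: 30 records tied at 600, then 70 distinct values.
scores = pd.Series([600] * 30 + list(range(601, 671)))

# Ask for 10 quantile bins; repeated edges caused by the ties are dropped.
bins = pd.qcut(scores, q=10, duplicates="drop")

bin_counts = bins.value_counts()
n_bins = bins.nunique()
largest_bin = bin_counts.max()
```

Here the tied records all land in one oversized bin, and the remaining bins keep roughly 10% of the records each.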

  • Datura

    Member
    November 8, 2020 at 9:54 pm

    A credit score of zero or below is theoretically possible, because we can choose the PDO and offset to scale a predicted probability to any score range we want.
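
This scaling is commonly written as score = offset + factor * ln(odds), with factor = PDO / ln(2). A minimal sketch under that convention; the anchor values (600 points at 50:1 good:bad odds, PDO of 20) and the function name are assumptions for illustration, not values from this thread.

```python
import math

pdo = 20            # assumed: points needed to double the odds
base_score = 600    # assumed: score assigned at base_odds
base_odds = 50      # assumed: good:bad odds at base_score

factor = pdo / math.log(2)
offset = base_score - factor * math.log(base_odds)

def score_from_probability(p_bad: float) -> float:
    """Scale a predicted bad probability to a score via good:bad odds."""
    odds = (1 - p_bad) / p_bad
    return offset + factor * math.log(odds)

# An extremely high predicted bad probability maps to a negative score.
very_risky = score_from_probability(0.99999999)
```

With these settings, doubling the odds from 50:1 to 100:1 adds 20 points, and nothing in the formula stops the score from going below zero.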

    In practice, though, it is more convenient to set the score range between 100 and 1000. In that case, zero or negative values are not real scores; they are special value codes meaning no score or something else. We need to exclude these rows from the analysis.

    To create deciles, we divide the records into bins of approximately equal size based on the score. In Python we can use the qcut() function; in SAS, use PROC RANK.
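
A minimal Python sketch of these two steps, assuming the scores sit in a column named `score` (a hypothetical name) and that special value codes are zero or negative. The data is made up.

```python
import pandas as pd

# Made-up data: two special value codes (0 and -1) plus 100 real scores.
scores = pd.DataFrame({"score": [0, -1] + list(range(300, 900, 6))})

# Step 1: drop rows whose score is a special code rather than a real score.
valid = scores[scores["score"] > 0].copy()

# Step 2: cut the remaining scores into 10 roughly equal-sized bins.
# labels=False returns bin numbers 0..9; add 1 to get deciles 1..10.
valid["decile"] = pd.qcut(valid["score"], q=10, labels=False) + 1

decile_sizes = valid["decile"].value_counts().sort_index()
```

Decile 1 holds the lowest scores here; reverse the labels if you prefer decile 1 to be the highest.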

  • Datura

    Member
    November 9, 2020 at 12:26 pm

    ???

  • Datura

    Member
    October 26, 2020 at 10:10 pm

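    Wow, it works, awesome! You are a genius. Thank you! We need to use the locals()/globals() symbol tables, which are similar to the macro symbol tables in SAS.

    import pandas as pd

    # Placeholders; the loop below rebinds each name to a table from raw.
    df1=pd.DataFrame({})
    df2=pd.DataFrame({})
    df3=pd.DataFrame({})
    df4=pd.DataFrame({})
    df5=pd.DataFrame({})
    df6=pd.DataFrame({})

    dflist=[df1,df2,df3, df4, df5, df6]

    # raw is the list of tables obtained earlier in this thread.
    # Writing to locals() is only reliable at module level, where it is
    # the same dict as globals(); inside a function it may have no effect.
    for i in range(len(dflist)):
        print("loop round: ", i)
        locals()['df'+str(i+1)] = raw[i]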

    df6
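
For what it is worth, the same result can be had without touching the symbol tables. A minimal sketch, assuming `raw` is a list of data frames such as the one `pd.read_html()` returns; the three small tables below are made-up stand-ins.

```python
import pandas as pd

# Stand-in for the list of tables; the contents are made up.
raw = [
    pd.DataFrame({"a": [1, 2]}),
    pd.DataFrame({"b": ["x"]}),
    pd.DataFrame({"c": [3.0]}),
]

# Key each table by name instead of creating df1, df2, ... variables.
tables = {f"df{i}": df for i, df in enumerate(raw, start=1)}

# When the number of tables is fixed and known, tuple unpacking also works.
df1, df2, df3 = raw
```

The dict version keeps working inside functions, where writing to locals() is not reliable.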

    • This reply was modified 3 years, 6 months ago by  Datura.
  • Datura

    Member
    October 26, 2020 at 5:35 pm

    ??? Is this the method to extract table data without using the read_html() function?

  • Datura

    Member
    October 26, 2020 at 5:32 pm

    No automatic method? OK, that’s too bad. It is really inconvenient to use raw[1], raw[2], ...

    • This reply was modified 3 years, 6 months ago by  Datura.
  • Datura

    Member
    October 26, 2020 at 12:33 pm

    I tried this method. It ran without errors, but the data frames df1-df3 are all still empty. The loop does not overwrite the predefined empty ones.

    df1=pd.DataFrame()
    df2=pd.DataFrame()
    df3=pd.DataFrame()

    dflist=[df1,df2,df3]
    for i in range(len(dflist)):
        print("loop round: ", i)
        temp=raw[i]
        print(temp)
        dflist[i] = temp

    df1

    So, what’s wrong?
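
A likely explanation, sketched with made-up data: assigning to `dflist[i]` changes what that slot of the list points to, but the name `df1` is a separate reference and still points to the original empty data frame.

```python
import pandas as pd

df1 = pd.DataFrame()
dflist = [df1]
new_table = pd.DataFrame({"a": [1, 2]})  # made-up replacement data

# Rebinds slot 0 of the list; the name df1 is not touched.
dflist[0] = new_table

slot_updated = dflist[0] is new_table   # the list now holds the new table
df1_still_empty = df1.empty             # df1 still refers to the empty frame
```

So the loop fills the list correctly; it just never reassigns the variables df1, df2, df3 themselves.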

    • This reply was modified 3 years, 6 months ago by  Datura.
  • Datura

    Member
    October 26, 2020 at 12:08 pm

    No. These tables are all totally different, with different columns and structures. I need to separate them and then use each one differently.

  • Datura

    Member
    October 26, 2020 at 11:13 am

    I checked the urlwatch Python package. Its fundamental idea is the same as mine; please see the introduction below. We can just use this package, since it is available and free.

    ———————————————-

    Introduction

    urlwatch monitors the output of webpages or arbitrary shell commands.

    Every time you run urlwatch, it:

    • retrieves the output and processes it
    • compares it with the version retrieved the previous time (“diffing”)
    • if it finds any differences, generates a summary “report” that can be displayed or sent via one or more methods, such as email
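
The compare-and-report steps in the list above can be sketched with the standard library alone. This is only an illustration of the idea, not how urlwatch itself is implemented; the function name and the two page snapshots are made up, and fetching the page is left out.

```python
import difflib

def summarize_changes(previous: str, current: str) -> str:
    """Return a unified diff between two snapshots, or '' if identical."""
    diff = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    return "\n".join(diff)

# Made-up snapshots of the same page from two different runs.
old_snapshot = "Office hours: 9-5\nContact: info"
new_snapshot = "Office hours: 10-4\nContact: info"

report = summarize_changes(old_snapshot, new_snapshot)
if report:
    print(report)  # a real monitor might send this by email instead
```

An empty report means nothing changed since the previous run, so no notification is needed.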

    • This reply was modified 3 years, 6 months ago by  Datura.
  • Datura

    Member
    October 26, 2020 at 11:10 am

    Got it. She wants to monitor government websites, they definitely don’t belong to her~~ 🙂
