Last batch of notebooks for Think Stats

6 September 2024


Getting ready to teach Data Science in the spring, I am going back through Think Stats and updating the Jupyter notebooks. Each chapter has a notebook that shows the examples from the book along with some small exercises, with more substantial exercises at the end.

If you are reading the book, you can get the notebooks by cloning this repository on GitHub and running them on your computer.

Or you can read (but not run) the notebooks on GitHub:

Chapter 13 Notebook (Chapter 13 Solutions)
Chapter 14 Notebook (Chapter 14 Solutions)

I am done now, just in time for the semester to start tomorrow! Here are some of the examples from Chapter 13, on survival analysis:

Survival analysis

If we have an unbiased sample of complete lifetimes, we can compute the survival function from the CDF and the hazard function from the survival function.
Here’s the distribution of pregnancy length in the NSFG dataset.

In [2]:

import nsfg
import thinkplot
import thinkstats2

preg = nsfg.ReadFemPreg()
complete = preg.query('outcome in [1, 3, 4]').prglngth
cdf = thinkstats2.Cdf(complete, label='cdf')

The survival function is just the complementary CDF.

In [3]:

import survival

def MakeSurvivalFromCdf(cdf, label=''):
    """Makes a survival function based on a CDF.

    cdf: Cdf

    returns: SurvivalFunction
    """
    ts = cdf.xs
    ss = 1 - cdf.ps
    return survival.SurvivalFunction(ts, ss, label)

In [4]:

sf = MakeSurvivalFromCdf(cdf, label='survival')

In [5]:

print(cdf[13])
print(sf[13])

0.13978014121
0.86021985879
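The two printed values sum to 1 because the survival function is the complement of the CDF. To see the same relationship without the NSFG data, here is a minimal sketch using only NumPy. The sample lifetimes and the helper name `empirical_cdf` are made up for illustration, and this is not necessarily how `thinkstats2.Cdf` works internally.

```python
import numpy as np

def empirical_cdf(values):
    """Return the sorted unique values and the fraction of the sample <= each."""
    values = np.asarray(values)
    xs, counts = np.unique(values, return_counts=True)
    ps = np.cumsum(counts) / len(values)
    return xs, ps

# Hypothetical complete lifetimes (not NSFG data)
durations = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

ts, ps = empirical_cdf(durations)
ss = 1 - ps  # survival: fraction of lifetimes strictly longer than t

for t, p, s in zip(ts, ps, ss):
    print(t, round(p, 2), round(s, 2))
```

At every value of t, the CDF and the survival function add up to 1, just as `cdf[13]` and `sf[13]` do.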

Here are the CDF and the survival function.

In [6]:

thinkplot.Plot(sf)
thinkplot.Cdf(cdf, alpha=0.2)
thinkplot.Config(loc='center left')

And here’s the hazard function.

In [7]:

hf = sf.MakeHazardFunction(label='hazard')
print(hf[39])

0.676706827309
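To show how a hazard function can be derived from a survival function, here is a small NumPy sketch. It assumes one common discrete convention: the hazard at t is the fraction of lifetimes that end at t among those that lasted at least until t. The function name `hazard_from_survival` and the survival values are hypothetical, and `MakeHazardFunction` in `survival.py` may index its result differently.

```python
import numpy as np

def hazard_from_survival(ss):
    """Discrete hazard from survival values ss[i] = P(T > ts[i]).

    Assumed convention: hazard[i] = P(T == ts[i]) / P(T >= ts[i]).
    """
    ss = np.asarray(ss, dtype=float)
    # P(T >= ts[i]) equals the survival value at the previous time step;
    # before the first time step, everything is still ongoing.
    prev = np.concatenate([[1.0], ss[:-1]])
    return (prev - ss) / prev

# Hypothetical survival function (not NSFG data)
ts = np.array([1, 2, 3, 4])
ss = np.array([0.9, 0.7, 0.4, 0.0])

hs = hazard_from_survival(ss)
for t, h in zip(ts, hs):
    print(t, round(h, 3))
```

Under this convention the last hazard is 1, because every lifetime that reaches the final time step ends there.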

In [8]:

thinkplot.Plot(hf)
thinkplot.Config(ylim=[0, 0.75], loc='upper left')

About author

Allen Downey

I am a Professor of Computer Science at Olin College in Needham MA, and the author of Think Python, Think Bayes, Think Stats and several other books related to computer science and data science.

Previously I taught at Wellesley College and Colby College, and in 2009 I was a Visiting Scientist at Google, Inc. I have a Ph.D. from U.C. Berkeley and B.S. and M.S. degrees from MIT. Here is my CV.

I write a blog about Bayesian statistics and related topics called Probably Overthinking It. Several of my books are published by O’Reilly Media and all are available under free licenses from Green Tea Press.
