Getting ready to teach Data Science in the spring, I am going back through Think Stats and updating the Jupyter notebooks. Each chapter has a notebook that shows the examples from the book along with some small exercises, with more substantial exercises at the end.
If you are reading the book, you can get the notebooks by cloning this repository on GitHub and running them on your computer.
Or you can read (but not run) the notebooks on GitHub:
Chapter 13 Notebook (Chapter 13 Solutions)
Chapter 14 Notebook (Chapter 14 Solutions)
I am done now, just in time for the semester to start tomorrow! Here are some of the examples from Chapter 13, on survival analysis:
Survival analysis
If we have an unbiased sample of complete lifetimes, we can compute the survival function from the CDF and the hazard function from the survival function.
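To make that concrete without the book's libraries, here is a minimal pure-Python sketch on a small made-up sample. The function name `survival_and_hazard` and the sample values are hypothetical, not from the book. The hazard convention used here (of the lifetimes that lasted until t, the fraction that ended at t) is one common discrete definition; the `survival` module may index it slightly differently.

```python
from collections import Counter


def survival_and_hazard(lifetimes):
    """Return (ts, ss, lams) for a sample of complete lifetimes.

    ts: sorted distinct lifetimes
    ss: survival S(t) = 1 - CDF(t) = P(T > t) at each t
    lams: hazard at each t, the fraction of lifetimes that
          lasted until t and then ended at t
    """
    n = len(lifetimes)
    counts = Counter(lifetimes)
    ts = sorted(counts)
    ss, lams = [], []
    remaining = n  # lifetimes that have lasted at least until t
    for t in ts:
        ended = counts[t]
        lams.append(ended / remaining)
        remaining -= ended
        ss.append(remaining / n)  # fraction that outlasts t
    return ts, ss, lams


# Hypothetical lifetimes in weeks; made-up values, not NSFG data.
sample = [38, 39, 39, 40, 40, 40, 41, 42]
ts, ss, lams = survival_and_hazard(sample)
for t, s, lam in zip(ts, ss, lams):
    print(t, round(s, 3), round(lam, 3))
```

Because every lifetime in the sample is complete, the survival curve ends at 0 and the hazard at the last observed time is 1.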
Here’s the distribution of pregnancy length in the NSFG dataset.
import thinkstats2
import thinkplot
import nsfg
preg = nsfg.ReadFemPreg()
complete = preg.query('outcome in [1, 3, 4]').prglngth
cdf = thinkstats2.Cdf(complete, label='cdf')
import survival
def MakeSurvivalFromCdf(cdf, label=''):
    """Makes a survival function based on a CDF.

    cdf: Cdf

    returns: SurvivalFunction
    """
    ts = cdf.xs
    # S(t) = 1 - CDF(t)
    ss = 1 - cdf.ps
    return survival.SurvivalFunction(ts, ss, label)
sf = MakeSurvivalFromCdf(cdf, label='survival')
print(cdf[13])
print(sf[13])
thinkplot.Plot(sf)
thinkplot.Cdf(cdf, alpha=0.2)
thinkplot.Config(loc='center left')
hf = sf.MakeHazardFunction(label='hazard')
print(hf[39])
thinkplot.Plot(hf)
thinkplot.Config(ylim=[0, 0.75], loc='upper left')