Third batch of notebooks for Think Stats

16 June 2025


As I mentioned in the previous post and the one before that, I am getting ready to teach Data Science in the spring, so I am going back through Think Stats and updating the Jupyter notebooks. I am done with Chapters 1 through 9 now.

If you are reading the book, you can get the notebooks by cloning this repository on GitHub and running the notebooks on your computer. Or you can read (but not run) the notebooks on GitHub:

Chapter 7 Notebook (Chapter 7 Solutions)
Chapter 8 Notebook (Chapter 8 Solutions)
Chapter 9 Notebook (Chapter 9 Solutions)

I’ll post the next batch soon. In the meantime, here are some of the examples from Chapter 7, which demonstrate how surprisingly difficult it is to make an effective scatter plot, especially with large datasets. In this example, I use data from the Behavioral Risk Factor Surveillance System (BRFSS), which includes more than 300,000 respondents.

Scatter plots

I’ll start with the data from the BRFSS again.

In [2]:

import brfss  # module from the Think Stats code repository
df = brfss.ReadBrfss(nrows=None)  # nrows=None reads every row

The following function selects a random subset of a DataFrame.

In [3]:

import numpy as np

def SampleRows(df, nrows, replace=False):
    """Select a random sample of nrows rows from df."""
    indices = np.random.choice(df.index, nrows, replace=replace)
    sample = df.loc[indices]
    return sample
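For comparison, here is a minimal sketch of the same idea using pandas' built-in `DataFrame.sample`. The name `sample_rows`, the `random_state` argument, and the toy DataFrame are assumptions added for illustration; they are not part of the book's code.

```python
import numpy as np
import pandas as pd


def sample_rows(df, nrows, replace=False, random_state=None):
    """Select a random subset of rows using pandas' own sampler.

    Hypothetical alternative to SampleRows, not the book's function.
    """
    return df.sample(n=nrows, replace=replace, random_state=random_state)


# Tiny made-up frame, for illustration only (not BRFSS data)
toy = pd.DataFrame({'x': np.arange(10), 'y': np.arange(10) ** 2})
subset = sample_rows(toy, 4, random_state=17)
```

One practical difference: `DataFrame.sample` picks rows by position, so it still returns exactly `nrows` rows when the index contains duplicate labels, whereas looking up sampled labels with `df.loc` returns every row that shares a chosen label.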

I’ll draw a random sample of 5000 respondents and extract their heights in cm and weights in kg.

In [4]:

sample = SampleRows(df, 5000)
heights, weights = sample.htm3, sample.wtkg2

Here’s a simple scatter plot with alpha=1, so each data point is fully saturated.

In [5]:

import thinkplot  # plotting module from the Think Stats code repository
thinkplot.Scatter(heights, weights, alpha=1)
thinkplot.Config(xlabel='Height (cm)',
                 ylabel='Weight (kg)',
                 axis=[140, 210, 20, 200],
                 legend=False)
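If you don't have `thinkplot` installed, roughly the same figure can be drawn with Matplotlib directly. This is a sketch, not the book's implementation: the function name `scatter_plot`, the marker size, and the synthetic stand-in data are all assumptions made so the example runs on its own.

```python
import numpy as np
import matplotlib.pyplot as plt


def scatter_plot(xs, ys, alpha=1.0, ax=None):
    """Scatter plot with the same labels and axis limits as the cell above.

    Hypothetical helper; marker size s=10 is an arbitrary choice.
    """
    if ax is None:
        _, ax = plt.subplots()
    # linewidths=0 drops the marker outlines so only the fill is drawn
    ax.scatter(xs, ys, alpha=alpha, s=10, linewidths=0)
    ax.set_xlabel('Height (cm)')
    ax.set_ylabel('Weight (kg)')
    ax.set_xlim(140, 210)
    ax.set_ylim(20, 200)
    return ax


# Synthetic stand-in values (NOT the BRFSS data), only to make the sketch runnable
rng = np.random.default_rng(17)
fake_heights = rng.normal(170, 10, size=5000)
fake_weights = rng.normal(80, 15, size=5000)

ax = scatter_plot(fake_heights, fake_weights, alpha=1.0)
```

With real data you would pass `heights` and `weights` from the sample instead of the synthetic arrays. Calling the function again with a smaller `alpha` makes overlapping points partly transparent, which is one way to see how much overplotting the fully saturated version hides.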

About author

Allen Downey

I am a Professor of Computer Science at Olin College in Needham MA, and the author of Think Python, Think Bayes, Think Stats and several other books related to computer science and data science.

Previously I taught at Wellesley College and Colby College, and in 2009 I was a Visiting Scientist at Google, Inc. I have a Ph.D. from U.C. Berkeley and B.S. and M.S. degrees from MIT. Here is my CV.

I write a blog about Bayesian statistics and related topics called Probably Overthinking It. Several of my books are published by O’Reilly Media and all are available under free licenses from Green Tea Press.
