Effective Data Science is still a work in progress. This chapter is currently a dumping ground for ideas, and we don’t recommend reading it.

If you would like to contribute to the development of EDS, you may do so at

Videos / Chapters


Use the Preparing for Production section of the reading list to support and guide your exploration of this week’s topics. Note that these texts are divided into core reading, reference materials and materials of interest.


This week has fewer activities, since you will be working on the first assessment.


  • Read the LIME paper, which we will discuss during the live session.

  • Work through the Understanding LIME R tutorial.

  • Use code profiling tools to assess the performance of your rolling_mean() and rolling_sd() functions. Identify any changes that would make them more efficient.
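The profiling task can be illustrated with a minimal Python sketch, assuming numpy. The course works in R, where Rprof() and the profvis package fill a similar role to Python's cProfile; the functions below are hypothetical Python analogues of rolling_mean() and rolling_sd(), not the course's reference solutions. The sketch times a naive rolling mean that re-sums every window against one built from prefix (cumulative) sums, then uses cProfile to show where the naive version spends its time.

```python
import cProfile
import pstats
import timeit

import numpy as np


def rolling_mean_naive(x, window):
    """Hypothetical naive rolling mean: re-sums every window, O(n * window)."""
    x = np.asarray(x, dtype=float)
    n_out = len(x) - window + 1
    out = np.empty(n_out)
    for i in range(n_out):
        out[i] = x[i:i + window].mean()
    return out


def rolling_mean_cumsum(x, window):
    """Rolling mean from prefix sums: each window sum is csum[i + window] - csum[i], O(n)."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    return (csum[window:] - csum[:-window]) / window


def rolling_sd_cumsum(x, window):
    """Hypothetical rolling sample SD (divisor window - 1) from prefix sums of x and x**2.

    Faster than recomputing each window, but subtracting large sums can lose
    precision when the mean is large relative to the spread of the data.
    """
    x = np.asarray(x, dtype=float)
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))
    sum1 = s1[window:] - s1[:-window]
    sum2 = s2[window:] - s2[:-window]
    var = (sum2 - sum1 ** 2 / window) / (window - 1)
    return np.sqrt(np.maximum(var, 0.0))  # clip tiny negative values caused by rounding


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    x = rng.normal(size=10_000)
    window = 50

    # Check that the faster version agrees with the naive one before timing it.
    assert np.allclose(rolling_mean_naive(x, window), rolling_mean_cumsum(x, window))

    # timeit: total time over repeated calls, for a head-to-head comparison.
    for f in (rolling_mean_naive, rolling_mean_cumsum):
        seconds = timeit.timeit(lambda: f(x, window), number=5)
        print(f"{f.__name__}: {seconds:.4f} s for 5 calls")

    # cProfile: which calls dominate the run time of the slow version.
    profiler = cProfile.Profile()
    profiler.enable()
    rolling_mean_naive(x, window)
    profiler.disable()
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

The prefix-sum trick trades a little numerical accuracy for a large speed-up, so checking the fast version against the slow one (as the assertion does) is worth keeping in any optimisation workflow.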


  • Write two functions to simulate a homogeneous Poisson process with intensity \(\lambda >0\) on the interval \((t_1, t_2) \subset \mathbb{R}\). The first should use the exponential distribution of inter-event times to simulate events in sequence. The second should use the Poisson distribution of the total event count to first simulate the number of events and then randomly allocate locations over the interval. Evaluate and compare the reproducibility and scalability of each implementation.
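The two strategies can be sketched in Python, assuming numpy's Generator API; the function names sim_pp_sequential and sim_pp_count_first are hypothetical. The first relies on the inter-event times being independent Exponential(\(\lambda\)) variables; the second relies on the event count on \((t_1, t_2)\) being Poisson with mean \(\lambda (t_2 - t_1)\) and, given that count, the event locations being independent and uniform on the interval.

```python
import timeit

import numpy as np


def sim_pp_sequential(rate, t1, t2, rng):
    """Homogeneous Poisson process on (t1, t2) via Exponential(rate) inter-event times.

    Events are generated one at a time, so the number of random draws is itself random.
    """
    events = []
    t = t1
    while True:
        t += rng.exponential(scale=1.0 / rate)  # numpy parametrises by scale = 1 / rate
        if t >= t2:
            break
        events.append(t)
    return np.array(events)


def sim_pp_count_first(rate, t1, t2, rng):
    """Homogeneous Poisson process on (t1, t2): draw N ~ Poisson(rate * (t2 - t1)),
    then place N points independently and uniformly on the interval."""
    n = rng.poisson(lam=rate * (t2 - t1))
    return np.sort(rng.uniform(low=t1, high=t2, size=n))


if __name__ == "__main__":
    rate, t1, t2 = 2.0, 0.0, 1_000.0

    # Reproducibility: the same seed gives the same realisation from the same function.
    a = sim_pp_sequential(rate, t1, t2, np.random.default_rng(42))
    b = sim_pp_sequential(rate, t1, t2, np.random.default_rng(42))
    assert np.array_equal(a, b)

    # The two methods consume the random stream differently, so the same seed
    # does not give the same events across methods.
    c = sim_pp_count_first(rate, t1, t2, np.random.default_rng(42))
    print(len(a), len(c), "expected count:", rate * (t2 - t1))

    # Scalability: a Python-level loop versus a few vectorised draws.
    for f in (sim_pp_sequential, sim_pp_count_first):
        seconds = timeit.timeit(lambda: f(rate, t1, t2, np.random.default_rng(1)), number=10)
        print(f"{f.__name__}: {seconds:.4f} s for 10 calls")
```

Passing an explicit, seeded generator into each function keeps results reproducible, but both functions consume a random number of variates, so any later draws from a shared generator depend on the realised event count. On scalability, the sequential version runs an interpreted loop whose length grows with \(\lambda (t_2 - t_1)\), whereas the count-first version makes two vectorised calls plus a sort; the sequential version does, however, produce events already in time order.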

Live Session

In the live session we will begin with a discussion of this week’s tasks. We will then break into small groups for a reading group style discussion of the LIME paper that was set as reading for this week.