Checklist
Effective Data Science is still a work-in-progress. This chapter is currently a dumping ground for ideas, and we don’t recommend reading it.
If you would like to contribute to the development of EDS, you may do so at https://github.com/zakvarty/data_science_notes.
Videos / Chapters
Reproducibility (26 min) [slides]
Explainability (16 min) [slides]
Scalability (30 min) [slides]
Reading
Use the Preparing for Production section of the reading list to support and guide your exploration of this week’s topics. Note that these texts are divided into core reading, reference materials and materials of interest.
Activities
This week has fewer activities, since you will be working on the first assessment.
Core
Read the LIME paper, which we will discuss during the live session.
Work through the understanding LIME R tutorial.
Use code profiling tools to assess the performance of your rolling_mean() and rolling_sd() functions. Identify any efficiencies that can be made.
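As a rough illustration of the profiling workflow, here is a minimal sketch in Python using only the standard library. Your own rolling_mean() will differ: the two implementations and the profile_call() helper below are hypothetical stand-ins, not the course's functions. The idea is to profile a straightforward version, spot the repeated work, and then confirm that a cheaper version returns the same values.

```python
import cProfile
import io
import pstats


def rolling_mean_naive(x, width):
    """Recompute each window's sum from scratch: O(n * width) work."""
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]


def rolling_mean_running(x, width):
    """Update a running sum as the window slides: O(n) work."""
    if len(x) < width:
        return []
    total = sum(x[:width])
    out = [total / width]
    for i in range(width, len(x)):
        total += x[i] - x[i - width]  # add the new value, drop the oldest
        out.append(total / width)
    return out


def profile_call(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    data = list(range(20_000))
    slow, slow_report = profile_call(rolling_mean_naive, data, 200)
    fast, fast_report = profile_call(rolling_mean_running, data, 200)
    print(slow_report)
    print(fast_report)
    print("same output:", slow == fast)
```

With floating point inputs, the running-sum version can differ from the naive one by rounding error, so compare the outputs with a tolerance rather than exact equality.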
Bonus
- Write two functions to simulate a homogeneous Poisson process with intensity \(\lambda >0\) on the interval \((t_1, t_2) \subset \mathbb{R}\). The first should use the exponential distribution of inter-event times to simulate events in sequence. The second should use the Poisson distribution of the total event count to first simulate the number of events and then randomly allocate locations over the interval. Evaluate and compare the reproducibility and scalability of each implementation.
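To make the two strategies concrete, here is one possible sketch in Python, using only the standard library. All function names are hypothetical. Python's random module has no Poisson sampler, so poisson_draw() below is an assumed helper that inverts the Poisson CDF; it is only suitable for moderate values of \(\lambda (t_2 - t_1)\). The random number generator is passed in explicitly so that seeding, and hence reproducibility, stays visible at the call site.

```python
import math
import random


def simulate_sequential(lam, t1, t2, rng):
    """Simulate events in sequence using Exp(lam) inter-event times."""
    events = []
    t = t1 + rng.expovariate(lam)
    while t < t2:
        events.append(t)
        t += rng.expovariate(lam)
    return events


def poisson_draw(mu, rng):
    """Draw from Poisson(mu) by CDF inversion (assumed helper).

    Only suitable for moderate mu: exp(-mu) underflows for very large mu.
    """
    u = rng.random()
    k = 0
    p = math.exp(-mu)  # P(N = 0)
    cdf = p
    while u > cdf and p > 0.0:
        k += 1
        p *= mu / k  # P(N = k) obtained from P(N = k - 1)
        cdf += p
    return k


def simulate_count_first(lam, t1, t2, rng):
    """Draw the total event count, then place events uniformly on the interval."""
    n = poisson_draw(lam * (t2 - t1), rng)
    return sorted(rng.uniform(t1, t2) for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(42)  # a fixed seed makes the run repeatable
    seq = simulate_sequential(2.0, 0.0, 10.0, rng)
    cnt = simulate_count_first(2.0, 0.0, 10.0, rng)
    print(len(seq), len(cnt))
```

The sequential version draws an unknown number of random values one after another, while the count-first version knows the number of events up front. That difference is a useful starting point when comparing how each one behaves under a fixed seed and how each one scales as \(\lambda\) grows.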
Live Session
In the live session we will begin with a discussion of this week’s tasks. We will then break into small groups for a reading group style discussion of the LIME paper that was set as reading for this week.