Checklist
Videos / Chapters
Data Wrangling (20 min) [slides]
Data Exploration (25 min) [slides]
Data Visualisation (27 min) [slides]
Reading
Use the Data Exploration and Visualisation section of the reading list to support and guide your exploration of this week’s topics. Note that these texts are divided into core reading, reference materials and materials of interest.
Activities
Core:
NormConf is a conference dedicated to the unglamorous but essential aspects of working in the data sciences. The conference talks from December 2022 are available as a YouTube playlist. Find a talk that interests you and watch it, then post a short summary to EdStem describing what you learned from the talk and one thing that you still do not understand.
Work through this ggplot2 tutorial for beautiful plotting in R by Cédric Scherer, recreating the examples for yourself.
Using your rolling_mean() function as inspiration, write a rolling_sd() function that calculates the rolling standard deviation of a numeric vector.
Extend your rolling_sd() function to optionally return approximate point-wise confidence bands for your rolling standard deviations. These should be \(\pm2\) standard errors by default and may be computed using analytical or re-sampling methods.
Create a visualisation using your extended rolling_sd() function to assess whether the variability in the daily change in the Dow Jones Index is changing over time. [data]
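As a starting point for the rolling_sd() tasks above, here is one possible sketch of the analytical route. The argument names (window, bands, n_se) and the returned data frame layout are assumptions made for illustration, and your rolling_mean() may use a different interface. The standard error uses the approximation \(\mathrm{se}(s) \approx s / \sqrt{2(n - 1)}\), which assumes independent, normally distributed values within each window. That assumption may not hold for financial returns, so a re-sampling approach could be preferable.

```r
# Hypothetical sketch: the function name matches the exercise, but the
# arguments (window, bands, n_se) are assumptions made for illustration.
rolling_sd <- function(x, window = 5, bands = FALSE, n_se = 2) {
  stopifnot(is.numeric(x), window >= 2, window <= length(x))
  n_out <- length(x) - window + 1

  # Standard deviation of each complete run of `window` consecutive values
  sds <- vapply(
    seq_len(n_out),
    function(i) sd(x[i:(i + window - 1)]),
    numeric(1)
  )
  if (!bands) {
    return(sds)
  }

  # Approximate standard error of a sample SD, assuming independent,
  # normally distributed observations within each window
  se <- sds / sqrt(2 * (window - 1))

  data.frame(
    index = seq_len(n_out) + window - 1,  # position of the last value in each window
    sd    = sds,
    lower = pmax(sds - n_se * se, 0),     # a standard deviation cannot be negative
    upper = sds + n_se * se
  )
}

# Example call on simulated data (not the Dow Jones data)
set.seed(1)
x <- rnorm(100)
head(rolling_sd(x, window = 20, bands = TRUE))
```

The returned columns could then be passed to a line plot with a shaded ribbon between lower and upper.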
Bonus:
Add your rolling_sd() function to your R package, adding documentation and tests.
During an exploratory analysis, we often need to assess the validity of an assumed distribution based on a sample of data. Write your own versions of qqnorm() and qqplot(), which add point-wise tolerance intervals to assess whether deviations from the line \(y=x\) are larger than expected.
Add your own versions of qqnorm() and qqplot() to your R package, along with documentation and tests.
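For the Q-Q plot task, one way to obtain point-wise intervals is by simulation. The sketch below covers only the normal case and is not a full replacement for qqnorm() or qqplot(). The helper name qq_norm_band() and its arguments are hypothetical. It standardises the sample, simulates standardised normal samples of the same size, and takes point-wise quantiles of each order statistic as the interval limits. This is a simulation envelope and is only one of several reasonable choices.

```r
# Hypothetical helper: the name qq_norm_band() and its arguments are
# assumptions for illustration, not part of base R.
qq_norm_band <- function(x, n_sim = 1000, level = 0.95, plot = TRUE) {
  stopifnot(is.numeric(x))
  x <- x[!is.na(x)]
  n <- length(x)
  stopifnot(n >= 3)

  z    <- sort((x - mean(x)) / sd(x))  # standardised, ordered sample
  theo <- qnorm(ppoints(n))            # theoretical normal quantiles

  # Each column of `sims` is one simulated, standardised, ordered normal sample
  sims <- replicate(n_sim, {
    s <- rnorm(n)
    sort((s - mean(s)) / sd(s))
  })

  # Point-wise limits: quantiles of each order statistic across simulations
  alpha <- 1 - level
  lower <- apply(sims, 1, quantile, probs = alpha / 2)
  upper <- apply(sims, 1, quantile, probs = 1 - alpha / 2)

  if (plot) {
    plot(theo, z,
         xlab = "Theoretical quantiles",
         ylab = "Standardised sample quantiles")
    abline(0, 1)                  # the reference line y = x
    lines(theo, lower, lty = 2)   # lower point-wise limit
    lines(theo, upper, lty = 2)   # upper point-wise limit
  }

  invisible(data.frame(theoretical = theo, sample = z,
                       lower = lower, upper = upper))
}
```

Calling qq_norm_band(x) on a numeric vector x draws the plot. Points falling outside the dashed lines suggest larger deviations from \(y=x\) than would be expected under normality, bearing in mind that the limits are point-wise rather than simultaneous.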
Live Session
In the live session we will begin with a discussion of this week’s tasks. We will then break into small groups for two data visualisation exercises.
(Note: For one of these exercises, it would be helpful to bring a small selection of coloured pens or pencils, if you have access to some. If not, please don’t worry: inventive use of black, blue and shading is a perfectly acceptable alternative!)
Please come to the live session prepared to discuss the following points:
Which NormConf video did you watch and what did you learn from it?
Other than ggplot2, what else have you used to create data visualisations? What are their relative strengths and weaknesses?
How did you implement your rolling_sd() function and what conclusions did you draw when applying it to the Dow Jones data?