Workflows Checklist

Videos / Chapters

Reading

Use the workflows section of the reading list to support and guide your exploration of this week’s materials. Note that these texts are divided into core reading, reference materials and materials of interest.

Tasks

Short:

  • When downloading papers from a journal webpage or ArXiV, they often have unhelpful file names like 2310.12711.pdf. Come up with your own naming convention that will keep your downloaded papers organised and easy to search. Write a short paragraph explaining and justifying your naming convention for someone else.

  • When downloading data from websites you can encounter similary uninformative names. Use the US Geological Survey website to download a csv file of all earthqakes in the conterminous US exceeding magnitude 3.0 during 2023. Note the default name of this file and suggest your own, improved name for this file.

  • Consider simulating a fire evacuation of the Huxley building taking an object oriented approach. List the classes of object might you define and what methods and fields would give to each of these. (There is no expectation to code this simulation)

Core:

  • Find 3 data science projects on Github and explore how they organise their work. Write a post on the EdStem forum that links to all three, and in a couple of paragraphs describe the content and structure of one project.

  • Create your own project directory (or directories) for this course and its assignments.

  • Write two of your own R functions. The first should calculate the geometric mean of a numeric vector. The second should calculate the rolling arithmetic mean of a numeric vector.

Bonus:

  • Re-factor an old project to match the project organisation and coding guides for this course. This might be a small research project, class notes or a collection of homework assignments. Use an R-based project if possible. If you only have python projects, then either translate these to R or apply the PEP8 style guide. Take care to select a suitably sized project so that this is a meaningful exercise but does not take more than a few hours.

  • If you are able to do so, host your re-factored project publicly and share it with the rest of the class on the EdStem Discussion forum.

Live Session

In the live session we will begin with a discussion of this week’s tasks. We will then create a minimal R package to organise and test the functions you have written.

  • Please come to the live session prepared to discuss the following points:

    • Did you make the assignment projects as subdirectories or as their stand alone projects? Why?

    • What were some terms that you had not met before during the readings? How did you find their meanings?

    • What did you have to consider when writing your rolling mean function?

  • If you would like to follow along in the live session then you should prepare by:

    • Installling/Updating R, RStudio and Quarto.

    • Follow the Happy git with R instructions to link RStudio and Git and Github (§1-14).

    • Write a username/README.md introducing yourself on your GitHub profile. (See e.g.: StatsRhian, nrennie)