Introduction to R: Management, Exploration, and Communication of data, 2-6 July 2018

Posted on Tue, Jan 09 2018 12:19:00



Introduction to R: Management, Exploration, and Communication of data, 2-6 July 2018 (Stellenbosch)

This intensive five-day course, registered as a University of Stellenbosch Short Course, will be presented at Stellenbosch under the auspices of the South African DST-NRF Centre for Epidemiological Modelling and Analysis (SACEMA). The course will take place at the Stellenbosch Institute for Advanced Study (StIAS), from 9 am to 5 pm daily. The course will be presented by Dr Roxanne Beauclair, of SACEMA and Data Yarn.

  • The deadline for early registration is 15 May 2018.
  • For participants within South Africa, the course fee is R6000 for early bird registration (payment made by 30 May) and R7000 for later registration (by 15 June).
  • For international participants, the fee is 500 € for early bird registration and 600 € for late registration. (Note: full payment must be processed prior to the start of the course.)
  • The fee includes refreshments, lunches, some social events, and a non-refundable registration fee of R1200, for South African participants, or 110 € for international participants.

The costs of accommodation, breakfast, and dinner are not included. Short-term accommodation is in high demand, so it is best to book early. Useful websites include Airbnb, TripAdvisor, and Sleeping out.

For enquiries, contact the Assistant Director for Training, Gavin Hitchcock, with a copy to Roxanne Beauclair.



Course Overview

R is an open-source statistical software platform that is growing in popularity due to its rapidly expanding number of libraries containing cutting-edge statistical functions, as well as the user-friendly, built-in communication tools in RStudio. Course participants will not only be introduced to the basics of programming in R, but will also learn how to import and clean data, and how to visualise and report results. Specific topics that will be covered include: importing data; reshaping/tidying data; merging/joining datasets; handling dates and transforming numeric and categorical variables; summarising data with plots and tables; and producing reproducible and shareable reports.

This course does NOT cover any form of hypothesis confirmation (e.g. using statistical tests, predictive or causal modelling), or methods used for “big data” -- all data will be small, in-memory datasets.

Lectures will be interwoven with practical exercises. Participants will be encouraged to follow along with exercises and programming on their own laptops.

Participants will learn the basics of:

●      R data types

●      R programming and style

And will acquire the skills to:

●      Import raw data into RStudio

●      Tidy and transform data into a format suitable for analysis

●      Explore data in R using ggplot2 and dplyr (libraries for plotting and summarising data, respectively)

●      Create reproducible, dynamic reports using RMarkdown


Target Audience

The course is suitable for graduate students or professionals who are familiar with statistical analysis and have managed or interacted with datasets using other software platforms. It is assumed that course participants are familiar with programming in another language.


Roxanne Beauclair is a specialist in applying biostatistical methods to epidemiological data. She holds a PhD in this field from Ghent University, and has launched her own statistical consultancy company, Data Yarn, based in Pretoria. She received training in Epidemiology (MPH) from the University of Cape Town in South Africa. She has worked in an analytical capacity on several epidemiological studies of sexual behaviour and HIV in Southern Africa. For her PhD research at Ghent University she studied how age-mixing patterns influence HIV transmission in the South African and Malawian contexts. Over the past few years she has become an R enthusiast and enjoys learning new ways to improve upon statistical programmes by creating clean, reproducible, and legible code.