Introduction to R Programming

The course, "Introduction to R Programming," is structured as a four-day workshop series for participants with little or no prior experience in R. It starts with the basics of R programming, including the use of RStudio, basic data cleaning and manipulation, and the visualization of descriptive statistics. It then progresses to more advanced statistical analyses, including various regression techniques, as well as web scraping, text-as-data methods, and spatial analysis.

The overarching goal is to give participants a robust understanding of R programming techniques and the skills and knowledge they need to use R effectively in their own projects. By the end of the course, participants will have a solid foundation in R programming and will be able to use R to clean, manipulate, visualize, map, and analyze data, as well as to scrape information from the web and analyze text. This not only facilitates independent programming but also broadens the range of what participants can achieve with R.

The PhD course is organized as four workshops of four hours each, divided into lecture-style presentations and hands-on exercise sessions.

Learning objectives

After this course, participants will be able to:

  • Use RStudio effectively for programming tasks
  • Clean and manipulate data efficiently, making use of advanced functions from the tidyverse, including the purrr package
  • Use R Markdown and Quarto to produce reproducible research reports and analyses
  • Create clear and visually appealing plots
  • Conduct and interpret various regression analysis techniques to infer relationships among variables
  • Employ advanced data analysis techniques such as web scraping, text-as-data methodologies, and spatial analysis to derive insights from diverse data sources
  • Extract information from the web and perform text analysis to garner insights from textual data

Course evaluation

This course has a 7-day take-home exam (pass/fail). To pass, participants need to apply the R programming skills learned during the course in an R Markdown document that combines text, code, and output.

Literature

Note: Specific chapters of the above books will be selected.

The content of each day is described in more detail below.

Day 1: Basic Introduction & Analysis

Day 1 of the PhD course consists of a basic introduction to R and RStudio, as well as the data management and basic analyses currently covered in “Metode 1”. This includes loading, merging, and recoding data, descriptive/univariate statistics, bivariate and multivariate correlation analysis, index formation, and plots. All sessions include hands-on exercises in which participants practice applying what they have learned. We will work with R Markdown and Quarto to create dynamic documents that combine text, code, and output.

Day 2: More Advanced Analysis Part I

The second day of the course focuses on more advanced data analysis methods. It covers a first set of methods currently part of “Metode 2”: t-tests, linear regression, confidence intervals/p-values, factor analysis, and plots.

Day 3: More Advanced Analysis Part II

The third day of the course continues with more advanced data analysis methods. It covers further methods currently part of “Metode 2”: logistic regression, fixed effects, interaction/difference-in-differences, transformation of variables, and more advanced visualizations.

Day 4: Advanced R Programming Concepts & Simulation

The fourth day of the course focuses on more advanced tools in R that are not taught at the bachelor’s level and are not necessarily available in Stata. These include, for instance, web scraping, text-as-data methods, and mapping/spatial analysis.