First Steps in R and RStudio

Welcome to First Steps in R and RStudio!

Skip ahead to watching the workshop recordings

About This Workshop Series

This series is intended to be a gentle introduction to using R and RStudio for people who interact with data and want to work in the R statistical programming language. This course is geared towards beginners who are comfortable doing basic tasks with data that comes in rows and columns (for example, organizing data in Excel) but have no programming background.

The workshop will cover how to get started using the R statistical programming language in your work. We’ll talk about how to import data, transform data, and create data visualizations in R. To keep this workshop series short, our scope is limited, and we won’t go into details that are specific to the conduct of research, like accessing the REDCap API, using modeling and machine learning, and applying statistical tests. For that, we are planning a future Skills Series we’re going to call Next Steps in R for Research. For this workshop series, we assume you know what R and RStudio are and have some ideas about why they’re useful. If you don’t know what R or RStudio are, we suggest you view the slides and recordings from Demystifying R and RStudio, or attend the next time that two-session workshop series is offered.

Besides the skills you’ll acquire in writing R code and using the RStudio IDE, you’ll also develop vocabulary useful for describing what you want to accomplish, and that will help you search for resources online or describe your problem to an LLM like CHOP GPT for help creating R code.

Pre-requisites: What you should know:

Before attending this series, you should be able to perform most or all the following skills. If you’re not sure you can, check out our Demystifying R and RStudio Skills Series. The slideshows, which have ample speaker notes, and/or the recordings of the talks (if available at the time you’re viewing this document) will be sufficient to help you acquire these skills. And don’t worry, we won’t quiz you!

  • Be able to describe the difference between R and RStudio
  • Be able to give one advantage for using scripts written in R for data analysis
  • Know a little about how to get access to R and RStudio at CHOP
  • Describe what makes programming “literate” (like a notebook)
  • Explain the real-life consequences of irreproducible research
  • Name one way Quarto documents can be helpful

Pre-requisites: What you should do:

Before attending a workshop session, we suggest that you do the following. It will make your experience of the workshop series smoother. If you don’t get a chance to do this before attending a workshop, you will have time to do it during the session, but we won’t necessarily be able to stop our presentation to help you if you get stuck.

  • Create a free Posit.cloud account. We will use this as our training environment and you will have continued access to your code and materials after the workshop, through your account at Posit.cloud. Don’t use this for any patient or other CHOP data, though!
  • If you haven’t already, please consider joining CHOP’s R User Group. It’s not necessary for the workshops but you might find it useful or even fun.

We suggest requesting these programs be installed on your CHOP device(s):

  • R – the language we use to clean, analyze, and visualize data
  • RStudio Desktop – an IDE for writing R
  • Git – version control software that will allow you to easily get the latest version of our course materials and will also be helpful for tracking changes in your own projects
  • GitHub Desktop – a helper, or “client” software that makes working with Git easier

Even though all of these software are free, you’ll need a Cost Center (or grant fund) to add to your request. Get that from your manager, administrative staff, or other leadership within your area. There will be no charge, but DTS uses this information for tracking resource utilization.

You’ll also need the MAC address of the device you need the software installed to.

Having R, RStudio, Git, and GitHub installed locally on your CHOP-issued device is not the only way to work with R and RStudio, but it can be the most convenient, and will be compliant with the constraints of working with real CHOP data. You won’t want to rely on RStudio on your personal computing device or on the cloud when it comes to working with real CHOP data!

On the day of your workshop

We suggest the following for virtual webinars:

  • If available to you, use two monitors (or another two-screen setup such as a laptop and a tablet or two laptops). This Skills Series is hands-on, so you will want to have extra space for working on code while also looking at slides or the chat window.

Workshop Sessions

Material in later sessions does build on work done in earlier sessions, so do watch them in order.

Recordings:

Sessions 3-5 not available yet, keep an eye on this space!

Session 1: Review and Setup

Watch the trimmed video: http://youtu.be/5rsAU9e3rHg or click below (and look below the video for useful links that will allow you to do the hands-on participation!)

Useful links to open while you watch this video:

Session 1 Content:

  • Quick review of R and RStudio
  • R Markdown and Quarto: methods for “literate statistical programming”
  • Posit.cloud: our environment for this course
  • Git and GitHub: Out of scope but very useful!
  • Getting R and RStudio at CHOP

Session 1 Goals:

  • Use Source and Visual views in RStudio to learn about markdown
  • Create a new code chunk in a Quarto document
  • Run a code chunk in a Quarto document

Session 2: Projects and File Ingestion

Watch the trimmed video: https://youtu.be/tHPbt6XNjck or click below (and look below the video for useful links that will allow you to do the hands-on participation!)

Useful links to open while you watch this video:

Session 2 Content:

  • File systems can be challenging to navigate
  • Projects in RStudio
  • Installing and loading packages
  • Tabular data ingestion from .csv files

Session 2 Goals:

  • Be able to explain when to use install.packages() and when to use library()
  • Ingest data from a .csv into a data frame and examine it
  • Render a quarto document to an output format (html)

Slides: https://arcus.github.io/first_steps_in_r_rstudio_skills_series/session_2.html


Session 3: Exploring Data Visually, Using ggplot2

  • ggplot2 syntax
  • Mapping Aesthetics
  • Setting Visuals
  • Color Palettes

Goals:

  • Describe what an “aesthetic mapping” is in ggplot2
  • Create a simple ggplot2 data visualization
  • Add a label (like a title or an x-axis label) to a plot in ggplot2

Slides: https://arcus.github.io/first_steps_in_r_rstudio_skills_series/session_3.html


Session 4: Selecting Data Using dplyr

  • Selecting columns
  • Filtering rows
  • Creating new columns

Goals:

  • Describe what a “factor” variable is in R and why it’s important to use it
  • Use “select” and “filter” to subset data
  • Use “group by” and “summarize” to get group-level statistics

Slides: https://arcus.github.io/first_steps_in_r_rstudio_skills_series/session_4.html


Session 5: Putting it All Together: Communicating

  • Create a new document
  • Explore a question
  • Create a visualization
  • Next steps for using R/RStudio

Goals:

  • Use text and markdown to describe analysis tasks in an organized, attractive document
  • Find useful examples from previous code and apply them to current work
  • Work with error messages and help files successfully

License

All of the material in the First Steps in R and RStudio GitHub repository is copyrighted under the Creative Commons BY-SA 4.0 copyright to make the material easy to reuse. We encourage you to reuse it and adapt it for your own teaching as you like!