Welcome

These slides are available at: https://arcus.github.io/first_steps_in_r_rstudio_skills_series/session_2.html

  • Use keyboard arrow keys to
    • advance ( → ) and
    • go back ( ← )
  • Type “s” to see speaker notes
  • Type “?” to see other keyboard shortcuts

About Arcus / Your Presenter

Arcus is an initiative by the Research Institute aimed at promoting data discovery and reuse and increasing research reproducibility.

Among the many teams in Arcus, I represent Arcus Education!

Arcus website which displays tools and services including cohort discovery, arcus archives, data catalog, education and training, scientific projects, and clinical data query.

A circular diagram showing five project phases, including Discover: explore available data, Plan: plan your research project, Collect: receive your data, Analyze: analyze data in an Arcus lab, and Share: contribute data to archives.

Arcus Education

Education website page that includes three sections titled Getting Started with Arcus, Learn Data Science Skills, and Curate Your Own Learning Experience.

Arcus Education provides data science training to researchers …

(and often this is useful to non-researchers too!).

https://arcus.chop.edu/i-want-to/arcus-education

Email us!

First Steps in R and RStudio

Arcus Education provides “Skills Series” for the entire CHOP community.

This Skills Series is a 5-session series aimed at helping you take your first steps in R and RStudio!

  • Session 1: Review and Setup
  • Session 2: Projects and File Ingestion
  • Session 3: Exploring Data Visually, Using ggplot2
  • Session 4: Selecting Data Using dplyr
  • Session 5: Putting it All Together: Communicating

Session 2 Itinerary

Projects and File Ingestion

  • File systems can be challenging to navigate
  • Projects in RStudio
  • Installing and loading packages
  • Tabular data ingestion from .csv files
  • Functions in R

Goals:

  • Be able to explain when to use install.packages() and when to use library()
  • Ingest data from a .csv and look at it
  • Render a Quarto document to an output format (html)

Posit.cloud (for learning)

https://posit.cloud is a great place for learning or practice with public (NOT CHOP!) datasets.

Please open your First Steps in R and RStudio Exercises project in Posit.cloud now.

If you did not already set up a Posit.cloud account with the exercise files in the first session of this Series, please do the following, now:

Updating From Git

  • I might update the exercise files!
  • This means you should save any file you plan to change under a new name (like session_2_exercise_janedoe.qmd)
  • Then if I update session_2_exercise.qmd, you can get the new version without overwriting your “janedoe” version
  • Go to Git and choose “Pull Branches” to get any updates

Git menu with Pull Branches indicated.

Where are your files?

  • Knowing where your files are can be tricky
  • RStudio / Posit.cloud “Projects” can help
  • Projects are directories that hold analysis scripts, data, and project info close together

In Posit.cloud, “New Project” is a big blue button.

Posit.cloud workspace with the New Project button indicated.

In RStudio Desktop on your computer, you have to go to the File menu and choose “New Project”.

RStudio Desktop menu on a Mac, showing File menu with New Project option indicated.

Advantages to using Projects

  • Keeping track of your files gets easier
  • Projects allow you to keep your various efforts separated (wait, which “my_data” is this?)
  • You can have multiple RStudio sessions open on your computer, one per project, without them interfering with each other.

Importing Data

  • Importing / Ingesting data is the first step to analyzing it!
  • You can use “base R” (the factory settings) to ingest data
  • But we suggest using an add-on package called tidyverse instead.

Lots of Ways to Ingest Data

Data can be ingested into R from lots of sources:

  • SQL Databases
  • REDCap
  • API Endpoints (Census Bureau, NYT, PubMed)
  • Data exported from SAS / SPSS / Stata
  • .json, .csv, .xlsx, .tsv, .txt, .arff files
  • and much much more!

CSV

We’re supplying you with .csv data.

A text file with many numeric and text values separated by commas, with no spaces between the fields, and each row being a new data entry.
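
To make that description concrete, here is a small sketch of what a .csv looks like as plain text. The values and the names csv_lines, csv_path, raw_text, and header_fields are made up for illustration; this is not the exercise data.

```r
# Three lines of made-up CSV text: a header row, then one row per data entry
csv_lines <- c("name,visits,clinic",
               "Ana,3,Cardiology",
               "Ben,1,Neurology")

# Write the text to a temporary file, then read it back as raw lines
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)
raw_text <- readLines(csv_path)

# Show the file exactly as it is stored: fields separated by commas
cat(raw_text, sep = "\n")

# Splitting the first line on commas gives the column names
header_fields <- strsplit(raw_text[1], ",")[[1]]
header_fields
```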

Tidyverse

  • A consistent way to organize data
  • Human readable, concise, consistent code
  • Build pipelines from atomic data analysis steps

Tidyverse logo.

Installing a Package

(You probably did this already!)

  • Look in “Files” tab.
  • Go into the “solutions” folder
  • Click on “setup.qmd”.
  • Run the only code chunk there (green triangle “play” button)

setup.qmd script with the code chunk indicated, an arrow pointing to the execute code chunk button.

Installing and Loading Packages

The word tidyverse followed by the depiction of a package containing 3 help files, 3 data sets, and 4 functions.

install.packages("tidyverse") downloads and installs the package (do once)

library(tidyverse) loads the package (do once per session)
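
One way to picture the "install once, load every session" pattern is sketched below. The helper is_installed() is a hypothetical name introduced here for illustration. It is not part of the tidyverse, and setup.qmd is not known to use it. It relies on base R's requireNamespace(), which reports whether a package is available.

```r
# Hypothetical helper: TRUE if the named package is already installed
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (!is_installed("tidyverse")) {
  # Installing is a one-time step and needs an internet connection
  message("tidyverse is not installed yet: run install.packages(\"tidyverse\")")
} else {
  # Loading is needed once in every new R session
  library(tidyverse)
}
```

If you see the message, run install.packages("tidyverse") once, then run the chunk again.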

read_csv()

data_frame <- read_csv(file_name)

  • read_csv() ingests a file, creating an object that exists in your R environment
  • You have to ingest (import, bring in, other synonym…) data into the R environment to work with it

A depiction of a CSV file being transformed into rows and columns of data.
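
A minimal sketch of the ingest-then-look workflow follows. It uses a temporary, made-up CSV rather than the exercise data, and the names csv_path and visits are hypothetical. The fallback to base R's read.csv() is only there so the example still runs when readr (part of the tidyverse) is not installed.

```r
# Create a small made-up CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("patient_id,visit_date,weight_kg",
             "101,2023-01-15,12.4",
             "102,2023-02-03,15.1"),
           csv_path)

# Ingest the file into an object in the R environment
if (requireNamespace("readr", quietly = TRUE)) {
  visits <- readr::read_csv(csv_path)   # tidyverse way
} else {
  visits <- read.csv(csv_path)          # base R fallback
}

# Look at it
print(visits)    # the data itself
nrow(visits)     # number of rows
names(visits)    # column names
```

After this runs, visits also appears in the Environment pane, where you can click it to view the data.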

Functions

data_frame <- read_csv(file_name)

  • read_csv is the function name
  • file_name is an argument passed to the function.
  • data_frame is a named object that will receive the output of the function.
  • <- is the assignment operator; it assigns what’s on the right to the named object on the left
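
The same anatomy applies to any function call. As a small sketch, here it is with base R's round(), which needs no packages; the object name rounded_value is made up for illustration.

```r
# round          -> the function name
# 3.14159        -> an argument: the number to round
# digits = 2     -> a second argument, passed by name
# rounded_value  -> the named object that receives the output
# <-             -> the assignment operator
rounded_value <- round(3.14159, digits = 2)

# Typing the object's name shows what it holds
rounded_value
```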

Hands-on

File browser with exercises folder indicated. Exercises folder with session_2_exercise.qmd file indicated.

  • Go into your First Steps in R and RStudio Exercises project in Posit.cloud
  • Go into the Files tab in the lower right pane
  • Find the “Exercises” folder
  • Click on “session_2_exercise.qmd”.
  • Read through that file and complete the exercises!
  • We’ll give you about 12 minutes to complete this.

Bonus Content: File Paths – the “where”

  • A few tips:
    • / means “go into a child directory” (\ in Windows)
    • / as the first symbol means “start at the root”
    • . means “this directory”
    • .. means “the parent directory of this directory”
    • ~ means “my home directory”
  • Relative path – “directions from here”
  • Absolute path – “directions from anywhere”
  • Working directory – R’s “starting place”
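
The tips above can be tried out with a few base R functions, as in the sketch below. The folder and file names ("data", "my_data.csv") are hypothetical and do not need to exist for the code to run.

```r
# Working directory: R's "starting place" for relative paths
getwd()

# Build a relative path from a folder name and a file name;
# file.path() joins the pieces with "/"
relative_path <- file.path("data", "my_data.csv")
relative_path

# "." means "this directory"; normalizePath() turns it into an absolute path
absolute_here <- normalizePath(".")
absolute_here

# "~" expands to your home directory
path.expand("~")
```

In a Project, the working directory starts at the project folder, so relative paths like the one above work the same way wherever the project is opened.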

Great module on Directories and File Paths

Recap

  • Functions (argument, input, output, objects)
  • Working with code chunks
  • Ingesting data from a .csv
  • Working with the environment pane
  • Learning about help in R
  • Naming things
  • Rendering

Q&A / Was This Effective?

In our team, we like to measure our effectiveness.

Goals:

  • Be able to explain when to use install.packages() and when to use library()
  • Ingest data from a .csv and look at it
  • Render a Quarto document to an output format (html)

Next Session

Exploring Data Visually, Using ggplot2

  • ggplot2 syntax
  • Mapping Aesthetics
  • Setting Visuals
  • Color Palettes