Arcus Education provides “Skills Series” for the entire CHOP community.
This Skills Series is a short, 2-session series aimed at demystifying R and RStudio!
Session 1: Introduction to R/RStudio
Session 2: Introduction to Literate Statistical Programming
Session 1 Itinerary
Introduction to R/RStudio
R is a programming language created for statistical data analysis
Why scripts? Reproducibility and open source data science
RStudio is one way to work with R
Considerations for working with R and RStudio at CHOP
Posit.cloud
Goals:
Be able to describe the difference between R and RStudio
Be able to give one advantage for using scripts written in R for data analysis
Have a concrete next step for obtaining access to R and RStudio at CHOP
R is a Programming Language
R is a programming language. This is what it looks like:
# Ingest data from REDCap
arcus_101_feedback_token <- readr::read_file("secrets/quick_arcus_101_feedback_token.txt")
arcus_101_feedback <- get_data(arcus_101_feedback_token)

# Get raw data and add the labels back in the correct order, show change over time
arcus_101_feedback_updated <- arcus_101_feedback %>%
  # We don't need the "completeness" value
  select(-arcus_101_effectiveness_complete) %>%
  # Transform all the "knowledge" questions
  mutate(across(starts_with("knowledge"),
                ~ factor(.x, levels = c("Very little knowledge",
                                        "Some knowledge",
                                        "Lots of knowledge",
                                        "Expert"))),
         # Transform all the "opinion" questions (pre)
         opinion_pre = factor(opinion_pre, levels = c(
           "Largely negative, I didn't think Arcus was useful or helpful to CHOP.",
           "Somewhat negative, I had doubts about how useful or helpful Arcus was to CHOP.",
           "Neutral, I didn't have a strong opinion.",
           "Somewhat positive, I believed that Arcus was useful or helpful to CHOP.",
           "Largely positive, I was certain that Arcus was useful or helpful to CHOP.")),
         # Transform all the "opinion" questions (post)
         opinion_post = factor(opinion_post, levels = c(
           "Largely negative, I don't think Arcus is useful or helpful to CHOP.",
           "Somewhat negative, I have doubts about how useful or helpful Arcus is to CHOP.",
           "Neutral, I don't have a strong opinion.",
           "Somewhat positive, I believe that Arcus is useful or helpful to CHOP.",
           "Largely positive, I am certain that Arcus is useful or helpful to CHOP.")),
         # Measure change (pre to post)
         knowledge_change = as.numeric(knowledge_post) - as.numeric(knowledge_pre),
         opinion_change = as.numeric(opinion_post) - as.numeric(opinion_pre),
  )

# Make a bar chart showing pre-intervention knowledge
ggplot(arcus_101_feedback_updated) +
  geom_bar(aes(x = knowledge_pre)) +
  scale_x_discrete(drop = FALSE) +
  labs(title = "Knowledge of Arcus Before 101") +
  xlab("")

# Save this graph for later
ggsave("figures/pre_101_knowledge.png")
R is a Programming Language
R is a statistical programming language.
Like other programming languages (JavaScript, Python, C++):
R has specific syntax rules
R gives error messages that you might have to search online for
R has online communities that can help you learn (Stack Overflow, etc.)
Unlike other programming languages:
R was written specifically for statistical data analysis
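As a small illustration of that statistical focus, the sketch below uses only functions and a dataset (mtcars) that ship with R itself, so no add-on packages are assumed. The choice of mtcars and of a t-test is ours, for illustration only, and is not taken from the course materials. The last few lines show one way to capture an error message as text, which can be handy when you need to search for it online.

```r
# mtcars is a small example dataset bundled with every R installation.
# Descriptive statistics are one function call away.
mean_mpg <- mean(mtcars$mpg)
sd_mpg <- sd(mtcars$mpg)
print(c(mean = mean_mpg, sd = sd_mpg))

# A two-sample t-test comparing fuel economy for automatic (am = 0)
# and manual (am = 1) transmissions, written with R's formula syntax.
mpg_test <- t.test(mpg ~ am, data = mtcars)
print(mpg_test)

# R has syntax and type rules: log() expects a number, not text.
# tryCatch() captures the error message instead of stopping the script.
error_text <- tryCatch(
  log("ten"),
  error = function(e) conditionMessage(e)
)
print(error_text)
```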
Why Does This Matter?
Which is a better tool?
A multi-tool (like a Swiss Army knife)
A mostly mono-task tool (like a cherry pitter)
It depends! R is more focused / narrow… which can be good for beginners.
“Stainless 2CR Multi-tool”, Santeri Viinamäki, CC BY-SA 4.0, via Wikimedia Commons
Why Not Just Use Excel?
“Why even write code? Point and click is so much easier!”
These can be useful:
Excel
Point and click statistical analysis software (e.g. SPSS, SAS)
But they can also be:
Very manual / lots of steps you have to explain
Costly
One potential answer? Scripts!
Used with permission by Ed Himelblau. See his work or subscribe to his newsletter at https://www.himelblau.com/
Scripts
In data analysis, scripts are a series of computer code instructions that handle things like:
Ingesting data
Preparing data
Doing descriptive statistics
Conducting statistical tests
Creating models
Saving interim datasets
Creating data visualizations
Communicating information
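The steps above can be sketched as one short script. This is a minimal illustration under stated assumptions, not a recommended analysis: it uses only base R, the built-in mtcars dataset stands in for real data, and every file name and variable name (cars, transmission, cars_prepared.csv, and so on) is hypothetical.

```r
# 1. Ingest data. A real script would read a file you were given; here the
#    built-in mtcars data is written to a temporary CSV so the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)
cars <- read.csv(csv_path)

# 2. Prepare data: turn the 0/1 transmission code into a labeled factor.
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("Automatic", "Manual"))

# 3. Descriptive statistics: mean miles per gallon by transmission type.
mpg_by_transmission <- aggregate(mpg ~ transmission, data = cars, FUN = mean)
print(mpg_by_transmission)

# 4. Statistical test: do the two groups differ in fuel economy?
mpg_test <- t.test(mpg ~ transmission, data = cars)

# 5. Model: fuel economy as a function of weight and transmission.
mpg_model <- lm(mpg ~ wt + transmission, data = cars)

# 6. Save an interim dataset so later steps can start from prepared data.
prepared_path <- file.path(tempdir(), "cars_prepared.csv")
write.csv(cars, prepared_path, row.names = FALSE)

# 7. Create a data visualization and save it to a file.
plot_path <- file.path(tempdir(), "mpg_by_transmission.pdf")
pdf(plot_path)
boxplot(mpg ~ transmission, data = cars, ylab = "Miles per gallon")
dev.off()

# 8. Communicate: print a one-line summary of the test result.
cat("t-test p-value:", format.pval(mpg_test$p.value, digits = 3), "\n")
```

Because every step is written down, anyone with the same input file can rerun the script from top to bottom and get the same outputs.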
Why Scripts?
In science, we’ve been hearing a lot about the “reproducibility crisis”.
It’s hard to re-do other people’s analyses… both for checking their work and for trying it in a new situation. This is bad for science!
One of the most important reasons to learn R is to improve the reproducibility of your work. Working in R makes it straightforward to produce reproducible data analyses, which reduces risk and makes “future you” much happier.
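One small, concrete piece of reproducibility is making sure that steps involving randomness give the same answer every time the script runs. The sketch below is our own illustration (the seed value 2024 and the variable names are arbitrary) and covers only this one aspect of reproducible analysis.

```r
# Draw 5 random numbers between 1 and 100, twice,
# resetting the random seed before each draw.
set.seed(2024)
first_draw <- sample(1:100, size = 5)

set.seed(2024)
second_draw <- sample(1:100, size = 5)

# Same seed and same code give the same result.
print(identical(first_draw, second_draw))

# Recording the R version documents the environment the analysis ran in.
print(R.version.string)
```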
R vs. RStudio
R: Programming language for data analysis
RStudio: Integrated development environment (IDE)
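One way to see the distinction from inside a session is to ask R where it is running. The sketch below is illustrative and rests on an assumption about RStudio's behavior: that it sets an environment variable named RSTUDIO to "1" for the R sessions it starts. The variable names r_version and running_in_rstudio are ours.

```r
# The R language reports its own version wherever it runs.
r_version <- R.version.string
print(r_version)

# Assumption: RStudio sets the RSTUDIO environment variable to "1" for
# R sessions it launches. In the plain R app or a terminal, it is normally empty.
running_in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")

if (running_in_rstudio) {
  message("This R session was started by RStudio.")
} else {
  message("This R session is running outside RStudio.")
}
```

Either way, the same R code runs; RStudio only changes the surroundings you work in.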
Using R Alone vs With RStudio
The R App
RStudio
RStudio: Runs Lots of Places
Posit.cloud: Hosted by Posit (in the cloud)
Posit Workbench: Hosted by a company, on prem or in the cloud
RStudio Desktop: Installed on your computer
Working with R at CHOP
We work with regulated data
IRB protocols and other regulations might override what I say here!
You can work with R and RStudio on a CHOP device
You will probably have to request an install via a service ticket
You’ll need a cost center / grant / project number (even though there’s no cost)
Yes, this software has been used at CHOP before
You’ll need to give a reason (“I need to analyze data for my job…”)
Git – version control software that will allow you to easily get the latest version of our course materials and will also be helpful for tracking changes in your own projects
GitHub Desktop – a helper, or “client” software that makes working with Git easier
Researchers ONLY at CHOP
(You’ll need a research cost center to refer to for most of these)
This module provides learners with an approachable introduction to the concepts and impact of research reproducibility, generalizability, and data reuse, and how technical approaches can help make these goals more attainable.
60 min
Acknowledgements
R User Group leadership, especially Stephan Kadauke
Former learners at CHOP, Penn, Drexel, University of Botswana
DART study participants and pilots around the world