These slides are available at: https://arcus.github.io/first_steps_in_r_rstudio_skills_series/session_2.html
Arcus is an initiative by the Research Institute aimed at promoting data discovery and reuse and increasing research reproducibility.
Among the many teams in Arcus, I represent Arcus Education!
Arcus Education provides data science training to researchers …
(and often this is useful to non-researchers too!).
https://arcus.chop.edu/i-want-to/arcus-education
Email us! arcus-education@chop.edu
Arcus Education provides “Skills Series” for the entire CHOP community.
This Skills Series is a 5-session series aimed at helping you take your first steps in R and RStudio!
Exploring Data Visually, Using ggplot2
Goals:
Pass the data frame to ggplot() as an argument:
ggplot(data = cirrhosis_study)
Untidy
Team | Measure 1 pre | Measure 1 post | Measure 2 pre | Measure 2 post | Measure 3 pre | Measure 3 post | Measure 4 pre | Measure 4 post |
---|---|---|---|---|---|---|---|---|
Team 1 (n=6m, 14f) | | | | | | | | |
Team 2 (n=12m, 8f) | | | | | | | | |
Team 3 (n=10m, 10f) | | | | | | | | |
Team 4 (n=5m, 15f) | | | | | | | | |
Tidy
Team | Intervention Stage | N Males | N Females | Measure 1 | Measure 2 | Measure 3 | Measure 4 |
---|---|---|---|---|---|---|---|
Team 1 | Pre | 6 | 14 | | | | |
Team 1 | Post | 6 | 14 | | | | |
Team 2 | Pre | 12 | 8 | | | | |
Team 2 | Post | 12 | 8 | | | | |
Team 3 | Pre | 10 | 10 | | | | |
Team 3 | Post | 10 | 10 | | | | |
Team 4 | Pre | 5 | 15 | | | | |
Team 4 | Post | 5 | 15 | | | | |
ID_Arm | N_Days | Status | Age | Sex | AHS Status |
---|---|---|---|---|---|
1_D | 400 | D | 21464 | F | Y/Y/Y |
2_D | 4500 | C | 20617 | F | N/Y/Y |
3_D | 1012 | D | 25594 | M | N/N/N |
4_D | 1925 | D | 19994 | F | N/Y/Y |
5_P | 1504 | CL | 13918 | F | N/Y/Y |
6_P | 2503 | D | 24201 | F | N/Y/N |
7_P | 1832 | C | 20284 | F | N/Y/N |
8_P | 2466 | D | 19379 | F | N/N/N |
9_D | 2400 | D | 15526 | F | N/N/Y |
10_P | 51 | D | 25772 | F | Y/N/Y |
11_P | 3762 | D | 19619 | F | N/Y/Y |
12_P | 304 | D | 21600 | F | N/N/Y |
13_P | 3577 | C | 16688 | F | N/N/N |
14_P | 1217 | D | 20535 | M | Y/Y/N |
15_D | 3584 | D | 23612 | F | N/N/N |
16_P | 3672 | C | 14772 | F | N/N/N |
17_P | 769 | D | 19060 | F | N/Y/N |
18_D | 131 | D | 19698 | F | N/Y/Y |
ID | N_Days | Status | Drug | Age | Sex | Ascites | Hepatomegaly | Spiders |
---|---|---|---|---|---|---|---|---|
1 | 400 | D | D-penicillamine | 21464 | F | Y | Y | Y |
2 | 4500 | C | D-penicillamine | 20617 | F | N | Y | Y |
3 | 1012 | D | D-penicillamine | 25594 | M | N | N | N |
4 | 1925 | D | D-penicillamine | 19994 | F | N | Y | Y |
5 | 1504 | CL | Placebo | 13918 | F | N | Y | Y |
6 | 2503 | D | Placebo | 24201 | F | N | Y | N |
7 | 1832 | C | Placebo | 20284 | F | N | Y | N |
8 | 2466 | D | Placebo | 19379 | F | N | N | N |
9 | 2400 | D | D-penicillamine | 15526 | F | N | N | Y |
10 | 51 | D | Placebo | 25772 | F | Y | N | Y |
11 | 3762 | D | Placebo | 19619 | F | N | Y | Y |
12 | 304 | D | Placebo | 21600 | F | N | N | Y |
13 | 3577 | C | Placebo | 16688 | F | N | N | N |
14 | 1217 | D | Placebo | 20535 | M | Y | Y | N |
15 | 3584 | D | D-penicillamine | 23612 | F | N | N | N |
16 | 3672 | C | Placebo | 14772 | F | N | N | N |
17 | 769 | D | Placebo | 19060 | F | N | Y | N |
18 | 131 | D | D-penicillamine | 19698 | F | N | Y | Y |
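The step from the first cirrhosis table to the second can be sketched in code. This is only one possible approach, assuming the tidyr package is installed; the slides do not say how the tidy version was produced. The data frame names `untidy` and `tidy` are made up for illustration, and the `AHS Status` column is written as `AHS_Status` to avoid a space in the name. Only three rows from the table are typed in.

```r
library(tidyr)

# Rows 1, 2 and 5 of the untidy table, typed in by hand.
# "AHS_Status" stands in for the "AHS Status" column (renamed for convenience).
untidy <- data.frame(
  ID_Arm     = c("1_D", "2_D", "5_P"),
  N_Days     = c(400, 4500, 1504),
  Status     = c("D", "C", "CL"),
  Age        = c(21464, 20617, 13918),
  Sex        = c("F", "F", "F"),
  AHS_Status = c("Y/Y/Y", "N/Y/Y", "N/Y/Y")
)

# Split the combined ID/arm column at the underscore; convert = TRUE turns ID into a number
tidy <- separate(untidy, ID_Arm, into = c("ID", "Arm"), sep = "_", convert = TRUE)

# Split the three slash-separated findings into one column each
tidy <- separate(tidy, AHS_Status,
                 into = c("Ascites", "Hepatomegaly", "Spiders"), sep = "/")

# Spell out the arm code as a drug name, matching the tidy table (D vs. P)
tidy$Drug <- ifelse(tidy$Arm == "D", "D-penicillamine", "Placebo")
tidy$Arm <- NULL

tidy
```

After this, each column holds one kind of value, so a column such as Spiders can be handed straight to a plot.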
Pass the data frame to ggplot() as an argument:
ggplot(data = cirrhosis_study)
There are lots of ways to depict data geometrically:
geom_histogram()
geom_dotplot()
geom_bar()
geom_boxplot()
geom_point()
geom_line()
Pass the data frame to ggplot() as an argument:
ggplot(data = cirrhosis_study) +
  geom_histogram()
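As a rough sketch of how the choice of geom changes the picture, the same starting plot can be given two different geom layers. The small data frame `cirrhosis_sample` is a stand-in built from the seven Spiders and Albumin values in this deck's sample table, not the full `cirrhosis_study` data, and the object names are made up for illustration. A geom also needs to know which columns to draw, which is what the `aes()` mapping supplies.

```r
library(ggplot2)

# Seven Spiders/Albumin values copied from this deck's sample table
cirrhosis_sample <- data.frame(
  Spiders = c("Y", "Y", "N", "Y", "Y", "N", "N"),
  Albumin = c(2.6, 4.14, 3.48, 2.54, 3.53, 3.98, 4.09)
)

# Shared starting point: the data plus which column goes on which axis
base_plot <- ggplot(data = cirrhosis_sample,
                    mapping = aes(x = Spiders, y = Albumin))

# Same data, two different geometric depictions
box_version   <- base_plot + geom_boxplot()
point_version <- base_plot + geom_point()

box_version
point_version
```

Swapping the geom is the only change between the two plots; the data and the mapping stay the same.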
Aesthetic mappings connect columns to visible attributes.
Spiders | Cholesterol | Albumin |
---|---|---|
Y | 261 | 2.6 |
Y | 302 | 4.14 |
N | 176 | 3.48 |
Y | 244 | 2.54 |
Y | 279 | 3.53 |
N | 248 | 3.98 |
N | 322 | 4.09 |
Spiders: •• (Color)
Cholesterol: ↔︎ (X axis)
Albumin: ↕ (Y axis)
mapping = aes(x = Cholesterol,
              y = Albumin,
              color = Spiders)
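Put together with a data frame and a geom, that mapping might be used as in the sketch below. The data frame `cirrhosis_sample` is an assumption: it holds only the seven rows shown in the table above, typed in by hand, rather than the full `cirrhosis_study` data. The choice of `geom_point()` is also just one option for showing two numeric columns.

```r
library(ggplot2)

# The seven rows from the sample table above
cirrhosis_sample <- data.frame(
  Spiders     = c("Y", "Y", "N", "Y", "Y", "N", "N"),
  Cholesterol = c(261, 302, 176, 244, 279, 248, 322),
  Albumin     = c(2.6, 4.14, 3.48, 2.54, 3.53, 3.98, 4.09)
)

# Cholesterol sets the x position, Albumin the y position, Spiders the color
scatter <- ggplot(data = cirrhosis_sample,
                  mapping = aes(x = Cholesterol,
                                y = Albumin,
                                color = Spiders)) +
  geom_point()

scatter
```

Each row of the data frame becomes one point, and ggplot2 adds a color legend for Spiders on its own.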
In addition to x/y position and color, what other aesthetic mappings can you think of?
(Hint: things that don’t change when the data changes, like the background color of a graph or the font or title of a graph, aren’t mappings).
Type your answers in the chat!
Pass the data frame to ggplot() as an argument:
ggplot(data = cirrhosis_study) +
  geom_histogram(aes(x = Cholesterol))
(You can put the aes() mapping in either of the two lines above…)
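A small sketch of the two placements, for comparison. The data frame `cirrhosis_sample` is a stand-in holding the seven Cholesterol values from this deck's sample table, and `bins = 5` is an arbitrary choice added here so that ggplot2 does not fall back to its default of 30 bins on such a small sample.

```r
library(ggplot2)

# Seven Cholesterol values copied from this deck's sample table
cirrhosis_sample <- data.frame(
  Cholesterol = c(261, 302, 176, 244, 279, 248, 322)
)

# Option 1: the mapping goes in ggplot(), so every geom added later inherits it
in_ggplot <- ggplot(data = cirrhosis_sample,
                    mapping = aes(x = Cholesterol)) +
  geom_histogram(bins = 5)

# Option 2: the mapping goes in the geom, so it applies to that layer only
in_geom <- ggplot(data = cirrhosis_sample) +
  geom_histogram(mapping = aes(x = Cholesterol), bins = 5)

in_ggplot
in_geom
```

With a single geom the two plots come out the same. The difference starts to matter once a plot has several geoms that should or should not share a mapping.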
Today, you:
That’s a lot! Give yourselves a round of applause.
We like to measure our effectiveness (and analyze it in R!)
Goals:
Selecting Data Using dplyr
Arcus Education, Children’s Hospital of Philadelphia