Try out our self-paced learning modules!

Our modules are designed to be flexible and bite-sized, so you can use your time efficiently and pick just the topics you want to study. Each module has learning objectives and prerequisites clearly listed on the first page, to help you decide which ones you want to work on.

That said, many learners prefer to start with a curated list of suggestions, rather than browsing through the modules on their own. If you’re trying to figure out how to get started with DART, see if one of the suggested pathways below sounds like a good fit!

Pathway 1: Getting started with biomedical data science

This popular pathway is designed with new data scientists in mind. You might be early in your research career, or you might have years of experience but be trying out data science techniques for the first time.

This pathway provides a practical overview of what skills you’ll need to do reproducible, rigorous data science research in biomedical and health fields. We’ll touch on a lot of the hot topic techniques you may have heard about (what exactly are large language models?) and help you cut through the hype to figure out whether those are tools you want to invest time in learning.

If you’re at the point where you know you’re interested in biomedical data science but aren’t sure where to start, this is the pathway for you!

Getting started with biomedical data science: Modules

Order	Module	Description	Estimated Time
1	Reproducibility, Generalizability, and Reuse	This module provides learners with an approachable introduction to the concepts and impact of research reproducibility, generalizability, and data reuse, and how technical approaches can help make these goals more attainable.	60 min
2	How to Troubleshoot	Learning to use technical methods like coding and version control in your research inevitably means running into problems. Learn practical methods for troubleshooting and moving past error codes and other difficulties.	30 min
3	Learning to Learn Data Science	Discover how learning data science is different than learning other subjects.	20 min
4	Demystifying Geospatial Data	This module is a brief introduction to geospatial (location) data.	15 min
5	Omics Orientation	This module provides a brief introduction to omics and its associated fields.	15 min
6	Demystifying SQL	SQL is a language for working with relational databases that has been around for decades. Learn more about this technology at a high level, without having to write code.	40 min
7	Demystifying Machine Learning	An approachable and practical introduction to machine learning for biomedical researchers.	60 min
8	Demystifying Large Language Models	Learn about large language models (LLMs) like ChatGPT.	60 min
9	Directories and File Paths	In this module, learners will explore what a directory is and how to describe the location of a file using its file path.	15 min
10	Demystifying the Command Line Interface	Understand what the command line interface is and why it's useful!	15 min
11	Demystifying Python	This module introduces the Python programming language, explores why Python is useful in research, and describes how to download Python and Jupyter.	20 min
12	Demystifying Regular Expressions	Learn about pattern matching using regular expressions, or regex.	30 min
13	Citizen Science	This is an overview of citizen science for biomedical researchers.	45 min
14	Demystifying Containers	Containers can be a useful tool for reproducible workflows and collaboration. This module describes what containers are, why a researcher might want to use them, and what your options are for implementation.	20 min
15	Intro to Version Control	An introduction to what version control systems do and why you might want to use one.	15 min
16	Research Data Management Basics	Learn the basics about research data management.	40 min

Pathway 2: Focus on omics

This pathway is for people who want to start working with molecular data. It will bring you up to speed in the computing tools you’ll need to get started with genomics research, including using the command line, version control, and containerization. No computing background is assumed; we’ll start from the basics! Note that if you’re already actively working on genomics analysis, this material will likely be too basic for you.

Focus on omics: Modules

Order	Module	Description	Estimated Time
1	Reproducibility, Generalizability, and Reuse	This module provides learners with an approachable introduction to the concepts and impact of research reproducibility, generalizability, and data reuse, and how technical approaches can help make these goals more attainable.	60 min
2	How to Troubleshoot	Learning to use technical methods like coding and version control in your research inevitably means running into problems. Learn practical methods for troubleshooting and moving past error codes and other difficulties.	30 min
3	Directories and File Paths	In this module, learners will explore what a directory is and how to describe the location of a file using its file path.	15 min
4	Research Data Management Basics	Learn the basics about research data management.	40 min
5	Demystifying the Command Line Interface	Understand what the command line interface is and why it's useful!	15 min
6	Bash / Command Line 101	This course teaches learners to navigate their computer, as well as view and edit files, from the command line using Bash.	40 min
7	Bash: Searching and Organizing Files	This module will teach you how to use the bash shell to search and organize your files.	30 min
8	Bash: Combining Commands	This module will teach you how to combine two or more Bash commands to create more complicated pipelines.	30 min
9	Bash: Conditionals and Loops	This module teaches you how to iterate through "for" loops and write conditional statements in Bash.	60 min
10	Bash: Reusable Scripts	This module will teach you how to create and use simple Bash scripts to make repetitive tasks as simple as possible.	60 min
11	Intro to Version Control	An introduction to what version control systems do and why you might want to use one.	15 min
12	Setting Up Git on Mac and Linux	This module provides recommendations and examples to help new users configure Git for the first time on a Mac or Linux computer.	15 min
13	Setting Up Git on Windows	This module provides recommendations and examples to help new users configure Git on their Windows computer for the first time.	25 min
14	Creating a Git Repository	Create a new Git repository and get started with version control.	60 min
15	Exploring the History of your Git Repository	This module will teach you how to look at past versions of your work on Git and compare your project with previous versions.	30 min
16	Omics Orientation	This module provides a brief introduction to omics and its associated fields.	15 min
17	Genomics Tools and Methods: Computing Setup	This module walks you through setting up your own copy of a genomics analysis AMI (Amazon Machine Image) to run genomics analyses in the cloud.	30 min
18	Genomics Tools and Methods: Quality Control	Get started with genomics! This module walks you through how to analyze FASTQ files to assess read quality, the first step in a common genomics workflow - identifying variants among sequencing samples taken from multiple individuals within a population (variant calling).	40 min
19	Demystifying Containers	Containers can be a useful tool for reproducible workflows and collaboration. This module describes what containers are, why a researcher might want to use them, and what your options are for implementation.	20 min
20	Getting Started with Docker for Research	This tutorial combines a hands-on interactive Docker tutorial published by Docker Inc with an academic article outlining best practices for using Docker for research.	60 min

Pathway 3: Big data, big questions

This pathway is for people primarily interested in analysis of the rich, complex data in the electronic health record (EHR) and other big databases. If you’re interested in social determinants of health, retrospective analysis of clinical data, or connecting data from multiple sources, this is the pathway for you! This pathway includes a gentle but thorough introduction to SQL, the programming language you’ll need to be able to work with databases, as well as information about working with geospatial data, text data, and more.

Big data, big questions: Modules

Order	Module	Description	Estimated Time
1	Reproducibility, Generalizability, and Reuse	This module provides learners with an approachable introduction to the concepts and impact of research reproducibility, generalizability, and data reuse, and how technical approaches can help make these goals more attainable.	60 min
2	Research Data Management Basics	Learn the basics about research data management.	40 min
3	Demystifying SQL	SQL is a language for working with relational databases that has been around for decades. Learn more about this technology at a high level, without having to write code.	40 min
4	Database Normalization	Learn about the concept of normalization and why it's important for organizing complicated data in relational databases.	40 min
5	SQL Basics	Structured Query Language, or SQL, is a language for working with relational databases that has been around for decades. Learn how to write basic SQL queries on single tables through hands-on coding.	60 min
6	SQL, Intermediate Level	Learn how to write intermediate SQL queries on single tables through hands-on coding.	60 min
7	SQL Joins	Learn about SQL joins: what they accomplish, and how to write them.	60 min
8	Demystifying Geospatial Data	This module is a brief introduction to geospatial (location) data.	15 min
9	Encoding Geospatial Data: Latitude and Longitude	This is an introduction to latitude and longitude and the importance of geocoding - encoding geospatial data in the coordinate system.	15 min
10	The Elements of Maps	This is a general overview of ways that geospatial data can be communicated visually using maps.	45 min
11	Demystifying Regular Expressions	Learn about pattern matching using regular expressions, or regex.	30 min
12	Regular Expressions Basics	Begin to use regular expressions, or regex, for simple pattern matching.	60 min
13	Regular Expressions: Groups	Use regular expressions, or regex, for complex pattern matching involving capturing and non-capturing groups.	30 min
14	Regular Expressions: Flags, Anchors, and Boundaries	Use flags, anchors, and boundaries in regular expressions, or regex, for complex pattern matching.	45 min
15	Regular Expressions: Lookaheads	Use regular expressions, or regex, for complex pattern matching involving lookaheads.	30 min
16	Demystifying Large Language Models	Learn about large language models (LLMs) like ChatGPT.	60 min
17	Demystifying Machine Learning	An approachable and practical introduction to machine learning for biomedical researchers.	60 min
18	Citizen Science	This is an overview of citizen science for biomedical researchers.	45 min

Pathway 4: Analysis in R

This pathway focuses on the skills and techniques you’ll need to leverage the popular statistical programming language R. We’ll start from zero and walk you through everything you need to start analyzing data in R, including lots of opportunities for hands-on practice.

This is designed to be welcoming to folks with no coding experience, so if R will be your first programming language you’ll fit right in!

Analysis in R: Modules

Order	Module	Description	Estimated Time
1	Reproducibility, Generalizability, and Reuse	This module provides learners with an approachable introduction to the concepts and impact of research reproducibility, generalizability, and data reuse, and how technical approaches can help make these goals more attainable.	60 min
2	Tidy Data	Tidy is a technical term in data analysis and describes an optimal way for organizing data that will be analyzed computationally.	45 min
3	How to Troubleshoot	Learning to use technical methods like coding and version control in your research inevitably means running into problems. Learn practical methods for troubleshooting and moving past error codes and other difficulties.	30 min
4	R Basics: Introduction	Introduction to R and hands-on first steps for brand new beginners.	60 min
5	R Basics: Visualizing Data With ggplot2	Learn how to visualize data using R's `ggplot2` package.	60 min
6	R Basics: Transforming Data With dplyr	Learn how to transform (or wrangle) data using R's `dplyr` package.	60 min
7	Directories and File Paths	In this module, learners will explore what a directory is and how to describe the location of a file using its file path.	15 min
8	R Basics Practice	Use the basics of R coding, data transformation, and data visualization to work with real data.	60 min
9	Reshaping Data in R: Long and Wide Data	A module that teaches how to reshape tabular data in R, concentrating on some typical shapes known as "long" and "wide" data.	60 min
10	Missing Values in R	A practical demonstration of how missing values show up in R and how to deal with them. Note that this module does not cover statistical approaches for handling missing data, but instead focuses on the code you need to find, work with, and assign missing values in R.	45 min
11	Summary Statistics in R	Learn to calculate summary statistics in R, and how to present them in a table for publication.	30 min
12	Data Visualization in Open Source Software	Introduction to principles of data visualization and typical data visualization workflows using two common open source libraries: ggplot2 and seaborn.	20 min
13	Data Visualization in ggplot2	This module includes code and explanations for several popular data visualizations, using R's ggplot2 package. It also includes examples of how to modify ggplot2 plots to customize them for different uses (e.g. adhering to journal requirements for visualizations).	60 min
14	Introduction to Null Hypothesis Significance Testing	This is an introduction to NHST for biomedical researchers.	40 min
15	Statistical Tests in Open Source Software	This module provides an overview of the most commonly used kinds of statistical tests and links to code for running many of them in both R and Python.	20 min
16	R Practice	Use the basics of R coding, data transformation, and data visualization to work with real data.	60 min
17	Demystifying Machine Learning	An approachable and practical introduction to machine learning for biomedical researchers.	60 min
18	Understanding the Bias-Variance Tradeoff	The bias-variance tradeoff is a central issue in nearly all machine learning analyses. This module explains what the tradeoff is, why it matters for machine learning, and what you can do to manage it in your own analyses.	20 min

Pathway 5: Analysis in Python

Python is a powerful open source programming language with tons of great tools for data science. If you’re looking to learn Python to do things like clean and analyze data, and create data visualizations, this pathway is for you. We’ll start from zero and walk you through everything you need to start analyzing data in Python, including lots of opportunities for hands-on practice.

This is designed to be welcoming to folks with no coding experience, so if Python will be your first programming language you’ll fit right in!

Analysis in Python: Modules

Order	Module	Description	Estimated Time
1	Reproducibility, Generalizability, and Reuse	This module provides learners with an approachable introduction to the concepts and impact of research reproducibility, generalizability, and data reuse, and how technical approaches can help make these goals more attainable.	60 min
2	Tidy Data	Tidy is a technical term in data analysis and describes an optimal way for organizing data that will be analyzed computationally.	45 min
3	How to Troubleshoot	Learning to use technical methods like coding and version control in your research inevitably means running into problems. Learn practical methods for troubleshooting and moving past error codes and other difficulties.	30 min
4	Learning to Learn Data Science	Discover how learning data science is different than learning other subjects.	20 min
5	Directories and File Paths	In this module, learners will explore what a directory is and how to describe the location of a file using its file path.	15 min
6	Demystifying the Command Line Interface	Understand what the command line interface is and why it's useful!	15 min
7	Demystifying Python	This module introduces the Python programming language, explores why Python is useful in research, and describes how to download Python and Jupyter.	20 min
8	Python Basics: Functions, Methods, and Variables	Learn the foundations of writing Python code, including the use of functions, methods, and variables.	20 min
9	Python Basics: Lists and Dictionaries	Learn about collection objects, specifically lists and dictionaries, in Python.	15 min
10	Python Basics: Loops and Conditionals	Learn how to use loops and conditional statements in Python.	20 min
11	Python Basics: Exercise	Practice the skills acquired in the Python Basics sequence by working through an exercise.	30 min
12	Transform Data with pandas	This is an introduction to transforming data using a Python library named pandas.	60 min
13	Tidy Data	Tidy is a technical term in data analysis and describes an optimal way for organizing data that will be analyzed computationally.	45 min
14	Data Visualization in Open Source Software	Introduction to principles of data visualization and typical data visualization workflows using two common open source libraries: ggplot2 and seaborn.	20 min
15	Data Visualization in seaborn	This module includes code and explanations for several popular data visualizations using Python's seaborn library. It also includes examples of how to modify seaborn plots to customize them for different uses.	60 min
16	Introduction to Null Hypothesis Significance Testing	This is an introduction to NHST for biomedical researchers.	40 min
17	Statistical Tests in Open Source Software	This module provides an overview of the most commonly used kinds of statistical tests and links to code for running many of them in both R and Python.	20 min
18	Python Practice	Use the basics of Python coding, data transformation, and data visualization to work with real data.	60 min
19	Demystifying Machine Learning	An approachable and practical introduction to machine learning for biomedical researchers.	60 min
20	Understanding the Bias-Variance Tradeoff	The bias-variance tradeoff is a central issue in nearly all machine learning analyses. This module explains what the tradeoff is, why it matters for machine learning, and what you can do to manage it in your own analyses.	20 min

Looking for something else?

The suggested pathways above are just that – suggestions! You can work through DART modules in whatever order you like. All of our learning modules are freely available online.

We’re also building a self-service tool to help you find the modules most relevant to you. Test out our prototype module discovery application, and please leave feedback to help us improve!

Data and Analytics for Research Training Program