R Resources
My short list of useful books, courses, and expert blogs. Many are free. These could be departure points for your own learning journey. My hope is that you take and copy sections for sharing with your own community of practice.
Yes, I am giving away all of the secrets. One of the things that I spend a lot of time thinking about is how to communicate about data science with other human beings. It is often too easy to focus on “just” the technical skills like being able to wrangle, explore, analyze, and visualize data. Describing, documenting, planning, collaborating, questioning, and communicating effectively are harder than they sound. When team members each have a baseline of shared competencies, we work better together on the problems we face.
I revisit the content of this list a few times a year as my work takes me further away from coding and more into defining how best to build production solutions. Many of you, too, will eventually be drawn into roles somewhere between educator, developer advocate, R admin, and DevOps engineer. Every R user also needs some exposure to a handful of Python libraries, YAML, DevOps, and other tools, which I mark with stars ⭐. Tell me what is good here and what was missed.
Inspired by these excellent pages:
Paul van der Laken’s R resources
Nathan Stephens’s Professional R Tooling and Integration
Oscar Baruffa’s Big Book of R
Martin Monkman’s Data Science with R: A Resource Compendium
The R Community
R is incredible software for statistics and data science. But while the bits and bytes of software are an essential component of its usefulness, software needs a community to be successful. And that’s an area where R really shines, as Shannon Ellis explains in this lovely rOpenSci blog post. For software, a thriving community offers developers, expertise, collaborators, writers and documentation, testers, agitators (to keep the community and software on track!), and so much more.
A brief list of sources that have helped me feel informed and included in the community:
R for Data Science Online Learning Community | R4DS hosts interactive Slack channels (button in the upper right-hand corner) for community news, book clubs, meetup events, birds-of-a-feather groups, and career tips |
bookdown.org | Bookdown is an open source R package that makes writing and publishing technical books easy. This website is a collection of recent books. |
#TidyTuesday | R4DS’s weekly data project aimed at creating opportunities to develop understanding in how to summarize data to make meaningful charts. You will find the hashtag in use by the community over on Twitter. |
Data Science StreamRs | Several data science professionals sharing their knowledge via video stream |
Introductory Books
Every one of us starts our journey from somewhere.
R for Data Science by Hadley Wickham and Garrett Grolemund is an excellent introduction to the Tidyverse. The R for Data Science Online Learning Community hosts book club-style weekly chapter discussions through their Slack channels. Vebash Naidoo has assembled a related solutions guide supplement. | |
R Cookbook by JD Long and Paul Teetor is full of how-to recipes, each of which solves a specific problem. Each recipe includes a quick introduction followed by a discussion that unpacks the solution and provides insight into how it works. |
Other books for those starting from Excel and basic statistics:
Read the basic R manual, or at least the early chapters. It’s not as well written as more modern documentation, but it is important for understanding the tenets of legacy code.
Online Courses
Many of us want to try before we buy. Making the investment of time into reading a book (even a free one) could still be too much to ask. These materials, some even with interactive notebooks and project examples, may be a better path for you.
Data Visualization
If your toolkit has been limited to PowerPoint and Excel, you might not yet appreciate that there is so much more to crafting effective visual communication materials. In addition to the learning resources listed here, look for the recent books by Alberto Cairo and David McCandless at your local library.
Data Visualization, a Practical Introduction in R | by Kieran Healy |
ggplot2: Elegant Graphics for Data Analysis | the online version of work-in-progress 3rd edition by Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen |
Hands-On Data Visualization | by Jack Dougherty and Ilya Ilyankou |
Custom fonts and plot quality with ggplot on Windows | by W R Chase |
How to Create BBC Style Graphics | |
R Graph Gallery | |
Fundamentals of Data Visualization | by Claus Wilke |
Exploratory Data Analysis & Visualization | by Zach Bogart and Joyce Robbins |
The Complete ggplot2 Tutorial | |
flowingdata.com | |
How to standardize group colors in data visualizations in R | by Paul van der Laken |
Empirical Bayes
In this little book David Robinson introduces empirical Bayes, a powerful approach to handling uncertainty across many observations. It teaches both the math behind the methods and the code that you can adapt to your own data, all through the detail of a single case study: batting averages in baseball. He wrote it for people (like me) who need to understand and apply methods, but would rather work with real data than face down pages of equations.
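So why is Empirical Bayes worth learning? These methods are especially well suited to a situation that comes up often in modern data science: many similar observations, each backed by a different amount of evidence, like batting averages for players with very different numbers of at-bats.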
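The shrinkage idea can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the book's code: the data are simulated (not real batting records), the `Beta(80, 220)` skill distribution and the 300 at-bat cutoff are arbitrary choices, and the prior is fitted by the method of moments as a simple stand-in for a maximum-likelihood fit.

```r
# Simulated data only: these are NOT real batting records.
set.seed(42)
n_players <- 500
true_avg  <- rbeta(n_players, shape1 = 80, shape2 = 220)  # hypothetical "true" skill
at_bats   <- sample(5:1000, n_players, replace = TRUE)
hits      <- rbinom(n_players, size = at_bats, prob = true_avg)
raw_avg   <- hits / at_bats

# Step 1: estimate a Beta(alpha0, beta0) prior from players with enough at-bats.
# Method of moments is an assumption made here for brevity.
enough <- at_bats >= 300
m <- mean(raw_avg[enough])
v <- var(raw_avg[enough])
common <- m * (1 - m) / v - 1
alpha0 <- m * common
beta0  <- (1 - m) * common
prior_mean <- alpha0 / (alpha0 + beta0)

# Step 2: shrink every player's raw average toward the prior mean.
# Players with few at-bats move the most; players with many barely move.
eb_avg <- (hits + alpha0) / (at_bats + alpha0 + beta0)

# Compare raw and shrunken estimates for the players with the fewest at-bats.
fewest <- order(at_bats)[1:5]
print(data.frame(at_bats = at_bats[fewest],
                 raw_avg = round(raw_avg[fewest], 3),
                 eb_avg  = round(eb_avg[fewest], 3)))
```

Each shrunken estimate is a weighted average of the player's raw average and the prior mean, with the weight on the raw average growing as at-bats accumulate. A player who went 2 for 5 is pulled strongly toward the group average, while a player with 900 at-bats is left almost untouched.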
Shiny
Shiny is an R package that makes it easy to build interactive web apps straight from R, including apps meant for non-programmers to use. You can host standalone apps on a webpage, embed them in R Markdown documents, or build dashboards to be served from a cloud facility. You can also extend Shiny apps with CSS themes, htmlwidgets, and JavaScript actions.
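To give a feel for how little code a basic app needs, here is a minimal sketch. It assumes the `shiny` package is installed and uses R's built-in `faithful` dataset; the input and output IDs (`bins`, `hist`) and the labels are illustrative names, not anything prescribed by Shiny.

```r
library(shiny)

# UI: one slider input and one plot output.
ui <- fluidPage(
  titlePanel("Old Faithful waiting times"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# Server: the histogram is re-drawn whenever the slider value changes.
server <- function(input, output, session) {
  output$hist <- renderPlot({
    x <- faithful$waiting
    hist(x,
         breaks = seq(min(x), max(x), length.out = input$bins + 1),
         main = NULL,
         xlab = "Minutes between eruptions")
  })
}

# Bundle the UI and server into an app object.
app <- shinyApp(ui = ui, server = server)

# Launch the app only when running interactively (for example, in RStudio).
if (interactive()) runApp(app)
```

The split between `ui` and `server` is the core pattern: the UI declares what the user sees, and the server holds the reactive R code that fills it in.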
RMarkdown and Quarto
R Markdown is a file format for building dynamic documents that contain both code and document text. The package has been extended to provide systems for authoring and publishing books, presentations, blogs, and dashboards.
Quarto is a newer open-source, multi-language scientific and technical publishing system built on Pandoc. Like R Markdown, Quarto uses knitr to execute R code; it can also run code through Jupyter for languages such as Python.
NLP: Text Mining, Document Classification, Sentiment Analysis, and Topic Modeling
Supervised Machine Learning for Text Analysis in R | by Julia Silge and Emil Hvitfeldt |
Text mining | by Julia Silge and David Robinson |
Learn Tidytext | by Julia Silge |
Text as Data: An Overview | by Ken Benoit |
word2vec in R | Belgium Network of Open Source Analytical Consultants |
Learn Regular Expressions |
More Advanced Books
Machine Learning
Mastery of the wide array of Machine Learning techniques in real business contexts requires broad and deep study. Long-time engineers like me often make the mistake of skipping ahead to the Kaggle award-winning algorithms and the CV buzzwords. We’re smart. Just wing it, right? This is a bad idea.
The leading edge of thinking in many ML areas is changing rapidly. These are just a starting point:
Data Science for Business | by Provost and Fawcett (O’Reilly, 2013) |
⭐ Machine Learning Engineering in Action | Ben Wilson |
⭐ Designing Machine Learning Systems | Chip Huyen |
⭐ Reliable Machine Learning | Cathy Chen, Niall Murphy, Kranti Parisa, D. Sculley, Todd Underwood |
Introduction to Computational Thinking and Data Science | Grimson, Guttag, and Bell |
Tidy Modeling with R | by Max Kuhn and Julia Silge |
Feature Engineering and Selection: A Practical Approach for Predictive Models | by Max Kuhn and Kjell Johnson |
An Introduction to Statistical Learning | 2nd edition |
An Introduction to Statistical Learning Labs | tidymodels examples |
Practical Machine Learning in R | by Nwanganga and Chapple (2020) reviewed here. |
Find other materials on AI ethics in your specific working domain and be prepared to consider the impacts of validating and generalizing your work. No computing tool does this for you by itself, even if it claims to be automatic.
Machine learning modeling frameworks offer streamlined solutions for pre-processing, scoring, and publishing models. The most popular is arguably scikit-learn over in the Python world. There are also fully supported proprietary systems available from SAS and MathWorks, at a cost. The most popular deep learning neural-net frameworks at this point in time are TensorFlow and Torch, with interfaces from several programming languages. In R the widely used frameworks are caret, tidymodels, and mlr3. Any machine learning solution put into production will require proper orchestration and monitoring to assure delivery to the enterprise’s service level requirements.
⭐ Building Machine Learning Powered Applications: Going from Idea to Product | by Emmanuel Ameisen |
⭐ The Phoenix Project | by Gene Kim, Kevin Behr, and George Spafford |
MLOPS with R: An end-to-end process for building machine learning applications | on Azure |
Interpretable Machine Learning | |
Deep Learning |
Geospatial
Analyzing US Census Data: Methods, Maps, and Models in R | The goal of this book, by Kyle Walker, is to illustrate the utility of R to prepare, map, and present data products |
Map Plots Created With R And Ggmap | by Laura Ellis |
Making Maps with R | by Eric Anderson (NOAA/SWFSC) |
Spatial Data Science | by Edzer Pebesma and Roger Bivand |
Introduction to Spatial Data Programming with R | by Michael Dorman |
Predictive Soil Mapping with R | by Tomislav Hengl and Robert A. MacMillan |
Reproducible road safety research with R | |
Spatial Data Science | by Angela Li |
The GDSL Big List of Teaching Links | University of Liverpool |
Geocomputation with R | by Robin Lovelace |
another collection of geospatial sources | sshuair |
Meetups, Blogs, and User Groups
A seasoned useR group organizer reminded me recently that a non-trivial amount of effort is required to organize meetups, find speakers, secure locations (in non-COVID times), and market the events. One way we all can help is to volunteer to speak, offer a location, help with marketing, or scout for speakers. There is a moral motivation in the open source R community to lift one another up and recognize efforts. The software is “free” as in speech, not as in beer. If you learn something useful and encounter a “Buy Me a Coffee” button, be sure to offer a cup of java to the presenter in return.