Animating 2021 Cycling Goals with rStrava
Cycling 🚲 popularity has grown in 2021, as evidenced by local cycling club activity and US national surveys. Some of this may be related to changes like new e-bike configurations and city-wide bike share systems that make it accessible to more people. It’s also possible that post-COVID, many people still want fun & affordable exercise.
Bike shops have been busy filling orders and making repairs. Lead times for some models are months long as global supply chains work to catch up. In my case, I waited 54 weeks for a new gravel bike. It arrived just hours before I embarked on the high point of my season this year, a century ride (100 miles).
I first posted to this blog about the rStrava package in December of 2020, but realized this week that there is much more to say, post-COVID, on the related issues and opportunities. There have recently been many good general supply chain talks, including examples on managing ripple effects of disruptions here and on the advanced statistical tool stacks available here.
Data resources on bike sharing and fitness tracker utilization are driving public policy decisions in new ways. In 2021, some 200 U.S. cities changed their streets to accommodate increased outdoor activity, including biking. Strava and Lyft’s Divvy are just two examples of many private entities that provide data access APIs. If you have not yet seen it, be sure to explore Strava’s Global Activity Heatmap and just imagine the possibilities in your own community.
Everything from navigating our personal health care to decisions on how to ride safely warrants an up-to-date study of our changing world. For example, in general, roads were safer because of reduced traffic while many businesses and schools were closed and people weren’t driving. Will that continue?
Let’s walk through an example of how an analyst gets their data, talk a little about publishing what it all means to an audience, and close with an animation.
The Strava API
As with any other project in R, we will start by loading free open source libraries available from CRAN and Marcus Beck’s GitHub to pull from Strava’s application programming interface (API). I also set my own graphics theme.
source(here::here("_common.R"))
suppressPackageStartupMessages({
  library(tidyverse) # data manipulation
  library(rStrava) # the Strava API
  library(ggmap) # one of the mapping APIs
  library(gganimate) # animation tools
  library(patchwork) # combine charts and graphs
  library(lubridate) # handle dates and times
  extrafont::loadfonts(quiet = TRUE)
})
theme_set(theme_jim(base_size = 12))
Strava requires authentication to track API utilization and compliance with their terms of service. This code is taken directly from Marcus Beck’s rStrava help pages.
if (file.exists(here::here(".httr-oauth"))) {
  stoken <- httr::config(token = readRDS(here::here(".httr-oauth"))[[1]])
} else {
  app_name <- "Rapp" # chosen at
  app_client_id <- "58858" # Client ID
  app_secret <- Sys.getenv("rStravaSecret") # Keep client secrets hidden
  stoken <-
    httr::config(
      token = strava_oauth(
        app_name,
        app_client_id,
        app_secret,
        app_scope = "activity:read_all",
        cache = TRUE # creates a hidden authentication file and masks it in .gitignore
      )
    )
}
mykey <- Sys.getenv("GGMAP_GOOGLE_API_KEY")
register_google(mykey)
# Enable the Google Maps Platform Elevations API at https://console.cloud.google.com/google/maps-apis/
The API retrieval functions are called with the token established above. From there we can build a heat map of my activities. The first call is to download the complete list with get_activity_list() and the second is to convert the list into an easy-to-use data frame with compile_activities().
my_acts <- get_activity_list(stoken)
act_data <- compile_activities(my_acts,
  units = "imperial"
) %>%
  mutate(id = factor(id))
all_activity_heatmap <-
  get_heat_map(
    act_data,
    key = mykey,
    col = "darkgreen",
    size = 2,
    distlab = FALSE,
    f = 0.1
  ) +
  labs(
    title = "Cumulative Cycling Activity in Strava",
    subtitle = paste0(format(as.Date(
      min(act_data$start_date)
    ), "%B %e, %Y"), " to ", format(today(), "%B %e, %Y")),
    fill = "",
    x = NULL,
    y = NULL,
    caption = "map: Google"
  ) +
  theme(
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    plot.margin = margin(0, 0, 0, 0, "cm")
  )
Let’s pair the map with a graph showing the accumulation of miles over time.
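One way to build such a graph is sketched below. It is only an illustration: the rides data frame is simulated so the example runs without a Strava token, and its two columns (start_date and distance) are assumed stand-ins for the matching columns in act_data. With real data, start_date would likely need as.Date() first, as in the heat map subtitle above.

```r
library(ggplot2)

# Simulated stand-in for act_data (hypothetical values, not real rides)
set.seed(42)
season_days <- seq(as.Date("2021-01-01"), as.Date("2021-10-31"), by = "day")
rides <- data.frame(
  start_date = sort(sample(season_days, 60)),          # 60 ride days, in order
  distance   = round(runif(60, min = 5, max = 60), 1)  # miles per ride
)

# Running total of miles, accumulated in ride order
rides$cumulative_miles <- cumsum(rides$distance)

cumulative_plot <- ggplot(rides, aes(x = start_date, y = cumulative_miles)) +
  geom_step(color = "darkgreen") +
  labs(title = "Cumulative Miles", x = NULL, y = "Miles")

# With patchwork loaded, the map and the graph can be placed side by side:
# all_activity_heatmap + cumulative_plot
```

A step line is used because the total only changes on ride days and stays flat in between.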
Accessing data, exploring it, and understanding what it means as an analyst is still only half of the work. Connecting with an audience to tell the story in a meaningful way is always the more challenging task. Regardless of whether the topic is supply chain risk or health and fitness decisions, the data alone is not enough.
Business school programs cover data viz and leadership communications at some level, often for a C-suite audience. Almost every consultant’s pitch deck has the same MBA-style building blocks (for PowerPoint).
The emerging field of data journalism offers hard-won process frameworks for conveying insights through media to the broader world.
What makes data journalism different? New possibilities open up when combining the “nose for news” and ability to tell a compelling story with the scale of digital information now available. Those possibilities can come at any stage: using programming to automate data retrieval, or using software to identify connections between thousands of documents.
Data journalism can tell complex stories through infographics. Consider Hans Rosling’s spectacular talks on visualizing world poverty with Gapminder and David McCandless’s work showing the importance of clear design at Information is Beautiful.
Often stories begin with resources in open source software, open access publishing and open data. Even within the venerable New York Times there are repos of open-source and even “inner-source” (widely distributed internal package) digital tools. This leads me to a thought on the roles that fill these communications needs in organizations. This short video montage from a spring 2021 conference conveys a broader perspective of what sorts of people do modern analytical work.
Drilling down deeper, the rStrava package includes functions for pulling elevation or % gradient for individual rides, or all of them. Heart rates, travel speeds, and really anything recorded can be plotted over maps from Google, OpenStreetMap, or Stamen (each with their own APIs).
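To make the % gradient idea concrete, here is a minimal sketch of the arithmetic: rise divided by run, times 100. The stream data frame and its column names are hypothetical, made up for illustration rather than taken from an rStrava return value.

```r
# Hypothetical ride stream: cumulative distance (km) and altitude (m)
stream <- data.frame(
  distance_km = c(0, 0.5, 1.0, 1.5, 2.0),
  altitude_m  = c(100, 110, 125, 120, 120)
)

# Rise and run between consecutive points, both in metres
rise_m <- diff(stream$altitude_m)
run_m  <- diff(stream$distance_km) * 1000

# Percent gradient; the first point has no preceding segment, so it is NA
stream$gradient_pct <- c(NA, 100 * rise_m / run_m)
stream
```

Positive values are climbs and negative values are descents, so the same column could drive a color scale on a route map.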
No purchased software was required to access the personal biometric data here. No purchased software was required to parse it or to visualize it. Generally speaking, less cost, less friction and less gatekeeping are all ways of creating value. Oddly, legacy corporate and government bureaucracies may still not see it that way, as there is no longer a need for analysts to beg for licenses from an IT department.
When you think about it, the way humans tend to operate mirrors open source principles. Arguably we are naturally inclined to use, share, learn, copy and improve on things together so that other people can be part of a community. It’s not surprising then that open source has been so widely adopted, and because it’s so widespread, the barriers to entry are relatively low if you want to join the club.
Some companies are learning to tap into the open source community for talent and to upskill themselves. They go beyond reluctantly accepting the use of open source software to encouraging participation by their employees in projects.
No company is more emblematic of the shift than Microsoft, which initially waged a legal battle against open source. The digital giant now uses open source software extensively. Most of Microsoft Azure runs on Linux. Microsoft acquired GitHub for $7.5 billion in 2018 (then the largest enterprise software acquisition ever), and its employees are heavily engaged, with over 5,000 of them contributing to open source projects in 2020.
Other leading companies have established open communities of excellence. They identify the open source software each department in the organization uses, and foster collaboration as well as best practice sharing. Catalyzing exchanges and getting various functions to share success stories help companies realize the full potential of their people.
On a personal interest level, open source lowers the learning barriers and allows for ad-hoc communities to spring up. It gives people a sense of purpose and helps to increase creativity and problem solving skills which favorably influence productivity.
Setting Goals for 2021
Strava offers their own paid subscription feature that includes graphs that let members “visualize their progress and trajectory” as they strive to hit their targets. As always with this kind of functionality, the quality and usefulness of results depends on the inputs. Set reasonable goals and you’ll have a nice visual reminder that you’re on track. Excessive optimism, on the other hand, yields graphs that serve only to remind you that you really need to try harder.
Another approach is to build our own goal tracking tool in R. One version is illustrated here:
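As a rough sketch of what such a tool could look like, the code below compares cumulative miles against a straight-line pace toward an annual target and reveals both lines over time with gganimate. Everything in it is an assumption for illustration: the 2,000 mile goal and the simulated daily mileage are invented, and with real data the actual column would come from the cumulative distance in act_data.

```r
library(ggplot2)
library(gganimate)

goal_miles <- 2000  # hypothetical annual target
year_days  <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")
n_days     <- length(year_days)

# Simulated daily mileage: a ride on roughly 30% of days, zero otherwise
set.seed(2021)
daily_miles <- ifelse(runif(n_days) < 0.3, runif(n_days, 10, 40), 0)

progress <- data.frame(
  date   = year_days,
  actual = cumsum(daily_miles),
  # Even pace that lands exactly on the goal on December 31
  target = goal_miles * seq_len(n_days) / n_days
)
progress$ahead <- progress$actual - progress$target  # positive = ahead of pace

goal_plot <- ggplot(progress, aes(x = date)) +
  geom_line(aes(y = target), linetype = "dashed", color = "grey40") +
  geom_line(aes(y = actual), color = "darkgreen") +
  labs(title = "Progress Toward the 2021 Goal", x = NULL, y = "Cumulative miles")

# transition_reveal() draws the lines progressively along the date axis
goal_animation <- goal_plot + transition_reveal(date)

# Rendering is left commented out because it needs a renderer such as gifski:
# animate(goal_animation, nframes = 100, fps = 10)
```

The ahead column is not plotted here, but it gives a quick numeric check of whether the season is on track on any given day.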
A closing thought:
Comparing ourselves to the success, or even the failure, of others is only natural. Who should we be comparing ourselves to, anyway? Some thoughts from Leonard Lee:
An acknowledgement of inspirations for building this post:
Author and rStrava creator Marcus W. Beck
Sascha Wolfer published a very nice blog post at rCrastinate on running and is active in the RStats Strava Club
Jack Rozran’s rStrava blog post on visualizing cumulative mileage with ridgeplots.
Chris Woods’s Cycle Eye Visual of total climb and distance in a radial plot.
John Peters has crafted a Shiny app that leverages Strava and other fitness data at EnDuRA
Daniel Padfield with concepts for animations.
Marcus Volz’s creative data visualizations
Robin Lovelace’s cyclestreets package
References
D. Kahle and H. Wickham (2013). ggmap: Spatial Visualization with ggplot2. The R Journal, 5(1), 144-161.