Two years ago I started doing CrossFit and have since gotten kind of into it. After the CrossFit Open last year, I decided to dive a little deeper into the available stats and was able to prove, using public data, that I'm not very good at CrossFit.
Even still, the data set was amazing: over a hundred thousand people participated in a series of workouts, with details like height, age, gender and weight for most of them. With this, I started looking for patterns. Do taller people do better at a specific movement? How much does experience in CrossFit affect ranking? This led me down the rabbit hole that is the statistics community.
Getting Started with R
Try R is still one of my favorite Code School courses. It's free, and only takes about an hour to go through. It also introduces some of the core concepts of R that come in handy when reading the code below.
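For a quick taste, here is a minimal base-R sketch of a few of those core concepts as they show up later in this post: c() builds vectors, 1:5 is a range, data.frame() bundles equal-length vectors into named columns, and factor() stores categories, with its levels= argument controlling their order. The variable names are just for illustration; the percentiles are my own Open results, the same ones used in the XKCD graph further down.

```r
# Vectors: c() combines values; 1:5 is shorthand for c(1, 2, 3, 4, 5)
percentiles <- c(16.71, 4.21, 19.61, 4.9, 19.38)
workouts <- 1:5

# Data frames: equal-length vectors become named columns
scores <- data.frame(workout = workouts, rank = percentiles)

# Columns are pulled out with $, and [ ] indexes from 1
scores$rank[2]   # 4.21

# Factors: categorical values with a fixed set of levels.
# Without explicit levels, R sorts them alphabetically...
experience <- c("2-4 years", "Less than 6 months", "1-2 years", "6-12 months")
levels(factor(experience))
# "1-2 years" "2-4 years" "6-12 months" "Less than 6 months"

# ...so pass levels= to put the groups in a meaningful order
ordered_levels <- c("Less than 6 months", "6-12 months", "1-2 years", "2-4 years")
experience <- factor(experience, levels = ordered_levels)
levels(experience)
# "Less than 6 months" "6-12 months" "1-2 years" "2-4 years"
```

The ordering trick is exactly why the boxplot script below passes its own list of experience levels: otherwise the boxes would come out in alphabetical order rather than from least to most experienced.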
Like most people, I took some statistics in college, but I don't use much of it on a daily basis. Luckily for me, Coursera has some amazing stats classes. The one I ended up taking was Statistics: Making Sense of Data, which was close to what I wanted to learn.
One nice part of the format was that it was taught much like a university course: the main instructors covered the statistics, and shorter follow-up videos showed how to do the same thing in R.
A Boxplot Graph
Armed with new knowledge about how to draw information from... other information, I went ahead and made my first graph: a handy boxplot. With all the data loaded into a Postgres database, I put together the following script.
# Set up the Postgres connection
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "crossfitopen_development")

# Query the database: women's experience level and 13.1 score,
# skipping athletes with no experience answer or a score of 5 or less
rs <- dbSendQuery(con, "select info_time_crossfitting, wod1_score
                        from athletes
                        where mens = FALSE
                          and info_time_crossfitting is not null
                          and wod1_score > 5")

# Load in our 70k rows (n = -1 fetches every remaining row)
results <- fetch(rs, n = -1)

# Free the result set and close the connection
dbClearResult(rs)
dbDisconnect(con)

# Give names to our columns
colnames(results) <- c('Experience', 'Score')

# Order the experience levels so the boxes run from least to most experienced
results$Experience <- factor(results$Experience,
                             levels = c('Less than 6 months', '6-12 months', '1-2 years',
                                        '2-4 years', '4+ years', 'Decline to answer'))

# Create the boxplot! range = 0 stretches the whiskers to each group's min and max
boxplot(Score ~ Experience, data = results, range = 0,
        main = 'WOD 13.1 By Experience (Women)')
Running the script pulled up an easy-to-understand boxplot.
Statistical proof I'm bad at CrossFit
Each box shows the middle 50% of scores for that experience group (roughly the 25th to the 75th percentile), and the line inside the box marks the group's median. Because range=0 is set, the whiskers stretch all the way to the lowest and highest scores in each group. Since I'd been CrossFitting for 1-2 years and only scored 100 on this workout, I was sad to see my score was in the bottom 25% for my group. I knew I had to look deeper to see how I compared.
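To make reading the boxes a bit more concrete, here is a small base-R sketch. The group_scores vector is made-up toy data standing in for one experience group (it is not the real Open data). quantile() gives the quartiles, boxplot.stats() returns the five numbers boxplot() actually draws (its box edges are Tukey's hinges, which are essentially the quartiles and match them exactly for this toy data), and ecdf() turns a score into the share of the group at or below it, i.e. a percentile rank.

```r
# Toy data standing in for one experience group's 13.1 scores (NOT real Open data)
group_scores <- c(60, 90, 105, 110, 120, 130, 140, 150, 165, 180, 200)
my_score <- 100

# The box spans the 25th to 75th percentile; the line inside is the median
quantile(group_scores, c(0.25, 0.5, 0.75))
#   25%   50%   75%
# 107.5 130.0 157.5

# boxplot.stats() returns what boxplot() draws. boxplot() passes its range
# argument along as coef, and coef = 0 runs the whiskers to the min and max:
# c(min, lower hinge, median, upper hinge, max)
box <- boxplot.stats(group_scores, coef = 0)$stats
box
# 60.0 107.5 130.0 157.5 200.0

# A score below the bottom edge of the box lands in the bottom quarter
my_score < box[2]
# TRUE

# ecdf() builds the empirical cumulative distribution function;
# evaluating it at my score gives the fraction of the group at or below it
percentile_rank <- ecdf(group_scores)(my_score) * 100
round(percentile_rank, 1)
# 18.2
```

The same ecdf() call, run on a real group's scores, is one way to turn "somewhere in the bottom box" into an actual percentile.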
Having Fun With XKCD
I stumbled upon the XKCD package for R not long after, and decided to have some fun with this data. This library is plain amazing. Just looking at the examples on its page, I knew most of it was way over my head, since I was still very new to R. But with one chart in mind (my percentile ranking on the five workouts), I decided to start writing a graph in the XKCD style.
library(extrafont)
loadfonts()
library(ggplot2)
library(xkcd)

# Bring in the data!
# workout = 1:5 creates a range from 1 to 5, so 1, 2, 3, 4, 5
# c(16.71, ...) holds my percentiles for the five workouts, respectively
scores <- data.frame(workout = 1:5,
                     rank = c(16.71, 4.21, 19.61, 4.9, 19.38))

# Define how much of the X and Y axis to show.
# In our case, we'll show all of the Y axis (0-100),
# but only 1-5 on the X axis.
xrange <- range(scores$workout)
yrange <- c(0, 100)
# Ratio of the x span to the y span, used to keep the stick figures proportioned
ratioxy <- diff(xrange) / diff(yrange)

# Let's create XKCD style stick figures.
# I blatantly copied this part from the sample code
mapping <- aes(x, y, scale, ratioxy, angleofspine,
               anglerighthumerus, anglelefthumerus,
               anglerightradius, angleleftradius,
               anglerightleg, angleleftleg, angleofneck)

# x = c(1.5, 4.5) holds the X coordinates of the 2 stick figures,
# and likewise y for the Y coordinates. The rest control the arms and legs.
dataman <- data.frame(x = c(1.5, 4.5), y = c(80, 70),
                      scale = 17,
                      ratioxy = ratioxy,
                      angleofspine = -pi/2,
                      anglerighthumerus = c(-pi/6, -pi/6),
                      anglelefthumerus = c(-pi/2 - pi/6, -pi/2 - pi/6),
                      anglerightradius = c(pi/5, -pi/5),
                      angleleftradius = c(pi/5, -pi/5),
                      angleleftleg = 3*pi/2 + pi/12,
                      anglerightleg = 3*pi/2 - pi/12,
                      angleofneck = runif(1, 3*pi/2 - pi/10, 3*pi/2 + pi/10))

# Those squiggly lines that connect text to a character are easy to draw.
# Each needs an x/y start point and an x/y end point. The library does the rest.
datalines <- data.frame(xbegin = c(1.9, 4.2, 2), ybegin = c(80, 70, 77),
                        xend = c(2.1, 3.9, 2.8), yend = c(88, 80, 68))

# Use ggplot2, a versatile graphing library for R, to do the actual graphing:
# a smoothed (loess) line through my five percentiles
p <- ggplot() +
  geom_smooth(mapping = aes(x = workout, y = rank), data = scores, method = "loess")

# Do ALL the generating!
# This puts everything together and adds the text we want to write.
# Of course, this text should be written using the xkcd font.
p + xkcdaxis(xrange, yrange) +
  ylab("Percentile") +
  xkcdman(mapping, dataman) +
  annotate("text", x = 2.4, y = 93, label = "There's a lot of\nroom up here", family = "xkcd") +
  annotate("text", x = 4.1, y = 83, label = "Let's do 7 minutes of Burpees!", family = "xkcd") +
  annotate("text", x = 2.8, y = 62, label = "I will use your face\nas a wallball target...", family = "xkcd") +
  xkcdline(aes(xbegin = xbegin, ybegin = ybegin, xend = xend, yend = yend),
           datalines, xjitteramount = 0.11)
Running this in the R console generates a pretty slick graph:
Even if I don't see myself creating loads of XKCD-style graphs, getting one made was a lot of fun. If you're curious what else you can do with the XKCD library, check out its documentation.