Here at Thumbtack, R is an invaluable tool in helping us understand what’s happening in our marketplace. Whether we’re thinking about how best to match consumers with the right pro for their job, improving the design of our experiments, or uncovering macro-trends in the local services market, R is the tool many of us rely on.
As our R user base, codebase, and data have grown, so has the amount of R code that is shared and reused. We began seeing team members use nearly identical functions to get data from our database into R, format our request form data, remove outliers when looking at quote prices, and generate well-designed plots for presentations or blog posts.
Sam Finegold (co-author of this post and the tackr package) and I realized we didn’t want to end up with a spaghetti bowl of redundant functions and scripts that were all but unreadable to anyone not working directly on a project. Just like our friends at Airbnb, we realized the best way to help our team streamline its R code, both now and in the future, was to develop our own R package. This is the brief and, we hope, informative story of how we went about creating that package, which we’ve named tackr.
Step 1: Learn How to Develop a Package from the Best
Our journey began at the Westin by SFO, where, thanks to our generous Thumbtack education and conference stipends, we attended a two-day Master R Developer Workshop with Hadley Wickham, the man who has lifted R from an obscure academic programming language to a mainstream data analysis and visualization tool used by millions. Hadley walked us through the essentials of advanced programming in R, with a focus on writing clean, rigorously tested functions, and then showed us how to use his roxygen, testthat, and devtools packages to write our own package. The workshop was great, and we highly recommend Hadley’s freely available books if you’re interested in sharpening your R programming chops or developing your own package.
Step 2: Identify the Most Important Elements to Include in the R Package
Once we came back to work and started thinking about how to develop our package, we didn’t want the perfect to be the enemy of the good. That is, we realized it would be futile to try to include every function ever written by a Thumbtack R user, since many of those functions were written for one-off or ad hoc analyses. Instead, we wanted to channel our energy into the most commonly used and essential components of coding in R at Thumbtack that weren’t already handled by Hadley’s and others’ well-tested packages.
We settled on two main things to include in the initial version of tackr. The first was a Thumbtack-specific theme for plots made with ggplot2, a widely used and highly customizable R plotting package. As more of our presentations and blog posts featured ggplot2 plots, we decided it was time to centralize their style across Thumbtack.
The second was helping our team streamline the process of getting data from our database onto their local machines. Every R user at Thumbtack does this regularly, so we figured even a modest efficiency improvement would add up to a windfall productivity gain in aggregate. Plus, since it would help every future R user at Thumbtack as well, those gains would keep growing over time.
Step 3: Build and Use tackr
To make it as easy as possible to generate beautiful graphics in R, we developed a series of visualization-specific functions for tackr. The first applies a Thumbtack-specific theme to any graphic built in ggplot2 (h/t to Ricardo Bion, creator of the ggtech package, for the inspiration). Plots that use TT_Theme() instantly acquire the official Thumbtack font, Avenir Next, and a light gray background (#F0F0F0) that’s easy on the eyes. Here’s what the theme does in real time:
TT_Theme() also lets you change every font size in the visualization with a single argument. We made this possible by basing the size of every element_text on a single, uniform value that the user can modify. So adding base_size=5 inside TT_Theme() doubles the font size of every text element (the default is base_size=2.5), making this change:
The second set of functions in our initial version of tackr makes it easy to change color schemes. One, scale_fill_tt(), is best for a graphic needing several shades, while another, TT_Orange(), is designed for when only a single shade is needed. Both use our signature shade of orange (#F27802), chosen by our design team, as the focal point. Here’s an illustration of the changes made by TT_Orange():
Finally, since many of our data visualizations end up on our blog, in a publicly released report, or in front of a key external stakeholder, we wanted to make it easy to “brand” them. Our add_logo() function does this by adding Thumbtack’s official logo to the bottom right corner of an existing plot, like so:
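The tackr source isn’t shown here, but the single-knob font scaling described above can be sketched in a few lines of ggplot2. In this sketch, `theme_tt_sketch()` and `tt_font_sizes()` are hypothetical names, and the per-element multipliers are assumptions chosen only for illustration; the background color and font family come from the description above. Because every size is a fixed multiple of `base_size`, doubling `base_size` doubles every listed text element.

```r
# Hypothetical sketch; not the actual tackr TT_Theme() implementation.
# Each text size is a fixed (assumed) multiple of one base_size value,
# so changing base_size rescales all of these elements together.
tt_font_sizes <- function(base_size = 2.5) {
  multipliers <- c(title = 8, axis_title = 6, axis_text = 5, legend = 5)
  multipliers * base_size
}

# Builds a ggplot2 theme with a light gray background and one font family.
# ggplot2 is only needed when this function is actually called.
theme_tt_sketch <- function(base_size = 2.5, family = "Avenir Next") {
  sizes <- tt_font_sizes(base_size)
  # Elements not overridden below inherit sizes relative to theme_minimal's base_size.
  ggplot2::theme_minimal(base_size = sizes[["axis_text"]], base_family = family) +
    ggplot2::theme(
      plot.title = ggplot2::element_text(size = sizes[["title"]]),
      axis.title = ggplot2::element_text(size = sizes[["axis_title"]]),
      axis.text = ggplot2::element_text(size = sizes[["axis_text"]]),
      legend.title = ggplot2::element_text(size = sizes[["legend"]]),
      legend.text = ggplot2::element_text(size = sizes[["legend"]]),
      plot.background = ggplot2::element_rect(fill = "#F0F0F0", colour = NA),
      panel.background = ggplot2::element_rect(fill = "#F0F0F0", colour = NA)
    )
}
```

For example, `ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) + ggplot2::geom_point() + theme_tt_sketch(base_size = 5)` would render the same plot with every listed text size doubled. The font family only takes effect if Avenir Next is available to the graphics device.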
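One simple way to get several shades that center on a single brand color is to interpolate from a light tint, through that color, to a dark shade with base R’s `colorRampPalette()`. In this sketch, `tt_palette()`, `tt_orange`, and `scale_fill_tt_sketch()` are hypothetical names, and the two endpoint colors are assumptions; only #F27802 comes from the post.

```r
# Hypothetical sketch; not tackr's actual scale_fill_tt() or TT_Orange().
tt_orange <- "#F27802"  # signature orange from the post

# Returns n hex colors interpolated from a light tint, through the signature
# orange, to a dark shade. Both endpoint colors are illustrative assumptions.
tt_palette <- function(n) {
  ramp <- grDevices::colorRampPalette(c("#FCE4CC", tt_orange, "#7A3C00"))
  ramp(n)
}

# Discrete fill scale built from n palette shades.
# ggplot2 is only needed when this function is actually called.
scale_fill_tt_sketch <- function(n, ...) {
  ggplot2::scale_fill_manual(values = tt_palette(n), ...)
}
```

When `n` is odd, the middle shade is the signature orange itself; a single-shade graphic can simply pass `fill = tt_orange` to a geom.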
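A minimal version of the logo overlay can be built with R’s grid package: draw the plot, then draw a raster image anchored to the bottom-right corner of the page. `logo_grob()` and `add_logo_sketch()` are hypothetical names, and the default width and margin are arbitrary assumptions; the post doesn’t describe how tackr actually positions its logo.

```r
# Hypothetical sketch; not tackr's actual add_logo().
# Builds a raster grob anchored to the bottom-right corner of the viewport.
# `width` is a fraction of the page width; the height follows the image's aspect ratio.
logo_grob <- function(logo, width = 0.12, margin_mm = 2) {
  grid::rasterGrob(
    logo,
    x = grid::unit(1, "npc") - grid::unit(margin_mm, "mm"),
    y = grid::unit(margin_mm, "mm"),
    width = grid::unit(width, "npc"),
    just = c("right", "bottom")
  )
}

# Draws an existing plot, then overlays the logo on top of it.
# `plot` must draw itself when printed (for example, a ggplot object).
add_logo_sketch <- function(plot, logo, ...) {
  print(plot)
  grid::grid.draw(logo_grob(logo, ...))
  invisible(plot)
}
```

With a logo read via `png::readPNG("logo.png")` (a hypothetical file path), `add_logo_sketch(p, logo)` would draw the ggplot `p` with the logo on top. Because the logo is drawn after the plot rather than added as a layer, `ggsave(p)` alone would not include it; the overlay has to be drawn on whatever device you save to.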
Put together, this set of functions can transform a data visualization in one simple line of code:
We wanted R to be a viable, full-stack data analysis tool at Thumbtack. However, competing tools had one significant edge: a streamlined connection to our data. Connecting R to our database was possible, but it definitely wasn’t streamlined. Less technical users had to wade through installing driver software and configuring their laptops just to get set up. tackr removes that headache: a user can now install our package and set up a connection to our database by running a single function.
But why stop there? One of the more annoying parts of data analysis is running a SQL query and staring at the screen while you wait for it to finish. We wrote a small run_query() function that takes your SQL query as a string, runs it, and notifies you when it’s done using the beepr package. When your query finishes, a sound lets you know your work is complete, and if it errors out, we punctuate the failure with the classic “Wilhelm Scream.” This lets us switch to other work while the query runs.
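The single-function setup can be approximated with the DBI and odbc packages, assuming the ODBC driver itself is already installed (the part tackr also automated, which this sketch doesn’t attempt). `db_config()`, `missing_settings()`, and `connect_db()` are hypothetical helpers, and the `TT_DB_*` environment variable names are assumptions.

```r
# Hypothetical sketch; not tackr's actual connection helper.
# Reads settings from environment variables (names are assumptions),
# which keeps credentials out of analysis scripts.
db_config <- function() {
  list(
    driver   = Sys.getenv("TT_DB_DRIVER"),
    server   = Sys.getenv("TT_DB_HOST"),
    port     = Sys.getenv("TT_DB_PORT"),
    database = Sys.getenv("TT_DB_NAME"),
    uid      = Sys.getenv("TT_DB_USER"),
    pwd      = Sys.getenv("TT_DB_PASSWORD")
  )
}

# Returns the names of settings that are empty (unset variables read as "").
missing_settings <- function(config) {
  names(config)[!vapply(config, nzchar, logical(1))]
}

# One call opens a DBI connection through an installed ODBC driver.
# DBI and odbc are only needed once the settings check passes.
connect_db <- function(config = db_config()) {
  absent <- missing_settings(config)
  if (length(absent) > 0) {
    stop("Missing database settings: ", paste(absent, collapse = ", "))
  }
  do.call(DBI::dbConnect, c(list(odbc::odbc()), config))
}
```

Failing early with a list of missing settings gives less technical users a clearer message than a raw driver error would.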
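The post doesn’t show run_query() itself, so here is a minimal sketch of the idea, assuming a DBI connection and the beepr package. `run_query_sketch()` and `beep_notify()` are hypothetical names, and passing the query function and notifier in as arguments is an assumption made so the notification logic can be exercised without a database. beepr’s built-in `"complete"` and `"wilhelm"` sounds stand in for the success and failure signals.

```r
# Hypothetical sketch; not tackr's actual run_query().
# Plays a beepr sound for the query outcome, if beepr is installed.
beep_notify <- function(outcome) {
  if (requireNamespace("beepr", quietly = TRUE)) {
    beepr::beep(if (identical(outcome, "success")) "complete" else "wilhelm")
  }
  invisible(outcome)
}

# Runs a SQL string against `con`, then signals success or failure.
# The default query_fn (DBI::dbGetQuery) is only looked up when it is used.
run_query_sketch <- function(sql, con,
                             query_fn = DBI::dbGetQuery,
                             notify = beep_notify) {
  result <- tryCatch(
    query_fn(con, sql),
    error = function(e) {
      notify("error")  # Wilhelm Scream on failure
      stop(e)          # re-raise so the caller still sees the error
    }
  )
  notify("success")
  result
}
```

With an open DBI connection `con`, `run_query_sketch("SELECT 1", con)` returns the result as a data frame and plays the completion sound once it’s ready.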
Step 4: Keep Building!
The R community at Thumbtack is already sizable for a company of our scale, with 75% of our data science and analytics team members using R regularly, and tackr is already helping us all move faster. But perhaps most exciting is what’s next for tackr as more R users join Thumbtack and others at the company adopt it for their data analysis needs. We’re already thinking about new functions that would make the package even more useful for our existing colleagues as well as those who have yet to join our team.
Have any ideas? Join us and put them into action!