Thumbtack Engineering

Steve Howard - Understanding A/B Testing with Monte Carlo Simulation

From the July 28, 2014 SoMa Tech Talk series.

Abstract: As anyone with A/B testing experience can tell you, the humble A/B test is loaded with complexity and pitfalls. Seemingly basic questions of experimental design and analysis are surprisingly difficult to get a handle on, even for those with a background in statistics. How long should I run my test? Which calculator should I use? What confidence level is appropriate for me? In this talk, I'll discuss my attempts to use Monte Carlo simulation to put these questions into a very practical context: how do various choices affect your ability to achieve a higher conversion rate when all is said and done? I'll sprinkle in some interesting statistics and engineering tips along the way.

Evan Miller - Understanding Statistics with Visualization

From the July 28, 2014 SoMa Tech Talk series.

Abstract: Evan Miller will be speaking about visually appealing ways to "supercharge" traditional descriptive visualizations with inferential statistics. Evan is the author of the popular Wizard statistics application for Mac.

A romance of a single dimension: linear git history in practice

Electrical wires in Bangkok

"I call our world Flatland, not because we call it so, but to make its nature clearer to you, my happy readers, who are privileged to live in Space."
(Edwin A. Abbott, Flatland)

A linear commit history is a fine, beautiful thing. It keeps developers sane. It keeps beastly merge commits at bay. It removes pollution from the history. It enables faster debugging. And, like any useful tool, the linear history is a useful mental construct for thinking about code and changes to that code.

A linear commit history relies on a powerful git mechanism called the rebase. You might have heard fairy tales of how rebasing is dangerous or how it can corrupt your history. Yes. Like most powerful tools, in the hands of a novice, the rebase could be problematic. However, like any powerful tool, a master craftsman can wield it carefully and precisely to achieve great ends.

This post is targeted at new developers (or non-engineers) who are looking to understand a git workflow in the context of drawing a straight line: a linear commit history.

Components of the line

You can think about the code as a series of commit objects that are organized linearly and stack on top of one another. If you were to start the codebase over and apply each commit sequentially, you'd end up with exactly the same codebase we have now. Each commit has a parent commit. This is useful because you can think of each commit as just being a difference between two states: the state of the codebase at the parent commit versus the state of the codebase at the new commit.

This difference has a more formal name in git: a diff. The concept of a diff is widely used, and can refer to any difference between file(s) in one state and the same files in a different state. Git even has special commands for viewing differences: git diff and its variations, but we won't go into those at the moment.

Each commit is identified by a unique hash, a 40-character string of hexadecimal digits such as 83ddf3be77b58395d2f00b7f51a7cec8bafd2ac8. Because two commits in the same repository almost never share even their first few characters, you can usually refer to a commit by just the start of its hash, like 83ddf3b.
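To make the idea of abbreviated hashes concrete, here is a minimal sketch that creates a throwaway repository, records one commit, and shows that the short form resolves to the same commit as the full hash. The scratch directory, the placeholder identity, and the file name greeting.txt are assumptions made up for this illustration.

```shell
# Sketch: full vs. abbreviated commit hashes in a throwaway repository.
set -e
repo=$(mktemp -d)                         # hypothetical scratch directory
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"

echo "hello" > greeting.txt               # hypothetical file
git add greeting.txt
git commit -q -m "Add greeting"

full=$(git rev-parse HEAD)                # the full hash of the new commit
short=$(git rev-parse --short HEAD)       # an unambiguous prefix of it

# Asking git to resolve the short form gives back the full hash.
resolved=$(git rev-parse "$short")
echo "full:     $full"
echo "short:    $short"
echo "resolved: $resolved"
```

If a prefix ever becomes ambiguous in a large repository, git reports an error instead of guessing, and a few more characters clear it up.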

Linear commit history

In the following diagrams, each commit will be referred to with a one-letter code. Each machine references the commit history. You can think of production, staging, and the two versions of master as pointers into the history.

production represents what our users are currently seeing in production. staging is the code about to be deployed. master is the code we are working on currently in development.

A <- B <- C <- D <- E <- F <- G
^              ^              ^
|              |              |
production     staging        master on remote
                              master on your machine

Great! So we can see here that production is behind staging, which is what we expect. So if we were to deploy what's currently on staging to production, the diagram would now show both pointing to D:

A <- B <- C <- D <- E <- F <- G
               ^              ^
               |              |
               staging        master on remote
               production     master on your machine

Now you make several new changes to the codebase on your local dev instance, and you commit them into master. Those commits have the codes H, I, J, K, and L. The commit history now looks like this:

A <- B <- C <- D <- E <- F <- G <- H <- I <- J <- K <- L
               ^              ^                        ^
               |              |                        |
               staging        master on remote         master on your machine
               production

Great. Now your local copy of master includes the changes you made - H, I, J, K, and L. The next step is to issue a git push origin master to get these five commits into a central place where they can be accessed by other developers and by other systems.

A <- B <- C <- D <- E <- F <- G <- H <- I <- J <- K <- L
               ^                                       ^
               |                                       |
               staging                                 master on your machine
               production                              master on remote
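The push step can be tried safely with a rough sketch like the one below. It builds a stand-in "remote" (a bare repository in a temporary directory) and a local repository, then counts how many commits local master has that origin/master lacks, before and after pushing. The directory names, the placeholder identity, and the empty commits labelled G through L are assumptions for illustration only.

```shell
# Sketch: local master gets ahead of the remote, then a push catches the remote up.
set -e
work=$(mktemp -d)                              # hypothetical scratch directory
git init -q --bare "$work/remote.git"          # stands in for the central server
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master        # make sure the branch is named master
git config user.name "Example Dev"             # placeholder identity
git config user.email "dev@example.com"
git remote add origin "$work/remote.git"

git commit -q --allow-empty -m "G"
git push -q origin master                      # remote and local both at G

# Local work: five new commits that exist only on your machine.
for name in H I J K L; do
  git commit -q --allow-empty -m "$name"
done

ahead_before=$(git rev-list --count origin/master..master)
git push -q origin master                      # publish H through L
ahead_after=$(git rev-list --count origin/master..master)

echo "commits ahead before push: $ahead_before"
echo "commits ahead after push:  $ahead_after"
```

The range origin/master..master reads as "commits reachable from master but not from origin/master", which is exactly the gap between the two master pointers in the diagrams.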

Now you want to update the staging environment to the latest version of master, so you'll go to your internal deployment tool and start the deployment process. The result will be that staging now points to the commit history at commit L, and the staging environment will showcase the newer version of the codebase.

A <- B <- C <- D <- E <- F <- G <- H <- I <- J <- K <- L
               ^                                       ^
               |                                       |
               production                              master on your machine
                                                       master on remote
                                                       staging

If you want to also put those commits into production, you'd now issue another deployment that will update the production pointer to master:

A <- B <- C <- D <- E <- F <- G <- H <- I <- J <- K <- L
                                                       ^
                                                       |
                                                       master on your machine
                                                       master on remote
                                                       staging
                                                       production

Feature branches

The linear commit history described above works great when you're just working in master, but what happens when you want to make a big feature that has a multi-week development timeline? You don't want to make big features in master, because you'd prevent yourself from making any small bugfixes or tweaks in master, so you create a new branch. If you think of the main commit history diagrammed above as the trunk of a tree, it makes sense that you might branch off that trunk.

Let's say the new branch is going to be a new interface for the app that uses smoke signals to communicate with old-fashioned users, so we'll call the new branch smoke-signals and we'll create the branch with git checkout -b smoke-signals (do this while master is checked out, so your new branch will start at master). Detailed versions of this command (and its caveats) are below. We'll focus on the high level diagram here.

Working from the diagrams above, we hide the earlier part of the history to make it easier to read. Your new branch smoke-signals will start at the same commit as master:

J <- K <- L
          ^
          |
          master on your machine
          master on remote
          staging
          production
          smoke-signals
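A small sketch of the branch creation step, run in a throwaway repository: it confirms that a freshly created smoke-signals branch points at the same commit as master. The scratch directory, placeholder identity, and file name app.txt are assumptions for this example.

```shell
# Sketch: a new branch starts at the commit that was checked out when it was created.
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the branch is named master
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

echo "app" > app.txt                       # hypothetical file
git add app.txt
git commit -q -m "L"

# Create the feature branch while master is checked out, and switch to it.
git checkout -q -b smoke-signals

current=$(git rev-parse --abbrev-ref HEAD)     # name of the branch now checked out
master_tip=$(git rev-parse master)
feature_tip=$(git rev-parse smoke-signals)
echo "on branch:     $current"
echo "master:        $master_tip"
echo "smoke-signals: $feature_tip"
```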

That's exactly what we expected. You start working on smoke signals, and make your first commit that can show a "hello" signal. You commit this and it has commit-id M.

J <- K <- L <- M
          ^    ^
          |    |
          |    smoke-signals
          |
          master on your machine
          master on remote
          staging
          production

You make a few more commits into the smoke-signals branch:

J <- K <- L <- M <- N <- O <- P
          ^                   ^
          |                   |
          |                   smoke-signals
          |
          master on your machine
          master on remote
          staging
          production

Suddenly the support team emails you about a high-priority bug that you need to fix. You switch branches from smoke-signals to master (git checkout master), and get to work on the bugfix. (Note that to switch branches you should have a clean working tree, which means you need to either commit your work or stash it before switching.) When you commit the bugfix it is given a new unique ID, Q. Q is based off of master, so now the commit history has two very distinct branches, indicated in the diagram with +. Also note that master on your machine is now at Q.

                                 smoke-signals
             / <- M <- N <- O <- P
J <- K <- L +
          ^  \ <- Q
          |       ^
          |       |
          |       master on your machine
          |
          master on remote
          staging
          production
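The note about needing a clean working tree can be sketched as follows: unfinished work on smoke-signals is stashed, the bugfix is committed on master, and the stashed work is restored afterwards. The repository, placeholder identity, and the files app.txt and fix.txt are assumptions invented for this illustration.

```shell
# Sketch: stash unfinished feature work, fix a bug on master, then resume.
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the branch is named master
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "L"

git checkout -q -b smoke-signals
echo "half-finished signal" >> app.txt     # uncommitted work in progress

git stash > /dev/null                      # shelve the change; working tree is clean again
git checkout -q master

echo "bugfix" > fix.txt                    # the urgent fix, commit Q
git add fix.txt
git commit -q -m "Q"

git checkout -q smoke-signals
git stash pop > /dev/null                  # bring the unfinished work back

branch=$(git rev-parse --abbrev-ref HEAD)
echo "back on $branch with:"
cat app.txt
```

Committing the work in progress instead of stashing it would also leave a clean tree; the stash is simply the lighter option when the work is not ready for a commit.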

You want to get the bugfix Q out to production right away, so you first git push origin master to get your commit into the remote server.

                                 smoke-signals
             / <- M <- N <- O <- P
J <- K <- L +
          ^  \ <- Q
          |       ^
          |       |
          |       master on your machine
          |       master on remote
          |
          staging
          production

Then you issue a deploy to update the staging and production pointers as well. After these steps, the diagram looks like this:

                                 smoke-signals
             / <- M <- N <- O <- P
J <- K <- L +
             \ <- Q
                  ^
                  |
                  master on your machine
                  master on remote
                  staging
                  production

Awesome, your bugfix is in production and now you can go back to working on smoke-signals. You issue git checkout smoke-signals to get back to your project, and write some more code and get smoke-signals ready for primetime. You issue a final commit and it has an ID of R:

                                      smoke-signals
             / <- M <- N <- O <- P <- R
J <- K <- L +
             \ <- Q
                  ^
                  |
                  master on your machine
                  master on remote
                  staging
                  production

In the meantime, your colleague has been working on master and created another bugfix S:

                                      smoke-signals
             / <- M <- N <- O <- P <- R
J <- K <- L +
             \ <- Q <- S
                  ^    ^
                  |    |
                  |    master on remote
                  |
                  master on your machine
                  staging
                  production

This is starting to get complicated, so take a deep breath and look closely at the diagram. Your goal in the next three steps will be to get smoke-signals on production.

First: update your local copy of master to what your colleague has with git checkout master and git pull --rebase origin master. Note that you might also choose to run git fetch if git reports that your branch is ahead of origin/master by N commits (this is not actually a bug or an issue, so you can skip the fetch if you choose).

                                      smoke-signals
             / <- M <- N <- O <- P <- R
J <- K <- L +
             \ <- Q <- S
                  ^    ^
                  |    |
                  |    master on remote
                  |    master on your machine
                  |
                  staging
                  production

Second, and this is important, you will stack smoke-signals on top of master by first going to the branch with git checkout smoke-signals, then issuing a rebase with git rebase master. By doing this you essentially detach your branch from where it departed from master and rewire the diagram so your feature branch now sits on top of master.

$ git checkout master
$ git pull --rebase origin master
$ git checkout smoke-signals
$ git rebase master
# ... resolve any conflicts, then run `git rebase --continue`

             / <- (nothing here!)
J <- K <- L +
             \ <- Q <- S <- M <- N <- O <- P <- R
                  ^    ^                        ^
                  |    |                        |
                  |    master on remote         smoke-signals
                  |    master on your machine
                  |
                  staging
                  production

During a rebase, you might have to resolve conflicts if any files changed in one branch were also changed in the other branch. This is totally normal, and you should carefully consider the conflicts to make sure they're resolved correctly. After you resolve all conflicted files, type git rebase --continue to continue the rebasing process.
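For readers who have never seen a rebase stop on a conflict, here is a rough sketch that provokes one on purpose in a throwaway repository and then resolves it. The file signal.txt, its contents, and the choice to keep the feature branch's version are all assumptions for this example; in real work you would edit the conflicted file by hand and decide line by line.

```shell
# Sketch: a rebase that hits a conflict, gets resolved, and continues.
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the branch is named master
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

echo "signal: none" > signal.txt           # hypothetical file
git add signal.txt
git commit -q -m "L"

git checkout -q -b smoke-signals
echo "signal: smoke" > signal.txt          # feature branch edits the line
git commit -q -am "M"

git checkout -q master
echo "signal: flag" > signal.txt           # master edits the same line differently
git commit -q -am "Q"

git checkout -q smoke-signals
git rebase master > /dev/null 2>&1 || echo "rebase stopped on a conflict"
git status --short                         # lists signal.txt as unmerged

# Resolve: here we simply keep the feature branch's version of the line.
echo "signal: smoke" > signal.txt
git add signal.txt                         # mark the conflict as resolved
GIT_EDITOR=true git rebase --continue > /dev/null   # keep the original commit message

git log --format=%s                        # M now sits on top of Q
```

If a rebase goes somewhere you did not intend, git rebase --abort returns the branch to the state it was in before the rebase started.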

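Now you'll want to move your master to be at R. This is the only time you should issue a merge command. Be sure to use the --ff-only flag, which makes git refuse the merge rather than create a merge commit if master cannot simply be moved forward.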

$ git checkout master
$ git merge --ff-only smoke-signals

             / <- (nothing here!)
J <- K <- L +
             \ <- Q <- S <- M <- N <- O <- P <- R
                  ^    ^                        ^
                  |    |                        |
                  |    master on remote         smoke-signals
                  |                             master on your machine
                  |
                  staging
                  production
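The rebase-then-fast-forward sequence can be sketched end to end in a scratch repository. The example first shows --ff-only refusing to merge while the branches have diverged, then rebases and merges cleanly. The shortened history (L, Q on master; M, R on the branch), the file names, and the placeholder identity are assumptions for illustration.

```shell
# Sketch: --ff-only refuses a diverged branch, then succeeds after a rebase.
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the branch is named master
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "L"

git checkout -q -b smoke-signals
echo "hello signal" > feature.txt
git add feature.txt
git commit -q -m "M"
echo "goodbye signal" >> feature.txt
git commit -q -am "R"

git checkout -q master
echo "bugfix" > fix.txt
git add fix.txt
git commit -q -m "Q"                       # master and smoke-signals have now diverged

# A fast-forward is impossible while the branches diverge, so git refuses.
if ! git merge --ff-only smoke-signals > /dev/null 2>&1; then
  echo "fast-forward refused: branches have diverged"
fi

git checkout -q smoke-signals
git rebase -q master                       # replay M and R on top of Q

git checkout -q master
git merge -q --ff-only smoke-signals       # now master just slides forward to R

history=$(git log --format=%s master | tr '\n' ' ')
echo "master history, newest first: $history"
```

The refusal is the point of the flag: if the merge cannot be a pure pointer move, you find out immediately instead of discovering a merge commit in the history later.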

Now your local master is at R. Push it with git push origin master, then use the same deploy process described above to get your new smoke signals feature into production.

J <- K <- L <- Q <- S <- M <- N <- O <- P <- R
                                             ^
                                             |
                                             smoke-signals
                                             master on your machine
                                             master on remote
                                             staging
                                             production

Notice how you've now flattened the commit history into a single line again. Exactly what we wanted.
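One way to check that a history really is a single line is to count its merge commits, as in the sketch below. It builds a short linear history, confirms the count is zero, then deliberately creates a merge commit with --no-ff to show what the check catches. The repository, commit names, and branch name side are assumptions for this example.

```shell
# Sketch: count merge commits to check whether a history is linear.
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the branch is named master
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

for name in J K L; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "$name"
done

merges_linear=$(git rev-list --count --merges HEAD)   # commits with two or more parents
echo "merge commits in linear history: $merges_linear"
git log --oneline --graph                  # one straight column of commits

# For contrast, force a merge commit, which is what the workflow above avoids.
git checkout -q -b side
echo "side" > side.txt
git add side.txt
git commit -q -m "side work"
git checkout -q master
git merge -q --no-ff -m "Merge side" side

merges_after=$(git rev-list --count --merges HEAD)
echo "merge commits after a --no-ff merge: $merges_after"
```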

Conclusion

The linear commit history is a useful tool for managing a complex codebase. We have found it scales well with a growing codebase and engineering team, and we recommend the linear history as a useful strategy for anyone considering ways of managing workflows.

Thumbtack in Three Quotes

In everything that we do, Thumbtack strives to be comprehensive, super-helpful, easy, dependable, and encouraging. These core values inform our decisions across the board, from product-based changes to social interactions with each other, and they manifest themselves in our passion for our work, our obsession with our customers, and our relentless self-improvement. Here's why.

I'm a rising sophomore Computer Science and Robotics major at Carnegie Mellon University. I like robots. They're great. But I love Thumbtack - the people, the company, the product. I'm finishing a summer internship here at Thumbtack, and it's been one of the greatest experiences of my life. I joined because I felt like I could do the following: first, work on something that has a profound impact on people's lives; second, work at a company that really believes in the power of a great culture; and third, learn from some of the most ridiculously intelligent (yet remarkably humble) engineers I'd ever met.

But you've already heard from Brandon about why you should definitely want to intern (or work full-time!) at Thumbtack, so instead, I'll do my best to describe our culture. As with any company's culture, it's nearly impossible to describe in a few words, so I'll offer a few quotes that I think aptly capture what's unique about Thumbtack Engineering.

"For, firstly, the social instincts lead an animal to take pleasure in the society of its fellows, to feel a certain amount of sympathy with them..." -Charles Darwin

We've got a diverse group of interesting people who are sociable and intellectually curious. We love spending time with each other, and have tons of fun studying CS theory together (if that's your thing, of course!), reading and debating literature at a book club, and playing foosball at an almost-amateur level. We've gone urban hiking in the city, biked to Sausalito, and explored the 'Desolation Wilderness' at our engineering retreat in Lake Tahoe. Being so close socially lets us feel truly comfortable working with, understanding, and having fun with each other. It's important to everyone at Thumbtack that there's a balance between work and life, and that makes us more excited to work every day.

"Everything is an experiment." -Tibor Kalman

At Thumbtack, everything – everything – is an experiment. We truly believe that data are a vehicle that we can use to answer the toughest questions, whether the data be quantitative, anecdotal, or anything in between. This belief pervades everything we do, both internally and from a product standpoint.

We even experiment in how we organize and review ourselves. In our Product Process Review, we review positive and negative aspects of the current product process (the way we, as members of the product team – for us this includes PMs, designers, and engineers – organize ourselves and the way projects travel through the pipeline), and develop (if needed) a new methodology to solve the problems we faced and the problems we think we'll face in the future. One of the greatest things about this is that although we, as engineers, are able to give feedback about product and engineering decisions, we also get an opportunity to give impactful feedback about the way in which we give feedback (inception, right?). The fact that this meeting between every person working on our product is so successful is a testament to how truly collaborative we are, and how well engineers, designers, and project managers are able to effectively communicate with each other.

We're also strictly data-driven when it comes to product changes. Our Engineering Technical Lead, Steve Howard, is an avid statistician when he's not talking to our engineers, interviewing candidates, or playing foosball (what a nerd!). Steve has created an environment of statistical rigor, where we analyze everything we learn with a critical eye. Interacting with him has grown within many of us a love for the fundamentals of statistics, experimental rigor, and data – and that has made us better engineers and better people.

"Love is a quality, not a quantity." -Vanna Bonta

I draw two important things from this. First, Thumbtack Engineering emphasizes quality as an important factor to balance with speed. While many startups push for fast release cycles and development of a minimum viable product as quickly as possible, we recognize the importance – the necessity – of maintainable, readable, and high quality code. We focus on developing our codebase and our individual coding style every single day: engineers code review every change that's made to the website (in order to promote high quality code and guard against knowledge siloing), and improve each other's performance with every review.

Secondly, and perhaps more importantly, we, as a company and as individuals, love what we do. It's that simple. We're truly passionate about changing people's lives and making a difference. The passion that pervades our work is apparent, and nothing can stand in the way of us doing our best to improve the lives of our customers. There's simply nothing more powerful than real passion about helping people, and that's what makes us tick.

Thumbtack is a unique and wonderful place. The culture here, without a doubt, is one of the things that makes us come back for more, day after day. If company culture interests you, or if this sounds great to you, let us know.

Eng Retreat in Tahoe: Robots and Desolation

Within engineering, we like to have consensus on decisions, especially those that have significant and lasting import. So before we left for a multi-day retreat in Tahoe, the engineering team asked itself an important question.

[Image: robots-question]

The results were indicative of just how great of a team we have.

[Image: survey-results]

And with that, we packed our robot boxes and set off for the Sierras.

We hiked in Desolation Wilderness, spent time stargazing, caught (and ate) crawfish, tossed Frisbees, played Resistance, cooked and ate incredible food, and all-in-all had a fantastic time—and no one got too hurt.

We'll let the following pictures tell more of the story.

[Photos: everyone, hike-1, hike-2, hike-3, hike-4, lake-1, crawfish, lake-2, building-1, building-2, building-3, robots]
