Thumbtack Engineering

Fearless migrations to backend services

Web applications usually start out as a single codebase. In time, that little monolith grows. Before you know it, the application, team, and business have all grown to the point that the not-so-little monolith is a bottleneck. It's time to start breaking it down into manageable, orthogonal services.

There are a lot of good reasons for moving toward a service-oriented architecture - less disruptive deployments, flexibility in choice of language and tooling, smaller codebase, better fault isolation - but I'm going to assume that you are already sold on the idea.

Switching over to a newly created service is a difficult problem. If that service stores data, you likely want to move the existing data over to it. That would be easy if you took down the relevant parts of the site for a few hours while moving the data, but chances are you don't want to do that. Likewise, you want to avoid downtime caused by bugs or performance issues in the new service that are only discovered once it is exposed to production traffic.

The goal of this plan is to avoid any scary "flip the switch and pray" moments. Every move you make should be small, low risk, and easy to undo.

Preparation

Before writing the service, before thinking about the API, before even making changes to the monolith codebase, think about how ready the monolith is to have major changes made to it. Are there unit or functional tests for the flows you'll be working with? It's nice to have assurance that your changes aren't breaking anything. Besides, this stuff should have had tests all along.

Another question: are you capturing performance and error rate metrics? Do you have good dashboards set up for these metrics? Do you know how often each endpoint is being hit? These are always good to have, but they are especially important when making major changes to the architecture. It's easy to accidentally make things slower or introduce an error case and not find out about it. Plus, if you make things faster, it feels good to show off the graph in Slack.

It's also well worth your time to clean up unused features. Why waste time porting something just to turn around and delete it? Cleaning up old features is also a good exercise to re-familiarize yourself with everything that happens in that part of the code. You may find some surprises!

The Beauty and the Beast

The existing monolith might be a bit of a beast. In all likelihood, the logic we want to extract is spread around in several odd corners. We'll need to tame the beast before we can introduce it to our beautiful new service. We need to gather up all that logic and move it behind an interface. This interface will eventually become a template for the API exposed by the service, and will be directly implemented by the service's client library. This interface also makes a good point for adding an instrumentation decorator that captures metrics like how often different methods are called and how long they take.
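As a rough sketch of what this looks like in practice (all names here are hypothetical, not Thumbtack's actual code), the interface is just a set of methods, and the instrumentation decorator wraps any implementation of it:

```python
import time


class ReviewStore(object):
    """The interface the gathered-up logic moves behind."""

    def get_reviews(self, user_id):
        raise NotImplementedError


class InstrumentedReviewStore(object):
    """Decorator that records call counts and latency for any ReviewStore."""

    def __init__(self, wrapped, metrics):
        self.wrapped = wrapped
        self.metrics = metrics

    def get_reviews(self, user_id):
        start = time.time()
        try:
            return self.wrapped.get_reviews(user_id)
        finally:
            # Record the method name and elapsed time; the metrics backend
            # is whatever your monolith already uses.
            self.metrics.record('get_reviews', time.time() - start)
```

Because the decorator implements the same interface, callers can't tell whether they're talking to the instrumented version, the old code, or (eventually) the service's client library.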

This interface needs to encapsulate the implementation well enough that it works just as well for the existing code as it will for the new service. If some piece of functionality is awkward to move behind the interface, it will likely also be awkward to integrate into the service, and it is much cheaper to change the interface now than after the service has been built. If the code is large and complicated, it may be worthwhile to move things behind the interface one piece at a time, so each deploy stays small.

Building that interface should clarify what the service needs to do. Armed with that knowledge, decide on the design of the service. This is a good point to write an RFC. Getting your design down on paper helps find bugs in your thinking. As someone once said, "weeks of coding can save you hours of planning." Don't know what to write? Pretend a three-year-old is questioning you, and answer "why?" about the decisions you made. Reconsider your plans when you don't find your own arguments convincing.

Building a service is basically just building a smaller web app, so the process of actually writing it doesn't require much explanation. That said, there are a few things you should consider. What makes the connection between the monolith and the service secure and private? What metrics do you need to monitor on the new service? How are failures handled: what happens to the monolith if the service goes down, what happens to the service if one of its dependencies goes down? What's the plan for restoring from backups? Trying out failure cases (try restoring from a backup) will often find issues in these processes. Remember, a backup that you can't restore from isn't a backup.

Bringing the service up

Before switching production to the shiny new service, you want to feel sure that it will handle the load and will be stable. To do that, we'll test it with some traffic. To capture the full variety of possible values, there is nothing quite like production traffic, so that's what we'll use.

Using that interface you just created, make a proxy within the monolith. This proxy will start duplicating some traffic to the new service. The new service isn't in production "for real" yet, so this will still rely on the existing code's results. Errors the new service returns should be logged but otherwise ignored. If the call to the new service is expected to take a meaningful amount of time, consider running it in parallel with the existing logic or even running it asynchronously. Either way, make sure there is a timeout so it can't slow down the application too much if it is responding slowly.
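A minimal sketch of such a proxy, assuming a hypothetical get_reviews method on both the old implementation and the new service's client (none of these names come from real Thumbtack code):

```python
import logging


class DuplicatingProxy(object):
    """Serves every request from the old implementation while shadowing the
    call to the new service. The service's result is discarded and its
    errors are logged but ignored, so users never see them."""

    def __init__(self, old_impl, service_client, timeout_seconds=0.2):
        self.old_impl = old_impl
        self.service_client = service_client
        self.timeout_seconds = timeout_seconds

    def get_reviews(self, user_id):
        result = self.old_impl.get_reviews(user_id)  # still the source of truth
        try:
            # The timeout keeps a slow service from slowing the application.
            self.service_client.get_reviews(user_id,
                                            timeout=self.timeout_seconds)
        except Exception:
            logging.exception('new service call failed; ignoring')
        return result
```

In a real codebase the shadow call would ideally run in parallel or on a background queue, so the user never waits on it at all.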

Start with a small percentage of traffic going to the new service, then gradually ramp up the load so you can see and fix performance issues while they are still minor. Ramp in geometric increments (1, 2, 4, 8, 16%) rather than linear ones (1, 5, 10, 15%, etc.) to keep the relative increase at each step small and constant. It doesn't hurt to have a feature flag system that lets you quickly turn off the traffic to the service without deploying a configuration change, just in case.
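One simple way to implement the ramp (a sketch; the hashing scheme is an assumption, not a prescription) is to bucket each user deterministically, so the groups nest as the percentage grows:

```python
import zlib


def in_rollout(user_id, percent):
    """True if this user falls in the first `percent` of traffic.

    Hashing the user id (instead of calling random.random() per request)
    keeps each user's assignment stable, and makes the buckets nested:
    everyone in the 2% group is still in the 4% group when you ramp up.
    """
    bucket = zlib.crc32(str(user_id).encode('utf-8')) % 100
    return bucket < percent
```

Reading `percent` from a feature flag or config store, rather than a constant in code, gives you the quick off switch mentioned above.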

In order to know it's safe to move the load up to the next step, you need metrics that tell you how it is currently doing. Make sure you aren't seeing errors from the new service. Watch both the time the service takes to serve requests and the total time it takes to run both the new and the old code. If it makes sense for the type of service, also check how often the result returned from the service differs in a meaningful way from the existing code's result.

If this service does processing that has real-world side effects (such as sending email), send those to a sandbox account or dummy backend for now.

If the service stores data, queries to it will return the wrong results at first, since the historical data is missing. Once you are duplicating 100% of write operations, you can start backfilling data to fix that (if you backfill before the service receives all writes, the data will just fall out of date again). Watch out for performance problems that only crop up once all the data is present; a problem like a missing index can go unnoticed until there is some real volume of data. To avoid impacting the performance of either the existing or the new datastore, you may need to throttle the backfill process.
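The backfill itself can be a simple batched copy loop with a pause between batches. A minimal sketch, where fetch_batch and bulk_write are made-up method names standing in for whatever your datastore and service client actually expose:

```python
import time


def backfill(old_store, new_service, batch_size=500, pause_seconds=1.0):
    """Copy historical rows into the new service in batches, pausing
    between batches so the backfill never saturates either datastore."""
    last_id = 0
    while True:
        rows = old_store.fetch_batch(after_id=last_id, limit=batch_size)
        if not rows:
            break
        new_service.bulk_write(rows)
        last_id = rows[-1]['id']
        # Throttle: give both datastores room to serve live traffic.
        time.sleep(pause_seconds)
```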

The not-so-big switch

Once the service has been running on 100% of traffic for a while without problems, it's time to start using it for real. At this point both the old code and the new service are working on all requests, so we'll simply switch the roles for some portion of requests. Again here, we'll gradually ramp up the proportion of traffic we switch over.

To switch roles for a request, switch which implementation is sandboxed when interacting with the outside world and switch which results are used. Even once 100% of the operations are switched over to the new service, keep duplicating operations (at least, write operations) to the old implementation. That keeps the data in that implementation fresh so you can switch back to it if need be. This delays the point of no return until after the new service has been serving 100% of production load for long enough for you to be totally confident in it.
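In terms of the proxy described above, the switch only changes whose result is trusted; every write still goes to both implementations. A sketch, with the same caveat that all names here are illustrative:

```python
class RoleSwitchingProxy(object):
    """Both implementations see every write; `use_new` (e.g. a feature-flag
    lookup) decides whose result the caller gets."""

    def __init__(self, old_impl, new_impl, use_new):
        self.old_impl = old_impl
        self.new_impl = new_impl
        self.use_new = use_new  # callable: user_id -> bool

    def add_review(self, user_id, text):
        old_result = self.old_impl.add_review(user_id, text)
        new_result = self.new_impl.add_review(user_id, text)
        # Returning one result while still writing both keeps the losing
        # implementation's data fresh, so rolling back stays cheap.
        return new_result if self.use_new(user_id) else old_result
```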

Finishing up

The old implementation should end not with a bang but with a yawn. Crossing the point of no return should be the boring part. Stop writing to the old implementation, clean up the code, take a backup, drop the table. Share some celebratory cookies.

While waiting to shake out any bugs before killing off the old implementation, think back over how things went. What helped the process go well? What was unnecessary? What could have been better? Make a list of things to do differently next time. Share your insights. Order cookies.


8 Questions Every Female Software Engineer Should Ask Before Taking a Job


There have been countless articles written recently about women in tech, especially engineers, and why many of them don’t stick around. When I think back to the month I spent interviewing for engineering jobs before I joined Thumbtack, I don’t remember being at all concerned about vetting companies for female-friendliness. Luckily, I ended up in a great place, but I know plenty of other women who haven’t been as fortunate.

That got me thinking about some questions I should have asked up front to screen companies for female friendliness.

1. What kinds of things do team members do together besides work?

Last year, while I was writing the original version of this post, a female friend of mine asked this question at a company where she interviewed for a software engineering position. The response was something like this:

“We do a lot of things outside of work together. I actually went surfing with one of my coworkers this morning. But if you wanted to find someone to, I don’t know, go shopping with you, I’m sure you could.”

Because, you know, all women love shopping. Gender-based assumptions like this would make me worry about what other assumptions might be made. Not all answers to a question like this will give you such a clear signal, but the answer you do get will give you a good feel for the personalities of the people you’ll be working with.

This question can also suss out how central drinking is to social events. I’m not saying that women don’t like drinking, but team bonding that is centered around drinking can be an indicator of a brogrammer culture. At Thumbtack we brew beer and take mixology classes together. But even when we do those things, the focus is not on consuming excessive amounts of alcohol. Rather, we do these things to learn something new, appreciate the drinks, and get to know each other better.

2. How often do people ask questions? And how do people ask questions?

The right answer to this question should be “all the time.” Women are more likely than men to suffer from impostor syndrome. We believe that our success is a result of luck, timing, or deception rather than our own intelligence or competence. This can make it difficult for us to ask for help, for fear of being discovered as an impostor. It’s easier to ask for help when a supportive and humble culture is already established, where engineers are constantly asking for and receiving help from each other in a warm and supportive manner.

3. What sorts of things do you do as a company to ensure high-quality code and continued learning?

Processes for reviewing code and a culture of continued learning can be indicators of humility. Specifically, look for engineering teams that:

  • Pair program: It doesn’t have to be required or happen all the time, but teams with engineers that pair with each other, even a couple of times a week, are likely to be teams who value collaboration. Because engineering can sometimes be an isolating profession for women, this type of collaborative environment can be great for female engineers.

  • Participate in code review: A great follow-up question here is “Why are code reviews valuable to you?” Bonus points go to the company with engineers that recognize that not only do code reviews help ensure high-quality code in the codebase, they also create more opportunities for engineers to learn from each other and learn about different parts of the codebase.

  • Take online classes together and provide an education stipend: A culture where engineers are continually learning reminds us that everyone is still learning and no one knows everything.

Pay attention to the tone with which your interviewer speaks about these topics. If a company encourages pair programming, but your interviewer doesn’t recognize the benefits, this is a red flag.

4. Are there any women on the engineering team? If so, what positions do they hold?

If the answer is no, it isn’t necessarily a red flag. Many teams want to hire more women, but there aren’t enough of us out there. In this case, you could follow up with “Is it important to you to have a diverse team? Why or why not?”

If the answer is yes, however, you have a great opportunity to speak with someone directly about what it’s like to be a female engineer at that company. Ask for their contact information so you can reach out to them if you don’t meet them during your interview. When you do meet them, ask them about their experience as a woman on the team. If you have any concerns, ask specific questions to address them.

Of the nearly 30,000 people surveyed in the 2015 Women in the Workplace Study, 38 percent of women said the people who assisted them most in their career were mostly or all men, compared to 63 percent of men. Because women tend to have mostly female or mixed networks, while men tend to have mostly male networks, having women in leadership positions may give you more access to mentorship opportunities.

5. Are your teammates open to receiving feedback? Do you give and receive feedback often?

Everyone has things they are working on improving, but some habits can make women feel more isolated. For example, consider the situation where a man interrupts a woman in a meeting. Given that women are often told "Don't let this happen to you!", it can inspire some pretty self-defeating or even angry feelings when it does happen.

If that, or something like it, happens to you, you want to be able to give that person some constructive feedback - and for that person to take it to heart. I’ve had to do this a couple of times, and while it was difficult, it was incredibly heartwarming to see those people really appreciate the feedback and work hard on improving.

6. Are any engineers on the team involved in programs aimed at supporting women in the industry (e.g. PyLadies, Women Who Code, Hackbright, etc.)?

I found out about Thumbtack because three of the nine engineers on the team (at the time I was hired) had volunteered at Hackbright, an organization that provides engineering fellowships for women. This told me that Thumbtack cares about hiring more women in engineering roles.

7. Does your company offer unconscious bias training?

First of all, make sure you know what unconscious bias training is, because your interviewer might not know - and you might have to explain it.

If the answer is yes, that’s awesome. Likely that means the company is prioritizing nurturing an inclusive and fair working environment. Follow up with “Who is encouraged to take the training?” If the answer is “everyone,” that’s even better. Unfortunately, unconscious bias training is still relatively rare and smaller companies may not have the resources to provide it. If the company does not offer it, ask “Have you considered it? If so, why didn’t you choose to implement it?” It’s likely your interviewer will not know the answers to these questions. If that’s the case, ask them for the contact information of someone who does, and ask that person.

8. Are there any women in roles at the director level or higher? If so, what proportion of the leadership at the company are women?

According to the 2015 Women in the Workplace Study, 32 percent of director-level applicants are women. When your applicant pool is that skewed, it can be difficult to construct a gender-balanced leadership team. A company who has women in these positions despite this difficulty may be more committed to nurturing a diverse workplace. More women in leadership roles also make it more likely the female perspective will be considered in important decisions for the company and culture. At Thumbtack, 38 percent of director-level or higher positions are occupied by women.


I feel incredibly grateful to have found such a fantastic culture at Thumbtack, a workplace that lacks many of the issues female developers face. Many thanks to the engineers on our team who have worked so hard to build this culture.

I hope this post will help other women who are looking for a job as a software engineer - or who might be looking in the future. You might also want to check out the interview prep events hosted by Women Who Code, as well as posts like Self Care Strategies for the Software Engineer Job Search on the Hackbright Academy blog.

Got any resources you want to share? I’d love to hear about them in the comments.

Bio:

Katie Thomas is a software engineer at Thumbtack, currently working on the product experience.

Fast iOS Functional Testing

Here at Thumbtack we use KIF to drive the functional tests for our iOS apps. For those unfamiliar with functional testing in iOS, KIF essentially allows us to write tests that programmatically mimic a user: touching, swiping and typing.

We're very much in full swing, working our way through a long list of high-priority features; the mythical feature-complete nirvana is still far beyond the horizon. Every new feature (and sometimes a refactor) introduces more functional tests. A major limiting factor of functional tests is that they're slow. For every programmatic tap or swipe, we have to wait for iOS to perform its elaborate animations. Our functional test runtimes were rapidly approaching 10 minutes. Compounding the situation, our CI setup runs tests on more devices and iOS versions than a developer typically has on hand, resulting in a high likelihood that a branch will fail CI when first pushed.

We like to iterate quickly, so something had to change. What if we could disable animations?

Disabling Animations

We were a little hesitant going down this road; we liked that our functional tests were a (somewhat) accurate reproduction of real world usage. Fortunately, disabling animations proved to be a quick and simple task, thanks to Method Swizzling. Replace a method here, replace a method there and, hey presto. We gave it a shot.

At this point the assumption was bold: swizzle some methods, reduce CI runtimes by 100x, and head out to the local watering hole. Nice.

Not so fast - or should I say - not fast enough?

We immediately noticed that our tests weren't as reliable as they once were. KIF would time out waiting for views to appear, our state machine would raise exceptions about invalid state transitions, assertions would fail, or, most ominous of all, we'd hit Core Data's infamous "could not fulfill a fault" exception.

Race conditions everywhere


Some of the state transition errors, assertions and Core Data crashes looked suspiciously similar to crashes we had been seeing in production, yet unable to reproduce locally. That's when the penny dropped. We had stumbled upon a method by which we could really put our apps through their paces, cause them some stress. Animations had been acting as a shroud over our eyes, denying us the opportunity to see the truth about the (in)stability of our apps.

A sizable portion of the work required to get CI green again was of a pedantic nature. KIF (and probably any iOS functional testing framework) makes it easy to write tests that pass based on implicit assumptions. For example, with animations enabled, waiting for view A to appear also meant that view B was very likely visible; once animations are disabled, these assumptions may no longer hold. While this was quite annoying, it was a small price to pay for the opportunity to fix some of our most common production crashes.

At this point, you may be asking: "All of your users will be using the app with animations enabled, how does disabling them represent a realistic scenario?". There are many, many factors that determine when exactly a particular unit of code may be executed. Now multiply these factors by all the different models of iOS devices in our customers' hands, and further multiply that with unique conditions under which their device is operating (memory pressure, network speeds, etc...). I like to think of having animations disabled as the most extreme conditions our apps might experience. By testing at much lower thresholds, we hopefully reduce the risk that variations in real world usage can result in a crash.

What We Fixed

While fixing bugs is always a rewarding experience, my personal favorite outcome of this endeavor was that it led to a much needed refactoring of our authentication mechanisms. But for those of you looking for something more tangible, here are a few of the bugs we fixed:

Sign out

When a user signs out of the app, we reset our Core Data stack; the main queue context and private persistent store contexts are deallocated and re-instantiated. With animations disabled, we immediately began to see Core Data faulting exceptions. Some controller(s) were attempting to perform operations using an NSManagedObject whose NSManagedObjectContext had been deallocated.

The immediate cause of the crashes was that controllers held strong references to managed objects. We fixed those by instead holding a reference to the NSManagedObjectID, using -[NSManagedObjectContext existingObjectWithID:error:] to load the object as needed, and handling a nil return value.

The bigger problem was that the app could get into such a state at all: controllers were still operating on data while sign-out actions were being performed. This broke down into two problems:

  1. Calls to our server-side API may complete after sign out, triggering controllers to attempt to reload data. This was solved by first waiting for all API calls to complete, and then performing a non-animated pop to our root view controller.

  2. We didn't have a state where we could be certain that all sign out actions were now complete. Previously, all sign out actions were performed on transition to a 'Not Authenticated' state. We moved all of these actions to a new 'Unauthenticating' state, and we could be certain that when we transition to 'Not Authenticated' all sign out actions have been performed and we can safely reset the Core Data stack.

NSOperation's completionBlock

In response to a server-side API call, we enqueue an NSOperation to map the JSON data onto objects. We were using -[NSOperation completionBlock] to perform a few more actions. A key detail of the behavior of completionBlock is that it's called after the operation is marked as finished. The documentation is clear on this, but it's a detail we overlooked.

Overzealous use of temporary private queue contexts

In our Thumbtack for Pros app we present Pros with invites to bid on jobs; these invitations are modeled, literally, as an Invite. Invites can expire or, for various reasons, become unavailable. When we detect that an Invite is no longer available, we delete it in a temporary child private queue context. We delete in a child context because there may be many Invites to delete and we want those deletions to happen in a single transaction.

We have another mechanism that constantly watches the main queue context for any changes to an Invite and then attempts to insert, update, or remove a corresponding unmanaged InboxItem object. This mechanism created a temporary child context to process the changes to the invites. With animations disabled, NSManagedObjectContextObjectsDidChangeNotification and NSManagedObjectContextObjectsDidSaveNotification fire at a more frequent rate. Due to the increased workload, it became easier to trigger the crashes we'd seen in production. Because we used a private context to process changes, we had a race condition: the Invite exists when we begin to process the changes, yet is deleted at some point during processing. This results in a crash when we save the child context and Core Data attempts to reconcile changes to a deleted object.

Using a private context to process changes was admittedly a premature optimization, so we opted to use the main queue context instead, negating the possibility of a data race. Arbitrarily deleting Invites still makes us uneasy; in the future we may move to a more deterministic approach where the concurrent impact is much easier to reason about.

Next Steps

KIF was clearly not designed with the expectation that some crazy developer might disable animations and expect their tests to instantly run at Ludicrous Speed. Unfortunately, KIF contains many sleeps: places where it must wait for iOS to do its thing. I presume this is primarily because it uses private APIs that were not necessarily intended for use in functional testing.

A few tweaks were needed to realize a more satisfactory reduction in runtimes. Those changes are available in my fork. I'm very much interested in hearing the KIF authors' thoughts on how we might further reduce the overhead; I'm sure there's a lot of low-hanging fruit to pick.

A Practical Introduction to Testing

Coming to Thumbtack fresh out of Carleton College last summer, I had written about 5 unit tests in my life (and a few of them were in my interviews!). My original approach to testing was to think of the various scenarios a given feature might experience, and try them out manually. After a round of code review? Try them out again. This was slow, boring, painful, and error-prone – but there's a (much, much) better way!

Write automated tests

My mentor here suggested that I watch Misko Hevery's talk about testing: The Psychology of Testing. One of Misko's points was "everyone knows how to write tests, so why don't they?" At the time, I had some idea of how to write tests, but had no idea how to write good tests. What makes a test useful? How is test code different from production code? How should my code change to be more testable?

The following is meant to be a brief introduction to testing – why it matters, and some principles to keep in mind when writing tests.

Why write automated tests?

Let's take a step back and clarify – why are automated tests useful?

  1. Verify functionality: Writing code isn't easy, and you're bound to make mistakes. Writing tests helps ensure that your code is actually doing what you intended it to do.
  2. Prevent regressions: You're working as part of a team with other engineers. Those engineers write code that interacts with your code, and may alter the code you've written – tests ensure that such alterations don't break that code. Others on your team should feel confident that your code still functions correctly if your tests pass.
  3. Improve productivity: Running automated tests is a repeatable task that makes your life and your team's life easier. They're easy to run, and can be automated to run at important times such as pushing new code to master or attempting to deploy new code to production.
  4. Provide documentation: Looking at the tests for a particular piece of code can very clearly outline the expected behavior of that code given some input. A test method named test_returns_404_for_deleted_request makes it easy to quickly identify and understand that code's intended behavior.

Things to keep in mind when writing tests

So you're convinced that testing is a good idea and want to write some tests! Here are some things to keep in mind as you do so.

Test code is still code

Just like production code, test code will be read and maintained by other engineers. It should be just as understandable as production code – documentation is still important! Writing descriptive test names makes it easier for other engineers (and your future self) to understand what's being tested. Try to name tests with a structure similar to test_{expected behavior}_for_{scenario}, such as the example mentioned above (test_returns_404_for_deleted_request).

Test code should be simple

When working in a complex code base, tests will often require a decent amount of "set up" code. Break out common setup into helper methods, and leave test methods as simple as possible. A short test_ method lets the reader of the code focus on what's specific to that test. Similarly, test code should be as linear as possible – as a rule of thumb, try to limit the amount of indentation in a test. if statements, for or while loops, and other control flow add complexity to your test code, and should generally be avoided [0].

Tests should focus on functionality, not implementation details

It's tempting to want to write tests that check every little bit of how your code works. Tests that focus on implementation as opposed to functionality can slow down future engineers. If your code exposes some set of public methods, those are the methods that should be tested; not the way that those methods actually work under the hood. For example, if a class stores some state in a heap, a user of that class shouldn't need to know that. If that class is refactored to use an array instead, your tests shouldn't fail!
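As an illustration of that heap example, here's a sketch of a small class whose heap is an internal detail, with a test that asserts only on its public behavior:

```python
import heapq


class TaskQueue(object):
    """Hands back tasks in priority order; the heap is an internal detail."""

    def __init__(self):
        self._heap = []

    def add(self, priority, task):
        heapq.heappush(self._heap, (priority, task))

    def next_task(self):
        return heapq.heappop(self._heap)[1]


def test_returns_highest_priority_task_first():
    queue = TaskQueue()
    queue.add(2, 'write tests')
    queue.add(1, 'fix outage')
    # Assert on behavior, not on queue._heap -- a refactor from heap to
    # sorted list shouldn't break this test.
    assert queue.next_task() == 'fix outage'
```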

A "Change-detector" test is another example of a well-intentioned test that makes your code more difficult to work with in the future. Avoid testing how your code works, and instead focus on what functionality your code provides.

Dependencies should be injected

We love dependency injection here at Thumbtack [1]. There are quite a few reasons to use dependency injection, but let’s consider its benefits for testing with a simple example. Say your code sometimes [2] triggers the sending of an email to a user – in testing, we definitely don't want a real email to be sent! Injecting the dependency of an "email sender" allows us to write a test that ensures our code would send an email in production code, but doesn't actually do so in our test. Creating a “test double” to use instead of a real email sender enables us to verify the functionality of the class we are testing. In this case, we’ll create a “stub” email sender that stores the emails it has “sent”, and check its state after our code has run [3].

class ThingBeingTested(object):
    def __init__(self, email_sender):
        self.email_sender = email_sender

    def do_something_and_possibly_send_email(self):
        # Some logic happens here that we want to test; suppose it decides
        # an email is warranted and builds its contents.
        should_send_email = True
        email = 'Welcome to Thumbtack!'
        if should_send_email:
            self.email_sender.send_email(email)

Then our test code looks something like:

class FakeEmailSender(object):
    def __init__(self):
        self.emails_sent = []

    def send_email(self, email):
        self.emails_sent.append(email)

....

def test_sends_email_in_particular_scenario(self):
    fake_sender = FakeEmailSender()
    thing = ThingBeingTested(fake_sender)
    # ... Any other configuration to set up a scenario where the email should be sent ...
    thing.do_something_and_possibly_send_email()
    # Make sure we "sent" an email
    self.assertEquals(len(fake_sender.emails_sent), 1)

Then in production code, we instead inject a real email sender. We've successfully tested the logic that determines whether or not to send the email, and have left the actual email sending to another object.

Tests should be deterministic

Sometimes code relies on non-deterministic sources of data – third party APIs, random number generators, and time are a few examples of such data sources. How do you ensure that your tests don't spuriously pass or fail based on those outside sources? Dependency injection helps with this, as well.

Similar to our example above, consider a simple class that randomly assigns passengers to seats on an airplane. For the sake of this example, we'll assume that seats are identified by integers, and that seats 8 - 12 are in the exit row. Rather than using an actual random number generator, you can pass in a fake generator that returns some preconfigured value.

class AirplaneSeatAssigner(object):
    def __init__(self, random_int_generator):
        self.random_int_generator = random_int_generator

    def get_seat_assignment(self):
        seat_number = self.random_int_generator.get_random_int()
        if self.is_seat_in_exit_row(seat_number):
            # Double check that the passenger is OK with an exit row
            ....

    def is_seat_in_exit_row(self, seat_number):
        # Seats 8 - 12 are in the exit row
        return 8 <= seat_number <= 12

class FakeRandomIntGenerator(object):
    def __init__(self, int_to_return):
        self.int_to_return = int_to_return

    def get_random_int(self):
        return self.int_to_return
.....

def test_for_user_assigned_to_exit_row_seat(self):
    exit_row_seat_number = 9
    fake_generator = FakeRandomIntGenerator(exit_row_seat_number)
    assigner = AirplaneSeatAssigner(fake_generator)

    assigner.get_seat_assignment()
    # ... check the expected behavior for an exit row...

Since we've injected a random number generator that we know will return a certain value, we can test the different scenarios that result from different random numbers. Even better, each time we run our tests, we'll get a consistent result.
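Time can be handled the same way: instead of calling `time.time()` directly, a class accepts a clock object, and tests inject a clock frozen at a known timestamp. The class names and the one-hour timeout below are invented for illustration, not taken from our codebase:

```python
import unittest


class FixedClock(object):
    """Test double for time: always reports the same timestamp."""

    def __init__(self, fixed_timestamp):
        self.fixed_timestamp = fixed_timestamp

    def now(self):
        return self.fixed_timestamp


class SessionChecker(object):
    """Decides whether a session has expired, using an injected clock."""

    TIMEOUT_SECONDS = 3600  # one hour

    def __init__(self, clock):
        self.clock = clock

    def is_expired(self, session_started_at):
        return self.clock.now() - session_started_at > self.TIMEOUT_SECONDS


class SessionCheckerTest(unittest.TestCase):
    def test_old_session_is_expired(self):
        # "Now" is frozen at timestamp 10000, no matter when the test runs
        checker = SessionChecker(FixedClock(10000))
        self.assertTrue(checker.is_expired(session_started_at=1000))

    def test_recent_session_is_not_expired(self):
        checker = SessionChecker(FixedClock(10000))
        self.assertFalse(checker.is_expired(session_started_at=9500))
```

In production, the injected clock would simply be a thin wrapper whose `now()` calls `time.time()`.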

Conclusion

Testing is an essential part of the software development process – writing good, useful tests makes your code more reliable, maintainable, and understandable for other engineers. I hope you found something here that helps you write better tests!

Notes and useful resources

[0] The exception to the rule is table-driven tests. This is a common and idiomatic pattern in Go, for example.

[1] See previous posts from Steve and Jeremy. We even found it to be performant!

[2] I've left this example purposefully simple and unrealistic – real test code might require extra setup!

[3] There is a subtle difference between the idea of a “mock” and a “stub” – Martin Fowler has written extensively on this topic. See his “Mocks Aren’t Stubs” article for more explanation.

Other resources

  • Martin Fowler writes a lot of great stuff on the topic of testing – it's worth just browsing his website.
  • Misko Hevery is a great resource, specifically for dependency injection.
  • Google has an extensive blog about testing. The "Testing on the Toilet" series is great for providing small pieces of advice in short, simple to understand examples. One of my favorites describes three important qualities to keep in mind when writing tests.

The 12 Weeks of Thumbtack

On the first week of Thumbtack...


The office—I wouldn’t really call Thumbtack my “true love” (yet)—gave me a promise of exciting times to come. The week rushed by with on-boarding meetings and processes (more exciting than you might expect, especially with the words “Friday massages”); an All-Hands meeting, in which all engineers come together and ideas fly around like bullets; many delectable Thumbtack lunches and dinners; and team bonding at the beach—a San Franciscan beach, i.e. a cold, bleak streak of sand at the edge of land, made enjoyable by the warmth of the surrounding company (pardon the pun). And that wasn’t all: I was internalizing the company goals and values, cramming 5+ years of team history and infrastructure into my brain, picking up Go, and spending hours of excitement with my mentor, Alex, in preparation for the launch of Kiki’s (email) delivery service[1].

On the second week of Thumbtack…

Alex asked me, “do you want to present at All-Thumbs tomorrow?” All-Thumbs is a company-wide gathering in which any team or team member can present… in front of everyone. So on my 9th day, I’m holding that microphone, standing in the spotlight, and trying to explain how Kiki will replace our current emailing setup on the website while simultaneously wondering whether the front row audience can hear the pounding of my heart.


Kiki had made good headway: we had spun up several million goroutines to track usage stats, written a preliminary version of the design doc and code, and had the environment all set up and running on AWS. On top of Kiki, I also talked about my work on a few Go packages that all our services now use. In fact, watching services import these packages and go into production terrified me more than presenting at All-Thumbs: what if I had made a mistake? I could compromise the website’s security or crash our services, even after extensive code review (luckily, neither occurred). Only two weeks in, Thumbtack had thrown me challenges and projects with impacts beyond what I had ever experienced within the safety of school walls.

On the third week of Thumbtack…

I went back to school. I didn't even have to apply this time—I was automatically enrolled in Thumbtack University (my dream school, of course). Suffice it to say, school and lecture are the same no matter where you go... so perhaps 20 hours of “class” weren't the highlight of my week. I have to admit, though—the material was fascinating, and I came out amazed at how well the internal components of Thumbtack coordinated and worked effortlessly together. This rapid and intense onboarding procedure, while not as fun as my work with Kiki, clarified Thumbtack's mission and made it concrete. So although I was uninspired to attend 20 hours of lecture, I gained more drive to help achieve Thumbtack's ultimate goals. I finally felt that I was a part of Thumbtack.

On the fourth week of Thumbtack...

Make Week kicked off! Make Week is a week in which the entire company emerges from whatever projects are currently underway to pursue ideas for new features and product improvements that would otherwise be abandoned for more pressing issues. I had heard of companies holding hackathons or hack weeks, so I asked Marco at lunch (casual lunch with the CEO, no big deal) why Thumbtack adamantly stuck to “make” instead of “hack.” His response: “hack” is typically connected with engineering, and Make Week is a week intended for everyone—engineering, marketing, design, and more—to stretch their minds and innovate ways for the company to evolve. To imply that only engineers should participate would be to lose the valuable minds of 2/3 of the company. I found this to be yet another example of the enormous effort at Thumbtack to encourage transparency and communication across the company: no one team ever functions completely separately from another, and transparency works as the oil that keeps the internal mechanics of Thumbtack running smoothly.

I took a break from Kiki this week and had the opportunity to pair with several different engineers to work on several Make Week projects of my own, including an internal server for Godocs, a script to automate setting up environments and applications, and a few more Go packages for our services. Finally, I again presented at All-Thumbs, albeit feeling slightly calmer this time. Make Week had been a time of exploration, and I had worked on several projects that would become increasingly important and useful during my internship.

On the fifth week of Thumbtack...

Where there are problems... there are Fix-Its and Thumbtack engineers. It was Fix-It week, the week in which Eng and Product tackled issues previously set aside for more urgent projects over the course of the last term. This included strategizing for the future to prevent predicted problems from ever being born. Although I was involved in planning for my team, AWS, whose general purpose is to migrate all servers to AWS, my involvement with Kiki and our Go skeleton also placed me in ArCo (Architecture Committee), which was to oversee our move to SOA (service-oriented architecture). Sitting in our meetings, I was struck by the importance of this committee—we were essentially proposing a standard for how and why services could and should be created for years to come. And when Alex mentioned that Kiki would set the example for “best service practices”... well, no pressure. While I lacked the experience of many of the engineers in the room, it was strangely easy to voice my ideas and questions (everything was taken seriously, no matter how naïve I thought I sounded), which made me appreciate the openness of my teammates to new viewpoints and ideas. So on top of creating Kiki, I began documenting the steps to service creation and started a checklist of essential service elements. One of my biggest regrets is that we probably won't be able to complete this monumental task before this internship is over, and I wish I could stay to see the ultimate outcome.

On the sixth week of Thumbtack...

Things went wild. It was the week of the third quarter kickoff—an entire day of celebrating what we had achieved the previous quarter, and of gearing ourselves up to achieve our goals for the quarter ahead.

The presentations foretold the exciting, but simultaneously intimidating, months ahead. To conclude the quarter, we all headed over to a carnival-themed gathering, complete with aerial silk dancers, fortune tellers, donut burgers (I highly recommend them; they were delicious), face painting, and much more. But the excitement didn't stop when we returned home at midnight...

Like most of us, I enjoy reading my phone notifications when I wake up in the morning. Unless it's an email at 7 A.M. from Mark (VP of engineering) reporting:

EMERGENCY: emails queued up, 0 sent in 2 hours.

Guess who was in charge of the emailing services? Yup... I had bloodied my hands with my first emergency. Luckily, the issue was resolved quickly—thankfully before Alex woke up—and with nearly zero impact, but I had learned my lesson well. At Thumbtack, we have a system of postmortems—every emergency is “owned” by one or a few people, and analyzed for what can be done to avoid anything similar in the future. I felt strangely proud to own my first postmortem; I had known before that a careless action could bring the system down, but the reality of it hit me hard. The postmortem now serves as my permanent reminder, like a burn scar that harks back to a childhood memory of curious fingers playing with fire.

On the seventh week of Thumbtack...

Go Gophers! This week, I flew out with 5 other engineers to Denver, Colorado for the second-ever Gophercon, a conference dedicated to the Go programming language. I won't go into much detail about the 20+ fantastic talks, but I did document all my learnings in this Go wiki, which will hopefully prove useful in the future! I also somehow acquired a ticket to the sold-out Go workshop track, which offered “deep dives” into some of the more advanced (and really cool) features of Go.


Besides the conference's enlightening and intellectual lectures, I got a taste of the Go community, and more broadly, the “coder community” at large. It struck me as a surprise that I was the only woman in a room of 100+ men during the workshops, and it took significant effort to find another woman in the conference of 1500 attendees (many of whom had long hair). Although the gender balance in engineering isn't quite 50-50 at Thumbtack or at school, I had never before experienced the notorious gender disparities and stereotypes as I did now, such as the automatic assumption that I was attending the “Go Bootcamp,” meant for beginners of Go, rather than the workshops intended for more experienced participants. It didn't make a difference in my learning experience, but admittedly there were some awkward moments, like when a bunch of guys refused to walk through a doorway until I had passed through. The conference had also amusingly assumed that everyone attending Gophercon was above 21—after talking to the organizers about the drink tickets I had received for the after-party at the brewery, I was reassured that future conferences would be more minor-friendly.

Nevertheless, these small bumps during the trip did nothing to lessen my enjoyment of the conference and of Denver in general. My appreciation of Go definitely increased (one engineer jokingly called me a “go-fangirl”), and I spent my free time sampling some of the famous foods, museums, and historical districts of Denver.

On the eighth week of Thumbtack...

Kiki blasted off, sending more than 100 emails a second. We deployed Kiki to send all emails from all the engineers' test versions of the website, as well as all the staging emails used in our second, Salt Lake City office. The usual testing process for new services includes unit tests, integration/end-to-end tests, load tests, and finally an A/B test to ensure that Kiki can handle everything that could possibly go wrong. Thus, we attacked Kiki with 7x more email requests than we currently handle, simulated network failures, and manually triggered panics, all through which (to my surprise) Kiki came out relatively unscathed. While not much new code emerged from the process, Kiki became more refined and robust, nearing its production birthday.

As exciting as email-sending was, more overwhelming were the 58 reporters and media frenzy that greeted me Thursday morning, causing traffic jams on 9th street as Jeb Bush's Uber driver meandered over to Thumbtack HQ (see more on his visit here). It was simultaneously terrifying and reassuring to meet one of the potentially most powerful people in the world—and to realize that even figureheads are human.

On the ninth week of Thumbtack...

Kiki hit a bit of a road-bump. Remember how I was on the architecture team? Well, one of the most argued points had been our way of ensuring data persistence in the face of network failures or unpredictable hardware failures. Kiki currently had a simple system with goroutines and file storage for saving email data—for every request, Kiki would spawn a new goroutine, write data to file, send the email, and then delete the data, ensuring that, save for very rare occasions, unsent emails would not be completely lost. On top of email sending, we ran a “cron job” type goroutine that pulled unsent data from the file system and resent emails on a set schedule. Potentially the worst feeling in my internship so far was checking this code into a new branch, heading back to master... and deleting it all. The architecture team had agreed (myself slightly reluctantly) upon a data persistence system involving queues, a conditional write check to avoid duplication, and a two-tier environment setup. Such a system could be implemented once in our shared library, tested extensively, and then be used by all. To have all our services following the same patterns would make designing new services and debugging existing ones more efficient, not to mention that it would spare each new service its own long debate over design patterns on this topic. With this pattern, essentially all emails would first be placed in the queue, after which a “worker tier” application would pull from the queue and send the email, using a conditional write to ensure that the email had not yet been sent. The second tier, a “web tier” application, would expose an API to the outside world, allowing our application to respond to more than just requests from the queue, e.g. requests to unsubscribe emails marked as bounces.

Starting over, although I understood the need to do so, was slightly disappointing—Kiki had been so close, and now productionization felt weeks away (weeks I didn't have). However, I now had the opportunity to abstract Kiki to be used for other notification services (SMS and push notifications), as well as organize Kiki's code into a more understandable package setup. I was more than determined to get Kiki back to production-ready by the end of the week, and was able to succeed, implementing the new data persistence design pattern, pulling out Jiji[2], Kiki's new webserver sidekick, into a new web tier environment, and reconfiguring our deployment scripts to work with dual application tiers. I ended a tiring, frustrating, but ultimately rewarding week with a session of rooftop yoga led by Jeremy—an upside-down San Francisco sunrise had never looked so good.

On the tenth week of Thumbtack...

We dark launched Kiki into production, essentially running Kiki in parallel to the current email-sending system so that Kiki could practice sending emails in production without actually sending them to their designated recipients. After the previous week of chaos and non-stop coding to get the refactored Kiki back into shape, I felt like I could stare at Kiki's metrics and dashboard forever and never get bored—it was unbelievable to watch those numbers tick every second and realize that production was actually happening.


And now it was time to experiment! We ran profiling tools to figure out where CPU was used most and tweaked Kiki to perform even better, testing with different network resources, memory resources, and machine models (surprisingly, we were CPU-bound, mainly due to the context switching of goroutines). Kiki was becoming polished—I now could settle down and tidy up loose ends, making Kiki as perfect as possible. I also integrated our push notification system (Lakitu[3]) with Kiki, getting a taste of the mobile team's work and collaborating with their team members (I was tempted for a moment to leave AWS for mobile to obtain one of Thumbtack's iPhones... but decided to remain loyal to Android). It was an incredible feeling to realize that what started as a relatively small summer project—an emailing service—had transformed into something much bigger: a service that would set the standards for all services to come, and that would handle more than 5x the number of requests initially planned. Although push notifications were not yet integrated with the website, I had finished 2/3 of Kiki's final product—all the extra hours of work had definitely been worth it. For emails, what remained was to run an online A/B test with Kiki, to ensure that Kiki worked as well as the current script attached to the website. The week ended with a much needed break—an AWS team celebration of the past quarter's work, complete with a 14-dish, family-style dinner.

On the eleventh week of Thumbtack...

As exciting as spinning up new services and scripting new deployment features had been, this week was a time to visit the past. This meant plowing through fifteen code reviews and modifying seven of our code repos, including those of services that had been untouched for over half a year. We wanted to bring all our older services, such as Hercule, up to the standards by which Kiki now abides. Of course, before doing this, we had to first decide on Kiki itself: should we use flags or environment variables? Should something like ports be configurable to ensure future portability (pardon the pun)? Should we alter the code for readability and clarity, or keep it concise and add documentation instead? How should we track metrics and alert on errors? It was slightly exhausting to code a change, decide that we should remove it, and then decide later to change it back to the original. One thing I learned: if you put a group of highly knowledgeable (and opinionated) engineers in a room and debate a controversial decision, discussions can linger on forever—it's nearly impossible to find a solution that satisfies everyone. Sitting in those committees reminded me of debate tournaments—just when I found myself convinced by one point, someone would highlight the torrent of problems that came with it. The often heated back-and-forths definitely never got boring.

After working on all our services, I also cleaned up our deployment and service resource creation scripts and demoed these for our Engineering and Product teams! Every other week, we have "deep dives" into current projects underway or new procedures/tools that all engineers should know and use. My scripts fit into the latter category, and it was truly awesome to see other engineers across all teams using them—what started as a (slightly selfish) Make-week project to simplify my work setting up AWS environments and enforcing environment standards had turned into a productionized product that reduced what took hours to do into minutes, allowing any engineer to create AWS resources without double checking against Thumbtack's standardized setup configuration or asking someone on the AWS/infrastructure team. Immediately after the demo, I dove into my last, 20-minute All-Thumbs presentation, summarizing my work with Kiki, on the architecture team, and on our deployment scripts! Unlike my first, 3-minute presentation, the nerves had disappeared—and I left the podium (literally) dropping the mic.

On the twelfth week of Thumbtack...

I went off into the land of Big Data, learning how to use Hadoop, Hive, Spark, and more to extract data to analyze for Kiki's A/B test. We started off with 1% of production traffic sent to Kiki, then once metrics from emails sent from Kiki showed no significant deviation from baseline metrics, moved to 10% traffic, then 50%, and eventually to 100%! I paired with some of our data scientists on the Data Platform team—it was like stepping off into another world, leaving AWS to encounter the world of SQL and Hadoop clusters. With this final pairing, I realized I had actually paired with people on every single engineering team during my time here, either helping them with AWS or receiving help myself—I had integrated mobile push notifications into Kiki, dealt a little with our data platform, worked with the matching service, and code-reviewed some of our growth services. But while I had at least skimmed the surface of most of our engineering team and code-base, I had yet to work with designers or product managers—I guess working on our infrastructure and back-end services had to have some cons (although I didn't really mind at all).

I also had the chance to experience first-hand the impact of Thumbtack's product. One of the aspects I appreciate most about carsharing services is the conversation—not only do I get a ride, but I get the chance to have a nice chat along the way! During one of my excursions, I discovered my driver's dream was to open his own restaurant—and he was providing catering services to fundraise and bring his culinary skills out into the public. And so, of course, I brought up Thumbtack! Right before we parted, after I said that I hoped Thumbtack would help him achieve his goals, he commented, "You must really love your job—I can hear it in the way you talk about it, it's really genuine." He couldn't have been more right. (I've checked back with him, and he's received 100 requests in the four days since he signed up!) I also signed up as a pro myself for chamber music performance, and got my first hire this week!

After the launch of Kiki, things rapidly came to a close; we ended our internships with a slew of farewell dinners with Marco (CEO), Mark (VP Eng), and the team, which had come to feel like a second family. As always, the culinary team outdid themselves (dining hall food will definitely pale in comparison to this summer's meals, which included "farro risotto, lamb porterhouse, seared NY strip, zucchini a la plancha, blue cheese salad", and good old "buttermilk country fried chicken"). Although we were all sad to leave, the nights were full of our stories of our adventures, blunders, and most importantly, unforgettable learnings from this incredible summer. As we head back to school, we go armed with a new set of tools and experiences, and the knowledge that we made a difference in someone's life this summer.



A little about me—I'm currently a rising junior at Harvard studying computer science and mathematics. To be honest, I can't remember when I first heard about Thumbtack, but I do recall my first intense interview with Alex and encountering the genuine passion of the team to solve the many challenges Thumbtack faces. It's been an unforgettable experience, and I wish I could partake in Thumbtack's bright future ahead. Although my time at Thumbtack is over for now... perhaps yours is just about to start.


[1] Kiki's Delivery Service (魔女の宅急便, Majo no Takkyūbin) is a 1989 Japanese anime film produced, written, and directed by Hayao Miyazaki, based on Eiko Kadono's novel of the same name.

[2] Jiji is Kiki's anthropomorphic talking cat and closest companion. (Coincidentally, Jiji has a girlfriend, a white cat named Lily.)

[3] Lakitus are Koopas who ride clouds through the skies, mostly dropping Spiny Eggs on Mario or Luigi.

