When we first developed our mobile apps in 2014, we decided to use a third-party service to deliver push notifications. It was fast and easy, and at the time we did not have enough engineers to implement push notification delivery ourselves. However, as we have grown, both in number of engineers and in mobile app users, that company no longer gives us the level of control, data analysis, and stability that we desire. (Plus, it charges us per push notification.) Therefore, we decided to implement our own way of sending push notifications.
I’m Amal Nanavati, an incoming junior Computer Science and Global Studies major at Carnegie Mellon University. I spent this summer interning with Thumbtack’s Foundation team, and this is just one of the projects I worked on.
This post discusses how we designed and implemented our own system for sending push notifications, interesting preliminary data we have gathered from this system, and next steps.
This push notification service would be housed inside Kiki, our email, SMS, and push notification delivery service (read more about the development of Kiki in this blog post). Kiki has internal HTTP endpoints to subscribe and unsubscribe mobile devices, as well as to send push notifications. Our challenge was to revamp those endpoints so that, instead of forwarding requests to the third-party service, Kiki would handle them internally. Going into the project, these were the chief technical questions we had to answer:
- How should we store device tokens, and which device tokens should we store?
- How should we navigate the different Apple (Apple Push Notification Service or APNS) and Google (Firebase Cloud Messaging or FCM) protocols for sending push notifications?
- How should we switch to our own push notification system while causing the least disruption for our pros and customers?
Furthermore, the system had to be robust and efficient: Thumbtack sends millions of push notifications per day, thousands per minute at peak, and our mobile user base is growing rapidly.
After many discussions, we decided on the following. The device tokens would be stored in DynamoDB, primarily for its scalability. In addition, we would only store currently active tokens (i.e., the device tokens of users who are currently logged in and have notifications enabled). Although we could imagine adding the capability to send notifications to users who have logged out (e.g., reminders to log back in), we figured it was best to keep Kiki as lightweight as possible. Plus, if we want to send notifications to logged-out users in the future, we could use DynamoDB Streams to achieve that.
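To make the "only store active tokens" policy concrete, here is a minimal sketch of the subscribe/unsubscribe bookkeeping, using an in-memory dict in place of DynamoDB. The class and method names are hypothetical, not Kiki's actual API.

```python
class TokenStore:
    """Stores only the currently active device tokens, keyed by user ID."""

    def __init__(self):
        # In the real system this table lives in DynamoDB; a dict stands in here.
        self._tokens = {}  # user_id -> {device_token: platform}

    def subscribe(self, user_id, token, platform):
        # Called when a user logs in with notifications enabled.
        self._tokens.setdefault(user_id, {})[token] = platform

    def unsubscribe(self, user_id, token):
        # Called on logout (or when notifications are disabled), so the
        # store never accumulates tokens for logged-out users.
        self._tokens.get(user_id, {}).pop(token, None)

    def active_tokens(self, user_id):
        return dict(self._tokens.get(user_id, {}))
```

Keeping the store this small is what lets Kiki stay lightweight: anything beyond "active tokens for logged-in users" is deliberately out of scope.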
We also decided to lazily update device tokens in the table: we only update a token when, while sending a notification to a particular user, APNS or FCM tells us the device token is unregistered. This method is by far the most efficient, although it could result in a buildup of old device tokens we never send push notifications to (for example, if a user stops using Thumbtack). If in the future we notice many device tokens that haven't received push notifications for a while, we can periodically send test notifications to all the devices to see if the tokens are still valid (FCM supports "dry-run" notifications; APNS does not yet).
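The lazy-cleanup rule can be sketched in a few lines. This is an illustrative outline, not Kiki's actual code: `provider_send` and `delete_token` are hypothetical callbacks standing in for the APNS/FCM client and the DynamoDB delete.

```python
# Stand-in for APNS's "Unregistered" / FCM's "NotRegistered" error reasons.
UNREGISTERED = "unregistered"

def send_to_user(user_id, payload, tokens, provider_send, delete_token):
    """Send `payload` to each of the user's tokens; prune stale ones lazily.

    provider_send(token, payload) returns a status string from the provider.
    delete_token(user_id, token) removes the token from the token table.
    """
    for token in tokens:
        status = provider_send(token, payload)
        if status == UNREGISTERED:
            # The token is only ever removed here, at send time, when the
            # provider explicitly reports it as unregistered.
            delete_token(user_id, token)
```

No background sweep is needed; the table is corrected as a side effect of normal sends.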
APNS recently switched to an HTTP/2-based protocol that fixes many of the issues present in the old protocol. For this protocol, Apple suggests keeping a small number of persistent connections to APNS open and repeatedly sending individual push notification requests over those connections. Based on this, we decided to keep x connections to APNS persistently open, with y workers per connection sending push notification requests on them. This prevents the APNS connections from becoming a bottleneck, since each worker waits only long enough to receive the APNS response, and then passes it along to the thread that requested the push notification.
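The connection/worker layout can be sketched as below. This is a simplified illustration: the real system speaks HTTP/2 to APNS, while here the "connection" is just an ID passed to a stub send function, and all names are hypothetical.

```python
import queue
import threading

NUM_CONNECTIONS = 2        # the "x" persistent APNS connections
WORKERS_PER_CONNECTION = 3 # the "y" workers multiplexed on each one

def run_workers(requests, send_on_connection):
    """Drain `requests` (token, payload pairs) through a pool of workers.

    send_on_connection(conn_id, token, payload) performs one request on
    the given persistent connection and returns the provider's response.
    """
    q = queue.Queue()
    for req in requests:
        q.put(req)

    results = []
    lock = threading.Lock()

    def worker(conn_id):
        while True:
            try:
                token, payload = q.get_nowait()
            except queue.Empty:
                return
            # Each worker blocks only for the duration of one APNS
            # round-trip, then hands the result back.
            resp = send_on_connection(conn_id, token, payload)
            with lock:
                results.append(resp)

    threads = [
        threading.Thread(target=worker, args=(conn_id,))
        for conn_id in range(NUM_CONNECTIONS)
        for _ in range(WORKERS_PER_CONNECTION)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With x·y workers in flight, a slow response on one request never stalls the others sharing the same connection.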
FCM, on the other hand, uses a standard HTTP protocol but does support batching. Therefore, we decided to send only one HTTP request per user per notification, batching together all the device tokens the user might have. If we find a need to send fewer requests to FCM in the future, we might add a separate endpoint to Kiki to send one notification to multiple users (for example, when we are syndicating requests to pros), or we might modify the current Kiki endpoint to wait a certain period of time to see if Kiki receives additional requests with the same notification and, if so, batch them together.
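A sketch of the per-user batching: FCM's HTTP API accepts a `registration_ids` array, so one request can cover all of a user's devices. The payload shape below is illustrative and the function name is hypothetical.

```python
def build_fcm_request(user_tokens, title, body):
    """Build one FCM request body covering all of a single user's devices."""
    return {
        # All of the user's device tokens, batched into one request.
        "registration_ids": sorted(user_tokens),
        "notification": {"title": title, "body": body},
    }
```

One request per user per notification keeps the send path simple while still collapsing a multi-device user into a single HTTP round-trip.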
The switchover plan was the most complicated part of the process because we needed a different plan for iOS and Android. On iOS, it turned out that device tokens from the third-party service also worked with APNS, so all we had to do was migrate the existing tokens from the third-party service to DynamoDB. Since both databases then had exactly the same device tokens (we had Kiki double-write iOS tokens on subscribe and unsubscribe), we could use a feature flag to send x% of notifications through APNS and the other (100-x)% through the third-party service.
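One common way to implement such a percentage-based feature flag is to hash each user ID into a stable bucket in [0, 100); this is an illustrative sketch, not necessarily how Thumbtack's flag system works.

```python
import hashlib

def use_apns(user_id: str, rollout_percent: int) -> bool:
    """Route `rollout_percent`% of users through APNS, deterministically.

    Hashing (rather than random sampling) keeps each user on the same
    delivery path for the whole rollout.
    """
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent
```

Setting the percentage to 0 keeps everything on the third-party service; 100 completes the cutover, with any value in between giving a gradual, reversible ramp.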
For Android, not only did the device tokens from the third-party service not work with FCM, but the Android app also needed extra configuration to receive push notifications from FCM. Since we had to release a new version of the app to receive push notifications from FCM, we decided that Kiki would only write new device tokens to DynamoDB, and would send all Android push notification requests to both the third-party service (to target all the old apps) and FCM (to target all the new apps). The app would still be configured to receive notifications from both (for edge cases such as when the user has installed the new app but not yet opened it). About a month after releasing the new Android app, we would force an upgrade, completely moving over to our new push notification delivery service.
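The Android transition logic during this overlap period can be summarized as follows. This is a hypothetical outline; `send_fcm` and `send_third_party` stand in for the two delivery paths.

```python
def send_android(user_id, payload, fcm_tokens, send_fcm, send_third_party):
    """During the migration window, fan out to both delivery paths."""
    # Always go through the third-party service so devices still running
    # the old app keep receiving notifications.
    send_third_party(user_id, payload)
    # Devices that have registered FCM tokens (i.e., run the new app)
    # are also reached directly through FCM.
    if fcm_tokens:
        send_fcm(fcm_tokens, payload)
```

Once the force upgrade lands, the third-party branch can simply be deleted.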
This system has been deployed to 100% of iOS push notifications, and is currently being deployed for Android (the Android deploy is controlled by the rollout of the new app). We have had significantly fewer errors with the new system (since APNS and FCM have server-side errors much less frequently than the third-party service we used), and we will force-upgrade to the new version of the Android app in the coming months.
Now that this system is built, we have been able to collect and analyze fascinating data surrounding push notifications. For example, some of our users (a tiny fraction) have as many as 42 active apps! These are likely small businesses with multiple employees responding to requests. Further, we can now collect data on the afterlife of a push notification: by assigning each push notification a unique ID, and having the apps generate events when push notifications are opened, we can track how quickly users respond to push notifications and what they do afterwards. This data will be useful for optimizing our messaging pathways and customizing our push notifications.
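The tracking mechanism is simple in principle: tag each outgoing notification with a unique ID that the app echoes back in its open event. A minimal sketch, with hypothetical field names:

```python
import uuid

def tag_notification(payload):
    """Attach a unique ID so open events can be joined back to the send."""
    return {**payload, "notification_id": str(uuid.uuid4())}

def open_event(notification_id, user_id):
    """The event an app would emit when the notification is opened."""
    return {"type": "push_opened", "notification_id": notification_id,
            "user_id": user_id}
```

Joining send records and open events on `notification_id` is what yields the time-to-open and follow-on-behavior data described above.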
Next steps for the push notification system largely depend on what we observe in the data. For example, if we find that many users open push notifications right away, we might conditionally send an SMS only if the push notification hasn't been opened within x seconds. If we find that users see and respond to requests through some means other than push notifications, we can use silent push notifications to remove notifications from the dashboard once they have been read via email or SMS. Finally, based on what we see about push notification usage, we might build a "Push Notification Creation" tool for our content teams to send customized, targeted push notifications to our pros and customers.
As you can see, there is a lot more work to be done, both with messaging and with the other projects mentioned on our Engineering blog. If any of this interests you, join Thumbtack and help build the local services marketplace of the future!