This year's PyCon was again, extraordinary. Our heartfelt thanks go out
to the entire community for such an amazing event. We especially enjoyed
getting to know other engineers from all over the world.
In fact, we’ve found that one of the best ways to start a conversation
with an engineer is to start by talking about code. Thumbtack’s second
annual code challenge gave us a great opportunity to do this.
Last year's challenge was to determine whether there was a winner on
a given Connect 4 game board. This year's challenge required looking for
anagrams of word pairs in large bodies of text. Big thanks to Jeff
Kramer, who posted the problem description online.
In response to feedback from last year’s challenge, this year's anagram
challenge had two components: an easier challenge that required finding
anagrams in a paragraph of text, and a harder challenge that looked for
anagrams in the full corpus of Alice in Wonderland.
We offered two prize sets to match the two difficulty levels: a mug and
sunglasses for the easier challenge, and a beer mug and shot glass for
the harder challenge.
The general form of solutions to this year’s challenge was fairly
consistent. In fact, there were no solutions that deviated from this
- Parse the input text to filter and transform input words
- Remove duplicate words (can be done at this point or later)
- Find all possible anagram pairs of those words (often using
- Create a data structure to hold matching anagram pairs, usually
keyed off the sorted letters of the pair
We received 53 total (working) solutions to the Alice In Wonderland
The best solutions were all on the order of O(N^2^) (where N is the
number of unique words in the corpus), while slower solutions were often
O(N^4^). The difference was almost entirely due to the data structures
used in step 4: faster solutions used a dictionary (search is constant
time), while slower solutions used a list (search takes time O(N^2^),
where search is linear on the number of word pairs).
We tested all the solutions on a modern Mac laptop and pulled out some
of the fastest solutions to highlight here. O(N^2^) solutions generally
run in the 15-40s range. We see a massive spike (literally, off this
chart) for slower solutions.
Hall of Fame
Chris Moultrie's solution was the fastest of all the solutions we
received. On our testing machine, his solution to the Alice problem
routinely ran in under 12 seconds. There are several clever
optimizations specific to the challenge, so, while this is not a
general-purpose solution, it is extremely quick. (Find Chris on
GitHub and Twitter.)
Second place for speed goes to Łukasz Balcerzak. His solution routinely
took ~13.5s. This is a more general-purpose solution that is also
well-factored and readable. (Find Łukasz on GitHub and
Samuel Merritt's solution is also commendable for its speed (ranked in
the top 5) as well as its clarity and excellent word choice. (Find
Samuel on GitHub and Twitter.)
Congratulations to everyone who participated! If you submitted your
solution but were not able to claim your prize at PyCon, please email us
and we’ll try to get your prize to you.
We had a great time with this challenge and hope to see you all again at
next year's PyCon in Montreal.