How four mobile geeks in a room took on a six-million-dollar app discovery algorithm

Janel Torkington
Nov 11, 2015

I’d been working on the Content team at Appszoom for a year and a half when we decided we had to take a different approach to organizing apps.

Basically, Appszoom crawls both the App Store and the Play Store, aggregating the dual oceans of data and plopping them into our own site. From there, the editing team ploughs through the pile, reviewing any and all that look halfway decent. In addition to generating mounds of reviews (25K and counting!), we keep a nifty blog that crunches through the staggering numbers and offers curated lists of apps to do this or that. We especially try for cross-sectional lists that wouldn’t be possible via official store classification: Apps For Throwing A Great Party, Fun One-Handed Apps To Beat Breastfeeding Boredom, etc.

But nothing was taking off. Our in-house editorial team wasn’t big enough to cover all possible user needs with our one-off lists, so people still needed to know the exact name of an app in order to find something to match their need.

The store categories were — and are — woefully inadequate. Categorizations as broad as “Tools” are so vague that they render themselves useless. Weirdo categories like “Libraries and Demo” stick around despite nobody having any idea WTF. Games are worse still; what even qualifies as “Casual” or “Puzzle” anymore?

I won’t go into how much of a joke the most popular charts have become; plenty of others have done so effectively and at length.

Screenshot from Nov. 11, 2015: Hey there, WhatsApp! What’s shakin’, Facebook?

What we saw in the market

So we started poking around. “The app discovery problem” has been a buzzword on everybody’s lips for the last several years. A handful of proposed solutions have been legitimately interesting:

  • AppCrawlr: Crawls both the App Store and the Play Store for keywords to automatically classify apps based on a tagging system. Detailed semantic search engine. Lots of false positives. No original content. Acquired by Softonic back in March; haven’t seen any updates since then.
  • Appbrain: Crawls Play Store and indexes apps for super-duper SEO power. No original content. Plummeting Alexa, but still gets plenty of wayward traffic. “Discovery” only in the sense that they frequently show up first on Google results.
  • Playboard: User-curated lists. Android only. Web and app based.
  • Ask a question (“What is the best cross-platform note taking app?”) and get answers from the community that can be up/downvoted, reddit-style.
  • XYO: An app and web-based classification system that crawled the stores and automatically assigned a wide variety of tags. It allowed you to browse “similar apps” based on the tags, plus provided snippets of user reviews that it detected as potentially relevant. No original content. Acquired by Mandalay Digital Group and disappeared.

Other blips on the app discovery radar: r/androidgaming and its ilk, Chomp (acquired by Apple, shut down), Hubbl (acquired by Airpush, defunct), Appflow (defunct web, no updates since 2012), Kinetik (defunct web), Discovr (defunct), and Appsfire (pivoted to mobile advertising).

Holy shit.

How to logically break down the dual oceans of apps

Our humble four-person Content team learned as much as we could from the marginal success of the competition. It’s a sticky problem, clearly. Right as we put it on our to-do list, the big news in app discovery was the acquisition of AppCrawlr by Softonic to the tune of a cool $6M.

We were impressed by AppCrawlr’s extensive tagging structure, but not its automatic execution. Its filters weren’t high enough to keep out the crap plaguing the stores. No original content means users have to rely blindly on the algorithm.

Could four (human) app geeks in a room find a way to beat a six-million-dollar algorithm?

We decided that tags were the right way to go. Each of us has been reviewing apps en masse for years, so off the tops of our heads we were able to brainstorm an enormous list of possible tags to classify apps and games (“tilt controls,” “pixel art graphics,” “barcode scanner,” etc.). From there, we came up with the simplest way possible to organize the tags:

Fully breaking down games required a handful of extra tag categories:

We pulled inspiration from user and expert app lists everywhere. We also scoured data from Appszoom analyzing how our very own hordes of traffic browse our catalog, helping us realize that “niche” needs like Tattoo/Nail Art Designs and Ringback Tones are way more in demand than anyone snug in a Silicon Valley bubble might guess.

All told, we began the process with about 600 tags.

… oh god.

The app tagging process

We coordinated with the Appszoom Product folks to set up the skeleton taxonomy, which let us associate any number of tags with each app.
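For the curious, here is a minimal sketch of what "any number of tags per app" can look like as a data structure. It is only an illustration, not the actual Appszoom implementation: the `Taxonomy` class, its method names, and the app ids are all made up for the example.

```python
from collections import defaultdict


class Taxonomy:
    """Toy many-to-many mapping between apps and tags (hypothetical structure)."""

    def __init__(self):
        self.tags_by_app = defaultdict(set)   # app id -> set of tag names
        self.apps_by_tag = defaultdict(set)   # tag name -> set of app ids

    def tag(self, app_id, *tags):
        """Attach any number of tags to one app, keeping both indexes in sync."""
        for name in tags:
            self.tags_by_app[app_id].add(name)
            self.apps_by_tag[name].add(app_id)

    def untag(self, app_id, name):
        """Remove one tag from one app; silently ignore tags it never had."""
        self.tags_by_app[app_id].discard(name)
        self.apps_by_tag[name].discard(app_id)


# The app ids below are invented placeholders.
taxonomy = Taxonomy()
taxonomy.tag("gathering-sky", "One-finger", "Animal-themed", "Sweet soundtrack")
taxonomy.tag("some-flight-app", "Flight Tracker")
print(sorted(taxonomy.tags_by_app["gathering-sky"]))
```

Keeping an index in both directions means "which tags does this app have?" and "which apps carry this tag?" are both cheap lookups, which matters once the tag list runs into the hundreds.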

Given that there are hundreds of potential tags every time you go to classify an app, there were several serious discussions about making this process more intuitive. For example, would it make sense to associate certain tags with each market category? Could we assume only a small slice of our custom tags would be relevant to all apps in the “Business” Play Store category? (e.g. Be Productive, Time Tracking, Busy People)

In the end, we decided that was too risky a choice to make before we had data to back it up. As such, when an editor tags an app, they must sort through hundreds of potential tags each time.
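Once enough apps have been tagged by hand, a gentler version of the category shortcut becomes possible: rather than hiding tags, reorder them. The sketch below is a hypothetical illustration of that idea, not something we built. The function name, the shape of the input, and the sample data are all assumptions.

```python
from collections import Counter, defaultdict


def rank_tags_for_category(tagged_apps, all_tags, category):
    """Order the full tag list so tags often used in `category` come first.

    tagged_apps: iterable of (store_category, tags) pairs already tagged by hand.
    No tag is ever hidden; rarely used ones just sink to the bottom.
    """
    counts = defaultdict(Counter)
    for store_category, tags in tagged_apps:
        counts[store_category].update(tags)
    seen = counts[category]
    # Sort by descending usage in this category, then alphabetically.
    return sorted(all_tags, key=lambda tag: (-seen[tag], tag))


# Invented tagging history for illustration.
history = [
    ("Business", ["Be Productive", "Time Tracking"]),
    ("Business", ["Be Productive", "Busy People"]),
    ("Tools", ["Barcode scanner"]),
]
all_tags = ["Barcode scanner", "Be Productive", "Busy People", "Time Tracking", "VPN"]
print(rank_tags_for_category(history, all_tags, "Business"))
```

Because the full list is always returned, an editor can still reach an unexpected tag; the ranking only changes how far they have to scroll.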

It would be terribly inefficient at first, this we knew. But we were thinking big. Spend enough time with any complex system, and you’ll get to know even its weirder details like the back of your hand.

We had already been tagging games as we reviewed them for a year’s time, but, for the most part, the limited set of old tags wasn’t intuitive. We migrated some of them (Theme: “Space”) and ditched the rest (Audience: “People”? Seriously?).

That left 500-some-odd tags to fill with relevant apps. The large majority of them already had relevant, important, high-quality apps with an original review from our team, so the main chunk of the work was a matter of manually sorting enough to fill the taxonomy. We would simply have to pour time into it. Heaps and loads of time.

First, we ran a query for the top 1K downloaded apps, both in the Stores and on Appszoom. We divided the apps out between the four editors and worked our way through the lists.

As we sorted, we culled proposed tags that weren’t useful after all (Collectibles, Wanderers, Updates frequently), plus added many more that naturally came up in the course of seeing thousands upon thousands of apps (VPN, Police scanner, Stacking game).

After making our way through the first thousand apps, we took a look at the list of tags, now over 700 strong. Just hitting the most popular wouldn’t be enough; the app stores do a good enough job of showcasing mega-popular apps already. We had to fill out the weirder tags with relevant offerings, which meant becoming mini-experts on apps for the Visually Impaired, Drum apps, Medication Reminder apps, and Whack-a-mole games.

  • Editors: 4
  • Tags: 741
  • Apps per tag per platform: 10 minimum
  • Time to properly tag an app: 3–5 minutes. Requires familiarizing oneself with the app, scanning through the list of hundreds of potential tags, validating the tag update, and checking for related apps (Pro/Lite versions, Android/iOS)
  • Estimated hours invested in initial tagging effort: ~1000. Half for the top downloads, half filling out the full taxonomy — and not including any of the countless meetings wherein we debated the respective merits of Air Traffic Tracker vs. Flight Tracker and what exactly qualifies as an Incremental Game.
  • Tagging start date: March 11, 2015
  • Celebration date of filled-in, cleaned-up taxonomy: August 6th, 2015
  • Number of tagging-related spreadsheets in Drive: 16
  • Apps tagged as of November 11, 2015: 19,234

Splashing it all over the homesite.
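As a back-of-the-envelope check on the figures in the list above, the snippet below works out what ~1000 hours means per editor per week, and how many apps that many hours could cover at 3 to 5 minutes apiece. It assumes the hours were spread evenly across the four editors and across the March 11 to August 6 window, which is a simplification.

```python
from datetime import date

EDITORS = 4
TOTAL_HOURS = 1000                 # the rough estimate from the list above
MINUTES_PER_APP = (3, 5)           # fastest and slowest case from the list above
start, end = date(2015, 3, 11), date(2015, 8, 6)

# Assumption: effort spread evenly over the whole window and all editors.
weeks = (end - start).days / 7
hours_per_editor_per_week = TOTAL_HOURS / EDITORS / weeks

total_minutes = TOTAL_HOURS * 60
apps_low = total_minutes // MINUTES_PER_APP[1]    # every app takes 5 minutes
apps_high = total_minutes // MINUTES_PER_APP[0]   # every app takes 3 minutes

print(f"{weeks:.1f} weeks, about {hours_per_editor_per_week:.1f} hours per editor per week")
print(f"roughly {apps_low:,} to {apps_high:,} apps taggable in {TOTAL_HOURS} hours")
```

That comes to roughly 12 hours a week per editor, and somewhere between 12,000 and 20,000 apps. The November total of 19,234 sits inside that range, though it also includes tagging done after the August milestone, so treat this as a plausibility check rather than a reconciliation.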

What’s next?

We’ve got a ton of ideas of where to go next with our fledgling taxonomy, some of them already being pushed into production.

Inspired by what Curated has done for games, we’ve designed and commissioned a simple Android app discoverer, browsable by tags and including only high-quality apps reviewed by our team. We love the Curated idea; we want to take it to the next level for both apps and games with our zillions of tags and enormous database of content. Ours will make its debut on the Play Store within the month.

We’re betting on this one-two combo of actual humans skimming the very top-notch cream off the stores combined with logical organization that allows users to sort for the right match. It leaves room both for the magic of app discovery (since curators have crawled the corners of the stores sniffing up hidden gold) as well as the utility of quickly finding what you want (even if you don’t know the name of any specific app — find high-quality options through the Types and/or User Needs).

We’re also devising an intuitive searching system à la AppCrawlr, which will enable you to combine tags to make detailed queries like Kid-friendly Animal-themed One-finger games with Outstanding graphics and a Sweet soundtrack (hint: you’re looking for Gathering Sky).
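Under the hood, a combined-tag query like that boils down to a set intersection: keep only the apps that carry every requested tag. Here is a minimal sketch, assuming a plain dictionary from tag name to app ids. The `find_apps` helper and every app id except Gathering Sky are invented for illustration, and the search system itself may well work differently.

```python
def find_apps(apps_by_tag, *required_tags):
    """Return ids of apps that carry every one of the required tags."""
    if not required_tags:
        return set()
    # An unknown tag matches nothing, so the whole query comes back empty.
    tag_sets = [apps_by_tag.get(tag, set()) for tag in required_tags]
    # Start from the smallest set so the intersection does the least work.
    tag_sets.sort(key=len)
    return set.intersection(*tag_sets)


# Invented sample index; only Gathering Sky comes from the article.
apps_by_tag = {
    "Kid-friendly": {"gathering-sky", "hypothetical-farm-game"},
    "Animal-themed": {"gathering-sky", "hypothetical-farm-game"},
    "One-finger": {"gathering-sky", "hypothetical-runner"},
    "Outstanding graphics": {"gathering-sky"},
    "Sweet soundtrack": {"gathering-sky", "hypothetical-runner"},
}
print(find_apps(apps_by_tag, "Kid-friendly", "Animal-themed", "One-finger"))
```

Each extra tag can only shrink the result, which is exactly the behavior you want from a "narrow it down" search.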

Like so many of our start-up ilk, we’re also obsessed with tracking. Having our most highly trafficked apps divided into detailed categories is going to allow us to make super specific observations on user behavior over time. Are those who browse Podcasts more likely to also look for Password Managers? What kind of apps do women in the 35–44 bracket download? Which countries are still obsessed with Flappy-like games? There’s a potential gold mine of data here; stay tuned for insights as we dig ’em up.
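A question like "do Podcasts browsers also look for Password Managers?" can be approached by counting how often two tags turn up in the same browsing session. The sketch below shows one simple way to do that. The session data is invented and the `tag_pair_counts` helper is hypothetical; real analysis would also need to account for how popular each tag is on its own.

```python
from collections import Counter
from itertools import combinations


def tag_pair_counts(sessions):
    """Count how many sessions include each unordered pair of tags."""
    pairs = Counter()
    for tags in sessions:
        # De-duplicate and sort so ("A", "B") and ("B", "A") share one key.
        for pair in combinations(sorted(set(tags)), 2):
            pairs[pair] += 1
    return pairs


# Invented sessions: each inner list is the tags one visitor browsed.
sessions = [
    ["Podcasts", "Password Managers"],
    ["Podcasts", "Password Managers", "VPN"],
    ["Podcasts", "Flappy-like"],
]
counts = tag_pair_counts(sessions)
print(counts[("Password Managers", "Podcasts")])
```

Pairs are stored in alphabetical order, so look them up that way too.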


