Whatever Happened to Democratizing Machine Learning? Three Inconvenient Truths about the State of Enterprise ML

by Grant Shirk · May 28, 2019

At TiEcon 2019, the world’s largest entrepreneurial conference, Sisu CEO and founder Peter Bailis gave a keynote address titled “Whatever happened to democratizing ML?” In that talk, Peter challenged the conventional wisdom about enterprise ML and described three inconvenient truths.

There’s no doubt that we’re in a golden age of AI. Over the last several months, we’ve seen incredible advances in applying artificial intelligence techniques to image recognition, language processing, planning, and information retrieval. We’re seeing practical applications of machine learning improving everyday activities. There are more amusing applications too, including one team teaching AI how to craft puns.

However, particularly in the world of business, it feels like we’re “not quite there yet” when it comes to finding meaningful enterprise ML and AI applications. There’s a growing sentiment that solutions in the market today are too bespoke, require extensive consulting investment, and are at risk of never showing a positive ROI.

At Sisu, we believe there are three inconvenient truths about enterprise ML that are at the root of this challenge. The good news is that each of these challenges is surmountable with the right focus.

Training Data is Scarce

One of the most valuable investments ever made in training data is the ImageNet project, a set of over 14M categorized and labeled images that is open to the public. Thanks to this investment from Fei-Fei Li and the ImageNet team, researchers and deep learning enthusiasts have been able to dramatically improve image classification accuracy.

However, gathering this kind of labeled data at scale can be very demanding. Particularly for tasks that involve sensitive data or depend on scarce domain expertise, labeled data is difficult or even impossible to come by. For example, collecting and labeling DICOM medical image scans is challenging for privacy reasons, and it’s even harder to find experts who can credibly identify and label tumors, tears, and abnormalities. These are really valuable tasks, but it’s an open question whether it’s feasible to gather enough data to train on effectively.

Deep Networks Don’t Help Much with Structured Data

What’s more, deep networks don’t help much with model accuracy for structured data use cases. This is particularly relevant for businesses, as most enterprise information is structured, tabular data. A great example of this in practice is a recent paper from Google on “Scalable and accurate deep learning with electronic health records.” The paper shows some dramatic results for prediction accuracy in healthcare outcomes, but it also shows that simpler approaches like logistic regression perform almost as well.

In other words, we’re not quite at the point where the investment required to train a deep net on structured data delivers a significant ROI above and beyond other techniques.
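To make the idea of a “simpler approach” concrete, here is a minimal sketch of a logistic regression baseline trained on a small tabular dataset. It is an illustration only, not the method or data from the paper above: the two-column table, the make_synthetic_table helper, and the learning rate and epoch settings are all hypothetical, and the model is written with plain gradient descent so that it runs on the Python standard library alone.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(rows, labels, lr=0.1, epochs=200):
    """Fit weights and a bias with batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of log loss with respect to the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    # Classify as 1 when the predicted probability is at least 0.5.
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


def accuracy(weights, bias, rows, labels):
    correct = sum(predict(weights, bias, x) == y for x, y in zip(rows, labels))
    return correct / len(rows)


def make_synthetic_table(n, seed=0):
    """Hypothetical tabular data: two numeric columns and a binary outcome."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        x1 = rng.uniform(-1, 1)
        x2 = rng.uniform(-1, 1)
        noise = rng.gauss(0, 0.1)
        rows.append([x1, x2])
        labels.append(1 if 2 * x1 - x2 + noise > 0 else 0)
    return rows, labels


# Train on one synthetic sample and evaluate on a separately generated one.
train_rows, train_labels = make_synthetic_table(400, seed=0)
test_rows, test_labels = make_synthetic_table(200, seed=1)
weights, bias = train_logistic_regression(train_rows, train_labels)
baseline_accuracy = accuracy(weights, bias, test_rows, test_labels)
print(f"logistic regression baseline accuracy: {baseline_accuracy:.2f}")
```

A baseline like this takes minutes to build, which is the point: before investing in a deep network for structured data, it is worth knowing the number a simple model already reaches on the same held-out rows.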

AutoML is Not a Panacea

AutoML has recently been touted as a major advance for enterprise ML. While automating key steps of the data science process can increase the pace of model creation, this automation is not a panacea for enterprise ML. There’s still a long way to go before AutoML models reach the level of accuracy needed for real-world success.


So what can we do in response? By taking each of these truths in turn, it is possible to identify a few key principles that can accelerate the adoption and effectiveness of machine learning in the enterprise.

  • First, take advantage of the data we already have. Doing so makes it possible to get meaningful results from our models and tools faster. Looking across industries, most companies are not putting the data they collect on a daily basis to use. Most estimates, including recent surveys by Forrester, MicroStrategy, and Hitachi, indicate that two-thirds or more of the data collected by businesses goes unused. There’s a huge advantage for companies that can shift their existing data stores from passive to active assets.

  • Second, focus on augmentation, not automation. A great example of a high-impact workflow is the analysis of performance metrics. At Sisu, we’re helping people rapidly diagnose, in real time, why their most important metrics are changing, using machine learning and large-scale data explanation. By focusing on augmenting the skills and capacity of the experts in the business, we can help teams find meaningful facts buried deep in their operational data.

  • Finally, put models in production quickly. The old adage, “the perfect is the enemy of the good,” is especially true in this arena. By putting simple, effective models into production and making it clear that user feedback is expected and useful, we can rapidly iterate and use that feedback to further train and refine the system.

We’re at the point where the luster of this golden age of AI is fading, but with the right investments, it’s possible to avoid widespread disillusionment with the technology. There are valuable, practical, and feasible applications for enterprise ML. It’s why we started Sisu, and we’re excited about the possibility of “off the shelf machine learning” and about making new tools accessible to more people in the organization.