Achieving high accuracy in machine learning often requires extremely large amounts of training data – but just how much is required? Recent research highlights a surprising trend: training data provides diminishing marginal returns. That is, the final few percentage points of accuracy gains require orders of magnitude more training data than the first 95%.
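To make the diminishing-returns claim concrete, here is a small illustrative sketch. It assumes test error follows a power law in dataset size, a common finding in empirical scaling-law studies; the constants `a` and `alpha` are hypothetical, not drawn from any specific benchmark.

```python
# Hypothetical power-law error model: err(n) = a * n**(-alpha),
# where n is the number of training examples. Constants are illustrative.
a, alpha = 2.0, 0.35

def examples_needed(target_error):
    # Invert err(n) = a * n**(-alpha) to solve for n.
    return (a / target_error) ** (1 / alpha)

for err in (0.10, 0.05, 0.01):
    print(f"target error {err:.0%}: ~{examples_needed(err):,.0f} examples")
```

Under these assumed constants, driving error from 10% down to 1% requires several hundred times more data – each additional point of accuracy gets progressively more expensive, which is the trade-off described above.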
Just as Moore’s Law helped define the economics of microprocessor R&D, this exponential trade-off between dataset size and accuracy has major implications for bringing ML to scale. Specifically, under this trade-off, we can coarsely classify applications of ML into two categories:
Bespoke ML: High Accuracy at High Cost. When accuracy is paramount, data is critical. In applications where errors are expensive—like healthcare, autonomous vehicles, and fraud detection—there is an arms race to stockpile training data. In these domains, large training datasets are a critical, defensible advantage. In this regime, creating a useful model is expensive, and productionizing ML requires specialized, verticalized approaches.
Off-the-Shelf ML: Low(er) Accuracy at Low Cost. At the opposite end of the spectrum, it’s possible to obtain models with lower accuracy quickly and with low cost. The amount of data required to achieve modest accuracy can be surprisingly small – for example, we can train a vanilla ResNet-50 from scratch to over 70% Top-5 accuracy on ImageNet with only 100 examples per target class; compared to training on the full dataset, this incurs a ~20% accuracy penalty but uses 10x fewer data points. 70% accuracy is not near human accuracy, but models in this range can help reduce human workloads.
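The low-data regime described above amounts to capping a labeled dataset at a fixed number of examples per class. A minimal sketch of that subsampling step, with a hypothetical helper name and toy synthetic data (not the actual ImageNet pipeline):

```python
import random
from collections import defaultdict

def subsample_per_class(examples, k, seed=0):
    """Keep at most k examples per label.

    examples: list of (x, label) pairs; returns a capped subset,
    mimicking the "100 examples per class" setup described above.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for x, y in examples:
        by_label[y].append((x, y))
    subset = []
    for y, items in by_label.items():
        rng.shuffle(items)          # random draw within each class
        subset.extend(items[:k])    # cap at k per class
    return subset

# Toy usage: 250 examples over 5 classes, capped at 10 per class.
data = [(i, i % 5) for i in range(250)]
subset = subsample_per_class(data, k=10)
print(len(subset))  # 5 classes * 10 examples each
```

For ImageNet, the same cap with k=100 over 1,000 classes yields roughly 100K training images versus the full ~1.3M, which is the order-of-magnitude reduction the example above refers to.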
It’s clear from recent history that bespoke ML efforts can work well, especially if staffed by large and specialized teams of ML PhDs, data engineers, and human annotators. In contrast, off-the-shelf ML is far more accessible, but it’s unclear how to leverage these models to make useful business decisions. Attempting to fully automate existing workflows is unlikely to work well, except for the lowest-value use cases. Instead, using off-the-shelf ML in processes that leverage and augment human abilities has the most potential for widespread adoption. At Sisu, we believe delivering value with off-the-shelf ML represents an opportunity at the scale of Microsoft Office, and there’s an entirely new set of tools needed to fill the gap.