MLCommons aims to accelerate machine learning to benefit everyone.
MLCommons will build a common set of tools for ML practitioners, including:
Benchmarks to measure progress: MLCommons will leverage MLPerf (built on DAWNBench) to measure speed, but will also expand benchmarking to other aspects of ML such as accuracy and algorithmic efficiency. ML models continue to increase in size and, consequently, cost. Sustaining growth in capability will require learning how to do more (accuracy) with less (efficiency).
Public datasets to fuel research: MLCommons' new People's Speech project seeks to develop a public dataset that, in addition to being larger than any other public speech dataset by more than an order of magnitude (86K hours of labeled speech), better reflects diverse languages and accents. Public datasets drive machine learning like nothing else; consider ImageNet's impact on the field of computer vision.
Best practices to accelerate development: MLCommons will make it easier to develop and deploy machine learning solutions by fostering consistent best practices. For instance, MLCommons' MLCube project provides a common container interface for machine learning models to make them easier to share, experiment with (including benchmarking), develop, and ultimately deploy.
MLCommons™ in 6 questions
1. What is MLCommons?
2. Why benchmarks?
3. Why datasets?
4. Why best practices?
5. What’s next?
6. How can I get involved?
Machine learning (ML) could benefit everyone:
● Information access
● Business productivity
● Health
● Safety
MLCommons is a new open engineering organization to create better ML for everyone
[Diagram: MLCommons sits at the intersection of open engineering organizations and AI/ML organizations]
MLCommons is supported by industry and academics
Academics from educational institutions including:
Harvard University
Indiana University
Polytechnique Montreal
Stanford University
University of California, Berkeley
University of Toronto
University of Tübingen
University of York, United Kingdom
Yonsei University
MLCommons is the work of many people... and many others contributing ideas and code...
MLCommons creates better ML through three pillars:
● Benchmarks
● Datasets
● Best practices
Together with research, these pillars produce better ML.
Benchmarks drive progress and transparency
"What gets measured, gets improved." — Peter Drucker
Benchmarking aligns research with development, engineering with marketing, and competitors across the industry in pursuit of the same clear objective.
MLCommons will host MLPerf™
Industry standard; drives progress and transparency
[Selected press coverage of MLPerf results]
MLPerf progress
Increasing breadth: benchmark suites added from 2018 to 2021:
● Training - HPC
● Training
● Inference - Datacenter
● Inference - Edge
● Inference - Mobile
● Inference - Tiny (IoT)
Improving technical approach:
New training/inference benchmarks:
● Recommendation: DLRM + 1TB dataset
● Medical imaging: U-Net
● Speech-to-text: RNN-T
Standardized methodology for Training:
● Optimizer definitions
● Hyperparameter definitions
● Convergence expectations (WIP)
Adding power measurement to Inference
Launched Mobile App (early alpha release)
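MLPerf Training scores time-to-train to a fixed quality target rather than raw throughput, which is why optimizer definitions, hyperparameter definitions, and convergence expectations must be standardized for results to be comparable. A toy sketch of that measurement style (all names and numbers are illustrative, not the real MLPerf harness):

```python
import time

def time_to_target(train_step, evaluate, target_quality, max_epochs=100):
    """Train until quality reaches the target; return (epochs, seconds).

    Mimics MLPerf Training's metric (time to a fixed quality target) in
    miniature; the real benchmark additionally pins optimizers,
    hyperparameters, and convergence rules.
    """
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        train_step(epoch)
        if evaluate() >= target_quality:
            return epoch, time.perf_counter() - start
    raise RuntimeError("did not converge within max_epochs")

# Tiny stand-in "model": quality improves by a fixed step each epoch.
state = {"quality": 0.0}

def train_step(epoch):
    state["quality"] += 0.2  # one epoch of "training"

def evaluate():
    return state["quality"]

epochs, seconds = time_to_target(train_step, evaluate, target_quality=0.75)
print(epochs)  # reaches the 0.75 target on the 4th epoch
```

Because the metric is quality-gated, a faster system only scores better if it actually converges, which keeps speed claims honest.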
ML needs an ImageNet++ for everything
ImageNet: $300K → modern ML
~80% of research papers by leading ML companies cite public datasets
And ML innovation needs datasets that are:
● Large
● CC license or similar
● Redistributable
● Diverse
● Continually improving
But most public datasets are:
● Small
● Legally restricted
● Not redistributable
● Not diverse
● Static
MLCommons is starting with speech-to-text
Voice interfaces will reach most of Earth's 8 billion people by 2025. We need bigger datasets that support more diverse languages and accents.
[Chart: Earth's population grouped by native language; source: https://commons.wikimedia.org/wiki/File:List_of_languages_by_number_of_native_speakers.png]
People's Speech: 10 years of speech, CC-BY
[Diagram: dataset scope spans read text to conversation + noise, and English to 60+ other languages, with diverse languages/accents as future work]
● ~10 years of labeled speech (>10TB)
● CC-BY license (likely), redistributable
● Undergoing evaluation by MLCommons members
● Aiming for public release 1H2021
● Living dataset
ML has too much friction
Example: found an ML model you want to use?
● Interface (how do you even run it)?
● Software dependencies?
● Dataset?
● Platform compatibility?
All solved after a couple of days of hard work! And then it converges to 81.6% of claimed accuracy?
MLCube™ is a shipping container for ML models
Like cargo ships: complex infrastructure and complex contents, but a simple interface = low friction
MLCube makes it easier to share models
Basically, a Docker container with a consistent command line and metadata (really, an abstract interface for any container)
Simple runners for:
● Local machine
● Multiple clouds
● Kubernetes
Or incorporate into your own infrastructure
Learn more at:
https://github.com/mlcommons/mlcube
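As a rough sketch of what the MLCube interface looks like, a task definition might resemble the following (the field names, values, and image name here are illustrative assumptions, not copied from the project docs; see the repository above for the real schema):

```yaml
# mlcube.yaml -- hypothetical task definition for an MLCube
# (field names and values are illustrative, not the official schema)
name: mnist
description: Example MLCube wrapping a training script
tasks:
  train:
    parameters:
      inputs:
        data_dir: {type: directory, default: data/}
      outputs:
        model_dir: {type: directory, default: model/}
docker:
  image: example/mnist-mlcube:0.0.1  # hypothetical image name
```

A runner would then execute a task with a command along the lines of `mlcube run --task=train --platform=docker` (exact flags may differ by runner and version). The point is the consistency: every model exposes the same declared tasks, inputs, and outputs, so any runner can drive it.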
MLCommons Research
Algorithmic Research Working Group
● Benchmarks for algorithms to improve efficiency: better accuracy/compute
Medical Research Working Group
● Federated evaluation across distributed data: research ~= clinical practice
Scientific Research Working Group
● Better datasets and software for science
(Your idea here)
We welcome people who want to make ML better.
● Join our mailing list
● Attend community events
● Become a member (free for academics)
● Participate in working groups
● Submit benchmark results
Join us at mlcommons.org!