Bryce Meredig of Citrine Informatics presents the company's materials data platform, Citrination. For academic and government users, this infrastructure is a free and open means to meet data management plan requirements of many federal funding agencies.
1. Citrination: Open Infrastructure for Ingesting, Storing, & Mining Materials Data
Bryce Meredig & Greg Mulholland
Citrine Informatics
MRS Fall Meeting
2 December 2015
3. About Citrine
Data platform for the physical world: our software aggregates and mines materials data to aid R&D, manufacturing, and sales
4. Business Model
We sell enterprise-scale industrial
deployments of our platform
We don’t charge academic or
government labs for public data
storage or access
5. Bold Assertion #1
Our platform is a one-line data
management plan for everyone
-Funding agencies ask you to make data
accessible, but do not specify how
-Anyone can store public data on our
platform for free, today
7. Bold Assertion #3
Scientists should focus on science,
not IT
-Funding agencies don’t want an
infrastructure mortgage
-Proliferation of unconnected data islands
doesn’t serve the community
8. We’re Nice, But Not a Charity
More data make our platform
smarter and more valuable
Users help us, and each other, by
curating and organizing data
11. Platform Overview
Data extraction pipeline turns docs & files into a structured database
Structured data are far more discoverable, and also amenable to machine learning
19. Uploading Data
Ingestion is instant if you
create JSON or .csv files
-See citrination.com/contributing
Otherwise, we figure it out!
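As a rough illustration of the "create JSON or .csv files" route, the sketch below serializes the same small set of records both ways using only Python's standard library. The field names (`formula`, `property`, `value`, `units`) and the helper names `to_json` and `to_csv` are assumptions made for this example; they are not the layout Citrination requires, which is described at citrination.com/contributing.

```python
import csv
import io
import json

# Hypothetical records: field names are illustrative only,
# not the format required by Citrination.
records = [
    {"formula": "GaN", "property": "band gap", "value": 3.4, "units": "eV"},
    {"formula": "Si", "property": "band gap", "value": 1.12, "units": "eV"},
]


def to_json(rows):
    """Serialize the records as a JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(records)
csv_text = to_csv(records)
```

Either string can then be written to a file and uploaded; the point is only that tabular or JSON data is already structured, so no extraction step is needed.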
20. Credit and Provenance
We acknowledge both the contributor (i.e., uploader) and
the published work via the DOI
21. Incentives: Vanity Metrics
Weekly pageviews of the OQMD paper’s page (does not count individual datum views): comparable to high-impact journal metrics!
32. Data Standards
MIF – Materials Information File:
General JSON schema for
defining materials data
Open standard and open-source
tools for working with it
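To give a feel for what a JSON-based materials record can look like, here is a minimal sketch that builds one nested record and round-trips it through JSON. The key names (`material`, `chemicalFormula`, `measurements`, `reference`) and the helper `make_record` are assumptions chosen for illustration; they are not taken from the MIF specification, which should be consulted for the real schema.

```python
import json


def make_record(formula, prop_name, value, units, doi=None):
    """Build one nested, JSON-serializable record (hypothetical layout, not the MIF schema)."""
    record = {
        "material": {"chemicalFormula": formula},
        "measurements": [
            {"property": {"name": prop_name, "value": value, "units": units}}
        ],
    }
    # Attach provenance only when a DOI is supplied.
    if doi is not None:
        record["reference"] = {"doi": doi}
    return record


record = make_record("GaN", "band gap", 3.4, "eV")
text = json.dumps(record, sort_keys=True)  # serialize to a JSON string
restored = json.loads(text)                # parse it back into Python objects
```

Because the record is plain JSON, any language with a JSON parser can read it, which is the practical benefit of an open, schema-based standard.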
34. Working with MIF
mifkit – open-source Python
toolset for working with MIF
Create MIFs in your code, or
import MIFs from Citrination into
your code
35. Programmatic Data Access
API usage example:
# full documentation:
# http://citrineinformatics.github.io/api-documentation
# search the entire database
client.search(formula='CrFeSn')
# filter on values
client.search(formula='GaN', property='band gap', max_measurement='3')
# search a single data set
client.search(formula='CrFeSn', data_set_id='100')
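The slide assumes an already-configured `client`. To make the filter semantics concrete without network access, the sketch below is a toy in-memory stand-in that accepts the same keyword names. It is not the Citrination client: the record layout and values are made up, and treating `max_measurement` as an upper bound on the measured value is an assumption.

```python
# Toy stand-in for the search call shown on the slide; NOT the Citrination client.
# Records and values are placeholders for illustration only.
RECORDS = [
    {"formula": "GaN", "property": "band gap", "value": 3.4, "data_set_id": "100"},
    {"formula": "GaN", "property": "band gap", "value": 2.9, "data_set_id": "101"},
    {"formula": "CrFeSn", "property": "band gap", "value": 0.0, "data_set_id": "100"},
]


def search(records, formula=None, property=None, max_measurement=None,
           data_set_id=None):
    """Return records matching every filter that was supplied.

    The `property` keyword mirrors the slide and shadows the builtin
    only inside this function.
    """
    hits = []
    for rec in records:
        if formula is not None and rec["formula"] != formula:
            continue
        if property is not None and rec["property"] != property:
            continue
        if data_set_id is not None and rec["data_set_id"] != data_set_id:
            continue
        # Assumption: max_measurement is an inclusive upper bound, given as a string.
        if max_measurement is not None and rec["value"] > float(max_measurement):
            continue
        hits.append(rec)
    return hits


everything = search(RECORDS, formula="CrFeSn")
filtered = search(RECORDS, formula="GaN", property="band gap", max_measurement="3")
single_set = search(RECORDS, formula="CrFeSn", data_set_id="100")
```

Omitted keywords impose no constraint, so the three calls correspond to the three cases on the slide: whole database, value filter, and single data set.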
36. Data Quality and Curation
Our philosophy: We are not arbiters of data quality; instead,
we give the community tools to assess and discuss quality
38. Machine Learning: TE Case Study
Sparks, T. D., Gaultois, M. W., Oliynyk, A., Brgoch, J., & Meredig, B. “Data mining
our way to the next generation of thermoelectrics.” Scripta Materialia (2015)
ML-based web app that predicts key thermoelectric properties for any bulk polycrystalline material
Heat map of thermal
conductivity predicted by
ML in Ru-Dy-Ge system
42. Get Involved
• We’ll store your data today—easy
data management plan template
• Join Citrination newsletter:
bit.ly/1NGNgdb
• Access our public data
• Email me: bryce@citrine.io