The DataSift platform
1. Industry Leading Big Data Platform For Social
AGGREGATE across sources for real-time & historic data from a single API
PROCESS to filter out noise, extract metadata & categorize to add structure
DELIVER into BI tools, enterprise & social apps.
3. Benefits of DataSift
Description
Aggregate:
• Single API across 20+ sources
• Standard data format
• Real-time and historical data access
Process:
• Enrichments: Increase value by adding meaning to the data
• Filtering: Get relevant data with unconstrained sophisticated filters
• Categorization: Contextualize and add structure to the data
Deliver:
• Multiple options for integration using APIs and pre-built connectors
• Guarantee data delivery with DataSift PUSH protocol
• Configurable database formats to easily map data into tables in your database
Benefits
Reduce integration costs
Minimize ongoing maintenance
Differentiate with value-added data
Lower infrastructure costs
Speed up time to market
Lower operational costs
Speed up time to market
4. One platform for faster integration
• Single API
• Standard data format
• Real-time and historical data
AGGREGATE
Broad, Unified & Compliant Access
5. Broad Access to Data Sources
Twitter, Facebook, Sina Weibo, WordPress, Intense Debate, Tumblr, Google+, YouTube, Bitly, Instagram, NewsCred, Reddit, Wikipedia, DailyMotion, Topix, IMDb, Videos, Blogs, Message Boards
Historical data available in addition to real-time
6. PROCESS: Enrich, increase relevance and contextualize data
• Enrichments: Increase value by adding meaning to the data
• Filtering: Get relevant data with unconstrained sophisticated filters
• Categorization: Contextualize and add structure to the data
7. Enrichments: Increase Value By Adding Meaning to Data
• Add valuable metadata to the raw feeds in real time
• Get more precise by adding enrichment data to your filters
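As a rough illustration of what adding metadata to a raw feed can look like, here is a minimal Python sketch. It is not DataSift's implementation: the `enrich` function, the two enrichers, and the `enrichments` field name are all hypothetical, chosen only to show the idea of attaching derived fields to each raw interaction.

```python
import re

def extract_hashtags(interaction: dict) -> dict:
    """Hypothetical enricher: pull hashtags out of the raw text."""
    return {"hashtags": re.findall(r"#(\w+)", interaction.get("content", ""))}

def extract_links(interaction: dict) -> dict:
    """Hypothetical enricher: pull http(s) links out of the raw text."""
    return {"links": re.findall(r"https?://\S+", interaction.get("content", ""))}

def enrich(interaction: dict, enrichers) -> dict:
    """Return a copy of the interaction with an added 'enrichments' field."""
    enriched = dict(interaction)
    enriched["enrichments"] = {}
    for enricher in enrichers:
        enriched["enrichments"].update(enricher(interaction))
    return enriched

raw = {
    "source": "twitter",
    "content": "Loving the new phone #launch https://example.com/review",
}
result = enrich(raw, [extract_hashtags, extract_links])
print(result["enrichments"])
```

Because the derived fields sit next to the original content, a downstream filter can test them the same way it tests the raw text.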
8. Filtering: Get Relevant Data With Unconstrained Sophisticated Filters
Get only the data you need; avoid paying for and processing junk data
• Filter across content, metadata and enrichments
• Create filters visually or using code
• Add data sources quickly by applying one
filter across multiple sources
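The idea of one filter spanning content, metadata and enrichments, applied unchanged to several sources, can be sketched in Python as follows. This is an illustrative stand-in, not DataSift's filter language: the condition format, the operator names, and the field paths such as `meta.language` and `enrichments.sentiment` are assumptions made for the example.

```python
def get_path(record, path):
    """Look up a dotted path such as 'meta.language'; None if missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

# Hypothetical operators for the sketch.
OPERATORS = {
    "contains": lambda actual, expected: isinstance(actual, str)
    and expected.lower() in actual.lower(),
    "equals": lambda actual, expected: actual == expected,
    "at_least": lambda actual, expected: isinstance(actual, (int, float))
    and actual >= expected,
}

def matches(record, conditions):
    """A record passes only if every (path, operator, value) condition holds."""
    return all(
        OPERATORS[op](get_path(record, path), expected)
        for path, op, expected in conditions
    )

# One filter definition, reused for every source in the stream.
FILTER = [
    ("content", "contains", "coffee"),        # raw content
    ("meta.language", "equals", "en"),        # metadata
    ("enrichments.sentiment", "at_least", 0), # enrichment
]

stream = [
    {"source": "twitter", "content": "Great coffee downtown",
     "meta": {"language": "en"}, "enrichments": {"sentiment": 3}},
    {"source": "tumblr", "content": "Coffee was awful",
     "meta": {"language": "en"}, "enrichments": {"sentiment": -4}},
    {"source": "reddit", "content": "Best COFFEE beans?",
     "meta": {"language": "en"}, "enrichments": {"sentiment": 1}},
    {"source": "twitter", "content": "Rainy day",
     "meta": {"language": "en"}, "enrichments": {"sentiment": 0}},
]
kept = [record for record in stream if matches(record, FILTER)]
print([record["source"] for record in kept])
```

Since the filter only refers to normalized field paths, adding another source to `stream` requires no change to `FILTER`, which is the point of a standard data format.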
It's really important to be able to curate what's interesting out of social content and surface that. And
that really requires a robust platform where we can dig in and figure out exactly what is relevant.
- Peter Yared, CTO & CIO, CBS Interactive
9. Categorization: Contextualize and Add Structure to the Data
Ready the data for consumption – making it
easier to analyze and decreasing time to value
• Define rules for classifying and scoring the
data
• Use machine learning to categorize and
score the data in real-time
• Leverage out-of-the-box data science with the library of pre-built classifiers
[Figure: example classifier labels: Journalist, Tier-1 Customer, Profile, CRM, Churn, Content]
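Rule-based classification and scoring, as described in the first bullet above, might look like the following Python sketch. The rule format, the keywords, and the weights are invented for illustration and are not taken from DataSift's classifier library.

```python
# Hypothetical rules: (category, keyword, weight).
RULES = [
    ("churn", "cancel", 2.0),
    ("churn", "switching to", 3.0),
    ("praise", "love", 1.5),
]

def classify(text, rules):
    """Sum the weights of all matching rules, grouped by category."""
    lowered = text.lower()
    scores = {}
    for category, keyword, weight in rules:
        if keyword in lowered:
            scores[category] = scores.get(category, 0.0) + weight
    return scores

def top_category(scores):
    """Return the highest-scoring category, or None if nothing matched."""
    return max(scores, key=scores.get) if scores else None

scores = classify("I might cancel and I'm switching to a rival", RULES)
print(scores, top_category(scores))
```

A machine-learned classifier would replace `classify` with a trained model, but the output shape (a category plus a score attached to each interaction) would stay much the same.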
10. Simplify integration and leverage your
existing tools to consume the data
• Stream data in real-time or pull at your
own pace using push/pull APIs
• Leverage pre-built connectors to
popular storage solutions and BI tools
• Guarantee data delivery with DataSift
PUSH protocol
• Configurable database formats to
easily map data into tables in your
databases
DELIVER
Democratize Data
Across Your Apps and
Org
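The slide does not describe how the DataSift PUSH protocol guarantees delivery. As a generic illustration only, the Python sketch below shows one common way to approach guaranteed delivery: keep each batch queued until the receiver acknowledges it, and retry on failure. The `deliver_all` function, the `send` callback, and the retry limit are all assumptions for the example.

```python
from collections import deque

def deliver_all(batches, send, max_attempts=3):
    """Send each batch until acknowledged; return batches that never got through."""
    pending = deque((batch, 0) for batch in batches)
    failed = []
    while pending:
        batch, attempts = pending.popleft()
        try:
            acknowledged = send(batch)
        except ConnectionError:
            acknowledged = False
        if acknowledged:
            continue
        if attempts + 1 < max_attempts:
            # Re-queue for another attempt instead of dropping the data.
            pending.append((batch, attempts + 1))
        else:
            failed.append(batch)
    return failed

# Simulated endpoint that rejects the first attempt for every batch.
received = []
seen = set()

def flaky_send(batch):
    key = tuple(batch)
    if key not in seen:
        seen.add(key)
        raise ConnectionError("simulated outage")
    received.append(batch)
    return True

undelivered = deliver_all([["a", "b"], ["c"]], flaky_send)
print(received, undelivered)
```

A pull-style consumer would invert the control flow: the client asks for the next batch when it is ready, and the server keeps the data until it is fetched.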
11. Consume The Data In Your Existing Infrastructure
Pull Connector, HTTP, Redis, MongoDB, CouchDB, Amazon DynamoDB, Amazon S3, Google BigQuery, FTP, SFTP, Zoomdata, Elasticsearch, Splunk Enterprise, Streaming API, REST API
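To make the idea of mapping delivered data into database tables concrete, here is a small Python sketch using the standard-library `sqlite3` module. The column map, the table name, and the nested field paths are hypothetical; the actual configurable database formats offered by DataSift are not described in these slides.

```python
import sqlite3

# Hypothetical mapping: table column -> dotted path in the interaction.
COLUMN_MAP = {
    "source": "source",
    "author": "author.name",
    "body": "content",
    "sentiment": "enrichments.sentiment",
}

def get_path(record, path):
    """Look up a dotted path; None if any part is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def to_row(interaction, column_map):
    """Flatten one nested interaction into a tuple in column order."""
    return tuple(get_path(interaction, path) for path in column_map.values())

interactions = [
    {"source": "twitter", "author": {"name": "ana"}, "content": "hi",
     "enrichments": {"sentiment": 2}},
    {"source": "reddit", "author": {"name": "bo"}, "content": "meh"},
]

conn = sqlite3.connect(":memory:")
columns = ", ".join(COLUMN_MAP)
placeholders = ", ".join("?" for _ in COLUMN_MAP)
conn.execute(f"CREATE TABLE interactions ({columns})")
conn.executemany(
    f"INSERT INTO interactions ({columns}) VALUES ({placeholders})",
    [to_row(item, COLUMN_MAP) for item in interactions],
)
rows = conn.execute(
    "SELECT source, author, sentiment FROM interactions ORDER BY source"
).fetchall()
print(rows)
```

Missing fields become NULL rather than raising an error, so interactions from sources that lack a given enrichment still load into the same table.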
12. 1.5 billion interactions per day
2.4 petabytes total data archive
2 terabytes archived per day
The mission critical nature of social demands an enterprise class social data
provider.
Susan Etlinger, Altimeter
Enterprise Class Solution
• Built to scale for social
• 99.9% reliability with 24/7 support
• Onboarding, training and guided deployment services
13. Lower Costs and Faster Time To Value
Lower costs:
• Get (and pay for) less junk data
• Use a single set of filters for real-time and historical data
• Minimize data cleansing costs
• Simplify integration with DataSift PUSH
Faster time to value:
• Single API for adding new data sources
• Enrichments provide value add data out of the box
• Categorization & scoring provides “analysis-ready” data
• Consume the data using your existing infrastructure
DataSift does the heavy lifting needed to get rich social data, allowing us to
focus on building innovative new features for our applications.
Adam Root, Co-founder and CTO, HipLogiq