Joe Olson
Data Architect
Smart Chicago Collaborative
27 Mar 2014
joe.olson@cct.org
(All the cool buzzwords in one place!)
Social Media,
Cloud Computing,
Machine Learning,
Open Source, and
Big Data Analytics
Social Media - Twitter
• What can we learn from Twitter?
• 400 million tweets per day
source: http://articles.washingtonpost.com/2013-03-21/business/37889387_1_tweets-jack-dorsey-twitter
• 218 million users
source: http://techcrunch.com/2013/10/03/bweeting/
• Excellent source of sentiment
• Excellent source of big data
• Prototyping
• Modeling natural language
• Resume padding
Social Media - Twitter
• How do we get at the data?
• Twitter provided APIs:
• https://dev.twitter.com/docs
• Streaming
• Set up a real time data stream (json) based on keywords
• REST (v1.1)
• Make REST requests, and get results
• Possible parameters:
• Geospatial bounding box
• By time
• By user, hashtag, retweets etc
• Fire hose
• Big $$$. Big data
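A geospatial bounding box can also be checked on the client side once tweets arrive. The following is a minimal sketch, assuming latitude and longitude have already been pulled out of the tweet; the `BoundingBox` type, the `in_bounding_box` helper, and the rough Chicago coordinates are illustrative assumptions, not part of the Twitter API or of the talk.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """South-west and north-east corners, in decimal degrees."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float


def in_bounding_box(lat: float, lon: float, box: BoundingBox) -> bool:
    """Return True when the point lies inside (or on the edge of) the box."""
    return (box.min_lat <= lat <= box.max_lat
            and box.min_lon <= lon <= box.max_lon)


# Rough, illustrative box around Chicago (an assumption for this example).
CHICAGO = BoundingBox(min_lat=41.64, min_lon=-87.94, max_lat=42.03, max_lon=-87.52)

print(in_bounding_box(41.88, -87.63, CHICAGO))  # a point in downtown Chicago
print(in_bounding_box(40.71, -74.01, CHICAGO))  # a point in New York City
```

This simple comparison does not handle boxes that cross the 180° meridian, which is acceptable for a city-sized area.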
Social Media - Twitter
• Information & Obstacles
• Who
• What
• At best: Plain English (!)
• Worse: (Spanish or Arabic or Portuguese...)
• Worst: “Textspeak” symbols :-0, UTF8 chars, etc.
• Absolute Worst: combination of all of them
• Where
• 1-2% with latitude / longitude
• Geocode
• When
Social Media - Twitter
JSON Tweet example:
  "created_at": "Sun Oct 27 13:57:40 +0000 2013",
  "id": 394462908261740540,
  "text": "Flu :(",
  "source": "<a href=\"http://twitmania.com\" rel=\"nofollow\">TwitMania™</a>",
  "user": {
    "id": 594141140,
    "name": "Yultiana Farida N",
    "screen_name": "yultiana",
    "followers_count": 231,
    "friends_count": 252,
    "created_at": "Tue May 29 23:58:25 +0000 2012",
    "statuses_count": 2397
  },
  "geo": null,
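The who / what / where / when fields can be pulled out of a tweet like this with a few lines of code. This is a hedged sketch: the tweet below is a trimmed copy of the example, wrapped in braces so that it parses, and the `summarize` helper is a name made up for illustration.

```python
import json
from datetime import datetime

# Trimmed copy of the example tweet, wrapped in braces so it is valid JSON.
RAW_TWEET = """
{
  "created_at": "Sun Oct 27 13:57:40 +0000 2013",
  "id": 394462908261740540,
  "text": "Flu :(",
  "user": {"id": 594141140, "screen_name": "yultiana", "followers_count": 231},
  "geo": null
}
"""


def summarize(raw: str) -> dict:
    """Pull the who / what / where / when fields out of one tweet."""
    tweet = json.loads(raw)
    return {
        "who": tweet["user"]["screen_name"],
        "what": tweet["text"],
        "where": tweet.get("geo"),  # None when the tweet carries no location
        "when": datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y"),
    }


summary = summarize(RAW_TWEET)
print(summary["who"], summary["what"], summary["when"].isoformat())
```

Because "geo" is null here, the "where" field comes back as None, which is the common case that geocoding has to work around.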
Cloud Computing
• What does cloud computing bring to the table?
• Amazon’s EC2:
• Commoditized hardware
• Low cost
• Only charged for resources you use
• No long term commitments
• Scalable
• "Throwaway" mentality
**IF** you play by their rules!
Cloud Computing – AWS
• Tools
• Virtual Machines
• # of Processors, RAM, OS, disk capacity and I/O – all configurable
• Price range: $0.02/hr to $4.60/hr
• Licensed OSes cost 50% more than Linux OSes
• Archive Storage
• S3 / Glacier
• Work Queues
• SQS
• Data Stores
• DynamoDB (key-value store), Redshift (analysis store)
• Virtual Networking
• Routers, VPN gateways, access control lists, etc
• APIs
• Command line
• HTTPS REST
• Native programming languages (Python, bash, PHP, Java etc.)
Ideal for rapid prototyping / proof of concepts
Cloud Computing – AWS
• APIs
• Basic
• Start an instance (and start billing)
• Stop an instance (stop billing)
• Insert item into queue
• Remove item from queue
• Write to backup store
• Ultra advanced
• Reserved vs. on demand vs. spot instances
• Price can drop as much as 80% due to market demand
• Instance can disappear at any time
Big Data Analytics
• Can we skirt the “big data” problem by distilling the tweets
down from millions and millions of “noise” tweets into a smaller,
more desirable data set?
• Enrich in real time, rather than on archived data, and avoid the
overhead of map/reduce?
• Possible Enrichment of raw data:
• Classification – separate tweets into “relevant” and “irrelevant”
• Geocoding – improve on the 1-2% ?
• Aggregation -> map/reduce
• Mapping -> Reduce function -> Output
• AWS – Elastic Map Reduce
• Clustering
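The Mapping -> Reduce function -> Output flow can be sketched on a single machine as follows. The sample tweets and the `map_tweet` / `reduce_counts` names are hypothetical; a real job on Elastic MapReduce would distribute the same two steps across many nodes.

```python
from collections import Counter
from functools import reduce

# Hypothetical enriched tweets: (city, label) pairs produced upstream.
tweets = [
    ("Chicago", "relevant"),
    ("Chicago", "irrelevant"),
    ("Boston", "relevant"),
    ("Chicago", "relevant"),
]


def map_tweet(tweet):
    """Map step: emit a one-item count keyed by city, for relevant tweets only."""
    city, label = tweet
    return Counter({city: 1}) if label == "relevant" else Counter()


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


# Output: relevant tweets per city.
output = reduce(reduce_counts, map(map_tweet, tweets), Counter())
print(dict(output))
```

Classifying first and aggregating only the relevant tweets keeps the reduce step small, which is the point of distilling the stream before it is archived.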
Machine Learning
• Classification: relevant, or irrelevant?
• Human trained model
• Once model is established, bounce new data off it for
classification
• Validation of model
• Accuracy:
$$\text{Accuracy} = \frac{\text{Total \# of classifications} - \text{Mismatches between machine and human}}{\text{Total \# of classifications}}$$
• Crowdsourcing – AWS Mechanical Turk
• Improve model by feeding disagreements back into the model
• Our best text classification model to date: accuracy in the low 90% range
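The accuracy measure and the feedback of disagreements can be illustrated with a short sketch. The labels and texts below are made-up sample data, and the `accuracy` and `disagreements` helpers are hypothetical names, not the project's code.

```python
def accuracy(human, machine):
    """(total classifications - machine/human mismatches) / total classifications."""
    if len(human) != len(machine):
        raise ValueError("label lists must have the same length")
    if not human:
        raise ValueError("at least one classification is required")
    mismatches = sum(h != m for h, m in zip(human, machine))
    return (len(human) - mismatches) / len(human)


def disagreements(texts, human, machine):
    """Collect (text, human label) pairs to feed back into the training set."""
    return [(t, h) for t, h, m in zip(texts, human, machine) if h != m]


# Made-up sample data for illustration only.
texts = [
    "Got food poisoning at lunch",
    "New paper on food poisoning",
    "Stomach bug after dinner out",
    "Food truck festival today",
    "Sick all night after tacos",
]
human = ["relevant", "irrelevant", "relevant", "irrelevant", "relevant"]
machine = ["relevant", "relevant", "relevant", "irrelevant", "relevant"]

print(accuracy(human, machine))
print(disagreements(texts, human, machine))
```

The human label is kept for each disagreement because the human judgment is treated as the ground truth when the model is retrained.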
Open Source
• Friendly to the commoditized computing paradigm
• Don’t have to worry about licensing issues
• Contributes to the “throwaway” discipline
• Don’t have to re-invent the wheel (collaboration)
• Solutions applicable to all parts of the architecture
• Acquire data: Node.js – non blocking
• Analyze data: R – statistical engine
• Store and query data: MongoDB (document store) or Riak
(key-value database)
Architecture
• We know Twitter is providing a mountain of data from all parts
of the world
• We know Amazon is providing a framework of low cost,
on-demand, no commitment computing
• Open source is providing a rich tool set
• Goals:
• Architect with cost in mind!
• Enrichment - Real time and after-the-fact enrichment (open data)
• Scalable
• Decoupled
• Service based
• Rapid development
• Prove the concepts
Architecture - Acquire
• Acquire the data from Twitter
• If classifying in real time:
• Store then classify?
• Classify then store?
• Tools
• Twitter streaming API
• Keywords
• Node.js
• Several different packages to interface with Twitter APIs
• Amazon
• EC2
• SQS (?) Extremely useful, but drives the cost up
Architecture - Analyze
• Classification interface
• Service based – HTTP REST
• Push or pull?
• Push – classifiers listen on port 80
• Pull – classifier starts pulling from an established work queue
• Both highly scalable and flexible with respect to cost.
• Stateless
• R
• Human trained machine learning packages available
• Cloud friendly – no licenses
• Automatable – from install through configuration to execution
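The pull model can be sketched with an in-process queue standing in for SQS. Everything here is an assumption made for illustration: `queue.Queue` replaces the real work queue, and `classify` is a trivial keyword check rather than the human-trained R model described in the talk.

```python
import queue


def classify(text: str) -> str:
    """Stand-in classifier: a keyword check, not the R model from the talk."""
    return "relevant" if "food poisoning" in text.lower() else "irrelevant"


def drain(work_queue: queue.Queue) -> list:
    """Pull items until the queue is empty; the worker keeps no state between items."""
    results = []
    while True:
        try:
            text = work_queue.get_nowait()
        except queue.Empty:
            break
        results.append((text, classify(text)))
        work_queue.task_done()
    return results


work = queue.Queue()
for tweet_text in ["Ugh! I got food poisoning from lunch", "Great lunch today"]:
    work.put(tweet_text)

results = drain(work)
print(results)
```

Because each worker is stateless, scaling up means starting more workers against the same queue, and scaling down means stopping them.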
Architecture - Store
• Store JSON as an object (document store) or normalize (relational
database)?
• Relational databases
• disk I/O intensive – not cloud friendly
• allow complex indexing
• Easy to get a business intelligence front end on them
• Requires a schema / ETL
• Key-value document stores
• Designed to be scalable – don’t need fast disks
• Indexing is not nearly as flexible as RDBMS
• More difficult to front a UI – no “drag and drop” tools
• No schema / ETL needed.
• Not as mature
• MongoDB / Riak
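The trade-off between storing the JSON whole and normalizing it can be seen side by side in a small sketch. SQLite is used here only as a convenient stand-in for both styles; the table and column names are assumptions for this example, and a real document store such as MongoDB would replace the first table.

```python
import json
import sqlite3

tweet = {
    "id": 394462908261740540,
    "text": "Flu :(",
    "created_at": "Sun Oct 27 13:57:40 +0000 2013",
    "user": {"id": 594141140, "screen_name": "yultiana"},
}

db = sqlite3.connect(":memory:")

# Document style: no schema beyond a key and the raw JSON body.
db.execute("CREATE TABLE tweet_docs (id INTEGER PRIMARY KEY, body TEXT)")
db.execute("INSERT INTO tweet_docs VALUES (?, ?)", (tweet["id"], json.dumps(tweet)))

# Relational style: a schema plus a small ETL step that flattens the JSON.
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, screen_name TEXT)")
db.execute(
    "CREATE TABLE tweets (id INTEGER PRIMARY KEY, "
    "user_id INTEGER REFERENCES users(id), text TEXT, created_at TEXT)"
)
db.execute("INSERT INTO users VALUES (?, ?)",
           (tweet["user"]["id"], tweet["user"]["screen_name"]))
db.execute("INSERT INTO tweets VALUES (?, ?, ?, ?)",
           (tweet["id"], tweet["user"]["id"], tweet["text"], tweet["created_at"]))

# The document comes back whole; the relational rows can be joined and indexed.
doc = json.loads(
    db.execute("SELECT body FROM tweet_docs WHERE id = ?", (tweet["id"],)).fetchone()[0]
)
row = db.execute(
    "SELECT u.screen_name, t.text FROM tweets t JOIN users u ON u.id = t.user_id"
).fetchone()
print(doc["user"]["screen_name"], row)
```

The document insert needs no knowledge of the tweet's fields, while the relational insert breaks as soon as the upstream JSON changes shape, which is the schema / ETL cost listed above.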
Architecture – Presentation
• Least need for cloud friendly scalability here?
• Options
• Licensed BI software – Tableau, Endeca, Jaspersoft, Pentaho
• Open source BI software – SpagoBI
• Roll your own - PHP, Ruby, Visual Basic, Javascript, etc
• Connect to an existing system instead?
Costs – Real Time Classification
• Number of tweets collected per day: 1,000,000 (a comfortable load; about 0.25% of the 400 million daily tweets)
• Machine used on EC2 to acquire (node.js): micro
• $0.02/hr * 24 hrs = $0.48/day
• Machine used on EC2 to classify (R): small (x2)
• $0.06/hr * 24 hrs = $1.44/day; x2 = $2.88/day
• Machine used on EC2 to store (MongoDB): large
• $0.24/hr * 24 hrs = $5.76/day
• Machine used on EC2 for GUI (Apache): small
• $0.06/hr * 24 hrs = $1.44/day
• Total: $0.48 + $2.88 + $5.76 + $1.44 = $10.56/day
• $10.56 / 1,000,000 tweets = $0.00001056 per tweet (about 0.001 cents)
Can add more zeros if you relax real-time classification (spot instances)
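The daily cost arithmetic can be reproduced in a few lines, which also makes it easy to try other instance mixes. The hourly rates and instance counts are the ones quoted on the slide; the variable names are illustrative.

```python
HOURS_PER_DAY = 24
TWEETS_PER_DAY = 1_000_000

# (role, hourly rate in USD, instance count), as quoted on the slide.
machines = [
    ("acquire (node.js, micro)", 0.02, 1),
    ("classify (R, small)", 0.06, 2),
    ("store (MongoDB, large)", 0.24, 1),
    ("GUI (Apache, small)", 0.06, 1),
]

daily_cost = sum(rate * HOURS_PER_DAY * count for _, rate, count in machines)
cost_per_tweet = daily_cost / TWEETS_PER_DAY

print(f"${daily_cost:.2f} per day")
print(f"${cost_per_tweet:.8f} per tweet")
```

Note that the per-tweet figure is in dollars, so it corresponds to roughly a thousandth of a cent per tweet.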
Costs - Archive
• Size of average tweet: 2.5 KB
• Cost to archive:
• S3: $0.095 per GB per month
• About $0.0000002 per tweet per month
• Glacier: $0.01 per GB per month
• About $0.00000002 per tweet per month
• Compression will add even more zeros, but will require more
computing power, and mean more latency for post collection
data analysis. Can be automated.
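The per-tweet archive cost follows directly from the average tweet size and the per-GB price. A small sketch, assuming a decimal gigabyte (1,000,000 KB), which the slide does not specify; the helper name is hypothetical.

```python
TWEET_SIZE_KB = 2.5
KB_PER_GB = 1_000_000  # decimal gigabyte; an assumption, the slide does not say


def monthly_cost_per_tweet(price_per_gb_month: float) -> float:
    """USD per tweet per month for uncompressed storage."""
    return price_per_gb_month * TWEET_SIZE_KB / KB_PER_GB


s3 = monthly_cost_per_tweet(0.095)      # S3 price quoted on the slide
glacier = monthly_cost_per_tweet(0.01)  # Glacier price quoted on the slide

print(f"S3:      ${s3:.10f} per tweet per month")
print(f"Glacier: ${glacier:.10f} per tweet per month")
```

The results are in line with the rounded figures above; at this scale the storage price matters far less than the compute needed to read the archive back.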
Use Cases
• Foodborne Chicago (http://foodborne.smartchicagoapps.org/)
• Public-private partnership with City of Chicago Dept. of Public Health
and Smart Chicago Collaborative
• Reach out to city residents on Twitter tweeting about food poisoning
symptoms, in an attempt to get them to log information in the City’s
311 database (via the Open311 API)
• Once in the 311 database, it follows established City workflows, and
becomes actionable
• Numbers (1 year):
• 2,390 tweets classified as related to food poisoning
• 282 tweets responded to
• 205 reports submitted
• 145 inspections
• Real time classification examples:
• “Ugh! I got food poisoning from the McDonalds’s on Halstead!”
http://184.73.52.31/cgi-bin/R/fp_classifier?text=Ugh!%20I%20got%20food%20poisoning%20from%20McDonalds%20on%20Halstead
• “U of Chicago releases a new paper on the effects of food poisoning”
http://184.73.52.31/cgi-bin/R/fp_classifier?text=U%20of%20Chicago%20releases%20new%20paper%20on%20the%20effects%20of%20food%20poisoning
• Video: http://www.youtube.com/watch?v=RNf9XQ_25Yw&feature=youtu.be
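The classifier requests shown above are plain HTTP GET URLs with the tweet text percent-encoded in the `text` parameter. A minimal sketch of building such a URL follows; the endpoint is copied from the slide and may no longer be reachable, and `classifier_request_url` is a hypothetical helper name.

```python
from urllib.parse import quote

# Endpoint as shown in the slide; it may no longer be reachable.
CLASSIFIER_URL = "http://184.73.52.31/cgi-bin/R/fp_classifier"


def classifier_request_url(text: str) -> str:
    """Build the GET URL for one tweet, encoding spaces as %20.

    The "!" character is left unescaped to mirror the first example URL.
    """
    return f"{CLASSIFIER_URL}?text={quote(text, safe='!')}"


url = classifier_request_url(
    "U of Chicago releases new paper on the effects of food poisoning"
)
print(url)
```

The sketch only builds the URL and does not send the request, so it runs without network access.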
Use Cases
• Disease Tracker
• Large scale attempt to track disease occurrences in the United
States.
• Sponsored by the Dept. of HHS
• Approximately 1 million tweets a day (cold, flu) classified in real
time
• EC2 scalable instances
• Geolocation
• Cost to run for 6 months: $850
Future Directions
• Turnkey service
• Can all this functionality be abstracted down to a pushbutton
service?
• Open data
• Can you advertise the data collected, how you enriched it, and
allow others to come along and enrich it as well?
• General purpose bridge between Twitter and issue tracking
databases
• Big industry problem
Github Sources
• Tweet Collector
• https://github.com/smartchicago/TweetCollector
• Classifier Code
• https://github.com/corynissen/foodborne_classifier

Más contenido relacionado

La actualidad más candente

Comparing Microsoft Big Data Platform Technologies
Comparing Microsoft Big Data Platform TechnologiesComparing Microsoft Big Data Platform Technologies
Comparing Microsoft Big Data Platform TechnologiesJen Stirrup
 
NoSQL for the SQL Server Pro
NoSQL for the SQL Server ProNoSQL for the SQL Server Pro
NoSQL for the SQL Server ProLynn Langit
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big dataTrieu Nguyen
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...Rittman Analytics
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014ALTER WAY
 
Scaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataScaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataTreasure Data, Inc.
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsMars Lan
 
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...DataStax
 
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle CloudOTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle CloudMark Rittman
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in MotionRuhani Arora
 
Relational to Big Graph
Relational to Big GraphRelational to Big Graph
Relational to Big GraphNeo4j
 
MongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB
 
Treasure Data From MySQL to Redshift
Treasure Data  From MySQL to RedshiftTreasure Data  From MySQL to Redshift
Treasure Data From MySQL to RedshiftTreasure Data, Inc.
 
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...MongoDB
 
Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...
Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...
Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...Experfy
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...StampedeCon
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 
OWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - DataikuOWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - DataikuDataiku
 

La actualidad más candente (20)

Comparing Microsoft Big Data Platform Technologies
Comparing Microsoft Big Data Platform TechnologiesComparing Microsoft Big Data Platform Technologies
Comparing Microsoft Big Data Platform Technologies
 
NoSQL for the SQL Server Pro
NoSQL for the SQL Server ProNoSQL for the SQL Server Pro
NoSQL for the SQL Server Pro
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
 
Scaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataScaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big Data
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
 
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
 
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle CloudOTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
 
Relational to Big Graph
Relational to Big GraphRelational to Big Graph
Relational to Big Graph
 
MongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big Data
 
Clickstream & Social Media Analysis using Apache Spark
Clickstream & Social Media Analysis using Apache SparkClickstream & Social Media Analysis using Apache Spark
Clickstream & Social Media Analysis using Apache Spark
 
Treasure Data From MySQL to Redshift
Treasure Data  From MySQL to RedshiftTreasure Data  From MySQL to Redshift
Treasure Data From MySQL to Redshift
 
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...
 
Yahoo's Next Generation User Profile Platform
Yahoo's Next Generation User Profile PlatformYahoo's Next Generation User Profile Platform
Yahoo's Next Generation User Profile Platform
 
Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...
Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...
Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
OWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - DataikuOWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - Dataiku
 

Destacado

Java-Based Microservices: Understanding the Benefits and Boundaries for Your ...
Java-Based Microservices: Understanding the Benefits and Boundaries for Your ...Java-Based Microservices: Understanding the Benefits and Boundaries for Your ...
Java-Based Microservices: Understanding the Benefits and Boundaries for Your ...Dynatrace
 
Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...
Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...
Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...Kevin O'Shea
 
Today's social media and cloud computing in business environment
Today's social media and cloud computing in business environmentToday's social media and cloud computing in business environment
Today's social media and cloud computing in business environmentNovi Research Center
 
Big data-Cloud-Mobile service offerings
Big data-Cloud-Mobile service offeringsBig data-Cloud-Mobile service offerings
Big data-Cloud-Mobile service offeringsVijayananda Mohire
 
Social Media, Cloud Computing and architecture
Social Media, Cloud Computing and architectureSocial Media, Cloud Computing and architecture
Social Media, Cloud Computing and architectureRick Mans
 
The Cloud and Mobile - Guppers
The Cloud and Mobile - GuppersThe Cloud and Mobile - Guppers
The Cloud and Mobile - GuppersAndy Harjanto
 
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개Gruter
 
Understanding Microservices
Understanding Microservices Understanding Microservices
Understanding Microservices M A Hossain Tonu
 
Big data and Social Media Analytics
Big data and Social Media AnalyticsBig data and Social Media Analytics
Big data and Social Media AnalyticsSimplify360
 

Destacado (10)

Java-Based Microservices: Understanding the Benefits and Boundaries for Your ...
Java-Based Microservices: Understanding the Benefits and Boundaries for Your ...Java-Based Microservices: Understanding the Benefits and Boundaries for Your ...
Java-Based Microservices: Understanding the Benefits and Boundaries for Your ...
 
Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...
Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...
Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...
 
Today's social media and cloud computing in business environment
Today's social media and cloud computing in business environmentToday's social media and cloud computing in business environment
Today's social media and cloud computing in business environment
 
Understanding meteor
Understanding meteorUnderstanding meteor
Understanding meteor
 
Big data-Cloud-Mobile service offerings
Big data-Cloud-Mobile service offeringsBig data-Cloud-Mobile service offerings
Big data-Cloud-Mobile service offerings
 
Social Media, Cloud Computing and architecture
Social Media, Cloud Computing and architectureSocial Media, Cloud Computing and architecture
Social Media, Cloud Computing and architecture
 
The Cloud and Mobile - Guppers
The Cloud and Mobile - GuppersThe Cloud and Mobile - Guppers
The Cloud and Mobile - Guppers
 
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개
 
Understanding Microservices
Understanding Microservices Understanding Microservices
Understanding Microservices
 
Big data and Social Media Analytics
Big data and Social Media AnalyticsBig data and Social Media Analytics
Big data and Social Media Analytics
 

Similar a Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data Analytics (Chicago Summit)

Elasticsearch meetup final_2014_04
Elasticsearch meetup final_2014_04Elasticsearch meetup final_2014_04
Elasticsearch meetup final_2014_04marc_harrison
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWSCaserta
 
Data Care, Feeding, and Maintenance
Data Care, Feeding, and MaintenanceData Care, Feeding, and Maintenance
Data Care, Feeding, and MaintenanceMercedes Coyle
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Perficient, Inc.
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About Jesus Rodriguez
 
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...MongoDB
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
Big problems Big Data, simple solutions
Big problems Big Data, simple solutionsBig problems Big Data, simple solutions
Big problems Big Data, simple solutionsClaudio Pontili
 
Big problems Big data, simple AWS solution
Big problems Big data, simple AWS solutionBig problems Big data, simple AWS solution
Big problems Big data, simple AWS solutionJean-Claude Sotto
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterJohn Adams
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxAIMLSEMINARS
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsAbhishekKumarAgrahar2
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web developmentTung Nguyen
 
Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Miguel Pastor
 
Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupSri Ambati
 
HP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big DataHP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big DataRob Winters
 

Similar a Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data Analytics (Chicago Summit) (20)

Elasticsearch meetup final_2014_04
Elasticsearch meetup final_2014_04Elasticsearch meetup final_2014_04
Elasticsearch meetup final_2014_04
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWS
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Data Care, Feeding, and Maintenance
Data Care, Feeding, and MaintenanceData Care, Feeding, and Maintenance
Data Care, Feeding, and Maintenance
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
 
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Big problems Big Data, simple solutions
Big problems Big Data, simple solutionsBig problems Big Data, simple solutions
Big problems Big Data, simple solutions
 
Big problems Big data, simple AWS solution
Big problems Big data, simple AWS solutionBig problems Big data, simple AWS solution
Big problems Big data, simple AWS solution
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
 
Lecture1
Lecture1Lecture1
Lecture1
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
 
Big data
Big dataBig data
Big data
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
 
Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014
 
Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville Meetup
 
HP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big DataHP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big Data
 

Más de Open Analytics

Cyber after Snowden (OA Cyber Summit)
Cyber after Snowden (OA Cyber Summit)Cyber after Snowden (OA Cyber Summit)
Cyber after Snowden (OA Cyber Summit)Open Analytics
 
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)Open Analytics
 
CDM….Where do you start? (OA Cyber Summit)
CDM….Where do you start? (OA Cyber Summit)CDM….Where do you start? (OA Cyber Summit)
CDM….Where do you start? (OA Cyber Summit)Open Analytics
 
An Immigrant’s view of Cyberspace (OA Cyber Summit)
An Immigrant’s view of Cyberspace (OA Cyber Summit)An Immigrant’s view of Cyberspace (OA Cyber Summit)
An Immigrant’s view of Cyberspace (OA Cyber Summit)Open Analytics
 
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)Open Analytics
 
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data Analytics (Chicago Summit)

  • 1. Joe Olson, Data Architect, Smart Chicago Collaborative, 27 Mar 2014, joe.olson@cct.org (All the cool buzzwords in one place!) Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data Analytics
  • 2. Social Media - Twitter • What can we learn from Twitter? • 400 million tweets per day source: http://articles.washingtonpost.com/2013-03-21/business/37889387_1_tweets-jack-dorsey-twitter • 218 million users source: http://techcrunch.com/2013/10/03/bweeting/ • Excellent source of sentiment • Excellent source of big data • Prototyping • Modeling natural language • Resume padding
  • 3. Social Media - Twitter • How do we get at the data? • Twitter provided APIs: • https://dev.twitter.com/docs • Streaming • Set up a real time data stream (json) based on keywords • REST (v1.1) • Make REST requests, and get results • Possible parameters: • Geospatial bounding box • By time • By user, hashtag, retweets etc • Fire hose • Big $$$. Big data
  • 4. Social Media - Twitter • Information & Obstacles • Who • What • At best: Plain English (!) • Worse: (Spanish or Arabic or Portuguese...) • Worst: “Textspeak” symbols :-0, UTF8 chars, etc. • Absolute Worst: combination of all of them • Where • 1-2% with latitude / longitude • Geocode • When
  • 5. Social Media - Twitter JSON Tweet example: • "created_at":"Sun Oct 27 13:57:40 +0000 2013", • "id":394462908261740540, • "text":"Flu :(", • "source":"<a href="http://twitmania.com" rel="nofollow">TwitMania™</a>", • "user":{ • "id":594141140, • "name":"Yultiana Farida N", • "screen_name":"yultiana", • "followers_count":231, • "friends_count":252, • "created_at":"Tue May 29 23:58:25 +0000 2012", • "statuses_count":2397, • }, • "geo":null,
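As a minimal sketch of how the who / what / where / when fields can be pulled out of a tweet like the one above, the Python below parses a trimmed copy of the slide's example. The `summarize` helper and its output keys are illustrative names, not part of the original deck; the "source" field and dangling commas are left out so the sample is valid JSON.

```python
import json
from datetime import datetime

# Trimmed copy of the slide's tweet. The "source" field is omitted because
# its embedded quotes would need escaping, and trailing commas are removed.
raw = """
{
  "created_at": "Sun Oct 27 13:57:40 +0000 2013",
  "id": 394462908261740540,
  "text": "Flu :(",
  "user": {
    "id": 594141140,
    "screen_name": "yultiana",
    "followers_count": 231
  },
  "geo": null
}
"""


def summarize(tweet_json):
    """Return the who / what / where / when of one tweet (hypothetical helper)."""
    tweet = json.loads(tweet_json)
    # Twitter v1.1 timestamps look like "Sun Oct 27 13:57:40 +0000 2013".
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "who": tweet["user"]["screen_name"],
        "what": tweet["text"],
        "where": tweet["geo"],  # None for most tweets (no latitude / longitude)
        "when": created.isoformat(),
    }


summary = summarize(raw)
print(summary)
```

Note that "where" comes back empty here, which matches the earlier point that only a small share of tweets carry coordinates.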
  • 6. Cloud Computing • What does cloud computing bring to the table? • Amazon’s EC2: • Commoditized hardware • Low cost • Only charged for resources you use • No long term commitments • Scalable • "Throwaway" mentality **IF** you play by their rules!
  • 7. Cloud Computing – AWS • Tools • Virtual Machines • # of Processors, RAM, OS, disk capacity and I/O – all configurable • Price range: $.02/hr - $4.60/hr • Licensed OSes cost 50% more than Linux OSes • Archive Storage • S3 / Glacier • Work Queues • SQS • Data Stores • DynamoDB (key-value store), Redshift (analysis store) • Virtual Networking • Routers, VPN gateways, access control lists, etc. • APIs • Command line • HTTPS REST • Native programming languages (Python, bash, PHP, Java, etc.) Ideal for rapid prototyping / proofs of concept
  • 8. Cloud Computing – AWS • APIs • Basic • Start an instance (and start billing) • Stop an instance (stop billing) • Insert item into queue • Remove item from queue • Write to backup store • Ultra advanced • Reserved vs. on demand vs. spot instances • Price can drop as much as 80% due to market demand • Instance can disappear at any time
  • 9. Big Data Analytics • Can we skirt the “big data” problem by distilling millions and millions of “noise” tweets down into a smaller, more useful data set? • Enrich in real time, rather than on archived data, and avoid the overhead of map/reduce? • Possible enrichment of raw data: • Classification – separate tweets into “relevant” and “irrelevant” • Geocoding – improve on the 1-2%? • Aggregation -> map reduce • Mapping -> Reduce Function -> Output • AWS – Elastic Map Reduce • Clustering
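To make the "Mapping -> Reduce Function -> Output" step concrete, here is a small in-memory sketch of the pattern in Python. It is not Elastic Map Reduce; the keyword list, the sample tweets, and the function names `map_tweet` and `reduce_counts` are assumptions chosen for illustration.

```python
from collections import defaultdict

# Assumed keywords, chosen only for this illustration.
KEYWORDS = ("cold", "flu")


def map_tweet(text):
    """Map step: emit a (keyword, 1) pair for each keyword found in the tweet."""
    words = text.lower().split()
    return [(kw, 1) for kw in KEYWORDS if kw in words]


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Made-up sample input.
tweets = ["Flu :(", "caught a cold", "cold and flu season", "sunny day"]
pairs = [pair for tweet in tweets for pair in map_tweet(tweet)]
counts = reduce_counts(pairs)
print(counts)
```

On a real cluster the map and reduce steps would run on separate workers; the point of the slide is that classifying in real time may shrink the data enough to skip that overhead.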
  • 10. Machine Learning • Classification: relevant, or irrelevant? • Human trained model • Once model is established, bounce new data off it for classification • Validation of model • Accuracy = (Total # of classifications – Mismatches between machine and human) / Total # of classifications • Crowdsourcing – AWS Mechanical Turk • Improve model by feeding disagreements back into the model • Our best text classification model to date: accuracy in the low 90% range
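The accuracy measure on this slide can be sketched in a few lines of Python. The label values and the `disagreements` helper are hypothetical; the second function only illustrates the idea of collecting machine / human mismatches so they can be fed back into training.

```python
def accuracy(human_labels, machine_labels):
    """(total classifications - mismatches) / total classifications."""
    if len(human_labels) != len(machine_labels) or not human_labels:
        raise ValueError("need two non-empty label lists of equal length")
    total = len(human_labels)
    mismatches = sum(1 for h, m in zip(human_labels, machine_labels) if h != m)
    return (total - mismatches) / total


def disagreements(texts, human_labels, machine_labels):
    """Return (text, human label) pairs where the machine disagreed."""
    return [
        (text, h)
        for text, h, m in zip(texts, human_labels, machine_labels)
        if h != m
    ]


# Made-up validation set of four tweets.
texts = ["tweet a", "tweet b", "tweet c", "tweet d"]
human = ["relevant", "irrelevant", "relevant", "relevant"]
machine = ["relevant", "irrelevant", "irrelevant", "relevant"]

score = accuracy(human, machine)
retrain = disagreements(texts, human, machine)
print(score, retrain)
```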
  • 11. Open Source • Friendly to the commoditized computing paradigm • Don’t have to worry about licensing issues • Contributes to the “throwaway” discipline • Don’t have to re-invent the wheel (collaboration) • Solutions applicable to all parts of the architecture • Acquire data: Node.js – non-blocking • Analyze data: R – statistical engine • Store and query data: MongoDB (document store) or Riak (key-value database)
  • 12. Architecture • We know Twitter is providing a mountain of data from all parts of the world • We know Amazon is providing a framework of low cost, on-demand, no commitment computing • Open source is providing a rich tool set • Goals: • Architect with cost in mind! • Enrichment - Real time and after-the-fact enrichment (open data) • Scalable • Decoupled • Service based • Rapid development • Prove the concepts
  • 13. Architecture - Acquire • Acquire the data from Twitter • If classifying in real time: • Store then classify? • Classify then store? • Tools • Twitter streaming API • Keywords • Node.js • Several different packages to interface with Twitter APIs • Amazon • EC2 • SQS (?) Extremely useful, but drives the cost up
  • 14. Architecture - Analyze • Classification interface • Service based – HTTP REST • Push or pull? • Push – classifiers listen on port 80 • Pull – classifier starts pulling from an established work queue • Both highly scalable and flexible with respect to cost. • Stateless • R • Human trained machine learning packages available • Cloud friendly – no licenses • Automatable – from install, configuration, execution
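The "pull" option can be sketched as a stateless worker that drains a work queue. In the Python below, `queue.Queue` is only a local stand-in for a hosted queue such as SQS, and `classify` is a hypothetical keyword placeholder, not the R model described in the deck.

```python
import queue


def classify(text):
    """Placeholder classifier (assumption): a naive keyword match."""
    return "relevant" if "food poisoning" in text.lower() else "irrelevant"


def drain(work_queue):
    """Pull items until the queue is empty, classifying each one.

    The worker keeps no state between items, so more workers can be
    started or stopped as cost allows.
    """
    results = []
    while True:
        try:
            text = work_queue.get_nowait()
        except queue.Empty:
            break
        results.append((text, classify(text)))
        work_queue.task_done()
    return results


# Made-up sample input.
work = queue.Queue()
work.put("Ugh! I got food poisoning")
work.put("Nice weather in Chicago today")
labels = drain(work)
print(labels)
```

A keyword match like this would also flag news stories about food poisoning, which is the kind of case the trained model is meant to separate.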
  • 15. Architecture - Store • Store JSON as an object (document store) or normalize (relational database)? • Relational databases • disk I/O intensive – not cloud friendly • allow complex indexing • Easy to get a business intelligence front end on them • Requires a schema / ETL • Key-value document stores • Designed to be scalable – doesn’t need fast disks • Indexing is not nearly as flexible as RDBMS • More difficult to front a UI – no “drag and drop” tools • No schema / ETL needed. • Not as mature • MongoDB / Riak
  • 16. Architecture – Presentation • Least need for cloud friendly scalability here? • Options • Licensed BI software – Tableau, Endeca, Jaspersoft, Pentaho • Open source BI software – SpagoBI • Roll your own - PHP, Ruby, Visual Basic, JavaScript, etc. • Connect to an existing system instead?
  • 17. Costs – Real Time Classification • Number of tweets collected per day: 1,000,000 (comfortable - .25%) • Machine used on EC2 to acquire (node.js): micro • $.02/hr * 24 hrs = $.48/day • Machine used on EC2 to classify (R): small (x2) • $.06/hr * 24 hrs = $1.44/day * 2 = $2.88/day • Machine used on EC2 to store (MongoDB): large • $.24/hr * 24 hrs = $5.76/day • Machine used on EC2 for GUI (Apache): small • $.06/hr * 24 hrs = $1.44/day • $0.48 + $2.88 + $5.76 + $1.44 = $10.56/day / 1,000,000 tweets = $.00001056/tweet (about .001 cents/tweet) Can add more zeros if you relax real-time classification (spot instances)
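The arithmetic on this slide can be checked with a short Python sketch. The hourly rates and machine counts are the ones quoted above; the dictionary layout and variable names are illustrative. `Decimal` is used so the totals come out exact, and the per-tweet result is in dollars.

```python
from decimal import Decimal

HOURS_PER_DAY = 24
TWEETS_PER_DAY = 1_000_000

# (hourly rate in dollars, number of machines), as quoted on the slide.
machines = {
    "acquire (node.js, micro)": (Decimal("0.02"), 1),
    "classify (R, small x2)": (Decimal("0.06"), 2),
    "store (MongoDB, large)": (Decimal("0.24"), 1),
    "GUI (Apache, small)": (Decimal("0.06"), 1),
}

# Daily cost per role = hourly rate * 24 hours * machine count.
daily = {
    role: rate * HOURS_PER_DAY * count
    for role, (rate, count) in machines.items()
}
total_per_day = sum(daily.values())
dollars_per_tweet = total_per_day / TWEETS_PER_DAY

print(total_per_day, dollars_per_tweet)
```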
  • 18. Costs - Archive • Size of average tweet: 2.5 KB • Cost to archive: • S3: $.095 per GB/month • $0.0000002 per tweet per month • Glacier: $.01 per GB/month • $0.00000002 per tweet per month • Compression will add even more zeros, but will require more computing power and add latency to post-collection data analysis. Can be automated.
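A quick Python check of the archive figures, assuming the quoted prices are dollars per GB per month and that 1 GB is counted as 1,000,000 KB (both assumptions; the slide does not state units). The slide's per-tweet numbers are these results rounded to one significant digit.

```python
from decimal import Decimal

TWEET_KB = Decimal("2.5")
KB_PER_GB = Decimal(1_000_000)  # assumption: decimal gigabytes

# Assumed to be dollars per GB per month, as quoted on the slide.
PRICE_PER_GB_MONTH = {
    "s3": Decimal("0.095"),
    "glacier": Decimal("0.01"),
}

tweet_gb = TWEET_KB / KB_PER_GB
per_tweet_month = {
    store: tweet_gb * price for store, price in PRICE_PER_GB_MONTH.items()
}

# Monthly storage cost for one day's collection of 1,000,000 tweets.
one_day_on_s3 = per_tweet_month["s3"] * 1_000_000

print(per_tweet_month, one_day_on_s3)
```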
  • 19. Use Cases • Foodborne Chicago (http://foodborne.smartchicagoapps.org/) • Public-private partnership with City of Chicago Dept. of Public Health and Smart Chicago Collaborative • Reach out to city residents on Twitter tweeting about food poisoning symptoms, in an attempt to get them to log information in the City’s 311 database (via the Open311 API) • Once in the 311 database, it follows established City workflows, and becomes actionable • Numbers (1 year): • 2,390 tweets classified as related to food poisoning • 282 tweets responded to • 205 reports submitted • 145 inspections • Real time classification examples: • “Ugh! I got food poisoning from the McDonald’s on Halstead!” http://184.73.52.31/cgi-bin/R/fp_classifier?text=Ugh!%20I%20got%20food%20poisoning%20from%20McDonalds%20on%20Halstead • “U of Chicago releases a new paper on the effects of food poisoning” http://184.73.52.31/cgi-bin/R/fp_classifier?text=U%20of%20Chicago%20releases%20new%20paper%20on%20the%20effects%20of%20food%20poisoning • Video: http://www.youtube.com/watch?v=RNf9XQ_25Yw&feature=youtu.be
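The classification examples above are plain HTTP GET requests with the tweet text percent-encoded into a `text` query parameter. The Python sketch below only builds such a URL and does not send it; the endpoint is copied from the slide and may no longer be live, and `classifier_url` is a hypothetical helper name.

```python
from urllib.parse import quote

# Endpoint as shown on the slide; it may no longer respond.
CLASSIFIER_ENDPOINT = "http://184.73.52.31/cgi-bin/R/fp_classifier"


def classifier_url(text):
    """Build the GET URL for one tweet (hypothetical helper).

    Spaces become %20; "!" is kept literal to match the slide's URLs.
    """
    return f"{CLASSIFIER_ENDPOINT}?text={quote(text, safe='!')}"


url = classifier_url("Ugh! I got food poisoning from McDonalds on Halstead")
print(url)
```

Because each request carries everything the classifier needs, the service stays stateless and can sit behind any number of identical instances.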
  • 20. Use Cases • Disease Tracker • Large scale attempt to track disease occurrences in the United States. • Sponsored by the Dept. of HHS • Approximately 1 million tweets a day (cold, flu) classified in real time • EC2 scalable instances • Geolocation • Cost to run for 6 months: $850
  • 21. Future Directions • Turnkey service • Can all this functionality be abstracted down to a pushbutton service? • Open data • Can you advertise the data collected, how you enriched it, and allow others to come along and enrich it as well? • General purpose bridge between Twitter and issue tracking databases • Big industry problem
  • 22. Github Sources • Tweet Collector • https://github.com/smartchicago/TweetCollector • Classifier Code • https://github.com/corynissen/foodborne_classifier