Utilizing Crowd-sourced
Data for Knowledge
Extraction
A Themed Report of SemTechBiz San Francisco 2013.06
Summary
Presenters made good use of external data to help
extract knowledge and build models
These valuable data are generated by crowds but
harvested by mining algorithms and/or UI tools
Linked Open Data (LOD) to enrich attributes and synonyms (WalmartLabs)
NLP on recipes to build deep models (Whisk.com)
Webmaster tools to mark up content (Google)
SemTechBiz SF 13
SemTechBiz 2013 in San Francisco is still the largest conference in the
world on semantic web technologies
With many newcomers from various industries
An indicator that the technologies are entering prime time
Up to 7 parallel talks: broad coverage and interests
Now a 2nd-tier conference, in my humble opinion
Diluted across 3 events per year: US West, US East, EU
Attendees: 1200 in 2011, 800 in 2012, 600 in 2013
Now missing elite researchers and/or top executives
More practical, real-world, business, and startup focus; less academic
Context and Scope
This is a themed report on building knowledge bases
and/or semantic models
The theme title was decided after the conference because of the
obvious similarity among the relevant presentations
@WalmartLabs
Using heterogeneous data
Connect People and Products
@WalmartLabs
• “Red Shirt”: Color search and presentation: WordNet!
• “Green Lantern”: Intent? Linked Data can help, on related products too.
• “Dark Knight”: DVD or Halloween costume? Time/news is thy friend.
External Data by @WalmartLabs
Vast number of external data sets: WordNet, DBpedia, LOD
cloud, Twitter stream, third-party prices (crawled), product
descriptions, user click streams (web logs)…
appcrawlr
TipSense Technologies
A platform for pulling statistically significant
knowledge from unstructured semantic data sets
Transforming vast amounts of unstructured and
semi-structured content into a fully annotated
conceptual model.
Conceptual entity recognition
Contextualized content fingerprinting
Concepts/topic model, sentiment analysis
Whisk.com
Keynote: Understanding Recipes
UK startup Whisk.com (@nickholzherr) on collecting
recipe ingredients, enriching them with
semantics, recommending dishes, and helping users order
from stores.
Wrapper induction, NLP for data collection
Coping with missing info, noise, vague data
Model flavor profiles, portion changing
Challenges and opportunities
Leftovers, geo-data, local shopping, coupons…
BloomReach.Search
Understanding Intents
Entity, Relationship Mining
Built database of millions of concepts
Shallow ontology modeling via entity and attribute
extraction/mining
Rich semantics (units, colors, patterns, cities…)
Concept propagation (tagging by training on user
weblogs)
Product Annotation
Network of Concepts
Google Webmaster
Tools: Markup
Structured Data
Structured Data Markup
Not something entirely new: Rich Snippets
We experimented with it 2 years ago (extension of
Semantic Job Search proposal)
Supporting more types now
An ecosystem no one can afford to lose
Google leveraged the SEO utility to gain more
structured data (free labor)
Others
Gannett (News)
Use a combination of auto-tagging and rules to match news
articles with an evolving taxonomy (low-tech, but it works)
ISS (Intelligent Software Solutions)
Complex Event Processing (in “expressive” language)
Fuzzy pattern matching with Bayesian networks
Semantic Search and Automatic question answering
Google now answers (factoid questions)
E.g. “What did Steve Jobs die?”, “What is the height of Mt.
Everest”, “Who is the CEO of Apple?”
Closely Related to
Knowledge Acquisition
Similar Underlying Use Cases, Datasets and
Technologies
Query Interpretation
@SemTechBiz
“Red Shirt”
Shirt (Red)
Red ~= crimson, scarlet, ruby, cherry, rose, …
Is a T-shirt a Shirt?
@ProjectHalo
“Dead Duck”
Bird (dead)
Dead ~= not alive, gone, expired, killed, …
Is a Beijing Duck a Duck?
Build structured queries from natural language
Disambiguation, query expansion
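The “Red Shirt” example above can be sketched in a few lines of Python. This is only an illustration, not any presenter's implementation: the tables `COLOR_SYNONYMS` and `PRODUCT_TYPES` and the function `interpret_query` are hypothetical, and a real system would presumably draw its synonyms from a resource such as WordNet.

```python
# Hypothetical lookup tables (assumptions for illustration only).
# A real system would load these from WordNet or a product taxonomy.
COLOR_SYNONYMS = {
    "red": ["crimson", "scarlet", "ruby", "cherry", "rose"],
}
PRODUCT_TYPES = {
    "shirt": ["shirt", "t-shirt"],
}


def interpret_query(text):
    """Turn a free-text query into a product type plus expanded color terms."""
    query = {"type": None, "type_terms": [], "color_terms": []}
    for token in text.lower().split():
        if token in PRODUCT_TYPES:
            # The token names a product type; record its accepted variants.
            query["type"] = token
            query["type_terms"] = list(PRODUCT_TYPES[token])
        elif token in COLOR_SYNONYMS:
            # Query expansion: keep the original term and add its synonyms.
            query["color_terms"] = [token] + COLOR_SYNONYMS[token]
    return query


structured = interpret_query("Red Shirt")
print(structured)
```

Treating “t-shirt” as a variant of “shirt” is one possible answer to the “Is a T-shirt a Shirt?” question; whether that is right depends on the catalog.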
Intent & Process
@ SemTechBiz
“Eco-friendly gift for dad”
Need products as gifts
Related to “dad”, “father”
Expand “eco-friendly” to closely related concepts
Weigh purchases/views during special events (Christmas, Father’s Day)*
@ Project Halo
“How do we feel the sense of heat?”
Need sentences on feeling
Related to “heat/hot”
Expand “heat”, “sense” to related concepts
Weigh signal transmission in neurons*
The process of getting something done
* Learned from past user activities
Abstract Concept
Concrete Instances
@ SemTechBiz
“Eco-friendly” (gift)
Mine related product
review sites and blogs
~= Organic, Recycled, Solar, Reclaimed, …
@ Project Halo
“Feeling” (heat)
Mine related biological
sites, books, tutorials
~= Sense, Experience, Feel, Temperature Sensation, …
Build abstract concept, entity, instance
networks/graphs
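One simple way to link an abstract concept such as “eco-friendly” to concrete terms is to count which candidate terms co-occur with it in mined text. The sketch below assumes a tiny invented corpus and a hypothetical helper `related_terms`; it uses plain substring matching and is not the method any presenter described.

```python
from collections import Counter


def related_terms(documents, concept, vocabulary, top_n=3):
    """Return the vocabulary terms that most often co-occur with `concept`."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        if concept in text:
            # Count each candidate term at most once per document.
            for term in vocabulary:
                if term in text:
                    counts[term] += 1
    return [term for term, _ in counts.most_common(top_n)]


# Invented review snippets, for illustration only.
reviews = [
    "An eco-friendly bottle made from recycled plastic",
    "Eco-friendly charger with a solar panel, recycled packaging",
    "Organic cotton towel, an eco-friendly gift",
    "Leather wallet with gift box",
]
vocab = ["organic", "recycled", "solar", "reclaimed", "leather"]

top = related_terms(reviews, "eco-friendly", vocab)
print(top)
```

The resulting concept-to-term pairs could serve as edges in the concept/instance graph; a production system would need tokenization and some significance test rather than raw counts.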
Ranking Support
@ SemTechBiz2013
Products related to “Gift”
Recipes for “Sweet
Seafood”
Apps that are “Free, Pretty
and Fun”
@ Project Halo
Concepts related to “Feel”
Sentences on “Red
Producer”
Creatures that can be “both
a prey and a predator”
Scoring algorithm to return the
most relevant results
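A minimal sketch of such a scoring step, using the “Free, Pretty and Fun” apps example: each item is scored by how many query terms appear among its tags, optionally weighted. The app names, the tags, and the `score` and `rank` functions are all hypothetical; the slides do not say which scoring algorithm the presenters used.

```python
def score(item_tags, query_terms, weights=None):
    """Sum the weights of the query terms found in the item's tags."""
    weights = weights or {}
    return sum(weights.get(term, 1.0) for term in query_terms if term in item_tags)


def rank(items, query_terms, weights=None):
    """Return item names with a positive score, best first (ties by name)."""
    scored = [(score(tags, query_terms, weights), name)
              for name, tags in items.items()]
    scored = [(s, name) for s, name in scored if s > 0]
    return [name for s, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Invented app catalog, for illustration only.
apps = {
    "PuzzleBox": {"free", "fun"},
    "PhotoGlow": {"free", "pretty", "fun"},
    "TaxHelper": {"paid"},
}

ranking = rank(apps, ["free", "pretty", "fun"])
print(ranking)
```

The optional `weights` argument is where signals learned from past user activity, such as purchases around a special event, could be plugged in.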
Modeling
@ SemTechBiz2013
“Flavor” model (Whisk)
“Special Occasion” learning
(BloomReach)
“Cooking” process
(ingredients, portion, leftovers, purchase…)
@ Project Halo
“Function” model in AURA
“Neural signal
transmission”
“Mitosis” event
(steps, components, temporal process, result…)
From Facts, Relations to
Causal and Deep Models
Crowd-sourcing
@ SemTechBiz2013
Use webmasters to
generate structured
markups
(Author, Category, Title, Price, Rating, …)
@ Project Halo
Use students to generate
metadata for
sentences, questions and
answers
(Relevance, UT, Type, Chapter, Exact/Various, …)
Crowd-sourcing works if the task is limited in
quantity and can be done cheaply
Google provides another utility (SEO incentives) to lure webmasters
Project Halo needs to figure out its game plan
Summary of Use of
(Big, Wild) Data
@SemTech
Parse vague user queries into the best structured queries for databases
Understand the user’s underlying intent
Link concept entities to concrete entities
Rank apps, products …
Deep, contextual models (flavor, time and location…)
Use crowds directly for free
@ProjectHalo
Translate Find-A-Value and other simple questions into complex IR queries
Understand a sentence’s purpose
Relate category/class to instances
Rank answers, evidence…
Deep contextual models (location, process, events…)
Need to leverage crowds cheaply
Many Different Data
Sources and Techniques
One Thing in Common
What Can We Learn?
