Freebase
A socially managed semantic database



Jamie Taylor
SemTech 2010 Data Camp
Freebase has Many Types of Things
12 Million Topics
A Multiplicity of Strong Identifiers

http://rdf.freebase.com/ns/en.berlin_wall

http://www.ellerdale.com/topics/view/0080-6ba0

http://www.bbc.co.uk/music/artists/7f347782-eb14-40c3-98e2-17b6e1bfe56c
http://musicbrainz.org/artist/7f347782-eb14-40c3-98e2-17b6e1bfe56c
http://rdf.freebase.com/ns/authority.musicbrainz.7f347782-eb14-40c3-98e2-17b6e1bfe56c
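The two rdf.freebase.com URIs above suggest a simple relationship between a slash-separated Freebase ID (the form used in the MQL slides, such as /en/george_lucas) and its RDF URI: drop the leading slash and turn the remaining slashes into dots. A minimal sketch of that mapping follows. The function name is hypothetical, and the rule is inferred from these examples rather than stated on the slide.

```python
# Namespace prefix taken from the URIs shown on the slide.
RDF_NS = "http://rdf.freebase.com/ns/"


def id_to_rdf_uri(freebase_id: str) -> str:
    """Map a slash-separated Freebase ID to an RDF URI.

    Assumption: the mapping is "strip the leading slash, replace '/' with '.'",
    as inferred from the example URIs. This helper is illustrative only.
    """
    if not freebase_id.startswith("/"):
        raise ValueError("expected an ID starting with '/': %r" % freebase_id)
    return RDF_NS + freebase_id[1:].replace("/", ".")


berlin_wall_uri = id_to_rdf_uri("/en/berlin_wall")
artist_uri = id_to_rdf_uri(
    "/authority/musicbrainz/7f347782-eb14-40c3-98e2-17b6e1bfe56c"
)
print(berlin_wall_uri)
print(artist_uri)
```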
Relations
400 Million
[Figure: graph of topics connected by labeled relations: contains, contained-by, event, label, albums, member-of, nationality, education]
What’s in Freebase?
http://www.bestbuy.com/site/She+Wolf…
http://www.daylife.com/topic/Shakira
http://twitter.com/shakira
http://www.facebook.com/shakira
http://www.myspace.com/shakira
http://www.last.fm/music/Shakira
http://www.netflix.com/RoleDisplay/Shakira/20046629
http://www.guardian.co.uk/music/shakira
99% pure

All data undergoes rigorous QA before load
Major focus is reconciliation
Use sampling to assure 99% accuracy
Data that does not meet 99% accuracy is not loaded
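The slide does not say how the sample is turned into a load decision. One way such a gate could work is sketched below: hand-check a random sample, then require that a conservative lower bound on the true accuracy clears 99%. The Wilson score bound, the function names, and the 95% confidence level are all assumptions chosen for illustration, not Freebase's documented procedure.

```python
import math


def wilson_lower_bound(correct: int, sampled: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a proportion.

    z = 1.96 corresponds to roughly 95% confidence (an assumed choice).
    """
    if sampled <= 0:
        raise ValueError("sample must contain at least one record")
    p = correct / sampled
    z2 = z * z
    centre = p + z2 / (2 * sampled)
    margin = z * math.sqrt(p * (1 - p) / sampled + z2 / (4 * sampled * sampled))
    return (centre - margin) / (1 + z2 / sampled)


def passes_quality_gate(correct: int, sampled: int, threshold: float = 0.99) -> bool:
    """Load the data set only if the lower bound meets the threshold."""
    return wilson_lower_bound(correct, sampled) >= threshold


# A flawless sample of 100 is too small to support a 99% claim,
# while a flawless sample of 1000 is large enough.
small_sample_ok = passes_quality_gate(100, 100)
large_sample_ok = passes_quality_gate(1000, 1000)
borderline_ok = passes_quality_gate(990, 1000)
print(small_sample_ok, large_sample_ok, borderline_ok)
```

The point of using a lower bound rather than the raw sample accuracy is that a small sample with no errors still says little about the full data set.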
What's been built on Freebase?
Up to 100,000 Queries a Day




 Quarterly dumps of graph
    http://download.freebase.com
Users contribute data




Users extend the data model
The Freebase Commons
                      Top-level domains
                      ·American football       ·Internet
                      ·Anime/Manga             ·Language
                      ·Architecture            ·Law
                      ·Astronomy               ·Library
                      ·Automotive              ·Location
                      ·Aviation                ·Martial Arts
                      ·Awards                  ·Measurement Unit
                      ·Baseball                ·Media Common
                      ·Basketball              ·Medicine
                      ·Bicycles                ·Metaweb Types
                      ·Biology                 ·Meteorology
                      ·Boats                   ·Military
                      ·Broadcast               ·Music
                      ·Business                ·Olympics
                      ·Celebrities             ·Opera
                      ·Chemistry               ·Organization
                      ·Comics                  ·People
                      ·Common                  ·Geography
                      ·Computers               ·Projects
                      ·Conferences             ·Protected Places
                      ·Cricket                 ·Publishing
                      ·Data World              ·Radio
                      ·Digicams                ·Rail
                      ·Education               ·Religion
                      ·Engineering             ·Royalty
                      ·Event                   ·Soccer
                      ·Clothing and Textiles   ·Spaceflight
                      ·Fictional Universes     ·Sports
                      ·Film                    ·Symbols
                      ·Food & Drink            ·Tennis
                      ·Freebase                ·Theater
                      ·Games                   ·Time
                      ·Geology                 ·Transportation
                      ·Government              ·Travel
                      ·Hobbies and Interests   ·TV
                      ·Ice Hockey              ·Video Games
                      ·Influence               ·Visual Art

schema = vocabulary
The Scope of Schema
   10,448 Properties
      describing
     4,936 Types*
     organized into
     641 Domains
     (77 Commons)
            *types with 10 or more instances
Strength through Exemplars
[Chart: Type Instances. Y axis: Instances (log scale, 1 to 100,000,000); X axis: Rank (0 to 11,000). Annotations: 4936 types with >10 instances; 1424 Commons.]
Metaweb Query Language
      [{
           "name" : null,
           "type" : "/film/film"
      }]




               MQL
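An MQL query is plain JSON, so it can be built as an ordinary data structure and serialized into a request. The sketch below wraps the query above in an envelope and encodes it into a URL without sending anything. The endpoint address and the {"query": ...} envelope are assumptions recalled from the mqlread service of that period and do not appear on the slide.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint; not shown on the slide.
MQLREAD_URL = "http://api.freebase.com/api/service/mqlread"


def mqlread_url(query) -> str:
    """Serialize an MQL query into a GET URL (no request is made)."""
    envelope = {"query": query}  # assumed envelope shape
    return MQLREAD_URL + "?" + urlencode({"query": json.dumps(envelope)})


# Python's None serializes to JSON null, which in MQL means
# "fill in this value for me".
films_query = [{"name": None, "type": "/film/film"}]
url = mqlread_url(films_query)

# Round-trip check: decode the URL back into the envelope.
decoded = json.loads(parse_qs(urlparse(url).query)["query"][0])
print(url)
print(decoded)
```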
[{
     "name" : null,
     "type" : "/film/film",
     "directed_by": {"id": "/en/george_lucas"},
     "starring": [{
          "actor": {"id": "/en/harrison_ford"}
     }]
}]




                      MQL
[{
      "name" : null,
      "type" : "/film/film",
      "directed_by": {"id": "/en/george_lucas"},
      "starring": [{
          "actor": {
              "name": null,
              "film": [{
                  "film": {"id": "/en/the_great_escape"}
              }]
          }
      }]
}]


Result: THX 1138 (actor: Donald Pleasence)
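The nested query above works by example: concrete values are constraints, null asks for a value to be filled in, and a [{...}] pattern matches any element of a multi-valued property. A toy matcher over hand-built Python dictionaries can illustrate that reading. Everything here is a simplified assumption: the matcher is not how Freebase evaluates MQL, and the tiny data set (including IDs such as /en/thx_1138 and /en/donald_pleasence) is made up for the example.

```python
def match(query, node):
    """Return a filled-in copy of `query` if `node` satisfies it, else None."""
    if query is None:
        return node  # null: report whatever value is present
    if isinstance(query, list):
        # [{...}]: keep each element that fits the pattern; need at least one
        pattern = query[0]
        items = node if isinstance(node, list) else [node]
        hits = [r for r in (match(pattern, item) for item in items) if r is not None]
        return hits or None
    if isinstance(query, dict):
        if not isinstance(node, dict):
            return None
        result = {}
        for key, sub_query in query.items():
            if key not in node:
                return None
            sub_result = match(sub_query, node[key])
            if sub_result is None:
                return None
            result[key] = sub_result
        return result
    return query if query == node else None  # scalar: exact-match constraint


# Hypothetical in-memory data, not a Freebase extract.
pleasence = {"id": "/en/donald_pleasence", "name": "Donald Pleasence",
             "film": [{"film": {"id": "/en/the_great_escape"}},
                      {"film": {"id": "/en/thx_1138"}}]}
ford = {"id": "/en/harrison_ford", "name": "Harrison Ford",
        "film": [{"film": {"id": "/en/american_graffiti"}}]}
films = [
    {"id": "/en/thx_1138", "name": "THX 1138", "type": "/film/film",
     "directed_by": {"id": "/en/george_lucas"},
     "starring": [{"actor": pleasence}]},
    {"id": "/en/american_graffiti", "name": "American Graffiti", "type": "/film/film",
     "directed_by": {"id": "/en/george_lucas"},
     "starring": [{"actor": ford}]},
]

# The query from the slide, written as a Python structure.
query = [{
    "name": None,
    "type": "/film/film",
    "directed_by": {"id": "/en/george_lucas"},
    "starring": [{
        "actor": {
            "name": None,
            "film": [{"film": {"id": "/en/the_great_escape"}}],
        }
    }],
}]

results = match(query, films)
for film in results:
    print(film["name"], "/", film["starring"][0]["actor"]["name"])
```

On this toy data only the film whose cast includes an actor from The Great Escape survives the match.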
Freebase Suggest
Reconciliation
{
    "/type/object/name": "Blade Runner",
    "/type/object/type": "/film/film",
    "/film/film/starring/actor": ["Harrison Ford", "Rutger Hauer"],
    "/film/film/director": "Ridley Scott",
    "/film/film/release_date_s": "1981"
}

[{
    "id": "/guid/9202a8c04000641f8000000000009e89",
    "name": ["Blade Runner", "Bladerunner"],
    "score": 1.4320519,
    "match": true,
    "type": ["/common/topic", "/film/film", "/media_common/adapted_work", "/award/award_winning_work"]
},
{
    "id": "/guid/9202a8c04000641f80000000002643d0",
    "name": ["Blade"],
    "score": 0.48852453,
    "match": false,
    "type": ["/common/topic", "/film/film", "/award/award_winning_work", "/award/award_nominated_work"]
}]

               http://data.labs.freebase.com/recon/
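A client consuming a reconciliation response like the one above has to decide which candidate, if any, to accept. The sketch below picks the highest-scoring candidate flagged as a match. It assumes "match" marks candidates the service considers confident and that higher "score" is better; both readings are inferred from the field names. The function name is hypothetical and the type lists are shortened.

```python
import json

# Candidate list modeled on the slide's response (type lists shortened).
RESPONSE = """
[
  {"id": "/guid/9202a8c04000641f8000000000009e89",
   "name": ["Blade Runner", "Bladerunner"],
   "score": 1.4320519, "match": true,
   "type": ["/common/topic", "/film/film"]},
  {"id": "/guid/9202a8c04000641f80000000002643d0",
   "name": ["Blade"],
   "score": 0.48852453, "match": false,
   "type": ["/common/topic", "/film/film"]}
]
"""


def best_match(candidates):
    """Return the top-scoring candidate with match == true, or None."""
    confident = [c for c in candidates if c.get("match")]
    return max(confident, key=lambda c: c["score"], default=None)


candidates = json.loads(RESPONSE)
chosen = best_match(candidates)
print(chosen["id"] if chosen else "no confident match; needs human review")
```

Returning None when nothing is flagged leaves room for a manual review step instead of silently accepting a weak candidate.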
Topic Blocks
Topic API
         Shortcut to building Topic displays
         Two forms:
             basic (names, types, description)
             standard (basic + keys, properties)




http://www.freebase.com/experimental/topic/standard?id=/en/ncis
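The example URL above can be generalized into a small helper that builds a Topic API address for either form. This is a sketch under assumptions: only the "standard" URL appears on the slide, so the "basic" path is a guess by analogy, and the function name is hypothetical. No request is made.

```python
from urllib.parse import quote

TOPIC_BASE = "http://www.freebase.com/experimental/topic/"
FORMS = ("basic", "standard")  # "basic" path assumed by analogy with "standard"


def topic_url(topic_id: str, form: str = "standard") -> str:
    """Build a Topic API URL for a Freebase ID such as /en/ncis."""
    if form not in FORMS:
        raise ValueError("form must be one of %s" % (FORMS,))
    # Keep '/' unescaped so the ID reads the same as on the slide.
    return "%s%s?id=%s" % (TOPIC_BASE, form, quote(topic_id, safe="/"))


standard_url = topic_url("/en/ncis")
basic_url = topic_url("/en/ncis", form="basic")
print(standard_url)
print(basic_url)
```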
Geo Search API



Semantic              Spatial              Semantic




      http://www.freebase.com/docs/geosearch
Gridworks
Acre Development Environment
Getting Started++
•   Freebase Documentation Hub
    •   http://www.freebase.com/docs
•   Developer Mailing List
    •   http://lists.freebase.com/mailman/listinfo/freebase-discuss
    •   http://freebase.markmail.org
•   Real Time help on IRC
    •   Freenode #freebase
•   Freebase Happenings
    •   http://blog.freebase.com
•   About the Graph Store
    •   Google: "ACM SIGMOD schema last tuple store"

Freebase - Semantic Technologies 2010 Code Camp
