Social Geography &
          Wikipedia
        a quick overview

      Maurizio Napolitano
(SoNet internal research meeting)

         FBK 27/08/2010
SoNet Research Meetings

These slides were used for an internal presentation of the
  SoNet group.
Every week, one member of the SoNet group presents a
  research paper to the other members. The papers mentioned
  here were therefore written by other researchers.
Being internal presentations, these slides might be a bit
  rough and unpolished.
You can find more information (including this
  presentation) about the SoNet group at
  http://sonet.fbk.eu
Summary

• Introduction: the wikification of GIS
• Wikipedia and geodata
• Some research questions
      –  You Are Where You Edit:
      Locating Wikipedia Contributors Through Edit
         Histories
      – Spatiotemporal Mapping of Wikipedia
         Concepts
Introduction
Sui, D.Z. The wikification of GIS and its
  consequences: or Angelina Jolie's new tattoo
  and the future of GIS. Comp. Env. Urb. Sys.
  2008, 32, 1-5.
The wikifications

• GIS has changed
  – Better hardware → easy management
  – Data production → crowdsourcing projects
    (WikiMapia, OpenStreetMap, Mapufacture,
    GeoCommons, TierraWiki, FixMyStreet, WhoIsSick
    … ) and GeoTag
  – People → Organizations

  … NEOGEOGRAPHY ...
Wikipedia and geodata – applications (1/2)
• Space-time exploration
• Space-time selection
• Space-Wikipedia relationship exploration
• Space-Wikipedia relationship selection


Hecht, B.; Rohs, M.; Schöning, J.; and Krüger, A. 2007. WikEye - using magic
lenses to explore georeferenced Wikipedia content. In Proc. of the 3rd
International Workshop on Pervasive Mobile Interaction Devices.
Wikipedia and geodata – applications (2/2)

Spatial feature-edge-feature relationships in Wikipedia
Berlin article temporal reference profile
Wikipedia and geodata
Wikipedia geopages – the infobox




      {{Infobox Settlement
      …
      |latd = 37 |latm = 18 |lats = 15 |latNS = N
      |longd = 121 |longm = 52 |longs = 22 |longEW = W
      …
      }}




You Are Where You Edit - Michael D. Lieberman - ICWSM 2009, San Jose, CA
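The infobox stores the coordinates as degrees, minutes, seconds and a hemisphere letter. A minimal sketch of the conversion to signed decimal degrees follows; the function name `dms_to_decimal` is illustrative and not taken from the paper.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # By the usual convention, South and West are negative.
    return -value if hemisphere.upper() in ("S", "W") else value


# Values from the infobox fragment above.
lat = dms_to_decimal(37, 18, 15, "N")
lon = dms_to_decimal(121, 52, 22, "W")
print(round(lat, 3), round(lon, 3))  # 37.304 -121.873
```

The result agrees with the San Jose, CA point (37.304, -121.873) quoted later in the deck.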
Wikipedia geopages – problem and solution
Problem: must process page Wiki markup to identify
  geographic templates and extract coordinates
  – Wiki markup language continually evolves
  – Geographic templates continually evolve
  – Over 20 distinct template forms at this time for different
    coordinate systems and feature types
Solution: DBpedia
  – Public ontology derived from Wikipedia, including
    extracted geographic coordinates
  – Amounts to a primitive gazetteer of geographic
    entities in Wikipedia
Features With Extent

All geopages are tagged with a single lat/lon point
Tradeoff between simplicity and accuracy
Examples: Country or state → center or capital city,
Road → midpoint, River → source
Want to distinguish these features, as tagged point may
be geographically distant from other contributor edits
In Wikipedia, more precise coordinates generally
indicate smaller extent
California: (37, -120)
San Jose, CA: (37.304, -121.873)
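One way to act on this observation is to look at how many decimal places a coordinate carries in the markup. The sketch below is an assumption for illustration only: the slides do not say how the paper detects features with extent, and the threshold `max_places` is hypothetical.

```python
def decimal_places(coord_text):
    """Count the digits after the decimal point in a coordinate as written."""
    _, _, fraction = coord_text.strip().partition(".")
    return len(fraction)


def likely_has_extent(lat_text, lon_text, max_places=0):
    # Hypothetical rule: if neither coordinate has more than `max_places`
    # decimals, treat the page as a feature with extent.
    places = max(decimal_places(lat_text), decimal_places(lon_text))
    return places <= max_places


print(likely_has_extent("37", "-120"))          # California -> True
print(likely_has_extent("37.304", "-121.873"))  # San Jose, CA -> False
```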
Some research questions

          1. You Are Where You Edit
2. Spatiotemporal Mapping of Wikipedia Concepts
You Are Where You Edit:
     Locating Wikipedia Contributors Through Edit
                       Histories
• Contributors tend to add what they know and self-
  organize into groups based on interest
  – Can contributors be further categorized based on their edits
    to geographic pages? (= geopages)


• Identify Wikipedia contributors who:
  – Edit geopages in a constrained geographic area
  – Mostly edit one or two “pet” geopages
  – Identify reasons for the above patterns
      Worked only on the English Wikipedia version
                8 Oct 2008 - 61.7GB of data.
Wikipedia/DBpedia dump statistics.
  Stat      Type    Class       Total       Geo        Geo%
  pages                         14915993    328393     2.2%
  contrib   both                16235895    2011828    12.4%
            anon                13795118    1655135    12.0%
            named               2440777     356693     14.6%
  edits     both                224473397   15341937   6.8%
            anon                55571407    4519807    8.1%
            named   both        168901990   10822130   6.4%
                    non-minor   114844836   6357558    5.5%
                    minor       54057154    4464572    8.3%




A considerable number of pages (~330k) are tagged with geographic coordinates
Basic Observations
• Named contributors are outnumbered by
  anonymous ones by about 5 to 1, but are
  responsible for 2–3 times as many geopage
  edits
• A nontrivial number of named contributors have
  made at least one non-minor edit to a geopage
  (14.6%)
• Most edits to geopages are non-minor edits
  (58.7%)
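The figures in these observations can be re-derived from the dump statistics table. The short check below uses only numbers from that table; the variable names are illustrative.

```python
# Counts copied from the Wikipedia/DBpedia dump statistics table.
anon_contrib, named_contrib = 13795118, 2440777
named_geo_contrib = 356693
anon_geo_edits, named_geo_edits = 4519807, 10822130
named_geo_nonminor_edits = 6357558

# Anonymous contributors per named contributor ("about 5 to 1").
contrib_ratio = anon_contrib / named_contrib
# Geopage edits by named contributors per anonymous geopage edit ("2-3 times").
geo_edit_ratio = named_geo_edits / anon_geo_edits
# Share of named contributors who edited a geopage.
named_geo_share = named_geo_contrib / named_contrib
# Share of named geopage edits that are non-minor.
nonminor_share = named_geo_nonminor_edits / named_geo_edits

print(f"{contrib_ratio:.2f} {geo_edit_ratio:.2f}")
print(f"{named_geo_share:.1%} {nonminor_share:.1%}")
```

Note that the 58.7% figure is the non-minor share of geopage edits made by named contributors, since anonymous edits cannot be marked minor.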
Geopages Country distribution
• Vast majority of geopages tagged to the US and Europe
• Possibly reflects the geographic distribution of
  contributors to the English Wikipedia

  Country   Count
  USA       83871
  France    37730
  UK        26651
  Poland    16050
  Germany   15939
  Russia    10964
  Canada    8970
  Italy     8772
  Spain     6603
  India     5683

Wikipedia geocoverage -
    en.wikipedia.org
Wikipedia Edit Histories

• Easily-parsed XML format
• Information saved for each edit:
  – Username (or IP address, if anonymous)
  – Timestamp
  – Whether edit is “minor” (spelling, formatting)
• Excluded anonymous edits
  – Not allowed to be marked minor, to avoid abuse
  – Most Wikipedia vandalism perpetrated anonymously
• Also excluded minor edits
  – Geopages tend to have mostly non-minor edits
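The filtering described above can be sketched with the standard library XML parser. The sample below follows the general shape of a MediaWiki page export (`<revision>`, `<contributor>` with either `<username>` or `<ip>`, and an empty `<minor/>` element); it is a simplified assumption, since real dumps also declare an XML namespace and carry many more fields, and the user name and IP address are made up.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical page history (no namespace, invented contributors).
SAMPLE = """
<page>
  <title>San Jose, California</title>
  <revision>
    <timestamp>2008-10-01T12:00:00Z</timestamp>
    <contributor><username>ExampleUser</username><id>1</id></contributor>
  </revision>
  <revision>
    <timestamp>2008-10-02T08:30:00Z</timestamp>
    <contributor><ip>192.0.2.1</ip></contributor>
  </revision>
  <revision>
    <timestamp>2008-10-03T09:15:00Z</timestamp>
    <contributor><username>ExampleUser</username><id>1</id></contributor>
    <minor/>
  </revision>
</page>
"""


def named_nonminor_edits(page_xml):
    """Return (username, timestamp) for edits that are neither anonymous nor minor."""
    page = ET.fromstring(page_xml.strip())
    edits = []
    for rev in page.findall("revision"):
        username = rev.findtext("contributor/username")
        if username is None:
            continue  # anonymous edit: only an <ip> element is present
        if rev.find("minor") is not None:
            continue  # edit flagged as minor
        edits.append((username, rev.findtext("timestamp")))
    return edits


print(named_nonminor_edits(SAMPLE))
```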
Sample Edit Patterns
Identify edit area contributors
• Large number of edits to geopages
• Geopage edits constrained to a small area
• At least K edited geopages
• Area α of convex hull of edited geopage coordinates
  smaller than A (edit area)
K = 3 and A = 1 deg² ≈ 112 x 112 km

Of 356693 contributors with at least one edit to a geopage,
only 102271 (28.7%) have user pages. Also, for the 93195
contributors with at least five edits to geopages, only
47623 (51.1%) have user pages.
Accounting For Outliers
•   Local edit patterns may be muddled
    by “outlier” edits
•   For each contributor, select a fraction
    F of edited geopages with smallest
    convex hull area


•   Simple approximation scheme:
    1. For each geopage P:
       a. Sort edited geopages by distance from P
       b. Compute convex hull H_P of first F geopages
    2. Select H_P with smallest area α
•   Example: 71 deg² - 10 deg²
    (5k x 5k mi - 112 x 112km)
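The edit-area test (at least K geopages, hull area below A) and the approximation scheme above can be sketched together as follows. This is an illustration under stated assumptions, not the paper's implementation: coordinates are treated as planar (lon, lat) pairs in degrees, distance is plain Euclidean distance in degrees, and the function names are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area(points):
    """Area of the convex hull (shoelace formula), in squared degrees."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    total = 0.0
    for i, (x1, y1) in enumerate(hull):
        x2, y2 = hull[(i + 1) % len(hull)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def min_edit_area(coords, fraction):
    """For each geopage P, hull the nearest `fraction` of geopages; keep the smallest area."""
    pts = list(set(coords))
    keep = max(1, math.ceil(fraction * len(pts)))
    areas = []
    for p in pts:
        nearest = sorted(pts, key=lambda q: math.dist(p, q))[:keep]
        areas.append(hull_area(nearest))
    return min(areas)


def is_local(coords, k=3, max_area=1.0, fraction=1.0):
    """True if at least k distinct geopages fit in an edit area below max_area."""
    if len(set(coords)) < k:
        return False
    return min_edit_area(coords, fraction) < max_area


# Four nearby geopages plus one distant "outlier" edit (made-up coordinates).
edits = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (50.0, 50.0)]
print(is_local(edits, fraction=1.0))  # False: the outlier inflates the hull
print(is_local(edits, fraction=0.8))  # True: the outlier is dropped
```

Trying every geopage as the anchor P costs a sort per geopage, which is acceptable because most contributors edit only a handful of geopages.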
Contributor Locality
• Computed minimum edit area sizes for F = {95%, 80%}, both
  (a) with and (b) without features with extent
• 30–35% of contributors have edit areas smaller than 1 deg²
• Over 50% of contributors with fewer than 5 geopage edits
  are highly local
Pet Geopages
• Statistics for users with:
  – 5–20 edits (~93k)
  – over 20 edits (~28k)



• Over 50% of contributors
  with 5–20 edits, and 25%
  of contributors with over 20
  edits, have over 80% of
  geopage edits confined to
  two geopages
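The "pet geopage" measure can be illustrated by computing the share of a contributor's geopage edits that fall on their two most-edited pages. This is a minimal sketch with hypothetical names and made-up data; the 80% threshold is the one quoted above.

```python
from collections import Counter


def top_two_share(edited_pages):
    """Fraction of edits that fall on the two most-edited geopages."""
    if not edited_pages:
        return 0.0
    counts = Counter(edited_pages)
    top_two = sum(n for _, n in counts.most_common(2))
    return top_two / len(edited_pages)


def has_pet_geopages(edited_pages, threshold=0.8):
    return top_two_share(edited_pages) > threshold


# One list entry per edit: 6 edits to page A, 3 to page B, 1 to page C.
edits = ["A"] * 6 + ["B"] * 3 + ["C"]
print(top_two_share(edits), has_pet_geopages(edits))  # 0.9 True
```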
Reasons for Tight Edit Areas
Randomly selected 100
contributors with at least 10
edits to geopages and small
edit areas



  • Concurrently examined contributors’ user pages and the
    set of edited geopages to determine an interest

  • Contributors with small edit areas tend to have been born in,
    or to live in, the region defined by their edit areas
Some research questions

           1. You Are Where You Edit
2. Spatiotemporal Mapping of Wikipedia Concepts
The question
    “Where” and “when” are important implicit aspects
     of a wide variety of concepts.
    Wikipedia offers:
       – Geopages
       – Biographies (birth and death dates)
       – Temporal and spatial information concerning concepts
         (Romanticism, Scholasticism)


           HOW TO ASSOCIATE THIS INFORMATION?


The solution starts from DBpedia,
using the wiki pages common to these languages:
English, German, French, Italian, Spanish, Dutch and Portuguese.
Results about spatiotemporal
    mapping of Wikipedia concepts
Topics explored:
1. Concepts and geolocation
2. Biography and country
3. Cultural interaction between countries
4. Historical periods of literature and philosophy
   and related countries
Countries represented in Wikipedia
                geopages




Distribution of geotagged articles in different countries (log2 values are plotted).
Top 20 cities (geotagged)
               14,000 total




Top 20 cities by number of geotagged articles.
People by century




Distribution of Wikipedia person articles per century of lifespan.
Log2 values are shown.
The total number of century-people associations is higher than 423,846 because many persons are
associated with 2 centuries
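A person whose lifespan crosses a century boundary is counted once per century, which is why the associations outnumber the persons. The sketch below shows one plausible way to derive those associations; the exact rule used in the paper is not given in the slides, so this is an assumption, and it only handles CE years.

```python
def century_of(year):
    """Century number for a CE year, e.g. 1770 -> 18, 1800 -> 18, 1801 -> 19."""
    return (year - 1) // 100 + 1


def centuries_of_lifespan(birth_year, death_year):
    """All centuries touched by a lifespan, in order."""
    return list(range(century_of(birth_year), century_of(death_year) + 1))


print(centuries_of_lifespan(1770, 1827))  # [18, 19] -> two associations
print(centuries_of_lifespan(1901, 1976))  # [20]     -> one association
```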
Biography and countries




Distribution of Wikipedia biographies by nationality
Distribution of Wikipedia biographies
            by occupation
Cultural interactions




Top 5 sources (incoming) and destinations (outgoing) of cultural interactions
for 10 countries.
Statistics computed from the study of locations associated with persons present in Wikipedia.
The size of the displayed countries is proportional to the log2 of the country's score.
Top 5 cities and countries in philosophy
             for different periods




Top 5 cities and countries in philosophy for different periods.
Starting with the 1st century, five century slots were used.
Top 5 cities and countries in literature
 from the 15th century to nowadays.




        One-century time slots were used
Creative Commons
Attribution-ShareAlike 2.5
You are free:
     ●
         to copy, distribute, display, and perform the work
     ●
         to make derivative works
     ●
         to make commercial use of the work
Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or
licensor.
Share Alike. If you alter, transform, or build upon this work, you may distribute the
resulting work only under a license identical to this one.
   For any reuse or distribution, you must make clear to others the license terms of
this work.
  Any of these conditions can be waived if you get permission from the copyright
holder.
Your fair use and other rights are in no way affected by the above.
More info at http://creativecommons.org/licenses/by-sa/2.5/

          All the images come from the respective papers
Bibliography
Sui, D.Z. The wikification of GIS and its consequences: or Angelina Jolie's new tattoo
 and the future of GIS. Comp. Env. Urb. Sys. 2008, 32, 1-5.
http://geog.tamu.edu/~sui/publication/pub2008/SuiCEUSeditorial.pdf

Hecht, B.; Rohs, M.; Schöning, J.; and Krüger, A. 2007. WikEye - using magic lenses to
 explore georeferenced Wikipedia content. In Proc. of the 3rd International Workshop on
 Pervasive Mobile Interaction Devices
http://www.deutsche-telekom-laboratories.de/~rohs/papers/Hecht-WikEye.pdf

Lieberman, M.D. You Are Where You Edit: Locating Wikipedia Contributors Through Edit
 Histories. ICWSM 2009, San Jose, CA.
http://www.umiacs.umd.edu/~jimmylin/publications/Lieberman_Lin_ICWSM2009.pdf

Popescu, A.; Grefenstette, G. Spatiotemporal Mapping of Wikipedia Concepts.
 JCDL 2010, June 21-25, Brisbane, Australia.
http://portal.acm.org/ft_gateway.cfm?id=1816142&type=pdf

Social Geography & Wikipedia a quick overwiew

  • 1. Social Geography & Wikipedia a quick overwiew Maurizio Napolitano (SoNet internal research meeting) FBK 27/08/2010
  • 2. SoNet Research Meetings These slides were used for an internal presentation of the SoNet group. Every week, one member of the SoNet group presents a research papers to the other members. The mentioned paper(s) are hence written by other researchers. Being internal presentations, these slides might be a bit rough and unpolished. You can find more information (including this presentation) about the SoNet group at http://sonet.fbk.eu
  • 3. Summary • Introduction: the wikification of GIS • Wikipedia and geodata • Some research questions – You Are Where You Edit: Locating Wikipedia Contributors Through Edit Histories – Spatiotemporal Mapping of Wikipedia Concepts
  • 5. Introduction Sui, D.Z. The wikification of GIS and its consequences: or Angelina Jolie's new tattoo and the future of GIS. Comp. Env. Urb. Sys. 2008, 32, 1-5.
  • 6. The wikifications • The GIS has changed – Better hardware → easy management – Data production → Crowdsourcing project (WikiMapia, OpenStreetMap, Mapufacture, GeoCommons, TierraWiki, FixMyStreet, WhoIsSick … ) and GeoTag – People → Organizations … NEOGEOGRAPHY ...
  • 7. Wikipedia and geodata – applications (1/2) • Space-time exploration • Space-time selection • Space-Wikipedia relationship exploration • Space-Wikipedia relationship selection Hecht, B.; Rohs, M.; Schöning, J.; and Krüger, A. 2007. WikEye - using magic lenses to explore georeferenced Wikipedia content. In Proc. of the 3rd International Workshop on Pervasive Mobile Interaction Devices.
  • 8. Wikipedia and geodata – applications (2/2) Spatial feature-edge-feature relationships in Wikipedia Berlin article temporal reference profile
  • 10. Wikipedia geopages – the infobox {{Infobox Settlement … |latd = 37 |latm = 18 |lats = 15 |latNS = N |longd = 121 |longm = 52 |longs = 22 |longEW = W … }} You Are Where You Edit - Michael D. Lieberman - ICWSM 2009, San Jose, CA
  • 11. Wikipedia geopages – problem and solution Must process page Wiki markup to identify DBpedia geographic templates and – Public ontology derived extract coordinates from Wikipedia, including – Wiki markup language extracted geographic continually evolves coordinates – Geographic templates – Amounts to a primitive continually evolve gazetteer of geographic – Over 20 distinct template forms entities in Wikipedia at this time for different coordinate systems and feature types
  • 12. Features With Extent All geopages are tagged with a single lat/lon point Tradeoff between simplicity and accuracy Examples: Country or state  Center or capital city, Road  Midpoint, River  Source Want to distinguish these features, as tagged point may be geographically distant from other contributor edits In Wikipedia, more precise coordinates generally indicates smaller extent California: (37, -120) San Jose, CA: (37.304, -121.873)
  • 13. Some research questions 1. You Are Where You Edit 2. Spatiotemporal Mapping of Wikipedia Concepts
  • 14. You Are Where You Edit: Locating Wikipedia Contributors Through Edit Histories • Contributors tend to add what they know and self- organize into groups based on interest – Can contributors be further categorized based on their edits to geographic pages? (= geopages) • Identify Wikipedia contributors who: – Edit geopages in a constrained geographic area – Mostly edit one or two “pet” geopages – Identify reasons for the above patterns Worked only on the english Wikipedia version 8 Oct 2008 - 61.7GB of data. You Are Where You Edit - Michael D. Lieberman - ICWSM 2009, San Jose, CA
  • 15. Wikipedia/DBpedia dump statistics. Stat Type Class Total Geo Geo% pages 14915993 328393 2.2% contrib both 16235895 2011828 12.4% anon 13795118 1655135 12.0% named 2440777 356693 14,6% edits both 224473397 15341937 6.8% anon 55571407 4519807 8.1% named both 168901990 10822130 6.4% non minor 114844836 6357558 5.5% minor 54057154 4464572 8.3% A considerable number of pages (~330k) are tagged with geographic coordinates
  • 16. Basic Observations • Named contributors are outnumbered by anonymous ones by about 5 to 1, but are responsible for 2–3 times as many geopage edits • A nontrivial number of named contributors have made at least one non-minor edit to a geopage (14.6%) • Most edits to geopages are non-minor edits (58.7%)
  • 17. Geopages Country distribution Country Count • Vast majority of USA 83871 geopages tagged to France 37730 the US and Europe UK 26651 • Possibly reflects the Poland 16050 Germany 15939 geographic Russia 10964 distribution of Canada 8970 contributors to the Italy 8772 English Wikipedia Spain 6603 India 5683 You Are Where You Edit - Michael D. Lieberman - ICWSM 2009, San Jose, CA
  • 18. Wikipedia geocoverage - en.wikipedia.org
  • 19. Wikipedia Edit Histories • Easily-parsed XML format • Information saved for each edit: – Username (or IP address, if anonymous) – Timestamp – Whether edit is “minor” (spelling, formatting) • Excluded anonymous edits – Not allowed to be marked minor, to avoid abuse – Most Wikipedia vandalism perpetrated anonymously • Also excluded minor edits – Geopages tend to have mostly non-minor edits
  • 21. Indentify edit area contributors • Large number of edits to geopages • Geopage edits constrained to a small area • At least K edited geopages • Area α of convex hull of edited geopage coordinates smaller than A (edit area) Of 356693 contributors with at K = 3 and - A = 1 deg2 ≈ 112 x 112 km least one edit to a geopage, only 102271 (28.7%) have user pages. Also, for the 93195 contributors with at least five edits to geopages, only 47623 (51.1%) have user pages.
  • 22. Accounting For Outliers • Local edit patterns may be muddled by “outlier” edits • For each contributor, select a fraction F of edited geopages with smallest convex hull area • Simple approximation scheme: 1. For each geopage P: a. Sort edited geopages by distance from P b. Compute convex hull HP of first F geopages 2. Select HP with smallest area α • Example: 71 deg2 - 10 deg2 (5k x 5k mi - 112 x 112km)
  • 23. Contributor Locality Computed minimum edit area sizes for F = {95%, 80%}, both (a) with and (b) without features with extent 30–35% of contributors have edit areas smaller than 1 deg2 Over 50% of contributors with less than 5 geopage edits are highly local
  • 24. Pet Geopages • Statistics for users with: – 5–20 edits (~93k) – over 20 edits (~28k) • Over 50% of contributors with 5–20 edits, and 25% of contributors with over 20 edits, have over 80% of geopage edits confined to two geopages
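The "pet geopage" measure can be illustrated with a short Python sketch. It assumes each contributor's history is available as a list of edited geopage titles, one entry per edit. The function name and the sample data are hypothetical.

```python
from collections import Counter


def top_two_share(edited_pages):
    """Fraction of a contributor's geopage edits on their two most-edited pages."""
    if not edited_pages:
        raise ValueError("contributor has no geopage edits")
    counts = Counter(edited_pages)
    top_two = sum(n for _, n in counts.most_common(2))
    return top_two / len(edited_pages)


# Hypothetical contributor with 10 geopage edits across three pages.
edits = ["Trento"] * 6 + ["Rovereto"] * 3 + ["Bolzano"]
share = top_two_share(edits)
has_pet_geopages = share > 0.8  # the slide's "over 80%" threshold
print(share, has_pet_geopages)
```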
  • 25. Reasons for Tight Edit Areas • Randomly selected 100 contributors with at least 10 edits to geopages and small edit areas • Concurrently examined contributors’ user pages and the set of edited geopages to determine an interest • Contributors with small edit areas tend to have been born in, or to be living in, the region defined by their edit areas
  • 26. Some research questions 1. You Are Where You Edit 2. Spatiotemporal Mapping of Wikipedia Concepts
  • 27. The question “Where” and “when” are important implicit aspects of a wide variety of concepts. Wikipedia offers: – Geopages – Biographies (birth and death dates) – Temporal and spatial information concerning concepts (Romanticism, Scholasticism) HOW TO ASSOCIATE THIS INFORMATION? The solution starts from DBpedia, using the pages common to these languages: English, German, French, Italian, Spanish, Dutch and Portuguese.
  • 28. Results about spatiotemporal mapping of Wikipedia concepts Topics explored: 1. Concepts and geolocation 2. Biography and country 3. Cultural interaction between countries 4. Historical periods of literature and philosophy and related countries
  • 29. Countries represented in Wikipedia geopages Distribution of geotagged articles in different countries (log2 values are plotted).
  • 30. Top 20 cities (geotagged) 14,000 total Top 20 cities by number of geotagged articles.
  • 31. People by century Distribution of Wikipedia person articles per century of lifespan. Log2 values are shown. The total number of century-people associations is higher than 423,846 because many persons are associated with 2 centuries
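A small Python sketch shows why century-people associations outnumber persons. The rule used here is an assumption for illustration, and the paper's exact rule may differ: a person is associated with every century from birth year to death year (CE years only). The function name is hypothetical.

```python
import math
from collections import Counter


def centuries_of(birth_year, death_year):
    """Centuries (CE, 1-based) spanned by a lifespan, e.g. 1879-1955 -> [19, 20]."""
    first = (birth_year - 1) // 100 + 1
    last = (death_year - 1) // 100 + 1
    return list(range(first, last + 1))


# (birth year, death year) for a few well-known persons.
lifespans = {
    "Dante Alighieri": (1265, 1321),
    "Immanuel Kant": (1724, 1804),
    "Wolfgang Amadeus Mozart": (1756, 1791),
    "Albert Einstein": (1879, 1955),
}

counts = Counter()
for birth, death in lifespans.values():
    counts.update(centuries_of(birth, death))

# Log2 of the per-century counts, as in the plotted distribution.
log2_counts = {century: math.log2(n) for century, n in sorted(counts.items())}
print(sum(counts.values()), "associations for", len(lifespans), "persons")
print(log2_counts)
```

Four persons yield seven associations here, because three of them lived across a century boundary.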
  • 32. Biography and countries Distribution of Wikipedia biographies by nationality
  • 33. Distribution of Wikipedia biographies by occupation
  • 34. Cultural interactions Top 5 sources (incoming) and destinations (outgoing) of cultural interactions for 10 countries. Statistics computed from the study of locations associated with persons present in Wikipedia. The size of the displayed countries is proportional to the log2 of the country's score.
  • 35. Top 5 cities and countries in philosophy for different periods. Starting with the 1st century, five-century slots were used.
  • 36. Top 5 cities and countries in literature from the 15th century to the present day. One-century time slots were used.
  • 37. Creative Commons Attribution-ShareAlike 2.5 You are free: ● to copy, distribute, display, and perform the work ● to make derivative works ● to make commercial use of the work Under the following conditions: Attribution. You must attribute the work in the manner specified by the author or licensor. Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one. For any reuse or distribution, you must make clear to others the license terms of this work. Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above. More info at http://creativecommons.org/licenses/by-sa/2.5/ All the images come from the respective papers.
  • 38. Bibliography • Sui, D.Z. The wikification of GIS and its consequences: or Angelina Jolie's new tattoo and the future of GIS. Comp. Env. Urb. Sys. 2008, 32, 1-5. http://geog.tamu.edu/~sui/publication/pub2008/SuiCEUSeditorial.pdf • Hecht, B.; Rohs, M.; Schöning, J.; and Krüger, A. 2007. WikEye - using magic lenses to explore georeferenced Wikipedia content. In Proc. of the 3rd International Workshop on Pervasive Mobile Interaction Devices. http://www.deutsche-telekom-laboratories.de/~rohs/papers/Hecht-WikEye.pdf • Lieberman, M.D. You Are Where You Edit: Locating Wikipedia Contributors Through Edit Histories. ICWSM 2009, San Jose, CA. http://www.umiacs.umd.edu/~jimmylin/publications/Lieberman_Lin_ICWSM2009.pdf • Popescu, A.; Grefenstette, G. Spatiotemporal Mapping of Wikipedia Concepts. JCDL 2010, June 21-25, Brisbane, Australia. http://portal.acm.org/ft_gateway.cfm?id=1816142&type=pdf