Harvesting crowdsourcing biodiversity data from Facebook groups
Jason Guan-Shuo Mai (1), Cheng-Hsin Hsu (1), Dong-Po Deng (2), De-En Lin (3), Hsu-Hong Lin (3), Kwang-Tsao Shao (1)
(1) Taiwan Biodiversity Information Facility (TaiBIF), Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
(2) Institute of Information Science, Academia Sinica, Taipei, Taiwan
(3) Taiwan Endemic Species Research Institute, Council of Agriculture, Nantou, Taiwan

The emergence of Web 2.0 enables people to contribute their biodiversity observations on the Web. Such crowdsourced biodiversity data are becoming
increasingly valuable to scientific studies because they can cover broader spatial and temporal scales. However, data provided as plain text hinder
retrieval and analysis. In this study, we propose a framework that automatically structures the loosely formatted text, so that volunteers can keep
contributing data in the ways familiar to them, while interested citizens, biodiversity researchers and managers benefit from the semantically
structured information. We take two Facebook biodiversity interest groups, Reptile-Road-Mortality and Enjoy-Moths, as examples.
Figure: Workflow of the framework, arranged around a main database.

0. Crowdsourcing: participants voluntarily provide unstructured data in Facebook interest groups (Reptile-Road-Mortality, Enjoy-Moths).
1. Crawling data from Facebook via its API.
2. Using natural language processing techniques, with the Taiwan Geographic Name and Taiwan Catalogue of Life (TaiCOL) databases as knowledge bases, to extract species vernacular names and place names from a thread.
3. Introducing the content management system Drupal for easier data management (including error correction) and display.
4. Publishing linked open data via a D2R server for open access and usage.
5. Developing browser plug-ins to give users digested feedback of the structured data.
6. Improving source data quality without changing users' own familiar ways.

Figure annotations:
- A semantic annotation tool disambiguates toponymic homonyms.
- One click on a message recognizes species vernacular names and related information.
- Our dataset is linked to other datasets on the linked open data cloud, such as DBpedia, GeoNames and LODE (Linked Open Data of Ecology), so it can benefit from the large amount of meta-information they provide.

Figure: What a typical discussion thread looks like.
Thread: post message, post picture, comment message, comment message, comment message, …
Our algorithm picks the most related species name appearing in a thread based on social networking characteristics.

Figure: Algorithms used to recognize abbreviations of vernacular names and place names.
For each vernacular name in TaiCOL (example: 細紋南蛇), do:
- Full name (細紋南蛇) occurs in the message? Yes: full-matched name.
- No: Prefix3 (細紋南) occurs in the message? Yes: Postfix2 (南蛇) occurs in the thread?
- No: Prefix2 (細紋) occurs in the message? Yes: Postfix1 (蛇) occurs in the thread?
- No: name doesn't exist in the message.
- A successful prefix and postfix match gives a matched abbreviation; calculate the confidence score of this name.
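
The prefix and postfix checks in the abbreviation-recognition flowchart can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `match_vernacular_name`, the `NameMatch` result type, and the exact order in which prefixes and postfixes are tried are hypothetical, and the flowchart's branch wiring is only partly recoverable from the poster. The poster does not define how the confidence score is calculated, so scoring is left out.

```python
from typing import NamedTuple, Optional


class NameMatch(NamedTuple):
    """Result of matching one vernacular name (hypothetical type)."""
    kind: str      # "full" or "abbreviation"
    prefix: str    # part of the name found in the message
    postfix: str   # part of the name found in the thread ("" for a full match)


def match_vernacular_name(name: str, message: str, thread: str) -> Optional[NameMatch]:
    """Check whether `name` or an abbreviation of it is mentioned.

    Assumed reading of the flowchart:
      1. If the full name occurs in the message, it is a full match.
      2. Otherwise look for a 3-character, then a 2-character prefix
         of the name in the message.
      3. If a prefix is found, look for a 2-character, then a
         1-character postfix of the name anywhere in the thread.
      4. If no prefix/postfix pair is found, the name is not mentioned.
    """
    if name in message:
        return NameMatch("full", name, "")

    for prefix_len in (3, 2):
        if prefix_len >= len(name):
            continue  # a prefix must be shorter than the full name
        prefix = name[:prefix_len]
        if prefix not in message:
            continue
        for postfix_len in (2, 1):
            if postfix_len >= len(name):
                continue
            postfix = name[-postfix_len:]
            if postfix in thread:
                return NameMatch("abbreviation", prefix, postfix)

    return None


if __name__ == "__main__":
    name = "細紋南蛇"
    message = "發現一隻細紋的"
    thread = message + " 這是蛇"  # the thread includes the message and its comments
    print(match_vernacular_name(name, message, thread))
```

In this sketch the prefix is searched only in the message being annotated, while the postfix may come from any post or comment in the thread, which mirrors the "occurs in the message?" and "occurs in the thread?" labels in the flowchart. A real pipeline would loop over every vernacular name in the knowledge base and rank the candidates by a confidence score.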

