3rd Socio-Cultural Data Summit
National Defense University
Center for Technology and National Security Policy
Admin

• Unclassified conference

• Chatham House rules

• Lunch in the new fiscal reality (the cafeteria)

• We have breaks and time built into our schedule to continue
  discussions or to sidebar




Data Summit(s) Objective

• “Good” data are required for reliable analysis.

   − Socio-cultural data of any sort are hard to find.

   − When we do find them, they are messy, fragmented,
     disorganized, poorly measured, etc.

• These Data Summits are committed to fostering a community that is
  interested in finding, evaluating, collecting, cleaning up, smartly
  integrating, and then using socio-cultural data against applied
  problems with scientific rigor.

   − Focus on a broad community with as few restrictions as possible.

   − Focus on rigor and science without sacrificing the ability to
     conduct real world applications.

Logical Progression of these Data Efforts

1. DataCards: quick and dirty effort to find, tag, and index data of all
   sorts for as many audiences as possible to reduce search costs for
   socio-cultural data.

2. First Data Summit: Take a first cut at data evaluation criteria and
   beat the heck out of them in working groups so that we can start to
   evaluate the socio-cultural data that we’ve found.

3. Second Data Summit: Expand the aperture on what constitutes
   data and relate working group insights back to prior evaluation
   criteria and lessons learned for continuing to find and define data.

4. Third Data Summit: Start to tackle the complex issue of “how we
   put the data together” once we have found it.

......more working groups focused on areas where we perceive we can
make concrete progress on data integration, cleaning, and fusion.
DataCards Overview

• DataCards is a structured wiki-like platform that uses “cards” (like card
  catalog cards or baseball cards) to index and describe key details re:
  socio-cultural (and related) data sources.
• Objectives of DataCards include:
   – Make sources of data discoverable.
   – Reduce search costs for data.
   – Provide a conduit for discovering and sharing data sources between
     and among non-traditional, academic, NGO, defense, law enforcement,
     and intelligence communities.
• Accessing DataCards:
   − Commercial Internet: http://www.datacards.org/
   − Development Site: http://beta.datacards.org/
   − SIPRNet: by request, hosted by OSD CAPE
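
As a rough illustration of the "card" idea described above, a card can be thought of as a small structured record that is tagged and then searched. The sketch below is only an assumption about what such a record might look like: the DataCard class, its field names, the search function, and the two sample entries are hypothetical and are not taken from the actual DataCards platform.

```python
from dataclasses import dataclass, field


@dataclass
class DataCard:
    """Hypothetical card describing one data source (field names are illustrative)."""
    title: str
    description: str
    tags: set[str] = field(default_factory=set)
    communities: set[str] = field(default_factory=set)


def search(cards, term):
    """Return cards whose title, description, or tags mention the term (case-insensitive)."""
    needle = term.lower()
    return [
        card for card in cards
        if needle in card.title.lower()
        or needle in card.description.lower()
        or any(needle in tag.lower() for tag in card.tags)
    ]


# Made-up sample entries, used only to exercise the search function.
cards = [
    DataCard("Example household survey", "Made-up polling data entry",
             {"survey", "polling"}, {"academic"}),
    DataCard("Example road network", "Made-up geospatial layer",
             {"geospatial"}, {"NGO", "defense"}),
]

matches = search(cards, "polling")
print([card.title for card in matches])
```

Indexing every source with the same small set of fields is what lets one simple search serve many audiences, which is one way to read the "reduce search costs" objective.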



DataCards Content/Usage Update


• Total cards: 1,682 (2,416 pending additional cards)
• Total datacards.org users: 537
• Since .org launch: 5,703 visits; 54,229 pageviews; 00:10:40 average
  time/visit; multiple visits from 28 countries

Related to DataCards




Summary of 1st Data Summit

• Data used for applied socio-cultural work for the DoD and other agencies
  are generally of poor quality.
   • Often general and hard to apply to real world situations
   • Rarely evaluated, and even more rarely evaluated objectively
• Worked on data evaluation criteria so that a “smart person” isn’t needed to
  evaluate data sources.
   • Smart people were used to create the criteria; “smart people in
      training” will apply the ratings.
   • The ratings shouldn’t rely on the experience of the rater, but on the
      quality of the criteria.
• The effort acknowledged that one size does not fit all requirements, and
  criteria should be flexible enough to accommodate a variety of conceptions of
  what constitutes “data.”
• DataCards helps consumers of socio-cultural data rapidly find the data they
  need. The evaluation criteria help assess the suitability and quality of possible
  data sources for their desired application.

Summary of 2nd Data Summit
• “Data” is a user-defined term; it is not specific to one particular type of data.
  DataCards is a platform with a wide user base with varied data needs.
  DataCards should seek to assist with the discovery and evaluation of data
  sources.
• Big data is a growing field of interest within analytical and knowledge
  communities. Big data, which was defined by the complexity, structure, and
  size of data, is not just social media but is generally transactional in
  nature, including financial transactions, SMS, and search engine results.
• Many data sources are qualitative in nature and cannot be machine processed
  and analyzed the way quantitative or geospatial data are.
• The most important considerations for users of geospatial data are robust
  search capabilities, a minimal path to finding data, and complete data.
• There is no single way that individuals find data. Discovery is often
  project-specific, and individuals tend to establish and follow predictable
  patterns of behavior when finding data because certain sources have proven
  relevant and trustworthy.
What is this Summit About?

• This summit is about getting the mess of socio-cultural “stuff” we
  often call data into a usable analytic format.
• The first panel focuses on two unique and innovative approaches
  toward putting data together for intelligence and analytic purposes;
  and a Phase 3 IARPA program that is rapidly fusing data in support of
  the intelligence community’s requirements for integrated and
  disparate data.
• The second panel focuses on two of the major types of data that are
  often trumpeted as the silver bullet to understanding all things
  socio-cultural: social media and polling/surveys. However, these are
  great case studies in the potential pitfalls of data aggregation
  without careful thought about what it is you are putting together.




What is this Summit About? (continued)

• The third panel provides three approaches to dealing with
  socio-cultural data, with moderate technical detail. This includes a
  look at the application of statistics to missing data, the dirty work of
  getting socio-cultural data ready for a DARPA program, and dealing
  with situations where socio-cultural data are sparse.
• Tomorrow, the fourth panel will focus on scientific and technical
  approaches to information extraction and data fusion challenges.
• The fifth panel will offer up thoughts on three compelling and
  promising areas for socio-cultural data integration: geospatial data
  of multiple resolutions, qualitative/subject matter expert-derived
  data, and human geography data.
• We’ll end after lunch with a discussion about how we as a
  community want to proceed on this conquest.


What Do I Want to Get Out Of this Summit?

• Community-building and the invigoration of new ideas to support
  better work with socio-cultural data.
• Feedback on what methods we are missing and what has merit.
• Feedback on what the forward operator needs from a group like
  this—this includes the warfighter, but also law enforcement
  officers, NGOs, partner nations, foreign service officers, economic
  development professionals: anyone working in the field to make a
  difference.




Data Integration Challenges and Approaches
