Unstructured Data in BI
6th May 2011, by Monaheng Diaho
Study Leader: Dr. Kotze
Unstructured data
  • Does not reside in relational database tables.
  • Has no predefined structure or format.
  • Not arranged in any order.
  • Difficult to categorise for use in BI.
  • Resides in several documents over multiple sources:
      • Internal (data within an organisation)
      • External (data outside the organisation)
  • Environmental scanning: scanning for information about events, trends and relationships in a company’s outside environment. (Sabherwal & Becerra-Fernandez 2011:85)
Environmental scanning (Sabherwal & Becerra-Fernandez 2011:85)
  • Shows how changes in the external environment may affect a company’s decision making.
  • Predicts improved organisational performance through monitoring of external events.
  • Includes seeking, searching for and using information.
A two-dimensional model proposed by Daft & Weick (1984) (Sabherwal & Becerra-Fernandez 2011:86)
  • Environmental analysability (EA).
  • Organisational intrusiveness (OI).
Environmental scanning cont’d
  • Undirected viewing mode.
      • Satisfied with limited information.
      • Does not seek comprehensive data.
      • Relies on irregular contacts and information.
  • Conditioned viewing mode.
      • Makes use of standard procedures.
      • Relies on significant data from external reports that are widely used in industry.
Environmental scanning cont’d
  • Searching mode.
      • Systematically analyses data to produce market forecasts, trend analysis and intelligence reports.
      • Willing to revise and update existing knowledge.
  • Enacting mode.
      • Constructs its own environment.
      • Gathers information by trying new behaviour and observing what happens.
      • Experiments, tests and stimulates.
      • Ignores precedent, rules and traditional expectations.
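The slides list the two dimensions and the four modes but do not spell out how they combine. The sketch below encodes the pairing as it is usually presented for the Daft & Weick (1984) model: a passive organisation views, an active (intrusive) one searches or enacts. Treating each dimension as a simple boolean is an assumption made for illustration, and the names `ScanningMode` and `scanning_mode` are hypothetical.

```python
from enum import Enum


class ScanningMode(Enum):
    UNDIRECTED_VIEWING = "undirected viewing"
    CONDITIONED_VIEWING = "conditioned viewing"
    SEARCHING = "searching"
    ENACTING = "enacting"


def scanning_mode(analysable: bool, intrusive: bool) -> ScanningMode:
    """Map the two dimensions (EA, OI) to a scanning mode.

    analysable: the environment is perceived as analysable (EA).
    intrusive:  the organisation actively intrudes into its environment (OI).
    Both are simplified to booleans here; that is an assumption of this sketch.
    """
    if intrusive:
        # Active organisations either search an analysable environment
        # or enact (construct) an unanalysable one.
        return ScanningMode.SEARCHING if analysable else ScanningMode.ENACTING
    # Passive organisations only view what reaches them.
    if analysable:
        return ScanningMode.CONDITIONED_VIEWING
    return ScanningMode.UNDIRECTED_VIEWING


if __name__ == "__main__":
    for ea in (False, True):
        for oi in (False, True):
            print(f"EA={ea!s:5} OI={oi!s:5} -> {scanning_mode(ea, oi).value}")
```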
Types of unstructured content (Ferguson 2011:6; McCallum 2005:49; SPSS 2003:3)
  • HTML content (e.g. web chat, blogs and web pages).
  • Documents (e.g. memos, research papers and articles).
  • Forms (e.g. patent applications).
  • Emails.
  • SMS content.
  • Multimedia content (audio, video, images).
Examples of data sources (Ferguson 2011:6)
  • Email archives.
  • Call centre transcripts.
  • Customer feedback databases.
  • Enterprise intranets.
  • Enterprise content management systems.
  • File systems.
  • Document management systems.
  • Social networking sites.
  • RSS newsfeeds.
Wittles (n.d.) asserts that:
  • 20% of an organisation’s data is structured and ready for use in BI data analysis.
  • The remaining 80% is unstructured data.
  • The significance of unstructured data is underestimated.
The social media effect
  • Social networks are currently the main driver of the upsurge in online content.
  • Facebook statistics are used as an example.
[Two figure slides; source: Ferguson (2011:4)]
Social intelligence
  • Bringing unstructured data into the decision-making process.
  • Augmenting structured data to optimise intelligence.
Examples of intelligence
  • Brand intelligence: identifying customer complaints or reviews for a product.
  • Competitor intelligence: benchmarking marketing campaigns.
  • Influencer intelligence: identifying trendsetters.
  • Organisational intelligence: managing employee relations.
Examples of intelligence cont’d
  • Crime intelligence:
      • Fraud detection.
      • Copy detection.
      • Organised crime detection.
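Copy detection is named above without a method. One common approach, swapped in here purely as an illustration and not taken from the slides, compares documents by the overlap of their word shingles (runs of k consecutive words) using Jaccard similarity. The function names and the default k = 3 are assumptions.

```python
from __future__ import annotations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles in the text (lower-cased)."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle, or nothing if empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    suspect = "the quick brown fox leaps over the lazy dog"
    print(f"similarity: {jaccard(original, suspect):.2f}")
```

A score near 1 suggests one text was copied from the other; the threshold that counts as "copied" depends on the content and would need tuning.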
Untangling unstructured data
  • Content analytics (text mining & web mining): the process of analysing semi-structured or unstructured content from one or more sources to derive insight that will be of business benefit. (Ferguson 2011:4)
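As a minimal sketch of deriving insight from unstructured text, the example below flags customer feedback that contains complaint-related words and counts how often each word occurs, which is the kind of signal brand intelligence relies on. The lexicon `COMPLAINT_TERMS` and the function names are hypothetical; real content analytics would use far richer linguistic processing than keyword matching.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical complaint lexicon, chosen only for this illustration.
COMPLAINT_TERMS = {"broken", "refund", "late", "faulty", "disappointed"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def complaint_hits(text: str) -> Counter:
    """Count the complaint terms that occur in one piece of feedback."""
    return Counter(t for t in tokenize(text) if t in COMPLAINT_TERMS)


def summarise(feedback: list[str]) -> dict:
    """Return indices of flagged items and overall complaint-term counts."""
    totals: Counter = Counter()
    flagged: list[int] = []
    for i, text in enumerate(feedback):
        hits = complaint_hits(text)
        if hits:
            flagged.append(i)
            totals.update(hits)
    return {"flagged": flagged, "term_counts": dict(totals)}


if __name__ == "__main__":
    sample = [
        "Arrived late and the screen is broken.",
        "Great product, works well.",
        "Still waiting for my refund, very disappointed. Refund please!",
    ]
    print(summarise(sample))
```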
Data acquisition
  • Uses crawlers, search and indexing technologies to identify, tag and index relevant content.
  • Multiple crawlers can be set to crawl in parallel.
  • Crawled content can be:
      • Indexed, with the index made available for analysis.
      • Stored in a distributed file system or document store (e.g. Hadoop DFS, MongoDB).
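The acquisition steps above can be sketched as several crawlers running in parallel and feeding an inverted index (term to document identifiers). To stay self-contained, the crawler here reads from an in-memory dictionary rather than the network; `SOURCES`, `crawl`, `build_index` and `search` are all hypothetical names, and a production system would use a dedicated crawler and search engine instead.

```python
from __future__ import annotations

import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical content sources standing in for intranets, email archives, feeds.
SOURCES = {
    "intranet/memo-1": "Quarterly sales memo: trends in customer feedback",
    "email/archive-7": "Customer complaint about late delivery",
    "feeds/rss-3": "Industry trends and competitor news",
}


def crawl(doc_id: str) -> tuple[str, str]:
    """Stand-in for a real crawler: fetch the text of one document."""
    return doc_id, SOURCES[doc_id]


def build_index(doc_ids: list[str], workers: int = 3) -> dict[str, set[str]]:
    """Crawl documents in parallel, then index their terms."""
    index: dict[str, set[str]] = defaultdict(set)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Crawling runs in worker threads; indexing happens in this thread,
        # so the index needs no locking.
        for doc_id, text in pool.map(crawl, doc_ids):
            for term in re.findall(r"[a-z]+", text.lower()):
                index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], term: str) -> list[str]:
    """Return the sorted ids of documents containing the term."""
    return sorted(index.get(term.lower(), set()))


if __name__ == "__main__":
    idx = build_index(list(SOURCES))
    print(search(idx, "trends"))
    print(search(idx, "customer"))
```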
Text mining system architecture (Feldman & Sanger 2007:17)
High-level view of a text mining application (Ferguson 2011:12)
Pros & Cons
  • Pros
      • Provides deep insight for BI.
      • Quick detection of trends.
  • Cons
      • Analytics are industry dependent, because each industry has unique content to utilise.
      • Indexing large content volumes may bog down search engine performance.
      • Content tagging may not be accurate.
      • Crawlers may not detect some content.
Future considerations
  • Ensure that user content is accurately tagged.
  • Ensure that content is up to date and relevant.
  • Validate content sources.
  • Identify business drivers to get the best solution.
  • Allocate adequate processing power to analytics to address scalability issues.
Possible research opportunities
  • Patent violation detection system.
  • Questionnaire/interview analysis system.
  • CRM content analytics.
  • Contextual comparison and assessment.
  • Multimedia content detection.
References
  • Feldman, R. & Sanger, J. 2007. The text mining handbook: Advanced approaches in analyzing unstructured data. New York: Cambridge University Press.
  • Ferguson, M. 2011. Integrating and analysing unstructured data. Info360 BI Conference. Washington DC.
  • McCallum, A. 2005. Information extraction. (http://www.cs.umass.edu/~mccallum/papers/acm-queue-ie.pdf) Retrieved 17 February 2011.
  • Sabherwal, R. & Becerra-Fernandez, I. 2011. Business intelligence: Practices, technologies, and management. New Jersey: John Wiley & Sons, Inc.
  • SPSS. 2003. Meeting the challenge for text: Making text ready for predictive analysis. Chicago.
  • Wittles, G. n.d. Unstructured data offers a vast store of untapped BI value. (http://www.themanager.org/strategy/Unstructured_data.htm) Retrieved 19 February 2011.
END
