Unstructured Data in BI

6th May 2011 by Monaheng Diaho
Study Leader: Dr. Kotze

Unstructured data
- Does not reside in relational database tables.
- Has no predefined structure or format.
- Is not arranged in any order.
- Is difficult to categorise for use in BI.
- Resides in several documents across multiple sources:
  - Internal (data within an organisation).
  - External (data outside the organisation).
- Environmental scanning: scanning for information about events, trends and relationships in a company’s external environment (Sabherwal & Becerra-Fernandez 2011:85).

Environmental scanning (Sabherwal & Becerra-Fernandez 2011:85)
- Shows how changes in the external environment may affect a company’s decision making.
- Is a predictor of improved organisational performance through the monitoring of external events.
- Includes seeking/searching for information and using it.

A two-dimensional model proposed by Daft & Weick (1984) (Sabherwal & Becerra-Fernandez 2011:86):
- Environmental analysability (EA).
- Organisational intrusiveness (OI).

Environmental scanning cont’d
- Undirected viewing mode:
  - Satisfied with limited information.
  - Does not seek comprehensive data.
  - Relies on irregular contacts and information.
- Conditioned viewing mode:
  - Makes use of standard procedures.
  - Relies on significant data from external reports that are widely used in the industry.

Environmental scanning cont’d
- Searching mode:
  - Systematically analyses data to produce market forecasts, trend analyses and intelligence reports.
  - Willing to revise and update existing knowledge.
- Enacting mode:
  - Constructs its own environment.
  - Gathers information by trying new behaviour and observing what happens.
  - Experiments, tests and stimulates.
  - Ignores precedent, rules and traditional expectations.
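The two dimensions of the Daft & Weick model combine into the four scanning modes listed above. The following is a minimal Python sketch of that 2x2 model as a lookup table. The quadrant each mode occupies (e.g. analysable + active gives searching) is assumed from Daft & Weick’s (1984) model rather than stated on these slides, and the names `Analysability`, `Intrusiveness`, `SCANNING_MODES` and `scanning_mode` are hypothetical.

```python
from enum import Enum


class Analysability(Enum):
    """Environmental analysability (EA): can the environment be analysed?"""
    ANALYSABLE = "analysable"
    UNANALYSABLE = "unanalysable"


class Intrusiveness(Enum):
    """Organisational intrusiveness (OI): how actively information is sought."""
    PASSIVE = "passive"
    ACTIVE = "active"


# Assumed quadrant assignment, following Daft & Weick's (1984) 2x2 model.
SCANNING_MODES = {
    (Analysability.UNANALYSABLE, Intrusiveness.PASSIVE): "undirected viewing",
    (Analysability.ANALYSABLE, Intrusiveness.PASSIVE): "conditioned viewing",
    (Analysability.ANALYSABLE, Intrusiveness.ACTIVE): "searching",
    (Analysability.UNANALYSABLE, Intrusiveness.ACTIVE): "enacting",
}


def scanning_mode(ea: Analysability, oi: Intrusiveness) -> str:
    """Return the environmental scanning mode for one (EA, OI) combination."""
    return SCANNING_MODES[(ea, oi)]


# Print the whole 2x2 grid.
for (ea, oi), mode in SCANNING_MODES.items():
    print(f"{ea.value:>12} + {oi.value:<7} -> {mode}")
```

Keeping the model as a dictionary keyed on both dimensions makes it explicit that each mode is defined by a combination of EA and OI, not by either dimension alone.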

Types of unstructured content (Ferguson 2011:6; McCallum 2005:49; SPSS 2003:3):
- HTML content (e.g. web chat, blogs and web pages).
- Documents (e.g. memos, research papers and articles).
- Forms (e.g. patent applications).
- Emails.
- SMS content.
- Multimedia content (audio, video, images).

Examples of data sources (Ferguson 2011:6):
- Email archives.
- Call centre transcripts.
- Customer feedback databases.
- Enterprise intranets.
- Enterprise content management systems.
- File systems.
- Document management systems.
- Social networking sites.
- RSS news feeds.

Wittles (n.d.) asserts that:
- 20% of an organisation’s data is structured and ready for use in BI data analysis.
- The remaining 80% is unstructured.
- The significance of unstructured data is underestimated.

The social media effect
- Social networks are currently the main driver of the upsurge in online content.
- Facebook statistics are used as an example.

Social intelligence
- Bringing unstructured data into the decision-making process.
- Augmenting structured data to optimise intelligence.
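Augmenting structured data can be as simple as deriving a new column from text and joining it onto existing records. The Python sketch below is illustrative only: it assumes a hypothetical sales table and a hypothetical set of social media posts, and counts whole-word product mentions as a stand-in for richer social intelligence measures. The helpers `count_mentions` and `augment` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical structured data, e.g. rows from a sales table in a data warehouse.
sales = [
    {"product": "PhoneX", "units_sold": 1200},
    {"product": "TabletY", "units_sold": 450},
]

# Hypothetical unstructured data, e.g. social media posts gathered by a crawler.
posts = [
    "Loving my new PhoneX, battery lasts all day",
    "PhoneX screen cracked after a week, not happy",
    "Thinking about buying a TabletY for work",
]


def count_mentions(texts, names):
    """Count how many texts mention each name (case-insensitive, whole word)."""
    counts = Counter()
    for text in texts:
        for name in names:
            if re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE):
                counts[name] += 1
    return counts


def augment(rows, texts):
    """Return copies of the structured rows with a 'social_mentions' column added."""
    mentions = count_mentions(texts, [row["product"] for row in rows])
    return [{**row, "social_mentions": mentions[row["product"]]} for row in rows]


for row in augment(sales, posts):
    print(row)
```

The original rows are left untouched; the augmented copies can then be analysed alongside the structured measures, for example to compare mention volume with units sold.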

Examples of intelligence
- Brand intelligence: identifying customer complaints or reviews for a product.
- Competitor intelligence: benchmarking marketing campaigns.
- Influencer intelligence: identifying trendsetters.
- Organisational intelligence: managing employee relations.
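For brand intelligence, one very simple way to identify customer complaints is to flag reviews containing terms from a complaint lexicon. The Python sketch below assumes a small, hypothetical lexicon (`COMPLAINT_TERMS`) and hypothetical reviews; it is a naive keyword approach, and a real system would need a domain-specific vocabulary, negation handling or a trained classifier.

```python
import re

# Hypothetical complaint lexicon; far too small for real use.
COMPLAINT_TERMS = {"broken", "refund", "disappointed", "faulty", "worst", "cracked"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_complaint(review: str) -> bool:
    """Flag a review as a complaint if it contains any lexicon term."""
    return any(token in COMPLAINT_TERMS for token in tokenize(review))


reviews = [
    "Worst purchase ever, the charger arrived broken.",
    "Great value and fast delivery.",
    "Very disappointed, I want a refund.",
]

complaints = [review for review in reviews if is_complaint(review)]
print(complaints)
```

Because matching is on whole tokens, "broken" matches but a phrase such as "not broken" would also be flagged, which illustrates why the slide’s caveat about inaccurate tagging applies.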

Examples of intelligence cont’d
- Crime intelligence:
  - Fraud detection.
  - Copy detection.
  - Organised crime detection.
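Copy detection is often approached by comparing overlapping word sequences ("shingles") between documents. The Python sketch below uses k-word shingles and Jaccard similarity, a common near-duplicate technique chosen here for illustration; it is not a method described in the slides, and the example texts are hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(doc_a: str, doc_b: str, k: int = 3) -> float:
    """Jaccard similarity of the two documents' k-word shingle sets."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k))


original = "the quick brown fox jumps over the lazy dog"
suspect = "the quick brown fox jumps over a lazy dog"
unrelated = "quarterly sales rose in the northern region"

print(similarity(original, suspect))    # 4 shared of 10 distinct shingles -> 0.4
print(similarity(original, unrelated))  # no shared shingles -> 0.0
```

Changing one word alters every shingle that contains it, so lightly edited copies still score well above unrelated text; a detection threshold would have to be tuned on real data.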

Untangling unstructured data
- Content analytics (text mining and web mining): the process of analysing semi-structured or unstructured content from one or more sources to derive insight that will be of business benefit (Ferguson 2011:4).
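One basic text mining step for deriving insight from several sources is to score which terms are distinctive to each source. The Python sketch below uses TF-IDF (term frequency times inverse document frequency), a standard technique swapped in here for illustration rather than one named in the slides; the source names and texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents: dict[str, str]) -> dict[str, dict[str, float]]:
    """Score each term in each document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(number of documents / documents containing the term)
    """
    tokenized = {name: tokenize(text) for name, text in documents.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each term once per document
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        scores[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical snippets from three different sources.
sources = {
    "call_centre": "customer reports billing error billing delay",
    "email": "customer asks about delivery delay",
    "survey": "customer praises delivery speed",
}

scores = tf_idf(sources)
for name, term_scores in scores.items():
    top = sorted(term_scores.items(), key=lambda kv: kv[1], reverse=True)[:2]
    print(name, top)
```

Terms that appear in every source (here "customer") score zero, so the highest-scoring terms point at what is specific to each source, such as the billing issue in the call centre text.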

Data acquisition
- Crawlers, search and indexing technologies are used to identify, tag and index relevant content.
- Multiple crawlers can be set to crawl in parallel.
- Crawled content can be:
  - Indexed, with the index made available for analysis.
  - Stored in a distributed file system or document store (e.g. Hadoop HDFS, MongoDB).
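The acquisition steps above (parallel crawling, then indexing for analysis) can be sketched in Python as a thread pool that fetches documents and an inverted index mapping each term to the documents that contain it. This is a minimal sketch under stated assumptions: `FAKE_SOURCES` and `fetch` are hypothetical stand-ins for real HTTP or file-system access, and `tag` simply treats lower-cased words as index terms, which is far cruder than real content tagging.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory sources standing in for an intranet, email archive and CRM.
FAKE_SOURCES = {
    "intranet/policy.html": "Expense policy for travel claims",
    "email/0001.txt": "Travel claim rejected, please resubmit",
    "crm/notes-17.txt": "Customer asked about expense reports",
}


def fetch(url: str) -> tuple[str, str]:
    """Simulated fetch: return the URL together with its text content."""
    return url, FAKE_SOURCES[url]


def tag(text: str) -> set[str]:
    """Very simple 'tagging': lower-cased word tokens used as index terms."""
    return set(re.findall(r"[a-z]+", text.lower()))


def crawl_and_index(urls: list[str], workers: int = 4) -> dict[str, set[str]]:
    """Fetch URLs in parallel and build an inverted index: term -> set of URLs."""
    index: dict[str, set[str]] = defaultdict(set)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fetches run in worker threads; results are indexed here in the main thread.
        for url, text in pool.map(fetch, urls):
            for term in tag(text):
                index[term].add(url)
    return index


def search(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Return URLs containing all query terms (boolean AND)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


index = crawl_and_index(list(FAKE_SOURCES))
print(sorted(search(index, "travel")))
print(sorted(search(index, "travel", "expense")))
```

Updating the index only in the main thread avoids locking; at larger scale the fetch step would be distributed across machines and the index or raw content persisted to a store such as those named on the slide.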

Text mining system architecture (Feldman & Sanger 2007:17)

High-level view of a text mining application (Ferguson 2011:12)

Pros & Cons
- Pros:
  - Provides deep insight for BI.
  - Enables quick detection of trends.
- Cons:
  - Analytics are industry-dependent, because each industry has its own unique content to utilise.
  - Indexing large content volumes may slow down search engine performance.
  - Content tagging may not be accurate.
  - Crawlers may not detect some content.

Future considerations
- Ensure that user content is accurately tagged.
- Ensure that content is up to date and relevant.
- Validate content sources.
- Identify business drivers to get the best solution.
- For scalability, allocate adequate processing power to analytics.

Possible research opportunities
- Patent violation detection system.
- Questionnaire/interview analysis system.
- CRM content analytics.
- Contextual comparison and assessment.
- Multimedia content detection.

References
Feldman, R. & Sanger, J. 2007. The text mining handbook: Advanced approaches in analyzing unstructured data. New York: Cambridge University Press.
Ferguson, M. 2011. Integrating and analysing unstructured data. Info360 BI Conference. Washington DC.
McCallum, A. 2005. Information extraction. (http://www.cs.umass.edu/~mccallum/papers/acm-queue-ie.pdf) Retrieved 17 February 2011.
Sabherwal, R. & Becerra-Fernandez, I. 2011. Business intelligence: Practices, technologies, and management. New Jersey: John Wiley & Sons, Inc.
SPSS. 2003. Meeting the challenge for text: Making text ready for predictive analysis. Chicago.
Wittles, G. n.d. Unstructured data offers a vast store of untapped BI value. (http://www.themanager.org/strategy/Unstructured_data.htm) Retrieved 19 February 2011.
