Unlocking Big Data through Analytics and Search - Big Data Cloud - June 3 Meetup
1. Big Data Cloud Meetup Big Data & Cloud Computing - Help, Educate & Demystify. June 3rd 2011
2. Kitenga, Mark Davis CTO June 3rd 2011 Meetup Unlocking Big Data through Analytics and Search
3. Big Data Enormous transactional data Enormous unstructured information Too big for databases New tools are needed
4. Unstructured data explosion Multimedia Content Text Imagery Audio Video Sensor Streams Biometric data 3D Text Email Documents Web pages Tweets Posts <5% Structured Enterprise Data Data warehouse CDRs Financial records Access logs
5. Big Data Trillions of user interactions/transactions == Big Data >100M <10M <1M Open source MySQL PHP Data warehousing Parallel SQL Big hardware NoSQL Hadoop/MapReduce HBase/Hive Emerging technologies Traditional (DBMS-based) solutions
8. Information Extraction Machine-Learning Finite State Transducer Finite State Transducer Finite State Transducer Parts-of-Speech Tagging Lemmatization Tokenization
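The pipeline stages named on this slide can be illustrated with a minimal Python sketch. It is not Kitenga's implementation: the function names `tokenize` and `extract_capitalized_spans` are hypothetical, the tokenizer is a simple regular expression, and the entity step is a toy two-state machine (outside or inside a run of capitalized tokens) standing in for the finite-state stage. Lemmatization, parts-of-speech tagging, and the machine-learning layer are omitted.

```python
import re

# Words, or single punctuation characters, become individual tokens.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split raw text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def extract_capitalized_spans(tokens):
    """Toy two-state machine: collect runs of two or more capitalized tokens.

    State OUTSIDE: `current` is empty. State INSIDE: `current` holds the run.
    A non-capitalized token ends the run; runs shorter than two tokens are
    dropped so that a lone sentence-initial word is not reported.
    """
    entities, current = [], []
    for tok in tokens:
        if tok[0].isupper():
            current.append(tok)          # enter or stay INSIDE
        else:
            if len(current) >= 2:
                entities.append(" ".join(current))
            current = []                 # back to OUTSIDE
    if len(current) >= 2:                # flush a run that ends the input
        entities.append(" ".join(current))
    return entities


tokens = tokenize("Mark Davis spoke in Santa Clara.")
print(extract_capitalized_spans(tokens))
```

A production extractor would replace the capitalization heuristic with trained models and richer transducers; the sketch only shows how tokens flow from one stage to the next.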
10. Defense Intelligence Analyst support staff needs to convert raw data into actionable intelligence Named Entity Extraction Image tagging Video analytics Linkage Analysis Network Visualization Search Improve Force Effectiveness Hadoop/MapReduce, GPUs, HDFS, HBase, Solr Situation Reports Geo-tagged Imagery US Army Navy DHS NSA
11. CASE STUDY: US ARMY The Solution >200 data feeds <0.5s queries Fast analysis cycles Machine Learning Analytics Biometrics Linkage Analysis Face recognition Video tagging Collaborative systems Analysis Bottlenecks 200 data feeds Unacceptable response time Analysts avoid complete searches Basic entity extraction Slow analysis cycles Distribution by PowerPoint Enabling Technologies: GPU clouds, Hadoop/MapReduce, Katta, Lucene, NoSQL, HBase Enabling Technologies: Oracle and custom thick clients
12. Pharma Bioinformatics Increase speed of drug discovery Biological Named Entity Extraction Author Name Extraction and Normalization Linkage Analysis Timelines Faceted Search ZettaVox Faster Discovery Hadoop/MapReduce, HDFS, HBase, GPUs, Solr Patents Genetic Sequence Data Journal Articles
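Faceted search, listed on this slide, pairs a query's hits with per-field value counts that a user can drill into. The sketch below shows the idea in plain Python under stated assumptions: `faceted_search`, the `DOCS` records, and the field names `source` and `year` are all hypothetical, and matching is a naive substring test. It does not use or imitate the Solr API.

```python
from collections import Counter

# Hypothetical sample records; not data from the talk.
DOCS = [
    {"text": "BRCA1 mutation screening", "source": "journal", "year": 2009},
    {"text": "Patent on BRCA1 assay", "source": "patent", "year": 2010},
    {"text": "Kinase inhibitor review", "source": "journal", "year": 2010},
]


def faceted_search(docs, query, facet_fields):
    """Return matching documents plus value counts for each facet field."""
    needle = query.lower()
    # Naive case-insensitive substring match stands in for a real index.
    hits = [d for d in docs if needle in d["text"].lower()]
    # Count facet values over the hits only, not the whole collection.
    facets = {
        field: Counter(d[field] for d in hits if field in d)
        for field in facet_fields
    }
    return hits, facets


hits, facets = faceted_search(DOCS, "brca1", ["source", "year"])
print(len(hits), dict(facets["source"]), dict(facets["year"]))
```

Counting over the hits rather than the full collection is what lets the facet values narrow as the query narrows.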
18. Summary Big Data spans unstructured and structured data Effective tools for managing both involve understanding the differences and similarities of both Bridging the chasm between them means merging search and analytics together
20. Contact Info mark@kitenga.com http://www.kitenga.com Kitenga, Inc. 2953 Bunker Hill Lane, Suite 400 Santa Clara, CA 95054 1-(408)-462-KITE 1-(253)-541-6799 (FAX)