Video Analytics on Hadoop webinar victor fang-201309

Video Analytics on Hadoop webinar.
Presented by the Pivotal Data Science Team, September 2013.

Published in: Technology

  1. A NEW PLATFORM FOR A NEW ERA
  2. © Copyright 2013 Pivotal. All rights reserved. What You Can Do With Hadoop Webinar Series: Unstructured Data – Video Analytics. September 6, 2013. Dr. Chunsheng (Victor) Fang, Sr. Data Scientist; Annika Jimenez, Global Head of Data Science Services; Nikesh Shah, Sr. Product Marketing Manager
  3. What You Will Learn • Pivotal Data Science Lab Services • New Emerging Trends for Unstructured Data • Video Analytics on Hadoop • Analytics with SQL
  4. Pivotal Platform • Cloud Storage • Virtualization • Data & Analytics Platform • Cloud Application Platform • Data-Driven Application Development • Pivotal Data Science Labs
  5. Pivotal Data Science
  6. Data Science Value Chain: Instrumentation Logs Capture Store Transform and Prepare Access Model Development Deploy Applications Process Change. Product Engineer, Platform Engineer, DBA, Data Engineer/Programmer, Data Engineer, Data Scientist, Platform Engineer, Application Developer, PMO
  7. How We Help Our Customers 1. Data Science Strategy Definition 2. Point Proof-of-Value Model Development 3. Multiple Model Development + Apps 4. DSIC: Transformation to “Predictive Enterprise” 5. Also: – Algorithm development – Pushing the envelope in problem-solving. Pivotal Data Science Labs
  8. Pivotal Data Science Knowledge Development
  9. Pivotal Data Science Dream Team • Derek Lin – Network Security, Fraud Detection, Speech and Language Processing (Principal Scientist at RSA, M.S. in Signal Processing, USC) • Hulya Farinas – Optimization, Resource Allocation in Healthcare (Modeler at M-Factor, IBM, Ph.D. in Operations Research, University of Florida) • Kaushik Das – Mathematical Modeling in Energy, Retail and Telco (Director of Analytics at M-Factor, M.S. in Mineral Engineering, UC Berkeley) • Sarah Aerni – Genomics and Machine Learning (Ph.D. in Biomedical Informatics, Stanford) • Mariann Micsinai – Next Generation Sequencing (Market Risk Management Associate at Lehman Brothers, Ph.D. in Computational Biology, NYU and Yale) • Victor Fang – Imaging and Graph Analytics, Machine Learning (Sr. Scientist at Riverain Medical, SDE at Amazon.com, Ph.D. in Computer Sciences, University of Cincinnati) • Emily Kawaler – Clinical Informatics and Machine Learning (M.S. in Computer Sciences, University of Wisconsin-Madison) • Anirudh Kondaveeti – Trajectory Data Mining and Machine Learning (Ph.D. in Computing & Dec. Systems Eng, Arizona State University) • Hong Ooi – Insurance and Finance Risk Modeling (Statistician at ANZ, Ph.D. in Statistics, Australian National University) • Michael Brand – Text, Speech and Video Research for Retail, Finance and Gaming (Chief Scientist at Verint Systems, M.S. in Applied Mathematics, Weizmann Institute) • Kee Siong Ng – Data Mining in Healthcare (Sr. Data Miner at Medicare Australia, Ph.D. in Computer Science, and Postdoctoral Fellow, Australian National University) • Noelle Sio – Digital Media Analytics and Mathematical Modeling (Sr. Analyst at eHarmony, Fox Interactive Media (Myspace), M.S. in Applied Mathematics, Cal Poly Pomona) • Jin Yu – Stochastic Optimization, Robust Statistics in Machine Learning, Computer Vision (Research Associate at U of Adelaide, Ph.D. in Machine Learning, Australian National University) • Rashmi Raghu – Computational Methods and Analysis (Ph.D. in Mechanical Engineering, Stanford) • Woo Jung – Bayesian Inference and Demand Analysis (Sr. Statistician at M-Factor, M.S. in Statistics, Stanford) • Jarrod Vawdrey – Marketing Analytics & SAS (Analytics Consultant at Aspen Marketing, B.S. in Mathematics, Kennesaw State University) • Niels Kasch – Text Analytics and NLP (Ph.D. in Computer Science, UMBC) • Vivek Ramamurthy – Online Learning, Stochastic Modeling, Convex Optimization (Ph.D. in Operations Research, UC Berkeley) • Srivatsan Ramanujam – NLP and Text Mining (Natural Language Scientist at Sony, Salesforce.com, M.S. in Computer Sciences, UT Austin) • Alexander Kagoshima – Time Series, Statistics and Machine Learning (M.S. in Economics/Computer Science, TU Berlin)
  10. Data Science Labs: Packaged Services. LAB PRIMER (2-Week Strategy) • Customized Analytics Roadmap • 1-day Moderated Brainstorming Session • Prioritized Opportunities • Architectural Recommendations. LAB 600 (6-Week Lab) • Prof. Services (Data Load) • Data Science Model Building • Project Management • Ready-to-Deploy Model(s). LAB 1200 (12-Week Lab) • Prof. Services (Data Load) • Data Science Model Building • Project Management • Ready-to-Deploy Model(s). LAB 100 (2-Week Lab) • On-site Pivotal Analytics Training • Rapid Model/Insight Build on Customer Data (2 weeks)
  11. Approach: Data Science Lab 1200. [Timeline chart over weeks 1 to 12 with the labels: Data Exploration, Features Building, Model Development, Code QA and Scoring, Model Optimization & Validation, Data Loaded, Insights Presentation, Training, Preliminary Model Review, Feature Review, Data Review, Documentation]
  12. Data Science Innovation Center (DSIC). Program Management | Data Architecture and Engineering | Data Scientists | Training and Skills Development • Facilitate data loading processes from source systems to Pivotal Data Fabric • Coordinate data needs with Data Scientists • Best practice education for analytics performance • Data migration to support new applications • Oversight and communication plans • Organizational alignment • Risk mitigation • Resource planning • Prioritize deliverables • Socialize progress of overall initiative • Instill data collaboration culture • Execute Data Science Lab engagements around revenue generation or cost saving efforts • Hands on education with new data analysis techniques • Introduce new analytics tools and methodologies • Identify candidates for deeper data science training • Create training curriculum • Recruiting Methodology • Parallel computing techniques defined and demonstrated • Build institutional knowledge for client data science team. Key Principles • Building a predictive enterprise is, first and foremost, about building a human infrastructure. • Analytics is an iterative knowledge discovery process and needs to be managed as such. • Discovery starts from asking the right questions – that can be as important as finding answers to those questions.
  13. Large Scale Video Analytics Platform on Hadoop. Dr. Chunsheng (Victor) Fang, Sr. Data Scientist
  14. Pivotal Video Analytics Taskforce • Chunsheng (Victor) Fang, Ph.D. – Sr. Data Scientist • Regunathan Radhakrishnan, Ph.D. – Sr. Data Scientist • Derek Lin – Principal Data Scientist • Sameer Tiwari – Hadoop Architect • Kenneth Dowling & Michael Nemesh – DCA Admin
  15. Industry Use Case: Surveillance Video Anomaly Detection
  16. Anomaly Detection in Surveillance Video • Detect anomalous objects in a restricted perimeter. • A typical large enterprise collects terabytes of video per day. • Hadoop MapReduce runs computer vision algorithms in parallel and captures violation events. • Post-incident monitoring enabled by Hadoop / HAWQ.
  17. Unstructured Video Data Workflow • Unstructured data as input • ETL: Distributed Video Transcoder • Analytics: Distributed Video Analytics • Structured insights in a relational database for advanced analytics [Diagram: Unstructured Data, ETL, Analytics, Structured Insights]
  18. Real World Video Data • Benchmark Surveillance Videos (i-LIDS) from the United Kingdom Home Office – Library of HiDef CCTV video footage based around ‘scenarios’ central to the government’s requirements. – The footage accurately represents real operating conditions and potential threats. • Anomaly Detection: Sterile zone dataset [Images: night and day scenes]
  19. Most Common Video Standards • MPEG & ITU: responsible for many video standards • MPEG-2 (1995): widely adopted; DVDs, digital TV broadcast, set-top boxes
  20. Intro to MPEG Standard • The MPEG standard encodes video frames – Redundancy in time: inter-frame encoding – Redundancy in space: intra-frame encoding • Motion compensation – I-frame (key frame): intra-frame encoding – P-frame (predicted frame): predicts regions of the current frame from the previous frame – B-frame (bi-predictive frame): predicts regions of the current frame using both the previous and the next frame
  21. Distributed Video Transcoder on Hadoop: Distributed MapReduce MPEG Transcoder
  22. Motivation of Distributed Video Transcoding • Can we decode the individual frames from an arbitrary block in the Hadoop Distributed File System (HDFS)? • HDFS splits any file into blocks, typically 64 MB or 128 MB. • Each block can be processed in parallel by a customized MapReduce function. • Most video file standards are not Hadoop-friendly.
  23. Decoding MPEG-2 with MapReduce • Two key observations – Video header information: available only in the header at the start of the bitstream – The Group of Pictures (GOP) header repeats throughout the bitstream • Steps to decode arbitrary blocks – Step 1: Configure each mapper to extract the header information from each file; ▪ Totals ~20 videos at 5GB – Step 2: Start searching for the GOP header in each block in parallel; – Step 3: Decode frames into a suitable image format (JPEG, BMP, etc.); – Step 4: Consolidate all time-stamped frames into a Hadoop Sequence File. ▪ Reduces to a sequence file at 500MB • Result: MPEG-2 video transcoded into a Hadoop-friendly format
  24. Distributed Video Analytics Platform on Hadoop
  25. Object Detection with Gaussian Mixture Model • Video data is much noisier than we realize. • You don’t notice it because your visual cortex can denoise. • For a computer, robustness requires good statistical models (e.g. GMM). [Figure: distribution of pixel intensities over time]
  26. Typical Video Analytics Workflow • Video/image data are highly unstructured • Hadoop has proven excellent at extracting structured insights from Big Data • A typical workflow: [Diagram labels: Analytic Result, Foreground Extraction, Background Stat Model, Visual Key, Composite Key, Feature Extraction/Classification, ((Key, Time), Loc)]
  27. Use Case 1: Anomaly Detection • Extracting structured info from unstructured data • Computer vision algorithms fit into the Mapper/Reducer framework • Intermediate (Key, Value) – (RestrictedArea, IntrusionEvent(Time, ViolatorImage)) [Diagram: HDFS, Map tasks, Reduce tasks, HDFS / GPDB; sample event timestamp 2012-09-01 07:00:00]
  28. Use Case 2: Trajectory Analysis • Tracking multiple objects in Big Data video archives • Building high-level summarization, e.g. moving trajectory time series [Figure: positions at times T1 to T6]
  29. Use Case 2: Trajectory Analysis, “Map” [Diagram labels: Map, Foreground Extraction, Background Stat Model, Visual Key, Composite Key, Feature Extraction/Classification, ((VisKey, time), loc), Emit(K,V)]
  30. Use Case 2: Trajectory Analysis, “Reduce” [Diagram labels: Reduce, Aggregate, User-defined Trajectory model, (Object, Trajectory), 2nd Sort on Composite key ((VisKey, time), loc)]
  31. Video Analytics Platform Supports • Video ETL – Supports standard formats: MPG, AVI, MP4. – Sequence file in HDFS • Image Processing Toolkit – Supports standard formats (e.g. JPEG, BMP, PNG) – Color space conversion – Edge/key point detection – Morphological processing – Filtering: convolutional, median, etc. • PHD MapReduce for scalable computer vision algorithms • HAWQ SQL for high-level analytics
  32. Video Analytics Demo
  33. Performance Quick Facts • Processing one 720x576 video frame takes 103 milliseconds (near real time, even in Java) • Detection algorithm: scales linearly with the number of processors • Impacts: • Enhance public security • Improve security officers’ productivity
  34. Querying the Analytics Results • Average speed of the red car yesterday, using a window function: SELECT sqrt(power(avg(abs(x_diff)),2) + power(avg(abs(y_diff)),2))*FPS_MPS_FACTOR FROM ( SELECT X-lag(X,1) OVER (ORDER BY TIME) AS x_diff, Y-lag(Y,1) OVER (ORDER BY TIME) AS y_diff FROM SANMATEO WHERE TARGET = AND TIME > (CURRENT_TIMESTAMP - INTERVAL '1' DAY) AND TIME < CURRENT_TIMESTAMP ) x_tmp; • RESULT: • 7.2 mph
  35. More Use Cases • Most computer vision algorithms are embarrassingly parallel • No data sharing between processes – Feature extraction – Object detection/classification • Video categorization for user-generated content – Find trending topics in YouTube videos by topic modeling • Object Detection – Detect known categories of objects, e.g. face, bar code, vehicle. • Object Search – Given a known object, use template matching to locate the object [Figure: Haar-like + AdaBoost cascade face detector]
  36. Summary • Hadoop: a great tool for data scientists to crunch unstructured Big Data! • Hadoop extracts structured insights from unstructured video with customized computer vision algorithms. • Scalable framework with ease of experimenting, developing, deploying! • Pivotal HD demonstrates large scale video analytics use cases: – Anomaly detection – Trajectory analysis – More …
  37. Q&A
  38. More Information • Pivotal Blog Site, August 12, 2013: Large Scale Video Analytics • Contact the Data Science Lab Services: info@gopivotal.com
  39. Thank You
  40. A NEW PLATFORM FOR A NEW ERA
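
Slide 23 describes searching each HDFS block for Group of Pictures (GOP) headers in parallel. The Python sketch below illustrates that search under one assumption: the input is an MPEG-2 video elementary stream, in which a GOP header begins with the start code 0x000001B8. The function name `find_gop_offsets` is hypothetical and this is not the presenters' transcoder; it only shows how a mapper could locate decoding entry points inside an arbitrary block of bytes.

```python
from typing import List

# MPEG-2 video start code that opens a Group of Pictures header.
GOP_START_CODE = b"\x00\x00\x01\xb8"


def find_gop_offsets(block: bytes) -> List[int]:
    """Return the byte offsets of every GOP start code found in `block`.

    `block` stands in for the bytes of one HDFS block handed to a mapper
    (an assumption for illustration; no Hadoop API is used here).
    """
    offsets = []
    pos = block.find(GOP_START_CODE)
    while pos != -1:
        offsets.append(pos)
        # Continue searching just past the match we already recorded.
        pos = block.find(GOP_START_CODE, pos + 1)
    return offsets


if __name__ == "__main__":
    sample = b"junk" + GOP_START_CODE + b"aaaa" + GOP_START_CODE + b"bb"
    print(find_gop_offsets(sample))
```

A start code can straddle two HDFS blocks, so a real mapper would likely need to read a few bytes past its block boundary; that handling is omitted here.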

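Slide 25 motivates a statistical background model for pixel intensities over time. The sketch below is a deliberate simplification: it tracks a single running Gaussian for one pixel rather than the Gaussian mixture named on the slide. The class name `PixelBackground`, the learning rate, the initial variance, and the threshold are all illustrative assumptions, not values from the presentation.

```python
class PixelBackground:
    """Running single-Gaussian background model for one pixel (simplified)."""

    def __init__(self, first_value, learning_rate=0.05, threshold=2.5,
                 initial_var=25.0, min_var=4.0):
        self.mean = float(first_value)
        self.var = initial_var          # assumed starting variance
        self.learning_rate = learning_rate
        self.threshold = threshold      # in standard deviations
        self.min_var = min_var          # floor so the model never gets overconfident

    def update(self, value):
        """Classify `value`, then adapt the model. Returns True for foreground."""
        diff = value - self.mean
        is_foreground = diff * diff > (self.threshold ** 2) * self.var
        if not is_foreground:
            # Only background samples update the model, so a passing object
            # does not get absorbed into the background statistics.
            lr = self.learning_rate
            self.mean += lr * diff
            self.var = max(self.min_var, (1 - lr) * self.var + lr * diff * diff)
        return is_foreground


if __name__ == "__main__":
    bg = PixelBackground(100)
    for intensity in [102, 99, 101, 98, 100, 103, 200, 101]:
        print(intensity, bg.update(intensity))
```

Small fluctuations around the learned mean are treated as sensor noise, while a large jump is flagged as foreground. A mixture model extends this idea by keeping several such Gaussians per pixel.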
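
Slides 29 and 30 outline a map step that emits ((VisKey, time), loc) and a reduce step that relies on a secondary sort on the composite key. The Python sketch below simulates that flow in memory, without Hadoop. The names `map_detections` and `shuffle_and_reduce` are hypothetical, and the detections are made-up sample values rather than output from the platform.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple

Loc = Tuple[int, int]
Record = Tuple[Tuple[str, int], Loc]  # ((vis_key, time), (x, y))


def map_detections(frame_time: int,
                   detections: Iterable[Tuple[str, Loc]]) -> Iterator[Record]:
    """Map step: emit one composite-key record per object detected in a frame."""
    for vis_key, loc in detections:
        yield (vis_key, frame_time), loc


def shuffle_and_reduce(records: Iterable[Record]) -> Dict[str, List[Tuple[int, Loc]]]:
    """Sort on the composite key, group by visual key, and build trajectories."""
    # Sorting by (vis_key, time) plays the role of the secondary sort:
    # each object's records reach the reducer already ordered by time.
    ordered = sorted(records, key=lambda kv: kv[0])
    trajectories = {}
    for vis_key, group in groupby(ordered, key=lambda kv: kv[0][0]):
        trajectories[vis_key] = [(key[1], loc) for key, loc in group]
    return trajectories


if __name__ == "__main__":
    # Frames arrive out of order, as they might from parallel mappers.
    frames = [
        (2, [("car", (30, 5))]),
        (1, [("car", (10, 5)), ("person", (0, 0))]),
        (3, [("car", (50, 5))]),
    ]
    records = [kv for t, dets in frames for kv in map_detections(t, dets)]
    print(shuffle_and_reduce(records))
```

Putting the time in the key, instead of sorting inside the reducer, lets the framework's sort phase do the ordering, which is the usual reason for a composite key in MapReduce.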