
Three Big Data Case Studies

3,777 views

Takes you through the fundamentals of Big Data, with real-life examples. Also find out why you may or may not need Big Data.

Published in: Technology


  2. Great Use Cases of Big Data
  • Big Data Exploration: find, visualize, and understand all big data to improve decision making
  • Enhanced 360° View of the Customer: extend existing customer views (CRM, etc.) by incorporating additional internal and external information sources
  • Security/Intelligence Extension: lower risk, detect fraud, and monitor cyber security in real time
  • Data Warehouse Augmentation: integrate big data and data warehouse capabilities to increase operational efficiency
  • Operations Analysis: analyze a variety of machine data for improved business results
  3. Why Big Data
  • Greater efficiencies in business processes
  • New insights from combining and analyzing data types in new ways
  • New business models, with resulting increased market presence and revenue
  [Diagram: data sources (file systems, relational data, content management, email, CRM, supply chain, ERP, RSS feeds, cloud, custom sources) feeding data views, applications, and users]
  4. Atidan Approach
  • Implement a Hadoop-centric reference architecture
  • Move enterprise batch processing to Hadoop
  • Make Hadoop the single point of truth
  • Massively reduce ETL by transforming within Hadoop
  • Move results and aggregates back to legacy systems for consumption
  • Retain, within Hadoop, source files at the finest granularity for re-use
  Top criteria:
  • Allow users to use familiar consumption interfaces (web, mobile)
  • Enable businesses to unlock previously unusable data
  [Diagram: high-level big data architecture — ingest big data, preprocess raw data, simplify your warehouse, unlock big data]
  5. Atidan Case Study: Usage Analysis Using Hadoop
  Business Need
  • A large conglomerate had to analyze the last 10 years' usage of its web applications from IIS logs
  • The logs received from IIS were stored in multiple files (e.g., daily logs)
  • The data contained free text; it was unstructured and also included irrelevant data
  • The exact analysis criteria, parameters, and desired outcomes were not known in advance
  Solution
  • A traditional RDBMS could not handle the problem due to the type and volume of the data and the uncertainty around the ultimate analysis criteria
  • Atidan delivered a Hadoop-based solution that easily transformed raw data into reports
  • The solution was fault-tolerant to data inconsistencies
  • Hadoop provided elasticity for incremental data addition
  • Scalability into the petabyte range: based on data size and complexity, processing can be scaled from one node to 100 nodes
  • The schema-less architecture made it possible to change the data model and analytics dynamically, even late in the project
  • The organization gained completely new and unexpected insights into employee, customer, and vendor/partner behavior
  • Correlations were established between employees' usage patterns and both attrition and productivity
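The deck does not show the actual jobs, but a Hadoop-style analysis of IIS logs can be sketched as a map phase that emits (status code, 1) pairs and a reduce phase that sums them. This is a minimal illustrative sketch, not Atidan's implementation; the sample log format and the column position of the status code are assumptions.

```python
# Sketch of a MapReduce-style status-code count over IIS (W3C format) logs.
# The field layout below is an assumption for illustration.
from itertools import groupby

def mapper(lines):
    """Emit (status_code, 1) for each IIS log line, skipping '#' header lines."""
    for line in lines:
        if line.startswith('#'):          # IIS comment/header lines
            continue
        fields = line.split()
        if len(fields) > 10:              # assumed: status code in column 11
            yield fields[10], 1

def reducer(pairs):
    """Sum the counts per status code (the shuffle phase sorts by key)."""
    for status, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield status, sum(count for _, count in group)

# Local simulation of the map/shuffle/reduce phases on sample lines:
sample = [
    "#Fields: date time s-ip cs-method ...",
    "2012-05-01 10:00:01 10.0.0.1 GET /home - 80 - 10.0.0.9 Mozilla 200 0 0 120",
    "2012-05-01 10:00:02 10.0.0.1 GET /gone - 80 - 10.0.0.9 Mozilla 404 0 0 80",
    "2012-05-01 10:00:03 10.0.0.1 GET /home - 80 - 10.0.0.9 Mozilla 200 0 0 95",
]
print(dict(reducer(mapper(sample))))      # {'200': 2, '404': 1}
```

On a real cluster the same two functions could run via Hadoop Streaming, with the framework handling the sort/shuffle between them.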
  6. Atidan Case Study: Usage Analysis Using Hadoop
  [Charts: request counts by HTTP response type (200 OK, 201 Created, 202 Accepted, 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found); monthly request volumes, January 2001 to November 2002; request counts per user (Amare, Amit, Bhagat, Mukesh, Praneel, Sanjog, Vimal)]
  7. Big Query - Hive
  • The size of data being collected and analyzed in industry for business intelligence (BI) is growing rapidly, making traditional warehousing solutions prohibitively expensive
  • MapReduce is low-level and complex to write
  • Hive provides a high-level, SQL-like query language
  • This allows for ad-hoc analysis: the business need not know in advance which patterns to look for
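To make the contrast concrete, here is a hypothetical HiveQL query of the kind the slide describes (table and column names are illustrative, not from the case study), followed by the same aggregation in plain Python to show what the declarative query computes without hand-written MapReduce code:

```python
# Hypothetical ad-hoc HiveQL query over a logs table (illustrative names):
HIVE_QUERY = """
SELECT status, COUNT(*) AS hits
FROM iis_logs
GROUP BY status
ORDER BY hits DESC;
"""

# The equivalent aggregation expressed directly in Python on sample rows:
from collections import Counter

rows = [{"status": 200}, {"status": 404}, {"status": 200}, {"status": 500}]
hits = Counter(row["status"] for row in rows)
print(hits.most_common())   # [(200, 2), (404, 1), (500, 1)]
```

Hive compiles such a query into MapReduce jobs behind the scenes, which is why the analysis criteria can change late without rewriting low-level code.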
  8. Atidan Case Study: Customer Data Collection (KYC) Using Hadoop
  Business Need
  • A financial institution had to periodically collect customer data
  • Customers are very reluctant to provide updated data
  • This customer data has to be cross-checked against the billions of transactions the institution receives per day
  • The institution wanted to collate data available in the public domain from known social media sites
  • The data contained free text; it was unstructured and also included irrelevant data
  Solution
  • Atidan delivered a Hadoop-based solution that transformed raw data into a graph database
  • A graph database is constructed over the extracted social data to analyze transactions
  • Customer information is aggregated from existing sources, social media, and government sources
  • Transactions are analyzed to find hidden patterns
  • The solution enables link analysis and risk monitoring
  • It facilitates decision making (new products) and customer discovery
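The link analysis the slide mentions can be sketched with a toy graph: entities (customers, accounts, merchants) as nodes, relationships as edges, and a traversal that asks whether two parties are connected through any chain. A real deployment would use a graph database; the adjacency-list representation and entity names here are illustrative assumptions.

```python
# Toy link analysis over a customer graph (illustrative data).
from collections import deque

edges = [
    ("cust_A", "acct_1"), ("cust_B", "acct_1"),          # shared account
    ("cust_B", "merchant_X"), ("cust_C", "merchant_X"),  # shared merchant
]

# Build an undirected adjacency list from the edge list.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def linked(graph, start, goal):
    """Breadth-first search: are two entities connected through any chain?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return False

# cust_A -> acct_1 -> cust_B -> merchant_X -> cust_C
print(linked(graph, "cust_A", "cust_C"))   # True
```

Peer-group analysis and AML pattern detection build on the same traversal idea, looking for suspicious structures (shared counterparties, circular flows) rather than simple reachability.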
  9. Atidan Case Study: Customer Data Collection (KYC) Using Hadoop
  [Diagram: customer information from the web, social channels, partners, utility providers, and Aadhar (UIDAI) flows through big data processing into a graph database, supporting customer clustering, income/expense change detection, corporate structure changes, AML, peer group analysis, and pattern analysis]
  10. Advantages of the Hadoop (KYC) Solution to Banks
  • Lowers the cost of follow-up with users
  • Reduces losses by highlighting risky users early
  • Graph-database-based AML
  • Insights into new products, new customers, new loans to existing customers, and new investment opportunities for customers
  • Reduces operational errors
  • Traceability of data sources
  [Diagram: AML (graph queries, due diligence); risk (credit scoring, mitigation analysis, peer groups); new prospects (insights, new products, new customers)]
  11. Atidan Case Study: Email Scanning and Categorization Using MongoDB
  Business Need
  • Retrieve potentially millions of daily emails from a common webmail account, categorize them, and post them to individual users' pages for frontend access
  • The existing process had significant performance, reliability, and scalability issues; users also received a lot of spam
  Solution
  Atidan proposed a MongoDB/Drupal-based solution with the following approach:
  • A scheduler was created to pull only headers from the all-user common webmail account
  • Headers were stored in an intermediate catalog in MongoDB
  • Data was transformed based on the recipient address and user preferences, and spam was removed
  • Email bodies were fetched for the filtered records and saved to the final catalog in MongoDB
  • Emails from the final catalog were pushed to the frontend platform (Drupal)
  Key Takeaways
  • Leveraged the power of MongoDB in processing 'big data' of millions of daily emails: it is much faster, easy to scale, and very flexible
  • The task was split into multiple sub-tasks, and better algorithms were used for performance and efficiency
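The two-stage catalog pipeline described above can be sketched with plain dictionaries standing in for the MongoDB collections. The collection structure, the spam rule, and the `fetch_body` helper are illustrative assumptions; the point is the shape of the flow: headers first, filter, then fetch bodies only for surviving records.

```python
# Sketch of the header -> intermediate catalog -> filter -> final catalog flow.
def is_spam(header):
    return "free money" in header["subject"].lower()   # placeholder rule

def fetch_body(msg_id):
    return f"body of {msg_id}"                         # stands in for a webmail fetch

headers = [
    {"id": "m1", "to": "amit@example.com",   "subject": "Invoice"},
    {"id": "m2", "to": "amit@example.com",   "subject": "FREE MONEY now"},
    {"id": "m3", "to": "sanjog@example.com", "subject": "Meeting"},
]

# Stage 1: the scheduler pulls only headers into the intermediate catalog.
intermediate_catalog = list(headers)

# Stage 2: drop spam, group by recipient, and fetch bodies only for the
# records that survived filtering; save them to the final catalog.
final_catalog = {}
for h in intermediate_catalog:
    if is_spam(h):
        continue
    record = dict(h, body=fetch_body(h["id"]))
    final_catalog.setdefault(h["to"], []).append(record)

# The frontend (Drupal, in the case study) reads per-user pages from here:
print(sorted(final_catalog))   # ['amit@example.com', 'sanjog@example.com']
```

Deferring the body fetch until after filtering is what makes the pipeline cheap at millions of emails per day: the expensive retrieval happens only for non-spam, correctly routed messages.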
  12. Atidan Case Study: Email Scanning and Categorization Using MongoDB
  13. Technologies Used
  • Node.js (data transformation): server-side JavaScript
  • MongoDB (database): schema-less, simple NoSQL database for storage and querying; the basic unit of data storage and transfer was the JSON object
  • Advantages: highly scalable, very flexible, simple
  • RESTful service to access data from the browser
  • Drupal (frontend)
  14. Thank you!