
QuerySurge Slide Deck for Big Data Testing Webinar

This is a slide deck from QuerySurge's Big Data Testing webinar.

Learn why testing is pivotal to the success of your Big Data strategy.
Learn more at www.querysurge.com

The growing variety of new data sources is pushing organizations to look for streamlined ways to manage complexities and get the most out of their data-related investments. The companies that do this correctly are realizing the power of big data for business expansion and growth.

Learn why testing your enterprise's data is pivotal for success with big data, Hadoop and NoSQL. Learn how to increase your testing speed, boost your testing coverage (up to 100%), and improve the level of quality within your data warehouse - all with one ETL testing tool.

This information is geared towards:
- Big Data & Data Warehouse Architects
- ETL Developers
- ETL Testers, Big Data Testers
- Data Analysts
- Operations teams
- Business Intelligence (BI) Architects
- Data Management Officers & Directors

You will learn how to:
- Improve your Data Quality
- Accelerate your data testing cycles
- Reduce your costs & risks
- Provide a huge ROI (as high as 1,300%)


QuerySurge Slide Deck for Big Data Testing Webinar

  1. Testing Big Data: Automated ETL Testing of Hadoop and NoSQL
     - Bill Hayduk, CEO, RTTS; Business Leader, QuerySurge (the software division of RTTS)
     - Jeff Bocarsly, Ph.D., Chief Architect, QuerySurge Division, RTTS
  2. Testing Big Data: Automated ETL Testing of Hadoop and NoSQL
     - Host: RTTS/QuerySurge
     - Date: July 30, 2022
     - Time: 1:00 pm, Eastern Standard Time (New York, GMT-05:00)
     - Session number: 630 771 732
     - Agenda:
       - About Big Data and Hadoop
       - About NoSQL
       - Hadoop and DWH Use Case
       - How to test Big Data
       - Demo of QuerySurge with Hadoop and NoSQL
  3. About RTTS: the leading provider of software & data quality for critical business systems. Facts:
     - Founded: 1996
     - Headquarters: New York
     - Customers: 700+
     - Strategic Partners: see logos
     - Enterprise Software: QuerySurge (launched 2012; customers: 170+ in 30 countries)
     - Technology Partners
  4. Partners
     - Regional consulting firms
     - Technology partners
     - Global system integrators
     - Argentina, Australia, Belgium, Brazil, Canada, Chile, India, Malaysia, Netherlands, New Zealand, Norway, Sweden, Singapore, South Africa, Ukraine, US
  5. Typical data issue areas
     - Diagram labels: Mainframe, ETL, Data Warehouse, Business Intelligence & Analytics
     - C-level executives are using BI & Analytics to make critical business decisions with the assumption that the underlying data is fine. We know it is not.
  6. Big data: defined as too much volume, velocity and variety to work on normal database architectures.
     - Size: defined as 5 petabytes or more
     - 1 petabyte = 1,000 terabytes
     - 1,000 terabytes = 1,000,000 gigabytes
     - 1,000,000 gigabytes = 1,000,000,000 megabytes
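The unit chain on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, where each step is a factor of 1,000, as the slide does; the names `UNITS`, `convert`, and `BIG_DATA_THRESHOLD_PB` are illustrative and do not come from the deck.

```python
# Decimal (SI) storage units, smallest to largest; each step is a factor of 1,000.
UNITS = ["megabytes", "gigabytes", "terabytes", "petabytes"]


def convert(value, from_unit, to_unit):
    """Convert a quantity between the decimal storage units listed in UNITS."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    if steps >= 0:
        # Going to a smaller unit: multiply by 1,000 per step.
        return value * 1000 ** steps
    # Going to a larger unit: divide by 1,000 per step.
    return value / 1000 ** (-steps)


# The slide's size threshold for "big data" (hypothetical constant name).
BIG_DATA_THRESHOLD_PB = 5

threshold_tb = convert(BIG_DATA_THRESHOLD_PB, "petabytes", "terabytes")
threshold_mb = convert(BIG_DATA_THRESHOLD_PB, "petabytes", "megabytes")

print(threshold_tb)  # 5 petabytes expressed in terabytes
print(threshold_mb)  # 5 petabytes expressed in megabytes
```

With binary units (factors of 1,024) the figures would differ slightly; the slide uses round decimal factors, so the sketch does too.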
  7. Handles more than 1 million customer transactions every hour.
     - Data imported into databases that contain > 2.5 petabytes of data
     - The equivalent of 167 times the information contained in all the books in the US Library of Congress
     - Others:
       - Facebook handles 40 billion photos from its user base.
       - Google processes 1 terabyte per hour.
       - Twitter processes 85 million tweets per day.
       - eBay processes 80 terabytes per day.
  8. Big data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times. Technologies include:
     - massively parallel processing (MPP) databases
     - data warehouses
     - data mining grids
     - distributed file systems
     - distributed databases
     - cloud computing platforms
     - the Internet, and
     - scalable storage systems
  9. Hadoop is an open-source project that develops software for scalable, distributed computing.
     - Easily deals with the complexities of high volume, velocity and variety of data
     - Is a framework for the distributed processing of large data sets across clusters of computers using simple programming models
     - Scales up from single servers to 1,000's of machines, each offering local computation and storage
     - Detects and handles failures at the application layer
  10. Hadoop:
      - Redundant and reliable
      - Extremely powerful
      - Easy to program distributed apps
      - Runs on commodity hardware
  11. “Spending on Hadoop software and subscriptions will increase to approximately $677 million, with overall big data market anticipated to reach the $50 billion mark.” - Wikibon
  12. Components on each machine:
      - MapReduce (a.k.a. Task Tracker): the processing part that manages the programming jobs.
      - HDFS (Hadoop Distributed File System, a.k.a. Data Node): stores data on the machines.
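The deck does not show a MapReduce job, so the following is only a rough illustration of the programming model behind the "processing part": the classic word count, written in plain Python rather than against any Hadoop API. The function names `map_phase`, `shuffle`, and `reduce_phase` and the sample lines are assumptions made for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


# Hypothetical input; on a real cluster each line could live on a different machine.
lines = ["big data needs testing", "testing big data"]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

On a cluster, the map and reduce calls would run in parallel on the machines that hold the data; here they run in one process purely to show the data flow.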
  13. Cluster: add more machines for scaling, from 1 to 100 to 1,000.
      - Job Tracker: accepts jobs, assigns tasks, identifies failed machines.
      - Name Node: coordination for HDFS. Inserts and extractions are communicated through the Name Node.
      - Diagram: each machine in the cluster runs a Task Tracker and a Data Node.
  14. Apache Hive: a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. Hive provides a mechanism to query the data using a SQL-like language called HiveQL that interacts with the HDFS files:
      - create
      - insert
      - update
      - delete
      - select
  15. What is NoSQL? A term used to describe high-performance, non-relational databases that provide a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. NoSQL database types:
      - Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.
      - Graph stores are used to store information about networks of data, such as social connections. Graph stores include Neo4j and Giraph.
      - Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or 'key'), together with its value. Examples of key-value stores are Riak and Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as 'integer', which adds functionality.
      - Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
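To make the four database types more concrete, the sketch below shapes the same made-up customer data in each style using plain Python dictionaries. It is an illustration of the data models as the slide describes them, not of any product's storage engine or client API; all keys, names, and values are hypothetical.

```python
# Document store: each key maps to a nested document (key-value pairs, arrays, sub-documents).
document_store = {
    "customer:42": {
        "name": "Ada",
        "emails": ["ada@example.com"],
        "address": {"city": "New York", "country": "US"},
    }
}

# Key-value store: every item is a single attribute name (key) with its value.
key_value_store = {
    "customer:42:name": "Ada",
    "customer:42:city": "New York",
}

# Wide-column store (per the slide's description): values of one column are kept together.
wide_column_store = {
    "name": {"customer:42": "Ada", "customer:43": "Bob"},
    "city": {"customer:42": "New York", "customer:43": "Oslo"},
}

# Graph store: nodes plus edges that describe connections between them.
graph_edges = {
    "customer:42": ["customer:43"],  # Ada follows Bob
    "customer:43": [],
}

# The same question ("which city is customer 42 in?") is answered differently per model.
city_from_document = document_store["customer:42"]["address"]["city"]
city_from_key_value = key_value_store["customer:42:city"]
city_from_wide_column = wide_column_store["city"]["customer:42"]

# A column-oriented layout makes a scan over one attribute cheap.
all_cities = sorted(wide_column_store["city"].values())

print(city_from_document, city_from_key_value, city_from_wide_column, all_cities)
```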
  16. Diagram labels: Data Warehouse, Batch Aggregation, ETL from MongoDB, ETL to MongoDB. Source: MongoDB, Inc.
  18. When to use NoSQL? / When to use Hadoop?
      - NoSQL: online real-time processing; data set is smaller; measured in milliseconds.
      - Hadoop: offline big data processing; offline analytics; measured in minutes & hours.
      - Source: classpattern.com
  20. Diagram labels: Data Warehouse, Hadoop, NoSQL.
  21. USE CASE 1***: Use Hadoop as a landing zone for big data & raw data
      1) Bring all raw, big data into Hadoop
      2) Perform some pre-processing of this data
      3) Determine which data goes to Data Warehouse
      4) Extract, transform and load (ETL) pertinent data into Data Warehouse
      ***Source: Vijay Ramaiah, IBM product manager, datanami magazine, June 10, 2013
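The four steps of this use case can be sketched as a small in-memory pipeline. Everything below is a stand-in: plain Python lists play the role of Hadoop and the data warehouse, and the record layout, the function names, and the rule "only orders go to the warehouse" are assumptions made for illustration, not details from the deck.

```python
from decimal import Decimal

# Hypothetical raw feed: mixed record types, untrimmed strings.
raw_events = [
    {"id": 1, "type": "order", "amount": " 19.99 "},
    {"id": 2, "type": "clickstream", "amount": None},
    {"id": 3, "type": "order", "amount": "5.00"},
]


def land(events):
    """Step 1: bring all raw data into the landing zone unchanged."""
    return list(events)


def preprocess(events):
    """Step 2: light clean-up, here trimming whitespace from string fields."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in event.items()}
        for event in events
    ]


def select_for_warehouse(events):
    """Step 3: decide which data goes to the warehouse (assumed rule: orders only)."""
    return [event for event in events if event["type"] == "order"]


def etl(events):
    """Step 4: transform pertinent data into the warehouse shape (amount in cents)."""
    return [
        {"order_id": event["id"], "amount_cents": int(Decimal(event["amount"]) * 100)}
        for event in events
    ]


landing_zone = land(raw_events)
warehouse_rows = etl(select_for_warehouse(preprocess(landing_zone)))

print(warehouse_rows)
```

Each function boundary in this sketch is a point where data can be lost or altered, which is why the deck goes on to recommend testing at every entry point.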
  22. Recommended functional test strategy: test every entry point in the system (feeds, databases, internal messaging, front-end transactions). The goal: provide rapid localization of data issues between points.
      - Diagram labels: Source Data, Source Hadoop, ETL Process, Target DWH, Business Intelligence software; test entry points are marked along the flow.
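One way to picture "rapid localization of data issues between points" is to run the same check at every entry point and report the first hop where the result changes. The sketch below uses a row count as that check; the stage names, the counts, and the `localize` helper are hypothetical and only illustrate the idea.

```python
def localize(stage_counts):
    """Return the first adjacent pair of stages whose row counts differ, or None.

    stage_counts is a dict ordered from the first entry point to the last.
    """
    stages = list(stage_counts.items())
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        if prev_count != count:
            return prev_name, name
    return None


# Hypothetical row counts collected at each test entry point.
counts = {
    "source data": 1000,
    "hadoop": 1000,
    "etl output": 990,
    "target dwh": 990,
}

suspect_hop = localize(counts)
print(suspect_hop)
```

In a real pipeline some hops filter or aggregate on purpose, so the expected count at each point would come from the mapping rules rather than from simple equality; the equality check here is a simplifying assumption.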
  23. Diagram labels: Source Data, Ingestion, Relational DB & Data Warehousing, BI, Analytics & Reporting; five test entry points are marked.
  24. We need a testing tool!
      - We need to verify more data and to do it faster
      - We need to automate the testing effort
      - We need to be able to test across different platforms
  26. QuerySurge is the smart Data Testing solution that automates the data validation and ETL testing of Big Data with full DevOps functionality for continuous testing.
  27. QuerySurge™:
      - Data Quality at Speed → Automate the launch, execution, comparison & auto-email results
      - Test across different platforms → Data Warehouse, Hadoop, NoSQL, DB, flat files, XML, JSON, BI Reports
      - Smart Query Wizards, no coding needed → Query Wizards create tests visually, without writing SQL
      - Data Analytics & Data Intelligence → Data Analytics Dashboard, Data Intelligence Reports, emailed results, Ready-for-Analytics back-end data access
      - Create Custom Tests → Modularize functions with snippets, set thresholds, stage data, check data types
      - DevOps for Data & Continuous Testing → API Integration with Build/Release, Continuous Integration/ETL, Operations/DevOps Monitoring, Test Management/Issue Tracking, more
      - Projects → Multi-project support, global admin user, activity log reports
  28. QuerySurge™:
      - Web-based…
      - Supported OS...
      - Connects through… to any JDBC compliant data source
      - Components: QuerySurge Controller, QuerySurge Server, DB Server (MySQL), App Server (Tomcat), QuerySurge Agents (ships with 10 Agents)
      - Installs… in the Cloud, on a VM, on a Bare Metal Server
  29. QuerySurge™: Design Library, Scheduler, Query Wizards, Run-Time Dashboard, Data Intelligence Reports, Data Analytics Dashboard, DevOps for Data, Projects
  30. Multi-Project Support: multiple projects can now be created in a single QuerySurge instance. This allows for multiple groups to work on the same QuerySurge server without seeing each other’s assets (project-level security). Features supported in Multi-Projects are:
      - Global Admin User: This new user type administers the QuerySurge instance across multiple projects.
      - Assign Users to Projects: Users can be assigned to one or more projects. In each assignment, a user can have a different project role (administrator, standard user or participant user).
      - Assign Agents to Projects: Agents can be shared across projects or dedicated to specific projects.
      - Project Import: Import project data into another project on the same instance or into a different environment (Dev/QA/Prod).
      - Project Export: Export entire projects and store for backup purposes.
      - Activity Log Reports: Two reports that track specific changes for auditing purposes, including manipulations to users or connections.
  31. Fast and Easy. No programming needed.
      - Perform 80% of all data tests with no SQL coding
      - Opens up testing to novices & non-technical members
      - Speeds up testing for skilled coders
      - Provides a huge Return-On-Investment
  33. Design Library
      - Create custom Query Pairs (source & target SQLs for tests that have transformations)
      - Scheduling:
        - Build groups of Query Pairs
        - Schedule Test Runs: run immediately, run at set date/time, or have an event kick it off
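A Query Pair, as described on this slide, is a source query and a target query whose results should match once the transformation is accounted for. The sketch below illustrates that idea only; it does not use QuerySurge or its API. Two in-memory SQLite databases stand in for the source system and the warehouse, and the table names, columns, and the cents-to-dollars transformation are all assumptions.

```python
import sqlite3

# Stand-ins for the source system and the target data warehouse.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.executescript("""
    CREATE TABLE orders (id INTEGER, amount_cents INTEGER);
    INSERT INTO orders VALUES (1, 1999), (2, 500), (3, 1250);
""")
# Row 3 was loaded incorrectly (12.00 instead of 12.50) to simulate an ETL defect.
target.executescript("""
    CREATE TABLE fact_orders (order_id INTEGER, amount_dollars REAL);
    INSERT INTO fact_orders VALUES (1, 19.99), (2, 5.0), (3, 12.0);
""")

# The query pair: the target query reverses the assumed transformation (dollars back to cents)
# so that both sides return directly comparable rows.
source_sql = "SELECT id, amount_cents FROM orders"
target_sql = (
    "SELECT order_id, CAST(ROUND(amount_dollars * 100) AS INTEGER) "
    "FROM fact_orders"
)

source_rows = set(source.execute(source_sql).fetchall())
target_rows = set(target.execute(target_sql).fetchall())

# Rows present on one side only are the mismatches to report.
missing_in_target = sorted(source_rows - target_rows)
unexpected_in_target = sorted(target_rows - source_rows)

print("missing in target:", missing_in_target)
print("unexpected in target:", unexpected_in_target)
```

Comparing full result sets, rather than sampling, is what allows coverage to approach the "up to 100%" figure mentioned in the deck's description, at the cost of moving every row through the comparison.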
  34. Deep-Dive Reporting
      - Examine and automatically email test results
      - Run Dashboard:
        - View real-time execution
        - Analyze real-time results
  35. QuerySurge DevOps for Data
      - First full DevOps for Data testing solution
      - Both RESTful and command line APIs
      - Improves Data Quality at Speed
      - 60+ API calls with almost 100 different properties that users can utilize to retrieve, edit, update, or delete information
      - Integrates with:
        - Continuous integration/ETL solutions
        - Automated build/release/deployment solutions
        - Operations and DevOps monitoring solutions
        - Test management/issue tracking solutions
        - Scheduling and workload automation solutions
  36. QuerySurge™:
      - View data reliability & pass rate
      - Add, move, filter, zoom in on any data widget & underlying data
      - Verify build success or failure
  37. Screenshot labels: Large Suite; Start Time: March 5, 2021 16:20:44; March 5, 2021 4:24 PM; 6 minutes.
  38. TRIAL IN THE CLOUD
      - (1) Trial in the Cloud of QuerySurge™, including self-learning tutorial that works with sample data, for 3 days
      - (2) Downloaded Trial of QuerySurge™, including self-learning tutorial with sample data or your data, for 15 days
      - For more information on our Trials, please visit: www.querysurge.com/compare-trial-options
      - Big Data Testing Courses: filled with examples and labs, this hands-on training teaches concepts and HQL techniques used in Big Data testing. For more information on our Big Data Testing classes, please visit: http://www.rttsweb.com/training/courses/big-data-testing-courses
  39. To see the video of our Big Data testing webinar please visit: http://www.querysurge.com/solutions/testing-big-data/big-data-testing-for-hadoop
      - "Big Data is on the verge of revolutionizing enterprise data management architectures." - DeZyre
