
Data analytics & its Trends

This presentation gives an insight into what big data and data analytics are, the difference between big data and data science, and salary trends in big data analytics.


  1. 1. Big Data Analytics & Trends. Presentation by Dr. K. Sreenivasa Rao, Dept. of CSE, VBIT
  2. 2. Content: 1. What is big data? 2. Why big data? 3. Some definitions 4. Types of data: structured, unstructured and semi-structured 5. The data landscape 6. Some other definitions 7. Characteristics of big data 8. Data generation points 9. Big data analytics 10. Example scenario 11. Challenges of big data 12. Hadoop, history and complementary packages 13. Difference between big data and data science 14. Salary trends in Hadoop/big data
  3. 3. What is Big Data? • Facebook generates 10 TB of data daily. • Twitter generates 7 TB of data daily. • IBM claims 90% of today's stored data was generated in just the last two years.
  4. 4. Why Big Data? • Big data has grown because of: – Increased storage capacity – Increased processing power – Availability of data (of many different types) – Every day we create 2.5 quintillion bytes of data, i.e. 2.5 exabytes or 2.5 million TB (1 exabyte = 10^18 bytes = 1,000 petabytes, and 1 petabyte = 1,000 TB). • Facebook generates 10 TB of data daily. • Twitter generates 7 TB of data daily. • IBM claims 90% of today's stored data was generated in just the last two years.
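The unit conversion on this slide can be checked with a few lines of Python. This is only a sketch: it assumes decimal (SI) units, where each step is a factor of 1,000, and the variable names are illustrative.

```python
# Decimal (SI) storage units expressed in bytes (an assumption; binary units differ slightly).
TB = 10**12  # terabyte
PB = 10**15  # petabyte
EB = 10**18  # exabyte, i.e. one quintillion bytes

# "2.5 quintillion bytes" created per day, as quoted on the slide.
daily_bytes = 25 * 10**17

daily_exabytes = daily_bytes / EB     # 2.5
daily_terabytes = daily_bytes // TB   # 2,500,000, i.e. 2.5 million TB

print(f"{daily_exabytes} EB per day = {daily_terabytes:,} TB per day")
```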
  5. 5. Some Definitions • Big data is a "catch-all" term for the power of using a lot of data to solve problems. Big data is data that is large and complex enough that it becomes difficult to process using a single computer. • Big data is simply the large sets of data that businesses and other parties put together to serve specific goals and operations. Big data can include many different kinds of data in many different formats.
  6. 6. Some Definitions • Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information. [Ref: Strata + Hadoop World 2016: Hadoop and Spark in spotlight]
  7. 7. RDF: Resource Description Framework
  8. 8. Some Other Definitions • Gartner defines big data as high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. • Big data is often characterized by the 3 Vs: the extreme volume of data, the wide variety of data types and the velocity at which the data must be processed. Although big data doesn't equate to any specific volume of data, the term is often used to describe terabytes, petabytes and even exabytes of data captured over time.
  9. 9. Characteristics of Big Data. Volume (data quantity): • Twitter generates about 80 MB per second. • Facebook generates 10 TB of data per day. • Black box data: a single flight generates nearly 10 TB of data every half hour. Velocity (data speed): • eBay analyzes 5 million transactions per day. • Velocity refers to the speed at which big data must be analyzed. Velocity also matters as big data analysis expands into fields like machine learning and artificial intelligence, where analytical processes mimic perception by finding and using patterns in the collected data. Variety (data types): • Big data includes data from e-commerce sites, health care, education, stock exchanges, banking, etc. Varying in time: • [http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data]
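As a quick consistency check, the per-second Twitter figure above can be converted to a daily volume and compared with the "7 TB daily" figure quoted earlier in the deck. A small sketch, assuming decimal units (1 TB = 1,000,000 MB) and a constant rate over the whole day:

```python
# Figure quoted on the slide: Twitter generates about 80 MB per second.
mb_per_second = 80
seconds_per_day = 24 * 60 * 60  # 86,400

mb_per_day = mb_per_second * seconds_per_day  # 6,912,000 MB
tb_per_day = mb_per_day / 1_000_000           # assumes 1 TB = 1,000,000 MB

print(f"{mb_per_second} MB/s is about {tb_per_day:.2f} TB per day")
```

Under these assumptions the result is a little under 7 TB per day, so the two Twitter figures in the deck agree with each other.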
  10. 10. • http://www.information-management.com/news/big-data-analytics/the-
  11. 11. Data generation points. Examples: mobile devices, readers/scanners, science facilities, microphones, cameras, social media, programs/software.
  12. 12. Big Data Analytics • Examining large amounts of data • Extracting appropriate information • Identifying hidden patterns and unknown correlations • Competitive advantage • Better business decisions: strategic and operational • Effective marketing, customer satisfaction, increased revenue
  13. 13. Example scenario: you need to go through articles to read, pictures and videos, links to Facebook and Twitter, etc.
  14. 14. Pictures & reading articles
  15. 15. Watching videos, etc., and still having no clarity.
  16. 16. Such big data must be sorted, filtered and analyzed to produce useful information for decision making.
  17. 17. Perhaps Facebook may help you better identify the best gym equipment for your office. Finally, analytics gives us useful insight or information from big data.
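The "sort, filter and analyze" step in this scenario can be illustrated with a toy Python sketch. Everything in it is hypothetical: the records, field names and ratings are invented for illustration and do not come from the presentation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records standing in for articles, videos and social posts
# about gym equipment; none of this data comes from the slides.
records = [
    {"source": "article", "product": "treadmill", "rating": 4.0},
    {"source": "video", "product": "treadmill", "rating": 5.0},
    {"source": "facebook", "product": "rowing machine", "rating": 3.0},
    {"source": "twitter", "product": "rowing machine", "rating": 4.0},
    {"source": "article", "product": "treadmill", "rating": None},  # no usable rating
]

# Filter: keep only the records that actually carry a rating.
rated = [r for r in records if r["rating"] is not None]

# Analyze: compute the average rating per product.
ratings_by_product = defaultdict(list)
for r in rated:
    ratings_by_product[r["product"]].append(r["rating"])
averages = {product: mean(values) for product, values in ratings_by_product.items()}

# Sort: rank products from best to worst average rating.
ranking = sorted(averages.items(), key=lambda item: item[1], reverse=True)
best_product = ranking[0][0]

print(ranking)
```

At real big data scale the same three steps would run on a distributed framework rather than in a single Python process.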
  18. 18. Challenges of big data: • Problem: reading 1 TB of data from a hard drive. • Solution 1: one machine with 4 I/O channels of 100 MBps each. • 1 TB = 1024 × 1024 MB = 1,048,576 MB • At 100 MBps, one I/O channel needs 1,048,576 / 100 ≈ 10,486 seconds ≈ 174.8 minutes. • With 4 I/O channels: 174.8 / 4 ≈ 43.7 minutes. • Solution 2: if 10 such machines read in parallel, it takes 43.7 / 10 ≈ 4.37 minutes to read 1 TB of data. • That is, to analyze big data we first need to read it; today the challenge is I/O speed, not storage capacity. • The challenge is reading and writing data, not storing it. • Hadoop is a framework that addresses these challenges.
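The read-time arithmetic on this slide can be reproduced with a short Python sketch. It assumes ideal conditions: the data is already split evenly across channels and machines, and throughput scales perfectly with both. The function name and parameters are illustrative, not part of Hadoop.

```python
def read_time_minutes(size_mb, mbps_per_channel, channels=1, machines=1):
    """Minutes needed to read size_mb, assuming perfectly parallel I/O."""
    total_throughput = mbps_per_channel * channels * machines  # MB per second
    return size_mb / total_throughput / 60

ONE_TB_IN_MB = 1024 * 1024  # 1,048,576 MB, as on the slide

one_channel = read_time_minutes(ONE_TB_IN_MB, 100)
four_channels = read_time_minutes(ONE_TB_IN_MB, 100, channels=4)
ten_machines = read_time_minutes(ONE_TB_IN_MB, 100, channels=4, machines=10)

print(f"1 channel:   {one_channel:.1f} min")
print(f"4 channels:  {four_channels:.1f} min")
print(f"10 machines: {ten_machines:.2f} min")
```

Real clusters fall short of this ideal because of network transfer, coordination overhead and uneven data placement, but the sketch shows why spreading reads across many machines is the core idea.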
  19. 19. Hadoop • Hadoop is an open source, Java-based programming framework that supports the processing of large datasets in a distributed computing environment. It is an Apache project sponsored by the Apache Software Foundation. • It is designed to answer the question "How can big data be processed at reasonable cost and in reasonable time?" • Definition 2: Apache Hadoop is a framework for distributed processing of large datasets across clusters of commodity computers using a simple programming model (MapReduce). • Commodity hardware means many cheap machines rather than a small number of expensive high-end servers or supercomputers. • Hadoop's distributed file system and MapReduce engine were modelled on Google's published Google File System and MapReduce designs. • Who uses Hadoop? • India's Aadhaar scheme • Yahoo • Facebook, etc.
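  20. 20. • History: • Hadoop was created by Doug Cutting and Mike Cafarella in 2005, growing out of the Apache Nutch project and inspired by Google's papers on the Google File System and MapReduce. • In 2006 it became a separate Apache subproject, and Yahoo became a major contributor. • It is now Apache Hadoop. • Some public cloud services that offer Hadoop: • Amazon Elastic MapReduce (EMR) • Amazon EC2/S3 • Google Cloud Dataproc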
  21. 21. Hadoop Components: • 1. HDFS (Hadoop Distributed File System): stores data across thousands of servers to achieve high bandwidth. • 2. MapReduce: provides a programming model for large-scale distributed processing, mapping data and then reducing it to a result. • Hadoop is the popular open source implementation of MapReduce, a powerful tool designed for deep analysis and transformation of very large data sets.
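The "map, then reduce" idea can be sketched in plain Python with the classic word-count example. This is a single-process illustration of the programming model only; it does not use Hadoop's API, and the function names and input strings are made up for the example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Two hypothetical input splits; on a cluster each could live on a different node.
splits = ["big data needs big tools", "data science uses data"]
counts = word_count(splits)
print(counts)
```

In Hadoop the map and reduce functions run on many machines at once, with HDFS supplying the input splits and the framework handling the shuffle between the two phases.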
  22. 22. Complementary software packages: • The term Hadoop has come to refer not just to the base modules above, but also to the collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache Phoenix, Apache Spark, Apache ZooKeeper, Cloudera Impala, Apache Flume, Apache Sqoop, Apache Oozie and Apache Storm. • HBase: an open source, non-relational distributed database. • Hive: a data warehouse that provides data summarization. • Pig: a high-level platform for creating programs that run on Hadoop. • Apache Spark: a fast engine for big data processing, capable of streaming and supporting SQL, machine learning and graph processing. One survey says 80% of Hadoop projects are going to mature in 2016 and people are looking towards Apache Spark for their next projects.
  23. 23. Types of tools used in big data • Where is processing hosted? – Distributed servers / cloud (e.g. Amazon EC2) • Where is data stored? – Distributed storage (e.g. Amazon S3) • What is the programming model? – Distributed processing (e.g. MapReduce) • How is data stored and indexed? – High-performance schema-free databases (e.g. MongoDB) • What operations are performed on data? – Analytic / semantic processing
  24. 24. Difference between Big Data and Data Science • [http://www.kdnuggets.com/2015/07/data-science-big-data-different-beasts.html] • Creating an artifact from ore requires tools, craftsmanship and science. The same is true of big data and data science; here we present the factors that distinguish the ore from the artifact. • Data Science looks to create models that capture the underlying patterns of complex systems, and codify those models into working applications. Big Data looks to collect and manage large amounts of varied data to serve large-scale web applications and vast sensor networks. Although both offer the potential to produce value from data, the fundamental difference between Data Science and Big Data can be summarized in one statement: collecting does not mean discovering.
  25. 25. Investments in data-focused activities center around tools instead of approaches. The engineering cart gets put before the scientific horse, leaving an organization with a big set of tools and little knowledge of how to convert data into something useful. So, Data Science is expertise in converting data into useful information and products that answer the always-changing demands of the market.
  26. 26. Salary Trends for Big Data/Hadoop • 1. Average big data salaries have increased by 9.3% in the last 12 months. The current salary range is between $119,250 and $168,250. • 2. A Hadoop developer making $120,000 will be valued by competitor companies at $155,000. That's a 29% hike. • 3. On average, a new big data/Hadoop technology is released every 6 weeks. So make sure you stay updated. • 4. The average salary for a Hadoop developer in San Francisco, CA, is $139,000. • 5. A senior Hadoop developer in San Francisco, CA, can earn over $178,000 on average. • 6. Hortonworks, Paxata and Bloomberg LP are hiring top big data/Hadoop talent at the highest pay packages. • 7. The states with the most Hadoop/big data jobs are California, New York, New Jersey and Texas. Duh, that was obvious :)
  27. 27. So make sure you stay updated.
  28. 28. Future of Big Data • More than $15 billion has been spent on software firms specializing only in data management and analytics. • This industry on its own is worth more than $100 billion and is growing at almost 10% a year, roughly twice as fast as the software business as a whole. • In February 2012, the open source analyst firm Wikibon released the first market forecast for big data, listing $5.1B revenue in 2012 with growth to $53.4B in 2017. • The McKinsey Global Institute estimates that data volume is growing 40% per year, and will grow 44x between 2009 and 2020.
  29. 29. • So, Data Science as a career goal will improve graduates' employability in the future job market. • Big data market forecast
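The growth figures on this slide can be put on a common footing by converting them to compound annual growth rates. A minimal sketch, assuming simple annual compounding; the helper name `cagr` is illustrative.

```python
def cagr(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1

# Wikibon forecast quoted on the slide: $5.1B in 2012 growing to $53.4B in 2017.
wikibon_rate = cagr(5.1, 53.4, 2017 - 2012)

# McKinsey estimate quoted on the slide: 44x growth between 2009 and 2020.
mckinsey_implied_rate = cagr(1, 44, 2020 - 2009)

# Cross-check: what does the quoted 40% per year compound to over the same 11 years?
growth_at_40_percent = 1.40 ** (2020 - 2009)

print(f"Wikibon forecast implies about {wikibon_rate:.0%} growth per year")
print(f"44x over 11 years implies about {mckinsey_implied_rate:.0%} per year")
print(f"40% per year for 11 years gives about {growth_at_40_percent:.1f}x")
```

Under this assumption the two McKinsey figures are roughly consistent with each other, and the Wikibon forecast corresponds to a much steeper annual rate than the 10% quoted for the wider data management industry.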
  30. 30. References • www.Slideshare.com • www.wikipedia.com • www.computereducation.org • Strata + Hadoop World 2016: Hadoop and Spark in spotlight • http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data • http://www.information-management.com/news/big-data-analytics/the-top-5-trends-in-big-data-for-2017-10029956-1.html • Books: Big Data by Viktor Mayer-Schonberger
