Data Mining With Big Data

  1. Submitted By / Supervised By
  2. Problem Definition; Purpose; What is ….; Challenges with data; Big data algorithms; How To Produce The Big Data; Big Data Characteristics; Applications of Data Mining; Field of Big Data; Variety (Complexity); Real-time/Fast Data; Real-Time Analytics/Decision Requirement; A Single View to the Customer; What’s driving Big Data; Benefits
  3. Big Data consists of large, complex, growing data sets with numerous independent sources. With the fast development of networking, data storage, and data gathering capacity, Big Data is now growing quickly in all science and engineering domains, including the animal, genetic and biomedical sciences. This paper elaborates a HACE theorem that states the characteristics of the Big Data revolution, and proposes a Big Data processing model from the data mining view.
  4. This requires carefully designed algorithms to analyze model correlations between distributed sites and to fuse decisions from multiple sources into the best possible model of the Big Data. Developing a safe and sound information-sharing protocol is a major challenge. Supporting Big Data mining requires high-performance computing platforms, which in turn call for systematic designs to unleash the full power of the Big Data. Big Data is an emerging trend, and the need for Big Data mining is rising in all science and engineering domains.
  5. What is …… ? • Data Mining: the computational process of discovering patterns in large data sets. • Big Data: data characterized by three attributes: volume, variety and velocity. It is the term for a collection of data sets so large and complex that it becomes difficult to process. The data, both structured and unstructured, grows exponentially. • Data: any set of characters that has been gathered and translated for some purpose, usually analysis. It can be any character, including text and numbers, pictures, sound, or video. If data is not put into context, it means nothing to a human or a computer.
  6. How much data exists? • 2.5 quintillion bytes of data are created every day • IBM: 90 percent of the data in the world today was produced within the past two years • Forms of data?
  7. Data Mining Challenges with Big Data • Big Data Mining Platform • Big Data Semantics and Application Knowledge: I. Information Sharing and Data Privacy II. Domain and Application Knowledge • Big Data Mining Algorithm: I. Local Learning and Model Fusion for Multiple Information Sources II. Mining from Sparse, Uncertain, and Incomplete Data III. Mining Complex and Dynamic Data
  8. Data Mining Algorithm • Decision tree induction classification algorithms • Evolutionary based classification algorithms • Partitioning based clustering algorithms • Hierarchical based clustering algorithms • Model based clustering algorithms
  9. How To Produce The Big Data • Big Data Types: Enterprise Data, Transactions, Public Data, Social Media, Sensor Data
  10. Big Data Characteristics • Data has grown tremendously. • Big Data starts with large-volume, heterogeneous, autonomous sources with distributed and decentralized systems.
  11. Applications of Data Mining • Marketing: analysis of consumer behavior; advertising campaigns; targeted mailings • Finance: creditworthiness of clients; performance analysis of financial investments • Manufacturing: optimization of resources; optimization of manufacturing processes
  12. Variety (Complexity) • Relational Data (Tables/Transaction/Legacy Data) • Text Data (Web) • Semi-structured Data (XML) • Graph Data: Social Network, Semantic Web (RDF), … • Streaming Data: you can only scan the data once • A single application can be generating/collecting many types of data • Big Public Data (online, weather, finance, etc.) To extract knowledge, all these types of data need to be linked together.
  13. Real-time/Fast Data • Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion. • Social media and networks (all of us are generating data) • Scientific instruments (collecting all sorts of data) • Mobile devices (tracking all objects all the time) • Sensor technology and networks (measuring all kinds of data)
  14. Real-Time Analytics/Decision Requirement • Customer • Influence Behavior • Product Recommendations that are Relevant & Compelling • Friend Invitations to join a Game or Activity that expands business • Preventing Fraud as it is Occurring & preventing more proactively • Learning why Customers Switch to competitors and their offers, in time to Counter • Improving the Marketing Effectiveness of a Promotion while it is still in Play
  15. A Single View to the Customer • Customer: Social Media, Gaming, Entertain, Banking, Finance, Our Known History, Purchase
  16. 5 Vs of Big Data • Volume: data quantity • Velocity: data speed • Variety: data types • Veracity: authenticity • Value: statistical, events
  17. What’s driving Big Data - Ad-hoc querying and reporting - Data mining techniques - Structured data, typical sources - Small to mid-size datasets - Optimizations and predictive analytics - Complex statistical analysis - All types of data, and many sources - Very large datasets - More of a real-time
  18. Benefits • Cost & management: economies of scale, “out-sourced” resource management • Reduced time to deployment: ease of assembly, works “out of the box” • Scaling: on-demand provisioning, co-locate data and compute • Reliability: massive, redundant, shared resources • Sustainability: hardware not owned
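
Slide 12 notes that streaming data can only be scanned once. A minimal sketch of what that constraint means in practice, assuming Python and a made-up stream of sensor readings: Welford's online algorithm (a standard one-pass technique, not something the slides themselves name) keeps a running mean and variance without ever storing the stream. The names `RunningStats` and `summarize` and the sample values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Mean and variance maintained in a single pass (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Each value is seen exactly once and then discarded.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 here for fewer than two values.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def summarize(stream: Iterable[float]) -> RunningStats:
    """Consume the stream once, keeping only three numbers in memory."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats


# Hypothetical sensor readings; a generator, so it cannot be re-read.
readings = (float(v) for v in [2, 4, 4, 4, 5, 5, 7, 9])
stats = summarize(readings)
print(stats.count, stats.mean, stats.variance)
```

Memory use stays constant no matter how long the stream runs, which is the property a single-scan setting needs. Statistics that require revisiting the data, such as an exact median, do not fit this pattern and usually call for approximation.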

Editor's notes

  1. IBM