3. Problem Definition:
Big Data consist of large-volume, complex, growing data sets with
numerous independent sources. With the fast development of
networking, data storage, and data gathering capacity, Big Data are
now quickly expanding in all science and engineering domains, including the
animal, genetic, and biomedical sciences. This paper elaborates a HACE
theorem that characterizes the features of the Big Data revolution, and
proposes a Big Data processing model from the data mining perspective.
4. Objective:
Mining Big Data requires carefully designed algorithms to analyze model correlations
between distributed sites and to fuse decisions from multiple sources to obtain the best
model from the Big Data. Developing a safe and sound information sharing
protocol is a major challenge.
To support Big Data mining, high-performance computing platforms are
required, which impose systematic designs to unleash the full power of the Big
Data. Big Data is an emerging trend, and the need for Big Data mining is rising in
all science and engineering domains.
5. Literature Survey
Title/Year: “Data Mining With Big Data”, Jan 2014
Keywords: Big Data, data mining, heterogeneity, autonomous sources, complex and evolving associations.
Concept/Abstract: This paper presents a HACE theorem that characterizes the features of the Big Data revolution and proposes a processing model from the data mining perspective.
Author: Xindong Wu, Fellow, IEEE, Xingquan Zhu, Senior Member, IEEE, Gong-Qing Wu, and Wei Ding

Title/Year: “The Survey of Data Mining Applications And Feature Scope”, June 2012
Keywords: Data mining task, data mining life cycle, visualization of the data mining model, data mining methods, data mining applications.
Concept/Abstract: This paper presents a number of data mining applications and also focuses on the scope of data mining, which will be helpful in further research.
Author: Neelamadhab Padhy, Dr. Pragnyaban Mishra, and Rasmita Panigrahi

Title/Year: “Review on Data Mining with Big Data”, Dec 2014
Keywords: Big Data, data mining, heterogeneity, autonomous sources, complex and evolving associations.
Concept/Abstract: This data-driven model involves demand-driven aggregation of information sources, mining and analysis, and security and privacy considerations.
Author: Savita Suryavanshi, Prof. Bharati Kale

Title/Year: “SURVEY ON BIG DATA MINING PLATFORMS, ALGORITHMS AND CHALLENGES”, Sep 2014
Keywords: big data, big data mining platforms, big data mining algorithms, big data mining challenges, data mining.
Concept/Abstract: This paper reviews various big data mining platforms and algorithms, and also discusses the associated challenges.
Author: Sherin A, Dr. S Uma, Saranya K, Saranya Vani M
7. Data Mining Algorithm
Decision tree induction classification algorithms
Evolutionary based classification algorithms
Partitioning based clustering algorithms
Hierarchical based clustering algorithms
Model based clustering algorithms
8. Existing System:
Big Data applications have emerged in which data collection has grown
tremendously and is beyond the ability of commonly used software tools to capture,
manage, and process within a “tolerable elapsed time.”
The most fundamental challenge for Big Data applications is to explore the large
volumes of data and extract useful information or knowledge for future actions.
In many situations, the knowledge extraction process has to be very efficient and
close to real time because storing all observed data is nearly infeasible.
The unprecedented data volumes require an effective data analysis and prediction
platform to achieve fast response and real-time classification for such Big Data.
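One way to work with a stream that cannot be stored in full is to keep only a small, fixed-size random sample of it. The sketch below uses reservoir sampling (Algorithm R) as an illustration; it is not a technique named in these slides, and the function name, the `seed` parameter, and the stand-in sensor stream are assumptions made for the example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream of unknown length.

    Memory use stays at k items no matter how long the stream is.
    """
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Pick a slot in [0, i]; the new item is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


# Hypothetical stand-in for a sensor stream that is consumed once and never stored.
readings = (x * 0.5 for x in range(10_000))
kept = reservoir_sample(readings, k=5, seed=42)
```

The sample can then feed any downstream analysis, at the cost of working with an approximation of the full stream.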
9. At the model level, each site produces local patterns by mining its
local data.
By sharing these local patterns with other local sites, we can produce a single
global pattern.
At the knowledge level, model correlation analysis investigates the relevance
between models generated from various data sources to determine how
the data sources are correlated with each other, and how to form accurate decisions
based on models built from autonomous sources.
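The idea of combining local patterns into one global pattern can be sketched with a deliberately simple stand-in: each site counts how often items occur in its own transactions and shares only those counts, never the raw data. This is an illustrative assumption, not the fusion method of the surveyed paper; `mine_local`, `fuse_global`, the `min_support` threshold, and the sample sites are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Transaction = Set[str]
# A "local pattern" here is just (item counts, number of transactions at the site).
LocalPattern = Tuple[Counter, int]


def mine_local(transactions: List[Transaction]) -> LocalPattern:
    """Mine one site's data: count in how many transactions each item appears."""
    counts: Counter = Counter()
    for t in transactions:
        counts.update(t)  # t is a set, so each item counts once per transaction
    return counts, len(transactions)


def fuse_global(patterns: List[LocalPattern], min_support: float) -> Dict[str, float]:
    """Merge shared local patterns and keep items whose global support is high enough."""
    total_counts: Counter = Counter()
    total_n = 0
    for counts, n in patterns:
        total_counts += counts
        total_n += n
    if total_n == 0:
        return {}
    return {
        item: c / total_n
        for item, c in total_counts.items()
        if c / total_n >= min_support
    }


# Two hypothetical sites; only the summaries from mine_local() leave each site.
site_a = [{"milk", "bread"}, {"milk"}, {"bread", "eggs"}]
site_b = [{"milk", "eggs"}, {"milk", "bread"}]
global_pattern = fuse_global([mine_local(site_a), mine_local(site_b)], min_support=0.5)
```

Here "milk" and "bread" pass the threshold across both sites together, while "eggs" does not, even though no site ever sees another site's transactions.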
10. Big Data
Big Data is a comprehensive term for any collection of data sets so large and complex
that it becomes difficult to process them using conventional data processing applications.
There are two types of Big Data: structured and unstructured.
Structured data
Structured data are numbers and words that can be easily categorized and analyzed.
These data are generated by things like network sensors embedded in electronic
devices, smart phones, and global positioning system (GPS) devices. Structured data
also include things like sales figures, account balances, and transaction data.
Unstructured data
Unstructured data include more complex information, such as customer reviews
from commercial websites, photos and other multimedia, and comments on social
networking sites. These data cannot be separated into categories or analyzed
numerically.
11. Big Data Characteristics (HACE Theorem)
Figure: The blind men and the enormous elephant: the restricted view
of each blind man leads to a biased conclusion.
12. The HACE theorem suggests that the key characteristics of
Big Data are:
A. Huge, with heterogeneous and diverse data sources
B. Autonomous sources with distributed and decentralized control
C. Complex and evolving associations
13. Applications of Data Mining
Marketing
Analysis of consumer behaviour
Advertising campaigns
Targeted mailings
Segmentation of customers, stores, or products
Finance
Creditworthiness of clients
Performance analysis of financial investments
Fraud detection
Manufacturing
Optimization of resources
Optimization of manufacturing processes
Product design based on customer requirements
Health Care
Discovering patterns in X-ray images
Analyzing side effects of drugs
Effectiveness of treatments
14. Big Data Mining Algorithm
Big Data applications gather information from many sources.
To mine the data centrally, we would need to gather all distributed data at a
centralized site, but this is prohibitive because of high data transmission costs
and privacy concerns.
Model mining and correlation analysis are the key steps that allow patterns
discovered from a variety of sources to be combined.
Global data mining is done through a two-step process:
Model level
Knowledge level
At the data level, each local site uses its local data to calculate data statistics
and shares these statistics with the other sites to obtain a global view of the
data distribution.
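The data-level exchange of statistics can be illustrated with a small sketch in which each site shares only three numbers (count, sum, and sum of squares) and the global mean and variance are derived from them. The `SiteStats` class, the function names, and the sample values are assumptions for illustration; the sum-of-squares formula is the simplest choice and can lose precision on very large or very similar values.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class SiteStats:
    """Summary a site shares instead of its raw data."""
    n: int
    total: float
    total_sq: float


def local_stats(values: Iterable[float]) -> SiteStats:
    """Compute the summary in a single pass over one site's local data."""
    n, total, total_sq = 0, 0.0, 0.0
    for v in values:
        n += 1
        total += v
        total_sq += v * v
    return SiteStats(n, total, total_sq)


def global_mean_variance(stats: List[SiteStats]) -> Tuple[float, float]:
    """Combine site summaries into the global mean and population variance."""
    n = sum(s.n for s in stats)
    if n == 0:
        raise ValueError("no data at any site")
    total = sum(s.total for s in stats)
    total_sq = sum(s.total_sq for s in stats)
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, max(variance, 0.0)  # clamp tiny negative rounding errors


# Two hypothetical sites; only SiteStats objects are exchanged.
site_1 = [2.0, 4.0, 4.0]
site_2 = [4.0, 5.0, 5.0, 7.0, 9.0]
mean, var = global_mean_variance([local_stats(site_1), local_stats(site_2)])
```

The result matches what a centralized computation over all eight values would give, while the transmission cost per site is constant.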
15. Data Mining Challenges With Big Data
Fig. A conceptual view of the Big Data processing framework
16. DISADVANTAGES OF EXISTING SYSTEM
To explore Big Data, we have analysed several challenges at the
data, model, and system levels.
The challenges at Tier I focus on data accessing and arithmetic
computing procedures. Because Big Data are often stored at
different locations and data volumes may continuously grow, an
effective computing platform will have to take distributed large-scale
data storage into consideration for computing.
17. PROPOSED SYSTEM
We propose a HACE theorem to model Big Data characteristics. The
HACE characteristics make discovering useful knowledge from
Big Data an extreme challenge.
18. ADVANTAGES OF PROPOSED SYSTEM
Provides the most relevant and accurate social sensing feedback to
better understand our society in real time.
21. Volume - the quantity of data
Variety - the categories of data
Velocity - the speed at which data are generated or processed
Variability - inconsistency in the data
Complexity - the difficulty of managing the data
22. Big Data Mining Tools
Hadoop
Apache S4
Storm
Apache Mahout
MOA
24. Conclusion:
Because of the increase in the amount of data in fields such as genomics,
meteorology, biology, and environmental research, it becomes difficult to handle
the data, to find associations and patterns, and to analyze the large data sets.
As an organization collects more data at this scale, formalizing the process of
big data analysis will become paramount. The paper describes
different algorithms used to handle such large data sets and gives an
overview of the architecture and algorithms used for large data sets.
25. References
McKinsey Global Institute, May 2011, Big Data: The next frontier for
innovation, competition and productivity
Xindong Wu, Xingquan Zhu, Gong-Qing Wu, Wei Ding, 2013,
Data Mining with Big Data
Rezwan Ahmed, George Karypis, 2012,
Algorithms for mining the evolution of conserved relational states in
dynamic networks
IEEE, January 2014, Data Mining with Big Data
Oracle, June 2013, Unstructured Data Management with Oracle
Database 12c