SlideShare una empresa de Scribd logo
1 de 11
Descargar para leer sin conexión
International Journal of Modern Engineering Research (IJMER)
www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645
www.ijmer.com 1861 | Page
B. Susrutha1
, J. Vamsi Nath2
, T. Bharath Manohar3
, I. Shalini4
1,4
M. Tech 2ndyr,Dept of CSE,PBRVITS(Affiliated to JNTU Kakinada), Kavali,Nellore.Andhra Pradesh.India.
2
Associate Professor, Dept of CSE, PBRVITS(Affiliated to JNTU Kakinada), Kavali, Nellore, Andhra Pradesh. India.
3
Asst. Professor,Dept of CSE, CMR College of Engineering &Technology,(Affiliated to JNTU Hyderabad) Hyderabad.
Andhra Pradesh. India.
Abstract: Preparing a data set for analysis is generally the most time consuming task in a data mining project, requiring
many complex SQL queries, joining tables and aggregating columns. Existing SQL aggregations have limitations to prepare
data sets because they return one column per aggregated group. In general, a significant manual effort is required to build
data sets, where a horizontal layout is required. We propose simple, yet powerful, methods to generate SQL code to return
aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new
class of functions is called horizontal aggregations. Horizontal aggregations build data sets with a horizontal denormalized
layout (e.g. point-dimension, observation-variable, instance-feature), which is the standard layout required by most data
mining algorithms. We propose three fundamental methods to evaluate horizontal aggregations: CASE: Exploiting the
programming CASE construct; SPJ: Based on standard relational algebra operators (SPJ queries); PIVOT: Using the
PIVOT operator, which is offered by some DBMSs. Experiments with large tables compare the proposed query evaluation
methods. Our CASE method has similar speed to the PIVOT operator and it is much faster than the SPJ method. In general,
the CASE and PIVOT methods exhibit linear scalability, where as the SPJ method does not.
Keywords: Aggregation, Data Preparation, Pivoting, SQL.
I. INTRODUCTION
In a relational database, especially with normalized tables, a significant effort is required to prepare a summary data
set that can be used as input for a data mining or statistical algorithm. Most algorithms require as input a data set with a
horizontal layout, with several Records and one variable or dimension per column. That is the case with models like
clustering, classification, regression and PCA; consult. Each research discipline uses different terminology to describe the
data set. In data mining the common terms are point-dimension. Statistics literature generally uses observation-variable.
Machine learning research uses instance-feature. This article introduces a new class of aggregate functions that can be used
to build data sets in a horizontal layout (denormalized with aggregations), automating SQL query writing and extending SQL
capabilities. We show evaluating horizontal aggregations is a challenging and interesting problem and we introduced
alternative methods and optimizations for their efficient evaluation.
II. MOTIVATION
As mentioned above, building a suitable data set for data mining purposes is a time- consuming task. This task
generally requires writing long SQL statements or customizing SQL Code if it is automatically generated by some tool.
There are two main ingredients in such SQL code: joins and aggregations; we focus on the second one. The most widely-
known aggregation is the sum of a column over groups of rows. Some other aggregations return the average, maximum,
minimum or row count over groups of rows. There exist many aggregations functions and operators in SQL. Unfortunately,
all these aggregations have limitations to build data sets for data mining purposes.
The main reason is that, in general, data sets that are stored in a relational database (or a data warehouse) come
from On-Line Transaction Processing (OLTP) systems where database schemas are highly normalized. But data mining,
statistical or machine learning algorithms generally require aggregated data in summarized form. Based on current available
functions and clauses in SQL, a significant effort is required to compute aggregations when they are desired in a cross
tabular (Horizontal) form, suitable to be used by a data mining algorithm. Such effort is due to the amount and complexity of
SQL code that needs to be written, optimized and tested. There are further practical reasons to return aggregation results in a
horizontal (cross-tabular) layout. Standard aggregations are hard to interpret when there are many result rows, especially
when grouping attributes have high cardinalities. To perform analysis of exported tables into spreadsheets it may be more
convenient to have aggregations on the same group in one row (e.g. to produce graphs or to compare data sets with repetitive
information).
OLAP tools generate SQL code to transpose results (sometimes called PIVOT). Transposition can be more efficient
if there are mechanisms combining aggregation and transposition together. With such limitations in mind, we propose a new
class of aggregate functions that aggregate numeric expressions and transpose results to produce a data set with a horizontal
layout. Functions belonging to this class are called horizontal aggregations. Horizontal aggregations represent an extended
form of traditional SQL aggregations, which return a set of values in a horizontal layout (somewhat similar to a
multidimensional vector), instead of a single value per row. This article explains how to evaluate and optimize horizontal
aggregations generating standard SQL code.
Hortizontal Aggregation in SQL for Data Mining Analysis to
Prepare Data Sets
International Journal of Modern Engineering Research (IJMER)
www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645
www.ijmer.com 1862 | Page
III. LITERATURE SURVEY
We have to analysis the DATA MINING Outline Survey:
Data Mining
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from
different perspectives and summarizing it into useful information - information that can be used to increase revenue, cuts
costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data
from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data
mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
The Scope of Data Mining
Data mining derives its name from the similarities between searching for valuable business information in a
large database — for example, finding linked products in gigabytes of store scanner data — and mining a mountain for a
vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it
to find exactly where the value resides. Given databases of sufficient size and quality, data mining technology can generate
new business opportunities by providing these capabilities:
 Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in
large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the
data — quickly. A typical example of a predictive problem is targeted marketing. Data mining uses data on past
promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other
predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population
likely to respond similarly to given events.
 Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify
previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify
seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting
fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.
The most commonly used techniques in data mining are:
 Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural
networks in structure.
 Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the
classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi
Square Automatic Interaction Detection (CHAID) .
 Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural
selection in a design based on the concepts of evolution.
 Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of
the k record(s) most similar to it in a historical dataset (where k ³ 1). Sometimes called the k-nearest neighbor technique.
 Rule induction: The extraction of useful if-then rules from data based on statistical significance.
An Architecture for Data Mining
To best apply these advanced techniques, they must be fully integrated with a data warehouse as well as flexible
interactive business analysis tools. Many data mining tools currently operate outside of the warehouse, requiring extra steps
for extracting, importing, and analyzing the data. Furthermore, when new insights require operational implementation,
integration with the warehouse simplifies the application of results from data mining. The resulting analytic data warehouse
can be applied to improve business processes throughout the organization, in areas such as promotional campaign
management, fraud detection, new product rollout, and so on.
Figure 1: Integrated Data Mining Architecture
International Journal of Modern Engineering Research (IJMER)
www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645
www.ijmer.com 1863 | Page
The Ideal starting point is a data warehouse containing a combination of internal data tracking all customer contact
coupled with external market data about competitor activity. Background information on potential customers also provides
an excellent basis for prospecting. This warehouse can be implemented in a variety of relational database systems: Sybase,
Oracle, Redbrick, and so on, and should be optimized for flexible and fast data access.
Data Mining Products:
Data mining products are taking the industry by storm. The major database vendors have already taken steps to
ensure that their platforms incorporate data mining techniques. Oracle's Data Mining Suite (Darwin) implements
classification and regression trees, neural networks, k-nearest neighbors, regression analysis and clustering algorithms.
Microsoft's SQL Server also offers data mining functionality through the use of classification trees and clustering algorithms.
If you're already working in a statistics environment, you're probably familiar with the data mining algorithm
implementations offered by the advanced statistical packages SPSS, SAS, and S-Plus.
Conclusion
Comprehensive data warehouses that integrate operational data with customer, supplier, and market information
have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of
the data. However, there is a growing gap between more powerful storage and retrieval systems and the users’ ability to
effectively analyze and act on the information they contain. Both relational and OLAP technologies have tremendous
capabilities for navigating massive data warehouses, but brute force navigation of data is not enough. A new technological
leap is needed to structure and prioritize information for specific end-user problems. The data mining tools can make this
leap. Quantifiable business benefits have been proven through the integration of data mining with current information
systems, and new products are on the horizon that will bring this integration to an even wider audience of users.
IV. SYSTEM ANALYSIS
Existing System:
An existing to preparing a data set for analysis is generally the most time consuming task in a data mining project,
requiring many complex SQL queries, joining tables and aggregating columns. Existing SQL aggregations have limitations
to prepare data sets because they return one column per aggregated group.
Disadvantage:
1)Existing SQL aggregations have limitations to prepare data sets.
2)To return one column per aggregated group
Previous Process Flow:
Fig: Previous Process Flow
Proposed System:
Our proposed horizontal aggregations provide several unique features and advantages. First, they represent a
template to generate SQL code from a data mining tool. Such SQL code automates writing SQL queries, optimizing them
and testing them for correctness.
Advantage:
1) The SQL code reduces manual work in the data preparation phase in a data mining project.
2) The SQL code is automatically generated it is likely to be more efficient than SQL code written by an end user.
3) The data sets can be created in less time.
4) The data set can be created entirely inside the DBMS.
International Journal of Modern Engineering Research (IJMER)
www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645
www.ijmer.com 1864 | Page
Proposed Process Flow:
Fig: Proposed Process Flow
Modules Description:
1. Admin Module
2. User Module
3. View Module
4. Download Module
Module 1 : Admin Module
Admin will upload new connection form based on regulations in various states. Admin will be able upload various
details regarding user bills like a new connection to a new user, amount paid or payable by user. In case of payment various
details regarding payment will be entered and separate username and password will be provided to users in large.
Module 2 : User Module
User will be able to view his bill details on any date may be after a month or after months or years and also he can
to view the our bill details in a various ways for instance, The year wise bills, Month wise bills, totally paid to bill in EB.
This will reduce the cost of transaction. If user thinks that his password is insecure, he has option to change it. He also can
view the registration details and allowed to change or edit and save it.
Module 3 : View Module
Admin has three ways to view the user bill details, the 3 ways are
i) SPJ
ii) PIVOT
iii) CASE
i) SPJ : While using SPJ the viewing and processing time of user bills is reduced.
ii) PIVOT : This is used to draw the user details in a customized table. This table will elaborate us on the various bill
details regarding the user on monthly basis.
iii) CASE : using CASE query we can customize the present table and column based on the conditions. This will help
us to reduce enormous amount of space used by various user bill details. It can be viewed in two difference
ways namely Horizontal and Vertical.
In case of vertical the number of rows will be reduced to such an extent it is needed and column will remain the same on
other hand the Horizontal will reduce rows as same as vertical and will also increase the columnar format.
Module 4 : Download Module
User will be able to download the various details regarding bills. If he/she is a new user, he/she can download the
new connection form, subscription details etc. then he/she can download his /her previous bill details in hands so as to ensure
it.
International Journal of Modern Engineering Research (IJMER)
www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645
www.ijmer.com 1865 | Page
V. SYSTEM DESIGN & ARCHITECTURE DIAGRAMS
Data Flow Diagram / Use Case Diagram / Flow Diagram
The DFD is also called as bubble chart. It is a simple graphical formalism that can be used to represent a system in
terms of the input data to the system, various processing carried out on these data, and the output data is generated by the
system.
Class Diagram:
International Journal of Modern Engineering Research (IJMER)
www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645
www.ijmer.com 1866 | Page
Data Flow Diagram:
International Journal of Modern Engineering Research (IJMER)
www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645
www.ijmer.com 1867 | Page
Activity Diagram:
International Journal of Modern Engineering Research (IJMER)
www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645
www.ijmer.com 1868 | Page
Use case Diagram:
International Journal of Modern Engineering Research (IJMER)
www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645
www.ijmer.com 1869 | Page
Sequence Diagram:
VI. EXPERIMENT & RESULT
We performed several experiments to test the proposed algorithm and evaluate its performance against different
attacks and experienced various results as follows:
International Journal of Modern Engineering Research (IJMER)
www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645
www.ijmer.com 1870 | Page
Fig: Horizontal Aggregations in SQL
Fig: Horizontal view results
Fig: Results obtained processing the records in Database
International Journal of Modern Engineering Research (IJMER)
www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645
www.ijmer.com 1871 | Page
VII. CONCLUSION
We introduced a new class of extended aggregate functions, called horizontal aggregations which help preparing
data sets for data mining and OLAP cube exploration. Specifically, horizontal aggregations are useful to create data sets with
a horizontal layout, as commonly required by data mining algorithms and OLAP cross-tabulation. Basically, a horizontal
aggregation returns a set of numbers instead of a single number for each group, resembling a multi-dimensional vector.
We proposed an abstract, but minimal, extension to SQL standard aggregate functions to compute horizontal
aggregations which just requires specifying subgrouping columns inside the aggregation function call. From a query
optimization perspective, we proposed three query evaluation methods. The first one (SPJ) relies on standard relational
operators. The second one (CASE) relies on the SQL CASE construct. The third (PIVOT) uses a built-in operator in a
commercial DBMS that is not widely available. The SPJ method is important from a theoretical point of view because it is
based on select, project and join (SPJ) queries. The CASE method is our most important contribution. It is in general the
most efficient evaluation method and it has wide applicability since it can be programmed combining GROUP-BY and
CASE statements. We proved the three methods produce the same result.
We have explained it is not possible to evaluate horizontal aggregations using standard SQL without either joins or
”case” constructs using standard SQL operators. Our proposed horizontal aggregations can be used as a database method to
automatically generate efficient SQL queries with three sets of parameters: grouping columns, subgrouping columns and
aggregated column. The fact that the output horizontal columns are not available when the query is parsed (when the query
plan is explored and chosen) makes its evaluation through standard SQL mechanisms infeasible. Our experiments with large
tables show our proposed horizontal aggregations evaluated with the CASE method have similar performance to the built-in
PIVOT operator. We believe this is remarkable since our proposal is based on generating SQL code and not on internally
modifying the query optimizer. Both CASE and PIVOT evaluation methods are significantly faster than the SPJ method.
Precomputing a cube on selected dimensions produced acceleration on all methods.
ACKNOWLEDGMENT
I would like to express my sincere thanks to my Guide and my Co-Authors for their consistence support and valuable
suggestions.
REFERENCES
[1] G. Bhargava, P. Goel, and B.R. Iyer. Hypergraph based reordering of outer join queries with complex predicates. In ACM
SIGMOD Conference, pages 304–315, 1995.
[2] J.A. Blakeley, V. Rao, I. Kunen, A. Prout, M. Henaire, and C. Kleinerman. .NET database programmability and extensibility in
Microsoft SQL Server. In Proc. ACM SIGMOD Conference, pages 1087–1098, 2008.
[3] J. Clear, D. Dunn, B. Harvey, M.L. Heytens, and P. Lohman. Non-stop SQL/MX primitives for knowledge discovery. In ACM
KDD Conference, pages 425–429, 1999.
[4] E.F. Codd. Extending the database relational model to capture more meaning. ACM TODS, 4(4):397–434, 1979.
[5] C. Cunningham, G. Graefe, and C.A. Galindo-Legaria. PIVOT and UNPIVOT: Optimization and execution strategies in an
RDBMS. In Proc. VLDB Conference, pages 998–1009, 2004.
[6] C. Galindo-Legaria and A. Rosenthal. Outer join simplification and reordering for query optimization. ACM TODS, 22(1):43–73,
1997.
[7] H. Garcia-Molina, J.D. Ullman, and J. Widom. Database Systems: The Complete Book. Prentice Hall, 1st edition, 2001.
[8] G. Graefe, U. Fayyad, and S. Chaudhuri. On the efficient gathering of sufficient statistics for classification from large SQL
databases. In Proc. ACM KDD Conference, pages 204–208, 1998.
[9] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab
and subtotal. In ICDE Conference, pages 152–159, 1996.
[10] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco, 1st edition, 2001.
[11] G. Luo, J.F. Naughton, C.J. Ellmann, and M. Watzke. Locking protocols for materialized aggregate join views. IEEE Transactions
on Knowledge and Data Engineering (TKDE), 17(6):796–807, 2005.
[12] C. Ordonez. Horizontal aggregations for building tabular data sets. In Proc. ACM SIGMOD Data Mining and Knowledge
Discovery Workshop, pages 35–42, 2004.
[13] C. Ordonez. Statistical model computation with UDFs. IEEE Transactions on Knowledge and Data Engineering (TKDE), 22, 2010.
[14] C. Ordonez. Data set preprocessing and transformation in a database system. Intelligent Data Analysis (IDA), 15(4), 2011.
[15] C. Ordonez and S. Pitchaimalai.Bayesian classifiers programmed in SQL. IEEE Transactions on Knowledge and Data Engineering
(TKDE), 22(1):139–144, 2010.

Más contenido relacionado

La actualidad más candente

IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET Journal
 
A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce IJECEIAES
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET Journal
 
data generalization and summarization
data generalization and summarization data generalization and summarization
data generalization and summarization janani thirupathi
 
A Survey on Automatically Mining Facets for Queries from their Search Results
A Survey on Automatically Mining Facets for Queries from their Search ResultsA Survey on Automatically Mining Facets for Queries from their Search Results
A Survey on Automatically Mining Facets for Queries from their Search ResultsIRJET Journal
 
Multidimensioal database
Multidimensioal  databaseMultidimensioal  database
Multidimensioal databaseTPO TPO
 
Usage and Research Challenges in the Area of Frequent Pattern in Data Mining
Usage and Research Challenges in the Area of Frequent Pattern in Data MiningUsage and Research Challenges in the Area of Frequent Pattern in Data Mining
Usage and Research Challenges in the Area of Frequent Pattern in Data MiningIOSR Journals
 
Column store databases approaches and optimization techniques
Column store databases  approaches and optimization techniquesColumn store databases  approaches and optimization techniques
Column store databases approaches and optimization techniquesIJDKP
 
Variance rover system
Variance rover systemVariance rover system
Variance rover systemeSAT Journals
 
Variance rover system web analytics tool using data
Variance rover system web analytics tool using dataVariance rover system web analytics tool using data
Variance rover system web analytics tool using dataeSAT Publishing House
 
COMPARATIVE STUDY OF DISTRIBUTED FREQUENT PATTERN MINING ALGORITHMS FOR BIG S...
COMPARATIVE STUDY OF DISTRIBUTED FREQUENT PATTERN MINING ALGORITHMS FOR BIG S...COMPARATIVE STUDY OF DISTRIBUTED FREQUENT PATTERN MINING ALGORITHMS FOR BIG S...
COMPARATIVE STUDY OF DISTRIBUTED FREQUENT PATTERN MINING ALGORITHMS FOR BIG S...IAEME Publication
 
Data warehousing and online analytical processing
Data warehousing and online analytical processingData warehousing and online analytical processing
Data warehousing and online analytical processingVijayasankariS
 
Data Warehouse By Piyush
Data Warehouse By PiyushData Warehouse By Piyush
Data Warehouse By Piyushastronish
 

La actualidad más candente (19)

F04463437
F04463437F04463437
F04463437
 
Bo4301369372
Bo4301369372Bo4301369372
Bo4301369372
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
 
data generalization and summarization
data generalization and summarization data generalization and summarization
data generalization and summarization
 
A Survey on Automatically Mining Facets for Queries from their Search Results
A Survey on Automatically Mining Facets for Queries from their Search ResultsA Survey on Automatically Mining Facets for Queries from their Search Results
A Survey on Automatically Mining Facets for Queries from their Search Results
 
Multidimensioal database
Multidimensioal  databaseMultidimensioal  database
Multidimensioal database
 
Usage and Research Challenges in the Area of Frequent Pattern in Data Mining
Usage and Research Challenges in the Area of Frequent Pattern in Data MiningUsage and Research Challenges in the Area of Frequent Pattern in Data Mining
Usage and Research Challenges in the Area of Frequent Pattern in Data Mining
 
Column store databases approaches and optimization techniques
Column store databases  approaches and optimization techniquesColumn store databases  approaches and optimization techniques
Column store databases approaches and optimization techniques
 
Variance rover system
Variance rover systemVariance rover system
Variance rover system
 
Variance rover system web analytics tool using data
Variance rover system web analytics tool using dataVariance rover system web analytics tool using data
Variance rover system web analytics tool using data
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Infos2014
Infos2014Infos2014
Infos2014
 
COMPARATIVE STUDY OF DISTRIBUTED FREQUENT PATTERN MINING ALGORITHMS FOR BIG S...
COMPARATIVE STUDY OF DISTRIBUTED FREQUENT PATTERN MINING ALGORITHMS FOR BIG S...COMPARATIVE STUDY OF DISTRIBUTED FREQUENT PATTERN MINING ALGORITHMS FOR BIG S...
COMPARATIVE STUDY OF DISTRIBUTED FREQUENT PATTERN MINING ALGORITHMS FOR BIG S...
 
Data mininng trends
Data mininng trendsData mininng trends
Data mininng trends
 
Data warehousing and online analytical processing
Data warehousing and online analytical processingData warehousing and online analytical processing
Data warehousing and online analytical processing
 
Data Warehouse By Piyush
Data Warehouse By PiyushData Warehouse By Piyush
Data Warehouse By Piyush
 

Destacado

Credit Card Fraud Detection System: A Survey
Credit Card Fraud Detection System: A SurveyCredit Card Fraud Detection System: A Survey
Credit Card Fraud Detection System: A SurveyIJMER
 
User friendly pattern search paradigm
User friendly pattern search paradigmUser friendly pattern search paradigm
User friendly pattern search paradigmMigrant Systems
 
A SECURE AND DYNAMIC MULTI-KEYWORD RANKED SEARCH SCHEME OVER ENCRYPTED CLOUD...
 A SECURE AND DYNAMIC MULTI-KEYWORD RANKED SEARCH SCHEME OVER ENCRYPTED CLOUD... A SECURE AND DYNAMIC MULTI-KEYWORD RANKED SEARCH SCHEME OVER ENCRYPTED CLOUD...
A SECURE AND DYNAMIC MULTI-KEYWORD RANKED SEARCH SCHEME OVER ENCRYPTED CLOUD...Nexgen Technology
 
FRAUD DETECTION IN ONLINE AUCTIONING
FRAUD DETECTION IN ONLINE AUCTIONINGFRAUD DETECTION IN ONLINE AUCTIONING
FRAUD DETECTION IN ONLINE AUCTIONINGSatish Chandra
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kambererror007
 
UML- Unified Modeling Language
UML- Unified Modeling LanguageUML- Unified Modeling Language
UML- Unified Modeling LanguageShahzad
 
Event Management System Document
Event Management System Document Event Management System Document
Event Management System Document LJ PROJECTS
 
Bj32809815
Bj32809815Bj32809815
Bj32809815IJMER
 
Bo2422422286
Bo2422422286Bo2422422286
Bo2422422286IJMER
 
Bv32891897
Bv32891897Bv32891897
Bv32891897IJMER
 
Strehl Ratio with Higher-Order Parabolic Filter
Strehl Ratio with Higher-Order Parabolic FilterStrehl Ratio with Higher-Order Parabolic Filter
Strehl Ratio with Higher-Order Parabolic FilterIJMER
 
Dx2645324539
Dx2645324539Dx2645324539
Dx2645324539IJMER
 
Review of Intrusion and Anomaly Detection Techniques
Review of Intrusion and Anomaly Detection Techniques Review of Intrusion and Anomaly Detection Techniques
Review of Intrusion and Anomaly Detection Techniques IJMER
 
Optical and Impedance Spectroscopy Study of ZnS Nanoparticles
Optical and Impedance Spectroscopy Study of ZnS NanoparticlesOptical and Impedance Spectroscopy Study of ZnS Nanoparticles
Optical and Impedance Spectroscopy Study of ZnS NanoparticlesIJMER
 
Ck32985989
Ck32985989Ck32985989
Ck32985989IJMER
 
Modeling and Reduction of Root Fillet Stress in Spur Gear Using Stress Relie...
Modeling and Reduction of Root Fillet Stress in Spur Gear Using  Stress Relie...Modeling and Reduction of Root Fillet Stress in Spur Gear Using  Stress Relie...
Modeling and Reduction of Root Fillet Stress in Spur Gear Using Stress Relie...IJMER
 
A Comparative Study of Low Cost Solar Based Lighting System and Fuel Based Li...
A Comparative Study of Low Cost Solar Based Lighting System and Fuel Based Li...A Comparative Study of Low Cost Solar Based Lighting System and Fuel Based Li...
A Comparative Study of Low Cost Solar Based Lighting System and Fuel Based Li...IJMER
 
Scope of Improving Energy Utilization in Coal Based Co-Generation on Thermal ...
Scope of Improving Energy Utilization in Coal Based Co-Generation on Thermal ...Scope of Improving Energy Utilization in Coal Based Co-Generation on Thermal ...
Scope of Improving Energy Utilization in Coal Based Co-Generation on Thermal ...IJMER
 

Destacado (20)

Credit Card Fraud Detection System: A Survey
Credit Card Fraud Detection System: A SurveyCredit Card Fraud Detection System: A Survey
Credit Card Fraud Detection System: A Survey
 
User friendly pattern search paradigm
User friendly pattern search paradigmUser friendly pattern search paradigm
User friendly pattern search paradigm
 
A SECURE AND DYNAMIC MULTI-KEYWORD RANKED SEARCH SCHEME OVER ENCRYPTED CLOUD...
 A SECURE AND DYNAMIC MULTI-KEYWORD RANKED SEARCH SCHEME OVER ENCRYPTED CLOUD... A SECURE AND DYNAMIC MULTI-KEYWORD RANKED SEARCH SCHEME OVER ENCRYPTED CLOUD...
A SECURE AND DYNAMIC MULTI-KEYWORD RANKED SEARCH SCHEME OVER ENCRYPTED CLOUD...
 
FRAUD DETECTION IN ONLINE AUCTIONING
FRAUD DETECTION IN ONLINE AUCTIONINGFRAUD DETECTION IN ONLINE AUCTIONING
FRAUD DETECTION IN ONLINE AUCTIONING
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
UML- Unified Modeling Language
UML- Unified Modeling LanguageUML- Unified Modeling Language
UML- Unified Modeling Language
 
Event Management System Document
Event Management System Document Event Management System Document
Event Management System Document
 
Uml class Diagram
Uml class DiagramUml class Diagram
Uml class Diagram
 
Bj32809815
Bj32809815Bj32809815
Bj32809815
 
Bo2422422286
Bo2422422286Bo2422422286
Bo2422422286
 
Bv32891897
Bv32891897Bv32891897
Bv32891897
 
Ppt eleni
Ppt eleniPpt eleni
Ppt eleni
 
Strehl Ratio with Higher-Order Parabolic Filter
Strehl Ratio with Higher-Order Parabolic FilterStrehl Ratio with Higher-Order Parabolic Filter
Strehl Ratio with Higher-Order Parabolic Filter
 
Dx2645324539
Dx2645324539Dx2645324539
Dx2645324539
 
Review of Intrusion and Anomaly Detection Techniques
Review of Intrusion and Anomaly Detection Techniques Review of Intrusion and Anomaly Detection Techniques
Review of Intrusion and Anomaly Detection Techniques
 
Optical and Impedance Spectroscopy Study of ZnS Nanoparticles
Optical and Impedance Spectroscopy Study of ZnS NanoparticlesOptical and Impedance Spectroscopy Study of ZnS Nanoparticles
Optical and Impedance Spectroscopy Study of ZnS Nanoparticles
 
Ck32985989
Ck32985989Ck32985989
Ck32985989
 
Modeling and Reduction of Root Fillet Stress in Spur Gear Using Stress Relie...
Modeling and Reduction of Root Fillet Stress in Spur Gear Using  Stress Relie...Modeling and Reduction of Root Fillet Stress in Spur Gear Using  Stress Relie...
Modeling and Reduction of Root Fillet Stress in Spur Gear Using Stress Relie...
 
A Comparative Study of Low Cost Solar Based Lighting System and Fuel Based Li...
A Comparative Study of Low Cost Solar Based Lighting System and Fuel Based Li...A Comparative Study of Low Cost Solar Based Lighting System and Fuel Based Li...
A Comparative Study of Low Cost Solar Based Lighting System and Fuel Based Li...
 
Scope of Improving Energy Utilization in Coal Based Co-Generation on Thermal ...
Scope of Improving Energy Utilization in Coal Based Co-Generation on Thermal ...Scope of Improving Energy Utilization in Coal Based Co-Generation on Thermal ...
Scope of Improving Energy Utilization in Coal Based Co-Generation on Thermal ...
 

Similar a Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets

How Partitioning Clustering Technique For Implementing...
How Partitioning Clustering Technique For Implementing...How Partitioning Clustering Technique For Implementing...
How Partitioning Clustering Technique For Implementing...Nicolle Dammann
 
An Efficient Approach for Asymmetric Data Classification
An Efficient Approach for Asymmetric Data ClassificationAn Efficient Approach for Asymmetric Data Classification
An Efficient Approach for Asymmetric Data ClassificationAM Publications
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET Journal
 
E132833
E132833E132833
E132833irjes
 
Novel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data StreamsNovel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data StreamsIJERA Editor
 
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIESA SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIESIJCSES Journal
 
11.challenging issues of spatio temporal data mining
11.challenging issues of spatio temporal data mining11.challenging issues of spatio temporal data mining
11.challenging issues of spatio temporal data miningAlexander Decker
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...acijjournal
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...theijes
 
A Quantified Approach for large Dataset Compression in Association Mining
A Quantified Approach for large Dataset Compression in Association MiningA Quantified Approach for large Dataset Compression in Association Mining
A Quantified Approach for large Dataset Compression in Association MiningIOSR Journals
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET Journal
 
Certain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic DataminingCertain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic Dataminingijdmtaiir
 
Data Mining based on Hashing Technique
Data Mining based on Hashing TechniqueData Mining based on Hashing Technique
Data Mining based on Hashing Techniqueijtsrd
 
Data Mining & Data Warehousing Lecture Notes
Data Mining & Data Warehousing Lecture NotesData Mining & Data Warehousing Lecture Notes
Data Mining & Data Warehousing Lecture NotesFellowBuddy.com
 
Review of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering AlgorithmReview of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering AlgorithmIRJET Journal
 

Similar a Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets (20)

How Partitioning Clustering Technique For Implementing...
How Partitioning Clustering Technique For Implementing...How Partitioning Clustering Technique For Implementing...
How Partitioning Clustering Technique For Implementing...
 
An Efficient Approach for Asymmetric Data Classification
An Efficient Approach for Asymmetric Data ClassificationAn Efficient Approach for Asymmetric Data Classification
An Efficient Approach for Asymmetric Data Classification
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
 
E132833
E132833E132833
E132833
 
Unit i
Unit iUnit i
Unit i
 
Seminar Presentation
Seminar PresentationSeminar Presentation
Seminar Presentation
 
Novel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data StreamsNovel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data Streams
 
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIESA SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
 
11.challenging issues of spatio temporal data mining
11.challenging issues of spatio temporal data mining11.challenging issues of spatio temporal data mining
11.challenging issues of spatio temporal data mining
 
Z36149154
Z36149154Z36149154
Z36149154
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
 
G045033841
G045033841G045033841
G045033841
 
A Quantified Approach for large Dataset Compression in Association Mining
A Quantified Approach for large Dataset Compression in Association MiningA Quantified Approach for large Dataset Compression in Association Mining
A Quantified Approach for large Dataset Compression in Association Mining
 
mod 2.pdf
mod 2.pdfmod 2.pdf
mod 2.pdf
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
Certain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic DataminingCertain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic Datamining
 
Data Mining based on Hashing Technique
Data Mining based on Hashing TechniqueData Mining based on Hashing Technique
Data Mining based on Hashing Technique
 
Data Mining & Data Warehousing Lecture Notes
Data Mining & Data Warehousing Lecture NotesData Mining & Data Warehousing Lecture Notes
Data Mining & Data Warehousing Lecture Notes
 
Review of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering AlgorithmReview of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering Algorithm
 

Más de IJMER

A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...IJMER
 
Developing Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed DelintingDeveloping Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed DelintingIJMER
 
Study & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja FibreStudy & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja FibreIJMER
 
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)IJMER
 
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...IJMER
 
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...IJMER
 
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...IJMER
 
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...IJMER
 
Static Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works SimulationStatic Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works SimulationIJMER
 
High Speed Effortless Bicycle
High Speed Effortless BicycleHigh Speed Effortless Bicycle
High Speed Effortless BicycleIJMER
 
Integration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise ApplicationsIntegration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise ApplicationsIJMER
 
Microcontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation SystemMicrocontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation SystemIJMER
 
On some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological SpacesOn some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological SpacesIJMER
 
Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...IJMER
 
Natural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningNatural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningIJMER
 
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcessEvolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcessIJMER
 
Material Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded CylindersMaterial Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded CylindersIJMER
 
Studies On Energy Conservation And Audit
Studies On Energy Conservation And AuditStudies On Energy Conservation And Audit
Studies On Energy Conservation And AuditIJMER
 
An Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDLAn Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDLIJMER
 
Discrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One PreyDiscrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One PreyIJMER
 

Más de IJMER (20)

A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...
 
Developing Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed DelintingDeveloping Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed Delinting
 
Study & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja FibreStudy & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja Fibre
 
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
 
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
 
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
 
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
 
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
 
Static Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works SimulationStatic Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
 
High Speed Effortless Bicycle
High Speed Effortless BicycleHigh Speed Effortless Bicycle
High Speed Effortless Bicycle
 
Integration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise ApplicationsIntegration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise Applications
 
Microcontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation SystemMicrocontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation System
 
On some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological SpacesOn some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological Spaces
 
Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...
 
Natural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningNatural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine Learning
 
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcessEvolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcess
 
Material Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded CylindersMaterial Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded Cylinders
 
Studies On Energy Conservation And Audit
Studies On Energy Conservation And AuditStudies On Energy Conservation And Audit
Studies On Energy Conservation And Audit
 
An Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDLAn Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDL
 
Discrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One PreyDiscrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One Prey
 

Último

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Último (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets

  • 1. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645 www.ijmer.com 1861 | Page B. Susrutha1 , J. Vamsi Nath2 , T. Bharath Manohar3 , I. Shalini4 1,4 M. Tech 2ndyr,Dept of CSE,PBRVITS(Affiliated to JNTU Kakinada), Kavali,Nellore.Andhra Pradesh.India. 2 Associate Professor, Dept of CSE, PBRVITS(Affiliated to JNTU Kakinada), Kavali, Nellore, Andhra Pradesh. India. 3 Asst. Professor,Dept of CSE, CMR College of Engineering &Technology,(Affiliated to JNTU Hyderabad) Hyderabad. Andhra Pradesh. India. Abstract: Preparing a data set for analysis is generally the most time consuming task in a data mining project, requiring many complex SQL queries, joining tables and aggregating columns. Existing SQL aggregations have limitations to prepare data sets because they return one column per aggregated group. In general, a significant manual effort is required to build data sets, where a horizontal layout is required. We propose simple, yet powerful, methods to generate SQL code to return aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets with a horizontal denormalized layout (e.g. point-dimension, observation-variable, instance-feature), which is the standard layout required by most data mining algorithms. We propose three fundamental methods to evaluate horizontal aggregations: CASE: Exploiting the programming CASE construct; SPJ: Based on standard relational algebra operators (SPJ queries); PIVOT: Using the PIVOT operator, which is offered by some DBMSs. Experiments with large tables compare the proposed query evaluation methods. Our CASE method has similar speed to the PIVOT operator and it is much faster than the SPJ method. In general, the CASE and PIVOT methods exhibit linear scalability, where as the SPJ method does not. Keywords: Aggregation, Data Preparation, Pivoting, SQL. I. INTRODUCTION In a relational database, especially with normalized tables, a significant effort is required to prepare a summary data set that can be used as input for a data mining or statistical algorithm. Most algorithms require as input a data set with a horizontal layout, with several Records and one variable or dimension per column. That is the case with models like clustering, classification, regression and PCA; consult. Each research discipline uses different terminology to describe the data set. In data mining the common terms are point-dimension. Statistics literature generally uses observation-variable. Machine learning research uses instance-feature. This article introduces a new class of aggregate functions that can be used to build data sets in a horizontal layout (denormalized with aggregations), automating SQL query writing and extending SQL capabilities. We show evaluating horizontal aggregations is a challenging and interesting problem and we introduced alternative methods and optimizations for their efficient evaluation. II. MOTIVATION As mentioned above, building a suitable data set for data mining purposes is a time- consuming task. This task generally requires writing long SQL statements or customizing SQL Code if it is automatically generated by some tool. There are two main ingredients in such SQL code: joins and aggregations; we focus on the second one. The most widely- known aggregation is the sum of a column over groups of rows. Some other aggregations return the average, maximum, minimum or row count over groups of rows. There exist many aggregations functions and operators in SQL. Unfortunately, all these aggregations have limitations to build data sets for data mining purposes. The main reason is that, in general, data sets that are stored in a relational database (or a data warehouse) come from On-Line Transaction Processing (OLTP) systems where database schemas are highly normalized. But data mining, statistical or machine learning algorithms generally require aggregated data in summarized form. Based on current available functions and clauses in SQL, a significant effort is required to compute aggregations when they are desired in a cross tabular (Horizontal) form, suitable to be used by a data mining algorithm. Such effort is due to the amount and complexity of SQL code that needs to be written, optimized and tested. There are further practical reasons to return aggregation results in a horizontal (cross-tabular) layout. Standard aggregations are hard to interpret when there are many result rows, especially when grouping attributes have high cardinalities. To perform analysis of exported tables into spreadsheets it may be more convenient to have aggregations on the same group in one row (e.g. to produce graphs or to compare data sets with repetitive information). OLAP tools generate SQL code to transpose results (sometimes called PIVOT). Transposition can be more efficient if there are mechanisms combining aggregation and transposition together. With such limitations in mind, we propose a new class of aggregate functions that aggregate numeric expressions and transpose results to produce a data set with a horizontal layout. Functions belonging to this class are called horizontal aggregations. Horizontal aggregations represent an extended form of traditional SQL aggregations, which return a set of values in a horizontal layout (somewhat similar to a multidimensional vector), instead of a single value per row. This article explains how to evaluate and optimize horizontal aggregations generating standard SQL code. Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
  • 2. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645 www.ijmer.com 1862 | Page III. LITERATURE SURVEY We have to analysis the DATA MINING Outline Survey: Data Mining Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cuts costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. The Scope of Data Mining Data mining derives its name from the similarities between searching for valuable business information in a large database — for example, finding linked products in gigabytes of store scanner data — and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find exactly where the value resides. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:  Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data — quickly. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.  Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors. The most commonly used techniques in data mining are:  Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.  Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID) .  Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.  Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ³ 1). Sometimes called the k-nearest neighbor technique.  Rule induction: The extraction of useful if-then rules from data based on statistical significance. An Architecture for Data Mining To best apply these advanced techniques, they must be fully integrated with a data warehouse as well as flexible interactive business analysis tools. Many data mining tools currently operate outside of the warehouse, requiring extra steps for extracting, importing, and analyzing the data. Furthermore, when new insights require operational implementation, integration with the warehouse simplifies the application of results from data mining. The resulting analytic data warehouse can be applied to improve business processes throughout the organization, in areas such as promotional campaign management, fraud detection, new product rollout, and so on. Figure 1: Integrated Data Mining Architecture
  • 3. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645 www.ijmer.com 1863 | Page The Ideal starting point is a data warehouse containing a combination of internal data tracking all customer contact coupled with external market data about competitor activity. Background information on potential customers also provides an excellent basis for prospecting. This warehouse can be implemented in a variety of relational database systems: Sybase, Oracle, Redbrick, and so on, and should be optimized for flexible and fast data access. Data Mining Products: Data mining products are taking the industry by storm. The major database vendors have already taken steps to ensure that their platforms incorporate data mining techniques. Oracle's Data Mining Suite (Darwin) implements classification and regression trees, neural networks, k-nearest neighbors, regression analysis and clustering algorithms. Microsoft's SQL Server also offers data mining functionality through the use of classification trees and clustering algorithms. If you're already working in a statistics environment, you're probably familiar with the data mining algorithm implementations offered by the advanced statistical packages SPSS, SAS, and S-Plus. Conclusion Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of the data. However, there is a growing gap between more powerful storage and retrieval systems and the users’ ability to effectively analyze and act on the information they contain. Both relational and OLAP technologies have tremendous capabilities for navigating massive data warehouses, but brute force navigation of data is not enough. A new technological leap is needed to structure and prioritize information for specific end-user problems. The data mining tools can make this leap. Quantifiable business benefits have been proven through the integration of data mining with current information systems, and new products are on the horizon that will bring this integration to an even wider audience of users. IV. SYSTEM ANALYSIS Existing System: An existing to preparing a data set for analysis is generally the most time consuming task in a data mining project, requiring many complex SQL queries, joining tables and aggregating columns. Existing SQL aggregations have limitations to prepare data sets because they return one column per aggregated group. Disadvantage: 1)Existing SQL aggregations have limitations to prepare data sets. 2)To return one column per aggregated group Previous Process Flow: Fig: Previous Process Flow Proposed System: Our proposed horizontal aggregations provide several unique features and advantages. First, they represent a template to generate SQL code from a data mining tool. Such SQL code automates writing SQL queries, optimizing them and testing them for correctness. Advantage: 1) The SQL code reduces manual work in the data preparation phase in a data mining project. 2) The SQL code is automatically generated it is likely to be more efficient than SQL code written by an end user. 3) The data sets can be created in less time. 4) The data set can be created entirely inside the DBMS.
  • 4. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645 www.ijmer.com 1864 | Page Proposed Process Flow: Fig: Proposed Process Flow Modules Description: 1. Admin Module 2. User Module 3. View Module 4. Download Module Module 1 : Admin Module Admin will upload new connection form based on regulations in various states. Admin will be able upload various details regarding user bills like a new connection to a new user, amount paid or payable by user. In case of payment various details regarding payment will be entered and separate username and password will be provided to users in large. Module 2 : User Module User will be able to view his bill details on any date may be after a month or after months or years and also he can to view the our bill details in a various ways for instance, The year wise bills, Month wise bills, totally paid to bill in EB. This will reduce the cost of transaction. If user thinks that his password is insecure, he has option to change it. He also can view the registration details and allowed to change or edit and save it. Module 3 : View Module Admin has three ways to view the user bill details, the 3 ways are i) SPJ ii) PIVOT iii) CASE i) SPJ : While using SPJ the viewing and processing time of user bills is reduced. ii) PIVOT : This is used to draw the user details in a customized table. This table will elaborate us on the various bill details regarding the user on monthly basis. iii) CASE : using CASE query we can customize the present table and column based on the conditions. This will help us to reduce enormous amount of space used by various user bill details. It can be viewed in two difference ways namely Horizontal and Vertical. In case of vertical the number of rows will be reduced to such an extent it is needed and column will remain the same on other hand the Horizontal will reduce rows as same as vertical and will also increase the columnar format. Module 4 : Download Module User will be able to download the various details regarding bills. If he/she is a new user, he/she can download the new connection form, subscription details etc. then he/she can download his /her previous bill details in hands so as to ensure it.
  • 5. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645 www.ijmer.com 1865 | Page V. SYSTEM DESIGN & ARCHITECTURE DIAGRAMS Data Flow Diagram / Use Case Diagram / Flow Diagram The DFD is also called as bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, various processing carried out on these data, and the output data is generated by the system. Class Diagram:
  • 6. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645 www.ijmer.com 1866 | Page Data Flow Diagram:
  • 7. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645 www.ijmer.com 1867 | Page Activity Diagram:
  • 8. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645 www.ijmer.com 1868 | Page Use case Diagram:
  • 9. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645 www.ijmer.com 1869 | Page Sequence Diagram: VI. EXPERIMENT & RESULT We performed several experiments to test the proposed algorithm and evaluate its performance against different attacks and experienced various results as follows:
  • 10. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645 www.ijmer.com 1870 | Page Fig: Horizontal Aggregations in SQL Fig: Horizontal view results Fig: Results obtained processing the records in Database
  • 11. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645 www.ijmer.com 1871 | Page VII. CONCLUSION We introduced a new class of extended aggregate functions, called horizontal aggregations which help preparing data sets for data mining and OLAP cube exploration. Specifically, horizontal aggregations are useful to create data sets with a horizontal layout, as commonly required by data mining algorithms and OLAP cross-tabulation. Basically, a horizontal aggregation returns a set of numbers instead of a single number for each group, resembling a multi-dimensional vector. We proposed an abstract, but minimal, extension to SQL standard aggregate functions to compute horizontal aggregations which just requires specifying subgrouping columns inside the aggregation function call. From a query optimization perspective, we proposed three query evaluation methods. The first one (SPJ) relies on standard relational operators. The second one (CASE) relies on the SQL CASE construct. The third (PIVOT) uses a built-in operator in a commercial DBMS that is not widely available. The SPJ method is important from a theoretical point of view because it is based on select, project and join (SPJ) queries. The CASE method is our most important contribution. It is in general the most efficient evaluation method and it has wide applicability since it can be programmed combining GROUP-BY and CASE statements. We proved the three methods produce the same result. We have explained it is not possible to evaluate horizontal aggregations using standard SQL without either joins or ”case” constructs using standard SQL operators. Our proposed horizontal aggregations can be used as a database method to automatically generate efficient SQL queries with three sets of parameters: grouping columns, subgrouping columns and aggregated column. The fact that the output horizontal columns are not available when the query is parsed (when the query plan is explored and chosen) makes its evaluation through standard SQL mechanisms infeasible. Our experiments with large tables show our proposed horizontal aggregations evaluated with the CASE method have similar performance to the built-in PIVOT operator. We believe this is remarkable since our proposal is based on generating SQL code and not on internally modifying the query optimizer. Both CASE and PIVOT evaluation methods are significantly faster than the SPJ method. Precomputing a cube on selected dimensions produced acceleration on all methods. ACKNOWLEDGMENT I would like to express my sincere thanks to my Guide and my Co-Authors for their consistence support and valuable suggestions. REFERENCES [1] G. Bhargava, P. Goel, and B.R. Iyer. Hypergraph based reordering of outer join queries with complex predicates. In ACM SIGMOD Conference, pages 304–315, 1995. [2] J.A. Blakeley, V. Rao, I. Kunen, A. Prout, M. Henaire, and C. Kleinerman. .NET database programmability and extensibility in Microsoft SQL Server. In Proc. ACM SIGMOD Conference, pages 1087–1098, 2008. [3] J. Clear, D. Dunn, B. Harvey, M.L. Heytens, and P. Lohman. Non-stop SQL/MX primitives for knowledge discovery. In ACM KDD Conference, pages 425–429, 1999. [4] E.F. Codd. Extending the database relational model to capture more meaning. ACM TODS, 4(4):397–434, 1979. [5] C. Cunningham, G. Graefe, and C.A. Galindo-Legaria. PIVOT and UNPIVOT: Optimization and execution strategies in an RDBMS. In Proc. VLDB Conference, pages 998–1009, 2004. [6] C. Galindo-Legaria and A. Rosenthal. Outer join simplification and reordering for query optimization. ACM TODS, 22(1):43–73, 1997. [7] H. Garcia-Molina, J.D. Ullman, and J. Widom. Database Systems: The Complete Book. Prentice Hall, 1st edition, 2001. [8] G. Graefe, U. Fayyad, and S. Chaudhuri. On the efficient gathering of sufficient statistics for classification from large SQL databases. In Proc. ACM KDD Conference, pages 204–208, 1998. [9] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab and subtotal. In ICDE Conference, pages 152–159, 1996. [10] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco, 1st edition, 2001. [11] G. Luo, J.F. Naughton, C.J. Ellmann, and M. Watzke. Locking protocols for materialized aggregate join views. IEEE Transactions on Knowledge and Data Engineering (TKDE), 17(6):796–807, 2005. [12] C. Ordonez. Horizontal aggregations for building tabular data sets. In Proc. ACM SIGMOD Data Mining and Knowledge Discovery Workshop, pages 35–42, 2004. [13] C. Ordonez. Statistical model computation with UDFs. IEEE Transactions on Knowledge and Data Engineering (TKDE), 22, 2010. [14] C. Ordonez. Data set preprocessing and transformation in a database system. Intelligent Data Analysis (IDA), 15(4), 2011. [15] C. Ordonez and S. Pitchaimalai.Bayesian classifiers programmed in SQL. IEEE Transactions on Knowledge and Data Engineering (TKDE), 22(1):139–144, 2010.