Mining Attribute Lifecycle 
to Predict Faults 
and Incompleteness 
in Database Applications 
Presented by:- 
Sandra Alex 
Roll no: 40
Outline 
• INTRODUCTION
• ATTRIBUTE LIFECYCLE CHARACTERIZATION
• PROPOSED APPROACH
• EXPERIMENT
• PREDICTION
• RELATED WORK
• CONCLUSION
• REFERENCES
Introduction 
• Each attribute has a value that is initially created via insertion.
• The value may then be referenced, updated or deleted.
• These occurrences of events, together with the associated states, constitute the attribute lifecycle.
• The lifecycle describes the behaviour of an attribute value from its insertion to its final deletion.
• The attribute lifecycle can be extracted from the code of a database application.
Introduction 
• Our empirical studies find that faults and incompleteness in database applications are highly associated with the attribute lifecycle.
• The learned prediction model can be applied in the development and maintenance of database applications.
• Experiments were conducted on PHP systems.
Attribute Lifecycle Characterization 
• For each attribute, a value is
i. created -> insertion
ii. referenced -> selection
iii. updated -> updating
iv. deleted -> deletion
• These occurrences of events, together with the associated states, constitute the attribute lifecycle.
Attribute Lifecycle Characterization 
[Figure: state transition diagram of the attribute lifecycle]
Attribute Lifecycle Characterization 
• Programs sustain the attribute lifecycle through four database operations: INSERT, SELECT, UPDATE and DELETE.
• We formulate the following attributes (characteristics) to characterize the lifecycle of a database attribute:
i. Create (C) -> a value of the attribute is inserted.
ii. Null Create (NC) -> the attribute is inserted without a value.
iii. Control Update (COU) -> the new value is not influenced by the existing attribute value or by inputs from the user and the database.
Attribute Lifecycle Characterization 
iv. Overriding Update (OVU) -> the new value is not influenced by the existing value.
v. Cumulating Update (CMU) -> the new value is influenced by the existing value.
vi. Delete (D) -> the attribute is deleted as a result of the deletion of the record.
vii. Use (U) -> the value is used to support the insertion, updating or deletion of other database attributes, or is output to the external environment.
Attribute Lifecycle Characterization 
• Hence, we characterize the attribute lifecycle by a seven-element vector
[m1, m2, m3, m4, m5, m6, m7],
where m1, m2, m3, m4, m5, m6 and m7 denote whether a database operation of type C, NC, COU, OVU, CMU, D and U, respectively, is performed on the attribute.
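To make the encoding concrete, the following minimal Python sketch reads such a vector back into the names of the characteristics it records. The function name `describe` and the example vector are illustrative assumptions, not part of the presented work.

```python
# Order of the characteristics as listed on the slide: C, NC, COU, OVU, CMU, D, U.
CHARACTERISTICS = ("C", "NC", "COU", "OVU", "CMU", "D", "U")


def describe(vector):
    """Return the characteristics whose element m_k is 1 in [m1, ..., m7]."""
    if len(vector) != len(CHARACTERISTICS):
        raise ValueError("an attribute lifecycle vector has exactly 7 elements")
    return [name for name, m in zip(CHARACTERISTICS, vector) if m == 1]


# Hypothetical attribute: created, overwritten, deleted with its record, and read.
print(describe([1, 0, 0, 1, 0, 1, 1]))  # ['C', 'OVU', 'D', 'U']
```

Each element is a presence flag rather than a count, so an attribute that is read by many queries still has m7 = 1.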
Proposed Approach 
A. Mining Attribute Lifecycle 
Proposed Approach 
B. Extracting Attribute Lifecycle Characterization Data
1) Query Extraction 
<?php
// Wrapper around the query execution function.
function exec_query($q)
{
    return mysql_query($q);
}

$query = "SELECT username FROM users WHERE ";
if (isset($_POST['usertype'])) {
    $query .= "usertype =" . $_POST['usertype']; // use usertype
} else {
    $query .= "userid=" . $_POST['userid']; // use userid
}
exec_query($query);
?>
• The query can differ at runtime, depending on which branch is taken.
Proposed Approach 
[Figure: control flow graph (CFG) for the code]
Proposed Approach
• The CFG is used to generate a set of basis paths.
• When a query execution function such as “mysql_query” is encountered, the definition of every variable used is retrieved.
• Literals -> replaced by their actual values.
• Variables whose values are not statically known -> replaced by placeholders.
• The parts of the query string with replaced values -> connected into one query string.
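The substitution step can be sketched in Python as follows. This is only an illustration under stated assumptions: the two basis paths of the PHP example are written out by hand instead of being derived from a CFG, and the `Unknown` type, the `resolve_path` function and the `?` placeholder token are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unknown:
    """A value that is not statically known, e.g. $_POST['userid']."""
    name: str


PLACEHOLDER = "?"  # assumed placeholder token; the slides do not name one


def resolve_path(fragments):
    """Connect the query fragments found along one basis path.

    String literals are kept as they are; statically unknown values are
    replaced by the placeholder.
    """
    parts = []
    for fragment in fragments:
        parts.append(PLACEHOLDER if isinstance(fragment, Unknown) else fragment)
    return "".join(parts)


# The two basis paths through the if/else of the PHP example, written by hand.
basis_paths = [
    ["SELECT username FROM users WHERE ", "usertype =", Unknown("$_POST['usertype']")],
    ["SELECT username FROM users WHERE ", "userid=", Unknown("$_POST['userid']")],
]

for path in basis_paths:
    print(resolve_path(path))
```

Each path yields one candidate query string, so the single `exec_query` call in the example contributes two queries to the later analysis.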
Proposed Approach 
2) Analysis of Attribute Lifecycle 
• After the queries are extracted, they are analysed with an SQL grammar parser to obtain the attribute lifecycle patterns.
• CREATE TABLE: parsed first to collect the schema of the table.
• VIEW: the mapping of attributes between the view and its underlying table is recorded.
Proposed Approach 
• SELECT:
o The query is parsed, table aliases are replaced by the actual table names, and the attributes are identified.
o “*” -> the schema of the table is consulted to get all attribute names.
o “count(*)” -> not considered.
o The identified attributes are characterized as “Use”.
Proposed Approach 
• INSERT:
o The table name is identified first.
o No column list -> all the attributes are inserted.
o Column list -> the listed attributes are extracted.
o Attributes that are “auto incremental” or have non-null default values -> treated as inserted by the query.
o These attributes are characterized as “Create”.
o Attributes explicitly assigned null -> marked as “Null Create”.
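The INSERT rules can be illustrated with a small Python sketch. It is not the SQL grammar parser used in the presented work: it relies on a regular expression that only handles simple `INSERT ... VALUES` statements, and the schema, the set of auto-increment columns and the function name `characterize_insert` are hypothetical.

```python
import re

# Hypothetical schema, as it would be collected from CREATE TABLE statements.
SCHEMA = {"users": ["userid", "username", "usertype"]}
# Hypothetical columns that are auto-incremental or have non-null defaults.
IMPLICIT = {"users": {"userid"}}

INSERT_RE = re.compile(
    r"INSERT\s+INTO\s+(\w+)\s*(?:\(([^)]*)\))?\s*VALUES\s*\((.*)\)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def characterize_insert(query):
    """Map each inserted attribute to "C" (Create) or "NC" (Null Create)."""
    match = INSERT_RE.match(query.strip())
    if match is None:
        raise ValueError("not a simple INSERT ... VALUES statement")
    table, column_list, value_list = match.groups()
    # No column list means that all attributes of the table are inserted.
    if column_list:
        columns = [c.strip() for c in column_list.split(",")]
    else:
        columns = list(SCHEMA[table])
    # Naive split: assumes that no value contains a comma.
    values = [v.strip() for v in value_list.split(",")]
    result = {}
    for column, value in zip(columns, values):
        result[column] = "NC" if value.upper() == "NULL" else "C"
    # Implicitly filled attributes are treated as inserted by the query.
    for column in IMPLICIT.get(table, ()):
        result.setdefault(column, "C")
    return result


print(characterize_insert("INSERT INTO users (username, usertype) VALUES (?, NULL)"))
```

A real implementation would need a proper parser to cope with nested expressions, quoted commas and multi-row inserts.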
• UPDATE:
o Collect the attribute names.
o Identify the update pattern.
o The attribute assignments in the SET clause are separated.
o The value string of each assignment is analysed to determine the update characteristic.
o The characteristic is either COU, OVU or CMU.
o Attributes used in the WHERE clause -> marked as “Use”.
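One possible way to decide between COU, OVU and CMU is sketched below in Python. The decision rule is an assumption made for this sketch and is not taken from the presented work: a value that mentions the attribute itself is treated as CMU, a value that mentions a placeholder or any other identifier as OVU, and a pure constant as COU. All names are hypothetical, and the regular expressions only cover simple statements.

```python
import re

PLACEHOLDER = "?"  # assumed marker for statically unknown input
SQL_KEYWORDS = {"AND", "OR", "NOT", "NULL", "IS", "IN", "LIKE"}
UPDATE_RE = re.compile(
    r"UPDATE\s+(\w+)\s+SET\s+(.*?)(?:\s+WHERE\s+(.*))?$",
    re.IGNORECASE | re.DOTALL,
)


def _strip_strings(expr):
    """Blank out quoted literals so their content is not read as identifiers."""
    return re.sub(r"'[^']*'", "''", expr)


def _identifiers(expr):
    names = re.findall(r"[A-Za-z_]\w*", _strip_strings(expr))
    return {n for n in names if n.upper() not in SQL_KEYWORDS}


def classify_assignment(attribute, value_expr):
    """Heuristic rule assumed for this sketch (see the lead-in)."""
    names = _identifiers(value_expr)
    if attribute in names:
        return "CMU"  # new value depends on the existing value
    if names or PLACEHOLDER in _strip_strings(value_expr):
        return "OVU"  # depends on input, not on the existing value
    return "COU"      # constant, independent of both


def characterize_update(query):
    match = UPDATE_RE.match(query.strip().rstrip(";"))
    if match is None:
        raise ValueError("not a simple UPDATE statement")
    _table, set_clause, where_clause = match.groups()
    result = {}
    # Naive split: assumes that no value contains a comma.
    for assignment in set_clause.split(","):
        column, expr = (s.strip() for s in assignment.split("=", 1))
        result.setdefault(column, set()).add(classify_assignment(column, expr))
    for name in _identifiers(where_clause or ""):
        result.setdefault(name, set()).add("U")
    return result


print(characterize_update(
    "UPDATE accounts SET balance = balance + ?, status = 'open', owner = ? WHERE id = ?"
))
```

In this hypothetical query, `balance` is classified as CMU, `status` as COU, `owner` as OVU and `id` as Use.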
• DELETE:
o Identify the table name.
o Mark all the attributes of the table as “Delete”.
o Mark the attributes in the WHERE clause as “Use”.
• For each query, the attribute names in it are put into a collection, which is then used to create the attribute lifecycle vectors.
3) Generation of Attribute Lifecycle Vectors 
• For example, if there is at least one “Create” characteristic for an attribute:
o the first element of its vector is set to 1,
o otherwise it is set to 0.
• If there is no operation on an attribute, all elements are set to 0.
• We generate vectors for all attributes in a database application.
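The vector generation step can be sketched in Python as follows, assuming that the query analysis has already produced a list of (attribute, characteristic) observations. The function name `build_vectors` and the example attributes are hypothetical.

```python
# Order of the vector elements: C, NC, COU, OVU, CMU, D, U.
CHARACTERISTICS = ("C", "NC", "COU", "OVU", "CMU", "D", "U")


def build_vectors(all_attributes, observations):
    """Build one seven-element vector per attribute.

    An element is 1 if at least one observation with that characteristic
    exists for the attribute; attributes without any operation keep
    all elements at 0.
    """
    index = {name: i for i, name in enumerate(CHARACTERISTICS)}
    vectors = {attr: [0] * len(CHARACTERISTICS) for attr in all_attributes}
    for attr, characteristic in observations:
        vectors[attr][index[characteristic]] = 1
    return vectors


# Hypothetical observations collected from all queries of an application.
observations = [
    ("users.username", "C"),
    ("users.username", "U"),
    ("users.username", "C"),  # repeated characteristics do not change the vector
]
vectors = build_vectors(["users.username", "users.lastlogin"], observations)
print(vectors["users.username"])   # [1, 0, 0, 0, 0, 0, 1]
print(vectors["users.lastlogin"])  # [0, 0, 0, 0, 0, 0, 0]
```

Passing the full attribute list from the schema is what allows attributes that no query ever touches to appear with an all-zero vector.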
Experiment
A. Data Collection
• We seed faults in open-source database applications to train our model.
• The systems were chosen according to the following criteria:
o source code -> publicly available
o application size -> considerable (in number of transactions and number of attributes)
o mature enough -> very few faults associated with attribute lifecycle
Experiment 
• “batavi”, a web-based e-commerce system;
• “webERP”, an accounting and business management system;
• “FrontAccounting”, a professional web-based system;
• “OpenBusinessNetwork”, an application designed for business;
• “SchoolMate”, a solution for school administrations.
Experiment 
• Attribute lifecycles have a number of common patterns.
• Attributes that do not follow these patterns -> cause errors.
• We seeded the following common errors:
1) Missing function: attributes are provided, but a function for them is not catered for during the program design.
2) Inconsistency design: the result of a transaction that updates an attribute by “cumulative update” is corrected using “overriding update”.
3) Redundant function: new programs for different types of operations.
4) No update: new attributes without any update functions.
Experiment 
B. Experimental Design 
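• Three classifiers are used to learn the prediction model.
1) C4.5 classifier
• A decision tree classification algorithm.
• Uses normalized information gain to split the data.
• The information gain of an attribute A is $\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$, where $\mathrm{Info}_A(D) = \sum_j \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j)$ is the information still needed after splitting the data D on A into partitions D_j.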
Experiment 
• Info(D) is defined as: $\mathrm{Info}(D) = -\sum_i p_i \log_2 p_i$
where p_i is the probability that an instance belongs to class i.
• In the training process, the classifier each time chooses the attribute with the highest normalized information gain to split the data, until all attributes are used.
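The quantities used by C4.5 can be computed with a few lines of Python. This is a minimal sketch of the standard definitions (entropy, information gain and gain ratio, i.e. the gain normalized by the split information), not the classifier implementation used in the experiment. The function names and the toy data are assumptions.

```python
from collections import Counter
from math import log2


def info(labels):
    """Info(D): entropy of the class distribution in `labels`."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(values, labels):
    """Information gain obtained by splitting on one attribute."""
    total = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    remainder = sum(len(p) / total * info(p) for p in partitions.values())
    return info(labels) - remainder


def gain_ratio(values, labels):
    """Normalized information gain: gain divided by the split information."""
    split_info = info(values)
    return info_gain(values, labels) / split_info if split_info > 0 else 0.0


# Toy data: one vector element (e.g. m4, OVU) for four hypothetical attributes.
m4 = [1, 1, 0, 0]
labels = ["normal", "normal", "no update", "no update"]
print(info(labels), info_gain(m4, labels), gain_ratio(m4, labels))
```

In this toy case the element separates the two classes perfectly, so both the gain and the gain ratio reach their maximum of 1 bit.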
Experiment 
2) Naïve Bayes classifier 
• A generative probabilistic model.
• Bayes’ theorem: $P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$
• Assuming that the attributes are independent, we have $P(X \mid C_i) = \prod_k P(x_k \mid C_i)$
• For a categorical value, the probability P(x_k|C_i) is the proportion of the instances in class C_i which have attribute value x_k.
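A categorical naïve Bayes classifier over lifecycle vectors can be sketched in Python as follows. This is a minimal illustration of the proportion-based estimate described on the slide, without smoothing, and not the implementation used in the experiment. The function names and the toy training data are assumptions.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(cls, k)][value]: instances of cls whose k-th element is value
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for k, value in enumerate(row):
            value_counts[(label, k)][value] += 1
    return class_counts, value_counts


def predict(model, row):
    """Return the class with the highest P(C) * prod_k P(x_k | C)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # prior P(C)
        for k, value in enumerate(row):
            score *= value_counts[(cls, k)][value] / n  # proportion in the class
        scores[cls] = score
    return max(scores, key=scores.get)


# Toy training set of hypothetical lifecycle vectors [C, NC, COU, OVU, CMU, D, U].
rows = [
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 1, 1],
]
labels = ["no update", "no update", "normal", "normal"]
model = train(rows, labels)
print(predict(model, [1, 0, 0, 1, 0, 1, 1]))  # normal
```

Without smoothing, a value never seen in a class forces that class's score to zero, which is acceptable for a sketch but usually avoided in practice.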
Experiment 
3) SVM classifier 
• Support Vector Machine (SVM)
• Based on statistical learning theory.
• Trains the classification model by searching for the hyperplane that maximizes the margin between classes.
Experiment 
C. Model Training 
• The attributes from the five systems were labelled to create the training set.
• We manually checked and labelled each attribute as “missing function”, “inconsistency design”, “redundant function”, “no update” or “normal”.
Experiment 
• The model was trained with the three classifiers.
• To evaluate the trained models, 10-fold cross-validation was run on the training set:
o the set was randomly partitioned into 10 folds;
o each time, 9 folds were used as the training set and the remaining fold as the testing set;
o we computed the average measurements.
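The 10-fold procedure can be sketched in plain Python as follows. The functions `k_fold_indices` and `cross_validate`, the fixed random seed and the `evaluate` callback are assumptions made for this sketch. The callback stands for training a classifier on the training indices and measuring it on the testing indices.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for a random partition into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(n, evaluate, k=10, seed=0):
    """Average the measurement returned by evaluate(train, test) over all folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k, seed)]
    return sum(scores) / len(scores)


# Usage with a dummy measurement: the average testing-fold size for 25 instances.
print(cross_validate(25, lambda train, test: len(test)))  # 2.5
```

Every instance is used for testing exactly once, so the averaged measurement covers the whole labelled set.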
Experiment 
D. Assessing Performance 
• Probability of detection: pd = tp / (tp + fn)
• Probability of false alarm: pf = fp / (fp + tn)
• Precision: pr = tp / (tp + fp)
• Ideally, pd -> 1 and pf -> 0.
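The three measures follow directly from the confusion-matrix counts, as this short Python sketch shows. The function name `assess` and the counts in the example are hypothetical and are not results from the experiment.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is 0."""
    return numerator / denominator if denominator else 0.0


def assess(tp, fp, tn, fn):
    """Compute pd, pf and precision from confusion-matrix counts."""
    return {
        "pd": _ratio(tp, tp + fn),  # probability of detection (recall)
        "pf": _ratio(fp, fp + tn),  # probability of false alarm
        "pr": _ratio(tp, tp + fp),  # precision
    }


# Hypothetical counts for one fault class.
print(assess(tp=90, fp=5, tn=895, fn=10))
```

With these counts, pd is 0.9, pf is about 0.0056 and the precision is about 0.947.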
• pd > 87%
• pf < 1.81%
• SVM > C4.5 > naïve Bayes
• SVM: pd > 95%, pf < 0.07%
Prediction 
• We applied the prediction model to four database applications to predict whether they contain attributes with missing function, inconsistency design, redundant function or no update.
• We applied the prediction model learned by SVM to these systems and counted the attributes that were predicted in each category.
Prediction 
• Designers can take corresponding actions to correct these design faults and incompleteness.
• Further, we manually validated all the predicted attributes.
• Of the 107 predicted attributes, 98 were confirmed to be real.
• The prediction precision is 91.59%.
Conclusion 
• For each database attribute, we extract a set of characterization attributes from the code of database applications to characterize its lifecycle.
• A characterization vector is formed.
• A data mining technique is applied to mine the attribute lifecycle using data collected from open-source database systems.
• We seed errors in mature systems to simulate design faults and build the training dataset for our classification method.
• Five types of labelled attributes are obtained.
Conclusion 
• A fault and incompleteness prediction model is then built.
• In our experiment, the model achieved 98.04% precision and 98.25% recall on average for SVM.
• We also applied the model to four open-source database applications to predict faults and incompleteness.
• Future work: conduct more comprehensive experiments on a larger set of systems to further validate the merits of the proposed approach.
References 
[1] N. Nagappan and T. Ball, “Static Analysis Tools as Early Indicators 
of Pre-release Defect Density,” in Proceedings of the 27th International 
Conference on Software Engineering. ACM, 2005, pp. 580–586. 
[2] A. Nikora and J. Munson, “Developing Fault Predictors for Evolving 
Software Systems,” in Proceedings of Ninth International Software 
Metrics Symposium, 2003. IEEE, 2003, pp. 338–350. 
[3] A. Watson, T. McCabe, and D. Wallace, “Structured testing: A testing 
methodology using the cyclomatic complexity metric,” NIST special 
Publication, vol. 500, no. 235, pp. 1–114, 1996. 
[4] W. Fan, M. Miller, S. Stolfo, W. Lee, and P. Chan, “Using artificial 
anomalies to detect unknown and known network intrusions,” 
Knowledge and Information Systems, vol. 6, no. 5, pp. 507–527, 2004. 
Page  35
Page  36
Page  37

Más contenido relacionado

Similar a Mining attributes

AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONIRJET Journal
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationBigML, Inc
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee AttritionShruti Mohan
 
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Benjamin Bengfort
 
Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]AAKANKSHA JAIN
 
ML-Ops how to bring your data science to production
ML-Ops  how to bring your data science to productionML-Ops  how to bring your data science to production
ML-Ops how to bring your data science to productionHerman Wu
 
Optimization Technique for Feature Selection and Classification Using Support...
Optimization Technique for Feature Selection and Classification Using Support...Optimization Technique for Feature Selection and Classification Using Support...
Optimization Technique for Feature Selection and Classification Using Support...IJTET Journal
 
Machine_Learning_Trushita
Machine_Learning_TrushitaMachine_Learning_Trushita
Machine_Learning_TrushitaTrushita Redij
 
MicroManager_MATLAB_Implementation
MicroManager_MATLAB_ImplementationMicroManager_MATLAB_Implementation
MicroManager_MATLAB_ImplementationPhilip Mohun
 
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning AlgorithmsIRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning AlgorithmsIRJET Journal
 
The importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systemsThe importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systemsFrancesca Lazzeri, PhD
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...Robert Grossman
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataAbhishek M Shivalingaiah
 
Classification and Prediction Based Data Mining Algorithm in Weka Tool
Classification and Prediction Based Data Mining Algorithm in Weka ToolClassification and Prediction Based Data Mining Algorithm in Weka Tool
Classification and Prediction Based Data Mining Algorithm in Weka ToolIRJET Journal
 
Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...Yao Yao
 
A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...IOSR Journals
 
Artificial Intelligence based Pattern Recognition
Artificial Intelligence based Pattern RecognitionArtificial Intelligence based Pattern Recognition
Artificial Intelligence based Pattern RecognitionDr. Amarjeet Singh
 
Chapter 3 SOFTWARE TESTING PROCESS
Chapter 3 SOFTWARE TESTING PROCESSChapter 3 SOFTWARE TESTING PROCESS
Chapter 3 SOFTWARE TESTING PROCESSst. michael
 
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]IRJET Journal
 
Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Gianmario Spacagna
 

Similar a Mining attributes (20)

AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTION
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - Automation
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee Attrition
 
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
 
Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]
 
ML-Ops how to bring your data science to production
ML-Ops  how to bring your data science to productionML-Ops  how to bring your data science to production
ML-Ops how to bring your data science to production
 
Optimization Technique for Feature Selection and Classification Using Support...
Optimization Technique for Feature Selection and Classification Using Support...Optimization Technique for Feature Selection and Classification Using Support...
Optimization Technique for Feature Selection and Classification Using Support...
 
Machine_Learning_Trushita
Machine_Learning_TrushitaMachine_Learning_Trushita
Machine_Learning_Trushita
 
MicroManager_MATLAB_Implementation
MicroManager_MATLAB_ImplementationMicroManager_MATLAB_Implementation
MicroManager_MATLAB_Implementation
 
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning AlgorithmsIRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
 
The importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systemsThe importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systems
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big Data
 
Classification and Prediction Based Data Mining Algorithm in Weka Tool
Classification and Prediction Based Data Mining Algorithm in Weka ToolClassification and Prediction Based Data Mining Algorithm in Weka Tool
Classification and Prediction Based Data Mining Algorithm in Weka Tool
 
Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...
 
A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...
 
Artificial Intelligence based Pattern Recognition
Artificial Intelligence based Pattern RecognitionArtificial Intelligence based Pattern Recognition
Artificial Intelligence based Pattern Recognition
 
Chapter 3 SOFTWARE TESTING PROCESS
Chapter 3 SOFTWARE TESTING PROCESSChapter 3 SOFTWARE TESTING PROCESS
Chapter 3 SOFTWARE TESTING PROCESS
 
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
 
Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...
 

Último

Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxalwaysnagaraju26
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...kalichargn70th171
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 

Último (20)

Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 

Mining attributes

  • 1. Mining Attribute Lifecycle to Predict Faults and Incompleteness in Database Applications Presented by:- Sandra Alex Roll no: 40
  • 2. Outline  INTRODUCTION  ATTRIBUTE LIFECYCLE CHARACTERIZATION  PROPOSED APPROACH  EXPERIMENT  PREDICTION  RELATED WORK  CONCLUSION  REFERENCES Page  2
  • 3. Introduction  Each attribute  a value created initially via Page  3 insertion  Referenced, updated or deleted  These occurrences of events, associated with the states  attribute lifecycle.  Behaviour of an attribute value from its insertion to final deletion  Extract the attribute lifecycle out of a database application
  • 4. Introduction  Our empirical studies discover, faults and incompleteness in db applications  highly associated with attribute lifecycle.  Learned prediction model  applied in development and maintenance of database applications  Experiments conducted on PHP systems Page  4
  • 5. Attribute Lifecycle Characterization. For each attribute, a value is (i) created by insertion, (ii) referenced by selection, (iii) updated by updating, and (iv) deleted by deletion. These occurrences of events are associated with states to constitute the attribute lifecycle.
  • 6. Attribute Lifecycle Characterization. Figure: state transition diagram of the attribute lifecycle.
  • 7. Attribute Lifecycle Characterization. Programs sustain the attribute lifecycle through four database operations: INSERT, SELECT, UPDATE and DELETE. The following characteristics are formulated to characterize the lifecycle of an attribute: (i) Create (C): a value of the attribute is inserted. (ii) Null Create (NC): the attribute is inserted without a value. (iii) Control Update (COU): the attribute is updated with a value that is influenced neither by the existing attribute value nor by inputs from the user and the database.
  • 8. Attribute Lifecycle Characterization. (iv) Overriding Update (OVU): the attribute is updated with a value that is not influenced by the existing value. (v) Cumulating Update (CMU): the attribute is updated with a value that is influenced by the existing value. (vi) Delete (D): the attribute is deleted as a result of the deletion of the record. (vii) Use (U): the value is used to support the insertion, updating or deletion of other database attributes, or is output to the external environment.
  • 9. Attribute Lifecycle Characterization. Hence, we characterize the attribute lifecycle by a seven-element vector [m1, m2, m3, m4, m5, m6, m7], where m1 to m7 denote whether a database operation of type C, NC, COU, OVU, CMU, D and U, respectively, is performed on the attribute.
  • 10. Proposed Approach. A. Mining Attribute Lifecycle (figure).
  • 11. Proposed Approach. B. Extracting Attribute Lifecycle Characterization Data. 1) Query Extraction. The query built by the following PHP code can differ at runtime, depending on which branch is taken:
        <?php
        function exec_query($q) {
            return mysql_query($q);
        }
        $query = "SELECT username FROM users WHERE ";
        if (isset($_POST['usertype'])) {
            $query .= "usertype =" . $_POST['usertype']; // use usertype
        } else {
            $query .= "userid=" . $_POST['userid']; // use userid
        }
        exec_query($query);
        ?>
  • 12. Proposed Approach. Figure: control flow graph (CFG) for the code.
  • 13. Proposed Approach. A set of basis paths is generated from the CFG. When a query execution function such as "mysql_query" is encountered on a path, the definition of every variable used in the query is retrieved. Literals are replaced by their actual values, variables whose values are not statically known are replaced by placeholders, and the parts of the query string are then connected with the replaced values.
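The extraction steps above can be sketched in Python, which is used here only for illustration since the slides do not say how the tool is implemented. In this minimal sketch each basis path is modelled as a list of fragments: statically known strings are kept as they are, and values that are not statically known are replaced by a placeholder. The names `Unknown`, `PLACEHOLDER` and `build_query` are hypothetical, as is the choice of `?` as the placeholder token.

```python
from dataclasses import dataclass
from typing import List, Union

PLACEHOLDER = "?"  # assumed placeholder token; the slides do not specify one


@dataclass(frozen=True)
class Unknown:
    """A value that is not statically known, e.g. $_POST['userid']."""
    source: str


Fragment = Union[str, Unknown]


def build_query(path: List[Fragment]) -> str:
    """Connect the fragments found along one basis path into a query string."""
    parts = []
    for fragment in path:
        if isinstance(fragment, Unknown):
            parts.append(PLACEHOLDER)  # value only known at runtime
        else:
            parts.append(fragment)     # literal: keep its actual value
    return "".join(parts)


# The two basis paths through the PHP example on slide 11.
paths = [
    ["SELECT username FROM users WHERE ", "usertype =", Unknown("$_POST['usertype']")],
    ["SELECT username FROM users WHERE ", "userid=", Unknown("$_POST['userid']")],
]

queries = [build_query(path) for path in paths]
for query in queries:
    print(query)
```

Each path yields one query string, so the single `exec_query` call in the PHP example gives two candidate queries for the later analysis step.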
  • 14. Proposed Approach. 2) Analysis of Attribute Lifecycle. Once the queries are extracted, they are analysed with an SQL grammar parser to obtain the attribute lifecycle patterns. CREATE TABLE statements are parsed first to collect the schema of each table. For a VIEW, the mapping of attributes between the view and its underlying table is recorded.
  • 15. Proposed Approach. SELECT: the query is parsed, table aliases are replaced by the actual table names, and the attributes are identified. For "*", the schema of the table is consulted to get all attribute names. "count(*)" is not considered. The identified attributes are characterized as "Use".
  • 16. Proposed Approach. INSERT: the table name is identified first. If there is no column list, all attributes are treated as inserted; if there is a column list, the attributes are extracted from it. Attributes that are "auto incremental" or have non-null default values are treated as inserted by the query. These attributes are characterized as "Create". Attributes explicitly assigned null are marked as "Null Create".
  • 17. Proposed Approach. UPDATE: the attribute names are collected and the update pattern is identified. The attribute assignments in the SET clause are separated, and the value string of each assignment is analysed to determine the update characteristic, which is either COU, OVU or CMU. Attributes used in the WHERE clause are marked as "Use".
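A rough Python sketch of the UPDATE analysis follows. It is one possible reading of the definitions on slides 7 and 8, not the authors' parser: an assignment whose value string mentions the attribute itself is treated as CMU, one that contains a runtime placeholder but not the attribute as OVU, and one built only from constants as COU. The function names and the `?` placeholder are assumptions, and splitting the SET clause on commas is a simplification that would fail on function calls or string literals containing commas.

```python
import re

PLACEHOLDER = "?"  # assumed marker for values that are not statically known


def classify_update(attribute: str, value_expr: str) -> str:
    """Return 'CMU', 'OVU' or 'COU' for one SET-clause assignment."""
    # The existing value is referenced, e.g. stock = stock - ?
    if re.search(r"\b" + re.escape(attribute) + r"\b", value_expr, re.IGNORECASE):
        return "CMU"
    # The new value depends on a runtime input but not on the old value.
    if PLACEHOLDER in value_expr:
        return "OVU"
    # Only constants are involved.
    return "COU"


def parse_set_clause(set_clause: str) -> dict:
    """Map each attribute assigned in a SET clause to its update characteristic."""
    result = {}
    for assignment in set_clause.split(","):  # simplification, see lead-in
        name, _, expr = assignment.partition("=")
        name = name.strip()
        result[name] = classify_update(name, expr.strip())
    return result


patterns = parse_set_clause("stock = stock - ?, price = ?, status = 'closed'")
print(patterns)
```

A real implementation would work on the parse tree produced by the SQL grammar parser rather than on raw strings.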
  • 18. Proposed Approach. DELETE: the table name is identified, all attributes of the table are marked as "Delete", and the attributes in the WHERE clause are marked as "Use". For each query, the attribute names found in it are put into a collection, which is used to create the attribute lifecycle vectors.
  • 19. Proposed Approach. 3) Generation of Attribute Lifecycle Vectors. For example, if there is at least one "Create" characteristic for an attribute, the first element of its vector is 1, otherwise it is 0. If there is no operation on an attribute, all elements are set to 0. Vectors are generated for all attributes in a database application.
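The vector generation described above can be sketched as follows. The element order [C, NC, COU, OVU, CMU, D, U] comes from slide 9; the function name `lifecycle_vectors`, the input format (pairs of attribute name and characteristic) and the sample attribute names are assumptions made for this illustration.

```python
from collections import defaultdict

# Element order of the seven-element vector, as given on slide 9.
CHARACTERISTICS = ["C", "NC", "COU", "OVU", "CMU", "D", "U"]


def lifecycle_vectors(schema, observations):
    """Build one 0/1 vector per attribute in the schema.

    schema       -- iterable of all attribute names in the application
    observations -- iterable of (attribute, characteristic) pairs found in queries
    """
    seen = defaultdict(set)
    for attribute, characteristic in observations:
        seen[attribute].add(characteristic)
    # Attributes with no observed operation get an all-zero vector.
    return {
        attribute: [1 if c in seen[attribute] else 0 for c in CHARACTERISTICS]
        for attribute in schema
    }


# Hypothetical observations, for illustration only.
vectors = lifecycle_vectors(
    schema=["users.username", "users.userid", "users.note"],
    observations=[
        ("users.username", "C"),
        ("users.username", "U"),
        ("users.username", "OVU"),
        ("users.userid", "C"),
        ("users.userid", "U"),
    ],
)
print(vectors)
```

Because the vector only records whether a characteristic occurs at least once, repeated observations of the same characteristic do not change the result.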
  • 20. Experiment. A. Data Collection. We seed faults in open-source database applications to train our model. The systems were chosen according to three criteria: the source code is publicly available; the application is of considerable size, in terms of the number of transactions and attributes; and the system is mature enough to have very few faults associated with the attribute lifecycle.
  • 21. Experiment. The systems are "batavi", a web-based e-commerce system; "webERP", an accounting and business management system; "FrontAccounting", a professional web-based system; "OpenBusinessNetwork", an application designed for business; and "SchoolMate", a solution for school administrations.
  • 22. Experiment. Attribute lifecycles have a number of common patterns, and attributes that do not follow them tend to cause errors. We seeded the following common errors: 1) Missing function: attributes are provided, but a function for them is not catered for during program design. 2) Inconsistency design: the result of a transaction that updates an attribute by "cumulative update" is corrected using "overriding update". 3) Redundant function: new programs for different types of operations. 4) No update: new attributes without any update functions.
  • 23. Experiment. B. Experimental Design. Three classifiers are used to learn the prediction model. 1) C4.5 classifier: a decision tree classification algorithm that uses normalized information gain to split the data. The information gain of an attribute A is $\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$, where $\mathrm{Info}_A(D)$ is the information still needed to classify D after splitting on A.
  • 24. Experiment. Info(D) is defined as $\mathrm{Info}(D) = -\sum_{i} p_i \log_2 p_i$, where $p_i$ is the probability that an instance belongs to class i. In the training process, the classifier each time chooses the attribute with the highest normalized information gain to split the data, until all attributes are used.
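As a hedged illustration of the splitting criterion, the sketch below computes Info(D), the information gain of one attribute, and the gain normalized by the split information. It assumes that the slides' "normalized information gain" is the gain ratio used in C4.5; the function names and the toy data are not from the study.

```python
import math
from collections import Counter


def info(labels):
    """Entropy Info(D) = -sum(p_i * log2(p_i)) over the class proportions."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain of one attribute, normalized by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Info_A(D): expected information after splitting on the attribute.
    info_a = sum(len(g) / total * info(g) for g in groups.values())
    gain = info(labels) - info_a
    split_info = info(values)  # entropy of the attribute's own value distribution
    return gain / split_info if split_info > 0 else 0.0


# Toy data: one lifecycle vector element (0/1) and a class label per attribute.
element = [1, 1, 0, 0]
labels = ["normal", "normal", "no update", "no update"]
print(gain_ratio(element, labels))
```

An element that takes the same value for every instance carries no split information, so the sketch returns 0.0 for it instead of dividing by zero.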
  • 25. Experiment. 2) Naïve Bayes classifier: a generative probabilistic model based on Bayes' theorem, $P(C_i \mid X) = P(X \mid C_i)\,P(C_i) / P(X)$. Assuming that the attributes are independent given the class, $P(X \mid C_i) = \prod_{k} P(x_k \mid C_i)$. For a categorical value, the probability $P(x_k \mid C_i)$ is the proportion of the instances in class $C_i$ that have the attribute value $x_k$.
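A minimal categorical naïve Bayes in Python, written directly from the description above, may make the computation concrete. It estimates each conditional probability as a plain proportion, with no smoothing, which is an assumption of this sketch rather than a detail from the slides. The function names and the three-element toy vectors are hypothetical and are not data from the study.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class value frequencies for each position."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, position) -> value counts
    for row, label in zip(rows, labels):
        for k, value in enumerate(row):
            feature_counts[(label, k)][value] += 1
    return class_counts, feature_counts


def predict(model, row):
    """Return the class with the highest P(C) * product of P(x_k | C)."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior P(C)
        for k, value in enumerate(row):
            # Proportion of instances in this class that have this value.
            score *= feature_counts[(label, k)][value] / count
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy vectors with three elements (C, OVU, U); labels are illustrative only.
rows = [(1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 0, 1)]
labels = ["normal", "normal", "no update", "no update"]
model = train(rows, labels)
print(predict(model, (1, 0, 1)))
```

Since P(X) is the same for every class, the sketch compares only the numerators of Bayes' theorem.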
  • 26. Experiment. 3) SVM classifier: the Support Vector Machine (SVM) is based on statistical learning theory. It trains the classification model by searching for the hyperplane that maximizes the margin between classes.
  • 27. Experiment. C. Model Training. The attributes from the five systems are labelled to create the training set. Each attribute was manually checked and labelled as "missing function", "inconsistency design", "redundant function", "no update" or "normal".
  • 28. Experiment. The model was trained with the three classifiers. To evaluate the trained models, 10-fold cross validation was performed on the training set: the set was randomly partitioned into 10 folds, and each time 9 folds were used as the training set and 1 fold as the testing set. The measurements were then averaged.
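The 10-fold procedure described above can be sketched in plain Python. The helper names `k_fold_indices` and `cross_validate`, the fixed random seed and the `evaluate` callback are assumptions of this sketch; the slides do not say which tool performed the cross validation.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k randomly partitioned folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of nearly equal size
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(n, evaluate, k=10, seed=0):
    """Average the measurement returned by evaluate(train, test) over all folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k, seed)]
    return sum(scores) / len(scores)


# Example with a dummy measurement: the size of each test fold.
splits = list(k_fold_indices(25))
print(cross_validate(25, lambda train, test: len(test)))
```

In practice `evaluate` would train one of the three classifiers on the `train` indices and return a measurement such as pd, pf or precision on the `test` indices.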
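  • 29. Experiment. D. Assessing Performance. Three measurements are used, where tp, fp, tn and fn are the numbers of true positives, false positives, true negatives and false negatives: probability of detection pd = tp / (tp + fn); probability of false alarm pf = fp / (fp + tn); precision pr = tp / (tp + fp). Ideally, pd approaches 1 and pf approaches 0.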
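The three measurements can be computed from the confusion-matrix counts as shown below. The function name `detection_metrics` is hypothetical. In the usage example, tp = 98 and fp = 9 correspond to the 98 confirmed attributes out of 107 predicted that are reported in the Prediction section, while the fn and tn values are made up purely to complete the example.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return pd, pf and precision from confusion-matrix counts.

    No guard against empty denominators is included in this sketch.
    """
    return {
        "pd": tp / (tp + fn),  # probability of detection (recall)
        "pf": fp / (fp + tn),  # probability of false alarm
        "pr": tp / (tp + fp),  # precision
    }


# tp and fp follow the 98-of-107 figure; fn and tn are hypothetical.
metrics = detection_metrics(tp=98, fp=9, tn=891, fn=2)
print({name: round(value, 4) for name, value in metrics.items()})
```

With these counts the precision is 98 / 107, which rounds to the 91.59% quoted for the manual validation.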
  • 30. Experiment. Results: pd > 87% and pf < 1.81%. SVM performs better than C4.5, and C4.5 performs better than naïve Bayes. For SVM, pd > 95% and pf < 0.07%.
  • 31. Prediction. We applied the prediction model to four database applications to predict whether they contain attributes with missing function, inconsistency design, redundant function or no update. The model learned by SVM was applied to these systems, and the attributes predicted in each category were counted.
  • 32. Prediction. Designers can take corresponding actions to correct these design faults and incompleteness. Further, we manually validated all the predicted attributes. Of the 107 predicted attributes, 98 were confirmed to be real, giving a prediction precision of 91.59%.
  • 33. Conclusion. For each attribute, a set of characteristics that can be extracted from the code of database applications is used to characterize its lifecycle, and a characterization vector is formed. A data mining technique is applied to mine the attribute lifecycle using data collected from open-source database systems. We seed errors in mature systems to simulate design faults and build the training data set for our classification method. Five types of labelled attributes are obtained.
  • 34. Conclusion. A fault and incompleteness prediction model is then built. In our experiment, the model achieved 98.04% precision and 98.25% recall on average for SVM. We also applied the model to four open-source database applications for prediction. As future work, more comprehensive experiments on a larger set of systems are needed to further validate the merits of the proposed approach.
  • 35. References. [1] N. Nagappan and T. Ball, "Static Analysis Tools as Early Indicators of Pre-release Defect Density," in Proceedings of the 27th International Conference on Software Engineering. ACM, 2005, pp. 580–586. [2] A. Nikora and J. Munson, "Developing Fault Predictors for Evolving Software Systems," in Proceedings of Ninth International Software Metrics Symposium, 2003. IEEE, 2003, pp. 338–350. [3] A. Watson, T. McCabe, and D. Wallace, "Structured testing: A testing methodology using the cyclomatic complexity metric," NIST Special Publication, vol. 500, no. 235, pp. 1–114, 1996. [4] W. Fan, M. Miller, S. Stolfo, W. Lee, and P. Chan, "Using artificial anomalies to detect unknown and known network intrusions," Knowledge and Information Systems, vol. 6, no. 5, pp. 507–527, 2004.