
Log Engineering: Towards Systematic Log Mining to Support the Development of Ultra-large Scale Systems


Weiyi Shang's PhD Thesis 2014

Published in: Software

  1. Weiyi Shang. Supervisor: Dr. Ahmed E. Hassan. Log Engineering: Towards Systematic Log Mining to Support the Development of Ultra-large Scale Systems
  2. Automated profiling & instrumentation are widely used in software engineering. Overhead; No domain knowledge; Large scale.
  3. Logs are a valuable source of information about system execution. Field information; Developer experience. Example: foo() { … Log_statement("operation started"); … }
  4. Overview of log mining. Diagram: Software System, Log collection, Log transformation, Log analysis, Goal.
  5. Finding 1. Little research focuses on logging statements that reside in the source code. Finding 2. Little research focuses on logs generated during the development of a system.
     • Types of logs:
       • Platform logs: Hadoop logs [Tan et al.]
       • Application logs: Dell DVD store logs [Jiang et al.]
     • Sources of logs:
       • Logs from the field: [Kavulya et al.]
       • Logs during development: [Jiang et al.]
  6. Finding 3. Prior research primarily uses ad hoc log transformation techniques.
     • Abstracted logs: Log events [Jiang et al.]
     • Vectors or sets: Pairs [Jiang et al.], Sequence [Jiang et al.], Suffix arrays [Nagappan et al.], Time series [Bitincka et al.]
     • Graphs: State machines [Tan et al.], Directed graph [Nagappan et al.]
     • Matrices: [Lou et al.]
  7. Finding 4. Prior log mining research does not address the scalability challenges.
     • Simple calculation: filtering [Salfner et al.]
     • Directed graph-based algorithms: [Nagappan et al.]
     • Static analysis: [Yuan et al.]
     • Model checking: [Beschastnikh et al.]
     • Visualization: [De Pauw et al.]
     • Statistical methods: PCA [Xu et al.]
     • Data mining techniques: Co-occurrence analysis [Lou et al.]
     • Machine learning techniques: Prediction [Salfner et al.]
     • Other analysis techniques: Compression [Hassan et al.]
  8. Finding 5. There exists limited software log mining research to support software development activities.
     • Log mining platforms: [Bitincka et al.]
     • Log improvements: [Yuan et al.]
     • Log mining for system administration:
       • Anomaly detection [Xu et al.]
       • System monitoring [Rabkin et al.]
       • Workload recovery and capacity planning [Kavulya et al.]
     • Log mining for software engineering:
       • Program comprehension: [Beschastnikh et al.]
       • Software testing: [Jiang et al.]
       • Empirical studies: [Yuan et al.]
  9. Thesis statement: Logs are a valuable yet rarely explored source of knowledge about a software system and its operation. There is little research regarding the understanding and evolution of logs. Systematic and scalable log mining approaches are needed to support various software development activities (e.g., code quality improvement, large scale testing and deployment of ultra-large scale applications).
  10. Outline
     • Part 1: Study the challenges associated with understanding and evolving logging statements
       • What are the challenges in understanding logging statements? [Submitted to ICSM 2014]
       • How do logging statements evolve? [WCRE 2011, JSEP]
     • Part 2: Log engineering approaches to support software development activities
       • Prioritizing code review and testing efforts using logs and their churn. [EMSE]
       • Verifying deployment of Big Data Analytics applications using logs. [ICSE 2013]
  12. Motivation: Log understanding is challenging. User mailing lists of Hadoop, Cassandra and Zookeeper: 14 inquiries asked about 5 types of information. Bar chart of # inquiries per type (Meaning, Cause, Context, Solution, Impact); bar values as extracted: 2, 11, 1, 6, 1. [ICSM 2014 in submission]
  13. Approach: Attaching development knowledge to logs. Sources: code commits, issue reports, source code, code comments, call graph. [ICSM 2014 in submission]
  14. Development knowledge can resolve real-life inquiries. Development knowledge can help resolve 9 out of 14 real-life inquiries from the user mailing list. Bar chart of # answered and # not answered inquiries per type (Meaning, Cause, Context, Solution, Impact). [ICSM 2014 in submission]
  16. Motivation: How to keep Log Processing Apps in sync with logs? Diagram: Release 1, Release 2, Release 3. [WCRE 2011 best paper, JSEP]
  17. Approach: Studying log evolution at the execution level. Steps for an Enterprise Application (EA): System Deployment, Data Collection, Log Abstraction, yielding Log Events. Example: the log line "time=1, Trying to launch, TaskID=01A" is abstracted to "time=$t, Trying to launch, TaskID=$id". [WCRE 2011 best paper, JSEP]
  18. Approach: Studying log evolution at the code level. Steps on the source code: Generating Abstract Syntax Tree, Identifying logging statements, yielding Logging statements. Example: Log.info("time=%d, Trying to launch, TaskID=%s", time, taskid); corresponds to "time=$t, Trying to launch, TaskID=$id". [WCRE 2011 best paper, JSEP]
  19. Results [WCRE 2011 best paper, JSEP]
     • Quantity: How do logs evolve over time? Growing & changing; document & track.
     • Type: What types of modifications happen to logs? 8 types, mostly avoidable; maintenance effort.
     • Content: What information is conveyed by the short-lived logs? Implementation-level details; fragile.
  21. Approach: Building statistical models for post-release defects. Two logistic regression models are compared: one built from traditional metrics only, and one built from traditional metrics plus log-related metrics. [EMSE]
     • Are log-related metrics significant in the models?
     • How much explanatory power improvement can log-related metrics provide over traditional metrics?
  22. Approach: Defining log-related metrics. [EMSE]
     • Log-related metrics, product: log density; average logging level
     • Log-related metrics, process: log add density; log delete density; co-change of log and bug fix
     • Traditional metrics, product: lines of code
     • Traditional metrics, process: pre-release defects; total prior commits
  23. Results: There is a relationship between logging characteristics and software quality. Studied releases: 0.16.0 to 0.19.0 and 3.0 to 4.0. [EMSE]
     • In 7 out of 8 studied releases, at least one log-related metric is statistically significant in enhancing the model with only traditional metrics.
     • The log-related metrics provide up to 40% improvement over the explanatory power of the traditional metrics.
  25. How to verify the deployment of Big Data Analytics Apps? An app is developed with small sample data on a pseudo cloud, then deployed with big data on a real-life cloud: how to verify? [ICSE 2013 distinguished paper]
  26. Traditional approach for verifying BDA apps: keyword scan. Many false positives! Large results, too much effort to manually examine. [ICSE 2013 distinguished paper]
  27. Overview of our approach. Execution sequences are recovered from the run with small sample data on the pseudo cloud and from the run with big data on the real-life cloud, each on the underlying platform; comparing them gives the execution sequence delta. [ICSE 2013 distinguished paper]
  28. Comparing small and large runs. [ICSE 2013 distinguished paper]
     • Logs from testing run with small data, execution sequence: E1, E2, E3, E5, E6
     • Logs from run with large data, execution sequences: E1, E2, E3, E5, E6 and E1, E2, E3, E7, E5, E6
     • Execution sequence delta: E1, E2, E3, E7, E5, E6
  29. Results [ICSE 2013 distinguished paper]
     • Precision: How precise is our approach? Fewer false positives.
     • Effort reduction: How much effort reduction does our approach provide? Reduces logs for manual inspection by over 86%.
  30. Thesis contribution
     • We demonstrate the challenges of understanding logs.
     • We show that logging statements continually evolve.
     • We show that there is a relationship between logging characteristics and software defects.
     • We propose approaches that leverage logs to verify the deployment of Big Data Analytics applications.
  33. Where else can we find the requested information? Sources: code commit, issue reports, source code, code comments, call graph. Example log line: "fetch failure". Source code: From method checkAndInformJobTracker of file ShuffleScheduler.java
  34. Where else can we find the requested information? Code comments for "fetch failure": Notify the JobTracker after every read error, if `reportReadErrorImmediately' is true or after every `maxFetchFailuresBeforeReporting' failures
  35. Where else can we find the requested information? Call graph for "fetch failure": Called by method copyFailed in class ShuffleScheduler
  36. Where else can we find the requested information? Code commit for "fetch failure": Allow shuffle retries and read-error reporting to be configurable. Contributed by Amareshwari Sriramadasu.
  37. Where else can we find the requested information? Issue reports for "fetch failure": MAPREDUCE-1171. … This is caused by a behavioral change in hadoop 0.20.1. … …One solution I could see is "Provide a config option..."…
  38. Where else can we find the requested information? Summary for "fetch failure":
     • Meaning: There is a data reading error.
     • Cause: One of the possible reasons is a configuration.
     • Context: The event happens during the shuffle period, while copying data.
     • Impact: The event impacts the jobtracker.
     • Solution: Changing a configuration option would solve the issue. Amareshwari Sriramadasu is the expert to go to.
  39. Step 1: Log abstraction reduces the size of logs. Steps: log abstraction, log linking, simplifying sequences. Figure: example of log lines and the resulting execution events. [Jiang et al., JSME 2008]
  40. Step 2: Log linking provides context for logs. Figure: example of log lines and the resulting execution events.
  41. Step 3: Sequence simplification deals with repeated logs. Repeated logs: "task t1 read file A." "task t1 read file A." "task t1 read file A." Remove repetition and order of events.
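
The three steps above (log abstraction, log linking, sequence simplification) and the comparison of a small run against a large run can be sketched in Python. This is only an illustration under stated assumptions, not the implementation from the thesis: the regular expressions, the generic `$v` placeholder, the choice of `TaskID` as the linking key, the function names, and the sample log messages other than "Trying to launch" are all hypothetical.

```python
import re
from collections import defaultdict

# Assumption: dynamic values appear as key=value pairs, as in the slide's
# example "time=1, Trying to launch, TaskID=01A".
DYNAMIC_VALUE = re.compile(r"=[^,\s]+")
# Assumption: TaskID is the identifier used to link related log lines.
LINK_KEY = re.compile(r"TaskID=([^,\s]+)")


def abstract_line(line):
    """Step 1: replace every dynamic value with a generic placeholder."""
    return DYNAMIC_VALUE.sub("=$v", line)


def link_lines(lines):
    """Step 2: group abstracted events by the value of the linking key."""
    groups = defaultdict(list)
    for line in lines:
        match = LINK_KEY.search(line)
        if match:  # lines without the key are skipped in this sketch
            groups[match.group(1)].append(abstract_line(line))
    return dict(groups)


def simplify(events):
    """Step 3: drop repetition and order by turning a sequence into a set."""
    return frozenset(events)


def sequence_delta(small_run, large_run):
    """Return simplified sequences seen in the large run but not the small one."""
    small = {simplify(seq) for seq in link_lines(small_run).values()}
    large = {simplify(seq) for seq in link_lines(large_run).values()}
    return large - small


# Hypothetical log lines for a run with small data.
small_run = [
    "time=1, Trying to launch, TaskID=01A",
    "time=2, read file, TaskID=01A",
    "time=3, done, TaskID=01A",
]
# Hypothetical log lines for a run with large data.
large_run = [
    "time=1, Trying to launch, TaskID=07B",
    "time=2, read file, TaskID=07B",
    "time=3, done, TaskID=07B",
    "time=4, Trying to launch, TaskID=07C",
    "time=5, read file, TaskID=07C",
    "time=6, read file, TaskID=07C",  # repeated event, removed by simplify()
    "time=7, retry, TaskID=07C",      # event never seen in the small run
    "time=8, done, TaskID=07C",
]

delta = sequence_delta(small_run, large_run)
for sequence in delta:
    print(sorted(sequence))
```

In this sketch only the sequence for the task that logged the extra "retry" event ends up in the delta, so a reviewer would inspect one simplified sequence rather than every log line of the large run.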
