
Semantics-Aware Trace Analysis [PLDI 2009]

Aug 13, 2009

  1. Kevin Hoffman, Patrick Eugster, Suresh Jagannathan
  2. Roadmap: Motivation; Prior Approaches; Semantics-Aware Trace Analysis (SATA); Applying SATA to Regression Analysis; Evaluation; Conclusions
  3. Motivation
     - Apache XalanJ 2.4.1 works:
         java … xslt.Process -xsl case1.xsl -in test.xml
         java … xslt.Process -xsl case2.xsl -in test.xml
         java … xslt.Process -xsl case3.xsl -in test.xml
     - Upgrade to 2.5.1, now it's broken!
         java … xslt.Process -xsl case1.xsl -in test.xml
         java … xslt.Process -xsl case2.xsl -in test.xml
         java … xslt.Process -xsl case3.xsl -in test.xml
  4. How to find the cause?
     - Manual inspection is hard
     - 12 months of development from 2.4.1 to 2.5.1
     - 79K new or changed lines of code
     - 97 new features and bugfixes
  5. How to find the cause?
     - Debugging is hard
     - Separation of cause and effect
       - e.g., in XalanJ, bug in XSLT compiler
     - Complex web of interacting components
     - Debugging requires in-depth domain-specific knowledge (limited resource)
  6. Roadmap: Motivation; Prior Approaches; Semantics-Aware Trace Analysis (SATA); Applying SATA to Regression Analysis; Evaluation; Conclusions
  7. Challenges: Static Analysis
     - Dynamically generated code
     - Advanced language features
     - Dynamic dispatch (e.g., polymorphism)
     - Reflection
     - Advanced aspect-oriented language features
  8. Challenges: Dynamic Analysis
     - Dynamic program slicing
     - Slices are still quite large (e.g., 1000s of events)
     - Control-flow similarity metrics
     - State-space exploration / refinement
  9. Execution Indexing
     - Use structure/state of execution to compute an 'index' at each execution point
     - Find correlations between indices for profiling, debugging, execution comparison
  10. Roadmap: Motivation; Prior Approaches; Semantics-Aware Trace Analysis (SATA); Applying SATA to Regression Analysis; Evaluation; Conclusions
  11. Semantic Trace Views: organize execution traces into "views"
      Execution Trace:
        --> LOG-1.addMsg('Handling..')
          ...
        <-- LOG-1.addMsg(..)
        --> SP-1.setRequestType('text/html')
          --> STR-1.equals('text/html')
          <-- STR-1.equals(..) ret=true
          --> NUM-1.new(32, 127)
            set NUM-1._minCharRange = 32
            set NUM-1._maxCharRange = 127
          <-- NUM-1.new(..)
          set SP-1._binConv = NUM-1
          ...
          --> LOG-1.addMsg('Set req..')
            ...
          <-- LOG-1.addMsg(..)
        <-- SP-1.setRequestType(..)
  12. Semantic Trace Views: thread views based on thread ID
      Execution Trace (Thread View): the same trace as on slide 11
  13. Semantic Trace Views: method views based on top of call stack
      Execution Trace (and Thread View): the same trace as on slide 11
      Method View for SP.setRequestType:
        --> STR-1.equals('text/html')
        <-- STR-1.equals(..) ret=true
        --> NUM-1.new(32, 127)
        <-- NUM-1.new(..)
        set SP-1._binConv = NUM-1
        ...
        --> LOG-1.addMsg('Set req..')
        <-- LOG-1.addMsg(..)
  14. Semantic Trace Views: method views based on top of call stack
      Execution Trace (and Thread View): the same trace as on slide 11
      Method View for NUM.new:
        set NUM-1._minCharRange = 32
        set NUM-1._maxCharRange = 127
      Method View for LOG.addMsg:
        ...
  15. Semantic Trace Views: active object views based on top of call stack
      Execution Trace (and Thread View): the same trace as on slide 11
      Active Object View for NUM-1.new:
        set NUM-1._minCharRange = 32
        set NUM-1._maxCharRange = 127
      Active Object View for LOG-1.addMsg:
        ...
  16. Semantic Trace Views: target object views
      Execution Trace (and Thread View): the same trace as on slide 11
      Target Object View for NUM-1:
        --> NUM-1.new(32, 127)
        set NUM-1._minCharRange = 32
        set NUM-1._maxCharRange = 127
        <-- NUM-1.new(..)
      Target Object View for LOG-1:
        --> LOG-1.addMsg('Handling..')
        <-- LOG-1.addMsg(..)
        --> LOG-1.addMsg('Set req..')
        <-- LOG-1.addMsg(..)
  17. Semantic Trace Views: views are linked, allowing for multilevel analysis
      Shown together: the execution trace (and thread view), the target object view for NUM-1, the method view for SP.setRequestType, and the target object view for LOG-1, as on slides 11, 13, and 16
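The view construction shown on slides 11 to 17 can be sketched as a single grouping pass over the trace. This is only an illustration, not RPrism's implementation: `Entry`, its fields, `build_views`, and the thread ID `T1` are hypothetical, and the sketch assumes each entry already records its thread, the method and object on top of the call stack, and the object it targets. Keying active object views by object alone is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    text: str           # printable form of the trace entry
    thread: str         # ID of the executing thread (assumed field)
    active_method: str  # method on top of the call stack when the entry was logged
    active_object: str  # receiver of that method
    target_object: str  # object the entry acts on


def build_views(trace):
    """Group trace positions into thread, method, active-object and target-object views."""
    views = defaultdict(list)
    for index, entry in enumerate(trace):
        views[("thread", entry.thread)].append(index)
        views[("method", entry.active_method)].append(index)
        views[("active_object", entry.active_object)].append(index)
        views[("target_object", entry.target_object)].append(index)
    return views


# Excerpt of the slide 11 trace, starting inside SP-1.setRequestType.
trace = [
    Entry("--> STR-1.equals('text/html')", "T1", "SP.setRequestType", "SP-1", "STR-1"),
    Entry("<-- STR-1.equals(..) ret=true", "T1", "SP.setRequestType", "SP-1", "STR-1"),
    Entry("--> NUM-1.new(32, 127)", "T1", "SP.setRequestType", "SP-1", "NUM-1"),
    Entry("set NUM-1._minCharRange = 32", "T1", "NUM.new", "NUM-1", "NUM-1"),
    Entry("set NUM-1._maxCharRange = 127", "T1", "NUM.new", "NUM-1", "NUM-1"),
    Entry("<-- NUM-1.new(..)", "T1", "SP.setRequestType", "SP-1", "NUM-1"),
    Entry("set SP-1._binConv = NUM-1", "T1", "SP.setRequestType", "SP-1", "SP-1"),
]

views = build_views(trace)
for index in views[("target_object", "NUM-1")]:
    print(trace[index].text)
```

Because every view stores positions in the original trace rather than copies of entries, an entry found in one view can be looked up in any other view that contains the same position, which is one simple way to get the linking that slide 17 mentions.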
  18. Roadmap: Motivation; Prior Approaches; Semantics-Aware Trace Analysis (SATA); Applying SATA to Regression Analysis; Evaluation; Conclusions
  19. What if we just used diff?
      - Collect dynamic traces:
          2.4.1: sssdyntracer … xslt.Process -xsl case2.xsl -in test.xml
          2.5.1: sssdyntracer … xslt.Process -xsl case2.xsl -in test.xml
      - Traces are about 48K entries
  20. What if we just used diff?
      - Collect dynamic traces:
          2.4.1: sssdyntracer … xslt.Process -xsl case2.xsl -in test.xml
          2.5.1: sssdyntracer … xslt.Process -xsl case2.xsl -in test.xml
      - Traces are about 48K entries
      - Run "diff" tool on traces:
        - Requires 25 minutes on a 1.8 GHz x64 CPU
        - Requires 27 GB of RAM
        - Produces 1594 differences (3.3% of trace)
  21. Challenges of diff / LCS
      [Figure: Old vs. New trace alignment]
      - diff is based on the LCS algorithm:
        - Intractable on large traces: Ω(n²)
        - Can't detect moved sequences
        - Is not semantics-aware
      - diff produces too many differences
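To make the cost and the moved-sequence limitation concrete, here is a textbook LCS table, not the code of any particular diff tool. The two letter sequences are the main-view example used on slides 24 to 44; `lcs_table` is a hypothetical helper name.

```python
def lcs_table(old, new):
    """Fill the classic LCS dynamic-programming table (suffix form)."""
    rows, cols = len(old) + 1, len(new) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(len(old) - 1, -1, -1):
        for j in range(len(new) - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


old, new = "CBDHXYFEFZ", "CADFEFXYZ"
table = lcs_table(old, new)

lcs_len = table[0][0]                                    # longest common subsequence length
cells = len(table) * len(table[0])                       # table size grows as n * m
reported = (len(old) - lcs_len) + (len(new) - lcs_len)   # entries left unmatched
print(lcs_len, cells, reported)  # 6 110 7

# The same table for two traces of about 48K entries each:
cells_48k = 48_000 * 48_000
print(cells_48k)  # 2304000000
```

In this example the common subsequence is C D F E F Z, so X and Y, which merely moved, are counted as two deletions plus two insertions alongside the real changes B, H, and A.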
  22. Leveraging Semantic Views
      - Use secondary views (method/object) to find correlations in primary view (thread)
      - Robust against reorderings in other views
      - Correlations are semantically sound
      - Apply LCS/diff over fixed-sized windows in primary view to find 'best overall correlation' in primary view
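The last point, LCS over fixed-size windows, can be sketched as follows. This is a simplified illustration under stated assumptions, not the RPrism algorithm: `lcs_pairs`, `windowed_match`, and the default window of 4 are hypothetical, entries are compared by plain equality, and the exploration of secondary views is left out entirely.

```python
def lcs_pairs(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def windowed_match(old, new, window=4):
    """Scan both traces in lock-step; on a mismatch, resynchronize with LCS over a small window."""
    matches, i, j = [], 0, 0
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            matches.append((i, j))
            i, j = i + 1, j + 1
            continue
        pairs = lcs_pairs(old[i:i + window], new[j:j + window])
        if not pairs:
            # Nothing correlates inside this window: treat it as differing on both sides.
            i, j = i + window, j + window
            continue
        # Resume the lock-step scan at the first correlated pair inside the window.
        i, j = i + pairs[0][0], j + pairs[0][1]
    return matches


matches = windowed_match("CBDHXYFEFZ", "CADFEFXYZ", window=4)
print(matches)
```

Each LCS call touches at most `window * window` table cells, so memory stays bounded regardless of trace length. The trade-off is that a correlation farther away than one window is missed, which is where the secondary views come in.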
  23. Recall: What LCS would produce
      [Figure: Old vs. New trace alignment]
  24. View-based Semantic Differencing
      Main view: Old: C B D H X Y F E F Z; New: C A D F E F X Y Z
  25. View-based Semantic Differencing
      Main view: Old: C B D H X Y F E F Z; New: C A D F E F X Y Z
      Secondary view: Old: D H X Y Z; New: D X Y Z
      Step: view construction (only one of many secondary views displayed here)
  26. View-based Semantic Differencing (views as on slide 25). Step: lock-step scanning of main view
  27. View-based Semantic Differencing (views as on slide 25). Step: lock-step scanning of main view
  28. View-based Semantic Differencing (views as on slide 25). Step: discovery of correlating secondary views
  29. View-based Semantic Differencing (views as on slide 25). Step: exploration of correlating secondary views
  30. View-based Semantic Differencing (views as on slide 25). Step: exploration of correlating secondary views
  31. View-based Semantic Differencing (views as on slide 25). Step: exploration of correlating secondary views
  32. View-based Semantic Differencing (views as on slide 25). Step: exploration of correlating secondary views
  33. View-based Semantic Differencing (views as on slide 25). Step: exploration of correlating secondary views
  34. View-based Semantic Differencing (views as on slide 25). Step: lock-step scanning of main view
  35. View-based Semantic Differencing (views as on slide 25). Step: lock-step scanning of main view
  36. View-based Semantic Differencing (views as on slide 25). Step: lock-step scanning of main view; exploration of secondary views
  37. View-based Semantic Differencing (views as on slide 25). Step: apply LCS over fixed-size window in main view to find the next correlation
  38. View-based Semantic Differencing (views as on slide 25). Step: apply LCS over fixed-size window in main view to find the next correlation
  39. View-based Semantic Differencing (views as on slide 25). Step: lock-step scanning of main view
  40. View-based Semantic Differencing (views as on slide 25). Step: lock-step scanning of main view
  41. View-based Semantic Differencing (views as on slide 25). Step: lock-step scanning of main view
  42. View-based Semantic Differencing (views as on slide 25). Step: apply LCS over fixed-size window in main view to find the next correlation
  43. View-based Semantic Differencing (views as on slide 25). Step: lock-step scanning of main view
  44. View-based Semantic Differencing (views as on slide 25). Result: view-based differencing identified moved sequences properly
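The effect of the secondary view in this walkthrough can be reproduced with a much simpler two-pass sketch. It is not the interleaved scan-and-explore procedure of the slides: for brevity it runs a full LCS on the main view, then uses LCS on one secondary view to reclassify unmatched entries as moved. The function names and the representation of a view as a list of main-view positions are assumptions.

```python
def lcs_pairs(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def diff_with_secondary_view(old, new, old_view, new_view):
    """Classify main-view entries as moved, removed, or added using one secondary view."""
    main = lcs_pairs(old, new)
    matched_old = {i for i, _ in main}
    matched_new = {j for _, j in main}
    # Correlate the secondary views, then map their positions back to the main view.
    secondary = lcs_pairs([old[i] for i in old_view], [new[j] for j in new_view])
    moved = [(old_view[a], new_view[b]) for a, b in secondary
             if old_view[a] not in matched_old and new_view[b] not in matched_new]
    moved_old = {i for i, _ in moved}
    moved_new = {j for _, j in moved}
    removed = [i for i in range(len(old)) if i not in matched_old | moved_old]
    added = [j for j in range(len(new)) if j not in matched_new | moved_new]
    return moved, removed, added


old, new = "CBDHXYFEFZ", "CADFEFXYZ"
old_view = [2, 3, 4, 5, 9]   # D H X Y Z as positions in the old main view
new_view = [2, 6, 7, 8]      # D X Y Z as positions in the new main view
moved, removed, added = diff_with_secondary_view(old, new, old_view, new_view)
print([(old[i], new[j]) for i, j in moved])   # [('X', 'X'), ('Y', 'Y')]
print([old[i] for i in removed], [new[j] for j in added])  # ['B', 'H'] ['A']
```

With the secondary view, X and Y are recognized as the same entries in a different position, leaving only B, H, and A as genuine differences.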
  45. View-Based Differencing vs. LCS
      - Collect dynamic traces:
          2.4.1: sssdyntracer … xslt.Process -xsl case2.xsl -in test.xml
          2.5.1: sssdyntracer … xslt.Process -xsl case2.xsl -in test.xml
      - Traces are about 48K entries
      - Run view-based differencing tool on traces:
        - Requires 0.3 minutes instead of 25 minutes
        - Requires 0.1 GB instead of 27 GB of RAM
        - Produces 598 differences (1.2% of trace)
          - vs. 1594 differences (3.3% of trace) for LCS
  46. Regression Analysis Process
      [Figure: process diagram. Labels: Old Program; New Program; AspectJ Load-time Weaver; Tracing Aspects; Old Program w/ Instrumentation; New Program w/ Instrumentation; Regressing Test Case; Working Test Case(s) (but similar); Trace; Traces for 4 Cases; View Differencing; RPrism Analysis Algorithm; Likely Regression Causes]
  47. RPrism Analysis Algorithm
      - Suspected differences set: Old Program, Regressing Test Case vs. New Program, Regressing Test Case
      - Expected differences set: Old Program, Working Test Case vs. New Program, Working Test Case
      - Regression differences set: New Program, Working Test Case vs. New Program, Regressing Test Case
  48. RPrism Analysis Algorithm
      [Figure: Venn diagram of the suspected, expected, and regression differences sets, with a region labeled Results]
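The slides do not spell out how the three sets are combined. The sketch below assumes one reading of the Venn diagram and of the backup notes on slide 63: take the suspected differences, subtract the expected ones, and keep only those that also appear in the regression set. The function name and the difference labels `d1` to `d6` are hypothetical placeholders.

```python
def likely_regression_causes(suspected, expected, regression):
    """Assumed combination of the three difference sets: (A - B) intersected with C."""
    return (suspected - expected) & regression


# Toy difference sets; each label stands for one difference found by trace differencing.
suspected = {"d1", "d2", "d3", "d4"}   # old vs. new program, regressing test case
expected = {"d2", "d5"}                # old vs. new program, working test case
regression = {"d1", "d3", "d6"}        # new program, working vs. regressing test case

causes = likely_regression_causes(suspected, expected, regression)
print(sorted(causes))  # ['d1', 'd3']
```

Under this reading, `d2` drops out because it also occurs on a working test case, and `d4` drops out because it does not separate the working run from the regressing run of the new program.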
  49. Roadmap: Motivation; Prior Approaches; Semantics-Aware Trace Analysis (SATA); Applying SATA to Regression Analysis; Evaluation; Conclusions
  50. 4 Regressions on 3 Projects
      - Daikon
        - Dynamic invariant detector from MIT
        - Used as a test subject in 11 other publications
      - Apache XalanJ
        - Implements XML XPath and XSLT
        - Interprets XSLT or compiles XSLT to Java bytecode
        - Used in Sun JDK to implement javax.xml.* classes
      - Apache Derby (720 KLOC)
        - Embedded or client/server relational DB
        - AKA Sun Java DB, included in JDK 6
  51. Daikon Regression
      - About Daikon
        - 169 KLOC, 1100 classes
        - Dynamic invariant detector from MIT
        - Used as a test subject in 11 other publications
      - About the Regression
        - Regression first studied by JUnit/CIA [FSE '06]
          - 1 week of differences
        - Execution traces about 15K entries in length
  52. Daikon Regression
      - 42 differences before, 3 after analysis
      - Same accuracy as LCS
      - 12.9x speedup
      - 12.1 times less memory
  53. XalanJ-1725 Regression
      - About XalanJ
        - 365 KLOC, 1500 classes
        - Implements XPath and XSLT for XML
        - Used by Sun to implement javax.xml.* classes
      - About the Regression
        - Regression from version 2.5.1 to 2.5.2
          - 4 months of code changes, 84 major changes
        - Execution traces about 98K entries in length
        - Regressing behavior exhibited within dynamically generated code
  54. XalanJ-1725 Regression
      - 296 differences before, 1 after analysis
      - LCS failed to find the regression cause
      - 82.8x speedup
      - 269 times less memory
  55. XalanJ-1802 Regression
      - About XalanJ
        - 365 KLOC, 1500 classes
        - Implements XPath and XSLT for XML
        - Used by Sun to implement javax.xml.* classes
      - About the Regression
        - Regression from version 2.4.1 to 2.5.1
          - 79K changed code over 12 months
          - 97 bugfixes and feature enhancements
        - Execution traces about 44K entries in length
        - Regressing behavior exhibited within a completely rearchitected module
  56. XalanJ-1802 Regression
      - 184 differences before, 10 after analysis
      - Same accuracy as LCS
      - 9.4x speedup
      - 35.4 times less memory
  57. Derby-1633 Regression
      - About Derby
        - 720K lines of code
        - Embedded or client/server relational DB
        - AKA Sun Java DB, included in JDK 6
      - About the Regression
        - Regression from version 10.1.2.1 to 10.1.3.1
          - 7 months of changes, 9 enhancements, 97 bugfixes
        - Execution traces about 335K entries in length
        - Involves multiple threads, larger code base (2x), and longer running traces (3x)
  58. Derby-1633 Regression
      - 2663 differences before, 6 after analysis
      - LCS completely failed (out of memory failure at 32 GB)
  59. Roadmap: Motivation; Prior Approaches; Semantics-Aware Trace Analysis (SATA); Applying SATA to Regression Analysis; Evaluation; Conclusions
  60. Summary / Future Directions
      - New view-based model for traces
      - Facilitates semantics-aware dynamic analyses
      - One application is efficient trace differencing
      - Full formal framework in paper
      - Other potential applications:
        - Race detection
        - Object-protocol enforcement
        - Data-mining from traces
        - Malware detection
  61. Download RPrism, try it out!
      http://cs.purdue.edu/homes/kjhoffma/rprism/
      Contact Information: Kevin Hoffman, kjhoffma@cs.purdue.edu
  62. View-based Diff vs LCS
  63. Regression Cause Analysis
      - Factors affecting false negatives:
        - Dynamic traces are complete, so set A must contain the cause
        - Differences in set B produced correct output, so they are not likely to contain the direct regression cause
        - Intersecting with set C can introduce false negatives (e.g., regression caused by code removal)
      - Factors affecting false positives:
        - Choice of similar test case affects quality of set B
        - Intersecting/subtracting set C also helps
      - Set A is the suspected differences set; set B is the expected differences set; set C is the regression differences set
  64. Lock-step Scanning of Main View
  65. Lock-step Scanning of Main View
  66. Exploration of Secondary Views with LCS
  67. Apply LCS over Fixed-size Window in Main View to Find the Next Correlation
  68. Exploration of Secondary Views with LCS
  69. Apply LCS over Fixed-size Window in Main View to Find the Next Correlation
  70. Lock-step Scanning of Main View