SlideShare una empresa de Scribd logo
1 de 33
On the Role of Fitness,
Precision, Generalization and
Simplicity in Process
Discovery

Joos Buijs
Boudewijn van Dongen
Wil van der Aalst




        http://www.win.tue.nl/coselog/
Advances in Process Mining

• Many process discovery and conformance checking algorithms
  and tools are available (cf. the various ProM packages).
• Also commercial software based on these ideas:
  Disco (Fluxicon), Reflect (Futura/Perceptive), BPMOne (Pallas Athena/Perceptive),
  ARIS Process Performance Manager (Software AG), Interstage Automated Process
  Discovery (Fujitsu), QPR ProcessAnalyzer/Analysis (QPR Software), flow
  (fourspark), Discovery Analyst (StereoLOGIC), etc.
• We applied process mining in over 100 organizations.



                    More than 75 people
                    involving more than 50
                    organizations created the
                    Process Mining Manifesto in
                    the context of the IEEE Task
                    Force on Process Mining.

                    Available in 13 languages
                                                                              PAGE 1
Example Process Discovery
(Vestia, Dutch housing agency, 208 cases, 5987 events)




                                                         PAGE 2
Example: Conformance Checking
(WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988)




                                                                        PAGE 3
Challenge:
Four Competing Quality Criteria

 “able to replay event log”                 “Occam’s razor”

replay fitness                                simplicity

                               process
                              discovery



generalization                                precision
 “not overfitting the log”                “not underfitting the log”



                                                                PAGE 4
Example: one log four models
                                                                                                               b
                                                                                                            examine
                                                                                                           thoroughly
                                                                                                                                                                            g
                                                                                                                                                                         pay
                                                                                                               c                                                     compensation
                                                                                          a                examine                                 e
                                                                           start     register              casually                           decide                                   end
                                                                                                                                                                                                     #      trace
                                                                                     request
                                                                                                                                                                            h                        455 acdeh
                                                                                                               d                                                         reject
                                                                                                          check ticket                                                  request                      191 abdeg
                                                                                                                                               f     reinitiate
                                                                                                                                                      request                                        177 adceh
                                                                               N1 : fitness = +, precision = +, generalization = +, simplicity = +
                                                                                                                                                                                                     144 abdeh
                                                                                                                                                                                                     111 acdeg
                                                                                     a               c                        d                          e                      h
                                                                                                                                                                                                      82 adceg
                                                                          start    register       examine                   check                      decide                reject     end
                                                                                   request        casually                  ticket                                          request
                                                                                                                                                                                                      56 adbeh
                                                                               N2 : fitness = -, precision = +, generalization = -, simplicity = +
                                                                                                                                                                                                      47 acdefdbeh
 “able to replay event log”                 “Occam’s razor”
                                                                                                                                                                                                      38 adbeg
                                                                                                       examine                                check
                                                                                                      thoroughly        b             d       ticket                        g                         33 acdefbdeh
replay fitness                                simplicity                                                                                                            pay
                                                                                                                                                                compensation
                                                                                          a                                                                                                           14 acdefbdeg
                                                                           start     register   examine
                                                                                                             c                                                                         end            11 acdefdbeg
                                                                                     request    casually
                                                                                                                         e                f        reinitiate               h
                               process                                                                        decide                                request        reject
                                                                                                                                                                  request
                                                                                                                                                                                                         9 adcefcdeh
                              discovery                                        N3 : fitness = +, precision = -, generalization = +, simplicity = +                                                       8 adcefdbeh
                                                                                                                                                                                                         5 adcefbdeg
                                                                                       a              d                        c                           e                    g
                                                                                                                                                                                                         3 acdefbdefdbeg
generalization                                precision                             register
                                                                                    request
                                                                                                    check
                                                                                                    ticket
                                                                                                                         examine
                                                                                                                         casually
                                                                                                                                                        decide              pay
                                                                                                                                                                        compensation
                                                                                                                                                                                                         2 adcefdbeg
                                                                                       a              c                        d                          e                     g                        2 adcefbdefbdeg
 “not overfitting the log”                “not underfitting the log”                register      examine                    check                      decide              pay
                                                                                    request       casually                   ticket                                     compensation                     1 adcefdbefbdeh
                                                                                       a              d                        c                           e                    h                        1 adbefbdefdbeg
                                                                                    register        check                examine                        decide                reject
                                                                                    request         ticket               casually                                            request                     1 adcefdbefcdefdbeg
                                                                                      a               c                       d                           e                     h                   1391
                                                                       start                                                                                                                  end
                                                                                   register       examine                   check                      decide                reject
                                                                                   request        casually                  ticket                                          request


                                                                                                 …                 (all 21 variants seen in the log)


                                                                                      a              b                        d                           e                     g
                                                                                   register        examine                  check                      decide               pay
                                                                                   request        thoroughly                ticket                                      compensation

                                                                                      a              d                        b                           e                     h
                                                                                   register         check                 examine                      decide                reject
                                                                                   request          ticket               thoroughly                                         request

                                                                                      a              b                        d                           e                     h
                                                                                   register        examine                  check                      decide                reject
                                                                                   request        thoroughly                ticket                                          request                        PAGE 5
                                                                                N4 : fitness = +, precision = +, generalization = -, simplicity = -
#     trace
                                                                               455 acdeh
        Model N1                                                               191 abdeg
                                                                               177 adceh
                                                                               144 abdeh
                                                                               111 acdeg
                                                                                82 adceg
                                                                                56 adbeh
                          b                                                     47 acdefdbeh
                       examine
                      thoroughly                                                38 adbeg
                                                              g                 33 acdefbdeh
                                                             pay
                          c                              compensation           14 acdefbdeg
           a          examine               e
                                                                                11 acdefdbeg
start    register     casually          decide                          end
         request                                                                   9 adcefcdeh
                                                              h
                          d                                 reject                 8 adcefdbeh
                     check ticket                          request                 5 adcefbdeg
                                        f   reinitiate                             3 acdefbdefdbeg
                                             request
N1 : fitness = +, precision = +, generalization = +, simplicity = +                2 adcefdbeg
                                                                                   2 adcefbdefbdeg
                                                                                   1 adcefdbefbdeh
                                                                                   1 adbefbdefdbeg
                                                                                   1 adcefdbefcdefdbeg
                                                                                             PAGE 6
                                                                              1391
#     trace
                                                                              455 acdeh
        Model N2                                                              191 abdeg
                                                                              177 adceh
                                                                              144 abdeh
                                                                              111 acdeg
                                                                               82 adceg
                                                                               56 adbeh
                                                                               47 acdefdbeh
                                                                               38 adbeg
          a           c             d            e             h               33 acdefbdeh
start   register   examine        check        decide         reject   end     14 acdefbdeg
        request    casually       ticket                     request
   N2 : fitness = -, precision = +, generalization = -, simplicity = +         11 acdefdbeg
                                                                                  9 adcefcdeh
                                                                                  8 adcefdbeh
                                                                                  5 adcefbdeg
                                                                                  3 acdefbdefdbeg
                                                                                  2 adcefdbeg
                                                                                  2 adcefbdefbdeg
                                                                                  1 adcefdbefbdeh
                                                                                  1 adbefbdefdbeg
                                                                                  1 adcefdbefcdefdbeg
                                                                                            PAGE 7
                                                                             1391
#     trace
                                                                                          455 acdeh
        Model N3                                                                          191 abdeg
                                                                                          177 adceh
                                                                                          144 abdeh
                                                                                          111 acdeg
                                                                                           82 adceg
                                                                                           56 adbeh
                                                                                           47 acdefdbeh
                           examine                  check
                          thoroughly    b   d       ticket                     g           38 adbeg
                                                                        pay                33 acdefbdeh
                                                                    compensation
           a                                                                               14 acdefbdeg
start    register   examine                                                        end     11 acdefdbeg
         request    casually   c
                                        e       f      reinitiate
                                                                      reject
                                                                               h              9 adcefcdeh
                               decide                   request
                                                                     request                  8 adcefdbeh
 N3 : fitness = +, precision = -, generalization = +, simplicity = +
                                                                                              5 adcefbdeg
                                                                                              3 acdefbdefdbeg
                                                                                              2 adcefdbeg
                                                                                              2 adcefbdefbdeg
                                                                                              1 adcefdbefbdeh
                                                                                              1 adbefbdefdbeg
                                                                                              1 adcefdbefcdefdbeg
                                                                                                        PAGE 8
                                                                                         1391
#     trace
                                                                                               455 acdeh
Model N4                                                                                       191 abdeg
                                                                                               177 adceh
                                                                                               144 abdeh
              a             d                 c                e              g                111 acdeg
           register       check           examine            decide          pay
           request        ticket          casually                       compensation           82 adceg
              a             c                 d                e              g                 56 adbeh
           register     examine             check           decide           pay
           request      casually            ticket                       compensation           47 acdefdbeh
              a             d                 c                e              h                 38 adbeg
           register       check           examine           decide           reject
           request        ticket          casually                          request             33 acdefbdeh
              a            c                 d                e               h                 14 acdefbdeg
start                                                                                   end
           register     examine            check            decide           reject
           request      casually           ticket                           request             11 acdefdbeg

                       …             (all 21 variants seen in the log)
                                                                                                   9 adcefcdeh
                                                                                                   8 adcefdbeh
                                                                                                   5 adcefbdeg
             a             b                 d                e               g
           register     examine            check            decide           pay                   3 acdefbdefdbeg
           request     thoroughly          ticket                        compensation
                                                                                                   2 adcefdbeg
             a             d                 b                e               h
           register       check            examine          decide           reject                2 adcefbdefbdeg
           request        ticket          thoroughly                        request
                                                                                                   1 adcefdbefbdeh
             a             b                 d                e               h
          register       examine           check            decide          reject                 1 adbefbdefdbeg
          request       thoroughly         ticket                          request
                                                                                                   1 adcefdbefcdefdbeg
        N4 : fitness = +, precision = +, generalization = -, simplicity = -
                                                                                                             PAGE 9
                                                                                              1391
Another challenge: Huge search space

M    M M         M              M
  M M                M M             M M         M              M
        M   M     M M            M M                 M M
M                       M   M           M M       M M
  M M     M MM
                  M M     M    M    M     M MM          M M    M M
                                 M                M M     M      M
M M M M M M            M M             M M                    M M
                    M           M M          M M M M M
MM    M MM M          M MM M                M M
     M M M      MM               M MM M         MM    M MM      MM
MM    M M M M M M            M          M            M M       M
     M M M M M M M M M M MM M M M M
  M    M M M M         M M    M M       M    M M MM M M M M
    M M             M M                M M M M         M M M M
          M   MM              M    M M              M M
  M                       M               M MM
    M M     M MM
                    M M     M    M
                                   M  M     M MM          M M M M MM
  M M M M                M M MM          M M        M M     M
                  M M                                    M M    M M
  MM    M MM M          M MM M
                                     M
                                             M  M M M
      M M      M MM
                      M M          M MM M M M M         M MM M MM
   MM           M M            M          M           M M     M M
       M              M            M          M
  M       M MM          M M M           M M M       M M
    M M      M MM
                    M M      M    M
                                    M M      M MM         M M M M MM
   M M M M               M M M M                    M M     MM
                   M M                    M M                    M M
        M MM M                        M            M M M M
  MM
       M M      M MM    M MM M
                                    M M M MM M          M MM M MM
                       M M                M     M MM          M M
                                                       M M

                                                               PAGE 10
… with just a few interesting candidates

M    M M         M
  M M                M M       M    M M        M              M
        M   M     M M           M M                 M M
M                       M M M          M M      M M
  M M     M MM
                  M M     M        M     M MM          M M   M M
                                M               M M     M      M
M M M M M M            M M     M M    M M                   M M
      M MM M        M                       M M M M M
MM                    M MM M              M M
     M M M      MM
                     M M    M   M MM M         MM    M MM     MM
      M M M M
MM
     M M                             M MM M M M M M          M
            M M MM M       M MM       M                         M
  M
    M M
       M M M M         M M   M M M M M M MM M M M M M
          M   MM    M M           M              M   M M M M
  M                       M  M         MM   MM    M M
    M M     M MM
                    M M     M   M
                                  M  M     M MM         M M M M MM
  M M M M                M MM M M       M M       M M     M
                  M M                                         M M
        M MM M            M                   M M M M M
  MM
      M M      M MM     M    M MM M M M               M MM M MM
   MM           M M   M M     M      M M      M MM          M M
       M              M           M                  M M
          M MM          M M M               M      M M
  M                                    M M M
    M M      M MM
                    M M     M    M
                                   M M     M MM          M M M M MM
   M M M M                M M M M                  M M    MM
                   M M                  M M             M M    M M
  MM    M M   M M       M MM M
                                     M
                                            M  M M M
       M M      M MM
                       M M         M MM M M M M        M MM M MM
                                         M            M M   M M
                                                              PAGE 11
Two requirements

1. It should be possible to      “able to replay event log”                 “Occam’s razor”

                                replay fitness                                simplicity
   seamlessly balance the
   different quality criteria                                  process
                                                              discovery

   based on user-defined        generalization                                precision
   preferences.                  “not overfitting the log”                “not underfitting the log”




2. The algorithm should
   always return a "correct"
   process model and not
   waste time on model
   having deadlocks and
   other anomalies.
                                                                                     PAGE 12
Proposal: Evolutionary Tree Miner (ETM)

• Process trees as representation (= limit search space
  to "good" models).
• Genetic approach (= very flexible)
• Fitness function uses all four criteria (= seamlessly
  balance the different "forces")




                                                    PAGE 13
Representational Bias: Process Trees

         →           • Always sound because of the
                       block structure            A
     A       →       • Also Loop and OR operator
         ∧       E
                                            B    C       D
     X       D
                         A     B
 B       C                A (BA)*                E




                                                      PAGE 14
Petri Net Semantics
(used for comparison and conformance checking only)




                                                A
             A         B

Sequence
                           Parallellism         B
                   A

                   B                            A
Exclusive Choice

                   A
                                                B
                   B        Or Choice
Loop




                                                      PAGE 15
Steps of the Genetic ETM Algorithm



              Change
             Population

                            No
  Create                                 Return
              Measure     Stop
  Initial                                 Best
              Quality      ?     Yes
Population                             Individual




                                               PAGE 16
Population Change


                           Elite




Population i                               Population i+1




                         Selection




               Replace   Crossover   Mutation

                                                            PAGE 17
Four Metrics (see paper)

          “able to replay event log”                 “Occam’s razor”

         replay fitness                                simplicity

                                        process
                                       discovery



         generalization                                precision
          “not overfitting the log”                “not underfitting the log”




                                                                                1 = optimal
                                                                                0 = very bad

                                                                                       PAGE 18
Example



                              B            E

                 A            C                       G

                              D            F




          A = send e-mail, B = check credit,
          C = calculate capacity, D = check system,
          E = accept, F = reject, G = send e-mail
                                                          PAGE 19
Conventional Algorithms (1/3)
   ("best effort" mapping to process trees to allow for comparison)

alpha miner




                                                     low fitness
ILP miner




                                                    low precision
language-based region miner




                                                     low fitness
                                                                    PAGE 20
Conventional Algorithms (2/3)

heuristic miner




multi-phase miner




                                   PAGE 21
Conventional Algorithms (3/3)

genetic miner




state-based region miner




                                  PAGE 22
Often unsound result and no mechanism
to seamlessly balance the four criteria




                                     PAGE 23
Genetic Mining (ETM) While Considering
    Only One Criterion




                                                          best value
                                                           possible
                                                          for this log




ETM with weight zero to three out of four perspectives.                  PAGE 24
Considering Replay Fitness and One
    Other Criterion




ETM with weight zero to two out of four perspectives.   PAGE 25
Considering 3 of 4 Criteria




     replay fitness needs to have a larger weight   PAGE 26
Considering All Four Criteria with
Emphasis on Fitness




    fitness has weight 10

                                     PAGE 27
Initial Model Versus Discovered Model




    simulated          discovered by ETM




    B    E

A   C           G

    D    F

                                           PAGE 28
Real-Life Event Logs

• Event log L0 is the event log used before. L0 contains
  100 traces, 590 events and 7 activities.
• Event Log L1 contains 105 traces, 743 events in total,
  with 6 different activities.
• Event Log L2 contains 444 traces, 3.269 events in total,
  with 6 different activities.
• Event Log L3 contains 274 traces, 1:582 events in total,
  with 6 different activities.


Event logs L1, L2 and L3 are extracted from
the information systems of municipalities
participating in the CoSeLoG project
(http://www.win.tue.nl/coselog/).
                                                        PAGE 29
Results
          If unsound , the
          sound behavior
          is approximated
          when creating
          the process
          tree.




          Equal weights for
          all criteria.



            PAGE 30
Conclusion

 “able to replay event log”                 “Occam’s razor”

replay fitness                                simplicity

                               process
                              discovery



generalization                                precision
 “not overfitting the log”                “not underfitting the log”


                                                                       • First algorithm that allows for balancing
                                                                         all four perspectives.
                                                                       • Genetic algorithm is very flexible, but
                                                                         also very slow.
                                                                       • Process trees only used internally
                                                                         (choose your favorite representation)
                                                                       • Future work:
                                                                            − Improve speed
                                                                            − Distribute PM tasks
                                                                            − Discover configurable process trees
                                                                                                           PAGE 34
www.processmining.org
  www.win.tue.nl/ieeetfpm/
                             PAGE 35

Más contenido relacionado

La actualidad más candente

New Directions for Apache Arrow
New Directions for Apache ArrowNew Directions for Apache Arrow
New Directions for Apache ArrowWes McKinney
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyNeville Li
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector spaceAbdullah Khan Zehady
 
Writing for computer science, 3rd edition springer
Writing for computer science, 3rd edition springerWriting for computer science, 3rd edition springer
Writing for computer science, 3rd edition springerMehdi Hussain
 
Floyd Warshall algorithm easy way to compute - Malinga
Floyd Warshall algorithm easy way to compute - MalingaFloyd Warshall algorithm easy way to compute - Malinga
Floyd Warshall algorithm easy way to compute - MalingaMalinga Perera
 
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Julien Le Dem
 
Encrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptx
Encrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptxEncrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptx
Encrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptxNeo4j
 
Haystack 2019 - Making the case for human judgement relevance testing - Tara ...
Haystack 2019 - Making the case for human judgement relevance testing - Tara ...Haystack 2019 - Making the case for human judgement relevance testing - Tara ...
Haystack 2019 - Making the case for human judgement relevance testing - Tara ...OpenSource Connections
 
L'édition scientifique
L'édition scientifiqueL'édition scientifique
L'édition scientifiqueFrederic Blin
 
04 Introduction au logiciel R
04 Introduction au logiciel R04 Introduction au logiciel R
04 Introduction au logiciel RBoris Guarisma
 
Object-Centric Process Mining: Dealing With Divergence and Convergence in Eve...
Object-Centric Process Mining: Dealing With Divergence and Convergence in Eve...Object-Centric Process Mining: Dealing With Divergence and Convergence in Eve...
Object-Centric Process Mining: Dealing With Divergence and Convergence in Eve...Wil van der Aalst
 

La actualidad más candente (20)

New Directions for Apache Arrow
New Directions for Apache ArrowNew Directions for Apache Arrow
New Directions for Apache Arrow
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ Spotify
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector space
 
Planar graph
Planar graphPlanar graph
Planar graph
 
Power Bi et Sage
Power Bi et SagePower Bi et Sage
Power Bi et Sage
 
Prolog
PrologProlog
Prolog
 
Writing for computer science, 3rd edition springer
Writing for computer science, 3rd edition springerWriting for computer science, 3rd edition springer
Writing for computer science, 3rd edition springer
 
Floyd Warshall algorithm easy way to compute - Malinga
Floyd Warshall algorithm easy way to compute - MalingaFloyd Warshall algorithm easy way to compute - Malinga
Floyd Warshall algorithm easy way to compute - Malinga
 
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
 
Fair Recommender Systems
Fair Recommender Systems Fair Recommender Systems
Fair Recommender Systems
 
Encrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptx
Encrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptxEncrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptx
Encrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptx
 
np complete
np completenp complete
np complete
 
Haystack 2019 - Making the case for human judgement relevance testing - Tara ...
Haystack 2019 - Making the case for human judgement relevance testing - Tara ...Haystack 2019 - Making the case for human judgement relevance testing - Tara ...
Haystack 2019 - Making the case for human judgement relevance testing - Tara ...
 
L'édition scientifique
L'édition scientifiqueL'édition scientifique
L'édition scientifique
 
04 Introduction au logiciel R
04 Introduction au logiciel R04 Introduction au logiciel R
04 Introduction au logiciel R
 
Glpi 9.2-presentation
Glpi 9.2-presentationGlpi 9.2-presentation
Glpi 9.2-presentation
 
Neural Networks
Neural NetworksNeural Networks
Neural Networks
 
AI local search
AI local searchAI local search
AI local search
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Object-Centric Process Mining: Dealing With Divergence and Convergence in Eve...
Object-Centric Process Mining: Dealing With Divergence and Convergence in Eve...Object-Centric Process Mining: Dealing With Divergence and Convergence in Eve...
Object-Centric Process Mining: Dealing With Divergence and Convergence in Eve...
 

Más de Wil van der Aalst

Process Mining: BPM on Steroids (CPOs@BPM&O 2019 Keynote)
Process Mining: BPM on Steroids (CPOs@BPM&O 2019 Keynote)Process Mining: BPM on Steroids (CPOs@BPM&O 2019 Keynote)
Process Mining: BPM on Steroids (CPOs@BPM&O 2019 Keynote)Wil van der Aalst
 
Everything You Always Wanted To Know About Petri Nets, But Were Afraid To Ask
Everything You Always Wanted To Know About Petri Nets, But Were Afraid To AskEverything You Always Wanted To Know About Petri Nets, But Were Afraid To Ask
Everything You Always Wanted To Know About Petri Nets, But Were Afraid To AskWil van der Aalst
 
20 years of Process Mining Research (ICPM 2019 keynote)
20 years of Process Mining Research (ICPM 2019 keynote)20 years of Process Mining Research (ICPM 2019 keynote)
20 years of Process Mining Research (ICPM 2019 keynote)Wil van der Aalst
 
Earth Movers’ Stochastic Conformance Checking
Earth Movers’ Stochastic Conformance CheckingEarth Movers’ Stochastic Conformance Checking
Earth Movers’ Stochastic Conformance CheckingWil van der Aalst
 
Using Process Mining to Remove Operational Friction in Shared Services
Using Process Mining to Remove Operational Friction in Shared ServicesUsing Process Mining to Remove Operational Friction in Shared Services
Using Process Mining to Remove Operational Friction in Shared ServicesWil van der Aalst
 
Process Mining In Today’s Platforms Economy: Opportunities and Challenges (WI...
Process Mining In Today’s Platforms Economy: Opportunities and Challenges (WI...Process Mining In Today’s Platforms Economy: Opportunities and Challenges (WI...
Process Mining In Today’s Platforms Economy: Opportunities and Challenges (WI...Wil van der Aalst
 
Event Logs: What kind of data does process mining require?
Event Logs: What kind of data does process mining require?Event Logs: What kind of data does process mining require?
Event Logs: What kind of data does process mining require?Wil van der Aalst
 
Configurable Declare: Designing Customizable Flexible Models
Configurable Declare: Designing Customizable Flexible ModelsConfigurable Declare: Designing Customizable Flexible Models
Configurable Declare: Designing Customizable Flexible ModelsWil van der Aalst
 
A Decade of Business Process Management Conferences: Reflections on a Develop...
A Decade of Business Process Management Conferences: Reflections on a Develop...A Decade of Business Process Management Conferences: Reflections on a Develop...
A Decade of Business Process Management Conferences: Reflections on a Develop...Wil van der Aalst
 
Process Mining: Understanding and Improving Desire Lines in Big Data
Process Mining: Understanding and Improving Desire Lines in Big DataProcess Mining: Understanding and Improving Desire Lines in Big Data
Process Mining: Understanding and Improving Desire Lines in Big DataWil van der Aalst
 
Business Process Configuration in the Cloud: How to Support and Analyze Multi...
Business Process Configuration in the Cloud: How to Support and Analyze Multi...Business Process Configuration in the Cloud: How to Support and Analyze Multi...
Business Process Configuration in the Cloud: How to Support and Analyze Multi...Wil van der Aalst
 
Discovering Concurrency: Learning (Business) Process Models from Examples
Discovering Concurrency: Learning (Business) Process Models from ExamplesDiscovering Concurrency: Learning (Business) Process Models from Examples
Discovering Concurrency: Learning (Business) Process Models from ExamplesWil van der Aalst
 
Distributed Process Discovery and Conformance Checking
Distributed Process Discovery and Conformance CheckingDistributed Process Discovery and Conformance Checking
Distributed Process Discovery and Conformance CheckingWil van der Aalst
 
Service Interaction: Patterns, Formalization, and Analysis
Service Interaction: Patterns, Formalization, and AnalysisService Interaction: Patterns, Formalization, and Analysis
Service Interaction: Patterns, Formalization, and AnalysisWil van der Aalst
 
Keynote Gartner Business Process Management Summit, February 2009, London
Keynote Gartner Business Process Management Summit, February 2009, London Keynote Gartner Business Process Management Summit, February 2009, London
Keynote Gartner Business Process Management Summit, February 2009, London Wil van der Aalst
 
Keynote on Process Mining at SSCI 2010 / CIDM 2011
Keynote on Process Mining at SSCI 2010 / CIDM 2011Keynote on Process Mining at SSCI 2010 / CIDM 2011
Keynote on Process Mining at SSCI 2010 / CIDM 2011Wil van der Aalst
 
Discovering Petri Nets: Evidence-Based Business Process Management
Discovering Petri Nets: Evidence-Based Business Process ManagementDiscovering Petri Nets: Evidence-Based Business Process Management
Discovering Petri Nets: Evidence-Based Business Process ManagementWil van der Aalst
 
TomTom for Business Process Managment (TomTom4BPM)
TomTom for Business Process Managment (TomTom4BPM)TomTom for Business Process Managment (TomTom4BPM)
TomTom for Business Process Managment (TomTom4BPM)Wil van der Aalst
 
Keynote at 18th International Conference on Cooperative Information Systems (...
Keynote at 18th International Conference on Cooperative Information Systems (...Keynote at 18th International Conference on Cooperative Information Systems (...
Keynote at 18th International Conference on Cooperative Information Systems (...Wil van der Aalst
 
Process Mining - Chapter 14 - Epilogue
Process Mining - Chapter 14 - EpilogueProcess Mining - Chapter 14 - Epilogue
Process Mining - Chapter 14 - EpilogueWil van der Aalst
 

Más de Wil van der Aalst (20)

Process Mining: BPM on Steroids (CPOs@BPM&O 2019 Keynote)
Process Mining: BPM on Steroids (CPOs@BPM&O 2019 Keynote)Process Mining: BPM on Steroids (CPOs@BPM&O 2019 Keynote)
Process Mining: BPM on Steroids (CPOs@BPM&O 2019 Keynote)
 
Everything You Always Wanted To Know About Petri Nets, But Were Afraid To Ask
Everything You Always Wanted To Know About Petri Nets, But Were Afraid To AskEverything You Always Wanted To Know About Petri Nets, But Were Afraid To Ask
Everything You Always Wanted To Know About Petri Nets, But Were Afraid To Ask
 
20 years of Process Mining Research (ICPM 2019 keynote)
20 years of Process Mining Research (ICPM 2019 keynote)20 years of Process Mining Research (ICPM 2019 keynote)
20 years of Process Mining Research (ICPM 2019 keynote)
 
Earth Movers’ Stochastic Conformance Checking
Earth Movers’ Stochastic Conformance CheckingEarth Movers’ Stochastic Conformance Checking
Earth Movers’ Stochastic Conformance Checking
 
Using Process Mining to Remove Operational Friction in Shared Services
Using Process Mining to Remove Operational Friction in Shared ServicesUsing Process Mining to Remove Operational Friction in Shared Services
Using Process Mining to Remove Operational Friction in Shared Services
 
Process Mining In Today’s Platforms Economy: Opportunities and Challenges (WI...
Process Mining In Today’s Platforms Economy: Opportunities and Challenges (WI...Process Mining In Today’s Platforms Economy: Opportunities and Challenges (WI...
Process Mining In Today’s Platforms Economy: Opportunities and Challenges (WI...
 
Event Logs: What kind of data does process mining require?
Event Logs: What kind of data does process mining require?Event Logs: What kind of data does process mining require?
Event Logs: What kind of data does process mining require?
 
Configurable Declare: Designing Customizable Flexible Models
Configurable Declare: Designing Customizable Flexible ModelsConfigurable Declare: Designing Customizable Flexible Models
Configurable Declare: Designing Customizable Flexible Models
 
A Decade of Business Process Management Conferences: Reflections on a Develop...
A Decade of Business Process Management Conferences: Reflections on a Develop...A Decade of Business Process Management Conferences: Reflections on a Develop...
A Decade of Business Process Management Conferences: Reflections on a Develop...
 
Process Mining: Understanding and Improving Desire Lines in Big Data
Process Mining: Understanding and Improving Desire Lines in Big DataProcess Mining: Understanding and Improving Desire Lines in Big Data
Process Mining: Understanding and Improving Desire Lines in Big Data
 
Business Process Configuration in the Cloud: How to Support and Analyze Multi...
Business Process Configuration in the Cloud: How to Support and Analyze Multi...Business Process Configuration in the Cloud: How to Support and Analyze Multi...
Business Process Configuration in the Cloud: How to Support and Analyze Multi...
 
Discovering Concurrency: Learning (Business) Process Models from Examples
Discovering Concurrency: Learning (Business) Process Models from ExamplesDiscovering Concurrency: Learning (Business) Process Models from Examples
Discovering Concurrency: Learning (Business) Process Models from Examples
 
Distributed Process Discovery and Conformance Checking
Distributed Process Discovery and Conformance CheckingDistributed Process Discovery and Conformance Checking
Distributed Process Discovery and Conformance Checking
 
Service Interaction: Patterns, Formalization, and Analysis
Service Interaction: Patterns, Formalization, and AnalysisService Interaction: Patterns, Formalization, and Analysis
Service Interaction: Patterns, Formalization, and Analysis
 
Keynote Gartner Business Process Management Summit, February 2009, London
Keynote Gartner Business Process Management Summit, February 2009, London Keynote Gartner Business Process Management Summit, February 2009, London
Keynote Gartner Business Process Management Summit, February 2009, London
 
Keynote on Process Mining at SSCI 2010 / CIDM 2011
Keynote on Process Mining at SSCI 2010 / CIDM 2011Keynote on Process Mining at SSCI 2010 / CIDM 2011
Keynote on Process Mining at SSCI 2010 / CIDM 2011
 
Discovering Petri Nets: Evidence-Based Business Process Management
Discovering Petri Nets: Evidence-Based Business Process ManagementDiscovering Petri Nets: Evidence-Based Business Process Management
Discovering Petri Nets: Evidence-Based Business Process Management
 
TomTom for Business Process Managment (TomTom4BPM)
TomTom for Business Process Managment (TomTom4BPM)TomTom for Business Process Managment (TomTom4BPM)
TomTom for Business Process Managment (TomTom4BPM)
 
Keynote at 18th International Conference on Cooperative Information Systems (...
Keynote at 18th International Conference on Cooperative Information Systems (...Keynote at 18th International Conference on Cooperative Information Systems (...
Keynote at 18th International Conference on Cooperative Information Systems (...
 
Process Mining - Chapter 14 - Epilogue
Process Mining - Chapter 14 - EpilogueProcess Mining - Chapter 14 - Epilogue
Process Mining - Chapter 14 - Epilogue
 

Último

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Último (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

  • 1. On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Joos Buijs Boudewijn van Dongen Wil van der Aalst http://www.win.tue.nl/coselog/
  • 2. Advances in Process Mining • Many process discovery and conformance checking algorithms and tools are available (cf. the various ProM packages). • Also commercial software based on these ideas: Disco (Fluxicon), Reflect (Futura/Perceptive), BPMOne (Pallas Athena/Perceptive), ARIS Process Performance Manager (Software AG), Interstage Automated Process Discovery (Fujitsu), QPR ProcessAnalyzer/Analysis (QPR Software), flow (fourspark), Discovery Analyst (StereoLOGIC), etc. • We applied process mining in over 100 organizations. More than 75 people involving more than 50 organizations created the Process Mining Manifesto in the context of the IEEE Task Force on Process Mining. Available in 13 languages PAGE 1
  • 3. Example Process Discovery (Vestia, Dutch housing agency, 208 cases, 5987 events) PAGE 2
  • 4. Example: Conformance Checking (WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988) PAGE 3
  • 5. Challenge: Four Competing Quality Criteria “able to replay event log” “Occam’s razor” replay fitness simplicity process discovery generalization precision “not overfitting the log” “not underfitting the log” PAGE 4
  • 6. Example: one log four models b examine thoroughly g pay c compensation a examine e start register casually decide end # trace request h 455 acdeh d reject check ticket request 191 abdeg f reinitiate request 177 adceh N1 : fitness = +, precision = +, generalization = +, simplicity = + 144 abdeh 111 acdeg a c d e h 82 adceg start register examine check decide reject end request casually ticket request 56 adbeh N2 : fitness = -, precision = +, generalization = -, simplicity = + 47 acdefdbeh “able to replay event log” “Occam’s razor” 38 adbeg examine check thoroughly b d ticket g 33 acdefbdeh replay fitness simplicity pay compensation a 14 acdefbdeg start register examine c end 11 acdefdbeg request casually e f reinitiate h process decide request reject request 9 adcefcdeh discovery N3 : fitness = +, precision = -, generalization = +, simplicity = + 8 adcefdbeh 5 adcefbdeg a d c e g 3 acdefbdefdbeg generalization precision register request check ticket examine casually decide pay compensation 2 adcefdbeg a c d e g 2 adcefbdefbdeg “not overfitting the log” “not underfitting the log” register examine check decide pay request casually ticket compensation 1 adcefdbefbdeh a d c e h 1 adbefbdefdbeg register check examine decide reject request ticket casually request 1 adcefdbefcdefdbeg a c d e h 1391 start end register examine check decide reject request casually ticket request … (all 21 variants seen in the log) a b d e g register examine check decide pay request thoroughly ticket compensation a d b e h register check examine decide reject request ticket thoroughly request a b d e h register examine check decide reject request thoroughly ticket request PAGE 5 N4 : fitness = +, precision = +, generalization = -, simplicity = -
  • 7. # trace 455 acdeh Model N1 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh b 47 acdefdbeh examine thoroughly 38 adbeg g 33 acdefbdeh pay c compensation 14 acdefbdeg a examine e 11 acdefdbeg start register casually decide end request 9 adcefcdeh h d reject 8 adcefdbeh check ticket request 5 adcefbdeg f reinitiate 3 acdefbdefdbeg request N1 : fitness = +, precision = +, generalization = +, simplicity = + 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 6 1391
  • 8. # trace 455 acdeh Model N2 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh 38 adbeg a c d e h 33 acdefbdeh start register examine check decide reject end 14 acdefbdeg request casually ticket request N2 : fitness = -, precision = +, generalization = -, simplicity = + 11 acdefdbeg 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 7 1391
  • 9. # trace 455 acdeh Model N3 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh examine check thoroughly b d ticket g 38 adbeg pay 33 acdefbdeh compensation a 14 acdefbdeg start register examine end 11 acdefdbeg request casually c e f reinitiate reject h 9 adcefcdeh decide request request 8 adcefdbeh N3 : fitness = +, precision = -, generalization = +, simplicity = + 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 8 1391
  • 10. # trace 455 acdeh Model N4 191 abdeg 177 adceh 144 abdeh a d c e g 111 acdeg register check examine decide pay request ticket casually compensation 82 adceg a c d e g 56 adbeh register examine check decide pay request casually ticket compensation 47 acdefdbeh a d c e h 38 adbeg register check examine decide reject request ticket casually request 33 acdefbdeh a c d e h 14 acdefbdeg start end register examine check decide reject request casually ticket request 11 acdefdbeg … (all 21 variants seen in the log) 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg a b d e g register examine check decide pay 3 acdefbdefdbeg request thoroughly ticket compensation 2 adcefdbeg a d b e h register check examine decide reject 2 adcefbdefbdeg request ticket thoroughly request 1 adcefdbefbdeh a b d e h register examine check decide reject 1 adbefbdefdbeg request thoroughly ticket request 1 adcefdbefcdefdbeg N4 : fitness = +, precision = +, generalization = -, simplicity = - PAGE 9 1391
  • 11. Another challenge: Huge search space M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M MM M M M M M M MM M M M M M M M M M M M M M M M M M M M M M M M M M M M M M MM M MM M M MM M M M M M M MM M MM M MM M MM MM MM M M M M M M M M M M M M M M M M M M M M M MM M M M M M M M M M M M M M M M M MM M M M M M M M M M M M M M M M M M MM M M M M M M M M MM M M M MM M M M M M M M MM M M M M MM M M M M M M MM M M M M M M M M M M M MM M MM M M MM M M M M M M M M M MM M M M MM M M M M M MM M MM MM M M M M M M M M M M M M M M MM M M M M M M M M M M M MM M M M M M M M MM M M M M MM M M M M M M M M M M MM M M M M M M M MM M M M M M M MM M M M MM M MM M M M M MM M M MM M MM M M M M MM M M M M PAGE 10
  • 12. … with just a few interesting candidates M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M MM M M M M M MM M M M M M M M M M M M M M M M M M M M M M M M M MM M M M M M M M MM M MM M M M M M M MM M M M M MM M MM M MM MM M M M M MM M M M MM M M M M M M M M MM M M MM M M M M M M M M M M M M M M M M M MM M M M M M M MM M M M M M M M M M M M MM MM M M M M M MM M M M M M M M MM M M M M MM M M M M M MM M M M M M M M M M M M M MM M M M M M M M MM M M M MM M M MM M M M M MM M MM MM M M M M M M M M MM M M M M M M M M MM M M M M M M M M M M M M M MM M M M M M M M MM M M M M MM M M M M M M M M M M MM M M M M M M M M MM M M M M M MM M M M M M M M M M MM M M M MM M M M M M MM M MM M M M M M PAGE 11
  • 13. Two requirements 1. It should be possible to “able to replay event log” “Occam’s razor” replay fitness simplicity seamlessly balance the different quality criteria process discovery based on user-defined generalization precision preferences. “not overfitting the log” “not underfitting the log” 2. The algorithm should always return a "correct" process model and not waste time on model having deadlocks and other anomalies. PAGE 12
  • 14. Proposal: Evolutionary Tree Miner (ETM) • Process trees as representation (= limit search space to "good" models). • Genetic approach (= very flexible) • Fitness function uses all four criteria (= seamlessly balance the different "forces") PAGE 13
  • 15. Representational Bias: Process Trees → • Always sound because of the block structure A A → • Also Loop and OR operator ∧ E B C D X D A B B C A (BA)* E PAGE 14
  • 16. Petri Net Semantics (used for comparison and conformance checking only) A A B Sequence Parallellism B A B A Exclusive Choice A B B Or Choice Loop PAGE 15
  • 17. Steps of the Genetic ETM Algorithm Change Population No Create Return Measure Stop Initial Best Quality ? Yes Population Individual PAGE 16
  • 18. Population Change Elite Population i Population i+1 Selection Replace Crossover Mutation PAGE 17
  • 19. Four Metrics (see paper) “able to replay event log” “Occam’s razor” replay fitness simplicity process discovery generalization precision “not overfitting the log” “not underfitting the log” 1 = optimal 0 = very bad PAGE 18
  • 20. Example B E A C G D F A = send e-mail, B = check credit, C = calculate capacity, D = check system, E = accept, F = reject, G = send e-mail PAGE 19
  • 21. Conventional Algorithms (1/3) ("best effort" mapping to process trees to allow for comparison) alpha miner low fitness ILP miner low precision language-based region miner low fitness PAGE 20
  • 22. Conventional Algorithms (2/3) heuristic miner multi-phase miner PAGE 21
  • 23. Conventional Algorithms (3/3) genetic miner state-based region miner PAGE 22
  • 24. Often unsound result and no mechanism to seamlessly balance the four criteria PAGE 23
  • 25. Genetic Mining (ETM) While Considering Only One Criterion best value possible for this log ETM with weight zero to three out of four perspectives. PAGE 24
  • 26. Considering Replay Fitness and One Other Criterion ETM with weight zero to two out of four perspectives. PAGE 25
  • 27. Considering 3 of 4 Criteria replay fitness needs to have a larger weight PAGE 26
  • 28. Considering All Four Criteria with Emphasis on Fitness fitness has weight 10 PAGE 27
  • 29. Initial Model Versus Discovered Model simulated discovered by ETM B E A C G D F PAGE 28
  • 30. Real-Life Event Logs • Event log L0 is the event log used before. L0 contains 100 traces, 590 events and 7 activities. • Event Log L1 contains 105 traces, 743 events in total, with 6 different activities. • Event Log L2 contains 444 traces, 3.269 events in total, with 6 different activities. • Event Log L3 contains 274 traces, 1:582 events in total, with 6 different activities. Event logs L1, L2 and L3 are extracted from the information systems of municipalities participating in the CoSeLoG project (http://www.win.tue.nl/coselog/). PAGE 29
  • 31. Results If unsound , the sound behavior is approximated when creating the process tree. Equal weights for all criteria. PAGE 30
  • 32. Conclusion “able to replay event log” “Occam’s razor” replay fitness simplicity process discovery generalization precision “not overfitting the log” “not underfitting the log” • First algorithm that allows for balancing all four perspectives. • Genetic algorithm is very flexible, but also very slow. • Process trees only used internally (choose your favorite representation) • Future work: − Improve speed − Distribute PM tasks − Discover configurable process trees PAGE 34