Don’t Let A Bad Trigger
                                  Ruin Your Checkin!

                                      Mark Harrison
                                 Pixar Animation Studios


Our Trigger Goals.

Perforce checkin triggers are very useful to us. Users can check in their files in any way
they see fit, and we can provide our services using post-commit triggers. This worked
well, and had several benefits:

       We could guarantee that the triggers would run on all file checkins. There was no
       path by which the code could be avoided.
       We were decoupled from the front end application checking in the file. We did
       not need to be linked in with or share a release schedule with that code base.
       Trigger code could be replayed on a checkin in case of error.

                           Lesson learned: triggers are good!


First Try: Pure Triggers.

But as we went along, we hit a couple of problems.

       As the number of repositories grew (much faster than we anticipated!), it became
       more work to make sure the triggers were in sync. Adding a new trigger likewise
       became a much larger task, since we needed to do so to several repositories.
       Triggers can hang. Sometimes NFS mounts can go bad, or a bad database state
       (e.g. an open transaction holding a write lock) can block a trigger.

The most ironic problem: triggers worked so well for us, everybody wanted one! There
were numerous projects that could benefit from being informed when movie assets were
created or modified; many of these would update some database tables or cache some of
the data in the assets.

This amplified the two problems we had with our own triggers. Calling code outside of
our control meant that we couldn't even fix things ourselves when checkin errors
happened, and each of our trigger configurations showed signs of fulfilling the old game
puzzle of "you are in a maze of twisty little passages, all different."

       What depots were supposed to have which triggers?
       Having to hand-edit numerous trigger specs whenever somebody changed their
       software.



                                 Harrison - Perforce 2011                           Page 1
       As more triggers appeared, checkins got slower. Triggers run sequentially,
       so we couldn't even take advantage of multiple boxes or processors to speed
       things up. Some of the triggers would scrape metadata out of each file checked in
       (image formatting, color profiles, etc.), so we could conceivably end up having to
       read each file several times before the checkin would return to the
       user.
       Having to "slightly" modify trigger parameters ("oh, for that depot can you set the
       option --bargle=4, but if it's on a box without NFS patches can you instead use
       --bargle=4 and --nopts=2?")
       As more triggers started appearing, the number of checkin problems due to the
       triggers started to rise. We certainly didn't want that to happen, since one of
       Perforce's selling points is that it's really stable.

                       Lesson learned: lots of triggers are bad!


Second Try: Using Triggers to Enqueue Work.

We looked at the problem again, focusing on these questions:

       How can we allow multiple groups to benefit from check-in driven triggers?
       How can we avoid the slowness involved with running multiple triggers?
       How can we eliminate the administrative overhead of managing triggers?
       How can we eliminate the runtime errors and required troubleshooting with
       triggers?

We came up with these two rules:

       Every set of post-submit triggers must be the same across all depots.
       The post-submit triggers must execute as quickly as possible.

Additionally, we wanted to ensure:

       We would be able to accommodate any groups that needed special backend
       execution.
       We would have some means of telling front-end systems that their trigger was
       finished or that it failed. Preferably this would be a non-blocking mechanism, so
       that the applications could for example keep their GUIs alive. For non-interactive
       applications (e.g. thumbnail generation) we would log the errors and provide an
       error notification.
       We could execute these tasks in parallel on different boxes for speed.

Our solution was to execute exactly two post-submit triggers:

       The LINKATRON (presented at the 2009 conference), which would ensure that
       the trigger-like programs would have access via NFS to the files checked in, so
       they wouldn't have to check out the file to process it. This was especially
       important for media files... think of a several-gig video clip where some
       information needed to be extracted from a header record in the file.
       Our database backend, which would handle the enqueuing of the files and
       changelists to other backend applications.
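
A minimal sketch of an enqueue-only trigger body may make the second rule concrete. It is an illustration under stated assumptions, not the implementation described here: SQLite stands in for the database backend, and the table work_queue, its columns, and the functions open_queue and enqueue_checkin are hypothetical names chosen for the example.

```python
import sqlite3
import time

# Hypothetical schema: one row per unit of work waiting for a backend.
SCHEMA = """
CREATE TABLE IF NOT EXISTS work_queue (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    depot       TEXT    NOT NULL,
    changelist  INTEGER NOT NULL,
    depot_path  TEXT,                 -- NULL marks a changelist-level row
    enqueued_at REAL    NOT NULL
)
"""


def open_queue(database):
    """Open the queue database and make sure the table exists."""
    conn = sqlite3.connect(database)
    conn.execute(SCHEMA)
    return conn


def enqueue_checkin(conn, depot, changelist, depot_paths):
    """Write one changelist row plus one row per file; return the row count.

    No file is opened or parsed here, so the trigger's running time does not
    depend on what the backends will later do with the work.
    """
    now = time.time()
    rows = [(depot, changelist, None, now)]
    rows += [(depot, changelist, path, now) for path in depot_paths]
    with conn:  # single transaction: every row is stored, or none is
        conn.executemany(
            "INSERT INTO work_queue (depot, changelist, depot_path, enqueued_at) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```

Because the trigger only inserts rows, the same trigger definition can be installed on every depot; which backends care about a given checkin is decided later, on the queue side.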

We would ensure the backend processors would be first-class members of our Perforce
infrastructure by writing all of our own processors as plugins. This also gave us the
advantage of being able to process certain items (e.g. thumbnail generation) in parallel.
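
The parallel processing mentioned above could look roughly like the following sketch. It assumes a thread pool on a single box rather than multiple machines, and process_all and handler are hypothetical names; the point is only that independent items can run side by side and that one failing item does not stop the rest.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(items, handler, max_workers=4):
    """Run handler on every item in parallel; return (done, failed) lists.

    An exception raised for one item is recorded against that item only.
    """
    def attempt(item):
        try:
            handler(item)
            return item, None
        except Exception as exc:  # isolate the failure to this one item
            return item, exc

    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input items.
        for item, error in pool.map(attempt, items):
            if error is None:
                done.append(item)
            else:
                failed.append(item)
    return done, failed
```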

Our Implementation and Usage.

We implemented this system as a workflow queue manager. There are several off-the-
shelf queueing systems that could be used, but due to our particular requirements and
development environment we ended up implementing our own.

Each application has its own queue, and can register to receive notifications at either:

       The file level. This allows an application such as our thumbnail generator to
       start processing files quickly, without having to perform the extra processing
       necessary to read a changelist, break it apart, and start processing each item. It
       also has the advantage that each file can be treated as an atomic work unit --
       if a thumbnail fails for one file, there's no reason all the other thumbnails
       shouldn't be generated.
       The changelist level. For some other applications, it was better to receive exactly
       one notification per checkin. For these notifications, we included the depot name
       and the changelist number; if the application wanted to see the contents of the
       changelist, it could examine that on its own.
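
The two registration levels can be sketched as a small in-memory fan-out. This is an assumption-laden illustration: the class QueueManager, its methods, and the tuple shapes are hypothetical, and a real queue manager would persist its queues rather than hold them in memory.

```python
from collections import defaultdict, deque

FILE_LEVEL = "file"
CHANGELIST_LEVEL = "changelist"


class QueueManager:
    """Keeps one queue per registered application."""

    def __init__(self):
        self.levels = {}                   # application name -> level
        self.queues = defaultdict(deque)   # application name -> pending items

    def register(self, app, level):
        if level not in (FILE_LEVEL, CHANGELIST_LEVEL):
            raise ValueError(f"unknown level: {level!r}")
        self.levels[app] = level

    def publish(self, depot, changelist, depot_paths):
        """Fan one checkin out to every registered application."""
        for app, level in self.levels.items():
            if level == FILE_LEVEL:
                # One item per file, so each file is its own unit of work.
                for path in depot_paths:
                    self.queues[app].append((depot, changelist, path))
            else:
                # Exactly one item per checkin: depot name and changelist number.
                self.queues[app].append((depot, changelist))
```

A thumbnail generator would register at FILE_LEVEL and receive one item per file, while an application registered at CHANGELIST_LEVEL receives a single (depot, changelist) pair and looks up the contents itself if it needs them.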

This has several advantages, both for the end user and for the groups providing the
triggers:

       A single broken queue processor does not break a checkin. Of course, if your
       workflow depends on work being done by that processor you will be blocked, but
       many tasks (e.g. thumbnail generation or keyword mining) can be done after the
       fact.
       It is easy to identify a queue processor that is broken, and notify the responsible
       party. If a queue is filling up and nothing is being processed, we issue a warning
       to the queue owner.
       It is easy to see what work needs to be caught up when breakage is repaired. By
       the nature of the queue system, all uncompleted work is still in the queue, ready to
       be processed when the processor is restarted.
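
The "queue is filling up and nothing is being processed" check can be sketched by comparing two samples of each queue taken some time apart. The names QueueSample, is_stalled, and stalled_queues are hypothetical, and the rule used here (backlog not shrinking while the completed count stays flat) is one plausible reading of that condition, not a description of the actual monitor.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    depth: int       # items currently waiting in the queue
    completed: int   # running total of items the processor has finished


def is_stalled(previous, current, min_depth=1):
    """True when work is waiting, the backlog has not shrunk, and nothing
    was completed between the two samples."""
    backlog_held = current.depth >= min_depth and current.depth >= previous.depth
    no_progress = current.completed == previous.completed
    return backlog_held and no_progress


def stalled_queues(previous, current):
    """Return the names of queues whose owners should be warned."""
    return sorted(
        name
        for name, sample in current.items()
        if name in previous and is_stalled(previous[name], sample)
    )
```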

Synchronous Operation




To handle the requirement that the queue processors operate in a synchronous
manner, we use our internally developed Templar Broadcasting System. This messaging
system uses multicast UDP. Measurements on our network showed that there was
minimal (microsecond) latency, and that we could handle a sustained rate of 30,000 or
more messages/second reliably. Of course, delivery is not guaranteed, so applications
need to provide an alternate method for verifying that their work has been completed. A
typical application might query the database for a particular file or changelist.

However, since in our environment multicast is "mostly reliable", we can set a relatively
long timeout period before having to fall back to the polling mechanism. Most
applications are therefore able to continue almost immediately when the notification is
sent.
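
The wait-then-poll pattern described above can be sketched as follows. To keep the example self-contained, a queue.Queue stands in for whatever listener receives the multicast messages; wait_for_completion, the key format, and the is_done callback (for instance, a database query) are all hypothetical.

```python
import queue
import time


def wait_for_completion(notifications, key, is_done, timeout=5.0):
    """Return "notified", "polled", or None if the work is not yet complete.

    notifications: queue.Queue fed by the message listener (stand-in here)
    key:           identifier of the work item, e.g. (depot, changelist)
    is_done:       fallback check called with key, e.g. a database query
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            message = notifications.get(timeout=remaining)
        except queue.Empty:
            break
        if message == key:
            return "notified"
        # Messages about other work items are ignored.
    # Delivery is not guaranteed, so the message may simply have been lost:
    # fall back to checking the authoritative state.
    return "polled" if is_done(key) else None
```

With a mostly reliable transport, the timeout can be generous: the common case returns as soon as the message arrives, and the fallback check only runs when a notification was missed.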

Summary

We followed these steps in our implementation process and are happy with the results.
They allow several groups to write checkin-time code, and they protect checkins against
any breakage in that code.

       Triggers
       Lots of triggers
       Small number of triggers, feeding work queues

                 Lesson Learned: Trigger + Work Queues are Great!



