Grigori Fursin, Joint CGO-PPoPP’17 Artifact Evaluation Discussion, Austin, TX, February 2017
Joint CGO-PPoPP’17 Artifact Evaluation Discussion (enabling open and reproducible research)
AE Chairs
CGO’17: Joseph Devietti, University of Pennsylvania
PPoPP’17: Wonsun Ahn, University of Pittsburgh
AE CGO-PPoPP-PACT Steering Committee
Grigori Fursin, dividiti / cTuning foundation
Bruce Childers, University of Pittsburgh
Agenda
• Results and issues
• Awards by NVIDIA and dividiti
• Discussion on how to improve and scale future AE
Fantastic Artifact Evaluators and Supporters
cTuning.org/committee.html
cTuning.org/ae/artifacts.html
http://dividiti.blogspot.com/2017/01/artifact-evaluation-discussion-session.html
How CGO-PPoPP-PACT AE works
Timeline: paper accepted → artifacts submitted → evaluator bidding → artifacts assigned →
evaluations available → evaluations finalized → artifact decision

• 7..12 days to prepare artifacts according to the guidelines: cTuning.org/ae/submission.html
• 2..4 days for evaluators to bid on artifacts (according to their knowledge and access to the required SW/HW)
• 2 days to assign artifacts – ensure at least 3 reviews per artifact, reduce risks, avoid mix-ups,
and minimize conflicts of interest (see the toy sketch below)
• 2 weeks to review artifacts according to the guidelines: cTuning.org/ae/reviewing.html
• 3..4 days for authors to respond to reviews and fix problems
• 2..3 days to finalize reviews
• 2..3 days to add the AE stamp and AE appendix to the camera-ready paper

NOTE: we consider AE a cooperative process and try to help authors fix artifacts
and pass evaluation (particularly if the artifacts will be open-sourced).

Light communication between authors and reviewers is allowed via the AE chairs
(to preserve the anonymity of the reviewers).

Year  PPoPP  CGO  Total  Problems  Rejected
2015   10     8    18      7         2
2016   12    11    23      4         0
2017   14    13    27      7         0
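To make the assignment constraints above concrete, here is a toy sketch (not the chairs' actual tooling, and all evaluator and artifact names are made up): a greedy assignment that honours evaluator bids, skips conflicts of interest, balances load, and guarantees at least 3 reviews per artifact.

```python
# Toy illustration of the assignment constraints described above:
# every artifact gets at least 3 reviews, evaluators are only assigned
# to artifacts they bid on, and conflicts of interest are excluded.
# This is a sketch, not the actual AE assignment tool.

def assign(artifacts, bids, conflicts, min_reviews=3):
    """bids: {evaluator: set of artifacts they can evaluate},
    conflicts: {evaluator: set of conflicted artifacts}."""
    assignments = {a: [] for a in artifacts}
    for artifact in artifacts:
        # Only evaluators who bid (right skills / SW / HW access)
        # and have no conflict of interest are candidates.
        candidates = [e for e, b in bids.items()
                      if artifact in b and artifact not in conflicts.get(e, set())]
        # Balance the load: prefer the least-loaded candidates first.
        candidates.sort(key=lambda e: sum(e in v for v in assignments.values()))
        if len(candidates) < min_reviews:
            raise ValueError("not enough conflict-free evaluators for " + artifact)
        assignments[artifact] = candidates[:min_reviews]
    return assignments

# Hypothetical example: e4 bid only on A2 and is conflicted with A1.
print(assign(["A1", "A2"],
             {"e1": {"A1", "A2"}, "e2": {"A1", "A2"},
              "e3": {"A1", "A2"}, "e4": {"A2"}},
             {"e4": {"A1"}}))
```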
AE: good, bad and ugly …
Good: many interesting and open-source artifacts –
authors and evaluators take AE seriously!
Bad: * too many artifacts – we need to scale AE somehow while maintaining quality
(41 evaluators, ~120 reviews to handle during 2.5 weeks)
* sometimes difficult to find evaluators with the appropriate skills
and access to proprietary SW and rare HW
* very intense schedule and not enough time for rebuttals
* communication between authors and reviewers via the AE chairs is a bottleneck
Ugly: * too many ad-hoc scripts to prepare and run experiments
* no common workflow frameworks (in contrast with some other sciences)
* no common formats and APIs (benchmarks, data sets, tools)
* difficult to reproduce empirical results across diverse SW/HW and inputs
Joint CGO-PPoPP’17 awards
a) Promote “good” (well-documented, consistent, and easy-to-use) artifacts:
NVIDIA donated a Pascal Titan X GPGPU card
for the highest-ranked artifact
b) Promote the use of workflow frameworks to share artifacts and experiments
as customizable and reusable components with a common meta-description and API:
dividiti donated $500 for the highest-ranked artifact
shared using the Collective Knowledge workflow framework
dividiti.com
cKnowledge.org
Collective Knowledge is being developed by the community to simplify the AE process
and improve the sharing of artifacts as customizable and reusable Python components
with an extensible JSON meta-description and a JSON API; it helps assemble cross-platform
workflows, automate and crowdsource empirical experiments, and produce interactive reports.
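As a flavour of what this looks like in practice, below is a minimal sketch of invoking a CK workflow from Python. The ck.access() entry point and its {'return': 0, ...} result convention follow the public CK v1 kernel; the specific action parameters and the component name 'my-benchmark' are illustrative placeholders rather than a real artifact.

```python
# Minimal sketch of driving a CK workflow from Python (assumes the CK v1
# kernel: pip install ck). ck.access() and the {'return': 0, ...} result
# convention are CK's public API; 'my-benchmark' is a hypothetical
# placeholder for a shared CK program component.

import json
import ck.kernel as ck

# Every CK action is one function call over a JSON-compatible dict:
# a module ('program'), an action ('run') and the component to act on.
r = ck.access({'action': 'run',
               'module_uoa': 'program',
               'data_uoa': 'my-benchmark'})

if r['return'] > 0:
    # Uniform error reporting: every CK call fails the same way,
    # which is what makes evaluation scriptable across artifacts.
    print('CK error:', r['error'])
else:
    # The result is again a JSON-compatible dict that evaluators can
    # store, compare and validate without artifact-specific parsers.
    print(json.dumps(r, indent=2))
```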
Joint CGO/PPoPP Artifact Evaluation Award
The cTuning foundation and dividiti are pleased to grant this award
for the distinguished open-source artifact shared in the Collective Knowledge format:
“Software Prefetching for Indirect Memory Accesses”
Sam Ainsworth, Timothy M. Jones
University of Cambridge
February 2017

Joint CGO/PPoPP Distinguished Artifact Award
The cTuning foundation and NVIDIA are pleased to present this award for
“Demystifying GPU Microarchitecture to Tune SGEMM Performance”
Xiuxia Zhang (1), Guangming Tan (1), Shuangbai Xue (1), Jiajia Li (2), Mingyu Chen (1)
(1) Chinese Academy of Sciences, (2) Georgia Institute of Technology
February 2017
Discussion how to improve/scale future AE
[Current AE timeline recap: paper accepted → artifacts submitted (7..12 days) →
evaluator bidding (2..4 days) → artifacts assigned (2 days) → evaluations available (2 weeks) →
author responses (3..4 days) → evaluations finalized (2..3 days) → artifact decision →
camera-ready (2..3 days)]
1) Introduce two evaluation options: private and public
a) traditional evaluation for private artifacts (for example, from industry, though these
are less and less common), following the current timeline above
b) open evaluation of public and open-source artifacts (if already available on GitHub,
BitBucket or GitLab with “discussion mechanisms” during submission…):
[Open-evaluation timeline: paper accepted → artifacts submitted (any time) →
AE chairs announce public artifacts at XSEDE/GRID5000/etc (1..2 days) →
AE chairs monitor open discussions until artifacts are evaluated (from a few days to 2 weeks) →
author responses (3..4 days) → artifact decision (2..3 days)]
At CGO/PPoPP’17, we sent requests to validate several open-source artifacts to public
mailing lists of the conferences, networks of excellence, supercomputer centers, etc.
We found evaluators who were willing to help and had access to rare hardware or supercomputers,
as well as to the required software and proprietary benchmarks.
Authors quickly fixed issues and answered research questions while the AE chairs steered the discussion!
See public reviewing examples at cTuning.org/ae/artifacts.html and adapt-workshop.org
Discussion how to improve/scale future AE
2) Enable public or private discussion channels between authors and reviewers
for each artifact (rather than communicating via AE chairs)
Useful technology: slack.com, reddit.com
Evaluators can still remain anonymous if they wish …
3) Help authors prepare artifacts and workflows for unified evaluation
(community service by volunteers?)
This year we processed more than 120 evaluation reports. Nearly all artifacts had their
own ad-hoc scripts to build and run workflows, process outputs, validate results, etc.
Since this is a huge burden for evaluators, they asked us to gradually introduce common
workflows and data formats to unify evaluation.
A possible solution is to introduce an optional service (based on distinguished artifacts)
to help authors convert their ad-hoc scripts to some common format (see the sketch below)
and thus scale AE! Furthermore, it may help researchers easily reuse and customize past
artifacts, and build upon them!
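As a strawman for such a common format, here is a sketch of a machine-readable artifact description plus a uniform driver. The schema is purely illustrative (loosely inspired by CK's JSON meta-descriptions): every field name, command, and file path below is an assumption for illustration, not an agreed AE standard.

```python
# Strawman "common format" for artifacts: one JSON-style meta-description
# instead of per-artifact ad-hoc scripts. All field names, commands and
# paths below are hypothetical illustrations, not an agreed AE standard.

import subprocess

artifact_meta = {
    "name": "my-paper-artifact",                   # hypothetical name
    "build": ["make", "-j4"],                      # how evaluators build it
    "run": ["./bench", "--input", "data/in.txt"],  # how they run it
    "expected_outputs": ["results.json"],          # what validation checks
    "dependencies": {"compiler": "gcc >= 5.4", "os": "linux"},
}

def evaluate(meta):
    """Uniform driver: the same few lines evaluate any artifact that
    ships such a meta-description, which is what would let AE scale."""
    subprocess.run(meta["build"], check=True)   # build step
    subprocess.run(meta["run"], check=True)     # run step
    for out in meta["expected_outputs"]:
        print("now validate:", out)             # hook for result checks

evaluate(artifact_meta)
```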
Discussion how to improve/scale future AE
4) Should we update Artifact Appendices?
Two years ago we introduced Artifact Appendix templates to unify
artifact submissions and to let authors add up to two pages of such
appendices to their camera-ready paper:
http://cTuning.org/ae/submission.html
http://cTuning.org/ae/submission_extra.html
The idea is to help readers better understand what was evaluated
and to let them reproduce published research and build upon it.
We did not receive complaints about our appendices, and many
researchers decided to add them to their camera-ready papers (see
http://cTuning.org/ae/artifacts.html).
Similar AE appendices are now used by other conferences (SC, RTSS):
http://sc17.supercomputing.org/submitters/technical-papers/reproducibility-initiatives-for-technical-papers/artifact-description-paper-title
We suggest getting in touch with the AE chairs of all related conferences to sync future
AE submission and reviewing procedures and avoid fragmentation!
Discussion how to improve/scale future AE
5) Decide whether to evaluate all experiments or still allow
partial validation or even only artifact sharing
We do not yet have a common methodology to fully validate experimental results from
research papers in our domain – we also know that full validation of empirical experiments is
very challenging and time-consuming.
At the same time, making artifacts available is also extremely valuable to the community (data
sets, predictive models, architecture simulators and their models, benchmarks, tools,
experimental workflows).
Last year we participated in an ACM workshop on reproducible research and co-authored the
following ACM Result and Artifact Review and Badging policy (based on our AE experience):
http://www.acm.org/publications/policies/artifact-review-badging
It suggests using several separate badges:
• Artifacts publicly available
• Artifacts evaluated (functional, reusable)
• Results validated (replicated, reproduced)
We are considering using the above policy and badges for the next AE – feedback is welcome!
Discussion how to improve/scale future AE
6) Evaluate artifacts for “tool” papers during main reviewing
We are now discussing the possibility of validating artifacts for so-called tool papers
during the main reviewing. Such evaluation would influence the acceptance decision.
A similar approach seems to be used at Supercomputing’17
(it would be useful to discuss this with the SC’17 AE organizers).
Current problems are:
• The Artifact Evaluation committee may not be prepared yet (though we have a
joint AEC from previous years).
• If we ask PC members to evaluate papers and artifacts at the same time, it is
an extra burden. Furthermore, PC members may not have the required technical
skills (that is why the AEC is usually assembled from postdocs and research
engineers).
• CGO and PPoPP use double-blind reviewing. However, reviewing artifacts
without revealing the authors’ identity is very non-trivial and places an extra,
unnecessary burden on the authors and evaluators (and may kill AE).
We should check how/if SC’16/SC’17 solve this problem,
since they also use double-blind reviewing.
We need your feedback! Thank you!!!
Remember that new AE procedures may affect you at future conferences.
• Contact AE steering committee: http://cTuning.org/committee.html
• Mailing list: https://groups.google.com/forum/#!forum/collective-knowledge
Extra resources
• Artifact Evaluation Website: http://cTuning.org/ae
• ACM Result and Artifact Review and Badging policy:
http://www.acm.org/publications/policies/artifact-review-badging
• CK workflow framework: http://cKnowledge.org
• Community driven artifact/paper evaluation:
http://dl.acm.org/citation.cfm?doid=2618137.2618142