Google Summer of Code 2011: UOC & Apertium
Pre- and post-editing environment for Apertium




                                   Lluís Villarejo
                           Learning Technologies
                                     March 2012



                 What is GSoC?
• A global program that offers student developers stipends
  to write code for open source software projects.
• Running since 2005.

• Inspire young developers to participate in OSS projects.
• Give students more exposure to real-world software development
  scenarios.
• Get more open source code created and released.
• Help open source projects identify and bring in new developers.



             Some participants
•   Apache Software Foundation   •   Sakai Foundation
•   Debian                       •   Mozilla
•   Facebook                     •   Inclusive Design Institute
•   Drupal                       •   The Linux Foundation
•   Creative Commons             •   The GNU Project
•   DocBook project              •   Wikimedia Foundation
•   GCC                          •   WordPress
•   GNOME                        •   ...



                How does it work?
•   Organizations apply to act as mentoring organizations.
•   Each organization publishes a list of potential projects and mentors.
•   Accepted organizations try to attract students' interest.
•   Students write project proposals.
•   Google funds a number of slots for each organization (5,000 + 500 USD per slot).
•   The project community decides which students fill the slots.
•   Coding runs from the end of May to the end of August.



               GSoC'11 statistics
• $7.2M budget

• 1115 students accepted from 68 countries

• 2096 mentors and co-mentors from 55 countries

• 175 Open Source organizations

• 18.1% of students have participated in previous years

• 97 countries with student applicants

• 88% overall success rate



Accepted Students GSoC'11



Why participate with Apertium?
• Strategically:
   – Apertium is a strategic asset within UOC.
   – Developing Apertium means further developing
     internationalization aids for UOC.
   – Attract and onboard new developers for Apertium.
   – Collaboration with Google's Open Source initiatives.

• Functionally:
   – Opportunity to further develop specific UOC needs with
     external funding.
   – Make use of specific user feedback on translation quality.



              The Apertium case
• 20 proposed tasks
• 17 tasks attracted interest from students [1-9]
   – The pre- and post-editing environment attracted 11
     interested students.

• The Apertium community ranks the 17 tasks
   – The pre- and post-editing environment ranks 4th

• Google assigns 9 slots to Apertium (49,500 USD)
  – Our task is selected, and Camille Mougey, from the
    Grenoble Institute of Technology, is chosen as the student.
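The funding rule on the "How does it work?" slide accounts for the Apertium total: 9 slots at 5,000 + 500 USD each give 49,500 USD. A minimal sketch of that arithmetic follows; the assumption that the 5,000 USD is the student's stipend and the 500 USD goes to the mentoring organization is ours, and the function name is hypothetical.

```python
# Per-slot funding as listed on the "How does it work?" slide (USD).
# Assumption: 5,000 is the student's stipend and 500 goes to the mentoring org.
STUDENT_STIPEND_USD = 5_000
ORG_PAYMENT_USD = 500


def slot_funding(slots: int) -> int:
    """Return the total funding in USD for a given number of accepted slots."""
    if slots < 0:
        raise ValueError("slots must be non-negative")
    return slots * (STUDENT_STIPEND_USD + ORG_PAYMENT_USD)


if __name__ == "__main__":
    apertium_slots = 9
    # Prints: 9 slots -> 49,500 USD
    print(f"{apertium_slots} slots -> {slot_funding(apertium_slots):,} USD")
```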



      Pre- and post-editing, why?
• A significant share of the errors in a translated document
  stem from deficiencies in the original text.
• Integrating existing resources can help ease this
  burden:
   – Digital knowledge sources (digital dictionaries...)
   – Automatic tools (spell checker, grammar checker, translation
     memory generation, search & replace...)
• These processes should fit naturally into the
  translation workflow → hence the need for an integrated web
  interface to Apertium.
• To improve the system, we need access to the human
  post-editing process.
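The workflow argued for above (check the source text, translate it, then capture what the human post-editor changes) can be sketched as a chain of pluggable steps. This is a minimal illustration under stated assumptions, not the GSoC implementation: the translator is a hypothetical stand-in for a call to Apertium, the toy checker that flags doubled words stands in for tools such as Aspell or LanguageTool, and all function names are hypothetical. The post-editing changes are recovered with Python's standard `difflib` at word level.

```python
import difflib
from typing import Callable, List, Tuple

# Hypothetical interfaces: a real setup would call Apertium for translation
# and tools such as Aspell or LanguageTool for checking.
Checker = Callable[[str], List[str]]
Translator = Callable[[str], str]


def doubled_word_checker(text: str) -> List[str]:
    """Toy pre-editing check: report immediately repeated words ("the the")."""
    words = text.split()
    return [
        f"repeated word '{a}' at position {i}"
        for i, (a, b) in enumerate(zip(words, words[1:]))
        if a.lower() == b.lower()
    ]


def post_edit_operations(mt_output: str, post_edited: str) -> List[Tuple[str, str, str]]:
    """Word-level (operation, MT words, human words) triples, skipping unchanged runs."""
    mt_words, pe_words = mt_output.split(), post_edited.split()
    matcher = difflib.SequenceMatcher(a=mt_words, b=pe_words, autojunk=False)
    return [
        (tag, " ".join(mt_words[i1:i2]), " ".join(pe_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def run_pipeline(source: str, checkers: List[Checker], translate: Translator,
                 post_edited: str):
    """Pre-editing checks -> machine translation -> diff against the human post-edit."""
    issues = [issue for check in checkers for issue in check(source)]
    mt_output = translate(source)
    return issues, mt_output, post_edit_operations(mt_output, post_edited)


if __name__ == "__main__":
    # Hypothetical fixed output standing in for Apertium.
    fake_translate = lambda text: "la la casa es rojo"
    issues, mt, ops = run_pipeline("the the house is red", [doubled_word_checker],
                                   fake_translate, post_edited="la casa es roja")
    print(issues)  # ["repeated word 'the' at position 0"]
    print(ops)     # [('delete', 'la', ''), ('replace', 'rojo', 'roja')]
```

Keeping checkers and the translator as plain callables means a real spell checker or grammar checker could be swapped in without touching the pipeline itself.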



     Pre- and post-editing: features
•   Pre- and post-editing web interface integrated with the Apertium translation toolbox.
•   Spell checking on source and target languages (integration with Aspell).
•   Grammar checking on source and target languages (integration with
    LanguageTool).
•   Integration with several external dictionaries.
•   Search & replace on source and target languages.
•   Support for formatted text.
•   Logging system. Every event is logged as it happens, i.e. at the very moment
    the user inserts or deletes text. This makes it possible to run a later data-mining
    process on the logs to detect commonly modified structures or vocabulary.
•   Translation memory generation (integration with Maligna).
•   PDF translation via pdftohtml.
•   Image translation via Tesseract.
                                                                        Final report 2010
                                                                        Final report 2011
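The logging feature records each insertion or deletion at the moment it happens. A minimal sketch of such an event log follows, assuming a hypothetical schema (timestamp, kind, character offset, text) that is not taken from the project; `EditEvent` and `EditLog` are illustrative names. Replaying the events from the initial text reproduces the final text, which is a simple way to check that no edit went unlogged, and the JSON Lines export is one plausible input format for later mining.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class EditEvent:
    """One edit, recorded the moment it happens (hypothetical schema)."""
    timestamp: float  # seconds since the epoch
    kind: str         # "insert" or "delete"
    offset: int       # character position in the text being edited
    text: str         # inserted text, or the text that was deleted


class EditLog:
    """Append-only event log for one post-editing session."""

    def __init__(self, initial_text: str) -> None:
        self.initial_text = initial_text
        self.text = initial_text
        self.events: List[EditEvent] = []

    def insert(self, offset: int, text: str) -> None:
        self.text = self.text[:offset] + text + self.text[offset:]
        self.events.append(EditEvent(time.time(), "insert", offset, text))

    def delete(self, offset: int, length: int) -> None:
        removed = self.text[offset:offset + length]
        self.text = self.text[:offset] + self.text[offset + length:]
        self.events.append(EditEvent(time.time(), "delete", offset, removed))

    def to_jsonl(self) -> str:
        """Serialize the events as one JSON object per line."""
        return "\n".join(json.dumps(asdict(e)) for e in self.events)

    def replay(self) -> str:
        """Rebuild the final text from the initial text and the logged events."""
        text = self.initial_text
        for e in self.events:
            if e.kind == "insert":
                text = text[:e.offset] + e.text + text[e.offset:]
            else:
                text = text[:e.offset] + text[e.offset + len(e.text):]
        return text


if __name__ == "__main__":
    log = EditLog("la casa es rojo")
    log.delete(11, 4)       # the user deletes "rojo"
    log.insert(11, "roja")  # ...and types "roja"
    print(log.text)         # la casa es roja
    print(log.replay() == log.text)  # True
```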



        Results & lessons learned
• Fully functional environment, goals accomplished.
• Feedback on human post-editing behaviour becomes
  available automatically.

•   Jointly defined task (flexible framework provided).
•   Effort to build strong empathy with the student.
•   Motivated and pro-active student.
•   Student engagement.
•   Very frequent feedback.
•   Mentoring team with access to ABSOLUTELY ALL the
    information regarding the project.



                   Further work
• Proof of concept accomplished.
• Base platform in place, so further work can easily be
  added.
• Integration of other resources (more external dictionaries).
• Extension of the resources already in use (additional
  grammar rules, dictionary improvements, support for more
  formats).
• Mining the logged data to gain deeper insight into
  the human post-editing process.
• Using the results of this mining to improve the Apertium
  translation engine.
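As a first step toward mining the logs, one could pair each deletion with an insertion at the same position immediately afterwards and count the resulting (machine output → human correction) pairs across sessions. This is a minimal sketch under assumptions of our own: the `(kind, offset, text)` event tuples and both function names are hypothetical, and real mining would also need to handle multi-step edits and sentence structure, not just word swaps.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical log record: (kind, offset, text), where kind is "insert" or "delete".
Event = Tuple[str, int, str]


def replacements(events: List[Event]) -> List[Tuple[str, str]]:
    """Pair each delete with an insert at the same offset that immediately follows it."""
    pairs = []
    for (k1, off1, t1), (k2, off2, t2) in zip(events, events[1:]):
        if k1 == "delete" and k2 == "insert" and off1 == off2:
            pairs.append((t1, t2))
    return pairs


def most_common_corrections(sessions: Iterable[List[Event]],
                            n: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Count (machine output, human correction) pairs across many sessions."""
    counts: Counter = Counter()
    for events in sessions:
        counts.update(replacements(events))
    return counts.most_common(n)


if __name__ == "__main__":
    sessions = [
        [("delete", 11, "rojo"), ("insert", 11, "roja")],
        [("delete", 3, "rojo"), ("insert", 3, "roja"), ("insert", 0, "Ya ")],
        [("delete", 5, "el"), ("insert", 5, "la")],
    ]
    print(most_common_corrections(sessions, 2))
    # [(('rojo', 'roja'), 2), (('el', 'la'), 1)]
```

Corrections that recur across many users are natural candidates for review as fixes in Apertium's dictionaries or rules.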



                    GSoC 2012


• Mining the logged data to gain deeper insight into
  the human post-editing process.
• Using the results of this mining to improve the Apertium
  translation engine.
• Post-editing of formatted text.




   Thanks
Questions & answers
