Pre and post editing
environment for Apertium




                                   Lluís Villarejo
                           Learning Technologies
                                     March 2012



                 What is GSoC?
• It's a global program that offers student developers stipends
  to write code for various open source software projects.
• Since 2005

• Inspire young developers to participate in OSS projects.
• Give students more exposure to real-world software development
  scenarios.
• Get more open source code created and released.
• Help open source projects identify and bring in new developers.



             Some participants
•   Apache Software Foundation   •   Sakai Foundation
•   Debian                       •   Mozilla
•   Facebook                     •   Inclusive Design Institute
•   Drupal                       •   The Linux Foundation
•   Creative Commons             •   The GNU project
•   DocBook project              •   Wikimedia Foundation
•   GCC                          •   WordPress
•   Gnome
•   ...                          •   ...



                How does it work?
•   Orgs present themselves as mentoring agents.
•   Orgs present a list of potential projects and mentors.
•   Accepted orgs should try to attract students' interest.
•   Students build project proposals.
•   Google finances slots for each org (5,000 + 500 USD per slot).
•   The project community decides which students are assigned to the slots.
•   The program runs between the end of May and the end of August.



               GSoC'11 statistics
• $7.2M budget

• 1115 students accepted from 68 countries

• 2096 mentors and co-mentors from 55 countries

• 175 Open Source organizations

• 18.1% of students have participated in previous years

• 97 countries with student applicants

• 88% overall success rate



Accepted Students GSoC'11



Why participate with Apertium?
• Strategically:
   – Apertium is a strategic agent inside UOC.
   – Developing Apertium means further developing
     internationalization aids for UOC.
   – Attract and onboard new developers for Apertium.
   – Collaboration with Google's Open Source initiatives.

• Functionally:
   – Opportunity to further develop specific UOC needs with
     external funding.
   – Capitalize on specific user feedback on translation quality.



              The Apertium case
• 20 proposed tasks
• 17 tasks got interest from students [1-9]
   – Pre and post-editing environment gets 11 students
     interested.

• Apertium community ranks the 17 tasks
   – Pre and post-editing environment ranks 4th

• Google assigns 9 slots to Apertium (49,500 USD)
  – Our task goes through, and Camille Mougey, from the
    Grenoble Institute of Technology, is selected.
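
The funding figure above follows from the per-slot amount quoted on the "How does it work?" slide. A minimal sketch of that arithmetic, assuming the 5,000 + 500 USD applies to every slot; the two constant names are illustrative only, since the slides do not say who receives each part:

```python
# Per-slot amounts as quoted in the slides (5,000 + 500 USD).
# The constant names are hypothetical labels, not taken from the slides.
SLOT_MAIN_USD = 5_000
SLOT_EXTRA_USD = 500


def slot_funding(slots: int) -> int:
    """Total funding in USD for a given number of slots."""
    return slots * (SLOT_MAIN_USD + SLOT_EXTRA_USD)


print(slot_funding(9))  # 49500, matching the 49,500 USD on the slide
```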



      Pre and post-editing, why?
• A significant share of the errors you get when translating a
  document is due to deficiencies in the original.
• The integration of existing resources can help to ease this
  burden:
   – Digital knowledge sources (digital dictionaries... )
   – Automatic tools (spell-checker, grammar checker, translation
     memory generation, search & replace...)
• These processes should be integrated naturally in the
  translation workflow → the need for an integrated web interface
  to Apertium.
• To improve the system we need to have access to the human
  post-editing process.



     Pre and post-editing, features
•   Pre and post-editing web interface integrated with the Apertium translation toolbox.
•   Spell checking on source and target languages. Integration with Aspell.
•   Grammar checking on source and target languages. Integration with
    LanguageTool.
•   Integration with several external dictionaries.
•   Search & replace functionality on source and target languages.
•   Ability to deal with formatted text.
•   Logging system. All events are logged as they happen, i.e. at the very moment
    the user inserts or deletes text. This allows a later data mining process to
    be run on the logs to detect commonly modified structures or vocabulary.
•   Translation memory generation. Integration of Maligna.
•   PDF translation through pdftohtml.
•   Image translation through Tesseract.
                                                                        Final report 2010
                                                                        Final report 2011
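
The logging feature above can be pictured as a list of timestamped insert and delete events. The following is a minimal sketch of that idea, not the project's actual implementation: the `EditEvent` and `EditLog` names, the event fields, and the JSON-lines output are all assumptions made for illustration.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class EditEvent:
    """One user action, recorded at the moment it happens (hypothetical schema)."""
    timestamp: float
    action: str    # "insert" or "delete"
    position: int  # character offset in the text being edited
    text: str      # the inserted or deleted characters


class EditLog:
    """Collects edit events in the order they occur."""

    def __init__(self):
        self.events = []

    def record(self, action, position, text):
        if action not in ("insert", "delete"):
            raise ValueError(f"unknown action: {action}")
        event = EditEvent(time.time(), action, position, text)
        self.events.append(event)
        return event

    def to_json_lines(self):
        """Serialize the log, one JSON object per line, for later mining."""
        return "\n".join(json.dumps(asdict(e)) for e in self.events)


def replay(original, events):
    """Re-apply logged events to the original text to rebuild the edited text."""
    text = original
    for e in events:
        if e.action == "insert":
            text = text[:e.position] + e.text + text[e.position:]
        else:
            text = text[:e.position] + text[e.position + len(e.text):]
    return text


log = EditLog()
log.record("delete", 4, "hause")
log.record("insert", 4, "house")
print(replay("The hause is red", log.events))  # The house is red
```

Keeping every event, rather than only the final text, is what makes it possible to study how the post-editor arrived at the result.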



        Results & lessons learned
• Fully functional environment, goals accomplished.
• Feedback on human post-editing behaviour is available
  automatically.

•   Jointly defined task (flexible framework provided).
•   Interest in developing great empathy with the student.
•   Motivated and pro-active student.
•   Student engagement.
•   Very frequent feedback.
•   Mentoring team with access to ABSOLUTELY ALL the
    information regarding the project.



                   Further work
• Proof of concept accomplished.
• Base platform developed so further work can be easily
  added.
• Integration of other resources (more external dictionaries).
• Extension of currently used resources (addition of
  grammar rules, dictionary improvements, support for more
  formats).
• Mining of the logged information to gain deeper knowledge of
  the human post-editing process.
• Use of this mining process to improve the Apertium translation
  engine.
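
One way the proposed log mining could look is sketched below, assuming each session yields a pair of machine-translated and post-edited text. The approach (word-level diff with Python's `difflib`, then counting replacements) and the function names are illustrative assumptions, not the method used in the project.

```python
from collections import Counter
from difflib import SequenceMatcher


def replaced_words(mt_output, post_edited):
    """Return (before, after) word spans that the post-editor replaced."""
    a = mt_output.split()
    b = post_edited.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs


def common_replacements(sessions):
    """Count how often each replacement occurs across all sessions."""
    counts = Counter()
    for mt_output, post_edited in sessions:
        counts.update(replaced_words(mt_output, post_edited))
    return counts


# Hypothetical sessions: (machine translation, human post-edited version)
sessions = [
    ("the actual version", "the current version"),
    ("my actual job", "my current job"),
    ("a red car", "a red car"),
]
print(common_replacements(sessions).most_common(1))
# [(('actual', 'current'), 2)]
```

Replacements that recur across many sessions are candidates for dictionary or rule changes in the translation engine.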



                    GSoC 2012




• Mining of the logged information to gain deeper knowledge of
  the human post-editing process.
• Use of this mining process to improve the Apertium translation
  engine.
• Post-editing over formatted text.




   Thanks
Questions & answers

Google Summer of Code 2011: UOC & Apertium
