Catalan daily goes Catalan

       LocWorld 2012, A4
  Magí Camps (La Vanguardia)
  Blanca Vidal (Lucy Software)
[1] Introduction, background
[Chart: Newspapers in Catalan, net circulation. Bar values: 79,239; 45,309; 31,762; 15,662; 6,779]
Source: Estudi General de Mitjans (EGM), 2012
Introduction, background
Results
                      Increase
                      +4% of copies
                      +7% of readers

                      Distribution
                      57% Spanish
                      43% Catalan
Introduction, background

        Why a Catalan version?
   Celebration of LV’s 130th anniversary
  Normalization of the use of Catalan
      Investment to face the crisis
Opportunity to consolidate LV’s hegemony
[2] Customer goals

• To publish two language editions of the same newspaper daily (supplements incl.).
• Journalists should be able to write in either of the two languages.
• Neither quality nor distribution timeframes should be affected.
Customer requirements for the MT system

• Tailor-made system
• Complying with LV’s style guide
• Seamless integration into the journalists’ workflow
• Translation of Hermes XML and InDesign formats
• Reliability, high availability
• High performance
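The requirement to translate Hermes XML implies a converter that translates text while leaving the markup untouched. Below is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The element names, the `translate` callable and the toy dictionary are hypothetical, since the slides show neither the Hermes schema nor the Lucy engine's interface, and InDesign files are not covered.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def translate_xml(xml_text: str, translate: Callable[[str], str]) -> str:
    """Translate every non-blank text node, leaving tags and attributes intact."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Text inside the element, before its first child.
        if elem.text and elem.text.strip():
            elem.text = translate(elem.text)
        # Text after the element's closing tag (still inside the parent).
        if elem.tail and elem.tail.strip():
            elem.tail = translate(elem.tail)
    return ET.tostring(root, encoding="unicode")


# Hypothetical article markup and a toy ES>CA "engine", for illustration only.
TOY_ES_CA = {"El tiempo": "El temps", "Mañana lloverá.": "Demà plourà."}
sample = '<article id="a1"><title>El tiempo</title><body>Mañana lloverá.</body></article>'
print(translate_xml(sample, lambda s: TOY_ES_CA.get(s, s)))
```

Handling `text` and `tail` separately keeps inline tags (bold, links) in place. A production converter would also need to deal with sentences split across inline tags, which this sketch translates piece by piece.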
[3] Ramp-up phase
Project set-up

Work areas       MT linguistic improvement/tuning
                 Post-editing preparation
                 MT system set-up and integration
                 MT lexicon training
Duration         8 months (+ 3 months)
Staff            LV: 10-12 in-house journalists
                 Lucy: 3 computational linguists / lexicographers
                       1 software developer
                 Incyta: 2 professional post-editors
Important!       On-site support
Subphases
TASKS                                                  Phase 1   Phase 2   Phase 3   Phase 4

Linguistic improvement/tuning
    - Language-type definition                              x

    - Creation of a corpus of real texts                    x         x         x         x

    - Analysis of the translation quality                   x         x         x         x

    - Error reporting (lexicon and grammar errors)          x         x         x         x

    - Linguistic implementation (lexicon and grammar)       x         x         x         x

    - Pre- and post-editing filters                         x         x         x         x

Post-editing preparation
    - Gathering of MT post-editing guidelines               x

    - Evaluation of post-editing effort                     x                   x

    - Creation and training of the post-editing team                            x

Technical set-up

    - System set-up and integration                         x

    - Preparation of XML converters                         x

Maintenance
    - Lexicon maintenance training                                                        x

Duration                                                  2 mo      3 mo      3 mo      3 mo
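The subphase table can be encoded as a task-to-phase mapping to check its totals. This sketch transcribes the table as shown; reading phases 1-3 as the 8-month ramp-up and phase 4 as the additional 3 months is an assumption based on the project set-up slide.

```python
# Task/phase matrix transcribed from the "Subphases" table (phase numbers 1-4).
PHASE_MONTHS = {1: 2, 2: 3, 3: 3, 4: 3}
ALL_PHASES = frozenset(PHASE_MONTHS)

TASK_PHASES = {
    # Linguistic improvement/tuning
    "Language-type definition": {1},
    "Creation of a corpus of real texts": ALL_PHASES,
    "Analysis of the translation quality": ALL_PHASES,
    "Error reporting (lexicon and grammar errors)": ALL_PHASES,
    "Linguistic implementation (lexicon and grammar)": ALL_PHASES,
    "Pre- and post-editing filters": ALL_PHASES,
    # Post-editing preparation
    "Gathering of MT post-editing guidelines": {1},
    "Evaluation of post-editing effort": {1, 3},
    "Creation and training of the post-editing team": {3},
    # Technical set-up
    "System set-up and integration": {1},
    "Preparation of XML converters": {1},
    # Maintenance
    "Lexicon maintenance training": {4},
}


def tasks_per_phase(task_phases: dict) -> dict:
    """Count how many tasks are active in each phase."""
    counts = {phase: 0 for phase in PHASE_MONTHS}
    for phases in task_phases.values():
        for phase in phases:
            counts[phase] += 1
    return counts


total_months = sum(PHASE_MONTHS.values())
ramp_up_months = sum(PHASE_MONTHS[p] for p in (1, 2, 3))
print(total_months, ramp_up_months, tasks_per_phase(TASK_PHASES))
```

The totals (11 months overall, 8 for phases 1-3) agree with the "8 months (+ 3 months)" duration, and the counts show phase 1 as the busiest, with 10 of the 12 tasks active.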
[a] Linguistic tuning
[Diagram: linguistic tuning cycle: Language model; Corpus; Translation quality (TQ); Analysis and error reporting; Implementation; Accomplished improvement data]
Linguistic tuning

 Catalan language model
 • no exclusion
 • compliant with standards
 • innovative in terminology
 • dynamic in syntactical structures



 Corpus
 • ES: 500,000 transl. units – 8,300,000 words
 • CA: 250,000 transl. units – 3,000,000 words
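The corpus figures also give the average segment length per language. A trivial sketch using only the numbers on the slide:

```python
# (translation units, words) per corpus, as listed on the slide.
CORPUS = {"ES": (500_000, 8_300_000), "CA": (250_000, 3_000_000)}


def words_per_unit(units: int, words: int) -> float:
    """Average segment length in words."""
    return words / units


for lang, (units, words) in CORPUS.items():
    print(f"{lang}: {words_per_unit(units, words):.1f} words per translation unit")
```

This yields 16.6 words per unit for the Spanish corpus and 12.0 for the Catalan one.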
Linguistic tuning
 Translation Quality
 • Perfect: 74%
 • Minimal post-editing: 24%
 • Medium post-editing: 2%

Conclusions
• No specific domains (except Sports)
• Culture: proper names
• Opinion: idioms, plays on words
• Errors not repetitive
• Style: a share of the output still has to be post-edited
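Proper names are a classic target for the pre- and post-editing filters listed in the subphase table. One common technique, shown here as an assumption rather than as Lucy's actual filter, masks known names with placeholders before MT and restores them afterwards. The name list and the word-by-word toy engine are hypothetical.

```python
import re
from typing import Callable


def protect_names(text: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Pre-editing filter: swap known proper names for numbered placeholders."""
    mapping: dict[str, str] = {}
    # Longest names first, so "Camp Nou" is masked before any shorter name inside it.
    for i, name in enumerate(sorted(names, key=len, reverse=True)):
        placeholder = f"__PN{i}__"
        pattern = r"\b" + re.escape(name) + r"\b"
        text, count = re.subn(pattern, placeholder, text)
        if count:
            mapping[placeholder] = name
    return text, mapping


def restore_names(text: str, mapping: dict[str, str]) -> str:
    """Post-editing filter: put the original names back."""
    for placeholder, name in mapping.items():
        text = text.replace(placeholder, name)
    return text


def translate_protected(text: str, names: list[str], translate: Callable[[str], str]) -> str:
    masked, mapping = protect_names(text, names)
    return restore_names(translate(masked), mapping)


# Toy word-by-word CA>ES "engine" (hypothetical) that mistranslates "Camp".
TOY_CA_ES = {"juga": "juega", "al": "en el", "Camp": "Campo"}


def toy_translate(text: str) -> str:
    return re.sub(r"\w+", lambda m: TOY_CA_ES.get(m.group(0), m.group(0)), text)


sentence = "El Barça juga al Camp Nou."
print(toy_translate(sentence))  # El Barça juega en el Campo Nou.
print(translate_protected(sentence, ["Barça", "Camp Nou"], toy_translate))  # El Barça juega en el Camp Nou.
```

The placeholders consist only of word characters, so the toy engine passes them through unchanged. A real engine would need to be configured to do the same.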
Linguistic tuning

 Analysis and error reporting
 • Semi-automatic detection of missing words
 • Terminology lists
 • New and different translations, error
   reporting



 Implementation
 • Proper names [44.5% of the TUs]
 • Idioms
 • Alternatives
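Semi-automatic detection of missing words can be approximated by tokenising the corpus and ranking the words the MT lexicon does not contain, leaving the final decision to a lexicographer. A minimal sketch: the lexicon here is a hypothetical set of lowercase entries, not the Lucy lexicon format.

```python
import re
from collections import Counter


def missing_words(texts: list[str], lexicon: set[str], top_n: int = 10) -> list[tuple[str, int]]:
    """Rank corpus words absent from the MT lexicon by frequency."""
    counts: Counter = Counter()
    for text in texts:
        for word in re.findall(r"\w+", text.lower()):
            # Skip numbers; everything else unknown to the lexicon is a candidate.
            if not word.isdigit() and word not in lexicon:
                counts[word] += 1
    return counts.most_common(top_n)


LEXICON = {"el", "govern", "aprova", "la", "llei", "de", "l", "en"}  # hypothetical
CORPUS = [
    "El Govern aprova la llei de l'habitatge.",
    "La llei de l'habitatge entra en vigor.",
]
print(missing_words(CORPUS, LEXICON))
```

Ranking by frequency puts the most productive gaps (here "habitatge", seen twice) at the top of the lexicographers' review list.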
Linguistic tuning
Accomplished improvement data
• Work in figures
        40,000 lexicon entries (20,000 for each transl. direction)
        Around 440 grammar rules
        Around 7,200 words in the proper names files (each transl. dir)
• Non-measurable work
        Understanding of the MT system
        Understanding of the newspaper specificities
        Support for the style guide, taking MT into account
• Improvement
        ES>CA: 41% diff => 35% better, 4% similar, 2% worse
        CA>ES: 36% diff => 32% better, 3% similar, 1% worse
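The improvement figures can be checked and summarised in a few lines. This sketch reads "41% diff" as the share of test sentences whose translation changed after tuning, because the better/similar/worse figures add up to exactly that share. The slide does not define the metric, so treat this reading as an assumption.

```python
def summarise(better: int, similar: int, worse: int, diff: int) -> tuple[int, float]:
    """Return the net gain (points) and the share of changed sentences that improved."""
    # The three outcomes should add up to the share of sentences that changed.
    if better + similar + worse != diff:
        raise ValueError("outcomes do not add up to the changed share")
    return better - worse, better / diff


RESULTS = {"ES>CA": (35, 4, 2, 41), "CA>ES": (32, 3, 1, 36)}
for direction, figures in RESULTS.items():
    net, share = summarise(*figures)
    print(f"{direction}: net +{net} points, {share:.1%} of changed sentences improved")
```

Under this reading, tuning gave a net gain of 33 points for ES>CA and 31 points for CA>ES. About 85% and 89% of the changed sentences, respectively, got better.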
[b] Post-editing
Post-editing
[Diagram: Specificities of the text; Metrics on translation volume; Metrics on post-editing effort; Post-editing resources; Post-editors’ workspace; Error reporting process and tools; Post-editing team and profile]
Post-editing: metrics
File                           Translation units   Lex/gram post-editing   %        Style post-editing   %
LV_2010-10-27 (42,512 words)   2,474               464                     18.79%   394                  15.96%

       Conclusions
       •   Different sections had different levels of post-editing
       •   What style corrections could be avoided?
       •   Post-editing speed: 1,000-1,500 words/h
       •   Daily volume: 75,000 words
       •   New post-editing team: 20 post-editors/12 editors
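The speed and volume figures above imply the daily post-editing load. This sketch assumes the 75,000 words are split evenly across the 20 post-editors and ignores the editors' separate review time.

```python
def post_editing_hours(daily_words: int, words_per_hour: int) -> float:
    """Total post-editor hours needed to review the daily volume."""
    return daily_words / words_per_hour


DAILY_WORDS = 75_000   # daily volume from the slide
POST_EDITORS = 20      # new post-editing team (editors not included)

for speed in (1_000, 1_500):  # quoted post-editing speed range, words/h
    total = post_editing_hours(DAILY_WORDS, speed)
    # Assumes an even split of the volume across the team.
    print(f"{speed} words/h: {total:.1f} h in total, {total / POST_EDITORS:.2f} h per post-editor")
```

This works out to 50-75 post-editor hours per day, or roughly 2.5-3.75 hours per person.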
Post-editing: resources, workspace
Post-editors should have proficiency in their skills BUT also:
• Be trained on MT post-editing
• Have an integrated workspace
• Have resources at a click

Post-editing guide
• Classified frequent MT errors
• Reference document for training

Adapt CMS to new workflow
• New processing status
• New mark-ups

Resources on Intranet language portal
• Bilingual style guide
• Links to all reference dictionaries
• MT portal for any journalist
Post-editing: resources, workspace




       La Vanguardia’s intranet: linguistic portal
Post-editing: error reporting, team
     Error reporting

     • Crucial for continuous improvement
     • Not automated (yet)
     • Provide better support for error reporting

     Definition of post-editing profile and team

     • Proficient in Catalan
     • Journalist background
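Error reporting was not yet automated. One possible first step, sketched here as an assumption and not as the project's tool, is a word-level diff between each raw MT segment and its post-edited version using Python's standard `difflib`. Each change can then be reviewed and classified as a lexicon, grammar or style issue.

```python
import difflib


def edit_report(mt_output: str, post_edited: str) -> list[tuple[str, str, str]]:
    """List word-level changes (operation, MT words, post-edited words)."""
    mt_words, pe_words = mt_output.split(), post_edited.split()
    matcher = difflib.SequenceMatcher(a=mt_words, b=pe_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only replace/delete/insert spans; "equal" spans need no report.
        if tag != "equal":
            changes.append((tag, " ".join(mt_words[i1:i2]), " ".join(pe_words[j1:j2])))
    return changes


# Hypothetical ES>CA segment: the MT output misses the elision "de l'".
mt = "El Govern aprova la llei del habitatge"
pe = "El Govern aprova la llei de l'habitatge"
print(edit_report(mt, pe))
```

Collected over a day's output, such reports would also give a starting point for re-using post-edited text, one of the goals listed for the next phase.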
[c] System integration
   During phase 1: pre-production
   • Pre-production set-up and installation
   • Hermes XML converter
   • Changes in the LT engine to translate InDesign
     files


   During phase 3: production
   • Production installation
   • Test (load, performance and stress)
   • Performance: 500-1,200 words/sec
   • Definition of the final installation size
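To put the engine's throughput into perspective, here is a back-of-the-envelope sketch. It assumes one sequential stream at the quoted rates and reuses the 75,000 words/day volume from the post-editing metrics.

```python
def translation_seconds(words: int, words_per_second: int) -> float:
    """Raw MT time for a given volume at a constant throughput."""
    return words / words_per_second


DAILY_WORDS = 75_000  # daily volume quoted on the post-editing metrics slide

for wps in (500, 1_200):  # quoted engine throughput range
    print(f"{wps} words/s: {translation_seconds(DAILY_WORDS, wps):.1f} s per day")
```

Under these assumptions the raw machine translation takes between 62.5 and 150 seconds per day. The schedule is therefore dominated by post-editing, not by the engine.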
System integration

[Diagram: Hermes, InDesign and the language portal connect through web services to the Production, Pre-production and Maintenance environments]

•   Production: balanced high performance (HP) and high availability (HA) configuration
•   System requirements: standard Windows Server, hence a low hardware footprint
    (e.g. dual-core/quad-core 2.5-3 GHz, 2-4 GB RAM running Windows Server 2003/2008)
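The production set-up combines high performance and high availability behind web services. The slides do not show the service interface, so this client-side sketch is hypothetical. The node names and the `send` callable stand in for the real web-service call, and the code only illustrates the failover and round-robin idea.

```python
from typing import Callable, Sequence


class AllNodesFailed(Exception):
    """Raised when no translation node answered."""


def translate_with_failover(text: str, nodes: Sequence[str],
                            send: Callable[[str, str], str], start: int = 0) -> str:
    """Send `text` to the nodes in turn, beginning at index `start`, until one succeeds."""
    errors = []
    for k in range(len(nodes)):
        node = nodes[(start + k) % len(nodes)]
        try:
            return send(node, text)
        except ConnectionError as exc:
            # Node unreachable or timed out: record it and try the next one.
            errors.append(f"{node}: {exc}")
    raise AllNodesFailed("; ".join(errors))


# Fake transport standing in for the real web-service call (hypothetical).
def fake_send(node: str, text: str) -> str:
    if node == "mt-node-1":
        raise ConnectionError("timed out")
    return f"[{node}] {text.upper()}"


print(translate_with_failover("hola", ["mt-node-1", "mt-node-2"], fake_send))
```

Passing a different `start` for each request (for example, a counter modulo the number of nodes) spreads load round-robin, while the loop provides failover. A real deployment would more likely put this logic in a load balancer than in each client.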
[4] Operation: production process




  Staff               Effort                        Timeline
  • 20 post-editors   • 30 min linguistic review    • Start 5 p.m.
  • 12 editors        • 10 min journalistic review  • First edition 11.30 p.m.
                      • 70,000 words/day + suppl.   • Second edition 2.30 a.m.
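The production timeline leaves a fixed window before each edition closes. This small sketch computes those windows, handling the second edition falling after midnight, and the average words per post-editor, assuming an even split.

```python
from datetime import datetime, timedelta


def hours_between(start_hhmm: str, end_hhmm: str) -> float:
    """Hours from start to end; an end time not after the start is read as next day."""
    fmt = "%H:%M"
    start = datetime.strptime(start_hhmm, fmt)
    end = datetime.strptime(end_hhmm, fmt)
    if end <= start:
        end += timedelta(days=1)
    return (end - start).total_seconds() / 3600


START = "17:00"
DEADLINES = {"First edition": "23:30", "Second edition": "02:30"}
for edition, deadline in DEADLINES.items():
    print(f"{edition}: {hours_between(START, deadline)} h after the start")

WORDS_PER_DAY, POST_EDITORS = 70_000, 20
print(f"{WORDS_PER_DAY / POST_EDITORS:.0f} words per post-editor")
```

The first edition closes 6.5 hours after the 5 p.m. start and the second 9.5 hours after it. Each post-editor handles about 3,500 words of the daily volume, supplements excluded.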
Operation: production process
[5] Next goals

Success! Yes.
Thanks to
• Close work and cooperation
• Three parties involved
• Time and effort investment
• Customisation

Next!
• How to reduce post-editing effort
• How to re-use post-edited text
Thank you for your attention




Magí Camps               Blanca Vidal                    Ignasi Navarro
La Vanguardia            Lucy Software Ibérica           Incyta
mcamps@lavanguardia.es   blanca.vidal@lucysoftware.com   Ignasi_navarro@incyta.com
www.lavanguardia.es      www.lucysoftware.com            www.incyta.com

More Related Content

Viewers also liked

Bodywrapping gegen cellulite oberarmstraffung, ohne, op, gewebe, cellulite, o...
Bodywrapping gegen cellulite oberarmstraffung, ohne, op, gewebe, cellulite, o...Bodywrapping gegen cellulite oberarmstraffung, ohne, op, gewebe, cellulite, o...
Bodywrapping gegen cellulite oberarmstraffung, ohne, op, gewebe, cellulite, o...Adi Werschlein
 
Critical online success factors with dynatrace
Critical online success factors with dynatraceCritical online success factors with dynatrace
Critical online success factors with dynatraceDynatraceANZ
 
Firma Digital Biométrica en Consentimientos Informados.
Firma Digital Biométrica en Consentimientos Informados.Firma Digital Biométrica en Consentimientos Informados.
Firma Digital Biométrica en Consentimientos Informados.edatalia signature solutions
 
Derecho de obligaciones. diapositivas del dr. edgardo quispe v. parte 5
Derecho de obligaciones.  diapositivas del dr. edgardo quispe v. parte 5Derecho de obligaciones.  diapositivas del dr. edgardo quispe v. parte 5
Derecho de obligaciones. diapositivas del dr. edgardo quispe v. parte 5edgardoquispe
 
Generalidades Sobre Telefonia Celular Y Gsm
Generalidades Sobre Telefonia Celular Y GsmGeneralidades Sobre Telefonia Celular Y Gsm
Generalidades Sobre Telefonia Celular Y GsmJuan Pernia (juanrules)
 
E-banking - L'E-transformation de la Banque
E-banking - L'E-transformation de la BanqueE-banking - L'E-transformation de la Banque
E-banking - L'E-transformation de la BanqueElena HERNANDEZ
 
Die digitale Kulturrevolution – haben Bücher und Zeitungen ausgedient?
Die digitale Kulturrevolution – haben Bücher und Zeitungen ausgedient?Die digitale Kulturrevolution – haben Bücher und Zeitungen ausgedient?
Die digitale Kulturrevolution – haben Bücher und Zeitungen ausgedient?Hans-Dieter Zimmermann
 
Production écrite 6eme
Production écrite 6emeProduction écrite 6eme
Production écrite 6ememjnifen
 

Viewers also liked (10)

Bodywrapping gegen cellulite oberarmstraffung, ohne, op, gewebe, cellulite, o...
Bodywrapping gegen cellulite oberarmstraffung, ohne, op, gewebe, cellulite, o...Bodywrapping gegen cellulite oberarmstraffung, ohne, op, gewebe, cellulite, o...
Bodywrapping gegen cellulite oberarmstraffung, ohne, op, gewebe, cellulite, o...
 
Critical online success factors with dynatrace
Critical online success factors with dynatraceCritical online success factors with dynatrace
Critical online success factors with dynatrace
 
Firma Digital Biométrica en Consentimientos Informados.
Firma Digital Biométrica en Consentimientos Informados.Firma Digital Biométrica en Consentimientos Informados.
Firma Digital Biométrica en Consentimientos Informados.
 
Derecho de obligaciones. diapositivas del dr. edgardo quispe v. parte 5
Derecho de obligaciones.  diapositivas del dr. edgardo quispe v. parte 5Derecho de obligaciones.  diapositivas del dr. edgardo quispe v. parte 5
Derecho de obligaciones. diapositivas del dr. edgardo quispe v. parte 5
 
Generalidades Sobre Telefonia Celular Y Gsm
Generalidades Sobre Telefonia Celular Y GsmGeneralidades Sobre Telefonia Celular Y Gsm
Generalidades Sobre Telefonia Celular Y Gsm
 
Retrato Fotografico
Retrato FotograficoRetrato Fotografico
Retrato Fotografico
 
E-banking - L'E-transformation de la Banque
E-banking - L'E-transformation de la BanqueE-banking - L'E-transformation de la Banque
E-banking - L'E-transformation de la Banque
 
Die digitale Kulturrevolution – haben Bücher und Zeitungen ausgedient?
Die digitale Kulturrevolution – haben Bücher und Zeitungen ausgedient?Die digitale Kulturrevolution – haben Bücher und Zeitungen ausgedient?
Die digitale Kulturrevolution – haben Bücher und Zeitungen ausgedient?
 
Help – Hilfe zur Selbsthilfe / Über uns
Help – Hilfe zur Selbsthilfe / Über unsHelp – Hilfe zur Selbsthilfe / Über uns
Help – Hilfe zur Selbsthilfe / Über uns
 
Production écrite 6eme
Production écrite 6emeProduction écrite 6eme
Production écrite 6eme
 

Similar to Catalan daily goes Catalan

70global presentation
70global presentation70global presentation
70global presentationMark Fisher
 
Applying static code analysis for domain-specific languages
Applying static code analysis for domain-specific languagesApplying static code analysis for domain-specific languages
Applying static code analysis for domain-specific languagesIván Ruiz-Rube
 
70global presentation updated
70global presentation updated70global presentation updated
70global presentation updatedAssaf Sayada
 
Lean and Collaborative Content - Workshop
Lean and Collaborative Content - WorkshopLean and Collaborative Content - Workshop
Lean and Collaborative Content - WorkshopIXIASOFT
 
2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44
2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h442010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44
2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44Alain Désilets
 
TAUS Webinar - Introduction to the Gengo API Ecosystem
TAUS Webinar - Introduction to the Gengo API EcosystemTAUS Webinar - Introduction to the Gengo API Ecosystem
TAUS Webinar - Introduction to the Gengo API EcosystemGengo
 
Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...
Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...
Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...Lionel Briand
 
plone.app.multilingual
plone.app.multilingual plone.app.multilingual
plone.app.multilingual Ramon Navarro
 
What machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happyWhat machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happyIconic Translation Machines
 
Genre discovery in corpus management systems (2004)
Genre discovery in corpus management systems (2004)Genre discovery in corpus management systems (2004)
Genre discovery in corpus management systems (2004)Joseba Abaitua
 
Tatiana Gornostay: Language Meets Knowledge in Digital Content Management
Tatiana Gornostay: Language Meets Knowledge in Digital Content ManagementTatiana Gornostay: Language Meets Knowledge in Digital Content Management
Tatiana Gornostay: Language Meets Knowledge in Digital Content Managementmbruemmer
 
mbeddr meets IncQuer - Combining the Best Features of Two Modeling Worlds
mbeddr meets IncQuer - Combining the Best Features of Two Modeling Worldsmbeddr meets IncQuer - Combining the Best Features of Two Modeling Worlds
mbeddr meets IncQuer - Combining the Best Features of Two Modeling WorldsIstvan Rath
 
Doctrain Life Sciences Handling Dita Topics And Translation In A Regulated ...
Doctrain Life Sciences   Handling Dita Topics And Translation In A Regulated ...Doctrain Life Sciences   Handling Dita Topics And Translation In A Regulated ...
Doctrain Life Sciences Handling Dita Topics And Translation In A Regulated ...Scott Abel
 
An HLT profile of the official South African languages
An HLT profile of the official South African languagesAn HLT profile of the official South African languages
An HLT profile of the official South African languagesGuy De Pauw
 
Gianluca Giulinin - FAO
Gianluca Giulinin - FAO Gianluca Giulinin - FAO
Gianluca Giulinin - FAO RIILP
 
2014 01-ticosa
2014 01-ticosa2014 01-ticosa
2014 01-ticosaPharo
 
Multilingual Data Value Chain for CEF Automated Translation: Interoperability...
Multilingual Data Value Chain for CEF Automated Translation:Interoperability...Multilingual Data Value Chain for CEF Automated Translation:Interoperability...
Multilingual Data Value Chain for CEF Automated Translation: Interoperability...Dave Lewis
 

Similar to Catalan daily goes Catalan (20)

70global presentation
70global presentation70global presentation
70global presentation
 
Applying static code analysis for domain-specific languages
Applying static code analysis for domain-specific languagesApplying static code analysis for domain-specific languages
Applying static code analysis for domain-specific languages
 
70global presentation updated
70global presentation updated70global presentation updated
70global presentation updated
 
Lean and Collaborative Content - Workshop
Lean and Collaborative Content - WorkshopLean and Collaborative Content - Workshop
Lean and Collaborative Content - Workshop
 
2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44
2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h442010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44
2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44
 
TAUS Webinar - Introduction to the Gengo API Ecosystem
TAUS Webinar - Introduction to the Gengo API EcosystemTAUS Webinar - Introduction to the Gengo API Ecosystem
TAUS Webinar - Introduction to the Gengo API Ecosystem
 
Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...
Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...
Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...
 
Managing multilingual webcontent
Managing multilingual webcontentManaging multilingual webcontent
Managing multilingual webcontent
 
plone.app.multilingual
plone.app.multilingual plone.app.multilingual
plone.app.multilingual
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 
What machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happyWhat machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happy
 
Genre discovery in corpus management systems (2004)
Genre discovery in corpus management systems (2004)Genre discovery in corpus management systems (2004)
Genre discovery in corpus management systems (2004)
 
Tatiana Gornostay: Language Meets Knowledge in Digital Content Management
Tatiana Gornostay: Language Meets Knowledge in Digital Content ManagementTatiana Gornostay: Language Meets Knowledge in Digital Content Management
Tatiana Gornostay: Language Meets Knowledge in Digital Content Management
 
mbeddr meets IncQuer - Combining the Best Features of Two Modeling Worlds
mbeddr meets IncQuer - Combining the Best Features of Two Modeling Worldsmbeddr meets IncQuer - Combining the Best Features of Two Modeling Worlds
mbeddr meets IncQuer - Combining the Best Features of Two Modeling Worlds
 
Doctrain Life Sciences Handling Dita Topics And Translation In A Regulated ...
Doctrain Life Sciences   Handling Dita Topics And Translation In A Regulated ...Doctrain Life Sciences   Handling Dita Topics And Translation In A Regulated ...
Doctrain Life Sciences Handling Dita Topics And Translation In A Regulated ...
 
An HLT profile of the official South African languages
An HLT profile of the official South African languagesAn HLT profile of the official South African languages
An HLT profile of the official South African languages
 
Gianluca Giulinin - FAO
Gianluca Giulinin - FAO Gianluca Giulinin - FAO
Gianluca Giulinin - FAO
 
2014 01-ticosa
2014 01-ticosa2014 01-ticosa
2014 01-ticosa
 
Multilingual Data Value Chain for CEF Automated Translation: Interoperability...
Multilingual Data Value Chain for CEF Automated Translation:Interoperability...Multilingual Data Value Chain for CEF Automated Translation:Interoperability...
Multilingual Data Value Chain for CEF Automated Translation: Interoperability...
 

Recently uploaded

8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCRashishs7044
 
Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Anamaria Contreras
 
Send Files | Sendbig.comSend Files | Sendbig.com
Send Files | Sendbig.comSend Files | Sendbig.comSend Files | Sendbig.comSend Files | Sendbig.com
Send Files | Sendbig.comSend Files | Sendbig.comSendBig4
 
The-Ethical-issues-ghhhhhhhhjof-Byjus.pptx
The-Ethical-issues-ghhhhhhhhjof-Byjus.pptxThe-Ethical-issues-ghhhhhhhhjof-Byjus.pptx
The-Ethical-issues-ghhhhhhhhjof-Byjus.pptxmbikashkanyari
 
Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Americas Got Grants
 
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckPitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckHajeJanKamps
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfrichard876048
 
International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...ssuserf63bd7
 
TriStar Gold Corporate Presentation - April 2024
TriStar Gold Corporate Presentation - April 2024TriStar Gold Corporate Presentation - April 2024
TriStar Gold Corporate Presentation - April 2024Adnet Communications
 
Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...Peter Ward
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607dollysharma2066
 
business environment micro environment macro environment.pptx
business environment micro environment macro environment.pptxbusiness environment micro environment macro environment.pptx
business environment micro environment macro environment.pptxShruti Mittal
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Seta Wicaksana
 
Cyber Security Training in Office Environment
Cyber Security Training in Office EnvironmentCyber Security Training in Office Environment
Cyber Security Training in Office Environmentelijahj01012
 
Guide Complete Set of Residential Architectural Drawings PDF
Guide Complete Set of Residential Architectural Drawings PDFGuide Complete Set of Residential Architectural Drawings PDF
Guide Complete Set of Residential Architectural Drawings PDFChandresh Chudasama
 
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCRashishs7044
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesKeppelCorporation
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?Olivia Kresic
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCRashishs7044
 

Recently uploaded (20)

8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
 
Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.
 
Send Files | Sendbig.comSend Files | Sendbig.com
Send Files | Sendbig.comSend Files | Sendbig.comSend Files | Sendbig.comSend Files | Sendbig.com
Send Files | Sendbig.comSend Files | Sendbig.com
 
The-Ethical-issues-ghhhhhhhhjof-Byjus.pptx
The-Ethical-issues-ghhhhhhhhjof-Byjus.pptxThe-Ethical-issues-ghhhhhhhhjof-Byjus.pptx
The-Ethical-issues-ghhhhhhhhjof-Byjus.pptx
 
Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...
 
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckPitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdf
 
International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...
 
TriStar Gold Corporate Presentation - April 2024
TriStar Gold Corporate Presentation - April 2024TriStar Gold Corporate Presentation - April 2024
TriStar Gold Corporate Presentation - April 2024
 
Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
 
business environment micro environment macro environment.pptx
business environment micro environment macro environment.pptxbusiness environment micro environment macro environment.pptx
business environment micro environment macro environment.pptx
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...
 
Cyber Security Training in Office Environment
Cyber Security Training in Office EnvironmentCyber Security Training in Office Environment
Cyber Security Training in Office Environment
 
Guide Complete Set of Residential Architectural Drawings PDF
Guide Complete Set of Residential Architectural Drawings PDFGuide Complete Set of Residential Architectural Drawings PDF
Guide Complete Set of Residential Architectural Drawings PDF
 
Call Us ➥9319373153▻Call Girls In North Goa
Call Us ➥9319373153▻Call Girls In North GoaCall Us ➥9319373153▻Call Girls In North Goa
Call Us ➥9319373153▻Call Girls In North Goa
 
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation Slides
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
 

Catalan daily goes Catalan

  • 1. Catalan daily goes Catalan LocWord 2012, A4 Magí Camps (La Vanguardia) Blanca Vidal (Lucy Software)
  • 2. [1] Introduction, background Newspapers in Catalan Net Circulation 90.000 79.239 80.000 70.000 60.000 50.000 45.309 40.000 31.762 30.000 20.000 15.662 10.000 6.779 0 Source: Estudi General de Mitjans (EGM), 2012
  • 3. Introduction, background Results Increase +4% of copies +7% of readers Distribution 57% Spanish 43% Catalan
  • 4. Introduction, background Why a Catalan version? Celebration of LV’s 130 anniversary Normalization of the use of Catalan Investment to face the crisis Opportunity to consolidate LV’s hegemony
  • 5. [2] Customer goals To publish two language Journalists should be editions of the same able to write in newspaper daily any (supplements incl.). of the two languages. Neither quality nor distribution timeframes should be affected.
  • 6. Customer requirements: tailor-made system; compliance with LV’s style guide; seamless integration of MT into the journalists’ workflow; translation of Hermes XML and InDesign formats; reliability and high availability; high performance.
  • 7. [3] Ramp-up phase. Project set-up. Work areas: MT linguistic improvement/tuning; post-editing preparation; MT system set-up and integration; MT lexicon training. Duration: 8 months (+3 months). Staff: LV, 10-12 in-house journalists; Lucy, 3 computational linguists/lexicographers and 1 software developer; Incyta, 2 professional post-editors. Important: on-site support.
  • 8. Subphases (Phase 1: 2 months; Phases 2, 3 and 4: 3 months each). Linguistic improvement/tuning: language-type definition (one phase); creation of a corpus of real texts, analysis of the translation quality, error reporting (lexicon and grammar errors), linguistic implementation (lexicon and grammar), and pre- and post-editing filters (all four phases). Post-editing preparation: gathering of MT post-editing guidelines (one phase); evaluation of post-editing effort (two phases); creation and training of the post-editing team (one phase). Technical set-up: system set-up and integration (one phase); preparation of XML converters (one phase). Maintenance: lexicon maintenance training (one phase).
  • 9. [a] Linguistic tuning: language model; corpus; translation quality (TQ); analysis and error reporting; implementation; accomplished improvement data.
  • 10. Linguistic tuning. Catalan language model: no exclusion; compliant with standards; innovative in terminology; dynamic in syntactic structures. Corpus: ES, 500,000 translation units (8,300,000 words); CA, 250,000 translation units (3,000,000 words).
  • 11. Linguistic tuning. Translation quality: perfect 74%, minimal post-editing 24%, medium post-editing 2%. Conclusions: no specific domains (except Sports); Culture: proper names; Opinion: idioms, plays on words; errors are not repetitive; some style issues still need post-editing.
  • 12. Linguistic tuning. Analysis and error reporting: semi-automatic detection of missing words; terminology lists; new and different translations; error reporting. Implementation: proper names (44.5% of the TUs); idioms; alternatives.
  • 13. Linguistic tuning. Accomplished improvement data. Work in figures: 40,000 lexicon entries (20,000 for each translation direction); around 440 grammar rules; around 7,200 words in the proper-name files (for each translation direction). Non-measurable work: understanding of the MT system; understanding of the newspaper’s specificities; support with the style guide, taking MT into account. Improvement: ES>CA, 41% different (35% better, 4% similar, 2% worse); CA>ES, 36% different (32% better, 3% similar, 1% worse).
  • 15. Post-editing: metrics on translation volume; metrics on post-editing effort; specificities of the text; post-editors’ resources; post-editing workspace; error reporting process and tools; post-editing team and profile.
  • 16. Post-editing: metrics. File LV_2010-10-27: 2,474 translation units in total (= 42,512 words); lexicon/grammar post-editing 464 (18.79%); style post-editing 394 (15.96%). Conclusions: different sections had different levels of post-editing. Which style corrections could be avoided? Post-editing speed: 1,000-1,500 words/h. Daily volume: 75,000 words. New post-editing team: 20 post-editors and 12 editors.
  • 17. Post-editing: resources, workspace. Post-editors should have language proficiency skills, but also be trained in MT post-editing and have an integrated workspace. Resources: post-editing guide; new bilingual style guide; classified frequent MT errors; reference document for training. Adapt the CMS to the new workflow: new processing status; new mark-ups. Intranet language portal: links to all reference dictionaries; MT portal with resources for any journalist, at a click.
  • 18. Post-editing: resources, workspace. La Vanguardia’s intranet: linguistic portal.
  • 19. Post-editing: error reporting, team. Error reporting: crucial for continuous improvement; not automated (yet); provide better support for error reporting. Definition of the post-editing profile and team: proficient in Catalan; journalist background.
  • 20. [c] System integration. During phase 1 (pre-production): pre-production set-up and installation; Hermes XML converter; changes in the LT engine to translate InDesign files. During phase 3 (production): production installation; tests (load, performance and stress); performance of 500-1,200 words/s; definition of the final installation size.
  • 21. System integration. Diagram: language portal, Hermes, InDesign and web service components; production, pre-production and maintenance environments. Production: balanced high-performance (HP) and high-availability (HA) configuration. System requirements: standard Windows Server, low hardware footprint (e.g. dual-/quad-core 2.5-3 GHz, 2-4 GB RAM, running Windows Server 2003/2008).
  • 22. [4] Operation: production process. Staff: 20 post-editors and 12 editors, for 70,000 words/day plus supplements. Effort: 30 min linguistic review; 10 min journalistic review. Timeline: start at 5 p.m.; first edition at 11.30 p.m.; second edition at 2.30 a.m.
  • 24. [5] Next goals. Success! Yes, thanks to: close work and cooperation; three parties involved; investment of time and effort. Next: how to reduce post-editing effort; how to re-use post-edited text; customisation.
  • 25. Thank you for your attention. Magí Camps, La Vanguardia (mcamps@lavanguardia.es, www.lavanguardia.es); Blanca Vidal, Lucy Software Ibérica (blanca.vidal@lucysoftware.com, www.lucysoftware.com); Ignasi Navarro, Incyta (Ignasi_navarro@incyta.com, www.incyta.com).
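The phase durations on slide 8 can be added up as a quick consistency check against the "8 months (+3 months)" on slide 7. Reading phases 1-3 as the 8-month ramp-up and phase 4 as the extra 3 months is an assumption; the slides do not say so explicitly.

```latex
\begin{aligned}
T &= \underbrace{2 + 3 + 3}_{\text{phases 1-3}} + \underbrace{3}_{\text{phase 4}} \ \text{months} \\
  &= 8 + 3 = 11 \ \text{months}
\end{aligned}
```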
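Slide 12 mentions semi-automatic detection of missing words. The slides do not describe how this was done; the following is a minimal Python sketch under the assumption that "missing" means corpus tokens absent from the MT lexicon, ranked by frequency so a lexicographer can review them. The function name, the tokenizer, the `min_count` threshold and the toy data are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters (accented Catalan/Spanish letters
# such as à, ç, ï, ñ included). Apostrophes (l'equip) and the Catalan middle
# dot (l·l) split tokens here; a real tokenizer would need to handle them.
WORD_RE = re.compile(r"[^\W\d_]+")


def missing_word_candidates(texts, lexicon, min_count=2):
    """Return (word, frequency) pairs for corpus words not in the lexicon.

    Results are sorted by descending frequency, then alphabetically, so a
    lexicographer can review the most frequent gaps first (the manual review
    is what makes the process semi-automatic).
    """
    counts = Counter(
        token.lower()
        for text in texts
        for token in WORD_RE.findall(text)
    )
    known = {entry.lower() for entry in lexicon}
    candidates = [
        (word, n)
        for word, n in counts.items()
        if word not in known and n >= min_count
    ]
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Toy corpus and lexicon, made up for illustration.
corpus = [
    "El Barça guanya el derbi.",
    "El derbi acaba amb polèmica.",
    "Polèmica al derbi de Montjuïc.",
]
lexicon = {"el", "guanya", "acaba", "amb", "al", "de"}
print(missing_word_candidates(corpus, lexicon))
# [('derbi', 3), ('polèmica', 2)]
```

Words seen only once (here "barça" and "montjuïc") are filtered out by `min_count=2`; lowering the threshold trades a longer review list for better coverage of rare proper names.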
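If "different" on slide 13 is read as the share of test sentences whose translation changed after tuning (an interpretation; the slide only says "diff"), the breakdowns add up, and the share of changed translations that improved can be derived:

```latex
\begin{aligned}
\text{ES} > \text{CA}: &\quad 35\% + 4\% + 2\% = 41\%, \qquad \tfrac{35}{41} \approx 85.4\% \text{ better}, \quad \tfrac{2}{41} \approx 4.9\% \text{ worse} \\
\text{CA} > \text{ES}: &\quad 32\% + 3\% + 1\% = 36\%, \qquad \tfrac{32}{36} \approx 88.9\% \text{ better}, \quad \tfrac{1}{36} \approx 2.8\% \text{ worse}
\end{aligned}
```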
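The post-editing figures on slide 16 imply a rough daily workload. This Python sketch assumes the whole daily volume (75,000 words) is post-edited at a uniform speed and split evenly across the 20 post-editors; editors' review time and supplements are ignored.

```python
def post_editing_hours(daily_words: int, words_per_hour: float) -> float:
    """Total post-editor hours needed to post-edit `daily_words`."""
    return daily_words / words_per_hour


def hours_per_post_editor(daily_words: int, words_per_hour: float, team_size: int) -> float:
    """Average hours per post-editor, assuming the load is split evenly."""
    return post_editing_hours(daily_words, words_per_hour) / team_size


DAILY_WORDS = 75_000  # daily volume (slide 16)
TEAM_SIZE = 20        # post-editors (slide 16)

for speed in (1_000, 1_500):  # post-editing speed range in words/h (slide 16)
    total = post_editing_hours(DAILY_WORDS, speed)
    each = hours_per_post_editor(DAILY_WORDS, speed, TEAM_SIZE)
    print(f"{speed} words/h: {total:.1f} h in total, {each:.2f} h per post-editor")
```

At 1,000 words/h this gives 75 post-editor hours (3.75 h each); at 1,500 words/h, 50 hours (2.5 h each).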
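Slide 19 notes that error reporting was not yet automated. One possible first step, sketched here in Python, is a structured report record that can be counted per translation direction and error type. The field names, segment identifiers and the three categories (lexicon, grammar and style, taken from the error types on slides 8 and 16) are assumptions, not the format actually used by La Vanguardia or Lucy.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    LEXICON = "lexicon"
    GRAMMAR = "grammar"
    STYLE = "style"


@dataclass(frozen=True)
class ErrorReport:
    """Hypothetical record a post-editor files for one MT error."""
    direction: str       # translation direction, e.g. "ES>CA" or "CA>ES"
    category: Category   # error type
    segment_id: str      # hypothetical identifier of the translation unit
    note: str = ""       # free-text comment for the lexicographers


def summarize(reports):
    """Count reports per (direction, category) to prioritize lexicon/grammar work."""
    return Counter((r.direction, r.category) for r in reports)


reports = [
    ErrorReport("ES>CA", Category.LEXICON, "seg-001", "proper name not recognized"),
    ErrorReport("ES>CA", Category.LEXICON, "seg-002"),
    ErrorReport("CA>ES", Category.STYLE, "seg-003"),
]
summary = summarize(reports)
for (direction, category), n in summary.most_common():
    print(direction, category.value, n)
```

Keeping the category as an enum rather than free text makes the counts comparable over time, which is what continuous improvement needs.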
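At the engine speed measured on slide 20, raw machine translation is a small part of the nightly schedule. Assuming the 75,000-word daily volume from slide 16 is sent through a single installation at a sustained rate:

```latex
t = \frac{75{,}000 \ \text{words}}{1{,}200 \ \text{words/s}} = 62.5 \ \text{s}
\qquad\text{to}\qquad
t = \frac{75{,}000 \ \text{words}}{500 \ \text{words/s}} = 150 \ \text{s}
```

That is roughly one to two and a half minutes, far below the time needed for post-editing at 1,000-1,500 words/h.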
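Slide 21 describes the production set-up only as a balanced HP/HA configuration. To illustrate what "balanced" plus "high availability" commonly means, here is a hypothetical Python sketch of round-robin distribution across translation nodes that skips nodes marked as down. It is not the actual Lucy deployment, and the class and node names are made up.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical sketch: rotate requests across MT nodes, skipping nodes marked down."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._healthy = set(self._nodes)
        self._ring = cycle(self._nodes)

    def mark_down(self, node):
        """Take a node out of rotation (e.g. after a failed health check)."""
        self._healthy.discard(node)

    def mark_up(self, node):
        """Return a known node to rotation."""
        if node in self._nodes:
            self._healthy.add(node)

    def next_node(self):
        """Return the next healthy node, trying each node at most once."""
        for _ in range(len(self._nodes)):
            node = next(self._ring)
            if node in self._healthy:
                return node
        raise RuntimeError("no healthy translation node available")


balancer = RoundRobinBalancer(["mt-node-1", "mt-node-2"])  # made-up node names
print(balancer.next_node(), balancer.next_node())  # load is spread across both nodes
balancer.mark_down("mt-node-1")                     # simulate a node failure
print(balancer.next_node())                         # requests go to the surviving node
```

Round-robin covers the "balanced" (HP) part; skipping unhealthy nodes covers the HA part, as long as at least one node stays up.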
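The timeline on slide 22 leaves a fixed window for post-editing and review. This Python sketch computes the window length, rolling over midnight for the second edition. The final capacity line is an assumption: it multiplies the first window by the 20 post-editors and the lower post-editing speed from slide 16 (1,000 words/h), as if everyone worked the full window.

```python
from datetime import datetime, timedelta


def window_hours(start: str, end: str) -> float:
    """Hours between two HH:MM clock times, rolling over midnight if needed."""
    fmt = "%H:%M"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 <= t0:
        t1 += timedelta(days=1)  # the deadline falls on the next day
    return (t1 - t0).total_seconds() / 3600


first = window_hours("17:00", "23:30")   # start to first edition
second = window_hours("17:00", "02:30")  # start to second edition
print(first, second)  # 6.5 9.5

# Assumption: all 20 post-editors work the whole window at 1,000 words/h.
capacity = 20 * first * 1_000
print(capacity)  # 130000.0
```

The theoretical capacity is comfortably above the 70,000 words/day on the same slide, although the sketch ignores the review steps and the supplements.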