Estimating Dyslexia in the Web

Ricardo Baeza-Yates                      Luz Rello

Yahoo! Research &                        Web Research and
Web Research Group,                      NLP Groups
Pompeu Fabra University,                 Pompeu Fabra University,
Barcelona, Spain                         Barcelona, Spain




                   W4A 2011, Hyderabad
Outline




                       — What

                       — Why

                       — How
                            to distinguish dyslexic errors
                            to build a sample
                            to measure dyslexia

                       — Results



Ricardo Baeza-Yates and Luz Rello   W4A 2011, Hyderabad          Estimating Dyslexia in the Web
What

Dyslexia

    Dyslexia is a neurologically-based disorder which interferes with the
    acquisition and processing of language. It manifests itself with
    difficulties in receptive and expressive language, including
    phonological processing, in reading, writing, spelling and handwriting
    and sometimes in arithmetic.
    (Committee of Members Orton Dyslexia Society. Definition of Dyslexia, 1994.)

Dysphonetic dyslexia (The Boder’s Test of Reading-Spelling Patterns)

    The largest of the three subtypes of dyslexia that the author presents.
    Dysphonetic dyslexia is viewed as a disability in associating symbols
    with sounds. The misspellings typical of this disorder are due to
    phonetic inaccuracy.
    (Boder & Jarrico, 1982)

Why


All languages

    There is a universal neuro-cognitive basis for dyslexia.
    (Paulesu et al. 2001)

    Its manifestations are culture-specific due to different orthographies.
    (Alegria, 2006)

    English is a language with deep orthography: the mapping between
    letters, speech sounds, and whole-word sounds is often highly ambiguous,
    and therefore examples of dyslexia are more widespread than in other
    languages with transparent or shallow orthography.
    (Paulesu et al. 2001)

Why




Frequent

    Researchers estimate that 10-17% of the population in the U.S.A. has
    dyslexia, and that only 30% of dyslexics have trouble with reversing
    letters and numbers. On the other hand, the level of dyslexia in other
    regions such as Europe or China is lower.
    (H. Meng et al., 2005)

    There are around 38 million dyslexics in Europe.
    (Ruiz del Árbol, 2008)




Why


Useful

    Detecting the presence of dyslexic texts in the Web helps us to know
    the real impact of dyslexia in the Web as well as to value
    dyslexic-accessible practices.

    Studies commonly agree that applying dyslexic-accessible practices also
    benefits readability for non-dyslexic users, as well as for users with
    other disabilities such as low vision.
    (McCarthy & Swierenga, 2010) (Evett & Brown, 2005)

    Spelling error rates have proven to be a useful index for website
    content quality.
    (Gelman & Barletta, 2008)




Why



Novel

    Estimating dyslexia in a group of web pages depending on their domain.
    (Ringlstetter et al. 2006)

    This is the first attempt to estimate the amount of texts containing
    English dyslexic errors in the Web.




How

                               Two examples of dyslexic texts


      There seams to be some confusetion. Althrow
      he rembers the situartion, he is not clear on
      detailes. With regard to deleteing parts,
      could you advice me of the excat nature of the
      promblem and I will investgate it imeaditly.



                                                        I halve a spelling chequer
                                                        It cam with my pea see
                                                        Eye now I’ve gut the spilling rite
                                                        Its plane fore al too sea ...
                                                        Its latter prefect awl the weigh
                                                        My chequer tolled mi sew.
     (Pedler, 2007)

How

            How many kinds of errors can be produced by a dyslexic?


              Dyslexic errors (Pedler, 2007)

                                    Simple errors             53%
                                    Multi errors              39%
                                    Word boundary errors       8%
                                    Total                    100%

                                    Real-word errors          17%
                                    Non-word errors           83%
                                    Total                    100%

                                    First letter errors        5%




How

                         How many kinds of errors in the Web?

         1. Dyslexic errors: errors commonly made by dyslexics, e.g. unfinished
         words or letters, omitted words, and inconsistent spaces between words
         and letters (Vellutino, 1979). Example: *reiecve instead of receive.

         2. Regular spelling errors produced by non-impaired native English
         speakers, such as transposition errors, e.g. *recieve.

         3. Regular typos caused by the adjacency of letters on the keyboard,
         e.g. *teceive.

         4. OCR errors, due to letters of similar shape, such as *ieceive.

         5. Errors made by non-native speakers who use English as a foreign
         language. For example, *receibe is a typical error made by Spanish
         learners of English, since in Spanish the graphemes ‘b’ and ‘v’ are
         both pronounced /b/ and the phoneme /v/ does not exist in the standard
         Spanish phonemic system.

How

                                       Selection criteria

    To avoid the overlap of dyslexic errors and other errors:

                 — We consider only words written by dyslexics that contain
                 multi-errors, that is, the dyslexic word differs from the
                 intended correct word by more than one letter. For example,
                 the dyslexic word *konwlegde for knowledge.


    To avoid the overlap of dyslexic errors and real words:

                 — Errors which coincide with other existing English words are
                 omitted, e.g. *trust when the intended word is truth.

                 — Errors which result in a proper name are also filtered out;
                 for instance, *wirries (from worries) is also a proper name.
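
The two filters above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes that "differs by more than one letter" can be approximated by an edit distance greater than 1, with an adjacent transposition counted as a single edit (so *recieve stays a simple error), and it uses tiny hypothetical sets, `LEXICON` and `PROPER_NAMES`, in place of a real English dictionary and name list.

```python
# Hypothetical stand-ins for a real dictionary and a proper-name list.
LEXICON = {"trust", "truth", "knowledge", "worries", "receive"}
PROPER_NAMES = {"wirries"}


def osa_distance(a: str, b: str) -> int:
    """Edit distance where an adjacent transposition counts as one edit
    (optimal string alignment variant of Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def keep_error(intended: str, error: str) -> bool:
    """Keep only multi-errors that are neither real words nor proper names."""
    if osa_distance(intended, error) <= 1:
        return False  # simple error: may overlap with typos or OCR errors
    if error in LEXICON or error in PROPER_NAMES:
        return False  # coincides with an existing word or name
    return True


for intended, error in [("knowledge", "konwlegde"), ("truth", "trust"),
                        ("worries", "wirries"), ("receive", "recieve")]:
    print(intended, error, keep_error(intended, error))
```

Counting a transposition as one edit is a design choice made here so that common swaps such as *recieve are not mistaken for multi-errors; a plain Levenshtein distance would score them as 2.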


How

                                     Selection criteria

     — All the dyslexic spelling errors are extracted from samples of text written by adults
     with diagnosed dyslexia (taken from a corpus compiled for this purpose) and from the
     literature (Pedler, 2007).

     — Among the dyslexic errors, we take into account the ones which include the letters
     that cause the most confusion among dyslexic individuals, such as ‘b’, ‘d’, ‘p’, ‘m’, ‘n’,
     ‘u’ and ‘w’, together with other similar-looking letters. For instance, reversals of
     similar letters, such as ‘b’ and ‘d’, are especially frequent (Deloche et al. 1982),
     e.g. *impossidle for the intended word impossible.


     — Errors due to homophone confusion, that is, confusion between words which have a
     similar pronunciation, such as witch and which (Pedler, 2007), are not selected, even
     though 15% of the dyslexic errors in a corpus of dyslexic texts involved homophone
     confusion.


How

                   Sample D, an example for the word comparison


      1. Dyslexic error:            *comaprsion.

      2. Spelling errors:           *comparision, *conparison and *coparison.

      3. Typos:                     *vomparison, *xomparison, *cimparison, *cpmparison,
                                    *conparison, *co,parison, *comoarison, *com[arison,
                                    *comprison, *compsrison, *compaeison, *compatison,
                                    *comparuson, *comparoson, *compariaon, *comparidon,
                                    *comparisin, *comparispn, *comparisob and *comparisom.

      4. OCR errors:                *compaiison and *comparisom.

      5. Non-native speaker errors: *comparition and *comparizon.
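
Most of the typos in item 3 replace one letter with its left or right neighbour on a keyboard row. A minimal sketch that enumerates such substitutions follows; it assumes a US QWERTY layout (the slides do not say which layout was used), and the names `QWERTY_ROWS`, `horizontal_neighbors` and `adjacency_typos` are illustrative.

```python
# Assumed US QWERTY rows, including the punctuation keys at the row ends.
QWERTY_ROWS = ["qwertyuiop[]", "asdfghjkl;'", "zxcvbnm,./"]


def horizontal_neighbors(ch: str) -> list[str]:
    """Return the keys immediately left and right of ch on its row."""
    for row in QWERTY_ROWS:
        i = row.find(ch)
        if i != -1:
            return [row[j] for j in (i - 1, i + 1) if 0 <= j < len(row)]
    return []


def adjacency_typos(word: str) -> list[str]:
    """Substitute each letter in turn with a horizontally adjacent key."""
    typos = []
    for pos, ch in enumerate(word):
        for repl in horizontal_neighbors(ch):
            candidate = word[:pos] + repl + word[pos + 1:]
            if candidate not in typos:
                typos.append(candidate)
    return typos


print(adjacency_typos("comparison"))
```

For comparison this yields 19 variants, all of which appear in item 3; the remaining listed form, *comprison, is not a single-key substitution and is not produced by this sketch.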

How


                                    Sample D, dyslexic errors


                          comparison                          *comaprsion
                          understanding                       *understangind
                          knowledge                           *knwolegde
                          impossible                          *inpossbile
                          tomorrow                            *torromow
                          worries                             *worires
                          explain                             *exaplin
                          interesting                         *intersenting
                          situation                           *situartion
                          confusion                           *confusetion
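
One way a list like this might be used, sketched here as an assumption rather than as the authors' tool, is to scan a text for the Sample D error forms and report the intended words. The dictionary copies the ten pairs from the table; the function name `find_sample_d_errors` is hypothetical.

```python
import re

# Sample D: dyslexic error form -> intended word, as listed in the table.
SAMPLE_D = {
    "comaprsion": "comparison",
    "understangind": "understanding",
    "knwolegde": "knowledge",
    "inpossbile": "impossible",
    "torromow": "tomorrow",
    "worires": "worries",
    "exaplin": "explain",
    "intersenting": "interesting",
    "situartion": "situation",
    "confusetion": "confusion",
}


def find_sample_d_errors(text: str) -> list[tuple[str, str]]:
    """Return (error, intended word) pairs for Sample D forms found in text."""
    hits = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in SAMPLE_D:
            hits.append((token, SAMPLE_D[token]))
    return hits


example = ("There seams to be some confusetion. Althrow he rembers "
           "the situartion, he is not clear on detailes.")
print(find_sample_d_errors(example))
```

An exact lookup only finds the listed forms; other misspellings in the same text (for example *rembers) are ignored because they are not part of Sample D.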


How

                              Estimating Dyslexia in the Web


           — Let us define:

                   f : fraction of Web pages with lexical errors.
                   d : fraction of dyslexic errors among all lexical errors.

           — Then, the fraction of Web pages with dyslexic errors is f × d.



           — We find lower bounds for f and d to obtain a lower bound for the
           fraction of dyslexic pages in the Web.




How

                              Estimating Dyslexia in the Web




          — We use the main search engines (Bing, Google and Yahoo!)
          to estimate the document frequency of a word.

          — Each of the words in our list is searched only in English web
          pages, to avoid misspellings that may be valid words in another
          language.




How

                              Estimating Dyslexia in the Web



       — We bound the relative fraction of documents with lexical errors, f, by
       using a sample of frequent words that appear in most documents,
       usually called stopwords in information retrieval, and their
       misspellings (*becuase, *trhough, etc.).

       — We use the largest relative fraction of misspellings among all these
       words to estimate f, as we cannot assume that all of them appear in
       different pages.

       — To bound d we do the same frequency search with a sample of
       non-frequent words (Sample D), where we can distinguish the different
       types of errors without ambiguity.
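
The two bounds can be sketched as follows. This is a hedged illustration: the slides do not give the exact formula for the "relative fraction", so the sketch assumes it is the misspelling's document frequency divided by the combined frequency of the correct and misspelled forms, and assumes d is the average per-word share of the dyslexic form among all error forms. All counts below are made-up toy numbers, not search engine data.

```python
def misspelling_ratio(df_correct: int, df_misspelled: int) -> float:
    """Assumed definition: share of documents with the misspelled form."""
    return df_misspelled / (df_correct + df_misspelled)


def lower_bound_f(stopword_counts: dict[str, tuple[int, int]]) -> float:
    """Take the largest ratio over all stopwords. Ratios are not summed,
    because several misspellings may occur in the same page."""
    return max(misspelling_ratio(c, m) for c, m in stopword_counts.values())


def average_dyslexic_share(per_word_counts: list[dict[str, int]]) -> float:
    """Average, over Sample D words, of dyslexic hits among all error hits."""
    shares = []
    for counts in per_word_counts:
        total = sum(counts.values())
        shares.append(counts["dyslexic"] / total if total else 0.0)
    return sum(shares) / len(shares)


# Toy document frequencies: word -> (hits for correct form, hits for misspelling).
toy_stopwords = {"because": (1000, 3), "through": (2000, 2)}

# Toy hit counts per error class for one Sample D word.
toy_sample_d = [
    {"dyslexic": 1, "spelling": 60, "typo": 30, "ocr": 4, "non_native": 5},
]

f = lower_bound_f(toy_stopwords)
d = average_dyslexic_share(toy_sample_d)
print(f, d, f * d)
```

Using the maximum rather than the sum keeps f a lower bound even when the misspelled stopwords all occur in the same pages.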



Results

                              Estimating Dyslexia in the Web




             [Table: range of percentages and average for the different
             error classes.]

             We use the real document frequencies of the terms from one of
             the search engines to validate the results obtained, finding very
             similar results.



Results

                              Estimating Dyslexia in the Web


             — From Sample D, the percentage of dyslexic errors among all
             lexical errors is very low, with an average of 0.67%.

             — According to Pedler (2007), only 39% of dyslexic errors are
             multi-errors, and Sample D contains only multi-errors.

             — This implies that d is at least the measured value divided by
             0.39, but we can safely use a factor of 3 to correct for this.

             — We have that f is at least 0.27%, from the word *becuase.

             — Then, we can estimate d as 3 × 0.67% = 2.01%.

             — The lower bound for dyslexia in the Web is f × d ≈ 0.005%.
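
The arithmetic on this slide can be checked with a few lines of Python. The inputs are the figures quoted above (0.67%, 39%, the factor of 3, and 0.27%); the variable names are illustrative. The sketch also computes the variant that divides by 0.39 (a factor of about 2.56) for comparison.

```python
D_MULTI = 0.0067     # average share of dyslexic (multi-error) forms in Sample D
MULTI_SHARE = 0.39   # share of dyslexic errors that are multi-errors (Pedler, 2007)
F_LOWER = 0.0027     # lower bound for f, from the word *becuase

d_divided = D_MULTI / MULTI_SHARE  # correction by dividing by 0.39
d_factor3 = D_MULTI * 3            # correction with the factor of 3 from the slide

bound_divided = F_LOWER * d_divided
bound_factor3 = F_LOWER * d_factor3

print(f"d (divided by 0.39): {100 * d_divided:.2f}%")
print(f"d (factor of 3):     {100 * d_factor3:.2f}%")
print(f"f x d (divided by 0.39): {100 * bound_divided:.4f}%")
print(f"f x d (factor of 3):     {100 * bound_factor3:.4f}%")
```

Both variants round to 0.005% at three decimal places, which matches the lower bound stated on the slide.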


Conclusions




         • The amount of dyslexic text in the Web is not as large as it could
         be. This suggests that the widespread use of spell checkers
         mitigates the effects of dyslexia in the Web.



         • Particular words can be used to detect dyslexic texts, and hence
         dyslexic users. This can be used to improve Web accessibility as
         well as future spell checkers or other tools targeted to dyslexic users.




Conclusions


        • Since this is the first attempt to estimate text written by dyslexic
        individuals in the Web, a comparison with previous work is not possible.



        • Previous research on dyslexia reveals that error frequency is related
        to word length (Pedler, 2007). Short words such as there, where, form,
        etc. are misspelled much more frequently in dyslexic texts than long words
        like the ones used in our experiments. Hence, we can obtain a better estimate
        by using a larger sample of stopwords as well as long dyslexic words.



        • As a byproduct we have found that other types of errors are much more
        frequent in the Web and this can be used to assess the quality of Web
        text.


On-going Work


        New methodology.
                  Sample enlarged to 50 words.
                  Real data extracted from a leading search engine.
                  Up-down/Left-right typos.
                  New lower bound: 0.8 % (16 times better).




        [Table: range of percentages and average for the different
        error classes.]


Future Work




             1 — Identification of dyslexic errors. Dyslexia diagnosis.

             2 — NLP techniques for making text more accessible for
             dyslexic users.

             3 — Web quality estimation (Gelman & Barletta, 2008),
             across countries, domains and social media.








                             Zank u beri mach




EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Ricardo Baeza-Yates, Luz Rello - Estimating Dyslexia in the Web - W4A - 2011

  • 1. Estimating Dyslexia in the Web. Ricardo Baeza-Yates (Yahoo! Research & Web Research Group, Pompeu Fabra University, Barcelona, Spain) and Luz Rello (Web Research and NLP Groups, Pompeu Fabra University, Barcelona, Spain). W4A 2011, Hyderabad.
  • 2. Outline. What; Why; How (to distinguish dyslexic errors, to build a sample, to measure dyslexia); Results.
  • 3. What. Dyslexia: "Dyslexia is a neurologically-based disorder which interferes with the acquisition and processing of language. It manifests itself with difficulties in receptive and expressive language, including phonological processing, in reading, writing, spelling and handwriting and sometimes in arithmetic." (Committee of Members, Orton Dyslexia Society. Definition of Dyslexia, 1994.) Dysphonetic dyslexia (The Boder's Test of Reading-Spelling Patterns): the largest of the three subtypes of dyslexia that the author presents. Dysphonetic dyslexia is viewed as a disability in associating symbols with sounds. The misspellings typical of this disorder are due to phonetic inaccuracy. (Boder & Jarrico, 1982)
  • 4. Why. All languages: There is a universal neuro-cognitive basis for dyslexia (Paulesu et al. 2001). Its manifestations are culture-specific due to different orthographies (Alegria, 2006). English is a language with deep orthography: the mapping between letters, speech sounds, and whole-word sounds is often highly ambiguous, and therefore dyslexic examples are more widespread than in other languages with transparent or shallow orthography (Paulesu et al. 2001).
  • 5. Why. Frequent: Researchers estimate that 10-17% of the population in the U.S.A. has dyslexia, and only 30% of dyslexics have trouble with reversing letters and numbers. On the other hand, the level of dyslexia in other regions such as Europe or China is lower (H. Meng et al., 2005). There are around 38 million dyslexics in Europe (Ruiz del Árbol, 2008).
  • 6. Why. Useful: Detecting the presence of dyslexic texts in the Web helps us to know the real impact of dyslexia in the Web, as well as to value dyslexic-accessible practices. There is common agreement in these studies that applying dyslexic-accessible practices also benefits readability for non-dyslexic users, as well as for users with other disabilities such as low vision (McCarthy & Swierenga, 2010; Evett & Brown, 2005). Spelling error rates have proven to be a useful index of website content quality (Gelman & Barletta, 2008).
  • 7. Why. Novel: Estimating dyslexia in a group of web pages depending on their domain (Ringlstetter et al. 2006). This is the first attempt to estimate the amount of text containing English dyslexic errors in the Web.
  • 8. How. Two examples of dyslexic texts (Pedler, 2007). First: "There seams to be some confusetion. Althrow he rembers the situartion, he is not clear on z detailes. With regard to deleteing parts, could you advice me of the excat nature of the promblem and I will investgate it imeaditly." Second: "I halve a spelling chequer / It cam with my pea see / Eye now I’ve gut the spilling rite / Its plane fore al too sea ... / I ts latter prefect awl the weigh / My chequer tolled mi sew."
  • 9. How. How many kinds of errors can be produced by a dyslexic? (Pedler, 2007) Simple errors 53%, multi-errors 39%, word boundary errors 8% (together 100% of dyslexic errors). Real-word errors 17%, non-word errors 83% (together 100%). First letter errors 5%.
  • 10. How. How many kinds of errors in the Web? 1. Dyslexic errors: the kinds of errors commonly made by dyslexics include unfinished words or letters, omitted words, and inconsistent spaces between words and letters (Vellutino, 1979). Example: *reiecve instead of receive. 2. Regular spelling errors produced by non-impaired native English individuals, such as the transposition error, e.g. *recieve. 3. Regular typos caused by the adjacency of letters on the keyboard, e.g. *teceive. 4. OCR errors, due to letters of similar shape, such as *ieceive. 5. Errors made by non-native speakers who use English as a foreign language. For example, *receibe is a typical error made by Spanish learners of English, since the graphemes ‘b’ and ‘v’ are both pronounced as /b/, and the phoneme /v/ does not exist in the standard Spanish phonemic system.
  • 11. How. Selection criteria. To avoid the overlap of dyslexic errors and other errors: we consider only words written by dyslexics that contain multi-errors, that is, the dyslexic word differs from the intended correct word by more than one letter. For example, the dyslexic word *konwlegde from knowledge. To avoid the overlap of dyslexic errors and real words: errors which coincide with other existing words in English are omitted, e.g. *trust when the intended word is truth. Errors which result in a proper name are also filtered; for instance, the typo *wirries from worries is also a proper name.
  • 12. How. Selection criteria. All the dyslexic spelling errors are extracted from samples of text written by adults with diagnosed dyslexia (extracted from a corpus compiled for this purpose) and from the literature (Pedler, 2007). Among the dyslexic errors, we take into account the ones which include the letters that produce the most confusion among dyslexic individuals, such as ‘b’, ‘d’, ‘p’, ‘m’, ‘n’, ‘u’ and ‘w’, together with other similar-looking letters. For instance, it is especially frequent to find reversals of similar letters, such as ‘b’ and ‘d’ (Deloche et al. 1982), e.g. *impossidle when the intended word is impossible. Errors due to homophone confusion, that is, words which have a similar pronunciation such as witch and which (Pedler, 2007), are not selected, even though 15% of the dyslexic errors in a corpus of dyslexic texts presented homophone confusion.
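
The selection criteria on the two slides above can be sketched as a small filter. This is an illustration only, not the authors' code: the function names, the toy vocabulary, and the proper-name set are hypothetical, and counting an adjacent swap as a single edit (optimal string alignment distance) is an assumption made so that a plain transposition such as *recieve is not treated as a multi-error.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertion, deletion, substitution,
    and swapping two adjacent letters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ie" <-> "ei".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


def keep_candidate(error: str, intended: str,
                   vocabulary: set, proper_names: set) -> bool:
    """Keep an error only if it is a non-word, not a proper name,
    and more than one edit away from the intended word."""
    if error in vocabulary or error in proper_names:
        return False
    return osa_distance(error, intended) > 1


# Hypothetical toy resources, for illustration only.
VOCABULARY = {"trust", "truth", "knowledge", "worries", "receive"}
PROPER_NAMES = {"wirries"}

candidates = [("konwlegde", "knowledge"), ("trust", "truth"),
              ("wirries", "worries"), ("recieve", "receive")]
selected = [e for e, w in candidates
            if keep_candidate(e, w, VOCABULARY, PROPER_NAMES)]
print(selected)
```

With these toy inputs only *konwlegde survives: *trust is a real word, *wirries is listed as a proper name, and *recieve is a single transposition.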
  • 13. How. Sample D, an example for the word comparison. 1. Dyslexic error: *comaprsion. 2. Spelling errors: *comparision, *conparison and *coparison. 3. Typos: *vomparison, *xomparison, *cimparison, *cpmparison, *conparison, *co,parison, *comoarison, *com[arison, *comprison, *compsrison, *compaeison, *compatison, *comparuson, *comparoson, *compariaon, *comparidon, *comparisin, *comparispn, *comparisob and *comparisom. 4. OCR errors: *compaiison and *comparisom. 5. Non-native speakers errors: *comparition and *comparizon.
  • 14. How. Sample D, dyslexic errors (intended word: dyslexic error). comparison: *comaprsion; understanding: *understangind; knowledge: *knwolegde; impossible: *inpossbile; tomorrow: *torromow; worries: *worires; explain: *exaplin; interesting: *intersenting; situation: *situartion; confusion: *confusetion.
  • 15. How. Estimating Dyslexia in the Web. Let us define f, the fraction of Web pages with lexical errors, and d, the fraction of dyslexic errors among all lexical errors. Then the fraction of Web pages with dyslexia is f × d. We find a lower bound for f and for d, to obtain a lower bound for the fraction of dyslexic pages in the Web.
  • 16. How. Estimating Dyslexia in the Web. We use the main search engines (Bing, Google and Yahoo!) to estimate the document frequency of a word. Each of the words in our list is searched only in English web pages, to avoid cases of wrong words that may have a meaning in another language.
  • 17. How. Estimating Dyslexia in the Web. We bound the relative fraction of documents with lexical errors, f, by using a sample of frequent words that appear in most documents, usually called stopwords in information retrieval (misspelled as becuase, trhough, etc.). We use the largest relative fraction of misspellings over all these words to estimate f, as we cannot assume that all of them appear in different pages. To bound d, we do the same frequency search with a sample of non-frequent words (Sample D), where we can distinguish the different types of errors without ambiguity.
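
A minimal sketch of the bound on f described above. The slides do not give the exact ratio, so this assumes the relative fraction for a stopword is the document frequency of its misspelling divided by the document frequency of the correct form. The function name is hypothetical and the counts are invented placeholders, not search engine results.

```python
def lower_bound_f(doc_freq: dict) -> float:
    """Largest misspelling-to-correct document-frequency ratio over all
    stopwords. The maximum is used instead of the sum because several
    misspellings may occur on the same page, so adding them could
    count a page more than once."""
    return max(misspelled / correct for misspelled, correct in doc_freq.values())


# Invented placeholder counts: (pages with the misspelling, pages with the word).
DOC_FREQ = {
    "because": (2_700_000, 1_000_000_000),  # misspelling: becuase
    "through": (1_000_000, 800_000_000),    # misspelling: trhough
}

f_bound = lower_bound_f(DOC_FREQ)
print(f"{f_bound:.2%}")
```

Taking the maximum keeps the estimate a lower bound: every page containing the most frequent misspelling has at least one lexical error, whatever the overlap with the other misspellings.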
  • 18. Results. Estimating Dyslexia in the Web. Table: range of percentages and average for the different error classes. We use the real document frequencies of the terms from one of the search engines to validate the results obtained, finding very similar results.
  • 19. Results. Estimating Dyslexia in the Web. From Sample D, the percentage of dyslexic errors among all lexical errors is very low, with an average of 0.67%. From Pedler (2007), only 39% of dyslexic errors are multi-errors. This implies that the lower bound is at least d/0.39, but we can safely use a factor of 3 to correct for this. We have that f is at least 0.27%, from the word becuase. Then we can estimate d as 2.01% (0.67% × 3). The lower bound for dyslexia in the Web is 0.005% (0.27% × 2.01%).
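
The arithmetic on this slide can be reproduced in a few lines. The numbers are the ones quoted on the slide; the variable names are illustrative. Note that 1/0.39 is about 2.56, and the slide uses a correction factor of 3.

```python
# Values quoted on the slide.
avg_dyslexic_share = 0.67 / 100   # average share of dyslexic errors in Sample D
multi_error_share = 0.39          # share of dyslexic errors that are multi-errors
correction_factor = 3             # factor used on the slide (1/0.39 is about 2.56)
f = 0.27 / 100                    # lower bound on pages with lexical errors

d = avg_dyslexic_share * correction_factor
dyslexic_pages = f * d            # fraction of Web pages with dyslexic errors

print(f"1/0.39 = {1 / multi_error_share:.2f}")
print(f"d = {d:.2%}")
print(f"f x d = {dyslexic_pages:.4%}")
```

The product is about 0.0054%, which the slide reports as 0.005%.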
  • 20. Conclusions. The amount of dyslexic text in the Web is not as large as it could be. This suggests that the widespread use of spell checkers ameliorates dyslexia in the Web. Particular words can be used to detect dyslexic texts, and hence dyslexic users. This can be used to improve Web accessibility as well as future spell checkers or other tools targeted at dyslexic users.
  • 21. Conclusions. Since this is the first attempt to estimate text written by dyslexic individuals in the Web, a comparison with previous work is not possible. Previous research on dyslexia reveals that error frequency is related to word length (Pedler, 2007). Short words such as there, where, form, etc. are misspelled much more frequently in dyslexic texts than long words like the ones used in our experiments. Hence, we can obtain a better estimate by using a larger sample of stopwords as well as long dyslexic words. As a byproduct, we have found that other types of errors are much more frequent in the Web, and this can be used to assess the quality of Web text.
  • 22. On-going Work. New methodology. Sample enlarged to 50 words. Real data extracted from a leading search engine. Up-down/Left-right typos. New lower bound: 0.8% (16 times better). Table: range of percentages and average for the different error classes.
  • 23. Future Work. 1. Identification of dyslexic errors; dyslexia diagnosis. 2. NLP techniques for making text more accessible for dyslexic users. 3. Web quality estimation (Gelman & Barletta, 2008) across countries, domains and social media.
  • 24. Zank u beri mach