A corpus linguistics based approach for estimating online content

858 views

Published in: Technology, Education

  1. A Corpus Linguistics-Based Approach for Estimating Arabic Online Content
  2. 5,340,000
  3. 1,950,000
  4. 0.5%
  5. 1%
  6. 1.4%
  7. 3%
  8. 0.5%, 1.4%, 1%
  9. Zipf's Law
  10. Corpora Building
  11. Dmoz corpus: 75,560 pages, 530.1 MB, 659,756 unique words
  12. Wikipedia corpus: 95,140 pages, 213.3 MB, 760,690 unique words
  13. CCA corpus: 377 pages, 82,878 unique words
  14. Common
  15. Word | Documents | Frequency
      في | 60,218 | 1,077,288
      من | 61,949 | 860,052
      على | 56,846 | 498,694
      إلى | 48,599 | 278,315
      أن | 40,439 | 277,465
      عن | 50,736 | 241,824
      التي | 35,437 | 166,002
      لا | 40,122 | 153,887
      مع | 38,797 | 130,157
      ما | 33,637 | 129,304
      هذا | 31,363 | 109,125
      الذي | 32,474 | 108,844
      أو | 26,769 | 105,754
      هذه | 29,289 | 97,964
      بين | 32,662 | 84,535
      الله | 26,803 | 84,216
      أخبار | 30,010 | 81,894
      كل | 30,277 | 81,224
      الرئيسية | 41,000 | 80,161
      بعد | 32,370 | 78,317
      مع | 27,837 | 66,944
      لم | 25,403 | 64,251
      كان | 23,316 | 63,813
      العالم | 23,287 | 60,468
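
The slides name Zipf's Law but do not show how it relates to the frequency table. As a rough illustration only, the sketch below fits a straight line to log(frequency) against log(rank) for the 24 frequencies in the slide 15 table, read in natural digit order. Two assumptions that are not in the slides: the table rows are treated as ranks 1 to 24, and the helper name `zipf_slope` is hypothetical. Under an ideal Zipf distribution the slope would be close to -1. This is not presented as the authors' estimation method.

```python
import math

# Frequency column of the slide 15 table, highest first.
# Assumption: row order is used as the rank (1 = most frequent).
frequencies = [
    1077288, 860052, 498694, 278315, 277465, 241824,
    166002, 153887, 130157, 129304, 109125, 108844,
    105754, 97964, 84535, 84216, 81894, 81224,
    80161, 78317, 66944, 64251, 63813, 60468,
]


def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    print(f"log-log slope over the table: {zipf_slope(frequencies):.2f}")
```

A slope near -1 would be consistent with Zipf's Law. A clearly different value may simply reflect that 24 words are too few for a stable fit.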
