
Getting Started on Hadoop


Mapper 1: RFC822 Parser

map_parse.py takes a list of URIs that say where to read email messages,
parses each message, and then emits several kinds of output tuples:



(doc_id, msg_uri, date)

(sender, receiver, doc_id)

(term, term_freq, doc_id)

(term, co_term, doc_id)
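
The four tuple kinds above can be sketched in Python as follows. This is a minimal illustration, not the actual map_parse.py: the function name parse_message, the kind tags ("doc", "link", "term", "cooc"), the word-splitting regex, and the choice to treat term_freq as a raw count are all assumptions, and fetching the message text from its URI is left out.

```python
import email
import itertools
import re
from collections import Counter
from email.utils import getaddresses

TERM_RE = re.compile(r"[a-z]+")  # assumption: terms are lowercase ASCII words


def parse_message(doc_id, msg_uri, raw_text):
    """Yield (kind, tuple) pairs for one RFC 822 message.

    The kind tags are hypothetical labels so a later stage can tell
    the four tuple types apart.
    """
    msg = email.message_from_string(raw_text)

    # (doc_id, msg_uri, date): the Date header is passed through unparsed
    yield "doc", (doc_id, msg_uri, msg.get("Date", ""))

    # (sender, receiver, doc_id): one tuple per sender/recipient pair
    senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
    receivers = [addr for _, addr in getaddresses(
        msg.get_all("To", []) + msg.get_all("Cc", []))]
    for sender in senders:
        for receiver in receivers:
            yield "link", (sender, receiver, doc_id)

    # assumption: only non-multipart bodies are tokenized
    body = msg.get_payload() if not msg.is_multipart() else ""
    counts = Counter(TERM_RE.findall(body.lower()))

    # (term, term_freq, doc_id): raw count of the term in this message
    for term, freq in sorted(counts.items()):
        yield "term", (term, freq, doc_id)

    # (term, co_term, doc_id): each unordered pair of distinct terms
    for term, co_term in itertools.combinations(sorted(counts), 2):
        yield "cooc", (term, co_term, doc_id)
```

A streaming driver would presumably read one URI per input line, load the message, and print each tuple as a tab-separated line; that wiring is omitted here.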

Note that our dataset includes approximately 500,000 email messages, with an
average of about 100 words in each message.
Also, there are 10E+5 unique terms. That count tends to stay roughly constant
for English text, which is useful to know when planning capacity.
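
As a rough sizing aid, the figures above translate into upper bounds on mapper output. The sketch below assumes the worst case in which every word in a message is distinct; the variable names are illustrative, and real counts would be lower because words repeat within a message.

```python
from math import comb

messages = 500_000        # approximate message count from the dataset
words_per_message = 100   # approximate average words per message

# Total word occurrences across the dataset
total_tokens = messages * words_per_message

# Upper bound on (term, term_freq, doc_id) tuples: every word distinct
max_term_tuples = total_tokens

# Upper bound on (term, co_term, doc_id) tuples: all unordered pairs
max_pairs_per_message = comb(words_per_message, 2)
max_cooc_tuples = messages * max_pairs_per_message

print(f"{total_tokens:,} tokens, up to {max_cooc_tuples:,} co-term tuples")
```

Under these assumptions the co-term output dominates, so it is the part most worth budgeting for.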

Published in: Technology
