Predictive Analytics on the Web with Azure Machine Learning

  • How did we overcome this problem? Our research took two approaches: one based on users' behavior when they post, and one based on the meaning of the text they write.
    We also have the history of every post with its result, moderated or not, so we can train the model and guide the data mining. We don't have to guess the result or classify posts by hand; we just learn from the history.

    Starting with the first approach, we looked at the information we already have:
    The entryid, threadid and forumid of the post
    The userid
    The date and time when they posted
    The result of the moderation (fail or not)

    But this information alone is not enough to extract knowledge, that is, to find patterns.
    What happens if a new user posts on a new thread? We have no history of that user's behavior, or of the thread's. What should our system do in that case?
    So we started building derived attributes, such as:
    Percentage of fails in a given thread
    Percentage of fails per user
    Difference in hours between the date the user posted and the date the forum was created
    Percentage of fails in a given forum
    Difference in hours between the date the user posted and the date the user failed in a forum
    The hour of the day
    Difference in hours between the date the user posted and the date the user joined the forum
    Percentage of fails per user in a thread
    Difference in hours between the date the user posted and the date the thread was created
    The day of the week

    And you can imagine how many combinations these attributes allow: percentage of fails per user in a thread on Mondays, percentage of fails during weekends, during national holidays, during the last week…

    These patterns have many uses:
    1.- Moderating only half of what moderators review today, data mining still catches roughly 95% of the failing posts. That is, moderating 600,000 posts, you would fail 86,000 of the 91,000 posts that full moderation fails today.
    2.- On heavy posting days, when you cannot moderate everything, you can automatically skip the posts that are unlikely to fail. This minimizes the risk compared to randomly choosing which posts to leave unmoderated.
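The attributes and the "moderate only the riskiest half" idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Azure ML model: each post is assumed to be a dict with hypothetical keys `entry_id`, `user_id` and `failed`, the failure score is a single smoothed per-user fail rate (the `prior` and `weight` values are arbitrary), and `moderate_top` keeps only the highest-scoring fraction of posts. The smoothing is one possible answer to the new-user question: with no history, the score falls back to the prior instead of an undefined 0/0.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical record layout: each post is a dict with "entry_id", "user_id"
# and "failed" (the historical moderation result).

def fail_rate_by(history, key, prior=0.1, weight=5.0):
    """Smoothed share of failed posts per value of `key` (e.g. "user_id").

    Unseen values (a new user or a new thread) get `prior`; values with a
    long history converge to their observed fail rate.
    """
    fails, totals = defaultdict(int), defaultdict(int)
    for post in history:
        totals[post[key]] += 1
        fails[post[key]] += int(post["failed"])

    def rate(value):
        return (fails.get(value, 0) + prior * weight) / (totals.get(value, 0) + weight)

    return rate

def hours_between(later, earlier):
    """Difference in hours, e.g. post date vs. thread or forum creation date."""
    return (later - earlier).total_seconds() / 3600

def moderate_top(posts, score, fraction):
    """Return the ids of the `fraction` of posts with the highest failure score."""
    ranked = sorted(posts, key=score, reverse=True)
    k = round(len(ranked) * fraction)
    return {p["entry_id"] for p in ranked[:k]}

def captured_share(moderated_ids, posts):
    """Share of the failing posts that fall inside the moderated subset."""
    failing = {p["entry_id"] for p in posts if p["failed"]}
    return len(failing & moderated_ids) / len(failing) if failing else 1.0

if __name__ == "__main__":
    history = [
        {"entry_id": 1, "user_id": "a", "failed": True},
        {"entry_id": 2, "user_id": "a", "failed": True},
        {"entry_id": 3, "user_id": "b", "failed": False},
        {"entry_id": 4, "user_id": "b", "failed": False},
    ]
    user_rate = fail_rate_by(history, "user_id")
    new_posts = [
        {"entry_id": 10, "user_id": "a", "failed": True},
        {"entry_id": 11, "user_id": "b", "failed": False},
        {"entry_id": 12, "user_id": "c", "failed": False},  # new user: scored with the prior
        {"entry_id": 13, "user_id": "b", "failed": False},
    ]
    moderated = moderate_top(new_posts, lambda p: user_rate(p["user_id"]), 0.5)
    print(sorted(moderated), captured_share(moderated, new_posts))  # [10, 12] 1.0
    print(hours_between(datetime(2015, 3, 2, 10), datetime(2015, 3, 1, 22)))  # 12.0
```

In a real system the score would come from a trained model that combines many of these attributes; ranking by one fail rate only shows the mechanism, and `captured_share` is how you would check a figure like "86,000 of 91,000 failing posts caught".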

  • Microsoft Azure Machine Learning, a fully-managed cloud service for building predictive analytics solutions, helps overcome the challenges most businesses have in deploying and using machine learning.
    How? By delivering a comprehensive machine learning service that has all the benefits of the cloud.
    Azure ML brings together the capabilities of new analytics tools, powerful algorithms developed for Microsoft products like Xbox and Bing, and years of machine learning experience into one simple and easy-to-use cloud service.

  • •Data scientists can bring their existing assets in R and integrate them seamlessly into their Azure ML workflows.
    •Using Azure ML Studio, R scripts can be operationalized as scalable, low latency web services on Azure in a matter of minutes!
    •Data scientists have access to over 400 of the most popular CRAN packages, pre-installed. Additionally, they have access to optimized linear algebra kernels that are part of the Intel Math Kernel Library.
    •Data scientists can visualize their data using R plotting libraries such as ggplot2.
    •The platform and runtime environment automatically recognize and provide extensibility via high-fidelity bi-directional dataframe and schema bridges, for interoperability.
    •Developers can access common ML algorithms from R and compose them with other algorithms provided by the Azure ML platform.

    R is the most widely used data analysis software, used by more than 2M data scientists, statisticians and analysts
    Most powerful statistical programming language
    Used with RStudio, it boosts productivity
    Create beautiful and unique data visualisations – as seen in New York Times, Twitter and Flowing Data
    Thriving open-source community – leading edge of analytics research
    Fills the talent gap – new graduates prefer R.
    It’s fun!

    Why else might you use R?
    Pivot Tables are not always enough
    Scaling Data (ScaleR)
    R is very good at static data visualisation but Power BI and Excel are very good at dynamic data visualisation
    You want to double-check your results or do further analysis

    You can use RODBC to move data between R and SQL Server, or between R and Excel. Alternatively, you can import the data directly.


    1. Rubén Pertusa Lopez, MVP SQL Server BI & BigData Architect, SolidQ. HOL: Predictive analytics on our website with Azure Machine Learning. Y A X B
    2. @rpertusa
    3. Time: 2h 15min. Required: Azure account. Optional: VS2012 or later. Materials:
    4. Where are we heading?
    5. Data sources
    6. Azure SDK 2.5 (VS 2012, VS 2013, VS 2015 Preview); HBase in VS; Project Marlin on GitHub
    7. {subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.StreamAnalytics/streamingjobs/{job-name}/transformations/{transformation-name}?api-version={api-version}
    8. Data Developers
    9. 3/2/2015 | Systems that can learn from data and discover hidden patterns and rules in order to exploit new business relationships
    10. Big Data explosion; no improvements; deep neural networks; improvements in SSAS DM; graphical models; SSAS 2000 DM; scoring; expert systems & decision trees; neural networks
    11. Scalable, cheap computing (Big Data) + the best ML algorithms + adoption of a data culture = ML at its full potential
    12. Text recognition
    13. Clean and label the data; train the ML model; score the input
    14. Case studies: performance problem prediction; worldwide automotive company: loyalty campaign and stock calculation; worldwide retail company: decision making
    15. Input (case table): EntryId, Date, UserId, SiteId, ForumId, ThreadId, ParentId, PrevId, NextId, Text
    16. Goal: predict when it will fail. Steps
    17. Worldwide automotive company
    18. Worldwide retail company
    19. No limits
    20. Scalable managed cloud service; fast development and deployment to production; workflow-driven interface; up-to-date ML algorithms; collaborative; accessible via web browser
    21. Most powerful programming language for statistics; more than 400 R packages already available; visualization with R; R and Python support
    22. Get/prepare data; create experiment; run experiment; view results; save trained model; check the web service input/output; publish web service; deploy to production (roles: Data Scientist, IT)
    24. Consuming results from AzureML http://microsoftazuremachinelearning.azu
    25. Turn problems into ML problems; simple integration with .NET; AzureML + Big Data + data culture; more materials: US/home?forum=MachineLearning
    28. If you enjoyed it, don't forget to fill in the survey! Thanks
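Slide 13's three steps (clean and label the data, train the ML model, score the input), applied to the second, text-based approach from the speaker notes, can be sketched outside Azure ML as a small multinomial naive Bayes classifier. Treat this as an illustration under assumptions: naive Bayes is a stand-in chosen for brevity (the slides do not say which algorithm was used), the `\w+` tokenizer is a simplistic placeholder, and the labels ("fail"/"ok") are hypothetical names for the historical moderation result.

```python
import math
import re
from collections import Counter

def clean(text):
    """Step 1 (cleaning): lowercase the text and keep only word tokens."""
    return re.findall(r"\w+", text.lower())

def train(labeled_posts):
    """Step 2 (training): count words per label for multinomial naive Bayes.

    `labeled_posts` is a list of (text, label) pairs, where the label comes
    from the moderation history, e.g. "fail" or "ok".
    """
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in labeled_posts:
        tokens = clean(text)
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def score(model, text):
    """Step 3 (scoring): return P(label | text) per label, with add-one smoothing."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_probs = {}
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        lp = math.log(n_docs / total_docs)  # log prior of the label
        for token in clean(text):
            if token in vocab:  # words never seen in training are ignored
                lp += math.log((counts[token] + 1) / denom)
        log_probs[label] = lp
    # Turn log scores into probabilities, subtracting the max for stability.
    top = max(log_probs.values())
    exp = {label: math.exp(lp - top) for label, lp in log_probs.items()}
    z = sum(exp.values())
    return {label: v / z for label, v in exp.items()}

if __name__ == "__main__":
    history = [
        ("buy cheap pills now", "fail"),
        ("cheap pills click here", "fail"),
        ("how do I configure the SQL Server service", "ok"),
        ("thanks, the SQL Server answer worked", "ok"),
    ]
    model = train(history)
    probs = score(model, "cheap pills")
    print(probs["fail"] > probs["ok"])  # True
```

The per-label probability plays the same role as the score returned by a published Azure ML web service: it can be thresholded, or combined with the behavioral attributes, to decide which posts to moderate first.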