How did we get over this problem? Our research takes two approaches: the first is based on how people behave when they post, and the second is based on the meaning of the text they write. We also have a history of every post together with its result (moderated or not), so we can train the model and guide the data mining. We don't have to guess the result or classify posts by hand; we just learn from the history.
Starting with the first approach, we thought about the information that we have:
• The entryid, threadid and forumid of the post
• The userid
• The date and time of the post
• The result of the moderation (fail or not)
But this information is not quite enough for us to extract knowledge and find patterns. What happens if a new user posts on a new thread? We have no history of that user's behavior, or of the thread's behavior. What should our system do in that case? We started building derived attributes, like these:
• Percentage of fails in a given thread
• Percentage of fails per user
• Percentage of fails in a given forum
• Percentage of fails per user in a thread
• Difference in hours between the date of the post and the date the forum was created
• Difference in hours between the date of the post and the date the thread was created
• Difference in hours between the date of the post and the date the user joined the forum
• Difference in hours between the date of the post and the date the user failed on a forum
• The hour of the day
• The day of the week
And you can imagine how many combinations are possible among these attributes: percentage of fails per user in a thread on Mondays, percentage of fails during weekends, during national holidays, during the last week…
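A minimal sketch of how a few of these attributes could be derived from the post history. The record layout (dicts with `userid`, `threadid`, `forumid`, `posted`, `failed`), the function names and the sample data are all assumptions made for illustration; the talk does not show how the attributes were actually computed. Returning `None` when there is no matching history is one possible way to flag the new-user or new-thread case.

```python
from datetime import datetime


def fail_rate(history, **match):
    """Share of past posts matching every key=value pair that failed moderation.

    Returns None when no past post matches (e.g. a new user or a new thread).
    """
    rows = [p for p in history if all(p[k] == v for k, v in match.items())]
    if not rows:
        return None
    return sum(1 for p in rows if p["failed"]) / len(rows)


def hours_between(later, earlier):
    """Difference between two datetimes, in hours."""
    return (later - earlier).total_seconds() / 3600.0


def build_features(history, post, thread_created):
    """Derive a handful of the attributes listed above for one incoming post."""
    weekday = post["posted"].weekday()  # Monday is 0
    same_weekday = [p for p in history if p["posted"].weekday() == weekday]
    return {
        "user_fail_rate": fail_rate(history, userid=post["userid"]),
        "thread_fail_rate": fail_rate(history, threadid=post["threadid"]),
        "forum_fail_rate": fail_rate(history, forumid=post["forumid"]),
        "user_thread_fail_rate": fail_rate(
            history, userid=post["userid"], threadid=post["threadid"]
        ),
        # One of the combined attributes: fails per user on this day of the week.
        "user_weekday_fail_rate": fail_rate(same_weekday, userid=post["userid"]),
        "hours_since_thread_created": hours_between(post["posted"], thread_created),
        "hour_of_day": post["posted"].hour,
        "day_of_week": weekday,
    }


# Hypothetical history: four moderated posts in one forum.
history = [
    {"userid": 1, "threadid": 10, "forumid": 100,
     "posted": datetime(2015, 1, 5, 9, 0), "failed": True},
    {"userid": 1, "threadid": 10, "forumid": 100,
     "posted": datetime(2015, 1, 6, 14, 0), "failed": False},
    {"userid": 2, "threadid": 11, "forumid": 100,
     "posted": datetime(2015, 1, 7, 20, 0), "failed": False},
    {"userid": 1, "threadid": 11, "forumid": 100,
     "posted": datetime(2015, 1, 12, 8, 0), "failed": True},
]
post = {"userid": 1, "threadid": 10, "forumid": 100,
        "posted": datetime(2015, 1, 19, 10, 30)}
features = build_features(history, post, thread_created=datetime(2015, 1, 5, 8, 30))
```

The resulting dictionary is the kind of row that could be fed to a classifier together with the known moderation result of past posts.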
There are lots of patterns and uses. For example:
1.- Moderating only half of what moderators are moderating right now, data mining still catches roughly 95% of the failing posts. That is, moderating 600,000 posts, you fail 86,000 of the 91,000 posts that you fail now.
2.- On heavy posting days, when you are not able to moderate everything, you can automatically decide not to moderate posts that are unlikely to fail. This way you minimize the risk compared with randomly selecting which posts to skip.
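The first use can be read as a ranking problem: sort posts by the model's predicted chance of failing, moderate only the top share, and measure how many of the truly failing posts fall in that share. The sketch below illustrates that idea; the function name, the `(score, failed)` pair layout and the toy scores are assumptions, not the method or data used in the talk. The last lines only redo the arithmetic quoted above.

```python
def recall_at_fraction(scored_posts, fraction):
    """Share of failing posts caught when only the top `fraction` is moderated.

    scored_posts: list of (score, failed) pairs, where score is the predicted
    likelihood of failing and failed is the known moderation result.
    """
    ranked = sorted(scored_posts, key=lambda pair: pair[0], reverse=True)
    cutoff = int(len(ranked) * fraction)
    total_fails = sum(1 for _, failed in ranked if failed)
    if total_fails == 0:
        return 0.0
    caught = sum(1 for _, failed in ranked[:cutoff] if failed)
    return caught / total_fails


# Toy data (hypothetical): ten posts, four of which failed moderation.
toy_posts = [
    (0.90, True), (0.80, True), (0.70, False), (0.60, True), (0.50, False),
    (0.40, False), (0.30, True), (0.20, False), (0.10, False), (0.05, False),
]
toy_recall = recall_at_fraction(toy_posts, 0.5)

# Arithmetic from the figures quoted above: 86,000 caught out of 91,000.
quoted_recall = 86_000 / 91_000
print(f"toy recall at 50%: {toy_recall:.2f}, quoted recall: {quoted_recall:.3f}")
```

The same ranking supports the second use: on heavy days, the posts at the bottom of the list are the ones that can be skipped with the least risk.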
Microsoft Azure Machine Learning, a fully managed cloud service for building predictive analytics solutions, helps overcome the challenges most businesses have in deploying and using machine learning. How? By delivering a comprehensive machine learning service that has all the benefits of the cloud. Azure ML brings together the capabilities of new analytics tools, powerful algorithms developed for Microsoft products like Xbox and Bing, and years of machine learning experience into one simple, easy-to-use cloud service.
• Data scientists can bring their existing assets in R and integrate them seamlessly into their Azure ML workflows.
• Using Azure ML Studio, R scripts can be operationalized as scalable, low-latency web services on Azure in a matter of minutes!
• Data scientists have access to over 400 of the most popular CRAN packages, pre-installed. Additionally, they have access to optimized linear algebra kernels that are part of the Intel Math Kernel Library.
• Data scientists can visualize their data using R plotting libraries such as ggplot2.
• The platform and runtime environment automatically recognize and provide extensibility via high-fidelity bi-directional dataframe and schema bridges, for interoperability.
• Developers can access common ML algorithms from R and compose them with other algorithms provided by the Azure ML platform.
• R is the most widely used data analysis software, used by more than 2 million data scientists, statisticians and analysts
• Most powerful statistical programming language; used with RStudio, it can make you more productive
• Create beautiful and unique data visualisations, as seen in the New York Times, Twitter and Flowing Data
• Thriving open-source community at the leading edge of analytics research
• Fills the talent gap: new graduates prefer R
• It's fun!
Why else might you use R?
• Pivot Tables are not always enough
• Scaling data (ScaleR)
• R is very good at static data visualisation, but Power BI and Excel are very good at dynamic data visualisation
• You want to double-check your results or do further analysis
You can use RODBC to connect R to SQL Server or to Excel. Alternatively, you can import the data into R.
Predictive analytics on the web with Azure Machine Learning
Rubén Pertusa Lopez
MVP SQL Server
BI & BigData Architect SolidQ
Predictive Analytics on our website with Azure Machine Learning
Consuming results from AzureML
Turning problems into ML problems
Simple integration with .NET
AzureML + Big Data + data culture
3/2/2015