
Data Product Architectures

3,369 views

Published on

Data products derive their value from data and generate new data in return; as a result, machine learning techniques must be applied to their architecture and their development. Machine learning fits models to make predictions on unknown inputs, and those models must be generalizable and adaptable. As such, fitted models cannot exist in isolation; they must be operationalized and user-facing so that applications can benefit from the new data, respond to it, and feed it back into the data product. Data product architectures are therefore life cycles, and understanding the data product life cycle will enable architects to develop robust, failure-free workflows and applications. In this talk we will discuss the data product life cycle and explore how to connect a model build, evaluation, and selection phase with an operation and interaction phase. Following the lambda architecture, we will investigate wrapping a central computational store for speed and querying, and we will also discuss monitoring, management, and data exploration for hypothesis-driven development. From web applications to big data appliances, this architecture serves as a blueprint for handling data services of all sizes!
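The life cycle described above can be sketched as a simple loop: a model is fit, operationalized to serve predictions, and refit as user interaction generates new data that feeds back into the product. A minimal illustration (the function names and the trivial mean "model" are mine, not from the talk):

```python
def fit(data):
    """Build phase: fit a trivial model (here, just the mean)."""
    return sum(data) / len(data)

def predict(model, n):
    """Operation phase: serve predictions to the application."""
    return [model] * n

def step(data, feedback):
    """One turn of the life cycle: fold user feedback back into the
    data product and refit the model on the enlarged dataset."""
    data = data + feedback
    return data, fit(data)

# Interaction produces new data, and the refit model reflects it.
data = [1.0, 2.0, 3.0]
data, model = step(data, feedback=[6.0])
```

The point of the sketch is the shape of the loop, not the model: without the `step` back-edge, the fitted model is disconnected and cannot adapt, tune, or react.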

Published in: Technology


  1. Data Product Architectures Benjamin Bengfort @bbengfort District Data Labs
  2. Abstract
  3. What is data science? Or what is the goal of data science? Or why do they pay us so much?
  4. Two Objectives: Orient Data Science to Users
  5. Data Products are self-adapting, broadly applicable software-based engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data.
  6. Data Products are Applications that Employ Many Machine Learning Models
  7. Data Report
  8. Without Feedback, Models are Disconnected: they cannot adapt, tune, or react.
  9. Data Products aren't single models. So how do we architect data products?
  10. The Lambda Architecture
  11. Three Case Studies
  12. Analyst Architecture
  13. Analyst Architecture: Document Review
  14. Analyst Architecture: Triggers
  15. Recommender Architecture
  16. Recommender: Annotation Service
  17. Partisan Discourse Architecture
  18. Partisan Discourse: Adding Documents
  19. Partisan Discourse: Documents
  20. Partisan Discourse: User Specific Models
  21. Commonalities?
  22. Microservices Architecture: Smart Endpoints, Dumb Pipes (diagram: stateful services and database-backed services communicating over HTTP)
  23. Django Application Model
  24. Class Based, Definitional Programming

     from django.db import models

     class Instance(models.Model):
         SHAPES = [(s, s) for s in ('square', 'triangle', 'circle')]

         color = models.CharField(max_length=32, default='red')
         shape = models.CharField(max_length=32, choices=SHAPES)
         amount = models.IntegerField()

     from rest_framework import serializers as rf

     class InstanceSerializer(rf.ModelSerializer):
         prediction = rf.CharField(read_only=True)

         class Meta:
             model = Instance
             fields = ('color', 'shape', 'amount')

     from rest_framework import viewsets

     class InstanceViewSet(viewsets.ModelViewSet):
         queryset = Instance.objects.all()
         serializer_class = InstanceSerializer

         def list(self, request): pass
         def create(self, request): pass
         def retrieve(self, request, pk=None): pass
         def update(self, request, pk=None): pass
         def destroy(self, request, pk=None): pass
  25. Features and Instances as Star Schema
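One way to read the star schema slide: instances form a fact table whose rows reference feature dimension tables, and the flat instance table exported for model builds is produced by joining the dimensions back onto the facts. A hedged pure-Python sketch (table and column names are illustrative, not from the talk):

```python
# Fact table: one row per instance, with foreign keys into dimensions.
instances = [
    {"id": 1, "color_id": 10, "amount": 3},
    {"id": 2, "color_id": 11, "amount": 5},
]

# Dimension table: one row per distinct feature value.
colors = {10: "red", 11: "blue"}

# Joining the dimension back onto the fact table yields the flat
# instance table that the build process exports to CSV.
flat = [dict(row, color=colors[row["color_id"]]) for row in instances]
```

In the database this join is exactly what the export query on slide 27 performs with SQL `JOIN`s before copying the result to CSV.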
  26. REST API Feature Interaction
  27. Model (ML) Build Process: Export Instance Table

     COPY (
         SELECT instances.*
         FROM instances
         JOIN feature ON ...
         ORDER BY instances.created
         LIMIT 10000
     ) TO '/tmp/instances.csv' DELIMITER ',' CSV HEADER;
  28. Model (ML) Build Process: Build Model

     import pandas as pd
     from sklearn.svm import SVC
     from sklearn.cross_validation import KFold

     # Load Data (the target column name, 'label', is assumed here)
     data = pd.read_csv('/tmp/instances.csv')
     X, y = data.drop('label', axis=1), data['label']
     scores = []

     # Evaluation
     folds = KFold(n=len(data), n_folds=12)
     for train, test in folds:
         model = SVC()
         model.fit(X.iloc[train], y.iloc[train])
         scores.append(model.score(X.iloc[test], y.iloc[test]))

     # Build the actual model
     model = SVC()
     model.fit(X, y)
  29. Model (ML) Build Process: Store Model

     import json
     import pickle
     import base64
     import datetime

     data = pickle.dumps(model)
     data = base64.b64encode(data)

     return {
         "model": data,
         "created": datetime.datetime.now().isoformat(),
         "form": repr(model),
         "name": model.__class__.__name__,
         "scores": scores,
     }
  30. Model Data Storage

     from django.db import models

     class PredictiveModel(models.Model):
         name = models.CharField(max_length=255)
         params = models.JSONField()
         build = models.FloatField()
         f1_score = models.FloatField()
         created = models.DateTimeField()
         data = models.BinaryField()
  31. REST API Model Interaction (diagram: featurize() and predict(); models stored in memory; update annotations)
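Keeping models "stored in memory" implies the inverse of the storage step on slide 29: decode the base64 payload and unpickle it before serving predictions. A minimal sketch (the record shape mirrors the build output above; the stand-in model object is illustrative):

```python
import base64
import pickle

def load_model(record):
    """Restore a model stored as a base64-encoded pickle so the API
    can hold it in memory and use it to serve predictions."""
    return pickle.loads(base64.b64decode(record["model"]))

# Round trip with a stand-in model (any picklable object works):
stored = {"model": base64.b64encode(pickle.dumps({"weights": [0.1, 0.9]}))}
model = load_model(stored)
```

In the Django application this record would come from the `PredictiveModel` table, with the pickled bytes held in its binary `data` field.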
  32. Build Data Products!
  33. Questions?