A Year in Review - Building a Comprehensive Data Management Program

  1. A Year in Review - Building a Comprehensive Data Management Program @ Microsoft Research
  2. What Exactly Is Big Data?
     • Wikipedia: “Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”
     • Critical tool for Microsoft’s businesses
     • Opportunity to deliver transformative new capabilities to our enterprise customers
  3. MSR and Big Data
     • First, the sword: Shame on us…
     • Many undergrads with better big data capabilities
     • Martians versus Earthlings
     • Finally… big data has been fully embraced by MSR as a vital tool to enable research and a vital area in which to do research
     • We are MAKING THE INVESTMENT
  4. Microsoft Research’s Centralized Data Management and Data Processing Platform (founded June 2013)
  5. Project Vision
  6. Motivation:
     • Numerous areas of research are driven by data (Research Need)
     • Data comes in very different forms from very different sources (Adapting to Change)
     • Identified the need for a standardized data storage and data processing resource for MSR (Community)
     • Many different research groups were processing and storing the same data sets (Shared Knowledge / Data Sharing)
     • Some research groups were not aware that so many different types of data were available (Communication / Collaboration)
     Diagram labels: Research Data (Internal and External), Adapting to Change, Community, Collaboration, Shared Knowledge, Data Sharing, Research Need
  7. Guiding Principles:
     • Secure and compliant (e.g. data security, privacy, and ethics)
     • World-wide access (equal opportunity for access and use given to all MSR labs)
     • Created through partnerships with teams throughout Microsoft
     • Driven by researcher needs and requirements (e.g. tools, hardware, software, datasets)
     • Flexibility
     Diagram labels: Research Data (Internal and External), Security, Driven by Researcher Needs, Research and Product Team Partnerships, Global Access, Compliance, Ethics
  8. Goals:
     • Centralized, compliant, and curated data storage facilities
     • Multi-purpose data processing architecture (a mix of different types of hardware)
     • Flexibility with software
     • Active user community (supported through outreach and training)
     Diagram labels: Research Data (Internal and External), Centralized, Compliant, Curated, User Community, Flexibility with Software and Tools, Blend of Technology and Services
  9. Centralized Data Management; Research and Innovation Support; Innovative Hardware and Tools; Partnerships; Data Privacy and Security; Community and Outreach
  10. System Architecture
  11. Diagram labels: Research Data (Internal and External), Hadoop, GPU, HPC, Azure, Sandbox, Bing
  12. Diagram labels: Research Data (Internal and External), Hadoop, GPU, HPC, Azure, Sandbox, Bing
  13. Diagram labels: Research Data (Internal and External), Hadoop, GPU, HPC, Azure, Sandbox, Bing
  14. Diagram labels: Research Data (Internal and External), Hadoop, GPU, HPC, Azure, Sandbox, Bing
  15. Diagram labels: Research Data (Internal and External), MNIST
  16. Diagram labels: Bing
  17. Data Management
  18. Diagram labels: Research Data (Internal and External), MNIST
  19. Diagram labels: Research Data (Internal and External), MNIST, Compliance, Security, Data Management, Ethics, Policy
  20. Compliance, Security, Ethics
     • Policy / Procedure; Standardization / Common Platform; Technology
     • Corporate Technology and Compliance; Standardization / Common Platform; Technology
     • Ethical Review Board / Legal and Corporate Affairs; Standardization / Common Platform; Technology
  21. Compliance, Security, Ethics
  22. Fun Examples: F#, Naiad, Skype Translator, Azure ML
  23. Discussion / Questions / Next Steps
