
Graph technology meetup slides


Published in: Data & Analytics

  1. Graph
  2. Property Graph
  3. Another Graph Example: Nodes, Edges
  4. Why graphs?
  5. Queries: SQL v. Gremlin
     SQL:
       select p2.name, p1.post from posts p1 inner join persons p2 on p1.to_id = p2.id where p1.from_id in (select id from persons where name = 'keith');
     Gremlin:
       graph.traversal().V().has('name', 'keith').outE('posts').as('msg').inV().as('who').select('msg', 'who').by('comment').by('name')
  6. Traversing a graph: Suggest to John that he might know Jane
  7. Traversing a graph
  8. Traversing a graph
  9. Graph Databases: OLTP (On-Line Transaction Processing), OLAP (On-Line Analytical Processing)
  10. Graph Databases (Image Reference: Kelvin Lawrence & Co)
  11. Graph Databases Landscape
      OLTP: IBM Graph, Titan, Neo4j, OrientDB, TAO, Jena
      OLAP: Pregel, Spark GraphX, Spark GraphFrames, Giraph
  12. Questions? Prachi Khadke (pskhadke@us.ibm.com), Sean Mulvehill (mulvehill@us.ibm.com), or the people wearing the Graph T-shirts!
  13. The Social+Data Graph of Life Science. Barry Wark, PhD, ovation.io
  14. Modern life science R&D is challenging
      ● $50B “cottage industry”, now globalized and highly collaborative
        ○ Distributed teams
        ○ Universities, clinicians, non-profit labs, CROs, biotech, pharma
      ● Petabytes of data, millions of experiments per year
      ● Science is complicated
      R&D organizations are expected to produce efficient pipelines from academic research to clinical development
  15. 70% of data collected annually in life science goes “dark”: inaccessible, undiscoverable, or unusable
  16. $35B of data collected annually in life science goes “dark”: inaccessible, undiscoverable, or unusable
  17. Why does data go dark?
  18. Wark/Maslow hierarchy of scientific data needs: Data Storage, Metadata, Versioning, Collaboration
  19. A scientific data layer stops data from going dark
      ● Secure cloud storage (HIPAA, 21 CFR 11)
      ● Metadata tied to files
      ● File/data provenance across collaborators and analyses
      ● Integrated annotation, chat
      ● Low threshold: continue to use preferred capture and analysis tools
  20. The Social+Data Graph
  21. Social+Data graph
      ● Real-time
      ● Structural information: projects, experiments, people
      ● High-information events
        ○ Researcher annotation
        ○ Communication
        ○ File selection
  22. Global Social+Data graph
      ● 350,000 researchers
      ● O(100B) files
      ● The average academic researcher writes 1 paper per year with 3 other colleagues in more than one country
      ● k=8
      ● 40,000 users to a fully connected graph
  23. Assisting R&D organizations to mobilize idle assets
      1. Find relevant internal experts
      2. Recommend existing, relevant data (and the resources to utilize it)
      3. Identify the best external resources and opportunities
      4. Organizational analytics
         a. Who are the effective collaborators?
         b. Which are the most valuable data sets?
  24. Calculate (weighted) pairwise distances for all nodes using A*
  25. Bi-cluster to identify relevant groups
      ● Shuffle rows and columns of the matrix to minimize loss (spectral, information, etc.)
      ● Well-studied in bioinformatics (not that different) and text classification
      ● NP-complete
      ● Clusters allow us to look up in both directions
        ○ User → Data
        ○ Data → Users
        ○ (Users → Users)
  26. Data architecture + GraphX
  27. Relevant, related people and data
  28. Questions
  29. Appendix
  30. Publication graph
      ● Incomplete
      ● Late
      ● Post-hoc
  31. Information-based co-clustering
      https://cs.gmu.edu/~carlotta/publications/IBCC_TMW_final.pdf
      https://pdfs.semanticscholar.org/4a3e/b95f17a88e14227b05a590639e8cd3346a99.pdf
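
The "Traversing a graph" slides (6 to 8) suggest to John that he might know Jane, but the transcript does not show the traversal itself. As a minimal sketch of one common approach, the Python below walks two hops out from a person and ranks the people found by the number of mutual acquaintances. Everything except the names John and Jane is an assumption made for illustration: the other people, the `KNOWS` adjacency map, and the `suggest_contacts` helper are hypothetical and do not come from the slides.

```python
from collections import Counter

# Hypothetical toy social graph: person -> set of people they know (undirected).
KNOWS = {
    "John": {"Mary", "Bob"},
    "Mary": {"John", "Jane"},
    "Bob": {"John", "Jane", "Alice"},
    "Jane": {"Mary", "Bob"},
    "Alice": {"Bob"},
}


def suggest_contacts(graph, person):
    """Return (candidate, mutual_count) pairs for people two hops from `person`.

    Candidates exclude the person and anyone they already know. Results are
    sorted by mutual acquaintance count (descending), then by name.
    """
    direct = graph.get(person, set())
    mutual_counts = Counter()
    for friend in direct:                          # first hop
        for candidate in graph.get(friend, set()):  # second hop
            if candidate != person and candidate not in direct:
                mutual_counts[candidate] += 1
    return sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))


suggestions = suggest_contacts(KNOWS, "John")
print(suggestions)  # [('Jane', 2), ('Alice', 1)]
```

In this toy graph Jane ranks first because she shares two acquaintances with John (Mary and Bob). A graph database would express the same two-hop walk as a traversal query instead of nested loops.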
