Data catalog

  1. Data Catalog: Overview (corporate.hrs.com)
  2. Why do we need it
     Increase the productivity of data users:
     ● data scientists
     ● data analysts
     ● BI engineers
  3. Data-Driven Decision Making Process
     Step 1: Search and find the data
     Step 2: Understand the data
     Step 3: Perform analysis and visualization
     Step 4: Make a decision and/or share insights
     Data Discovery
  4. Challenge: Search and find the data
     1. Ask coworkers
     2. Ask in wider Zoom channel
     3. Search over Confluence
     4. Search over Repositories
     5. Explore using * SQL queries
  5. Challenge: Understand the data
     ● Multiple results: which one is correct or up to date?
     ● What do the different columns mean?
  6. Data scientists spend up to ⅓ of their time on Data Discovery
  7. Metadata is the key to the next big data wave
     1. Discover new data sources
     2. Identify end users to notify them of changes
     3. Understand the popularity and trustworthiness of data
     4. Investigate/monitor the magnitude of protected data exposure
     5. Know what your boss or colleagues are using
     6. Talk to upstream producers
     7. +30% productivity for data users
  8. What type of questions we want to answer
  9. Features matrix, to the best of my knowledge
  10. Amundsen: Person
     ● First person to explore both North and South poles
     ● Norwegian explorer, Roald Amundsen
  11. Amundsen: The tool
     • Amundsen is a data discovery and metadata engine for improving the productivity of data users
     • It does that today by indexing data resources (tables, dashboards, streams, etc.) and powering a page-rank style search based on usage patterns (e.g. highly queried tables show up earlier than less queried tables)
     • Think of it as Google search for data
  12. Architecture: Key components
     Diagram labels: Athena, MSSql, Exasol, ..., Glue, CI/CD, Source File, Databuilder Crawler, Neo4j, Elastic Search, Metadata Service, Search Service, Frontend Service, ML Feature Service, Security Service, Other Microservices, Metadata Sources
  13. Landing page
  14. Search
  15. Table detail page
  16. Computed stats about column metadata
  17. People search
  18. People page
  19. ElasticSearch for search and relevance
     ● Normal search: match records based on relevancy
     ● Category search: match records first based on data type, then relevancy
       ○ column: warehouse_cost
     ● Wildcard search:
       ○ event_*
  20. Amundsen uses Apache Airflow to orchestrate Databuilder jobs
  21. THANK YOU
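
The search behaviour described on slides 11 and 19 (normal, category and wildcard search, with highly queried tables ranked first) can be illustrated with a small in-memory sketch. This is a toy model under stated assumptions, not Amundsen's or Elasticsearch's actual API: the `Table` class, the `query_count` usage signal, the `CATALOG` sample data and the `search` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class Table:
    """Hypothetical catalog entry; not an Amundsen data model."""
    name: str
    columns: List[str]
    query_count: int  # assumed usage signal (how often the table is queried)


def search(tables: List[Table], term: str, category: Optional[str] = None) -> List[Table]:
    """Toy version of the three search modes from the slides.

    - normal search: substring match on the table name
    - category search: category="column" matches on column names instead
    - wildcard search: a term containing "*" is matched as a glob pattern
    Hits are ordered by usage, most queried first.
    """
    needle = term.lower()

    def matches(table: Table) -> bool:
        fields = table.columns if category == "column" else [table.name]
        fields = [f.lower() for f in fields]
        if "*" in needle:
            return any(fnmatchcase(f, needle) for f in fields)
        return any(needle in f for f in fields)

    hits = [t for t in tables if matches(t)]
    # Usage-based ordering: highly queried tables show up earlier.
    return sorted(hits, key=lambda t: t.query_count, reverse=True)


# Made-up sample data, for illustration only.
CATALOG = [
    Table("event_clicks", ["user_id", "clicked_at"], query_count=120),
    Table("event_views", ["user_id", "viewed_at"], query_count=900),
    Table("warehouse", ["warehouse_cost", "region"], query_count=40),
]

if __name__ == "__main__":
    print([t.name for t in search(CATALOG, "event_*")])
    print([t.name for t in search(CATALOG, "warehouse_cost", category="column")])
```

In a real deployment the matching and relevance scoring would be delegated to the search index rather than done in application code; the sketch only shows how a usage count can act as the ordering signal.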