Enviar búsqueda
Cargar
Apache Arrow and Python: The latest
•
7 recomendaciones
•
5,799 vistas
Wes McKinney
Seguir
Talk from Data Science Summit 2016 in San Francisco
Leer menos
Leer más
Tecnología
Denunciar
Compartir
Denunciar
Compartir
1 de 19
Descargar ahora
Descargar para leer sin conexión
Recomendados
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Wes McKinney
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
SATOSHI TAGOMORI
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
Anton Kirillov
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
Andrew Lamb
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
Julien Le Dem
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Databricks
Recomendados
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Wes McKinney
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
SATOSHI TAGOMORI
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
Anton Kirillov
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
Andrew Lamb
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
Julien Le Dem
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Databricks
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
Venkata Naga Ravi
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest Airports
Databricks
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
Databricks
Interactive real time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real time dashboards on data streams using Kafka, Druid, and Supe...
DataWorks Summit
Hive 3 - a new horizon
Hive 3 - a new horizon
Thejas Nair
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
Databricks
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
Databricks
Moving to Databricks & Delta
Moving to Databricks & Delta
Databricks
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Andrew Lamb
Dynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisation
Ori Reshef
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
Databricks
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
Spark Summit
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
StampedeCon
Spark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
Databricks
The Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
Databricks
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
Databricks
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark Summit
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Databricks
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache Arrow
Wes McKinney
Improving data interoperability in Python and R
Improving data interoperability in Python and R
Wes McKinney
Más contenido relacionado
La actualidad más candente
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
Venkata Naga Ravi
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest Airports
Databricks
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
Databricks
Interactive real time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real time dashboards on data streams using Kafka, Druid, and Supe...
DataWorks Summit
Hive 3 - a new horizon
Hive 3 - a new horizon
Thejas Nair
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
Databricks
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
Databricks
Moving to Databricks & Delta
Moving to Databricks & Delta
Databricks
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Andrew Lamb
Dynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisation
Ori Reshef
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
Databricks
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
Spark Summit
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
StampedeCon
Spark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
Databricks
The Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
Databricks
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
Databricks
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark Summit
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Databricks
La actualidad más candente
(20)
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest Airports
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
Interactive real time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real time dashboards on data streams using Kafka, Druid, and Supe...
Hive 3 - a new horizon
Hive 3 - a new horizon
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
Moving to Databricks & Delta
Moving to Databricks & Delta
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Dynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisation
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Spark streaming , Spark SQL
Spark streaming , Spark SQL
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Similar a Apache Arrow and Python: The latest
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache Arrow
Wes McKinney
Improving data interoperability in Python and R
Improving data interoperability in Python and R
Wes McKinney
Improving Data Interoperability for Python and R
Improving Data Interoperability for Python and R
Work-Bench
Python Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the Future
Wes McKinney
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
Wes McKinney
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015
Wes McKinney
High Performance Python on Apache Spark
High Performance Python on Apache Spark
Wes McKinney
High-Performance Python On Spark
High-Performance Python On Spark
Jen Aman
Data Science Languages and Industry Analytics
Data Science Languages and Industry Analytics
Wes McKinney
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Hakka Labs
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
Cloudera, Inc.
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory data
Wes McKinney
How Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperability
Uwe Korn
PyData: The Next Generation
PyData: The Next Generation
Wes McKinney
Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory Data
Wes McKinney
Data Science and CDSW
Data Science and CDSW
Jason Hubbard
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
huguk
DataFrames: The Extended Cut
DataFrames: The Extended Cut
Wes McKinney
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Swiss Big Data User Group
Building data pipelines with kite
Building data pipelines with kite
Joey Echeverria
Similar a Apache Arrow and Python: The latest
(20)
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache Arrow
Improving data interoperability in Python and R
Improving data interoperability in Python and R
Improving Data Interoperability for Python and R
Improving Data Interoperability for Python and R
Python Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the Future
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015
High Performance Python on Apache Spark
High Performance Python on Apache Spark
High-Performance Python On Spark
High-Performance Python On Spark
Data Science Languages and Industry Analytics
Data Science Languages and Industry Analytics
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory data
How Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperability
PyData: The Next Generation
PyData: The Next Generation
Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory Data
Data Science and CDSW
Data Science and CDSW
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
DataFrames: The Extended Cut
DataFrames: The Extended Cut
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Building data pipelines with kite
Building data pipelines with kite
Más de Wes McKinney
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
Wes McKinney
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
Wes McKinney
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data Framework
Wes McKinney
New Directions for Apache Arrow
New Directions for Apache Arrow
Wes McKinney
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
Wes McKinney
ACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data Frames
Wes McKinney
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
Wes McKinney
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
Wes McKinney
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics Stack
Wes McKinney
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Wes McKinney
Apache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science Stack
Wes McKinney
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
Wes McKinney
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
Wes McKinney
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
Wes McKinney
Shared Infrastructure for Data Science
Shared Infrastructure for Data Science
Wes McKinney
Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)
Wes McKinney
Memory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine Learning
Wes McKinney
Raising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data Science
Wes McKinney
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
Wes McKinney
Python Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the Future
Wes McKinney
Más de Wes McKinney
(20)
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data Framework
New Directions for Apache Arrow
New Directions for Apache Arrow
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
ACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data Frames
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science Stack
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
Shared Infrastructure for Data Science
Shared Infrastructure for Data Science
Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)
Memory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine Learning
Raising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data Science
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
Python Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the Future
Último
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
danishmna97
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
apidays
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
AnitaRaj43
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Edi Saputra
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
Remote DBA Services
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
Christopher Logan Kennedy
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Orbitshub
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
sammart93
presentation ICT roal in 21st century education
presentation ICT roal in 21st century education
jfdjdjcjdnsjd
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
Samir Dash
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
johnbeverley2021
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
Khushali Kathiriya
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
apidays
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
UiPathCommunity
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
Rustici Software
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Zilliz
Último
(20)
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
presentation ICT roal in 21st century education
presentation ICT roal in 21st century education
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Apache Arrow and Python: The latest
1.
1© Cloudera, Inc.
All rights reserved. Apache Arrow and Python in context Wes McKinney @wesmckinn Data Science Summit 2016-07-12
2.
2© Cloudera, Inc.
All rights reserved. Me • Data Science Tools at Cloudera • Creator of pandas • Wrote Python for Data Analysis 2012 (2nd ed coming 2017) • Open source projects • Python {pandas, Ibis, statsmodels} • Apache {Arrow, Parquet, Kudu (incubating)} • Mostly work in Python and Cython/C/C++
3.
3© Cloudera, Inc.
All rights reserved. WrangleConf - July 28 in San Francisco http://wrangleconf.com Storytelling from real-world data science work (and BBQ, of course)
4.
4© Cloudera, Inc.
All rights reserved. Python + Big Data: The State of things • See “Python and Apache Hadoop: A State of the Union” from February 17 • Areas where much more work needed • Binary file format read/write support (e.g. Parquet files) • File system libraries (HDFS, S3, etc.) • Client drivers (Spark, Hive, Impala, Kudu) • Compute system integration (Spark, Impala, etc.)
5.
5© Cloudera, Inc.
All rights reserved. Apache Arrow Many slides here from my joint talk with Jacques Nadeau, VP Apache Arrow
6.
6© Cloudera, Inc.
All rights reserved. Arrow in a Slide • New Top-level Apache Software Foundation project • Announced Feb 17, 2016 • Focused on Columnar In-Memory Analytics 1. 10-100x speedup on many workloads 2. Common data layer enables companies to choose best of breed systems 3. Designed to work with any programming language 4. Support for both relational and complex data as-is • Developers from 13+ major open source projects involved Calcite Cassandra Deeplearning4j Drill Hadoop HBase Ibis Impala Kudu Pandas Parquet Phoenix Spark Storm R
7.
7© Cloudera, Inc.
All rights reserved. High Performance Sharing & Interchange Today With Arrow • Each system has its own internal memory format • 70-80% CPU wasted on serialization and deserialization • Similar functionality implemented in multiple projects • All systems utilize the same memory format • No overhead for cross-system communication • Projects can share functionality (eg, Parquet-to-Arrow reader)
8.
8© Cloudera, Inc.
All rights reserved. Apache Arrow: What is it? • http://arrow.apache.org • Specification matters more than Implementation • A standardized in-memory representation for columnar data • Enables • Suitable for implementing high-performance analytics in-memory (think like “pandas internals”) • Cheap data interchange amongst systems, little or no serialization • Flexible support for complex JSON-like data • Targets: Impala, Kudu, Parquet, Spark
9.
9© Cloudera, Inc.
All rights reserved. Focus on CPU Efficiency Traditional Memory Buffer Arrow Memory Buffer •Cache Locality •Super-scalar & vectorized operation •Minimal Structure Overhead •Constant value access • With minimal structure overhead •Operate directly on columnar compressed data
10.
10© Cloudera, Inc.
All rights reserved. Example: Feather File Format for Python and R •Problem: fast, language- agnostic binary data frame file format •Written by Wes McKinney (Python) Hadley Wickham (R) •Read speeds close to disk IO performance
11.
11© Cloudera, Inc.
All rights reserved. Real World Example: Feather File Format for Python and R library(feather) path <- "my_data.feather" write_feather(df, path) df <- read_feather(path) import feather path = 'my_data.feather' feather.write_dataframe(df, path) df = feather.read_dataframe(path) R Python
12.
12© Cloudera, Inc.
All rights reserved. In progress: Parquet on HDFS for pandas users pandas pyarrow libarrow libarrow_io Parquet files in HDFS / filesystems Arrow-Parquet adapter Native libhdfs, other filesystem interfaces C++ libraries Python + C extensions Data structures parquet-cpp Raw filesystem interface Python wrapper classes
13.
13© Cloudera, Inc.
All rights reserved. Language Bindings • Target Languages • Java (beta) • CPP (underway) • Python & Pandas (underway) • R • Julia • Initial Focus • Read a structure • Write a structure • Manage Memory
14.
14© Cloudera, Inc.
All rights reserved. RPC & IPC: Moving Data Between Systems RPC • Avoid Serialization & Deserialization • Layer TBD: Focused on supporting vectored io • Scatter/gather reads/writes against socket IPC • Alpha implementation using memory mapped files • Moving data between Python and Drill • Working on shared allocation approach • Shared reference counting and well-defined ownership semantics
15.
15© Cloudera, Inc.
All rights reserved. Executing data science languages in the compute layer
16.
16© Cloudera, Inc.
All rights reserved. Real World Example: Python With Spark, Drill, Impala
17.
17© Cloudera, Inc.
All rights reserved. What’s on the horizon • Parquet for Python & C++ • Using Arrow as intermediary • IPC Implementation + Java/C++ interop • Spark, Drill Integration • Faster UDFs, Storage interfaces
18.
18© Cloudera, Inc.
All rights reserved. Get Involved • Join the community • dev@arrow.apache.org • Slack: https://apachearrowslackin.herokuapp.com/ • http://arrow.apache.org • @ApacheArrow
19.
19© Cloudera, Inc.
All rights reserved. Thank you Wes McKinney @wesmckinn Views are my own
Descargar ahora