Science and data manipulation in Pharo

Ferlicot-Delbecque Cyril | ESUG 2023
cyril@ferlicot.fr
PolyMath, Pharo-AI and DataFrame
Summary
History
What is new?
Toward a new stage
History
PolyMath
DataFrame
Pharo-AI
PolyMath
● Computation library for Pharo
● Similar to NumPy and SciPy in Python or SciRuby in Ruby
● Originally SciSmalltalk in Squeak
● Available in Pharo for a long time
DataFrame
[Figure: a DataFrame table made of columns, rows and cells]
DataFrame
● Table data structure
● Similar to the DataFrame structures of pandas, Julia, …
● Heavily used in data science
● Created in 2017 during Google Summer of Code (GSoC)
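To make the idea of a table data structure concrete, here is a minimal sketch written in Python rather than Pharo. It does not reproduce the Pharo DataFrame API: `TinyTable` and all of its method names are hypothetical, and the values are toy data. It only shows how named columns, rows and cells relate to each other.

```python
from dataclasses import dataclass, field


@dataclass
class TinyTable:
    """Toy column-oriented table (hypothetical, not the Pharo DataFrame API)."""
    columns: dict = field(default_factory=dict)

    def add_column(self, name, values):
        values = list(values)
        # Every column must hold exactly one cell per row.
        if self.columns and len(values) != self.row_count():
            raise ValueError("column length does not match row count")
        self.columns[name] = values

    def row_count(self):
        # All columns have the same length, so any of them gives the row count.
        return len(next(iter(self.columns.values()), []))

    def row(self, index):
        """Return one row as a {column name: cell} mapping."""
        return {name: values[index] for name, values in self.columns.items()}

    def cell(self, index, name):
        """Return the cell at the given row index and column name."""
        return self.columns[name][index]


table = TinyTable()
table.add_column("name", ["a", "b", "c"])
table.add_column("score", [1.5, 2.0, 3.5])
```

Here `table.row(1)` gives the second row as a mapping from column names to cells, and `table.cell(2, "score")` gives a single cell.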
Pharo-AI
● Created in 2020
● Implements classical machine learning algorithms (not deep learning)
○ K-Means, Linear Regression, N-Gram Model, …
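As an illustration of what a classical algorithm such as K-Means does, here is a minimal sketch of the standard Lloyd iteration in Python. It is not the Pharo-AI implementation: the function names `nearest` and `k_means`, the caller-supplied starting centroids and the toy points are all assumptions made for this example.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def k_means(points, centroids, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and update until stable."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break  # no centroid moved: converged
        centroids = updated
    return centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

With these toy points the two centroids settle on the means of the two obvious groups, and `nearest` can then be used to classify a new point.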
Harmony of communities
[Diagram: PolyMath, DataFrame and Pharo-AI]
What is new?
PolyMath: Modularization
● Rearchitecture
○ Extraction of data structures and random number generators
○ Extraction of distributions (in progress)
● Cleanup of internal dependencies
PolyMath
● More robust CI
● Alignment of some conventions with Pharo-AI
● Various cleanups and bug fixes
● Pharo 11 compatibility
Pharo-AI: Data manipulation
● Data partitioners: create test sets
● Imputers: fill missing values
● Encoders: standardize your data
● Normalizers: use common scales in your project
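The four kinds of preprocessing helpers listed above can be sketched as follows, in Python rather than Pharo. None of this is the Pharo-AI API: the function names are hypothetical, and mean imputation, one-hot encoding and min-max scaling are simply one common choice for each step, picked here as assumptions.

```python
import random


def partition(rows, train_ratio, seed=0):
    """Shuffle a copy of rows and split it into (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def impute_mean(values):
    """Replace None entries with the mean of the known values.

    Assumes at least one value is known.
    """
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 vectors, one slot per distinct label."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


def min_max(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - low) / (high - low) for v in values]


train, test = partition(range(10), train_ratio=0.8)
ages = impute_mean([20.0, None, 40.0])
scaled = min_max(ages)
encoded = one_hot(["red", "blue", "red"])
```

In this toy run the missing age is filled with the mean of the two known ages, the ages are then rescaled to the range 0.0 to 1.0, and the three colour labels become two-slot vectors ordered alphabetically by category.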
Pharo-AI
● Harmonization across projects
● Documentation
● Graph algorithm updates
● Various speed-ups
● Cleanups and bug fixes in the algorithms
DataFrame
● Speed-ups
● Better integration with other collections
● New visualizations based on DataFrames
● Integration with Pharo-AI data preprocessing
DataFrame: GSoC 2023
● GSoC project of Joshua Jose Dias Barreto
● Implementation of missing features
○ Better sorting
○ Data manipulation
○ Missing values management
○ …
DataFrame: GSoC 2023
Further improvements to the DataFrame inspector
Toward a new stage
A push from the students
● DataFrame started as a GSoC project
● The first AI algorithms were student projects
=> Data science interests more and more people
We are answering this call
● Engineers are now pushing these projects
● Projects are maintained
● Speed is becoming decent
Are you using scientific computing or data science?
Is the speed enough for you?
Are you encountering any problems?
Are you missing features for your projects?
Let us know ;)