EDDI 2013
5th Annual European DDI User Conference
December 3, 2013 – December 4, 2013
Paris, France
Metadata Technology North America
http://www.mtna.us
http://www.openmetadata.org
mtna@mtna.us
The <Meta>Data Dance
• Tools to facilitate statistical/scientific data management
• Complement existing tools (not replace)
• Facilitate adoption of standards and promote best practices
• Alleviate the need to master the complexities of metadata standards and technologies
• Vision: web-based services (cloud users), stand-alone desktop/server products (agencies/enclaves), libraries (developers)
• Broad target audience: data producers / managers / researchers / users
• DataForge Today (lean model):
  • Simple desktop-based tools
  • SledgeHammer - Data/metadata transformation
  • LavaCore - Java library
  • Caelum - Reporting / Publication
  • Asmurex - ASCII Multi-Record Extraction tool
• DataForge Tomorrow:
  • DataForge Online, Web Services, more desktop tools, Java library
SledgeHammer

• Data Liberation
  • Unlock from proprietary formats, turn into open data
  • Produce metadata + convert data into standard ASCII formats
• Open <Meta>Data Integration
  • Generating scripts for loading into stats packages, databases, big data engines, cloud
  • Use with open systems / environments
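To illustrate what "generating scripts for loading" can look like, here is a minimal Python sketch that turns a list of variable descriptions into a MySQL `CREATE TABLE` plus `LOAD DATA` script for a CSV file. It is not SledgeHammer's implementation: the `Variable` class, the type-mapping rules, and the table and file names are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    """Hypothetical, minimal variable description (not a DDI or Triple-S structure)."""
    name: str
    kind: str          # "integer", "decimal" or "string"
    width: int         # maximum number of characters in the ASCII file
    decimals: int = 0  # digits after the decimal point (decimal variables only)


def sql_type(var: Variable) -> str:
    """Map a variable to a MySQL column type (assumed, deliberately generous rules)."""
    if var.kind == "integer":
        # Up to 9 characters always fits in a signed 32-bit INT.
        return "INT" if var.width <= 9 else "BIGINT"
    if var.kind == "decimal":
        # Using the character width as precision over-estimates slightly, which is safe.
        return f"DECIMAL({var.width},{var.decimals})"
    return f"VARCHAR({var.width})"


def mysql_load_script(table: str, csv_path: str, variables: list[Variable]) -> str:
    """Build a CREATE TABLE statement followed by a LOAD DATA statement."""
    columns = ",\n".join(f"  `{v.name}` {sql_type(v)}" for v in variables)
    return (
        f"CREATE TABLE `{table}` (\n{columns}\n);\n"
        f"LOAD DATA LOCAL INFILE '{csv_path}'\n"
        f"INTO TABLE `{table}`\n"
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        "LINES TERMINATED BY '\\n'\n"
        "IGNORE 1 LINES;\n"
    )


# Illustrative metadata for a three-variable survey file.
example_variables = [
    Variable("id", "integer", 6),
    Variable("income", "decimal", 10, 2),
    Variable("region", "string", 20),
]
script = mysql_load_script("survey", "survey.csv", example_variables)
print(script)
```

The point of the sketch is that the script is derived entirely from metadata, so the same variable list could drive generators for other targets (SAS, SPSS, Stata, R) by swapping the type mapping and the template.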
• Challenges
  • Read proprietary formats
  • Data integrity
  • ASCII optimization
  • Cross-package formatting issues
  • SQL database limits (columns, indexes, views)
  • Performance (timings, size, memory usage)
  • …
• Key Features
  • Input: SPSS, Stata, SAS/Syntax+ASCII, ASCII+DDI-C / DDI-L, ASCII+SSS 1.1/2.0, ASCII+StatTransfer
  • Data Out: Fixed w/optimization, CSV, Delimited
  • Metadata Out: DDI-C 1.2.2/Nesstar/2.1/2.5, DDI-L 3.1-RP/3.1SU/Colectica, DDI-L 3.2-RP, DDI-L 3.2 SU, Triple-S 1.1/2.0
  • Summary Stats: Min / Max / Valid / Invalid / Mean / StdDev / Variance / Frequencies, Weighted*, Save in CSV*
  • Script Generators: SAS, SPSS, Stata, R, MySQL, Oracle*, MSSQL*, HSQL*, Google BigQuery*, HP/Vertica* + dimension/lookup tables, indexes/foreign keys*
  • Command line interface
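As a rough illustration of the summary statistics listed above, the following Python sketch computes them for a single variable read as text. It is an assumption-laden example, not the tool's algorithm: it treats a small, hypothetical set of missing-value codes as "invalid", and it uses the sample (n - 1) variance, which may differ from what SledgeHammer reports.

```python
import statistics
from collections import Counter


def summarize(values, missing=("", ".", "NA")):
    """Summarize one numeric variable given as strings.

    `missing` is a hypothetical list of codes counted as invalid.
    """
    valid = [float(v) for v in values if v not in missing]
    stats = {
        "valid": len(valid),
        "invalid": len(values) - len(valid),
        "frequencies": dict(Counter(values)),  # counts per raw code, missing included
    }
    if valid:
        stats.update(min=min(valid), max=max(valid), mean=statistics.fmean(valid))
    if len(valid) > 1:
        # Sample variance and standard deviation (divisor n - 1).
        stats.update(
            variance=statistics.variance(valid),
            stddev=statistics.stdev(valid),
        )
    return stats


# Illustrative variable with one missing value.
summary = summarize(["1", "2", "2", "", "4"])
print(summary)
```

Weighted statistics, which the slide marks with an asterisk, would additionally need a weight per observation and are not covered here.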
• Key changes since IASSIST 2013
  • Desktop User Interface
  • SAS (Win+DSRead), handling of missing values, DDI 3.2, lots of tweaks
• Editions
  • Freeware
    • Community: 100 variables / 1,000 obs.
    • Registered (free): 500 variables / 5,000 obs. (or contact us)
    • Available today (Candidate Release)
  • Pro/Enterprise (1Q2014)
    • No data size limits
    • Weighted stats, save stats, classification auto-harmonizer, batch mode, add database formats & star schema
    • Subscription: $29/3-month or $99/year; Perpetual: $279
  • LavaCore:
    • Currently internal only, planning for OEM
• Future Features (demand driven)
  • Support additional input/output formats
  • Additional ID generators
• DataWeaver (data transformation)
  • Produce data subset
  • Synthetic data generators
  • Disclosure control
• DataForge Online (2014)
  • Free and pay-as-you-go services
• DataForge Services (available today)
  • Community support in Google Groups
  • MTNA: pay-as-you-go / annual support, custom developments
  • IDMS: data/metadata management / processing
Products
• SledgeHammer: Data/metadata management (free/paid)
• Caelum: Documentation XML transform (free)
• DataWeaver: Data subsets, derivation, transforms (internal/project)
• Asmurex: ASCII multi-record extraction (free)
• DMP Editor: Simple editor for Data Management Plan (OSS)
• CMS: Classification Management System (2014Q3 / paid)
• SledgeHammer--BI: Integration with Business Intelligence (internal/projects)
• Survey Manager: DDI 3.1 editor for simple projects or training (OSS)
Services

Expertise
Expert Advisory and Technical Assistance
• Establishment of corporate data management strategies
• Adoption of metadata standards and related best practices
• Development of technical specifications for data management infrastructure
• Statistical and scientific data management technologies

Training
Training in and Support for:
• Data Documentation Initiative (DDI)
• Statistical Data and Metadata Exchange (SDMX)
• Generic Statistical Business Process Model (GSBPM)
• Generic Statistical Information Model (GSIM)
• General data and metadata management support

Development
Application Development
• Statistical and scientific data management solutions
• Solutions include production, archiving, dissemination, analysis
• Web-based data delivery solutions (web services, portals, secure environments)
• Standards-based data management tools/utilities

Technologies
Leverage New Technologies
• Help introduce new state-of-the-art technologies to the community and customers
• Deployment, hosting, and management of cloud-based infrastructures
http://www.openmetadata.org/dataforge/sledgehammer

DEMO

OpenDataForge - SledgeHammer EDDI 2013 presentation
