I Can Convert
1.
I Can Convert!
by Sven Aas and Jason Proctor
2.
I Can Convert!
• Sven Aas: @svenaas / saas@mtholyoke.edu
• Jason Proctor: @jmpmhc / jproctor@mtholyoke.edu
• #TPR2
©2012 Sven Aas and Jason Proctor, Mount Holyoke College
3.
We’re going to talk about
• Stories
• Patterns
• Tools
4.
Use Your Tools!
5.
Use Your Tools
• Spreadsheet
• Programmer’s Editor
• Programming Language
6.
Spreadsheet
7.
Spreadsheet
8.
Programmer’s Editor
9.
Programmer’s Editor
10.
Programming Language
11.
Programming Language
12.
Use Your Tools!
You’ve GOT this stuff.
13.
Getting Deported
14.
Portal News
15.
Unusual Data Representation
[Slide image: a legacy record layout listing the fields node, name, type, mode, owner, group, url, desc, parent, linkto, ctime, mtime, mod_by, visible, userdata, datasize and datafilename, beside a raw dump of one record. The dump packs pipe-delimited values, such as the headline “Second Saturday: MHC Students Hit the Road”, the summary “As part of new student orientation, members of the class of 2010 worked on community service projects across the Pioneer Valley on September 16. View the photo gallery.”, the link http://www.mtholyoke.edu/offices/comm/news/sec_sat_06/page1.html and Unix timestamps, together with image data delimited by “:^”.]
16.
Ruby to the Rescue
[Diagram: a Portal News Importer between two systems, with boxes labeled LegacyUser, LegacyItem, User, Item, Story, Link and Channel.]
17.
ActiveRecord
• A Ruby library that implements the Active Record architectural pattern.
• The original Model and ORM component of Ruby on Rails.
• We used it to provide a convenient object layer on top of two underlying relational databases.
18.
Conversion Patterns
19.
Object Extraction
Context: Ingesting source data.
Problem: Source data objects contain multiple target objects.
Solution: Process or parse the data just enough to extract the target objects.
Tools: String methods, RegEx, DOM/XML selection.
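A minimal Ruby sketch of Object Extraction follows. It assumes a hypothetical source record in which one story and its images are packed into a single string: story fields separated by |, images separated by ;; and image attributes by :^ (loosely inspired by the portal dump, not its real layout). Story, Image and extract_objects are illustrative names; String#split does only as much parsing as is needed to separate the target objects.

```ruby
# Hypothetical target objects: one source record yields a Story plus zero or more Images.
Story = Struct.new(:id, :title, :body, :url)
Image = Struct.new(:story_id, :format, :width, :height)

# Assumed source layout (illustrative only):
#   "id|title|body|url;;FORMAT:^WIDTH:^HEIGHT;;FORMAT:^WIDTH:^HEIGHT"
# Only enough parsing is done to separate the objects; fields are not validated.
def extract_objects(record)
  story_part, *image_parts = record.split(";;")
  id, title, body, url = story_part.split("|", 4)
  story = Story.new(id, title, body, url)
  images = image_parts.map do |part|
    format, width, height = part.split(":^")
    Image.new(id, format, width.to_i, height.to_i)
  end
  [story, *images]
end

record = "01|Second Saturday|Students hit the road.|http://example.edu/news/1.html;;JPG:^75:^75"
objects = extract_objects(record)
objects.each { |obj| p obj }
```

Each extracted object can then flow through the rest of the conversion on its own.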
20.
Encoding Change
Context: Mapping source data to target.
Problem: Source text encoding differs from target.
Solution: Perform intermediate translation.
Tools: String methods, RegEx, programming libraries.
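As a sketch of Encoding Change, suppose the legacy store holds Windows-1252 bytes (an assumption; check your own source) while the target expects UTF-8. The intermediate step is to label the raw bytes with their real encoding, then transcode them, replacing anything unmappable rather than failing. SOURCE_ENCODING and to_utf8 are hypothetical names.

```ruby
# Assumption: legacy data is Windows-1252 (CP1252); change this if the source differs.
SOURCE_ENCODING = Encoding.find("Windows-1252")

def to_utf8(raw)
  # Label the bytes with their real encoding, then transcode to UTF-8,
  # replacing any byte that has no mapping instead of raising an error.
  raw.dup.force_encoding(SOURCE_ENCODING)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
end

# In Windows-1252, 0x93/0x94 are curly double quotes and 0xE9 is "é".
legacy = "\x93Caf\xE9 news\x94".b
puts to_utf8(legacy)
```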
21.
URL/Path Translation
Context: Preparing target environment and data.
Problem: Assets in the target system will be available at different paths or URLs than in the source system.
Solution: Map source locations to target locations. Replace references in the data before saving to the target.
Tools: String methods, RegEx, DOM/XML selection.
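One way to sketch URL/Path Translation: keep a table mapping source path prefixes to target URLs, and rewrite every src and href attribute before the data is saved. The PATH_MAP entries and the example.edu host are invented for illustration. A regular expression over raw HTML is only reasonable for predictable markup; a DOM-based pass is sturdier.

```ruby
# Hypothetical mapping of source path prefixes to target URLs.
PATH_MAP = {
  "/offices/comm/news/images/" => "https://www.example.edu/sites/default/files/news/",
  "/offices/comm/news/" => "https://www.example.edu/news/"
}.freeze

# Try longer prefixes first so the most specific mapping wins.
PREFIXES = PATH_MAP.keys.sort_by { |prefix| -prefix.length }.freeze

# Rewrite src/href attribute values that start with a mapped prefix.
def translate_paths(html)
  html.gsub(/\b(src|href)="([^"]*)"/) do
    match = Regexp.last_match
    attr, url = match[1], match[2]
    prefix = PREFIXES.find { |candidate| url.start_with?(candidate) }
    next match[0] unless prefix # leave unmapped references untouched
    %(#{attr}="#{PATH_MAP[prefix]}#{url.delete_prefix(prefix)}")
  end
end

html = '<a href="/offices/comm/news/sec_sat_06/page1.html">Story</a> ' \
       '<img src="/offices/comm/news/images/road.jpg"> <a href="/admission/">Admission</a>'
puts translate_paths(html)
```

Leaving unmapped references untouched, rather than guessing, keeps the conversion predictable; they can be reported separately.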
22.
Getting the News Out
23.
Easy Come, Easy Go
1. Export Athletics news items to a hosted service.
2. Export all news items to digital archives.
24.
Exporting Athletics Items
• 10 years of Athletics news in 14 channels.
• Export each item in a minimal, predictable HTML wrapper.
• Include metadata for each item in <meta> tags in the <head>.
• Group items by sport and by academic year.
• Generally accommodate the target system.
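A rough sketch of the export steps above follows. It uses a plain Ruby heredoc where the authors used HAML. The Item struct, the meta names sport and date, and the July-to-June academic year are assumptions made for the example.

```ruby
require "date"

# Hypothetical shape of one exported news item.
Item = Struct.new(:title, :sport, :published, :body_html)

HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def html_escape(text)
  text.to_s.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Assumption: the academic year runs July through June, e.g. "2005-2006".
def academic_year(date)
  start = date.month >= 7 ? date.year : date.year - 1
  "#{start}-#{start + 1}"
end

# Wrap one item in minimal, predictable HTML with its metadata in <meta> tags.
# The body is assumed to be trusted HTML already, so it is not escaped.
def wrap(item)
  <<~HTML
    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8">
    <title>#{html_escape(item.title)}</title>
    <meta name="sport" content="#{html_escape(item.sport)}">
    <meta name="date" content="#{item.published.iso8601}">
    </head>
    <body>
    #{item.body_html}
    </body>
    </html>
  HTML
end

items = [
  Item.new("Season opener", "Soccer", Date.new(2005, 9, 10), "<p>Body text.</p>"),
  Item.new("Season recap", "Soccer", Date.new(2006, 3, 2), "<p>Body text.</p>"),
  Item.new("Regatta results", "Rowing", Date.new(2006, 10, 22), "<p>Body text.</p>")
]

# Group by sport and academic year; each group could become one output folder.
groups = items.group_by { |item| [item.sport, academic_year(item.published)] }
groups.each { |(sport, year), group| puts "#{sport}/#{year}: #{group.size} item(s)" }
```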
25.
HAML
• A lightweight markup language used to generate HTML.
• A meta-markup language.
• We used it to succinctly express the HTML we wanted from within our Ruby code.
26.
Archiving Web News
• 14 years of news: 6,000 items, 5,000 images, 34 channels.
• Export each news item in an archival form that preserves the original markup and character entities (but not the design):
  • PDF generated from HTML generated from HAML.
• Export Dublin Core metadata for each news item:
  • XML generated via Builder.
27.
Builder
• A Ruby library for generating XML.
• We used it to dynamically generate simple XML from within a Ruby application.
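To illustrate the Dublin Core export without depending on the Builder gem, the sketch below assembles the XML with plain string building and explicit escaping. The element names (title, date, publisher, format, identifier) come from the Dublin Core element set, and the namespace URI is the standard DC 1.1 one. The <metadata> wrapper element and the sample values are hypothetical.

```ruby
# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;" }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Build a small Dublin Core record; `fields` maps DC element names to values.
# The <metadata> root element is a hypothetical wrapper, not part of Dublin Core.
def dublin_core(fields)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>', %(<metadata xmlns:dc="#{DC_NS}">)]
  fields.each do |element, value|
    lines << "  <dc:#{element}>#{xml_escape(value)}</dc:#{element}>"
  end
  lines << "</metadata>"
  lines.join("\n") + "\n"
end

# Sample values are invented for illustration.
xml = dublin_core(
  title: "Sample news item & photo gallery",
  date: "2006-01-15",
  publisher: "Mount Holyoke College",
  format: "application/pdf",
  identifier: "sample-news-item.pdf"
)
puts xml
```

A library such as Builder handles escaping and nesting for you; the hand-rolled version only shows the shape of the output.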
28.
wkhtmltopdf
• A shell utility for generating PDF files by rendering HTML documents using the WebKit rendering engine.
• A Ruby library providing programmatic access to the wkhtmltopdf shell utility.
• We used it so that we could use familiar web development techniques to generate PDFs without having to implement our own rendering and layout routines.
29.
Familiar Patterns
• Object Extraction
• Encoding Change
• URL/Path Translation
30.
Direct Translation
Context: Simple conversion.
Problem: Data conversion.
Solution: Read source objects and write targets in a single pass.
Tools: Varies.
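A minimal sketch of Direct Translation: each source line is read, converted and written immediately, with no intermediate storage. The tab-separated source layout, the output format and the sample records are hypothetical; StringIO stands in for real files.

```ruby
require "stringio"

# Assumed source format (hypothetical): tab-separated "id, title, unix_time".
# Assumed target format (hypothetical): "title (YYYY-MM-DD) [legacy-id]".
def translate(source_io, target_io)
  count = 0
  source_io.each_line do |raw|
    line = raw.chomp
    next if line.strip.empty? # skip blank lines rather than emitting empty targets
    id, title, epoch = line.split("\t", 3)
    date = Time.at(epoch.to_i).utc.strftime("%Y-%m-%d")
    target_io.puts "#{title} (#{date}) [legacy-#{id}]"
    count += 1
  end
  count
end

source = StringIO.new("101\tSecond Saturday\t1158638400\n\n102\tConvocation\t1157000000\n")
target = StringIO.new
n = translate(source, target)
puts "#{n} records translated"
print target.string
```

The single pass keeps memory use flat, but it only works when every target object can be built from one source object at a time.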
31.
Markup Change
Context: Mapping source data to target.
Problem: Source text markup differs from target.
Solution: Perform intermediate translation.
Tools: String methods, RegEx, DOM/XML selection, programming libraries.
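The sketch below expresses Markup Change as an ordered list of regex rewrite rules. They turn hypothetical legacy presentational tags (<B>, <I>, <FONT>, doubled <BR>) into the target's semantic markup. The rules are illustrative only: real rules should come from surveying the source data, and a DOM-based approach is safer when the markup is irregular.

```ruby
# Ordered rewrite rules from legacy presentational tags to semantic markup.
# The rule set is hypothetical; derive real rules from the source data.
MARKUP_RULES = [
  [%r{<b>(.*?)</b>}im, '<strong>\1</strong>'],
  [%r{<i>(.*?)</i>}im, '<em>\1</em>'],
  [%r{</?font\b[^>]*>}i, ""],                 # drop <font> wrappers entirely
  [%r{<br\s*/?>\s*<br\s*/?>}i, "</p><p>"]     # a double line break becomes a paragraph break
].freeze

# Apply each rule in order, then wrap the result in a paragraph.
def convert_markup(html)
  body = MARKUP_RULES.reduce(html) { |text, (pattern, replacement)| text.gsub(pattern, replacement) }
  "<p>#{body}</p>"
end

legacy = '<FONT face="Arial">Students <B>hit the road</B>.<BR><BR>View the <I>photo gallery</I>.</FONT>'
puts convert_markup(legacy)
```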
32.
Data Cleanup
Context: Ingesting source data.
Problem: Source data is ... imperfect.
Solution: Fix what you can confidently fix.
Tools: Varies.
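One way to apply "fix what you can confidently fix": run a short list of safe, mechanical fixes automatically, and only flag patterns that need a human decision. Both rule lists here are hypothetical examples.

```ruby
# Fixes considered safe to apply automatically (hypothetical examples).
SAFE_FIXES = [
  [/[ \t]+$/, ""],                   # trailing whitespace on each line
  [/<p>\s*(&nbsp;)?\s*<\/p>/i, ""],  # empty paragraphs
  [/ {2,}/, " "]                     # runs of spaces
].freeze

# Patterns that suggest a problem but need a human decision.
SUSPECT = {
  "unresolved placeholder" => /\[\[.*?\]\]/,
  "replacement character" => /\uFFFD/
}.freeze

Result = Struct.new(:text, :warnings)

def clean(text)
  fixed = SAFE_FIXES.reduce(text) { |t, (pattern, repl)| t.gsub(pattern, repl) }
  warnings = SUSPECT.select { |_, pattern| fixed.match?(pattern) }.keys
  Result.new(fixed, warnings)
end

r = clean("<p>Hello  world</p>   \n<p>&nbsp;</p>\n<p>See [[photo]]</p>")
puts r.text
p r.warnings
```

Separating automatic fixes from warnings keeps the confident changes repeatable while leaving judgment calls to a person.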
33.
Convert All the Things!
34.
Finally Done with News?
• HTML files scraped via Nokogiri scripts.
• Quite a bit of cleanup: garbage in, garbage out.
• Unscrapable news items.
• “September 12, 2001”.
35.
Nokogiri
• A Ruby library for parsing XML and HTML.
• Supports DOM or SAX parsing.
• Implements both XPath and CSS3 selectors.
• We used it to parse and extract content from the set of HTML files containing existing news stories.
36.
Familiar Patterns
• Direct Translation
• Encoding Change
• Markup Change
• URL/Path Translation
• Data Cleanup
37.
The Big One
38.
CMS Conversion
• Old CMS pages were published in several different presentational styles, but all share the same DOM. That means we can scrape ’em!
• We agreed not to change anything else during the import. That means we can treat it as a clean switchover.
39.
Three-Pronged Conversion
40.
Three-Pronged Conversion
• Build the necessary structures and themes to accommodate and represent our old content.
• Build a library of code for scraping the pages generated by the old site, cataloging data and metadata, and storing them in an intermediate representation.
• Build a library of code for importing this intermediate representation into the new CMS structures.
41.
Migrate
• A Drupal module providing a framework for data import into the Drupal content management system.
• Supports a variety of sources and targets out of the box.
• Extensible to support additional migration sources and targets.
• We used it to import the XML representation of our site into our Drupal system.
42.
Familiar Patterns
• Object Extraction
• Encoding Change
• Markup Change
• URL/Path Translation
• Data Cleanup
43.
Intermediate Representation
Context: Complex conversion.
Problem: Data conversion.
Solution: Convert the source data to an intermediate representation in one pass, then convert the intermediate representation to the target.
Tools: Representation: Database, XML, CSV. Conversion: Varies.
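Here is a two-pass sketch of Intermediate Representation. Pass one scrapes each page into a plain record and serializes the batch; pass two reads that output and builds target objects. JSON is swapped in here for the database, XML or CSV representations listed on the slide. The regex scraping, the Node struct and the sample pages are hypothetical.

```ruby
require "json"

# Pass 1 (scrape): turn each source page into a plain record and serialize the batch.
def to_intermediate(pages)
  records = pages.map do |path, html|
    title = html[%r{<title>(.*?)</title>}im, 1].to_s.strip
    body = html[%r{<body>(.*?)</body>}im, 1].to_s.strip
    { "source_path" => path, "title" => title, "body" => body }
  end
  JSON.pretty_generate(records)
end

# Pass 2 (import): read the intermediate data and build target objects.
# Node is a hypothetical stand-in for the target CMS's content type.
Node = Struct.new(:title, :body, :legacy_path)

def from_intermediate(json)
  JSON.parse(json).map { |r| Node.new(r["title"], r["body"], r["source_path"]) }
end

pages = {
  "/news/a.html" => "<html><head><title>Story A</title></head><body><p>A</p></body></html>",
  "/news/b.html" => "<html><head><title>Story B</title></head><body><p>B</p></body></html>"
}
intermediate = to_intermediate(pages) # could be saved to disk and inspected between passes
nodes = from_intermediate(intermediate)
nodes.each { |node| puts "#{node.legacy_path} -> #{node.title}" }
```

Because the intermediate file sits between the two passes, either side can be rerun or debugged without repeating the other.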
44.
Object Identity
Context: Ingesting source data.
Problem: Data objects are repeated in the source data.
Solution: Uniquely identify source objects.
Tools: String methods, RegEx, DOM/XML selection.
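The following sketches Object Identity. It assumes a story repeated across channels can be recognized by its body text once whitespace and case are normalized. That identity rule is a hypothetical choice; the SHA-256 digest simply gives each normalized body a compact key.

```ruby
require "digest"

# Hypothetical identity rule: bodies that match after normalizing whitespace
# and case are treated as the same source object.
def identity_key(body)
  normalized = body.gsub(/\s+/, " ").strip.downcase
  Digest::SHA256.hexdigest(normalized)
end

# Keep the first occurrence of each object and record every channel that listed it.
def dedupe(entries)
  seen = {}
  entries.each do |entry|
    key = identity_key(entry[:body])
    if seen.key?(key)
      seen[key][:channels] << entry[:channel]
    else
      seen[key] = { body: entry[:body], channels: [entry[:channel]] }
    end
  end
  seen.values
end

entries = [
  { channel: "Campus News", body: "Students hit the road." },
  { channel: "Athletics", body: "Rowing team wins." },
  { channel: "Homepage", body: "Students  hit the road.\n" }
]
unique = dedupe(entries)
unique.each { |u| puts "#{u[:body].strip} (#{u[:channels].join(', ')})" }
```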
45.
Object Aggregation
Context: Ingesting source data.
Problem: Target data objects contain multiple source objects.
Solution: Aggregate objects at intermediate or output stage.
Tools: Varies.
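This is a sketch of Object Aggregation. It assumes (hypothetically) that a multi-page story was scraped as one object per page, stored as page1.html, page2.html and so on in its own directory. Pages are grouped by directory and joined in page-number order to form a single target story.

```ruby
# Hypothetical source objects: one scraped object per HTML page.
Page = Struct.new(:path, :body)

# Aggregate pages into one target story per directory, in page-number order.
def aggregate(pages)
  pages.group_by { |pg| File.dirname(pg.path) }.map do |dir, group|
    # Sort numerically so page10 follows page9 rather than page1.
    ordered = group.sort_by { |pg| pg.path[/page(\d+)\.html\z/, 1].to_i }
    { story: dir, body: ordered.map(&:body).join("\n") }
  end
end

pages = [
  Page.new("news/sec_sat_06/page2.html", "<p>Part two</p>"),
  Page.new("news/sec_sat_06/page1.html", "<p>Part one</p>"),
  Page.new("news/convocation/page1.html", "<p>Only part</p>")
]
aggregate(pages).each { |s| puts "#{s[:story]}: #{s[:body].lines.size} part(s)" }
```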
46.
Lessons
• You already have a good toolbox. Keep your tools sharp.
• Understand your source and target models.
• Watch for familiar patterns.
• Conversion is an opportunity for cleanup and improvement.
• Human labor can sometimes be cheaper than automation.
47.
YOU Can Convert
48.
Questions?
49.
Thank you, & keep in touch!
• Sven Aas: @svenaas / saas@mtholyoke.edu
• Jason Proctor: @jmpmhc / jproctor@mtholyoke.edu
• #TPR2
50.
Colophon
• This presentation is set in Exo Extra Bold from Natanael Gama’s ndiscovered, with headings in ChunkFive from The League of Movable Type.
• Background images were adapted from FreeSeamlessTextures.com’s Red Watercolor and The Grid, by Willem Pirquin.
51.
Colophon (continued)
• Card-size survival tool photo via acreativeedge.info
• Leatherman photo via SonnyandSandy
• Studley Tool Chest photo via FineWoodworking.com
52.
Colophon (continued)
• Audio from Wikipedia:Sound/List:
  • Edvard Grieg - Piano Concerto in A Minor, Op. 16 - iii. Allegro moderato molto, recorded by the Skidmore College Orchestra.
  • W.A. Mozart - 5th Piano Concerto, i. Allegro aperto, recorded by Ben Goldstein and Bendik Eide.
  • Anton Reicha - Variations for Bassoon, recorded by Arthur Grossman
  • J.S. Bach - Cello Suite 1 in G - Minuets, recorded by John Michel
  • Mississippi John Hurt - “Nobody’s Dirty Business”
53.
Colophon (continued)
Other Audio:
• Jack Beaver - “Workaday World”
• Danny Elfman - “Breakfast Machine”