SlideShare una empresa de Scribd logo
1 de 32
Data Migrations
Some Considerations when Preparing to
Migrate to AtoM
http://boingboing.net/2016/11/08/heres-the-unexpected-origin.html
Overview• Assess what data will be part of the migration
• Do any in-system clean up prior to export
• Review AtoM data formats and available fields
• Establish a crosswalk between your data and AtoM’s fields
• Export your data
• Perform any additional clean up as needed
• Transform your data to an AtoM compatible import format
• Import
• Review your work
• Revise and reimport if needed
• Make small clean up edits in AtoM directly
Expect this to take time
• Our average length for a client data migration project is
around 4-6 months. Even for a simple project, there will
be a lot of time needed for data clean-up, quality
assurance review, and reimports.
Expect to do your import more than once
• It’s unlikely that everything will go perfectly on the
first attempt. You’ll discover some records don’t quite
match the same pattern as the rest, or one field didn’t
import, etc. Don’t be discouraged, and do budget your
time with this assumption in mind.
Before Starting
Develop a data management plan while you migrate
• How will you ensure you are not stranding data during the
time of your migration? Will you freeze data entry
entirely for the length? Manage your data in a
spreadsheet? Run a small migration for new data at the
end? Make sure everyone knows the plan.
Clarify roles, deadlines, and communication channels
• Ensure everyone involved knows what is expected of them
throughout the project, and when. Clearly identify those
responsible for key roles, and where to go for support.
Before Starting
Data assessment
• How many descriptions do you have? How many top-level
records?
• Have the records been described based on any content
standards? (e.g. ISAD(G), RAD, DACS, MAD, MODS, etc.?)
• Are there custom fields with data in your system? How many?
Do they readily map to known standards or not?
• What export formats does your system support?
• Is all record data captured in the exports?
• Are some descriptions “draft” or non-public? Is this
information captured in the export?
Questions to ask in the data assessment phase:
Data assessment
• How many digital objects do you have to migrate? What types
(images, text, video, etc) and formats (e.g. JPG, mp4, etc)
are represented?
• Are authority records maintained separately from
descriptions? What about other entities? Accession records?
• Is the relationship between these entities and descriptions
captured in the export formats available?
• Do these other record types have their own export formats?
(e.g. EAC-CPF XML, SKOS XML, CSV, etc)
• How are hierarchical relationships captured in the export?
Questions to ask in the data assessment phase:
AtoM Data FormatsArchival descriptions
• CSV, EAD 2002 XML, MODS XML
Authority records
• EAC-CPF XML, CSV
Accessions
• CSV
Terms (Subjects, Places, Genres, etc.)
• SKOS - many serializations supported
Repository records
• CSV
Current as of
version 2.4
AtoM CSV Templates
https://wiki.accesstomemory.org/Resources/CSV_templates
AtoM CSV Templates
https://wiki.accesstomemory.org/Resources/CSV_templates
CSV import will be the best way to get data into AtoM -
because the CSV import template is a format specific to
AtoM, there is no data loss and all fields are represented.
If you are able to get your data out of your legacy system
and transform it into an AtoM-compatible CSV format, we
recommend using this method for your migration project.
AtoM CSV Templates
https://wiki.accesstomemory.org/Resources/CSV_templates
In the example CSV files from v2.2 on, we have
included the relevant content standard name and number
in the sample data field. This means you can import
the CSV template to produce a sort of “crosswalk” or
key, showing you how fields in AtoM map to the column
headers.
AtoM CSV Templates
AtoM EAD 2002 XML
Similarly, if you
ensure all data entry
fields in AtoM are
filled in with related
content standard names
and numbers, you can
now export as EAD XML
to generate an EAD -
ISAD(G) crosswalk from
AtoM.
AtoM EAD 2002 XML
Similarly, if you
ensure all data entry
fields in AtoM are
filled in with related
content standard names
and numbers, you can
now export as EAD XML
to generate an EAD -
ISAD(G) crosswalk from
AtoM.
AtoM EAD 2002 XML
AtoM EAD 2002 XML
EAD 2002 XML is a flexible standard with many possible valid
but different implementations. For this reason, your locally
generated EAD, while valid, may still not import perfectly
into AtoM. This is why we prefer working with CSV imports
whenever possible.
We recommend running a test import of a representative sample
from your source system into AtoM, and using the crosswalk
method discussed above to evaluate if you will need to make
changes to how your EAD XML is encoded for a successful import
into AtoM.
Crosswalking
Crosswalking is the process of mapping your source
data fields to equivalent AtoM ones.
To do so, you must understand how AtoM handles some
data (such as authority records, terms, etc.) first.
There will be cases where there are no 1:1
equivalencies either - you will have to make decisions
about how to combine or split apart your existing data
to make it work with what is available.
https://commons.wikimedia.org/wiki/File:Brand_New_Crosswalk_(6223110132).jpg
Crosswalking
AtoM is standards-based.
This means you can focus on crosswalking to the
content standard you know best. Use the guidance
provided in the relevant standard to help inform your
mapping.
https://commons.wikimedia.org/wiki/File:Brand_New_Crosswalk_(6223110132).jpg
AtoM Entity Types
A(n incomplete) list
of the main entity
types around which
AtoM was built.
https://www.accesstomemory.org/docs/latest/user-manual/overview/entity-types/
Accessions
Accession records have their own CSV import format. As
there is currently no international accessions standard,
you will need to review the available fields in AtoM
closely and determine where to map your data.
Descriptions can be linked to Accessions via the
accessionNumber column in the description CSV templates.
We recommend importing your Accessions first, then your
descriptions with the corresponding accession number, to
establish links.
See:
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-
import/#import-accessions-via-csv
ActorsIn AtoM, creators and name access points are maintained
separately as authority records, so they can be re-used and
linked to multiple descriptions.
This means any creator name or name access point you import
with your descriptions will create an authority record, or
link to an existing match!
Make sure that names are consistent in your data, and the
biographical/administrative history is about the actor only -
not specific to the description.
See:
• https://www.accesstomemory.org/docs/latest/user-manual/add-edit-content/authority-
records/#authority-bioghist-access
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#on-
authority-records-archival-descriptions-and-csv-imports
ActorsThe Actor data you can add to a description CSV is minimal -
if you do maintain authority records, then you may want to
import them separately via AtoM’s authority record CSV
templates.
There are 3 actor CSV templates - the main actors template, 1
to supplement relationship data (between actors and/or
resources) and 1 to supplement alternative forms of name.
We recommend importing authority records before descriptions,
so you can link them on description import.
See:
• https://www.accesstomemory.org/docs/lates/user-manual/import-export/csv-import/#creator-
related-import-columns-actors-and-events
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#import-
authority-records-via-csv
Event Dates
Description edit templates have 3 date fields. The
Display date is what the end user will see - it is
free text. The start and end dates must follow ISO
8601 (YYYY-MM-DD, etc) formatting. These fields are
used to support AtoM’s date range search.
• Display date
• Start date
• End date
Event DatesDuring CSV import, Creators and Dates are paired (as
Events - see Entity types diagram).
Use the | pipe character to add multiple creators/dates.
You can use a literal NULL value in your CSV file to
keep the spacing correct for dates without actors or
vice versa:
See:
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-
import/#creator-related-import-columns-actors-and-events
Access PointsIn AtoM, access points on a description (e.g. subjects,
places, genre terms) are maintained separately as terms in a
taxonomy so they can be controlled and reused.
This means that access point data in your description imports
will either create new terms or link to existing ones. Make
sure your data is consistent so you don’t have near-
duplicates later! (e.g. “cars” vs “car” vs “automobiles”)
The exception is name access points - these are authority
records!
See:
• https://www.accesstomemory.org/docs/latest/user-manual/add-edit-
content/terms/#term-name-vs-subject
Hierarchies
Hierarchies are managed in the description CSV
templates via the legacyId and parentId columns.
Parent records must import in a row above child
records. The children should have the legacyId value
of the parent record in the parentId column.
See:
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-
import/#hierarchical-relationships
Can be imported with descriptions using the
digitalObjectURI or digitalObjectPath columns.
URIs point to external, web-accessible resources -
must end in file extension!
Paths point to a local directory added to your
server prior to import.
See:
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-
import/#digital-object-related-import-columns
Digital Objects
You have 3 main options when it comes to
transforming your data into an AtoM-
compatible format:
• Manual data transformation
• Tools such as OpenRefine
• Transformation script
http://www.publicdomainpictures.net/view-image.php?image=131381&picture=monarch-butterfly
Data Transformation
OpenRefine is “a free, open source
power tool for working with messy
data and improving it.”
See:
• http://openrefine.org/
• https://github.com/OpenRefine/OpenRefine
There are many great free resources to help
you get started.
Data Transformation
Use OpenRefine to:
• Add AtoM column headers
• Normalize names and terms
• Standardize identifiers or accession
numbers
• Split source data into separate columns
• Combine data into a single column
• Delete unnecessary rows
• Global search/replace
• etc
You can also use OpenRefine to clean up XML data!
Data Transformation
A transformation script
is generally a script prepared
by a developer that takes an
input (your source data) and
runs a series of operations to
transform the data into the
desired output (an AtoM-
compatible file).
These can be prepared in many
programming languages (e.g.
PHP, Python, etc).
Data Transformation
Import Ordering
If you are working with
several different types of
data, you may need to
perform multiple imports,
possibly in different
formats. If so, we
recommend proceeding in
this order to link entities
together as your imports
proceed.
We also recommend running a
smaller sample test first!
1. Terms
2. Repositories
3. Actors
4. Accessions
5. Descriptions
Questions?
info@artefactual.com
http://boingboing.net/2016/11/08/heres-the-unexpected-origin.html

Más contenido relacionado

La actualidad más candente

CSS - Styling Links and Creating Navigation
CSS - Styling Links and Creating NavigationCSS - Styling Links and Creating Navigation
CSS - Styling Links and Creating NavigationVidya Ananthanarayanan
 
AEM Sightly Template Language
AEM Sightly Template LanguageAEM Sightly Template Language
AEM Sightly Template LanguageGabriel Walt
 
HTL(Sightly) - All you need to know
HTL(Sightly) - All you need to knowHTL(Sightly) - All you need to know
HTL(Sightly) - All you need to knowPrabhdeep Singh
 
AtoM and Vagrant: Installing and Configuring the AtoM Vagrant Box for Local T...
AtoM and Vagrant: Installing and Configuring the AtoM Vagrant Box for Local T...AtoM and Vagrant: Installing and Configuring the AtoM Vagrant Box for Local T...
AtoM and Vagrant: Installing and Configuring the AtoM Vagrant Box for Local T...Artefactual Systems - AtoM
 
Oracle Forms Tutorial
Oracle Forms TutorialOracle Forms Tutorial
Oracle Forms TutorialATR Login
 
AEM Best Practices for Component Development
AEM Best Practices for Component DevelopmentAEM Best Practices for Component Development
AEM Best Practices for Component DevelopmentGabriel Walt
 
AEM Rich Text Editor (RTE) Deep Dive
AEM Rich Text Editor (RTE) Deep DiveAEM Rich Text Editor (RTE) Deep Dive
AEM Rich Text Editor (RTE) Deep DiveHanish Bansal
 
Cascading Style Sheet
Cascading Style SheetCascading Style Sheet
Cascading Style Sheetvijayta
 
Oracle Forms : Validation Triggers
Oracle Forms : Validation TriggersOracle Forms : Validation Triggers
Oracle Forms : Validation TriggersSekhar Byna
 
Sling models by Justin Edelson
Sling models by Justin Edelson Sling models by Justin Edelson
Sling models by Justin Edelson AEM HUB
 
Understanding Sling Models in AEM
Understanding Sling Models in AEMUnderstanding Sling Models in AEM
Understanding Sling Models in AEMAccunity Software
 
Rest and Sling Resolution
Rest and Sling ResolutionRest and Sling Resolution
Rest and Sling ResolutionDEEPAK KHETAWAT
 
Master page and Theme ASP.NET with VB.NET
Master page and Theme ASP.NET with VB.NETMaster page and Theme ASP.NET with VB.NET
Master page and Theme ASP.NET with VB.NETShyam Sir
 

La actualidad más candente (20)

AtoM Implementations
AtoM ImplementationsAtoM Implementations
AtoM Implementations
 
AtoM feature development
AtoM feature developmentAtoM feature development
AtoM feature development
 
CSS - Styling Links and Creating Navigation
CSS - Styling Links and Creating NavigationCSS - Styling Links and Creating Navigation
CSS - Styling Links and Creating Navigation
 
AEM Sightly Template Language
AEM Sightly Template LanguageAEM Sightly Template Language
AEM Sightly Template Language
 
HTL(Sightly) - All you need to know
HTL(Sightly) - All you need to knowHTL(Sightly) - All you need to know
HTL(Sightly) - All you need to know
 
AtoM and Vagrant: Installing and Configuring the AtoM Vagrant Box for Local T...
AtoM and Vagrant: Installing and Configuring the AtoM Vagrant Box for Local T...AtoM and Vagrant: Installing and Configuring the AtoM Vagrant Box for Local T...
AtoM and Vagrant: Installing and Configuring the AtoM Vagrant Box for Local T...
 
Html forms
Html formsHtml forms
Html forms
 
Oracle Forms Tutorial
Oracle Forms TutorialOracle Forms Tutorial
Oracle Forms Tutorial
 
CSS
CSSCSS
CSS
 
Constructing SQL queries for AtoM
Constructing SQL queries for AtoMConstructing SQL queries for AtoM
Constructing SQL queries for AtoM
 
CSS
CSSCSS
CSS
 
AEM Best Practices for Component Development
AEM Best Practices for Component DevelopmentAEM Best Practices for Component Development
AEM Best Practices for Component Development
 
AEM Rich Text Editor (RTE) Deep Dive
AEM Rich Text Editor (RTE) Deep DiveAEM Rich Text Editor (RTE) Deep Dive
AEM Rich Text Editor (RTE) Deep Dive
 
Cascading Style Sheet
Cascading Style SheetCascading Style Sheet
Cascading Style Sheet
 
Oracle Forms : Validation Triggers
Oracle Forms : Validation TriggersOracle Forms : Validation Triggers
Oracle Forms : Validation Triggers
 
Sling models by Justin Edelson
Sling models by Justin Edelson Sling models by Justin Edelson
Sling models by Justin Edelson
 
Understanding Sling Models in AEM
Understanding Sling Models in AEMUnderstanding Sling Models in AEM
Understanding Sling Models in AEM
 
Rest and Sling Resolution
Rest and Sling ResolutionRest and Sling Resolution
Rest and Sling Resolution
 
Angular vs react
Angular vs reactAngular vs react
Angular vs react
 
Master page and Theme ASP.NET with VB.NET
Master page and Theme ASP.NET with VB.NETMaster page and Theme ASP.NET with VB.NET
Master page and Theme ASP.NET with VB.NET
 

Similar a Migrate Data to AtoM

Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...
Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...
Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...Saurabh Patel
 
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...Amazon Web Services
 
Arches Getty Brownbag Talk
Arches Getty Brownbag TalkArches Getty Brownbag Talk
Arches Getty Brownbag Talkbenosteen
 
AWS Summit Singapore - Managing a Database Migration Project | Best Practices
AWS Summit Singapore - Managing a Database Migration Project | Best PracticesAWS Summit Singapore - Managing a Database Migration Project | Best Practices
AWS Summit Singapore - Managing a Database Migration Project | Best PracticesAmazon Web Services
 
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Amazon Web Services
 
Data Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech TalksData Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech TalksAmazon Web Services
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon RedshiftAmazon Web Services
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftAmazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Loading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFLoading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFAmazon Web Services
 
Content migration for sitecore
Content migration for sitecoreContent migration for sitecore
Content migration for sitecoreSurendra Sharma
 
Scalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and HowScalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and HowCambridge Semantics
 

Similar a Migrate Data to AtoM (20)

Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...
Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...
Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...
 
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
 
CTDA MODS and Islandora XML Forms
CTDA MODS and Islandora XML FormsCTDA MODS and Islandora XML Forms
CTDA MODS and Islandora XML Forms
 
Arches Getty Brownbag Talk
Arches Getty Brownbag TalkArches Getty Brownbag Talk
Arches Getty Brownbag Talk
 
ALA Interoperability
ALA InteroperabilityALA Interoperability
ALA Interoperability
 
AWS Summit Singapore - Managing a Database Migration Project | Best Practices
AWS Summit Singapore - Managing a Database Migration Project | Best PracticesAWS Summit Singapore - Managing a Database Migration Project | Best Practices
AWS Summit Singapore - Managing a Database Migration Project | Best Practices
 
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
 
Data Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech TalksData Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech Talks
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon Redshift
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF Loft
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
CADA
CADA CADA
CADA
 
SAS - Training
SAS - Training SAS - Training
SAS - Training
 
Loading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFLoading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SF
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Cassandra data modelling best practices
Cassandra data modelling best practicesCassandra data modelling best practices
Cassandra data modelling best practices
 
Content migration for sitecore
Content migration for sitecoreContent migration for sitecore
Content migration for sitecore
 
Scalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and HowScalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and How
 

Más de Artefactual Systems - AtoM

Creating your own AtoM demo data set for re-use with Vagrant
Creating your own AtoM demo data set for re-use with VagrantCreating your own AtoM demo data set for re-use with Vagrant
Creating your own AtoM demo data set for re-use with VagrantArtefactual Systems - AtoM
 
Looking Ahead: AtoM's governance, development, and future
Looking Ahead: AtoM's governance, development, and futureLooking Ahead: AtoM's governance, development, and future
Looking Ahead: AtoM's governance, development, and futureArtefactual Systems - AtoM
 
An Introduction to AtoM, Archivematica, and Artefactual Systems
An Introduction to AtoM, Archivematica, and Artefactual SystemsAn Introduction to AtoM, Archivematica, and Artefactual Systems
An Introduction to AtoM, Archivematica, and Artefactual SystemsArtefactual Systems - AtoM
 
National Archives of Norway - AtoM and Archivematica intro workshop
National Archives of Norway - AtoM and Archivematica intro workshopNational Archives of Norway - AtoM and Archivematica intro workshop
National Archives of Norway - AtoM and Archivematica intro workshopArtefactual Systems - AtoM
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Artefactual Systems - AtoM
 
Project Documentation with Sphinx (or, How I Learned to Stop Worrying and Lov...
Project Documentation with Sphinx (or, How I Learned to Stop Worrying and Lov...Project Documentation with Sphinx (or, How I Learned to Stop Worrying and Lov...
Project Documentation with Sphinx (or, How I Learned to Stop Worrying and Lov...Artefactual Systems - AtoM
 
Digital Curation using Archivematica and AtoM: DLF Forum 2015
Digital Curation using Archivematica and AtoM: DLF Forum 2015Digital Curation using Archivematica and AtoM: DLF Forum 2015
Digital Curation using Archivematica and AtoM: DLF Forum 2015Artefactual Systems - AtoM
 
Introducing Binder: A Web-based, Open Source Digital Preservation Management ...
Introducing Binder: A Web-based, Open Source Digital Preservation Management ...Introducing Binder: A Web-based, Open Source Digital Preservation Management ...
Introducing Binder: A Web-based, Open Source Digital Preservation Management ...Artefactual Systems - AtoM
 
Introducing the Digital Repository for Museum Collections (DRMC)
Introducing the Digital Repository for Museum Collections (DRMC)Introducing the Digital Repository for Museum Collections (DRMC)
Introducing the Digital Repository for Museum Collections (DRMC)Artefactual Systems - AtoM
 

Más de Artefactual Systems - AtoM (17)

AtoM Community Update: 2019-05
AtoM Community Update: 2019-05AtoM Community Update: 2019-05
AtoM Community Update: 2019-05
 
Creating your own AtoM demo data set for re-use with Vagrant
Creating your own AtoM demo data set for re-use with VagrantCreating your own AtoM demo data set for re-use with Vagrant
Creating your own AtoM demo data set for re-use with Vagrant
 
Looking Ahead: AtoM's governance, development, and future
Looking Ahead: AtoM's governance, development, and futureLooking Ahead: AtoM's governance, development, and future
Looking Ahead: AtoM's governance, development, and future
 
Contributing to the AtoM documentation
Contributing to the AtoM documentationContributing to the AtoM documentation
Contributing to the AtoM documentation
 
Installing AtoM with Ansible
Installing AtoM with AnsibleInstalling AtoM with Ansible
Installing AtoM with Ansible
 
Installing and Upgrading AtoM
Installing and Upgrading AtoMInstalling and Upgrading AtoM
Installing and Upgrading AtoM
 
Command-Line 101
Command-Line 101Command-Line 101
Command-Line 101
 
An Introduction to AtoM, Archivematica, and Artefactual Systems
An Introduction to AtoM, Archivematica, and Artefactual SystemsAn Introduction to AtoM, Archivematica, and Artefactual Systems
An Introduction to AtoM, Archivematica, and Artefactual Systems
 
National Archives of Norway - AtoM and Archivematica intro workshop
National Archives of Norway - AtoM and Archivematica intro workshopNational Archives of Norway - AtoM and Archivematica intro workshop
National Archives of Norway - AtoM and Archivematica intro workshop
 
Artefactual and Open Source Development
Artefactual and Open Source DevelopmentArtefactual and Open Source Development
Artefactual and Open Source Development
 
AtoM, Authenticity, and the Chain of Custody
AtoM, Authenticity, and the Chain of CustodyAtoM, Authenticity, and the Chain of Custody
AtoM, Authenticity, and the Chain of Custody
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
 
AtoM Community Update 2016
AtoM Community Update 2016AtoM Community Update 2016
AtoM Community Update 2016
 
Project Documentation with Sphinx (or, How I Learned to Stop Worrying and Lov...
Project Documentation with Sphinx (or, How I Learned to Stop Worrying and Lov...Project Documentation with Sphinx (or, How I Learned to Stop Worrying and Lov...
Project Documentation with Sphinx (or, How I Learned to Stop Worrying and Lov...
 
Digital Curation using Archivematica and AtoM: DLF Forum 2015
Digital Curation using Archivematica and AtoM: DLF Forum 2015Digital Curation using Archivematica and AtoM: DLF Forum 2015
Digital Curation using Archivematica and AtoM: DLF Forum 2015
 
Introducing Binder: A Web-based, Open Source Digital Preservation Management ...
Introducing Binder: A Web-based, Open Source Digital Preservation Management ...Introducing Binder: A Web-based, Open Source Digital Preservation Management ...
Introducing Binder: A Web-based, Open Source Digital Preservation Management ...
 
Introducing the Digital Repository for Museum Collections (DRMC)
Introducing the Digital Repository for Museum Collections (DRMC)Introducing the Digital Repository for Museum Collections (DRMC)
Introducing the Digital Repository for Museum Collections (DRMC)
 

Último

Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Último (20)

Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Migrate Data to AtoM

  • 1. Data Migrations Some Considerations when Preparing to Migrate to AtoM http://boingboing.net/2016/11/08/heres-the-unexpected-origin.html
  • 2. Overview• Assess what data will be part of the migration • Do any in-system clean up prior to export • Review AtoM data formats and available fields • Establish a crosswalk between your data and AtoM’s fields • Export your data • Perform any additional clean up as needed • Transform your data to an AtoM compatible import format • Import • Review your work • Revise and reimport if needed • Make small clean up edits in AtoM directly
  • 3. Expect this to take time • Our average length for a client data migration project is around 4-6 months. Even for a simple project, there will be a lot of time needed for data clean-up, quality assurance review, and reimports. Expect to do your import more than once • It’s unlikely that everything will go perfectly on the first attempt. You’ll discover some records don’t quite match the same pattern as the rest, or one field didn’t import, etc. Don’t be discouraged, and do budget your time with this assumption in mind. Before Starting
  • 4. Develop a data management plan while you migrate • How will you ensure you are not stranding data during the time of your migration? Will you freeze data entry entirely for the length? Manage your data in a spreadsheet? Run a small migration for new data at the end? Make sure everyone knows the plan. Clarify roles, deadlines, and communication channels • Ensure everyone involved knows what is expected of them throughout the project, and when. Clearly identify those responsible for key roles, and where to go for support. Before Starting
  • 5. Data assessment • How many descriptions do you have? How many top-level records? • Have the records been described based on any content standards? (e.g. ISAD(G), RAD, DACS, MAD, MODS, etc.?) • Are there custom fields with data in your system? How many? Do they readily map to known standards or not? • What export formats does your system support? • Is all record data captured in the exports? • Are some descriptions “draft” or non-public? Is this information captured in the export? Questions to ask in the data assessment phase:
  • 6. Data assessment • How many digital objects do you have to migrate? What types (images, text, video, etc) and formats (e.g. JPG, mp4, etc) are represented? • Are authority records maintained separately from descriptions? What about other entities? Accession records? • Is the relationship between these entities and descriptions captured in the export formats available? • Do these other record types have their own export formats? (e.g. EAC-CPF XML, SKOS XML, CSV, etc) • How are hierarchical relationships captured in the export? Questions to ask in the data assessment phase:
  • 7. AtoM Data FormatsArchival descriptions • CSV, EAD 2002 XML, MODS XML Authority records • EAC-CPF XML, CSV Accessions • CSV Terms (Subjects, Places, Genres, etc.) • SKOS - many serializations supported Repository records • CSV Current as of version 2.4
  • 9. AtoM CSV Templates https://wiki.accesstomemory.org/Resources/CSV_templates CSV import will be the best way to get data into AtoM - because the CSV import template is a format specific to AtoM, there is no data loss and all fields are represented. If you are able to get your data out of your legacy system and transform it into an AtoM-compatible CSV format, we recommend using this method for your migration project.
  • 10. AtoM CSV Templates https://wiki.accesstomemory.org/Resources/CSV_templates In the example CSV files from v2.2 on, we have included the relevant content standard name and number in the sample data field. This means you can import the CSV template to produce a sort of “crosswalk” or key, showing you how fields in AtoM map to the column headers.
  • 12. AtoM EAD 2002 XML Similarly, if you ensure all data entry fields in AtoM are filled in with related content standard names and numbers, you can now export as EAD XML to generate an EAD - ISAD(G) crosswalk from AtoM.
  • 13. AtoM EAD 2002 XML Similarly, if you ensure all data entry fields in AtoM are filled in with related content standard names and numbers, you can now export as EAD XML to generate an EAD - ISAD(G) crosswalk from AtoM.
  • 15. AtoM EAD 2002 XML EAD 2002 XML is a flexible standard with many possible valid but different implementations. For this reason, your locally generated EAD, while valid, may still not import perfectly into AtoM. This is why we prefer working with CSV imports whenever possible. We recommend running a test import of a representative sample from your source system into AtoM, and using the crosswalk method discussed above to evaluate if you will need to make changes to how your EAD XML is encoded for a successful import into AtoM.
  • 16. Crosswalking Crosswalking is the process of mapping your source data fields to equivalent AtoM ones. To do so, you must understand how AtoM handles some data (such as authority records, terms, etc.) first. There will be cases where there are no 1:1 equivalencies either - you will have to make decisions about how to combine or split apart your existing data to make it work with what is available. https://commons.wikimedia.org/wiki/File:Brand_New_Crosswalk_(6223110132).jpg
  • 17. Crosswalking AtoM is standards-based. This means you can focus on crosswalking to the content standard you know best. Use the guidance provided in the relevant standard to help inform your mapping. https://commons.wikimedia.org/wiki/File:Brand_New_Crosswalk_(6223110132).jpg
  • 18. AtoM Entity Types A(n incomplete) list of the main entity types around which AtoM was built. https://www.accesstomemory.org/docs/latest/user-manual/overview/entity-types/
  • 19. Accessions Accession records have their own CSV import format. As there is currently no international accessions standard, you will need to review the available fields in AtoM closely and determine where to map your data. Descriptions can be linked to Accessions via the accessionNumber column in the description CSV templates. We recommend importing your Accessions first, then your descriptions with the corresponding accession number, to establish links. See: • https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv- import/#import-accessions-via-csv
  • 20. ActorsIn AtoM, creators and name access points are maintained separately as authority records, so they can be re-used and linked to multiple descriptions. This means any creator name or name access point you import with your descriptions will create an authority record, or link to an existing match! Make sure that names are consistent in your data, and the biographical/administrative history is about the actor only - not specific to the description. See: • https://www.accesstomemory.org/docs/latest/user-manual/add-edit-content/authority- records/#authority-bioghist-access • https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#on- authority-records-archival-descriptions-and-csv-imports
  • 21. ActorsThe Actor data you can add to a description CSV is minimal - if you do maintain authority records, then you may want to import them separately via AtoM’s authority record CSV templates. There are 3 actor CSV templates - the main actors template, 1 to supplement relationship data (between actors and/or resources) and 1 to supplement alternative forms of name. We recommend importing authority records before descriptions, so you can link them on description import. See: • https://www.accesstomemory.org/docs/lates/user-manual/import-export/csv-import/#creator- related-import-columns-actors-and-events • https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#import- authority-records-via-csv
  • 22. Event Dates Description edit templates have 3 date fields. The Display date is what the end user will see - it is free text. The start and end dates must follow ISO 8601 (YYYY-MM-DD, etc) formatting. These fields are used to support AtoM’s date range search. • Display date • Start date • End date
  • 23. Event DatesDuring CSV import, Creators and Dates are paired (as Events - see Entity types diagram). Use the | pipe character to add multiple creators/dates. You can use a literal NULL value in your CSV file to keep the spacing correct for dates without actors or vice versa: See: • https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv- import/#creator-related-import-columns-actors-and-events
  • 24. Access PointsIn AtoM, access points on a description (e.g. subjects, places, genre terms) are maintained separately as terms in a taxonomy so they can be controlled and reused. This means that access point data in your description imports will either create new terms or link to existing ones. Make sure your data is consistent so you don’t have near- duplicates later! (e.g. “cars” vs “car” vs “automobiles”) The exception is name access points - these are authority records! See: • https://www.accesstomemory.org/docs/latest/user-manual/add-edit- content/terms/#term-name-vs-subject
  • 25. Hierarchies Hierarchies are managed in the description CSV templates via the legacyId and parentId columns. Parent records must import in a row above child records. The children should have the legacyId value of the parent record in the parentId column. See: • https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv- import/#hierarchical-relationships
  • 26. Can be imported with descriptions using the digitalObjectURI or digitalObjectPath columns. URIs point to external, web-accessible resources - must end in file extension! Paths point to a local directory added to your server prior to import. See: • https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv- import/#digital-object-related-import-columns Digital Objects
  • 27. You have 3 main options when it comes to transforming your data into an AtoM- compatible format: • Manual data transformation • Tools such as OpenRefine • Transformation script http://www.publicdomainpictures.net/view-image.php?image=131381&picture=monarch-butterfly Data Transformation
  • 28. OpenRefine is “a free, open source power tool for working with messy data and improving it.” See: • http://openrefine.org/ • https://github.com/OpenRefine/OpenRefine There are many great free resources to help you get started. Data Transformation
  • 29. Use OpenRefine to: • Add AtoM column headers • Normalize names and terms • Standardize identifiers or accession numbers • Split source data into separate columns • Combine data into a single column • Delete unnecessary rows • Global search/replace • etc You can also use OpenRefine to clean up XML data! Data Transformation
  • 30. A transformation script is generally a script prepared by a developer that takes an input (your source data) and runs a series of operations to transform the data into the desired output (an AtoM- compatible file). These can be prepared in many programming languages (e.g. PHP, Python, etc). Data Transformation
  • 31. Import Ordering If you are working with several different types of data, you may need to perform multiple imports, possibly in different formats. If so, we recommend proceeding in this order to link entities together as your imports proceed. We also recommend running a smaller sample test first! 1. Terms 2. Repositories 3. Actors 4. Accessions 5. Descriptions