Diacritics Online
How to import diacritics into CONTENTdm
from a library catalog using Excel and
MarcEdit
Jill Strass
This talk was inspired by our struggles to digitize some
Nordic Solo Songs as collected by Dan Dressen and
bravely cataloged and uploaded by Kathy Blough.
Jill Strass
St. Olaf College
Upper Midwest Online CONTENTdm Conference
November 8‐9, 2010
The Challenge
• Shortcut to metadata: obtain MARC records
containing diacritics from a library catalog
as a tab‐delimited file for easy import into
CONTENTdm
The Method
• Export our records from the library catalog
as a delimited file
• Use the tab‐delimited file to generate
metadata for CONTENTdm
• Upload as a compound object into
CONTENTdm
The Challenge
• Uh oh, we have an export bug that won’t
allow us to cleanly export fields with
repeating values from the catalog to a
delimited file.
The Workaround – Catalog to MarcEdit
• Uh oh, we have an export bug that won’t allow us to
cleanly export from the catalog to a delimited file.
• No worries, we’ll use MarcEdit.
• Convert the tab‐delimited file (.out) from the catalog
into an .mrc file using MarcEdit.
• Take the .mrc file and export it using MarcEdit’s tool
for tab‐delimited files.
• In MarcEdit, we choose which MARC fields we want
for our metadata in digital collections.
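To make the repeating-values problem concrete, here is a minimal Python sketch of what a clean export has to do: put every value of a repeating field into a single cell. It is not MarcEdit's method, only an illustration. The sample records, the chosen tags, and the "; " separator are all assumptions made up for this example.

```python
import csv
import io

# Hypothetical sample records (illustrative values, not from the actual
# collection): MARC tag -> list of values, because a tag may repeat.
records = [
    {"245": ["Våren"], "100": ["Grieg, Edvard"],
     "650": ["Songs, Norwegian", "Songs with piano"]},
    {"245": ["Säv, säv, susa"], "100": ["Sibelius, Jean"],
     "650": ["Songs, Swedish"]},
]

FIELDS = ["245", "100", "650"]  # assumed choice of tags for the metadata
SEPARATOR = "; "                # assumed in-cell delimiter for repeated values


def to_rows(records, fields=FIELDS, separator=SEPARATOR):
    """Return one row per record, joining repeated values of a tag into one cell."""
    return [[separator.join(rec.get(tag, [])) for tag in fields] for rec in records]


def to_tab_delimited(records, fields=FIELDS):
    """Build tab-delimited text with a header row of MARC tags."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(to_rows(records, fields))
    return buffer.getvalue()


text = to_tab_delimited(records)
# Encode explicitly as UTF-8 when saving, so the diacritics survive.
data = text.encode("utf-8")
```

Each record becomes exactly one row, so a field that repeats (650 here) never spills into extra columns or extra lines.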
The Trick to know in MarcEdit for
diacritics
• Use the MarcEdit Characterset Translation
tool, and while breaking the record, select
UTF‐8 as the format, so Excel can recognize
diacritic characters.
Note that the box for
Translate to UTF-8 is
checked.
Yippee! If you
look real close,
you can see
diacritics are
showing up in
the text editor in
MarcEdit.
Trick for Diacritics in Excel
• Now we have our diacritics within a
tab‐delimited file, courtesy of MarcEdit.
• There is a trick you’ll need to use when you
first open Excel.
When you first open your
tab-delimited file from
MarcEdit, Excel takes you
through its Text Import
Wizard. In the wizard,
select 65001 Unicode
(UTF-8) from the File Origin
pull-down menu.
This will allow Excel to
“see” the diacritics.
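Why the File Origin setting matters can be shown with a short Python sketch. It assumes the "wrong" setting is Windows-1252 (cp1252), the usual Western default. The sample title is made up for illustration.

```python
import codecs

title = "Våren"  # hypothetical sample value containing a diacritic

# What MarcEdit writes when Translate to UTF-8 is checked: UTF-8 bytes.
utf8_bytes = title.encode("utf-8")

# Reading those bytes with the wrong file origin garbles the diacritic...
wrong = utf8_bytes.decode("cp1252")

# ...while reading them as UTF-8 (code page 65001) restores it.
right = utf8_bytes.decode("utf-8")

# A UTF-8 byte order mark at the start of the file is a hint that many
# programs use to pick UTF-8 on their own; Python's "utf-8-sig" codec
# reads and strips it.
with_bom = codecs.BOM_UTF8 + utf8_bytes

print(wrong)  # VÃ¥ren
print(right)  # Våren
```

The bytes in the file are the same in both cases; only the reader's assumption about the encoding changes.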
Generating Metadata from tab‐delimited files
• We use a tricked‐out spreadsheet that
allows us to take a row from a tab‐delimited
file, copy and paste it into Excel,
and then Excel generates a compound
object template for easy upload into
CONTENTdm.
• We do this to avoid manual data entry as
much as possible.
• If you’d like a spreadsheet file and
documentation on how to use it, contact
• To convert the .xls file
to .txt, we select,
copy and paste from
Excel into Notepad++.
• We do this so we can
see exactly what
characters are
showing up in our
text files.
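The same kind of check can be scripted. The sketch below uses only Python's standard library and lists every non-ASCII character in a piece of text with its code point and Unicode name. The function name `non_ascii_report` and the sample string are hypothetical. The sample deliberately contains both a precomposed å and an a followed by a combining ring, since a text file may hold either form and the two look identical on screen.

```python
import unicodedata


def non_ascii_report(text):
    """Return (character, code point, Unicode name) for each distinct non-ASCII character."""
    found = sorted({ch for ch in text if ord(ch) > 127})
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in found]


# Hypothetical sample: first word uses precomposed å (U+00E5),
# second uses plain "a" followed by a combining ring above (U+030A).
sample = "Sm\u00e5 sange / Sma\u030a sange"

for ch, code_point, name in non_ascii_report(sample):
    print(code_point, name)

# NFC normalization folds the two-character form into the precomposed one.
normalized = unicodedata.normalize("NFC", sample)
```

After normalization, both spellings of the word are byte-for-byte identical, which can matter for searching and sorting.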
• Note that Notepad++
is so cool, we don’t
need any tricks to
use it!
Uploading into CONTENTdm with
Diacritics (CDM 5.3)
If only it were this
simple…. For us,
we had to select
ANSI for this to
work, but according
to the
documentation,
UTF-8 as encoding
is supposed to
work.
We may never
know why this is so
for us. Please
share your
experiences.
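For anyone who wants to experiment with both settings, the sketch below produces the same text in both encodings. It assumes that "ANSI" means Windows-1252 (cp1252), which is the usual meaning on Western-language Windows systems; it says nothing about how CONTENTdm itself reads the file. The function name `encode_variants` is hypothetical.

```python
def encode_variants(text):
    """Return (utf8_bytes, ansi_bytes, unsupported) for the given text.

    unsupported lists the characters that cp1252 cannot represent; in
    ansi_bytes each of them is replaced by "?".
    """
    utf8_bytes = text.encode("utf-8")
    ansi_bytes = text.encode("cp1252", errors="replace")
    unsupported = []
    for ch in sorted(set(text)):
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            unsupported.append(ch)
    return utf8_bytes, ansi_bytes, unsupported


utf8_bytes, ansi_bytes, unsupported = encode_variants("Våren")
# å takes two bytes in UTF-8 but one byte in cp1252, so the files differ.
print(utf8_bytes, ansi_bytes, unsupported)
```

If `unsupported` is not empty, an ANSI file will lose those characters, which is one thing to check before settling on that setting.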
A Sample of Diacritics on CONTENTdm
And here we are, at journey’s
end….
Summary of Diacritics on
CONTENTdm
• Export MARC records from your catalog or source for
text with diacritics.
• If you need to use MarcEdit in this process, select the
UTF‐8 box in the Characterset Translation Tool.
• When first opening a tab‐delimited file in Excel, select
65001 Unicode (UTF‐8) from the File Origin pull‐down
menu.
• When uploading to CONTENTdm, experiment with the
UTF‐8 vs ANSI setting in the Add Compound Object,
File Mapping, Encoding box.