Workshop presented at the Wisconsin Conference for Local History and Historic Preservation, Wisconsin Rapids, October 11, 2013. Presenters: Sarah Grimm, Electronic Records Archivist, Wisconsin Historical Society and Emily Pfotenhauer, Recollection Wisconsin Program Manager, WiLS.
Building Digital Collections: Managing and Sharing
1. Local History and Historic Preservation Conference
Building Digital Collections
Part 2: Managing and sharing
Supported by WHRAB
2. TODAY'S AGENDA
• Introductions
  • Tell us about yourself
• Creating an inventory
  • Starting your inventory
  • Selecting content to preserve
• Managing your collections
  • Organizing collections
  • Management tools
  • Storage options
• Access considerations
  • Why provide access?
  • Software options
  • Promoting your collections
• Wrap-up and final thoughts
Waterford Public Library/University of Wisconsin Digital Collections
3. Introductions
• We are...
  • Sarah Grimm, Electronic Records Archivist, Wisconsin Historical Society
  • Emily Pfotenhauer, Recollection Wisconsin Program Manager, WiLS
• You are...
  • What organization do you represent?
  • What digital projects are you currently working on or thinking about?
Eager Free Public Library/University of Wisconsin Digital Collections
4. LOC and DPOE
The Library of Congress started the Digital Preservation
Outreach and Education (DPOE) program in order to foster
national outreach and education to encourage individuals
and organizations to actively preserve their digital content.
http://www.digitalpreservation.gov/education/
5. Digital Preservation
Digital preservation combines policies, strategies and
actions to ensure access to reformatted and born
digital content regardless of the challenges of media
failure and technological change. The goal of digital
preservation is the accurate rendering of
authenticated content over time.
Working group on Defining Digital Preservation, ALA Annual Conference, 6/24/2007
6. What is Digital Content?
• Digital content is any content that is published or distributed in a digital form, including text, data, sound recordings, photographs and images, motion pictures, and software.
  • Digital materials created from analogue sources
  • Born-digital content
• Digital materials you currently have or create, or expect to have, that you want to preserve.
7. Well-managed Collections
• Sample characteristics of well-managed collections:
  • Basic information about each collection
  • Minimal metadata for objects (you define)
  • Common file formats
  • Controlled and known storage of content
  • Multiple copies in at least 2 locations
9. Why Do We Identify Content?
• Not all digital content can or should be preserved
• Good digital preservation requires an explicit commitment of resources, which, for most organizations, means planning ahead
• An explicit inventory is the best way to identify content
Food for the Boys in France
Image ID: WHi-35438
10. First Steps
• Identifying content is a first step to planning for current and future preservation needs
• Ask: what content do I have, will I have, might I have, must I have?
An inventory is the best way to identify what content you have now, and to raise awareness in your institution.
11. Goals
• Identify potential digital content you may need to preserve
• Treat the inventory as a management tool that grows as your preservation program grows
• Use it as a planning tool, e.g., to plan for staffing, training, and annual growth
• Use it as a basis for acquiring content and for defining submission agreements and plans
13. Inventory Considerations
• Inventory content is more important than the technology
• Inventory results should be:
  • Documented: an inventory should actually exist
  • Usable: use a simple format to sort, list, etc.
  • Available: accessible to others
  • Scalable: should be able to add content/fields over time
  • Current: update periodically and date it
14. Inventory Tips
• Don't let implementing the software become the focus
• Use software you know and have available
• Test it out with a number of people and collections
• Stick with a single format; don't change once you've decided on it
• Be consistent, comprehensive, and concise
15. How Much Detail to Include
• Inventories can range from general to detailed
• Determine the appropriate level of detail for you
• Factors in determining level of detail:
  • Extent of content to be inventoried
  • Nature & location of content
  • Resources available to complete inventory
  • Timeframe & deadlines for completion
16. What Do You Have?
• Identify collections of digital materials.
(Don't work at the item level...)
• Provide a brief title and description
• Estimated growth over time ***
17. Who is involved?
• Who is currently managing the collection/digital content?
• Who knows the most about it?
• Creator (internal or external): who created the digital content?
[Diagram: Digital Management, Collections Management, Creator]
THESE MAY BE DIFFERENT PEOPLE
18. What does it consist of?
• Medium (6 CDs, 1 hard drive, 115 floppy disks)
• Extent = Format + Amount (600 .pdf, 30 .doc)
• File size (MB, GB, TB)
http://www.csgnetwork.com/memconv.html
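For a collection that already sits in a folder, the "Extent = Format + Amount" and file size figures can be gathered with a short script instead of by hand. The following is only a sketch: the function names `summarize_extent` and `human_size` are made up for this illustration, and sizes use binary units (1 KB = 1024 bytes).

```python
import os
from collections import Counter


def summarize_extent(root):
    """Count files and total bytes per file extension under root."""
    counts = Counter()
    sizes = Counter()
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            # Treat ".PDF" and ".pdf" as the same format.
            ext = os.path.splitext(name)[1].lower() or "(no extension)"
            counts[ext] += 1
            sizes[ext] += os.path.getsize(os.path.join(folder, name))
    return counts, sizes


def human_size(num_bytes):
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"
```

Calling `summarize_extent` on a collection folder returns two tallies keyed by extension, which can be copied into the extent and file size columns of the inventory.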
19. Date Considerations
Inventories should note:
• Date of inventory and updates to it
• Dates associated with the content (1860-1865)
• Date of files, created or modified (2009)
• Date received, if relevant/possible (2011)
Example: Shawano Probate Cases, 1860-1865. Digitized by USG in 2009. Received by WHS in 2011.
20. Content Location
Locations of content are important:
• List primary locations (network drive location, hard drive on Bob's shelf)
• List locations of all backups/copies (CDs in the storage room, weekly backup tapes)
Remember to update locations as content moves
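The inventory fields covered so far (what, who, medium and extent, dates, locations) fit comfortably in a spreadsheet or a plain CSV file. As one possible starting point, here is a minimal sketch that appends collection-level rows to a CSV file. The column names and the `add_collection` helper are assumptions made for this illustration, not a prescribed schema; adjust them to whatever your organization decides to track.

```python
import csv
import os
from datetime import date

# Hypothetical column names, loosely following the inventory slides.
INVENTORY_FIELDS = [
    "title", "description", "managed_by", "creator",
    "medium", "extent", "file_size",
    "content_dates", "file_dates", "date_received",
    "primary_location", "backup_locations",
    "inventory_updated",
]


def add_collection(path, entry):
    """Append one collection-level row; write the header if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # Missing fields are left blank so they can be filled in later.
    row = {field: entry.get(field, "") for field in INVENTORY_FIELDS}
    row["inventory_updated"] = date.today().isoformat()
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=INVENTORY_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

For the example on the dates slide, a call might look like `add_collection("inventory.csv", {"title": "Shawano Probate Cases", "content_dates": "1860-1865", "file_dates": "2009", "date_received": "2011"})`. The resulting file opens in any spreadsheet program, which keeps the inventory sortable and easy to share.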
22. Why select content to preserve?
Log jam on the St. Croix River, 1886
Wisconsin Historical Society WHi-2364
23. Why select content to preserve?
• Cost: storage may be cheap, management is not... especially over time
• Discovery and dissemination services: scale, scope, performance, sustainability
• Quality of content may be variable
• Fit with the organization's mission
24. Selection Criteria
Ask yourself which materials are...
• most significant to your organization?
• most unique?
• highest value?
• most extensive?
• most requested/used?
• easiest?
• oldest?
• newest?
• at risk?
Neville Public Museum of Brown County
25. Show Stoppers
Stop if or when the answer is NO
• Content
  • Does the content have long term value?
  • Does it fit your scope and mission?
• Technical
  • Is it feasible for you to preserve the content?
• Access
  • Is it possible to make the content available?
  • Are you the only holder of this content?
26. Add to your inventory
Supplement your inventory with more detailed information about the material you plan to preserve over the long term.
• Access
  • How will the public access the content?
  • Is access restricted? How? For how long?
• Rights
  • Who owns the rights to preserve and disseminate?
• Use
  • What's the lifespan of the content?
  • Will its value/use change over time?
27. Add to your inventory
• Data criticality
  • Is it only in digital form? Do we hold the only copy?
• Business/mission criticality
  • If we lose it, what's the damage to our reputation? How will it impact our function or services?
Charlie Chaplin and Jackie Coogan in The Kid.
Image ID: WHi-68423
30. Analyze the Results
When the inventory is complete, ask yourselves what digital content
• do we have that we didn't know about?
• should we be keeping that we aren't now?
• will we create or likely acquire in the future?
• are we required to keep?
• do we need to review?
"Deering Ideal" Stripper Harvester Catalog Cover
Image ID: WHi-27577
31. ORGANIZE YOUR FILES
• Centralize your files
• Minimize your layers
• Leave breadcrumbs (AKA "READ ME")
• Determine what you don't know
IH General Office Mail Room
Image ID: WHi-12016
32. WHAT NOT TO KEEP?
• Backups/copies/drafts
• Supplementary files that provide no additional long-term value
• Corrupted files
• Same item, different file formats
• Items that don't fit your organization's purpose
Boy on Curb near Trash Pile
Image ID: WHi-57208
33. Goals/Outcomes
• Expanded inventory of content to preserve... and what you can delete (gray areas identified)
• Well-defined and documented selection criteria, policies and procedures
• Better understanding of content for future planning and growth
Greater knowledge = greater control!
35. Remove Empty Directories
The application searches for and deletes empty directories recursively below a given start folder and shows the result in a well-arranged tree
http://sourceforge.net/projects/rem-empty-dir/files/latest/download?source=files
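If you would rather see what such a tool is doing, or just want a report before anything is deleted, the same idea can be sketched in a few lines. This is not the application linked above; `find_empty_dirs` is a hypothetical helper that only lists empty folders and leaves the deleting to you.

```python
import os


def find_empty_dirs(root):
    """Return folders under root that hold no files, directly or in any subfolder.

    The root folder itself is never reported.
    """
    empty = set()
    # Walk bottom-up so every subfolder is judged before its parent.
    for folder, subfolders, files in os.walk(root, topdown=False):
        children_empty = all(
            os.path.join(folder, sub) in empty for sub in subfolders
        )
        if not files and children_empty and folder != root:
            empty.add(folder)
    return sorted(empty)
```

Reviewing the returned list first is the cautious choice: a folder that looks empty may still be a placeholder someone expects to find.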
36. Remove Duplicate Files
• Auslogics Duplicate File Finder
  http://www.auslogics.com/en/software/duplicate-file-finder/
• Similar Images
  http://similarimages.en.softonic.com/
• VisiPics
  http://www.visipics.info/index.php?title=Main_Page
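To make the idea behind exact-duplicate finders concrete, here is a minimal sketch that groups files whose contents are byte-for-byte identical by comparing MD5 checksums. It is not how any of the tools above are implemented internally; the function names are invented for this example, and it will not catch resized, re-saved or retouched images the way the image-comparison tools can.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    # Files of different sizes cannot be identical, so group by size first
    # and only checksum the files that share a size with another file.
    by_size = defaultdict(list)
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    by_checksum = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_checksum[file_md5(path)].append(path)

    return [sorted(paths) for paths in by_checksum.values() if len(paths) > 1]
```

Each returned group is a set of candidates to review; deciding which copy to keep (for example, the one with the more descriptive file name) is still a human decision.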
49. Checksums
• Checksums (AKA "hash sums") are created by programs running an algorithm against the contents of a file.
(There are many free utilities that will perform this function for you.)
• The resulting checksum is a short sequence of letters and/or numbers that, for practical purposes, uniquely identifies that file's contents.
(Think "electronic fingerprint".)
Unix cksum utility
50. Why is this a good thing?
• Checksums help maintain the INTEGRITY of your collections because they will tell you when things change over time.
• If two files are exactly the same, the checksums of those files will also be exactly the same (generally speaking)
• If a file becomes corrupted, degraded or is changed in some way, the next time you run the utility on it, the checksum will change
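The behavior described above can be seen with Python's standard `hashlib` module. This is a small demonstration on text held in memory rather than on real files, and the `checksum` helper is just a convenience defined for this example. MD5 is used because it is the algorithm the workshop tools rely on; it is adequate for spotting accidental change, though deliberately constructed collisions are known to exist.

```python
import hashlib


def checksum(data, algorithm="md5"):
    """Return the hexadecimal checksum of a bytes value."""
    return hashlib.new(algorithm, data).hexdigest()


original = b"Food for the Boys in France"
exact_copy = bytes(original)                          # identical content
altered = original.replace(b"France", b"france")      # one character changed

print(checksum(original) == checksum(exact_copy))  # True: same content, same checksum
print(checksum(original) == checksum(altered))     # False: any change alters the checksum
print(len(checksum(original)))                     # 32: an MD5 checksum is 32 hex characters
```

Changing a single letter produces a completely different checksum, which is what makes a stored checksum useful as a later point of comparison.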
57. Verify Hash Values
• Copy files to another directory (think "backup")
• Open MD5Summer
• Select the files in the new location
• Click "Verify Sums"
58. Open the Md5sum file
• Find your MD5 file
• Click "Open"
61. Things to remember
Things that will NOT affect checksums
• Moving items from one place to another
• Changing the file name
Run on the master files when a collection is completed
Set up a schedule to run "verify checks" periodically
St. Mary of the Lake Parish School First Day
Image ID: WHi-98433
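The create-then-verify routine can also be scripted, which helps when checks need to run on a schedule. The sketch below writes a plain-text manifest (one checksum and relative path per line) for a folder of master files and later checks a copy against it. The function names and the manifest layout are assumptions for this example and are not guaranteed to match the file that MD5summer produces.

```python
import hashlib
import os


def file_md5(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Record a checksum for every file under root. Keep the manifest outside root."""
    entries = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            relative = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((relative, file_md5(full)))
    with open(manifest_path, "w", encoding="utf-8") as handle:
        for relative, digest in sorted(entries):
            handle.write(f"{digest}  {relative}\n")


def verify_manifest(root, manifest_path):
    """Return (relative_path, problem) pairs; an empty list means every file matched."""
    problems = []
    with open(manifest_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            expected, relative = line.split("  ", 1)
            full = os.path.join(root, *relative.split("/"))
            if not os.path.exists(full):
                problems.append((relative, "missing"))
            elif file_md5(full) != expected:
                problems.append((relative, "changed"))
    return problems
```

One caveat about this particular sketch: renaming or moving a file does not change its checksum, but because the manifest looks files up by path, a renamed file will be reported as "missing" until the manifest is regenerated.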
63. Key Decision Points
• How are you going to organize it?
• What are you going to store it on?
• Where are you going to store it?
• How many copies do you need?
Post Office
Image ID: WHi-9135
64. Factors to consider
• Immediate costs
  • Quantity (size and number of files)
  • Number of copies
  • Media (life span, availability, $$)
• Other resources
  • Expertise (skills required to manage)
  • Services (local vs. hosted)
  • Partners (achieving geographic distribution)
• Institutional constraints
65. How Many and Where?
• Multiple
  • Minimum: two (2) copies in two locations
  • Optimum: six (6) copies
• Geographically distributed
  • If possible, don't keep all of your copies onsite
66. Local Storage Options
• Local network
• RAID device
• External hard drive
• Archival quality (gold) CDs or DVDs
Take into account potential future storage needs.
Villa Terrace Decorative Arts Museum
67. Cloud storage options
Commercial options:
• Google Drive
  • Up to 5GB free (approx. 140 high-resolution TIFF files)
  • 25GB = $2.50/month
• Amazon Simple Storage Service (S3)
  • $.095 per GB/month
Institutional options:
• DuraCloud
*Public Records Board Guidance on the Use of Contractors for Records Management Services
*Use of Contractors for Records Management Services
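A quick back-of-the-envelope calculation helps when comparing these options. The sketch below estimates total storage and a monthly bill from a file count, an average file size and the number of copies. The function names are invented for this example. The 36.6 MB average comes from the slide's own figure (5 GB divided by roughly 140 TIFFs), and the per-GB price is the 2013 figure quoted above, so treat both as placeholders to replace with your own numbers and current pricing.

```python
def total_storage_gb(num_files, avg_file_mb, copies=2):
    """Estimate total storage in GB (1 GB = 1024 MB) for all copies."""
    return num_files * avg_file_mb * copies / 1024


def monthly_cost(total_gb, price_per_gb_month):
    """Estimate a monthly storage bill for a per-GB price."""
    return total_gb * price_per_gb_month


# Illustrative numbers only: 2,000 scans at about 36.6 MB each, kept in 3 copies,
# priced at the $0.095 per GB/month rate shown on the slide.
needed_gb = total_storage_gb(num_files=2000, avg_file_mb=36.6, copies=3)
print(f"Storage needed: {needed_gb:.0f} GB")
print(f"Estimated cost: ${monthly_cost(needed_gb, 0.095):.2f} per month")
```

This only covers the cost of storing the files. As the notes for the storage slides point out, some services also charge when you pull content back out, so check retrieval fees before committing.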
69. Why are you providing access to content?
• User demand
• Institutional visibility
• Legal mandates or grant requirements
• Generate revenue
• Contribute to our collective knowledge
South Wood County Historical Museum
70. What makes a good online collection?
• Publicly accessible.
• Searchable: includes keywords and other descriptive information (metadata) so users can find what they're looking for.
• Organized and consistent.
• Based on existing international/national/statewide standards and best practices.
• Uses software that is sustainable (will be around for a long time) and interoperable (can be migrated or shared).
• Respects intellectual property rights.
71. What are we aiming for?
Content should be delivered to users over time:
• Easily: using current and known technologies
• Coherently: well-documented and presented
• Completely: intact and well-formed
• Correctly: accurately representing content
• Reliably: using well-managed technologies
• Consistently: in accordance with policies
• Fairly: with equity and precedent
72. Some software options
• CONTENTdm
• ResCarta Web
• PastPerfect Online
• Omeka
Beloit College
73. CONTENTdm
• Hosted by Milwaukee Public Library through Recollection Wisconsin
• Produced and distributed by OCLC
• Costs:
  • $200 one-time setup fee
  • Annual hosting fees starting at $75
78. ResCarta Web
• Free and open source
• Host it yourself, or hosting available through Northern Micrographics (fee-based)
• ResCarta Foundation, based in La Crosse
86. Omeka
• Free and open source
• Host it yourself, or subscribe to the hosted version, omeka.net
• Developed by the Center for History and New Media, George Mason University
91. Potential audiences
• Local residents
• Students and teachers
• Genealogists
• Specialists (e.g. Civil War re-enactors, railroad buffs)
• Academic researchers
• Curious Wisconsinites
• Everyone!
College of Menominee Nation
92. Stakeholders and partners
• Board
• Staff and/or volunteers
• Local experts
• Community members
• Chamber of Commerce
• Local government
• Students
• Other organizations in your community/county/region
• Who else?
McMillan Memorial Library, Wisconsin Rapids
93. Encouraging use of your collections
• Organizations are moving away from the "if you build it, they will come" approach. Google is not enough
• Participatory archives concept: shared authority, community engagement
• Bring your content to your audience: find them where they already are
• Let them look behind the curtain and see projects in progress, warts and all
Milwaukee Public Library
94. PROMOTION: BRAINSTORMING
• What are some ways you've had success promoting your digital collections?
• What are cool ideas you've seen that you'd like to try?
95. Marketing ideas
• Add introduction/background information on your own website
  • http://www.newberlinhistoricalsociety.org
• Highlight an item of the day/week/month
  • https://www.facebook.com/lacrosse.history
• Host an opening event
  • Whitefish Bay Public Library
  • College of Menominee Nation
• Host a slide show or exhibition
  • South Wood County Historical Museum
  • Mineral Point Historical Society
Rock County Historical Society
96. Marketing ideas
• Send someone with a laptop to popular local spots/events to demonstrate digital collections:
  • Ask, "Where do people go first to look for this kind of information?" and then market there!
• Upload a few digitized images to Flickr with descriptions that point back to your related digital and physical collections.
• Contribute to relevant pages on Wikipedia and include references pointing to specific digital materials.
• Request that the Chamber of Commerce and other relevant local organizations link to the new digital collections from their websites.
• Send a press release to local media
97. EVALUATING IMPACT
Understanding current users...
• Online survey instrument
• Web analytics
• Email subscriber lists
• Visitor forms
Understanding future users...
• Special interest groups (AASLH, SAA, etc.)
• Listservs
• Workshops and conference sessions
98. WRAPPING UP: FINAL THOUGHTS
Commencement, 1978
UW-Madison Archives
99. Next steps/To do list
• Create and maintain an inventory
• Develop your selection criteria
• Play with the tools
• Develop a storage management policy
  • E.g., number of copies, locations
• Monitor copies of content for errors/changes
• Evaluate technology to determine your preferred access platform
• Develop a marketing plan
• Determine how you will evaluate the success of your marketing plan
100. Thank you!
• Sarah Grimm, Wisconsin Historical Society
  sarah.grimm@wisconsinhistory.org
  608-261-1008
• Emily Pfotenhauer, WiLS
  emily@wils.org
  608-616-9756
• Slides and handouts available at http://recollectionwisconsin.org/localhistory2013
South Wood County Historical Museum
Editor's Notes
As part of that, they developed 6 modules regarding different aspects of managing e-records and have trained several groups of people to bring those modules to groups dedicated to working with e-records.
We are looking to digital preservation for an answer because we realize that being in digital form is not the same as being digitally preserved. Digital preservation is active management of digital content over the long term, with access as its ultimate goal. With books or documents, we can read an item, put it on the shelf, and continue to open and read it for decades with proper handling. However, once something is digitized, we can't expect to set it aside and then open it in 10 years, much less 50, without active management. We must find ways to ensure that the digital item remains accessible. In order to determine how we are going to preserve something, we must first have an understanding of what we have. We must IDENTIFY it.
Ask if anyone currently has an inventory and what software is being used
Refer to the handout here. Extent: How much is there? Nature and location: Is all the information onsite, or would you need to travel to multiple locations to capture everything? Resources: How many people can you get to help? Is it just you, a small staff, volunteers? Timeframe: Give yourself a time frame for this. Keep in mind this is never "done".
HANDOUT. What? Work at the collection level, not the item level. What is the familiar title for the collection? Description: Provide a brief description of what is in the collection (photos, postcards, diaries, maps, etc.). You are collecting information about items that are known and may be in your catalog + items that have come in your door and are waiting to be dealt with + items that are being created (digitization projects) + things you may not even know about yet...
Creator: so that you can go back to them with any issues
It's a good idea to note the format of your digital media, or what the digital content is stored on, since some media types last longer than others. Digital content on more fragile media (floppy disk) might be a higher priority.
You should make sure to specify the location of digital content in your inventory. Some things you will want to consider: How will you specify whether content is located online (meaning on your computer hard drive or a network server) or offline (meaning stored on some removable piece of media, like a CD or flash drive)? How will you record its location in the storage system? Keep in mind that you will need to update the inventory whenever the content moves. If you get too specific, you might spend all your time updating file locations. Ask audience: WHAT OTHER FIELDS ARE NOT INCLUDED THAT WOULD BE HELPFUL?
After you've compiled your inventory, it can be easy to get overwhelmed. You know you've got lots of digital content, but how much of it is really your organization's responsibility to preserve? Meanwhile, you've still got more logs (more new digital content) coming in down the river. One of the goals of selecting content to preserve is to help get your logs moving again: start setting priorities and pick a few things to tackle first so everything can start flowing more efficiently.
Not all of the content you're dealing with may in fact be appropriate or necessary for you to preserve, and you don't want to commit resources to preserving materials you don't have to. You may hear people argue that storage is cheap, so we should keep everything. Unfortunately, that perspective is rather short-sighted. Storage may be cheap, but preserving the quality of content over the long term is not. There are periodic migration costs: moving the digital materials into the systems where you will preserve them. There is monitoring files for corruption and change. [Have you lost bits? Are the files degrading?] Not to mention maintaining access to the files, which means updating your discovery and dissemination services every time hardware and software change. [An ongoing, recurring cost.] The idea behind long-term preservation is that you will be making this content available in the future. It isn't enough just to save the content if you can't access it any more. [Quality] Even if we could keep everything forever, would we want to? Is that manageable given the type of content that you hold? Not all digital content may be preservation quality: if you have high-resolution scans of your photos, do you also need to preserve the lower-quality versions of these scans? And not all will be significant enough to warrant preservation. [That string of emails about organizing the staff Christmas party...] Does the digital content we take in match our mission and scope of collections? Quite often materials find their way to us that have little or nothing to do with our mission, yet we give them a home and expend our resources on maintaining them. Maybe there is a better/more logical home for that content? [Maybe you could partner with another organization that is better placed to hold and preserve that content.]
The selection process for digital content is very analogous to the selection process for non-digital materials: you don't collect materials for your archive that don't match your mission, and you should keep those same principles in mind when selecting digital content.
When you're first getting started, it's helpful to treat selection as a managed, structured project in order to plan and coordinate the process [and plan for the future]. The selection criteria you choose will be specific to your situation, your organization and its mission. Once you have your selection criteria, it may not be possible to review/select everything at once, so how might you sequence the process? Again, the answer will be different for each organization. Most significant: Think about what's most significant to your organization. Look inside your organization first: are there mission-related documents that might give you clues? Existing manuals and policies, such as records retention schedules or collecting policies? Most unique: Is this the only source, or is it preserved elsewhere? Avoid duplication. Highest value: The value of material can be determined by a variety of factors (historical, evidential, can't be reproduced). This must be assessed within the context of your own institution. Most extensive (and therefore a more coherent body of material to manage). Most requested/used. Easiest to tackle (e.g. most familiar, most ready for ingest; a quick win for your digital preservation process, which is very helpful when you have to prove the value of your efforts to a reluctant administration). Oldest (possible historical importance). Newest (possible immediate interest). Mandated (via local policies, legislation, etc.). At risk: If it were no longer available, what digital files would be the hardest to replace? Some formats become obsolete a lot faster than other formats. PDFs are viable for a really long time; video files, however, get old very quickly.
Even if something fits your desired criteria, it still might not be reasonable for you to select it. You can use a decision tree or list of questions to help you decide what's practical to preserve. You've already considered the content in view of your selection criteria, and you should already have answered "yes" to both of these questions to continue considering the materials you hold: Does the content have long term value? Does it fit your scope and mission? Next you need to consider technical issues: Is it feasible for you to preserve the content? [Is it a "digital time bomb"?] Some formats are a challenge to preserve, such as video/time-based media. Some may be too damaged to preserve. Do you have the skills and resources (either to undertake the preservation yourself or to buy the skills in)? Some types of material may require far more expertise and resources than you have available. And access: even if we're not making it public, how useful is a server full of digital content that is safe, but that we can't access? We need to ask: Is it possible to make the content available over time? Are you the only holder of this content? [Duplication] If it is not feasible to preserve the content, and not possible to make it available and usable, then it probably shouldn't be included in your selection, especially if you know you are not the only holder of this digital content.
Identify and Select An Introduction to Digital Preservation
Co-locate: It's OK to move things around if it makes sense to do so. Layers: If you have several layers to hunt through, it can be really hard to find anything, and searching is really difficult; shallow is better. Breadcrumbs: It's OK to leave "sticky notes" (AKA "READ ME" files) in folders. They can give a brief description of contents, the retention schedule, and any naming conventions. Don't know: unknown file formats, files on old media (floppies), password-protected files.
File backups: e.g., speeches had multiple drafts: the final plus copies in several different font sizes. Supplementary files: e.g., a folder of images that were used in a PowerPoint. Corrupted: files you can't open. Formats: you may receive both Word and PDF versions and may not want to keep both. As you are creating your inventory, you are likely to discover a lot of really simple places where you can clean up the files you are reviewing.
The outcome of going through the work of selection is to gain a sense of control over what you have to deal with, what your scope is, and what your policies and priorities are for selection. This is critical to developing a sustainable program for support of long-term preservation and access. By applying your selection criteria to your inventory, you will have more detailed information to work with in your planning. This documentation can also inform your work with creators of digital content. This might include the creation of submission agreements or other policies so that the content coming in to your organization fits your selection criteria for long-term support. The selection process puts you on the path to a sustainable program. Selecting content is ultimately not a one-time project but a long-term, ongoing process, so formalizing it through policies, schedules and other documented criteria will help you avoid more log jams in the future.
We've learned that it is essential to remove duplicates first. Once you start using other tools and changing things, the duplicate finder applications are no longer as accurate. These are all FREE. We have used all three of these, and they all work a little bit differently under the covers, so the results vary a bit. Auslogics has a number of products that are for sale, but this one falls under their "freebie" category. We found it really helpful with documents and wanted to try it with the images. Similar Images: It creates a database, so consecutive runs go faster, but the first run, while it is creating the database, can be really slow. This application works with lots of file formats. VisiPics: This application only works with a handful of file formats, but it hits the main ones and does it really well. It will detect two different resolution files of the same picture as a duplicate (we had a number of photos that were corrected with Photoshop, and this picks those up), or the same picture saved in different formats, or duplicates where only minor cosmetic changes have taken place.
This application had some real strengths. + You can run it against USB removable devices (but not against a network drive). + When it searches, it finds duplicates by comparing the MD5 checksums of files, so we were highly confident that these were the exact same image. + You can "bulk delete" if you are confident, which is a speed/efficiency plus. Negative parts of this tool: Unlike the other two programs, there is no "preview mode", so you can't see the images, although you can pull them up if you click on them. You can also right-click and check the properties of the images and get all kinds of metadata (camera make and model, f-stop, exposure time, ISO speed, date, time, dimensions, dpi, bit depth). This doesn't catch "altered" images (e.g., changed lighting) or "similar" images.
When run again choosing "Ignore File Names" and "Ignore File Dates", more duplicates are likely to come up. + It found duplicates with a name change. + It found some photo glitches that the other programs didn't. It also found a lot of problems (the same picture with different dates), so it may provide you with some information to resolve metadata issues.
Similar Images will show errors (.avi file, image file); often you will find these are corrupted images. The default threshold is 35 (on a scale of 0 to 100) and should catch "pairs" that are pretty similar. With the threshold set at 15, this found 483 duplicates, 18 of which are "similar images". When run at a threshold of 30, there were 538 results. Threshold ranges: Fast/Duplicates: 0-10. Photos: about 12. Collages: 12-35 (images close to each other). Collections of scans: 50-60. Mixed stuff: 20-40. Semi-automated deletion: based on rules you set, this feature pre-selects the file to delete, but you have to confirm. Rules include lower/higher resolution, bigger/smaller file size, older/newer file, and left/right file; the rules are checked from top to bottom. There are LOTS of file types to choose from to include in the comparison. The database stores file name, file size, last change, dimensions, and color/model, and will remember if you indicate a pair is not a duplicate.
Similar Images has a nice working screen: photos are side by side, and a lot of the key metadata is there to compare. File name and path are there. File size: the bigger one is in green. Things that are equal show in red. CRC32: a fast hashing algorithm that generates a 32-bit checksum on the file; if they match, they are red. Type of file: a match is red. Last modified: the most recent will be green if they are different. + Nice large screen in which to view both photos together. + Easy GUI. + Color coding. + They have a pretty decent basic "Help" document to explain the icons/colorings/setup. Negatives: When the view window is up, you can't do anything on the main window, which sits behind it. You can only view one set at a time and use Next/Previous to move around.
Everything is TOTALLY identical except for the name; they are even in the same folder. Keep the labeled photo.
Adjusted photo: the right photo now gives us a name and cleans it up.
VisiPics: Like Similar Images, you choose the folders you want to compare and set the threshold. The pictures appear along the left, and you can skip around between the "pages". You can mark the deletions by clicking on the photo you want to delete. It runs faster than Similar Images. Auto Select allows you to preselect images for deletion based on rules (but many fewer rules are available than in Similar Images). You can also move or rename photos from this screen if you need to.
This screen is very representative of what many of our folders look like when we open them. IrfanView's strengths: + Can change the size of these photos so that you can see more/less. + Can delete from this page without having to go back into the folder. + Can double-click on photos to see full-screen versions and move through the images that way.
The resulting MD5 file can be opened in any text editor...
WHAT are you going to store it on? WHERE are you going to store it? HOW MANY COPIES are you going to make?
WHERE are you going to store it? What are your options? Decisions can be determined by a number of things... Size: The options you consider will vary depending on how much you have to store. Media: CDs last 5 years on average; gold CDs last longer; a CD you've burned yourself may last as little as 2 years, depending, among other things, on the quality of the CD to begin with. Magnetic tape could last 30 years, but it's very sensitive to heat, magnetic fields and dust. Is the company producing the hardware you are using to run the storage media still around? Cloud: What's it going to cost to rent space? Sometimes it costs more when you pull content out than when you put it in. You also need to determine where you don't want to store it (USB drives, old media) and migrate it off those devices accordingly.
Three copies is a happy medium if you are able
RAID = Redundant Array of Independent Disks = multiple hard drives in one package
These are the elements you want to aim for when planning for providing access to your content over time. Easily: this will change over time and will depend a lot on your staff, users, and the nature of your institution. Coherently: document what you're doing, provide context. Completely, Correctly, Reliably: these all relate to how you manage the technology. Consistently, Fairly: write good policies and procedures and stick to them.
Some ideas: Think about providing an online survey instrument at various strategic points within your access environment. Implement some free web analytics: find out where people are linking into your site, what they are requesting, and how much time they spend on what materials. If you have email subscribers, solicit some feedback from your regular users about their visits to your digital collections and how they might be finding them useful, as well as ways of improving upon your existing levels of service. If you are only providing access on-site, make sure you add some lines on your visitor forms that account for their use of your digital collections. So, you may get very good at serving your digital collections up to your current stakeholders through these different monitoring measures. But what about new users, users who could really benefit from exposure to your materials but may require it in a different form or through different means, like tablets or cell phones? Maybe near-future users will want to run all sorts of sophisticated services over the top of your materials along with materials from all sorts of other institutions. How might you be able to track trends or use cases from other similar institutions to find out where access needs are heading?