Rachel Lovinger @rlovinger
Confab, 22 May, 2015
Image via Bond
2
©2015 All rights reserved.
• Experience Director, Content Strategy;
Razorfish New York
• Co-editor of scatter/gather, a content
strategy blog:
http://scattergather.razorfish.com
• Author of Nimble: A Razorfish Report
on Publishing in the Digital Age (June
2010): http://nimble.razorfish.com
• Twitter: @rlovinger
4
5
6
7
8
9
10
11
is
HARDCORE
12
2006
2009
2008
2012
2011
2010
13
Metadata = Context
Context enables Connections
How does one convey that in a concise and powerful way?
14
Photo by Jesse Chan-Norris
Metadata Is A
Love note
To the Future
16
Tweet and photo by Erin Kissane, Tumblr by Austin Kleon
429 notes
82 retweets
17
Photo and shirt by Sarah
18
Photo by Rachel Lovinger
19
Content Strategy for Mobile by Karen McGrane
21
• Nearly 60,000
files archived
• Mostly from
1980-1995
• Collected and
curated since
1998
• Almost no
metadata
Textfiles.com
22
Who needs a database?
23
Metadata Skeptic transformed into… Metadata Warrior
Photos by Jason Scott and Rachel Lovinger
24
Photo by Rachel Lovinger
25
• Me?
Photo by Rachel Lovinger
ENTERTAINMENT WEEKLY
Metadata for Journalism Products
27
~3 years of online content
~10 years of magazine content
28
Imported from text files to CMS
29
Semi-structured information
allowed us to map the files to
content types and site sections,
and add some metadata (author,
published date, keywords, etc.)
10 years
x 50 issues per year
x 100 files per issue (approx.)
50,000 estimated articles
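A minimal sketch of this kind of import, assuming a hypothetical file layout in which each text file starts with "Key: value" header lines followed by a blank line and the article body. The field names, the sample author, and the sample text are made up for illustration; the slides do not describe the real file format.

```python
def parse_article(raw: str) -> dict:
    """Split a 'Key: value' header block from the body (hypothetical layout)."""
    header, _, body = raw.partition("\n\n")
    record = {"body": body.strip()}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip().lower()] = value.strip()
    # Keywords arrive as one comma-separated string; store them as a list.
    raw_keywords = record.get("keywords", "")
    record["keywords"] = [k.strip() for k in raw_keywords.split(",") if k.strip()]
    return record


# Made-up sample file, for illustration only.
sample = (
    "Author: Jane Doe\n"
    "Date: 1999-05-21\n"
    "Section: Movies\n"
    "Keywords: Star Wars, George Lucas\n"
    "\n"
    "Review text goes here."
)
article = parse_article(sample)

# The size estimate from the slide: years x issues per year x files per issue.
estimated_articles = 10 * 50 * 100
```

The estimate is simple multiplication: 10 x 50 x 100 gives the 50,000 articles quoted on the slide.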
30
Once in the CMS, we could add
photos, links, formatting, etc.
31
For the content already in the
CMS, keywords had been
manually typed in by authors
• 6790 “different” keywords
• Removed 12% during clean up
• Typos
• Redundant
• Not Useful
33
• Star Wars: Episode I -- The Phantom Menace
• Episode 1
• Episode I
• Phantom Menace
• Star Wars Episode I The Phantom Menace
• Star Wars Episode I: The Phantom Menace
• Star Wars prequel
• Star Wars: Episode 1 -- The Phantom Menace
• Star Wars: Episode i -- the Phantom Menace
• Star Wars: Episode I: The Phantom Menace
• Star Wars: Episode I--The Phantom Menace
• Star Wars: Episode I--The Phantom Menance
• Star Wars: Episode One -- The Phantom Menace
• Star Wars: The Phantom Menace
• Star Wars: The Phantom Menace -- Episode I
• The Phantom Menace
• The Phanton Menace
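The variants above suggest how far simple normalization can go. The sketch below is an illustration, not the clean-up process EW actually used: it lowercases each keyword, drops punctuation, collapses whitespace, and groups keywords that end up identical. The `normalize` function is a hypothetical helper.

```python
import re
from collections import defaultdict


def normalize(keyword: str) -> str:
    """Lowercase, replace runs of punctuation with a space, collapse whitespace."""
    text = keyword.casefold()
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return " ".join(text.split())


# A subset of the hand-typed variants listed on the slide.
variants = [
    "Star Wars: Episode I -- The Phantom Menace",
    "Star Wars Episode I The Phantom Menace",
    "Star Wars Episode I: The Phantom Menace",
    "Star Wars: Episode i -- the Phantom Menace",
    "Star Wars: Episode I--The Phantom Menace",
    "Star Wars: Episode I--The Phantom Menance",
    "The Phanton Menace",
]

# Group the original spellings by their normalized form.
groups = defaultdict(list)
for variant in variants:
    groups[normalize(variant)].append(variant)
```

Normalization merges the five variants that differ only in case and punctuation, but the two typos still land in their own groups, and aliases such as "Episode 1" or "Star Wars prequel" would too. Those need a person, or a controlled vocabulary, to map them to one term.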
34
• TAFKAP?
35
• TAFKAP?
• The Artist
• Artist Formerly Known as Prince
• The Artist Formerly Known As Prince
• The Artist formerly known as Prince
• the Artist Formerly Known as Prince
• The Artist Formerly Known as Prince (PKA)
37
• The magazine was published once a week
• The website published new
articles several times a day
• Plus: Over 50,000 past articles!
• How could we better use all
that content?
38
If you like James Bond, we wanted it to be easy for you to
discover everything we had.
Cover Story
Interview
Photo Gallery
Etc.
39
Entertainment Weekly
Journalism
IMDb-like
Information
40
41
We put our controlled vocabulary into categories, to make them more
distinct and meaningful.
For example:
• Book > Product > Harry Potter and the Goblet of Fire
• Movie > Product > Harry Potter and the Goblet of Fire
• Person > Individual > Daniel Radcliffe
• Person > Individual > J.K. Rowling
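One way to picture the categorized vocabulary is as terms identified by their full path rather than by name alone. This is a sketch under that assumption; the `Term` class and `lookup` function are hypothetical, not EW's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    kind: str     # top-level category, e.g. "Book" or "Movie"
    subtype: str  # second level, e.g. "Product" or "Individual"
    name: str

    def path(self) -> str:
        return f"{self.kind} > {self.subtype} > {self.name}"


# The four example terms from the slide.
vocabulary = {
    Term("Book", "Product", "Harry Potter and the Goblet of Fire"),
    Term("Movie", "Product", "Harry Potter and the Goblet of Fire"),
    Term("Person", "Individual", "Daniel Radcliffe"),
    Term("Person", "Individual", "J.K. Rowling"),
}


def lookup(name: str) -> list:
    """Return every categorized term that shares the given name."""
    return sorted(term.path() for term in vocabulary if term.name == name)


matches = lookup("Harry Potter and the Goblet of Fire")
```

Because the category is part of the term's identity, the book and the movie stay distinct even though their names are identical.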
42
Capsule
Movie
Review
Preview
Movie Review
DVD Review
43
• Relationships
defined for each
media type
• Managed
separately from
the article content
• The full set of
metadata was
available to all
articles
44
• Standard relationships
• For example, for Movie:
- Lead Performers
- Director
- Writer
- Release Date
- EW Grade
- Etc.
• Select a related category for
each relationship, as applicable
• Some allow multiple values
45
• Authors just
selected the
primary category
• Related metadata
pulled in
automatically
• Updates appeared
on all articles
*Metadata categories and
relationships were managed
by a dedicated data librarian
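The idea of managing relationships separately from articles can be sketched as a lookup by reference. Everything below is hypothetical: the category key, the placeholder names, and the grades are invented for illustration, and the slides do not show how the CMS stored this.

```python
# Relationship fields defined for the Movie media type (from the slide).
MOVIE_RELATIONSHIPS = (
    "lead_performers", "director", "writer", "release_date", "ew_grade",
)

# Category records live in one place, maintained by the data librarian.
categories = {
    "movie:example-film": {
        "name": "Example Film",
        "lead_performers": ["Performer A", "Performer B"],
        "director": "Director A",
        "writer": "Writer A",
        "release_date": "2000-01-01",
        "ew_grade": "B+",
    }
}

# Authors only select the primary category; nothing else is copied in.
articles = [
    {"title": "Example Film: Review", "primary_category": "movie:example-film"},
    {"title": "Example Film: Photo Gallery", "primary_category": "movie:example-film"},
]


def related_metadata(article: dict) -> dict:
    """Resolve the category at display time instead of storing a copy."""
    category = categories[article["primary_category"]]
    return {field: category.get(field) for field in MOVIE_RELATIONSHIPS}


# One correction to the category record...
categories["movie:example-film"]["ew_grade"] = "A-"
# ...shows up on every article that references it.
grades = [related_metadata(a)["ew_grade"] for a in articles]
```

Storing a reference rather than a copy is what lets a single update appear on all articles.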
46
47
• “Best Results” linked directly to
an aggregated page based on
the category.
• For example:
- “Cats & Dogs” vs. “The Truth
About Cats & Dogs”
- The Green Mile (Movie) vs. The
Green Mile (Book)
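A rough sketch of how "Best Results" could work, assuming it matched the whole query against category names rather than searching article text. The URL paths and the `best_results` function are hypothetical.

```python
# Category name and type mapped to its aggregated page (paths are made up).
category_pages = {
    ("Cats & Dogs", "Movie"): "/movie/cats-and-dogs",
    ("The Truth About Cats & Dogs", "Movie"): "/movie/the-truth-about-cats-and-dogs",
    ("The Green Mile", "Movie"): "/movie/the-green-mile",
    ("The Green Mile", "Book"): "/book/the-green-mile",
}


def best_results(query: str) -> list:
    """Return aggregated pages whose category name equals the query exactly."""
    wanted = query.strip().casefold()
    return sorted(
        url
        for (name, _kind), url in category_pages.items()
        if name.casefold() == wanted
    )


cats = best_results("cats & dogs")
green_mile = best_results("The Green Mile")
```

Matching the whole name keeps "Cats & Dogs" from pulling in "The Truth About Cats & Dogs", while a name shared by two categories returns one page per category.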
49
• Wal-Mart sold gallon jars of
Vlasic pickles for $2.97.
• A popular item – priced so low
it nearly put Vlasic out of
business.
• By achieving their goals, they
put themselves in a position
they might not survive.
See: http://www.fastcompany.com/47593/wal-mart-you-dont-know
50
• We wanted people to
discover older content, and
they did!
• By 2006, we had 16 years of
magazine and web content.
• Other Time Inc. publications
were interested in using our
categorization system, too.
51
Not well-suited for our expensive
and frequent database calls.
52
Our webservers were optimized to
serve up the latest “issue” of content.
We generated 40% of Time Inc.’s database calls,
but only 25% of the total traffic
53
A 2007 redesign removed the “third column” entirely.
54
The creator of Freebase (a semi-semantic UGC site for structured
content, now read-only) said EW.com was way ahead of its time.
The making of a
METADATA WARRIOR
57
Who needs a database?
58
“The hardest part of
[recording] history is to be
there when it happens.”
Photo by Rachel Lovinger
59
60
• An informal post on August 4th
• Notification sent out September 30th
• Shut down October 31st
61
“What happened to my web page on my husband, Bob Champine,
that took me many years to put together on his career and which
meant a lot to me and to the aviation community. I noticed with 9.0
I lost the left margin and the picture of him exiting the X-1. I need to
restore it to the internet as it is history. Please tell me what to do. I
will be glad to retype it, I just don’t want it lost to the world. I need
help. Gloria Champine”
62
Illustration from “Fire in the Library,” MIT Technology Review
63
“Archive Team is a loose collective of rogue archivists,
programmers, writers and loudmouths dedicated to
saving our digital heritage. Since 2009 this variant
force of nature has caught wind of shutdowns,
shutoffs, mergers, and plain old deletions - and done
our best to save the history before it's lost forever.”
64
65
66
67
68
69
70
71
72
• In 6 months Archive Team saved 900 GB
• Estimated 4-5 TB total
• Other people saved additional pages,
but probably ¼ is gone forever
• For many people, Geocities was their
first web presence
73
74
75
76
Those screenshots were automatically generated from
Geocities sites rescued by Archive Team in 2009
See more at One Terabyte of Kilobyte Age Photo Op:
http://oneterabyteofkilobyteage.tumblr.com/
77
Due to lack of metadata:
• The rescued data was less useful
• Really bulky files
• Case-sensitive filenames difficult to access and read
• Not in a web-ready format (WARC)
• The process was less efficient and more error prone
• Poor tracking of completed activity
• Lots of duplication of data
• Took way too long (6 months vs. 3 days)
• Could have gotten all the data in a month (estimated)
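Two of the problems listed above, poor tracking of completed activity and duplicated data, are the kind of thing a small amount of metadata addresses. The sketch below is an illustration only, not a description of Archive Team's tooling: `RescueManifest` is a hypothetical class that records which URLs are done and uses a content hash to spot duplicate payloads.

```python
import hashlib


class RescueManifest:
    """Track fetched URLs and detect duplicate content by hash."""

    def __init__(self):
        self.done = {}     # url -> sha256 digest of what was fetched
        self.by_hash = {}  # sha256 digest -> first url seen with that content

    def needs_fetch(self, url: str) -> bool:
        return url not in self.done

    def record(self, url: str, payload: bytes) -> bool:
        """Mark a URL as done; return True only if its content is new."""
        digest = hashlib.sha256(payload).hexdigest()
        self.done[url] = digest
        if digest in self.by_hash:
            return False  # same bytes already stored under another URL
        self.by_hash[digest] = url
        return True


manifest = RescueManifest()
first = manifest.record("http://example.com/page", b"<html>hello</html>")
# The same bytes reached through a differently cased path count as a duplicate.
second = manifest.record("http://example.com/PAGE", b"<html>hello</html>")
```

With a manifest like this, a volunteer can check `needs_fetch` before starting work, and duplicate payloads only need to be stored once.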
78
79
Mission:
The Internet Archive’s purposes include offering permanent access
for researchers, historians, scholars, people with disabilities, and the
general public to historical collections that exist in digital format.
Photo by Ulf Benjaminsson
80
81
82
83
Save the history before it's lost
forever
Offer permanent access to
historical collections that exist in
digital format
84
Internet Archive contains: web pages, texts, videos, audio files,
software, and images. (Plus concerts and collections)
• Media Type makes it Readable or Playable
• Emulator (for software) makes it Executable
• Subject Keywords make it Findable
86
• Is it Accurate?
• Is it Credible?
• What is the Source? (machines or people)
• It’s a lot of Effort. Do we have enough people and time?
88
Additional processing takes place, depending on the type
89
• Description and keywords are required, but are open fields
• Other metadata is optional
90
91
• Metadata attributes
determined by the
community
92
• For user-generated content, it’s just easier for people not to add metadata.
• Internet Archive will never have enough people on staff to do it
properly.
93
Crowdsource manual creation of metadata
Photo by Pascal
94
• Too small a pool of volunteers, and
their drive didn’t last long
• Tools didn’t provide immediate
feedback/satisfaction. They had
to email their inputs and wait.
Photo by psyberartist
95
• 10 most common words + 10
most common 2-word phrases
• Applied to 200,000 items
• Much more scalable
• Heavily machine assisted: a
person can validate data and
create collections
Photo by James St. John
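The "most common words plus most common two-word phrases" approach can be sketched with a frequency count. This is a minimal illustration, not the Internet Archive's actual script: the stopword list and the `suggest_topics` function are assumptions, since the slides do not say how filler words were handled.

```python
import re
from collections import Counter

# Assumption: a small stopword list keeps filler words out of the topics.
STOPWORDS = frozenset(
    {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "your"}
)


def suggest_topics(text: str, n: int = 10) -> list:
    """Return the n most common words followed by the n most common word pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    content_words = [w for w in words if w not in STOPWORDS]
    # Pairs come from words adjacent in the original text, so removing a
    # stopword never glues two unrelated words into a phrase.
    pairs = [
        f"{first} {second}"
        for first, second in zip(words, words[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    ]
    top_words = [word for word, _ in Counter(content_words).most_common(n)]
    top_pairs = [pair for pair, _ in Counter(pairs).most_common(n)]
    return top_words + top_pairs


topics = suggest_topics("switch box switch box power switch", n=2)
```

A person can then validate the suggested topics instead of typing them from scratch, which is what makes the approach scale.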
96
97
“Controversial, but roughly as
good as a bored intern.”
98
Topics:
switch, atari,
antenna, game,
cable, terminals,
console, television,
video, program,
power supply,
console unit, video
computer, game
program, computer
system, atari game,
power switch,
switch box, atari
video, screw
terminals
99
Having the stuff is vital, the
most important thing. But
it’s also vital to have a
system by which these
things are described.
“If a person can’t get the
information they need, then
we’re failing.”
Photo by Rachel Lovinger
101
• Jason had been converted into a
metadata advocate
But I realized that…
• Content strategists who care
about the long game should
think like historians,
archivists and futurists, too.
NATURALIS BIODIVERSITY
CENTER
Metadata from the past
103
• Dutch leader in academic research and education on
biodiversity and taxonomy.
• Has a collection of 37 million natural history objects.
104
Describe, understand and explore biodiversity for human
wellbeing and the future of our planet.
They do this with:
• Accessible collections
• Contributions to global
scientific research
• Awe of natural history
• Openly shared knowledge
105
• From 2010 to June 2015
• 250 staff members & 450 volunteers
• Digitizing 7 million objects in detail
• Adding metadata for the other 30 million objects
106
• Information is
more easily
discovered,
studied, and used.
• Scientists
worldwide can
access it directly
online, without
assistance.
• Some of this data
has never been
available in digital
form before.
107
• Scientific name
• Where it was found
• When it was found
• Who found it
“Objects [in the collection] have no scientific value
without this information.” - Suzanne de Jong-Kole
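The four fields above can be read as a completeness check on each specimen record. The sketch below assumes a hypothetical `SpecimenLabel` class with invented field names and placeholder values; it is not the schema of Naturalis's collection registration system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecimenLabel:
    scientific_name: Optional[str] = None
    locality: Optional[str] = None      # where it was found
    collected_on: Optional[str] = None  # when it was found, kept as written
    collector: Optional[str] = None     # who found it

    def missing_fields(self) -> list:
        """List the fields that are empty, in declaration order."""
        return [name for name, value in vars(self).items() if not value]

    def is_complete(self) -> bool:
        return not self.missing_fields()


# Placeholder values, for illustration only.
complete = SpecimenLabel("Genus species", "Example locality", "1 May 1900", "A. Collector")
partial = SpecimenLabel("Genus species", "Example locality")
gaps = partial.missing_fields()
```

Keeping the date as a verbatim string mirrors the practice of entering label data exactly as written.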
108
109
Employees enter data, verbatim, into the collection registration system.
110
This allows them to retrieve the physical specimen if requested.
111
• Vele Handen = Many Hands
• People helped transcribe
handwritten labels
• In 9 months, people transcribed
200,000 labels, of which about half
were usable.
112
The person who collected the specimen wrote the metadata on the label.
This could be a professional researcher, or a non-professional enthusiast.
113
Darwin’s Finches
114
The oldest is this Spanish
pepper from 1550!
115
When they wrote this metadata, they had no idea that nearly
half a millennium later people would be “digitizing” it.
116
The ‘love note’ is when
you behave selflessly for
a partner – or customer –
that doesn’t exist yet.
A drawing Jason drew in my notebook in high
school, 20+ years before we ever dated.
Rachel Lovinger @rlovinger
Image via Bond

Más contenido relacionado

La actualidad más candente

Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Databricks
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019Timothy Spann
 
Data Engineer's Lunch #54: dbt and Spark
Data Engineer's Lunch #54: dbt and SparkData Engineer's Lunch #54: dbt and Spark
Data Engineer's Lunch #54: dbt and SparkAnant Corporation
 
Getting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksGetting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksKnoldus Inc.
 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of MetadataDATAVERSITY
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxCalvinSim10
 
Dataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesDataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesValeria Pesce
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016Kent Graziano
 
Big Data Fabric Capability Maturity Model
Big Data Fabric Capability Maturity ModelBig Data Fabric Capability Maturity Model
Big Data Fabric Capability Maturity ModelRoss Collins
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...StreamNative
 
Migration to Databricks - On-prem HDFS.pptx
Migration to Databricks - On-prem HDFS.pptxMigration to Databricks - On-prem HDFS.pptx
Migration to Databricks - On-prem HDFS.pptxKshitija(KJ) Gupte
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - DatalakeLam Le
 
الفهرسة الاجتماعية والفهارس الهوائية
الفهرسة الاجتماعية والفهارس الهوائيةالفهرسة الاجتماعية والفهارس الهوائية
الفهرسة الاجتماعية والفهارس الهوائيةProf. Sherif Shaheen
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 

La actualidad más candente (20)

Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
 
Data Engineer's Lunch #54: dbt and Spark
Data Engineer's Lunch #54: dbt and SparkData Engineer's Lunch #54: dbt and Spark
Data Engineer's Lunch #54: dbt and Spark
 
Getting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksGetting Started with Delta Lake on Databricks
Getting Started with Delta Lake on Databricks
 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of Metadata
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
 
Introducation to metadata
Introducation to metadataIntroducation to metadata
Introducation to metadata
 
Dataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesDataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabularies
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016
 
Big Data Fabric Capability Maturity Model
Big Data Fabric Capability Maturity ModelBig Data Fabric Capability Maturity Model
Big Data Fabric Capability Maturity Model
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
 
Dublin Core Intro
Dublin Core IntroDublin Core Intro
Dublin Core Intro
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Migration to Databricks - On-prem HDFS.pptx
Migration to Databricks - On-prem HDFS.pptxMigration to Databricks - On-prem HDFS.pptx
Migration to Databricks - On-prem HDFS.pptx
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
الفهرسة الاجتماعية والفهارس الهوائية
الفهرسة الاجتماعية والفهارس الهوائيةالفهرسة الاجتماعية والفهارس الهوائية
الفهرسة الاجتماعية والفهارس الهوائية
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 

Destacado

10 Things I Learned in 10 Years as a Content Strategist
10 Things I Learned in 10 Years as a Content Strategist10 Things I Learned in 10 Years as a Content Strategist
10 Things I Learned in 10 Years as a Content StrategistRachel Lovinger
 
What is Metadata?
What is Metadata?What is Metadata?
What is Metadata?Adgistics
 
Metadata For Catalogers (introductions)
Metadata For Catalogers (introductions)Metadata For Catalogers (introductions)
Metadata For Catalogers (introductions)robin fay
 
Content Modelling Workshop Preview
Content Modelling Workshop PreviewContent Modelling Workshop Preview
Content Modelling Workshop PreviewRachel Lovinger
 
Content Auditing: Unearthing the Substance of Your Brand
Content Auditing: Unearthing the Substance of Your BrandContent Auditing: Unearthing the Substance of Your Brand
Content Auditing: Unearthing the Substance of Your BrandRachel Lovinger
 
Taxonomies and Metadata in Information Architecture
Taxonomies and Metadata in Information ArchitectureTaxonomies and Metadata in Information Architecture
Taxonomies and Metadata in Information ArchitectureAccess Innovations, Inc.
 
Metadata in data warehouse
Metadata in data warehouseMetadata in data warehouse
Metadata in data warehouseSiddique Ibrahim
 
Metadata and Terminology Registries
Metadata and Terminology RegistriesMetadata and Terminology Registries
Metadata and Terminology RegistriesMarcia Zeng
 
SKOS - 2007 Open Forum on Metadata Registries - NYC
SKOS - 2007 Open Forum on Metadata Registries - NYCSKOS - 2007 Open Forum on Metadata Registries - NYC
SKOS - 2007 Open Forum on Metadata Registries - NYCjonphipps
 
Empowering Your Audience Ambassadors with Semantic Publishing
Empowering Your Audience Ambassadors with Semantic Publishing Empowering Your Audience Ambassadors with Semantic Publishing
Empowering Your Audience Ambassadors with Semantic Publishing Rachel Lovinger
 
Content in the Age of Promiscuous Reuse
Content in the Age of Promiscuous ReuseContent in the Age of Promiscuous Reuse
Content in the Age of Promiscuous ReuseRachel Lovinger
 
Making of The DEFCON Documentary
Making of The DEFCON DocumentaryMaking of The DEFCON Documentary
Making of The DEFCON DocumentaryRachel Lovinger
 
Journey Towards Datameaningfulness
Journey Towards DatameaningfulnessJourney Towards Datameaningfulness
Journey Towards DatameaningfulnessRachel Lovinger
 
BBC Digital Design Framework
BBC Digital Design FrameworkBBC Digital Design Framework
BBC Digital Design Frameworkbbcinternetblog
 

Destacado (20)

Does metadata matter?
Does metadata matter?Does metadata matter?
Does metadata matter?
 
10 Things I Learned in 10 Years as a Content Strategist
10 Things I Learned in 10 Years as a Content Strategist10 Things I Learned in 10 Years as a Content Strategist
10 Things I Learned in 10 Years as a Content Strategist
 
What is Metadata?
What is Metadata?What is Metadata?
What is Metadata?
 
Metadata For Catalogers (introductions)
Metadata For Catalogers (introductions)Metadata For Catalogers (introductions)
Metadata For Catalogers (introductions)
 
Content Modelling Workshop Preview
Content Modelling Workshop PreviewContent Modelling Workshop Preview
Content Modelling Workshop Preview
 
Content Auditing: Unearthing the Substance of Your Brand
Content Auditing: Unearthing the Substance of Your BrandContent Auditing: Unearthing the Substance of Your Brand
Content Auditing: Unearthing the Substance of Your Brand
 
Metadata in Business Intelligence
Metadata in Business IntelligenceMetadata in Business Intelligence
Metadata in Business Intelligence
 
Taxonomy And Metadata
Taxonomy And MetadataTaxonomy And Metadata
Taxonomy And Metadata
 
Taxonomies and Metadata in Information Architecture
Taxonomies and Metadata in Information ArchitectureTaxonomies and Metadata in Information Architecture
Taxonomies and Metadata in Information Architecture
 
Metadata in data warehouse
Metadata in data warehouseMetadata in data warehouse
Metadata in data warehouse
 
About Scanning and Metadata Standards - NEMO 2010
About Scanning and Metadata Standards - NEMO 2010About Scanning and Metadata Standards - NEMO 2010
About Scanning and Metadata Standards - NEMO 2010
 
"Love notes to the future": research data, metadata and long term re-use
"Love notes to the future": research data, metadata and long term re-use"Love notes to the future": research data, metadata and long term re-use
"Love notes to the future": research data, metadata and long term re-use
 
Metadata and Terminology Registries
Metadata and Terminology RegistriesMetadata and Terminology Registries
Metadata and Terminology Registries
 
SKOS - 2007 Open Forum on Metadata Registries - NYC
SKOS - 2007 Open Forum on Metadata Registries - NYCSKOS - 2007 Open Forum on Metadata Registries - NYC
SKOS - 2007 Open Forum on Metadata Registries - NYC
 
Empowering Your Audience Ambassadors with Semantic Publishing
Empowering Your Audience Ambassadors with Semantic Publishing Empowering Your Audience Ambassadors with Semantic Publishing
Empowering Your Audience Ambassadors with Semantic Publishing
 
Content in the Age of Promiscuous Reuse
Content in the Age of Promiscuous ReuseContent in the Age of Promiscuous Reuse
Content in the Age of Promiscuous Reuse
 
Making of The DEFCON Documentary
Making of The DEFCON DocumentaryMaking of The DEFCON Documentary
Making of The DEFCON Documentary
 
Journey Towards Datameaningfulness
Journey Towards DatameaningfulnessJourney Towards Datameaningfulness
Journey Towards Datameaningfulness
 
BBC Design Research Framework
BBC Design Research FrameworkBBC Design Research Framework
BBC Design Research Framework
 
BBC Digital Design Framework
BBC Digital Design FrameworkBBC Digital Design Framework
BBC Digital Design Framework
 

Similar a Metadata is a Love Note to the Future

Twitter Realtime Social Data @StartupFest
Twitter Realtime Social Data @StartupFestTwitter Realtime Social Data @StartupFest
Twitter Realtime Social Data @StartupFestSylvain Carle
 
H2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth RedmoreH2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth RedmoreSri Ambati
 
UXSG2014 Lightning Talks - Selfish accessibility (Adrian Roselli)
UXSG2014 Lightning Talks - Selfish accessibility (Adrian Roselli)UXSG2014 Lightning Talks - Selfish accessibility (Adrian Roselli)
UXSG2014 Lightning Talks - Selfish accessibility (Adrian Roselli)ux singapore
 
Development of the CyberCemetery (2011)
Development of the CyberCemetery (2011)Development of the CyberCemetery (2011)
Development of the CyberCemetery (2011)Dr. Starr Hoffman
 
The Challenge of Digital Sources in the Web Age: Common Tensions Across Three...
The Challenge of Digital Sources in the Web Age: Common Tensions Across Three...The Challenge of Digital Sources in the Web Age: Common Tensions Across Three...
The Challenge of Digital Sources in the Web Age: Common Tensions Across Three...Digital History
 
How To Create Content
How To Create ContentHow To Create Content
How To Create ContentAmy Vernon
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 KeynotePeter Wang
 
How to Regularly – and Without a Lot of Extra Effort – Find, Capture and Shar...
How to Regularly – and Without a Lot of Extra Effort – Find, Capture and Shar...How to Regularly – and Without a Lot of Extra Effort – Find, Capture and Shar...
How to Regularly – and Without a Lot of Extra Effort – Find, Capture and Shar...NetSquared Vancouver
 
IWMW 2004: Socrates Building an intranet for the UK Research Councils
IWMW 2004: Socrates Building an intranet for the UK Research CouncilsIWMW 2004: Socrates Building an intranet for the UK Research Councils
IWMW 2004: Socrates Building an intranet for the UK Research CouncilsIWMW
 
Webinar - The Changing Landscape of Library Privacy - 2016-06-15
Webinar - The Changing Landscape of Library Privacy - 2016-06-15Webinar - The Changing Landscape of Library Privacy - 2016-06-15
Webinar - The Changing Landscape of Library Privacy - 2016-06-15TechSoup
 
The Digital 4 Ps of Marketing Campaigns Dave Drodge
The Digital 4 Ps of Marketing Campaigns Dave DrodgeThe Digital 4 Ps of Marketing Campaigns Dave Drodge
The Digital 4 Ps of Marketing Campaigns Dave DrodgeDavid Drodge
 
PTTP09 London Film Fest Workshop
PTTP09 London Film Fest WorkshopPTTP09 London Film Fest Workshop
PTTP09 London Film Fest WorkshopBrian Newman
 
Building Thought Leadership through Content Curation
Building Thought Leadership through Content CurationBuilding Thought Leadership through Content Curation
Building Thought Leadership through Content CurationCorinne Weisgerber
 
Online Trends August 2009
Online Trends August 2009Online Trends August 2009
Online Trends August 2009Bjorn Elmberg
 

Similar a Metadata is a Love Note to the Future (20)

Twitter Realtime Social Data @StartupFest
Twitter Realtime Social Data @StartupFestTwitter Realtime Social Data @StartupFest
Twitter Realtime Social Data @StartupFest
 
H2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth RedmoreH2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth Redmore
 
Selected Thoughts on Modern Discovery and Access
Selected Thoughts on Modern Discovery and AccessSelected Thoughts on Modern Discovery and Access
Selected Thoughts on Modern Discovery and Access
 
UXSG2014 Lightning Talks - Selfish accessibility (Adrian Roselli)
UXSG2014 Lightning Talks - Selfish accessibility (Adrian Roselli)UXSG2014 Lightning Talks - Selfish accessibility (Adrian Roselli)
UXSG2014 Lightning Talks - Selfish accessibility (Adrian Roselli)
 
Development of the CyberCemetery (2011)
Development of the CyberCemetery (2011)Development of the CyberCemetery (2011)
Development of the CyberCemetery (2011)
 
The Challenge of Digital Sources in the Web Age: Common Tensions Across Three...
The Challenge of Digital Sources in the Web Age: Common Tensions Across Three...The Challenge of Digital Sources in the Web Age: Common Tensions Across Three...
The Challenge of Digital Sources in the Web Age: Common Tensions Across Three...
 
Groeling, Tim: NewsScape: Preserving TV News
Groeling, Tim: NewsScape: Preserving TV NewsGroeling, Tim: NewsScape: Preserving TV News
Groeling, Tim: NewsScape: Preserving TV News
 
Creating content
Creating contentCreating content
Creating content
 
How To Create Content
How To Create ContentHow To Create Content
How To Create Content
 
Information symposium
Information symposiumInformation symposium
Information symposium
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
 
Fighting Spam at Flickr
Fighting Spam at FlickrFighting Spam at Flickr
Fighting Spam at Flickr
 
How to Regularly – and Without a Lot of Extra Effort – Find, Capture and Shar...
How to Regularly – and Without a Lot of Extra Effort – Find, Capture and Shar...How to Regularly – and Without a Lot of Extra Effort – Find, Capture and Shar...
How to Regularly – and Without a Lot of Extra Effort – Find, Capture and Shar...
 
IWMW 2004: Socrates Building an intranet for the UK Research Councils
IWMW 2004: Socrates Building an intranet for the UK Research CouncilsIWMW 2004: Socrates Building an intranet for the UK Research Councils
IWMW 2004: Socrates Building an intranet for the UK Research Councils
 
Webinar - The Changing Landscape of Library Privacy - 2016-06-15
Webinar - The Changing Landscape of Library Privacy - 2016-06-15Webinar - The Changing Landscape of Library Privacy - 2016-06-15
Webinar - The Changing Landscape of Library Privacy - 2016-06-15
 
Sx sw 2012
Sx sw 2012Sx sw 2012
Sx sw 2012
 
The Digital 4 Ps of Marketing Campaigns Dave Drodge
The Digital 4 Ps of Marketing Campaigns Dave DrodgeThe Digital 4 Ps of Marketing Campaigns Dave Drodge
The Digital 4 Ps of Marketing Campaigns Dave Drodge
 
PTTP09 London Film Fest Workshop
PTTP09 London Film Fest WorkshopPTTP09 London Film Fest Workshop
PTTP09 London Film Fest Workshop
 
Building Thought Leadership through Content Curation
Building Thought Leadership through Content CurationBuilding Thought Leadership through Content Curation
Building Thought Leadership through Content Curation
 
Online Trends August 2009
Online Trends August 2009Online Trends August 2009
Online Trends August 2009
 

Metadata is a Love Note to the Future

  • 1. Rachel Lovinger @rlovinger Confab, 22 May, 2015 Image via Bond
  • 2. ©2015 All rights reserved. • Experience Director, Content Strategy; Razorfish New York • Co-editor of scatter/gather, a content strategy blog: http://scattergather.razorfish.com • Author of Nimble: A Razorfish Report on Publishing in the Digital Age (June 2010): http://nimble.razorfish.com • Twitter: @rlovinger
  • 11. is HARDCORE
  • 12. 2006 2009 2008 2012 2011 2010
  • 13. Metadata = Context. Context enables Connections. How does one convey that in a concise and powerful way?
  • 14. Photo by Jesse Chan-Norris
  • 15. Metadata Is a Love Note to the Future
  • 16. Tweet and photo by Erin Kissane, Tumblr by Austin Kleon • 429 notes • 82 retweets
  • 17. Photo and shirt by Sarah
  • 18. Photo by Rachel Lovinger
  • 19. Content Strategy for Mobile by Karen McGrane
  • 21. Textfiles.com • Nearly 60,000 files archived • Mostly from 1980-1995 • Collected and curated since 1998 • Almost no metadata
  • 22. Who needs a database?
  • 23. Metadata Skeptic transformed into… Metadata Warrior. Photos by Jason Scott and Rachel Lovinger
  • 24. Photo by Rachel Lovinger
  • 25. Me? Photo by Rachel Lovinger
  • 27. ~3 years online content; ~10 years magazine content
  • 28. Imported from text files to CMS
  • 29. Semi-structured information allowed us to map the files to content types and site sections, and add some metadata (author, published date, keywords, etc.). 10 years x 50 issues per year x 100 files per issue (approx.) = 50,000 estimated articles
  • 30. Once in the CMS, we could add photos, links, formatting, etc.
  • 31. For the content already in the CMS, keywords had been manually typed in by authors • 6790 “different” keywords • Removed 12% during cleanup: typos, redundant, not useful
  • 33. • Star Wars: Episode I -- The Phantom Menace • Episode 1 • Episode I • Phantom Menace • Star Wars Episode I The Phantom Menace • Star Wars Episode I: The Phantom Menace • Star Wars prequel • Star Wars: Episode 1 -- The Phantom Menace • Star Wars: Episode i -- the Phantom Menace • Star Wars: Episode I: The Phantom Menace • Star Wars: Episode I--The Phantom Menace • Star Wars: Episode I--The Phantom Menance • Star Wars: Episode One -- The Phantom Menace • Star Wars: The Phantom Menace • Star Wars: The Phantom Menace -- Episode I • The Phantom Menace • The Phanton Menace
  • 34. • TAFKAP?
  • 35. • TAFKAP? • The Artist • Artist Formerly Known as Prince • The Artist Formerly Known As Prince • The Artist formerly known as Prince • the Artist Formerly Known as Prince • The Artist Formerly Known as Prince (PKA)
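The talk does not say how the keyword cleanup was carried out. As a rough illustration only, the sketch below groups free-typed keywords that differ just in case, punctuation, or spacing. The function names `normalize_key` and `group_variants` are hypothetical, and the sample list is a subset of the variants from slide 33.

```python
import re
from collections import defaultdict


def normalize_key(keyword: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    lowered = keyword.casefold()
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    return " ".join(no_punct.split())


def group_variants(keywords):
    """Map each normalized key to the raw keywords that share it."""
    groups = defaultdict(list)
    for kw in keywords:
        groups[normalize_key(kw)].append(kw)
    return dict(groups)


# A few of the hand-typed variants shown on slide 33.
keywords = [
    "Star Wars: Episode I -- The Phantom Menace",
    "Star Wars Episode I The Phantom Menace",
    "Star Wars: Episode i -- the Phantom Menace",
    "Star Wars: Episode I--The Phantom Menace",
    "Star Wars: Episode I--The Phantom Menance",  # typo survives normalization
]

groups = group_variants(keywords)
for key, variants in groups.items():
    print(f"{key!r}: {len(variants)} variant(s)")
```

Normalization of this kind only merges formatting differences. Typos ("Menance"), abbreviations ("TAFKAP"), and partial titles ("Episode I") would still need a person, or a curated mapping, to resolve.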
  • 37. • The magazine was published once a week • The website published new articles several times a day • Plus: Over 50,000 past articles! • How could we better use all that content?
  • 38. If you like James Bond, we wanted it to be easy for you to discover everything we had: Cover Story, Interview, Photo Gallery, etc.
  • 41. We put our controlled vocabulary into categories, to make them more distinct and meaningful. For example: • Book > Product > Harry Potter and the Goblet of Fire • Movie > Product > Harry Potter and the Goblet of Fire • Person > Individual > Daniel Radcliffe • Person > Individual > J.K. Rowling
  • 43. • Relationships defined for each media type • Managed separately from the article content • The full set of metadata was available to all articles
  • 44. • Standard relationships • For example, for Movie: Lead Performers, Director, Writer, Release Date, EW Grade, etc. • Select a related category for each relationship, as applicable • Some allow multiple values
  • 45. • Authors just selected the primary category • Related metadata pulled in automatically • Updates appeared on all articles *Metadata categories and relationships were managed by a dedicated data librarian
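The category model on slides 41 to 45 can be pictured with a small data structure. This is a sketch under assumptions, not EW.com's actual schema: the class names `Category` and `Article`, the field names, and the article titles are all hypothetical. It shows the two ideas the slides describe: the category path keeps same-named terms distinct, and articles reference a category rather than copying its metadata, so a librarian's update shows up everywhere.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    path: tuple  # e.g. ("Movie", "Product")
    name: str
    # Relationship name -> list of related categories, managed centrally.
    relationships: dict = field(default_factory=dict)


@dataclass
class Article:
    title: str
    primary: Category  # the only thing an author selects

    def related_metadata(self) -> dict:
        # Looked up from the category at read time, never stored on the article.
        return self.primary.relationships


radcliffe = Category(("Person", "Individual"), "Daniel Radcliffe")
movie = Category(("Movie", "Product"), "Harry Potter and the Goblet of Fire")
book = Category(("Book", "Product"), "Harry Potter and the Goblet of Fire")

# Hypothetical articles that both point at the movie category.
review = Article("Movie review", movie)
interview = Article("Cast interview", movie)

# The data librarian adds a relationship once, on the category.
movie.relationships["Lead Performers"] = [radcliffe]

print(review.related_metadata()["Lead Performers"][0].name)
print(movie == book)  # same name, different path
```

Because both articles hold a reference to the same `Category` object, the relationship added after the articles were created is visible from each of them.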
  • 47. • “Best Results” linked directly to an aggregated page based on the category • For example: “Cats & Dogs” vs. “The Truth About Cats & Dogs”; The Green Mile (Movie) vs. The Green Mile (Book)
  • 49. • Wal-Mart sold gallon jars of Vlasic pickles for $2.97. • A popular item, priced so low it nearly put Vlasic out of business. • By achieving their goals, they put themselves in a position they might not survive. See: http://www.fastcompany.com/47593/wal-mart-you-dont-know
  • 50. • We wanted people to discover older content, and they did! • By 2006, we had 16 years of magazine and web content. • Other Time Inc. publications were interested in using our categorization system, too.
  • 51. Not well-suited for our expensive and frequent database calls.
  • 52. Our webservers were optimized to serve up the latest “issue” of content. 40% of Time Inc.’s database calls, but only 25% of the total traffic.
  • 53. A 2007 redesign removed the “third column” entirely.
  • 54. The creator of Freebase (a semi-semantic UGC site for structured content, now read-only) said EW.com was way ahead of its time.
  • 57. Who needs a database?
  • 58. “The hardest part of [recording] history is to be there when it happens.” Photo by Rachel Lovinger
  • 59.
  • 60. • An informal post on August 4th • Notification sent out September 30th • Shut down October 31st
  • 61. “What happened to my web page on my husband, Bob Champine, that took me many years to put together on his career and which meant a lot to me and to the aviation community. I noticed with 9.0 I lost the left margin and the picture of him exiting the X-1. I need to restore it to the internet as it is history. Please tell me what to do. I will be glad to retype it, I just don’t want it lost to the world. I need help. Gloria Champine”
  • 62. Illustration from “Fire in the Library,” MIT Technology Review
  • 63. “Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever.”
  • 72. • In 6 months Archive Team saved 900 GB • Estimated 4-5 TB total • Other people saved additional pages, but probably ¼ is gone forever • For many people, Geocities was their first web presence
  • 76. Those screenshots were automatically generated from Geocities sites rescued by Archive Team in 2009. See more at One Terabyte of Kilobyte Age Photo Op: http://oneterabyteofkilobyteage.tumblr.com/
  • 77. Due to lack of metadata: • The rescued data was less useful - Really bulky files - Case-sensitive filenames difficult to access and read - Not in a web-ready format (WARC) • The process was less efficient and more error prone - Poor tracking of completed activity - Lots of duplication of data - Took way too long (6 months vs. 3 days) - Could have gotten all the data in a month (estimated)
  • 79. Mission: The Internet Archive’s purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format. Photo by Ulf Benjaminsson
  • 83. • Save the history before it's lost forever • Offer permanent access to historical collections that exist in digital format
  • 84. Internet Archive contains: web pages, texts, videos, audio files, software, and images. (Plus concerts and collections) • Media Type makes it Readable or Playable • Emulator (for software) makes it Executable • Subject Keywords make it Findable
  • 86. • Is it Accurate? • Is it Credible? • What is the Source? (machines or people) • It’s a lot of Effort. Do we have enough people and time?
  • 88. Additional processing takes place, depending on the type
  • 89. • Description and keywords are required, but open fields • Other metadata is optional
  • 92. • For user-generated content, it’s just easier for people not to. • Internet Archive will never have enough people on staff to do it properly.
  • 93. Crowdsource manual creation of metadata. Photo by Pascal
  • 94. • Too small a pool of volunteers, and their drive didn’t last long • Tools didn’t provide immediate feedback/satisfaction. They had to email their inputs and wait. Photo by psyberartist
  • 95. • 10 most common words + 10 most common 2-word phrases • Applied to 200,000 items • Much more scalable • Heavily machine assisted: a person can validate data and create collections. Photo by James St. John
  • 97. “Controversial, but roughly as good as a bored intern.”
  • 98. Topics: switch, atari, antenna, game, cable, terminals, console, television, video, program, power supply, console unit, video computer, game program, computer system, atari game, power switch, switch box, atari video, screw terminals
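The machine-assisted approach from slide 95 (the 10 most common words plus the 10 most common 2-word phrases) can be sketched in a few lines. This is an illustration, not the Internet Archive's actual code: the function name `extract_topics`, the tiny stopword list, the tokenizer, and the sample text are all assumptions.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list; a real one would be longer.
STOPWORDS = frozenset(
    {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "with"}
)


def extract_topics(text: str, n: int = 10) -> list:
    """Return the n most common words followed by the n most common 2-word phrases."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    words = Counter(tokens)
    # Phrases are built from adjacent tokens after stopword removal, so they
    # can span a removed stopword or a sentence boundary.
    phrases = Counter(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    top_words = [w for w, _ in words.most_common(n)]
    top_phrases = [p for p, _ in phrases.most_common(n)]
    return top_words + top_phrases


# Made-up sample text, loosely in the style of a console manual.
sample = (
    "Connect the power switch to the console. "
    "The power switch is on the console unit. "
    "Set the power switch."
)
topics = extract_topics(sample, n=2)
print(topics)
```

Output like this still benefits from the human validation step mentioned on slide 95, since raw frequency says nothing about whether a term is meaningful for the item.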
  • 99. Having the stuff is vital, the most important thing. But it’s also vital to have a system by which these things are described. “If a person can’t get the information they need, then we’re failing.” Photo by Rachel Lovinger
  • 101. Jason had converted to a metadata advocate. But I realized that… content strategists who care about the long game should think like historians, archivists and futurists, too.
  • 103. • Dutch leader in academic research and education on biodiversity and taxonomy. • Has a collection of 37 million natural history objects.
  • 104. Describe, understand and explore biodiversity for human wellbeing and the future of our planet. They do this with: • Accessible collections • Contributions to global scientific research • Awe of natural history • Openly shared knowledge
  • 105. • From 2010 to June 2015 • 250 staff members & 450 volunteers • Digitizing 7 million objects in detail • Adding metadata for the other 30 million objects
  • 106. • Information is more easily discovered, studied, and used. • Scientists worldwide can access it directly online, without assistance. • Some of this data has never been available in digital form before.
  • 107. • Scientific name • Where it was found • When it was found • Who found it. “Objects [in the collection] have no scientific value without this information.” - Suzanne de Jong-Kole
  • 109. Employees enter data, verbatim, into the collection registration system.
  • 110. This allows them to retrieve the physical specimen if requested.
  • 111. • Vele Handen = Many Hands • People helped transcribe handwritten labels • In 9 months, people transcribed 200,000 labels, of which about half were usable.
  • 112. The person who collected the specimen wrote the metadata on the label. This could be a professional researcher, or a non-professional enthusiast.
  • 114. The oldest is this Spanish pepper from 1550!
  • 115. When they wrote this metadata, they had no idea that nearly half a millennium later people would be “digitizing” it.
  • 116. The ‘love note’ is when you behave selflessly for a partner (or customer) that doesn’t exist yet. A drawing Jason drew in my notebook in high school, 20+ years before we ever dated.