Spooky Spreadsheets
Carly Strasser | California Digital Library
UCSB/Bren, Oct 2013
From Flickr by Jeff Golden
  
Roadmap
1.  Background
2.  Best practices
3.  Toolbox
  
	
  
Scientists are bad at data management.
From Flickr by robertpaulyoung
  
My spreadsheet:
•  Many tables
•  Embedded figures
•  No headings

?
www.petshaming.net

NO
•  Reproducibility
•  Transparency
•  Reuse

•  Didn’t share the data
•  Didn’t document the data (metadata)
•  Didn’t document provenance/workflow
  
Why should I care?
From Flickr by johntrainor

Because they care:
From Flickr by Redden-McAllister
From Flickr by Big Swede Guy
  

Best Practices
data management
From Flickr by Mark Sardella
  

Plan before data collection

Design sample naming scheme
From Flickr by zebbie
•  Create a key (data dictionary)
•  Make sure names are unique
•  Define codes
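The three points above (a key, unique names, defined codes) can be checked mechanically. The following is only a sketch: the sample names, the codes, and the column names are hypothetical and invented for illustration.

```python
from collections import Counter

# Hypothetical data dictionary: every code used in the data sheet gets a definition.
DATA_DICTIONARY = {
    "site": {"NAN": "Nanaimo", "VIC": "Victoria"},
    "stage": {"A": "adult", "J": "juvenile"},
}


def duplicate_names(sample_names):
    """Return the sample names that occur more than once, sorted."""
    counts = Counter(sample_names)
    return sorted(name for name, n in counts.items() if n > 1)


def undefined_codes(column, values):
    """Return the values in a column that the data dictionary does not define."""
    defined = DATA_DICTIONARY.get(column, {})
    return sorted(set(values) - set(defined))


samples = ["NAN_2010_001", "NAN_2010_002", "NAN_2010_002"]
print(duplicate_names(samples))
print(undefined_codes("stage", ["A", "J", "X"]))
```

Running a check like this before data entry starts is cheaper than untangling duplicate sample names afterwards.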
  

Planning
Design file naming scheme
PhDcomics.com

Use descriptive file names*
•  Unique
•  Reflect contents

Bad:
  Mydata.xls
  2001_data.csv
  best version.txt

Better: Eaffinis_nanaimo_2010_counts.xls
  Eaffinis = study organism
  nanaimo = site name
  2010 = year
  counts = what was measured

*Not for everyone
From R Cook, ESA Best Practices Workshop 2010
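A naming scheme is easier to keep when a script builds and checks the names. This sketch assumes the four-part pattern of the "Better" example (organism, site, year, what was measured); the function names and the regular expression are our own, not part of the original slides.

```python
import re

# Assumed pattern: organism_site_year_measured.extension
NAME_PATTERN = re.compile(
    r"^(?P<organism>[A-Za-z]+)_(?P<site>[A-Za-z]+)_"
    r"(?P<year>\d{4})_(?P<measured>[A-Za-z]+)\.(?P<ext>[a-z]+)$"
)


def build_name(organism, site, year, measured, ext="csv"):
    """Assemble a file name from its four descriptive parts."""
    return f"{organism}_{site}_{year}_{measured}.{ext}"


def parse_name(filename):
    """Return the parts as a dict, or None if the name does not follow the scheme."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


for name in ["Mydata.xls", "2001_data.csv", "best version.txt",
             "Eaffinis_nanaimo_2010_counts.xls"]:
    print(name, "->", parse_name(name))
```

The three "Bad" names fail to parse, while the "Better" name splits cleanly into its parts.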
  
Design file organization
From S. Hampton

Biodiversity
  Lake
    Experiments
      Biodiv_H20_heatExp_2005to2008.csv
      Biodiv_H20_predatorExp_2001to2003.csv
      …
    Field work
      Biodiv_H20_PlanktonCount_2001toActive.csv
      Biodiv_H20_ChlAprofiles_2003.csv
      …
  Grassland

Consider…
•  Dependencies?
•  File formats?
•  Time of collection?
•  Order of analysis?

Workflows!
From S. Hampton
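A folder layout like the one above can be created by a short script, so every project starts from the same structure. This is a sketch only: the nesting is our reading of the slide, `Field_work` is written with an underscore by our own choice, and the files are created empty in a temporary directory.

```python
import tempfile
from pathlib import Path

# Folder layout as we read it from the slide (an assumption, not a prescribed standard).
LAYOUT = {
    "Biodiversity/Lake/Experiments": [
        "Biodiv_H20_heatExp_2005to2008.csv",
        "Biodiv_H20_predatorExp_2001to2003.csv",
    ],
    "Biodiversity/Lake/Field_work": [
        "Biodiv_H20_PlanktonCount_2001toActive.csv",
        "Biodiv_H20_ChlAprofiles_2003.csv",
    ],
    "Biodiversity/Grassland": [],
}


def create_layout(root, layout=LAYOUT):
    """Create every folder, add empty placeholder files, and return the file paths."""
    root = Path(root)
    created = []
    for folder, files in layout.items():
        directory = root / folder
        directory.mkdir(parents=True, exist_ok=True)
        for name in files:
            path = directory / name
            path.touch(exist_ok=True)
            created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    for path in create_layout(tmp):
        print(path.relative_to(tmp))
```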
  
Design your spreadsheet
•  Constrain entries
•  Atomize
•  Break down spreadsheets
From Flickr by Ulleskelf
  

Consider a database

A relational database is
•  A set of tables
•  Relationships among the tables
•  A language to specify & query the tables

A RDB provides
•  Scalability: millions+ records
•  Features for sub-setting, querying, sorting
•  Reduced redundancy & entry errors

From Mark Schildhauer

You should invest time in learning databases if
•  your data sets are large or complex

Consider investing time in learning databases if
•  your data are small and humble
•  you ever intend to share your data
•  you are < 30 years old

From Mark Schildhauer
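The three ingredients of a relational database (tables, relationships among them, and a query language) fit in a few lines using SQLite, which ships with Python. This is a minimal sketch; the table names, column names, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# A set of tables, with a relationship (observation.site_id refers to site.site_id).
conn.executescript("""
    CREATE TABLE site (
        site_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE observation (
        obs_id        INTEGER PRIMARY KEY,
        site_id       INTEGER NOT NULL REFERENCES site(site_id),
        year          INTEGER NOT NULL,
        n_individuals INTEGER NOT NULL CHECK (n_individuals >= 0)
    );
""")

# The site name is typed once; observations refer to it by id, which reduces redundancy.
conn.executemany("INSERT INTO site (site_id, name) VALUES (?, ?)",
                 [(1, "Nanaimo"), (2, "Victoria")])
conn.executemany(
    "INSERT INTO observation (site_id, year, n_individuals) VALUES (?, ?, ?)",
    [(1, 2010, 12), (1, 2011, 7), (2, 2010, 30)],
)

# A language to query the tables: total individuals per site.
rows = conn.execute("""
    SELECT s.name, SUM(o.n_individuals)
    FROM observation AS o
    JOIN site AS s ON s.site_id = o.site_id
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()
print(rows)
```

The `CHECK` constraint is one way a database can reject an impossible entry at the moment it is typed.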
  
Pick a data repository
Store your data in a repository
•  Institutional archive: ask a librarian
•  Discipline/specialty archive
•  Repos of repos: databib.org, re3data.org
From Flickr by torkildr
  
Decide on preservation/backup
•  What software?
•  What hardware?
•  What personnel?
•  How often?
•  Set up reminders!
•  Test system
From Flickr by withassociates
From Flickr by sepa synod
From Flickr by taberandrew
  
Write a data management plan!
…document that describes what you will do with your data throughout the research project
From Flickr by Barbies Land

DMP components
•  What will be collected
•  Methods
•  Standards
•  Metadata
•  Sharing/access
•  Long-term storage
But they all have different requirements and express them in different ways
From Flickr by Barbies Land

dmptool.org
Step-by-step wizard for generating DMP
create | edit | re-use | share
Free & open to community
  	
  

During Data Collection & Entry
From Flickr by Julia Manzerova

During collection
Keep raw data raw
Realistically:
•  Archive .csv version of raw data
•  Make a “raw” tab in working data file
•  Do all work on other tabs
Ideally:
•  Use scripts to process data
•  Save them with data

Raw data as .csv
R script for processing & analysis
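The "ideal" case, a raw .csv plus a script that processes it, might look like the sketch below. The slides name R; Python is used here only for illustration. The file names, the `temperature_c` column, and the cleaning rule are hypothetical. The point is that the raw file is opened for reading only and the cleaned copy is written somewhere else.

```python
import csv
import tempfile
from pathlib import Path


def process_raw(raw_path, out_path):
    """Read the raw .csv (never written to) and save a cleaned copy elsewhere."""
    with open(raw_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        kept = 0
        for row in reader:
            if row["temperature_c"].strip() == "":
                continue  # hypothetical rule: drop rows with a missing temperature
            writer.writerow(row)
            kept += 1
    return kept


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw_temperature.csv"
    raw.write_text("station,temperature_c\nA,11.5\nB,\nC,12.1\n")
    original = raw.read_text()
    kept_rows = process_raw(raw, Path(tmp) / "processed_temperature.csv")
    raw_unchanged = raw.read_text() == original

print(kept_rows, raw_unchanged)
```

Saving a script like this next to the data records exactly how the processed file was derived from the raw one.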
  
Document your workflow
Workflow: how you get from the raw data to the final products of your research

Simple workflow: flow chart
Temperature data, Salinity data → Data import into Excel → Data in spreadsheet → Quality control & data cleaning → “Clean” T & S data → Analysis: mean, SD → Summary statistics → Graph production

Simple workflow: commented script
•  R, SAS, MATLAB…
•  Well-documented code is
   Easier to review
   Easier to share
   Easier to use for repeat analysis
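As an illustration of a commented script, the sketch below covers the analysis step of the temperature and salinity example (mean and SD). The slides mention R, SAS, and MATLAB; Python is used here only as a stand-in, and the readings are made-up numbers.

```python
import statistics

# Step 1: hypothetical "clean" temperature (deg C) and salinity readings.
temperature = [11.5, 12.1, 11.8, 12.4]
salinity = [31.2, 31.0, 30.8, 31.4]


def summarize(values):
    """Step 2 (analysis): return the mean and the sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)


# Step 3: summary statistics, ready to go into a table or a graph.
for label, values in [("temperature", temperature), ("salinity", salinity)]:
    mean, sd = summarize(values)
    print(f"{label}: mean={mean:.2f}, SD={sd:.2f}")
```

Each comment names the workflow step it implements, so a reviewer can map the script back to the flow chart.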
  

#  %  $  &

Fancy schmancy workflows
Resulting output
https://kepler-project.org

Workflows enable
•  Reproducibility
•  Transparency
•  Reuse
From Flickr by merlinprincesse
  
Constrain data entries
•  Excel lists
•  Data validation
•  Google docs forms
Modified from K. Vanderbilt
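The same idea, accepting only values from a fixed list, can be applied after the fact to data that were entered without constraints. This is a sketch under stated assumptions: the allowed values and column names are hypothetical, and the code stands in for a drop-down list rather than reproducing any Excel or Google feature.

```python
# Hypothetical allowed values, playing the role of a drop-down list.
ALLOWED = {
    "site": {"Nanaimo", "Victoria"},
    "stage": {"adult", "juvenile"},
}


def validate_row(row):
    """Return a list of problems found in one data-entry row (empty if none)."""
    problems = []
    for column, allowed in ALLOWED.items():
        value = row.get(column)
        if value not in allowed:
            problems.append(f"{column}: {value!r} is not one of {sorted(allowed)}")
    try:
        if int(row.get("count", "")) < 0:
            problems.append("count: must not be negative")
    except (TypeError, ValueError):
        problems.append(f"count: {row.get('count')!r} is not a whole number")
    return problems


print(validate_row({"site": "Nanaimo", "stage": "adult", "count": "3"}))
print(validate_row({"site": "nanaimo", "stage": "adult", "count": "many"}))
```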
  	
  

Atomize
One piece of information per cell
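To make "one piece of information per cell" concrete, the sketch below splits a combined entry into separate fields. The combined "site, date, count" format is a hypothetical example of a non-atomic cell, not something taken from the slides.

```python
def atomize(cell):
    """Split a combined 'site, date, count' cell into one value per field."""
    site, date, count = (part.strip() for part in cell.split(","))
    year, month, day = (int(piece) for piece in date.split("-"))
    return {"site": site, "year": year, "month": month, "day": day,
            "count": int(count)}


print(atomize("Nanaimo, 2010-07-04, 12"))
```

Once each value has its own column, sorting by year or filtering by site no longer needs text parsing.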
  
Break down spreadsheets
Fake a relational database
•  Create parameter table
•  Create a site table
From doi:10.3334/ORNLDAAC/777
From R Cook, ESA Best Practices Workshop 2010
  
Why are you promoting Excel?

Create metadata
Metadata: data reporting
•  WHO created the data?
•  WHAT is the content of the data set?
•  WHEN was it created?
•  WHERE was it collected?
•  HOW was it developed?
•  WHY was it developed?
From Flickr by //ichael Patric|{
  

Create metadata

Digital context
•  Name of the data set
•  The name(s) of the data file(s) in the data set
•  Date the data set was last modified
•  Example data file records for each data type file
•  Pertinent companion files
•  List of related or ancillary data sets
•  Software (including version number) used to prepare/read the data set
•  Data processing that was performed

Personnel & stakeholders
•  Who collected
•  Who to contact with questions
•  Funders

Scientific context
•  Scientific reason why the data were collected
•  What data were collected
•  What instruments (including model & serial number) were used
•  Environmental conditions during collection
•  Temporal & spatial resolution
•  Standards or calibrations used

Information about parameters
•  How each was measured or produced
•  Units of measure
•  Format used in the data set
•  Precision & accuracy if known

Information about data
•  Definitions of codes used
•  Quality assurance & control measures
•  Known problems that limit data use (e.g. uncertainty, sampling problems)
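A metadata record covering a few of the items above can be kept as a small structured file next to the data. The sketch below is not a metadata standard: the field names are our own shorthand and every value is a placeholder.

```python
import json

# Field names are our own shorthand for items listed above, not a metadata standard.
REQUIRED_FIELDS = [
    "dataset_name", "file_names", "last_modified",
    "who_collected", "contact", "why_collected", "units",
]

# Placeholder values only; none of this describes a real data set.
metadata = {
    "dataset_name": "EXAMPLE copepod counts",
    "file_names": ["Eaffinis_nanaimo_2010_counts.xls"],
    "last_modified": "2013-10-01",
    "who_collected": "EXAMPLE collector",
    "contact": "contact@example.org",
    "why_collected": "EXAMPLE scientific reason",
    "units": {"count": "individuals per sample"},
}


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in required if not record.get(field)]


with_gaps = dict(metadata, contact="")
print(missing_fields(metadata))
print(missing_fields(with_gaps))
print(json.dumps(metadata, indent=2))
```

A check like `missing_fields` can run before a data set is shared, so gaps are caught while the person who knows the answer is still available.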
  	
  
Create metadata
Standards
What is metadata?

Metadata standards…
•  Provide structure to describe data
   Common terms | definitions | language | structure
•  Come in many flavors
   EML, FGDC, ISO19115, DarwinCore,…
•  Can be met using software tools
   Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSGDM)
  	
  
	
  
	
  
Back up daily
•  Original
•  Near
•  Far
From Flickr by see phar
From Flickr by lippo
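Keeping an original, a near copy, and a far copy is only useful if the copies are intact. This sketch copies a file to two destinations and compares checksums. The folder names are hypothetical, and both "near" and "far" are simulated as local folders here; a real far copy would sit on another machine or service.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Return the SHA-256 checksum of a file as a hex string."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(original, destinations):
    """Copy the original into each destination folder and verify every copy."""
    original = Path(original)
    expected = sha256(original)
    results = {}
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, folder / original.name))
        results[str(folder)] = sha256(copy) == expected
    return results


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "original" / "counts.csv"
    data.parent.mkdir()
    data.write_text("site,count\nNanaimo,12\n")
    outcome = back_up(data, [Path(tmp) / "near", Path(tmp) / "far"])

print(all(outcome.values()))
```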
  
Remember that data management plan?
•  Revisit
•  Review
•  Revise
Schedule a time each week or month
From Flickr by Barbies Land
From Flickr by purplemattfish
From Flickr by dipster1
  

Toolbox

Write a DMP
dmptool.org
Step-by-step wizard for generating DMP
create | edit | re-use | share
Free & open to community

Find a repository
Where should I put my data?
databib.org

Manage & share
•  Help researchers manage, describe, and share tabular data
•  Free
•  Add-in for Excel & web application
Features
1. Best practices check
2. Generate metadata
3. Get identifier & citation
4. Post data to repository

Create metadata
Clean data
Open Refine = Google Refine
•  Open source desktop application
•  Used for data cleanup and transformation to other formats
•  Works with spreadsheets but behaves like a database
•  User can filter the rows to display using facets that define filtering criteria
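The idea behind a facet, listing the distinct values in a column with their counts and then filtering on one of them, can be imitated in plain Python. This is not OpenRefine's API, only an illustration of the concept on hypothetical rows.

```python
from collections import Counter


def facet(rows, column):
    """Count how often each distinct value appears in a column."""
    return Counter(row[column] for row in rows)


def filter_by_facet(rows, column, value):
    """Keep only the rows whose column equals the chosen facet value."""
    return [row for row in rows if row[column] == value]


# Hypothetical rows with typical entry inconsistencies (case, trailing space).
rows = [
    {"site": "Nanaimo"},
    {"site": "nanaimo"},
    {"site": "Nanaimo "},
    {"site": "Victoria"},
]

print(facet(rows, "site"))
print(filter_by_facet(rows, "site", "Victoria"))
```

Seeing three spellings of one site side by side is what makes inconsistent entries easy to spot and fix.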
  
Get help
Toolbox:
  DCXL blog: dcxl.cdlib.org
From Flickr by twm1340

Culture Shift Ahead
From Flickr by cdsessums

[Word cloud: science, source, notebook, content, access, data, government, knowledge]

Make a resolution
•  Triage on current projects
•  Get advisor, lab mates, collaborators on board
•  Do better next time
From Flickr by Andy Graulund

Website: carlystrasser.net
Email: carlystrasser@gmail.com
Twitter: @carlystrasser
Slides: slideshare.net/carlystrasser
  

More Related Content

What's hot

Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Bertram Ludäscher
 
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...South London Geek Nights
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryNeo4j
 
DataShare - Pauline Ward to University of Edinburgh School of Chemistry - 3 f...
DataShare - Pauline Ward to University of Edinburgh School of Chemistry - 3 f...DataShare - Pauline Ward to University of Edinburgh School of Chemistry - 3 f...
DataShare - Pauline Ward to University of Edinburgh School of Chemistry - 3 f...University of Edinburgh
 
Big Tools for Big Data
Big Tools for Big DataBig Tools for Big Data
Big Tools for Big DataLewis Crawford
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryNeo4j
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archiveLewis Crawford
 
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...Amanda Whitmire
 
4Science presents: DSpace-CRIS main features
4Science presents: DSpace-CRIS main features4Science presents: DSpace-CRIS main features
4Science presents: DSpace-CRIS main features4Science
 
Smith T Bio Hdf Bosc2008
Smith T Bio Hdf Bosc2008Smith T Bio Hdf Bosc2008
Smith T Bio Hdf Bosc2008bosc_2008
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discoverymarkgrover
 
Ibi accessing and preparing data
Ibi accessing and preparing dataIbi accessing and preparing data
Ibi accessing and preparing dataClif Kranish
 
Democratizing Data within your organization - Data Discovery
Democratizing Data within your organization - Data DiscoveryDemocratizing Data within your organization - Data Discovery
Democratizing Data within your organization - Data DiscoveryMark Grover
 
Ag Data Commons: A new USDA catalog and repository for agricultural research ...
Ag Data Commons: A new USDA catalog and repository for agricultural research ...Ag Data Commons: A new USDA catalog and repository for agricultural research ...
Ag Data Commons: A new USDA catalog and repository for agricultural research ...Cyndy Parr
 
Toronto OpenRefine MeetUp Nov 2015
Toronto OpenRefine MeetUp Nov 2015Toronto OpenRefine MeetUp Nov 2015
Toronto OpenRefine MeetUp Nov 2015Martin Magdinier
 

What's hot (20)

Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
 
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
 
DataShare - Pauline Ward to University of Edinburgh School of Chemistry - 3 f...
DataShare - Pauline Ward to University of Edinburgh School of Chemistry - 3 f...DataShare - Pauline Ward to University of Edinburgh School of Chemistry - 3 f...
DataShare - Pauline Ward to University of Edinburgh School of Chemistry - 3 f...
 
Big Tools for Big Data
Big Tools for Big DataBig Tools for Big Data
Big Tools for Big Data
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archive
 
Preparing Your Research Material for the Future - 2018-06-08 - Humanities Div...
Preparing Your Research Material for the Future - 2018-06-08 - Humanities Div...Preparing Your Research Material for the Future - 2018-06-08 - Humanities Div...
Preparing Your Research Material for the Future - 2018-06-08 - Humanities Div...
 
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
 
4Science presents: DSpace-CRIS main features
4Science presents: DSpace-CRIS main features4Science presents: DSpace-CRIS main features
4Science presents: DSpace-CRIS main features
 
Meetup SF - Amundsen
Meetup SF  -  AmundsenMeetup SF  -  Amundsen
Meetup SF - Amundsen
 
Smith T Bio Hdf Bosc2008
Smith T Bio Hdf Bosc2008Smith T Bio Hdf Bosc2008
Smith T Bio Hdf Bosc2008
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
 
Ibi accessing and preparing data
Ibi accessing and preparing dataIbi accessing and preparing data
Ibi accessing and preparing data
 
Democratizing Data within your organization - Data Discovery
Democratizing Data within your organization - Data DiscoveryDemocratizing Data within your organization - Data Discovery
Democratizing Data within your organization - Data Discovery
 
Ag Data Commons: A new USDA catalog and repository for agricultural research ...
Ag Data Commons: A new USDA catalog and repository for agricultural research ...Ag Data Commons: A new USDA catalog and repository for agricultural research ...
Ag Data Commons: A new USDA catalog and repository for agricultural research ...
 
Toronto OpenRefine MeetUp Nov 2015
Toronto OpenRefine MeetUp Nov 2015Toronto OpenRefine MeetUp Nov 2015
Toronto OpenRefine MeetUp Nov 2015
 
A Guide for Reproducible Research
A Guide for Reproducible ResearchA Guide for Reproducible Research
A Guide for Reproducible Research
 
Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...
Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...
Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...
 
From Big Data to Fast Data
From Big Data to Fast DataFrom Big Data to Fast Data
From Big Data to Fast Data
 

Similar to Spooky Spreadsheets: Best Practices for Data Management

Coping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCoping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCarly Strasser
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and SharingC. Tobin Magle
 
Efficient & effective data management for research projects : ILRI's Data Ma...
Efficient & effective  data management for research projects : ILRI's Data Ma...Efficient & effective  data management for research projects : ILRI's Data Ma...
Efficient & effective data management for research projects : ILRI's Data Ma...CIARD Movement
 
Data management for TA's
Data management for TA'sData management for TA's
Data management for TA'saaroncollie
 
The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...Projeto RCAAP
 
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...DuraSpace
 
Data Management for Undergraduate Researchers
Data Management for Undergraduate ResearchersData Management for Undergraduate Researchers
Data Management for Undergraduate ResearchersRebekah Cummings
 
Graph databases and the #panamapapers
Graph databases and the #panamapapersGraph databases and the #panamapapers
Graph databases and the #panamapapersdarthvader42
 
Duraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesDuraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesMatthew Critchlow
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data LocallyErin D. Foster
 
Database Systems - Lecture Week 1
Database Systems - Lecture Week 1Database Systems - Lecture Week 1
Database Systems - Lecture Week 1Dios Kurniawan
 
Incentivising the uptake of reusable metadata in the survey production process
Incentivising the uptake of reusable metadata in the survey production processIncentivising the uptake of reusable metadata in the survey production process
Incentivising the uptake of reusable metadata in the survey production processLouise Corti
 
CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217lyarmey
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
Documentation and Metdata - VA DM Bootcamp
Documentation and Metdata - VA DM BootcampDocumentation and Metdata - VA DM Bootcamp
Documentation and Metdata - VA DM BootcampSherry Lake
 
ESA14 Workshop on SEAD's Data Services and Tools
ESA14 Workshop on SEAD's Data Services and ToolsESA14 Workshop on SEAD's Data Services and Tools
ESA14 Workshop on SEAD's Data Services and ToolsSEAD
 
Behind the scenes of data science
Behind the scenes of data scienceBehind the scenes of data science
Behind the scenes of data scienceLoïc Lejoly
 
It's 2015. Do You Know Where Your Data Are?
It's 2015. Do You Know Where Your Data Are?It's 2015. Do You Know Where Your Data Are?
It's 2015. Do You Know Where Your Data Are?Patricia Hswe
 

Similar to Spooky Spreadsheets: Best Practices for Data Management (20)

Coping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCoping with Data for WHOI JP Students
Coping with Data for WHOI JP Students
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and Sharing
 
Efficient & effective data management for research projects : ILRI's Data Ma...
Efficient & effective  data management for research projects : ILRI's Data Ma...Efficient & effective  data management for research projects : ILRI's Data Ma...
Efficient & effective data management for research projects : ILRI's Data Ma...
 
Data management for TA's
Data management for TA'sData management for TA's
Data management for TA's
 
The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...
 
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
 
Data Management for Undergraduate Researchers
Data Management for Undergraduate ResearchersData Management for Undergraduate Researchers
Data Management for Undergraduate Researchers
 
Graph databases and the #panamapapers
Graph databases and the #panamapapersGraph databases and the #panamapapers
Graph databases and the #panamapapers
 
Duraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesDuraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository Services
 
Unit 3 part i Data mining
Unit 3 part i Data miningUnit 3 part i Data mining
Unit 3 part i Data mining
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data Locally
 
Database Systems - Lecture Week 1
Database Systems - Lecture Week 1Database Systems - Lecture Week 1
Database Systems - Lecture Week 1
 
Incentivising the uptake of reusable metadata in the survey production process
Incentivising the uptake of reusable metadata in the survey production processIncentivising the uptake of reusable metadata in the survey production process
Incentivising the uptake of reusable metadata in the survey production process
 
CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217
 
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
Documentation and Metdata - VA DM Bootcamp
Documentation and Metdata - VA DM BootcampDocumentation and Metdata - VA DM Bootcamp
Documentation and Metdata - VA DM Bootcamp
 
ESA14 Workshop on SEAD's Data Services and Tools
ESA14 Workshop on SEAD's Data Services and ToolsESA14 Workshop on SEAD's Data Services and Tools
ESA14 Workshop on SEAD's Data Services and Tools
 
Behind the scenes of data science
Behind the scenes of data scienceBehind the scenes of data science
Behind the scenes of data science
 
It's 2015. Do You Know Where Your Data Are?
It's 2015. Do You Know Where Your Data Are?It's 2015. Do You Know Where Your Data Are?
It's 2015. Do You Know Where Your Data Are?
 

More from Carly Strasser

Funders and Publishers: Agents of Change
Funders and Publishers: Agents of ChangeFunders and Publishers: Agents of Change
Funders and Publishers: Agents of ChangeCarly Strasser
 
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015Carly Strasser
 
CDL Tools for DataCite 2014
CDL Tools for DataCite 2014CDL Tools for DataCite 2014
CDL Tools for DataCite 2014Carly Strasser
 
ESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharingESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharingCarly Strasser
 
Data publication and Citation for CLIR postdoc seminar
Data publication and Citation for CLIR postdoc seminarData publication and Citation for CLIR postdoc seminar
Data publication and Citation for CLIR postdoc seminarCarly Strasser
 
Data Management for Mountain Observatories Workshop
Data Management for Mountain Observatories WorkshopData Management for Mountain Observatories Workshop
Data Management for Mountain Observatories WorkshopCarly Strasser
 
Libraries & Research Data Management for CO Alliance of Resrch Libraries
Libraries & Research Data Management for CO Alliance of Resrch LibrariesLibraries & Research Data Management for CO Alliance of Resrch Libraries
Libraries & Research Data Management for CO Alliance of Resrch LibrariesCarly Strasser
 
Open Science for Australian Institute of Marine Science Workshop
Open Science for Australian Institute of Marine Science WorkshopOpen Science for Australian Institute of Marine Science Workshop
Open Science for Australian Institute of Marine Science WorkshopCarly Strasser
 
Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Carly Strasser
 
Data management overview and UC3 tools for IASSIST 2014
Data management overview and UC3 tools for IASSIST 2014Data management overview and UC3 tools for IASSIST 2014
Data management overview and UC3 tools for IASSIST 2014Carly Strasser
 
DMPTool for UMass eScience Symposium
DMPTool for UMass eScience SymposiumDMPTool for UMass eScience Symposium
DMPTool for UMass eScience SymposiumCarly Strasser
 
DMPTool 2.0 for #IDCC14
DMPTool 2.0 for #IDCC14DMPTool 2.0 for #IDCC14
DMPTool 2.0 for #IDCC14Carly Strasser
 
Data Publication at CDL for IDCC14
Data Publication at CDL for IDCC14Data Publication at CDL for IDCC14
Data Publication at CDL for IDCC14Carly Strasser
 
Data Publication for UC Davis Publish or Perish
Data Publication for UC Davis Publish or PerishData Publication for UC Davis Publish or Perish
Data Publication for UC Davis Publish or PerishCarly Strasser
 
DMPTool for IMLS #WebWise14
DMPTool for IMLS #WebWise14DMPTool for IMLS #WebWise14
DMPTool for IMLS #WebWise14Carly Strasser
 
Cal Poly - An Overview of Open Science
Cal Poly - An Overview of Open ScienceCal Poly - An Overview of Open Science
Cal Poly - An Overview of Open ScienceCarly Strasser
 
Cal Poly - Data Management: Who knew it was a hot topic?
Cal Poly - Data Management: Who knew it was a hot topic?Cal Poly - Data Management: Who knew it was a hot topic?
Cal Poly - Data Management: Who knew it was a hot topic?Carly Strasser
 
Cal Poly - Data Management and the DMPTool
Cal Poly - Data Management and the DMPToolCal Poly - Data Management and the DMPTool
Cal Poly - Data Management and the DMPToolCarly Strasser
 
Cal Poly - Data Management for Researchers
Cal Poly - Data Management for ResearchersCal Poly - Data Management for Researchers
Cal Poly - Data Management for ResearchersCarly Strasser
 

Spooky Spreadsheets: Best Practices for Data Management

  • 1. From  Flickr  by  Jeff  Golden   Spooky   Spreadsheets   Carly  Strasser  |  California  Digital  Library   UCSB/Bren  Oct  2013  
  • 2. Roadmap   3.  Toolbox   2. Best  practices     1.  Background    
  • 3. Scientists  are  bad  at   data  management.   From  Flickr  by  robertpaulyoung  
  • 6. my  spreadsheet   No  headings  
  • 9.
  • 10. ?
  • 11. www.petshaming.net   NO   Reproducibility   Transparency   Reuse   Didn’t  share  the  data   Didn’t  document  the  data  (metadata)   Didn’t  document  provenance/workflow  
  • 12. Why  should  I  care?   From  Flickr  by  johntrainor  
  • 13. Because they care: From Flickr by Redden-McAllister
  • 14.
  • 15. From Flickr by Big Swede Guy  Best Practices  data management
  • 16. From  Flickr  by  Mark  Sardella   Plan  before  data   collection  
  • 17. Design  sample  naming  scheme   From  Flickr  by  zebbie   •  Create  a  key  (data  dictionary)   •  Make  sure  names  are  unique   •  Define  codes   Planning  
  • 18. Design  file  naming  scheme   PhDcomics.com   Planning  
  • 19. Design  file  naming  scheme   Planning    Use  descriptive  file  names  *   •  Unique   •  Reflect  contents   Bad:        Mydata.xls    2001_data.csv    best  version.txt   Better:  Eaffinis_nanaimo_2010_counts.xls   Study   organism   Site   name   Year   What  was   measured     *Not  for  everyone   From  R  Cook,  ESA  Best  Practices  Workshop  2010  
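The naming pattern on this slide (study organism, site name, year, what was measured) can be sketched as a small helper. This is only an illustration: the function name `build_file_name` and the rule that each part must be plain letters and digits are assumptions, not something the slides prescribe.

```python
import re


def build_file_name(organism, site, year, measured, ext):
    """Join the descriptive parts with underscores: organism_site_year_measured.ext."""
    parts = [organism, site, str(year), measured]
    for part in parts:
        # Assumed rule: letters and digits only, so spaces and empty parts are rejected.
        if not re.fullmatch(r"[A-Za-z0-9]+", part):
            raise ValueError(f"unsuitable name part: {part!r}")
    return "_".join(parts) + "." + ext


# Rebuilds the "Better" example from the slide.
print(build_file_name("Eaffinis", "nanaimo", 2010, "counts", "xls"))
```

Generating names from the parts, rather than typing them by hand, is one way to keep them unique and consistent across a project.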
  • 20. Design  file  organization   Planning   From  S.  Hampton  
  • 21. Design file organization  Planning  From S. Hampton
      Biodiversity
        Lake
          Experiments
            Biodiv_H20_heatExp_2005to2008.csv
            Biodiv_H20_predatorExp_2001to2003.csv
            …
          Field work
            Biodiv_H20_PlanktonCount_2001toActive.csv
            Biodiv_H20_ChlAprofiles_2003.csv
            …
        Grassland
      Consider…
      •  Dependencies?
      •  File formats?
      •  Time of collection?
      •  Order of analysis?
      Workflows!
  • 22. Design  your  spreadsheet   Constrain  entries     Atomize   Break  down  spreadsheets   From  Flickr  by  Ulleskelf   Planning  
  • 23. Consider a database  Planning  From Mark Schildhauer
      A relational database is
      •  A set of tables
      •  Relationships among the tables
      •  A language to specify & query the tables
      A RDB provides
      •  Scalability: millions+ records
      •  Features for sub-setting, querying, sorting
      •  Reduced redundancy & entry errors
  • 24. Consider  a  database   Planning   You  should  invest  time  in  learning  databases  if      your  data  sets  are  large  or  complex     Consider  investing  time  in  learning  databases  if    your  data  are  small  and  humble    you  ever  intend  to  share  your  data    you  are  <  30  years  old   From  Mark  Schildhauer  
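As a rough illustration of "a set of tables, relationships among the tables, and a language to query them", here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and values are hypothetical placeholders, not data from the talk.

```python
import sqlite3

# An in-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Two tables; plankton_count.site_id refers to site.site_id (the relationship).
conn.executescript("""
CREATE TABLE site (
    site_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE plankton_count (
    count_id INTEGER PRIMARY KEY,
    site_id  INTEGER NOT NULL REFERENCES site(site_id),
    year     INTEGER NOT NULL,
    n        INTEGER NOT NULL CHECK (n >= 0)
);
""")

# Placeholder rows for illustration only.
conn.executemany("INSERT INTO site (site_id, name) VALUES (?, ?)",
                 [(1, "LakeA"), (2, "LakeB")])
conn.executemany("INSERT INTO plankton_count (site_id, year, n) VALUES (?, ?, ?)",
                 [(1, 2001, 40), (1, 2002, 55), (2, 2001, 12)])

# SQL is the query language: total count per site, joined through site_id.
rows = conn.execute("""
    SELECT site.name, SUM(plankton_count.n)
    FROM plankton_count JOIN site USING (site_id)
    GROUP BY site.name
    ORDER BY site.name
""").fetchall()
print(rows)
```

Each site name is stored once, which is where the reduced redundancy and fewer entry errors come from.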
  • 25. Planning   Pick  a  data  repository   Store  your  data  in  a  repository   Institutional  archive   Ask  a  librarian   Discipline/specialty  archive         Repos  of  repos:   databib.org   re3data.org   From  Flickr  by  torkildr  
  • 26. Decide  on  preservation/backup   Planning   What  software?   What  hardware?   What  personnel?   How  often?   Set  up  reminders!   Test  system     From  Flickr    by  withassociates   From  Flickr  by  sepa  synod   From  Flickr  by    taberandrew  
  • 27. Write  a  data   management  plan!   Planning   …document  that   describes  what  you  will   do  with  your  data   throughout     the  research  project   From  Flickr  by  Barbies  Land  
  • 28. DMP components  Planning  From Flickr by Barbies Land
      •  What will be collected
      •  Methods
      •  Standards
      •  Metadata
      •  Sharing/access
      •  Long-term storage
      But they all have different requirements and express them in different ways
  • 29. dmptool.org  Step-by-step wizard for generating DMP  create | edit | re-use | share  Free & open to community  Planning
  • 30. During  Data  Collection  &  Entry   From  Flickr  by  Julia  Manzerova  
  • 31. Keep  raw  data  raw   Realistically:     •  Archive  .csv  version  of  raw  data   •  Make  a  “raw”  tab  in  working  data  file   •  Do  all  work  on  other  tabs   During   collection  
  • 32. Keep  raw  data  raw   Ideally:   •  Use  scripts  to  process  data     •  Save  them  with  data     Raw  data  as  .csv   During   collection   R  script  for  processing  &  analysis  
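One way to picture "use scripts to process data" while keeping the raw file untouched is sketched below in Python (the slide itself mentions an R script). The file names, column names, and the whitespace-stripping step are assumptions made for illustration. The checksum comparison is just a way to confirm the raw file did not change.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Checksum used to confirm that a file has not changed."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def process(raw_path, out_path):
    """Read the raw CSV and write a cleaned copy; the raw file is only opened for reading."""
    with open(raw_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Illustrative cleaning step: strip stray whitespace from every value.
            writer.writerow({key: value.strip() for key, value in row.items()})


# Hypothetical raw file, created in a temporary directory for the demo.
workdir = Path(tempfile.mkdtemp())
raw = workdir / "raw_counts.csv"
raw.write_text("site,count\n LakeA ,40\nLakeB, 12\n")

checksum_before = sha256_of(raw)
clean = workdir / "clean_counts.csv"
process(raw, clean)

# The raw data stayed raw; all changes live in the new file.
assert sha256_of(raw) == checksum_before
print(clean.read_text())
```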
  • 33. Document your workflow  During collection
      Workflow: how you get from the raw data to the final products of your research
      Simple workflow: flow chart
      Temperature data, Salinity data → Data import into Excel → Data in spreadsheet → Quality control & data cleaning → "Clean" T & S data → Analysis: mean, SD → Summary statistics → Graph production
  • 34. Document your workflow   During   collection   Workflow:  how  you  get  from  the  raw  data  to  the  final   products  of  your  research     Simple  workflow:  commented  script   •  R,  SAS,  MATLAB…   •  Well-documented  code  is   Easier  to  review   Easier  to  share   Easier  to  use  for  repeat  analysis   #   %   $   &  
  • 35. Document  your  workflow   During   collection   Fancy  schmancy  workflows   Resulting  output   https://kepler-project.org  
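A commented script of the kind described here might look like the following Python sketch. The temperature readings and the plausible range of 0 to 40 °C are made-up placeholders. The steps mirror the import, quality control, and mean/SD stages named in the flow chart slide.

```python
import statistics

# Step 1: data import.
# Hypothetical temperature readings in degrees C; None marks a missing value.
temperature = [12.1, 12.4, None, 11.9, 99.9, 12.2]

# Step 2: quality control & data cleaning.
# Drop missing values and anything outside an assumed plausible range (0 to 40 C).
clean = [t for t in temperature if t is not None and 0.0 <= t <= 40.0]

# Step 3: analysis (mean and sample standard deviation).
mean_t = statistics.mean(clean)
sd_t = statistics.stdev(clean)

# Step 4: summary statistics, ready for a report or a graph.
print(f"n={len(clean)} mean={mean_t:.2f} sd={sd_t:.2f}")
```

Because each step is commented, someone else (or a later you) can review the workflow or rerun it on new data.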
  • 36. Document  your  workflow   During   collection   Workflows  enable   •  Reproducibility   •  Transparency     •  Reuse     From  Flickr  by  merlinprincesse  
  • 37. Constrain  data  entries   •  Excel  lists   •  Data  validation   •  Google  docs  forms     Modified  from  K.  Vanderbilt     During   collection  
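The same idea as Excel lists and data validation can be sketched in a script: check each entry against an allowed list and a simple rule before accepting it. The site codes, field names, and rules below are hypothetical.

```python
# Hypothetical list of allowed site codes, playing the role of a drop-down list.
ALLOWED_SITES = {"LakeA", "LakeB"}


def validate(row):
    """Return a list of problems found in one data-entry row (empty list means OK)."""
    problems = []
    if row.get("site") not in ALLOWED_SITES:
        problems.append(f"unknown site: {row.get('site')!r}")
    try:
        count = int(row.get("count", ""))
        if count < 0:
            problems.append("count must be non-negative")
    except (TypeError, ValueError):
        problems.append("count must be a whole number")
    return problems


print(validate({"site": "LakeA", "count": "40"}))   # no problems expected
print(validate({"site": "lake a", "count": "-3"}))  # two problems expected
```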
  • 38. Atomize   During   collection   One  piece  of  information  per  cell  
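To illustrate "one piece of information per cell", the sketch below splits a combined cell into separate fields. The cell format ("site; ISO date") and the function name `atomize` are assumptions chosen for the example.

```python
from datetime import date


def atomize(cell):
    """Split a combined 'site; YYYY-MM-DD' cell into one value per field."""
    site, raw_date = (part.strip() for part in cell.split(";"))
    return {"site": site, "date": date.fromisoformat(raw_date)}


# A hypothetical cell that mixes two pieces of information.
print(atomize("Nanaimo; 2010-07-14"))
```

Once the values are separate, each column can be sorted, filtered, and validated on its own.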
  • 39. Break  down  spreadsheets   Fake  a  relational  database   During   collection    Create  parameter  table   Create  a  site  table   From  doi:10.3334/ORNLDAAC/777   From  doi:10.3334/ORNLDAAC/777   From  R  Cook,  ESA  Best  Practices  Workshop  2010  
  • 40. Create metadata  During collection  Why are you promoting Excel?
  • 41. During   collection   Create  metadata      Metadata:  data  reporting     WHO  created  the  data?   WHAT  is  the  content      of  the  data  set?   WHEN  was  it  created?   HOW  was  it  developed?   WHY  was  it  developed?   From  Flickr  by    //ichael  Patric|{     WHERE  was  it  collected?  
  • 42. Create metadata  During collection
      Digital context
      •  Name of the data set
      •  The name(s) of the data file(s) in the data set
      •  Date the data set was last modified
      •  Example data file records for each data type file
      •  Pertinent companion files
      •  List of related or ancillary data sets
      •  Software (including version number) used to prepare/read the data set
      •  Data processing that was performed
      Personnel & stakeholders
      •  Who collected
      •  Who to contact with questions
      •  Funders
      Scientific context
      •  Scientific reason why the data were collected
      •  What data were collected
      •  What instruments (including model & serial number) were used
      •  Environmental conditions during collection
      •  Temporal & spatial resolution
      •  Standards or calibrations used
      Information about parameters
      •  How each was measured or produced
      •  Units of measure
      •  Format used in the data set
      •  Precision & accuracy if known
      Information about data
      •  Definitions of codes used
      •  Quality assurance & control measures
      •  Known problems that limit data use (e.g. uncertainty, sampling problems)
  • 43. Create metadata  During collection  What is metadata?  Standard
      Metadata standards…
      •  Provide structure to describe data: common terms | definitions | language | structure
      •  Come in many flavors: EML, FGDC, ISO19115, DarwinCore,…
      •  Can be met using software tools: Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
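A very small metadata record covering the who/what/when/where/how/why questions could be kept alongside the data as a JSON file. The sketch below is not a metadata standard such as EML. The key names are assumptions and every value is a placeholder to be filled in by the researcher.

```python
import json

# The six questions from the slides, used here as required keys (an assumed convention).
REQUIRED = ["who", "what", "when", "where", "how", "why"]

# Placeholder values only; replace them with real documentation.
metadata = {
    "who": "PLACEHOLDER: who created the data and who to contact",
    "what": "PLACEHOLDER: content of the data set",
    "when": "PLACEHOLDER: collection dates",
    "where": "PLACEHOLDER: collection site(s)",
    "how": "PLACEHOLDER: instruments and methods",
    "why": "PLACEHOLDER: scientific reason for collecting the data",
    "data_files": ["Eaffinis_nanaimo_2010_counts.xls"],
    "units": {"count": "PLACEHOLDER: unit of measure"},
    "codes": {"NA": "PLACEHOLDER: definition of this code"},
}

# Flag any of the six questions that has been left unanswered.
missing = [key for key in REQUIRED if not metadata.get(key)]
print("missing fields:", missing)

# Serialize; this text could be saved next to the data file.
text = json.dumps(metadata, indent=2)
print(text)
```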
  • 44. During   collection   Back  up  daily   Near   Original   From  Flickr  by  see  phar   From  Flickr  by  lippo   Far  
  • 45. Remember  that  data   management  plan?   During   collection   Revisit   Review   Revise   From  Flickr  by  Barbies  Land  
  • 46. From  Flickr  by  purplemattfish   During   collection   Revisit   Review   Revise   Schedule  a  time  each   week  or  month  
  • 47. From  Flickr  by  dipster1   Toolbox  
  • 48. Write a DMP  dmptool.org  Step-by-step wizard for generating DMP  create | edit | re-use | share  Free & open to community
  • 49. Find  a  repository   Where   should  I  put   my  data?   databib.org  
  • 50. Manage & share
      •  Help researchers manage, describe, and share tabular data
      •  Free
      •  Add-in for Excel & web application
  • 51. Manage & share  Features
      1.  Best practices check
      2.  Generate metadata
      3.  Get identifier & citation
      4.  Post data to repository
  • 54. Clean data  Open Refine = Google Refine
      •  Open source desktop application
      •  Used for data cleanup and transformation to other formats
      •  Works with spreadsheets but behaves like a database
      •  User can filter the rows to display using facets that define filtering criteria
  • 56. Get  help   Toolbox:    DCXL  blog:  dcxl.cdlib.org  
  • 57. From  Flickr  by  twm1340   Culture   Shift  Ahead  
  • 58. From  Flickr  by  cdsessums   science   source   notebook   content   access   data   government   knowledge  
  • 59. Make  a   resolution   •  Triage  on  current   projects   •  Get    advisor,  lab  mates,   collaborators  on  board   •  Do  better  next  time   From  Flickr  by  Andy  Graulund  
  • 60. Website   Email   Twitter   Slides   carlystrasser.net   carlystrasser@gmail.com   @carlystrasser     slideshare.net/carlystrasser