#RADC4L16: An API-First Archives Approach at NPR
1. Transcending Traditional Systems and Labels: An API-First Archives Approach
Camille Salas | Product Owner Will Boyd | Developer John Nelson | Developer
@NPR_RAD | @unacamisa | @jhnsln
Research, Archives, and Data Strategy
3. Let there be metadata for many, many reasons
Broadcast Librarians would apply standardized program information to a new
system in 1973 and would work backwards chronologically to input previously
existing programs - CPB Grant Application
“... [T]he licensee of each station must maintain a station log . . . that reflects the
station operation” - FCC Requirements
4. 21st century solutions
First CMS
● Similar to an Access Database
● Manual entry of metadata time consuming
● Limited reporting options
Artemis 1.0
2011
Open source software
● Communicates with our digital production system via API
● Increase of workflow efficiencies/ingest of transcripts
However
● Upgrading was not an option/too customized
Artemis 2.0
2016
In-house development solution
● Focus on the metadata first
● An eye towards the future
● All media - videos and podcasts
● Represent our new name
5. Solution: An API-First Approach
Lessons Learned From Previous Systems Implementation:
NPR has unique archival, search, and business needs
6. Growing business needs and a challenge
What system can we put in place to respond to NPR’s ever-growing business needs?
Requirements
Flexible and responsive to our growing needs
Searchable system
Keep up with the pace of new daily content
Generate reports
Access to our historical audio for not only research but future re-use
Reduce our metadata entry workflow
7. Early 2015
Started with building of an API-first application
Develop our data model
We called it the “Trapper Keeper”
Frame our MVPs
Ingest Content, Edit our Metadata, Search Functionality, Distribute Content
Spring 2016
Work with our in-house UI team on new updated interface
Launch within sight
Wrap up MVPs and test
8. How did we get here? ¯_(ツ)_/¯
● Our data model was probably the most important part of the process. We needed
something that could be flexible enough to accommodate our growing digital
archive needs while also being able to handle.
● We settled on the analogy of a Trapper Keeper instead of using terms like “stuff” and “things”.
● I personally voted for “stuff and things”, but got vetoed. I guess the “stuff and things API” isn’t really that catchy.
9. Development Process
● We didn’t need to have everything worked out up front!
● Hypermedia over RESTful
● Managing current product with new development!
● Microservices-ISH
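To illustrate the "Hypermedia over RESTful" bullet, here is a minimal sketch of the general idea: the response carries links to related resources, so a client follows link relations instead of hardcoding URL patterns. The `_links`/`href` layout loosely follows the HAL convention; the identifiers, paths, and the `follow` helper are hypothetical and not taken from the Artemis API.

```python
# Hypothetical hypermedia-style response. Every id and path below is made up
# for illustration; none of it comes from the real Artemis API.
story = {
    "id": "story-123",
    "title": "Example archived segment",
    "_links": {
        "self": {"href": "/items/story-123"},
        "audio": {"href": "/items/story-123/audio"},
        "transcript": {"href": "/items/story-123/transcript"},
    },
}


def follow(resource: dict, rel: str) -> str:
    """Return the URL for a link relation, or raise if the server did not offer it."""
    links = resource.get("_links", {})
    if rel not in links:
        raise KeyError(f"no '{rel}' link on this resource")
    return links[rel]["href"]


# The client asks for a relation by name rather than building the URL itself.
print(follow(story, "transcript"))
```

Because the client only knows relation names, the server can reorganize its URLs without breaking the front end.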
11. The Angular front end
● My Angular perspective, coming from mostly Backbone and jQuery
○ Intimidating to get into
○ Angular docs have complicated examples
○ Once you get it, it’s EASY and FAST
● Automated everything
○ Set up with Yeoman and Jenkins
○ On every push: pull code, install, test, serve the code
○ 5-minute install with npm and Bower
12. My favorite front-end features
● Every application state is stored in the URL
● We created a simple CMS for catalogers and it gives the cataloger MORE POWER
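Storing application state in the URL means a search can be bookmarked, shared, or restored from the address bar. The sketch below shows the round trip in a language-neutral way using Python's standard library; it is not the Angular code, and the state keys and values (`q`, `program`, `page`) are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode


def state_to_query(state: dict) -> str:
    """Serialize UI state to a query string with keys in a stable order."""
    return urlencode(sorted(state.items()), doseq=True)


def query_to_state(query: str) -> dict:
    """Rebuild the state dict; single values are unwrapped from their lists."""
    parsed = parse_qs(query)
    return {k: v[0] if len(v) == 1 else v for k, v in parsed.items()}


# Hypothetical search state: a query, two selected programs, and a page number.
state = {"q": "trapper keeper", "program": ["ATC", "ME"], "page": "2"}
query = state_to_query(state)
print(query)
assert query_to_state(query) == state
```

One caveat of this simple scheme: a list holding a single value comes back as a plain string, so a real implementation would need to know which keys are multi-valued.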
19. Wait, where is the business / domain logic?
● So there is front-end code and an API.
○ Some logic in the API schema documents
○ Some logic is stuffed into the front end
● What about stuff that needs to run on a server?
○ OAuth 2.0
○ Connect to Active Directory?
● So we built a proxy layer. Another API, specific for Artemis logic.
20. The proxy. An API in front of the API.
● Built with Node.js and Express
● We keep finding uses for it:
○ Authenticate users with OAuth 2.0
○ Connect to other internal NPR APIs
○ Query the NPR RAD taxonomy
○ Run bulk updates
○ Get and set user preferences (part of the authentication endpoint)
○ Export search results
○ Browse the massive filesystem that stores all of our archived WAV files
24. The proxy
● An authentication microservice between the UI and the API.
● Built with Node.js and Express
● We keep finding uses for it:
○ Authenticate users with OAuth 2.0
○ Connect to other internal NPR APIs
○ Query the NPR RAD taxonomy
○ Run bulk updates
○ Get and set user preferences
○ Export search results
○ Browse the mounted filesystem with our WAV files
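The shape of "an API in front of the API" can be sketched as a wrapper that checks credentials and then forwards the request. This is a language-neutral illustration in Python, not the Node.js/Express proxy itself; the request and response dictionaries, `make_proxy`, and the stand-in validator and upstream handler are all hypothetical.

```python
from typing import Callable

Request = dict   # assumed shape: {"path": str, "headers": dict}
Response = dict  # assumed shape: {"status": int, "body": object}


def make_proxy(
    validate_token: Callable[[str], bool],
    upstream: Callable[[Request], Response],
) -> Callable[[Request], Response]:
    """Wrap an upstream API handler with a bearer-token check."""

    def handle(request: Request) -> Response:
        auth = request.get("headers", {}).get("Authorization", "")
        scheme, _, token = auth.partition(" ")
        if scheme != "Bearer" or not validate_token(token):
            return {"status": 401, "body": "unauthorized"}
        # Token accepted: forward the request unchanged to the API behind the proxy.
        return upstream(request)

    return handle


# Stand-ins for illustration only; a real proxy would call an OAuth 2.0
# provider and the real API over HTTP.
def fake_validate(token: str) -> bool:
    return token == "good-token"


def fake_api(request: Request) -> Response:
    return {"status": 200, "body": {"path": request["path"]}}


proxy = make_proxy(fake_validate, fake_api)
print(proxy({"path": "/search", "headers": {"Authorization": "Bearer good-token"}}))
print(proxy({"path": "/search", "headers": {}}))
```

Keeping the check in one wrapper is what lets the same layer absorb later duties such as bulk updates or exports without touching the front end or the underlying API.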
25. Migration hardship
● MySQL to NoSQL requires a LOT of HTTP calls
● Querying a year of data in MySQL on Old Artemis takes less than a minute.
● Inserting that data into DynamoDB and then Elastic takes about an hour.
● Mistakes add lots of time.
● HTTP calls will be dropped.
● Duplicates will be made.
● Re-migration is inevitable.
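One common way to live with dropped calls, duplicates, and re-migration is to make every write idempotent: derive the new key deterministically from the legacy primary key and retry failed writes. The sketch below assumes that approach, which is not necessarily what the Artemis migration did; `FlakyStore`, `record_key`, `migrate`, and the `legacy_id` field are hypothetical, and the store is an in-memory stand-in rather than DynamoDB.

```python
import hashlib


class FlakyStore:
    """In-memory stand-in for a remote store; fails every third write."""

    def __init__(self):
        self.items = {}
        self.calls = 0

    def put(self, key, record):
        self.calls += 1
        if self.calls % 3 == 0:
            raise ConnectionError("simulated dropped HTTP call")
        self.items[key] = record  # same key overwrites, so retries cannot duplicate


def record_key(record: dict) -> str:
    """Derive a deterministic key from the legacy primary key."""
    return hashlib.sha1(f"legacy:{record['legacy_id']}".encode()).hexdigest()


def migrate(records, store, max_attempts=5):
    """Write each record, retrying failed puts; return records that never succeeded."""
    failed = []
    for record in records:
        for _ in range(max_attempts):
            try:
                store.put(record_key(record), record)
                break
            except ConnectionError:
                continue
        else:
            failed.append(record)
    return failed


rows = [{"legacy_id": i, "title": f"Story {i}"} for i in range(10)]
store = FlakyStore()
failed = migrate(rows, store)
failed_again = migrate(rows, store)  # re-running the migration adds no duplicates
print(len(store.items), len(failed), len(failed_again))
```

With deterministic keys, re-running a whole year of data costs time but cannot create duplicates, since each retry overwrites the same item.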
26. Lessons learned: a Product Owner’s view
● API-First means our front end options were limitless; deploying our front end
became simpler and more frequent
● Documentation is key
● Plan on backup resources just in case - special thanks @sayrahknight
● Managing expectations with users is challenging
● Make scrum fit your team’s needs