Archivematica is an open source digital preservation system. The document discusses Archivematica's APIs, which allow programmatic interaction with its functions. It provides examples of starting transfers, checking statuses, and downloading packages using the APIs and tools like cURL and AMClient.py. The document also explains some Archivematica terminology and concepts related to its APIs.
2. Introduction
● Developer at Artefactual, November 2017
● One of a handful working on the Archivematica Project
● Background in digital preservation:
○ The National Archives UK
○ Archives New Zealand
● Current work interests:
○ Dataverse Integration for Scholars Portal
○ Technical training (next camp, Houston in November!)
○ And of course, improving our API and Docs!
3. What is an API
API stands for Application Programming Interface
A description of a software library or web service and how users and
software agents are expected to interact with it.
A common use of an API is to contribute data to, or retrieve data from, a web
service.
4. There are APIs for Everything!
https://github.com/toddmotto/public-apis
5. Archivematica’s APIs
Two Primary APIs:
Storage Service: https://wiki.archivematica.org/Storage_Service_API
Archivematica: https://wiki.archivematica.org/Archivematica_API
Additionally:
SWORD: https://wiki.archivematica.org/Sword_API
Plus AtoM: https://www.accesstomemory.org/en/docs/2.4/#api
6. Jargon
Endpoints: The API address (URL) that performs a particular action according
to its specification.
SFTP: SSH File Transfer Protocol, a secure (encrypted) way to transfer files.
cURL: Command line tool for transferring and retrieving data.
HTTPie: A simpler, more user-friendly cURL-like tool!
Verbs: Actions associated with an API command, GET (retrieve), POST
(submit).
7. Jargon
Processing configuration: Archivematica’s configuration file which
determines the defaults associated with microservices, e.g. ‘Normalize for
Access’.
Decision Points: Microservices which halt by default unless configured
otherwise.
RPC: Remote procedure call. A method of interacting with the Archivematica
job server where decision points need to be manually moved along.
Devtools: Repository of tools capturing some miscellaneous Archivematica
functionality.
8. Jargon
AMClient.py: A set of procedures that group API calls together, or make it
easier to make an individual call.
CLI: Command-line interface, or terminal. Text-based interaction with the
operating system using special commands.
9. Regular Workflow
> Location configured in Storage Service, (e.g. UUID:
6b82c1f5-2b87-49ba-8e5f-947759201518)
> SFTP data to a location (e.g. /home/transfer-data/)
> Transfer is started via /api/transfer/start_transfer endpoint
> With processingMCP.xml configured to be as automated
as possible, the transfer should run to completion...
11. API Calls
Starting a transfer using cURL:
curl -v -X POST \
  -H "Authorization: ApiKey test:test" \
  --data "name=api-demo-1&type=standard" \
  --data "paths[]=[$(echo -n '6b82c1f5-2b87-49ba-8e5f-947759201518:/home/.../udenver' | base64 -w 0)]" \
  "http://127.0.0.1:62080/api/transfer/start_transfer/"
12. API Calls
Starting a transfer using HTTPie:
http --pretty=format \
  -f \
  POST "http://127.0.0.1:62080/api/transfer/start_transfer/" \
  Authorization:"ApiKey test:test" \
  name="api-demo-1" \
  type="standard" \
  paths[]="[$(echo -n '6b82c1f5-2b87-49ba-8e5f-947759201518:/home/.../udenver' | base64 -w 0)]"
15. --data
Submit as URL Encoded Form:           -f
(Verb) Submit data to the server:     POST
API Endpoint:                         "http://127.0.0.1:62080/api/transfer/start_transfer/"
API User and Key as an HTTP Header:   Authorization:"ApiKey test:test"
Transfer Name:                        name="api-demo-1"
Transfer Type:                        type="standard"
Transfer source UUID and path to the data in the Transfer Source, encoded as Base64:
  paths[]="[$(echo -n '6b82c1f5-2b87-49ba-8e5f-947759201518:/home/.../udenver' | base64 -w 0)]"
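The same request can be assembled in Python using only the standard library. This is a minimal sketch and not part of Archivematica or AMClient.py: the function names `encode_transfer_path` and `build_start_transfer_request` are hypothetical, and the square brackets around the Base64 value simply mirror the commands shown in the slides.

```python
import base64
import urllib.parse
import urllib.request


def encode_transfer_path(location_uuid, path):
    """Base64-encode '<location uuid>:<path>', like `echo -n ... | base64 -w 0`."""
    raw = "{}:{}".format(location_uuid, path).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def build_start_transfer_request(base_url, user, api_key, name,
                                 location_uuid, path, transfer_type="standard"):
    """Build (but do not send) the form-encoded POST for start_transfer."""
    encoded = encode_transfer_path(location_uuid, path)
    form = urllib.parse.urlencode([
        ("name", name),
        ("type", transfer_type),
        # The brackets mirror the cURL and HTTPie commands in the slides.
        ("paths[]", "[{}]".format(encoded)),
    ]).encode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/transfer/start_transfer/",
        data=form,
        headers={"Authorization": "ApiKey {}:{}".format(user, api_key)},
        method="POST",
    )


request = build_start_transfer_request(
    "http://127.0.0.1:62080", "test", "test", "api-demo-1",
    "6b82c1f5-2b87-49ba-8e5f-947759201518", "/home/.../udenver",
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL, headers and body before anything reaches the server.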
16. How do I find the location?
Via the Storage Service UI, or Storage Service API:
http --pretty=format \
  GET "http://127.0.0.1:62081/api/v2/location/" \
  Authorization:"ApiKey test:test"
18. Location :: Response
We can access the structure (JSON) and find the pieces we require to
construct a path:
● "relative_path": "home",
● "uuid": "ef938612-846e-4585-b665-c5596305547b",
Becomes:
● 'ef938612-846e-4585-b665-c5596305547b:/home/archivematica/archivematica-sampledata/api-demo/udenver'
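A minimal Python sketch of that construction, assuming a single location entry has already been pulled out of the JSON response as a dictionary. The function name `transfer_source_path` is hypothetical, and the sketch assumes the directory of interest sits beneath the location's `relative_path`.

```python
import posixpath


def transfer_source_path(location, path_within_location):
    """Combine a location's UUID and relative_path with a directory beneath it."""
    absolute = posixpath.join("/", location["relative_path"], path_within_location)
    return "{}:{}".format(location["uuid"], absolute)


# Values taken from the location response on this slide.
location = {
    "relative_path": "home",
    "uuid": "ef938612-846e-4585-b665-c5596305547b",
}
source_path = transfer_source_path(
    location, "archivematica/archivematica-sampledata/api-demo/udenver")
print(source_path)
```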
19. API Calls
If something went wrong starting the transfer:
HTTP/1.1 403 FORBIDDEN
Connection: keep-alive
Content-Language: en
Content-Type: application/json
Date: Thu, 20 Sep 2018 22:56:29 GMT
Server: nginx
Transfer-Encoding: chunked
Vary: Accept-Language, Cookie
{
"error": true,
"message": "API key not valid."
}
20. API Calls
Or something else went wrong:
HTTP/1.1 500 INTERNAL SERVER ERROR
Server: nginx
Date: Thu, 20 Sep 2018 23:04:22 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Language, Cookie
Content-Language: en
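A script calling the API may want to tell these two failure cases apart. The sketch below is one possible approach, using a hypothetical helper `explain_response`: it reads the `message` field when the body is JSON (as in the 403 example) and falls back to a generic note when it is not (as in the HTML 500 example).

```python
import json


def explain_response(status_code, content_type, body):
    """Summarise an API response as a short, human-readable string."""
    if 200 <= status_code < 300:
        return "OK"
    message = None
    if content_type.startswith("application/json"):
        try:
            payload = json.loads(body)
        except ValueError:
            payload = None
        if isinstance(payload, dict):
            message = payload.get("message")
    if message:
        return "HTTP {}: {}".format(status_code, message)
    # For example, the text/html body returned with the 500 error.
    return "HTTP {}: no JSON error message in response".format(status_code)


print(explain_response(
    403, "application/json",
    '{"error": true, "message": "API key not valid."}'))
print(explain_response(500, "text/html; charset=utf-8", "<html>...</html>"))
```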
21. API Calls
Response if it went well:
{
"message": "Copy successful.",
"path": "/var/archivematica/.../api-demo-1/"
}
HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en
Content-Type: application/json
Date: Thu, 20 Sep 2018 22:51:57 GMT
Server: nginx/1.14.0
Transfer-Encoding: chunked
Vary: Accept-Language, Cookie
22. Next Steps
Approve Transfer:
http --pretty=format \
  -f \
  POST "http://127.0.0.1:62080/api/transfer/approve" \
  Authorization:"ApiKey test:test" \
  type="standard" \
  directory="api-demo-1"
23. And what’s going on there?
Submit as URL Encoded Form:           -f
(Verb) Submit data to the server:     POST
API Endpoint:                         "http://127.0.0.1:62080/api/transfer/approve"
API User and Key as an HTTP Header:   Authorization:"ApiKey test:test"
Transfer type:                        type="standard"
Directory to approve (consider that this directory has become 'api-demo-1' in the
watched transfers folder):            directory="api-demo-1"
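As a rough Python equivalent of the approve call, again using only the standard library. The helper name `build_approve_request` is hypothetical; the form fields and endpoint are the ones listed on this slide.

```python
import urllib.parse
import urllib.request


def build_approve_request(base_url, user, api_key, directory,
                          transfer_type="standard"):
    """Build (but do not send) the form-encoded POST that approves a transfer."""
    form = urllib.parse.urlencode(
        {"type": transfer_type, "directory": directory}).encode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/transfer/approve",
        data=form,
        headers={"Authorization": "ApiKey {}:{}".format(user, api_key)},
        method="POST",
    )


# 'directory' is the name the transfer was started with.
approve = build_approve_request(
    "http://127.0.0.1:62080", "test", "test", "api-demo-1")
# To actually send it: urllib.request.urlopen(approve)
```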
26. Outputs become inputs...
Our transfer UUID can be used for a status update:
● "uuid": "1a06315f-0962-4f73-9f89-65d26c087a2b"
http --pretty=format \
  -f \
  GET "http://127.0.0.1:62080/api/transfer/status/1a06315f-0962-4f73-9f89-65d26c087a2b/" \
  Authorization:"ApiKey test:test"
27. Which provides a SIP UUID
{
"directory": "api-demo-1-1a06315f-0962-4f73-9f89-65d26c087a2b",
"message": "Fetched status for 1a06315f-0962-4f73-9f89-65d26c087a2b successfully.",
"microservice": "Check transfer directory for objects",
"name": "api-demo-1",
"path": "/var/archivematica/.../api-demo-1-1a06315f-0962-4f73-9f89-65d26c087a2b/",
"sip_uuid": "d29d91ff-1dff-4a4b-a855-2556d2ed3534",
"status": "COMPLETE",
"type": "transfer",
"uuid": "1a06315f-0962-4f73-9f89-65d26c087a2b"
}
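A small sketch of the "outputs become inputs" step in Python: take the transfer status JSON, pull out the SIP UUID, and build the ingest status URL from it. Both function names are hypothetical, and waiting for COMPLETE before trusting `sip_uuid` is a cautious assumption rather than documented behaviour.

```python
import json


def sip_uuid_from_transfer_status(status):
    """Return the SIP UUID once the transfer reports COMPLETE, else None."""
    if status.get("status") != "COMPLETE":
        return None
    return status.get("sip_uuid")


def ingest_status_url(base_url, sip_uuid):
    """Build the ingest status URL for a SIP UUID."""
    return "{}/api/ingest/status/{}/".format(base_url.rstrip("/"), sip_uuid)


# Abbreviated copy of the response on this slide.
transfer_status = json.loads("""
{
  "name": "api-demo-1",
  "sip_uuid": "d29d91ff-1dff-4a4b-a855-2556d2ed3534",
  "status": "COMPLETE",
  "type": "transfer",
  "uuid": "1a06315f-0962-4f73-9f89-65d26c087a2b"
}
""")
sip_uuid = sip_uuid_from_transfer_status(transfer_status)
if sip_uuid is not None:
    print(ingest_status_url("http://127.0.0.1:62080", sip_uuid))
```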
28. Outputs become inputs...
Our SIP UUID can be used for a status update as well:
● "uuid": "d29d91ff-1dff-4a4b-a855-2556d2ed3534"
http --pretty=format \
  -f \
  GET "http://127.0.0.1:62080/api/ingest/status/d29d91ff-1dff-4a4b-a855-2556d2ed3534/" \
  Authorization:"ApiKey test:test"
29. Which tells us where our AIP is...
{
"directory": "api-demo-1-d29d91ff-1dff-4a4b-a855-2556d2ed3534",
"message": "Fetched status for d29d91ff-1dff-4a4b-a855-2556d2ed3534 successfully.",
"microservice": "Remove the processing directory",
"name": "api-demo-1",
"path": "/var/archivematica/.../api-demo-1-d29d91ff-1dff-4a4b-a855-2556d2ed3534/",
"status": "COMPLETE",
"type": "SIP",
"uuid": "d29d91ff-1dff-4a4b-a855-2556d2ed3534"
}
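In a script, a status endpoint like this is typically polled until it reports COMPLETE. The sketch below shows one way to do that, assuming a caller-supplied `fetch_status` function that performs the HTTP request and returns the parsed JSON. The name `wait_until_complete` is hypothetical, and "PROCESSING" in the demo is only a placeholder for any status other than COMPLETE.

```python
import time


def wait_until_complete(fetch_status, uuid, interval_seconds=5.0, max_attempts=60):
    """Call fetch_status(uuid) repeatedly until its 'status' is COMPLETE."""
    for attempt in range(max_attempts):
        status = fetch_status(uuid)
        if status.get("status") == "COMPLETE":
            return status
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(
        "{} not COMPLETE after {} attempts".format(uuid, max_attempts))


# Demo with canned responses instead of real HTTP calls.
canned = iter([
    {"status": "PROCESSING"},  # placeholder for "not finished yet"
    {"status": "PROCESSING"},
    {"status": "COMPLETE", "uuid": "d29d91ff-1dff-4a4b-a855-2556d2ed3534"},
])
final = wait_until_complete(
    lambda uuid: next(canned),
    "d29d91ff-1dff-4a4b-a855-2556d2ed3534",
    interval_seconds=0,
    max_attempts=5,
)
print(final["status"])
```

Passing the fetch function in keeps the loop independent of any particular HTTP client and easy to test.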
31. Reingest.py
● Script created for the Canadian Centre for Architecture (CCA)
● Link: https://bit.ly/2zlJzqB
● Relies on being run and re-run, e.g. via cron
● Once a status of COMPLETE for a SIP is returned, a new job is started
● Automation-tools transfer.py was the basis for that work
32. Point of Note
The Storage Service is the better place to retrieve an update on a package’s
status. COMPLETE in Archivematica is simply a status from the microservice
processing, so…
http --pretty=format \
  GET "http://127.0.0.1:62081/api/v2/file/d29d91ff-1dff-4a4b-a855-2556d2ed3534/" \
  Authorization:"ApiKey test:test"
34. Working with Packages
Download a package:
http \
  GET "http://127.0.0.1:62081/api/v2/file/d29d91ff-1dff-4a4b-a855-2556d2ed3534/download/" \
  Authorization:"ApiKey test:test"
+-----------------------------------------+
| NOTE: binary data not shown in terminal |
+-----------------------------------------+
35. Working with Packages
Download an individual file:
http \
  GET "http://127.0.0.1:62081/api/v2/file/{sip-uuid}/extract_file/?relative_path_to_file=api-demo-1-{sip-uuid}/data/METS.{sip-uuid}.xml" \
  Authorization:"ApiKey test:test"
Archivematica AIP Structure:
https://www.archivematica.org/en/docs/archivematica-1.7/user-manual/archival-storage/aip-structure/
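Because the SIP UUID appears three times in the extract_file URL, it is easy to get wrong by hand. A minimal Python sketch for building it, with hypothetical function names, following the pattern shown on this slide:

```python
import urllib.parse


def extract_file_url(ss_url, sip_uuid, transfer_name, file_in_package):
    """Build the Storage Service URL that extracts one file from a package."""
    relative_path = "{}-{}/{}".format(transfer_name, sip_uuid, file_in_package)
    # safe="/" keeps the slashes readable, as in the slide's URL.
    query = urllib.parse.urlencode(
        {"relative_path_to_file": relative_path}, safe="/")
    return "{}/api/v2/file/{}/extract_file/?{}".format(
        ss_url.rstrip("/"), sip_uuid, query)


def mets_url(ss_url, sip_uuid, transfer_name):
    """URL for the METS file at data/METS.<sip-uuid>.xml."""
    return extract_file_url(
        ss_url, sip_uuid, transfer_name, "data/METS.{}.xml".format(sip_uuid))


print(mets_url(
    "http://127.0.0.1:62081",
    "d29d91ff-1dff-4a4b-a855-2556d2ed3534",
    "api-demo-1",
))
```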
37. AMClient.py
● Wraps API functionality as a CLI tool.
● Used inside the Automation Tools, and Automated Testing. It is also a very
convenient script for developers to use…
● Examples:
python -m transfers.amclient get-pipelines \
  --ss-user-name test \
  --ss-url http://127.0.0.1:62081 \
  test | python -m json.tool
python -m transfers.amclient aips \
  --ss-user-name test \
  --ss-url http://127.0.0.1:62081 \
  test | python -m json.tool
38. Devtools - mcp-rpc-cli
If the processing configuration accidentally contains decision points but you
still want to move a transfer forward automatically, use mcp-rpc-cli:
● Demo: https://asciinema.org/a/195109
● Archivematica Documentation: https://bit.ly/2OIUzUu
Support for this tool is unlikely after Archivematica 1.8
39. Looking to the future
Just slightly beyond Archivematica 1.8, and a little bit in 1.8 too… The API
becomes resource oriented, masks complexity (e.g. approving a transfer is no
longer a separate second step), and follows more considered API design practice.
http --pretty=format \
  POST 127.0.0.1:62080/api/v2beta/package \
  Authorization:"ApiKey test:test" \
  processing_config=automated \
  path=$(echo -n '/home/archivematica/archivematica-sampledata/api-demo/udenver' | base64 -w 0) \
  name=v2_beta_endpoint_example
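Without `-f`, HTTPie sends the `key=value` items as a JSON body, so a Python equivalent would post JSON rather than a URL-encoded form. The sketch below assumes exactly the three fields used in the command above; the helper name `build_package_request` is hypothetical, and the beta endpoint may change.

```python
import base64
import json
import urllib.request


def build_package_request(base_url, user, api_key, name, path,
                          processing_config="automated"):
    """Build (but do not send) the JSON POST for the v2beta package endpoint."""
    body = json.dumps({
        "name": name,
        "processing_config": processing_config,
        # The path is Base64-encoded, as in the HTTPie command.
        "path": base64.b64encode(path.encode("utf-8")).decode("ascii"),
    }).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v2beta/package",
        data=body,
        headers={
            "Authorization": "ApiKey {}:{}".format(user, api_key),
            "Content-Type": "application/json",
        },
        method="POST",
    )


package_request = build_package_request(
    "http://127.0.0.1:62080", "test", "test", "v2_beta_endpoint_example",
    "/home/archivematica/archivematica-sampledata/api-demo/udenver",
)
# To actually send it: urllib.request.urlopen(package_request)
```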