Open Source

Each of the following sections breaks down one area of this diagram and lists the purpose of each service in that area.

Authentication

Services that manage users.

Service	Description	Technologies	Status	Key Contributors
Identity	A central service for account creation, login, logout, managing user info, and similar tasks. Other services query the Identity service to get information about a user. It's hosted here. People can create and manage accounts using the Archivers 2.0 webapp.	Go	alpha	@b5
IdentityDB	Database of all user identities. The only way to access it is through the Identity service.	Postgres	alpha	@b5
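Other services never read IdentityDB directly. They ask the Identity service instead. The Go sketch below illustrates that access pattern. The `/users/{id}` endpoint, the `User` fields, and the base URL are hypothetical and are not the Identity service's documented API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// User is a hypothetical shape for the user info the Identity service returns.
type User struct {
	ID       string `json:"id"`
	Username string `json:"username"`
}

// IdentityClient talks to the Identity service over HTTP; it never touches IdentityDB.
type IdentityClient struct {
	BaseURL string // hypothetical, e.g. "https://identity.example.org"
	HTTP    *http.Client
}

// NewIdentityClient sets a request timeout so a slow Identity service
// cannot block the calling service indefinitely.
func NewIdentityClient(baseURL string) *IdentityClient {
	return &IdentityClient{BaseURL: baseURL, HTTP: &http.Client{Timeout: 5 * time.Second}}
}

// userURL builds the hypothetical lookup URL, escaping the ID as a path segment.
func (c *IdentityClient) userURL(id string) string {
	return c.BaseURL + "/users/" + url.PathEscape(id)
}

// GetUser fetches one user's info from the Identity service.
func (c *IdentityClient) GetUser(id string) (*User, error) {
	resp, err := c.HTTP.Get(c.userURL(id))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("identity: unexpected status %d", resp.StatusCode)
	}
	var u User
	if err := json.NewDecoder(resp.Body).Decode(&u); err != nil {
		return nil, err
	}
	return &u, nil
}
```

Keeping every lookup behind one client type means IdentityDB's schema can change without touching the services that consume user data.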

Guidance

Services that guide Data Rescue efforts.

Service	Description	Technologies	Status	Key Contributors
Agency Primers	Spreadsheet of agencies & sub-agencies for archiving.	Google Sheets, Airtable	in use	@mayaad, @trinberg, Andrew Bergman
Chrome Extension	Chrome extension to nominate government data that needs to be preserved.	Javascript, Chrome Extension	in use	@ates, @titaniumbones
Uncrawlables Spreadsheet	Google Sheet of uncrawlable content, populated with the Chrome extension's output.	Google Sheets	in use

Reporting

Services for collecting & delivering platform-wide reports.

Service	Description	Technologies	Status	Key Contributors
Stats	Server that periodically collects key stats platform-wide and publishes them for easy public consumption. Ideally this service will just consume the JSON APIs of all the other services and output a dashboard. Many existing frameworks already do this, so we should research and pick one.	planning to use an existing solution	not yet started
Health	Server that periodically polls all services on the platform and outputs a page showing when a service is down or malfunctioning. Status-check frameworks for this already exist; we should research and pick one.	planning to use an existing solution	not yet started

Archiving 1.0

Current services for downloading & storing content.

Service	Description	Technologies	Status	Key Contributors
Archivers 1.0	App for volunteers to research URLs, add metadata, and upload archived .zip files.	Javascript, MeteorJS, ReactJS	in use	@kmcculloch, @danielballan, @b5
Archivers 1.0 DB	Backing database for Archivers 1.0.	MongoDB	in use	@b5
S3-Upload-Server	Server for making uploads to S3 buckets via the browser or AWS CLI tokens.	Go	in use	@b5
Zip-Starter	Server for generating base metadata zip archives.	Go	in use	@b5

Archiving 2.0

New services for archiving & describing content.

Service	Description	Technologies	Status	Key Contributors
Archivers 2.0	Webapp for volunteers to add metadata to URLs & content.	Javascript, ReactJS, Redux	alpha	@b5
Patchbay	Backing service for the Archivers 2.0 app. It coordinates real-time communication between users of the app and performs on-the-fly archiving of undiscovered URLs.	Go	alpha	@b5
Miru	A website monitoring tool that periodically checks a site for changes and runs custom scripts. Miru takes user-contributed scripts capable of extracting uncrawlable content, executes them, and records their results in a uniform format.	Go	beta	@zsck
Miru DB	Miru's backing database.	SQLite	beta	@zsck
Recipes	A collection of strategies (recipes) for dealing with various types of uncrawlable content.	Various Languages	under construction	@jeffreyliu
ArchiveDB	Database of archived content & metadata. The schema is outlined here.	Postgres	alpha
ArchiveContent S3 Bucket	A large S3 bucket for saving and reading content.	S3	alpha
Sentry	Sentry is a web crawler that continually scans for pages that haven't been checked in a while (or ever) and generates snapshots of what it finds.	Go	planned	@b5

Site Monitoring

Services for tracking changes to websites.

Service	Description	Technologies	Status	Key Contributors
Web Monitoring DB	Website Monitoring project: a more automated version of page monitoring with Versionista (proof of concept for now)	Ruby	under construction	@Mr0grog
Web Monitoring Differ	Diffing service for the website monitoring project	Javascript	under construction	@WestleyArgentum
Web Monitoring Processing	Website Monitoring project: data processing, PageFreezer integration, and (eventually) diff filtering and processing	Jupyter, Python	under construction	@danielballan
Web Monitoring UI	Website Monitoring project: enable analysts to quickly assess changes to monitored government websites	Javascript	under construction	@lightandluck
PageFreezer	Page freezing / archiving service	external service	integrating	@danielballan
Versionista	Page freezing / archiving service	external service	in use	@danielballan
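The Web Monitoring Differ compares two captured versions of a page. The Go sketch below shows a basic line-level diff based on a longest common subsequence (LCS) table. It is a generic textbook technique, swapped in for illustration, and not the Differ's actual algorithm (that service is written in JavaScript).

```go
package main

import "strings"

// Op marks a line as kept, inserted, or deleted.
type Op int

const (
	Equal Op = iota
	Insert
	Delete
)

// Edit is one line of diff output.
type Edit struct {
	Op   Op
	Line string
}

// DiffLines computes a line-level diff of two texts using an LCS table.
// It takes O(n*m) time and memory: fine for a sketch, slow for huge pages.
func DiffLines(a, b string) []Edit {
	x, y := strings.Split(a, "\n"), strings.Split(b, "\n")
	n, m := len(x), len(y)
	// lcs[i][j] holds the LCS length of x[i:] and y[j:].
	lcs := make([][]int, n+1)
	for i := range lcs {
		lcs[i] = make([]int, m+1)
	}
	for i := n - 1; i >= 0; i-- {
		for j := m - 1; j >= 0; j-- {
			if x[i] == y[j] {
				lcs[i][j] = lcs[i+1][j+1] + 1
			} else if lcs[i+1][j] >= lcs[i][j+1] {
				lcs[i][j] = lcs[i+1][j]
			} else {
				lcs[i][j] = lcs[i][j+1]
			}
		}
	}
	// Walk the table forwards to emit edits in document order.
	var out []Edit
	i, j := 0, 0
	for i < n && j < m {
		switch {
		case x[i] == y[j]:
			out = append(out, Edit{Equal, x[i]})
			i++
			j++
		case lcs[i+1][j] >= lcs[i][j+1]:
			out = append(out, Edit{Delete, x[i]})
			i++
		default:
			out = append(out, Edit{Insert, y[j]})
			j++
		}
	}
	for ; i < n; i++ {
		out = append(out, Edit{Delete, x[i]})
	}
	for ; j < m; j++ {
		out = append(out, Edit{Insert, y[j]})
	}
	return out
}
```

For example, diffing `"a\nb\nc"` against `"a\nc\nd"` yields: keep `a`, delete `b`, keep `c`, insert `d`. Diffing at the word or HTML-element level would follow the same structure with a different tokenizer.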

Distribution

Services for disseminating content & data to others.

Service	Description	Technologies	Status
API	JSON API that wraps and publishes as many platform services as possible, including platform users, archived content, archived metadata, and web-monitoring diffs.	JSON	planned
Bag-Gen	A server that generates bags for bag-oriented data hosting services (CKAN, Dataverse, etc.). It is planned as a Python wrapper around the Python BagIt library, turning that library into a server that can generate bags from archived content.	Python	planned
IPFS-Node	A bundle of existing frameworks to publish & synchronize archived content with the InterPlanetary File System (IPFS).	Go	planned
Dat-Gen	Service for generating & hosting dat-data project packages. It is planned as a lightweight Node.js wrapper around the dat library, capable of translating archived content into dat projects.	Javascript, Node.js	planned

Coordination

Services for integrating with other services.

Service	Description	Status
Coordinator	Service that talks to other archiving services about the content & metadata they hold, using prewritten integrations to translate to and from each service. It works much like the current Miru implementation and could be implemented either as another Miru instance or as a fork (implementing it as an instance is preferred).	planned
Coordinator DB	Cache of data we've received from other services in a format that matches archiveDB.	planned
Integrations	Series of recipe-repos that map external data sources & destinations.	planned
