Each of the following sections breaks down an area of this diagram, listing the purpose of each service in that area.
Services that manage users.
Service | Description | Technologies | Status | Key Contributors |
---|---|---|---|---|
Identity | A central service for account creation, login, logout, managing user info, etc. Other services can talk to the identity service to get information about a user (a client sketch follows this table). It's hosted here. Normal humans can create & manage accounts using the Archivers 2.0 webapp. | Go | alpha | @b5 |
IdentityDB | Database of all user identities. The only way to talk to it is through the identity service. | Postgres | alpha | @b5 |
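The Identity row above says other services can ask it for user info. Here is a minimal client sketch under stated assumptions: a JSON response body and a `/users/{id}` route, neither of which is confirmed; the real routes and schema live in the identity service repo.

```go
package identity

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// User mirrors the kind of record the identity service might return.
// These field names are illustrative, not the service's actual schema.
type User struct {
	ID       string `json:"id"`
	Username string `json:"username"`
	Email    string `json:"email"`
}

// FetchUser asks the identity service for a user by id.
// The /users/{id} route is an assumption for illustration.
func FetchUser(baseURL, id string) (*User, error) {
	resp, err := http.Get(baseURL + "/users/" + id)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("identity service: %s", resp.Status)
	}
	u := &User{}
	if err := json.NewDecoder(resp.Body).Decode(u); err != nil {
		return nil, err
	}
	return u, nil
}
```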
Services that guide Data Rescue efforts.
Service | Description | Technologies | Status | Key Contributors |
---|---|---|---|---|
Agency Primers | Spreadsheet of agencies & sub-agencies for archiving. | Google Sheets, Airtable | in use | @mayaad, @trinberg, Andrew Bergman |
Chrome Extension | Chrome extension to nominate government data that needs to be preserved. | Javascript, Chrome Extension | in use | @ates, @titaniumbones |
Uncrawlables Spreadsheet | The Chrome extension dumps its output to a Google Sheet of uncrawlable content (a sketch of one nomination row follows this table). | Google Sheets | in use | |
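For a rough idea of what one uncrawlables row carries, here is a sketch of a nomination record as a Go struct. Every field name is an assumption; the real columns are whatever the extension actually writes to the sheet.

```go
package nominate

import "time"

// Nomination sketches the fields a Chrome-extension nomination
// might carry before being appended to the uncrawlables sheet.
// All field names are assumptions, not the extension's schema.
type Nomination struct {
	URL         string    // page the volunteer flagged
	Agency      string    // agency or sub-agency it belongs to
	Reason      string    // why the page is considered uncrawlable
	NominatedBy string    // volunteer identifier
	NominatedAt time.Time // when the nomination was made
}
```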
Services for collecting & delivering platform-wide reports.
Service | Description | Technologies | Status | Key Contributors |
---|---|---|---|---|
Stats | Server that periodically collects key stats platform-wide & reports them for easy public consumption. This service will ideally just consume the JSON APIs of all the other services & output a dashboard; lots of frameworks out in the wild do this, so we should research & pick one. | planning to use an existing solution | not yet started | |
Health | Server that periodically polls all services on the platform & outputs a page that shows when a service is down / malfunctioning (a polling sketch follows this table). Status-check frameworks for this already exist; we should research & pick one. | planning to use an existing solution | not yet started | |
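Even though Health will likely be an off-the-shelf framework, the core loop is simple enough to sketch. This assumes each service exposes an HTTP endpoint that answers 2xx when healthy; the endpoint URLs would come from configuration.

```go
package health

import (
	"net/http"
	"time"
)

// Check records the result of polling one service.
type Check struct {
	Service string
	URL     string
	Up      bool
	Checked time.Time
}

// Poll hits each service URL and marks it up if it answers 2xx
// within the timeout. An existing status-check framework would
// replace this loop in practice.
func Poll(services map[string]string) []Check {
	client := &http.Client{Timeout: 5 * time.Second}
	checks := make([]Check, 0, len(services))
	for name, url := range services {
		c := Check{Service: name, URL: url, Checked: time.Now()}
		if resp, err := client.Get(url); err == nil {
			c.Up = resp.StatusCode >= 200 && resp.StatusCode < 300
			resp.Body.Close()
		}
		checks = append(checks, c)
	}
	return checks
}
```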
Current services for downloading & storing content.
Service | Description | Technologies | Status | Key Contributors |
---|---|---|---|---|
Archivers 1.0 | App for volunteers to research urls, add metadata, and upload archived .zips. | Javascript, MeteorJS, ReactJS | in use | @kmcculloch, @danielballan, @b5 |
Archivers 1.0 DB | Backing database for Archivers 1.0. | MongoDB | in use | @b5 |
S3-Upload-Server | Server for making uploads to S3 buckets via browser or AWS CLI tokens. | Go | in use | @b5 |
Zip-Starter | Server for generating base metadata zip archives (sketched below this table). | Go | in use | @b5 |
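Zip-Starter's job, seeding a zip with base metadata, can be sketched with Go's standard archive/zip package. The `Meta` fields here are guesses; the real metadata schema lives in the Zip-Starter repo.

```go
package zipstarter

import (
	"archive/zip"
	"encoding/json"
	"io"
)

// Meta is a guess at the base metadata a starter archive carries.
type Meta struct {
	URL string `json:"url"`
	ID  string `json:"id"`
}

// WriteBaseZip writes a zip containing a single metadata.json file,
// the starting point a volunteer adds archived content to.
func WriteBaseZip(w io.Writer, m Meta) error {
	zw := zip.NewWriter(w)
	f, err := zw.Create("metadata.json")
	if err != nil {
		return err
	}
	if err := json.NewEncoder(f).Encode(m); err != nil {
		return err
	}
	return zw.Close()
}
```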
New services for archiving & describing content.
Service | Description | Technologies | Status | Key Contributors |
---|---|---|---|---|
Archivers 2.0 | Webapp for volunteers to add metadata to urls & content. | Javascript, ReactJS, Redux | alpha | @b5 |
Patchbay | Backing service for the Archivers 2.0 app. It coordinates realtime communication between users of the app and does on-the-fly archiving of undiscovered urls. | Go | alpha | @b5 |
Miru | A website monitoring tool that periodically checks a site for changes & runs custom scripts. Miru takes user-contributed scripts capable of extracting uncrawlable content & executes them, recording their results in a uniform format (sketched below this table). | Go | beta | @zsck |
Miru DB | Miru's backing database. | SQLite | beta | @zsck |
Recipes | A collection of strategies (recipes) for dealing with various types of uncrawlable content. | Various Languages | under construction | @jeffreyliu |
ArchiveDB | Database of archived content & metadata. Schema is outlined here. | Postgres | alpha | |
ArchiveContent S3 Bucket | A big S3 bucket to save content to & read it from. | S3 | alpha | |
Sentry | A web crawler that continually scans for pages that haven't been checked in a while (or ever) and generates snapshots of what it finds. | Go | planned | @b5 |
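The "uniform format" Miru records script results in is central to how recipes plug in. As a sketch only (Miru's actual schema may differ), one run of a user-contributed script might reduce to a record like:

```go
package miru

import "time"

// Result sketches a uniform record for one run of a user-contributed
// extraction script. Field names are assumptions, not Miru's schema.
type Result struct {
	Site     string    // site the script targets
	Script   string    // identifier of the recipe/script that ran
	RanAt    time.Time // when the run happened
	ExitCode int       // script exit status
	Output   []byte    // extracted content, if any
}
```

Recording every script's output in one shape is what lets downstream services (ArchiveDB, the reporting services) consume recipe results without knowing which recipe produced them.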
Services for tracking changes to websites.
Service | Description | Technologies | Status | Key Contributors |
---|---|---|---|---|
Web Monitoring DB | Website Monitoring project: a more automated version of page monitoring with Versionista (proof of concept for now). A version-record sketch follows this table. | Ruby | under construction | @Mr0grog |
Web Monitoring Differ | Diffing service for the website monitoring project | Javascript | under construction | @WestleyArgentum |
Web Monitoring Processing | Website Monitoring project: data processing, PageFreezer integration, and (eventually) diff filtering and processing | Jupyter, Python | under construction | @danielballan |
Web Monitoring UI | Website Monitoring project: enable analysts to quickly assess changes to monitored government websites | Javascript | under construction | @lightandluck |
PageFreezer | Page freezing / archiving service | external service | integrating | @danielballan |
Versionista | Page freezing / archiving service | external service | in use | @danielballan |
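As a rough sketch of the records this group passes around (field names are assumptions, not the project's actual schema): a capture becomes a version row, and hashing the body lets the differ skip identical snapshots before running a full diff.

```go
package monitoring

import (
	"crypto/sha256"
	"encoding/hex"
	"time"
)

// Version sketches one captured snapshot of a monitored page.
type Version struct {
	PageURL    string
	Source     string // e.g. "versionista" or "pagefreezer"
	CapturedAt time.Time
	BodyHash   string // SHA-256 of the captured body
}

// HashBody fingerprints a capture; two versions with equal hashes
// need no diff at all.
func HashBody(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}
```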
Services for disseminating content & data to others.
Service | Description | Technologies | Status | Key Contributors |
---|---|---|---|---|
API | JSON API to wrap & publish as many platform services as possible. This would include platform users, archived content, archived metadata, and web-monitoring diffs. | JSON | planned | |
Bag-Gen | A server to generate bags for bag-oriented data hosting services (CKAN, Dataverse, etc.). This service is planned as a Python wrapper around the Python bagit library, turning it into a server that can generate bags from archived content (a minimal bag layout is sketched below this table). | Python | planned | |
IPFS-Node | A bundle of existing frameworks to publish & synchronize archived content with the InterPlanetary File System. | Go | planned | |
Dat-Gen | Service for generating & hosting dat-data project packages. This is a planned lightweight Node.js wrapper around the dat library, capable of translating archived content to dat projects. | Javascript, Node.js | planned | |
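Bag-Gen will wrap the existing Python bagit library, but the BagIt layout itself is simple: a `bagit.txt` declaration, payload files under `data/`, and a checksum manifest. A from-scratch sketch of that layout (illustration only, not the planned wrapper):

```go
package baggen

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// WriteBag lays out a minimal BagIt bag at dir: a bagit.txt
// declaration, the payload under data/, and a SHA-256 manifest.
func WriteBag(dir string, payload map[string][]byte) error {
	if err := os.MkdirAll(filepath.Join(dir, "data"), 0o755); err != nil {
		return err
	}
	decl := "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n"
	if err := os.WriteFile(filepath.Join(dir, "bagit.txt"), []byte(decl), 0o644); err != nil {
		return err
	}
	var manifest string
	for name, body := range payload {
		if err := os.WriteFile(filepath.Join(dir, "data", name), body, 0o644); err != nil {
			return err
		}
		sum := sha256.Sum256(body)
		manifest += fmt.Sprintf("%s data/%s\n", hex.EncodeToString(sum[:]), name)
	}
	return os.WriteFile(filepath.Join(dir, "manifest-sha256.txt"), []byte(manifest), 0o644)
}
```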
Services for integrating with other services.
Service | Description | Technologies | Status | Key Contributors |
---|---|---|---|---|
Coordinator | Service that talks to other archiving services about content & metadata they have, using prewritten integrations to translate for each service (an integration interface is sketched below this table). It functions much like the current Miru implementation, and could be implemented either as another Miru instance or as a fork (preference for an instance). | | planned | |
Coordinator DB | Cache of data we've received from other services, in a format that matches ArchiveDB. | | planned | |
Integrations | Series of recipe-repos that map external data sources & destinations. | | planned | |
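A sketch of the integration seam the Coordinator row describes: one adapter per external archiving service, each translating that service's listings into records shaped like ArchiveDB rows. The interface and field names here are hypothetical.

```go
package coordinator

// Record stands in for content-plus-metadata in the shape
// ArchiveDB expects; the real schema is defined there.
type Record struct {
	URL      string
	Hash     string
	Metadata map[string]string
}

// Integration is a hypothetical per-service adapter. The coordinator
// would hold one implementation per external service and cache the
// Records they return in Coordinator DB.
type Integration interface {
	Name() string
	List() ([]Record, error)
}
```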