Open Source

Each of the following sections breaks down one area of this diagram and lists the purpose of each service in that area.

[Diagram: overview]

Authentication

Services that manage users.

| Service | Description | Technologies | Status | Key Contributors |
| --- | --- | --- | --- | --- |
| Identity | A central service for creating accounts, logging in and out, managing user info, etc. Other services can talk to the identity service to get information about a user. It's hosted here. Regular users can create & manage accounts using the archivers 2.0 webapp. | Go | alpha | @b5 |
| IdentityDB | Database of all user identities. The only way to talk to it is through the identity service. | Postgres | alpha | @b5 |

Guidance

Services that guide Data Rescue efforts.

| Service | Description | Technologies | Status | Key Contributors |
| --- | --- | --- | --- | --- |
| Agency Primers | Spreadsheet of agencies & sub-agencies for archiving. | Google Sheets, Airtable | in use | @mayaad, @trinberg, Andrew Bergman |
| Chrome Extension | Chrome extension to nominate government data that needs to be preserved. | Javascript, Chrome Extension | in use | @ates, @titaniumbones |
| Uncrawlables Spreadsheet | The Chrome extension dumps its output to a Google Sheet of uncrawlable content. | Google Sheets | in use | |

Reporting

Services for collecting & delivering platform-wide reports.

| Service | Description | Technologies | Status | Key Contributors |
| --- | --- | --- | --- | --- |
| Stats | Server that periodically collects key stats platform-wide & reports them for easy public consumption. Ideally this service will just consume the JSON APIs of all the other services & output a dashboard. Many existing frameworks do this; we should research & pick one. | planning to use an existing solution | not yet started | |
| Health | Server that periodically polls all services on the platform & outputs a page that shows when a service is down or malfunctioning. Status check frameworks for this already exist; we should research & pick one. | planning to use an existing solution | not yet started | |
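As a rough illustration of what the Health service would do on each polling pass, here is a minimal sketch in Python. It is not the planned implementation (the plan is to adopt an existing status check framework). The function names and the service list passed in are hypothetical, and the sketch assumes that any non-2xx response or connection failure counts as "down".

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the code is still a valid answer.
        return err.code


def check_services(services: Dict[str, str],
                   fetch: Callable[[str], int] = http_status) -> Dict[str, str]:
    """Poll each service once and report "up" or "down" per service name.

    `services` maps a (hypothetical) service name to its health URL.
    `fetch` is injectable so the logic can be exercised without a network.
    """
    report = {}
    for name, url in services.items():
        try:
            code = fetch(url)
            report[name] = "up" if 200 <= code < 300 else "down"
        except Exception:
            # Timeouts, DNS failures, and refused connections count as down.
            report[name] = "down"
    return report
```

A scheduler (cron, or a simple loop with a sleep) would call `check_services` at a fixed interval and render the resulting dictionary as the status page.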

Archiving 1.0

Current services for downloading & storing content.

| Service | Description | Technologies | Status | Key Contributors |
| --- | --- | --- | --- | --- |
| Archivers 1.0 | App for volunteers to research urls, add metadata, and upload archived .zip files. | Javascript, MeteorJS, ReactJS | in use | @kmcculloch, @danielballan, @b5 |
| Archivers 1.0 DB | Backing database for Archivers 1.0. | MongoDB | in use | @b5 |
| S3-Upload-Server | Server for making uploads to S3 buckets via browser or AWS CLI tokens. | Go | in use | @b5 |
| Zip-Starter | Server for generating base metadata zip archives. | Go | in use | @b5 |

Archiving 2.0

New services for archiving & describing content.

| Service | Description | Technologies | Status | Key Contributors |
| --- | --- | --- | --- | --- |
| Archivers 2.0 | Webapp for volunteers to add metadata to urls & content. | Javascript, ReactJS, Redux | alpha | @b5 |
| Patchbay | Backing service for the archivers 2.0 app. It coordinates realtime communication between users of the app and does on-the-fly archiving of undiscovered urls. | Go | alpha | @b5 |
| Miru | A website monitoring tool that periodically checks a site for changes & runs custom scripts. Miru takes user-contributed scripts capable of extracting uncrawlable content & executes them, recording their results in a uniform format. | Go | beta | @zsck |
| Miru DB | Miru's backing database. | SQLite | beta | @zsck |
| Recipes | A collection of strategies (recipes) for dealing with various types of uncrawlable content. | Various Languages | under construction | @jeffreyliu |
| ArchiveDB | Database of archived content & metadata. Schema is outlined here. | Postgres | alpha | |
| ArchiveContent S3 Bucket | A big S3 bucket to save content to & read content from. | S3 | alpha | |
| Sentry | A web crawler that continually scans for pages that haven't been checked in a while (or ever), and generates snapshots of what it finds. | Go | planned | @b5 |
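Two ideas in the table above, picking pages that have not been checked in a while (Sentry) and noticing that a page has changed (Miru), can be sketched in a few lines of Python. This is only an illustration: the real services are written or planned in Go, and the source does not say how either one detects staleness or change. The function names, the `max_age` threshold, and the use of a SHA-256 digest for change detection are all assumptions.

```python
import hashlib
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def pages_due(last_checked: Dict[str, Optional[datetime]],
              now: datetime,
              max_age: timedelta) -> List[str]:
    """Return urls never checked (None) or last checked more than max_age ago."""
    due = [url for url, checked in last_checked.items()
           if checked is None or now - checked > max_age]
    return sorted(due)


def content_digest(body: bytes) -> str:
    """Hex SHA-256 digest of a page body, used as a cheap fingerprint."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_digest: Optional[str], body: bytes) -> bool:
    """A page counts as changed if no digest is stored or the digest differs."""
    return previous_digest != content_digest(body)
```

Comparing digests only says that something changed, not what changed; producing readable diffs is the job of the Site Monitoring services described below.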

Site Monitoring

Services for tracking changes to websites.

| Service | Description | Technologies | Status | Key Contributors |
| --- | --- | --- | --- | --- |
| Web Monitoring DB | Website Monitoring project: a more automated version of page monitoring with Versionista (proof of concept for now). | Ruby | under construction | @Mr0grog |
| Web Monitoring Differ | Diffing service for the website monitoring project. | Javascript | under construction | @WestleyArgentum |
| Web Monitoring Processing | Website Monitoring project: data processing, PageFreezer integration, and (eventually) diff filtering and processing. | Jupyter, Python | under construction | @danielballan |
| Web Monitoring UI | Website Monitoring project: enable analysts to quickly assess changes to monitored government websites. | Javascript | under construction | @lightandluck |
| PageFreezer | Page freezing / archiving service. | external service | integrating | @danielballan |
| Versionista | Page freezing / archiving service. | external service | in use | @danielballan |

Distribution

Services for disseminating content & data to others.

| Service | Description | Technologies | Status | Key Contributors |
| --- | --- | --- | --- | --- |
| API | JSON API to wrap & publish as many platform services as possible. This would include platform users, archived content, archived metadata, and web-monitoring diffs. | JSON | planned | |
| Bag-Gen | A server to generate bags for bag-oriented data hosting services (CKAN, Dataverse, etc.). This service is planned as a Python wrapper around the Python BagIt library, turning it into a server that can generate bags from archived content. | Python | planned | |
| IPFS-Node | A bundle of existing frameworks to publish & synchronize archived content with the InterPlanetary File System. | Go | planned | |
| Dat-Gen | Service for generating & hosting dat-data project packages. This is a planned lightweight Node.js wrapper around the dat library, capable of translating archived content to dat projects. | Javascript, Node.js | planned | |
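To give a feel for what Bag-Gen would produce, the sketch below builds the payload manifest that sits at the heart of a BagIt bag: one line per file, holding a checksum and the file's path under `data/`. It deliberately uses only the Python standard library rather than the BagIt library the service is planned to wrap, so it illustrates the format and is not the planned implementation. The function name and the in-memory `payload` mapping are hypothetical.

```python
import hashlib
from typing import Dict


def build_manifest(payload: Dict[str, bytes]) -> str:
    """Build the text of a manifest-sha256.txt file for a bag.

    `payload` maps a path relative to the bag's data/ directory to the
    file's content. Each manifest line is "<sha256 hex digest>  data/<path>".
    """
    lines = []
    for rel_path in sorted(payload):
        digest = hashlib.sha256(payload[rel_path]).hexdigest()
        lines.append(f"{digest}  data/{rel_path}\n")
    return "".join(lines)
```

A complete bag also needs a `bagit.txt` declaration file and the payload files themselves written under `data/`; a real service would leave that to the library.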

Coordination

Services for integrating with other services.

| Service | Description | Technologies | Status | Key Contributors |
| --- | --- | --- | --- | --- |
| Coordinator | Service that talks to other archiving services about the content & metadata they have, using prewritten integrations to translate to and from each service. It works much like the current Miru implementation, and could be implemented either as another Miru instance or as a fork of Miru (an instance is preferred). | | planned | |
| Coordinator DB | Cache of data we've received from other services, in a format that matches ArchiveDB. | | planned | |
| Integrations | Series of recipe-repos that map external data sources & destinations. | | planned | |
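The "prewritten integrations" idea could look something like the following Python sketch: each external service gets a small function that translates its records into a common shape, and the Coordinator looks the function up by source name. Everything here is hypothetical, including the source name `example-archive`, the field names, and the target record shape, since the ArchiveDB schema is not reproduced in this document.

```python
from typing import Callable, Dict

Record = Dict[str, str]

# Registry of integrations, keyed by the (hypothetical) external source name.
INTEGRATIONS: Dict[str, Callable[[Record], Record]] = {}


def integration(source: str):
    """Decorator that registers a translation function for one source."""
    def register(fn: Callable[[Record], Record]) -> Callable[[Record], Record]:
        INTEGRATIONS[source] = fn
        return fn
    return register


@integration("example-archive")
def from_example_archive(rec: Record) -> Record:
    # Hypothetical mapping: the external service calls the url "link"
    # and the title "name"; a missing title becomes an empty string.
    return {"url": rec["link"], "title": rec.get("name", "")}


def translate(source: str, rec: Record) -> Record:
    """Translate one external record using the integration for its source."""
    try:
        convert = INTEGRATIONS[source]
    except KeyError:
        raise ValueError(f"no integration registered for {source!r}")
    return convert(rec)
```

Keeping each integration as a separate function matches the description of Integrations as a series of recipe-repos: adding a new external service means adding one mapping, not changing the Coordinator.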