[Telemetry] Add eventlog endpoint for collecting client-side events #501

Closed
wants to merge 52 commits into from
Commits
344bb57
Add initial eventlog hook
yuvipanda Jul 7, 2019
a0f40ea
Install jupyter_telemetry from source
yuvipanda Jul 7, 2019
96bf2f0
Set up an eventlog API endpoint
yuvipanda Jul 7, 2019
06b91e0
Use different naming convention & add test for it
yuvipanda Jul 7, 2019
716ff1b
Don't use f-strings
yuvipanda Jul 7, 2019
8e122fc
Derive JSON Schema files from YAML files
yuvipanda Jul 9, 2019
f9a0dfb
Keep event schemas in YAML
yuvipanda Jul 9, 2019
c7428e8
Depend on the jupyter_telemetry package
yuvipanda Jul 9, 2019
9437e88
read schemas from new utils function
Zsailer Oct 1, 2019
6e3c80c
Add fix for tables in RTD theme sphinx docs.
Zsailer Oct 1, 2019
4035fd5
add event schema auto-documentation to jupyter notebook docs
Zsailer Oct 1, 2019
23d50a3
format paths in recorded events
Zsailer Oct 1, 2019
3c94970
add documentation for eventlog endpoint
Zsailer Oct 1, 2019
e76c91b
return exception as 400 error in eventlog endpoint
Zsailer Oct 1, 2019
2ce7c54
normalize path in emitted event
Zsailer Oct 1, 2019
5794d31
initial tests
Zsailer May 19, 2020
7c9d3d5
add initial telemetry docs
Zsailer May 19, 2020
ef8573d
fix jupyter_telemetry dependency
Zsailer May 19, 2020
ea9e352
point telemetry at correct dev branch
Zsailer May 19, 2020
b06f7d6
add tests for eventlog
kiendang Oct 20, 2020
4d7fc23
Merge branch 'master' into jupyter_telemetry
kiendang Dec 17, 2020
7302396
Use correct fixture names
kiendang Dec 17, 2020
2c79b1a
Merge branch 'master' into jupyter_telemetry
kiendang Mar 9, 2021
0ac16dc
Fix import
kiendang Mar 10, 2021
7c81b23
Remove redundant call
kiendang Mar 18, 2021
b8ca484
Update telemetry
kiendang Mar 20, 2021
ac452cd
Add note about security
kiendang Mar 22, 2021
70f9275
Register client telemetry schemas using entry_points
kiendang Mar 31, 2021
7f50c85
Add working telemetry commit for testing
kiendang Apr 6, 2021
99439b6
Use backported importlib_metadata
kiendang Apr 6, 2021
cdc92e9
Merge remote-tracking branch 'upstream/master' into jupyter_telemetry
kiendang Apr 6, 2021
8e69ab0
Ignore errors while registering client events
kiendang Apr 23, 2021
e2db0ad
Add client eventlog to list services
kiendang Apr 23, 2021
66accdc
Add tests for client telemetry events
kiendang Apr 23, 2021
4dcd258
Add client telemetry eventlog tests to CI
kiendang Apr 23, 2021
87dca57
Merge branch 'master' into jupyter_telemetry
kiendang Apr 23, 2021
9fd2a5f
Clean up
kiendang Apr 23, 2021
f692488
Fix eventlog test
kiendang Apr 23, 2021
21117a6
Use standard lib instead of backport when possible
kiendang Apr 24, 2021
1936503
Fix docs
kiendang Apr 25, 2021
e099ccc
Merge branch 'master' into jupyter_telemetry
kiendang Apr 25, 2021
bc94f13
Fix docs
kiendang Apr 25, 2021
bfbdd17
Refine example
kiendang Apr 25, 2021
0974231
Use same interface for registering file and file object
kiendang Apr 25, 2021
035eb6e
Fix client test ci
kiendang Apr 25, 2021
8179c47
Remove redundant check
kiendang Apr 26, 2021
c307a8d
Add docs on registering client events
kiendang Apr 26, 2021
e206188
Remove unrelated doc change
kiendang Apr 26, 2021
cce5c71
Add .yml extension
kiendang May 3, 2021
19214e8
No longer use pathlib .suffix to check file extension
kiendang May 3, 2021
c67401d
Doc change notebook server to jupyter server
kiendang May 3, 2021
e15dd9b
Merge remote-tracking branch 'upstream/master' into telemetry_client
kiendang May 3, 2021
Derive JSON Schema files from YAML files
This lets us add detailed documentation & description
to our schemas, which is very hard to do in JSON.

We also add a lot of documentation to the one
JSON schema we have
yuvipanda authored and Zsailer committed May 19, 2020

commit 8e122fcaaa4d348a4edc9c1ce780742bff788d7f
19 changes: 19 additions & 0 deletions jupyter_server/event-schemas/README.md
@@ -0,0 +1,19 @@
# Event Schemas

## Generating .json files

Event Schemas are written in a human readable `.yaml` format.
This is primarily to get multi-line strings in our descriptions,
as documentation is very important.

Every time you modify a `.yaml` file, you should run the following
command from this directory.

```bash
./generate-json.py .
```

This needs the `ruamel.yaml` and `jupyter_telemetry` Python packages installed.

Hopefully, this is extremely temporary, and we can just use YAML
with jupyter_telemetry.
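
The README's point about multi-line strings can be illustrated with a small sketch. It uses only the standard library, and the description text is a shortened, hypothetical stand-in for the real schema documentation:

```python
import json

# Hypothetical multi-line description, similar in shape to the schema docs.
description = (
    "Record actions on files via the ContentsManager REST API.\n"
    "\n"
    "Limitations:\n"
    "\n"
    "1. Events are only recorded when an action succeeds.\n"
)

# JSON has no multi-line string literal, so every newline becomes a
# backslash-n escape and the whole text lands on a single line.
encoded = json.dumps({"description": description}, indent=4)
print(encoded)

# The text round-trips unchanged; only its on-disk readability suffers.
assert json.loads(encoded)["description"] == description
```

In YAML the same text can be written as a block scalar, one source line per line of documentation, which is presumably why the schemas are authored in `.yaml` and only converted to `.json`.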
17 changes: 10 additions & 7 deletions jupyter_server/event-schemas/contentsmanager-actions.json
@@ -2,9 +2,12 @@
     "$id": "eventlogging.jupyter.org/notebook/contentsmanager-actions",
     "version": 1,
     "title": "Contents Manager activities",
-    "description": "Notebook Server emits this event whenever a contentsmanager action happens",
+    "description": "Record actions on files via the ContentsManager REST API.\n\nThe notebook ContentsManager REST API is used by all frontends to retrieve,\nsave, list, delete and perform other actions on notebooks, directories,\nand other files through the UI. This is pluggable - the default acts on\nthe file system, but can be replaced with a different ContentsManager\nimplementation - to work on S3, Postgres, other object stores, etc.\nThe events get recorded regardless of the ContentsManager implementation\nbeing used.\n\nLimitations:\n\n1. This does not record all filesystem access, just the ones that happen\n explicitly via the notebook server's REST API. Users can (and often do)\n trivially access the filesystem in many other ways (such as `open()` calls\n in their code), so this is rarely a complete record.\n2. As with all events recorded by the notebook server, users most likely\n have the ability to modify the code of the notebook server. Unless other\n security measures are in place, these events should be treated as user\n controlled and not used in high security areas.\n3. Events are only recorded when an action succeeds.\n",
     "type": "object",
-    "required": ["action", "path"],
+    "required": [
+        "action",
+        "path"
+    ],
     "properties": {
         "action": {
             "enum": [
@@ -13,18 +16,18 @@
                 "save",
                 "upload",
                 "rename",
-                "create",
-                "copy"
+                "copy",
+                "delete"
             ],
-            "description": "Action performed by contents manager"
+            "description": "Action performed by the ContentsManager API.\n\nThis is a required field.\n\nPossible values:\n\n1. get\n Get contents of a particular file, or list contents of a directory.\n\n2. create\n Create a new directory or file at 'path'. Currently, name of the\n file or directory is auto generated by the ContentsManager implementation.\n\n3. save\n Save a file at path with contents from the client\n\n4. upload\n Upload a file at given path with contents from the client\n\n5. rename\n Rename a file or directory from value in source_path to\n value in path.\n\n6. copy\n Copy a file or directory from value in source_path to\n value in path.\n\n7. delete\n Delete a file or empty directory at given path\n"
         },
         "path": {
             "type": "string",
-            "description": "Logical path the action was performed in"
+            "description": "Logical path on which the operation was performed.\n\nThis is a required field.\n"
         },
         "source_path": {
             "type": "string",
-            "description": "If action is 'copy', this specifies the source path"
+            "description": "Source path of an operation when action is 'copy' or 'rename'"
         }
     }
 }
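
To make the constraints in this schema concrete, here is a minimal sketch of the checks it implies for an event. It is a hand-rolled illustration using only the standard library, not how `jupyter_telemetry` or `jsonschema` validate events; `SCHEMA` is a trimmed copy of the file above and `check_event` is a hypothetical helper:

```python
# Trimmed copy of the schema above: only the parts that constrain events.
SCHEMA = {
    "required": ["action", "path"],
    "properties": {
        "action": {
            "enum": ["get", "create", "save", "upload", "rename", "copy", "delete"]
        },
        "path": {"type": "string"},
        "source_path": {"type": "string"},
    },
}


def check_event(event, schema=SCHEMA):
    """Return a list of problems; an empty list means the event passed."""
    problems = []
    # Every field listed under "required" must be present.
    for name in schema["required"]:
        if name not in event:
            problems.append("missing required field: " + name)
    # Fields that are present must satisfy their enum or type rule.
    for name, rules in schema["properties"].items():
        if name not in event:
            continue
        value = event[name]
        if "enum" in rules and value not in rules["enum"]:
            problems.append("invalid value for " + name + ": " + repr(value))
        if rules.get("type") == "string" and not isinstance(value, str):
            problems.append(name + " must be a string")
    return problems


print(check_event({"action": "rename", "path": "b.ipynb", "source_path": "a.ipynb"}))
print(check_event({"action": "move", "path": "b.ipynb"}))
```

A real deployment would rely on a full JSON Schema validator; the sketch only covers the keywords this particular schema uses.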
39 changes: 39 additions & 0 deletions jupyter_server/event-schemas/generate-json.py
@@ -0,0 +1,39 @@
#!/usr/bin/env python3
import argparse
import json
import os
import jsonschema
from ruamel.yaml import YAML

from jupyter_telemetry.eventlog import EventLog

yaml = YAML(typ='safe')

def main():
    argparser = argparse.ArgumentParser()
    argparser.add_argument(
        'directory',
        help='Directory with Schema .yaml files'
    )

    args = argparser.parse_args()

    el = EventLog()
    for dirname, _, files in os.walk(args.directory):
        for file in files:
            if not file.endswith('.yaml'):
                continue
            yaml_path = os.path.join(dirname, file)
            print('Processing', yaml_path)
            with open(yaml_path) as f:
                schema = yaml.load(f)

            # validate schema
            el.register_schema(schema)

            json_path = os.path.join(dirname, os.path.splitext(file)[0] + '.json')
            with open(json_path, 'w') as f:
                json.dump(schema, f, indent=4)

if __name__ == '__main__':
    main()
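
The file-discovery part of the script can be tried without `ruamel.yaml` or `jupyter_telemetry` installed. The sketch below reproduces only the directory walk and the `.yaml` to `.json` path mapping; `plan_conversions` is a hypothetical name, and the temporary directory stands in for `event-schemas/`:

```python
import os
import tempfile


def plan_conversions(directory):
    """Return sorted (yaml_path, json_path) pairs for .yaml files under directory."""
    pairs = []
    for dirname, _, files in os.walk(directory):
        for name in files:
            # Same filter as the script: anything not ending in .yaml is skipped.
            if not name.endswith('.yaml'):
                continue
            stem = os.path.splitext(name)[0]
            pairs.append((
                os.path.join(dirname, name),
                os.path.join(dirname, stem + '.json'),
            ))
    return sorted(pairs)


with tempfile.TemporaryDirectory() as root:
    # Create one schema file and one unrelated file to exercise the filter.
    for name in ('v1.yaml', 'README.md'):
        with open(os.path.join(root, name), 'w') as f:
            f.write('')
    pairs = plan_conversions(root)
    names = [(os.path.basename(src), os.path.basename(dst)) for src, dst in pairs]
    print(names)
```

Each JSON file is written next to its YAML source with the same stem, so `v1.yaml` maps to `v1.json`.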
79 changes: 79 additions & 0 deletions jupyter_server/event-schemas/v1.yaml
@@ -0,0 +1,79 @@
"$id": eventlogging.jupyter.org/notebook/contentsmanager-actions
version: 1
title: Contents Manager activities
description: |
  Record actions on files via the ContentsManager REST API.

  The notebook ContentsManager REST API is used by all frontends to retrieve,
  save, list, delete and perform other actions on notebooks, directories,
  and other files through the UI. This is pluggable - the default acts on
  the file system, but can be replaced with a different ContentsManager
  implementation - to work on S3, Postgres, other object stores, etc.
  The events get recorded regardless of the ContentsManager implementation
  being used.

  Limitations:

  1. This does not record all filesystem access, just the ones that happen
     explicitly via the notebook server's REST API. Users can (and often do)
     trivially access the filesystem in many other ways (such as `open()` calls
     in their code), so this is rarely a complete record.
  2. As with all events recorded by the notebook server, users most likely
     have the ability to modify the code of the notebook server. Unless other
     security measures are in place, these events should be treated as user
     controlled and not used in high security areas.
  3. Events are only recorded when an action succeeds.
type: object
required:
- action
- path
properties:
  action:
    enum:
    - get
    - create
    - save
    - upload
    - rename
    - copy
    - delete
    description: |
      Action performed by the ContentsManager API.

      This is a required field.

      Possible values:

      1. get
         Get contents of a particular file, or list contents of a directory.

      2. create
         Create a new directory or file at 'path'. Currently, name of the
         file or directory is auto generated by the ContentsManager implementation.

      3. save
         Save a file at path with contents from the client

      4. upload
         Upload a file at given path with contents from the client

      5. rename
         Rename a file or directory from value in source_path to
         value in path.

      6. copy
         Copy a file or directory from value in source_path to
         value in path.

      7. delete
         Delete a file or empty directory at given path
  path:
    type: string
    description: |
      Logical path on which the operation was performed.

      This is a required field.
  source_path:
    type: string
    description: |
      Source path of an operation when action is 'copy' or 'rename'
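
As an illustration of what an event matching this schema might look like, the sketch below assembles one for a rename. `build_event` is a hypothetical helper, and treating `source_path` as mandatory for `copy` and `rename` is an assumption made here for the example; the schema itself only lists `action` and `path` as required:

```python
def build_event(action, path, source_path=None):
    """Assemble a contentsmanager-actions event dict (hypothetical helper)."""
    event = {"action": action, "path": path}
    if action in ("copy", "rename"):
        # Assumption: refuse to build a copy/rename event without its source,
        # even though the schema does not enforce this.
        if source_path is None:
            raise ValueError("source_path is needed for " + action)
        event["source_path"] = source_path
    return event


renamed = build_event(
    "rename",
    "notebooks/final.ipynb",
    source_path="notebooks/draft.ipynb",
)
print(renamed)
```

For the other actions the helper leaves `source_path` out entirely, which matches the field's description as applying only to `copy` and `rename`.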