Create a VTT SMA data class for clarity across scripts #26
Since we know the format, I wonder if it would be more readable to create a dataclass so that we can do:

    split_paths = [SMA_Record(*p.split('/')) for p in object_paths if p.find('users.txt') == -1]

This way the path components could be accessed by name.

Also, it is possible to use:

    [p.split('/') for p in object_paths if 'users.txt' not in p]

Originally posted by @jawrainey in #21 (comment)
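To illustrate the kind of access a dataclass would allow, here is a minimal sketch. The field names follow the SMA_Record proposal made elsewhere in this thread; the example keys are made up for illustration and are not real object paths.

```python
from dataclasses import dataclass


@dataclass
class SMA_Record:
    export_date: str
    folder_name: str  # assumed to be e.g. "raw" or "audio"
    hash_id: str
    files: str


# Hypothetical object keys, for illustration only.
object_paths = [
    "2020-01-01/raw/abc123/abc123.zip",
    "2020-01-01/users.txt",
]

# Plain split: components are only reachable by position.
split_paths = [p.split("/") for p in object_paths if "users.txt" not in p]

# Dataclass: the same components are reachable by name.
records = [SMA_Record(*p.split("/")) for p in object_paths if "users.txt" not in p]

print(split_paths[0][2])   # positional access
print(records[0].hash_id)  # named access
```

Both lines print the same component; the difference is only that `hash_id` documents what index 2 means.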
This looks good and would be most suitable when we process the VTT data weekly. I noted in the prior PR that having an SMA_Record or similar would be helpful, as it would help with accessing attributes in both.

At the moment:

    def get_list(bucket: Bucket) -> dict:
        """
        GET all records (metadata) from the AWS S3 bucket
        NOTE: S3 folder structure is symbolic. The 'key' (str) for each file object \
        represents the path. See also `download_metadata()` in devices > vttsma.py
        """
        from collections import defaultdict
        results = defaultdict(set)
        # obj is an object summary, so filter on its key (a str)
        paths = [obj.key for obj in bucket.objects.all() if 'users.txt' not in obj.key]
        for path in paths:
            export_date, _, _hash, __ = path.split('/')
            results[_hash].add(export_date)
        return results

or similar to the above but with a dataclass, e.g.:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class SMA_Record:
        export_date: str
        # raw or audio
        folder_name: str
        hash_id: str
        # patienthash.nfo/.zip/.audio?
        files: str

    def get_list(bucket: Bucket) -> List[SMA_Record]:
        """
        -
        """
        results = []
        paths = [obj.key for obj in bucket.objects.all() if 'users.txt' not in obj.key]
        for path in paths:
            results.append(SMA_Record(*path.split('/')))
        return results

Alternatively, as we're building a list via append:

    def get_list(bucket: Bucket) -> List[SMA_Record]:
        paths = [obj.key for obj in bucket.objects.all() if 'users.txt' not in obj.key]
        return [SMA_Record(*path.split('/')) for path in paths]
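One caveat with `SMA_Record(*path.split('/'))` is that it raises a `TypeError` for any key that does not have exactly four components. Below is a minimal sketch of one way to make that explicit. The `from_key` classmethod is a hypothetical helper, not something from the codebase, and stand-in objects with a `key` attribute are used in place of a real S3 bucket so the example runs on its own.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Iterable, List


@dataclass
class SMA_Record:
    export_date: str
    folder_name: str
    hash_id: str
    files: str

    @classmethod
    def from_key(cls, key: str) -> "SMA_Record":
        """Build a record from a key of the form date/folder/hash/file (hypothetical helper)."""
        parts = key.split("/")
        if len(parts) != 4:
            raise ValueError(f"unexpected key format: {key!r}")
        return cls(*parts)


def get_list(objects: Iterable) -> List[SMA_Record]:
    """Parse every object key except users.txt; each object only needs a `.key` attribute."""
    return [SMA_Record.from_key(o.key) for o in objects if "users.txt" not in o.key]


# Stand-ins for S3 object summaries; the keys are made up for illustration.
fake_objects = [
    SimpleNamespace(key="2020-01-01/raw/abc123/abc123.zip"),
    SimpleNamespace(key="2020-01-08/raw/abc123/abc123.zip"),
    SimpleNamespace(key="2020-01-08/users.txt"),
]

records = get_list(fake_objects)

# The per-hash export dates from the defaultdict version can still be derived.
export_dates = sorted(r.export_date for r in records if r.hash_id == "abc123")
```

With a real bucket, `bucket.objects.all()` would be passed in place of `fake_objects`, on the assumption that each item exposes `key` as a string.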
I've messed around with this for too long - and think it overcomplicates it, especially for now. I'll add my current thoughts below and push these into a separate issue for future work.
schemas > vttsma_record.py
lib > vttsma.py > get_list()
Originally posted by @davidverweij in #21 (comment)