Pango build for Nextclade #15

Draft: wants to merge 21 commits into master

corneliusroemer (Member) commented Nov 3, 2021

Plan:

  • Join the Pango designations onto the metadata
  • Find a good subsampling strategy: focus on recent diversity, and give broad parent lineages with lots of diversity (e.g. B.1.617.2) more sequences than narrow ones (e.g. AY.39.1.1)
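The weighting idea in the second bullet could be sketched like this. It is a minimal illustration, not the build's actual subsampling: it assumes lineage names have already been unaliased (so AY.39.1.1 is written B.1.617.2.39.1.1 and descent can be read from the dotted prefix), it uses the number of designated sublineages as a stand-in for "diversity", and the helpers `count_descendants`, `allocate_quota` and `subsample` are hypothetical names.

```python
from collections import defaultdict


def count_descendants(lineages):
    """Count how many lineages in the set descend from each lineage.

    Assumes unaliased names, so every ancestor is a dotted prefix.
    """
    counts = {name: 0 for name in set(lineages)}
    for name in counts:
        parts = name.split(".")
        # Walk up through every proper ancestor prefix of this lineage.
        for i in range(1, len(parts)):
            ancestor = ".".join(parts[:i])
            if ancestor in counts:
                counts[ancestor] += 1
    return counts


def allocate_quota(lineages, min_per_lineage=1, per_descendant=1):
    """Give every lineage a base quota plus extra sequences per sublineage."""
    return {
        name: min_per_lineage + per_descendant * n
        for name, n in count_descendants(lineages).items()
    }


def subsample(records, quota):
    """Keep the most recent sequences of each lineage, up to its quota.

    records: iterable of (strain, lineage, ISO-8601 date string).
    """
    by_lineage = defaultdict(list)
    for strain, lineage, date in records:
        by_lineage[lineage].append((date, strain))
    picked = []
    for lineage, items in by_lineage.items():
        items.sort(reverse=True)  # ISO dates sort lexicographically: newest first
        picked.extend(strain for _, strain in items[: quota.get(lineage, 0)])
    return picked
```

Here B.1.617.2 with three designated sublineages would get a quota of 4, while a leaf such as AY.39.1.1 gets 1. Counting descendants is only one proxy for breadth; sequence diversity or case counts would be alternatives, and "newest first" is the crudest way to focus on recent diversity.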

Not blocking, but important for good coverage of lineages from countries with whitespace in their names:

  • Join using canonicalized keys
  • For Nextclade there is no timetree: how does the build scale with size?

resolves #14

@corneliusroemer corneliusroemer self-assigned this Nov 3, 2021
@corneliusroemer corneliusroemer added the enhancement New feature or request label Nov 3, 2021
corneliusroemer (Member, Author) commented:

Bug in preprocess on this branch, probably due to a missing key in the (default) config.

corneliusroemer (Member, Author) commented Nov 10, 2021

Mapping designation strain names to Nextstrain strain names:

  • Select only the strain name and GISAID EPI ISL columns from the metadata
  • Normalize metadata strain names down to alphanumeric characters -> mapping file
  • Normalize Pango strain names to alphanumeric characters, then use the mapping to translate them to EPI ISL
  • Create a new Pango designations file with GISAID EPI ISL for joining later
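The four steps could look roughly like this in Python. This is a sketch under assumptions, not the code on this branch: the metadata columns are taken to be `strain` and `gisaid_epi_isl` (as in Nextstrain's ncov metadata TSV), the designations file is assumed to be a CSV with `taxon` and `lineage` columns, and all function names are hypothetical. Keys that collide after normalization are dropped rather than guessed.

```python
import csv
import re


def normalize(name):
    """Keep only ASCII letters and digits, so differences in whitespace and
    punctuation (e.g. 'South Africa' vs 'SouthAfrica') disappear."""
    return re.sub(r"[^A-Za-z0-9]", "", name)


def build_mapping(metadata_rows):
    """Map normalized metadata strain names to GISAID EPI ISL accessions."""
    mapping = {}
    ambiguous = set()
    for row in metadata_rows:
        # Only the strain and EPI ISL columns are used from each row.
        key = normalize(row["strain"])
        epi = row["gisaid_epi_isl"]
        if key in mapping and mapping[key] != epi:
            ambiguous.add(key)  # two different sequences share one normalized name
        mapping[key] = epi
    for key in ambiguous:
        del mapping[key]
    return mapping


def translate_designations(designation_rows, mapping):
    """Replace each designated strain name by its EPI ISL; collect misses."""
    translated, unmatched = [], []
    for row in designation_rows:
        epi = mapping.get(normalize(row["taxon"]))
        if epi is None:
            unmatched.append(row["taxon"])
        else:
            translated.append({"gisaid_epi_isl": epi, "lineage": row["lineage"]})
    return translated, unmatched


def convert(metadata_path, designations_path, output_path):
    """Read metadata (TSV) and designations (CSV); write EPI ISL designations (TSV)."""
    with open(metadata_path, newline="") as fh:
        mapping = build_mapping(csv.DictReader(fh, delimiter="\t"))
    with open(designations_path, newline="") as fh:
        translated, unmatched = translate_designations(csv.DictReader(fh), mapping)
    with open(output_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["gisaid_epi_isl", "lineage"], delimiter="\t")
        writer.writeheader()
        writer.writerows(translated)
    return unmatched
```

Returning the unmatched names makes it easy to see how many designations the join loses. Normalization here is case-sensitive; lowercasing the key is a simple variation to try if matches are still missing.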

Labels: enhancement (New feature or request)

Development: successfully merging this pull request may close these issues:
Build tree with only pango designated sequences for Nextclade pango assignments

1 participant