Grant HRCS Tagging Model

Machine learning classifier model for tagging research grants with HRCS Health Category and Research Activity Code (RAC) tags, based on the grant title and abstract.

Developed by the Machine Learning team, within Data & Digital at the Wellcome Trust.

Data

Acknowledgement: the data used for this project was compiled as part of the UK Health Research Analysis studies: UK Health Research Analysis 2022 (UK Clinical Research Collaboration, 2023) https://hrcsonline.net/reports/analysis-reports/uk-health-research-analysis-2022/.

Set up

1. Environment set up

Start by setting up the virtual environment for this project. Make sure you have conda installed, as it is used as the environment manager. If conda is not installed, installing miniconda is a good starting point.

🍏 On Mac M1:

conda env create -f environment_mac.yml

🐧 On Linux:

conda env create -f environment.yml

The environment can then be activated with:

conda activate hrcs_tagger

2. Downloading the dataset

To download the UK Health Research Analysis data used for training, run:

make build_dataset

This command downloads the tagged Excel data from https://hrcsonline.net/, then calls a Python script that compiles these datasets into cleaned parquet files.
There is one parquet file per tag type: RAC division, RAC group and Health Category.
Each row represents a grant and tag combination, so a grant with multiple tags appears on multiple rows.
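As a rough illustration of that row layout, the sketch below groups grant and tag rows into one list of tags per grant, which is the shape a multi-label classifier usually expects. The column names `grant_id` and `tag` and the example values are assumptions made for illustration; check the generated parquet files for the actual schema.

```python
from collections import defaultdict

# Illustrative rows in the "one row per grant and tag combination" layout.
# The column names and values here are made up, not the real schema.
rows = [
    {"grant_id": "G1", "tag": "Infection"},
    {"grant_id": "G1", "tag": "Mental health"},
    {"grant_id": "G2", "tag": "Infection"},
]


def tags_per_grant(rows):
    """Collect all tags for each grant, preserving row order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["grant_id"]].append(row["tag"])
    return dict(grouped)


labels = tags_per_grant(rows)
print(labels)
```

With pandas, the same grouping can be written as `df.groupby("grant_id")["tag"].agg(list)` once a parquet file has been loaded into a DataFrame, again assuming those column names.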

This make command assumes wget is installed. On a Mac you will need to install it first with brew install wget.
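Before running the make command, it can help to confirm that the command-line tools mentioned in this README are on your PATH. The snippet below is a minimal sketch using only the Python standard library. The helper name `missing_tools` is hypothetical and not part of this repository.

```python
import shutil


def missing_tools(tools=("conda", "make", "wget")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```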
