cpondoc/embedding-preference-training

Embedding Preference Training

Can we train a model to tell good-quality content from bad?

Set-Up

Create a new virtual environment and install all required packages:

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

Next, install the custom version of MTEB, which is included as a Git submodule. If the mteb/ directory is empty, populate it first with git submodule update --init:

pip install mteb/

To Run

Evaluating Embedding Models

To evaluate embedding models, run the rerank.py script from the top-level directory:

python3 code/eval/rerank.py

Scraping Data

This code is still being generalized. For now, the scraping scripts follow the pattern code/dataset/scraping/scrape_*.py, and WARC extraction lives in code/dataset/extract_warcs.py. Run them from the top-level directory:

python3 code/dataset/scraping/scrape_gb_wiki.py
python3 code/dataset/extract_warcs.py

Train Binary Classifier

To experiment with training binary classifiers, run the train_classifier.py script from the top-level directory:

python3 code/models/train_classifier.py
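
For readers unfamiliar with the idea, here is a minimal sketch of a binary quality classifier trained on embedding vectors. It is not the code in train_classifier.py: the toy two-dimensional "embeddings", the labels (1 for good, 0 for bad), and the helper names train_logistic and predict are all assumptions made for illustration, and plain logistic regression stands in for whatever model the script actually trains.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=200):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    dim = len(X[0])
    n = len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one example: p(good | x) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradients over the batch and take one step.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Label an embedding as good (1) when p(good | x) >= 0.5, else bad (0).
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical toy data: in practice each row would be an embedding
# produced by an embedding model, not a hand-written 2-d vector.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.9], [0.1, 0.8], [0.3, 0.7]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
preds = [predict(w, b, x) for x in X]
print(preds)
```

The sketch uses only the standard library so it runs without the project's requirements installed. Real embeddings have hundreds of dimensions, where a vectorized library would be the practical choice.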

About

Co-Term Research with Douwe Kiela.
