refactor(benchmarks) Update dependencies for LLM evaluation pipelines (
yan-gao-GY authored Feb 5, 2025
1 parent 0538cb6 commit 7f14165
Showing 8 changed files with 15 additions and 15 deletions.
6 changes: 3 additions & 3 deletions benchmarks/flowertune-llm/evaluation/code/README.md
@@ -12,7 +12,7 @@ Three datasets have been selected for this evaluation: [MBPP](https://huggingfac
git clone --depth=1 https://github.com/adap/flower.git && mv flower/benchmarks/flowertune-llm/evaluation/code ./flowertune-eval-code && rm -rf flower && cd flowertune-eval-code
```

- Create a new Python environment (we recommend Python 3.10), activate it, then install dependencies with:
+ Create a new Python environment (we recommend Python 3.11), activate it, then install dependencies with:

```shell
# From a new python environment, run:
@@ -40,7 +40,7 @@ sudo apt-get install g++
Then, download the `main.py` script from `bigcode-evaluation-harness` repository.

```shell
- git clone https://github.com/bigcode-project/bigcode-evaluation-harness.git && cd bigcode-evaluation-harness && git checkout 0f3e95f0806e78a4f432056cdb1be93604a51d69 && mv main.py ../ && cd .. && rm -rf bigcode-evaluation-harness
+ git clone https://github.com/yan-gao-GY/bigcode-evaluation-harness.git && cd bigcode-evaluation-harness && mv main.py ../ && cd .. && rm -rf bigcode-evaluation-harness
```


@@ -51,7 +51,7 @@ git clone https://github.com/bigcode-project/bigcode-evaluation-harness.git && c
```bash
python main.py \
- --model=mistralai/Mistral-7B-v0.3 \
+ --model=your-base-model-name \ # e.g., mistralai/Mistral-7B-v0.3
--peft_model=/path/to/fine-tuned-peft-model-dir/ \ # e.g., ./peft_1
--max_length_generation=1024 \ # change to 2048 when running mbpp
--batch_size=4 \
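The inline `# e.g., ...` comments in the updated command sit after the trailing backslashes. In POSIX shells a backslash only continues a line when it is the last character on it, so the snippet reads best as annotated documentation rather than something to paste verbatim. One possible workaround is to assemble the arguments in Python. The sketch below is only an illustration: `build_eval_command` and `extra_args` are hypothetical names, only the four flags visible in this hunk are included, and any flags in the collapsed remainder of the README would have to be supplied through `extra_args`.

```python
import shlex


def build_eval_command(base_model, peft_dir, max_length=1024, batch_size=4, extra_args=()):
    """Return the main.py invocation as an argument list (hypothetical helper)."""
    return [
        "python",
        "main.py",
        f"--model={base_model}",  # e.g., mistralai/Mistral-7B-v0.3
        f"--peft_model={peft_dir}",  # e.g., ./peft_1
        f"--max_length_generation={max_length}",  # the README suggests 2048 for mbpp
        f"--batch_size={batch_size}",
        *extra_args,  # flags from the collapsed part of the hunk would go here
    ]


if __name__ == "__main__":
    cmd = build_eval_command("mistralai/Mistral-7B-v0.3", "./peft_1")
    # shlex.join quotes each argument, giving one line that is safe to paste.
    print(shlex.join(cmd))
    # To run it instead of printing: subprocess.run(cmd, check=True)
```

Keeping the examples as Python comments means they cannot interfere with how the command is split into arguments.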
6 changes: 3 additions & 3 deletions benchmarks/flowertune-llm/evaluation/code/requirements.txt
@@ -1,8 +1,8 @@
- peft==0.6.2
+ peft==0.14.0
datasets==2.20.0
evaluate==0.3.0
sentencepiece==0.2.0
protobuf==5.27.1
- bitsandbytes==0.45.0
+ bitsandbytes==0.45.1
hf_transfer==0.1.8
- git+https://github.com/bigcode-project/bigcode-evaluation-harness.git@6116c6a9a5672c69bd624373cfbc8938b7acc249
+ git+https://github.com/yan-gao-GY/bigcode-evaluation-harness.git
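With this change the harness dependency is no longer tied to a commit hash, so an install may pick up whatever the fork's default branch contains at that moment. The following is a minimal sketch, using only the standard library, of how one might list the exact pins, compare them with what is installed, and flag VCS requirements that lack an `@ref`. The helper names (`parse_requirements`, `vcs_ref`, `installed_versions`) are hypothetical and not part of the repository, and the parsing deliberately handles only the `name==version` and `git+URL[@ref]` forms that appear in this file.

```python
from importlib import metadata
from urllib.parse import urlsplit

# New contents of evaluation/code/requirements.txt as shown in this commit.
REQUIREMENTS = """\
peft==0.14.0
datasets==2.20.0
evaluate==0.3.0
sentencepiece==0.2.0
protobuf==5.27.1
bitsandbytes==0.45.1
hf_transfer==0.1.8
git+https://github.com/yan-gao-GY/bigcode-evaluation-harness.git
"""


def parse_requirements(text):
    """Split requirement lines into exact pins and VCS (git+) requirements."""
    pins, vcs = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("git+"):
            vcs.append(line)
        elif "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins, vcs


def vcs_ref(requirement):
    """Return the ref after '@' in the URL path, or None if the URL is unpinned."""
    path = urlsplit(requirement.removeprefix("git+")).path
    return path.rsplit("@", 1)[1] if "@" in path else None


def installed_versions(pins):
    """Map each pinned name to (wanted, installed); installed is None if missing."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (wanted, found)
    return report


if __name__ == "__main__":
    pins, vcs = parse_requirements(REQUIREMENTS)
    for name, (wanted, found) in installed_versions(pins).items():
        print(f"{name}: wanted {wanted}, installed {found}")
    for req in vcs:
        print(f"{req}: ref = {vcs_ref(req)}")
```

Appending `@<commit-hash>` to the `git+` URL would restore a reproducible install, at the cost of having to update the hash by hand whenever the fork changes.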
2 changes: 1 addition & 1 deletion benchmarks/flowertune-llm/evaluation/finance/README.md
@@ -10,7 +10,7 @@ Three datasets have been selected for this evaluation: [FPB](https://huggingface
git clone --depth=1 https://github.com/adap/flower.git && mv flower/benchmarks/flowertune-llm/evaluation/finance ./flowertune-eval-finance && rm -rf flower && cd flowertune-eval-finance
```

- Create a new Python environment (we recommend Python 3.10), activate it, then install dependencies with:
+ Create a new Python environment (we recommend Python 3.11), activate it, then install dependencies with:
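The READMEs touched by this commit now recommend Python 3.11 instead of 3.10. As a small sketch, an evaluation script could warn when it runs under a different interpreter. `RECOMMENDED` and `matches_recommended` are hypothetical names introduced here, and the check treats the version as a recommendation rather than a hard requirement.

```python
import sys

# Taken from the updated READMEs; a recommendation, not an enforced constraint.
RECOMMENDED = (3, 11)


def matches_recommended(version_info=None, recommended=RECOMMENDED):
    """Return True when the interpreter's major.minor equals the recommendation."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(recommended)


if __name__ == "__main__":
    if not matches_recommended():
        current = f"{sys.version_info.major}.{sys.version_info.minor}"
        wanted = ".".join(str(part) for part in RECOMMENDED)
        print(f"Warning: running Python {current}; the README recommends {wanted}.")
```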

```shell
# From a new python environment, run:
4 changes: 2 additions & 2 deletions benchmarks/flowertune-llm/evaluation/finance/requirements.txt
@@ -1,7 +1,7 @@
- peft==0.6.2
+ peft==0.14.0
scikit-learn==1.5.0
datasets==2.20.0
sentencepiece==0.2.0
protobuf==5.27.1
- bitsandbytes==0.45.0
+ bitsandbytes==0.45.1
hf_transfer==0.1.8
2 changes: 1 addition & 1 deletion benchmarks/flowertune-llm/evaluation/general-nlp/README.md
@@ -10,7 +10,7 @@ The [MMLU](https://huggingface.co/datasets/lukaemon/mmlu) dataset is used for th
git clone --depth=1 https://github.com/adap/flower.git && mv flower/benchmarks/flowertune-llm/evaluation/general-nlp ./flowertune-eval-general-nlp && rm -rf flower && cd flowertune-eval-general-nlp
```

- Create a new Python environment (we recommend Python 3.10), activate it, then install dependencies with:
+ Create a new Python environment (we recommend Python 3.11), activate it, then install dependencies with:

```shell
# From a new python environment, run:
@@ -1,8 +1,8 @@
- peft==0.6.2
+ peft==0.14.0
pandas==2.2.2
scikit-learn==1.5.0
datasets==2.20.0
sentencepiece==0.2.0
protobuf==5.27.1
- bitsandbytes==0.45.0
+ bitsandbytes==0.45.1
hf_transfer==0.1.8
2 changes: 1 addition & 1 deletion benchmarks/flowertune-llm/evaluation/medical/README.md
@@ -10,7 +10,7 @@ Three datasets have been selected for this evaluation: [PubMedQA](https://huggin
git clone --depth=1 https://github.com/adap/flower.git && mv flower/benchmarks/flowertune-llm/evaluation/medical ./flowertune-eval-medical && rm -rf flower && cd flowertune-eval-medical
```

- Create a new Python environment (we recommend Python 3.10), activate it, then install dependencies with:
+ Create a new Python environment (we recommend Python 3.11), activate it, then install dependencies with:

```shell
# From a new python environment, run:
4 changes: 2 additions & 2 deletions benchmarks/flowertune-llm/evaluation/medical/requirements.txt
@@ -1,8 +1,8 @@
- peft==0.6.2
+ peft==0.14.0
pandas==2.2.2
scikit-learn==1.5.0
datasets==2.20.0
sentencepiece==0.2.0
protobuf==5.27.1
- bitsandbytes==0.45.0
+ bitsandbytes==0.45.1
hf_transfer==0.1.8
