ERROR:sft_trainer.py:multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/tuning/.local/lib/python3.11/site-packages/multiprocess/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 678, in _write_generator_to_queue
for i, result in enumerate(func(**kwargs)):
File "/home/tuning/.local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3558, in _map_single
batch = apply_function_on_filtered_inputs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3427, in apply_function_on_filtered_inputs
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/tuning/data/data_handlers.py", line 96, in apply_dataset_formatting
f"{dataset_text_field}": element[f"{dataset_text_field}"] + tokenizer.eos_token
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
TypeError: can only concatenate list (not "str") to list
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/tuning/.local/lib/python3.11/site-packages/tuning/sft_trainer.py", line 650, in main
trainer, additional_train_info = train(
^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/tuning/sft_trainer.py", line 317, in train
) = process_dataargs(
^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/tuning/data/setup_dataprocessor.py", line 348, in process_dataargs
train_dataset, eval_dataset, dataset_text_field = _process_dataconfig_file(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/tuning/data/setup_dataprocessor.py", line 71, in _process_dataconfig_file
train_dataset = processor.process_dataset_configs(data_config.datasets)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/tuning/data/data_processors.py", line 322, in process_dataset_configs
train_dataset = self._process_dataset_configs(dataset_configs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/tuning/data/data_processors.py", line 273, in _process_dataset_configs
raw_datasets = raw_datasets.map(handler, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/datasets/dataset_dict.py", line 869, in map
{
File "/home/tuning/.local/lib/python3.11/site-packages/datasets/dataset_dict.py", line 870, in <dictcomp>
k: dataset.map(
^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3259, in map
for rank, done, content in iflatmap_unordered(
File "/home/tuning/.local/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 718, in iflatmap_unordered
[async_result.get(timeout=0.05) for async_result in async_results]
File "/home/tuning/.local/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 718, in <listcomp>
[async_result.get(timeout=0.05) for async_result in async_results]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/multiprocess/pool.py", line 774, in get
raise self._value
TypeError: can only concatenate list (not "str") to list
HarikrishnanBalagopal changed the title from "fails to process data when batched is set to true" to "bug: crash on trying to process data when batched is set to true" on Jan 25, 2025
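This line fails when batched is set to true: fms-hf-tuning/tuning/data/data_handlers.py, line 96 in f22e243. With batched mapping, element[dataset_text_field] is a list of strings rather than a single string, so concatenating tokenizer.eos_token to it raises the TypeError shown in the traceback.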
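One possible way to make the handler tolerate both modes is sketched below. This is only an illustration, not the project's actual fix: the helper name append_eos and its signature are hypothetical, and it assumes the handler receives either a single string (unbatched) or a list of strings (batched) under the text field.

```python
def append_eos(element, dataset_text_field, eos_token):
    """Hypothetical helper: append eos_token to one example or to a batch.

    Assumes element[dataset_text_field] is a str when the handler is mapped
    unbatched, and a list of str when it is mapped with batched=True.
    """
    value = element[dataset_text_field]
    if isinstance(value, list):
        # Batched call: the column arrives as a list, so append per item.
        return {dataset_text_field: [text + eos_token for text in value]}
    # Unbatched call: a single string, plain concatenation works.
    return {dataset_text_field: value + eos_token}


# Unbatched and batched inputs go through the same function.
single = append_eos({"output": "hello"}, "output", "</s>")
batch = append_eos({"output": ["hello", "world"]}, "output", "</s>")
```

Branching on the value type keeps a single handler usable in both modes; an alternative would be to register separate batched and unbatched handlers.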