Fix smart_batching_collate Inefficiency (#2556)
* Fix smart_batching_collate Inefficiency

SentenceTransformer.py:846 emits an inefficiency warning:

".....Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:275.) labels = torch.tensor([example.label for example in batch])"

* Update SentenceTransformer.py

* Remove some comments; add edge case (if labels is empty)
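The warning quoted above can be illustrated with a minimal sketch. The label values below are hypothetical and chosen only for illustration; they are not taken from the repository. The sketch compares building a tensor from a Python list of arrays with stacking once in NumPy and wrapping the result.

```python
import warnings

import numpy as np
import torch

# Hypothetical labels: one small float32 vector per example (an assumption
# made for illustration, not data from the commit).
labels = [
    np.array([0.0, 1.0], dtype=np.float32),
    np.array([0.5, 0.0], dtype=np.float32),
]

# Older path: torch.tensor receives a Python list of ndarrays.
# Depending on the PyTorch version this may emit the UserWarning quoted above,
# so any warnings are recorded here instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    slow = torch.tensor(labels)

# Newer path: stack into a single ndarray, then wrap it without copying.
stacked = np.stack(labels)
fast = torch.from_numpy(stacked)
```

Both paths produce the same values. Note that `torch.from_numpy` shares memory with `stacked`, so later in-place changes to the array are visible through the tensor.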

---------

Co-authored-by: Tom Aarsen <[email protected]>
PrithivirajDamodaran and tomaarsen authored May 22, 2024
1 parent 5f75ce5 commit 684b6b5
Showing 1 changed file with 10 additions and 2 deletions.
12 changes: 10 additions & 2 deletions sentence_transformers/SentenceTransformer.py
@@ -1000,8 +1000,16 @@ def smart_batching_collate(self, batch: List["InputExample"]) -> Tuple[List[Dict
         """
         texts = [example.texts for example in batch]
         sentence_features = [self.tokenize(sentence) for sentence in zip(*texts)]
-        labels = torch.tensor([example.label for example in batch])
-        return sentence_features, labels
+        labels = [example.label for example in batch]
+
+        # Use torch.from_numpy to convert the numpy array directly to a tensor,
+        # which is the recommended approach for converting numpy arrays to tensors
+        if labels and isinstance(labels[0], np.ndarray):
+            labels_tensor = torch.from_numpy(np.stack(labels))
+        else:
+            labels_tensor = torch.tensor(labels)
+
+        return sentence_features, labels_tensor
 
     def _text_length(self, text: Union[List[int], List[List[int]]]):
         """
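The branching added in this hunk can be exercised in isolation. The helper below is a hypothetical standalone sketch: the name `labels_to_tensor` does not exist in the library, and the logic is lifted out of `smart_batching_collate` only so the three cases (ndarray labels, scalar labels, empty batch) can be tried without a model.

```python
from typing import Any, List

import numpy as np
import torch


def labels_to_tensor(labels: List[Any]) -> torch.Tensor:
    """Hypothetical helper mirroring the label handling in the diff above."""
    # `labels and ...` short-circuits on an empty list, so labels[0]
    # is never evaluated when the batch has no labels.
    if labels and isinstance(labels[0], np.ndarray):
        # Stack into one ndarray first, then wrap it as a tensor.
        return torch.from_numpy(np.stack(labels))
    # Scalars, nested Python lists, and the empty list take the generic path.
    return torch.tensor(labels)


array_labels = labels_to_tensor([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
scalar_labels = labels_to_tensor([0, 1, 1])
empty_labels = labels_to_tensor([])
```

Without the truthiness check, indexing `labels[0]` on an empty batch would raise an `IndexError`; with it, the empty list falls through to `torch.tensor([])` and yields an empty tensor.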