How to include context around a span into span categorization (label). #12580

The spancat component uses the context-sensitive tensors for the first and last tokens in the span, either from the tok2vec component or from the transformer component. If tok2vec is a separate pipeline component, you can inspect this in doc.tensor and see that the tensor depends on the surrounding context:

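import spacy
nlp = spacy.load("en_core_web_sm")
# Print the tensor for the token "USD" in two different contexts.
# The values differ because the tensor depends on the neighboring tokens.
print(nlp("I found USD 2")[2].tensor)
print(nlp("I found a postcard worth USD 2")[5].tensor)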

The amount of context is defined by window_size in the tok2vec.encode config as described here: https://spacy.io/api/architectures#HashEmbedCNN
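To make the effect of window_size more concrete, here is a minimal sketch in plain Python (no spaCy required). It assumes the encoder is a stack of window layers where each layer lets a token see window_size neighbors on either side, so the reach grows with the number of layers. The helper receptive_field and the values window_size=1 and depth in (1, 2, 4) are hypothetical and chosen for illustration; they are not read from any model's config. The result only says which tokens can influence a position, not how strongly they do.

```python
def receptive_field(position, n_tokens, window_size, depth):
    """Return the sorted indices of input tokens that can influence the
    output at `position` after `depth` stacked window layers, where each
    layer sees `window_size` tokens on either side (hypothetical helper)."""
    influencing = {position}
    for _ in range(depth):
        expanded = set()
        for i in influencing:
            # Clip the window at the document boundaries.
            lo = max(0, i - window_size)
            hi = min(n_tokens - 1, i + window_size)
            expanded.update(range(lo, hi + 1))
        influencing = expanded
    return sorted(influencing)


tokens = ["I", "found", "a", "postcard", "worth", "USD", "2"]
usd = tokens.index("USD")

# Illustrative values only: window_size=1 with 1, 2 and 4 layers.
for depth in (1, 2, 4):
    reach = receptive_field(usd, len(tokens), window_size=1, depth=depth)
    print(depth, [tokens[i] for i in reach])
```

Under these assumptions, "postcard" only falls within reach of "USD" once there are at least two layers, which is why both the window size and the number of layers matter when a label depends on words a few tokens away from the span.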

I realize this is a toy example, but it does sound like the model will struggle to make this distinctio…

Answer selected by adrianeboyd
Labels: feat / pipeline (Feature: Processing pipeline and components), feat / spancat (Feature: Span Categorizer)