How do I get the span of a conjunct in spaCy?

2022-05-15 00:00:00 python nlp spacy conjunctive-normal-form

Question

I am using spaCy's token.conjuncts to get the conjuncts of each token.

However, token.conjuncts returns a tuple, and I would like to get a Span instead. For example:

import spacy
nlp = spacy.load("en_core_web_lg")

sentence = "I like to eat food at the lunch time, or even at the time between a lunch and a dinner"
doc = nlp(sentence)
for token in doc:
    conj = token.conjuncts
    print(type(conj))   # print the type, not the contents

# output (for every token): <class 'tuple'>

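Does anyone know how to convert this tuple into a Span?

Or how can I get the conjuncts directly as Span objects?

The reason I need a Span is that I want to use the conjunct (as a Span) to locate where it sits, for example which noun chunk or split it belongs to (whatever method I use to split the sentence).

Currently, I convert the tuple to str and iterate over all splits or noun chunks to check whether a split/noun chunk contains the conjunct.

However, this is buggy: when the conjunct's text appears in more than one split/noun chunk, I cannot tell which one actually contains it, because I compare only strings and ignore the index or id of the conjunct. If I could get a Span for the conjunct, I could locate its exact position.

Feel free to comment, and thanks in advance!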

Solution

token.conjuncts returns a tuple of Token objects. To get a Span for a conjunct, slice the doc with the token's index: doc[conj.i: conj.i+1]

import spacy

nlp = spacy.load('en_core_web_sm')


sentence = "I like oranges and apples and lemons."


doc = nlp(sentence)

for token in doc:
    if token.conjuncts:
        conjuncts = token.conjuncts             # tuple of conjuncts
        print("Conjuncts for ", token.text)
        for conj in conjuncts:
            # conj is a Token; conj.i is its index in the doc
            span = doc[conj.i: conj.i+1]        # one-token Span covering conj
            print(span.text, type(span))
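For the second half of the question (finding which noun chunk or split contains a conjunct), comparing token indices avoids the ambiguity of string matching. The following is only a minimal sketch, not part of the original answer: the helper name find_containing_span and the hand-written index ranges are hypothetical, and the ranges merely stand in for whatever chunks or splits you actually produce.

```python
from typing import Optional, Sequence, Tuple


def find_containing_span(
    token_index: int, spans: Sequence[Tuple[int, int]]
) -> Optional[int]:
    """Return the position in `spans` of the first (start, end) range
    that contains `token_index`, or None if no range contains it.

    Ranges are half-open, start <= i < end, matching how spaCy's
    Span.start and Span.end delimit a span.
    """
    for pos, (start, end) in enumerate(spans):
        if start <= token_index < end:
            return pos
    return None


# Hypothetical, hand-written token ranges for the sentence
# "I like oranges and apples and lemons."
# token indices: 0 I, 1 like, 2 oranges, 3 and, 4 apples, 5 and, 6 lemons, 7 .
chunks = [(0, 1), (2, 3), (4, 5), (6, 7)]

print(find_containing_span(4, chunks))  # "apples" falls in chunks[2] -> 2
print(find_containing_span(3, chunks))  # "and" is in no chunk -> None
```

With spaCy objects, the same helper could be called as find_containing_span(conj.i, [(c.start, c.end) for c in doc.noun_chunks]), since Token.i, Span.start and Span.end are all token indices into the same doc. Two chunks with identical text still have different index ranges, so the match is unambiguous.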
