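Gensim train not updating weights
Problem description
I have a domain-specific corpus for which I am trying to train embeddings. Since I want full vocabulary coverage, I am adding the word vectors from glove.6B.50d.txt.
After adding those vectors, I train the model on my own corpus.
I am trying the solution from here, but the word embeddings don't seem to update.
This is the solution I have so far.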
import numpy as np
from gensim.models import KeyedVectors, Word2Vec

# read GloVe embeddings
glove_wv = KeyedVectors.load_word2vec_format(GLOVE_PATH, binary=False)
# initialize w2v model
model = Word2Vec(vector_size=50, min_count=0, window=20, epochs=10, sg=1, workers=10,
                 hs=1, ns_exponent=0.5, seed=42, sample=10**-2, shrink_windows=True)
model.build_vocab(sentences_tokenized)
training_examples_count = model.corpus_count
# add vocab from GloVe
model.build_vocab([list(glove_wv.key_to_index.keys())], update=True)
model.wv.vectors_lockf = np.zeros(len(model.wv))  # ALLOW UPDATE OF WEIGHTS FROM BACK PROP; 0 WILL SUPPRESS
# add GloVe embeddings
model.wv.intersect_word2vec_format(GLOVE_PATH, binary=False, lockf=1.0)
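Below, I train the model and check the embedding of a specific word that definitely appears in the training data.
# train model
model.train(sentences_tokenized, total_examples=training_examples_count, epochs=model.epochs)
# check if embedding changes for 'oyo'
print(model.wv.get_vector('oyo'))
print(glove_wv.get_vector('oyo'))
The embedding for the word oyo is the same before and after training. Where am I going wrong?
The input corpus, sentences_tokenized, contains several sentences that include the word oyo. One of them: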
'oyo global platform empowers entrepreneur small business hotel home providing full stack technology increase earnings eas operation bringing affordable trusted accommodation guest book instantly india largest budget hotel chain oyo room one preferred hotel booking destination vast majority student country hotel chain offer many benefit include early check in couple room id card flexibility oyo basically network budget hotel completely different famous hotel aggregator like goibibo yatra makemytrip partner zero two star hotel give makeover room bring customer hotel website mobile app'
Solution
You're improvising a lot here, with many potential errors or suboptimal choices. Note in particular:
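The lock-factor comment in the question's code is worth pinning down. In Gensim 4.x, each entry of `vectors_lockf` scales the training update applied to the corresponding word vector: 1.0 allows the full update and 0.0 suppresses it. The following is a minimal plain-Python sketch of that convention, not Gensim's actual implementation; `apply_update`, `max_abs_diff`, the toy vectors, and the learning rate are all hypothetical names and values chosen for illustration.

```python
# Toy illustration only: a plain-Python stand-in for a lock-factor-scaled update.
# None of these names are Gensim APIs; the numbers are made up.

def apply_update(vector, gradient, lockf, learning_rate=0.025):
    """Return a new vector after one update step scaled by the lock factor."""
    return [v + lockf * learning_rate * g for v, g in zip(vector, gradient)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


original = [0.10, -0.20, 0.30]   # hypothetical pre-trained vector
gradient = [1.0, 1.0, -1.0]      # hypothetical training signal

frozen = apply_update(original, gradient, lockf=0.0)     # 0.0: update suppressed
trainable = apply_update(original, gradient, lockf=1.0)  # 1.0: full update applied

frozen_change = max_abs_diff(original, frozen)
trainable_change = max_abs_diff(original, trainable)

print("change with lockf=0.0:", frozen_change)
print("change with lockf=1.0:", trainable_change)
```

On a real model, the same kind of numeric comparison (for example with NumPy's `np.allclose`) between a copy of the vector taken before `train()` and the vector afterwards is a more reliable check than comparing printed output by eye.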