Highlighting important words in a sentence using deep learning

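I am trying to highlight the important words in the IMDB dataset, that is, the words that ultimately contribute to the sentiment analysis prediction.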

The dataset looks like this:

X_train - the reviews, as strings.

Y_train - 0 or 1

Now, after embedding the X_train values with GloVe embeddings, I can feed them into a neural network.

My question is: how can I highlight the most important words, probability-wise? Just like Deepmoji.mit.edu?

What I have tried:

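I tried splitting the input sentences into bi-grams and training a 1D CNN on them. Later, to find the important words of X_test, I split X_test into bi-grams as well and looked at their predicted probabilities. It works, but it is not accurate.
I tried using a prebuilt Hierarchical Attention Network and succeeded. I got what I wanted, but I could not work out every line and concept in the code. It is like a black box to me.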
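
For reference, the bi-gram splitting mentioned in the first attempt could look roughly like the sketch below. The helper name `to_bigrams` and the whitespace tokenisation are assumptions made for illustration; the original preprocessing is not shown in the question.

```python
def to_bigrams(sentence):
    """Split a sentence into overlapping pairs of adjacent words."""
    # Assumption: lowercase and split on whitespace; a real pipeline
    # would probably also strip punctuation.
    words = sentence.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]


review = "this movie was not good"
print(to_bigrams(review))
# ['this movie', 'movie was', 'was not', 'not good']
```

Each bi-gram would then be scored on its own by the classifier, which may explain part of the inaccuracy: a pair such as "was not" carries little sentiment without the word that follows it.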

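I know how a neural network works, and I can code one from scratch in numpy with hand-written backpropagation. I have a detailed understanding of how an LSTM works and of what the forget, update, and output gates actually output. But I still cannot figure out how to extract the attention weights, or how to shape the data as a 3D array (what is the timestep in our 2D data?).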
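
On the 3D-array question, one common reading is that each word position in a review is a timestep, so a batch of embedded reviews has shape (reviews, timesteps, embedding size). The sketch below illustrates this under stated assumptions: the toy `vocab`, the random `embedding_table` standing in for GloVe vectors, and `max_len` are all made up for the example.

```python
import numpy as np

# Hypothetical toy vocabulary; index 0 is reserved for padding.
vocab = {"<pad>": 0, "this": 1, "movie": 2, "was": 3, "great": 4, "bad": 5}
embedding_dim = 4

# Random vectors standing in for real GloVe embeddings (assumption).
rng = np.random.RandomState(0)
embedding_table = rng.normal(size=(len(vocab), embedding_dim))

reviews = ["this movie was great", "bad movie"]
max_len = 4  # number of timesteps = word positions kept per review

# 2D: (reviews, timesteps) integer word indices, right-padded with 0.
indices = np.zeros((len(reviews), max_len), dtype=int)
for row, review in enumerate(reviews):
    ids = [vocab[w] for w in review.split()][:max_len]
    indices[row, :len(ids)] = ids

# 3D: (reviews, timesteps, features); each index is replaced by its vector.
embedded = embedding_table[indices]

print(indices.shape)   # (2, 4)
print(embedded.shape)  # (2, 4, 4)
```

If the network starts with an `Embedding` layer, only the 2D index array needs to be fed in, since that layer performs the lookup into 3D itself.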

So any kind of guidance is welcome.

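Here is a version with attention (not hierarchical), but you should be able to work out how to make it hierarchical as well. If not, I can help with that too. The trick is to define 2 models: one is used for training (model) and the other for extracting the attention values (model_with_attention_output):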

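# Tensorflow 1.9; Keras 2.2.0 (latest versions)
# should be backwards compatible up to Keras 2.0.9 and tf 1.5
from keras.models import Model
from keras.layers import *
# K is used in the Lambda layer below; import it explicitly rather than
# relying on the wildcard import to expose it
from keras import backend as K
import numpy as np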

dictionary_size=1000

def create_models():
  #Get a sequence of indexes of words as input:
  # Keras supports dynamic input lengths if you provide (None,) as the 
  #  input shape
  inp = Input((None,))
  #Embed words into vectors of size 10 each:
  # Output shape is (None,10)
  embs = Embedding(dictionary_size, 10)(inp)
  # Run LSTM on these vectors and return output on each timestep
  # Output shape is (None,5)
  lstm = LSTM(5, return_sequences=True)(embs)
  ##Attention Block
  #Transform each timestep into 1 value (attention_value) 
  # Output shape is (None,1)
  attention = TimeDistributed(Dense(1))(lstm)
  #By running softmax on axis 1 we force attention_values
  # to sum up to 1. We are effectively assigning a "weight" to each timestep
  # Output shape is still (None,1) but each value changes
  attention_vals = Softmax(axis=1)(attention)
  # Multiply the encoded timestep by the respective weight
  # I.e. we are scaling each timestep based on its weight
  # Output shape is (None,5): (None,5)*(None,1)=(None,5)
  scaled_vecs = Multiply()([lstm,attention_vals])
  # Sum up all scaled timesteps into 1 vector 
  # i.e. obtain a weighted sum of timesteps
  # Output shape is (5,) : Observe the time dimension got collapsed
  context_vector = Lambda(lambda x: K.sum(x,axis=1))(scaled_vecs)
  ##Attention Block over
  # Get the output out
  out = Dense(1,activation='sigmoid')(context_vector)

  model = Model(inp, out)
  model_with_attention_output = Model(inp, [out, attention_vals])
  model.compile(optimizer='adam',loss='binary_crossentropy')
  return model, model_with_attention_output

model,model_with_attention_output = create_models()


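# Train on a single toy sample: word indexes [1,2,3] with label 1
model.fit(np.array([[1,2,3]]),np.array([1]),batch_size=1)
# predict() returns [out, attention_vals]; index 1 selects the attention values
print('Attention over each word: ',model_with_attention_output.predict(np.array([[1,2,3]]),batch_size=1)[1])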

The output is a numpy array with one attention value per word: the higher the value, the more important the word.
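
To turn those values into a ranking of words, the weights can be paired with the tokens of the review. The following is a minimal sketch: the helper `rank_words_by_attention`, the example words, and the attention numbers are made up for illustration. The only thing taken from the model above is the array shape, (batch, timesteps, 1), of `model_with_attention_output.predict(...)[1]`.

```python
import numpy as np

def rank_words_by_attention(words, attention):
    """Pair each word with its attention weight, highest weight first."""
    # Flatten (timesteps, 1) or (timesteps,) into a 1D vector of weights.
    weights = np.asarray(attention, dtype=float).reshape(-1)
    if len(words) != len(weights):
        raise ValueError("expected one attention weight per word")
    order = np.argsort(-weights)  # indices sorted by descending weight
    return [(words[i], float(weights[i])) for i in order]


# Made-up attention values shaped like the model output for one
# three-word review: (batch=1, timesteps=3, 1).
attention_vals = np.array([[[0.2], [0.7], [0.1]]])
words = ["not", "terrible", "film"]  # hypothetical tokens for indexes 1, 2, 3

ranked = rank_words_by_attention(words, attention_vals[0])
for word, weight in ranked:
    print(word, round(weight, 2))
```

The same pairs could drive a highlighted rendering of the review, for example by mapping each weight to a background colour intensity.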

EDIT: You might want to replace lstm in the multiplication with embs to get better interpretability, but this will lead to worse performance…
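
For anyone who prefers to see the attention block without Keras, the same computation can be written out in NumPy. This is a sketch under stated assumptions rather than a copy of what Keras does internally: the batch dimension is omitted, the arrays are random stand-ins for real LSTM outputs and embeddings, and `attention_pool` is a name invented here. It also shows the variant from the edit, where the scores still come from the LSTM outputs but the weighted sum is taken over the embeddings.

```python
import numpy as np

def softmax_over_time(scores):
    """Softmax along axis 0 (the timestep axis) of a (timesteps, 1) array."""
    shifted = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

def attention_pool(score_inputs, values, w, b):
    """Score each timestep, normalise the scores, return (context, weights).

    score_inputs: (timesteps, d) vectors the scores are computed from
    values:       (timesteps, k) vectors that get scaled and summed
    w, b:         weights of the Dense(1) scoring layer, shapes (d, 1) and (1,)
    """
    scores = score_inputs @ w + b             # (timesteps, 1)
    weights = softmax_over_time(scores)       # (timesteps, 1), sums to 1
    context = (values * weights).sum(axis=0)  # (k,), time axis collapsed
    return context, weights


rng = np.random.RandomState(0)
lstm_out = rng.normal(size=(3, 5))  # stand-in for LSTM outputs of 3 words
embs = rng.normal(size=(3, 10))     # stand-in for the 3 word embeddings
w, b = rng.normal(size=(5, 1)), np.zeros(1)

# As in the model above: scale and sum the LSTM outputs.
context_lstm, weights = attention_pool(lstm_out, lstm_out, w, b)
# Variant from the edit: same weights, but scale and sum the embeddings.
context_embs, _ = attention_pool(lstm_out, embs, w, b)

print(weights.ravel(), context_lstm.shape, context_embs.shape)
```

In the second variant each weight scales a vector that depends on one word only, which is arguably why it reads as more interpretable, whereas an LSTM output at a given timestep already mixes in information from earlier words.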