Conceptual Primitives

A re-implementation, in TensorFlow, of "SenticNet 5: Discovering Conceptual Primitives for Sentiment Analysis by Means of Context Embeddings", applied to verb substitution and clustering.

Overall Framework and Algorithm

Figure: overall framework.

Figure: algorithm for context and target-word (verb) embedding generation.
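The `--use_ntn` training flag chooses how the left- and right-context representations are fused: a neural tensor network (NTN) or plain concatenation. The NumPy sketch below illustrates both options. The NTN uses the standard form from Socher et al. (2013), tanh(lᵀ W[1:k] r + V[l; r] + b). The function names, the tanh nonlinearity and the number of tensor slices are assumptions, not taken from this repository's code.

```python
import numpy as np


def concat_fusion(left, right):
    """Fuse context vectors by concatenation (the --use_ntn false case)."""
    return np.concatenate([left, right], axis=-1)


def ntn_fusion(left, right, W, V, b):
    """Hypothetical NTN fusion (the --use_ntn true case), standard NTN form.

    left, right: (d,) left/right context vectors
    W: (k, d, d) bilinear tensor, one d x d slice per output unit
    V: (k, 2d) feed-forward weights applied to [left; right]
    b: (k,) bias
    Returns a (k,) vector tanh(left^T W[i] right + V [left; right] + b).
    """
    bilinear = np.einsum("i,kij,j->k", left, W, right)  # left^T W[i] right for every slice i
    linear = V @ np.concatenate([left, right])
    return np.tanh(bilinear + linear + b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 4, 3  # toy sizes; the real model would use --num_units-sized vectors
    l, r = rng.standard_normal(d), rng.standard_normal(d)
    print(concat_fusion(l, r).shape)
    print(ntn_fusion(l, r, rng.standard_normal((k, d, d)),
                     rng.standard_normal((k, 2 * d)), rng.standard_normal(k)).shape)
```

Concatenation keeps the two contexts separate and leaves any interaction to later layers. The NTN's bilinear term multiplies left and right features directly, at the cost of k·d² extra parameters.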

Training

To train the conceptual primitives model, please run:

$ python3 main.py --gpu_idx 0 1 \
    --mode train \
    --resume_training false \
    --neg_sample 10 \
    --word_dim 300 \
    --num_units 300 \
    --k 100 \
    --use_ntn false \
    --tune_emb false \
    --lr 0.0001 \
    --decay_step 10000 \
    --decay_rate 0.9994 \
    --batch_size 1000 \
    --epochs 30 \
    --ckpt ckpt/ \
    --max_to_keep 3 \
    --model_name conceptual_primitives \
    --save_step 10000 \
    --print_step 1000 \
    --ukwac_path <raw ukwac dataset path> \
    --glove_path <pre-trained glove embedding path> \
    --save_path <processed data save path> \
    --word_threshold 90 \
    --word_lowercase true

Arguments:
--gpu_idx: indices of the GPUs used for training (two GPUs, 0 and 1, in this example)
--mode: train or infer
--resume_training: if true, resume training from previously saved parameters
--neg_sample: number of negative samples
--word_dim: dimension of the input word embeddings (pre-trained or randomly initialized)
--num_units: number of units in the RNN cell and in the hidden layer of the feed-forward network
--k: number of units in the output layer
--use_ntn: if true, fuse the left and right contexts with a neural tensor network; otherwise simply concatenate them
--tune_emb: whether the input word embeddings are tuned during training
--lr: learning rate
--decay_step: learning-rate decay step
--decay_rate: learning-rate decay rate
--batch_size: batch size
--epochs: total number of training epochs
--ckpt: directory in which checkpoints are saved
--max_to_keep: maximum number of checkpoints to keep
--model_name: model name
--save_step: save the model every this many steps
--print_step: print sample test results every this many steps
--ukwac_path: path to the raw ukWaC dataset
--glove_path: path to the pre-trained GloVe word embeddings
--save_path: path for saving the processed dataset
--word_threshold: minimum number of occurrences for a word to be kept
--word_lowercase: whether to lowercase the text
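The `--ukwac_path`, `--word_threshold` and `--word_lowercase` flags control how the raw corpus is turned into training data. The sketch below shows one plausible pipeline. It reads ukWaC's one-token-per-line format, builds a frequency-thresholded vocabulary, and splits each sentence around every verb into (left context, verb, right context) samples. Several details are assumptions: the "word<TAB>POS<TAB>lemma" line format, the rule that verb tags start with "V", the "at least threshold" comparison, and all function names. None of them are taken from this repository.

```python
from collections import Counter


def read_sentences(lines):
    """Yield sentences as lists of (word, pos, lemma) tuples.

    Assumed ukWaC-style format: one token per line as "word<TAB>POS<TAB>lemma",
    plus markup lines such as <text ...>, <s>, </s> that contain no tab.
    """
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("<") and "\t" not in line:  # markup line, not a token
            if line == "</s>" and sentence:
                yield sentence
                sentence = []
            continue
        parts = line.split("\t")
        if len(parts) == 3:
            sentence.append((parts[0], parts[1], parts[2]))
    if sentence:  # flush a trailing sentence that lacks a closing </s>
        yield sentence


def build_vocab(sentences, threshold=90, lowercase=True):
    """Keep words occurring at least `threshold` times (cf. --word_threshold)."""
    counts = Counter(
        (w.lower() if lowercase else w) for sent in sentences for w, _, _ in sent
    )
    return {w for w, c in counts.items() if c >= threshold}


def verb_samples(sentence, lowercase=True):
    """Yield (left_context, verb, right_context) for each verb-tagged token.

    Assumes TreeTagger-style tags in which every verb tag begins with "V".
    """
    words = [w.lower() if lowercase else w for w, _, _ in sentence]
    for i, (_, pos, _) in enumerate(sentence):
        if pos.startswith("V"):
            yield words[:i], words[i], words[i + 1:]
```

A real pipeline would also map out-of-vocabulary words to an unknown token and stream the corpus from disk rather than holding it in memory.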

Inferring

As an example of inference, given the sentence "When idle, Dave enjoys eating cake with his sister." and the target verb "eating", the model returns the top N substitutes:

$ python3 main.py --gpu_idx 0 1 --mode infer --use_ntn false
restored model from conceptual_primitives-1000000, done...
Top 10 canidates:
['nibbling', 'drinking', 'munching', 'snacking', 'feeding', 'gorging', 'tasting', 'swallowing', 'chewing', 'feasting']
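Producing the top-N list amounts to ranking candidate verbs against the context embedding of the sentence. The sketch below assumes the trained model provides a context vector for the sentence and a matrix of target-verb embeddings. It ranks candidates by cosine similarity and excludes the original verb. The scoring function and the `top_n_substitutes` name are illustrative assumptions; the repository may score candidates differently, for example with a raw dot product.

```python
import numpy as np


def top_n_substitutes(context_vec, verb_matrix, verb_vocab, target, n=10):
    """Return the n verbs whose embeddings are most cosine-similar to the context.

    context_vec: (dim,) context embedding for the sentence (hypothetical model output)
    verb_matrix: (V, dim) target-verb embeddings, row i for verb_vocab[i]
    target: the original verb, excluded from the results
    """
    c = context_vec / np.linalg.norm(context_vec)
    m = verb_matrix / np.linalg.norm(verb_matrix, axis=1, keepdims=True)
    scores = m @ c  # cosine similarity of every candidate verb to the context
    order = np.argsort(-scores)  # highest score first
    return [verb_vocab[i] for i in order if verb_vocab[i] != target][:n]
```

Normalizing both sides makes the ranking ignore embedding magnitude. This matters if frequent verbs tend to have larger norms.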

Reference

SenticNet 5: Discovering Conceptual Primitives for Sentiment Analysis by Means of Context Embeddings.
The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora.
ukWaC dataset: a 2-billion-word corpus constructed from the Web by limiting the crawl to the .uk domain and using medium-frequency words from the BNC as seeds. The corpus was POS-tagged and lemmatized with TreeTagger.
ukWaC TagSet: explanation of the POS tags used in the ukWaC dataset.
jungokasai/skipgram: reference for the negative-sampling and loss-function design.
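The negative-sampling and loss design credited to jungokasai/skipgram is sketched below in its common word2vec form. We have not checked it against that repository or this one. Negative verbs are drawn from the unigram distribution raised to the 0.75 power. Each training example's loss is −log σ(c·t) − Σᵢ log σ(−c·nᵢ), where c is the context embedding, t the observed verb, and nᵢ the `--neg_sample` negatives. The 0.75 exponent, the rejection of the positive word and all function names are assumptions.

```python
import numpy as np


def unigram_distribution(counts, power=0.75):
    """Return (words, probs) with probs proportional to count**power."""
    words = sorted(counts)
    weights = np.array([counts[w] for w in words], dtype=np.float64) ** power
    return words, weights / weights.sum()


def sample_negatives(words, probs, n, positive, rng):
    """Draw n negatives from probs, rejecting the positive (observed) word.

    Assumes some word other than `positive` has non-zero probability.
    """
    negatives = []
    while len(negatives) < n:
        w = words[rng.choice(len(words), p=probs)]
        if w != positive:
            negatives.append(w)
    return negatives


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))
    return -np.logaddexp(0.0, -x)


def negative_sampling_loss(context, target, negatives):
    """-log sigma(c.t) - sum_i log sigma(-c.n_i) for one example.

    context: (dim,), target: (dim,), negatives: (n, dim)
    """
    positive_term = log_sigmoid(context @ target)
    negative_term = log_sigmoid(-(negatives @ context)).sum()
    return -(positive_term + negative_term)
```

For example, with counts {"eat": 16, "run": 1}, the weights are 16^0.75 = 8 and 1, so "eat" is sampled with probability 8/9. The exponent flattens the distribution, so rare verbs are drawn more often than raw frequency alone would allow.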

Repository: 26hzhang/ConceptualPrimitives