
Chinese Word Embedding

  • 櫻仁 陳
  • April 4, 2022
  • 1 min read

Updated: April 20, 2022


We use PTT (a popular bulletin board system (BBS) in Taiwan) and Chinese Wikipedia corpora to build both count-based and prediction-based word embeddings.
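As a rough sketch of the two approaches (the toy corpus, window size, and dimensions below are placeholders for illustration, not the settings used in this project), a count-based embedding can be built by factoring a co-occurrence matrix with SVD, while a prediction-based one can be trained with gensim's skip-gram word2vec:

import numpy as np
from gensim.models import Word2Vec

# Toy pre-tokenized corpus standing in for the segmented PTT/Wiki sentences.
corpus = [
    ["我", "喜歡", "貓"],
    ["我", "喜歡", "狗"],
    ["貓", "和", "狗", "都", "是", "寵物"],
]

# --- Count-based: co-occurrence matrix + truncated SVD ---
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
window = 2
cooc = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                cooc[idx[w], idx[sent[j]]] += 1

# Low-dimensional word vectors from the leading singular vectors
u, s, _ = np.linalg.svd(cooc)
dim = 2
count_vectors = u[:, :dim] * s[:dim]

# --- Prediction-based: skip-gram word2vec via gensim ---
w2v = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=50)
print(w2v.wv.most_similar("貓"))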


Our embeddings evaluate better on similarity/relatedness tasks than other pre-trained word embeddings. To measure how close two words are in the low-dimensional vector space, we calculate the cosine similarity between their word vectors.
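A minimal sketch of the cosine similarity computation (the 4-dimensional vectors below are made-up examples, not vectors from our models):

import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two word vectors; 1 means identical direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical vectors for two related words
vec_cat = np.array([0.8, 0.1, 0.3, 0.5])
vec_dog = np.array([0.7, 0.2, 0.4, 0.4])
print(cosine_similarity(vec_cat, vec_dog))  # close to 1 -> semantically close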


After that, we adopt Spearman's rank correlation coefficient as the evaluation function. It compares the rank orderings of two sequences: a value close to 1 means the two sequences are highly correlated, while a value close to -1 means they are inversely correlated. We then compute Spearman scores by comparing our model's similarity rankings against gold standards (human judgments).
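A small sketch of this scoring step using SciPy's spearmanr (the word-pair scores below are invented for illustration; real benchmarks provide the human judgments):

from scipy.stats import spearmanr

# Gold-standard human similarity judgments for hypothetical word pairs
gold = [9.2, 7.5, 3.1, 1.0]          # e.g. (貓, 狗), (車, 船), ...
# Cosine similarities for the same pairs, computed from our embeddings
model = [0.83, 0.64, 0.35, 0.12]

rho, p_value = spearmanr(gold, model)
print(f"Spearman rho = {rho:.3f}")   # close to 1 -> rankings agree with humans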



As the table above shows, our count-based model (the first row) outperforms the other models on the relatedness tasks.

Reports:


More detailed information is available on my GitHub.



