
Chinese Word Embedding

  • 櫻仁 陳
  • April 4, 2022
  • 1 min read

Updated: April 20, 2022


We use PTT (a popular bulletin board system (BBS) in Taiwan) and Chinese Wikipedia corpora to build both count-based and prediction-based word embeddings.
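As a rough sketch of the two approaches (the toy corpus, window size, and dimensions below are placeholders for illustration, not the settings used in this project), a count-based embedding can be built by factoring a co-occurrence matrix with SVD, while a prediction-based one can be trained with gensim's skip-gram word2vec:

import numpy as np
from gensim.models import Word2Vec

# Toy pre-tokenized corpus standing in for the segmented PTT/Wiki sentences.
corpus = [
    ["我", "喜歡", "貓"],
    ["我", "喜歡", "狗"],
    ["貓", "和", "狗", "都", "是", "寵物"],
]

# --- Count-based: co-occurrence matrix + truncated SVD ---
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
window = 2
cooc = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                cooc[idx[w], idx[sent[j]]] += 1

# Low-dimensional word vectors from the leading singular vectors
u, s, _ = np.linalg.svd(cooc)
dim = 2
count_vectors = u[:, :dim] * s[:dim]

# --- Prediction-based: skip-gram word2vec via gensim ---
w2v = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=50)
print(w2v.wv.most_similar("貓"))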


Our embeddings evaluate better on similarity/relatedness tasks than other pre-trained word embeddings. To measure how close two words are in the low-dimensional vector space, we calculate the cosine similarity between their word vectors.
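A minimal sketch of the cosine similarity computation (the 4-dimensional vectors below are made-up examples, not vectors from our models):

import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two word vectors; 1 means identical direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical vectors for two related words
vec_cat = np.array([0.8, 0.1, 0.3, 0.5])
vec_dog = np.array([0.7, 0.2, 0.4, 0.4])
print(cosine_similarity(vec_cat, vec_dog))  # close to 1 -> semantically close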


After that, we adopt Spearman's rank correlation coefficient as the evaluation function. It compares the rank orderings of two sequences: a value close to 1 means the two sequences are highly correlated, while a value close to -1 means they are inversely correlated. We then compute Spearman scores by comparing our model's similarity rankings against gold standards (human judgments).
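A small sketch of this scoring step using SciPy's spearmanr (the word-pair scores below are invented for illustration; real benchmarks provide the human judgments):

from scipy.stats import spearmanr

# Gold-standard human similarity judgments for hypothetical word pairs
gold = [9.2, 7.5, 3.1, 1.0]          # e.g. (貓, 狗), (車, 船), ...
# Cosine similarities for the same pairs, computed from our embeddings
model = [0.83, 0.64, 0.35, 0.12]

rho, p_value = spearmanr(gold, model)
print(f"Spearman rho = {rho:.3f}")   # close to 1 -> rankings agree with humans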



As the table above shows, our count-based model (the first row) outperforms the other models on the relatedness tasks.

Reports:


More detailed information is available on my GitHub.



