
Chinese Word Embedding


We use PTT (a popular bulletin board system (BBS) in Taiwan) and Chinese Wikipedia corpora to build count-based and prediction-based word embeddings.


The resulting embeddings outperform other pre-trained word embeddings on similarity/relatedness tasks. To measure the distance between two words in the low-dimensional vector space, we compute the cosine similarity of their word vectors.
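A minimal sketch of the cosine similarity computation described above (stdlib Python only; the function name is my own, not from the repo):

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = dot(u, v) / (||u|| * ||v||)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Vectors pointing in the same direction have similarity 1.0,
# regardless of their magnitudes.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # -> 1.0 (up to float error)
```

In practice the word vectors come from the trained embedding matrix; a similarity near 1 means the two words occupy nearby directions in the vector space.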


After that, we adopt Spearman's rank correlation coefficient as the evaluation metric. It compares the rankings of two sequences sorted in descending order: a value close to 1 means the sequences are highly correlated, while a value close to -1 means they are anti-correlated. We then compute Spearman scores by comparing our model's similarity rankings against gold standards (human judgments).
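The evaluation step above can be sketched as follows: rank both score sequences (model similarities and human judgments), then compute the Pearson correlation of the ranks. This is a self-contained illustration, not the repo's evaluation script; in practice `scipy.stats.spearmanr` does the same job.

```python
import math

def rank_desc(values):
    # Assign ranks in descending order (1 = largest), averaging ties.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average rank for the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(x, y):
    # Spearman's rho = Pearson correlation computed on the ranks.
    rx, ry = rank_desc(x), rank_desc(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical scores: model cosine similarities vs. human judgments.
model_scores = [0.92, 0.35, 0.61, 0.10]
human_scores = [9.1, 3.0, 7.2, 1.5]
print(spearman(model_scores, human_scores))  # -> 1.0 (identical rankings)
```

Because only the rankings matter, the model's raw similarity scale need not match the human rating scale.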



As shown in the table above, our count-based model (the first row) outperforms the other models on the relatedness tasks.

 

Reports:


More detailed information is available on my GitHub.



