■■ Short-Unit Word Automatic Analysis: Performance Evaluation Results ■■

The evaluation was run with the following scripts:
https://github.com/teru-oka-1933/unidic_ma_factory
https://github.com/teru-oka-1933/unidic_ma_factory/blob/master/ph7_evaluation.py

The script uses MevAL, a performance evaluation system for morphological analyzers:
https://teru-oka-1933.github.io/meval/

The data was split between train_list.tsv and test_list.tsv at a size ratio of 9:1.
For how to read the results, see https://teru-oka-1933.github.io/meval/#scorer

Field numbers:
1 major part-of-speech category (pos1)
2 medium part-of-speech category (pos2)
3 minor part-of-speech category (pos3)
4 fine part-of-speech category (pos4)
5 conjugation type (cType)
6 conjugation form (cForm)
7 lexeme reading (lForm)
8 lexeme (lemma)
10 pronunciation of the surface form (pron)

Evaluation levels:
LEVEL 0 : surface form identification
LEVEL 1 : part-of-speech identification
LEVEL 2 : lexeme identification
LEVEL 3 : pronunciation identification

■■ Evaluation results with the full-size lexicon ■■

========================================
MEVAL SCORER
========================================
Gold: train_test_mecab/all.test.mecab
Pred: eval_work/all_lex/pred.mecab
----------------------------------------
Field Num
LEVEL 0 : 0
LEVEL 1 : +1+2+3+4+5+6
LEVEL 2 : +7+8
LEVEL 3 : +10
----------------------------------------
Sentence Num: 24146
Gold Word Num (GLD): 246268
Pred Word Num (PRD): 245993
Character Num: 400759
========================================
LEVEL 0 : 0
========================================
Correctly Analysed Sentences: 96.67% (23342/24146 = 0.9667026)
----------------------------------------
COR : 244126
----------------------------------------
Prec. : 99.24% (244126/245993 = 0.99241036)
Rec. : 99.13% (244126/246268 = 0.99130213)
F : 99.19 (0.991856)
========================================
LEVEL 1 : +1+2+3+4+5+6
========================================
Correctly Analysed Sentences: 87.35% (21091/24146 = 0.873478)
----------------------------------------
COR : 240573
----------------------------------------
Prec. : 97.8% (240573/245993 = 0.97796685)
Rec. : 97.69% (240573/246268 = 0.97687477)
F : 97.74 (0.9774205)
========================================
LEVEL 2 : +7+8
========================================
Correctly Analysed Sentences: 85.22% (20578/24146 = 0.8522323)
----------------------------------------
COR : 239721
----------------------------------------
Prec. : 97.45% (239721/245993 = 0.97450334)
Rec. : 97.34% (239721/246268 = 0.97341514)
F : 97.4 (0.97395897)
========================================
LEVEL 3 : +10
========================================
Correctly Analysed Sentences: 82.4% (19896/24146 = 0.8239874)
----------------------------------------
COR : 238471
----------------------------------------
Prec. : 96.94% (238471/245993 = 0.9694219)
Rec. : 96.83% (238471/246268 = 0.9683394)
F : 96.89 (0.96888036)

■■ Performance when orthographic forms not registered in the dictionary appear ■■

Results of training and evaluating after removing from lex.csv every entry (row of lex.csv) whose orthographic surface form occurs only in the evaluation data and never in the training data.

========================================
MEVAL SCORER
========================================
Gold: train_test_mecab/all.test.mecab
Pred: eval_work/shallow_unk/pred.mecab
----------------------------------------
Field Num
LEVEL 0 : 0
LEVEL 1 : +1+2+3+4+5+6
LEVEL 2 : +7+8
LEVEL 3 : +10
----------------------------------------
Sentence Num: 24146
Gold Word Num (GLD): 246268
Pred Word Num (PRD): 248704
Character Num: 400759
========================================
LEVEL 0 : 0
========================================
Correctly Analysed Sentences: 88.71% (21420/24146 = 0.88710344)
----------------------------------------
COR : 240818
----------------------------------------
Prec. : 96.83% (240818/248704 = 0.96829164)
Rec. : 97.79% (240818/246268 = 0.97786963)
F : 97.31 (0.9730571)
========================================
LEVEL 1 : +1+2+3+4+5+6
========================================
Correctly Analysed Sentences: 79.54% (19205/24146 = 0.7953698)
----------------------------------------
COR : 236149
----------------------------------------
Prec. : 94.95% (236149/248704 = 0.9495183)
Rec. : 95.89% (236149/246268 = 0.95891064)
F : 95.42 (0.9541914)
========================================
LEVEL 2 : +7+8
========================================
Correctly Analysed Sentences: 77.67% (18753/24146 = 0.77665037)
----------------------------------------
COR : 235012
----------------------------------------
Prec. : 94.49% (235012/248704 = 0.9449466)
Rec. : 95.43% (235012/246268 = 0.95429367)
F : 94.96 (0.9495971)
========================================
LEVEL 3 : +10
========================================
Correctly Analysed Sentences: 75.46% (18220/24146 = 0.7545763)
----------------------------------------
COR : 233702
----------------------------------------
Prec. : 93.97% (233702/248704 = 0.9396793)
Rec. : 94.9% (233702/246268 = 0.9489743)
F : 94.43 (0.944304)

■■ Performance when orthographic forms not registered in the dictionary appear ■■

For each word-form base (lexeme reading (lForm) + lexeme (lemma) + lexeme class (lType) + word-form base (formBase)) that occurs only in the evaluation data and never in the training data, all entries at or below that word-form base in the tree of the hierarchical headword structure were excluded before training and evaluation.

========================================
MEVAL SCORER
========================================
Gold: train_test_mecab/all.test.mecab
Pred: eval_work/middle_unk/pred.mecab
----------------------------------------
Field Num
LEVEL 0 : 0
LEVEL 1 : +1+2+3+4+5+6
LEVEL 2 : +7+8
LEVEL 3 : +10
----------------------------------------
Sentence Num: 24146
Gold Word Num (GLD): 246268
Pred Word Num (PRD): 248170
Character Num: 400759
========================================
LEVEL 0 : 0
========================================
Correctly Analysed Sentences: 90.24% (21789/24146 = 0.9023855)
----------------------------------------
COR : 241572
----------------------------------------
Prec. : 97.34% (241572/248170 = 0.9734134)
Rec. : 98.09% (241572/246268 = 0.98093134)
F : 97.72 (0.97715795)
========================================
LEVEL 1 : +1+2+3+4+5+6
========================================
Correctly Analysed Sentences: 81.62% (19707/24146 = 0.81616)
----------------------------------------
COR : 237401
----------------------------------------
Prec. : 95.66% (237401/248170 = 0.9566064)
Rec. : 96.4% (237401/246268 = 0.9639945)
F : 96.03 (0.96028626)
========================================
LEVEL 2 : +7+8
========================================
Correctly Analysed Sentences: 79.65% (19232/24146 = 0.79648805)
----------------------------------------
COR : 236322
----------------------------------------
Prec. : 95.23% (236322/248170 = 0.9522585)
Rec. : 95.96% (236322/246268 = 0.9596131)
F : 95.59 (0.95592165)
========================================
LEVEL 3 : +10
========================================
Correctly Analysed Sentences: 77.36% (18679/24146 = 0.7735857)
----------------------------------------
COR : 235059
----------------------------------------
Prec. : 94.72% (235059/248170 = 0.9471693)
Rec. : 95.45% (235059/246268 = 0.9544845)
F : 95.08 (0.9508129)

■■ Performance when completely unknown words (= new words) appear ■■

For each lexeme (lexeme reading (lForm) + lexeme (lemma) + lexeme class (lType)) that occurs only in the evaluation data and never in the training data, all entries for which that lexeme is the root of the hierarchical headword structure were excluded before training and evaluation.

========================================
MEVAL SCORER
========================================
Gold: train_test_mecab/all.test.mecab
Pred: eval_work/deep_unk/pred.mecab
----------------------------------------
Field Num
LEVEL 0 : 0
LEVEL 1 : +1+2+3+4+5+6
LEVEL 2 : +7+8
LEVEL 3 : +10
----------------------------------------
Sentence Num: 24146
Gold Word Num (GLD): 246268
Pred Word Num (PRD): 248148
Character Num: 400759
========================================
LEVEL 0 : 0
========================================
Correctly Analysed Sentences: 90.35% (21817/24146 = 0.9035451)
----------------------------------------
COR : 241651
----------------------------------------
Prec. : 97.38% (241651/248148 = 0.97381806)
Rec. : 98.13% (241651/246268 = 0.98125213)
F : 97.75 (0.97752094)
========================================
LEVEL 1 : +1+2+3+4+5+6
========================================
Correctly Analysed Sentences: 81.73% (19734/24146 = 0.8172782)
----------------------------------------
COR : 237482
----------------------------------------
Prec. : 95.7% (237482/248148 = 0.9570176)
Rec. : 96.43% (237482/246268 = 0.9643234)
F : 96.07 (0.96065664)
========================================
LEVEL 2 : +7+8
========================================
Correctly Analysed Sentences: 79.79% (19266/24146 = 0.79789615)
----------------------------------------
COR : 236428
----------------------------------------
Prec. : 95.28% (236428/248148 = 0.9527701)
Rec. : 96.0% (236428/246268 = 0.96004355)
F : 95.64 (0.956393)
========================================
LEVEL 3 : +10
========================================
Correctly Analysed Sentences: 77.54% (18722/24146 = 0.77536654)
----------------------------------------
COR : 235206
----------------------------------------
Prec. : 94.78% (235206/248148 = 0.94784564)
Rec. : 95.51% (235206/246268 = 0.95508146)
F : 95.14 (0.95144975)
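
The scores in the reports above can be re-derived from the three counts each block prints: COR, GLD and PRD. The following is a minimal Python sketch, not part of MevAL or unidic_ma_factory. The ratios Prec. = COR/PRD and Rec. = COR/GLD are the ones shown in the scorer output itself. Treating F as the harmonic mean of the two is an assumption here, although it appears to reproduce the reported values. The names `Counts`, `prf` and `LEVEL0` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Word counts printed by one scorer block (names are illustrative)."""
    cor: int  # COR: correctly analysed words
    gld: int  # GLD: number of words in the gold data
    prd: int  # PRD: number of words in the predicted data


def prf(c: Counts) -> tuple[float, float, float]:
    """Return (precision, recall, F) computed from the raw counts."""
    precision = c.cor / c.prd
    recall = c.cor / c.gld
    # Assumption: F is the harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# LEVEL 0 counts copied from the four reports above,
# keyed by the eval_work/ subdirectory of each run.
LEVEL0 = {
    "all_lex": Counts(cor=244126, gld=246268, prd=245993),
    "shallow_unk": Counts(cor=240818, gld=246268, prd=248704),
    "middle_unk": Counts(cor=241572, gld=246268, prd=248170),
    "deep_unk": Counts(cor=241651, gld=246268, prd=248148),
}

if __name__ == "__main__":
    for name, counts in LEVEL0.items():
        p, r, f = prf(counts)
        print(f"{name:12s} Prec.={p:.4%}  Rec.={r:.4%}  F={f * 100:.2f}")
```

Substituting the COR value of another level (the GLD and PRD values stay the same within a run) gives the figures for LEVEL 1 to 3 in the same way.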