National Institute for Japanese Language and Linguistics (国立国語研究所)
 
 

Lexical Item Charts for the "Corpus of Historical Japanese"

The word counts for each lexeme (as well as for each lexical type and each part of speech) in the data collected in the "Corpus of Historical Japanese" have been tabulated by historical period and by literary work.

They can be downloaded through the following links.

Lexical Item Charts for the "Corpus of Historical Japanese"(Version 2024.03)

Word Count Chart

The word counts for the data collected in the "Corpus of Historical Japanese" are presented in the following files. Word frequencies (both including and excluding punctuation marks) have been arranged according to Sample ID, Core/non-core, Main text type (including quotations), and Style.

The Word Count Charts for Short Unit Words and Long Unit Words can be downloaded through the following links.

Download Word Count Chart for Short Unit Words (Version 2024.03)

Download Word Count Chart for Long Unit Words (Version 2024.03)

Lexical Statistics: Version 2024.03

This page presents Word Count Charts and Lexical Item Charts for the "Corpus of Historical Japanese" Version 2024.03.

  • Word Count Charts assemble word counts (in two types: word counts including punctuation marks, and word counts not including punctuation marks) according to Sample ID, Core/non-core data, Main text type, and Style.
  • Lexical Item Charts assemble word counts for lemmas according to Historical period and literary work. For each work in each historical period, they record how many times each lemma is attested. The filter functions of Excel and similar applications can then be used to sort and narrow the data by factors such as part of speech and lexical type.
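The kind of filtering described for the Lexical Item Charts can also be done outside a spreadsheet. The following is a minimal Python sketch, not the format of the distributed files: the field names `lemma`, `pos`, `lexical_type`, and `count` are assumptions made for illustration, and the rows are invented placeholders rather than corpus data.

```python
def filter_lemmas(rows, pos=None, lexical_type=None):
    """Keep rows matching the given part of speech and/or lexical type,
    sorted by count in descending order. A criterion of None is ignored."""
    selected = [
        row for row in rows
        if (pos is None or row["pos"] == pos)
        and (lexical_type is None or row["lexical_type"] == lexical_type)
    ]
    return sorted(selected, key=lambda row: row["count"], reverse=True)


# Placeholder rows, invented purely to exercise the function.
# Field names are hypothetical and may not match the downloaded charts.
rows = [
    {"lemma": "lemma_a", "pos": "noun", "lexical_type": "type_x", "count": 12},
    {"lemma": "lemma_b", "pos": "verb", "lexical_type": "type_x", "count": 30},
    {"lemma": "lemma_c", "pos": "noun", "lexical_type": "type_y", "count": 45},
]

nouns = filter_lemmas(rows, pos="noun")
print([row["lemma"] for row in nouns])
```

With real data, the rows would first have to be read from the downloaded chart, and the keys adjusted to whatever column headers that file actually uses.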

Version 2024.03 of the "Corpus of Historical Japanese" contains 20,910,000 Short Unit Words and 2,880,000 Long Unit Words. The breakdown by sub-corpus is set out below.

  • Word Count of each sub-corpus
| Period | Sub-corpus | Short Unit Word | Long Unit Word |
|---|---|---:|---:|
| Nara period | Nara Period Series I: Man'yōshū | 99,000 | 94,000 |
| Nara period | Nara Period Series II: Senmyō | 21,000 | 17,000 |
| Nara period | Nara Period Series III: Norito | 11,000 | 9,000 |
| Heian period | Heian Period Series I: Kana literature | 1,030,000 | 912,000 |
| Heian period | Heian Period Series II: Kunten materials | 10,000 | - |
| Heian period / Kamakura period | Waka-shū Series | 269,000 | 252,000 |
| Kamakura period | Kamakura Period Series I: Folktales and Essays | 844,000 | 792,000 |
| Kamakura period | Kamakura Period Series II: Diaries and Travel Literature | 128,000 | 118,000 |
| Kamakura period | Kamakura Period Series III: Military Chronicles | 331,000 | 291,000 |
| Muromachi period | Muromachi Period Series I: Kyōgen | 277,000 | 256,000 |
| Muromachi period | Muromachi Period Series II: Christian Materials | 149,000 | 139,000 |
| Edo period | Edo Period Series I: Share-bon | 218,000 | |
| Edo period | Edo Period Series II: Ninjo-bon | 406,000 | |
| Edo period | Edo Period Series III: Chikamatsu-Joruri | 255,000 | |
| Edo period | Edo Period Series IV: Essays and Travel Literature | 128,000 | |
| Meiji Era / Taishō Era / Shōwa Era | Meiji Era / Taishō Era Series I: Magazines | 14,180,000 | |
| Meiji Era / Taishō Era / Shōwa Era | Meiji Era / Taishō Era Series II: Textbooks | 1,058,000 | |
| Meiji Era / Taishō Era / Shōwa Era | Meiji Era / Taishō Era Series III: Early Meiji Spoken Language Materials | 193,000 | |
| Meiji Era / Taishō Era / Shōwa Era | Meiji Era / Taishō Era Series IV: Modern Novels | 779,000 | |
| Meiji Era / Taishō Era / Shōwa Era | Meiji Era / Taishō Era Series V: Newspapers | 407,000 | |
| Meiji Era / Taishō Era / Shōwa Era | Meiji Era / Taishō Era Series VI: Rakugo 78 rpm Discs | 104,000 | |
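As a quick consistency check, the per-sub-corpus figures can be summed and compared with the overall totals stated above. The Python sketch below copies the counts from the table (in thousands of words) and treats any sub-corpus without a Long Unit Word figure as contributing nothing to that total. The short labels are abbreviations introduced here for readability.

```python
# Counts in thousands of words, copied from the sub-corpus table.
# None marks a sub-corpus for which the table gives no Long Unit Word figure.
SUBCORPORA = [
    ("Nara I: Man'yōshū", 99, 94),
    ("Nara II: Senmyō", 21, 17),
    ("Nara III: Norito", 11, 9),
    ("Heian I: Kana literature", 1030, 912),
    ("Heian II: Kunten materials", 10, None),
    ("Waka-shū", 269, 252),
    ("Kamakura I: Folktales and Essays", 844, 792),
    ("Kamakura II: Diaries and Travel Literature", 128, 118),
    ("Kamakura III: Military Chronicles", 331, 291),
    ("Muromachi I: Kyōgen", 277, 256),
    ("Muromachi II: Christian Materials", 149, 139),
    ("Edo I: Share-bon", 218, None),
    ("Edo II: Ninjo-bon", 406, None),
    ("Edo III: Chikamatsu-Joruri", 255, None),
    ("Edo IV: Essays and Travel Literature", 128, None),
    ("Meiji/Taishō I: Magazines", 14180, None),
    ("Meiji/Taishō II: Textbooks", 1058, None),
    ("Meiji/Taishō III: Early Meiji Spoken Language Materials", 193, None),
    ("Meiji/Taishō IV: Modern Novels", 779, None),
    ("Meiji/Taishō V: Newspapers", 407, None),
    ("Meiji/Taishō VI: Rakugo 78 rpm Discs", 104, None),
]


def totals(rows):
    """Sum both columns and convert from thousands to words."""
    short_sum = sum(short for _, short, _ in rows)
    long_sum = sum(long_ for _, _, long_ in rows if long_ is not None)
    return short_sum * 1000, long_sum * 1000


short_total, long_total = totals(SUBCORPORA)
print(f"Short Unit Words: {short_total:,}")  # 20,897,000
print(f"Long Unit Words:  {long_total:,}")   # 2,880,000
```

The Long Unit Word sum matches the stated 2,880,000 exactly. The Short Unit Word sum comes to 20,897,000, which is 13,000 below the stated 20,910,000. A plausible explanation, though only an assumption, is that both the per-row figures and the overall total have been rounded.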
 
 

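Corpora are foundational resources for analyzing Japanese and other languages: written and spoken materials that have been systematically collected and annotated with information for research.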