This page explains the columns of the search results shown in the online search engine “中納言 (Chūnagon)” and in the search result files downloaded from it, with particular focus on items that were established specifically for the Corpus of Historical Japanese and are crucial to its use.
Separately from specifications such as “~時代編 (~jidaihen ‘~Period Series’)” that form part of the names for each of the corpora, we also provide a categorization based on historical periods and era names, presented in the table below:
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|
奈良 Nara | 平安 Heian | 鎌倉 Kamakura | 室町 Muromachi | 江戸 Edo | 明治 Meiji | 大正 Taishō | 昭和 Shōwa |
For example, data for textbooks from the Meiji Era to the Shōwa Era are included in the “明治・大正編Ⅱ教科書 (‘Meiji Era / Taishō Era Series II: Textbooks’)” corpus, but specifications for sub-periods are included, such that Period I (1904) and Period II (1910) are annotated as “6明治 (‘6 Meiji’)”, Period III (1918) is annotated “7大正 (‘7 Taishō’)”, and Period IV (1933), Period V (1941), and Period VI (1947) are annotated “8昭和 (‘8 Shōwa’)”. In this way it is possible to re-order and refine data according to historical period.
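As a rough illustration of re-ordering and refining by historical period, the sketch below sorts and filters a few rows by their era label. The row structure and the key names `era` and `note` are hypothetical; only the labels “6明治”, “7大正”, and “8昭和” are taken from the description above.

```python
# Hypothetical rows: the dictionary keys are assumptions for illustration;
# the era labels and years are the ones given in the text above.
rows = [
    {"era": "8昭和", "note": "Period IV textbook (1933)"},
    {"era": "6明治", "note": "Period I textbook (1904)"},
    {"era": "7大正", "note": "Period III textbook (1918)"},
]


def era_number(label: str) -> int:
    """Return the leading digit of an era label such as '6明治'."""
    return int(label[0])


# Re-order chronologically by the numeric prefix of the era label.
ordered = sorted(rows, key=lambda row: era_number(row["era"]))

# Refine to a single historical period.
meiji_only = [row for row in rows if row["era"] == "6明治"]
```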
Each data sample (volume, chapter, article, etc.) of a given corpus is individually annotated with a 15-place (characters and digits) ID code string specifying the historical period and the textual source of the data. The respective places in the code string express the following types of information:
places 1-2 | place 3 | places 4-5 | places 6-9 | place 10 | places 11-15 |
---|---|---|---|---|---|
Era (conforming to the “~Series” specification) | Genre | ID of the textual source | Time of production | Demarcation symbol | Serial number specifying position in the text |
For example, in the ID code “60M明六1874_01001”, the first two places “60” stand for “Meiji / Taishō Period Series”, “M” stands for “Magazine”, “明六” stands for “Meiroku Zasshi”, “1874” stands for the year 1874, the underscore “_” is the demarcation symbol, and “01001” indicates the first article of the first volume.
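The fixed positions described above can be sliced out of an ID string directly. The following is a minimal sketch, not an official tool: the names `SampleId` and `parse_sample_id` are hypothetical, and the code assumes every ID is exactly 15 characters long, with each kanji counted as one character.

```python
from typing import NamedTuple


class SampleId(NamedTuple):
    series: str     # places 1-2: era, following the "~Series" specification
    genre: str      # place 3
    source: str     # places 4-5: ID of the textual source
    year: str       # places 6-9: time of production
    separator: str  # place 10: demarcation symbol
    serial: str     # places 11-15: position in the text


def parse_sample_id(code: str) -> SampleId:
    """Split a 15-place sample ID into its six fields."""
    if len(code) != 15:
        raise ValueError(f"expected 15 places, got {len(code)}: {code!r}")
    return SampleId(code[0:2], code[2], code[3:5], code[5:9], code[9], code[10:15])


parsed = parse_sample_id("60M明六1874_01001")
```

Here `parsed.source` is “明六” and `parsed.serial` is “01001”, matching the breakdown of the example ID.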
The sample ID codes for each of the sub-corpora are annotated according to the following standards:
Sub-corpus | places 1-2 | place 3 | places 4-5 | places 6-9 | place 10 | places 11-15 |
---|---|---|---|---|---|---|
Content | Era (conforming to the “~Series” specification) | Genre | ID of the textual source | Time of production | - | Serial number specifying position in the text |
Nara Period Series I: Man'yōshū | 10 (Nara) | - | 万葉 (Man’yō) | 0759 | - | 5-digit volume number |
Nara Period Series II: Senmyō | 10 (Nara) | - | 宣命 (Senmyō) | 0797 | - | 2-digit Shoku-Nihongi + 3-digit edict serial number |
Nara Period Series III: Norito | 10 (Nara) | - | 祝詞 (Norito) | 0927 | - | 5-digit Norito serial number |
Heian Period Series I: Kana literature | 20 (Heian) | - | 2-character source title acronym | Year of production of each text | - | 5-digit serial number (for the Ōkagami only, 2-digit section number + 3-digit file serial number) |
Heian Period Series II: Kunten materials | 20 (Heian) | K (Kunten materials) | 西金 (Sai-kon) | 0803 | - | 2-digit volume number + 3-digit chapter number |
Kamakura Period Series I: Folktales and Essays | 30 (Kamakura) | - | 2-character source title acronym | Year of production of each text | - | 2-digit volume number + 3-digit file serial number; alternatively, 5-digit file serial number |
Kamakura Period Series II: Diaries and Travel Literature | 30 (Kamakura) | - | 2-character source title acronym | Year of production of each text | - | 2-digit volume number + 3-digit file serial number; alternatively, 5-digit file serial number |
Kamakura Period Series III: Military Chronicles | 30 (Kamakura) | - | 2-character source title acronym | Year of production of each text | - | 2-digit volume number + 3-digit file serial number |
Muromachi Period Series I: Kyōgen | 40 (Muromachi) | - | 虎明 (Toraakira-bon) | 1642 | - | 2-digit volume number + 3-digit file serial number |
Muromachi Period Series II: Christian Materials (Kirishitan Shiryō) | 40 (Muromachi) | - | 2 characters from source title | Year of production of each text | - | 2-digit volume number + 3-digit file serial number; alternatively, 5-digit file serial number |
Edo Period Series I: Share-bon | 52 (Edo / Latter Modern) | - | 洒落 (Share-bon) | Year of production of each text | - | 2-digit region ID + 3-digit file serial number |
Edo Period Series II: Ninjō-bon | 53 (Edo / Late Modern) | - | 人情 (Ninjō-bon) | Year of production of each text | - | 2-digit source serial number + 3-digit volume serial number |
Edo Period Series III: Chikamatsu-Jōruri | 51 (Edo / Early Modern) | - | 近松 (Chikamatsu-Jōruri) | Year of production of each text | - | 2-digit source serial number + 3-digit volume serial number |
Edo Period Series IV: Essays and Travel Literature | 51 (Edo / Early Modern) | - | 芭蕉 (Bashō) | Year of production of each text | - | 2-digit source serial number (in order of year of production) + 3-digit serial numbers for items produced in the same year |
Meiji Era / Taishō Era Series I: Magazines | 60 (Meiji / Taishō) | M (magazine) | 2-character source title acronym | Year of production of each text | - | 2-digit volume serial number + 3-digit article serial number |
Meiji Era / Taishō Era Series II: Textbooks | 60 (Meiji / Taishō) | T (textbook) | 2-character source title acronym | Year of production of each text | - | 1-digit period number + 1-digit grade number + 1-digit volume number + 2-digit article serial number |
Meiji Era / Taishō Era Series III: Early Meiji Spoken Language Materials | 60 (Meiji / Taishō) | C (Colloquial) | 口語 (Colloquial) | Year of production of each text | - | 2-digit material serial number + 1-digit material-internal edition number + 2-digit material- and edition-internal serial number |
Meiji Era / Taishō Era Series IV: Modern Novels | 60 (Meiji / Taishō) | N (novel) | 2-character source title acronym | Year of production of each text | - | 1-digit book number + 1-digit part number + 3-digit chapter serial number |
Meiji Era / Taishō Era Series V: Newspapers | 60 (Meiji / Taishō) | P (paper) | 読売 (Yomiuri) | Year of each volume | - | 1-digit month number (in Base32) + 1-digit day number (in Base32) + 3-digit article serial number |
Meiji Era / Taishō Era Series VI: Rakugo 78 rpm Discs | 60 (Meiji / Taishō) | R (rakugo) | 2-character abbreviation for rakugo | Year of production of each text | - | 2-digit region number + 3-digit intra-region file serial number |
Waka-shū Series | 20 (Heian) / 30 (Kamakura) | W (waka-shū) | 2-character source title acronym | Year of production of each text | - | 2-digit volume number + 3-digit file serial number |
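Because the five-place serial number is subdivided differently in each sub-corpus, a small helper that splits it by field widths may be convenient. This is only a sketch: the function name `split_serial` is hypothetical, the widths are read off the table above, and the textbook serial “12305” is an invented value used purely for illustration.

```python
from typing import List, Sequence


def split_serial(serial: str, widths: Sequence[int]) -> List[str]:
    """Split a serial number into consecutive fields of the given widths."""
    if sum(widths) != len(serial):
        raise ValueError(f"widths {list(widths)} do not cover {serial!r}")
    parts = []
    pos = 0
    for width in widths:
        parts.append(serial[pos:pos + width])
        pos += width
    return parts


# Magazines: 2-digit volume serial number + 3-digit article serial number.
magazine_parts = split_serial("01001", (2, 3))

# Textbooks: period + grade + volume (1 digit each) + 2-digit article number.
# "12305" is a made-up serial, not an ID taken from the corpus.
textbook_parts = split_serial("12305", (1, 1, 1, 2))
```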
The “開始位置 (kaishi ichi ‘starting location’)” is an ID that indicates the position of the first character of any word that satisfies the description of the “キー (kii ‘key’)”. Each character in a given text is annotated with a number, forming a “+10” arithmetic sequence. When searching by location, it is possible to uniquely identify an example by combining this number with the ID of the sample.
The “連番 (renban ‘serial number’)” is an ID that indicates the location of a word that corresponds to the “キー (kii ‘key’)”. In contrast to “開始位置 (kaishi ichi ‘starting locations’)”, serial numbers are annotated onto short unit words or long unit words to form a “+10” arithmetic sequence, without reference to the number of characters in the “key”.
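The two numbering schemes can be contrasted with a little arithmetic. The sketch below assumes that the numbering is gap-free, so that consecutive characters (for starting locations) and consecutive words (for serial numbers) always differ by exactly 10; real data may not satisfy this everywhere. All function names and the numeric values are hypothetical.

```python
from typing import Tuple

STEP = 10  # both sequences advance in steps of 10, per the description above


def hit_key(sample_id: str, start_position: int) -> Tuple[str, int]:
    """Combine sample ID and starting location into a unique key for a hit."""
    return (sample_id, start_position)


def start_of_following_word(start_position: int, key_length: int) -> int:
    """Starting location of the next word: one step per character of the key."""
    return start_position + STEP * key_length


def following_serial(serial_number: int) -> int:
    """Serial number of the next word: one step per word, whatever its length."""
    return serial_number + STEP


key = hit_key("60M明六1874_01001", 120)       # 120 is an invented position
next_start = start_of_following_word(120, 2)  # a 2-character key
next_serial = following_serial(50)            # 50 is an invented serial number
```

The starting location advances with the number of characters in the key, while the serial number advances by a single step per word.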
The value in the “コア (koa ‘core’)” column indicates whether the sample containing a search result is part of the core data (the set of data for which each text has received full hand correction) or part of the non-core data (data including parts that are either partially hand corrected or left untouched after morphological parsing). A value of “1” indicates core data, while a value of “0” indicates non-core data.
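When working with a downloaded result file, the core flag can be used as a simple filter. The sketch below assumes a tab-separated file with a header row in which the core column is headed “コア”; the other column name (`sample_id`) and the two data rows are hypothetical and stand in for a real download.

```python
import csv
import io

# Stand-in for a downloaded file; only the "コア" and "キー" headings come
# from the text, the rest is invented for illustration.
SAMPLE_TSV = (
    "sample_id\tキー\tコア\n"
    "60M明六1874_01001\t国\t1\n"
    "60M明六1874_01002\t国家\t0\n"
)


def core_rows(tsv_text: str) -> list:
    """Return only the rows whose core flag is "1" (fully hand-corrected data)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["コア"] == "1"]


core_only = core_rows(SAMPLE_TSV)
```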
Values in this category indicate the distinction between main text (the principal reading) and alternative text (alternative readings). Given that there are character strings having information for two or more possible readings or meanings, such as kakekotoba ‘pivot words’, records coded with multiple morphological information are annotated with numbers for their respective statuses. A value of “1” indicates status as main text, while a value of “0” indicates alternative text status.
For records that have been assigned multiple sets of morphological information, the reason for the multiple assignment is indicated by the value in the “多重化種別 (tajūka shubetsu ‘types of multiple analyses’)” column. Types such as “掛詞 (kakekotoba ‘pivot word’)” and “振り仮名 (furigana ‘ruby text’)” are indicated.
Entries in the “語彙素 (goiso ‘lexeme’)” column show how the dictionary heading corresponding to a word satisfying the description of the “key” is written. Because a lexeme is equivalent to a dictionary heading, unifying all the variations in form (morphological form, inflection, orthographical form, etc.) that a given word may take, common native Japanese words (wago) and Sino-Japanese words (kango) are written in kanji / hiragana, while loan words, personal names, place names, and the like are written in katakana (for example, “国 [kuni ‘country’]”, “国家 [kokka ‘nation’]”, “カントリー [kantorii ‘country’]”, “日本 [nihon ‘Japan’]”).
An entry in the “語彙素読み (goiso yomi ‘lexeme reading’)” column indicates the reading of the lexeme for any word fitting the description of the “key” (see the entry for “lexeme”). It is written in katakana (for example, “クニ [kuni ‘country’]”, “コッカ [kokka ‘nation’]”, “カントリー [kantorii ‘country’]”, “ニホン [nihon ‘Japan’]”).
An entry in the “語形 (gokei ‘morphological form’)” column indicates the sound shape of any word fitting the description of the “キー (kii ‘key’)”. A morphological form may be one of the variant forms unified under a given lexeme (for example, the forms yahari and yappari, unified under the lexeme “矢張り [‘as expected’]”), or one of the various inflections unified under a given lexeme (for example, “yomu [pentagrade - m consonant]”, “yomu [Classical Japanese quadrigrade - m consonant]”, and “yomeru [lower monograde - m consonant; potential verb form]”, unified under the lexeme “読む [yomu ‘read’]”). Morphological forms are written in katakana.
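The relation between lexemes and morphological forms is one-to-many, which can be made visible by grouping hits by lexeme. The following is a sketch under stated assumptions: the list `hits` is invented, the katakana forms “ヤハリ” and “ヤッパリ” are the renderings of yahari and yappari mentioned above, and `forms_by_lexeme` is a hypothetical helper name.

```python
from collections import defaultdict

# Invented (lexeme, morphological form) pairs standing in for search results.
hits = [
    ("矢張り", "ヤハリ"),
    ("矢張り", "ヤッパリ"),
    ("矢張り", "ヤハリ"),
    ("国", "クニ"),
]


def forms_by_lexeme(pairs):
    """Collect the distinct morphological forms attested for each lexeme."""
    grouped = defaultdict(set)
    for lexeme, form in pairs:
        grouped[lexeme].add(form)
    return dict(grouped)


variants = forms_by_lexeme(hits)
```

Searching by lexeme therefore retrieves all variant forms at once, whereas searching by morphological form narrows the results to one of them.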
For any word fitting the description of the “キー (kii ‘key’)”, an entry in the “品詞 (hinshi ‘part of speech’)” column indicates how UniDic analyzes the part of speech of that word. Care must be taken, because UniDic adopts its own part-of-speech system in some cases. For example, a word that falls under the category of “形容動詞 (keiyōdōshi ‘adjectival verb’)” in school grammar is, in UniDic, split into a word stem and an inflecting suffix, analyzed respectively as a “形状詞 (keijōshi ‘adjectival noun’)” and a “助動詞 (jodōshi ‘auxiliary verb’)”. Furthermore, in the case of short unit words, a word such as “朝 (asa ‘morning’)” is annotated with the part-of-speech information “名詞-普通名詞-副詞可能 (‘noun - common noun - adverbial use possible’)”. Annotations such as these, having sub-categories under a main class (in this case the class of “nouns”), indicate possibilities under that class. Thus, for a word such as “朝 (asa ‘morning’)”, both tokens used as nouns and tokens used as adverbs are given the same (generalized) part-of-speech information.
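Since the part-of-speech value is a hyphen-separated hierarchy, it can be split into its levels when grouping or filtering results. This is a minimal sketch; the helper names `split_pos` and `main_class` are hypothetical, and the only value used is the one quoted above.

```python
from typing import List


def split_pos(pos: str) -> List[str]:
    """Split a hierarchical part-of-speech value into its levels."""
    return pos.split("-")


def main_class(pos: str) -> str:
    """Return the top-level class (e.g. 名詞) of a part-of-speech value."""
    return pos.split("-", 1)[0]


levels = split_pos("名詞-普通名詞-副詞可能")
top = main_class("名詞-普通名詞-副詞可能")
```

Filtering on `main_class` alone would treat noun and adverbial uses of “朝” identically, which is consistent with the generalized annotation described above.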
This category appears only in the case of inflecting words, and its values indicate the inflectional class of the word satisfying the description in the “キー (kii ‘key’)”. The inflectional classes of contemporary spoken Japanese are presented simply by grade and consonant, as, for example, in “五段-サ行 (‘pentagrade - s consonant’)”, while for the inflectional classes of Classical Japanese, the qualifier “文語 (‘Classical Japanese’)” is added, as in “文語四段-サ行 (‘Classical Japanese - quadrigrade - s consonant’)”.
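The “文語” qualifier makes it possible to separate Classical from contemporary inflectional classes mechanically. The sketch below assumes that every value has the shape grade, hyphen, consonant row, as in the two examples above; the function name is hypothetical, and values of other shapes are not handled.

```python
from typing import Tuple

CLASSICAL_PREFIX = "文語"


def parse_inflection_class(value: str) -> Tuple[bool, str, str]:
    """Return (is_classical, grade, consonant_row) for an inflectional class."""
    grade, _, row = value.partition("-")
    classical = grade.startswith(CLASSICAL_PREFIX)
    if classical:
        grade = grade[len(CLASSICAL_PREFIX):]
    return classical, grade, row


modern = parse_inflection_class("五段-サ行")
classical = parse_inflection_class("文語四段-サ行")
```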
This category appears only in the case of inflecting words, and for short unit words included in the “キー (kii ‘key’)”, the value displayed indicates the inflectional form of the token in question. Note that, while school grammar divides the volitional conjectural form into the “未然形 (‘irrealis’)” and the auxiliary verb “-u/-yō”, UniDic unifies these into a single “意志推量形 (ishisuiryōkei ‘volitional conjectural inflection’)”.
Errors such as mistaken characters, missing characters (e.g., haplography), superfluous characters (e.g., dittography), and missing voicing diacritics have been corrected in the texts of the corpora. Furthermore, depending on the sub-corpus in question, various revisions have been made, such as the conversion of writing that mixes kanji with katakana into writing that mixes kanji with hiragana, or the spelling out of strings employing odoriji ‘repetition marks’ (such as “〳〵” and “々々”). Columns headed by either “原文文字列 (genbun mojiretsu ‘original text character string’)” or “原文kwic (genbun kwic ‘original text key word in context’)” display text as it appeared prior to revision.
Furthermore, for the 室町時代編Ⅱキリシタン資料 (Muromachi jidai-hen II Kirishitan Shiryō ‘Muromachi Period Series II Christian Materials’), as an exceptional case, the original text is presented in Romanized form.
For any string corresponding to the “キー (kii ‘key’)”, under the “振り仮名 (furigana ‘ruby text’)” column is displayed the text (after correction) of any right-hand ruby text and any right-hand (or alternatively, overhead) inter-linear notes added to the main line of text. Left-hand ruby text does not appear in the search results of Chūnagon.
Values in this column indicate the features (“地の文 [ji no bun ‘narrative, expository’]”, “会話 [kaiwa ‘conversation’]”, etc.) of the section of the main text in which the search result appears.
When the cell in the “本文種別 (honbun shubetsu ‘main text type’)” column is blank, the search result appears in a section forming 地の文 (ji no bun ‘an expository or narrative section’). When the object of search appears in a section that forms a conversation, the annotation is “会話 (kaiwa ‘conversation’)”; when it appears in a section that makes up a quotation from a book, a letter, etc., the annotation is “引用 (in’yō ‘quotation’)”. Furthermore, depending on the sub-corpus, there are cases peculiar to a genre where annotations such as “歌 (uta ‘poem’)” or “詞書 (kotobagaki ‘preface’)” are supplied (for details, refer to the documentation for individual Period Series).
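Because a blank cell carries meaning here, it may help to make the narrative type explicit before counting or grouping results. This is a small sketch; the function name `text_type` and the choice of “地の文” as the explicit label are assumptions made for illustration.

```python
NARRATIVE = "地の文"  # label substituted for a blank cell (an assumption)


def text_type(cell: str) -> str:
    """Return the main text type, treating a blank cell as narrative text."""
    cell = cell.strip()
    return cell if cell else NARRATIVE


labels = [text_type(value) for value in ["", "会話", "引用", "歌"]]
```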
The “話者 (washa ‘speaker’)” column indicates the speaker in a conversation, the author of a quoted text, or the title of a quoted text. In the case of “和歌 (waka ‘Japanese poetry’)”, the name of the composer is indicated.
Restricted to the “明治・大正編 (‘Meiji / Taishō Period Series’)”, the style of the text included in the search result is indicated. Based on the uses of sentence final particles in the texts in question, either the annotation “口語 (kōgo ‘spoken language’)” or the annotation “文語 (bungo ‘written language’)” is added. Distinctions between “口語 (kōgo ‘spoken language’)” and “文語 (bungo ‘written language’)” are in principle made for the entire sample as a unit for the “地の文 (ji no bun ‘expository or narrative sections’)” type, and for other types such as “会話 (kaiwa ‘conversation’)” and “引用 (in’yō ‘quotation’)”, the distinction is made for units as defined by their respective types. In cases where “口語 (kōgo ‘spoken language’)” and “文語 (bungo ‘written language’)” appear mixed together in a single text, a judgement is made as to which style is the principal one, and that style is annotated on the text. In addition to these distinctions, there are also exceptional cases where styles peculiar to particular samples, such as “漢文 (kanbun ‘Chinese text’)”, “韻文 (inbun ‘verse’)”, and “外国語 (gaikokugo ‘foreign language’)” are applied.
Under this column is displayed the annotation for the genre subsuming each source text, such as “物語 (monogatari ‘narrative’)”, “歌集 (kashū ‘poetry collection’)”, “日記 (nikki ‘diary’)”, “説話 (setsuwa ‘folktales’)”, “狂言 (kyōgen ‘comic drama’)”, etc. However, in one part of the “明治・大正編 (‘Meiji / Taishō Period Series’)” sub-corpus, annotations of either “文芸 (bungei ‘literary’)” or “非文芸 (hibungei ‘non-literary’)” are assigned, depending on whether the sample belongs to a field of literature (novel, drama, poetry, etc.) or not (expository text, essay, reportage, etc.).
For the sample in which a given search result appears, an entry in this column indicates the name of the source material from which that sample has been collected.
For the sample in which a given search result appears, an entry in this column indicates the year of publication or year of creation of the source material or the series. Even when the year of production of the source is uncertain, such as is the case for the “竹取物語 (Taketori Monogatari ‘Tale of the Bamboo Cutter’)”, an annotation of the Western calendar year based on an approximate estimate is supplied.
For any sample in which a given search result appears, an entry in this column indicates the series title, volume title, or sample title for the material containing that sample.
Entries under this column give authorial information (the name of the author of a work, of the composer of a poem, etc.) for the sample containing the search result in question. The author’s name is determined from the colophon of the original source text, but there are cases where the name has been changed to an appellation widely recognized in the present age.
For a subset of the authors, links to the website of the “国立国会図書館典拠データ検索・提供サービス (Kokuritsu kokkai toshokan tenkyo deeta kensaku / teikyō sābisu ‘Web NDLAuthorities’)” have been provided for referring to the authorial information provided there.
An entry in this column gives the year of birth of the author of the sample containing the search result in question.
An entry in this column gives the gender of the author of the sample containing the search result in question.
An entry in this column indicates the original source text (source material) which is the basis for the corpus text in which the search result appears.
An entry in this column indicates the number of the page on which the search result appears within the original source text.
An entry in this column indicates the publisher of the original source text containing the search result in question.
As external links, the “日本語歴史コーパス (Nihongo rekishi kōpasu ‘Corpus of Historical Japanese’)” provides “底本リンク (Teihon rinku ‘original source text links’)” and “参考リンク(Sankō rinku ‘reference links’)”.
The “底本リンク (teihon rinku ‘original source text links’)” allow the user to refer to photographic reproductions of the source texts on which the main texts of the corpora are based, or to refer to the corresponding location in the “新編 日本古典文学全集 (Shinpen Nihonkoten bungaku zenshū)” published by Shōgakkan.
The “参考リンク(Sankō rinku ‘reference links’)” lead to locations the user can refer to in cases where the source texts on which the main texts of the corpora are based cannot be made public for reasons of copyright, etc. They lead to locations such as collections containing translations into Present-day Japanese, or to photographic reproductions from editions different from the original source texts.