Chūnagon Search interface

This page explains the columns of the search results shown in the online search engine “中納言 (Chūnagon)” and in the search-result files downloaded from it, with particular focus on items that were established specifically for the Corpus of Historical Japanese and are crucial to its use.

・コーパス情報 ‘corpus information’
・形態論情報 ‘morphological information’
・本文情報 ‘main text information’
・作品情報 ‘source information’
・作者情報 ‘auctorial information’
・底本情報 ‘original source text information’
・外部リンク ‘external links’

コーパス情報 (kōpasu jōhō ‘corpus information’)

時代名 (jidaimei ‘names of eras’)

Separately from specifications such as “～時代編 (~jidaihen ‘~Period Series’)” that form part of the names for each of the corpora, we also provide a categorization based on historical periods and era names, presented in the table below:

1	2	3	4	5	6	7	8
奈良 Nara	平安 Heian	鎌倉 Kamakura	室町 Muromachi	江戸 Edo	明治 Meiji	大正 Taishō	昭和 Shōwa

For example, textbook data from the Meiji Era through the Shōwa Era are all included in the “明治・大正編Ⅱ教科書 (‘Meiji Era / Taishō Era Series II: Textbooks’)” corpus, but each sub-period carries its own era annotation: Period I (1904) and Period II (1910) are annotated “6明治 (‘6 Meiji’)”, Period III (1918) is annotated “7大正 (‘7 Taishō’)”, and Period IV (1933), Period V (1941), and Period VI (1947) are annotated “8昭和 (‘8 Shōwa’)”. This makes it possible to sort and filter data by historical period.
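As a minimal sketch of such sorting, assuming the era annotation is available as a string such as “6明治” (a one-digit number followed by the era name, as in the table above), the leading number can serve as a sort key. The function name and the sample list are hypothetical and not part of Chūnagon.

```python
# Era numbers and names as listed in the table above.
ERA_NAMES = {
    1: "奈良", 2: "平安", 3: "鎌倉", 4: "室町",
    5: "江戸", 6: "明治", 7: "大正", 8: "昭和",
}

def era_sort_key(label: str) -> int:
    """Return the leading era number of a label such as '6明治'."""
    code = int(label[0])
    # Reject labels whose name does not match the table (assumed label format).
    if ERA_NAMES.get(code) != label[1:]:
        raise ValueError(f"unexpected era label: {label!r}")
    return code

labels = ["8昭和", "6明治", "7大正", "6明治"]  # hypothetical annotations
print(sorted(labels, key=era_sort_key))
# → ['6明治', '6明治', '7大正', '8昭和']
```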

サンプルID (sanpuru ID ‘sample ID’)

Each data sample (volume, chapter, article, etc.) of a given corpus is individually annotated with a 15-place (characters and digits) ID code string specifying the historical period and the textual source of the data. The respective places in the code string express the following types of information:

places 1-2	place 3	places 4-5	places 6-9	place 10	places 11-15
Era (conforming to the “~Series” specification)	Genre	ID of the textual source	Time of production	Demarcation symbol	Serial number specifying position in the text

For example, in the ID code “60M明六1874_01001”, the first two places “60” stand for “Meiji / Taishō Period Series”, “M” stands for “Magazine”, “明六” stands for “Meiroku Zasshi”, “1874” stands for the year 1874, the underscore “_” is the demarcation symbol, and “01001” indicates the first article of the first volume.
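The fixed-width layout can be illustrated with a short sketch that slices an ID by position. The class and function names are hypothetical; the sketch assumes every ID is exactly 15 characters long and that each kanji counts as one place, as in the example ID.

```python
from typing import NamedTuple

class SampleId(NamedTuple):
    era: str        # places 1-2
    genre: str      # place 3
    source: str     # places 4-5
    year: str       # places 6-9
    separator: str  # place 10
    serial: str     # places 11-15

def parse_sample_id(sample_id: str) -> SampleId:
    """Split a 15-place sample ID into its positional fields."""
    if len(sample_id) != 15:
        raise ValueError(f"expected 15 places, got {len(sample_id)}")
    return SampleId(
        era=sample_id[0:2],
        genre=sample_id[2],
        source=sample_id[3:5],
        year=sample_id[5:9],
        separator=sample_id[9],
        serial=sample_id[10:15],
    )

print(parse_sample_id("60M明六1874_01001"))
```

For the example ID, this yields the era “60”, genre “M”, source “明六”, year “1874”, and serial “01001”.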

The sample ID codes for each of the sub-corpora are annotated according to the following standards:

Sub-corpus	places 1-2	place 3	places 4-5	places 6-9	place 10	places 11-15
Sub-corpus	Era (conforming to the “~Series” specification)	Genre	ID of the textual source	Time of production	-	Serial number specifying position in the text
Nara Period Series I: Man'yōshū	10 (Nara)	-	万葉 (Man’yō)	0759	-	5-digit volume number
Nara Period Series II: Senmyō	10 (Nara)	-	宣命 (Senmyō)	0797	-	2-digit Shoku-Nihongi + 3-digit edict serial number
Nara Period Series III: Norito	10 (Nara)	-	祝詞 (Norito)	0927	-	5-digit Norito serial number
Heian Period Series I: Kana literature	20 (Heian)	-	2-character source title acronym	Year of production of each text	-	5-digit serial number (For the Ōkagami only, 2-digit section and 3-digit file serial numbers)
Heian Period Series II: Kunten materials	20 (Heian)	K (Kunten materials)	西金 (Sai-kon)	0803	-	2-digit volume number + 3-digit chapter number
Heian Period Series III: Kanbun-based materials	20 (Heian)	L (Kanbun-based materials)	法華 (Hokke) / 尾張 (Owari) / 高山 (Kōzan)	Year of production of each text	-	Hokke Hyakuza Kikigaki-shō and Kōzanji-bon Ko-ōrai: 5-digit file serial number, Owari-no-kuni Gebumi: 3-digit serial number for paratext + 2-digit article serial number
Kamakura Period Series I: Folktales and Essays	30 (Kamakura)	-	2-character source title acronym	Year of production of each text	-	2-digit volume number + 3-digit file serial number; alternatively, 5-digit file serial number
Kamakura Period Series II: Diaries and Travel Literature	30 (Kamakura)	-	2-character source title acronym	Year of production of each text	-	2-digit volume number + 3-digit file serial number; alternatively, 5-digit file serial number
Kamakura Period Series III: Military Chronicles	30 (Kamakura)	-	2-character source title acronym	Year of production of each text	-	2-digit volume number + 3-digit file serial number
Muromachi Period Series I: Kyōgen	40 (Muromachi)	-	虎明 (Toraakira-bon)	1642	-	2-digit volume number + 3-digit file serial number
Muromachi Period Series II: Christian Materials (Kirishitan Shiryō)	40 (Muromachi)	-	2 characters from source title	Year of production of each text	-	2-digit volume number + 3-digit file serial number; alternatively, 5-digit file serial number
Muromachi Period Series III: Shōmono	40 (Muromachi)	-	抄物 (Shōmono)	Year of production of each text	-	2-digit volume number + 3-digit file serial number
Edo Period Series I: Share-bon	52 (Edo / Late Modern)	-	洒落 (Share-bon)	Year of production of each text	-	2-digit region ID + 3-digit file serial number
Edo Period Series II: Ninjō-bon	53 (Edo / Late Modern)	-	人情 (Ninjō-bon)	Year of production of each text	-	2-digit source serial number + 3-digit volume serial number
Edo Period Series III: Chikamatsu-Jōruri	51 (Edo / Early Modern)	-	近松 (Chikamatsu-Jōruri)	Year of production of each text	-	2-digit source serial number + 3-digit volume serial number
Edo Period Series IV: Essays and Travel Literature	51 (Edo / Early Modern)	-	芭蕉 (Bashō)	Year of production of each text	-	2-digit source serial number (in order of year of production) + 3-digit serial numbers for items produced in the same year
Edo Period Series V: Commentaries	52 (Edo / Late Modern)	-	遠鏡 (Tōkagami)	1793	-	3-digit serial number for paratext + 2-digit article serial number
Edo Period Series VI: Kamigata Eiri Kyōgen-bon	51 (Edo / Early Modern)	-	2-character source title acronym	Year of production of each text	-	3-digit serial number for paratext + 1-digit part serial number + 1-digit article serial number
Meiji Era / Taishō Era Series I: Magazines	60 (Meiji / Taishō)	M (magazine)	2-character source title acronym	Year of production of each text	-	2-digit volume serial number + 3-digit article serial number
Meiji Era / Taishō Era Series II: Textbooks	60 (Meiji / Taishō)	T (textbook)	2-character source title acronym	Year of production of each text	-	1-digit period number + 1-digit grade number + 1-digit volume number + 2-digit article serial number
Meiji Era / Taishō Era Series III: Early Meiji Spoken Language Materials	60 (Meiji / Taishō)	C (Colloquial)	口語 (Colloquial)	Year of production of each text	-	2-digit material serial number + 1-digit material-internal edition number + 2-digit material- and edition-internal serial number
Meiji Era / Taishō Era Series IV: Modern Novels	60 (Meiji / Taishō)	N (novel)	2-character source title acronym	Year of production of each text	-	1-digit book number + 1-digit part number + 3-digit chapter serial number
Meiji Era / Taishō Era Series V: Newspapers	60 (Meiji / Taishō)	P (paper)	読売 (Yomiuri)	Year of each volume	-	1-digit month number (in Base32) + 1-digit day number (in Base32) + 3-digit article serial number
Meiji Era / Taishō Era Series VI: Rakugo 78 rpm Discs	60 (Meiji / Taishō)	R (rakugo)	2-character abbreviation for rakugo	Year of production of each text	-	2-digit region number + 3-digit intra-region file serial number
Waka-shū Series	20 (Heian)	W (waka-shū)	2-character source title acronym	Year of production of each text	-	2-digit volume number + 3-digit file serial number
Waka-shū Series	30 (Kamakura)	W (waka-shū)	2-character source title acronym	Year of production of each text	-	2-digit volume number + 3-digit file serial number
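The serial-number layouts in the table above differ by sub-corpus, so places 11-15 have to be split according to the widths given for each one. The following is a minimal sketch for two of the layouts. The dictionary keys, the function name, and the textbook serial “12305” are hypothetical, and the sketch handles decimal digits only, so it does not cover the Base32 month and day digits used for newspapers.

```python
from typing import List, Sequence

# Field widths for places 11-15, taken from the table above.
# The keys are illustrative labels, not official identifiers.
LAYOUTS = {
    "magazines": (2, 3),        # volume + article
    "textbooks": (1, 1, 1, 2),  # period + grade + volume + article
}

def split_serial(serial: str, widths: Sequence[int]) -> List[int]:
    """Split a 5-place serial number into integer fields of the given widths."""
    if sum(widths) != len(serial):
        raise ValueError("widths do not match the length of the serial number")
    parts = []
    pos = 0
    for width in widths:
        parts.append(int(serial[pos:pos + width]))
        pos += width
    return parts

print(split_serial("01001", LAYOUTS["magazines"]))  # → [1, 1]
print(split_serial("12305", LAYOUTS["textbooks"]))  # → [1, 2, 3, 5]
```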

開始位置 (kaishi ichi ‘starting location’)

The “開始位置 (kaishi ichi ‘starting location’)” is an ID that indicates the position of the first character of any word that satisfies the description of the “キー (kii ‘key’)”. Each character in a given text is annotated with a number, forming a “+10” arithmetic sequence. When searching by location, it is possible to uniquely identify an example by combining this number with the ID of the sample.

連番 (renban ‘serial number’)

The “連番 (renban ‘serial number’)” is an ID that indicates the location of a word that corresponds to the “キー (kii ‘key’)”. In contrast to “開始位置 (kaishi ichi ‘starting locations’)”, serial numbers are annotated onto short unit words or long unit words to form a “+10” arithmetic sequence, without reference to the number of characters in the “key”.
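The difference between the two numbering schemes can be sketched as follows. This is only an illustration: the source states that both sequences increase in steps of 10, but the first value of each sequence (assumed here to be 10) and the word segmentation of the sample phrase are assumptions, as is the function name.

```python
from typing import List, Tuple

def annotate(words: List[str], first: int = 10, step: int = 10) -> List[Tuple[str, int, int]]:
    """Return (word, start position, serial number) for each word.

    Start positions advance by `step` per character; serial numbers
    advance by `step` per word. `first` is an assumed initial value.
    """
    rows = []
    char_index = 0
    for word_index, word in enumerate(words):
        start_position = first + step * char_index
        serial_number = first + step * word_index
        rows.append((word, start_position, serial_number))
        char_index += len(word)
    return rows

for row in annotate(["いづれ", "の", "御時", "に", "か"]):
    print(row)
# ('いづれ', 10, 10)
# ('の', 40, 20)
# ('御時', 50, 30)
# ('に', 70, 40)
# ('か', 80, 50)
```

Under these assumptions, the start position of a word depends on the length of every preceding word, while the serial number depends only on how many words precede it.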

コア (koa ‘core’)

The value in the “コア (koa ‘core’)” column indicates whether the sample containing a search result is part of the core data (the set of data for which each text has received full hand correction) or part of the non-core data (data including parts that are either partially hand corrected or left untouched after morphological parsing). A value of “1” indicates core data, while a value of “0” indicates non-core data.

主本文 (shuhonbun ‘main text’)

Values in this category distinguish main text (the principal reading) from alternative text (alternative readings). Some character strings carry two or more possible readings or meanings, as with kakekotoba ‘pivot words’; records that have been assigned multiple sets of morphological information are therefore annotated with a number indicating their status. A value of “1” indicates main text, while a value of “0” indicates alternative text.
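As a rough sketch of how the core and main-text flags might be used together, the following keeps only rows that are both core data and main text. It assumes the downloaded results are tab-separated with a header row, and that the relevant headers are “キー”, “コア”, and “主本文”; the actual file layout and header names should be checked against a real download. The sample rows are invented for illustration.

```python
import csv
import io
from typing import Dict, List

# Invented sample data in an assumed tab-separated layout.
SAMPLE = "キー\tコア\t主本文\n松\t1\t1\n待つ\t1\t0\n花\t0\t1\n"

def core_main_rows(tsv_text: str) -> List[Dict[str, str]]:
    """Keep rows flagged as core data ('1') and as main text ('1')."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row for row in reader
        if row["コア"] == "1" and row["主本文"] == "1"
    ]

print([row["キー"] for row in core_main_rows(SAMPLE)])
# → ['松']
```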

多重化種別 (tajūka shubetsu ‘types of multiple analyses’)

For records that have been assigned multiple sets of morphological information, the “多重化種別 (tajūka shubetsu ‘types of multiple analyses’)” column indicates the reason for the multiple assignment, with values such as “掛詞 (kakekotoba ‘pivot word’)” and “振り仮名 (furigana ‘ruby text’)”.

形態論情報 (keitairon jōhō ‘morphological information’)

語彙素 (goiso ‘lexeme’)

Entries in the “語彙素 (goiso ‘lexeme’)” column show the written form of the dictionary headword corresponding to a word that satisfies the description of the “key”. A lexeme is equivalent to a dictionary headword and unifies all the variations in form (morphological form, inflection, orthographical form, etc.) that a given word may take. Common Native Japanese words (wago) and Sino-Japanese words (kango) are written in kanji / hiragana, while loan words, personal names, place names, and the like are written in katakana (for example, “国 [kuni ‘country’]”, “国家 [kokka ‘nation’]”, “カントリー [kantorii ‘country’]”, “日本 [nihon ‘Japan’]”).

語彙素読み (goiso yomi ‘lexeme reading’)

An entry in the “語彙素読み (goiso yomi ‘lexeme reading’)” column indicates the reading of the lexeme for any word fitting the description of the “key” (see the entry for “lexeme”). It is written in katakana (examples: “クニ [kuni ‘country’]”, “コッカ [kokka ‘nation’]”, “カントリー [kantorii ‘country’]”, “ニホン [nihon ‘Japan’]”).

語形 (gokei ‘morphological form’)

An entry in the “語形 (gokei ‘morphological form’)” column indicates the sound shape of any word fitting the description of the “キー (kii ‘key’)”. A morphological form may be one of the variant forms unified under a given lexeme (for example, the forms yahari and yappari, unified under the lexeme “矢張り [‘as expected’]”), or one of the various inflection types unified under a given lexeme (for example, “yomu [pentagrade - m consonant]”, “yomu [Classical Japanese quadrigrade - m consonant]”, and “yomeru [lower monograde - m consonant; potential verb form]”, all unified under the lexeme “読む (yomu ‘read’)”). Morphological forms are written in katakana.
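Because a lexeme unifies several morphological forms, search results can be grouped by lexeme to see which forms occur. The sketch below shows one way to do this; the function name and the (lexeme, form) pairs are hypothetical, with the katakana forms chosen to mirror the yahari / yappari example above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def forms_by_lexeme(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect the distinct morphological forms attested for each lexeme."""
    grouped = defaultdict(set)
    for lexeme, form in rows:
        grouped[lexeme].add(form)
    # Sort forms so that the result is deterministic.
    return {lexeme: sorted(forms) for lexeme, forms in grouped.items()}

# Invented (語彙素, 語形) pairs for illustration.
rows = [
    ("矢張り", "ヤハリ"),
    ("矢張り", "ヤッパリ"),
    ("矢張り", "ヤハリ"),
    ("読む", "ヨム"),
]
print(forms_by_lexeme(rows))
```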

品詞 (hinshi ‘part of speech’)

For any word fitting the description of the “キー (kii ‘key’)”, an entry in the “品詞 (hinshi ‘part of speech’)” column indicates how UniDic analyzes the part of speech of that word. Note that UniDic adopts its own part-of-speech system in some cases. For example, a word that school grammar treats as a “形容動詞 (keiyōdōshi ‘adjectival verb’)” is split in UniDic into a word stem and an inflecting suffix, analyzed respectively as a “形状詞 (keijōshi ‘adjectival noun’)” and a “助動詞 (jodōshi ‘auxiliary verb’)”. Furthermore, among short unit words, a word such as “朝 (asa ‘morning’)” is annotated with the part-of-speech information “名詞-普通名詞-副詞可能 (‘noun - common noun - adverbial use possible’)”. Annotations of this kind, with sub-categories under a main class (here, the class of nouns), indicate the possible uses within that class. Thus, for “朝 (asa ‘morning’)”, tokens used as nouns and tokens used as adverbs receive the same (generalized) part-of-speech information.
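Since the part-of-speech value is hierarchical, it can be split into levels to filter by main class. This is a minimal sketch that assumes the levels are joined with an ASCII hyphen, as in the example “名詞-普通名詞-副詞可能”; the function names are hypothetical.

```python
from typing import List

def pos_levels(pos: str) -> List[str]:
    """Split a hierarchical part-of-speech value into its levels."""
    return pos.split("-")

def has_main_class(pos: str, main_class: str) -> bool:
    """Check whether the top-level class of a part-of-speech value matches."""
    return pos_levels(pos)[0] == main_class

print(pos_levels("名詞-普通名詞-副詞可能"))
# → ['名詞', '普通名詞', '副詞可能']
print(has_main_class("名詞-普通名詞-副詞可能", "名詞"))
# → True
```

Filtering on the main class “名詞” in this way returns tokens of “朝” regardless of whether a particular token is used as a noun or as an adverb, since both uses share one annotation.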

活用型 (katsuyōgata ‘inflectional class’)

This category appears only in the case of inflecting words, and its values indicate the inflectional class of the word satisfying the description in the “キー (kii ‘key’)”. The inflectional classes of contemporary spoken Japanese are presented simply by grade and consonant, as, for example, in “五段-サ行 (‘pentagrade - s consonant’)”, while for the inflectional classes of Classical Japanese, the qualifier “文語 (‘Classical Japanese’)” is added, as in “文語四段-サ行 (‘Classical Japanese - quadrigrade - s consonant’)”.
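The “文語” qualifier makes it possible to separate Classical Japanese inflectional classes from contemporary ones mechanically. The sketch below assumes values shaped like the two examples above (a grade, optionally prefixed with “文語”, then an ASCII hyphen and a consonant row); the function name and the returned dictionary keys are hypothetical.

```python
from typing import Dict, Union

def parse_inflection_class(value: str) -> Dict[str, Union[str, bool]]:
    """Split an inflectional class into its grade and consonant row,
    and flag whether it is marked as Classical Japanese (文語)."""
    grade, _, row = value.partition("-")
    classical = grade.startswith("文語")
    if classical:
        grade = grade[len("文語"):]
    return {"classical": classical, "grade": grade, "row": row}

print(parse_inflection_class("五段-サ行"))
# → {'classical': False, 'grade': '五段', 'row': 'サ行'}
print(parse_inflection_class("文語四段-サ行"))
# → {'classical': True, 'grade': '四段', 'row': 'サ行'}
```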

活用形 (katsuyōkei ‘inflectional form’)

This category appears only in the case of inflecting words; for short unit words included in the “キー (kii ‘key’)”, the value displayed indicates the inflectional form of the token in question. Note that, while school grammar divides the volitional conjectural form into “未然形 (‘irrealis’)” plus the auxiliary verb “-u/-yō”, UniDic unifies it into a single “意志推量形 (ishisuiryōkei ‘volitional conjectural inflection’)”.

原文文字列・原文kwic (‘original text character string / original text KWIC’)

Errors such as mistaken characters, missing characters (e.g., haplography), superfluous characters (e.g., dittography), and missing voicing diacritics have been corrected in the texts of the corpora. Furthermore, depending on the sub-corpus in question, various revisions have been made, such as converting writing that mixes kanji with katakana into writing that mixes kanji with hiragana, or spelling out strings that employ odoriji ‘repetition marks’ (such as “〳〵” and “々々”). Columns headed by either “原文文字列 (genbun mojiretsu ‘original text character string’)” or “原文kwic (genbun kwic ‘original text key word in context’)” display the text as it appeared prior to revision.

Furthermore, for the 室町時代編Ⅱキリシタン資料 (Muromachi jidai-hen II Kirishitan Shiryō ‘Muromachi Period Series II Christian Materials’), as an exceptional case, the original text is presented in Romanized form.

振り仮名 (furigana ‘ruby text’)

For any string corresponding to the “キー (kii ‘key’)”, under the “振り仮名 (furigana ‘ruby text’)” column is displayed the text (after correction) of any right-hand ruby text and any right-hand (or alternatively, overhead) inter-linear notes added to the main line of text. Left-hand ruby text does not appear in the search results of Chūnagon.

本文情報 (honbun jōhō ‘main text information’)

本文種別 (honbun shubetsu ‘main text type’)

Values in this column indicate the features (“地の文 [ji no bun ‘narrative, expository’]”, “会話 [kaiwa ‘conversation’]”, etc.) of the section of the main text in which the search result appears.

When the cell in the “本文種別 (honbun shubetsu ‘main text type’)” column is blank, the search result appears in a section forming 地の文 (ji no bun ‘an expository or narrative section’). When the search result appears in a section that forms a conversation, the annotation is “会話 (kaiwa ‘conversation’)”; when it appears in a section that makes up a quotation from a book, a letter, etc., the annotation is “引用 (in’yō ‘quotation’)”. Furthermore, depending on the sub-corpus, there are genre-specific cases where annotations such as “歌 (uta ‘poem’)” or “詞書 (kotobagaki ‘preface’)” are supplied (for details, refer to the documentation for the individual Period Series).
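Because a blank cell carries meaning here, it can help to make that default explicit before counting or filtering by text type. The following sketch does so; the function name is hypothetical, and treating whitespace-only cells as blank is an assumption.

```python
def text_type(cell: str) -> str:
    """Return the main text type, treating a blank cell as 地の文."""
    value = cell.strip()
    return value if value else "地の文"

print([text_type(cell) for cell in ["", "会話", "引用", "歌"]])
# → ['地の文', '会話', '引用', '歌']
```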

話者 (washa ‘speaker’)

The “話者 (washa ‘speaker’)” column indicates the speaker in a conversation, the author of a quoted text, or the title of a quoted text. In the case of “和歌 (waka ‘Japanese poetry’)”, the name of the composer is indicated.

文体 (buntai ‘style’)

This column applies only to the “明治・大正編 (‘Meiji / Taishō Period Series’)” and indicates the style of the text containing the search result. Based on the use of sentence-final particles in the texts in question, either the annotation “口語 (kōgo ‘spoken language’)” or the annotation “文語 (bungo ‘written language’)” is added. For the “地の文 (ji no bun ‘expository or narrative sections’)” type, the distinction between “口語 (kōgo ‘spoken language’)” and “文語 (bungo ‘written language’)” is in principle made for the entire sample as a unit; for other types, such as “会話 (kaiwa ‘conversation’)” and “引用 (in’yō ‘quotation’)”, the distinction is made for units as defined by their respective types. In cases where “口語 (kōgo ‘spoken language’)” and “文語 (bungo ‘written language’)” appear mixed together in a single text, a judgement is made as to which style is the principal one, and that style is annotated on the text. In addition to these distinctions, there are also exceptional cases where styles peculiar to particular samples, such as “漢文 (kanbun ‘Chinese text’)”, “韻文 (inbun ‘verse’)”, and “外国語 (gaikokugo ‘foreign language’)”, are applied.

作品情報 (sakuhin jōhō ‘source information’)

ジャンル (janru ‘genre’)

Under this column is displayed the annotation for the genre subsuming each source text, such as “物語 (monogatari ‘narrative’)”, “歌集 (kashū ‘poetry collection’)”, “日記 (nikki ‘diary’)”, “説話 (setsuwa ‘folktales’)”, “狂言 (kyōgen ‘comic drama’)”, etc. However, in one part of the “明治・大正編 (‘Meiji / Taishō Period Series’)” sub-corpus, annotations of either “文芸 (bungei ‘literary’)” or “非文芸 (hibungei ‘non-literary’)” are assigned, depending on whether the sample belongs to a field of literature (novel, drama, poetry, etc.) or not (expository text, essay, reportage, etc.).

作品名 (sakuhinmei ‘title’)

For the sample in which a given search result appears, an entry in this column indicates the name of the source material from which that sample has been collected.

成立年 (seiritsunen ‘date of production’)

For the sample in which a given search result appears, an entry in this column indicates the year of publication or year of creation of the source material or the series. Even when the year of production of the source is uncertain, such as is the case for the “竹取物語 (Taketori Monogatari ‘Tale of the Bamboo Cutter’)”, an annotation of the Western calendar year based on an approximate estimate is supplied.

巻名等 (kanmei tō ‘title of volume, etc.’)

For any sample in which a given search result appears, an entry in this column indicates the series title, volume title, or sample title for the material containing that sample.

作者情報 (sakusha jōhō ‘auctorial information’)

作者 (sakusha ‘author’)

Entries under this column give auctorial information (the name of the author of a work, of the composer of a poem, etc.) for the sample containing the search result in question. The ascertainment of an author’s name is based on the colophon of the original source text, but there are cases where the name has been changed to an appellation widely recognized in the present age.

For a subset of the authors, links to the website of the “国立国会図書館典拠データ検索・提供サービス (Kokuritsu kokkai toshokan tenkyo deeta kensaku / teikyō sābisu ‘Web NDLAuthorities’)” have been provided for referring to the auctorial information provided there.

生年 (seinen ‘year of birth’)

An entry in this column gives the year of birth of the author of the sample containing the search result in question.

性別 (seibetsu ‘gender’)

An entry in this column gives the gender of the author of the sample containing the search result in question.

底本情報 (teihon jōhō ‘original source text information’)

底本 (teihon ‘original source text’)

An entry in this column indicates the original source text (source material) which is the basis for the corpus text in which the search result appears.

ページ番号 (peeji bangō ‘page number’)

An entry in this column indicates the number of the page on which the search result appears within the original source text.

出版社 (shuppansha ‘publishing company’)

An entry in this column indicates the publisher of the original source text containing the search result in question.

外部リンク (gaibu rinku ‘external links’)

As external links, the “日本語歴史コーパス (Nihongo rekishi kōpasu ‘Corpus of Historical Japanese’)” provides “底本リンク (teihon rinku ‘original source text links’)” and “参考リンク (sankō rinku ‘reference links’)”.

The “底本リンク (teihon rinku ‘original source text links’)” allow the user to refer to photographic reproductions of the source texts on which the main texts of the corpora are based, or to refer to the corresponding location in the “新編　日本古典文学全集 (Shinpen Nihon koten bungaku zenshū)” published by Shōgakkan.

The “参考リンク (sankō rinku ‘reference links’)” lead to locations the user can refer to in cases where the source texts on which the main texts of the corpora are based cannot be made public for reasons of copyright, etc. They lead to locations such as collections containing translations into Present-day Japanese, or to photographic reproductions from editions different from the original source texts.
