Here we publicly release four sets of data: under the title "Corpus of Historical Japanese Meiji-Taishō Period Series I, Magazines," data for magazines from the Meiji and Taishō periods; under the title "Corpus of Historical Japanese Meiji-Taishō Period Series II, Textbooks," data for government-designated national language textbooks used in elementary and higher elementary schools; under the title "Corpus of Historical Japanese Meiji-Taishō Period Series III, Early Meiji Spoken Language Materials," the principal data for spoken language published in the early Meiji period; and under the title "Corpus of Historical Japanese Meiji Era / Taishō Era Series IV Modern Novels," data for 21 novels (by 21 authors) from the mid-Meiji period through the Taishō period.
※ Long Unit Word information has not been annotated for this data.
The "Corpus of Historical Japanese Meiji-Taishō Period Series I, Magazines" covers magazines representative of the Meiji and Taishō periods, taking in the material published in each of a predetermined series of years. Because its source magazines span a great variety of article genres and a wide range of authors, the corpus is designed to enable a broad survey of the written language, such as the shift from the classical style that was mainstream in the early Meiji period to the establishment of Contemporary Japanese, and to capture diachronic change in the modern language.
The titles and publication dates (and volume numbers) of the magazines included in the Meiji Era / Taishō Era Series I: Magazines corpus are as follows:
In principle, for each magazine the entirety of the text is taken as the object for annotation, but the following textual elements are included in the coverage.
Furthermore, the following items are excluded from the range of textual elements designated for coverage in the corpus.
In addition to the Short Unit Word morphological information rendered searchable in the text of this corpus, information on the genre, title of the magazine, article, and author has been annotated and can be accessed in the search results of the corpus search application Chūnagon.
Please read the following abstract for this corpus before use:
Please see the following pages for details on the specifications of the four sub-corpora that comprise this corpus:
The Short Unit Word morphological information for this corpus has been annotated according to the distinction between Literary and Colloquial Japanese, set out in the Annotation Guidelines listed below:
Please consult the Annotation Guidelines before use.
Images of the original texts of the magazines can be accessed through the search results from Chūnagon, so that the text of the corpus can be compared to the original documents during use. Please note that images of the original texts for Jogaku Zasshi, Jogaku Sekai, and Fujin Kurabu are not available.
Presentations of research results using this corpus must include a citation taking the general form of the example below (with appropriate modifications depending on the version and the date of access):
* As long as either the version or the date of access is clearly cited, the other can be omitted, as below:
Users will need to access the Corpus of Historical Japanese through the online search engine Chūnagon. Completion of a Users Licensing Agreement is required.
Please refer to the following: The Corpus of Historical Japanese: How to apply
* Titles are current with the time of development.
The corpus compilation was supported by the "Design for a Diachronic Corpus" (2009-2016) project and the "Construction of Diachronic Corpora and New Developments in Research on the History of Japanese" (2016-) project, and JSPS KAKENHI Grant Number JP15H01883 (2015-2019).
In 1903, amendments to the Elementary School Order established a national textbook system that limited the textbooks used in elementary schools to those for which the Ministry of Education owned the copyright, and from 1904 national textbooks were used in Japanese language classes. The national textbooks for Japanese language were compiled with the aim of accomplishing, through national language education, both the completion of a written style unifying writing and speech and the establishment of a standard spoken language. These materials contributed greatly to the establishment and spread of the standard language used in modern Japan. The "Meiji and Taisho Edition II Textbooks" corpus contains the nationally prescribed Japanese language textbooks used at elementary schools (Period 1 to Period 6) and the nationally prescribed Japanese language textbook used at higher elementary schools (Period 1). The period (the first year of use) and the name of each textbook included are as follows:
In principle, for each textbook the entirety of the text is taken as the object for annotation, but the following textual elements are included in the coverage.
Furthermore, the following items are excluded from the range of textual elements designated for coverage in the corpus.
The data for the elementary school textbooks in this corpus are based on the body text data used in creating the "Kokutei-Yōgo-Sōran CD-ROM Edition" (National Institute for Japanese Language and Linguistics, 1997). The data for the higher elementary school textbooks are based on the separately created Morphologically Annotated Corpus of "Koutou-Shōgaku Tokuhon" (higher elementary school readers); see Asuko Kondo, Toshinobu Ogiso, and Fumiko Kato (2010), 'The Morphologically Annotated Corpus of "Koutou-Shōgaku Tokuhon"', in 'The Collected Papers from the "Information Processing Society of Japan Symposium" (Jinmonkon 2010 Collected Papers)', 2010:15, pp. 189-194. The present corpus unifies these two sets of data and reconstructs the information in line with the design of the "Corpus of Historical Japanese".
In addition to the Short Unit Word morphological information rendered searchable in the text of this corpus, information on the period and grade has been annotated and can be accessed in the search results of the corpus search application Chūnagon. Please read the following abstract for this corpus before use:
The Short Unit Word morphological information for this corpus has been annotated according to the distinction between Literary and Colloquial Japanese, set out in the Annotation Guidelines listed below:
Please consult the Annotation Guidelines before use.
Images of the original texts of the national textbooks can be accessed through the search results from Chūnagon, so that the text of the corpus can be compared to the original documents during use.
Presentations of research results using this corpus must include a citation taking the general form of the example below (with appropriate modifications depending on the version and the date of access):
* As long as either the version or the date of access is clearly cited, the other can be omitted, as below:
Users will need to access the Corpus of Historical Japanese through the online search engine Chūnagon. Completion of a Users Licensing Agreement is required.
Please refer to the following: The Corpus of Historical Japanese: How to apply
* Titles are current with the time of development.
The corpus compilation was supported by the "Construction of Diachronic Corpora and New Developments in Research on the History of Japanese" (2016-) project.
The "Corpus of Historical Japanese Meiji-Taishō Period Series III, Early Meiji Spoken Language Materials" is a corpus collecting the principal materials for spoken language published in the early Meiji period. The materials collected are considered crucial for understanding the spoken language of the time and the colloquial writing style (genbun'icchi-tai, the writing style unifying written and spoken Japanese) that spread and became established in the Meiji and Taishō periods.
The titles and publication dates of the materials included in this corpus are as follows:
The entirety of the text for each work has been taken as the object for annotation, but the following textual elements are included in the coverage for the corpus.
Furthermore, the following items are excluded from the range of textual elements designated for coverage in the corpus.
Please read the following overview of this corpus before use:
The Short Unit Word morphological information for this corpus has been annotated according to the distinction between Literary and Colloquial Japanese, set out in the Annotation Guidelines listed below:
Please consult the Annotation Guidelines before use.
Presentations of research results using this corpus must include a citation taking the general form of the example below (with appropriate modifications depending on the version and the date of access):
* As long as either the version or the date of access is clearly cited, the other can be omitted, as below:
Users will need to access the Corpus of Historical Japanese through the online search engine Chūnagon. Completion of a Users Licensing Agreement is required.
Please refer to the following: The Corpus of Historical Japanese: How to apply
* Titles are current with the time of development.
The corpus compilation was supported by the "Design for a Diachronic Corpus" (2009-2016) project and the "Construction of Diachronic Corpora and New Developments in Research on the History of Japanese" (2016-) project, JSPS KAKENHI Grant Number JP15H01883 (2015-2019), and JSPS KAKENHI Grant Number JP17K02786 (2017-2020).
Modern novels are literary works written in a new style of language, born out of the influence of "civilization and enlightenment," which incorporated the culture and thought of the Western world. They are indispensable resources for research into the unification of the written and spoken language, a language change that is one of the symbols of Japanese modernity. Furthermore, they are crucial resources with a history and record of achievement as objects of research on the modern language in every field, beginning with vocabulary, style, and orthography.
The following 21 works are the novels that comprise this corpus:
In principle the entirety of the text for each volume of a work is taken as the object of annotation, but the following textual elements are excluded from the coverage of texts.
Furthermore, the following items are excluded from the range of textual elements designated for coverage in the corpus.
For details on the coverage of the texts, please refer to the following link: Overview; Meiji Era / Taishō Era Series IV: Modern Novels from the Corpus of Historical Japanese (CHJ)
Please read the following overview of this corpus before use:
The Short Unit Word morphological information for this corpus has been annotated according to the distinction between Literary and Colloquial Japanese, set out in the Annotation Guidelines listed below:
Please consult the Annotation Guidelines before use.
For the 10 works whose original-text images have been made public, the images in the National Diet Library Digital Collections can be consulted while using the corpus by accessing them through Chūnagon search results. (Please note that images cannot be consulted for the remaining 11 of the 21 works in this corpus, because their original-text images have not been made public. Please also be aware that the publicly available images include incomplete sections, as well as parts where marginalia, tearing, and other damage make viewing difficult.)
Presentations of research results using this corpus must include a citation taking the general form of the example below (with appropriate modifications depending on the version and the date of access):
* As long as either the version or the date of access is clearly cited, the other can be omitted, as below:
Users will need to access the Corpus of Historical Japanese through the online search engine Chūnagon. Completion of a Users Licensing Agreement is required.
Please refer to the following: The Corpus of Historical Japanese: How to apply
* Titles are current with the time of development.
The corpus compilation was supported by the "Construction of Diachronic Corpora and New Developments in Research on the History of Japanese" (2016-) project.