How to Apply
This section gives information about the usage and subscription process for each corpus. Please see each individual page for more detailed instructions.
Corpus Reference Application 'Chunagon' (BCCWJ, CSJ, CHJ)
'Chunagon' is an online reference tool for the Balanced Corpus of Contemporary Written Japanese (BCCWJ), the Corpus of Spontaneous Japanese (CSJ) and the Corpus of Historical Japanese (CHJ).
Balanced Corpus of Contemporary Written Japanese (BCCWJ)
The BCCWJ is the only balanced corpus currently available for the Japanese language. In total it contains roughly 100 million words, selected by random sampling.
Corpus of Spontaneous Japanese (CSJ)
The CSJ is a world-class spoken language corpus containing roughly 7.5 million words of spontaneous Japanese speech. It also includes additional data, such as information on the texts that prompted the speech and intonational information.
Corpus of Historical Japanese (CHJ)
Made available on 'Chunagon', the CHJ contains morphological analyses of 10 literary works from the Heian period.
Corpus of Modern Japanese (CMJ) *Japanese text only*
A corpus focusing on magazines from the Meiji to the Showa periods. The following four corpora have been publicly released: the "Taiyou Corpus", the "Modern Women's Language Corpus", the "Meiroku Magazine Corpus", and the "Kokumin no Tomo Corpus".