Documents
This section provides access to reports on each corpus.
Balanced Corpus of Contemporary Written Japanese (BCCWJ)
The Balanced Corpus of Contemporary Written Japanese (BCCWJ) was built to capture the breadth of contemporary written Japanese. To make the corpus as balanced as possible, it contains extensive samples of modern Japanese texts. The data consist of 104.3 million words and cover genres such as general books and magazines, newspapers, business reports, blogs, internet forums, textbooks, and legal documents. Samples were drawn at random from each genre.
- BCCWJ Usage Guide v1.1
- NINJAL Internal Reports
- International Symposium Preliminary Drafts and Reports
- Collections of Results from Public Workshops
- Collections of Results from Satellite Sessions
Corpus of Spontaneous Japanese (CSJ)
The CSJ is a world-class spoken-language corpus containing roughly 7.5 million words of spontaneous Japanese speech. It also includes additional data, such as information on the texts that prompted the speech and intonational information.