This section provides access to reports on each corpus.

Balanced Corpus of Contemporary Written Japanese (BCCWJ)

The Balanced Corpus of Contemporary Written Japanese (BCCWJ) was created to capture the breadth of contemporary written Japanese. It contains extensive samples of modern Japanese texts, selected to make the corpus as balanced as possible. The data comprise 104.3 million words covering genres such as general books, magazines, newspapers, business reports, blogs, internet forums, textbooks, and legal documents, among others. Texts in each genre were selected by random sampling.

Corpus of Spontaneous Japanese (CSJ)

The CSJ is a world-class spoken language corpus containing roughly 7.5 million words of spontaneous Japanese speech. In addition to the speech itself, it includes supplementary data such as information on the texts that prompted the speech and intonational information.


Links