A large amount of information about the texts collected in the BCCWJ can be looked up, as illustrated below.
The above information is compiled using XML markup. For example, take the following text:
The above text will then be digitized as in the following example. The sections enclosed in < > are called tags. Which tags are available and how they are organized strongly influence how information in the corpus can be accessed.
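To make the role of tag organization concrete, here is a minimal Python sketch. The fragment is hypothetical: the tag names are taken from the tag table in this section, but the nesting, the text, and the helper name `outline` are assumptions made for illustration and are not copied from the corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: tag names come from the tag table in this section,
# but the nesting and the text are invented for illustration only.
FRAGMENT = """
<sample>
  <article>
    <title><sentence>Corpus design</sentence></title>
    <paragraph>
      <sentence>The first sentence.</sentence>
      <sentence>The second sentence.</sentence>
    </paragraph>
  </article>
</sample>
"""


def outline(element, depth=0):
    """Return one indented line per element, showing how the tags nest."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(FRAGMENT)
tag_outline = outline(root)
print("\n".join(tag_outline))

# Because sentences are tagged explicitly, they can be counted directly
# instead of being guessed from punctuation.
sentence_count = len(root.findall(".//sentence"))
print(sentence_count)
```

The point of the sketch is that a query such as "all sentences inside a paragraph" only works if the markup records both units and their nesting.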
This section will explain the tags used in the BCCWJ.
|Category||Tag||Description|
|Samples||sample||The scope of a single sample.|
| ||sampling||Information related to the sampling points.|
| ||article||Identifies author and theme.|
| ||title||Gives a title descriptive of the contents, such as a chapter or article title.|
| ||cluster||Marks the whole text of a title tag.|
| ||list||Itemized lists or lists of noun phrases. For list elements.|
| ||paragraph||Marks the boundaries of paragraphs.|
| ||sentence||Marks sentence boundaries.|
| ||figure||Figures, charts, photos, pictures, and others.|
| ||caption||Titles and explanations of figures.|
| ||citation||Any citations of other documents.|
| ||speech||Speech and internal monologues, opening sentences.|
| ||noteBody||Footnotes, endnotes, etc. An element describing or annotating the original text.|
| ||abstract||Outlines that do not fit under the article or cluster tags.|
| ||verse||Verses of poems, waka, haiku, songs, etc.|
|Characters and Transcription||ruby||Readings of kanji characters.|
| ||correction||Corrections of mistakes in the original text.|
| ||missingCharacter||Characters not included in the standard encoding (non-JIS).|
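As a rough illustration of how these tags might be used, the following Python sketch pulls sentence text and ruby readings out of a small fragment. The fragment is invented, and the `rubyText` attribute name as well as the helper names `sentence_texts` and `ruby_readings` are assumptions; check the actual corpus files for the real attribute names.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment. The attribute name "rubyText" is an assumption
# about how a reading is attached to a ruby element.
FRAGMENT = (
    '<paragraph>'
    '<sentence><ruby rubyText="かんじ">漢字</ruby>を読む。</sentence>'
    '<sentence>二つ目の文。</sentence>'
    '</paragraph>'
)


def sentence_texts(root):
    """Return the plain text of each sentence, ignoring inline tags."""
    return ["".join(s.itertext()) for s in root.iter("sentence")]


def ruby_readings(root):
    """Return (base text, reading) pairs for every ruby element."""
    return [("".join(r.itertext()), r.get("rubyText")) for r in root.iter("ruby")]


root = ET.fromstring(FRAGMENT)
sentences = sentence_texts(root)
readings = ruby_readings(root)
print(sentences)
print(readings)
```

Using `itertext()` keeps the base characters of a ruby element in the sentence text while leaving the reading in the attribute, so the surface text is not duplicated.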
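Digitized texts use the JIS X 0213:2004 character set (aka JIS 4th edition). JIS X 0213 is a Japanese Industrial Standard coded character set rather than a Unicode encoding, though its characters can also be represented in Unicode.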
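One way to check whether a character falls inside this character set is to try encoding it with one of Python's JIS X 0213:2004 codecs. This is only a sketch: the function names are hypothetical, the choice of the `euc_jis_2004` codec is an assumption, and this is not necessarily how the corpus builders identified missing characters.

```python
def in_jis_x_0213_2004(char):
    """Return True if Python's euc_jis_2004 codec can encode the character."""
    try:
        char.encode("euc_jis_2004")
    except UnicodeEncodeError:
        return False
    return True


def missing_characters(text):
    """List the characters of text that the codec cannot encode."""
    return [c for c in text if not in_jis_x_0213_2004(c)]


covered = missing_characters("漢字とA")
not_covered = missing_characters("漢字😀")
print(covered)
print(not_covered)
```

The check works one code point at a time, so it may misjudge characters that JIS X 0213 represents as a base character plus a combining mark; such cases would need to be tested as whole sequences.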