M-XML is an implementation based on the C-XML format that describes the linguistic structure of both Fixed and Variable Length Samples. It embeds morphological and syntactic information for both Long and Short Unit Words and gives easy access to that linguistic data. M-XML contains the following layers of information concerning lexical items:
The primary differences between C-XML and M-XML are outlined in the following sections.
M-XML retains only sentence structure information for Variable Length Samples, and fixed length portions selected from Variable Length Samples are surrounded by simple containers. Only inline elements are retained.
C-XML uses varying sentence definitions for samples from sources such as online knowledge bases, blogs, textbooks, and poetry. M-XML merges these into its own sentence definitions, so certain tags particular to individual sub-corpora were changed.
C-XML allows sentence tags to be nested recursively, but this was revised in M-XML. In M-XML, a top-level sentence tag becomes a superSentence tag, while any subordinate sentence tag remains the same. Text following a superSentence tag is marked with the new notation sentence type="Fragment".
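The nesting revision can be illustrated with a minimal Python sketch. It is not the actual conversion tool, and it rests on several assumptions: the function name `flatten_nested` is hypothetical, only one level of nesting is handled, and "text following a superSentence tag" is read as any text that sits directly inside the former top-level sentence rather than inside a subordinate sentence.

```python
import copy
import xml.etree.ElementTree as ET


def _fragment(text):
    # Wrap loose text in <sentence type="Fragment"> (assumed reading of the rule).
    frag = ET.Element("sentence", {"type": "Fragment"})
    frag.text = text
    return frag


def flatten_nested(sentence):
    """Convert one top-level C-XML-style <sentence> to an M-XML-style element.

    Hypothetical helper: a sentence with no nested <sentence> is returned
    unchanged; otherwise it becomes <superSentence>, subordinate sentences
    are kept as they are, and loose text becomes Fragment sentences.
    """
    children = list(sentence)
    if not any(child.tag == "sentence" for child in children):
        return sentence  # no nesting, nothing to revise

    top = ET.Element("superSentence", sentence.attrib)
    if sentence.text and sentence.text.strip():
        top.append(_fragment(sentence.text))
    for child in children:
        child = copy.deepcopy(child)  # leave the input tree untouched
        tail, child.tail = child.tail, None
        top.append(child)
        if tail and tail.strip():
            top.append(_fragment(tail))
    return top


if __name__ == "__main__":
    src = ET.fromstring("<sentence>A<sentence>B</sentence>C</sentence>")
    print(ET.tostring(flatten_nested(src), encoding="unicode"))
```

Subordinate sentences that themselves contain nested sentences are copied as-is in this sketch; a fuller treatment would need to follow whatever rule M-XML applies at deeper levels, which is not stated here.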
In cases where the ruby or quote tags conflict with or obscure tags giving morphological information, they will be amended.
In portions where calculations are transformed into representative numbers, the NumTrans and fraction tags have been added.
Below is an example of M-XML markup for a single sentence element extracted from a sample. For ease of viewing, some properties have been omitted.