Grammar documentation/Examples
English plurals
this counts as two grammar points: one main point, and a number of nuances that together count as a second
In English, plural morphology on nouns denotes that there is more than one of something.
There are only two numbers in English, singular and plural. The tag for a noun should be <n>, and singular and plural should be <sg> and <pl>, respectively.
Regular plurals
Regular plurals are formed with the addition of «s» or «es».
- Nouns ending in «s», «sh», «ch», «x», or «z» take the suffix «es»:
- box<n><pl> ↔ boxes
- match<n><pl> ↔ matches
- kiss<n><pl> ↔ kisses
- Otherwise the suffix is «s»:
- snake<n><pl> ↔ snakes
- window<n><pl> ↔ windows
- boy<n><pl> ↔ boys
- Nouns that end in «o» are unpredictable: some take «s» and some take «es»:
- potato<n><pl> ↔ potatoes
- piano<n><pl> ↔ pianos
Regular suffix with stem alternations
- If a noun ends in a «y» that's preceded by a consonant, the «y» turns to «i» and the suffix is «es»:
- baby<n><pl> ↔ babies
- In some (but not all) nouns that end in «f» or «fe», the «f»/«fe» is replaced by «ve» before the suffix «s» is added:
- leaf<n><pl> ↔ leaves
- life<n><pl> ↔ lives
- Some stems that end in «ex» or «ix» replace that ending with «ic» before adding «es»:
- matrix<n><pl> ↔ matrices
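The predictable rules above can be sketched in a few lines of Python. This is only an illustration, not how a transducer would encode them: the function name `pluralise` and the `LEXICAL_EXCEPTIONS` table are hypothetical, and the table contains only the unpredictable examples listed in this section.

```python
VOWELS = set("aeiou")

# Unpredictable plurals have to be listed noun by noun.
# These entries are just the examples given in this section.
LEXICAL_EXCEPTIONS = {
    "potato": "potatoes",
    "leaf": "leaves",
    "life": "lives",
    "matrix": "matrices",
}


def pluralise(noun: str) -> str:
    """Return the plural of an English noun (hypothetical helper)."""
    # Listed exceptions win over the general rules.
    if noun in LEXICAL_EXCEPTIONS:
        return LEXICAL_EXCEPTIONS[noun]
    # Consonant + «y»: the «y» turns to «i» and the suffix is «es».
    if len(noun) >= 2 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # Sibilant endings take «es» («sh» behaves the same way, e.g. dish → dishes).
    if noun.endswith(("s", "sh", "ch", "x", "z")):
        return noun + "es"
    # Everything else takes plain «s».
    return noun + "s"


if __name__ == "__main__":
    for noun in ["box", "match", "kiss", "snake", "boy", "baby", "piano", "potato"]:
        print(f"{noun}<n><pl> ↔ {pluralise(noun)}")
```

Checking the exception table first means that a noun such as «piano» falls through to the default «s» rule, while «potato» is handled by its listed form.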
No suffix
- Some plural forms are denoted only by a stem alternation; these are not predictable:
- tooth<n><pl> ↔ teeth
- mouse<n><pl> ↔ mice
- man<n><pl> ↔ men
- crisis<n><pl> ↔ crises
- diagnosis<n><pl> ↔ diagnoses
- Some nouns have identical singular and plural forms:
- deer<n><pl> ↔ deer
- fish<n><pl> ↔ fish
- moose<n><pl> ↔ moose
Other irregular plurals
- There are a number of other unpredictable patterns:
- addendum<n><pl> ↔ addenda
- corpus<n><pl> ↔ corpora
- alumnus<n><pl> ↔ alumni
- child<n><pl> ↔ children
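Because the forms in the last two sections cannot be derived by rule, one way to handle them is a plain lookup keyed on the lemma. The sketch below assumes the `lemma<tag><tag>` notation used on this page; `parse_analysis`, `generate_irregular`, and `IRREGULAR_PLURALS` are hypothetical names, and the table holds only the examples given above.

```python
import re

# Irregular and invariant plurals from the examples above, listed explicitly.
IRREGULAR_PLURALS = {
    "tooth": "teeth", "mouse": "mice", "man": "men",
    "crisis": "crises", "diagnosis": "diagnoses",
    "deer": "deer", "fish": "fish", "moose": "moose",
    "addendum": "addenda", "corpus": "corpora",
    "alumnus": "alumni", "child": "children",
}

# A lemma followed by one or more <tag> groups, e.g. "tooth<n><pl>".
ANALYSIS = re.compile(r"^(?P<lemma>[^<]+)(?P<tags>(?:<[^<>]+>)+)$")


def parse_analysis(analysis):
    """Split 'tooth<n><pl>' into ('tooth', ['n', 'pl'])."""
    match = ANALYSIS.match(analysis)
    if match is None:
        raise ValueError(f"not a tagged analysis: {analysis!r}")
    return match.group("lemma"), re.findall(r"<([^<>]+)>", match.group("tags"))


def generate_irregular(analysis):
    """Return the surface form for a listed noun, or None if it is not covered."""
    lemma, tags = parse_analysis(analysis)
    if tags == ["n", "sg"]:
        return lemma
    if tags == ["n", "pl"]:
        return IRREGULAR_PLURALS.get(lemma)
    return None


if __name__ == "__main__":
    for analysis in ["tooth<n><pl>", "deer<n><pl>", "child<n><pl>"]:
        print(analysis, "↔", generate_irregular(analysis))
```

Returning `None` for an unlisted plural leaves room for a rule-based fallback to handle regular nouns.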
Spanish present tense
this only covers the regular forms, so counts as one grammar point; if it covered a bunch of irregular verbs, or a different tense for the regular verbs, a second grammar point would be present
Verbs in Spanish should be tagged <v>. They can be subcategorised into transitive and intransitive verbs with additional tags <tv> and <iv>, respectively (this is important for morphology to some extent, but especially for translation). One of the tenses is the present tense: <pres>. The person and number tags needed will be <p1>, <p2>, <p3> and <sg> and <pl>.
The regular present tense in Spanish is formed by adding a set of endings to the verb stem. The set of endings used depends on the "theme vowel" of the verb: either «a», «e», or «i».
«hablar» (speak) is an «a»-vowel verb, with the stem «habl»-:
- hablar<v><tv><pres><p1><sg> ↔ hablo
- hablar<v><tv><pres><p2><sg> ↔ hablas
- hablar<v><tv><pres><p3><sg> ↔ habla
- hablar<v><tv><pres><p1><pl> ↔ hablamos
- hablar<v><tv><pres><p2><pl> ↔ habláis
- hablar<v><tv><pres><p3><pl> ↔ hablan
«comer» (eat) is an «e»-vowel verb, with the stem «com»-:
- comer<v><tv><pres><p1><sg> ↔ como
- comer<v><tv><pres><p2><sg> ↔ comes
- comer<v><tv><pres><p3><sg> ↔ come
- comer<v><tv><pres><p1><pl> ↔ comemos
- comer<v><tv><pres><p2><pl> ↔ coméis
- comer<v><tv><pres><p3><pl> ↔ comen
«escribir» (write) is an «i»-vowel verb, with the stem «escrib»-:
- escribir<v><tv><pres><p1><sg> ↔ escribo
- escribir<v><tv><pres><p2><sg> ↔ escribes
- escribir<v><tv><pres><p3><sg> ↔ escribe
- escribir<v><tv><pres><p1><pl> ↔ escribimos
- escribir<v><tv><pres><p2><pl> ↔ escribís
- escribir<v><tv><pres><p3><pl> ↔ escriben
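The three paradigms above differ only in the endings selected by the theme vowel, which can be illustrated with a small table-driven sketch. The function name `conjugate_present` is hypothetical, and the code covers regular verbs only.

```python
# Order of cells in each row of ENDINGS.
PERSON_NUMBER = [
    ("p1", "sg"), ("p2", "sg"), ("p3", "sg"),
    ("p1", "pl"), ("p2", "pl"), ("p3", "pl"),
]

# Present-tense endings, keyed by theme vowel.
ENDINGS = {
    "a": ["o", "as", "a", "amos", "áis", "an"],
    "e": ["o", "es", "e", "emos", "éis", "en"],
    "i": ["o", "es", "e", "imos", "ís", "en"],
}


def conjugate_present(infinitive: str, person: str, number: str) -> str:
    """Conjugate a regular Spanish verb in the present tense (hypothetical helper)."""
    if len(infinitive) < 3 or not infinitive.endswith("r"):
        raise ValueError(f"not an infinitive: {infinitive!r}")
    stem, theme = infinitive[:-2], infinitive[-2]
    if theme not in ENDINGS:
        raise ValueError(f"unknown theme vowel in {infinitive!r}")
    # Pick the ending for this person/number cell of the paradigm.
    return stem + ENDINGS[theme][PERSON_NUMBER.index((person, number))]


if __name__ == "__main__":
    for verb in ["hablar", "comer", "escribir"]:
        for person, number in PERSON_NUMBER:
            form = conjugate_present(verb, person, number)
            print(f"{verb}<v><tv><pres><{person}><{number}> ↔ {form}")
```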
Kyrgyz locative case
Kyrgyz nouns (<n>) can be followed by a locative case suffix (<loc>). Locative roughly expresses the same ideas as English "in/at/on".
Plural morphology and possessive morphology may intervene between the noun stem and the locative case suffix. The suffix has eight forms; which one is used is entirely predictable based on the last consonant (if present) and vowel of the material before it, regardless of whether it's part of the noun stem or other morphology.
The first letter of the suffix is «д» after a voiced sound (vowels, sonorants, and «з») and «т» after a voiceless sound (the remaining consonants). The second letter of the suffix is a vowel, one of «а», «е», «о», or «ө»: «а» occurs if the previous vowel is «а», «ы», «я», «у», or «ю»; «е» occurs if the previous vowel is «е», «э», or «и»; «о» occurs if the previous vowel is «о» or «ё»; and «ө» occurs if the previous vowel is «ө» or «ү».
Here is one example noun for each of the eight forms of the locative suffix:
- алма<n><loc> ↔ алмада
- кол<n><loc> ↔ колдо
- көз<n><loc> ↔ көздө
- бел<n><loc> ↔ белде
- баш<n><loc> ↔ башта
- чок<n><loc> ↔ чокто
- күч<n><loc> ↔ күчтө
- иш<n><loc> ↔ иште
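The two choices described above (consonant by voicing, vowel by harmony) can be sketched as follows. The function name `locative` is hypothetical, and the exact membership of `SONORANTS` is an assumption, since the text does not list the sonorants individually.

```python
# Suffix vowel selected by the last vowel of the preceding material.
VOWEL_TO_SUFFIX_VOWEL = {
    **dict.fromkeys("аыяую", "а"),
    **dict.fromkeys("еэи", "е"),
    **dict.fromkeys("оё", "о"),
    **dict.fromkeys("өү", "ө"),
}

# Assumption: the sonorant consonants of Kyrgyz; not enumerated in the text.
SONORANTS = set("лмнңрй")

# Voiced sounds per the description: vowels, sonorants, and «з».
VOICED = set(VOWEL_TO_SUFFIX_VOWEL) | SONORANTS | {"з"}


def locative(stem: str) -> str:
    """Attach the locative suffix to a stem (hypothetical helper)."""
    if not stem:
        raise ValueError("empty stem")
    lowered = stem.lower()
    # First letter: «д» after a voiced sound, «т» otherwise.
    consonant = "д" if lowered[-1] in VOICED else "т"
    # Second letter: determined by the closest preceding vowel.
    for char in reversed(lowered):
        if char in VOWEL_TO_SUFFIX_VOWEL:
            return stem + consonant + VOWEL_TO_SUFFIX_VOWEL[char]
    raise ValueError(f"no vowel found in {stem!r}")


if __name__ == "__main__":
    for noun in ["алма", "кол", "көз", "бел", "баш", "чок", "күч", "иш"]:
        print(f"{noun}<n><loc> ↔ {locative(noun)}")
```

Because the function only inspects the final sound and the last vowel, it would apply unchanged to a stem that already carries plural or possessive morphology.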
Kyrgyz case suffixes
In Kyrgyz, case suffixes can follow a noun stem (<n>) directly, or number and possession morphology may intervene. This section focuses on the system of case suffixes, not on the conditioning environments for the forms they take.
The main case suffixes used in Kyrgyz include the following:
case name | ~meaning | tag | possible forms | алма "apple" | гүл "flower" |
---|---|---|---|---|---|
nominative | subject | <nom> | — | алма<n><nom> ↔ алма | гүл<n><nom> ↔ гүл |
accusative | definite direct object | <acc> | ны, ни, ну, нү, ды, ди, ду, дү, ты, ти, ту, тү | алма<n><acc> ↔ алманы | гүл<n><acc> ↔ гүлдү |
genitive | possessor | <gen> | нын, нин, нун, нүн, дын, дин, дун, дүн, тын, тин, тун, түн | алма<n><gen> ↔ алманын | гүл<n><gen> ↔ гүлдүн |
dative | "to" | <dat> | га, го, ге, гө, ка, ко, ке, кө | алма<n><dat> ↔ алмага | гүл<n><dat> ↔ гүлгө |
locative | "at, in, on" | <loc> | да, до, де, дө, та, то, те, тө | алма<n><loc> ↔ алмада | гүл<n><loc> ↔ гүлдө |
ablative | "from" | <abl> | дан, дон, ден, дөн, тан, тон, тен, төн | алма<n><abl> ↔ алмадан | гүл<n><abl> ↔ гүлдөн |
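As an illustration of how the table could be used for analysis, the sketch below stores the possible forms per case and identifies the case of a surface form given its stem. The names `CASE_FORMS` and `analyse_case` are hypothetical, and, like this section, the code ignores the conditioning environments: it accepts any listed form, whether or not it is the harmonically correct one for that stem.

```python
# Possible suffix forms for each case tag, copied from the table above.
CASE_FORMS = {
    "nom": [""],
    "acc": "ны ни ну нү ды ди ду дү ты ти ту тү".split(),
    "gen": "нын нин нун нүн дын дин дун дүн тын тин тун түн".split(),
    "dat": "га го ге гө ка ко ке кө".split(),
    "loc": "да до де дө та то те тө".split(),
    "abl": "дан дон ден дөн тан тон тен төн".split(),
}


def analyse_case(stem: str, surface: str) -> list:
    """Return the analyses whose case suffix matches the ending of `surface`."""
    if not surface.startswith(stem):
        return []
    ending = surface[len(stem):]
    return [
        f"{stem}<n><{case}>"
        for case, forms in CASE_FORMS.items()
        if ending in forms
    ]


if __name__ == "__main__":
    for stem, surface in [("алма", "алманы"), ("гүл", "гүлдүн"), ("гүл", "гүлгө")]:
        print(surface, "→", analyse_case(stem, surface))
```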
Malay adjective reduplication
In Malay, reduplication of an adjective (<adj>) can express either adverbialisation (<advl>) or plurality of a corresponding noun (<pl>).
- keras<adj><advl> ↔ keras-keras ("loud" → "loudly")
- besar<adj><pl> ↔ besar-besar ("big", referring to a plural noun)
for this grammar point, I'd want to see at least a couple more examples (e.g., can "keras-keras" mean "loud (plural)"?), and potentially a sentence or two demonstrating the use
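A minimal sketch of the pattern follows. It assumes that every reduplicated adjective is ambiguous between the two readings, which the examples above do not establish; the function names are hypothetical.

```python
def reduplicate(adjective: str) -> str:
    """Generate the reduplicated form of an adjective, e.g. keras → keras-keras."""
    return f"{adjective}-{adjective}"


def analyse_reduplicated(surface: str) -> list:
    """Return both candidate analyses for a fully reduplicated form.

    Assumption: both the adverbial and the plural reading are available
    for every adjective; a real analyser might restrict this per lemma.
    """
    left, separator, right = surface.partition("-")
    if separator and left and left == right:
        return [f"{left}<adj><advl>", f"{left}<adj><pl>"]
    return []


if __name__ == "__main__":
    print(reduplicate("keras"))
    print(analyse_reduplicated("besar-besar"))
```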
Mandarin personal pronouns
In Mandarin, personal pronouns are distinguished by person (1st, 2nd, 3rd), number (singular, plural), gender in third person (masculine, feminine, neuter), formality in second person (informal, formal), and inclusivity/exclusivity in first person plural (inclusive, exclusive).
Person | Subtype | Singular | Plural |
---|---|---|---|
1st | | 我<prn><pers><p1><sg> ↔ 我 | |
1st | exclusive | | 我们<prn><pers><p1><pl><excl> ↔ 我们, 我们<prn><pers><p1><pl><excl> ← 我們 |
1st | inclusive | | 咱们<prn><pers><p1><pl><incl> ↔ 咱们, 咱们<prn><pers><p1><pl><incl> ← 咱們 |
2nd | informal | 你<prn><pers><p2><sg> ↔ 你 | 你们<prn><pers><p2><pl> ↔ 你们, 你们<prn><pers><p2><pl> ← 你們 |
2nd | formal | 您<prn><pers><p2><sp><frm> ↔ 您 | |
3rd | masculine | 他<prn><pers><p3><sg><m> ↔ 他 | 他们<prn><pers><p3><pl><m> ↔ 他们, 他们<prn><pers><p3><pl><m> ← 他們 |
3rd | feminine | 她<prn><pers><p3><sg><f> ↔ 她 | 她们<prn><pers><p3><pl><f> ↔ 她们, 她们<prn><pers><p3><pl><f> ← 她們 |
3rd | neuter | 它<prn><pers><p3><sg><nt> ↔ 它 | 它们<prn><pers><p3><pl><nt> ↔ 它们, 它们<prn><pers><p3><pl><nt> ← 它們 |
- Note: while 3rd person pronouns are orthographically distinguished by gender, the pronunciation of the pronouns is the same regardless of gender.
This is one approach to tagging these pronouns, where everything is a category tag. Another approach would treat the plural suffix (consistent throughout) as a grammatical tag, which would mean the lemma for the plural pronouns would change to just the singular element (first character of each). Similarly, gender could be treated as a grammatical tag, and a single lemma could be chosen for the third person pronouns. Taking this to the extreme, person could be a grammatical tag, and then all pronouns would have the same lemma—although deciding what that is might be tricky. In the Apertium English transducer, prpers is used in this way.
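The alternative in which the plural suffix is treated as a grammatical tag can be sketched as below. The function names are hypothetical. The sketch also illustrates the one-directional ← mappings from the table: the traditional suffix 們 is accepted in analysis, but only the simplified 们 is generated.

```python
PLURAL_SUFFIX = "们"               # generated and analysed
TRADITIONAL_PLURAL_SUFFIX = "們"   # analysed only, never generated


def analyse(surface: str) -> tuple:
    """Split a pronoun into (lemma, number), with the singular element as lemma."""
    if len(surface) > 1 and surface[-1] in (PLURAL_SUFFIX, TRADITIONAL_PLURAL_SUFFIX):
        return surface[:-1], "pl"
    return surface, "sg"


def generate(lemma: str, number: str) -> str:
    """Build the surface form from the singular lemma and a number tag."""
    if number == "sg":
        return lemma
    if number == "pl":
        return lemma + PLURAL_SUFFIX
    raise ValueError(f"unknown number tag: {number!r}")


if __name__ == "__main__":
    for form in ["我", "我们", "我們", "他們"]:
        lemma, number = analyse(form)
        print(form, "→", lemma, number, "→", generate(lemma, number))
```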
English demonstratives
English has four demonstratives, differentiated by proximity (proximal, distal) and number (singular, plural). They may function as determiners and as pronouns.
- this<det><dem><sg> ↔ this, this<prn><dem><sg> ↔ this
- that<det><dem><sg> ↔ that, that<prn><dem><sg> ↔ that
- this<det><dem><pl> ↔ these, this<prn><dem><pl> ↔ these
- that<det><dem><pl> ↔ those, that<prn><dem><pl> ↔ those
Note that proximity isn't indicated in the tags—that's because it can be distinguished based on the lemma. However, it would be perfectly fine to have tags for that, just like pronouns have tags for person despite also being able to be distinguished by the lemma. At some level it comes down to what's useful at the level of the syntax—in many languages person is useful to know because it needs to agree (e.g., with finite verb forms or possessed noun forms), but proximity often isn't helpful in this way.
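The tagging scheme above is small enough to write out as a lookup. In this sketch (the names `FORMS` and `analyse` are hypothetical), proximity lives only in the lemma, and each surface form receives both a determiner and a pronoun analysis.

```python
# (lemma, number) → surface form; proximity is carried by the lemma alone.
FORMS = {
    ("this", "sg"): "this",
    ("that", "sg"): "that",
    ("this", "pl"): "these",
    ("that", "pl"): "those",
}


def analyse(surface: str) -> list:
    """Return the determiner and pronoun analyses of a demonstrative."""
    return [
        f"{lemma}<{pos}><dem><{number}>"
        for (lemma, number), form in FORMS.items()
        if form == surface
        for pos in ("det", "prn")
    ]


if __name__ == "__main__":
    for word in ["this", "that", "these", "those"]:
        print(word, "→", analyse(word))
```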
Russian spellrelax
In Russian-language pædagogical materials, accent marks are sometimes written to mark stress; they may also be used to differentiate words which are otherwise identical in spelling. Stress marks should be ignored for the purposes of analysis. Some examples include:
- молоко<n><nt><nom> ↔ молоко́
- автобус<n><mi><nom> ↔ авто́бус
- замок<n><mi><nom> ↔ за́мок
Furthermore, the letter «ё» is almost always spelled simply «е», even in normative formal texts, even though the distinction is part of standard Russian orthography:
- шофёр<n><ma><nom> ↔ шофер
- жёлтый<adj><m><nom> ↔ желтый
- счёт<n><mi><nom> ↔ счет
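One way to picture both relaxations is a normalisation step applied to the input and to the lexicon keys before lookup. This is a sketch of the idea only, not how a transducer implements it; `relax`, `LEXICON`, and `analyse` are hypothetical names, and the lexicon holds just the examples from this section.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # the stress mark written over a vowel


def relax(word: str) -> str:
    """Drop stress marks and collapse «ё» to «е» for matching purposes."""
    # NFC first, so that a decomposed «ё» (е + diaeresis) is recognised.
    word = unicodedata.normalize("NFC", word)
    word = word.replace(COMBINING_ACUTE, "")
    return word.replace("ё", "е").replace("Ё", "Е")


# Lexicon in normative spelling (examples from this section only).
LEXICON = {
    "молоко": "молоко<n><nt><nom>",
    "автобус": "автобус<n><mi><nom>",
    "замок": "замок<n><mi><nom>",
    "шофёр": "шофёр<n><ma><nom>",
    "жёлтый": "жёлтый<adj><m><nom>",
    "счёт": "счёт<n><mi><nom>",
}

# Index by relaxed spelling; a list per key, since distinct words may collide.
INDEX = {}
for lemma, analysis in LEXICON.items():
    INDEX.setdefault(relax(lemma), []).append(analysis)


def analyse(surface: str) -> list:
    """Look up a surface form, ignoring stress marks and the ё/е distinction."""
    return INDEX.get(relax(surface), [])


if __name__ == "__main__":
    for form in ["молоко\u0301", "за\u0301мок", "шофер", "желтый"]:
        print(form, "→", analyse(form))
```

The index maps each relaxed spelling to a list because words that differ only in «ё»/«е» or in stress placement collapse to the same key and would all be returned as candidate analyses.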