Compiled by: Advaith Siddharthan
Note: this section needs work!
Each node in the dependency tree can be thought of as an attribute-value matrix, i.e., a bundle of features with values. All values must be set for each node in the tree. This will require checking each node before finishing the analysis. Here is a list of features:
Position (wpos). The linear position of the word in the sentence. This should not be modified or annotated, except for new empty nodes created by the annotator, which should always be given a wpos feature that inserts the new node in the place where it would belong if it were not empty.
Word (lex). This is the inflected word form associated with the node. It is almost always correctly displayed already. Example: went.
Part-of-Speech (POS). This is the lexical class, taken from a short list. Example: verb. Specific options:
o V -- verbs, but not auxiliary verbs (=Aux)
o N -- common nouns
o PN -- proper nouns
o Adj -- adjectives
o Adv -- adverbs
o P -- prepositions and subordinating conjunctions
o Conj -- coordinating conjunctions, but not subordinating conjunctions; also includes the comma used in enumerations instead of repeated and
o Det -- determiners
o Aux -- auxiliary verbs
o Pun -- punctuation marks, but not the comma used in conjunctions
o Sym -- various symbols (dollar signs and the like)
o Uh -- speech-specific sounds, even if meaningful (such as /UH HUH/)
o Misc -- everything else, including greetings (Hi, Hello) and interjections (Okay)
Supertag (Stag). IGNORE. This is the supertag. This should not be annotated, as it will be filled in automatically.
Base form (Root). This is the base form (lexeme) of the inflected form. A first "guess" will be included, which needs to be checked and corrected. Example: go.
Morphological Features (Morph). A complete specification of the morphological features needed to derive the inflected word form from the base form. The options are grouped by part-of-speech; in the menu of the GRAPH tool, all options are displayed at once and the GRAPH tool does not enforce a proper choice of features given the part-of-speech. Possibilities are:
o NOUNS (including proper nouns and determiners)
sg -- singular
pl -- plural
o VERBS (including auxiliaries)
3sg_PRES -- 3rd person singular present sings
PRES -- present tense, but NOT 3rd person singular sing; also use for all subjunctives (lest he sing)
PAST -- past tense sang
PAPRT -- past participle sung
PRESPRT -- present participle singing
INF -- base form when used in infinitive sing
o ADJECTIVES and ADVERBS
COMP -- morphological comparative longer
SUPER -- morphological superlative longest
In comparatives and superlatives formed with more and most, label the adjective or adverb as None.
o MISC
None -- use this when the word is not inflected, other than infinitive verbs (e.g. adjectives in base form)
--- -- use this when the word is not inflected, other than infinitive verbs (e.g. adjectives in base form), or when the word does not ever inflect (prepositions, particles, etc).
Note: in fact "---" subsumes "None" and can be used whenever "None" can be used.
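Because the GRAPH tool does not enforce the pairing of Morph values with part-of-speech, a small consistency check can catch slips before a node is marked as done. The following is a minimal sketch, not part of any project tool: the names ALLOWED_MORPH, UNINFLECTED and morph_is_valid are hypothetical, the grouping simply transcribes the list above, and it assumes that verbs and auxiliaries always carry an explicit verbal feature rather than "None" or "---".

```python
# Hypothetical consistency check; the grouping transcribes the Morph list above.
UNINFLECTED = {"None", "---"}

VERBAL = {"3sg_PRES", "PRES", "PAST", "PAPRT", "PRESPRT", "INF"}

ALLOWED_MORPH = {
    "N": {"sg", "pl"},
    "PN": {"sg", "pl"},
    "Det": {"sg", "pl"},
    "V": VERBAL,
    "Aux": VERBAL,
    "Adj": {"COMP", "SUPER"},
    "Adv": {"COMP", "SUPER"},
}


def morph_is_valid(pos, morph):
    """Return True if the Morph value is plausible for the given POS."""
    if morph in UNINFLECTED:
        # Assumption: verbs and auxiliaries always take a verbal feature.
        return pos not in {"V", "Aux"}
    # Classes with no inflection list (P, Conj, Pun, ...) accept nothing else.
    return morph in ALLOWED_MORPH.get(pos, set())
```

For instance, morph_is_valid("N", "PAST") is rejected, while morph_is_valid("P", "---") is accepted.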
Functional Role Reassignment (FRR). This feature is only used on verbs, nouns, adjectives, or prepositions, and reflects ways in which the usual distribution of roles has been changed:
As Hindi is a (relatively) free word order language, there is no notion of passive voice as such. Subjects and objects of the verb are identified by case markings, not by position. Similarly, there is no construction corresponding to “there is a tree”. This would be written in Hindi as ek ped hai (one/a tree is).
Pred -- predicative (only nouns, adjectives, prepositions). Use this to indicate that the noun, adjective, or preposition is used as head of a predicative construction with a dependent form of be which is analyzed as an auxiliary. See discussion of copula constructions.
None -- the case above does not apply
Surface role (SRole). This is the role of the node with respect to its mother, as the node appears in the surface string. All nodes have a surface role.
Subj -- surface subject. In the event that the verb has more than one argument, the Subj is usually marked by the case marking ne (did). There are cases where the subject is not marked by case. In these cases, the object is always case marked with either ko (to) or se (from). The disambiguation between an un-case-marked Subj and Obj2 should be straightforward. Other surface subjects are the empty nodes of non-finite verbs. Surface subjects can also be verbs (sentential subjects, as in gaadi bechna mushkil hai (car's selling difficult is) “Selling the car is difficult”), or adjectives (pehla behtar hai (first better is) “the first is better”) where there is some ellipsis involving the noun, or prepositions (savere main acchha rahega (morning in good stay) “in the morning will be good”).
A few tricky cases:
John Mary ko pasand karta hai (John Mary to like does be) means the same as John ko Mary pasand hai (John to Mary like be). In both cases, the meaning is “John likes Mary” and in both cases John is the Subj and Mary is the Obj. When there is no light verb present, the subject markings can vary. Also see the other cases below.
John se kaam nahin hoga (John from work not happen) “John is incapable of doing this work”. In this case, no action takes place (the work doesn't happen) and again the case markings on the subject can vary.
John se ek kaam kiya gaya (John from one task done was) “John performed one task”. This resembles the passive voice in English “one task was performed by John”. Again, John should be marked as subject.
John ka ek dost hai (John belonging-to a friend is) “John has a friend”. The verb in this example conveys possession. This is reflected in the case marking of possession on the Subj John.
Obj -- surface object. This is always case marked with either ko (to) or se (from) when the object in question is animate (apart from the special cases under Subj above). When the object is inanimate, the ko can be omitted (the omission usually signals a more general or less specific reading). Obj is also the surface role of a complement of a preposition. This is also the role of sentential complements (the verb which heads the complement has the role Obj). This also covers reflexives: in Apne ko dekho (yourself to look) “look at yourself”, the surface role is Obj, as marked by “ko”.
Obj2 -- surface indirect object. This is usually not case-marked, but should be obvious from having identified Subj and Obj as above.
Mod -- all adjuncts, including modifiers, appositions, and the like. Also, all function words (determiners, auxiliaries, subordinating conjunctions) will be labeled "Adj".
Deep role (DRole). Due to detailed case marking in Hindi, the surface and deep roles are one and the same. The deep role can be omitted, in which case it is assumed to be the same as the surface role.
Done. This feature is only a check to make sure that the default values have been checked. Set it to "Y" when you are done with the features for one node.
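The requirement that every node be checked before it is marked done can be pictured as a simple record with a completeness check. This is only an illustrative sketch: the class Node and the functions unset_features and mark_done are hypothetical names, not part of the annotation tool. It leaves out Stag (which is filled in automatically), treats a missing DRole as equal to SRole as described above, and assumes for simplicity that FRR is recorded on every node (with value "None" where it does not apply).

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Node:
    """One tree node as an attribute-value matrix (Stag omitted on purpose)."""
    wpos: Optional[int] = None
    lex: Optional[str] = None
    pos: Optional[str] = None
    root: Optional[str] = None
    morph: Optional[str] = None
    frr: Optional[str] = None
    srole: Optional[str] = None
    drole: Optional[str] = None
    done: str = "N"


def unset_features(node):
    """Return the names of features that still have no value."""
    missing = []
    for f in fields(node):
        if f.name == "done":
            continue
        value = getattr(node, f.name)
        # An omitted DRole is taken to be the same as SRole.
        if f.name == "drole" and value is None and node.srole is not None:
            continue
        if value is None:
            missing.append(f.name)
    return missing


def mark_done(node):
    """Set Done to "Y" only when nothing is left unset; return Done."""
    if not unset_features(node):
        node.done = "Y"
    return node.done
```

A node with only wpos and lex filled in would stay at Done = "N", and unset_features would list what remains to be annotated.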
Verbs are heads of sentences and clauses.
The head of any complete clausal utterance is the main verb. Incomplete utterances (NPs, PPs, Greetings) should have as their head the usual head for that type of phrase.
Auxiliary verbs (do, have, had, auxiliary-be) are deleted. Their meaning is represented as features on the main verb (for example, tense:fut). For example, jaa raha hai (is going), jaayega (will go) and jata hai (goes) should all be represented by the verb jaa (go) with tense: present-continuous, future or present. Modals (sakta (can)) are syntactically very much like auxiliaries, but they are included in IL0 for semantic reasons as dependents of the main verb. In all cases, when the main verb is missing, as in VP ellipsis, an empty verb node should be created and used as the head of the entire clause.
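To make the target representation concrete, the three verb groups mentioned above can be written out as a lookup from the surface tokens to the remaining verb node and its tense feature. This is a toy sketch under stated assumptions: the table VERB_GROUPS contains only the three examples from the text, the function name collapse_verb_group is hypothetical, and real annotation is done by hand rather than by table lookup.

```python
# Illustrative table built only from the three examples given in the text.
VERB_GROUPS = {
    ("jaa", "raha", "hai"): ("jaa", "present-continuous"),
    ("jaayega",): ("jaa", "future"),
    ("jata", "hai"): ("jaa", "present"),
}


def collapse_verb_group(tokens):
    """Return the single verb node that replaces a verb group, or None.

    The auxiliaries disappear; their contribution survives as the
    tense feature on the main verb.
    """
    entry = VERB_GROUPS.get(tuple(tokens))
    if entry is None:
        return None  # group not covered by this toy table
    root, tense = entry
    return {"Root": root, "tense": tense}
```

All three groups end up as one node with Root jaa, differing only in the value of tense.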
When a form of the copula is present in a sentence, the head of the clause will vary depending on the type of copular sentence. Predicative copular constructions will have the predicate as their head. Equative copular constructions will have the copula as their head.
Arguments vs. Adjuncts
In distinguishing between arguments and adjuncts, consistency is the most important thing. This distinction will matter most for annotating empty categories. In addition, each argument will be annotated with a feature encoding its grammatical role. All non-arguments will be annotated as adjuncts, including function words.
The only NPs that will be considered arguments for annotation purposes are
NPs that never appear with a preposition;
NPs that are obligatory (e.g. Y in X ko Y per rakh (put X on Y)).
A list of argument patterns of common verbs can be consulted for questionable cases.
The role of each argument (subject, object, indirect object) must be annotated as a feature of its node. See the features page for a more detailed description.
Both deep and surface grammatical relations should be annotated. For Hindi, these are one and the same.
See the general discussion here.
There is a small class of Hindi verbs that function as light verbs in verb compounds. The main light verbs are ja/gaya (go/went), le (take), de (give), daal (put). These verbs are semantically void and should be deleted (their function is similar to auxiliary verbs in that light verbs decide agreement features of arguments of the verb compound; however, the arguments are determined solely by the main verb).
Examples:
Hum santre kha gaye (We oranges eat went) “We ate the oranges”
Maine santra kha liya (I-did orange eat take) “I ate the orange”
There are some tricky cases where what appears to be a light verb is actually not semantically void. In these cases, they should not be removed.
Examples:
Ram kha-kar jaayega (Ram eat and-then will-go) “Ram will eat and leave”
Ram kha-ye jayega (Ram eating will-go) “Ram will go on eating”
In the above examples, ja (go) is not functioning as a light verb. In the first instance, the kar clitic indicates sequencing; hence this is an example of a missing subject for ja (see also Empty Nodes). In the second example, ja contributes meaning to the sentence and should be preserved as a node. In such cases, the supposed light verb should be treated as the head of a verb group, and the other verb (in this case, kha (eat)) should be a dependent of it.
I cannot find an example of raising verbs in Hindi. This doesn't necessarily mean that it doesn't happen, but I can't find documentation that explicitly says it is impossible.
Control structures should have an empty node included as the subject of their lower verb.
Some common subject control verbs/adjectives are koshish (try), aasha (hope), chaahna or aakaansha (want, desire), utsuk (keen), nirnay lena or nishchay karna (take a decision, decide), murkh hona (be silly), bhagyavaan honaa (be lucky).
Object control verbs include: tell, tempt, force, persuade, appeal to. As with subject control verbs, object control constructions cannot be used with expletives or non-thematic subjects of sentential idioms. Here too an empty node must be included as the dependent of the lower verb. Just like subject control verbs can be confused with raising, object control verbs can be confused with ECM verbs. Using an expletive object is generally a good test to distinguish between the two, as shown here with the control verb decide and the ECM verb believe.
? I decided there to be a problem.
? I decided the shoe to be on the other foot.
I believed there to be a problem.
I believed the shoe to be on the other foot.
Note that although want is a subject control verb, when it appears with a second NP, it is an ECM verb. In addition, it can appear with an infinitival for-complement. An empty node should only be included in its subject control version. The case with for should be analyzed as an ECM construction, differing only in the fact that for appears as a complementizer dependent of the embedded verb.
I want to leave.
* There wants to be a solution.
I want him to leave.
I want there to be a solution.
I want for him to win the race.
Here are some more examples to motivate the different treatment of the two constructions.
That seems to be my husband.
?? That tried to be my husband. (sounds like an insult to whoever the deictic pronoun refers to)
I believe that to be my husband.
?? I persuaded that to be my husband (sounds like an insult to whoever the deictic pronoun refers to)
In English, we cannot use that as a deictic pronoun to refer to people without a derogatory effect (since the designated person becomes an object, that being used only for objects): ??that (= George) likes apples or ??I work with that (= Hardy). The pronoun that can, however, be used to refer to something deictically in order to predicate of it that it is a (particular) person: that is my husband or that is my co-worker. Here, that does not refer to a person, but to unformed sense data, which is then identified as being a person. The data above shows exactly the same pattern: that can be used felicitously (without derogatory effect) as a subject of a predication (1, 3), even if that subject has raised to surface subject (1) or surface object (3) position of another verb. This is because in raising (1) and ECM verbs (3), the argument is not an argument of the higher verb. However, that cannot be used to refer to a person (without derogatory effect) in any other argument position -- in (2), that is not only subject of the lower predication, but also of the higher verb (subject control verb), and in (4), it is not only subject of the lower predication, but also object of the higher verb (object control verb). Thus the odd effect comes from the use of control verbs and, as a consequence, the that participating in the higher verb's argument structure.
In an exceptional case marking (ECM, also known as AcI "Akkusativ cum Infinitiv" or "raising-to-object") construction, the NP that morphologically appears to be a direct object is really the subject of the lower verb. That is, it will have as its head not the ECM verb, but the lower verb.
Common ECM verbs include expect, assume, believe, forbid, know, let, need.
As with raising verbs, the best tests are to use expletive there and non-thematic subject idioms.
I believe there to be a problem.
I believe the shoe to be on the other foot.
I need there to be a solution.
I need the cat to be out of the bag.
He let there be light.
ECM constructions may be confused with object control. See Control for a discussion of this matter.
Exceptional case marking constructions with for as in (1-2) below should be analyzed as a subordinate clause with for as a complementizer dependent on the subordinate clause's main verb:
For me to eat Crispy Critters would be unprecedented.
I want for you to eat only Crispy Critters.
Some ECM verbs (need) subcategorize for either an NP and an infinitive or an NP and a past participle. In the case of the latter, the analysis will be the same as that of the small clause complement analysis. The past participle will be tagged as an adjective.
John needs me to solve the problem.
John needs the problem solved.
When non-finite verb phrases appear without subjects, an empty noun node should be included as a dependent of the verb. If a subject noun phrase is present and part of the VP, as in (1) below, an empty node should not be included. Instead, that head noun (and its dependents if any) should be a dependent of the non-finite verb.
Norma ki sab pe shikaayat karna mujhe hamesha sataata hai (Norma 's everyone on complaint doing to-me always annoys is) “Norma's complaining about everyone always annoys me”.
Sab pe shikaayat karna hamesha auron ko sataata hai (Everyone on complaint doing always others to annoy is) “Complaining about everyone always annoys others”.
Abhi jaane se sab bhang ho jaayega (now leaving from everything disrupt will go) “Leaving now would disrupt everything”.
Parinaam se dukhi hokar Uli ne mehnat karna chod diya (results from sad happened-then Uli did effort doing cease gave) “Depressed by the results, Uli ceased to make an effort”.
Jaane se pehele Max ne Mike ko bulaya (Leaving from before Max did Mike to call) “Before leaving, Max called Mike”.
In general, non-finite clauses will be dependents of main verbs. Exceptions are reduced relative clauses, if they modify nouns. In cases that are not clear, the default choice of a head should be the verb.
Small clause complements will be analyzed with the predication as the head of the small clause and dependent on the head verb. The predication may be nominal, prepositional, or adjectival. In the following, the small clauses are bracketed:
Prabhandkarta [Ernie ko sangat ke liye mehetvapoorna] samajhte hain (Manager Ernie to company for importance considers is) “The manager considers [Ernie an asset to the company]”.
Adhikari [us mamale ko hamare charche ke chaukhat ke bahar] samajhte hain (officer that issue to our discussion 's scope 's outside considers is) “The agent considers [that issue outside the scope of our discussion]”.
Hum [is samasya ko mushkil] samajhte hain (We this problem to difficult consider is) “We consider [the problem difficult]”.
The analysis of small clauses is identical to predicative copular constructions, since the overt copula is omitted anyway at IL0.
In the case of a past participle-headed predication, like the following, the participle should be tagged as a verb as well. The missing argument (the deep subject) needs to be added.
Hum [is samasya ko samadhan huve] samajhte hai (We this problem to solution happened consider is) “We consider [the problem solved]”.
Hum [is gaadi ko marammat kiye huve] chahte hain (We this car repair done to need is) “We need [the car repaired]”.
Not quite “wh” in Hindi, but this section deals with questions containing kya (what), kaun (who), kaunsa (which), kab (when), kaise (how) etc. As with other full clauses, the head of a wh-question will be its main/lexical verb. The wh-word will be a dependent of the main verb like any other argument.
When the wh-word is part of a long-distance dependency, it will not be a dependent of the highest main verb, but of the embedded main verb heading the clause in which the wh-word originated. The linear order will allow a reconstruction of the wh-word's surface position. In cases of long-distance dependencies, there may be "crossing arcs". This is ok.
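Since crossing arcs are permitted, it can still be useful to list them so that each one can be confirmed as a genuine long-distance dependency rather than an attachment slip. The sketch below assumes, purely for illustration, that each arc is given as a (head wpos, dependent wpos) pair; the function name crossing_arcs is hypothetical. Two arcs cross when exactly one endpoint of one arc lies strictly inside the span of the other.

```python
from itertools import combinations


def crossing_arcs(arcs):
    """Return the pairs of arcs whose spans cross.

    Each arc is a (head_wpos, dependent_wpos) pair; direction is
    irrelevant for crossing, so every arc is reduced to its span.
    """
    spans = [tuple(sorted(arc)) for arc in arcs]
    crossed = []
    for (a1, b1), (a2, b2) in combinations(spans, 2):
        if a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1:
            crossed.append(((a1, b1), (a2, b2)))
    return crossed
```

Nested arcs and arcs that merely share an endpoint are not reported; only true crossings are.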
If an overt subject is not present, as in (1), include an empty noun; otherwise an imperative will have the same analysis as a declarative sentence.
Mujhe akele chod! (Leave me alone!)
Tu mujhe akele chod! (You leave me alone!)
A relative clause will be the dependent of whatever it modifies, in most cases a noun. The arc is labeled MOD. As with other clauses, its main verb will be its own head. The relativizer will be a dependent of the main verb like any other argument (or adjunct, in cases such as woh jagah jahaan usne machli dekha (that place where he-did fish see) “the place where he saw the fish”).
In long-distance dependencies, the relativizer will not be a dependent of the highest main verb, but of the embedded main verb heading the clause in which it originated. The linear order will allow a reconstruction of its surface position.
Reduced relative clauses (aapse chuna huva udaan (you-from choose has-been flight) “the flight chosen by you”) are analyzed like regular relative clauses without overt relative pronoun. They have only an empty subject inserted, but not an empty complementizer, nor an empty auxiliary.
Reduced relative clauses appear similar to non-finite past or present participial clauses and may be difficult to distinguish from these. However, they will always depend on a nominal rather than a verbal head. Although most reduced relative clauses are postnominal, it seems that they can be preposed as in (1) below. When sentence initial, it may be difficult to decide what they depend on. If it is clear that they modify a noun phrase (as in (1) below), choose the noun; otherwise choose the verb as their default head, have in (2) and (3), sang in (4). Note that world knowledge needs to be used when making these decisions.
[Staying at the Palace Hotel], you can use the gym.
[Returning on the eleventh], I have a couple flights, the first one departing Baltimore at twelve forty p.m.
The lowest rate I have for a car [using your discount number] is going to be Avis.
[Playing in the yard], the boy sang happily.
Two tests to use to decide whether the clause is modifying the verb or a noun:
Can you insert while or being at the beginning of it without changing the meaning? If yes, it should modify a VP; otherwise, it's a dependent of the NP.
Can you insert which/that is/are without changing the meaning? If yes, it depends on the NP; if not, it modifies the verb.
VP-ellipsis should be annotated with an empty verbal head as the root node. Any auxiliaries and the subject will be dependents of this node. No missing arguments should be added. Also see section on empty nodes.
The head of a noun phrase is the head noun. Any determiner is a dependent. Adjectives are separate dependents from determiners. If there are multiple adjectives, the default structure will simply have each adjective as a direct dependent of the noun. This is the case for multiple determiners also.
Adverbial noun modifiers can be dependents of the determiner or the noun in the phrase they modify. For example, lagbagh (approximately), n bartaav se (practically), jyada-se-jyada (at most), only can depend on cardinals or some quantifiers; kum-se-kum (at least), sirf (only), bus (just), even can depend on nouns (i.e. modify entire noun phrases). These classes have some overlap; the default head choice in cases of ambiguity should be the noun.
Compound noun phrases, when clear, can have multiple noun phrases as dependents. For example, chaubis ghanta samachar seva (twentyfour hour news service) will have seva (service) as the head and samachar (news) and ghanta (hour) as its direct dependents. Chaubis (twentyfour) will be a dependent of ghanta (hour). A good test for this is to remove each noun in turn, to see if the phrase still retains part of its original sense. Because a 24 hour news service is a news service and a 24-hour service, this analysis is the one we want.
In cases where it's not clear whether or which nouns modify each other, the default compound structure will have all modifying nouns as direct dependents on the rightmost noun.
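The default structure for unclear compounds can be stated in a few lines. This is a minimal sketch using the words themselves as node identifiers, which is an assumption made for readability (a real tree would use node positions); the function name default_compound_arcs is hypothetical.

```python
def default_compound_arcs(nouns):
    """Attach every modifying noun directly to the rightmost noun.

    Returns a list of (head, dependent) pairs.
    """
    head = nouns[-1]
    return [(head, modifier) for modifier in nouns[:-1]]
```

Applied to the nouns ghanta, samachar, seva, this yields seva as the head of both ghanta and samachar, which happens to coincide with the analysis chosen for the clear example above.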
http://www.cis.upenn.edu/~creswell/dependency/compound.gif
Proper nouns should have the value PN for feature POS. They are treated largely like nouns, except that compound proper nouns are not analyzed syntactically as if they were common nouns, but rather given right-branching structures. (The intuition is that they are really fixed phrases.) So in British Airways, British is the head, has POS PN, and carries the other features of this proper noun (in American English, singular number). Airways is a dependent on British (with SRole Adj), and also has POS PN. In Hindi, proper nouns cannot be identified by capitalization (which doesn't exist). Hence all nouns or compound nouns that are the names of companies, organizations, locations, people or animals etc. should be marked as proper nouns. Note how this differs from the English Manual, where Heathrow Airport would be marked as PN PN, while Heathrow airport would be marked PN NN. In Hindi, heathrow hawaii adda (Heathrow air terminal) should be marked PN PN PN for standardization.
Quantifier headed NPs
In a noun phrase consisting of only a quantifier, the quantifier should be the head of the NP. Any modifying phrases are directly dependent on it.
Sab aa gaye (All come went) “All came”
Adjectives and adverbs will be coded in much the same way that nouns and verbs are coded. The same procedure is followed.
Adverbs and adjectives point to modifying concepts -- adjectives for nouns, adverbs for verbs. For example, in the phrase neela kitab (blue book), the adjective neela (blue) modifies kitab (book) by identifying the color of the book. In woh sundar nachchti hai (she gracefully dances is) “she dances gracefully”, the adverb sundar (gracefully) modifies the verb by specifying the manner in which the action is performed.
The degree of the modification can be specified by other modifiers, such as bahut (very) or halka (light). These degree modifiers are also adverbs.
In addition, there are two kinds of degree specification that you probably know as the comparative and superlative forms. In Hindi, the comparative is achieved by using jyada (more) and the superlative by using sabse jyada (of all more) “most”.
In order to simplify the lookup procedure in Omega, and to allow for a common interlingual representation of degree, adjectives and adverbs will be shown in their base form (called their "positive degree"). If they are in the text as comparatives or superlatives, that will be indicated as a feature of their node.
Quite often participial forms of a verb will show up in syntactic positions also occupied by adjectives. Some adjectives also have the form of participles. The present participle of a verb ends in "-ing," e.g., eating, buying; the past-participle ends in "-ed," e.g., loved, believed.
These participles and participial adjectives can show up
(a) in pre-nominal position:
bhaga huva kaid (ran happened prisoner) “escaped prisoner”
(b) post-nominal position:
suraj se murjhe huve parde (sun from fade happened curtains) “the curtains faded by the sun”
(c) copulative position:
parinam achanak they (results unexpected were) “The results were unexpected”.
Parde murjhe huve hain (curtains fade happened are) “The curtains are faded”.
The semantic distinction between participles and adjectives is that participles refer directly to the event denoted by the verb and cast the referent of the modified noun into one of the roles of that event. Adjectives, on the other hand, refer to a state that characterizes the referent of the modified noun.
It is not always easy to tell the difference. Here are some clues / tests to tell the difference:
(1) If there is no corresponding verb, it must be an adjective. E.g., unexpected, talented, down-hearted, diseased.
(2) If you can add the adverb "very" in front of the participial form, then it is probably an adjective. For this test to work, however, the adjective must be scalar or gradable. For example, the adjective blue is scalar and thus intensifiers like very can be added easily, and comparative and superlative forms exist: very blue, bluer, bluest. The adjective, triangular, however, is not gradable. The intensified and comparative forms sound funny: very triangular, the most triangular, etc.
The very smiling man. (bad, and thus a verb)
The very frost-bitten man. (good, and thus an adjective)
The very heart-breaking results. (good, and thus an adjective)
The very quail-hunting vice-president. (bad, but maybe because hunting is not gradable?)
(3) If there are dependents on the participial form (a direct object, or an agent), then it is likely that it is a verb. Thus most postnominal modifiers will be verbs, since their position almost guarantees the presence of additional dependents.
(4) If the word is not listed in an on-line dictionary like Merriam Webster as an adjective, it is likely to be a verb.
(5) When in doubt, make your best guess and discuss the issue with Owen.
Participles, which are sometimes coded as adjectives, are generally coded here as verbs. Participles are the -ing and -ed form of the verb, and are not main verbs. For example, in the sentence "The man eating the eggplant is old." the word "eating" is a present participle and modifies or specifies the "man". Similarly, in the sentence, "The man killed yesterday by police was buried today." "killed" is a past participle and again modifies or specifies "man". Since these are coded as verbs, they will also assign semantic roles.
See the manual section on copular constructions for how to handle such sentences as The book is blue.
IN PROGRESS.....
Note that Penn TreeBank did something arbitrary, but consistent, across verbs. What we have decided to do for this project is mainly for the sake of consistency, not out of any strong theoretical bias.
For now, there will be separate nodes for V and Prep. Annotators will annotate each with the correct concept, and if that concept conflates the meaning of the preposition into the verb, they should mark the preposition as "EMPTY".
See the manual section on copular constructions for how to handle such sentences as The book is in the tub.
Sentences whose main verb is a form of to be fall into several types, mainly existential, equative, and predicative. Existential use of hai asserts existence (chand par aadmi hai (moon on man is) “there is a man on the moon”), equative use of hai equates two entities (John woh doshi hai (John the culprit is) “John is the culprit”), while the predicative use asserts that the post-verbal predicate holds of the deep subject (John doshi hai (John culprit is) “John is guilty/John is a culprit”). These three constructions are treated in two different ways: existential in one way, and equative and predicative in another way. In general, the use of a definite determiner on the second argument suggests an equative use, while the lack of a definite article suggests a predicative use (Hindi does not have a non-definite determiner).
In the case of existential hai (be), the head of the sentence is the verb hai, with any prepositional phrase as an adjunct. The meaning of the existential construction is that the existence of the subject is asserted. Any PP is modifying the existence assertion.
In the case of equative and predicative hai, the predicate (Obj) of the verb hai is treated as the sentence head, with one deep syntactic Subj. The verb hai is treated as an auxiliary, and thus deleted. See verbs and auxiliaries: choosing a head. This analysis makes predicative and equative copula constructions look just like small clauses. Note that the grammatical role of the predicate reflects the role of the predicative construction in the sentence. In Mary laal hai (Mary red is) “Mary is all red”, laal (red) is the root of the sentence, while in Vaidya hote huve John aftar khoon dekhta tha (Doctor being is John often blood see was) “Being a doctor, John often saw blood”, vaidya (doctor) depends on dekhna (see) and is a MOD.
The meaning of a predicative construction is the assertion that the predicate holds of the subject. The meaning of the equative construction is that the identity of the two arguments is asserted.
If you are having trouble determining whether the use of to be is existential or predicative, use the following rules (also see Expletive subjects and there-insertion):
If the object of hai (be) is an adjective phrase, it must be a predicative use:
Gulab bahut laal hai (Rose very red is) “This rose is very red” (predicative)
If the object of hai (be) is a noun phrase, it cannot be an existential use:
Yeh phool gulab hai (This flower rose is) “This flower is a rose” (predicative)
Yeh phool woh phool hai jo main utha raha tha jab maine Pat ko pehle bar dekha (This flower that flower is that I pick-ing was when I-did Pat to first time see) “This flower is the flower that I was picking when I first saw Pat”. (equative)
If the object of hai (be) is a prepositional phrase, it cannot be an equative use (but note that you needn't distinguish equative from predicative uses).
Phool guldaan main tha (Flower vase in was) “The flower was in the vase”. (predicative)
If there is only one argument of hai (be), it is an existential construction.
Sarvaagat chetna hai (Universal consciousness is) “There is universal consciousness”. (existential)
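The rules above narrow down the possible readings of hai from the category of its object. As a rough sketch only, they can be written as a function returning the set of readings each rule leaves open; the function name copula_readings and the category labels "AdjP", "NP" and "PP" are hypothetical, and the rules are applied literally (for instance, the PP rule only excludes the equative reading).

```python
def copula_readings(object_category=None):
    """Return the readings of hai still possible for an object category.

    object_category is "AdjP", "NP" or "PP", or None when hai has
    only one argument.
    """
    if object_category is None:
        return {"existential"}                 # only one argument
    if object_category == "AdjP":
        return {"predicative"}                 # must be predicative
    if object_category == "NP":
        return {"predicative", "equative"}     # cannot be existential
    if object_category == "PP":
        return {"existential", "predicative"}  # cannot be equative
    raise ValueError("unknown object category: %r" % (object_category,))
```

Where two readings remain, the annotator still has to decide; for NP objects the choice between equative and predicative need not be made, as noted above.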
Conjunction has its own part-of-speech (Conj). The conjunction (aur (and), ya (or), lekin (but), etc) is placed as a dependent of the first conjunct with role Mod, and the second conjunct is a dependent of the conjunction with role Obj.
If a comma acts as a conjunction, it is treated as such (given part-of-speech Conj and analyzed as in the above paragraph). However, note that in "chicken, ducks, and geese", the second (last) comma does not serve as a conjunction (since there is an explicit "and"), and it is removed at IL0. The first comma does serve as a conjunction.
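As an illustration of the attachment rule for two conjuncts, the sketch below lists the resulting dependency arcs as (dependent, head, role) triples. The function name and the triple representation are assumptions made for this example. The phrase seb aur santra (apple and orange) is built from words used elsewhere in these guidelines.

```python
def attach_conjunction(first_conjunct, conjunction, second_conjunct):
    """Return (dependent, head, role) triples for a two-way coordination.

    Hypothetical helper: the conjunction hangs off the first conjunct
    as Mod, and the second conjunct hangs off the conjunction as Obj.
    """
    return [
        (conjunction, first_conjunct, "Mod"),
        (second_conjunct, conjunction, "Obj"),
    ]


# seb aur santra (apple and orange)
arcs = attach_conjunction("seb", "aur", "santra")
for dependent, head, role in arcs:
    print(f"{dependent} --{role}--> {head}")
```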
To verbs
This section discusses cases in which the annotator must add an empty node to the tree. An empty node is a node which does not correspond to a word (or other graphical manifestation such as a punctuation mark) in the input string.
New empty nodes are created using the "new" option under "Node" in the TrEd tree editing tool. The new node should have its POS feature set to N (most cases) or V (for VP ellipsis). Give the new node a wpos feature so that it ends up in a position that roughly corresponds to its grammatical function (i.e., if it is a subject, to the left of its governing verb, and so on). When the fs files come out of the parser, the nodes have wpos features in increments of 10, so there are enough unused positions to place new nodes where they belong. Never reuse an already used position.
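One way to pick a wpos value for a new node is to take a free position between the word it should precede and the nearest used position to its left. The sketch below assumes that wpos values are positive integers. The function name and the midpoint strategy are illustrative choices, not something the tool prescribes.

```python
def free_wpos_before(used_positions, target):
    """Return an unused integer position just left of `target`.

    used_positions: set of wpos values already present in the tree.
    target: wpos of the word the new node should precede.
    Hypothetical helper; assumes integer wpos values greater than 0.
    """
    # Nearest used position to the left of the target (0 if none).
    lower = max((p for p in used_positions if p < target), default=0)
    candidate = (lower + target) // 2
    if candidate <= lower:
        raise ValueError(f"no free position between {lower} and {target}")
    return candidate


used = {10, 20, 30}
new_pos = free_wpos_before(used, 20)  # place an empty subject before the word at 20
used.add(new_pos)                     # record it so it is never reused
```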
For the lex feature, first identify which node this node is coreferential with. This is usually straightforward. Then copy the coreferential node's word and lexeme values to the empty node, but add angle brackets around each value, for example "<Dominic>". Alternatively, use lex feature "<pro>" if it is hard to identify the correct coreferential node.
Sometimes, it is not clear what the reference of the empty node is ("arbitrary pro"). Arbitrary empty subjects can usually be found in adjunct clauses. For example, in Jicama khaana bachchaon ke mashtisk vikas ke liye achcha hai (Jicama eating children 's brain growth for good is) “Eating jicama is good for children's brain growth”, the subject of "khaana (eating)" is not specified. In these cases, we label both the lexeme and the word feature of the new node "<pro>". In case of doubt ("<pro>" or "<child>"), ask yourself: can I tell from syntax alone what this node means? If not, use "<pro>". If so, fill in the lexeme. In the example sentence Eating jicama is good for children's brain growth, the syntax does not determine the identity of the missing subject (compare Buying Microsoft products is good for Bill Gates's wealth growth).
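The bracketing convention for the word and lexeme values of an empty node can be sketched as follows. The function name and the dictionary representation are assumptions made for this example. The feature names lex and Root follow the feature list given earlier in these guidelines.

```python
def empty_node_values(antecedent_lex=None, antecedent_root=None):
    """Return the lex and Root values for a new empty node.

    Hypothetical helper: if the coreferential node is known, copy its
    word and lexeme in angle brackets; otherwise fall back to "<pro>".
    """
    if antecedent_lex is None or antecedent_root is None:
        # Arbitrary or unidentifiable reference.
        return {"lex": "<pro>", "Root": "<pro>"}
    return {"lex": f"<{antecedent_lex}>", "Root": f"<{antecedent_root}>"}


known = empty_node_values("Dominic", "Dominic")
arbitrary = empty_node_values()
```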
We now list cases of missing nodes.
Missing arguments of a predicate should appear as empty nodes. The basic idea is this: replace all missing deep-syntactic arguments of each predicate (verb or predicative noun, adjective, or preposition) with empty nodes.
Missing adjuncts will not be replaced by empty nodes. In some cases, deciding whether the missing constituent is just an adjunct or a seemingly optional argument is quite difficult. Consistency is the important thing in such a situation. See the discussion of arguments and adjuncts.
Do not replace missing semantic arguments which are not syntactic arguments. For example, a non-agentive (ergative) construction (khidkee khulee (window opened) “the window opened”) and other verbs with missing participants should not have their missing semantic arguments included as an empty NP. Only missing syntactic arguments should be added as empty nodes.
Also, in cases of verbs that can take an optional direct object, do not fill in missing direct objects. For example, in John ne kha liya (John did eat take) “John ate”, do not add an empty node as object just because John ate an apple is also possible. We consider the intransitive use a syntactic option, not an elliptical construction (after all, eat does not mean the same as eat something).
There are several standard cases of missing arguments. We list them here, but the list is not necessarily exhaustive.
Note that in all of these cases, in addition to labeling the surface grammatical role, you must also label the deep grammatical role, which as usual may or may not be the same as the surface role. The principal reason this may not be the case would be passive voice in the lower verb.
Gerundive, Infinitive and Participial VPs (Bauchhaara mein gaate huvey Sadao phisala (Shower in singing happening Sadao slipped) “While singing in the shower, Sadao slipped”): put in an empty surface subject NP if no subject is present. See non-finite clauses for details.
Subject and object control verbs (Gustav jaanaa chaahtaa thaa (Gustav to-leave wanted did) “Gustav wanted to leave”, John ne Ahmed ko jaane ko kahaa (John did Ahmed to go to said) “John told Ahmed to leave”): These constructions require a missing category to be included as a dependent of the embedded verb, which is the surface subject of the lower verbal head. See Control for more details.
Imperatives (Leave!): add an empty surface subject NP if one is not present. For details, see Imperatives.
Relative clauses: when no relative pronoun is overt, include one. If the clause is a reduced relative (topeka mein dekha gaya aadmi (Topeka in seen went man) “the individual seen in Topeka”), you need to add an empty node on the main verb of the relative clause for the relativized noun, and one for the missing deep subject.
Conjunction reduction. When a conjunction results in the disappearance of an argument (John ko seb achha lagta hai aur santra burra (John to apple good feel is and orange bad) “John likes apples and hates oranges”, santra Elissa ko achchi lagti hai aur Margaret ko burra (orange Elissa to good feel is and Margaret to bad) “Elissa likes and Margaret hates oranges”), the missing argument must be added as an empty node.
VP ellipsis is the term for cases in which the main verb is replaced by an auxiliary:
John likes beans, and so does Mary
Henry thought he could jump over that wall, but Jules knew he couldn't
VP ellipsis requires an empty verbal head; the auxiliary is deleted in the usual manner and replaced as needed by features. In addition, add all missing arguments (but not any adjuncts!), as described above. The lexeme and word of the empty head should be filled in from the antecedent between brackets, e.g. "<play>" for Mary plays with cats and so does Tony.
Gapping. In gapping, a verb is deleted in a conjunction (Francis ne ek shaphataalu khaya aur Elise ne ek khubaan (Francis did one peach eat and Elise did one apricot) “Francis ate a peach, and Elise, an apricot”). In the second conjunct, the verb must be restored as an empty node.
Quantified noun phrases without an overt nominal head include cases such as:
John ne do kutte khareede aur Mary ne teen (John did two dogs buy and Mary did three) “John bought two dogs, and Mary bought three”
Here, put in an empty noun head (in this case, for kutta (dog)).
Conjunctions: In lists of conjoined phrases where there is only one conjunction but more than two conjuncts (e.g. Tom, Dick, and Harry), a comma separating two conjuncts in lieu of a conjunction can be analyzed as the missing conjunction.
Remove all punctuation except meaningful punctuation. Examples of punctuation to keep:
Quotes -- leave them (open and closed) attached to the constituent which is quoted. If the quoted passage is not a constituent, quote each piece separately.
Commas that act as conjunctions (see Conjunction).
Do remove:
All non-conjunction commas.
All sentence-final punctuation.
All dashes and so on.
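The punctuation rules above can be sketched as a simple filter over the nodes of a sentence. This is a minimal illustration under stated assumptions: nodes are represented as dictionaries with lex and POS features, commas acting as conjunctions have already been given POS Conj (so they are not touched), and the QUOTE_MARKS set is a hypothetical list of quote characters, not one defined by the tool.

```python
# Hypothetical set of quote characters that count as meaningful punctuation.
QUOTE_MARKS = {'"', "“", "”"}


def keep_node(node):
    """Return True if the node survives punctuation removal."""
    if node["POS"] != "Pun":
        # Words, and commas already tagged Conj, are always kept.
        return True
    # Among punctuation marks, only quotes are kept.
    return node["lex"] in QUOTE_MARKS


# "chicken, ducks, and geese." with the first comma tagged as Conj.
sentence = [
    {"lex": "chicken", "POS": "N"},
    {"lex": ",", "POS": "Conj"},
    {"lex": "ducks", "POS": "N"},
    {"lex": ",", "POS": "Pun"},
    {"lex": "and", "POS": "Conj"},
    {"lex": "geese", "POS": "N"},
    {"lex": ".", "POS": "Pun"},
]
kept = [node["lex"] for node in sentence if keep_node(node)]
```

In this example the first comma is kept because it serves as a conjunction, while the second comma and the sentence-final period are removed.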