
Exploring Chinese Type Coercion: A Web-as-Corpus Study

Shu-Yen Lin National Taiwan Normal University [email protected]

Shu-Kai Hsieh National Taiwan Normal University [email protected]

Yann-Jong Huang National Taiwan Normal University [email protected]

Abstract

This study explores type coercion in Chinese, a phenomenon whose existence in the language has been disputed by some linguists. Their discussion is based on Chinese translations of the English sentences discussed in the coercion literature. We point out the inappropriateness of this approach by showing that it takes account of neither the lexical semantics of the target language nor real language use. To show that type coercion is pervasive in Chinese, we adopt a corpus-based approach (web as corpus) and focus on one of the generative mechanisms proposed in Pustejovsky (1995), namely true complement coercion. A hand-crafted lexico-syntactic template is used to extract coercion data along with their non-coercive counterparts from the Web. Our preliminary results support the hypothesis that true complement coercion is a universal linguistic mechanism. Our data-extraction algorithm is also likely to be useful in automatically discovering the telic and agentive roles of nouns as well as in establishing a gold standard corpus for contrasting the coercive vs. non-coercive contexts of these nouns. In exploring Chinese type coercion, we also find instances that may be unique to Chinese, hence providing language-specific support for the generative mechanism of language.

1 Introduction

Natural language use often proceeds in an underspecified fashion, leaving many meaning facets unexpressed in the surface form. The inferential process of understanding an utterance like John began the book requires that the language user interpret the book as part of an event in which John was involved. This utterance is commonly interpreted as John began to read/reading the book or John began to write/writing the book. The reasoning process underlying the resolution of linguistic underspecification or semantic incongruity, according to the theory of the generative lexicon (Pustejovsky, 1995), involves many different generative factors that account for the way language users create word senses in novel contexts. One of the most important generative mechanisms is type coercion. In this example, the verb coerces its argument from an entity type (i.e. a book) to an event type (i.e. read/write a book). If lexical semantic knowledge is represented in the same fashion across languages, we would expect the generative mechanisms to apply universally to natural languages. In this paper, we set out to explore this hypothesis in the case of Mandarin Chinese.

To our knowledge, no systematic study of Chinese coercion based on extensive empirical data has so far been done within the framework proposed by Pustejovsky (1995). The only observation of Chinese type coercion we know of is in Huang and Ahrens (2003), which suggests that certain Chinese classifiers can coerce event readings from nouns that denote individuals. Although the generative mechanisms discussed in the theory of the Generative Lexicon (GL) have received full attention in Liu (2003), Lin and Liu (2005), and Huang (2009) (note i), their thesis is that, Chinese being an analytic language, the Chinese lexicon does ‘not share the same degree of richness’ as a language like English ‘in sub-lexical event information’ (Lin and Liu, 2005:9). In this study we aim to show, contrary to the above-quoted argument, how coercion is accomplished in Chinese. Our claim is based on empirical data extracted from the Web.

In section 2, we show why the approach adopted by the linguists who argue against the universality of coercive mechanisms is inadequate, with the emphasis on the coercion of causative subjects. In section 3, we provide data extracted from the Web to support our claim that Chinese, like English, also relies on the generative mechanisms, in particular type coercion, to resolve semantic underspecification. In order to collect coercion data in a systematic manner, we focus on a specific type of coercion generated by specific selectors: true complement coercion generated by control verbs. First, we hand-crafted a lexico-syntactic template with which we googled the Web. This was followed by manual filtering to pick out the coercion instances. It is noteworthy that our algorithm for extracting the data is likely to be useful in automatically discovering the telic and agentive roles of nouns as well as in establishing a gold standard corpus for contrasting the coercive vs. non-coercive contexts of the control verbs. In section 4, we bring up factors which are likely to involve language-specific generative mechanisms. These include, in particular, the classifiers in Chinese.

2 Is there no coercion in Chinese?

It is well known that Chinese has a markedly different morpho-syntactic marking system from most European languages, but the question remains whether Chinese also behaves distinctively in terms of lexical semantics. According to Liu (2003), Lin and Liu (2005), and Huang (2009), the answer is yes. Lin and Liu (2005:9) argue that ‘in English the primitives that carry event information are extensively incorporated into individual lexical forms, but in Mandarin Chinese they are not; instead, they are sent directly to syntactic computation’ (note ii). This claim is based on what appears to them to be the failure of coercion in Chinese, as represented by Chinese translations of the English coercion sentences in the literature. For example, they find it unacceptable to say Zhangsan kaishi yi-ben shu (the word-for-word translation of John began a book). Instead, one must say Zhangsan kaishi du yi-ben shu ‘John began to read a book’. Interestingly, of the many English sentences in the literature exemplifying true complement coercion, this is the only one translated into Chinese in Lin and Liu (2005). We will show in more detail in the next section that true complement coercion is common in Chinese. In this section, we focus on the possible underlying reasons why Lin and Liu (2005) argue that the coercion of causative subjects by and large does not work in Chinese.

2.1 Subject coercion in Chinese: A case study

To refute the coercion of causative subjects in Chinese, it is argued that the verb sha, the literal translation of kill, does not allow for a broad range of subject types beyond the agent of the action. For example, it is argued that Zhe-ba qiang sha-le Lisi ‘This-CLASSIFIER gun kill-ed Bill’ is ill-formed (Liu, 2003; Lin and Liu, 2005). At least two questions can be raised regarding this approach.

To begin with, it is doubtful whether a discussion of lexical semantics can be founded on the acceptability of sentences translated from another language. Such an approach presumes that the semantic representation of kill is the same as that of sha, which is highly doubtful. There is at least one obvious difference between kill and sha: the verb sha implies, but does not necessarily entail, that the victim is dead. This can be illustrated by a fixed Chinese expression equivalent to attempted murder:

sha          ren     wei  sui
try to kill  person  not  succeed

To make it clear that the victim died, one says, for example, sha-si, with the second morpheme denoting ‘die’.

In order to discover the differences between the lexical semantic representations of sha and kill, we surveyed the subject types of sha-le ‘killed’ vs. its synonym duo-zou someone’s sheng-ming ‘take away by force someone’s life’, as well as of killed vs. took the life of. We googled the Web with the queries “sha-le” and “duo-zou * sheng-ming” and manually tagged the subjects in the first 100 snippets. For the query “sha-le”, 18 search results were either impossible to tag or irrelevant to our survey. The majority of the subjects of sha-le (77 of the 82, i.e. 94%) are animate agents of the action; only 5 subjects (6%) are not. The distribution of the subject types of “duo-zou * sheng-ming” is the reverse. The majority of its subjects (74 of the 83 tagged ones, i.e. 89%) are inanimate. These include words that denote disease, accident, catastrophe, etc. There are only 9 agentive subjects (11%).

We also examined the distribution of the subject types of killed and took the life of. In contrast to “duo-zou * sheng-ming”, the preference for inanimate subjects is not so strong with ‘took the life of’: only 67 of the 93 tagged subjects (72%) are inanimate, and 26 (28%) are animate agents. The tendency of killed to take agentive subjects is also weaker than that of sha-le. Of the 59 tagged subjects of killed, 16 (27%) are inanimate. (Many of the first 100 search results for the query “killed” are in participle form, mostly in passive constructions without the by-adjunct.)

These tendencies to take more animate subjects than inanimate ones, or vice versa, are likely to underlie intuitive judgments of the semantic well-formedness of a sentence, especially when it is presented in isolation. Though the Chinese verb sha does take inanimate subjects, their frequency relative to animate subjects is quite low. The overwhelming impression that sha only takes animate subjects is reinforced by the competing synonym duo-zou someone’s sheng-ming ‘take away by force someone’s life’, which has a strong tendency to collocate with inanimate subjects. The relation between killed and took the life of is different. First of all, killed has a much weaker tendency to take animate subjects than sha-le. The higher frequency of inanimate subjects with kill and the weaker competition from took the life of for inanimate subjects make the semanticality judgment of a sentence such as The gun killed Mary less dubious than that of its Chinese equivalent. A translation-based approach presumes that the paradigmatic relations of the equivalent verbs (e.g. sha and kill) to other lexical items in their own languages are the same, but we have shown that this presumption is a risky one.

Let us now return to the seemingly unacceptable Zhe-ba qiang sha-le Lisi ‘This-CLASSIFIER gun kill-ed Bill’.
Notice that sha is the root morpheme of verbs like qiang-sha ‘gun-kill’, she-sha ‘shoot-kill’, mou-sha ‘conspire-kill’ (‘murder’), ci-sha ‘stab-kill’ (‘assassinate’), pu-sha ‘assault-kill’ (‘wipe out’), zi-sha ‘self-kill’ (‘suicide’), lei-sha ‘strangle-kill’, tu-sha ‘slaughter-kill’ (‘massacre’), etc. The senses of these words, both Chinese and English, all incorporate the sense of sha or kill. Our speculation is that when a concept has been lexicalized, decomposing the lexical item and placing the component parts in separate syntactic chunks renders the sentence odd unless it occurs in particular contexts with special discursive purposes. This predicts a low frequency of ‘to kill someone with a/the gun’ and ‘a/the gun kill someone’ relative to ‘to shoot someone’ and ‘someone being shot’.

Query                                          Frequency
“qiang-sha le” ‘gun-kill-ed’, i.e. ‘shot’      167,000
“ba qiang sha-le” ‘CLASSIFIER gun kill-ed’     498
“was shot”                                     24,100,000
“were shot”                                    2,440,000
“a gun killed”                                 895
“the gun killed”                               374
“killed * with a gun”                          457,000
“killed * with the gun”                        897

Table 1. Frequency differences between lexicalized words and syntactic chunks

The web search results presented in Table 1 confirm our prediction. The sharp contrast between the frequencies of the lexicalized units and the syntactic chunks supports our view. Although hits for “ba qiang sha-le” are also counted among the hits for “qiang-sha le”, their frequency is so low that the overlap can be ignored. We manually examined the first 100 snippets of “ba qiang sha-le” and found 75 instances in which the gun is the object of verbs such as ‘take’, ‘use’, ‘buy’, etc. in phrases like na ba qiang sha-le ta ‘take a gun and kill him’. There are 19 instances where qiang ‘gun’ is the subject of sha. The web-as-corpus approach shows, on the one hand, that sha can indeed take qiang ‘gun’ as its subject, and, on the other hand, why such causative subject coercion seems ill-formed to some native speakers.

We have attempted to account for the factors underlying the seemingly unacceptable sentence Zhe-ba qiang sha-le Lisi ‘This-CLASSIFIER gun kill-ed Bill’. In the next section, we show in more detail that type coercion is pervasive in Chinese. Specifically, we focus on true complement coercion by control verbs for the purpose of discovering empirical evidence in a systematic fashion.
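As a rough illustration, the contrast reported in Table 1 can be expressed as a ratio of lexicalized hits to syntactic-chunk hits in each language. The sketch below only re-uses the counts from Table 1. The grouping of queries into the two sets follows the prediction stated above but is our reading of it, and the names TABLE_1 and lexicalized_to_chunk_ratio are hypothetical.

```python
# Hit counts copied from Table 1 (no new data).
TABLE_1 = {
    "qiang-sha le": 167_000,
    "ba qiang sha-le": 498,
    "was shot": 24_100_000,
    "were shot": 2_440_000,
    "a gun killed": 895,
    "the gun killed": 374,
    "killed * with a gun": 457_000,
    "killed * with the gun": 897,
}


def lexicalized_to_chunk_ratio(lexicalized, chunks):
    """Ratio of summed hits for lexicalized queries to summed hits for chunk queries."""
    lexicalized_hits = sum(TABLE_1[q] for q in lexicalized)
    chunk_hits = sum(TABLE_1[q] for q in chunks)
    return lexicalized_hits / chunk_hits


# Assumed grouping: which queries count as lexicalized vs. decomposed.
chinese_ratio = lexicalized_to_chunk_ratio(
    ["qiang-sha le"], ["ba qiang sha-le"]
)
english_ratio = lexicalized_to_chunk_ratio(
    ["was shot", "were shot"],
    ["a gun killed", "the gun killed",
     "killed * with a gun", "killed * with the gun"],
)

print(f"Chinese lexicalized/chunk ratio: {chinese_ratio:.1f}")
print(f"English lexicalized/chunk ratio: {english_ratio:.1f}")
```

Under this grouping the lexicalized forms outnumber the decomposed chunks in both languages. Raw search-engine hit counts are only approximate, so the ratios should be read as indicative rather than exact.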

3 True complement coercion in Chinese

To explain the systematic variation of meaning behind examples like John wants another cigarette, Jane wants a beer, and Lisa wants a job, GL has suggested a theory of composition that takes these shifts in meaning to be a matter of shifts in semantic type, called type coercion (Pustejovsky, 1995; Asher and Pustejovsky, 2000). The direct object of want undergoes a type-shifting operation by virtue of lexical governance from the verb and assumes an event reading, resulting in the interpretations ‘to smoke a cigarette’, ‘to drink a beer’, and ‘to have a job’. Hence, GL is able to account for the creative use of words like want in different contexts and to conflate its different word senses into a single meta-entry, greatly reducing the size of the lexicon. This section describes how we study Chinese type coercion, focusing on instances like the English examples cited above. Syntactically, such verbs are called control verbs, or subject control verbs to be more specific, where the subject of the main clause is the underlying subject of the embedded non-finite clause.

3.1 Using a lexico-syntactic template to discover coercion data

The seminal work of Hearst (1992) has inspired a considerable body of research on matching specific lexico-syntactic patterns in corpora to identify instances of a relation of interest (Charniak and Berland, 1999; Iwanska et al., 2000; Widdows and Dorow, 2002; Snow et al., 2006; Cimiano and Wenderoth, 2005, 2007; Yamada et al., 2007). Recently, there has been increasing interest in using the Web as a big corpus in which to search for such patterns (Markert et al., 2003; Etzioni et al., 2005; Cimiano and Wenderoth, 2007). Our work is inspired by both approaches but differs in the design and use of the lexico-syntactic template.

The work by Yamada et al. (2007) and Cimiano and Wenderoth (2007) is most relevant to our study. Both aim to automatically acquire the qualia roles of nouns from corpus data. Cimiano and Wenderoth (2007) use patterns like “purpose of (a|an) x is (to)”, “(a|an) x is used to”, and “to * a|an new|complete x” to match candidates for the telic and agentive roles. The templates used by Yamada et al. (2007) can be exemplified by a book worth/deserving reading, a well-read book, (I enjoy) reading books, a book to read, and the book was written. These templates are well designed and yield good experimental results, but we aim to go a step further and extract the coercion instances along with their non-coercive counterparts while discovering the qualia roles at the same time.

Our algorithm employs the notion of “lexical conceptual paradigm” (LCP) proposed by Pustejovsky et al. (1993), which relates a set of syntactic behaviors to the lexical semantic structures of the participating items. For example, the container LCP relates the following set of generalized syntactic patterns (p. 342):

Vi Nj {to, from, on} Nk   (e.g. read information from tape)
Vi Nj                     (e.g. read information)
Vi Nk                     (e.g. read tape)

This LCP includes a nominal alternation between the container (i.e. tape) and the containee (i.e. information) and the verb that predicates the telic role (i.e. read) of the noun. In this study, we utilize the control verb LCP. With this LCP, we can search for the coercion instances in addition to the qualia roles.

Vi to Vj Nk   (e.g. want to drink a beer)
Vi Nk         (e.g. want a beer)

In English, this LCP only pertains to subject control verbs, but in Chinese it also pertains to object control verbs, since the object in the main clause can be elided for some object control verbs. For example, we found the following alternation in our data:

jinzhi  tonggong
forbid  child worker

jinzhi  guyong  tonggong
forbid  employ  child worker

Omitting the object in the main clause (i.e. the employer) is grammatical as long as the object is inferable from the context. Also, notice that Chinese control verbs can be followed by a verb directly, without any intervening element such as the infinitival to in English.

3.2 Discovering Chinese true complement coercion on the Web

Another feature of our template is that it contains no specific lexical items. Our first step was hence to collect control verbs in Chinese. Then we googled the Web with these verbs and randomly examined their objects. If an object appeared to be an instance of type coercion, the verb-object pair was put into the template “V * O”. The * enables the search engine to find anything (or sometimes nothing) between the verb and the noun. The intuition is that when a verb occurs between the control verb and the noun, it is a good candidate for the telic or agentive role of the noun. The web search results obtained with such templates were manually checked.

With the verb zhizai ‘aim to’, for example, we found at least two VO pairs that are instances of true complement coercion: zhizai daxui ‘aim (at) college’ (816 results retrieved) and zhizai guanjun ‘aim (at) championship’ (613 results retrieved). We then googled the Web with the templates “zhizai * daxui” and “zhizai * guanjun” and found the verbs shang ‘enter’ and weimian ‘win’ between the selector and the nouns respectively. There were 3 occurrences of zhizai shang daxui ‘aim to enter college’ and 42 occurrences of zhizai weimian guanjun ‘aim to win the championship’. The control verb LCP can thus be employed to discover not only the coercion data, e.g. zhizai daxui ‘aim (at) college’, but also the telic or agentive role of the noun involved, e.g. shang ‘enter’ for daxui ‘college’.

Since the background of our study is the strong claim that there is no type coercion in Chinese, manual checking and an algorithm with high precision but low recall fit our research purpose well. Appendix A presents a fragment of our findings. Clearly, these frequencies do not represent the real frequencies of the searched patterns, because some of the collocations may result from other patterns. But at least one sentence of each pattern was attested by two native speakers and then saved in our database. An example pair is:

高三這年是志在大學的高中生最需把握的一年
志在大學 aim (at) college
‘The third year in senior high school is the most crucial year for the students who aim at college to seize.’

原來他志在上大學
志在上大學 aim (to) enter college
‘It turned out that he aimed to enter college.’
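The template-filling step described above can be sketched as follows. This is a minimal illustration, not the scripts actually used in the study: the function names build_queries and contrast_entry are hypothetical, no search engine is called, and the only figures used are the hit counts for zhizai reported in the text.

```python
def build_queries(control_verb, noun):
    """Return the exact-phrase coercive query and the "V * O" wildcard template."""
    coercive_query = f'"{control_verb} {noun}"'
    template_query = f'"{control_verb} * {noun}"'
    return coercive_query, template_query


def contrast_entry(v1, noun, v1n_hits, v2, v1v2n_hits):
    """Pair a coercive V1 N phrase with its non-coercive V1 V2 N counterpart."""
    return {
        "coercive": f"{v1} {noun}",
        "non_coercive": f"{v1} {v2} {noun}",
        "role_candidate": v2,  # candidate telic/agentive role of the noun
        "hits": {"V1N": v1n_hits, "V1V2N": v1v2n_hits},
    }


# Hit counts as reported in the text for the verb zhizai 'aim to'.
REPORTED = [
    ("zhizai", "daxui", 816, "shang", 3),
    ("zhizai", "guanjun", 613, "weimian", 42),
]

entries = [contrast_entry(*row) for row in REPORTED]

for v1, noun, *_ in REPORTED:
    print(build_queries(v1, noun))
for entry in entries:
    print(entry)
```

Storing the coercive phrase next to its non-coercive counterpart in a single record mirrors the parallel presentation of the data in Appendix A.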

We have thus built a small database which can serve as a gold standard corpus for the study of Chinese true complement coercion. Most of the control verbs in our database, e.g. zhizai ‘aim to’, xihuan ‘like to’, changshi ‘try to’, jixiu ‘continue to’, etc., have underspecified meanings, a typical property of coercive selectors. It is striking that these high-frequency verbs have eluded the attention of the linguists who dispute the pervasiveness of type coercion in Chinese. Let us use an example to illustrate our point empirically. In Lin and Liu (2005:17), it is argued that nominals in Chinese that are inherently event-denoting can compose with aspectual verbs like kaishi ‘begin’, and that because nominals like shu ‘book’ are not inherently event-denoting, they fail to be coerced by kaishi. Nevertheless, the authors also note that the verbs enjoy and want are aspectual verbs. What they do not notice is that expressions like xiangshou kafei ‘enjoy coffee’ and xiangshou yangguang ‘enjoy sunlight’ (53,400 and 635,000 results retrieved from Google) are fully acceptable although kafei ‘coffee’ and yangguang ‘sunlight’ are not event-denoting expressions.

Nonetheless, all the native speakers we have consulted do find the sentence Zhangsan kaishi yi-ben shu ‘John began a book’ unacceptable. The web search results also indicate its semantic ill-formedness. This specific example, however, is not sufficient to rule out type coercion in Chinese. What it indicates, we suggest, is probably the unique lexicalization of the noun shu or the verb kaishi, which requires more extensive research to understand.

Our search results show that type coercion should be studied in real language use, i.e. in the contexts in which coercive expressions occur. The following example further illustrates our point. The phrase changshi xiju ‘try comedy’ is a frequently occurring expression (27,300 times) on the Web. Without context, however, it is hardly possible to know the intended action. As described above, the query “changshi * xiju” was then sent to the web search engine. The search results show that one may try ‘to act (in)’, ‘to film’, ‘to write’, or ‘to watch’ a comedy, and one can certainly think of other verbs that fit into the slot. Clearly, there is contextual variability at play. When an actress says she wants to changshi xiju ‘try comedy’, the most likely meaning is to act in a comedy. If it is uttered by a director, the intended action is probably to film a comedy.

Research in this direction will most likely rely on a contrastive database like ours, in which the coercive and the non-coercive counterparts are presented in parallel. A lexico-syntactic template that only aims to acquire the qualia roles will miss such contexts and will not be capable of providing clues as to which contextual factors select which qualia roles. In this respect our algorithm should have an advantage over simply ranking the various qualia roles (Cimiano and Wenderoth, 2007; Yamada et al., 2007), which relies only on statistical correlation and is not sensitive to context.

3.3 Automatically discovering coercion data from tagged corpora or the Web

The algorithm we employed can readily be adapted to the automatic discovery of coercion data and qualia roles from tagged (and parsed) corpora or the Web. This section outlines a likely procedure. The first step is again to collect control verbs. The second step is to find all the objects of these verbs. This is an easy step with a parsed corpus. With the Web, one can use the Google API to download a certain number of snippets of the web search results using the verb as the query. The snippets are then POS-tagged (and parsed if the accuracy is high). With a tagged but not parsed corpus, the first nominal element following the verb (excluding determiners, proper names, and pronouns) is extracted. The verb-noun pair is put into the template “V * N”, which is then sent to the web search engine or the text corpus, and the abstracts of the first n documents matching this query are downloaded or extracted. Note that the matched sequence must not span phrasal boundaries. If a verb and nothing else occurs between the control verb and the noun, it is a candidate for the telic or agentive role of the noun, and the verb-noun pair in the search template is likely to be an instance of coercion, while the verb-verb-noun sequence may be its non-coercive counterpart. Human annotation may follow to filter the data, or machine learning techniques may be employed to speed up the process. Which approach yields better results requires further evaluation.
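The matching step of this procedure might look like the sketch below. It is a sketch under stated assumptions, not a tested pipeline: snippets are assumed to be already tokenized and POS-tagged as (word, tag) pairs, the tag names "V", "N" and "PUNCT" are hypothetical, punctuation stands in for phrasal boundaries, and the toy snippets are invented for illustration rather than taken from the Web data.

```python
from collections import Counter


def classify_snippet(tokens, control_verb, noun):
    """Classify one tagged snippet for a given control verb and noun.

    Returns ("coercive", None) for V N, ("non_coercive", verb) for V V N,
    and None when neither pattern is matched.
    """
    for i, (word, _tag) in enumerate(tokens):
        if word != control_verb:
            continue
        for j in range(i + 1, len(tokens)):
            candidate, tag = tokens[j]
            if tag == "PUNCT":
                break  # crude stand-in for "must not span phrasal boundaries"
            if candidate == noun:
                between = tokens[i + 1:j]
                if not between:
                    return ("coercive", None)
                if len(between) == 1 and between[0][1] == "V":
                    return ("non_coercive", between[0][0])
                break  # other material intervenes: not a clean match
    return None


def collect(snippets, control_verb, noun):
    """Count coercive matches and tally intervening verbs as role candidates."""
    coercive_count = 0
    role_candidates = Counter()
    for tokens in snippets:
        result = classify_snippet(tokens, control_verb, noun)
        if result is None:
            continue
        kind, verb = result
        if kind == "coercive":
            coercive_count += 1
        else:
            role_candidates[verb] += 1
    return coercive_count, role_candidates


# Invented toy snippets (hypothetical tags), for illustration only.
SNIPPETS = [
    [("ta", "PRON"), ("changshi", "V"), ("xiju", "N")],
    [("ta", "PRON"), ("changshi", "V"), ("yan", "V"), ("xiju", "N")],
    [("daoyan", "N"), ("changshi", "V"), ("pai", "V"), ("xiju", "N")],
    [("ta", "PRON"), ("changshi", "V"), ("le", "PART"), ("henduo", "ADJ"), ("xiju", "N")],
    [("ta", "PRON"), ("changshi", "V"), (",", "PUNCT"), ("xiju", "N")],
]

coercive_count, role_candidates = collect(SNIPPETS, "changshi", "xiju")
print(coercive_count, role_candidates)
```

Requiring exactly one intervening verb keeps precision high at the cost of recall, in line with the preference stated in section 3.2. A parsed corpus would allow a more principled boundary check than the punctuation heuristic used here.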

4 Language-specific evidence for GL

In this section we show that Chinese provides language-specific evidence for type coercion. The first piece of evidence is associated with its nominal classifiers, which, according to the traditional view, only classify individuals. Huang and Ahrens (2003) demonstrate that certain classifiers can coerce event readings from nouns that are typically interpreted as individuals, and that the event classifiers can be divided into event-type classifiers and event-token classifiers. For example, the event-type classifier chu selects nouns containing the stem xi ‘play’ or ju ‘drama’, as in (1). In contrast, the event-token classifier chang selects a scheduled event, as in (2).

(1) Bailaohui jin nian zhi yanle yi chu gewuju.
    Broadway this year only played a CL musical
    ‘Only one musical (e.g. Cats) was performed on Broadway this year.’

(2) Bailaohui jin nian yanle yibai chang gewuju.
    Broadway this year played 100 CL musical
    ‘Broadway had one hundred performances of musicals this year.’

Example (1) clearly shows that chu individuates event types, such that no inference can be made regarding how many times the musical was performed, while (2) shows that when an event-token classifier is used, the same noun refers to occurrences of the event, and the sentence provides no information about whether one or more musicals were shown. In studying the type coercion generated by control verbs, we found the following interesting contrast.

(3) yingle na chang qiou
    won that CL ball
    ‘won the ballgame’

(4) jixiu zhe chang qiou
    continue this CL ball
    ‘continue the ballgame’ (i.e. ‘continue playing the ballgame’)

(5) * jixiu zhe ge qiou
    continue this CL ball
    (This phrase does not make sense.)

In (3)-(5), the noun qiou is the Chinese word for ‘ball’, a typical physical entity. In (3) and (4), it is modified by the event-token classifier chang, and the NP zhe chang qiou is interpreted as ‘this ballgame’. The classifier ge in (5) is a typical classifier for physical individuals, and zhe ge qiou can only be interpreted as ‘this ball’. On the one hand, the event-token reading of zhe chang qiou ‘this ballgame’ provides support for Huang and Ahrens (2003); on the other hand, the contrast between the acceptability of (4) and the unacceptability of (5) indicates that classifiers have the ability to block type coercion. It seems that, at least in these specific examples, it is the classifier that determines the semantic type of the noun qiou ‘ball’, and once this is done, the noun cannot be coerced further. In other words, with regard to type coercion, the classifier system takes precedence over lexical verbs such as control verbs. Our data suggest that type coercion may involve competition between different selectors and that one type of selector always wins out over the other.
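One possible way to make the precedence of classifiers over control verbs concrete is the toy model below. It is our own simplification rather than a formalization proposed in the GL literature: the type labels, the CLASSIFIER_TYPE and TELIC tables, and the compose function are all hypothetical, and only the judgments for (4), (5) and xiangshou kafei come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed semantic type imposed by each classifier (after examples (1)-(5)).
CLASSIFIER_TYPE = {"chang": "event", "chu": "event", "ge": "entity"}

# Assumed telic roles; he 'drink' for kafei follows Appendix A.
TELIC = {"kafei": "he"}


@dataclass
class NounPhrase:
    noun: str
    classifier: Optional[str] = None


def compose(control_verb, np):
    """Compose an event-selecting verb with an NP; return a reading or None."""
    if np.classifier is not None:
        # The classifier fixes the type first; the verb cannot coerce further.
        if CLASSIFIER_TYPE[np.classifier] == "event":
            return f"{control_verb} [{np.classifier} {np.noun}]:event"
        return None  # entity type fixed by the classifier: coercion blocked
    role = TELIC.get(np.noun)
    if role is None:
        return None  # no qualia role available to supply an event reading
    return f"{control_verb} [{role} {np.noun}]:event"  # coerced reading


print(compose("jixiu", NounPhrase("qiou", "chang")))  # example (4)
print(compose("jixiu", NounPhrase("qiou", "ge")))     # example (5)
print(compose("xiangshou", NounPhrase("kafei")))      # bare noun, coerced
```

The model makes no claim about cases the text does not discuss, such as a bare qiou after jixiu; it simply returns None there because no telic role is listed for qiou.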

5 Conclusion

This paper describes initial studies on type coercion in Chinese. Some linguists have argued that type coercion by and large does not work in Chinese. The empirical evidence for these arguments is derived from translations of the English sentences in the type coercion literature. Our approach relies on web search results, which are more likely to reflect real language use. The algorithm we employ to systematically extract type coercion data kills two birds with one stone: the coercive sentences, their non-coercive counterparts, and the qualia roles involved are all retrieved at the same time. We suggest that the algorithm can readily be applied to automatic extraction. Finally, Chinese may serve as a gateway to a better understanding of type coercion, both in manifesting new types of coercion selectors and in showing how the various types of selectors interact with each other.

References

Asher N. and J. Pustejovsky. 2000. The metaphysics of words in context. (submitted to) Journal of Logic, Language, and Information.

Bouillon P., V. Claveau, C. Fabre, and P. Sébillot. 2002. Acquisition of qualia elements from corpora – Evidence of a symbolic learning method. In Proc. LREC (3rd), 208-215, Las Palmas, Spain.

Charniak E. and M. Berland. 1999. Finding parts in very large corpora. In Proc. ACL (37th), 57-64.

Cimiano P. and J. Wenderoth. 2005. Automatically learning qualia structures from the web. In Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, 28-37.

Cimiano P. and J. Wenderoth. 2007. Automatic acquisition of ranked qualia structures from the web. In Proc. ACL, 888-895.

Etzioni O., M. Cafarella, D. Downey, A.-M. Popescu, T. Shaked, S. Soderland, D.S. Weld, and A. Yates. 2005. Unsupervised named-entity extraction from the web: An experimental study. Artificial Intelligence, 165(1):91-134.

Hearst M. 1992. Automatic acquisition of hyponyms from large text corpora. In Proc. COLING (14th), 539-545, Nantes, France.

Huang C.-R. and K. Ahrens. 2003. Individuals, kinds and events: classifier coercion of nouns. Language Sciences, 25:353-373.

Huang C.-T. James. 2009. Lectures on parametric syntax. Lecture notes at National Taiwan Normal University, Taipei, Taiwan.

Iwanska L. M., N. Mata, and K. Kruger. 2000. Fully automatic acquisition of taxonomic knowledge from large corpora of texts. In L. M. Iwanska and S. C. Shapiro (eds.) Natural Language Processing and Knowledge Processing, 335-345.

Lin T.-H. and C.-Y. Liu. 2005. Coercion, event structure, and syntax. Nanzan Linguistics, 2:9-31.

Liu C.-Y. 2003. Dynamic Generative Lexicon. MA Thesis, National Tsing Hua University, Taiwan.

Markert E., N. Modjeska, and M. Nissim. 2003. Using the web for nominal anaphora resolution. In Proceedings of the EACL Workshop on the Computational Treatment of Anaphora.

Pustejovsky J., P. Anick, and S. Bergler. 1993. Lexical semantic techniques for corpus analysis. Computational Linguistics, 19(2):331-358.

Pustejovsky J. 1995. The Generative Lexicon. The MIT Press.

Snow R., D. Jurafsky, and A.Y. Ng. 2006. Semantic taxonomy induction from heterogeneous evidence. In Proc. COLING/ACL, 801-808.

Widdows D. and B. Dorow. 2002. A graph model for unsupervised lexical acquisition. In Proc. COLING (19th), 1093-1099, Taipei, Taiwan.

Yamada I., T. Baldwin, H. Sumiyoshi, and M. Shibata. 2007. Automatic acquisition of qualia structure from corpus data. IEICE Transactions on Information and Systems, E90-D(10):1534-1541.

Appendix A. Some true complement coercion instances in Chinese V1N

V1 816 zhi4-zai4 'aim to' 613 　 6840 kang4ju4 'oppose Ving' 31 　 12

24900 　 3410 2380 35200 140

V2 N (shang4 'enter') da4-xui2 'college' (wei4-mian3 'win') (chi1 'eat')

(shi3-yong4 'use') zhi4-li4-yu2 (fa1-zhan3 'de'pledge to' velop') 　 (tui1-dong4 'drive') 　 (cu4-jing4 'boost') tao3-yan4 'hate (he1 'drink') to' 　 (chi1 'eat') ting2-zhi3 'stop (shi3-yong4 Ving' 'use') 　 (cai3-qu3 'take')

72000 xi3-huan1 'like to' 　　 4660000 　

(da3 'play')

V1V2N 3

guan4-jun1 'championship' mei3-shi2 'delicious food' lei4-gu4-chun2 'steroid'

42

lyu4-se4-jing1-ji4 'green economy' min2-zhu3 'democracy'

34 522

　

474

ka1-fei1 'coffee'

950

shu1-cai4 'vegetable' bao4-li4 bi4-yun4-cuo4-shi1 'contraceptive measure' bang4-qou2 'baseball'

(kan4 'watch') 　 (ting1 'listen to') yin1-yue4 'music'

5 10

10800 7540 37 8820 11900 247000

　　 4570 3000 569000 3180 6370 27300 　　　 984000 3500000 　　　 1260 679

(chuang4-zuo4 'compose') kai1-shi3 'begin (shuo1 'say') to' 　 (jian4-li4 'build') shi4 'try Ving' (chuan1 'put on') 　 (kai1 'drive') chang2-shi4 'try (yong4 'use') to' 　 (yan3 'act') 　 (pai1 'make') 　 (xie3 'write') 　 (kan4 'watch') xue2 'learn' (tan2 'play') 　 (shuo 'speak') 　 (xie 'write') 　 (ting 'listen') 　 (du 'read') ji4-xu4 'contin- (can1-jia1 'parue to' ticipate in') 　 (yan3 'act')

37500 bi4-mian3 'avoid Ving' 　　　　 6120 　

(chan3-sheng1 'give rise to') (zhi4-zao4) (ting1-dao4 'hear') (chan3-sheng1 'give rise to') 335 fan3-dui4 'op- (shi3-yong4 pose Ving' 'use') 2950 　 (xing1-jian4 'construct') 7620 zan4-cheng2 (shi1-xing2 'im'agree to' plement') 779　　 (xing1-jian4 'construct') 2020 zhi1-chi2 'advo- (shi2-shi1 'carry cate Ving' out') 7520 　 (tong1-guo4 'pass') 10 bu2-xie4 'dis(cai3-qu3 dain to' 'adopt') 6　 (cai3-na4 'accept') 40600 jiao1 'teach' (tan2 'play') 125000 　 (shuo1 'speak') 　　 (du2 'read') 　　 (xie3 'write') 1800 fu4-ze2 'manage (zhi2-xing2 'imto' plement') 7080 　 (diao4-cha2 'investigate') 　　 (ban4 'handle') 1240 ju4-jue2 'refuse (chi1 'eat') to' 155000 　 (xi1-shi2 'take') 119 cuo4-guo4 'miss (da1 'take') Ving' 944 　 (kan4 'watch') 137 jui2-ding4 'de- (xuan2 'choose') cide to' 20 　 (mai3 'buy') 9370 xiu1-yao4 'need (jiao1 'pay') to' 153000 　 (fu4-chu1 'give') 　 (de2-dao4 'get') 37400 yao1-chou2 'ask (zuo4-dao4 'live someone to' up to') 26600 　 (de2-dao4 'get') 　　 (gei3-yu3 'give') 34 jin4-zhi3 'forbid (bo1-fang4 someone to' 'play') 4470 　 (gu4-yong4 'employ') 106000 tui1-jian4 'rec- (kan4 'watch') ommend Ving' 1080 　 (yong 'use') 19400 jie1-shou4 'accept to'

(zuo4 'do')

　 yi2-ju4-hua4 'a sentence' yi2-duan4-guan2-xi1 'a relationship' yi1-fu2 'clothes' xin1-che1 'new car' zhe4-ge0-fang1-fa3 'method' xi3-ju4 'comedy' 　　　 ji2-ta1 'guitar' ying1-wen2 'English' 　　　 zhe4-chang3-bi3-sai4 'this competition' zhe4-chu1-xi4 'this play' zao4-yin1 'noise' 　　 hou4-yi2-zheng4 'sequela' han4-yu3-pin1-yin1 'Hanyu Pinyin' he2-si4 'nuclear power station 4' si3-xing2 'death penalty' he2-si4 'nuclear power station 4' xin1-zheng4-ce4 'new policy' jui2-yi4 'resolution' zhe4-zhong3-fang1-fa3 'this kind of method' bie2-ren1-de0-yi4-jian4 'others' opinions' gang1-qin2 'piano' ing1-wen2 'English' 　　 zhe4-ge0-ren4-wu4 'this assignment' zhe4-ge0-an4-zi0 'this case' 　 le4-se4-shi2-wu4 'junk food' du2-pin3 'drugs' lan3-le1 'gondola'

1350 64 4 76200 28 1830

15500 shi4-shi4-kan4 'try Ving' 619 　 23500 79400 3350 14000

447 2920 21 6 41900 977 6210 194 29 10

(jin4-xing2 'proceed') (yong4 'use')

(shang 'go onto') gan3 'hurry to' (da1 'take') 　 (jiao 'hand in') yun3-xu3 'allow (you3 'have') someone to' 　 (zuo4 'do')

534000 wan2-cheng2 'succeed in Ving' 136000 　

(shi2-xian4 'make come true') (shi2-xian5 'make come true') 2460 mian3-qiang3 (fu4-chu1 'force someone 'give') to' 1620 　 (guo4 'live')

20 2980 700 2 276 36 534 8 199 2 24 5 3　 1260 47 47 31 41 990 26 14 593 4

qiu2-sai4 'ballgame' na3-ge0-xiue2-xiao4 'which school' na3-tai2-che1 'which car' xiue2-fei4 'tuition'

10400

ai4-xin1 'benevolence'

1310

　 cheng2-shi2 'honesty'

40 152

fu2-li4 'welfare' 　 zhe4-ge0-guang3-gao4 'this advertisement' tong2-gong1 'child labor' zhe4-bu4-dian4-ying3 'this movie' zhe4-ge0-ruan3-ti3 'this software' zhe4-ge0-gong1-zuo4 'this job'

699000 　

8 14 16

14 13 4 1810 267 12 3

3010 tong2-yi2 'agree (qian1-shu3 to' 'sign') 2100 　 (cai3-chu3 'adopt') 45 qi3-qiou2 'beg (shi1-she3 'dole someone to' out') 8990 　 (gei3-yu3 'give') 30900 qi2-qiou2 'hope (de2-dao4 'get') to' 122000 　 (de2-dao5 'get') 1110 ke3-qiou2 'long (de2-dao4 'get') to' 14000 　 (de2-dao5 'get') 69400 ke3-wang4 'de- (de2-dao4 'get') sire to' 18400 　 (de2-dao5 'get')

shou3-shu4 'surgery'

10

zhe4-ge0-fan1-fa3 'this method' zhe4-ge0-wang3-zhan4 'this website' gong1-che1 'bus' zuo4-ye4 'homework' yan2-lun4-zi4-you2 'freedom of speech' zhe4-zhong3-shi4 'this kind of thing' xin1-yuan4 'wish'

18

484 433 30

meng4-xiang3 'dream'

12

gan3-qing2 'affection'

3

xing4-sheng1-huo2 'sexual life' he2-yue1 'contract'

7

19 4

8540 26

zhe4-zhong3-zuo4-fa4 'this kind of move' shan4-yi4 'friendliness'

11

bang1-zhu4 'help' xing4-fu4 'bliss'

10 29

ping2-an 'security' da2-an4 'answer'

15 2310

ai4-qing2 'love' he2-ping2 'peace'

29 192

qin1-qing2 'parents' love'

362

1

i Lin and Liu (2005) elaborate the argument in Liu (2003). Huang (2009) endorses Lin and Liu (2005) and attributes the claimed absence of coercion in Chinese to the analyticity of the language, in contrast to the syntheticity of languages like English.

ii Subtype coercion is the only generative mechanism that these linguists argue to be pervasive in Chinese.
