= “X tried to V”. The result is that X did V.
X forgot to V = “X ought to have Ved, or intended to V”. The result is that X did not V.
X happened to V = “X didn't plan or intend to V”. The result is that X did V.
X avoided V-ing = “X was expected to, or usually did, or ought to V”. The result is that X did not V.
Possibility for transformations: as the schema above shows, the main point to preserve during a transformation is the “result” introduced by the implicative verb; the implicative verb itself can be removed.
Original sentence: Somehow I managed to wrench myself out of the dream, but not into a state of waking; it was like the screen went blank. // Neil Gaiman and Dave McKean interview, Los Angeles, December 1994, originally broadcast on KUCI, 88.9FM.
Presupposition triggered by the implicative verb manage: “I tried to wrench myself out of the dream”.
Transformed sentence: Somehow I wrenched myself out of the dream, but not into a state of waking; it was like the screen went blank.
Thus it is possible to transform a text by removing presuppositions and converting sentences (or the corresponding fragments) into non-presupposed forms. The inverse process also holds: using the same techniques the other way round, we can introduce presuppositions into (parts of) sentences where there were originally none.

2.2. Semantic Representations and Presupposition Resolution

An efficient and elegant way of representing the behavior of presuppositions in a discourse is to treat them as anaphoric expressions [5]: presupposition triggers can be processed in the same way as anaphoric expressions. A detailed algorithm of presuppositional analysis is described in [8]. For the formal description of presupposition interpretation, discourse representation structures (DRSs) are used. The DRS is a concept within Discourse Representation Theory (partly described in [3]), one of the most influential current approaches to the semantics of natural language. DRSs model semantic relationships within the discourse in general, and the mechanisms of presupposition interpretation in particular. A DRS is made up of two types of objects: discourse referents, representing objects introduced in the discourse, and conditions holding descriptive information about these referents. To put it formally, a DRS R is a pair <U(R), Con(R)> where:
- U(R) is a set of reference markers;
- Con(R) is a set of DRS conditions.
If R and R′ are DRSs, then ¬R, R ∨ R′ and R ⇒ R′ are (complex) DRS conditions.
Consider the following example: If Mike buys a new Hummer, he will like his car. The DRS obtained for this sentence is the following (for illustration purposes we describe only the processes related to presupposition interpretation and abstract away other details):
[[x, y: Mike(x), Hummer(y), buy(x,y)] ⇒ [like(x,z) [α: car(z)]]]
where α represents the presupposed information to be resolved. Presuppositions have rich semantic content, capable of introducing new DRSs within bigger ones. We incorporate presuppositional information into a larger DRS and locate it in the web of discourse referents regulated by accessibility constraints.
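The pair <U(R), Con(R)> and the complex conditions can be mirrored directly in code. The following is a minimal sketch of our own (the class and field names are illustrative, not the Boxer output format), encoding the example DRS with the presupposed α-DRS embedded as a condition:

```python
# Minimal, illustrative encoding of a DRS as a pair <U(R), Con(R)>.
# Complex conditions (negation, disjunction, implication) embed whole DRSs,
# which makes the definition recursive. Names here are our own, not Boxer's.
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class DRS:
    U: List[str]              # reference markers, e.g. ["x", "y"]
    Con: List["Condition"]    # DRS conditions

@dataclass
class Neg:                    # ¬R
    R: "DRS"

@dataclass
class Disj:                   # R ∨ R′
    R1: "DRS"
    R2: "DRS"

@dataclass
class Impl:                   # R ⇒ R′
    R1: "DRS"
    R2: "DRS"

# An embedded DRS inside Con marks presupposed (alpha) information.
Condition = Union[Tuple, Neg, Disj, Impl, DRS]

# [[x, y: Mike(x), Hummer(y), buy(x,y)] ⇒ [like(x,z) [α: car(z)]]]
alpha = DRS(["z"], [("car", "z")])   # presupposed information to be resolved
antecedent = DRS(["x", "y"],
                 [("Mike", "x"), ("Hummer", "y"), ("buy", "x", "y")])
consequent = DRS([], [("like", "x", "z"), alpha])
drs = DRS([], [Impl(antecedent, consequent)])
```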
The resolution is performed by binding the presupposition “Mike has a car” to the antecedent (“Hummer”) found in the first part of the sentence: [[x, y: Mike (x), Hummer (y), buy (x,y)] ⇒ [like (x,y)]] If we build a DRS for the same sentence not containing the presupposition, but conveying the same meaning, we will have: If Mike buys a Hummer, he will have a car and he will like it. [[x, y: Mike (x), Hummer (y), buy (x,y)] ⇒ [Mike (x), car (z), like (x,z)]]
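As a toy illustration of this binding step (our own sketch, not the Boxer implementation; the bridging "ontology" here is a single hard-coded fact that a Hummer is a car), the presupposed condition car(z) can be bound to the accessible antecedent y as follows:

```python
# Toy resolution of a presupposed condition by binding it to an accessible
# antecedent. Conditions are tuples like ("buy", "x", "y"); the bridging
# "ontology" is one hard-coded fact (a Hummer is a car).

def resolve(antecedent_conds, consequent_conds, alpha):
    """Bind the referent of the presupposed condition `alpha` to an
    antecedent referent of a compatible sort; accommodate if none is found."""
    bridges = {"car": {"car", "Hummer"}}           # toy bridging ontology
    pred, var = alpha
    for cond in antecedent_conds:
        if len(cond) == 2 and cond[0] in bridges.get(pred, {pred}):
            ref = cond[1]                          # antecedent found: bind var
            return [tuple(ref if a == var else a for a in c)
                    for c in consequent_conds]
    return consequent_conds + [alpha]              # no antecedent: accommodate

antecedent = [("Mike", "x"), ("Hummer", "y"), ("buy", "x", "y")]
resolved = resolve(antecedent, [("like", "x", "z")], ("car", "z"))
print(resolved)   # → [('like', 'x', 'y')]
```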
As a result of anaphoric pronoun resolution and of bridging “car” to “Hummer” (using the created ontology), we obtain the same representation as for the sentence containing the presupposed information:
[[x, y: Mike(x), Hummer(y), buy(x,y)] ⇒ [like(x,y)]]
So, as we can see, the final semantic representations of the sentences with and without the presupposition are the same, since all the discourse referents and the relations between them are preserved after the transformation. Thus, the marked sentences will be those containing presupposition triggers captured by parsing, i.e. those whose DRSs contain (an) embedded DRS(s) with presupposed information to be resolved. During text analysis, logically transparent semantic representations (DRSs) of single sentences are created. If a sentence is part of a more extended discourse, the named operations integrate its DRS into the interpretation of the whole preceding discourse. This is an accumulative process, in which new discourse referents, and the conditions holding for them, are incorporated into the main DRS representing the contents of the whole discourse. For the syntactic and semantic analysis of the text, including presupposition trigger detection and the processing of presupposed information, we use the CCG (Combinatory Categorial Grammar) parser [2], which has wide coverage and hence provides robust parsing. The parser uses Boxer [1] as an add-on software package to generate semantic representations and convert the CCG derivations into DRSs.

3. ALGORITHM OF AN EXTENDED DISCOURSE WATERMARKING

3.1. Algorithm

By means of the remarkable properties of presuppositions and the constructions triggering them, we can easily transform any sentence in a text. We define three principal possibilities for sentence transformation:
- make presupposed information explicit, i.e. get rid of presupposition triggers;
- replace a presupposition trigger with another one;
- introduce presupposed information where there was none.
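A crude illustration of the first stage, finding sentences that contain triggers, might look as follows. The five-item trigger list is invented for the example; the method itself relies on a fixed inventory of 100 triggers and a full CCG parse rather than surface string matching:

```python
import re

# Invented mini sample of surface trigger forms; the real system uses a
# 100-item inventory and CCG parsing, not plain string matching.
TRIGGERS = ["managed to", "found out", "stopped", "again", "the"]

def mark_sentences(sentences):
    """Return (sentence number, matched triggers) for markable sentences."""
    marked = []
    for i, s in enumerate(sentences, start=1):
        hits = [t for t in TRIGGERS
                if re.search(r"\b" + re.escape(t) + r"\b", s, re.IGNORECASE)]
        if hits:
            marked.append((i, hits))
    return marked

sents = ["Somehow I managed to wrench myself out of the dream.",
         "It was like a screen went blank."]
print(mark_sentences(sents))   # → [(1, ['managed to', 'the'])]
```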
Given a file with a text to be watermarked, our task is to modify the text while preserving its semantic content, and to create a key allowing us to perform the inverse transformation and thus confirm the originality of the text (and prove its ownership) or, otherwise, prove that the text has been subjected to unauthorized modifications. Let us illustrate our algorithmic procedure with the following real-life example.
A: What kinds of things can people do to try to expand and reclaim democracy and the public space from corporations?
B: Well, the first thing they have to do is find out what's happening to them. So if you have none of that information, you can't do much. For example, it's impossible to oppose, say, the Multilateral Agreement on
Investment, if you don't know that it exists. That's the point of the secrecy. You'd have to not only read the headlines which say market economy's triumphed, but you also have to read Alan Greenspan, the head of the Federal Reserve, when he's talking internally; when he says, look, the health of the economy depends on a wonderful achievement that we've brought about, namely "worker insecurity." That's his term. Worker insecurity is a great boon for the health of the economy because it keeps wages down. It's great: it keeps profits up and wages down. // A corporate watch interview with Noam Chomsky / By Anna Couey and Joshua Karliner, June 1998, http://zena.secureforum.com/Znet/zmag/articles/chomsky june98.htm
We number the sentences from 1 to N (N = 9 in our example). For each sentence we form an array of three values: Sentence Number - Sentence Text - Triggers Code. The length of the triggers code equals the number of presupposition triggers defined for text transformations, i.e. 100 positions. Each position is associated with a particular trigger from our list of 100 and records how many times that trigger occurs in the given sentence. We then perform a position-by-position summation of the trigger counts over all the sentences and obtain CHECK SUM 1 (SUM1); for our example text it looks like:
110100100000100000000000000000000000000000000020000000000000000010000000000000000000000000000100000000
(In general, a text can be of arbitrary length and hence can contain any number of triggers; we cap the value of each code position at 65535.)
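The checksum step above can be sketched as follows. The two-entry trigger index and its positions are invented for the example; the actual scheme fixes 100 positions, one per trigger, and the real system counts trigger occurrences from the parse rather than by pattern matching:

```python
import re

N_TRIGGERS = 100   # size of the trigger inventory
CAP = 65535        # maximum value stored in each code position

def trigger_code(sentence, trigger_index):
    """Per-sentence trigger-count vector (the 'Triggers Code')."""
    code = [0] * N_TRIGGERS
    for trig, pos in trigger_index.items():
        hits = re.findall(r"\b" + re.escape(trig) + r"\b", sentence,
                          re.IGNORECASE)
        code[pos] = min(len(hits), CAP)
    return code

def checksum(sentences, trigger_index):
    """Position-by-position summation over all sentences (SUM1), capped."""
    total = [0] * N_TRIGGERS
    for s in sentences:
        for pos, c in enumerate(trigger_code(s, trigger_index)):
            total[pos] = min(total[pos] + c, CAP)
    return total

# Toy index with invented positions.
idx = {"find out": 0, "the": 1}
sents = ["Well, the first thing they have to do is find out what's happening.",
         "That's the point of the secrecy."]
sum1 = checksum(sents, idx)
print(sum1[:5])   # → [1, 3, 0, 0, 0]
```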
After that we make transformations in the text (no more than one transformation per three sentences, in order to preserve the fluency of the text and to avoid sentences sounding heavy, artificial and hence obviously modified) and get the following modified text:
A: What kinds of things can people do to try to expand and reclaim democracy and the public space from corporations?
B: Well, the first thing they have to do is see [the factive find out is replaced with a synonymous factive verb] what's happening to them. So if you have none of that information, you can't do much. For example, it's impossible to oppose, say, the Multilateral Agreement on Investment, if you don't know that it exists. That's the point of secrecy [the definite article the is removed]. You'd have to not only read the headlines which say market economy's triumphed, but you also have to read Alan Greenspan, the head of the Federal Reserve, when he's talking internally; when he says, look, the health of the economy depends on a wonderful achievement that we've brought about, namely "worker insecurity." That's his term. Worker insecurity is a great boon for the economy health [the definite article the is removed and the sentence fragment is slightly paraphrased] because it keeps wages down. It's great: it keeps profits up and wages down.
After the transformations we form the array of modifications comprising the following pairs:
Numbers of the sentences bearing transformations ::: Position numbers of the transformed triggers
We then recalculate SUM1 and obtain SUM2:
9010010000000001000000000000000000000000000002000000000000000001000000000000000000000000000100000000
Having obtained SUM1 and SUM2, we generate the key in the form of a hash table with the following fields:
SUM1 - SUM2 - Number of transformations in the text - Numbers of the sentences bearing transformations - Position numbers of the transformed triggers
Using this key we can perform the inverse (back) transformation of the text and check that the resulting code coincides with the original SUM1, thus proving that the text is original or, otherwise, that it has been subjected to unauthorized modifications.

3.2. Applications

The watermark can convey a proof of information ownership, an integrity mark, or a fingerprint containing the end-user id. Applications are possible in many domains where fairly long texts need to be watermarked, including press agencies, authenticated texts, and e-book protection. Our method can also be used for very compact storage of text for retrieval. The method is domain-independent: we can process texts of any genre and any field, because presuppositions are present in any discourse regardless of its subject. Another advantage of the method is that it is potentially resilient to attacks such as translation into another language. Presuppositions, being a semantic phenomenon, exist in every human language; only the triggers generating them (particular words and syntactic constructions) vary from language to language. Defining trigger lists for other languages is a subject of future research; the method itself will then carry over directly.

4. DISCUSSION AND FUTURE WORK

We have described a method of text watermarking using presuppositions.
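As a summary of the procedure in Section 3.1, the key generation and verification steps might be sketched as follows. The dictionary field names are illustrative; the key holds SUM1, SUM2, the number of transformations, and one (sentence number, trigger position) pair per transformation:

```python
# Sketch of the key structure and the verification check; field names are
# illustrative, and the short 4-position vectors stand in for the
# 100-position checksums.

def make_key(sum1, sum2, modifications):
    return {
        "SUM1": sum1,
        "SUM2": sum2,
        "n_transformations": len(modifications),
        "modifications": modifications,  # [(sentence no., trigger pos.), ...]
    }

def verify(recomputed_sum1, key):
    """After the inverse transformation, the recomputed checksum must equal
    the stored SUM1; a mismatch signals unauthorized modification."""
    return recomputed_sum1 == key["SUM1"]

key = make_key([1, 1, 0, 1], [9, 0, 1, 1], [(2, 0), (4, 1), (7, 2)])
print(verify([1, 1, 0, 1], key))   # True  -> text is authentic
print(verify([1, 2, 0, 1], key))   # False -> unauthorized modification
```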
We have shown how the semantic representations of sentences withstand surface transformations based on presupposition triggers. For the purposes of text watermarking we have so far studied the behavior of eight types of presupposition triggers in discourse. Some of them are well formalized and can easily be used by an automatic algorithm. However, a few triggers are hard to formalize, and at present we perform the transformations based on them manually (while the watermarking algorithm itself is fully automatic in all cases). Thus one direction of future research on this topic will be further work on formalizing the transformation possibilities for these presupposition triggers. We are also studying other methods of text watermarking based on presuppositions, since this phenomenon offers several possibilities. Even single sentences, taken in isolation, can carry the proposed watermarks. In those cases the presupposed information will be treated as new information: it will not be bound to information in other sentences, but all the mentioned transformations still hold. However, the longer the text to be watermarked, the more efficient and resilient the watermark can be made. If we create a web of bound information (resolved presupposed information bound to its antecedents in the preceding text), it will hold the integrity of the text, introducing a secret ordering into the text structure and making it resilient to "data loss" and "data altering" attacks - changing the order of sentences, removing sentences from the text, or modifying them. Our own software for automatic text watermarking based on this method is under development.

5. REFERENCES

[1] J. Bos, "Towards Wide-Coverage Semantic Interpretation", Proc. of the Sixth International Workshop on Computational Semantics (IWCS-6), 2005.
[2] S. Clark, J.R. Curran, "Parsing the WSJ using CCG and Log-Linear Models", Proc. of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, 2004.
[3] H. Kamp, "The Importance of Presupposition", The Computation of Presuppositions and their Justification in Discourse, ESSLLI'01, 2001.
[4] M. Topkara, G. Riccardi, D. Hakkani-Tur, M.J. Atallah, "Natural Language Watermarking: Challenges in Building a Practical System", Proc. of the SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents, San Jose, CA, January 15-19, 2006.
[5] R. van der Sandt, "Presupposition Projection as Anaphora Resolution", Journal of Semantics, 9, 1992.
[6] O. Vybornova, B. Macq, "A Method of Text Watermarking Using Presuppositions", Proc. of the SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents (SPIE'07), San Jose, CA, January 29 - February 1, 2007.
[7] O. Vybornova, B. Macq, "Natural Language Watermarking and Robust Hashing Based on Presuppositional Analysis", Proc. of the 2007 International Conference on Information Reuse and Integration (IEEE IRI-07), Las Vegas, USA, August 13-15.
[8] O. Vybornova, "Presuppositional Component of Communication and Its Applied Modeling" (original title: Пресуппозиционный компонент общения и его прикладное моделирование), PhD dissertation, Moscow State Linguistic University, 2002.