Position Paper on The Meaning and The Mining of Legal Texts

Mireille Hildebrandt∗

∗ Mireille Hildebrandt is Associate Professor of Jurisprudence at the Erasmus School of Law, Rotterdam, and Senior Researcher at Law Science Technology and Society studies (LSTS) at the Vrije Universiteit Brussel. Her research focuses on the nexus of legal philosophy and philosophy of technology, with special attention to the implications of smart infrastructures for democracy and the rule of law. She was coordinator of ‘Profiling technologies’ in the EU-funded NoE the Future of Identity in the Information Society (FIDIS). With Serge Gutwirth she edited Profiling the European Citizen. Cross-Disciplinary Perspectives (2008). She is associate editor of Criminal Law and Philosophy and of Identity in the Information Society and publishes widely on both criminal law and the legal implications of smart technologies, see http://works.bepress.com/mireille_hildebrandt/.

Introduction

Positive law, inscribed in legal texts, entails an authority not inherent in literary texts, generating legal consequences that can have real effects on a person’s life and liberty. The interpretation of legal texts, necessarily a normative undertaking, resists the mechanical application of rules, though still requiring a measure of predictability, coherence with other relevant legal norms and compliance with constitutional safeguards. The present proliferation of legal texts on the internet (codes, statutes, judgments, treaties, doctrinal treatises) renders the selection of relevant texts and cases next to impossible. We may expect that systems that mine these texts to find arguments supporting one’s case, as well as expert systems that support the decision-making process of courts, will end up doing much of the work. This raises the question of the difference between human interpretation and computational pattern-recognition, and the issue of whether this difference makes a difference for the meaning of law. Possibly, data mining will produce patterns that disclose habits of the minds of judges and legislators that would otherwise have gone unnoticed (reinforcing the argument of the ‘legal realists’ at the beginning of the 20th century). Also, after the data analysis it will still be up to the judge to decide how to interpret the results, or up to the prosecution which patterns to engage in the construction of evidence (requiring a hermeneutics of computational patterns instead of texts). My focus in this paper regards the fact that the mining process necessarily disambiguates the legal texts in order to transform them into a machine-readable data set, while the algorithms used for the analysis embody a strategy that will co-determine the outcome of the patterns. There seems to be a major due process concern here to the extent that these patterns are invisible to the naked human eye and will not be contestable in a court of law, due to their hidden complexity and computational nature. This position paper aims to explain what is at stake in the computational turn with regard to legal texts. This prepares for the question I want to put to those involved in distant reading and not-reading of texts: could a visualization of computational patterns constitute a new way of un-hiding the complexity involved, opening the results of computational ‘knowledge’ to citizens’ scrutiny?

The Meaning of Law

It is interesting to note that legal scholars often think of law as part of the humanities, whereas others – who prefer to call themselves legal scientists – lay claim to law as part of the social sciences. In fact, the attribution to either domain can have serious consequences, for instance where in the Netherlands legal research grant applications must be submitted to the social science department of the national agency for the funding of scientific research (NWO). The decision on funding legal research is thus made by a jury consisting mostly of psychologists and other empirical scientists, many of whom believe in the priority of quantitative methods and all of whom seem engrossed in a methodological granularity that is atypical for legal research. The practice of law, however, seems to build on a hermeneutical approach to legal texts, confronted at the same time with the ambiguity that is inherent in natural language and with the need for legal certainty. According to a hermeneutical legal approach, interpretation and context determine the meaning of the text with regard to the case at hand, while the meaning of the case at hand is determined by the relevant legal text. This circle is not necessarily vicious, nor can it be avoided by stipulating that interpretation is forbidden (something Justinian attempted when he ‘enacted’ what was later called the Corpus Iuris of Roman law). Nevertheless, the circularity does highlight the tension between norm and decision, raising the question to what extent a norm is reinterpreted and changed due to a decision on its application, and to what extent a decision is constrained by the norm that is expected to regulate it. Legal theory has some awareness that decisions often posit the norm that rules them. We could start by quoting (Wittgenstein, Anscombe et al. 2009) and (Taylor 1995) here, on what it means to follow a rule, or immerse ourselves in the debate between (Schmitt 2005), (Kelsen 2005) and (Radbruch 1950) on the priority of normativity or decision. In times of emergency these debates can be enlightening, for instance when privacy and security are opposed as incompatible social goods, demanding unrestricted discretionary powers to monitor citizens and to intervene in their lives unhindered by constitutional legal norms (Hildebrandt forthcoming 2010). However interesting these debates may be, the computational turn in the humanities – as elsewhere – launches an entirely new field of questions and issues, as the process of interpretation is confronted with and may even be replaced by computational processes of pattern-recognition. Before embarking on the novel issues this generates, we must pay attention to an important legal notion, namely that of ‘the sources of the law’. Generally speaking, in a modern legal system, the sources of the law can be summed up as codes and statutes, court judgments, legally binding international treaties, doctrinal treatises and legal principles. The concept of ‘the sources of the law’ relates to the need for a final establishment of legal consequences. The legal adage litis finiri oportet states that the legal struggle must come to an end, implying that at some point legal certainty is a good in itself, despite the fact that one can disagree about the justice of the outcome. To achieve closure in a legal setting, the arguments that can be used are limited to those grounded in the authoritative legal sources. These sources are assumed to contain sets of norms that must be interpreted in a way that guarantees the overall coherence of the legal system as well as their compliance with constitutional safeguards. Evidently the assumption of coherence and constitutional compliance cannot be taken for granted. Rather, this assumption is a productive one, requiring lawyers ever and again to taste and turn different interpretations of the facts of the case and the relevant legal norm, until they have a ‘solution’ that fits and adjusts the system in a manner consistent with what Montesquieu called ‘the spirit of the laws’.
Legal texts, entailing an authority not inherent in literary texts, thus generate legal consequences that can have major effects on a person’s life and liberty. Their interpretation, necessarily a normative undertaking, defies a merely mechanical application of legal rules. This raises the question of what data mining operations on legal texts could contribute to our understanding of the law. Will it provide the final proof of what legal realists and critical legal studies have claimed all along: that the interpretation of legal texts is far more subjective and arbitrary than legal scholars like to assume? Will it uncover correlations between whatever a judge had for breakfast and the outcome of his decision-making process, or will it uncover correlations between the vested interests of the ruling class and the content of the law? Or will it correlate attributes of particular cases with their outcome, thus providing a new type of legal certainty, based on a statistical prognosis of what courts will decide? Oliver Wendell Holmes (renowned legal realist and US Supreme Court Justice) contended that ‘the prophecies of what the courts will do in fact, and nothing more pretentious, are what I mean by the law’. Will the anticipation he alludes to be brought to perfection by the data mining of court verdicts? Or might it be that Holmes’ pragmatism was closer to a hermeneutic understanding of law than some would have it?

Measurabilities

How does the law cope with measurability? Can we measure the lawfulness or even the justice of a verdict? Is causality a matter of either/or, or must we accept gradations of some action being the cause of an event? Could data mining reveal a granularity of wrongfulness, instead of merely judging an action to be wrongful under the circumstances of the case? Is culpability a matter of yes/no, or must we develop machine-readable criteria to calculate the precise extent of a person’s guilt? Can punishment – or treatment in the case of the absence of mens rea – be calculated in precise measure to the causality, wrongfulness and culpability of an action? Or should we rather calculate the type and measure of punishment in accordance with its effectiveness, as inferred from data mining operations on the fusion of data bases containing verdicts with data bases containing the post-conviction behaviours of offenders (recidivism)? Actuarial justice practices have sprung up during the past decades, pretending that such calculations provide reliable knowledge. Interesting debates have erupted about the attribution of tort liability for toxic waste or occupational diseases, where only epidemiological inferences are available, providing evidence in probabilistic terms that cannot be pinned down to individual cases. Similarly, criticism has been formulated of actuarial knowledge claims used in the fields of policing (determining who should be monitored) and sentencing (determining who is calculated to be unresponsive to any form of treatment). One may expect, however, that at some point cognitive science – also in the throes of a computational turn – will produce knowledge claims as to the measure of freedom a person has to comply with the law, taking into account her calculated brain behaviours and/or morphology. Debates on the issue of determinism (as claimed by some cognitive scientists) and voluntarism (the assumed foundation of criminal liability) proliferate, with compatibilists – who consider the whole debate the result of a category mistake – and incompatibilists – who often reject scientistic determinism on the mere ground that it would be undesirable – claiming final solutions to such existential problems as whether responsibility presumes freedom or whether, on the contrary, a measure of freedom is born from being forced to give an account of oneself (Butler). What strikes me as an important concern here is the question of what it means to translate the flux of real life events into discrete silicon inscriptions, and the question of what actually happens when such discrete inscriptions are manipulated to generate patterns in data bases. Could the visualization – another translation – of data mining processes (eg dynamically, also showing the development and transformation of the patterns mined) help us to come to terms with the meaning and significance of these patterns? Maybe such visualization requires a novel hermeneutics, eg building on a semiotics that does not equate a sign with a letter, acknowledging different ways of ‘reading’ texts, images, movements and life events.
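To see what is at stake in this translation, a deliberately crude sketch may help. Nothing below reflects any actuarial instrument actually in use: the features, the data and the outcome labels are all invented for illustration, and no real risk assessment works from three variables.

```python
# Toy illustration of actuarial risk scoring: a logistic regression fitted
# on invented offender records. All features, data and labels are
# hypothetical; the point is the shape of the calculation, not its content.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Invented features: age at conviction, number of prior offences,
# months since the last offence.
X = np.column_stack([
    rng.integers(18, 70, 500),   # age
    rng.poisson(2, 500),         # prior offences
    rng.integers(1, 120, 500),   # months since last offence
])
y = rng.integers(0, 2, 500)      # 'reoffended within two years': pure noise here

model = LogisticRegression().fit(X, y)

# The 'risk score' for a new, equally invented case. Note that the flux of
# a life has already been disambiguated into three integers before anything
# is calculated at all.
case = np.array([[24, 3, 6]])
print(model.predict_proba(case)[0, 1])   # P(recidivism), on the model's own terms
```

The point of the sketch is not the prediction but the first step: real life events are reduced to a handful of discrete inscriptions, and whatever the model returns inherits that reduction.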

From Hermeneutics to Pattern-Recognition

Let us now return to the issue of proliferating legal texts and the need to involve machine learning technologies to select relevant texts or to predict the outcome of a case. Data mining of legal texts in order to classify them for instant retrieval, analogous reasoning or prediction seems to require distant reading (Moretti) in the sense of not-reading (Mueller) of legal texts (Clement, Steger et al. 2008). Methods such as unsupervised neural networks for the clustering of similar texts within an archive of legal texts supposedly provide self-organizing maps capable of classifying legal documents, based on automatic analysis of text corpora and semi-automatic generation of document descriptions (Merkl and Schweighofer 1997). NLP techniques that perform linguistic annotation using XML-based tools and a combination of rule-based and statistical methods are used to correlate linguistic features of sentences in legal texts with the argumentative roles they play. This provides for automatic summarization of legal texts (Grover, Hachey et al. 2003). Case-Based Reasoning architectures are used to enhance the relevance of jurisprudence research in data bases of legal decisions (Grover, Hachey et al. 2003). Combinations of case-based reasoning and extracting information from legal texts provide somewhat successful predictions of the outcome of cases based on similar facts. This allows computer systems to ‘support reasoning by analogy, presenting alternative reasonable arguments, and generating testable legal predictions’ (Ashley and Brüninghaus 2009: 126). This paper does not focus on mere information retrieval, based on simple indexing on the basis of word-frequency, which has been around for many decades. What interests me here are software machines capable of detecting patterns in legal texts that should help practicing lawyers to argue their case or legal scholars to develop, attune or reject doctrinal theories about specific legal issues. Instead of painstakingly reading one’s way into a mass of individual cases to see how codes and statutes are interpreted, helped by legal treatises on the relevant legal doctrine, the idea is that such inference-machines will at some point provide a more immediate overview of whatever may be the factual and legal issues at stake. My question is how such a computational overview differs from the hands-on close reading of legal texts that has informed the legal profession ever since script and especially the printing press took over from orality (Ong 1982; Collins and Skover 1992; Glenn 2004 (second edition); Eisenstein 2005 (second edition); Hildebrandt 2008). What difference does not-reading make to the subsequent reading of texts deemed relevant by the KDD expert system? With the upsurge of printed material the legal profession developed a monopoly on the interpretation of legal texts. This monopoly in fact contributed to the relative autonomy of the law in relation to both morality and politics (Berman 1983; Koschaker 1997/1947). In the context of modern legal systems the practice of law involves the elaboration and maintenance of a productive normative bias, negotiating the often competing demands of legal certainty, justice and the purposiveness of the law (Radbruch 1950; Leawoods 2000). Lawyers are used to this negotiation, aimed at creating and sustaining the coherence and integrity of the legal system, and this aim stands for a productive normative bias on which modern law depends. How does pattern recognition relate to this normative bias? To what extent will KDD expose gross inconsistencies, hidden and unjustified discriminations and previously invisible ineffectiveness, whereas traditional legal practice obfuscates inconsistencies, injustice and counterproductive patterns by ‘reading’ patterns into the chain of decisions that realign them in the direction of legal certainty, equality and purposiveness? Does machine learning imply that the necessary normative bias is integrated into the process of knowledge discovery? Must we instead believe that it can be neutral (and what could this mean)? Or will there be invisible biases, based on the assumptions that are inevitably embodied in the algorithms that generate the patterns?
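What such embodied assumptions look like can be made concrete with a minimal sketch of the mining step itself. K-means clustering stands in here as a simple stand-in for the self-organizing maps of Merkl and Schweighofer (1997), and the four ‘judgments’ are invented placeholders, not real case law.

```python
# Minimal sketch of mining legal texts: the texts are first disambiguated
# into a bag-of-words matrix, then clustered. K-means is a simple stand-in
# for the self-organizing maps cited above; the snippets are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

judgments = [
    "the appeal on the ground of wrongful dismissal is allowed",
    "appeal dismissed; the contract of employment was lawfully terminated",
    "the seller failed to deliver conforming goods under the sales contract",
    "damages awarded for non-conforming delivery under the contract of sale",
]

# Step 1: disambiguation. Word order, context and ambiguity are discarded;
# each text becomes a vector of term weights.
vectors = TfidfVectorizer().fit_transform(judgments)

# Step 2: the algorithm's strategy enters. The number of clusters is fixed
# in advance by the experimenter, not found in the texts.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(labels)   # eg [0 0 1 1]: employment cases versus sales cases
```

Every line embodies a choice: the tokenization, the term weighting and above all the number of clusters are decided by the experimenter before any pattern appears.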
Most importantly, how do we (lawyers, legislators and citizens) get our finger behind those assumptions if they are eg protected as trade secrets or by intellectual property rights?

… and back?

In a salient article on ‘Meaning and Mining: The Impact of Implicit Assumptions in Data Mining for the Humanities’ – which inspired the title of this paper – (Sculley and Pasanek 2008) discuss some of the assumptions that inform data mining schemes. Notably they point out that (1) machine learning assumes that the distribution of the probabilistic behaviour of a data set does not change over time, whereas much of the work done in the humanities is based on small samples that do not presume such a fixed distribution (focusing on change and ambiguity rather than invariance over the course of time); (2) machine learning assumes a well-defined hypothesis space, because otherwise generalization to novel data would not work; (3) for machine learning to come up with valid predictions or discoveries, the data being mined must be well represented, avoiding inadequate simplifications, distortions or procedural artifacts; (4) machine learning may assume that there is one best algorithm to achieve the one best interpretation of the data, but this is never the case in practice, as demonstrated by the No Free Lunch Theorem (which says there is no data mining method without an experimenter bias). To illustrate their point they develop a series of data mining strategies to test Lakoff’s claim of a correlation between the use of metaphor and political affiliation, both via various types of hypothesis testing (supervised learning methods) and via various types of clustering (unsupervised learning methods). They conclude that:

Where we had hoped to explain or understand those larger structures within which an individual text has meaning in the first place, we find ourselves acting once again as interpreters. The confusion matrix, authored in part by the classifier, is a new text, albeit a strange sort of text, one that sends us back to those texts it purports to be about (idem at 12).

In fact, they continue:

Machine learning delivers new texts – trees, graphs, and scatter-grams – that are not any easier to make sense of than the original texts used to make them. The critic who is not concerned to establish the deep structure of a genre or validate a correlation between metaphor and ideology, will delight in the proliferation of unstable, ambiguous texts. The referral of meaning from one computer-generated instance to the next is fully Derridean (idem at 17).
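Their observation that the confusion matrix is a new text, authored in part by the classifier, can be illustrated with a toy sketch: the same invented data, mined with two different learners, yields two different matrices, each as much in need of interpretation as the corpus behind it. The data and the ‘affiliation’ labels below are synthetic stand-ins, not a reconstruction of the Lakoff experiment.

```python
# Two learners, one (synthetic) data set, two 'new texts'. Each confusion
# matrix must itself be read and interpreted.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))              # invented 'features' of texts
y = (X[:, 0] + rng.normal(size=200) > 0)   # invented 'political affiliation'

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for clf in (GaussianNB(), DecisionTreeClassifier(random_state=0)):
    preds = clf.fit(X_tr, y_tr).predict(X_te)
    print(type(clf).__name__)
    print(confusion_matrix(y_te, preds))   # a 'strange sort of text'
```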

This seems a very apt pointer for the meaning and mining of legal texts. Instead of taking the ‘trees, graphs, and scatter-grams’ of KDD in legal texts at face value, or discarding them as irrelevant, the results of automated analysis could invite legal scholars to reconsider their established opinions, to uproot their preconceptions and to engage with novel ambiguities. To achieve this, Sculley and Pasanek provide five recommendations for best practice, which could serve as an appetizer to develop similar constraints for the data mining of legal texts. First, computer engineers and lawyers who sit down together to mine legal data bases should make their assumptions about the scope, function and meaning of the relevant legal texts explicit. Second, they should use multiple representations and methodologies, thus providing for a plurality of mining strategies that will most probably result in destabilizing any monopoly on the interpretation of the legal domain. Third, all trials should be reported, instead of ‘cherry-picking’ those results that confirm the experimenters’ bias. As they suggest, at some point failed experiments can reveal more than supposedly successful ones. This is particularly important in a legal setting, because the legal framework should protect individual outliers and minority positions from being overruled by dominant frames of interpretation. Fourth, in a legal setting the public interest requires transparency about the data and the methods used, to make the data mining operations verifiable by other joint ventures of lawyers and software engineers. This connects to their fifth recommendation, regarding the peer review of the methodologies used. The historical artifact of a legal system that is relatively autonomous in regard to both morality and politics, safeguarded by an independent judiciary, thrives on an active class of legal scholars and practitioners willing to test and contest the claimed truth of mainstream legal doctrine. Only a similarly detailed and agonistic scrutiny of the results of data mining operations can sustain the fragile negotiations of the rule of law.

Visualisation

Scrutinizing the mining strategies of legal researchers can be seen as an important contribution to due process. Curiosity for and a serious interest in the ‘intestines’ of the data mining process should become part of a lawyer’s training if KDD technologies are used to determine the content of positive law. There is, however, another due process concern, which regards a citizen’s understanding of how her actions will be interpreted in a court of law and whether the determination of guilt or liability is the result of a fair trial. Though our legal systems – especially in a continental setting where jury trials are not the standard way to hold court – depend heavily on the expertise of legal counsel, I believe that the legitimacy of the legal process also depends on a party’s capacity to understand how she is being judged. To grasp the standards against which her actions are measured, a citizen, too, must be empowered to ‘read’ and scrutinize the data mining techniques that co-determined the outcome of her case. Even if expert systems based on clustering of relevant legal elements in authoritative legal texts are not presently used to argue a case, we may expect something like that to happen in the not too distant future. My question is whether visualization techniques that engage with a visual and dynamic presentation of jurisprudential developments could invite citizens to participate in the scrutiny of knowledge construction technologies. In other words, I hope that some collaboration at this point will develop.
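As a very first approximation of what such a visualization could look like, consider the sketch below. It is static where the paper asks for a dynamic presentation, and it works on the same invented corpus as above, so it is no more than a hint of the direction such a collaboration could take.

```python
# Sketch of a citizen-facing visualization: project the document vectors of
# a (hypothetical) case-law corpus onto two dimensions so that the clusters
# found by the machine become visible and open to scrutiny.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

judgments = [
    "the appeal on the ground of wrongful dismissal is allowed",
    "appeal dismissed; the contract of employment was lawfully terminated",
    "the seller failed to deliver conforming goods under the sales contract",
    "damages awarded for non-conforming delivery under the contract of sale",
]

vectors = TfidfVectorizer().fit_transform(judgments).toarray()
points = PCA(n_components=2).fit_transform(vectors)   # a 2-D shadow of the data

plt.scatter(points[:, 0], points[:, 1])
plt.title("Hypothetical case-law corpus, projected onto two dimensions")
plt.show()
```

A genuinely dynamic tool would have to show how such clusters shift as new judgments are added, and make the underlying choices (term weighting, projection method) inspectable rather than hidden.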

References

Ashley, K. and S. Brüninghaus (2009). "Automatically classifying case texts and predicting outcomes." Artificial Intelligence and Law 17(2): 125-165.

Berman, H. J. (1983). Law and Revolution. The Formation of the Western Legal Tradition. Cambridge, Massachusetts and London, England, Harvard University Press.

Clement, T., S. Steger, et al. (2008). "How Not to Read a Million Books." Harvard University, Cambridge, MA, from http://www3.isrl.illinois.edu/~unsworth/hownot2read.html.

Collins, R. and D. Skover (1992). "Paratexts." Stanford Law Review 44: 509-552.

Eisenstein, E. (2005 (second edition)). The Printing Revolution in Early Modern Europe. Cambridge/New York, Cambridge University Press.

Glenn, H. P. (2004 (second edition)). Legal Traditions of the World. Oxford, Oxford University Press.

Grover, C., B. Hachey, et al. (2003). Summarising legal texts: sentential tense and argumentative roles. Proceedings of the HLT-NAACL 03 Text Summarization Workshop, Volume 5, Association for Computational Linguistics.

Hildebrandt, M. (2008). A Vision of Ambient Law. Regulating Technologies. R. Brownsword and K. Yeung. Oxford, Hart.

Hildebrandt, M. (forthcoming 2010). "The Indeterminacy of an Emergency: Challenges to Criminal Jurisdiction in Constitutional Democracy." Criminal Law and Philosophy.

Kelsen, H. (2005). Pure Theory of Law. Clark, N.J., Lawbook Exchange.

Koschaker, P. (1997/1947). Europa en het Romeinse recht. Deventer.

Leawoods, H. (2000). "Gustav Radbruch: An Extraordinary Legal Philosopher." Journal of Law and Policy 2: 489-516.

Merkl, D. and E. Schweighofer (1997). En Route to Data Mining in Legal Text Corpora: Clustering, Neural Computation, and International Treaties. Proceedings of the Eighth International Workshop on Database and Expert Systems Applications. IEEE: 465-470.

Ong, W. (1982). Orality and Literacy: The Technologizing of the Word. London/New York, Methuen.

Radbruch, G. (1950). Rechtsphilosophie. Herausgegeben von Erik Wolf. Stuttgart.

Schmitt, C. (2005). Political Theology: Four Chapters on the Concept of Sovereignty. Chicago, University of Chicago Press.

Sculley, D. and B. M. Pasanek (2008). "Meaning and Mining: The Impact of Implicit Assumptions in Data Mining for the Humanities." Literary and Linguistic Computing 23(4): 409-424.

Taylor, C. (1995). To Follow a Rule. Philosophical Arguments. C. Taylor. Cambridge, Mass., Harvard University Press: 165-181.

Wittgenstein, L., G. E. M. Anscombe, et al. (2009). Philosophical Investigations. Malden, MA, Wiley-Blackwell.

