Provenance of Exposure: Identifying Sources of Leaked Documents

Christian Collberg∗, Aaron Gibson∗, Sam Martin∗, Nitin Shinde∗, Amir Herzberg§ and Haya Shulman§†

∗Department of Computer Science, The University of Arizona
§Department of Computer Science, Bar Ilan University
†Fachbereich Informatik, Technische Universität Darmstadt/EC-SPRIDE

Abstract: We design a provenance system for documents stored on clouds. The system allows several collaborating individuals to write documents. Provenance allows recovery of information about the sequence of significant events relevant to the documents. Existing provenance systems focus on editing events, such as the creation or removal of document parts. In this work, we introduce provenance of exposure events, which allows identification of one or more individuals who are possible sources of the exposure of a particular version of a document to an external party. Our design provides a practical solution for provenance of documents via not-fully-trusted cloud systems, with support for provenance of both exposure and editing events.

I. INTRODUCTION

Provenance is critical to building trust in data: it provides evidence of how the data was derived, which allows its validity and reliability to be established. The digital provenance of a digital object gives a history of its creation, update and access. There are a multitude of situations where one would like to know the history of a digital object: who created it, who modified it, when and where it was modified, in whose custody it has been since its inception, and so on; see, e.g., [1]. For instance, consider a scientist who, upon reading a scientific paper, questions its conclusions; without a complete history of how the data was collected and the exact sequence of transformations it has gone through, it may not be possible to verify the results. Or consider an accountant performing a financial audit of a major corporation. Without being able to verify when and by whom book entries were modified, the accountant will not be able to trace any irregularities in the accounts. Indeed, the importance of being able to assert the validity of data in scientific work has been highlighted in prior art, e.g., [2]–[4].

The scenarios above, as well as prior art, focus on determining the trustworthiness of the data, i.e., what modifications were applied to it and by whom. Often, it is critical not only to ensure the validity of the data but also to detect who performed an illicit exposure of the data, e.g., to detect exposure of trade secrets or leakage of sensitive medical information. For instance, consider a mobile phone manufacturer who finds the secret blueprints of their yet-to-be-released model published in a trade journal. Without being able to trace the schematics back to the insider (the 'traitor') who divulged the trade secrets, they cannot stop further leaks.

In this work, in addition to addressing the provenance of editing events, which attests to the reliability and trustworthiness of the data, we also introduce the provenance of exposure events, which allows identification of the traitor who exposed a confidential document. We suggest a design which captures those properties, and we show how to extend the OpenOffice suite of open source document editing tools to provide such functionality. Our design and implementation¹ ensure the following properties: (1) detection of misbehaviour, e.g., leakage of a document or its associated provenance, and (2) identification of a traitor (a misbehaving party) or a set of suspects. A side effect of these properties is that benign users have evidence which they can use to attest to their honest behaviour.

II. DIGITAL PROVENANCE MODEL AND SETTING

Conceptually, the principals of a provenance system are users who create and modify digital objects, auditors who query the provenance of an object, and validators who check the validity of an object's provenance; see Figure 1. In practice, the same person can serve in different roles at different times, and auditors and validators can be automatic services rather than individuals. In our model, we also identify a traitor, a malicious user who leaks an object outside the system without the appropriate provenance being collected. Users employ Provenance Enabled Tools to create and modify documents, and auditors use Provenance Query Tools to access provenance information in order to learn about a document's history.
¹ A prototype of our system, called Haathi, can be downloaded from http://haathi.cs.arizona.edu.

A provenance-enabled system contains functions for collecting, storing, validating, and querying provenance. To be practical, these functions must be secure, reliable, efficient, and usable. For example, if it were possible to tamper with (or inadvertently corrupt) the provenance, users could draw incorrect inferences about the authenticity or reliability of the underlying data, potentially with significant real-world consequences.
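As an illustration of the four functions named above, the following is a minimal sketch of a provenance store that collects, stores, validates, and queries records. It is not the Haathi implementation: the class name `ProvenanceStore`, the record fields, and the use of a SHA-256 hash chain to make tampering detectable are all assumptions made for this example.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first record (assumption)


@dataclass
class ProvenanceStore:
    """Toy append-only store; the hash chain is an illustrative assumption."""
    records: list = field(default_factory=list)

    @staticmethod
    def _digest(prev_hash: str, event: dict) -> str:
        # Hash the previous record's hash together with a canonical encoding of the event.
        payload = json.dumps(event, sort_keys=True).encode("utf-8")
        return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()

    def collect(self, user: str, action: str, doc_id: str, version: int) -> dict:
        """Collect one event and store it, linked to the previous record."""
        event = {"user": user, "action": action, "doc": doc_id, "version": version}
        prev_hash = self.records[-1]["hash"] if self.records else GENESIS
        record = {"event": event, "prev": prev_hash,
                  "hash": self._digest(prev_hash, event)}
        self.records.append(record)
        return record

    def validate(self) -> bool:
        """Recompute the chain; any edited or reordered record breaks it."""
        prev_hash = GENESIS
        for record in self.records:
            if record["prev"] != prev_hash:
                return False
            if record["hash"] != self._digest(prev_hash, record["event"]):
                return False
            prev_hash = record["hash"]
        return True

    def query(self, doc_id: str) -> list:
        """Return the recorded history of one document, oldest first."""
        return [r["event"] for r in self.records if r["event"]["doc"] == doc_id]


store = ProvenanceStore()
store.collect("alice", "create", "blueprint", 1)
store.collect("bob", "edit", "blueprint", 2)
print(store.validate())                        # True
store.records[0]["event"]["user"] = "mallory"  # simulate tampering
print(store.validate())                        # False
```

An unkeyed hash chain only detects modification by a party who cannot rewrite the whole chain; a deployment on a not-fully-trusted cloud would presumably also need signatures or an externally held anchor, which this sketch omits.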

Fig. 1. Digital provenance model and involved parties.

III. SECURE PROVENANCE SYSTEM OVERVIEW

Our system, illustrated in Figure 2, is designed to work on office documents, such as text documents in word processors, spreadsheets, drawings, and presentations; however, the ideas and designs carry over to other kinds of digital artifacts as well. In our implementation, we extend the OpenOffice suite of open source document editing tools with a Secure Provenance Library, SPL, to collect digital provenance information in a manner that guarantees the authenticity and integrity of this information. The documents, along with the provenance data, are stored on a cloud and displayed to users in the OpenOffice word processor. The users access the documents and edit them. In order to print, email or copy (to an external memory device) a document, the users have to export it. For any export event, the provenance store is extended with the relevant provenance record, ensuring provenance of exposure, i.e., that the source of every piece of data is accurately recorded. Our design also allows data to be imported into the system, even from unverified sources. For such operations, the provenance indicates that data was exported/imported, but that the target/source was unknown.

Fig. 2. Document export procedure.

A provenance-enabled system must allow data to be exported out of the system while ensuring provenance of exposure, i.e., all export events are registered, and the users cannot email, print or copy documents without exporting them first. We use software protection mechanisms, e.g., [5], [6], to obfuscate the OpenOffice suite in order to ensure security against malicious users who may attempt to circumvent the export operation by reverse engineering the binary of the provenance system. We use watermarking to correlate leaked copies of the documents with the clients that obtained access to them. We summarise the attacker capabilities versus security guarantees in Table I.

TABLE I
ATTACKER CAPABILITIES VS. SECURITY GUARANTEES

Attacker capabilities  | Breaks software protection              | Secure software protection
Breaks watermarking    | Detection of access (traditional goal)  | Detection of export: who exported and which version
Secure watermarking    | Detection of version                    | Detection of traitor by version

IV. SECURITY GUARANTEES

The security guarantees that our design ensures are summarised in Table I. In the case of an exposure event, our system allows identification of a set of suspects who obtained access to the leaked document. Given a leaked document (or a fragment thereof), our system enables identification of the traitor who leaked it.

Each access (i.e., download) to a document is recorded; the provenance is added to the provenance records of the document and is stored on a cloud. If confidential information from some document is leaked, the set of suspects who accessed the document can be obtained from the provenance records stored on the cloud; this holds even if the attacker breaks the software protection.

The software protection of the provenance system ensures that each export of the document is registered. Thus, if a leaked document is found, the set of suspects can be further narrowed to those who exported the document.

To enable detection of a specific corrupt user, or of a set of suspects, we use watermarking of exported documents. Downloaded documents are watermarked such that each user obtains a different watermark; given a leaked copy, the watermark identifies the specific user who leaked that document.
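The narrowing of suspects described in Section IV can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual record format or watermarking scheme: the `Event` type, the `suspects` function, the `protection_intact` flag, and the opaque watermark identifiers are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    user: str
    action: str                      # "access" (download) or "export"
    doc_id: str
    version: int
    watermark: Optional[str] = None  # hypothetical per-user mark ID set on export


def suspects(log, doc_id, version=None, watermark=None, protection_intact=True):
    """Return (evidence_level, sorted users) for a leaked document.

    evidence_level is "watermark", "export", or "access", from strongest to weakest.
    """
    events = [e for e in log if e.doc_id == doc_id]
    if version is not None:
        events = [e for e in events if e.version == version]

    # Strongest case: a watermark recovered from the leaked copy matches a record.
    if watermark is not None:
        marked = {e.user for e in events if e.watermark == watermark}
        if marked:
            return "watermark", sorted(marked)

    # If software protection holds, every export was registered,
    # so only users who exported can be the source.
    if protection_intact:
        exporters = {e.user for e in events if e.action == "export"}
        return "export", sorted(exporters)

    # Otherwise fall back to everyone who accessed the document.
    accessed = {e.user for e in events if e.action == "access"}
    return "access", sorted(accessed)


log = [
    Event("alice", "access", "blueprint", 3),
    Event("bob", "access", "blueprint", 3),
    Event("carol", "access", "blueprint", 3),
    Event("bob", "export", "blueprint", 3, watermark="wm-b3"),
    Event("carol", "export", "blueprint", 3, watermark="wm-c3"),
]
print(suspects(log, "blueprint", 3, protection_intact=False))
print(suspects(log, "blueprint", 3))
print(suspects(log, "blueprint", 3, watermark="wm-c3"))
```

With the sample log, the three calls return progressively smaller suspect sets: all three users who accessed version 3, then the two who exported it, then the single user whose watermark matches.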

REFERENCES

[1] S. Ram and J. Liu, "A new perspective on semantics of data provenance," in Proceedings of the First International Workshop on the Role of Semantic Web in Provenance Management (SWPM 2009), 2009.
[2] S. Rajbhandari, I. Wootten, A. S. Ali, and O. F. Rana, "Evaluating provenance-based trust for scientific workflows," in CCGRID. IEEE Computer Society, pp. 365–372.
[3] S. B. Davidson, S. C. Boulakia, A. Eyal, B. Ludascher, T. M. McPhillips, S. Bowers, M. K. Anand, and J. Freire, "Provenance in scientific workflow systems," IEEE Data Eng. Bull., no. 4, pp. 44–50, 2007.
[4] S. B. Davidson and J. Freire, "Provenance and scientific workflows: challenges and opportunities," in SIGMOD Conference, J. T.-L. Wang, Ed. ACM, 2008, pp. 1345–1350.
[5] C. Collberg, G. Myles, and A. Huntwork, "Sandmark–a tool for software protection research," IEEE Security and Privacy, vol. 1, no. 4, pp. 40–49, 2003.
[6] A. Herzberg, H. Shulman, A. Saxena, and B. Crispo, "Towards a Theory of White-Box Security," in SEC-2009 International Information Security Conference, 2009, http://www.sec2009.org/.
