Tiancheng Li's Research Projects

Last updated on November 16, 2009.

Table of Contents

Privacy Preserving Data Publishing
    A Taxonomy of Generalization Schemes for Data Anonymization
    t-Closeness: A New Privacy Measure for Data Publishing
    Privacy in Publishing Dynamic Data
    Background Knowledge in Data Publishing
    Privacy-Utility Tradeoff in Data Publishing
    Slicing: Anonymizing Sparse High-Dimensional Transaction Database
Role Mining and Role Engineering
    Role Mining with Semantics
    Evaluating Role Mining Algorithms
Other Projects
    Trust Preservation and Verification for Regulatory Compliance
    Security and Practicality of Outsourcing Association Rule Mining
    Data Deduplication for Bandwidth Savings

Privacy Preserving Data Publishing

In this information age, organizations and government agencies increasingly collect data for the purpose of analysis. To facilitate that analysis, it is often necessary to publish the data, which poses privacy risks to the individuals. A typical solution is to anonymize the data and release the anonymized version. The goal of data anonymization is to protect the privacy of individuals while still allowing ad hoc queries and analysis on the anonymized data.

A Taxonomy of Generalization Schemes for Data Anonymization

One notion of privacy in data publishing is k-anonymity, which requires each record to be indistinguishable from at least k-1 other records with respect to certain "identifying" attributes. One approach to achieving k-anonymity is generalization, which replaces a value with a "less specific but semantically consistent" value. A major thread of research on k-anonymity has focused on developing more flexible generalization schemes that produce higher-quality data. In this work, we propose three new generalization schemes and present enumeration algorithms and pruning techniques for finding optimal solutions in the new schemes. We then develop a taxonomy of generalization schemes that allow tradeoffs between efficiency and data quality.

Optimal k-Anonymity with Flexible Generalization Schemes through Bottom-Up Searching. Tiancheng Li and Ninghui Li. In IEEE International Workshop on Privacy Aspect of Data Mining (PADM), in conjunction with ICDM, pp. 518-523, 2006.

Towards Optimal k-Anonymization. Tiancheng Li and Ninghui Li. In Data & Knowledge Engineering Journal (DKE), 65:(1) 22-39, 2008.

t-Closeness: A New Privacy Measure for Data Publishing

It was shown that k-anonymity cannot prevent attribute disclosure. The notion of l-diversity has been proposed, which requires that each equivalence class has at least l "well-represented" sensitive values.
In this work, we show that l-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. We propose a novel privacy notion called t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class be close to the distribution of the attribute in the overall data. We measure the distance between the two distributions with the Earth Mover's Distance (EMD), which takes the semantic distance between values into account.

t-Closeness: Privacy Beyond k-Anonymity and l-Diversity. Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. In IEEE International Conference on Data Engineering (ICDE), pp. 106-115, 2007.

Closeness: A New Privacy Measure for Data Publishing. Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. To appear in IEEE Transactions on Knowledge and Data Engineering (TKDE), 2009.

Privacy in Publishing Dynamic Data

Existing solutions in data publishing are limited to static data release. That is, they assume that a complete dataset is available at the time of release. This assumption is a significant shortcoming, because in many applications data collection is a continual process. In this work, we consider incremental data dissemination, where a dataset is continuously extended with new data. Static anonymization (i.e., anonymization that does not consider previously released data) may enable various types of inference. We identify such inference issues and discuss some prevention methods.

Privacy Preserving Incremental Data Dissemination. Ji-Won Byun, Tiancheng Li, Elisa Bertino, Ninghui Li, and Yonglak Sohn. In Journal of Computer Security (JCS), 17:(1) 43-68, 2009.

Background Knowledge in Data Publishing

Recent work has shown the importance of considering the adversary's background knowledge when reasoning about privacy in data publishing. However, it is very difficult for the data publisher to know exactly what background knowledge the adversary has. In this work, we propose a novel approach (called Injector) to model and integrate background knowledge.
Injector mines knowledge (such as negative association rules and probability distributions) from the original data and then uses the mining results as the background knowledge when anonymizing the data.

Injector: Mining Background Knowledge for Data Anonymization. Tiancheng Li and Ninghui Li. In IEEE International Conference on Data Engineering (ICDE), pp. 446-455, 2008.

Modeling and Integrating Background Knowledge in Data Anonymization. Tiancheng Li, Ninghui Li, and Jian Zhang. In IEEE International Conference on Data Engineering (ICDE), pp. 6-17, 2009.

Privacy-Utility Tradeoff in Data Publishing

In data publishing, anonymization techniques are designed to provide privacy protection. At the same time, they reduce the utility of the data, so it is important to consider the tradeoff between privacy and utility. There have been some widely held misconceptions about privacy and utility. For example, at KDD 2008, Brickell and Shmatikov directly compared privacy gain with utility gain and concluded that "even modest privacy gains require almost complete destruction of the data-mining utility". In this work, we analyze the fundamental characteristics of privacy and utility and show that it is inappropriate to compare the two directly. We then propose an integrated framework for evaluating the privacy-utility tradeoff, borrowing concepts from Modern Portfolio Theory for financial investment.

On the Tradeoff Between Privacy and Utility in Data Publishing. Tiancheng Li and Ninghui Li. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 517-526, 2009.

Slicing: Anonymizing Sparse High-Dimensional Transaction Database

Recent work has shown that generalization loses a considerable amount of information, especially for high-dimensional data, while bucketization does not prevent membership disclosure. In this work, we present a novel technique called slicing, which partitions the data both horizontally and vertically. Slicing is a promising technique for anonymizing sparse high-dimensional transaction data.

Slicing: A New Approach to Privacy Preserving Data Publishing. Tiancheng Li, Ninghui Li, Jian Zhang, and Ian Molloy. Unpublished manuscript.
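
As a rough illustration of what partitioning the data "both horizontally and vertically" can look like, the sketch below splits rows into fixed-size buckets (horizontal) and attributes into column groups (vertical). The summary above does not say what happens inside a bucket, so shuffling each column group independently is an assumption made here to break the linkage between groups. The function name `slice_table`, the fixed `bucket_size`, and the hand-picked column groups are also hypothetical. This is a minimal sketch under those assumptions, not the algorithm from the manuscript.

```python
import random


def slice_table(rows, column_groups, bucket_size, seed=0):
    """Return a sliced copy of `rows` (a list of dicts).

    column_groups: lists of attribute names (the vertical partition).
    bucket_size:   number of rows per bucket (the horizontal partition).
    Both are supplied by the caller here; how to choose them is not covered.
    """
    rng = random.Random(seed)
    sliced = []
    for start in range(0, len(rows), bucket_size):
        bucket = rows[start:start + bucket_size]
        pieces = []
        for group in column_groups:
            # Project the bucket onto one column group ...
            values = [tuple(row[name] for name in group) for row in bucket]
            # ... and shuffle it independently of the other groups
            # (assumed step: hides which values belong to the same row).
            rng.shuffle(values)
            pieces.append(values)
        # Reassemble one output row per position from the shuffled pieces.
        for i in range(len(bucket)):
            sliced.append(tuple(piece[i] for piece in pieces))
    return sliced


rows = [
    {"age": 22, "sex": "M", "zip": "47906", "disease": "flu"},
    {"age": 22, "sex": "F", "zip": "47906", "disease": "dyspepsia"},
    {"age": 52, "sex": "F", "zip": "47905", "disease": "bronchitis"},
    {"age": 54, "sex": "M", "zip": "47302", "disease": "gastritis"},
]
published = slice_table(rows, [["age", "sex"], ["zip", "disease"]], bucket_size=2)
```

Within each bucket, every column group keeps exactly the same multiset of values as the original data, so attributes grouped together stay correlated, while the association between groups is hidden inside the bucket.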

Role Mining and Role Engineering

With the growing adoption of role-based access control (RBAC) in commercial security and identity management products, facilitating the migration of a non-RBAC system to an RBAC system has become a problem with significant business impact. According to a study by NIST, building an RBAC system is the costliest part of migrating to an RBAC implementation. Any methodological improvement that reduces the cost of creating an RBAC system will further improve the ROI of RBAC and accelerate its adoption in practice.

Mining Roles with Semantics

Researchers have proposed using data mining techniques for role engineering, but the proposed algorithms are ad hoc. In this work, we propose using the theory of formal concept lattices, which provides a solid theoretical foundation for role mining. Another key problem that has not been adequately addressed by existing approaches is how to discover roles with semantic meanings. We propose to create roles that can be explained by expressions of user attributes when user-attribute information is also available. Since an expression of attributes describes a real-world concept, the corresponding role represents a real-world concept as well.

Mining Roles with Semantic Meanings. Ian Molloy, Hong Chen, Tiancheng Li, Qihua Wang, Ninghui Li, Elisa Bertino, Seraphin Calo, and Jorge Lobo. In ACM Symposium on Access Control Models and Technologies (SACMAT), pp. 21-30, 2008.

Evaluating Role Mining Algorithms

While many role mining algorithms have been proposed in recent years, a comprehensive study comparing them is lacking. Each algorithm was evaluated when it was proposed, but the evaluations used different datasets and evaluation criteria. In this work, we introduce a comprehensive framework for evaluating role mining algorithms. We also introduce a new role mining algorithm and two new ways of algorithmically generating datasets for evaluation. Using synthetic as well as real datasets, we compared 9 role mining algorithms.

Evaluating Role Mining Algorithms. Ian Molloy, Ninghui Li, Tiancheng Li, Ziqing Mao, Qihua Wang, and Jorge Lobo. In ACM Symposium on Access Control Models and Technologies (SACMAT), pp. 95-104, 2009.
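
To make the formal concept view of role mining mentioned above more concrete, the sketch below treats users as objects and permissions as attributes, and enumerates the closed permission sets (concept intents) as candidate roles. It relies on the standard fact that every intent is an intersection of some users' permission sets. The function name `candidate_roles` and the toy data are hypothetical, the empty permission set is skipped, and nothing here ranks the candidates or uses user attributes. It is a minimal sketch, not the algorithm from the papers.

```python
def candidate_roles(user_perms):
    """Map each closed, non-empty permission set to the users who hold all of it.

    user_perms: dict of user name -> set of permissions.
    """
    # Start from each user's own permission set, then close under intersection.
    closed = {frozenset(perms) for perms in user_perms.values() if perms}
    frontier = list(closed)
    while frontier:
        current = frontier.pop()
        for other in list(closed):
            meet = current & other
            if meet and meet not in closed:
                closed.add(meet)
                frontier.append(meet)
    # The extent of a concept is every user whose permissions include the intent.
    return {
        perms: {user for user, held in user_perms.items() if perms <= held}
        for perms in closed
    }


user_perms = {
    "alice": {"read", "write"},
    "bob": {"write", "deploy"},
    "carol": {"read", "write", "deploy"},
}
roles = candidate_roles(user_perms)
```

In this toy input, the shared permission `write` surfaces as its own candidate role even though no single user holds exactly that set, which is the kind of structure the concept lattice exposes.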

Other Projects

Trust Preservation and Verification for Regulatory Compliance

As the number and scope of government regulations and rules mandating trustworthy retention of data keep growing, businesses today face a higher degree of regulation and accountability than ever. Existing compliance storage solutions focus on providing WORM (Write-Once Read-Many) support and, for performance and cost reasons, rely on software enforcement of the WORM property. Such an approach, however, offers limited protection in the regulatory compliance setting, where the threat of insider attacks is high and the data is indexed and dynamically updated (e.g., append-only access logs indexed by the creator). In this work, we propose a solution that can greatly improve the trustworthiness of a compliance storage system by reducing the scope of trust in the system to a tamper-resistant Trusted Computing Base (TCB). We show how trustworthy retention and verification of append-only data can be achieved through the TCB. Because the TCB is resource-constrained, we develop a novel authentication data structure that we call the Homomorphic Hash Tree (HHT). HHT drastically reduces the TCB workload.

WORM-SEAL: Trustworthy Data Retention and Verification for Regulatory Compliance. Tiancheng Li, Xiaonan Ma, and Ninghui Li. In European Symposium on Research in Computer Security (ESORICS), 2009.

Security and Practicality of Outsourcing Association Rule Mining

The recent interest in outsourcing IT services onto the cloud raises two main concerns: security and cost. One task that could be outsourced is data mining. In VLDB 2007, Wong et al. proposed an approach for outsourcing association rule mining. Their approach maps a set of real items into a set of pseudo items, then maps each transaction non-deterministically. In this work, we analyze both the security and the costs associated with outsourcing association rule mining. We show how to break the encoding scheme of Wong et al. without using context-specific information and reduce the security to a one-to-one mapping. We present a stricter notion of security, and then consider the practicality of outsourcing association rule mining. Our results indicate that outsourcing association rule mining may not be practical if the data owner is concerned with data confidentiality.

On the (In)Security and (Im)Practicality of Outsourcing Precise Association Rule Mining. Ian Molloy, Ninghui Li, and Tiancheng Li. To appear in IEEE International Conference on Data Mining (ICDM), 2009.

Data Deduplication for Bandwidth Savings

Data deduplication is a popular dictionary-based compression method in storage archival and backup. Deduplication efficiency improves with smaller chunk sizes; however, files then become highly fragmented, requiring many disk accesses during reconstruction or causing chattiness in a client-server architecture. Within the sequence of chunks that an object (file) is decomposed into, sub-sequences of adjacent chunks tend to repeat. We exploit this insight to optimize the chunk sizes by joining repeated sub-sequences of small chunks into new super chunks, subject to the constraint of achieving practically the same matching performance. We employ suffix arrays to find these repeating sub-sequences and to determine a new encoding that covers the original sequence. With super chunks, we significantly reduce fragmentation, improving reconstruction time and, by lowering the amount of metadata, the overall deduplication ratio.

Block Size Optimization in Deduplication Systems. Cornel Constantinescu, Jan Pieper, and Tiancheng Li. In Data Compression Conference (DCC), pp. 442, 2009.
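
The super-chunk idea above can be illustrated with a small sketch that uses a suffix array over a sequence of chunk identifiers to find the longest run of adjacent chunks that repeats, and then joins each occurrence into one super chunk. The naive suffix-array construction, the helper names `longest_repeated_run` and `merge_run`, and the single greedy merge are simplifying assumptions. The sketch does not reproduce the encoding procedure or the matching-performance constraint from the paper.

```python
def longest_repeated_run(chunks):
    """Return the longest run of adjacent chunk IDs that occurs at least twice.

    chunks: list of comparable chunk identifiers (e.g. hash strings).
    The two occurrences found this way are allowed to overlap.
    """
    n = len(chunks)
    # Naive suffix array: suffix start positions sorted by suffix contents.
    suffix_array = sorted(range(n), key=lambda i: chunks[i:])
    best = []
    for a, b in zip(suffix_array, suffix_array[1:]):
        # Length of the common prefix of two neighbouring suffixes.
        k = 0
        while a + k < n and b + k < n and chunks[a + k] == chunks[b + k]:
            k += 1
        if k > len(best):
            best = chunks[a:a + k]
    return best


def merge_run(chunks, run):
    """Replace non-overlapping occurrences of `run` with one super-chunk tuple."""
    if len(run) < 2:
        return list(chunks)
    merged, i, m = [], 0, len(run)
    while i < len(chunks):
        if chunks[i:i + m] == run:
            merged.append(tuple(run))   # one super chunk instead of m entries
            i += m
        else:
            merged.append(chunks[i])
            i += 1
    return merged


chunk_ids = ["c1", "c2", "c3", "c9", "c1", "c2", "c3", "c7"]
run = longest_repeated_run(chunk_ids)
recoded = merge_run(chunk_ids, run)
```

Each merged run turns several metadata entries into one, which is the effect the text credits for lower fragmentation. A practical system would repeat the search and weigh each merge against its effect on matching.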
