Bioinformatics

Vol. 00 no. 00 2013 Pages 1–2

WebGLORE: A Webservice for Grid LOgistic REgression

Wenchao Jiang 1,2, Pinghao Li 1,2 ∗, Shuang Wang 1, Yuan Wu 1, Meng Xue 2, Lucila Ohno-Machado 1, Xiaoqian Jiang 1

1 Division for Biomedical Informatics, University of California, San Diego
2 Shanghai Jiao Tong University, Shanghai

Received on XXXXX; revised on XXXXX; accepted on XXXXX

Associate Editor: XXXXXXX

∗ Contributed equally to the first authorship.

© Oxford University Press 2013.

ABSTRACT

WebGLORE is a free webservice that enables privacy-preserving construction of a global logistic regression model from distributed, sensitive datasets. It transfers only aggregated local statistics from participants, over Hypertext Transfer Protocol Secure (HTTPS), to a trusted server, where the global model is synthesized. WebGLORE integrates AJAX, Java Applet/Servlet, and PHP technologies to provide an easy-to-use webservice that helps biomedical researchers overcome policy barriers to information exchange.

Availability: http://dbmi-engine.ucsd.edu/webglore3/ WebGLORE can be used under the terms of the GNU General Public License as published by the Free Software Foundation.

Contact: [email protected]

1 INTRODUCTION

In the biomedical context, a major challenge in ensuring sufficient power of a predictive model is obtaining enough samples. This is especially important for scenarios such as rare disease prognosis, where individual institutions do not have enough observations. With the massive adoption of electronic health records, it has become possible to combine information across institutions to build powerful global models. There are, however, practical challenges in combining raw data because of privacy concerns. To address them, researchers have proposed many privacy-preserving models to facilitate data sharing, including models based on generalization, noise perturbation, and a hybrid of both techniques; see the review by Fung et al. [2010]. These models try to enforce confidentiality by introducing "indistinguishability", but they may alter values in the original data. Results based on such perturbed data cannot be reliably trusted in applications such as medical surveillance and healthcare decision support. An alternative is to share models instead of releasing data, through secure multiparty computing (SMC). SMC approaches leverage security-enhanced protocols (e.g., transmitting aggregated statistics) to offer a practical solution and shed light on building accurate predictive models without disclosing sensitive raw data. In this manuscript, we aim to construct a collaborative framework for logistic regression through SMC. We focus on horizontally partitioned data, i.e., locally hosted databases that contain different observations sharing the same attributes (horizontal partitions of stackable sets of patient records), which is the most common setting for cross-institutional studies.

2 METHOD

We have previously studied distributed logistic regression models theoretically, but no practically useful collaborative software framework had been developed for biomedical data analysts to deploy in real healthcare environments. To close this gap, we developed software as a service (SaaS) that allows data analysts to easily construct models across different sites. This work builds on our previous research: Grid LOgistic REgression (GLORE), which builds shared models without sharing data (Wu et al. [2012]), and Expectation propagation logistic regression (EXPLORER), for distributed privacy-preserving online model learning (Wang et al. [2013]).

Fig. 1. A screenshot of the WebGLORE system.

3 FRAMEWORK

To avoid transferring sensitive data, we developed a heavy-client, light-server framework in which all computation involving raw data is conducted locally. The system is platform independent, and the client-side functionality can run in a wide variety of environments to accommodate arbitrary WebGLORE participants. To ensure wide accessibility of the framework, we implemented signed communication between Servlets and Applets in Java.


Participants need only a web browser to join the collaborative construction of a global model through Applets. These Applets are embedded in the webpages and handle local computation, so local data never leave their host institutions. Because only signed Applets can execute and communicate with Servlets, the validity of inputs from participants can easily be checked on the server side.

WebGLORE uses a three-layer structure. The front end consists of AJAX-supported webpages that dynamically reflect the user status (in preparation, prepared, offline) and the task status (created, finished). After passing the validity check (i.e., task name, expiration date, model parameters), an initiator can invite participants who have similar types of data and want to build a GLORE model together. Once a task is created, invited participants receive emails from the server, each containing a unique link (e.g., hashed from the task name and the participant's email). The system also remembers existing collaborators and stores their emails in a list for quick retrieval in the invitation panel. After some participants confirm attendance, the task creator (initiator) can trigger the computation at any time, at which point the server starts to interact with all participants.

When the computation begins, WebGLORE works in an asynchronous manner. In each iteration (Newton-Raphson step), the server combines intermediary results collected from participants, then updates and distributes the globally estimated parameters for the current iteration. The optimization terminates when it meets the accuracy requirement or the number of iterations exceeds the maximum limit.

Each participant can retrieve reports for the global model and for the local model for comparison. These reports contain important statistical graphs and variables, including sensitivity, specificity, area under the ROC curve (AUC), and Hosmer-Lemeshow tests. We also plot the ROC curve and the reliability diagram for the models.
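The Newton-Raphson exchange described above, in which each site contributes only aggregated intermediary results and the server combines them, can be sketched as follows. This is a minimal illustration under stated assumptions, not WebGLORE's implementation: the function names (`local_statistics`, `solve`, `fit_global`), the tolerance, the iteration limit, and the toy data are all hypothetical, the intermediary results are assumed to be the per-site gradient and Hessian contributions, and the HTTPS transport, signed Applets, and asynchronous scheduling are omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_statistics(X, y, beta):
    """Runs at one site: aggregate gradient and Hessian over that site's records only."""
    d = len(beta)
    grad = [0.0] * d
    hess = [[0.0] * d for _ in range(d)]
    for x, yi in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, x)))
        w = p * (1.0 - p)
        for j in range(d):
            grad[j] += (yi - p) * x[j]
            for k in range(d):
                hess[j][k] += w * x[j] * x[k]
    return grad, hess  # only these aggregates would leave the site

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def fit_global(sites, d, tol=1e-8, max_iter=25):
    """Runs at the server: sum per-site aggregates, then take one Newton-Raphson step."""
    beta = [0.0] * d
    for _ in range(max_iter):
        grad = [0.0] * d
        hess = [[0.0] * d for _ in range(d)]
        for X, y in sites:  # stands in for each participant's local computation
            g, h = local_statistics(X, y, beta)
            grad = [a + b for a, b in zip(grad, g)]
            hess = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(hess, h)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:  # accuracy requirement met
            break
    return beta

# Toy horizontally partitioned data: same columns (intercept, x), different rows.
site_a = ([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]], [0, 1, 1])
site_b = ([[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]], [0, 0, 1])
pooled = (site_a[0] + site_b[0], site_a[1] + site_b[1])

distributed = fit_global([site_a, site_b], d=2)
central = fit_global([pooled], d=2)
print(distributed, central)
```

On this toy data the two fits agree up to floating-point rounding, because the server-side sums reproduce the pooled gradient and Hessian at every step.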
Finally, WebGLORE provides each participant with an interface to test additional local observations using the global model. Because WebGLORE does not access individual records, users must handle missing data, check attribute consistency, and ensure data quality themselves.

We chose the myocardial infarction dataset (1,253 records), collected in Edinburgh, UK (Kennedy et al. [1996]), to demonstrate the system. These records were partitioned (314, 313, 313 and 313) across 4 sites. We selected nine non-redundant features from this dataset: pain in left arm, pain in right arm, nausea, hypoperfusion, ST elevation, new Q waves, ST depression, T wave inversion, and sweating. The binary response indicates the presence of disease.

We list two tables from the WebGLORE report. Table 1 summarizes model performance in terms of discrimination and calibration. Metrics such as AUC, F-score, sensitivity, specificity, and type-I error are listed for discrimination. For calibration, the webservice calculates the calibration error, also known as the Brier score (i.e., the mean squared error between predictions and observations), and conducts two types of Hosmer-Lemeshow tests (H-test and C-test) to evaluate goodness of fit. Table 2 shows the estimated parameters. A small p-value (≤ 0.05) indicates that the corresponding parameter plays a significant role in prediction.
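The two calibration measures mentioned above can be illustrated with a short sketch. It assumes the common textbook definitions: the Brier score as the mean squared difference between predicted probabilities and 0/1 outcomes, and the Hosmer-Lemeshow C statistic computed over equal-size groups of records sorted by predicted risk. The function names and the default of ten groups are illustrative assumptions, not taken from WebGLORE; the H-test variant, which conventionally groups by fixed probability cut points instead, is not shown.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def hosmer_lemeshow_c(probs, outcomes, groups=10):
    """HL C statistic: compare observed and expected events per risk-sorted group."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / len(chunk)
        denom = len(chunk) * mean_p * (1.0 - mean_p)
        if denom > 0:  # skip groups whose predictions are all exactly 0 or 1
            stat += (observed - expected) ** 2 / denom
    return stat
```

The statistic is usually compared with a chi-squared distribution with `groups - 2` degrees of freedom to obtain a p-value; that step is left out here to keep the sketch dependency-free.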

Table 1. Summary of the global model performance.

Discrimination            Calibration
AUC = 0.699               Calibration Error = 0.16
S.D. = 0.019              HL C-Test = 0.01
C.I. = (0.662, 0.737)     HL H-Test = 0.51
F-Score = 0.451
Sensitivity = 0.715
Specificity = 0.592
Type-I error = 0.285

Table 2. Statistics on the globally estimated parameters.

Predictor           Beta      SE       Z-stat.   df   p-value   OR
Intercept           -1.0158   0.1940   -5.2370   1    0.0000    N/A
Pain in left arm     0.0799   0.1498    0.5331   1    0.5940    1.0831
Pain in right arm    0.2939   0.1639    1.7933   1    0.0729    1.3417
Nausea              -0.8456   0.2010   -4.2074   1    0.0000    0.4293
Hypoperfusion       -0.4011   0.2840   -1.4126   1    0.1578    0.6696
ST elevation         0.2864   0.2292    1.2494   1    0.2115    1.3316
New Q waves         -1.2899   0.2344   -5.5040   1    0.0000    0.2753
ST depression        0.1638   0.1621    1.0105   1    0.3123    1.1779
T wave inversion    -0.4165   0.1485   -2.8055   1    0.0050    0.6593
Sweating             0.2450   0.1559    1.5721   1    0.1159    1.2777
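The columns of Table 2 are related by standard Wald-test arithmetic, which the sketch below reproduces from Beta and SE alone. It assumes the Z statistic is Beta / SE, the p-value is the two-sided normal tail probability, and the odds ratio (OR) is exp(Beta); the function name `wald_summary` is hypothetical. Because the table's inputs are rounded to four decimals, recomputed values can differ from the printed ones in the last digit.

```python
import math

def wald_summary(beta, se):
    """Return (z, two-sided p-value, odds ratio) for one coefficient."""
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    odds_ratio = math.exp(beta)
    return z, p_value, odds_ratio

# Rows taken from Table 2: Nausea and T wave inversion.
nausea = wald_summary(-0.8456, 0.2010)
t_wave = wald_summary(-0.4165, 0.1485)
print(nausea, t_wave)
```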

The results in the tables above match exactly those of a logistic regression model trained on the combined data from all four sites, which validates our system. In addition, WebGLORE generates visual measures such as the ROC curve and the reliability diagram to illustrate discrimination and calibration, as shown in Figures 2 and 3, respectively.
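The agreement with the centralized model is what the standard logistic regression equations would lead one to expect. The derivation below is a sketch under stated assumptions, not quoted from the GLORE paper: $S_k$ denotes the set of records held at site $k$ (here $k = 1, \dots, 4$), $x_i$ and $y_i$ are the feature vector and binary response of record $i$, and every site evaluates its sums at the same current estimate $\beta^{(t)}$.

```latex
\begin{align*}
p_i &= \frac{1}{1 + \exp(-x_i^{\top}\beta^{(t)})}, \\
g(\beta^{(t)}) &= \sum_{i} (y_i - p_i)\, x_i
  = \sum_{k=1}^{4} \underbrace{\sum_{i \in S_k} (y_i - p_i)\, x_i}_{\text{computed at site } k}, \\
H(\beta^{(t)}) &= \sum_{i} p_i (1 - p_i)\, x_i x_i^{\top}
  = \sum_{k=1}^{4} \underbrace{\sum_{i \in S_k} p_i (1 - p_i)\, x_i x_i^{\top}}_{\text{computed at site } k}, \\
\beta^{(t+1)} &= \beta^{(t)} + H(\beta^{(t)})^{-1}\, g(\beta^{(t)}).
\end{align*}
```

Because both $g$ and $H$ are sums over individual records, adding the per-site aggregates gives the same Newton-Raphson update as pooling all records, so the distributed and centralized estimates coincide up to numerical precision.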

Fig. 2. The ROC plot.

Fig. 3. The reliability diagram.

ACKNOWLEDGEMENT


The authors were funded in part by EDM Forum grant U13 HS19564 and NIH grants 4R00LM011392 and U54HL108460.

REFERENCES

Fung,B.C.M., Wang,K., Chen,R. and Yu,P.S. (2010) Privacy-Preserving Data Publishing: A survey of recent developments. ACM Computing Surveys, 42 (4), 1–53.

Kennedy,R.L., Burton,A.M., Fraser,H.S., McStay,L.N. and Harrison,R.F. (1996) Early diagnosis of acute myocardial infarction using clinical and electrocardiographic data at presentation: derivation and evaluation of logistic regression models. European Heart Journal, 17 (8), 1181–1191.

Wang,S., Jiang,X., Wu,Y., Cui,L., Cheng,S. and Ohno-Machado,L. (2013) Expectation propagation logistic regression (EXPLORER): distributed privacy-preserving online model learning. Journal of Biomedical Informatics, 46 (2), 480–96.

Wu,Y., Jiang,X., Kim,J. and Ohno-Machado,L. (2012) Grid Binary LOgistic REgression (GLORE): building shared models without sharing data. Journal of the American Medical Informatics Association, 19 (5), 758–64.
