@ITIM - 9° Congresso Nazionale di Telemedicina ed Informatica Medica -Trieste, 14-16 Dicembre 2008

Statistical web service in medicine

Michaela Gündel, Alessandro Donzelli, Francesco Sicurello

Italian Association of Tele Medicine and Medical Informatics (@ITIM) / International Institute of TeleMedicine (IITIM)

Via Lampugnani, 66, 20033 Desio (Milan), Italy
Tel. +39 0362 627190 - Fax +39 0362 337840
[email protected], [email protected], [email protected]

Abstract

To give medical and clinical experts a tool for running a set of basic statistical analyses on their data, even without experience with statistical software, we developed the data analysis web service presented here. The service operates on data containing two patient groups, such as a group of patients with diabetes and a complementary group of patients without diabetes. It returns a contingency table, the results of a statistical hypothesis test, and basic statistical values such as the minimum, maximum, mode, median and mean of the variables, as well as variance, standard deviation, kurtosis, skewness and correlation coefficients. Future versions may include further statistical analyses, depending on user needs.

1. INTRODUCTION

Medical and clinical experts who work with electronic patient data, e.g., in clinical databases, often have no means of automatically running a simple analysis on these data. Using data analysis software to run statistical tests, evaluate data and test hypotheses is often cumbersome for them. The web service presented in this paper was developed for this reason. Its aim is to give the average medical expert easy access to basic statistical analyses of patient data. So far, no graphical visualization of results is provided, but the service returns statistical indicators that can help the user draw conclusions from the data at hand, for example by testing a null hypothesis of independence between data from two groups of patients.

2. METHODS

The statistical web service was developed with Java 1.5¹, R 2.7.2² and the Java-R Interface (JRI)³. The R statistical language [1] was chosen for the statistical analyses because R provides a large set of already implemented statistical methods, such as statistical hypothesis tests, and is freely available under the terms of the Free Software Foundation's GNU General Public License⁴. JRI provides an interface between Java and R and gives Java code access to all R functionality. Apache Tomcat 6⁵ was chosen as the web container and Apache Axis2⁶ as the web service engine, using the Web Service Description Language (WSDL)⁷ and the SOAP⁸ protocol.

¹ http://java.sun.com/
² http://www.r-project.org/
³ http://www.rforge.net/JRI/
⁴ http://www.fsf.org/licensing/licenses/gpl.html
⁵ http://tomcat.apache.org/
⁶ http://ws.apache.org/axis2/
⁷ http://www.w3.org/TR/wsdl
⁸ http://www.w3.org/TR/soap/


For the data analysis carried out in R, besides the base package [1], the R packages stats [1] (for statistical testing and values such as median, standard deviation and variance) and timeDate [2] (for the calculation of skewness and kurtosis) were used. At runtime, the web service loads a file containing various R functions that were written specifically for the web service. These functions are then called from the Java code through the Java-R Interface.

3. PROGRAM DESCRIPTION

The data analysis web service uses a set of R functions, accessed via Java code, to return a set of statistical data. The analysis is done on two groups of patients, such as patients with and without diabetes. The returned statistics are calculated both within each of the two patient groups (such as minimum or maximum values for each group) and between the two groups (such as the p-value returned by a statistical test). The logical schema of the analysis system (represented in Fig. 1) is divided into three sections: data input, analysis (calculation of basic statistical indicators and statistical testing) and output.

[Figure: diagram with the blocks CLINICAL QUERY, XML to TABULAR, CORE ANALYSIS, TABULAR to XML and DATA PRESENTATION INTERFACE]

Fig. 1: Logical schema of the analysis system

3.1 Data input

Data retrieved by a clinical query from one or more clinical databases are transmitted as a SOAP message structured as an XML file. The file contains a set of variables from a selected sub-population and the values of the same variables from the complementary sub-population. The data are reorganized in a tabular schema and classified according to their format (qualitative or quantitative) (see yellow boxes in Fig. 1).

As an example of data transmission, consider a query that selects "stenosis presence" (a Boolean value where 1 means present and 0 means not present) for female patients with diabetes who are older than 40 years. The SOAP message sent to the statistical tool contains the identifiers of the databases and the kind and value of the selected fields, both for the sub-population of interest and for the complementary patient group. In this example the complementary group consists of all female patients without diabetes, female patients with diabetes aged 40 years or younger, and all male patients.
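The reorganization from XML to a tabular schema can be illustrated with a minimal Python sketch. The paper does not give the message schema, so the element names (`query`, `group`, `patient`), the attribute names and the sample values below are hypothetical, as is the assumption that the set of qualitative variables is declared up front. The service itself is implemented in Java and R, not Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: element names, attributes and values are illustrative only.
PAYLOAD = """
<query database="db01">
  <group name="selected">
    <patient><stenosis>1</stenosis><age>57</age></patient>
    <patient><stenosis>0</stenosis><age>63</age></patient>
  </group>
  <group name="complementary">
    <patient><stenosis>0</stenosis><age>35</age></patient>
    <patient><stenosis>1</stenosis><age>48</age></patient>
    <patient><stenosis>0</stenosis><age>71</age></patient>
  </group>
</query>
"""

# Assumption: the qualitative variables are known in advance.
QUALITATIVE = {"stenosis"}


def xml_to_table(payload):
    """Return {group name: {variable name: [values as strings]}}."""
    root = ET.fromstring(payload)
    table = {}
    for group in root.findall("group"):
        columns = {}
        for patient in group.findall("patient"):
            for field in patient:
                columns.setdefault(field.tag, []).append(field.text.strip())
        table[group.get("name")] = columns
    return table


def classify(variable):
    """Label a variable as qualitative or quantitative."""
    return "qualitative" if variable in QUALITATIVE else "quantitative"


table = xml_to_table(PAYLOAD)
```

Each group becomes a set of columns keyed by variable name, which is the shape the later per-group and between-group calculations need.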

3.2 Calculation of basic statistical indicators

A statistical overview is created by an automated process that depends on the format of each variable. In particular, there are different procedures for qualitative and quantitative data. For qualitative data, a contingency table is generated for each variable. These tables can be used to record and analyse the relationships between variables. They contain the counts of each possible value of the variable and the corresponding percentages for each sub-group of patients. For quantitative data, the minimum, maximum, mode, median and mean of each variable are returned for each sub-group of patients, as well as the variance and the standard deviation (measures of dispersion of the data). Kurtosis (a measure of the "peakedness" of the probability distribution) and skewness (a measure of the asymmetry of the probability distribution) are calculated, too. Furthermore, for each variable inside the groups of patients, the correlation coefficient is calculated as a measure of the linear relationship between the values.
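As an illustration of these indicators, the following Python sketch computes them with the standard library. It is not the R code used by the service. Skewness and kurtosis are computed here from population moments (kurtosis as excess kurtosis), which is an assumption: the paper does not say which definitions are used, and other normalizations give slightly different numbers. The function names and group labels are hypothetical.

```python
import statistics
from collections import Counter


def describe(values):
    """Basic indicators for one quantitative variable in one patient group."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments with divisor n (assumed definition, see lead-in).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "min": min(values),
        "max": max(values),
        "mode": statistics.mode(values),
        "median": statistics.median(values),
        "mean": mean,
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "sd": statistics.stdev(values),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,  # excess kurtosis
    }


def contingency(selected, complementary):
    """Counts and percentages of each value of a qualitative variable per group."""
    levels = sorted(set(selected) | set(complementary))
    table = {}
    for name, values in (("selected", selected), ("complementary", complementary)):
        counts = Counter(values)
        table[name] = {
            level: (counts[level], 100 * counts[level] / len(values))
            for level in levels
        }
    return table
```

For example, `describe([2, 4, 4, 5, 10])` yields a mean of 5 and a sample variance of 9, and `contingency([1, 0, 1, 1], [0, 0, 1, 0])` yields one row of counts and percentages per group.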

3.3 Statistical testing

For qualitative data, a chi-square test [3, 4], which tests a null hypothesis of independence between rows and columns, is carried out on the patient groups to evaluate whether the distribution of the variable differs significantly between the sub-groups. Significance is measured by the p-value, i.e., the probability of obtaining a result at least as extreme as the one actually observed, given that the null hypothesis is true. For the comparison of quantitative variables between patient groups, a two-sided Student's t-test is carried out. This test can be used to determine whether there is a statistically significant difference between the mean values of the underlying distributions, which are assumed to be normal. The test returns the value of the t-statistic, the degrees of freedom, the p-value and the confidence interval as an indicator of the reliability of the estimate.
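The two test statistics can be sketched in Python as follows. This is an illustration under stated assumptions, not the R code used by the service: the chi-square function handles only a 2x2 table and applies no continuity correction, and the t-statistic uses the pooled-variance (equal variances) form, since the paper does not state which variant is used. The p-value of the t-test is omitted because it requires the t distribution, which the Python standard library does not provide.

```python
import math
import statistics


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, degrees of freedom, p-value). No continuity
    correction is applied (assumption).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, 1, p_value


def student_t(x, y):
    """Two-sample t-statistic with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / df
    standard_error = math.sqrt(pooled * (1 / nx + 1 / ny))
    t = (statistics.fmean(x) - statistics.fmean(y)) / standard_error
    return t, df
```

For the table `[[10, 20], [20, 10]]` every expected count is 15, so the statistic is 4 × 25/15 = 20/3 and the p-value is just below 0.01. For `x = [1, 2, 3, 4, 5]` and `y = [3, 4, 5, 6, 7]` the pooled variance is 2.5, the standard error is 1 and the t-statistic is -2 with 8 degrees of freedom.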

3.4 Data output

Independently of the type of the variables, the web service returns the resulting statistical indicators in a uniform way; i.e., the format of the returned result is the same for qualitative and for quantitative data. Whenever a value cannot be calculated (for example, a minimum or maximum value in the case of qualitative data), "NA" is returned instead. Thus, the result contains the contingency table, the p-value resulting from the statistical test, the value of the test statistic, the confidence intervals and the degrees of freedom of the statistical test. In addition, the minimum, maximum, mode, median and mean of the variables are returned, together with the variance and the standard deviation of the distributions, the calculated kurtosis and skewness values and, for every variable inside the patient groups, the correlation coefficient. After all variables have been analysed, the results are formatted according to an XML schema using the same procedure that reorganizes data from XML to a tabular schema. Finally, the results are transmitted as a SOAP message to the calling interface, where the data can be represented and visualized.
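The uniform result format with "NA" placeholders can be sketched in Python as follows. The paper does not give the output schema, so the element names and the field list below are hypothetical; only the rule that missing values are returned as "NA" comes from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical field list: every variable reports the same fields.
FIELDS = ["min", "max", "mode", "median", "mean", "variance", "sd",
          "kurtosis", "skewness", "p_value", "statistic", "df"]


def to_xml(variable, indicators):
    """Serialize the indicators of one variable, filling gaps with "NA"."""
    node = ET.Element("variable", name=variable)
    for field in FIELDS:
        value = indicators.get(field)
        child = ET.SubElement(node, field)
        child.text = "NA" if value is None else str(value)
    return ET.tostring(node, encoding="unicode")


# A qualitative variable has no minimum or maximum, so those fields become "NA".
result = to_xml("stenosis", {"p_value": 0.01, "statistic": 6.67, "df": 1})
```

Because every variable produces the same set of elements, a client can parse the result without first checking whether the variable is qualitative or quantitative.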

4. FUTURE DEVELOPMENTS

In general, anything that can be done with R and its packages is a candidate for future improvements, which will be made when users need further or more sophisticated analyses. Examples of possible future developments include the integration of clustering algorithms to cluster patient data into groups showing similar phenotypes. For the "average" user, a Graphical User Interface (GUI) that accepts data input from the user and acts as a mediator between the user and the web service might also be of interest. This would give the user easy access to the tool and the possibility of having the data visualized in tables or rendered graphically.

5. CONCLUSIONS

The data analysis web service presented in this paper was primarily designed for the analysis of data from clinical databases. However, it is not restricted to clinical data and can be used for any other kind of data that is either qualitative or quantitative and contains two groups for which statistical hypothesis testing should be done. Fields where the web service might also be of interest include other statistical analyses in the life science domain in general, such as for microarray data, or even data from economics. The tool can also be used by non-statisticians, such as medical doctors who have a set of clinical data at hand and would like to run a basic statistical analysis on them. The drawback of this simplicity and generality in the current version of the web service is the limited range of analyses the tool performs. However, future versions might include further kinds of analyses, depending on the needs of the users.

6. BIBLIOGRAPHY

[1] R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
[2] Diethelm Wuertz and Yohan Chalabi (2008). timeDate: Rmetrics - Chronological and Calendarical Objects. R package version 280.79. http://www.rmetrics.org
[3] Hope, A. C. A. (1968). A simplified Monte Carlo significance test procedure. J. Roy. Statist. Soc. B 30, 582–598.
[4] Patefield, W. M. (1981). Algorithm AS159. An efficient method of generating r x c tables with given row and column totals. Applied Statistics 30, 91–97.
