IMPLEMENTATION OF DYNAMIC TAXONOMIES FOR CLINICAL GUIDELINE RETRIEVAL

Dennis Wollersheim, PhD Student
Dept. of Computer Science and Computer Engineering, La Trobe University

Abstract
Therapeutic Guidelines Limited (TGL) is an Australian medical guideline publishing house. This paper discusses appropriate ways of using TGL content in a computer implemented guideline (CIG). Existing CIG representations are reviewed in light of TGL features such as the need for incremental development, a strong existing relationship with guideline consumers, broad but shallow guideline content, the need for opportunistic integration into medical records systems, and the desire to increase guideline usage. We suggest that guideline usage will be increased by facilitating information retrieval, and propose Dynamic Taxonomy as an appropriate vehicle for this task. Dynamic Taxonomy is an indexing and browsing scheme that uses existing medical taxonomies to expand the number of paths into the guidelines. The result is browsed via a dynamically generated index tree, and its complexity is kept manageable through a conceptual zoom, which restricts the tree to a related subset of the guidelines and their corresponding index terms.

INTRODUCTION

This work is part of a project aimed at making medical guidelines more effective by bringing them closer to the point of clinical decision making, using computer implemented guidelines (CIGs). There exists a large stockpile of medical guidelines¹, mainly in narrative text format. In this paper, we consider one specific set of guidelines: those produced by Therapeutic Guidelines Limited (TGL), a respected Australian guideline publisher. TGL guidelines are widely recognised in Australia and are gaining popularity overseas, with translations into Chinese and Japanese. They consist of authoritative recommendations distilled from available evidence through expert consensus. The guidelines contain concise information covering a broad range of therapeutic decisions. The paper version of the books is in a handbook format, meant to be used as a quick reference, often within the clinical consultation. A recent advance in guideline production has been conversion to an electronic book format: a hypertext, HTML-based CIG.

TGL’s position in the Australian market gives it particular requirements. Its books are well respected among clinicians; the most popular title (TG: Antibiotic [1]) has sold 50,000 copies. This good relationship with its audience has been fostered over many years, and it is a double-edged sword. TGL has the goodwill and trust of its readers, which means there is leeway to try new things. But with this relationship come high expectations: any new development must be at least as functional as the current product.

Another factor is that TGL is not a research entity. It is in the business of producing guideline products, and this process must continue during any changeover to CIG. This, together with the limited resources available for revolutionary change, augurs the need for a product that can be implemented incrementally.

Thirdly, a TGL CIG is unlikely to have maximum value as a standalone application. Its full power will come when it is integrated into computerised medical records packages. There are many such packages, and even within a single package, capabilities are constantly changing. This gives rise to conflicting expectations. On one hand, the CIG must be able to integrate into a wide variety of packages; on the other, it must provide standalone utility so that it remains valuable even when the integration is poor. This, combined with the well-known poor data quality of the electronic patient record, argues for an opportunistic CIG, which can take advantage of patient data when available and still provide adequate functionality when the data is lacking.

Finally, TGL is not merely motivated by selling books. It started as a government body concerned with inappropriate antibiotic usage, and even today the writers are unpaid; often, their primary motivation is the quality use of medicines. TGL wants the guidelines to change clinicians’ behaviour, and for behaviour change to happen, the books must be read and the CIG consulted. Quality information, or an efficient CIG, is not effective if it is not used, and many medical informatics systems are unusable as delivered [2]. A TGL CIG must be usable in a way that builds on the already established relationship with the clinician. Such a system must have greater usability than the current narrative text format; it cannot be difficult. To use an ancient behaviour change metaphor, it must be more ‘carrot’ and less ‘stick’.

¹ For example, see http://www.guideline.gov.

A REVIEW OF COMPUTER IMPLEMENTED GUIDELINES

We categorise CIGs by level of organisation [3]. One end of the continuum is very similar to paper-based books, with the text merely transposed to a computer screen. This is the present state of the TGL CIG. Improvements on the book include a hyperlinked table of contents, an index, and full-text search. Figure 1 shows an example of the TGL product.

Descriptive CIG

The next level of CIG organisation is ‘described’ text. Examples can be found in the guideline mark-up languages HGML [4] and GEM [5]. Structure is added via mark-up tags, and the original text is left intact. Because the mark-up tags are arbitrary, these languages can represent a wide variety of information, ranging from micro guideline detail to broader summary data. Unfortunately, there are few applications for this type of data. A key contribution of descriptive CIG is an enumeration of the types of information that comprise a guideline.

Algorithmic CIG

Algorithmic CIG is more structured still. It is a set of instructions that allows machine execution of a guideline. Current-generation algorithmic CIGs follow strict execution paths, similar to low-level computer programs.

Figure 1. Current generation TGL CIG: hypertext version of text guidelines.

Most of the energy in the CIG community has been focused on the algorithmic realm. Examples include PROforma [6], GLIF [7], PRODIGY [8], and the Arden Syntax [9]. This work has solved some important problems, such as:
• the usefulness of separating the guideline knowledge base from the execution engine,
• the use of ontologies to represent medical knowledge,
• integration with patient data, and
• some strategies to grapple with human frailties such as incomplete patient data.

A key feature of algorithmic CIG is that it is executable: it takes a set of inputs (including patient data and clinician choices) and returns some output (such as a printed prescription, or other recommendation). This procedure can be arbitrarily complex, so it is especially useful where the human mind gets confused, for example:
• multifactor decisions,
• decisions that have duration over time, or
• multi-part regimens.
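To make executability, and the separation of knowledge base from execution engine, concrete, here is a minimal Python sketch. It is not drawn from any of the systems cited above: the `Rule` type, the `execute` function and both rules are hypothetical and purely illustrative, not clinical advice. Patient data and clinician choices are assumed to arrive together as one dictionary of inputs, and the second rule hints at one way of coping with incomplete patient data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Inputs combine patient data and clinician choices: name -> value.
Inputs = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    """One guideline rule, stored as data in the knowledge base."""
    name: str
    applies: Callable[[Inputs], bool]
    recommendation: str


# Hypothetical knowledge base; illustrative only, not clinical advice.
KNOWLEDGE_BASE: List[Rule] = [
    Rule("trachoma-systemic",
         lambda x: x.get("diagnosis") == "trachoma",
         "systemic therapy required"),
    # Flags incomplete patient data instead of silently proceeding.
    Rule("age-missing",
         lambda x: "age" not in x,
         "record patient age before dosing"),
]


def execute(kb: List[Rule], inputs: Inputs) -> List[str]:
    """Execution engine: return the recommendation of every rule that applies.

    The engine contains no medical knowledge; swapping the knowledge base
    changes the behaviour without touching this function."""
    return [rule.recommendation for rule in kb if rule.applies(inputs)]


if __name__ == "__main__":
    print(execute(KNOWLEDGE_BASE, {"diagnosis": "trachoma"}))
```

Because the rules are plain data, a guideline author could in principle edit the knowledge base without touching the engine, which is the separation the algorithmic CIG literature found useful.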

Algorithmic CIG is very different from narrative text. The former forces the clinician along a rigid path through the guideline, while the latter allows free entry and traversal, and the ability to zoom arbitrarily from detail to overview [3]. The strictness of algorithmic CIG would not be conducive to the existing good TGL-clinician relationship. Translating narrative text into a more structured format would add structure that exposes inconsistencies in text guideline construction (e.g. “dosage for a child”, where child is not defined), but because of the TGL requirements, this additional structure is best added incrementally. Algorithmic CIG, because it requires radical restructuring, is not conducive to incremental implementation.

TGL books are designed to be read selectively. They are references, composed of chunks of information, and so TGL has lavished much work on retrieval strategies. These take the form of book features that tell users where they are, what to look at on a page, and where to go for certain information. In books, retrieval features include the index, table of contents, page and paragraph headers, and font type, size, and colour. These conventions have been made powerful through books’ long relationship with humanity. Algorithms do not have that relationship. Even if browsability were their aim, the foreignness of algorithmic CIG means that it cannot provide the same markers, which leaves algorithmic CIG with few retrieval strategies.

Algorithmic CIG implementations work best on single complex areas of medicine (e.g. asthma and diabetes). TGL guidelines, by contrast, provide concise, simple recommendations and cover broad areas of medicine. Because TGL guidelines are so simple, what algorithmic CIG would offer them can be implemented as mere text filtering routines. A TGL CIG is therefore better aimed at improving retrieval.

IMPROVING RETRIEVAL

As a result of the above thinking, our current goal is to construct CIGs that build on the rich TGL narrative resource and preserve the flexibility of narrative access. We aim to take the strengths of the current narrative text form and make it more usable. That means keeping the guidelines browsable, and keeping the ability to zoom in and out, from micro to macro detail.

In the medical domain, one of the major problems is the large and growing amount of information to deal with. Because of this, usability is directly related to retrievability. The information retrieval (IR) community measures retrievability by precision and recall. Recall is the ability of a system to retrieve all the data relevant to a query: the proportion of relevant items that are retrieved. Precision is its accuracy: the proportion of retrieved items that are relevant.
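The two standard IR measures can be computed directly. The following is a minimal Python sketch for a single query, treating the retrieved and relevant results as sets of guideline chunk names; returning 0.0 when a denominator is empty is a convention chosen for this sketch, and the chunk names in the example are illustrative.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    An empty denominator yields 0.0 (a convention chosen for this sketch).
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    retrieved = {"Conjunctivitis", "Gonococcal conjunctivitis",
                 "Acute cholecystitis", "Hospital acquired meningitis"}
    relevant = {"Conjunctivitis", "Gonococcal conjunctivitis", "Trachoma"}
    # 2 of 4 retrieved chunks are relevant; 2 of 3 relevant chunks were found.
    print(precision_recall(retrieved, relevant))  # (0.5, 0.6666666666666666)
```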

In traditional IR, precision and recall describe a computer system in isolation. Here, we use them to discuss the combination of computer system and human user. In this context, recall is the breadth of the access routes: the number of ways to get at a given piece of information. Precision is the ability to give users exactly the information they are looking for, eliminating noise and extraneous detail.

DYNAMIC TAXONOMIES

Using these definitions, we propose to improve IR through an indexing system based on the notion of a Dynamic Taxonomy (DT). DT was developed as a way of sifting through large amounts of data. At its base, it uses a domain-specific taxonomic hierarchy consisting of concepts connected by is-a relationships. Examples from the medical domain include UMLS² and SNOMED³. Concepts from the hierarchy are used to classify chunks of guideline text, and the hierarchy then serves as an augmented index for guideline chunk retrieval. Navigation is done via the operations of browsing and zooming. Figure 2 shows an example of this process from a simplified medical guideline taxonomy. There is a set of guideline atoms (rectangles). These are categorised (dotted lines) by concepts (circles) in a semantic hierarchy (solid lines).

Figure 2. Medical guideline dynamic taxonomy example. [Diagram: a medical concept taxonomy (Pathogens, Drugs and Conditions, with concepts including Streptococcus pneumoniae, Enterococcus faecalis, procaine penicillin, vancomycin, amoxycillin and Meningitis) classifying guideline atoms (Acute cholecystitis, Conjunctivitis, Hospital acquired meningitis, Gonococcal conjunctivitis, and Meningitis: Haemophilus influenzae type b).]
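A DT needs only two ingredients: the is-a links of the taxonomy and the classification links from guideline atoms to concepts. The minimal Python sketch below uses the concept and atom labels of Figure 2, but the classification links are hypothetical, since the figure’s dotted lines are not recoverable here. The function `browse` (a name of our choosing) performs one browsing step: it lists a concept’s sub-concepts and the atoms reachable beneath it, so atoms behave as the leaves of the tree.

```python
from collections import defaultdict
from typing import Dict, List, Set

# is-a links, child concept -> parent concept (labels from Figure 2).
IS_A: Dict[str, str] = {
    "Streptococcus pneumoniae": "Pathogens",
    "Enterococcus faecalis": "Pathogens",
    "procaine penicillin": "Drugs",
    "vancomycin": "Drugs",
    "amoxycillin": "Drugs",
    "Meningitis": "Conditions",
}

# Hypothetical classification links: guideline atom -> concepts describing it.
CLASSIFIED_BY: Dict[str, Set[str]] = {
    "Acute cholecystitis": {"amoxycillin", "Enterococcus faecalis"},
    "Conjunctivitis": {"amoxycillin"},
    "Gonococcal conjunctivitis": {"amoxycillin", "procaine penicillin"},
    "Hospital acquired meningitis": {"vancomycin", "Meningitis"},
    "Meningitis: Haemophilus influenzae type b": {"Meningitis"},
}

# Invert the is-a links once so that browsing can walk downwards.
CHILDREN: Dict[str, List[str]] = defaultdict(list)
for child, parent in IS_A.items():
    CHILDREN[parent].append(child)


def descendants(concept: str) -> Set[str]:
    """The concept itself plus every concept below it."""
    found = {concept}
    for sub in CHILDREN.get(concept, []):
        found |= descendants(sub)
    return found


def browse(concept: str) -> Dict[str, List[str]]:
    """One browsing step: sub-concepts, and atoms classified at or below here."""
    below = descendants(concept)
    atoms = sorted(atom for atom, concepts in CLASSIFIED_BY.items()
                   if concepts & below)
    return {"concepts": sorted(CHILDREN.get(concept, [])), "atoms": atoms}


if __name__ == "__main__":
    print(browse("Drugs"))
```

Because an atom may be classified by several concepts, the same atom can be reached from more than one branch, which is the source of the extra access paths discussed above.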

² See http://www.nlm.nih.gov/research/umls.html, or Campbell et al. [10].

³ See http://www.snomed.org

Browsing is done by traversing the concept hierarchy tree, with guideline chunks as the leaves of the tree. Zooms are more complex. To zoom, the user chooses a concept from the hierarchy and asks to zoom. The system then presents a revised hierarchy, eliminating unrelated concepts and chunks. Figure 3 shows an example of a zoom on the concept amoxycillin. The atoms categorised by amoxycillin, and the other concepts that categorise these atoms, are kept. All other, unrelated concepts and atoms are discarded, thinning the tree. Further zooms act on the reduced tree [11].

Figure 3. The reduced taxonomic tree and the infobase after a zoom on concept amoxycillin. [Diagram: remaining concepts Pathogens, Drugs, Conditions, procaine penicillin, amoxycillin and Enterococcus faecalis; remaining guideline atoms Acute cholecystitis, Conjunctivitis and Gonococcal conjunctivitis.]
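The zoom can be sketched in a few lines of Python: keep the atoms classified by the zoom concept or one of its descendants, then keep only the concepts (and their ancestors) that still describe a kept atom. This is a minimal sketch, not Sacco’s formulation [11]; it repeats a toy taxonomy so that it runs on its own, and its classification links are hypothetical, chosen so that a zoom on amoxycillin keeps the same three atoms as Figure 3. Because `zoom` returns a thinned tree in the same shape as its input, further zooms act on the reduced tree.

```python
from typing import Dict, Set, Tuple

# is-a links, child concept -> parent concept (labels from Figures 2 and 3).
IS_A: Dict[str, str] = {
    "Streptococcus pneumoniae": "Pathogens",
    "Enterococcus faecalis": "Pathogens",
    "procaine penicillin": "Drugs",
    "vancomycin": "Drugs",
    "amoxycillin": "Drugs",
    "Meningitis": "Conditions",
}

# Hypothetical classification links: guideline atom -> concepts describing it.
CLASSIFIED_BY: Dict[str, Set[str]] = {
    "Acute cholecystitis": {"amoxycillin", "Enterococcus faecalis"},
    "Conjunctivitis": {"amoxycillin"},
    "Gonococcal conjunctivitis": {"amoxycillin", "procaine penicillin"},
    "Hospital acquired meningitis": {"vancomycin", "Meningitis"},
    "Meningitis: Haemophilus influenzae type b": {"Meningitis"},
}


def ancestors(concept: str, is_a: Dict[str, str]) -> Set[str]:
    """The concept itself plus every concept above it."""
    found = {concept}
    while concept in is_a:
        concept = is_a[concept]
        found.add(concept)
    return found


def zoom(focus: str, is_a: Dict[str, str],
         classified_by: Dict[str, Set[str]]
         ) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Zoom on focus; return the thinned taxonomy and the kept atoms."""
    # 1. Keep atoms classified by focus or by a descendant of focus.
    kept_atoms = {atom: concepts for atom, concepts in classified_by.items()
                  if any(focus in ancestors(c, is_a) for c in concepts)}
    # 2. Keep concepts that still describe a kept atom, plus their ancestors.
    kept: Set[str] = set()
    for concepts in kept_atoms.values():
        for c in concepts:
            kept |= ancestors(c, is_a)
    thinned = {c: p for c, p in is_a.items() if c in kept}
    return thinned, kept_atoms


if __name__ == "__main__":
    tree, atoms = zoom("amoxycillin", IS_A, CLASSIFIED_BY)
    print(sorted(atoms))
    print(sorted(zoom("procaine penicillin", tree, atoms)[1]))  # further zoom
```

Zooming on a top-level concept such as Drugs works the same way, because every drug concept has Drugs among its ancestors.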

In more formal terms, the zoom operation selects the chunks classified by the zoom concept (or by descendants of the zoom concept). The taxonomy is then thinned, eliminating all concepts that do not describe the remaining chunks, which facilitates further browsing.

To create the DT, guideline chunks are classified under multiple concepts from the taxonomy. This is done at maximum granularity, aiming for chunks of the minimum length that still preserves comprehensibility; this size minimisation increases precision. For example, Figure 4 shows a chunk of guideline text describing therapy for Trachoma. It would be classified under concepts and . As noted, these concepts reside in the taxonomic tree; for example, Chlamydia trachomatis is-a , and is-a .

DT can be used by TGL in an incremental way. Matching the current index and table of contents terms with terms from a medical taxonomy would provide an initial structure. As we add further classification, recall increases. Precision, on the other hand, increases as we make the chunks smaller.
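The incremental route might begin as in the following minimal Python sketch, which seeds the DT classification from an existing back-of-book index. The index entries, chunk identifiers and term-to-concept table are all hypothetical; in practice the lookup step would match index terms against a taxonomy such as SNOMED or UMLS. Counting the concepts attached to each chunk gives a rough measure of recall in the sense used above (the number of access paths), and it grows as more index terms are matched.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical back-of-book index: index term -> chunks it points to.
BOOK_INDEX: Dict[str, List[str]] = {
    "trachoma": ["chunk-trachoma"],
    "chlamydia": ["chunk-trachoma", "chunk-urethritis"],
    "conjunctivitis, bacterial": ["chunk-bacterial-conj"],
}

# Hypothetical lookup from index terms to taxonomy concepts.
TERM_TO_CONCEPT: Dict[str, str] = {
    "trachoma": "Trachoma",
    "chlamydia": "Chlamydia trachomatis",
}


def bootstrap(book_index: Dict[str, List[str]],
              term_to_concept: Dict[str, str]) -> Dict[str, Set[str]]:
    """Initial DT classification (chunk -> concepts) from matched index terms."""
    classified: Dict[str, Set[str]] = defaultdict(set)
    for term, chunks in book_index.items():
        concept = term_to_concept.get(term)
        if concept is None:
            continue  # unmatched term: left for a later round of classification
        for chunk in chunks:
            classified[chunk].add(concept)
    return dict(classified)


def access_paths(classified: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of concepts leading to each chunk: a rough proxy for recall."""
    return {chunk: len(concepts) for chunk, concepts in classified.items()}


if __name__ == "__main__":
    print(access_paths(bootstrap(BOOK_INDEX, TERM_TO_CONCEPT)))
```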

Chlamydia trachomatis conjunctivitis or Trachoma
Acute infection often resembles acute bacterial conjunctivitis. Chronic or recurrent infection with certain strains may produce the syndrome of trachoma. Conjunctival swabs for antigen detection, PCR/LCR techniques or culture should ideally confirm the diagnosis, as successful treatment requires systemic therapy. There is no evidence to suggest that additional topical therapy provides any benefit.

Figure 4. Text categorised by DT tags.

DT preserves the clarity of narrative text, and improves recall by adding multiple access paths. It also preserves the index and table of contents, which are key narrative retrieval strategies. Because a DT acts on any input tree, filtering based on existing patient data can be used to reduce the initial tree, which would increase precision. For example, if the patient’s presenting problem is respiratory in nature, the initial DT will contain only respiratory concepts. This integration with patient data would also benefit the system by maximising the value of the electronic patient record, and therefore encouraging good data entry practices.
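One way to realise the patient-data filter is sketched below in Python, assuming the record supplies the presenting problem as a concept from the same taxonomy. The taxonomy labels and function names are hypothetical. In keeping with the opportunistic design argued for in the introduction, the sketch falls back to the full tree when the problem is missing or cannot be matched.

```python
from typing import Dict, Optional

# Hypothetical is-a links (child -> parent); labels are illustrative.
IS_A: Dict[str, str] = {
    "Respiratory conditions": "Conditions",
    "Pneumonia": "Respiratory conditions",
    "Asthma": "Respiratory conditions",
    "Eye conditions": "Conditions",
    "Conjunctivitis": "Eye conditions",
}


def in_subtree(concept: str, root: str, is_a: Dict[str, str]) -> bool:
    """True if concept is root or lies below root via is-a links."""
    while concept != root:
        if concept not in is_a:
            return False
        concept = is_a[concept]
    return True


def initial_tree(is_a: Dict[str, str],
                 presenting_problem: Optional[str]) -> Dict[str, str]:
    """Starting tree for the DT, narrowed by patient data when available."""
    known = set(is_a) | set(is_a.values())
    if presenting_problem not in known:
        # Missing or unmatched patient data: show the full tree.
        return dict(is_a)
    return {child: parent for child, parent in is_a.items()
            if child != presenting_problem
            and in_subtree(child, presenting_problem, is_a)}


if __name__ == "__main__":
    print(initial_tree(IS_A, "Respiratory conditions"))
    print(len(initial_tree(IS_A, None)))  # no patient data: full tree
```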

Classification hierarchies

The DT approach provides a way to extract value from the relationship information specified in existing hierarchies. The medical field is rich in such hierarchies, so there is a wide range of possibilities. Examples of possible classifications include:
• existing guideline structure (e.g. table of contents, index, text styles),
• semantic elements (e.g. problem, condition, reaction, warning, reaction description, reaction mechanism, drug, disease, micro-organism),
• elements from existing algorithmic and descriptive guideline formats (e.g. GLIF, Arden, PROforma, GEM),
• existing taxonomies (e.g. SNOMED, UMLS), or
• combinations of the above (e.g. WordNet and the existing index).

The choice of hierarchy needs to be explored further. Firstly, the hierarchy used will have to be in a form that is useful to clinicians, one that complements the way clinicians seek guideline information. Prototyping, and consultation with clinicians, will inform this decision. A second factor in hierarchy choice is the ease of construction. Existing hierarchies such as UMLS were developed as human classification tools, and their suitability as a basis for DT is unknown. It is possible that the existing hierarchies are too inconsistent for use in this manner. On the other hand, the inconsistency may be inconsequential, merely making the index less complete.

Lastly, the assignment of categories to guideline chunks has potential for problems. In some cases, such as the existing table of contents and index, the assignment has already been done; one need merely map those concepts into an existing hierarchy. Another way of categorisation is automatic indexing through noun phrase extraction and matching to existing hierarchies, which has been reported to achieve up to 80% success [12]. Finally, manual indexing is a possibility. It would be labour intensive, but it is feasible given the relatively static nature of the guidelines.
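The automatic indexing option can be approximated, for illustration only, by the Python sketch below. It swaps noun phrase extraction for a much simpler technique, greedy longest-match lookup of word sequences against a lexicon of concept names, so it should not be read as the method evaluated in [12]. The lexicon is hypothetical; the sample phrases come from Figure 4.

```python
import re
from typing import Dict, List

# Hypothetical lexicon: lower-case concept name or synonym -> concept.
LEXICON: Dict[str, str] = {
    "chlamydia trachomatis": "Chlamydia trachomatis",
    "trachoma": "Trachoma",
    "bacterial conjunctivitis": "Bacterial conjunctivitis",
    "conjunctivitis": "Conjunctivitis",
    "systemic therapy": "Systemic therapy",
}


def index_chunk(text: str, lexicon: Dict[str, str]) -> List[str]:
    """Concepts found in text, by greedy longest match on word sequences."""
    words = re.findall(r"[a-z]+", text.lower())
    longest = max(len(name.split()) for name in lexicon)
    found: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, shrinking to one word.
        for n in range(min(longest, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in lexicon:
                if lexicon[phrase] not in found:
                    found.append(lexicon[phrase])
                i += n
                break
        else:
            i += 1  # no concept starts at this word
    return found


if __name__ == "__main__":
    print(index_chunk("Chlamydia trachomatis conjunctivitis or Trachoma", LEXICON))
```

Preferring the longest match means “bacterial conjunctivitis” is indexed as one concept rather than also as plain conjunctivitis; a real system would still need to handle synonyms, negation and word variants.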

CONCLUSION

A system that implements algorithmic CIG for TGL is not impossible; the CIG rules that would be needed are straightforward and relatively simple. The difficulty with such a system, as with many other medical informatics projects, lies in the implementation. We foresee that, with this approach, it would be difficult to build a system that is as well used as the current TGL guidelines. In addition, it is not feasible for TGL to carry out such a revolutionary project. We have therefore chosen a different approach: an implementation that delivers a flexible medical decision support tool, maximising clinical utility without imposing a rigid structure or workflow. It is a framework that uses a powerful index browsing tool to combine the structure of medical taxonomy with an existing medical reference.

ACKNOWLEDGEMENTS

This paper describes research being carried out within projects funded by the Australian Research Council, National Prescribing Service, and Therapeutic Guidelines Limited. The current project would not have been possible without the support and contributions of Dr. Wenny Rayahu, Dr. Bryn Lewis, Dr. Ken Harvey, Dr. Jonathan Dartnell, Assoc. Prof. Teng Liaw, and Ms. Elizabeth Deveny.

References
1. Antibiotic Writing Group for Therapeutic Guidelines, Therapeutic Guidelines: Antibiotic. 11th ed. 2000, Melbourne: Therapeutic Guidelines Ltd.
2. Berg, M., Patient Care Information Systems and Healthcare Work: A Sociotechnical Approach. International Journal of Medical Informatics, 1999(55): p. 87-101.
3. Wollersheim, D., A Review of Decision Support Formats with Respect to Therapeutic Guidelines Limited Requirements. in Ninth National Health Informatics Conference. 2001. Canberra, ACT, Australia: Health Informatics Society of Australia.
4. Hagerty, C.G., et al., HGML: A Hypertext Guideline Markup Language. in Proc. Annual Meeting of the American Medical Informatics Association (AMIA). 2000. Los Angeles.
5. Shiffman, R., et al., GEM: A proposal for a more comprehensive guideline document model using XML. JAMIA, 2000. 7(5): p. 488-498.
6. Fox, J., N. Johns, and A. Rahmanzadeh, Disseminating medical knowledge: the PROforma approach. Artif Intell Med, 1998. 14(1-2): p. 157-81.
7. Peleg, M., et al., GLIF3: the evolution of a guideline representation format. Proc AMIA Symp, 2000: p. 645-9.
8. Sugden, B., et al., The PRODIGY Knowledge Architecture Requirements for Chronic Disease Management in Primary Care. 1999, Stanford Medical Informatics.
9. Jenders, R.A., The Arden Syntax for Medical Logic Systems. 2000.
10. Campbell, K.E., et al., Representing thoughts, words, and things in the UMLS. Journal of the American Medical Informatics Association, 1998. 5(5): p. 421-31.
11. Sacco, G., Dynamic taxonomies: a model for large information bases. IEEE Transactions on Knowledge & Data Engineering, 2000. 12(3): p. 468-79.
12. Nadkarni, P., R. Chen, and C. Brandt, UMLS Concept Indexing for Production Databases: A Feasibility Study. Journal of the American Medical Informatics Association, 2001. 8: p. 80-91.
