Title: Query-driven Ontology for BigData
Authors: Cuadra, D., Castro, E., Morato, J., Albacete, E., Calle, J.
Emails: {dcuadra,ecastro,jmorato,ealbacet,fcalle}@inf.uc3m.es
Affiliation (all): Carlos III University of Madrid
Abstract: This paper focuses on improving the semantics of queries through a linguistic ontology. Interpreting the data together with their linguistic relationships eases query formulation and can also support powerful analyses over Big Data. We propose a way to improve queries using natural language tools. Our approach is twofold: on one hand, an ontology is used to capture the user's context; on the other hand, reliability predictors are included in the analysis so that the system answers with relevant data. As a result, the proposal brings together linguistic and retrieval technologies to provide an innovative approach to a common concern.
Type of contribution: regular paper
Keywords: BigData, Ontology, knowledge model, knowledge base population, trustworthiness


1 Introduction

Big Data is a highly scalable paradigm that emerged from the need of companies such as Yahoo and Google to increase their competitiveness by improving search engines and enabling fast, smart data analysis through parallel and distributed computing. It can handle datasets so vast and complex that they become difficult to process with traditional databases. Despite this short history, these platforms have been widely deployed in the data processing of modern information systems in general, and in particular they have revolutionized and enriched the geographic world and spatial databases, to the point that many experts claim that Big Data is here to stay and will keep growing in the future [25][11]. This paradigm focuses on using the data to run powerful analytical processes, taking into account different data structures, heterogeneity, dynamicity, multimodality and scalability. Big Data rests on three main foundations: volume, variety and velocity.

In 2012, twenty-five experts on database technologies discussed the opportunities and challenges posed by Big Data for the Semantic Web, Semantic Technologies, and DB communities [7]. As a result, the researchers expressed their concern that most of these data are often not made meaningful: the challenge lies not only in the architecture behind Big Data, but also in the semantics involved and in the access to meaningful data. Specifically, they addressed points related to query answering, entity resolution, data quality, controlled vocabularies, mappings and rankings.

Big Data systems need ways to access and interact with massive volumes of data, and dealing with such amounts involves new access strategies. Natural language interaction provides a powerful mechanism for querying even more than one Big Data system at a time. Therefore, the main objective of this paper is to increase the quality and semantic value of the queries used to consult massive unstructured datasets.
This semantic value will improve system performance in terms of the accuracy of analytical results, facilitating their future use and management. The approach proposes to design and develop mechanisms to discover the structures and semantic relationships underlying these queries and datasets. Such relationships are common in these resources but often hidden, especially when the data come from heterogeneous sources such as public web pages or social networks, or are acquired by crowdsourcing. This goal must be achieved without compromising critical requirements such as the system's efficiency in processing queries. These mechanisms include automatic labeling, query analysis, structural analysis of warehouses, semantic association among nodes, detection of spurious information, and verification and characterization of sources. They will require the support of a multidimensional linguistic ontology that allows semantic relationships to be discovered by defining dimensions in semantic labels and applying semantic similarity between concepts. Finally, the information will be ranked under different criteria to facilitate its use, not only through semantic navigation but also by ranking the trustworthiness of these datasets.

In this paper, a multidimensional linguistic ontology is presented. It is the main support for implementing the mechanisms presented above and, therefore, the first step towards greater quality and semantic value of the information contained in a Big Data system. The paper is structured as follows: first, we present the state of the art on research related to information extraction, linguistic ontology modeling and mechanisms to populate ontologies automatically, Big Data querying, and linguistic clues to assess reliability. The next section presents the proposal, covering both the ontology model and the mechanisms used to discover the semantic relationships among the concepts loaded from different sources to populate the ontology. Finally, we conclude with the ongoing work and expected results.

2 State of the art

Question answering techniques involve identifying the keywords of the query and the relationships between those elements. This is carried out by means of information extraction algorithms that extract relevant semantic relationships and the context in which they are expressed. Below we summarize the most relevant techniques presented in the literature for learning semantic relations among concepts. In most cases, these techniques fall into two categories: those that learn taxonomic relations and those that learn non-taxonomic relations between concepts. On one hand, many approaches to learning taxonomic relations have been developed in order to organize domain concepts into taxonomies, as detailed in [9]. On the other hand, although they have received less attention, some studies focus on identifying non-taxonomic relations [31], [27].

Techniques for finding taxonomic (or hierarchical) relations are generally classified into three groups: pattern-based, clustering-based, and combinations of both. Pattern-based techniques define a set of lexico-syntactic patterns, usually by hand, which are then applied to texts to obtain instances of taxonomic relations; an example of this technique is presented in [17]. Linguistic patterns have been extensively used to develop unsupervised information extraction and knowledge acquisition systems. These approaches use regular expressions that indicate a relation of interest within the text. General lexico-syntactic patterns can be

designed by hand or learned from a set of pre-related concepts and domain texts. One of the most important successes of pattern application is the discovery of taxonomic relationships. Hearst [14] studied and defined a set of domain-independent patterns for hyponymy discovery which have provided the basis for further refinements and learning approaches. Since Hearst's work [14], and with the development of information extraction techniques, the number of indicators available to build more complex linguistic patterns has increased. These indicators add, to Hearst's grammatical and syntactic categories, elements such as lists of trigger words, generic words and synonyms, and lists of terms associated with a named entity. From these data, pattern generation can be improved either manually or with descriptive statistics. Manual generation involves long processing times, domain experts, and inflexible patterns with little recall and low portability to other domains. In supervised learning, patterns are generated automatically from manually classified observations in part of the corpus; typical algorithms are HMMs, decision trees, maximum entropy, Naive Bayes, SVM and CRF. On the other hand, clustering-based techniques are used to find taxonomic relations between concepts [24, 26]. These combined approaches first apply lexico-syntactic patterns to the text and then use clustering techniques to filter the extracted taxonomic relations. Although these approaches are more flexible than manual ones and easily identify difficult patterns, they suffer from problems such as overfitting, the selection of attributes to analyze, incomprehensible generated models, and the cost of annotating the training corpus. Another technique is bootstrapping, a semi-supervised form of learning from examples (patterns or entities) that automatically generates patterns from known entities. Its combination with linguistic information and thesauri gives good results. Beyond these approaches, the question that arises is whether all the statements that form the corpus are equally reliable.

In most related works, the relationships detected between concepts are specifically "is-a" and "part-of" relationships. However, the proposal in [2] defines more relationships between concepts and achieves better results in the calculation of semantic similarity by taking more linguistic relationships into account. Some research efforts are addressed at providing more semantics in Big Data. One remarkable work is the Optique Project [3], which focuses on using an ontology to capture user conceptualizations, and declarative mappings to transform user queries into complete, correct and highly optimized queries over the data sources. This project is built on open standards and protocols such as RDF, OWL, SPARQL, the OWL API, and the openRDF project. The SMART project was a pioneering work on QA, conducted for 25 years at MIT. Its goal was to implement a question-answering system mapping interrogative sentences to data in a DB; recently, the project has focused on Big Data [15]. The proposal presented here is supported by a more complete linguistic ontology based on human communication, which improves both user queries and the quality of the stored data. The data will be cleaned, filtered and verified to increase the trustworthiness of the analytical results.

Web credibility studies have a long history [30]. The work in [10] studies medical resources, but only based on elements that users identify as credibility markers. Since 2003 this field has been extended to the Semantic Web [13], [12], trying to analyze the trust layer. This layer is reflected in Big Data in

the so-called veracity aspect proposed by IBM in 2011 [16], although the implementation of these layers has been very limited and the available tools are scarce [23]. In 2012, Assaf and Senart [4] created a list of factors that may influence the reliability of data. More recently, Lukoianova and Rubin [19] defined a veracity index to assess trust and credibility. The reliability of resources is a need frequently identified as critical in datasets [18, 21]. While there are techniques that assess the quality of unstructured resources through linguistic markers [23] and measure the appropriateness of a collection for analysis [29], these technologies have yet to be validated for their inclusion in Big Data. Such algorithms and heuristics may be used to rank the output, in the same way that web search engines do [7, 22].

Therefore, the proposal presented here aims to give linguistic support to querying a Big Data system through a multidimensional linguistic ontology that allows semantic relationships to be discovered by defining dimensions in semantic labels and applying semantic similarity between concepts. In addition, the information will be ranked under different criteria to facilitate its use, not only through semantic navigation but also by ranking the trustworthiness of these datasets, extending the works [11, 12] to Big Data features.

3 Proposal

The core concern of this work is to develop a strong ontology to support diverse processes related to Big Data systems: enhancing queries across non-structured storages by means of similarities between terms and between concepts in a domain; facilitating the proper interpretation of queries expressed in natural language; and assisting data refinement and accurate storage throughout the acquisition processes. To attain these goals, the knowledge needed goes beyond common ontology models. The proposal is presented in three steps: the ontological approach, knowledge acquisition, and the trustworthiness of the knowledge sources.

3.1 Multi-dimensional Ontology Focused on the Calculation of Similarities

The ontological design applied here is based on several knowledge dimensions described in [8]: semiotic, sort (is-a), compositional (part-of), essential, restrictive, descriptive, and comparative. The first six enclose specific knowledge, while the last one is derived from the others and constitutes the main strength of the approach: the calculation of similarities between terms and concepts in the human interaction domain. The approach was evaluated in [2], proving its advantages in similarity calculation. Therefore, it will be taken as a basic supporting tool in this proposal and is briefly described next.

The knowledge base gathers seven dimensions, including some well-known and widely applied ones, such as the sort dimension (concept is-a concept) or the compositional dimension (concept is part-of concept). Others are less frequent, but not unique to this approach. The semiotic dimension covers knowledge regarding the relation between concepts and terms, but also the grouping of sets of terms into languages and the relationships between languages. Concepts in this dimension are identified with WordNet synsets, so the ontology can easily be applied in conjunction with many tools. The fourth one, the essential dimension, represents the general taxonomy of concepts. This dimension requires a design suited to the purpose of the ontology; for example, WordNet's essential dimension classifies concepts into four main linguistic categories (verb, noun, adjective, and adverb). When evaluating this model, the authors designed a taxonomy suited to human-like interaction, which improved the results of the similarity calculation experiments. That design organized 25 concepts (regarding classes) in a hierarchy with 16 leaf nodes.

The singular dimensions of the approach are called restrictive and descriptive. The restrictive dimension describes the compatibility between concepts in a single statement. For example, the action 'to swim' is directly related to the concepts 'sea', 'lake', 'river', … but restricted with regard to the rest. This dimension helps to understand and apply context, as well as to process and interpret some tropes. Finally, the descriptive dimension covers the relationship among three kinds of concepts: generic concepts, attributes (likely to characterize the first concept), and the domains (of values) on which attributes are defined. It also covers the relationship between domains for the same attribute (enabling the translation of terms between those domains by means of fuzzy labels, subject to a certainty value).

The calculation of similarities is performed separately for each dimension, and the results are then combined. The aggregation observes all dimensions with different weights. Those weights are not constant: they are initially trained (for each concept, for each user) and later evolve through interaction with the system.
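As a rough sketch (not the implementation of [2]), the weighted aggregation of per-dimension similarities could look like the following. The dimension names, the weight values, and the plain weighted average used here are illustrative assumptions for the example.

```python
def aggregate_similarity(dim_scores: dict, weights: dict) -> float:
    """Combine per-dimension similarity scores (each in [0, 1]) into one
    value, weighting each dimension differently."""
    total = sum(weights.get(d, 0.0) for d in dim_scores)
    if total == 0:
        return 0.0
    return sum(s * weights.get(d, 0.0) for d, s in dim_scores.items()) / total

# Hypothetical scores for a concept pair on three of the seven dimensions.
scores = {"sort": 0.8, "compositional": 0.4, "descriptive": 0.9}
# Weights would be trained per concept and per user, then evolve with use.
weights = {"sort": 0.5, "compositional": 0.2, "descriptive": 0.3}
print(round(aggregate_similarity(scores, weights), 3))  # 0.75
```

In the trained settings described above, these weights would be updated after each interaction rather than fixed by hand.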
This is supported by the fact that some facets are more relevant for certain concepts; for example, for some concepts the descriptive dimension should be weighted more than the sort dimension. Additionally, some types of users pay more attention to certain facets of the concepts. The authors proposed and evaluated three sorts of training: concept-centered, user-centered and hybrid. For all three, the results improved with use (the more training and experience, the more accurate the results the system produced).

3.2 Focused Population Techniques

The main drawback of this ontological approach is acquiring the knowledge, especially for the rarest dimensions. Knowledge bases should not be populated directly by experts, because the vast amount of knowledge required would entail an unaffordable cost (nevertheless, the authors of that proposal provide an implementation of the knowledge bases and a knowledge editor, distributed under a GNU license). Instead, we propose to acquire all that knowledge from free linguistic resources and, above all, from diverse natural language sources.

Regarding the latter, several crawlers will be set up. On one hand, crawlers parse the structural definitions of diverse storages (relational scripts, JSON, etc.) and queries. Those definitions are usually well organized and their labels meaningful, but they provide little knowledge. On the other hand, crawlers parse text in search of patterns (in natural language) used for describing concepts (related terms, common relationships with other concepts, etc.). These crawlers are set to parse and navigate through any documental base (sets of documents, hypertexts, etc.). Finally, a human-like interaction sub-system is set up to acquire this knowledge from collaborating users. While interacting, this sub-system seeks gaps in the knowledge across all dimensions related to the concepts occurring in the dialogue with the user. When a gap is found, the system queries the user about that knowledge. The style of the query depends on whether the aimed knowledge is a completely new fact (e.g., "I don't know what you mean by 'phone'." or just "What is a phone?"); a disambiguation in a context ("With 'frame' you mean the 'goalmouth', right?"); or a reinforcement of a concept with weak reliability ("By 'phone' you mean a device with headset and keyboard, don't you?"). This interactive sub-system is currently implanted in a game, and it is planned to be integrated into a console for querying a Big Data installation through natural language.

Given the nature of the diverse sources, the acquired knowledge will vary in relevance and, even worse, in reliability. Because of this, it is compulsory to add proper mechanisms to supervise the acquired knowledge, which will be the role of experts in this proposal, as shown in Fig. 1.

Fig. 1. BigSemData proposal

However, the amount of knowledge to be supervised is still too high, again boosting the need for experts (supervisors) to an unaffordable cost. Thus, it is proposed to supervise only the most obscure cases (chosen by the system through a set of rules). Facts coinciding on the same matter will therefore be contrasted automatically, taking into account that reliability is not the same for every fact. When a candidate fact is acquired, its initial reliability value will be calculated from the confidence in the source, also known as trustworthiness. When candidate facts are analyzed, the reliability value will be reinforced (if the facts agree) or weakened (if they disagree) to some extent. Focusing on the confidence in a source, its initial trustworthiness value is set depending on the nature of the source; but as candidate facts are acquired from that specific source, the trustworthiness value evolves according to the success of those facts when incorporated into the knowledge bases. Therefore, mechanisms supporting that evolution through reputation techniques are required.
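The reinforcement mechanism described above can be sketched as follows. The update rule (simple bounded increments) and the step constant are assumptions made for illustration, not the paper's actual algorithm.

```python
class CandidateFact:
    """A candidate fact whose reliability starts at the source's
    trustworthiness and is reinforced or weakened as other facts
    agree or disagree with it (an assumed, simplified update rule)."""

    def __init__(self, statement: str, source_trust: float):
        self.statement = statement
        # Initial reliability comes from the confidence in the source.
        self.reliability = source_trust

    def reinforce(self, agrees: bool, step: float = 0.1):
        """Strengthen reliability when another fact agrees, weaken otherwise,
        clamping the value to [0, 1]."""
        delta = step if agrees else -step
        self.reliability = min(1.0, max(0.0, self.reliability + delta))

fact = CandidateFact("Madrid is-a city", source_trust=0.6)
fact.reinforce(agrees=True)   # another source states the same fact
fact.reinforce(agrees=True)
fact.reinforce(agrees=False)  # one source disagrees
print(round(fact.reliability, 2))  # 0.7
```

A reputation mechanism for sources would then feed the success rate of their facts back into `source_trust` for subsequent acquisitions.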

Finally, as mentioned before, supervisors have the last word in validating new knowledge or reviewing what is already stored. In order to reduce costs, this task will be restricted to the obscure cases among candidate facts, and to already stored facts causing some problem through use. There is another sort of stored fact requiring supervision (low-reliability facts requiring reinforcement), but this reinforcement will be left to high-reputation users interacting with the system.

3.3 BigSemData Crawler: Loading of Candidate Facts

This crawler relies on a set of patterns capable of extracting ontological knowledge for a given dimension from sentences in natural language. Following the proposal presented in [14], several patterns to detect hyponymy relationships between concepts were identified and implemented. As an example, Table 1 shows some lexico-syntactic patterns that can be used to detect the taxonomical relation. The first column indicates the pattern (NP stands for noun phrase), the second a sentence exemplifying the pattern, and the third the hyponym/hypernym concepts conforming the taxonomical relation.

Pattern                             | Example                 | Hyponym(NP1, NP2)
Such NP as {NP,}* {and|or} NP       | Such cities as Madrid   | Hyponym(Madrid, city)
NP {,} such as {NP,}* {and|or} NP   | Countries such as Spain | Hyponym(Spain, country)
NP {,} including {NP,}* {and|or} NP | Pets including turtles  | Hyponym(turtle, pet)
NP {,} specially {NP,}* {and|or} NP | Plants specially tulips | Hyponym(tulip, plant)
NP {,} {and|or} other NP            | Jeans and other clothes | Hyponym(jeans, clothes)

Table 1. Hearst patterns with examples
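Hearst-style patterns like those in Table 1 can be approximated with regular expressions. The sketch below is a deliberate simplification: single words stand in for noun phrases, and only three patterns are shown; a real implementation would run over parsed NPs.

```python
import re

# Simplified Hearst patterns; single \w+ tokens approximate noun phrases
# (an assumption for illustration).
HEARST_PATTERNS = [
    # "such NP as NP", e.g. "such cities as Madrid"
    re.compile(r"such (?P<hyper>\w+) as (?P<hypo>\w+)", re.IGNORECASE),
    # "NP such as NP", e.g. "countries such as Spain"
    re.compile(r"(?P<hyper>\w+) such as (?P<hypo>\w+)", re.IGNORECASE),
    # "NP including NP", e.g. "pets including turtles"
    re.compile(r"(?P<hyper>\w+) including (?P<hypo>\w+)", re.IGNORECASE),
]

def extract_hyponyms(sentence: str):
    """Return (hyponym, hypernym) candidate pairs found in a sentence."""
    pairs = []
    for pattern in HEARST_PATTERNS:
        for m in pattern.finditer(sentence):
            pairs.append((m.group("hypo").lower(), m.group("hyper").lower()))
    return pairs

print(extract_hyponyms("Such cities as Madrid attract tourists."))
# [('madrid', 'cities')]
```

Each extracted pair would then be loaded as a candidate fact for the sort (is-a) dimension, subject to the reliability assessment described later.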

One of the most relevant non-taxonomic relations between concepts is the part-of (or part-whole) relation. Linguistic patterns have been extensively employed to express part-of relationships. In the same way as in [14], other patterns have been defined in order to discover part-of relationships between concepts; some of the studies aiming to define meronymy relations within text are [6]. Table 2 presents some English patterns that represent a part-of relationship between two concepts.

Pattern                         | Example                    | Meronym(NP1, NP2)
NP's NP                         | Car's engine               | Meronym(engine, car)
NP of {the|a|an} NP             | Screen of the computer     | Meronym(screen, computer)
NP in {the|a|an} NP             | Radio in a car             | Meronym(radio, car)
NP of NPs                       | Speed of processors        | Meronym(speed, processor)
NP in NPs                       | Cache in processors        | Meronym(cache, processor)
NP have|has|had NP              | Plant has leaves           | Meronym(leaf, plant)
NP come|comes|came with NP      | Camera comes with lens cap | Meronym(lens cap, camera)
NP feature|features|featured NP | Camera features zoom       | Meronym(zoom, camera)

Table 2. Meronym patterns

In the first column the pattern is defined, the second one contains an example, and the third the two concepts that conform the part-of relation (meronym and holonym). All the patterns described have been manually constructed from observations found in natural language texts. They represent domain-independent regular expressions which can potentially be used in any domain of knowledge. Analogously, through the analysis of specific corpora, sets of patterns have been defined to extend the learning abilities to the rest of the dimensions of the ontology. As an example, Table 3 presents some patterns defined to populate the restrictive and descriptive dimensions, respectively.

Table 3. Patterns including Constraints and Descriptions
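Patterns like those in Table 2 can be operationalized in the same way as the hyponymy patterns. The sketch below covers three of them; the single-word NP approximation and the word lists are assumptions for illustration.

```python
import re

# Simplified meronym patterns from Table 2; \w+ tokens approximate NPs.
MERONYM_PATTERNS = [
    # "NP's NP", e.g. "car's engine" -> Meronym(engine, car)
    re.compile(r"(?P<whole>\w+)'s (?P<part>\w+)", re.IGNORECASE),
    # "NP of {the|a|an} NP", e.g. "screen of the computer"
    re.compile(r"(?P<part>\w+) of (?:the|a|an) (?P<whole>\w+)", re.IGNORECASE),
    # "NP has NP", e.g. "plant has leaves"
    re.compile(r"(?P<whole>\w+) (?:have|has|had) (?P<part>\w+)", re.IGNORECASE),
]

def extract_meronyms(sentence: str):
    """Return (part, whole) candidate pairs found in a sentence."""
    pairs = []
    for pattern in MERONYM_PATTERNS:
        for m in pattern.finditer(sentence):
            pairs.append((m.group("part").lower(), m.group("whole").lower()))
    return pairs

print(extract_meronyms("The car's engine roared."))
# [('engine', 'car')]
```

The extracted pairs feed the compositional (part-of) dimension of the ontology as candidate facts.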

3.4 Trustworthiness of Candidate Facts

The identification and application of linguistic patterns has been discussed in the state of the art. Before applying these patterns, the trustworthiness of the sentences should be assessed. A criterion based only on the authorship or on the organization responsible for the resource is not enough, because not all data have the same degree of reliability and many of these organizations are unknown. In [23], linguistic and semantic markers are used to assess the degree of confidence when extracting data from natural language sentences. Some of these markers are: the percentage of negations, the percentage of quantifiers, the percentage of verbs in conditional tenses, the legibility index, and the use of specialized domain terms, which are often associated with specialized knowledge. A pattern detecting the co-occurrence of bibliographic references with topics that overlook these facts can give a misleading result (e.g., "Although the work of Smith (1997) did not indicate that there should be almost no relationship between the entities name and Zipf's law, all subsequent works refute this assumption"; if we identify (Smith, 1997) and Zipf's law as related, our results will be wrong). These markers will be stored together with the candidate facts, so that the supervisor can evaluate which of them could belong to the multidimensional ontology.
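Computing some of the marker families cited from [23] can be sketched as follows. The word lists and the token-rate formulation are illustrative assumptions, not the marker sets of [23].

```python
import re

# Assumed, illustrative marker word lists (not the lists used in [23]).
NEGATIONS = {"no", "not", "never", "none", "neither"}
QUANTIFIERS = {"all", "some", "many", "few", "most", "several"}
CONDITIONALS = {"would", "could", "might", "should"}

def marker_rates(sentence: str) -> dict:
    """Fraction of tokens that are negations, quantifiers, or conditional
    auxiliaries; these rates would accompany a candidate fact as evidence
    for the supervisor."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    n = len(tokens) or 1  # avoid division by zero on empty input
    return {
        "negations": sum(t in NEGATIONS for t in tokens) / n,
        "quantifiers": sum(t in QUANTIFIERS for t in tokens) / n,
        "conditionals": sum(t in CONDITIONALS for t in tokens) / n,
    }

rates = marker_rates("Most works would not refute this assumption.")
```

High marker rates would lower the initial reliability of facts extracted from the sentence, following the scheme of Section 3.2.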

4 Conclusion and ongoing work

We have presented a proposal of tools supporting several processes related to the use and maintenance of Big Data systems. These tools aim to enrich the semantic scope of queries, endowing them with references to concepts instead of simple terms, which should lead to more complete and accurate results. The approach includes an ontological model, mechanisms for populating its knowledge bases, and metrics for establishing the reliability of the facts and the trustworthiness of the sources.

The complexity of the intended system makes it hard to evaluate its contribution to the success and performance of the global system. Because of this, the proposal has been fragmented into independent parts whose performance can be compared to similar components in the state of the art, which also increases reusability and reduces development cost. The ongoing work addresses the integration of the tools into some sample Big Data systems. Such work will enable the evaluation of the whole approach (evaluating not only the tools, but also the enhancement they provide for the targeted problem).

As future lines, the interactive learning subsystem (acquiring facts through dialogue with users) has to be integrated into more systems, specifically bots able to query a Big Data base. When doing so, we intend to incorporate mechanisms that balance the need for knowledge against the need to avoid annoying the user with too many questions. Preventing tiresome behavior should provide the chance of acquiring more knowledge in the medium term, while abusing the user could lead to withdrawal.

5 References

1. Agichtein, E., Gravano, L. (2000). Snowball: Extracting Relations from Large Plain-Text Collections. ICDL.
2. Albacete, E., Calle-Gómez, J., Castro, E., Cuadra, D. (2012). Semantic Similarity Measures Applied to an Ontology for Human-Like Interaction. J. Artif. Intell. Res., 44: 397-421.
3. Antonioli, N., Castanò, F., Civili, C., Coletta, S., Grossi, S., Lembo, D., Lenzerini, M., Poggi, A., Savo, D. F., Virardi, E. (2013). Ontology-Based Data Access: The Experience at the Italian Department of Treasury. In CAiSE Industrial Track, pp. 9-16.
4. Assaf, A., Senart, A. (2012). Data Quality Principles in the Semantic Web. IEEE Sixth International Conference on Semantic Computing, 226-229.
5. Banko, M., Cafarella, M., Soderland, S., Broadhead, M., Etzioni, O. (2007). Open information extraction from the web. IJCAI.
6. Berland, M., Charniak, E. (1999). Finding parts in very large corpora. 37th Annual Meeting of the Association for Computational Linguistics, pp. 57-64. Maryland.
7. Bizer, C., Boncz, P., Brodie, M., Erling, O. (2012). The Meaningful Use of Big Data: Four Perspectives – Four Challenges. SIGMOD Record, 40(4), 56-60.
8. Calle, J., Castro, E., Cuadra, D. (2008). Ontological dimensions applied to Natural Interaction. ONTORACT: First International Workshop on Ontologies in Interactive Systems, Liverpool, UK, September 2008, Human Computer Interaction Conference.
9. Cimiano, P. (2006). Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. New York: Springer-Verlag.
10. Eysenbach, G., Köhler, C. (2002). How do consumers search for and appraise health information on the world wide web? BMJ 324: 573-576.
11. Floridi, L. (2012). The search for small patterns in big data. The Philosophers' Magazine 59: 17-18.
12. Gil, Y., Artz, D. (2007). Towards content trust of web resources. Web Semantics: Science, Services and Agents on the World Wide Web 5, 227-239.
13. Golbeck, J., Parsia, B., Hendler, J. (2003). Trust networks on the semantic web. In: Cooperative Information Agents VII, LNCS 2782, 238-249.
14. Hearst, M. A. (1992). Automatic Acquisition of Hyponyms from Large Text Corpora. 14th Conference on Computational Linguistics, Volume 2, pp. 539-545. Nantes, France: Association for Computational Linguistics.
15. http://bigdata.csail.mit.edu/node/55 (last accessed on 24-07-2014).
16. IBM (2012). Analytics Study: The real-world use of big data. IBM Corp.
17. Kilgarriff, A. (2007). Googleology is bad science. Computational Linguistics 33(1), 147-151.
18. Labrinidis, A., Jagadish, H. V. (2012). Challenges and Opportunities with Big Data. Proceedings of the VLDB Endowment, 5, 2032-2033.
19. Lukoianova, T., Rubin, V. L. (2014). Veracity roadmap: is big data objective, truthful and credible? Advances in Classification Research Online, 24(1), 4-15.
20. Mintz, M., Bills, S., Snow, R., Jurafsky, D. (2009). Distant supervision for relation extraction without labeled data. ACL 2009.
21. Morato, J., Fraga, A., Andreadakis, Y., Sanchez-Cuadrado, S. (2012). Eight steps towards the socialisation of the Semantic Web. Int. J. Social and Humanistic Computing, 1(4), 347-362.
22. Morato, J., Sanchez-Cuadrado, S., Dimou, C., Yadav, D., Palacios, V. (2013). Evaluation of Semantic Retrieval Systems on the Semantic Web. Library Hi-Tech, 31(4).
23. Morato, J., Llorens, J., Genova, G., Moreiro, J. A. (2003). Experiments in discourse analysis impact on information classification and retrieval algorithms. Information Processing & Management, 39(6), 825-851.
24. Moreno, A., Riaño, D., Isern, D., Bocio, J., Sánchez, D., Jiménez, L. (2004). Knowledge exploitation from the web. Fifth International Conference on Practical Aspects of Knowledge Management, pp. 175-185. Vienna, Austria.
25. Özdemir, V., et al. (2013). Crowd-Funded Micro-Grants for Genomics and "Big Data": An Actionable Idea Connecting Small (Artisan) Science, Infrastructure Science, and Citizen Philanthropy. Omics: A Journal of Integrative Biology, 17(4): 161-172.
26. Pivk, A. (2007). Transforming arbitrary tables into logical form with TARTAR. Data and Knowledge Engineering 60(3), 567-595.
27. Sánchez, D., Moreno, A. (2008). Pattern-based automatic taxonomy learning from the Web. AI Communications 21, 27-48.
28. Snow, R., Jurafsky, D., Ng, A. (2005). Learning syntactic patterns for automatic hypernym discovery. NIPS 17.
29. Urbano, J., Martín, D., Marrero, M., Morato, J. (2011). Audio Music Similarity and Retrieval: Evaluation Power and Stability. 12th International Society for Music Information Retrieval Conference (ISMIR), 597-602.
30. Wathen, C. N., Burkell, J. (2002). Believe it or not: Factors influencing credibility on the Web. Journal of the American Society for Information Science and Technology, 53(2), 134-144.
31. Weichselbraun, A. (2009). Discovery and evaluation of non-taxonomic relations in domain ontologies. Int. Journal of Metadata, Semantics and Ontologies 4(3), 212-222.
32. Wu, F., Weld, D. S. (2007). Autonomously Semantifying Wikipedia. CIKM 2007.

matching techniques have been used to determine schema mappings between .... the semantic correspondence's weight, database credentials, and linguistic- ...

LILY: The Results for the Ontology Alignment Contest ...
similarity algorithm and similarity propagation strategy are exploited to create the .... such ontologies is a big problem for LILY, because extracting semantic subgraphs .... International Conference on Knowledge Discovery and Data Mining ...

Ontology-based Semantics for Composable Autonomic ...
Ontology-based Semantics for Composable Autonomic Elements. John Keeney, Kevin Carey, David Lewis, Declan O'Sullivan, Vincent Wade. Trinity College Dublin. Knowledge and Data Engineering Group. Computer Science Department, College Green, Dublin 2, Ir

[Ebook] Knowledge Seeker - Ontology Modelling for ...
Search and Management: A Compendium (Intelligent Systems .... accuracy of a text classification system, and also enhance the search intelligence in a search ...

An Ontology-Based Method for Universal Design of ...
the advance of web systems, it is important to use the more accurate method and ... accessing information through a cellular phone and the information was not ...

Towards an ontology-based approach for specifying ...
have to initially agree on the information they will exchange before further ..... a service chart diagram indicates the compliance of a Web service with a specific ...

Ontology Engineering for Intellectual Agents Mikhail ...
The major part of today's software systems usually consists of commodity ... required or provided guarantees, its interacting with the environment and runtime.

Structuring an event ontology for disease outbreak detection
Apr 11, 2008 - Abstract. Background: This paper describes the design of an event ontology being developed for application in the machine understanding of infectious disease-related events reported in natural language text. This event ontology is desi

Making Places Legible: Ontology Support for Context ...
[7] propose a service oriented context-aware middleware (SOCAM) based on a .... FIPA's device ontology looks at device as the platform that can host an agent,2), ... whether a device is suitable for displaying a particular web page.3) However,.

A Personalized Ontology Model for Web Information ...
As the amount of Web information grows rapidly, search engines must be able to retrieve information according to the user's interest. In this paper, we propose a new web search personalization approach that captures the user's interests and preferenc

Security Ontology proposal for mobile applications
The Vonage VT. 2142-VD phone from MOTOROLA receives SIP .... the business field, the time-to-market property of mobile applications and the large target ...

Ontology Based Query Expansion Framework for Use ...
them in a ontologically defined semantic space. Expansions origi- nate from .... relationships, and in the case of ontology based systems, very much so, only ..... relationships are stored in the MRREL file, and have several attributes. There are ...

Developing an Ontology for Cyber Security Knowledge Graphs (PDF ...
Official Full-Text Paper (PDF): Developing an Ontology for Cyber Security Knowledge Graphs. ... Figure 1: Entities and Relations in the STUCCO Ontology.