IADIS International Conference WWW/Internet 2012

LEARNING SYNONYM RELATIONS FROM FOLKSONOMIES

Alex Rêgo (1,2), Leandro Marinho (2) and Carlos Eduardo Pires (2)

(1) Instituto Federal de Educação, Ciência e Tecnologia da Paraíba – Campus Campina Grande - Av. Tranquilino C Lemos 671, Campina Grande, PB - Brasil
(2) Departamento de Sistemas da Computação - Universidade Federal de Campina Grande - Av. Aprígio Veloso s/n, Campina Grande, PB – Brasil

ABSTRACT

Folksonomies such as Delicious, Flickr, and BibSonomy are now widespread, with thousands of users using them daily to upload, tag, and retrieve on-line resources (e.g., web pages, photos, and videos). Tags are keywords freely chosen by users and reflect the vocabulary these users employ to annotate resources. The automatic detection of semantic relations (e.g., synonym, hypernym, and hyponym) between tags may improve the ability of users to find relevant resources; for example, queries may be expanded with semantically related tags. Several works in the literature propose approaches to automatically detect semantic relations between tags in folksonomies. In general, these works adopt similarity/distance measures (e.g., edit distance, tag-tag co-occurrence, and cosine) as heuristics for semantic similarity detection. In this paper, we present a more principled approach that learns a pairwise decision model for predicting synonym relations between pairs of tags directly from folksonomy data. The features are extracted from pairs of tags through the application of similarity/distance measures that capture different aspects of a synonym relation. We conduct a comprehensive set of experiments on a recent snapshot of BibSonomy, a real-world social tagging system, and show that our approach outperforms the best performing baseline by up to 8.1%.

KEYWORDS

Folksonomy, Machine Learning, Synonym Detection, Tag, Semantics.

1. INTRODUCTION

Describing web resources (e.g., URLs, videos, and photos) through tags has become a popular practice among users of folksonomies. Although the primary goal of tags is to help individual users organize and retrieve their own content, the democracy promoted by folksonomies can cause undesired side effects, which are usually related to the following issues: (i) semantics: there is little or no synonymy/polysemy control in social tagging systems; (ii) linguistics: orthographic errors, lack of term conventions (e.g., capitalization of proper nouns), multiple languages, and tags comprising compound words; and (iii) cognitive: heterogeneity in users' tagging vocabulary.

In this work, we are concerned with the problem of detecting synonym relations between tags, which in folksonomies may appear not only as a semantic issue but also as a linguistic or cognitive one. The problem is motivated by the fact that disregarding semantic relations may decrease the chances of users finding the resources of interest. For instance, if a user looks for a specific book using the tag Information Retrieval but the book is annotated with the tag IR, the user will not find it, although IR is a typical acronym for Information Retrieval and thus both tags should be regarded as the same.

Although linguistic problems have been largely studied by the IR community, most solutions need to be adapted when applied to a folksonomy. Natural Language Processing (NLP) techniques are not suitable for detecting synonyms in folksonomies since tags are, in most cases, isolated words and resources can assume various formats such as images (Flickr, http://www.flickr.com), audio (Last.fm, http://www.last.fm), and videos (YouTube, http://www.youtube.com).


ISBN: 978-989-8533-09-8 © 2012 IADIS

Several works have proposed heuristics to automatically identify semantic relations between tags, with the aim of improving the ability of users to retrieve resources of interest in folksonomies. In Clements et al. (2008), the authors postulate that tags usually applied to the same resource tend to be synonymous, even if the tags are provided by different users. This hypothesis is exemplified using the pair of tags color (American) and colour (British). The similarity between two tags is measured by computing the Pearson correlation on distribution profile vectors of user-tag (most prominent tag users) and item-tag (most relevant items to a tag). Since the work was published as a short paper, no further details about the experiments and the evaluation method are available.

A holistic approach for discovering synonym, homonym and hierarchical relations in folksonomies is introduced in Dattolo et al. (2011), in which different heuristics are applied on a Delicious dataset. For synonym detection, the authors used stemming tools, normalized edit distance and synonym searches in WordNet, a well-known lexical database for the English language. Experiments have shown that the cosine measure is able to identify synonymy relations. In another work, also using a Delicious dataset, Cattuto et al. (2008) analyzed several similarity measures and demonstrated that the cosine measure based on tag-tag and tag-resource co-occurrence counts can be used to identify synonymy relations. Solskinnsbakk and Gulla (2011) combined edit distance, to identify syntactic similarity (morphological variations), with the cosine measure to recognize synonyms or near-synonyms in folksonomies. Results showed that the cosine similarity is able to recognize more complex types of semantic relations (e.g., associations and abstractions, i.e., generalizations and specializations of tags), but not necessarily synonyms.

In the TagPlus system (Lee and Yong, 2007), the search for images on Flickr is extended using synonyms of an input tag retrieved from WordNet. However, unwanted images may be retrieved due to the lack of homonym control: at image entry time in Flickr, users do not distinguish the semantics of tags, so it is not simple to determine which domain a tag belongs to.

Our approach is similar to the one presented by Spiliopoulos et al. (2010) for ontology alignment, but it is focused on folksonomies. We also borrow ideas from the field of object identification (or record linkage), where the aim is to learn equivalence classes of objects that are identical but may have different descriptions (Rendle and Schmidt-Thieme, 2006; Kopcke and Rahm, 2010).

Although the works mentioned above are relevant for identifying semantic similarity between tags in folksonomies, similarity measures alone provide no empirical or theoretical guarantee that they are indeed suitable for correctly identifying synonymy. We, on the other hand, take a more principled approach in which the synonym relation between tags is learned directly from the folksonomy data. The features are extracted from pairs of tags using similarity/distance measures that reflect different aspects of synonymy. We conduct a comprehensive set of experiments on a recent snapshot of the real-world social tagging system BibSonomy and show that our approach is superior to the purely heuristic-based approaches available in the literature. Our contributions are summarized as follows:

1) We investigate synonym detection in folksonomies from a machine learning perspective and formulate the problem as a pairwise decision model;
2) We propose a generic pairwise classification model based on feature extraction from pairs of tags using similarity/distance measures applied on tag profile vectors; and
3) We evaluate the proposed model over a large-scale snapshot of BibSonomy.

Experimental results show that our approach outperforms the best performing baseline by up to 8.1% on the evaluated sample datasets.

2. PROBLEM SETTING

Folksonomies are the underlying structures of social tagging systems and consist basically of a set of users U, tags T, resources R, and tag assignments Y, i.e., a set of user/tag/resource triples. They result from the practice of collaboratively creating and managing tags to annotate and categorize content. A formal definition of folksonomies can be found in Hotho et al. (2006). We cast the problem of detecting synonym relations in folksonomies as a classification problem. The idea is to learn a pairwise classification function of the form

$$y : T^2 \to \{+, -\}$$

that classifies any given pair of tags as positive, if the tags are synonyms; or negative, otherwise.


2.1 Classification for Synonym Detection in Folksonomies

In machine learning, classification is defined as follows. Let $X$ be any set, called the feature space. Given a set of training examples of the form $D^{train} \subseteq X \times Y$, where $Y$ is a discrete and finite set called the target space, the goal is to find a prediction function $\hat{y} : X \to Y$ that minimizes the error on the test set $D^{test} \subseteq X \times Y$ (not available during training), i.e.,

$$\mathrm{err}(\hat{y}; D^{test}) := \frac{1}{|D^{test}|} \sum_{(x,y) \in D^{test}} \ell(y, \hat{y}(x))$$

Note that $\ell : Y \times Y \to \mathbb{R}$ is a loss function measuring, for any test instance $(x, y) \in D^{test}$, the misfit between the true $y$ and the predicted value $\hat{y}(x)$.

Figure 1 summarizes our approach. In order to apply classification for detecting synonym tags in folksonomies, we first need to extract feature vectors from pairs of tags, i.e., $f : T^2 \to \mathbb{R}^n$. To this end, the tags' features are compared (cf. Section 2.2) using several similarity/distance measures. Note that the training data is now composed of a set of feature vectors $X$, where each feature vector is labeled as positive or negative, i.e., $Y := \{+, -\}$, indicating whether or not two tags are synonyms, respectively (step 1, Fig. 1). The classification task is defined as usual, with the difference that now, for any pair of distinct tags $(t, t') \in T^2$, we want to learn a pairwise classification function (step 2) of the form $\hat{y} : f(t, t') \to Y$ that predicts new instances of the test set (step 3) as positive or negative (step 4).

Figure 1. Classification for detecting synonym tags in folksonomies.
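The mapping f and the test error of Section 2.1 can be illustrated with a minimal Python sketch. The function names, the toy measures, and the choice of a 0/1 loss are assumptions made for illustration only; the paper does not state which loss it uses.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed type: a measure maps a pair of tags to a real number.
Measure = Callable[[str, str], float]


def feature_vector(t: str, t2: str, measures: Sequence[Measure]) -> List[float]:
    """f(t, t'): apply every similarity/distance measure to the tag pair."""
    return [m(t, t2) for m in measures]


def zero_one_loss(y_true: str, y_pred: str) -> float:
    """Assumed loss: 0 for a correct label, 1 for a wrong one."""
    return 0.0 if y_true == y_pred else 1.0


def empirical_error(predict: Callable[[List[float]], str],
                    test_set: Sequence[Tuple[List[float], str]]) -> float:
    """Average loss of `predict` over the (x, y) instances of a test set."""
    return sum(zero_one_loss(y, predict(x)) for x, y in test_set) / len(test_set)
```

With the 0/1 loss, the error is simply the fraction of misclassified tag pairs.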

2.2 Feature Extraction

We propose to extract features from pairs of tags available in a dataset through the application of similarity/distance measures. Our choices for these measures, presented in the following, are supported by related work in which those measures were also used to capture semantic relations between tags, in particular synonym relations.

2.2.1 Edit Distance (ED)

The Edit Distance (also known as Levenshtein Distance) is a well-known string metric in which the distance between two strings is computed as the number of insertions, deletions and substitutions of characters needed to transform one string into the other (Damerau, 1964). The ED is particularly suitable for detecting synonymy cases that occur due to small variations of the same word (e.g., work and works) and the addition of special symbols (e.g., case_study and casestudy). In order to compensate for differences in length between the compared strings, we use a normalized version, here denoted as EDnorm. In EDnorm, the distance between two tags $(t, t') \in T^2$ is divided by the maximum length of the tags $t$ and $t'$.
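As an illustration, EDnorm can be sketched in Python as follows. The function names are chosen here for the example, and returning 0 for two empty strings is an assumption the paper does not discuss.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ed_norm(t: str, t2: str) -> float:
    """Edit distance divided by the length of the longer tag."""
    longest = max(len(t), len(t2))
    return edit_distance(t, t2) / longest if longest else 0.0
```

For the examples in the text, `ed_norm("work", "works")` is 1/5 and `ed_norm("case_study", "casestudy")` is 1/10, so both pairs sit close to the "identical" end of the [0, 1] range.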


2.2.2 Tag-tag Co-occurrence

The count of co-occurrences between tags has been used by various authors for measuring the semantic similarity between tags (Dattolo et al., 2011; Cattuto et al., 2008; Clements et al., 2008; Gemmell et al., 2009). The underlying idea is that tags that have often been associated with the same resources are likely to be related. In this paper, we start by adopting the same principle introduced by Cattuto et al. (2008), where the count of co-occurrences between two distinct tags $t$ and $t'$ is defined as the number of posts in which they co-occur (a post corresponds to the set of tag assignments of a user for a given resource), i.e.,

$$\text{tag-tag}_{post}(t, t') := |\{(u, r) \in U \times R \mid t, t' \in T_{ur}\}| \quad (1)$$

where $T_{ur}$ is the set of tags that user $u$ assigned to resource $r$. Alternatively, we can count the tag co-occurrences by resource as follows:

$$\text{tag-tag}_{res}(t, t') := |\{r \in R \mid t, t' \in T_r\}| \quad (2)$$

where $T_r$ is the set of tags assigned to resource $r$. To keep all the feature values in the range [0, 1], we use a normalized version of these measures:

$$\text{tag-tag}_{norm}(t, t') := \frac{\text{tag-tag}_{post}(t, t')}{\max_{t, t' \in T : t \neq t'} \text{tag-tag}_{post}(t, t')} \quad (3)$$

The normalized version for Eq. 2 is defined analogously, replacing the function $\text{tag-tag}_{post}$ by $\text{tag-tag}_{res}$.
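The counts of Eqs. 1 to 3 can be sketched in Python as follows, assuming the tag assignments are available as (user, tag, resource) triples. The function names and the use of unordered pairs (frozensets) as keys are choices made for this example.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(assignments):
    """assignments: iterable of (user, tag, resource) triples.

    Returns (by_post, by_res): dicts keyed by frozenset({t, t'}) holding
    the co-occurrence counts by post (Eq. 1) and by resource (Eq. 2)."""
    tags_by_post = defaultdict(set)  # (u, r) -> tags of that post
    tags_by_res = defaultdict(set)   # r -> tags assigned to r by any user
    for u, t, r in assignments:
        tags_by_post[(u, r)].add(t)
        tags_by_res[r].add(t)

    def count_pairs(tag_sets):
        counts = defaultdict(int)
        for tags in tag_sets:
            for t, t2 in combinations(sorted(tags), 2):
                counts[frozenset((t, t2))] += 1
        return counts

    return count_pairs(tags_by_post.values()), count_pairs(tags_by_res.values())


def normalize(counts):
    """Divide every count by the largest count over all tag pairs (Eq. 3)."""
    top = max(counts.values(), default=0)
    return {pair: c / top for pair, c in counts.items()} if top else {}
```

Enumerating the pairs inside each post keeps the cost proportional to the number of co-occurring pairs rather than to |T|^2, which matters for a tag vocabulary of a few thousand tags.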

2.2.3 Cosine Similarity

Cosine is a measure widely used in Information Retrieval for quantifying the similarity between document vectors. We can employ the same principle for computing similarities between pairs of tags. To compute the cosine similarity between two distinct tags $t$ and $t'$ we apply the following equation:

$$\cos(\vec{t}, \vec{t'}) := \frac{\vec{t} \cdot \vec{t'}}{\|\vec{t}\| \cdot \|\vec{t'}\|} \quad (4)$$

where $\vec{t}$ and $\vec{t'}$ are profile vectors for tags $t$ and $t'$, respectively, $\cdot$ denotes the dot product, and $\|\cdot\|$ the vector norm. Note that, for the non-negative count vectors used here, cosine values range from 0 (maximum dissimilarity) to 1 (maximum similarity). We have adopted two different kinds of vector components: (i) the count of tag-tag co-occurrences by post and (ii) the count of tag-resource co-occurrences. In Cattuto et al. (2008) the authors showed that high values of cosine between tag profile vectors whose components are (i) or (ii) tend to indicate synonymy.

(i) Tag-tag Co-occurrence by Post. Each vector component corresponds to the tag-tag co-occurrence count according to Eq. 1. Thus, for a given tag $t \in T$, its profile vector is composed of

$$\vec{t} := (\text{tag-tag}_{post}(t, t_1), \text{tag-tag}_{post}(t, t_2), \ldots, \text{tag-tag}_{post}(t, t_{|T|})) \quad (5)$$

(ii) Tag-Resource Co-occurrence. Each vector component corresponds to the count of how often a tag $t \in T$ is used to annotate a certain resource $r \in R$, i.e., $\text{tag-res}(t, r) := |\{u \in U \mid (u, t, r) \in Y\}|$. Therefore, the profile vector of a given tag $t \in T$ is composed of

$$\vec{t} := (\text{tag-res}(t, r_1), \text{tag-res}(t, r_2), \ldots, \text{tag-res}(t, r_{|R|})) \quad (6)$$
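A minimal Python sketch of Eq. 4 over tag-resource profile vectors (Eq. 6) is given below. Storing the profiles as sparse dictionaries and returning 0 for an all-zero vector are assumptions of this example, not details taken from the paper.

```python
import math


def cosine(v, w):
    """Cosine between two sparse profile vectors given as {component: count}."""
    dot = sum(x * w.get(k, 0) for k, x in v.items())
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    norm_w = math.sqrt(sum(x * x for x in w.values()))
    if norm_v == 0 or norm_w == 0:
        return 0.0  # assumed convention for empty profiles
    return dot / (norm_v * norm_w)


def tag_resource_profiles(assignments):
    """Build tag-res(t, r) for every tag from (user, tag, resource) triples.

    Assumes each triple occurs once, so counting triples per (t, r)
    equals counting the users who annotated r with t."""
    profiles = {}
    for _user, t, r in assignments:
        vec = profiles.setdefault(t, {})
        vec[r] = vec.get(r, 0) + 1
    return profiles
```

The same `cosine` function applies unchanged to the tag-tag profiles of Eq. 5 if each tag is mapped to a dictionary of its co-occurrence counts with the other tags.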

3. EVALUATION METHODOLOGY

To evaluate our approach, we used a snapshot of the social bookmark and publication management system BibSonomy from July 1st, 2011. BibSonomy is the only folksonomy that provides its data publicly on the web for research purposes, so other researchers can use the same data to reproduce the experiments. We considered only the 10,000 most popular tags and the resources/users that have been associated with at least one of those tags. Special symbols such as commas, semicolons, and brackets were removed. All tags were normalized to lowercase. The tag imported was removed since it is automatically assigned by BibSonomy when importing resources from other systems, e.g., from the DBLP Computer Science Bibliography. Finally, we considered only the tags that could be found in WordNet, since this dictionary was used to label the training instances. The cleaned dataset has the following characteristics: #users=4,800; #resources=338,492; #tags=3,177; and #assignments=722,999.


3.1 Instance Labelling

After the features are extracted, instances are labeled as positive or negative by checking whether or not the tags representing each instance are defined as synonyms in WordNet. This task is performed as follows: given two tags $t, t' \in T$ with $t \neq t'$, we submit a query to WordNet using $t$. As a result, WordNet returns all the synsets associated with $t$ (a synset is a set of synonyms representing a concept, which in this case is the concept denoted by the tag). The pair $(t, t')$ is labeled as positive if $t'$ is contained in any of the retrieved synsets; or negative, otherwise.
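The labelling rule can be sketched in Python as follows. The `synsets_of` lookup is a hypothetical stand-in for the WordNet query (the paper does not say which WordNet interface was used), and the tiny lexicon below is made up for the example.

```python
def label_pair(t, t2, synsets_of):
    """Return '+' if t2 appears in any synset retrieved for t, else '-'.

    synsets_of: hypothetical lookup mapping a tag to an iterable of
    synsets, each synset being a set of words."""
    return "+" if any(t2 in synset for synset in synsets_of(t)) else "-"


# Toy lexicon standing in for WordNet (illustrative entries only).
TOY_LEXICON = {
    "color": [{"color", "colour", "coloring"}],
    "tv": [{"tv", "television", "telly"}],
}


def toy_lookup(tag):
    return TOY_LEXICON.get(tag, [])
```

Here `label_pair("color", "colour", toy_lookup)` yields "+", while a pair whose first tag has no matching synset, such as ("web", "www") in this toy lexicon, yields "-". Since only $t$ is queried, the rule as described depends on the order of the pair whenever the lexicon is not symmetric.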

3.2 Class Imbalance

Most learning-based methods work under the assumption that a classifier will operate on the same distribution it was trained on. However, when there is a severe class imbalance in the training dataset, as in the dataset used in this paper, the classifier is biased towards predicting the majority class (Drummond and Holte, 2005). To exemplify, after the data preparation phase we ended up with 2,210 positive and 859,854 negative instances. This is known as the class imbalance problem, and there are several approaches in the literature for addressing it (Tomek, 1976; Kubat and Matwin, 1997; Chawla et al., 2002). There are essentially two basic approaches: oversampling, which balances the training set through the random replication of minority class instances, and undersampling, which randomly eliminates instances of the majority class. For performance reasons, we adopted undersampling with different proportions of positive/negative instances, i.e., the number of positive instances was kept fixed while the number of negative instances removed varied according to the proportions (in percentages) 10+/90-, 20+/80-, 30+/70-, 40+/60- and 50+/50-. The basic idea is to evaluate the compared methods under different levels of class imbalance, including the scenario where the training dataset is perfectly balanced.
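A minimal sketch of this undersampling scheme in Python is shown below. The function name, the `seed` parameter, and the rounding of the number of negatives are assumptions of the example; the paper does not describe these details.

```python
import random


def undersample(positives, negatives, pos_pct, seed=0):
    """Keep all positives and randomly keep just enough negatives so that
    positives make up pos_pct percent of the sample (10 for 10+/90-)."""
    n_neg = round(len(positives) * (100 - pos_pct) / pos_pct)
    if n_neg > len(negatives):
        raise ValueError("not enough negative instances for this proportion")
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    return list(positives) + rng.sample(list(negatives), n_neg)
```

Under these assumptions, the 2,210 positive instances would be paired with 19,890 negatives in the 10+/90- sample, 8,840 in 20+/80-, 5,157 in 30+/70-, 3,315 in 40+/60-, and 2,210 in 50+/50-. Calling the function with different seeds gives different draws of negatives for the same proportion.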

3.3 Baselines and Evaluation Metrics

As baselines we considered the normalized edit distance (EDnorm), the normalized tag-tag co-occurrence by post (tag-tag), the cosine based on vectors of tag-tag co-occurrence counts (cos_tag-tag), and the cosine based on vectors of tag-resource co-occurrence counts (cos_tag-res) (cf. Section 2.2). Although most of the related work adopts some of these measures for detecting semantic relations between tags in folksonomies, to the best of our knowledge this is the first work providing a quantitative evaluation of these measures for the problem of synonymy detection between tags.

For a given pair of tags $t, t' \in T$ and a given similarity measure $s : T^2 \to \mathbb{R}$, we assign the positive class to $(t, t')$ iff $s(t, t') \geq thr$, where $thr$ is a pre-specified threshold for each baseline; and the negative class, otherwise. Note that if we have a distance function $d$ instead, then positive is assigned to $(t, t')$ iff $d(t, t') \leq thr$; and negative, otherwise. Now, Precision, Recall, and F-measure can be computed as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \quad (7)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (8)$$

$$\text{F-measure} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \quad (9)$$

where TP, FP and FN denote True Positives, False Positives and False Negatives, respectively. Note that, in the case of a similarity measure, a pair of tags $t, t' \in T$ is classified correctly either when $s(t, t') \geq thr$ and $t, t'$ are in the same synset according to WordNet (a true positive), or when $s(t, t') < thr$ and $t, t'$ are in distinct synsets (a true negative). This works the other way around for distance measures.

Finally, we used 2/3 of each sample for training and the remaining 1/3 for testing. The best threshold for each similarity/distance measure was estimated through a greedy search over a validation set containing 1/3 of the training instances for each sample dataset.
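The threshold baselines and Eqs. 7 to 9 can be sketched in Python as follows. The paper does not detail its greedy threshold search, so `best_threshold` below is a plain scan over a list of candidate values, used here only as an assumed stand-in; all function names are illustrative.

```python
def classify(scores, thr, is_distance=False):
    """Label each score: similarity >= thr (or distance <= thr) means '+'."""
    if is_distance:
        return ["+" if s <= thr else "-" for s in scores]
    return ["+" if s >= thr else "-" for s in scores]


def precision_recall_f(y_true, y_pred):
    """Precision, Recall and F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == "+" and p == "+")
    fp = sum(1 for y, p in pairs if y == "-" and p == "+")
    fn = sum(1 for y, p in pairs if y == "+" and p == "-")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def best_threshold(scores, y_true, candidates, is_distance=False):
    """Pick the candidate threshold with the highest F-measure
    (assumed stand-in for the paper's greedy search)."""
    return max(
        candidates,
        key=lambda thr: precision_recall_f(
            y_true, classify(scores, thr, is_distance))[2],
    )
```

In this sketch the threshold would be chosen on the validation scores and then applied unchanged to the test scores via `classify`.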

4. EXPERIMENTAL RESULTS AND DISCUSSION

Initially, we performed a qualitative analysis of some tags randomly selected from the test dataset and their top-5 most related tags with respect to each baseline. As expected, the edit distance is able to capture many of the synonym cases related to small lexical variations, such as study and studies. The other measures, even though they are able to capture some different cases of synonymy, as already pointed out by Cattuto et al. (2008) and Clements et al. (2008), in most cases capture other semantic relations that are not necessarily synonym relations (e.g., web and semantic, or web and ajax).

In order to identify the best features to be used by the classifier, we evaluated all the possible subsets of the features reported in Section 2.2 (each subset containing at least two features) on the validation set and selected the ones that provide the best results in terms of F-measure for each partition. As classifier, we used the well-known C4.5 algorithm (Quinlan, 1993), a decision tree-based classifier whose implementation is freely available in the Weka data mining tool. In principle, we could have used any classifier; our aim is not to determine which classifier performs best among the ones available in the literature, but to show that a supervised learning approach pays off. The edit distance is the only feature present in every best-performing feature subset, which is in line with the observation that it is the best performing individual feature, and thereby highly relevant for the classification task.

Table 1 shows the F-measure values over all the sample datasets for all the compared methods according to their estimated thresholds. The best heuristic in every sample is ED. Among the remaining measures, cos_tag-tag comes second from the sample (30-70) onward, whereas cos_tag-res is ahead of it in the two most imbalanced samples; tag-tag is the weakest measure in every sample except (10-90). Our approach outperforms the best heuristic in all partitions, with the highest improvements, 7.3% and 8.1%, observed in the samples (10-90) and (30-70), respectively. Notice that the F-measure of all methods increases monotonically toward more balanced datasets, achieving its best result when the data is perfectly balanced (50-50).

Table 1. F-measure values over all samples. The last column shows the rate of improvement of C4.5 in comparison to the best performing baseline.

Sample   ED      cos_tag-tag   cos_tag-res   tag-tag   C4.5    % improvement
10-90    0.564   0.261         0.300         0.268     0.605   7.3%
20-80    0.585   0.375         0.385         0.340     0.627   7.2%
30-70    0.640   0.449         0.423         0.386     0.692   8.1%
40-60    0.666   0.515         0.444         0.401     0.697   4.6%
50-50    0.711   0.566         0.458         0.412     0.735   3.4%

In general, the experiments suggest that the feature with the highest contribution is the edit distance, which alone is responsible for the great majority of correctly classified instances. Most synonymy cases found in the training dataset occur due to small lexical variations between tags that are easily captured by ED, which explains the high F-measure value achieved by ED in the experiments. It is worth mentioning that this might not be the case for other domains. The other features seem to have a lower impact on the results, although they are still important for capturing the less trivial cases of synonymy. For example, Table 2 shows a list of synonym tags correctly predicted by our approach.

Table 2. Supervised learning – predicted synonyms.

Tag     Synonyms
study   learn, reading, studies, survey
color   colour, colors
video   television, videos, TV
web     www, net
hack    hacks, hacker, hacking
build   make, building
image   icon, icons, imaging
TV      television, video

As we can see in Table 2, our approach is able to capture the following typical cases of synonymy in folksonomies:

1) Grammatical number: tags can be found in singular or plural form, such as hack and hacks;
2) Noun/verb inflections: variations due to noun and verb inflections produce tags such as build and building, which are similar terms;
3) User vocabulary: tag assignments sometimes depend on the vocabulary preferences of the user. For example, some users may tag a document about a web tutorial as web while others may prefer to use the tag www;
4) Multilingual tags: some words may be spelled differently according to the language used. Sometimes variations occur within the same language, e.g., color (American English) and colour (British English);
5) Non-trivial cases of synonymy: distinct words may have the same meaning, such as the tags study and survey, or image and icons; and
6) Acronyms: the existence of acronyms is normally seen as an ambiguity problem in folksonomies, but it also provides evidence of synonymy. For instance, both the tags TV and television usually refer to the same thing.

Table 3 reports the averaged F-measure of C4.5 for 10 samples of each proportion of positive/negative examples. In this case, we repeated the undersampling process 10 times for each proportion of positive/negative instances. The small standard deviation observed indicates that the method is robust against different samples of negative instances.

Table 3. Averaged F-measure of C4.5 per sample and standard deviation.

Sample   F-measure   Standard deviation
10-90    0.605       0.015772
20-80    0.627       0.016342
30-70    0.692       0.016057
40-60    0.697       0.014401
50-50    0.735       0.021141

5. CONCLUSION

In this paper, we have proposed a supervised learning approach to address synonym detection between tags in folksonomies. The main idea is to learn a pairwise classification function that classifies any given pair of tags as positive, if the tags are synonyms; or negative, otherwise. The main advantage of our approach is that, once the model is generated, it can be reused and applied to other datasets. Besides, the proposed approach is more principled because the synonym relation between tags is learned directly from folksonomy data, where features are extracted from pairs of tags using similarity/distance measures commonly used by related work, which reflect different aspects of synonymy. We evaluated the proposed model over a large-scale snapshot of BibSonomy and showed that our approach is superior to the purely heuristic-based approaches currently used in the literature.

We also observed a severe class imbalance between positive and negative instances. This was somewhat expected since, intuitively, the probability of two tags not being synonyms is much higher than the probability of them being synonyms. To address this issue, we performed undersampling on the majority class instances for different proportions of positive/negative instances, ranging from a high level of imbalance (90% of negative instances) to no imbalance at all (as many positive as negative instances). For all these levels, the classifier performed better than the baselines.

In our experiments, the best performing individual feature was the edit distance, whose results were largely superior to those of all the other features. This indicates that most of the synonym cases present in the test set were due to small lexical variations. The other features did not seem to have a great impact on the results, which encourages us to investigate, in the future, more informative features to be used in combination with edit distance.

It is worth noting that assembling the training data can be costly, depending on the complexity of the similarity/distance functions employed for the feature extraction. Once the training data is given, the complexity depends solely on the classifier chosen. In future work, we plan to address scalability issues and investigate the performance of more complex classifiers, with the aim of improving the results achieved in the present paper.

We are aware that a thesaurus like WordNet is limited because it is language-dependent and its vocabulary does not evolve as fast as the multilingual, community-specific terms of a folksonomy. However, as commonly explored in the literature (Laniado et al., 2007; Benz et al., 2009; Andrews and Zaihrayeu, 2011; Cattuto et al., 2008; Lee and Yong, 2007), it is necessary to provide a semantic grounding for the mapped tags. Therefore, we intend to extend the ground truth in several ways. First, we plan to use a multilingual thesaurus, since in folksonomies users can provide tags written in several languages. The problem is that, by using WordNet alone as we do right now, (i) pairs of multilingual synonym tags are labeled as negative in the training data, and (ii) pairs of multilingual tags that are correctly identified as synonyms can be miscounted as false positives. We believe that this modification will increase the number of positive cases in the training data, since instances with synonym tags such as gratis (Portuguese) and free (English) will be labeled as positive (+). Second, we plan to augment the number of positive training instances through the use of linguistic tools (e.g., stemming-based matching), which will probably increase the number of tags successfully matched against thesaurus terms. We also aim to perform more experiments using different and larger datasets in order to explore more experimental settings and improve the generalization of the model. Finally, we plan to use the same general approach to identify other, more complex types of semantic relations, particularly hierarchical ones.

REFERENCES

Andrews, P. and Zaihrayeu, I., 2011. Sense induction in folksonomies. Workshop on Discovering Meaning On the Go in Large Heterogeneous Data, Barcelona, Spain.
Benz, D. et al., 2009. Characterizing semantic relatedness of search query terms. Proceedings of the 1st Workshop on Explorative Analytics of Information Networks, Bled, Slovenia.
Cattuto, C. et al., 2008. Semantic grounding of tag relatedness in social bookmarking systems. Proceedings of the 7th International Conference on The Semantic Web. Berlin, Germany, pp. 615–631.
Chawla, N. et al., 2002. Smote: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, Vol. 16, pp. 321–357.
Clements, M. et al., 2008. Detecting synonyms in social tagging systems to improve content retrieval. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. New York, USA, pp. 739–740.
Damerau, F., 1964. A technique for computer detection and correction of spelling errors. Communications of the ACM, Vol. 7, pp. 171–176.
Dattolo, A. et al., 2011. An integrated approach to discover tag semantics. Proceedings of the 2011 ACM Symposium on Applied Computing. New York, USA, pp. 814–820.
Drummond, C. and Holte, R., 2005. Severe class imbalance: Why better algorithms aren't the answer. Proceedings of the 16th European Conference on Machine Learning. Porto, Portugal, pp. 539–546.
Gemmell, J. et al., 2009. The impact of ambiguity and redundancy on tag recommendation in folksonomies. Proceedings of the third ACM conference on Recommender Systems. New York, USA, pp. 45–52.
Hotho, A. et al., 2006. Information retrieval in folksonomies: Search and ranking. The Semantic Web: Research and Applications, Vol. 4011, pp. 411–426.
Kopcke, H. and Rahm, E., 2010. Evaluation of entity resolution approaches on real-world match problems. VLDB Endowment, Vol. 4, No. 1-2, pp. 484–493.
Kubat, M. and Matwin, S., 1997. Addressing the curse of imbalanced training sets: One-sided selection. Proceedings of the International Conference on Machine Learning, Oregon, USA, pp. 179–186.
Laniado, D. et al., 2007. Using wordnet to turn a folksonomy into a hierarchy of concepts. Semantic Web Application and Perspectives - Fourth Italian Semantic Web Workshop, Bari, Italy, pp. 192–201.
Lee, S. and Yong, H. S., 2007. Tagplus: A retrieval system using synonym tag in folksonomy. Proceedings of the 2007 International Conference on Multimedia and Ubiquitous Engineering. Washington, USA, pp. 294–298.
Quinlan, J., 1993. C4.5: programs for machine learning. Morgan Kaufmann Publishers, San Francisco, USA.
Rendle, S. and Schmidt-Thieme, L., 2006. Object identification with constraints. Proceedings of the Sixth International Conference on Data Mining. Washington, USA, pp. 1026–1031.
Solskinnsbakk, G. and Gulla, J. A., 2011. Mining tag similarity in folksonomies. Proceedings of the 3rd international workshop on Search and mining user-generated contents. New York, USA, pp. 53–60.
Spiliopoulos, V. et al., 2010. On the discovery of subsumption relations for the alignment of ontologies. Elsevier Science Publishers, Vol. 8, pp. 69–88.
Tomek, I., 1976. Two Modifications of CNN. IEEE Transactions on Systems, Man, and Cybernetics, Vol. 7, No 2, pp. 679–772.


Facilitation of learning spatial relations among locations ...
movement of the mouse changed the view in a 360º sphere within the virtual environment), (3) auditory feedback indicated movement within the environment ...

Facilitation of learning spatial relations among ... - Springer Link
Sep 24, 2009 - Received: 20 July 2009 / Revised: 10 September 2009 / Accepted: 14 September 2009 / Published online: ... ronment or interactive 3-D computer-generated virtual ... learning spatial relations among locations by visual cues.

Unsupervised Learning of Semantic Relations between ...
pervised system that combines an array of off-the-shelf NLP techniques ..... on Intelligent Systems for Molecular Biology (ISMB 1999), 1999. [Dunning, 1993] T.

synonym and antonym word list pdf
pdf. Download now. Click here if your download doesn't start automatically. Page 1 of 1. synonym and antonym word list pdf. synonym and antonym word list pdf.

Synonym-based Query Expansion and Boosting-based ...
large reference database and then used a conventional Information Retrieval (IR) toolkit, the Lemur toolkit (Lemur, 2005), to build an IR system. In the post-.

Towards deriving conclusions from cause-effect relations
Department of Computer Science ... a central aim of the special sciences. ..... aand P5) we may understand causal literals in the top part of the program as a ...

Tensor relations from bivector field equation.
This contains a somewhat unstructured collection of notes translating between .... This is metric independent with this bivector based definition of Fµν, and. Fµν.

pdf-0720\cross-language-relations-in-composition-from-southern ...
Try one of the apps below to open or edit this item. pdf-0720\cross-language-relations-in-composition-from-southern-illinois-university-press.pdf.

pdf-0979\american-intergovernmental-relations-fourth-edition-from ...
There was a problem previewing this document. Retrying... Download. Connect more apps... Try one of the apps below to open or edit this item. pdf-0979\american-intergovernmental-relations-fourth-edition-from-brand-cq-press.pdf. pdf-0979\american-inte

CorrActive Learning: Learning from Noisy Data through ...
we present the past work done on modeling noisy data and also work done in the related .... suggest the utility of the corrActive learner system. 4.3 Synthetic experiments ... placing the human with an autmoatic oracle that has the true labels.