Improving Wikipedia with DBpedia
Diego Torres, Pascal Molli, Hala Skaf-Molli and Alicia Diaz
University of La Plata, Argentina
University of Nantes, France
Semantic Web Collaborative Spaces – SWCS2012
April 17th, 2012, Lyon

Context
• The Semantic Web is growing fast. It is mainly built by extracting information from the Social Web.
  – DBpedia is extracted from Wikipedia infoboxes.
  – It is possible to run semantic queries and deduce new data.

• How can DBpedia help improve Wikipedia?
• More generally, how can the Semantic Web help improve the Social Web?

DBpedia Query Support
Retrieve the list of people together with the city where they were born.

    SELECT ?city ?person
    WHERE {
      ?person a Person .
      ?city a City .
      ?person birthplace ?city
    }

• (Paris, Pierre_Curie)
• (Rosario, Lionel_Messi)
• (Boston, Robin_Moore)
...
• 119097 results!
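
The query on this slide uses shorthand names. As a minimal sketch, the helper below (hypothetical, not part of the original work) spells it out as full SPARQL text, assuming that `Person`, `City` and `birthplace` correspond to `dbo:Person`, `dbo:City` and `dbo:birthPlace` in the DBpedia ontology namespace. It only builds the query string and does not contact any endpoint.

```python
# Sketch only: the dbo: names are an assumed mapping of the slide's shorthand.
PREFIX = "PREFIX dbo: <http://dbpedia.org/ontology/>"


def build_pair_query(subject_class, object_class, prop,
                     subject_var="person", object_var="city"):
    """Return SPARQL text selecting (object, subject) pairs linked by prop."""
    return "\n".join([
        PREFIX,
        f"SELECT ?{object_var} ?{subject_var}",
        "WHERE {",
        f"  ?{subject_var} a dbo:{subject_class} .",
        f"  ?{object_var} a dbo:{object_class} .",
        f"  ?{subject_var} dbo:{prop} ?{object_var}",
        "}",
    ])


query = build_pair_query("Person", "City", "birthPlace")
print(query)
```

The same helper could produce the other class/property combinations by changing its arguments.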

Checking the Same Results in Wikipedia
Navigation from city to person:

• (Paris, Pierre_Curie) OK
• (Rosario, Lionel_Messi) OK
• (Boston, Robin_Moore) X
...

It is not possible to navigate from Boston to Robin Moore, although Robin Moore was born in Boston!

In 49899 of the 119097 cases, navigation from the city to the person is NOT possible.
Navigation from the city to the person is possible in only 65200 of the 119097 cases.

General Issues
• There is an information gap between DBpedia and Wikipedia.
• If I want to repair it, which Wikipedia conventions do I have to follow?
  – If I am a good Wikipedian and read the Wikipedia conventions [1] carefully, I have to follow this path:
  – Boston/Category:Boston/Category:People_from_Boston/Robin_Moore

• Is it possible to discover this automatically?
[1] http://en.wikipedia.org/wiki/Wikipedia:Categorization_of_people

Approach
• Learn from the good cases in Wikipedia.

    SELECT ?city ?person
    WHERE {
      ?person a Person .
      ?city a City .
      ?person birthplace ?city
    }

(Paris, Pierre_Curie) OK
(Rosario, Lionel_Messi) OK
(Boston, Robin_Moore) X
...

• For example, for people and birthplaces we can learn from the 65200 of 119097 cases where navigation is possible.

Approach
• Learn?
  – If Wikipedia were a database, which query in Wikipedia would best approximate the results obtained by a DBpedia query?

• But Wikipedia is not a database. The idea:
  – Index the relevant fragment of Wikipedia as a path database.
  – Then find the path query that best approximates the DBpedia query.

Wikipedia as a Graph DB with Path Queries
• Considering Wikipedia as a graph database, I want to ask path queries [Abiteboul97]:
  – Retrieve all people p from a given city c.
  – PQ1(c,p) = c/Category:c/Category:People_from_c/p

• We make the hypothesis that the shortest path query that maximally contains the DBpedia results is the best expression of this semantic relation in Wikipedia.
[Abiteboul97] Path queries, Abiteboul & Vianu, SIGMOD 97
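
To make PQ1 concrete, here is a minimal sketch that evaluates it over a toy link graph. The `LINKS` dictionary and the `pq1` function are hypothetical illustrations; treating both page links and category membership as plain directed edges is a simplifying assumption.

```python
# Toy link graph: page -> set of pages it links to (hypothetical data).
LINKS = {
    "Paris": {"Category:Paris", "Pierre_Curie"},
    "Category:Paris": {"Category:People_from_Paris"},
    "Category:People_from_Paris": {"Pierre_Curie"},
    "Boston": {"Category:Boston"},
    "Category:Boston": {"Category:People_from_Boston"},
    "Category:People_from_Boston": set(),  # Robin_Moore is missing here
}


def pq1(city, links):
    """Evaluate PQ1(c, p) = c/Category:c/Category:People_from_c/p for one city."""
    steps = [city, f"Category:{city}", f"Category:People_from_{city}"]
    # Every fixed step must be linked from the previous one.
    for src, dst in zip(steps, steps[1:]):
        if dst not in links.get(src, set()):
            return set()
    # Whatever the last category links to is an answer p.
    return set(links.get(steps[-1], set()))


print(pq1("Paris", LINKS))   # {'Pierre_Curie'}
print(pq1("Boston", LINKS))  # set()
```

In this toy graph the Boston case returns nothing, which mirrors the gap described earlier: the DBpedia pair exists, but the conventional Wikipedia path does not.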

Path Indexing
• Index a subgraph of Wikipedia, given the nodes resulting from the DBpedia query.
• Collect all the paths that link source and target with a DFS algorithm.
• Reduce the alphabet by "wildcarding" the properties linked to source and target.
• Example:
  – Boston/Category:Boston/Category:People_from_Boston/Robin_Moore
  – #from/Category:#from/Category:People_from_#from/#to
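
The path collection step can be sketched as a bounded depth-first search. This is an illustration under stated assumptions, not the authors' implementation: `find_paths` and the toy `LINKS` graph are hypothetical, only simple paths (no repeated nodes) are kept, and the bound is interpreted as a maximum number of nodes per path, since the slides do not say whether the limit counts nodes or edges.

```python
def find_paths(links, source, target, max_len=5):
    """Return all simple paths from source to target with at most max_len nodes."""
    paths = []

    def dfs(node, path):
        if node == target:
            paths.append(list(path))
            return
        if len(path) == max_len:
            return  # bound reached, stop exploring this branch
        for nxt in sorted(links.get(node, ())):
            if nxt not in path:  # keep paths simple (no cycles)
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(source, [source])
    return paths


# Hypothetical fragment of the link graph.
LINKS = {
    "Rosario": {"Category:Rosario", "Lionel_Messi"},
    "Category:Rosario": {"Category:People_from_Rosario"},
    "Category:People_from_Rosario": {"Lionel_Messi"},
}

for p in find_paths(LINKS, "Rosario", "Lionel_Messi"):
    print("/".join(p))
```

On this toy graph the search finds both the category path and the direct link from Rosario to Lionel_Messi.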

RSq(d,r) | Path | Path query
(Paris, Pierre_Curie) | Paris/Category:Paris/Category:People_from_Paris/Pierre_Curie | #from/Category:#from/Category:People_from_#from/#to
(Rosario, Lionel_Messi) | Rosario/Category:Rosario/Category:People_from_Rosario/Lionel_Messi | #from/Category:#from/Category:People_from_#from/#to
(Rosario, Lionel_Messi) | Rosario/Lionel_Messi | #from/#to
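
The generalisation from concrete paths to path queries can be sketched as follows. The helpers `wildcard` and `build_index` are hypothetical, and plain string replacement of the source and target names is a simplifying assumption (it would misbehave if one name were contained in the other).

```python
from collections import defaultdict


def wildcard(path, source, target):
    """Turn a concrete path into a path query by masking source and target names."""
    return "/".join(
        step.replace(source, "#from").replace(target, "#to") for step in path
    )


def build_index(paths_by_pair):
    """Map each path query to the set of (source, target) pairs it covers."""
    index = defaultdict(set)
    for (source, target), paths in paths_by_pair.items():
        for path in paths:
            index[wildcard(path, source, target)].add((source, target))
    return index


# Concrete paths for the pairs used as examples in the slides.
PATHS = {
    ("Paris", "Pierre_Curie"): [
        ["Paris", "Category:Paris", "Category:People_from_Paris", "Pierre_Curie"],
    ],
    ("Rosario", "Lionel_Messi"): [
        ["Rosario", "Category:Rosario", "Category:People_from_Rosario", "Lionel_Messi"],
        ["Rosario", "Lionel_Messi"],
    ],
}

index = build_index(PATHS)
for query, pairs in sorted(index.items()):
    print(query, len(pairs))
```

Grouping pairs by path query is what makes it possible to count, for each candidate query, how many DBpedia pairs it covers.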

Path Indexing Results
[Figure: path indexing results; visible label: 65200]

Evaluation
• Run 6 queries in DBpedia and compute the path query index by running PIA (maxL=5).
  – Compute Found, Not Found and Errors.
  – Compute the proportion between the number of paths generated with DFS up to maxL and the number of path queries.

• Run the SCMPQ on Wikipedia.
  – Analyze the path query as a function of the returned values.
  – Compute precision and recall.

• Community validation
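
The slides do not spell out how precision and recall are computed. A minimal sketch using the standard set-based definitions is given below, assuming that the pairs returned by the path query on Wikipedia are the retrieved set and the DBpedia pairs are the relevant set. The function name and the sample pairs `Someone_Else` and `Another_Person` are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Standard set-based precision and recall."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Pairs known to DBpedia (relevant) vs. pairs returned by a path query (retrieved).
dbpedia = {
    ("Paris", "Pierre_Curie"),
    ("Rosario", "Lionel_Messi"),
    ("Boston", "Robin_Moore"),
}
wikipedia = {
    ("Paris", "Pierre_Curie"),
    ("Rosario", "Lionel_Messi"),
    ("Paris", "Someone_Else"),      # hypothetical pair unknown to DBpedia
    ("Rosario", "Another_Person"),  # hypothetical pair unknown to DBpedia
}

p, r = precision_recall(wikipedia, dbpedia)
print(f"precision={p:.2f} recall={r:.2f}")
```

Under these definitions, a low precision means the path query returns many pairs that DBpedia does not list; it does not by itself say those pairs are wrong.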

Evaluation – Queries in DBpedia

#EQ1: Cities and people born there.
    SELECT ?city ?person WHERE {
      ?person a Person .
      ?city a City .
      ?person birthplace ?city }

#EQ2: Cities and philosophers born there.
    SELECT ?city ?philosopher WHERE {
      ?philosopher a Philosopher .
      ?city a City .
      ?philosopher birthplace ?city }

#EQ3: Philosophers born in France.
    SELECT France ?philosopher WHERE {
      ?philosopher a Philosopher .
      ?philosopher birthplace France }

#EQ4: Books and their authors.
    SELECT ?book ?author WHERE {
      ?book a Book .
      ?book author ?author }

#EQ5: Works and their music composers.
    SELECT ?musician ?work WHERE {
      ?work a Work .
      ?work musicBy ?musician }

#EQ6: Cities and their universities.
    SELECT ?city ?university WHERE {
      ?university a University .
      ?city a City .
      ?university city ?city }

Query Results
Query | Domain | Range | Number of pairs in DBpedia | Pairs where a WP path exists | Pairs where NO WP path exists | Synchronization errors
EQ1 | City | Person | 119097 | 65200 | 49899 | 3998
EQ2 | City | Philosopher | 171 | 103 | 61 | 7
EQ3 | France | Philosopher | 21 | 21 | 0 | 0
EQ4 | Book | Author | 24185 | 20328 | 3689 | 168
EQ5 | Work | Musician | 1204 | 836 | 367 | 1
EQ6 | City | University | 14094 | 9497 | 4404 | 193
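
As a quick check of the figures in this table, the sketch below verifies that, for every query, the pairs with a path, the pairs without a path and the synchronization errors add up to the number of DBpedia pairs, and expresses each as a percentage. The `ROWS` layout and the `shares` helper are hypothetical; the numbers are copied from the table.

```python
# (total pairs, path exists, no path, synchronization errors) per query.
ROWS = {
    "EQ1": (119097, 65200, 49899, 3998),
    "EQ2": (171, 103, 61, 7),
    "EQ3": (21, 21, 0, 0),
    "EQ4": (24185, 20328, 3689, 168),
    "EQ5": (1204, 836, 367, 1),
    "EQ6": (14094, 9497, 4404, 193),
}


def shares(total, found, not_found, errors):
    """Return the three outcomes as percentages of the total (one decimal)."""
    if found + not_found + errors != total:
        raise ValueError("row does not add up to its total")
    return tuple(round(100 * x / total, 1) for x in (found, not_found, errors))


for name, row in ROWS.items():
    print(name, shares(*row))
```

Every row adds up, so the three outcome columns partition the DBpedia result set.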

Query | Path | #
Q1: Cities-people | #from/Cat:#from/Cat:People from #from/#to | 34008
Q1: Cities-people | #from/#to | 3188
Q2: Cities-philo | #from/Cat:#from/Cat:People from #from/#to | 60
Q2: Cities-philo | #from/Cat:Capitals in Europe/Cat:#from/People from #from/#to | 15
Q3: Philo-France | #from/Cat:#from/Cat:French people/Cat:French people by occupation/French philosophers/#to | 21
Q3: Philo-France | #from/Cat:#from/Cat:French people/Cat:French people by occupation/French sociologists/#to | 3
Q4: Book-authors | #from/#to | 19863
Q4: Book-authors | #from/Cat:#to/#to | 119
Q5: Musician-work | #from/#to | 811
Q5: Musician-work | #from/Cat:Tony Award winners/Cat:Tony Award winning musicals/#to | 26
Q6: Cities-universities | #from/#to | 6031
Q6: Cities-universities | #from/Cat:#from/#to | 1314
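
Given such counts, choosing a path query can be sketched as below. Reading "shortest path query that maximally contains the DBpedia results" as "largest coverage first, shorter path on ties" is an assumption, and `path_length` and `best_path_query` are hypothetical helpers; the counts are taken from the table.

```python
def path_length(path_query):
    """Number of steps in a '/'-separated path query."""
    return path_query.count("/") + 1


def best_path_query(counts):
    """Pick the query covering the most pairs; break ties with the shorter path."""
    return max(counts, key=lambda q: (counts[q], -path_length(q)))


Q1 = {
    "#from/Cat:#from/Cat:People from #from/#to": 34008,
    "#from/#to": 3188,
}
Q4 = {
    "#from/#to": 19863,
    "#from/Cat:#to/#to": 119,
}

print(best_path_query(Q1))  # #from/Cat:#from/Cat:People from #from/#to
print(best_path_query(Q4))  # #from/#to
```

For cities and people the category path wins even though it is longer, because it covers far more pairs than the direct link.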

Query | Prop | Precision | Recall | Contrib | Rejected
Q1: Cities-people | #from/Cat:#from/Cat:People from #from/#to | 0.415 | 0.52 | 78 | 12
Q2: Cities-philo | #from/Cat:#from/Cat:People from #from/#to | 0.003 | 0.58 | - | -
Q3: Philo-France | #from/Cat:#from/Cat:French people/Cat:French people by occupation/French philosophers/#to | 0.099 | 1 | - | -
Q4: Book-authors | #from/#to | 0.22 | 0.97 | - | -
Q5: Musician-work | #from/#to | 0.027 | 0.97 | 36 | 0
Q6: Cities-universities | #from/#to | 0.014 | 0.63 | 17 | 1

Callouts on the slide:
• Q2 path query -> Sportspeople from Edinburgh, People from Edinburgh
• "Dayton-Kentucky is a small community" and "this category contain one article and have little possibility for growth"
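
The recall values appear consistent with dividing the number of pairs covered by the selected path query by the number of pairs for which some Wikipedia path exists. The sketch below checks this reading against the figures from the earlier tables. It is an interpretation, not a definition stated in the slides, and it assumes that the 0.52 in the flattened table belongs to Q1.

```python
# (pairs covered by the selected path query, pairs with a WP path, reported recall)
DATA = {
    "Q1": (34008, 65200, 0.52),
    "Q2": (60, 103, 0.58),
    "Q3": (21, 21, 1.0),
    "Q4": (19863, 20328, 0.97),
    "Q5": (811, 836, 0.97),
    "Q6": (6031, 9497, 0.63),
}


def recall(covered, with_path):
    """Share of path-connected pairs that the selected path query covers."""
    return covered / with_path


for name, (covered, with_path, reported) in DATA.items():
    print(name, round(recall(covered, with_path), 3), reported)
```

All six ratios agree with the reported values to within 0.01, which supports this reading without proving it.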

Conclusions and Further Work
• Which query in Wikipedia best approximates the results obtained by a DBpedia query?
  – The shortest path query that maximally contains the semantic relation expressed by the semantic query?

• Preliminary evaluations with real data are encouraging.
  – Is precision pertinent and meaningful?
  – Containment is pertinent, but there are other factors…

Future Work
  – Continue social validation and learn from social feedback.
  – Reduce the alphabet with better use of properties, which will reduce the index size.
  – Use overlap and containment between query results.
  – Improve computation time.
  – Extend to all relations in DBpedia.

Obtained Path Index
Query | Paths | Path queries
EQ1: Cities and people born there | 207165 | 8118
EQ2: Cities and philosophers born there | 267 | 200
EQ3: Philosophers born in France | 391 | 191
EQ4: Books and their authors | 5634 | 1801
EQ5: Works and their music composers | 183 | 41
EQ6: Cities and their universities | 26175 | 6701
