Prediction of Thematic Rank for Structured Semantic Role Labeling
Weiwei Sun/孙薇, Zhifang Sui and Meng Wang
† Institute of Computational Linguistics, EECS, Peking University
‡ Key Laboratory of Computational Linguistics, Ministry of Education, China

Overview

- Thematic hierarchy theory argues that there exists a language-independent rank of possible semantic roles.
- A thematic hierarchy establishes priority among arguments with respect to their syntactic realization.
- The thematic rank between two arguments can be accurately identified using syntactic clues.
- Strong dependencies among arguments motivate assigning semantic roles globally.
- To import structural information, a re-ranking technique is employed to incorporate thematic rank information into local classification results.
- Experimental results show that prediction of thematic rank can help semantic role classification.

Linguistic Basis

Subject selection rule of Fillmore's Case Grammar: If there is an A [=Agent], it becomes the subject; otherwise, if there is an I [=Instrument], it becomes the subject; otherwise, the subject is the O [=Object, i.e., Patient/Theme].

- John broke the window with a hammer.
- A hammer broke the window.
- The window broke.

Linguistic Basis (cont'd)

- A thematic hierarchy is a language-independent rank of possible semantic roles, which establishes prominence relations among arguments.
- Thematic hierarchy theory argues that the thematic ranks of semantic roles affect their syntactic realization.
- Thematic hierarchies can help construct the mapping from semantics to syntax.

Problems in Modeling a Thematic Hierarchy of PropBank Roles

1. Predicates in PropBank do not share the same list of semantic roles.
   - There are six numbered semantic role types in the label set, tagged Arg0-Arg5, and these labels have no consistent meaning across predicates: Arg3 of rise.01 is Location, whereas Arg3 of order.02 is Source.
2. Although there is general agreement that the Agent should be the highest-ranking role in a hierarchy, there is no consensus in the theoretical discussion on how to rank the remaining roles.
   - For example, the Patient occupies the second highest position in some linguistic theories but the lowest in others.

Ranking Arguments in PropBank

- We draw on proto-role theory to rank PropBank roles. There are three key points in our solution:
  1. The rank of Arg0 is the highest: Arg0 ≻ Argi (i > 0).
  2. The rank of Arg1 is either the second highest or the lowest: Arg1 ≻ Argi (i > 1) vs. Argi ≻ Arg1.
  3. We do not rank the other arguments; instead, we assume an equivalence relation among them.
- Two sets of roles closely correspond to numbered arguments: (1) referenced arguments (R-A*) and (2) continuation arguments (C-A*).
- To let the rank relation also cover these two kinds of arguments, the equivalence relation is divided into six sub-categories.

Ranking Arguments in PropBank (cont'd)

- Arg0 is generally the argument exhibiting features of a prototypical Agent; Arg1 is a prototypical Patient.
- The Agent is almost without exception the highest role in proposed hierarchies. Since Arg0 is the proto-Agent, its rank is higher than that of the other numbered arguments.
- Both candidate ranks of Arg1 (second highest and lowest) are tested on PropBank data.
- A majority of thematic hierarchies assume an equivalence relation among Source, Goal, Locative, etc. These roles are usually labeled Arg2 to Arg5.

Ranking Arguments in PropBank (cont’d)

Figure: The Hasse diagrams of hierarchies.

Thematic Rank Prediction

- Assigning different labels to different rank relations, we formulate the prediction of the thematic rank between two arguments as a multi-class classification task.
- Let A denote the set of arguments and R the set of rank relations. Given a score function $S_{TH}: A \times A \times R \to \mathbb{R}$, the relation is recognized in argmax flavor:

  $\hat{r} = r^*(a_i, a_j) = \arg\max_{r \in R} S_{TH}(a_i, a_j, r)$

- Score function:

  $S_{TH}(a_i, a_j, r) = \frac{\exp\{\psi(a_i, a_j, r) \cdot w\}}{\sum_{r' \in R} \exp\{\psi(a_i, a_j, r') \cdot w\}}$

  where $\psi$ is the feature map and $w$ is the parameter vector to learn.

Label List

1. ≻ : the first argument is higher than the second argument.
2. ≺ : the first argument is lower than the second argument.
3. AR : the second argument is the referenced argument of the first.
4. RA : the first argument is the referenced argument of the second.
5. AC : the second argument is the continuation argument of the first.
6. CA : the first argument is the continuation argument of the second.
7. = : the two arguments are labeled with the same role label.
8. ∼ : the two arguments are equivalent, but not of the same type.
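To make the classification setup concrete, below is a minimal sketch of the softmax scoring and argmax prediction over the eight relation labels, assuming a generic log-linear model; the feature map psi and the weight vector w are toy placeholders, not the paper's actual feature set.

```python
import numpy as np

# The eight rank-relation labels from the list above.
RELATIONS = ["≻", "≺", "AR", "RA", "AC", "CA", "=", "∼"]

def relation_scores(psi, w):
    """S_TH(a_i, a_j, r) for every r: softmax over psi(a_i, a_j, r) . w.

    psi: array of shape (|R|, d), one joint feature vector per relation label.
    w:   parameter vector of shape (d,)."""
    logits = psi @ w
    logits -= logits.max()           # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def predict_relation(psi, w):
    """argmax_r S_TH(a_i, a_j, r)."""
    probs = relation_scores(psi, w)
    k = int(np.argmax(probs))
    return RELATIONS[k], float(probs[k])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    psi = rng.normal(size=(len(RELATIONS), 16))  # toy features for one argument pair
    w = rng.normal(size=16)                      # toy learned weights
    print(predict_relation(psi, w))
```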

SRL Re-ranking with Thematic Rank Information

Structured Semantic Role Labeling

- Arguments of one predicate-argument structure are highly correlated. Toutanova et al. (2005) empirically show that global information is important for SRL and that structured solutions outperform local semantic role classifiers.
- The local semantic classifier produces a list of labeling results scored by

  $S(a, s) = \prod_i S_l(a_i, s_i)$

- Our re-ranking step picks one assignment from this list according to the predicted rank relations, using one of two re-ranking policies (detailed below): (1) hard constraint re-ranking and (2) soft constraint re-ranking.

Hard Constraint Re-ranking

- Strictly enforce the predicted rank relations.
- If the thematic rank prediction says that the rank of argument $a_i$ is higher than that of $a_j$, then a role assignment such as [$a_i$ = Patient, $a_j$ = Agent] is eliminated.

Hard constraint re-ranking:

$S(a, s) = \prod_i S_l(a_i, s_i) \prod_{i<j} I\big(r^*(a_i, a_j), r(s_i, s_j)\big)$

- $r^*: A \times A \to R$: the predicted rank relation of two arguments;
- $r: S \times S \to R$: the thematic rank relation of two semantic roles, e.g. $r(\text{Agent}, \text{Patient}) = {\succ}$;
- $I: R \times R \to \{0, 1\}$: an indicator of agreement between the two relations.
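A minimal sketch of this filtering step follows, assuming a toy role hierarchy and toy local probabilities; ROLE_RANK and all names here are illustrative, not the paper's actual hierarchy or implementation, and only the ≻/≺/= relations are modeled.

```python
import numpy as np
from itertools import product

# Hedged sketch of hard-constraint re-ranking: any candidate assignment whose
# pairwise role relations disagree with the predicted relations is eliminated
# (the indicator term is zero for that assignment).
ROLE_RANK = {"Arg0": 0, "Arg1": 1, "Arg2": 2}   # smaller number = higher thematic rank

def implied_relation(si, sj):
    """r(s_i, s_j): relation implied by a pair of role labels (simplified to ≻/≺/=)."""
    if si == sj:
        return "="
    return "≻" if ROLE_RANK[si] < ROLE_RANK[sj] else "≺"

def hard_rerank(local_scores, predicted_relations):
    """local_scores[i]           : dict role -> P(role | a_i) from the local classifier.
    predicted_relations[(i, j)]: relation label r*(a_i, a_j) from the rank predictor.
    Returns the highest-scoring assignment consistent with every predicted relation."""
    best, best_score = None, -1.0
    for assignment in product(*[list(s.keys()) for s in local_scores]):
        # indicator term: every argument pair must agree with the predicted relation
        if any(implied_relation(assignment[i], assignment[j]) != rel
               for (i, j), rel in predicted_relations.items()):
            continue
        score = float(np.prod([local_scores[i][r] for i, r in enumerate(assignment)]))
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

if __name__ == "__main__":
    # The locally best labels (Arg0, Arg0) violate the predicted relation "≻",
    # so the consistent assignment (Arg0, Arg1) is returned instead.
    local = [{"Arg0": 0.55, "Arg1": 0.45}, {"Arg0": 0.60, "Arg1": 0.40}]
    print(hard_rerank(local, {(0, 1): "≻"}))   # -> (('Arg0', 'Arg1'), ~0.22)
```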

Soft Constraint Re-ranking

- The predicted confidence scores of the relations are added as factors to the score function of the semantic role assignment.

Soft constraint re-ranking:

$S(a, s) = \prod_i S_l(a_i, s_i) \prod_{i<j} S_{TH}\big(a_i, a_j, r(s_i, s_j)\big)$

- Since an argument structure does not contain many arguments (at most six in the test corpus), the output space is small and an exhaustive search is efficient enough.
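The sketch below illustrates this exhaustive soft re-ranking under the same toy assumptions as before; the function names, the implied-relation rule, and the numbers (loosely modeled on the "circuit breakers" example later in the poster) are illustrative, not the paper's implementation.

```python
import numpy as np
from itertools import product

# Hedged sketch of soft-constraint re-ranking: each candidate role assignment is
# scored by the product of local role probabilities and the relation classifier's
# probability for the relation implied by every role pair. Relations absent from
# the toy probability table are treated as having probability 0.
def soft_rerank(local_scores, relation_probs, implied_relation):
    """local_scores[i]         : dict role -> P(role | a_i) from the local classifier.
    relation_probs[(i, j)]  : dict relation label -> S_TH(a_i, a_j, relation).
    implied_relation(si, sj): relation label implied by a pair of role labels.
    Exhaustive search is cheap because structures contain at most a few arguments."""
    best, best_score = None, -1.0
    for assignment in product(*[list(s.keys()) for s in local_scores]):
        score = float(np.prod([local_scores[i][r] for i, r in enumerate(assignment)]))
        for (i, j), probs in relation_probs.items():
            score *= probs.get(implied_relation(assignment[i], assignment[j]), 0.0)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

def implied(si, sj):
    """Toy implied-relation rule: Arg0 outranks everything, equal labels are '='."""
    if si == sj:
        return "="
    if si == "Arg0":
        return "≻"
    if sj == "Arg0":
        return "≺"
    return "∼"

if __name__ == "__main__":
    # Local scores prefer Arg0+Arg1, but the relation classifier strongly prefers
    # the "∼" relation, so Arg1+Arg2 wins after soft re-ranking.
    local = [{"Arg0": 0.7897, "Arg1": 0.1425}, {"Arg1": 0.8230, "Arg2": 0.1193}]
    rels = {(0, 1): {"≻": 0.0002, "∼": 0.9998}}
    print(soft_rerank(local, rels, implied))   # -> (('Arg1', 'Arg2'), ~0.017)
```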

How to Rank Arguments?

- The table below shows the performance of thematic rank prediction and of structured semantic role classification under different thematic hierarchy definitions (A: only Arg0 ranked highest; A & P↑: Arg1 additionally ranked second highest; A & P↓: Arg1 ranked lowest).

  Hierarchy        Baseline   A        A & P↑   A & P↓
  Rank Prediction  –          94.65%   95.62%   94.09%
  SRL (S)          94.77%     95.44%   95.07%   95.13%
  SRL (G)          –          96.89%   96.39%   97.22%

Table: SRC performance based on different thematic hierarchy definitions.

Performance of Semantic Role Classification

  Parse      Baseline   Hard     Soft     Gold
  Gold       95.14%     95.71%   96.07%   97.63%
  Charniak   94.12%     94.74%   95.44%   97.32%

Table: Overall semantic role classification accuracy.

Performance of SRC (cont'd)

  Role      Local   Hard    Soft
  Arg0      96.10   96.47   97.07
  Arg1      95.26   96.09   96.58
  Arg2      90.09   90.63   91.56
  Arg3      83.80   83.03   84.43
  Arg4      90.20   87.62   87.20
  Arg5      80.00   72.73   83.33
  R-Arg0    93.19   95.56   96.89
  R-Arg1    83.39   89.38   90.63
  R-Arg2    56.00   62.07   66.67
  C-Arg0    12.50   56.00   66.67
  C-Arg1    80.59   81.12   84.85
  C-Arg2    N/A     18.18   18.18

Table: F-measures of SRC based on Charniak parsing.

Analysis

- Local classification results are modified using structural information.
- Using individual features only, the local classifier may falsely label roles one by one. Structural information can correct some of these mistakes.

An Example

- Some "circuit breakers" installed after the October 1987 crash failed their first test.

  Assignment      Arg0 + Arg1        Arg1 + Arg2
  Score (Local)   78.97% × 82.30%    14.25% × 11.93%
  Score (Rank)    ≻ : 0.02%          ∼ : 99.98%


- The baseline system falsely assigns the roles as Arg0 + Arg1. Taking into account the thematic rank prediction, in which the relation "∼" receives an extremely high probability, our system returns Arg1 + Arg2 as the SRL result.
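For concreteness, the soft-constraint scores implied by the numbers above work out roughly as follows (a back-of-the-envelope computation; the system's exact normalization may differ):

$S(\text{Arg0+Arg1}) \approx 0.7897 \times 0.8230 \times 0.0002 \approx 1.3 \times 10^{-4}$

$S(\text{Arg1+Arg2}) \approx 0.1425 \times 0.1193 \times 0.9998 \approx 1.7 \times 10^{-2}$

so Arg1 + Arg2 wins by roughly two orders of magnitude once the relation probability is factored in, despite its much lower local scores.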

Conclusion & Future Work

- We borrow the thematic hierarchy idea from linguistics.
- We use relation information to represent structural constraints.
- Open question: how to define a reasonable hierarchy for a given list of semantic roles?

Game Over
