User Simulation for Online Adaptation and Knowledge Alignment in Troubleshooting Dialogue Systems
Srinivasan Janarthanam, Oliver Lemon
School of Informatics, University of Edinburgh

S. Janarthanam & O. Lemon

Outline
• Motivation
• Related work
• Experimental setup
• Training, Evaluation and Results
• Conclusion

Motivation
• Current dialogue systems interact with a homogeneous user population in domains like Town-Info, Travel Planning, etc.
• Current user simulation models do not simulate diverse groups of users.
• In reality, users differ from one another in their domain expertise (e.g. some users don’t understand jargon, have misconceptions, etc.).
• Troubleshooting domain: the dialogue system helps users diagnose and repair their internet connection.
• Users have different levels of domain knowledge (e.g. novices, intermediates and experts).
• The Dialogue Manager must not only know “what” to ask next, but also “how” to ask the user.
• How can we learn a dialogue strategy that handles all these different users by adapting to their domain expertise?

In a nutshell
Objectives
• To simulate users in a broadband troubleshooting scenario.
  - Diverse groups of users based on their domain knowledge.
  - Environment-sensitive user simulation.
• To learn an optimal knowledge alignment dialogue policy that
  - Handles users whose domain expertise we don’t know.
  - Avoids user frustration during the dialogue (hang-ups).
  - Keeps dialogues short.
Steps
• Model troubleshooting dialogue as a Markov Decision Process.
• Learn a knowledge alignment policy using Reinforcement Learning.
• Evaluate the learned policy against hand-coded policies.

Related work
• Knowledge alignment
  - Isaacs & Clark (1987): Users learn about each other’s knowledge levels and adapt quickly in a conversation.
  - Larsson (2007): A formal account of how the meanings of NL expressions change.
• Troubleshooting systems
  - Boye et al (2007): Hand-coded rules governing “what” and “how” to ask.
  - Williams (2007): POMDP system that learns “what” to ask next in the face of uncertainty about ASR input and environment state.
• User simulations
  - (Eckert et al. 1997, 1998, Scheffler and Young 1999, Levin et al 2000, K. Georgila 2005, H. Cuayáhuitl et al 2005): They focus on how an average user will respond. They do not focus on the user’s domain knowledge, frustration or environment simulation.

Setup
[Figure: setup diagram showing the User, the Dialogue System with its troubleshooting decision tree, and the Environment, which the User can observe and manipulate.]

Dialogue Manager/NL Generator
• Reinforcement Learning (Sutton & Barto 1998) agent.
• Interacts with the user at the dialogue act level.
• Diagnosis and repair instructions (“what” to ask) are driven by a hand-coded decision tree (troubleshooting script).
• Learning task: to decide between giving simple vs. extended requests (“how” to ask).
  - Simple requests cost 1 turn (e.g. “Is the ADSL light on?”).
  - Extended requests cost 3 turns (e.g. “I need to know the status of the ADSL light. It is right next to the modem power light. Is it on, off or flashing?”).
• Trade-off: request types for different users vs. dialogue length.

Decision tree
[Figure: troubleshooting decision tree.]

User Expertise Index
• Estimates the user’s expertise based on request types, user responses and frustration.

  initialize user_expertise_index = 5
  for every turn,
    if request_type == simple,
      if user_reply == request_clarification,
        user_expertise_index -= 1
      else
        user_expertise_index += 2
    else if request_type == extended,
      if user_reply == request_clarification,
        user_expertise_index -= 2
      else if user_frustration == true,
        user_expertise_index += 1
      else
        user_expertise_index -= 1
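The update rule above can be sketched as a small Python function. This is only an illustrative reading of the pseudocode: the string constants are hypothetical names, and two behaviours are assumptions not stated on the slide, namely that other dialogue acts leave the index unchanged and that the index is clamped to the 0-9 range listed for User_expertise_index in the DM state.

```python
SIMPLE, EXTENDED = "simple", "extended"
REQUEST_CLARIFICATION = "request_clarification"


def update_expertise_index(index, request_type, user_reply, user_frustrated):
    """Apply one turn of the slide's update rule and return the new index."""
    if request_type == SIMPLE:
        # A clarification request after a simple request suggests a novice.
        delta = -1 if user_reply == REQUEST_CLARIFICATION else 2
    elif request_type == EXTENDED:
        if user_reply == REQUEST_CLARIFICATION:
            delta = -2
        elif user_frustrated:
            # Frustration at an extended request suggests the user already knew the concept.
            delta = 1
        else:
            delta = -1
    else:
        # Assumption: acts such as Greet leave the index unchanged.
        delta = 0
    # Assumption: clamp to the 0-9 range given for User_expertise_index.
    return max(0, min(9, index + delta))


index = 5  # initial value from the slide
index = update_expertise_index(index, SIMPLE, "provide_info", False)
print(index)
```

Starting from 5, a simple request that is answered without a clarification request moves the estimate to 7, towards the expert end of the scale.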


DM Action set and State
Actions
• Greet
• Simple_Request_Info
• Extended_Request_Info
• Simple_Request_Action
• Extended_Request_Action
• Close_Dialogue
DM State
• More_slot_to_ask (0/1)
• Solution_found (0/1)
• User_expertise_index (0-9)
• User_said_dont_know (0/1)
• Modem_powerlight_filled (0/1)
• Adsllight_filled (0/1)
• Modem_filter_filled (0/1)
• Phone_filter_filled (0/1)
• Phoneline_working_filled (0/1)
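To give a feel for the size of the learning problem, the following sketch enumerates every combination of the state features listed above and counts state-action pairs. The feature and action names are taken from the slide. Treating every combination as a distinct state is an assumption that gives an upper bound only, since many combinations may never occur in a real dialogue.

```python
from itertools import product

BINARY_FEATURES = [
    "More_slot_to_ask", "Solution_found", "User_said_dont_know",
    "Modem_powerlight_filled", "Adsllight_filled", "Modem_filter_filled",
    "Phone_filter_filled", "Phoneline_working_filled",
]
EXPERTISE_VALUES = range(10)  # User_expertise_index takes values 0-9
ACTIONS = [
    "Greet", "Simple_Request_Info", "Extended_Request_Info",
    "Simple_Request_Action", "Extended_Request_Action", "Close_Dialogue",
]


def enumerate_states():
    """Yield every combination of feature values as a dict."""
    for bits in product((0, 1), repeat=len(BINARY_FEATURES)):
        for index in EXPERTISE_VALUES:
            state = dict(zip(BINARY_FEATURES, bits))
            state["User_expertise_index"] = index
            yield state


num_states = sum(1 for _ in enumerate_states())
print(num_states, num_states * len(ACTIONS))  # prints "2560 15360"
```

Eight binary features and a ten-valued index give 2^8 x 10 = 2560 states, small enough for a tabular learner.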


Environment simulation
• Simulates a broadband environment Se.
• Simulates a modem, ADSL filters, a computer, phone line, telephone and the connections between them.
• User interaction
  - Observation act Ou: to observe facts from the environment state Se.
  - Manipulation act Mu: to manipulate the environment state Se.
• User’s observations of the environment are stored in OEu.
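As a rough illustration of the observe/manipulate interface, here is a minimal Python sketch of an environment state Se with one observation act and one manipulation act. The class name, equipment identifiers, fact name and action name are all hypothetical, and the data layout is an assumption rather than the authors' implementation.

```python
class BroadbandEnvironment:
    """Toy environment state Se: equipment plus the connections between items."""

    def __init__(self):
        # Equipment id -> (kind, status); ids and statuses are illustrative.
        self.equipment = {
            "m1": ("modem", "working"),
            "t1": ("phone", "working"),
            "a1": ("adsl_filter", "working"),
        }
        # (source, target) pairs; the phone starts without an ADSL filter.
        self.connections = {("phone_socket", "a1"), ("a1", "m1"), ("phone_socket", "t1")}

    def _kind(self, item):
        return self.equipment.get(item, ("", ""))[0]

    def observe(self, fact):
        """Observation act Ou: read a fact from Se without changing it."""
        if fact == "phone_filter_present":
            return any(
                self._kind(src) == "adsl_filter" and self._kind(dst) == "phone"
                for src, dst in self.connections
            )
        raise ValueError(f"unknown fact: {fact}")

    def manipulate(self, action):
        """Manipulation act Mu: change Se."""
        if action == "fit_phone_filter":
            self.equipment["a2"] = ("adsl_filter", "working")
            self.connections.discard(("phone_socket", "t1"))
            self.connections |= {("phone_socket", "a2"), ("a2", "t1")}
        else:
            raise ValueError(f"unknown action: {action}")


env = BroadbandEnvironment()
observations = {}  # plays the role of OEu, the user's record of what was observed
observations["phone_filter_present"] = env.observe("phone_filter_present")
env.manipulate("fit_phone_filter")
observations["phone_filter_present"] = env.observe("phone_filter_present")
print(observations)
```

Keeping observation read-only and routing every change through `manipulate` mirrors the slide's split between Ou and Mu.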


User simulation
• Initial domain knowledge for each user (probability that the user can answer simple requests for each slot value):

  Slot               Novice user   Expert user   Intermediate
  Phone_line         0.5           0.99          0.85
  Adsl filter        0.2           0.8           0.6
  Modem power light  0.5           0.95          0.8
  Adsl light         0.2           0.9           0.6
  Change USB         0.3           0.8           0.55

• These probabilities would be estimated from a suitable data set (e.g. from data collected by broadband service providers).
• Spectrum of users from novice to experts can be simulated.
• Domain knowledge of the user can be updated by extended requests:
  P(DKu,c,t+1) = P(DKu,c,t) + boost(c), with boost(c) = 0.5
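A minimal sketch of how such a knowledge profile might drive a simulated user is shown below. The probabilities are the ones on the slide. The class and method names are hypothetical, the boost is read as an additive increase of 0.5, and capping the result at 1.0 is an assumption added so the value stays a valid probability.

```python
import random

INITIAL_KNOWLEDGE = {
    "novice": {"phone_line": 0.5, "adsl_filter": 0.2, "modem_power_light": 0.5,
               "adsl_light": 0.2, "change_usb": 0.3},
    "expert": {"phone_line": 0.99, "adsl_filter": 0.8, "modem_power_light": 0.95,
               "adsl_light": 0.9, "change_usb": 0.8},
    "intermediate": {"phone_line": 0.85, "adsl_filter": 0.6, "modem_power_light": 0.8,
                     "adsl_light": 0.6, "change_usb": 0.55},
}
BOOST = 0.5  # boost(c) from the slide, the same for every concept c


class SimulatedUserKnowledge:
    """Per-concept probability that the user can answer a simple request."""

    def __init__(self, user_type, rng=None):
        self.knowledge = dict(INITIAL_KNOWLEDGE[user_type])
        self.rng = rng or random.Random()

    def can_answer_simple(self, concept):
        """Sample whether the user understands a simple request about this concept."""
        return self.rng.random() < self.knowledge[concept]

    def hear_extended_request(self, concept):
        """An extended request explains the concept and raises the probability."""
        # Assumption: cap at 1.0 so the result is still a probability.
        self.knowledge[concept] = min(1.0, self.knowledge[concept] + BOOST)


novice = SimulatedUserKnowledge("novice", random.Random(0))
novice.hear_extended_request("adsl_light")
print(round(novice.knowledge["adsl_light"], 2))
```

After one extended request about the ADSL light, the novice's probability for that concept rises from 0.2 to 0.7.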

User frustration
• When do users get frustrated?
  - When the DM explains very well-known concepts.
  - When the DM fails to explain unknown concepts.
• Frustration leads to call hang-up (after 6th turn).
• Frustration Index FIu stores the user’s overall frustration during the dialogue.
• Frustration Fu,t is also conveyed to the DM along with the user action. This rests on the assumption that emotions can be determined from prosodic features (75% average accuracy) and from linguistic and pragmatic features (84.24% accuracy) (Ang et al 2002, Lee et al 2002, Lee & Lee 2007).

User State and Action
User state: Su = (DKu, OEu, FIu, T)
• DKu: Domain Knowledge
• OEu: Observation of Environment
• FIu: Frustration Index
• T: Number of dialogue turns
Action set
• Provide_Info(C)
• Acknowledge
• Request Clarification (C)
• Hang-up

Action selection
[Figure: flowchart of user action selection given the system action As. Decision nodes: “As = Extended?”, “As = Request_repair?”, “Know concept?”. Processing steps: Update DKu, FIu; Observe Environment Ou; Manipulate Environment Mu, Update Se; Update OEu; Update FI. Resulting user actions: Au = Provide Answer, Au = Ack, Au = Req. Clarification.]

Reward function
• Reward every dialogue upon completion, whether successful or hang-up.
• Penalize
  - Longer dialogues.
  - Unnecessary extensions.
  - Dialogue moves leading to user frustration.
• Task completion reward TCR = 500 (-500 for hang-ups)
• Turn cost TC = 10.0
• Extended turn cost EC = 30.0
• Total turn cost TTC = TC * #(T)
• Total extension cost TEC = EC * #(ET)
• Final reward = TCR - TTC - TEC
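The reward definition above translates directly into a small function. The constants come from the slide. The function and parameter names are illustrative, and the sketch assumes #(T) counts all dialogue turns while #(ET) counts only the extended ones.

```python
TASK_COMPLETION_REWARD = 500.0  # TCR; negated when the user hangs up
TURN_COST = 10.0                # TC
EXTENDED_TURN_COST = 30.0       # EC


def final_reward(success, num_turns, num_extended_turns):
    """Final reward = TCR - TTC - TEC for one completed dialogue."""
    tcr = TASK_COMPLETION_REWARD if success else -TASK_COMPLETION_REWARD
    ttc = TURN_COST * num_turns                     # total turn cost
    tec = EXTENDED_TURN_COST * num_extended_turns   # total extension cost
    return tcr - ttc - tec


print(final_reward(True, 10, 2))   # 500 - 100 - 60 = 340.0
print(final_reward(False, 6, 0))   # -500 - 60 = -560.0
```

A successful 10-turn dialogue with two extended requests scores 340, while a hang-up after 6 turns scores -560, so avoiding hang-ups dominates the trade-off against dialogue length.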


Training
• 15000 cycles, SARSA Reinforcement Learning algorithm.
• User simulation produced expert and novice behaviours equally.
• Users behave stochastically.
• Learned strategy
  - Learned to use the user_expertise_index.
  - Extended acts if the user is estimated to be a novice.
  - Simple acts if the user is estimated to be an expert.
  - Simple acts for well-known concepts (e.g. Internet working?).
  - Avoids extra turns by not using simple acts for novices.
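For readers unfamiliar with SARSA, the sketch below shows the standard tabular update and an epsilon-greedy action choice (Sutton & Barto 1998). It is a generic illustration, not the training code used here: the learning rate `alpha`, discount `gamma` and exploration rate `epsilon` are assumed values, since the slides do not report them.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=1.0, terminal=False):
    """On-policy update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    target = reward if terminal else reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])


q = defaultdict(float)  # Q-table, all values start at 0.0
# The final turn of a dialogue receives the dialogue-level reward (340.0 here is illustrative).
sarsa_update(q, "s_last", "Close_Dialogue", 340.0, None, None, alpha=0.1, terminal=True)
print(q[("s_last", "Close_Dialogue")])
```

Because SARSA is on-policy, the value of the next state is taken from the action the agent actually selects there, including exploratory choices.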


[Figures: Dialogue Length; User Frustration]

Evaluation
• Performance of the learned policy was compared with three hand-coded policies.
  - Expert only: Treats all users as experts, issues only simple request acts.
  - Novice only: Treats all users as novices, issues only extended request acts.
  - Adaptive: If user_expertise_index > 5, issues simple request acts, extended requests otherwise.
• Four policies tested on three user groups: experts, novices and intermediates (approx 800 dialogues each).
• Learned policy performed better than other policies in most cases.
• Differences in final reward were statistically significant (Wilcoxon Signed Rank test) at p < 0.001.
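The three hand-coded baselines are simple enough to state as code. The sketch below follows the descriptions above. The function names and the "simple"/"extended" return values are illustrative labels, not identifiers from the original system.

```python
def expert_only_policy(user_expertise_index):
    """Treat every user as an expert: always issue simple requests."""
    return "simple"


def novice_only_policy(user_expertise_index):
    """Treat every user as a novice: always issue extended requests."""
    return "extended"


def adaptive_policy(user_expertise_index, threshold=5):
    """Simple requests when the index exceeds the threshold, extended otherwise."""
    return "simple" if user_expertise_index > threshold else "extended"


for index in (3, 5, 7):
    print(index, adaptive_policy(index))
```

Since the expertise index starts at 5, the Adaptive baseline opens with extended requests and switches to simple ones only after the index rises above the threshold.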


Results

Task Success Rate (%)
  Policy     Exp     Int     Nov     Mix
  Exp Only   97.7    87.9    19.5    65.1
  Nov Only   90.8    92.8    92.6    92.1
  Adaptive   96.9    85.2    88.3    90.1
  Learned    97.6    88.2    86.2    90.6

Average final reward
  Policy     Exp     Int     Nov      Mix
  Exp Only   393.5   177.5   -430.2   46.9
  Nov Only   206.6   230.7   210.1    215.8
  Adaptive   379.6   221.7   177.2    259.4
  Learned    385.9   252.6   175.2    271.2

Sample dialogues
System >> Welcome. What can I do for you?
Novice User >> Hi. My Internet is not working.
Sys >> Ok. Is your modem power-light on?
NU >> Yes it is on.
Sys >> Is the ADSL light on?
NU >> What do you mean by ADSL light?
Sys >> Ok. It is the light right next to the modem power-light. Do you see it?
NU >> Yes.
Sys >> Is it on, off or flashing?
NU >> It’s flashing.
Sys >> Ok. Your modem will be connected to the phone-line outlet. Can you see that?
NU >> Yes.
Sys >> Do you have a phone at home that is connected to that line?
NU >> Yes.
Sys >> Is it working?
NU >> Yes. The phone line is working.
.....

Sample dialogues
System >> Welcome. What can I do for you?
Expert User >> Hi. My Internet connection is not working.
Sys >> Ok. Is your modem power-light on?
EU >> Yes. It is on.
Sys >> Ok. Is your ADSL light on?
EU >> No. It is flashing.
Sys >> Ok. Is your phone-line working?
EU >> Yes. It is working.
Sys >> Is your phone connected to the line through an ADSL filter?
EU >> No. There is no filter.
Sys >> Ok. That is the problem. Please fit a filter between your phone and the phone-line. Spare filters are available in the package you received from us.
EU >> Ok. I am fitting the filter.
Sys >> Is your connection working now?
EU >> Yes.
Sys >> Thank you for calling.
EU >> Thank you.

Conclusion
• Adapting to different users (based on domain knowledge) was studied in a troubleshooting setup.
• User simulation incorporates new features: domain knowledge, environment simulation, and user frustration.
• Reinforcement Learning can be used to learn optimal knowledge-alignment policies that adapt to users online.
• Learned policy performs significantly better than the hand-coded policies on average (i.e. when we don’t know what kind of user we will encounter).

Future work
• Extend this User Simulation framework to be sensitive to NLG decisions (CLASSiC project).
  - Can the DM learn what referring expressions to use while adapting to the user?
  - Can the DM learn when to lexically align with the user and when not to?
• How to learn parameters for user simulation from dialogue corpora?
• Evaluation of User Simulations against real-user corpora.
• Test policies on real users.

☺ THANKS ☺


REFERENCES
J. Ang, R. Dhillon, A. Krupski, E. Shriberg and A. Stolcke (2002) Prosody-Based Automatic Detection of Annoyance and Frustration in Human-Computer Dialog. In Proc. ICSLP-2002, Denver, Colorado, USA.
J. Boye (2007) Dialogue management for automatic troubleshooting and other problem-solving applications. In Tim Paek and Harry Bunt (eds), 8th SIGDial Workshop on Discourse and Dialogue, Antwerp, Belgium, September 2007.
H. Cuayáhuitl, S. Renals, O. Lemon and H. Shimodaira (2005) Human-Computer Dialogue Simulation Using Hidden Markov Models. In Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), San Juan, Puerto Rico, Nov 2005.
W. Eckert, E. Levin and R. Pieraccini (1997) User Modelling for Spoken Dialogue System Evaluation. In Proc. of ASRU'97, pages 80-87.
K. Georgila, J. Henderson and O. Lemon (2005a) Learning user simulations for Information State Update Dialogue Systems. In Eurospeech-Interspeech '05.
E. A. Isaacs and H. H. Clark (1987) References in conversations between experts and novices. Journal of Experimental Psychology: General, 116, 26-37.
C. M. Lee, S. Narayanan and R. Pieraccini (2002) Classifying emotions in human-machine spoken dialogs. In Proc. of ICME, Lausanne, Switzerland, 2002.
C. Lee and G. Lee (2007) Emotion Recognition for Affective User Interfaces using Natural Language Dialogs. In Proc. of the 16th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2007).
E. Levin, R. Pieraccini and W. Eckert (1997) Learning dialogue strategies within the Markov decision process framework. In Proc. ASRU'97, December 1997.
O. Pietquin and T. Dutoit (2006) A probabilistic framework for dialog simulation and optimal strategy learning. IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 2, pp. 589-599, March 2006.
J. Schatzmann, B. Thomson, K. Weilhammer, H. Ye and S. Young (2007a) Agenda-based User Simulation for Bootstrapping a POMDP Dialogue System. HLT/NAACL, Rochester, NY, April 23-25, 2007.
J. Schatzmann, B. Thomson and S. Young (2007b) Statistical User Simulation with a Hidden Agenda. 8th SIGDial Workshop on Discourse and Dialogue, Antwerp, Belgium, September 2-3, 2007.
J. Schatzmann, K. Georgila and S. Young (2005) Quantitative Evaluation of User Simulation Techniques for Spoken Dialogue Systems. 6th SIGdial Workshop on Discourse and Dialogue, Lisbon, September 2-3, 2005.
K. Scheffler and S. Young (2001) Corpus-based dialogue simulation for automatic strategy learning and evaluation. In Proc. NAACL Workshop on Adaptation in Dialogue Systems, 2001.
R. S. Sutton and A. G. Barto (1998) Reinforcement Learning: An Introduction. MIT Press, ISBN 0-262-19398-1.
J. Williams and S. Young (2007) Partially Observable Markov Decision Process for Spoken Dialogue Systems. Computer Speech and Language 21 (2007) 393-422.
J. Williams (2007) Applying POMDPs to Dialog Systems in the Troubleshooting Domain. In Proc. HLT/NAACL Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technology, Rochester, NY, USA.

ENVIRONMENT SIMULATION

% Observe environment - Is there a phone filter?
pact(adsl_filter_for_phone, present) :-
    equipment(P, phone, _),
    equipment(A, adsl_filter, _),
    connected(A, P, rj11, firm).

% Environment setting - Set faulty phone filter
set_env1(faulty_phone_filter) :-
    assert(equipment(comp1, desktop, working)),
    assert(equipment(t1, phone, working)),
    assert(equipment(m1, modem, working)),
    assert(equipment(a1, adsl_filter, working)),
    assert(equipment(a2, adsl_filter, not_working)),
    assert(connected(phone_socket, a1, rj11, firm)),
    assert(connected(phone_socket, a2, rj11, firm)),
    assert(connected(a1, m1, rj11, firm)),
    assert(connected(a2, t1, rj11, firm)),
    assert(connected(m1, comp1, usb, working)),
    assert(phone_line(live)),
    assert(modem_software(installed, working)),
    assert(authentication(correct)),
    !.

% Manipulate environment - Replace the phone filter
mact(replace_phone_filter) :-
    equipment(a2, adsl_filter, not_working), !,
    retract(connected(phone_socket, a2, rj11, firm)),
    retract(connected(a2, t1, rj11, _)),
    assert(equipment(a4, adsl_filter, working)),
    assert(connected(phone_socket, a4, rj11, firm)),
    assert(connected(a4, t1, rj11, firm)),
    update_env, !.
