User Simulation for Online Adaptation and Knowledge Alignment in Troubleshooting Dialogue Systems
Srinivasan Janarthanam, Oliver Lemon
School of Informatics, University of Edinburgh
S. Janarthanam & O. Lemon
Outline
- Motivation
- Related work
- Experimental setup
- Training, Evaluation and Results
- Conclusion
Motivation
Current dialogue systems interact with a homogeneous user population in domains like Town-Info, travel planning, etc., and current user simulation models do not simulate diverse groups of users. In reality, users differ from one another in their domain expertise (e.g. some users do not understand jargon, some have misconceptions, etc.).
Troubleshooting domain: the dialogue system helps users diagnose and repair their internet connection. Users have different levels of domain knowledge (e.g. novice, intermediate and expert). The Dialogue Manager must not only know "what" to ask next, but also "how" to ask the user.
How can we learn a dialogue strategy that handles all these different users by adapting to their domain expertise?
In a nutshell
Objectives:
- Simulate users in a broadband troubleshooting scenario: diverse groups of users based on their domain knowledge, with environment-sensitive user simulation.
- Learn an optimal knowledge-alignment dialogue policy that handles users whose domain expertise we don't know, avoids user frustration during the dialogue (hang-ups), and keeps dialogues short.
Steps:
- Model troubleshooting dialogue as a Markov Decision Process.
- Learn a knowledge-alignment policy using Reinforcement Learning.
- Evaluate the learned policy against hand-coded policies.
Related work
Knowledge alignment
Isaacs & Clark (1987): users learn about each other's knowledge levels and adapt quickly in a conversation.
Larsson (2007): a formal account of how the meanings of NL expressions change during dialogue.
Troubleshooting systems
Boye (2007): hand-coded rules governing "what" and "how" to ask.
Williams (2007): a POMDP system that learns "what" to ask next in the face of uncertainty about the ASR input and the environment state.
User simulations
(Eckert et al. 1997, 1998; Scheffler and Young 1999; Levin et al. 2000; Georgila et al. 2005; Cuayáhuitl et al. 2005): these focus on how an average user will respond. They do not model the user's domain knowledge, frustration, or the environment.
Setup
[Architecture diagram: the User converses with the Dialogue System (Dialogue Manager/NL Generator), observes and manipulates the Environment, and the system follows a troubleshooting decision tree.]
Trade-off: request types suited to different users vs. dialogue length.
A Reinforcement Learning (Sutton & Barto 1998) agent that interacts with the user at the dialogue-act level. Diagnosis and repair instructions ("what" to ask) are driven by a hand-coded decision tree (troubleshooting script). The learning task is to decide between giving simple vs. extended requests ("how" to ask).
Simple request (e.g. "Is the ADSL light on?") (costs 1 turn).
Extended request (e.g. "I need to know the status of the ADSL light. It is right next to the modem power-light. Is it on, off or flashing?") (costs 3 turns).
Decision tree
[Figure: the hand-coded troubleshooting decision tree.]
User Expertise Index
Estimates the user's expertise based on request types, user responses and frustration.
initialize user_expertise_index = 5
for every turn:
    if request_type == simple:
        if user_reply == request_clarification:
            user_expertise_index -= 1
        else:
            user_expertise_index += 2
    else if request_type == extended:
        if user_reply == request_clarification:
            user_expertise_index -= 2
        else if user_frustration == true:
            user_expertise_index += 1
        else:
            user_expertise_index -= 1
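The update rule above can be sketched as a small Python function. Clipping the index to the 0-9 range used by the DM state is an assumption, as the slides do not state how out-of-range values are handled:

```python
def update_expertise_index(index, request_type, user_reply, user_frustration):
    """Update the user expertise index after one dialogue turn.

    Follows the update rule above; clipping to the DM state's 0-9
    range is an assumption.
    """
    if request_type == "simple":
        if user_reply == "request_clarification":
            index -= 1   # jargon request was not understood
        else:
            index += 2   # user handled the jargon request
    elif request_type == "extended":
        if user_reply == "request_clarification":
            index -= 2   # even the extended request was unclear
        elif user_frustration:
            index += 1   # likely an expert annoyed by over-explanation
        else:
            index -= 1   # extended request was appropriate for this user
    return max(0, min(9, index))
```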
DM Action set and State
Actions:
- Greet
- Simple_Request_Info
- Extended_Request_Info
- Simple_Request_Action
- Extended_Request_Action
- Close_Dialogue
DM State:
- More_slot_to_ask (0/1)
- Solution_found (0/1)
- User_expertise_index (0-9)
- User_said_dont_know (0/1)
- Modem_powerlight_filled (0/1)
- Adsllight_filled (0/1)
- Modem_filter_filled (0/1)
- Phone_filter_filled (0/1)
- Phoneline_working_filled (0/1)
Environment simulation
Simulates a broadband environment Se: a modem, ADSL filters, a computer, a phone line, a telephone, and the connections between them.
User interaction:
- Observation act Ou: observe facts from the environment state Se.
- Manipulation act Mu: manipulate the environment state Se.
- The user's observations of the environment are stored in OEu.
User simulation
Initial domain knowledge for each user (probability that the user can answer a simple request for each slot value):

Slot                Novice   Intermediate   Expert
Phone_line          0.5      0.85           0.99
Adsl filter         0.2      0.6            0.8
Modem power light   0.5      0.8            0.95
Adsl light          0.2      0.6            0.9
Change USB          0.3      0.55           0.8

A spectrum of users from novice to expert can be simulated. The user's domain knowledge of a concept c can be updated by an extended request:
P(DKu,c,t+1) = P(DKu,c,t) + boost(c), with boost(c) = 0.5.
These probabilities would be estimated from a suitable data set (e.g. from data collected by broadband service providers).
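The knowledge-boost update can be sketched in Python. Capping the probability at 1.0 is an assumption (probabilities cannot exceed 1); the slot keys are shorthand for the table's slot names:

```python
def boost_domain_knowledge(dk, concept, boost=0.5):
    """After an extended request about `concept`, raise the probability
    that the user can answer simple requests about it:
    P(DK_u,c,t+1) = P(DK_u,c,t) + boost(c).
    Capping at 1.0 is an assumption."""
    dk[concept] = min(1.0, dk[concept] + boost)
    return dk

# Novice profile drawn from the table above (shorthand slot names):
novice = {"phone_line": 0.5, "adsl_filter": 0.2, "adsl_light": 0.2}
boost_domain_knowledge(novice, "adsl_light")   # 0.2 -> 0.7
```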
User frustration
When do users get frustrated?
- When the DM explains very well-known concepts.
- When the DM fails to explain unknown concepts.
Frustration leads to call hang-up (after the 6th turn). The Frustration Index FIu stores the user's overall frustration during the dialogue. Frustration Fu,t is also conveyed to the DM along with the user action, based on the assumption that emotions can be detected from prosodic features (75% average accuracy) and linguistic & pragmatic features (84.24% accuracy) (Ang et al. 2002, Lee et al. 2002, Lee & Lee 2007).
User State and Action
User state: Su = (DKu, OEu, FIu, T)
- DKu: domain knowledge
- OEu: observation of the environment
- FIu: frustration index
- T: number of dialogue turns
Action set: Provide_Info(C), Acknowledge, Request_Clarification(C), Hang-up
Action selection
[Flowchart: user action selection given system act As]
- If As is an extended request, update DKu and FIu.
- If As is a repair request, manipulate the environment (Mu, updating Se); otherwise observe the environment (Ou). In both cases, update OEu.
- If the user knows the concept, respond with Au = Provide_Answer (or Au = Acknowledge for repair requests); otherwise update FIu and respond with Au = Request_Clarification.
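The flowchart above can be sketched as a simplified, helper-free Python function. The function name, the injectable `rng` argument, and the omission of the environment updates (Ou/Mu/OEu) are all simplifying assumptions for illustration:

```python
import random

def user_turn(dk, fi, concept, extended, is_repair, rng=random.random):
    """Sketch of the user simulator's action selection.
    dk: dict mapping concept -> P(user knows it); fi: frustration index.
    Returns (user_act, fi). Environment updates are omitted."""
    if extended:
        if rng() < dk[concept]:
            fi += 1                                  # well-known concept over-explained
        dk[concept] = min(1.0, dk[concept] + 0.5)    # user learns from the explanation
    if rng() < dk[concept]:
        act = "Acknowledge" if is_repair else "Provide_Info"
        return act, fi
    fi += 1                                          # unknown concept, not explained
    return "Request_Clarification", fi
```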
Reward function
Reward every dialogue upon completion, whether successful or a hang-up. Penalize:
- longer dialogues,
- unnecessary extensions,
- dialogue moves leading to user frustration.
Task completion reward TCR = 500 (-500 for hang-ups)
Turn cost TC = 10.0
Extended turn cost EC = 30.0
Total turn cost TTC = TC * #(T)
Total extension cost TEC = EC * #(ET)
Final Reward = TCR - TTC - TEC
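The reward computation above is straightforward to express in Python:

```python
def final_reward(turns, extended_turns, hang_up):
    """Final reward as defined above: TCR - TTC - TEC."""
    TCR = -500 if hang_up else 500   # task completion reward
    TC, EC = 10.0, 30.0              # per-turn and per-extension costs
    return TCR - TC * turns - EC * extended_turns

# A successful 10-turn dialogue that used 2 extended requests:
final_reward(10, 2, hang_up=False)   # 500 - 100 - 60 = 340.0
```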
Training
15000 cycles, SARSA Reinforcement Learning algorithm. The user simulation produced expert and novice behaviours equally; users behave stochastically.
Learned strategy:
- Learned to use the user_expertise_index.
- Extended acts if the user is estimated to be a novice; simple acts if the user is estimated to be an expert.
- Simple acts for well-known concepts (e.g. "Is the internet working?").
- Avoids extra turns by not using simple acts for novices.
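A minimal sketch of the SARSA update used in training. The learning rate, discount factor, and epsilon-greedy exploration are illustrative assumptions; the slides do not specify these hyperparameters, and the state/action strings here are placeholders:

```python
import random
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """One on-policy SARSA step:
    Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)).
    alpha and gamma are illustrative values."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])

def epsilon_greedy(Q, s, actions, eps=0.1):
    """Exploration policy typically paired with SARSA."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

Q = defaultdict(float)
sarsa_update(Q, s="novice_state", a="Extended_Request_Info", r=1.0,
             s_next="next_state", a_next="Simple_Request_Info")
```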
[Training curves: dialogue length and user frustration.]
Evaluation
The performance of the learned policy was compared with three hand-coded policies:
- Expert Only: treats all users as experts; issues only simple request acts.
- Novice Only: treats all users as novices; issues only extended request acts.
- Adaptive: if user_expertise_index > 5, issues simple request acts, extended requests otherwise.
The four policies were tested on three user groups (experts, novices and intermediates; approx. 800 dialogues each). The learned policy performed better than the other policies in most cases, and differences in final reward were statistically significant (Wilcoxon Signed Rank test) at p < 0.001.
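The hand-coded Adaptive baseline is a simple threshold rule over the expertise index; a sketch, with act names following the DM action set:

```python
def adaptive_policy(user_expertise_index, request):
    """The hand-coded Adaptive baseline: simple requests when the
    expertise index exceeds 5, extended requests otherwise.
    `request` is "Info" or "Action", matching the DM action names."""
    if user_expertise_index > 5:
        return "Simple_Request_" + request
    return "Extended_Request_" + request
```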
Results

Task Success Rate:

Policy     Exp    Int    Nov    Mix
Exp Only   97.7   87.9   19.5   65.1
Nov Only   90.8   92.8   92.6   92.1
Adaptive   96.9   85.2   88.3   90.1
Learned    97.6   88.2   86.2   90.6

Average final reward:

Policy     Exp     Int     Nov      Mix
Exp Only   393.5   177.5   -430.2   46.9
Nov Only   206.6   230.7   210.1    215.8
Adaptive   379.6   221.7   177.2    259.4
Learned    385.9   252.6   175.2    271.2
Sample dialogue (novice user)
System >> Welcome. What can I do for you?
Novice User >> Hi. My Internet is not working.
Sys >> Ok. Is your modem power-light on?
NU >> Yes, it is on.
Sys >> Is the ADSL light on?
NU >> What do you mean by ADSL light?
Sys >> Ok. It is the light right next to the modem power-light. Do you see it?
NU >> Yes.
Sys >> Is it on, off or flashing?
NU >> It's flashing.
Sys >> Ok. Your modem will be connected to the phone-line outlet. Can you see that?
NU >> Yes.
Sys >> Do you have a phone at home that is connected to that line?
NU >> Yes.
Sys >> Is it working?
NU >> Yes. The phone line is working.
.....
Sample dialogue (expert user)
System >> Welcome. What can I do for you?
Expert User >> Hi. My Internet connection is not working.
Sys >> Ok. Is your modem power-light on?
EU >> Yes. It is on.
Sys >> Ok. Is your ADSL light on?
EU >> No. It is flashing.
Sys >> Ok. Is your phone-line working?
EU >> Yes. It is working.
Sys >> Is your phone connected to the line through an ADSL filter?
EU >> No. There is no filter.
Sys >> Ok. That is the problem. Please fit a filter between your phone and the phone-line. Spare filters are available in the package you received from us.
EU >> Ok. I am fitting the filter.
Sys >> Is your connection working now?
EU >> Yes.
Sys >> Thank you for calling.
EU >> Thank you.
Conclusion
Adapting to different users (based on their domain knowledge) was studied in a troubleshooting setup. The user simulation incorporates new features: domain knowledge, environment simulation, and user frustration. Reinforcement Learning can be used to learn optimal knowledge-alignment policies that adapt to users online. The learned policy performs significantly better than the hand-coded policies on average (i.e. when we don't know what kind of user we will encounter).
Future work
- How to learn parameters for user simulation from dialogue corpora?
- Evaluation of user simulations against real-user corpora.
- Test policies on real users.
- Extend this user simulation framework to be sensitive to NLG decisions (CLASSiC project): Can the DM learn which referring expressions to use while adapting to the user? Can the DM learn when to lexically align with the user and when not to?
☺ THANKS ☺
REFERENCES
J. Ang, R. Dhillon, A. Krupski, E. Shriberg and A. Stolcke (2002). Prosody-Based Automatic Detection of Annoyance and Frustration in Human-Computer Dialog. In Proc. ICSLP-2002, Denver, Colorado, USA.
J. Boye (2007). Dialogue management for automatic troubleshooting and other problem-solving applications. In: Tim Paek and Harry Bunt (eds), 8th SIGDial Workshop on Discourse and Dialogue, Antwerp, Belgium, September 2007.
H. Cuayáhuitl, S. Renals, O. Lemon, and H. Shimodaira (2005). Human-Computer Dialogue Simulation Using Hidden Markov Models. In Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), San Juan, Puerto Rico, Nov 2005.
W. Eckert, E. Levin and R. Pieraccini. 1997. User Modelling for Spoken Dialogue System Evaluation. In Proc. of ASRU'97, pages 80-87.
K. Georgila, J. Henderson, and O. Lemon (2005). Learning user simulations for Information State Update Dialogue Systems. In Eurospeech-Interspeech '05.
E. A. Isaacs, & H. H. Clark, (1987). References in conversations between experts and novices. Journal of Experimental Psychology: General , 116, 26-37.
C. M. Lee, S. Narayanan, R. Pieraccini (2002). Classifying emotions in human-machine spoken dialogs. In Proc. of ICME, Lausanne, Switzerland, 2002.
C. Lee, G. Lee (2007). Emotion Recognition for Affective User Interfaces using Natural Language Dialogs. In Proc. of the 16th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2007).
E. Levin, R. Pieraccini, and W. Eckert (1997). Learning dialogue strategies within the Markov decision process framework. In Proc. ASRU'97, December 1997.
O. Pietquin and T. Dutoit, “A probabilistic framework for dialog simulation and optimal strategy learning,” IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 2, pp. 589–599, March 2006.
J. Schatzmann, B. Thomson, K. Weilhammer, H. Ye, and S. Young. (2007a) "Agenda-based User Simulation for Bootstrapping a POMDP Dialogue System", HLT/NAACL, Rochester, NY, April 23-25, 2007.
J. Schatzmann, B. Thomson and S. Young. (2007b) "Statistical User Simulation with a Hidden Agenda", 8th SIGDial Workshop on Discourse and Dialogue, Antwerp, Belgium, September 2-3, 2007.
J. Schatzmann, K. Georgila, and S. Young. (2005) "Quantitative Evaluation of User Simulation Techniques for Spoken Dialogue Systems". 6th SIGdial Workshop on Discourse and Dialogue, Lisbon, September 2-3, 2005.
K. Scheffler and S. Young, “Corpus-based dialogue simulation for automatic strategy learning and evaluation,” in Proc. NAACL Workshop on Adaptation in Dialogue Systems, 2001.
R.S. Sutton and A.G. Barto, Reinforcement Learning : An Introduction, MIT Press, ISBN : 0-262-19398-1, 1998.
J. Williams, S. Young (2007) Partially Observable Markov Decision Process for Spoken Dialogue Systems, Computer Speech and Language 21 (2007) 393–422
J. Williams. 2007. Applying POMDPs to Dialog Systems in the Troubleshooting Domain. Proc HLT/NAACL Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technology, Rochester, NY, USA.
ENVIRONMENT SIMULATION
% Observe environment - Is there a phone filter?
pact(adsl_filter_for_phone, present) :-
    equipment(P, phone, _),
    equipment(A, adsl_filter, _),
    connected(A, P, rj11, firm).

% Environment setting - Set faulty phone filter
set_env1(faulty_phone_filter) :-
    assert(equipment(comp1, desktop, working)),
    assert(equipment(t1, phone, working)),
    assert(equipment(m1, modem, working)),
    assert(equipment(a1, adsl_filter, working)),
    assert(equipment(a2, adsl_filter, not_working)),
    assert(connected(phone_socket, a1, rj11, firm)),
    assert(connected(phone_socket, a2, rj11, firm)),
    assert(connected(a1, m1, rj11, firm)),
    assert(connected(a2, t1, rj11, firm)),
    assert(connected(m1, comp1, usb, working)),
    assert(phone_line(live)),
    assert(modem_software(installed, working)),
    assert(authentication(correct)), !.

% Manipulate environment - Replace the phone filter
mact(replace_phone_filter) :-
    equipment(a2, adsl_filter, not_working), !,
    retract(connected(phone_socket, a2, rj11, firm)),
    retract(connected(a2, t1, rj11, _)),
    assert(equipment(a4, adsl_filter, working)),
    assert(connected(phone_socket, a4, rj11, firm)),
    assert(connected(a4, t1, rj11, firm)),
    update_env, !.