User Simulation for Online Adaptation and Knowledge Alignment in Troubleshooting Dialogue Systems
Srinivasan Janarthanam, Oliver Lemon
School of Informatics, University of Edinburgh
S. Janarthanam & O. Lemon
Outline
- Motivation
- Related work
- Experimental setup
- Training, Evaluation and Results
- Conclusion
Motivation
Current dialogue systems interact with a homogeneous user population in domains like Town-Info, travel planning, etc., and current user simulation models do not simulate diverse groups of users. In reality, users differ from one another in their domain expertise (e.g. some users do not understand jargon, some have misconceptions, etc.).
Troubleshooting domain: the dialogue system helps users diagnose and repair their internet connection. Users have different levels of domain knowledge (e.g. novice, intermediate and expert). The Dialogue Manager must not only know "what" to ask next, but also "how" to ask the user.
How can we learn a dialogue strategy that handles all these different users by adapting to their domain expertise?
In a nutshell
Objectives:
- Simulate users in a broadband troubleshooting scenario: diverse groups of users based on their domain knowledge, with environment-sensitive user simulation.
- Learn an optimal knowledge-alignment dialogue policy that handles users whose domain expertise we don't know, avoids user frustration during the dialogue (hang-ups), and keeps dialogues short.
Steps:
- Model troubleshooting dialogue as a Markov Decision Process.
- Learn a knowledge-alignment policy using Reinforcement Learning.
- Evaluate the learned policy against hand-coded policies.
Related work
Knowledge alignment
Isaacs & Clark (1987): users learn about each other's knowledge levels and adapt quickly in a conversation.
Larsson (2007): a formal account of how the meanings of NL expressions change during dialogue.
Troubleshooting systems
Boye (2007): hand-coded rules governing "what" and "how" to ask.
Williams (2007): a POMDP system that learns "what" to ask next in the face of uncertainty about the ASR input and the environment state.
User simulations
(Eckert et al. 1997, 1998; Scheffler and Young 1999; Levin et al. 2000; Georgila et al. 2005; Cuayáhuitl et al. 2005): these focus on how an average user will respond. They do not model the user's domain knowledge, frustration, or the environment.
Setup
[Architecture diagram: the User converses with the Dialogue System (Dialogue Manager/NL Generator), observes and manipulates the Environment, and the system follows a troubleshooting decision tree.]
Trade-off: request types suited to different users vs. dialogue length.
A Reinforcement Learning (Sutton & Barto 1998) agent that interacts with the user at the dialogue-act level. Diagnosis and repair instructions ("what" to ask) are driven by a hand-coded decision tree (troubleshooting script). The learning task is to decide between giving simple vs. extended requests ("how" to ask).
Simple request (e.g. "Is the ADSL light on?") (costs 1 turn).
Extended request (e.g. "I need to know the status of the ADSL light. It is right next to the modem power-light. Is it on, off or flashing?") (costs 3 turns).
Decision tree
[Figure: the hand-coded troubleshooting decision tree.]
User Expertise Index
Estimates the user's expertise based on request types, user responses and frustration.
initialize user_expertise_index = 5
for every turn:
    if request_type == simple:
        if user_reply == request_clarification:
            user_expertise_index -= 1
        else:
            user_expertise_index += 2
    else if request_type == extended:
        if user_reply == request_clarification:
            user_expertise_index -= 2
        else if user_frustration == true:
            user_expertise_index += 1
        else:
            user_expertise_index -= 1
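The update rule above can be sketched as a small Python function. Clipping the index to the 0-9 range used by the DM state is an assumption, as the slides do not state how out-of-range values are handled:

```python
def update_expertise_index(index, request_type, user_reply, user_frustration):
    """Update the user expertise index after one dialogue turn.

    Follows the update rule above; clipping to the DM state's 0-9
    range is an assumption.
    """
    if request_type == "simple":
        if user_reply == "request_clarification":
            index -= 1   # jargon request was not understood
        else:
            index += 2   # user handled the jargon request
    elif request_type == "extended":
        if user_reply == "request_clarification":
            index -= 2   # even the extended request was unclear
        elif user_frustration:
            index += 1   # likely an expert annoyed by over-explanation
        else:
            index -= 1   # extended request was appropriate for this user
    return max(0, min(9, index))
```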
DM Action set and State
Actions:
- Greet
- Simple_Request_Info
- Extended_Request_Info
- Simple_Request_Action
- Extended_Request_Action
- Close_Dialogue
DM State:
- More_slot_to_ask (0/1)
- Solution_found (0/1)
- User_expertise_index (0-9)
- User_said_dont_know (0/1)
- Modem_powerlight_filled (0/1)
- Adsllight_filled (0/1)
- Modem_filter_filled (0/1)
- Phone_filter_filled (0/1)
- Phoneline_working_filled (0/1)
Environment simulation
Simulates a broadband environment Se: a modem, ADSL filters, a computer, a phone line, a telephone, and the connections between them.
User interaction:
- Observation act Ou: observe facts from the environment state Se.
- Manipulation act Mu: manipulate the environment state Se.
- The user's observations of the environment are stored in OEu.
User simulation
Initial domain knowledge for each user (probability that the user can answer a simple request for each slot value):

Slot                Novice   Intermediate   Expert
Phone_line          0.5      0.85           0.99
Adsl filter         0.2      0.6            0.8
Modem power light   0.5      0.8            0.95
Adsl light          0.2      0.6            0.9
Change USB          0.3      0.55           0.8

A spectrum of users from novice to expert can be simulated. The user's domain knowledge of a concept c can be updated by an extended request:
P(DKu,c,t+1) = P(DKu,c,t) + boost(c), with boost(c) = 0.5.
These probabilities would be estimated from a suitable data set (e.g. from data collected by broadband service providers).
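The knowledge-boost update can be sketched in Python. Capping the probability at 1.0 is an assumption (probabilities cannot exceed 1); the slot keys are shorthand for the table's slot names:

```python
def boost_domain_knowledge(dk, concept, boost=0.5):
    """After an extended request about `concept`, raise the probability
    that the user can answer simple requests about it:
    P(DK_u,c,t+1) = P(DK_u,c,t) + boost(c).
    Capping at 1.0 is an assumption."""
    dk[concept] = min(1.0, dk[concept] + boost)
    return dk

# Novice profile drawn from the table above (shorthand slot names):
novice = {"phone_line": 0.5, "adsl_filter": 0.2, "adsl_light": 0.2}
boost_domain_knowledge(novice, "adsl_light")   # 0.2 -> 0.7
```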
User frustration
When do users get frustrated?
- When the DM explains very well-known concepts.
- When the DM fails to explain unknown concepts.
Frustration leads to call hang-up (after the 6th turn). The Frustration Index FIu stores the user's overall frustration during the dialogue. Frustration Fu,t is also conveyed to the DM along with the user action, based on the assumption that emotions can be detected from prosodic features (75% average accuracy) and linguistic & pragmatic features (84.24% accuracy) (Ang et al. 2002, Lee et al. 2002, Lee & Lee 2007).
User State and Action
User state: Su = (DKu, OEu, FIu, T)
- DKu: domain knowledge
- OEu: observation of the environment
- FIu: frustration index
- T: number of dialogue turns
Action set: Provide_Info(C), Acknowledge, Request_Clarification(C), Hang-up
Action selection
[Flowchart: user action selection given system act As]
- If As is an extended request, update DKu and FIu.
- If As is a repair request, manipulate the environment (Mu, updating Se); otherwise observe the environment (Ou). In both cases, update OEu.
- If the user knows the concept, respond with Au = Provide_Answer (or Au = Acknowledge for repair requests); otherwise update FIu and respond with Au = Request_Clarification.
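The flowchart above can be sketched as a simplified, helper-free Python function. The function name, the injectable `rng` argument, and the omission of the environment updates (Ou/Mu/OEu) are all simplifying assumptions for illustration:

```python
import random

def user_turn(dk, fi, concept, extended, is_repair, rng=random.random):
    """Sketch of the user simulator's action selection.
    dk: dict mapping concept -> P(user knows it); fi: frustration index.
    Returns (user_act, fi). Environment updates are omitted."""
    if extended:
        if rng() < dk[concept]:
            fi += 1                                  # well-known concept over-explained
        dk[concept] = min(1.0, dk[concept] + 0.5)    # user learns from the explanation
    if rng() < dk[concept]:
        act = "Acknowledge" if is_repair else "Provide_Info"
        return act, fi
    fi += 1                                          # unknown concept, not explained
    return "Request_Clarification", fi
```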
Reward function
Reward every dialogue upon completion, whether successful or a hang-up. Penalize:
- longer dialogues,
- unnecessary extensions,
- dialogue moves leading to user frustration.
Task completion reward TCR = 500 (-500 for hang-ups)
Turn cost TC = 10.0
Extended turn cost EC = 30.0
Total turn cost TTC = TC * #(T)
Total extension cost TEC = EC * #(ET)
Final Reward = TCR - TTC - TEC
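The reward computation above is straightforward to express in Python:

```python
def final_reward(turns, extended_turns, hang_up):
    """Final reward as defined above: TCR - TTC - TEC."""
    TCR = -500 if hang_up else 500   # task completion reward
    TC, EC = 10.0, 30.0              # per-turn and per-extension costs
    return TCR - TC * turns - EC * extended_turns

# A successful 10-turn dialogue that used 2 extended requests:
final_reward(10, 2, hang_up=False)   # 500 - 100 - 60 = 340.0
```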
Training
15000 cycles, SARSA Reinforcement Learning algorithm. The user simulation produced expert and novice behaviours equally; users behave stochastically.
Learned strategy:
- Learned to use the user_expertise_index.
- Extended acts if the user is estimated to be a novice; simple acts if the user is estimated to be an expert.
- Simple acts for well-known concepts (e.g. "Is the internet working?").
- Avoids extra turns by not using simple acts for novices.
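A minimal sketch of the SARSA update used in training. The learning rate, discount factor, and epsilon-greedy exploration are illustrative assumptions; the slides do not specify these hyperparameters, and the state/action strings here are placeholders:

```python
import random
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """One on-policy SARSA step:
    Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)).
    alpha and gamma are illustrative values."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])

def epsilon_greedy(Q, s, actions, eps=0.1):
    """Exploration policy typically paired with SARSA."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

Q = defaultdict(float)
sarsa_update(Q, s="novice_state", a="Extended_Request_Info", r=1.0,
             s_next="next_state", a_next="Simple_Request_Info")
```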
[Training curves: dialogue length and user frustration.]
Evaluation
The performance of the learned policy was compared with three hand-coded policies:
- Expert Only: treats all users as experts; issues only simple request acts.
- Novice Only: treats all users as novices; issues only extended request acts.
- Adaptive: if user_expertise_index > 5, issues simple request acts, extended requests otherwise.
The four policies were tested on three user groups (experts, novices and intermediates; approx. 800 dialogues each). The learned policy performed better than the other policies in most cases, and differences in final reward were statistically significant (Wilcoxon Signed Rank test) at p < 0.001.
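The hand-coded Adaptive baseline is a simple threshold rule over the expertise index; a sketch, with act names following the DM action set:

```python
def adaptive_policy(user_expertise_index, request):
    """The hand-coded Adaptive baseline: simple requests when the
    expertise index exceeds 5, extended requests otherwise.
    `request` is "Info" or "Action", matching the DM action names."""
    if user_expertise_index > 5:
        return "Simple_Request_" + request
    return "Extended_Request_" + request
```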
Results

Task Success Rate:

Policy     Exp    Int    Nov    Mix
Exp Only   97.7   87.9   19.5   65.1
Nov Only   90.8   92.8   92.6   92.1
Adaptive   96.9   85.2   88.3   90.1
Learned    97.6   88.2   86.2   90.6

Average final reward:

Policy     Exp     Int     Nov      Mix
Exp Only   393.5   177.5   -430.2   46.9
Nov Only   206.6   230.7   210.1    215.8
Adaptive   379.6   221.7   177.2    259.4
Learned    385.9   252.6   175.2    271.2
Sample dialogue (novice user)
System >> Welcome. What can I do for you?
Novice User >> Hi. My Internet is not working.
Sys >> Ok. Is your modem power-light on?
NU >> Yes, it is on.
Sys >> Is the ADSL light on?
NU >> What do you mean by ADSL light?
Sys >> Ok. It is the light right next to the modem power-light. Do you see it?
NU >> Yes.
Sys >> Is it on, off or flashing?
NU >> It's flashing.
Sys >> Ok. Your modem will be connected to the phone-line outlet. Can you see that?
NU >> Yes.
Sys >> Do you have a phone at home that is connected to that line?
NU >> Yes.
Sys >> Is it working?
NU >> Yes. The phone line is working.
.....
Sample dialogue (expert user)
System >> Welcome. What can I do for you?
Expert User >> Hi. My Internet connection is not working.
Sys >> Ok. Is your modem power-light on?
EU >> Yes. It is on.
Sys >> Ok. Is your ADSL light on?
EU >> No. It is flashing.
Sys >> Ok. Is your phone-line working?
EU >> Yes. It is working.
Sys >> Is your phone connected to the line through an ADSL filter?
EU >> No. There is no filter.
Sys >> Ok. That is the problem. Please fit a filter between your phone and the phone-line. Spare filters are available in the package you received from us.
EU >> Ok. I am fitting the filter.
Sys >> Is your connection working now?
EU >> Yes.
Sys >> Thank you for calling.
EU >> Thank you.
Conclusion
Adapting to different users (based on their domain knowledge) was studied in a troubleshooting setup. The user simulation incorporates new features: domain knowledge, environment simulation, and user frustration. Reinforcement Learning can be used to learn optimal knowledge-alignment policies that adapt to users online. The learned policy performs significantly better than the hand-coded policies on average (i.e. when we don't know what kind of user we will encounter).
Future work
- How to learn parameters for user simulation from dialogue corpora?
- Evaluation of user simulations against real-user corpora.
- Test policies on real users.
- Extend this user simulation framework to be sensitive to NLG decisions (CLASSiC project): Can the DM learn which referring expressions to use while adapting to the user? Can the DM learn when to lexically align with the user and when not to?
☺ THANKS ☺
REFERENCES
J. Ang, R. Dhillon, A. Krupski, E. Shriberg and A. Stolcke (2002). Prosody-Based Automatic Detection of Annoyance and Frustration in Human-Computer Dialog. In Proc. ICSLP-2002, Denver, Colorado, USA.
J. Boye (2007). Dialogue management for automatic troubleshooting and other problem-solving applications. In: Tim Paek and Harry Bunt (eds), 8th SIGDial Workshop on Discourse and Dialogue, Antwerp, Belgium, September 2007.
H. Cuayáhuitl, S. Renals, O. Lemon, and H. Shimodaira (2005). Human-Computer Dialogue Simulation Using Hidden Markov Models. In Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), San Juan, Puerto Rico, Nov 2005.
W. Eckert, E. Levin and R. Pieraccini. 1997. User Modelling for Spoken Dialogue System Evaluation. In Proc. of ASRU'97, pages 80-87.
K. Georgila, J. Henderson, and O. Lemon (2005). Learning user simulations for Information State Update Dialogue Systems. In Eurospeech-Interspeech '05.
E. A. Isaacs, & H. H. Clark, (1987). References in conversations between experts and novices. Journal of Experimental Psychology: General , 116, 26-37.
C. M. Lee, S. Narayanan, R. Pieraccini (2002). Classifying emotions in human-machine spoken dialogs. In Proc. of ICME, Lausanne, Switzerland, 2002.
C. Lee, G. Lee (2007). Emotion Recognition for Affective User Interfaces using Natural Language Dialogs. In Proc. of the 16th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2007).
E. Levin, R. Pieraccini, and W. Eckert (1997). Learning dialogue strategies within the Markov decision process framework. In Proc. ASRU'97, December 1997.
O. Pietquin and T. Dutoit, “A probabilistic framework for dialog simulation and optimal strategy learning,” IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 2, pp. 589–599, March 2006.
J. Schatzmann, B. Thomson, K. Weilhammer, H. Ye, and S. Young. (2007a) "Agenda-based User Simulation for Bootstrapping a POMDP Dialogue System", HLT/NAACL, Rochester, NY, April 23-25, 2007.
J. Schatzmann, B. Thomson and S. Young. (2007b) "Statistical User Simulation with a Hidden Agenda", 8th SIGDial Workshop on Discourse and Dialogue, Antwerp, Belgium, September 2-3, 2007.
J. Schatzmann, K. Georgila, and S. Young. (2005) "Quantitative Evaluation of User Simulation Techniques for Spoken Dialogue Systems". 6th SIGdial Workshop on Discourse and Dialogue, Lisbon, September 2-3, 2005.
K. Scheffler and S. Young, “Corpus-based dialogue simulation for automatic strategy learning and evaluation,” in Proc. NAACL Workshop on Adaptation in Dialogue Systems, 2001.
R.S. Sutton and A.G. Barto, Reinforcement Learning : An Introduction, MIT Press, ISBN : 0-262-19398-1, 1998.
J. Williams, S. Young (2007) Partially Observable Markov Decision Process for Spoken Dialogue Systems, Computer Speech and Language 21 (2007) 393–422
J. Williams. 2007. Applying POMDPs to Dialog Systems in the Troubleshooting Domain. Proc HLT/NAACL Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technology, Rochester, NY, USA.
ENVIRONMENT SIMULATION
% Observe environment - Is there a phone filter?
pact(adsl_filter_for_phone, present) :-
    equipment(P, phone, _),
    equipment(A, adsl_filter, _),
    connected(A, P, rj11, firm).

% Environment setting - Set faulty phone filter
set_env1(faulty_phone_filter) :-
    assert(equipment(comp1, desktop, working)),
    assert(equipment(t1, phone, working)),
    assert(equipment(m1, modem, working)),
    assert(equipment(a1, adsl_filter, working)),
    assert(equipment(a2, adsl_filter, not_working)),
    assert(connected(phone_socket, a1, rj11, firm)),
    assert(connected(phone_socket, a2, rj11, firm)),
    assert(connected(a1, m1, rj11, firm)),
    assert(connected(a2, t1, rj11, firm)),
    assert(connected(m1, comp1, usb, working)),
    assert(phone_line(live)),
    assert(modem_software(installed, working)),
    assert(authentication(correct)), !.

% Manipulate environment - Replace the phone filter
mact(replace_phone_filter) :-
    equipment(a2, adsl_filter, not_working), !,
    retract(connected(phone_socket, a2, rj11, firm)),
    retract(connected(a2, t1, rj11, _)),
    assert(equipment(a4, adsl_filter, working)),
    assert(connected(phone_socket, a4, rj11, firm)),
    assert(connected(a4, t1, rj11, firm)),
    update_env, !.