SLT 2018 Special Session - Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems
Schedule: Dec. 18, 2018, 1PM - 5PM, Olympia Plenary Room
- 1:00 - 1:10PM Jianfeng Gao: opening
- 1:10 – 1:40PM Gokhan Tur (Uber): “Past, Present, and Future of Conversational AI” (slides)
- 1:40 – 2:10PM Minlie Huang (Tsinghua): “Towards Building More Intelligent Conversational System: Semantics, Consistency & Interactiveness” (slides)
- 2:10 – 2:40PM Vivian Chen (NTU): “Towards Open-Domain Conversational AI” (slides)
- 2:40 – 3:00PM break
- 3:00 – 3:20PM Sungjin Lee (MSR): “MS dialogue challenge: result and outlook” (slides)
- 3:20 – 3:35PM Oral presentation 1 by Sihong Liu - “Universe Model: A Human-like User Simulator Based on Dialogue Context” (slides)
- 3:35 – 3:50PM Oral presentation 2 by Yu-An Wang - “Double dueling Agent for Dialogue Policy Learning” (slides)
- 3:50 – 4:30PM Panel discussion (chaired by Jianfeng Gao). Panelists:
- Alex Acero (Apple)
- Vivian Chen (NTU)
- Minlie Huang (Tsinghua)
- Sungjin Lee (MSR)
- Spyros Matsoukas (Amazon)
- Gokhan Tur (Uber)
News
- 12/18/2018 – 12/21/2018: SLT Workshop
- 11/25/2018: Paper acceptance announcement.
- 11/09/2018: Paper submission. Call for Papers.
- 11/08/2018: Results (including human evaluation) announced.
- 10/25/2018: System submission (https://msrprograms.cloudapp.net/MDC2018)
- 08/03/2018: Movie domain is up; see cmd.md for instructions.
- 07/28/2018: Restaurant and Taxi domains: data and simulators are up; see cmd.md for instructions.
- 07/16/2018: Registration is now open.
- 07/06/2018: Task description is up.
Task
This special session introduces a Dialogue Challenge for building end-to-end task-completion dialogue systems, with the goal of encouraging the dialogue research community to collaborate and to benchmark on standard datasets and a unified experimental environment. In this special session, we will release human-annotated conversational data in three domains (movie-ticket booking, restaurant reservation, and taxi booking), as well as an experiment platform with built-in simulators in each domain, for training and evaluation purposes. The final submitted systems will be evaluated both in a simulated setting and by human judges.
Please check this description for more details about the task.
Data
In this dialogue challenge, we will release well-annotated datasets for three task-completion domains: movie-ticket booking, restaurant reservation, and taxi ordering. The table below summarizes the statistics of the three datasets.
| Task | Intents | Slots | Dialogues |
|---|---|---|---|
| Movie-Ticket Booking | 11 | 29 | 2890 |
| Restaurant Reservation | 11 | 30 | 4103 |
| Taxi Ordering | 11 | 29 | 3094 |
Evaluation
As described in the task description (Section 4), we will evaluate the dialogue systems using both automatic and human evaluations on three criteria.
- Success Rate: the fraction of dialogues that finish successfully.
- Average Turns: the average length of a dialogue.
- Average Reward: the average reward received over a dialogue.

There is a strong correlation among the three metrics: generally speaking, a good policy should have a high success rate, a high average reward, and a low number of average turns. We choose success rate as our major evaluation metric.
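The three automatic metrics above can be computed directly from dialogue logs. Below is a minimal sketch; the `Episode` fields (`success`, `turns`, `reward`) are illustrative assumptions, not the challenge platform's actual log format.

```python
# Sketch: computing success rate, average turns, and average reward
# over a set of finished dialogue episodes. The Episode record is a
# hypothetical stand-in for the platform's real dialogue logs.
from dataclasses import dataclass

@dataclass
class Episode:
    success: bool   # did the dialogue achieve the user's goal?
    turns: int      # number of turns in the dialogue
    reward: float   # cumulative reward received by the agent

def evaluate(episodes):
    n = len(episodes)
    return {
        "success_rate": sum(e.success for e in episodes) / n,
        "average_turns": sum(e.turns for e in episodes) / n,
        "average_reward": sum(e.reward for e in episodes) / n,
    }

episodes = [Episode(True, 12, 40.0), Episode(False, 40, -30.0), Episode(True, 10, 44.0)]
print(evaluate(episodes))
```

Note how the metrics move together: the failed 40-turn episode drags down the success rate and average reward while pushing up the average turns.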
We will also conduct human evaluation for the competition: human judges will be asked to interact with the final systems submitted by participants. In addition to the measurements above, at the end of each dialogue session each judge will give a rating on a scale of 1 to 5 based on the naturalness, coherence, and task-completion capability of the system.
Baseline Agents
- A rule-based agent is provided.
- A standard RL agent (DQN model) is provided.
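To illustrate what a rule-based agent does, here is a minimal sketch: request the required slots one by one, then close the dialogue with a completion action. The slot names and the act/state format are hypothetical; the real agent interface is defined in the released experiment platform code.

```python
# Hypothetical sketch of a rule-based movie-ticket-booking agent.
# The agent requests each missing required slot in turn, then signals
# task completion once every slot has a confirmed value.
REQUIRED_SLOTS = ["moviename", "date", "starttime", "numberofpeople"]

class RuleAgent:
    def next_action(self, state):
        """state: dict mapping slot name -> confirmed value."""
        for slot in REQUIRED_SLOTS:
            if slot not in state:
                # Ask the user for the first slot we still need.
                return {"diaact": "request", "request_slots": [slot]}
        # All required slots filled: announce task completion.
        return {"diaact": "inform", "inform_slots": {"taskcomplete": True}}

agent = RuleAgent()
print(agent.next_action({"moviename": "Avengers"}))
# -> {'diaact': 'request', 'request_slots': ['date']}
```

A handcrafted policy like this gives a deterministic baseline to compare against the DQN agent, which instead learns its action choices from simulated dialogues.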
System Submission Guidelines
Open an account at https://msrprograms.cloudapp.net/MDC2018 and create a submission containing an abstract, your code as a zip file (<100 MB), the trained agent model, and, if applicable, the NLU and NLG models. Include instructions for execution, as below. Submissions may be updated any number of times up to 10/14/2018 11:59 PM PST.
Instructions to run the sample submission in the SubmissionSample folder:
- Extract the run.zip file (zip the contents of system/src into run.zip).
- Run testrun.py to interact with the agent, as in the example below:
python testrun.py --agt 0 --usr 1 --max_turn 40 --kb_path ./run/deep_dialog/data_movie/movie.kb.1k.v1.p --goal_file_path ./run/deep_dialog/data_movie/user_goals_first.v2.p --slot_set ./run/deep_dialog/data_movie/slot_set.txt --act_set ./run/deep_dialog/data_movie/dia_acts.txt --dict_path ./run/deep_dialog/data_movie/slot_dict.v1.p --nlg_model_path ./run/deep_dialog/models/nlg/movie/lstm_tanh_[1533529279.91]87_99_199_0.988.p --nlu_model_path ./run/deep_dialog/models/nlu/movie/lstm[1533588045.3]_38_38_240_0.998.p --diaact_nl_pairs ./run/deep_dialog/data_movie/dia_act_nl_pairs.v7.json --intent_err_prob 0.00 --slot_err_prob 0.00 --episodes 500 --act_level 0 --run_mode 0 --cmd_input_mode 0
Organizers
Reference
If you submit any system to this challenge or publish any other work making use of the resources provided on this project, we ask you to cite the following task description papers:
@article{li2018microsoft,
title={Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems},
author={Li, Xiujun and Panda, Sarah and Liu, Jingjing and Gao, Jianfeng},
journal={arXiv preprint arXiv:1807.11125},
year={2018}
}
@article{li2016user,
title={A User Simulator for Task-Completion Dialogues},
author={Li, Xiujun and Lipton, Zachary C and Dhingra, Bhuwan and Li, Lihong and Gao, Jianfeng and Chen, Yun-Nung},
journal={arXiv preprint arXiv:1612.05688},
year={2016}
}
Contact
- For questions specific to the challenge, you can contact us at xiul@microsoft.com.
FAQ
- How to implement an agent: see here