SLT 2018 Special Session - Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems

Schedule: Dec. 18, 2018, 1PM - 5PM, Olympia Plenary Room

  1. 1:00 – 1:10PM Jianfeng Gao: Opening
  2. 1:10 – 1:40PM Gokhan Tur (Uber): “Past, Present, and Future of Conversational AI” (slides)
  3. 1:40 – 2:10PM Minlie Huang (Tsinghua): “Towards Building More Intelligent Conversational System: Semantics, Consistency & Interactiveness” (slides)
  4. 2:10 – 2:40PM Vivian Chen (NTU): “Towards Open-Domain Conversational AI” (slides)
  5. 2:40 – 3:00PM break
  6. 3:00 – 3:20PM Sungjin Lee (MSR): “MS dialogue challenge: result and outlook” (slides)
  7. 3:20 – 3:35PM Oral presentation 1 by Sihong Liu - “Universe Model: A Human-like User Simulator Based on Dialogue Context” (slides)
  8. 3:35 – 3:50PM Oral presentation 2 by Yu-An Wang - “Double dueling Agent for Dialogue Policy Learning” (slides)
  9. 3:50 – 4:30PM Panel discussion (chaired by Jianfeng Gao), 45 mins. Panelists:
    • Alex Acero (Apple)
    • Vivian Chen (NTU)
    • Minlie Huang (Tsinghua)
    • Sungjin Lee (MSR)
    • Spyros Matsoukas (Amazon)
    • Gokhan Tur (Uber)



This special session introduces a Dialogue Challenge for building end-to-end task-completion dialogue systems, with the goal of encouraging the dialogue research community to collaborate and benchmark on standard datasets and a unified experimental environment. In this special session, we will release human-annotated conversational data in three domains (movie-ticket booking, restaurant reservation, and taxi booking), as well as an experiment platform with built-in simulators for each domain, for training and evaluation purposes. The final submitted systems will be evaluated both in a simulated setting and by human judges.

Please check this description for more details about the task.


In this dialogue challenge, we will release well-annotated datasets for three task-completion domains: movie-ticket booking, restaurant reservation, and taxi ordering. The statistics of the three datasets are shown below.

Task                     Intents   Slots   Dialogues
Movie-Ticket Booking     11        29      2890
Restaurant Reservation   11        30      4103
Taxi Ordering            11        29      3094


As described in the task description (Section 4), we will evaluate the dialogue systems using both automatic and human evaluations on three criteria.
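The automatic part of the evaluation can be pictured with a short sketch. It assumes the three criteria are success rate, average reward, and average number of turns, as listed in the task description paper. The `Episode` record, its field names, and the sample numbers are hypothetical and are not part of the released platform.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    """One finished dialogue (hypothetical record, not the platform's format)."""
    success: bool   # whether the user's task was completed
    reward: float   # cumulative reward collected over the dialogue
    turns: int      # number of turns the dialogue lasted


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    """Average the three assumed criteria over a list of episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    return {
        "success_rate": sum(e.success for e in episodes) / n,
        "avg_reward": sum(e.reward for e in episodes) / n,
        "avg_turns": sum(e.turns for e in episodes) / n,
    }


if __name__ == "__main__":
    # Made-up episodes, for illustration only.
    demo = [
        Episode(True, 60.0, 12),
        Episode(False, -40.0, 40),
        Episode(True, 70.0, 10),
        Episode(True, 50.0, 18),
    ]
    print(summarize(demo))
```

The same summary could be computed over simulated dialogues or over sessions with human judges, provided each session is logged with these three fields.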

We will also conduct a human evaluation for the competition, in which human judges interact with the final systems submitted by participants. In addition to the measurements mentioned above, each judge will rate the system at the end of each dialogue session on a scale of 1 to 5, based on its naturalness, coherence, and task-completion capability.

Baseline Agents

System Submission Guidelines

Open an account in and create a submission with an abstract, code in the form of a zip file (<100MB), the trained agent model, and, if applicable, the NLU and NLG models. Include instructions for execution, following the example below. Submissions can be updated any number of times until 10/14/2018 11:59 PM PST.
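As a convenience, the packaging step could be scripted along the following lines. This is only a sketch: the function name, the directory layout, and the reading of "<100MB" as 100 * 1024 * 1024 bytes are assumptions, and the challenge does not prescribe any particular packaging tool.

```python
import os
import tempfile
import zipfile

# Assumed interpretation of the "<100MB" limit stated in the guidelines.
MAX_BYTES = 100 * 1024 * 1024


def make_submission_zip(src_dir: str, zip_path: str) -> int:
    """Zip every file under src_dir and return the archive size in bytes.

    Raises ValueError if the archive reaches the assumed size limit.
    """
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # Skip the archive itself if it lives inside src_dir.
                if os.path.abspath(full) == zip_abs:
                    continue
                # Store paths relative to src_dir so the archive unpacks cleanly.
                zf.write(full, os.path.relpath(full, src_dir))
    size = os.path.getsize(zip_path)
    if size >= MAX_BYTES:
        raise ValueError(f"archive is {size} bytes, limit is {MAX_BYTES}")
    return size


if __name__ == "__main__":
    # Self-contained demo on a throwaway directory (hypothetical contents).
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.mkdir(src)
        with open(os.path.join(src, "README.txt"), "w") as fh:
            fh.write("How to run the agent\n")
        out = os.path.join(tmp, "submission.zip")
        print(make_submission_zip(src, out), "bytes")
```

Checking the size locally before uploading avoids discovering the limit only at submission time.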

Instructions to run the sample submission in the SubmissionSample folder.

  1. Extract file (Zip the contents of system/src into
  2. Run to interact with the agent, as in the example below.

    python --agt 0 --usr 1 --max_turn 40 \
        --kb_path ./run/deep_dialog/data_movie/movie.kb.1k.v1.p \
        --goal_file_path ./run/deep_dialog/data_movie/user_goals_first.v2.p \
        --slot_set ./run/deep_dialog/data_movie/slot_set.txt \
        --act_set ./run/deep_dialog/data_movie/dia_acts.txt \
        --dict_path ./run/deep_dialog/data_movie/slot_dict.v1.p \
        --nlg_model_path ./run/deep_dialog/models/nlg/movie/lstm_tanh_[1533529279.91]87_99_199_0.988.p \
        --nlu_model_path ./run/deep_dialog/models/nlu/movie/lstm[1533588045.3]_38_38_240_0.998.p \
        --diaact_nl_pairs ./run/deep_dialog/data_movie/dia_act_nl_pairs.v7.json \
        --intent_err_prob 0.00 --slot_err_prob 0.00 \
        --episodes 500 --act_level 0 --run_mode 0 --cmd_input_mode 0
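When the same run has to be repeated with different settings (for example, other error probabilities), it can help to assemble the command line from a dictionary. The sketch below is one way to do that. The helper name `build_command` and the script name `your_entry_script.py` are hypothetical placeholders; only the option names are taken from the example above.

```python
import shlex
from typing import Dict, List


def build_command(script: str, options: Dict[str, object]) -> List[str]:
    """Return an argv list of the form: python <script> --key value ..."""
    argv = ["python", script]
    for key, value in options.items():
        argv += ["--" + key, str(value)]
    return argv


if __name__ == "__main__":
    # A subset of the options from the example; values are passed through as-is.
    options = {
        "agt": 0,
        "usr": 1,
        "max_turn": 40,
        "intent_err_prob": "0.00",
        "slot_err_prob": "0.00",
        "episodes": 500,
    }
    # "your_entry_script.py" is a placeholder, not a file from the platform.
    argv = build_command("your_entry_script.py", options)
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(a) for a in argv))
```

The returned list can also be passed directly to `subprocess.run`, which avoids shell quoting issues with paths that contain brackets.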



If you submit any system to this challenge or publish any other work making use of the resources provided on this project, we ask you to cite the following task description papers:

  @article{li2018microsoft,
    title={Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems},
    author={Li, Xiujun and Panda, Sarah and Liu, Jingjing and Gao, Jianfeng},
    journal={arXiv preprint arXiv:1807.11125},
    year={2018}
  }

  @article{li2016user,
    title={A User Simulator for Task-Completion Dialogues},
    author={Li, Xiujun and Lipton, Zachary C and Dhingra, Bhuwan and Li, Lihong and Gao, Jianfeng and Chen, Yun-Nung},
    journal={arXiv preprint arXiv:1612.05688},
    year={2016}
  }



  1. How to implement an agent: see here