Overview

With the emergence of social media and the widespread use of mobile devices, conversation via short texts has become an important mode of communication. Many real-life applications can benefit from research on short text conversation (STC), for example, automatic message reply on mobile phones, voice assistants such as Siri, and the various chatbots used with smart home devices. This is why we proposed a pilot task on conversation at NTCIR-12, to bring together researchers interested in natural language conversation. At NTCIR-12, STC was treated as an IR problem: a large repository of post-comment pairs was maintained, drawn from Weibo for the Chinese subtask and Twitter for the Japanese subtask, and a clever way was sought to reuse these existing comments to respond to new posts.


At NTCIR-13, besides the retrieval-based method, we also consider the generation-based method, which generates "new" comments. Generation-based methods have emerged as a hot research topic and have gained considerable attention in recent years, yet it remains an open problem whether the retrieval-based method should be wholly replaced by, or combined with, the generation-based method for the STC task. By organizing this task at NTCIR-13, we provide a transparent platform for comparing the two methods through comprehensive evaluation. Furthermore, participants are encouraged to explore effective ways of combining the two methods to build a more intelligent chatbot.


The main purpose of STC2@NTCIR-13 is to bring together IR, NLP, and machine learning researchers working on or interested in natural language conversation, to share the latest research results, exchange opinions on related issues, and discuss future directions.

Task Settings (retrieval-based method)

[Figure: task setting of the retrieval-based method]

For the retrieval-based method, STC is defined as an IR task, as depicted in the figure above. A repository of post-comment pairs from Weibo is prepared for the Chinese subtask (Twitter for the Japanese subtask; see http://ntcirstc.noahlab.com.hk/STC2/stc-jp.htm for details of the Japanese subtask). Each participating team receives the repository in advance.

(1): In the training period, participants can build their own conversation system based on IR technologies, using the given post-comment pairs as training data.

(2): In the test period, each team is given 100 test queries (posts) that have been held out from the repository. Each team is asked to provide a ranked list of ten results (comments) for each query. The comments must be drawn from the repository.

(3): In the evaluation period, the results from all participating teams are pooled and labelled 0 (inappropriate), 1 (appropriate in some context), or 2 (appropriate) by multiple judges. Graded-relevance IR measures (e.g., nG@1, nERR@10, and P+) are used for evaluation.
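As an illustration of the steps above, a minimal retrieval baseline can rank repository posts by TF-IDF cosine similarity to the query post and return the comments paired with the top-ranked posts. The sketch below is hypothetical (the function name and the whitespace tokenizer are our own; a real system for Chinese or Japanese would need proper word segmentation):

```python
import math
from collections import Counter

def tfidf_rank(query, repository, k=10):
    """Rank repository (post, comment) pairs by TF-IDF cosine similarity
    of the post to the query, returning the top-k comments."""
    docs = [post.lower().split() for post, _ in repository]
    n_docs = len(docs)
    df = Counter(t for d in docs for t in set(d))          # document frequency
    idf = {t: math.log(n_docs / df[t]) for t in df}        # inverse doc. freq.

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}    # TF-IDF weights

    def cos(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vec(query.lower().split())
    ranked = sorted(range(n_docs), key=lambda i: cos(q, vec(docs[i])),
                    reverse=True)
    return [repository[i][1] for i in ranked[:k]]
```

In practice such a lexical baseline would be combined with learning-to-rank features, but it shows the shape of the task: queries in, a ranked list of ten repository comments out.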

Task Settings (generation-based method)

[Figure: task setting of the generation-based method]

For the generation-based method, the task setting is depicted in the figure above. The same repository of post-comment pairs as used in the retrieval-based method is used to train the generators. The generator can be modelled with statistical machine translation (SMT) models or RNN-based neural models. Other widely used natural language generation (NLG) methods, such as template-filling, rule-based, or linguistics-based generators, are also acceptable.
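As a toy illustration of the template-filling approach mentioned above, the sketch below matches keywords in a post and fills them into canned templates. The rules here are entirely hypothetical; a real system would induce such patterns from the post-comment repository:

```python
import re

# Hypothetical keyword-triggered templates (illustration only).
TEMPLATES = [
    (re.compile(r"\b(movie|film)\b"), "Which {0} did you watch?"),
    (re.compile(r"\b(coffee|tea)\b"), "I love {0} too!"),
]

def generate(post, n=10):
    """Return up to n template-filled comments for a post."""
    comments = []
    for pattern, template in TEMPLATES:
        match = pattern.search(post.lower())
        if match:
            comments.append(template.format(match.group(1)))
    if not comments:
        # generic fallback when no rule fires
        comments.append("Tell me more about that.")
    return comments[:n]
```

Unlike the retrieval-based setting, the outputs here need not exist in the repository, which is exactly what the generation-based evaluation below allows.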

(1): In the training period, participants can build their own generation-based conversation system by using the post-comment repository as training data.

(2): In the test period, each team is given 100 test queries (posts) that have been held out from the repository. Each team is asked to provide a list of ten generated results (comments) for each query. The comments do not need to come from the repository.

(3): In the evaluation period, the results from all participating teams are pooled and labelled 0 (inappropriate), 1 (appropriate in some context), or 2 (appropriate) by multiple judges. Graded-relevance IR measures (e.g., nG@1, nERR@10, and P+) are used for evaluation.
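For reference, nG@1 and nERR@10 can be sketched as follows, using the standard definitions of normalized gain and expected reciprocal rank over the 0/1/2 labels above (the exact NTCIR definitions, e.g. of P+ or the gain mapping, may differ in detail):

```python
def err_at_k(gains, k=10, g_max=2):
    """Expected Reciprocal Rank over graded labels in 0..g_max."""
    score, p_continue = 0.0, 1.0
    for rank, g in enumerate(gains[:k], start=1):
        stop_prob = (2 ** g - 1) / 2 ** g_max  # chance the user stops here
        score += p_continue * stop_prob / rank
        p_continue *= 1.0 - stop_prob
    return score

def n_err_at_k(gains, pooled_gains, k=10):
    """ERR@k normalized by the ideal ranking of all pooled labels."""
    ideal = err_at_k(sorted(pooled_gains, reverse=True), k)
    return err_at_k(gains, k) / ideal if ideal > 0 else 0.0

def n_g_at_1(gains, pooled_gains):
    """Gain at rank 1 normalized by the best available gain."""
    best = max(pooled_gains, default=0)
    return gains[0] / best if best > 0 else 0.0
```

Here `gains` is a run's label sequence for one query and `pooled_gains` contains the labels of all pooled results for that query; per-query scores are then averaged over the 100 test queries.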


Organizers

Lifeng Shang, Noah's Ark Lab, Huawei, Hong Kong

Tetsuya Sakai, Waseda University, Japan

Zhengdong Lu, Deeplycurious.ai, Beijing, China

Hang Li, Noah's Ark Lab, Huawei, Hong Kong

Ryuichiro Higashinaka, Nippon Telegraph and Telephone Corporation, Japan

Yusuke Miyao, National Institute of Informatics, Japan

Yuki Arase, Osaka University, Japan

Masako Nomoto, Yahoo Japan Corporation, Japan

Schedule of the Chinese Subtask

  • Post-comment pairs released to registered participants: Jul-Aug 2016
  • Training data released: Oct 2016-Jan 2017
  • Task registration due: Apr 2017 (important: registration is required for run submission and evaluation)
  • STC run submission deadline: May 2017
  • Relevance assessments: Jun-Jul 2017
  • Results and draft task overview released to participants: Sep 1 2017
  • Participants' draft papers due: Oct 1 2017
  • All camera-ready papers due: Nov 1 2017
  • NTCIR-13 Conference: Dec 2017

Contact Us

stc-org@list.waseda.jp

Please follow us on Twitter: @ntcirstc