In this challenge, participants will use the TourSG corpus to develop the components. TourSG consists of 35 dialog sessions on tourist information for Singapore, collected from Skype calls between three tour guides and 35 tourists. All recorded dialogs, 21 hours in total, have been manually transcribed and annotated with speech act and semantic labels at the turn level.
Since each subject in these dialogs tends to be expressed not in a single turn but over a series of turns, dialog states are defined at the sub-dialog level. A full dialog session is divided into sub-dialogs based on topical coherence, and each sub-dialog is then categorized by topic. Each sub-dialog assigned to one of the major topic categories has an additional frame structure with slot-value pairs that represent further details about the subject discussed within the sub-dialog (see an example of main task reference annotations).
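The sub-dialog structure described above can be sketched in Python as follows. This is only an illustrative model, not the corpus file format: the class names, the `segment_id` field, the `group_into_subdialogs` helper, and the example topics, slots and transcripts are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Turn:
    speaker: str      # "guide" or "tourist"
    transcript: str
    segment_id: int   # hypothetical field: index of the sub-dialog this turn belongs to
    topic: str        # topic category of that sub-dialog


@dataclass
class SubDialogState:
    topic: str
    # Frame structure: slot name -> list of values (one slot may hold several values).
    frame: Dict[str, List[str]] = field(default_factory=dict)
    turns: List[Turn] = field(default_factory=list)


def group_into_subdialogs(turns: List[Turn]) -> List[SubDialogState]:
    """Group consecutive turns that share a segment_id into sub-dialogs."""
    subdialogs: List[SubDialogState] = []
    current_id = None
    for turn in turns:
        if turn.segment_id != current_id:
            # A new segment id starts a new sub-dialog.
            subdialogs.append(SubDialogState(topic=turn.topic))
            current_id = turn.segment_id
        subdialogs[-1].turns.append(turn)
    return subdialogs


# Made-up example turns, not taken from the corpus.
example_turns = [
    Turn("tourist", "I need a cheap hotel.", 0, "ACCOMMODATION"),
    Turn("guide", "There are hostels near the city centre.", 0, "ACCOMMODATION"),
    Turn("tourist", "How do I get there from the airport?", 1, "TRANSPORTATION"),
]
subdialogs = group_into_subdialogs(example_turns)

# The state is attached to the whole sub-dialog, not to a single turn.
subdialogs[0].frame["INFO"] = ["Pricerange"]
```

Keeping the frame on the sub-dialog object rather than on each turn mirrors the idea that the state summarizes everything said about one subject across several turns.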
Unlike the main task, the pilot tasks provide annotations at the utterance level and, accordingly, systems must handle slot-value pairs at the utterance level. Utterance-level annotations cover both semantic slots and speech acts (see an example of pilot task reference annotations).
Training and development data: manual transcriptions and annotations at both the utterance and sub-dialog levels will be provided for 20 dialogs (10 from tour guide-1 and 10 from tour guide-2) for training the trackers and fine-tuning their parameters.
Test data: manual transcriptions will be provided for 15 dialogs (5 from tour guide-1, 5 from tour guide-2 and 5 from tour guide-3) for evaluating the trackers.
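The split described above can be written down as a small table to check the totals and to see which guide appears only in the test data. This is a minimal sketch: the `SPLITS` dictionary and the helper names are hypothetical, and only the counts come from the description above.

```python
# Dialog counts per tour guide, as stated in the data description.
SPLITS = {
    "train_dev": {"guide-1": 10, "guide-2": 10},
    "test": {"guide-1": 5, "guide-2": 5, "guide-3": 5},
}


def split_size(name: str) -> int:
    """Total number of dialogs in one split."""
    return sum(SPLITS[name].values())


def guides_only_in_test() -> set:
    """Guides whose dialogs appear in the test split but not in training/development."""
    return set(SPLITS["test"]) - set(SPLITS["train_dev"])


total_dialogs = split_size("train_dev") + split_size("test")
unseen_guides = guides_only_in_test()
```

Under these counts, the two splits together cover all 35 dialogs, and the tracker is evaluated partly on a guide it never saw during training.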
Both datasets will be released free of charge to all registered challenge participants after they sign a license agreement with ETPL-A*STAR. The datasets will include transcribed and annotated dialogs, as well as ontology objects describing the annotations.