Abstract Julian Hough 14th March 2013
Incremental syntactic context and self-repairs in dialogue
Self-repairs such as repeats ("I..I..I want to go to Paris"), substitution-type modifications ("I want to go to London.. uhh no.. Paris") and insertions ("I went to London, uhh, I went slowly to London") are commonplace in spoken conversation, but they are notoriously difficult for formal and distributional accounts of on-line language processing to handle, for several reasons. Statistically trained word-based (N-gram) language models will under-predict these events because of data sparsity: for example, a trigram such as "I I I" in the first example is very rare. For formal accounts, expanding a grammar so that it can generate these strings can result in computational explosion and potential over-generation.
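The data sparsity point can be illustrated with a minimal sketch. The toy corpus below is invented for illustration and the helper name `trigrams` is hypothetical; it is not the corpus or method used in the study. It counts word trigrams in a handful of fluent utterances and then lists which trigrams of a fluent and a repeated utterance were never observed.

```python
from collections import Counter

def trigrams(tokens):
    """Return the consecutive word triples in a token sequence."""
    return list(zip(tokens, tokens[1:], tokens[2:]))

# Hypothetical toy corpus of fluent utterances (invented for illustration).
corpus = [
    "i want to go to paris",
    "i want to go to london",
    "i went to london",
    "i went slowly to london",
]

# Count every trigram seen in the training utterances.
counts = Counter()
for utterance in corpus:
    counts.update(trigrams(utterance.split()))

fluent = trigrams("i want to go to paris".split())
repaired = trigrams("i i i want to go to paris".split())

# Trigrams with a zero count would receive little or no probability mass
# from an unsmoothed word-based model trained on this corpus.
unseen_fluent = [t for t in fluent if counts[t] == 0]
unseen_repaired = [t for t in repaired if counts[t] == 0]

print("unseen in fluent utterance:  ", unseen_fluent)
print("unseen in repeated utterance:", unseen_repaired)
```

In this toy setting every trigram of the fluent utterance has been seen, while the repeat introduces trigrams such as "i i i" that have not, which is the sense in which a word-based model under-predicts the repair.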
There is hope, however, in that the frequencies of types (rather than tokens) of self-repair are more regular and predictable than a purely word-based language model would suggest. Previous work has shown that the presence and length of repairs interact with contextual factors, such as how many words into the utterance a speaker is (Shriberg and Stolcke 1998) and the part of speech (POS) of the words uttered so far (Heeman and Allen 1999). Johnson and Charniak's (2004) TAG-based noisy channel approach comes closest to a wide-coverage stochastic syntactic model of self-repair forms. However, it is not clear how it could operate incrementally (as an online dialogue system and humans do), and because it disregards the repaired part of the utterance, their system filters out information that is useful for a dialogue manager.
The corpus study I describe investigates how the syntactic context that would be available to an incremental parser can help inform decisions about likely repair points and the form of the repair as utterances are processed. Some initial results and discussion will be given and, if time permits, a proposal for a new formal and computationally implemented model of processing self-repair for incremental parsing and generation systems will be described.