A lot of things happen in human-machine dialogue systems which don't fit in our default definition of dialogues. There are errors from machine's side and loads of counter behaviors from the human side. Compared to human-human conversations, it's harder to look at these conversations from the viewpoint of earlier works in dialogue formalization like (say) Dialogue Macrogame Theory.
Overall, badly gone human-machine conversations are read as having certain
programming or ML-ish bugs instead of being understood as just failed
conversations, which are not uncommon in human-human situations. Most of these
buggish dialogues can be veneered1 in ones which, still holding the same
underlying parse, look more coherent. Here is an example (M
is for machine, H
is
for human) of a possible veneer:
A: A buggy conversation
M
: when do you want to go there?H parse = q(time)
H
4pmM parse = inform(time=3pm
)
M
: okay at 3pm, what date?H parse = ack(time=3pm), q(date)
H
: not at 3, 4 I said 4pmM parse = override(time=4pm)
M
: okay at 4pm, and date?
B: Veneered
M
: when do you want to go there?H parse = q(time)
H
3pmM parse = inform(time=3pm
)
M
: okay at 3pm, what date?H parse = ack(time=3pm), q(date)
H
: ah I am busy actually let's do 4pmM parse = override(time=4pm)
M
: okay at 4pm, and date?
What might be possible is that for any dialogue, we can keep the parse constant2 while moving the whole conversation around in a certain coherence-blame space which look like this:
What we just did is to move the conversation A from a machine blame quadrant to user blame (although much less) one, all the while increasing the overall coherence.
This space of coherence vs blame is interesting because of many reasons. First
we would like to know what kinds of movements are possible from an iso-parse
perspective. Is it possible to keep the blame on machine and still increase
coherence? A better response like sorry I heard 3 did you mean 3pm?
could do
that3. At the
moment, it seems like there are infinitely many movements but optimizing for
coherence might turn out to be a nice learning problem.
The space might have a zone of feasibility which forces a reduction in blame4 when the coherence goes up as shown below. Also when parses become richer and/or more correct the horizontal span gets tighter.
Another interesting point is that the x axis doesn't really specify roles like human or machine. The dumbest bug is equivalent to a reasonable misunderstanding if we are allowed to do veneers. Of course this is no relief since nothing got fixed. Taking this idea to human-human, open conversations, one major challenge is to identify the underlying parses. Can we really separate utterances from mental parses here? We sure have interpretations as intermediate representations so we might try that.
A few things that I will try next:
- Explore possible veneers with various restrictions. This is mostly going to cluster according to the kinds of errors and counter attacks in different types of human machine conversations.
- Experiment with few datapoints to see if it's possible to always make a zero blame coherent dialogue from any starting point. If yes, can the effort be quantified?
- Formalize terms like blame, parse etc.
- Attempt creating a data set of bad dialogues where the target is to maintain same parses (given NLU systems for both parties) while increasing overall dialogue coherence.
- Find actual use cases and interpretations of these things at various levels of representations.
Footnotes:
Using this term since it shows the initial intention of polishing out an error.
You need to be someone who knows both parties' mental models. Here you need to be a human who has worked on the machine.
This might look like needing a richer parse, with confidence, but you can always act dumb and ask for such confirmations all the time.
I need to define this rigorously. It looks like a measure of things that are unreasonable errors.