Google Voice transcriptions - oooh, pretty linguistics

So Google Voice's voicemail transcription is pretty cool.

I had already assumed that it learns voice recognition by training its algorithms on samples of real speech: videos with captions, people calling automated systems. That's pretty cool in itself, because it means the software isn't learning what "book" sounds like from someone in a studio saying "book". Instead it learns from real examples of people saying "book": quickly, imperfectly, with background noise. So it can understand you when you say it quickly, imperfectly, or with background noise.

But recently I've had the feeling that it's using another, very different trick to figure out what people are saying in voicemails. I think it's starting to notice what people usually say at different points in a voicemail. For instance, it's very likely to guess that you're saying "Hello" at the start of a message. Here's an extreme example, where my mother left a message that was totally blank except for some breathing and a few "clunk"s:

I've seen this happen a few times recently. What I think is going on isn't tied to the "beginning", "middle", and "end" exactly; rather, it's taking into account the wider context surrounding each word. That is, it's noticing which words are usually said one after another. Computational linguists have been using this trick, often called an n-gram language model, for a while. And maybe it's taking into account more than just the immediately preceding word, and considering the context of the whole sentence or voicemail!
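
To make the "which words follow which" idea concrete, here's a toy sketch of a bigram model (the simplest n-gram model). This is not how Google Voice is actually built; the little corpus of voicemails is invented purely for illustration. The interesting bit is the start-of-message marker `<s>`: by treating "start of message" as just another preceding word, the model ends up favouring "hello" at the beginning without knowing anything about "beginnings" as such.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of voicemail transcripts (invented for illustration).
VOICEMAILS = [
    "hello it's mom call me back",
    "hello just checking in call me back",
    "hi it's me call me when you can",
    "hello it's the dentist's office please call us back",
]

START = "<s>"  # pseudo-word marking the beginning of a message


def train_bigrams(texts):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for text in texts:
        words = [START] + text.split()
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts


def next_word_probs(counts, prev):
    """Estimate P(word | prev) from raw counts (no smoothing)."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()}


bigrams = train_bigrams(VOICEMAILS)
print(next_word_probs(bigrams, START))   # {'hello': 0.75, 'hi': 0.25}
print(next_word_probs(bigrams, "call"))  # {'me': 0.75, 'us': 0.25}
```

A real system would use vastly more text, longer contexts than one word, and smoothing so that unseen word pairs don't get a probability of zero.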

The cool new thing is that Google Voice's speech recognition isn't just matching individual sounds to words; it's thinking about the whole context of the message and asking: what word would someone normally say at this point in the conversation?
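
One standard way to combine the two signals (whether Google Voice does exactly this is my assumption) is to score each candidate word by how well it matches the sound *and* how likely it is after the previous word, then pick the best. In the sketch below the probabilities, the `lm_weight` knob, and the `FLOOR` value for unseen pairs are all made up for illustration. It also hints at what happens with a near-silent message: when the sound gives little to go on, the context does most of the guessing.

```python
import math

# Hypothetical language-model probabilities P(word | previous word).
# In a real recogniser these would be estimated from huge amounts of text.
LM = {
    "<s>": {"hello": 0.6, "yellow": 0.01, "hollow": 0.01},
    "call": {"me": 0.7, "knee": 0.001},
}

FLOOR = 1e-6  # assumed tiny probability for word pairs the model never saw


def best_word(prev, acoustic_scores, lm_weight=1.0):
    """Pick the candidate maximising log P(sound | word) + lm_weight * log P(word | prev)."""
    def score(word):
        lm_p = LM.get(prev, {}).get(word, FLOOR)
        return math.log(acoustic_scores[word]) + lm_weight * math.log(lm_p)
    return max(acoustic_scores, key=score)


# Made-up acoustic scores: on sound alone, "yellow" is a slightly better match.
sounds = {"hello": 0.30, "yellow": 0.35, "hollow": 0.20}
print(best_word("<s>", sounds))                  # hello (context wins)
print(best_word("<s>", sounds, lm_weight=0.0))   # yellow (sound alone)
print(best_word("call", {"me": 0.4, "knee": 0.5}))  # me
```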

