Date: 30/08/2013 18:43:45
From: esselte
ID: 381915
Subject: Waveform and Mouth Shapes

Do mouth shapes formed whilst humans are vocalising look anything like the waveform patterns of the sounds we are making?

Date: 30/08/2013 19:24:29
From: PM 2Ring
ID: 381942
Subject: re: Waveform and Mouth Shapes

What do you mean by “anything like”? :)

Check out The Acoustic Theory of Speech Production: the source-filter model
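
The short version: the vocal folds produce a buzzy, harmonic-rich source, and the throat/mouth/lips act as a filter that shapes its spectrum. Here's a toy Python sketch of the idea – the formant frequencies and bandwidths are rough textbook values for an /a/-like vowel, not measurements:

import numpy as np
from scipy import signal
from scipy.io import wavfile

fs = 16000                     # sample rate (Hz)
f0 = 120                       # fundamental / glottal pulse rate (Hz)

# Source: an impulse train at the fundamental frequency.
source = np.zeros(fs // 2)     # half a second of samples
source[::fs // f0] = 1.0

# Filter: a cascade of second-order resonators, one per formant.
out = source
for freq, bw in [(700, 130), (1220, 70), (2600, 160)]:   # F1-F3, /a/-ish
    r = np.exp(-np.pi * bw / fs)          # pole radius from bandwidth
    theta = 2 * np.pi * freq / fs         # pole angle from frequency
    b = [1 - r]
    a = [1, -2 * r * np.cos(theta), r ** 2]
    out = signal.lfilter(b, a, out)

out /= np.max(np.abs(out))
wavfile.write("vowel_a.wav", fs, (out * 32767).astype(np.int16))

Play vowel_a.wav and you get a crude but recognisably vowel-like buzz; change the formant frequencies and the perceived vowel changes, even though the source is identical.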

Date: 30/08/2013 19:31:28
From: esselte
ID: 381954
Subject: re: Waveform and Mouth Shapes

I’m interested in how closely mouth shapes correspond to waveforms.

Could a deaf lip reader “listen” to the screen of an oscilloscope?

The question was prompted by watching the Arctic Monkeys’ Do I Wanna Know video clip, which has a lot of sine wave thingies artistically interpreted to match the vocals’ mouth shapes. I was wondering how much artistic interpretation was required to make those images.

Date: 30/08/2013 20:04:39
From: PM 2Ring
ID: 382000
Subject: re: Waveform and Mouth Shapes

> I’m interested in how closely mouth shapes correspond to waveforms.

Ok. There is some correspondence. The mouth shape itself has some impact on the final sound produced, but it also indicates things that are going on deeper in the vocal tract.

> Could a deaf lip reader “listen” to the screen of an oscilloscope?

Perhaps, but I think they’d have more success looking at some form of spectrogram (aka sonogram).

Typical human speech contains a mixture of frequencies, so the resulting waveform tends to look rather complex. A spectrogram shows the energy present at different frequencies, and such images can be used to identify various speech sounds. See the Wikipedia article on formants.

Wikipedia said:

Formants are defined by Gunnar Fant as ‘the spectral peaks of the sound spectrum of the voice’. In speech science and phonetics, formant is also used to mean an acoustic resonance of the human vocal tract.

[…]

Formants are the distinguishing or meaningful frequency components of human speech and of singing. By definition, the information that humans require to distinguish between vowels can be represented purely quantitatively by the frequency content of the vowel sounds.

In speech, these are the characteristic partials that identify vowels to the listener. Most of these formants are produced by tube and chamber resonance, but a few whistle tones derive from periodic collapse of Venturi effect low-pressure zones.
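
You can see formants for yourself by plotting a spectrogram of any speech recording – the dark horizontal bands that drift around as the vowels change are the formants. A minimal Python sketch (speech.wav is a placeholder for whatever mono recording you have handy):

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

fs, x = wavfile.read("speech.wav")        # assumes a mono recording
f, t, Sxx = signal.spectrogram(x.astype(float), fs,
                               nperseg=512, noverlap=384)
plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12))   # dB scale
plt.ylim(0, 4000)        # vowel formants mostly live below ~4 kHz
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.show()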

I suspect that lip-readers do more than just watch lips – even if they focus on the lips they’re probably also taking in visual info from the rest of the lower face, jaws and throat.

> The question was prompted by watching the Arctic Monkeys’ Do I Wanna Know video clip, which has a lot of sine wave thingies artistically interpreted to match the vocals’ mouth shapes. I was wondering how much artistic interpretation was required to make those images.

It’s possible to generate recognisable synthetic speech sounds even using fairly crude digital models of the human vocal system. It’s not too hard to enhance speech synthesizer software so that it provides vocal tract geometry data as well as acoustic output.

Back in the mid 1980s, the old Amiga speech synthesizer (which used four simple formants to produce speech sounds) had the capability to generate basic mouth shape data. Although it was rather crude, that data could be used to make an animated character look like it was speaking or singing. It was fun to play with, but I never wrote any software that seriously exploited that capability.
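
For flavour, here’s a toy re-imagining of that kind of mouth-shape output in Python. The phoneme symbols, shape names and timings are all invented for illustration – this is not the Amiga’s actual interface:

# Map phonemes to a handful of mouth shapes ("visemes"), the minimum
# an animator needs to sync a face to speech.
VISEMES = {
    "AA": "open",      "IY": "spread",    "UW": "rounded",
    "M":  "closed",    "B":  "closed",    "P":  "closed",
    "F":  "lip-teeth", "V":  "lip-teeth",
}

def mouth_track(phonemes):
    """Turn (phoneme, duration_ms) pairs into (start_ms, shape) keyframes."""
    t, frames = 0, []
    for ph, dur in phonemes:
        frames.append((t, VISEMES.get(ph, "neutral")))
        t += dur
    return frames

# "mama", with invented timings, just to show the shape of the data:
for start, shape in mouth_track([("M", 80), ("AA", 180), ("M", 80), ("AA", 220)]):
    print(f"{start:4d} ms  ->  {shape}")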

So you can use speech synthesis technology to do convincing animation of the mouth, face & throat geometry of CGI characters, and then overdub with human vocals, if desired.

Date: 31/08/2013 09:30:50
From: Arts
ID: 382460
Subject: re: Waveform and Mouth Shapes

PM 2Ring said:

Back in the mid 1980s, the old Amiga speech synthesizer (which used four simple formants to produce speech sounds) had the capability to generate basic mouth shape data. Although it was rather crude, that data could be used to make an animated character look like it was speaking or singing. It was fun to play with, but I never wrote any software that seriously exploited that capability.

Max Headroom… wow, he was so cool for his time.

You will also run into difficulties with accents. The accent is created in different parts of the throat: for example, Australian accents tend to emerge from the back of the throat, while American accents are formed at the front. How does that affect your sound waveform?

Date: 31/08/2013 11:01:17
From: transition
ID: 382517
Subject: re: Waveform and Mouth Shapes

> Could a deaf lip reader “listen” to the screen of an oscilloscope?

Doubtful, even with a digital storage scope and the trace replayed slowly across the screen.

A lot of the sounds humans make when vocalising are courtesy of the placement of the tongue in the mouth, which you get a feel for if you spit out the alphabet while mindfully observing your own mouth. That self-observation probably involves a combination of physical feeling plus all the drivers of speech, which themselves are rarely without some sort of mental ‘feelings’.

Date: 31/08/2013 11:19:18
From: dv
ID: 382523
Subject: re: Waveform and Mouth Shapes

http://www.theguardian.com/world/2013/aug/29/syria-ed-miliband-succour-assad

A government source told the Times on Wednesday night: “No 10 and the Foreign Office think Miliband is a fucking cunt and a copper-bottomed shit.”
