Linguistics Forum
peper



Joined: 10 Jan 2012
Posts: 3
Posted: Sat Aug 31, 2013 5:57 am
Post subject: formant 1&2 or 3&4 ?

Hi,

I'm doing research in phonetics for the first time, and I have only a very basic knowledge of the field. I will be looking at convergence, divergence, and nativeness in the accents of L2 English speakers, and I'm not sure whether I should consider formants 3 and 4 or just look at formants 1 and 2. Why does most research look only at formants 1 and 2, and almost never at formants 3 and 4? I know formants 3 and 4 are more difficult to analyse, but do you think it would be worthwhile to take them into consideration more in the future?


Many thanks in advance for your help.
MalFet
Chomsky


Joined: 01 Apr 2011
Posts: 2778
Location: Kathmandu / NYC
Posted: Sat Aug 31, 2013 12:10 pm

The question here is whether F3 and F4 are necessary to account for the variable perception of nativeness. In other words, whether to consider them is not something you can decide ahead of time; it is part of your investigation itself.

Most of the research looks at F1 and F2 because these are often sufficient to characterize the relevant contrasts of a language. F3 figures into some properties like rounding and color, too. F4 is less easy to describe predictably.

But if your job is to evaluate acoustic perception, what "matters" is ultimately up to your experiment to dictate.
djr33
Chomsky


Joined: 08 Mar 2012
Posts: 4004
Location: Illinois, USA
Posted: Sat Aug 31, 2013 12:29 pm

Remember that there are also F5, F6, F7, F8, etc. Phonetic research has shown that the most significant predictors, especially for vowels, are F1 and F2, while F3 can also (more rarely) help, such as with R sounds. Looking beyond that is possible and could in theory help, but usually F1 and F2 (sometimes with F3) are sufficient.
The good news is that you can measure all of these at one time: you don't need any special equipment or microphone; everything will be in a normal recording.*

So the best guess is that you will probably only need F1 and F2, maybe F3 as well. But as MalFet said, you can determine this while doing the research.

Remember also how statistical analysis works: we find some indicator (for example, F1) and look at whether it significantly varies based on some predictor (for example, nativeness). That doesn't mean there aren't any other factors, but it does mean that it is one factor. You can't possibly account for everything in your research, but you can try to account for the most important things.

As a general rule, start with F1 and F2 and see if you can fully predict the sounds. If so, that's probably fine. If not, you should look at other formants until you can account for everything.




MalFet, let me ask one question about this: let's imagine that for some phenomenon you find that F1 and F2 are useful, while F3 is not. Is it possible that F4 might also be useful? Or F5? In other words, does the usefulness of the formants depend on linear adjacency, or in theory should one consider every one of them independently?




(*Note that normal microphones only go up to 48 kHz, so you won't be able to get great data on too much past F4, maybe up to F5 or F6? I haven't ever looked at the math. And I've also never seen any research involving anything past F4. Also, this relates to the ear: there is a certain range of frequencies we can hear, up to about 20,000 Hz, and beyond that the formants probably are not significant because we couldn't hear them anyway. At least I think that would be the case. I'd be surprised to find anything to the contrary.)
MalFet
Chomsky


Joined: 01 Apr 2011
Posts: 2778
Location: Kathmandu / NYC
Posted: Sat Aug 31, 2013 1:23 pm

djr33 wrote: Remember that there are also F5, F6, F7, F8, etc.

Sort of. My formants rarely congeal past the 4th one, though my sister (who has extensive voice training) usually exhibits formants up through 7 or so. Because of this inter-speaker variation, though, it's hard for language communities to mobilize differences above F4 into socially salient categories (like accent).

djr33 wrote: So the best guess is that you will probably only need F1 and F2, maybe F3 as well. But as MalFet said, you can determine this while doing the research.

I'd be cautious about assuming that at this point. It's true that most of the heavy lifting of contrastive inventories is constituted by F1-3, but our OP is looking at subphonemic and holistic perceptual phenomena. That often involves gradient distinctions expressed by higher formants. I won't say more for fear of giving away the ending.

djr33 wrote: MalFet, let me ask one question about this: let's imagine that for some phenomenon you find that F1 and F2 are useful, while F3 is not. Is it possible that F4 might also be useful? Or F5? In other words, does the usefulness of the formants depend on linear adjacency, or in theory should one consider every one of them independently?

Sure, why not? Certainly you can create perceivable differences by manipulating higher formants while leaving lower formants alone. Playing with this in Praat is fun. Even high-contrast phonological features will do this sometimes. For example, rounding tends to involve F2-F3 interactions, but not F1 so much. Nasals mostly just add another formant above the second.

djr33 wrote: (*Note that normal microphones only go up to 48 kHz, so you won't be able to get great data on too much past F4, maybe up to F5 or F6? I haven't ever looked at the math. And I've also never seen any research involving anything past F4. Also, this relates to the ear: there is a certain range of frequencies we can hear, up to about 20,000 Hz, and beyond that the formants probably are not significant because we couldn't hear them anyway

Hmm…I think you might be confusing recorder sampling rate (which is 48kHz on most consumer equipment) with microphone frequency response (which varies dramatically but rarely goes much above 20kHz). In any case, average formants occur at roughly 1000 Hz intervals, so 20 kHz is way more than sufficient.
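
To put rough numbers on the sampling-rate point: a digital recording can only represent frequencies up to half its sampling rate (the Nyquist limit), so a 48 kHz file covers content up to 24 kHz. The following is a minimal Python sketch, not anything taken from Praat or from a particular recorder. The function names are illustrative, and the idealized formant ladder (a first formant at 500 Hz, which is an assumed value, plus the roughly 1000 Hz spacing mentioned above) is a simplification.

```python
def nyquist_hz(sampling_rate_hz: float) -> float:
    """Highest frequency a digital recording can represent: half the sampling rate."""
    return sampling_rate_hz / 2.0


def formants_below(limit_hz: float, first_hz: float = 500.0,
                   spacing_hz: float = 1000.0) -> int:
    """Count idealized formants at first_hz, first_hz + spacing_hz, ...
    that do not exceed limit_hz. Both defaults are assumptions for illustration."""
    if limit_hz < first_hz:
        return 0
    return int((limit_hz - first_hz) // spacing_hz) + 1


if __name__ == "__main__":
    for rate in (16_000, 44_100, 48_000):
        limit = nyquist_hz(rate)
        print(f"{rate} Hz sampling: content up to {limit:.0f} Hz, "
              f"room for about {formants_below(limit)} idealized formants")
```

Under these assumptions even a 16 kHz recording leaves room for F1 through F4 with space to spare, so the sampling rate is rarely the limiting factor for formant work.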
djr33
Chomsky


Joined: 08 Mar 2012
Posts: 4004
Location: Illinois, USA
Posted: Sat Aug 31, 2013 2:30 pm

Helpful post, thanks.

Quote: Sort of. My formants rarely congeal past the 4th one, though my sister (who has extensive voice training) usually exhibits formants up through 7 or so. Because of this inter-speaker variation, though, it's hard for language communities to mobilize differences above F4 into socially salient categories (like accent).
Interesting. Are these then within the range of human hearing?
Also, would that explain why (statistically speaking) F1 and F2 are most often markers of contrast?

Quote: Hmm…I think you might be confusing recorder sampling rate (which is 48kHz on most consumer equipment) with microphone frequency response (which varies dramatically but rarely goes much above 20kHz).

You're right. I was talking about the file formats. And I think 48kHz is positive and negative, meaning 24kHz, so around that 20kHz range.
Quote: In any case, average formants occur at roughly 1000 Hz intervals, so 20 kHz is way more than sufficient.
I thought they were exponentially farther apart. But are they just plain multiples? Why 1000, then, instead of about 200?
MalFet
Chomsky


Joined: 01 Apr 2011
Posts: 2778
Location: Kathmandu / NYC
Posted: Sat Aug 31, 2013 2:53 pm

djr33 wrote: Interesting. Are these then within the range of human hearing?
Also, would that explain why (statistically speaking) F1 and F2 are most often markers of contrast?

Retrospective explanations of evolved phenomena are always a bit dubious, but there is some relationship here. If I had to guess, I would actually probably reverse the causal direction: because F1-F4 are the most consequential to effective speech, mastering them becomes a necessity of basic language acquisition. The rest are a nicety, at best. It's probably no coincidence that lower formants are more important, as they are generally more perceptually salient.

djr33 wrote:
Quote: Hmm…I think you might be confusing recorder sampling rate (which is 48kHz on most consumer equipment) with microphone frequency response (which varies dramatically but rarely goes much above 20kHz).

You're right. I was talking about the file formats. And I think 48kHz is positive and negative, meaning 24kHz, so around that 20kHz range.

Hmm...pitch frequency doesn't have positives and negatives. I suspect you're thinking about waveform amplitude here, which is again a completely different thing. If you're seeing 48 kHz in file formats, though, it is almost certainly a reference to sampling frequency, not pitch frequency or harmonic frequency.

djr33 wrote:
Quote: In any case, average formants occur at roughly 1000 Hz intervals, so 20 kHz is way more than sufficient.
I thought they were exponentially farther apart. But are they just plain multiples? Why 1000, then, instead of about 200?

Ascribing "why" to evolution inevitably produces just-so stories, but we can speculate that it has something to do with the coevolution of ears and oral cavities. It doesn't seem to be a coincidence, in other words, that the average band of formant intervals (~1000Hz) corresponds roughly to the range of variation in primary articulatory contrasts (front/back, high/low). Ultimately, the system makes use of the materials, and the materials evolve to accommodate the system.
djr33
Chomsky


Joined: 08 Mar 2012
Posts: 4004
Location: Illinois, USA
Posted: Sat Aug 31, 2013 3:13 pm

Quote: Ascribing "why" to evolution inevitably produces just-so stories, but we can speculate that it has something to do with the coevolution of ears and oral cavities. It doesn't seem to be a coincidence, in other words, that the average band of formant intervals (~1000Hz) corresponds roughly to the range of variation in primary articulatory contrasts (front/back, high/low). Ultimately, the system makes use of the materials, and the materials evolve to accommodate the system.
I was referring to something more mathematical here, not evolutionary: the vocal folds vibrate at something like 200Hz, so wouldn't formants be multiples of that value? Shouldn't they be 400, 600, etc.? I was under the impression there was something exponential going on, which would explain the roughly 1000Hz separation for the F1 and F2 formants we often look at.
MalFet
Chomsky


Joined: 01 Apr 2011
Posts: 2778
Location: Kathmandu / NYC
Posted: Sat Aug 31, 2013 3:53 pm

djr33 wrote:
Quote: Ascribing "why" to evolution inevitably produces just-so stories, but we can speculate that it has something to do with the coevolution of ears and oral cavities. It doesn't seem to be a coincidence, in other words, that the average band of formant intervals (~1000Hz) corresponds roughly to the range of variation in primary articulatory contrasts (front/back, high/low). Ultimately, the system makes use of the materials, and the materials evolve to accommodate the system.
I was referring to something more mathematical here, not evolutionary: the vocal folds vibrate at something like 200Hz, so wouldn't formants be multiples of that value? Shouldn't they be 400, 600, etc.? I was under the impression there was something exponential going on, which would explain the roughly 1000Hz separation for the F1 and F2 formants we often look at.


Why would formant frequency intervals be multiples of larynx frequency? The whole reason that formant distributions work is that they aren't a function of fundamental frequency. That contrast is the point.

You do tend to have resonances at formant harmonics, but when we measure them we are really only interested in the lowest one. That's true for pretty much any acoustic phenomenon.

Anyway, apologies to the OP; I'm going off topic here.
djr33
Chomsky


Joined: 08 Mar 2012
Posts: 4004
Location: Illinois, USA
Posted: Sat Aug 31, 2013 4:01 pm

You're right, we're pretty off topic (my fault, although I think some of this might help the OP as well).
Aren't they mathematically related to larynx frequency (some function of it)? Isn't the whole point that they should be distributed as harmonics, and that any interruption in this can tell us about the position of the articulators?
I may be missing something fundamental.
My question: why are the formants all approximately 1000Hz apart? Where does that 1000 come from, mathematically? Are you suggesting that (in theory) that should hold for F1 through F10?
Again, this all comes from my understanding (perhaps misunderstanding) that they were exponentially farther apart.
MalFet
Chomsky


Joined: 01 Apr 2011
Posts: 2778
Location: Kathmandu / NYC
Posted: Sat Aug 31, 2013 4:33 pm

djr33 wrote: Aren't they mathematically related to larynx frequency (some function of it)? Isn't the whole point that they should be distributed as harmonics, then that any interruption in this can tell us about the position of the articulators?

Formants aren't harmonics. They're primary peaks of intensity located around relatively narrow frequency bands.

djr33 wrote: My question: why are the formants all approximately 1000Hz apart? Where does that 1000 come from, mathematically?

Like I said, it's a conventional property of language that is, historically, probably related to the shape of the oral cavity as a variable resonance chamber. Since an average human mouth can manipulate resonances in the ballpark of 200-800 Hz by moving articulation forward and back, it would be strange to see formant intervals that are either 50Hz or 10,000Hz in size. Something near 1000Hz is the most efficient use of the signal bandwidth, given approximately human-esque speech organs.

djr33 wrote: Are you suggesting that (in theory) that should hold for F1 through F10?

There's no "in theory" here, and that might be what's causing the confusion. The first four formants show this rough patterning, and beyond that there's considerable variation among speakers. That's why we tend to talk about F1-F4 exclusively when talking about a language in aggregate.
djr33
Chomsky


Joined: 08 Mar 2012
Posts: 4004
Location: Illinois, USA
Posted: Sat Aug 31, 2013 4:59 pm

Ok, thanks. I should look into this a bit more (and refresh my memory of what we went over in my phonetics class a couple years ago).
jkpate
Top-Notch Linguist


Joined: 05 Jan 2013
Posts: 163
Posted: Sat Aug 31, 2013 7:34 pm

djr33 wrote: I was referring to something more mathematical here, not evolutionary: the vocal folds vibrate at something like 200Hz, so wouldn't formants be multiples of that value? Shouldn't they be 400, 600, etc.? I was under the impression there was something exponential going on, which would explain the roughly 1000Hz separation for the F1 and F2 formants we often look at.


Under the source-filter theory of vowel production, the vocal fold vibration produces a train of harmonics as the source, which is then filtered by the transfer function of the oral cavity to produce the final spectrum (the source spectrum is multiplied by the transfer function; equivalently, the source signal is convolved with the filter's impulse response). If the vocal fold vibrations were the only source of energy, we would observe a spectrum that is a series of peaks, one at each harmonic, and the peaks of the envelope over those harmonic peaks would correspond to formants. In fact, formants are less sharply defined for female and child talkers because their harmonics are more widely spaced: the true resonant frequency is more likely to fall between adjacent harmonics.

But the true resonant frequency for a formant is determined by vocal tract characteristics, and there's no reason, in principle, it would vary with F0 within a talker. So the 1000Hz difference is just a coincidence of the broad difference in size between the back tube and the front tube.
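
The independence of formants from F0 can be sketched numerically. The Python below is an illustration under stated assumptions, not a speech synthesizer: the formant values (500, 1500, 2500 Hz), the 80 Hz bandwidth, and the simple bell-shaped resonance curve are all made up for the example, and the source is treated as having equally strong harmonics. It holds the filter fixed, changes only F0, and reports which harmonic ends up strongest near F1.

```python
ASSUMED_FORMANTS_HZ = (500.0, 1500.0, 2500.0)  # hypothetical fixed vocal tract resonances


def resonance_gain(freq_hz, center_hz, bandwidth_hz=80.0):
    """Gain of one bell-shaped resonance; the curve shape is chosen for illustration only."""
    return 1.0 / (1.0 + ((freq_hz - center_hz) / (bandwidth_hz / 2.0)) ** 2)


def filter_gain(freq_hz, formants_hz=ASSUMED_FORMANTS_HZ):
    """Total gain of the filter at freq_hz: the sum of the individual resonances."""
    return sum(resonance_gain(freq_hz, f) for f in formants_hz)


def harmonics(f0_hz, max_hz=5000.0):
    """Source harmonics: integer multiples of F0 up to max_hz."""
    return [f0_hz * k for k in range(1, int(max_hz // f0_hz) + 1)]


def strongest_harmonic_near(f0_hz, formant_hz, window_hz=400.0):
    """Harmonic with the largest filter gain within window_hz of the given formant."""
    nearby = [h for h in harmonics(f0_hz) if abs(h - formant_hz) <= window_hz]
    return max(nearby, key=filter_gain)


if __name__ == "__main__":
    for f0 in (100, 220):
        peak = strongest_harmonic_near(f0, 500.0)
        print(f"F0 = {f0} Hz: strongest harmonic near the 500 Hz resonance is {peak} Hz")
```

With F0 at 100 Hz a harmonic lands exactly on the 500 Hz resonance; at 220 Hz the nearest harmonics are 440 and 660 Hz, so the visible peak sits at 440 Hz even though the resonance itself has not moved. That is the sense in which widely spaced harmonics blur the formants of higher-pitched talkers.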
_________________
I spend most of my forum-ing time over here now: http://linguistforum.com/
djr33
Chomsky


Joined: 08 Mar 2012
Posts: 4004
Location: Illinois, USA
Posted: Sat Aug 31, 2013 7:56 pm

Thanks :)
And then the 1000Hz intervals are due to multiples of whatever causes any of those formants? Or are they all coincidentally 1000Hz apart?
jkpate
Top-Notch Linguist


Joined: 05 Jan 2013
Posts: 163
Posted: Sat Aug 31, 2013 8:40 pm

djr33 wrote: Thanks :)
And then the 1000Hz intervals are due to multiples of whatever causes any of those formants? Or are they all coincidentally 1000Hz apart?


The distances between subsequent formants are just coincidences. The frequency of a formant is inversely proportional to the length of the tube generating it. For example, the frequency of the first formant is determined by the distance from the vocal folds to the top of the tongue, and the frequency of the second formant is determined by the distance from the top of the tongue to the end of the lips. They are in principle independent (although the articulators are physically attached to each other and so might not be independent in practice).
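
To put numbers on the inverse relation between tube length and resonance frequency, one common textbook idealization (a simplification that differs from the two-cavity picture above) treats the whole vocal tract in a neutral vowel as a single uniform tube, closed at the vocal folds and open at the lips. Such a tube resonates at odd multiples of c / (4L). The sketch below assumes a speed of sound of about 350 m/s and a hypothetical 17.5 cm tract; the function name is illustrative.

```python
SPEED_OF_SOUND_CM_PER_S = 35_000.0  # assumed: roughly 350 m/s in warm, humid air


def tube_resonances_hz(length_cm, count=4):
    """Resonances of a uniform tube closed at one end and open at the other:
    F_n = (2n - 1) * c / (4 * L), for n = 1..count."""
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
            for n in range(1, count + 1)]


if __name__ == "__main__":
    for length in (17.5, 14.0):  # hypothetical longer and shorter vocal tracts
        values = ", ".join(f"{f:.0f}" for f in tube_resonances_hz(length))
        print(f"L = {length} cm: resonances at {values} Hz")
```

Under these assumptions the 17.5 cm tube resonates at 500, 1500, 2500, and 3500 Hz, that is, with a spacing of c / (2L) = 1000 Hz, and a shorter tube gives proportionally higher and more widely spaced values. Real vowels depart from this because the tract is not uniform.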
MalFet
Chomsky


Joined: 01 Apr 2011
Posts: 2778
Location: Kathmandu / NYC
Posted: Sat Aug 31, 2013 9:15 pm

jkpate wrote: The distances between subsequent formants are just coincidences.


...or perhaps they are a particularly elegant expression of the greatest intelligence the universe has ever known! ;)

(I'm mostly kidding of course. We can at least imagine an alternate universe populated by humans with squat stumpy necks and preposterously long faces who would, presumably, operationalize a very different range of formants in their speech.

But, given that human vocal signaling coevolved with oral cavity physiology, I'm always skeptical about calling anything like this a coincidence. Synchronically, however, you are absolutely right that this rough pattern is not an inherent feature of the system.)
All times are GMT - 5 Hours
