We can make a great many different sounds with our voices, and we are still much better than machines at understanding not only what is said but also how it is said. The voice has a wide range of behaviours, for instance from soft to loud and from low notes to high notes. Furthermore, individuals have vocal folds and vocal tracts that differ as much as our faces do. These differences also leave imprints on the sound and on other signals that we can obtain from a voice.

The large variability makes the voice a rich and wondrous channel of communication. Unfortunately, when measuring physical metrics from vocal signals and images, this also means that it is generally unwise to attach a specific interpretation to any individual metric by itself. In particular, it turns out that every metric of the voice will change when the voice sound level and/or the fundamental frequency changes. Figure 1 shows an example.

Figure 1. A healthy male amateur choir singer did soft-loud-soft exercises over a limited range of one octave, and five voice metrics were mapped. It can be useful to think of these five as different ‘layers’ of one and the same voice map. The map took about six minutes to record and shows averaged data from hundreds of thousands of phonatory cycles. N.B.: EGG = electroglottography; dEGG = derivative of the EGG waveform; fo = fundamental frequency, measured in hertz; SPL = sound pressure level, measured in decibels (data from Selamtzis & Ternström, 2017).

In the maps above, we see five colour-coded metrics of the EGG and audio signals, all from the same recording of the same person. Even though the task was to sing on an /a/ vowel only, over a limited pitch range, it is clear that all metrics vary considerably in the vertical direction (which represents sound level) and in some respects also horizontally (which represents the fundamental frequency). Therefore, when reporting voice metrics, it is vital also to report the calibrated SPL and the fo. If we ask this person to do the same task again, the map will look very similar, unless something about his voice has changed in the meantime. This means that we can see the effects of an intervention. If we ask another person to do the same task, the map will look much less similar. This means that, in general, the effects of interventions can only be assessed directly within persons.
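To make the idea of a map layer concrete, here is a minimal sketch of how per-cycle measurements might be averaged into cells of a voice map. The cell size of one semitone by one decibel, the function names and the sample values are all assumptions made for illustration; they are not taken from the studies cited here.

```python
import math
from collections import defaultdict

def hz_to_semitone(fo_hz):
    """Convert fo in Hz to a MIDI-style semitone number (A4 = 440 Hz = 69)."""
    return 69 + 12 * math.log2(fo_hz / 440.0)

def build_voice_map(cycles):
    """Average one metric per (semitone, dB) cell.

    cycles: iterable of (fo_hz, spl_db, metric_value), one per phonatory cycle.
    Returns {(semitone, db): (mean_metric, cycle_count)}.
    Assumed cell size: 1 semitone wide, 1 dB high.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for fo_hz, spl_db, value in cycles:
        cell = (round(hz_to_semitone(fo_hz)), math.floor(spl_db))
        sums[cell] += value
        counts[cell] += 1
    return {cell: (sums[cell] / counts[cell], counts[cell]) for cell in sums}

# Hypothetical per-cycle data: (fo in Hz, calibrated SPL in dB, metric value)
cycles = [
    (220.0, 70.2, 0.40),
    (221.0, 70.8, 0.44),
    (220.0, 85.5, 0.52),
    (440.0, 85.1, 0.48),
]
voice_map = build_voice_map(cycles)
for cell in sorted(voice_map):
    mean, n = voice_map[cell]
    print(cell, round(mean, 3), n)
```

Each additional metric would be accumulated in the same way, giving one layer per metric over the same grid of cells. Keeping the cycle count per cell makes it possible to ignore cells that were visited too briefly to give a stable average.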

One special case of voice maps has been around for a very long time, namely, the voice range profile (VRP; the phonetogram in older literature) (Ternström, Pabon & Södersten, 2016). A VRP is a voice map that is acquired in order to determine the widest possible voice range that a person can produce: softest, loudest, lowest, highest. Typically one then assesses only the contour or ‘coastline’ of this ‘reachable region’. There is a large body of literature that is concerned with eliciting and interpreting such contours. The reliable elicitation of extremes requires some training, and a fair amount of time in the clinic. If it is not done well, it can be yet another source of variation.  
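As a rough illustration of what the ‘coastline’ means computationally, the sketch below takes the set of cells that a person has reached and returns the softest and loudest level found at each pitch. The cell representation (semitone number, whole decibels), the function name and the sample cells are assumptions made for this example only; clinical VRP software may define and smooth the contour differently.

```python
def vrp_contour(cells):
    """Return the lower and upper SPL contour of a set of occupied cells.

    cells: iterable of (semitone, db) pairs, one per cell that was reached.
    Returns {semitone: (softest_db, loudest_db)}.
    """
    contour = {}
    for semitone, db in cells:
        lo, hi = contour.get(semitone, (db, db))
        contour[semitone] = (min(lo, db), max(hi, db))
    return contour

# Hypothetical occupied cells for two adjacent semitones
occupied = [(57, 60), (57, 61), (57, 90), (58, 65), (58, 88)]
contour = vrp_contour(occupied)
for semitone in sorted(contour):
    print(semitone, contour[semitone])
```

The lowest and highest semitones present in the result give the pitch extremes, so the four limits named above (softest, loudest, lowest, highest) can all be read from this one structure.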

There are many different things that can go wrong with the voice, and, while different pathologies may do different things to the signals, it is quite hard to tease apart the causes and the effects. The large variability and the interdependence between metrics imply, for instance, that we cannot expect to find any single metric whose value will discriminate accurately between normal and pathological voice. If we do not account for the effects of SPL and fo, then the data will contain a great deal of irrelevant variation. In other words, we cannot hope simply to make measurements such as the spectrum balance, or the EGG Contact Quotient, or the Cepstral Peak Prominence Smoothed (CPPS), on a few sustained vowels at ‘comfortable’ (i.e., unspecified) loudness and pitch, and then issue a verdict on whether or not that voice is healthy. That would be like assessing a photograph of a person’s face from only a few scattered pixels. This explains why decades of research have found that the evidential value of single metrics remains weak, especially when they are compared across individuals and when SPL and fo are not accounted for. This is often not because the metrics are unsuitable, but rather because we have not been collecting and collating them in an appropriate way. Voice mapping is a method by which the SPL and fo are continuously accounted for during vocal tasks; much more data is collected for every individual, and the co-variation of different metrics can be made explicit.

Effects of interventions can be visualized by making maps before and after the intervention, and then constructing a new map that shows the differences, as in Figure 2, which shows the spectrum balance (SB) of the microphone signal, in decibels on a colour scale. A trained male singer did a soft-loud-soft exercise first normally, then while phonating into a flow-ball, and then immediately again without the flow-ball (Lã & Ternström, 2020). It is not possible to make a relevant map during a flow-ball task, because the flow-ball changes the SPL dramatically.

Figure 2. Maps of the spectrum balance pre-intervention, post-intervention and of the difference, to facilitate a comparison across the intervention (data from Lã & Ternström, 2020).  

The pre- and post- maps look quite similar, but the map at the right shows the differences after the intervention. Green means an increase and red means a decrease, in this case of the SB. We see that SB consistently increased (green: brighter voice sound) in soft voice, below about 80 dB, and more often decreased (red) in loud voice. This effect would have been impossible to demonstrate convincingly with just a few sustained vowels. Work is in progress on validating the methodology for such difference maps.  
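A difference map of this kind can be sketched as a cell-by-cell subtraction, restricted to the cells that were visited both before and after the intervention. The dictionary layout, the function name and the SB values below are invented for illustration and are not data from the study; the methodology actually under validation may well involve further steps, such as requiring a minimum number of cycles per cell.

```python
def difference_map(pre, post):
    """Subtract pre from post, cell by cell.

    pre, post: {(semitone, db): mean_metric} for the two recordings.
    Only cells present in both maps are compared.
    Positive values mean the metric increased after the intervention.
    """
    shared = pre.keys() & post.keys()
    return {cell: post[cell] - pre[cell] for cell in shared}

# Hypothetical spectrum balance values in dB, keyed by (semitone, dB SPL)
pre = {(57, 70): -30.0, (57, 90): -12.0, (58, 70): -28.0}
post = {(57, 70): -27.5, (57, 90): -13.0, (60, 75): -20.0}

diff = difference_map(pre, post)
for cell in sorted(diff):
    change = diff[cell]
    colour = "green" if change > 0 else "red" if change < 0 else "neutral"
    print(cell, change, colour)
```

Restricting the comparison to shared cells matters because a cell that was reached in only one of the recordings says something about the voice range, not about how the metric changed at that pitch and level.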

Voice mapping can also be used as a real-time feedback tool, showing the patient or student how various voice properties are changing in the moment. This is highly appreciated by participants, who often enjoy ‘painting’ with the voice, thereby becoming more aware of how their voice works and what it can do over the course of treatment or training. 

The UNED Voice Lab is an early adopter of voice mapping, and is contributing to the continued development of this promising method, in close collaboration with the leading researchers.  

Text by Sten Ternström and Peter Pabon 

Further reading:

Lã, F.M.B. & Ternström, S. (2020). Flow ball-assisted voice training: immediate effects on vocal fold contacting. Biomedical Signal Processing and Control, 62: 102064.

Pabon, P. (2018). Mapping Individual Voice Quality over the Voice Range: the measurement paradigm of the voice range profile. Doctoral Thesis in Speech and Music Communication, KTH, Stockholm, Sweden. 

Selamtzis, A. & Ternström, S. (2017). Investigation of the relationship between electroglottogram waveform, fundamental frequency and sound pressure level using clustering. Journal of Voice, 31(4): 393-400. 

Ternström, S., Pabon, P. & Södersten, M. (2016). The Voice Range Profile: Its function, applications, pitfalls and potential. Acta Acustica united with Acustica, 102(2): 268-283.