Voice Identification Experts’ Work Debunked

So the news this morning is that two supposed experts in voice identification have concluded that the screaming voice on the 911 calls is not George Zimmerman. A story on MSNBC reads:

“Tom Owen, forensic consultant for Owen Forensic Services LLC and chair emeritus for the American Board of Recorded Evidence, told the Sentinel that he used voice identification software to rule out Zimmerman.”

In this case, the expert compared Zimmerman’s first call to the police’s non-emergency line with the screams heard on another call made to the 911 line.

Curious about the standards for voice identification, I looked up the American Board of Recorded Evidence’s guidelines for reliable and accurate voice identification. The name of the document is “American Board of Recorded Evidence – Voice Comparison Standards” and the first paragraph of the document reads, in part, “This document specifies the requirements of the American Board of Recorded Evidence for the comparison of recorded voice samples. These standards have been established for all practitioners of the aural/spectrographic method of voice identification…”

This document discusses the importance of duplicating the recording conditions (including microphone, transmission system, acoustic environment, etc.) and speech delivery (including repetition, speech rates, stresses, etc.). Even a casual reading of these guidelines shows that there are already problems with comparing voices captured on different types of phones, recorded on different pieces of equipment, and with the subject under different speaking conditions (casual vs. fighting/stressed).

However, here are the real smoking guns, where the supposed expert’s claims are shot down:

“5.2 Verbatim/Non-verbatim. The known, or another unknown voice sample, must be either wholly verbatim (preferred), or partially verbatim to allow meaningful comparisons with unknown voice samples. A partially verbatim sample should include phrases and sentences containing at least three (3) similar, consecutive matching words…”

When one sample consists only of “Help” or “Help me”, there cannot be a phrase of three consecutive matching words shared between Zimmerman’s non-emergency call and the 911 tapes.

“5.3 Number of Comparable words. There must be at least (10) comparable words between two (2) voice samples to reach a minimal decision criteria. Similarly spoken words within each sample can only be counted once. It is noted that in most voice samples at least some of the words identified at this point will not be useful in the final examinations.”

Again, when one sample consists of one or two words total, this criterion cannot be met.

“5.4.1 Disguise. Samples, or portions of samples, that contain falsetto, true whispering (in contrast to low amplitude speech), or other disguises that obviously change or obscure the vocal formants or other speech characteristics, may need to be eliminated from comparison consideration…”

I think that screaming for one’s life could be considered something that would change or obscure one’s normal speaking patterns.

And of course the final nail in the coffin for the experts’ claims:

“5.4.6 Variations between samples. Though the following variations can quickly end a voice comparison, the problem can often be remedied by obtaining additional known samples:

a. Transmission systems. Normally, samples being compared should be produced through the same type of transmission system, for example, the telephone, a microphone in a room, or a RF transmitter/receiver. If aurally or spectrally the samples are noticeably different due to the dissimilarities in the transmission systems and filtering does not rectify these differences, no further comparisons should be made.

b. Recording systems. Normally, samples being compared should be produced on either good quality, or compatible, recording systems. However, if the recordings contain uncorrectable system differences that affect aural and spectral characteristics, no further comparisons should be made. Examples of recording differences that can affect the results include high-level flutter, gross speed fluctuations, and voice-activated stop/starts.

c. Speech delivery. Normally, samples being compared should have the speakers talking in the same general manner, including speech rate, accent, similar pronunciation, and so on. However, in cases where this has not been done, as in poorly produced known exemplars, no further comparisons should be made.

d. Other. Any other differences between the voice samples that noticeably affect aural and spectral characteristics should be closely reviewed before proceeding with the examination.”

I submit that the differences between Zimmerman’s non-emergency call and the later 911 call make any comparison of the two samples meaningless. I encourage readers to read the guidelines for themselves.

