Imagine having access to all of the world’s recorded conversations and the videos that people have posted to YouTube, in addition to chatter collected by random microphones in public places. Then picture the possibility of searching that dataset for clues related to terms that you are interested in, the same way you search Google. You could look up, for example, who was having a conversation right now about plastic explosives, about a particular flight departing from Islamabad, or about Islamic State leader Abu Bakr al-Baghdadi in reference to a particular area of Northern Iraq.
On Nov. 17, the U.S. announced a new challenge called Automatic Speech Recognition in Reverberant Environments, giving it the acronym ASpIRE. The challenge comes from the Office of the Director of National Intelligence, or ODNI, and the Intelligence Advanced Research Projects Agency, or IARPA. It speaks to a major opportunity for intelligence collection in the years ahead: teaching machines to scan the ever-expanding world of recorded speech. To do that, researchers will need to take a decades-old technology, computerized speech recognition, and reinvent it from scratch.
Importantly, the ASpIRE challenge is only the most recent government research program aimed at modernizing speech recognition for intelligence gathering. The so-called Babel program from IARPA, as well as such DARPA programs as RATS (Robust Automatic Transcription of Speech), BOLT (Broad Operational Language Translation) and others have all had similar or related objectives.
To understand what the future of speech recognition looks like, and why it doesn’t yet work the way the intelligence community wants it to, it is first necessary to know what it is. In a 2013 paper titled “What’s Wrong With Speech Recognition,” researcher Nelson Morgan defines it as “the science of recovering words from an acoustic signal meant to convey those words to a human listener.” It’s different from speaker recognition, or matching a voiceprint to a single individual, but the two are related.
A Brief History of Teaching Machines to Listen
The United States military, working with Bell Labs, launched research into computerized speech recognition in World War II, when the military attempted to use spectrograms, or crude voice prints, to identify enemy voices on the radio. In the 1970s, IBM researcher Fred Jelinek and Carnegie Mellon University researcher Jim Baker, founder of Dragon Systems, spearheaded research to apply a statistical methodology called “hidden Markov modeling,” or HMM, to the problem. Their work resulted in a 1982 seminar at the Institute for Defense Analyses in Princeton, New Jersey, which established HMM as the standard method for computerized speech recognition. Various DARPA programs followed.
HMM works like this: Imagine you have a friend who works in an office. When his boss comes in late, your friend is more likely to come in late. This is a so-called Markov chain of events. You can’t observe whether or not your friend’s boss is in the office because it’s information that’s hidden from you. But when you call your friend and he tells you he’s not on time, you can make an inference about the tardiness of your friend’s boss. Applied to speech recognition, the hidden state is the thing actually being said, and the clues are the sounds that commonly occur together.
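The office example can be turned into a small calculation. The Python sketch below is only an illustration of the idea: every probability in it is invented for the example, and the names (START, TRANSITION, EMISSION, infer_boss_state) are hypothetical, not drawn from any speech recognition system. It updates a belief about the hidden state (the boss) from what can be observed (the friend), using the standard forward, or filtering, calculation for a hidden Markov model.

```python
# Toy hidden Markov model for the office example.
# All probabilities are made-up illustrative values (assumptions).

STATES = ("boss_on_time", "boss_late")

# Belief about the boss before hearing anything from the friend.
START = {"boss_on_time": 0.8, "boss_late": 0.2}

# How the hidden state changes from one day to the next.
TRANSITION = {
    "boss_on_time": {"boss_on_time": 0.9, "boss_late": 0.1},
    "boss_late": {"boss_on_time": 0.6, "boss_late": 0.4},
}

# How likely each observation is, given the hidden state.
EMISSION = {
    "boss_on_time": {"friend_on_time": 0.9, "friend_late": 0.1},
    "boss_late": {"friend_on_time": 0.3, "friend_late": 0.7},
}


def infer_boss_state(observations):
    """Return P(boss state on the last day | all observations so far)."""
    belief = dict(START)
    for day, obs in enumerate(observations):
        if day > 0:
            # Predict today's hidden state from yesterday's belief.
            belief = {
                s: sum(belief[p] * TRANSITION[p][s] for p in STATES)
                for s in STATES
            }
        # Weigh each state by how well it explains today's observation.
        weighted = {s: belief[s] * EMISSION[s][obs] for s in STATES}
        total = sum(weighted.values())
        belief = {s: weighted[s] / total for s in STATES}
    return belief


if __name__ == "__main__":
    # One phone call in which the friend says he is late.
    print(infer_boss_state(["friend_late"]))
```

With these invented numbers, a single report that the friend is late raises the estimated chance that the boss is late above the 20 percent starting assumption. A speech recognizer built on the same principle would treat words or sound units as the hidden states and acoustic measurements as the observations, with probabilities learned from data rather than picked by hand.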
Hidden Markov modeling has been the standard methodology for speech recognition for decades. Some noted scholars in the field, like Berkeley’s Nelson Morgan, argue that reliance on it is now holding the field back. After all, while facial recognition has advanced tremendously, enabling programs to detect faces and match them to databases in an ever-wider number of circumstances, speech recognition has not progressed nearly so well.
Once speech data has been rendered as text, it’s effectively been structured. That means it becomes far more workable as a dataset, allowing algorithms to crawl it in the same way the Google Search algorithm crawls the text of the world’s web pages. That small breakthrough doesn’t sound like much, but it could actually revolutionize information gathering for the intelligence community. In theory, once speech from a wider range of environments can be collected and transcribed, any conversation happening within earshot of a networked microphone could become searchable in real time.
Getting data collection devices into more places becomes easier with every iPhone purchase, thanks in part to the Internet of Things. The next wave of interconnected consumer gadgets, like Google’s Moto X superphone and the Apple Watch coming in 2015, represents a broad trend in devices that rely on voice commands and speak to users, as Rachel Feltman points out in a piece for Defense One sister site Quartz. Are the voice commands that you give your future smart watch legally open to intelligence gathering?
The defeat of the USA Freedom Act means that the National Security Agency can continue to collect metadata on cell phone users, which can be used to pinpoint location. Depending on where you are talking to your device, whether in public or in private, a judge may rule you don’t have a reasonable expectation of privacy. But if you’re worried about your device becoming a listening ear for the government, consider that the very air around you could become one, too.
Shhh… The Smart Dust Will Hear You
The intelligence community in the decades ahead will rely on an ever smaller and more capable array of microphones to pick up intel, and some border on the unbelievable. Scientists have actually created a microphone that is just one molecule of dibenzoterrylene (which changes color depending on pitch). Devices that pick up noise or vibrations can be as small as a grain of rice.
Continued advancement in the field of device miniaturization could one day allow for the dispersal of extremely small but capable listening machines, one of the uses of a future technology sometimes called “Smart Dust.”
What is the strategic military advantage presented by ubiquitous, tiny listening machines? In a 2007 paper (PDF) titled Enabling Battlespace Persistent Surveillance: the Form, Function, and Future of Smart Dust, U.S. Air Force Major Scott A. Dickson speculates that future micro-electromechanical systems, or MEMS, will “sense a wide array of information with the processing and communication capabilities to act as independent or networked sensors. Fused together into a network of nanosized particles distributed over the battlefield capable of measuring, collecting, and sending information, Smart Dust will transform persistent surveillance for the warfighter [sic].”
Full article: What Happens When Spies Can Eavesdrop on Any Conversation? (Defense One)