How the NSA Converts Spoken Words Into Searchable Text
Most people realize that emails and other digital communications they once considered private can now become part of their permanent record.
But even as they increasingly use apps that understand what they say, most people don’t realize that the words they speak are not so private anymore, either.
Top-secret documents from the archive of former NSA contractor Edward Snowden show the National Security Agency can now automatically recognize the content within phone calls by creating rough transcripts and phonetic representations that can be easily searched and stored.
The documents show NSA analysts celebrating the development of what they called “Google for Voice” nearly a decade ago.
Though perfect transcription of natural conversation apparently remains the Intelligence Community’s “holy grail,” the Snowden documents describe extensive use of keyword searching as well as computer programs designed to analyze and “extract” the content of voice conversations, and even use sophisticated algorithms to flag conversations of interest.
The documents include vivid examples of the use of speech recognition in war zones like Iraq and Afghanistan, as well as in Latin America. But they leave unclear exactly how widely the spy agency uses this ability, particularly in programs that pick up considerable amounts of conversations that include people who live in or are citizens of the United States.
Spying on international telephone calls has always been a staple of NSA surveillance, but the requirement that an actual person do the listening meant it was effectively limited to a tiny percentage of the total traffic. By leveraging advances in automated speech recognition, the NSA has entered the era of bulk listening.
And this has happened with no apparent public oversight, hearings or legislative action. Congress hasn’t shown signs of even knowing that it’s going on.
The USA Freedom Act — the surveillance reform bill that Congress is currently debating — doesn’t address the topic at all. The bill would end an NSA program that does not collect voice content: the government’s bulk collection of domestic calling data, showing who called who and for how long.
Even if becomes law, the bill would leave in place a multitude of mechanisms exposed by Snowden that scoop up vast amounts of innocent people’s text and voice communications in the U.S. and across the globe.
Civil liberty experts contacted by The Intercept said the NSA’s speech-to-text capabilities are a disturbing example of the privacy invasions that are becoming possible as our analog world transitions to a digital one.
“I think people don’t understand that the economics of surveillance have totally changed,” Jennifer Granick, civil liberties director at the Stanford Center for Internet and Society, told The Intercept.
Extensive Use Abroad
Voice communications can be collected by the NSA whether they are being sent by regular phone lines, over cellular networks, or through voice-over-internet services. Previously released documents from the Snowden archive describe enormous efforts by the NSA during the last decade to get access to voice-over-internet content like Skype calls, for instance. And other documents in the archive chronicle the agency’s adjustment to the fact that an increasingly large percentage of conversations, even those that start as landline or mobile calls, end up as digitized packets flying through the same fiber-optic cables that the NSA taps so effectively for other data and voice communications.
The Snowden archive, as searched and analyzed by The Intercept, documents extensive use of speech-to-text by the NSA to search through international voice intercepts — particularly in Iraq and Afghanistan, as well as Mexico and Latin America.
For example, speech-to-text was a key but previously unheralded element of the sophisticated analytical program known as the Real Time Regional Gateway (RTRG), which started in 2005 when newly appointed NSA chief Keith B. Alexander, according to the Washington Post, “wanted everything: Every Iraqi text message, phone call and e-mail that could be vacuumed up by the agency’s powerful computers.”
The Real Time Regional Gateway was credited with playing a role in “breaking up Iraqi insurgent networks and significantly reducing the monthly death toll from improvised explosive devices.” The indexing and searching of “voice cuts” was deployed to Iraq in 2006. By 2008, RTRG was operational in Afghanistan as well.
A slide from a June 2006 NSA powerpoint presentation described the role of VoiceRT:
Extent of Domestic Use Remains Unknown
What’s less clear from the archive is how extensively this capability is used to transcribe or otherwise index and search voice conversations that primarily involve what the NSA terms “U.S. persons.”
The NSA did not answer a series of detailed questions about automated speech recognition, even though an NSA “classification guide” that is part of the Snowden archive explicitly states that “The fact that NSA/CSS has created HLT models” for speech-to-text processing as well as gender, language and voice recognition, is “UNCLASSIFIED.”
Also unclassified: The fact that the processing can sort and prioritize audio files for human linguists, and that the statistical models are regularly being improved and updated based on actual intercepts. By contrast, because they’ve been tuned using actual intercepts, the specific parameters of the systems are highly classified.
“The National Security Agency employs a variety of technologies in the course of its authorized foreign-intelligence mission,” spokesperson Vanee’ Vines wrote in an email to The Intercept. “These capabilities, operated by NSA’s dedicated professionals and overseen by multiple internal and external authorities, help to deter threats from international terrorists, human traffickers, cyber criminals, and others who seek to harm our citizens and allies.”
Full article: The Computers are Listening (The//Intercept)