The NSA has just vigorously denied that its new Utah Data Center, intended for storing and processing intelligence data, will be used to spy on US citizens. The center will have a capacity of at least one yottabyte, and will provide employment for 100-200 people. With the most generous assumptions [200 employees, all employed only on reviewing the data, only one yottabyte of data, ten years to collect the yottabyte, 5GB per movie], each employee would be responsible on average for reviewing about 500 million terabytes (500 billion gigabytes), or approximately 23 million years’ worth of Blu-ray quality movies, every year.
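For anyone who wants to check the arithmetic, here is a rough sketch of the per-employee calculation. The yottabyte, the ten years, the 200 employees and the 5GB movie come from the assumptions listed above; the decimal (SI) byte units and the two-hour running time per movie are my own assumptions, since the post does not state them.

```python
# Back-of-the-envelope check of the per-employee review load.
# From the post: 1 yottabyte, 10 years, 200 reviewers, 5 GB per movie.
# Assumed here (not in the post): decimal SI units, 2-hour movies.
YOTTABYTE = 10**24
TERABYTE = 10**12
GIGABYTE = 10**9
HOURS_PER_YEAR = 24 * 365


def review_load(total_bytes=YOTTABYTE, years=10, employees=200,
                movie_bytes=5 * GIGABYTE, movie_hours=2.0):
    """Return (terabytes per employee per year, years of nonstop viewing)."""
    bytes_per_employee_year = total_bytes / years / employees
    movies = bytes_per_employee_year / movie_bytes
    viewing_years = movies * movie_hours / HOURS_PER_YEAR
    return bytes_per_employee_year / TERABYTE, viewing_years


terabytes, viewing_years = review_load()
print(f"{terabytes:,.0f} TB per employee per year")
print(f"{viewing_years / 1e6:.1f} million years of continuous viewing")
```

With these inputs the sketch gives roughly 500 million terabytes (5 × 10^20 bytes) per employee per year and about 22.8 million years of round-the-clock viewing, which is where the 23 million figure comes from. Change `movie_hours` or `employees` and the result scales linearly, so no plausible tweak brings it anywhere near a human-sized workload.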
This astounding and continually increasing mismatch shows that we are well beyond the point where law enforcement is able to have a human review a manageable amount of the data in its possession potentially relating to terrorist threats. Computer processing power doubles every two years, but law enforcement employment is rising at a rate of about 7% every ten years, and nobody’s going to pay for it to double every two years instead. Purely machine-based review inevitably carries with it a far higher probability that important things will be missed, even if we were to suppose that the data was entirely accurate to begin with – which it certainly is not.
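To put numbers on that mismatch, here is a minimal sketch that compounds the two growth rates quoted above over the same ten-year window. Using processing power as a stand-in for the volume of data that can be collected is my simplifying assumption, and the function name is purely illustrative.

```python
# Compare the two growth rates quoted in the post over a common horizon.
# Assumption: both rates compound smoothly over time.


def growth_factor(rate_per_period, period_years, horizon_years):
    """Multiplicative growth after horizon_years at the given compound rate."""
    return (1 + rate_per_period) ** (horizon_years / period_years)


horizon = 10  # years
compute = growth_factor(1.0, 2, horizon)    # doubling every two years
staff = growth_factor(0.07, 10, horizon)    # 7% every ten years

print(f"processing power: x{compute:.0f}")
print(f"law enforcement staff: x{staff:.2f}")
print(f"gap after {horizon} years: x{compute / staff:.1f}")
```

Over a single decade the machines grow 32-fold while the workforce grows by 7%, so the data available per human reviewer rises roughly thirtyfold, and the gap keeps compounding after that.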
So why is anybody surprised that Tamerlan Tsarnaev, the elder of the Boston Marathon bombing suspects and one of around 750,000 people in the TIDE database, was not stopped at the border? That facial recognition software wasn’t able to flag him as a match for a suspect? That the fusion centers, intended to synthesize data into actionable “suspicious activity reports”, flag things too late for them to be of any use? That the Air Force is panicking a little at not having enough people to process the data provided by our drone fleet?
It’s in this context, then, that we should understand the calls for more surveillance after the Boston Marathon attacks for what they are. More cameras, more surveillance drones and more wiretapping, without many more humans to process the data, will make this problem worse, not better. These calls are being driven not by a realistic assessment that surveillance will help prevent the next attack, but by the internal incentives of the players in this market. Neither the drone manufacturers, nor law enforcement, nor elected officials, have an interest in being the ones to call a halt. So instead they’re promoting automation – automated drones, automated surveillance, and email scanning software techniques.
They are missing something very simple. We don’t need a terrorism database with 750,000 names on it. There are not 750,000 people out there who pose any sort of realistic threat to America. If the “terrorism watch list” were limited by law to a thousand records, then law enforcement would have to focus only on the thousand most serious threats. Given the real and likely manpower of the federal government, and the rarity of actual terrorism, that’s more than enough. If law enforcement used the power of the Fourth Amendment, instead of trying to find ways round it, it could focus more on the highest-probability threats.
Yes, they would miss stuff. That’s inevitable under both a tight and a loose system. But a tight system has the added advantages that it protects more people’s liberties, and costs a lot less.
UPDATE: With the help of a New Yorker fact-checker, the per-employee data figure above, originally given as “400 billion terabytes”, has been corrected.
3 thoughts on “Drowning in Data, Starved for Wisdom: The surveillance state cannot meaningfully assess terrorism risks”
The 200 or so employees at the Bluffdale, UT NSA data center will not be analysts. The main employment there will be operations monitoring and maintenance of the equipment, so the calculation of effort presented is not only absurd, but false. It is all but certain that the data to be stored there never will be reviewed, for the exact reasons cited. However, it would be entirely possible for NSA analysts to search the data for particular information related to known or suspected terrorists, from any NSA location.
Generally speaking, surveillance databases should not be thought of as containing data useful for crime or terrorist attack prevention. This will be possible only in rare cases, notwithstanding the claims one sees for the benefits of pattern analysis. Such data is likely to be more useful for investigating an actual incident after the fact and assisting in identification of participants. This does not presume or require that someone is watching the surveillance products which, as the article points out quite correctly, is not feasible due to limitations on the available personnel.
Excellent points, Tom. The analysis was intended to be both (a) very rough and (b) as conservative as humanly possible, in order to arrive at the lowest figure possible; and that lowest figure turned out to be 23 million years of viewing per NSA employee per year, even if they all worked 24 hours a day analyzing data.
The key point here is that the proposition that the NSA cannot use mass surveillance to thwart terrorist attacks ahead of time should not be controversial. It’s not even politics. It’s just math. They can’t do it.