Pity the analysts.
The NSA has just vigorously denied that its new Utah Data Center, intended for storing and processing intelligence data, will be used to spy on US citizens. The center will have a capacity of at least one yottabyte, and will provide employment for 100-200 people. With the most generous assumptions [200 employees, all employed only on reviewing the data, only one yottabyte of data, ten years to collect the yottabyte, 5 GB per movie], each employee would be responsible on average for reviewing 500 billion gigabytes (500 million terabytes), or approximately 23 million years’ worth of Blu-ray quality movies, every year.
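For readers who want to check the arithmetic, here is a rough sketch of the calculation in Python. It uses the assumptions listed in the brackets above, plus two that are not in the text and are labeled as such: decimal SI units (one yottabyte = 10^24 bytes) and a two-hour running time per movie. Under those assumptions the workload comes to about 500 million terabytes (500 billion gigabytes) per employee per year, which is the quantity that lines up with roughly 23 million years of viewing.

```python
# Back-of-the-envelope check of the per-employee review workload.
# ASSUMPTIONS not stated in the post: decimal SI units, and a
# two-hour running time for each 5 GB movie.

GIGABYTE = 10**9      # bytes (decimal)
TERABYTE = 10**12     # bytes (decimal)
YOTTABYTE = 10**24    # bytes (decimal)

HOURS_PER_YEAR = 24 * 365.25


def per_employee_workload(total_bytes=YOTTABYTE, employees=200, years=10,
                          movie_bytes=5 * GIGABYTE, movie_hours=2.0):
    """Return one employee's yearly share of the data, in several units."""
    bytes_per_year = total_bytes // (employees * years)
    movies_per_year = bytes_per_year / movie_bytes
    viewing_years = movies_per_year * movie_hours / HOURS_PER_YEAR
    return {
        "terabytes_per_year": bytes_per_year / TERABYTE,
        "gigabytes_per_year": bytes_per_year / GIGABYTE,
        "movies_per_year": movies_per_year,
        "viewing_years_per_year": viewing_years,
    }


workload = per_employee_workload()
print(f"{workload['terabytes_per_year']:.3g} TB per employee per year")
print(f"{workload['viewing_years_per_year'] / 1e6:.1f} million years of movies per year")
```

Changing `employees` to 100 (the low end of the staffing range) or shortening `years` only makes the per-person share larger, so the figures above are the friendly case.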
This astounding and continually increasing mismatch shows that we are well beyond the point where law enforcement can have humans review any meaningful share of the data in its possession that potentially relates to terrorist threats. Computer processing power doubles every two years, but law enforcement employment is rising at a rate of about 7% every ten years, and nobody’s going to pay for it to double every two years instead. Purely machine-based review inevitably carries with it a far higher probability that important things will be missed, even if we were to suppose that the data were entirely accurate to begin with – which it certainly is not.
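To put rough numbers on that gap, the sketch below compares the two growth rates quoted above over the same period. It treats both as smooth compound growth, which is a simplifying assumption, and the function names are illustrative only.

```python
# Compare the two growth rates from the paragraph above.
# ASSUMPTION: both are modeled as smooth compound growth.

def doubling_growth(years, doubling_period=2.0):
    """Growth factor for something that doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


def percent_growth(years, rate=0.07, period=10.0):
    """Growth factor for something that rises by `rate` every `period` years."""
    return (1 + rate) ** (years / period)


decade_processing = doubling_growth(10)   # doubling every two years
decade_staffing = percent_growth(10)      # 7% every ten years
gap = decade_processing / decade_staffing

print(f"Processing power after ten years: x{decade_processing:.0f}")
print(f"Law enforcement headcount after ten years: x{decade_staffing:.2f}")
print(f"Processing power per employee grows roughly x{gap:.0f} per decade")
```

On these figures, processing power grows about 32-fold in a decade while headcount grows by 7%, so the amount of machine output per human reviewer rises by a factor of roughly 30 every ten years.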
So why is anybody surprised that Tamerlan Tsarnaev, the elder of the Boston Marathon bombing suspects and one of around 750,000 people in the TIDE database, was not stopped at the border? That facial recognition software wasn’t able to flag him as a match for a suspect? That the fusion centers, intended to synthesize data into actionable “suspicious activity reports”, flag things too late for them to be of any use? That the Air Force is panicking a little at not having enough people to process the data provided by our drone fleet?
It’s in this context that we should understand the calls for more surveillance after the Boston Marathon attacks. More cameras, more surveillance drones and more wiretapping, without many more humans to process the data, will make this problem worse, not better. These calls are being driven not by a realistic assessment that surveillance will help prevent the next attack, but by the internal incentives of the players in this market. Neither the drone manufacturers, nor law enforcement, nor elected officials have an interest in being the ones to call a halt. So instead they’re promoting automation – automated drones, automated surveillance, and email-scanning software.
They are missing something very simple. We don’t need a terrorism database with 750,000 names on it. There are not 750,000 people out there who pose any sort of realistic threat to America. If the “terrorism watch list” were limited by law to a thousand records, then law enforcement would have to focus only on the thousand most serious threats. Given the real and likely manpower of the federal government, and the rarity of actual terrorism, that’s more than enough. If law enforcement used the power of the Fourth Amendment, instead of trying to find ways round it, it could focus more on the highest-probability threats.
Yes, they would miss stuff. That’s inevitable under both a tight and a loose system. But a tight system has the added advantages that it protects more people’s liberties, and costs a lot less.
UPDATE: With the help of a New Yorker fact-checker, the figure of “400 billion terabytes” above has been corrected to “500 billion terabytes”.