Data Machines and Privacy Games (Confessions of an Email Snooper)

The story of Edward Snowden, Booz Allen/NSA/PRISM whistleblower, is a Rorschach test. Everybody sees something different in it. Me, I told you how I felt this weekend (though I wrote that blog post before the identity of Edward Snowden had been revealed). I consider Edward Snowden a hero, in the proud tradition of Daniel Ellsberg and Bradley Manning -- though I'm more interested in the attack on Booz Allen Hamilton (a sycophantic government/military contractor that's been soaking the American taxpayer for years) than I am in the attack on the National Security Agency. I'm glad Snowden revealed the facts of PRISM, and I believe a helpful public dialogue about privacy in the Internet age is beginning to emerge.

I have a unique angle on this topic, because databases are my thing. They've been my thing since the early 1990s, when I became an expert in SQL (Structured Query Language, the most prevalent computer language in the world) data modeling and database application development. My favorite database is MySQL, the powerful open source platform. (This is the database world equivalent of saying that my favorite ice cream is vanilla, since MySQL is certainly the traditional choice for favorite open source database, but I can't help it.) MySQL powers Litkicks, and it probably powers most of the websites you've visited today.

Every 17 years or so, like the coming of the cicadas, the death of SQL will be announced. In the mid-1990s, Object-Oriented Databases were the big fad. The only problem is that they didn't work, and the only company to actually launch one, Illustra, quickly went out of business. Today, the big trends competing with traditional SQL are more substantial, because they address the growing need to handle much larger data sets. These include cloud computing, NoSQL (you can just hear the hostility in the name), search engine platforms, and Hadoop-based massive distribution. This is the type of stuff that powers the NSA's data center at Bluffdale, Utah. Though I still specialize in SQL-based systems, I have a few years' experience with high-capacity non-SQL database servers too.

This experience was gained at a startup called Inference Data Systems, with offices in Lynbrook, Long Island and on William St. in Manhattan. We built massive search engines for corporate litigation defense teams. The idea was to vacuum up corporate documents -- mainly emails, obtained directly from network email servers of corporations suspected of criminal activity -- and provide algorithms to allow legal teams to explore this data. Here's what our app looked like:

Inference Data Systems was a bizarre environment for me to be in, since I tend to have a strong natural dislike for and suspicion of the kinds of large banks and corporations that were our customers. (Indeed, I was working in the William St. office when the Wall Street crashes of 2007 and 2008 went down, and saw my suspicions confirmed.) The best thing I can say about Inference is that it was a good bunch of folks, including some very smart people, and I was very sad when our CEO Nick Croce, a very nice guy, suddenly died of cancer two years ago.

By that time Inference had gone out of business, because we never got profitable and our venture funding ran out. But the data analysis software we built was an absolute marvel. It was based on Autonomy, an advanced search engine platform from a British company that offered Bayesian analysis and innovative visualizations. This is the same type of software that the National Security Agency and the Central Intelligence Agency and the Federal Bureau of Investigation and the Internal Revenue Service all use to search massive repositories of unstructured information like phone call metadata or social network activity.
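For readers curious what searching a pile of unstructured email looks like at toy scale, here is a minimal sketch in Python. It is not Autonomy's API, and it does no Bayesian analysis; it only builds a plain inverted index for keyword search. The function names (`tokenize`, `build_index`, `search`) and the sample emails are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(emails):
    """Map each token to the set of email ids whose body contains it."""
    index = defaultdict(set)
    for email_id, body in emails.items():
        for token in tokenize(body):
            index[token].add(email_id)
    return index


def search(index, query):
    """Return the sorted ids of emails containing every word in the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Start from the first word's matches, then intersect with the rest.
    matches = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        matches &= index.get(token, set())
    return sorted(matches)


# Invented sample data, for illustration only.
emails = {
    1: "Lunch on Friday? The quarterly numbers can wait.",
    2: "Quarterly numbers attached, please review before Friday.",
    3: "Did you watch the game on Monday night?",
}
index = build_index(emails)
print(search(index, "quarterly friday"))
```

A real platform layers relevance ranking, concept matching and visualization on top of something like this, and runs it across many servers, but the basic move of turning a heap of text into a lookup structure is the same.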

At Inference, our mission was to ingest and analyze emails gathered from corporations that were in trouble with the law. Emails are highly valued by both prosecutors and defense lawyers when gathering evidence for corporate trials, since emails often provide smoking guns. We would get data dumps of subpoenaed email servers, and load the entire data dumps onto our Autonomy servers. These were gigantic data repositories -- it's amazing how much email a large corporation's employees can generate -- and they always included a wide variety of personal emails. Private emails. I got to sit at my desk and read a lot of them, and it was a lot of fun.

Yeah, now that I say this, it sounds horrible. But browsing through the private emails of corporate employees (from executives to interns) can be hilarious. I found love affairs. Racist jokes. Intense, lengthy Monday analyses of football games by employees who clearly should have been working. And lots and lots of inter-office backbiting (this is America, after all).

Perhaps I should be ashamed to admit that I enjoyed wallowing in this filth. Well, hell, developing the software to analyze these emails was my job, and I had to keep my job interesting. I'm a naturally curious guy.

The most amazing story I found in one of the data dumps was the private email of a Silicon Alley investor I'll call HB. HB was a well-known Merrill Lynch executive who was indicted and eventually convicted for hyping many Internet stocks he didn't believe in himself. The evidence that he didn't believe them was in his emails, in which he described some of the very stocks he was hyping as "POS" (pieces of shit) or other derogatory terms.

Since I had worked for iVillage, one of the very companies HB had mercilessly plundered for fast profit, and since I needed some test document streams to work with as I developed the web interface, I decided to spend a few days reading HB's entire email stream as found in the Merrill Lynch data dump. Since all his emails had been acquired by subpoena, there was a lot of personal material.

As I dove in, I became immediately impressed by the dramatic value of what I found. During the same years that HB was violating financial laws at Merrill Lynch, he was also carrying on a quiet, intense and very emotional personal encounter with a woman he'd briefly talked to at a party in Europe.

I sat at my desk for several days and did nothing but read these emails. (I could pretend to be testing our web interface as I read, and in a way, I was testing it.)

Some people are very good at conveying personality in their emails, and this European woman was one of them. She was warm and witty, a clever storyteller. I gathered that she lived some kind of fabulous but under-funded urban lifestyle flitting through various European cities, and that she had some kind of nebulous career in the fashion industry. HB and this woman got to know each other well via email, and I got to know and like both of them more and more as I read. The whole thing was really very sweet.

There was never any sign that their attitude towards each other was anything more than friendly, and HB awkwardly mentioned several times during their exchange that he was married. Still, their mutual attraction was palpable. Eventually they made a plan to meet in London, where he would be travelling on business.

The meeting was clearly secretive ... and as I read through the emails leading up to the meeting, the intensity of anticipation made me feel like I was in the grip of a potboiler romance. I could barely wait to get to the next screen.

They each sent a final email before the meeting -- and then the email exchange dropped suddenly dead. Naturally, this was because they were at this moment meeting in Europe, but the first few days of silence were followed by weeks of silence, even as HB resumed his work in New York City and resumed his emails with other people. There was no sign of his European friend, and I wondered if perhaps they were communicating on a different channel.

Eventually, she returned, but something had changed. They both seemed meek, tentative. I sensed that the meeting had been perhaps tragic in some way. I imagined sordid scenes reminiscent of Dostoevsky or T. S. Eliot. Maybe the meeting had been a terrible failure.

Or maybe something else was in the air that I couldn't detect, or maybe I was reading the entire thing wrong. Anyway, at this point, the novel was over. I wish I had printed it out, so I could read it again today.

I am admitting that I took this not-very-admirable action of reading HB's private emails because it may help to explain why I seem to have a slightly different perspective on the Internet privacy issue today than many of my friends. To me, the fact that the government can subpoena your private emails and then release them all to the public -- not only if you did something illegal, but if anybody you emailed with did something illegal -- is absolutely shocking. It's inexplicable. I wonder how many lives these subpoenas have already ruined.

I also wonder how bitter HB is today about the embarrassment to his private life caused by the exposure of his entire email archive during this legal battle, and I even feel sorry to be gossiping about him right now. (Well ... maybe this is valid punishment for his crimes in 1999 and 2000 as a Silicon Alley mountebank.)

Yes, I find the fact that the NSA is today gathering Internet traces and phone metadata also disturbing, but not even as disturbing as the fact that entire network email servers can be subpoenaed and carelessly released by federal prosecutors. I think most people today are unaware of the power of federal prosecutors to broadly invade email privacy on the suspicion of any involvement in any crime.

I heartily approve of Edward Snowden's decision to abandon his comfortable life to stage a public protest of the US federal government's invasion of privacy -- not because I think this particular invasion of privacy is unique, but because I support any such act of peaceful protest. Edward, bravo.

Perhaps the ultimate irony is this: Edward Snowden's great crime has been to invade the National Security Agency's and Booz Allen Hamilton's privacy. And now we're all invading his.

9 Responses to "Data Machines and Privacy Games (Confessions of an Email Snooper)"

by ds on

Wow.

So just to clarify, you could read any employee@inference.com emails, but not that employee's gmail if he was logging into gmail on work servers?

I agree about the importance of whistleblowers and I hope Snowden is not prosecuted.

As for your story, wow!

by Levi Asher on

Hi ds -- glad you asked to clarify -- no, that is definitely NOT the situation I'm describing. I'm going to edit the article to make sure it's more clear.

I was not able to snoop on my fellow employees' email (and if I had been able to, I wouldn't have done so). Rather, the company I worked for was in the business of managing and analyzing email archives provided by companies that were being prosecuted for crimes. The government forced the release of these email archives, and copies were given to us so that we could run our analytic software on the emails for the purpose of detecting patterns, running keyword searches, etc. In the case I'm describing, Merrill Lynch was indicted for a crime, and the emails of several Merrill Lynch employees were turned over to the government. In the case of Merrill Lynch, the government released the entire email archive to the public, so theoretically anybody could have read (and could still read) these emails, if they could find a way to obtain the archive from some government office and find software with which to view it. Since my company built this software, I was able to do this easily as part of my work.

Hope that clears it up! I will try to clear up the article explanation now too.

by TKG on

This was a really good read.

It was interesting and compelling.

Here is some advice from my heart and mind:

Levi Asher: Interesting

Politics: Boring

And, btw, the article was very clear as written.

by Levi Asher on

Thanks, TKG. I'll take that under advisement ...

by ds on

Ok, I understand better now, thanks. I know it's dumb to put anything personal in mywork.com emails. But I think using gmail at work is private, i.e. work doesn't log your gmail password and cannot read those emails? Anyway it's a great article.

by Levi Asher on

Hi DS -- yes, I agree that if you use gmail or any other private email at work, you are much less likely to ever have those emails exposed or seen by others. I can't think of any reason why it wouldn't be a better idea to use a private email account than a work email account, in terms of data ownership and privacy.

All I know is, most Republicans thought surveillance was okay until a Democrat got elected.

by Subject Sigma on

Wow... great blog post! Bravo Levi!
