Downloading Enron Data
The email dataset was later purchased by Leslie Kaelbling at MIT, and turned out to have a number of integrity problems.
A number of folks at SRI, notably Melinda Gervasio, worked hard to correct these problems, and it is thanks to them, not me, that the dataset is available.
The dataset here does not include attachments, and some messages have been deleted "as part of a redaction effort due to requests from affected employees". Invalid email addresses were converted to something of the form user enron.
I get a number of questions about this corpus each week, most of which I am unable to answer, because they deal with data-preparation issues that I just don't know about. If you ask me a question and I don't answer, please don't feel slighted. I am distributing this dataset as a resource for researchers who are interested in improving current email tools or in understanding how email is currently used.
This data is valuable; to my knowledge, it is the only substantial collection of "real" email that is public. Other datasets are not public because of privacy concerns.
In using this dataset, please be sensitive to the privacy of the people involved and remember that many of these people were certainly not involved in any of the actions which precipitated the investigation.
Prior versions of the dataset are no longer being distributed.
Research uses of the dataset
If you are using the March 2, Version; the August 21, Version; or the April 2, Version of this dataset for your work, please replace it with the newer version of the dataset below, or make the appropriate changes to your local copy.
May 7, Version of dataset about 1.
There are also several online databases that allow you to search the data, at UCB, and www.