Gender Identified Enron Corpus (GIEC)
Data: gender_identified_enron_corpus.tgz
Setup instructions as well as documentation for the original database can be found here. As part of this release, three new attributes are added to the entries in the "entities" collection.
- gender, which can have the values "M" (male), "F" (female), and "I" (gender could not be determined).
- inferred_first_name, which lists the first name that was automatically inferred from the corpus (details are described in the reference below).
- affiliation, which can have the values "Core" (one of the 145 Enron employees from whose mailboxes the corpus was built), "NonCore" (an Enron employee who was not one of the 145 core employees), and "NonEnron" (an entity that is not an Enron employee).
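To make the three attributes concrete, here is a minimal sketch of how entries carrying them might be checked and summarized once they have been loaded into Python dictionaries. The sample records, the helper names validate_entity and gender_counts, and the assumption that each entry is a flat dictionary keyed by attribute name are all illustrative and not part of the release. How entries are actually loaded depends on the database setup described in the original documentation.

```python
from collections import Counter

# Allowed values for the new attributes, as listed above.
GENDER_VALUES = {"M", "F", "I"}
AFFILIATION_VALUES = {"Core", "NonCore", "NonEnron"}

# Hypothetical records for illustration only; they are not taken from the corpus.
sample_entities = [
    {"gender": "F", "inferred_first_name": "alice", "affiliation": "Core"},
    {"gender": "M", "inferred_first_name": "bob", "affiliation": "NonCore"},
    {"gender": "I", "inferred_first_name": "pat", "affiliation": "NonEnron"},
    {"gender": "F", "inferred_first_name": "carol", "affiliation": "NonEnron"},
]


def validate_entity(entity):
    """Return True if gender and affiliation hold one of the documented values."""
    return (
        entity.get("gender") in GENDER_VALUES
        and entity.get("affiliation") in AFFILIATION_VALUES
    )


def gender_counts(entities, affiliation=None):
    """Count entities per gender value, optionally restricted to one affiliation."""
    selected = (
        e for e in entities
        if affiliation is None or e.get("affiliation") == affiliation
    )
    return Counter(e.get("gender") for e in selected)


if __name__ == "__main__":
    print(all(validate_entity(e) for e in sample_entities))
    print(gender_counts(sample_entities, affiliation="NonEnron"))
```

Passing affiliation="Core" to gender_counts restricts the tally to the 145 mailbox owners, which is one plausible way to separate them from the other entities in an analysis.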
Please cite: Prabhakaran, Vinodkumar; Reid, Emily; Rambow, Owen. Gender and Power: How Gender and Gender Environment Affect Manifestations of Power. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). October 2014. Doha, Qatar. (pdf) (bib)