Creating Network

1. Check if the email is valid
2. Check if "from" is a valid name and extract it
3. Check if the name is already in the network with the first and last name reversed; if so, flip it to match the existing node
4. For each "to" and "cc" recipient:
     1. Check if the recipient is a valid name and extract it
     2. Check if the name is already in the network with the first and last name reversed; if so, flip it to match the existing node
     3. If all checks passed, insert a new edge "from" -> "to" into the network
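
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for these results: the Network class, the dict layout of an email ("from", "to", "cc" keys), and the helpers simple_name and resolve are all hypothetical, and simple_name is a deliberately reduced stand-in for the full name extraction (it accepts only two alphabetic words).

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Hypothetical container: a set of names and a set of directed edges."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)


def simple_name(raw):
    """Reduced stand-in for name extraction: two alphabetic words or None."""
    words = raw.replace(",", " ").split()
    if len(words) != 2 or not all(w.isalpha() for w in words):
        return None
    return " ".join(w.capitalize() for w in words)


def resolve(name, nodes):
    """If only the reversed form of the name is already a node, use that form."""
    parts = name.split()
    parts[0], parts[-1] = parts[-1], parts[0]
    flipped = " ".join(parts)
    if name not in nodes and flipped in nodes:
        return flipped
    return name


def add_email(net, email):
    """Add one email's "from" -> recipient edges to the network."""
    recipients = email.get("to", []) + email.get("cc", [])
    # Step 1 (assumed meaning of "valid"): a sender and at least one recipient.
    if not email.get("from") or not recipients:
        return
    # Steps 2 and 3: extract the sender name and match it to an existing node.
    sender = simple_name(email["from"])
    if sender is None:
        return
    sender = resolve(sender, net.nodes)
    # Step 4: repeat the checks for every "to" and "cc" recipient.
    for raw in recipients:
        recipient = simple_name(raw)
        if recipient is None:
            continue
        recipient = resolve(recipient, net.nodes)
        net.nodes.update((sender, recipient))
        net.edges.add((sender, recipient))


net = Network()
add_email(net, {"from": "Tana Jones", "to": ["Sara Shackleton"]})
add_email(net, {"from": "Jones, Tana",
                "to": ["someone@example.com"],
                "cc": ["Shackleton Sara"]})
print(sorted(net.edges))
```

In the second call the reversed forms "Jones Tana" and "Shackleton Sara" are mapped back onto the existing nodes, so no duplicate nodes or edges are created, and the raw address is skipped because it is not a name.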

Getting rid of "spam" Emails

1. Blacklist
2. If an email is cc'd to more than x people, don't use it in the network
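
A minimal sketch of these two rules, assuming the blacklist is a set of names matched exactly against senders and recipients and that an email is a dict with "from", "to", and "cc" keys. The function name usable_pairs and the blacklist entry "Enron Announcements" are made up for illustration.

```python
def usable_pairs(email, blacklist, x):
    """Return the (sender, recipient) pairs that survive both spam rules."""
    # Rule 2: a mass email (cc'd to more than x people) is dropped entirely.
    if len(email["cc"]) > x:
        return []
    # Rule 1: nothing sent by a blacklisted name is used.
    if email["from"] in blacklist:
        return []
    # Rule 1 again: blacklisted recipients are skipped.
    return [(email["from"], r)
            for r in email["to"] + email["cc"]
            if r not in blacklist]


email = {"from": "Kay Mann",
         "to": ["Mark Taylor"],
         "cc": ["Tana Jones", "Enron Announcements"]}
print(usable_pairs(email, {"Enron Announcements"}, 10))
```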

Extract Name

1. Remove troublesome characters such as: & . ? * ( ) '
2. If it contains digits or @, then it's not a name
3. Remove titles such as: Mr Mrs Dr and suffixes such as: Jr Sr
4. If more than 3 words, then it's not a name
5. Convert all names to the same format (note case)
   Valid formats (size is 2 or 3 words):
     a. LAST, FIRST
     b. FIRST LAST
     c. LAST, FIRST MIDDLE|INITIAL.
     d. FIRST MIDDLE|INITIAL. LAST
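
The extraction rules above might look something like the sketch below. The notes do not fix the target format, so the sketch assumes one: "First Last" with only the initial letter of each word capitalized and any middle name or initial dropped. The function name extract_name and the constants are hypothetical.

```python
TROUBLING = "&.?*()'"                        # characters removed in step 1
TITLES = {"mr", "mrs", "dr", "jr", "sr"}     # titles and suffixes from step 3


def _words(text):
    """Split text into words, dropping titles and suffixes (step 3)."""
    return [w for w in text.split() if w.lower() not in TITLES]


def extract_name(raw):
    """Return "First Last", or None if raw does not look like a name."""
    # Step 1: remove troublesome characters.
    cleaned = raw.translate(str.maketrans("", "", TROUBLING))
    # Step 2: digits or @ mean this is not a name.
    if "@" in cleaned or any(ch.isdigit() for ch in cleaned):
        return None
    # A comma separates LAST from FIRST [MIDDLE] in formats a and c.
    before, comma, after = cleaned.partition(",")
    head = _words(before)
    tail = _words(after.replace(",", " "))
    # Step 4: only 2 or 3 words are accepted.
    if not 2 <= len(head) + len(tail) <= 3:
        return None
    # Step 5: convert every valid format to the assumed "First Last" form.
    if comma:
        if len(head) != 1 or not tail:
            return None
        first, last = tail[0], head[0]
    else:
        first, last = head[0], head[-1]
    return f"{first.capitalize()} {last.capitalize()}"


for raw in ["MCCONNELL, MIKE", "Mr. John Q. Public Jr.",
            "tana.jones@enron.com", "Janet De La Paz"]:
    print(repr(raw), "->", extract_name(raw))
```

The last input is rejected by the 3-word limit, which is the same limitation reported under Name Problems.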


Analysis

10-24-00, 11-29-00, and 12-20-00 (Skilling named as CEO) Network
Date      Blacklist  Mass Email Removal at (x)  # Nodes  # Edges
10/24/00  Yes        10                         436      640
10/24/00  No         10                         441      1090
10/24/00  Yes        20                         541      1379
11/29/00  Yes        5                          397      451
11/29/00  Yes        10                         567      728
11/29/00  No         10                         572      733
11/29/00  Yes        20                         770      1154
11/29/00  No         20                         775      1158
12/20/00  Yes        10                         2080     6016
12/20/00  No         10                         2099     6043
12/20/00  No         20                         2675     8719

Note: Using a blacklist can remove more than just the blacklisted names, since some people may converse only with blacklisted names and so drop out of the network as well.
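
A small sketch of why this happens, assuming the network is stored as a set of directed (sender, recipient) edges and that a node exists only if it appears in at least one edge. The function name and the one-letter names are placeholders.

```python
def remove_blacklisted(edges, blacklist):
    """Drop every edge touching a blacklisted name; return (nodes, edges)."""
    kept = {(s, r) for s, r in edges
            if s not in blacklist and r not in blacklist}
    # Nodes are whatever still appears in a surviving edge.
    nodes = {name for edge in kept for name in edge}
    return nodes, kept


edges = {("List", "A"), ("List", "B"), ("A", "C")}
nodes, kept = remove_blacklisted(edges, {"List"})
print(sorted(nodes))
```

Here "B" only ever received mail from the blacklisted "List", so it disappears together with it even though "B" itself was never blacklisted.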

Name Problems:
1. Some nodes should have been listed in the blacklist but were not
2. Some names are more than 3 words. Example: Janet De La Paz
In the 12-20-00 (blacklist and x=10) network, 26 of 2080 nodes had errors, a mere 1.25%. This can be reduced even further by adding more names to the blacklist.

Top 10 - 12-20-00 (blacklist and x=10)

#    Person            Degree
1    Jeff Dasovich     145
2    Vince Kaminski    121
3    Sara Shackleton   120
4    Steven Kean       119
5    Tana Jones        112
6    Kay Mann          93
7    Jeffrey Shankman  90
8    Mark Taylor       86
9    Chris Germany     85
10   David Delainey    83
...
35   Jeff Skilling     43

Top 10 - 11-29-00 (blacklist and x=10)

#    Person            Degree
1    Tana Jones        32
2    Sara Shackleton   27
3    Jeff Dasovich     24
4    Vince Kaminski    23
5    Susan Scott       20
6    Chris Germany     19
7    Kate Symes        19
8    Mike Mcconnell    15
9    Jeffrey Shankman  13
10   Karen Denne       13
...
133  Jeff Skilling     3
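
Rankings like the two above can be produced from an edge list along these lines. The notes do not say how degree was counted, so this sketch assumes it is the number of distinct directed edges a person appears in (in-degree plus out-degree); top_by_degree is a hypothetical name.

```python
from collections import Counter


def top_by_degree(edges, k=10):
    """Return the k (name, degree) pairs with the highest degree."""
    degree = Counter()
    # set() collapses repeated emails between the same ordered pair.
    for sender, recipient in set(edges):
        degree[sender] += 1
        degree[recipient] += 1
    # Sort by degree (descending), then by name so ties are deterministic.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:k]


edges = [("A", "B"), ("A", "C"), ("B", "A"), ("A", "B")]
print(top_by_degree(edges, k=2))
```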

Networks

11-29-00 Network (x=10)
12-20-00 Network (x=10)
11-29-00 Strongly Connected Component Subnet