The FaceTracer database is a large collection of real-world face images collected from the internet. Each of the 15,000 faces in the database is annotated with a variety of metadata and marked fiducial points. In addition, a large subset of the faces has hand-labeled descriptive attributes, including demographic information such as age and race, facial features such as mustaches and hair color, and other attributes such as expression and environment. There are 5,000 labels in all.
This rich dataset can be used for a variety of applications. Because it is composed of real-world images collected in the wild, it provides a much more representative sample of typical images than other, more controlled datasets: there is large variation in pose, environment, lighting, image quality, and imaging conditions (including cameras). The metadata we offer for each face provides opportunities for comparing and evaluating many common vision tasks, such as face detection, fiducial point detection, and pose angle detection. In addition, the 5,000 manually labeled attributes we provide can be used for training and evaluating different learning tasks and algorithms. Finally, we also provide the URL of the page each face image was found on, which can be used to analyze the webpage and its links to obtain more information about the images. We are excited to see how the research community tackles these and other problems using our dataset.
Due to privacy and copyright issues, we are unable to provide the actual images of each face. Instead, we provide the URL of each image, as well as the URL of the page the image was found on. Simple programs and libraries can be used to download the images (e.g., the command-line tool wget, or the urllib.urlretrieve() function in Python 2, which is urllib.request.urlretrieve() in Python 3). Note that the dataset may slowly shrink over time as links disappear. In rare cases, a different image might be put up at the same location as an image from our dataset, in which case the provided metadata and attribute labels for that face will no longer be correct.
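As a rough sketch of the download step, the following Python 3 snippet fetches each image URL with urllib.request.urlretrieve() and records the face ids whose links no longer resolve. The function name download_faces, the "<face id>.jpg" naming scheme, and the injectable fetch parameter are assumptions made for illustration; they are not part of the dataset or its bundled script.

```python
import os
import urllib.request


def download_faces(rows, out_dir, fetch=urllib.request.urlretrieve):
    """Download images for (face_id, image_url) pairs; return the ids that failed.

    The '<face_id>.jpg' file naming and the `fetch` hook are illustrative
    assumptions, not something the dataset prescribes.
    """
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for face_id, image_url in rows:
        target = os.path.join(out_dir, f"{face_id}.jpg")
        try:
            fetch(image_url, target)
        except (OSError, ValueError):
            # Dead or malformed links are expected as the dataset ages;
            # record the face id and continue with the next one.
            failed.append(face_id)
    return failed
```

Because several faces can come from a single image, the same URL may be fetched more than once with this approach; caching by URL would avoid that. A successful download also does not guarantee that the image is still the one that was originally labeled.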
This database is made available only for non-commercial use.
The database is organized into a variety of text files, which are all easy to parse. The first two lines of each file are comments (they start with '#'): the first identifies the file, while the second describes the format of each subsequent line in the file. Each line generally contains a face id followed by various other information, all separated by tabs ('\t'). The face id uniquely identifies a face, and these ids are the common element linking all the files.
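The shared layout described above (comment lines starting with '#', then tab-separated fields led by a face id) can be read with a small generic helper. This is a minimal sketch in Python 3; the names read_rows and index_by_face_id are hypothetical, and no type conversion is attempted, so every field stays a string.

```python
def read_rows(lines):
    """Yield each data line as a list of tab-separated fields.

    Comment lines (starting with '#') and blank lines are skipped.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        yield line.split("\t")


def index_by_face_id(rows):
    """Group rows by their first field, the face id (kept as a string)."""
    by_id = {}
    for fields in rows:
        # A face id can appear on several lines, so collect a list per id.
        by_id.setdefault(fields[0], []).append(fields[1:])
    return by_id
```

Passing an open file object to read_rows and the result to index_by_face_id gives a dictionary keyed by face id, which is one way to join information across the files.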
In detail, these are the different files:
- faceindex.txt: This contains a list of all faces with image URLs and page URLs. Each line contains the face id, followed by the image URL (where the image is actually located), and then the page URL (the page on which the image was found), all separated by tabs. Note that since many faces can come from a single image, the image and page URLs for multiple face ids can be identical. The face rectangle (defined in the next file) will differentiate these faces.
- facestats.txt: This contains statistics for each face. Each line contains all of the statistics (as integers) for one face, separated by tabs. In order, these are:
  - The face id
  - The width of the face in pixels
  - The height of the face in pixels
  - The x-location of the top-left corner of the face
  - The y-location of the top-left corner of the face
  - The pose angles (in degrees) of the face:
    - The yaw angle (out-of-plane left-right) of the face
    - The pitch angle (up-down) of the face
    - The roll angle (in-plane left-right) of the face
  - The fiducial points of the face (all relative to the top-left of the face rectangle):
    - The x-location of the left corner of the left eye
    - The y-location of the left corner of the left eye
    - The x-location of the right corner of the left eye
    - The y-location of the right corner of the left eye
    - The x-location of the left corner of the right eye
    - The y-location of the left corner of the right eye
    - The x-location of the right corner of the right eye
    - The y-location of the right corner of the right eye
    - The x-location of the left corner of the mouth
    - The y-location of the left corner of the mouth
    - The x-location of the right corner of the mouth
    - The y-location of the right corner of the mouth
- attributes.txt: This contains a list of the different attributes we have labeled, along with the valid labels for each attribute. There are 10 attributes, with a total of 25 different labels. Each line contains first the attribute name (lower case, no spaces), followed by all valid labels for that attribute (each lower case, no spaces). All elements are separated by tabs.
- facelabels.txt: This contains a list of various attribute labels assigned to faces (5,000 in all). These labels were assigned manually by a group of people, and then manually verified by a single person to ensure consistency. Each line contains a single face id, attribute name, and label, separated by tabs. The attribute name and label are the exact same strings found in the "attributes.txt" file. Since many faces have multiple labeled attributes, face ids may be repeated (i.e., if a given face has two attributes labeled, then there will be two lines starting with that face id).
- facetracer.py: A simple Python script that demonstrates how to parse the dataset and display information about a particular face in the dataset.
- facetracer.zip: This contains all of the above files in a zip file, for ease in downloading.
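The facestats.txt and facelabels.txt layouts described above can be read with a few lines of Python 3. The following is a minimal sketch, not the bundled facetracer.py script. The field names in FaceStats, the helper names, and the conversion of face ids to int are assumptions made for illustration; only the field order is taken from the description above. The to_image_coords helper additionally assumes that the face rectangle's top-left corner is given in full-image pixel coordinates.

```python
from collections import namedtuple

# Field order follows the facestats.txt description; the field names
# themselves are illustrative assumptions.
FaceStats = namedtuple("FaceStats", [
    "face_id", "width", "height", "x", "y",
    "yaw", "pitch", "roll",
    "left_eye_left", "left_eye_right",
    "right_eye_left", "right_eye_right",
    "mouth_left", "mouth_right",
])


def parse_facestats_line(line):
    """Parse one data line of facestats.txt (20 tab-separated integers)."""
    v = [int(tok) for tok in line.rstrip("\r\n").split("\t")]
    if len(v) != 20:
        raise ValueError(f"expected 20 fields, got {len(v)}")
    # Fields 8..19 hold six (x, y) fiducial points, each relative to the
    # top-left corner of the face rectangle.
    points = [(v[i], v[i + 1]) for i in range(8, 20, 2)]
    return FaceStats(*v[:8], *points)


def to_image_coords(stats, point):
    """Shift a face-relative fiducial point by the face rectangle's corner."""
    return (stats.x + point[0], stats.y + point[1])


def parse_facelabels(lines):
    """Map face id -> {attribute: label}, skipping '#' comment lines."""
    labels = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        face_id, attribute, label = line.split("\t")
        # A face with several labeled attributes appears on several lines.
        labels.setdefault(int(face_id), {})[attribute] = label
    return labels
```

Keying both structures by face id makes it straightforward to look up the rectangle, pose, and labels for the same face, and to cross-check labels against the valid values listed in attributes.txt.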
Please contact Neeraj Kumar (neeraj [at] cs.columbia.edu) for any questions or comments regarding this database.
If you use this database, please cite the "FaceTracer" paper listed below! The database is made available only for non-commercial use.