PubFig Download v1.0: Public Figures Face Database

This page describes version 1.0 of the dataset and is kept for reference only. Please use the current version.

Development Set (60 people)

For algorithm development

Please use this dataset when developing your algorithm, to avoid overfitting on the evaluation set. You can create any type and number of training pairs from this dataset.

  • dev_people.txt: This contains a list of the 60 people in the development set. The first line is a comment (starts with '#') identifying the file. Each subsequent line contains one person's name. There is NO overlap between this list and the people in the LFW dataset.
  • dev_urls.txt: This contains URLs for all 16,597 images of the 60 people in the development set. (Because of copyright issues, we cannot distribute the images themselves.) The first line in the file is a comment (starts with '#') identifying the file. Each subsequent line is for one image, and contains 4 elements, separated by tabs ('\t'):
    • the person name,
    • the image number for that person,
    • the URL,
    • the face rectangle around the chosen person, as x0,y0,x1,y1 coordinates (x- and y-locations of the top-left and bottom-right corners of the face). Note that we only give the rectangle for the chosen person, even if there are multiple faces in the image.
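The URL files are straightforward to parse. The sketch below reads one of them into `(name, image number, URL, face rectangle)` tuples, skipping the header comment; the function name and the assumption that fields never contain embedded tabs are mine, not part of the dataset specification.

```python
def parse_url_file(path):
    """Yield (name, number, url, (x0, y0, x1, y1)) for each image entry
    in a dev_urls.txt / eval_urls.txt style file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue  # skip the header comment and any blank lines
            # Four tab-separated fields, per the format described above.
            name, number, url, rect = line.rstrip("\n").split("\t")
            x0, y0, x1, y1 = (int(v) for v in rect.split(","))
            yield name, int(number), url, (x0, y0, x1, y1)
```

Since names may contain spaces, splitting on tabs (not whitespace) is essential.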

Evaluation Set (140 people)

ONLY for final performance evaluation

Please use this dataset ONLY when evaluating your algorithm, in preparation for submitting/publishing results. This is to prevent overfitting to the data and obtaining unrealistic results.

  • eval_people.txt: This contains a list of the 140 people in the evaluation set. The first line is a comment (starts with '#') identifying the file. Each subsequent line contains one person's name.
  • eval_urls.txt: This contains URLs for all 42,879 images of the 140 people in the evaluation set. (Because of copyright issues, we cannot distribute the images themselves.) The first line in the file is a comment (starts with '#') identifying the file. Each subsequent line is for one image, and contains 4 elements, separated by tabs ('\t'):
    • the person name,
    • the image number for that person,
    • the URL,
    • the face rectangle around the chosen person, as x0,y0,x1,y1 coordinates (x- and y-locations of the top-left and bottom-right corners of the face). Note that we only give the rectangle for the chosen person, even if there are multiple faces in the image.
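When cropping downloaded images, it is common to expand the given face rectangle slightly so that context around the face is preserved, then clamp the result to the image bounds. This helper is an illustrative sketch; the 25% padding factor is an arbitrary choice of mine, not something the dataset prescribes.

```python
def padded_crop_box(rect, img_w, img_h, pad=0.25):
    """Expand a face rectangle (x0, y0, x1, y1) by `pad` of its width/height
    on each side, clamped to the image dimensions. `pad` is illustrative."""
    x0, y0, x1, y1 = rect
    dx = int((x1 - x0) * pad)
    dy = int((y1 - y0) * pad)
    return (max(0, x0 - dx), max(0, y0 - dy),
            min(img_w, x1 + dx), min(img_h, y1 + dy))
```

The returned tuple can be passed directly to an image library's crop call (e.g., Pillow's `Image.crop`).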
  • Cross-validation sets for benchmarking results:
    • pubfig_full.txt: Full verification benchmark of 20,000 image pairs.
    • pubfig_posefront.txt: The "easy" pose benchmark. This consists of a subset of the full benchmark, in which both faces in each pair are in a roughly frontal pose (less than 10 degrees of yaw and pitch).
    • pubfig_poseside.txt: The "difficult" pose benchmark. This consists of the remaining pairs of the full benchmark, in which at least one of the faces in each pair is in a non-frontal pose (greater than 10 degrees of yaw or pitch).
    • pubfig_lightfront.txt: The "easy" lighting benchmark. This consists of a subset of the full benchmark, in which both faces in each pair are lit roughly from the front or uniformly (e.g., flash lighting, soft lighting, etc.).
    • pubfig_lightside.txt: The "difficult" lighting benchmark. This consists of the remaining pairs of the full benchmark, in which at least one of the faces in each pair is lit from the side (e.g., harsh lighting, outside lighting, etc.).
    • pubfig_exprneutral.txt: The "easy" expression benchmark. This consists of a subset of the full benchmark, in which both faces in each pair have a roughly neutral expression (not smiling, frowning, etc.).
    • pubfig_exprexpr.txt: The "difficult" expression benchmark. This consists of the remaining pairs of the full benchmark, in which at least one of the faces in each pair has a non-neutral expression (strong smile, frowning, laughing, etc.).
    Each file consists of some subset of the data and is divided into 10 cross-validation sets. The sets are mutually disjoint, both by person and by image. During evaluation, use 9 of the sets for training and the remaining one for testing, then rotate through all 10 sets so that every pair is eventually evaluated. Because the sets are disjoint by identity, your algorithm will never have seen a test person during training. Please do NOT use the filenames or person identities for anything other than reading the images! The format of each file is identical:
    • The 1st line is a comment (starts with '#') identifying the file.
    • The 2nd line lists the number of cross-validation sets in this file. This is currently 10 in all our files. After this follows each cross-validation set.
    • For each cross-validation set, the 1st line contains the number of positive and negative pairs within the set, separated by a tab.
    • This is then followed by the given number of positive examples (pairs of images of the same person), one per line. Each line contains four elements separated by tabs:
      • The first person's name
      • The image number of the first person (as in eval_urls.txt)
      • The second person's name (for positive examples, this is the same as the first)
      • The image number of the second person (as in eval_urls.txt)
      For example, 'Bob Dole 2 Bob Dole 14' means the 2nd and 14th images of Bob Dole.
    • Finally, there are the given number of negative examples, in exactly the same format.
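The benchmark-file layout above can be read with a short sequential parser. This is a sketch under the stated format (one count line per set, then the positive pairs, then the negative pairs); the function name and the returned dict structure are my own choices.

```python
def parse_benchmark(path):
    """Parse a pubfig_*.txt benchmark file into a list of cross-validation
    sets, each a dict with 'positive' and 'negative' lists of
    (name1, num1, name2, num2) pairs."""
    with open(path, encoding="utf-8") as f:
        lines = [l.rstrip("\n") for l in f
                 if not l.startswith("#") and l.strip()]
    n_sets = int(lines[0])          # 2nd line of the file: number of CV sets
    i, sets = 1, []
    for _ in range(n_sets):
        # Per-set header: number of positive and negative pairs, tab-separated.
        n_pos, n_neg = (int(v) for v in lines[i].split("\t"))
        i += 1

        def read_pairs(n):
            nonlocal i
            pairs = []
            for _ in range(n):
                p1, im1, p2, im2 = lines[i].split("\t")
                pairs.append((p1, int(im1), p2, int(im2)))
                i += 1
            return pairs

        sets.append({"positive": read_pairs(n_pos),
                     "negative": read_pairs(n_neg)})
    return sets
```

With the sets loaded this way, the 10-fold rotation described above is just a loop that holds out `sets[k]` for testing and trains on the rest.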
  • pubfig_attributes.txt: A list of attribute values for all the images in the PubFig cross-validation sets, computed using our attribute classifiers. The first two lines are comments (start with '#').
    • The 1st line identifies the file.
    • The 2nd line lists the order of attributes that will follow for each subsequent line.
    • Each following line contains the attribute values for a given image, separated by tabs. The 1st column has the person's name. The 2nd column has the image number. Each additional column then lists attribute values, following the order specified in the 2nd line.
    Positive attribute values indicate the presence of the attribute, while negative ones indicate its absence or negation. The magnitude of the value indicates the degree to which the attribute is present or negated. The magnitudes are simply the distance of a sample from the decision boundary of the given classifier (an SVM with an RBF kernel). Thus, magnitudes cannot be directly compared, even for the same attribute (and certainly not across different attributes).
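A loader for the attribute file might look like the sketch below. It assumes the second comment line lists the columns (person, image number, then one column per attribute), tab-separated after the '#'; the function name and return structure are mine.

```python
def parse_attributes(path):
    """Sketch: load a pubfig_attributes.txt style file into a dict keyed by
    (person, image number), mapping each key to {attribute: value}."""
    attrs, table = None, {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("#"):
                # Assumed: the 2nd comment line names the columns, with the
                # attribute names following the person / image-number columns.
                fields = line.lstrip("#").strip().split("\t")
                if len(fields) > 1:
                    attrs = fields[2:]
                continue
            cols = line.split("\t")
            person, num = cols[0], int(cols[1])
            table[(person, num)] = dict(
                zip(attrs, (float(v) for v in cols[2:])))
    return attrs, table
```

Because the magnitudes are not comparable across samples, downstream code typically uses only the sign, or rescales values per attribute before comparing them.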

The database is made available only for non-commercial use. If you use this dataset, please cite the following paper:

Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, and Shree K. Nayar.
Attribute and Simile Classifiers for Face Verification.
International Conference on Computer Vision (ICCV), 2009.
@InProceedings{attribute_classifiers,
author = {N. Kumar and A. C. Berg and P. N. Belhumeur and S. K. Nayar},
title = {{A}ttribute and {S}imile {C}lassifiers for {F}ace {V}erification},
booktitle = {IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2009}
}