As part of our research at Columbia University, we have developed a system that categorizes images based on their associated text, such as captions and accompanying articles. The system uses a TF*IDF-based methodology and can be trained for particular categories using images that volunteers have manually labelled. It exposes many adjustable parameters, and we have tuned it to categorize images as either indoor or outdoor. Tested on previously unseen images, the system achieved an accuracy of over 82%. (A more current version of our system achieves higher accuracy on the same test set, but the links below apply to our older system.)
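
To illustrate the general idea of TF*IDF-based categorization from associated text, here is a minimal sketch in Python. It is not the system described above: the tokenizer, the plain log(N/df) IDF weighting, the dot-product scoring, the function names (`train`, `categorize`, `accuracy`), and the four toy captions are all assumptions made for illustration only. Each category is represented by the summed TF*IDF weights of its labelled training texts, and a new caption is assigned to the category whose vector it matches best.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_texts):
    """Build IDF weights and one TF*IDF vector per category.

    labelled_texts: list of (text, category) pairs, e.g. captions
    labelled by hand as "indoor" or "outdoor".
    """
    docs = [(Counter(tokenize(text)), category) for text, category in labelled_texts]
    n_docs = len(docs)

    # Document frequency: number of training texts containing each word.
    doc_freq = Counter()
    for term_counts, _ in docs:
        doc_freq.update(term_counts.keys())
    idf = {word: math.log(n_docs / df) for word, df in doc_freq.items()}

    # Term frequency per category: sum the counts of all its training texts.
    category_tf = {}
    for term_counts, category in docs:
        category_tf.setdefault(category, Counter()).update(term_counts)

    category_vectors = {
        category: {word: count * idf[word] for word, count in counts.items()}
        for category, counts in category_tf.items()
    }
    return idf, category_vectors


def categorize(text, idf, category_vectors):
    """Return (best_category, scores) using a dot product of TF*IDF vectors."""
    term_counts = Counter(tokenize(text))
    # Words never seen in training carry no weight and are ignored.
    query = {w: count * idf[w] for w, count in term_counts.items() if w in idf}
    scores = {
        category: sum(weight * vector.get(w, 0.0) for w, weight in query.items())
        for category, vector in category_vectors.items()
    }
    return max(scores, key=scores.get), scores


def accuracy(test_items, idf, category_vectors):
    """Fraction of (text, category) test pairs that are categorized correctly."""
    correct = sum(
        categorize(text, idf, category_vectors)[0] == category
        for text, category in test_items
    )
    return correct / len(test_items)


# Hypothetical toy training captions, invented for this sketch.
TRAINING = [
    ("The president speaks at a press conference in the briefing room", "indoor"),
    ("Delegates vote inside the assembly hall", "indoor"),
    ("Hikers walk along a mountain trail under a clear sky", "outdoor"),
    ("Fans gather in the park outside the stadium", "outdoor"),
]

idf, category_vectors = train(TRAINING)
label, scores = categorize("A family picnic in the park under the sky", idf, category_vectors)
print(label, scores)
```

A real system would need far more training text, and choices such as the IDF formula, score normalization, and which text fields to use (caption versus full article) are exactly the kind of parameters that would have to be tuned on held-out data.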