Course: CS W4735
Title: Visual Interfaces to Computers
Instructor: Professor John R. Kender
Credits: 3 pts.
Hours/week: Lecture 3

Catalog Description:
--------------------
Prerequisites: CS W3139, Data Structures and Algorithms. CS W4701, Artificial Intelligence, and/or CS W4731, Computer Vision, would be helpful but are not required; the course is otherwise self-contained.

An introduction to the use of visual input as data and for control of computer systems. Survey and analysis of the architecture, algorithms, and underlying assumptions of commercial and research systems that: recognize and interpret human hand and body gestures, analyze imagery such as fingerprint or iris patterns for security data, generate natural language descriptions of medical or map imagery, index into a database of pictures to retrieve related images, visually summarize long video sequences such as news reports or dramas, steer automobiles automatically, recover CAD/CAM models by inspecting physical examples, monitor large outdoor areas for types of activity, and perform other tasks. Exploration of foundations in human psychophysics, cognitive science, and artificial intelligence.

Course Rationale:
-----------------
As the processing, storage, and communication capacity of computers increases, it becomes possible to use imagery intelligently as a direct form of input. Images can be used for far more than simply transmitting pictures; they can be used to query, update, and govern systems. This course explores the algorithms and systems, largely experimental, that understand and manipulate visual data directly, in ways that extend the power of keyboard, mouse, or even speech input. This modality is the complementary opposite of computer graphics; it promises to make the camera as essential a part of a workstation environment as the display screen. This course is therefore a complement to CS W4170, User Interface Design, which concentrates on visual output.
This course is of necessity research-oriented, since there are currently relatively few commercial systems that use cameras for more than simple image capture and transmission.

Note on Prior Offering:
-----------------------
This course was first presented as CS W4995, Topics in Computer Science, in Fall 1997. It drew 50 students, including a dozen on the CVN. That offering required only a proposal and a final paper or project. About one-third of the students designed projects, almost always in teams. Some of the more notable projects included a visual burglar alarm (demonstrated in my own office!), a system that detected class-change times from a video of campus pedestrian traffic, and two separate systems using "visual passwords" for computer logins. The remaining students wrote 30-page research papers, some including an analysis of current commercial systems, and a few proposing and defending designs for new visual interface application areas.

Course Materials:
-----------------
Readings: There is as yet no single text that gathers and discusses such interfaces. Course materials will consist mainly of reprints of research articles. Some examples of working systems are available for exploration on the Web, on the sites of the authors of the articles.

Work:
Homework 1 (20%): A theoretical and programming exploration of spatial description minimization and inference, given abstract visual data.
Homework 2 (20%): Either a programming exercise to match fingerprints and iris patterns from a prestored database, or a programming exercise to detect gross motion, using the cameras attached to the workstations in the "CLIC" (the Computer Science Department's student research lab).
Paper/project proposal with preliminary research (10%, due shortly after midterm): A five-page proposal for a course paper or project, complete with description, proposed methods, and literature references.
Paper/project (50%): At finals, either a 20-page research paper surveying some aspect of visual interfaces, or a demonstrable working project documented with a 10-page write-up.

Syllabus: A partial list of publications follows.

---On visual gesture interpretation:
Frederik Kjeldsen, "Visual Interpretation of Hand Gestures as a Practical Interface Modality", Ph.D. Thesis, Columbia University, 1997.
Andrew Wilson, Aaron Bobick, and Justine Cassell, "Temporal Classification of Natural Gesture and Application to Video Coding", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1997.
Mark Torrance, "Advances in Human-Computer Interaction: The Intelligent Room", CHI'95 Research Symposium, 1995.
Robb Lovell and John Mitchell, "Using Human Movement to Control Activities in Theatrical Environments", Technical Report, Institute for Studies in the Arts, Arizona State University, 1995.
Axel Mulder, "Human Movement Tracking Technology", Technical Report 94-1, Simon Fraser University School of Kinesiology, July 1994.

---On visual information retrieval:
Amarnath Gupta and Ramesh Jain, "Visual Information Retrieval", Communications of the ACM, Vol. 40, No. 5, May 1997.
Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jonathan Ashley, Quan Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele, and Peter Yanker, "Query by Image and Video Content: The QBIC System", IEEE Computer, September 1995.
John Smith and Shih-Fu Chang, "VisualSEEk: A Fully Automated Content-Based Image Query System", ACM Multimedia 96.

---On interpretation of visual imagery into natural language:
Alicia Abella, "From Imagery to Salience: Locative Expressions in Context", Ph.D. Thesis, Columbia University, 1995.

---On visual guidance of vehicles:
Charles Thorpe, Martial Hebert, Takeo Kanade, and Steven Shafer, "Vision and Navigation for the Carnegie-Mellon Navlab", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10, No. 3, May 1988.
Dean Pomerleau, "RALPH: Rapidly Adapting Lateral Position Handler", Technical Report, Carnegie-Mellon University, 1995.

---On visual methods for biometrics:
Richard Wildes, "Iris Recognition: An Emerging Biometric Technology", Proceedings of the IEEE, Vol. 85, No. 9, September 1997.
Anil Jain, Lin Hong, Sharath Pankanti, and Ruud Bolle, "An Identity-Authentication System Using Fingerprints", Proceedings of the IEEE, Vol. 85, No. 9, September 1997.