This course provides an introduction to computer vision, including fundamentals of image formation; camera imaging geometry; feature detection and matching; multiview geometry, including stereo, motion estimation, and tracking; and classification. We’ll develop basic methods for applications that include finding known models in images, depth recovery from stereo, camera calibration, image stabilization, automated alignment (e.g., panoramas), tracking, and action recognition.
The focus of the course is to develop the intuitions and mathematics of the methods in lecture, and then to learn about the difference between theory and practice in the problem sets. All algorithms work perfectly in the slides. But remember what Yogi Berra said: “In theory there is no difference between theory and practice. In practice there is.” (Einstein said something similar, but who knows more about real life?)
In general, we will not use image and vision libraries until you first understand (and often code) the basic methods. You should be comfortable writing code that reflects mathematics, coding a variety of data structures, and comparing them to evaluate different hypotheses.
Note: Sample syllabi are provided for informational purposes only. For the most up-to-date information, consult the official course documentation.
Before Taking This Class...
Suggested Background Knowledge
- Students should have a working knowledge of Python for doing mathematical programming. All assignments will be in Python using the OpenCV system as the backend. Those familiar with programming in general should be able to pick it up on their own. However, it should be emphasized that this course is not about learning to program, but about using programming to experiment with computer vision concepts.
- This course has more math than many CS courses: linear algebra, vector calculus, linear algebra, probability, and linear algebra. (Get the hint?)
- No prior knowledge of vision is assumed. Some experience programming with images is helpful. Experience with any image or signal processing is also useful.
If you answer "no" to any of the following questions, it may be beneficial to refresh your knowledge of some material prior to taking CS 6476:
- Do you remember what a Gaussian distribution is and what its parametric form looks like?
- If you had a 2D array of numbers and you wanted to compute the derivative in the x direction, could you do that? How about the magnitude of the “gradient”?
- If you had to draw a line on an image with your own code, could you do that (i.e., no libraries)?
- If you wanted to convert a color image into a monochrome version (grayscale), would you know how to compute it?
- Are you comfortable with code that works in theory but, in practice, gives poorer results than you expect? And do you enjoy fiddling (that’s the technical term) with parameters of your algorithm to get it to work on real images?
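To make the first four questions concrete, here is a minimal sketch of what each one is asking for. It is written in plain Python on nested lists so that nothing is hidden behind a library call. The function names, the central-difference derivative, the use of Bresenham's algorithm for the line, and the Rec. 601 luma weights (0.299, 0.587, 0.114) for grayscale are all illustrative assumptions and not part of the course materials; other reasonable choices exist for each.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """1D Gaussian density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / norm


def gradient_x(img):
    """Central difference along x (columns). Border columns are left at 0."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(1, w - 1):
            gx[r][c] = (img[r][c + 1] - img[r][c - 1]) / 2.0
    return gx


def gradient_y(img):
    """Central difference along y (rows). Border rows are left at 0."""
    h, w = len(img), len(img[0])
    gy = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(w):
            gy[r][c] = (img[r + 1][c] - img[r - 1][c]) / 2.0
    return gy


def gradient_magnitude(img):
    """Per-pixel sqrt(gx^2 + gy^2)."""
    gx, gy = gradient_x(img), gradient_y(img)
    h, w = len(img), len(img[0])
    return [[math.hypot(gx[r][c], gy[r][c]) for c in range(w)] for r in range(h)]


def draw_line(img, x0, y0, x1, y1, value=255):
    """Bresenham's integer line algorithm. Modifies img in place.

    Coordinates are (x, y) = (column, row); pixels outside the image are skipped.
    """
    h, w = len(img), len(img[0])
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        if 0 <= y0 < h and 0 <= x0 < w:
            img[y0][x0] = value
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return img


def to_gray(rgb_img):
    """Weighted sum of (R, G, B) pixels using Rec. 601 luma weights (an assumption)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_img]
```

For example, `gradient_magnitude` applied to a horizontal ramp such as `[[0, 1, 2, 3]] * 3` gives 1.0 at interior pixels, and `draw_line(canvas, 0, 0, 3, 3)` on a 4x4 canvas of zeros sets the main diagonal. Note that `to_gray` assumes pixels are stored in (R, G, B) order; OpenCV's `cv2.imread` returns channels in BGR order, so the weights would need to be applied accordingly.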
Technical Requirements and Software
- At least 32GB of available disk space and ability to install additional (free) software. For course projects, you will need to install the Oracle VirtualBox VM and run a Linux virtual machine that contains the setup for the project. Although it is possible to install the software for the projects natively in Linux, such a setup will not be supported.
- Browser and connection speed: An up-to-date version of Chrome or Firefox is strongly recommended. We also support Internet Explorer 9 and the desktop versions of Internet Explorer 10 and above (not the metro versions). 2+ Mbps is recommended; the minimum requirement is 0.768 Mbps download speed.
- Operating system:
  - PC: Windows XP or higher with latest updates installed
  - Mac: OS X 10.6 or higher with latest updates installed
  - Linux: any recent distribution that has the supported browsers installed
All Georgia Tech students are expected to uphold the Georgia Tech Academic Honor Code. This course may impose additional academic integrity stipulations; consult the official course documentation for more information.