CSE 6242 Data and Visual Analytics


Guy Lebanon
Prof. Duen Horng (Polo) Chau


This course will introduce you to broad classes of techniques and tools for analyzing and visualizing data at scale. It emphasizes how computation and visualization can complement each other to perform effective analysis. We will cover methods from each side, as well as hybrid methods that combine the best of both worlds. Students will work in small teams to complete a project exploring novel approaches for interactive data & visual analytics.

Prerequisites and Course Requirements

If you are a Georgia Tech Analytics (OMS or campus) student, you should first take CSE 6040 and do very well in it; if necessary, please also first take CS 1301.

You are expected to quickly learn many things simultaneously, and you will need to learn some materials on your own (e.g., Linux commands for working with MS Azure/Amazon AWS). This can be very intimidating for many students.

The amount of time students spend on this class varies greatly, depending on their backgrounds and what they already know. Some former students told us they spent about 40-60 hours on each homework assignment (we have 4 big assignments, and no exams), and some reported much less. For example, for the homework assignment about D3 visualization programming, students who are completely new to JavaScript, CSS, and HTML will likely spend significantly more time than their peers who have already tried them before. Some former students who do not have a computer science background found that the homework assignments were challenging and took significant time and effort, but were rewarding, fun, and "do-able."

Students have at least 3 weeks to complete each homework assignment. Some students waited until the last week, and could not finish. It is critical to plan ahead and prepare for the significant time needed.

Almost all homework assignments involve a very large amount of programming (which naturally means a lot of debugging will likely be needed, and this can be time consuming). You should be proficient in at least one high-level programming language (e.g., Python, C++, Java), and be proficient with debugging principles and practices. If not, you should NOT take this course. Instead, you should first take CSE 6040 (for OMS Analytics students) and, if needed, CS 1301 and CS 1371 as well.

Some programming assignments involve high-level languages or scripting (e.g., Python, Java, SQL). Some assignments involve web programming and D3 (e.g., JavaScript, CSS, HTML). For example, an assignment on Hadoop and Spark may require you to learn some basic Java and Scala quickly, which should not be too challenging if you already know another high-level language like Python or C++. It is unlikely that you already know all the tools and skills needed for the programming tasks, so you are expected to learn many of them on the fly.

Basic linear algebra, probability and statistics knowledge is also expected.

Course Goals

  • Learn visual and computational techniques and tools for typical data types
    • Learn how the two kinds of methods complement each other
    • Gain a breadth of knowledge
  • Work on real datasets and problems
  • Learn practical know-how (useful for jobs, research) through significant hands-on programming assignments

Announcements and Discussion

We use Piazza for all announcements and discussion. Everyone must join this class's Piazza (link available on Canvas). Double check that you are joining the correct Piazza! There are multiple concurrent course sections with the same name and course number taking place, e.g., online for OMSA and OMSCS, and campus for Atlanta-based students.

The fastest way to get help with homework assignments is to post your questions on Piazza. That way, not only our TAs and instructor can help, but your peers can too.

If you prefer that your question be addressed only to our TAs and the instructor, you can use the private post feature (i.e., select the "Individual Student(s) / Instructor(s)" radio button).

While we welcome everyone to share their experiences in tackling issues and to help each other out, please do not post your answers, as that may affect the learning experience of your fellow classmates.

For special cases such as failed submissions due to system errors, missing grades, failed file uploads, emergencies that prevent you from submitting, or personal issues, you can contact the staff using a private Piazza post.

Canvas will be used for submission of assignments and projects, but not for announcements or discussion.


  1. There will be 4 homework assignments. Together, they are worth 50% (10%, 15%, 15%, 10%) of the course grade.
  2. There will be one course group project worth 50% of the course grade. The project components are:
    1. Proposal (7.5% of course grade)
    2. Proposal presentation (5%) (video recording)
    3. Progress report (5%)
    4. Final poster presentation (7.5%) (video recording)
    5. Final report (25%)
  3. You must achieve an overall weighted average of 60% to pass the course.
  4. All deliverables will be graded by our TAs, except the project poster presentation, which will be peer-graded.
  5. When assigning course grades, I will start with the standard grade thresholds (90, 80, etc.). I may lower (and never raise) the thresholds (i.e., to your benefit). For example, I may use 88 instead of 90.
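
To make the arithmetic behind the weights above concrete, here is a minimal Python sketch of how an overall weighted average could be computed. The weights and the 60% passing mark come from the list above; the function names, the dictionary keys, and the example scores are hypothetical, and treating "achieve 60%" as "greater than or equal to 60" is an assumption. It is an illustration only, not the official grade calculation.

```python
# Weights (percent of course grade) taken from the grading list above.
HOMEWORK_WEIGHTS = [10, 15, 15, 10]
PROJECT_WEIGHTS = {
    "proposal": 7.5,
    "proposal_presentation": 5,
    "progress_report": 5,
    "final_poster_presentation": 7.5,
    "final_report": 25,
}


def weighted_average(homework_scores, project_scores):
    """Return the overall course percentage.

    homework_scores: list of four scores, each on a 0-100 scale.
    project_scores: dict mapping each project component to a 0-100 score.
    (Hypothetical helper; the names are not from the course.)
    """
    if len(homework_scores) != len(HOMEWORK_WEIGHTS):
        raise ValueError("expected one score per homework assignment")
    total = sum(w * s for w, s in zip(HOMEWORK_WEIGHTS, homework_scores))
    total += sum(w * project_scores[name] for name, w in PROJECT_WEIGHTS.items())
    return total / 100  # the weights sum to 100


def passes(overall, threshold=60):
    """Assumption: an overall average at or above the threshold passes."""
    return overall >= threshold


# Hypothetical scores, for illustration only.
example_homework = [80, 90, 70, 100]
example_project = {name: 85 for name in PROJECT_WEIGHTS}
overall = weighted_average(example_homework, example_project)
print(overall, passes(overall))
```

With these made-up scores, the homework contributes 42 points and the project 42.5 points, so the overall average is 84.5.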

Reading Materials

All content and course materials can be accessed online. There is no textbook for this course. 

Other Info

Academic Honesty

All Georgia Tech students are expected to uphold the Georgia Tech Academic Honor Code.