Big Data Analytics
CUNY Graduate Center
Course Offering Spring 2015
Area of program: Data Science, Interdisciplinary
CSC 86005 Big Data Analytics
Location: CUNY GC, Room TBD
Instructor: Prof. Soon Ae Chun
This seminar course covers the research issues and practical methods of managing and analyzing Big Data to discover insights, patterns, and knowledge nuggets that can support decision makers. In addition to constantly growing volumes of proprietary transaction, product, inventory, customer, competitor, and industry data collected from enterprise systems, organizations are also faced with overwhelming amounts of data from the Web, social media, mobile sources, and sensor networks that do not fit into traditional databases in terms of volume, velocity and variety (the three Vs of Big Data). This flood of Big Data poses challenges but, if managed and analyzed properly, also offers opportunities to derive new actionable knowledge and intelligence in a timely manner. This course will explore existing and emerging methods to manage, integrate, analyze and visualize domain‐specific Big Data, and to identify and provide domain‐specific solutions.
Prerequisites: The course is open to PhD students, including both CS and non‐CS majors. No prior knowledge of Big Data management or analytics is required. However, knowledge of college‐level calculus, probability theory and statistics will be helpful. Experience with Big Data, database management systems (DBMSs), analytics, data warehouses, Business Intelligence systems, data mining programs, and statistical packages such as R or MATLAB would be a plus.
This seminar will feature a series of invited talks by faculty members from across CUNY and by outside speakers, including domain experts, covering the breadth and depth of topics in Big Data Analytics.
The course has two broad goals:
- To expose students to Big Data as a scientific or engineering problem. Students will be guided to focus on a particular domain‐specific area, identify research challenges or application utilities, and present existing and/or innovative methods and algorithms to design a solution. Students are expected to submit a conference paper and/or a demonstration paper to a conference related to Big Data Analytics by the end of this seminar, in collaboration with the faculty member(s). A series of student presentations is expected at the end of the semester.
- To foster faculty research collaborations throughout the CUNY campuses on topics of interest in Big Data Analytics. Through the invited speaker series, the computer science faculty and domain area experts will exchange ideas to identify and address real‐world challenges, resulting in a set of new research plans.
No textbook is required. Reading materials and lecture notes will be available. The recommended books include:
- Mining of Massive Datasets, by Anand Rajaraman, Jure Leskovec and Jeff Ullman. http://infolab.stanford.edu/~ullman/mmds.html
- Data‐Intensive Text Processing with MapReduce, by Jimmy Lin and Chris Dyer. http://lintool.github.com/MapReduceAlgorithms/index.html
- Big Data Analytics, by David Loshin.
Course Requirements
- Students are expected to read weekly topic‐related research papers, summarize the problems and solutions, and identify the remaining research issues. Each student will be assigned to give a presentation that includes a paper summary and a critical review.
- Each student also selects one application domain area and collects a repository of data sets. Collectively, the data sets will serve a wider community for Big Data analytics experiments and tests.
- Students will identify a Big Data Analytics research problem related to their domain application and dataset, and write a research paper that discusses the existing solutions and proposes a potential new applied solution that can be used by decision makers in the domain area.
- With the given dataset, each student can analyze and design a specific use case related to his/her research problem, and design (and possibly implement) the proposed solution as a tool.
- The final presentation of the research paper and a demo will be given in the form of a workshop/poster presentation at the end of the semester with an audience of invited faculty, students and industry leaders. Top paper awards will be given and students will have the chance to work with a faculty or industry mentor on a conference paper and/or journal publication.
Grading
1. Domain‐specific Big Data Collection & Repository – 15%
2. Analytics Project – 25%
3. Research Paper – 40%
4. Weekly reading summaries, reviews and presentation – 10%
5. Research Paper and System Presentation/Participation – 10%
Tentative Topics (subject to change)
Part 1: Big Data Analytics Fundamentals/Theories/Platform
1. Big Data Analytics: environment, challenges and opportunities & Course Overview
2. Analytics Platform (Architecture, Process and analytics tools)
3. Multiple data source management and data integration
Part 2: Structured Data Analytics
1. Structured Big Data – issues and approaches
2. Transportation data analytics
(Guest speaker: Prof. Jonathan Peters)
3. Financial or Banking or Web‐based Transaction data analytics
4. Environmental data analytics
Part 3: Semi/Unstructured Data Analytics
1. Textual Data Analytics
2. Social media data analytics
3. Short Text Classification/Clustering
(Guest speaker: Prof. Sarah Zelikovitz)
4. Real‐time Big Data Processing
(Guest speaker: Prof. Paolo Cappellari)
Part 4: Media Data Analytics
1. Fundamentals of Image/video Data analytics
2. Cultural analytics and Visualization
(Guest speaker: Prof. Lev Manovich)
3. Statistical inference and Real‐time classification for 3D point data
(Guest speaker: Prof. Olimpia Hadjiliadis)
Part 5: Network and Graph Data Analytics
1. Social Network/Graph data analytics
2. Semantic Web and Linked data analytics
Part 6: Societal Impacts on Big Data Analytics
1. Security and privacy issues
2. Accountability issues: Open Government Data
Part 7: Big Data Analytics Workshop/Poster/Demo
Student Research/Project Presentation
Faculty Panel Discussions/Research Presentations