At least 35% of your grade in CS-AD 214 consists of a final project.
The project can be either a piece of independent research or an engineering effort, related to material we have studied in class or to database research in general.
For example, your project may involve a comparison of systems we have read about, an application of database techniques to a problem you are familiar with, etc. Your project cannot be a pure data analytics project that does not advance our data management techniques. Your project must be independent of your senior capstone project.
You can choose one of the project suggestions below, and possibly adapt it. You can work independently or in groups of at most 3 members.
|3 March||Project Proposal||A 1-page proposal (single spaced, 11pt) with a list of team members, a short description of the project,
a short list of references (at most 5 references from VLDB, PODS, SIGMOD, ICDE, EDBT, InfoVis, etc.) and
a list of tools, systems, databases, etc. you plan to use.
|24 March||Project Milestone Report||Extend your proposal into a 3-4 page report (use VLDB TeX formatting). Include:
(i) problem description
(ii) solution approach description
(iii) related work. Compare your approach to this work. Related work should cite about 3-5 papers. Don't waste time on an extensive, comprehensive, complete related work section. We care about the novelty of your project and you should learn to place your work in the perspective of prior work. It is okay if related work is somewhat incomplete.
(iv) accomplishments so far, problems you faced, or unexpected findings
(v) your action plan for the rest of the semester and an evaluation plan (experiments, user studies, etc.)
This document will eventually become your project report. Start by learning how to write a good research paper.
|9 May||Project Presentations||Each project gets 20 minutes: a 15-minute presentation plus 5 minutes of Q&A.
Not sure how to give a good research talk?
Look at some slides on how.
|12 May||Project Report||Extend your milestone report into a 6-7 page report (use VLDB TeX formatting):
(i) improve and expand on all sections from the milestone report
(ii) include an evaluation section
(iii) write your conclusions
- Databases for humans: Pick a database task and investigate ways to make it simpler for humans to specify. For example, (i) let users sketch a pattern when querying time series data, or (ii) let users specify examples of data and automatically construct data generation scripts, or (iii) let users specify how to mask data and automatically transform the data. Such projects will combine machine learning, script generation, HCI and database concepts.
- Forgetful Data Structures: Extend this work to other data types such as text.
- Data Analytics: We have access to two datasets (a database of shoe print images and a crowdfunding database).
Many queries on these datasets are non-traditional, use ML and graph algorithms, and require coming up with ingenious tricks to
achieve good query performance or to approximate the answer. In the style of MAD Skills, come up with ways for
databases to better support these datasets.
- Interactive Data Visualization: One of our datasets requires analyzing the impact of social media on crowdfunding. Develop tools and visualizations to explore and analyze this effect.
- Package Queries - Specification Elicitation: Traditional database queries follow a simple model: they define constraints that each tuple in the result must satisfy. This model is computationally efficient, as the database system can evaluate the query conditions on each tuple individually. However, many practical, real-world problems require a collection of result tuples to satisfy constraints collectively, rather than individually. These applications require the support of package queries. Package queries are more complex to specify and validate. In this project, you will study ways to elicit a package query specification with simple user interactions.
- Package Queries - Faster, Stronger, ...: Read the scalable package queries paper and design performance optimizations for different package query classes.
- Project Ideas from MIT - Dr. Sam Madden
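
To make the distinction drawn in the package queries suggestions above more concrete, here is a minimal sketch contrasting a per-tuple constraint with a collective one. Everything in it is an assumption made for illustration: the toy `MEALS` relation, the function names `traditional_query` and `package_query`, and the brute-force search are hypothetical and are not taken from the package queries papers, which rely on far more scalable techniques.

```python
from itertools import combinations

# Hypothetical toy relation: (name, calories, protein_grams).
MEALS = [
    ("oatmeal", 300, 10),
    ("salad", 250, 8),
    ("steak", 700, 50),
    ("pasta", 600, 20),
    ("yogurt", 150, 12),
]


def traditional_query(rows, max_calories):
    """Per-tuple constraint: each row is accepted or rejected on its own."""
    return [row for row in rows if row[1] <= max_calories]


def package_query(rows, size, max_total_calories):
    """Collective constraint: a package of `size` rows is judged as a whole.

    Brute-force sketch: enumerate every combination, keep those whose total
    calories fit the budget, and return the one with the most total protein.
    Returns None when no package is feasible.
    """
    best, best_protein = None, -1
    for package in combinations(rows, size):
        if sum(row[1] for row in package) > max_total_calories:
            continue  # the package as a whole violates the constraint
        protein = sum(row[2] for row in package)
        if protein > best_protein:
            best, best_protein = package, protein
    return best


if __name__ == "__main__":
    print(traditional_query(MEALS, 300))
    print(package_query(MEALS, 3, 1200))
```

Note that the per-tuple query would reject steak at a 300-calorie limit, while the package query can still include it as long as the three meals together stay within the budget. The exhaustive enumeration grows combinatorially with the relation size, which is one reason both specification and performance are open project directions.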