COMP 6111A project ideas
Distributed text processing
Process a collection of texts to extract statistics of interest.
As an example, you can consider the following statistical processing:
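A minimal sketch of one such statistic, assuming word frequency is the statistic of interest and Hadoop Streaming is the framework. Hadoop Streaming runs a mapper and a reducer as ordinary programs that read lines on stdin and write tab-separated key/value lines to stdout. The reducer below relies on the framework sorting mapper output by key before the reduce phase. The `map`/`reduce` command-line argument is a hypothetical convention for this sketch.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Emit 'word<TAB>1' for every whitespace-separated word in the input lines."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"

def reducer(lines):
    """Sum the counts for each word; assumes input lines arrive sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CLI: "python wc.py map" or "python wc.py reduce".
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Locally, `sorted()` can stand in for Hadoop's shuffle: `reducer(sorted(mapper(lines)))`. On a cluster, the same script would be handed to Hadoop Streaming through its `-mapper` and `-reducer` options.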
Parallel sorting
Sort a large data set as fast as you can using many computers.
Hadoop has been shown to sort with high throughput; TritonSort is another example of a high-throughput sorting system.
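One common approach to cluster sorting is sample sort: sample the input to choose splitter keys, route each record to the partition whose key range contains it, sort each partition independently (one per machine), and concatenate. Hadoop's TeraSort uses a similar sampling step to pick partition boundaries. The sketch below runs on a single machine, so the per-partition sorts happen one after another; the function names and the sample size of 1000 are illustrative assumptions.

```python
import random
from bisect import bisect_right

def choose_splitters(data, n_partitions, sample_size=1000, seed=0):
    """Pick n_partitions - 1 splitter keys from a random sample of the input."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(data, min(sample_size, len(data))))
    step = len(sample) / n_partitions
    return [sample[int(step * i)] for i in range(1, n_partitions)]

def partition(data, splitters):
    """Route each record to the partition whose key range contains it."""
    parts = [[] for _ in range(len(splitters) + 1)]
    for x in data:
        parts[bisect_right(splitters, x)].append(x)
    return parts

def sample_sort(data, n_partitions=4):
    if not data:
        return []
    splitters = choose_splitters(data, n_partitions)
    # On a cluster, each partition would be sorted by a different machine.
    parts = [sorted(p) for p in partition(data, splitters)]
    # Partitions cover disjoint, increasing key ranges, so concatenating them
    # yields a globally sorted result.
    return [x for p in parts for x in p]
```

The quality of the sample decides how evenly work is spread: a skewed sample leaves one machine with a much larger partition, which then dominates the total sorting time.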
RPC in the cloud
Cloud-based services should interact with users seamlessly.
Not only browsers but also other client-side software should be able to communicate with a cloud-based service.
Hence, many Internet-based services publish programming interfaces that are equivalent to RPCs in traditional distributed systems.
In this project, you will design an RPC interface for baijia.info.
For the front end (software running on users' PCs), you can choose your favorite language and programming platform.
The backend of baijia.info is a modified version of the MyBB forum software.
Hence, you will need to modify its PHP code to extend the backend to support the RPC interface.
At a minimum, the RPC interface should support BibTeX query and retrieval.
For example,
string getBibtex(string token);
would return a string containing the BibTeX entry of the paper identified by the token.
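Since the real backend is PHP, the following is only a Python prototype of the interface contract, useful for testing a front end before the backend exists. It assumes XML-RPC as the transport (one option among several; a JSON-over-HTTP API would also work) and uses Python's standard `xmlrpc` modules. The token and the in-memory `BIBTEX_BY_TOKEN` table are hypothetical stand-ins for the forum database.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical data standing in for the MyBB-based backend's storage.
BIBTEX_BY_TOKEN = {
    "example-token": "@article{example,\n  title = {An Example Paper},\n  year = {2010}\n}",
}

def getBibtex(token: str) -> str:
    """Return the BibTeX entry of the paper identified by token, or '' if unknown."""
    return BIBTEX_BY_TOKEN.get(token, "")

def start_server() -> SimpleXMLRPCServer:
    # Port 0 lets the OS pick a free port; server_address reports the real one.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(getBibtex, "getBibtex")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    # Client side: any XML-RPC library in the front end's language could do this.
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        print(proxy.getBibtex("example-token"))
    server.shutdown()
    server.server_close()
```

Returning an empty string for an unknown token keeps the signature `string getBibtex(string token)` unchanged; a real interface might prefer to signal the error explicitly instead.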
Online storage on Amazon EC2
Can you implement a file system on Amazon EC2?
If an EC2-based service can grow its capacity "elastically", can you design the system so that the file system grows "elastically", too?
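One technique that fits elastic growth (our choice here, not something the project prescribes) is consistent hashing: file blocks and storage nodes are hashed onto the same ring, and each block belongs to the next node clockwise. When a new EC2 instance joins, only the blocks that now map to it have to move. The sketch below assumes blocks are identified by string IDs; node and block names are hypothetical.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable 64-bit hash from MD5 (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Consistent-hash ring mapping block IDs to storage nodes (e.g., EC2 instances)."""

    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes   # virtual points per node, to even out the load
        self._points = []      # sorted hash positions on the ring
        self._owner = {}       # hash position -> node name

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            bisect.insort(self._points, h)
            self._owner[h] = node

    def node_for(self, block_id: str) -> str:
        if not self._points:
            raise LookupError("ring has no nodes")
        # First ring position clockwise from the block's hash, wrapping around.
        i = bisect.bisect_right(self._points, _hash(block_id)) % len(self._points)
        return self._owner[self._points[i]]
```

With three nodes, adding a fourth should move roughly a quarter of the blocks, all of them onto the new node; a naive `hash(block) % n_nodes` scheme would instead reshuffle most blocks.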
make parallel
Compiling and building a Linux kernel can take roughly 15 minutes.
Can we use MapReduce to perform parallel compilation? Implement it with Hadoop and compile the Linux kernel in parallel.
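A key step is deciding which build steps can run at the same time. A hedged sketch, assuming the dependency graph that `make` uses has already been extracted into a dictionary: the function groups targets into "waves" in which every target depends only on earlier waves. Each wave could then be submitted as one map-only Hadoop job (compiling many `.o` files independently), with linking in a later wave. The target names below are hypothetical.

```python
from collections import defaultdict

def build_waves(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group build targets into waves; targets in one wave need only earlier waves.

    deps maps each target to the targets it requires. Raises ValueError on a cycle.
    """
    indegree = {t: 0 for t in deps}
    users = defaultdict(list)          # prerequisite -> targets that need it
    for target, needs in deps.items():
        for n in needs:
            indegree.setdefault(n, 0)
            indegree[target] += 1
            users[n].append(target)
    wave = sorted(t for t, d in indegree.items() if d == 0)
    waves = []
    while wave:                        # Kahn's topological sort, level by level
        waves.append(wave)
        nxt = []
        for t in wave:
            for u in users[t]:
                indegree[u] -= 1
                if indegree[u] == 0:
                    nxt.append(u)
        wave = sorted(nxt)
    if sum(len(w) for w in waves) != len(indegree):
        raise ValueError("dependency cycle")
    return waves
```

For a kernel build, the first wave would contain thousands of independent compilations, which is where MapReduce-style parallelism pays off; the final link steps form short, mostly sequential waves.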
Collaborative paper authoring
Create a tool that we can use to co-author papers.
In particular, the tool should let others (e.g., a co-author or your advisor) comment on the paper, and you should see their comments instantly.
(Warning: Intensive programming)
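"Instantly" usually means the server pushes each new comment to every connected co-author rather than waiting for clients to poll. A minimal in-process sketch of that fan-out, assuming one `asyncio.Queue` per connected client; in a real tool each queue would feed a network connection (for example a WebSocket), and the `Comment` fields are hypothetical.

```python
import asyncio
from dataclasses import dataclass

@dataclass(frozen=True)
class Comment:
    author: str
    paragraph_id: int   # hypothetical: which paragraph of the paper is commented on
    text: str

class CommentHub:
    """Fan out each posted comment to every subscribed co-author."""

    def __init__(self) -> None:
        self._subscribers: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self._subscribers.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._subscribers.discard(q)

    def post(self, comment: Comment) -> None:
        # Push to everyone immediately; nobody has to poll for new comments.
        for q in self._subscribers:
            q.put_nowait(comment)

async def demo() -> Comment:
    hub = CommentHub()
    my_inbox = hub.subscribe()           # the author's editor window
    hub.post(Comment("advisor", 3, "Please clarify this claim."))
    return await my_inbox.get()          # arrives without polling
```

`asyncio.run(demo())` returns the advisor's comment. The harder parts of the project, such as concurrent edits to the same text, are not covered by this sketch.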
Event-driven MapReduce
A natural way of implementing MapReduce is to use threads.
Can you implement it in the event-driven model?
Would it be faster than the thread-based implementation?
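A minimal event-driven sketch, assuming word count as the job and Python's `asyncio` as the event loop: mappers emit `(word, 1)` events into a bounded queue, and a single reducer reacts to each event as it arrives. Everything runs on one thread; tasks interleave only at `await` points. The function names are illustrative.

```python
import asyncio
from collections import Counter

async def map_task(chunk: str, out: asyncio.Queue) -> None:
    """Mapper: emit (word, 1) events instead of returning results from a thread."""
    for word in chunk.lower().split():
        await out.put((word, 1))
    await out.put(None)   # end-of-stream marker for this mapper

async def reduce_task(inbox: asyncio.Queue, n_mappers: int) -> Counter:
    """Reducer: handle each incoming event until every mapper has finished."""
    counts: Counter = Counter()
    finished = 0
    while finished < n_mappers:
        event = await inbox.get()
        if event is None:
            finished += 1
        else:
            word, n = event
            counts[word] += n
    return counts

async def mapreduce(chunks: list[str]) -> Counter:
    # Bounded queue: a mapper pauses when the reducer falls behind (back-pressure).
    queue: asyncio.Queue = asyncio.Queue(maxsize=1000)
    reducer = asyncio.create_task(reduce_task(queue, len(chunks)))
    await asyncio.gather(*(map_task(c, queue) for c in chunks))
    return await reducer
```

When comparing against threads, note that the event-driven version avoids locks and context switches but uses a single core; its advantage is most likely to show when map tasks wait on I/O (reading input, network shuffle) rather than on CPU.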
Your cool ideas go here...