In this post, I will list some useful open datasets.
CrowdFlower
CrowdFlower is a crowdsourcing platform that supports research in the crowdsourcing area. It has released some completed datasets that were processed on its platform, under the name Data For Everyone, so that researchers can test new methods.
AMiner
AMiner is a researcher mining platform that ranks researchers in many areas by their influence. The group that developed AMiner is also active in data mining and has published many papers at KDD in recent years. They have released the datasets used in their papers for other researchers to use.
Yelp Dataset
The Yelp dataset is a subset of our businesses, reviews, and user data for use in personal, educational, and academic purposes. Available in both JSON and SQL files, use it to teach students about databases, to learn NLP, or for sample production data while you learn how to make mobile apps.
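As a rough sketch of how the JSON files could be read, the snippet below assumes each file stores one JSON object per line and that review records carry a numeric `stars` field. The helper names `iter_json_lines` and `mean_field` and the file name in the usage comment are placeholders of mine, not part of the dataset, so check them against the files you actually download.

```python
import json
from typing import Iterator, Optional


def iter_json_lines(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def mean_field(path: str, field: str) -> Optional[float]:
    """Average a numeric field over all records that contain it."""
    total = 0.0
    count = 0
    for record in iter_json_lines(path):
        if field in record:
            total += float(record[field])
            count += 1
    # Return None rather than dividing by zero when no record has the field.
    return total / count if count else None


# Hypothetical usage; the file name is an assumption:
# print(mean_field("yelp_academic_dataset_review.json", "stars"))
```

Reading line by line keeps memory use low, which helps because the review file can be several gigabytes.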
Population estimation from mobile network traffic metadata Dataset
This dataset covers one month of data, April 2015, for three Italian cities: Rome, Milan, and Turin. The raw data was provided during the Telecom Italia Big Data Challenge.
Datasets and code from Gao Cong’s group
Prof. Gao Cong at NTU lists many datasets and source code from his group's publications on this webpage. They are very useful for recommendation, POI mining, and social influence analysis.