Open Datasets Collection

16 Dec 2017

In this post, I will list some useful open datasets.

  1. CrowdFlower

    CrowdFlower is a crowdsourcing platform. Thanks to them for supporting research in the crowdsourcing area. They have released some completed datasets that were processed on their platform, called Data For Everyone, so that crowdsourcing researchers can test new methods.

  2. AMiner

    AMiner is a researcher mining platform that ranks researchers in many areas based on their influence. The group that developed AMiner is also active and successful in the data mining area and has published many papers at KDD in recent years. They have released the datasets used in their papers for other researchers to use.

  3. Yelp Dataset

    The Yelp dataset is a subset of Yelp's businesses, reviews, and user data for personal, educational, and academic use. It is available as both JSON and SQL files, and can be used to teach students about databases, to learn NLP, or as sample production data while you learn how to make mobile apps.

  4. Population estimation from mobile network traffic metadata Dataset

    This dataset covers one month of data, collected in April 2015, for three Italian cities: Rome, Milan, and Turin. The raw data was provided during the Telecom Italia Big Data Challenge.

  5. Datasets and code from Gao Cong’s group

    Prof. Gao Cong at NTU lists many datasets and source code from his group's publications on this webpage. They are very useful for recommendation, POI mining, and social influence analysis.
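
As a starting point for the JSON files mentioned under the Yelp dataset, here is a minimal Python sketch for reading records one at a time. It assumes the file stores one JSON object per line, which you should check against the files you actually download. The function name iter_json_lines, the file name review.json, and the fields in the sample are all hypothetical and only serve as illustration.

```python
import json
from typing import Iterable, Iterator


def iter_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Made-up sample records; the field names are assumptions, not the real schema.
sample = [
    '{"stars": 5, "text": "great"}\n',
    "\n",
    '{"stars": 1, "text": "bad"}\n',
]
records = list(iter_json_lines(sample))
print(len(records))

# With a real file (hypothetical path), stream it instead of loading it whole:
# with open("review.json", encoding="utf-8") as f:
#     for record in iter_json_lines(f):
#         ...
```

Reading line by line keeps memory use low, which matters because review files of this kind can be large.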