2007 IEEE ICDM Data Mining Contest

Overview

The 2007 IEEE ICDM Data Mining Contest was conducted between August and October 2007.  The contest had two tasks, both concerning indoor location estimation from WiFi data.  The data were collected and organized at the Hong Kong University of Science and Technology, Hong Kong, China, by Professor Qiang Yang's research group.  The following material describes the tasks, the data, and the contest results.  The contest winners' slides are also posted.

If you use this data set in your publications, please cite the following article:

Qiang Yang, Sinno Jialin Pan, Vincent Wenchen Zheng, Estimating Location Using Wi-Fi, IEEE Intelligent Systems, vol. 23, no. 1, pp. 8-13, Jan/Feb 2008 (URL: http://doi.ieeecomputersociety.org/10.1109/MIS.2008.4) (PDF)

@article{icdm2007contest,
author={Qiang Yang and Sinno Jialin Pan and Vincent Wenchen Zheng},
title={Estimating Location Using Wi-Fi},
journal={IEEE Intelligent Systems},
volume={23},
number={1},
pages={8-13},
year={2008}
}

Winners (Organizers' Slides)

Task 1

  • Winner: IBM Research, Tokyo Research Laboratory Team, Japan.  Members: Hisashi Kashima, Shoko Suzuki, Shohei Hido, Yuta Tsuboi, Toshihiro Takahashi, Tsuyoshi Ide, Rikiya Takahashi, and Akira Tajima. Slides
  • First Runner up: Yuichi Katori, University of Tokyo and JST, Japan. 
  • Second Runner up: d-cup Team.  Members: Yang Qu and Chun Li, Tsinghua University, China. 

Task 2

  • Winner: Feng Guo, Machine Learning Center, Hebei University, China.  Slides
  • First Runner up: MCT.ICT Team.  Members: Juan Qi, Zhuo Sun, Junfa Liu and Yiqiang Chen, Institute of Computing Technology, Chinese Academy of Sciences, China.
  • Second Runner up: IBM Research, Tokyo Research Laboratory Team, Japan.  Members: Hisashi Kashima, Shoko Suzuki, Shohei Hido, Yuta Tsuboi, Toshihiro Takahashi, Tsuyoshi Ide, Rikiya Takahashi, and Akira Tajima.

Final contest ranking

Introduction to the Contest

The first IEEE ICDM Data Mining Contest (IEEE ICDM DMC’07) is held in conjunction with the 2007 IEEE International Conference on Data Mining (IEEE ICDM 2007). This contest is about indoor location estimation from radio signal strengths received by a client device from various WiFi Access Points (APs). The problem is both practically significant and technically challenging.  Indoor location estimation in wireless networks using Received Signal Strength (RSS) values has attracted great interest in the data mining and machine learning communities. Many applications that rely on this task can now be realized with the help of distributed wireless networks, ranging from robotics and context-aware computing to security-related applications, mobile commerce, and healthcare for the sick and elderly.  Researchers have been applying data mining and machine learning techniques in WiFi environments to recognize a mobile user’s location and activities.  The problem can be visualized by considering the following scenario:

 

A person holding a wireless client device walks around a building floor.  The client device (which can be a PDA) is equipped with a wireless card that can receive signals from many surrounding wireless access points (APs).  Each of these APs is identifiable with a unique ID.  Based on the collection of signal strength values (RSS values), a data mining algorithm running on the client device tries to figure out the current location of the user.

 

A typical way to do this task is through triangulation.  However, triangulation methods cannot cope with the uncertainty associated with the RSS values.  Thus, in this contest we will collect some training data, and apply data mining and machine learning methods to locate the user.

 

However, accurately locating a mobile device in an indoor environment by its RSS values is a challenging problem. This is because:

1. The WiFi data are very noisy because of the multi-path effect in indoor environments, the movement of people, and changes in temperature and humidity, among other factors.

2. Collecting the (RSS values, Location Label) pairs as training data in a large building is very costly, because people must carry a mobile device through the building to collect the RSS values and record the ground-truth locations.

3. Due to the dynamic factors in indoor environments, the distribution of training data collected offline can be different from the distribution of online test data.

 

We hope the ICDM contest data set will serve as a benchmark data set in comparing solutions for this challenging and practical problem.

Tasks

This year's contest involves two tasks: task 1 and task 2.  The map of the data collection area is not disclosed during the contest period.  However, for convenience of research on this data set, we provide the map here.

Task 1. Indoor Location Estimation.

The first task is to predict the location of each collection of received signal strength (RSS) values in an indoor environment, received from the WiFi Access Points (APs). You are given a set of (RSS values, Location Label) data as training data, where the location labels are discrete (here the location label is the class label). Some data are given without labels; that is, for those data only the RSS values are given.  In addition, you are given a collection of partially labelled user traces, each of which corresponds to a sequence of RSS values collected as a user continuously walks around a building.

You are asked to design an algorithm for predicting the location labels of a collection of test data.  The test data are obtained by collecting the RSS values as a user walks around a building.  Your task is to predict the location label for each vector of RSS values.
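As an illustration of what "predicting the location label for each vector of RSS values" can look like, here is a minimal nearest-neighbour baseline. It is only a sketch under stated assumptions, not the organizers' baseline or any winning method: readings are assumed to be already converted to equal-length numeric vectors, and the function name `knn_predict`, the grid labels and the RSS numbers are all made up for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a location label for one RSS vector by majority vote
    among the k labelled vectors closest to it (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: three access points, two grid labels (invented values).
train = [
    ((-40.0, -70.0, -90.0), "grid_1"),
    ((-42.0, -68.0, -88.0), "grid_1"),
    ((-80.0, -45.0, -60.0), "grid_2"),
    ((-78.0, -47.0, -62.0), "grid_2"),
]

print(knn_predict(train, (-41.0, -69.0, -89.0), k=3))
```

This baseline ignores the unlabelled data and the ordering of readings within a trace, both of which the task makes available.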

Task 2. Transferring the Learned Knowledge for Indoor Location Estimation.

The second task is similar to task 1, except that the training data are collected at a different time period from the test data.  For this task, the test data are discrete (that is, the test data are not sequential data). To help with prediction when the training and test data may be from different distributions, some test data are associated with location labels for you to use as benchmarks.

For task 2, you need to adapt or transfer the knowledge learned from the training data for predicting the test data.
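One very simple way to picture such adaptation is sketched below: use the few labelled test samples to estimate how much each access point's average RSS moved between the two time periods, then shift the training vectors by that amount before training any classifier. This is a hypothetical illustration, not the method used by the organizers or the winners. The helper names, the labels `g1` and `g2`, and all numbers are assumptions made for the example.

```python
from collections import defaultdict

def mean_by_label(samples):
    """Average the RSS vectors that share a location label."""
    groups = defaultdict(list)
    for vec, label in samples:
        groups[label].append(vec)
    return {label: [sum(col) / len(col) for col in zip(*vecs)]
            for label, vecs in groups.items()}

def per_ap_offset(source, target_labelled):
    """Estimate, per access point, the mean RSS shift from the training
    period to the test period, using locations labelled in both sets."""
    src, tgt = mean_by_label(source), mean_by_label(target_labelled)
    shared = sorted(set(src) & set(tgt))
    if not shared:
        raise ValueError("no location is labelled in both periods")
    diffs = [[t - s for s, t in zip(src[lab], tgt[lab])] for lab in shared]
    return [sum(col) / len(col) for col in zip(*diffs)]

def shift(samples, offset):
    """Add the per-AP offset to every vector, keeping the labels."""
    return [([x + o for x, o in zip(vec, offset)], label)
            for vec, label in samples]

# Invented data: two access points, two locations.
source = [([-50.0, -70.0], "g1"), ([-52.0, -72.0], "g1"), ([-80.0, -40.0], "g2")]
target_labelled = [([-56.0, -69.0], "g1"), ([-85.0, -38.0], "g2")]

offset = per_ap_offset(source, target_labelled)
adapted = shift(source, offset)
print(offset)
```

A constant per-AP shift is a strong simplification; it is shown only to make the idea of reusing the labelled test samples concrete.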

 

Detailed Description:

 

All WiFi data are collected at approximately 200 locations, where each location is a grid cell of about 1.5m×1.5m.  Each reading consists of a set of access point (AP) IDs and their corresponding received signal strength (RSS) values. The larger the RSS value received from an access point AP_1, the closer the client device is to AP_1.
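Because a reading lists only the access points that were heard, different readings can contain different AP IDs. A common preprocessing step is to map each reading onto a fixed-length vector, as sketched below. The dictionary representation, the function names and the placeholder value of -100 for an AP that was not heard are assumptions made for illustration; the actual file layout is described under Data Format.

```python
MISSING_RSS = -100.0  # assumed placeholder for an AP absent from a reading

def build_ap_index(readings):
    """Assign every AP ID seen in the readings a fixed column position."""
    ap_ids = sorted({ap for reading in readings for ap in reading})
    return {ap: i for i, ap in enumerate(ap_ids)}

def to_vector(reading, ap_index, missing=MISSING_RSS):
    """Turn one {AP ID: RSS} reading into a dense list of floats."""
    vec = [missing] * len(ap_index)
    for ap, rss in reading.items():
        if ap in ap_index:  # skip APs that were not in the index
            vec[ap_index[ap]] = float(rss)
    return vec

# Two invented readings over three access points.
readings = [{"AP_1": -45, "AP_3": -80}, {"AP_2": -60}]
ap_index = build_ap_index(readings)
vectors = [to_vector(r, ap_index) for r in readings]
print(vectors)
```

The choice of placeholder matters: it should be lower than any real RSS value in the data so that a missing AP reads as "far away".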

 

For Task 1, all the WiFi data (training data and test data) are collected by the same device in the same time period. There are two types of data provided in this task: trace data and non-trace data. In the trace data, we provide the sequential information from the data collection process, and some collections of RSS values in a trace are given their corresponding location labels (grids). In the non-trace data, as in the trace data, some collections of RSS values are given location labels, but the sequential information is hidden.  This task can be treated as a semi-supervised, multi-class learning problem.

 

For Task 2, the training data are collected by a device in time period A (for example, one to two hours at night), while the test data are collected by the same device in time period B (for example, one to two hours during the day). In this task, we do not provide any sequential information.

 

Evaluation Criterion

The evaluation will be run on the held-back ground truth labels to determine the winners.

Additionally, you will be asked to submit a description of your algorithm in 2 or 3 pages within one week of the final submission.

 

 

Data Format

Task 1: location estimation

(In the Task_1_ground_truth.txt, each line is either the ID of a trace (Trace_X) or the ground truth location label.)
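A file in this layout can be read with a few lines of Python, as in the sketch below. It assumes that the labels following a Trace_X line belong to that trace, which the description implies but does not state outright. The function name and the sample label values are invented for the example.

```python
def parse_trace_ground_truth(lines):
    """Group ground-truth labels under the trace header that precedes them."""
    traces = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith("Trace_"):
            current = line
            traces[current] = []
        elif current is None:
            raise ValueError("label found before any trace header: " + line)
        else:
            traces[current].append(line)
    return traces

# Invented sample in the described layout.
sample = ["Trace_1", "12", "12", "13", "Trace_2", "40", "41"]
print(parse_trace_ground_truth(sample))

# With the real file (path assumed to be in the working directory):
# with open("Task_1_ground_truth.txt") as f:
#     traces = parse_trace_ground_truth(f)
```

Labels are kept as strings so that the parser makes no assumption about how locations are numbered.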

Task 2: transfer learning

(In the Task_2_ground_truth.txt, each line is the ground truth location label.)
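With one label per line, predictions can be checked against the ground truth line by line. The sketch below scores them by exact-match accuracy. Note that this page does not state the official evaluation metric, so exact-match accuracy is an assumption here, as are the function name and the sample labels.

```python
def accuracy(predicted, truth):
    """Fraction of positions where the predicted label equals the truth."""
    if len(predicted) != len(truth):
        raise ValueError("prediction and ground-truth lengths differ")
    if not truth:
        raise ValueError("no labels to score")
    correct = sum(p.strip() == t.strip() for p, t in zip(predicted, truth))
    return correct / len(truth)

# Invented labels, one per test reading.
truth = ["5", "5", "17", "42"]
predicted = ["5", "6", "17", "42"]
print(accuracy(predicted, truth))
```

A metric that tolerates predictions in neighbouring grid cells would need the map of the collection area, which this sketch does not use.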

--------------------------------------------------------------------------------

The following information is from the contest period in Aug-Oct 2007

Important Dates:

Aug 16, 2007

Task evaluation, training datasets for both tasks, test data for task 2 and registration available

Sept 16, 2007

Test data for task 1 are available

Sept 26, 2007

Submissions of estimation results (by midnight PST)

Oct 1, 2007

Submission of brief algorithm description (by midnight PST)

Oct 28-31, 2007

Contest Winner Presentation at ICDM Conference

Rules

The contest is open to any party planning to attend IEEE ICDM 2007. A person can participate in only one group, and a team cannot add any member after it submits its final result. Each team can participate in either one or both of the tasks, and each team can submit multiple times before the contest deadline, of which only the last submission is counted.  Contestants are responsible for obtaining any permission needed to use algorithms, tools or data that are the intellectual property of a third party.

Contest Result:

Result Submission

You are asked to submit the output of your classifier in the form of location predictions for all the test data.  If you work on both tasks, submit your results in separate files under the submission link.

Winner Selection

There will be one winner and one runner-up for each of Task 1 and Task 2, provided that they are above the specified baselines (see Evaluation Criterion above).

Winner of the "Location Estimation Performance Award", Task 1: the best average performance in Task 1.

Runner-up of the "Location Estimation Performance Award", Task 1: the second best average performance in Task 1.

Winner of the "Location Estimation Performance Award", Task 2: the best average performance in Task 2.

Runner-up of the "Location Estimation Performance Award", Task 2: the second best average performance in Task 2.

Frequently Asked Questions and News

Organization

ICDM-2007 Contest Co-chairs:

Qiang Yang, Hong Kong University of Science and Technology (Hong Kong UST), Hong Kong

Gang Kou, Thomson Corporation (USA)

Chris Ding, University of Texas, Arlington (USA)

Local Committee Members

Sinno Jialin Pan, Hong Kong UST

Rong Pan, Hong Kong UST

Jeff Junfeng Pan, Hong Kong UST

Vincent Wenchen Zheng, Hong Kong UST

Sponsoring Organization

Thomson Corporation (USA)

 

Note: if you have any questions, please send them to: {sinnopan,vincentz,qyang} @ cse dot ust dot hk

 

Back to IEEE ICDM 2007 main page.
