The 2007 IEEE ICDM Data Mining Contest was conducted between August and October 2007. The contest had two tasks, both on indoor location estimation using WiFi data. The data were collected and organized at the Hong Kong University of Science and Technology, Hong Kong, China, by Professor Qiang Yang's research group. The following material describes the tasks, the data and the contest results. The contest winners' slides are also posted.
If you use this data set in your publications, please cite the following article:
Qiang Yang, Sinno Jialin Pan, Vincent Wenchen Zheng, Estimating Location Using Wi-Fi, IEEE Intelligent Systems, vol. 23, no. 1, pp. 8-13, Jan/Feb, 2008 (URL: http://doi.ieeecomputersociety.org/10.1109/MIS.2008.4) (PDF)
@article{icdm2007contest,
author={Qiang Yang and Sinno Jialin Pan and Vincent Wenchen Zheng},
title={Estimating Location Using Wi-Fi},
journal={IEEE Intelligent Systems},
volume={23},
number={1},
pages={8-13},
year={2008}
}
Task 1
Task 2
Introduction
The first IEEE ICDM Data Mining Contest (IEEE ICDM DMC’07) is held in conjunction with the 2007 IEEE International Conference on Data Mining (IEEE ICDM 2007). This contest is about indoor location estimation from radio signal strengths received by a client device from various WiFi Access Points (APs). This is a problem of practical significance and technical challenge. Indoor location estimation in wireless networks using Received Signal Strength (RSS) values has attracted great interest in the data mining and machine learning communities. Many applications that rely on this task can now be realized with the help of distributed wireless networks, ranging from robotics and context-aware computing to security-related applications, mobile commerce, and healthcare for the sick and elderly. Researchers have been trying to apply data mining and machine learning techniques in WiFi environments to recognize a mobile user’s locations and activities. The problem can be visualized by considering the following scenario:
A person holding a wireless client device walks around a building floor. The client device (which can be a PDA) is equipped with a wireless card that can receive signals from many surrounding wireless access points (APs). Each of these APs is identifiable with a unique ID. Based on the collection of signal strength values (RSS values), a data mining algorithm running on the client device tries to figure out the current location of the user.
A typical way to do this task is through triangulation. However, triangulation methods cannot cope with the uncertainty associated with the RSS values. Thus, in this contest we collect training data and apply data mining and machine learning methods to locate the user.
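As an illustration of the learning-based approach described above, the following is a minimal sketch of a k-nearest-neighbour classifier over RSS vectors. It is not the contest's baseline or any participant's method. The scan representation (a dictionary from AP ID to RSS in dBm), the fill value `MISSING_RSS` for APs that were not heard, and the function names are all assumptions made for this example.

```python
import math
from collections import Counter

# Assumed fill value (dBm) for an AP that does not appear in a scan.
MISSING_RSS = -100.0


def to_vector(scan, ap_ids):
    """Turn a {ap_id: rss} scan into a fixed-length vector over ap_ids."""
    return [scan.get(ap, MISSING_RSS) for ap in ap_ids]


def knn_predict(train, scan, k=3):
    """Predict a location label for one scan.

    train: list of (scan_dict, location_label) pairs.
    Returns the majority label among the k training scans that are
    closest to `scan` in Euclidean distance.
    """
    # Use every AP seen in the training data or in the query scan.
    ap_ids = sorted({ap for s, _ in train for ap in s} | set(scan))
    query = to_vector(scan, ap_ids)
    scored = [(math.dist(query, to_vector(s, ap_ids)), label)
              for s, label in train]
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two grid locations, two access points.
train = [
    ({"ap1": -40, "ap2": -70}, 1),
    ({"ap1": -42, "ap2": -72}, 1),
    ({"ap1": -75, "ap2": -45}, 2),
    ({"ap1": -78, "ap2": -43}, 2),
]
print(knn_predict(train, {"ap1": -41, "ap2": -69}, k=3))  # prints 1
```

Unlike triangulation, this kind of fingerprint matching needs no model of signal propagation, only labelled scans, which is why the cost of collecting training data matters so much.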
However,
accurately locating a mobile device in an indoor environment by its RSS values
is a challenging problem. This is because:
1. The WiFi data are very noisy due to the so-called multi-path effect in indoor environments, the movement of people, and changes in temperature and humidity, among other factors.
2. Collecting the (RSS values, Location Label) pairs as training data in a large building is very costly, because humans need to take a mobile device and walk through the building to collect the RSS values and mark down the ground-truth locations.
3. Due to the dynamic factors in indoor environments, the distribution of training data collected offline can be different from the distribution of online test data.
We hope the ICDM contest data set will serve as a benchmark data set in comparing solutions for this challenging and practical problem.
This
year's contest involves two tasks: task 1 and task 2.
Task 1. Indoor
Location Estimation.
The first task is to predict the location of each collection of received signal strength (RSS) values in an indoor environment, received from the WiFi Access Points (APs). You are given a set of (RSS values, Location Label) data as training data, where the location labels are discrete (here the location label is the class label). Some data are given without labels; that is, for those data only the RSS values are given. In addition, you are given a collection of partially labelled user traces, each of which corresponds to a sequence of RSS values collected as a user continuously walks around a building.
You are asked to
design an algorithm for predicting the location labels of a collection of test
data. The test data are obtained by collecting the RSS values as a user
walks around a building. Your task is to predict the location label for
each vector of RSS values.
Task 2.
Transferring the Learned Knowledge for Indoor Location Estimation.
The second task is
similar to task 1, except that the training data are collected at a different
time period from the test data. For this task, the test data are discrete
(that is, the test data are not sequential data). To help with prediction when
the training and test data may be from different distributions, some test data
are associated with location labels for you to use as benchmarks.
For
task 2, you need to adapt or transfer the knowledge learned from the training
data for predicting the test data.
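One simple way to picture the adaptation step is a per-AP offset correction estimated from the labelled benchmark samples. The sketch below is only an illustration under strong assumptions: it supposes that the change between the two time periods is roughly a constant shift per access point, that RSS vectors have a fixed length, and that the function names are hypothetical. It is not the method used by the organizers or the winners.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def estimate_offset(train, benchmarks):
    """Estimate the per-AP shift between training and test periods.

    train, benchmarks: lists of (rss_vector, location_label) pairs.
    For each labelled benchmark sample, compare it with the mean training
    vector of the same location, then average those differences.
    """
    by_label = defaultdict(list)
    for vector, label in train:
        by_label[label].append(vector)

    diffs = []
    for vector, label in benchmarks:
        if label in by_label:
            reference = mean_vector(by_label[label])
            diffs.append([b - a for a, b in zip(reference, vector)])

    if not diffs:  # no shared locations: assume no shift
        return [0.0] * len(train[0][0])
    return mean_vector(diffs)


def shift_training(train, offset):
    """Move training vectors into the test period's RSS range."""
    return [([x + o for x, o in zip(vector, offset)], label)
            for vector, label in train]


# Hypothetical toy data: period A (training) and labelled period B samples.
train_a = [([-40.0, -70.0], 1), ([-44.0, -74.0], 1), ([-75.0, -45.0], 2)]
benchmarks_b = [([-47.0, -70.0], 1), ([-80.0, -43.0], 2)]

offset = estimate_offset(train_a, benchmarks_b)
adapted = shift_training(train_a, offset)
print(offset)  # prints [-5.0, 2.0]
```

A classifier trained on `adapted` rather than `train_a` would then see data closer to the test distribution, provided the constant-shift assumption holds reasonably well.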
Detailed Description:
All
WiFi data are collected in approximately 200 locations, where each location is
a grid. A grid has a size of about
For Task 1, all the WiFi data (training data and test data) are collected by the same device in the same time period. There are two types of data provided in this task: trace data and non-trace data. In the trace data, we provide the sequential information in the data collection process, and some collections of RSS values in a trace are given their corresponding location labels (grids). In the non-trace data, as in the trace data, some samples are given location labels, but the sequential information is hidden. This task can be treated as a semi-supervised, multi-class learning problem.
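The sequential information in trace data can be exploited in many ways. As a minimal sketch, the code below smooths per-sample predictions along a trace with a sliding-window majority vote, on the assumption that a walking user rarely jumps to a distant grid for a single reading. The function name and window size are hypothetical choices for this example, and more principled sequence models exist.

```python
from collections import Counter


def smooth_trace(labels, window=3):
    """Smooth a sequence of predicted location labels along one trace.

    Each label is replaced by the most frequent label in a window centred
    on it. On ties that include the original label, the original is kept.
    """
    half = window // 2
    smoothed = []
    for i, original in enumerate(labels):
        neighbourhood = labels[max(0, i - half): i + half + 1]
        counts = Counter(neighbourhood)
        best = max(counts.values())
        if counts[original] == best:
            smoothed.append(original)
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


# Hypothetical predictions for one trace; 17 looks like an isolated error.
raw = [5, 5, 17, 5, 6, 6, 6]
print(smooth_trace(raw))  # prints [5, 5, 5, 5, 6, 6, 6]
```

This step applies only to trace data. The non-trace data carry no ordering, so each sample has to be classified on its own.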
For Task 2, the training data are collected by a device in time period A (which can be one or two hours at night), while the test data are collected by the same device in time period B (which can be one or two hours during the day). In this task, we do not provide any sequential information.
Evaluation Criterion
The evaluation will be run on the held-back ground truth labels to determine the winners.
Additionally, you will be asked to submit a description of your algorithm, 2 or 3 pages long, within one week of the final submission.
(In Task_1_ground_truth.txt, each line is either the ID of a trace (Trace_X) or a ground truth location label.)
(In Task_2_ground_truth.txt, each line is a ground truth location label.)
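The ground truth file layout described above can be read with a few lines of code. The sketch below assumes that every trace ID line starts with `Trace_`, that labels can be compared as plain strings, and that performance is measured as the fraction of correctly predicted labels. The page does not spell out the exact metric, so the `accuracy` function is an assumption, as are the function names.

```python
def parse_task1_ground_truth(lines):
    """Group Task 1 ground truth labels by trace.

    lines: iterable of text lines, each either a trace ID (Trace_X) or a
    location label belonging to the most recent trace ID.
    Returns {trace_id: [label, ...]} with labels kept as strings.
    """
    traces = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Trace_"):
            current = line
            traces[current] = []
        elif current is None:
            raise ValueError("label found before any trace ID")
        else:
            traces[current].append(line)
    return traces


def accuracy(predicted, truth):
    """Fraction of positions where the predicted label equals the truth."""
    if len(predicted) != len(truth):
        raise ValueError("prediction and truth lengths differ")
    if not truth:
        return 0.0
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)


# Hypothetical file content, shown inline instead of reading from disk.
sample = ["Trace_1", "12", "12", "13", "Trace_2", "40"]
traces = parse_task1_ground_truth(sample)
print(traces["Trace_1"])                              # prints ['12', '12', '13']
print(accuracy(["12", "13", "13"], traces["Trace_1"]))
```

For Task 2 no grouping is needed: the stripped lines of the file can be passed to `accuracy` directly as the `truth` list.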
| Date | Event |
|---|---|
| | Task evaluation, training datasets for both tasks, test data for task 2 and registration available |
| Sept 16, 2007 | Test data for task 1 are available |
| Sept 26, 2007 | Submissions of estimation results (by midnight PST) |
| Oct 1, 2007 | Submission of brief algorithm description (by midnight PST) |
| Oct 28-31, 2007 | Contest Winner Presentation at ICDM Conference |
The contest is open to any party planning to attend IEEE ICDM 2007. A person can participate in only one group, and a team cannot add any member after it submits its final result. Each team can participate in either one or both of the tasks and can submit multiple times before the contest deadline; only the last submission is counted. Contestants are responsible for obtaining any permission needed to use algorithms, tools or data that are the intellectual property of a third party.
Winner Selection
There will be one winner and one runner-up for each of Task 1 and Task 2, provided that their results are above the specified baselines (see Evaluation Criterion above).
Winner of "Location Estimation Performance Award" for Task 1: the best average performance in Task 1.
Runner-up of "Location Estimation Performance Award" for Task 1: the second best average performance in Task 1.
Winner of "Location Estimation Performance Award" for Task 2: the best average performance in Task 2.
Runner-up of "Location Estimation Performance Award" for Task 2: the second best average performance in Task 2.
Qiang Yang,
Gang Kou,
Chris Ding,
Sinno Jialin Pan, Hong Kong UST
Rong Pan, Hong Kong UST
Jeff Junfeng Pan, Hong Kong UST
Vincent Wenchen Zheng, Hong Kong UST
Note: if you have any questions, please send them to: {sinnopan,vincentz,qyang} @ cse dot ust dot hk