Thursday, August 21, 2003
Washington, DC
Nitesh Chawla | Business Analytic Solutions, CIBC (chawla@morden.csee.usf.edu) |
Nathalie Japkowicz | University of Ottawa (nat@site.uottawa.ca) |
Aleksander Kolcz | America Online, Inc. (ark@pikespeak.uccs.edu) |
Workshop Description:
Overview:
Recent years brought increased interest in applying machine learning
techniques to difficult "real-world" problems, many of which are
characterized by imbalanced learning data, where at least one class is
under-represented relative to others. Examples include, but are not limited to,
fraud/intrusion detection, risk management, medical diagnosis/monitoring,
bioinformatics, text categorization, and personalization of information. The
problem of imbalanced data is often associated with asymmetric costs of
misclassifying elements of different classes. Additionally, the
distribution of the test data may differ from that of the learning sample
and the true misclassification costs may be unknown at learning time.
The AAAI-2000 Workshop on "Learning from Imbalanced Data Sets" provided the
first venue where this important problem was explicitly addressed and was
received with much interest.
The related ICML-2000 Workshop on "Cost-Sensitive Learning" provided
another venue for addressing the problem of asymmetric costs of different classes and features.
Although much awareness of the issues
related to data imbalance has been raised, many of the key problems
remain open and are in fact encountered more often, especially when
learning methods are applied to massive datasets.
We believe that it would be of value to the machine learning community to
not only examine the progress achieved in this area over the last three
years but also discuss the current school of thought on research in learning
from imbalanced datasets.
Based on our understanding of the class imbalance problem, the following topics
of discussion are proposed (but the workshop is not limited to them):
Proposed Format: The workshop will open with an invited talk by Foster Provost that will introduce the topic and give an overview of it. Presentations will then be organized into several sessions corresponding roughly to the categories identified above. The workshop will conclude with a discussion during which a distinguished guest will comment on the presentations of the day and open the floor for general discussion.
Proposed Length: One day, during which each panel will be allocated 1 to 2 hours, depending on the number of contributions and the expected length of the discussion session.
Workshop Notes: The accepted papers will be available electronically from the workshop website and as printed workshop notes for the attendees.
Submissions:
Authors are invited to submit papers on the topics outlined above or
on other related issues. Submissions should not exceed 8 pages, and should
be in line with the
ICML style sheet. Electronic submissions, in PDF format, are
preferred and should be sent to Nitesh Chawla at chawla@morden.csee.usf.edu. If
electronic submissions are inconvenient, please send four hard copies of
your submission to:
Timetable:
Invited Speakers:
Program Committee:
Kevin Bowyer | University of Notre Dame, USA |
Chris Drummond | National Research Council, Canada |
Charles Elkan | University of California San Diego, USA |
Marko Grobelnik | Jozef Stefan Institute, Slovenia |
Larry Hall | University of South Florida, USA |
Robert Holte | University of Alberta, Canada |
W. Philip Kegelmeyer | Sandia National Labs, USA |
Miroslav Kubat | University of Miami, USA |
Aleksandar Lazarevic | University of Minnesota, USA |
Charles Ling | University of Western Ontario, Canada |
Dragos Margineantu | Boeing Corporation, USA |
Foster Provost | New York University, USA |
Gary Weiss | AT&T Labs, USA |