Bibliography
The Class Imbalance Problem
A Bibliography
(Please send your updates to: nat@site.uottawa.ca.
The list is no longer formally maintained, though.)
- An, A., Cercone, N. and Huang, X. (2001), A Case Study for Learning
from Imbalanced Data Sets, in Advances in Artificial Intelligence:
Proceedings of the 14th Conference of the Canadian Society for Computational
Studies of Intelligence, pp. 1-15.
- Chawla, N.V., Lazarevic, A., Hall, L.O. and Bowyer, K.W. (2003),
SMOTEBoost: Improving Prediction of the Minority Class in Boosting, in
Proceedings of the 7th European Conference on Principles and Practice of
Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia, September
22-26, 2003, pp. 107-119.
- Chawla, N.V., Bowyer, K.W., Hall, L.O. and Kegelmeyer, W.P. (2002),
SMOTE: Synthetic Minority Over-sampling Technique,
Journal of Artificial Intelligence Research (JAIR), Volume 16,
pp. 321-357.
- Chawla, N., Japkowicz, N. and Kolcz, A. (editors), Proceedings of the ICML'2003 Workshop on Learning from Imbalanced Data Sets, http://www.site.uottawa.ca/~nat/Workshop2003/workshop2003.html, August 2003.
- Chawla, N., Japkowicz, N. and Kolcz, A. (editors), Special Issue on Class Imbalances, SIGKDD Explorations 6(1), June 2004.
- DeRouin, E., Brown, J., Fausett, L. and Schneider, M. (1991),
Neural Network Training on Unequally Represented Classes,
in Intelligent Engineering Systems Through Artificial Neural Networks,
pp. 135-140.
- Domingos, P. (1999), MetaCost: A General Method for Making
Classifiers Cost-Sensitive, in Proceedings of the Fifth International
Conference on Knowledge Discovery and Data Mining, pp. 155-164.
- Drummond, C. and Holte, R. (2000),
Explicitly Representing Expected Cost: An Alternative to ROC
Representation, in Proceedings of the Sixth ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pp. 198-207.
- Drummond, C. and Holte, R. (2000),
Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria,
in Proceedings of the Seventeenth International Conference on Machine
Learning, pp. 239-249.
- Eavis, T. and Japkowicz, N. (2000), A Recognition-Based
Alternative to Multi-Layer Perceptrons, in Advances in Artificial
Intelligence: Proceedings of the 13th Biennial Conference of the Canadian
Society for Computational Studies of Intelligence, pp. 280-292.
- Elkan, C. (2001), The Foundations of Cost-Sensitive Learning,
in Proceedings of the Seventeenth International Joint Conference on
Artificial Intelligence.
- Estabrooks, A. (2000),
A Combination Scheme for Inductive Learning from Imbalanced Data Sets,
MCS Thesis, Faculty of Computer Science, Dalhousie University.
- Estabrooks, A. and Japkowicz, N. (2001),
A Mixture-of-Experts Framework for Concept-Learning from Imbalanced Data
Sets, in Proceedings of the 2001 Intelligent Data Analysis
Conference.
- Estabrooks, A., Jo, T. and Japkowicz, N. (2004),
A Multiple Resampling Method for Learning from Imbalanced Data Sets,
Computational Intelligence, Volume 20, Number 1 (in press).
- Ezawa, K., Singh, M. and Norton, S.W. (1996), Learning Goal Oriented
Bayesian Networks for Telecommunications Risk Management, in
Proceedings of the International Conference on Machine Learning
(ICML-96), pp. 139-147.
- Fawcett, T.E. and Provost, F. (1997),
Adaptive Fraud Detection, Data Mining and Knowledge Discovery,
Volume 1, Number 3, pp. 291-316.
- Guo, H. and Viktor, H.L. (2004), Learning from Imbalanced Data Sets with
Boosting and Data Generation: The DataBoost-IM Approach, SIGKDD
Explorations 6(1), pp. 30-39.
- Guo, H. and Viktor, H.L. (2008), Learning from Skewed Class
Multi-relational Databases, Fundamenta Informaticae (FI),
Special Issue on "Multi-relational Data Mining", Volume 89, Issue 1,
pp. 69-94.
- Japkowicz, N., Myers, C. and Gluck, M. (1995),
A Novelty Detection Approach to Classification,
in Proceedings of the Fourteenth International Joint Conference on
Artificial Intelligence, pp. 518-523.
- Japkowicz, N. (editor), Proceedings of the AAAI'2000 Workshop on
Learning from Imbalanced Data Sets, AAAI Tech Report WS-00-05, July 2000.
- Japkowicz, N. (2000), The Class Imbalance Problem: Significance and
Strategies, in Proceedings of the 2000 International Conference on
Artificial Intelligence (IC-AI'2000), pp. 111-117.
- Japkowicz, N. (2001), Concept-Learning in the Presence of
Between-Class and Within-Class Imbalances, in Advances in Artificial
Intelligence: Proceedings of the 14th Conference of the Canadian Society for
Computational Studies of Intelligence, pp. 67-77.
- Japkowicz, N. and Stephen, S. (2002), The Class Imbalance Problem: A
Systematic Study, Intelligent Data Analysis Journal, Volume 6, Number 5,
November 2002.
- Jo, T. and Japkowicz, N. (2004), Class Imbalances versus Small Disjuncts,
SIGKDD Explorations 6(1), June 2004.
- Kolcz, A. and Alspector, J. (2002), Asymmetric Missing-data Problems:
Overcoming the Lack of Negative Data in Preference Ranking,
Information Retrieval, Volume 5, Number 1, pp. 5-40.
- Kubat, M. and Matwin, S. (1997), Addressing the Curse of Imbalanced
Training Sets: One-Sided Selection, in Proceedings of the Fourteenth
International Conference on Machine Learning, pp. 179-186.
- Kubat, M., Holte, R. and Matwin, S. (1997), Learning when Negative
Examples Abound, in Proceedings of ECML-97, pp. 146-153.
- Kubat, M., Holte, R. and Matwin, S. (1998),
Machine Learning for the Detection of Oil Spills in Satellite Radar Images,
Machine Learning, Volume 30, pp. 195-215.
- Latinne, P., Saerens, M. and Decaestecker, C. (2001), Adjusting the
outputs of a classifier to new a priori probabilities may
significantly improve classification accuracy: Evidence from a
multi-class problem in remote sensing, in Proceedings of the 18th
International Conference on Machine Learning (ICML), pp. 298-305.
- Lawrence, S., Burns, I., Back, A.D., Tsoi, A.C. and Giles, C.L. (1998),
Neural Network Classification and Unequal Prior Class Probabilities, in
G. Orr, K.-R. Müller and R. Caruana (editors), Tricks of the Trade,
Lecture Notes in Computer Science State-of-the-Art Surveys, pp. 299-314,
Springer Verlag.
- Lee, Y., Wahba, G. and Ackerman, S. (2003), Classification of Satellite
Radiance Data by Multicategory Support Vector Machines,
TR 1075, February 2003, available at http://www.stat.wisc.edu/~wahba (see TRLIST).
- Lee, Y., Lin, Y. and Wahba, G. (2002), Multicategory Support Vector
Machines, Theory, and Application to the Classification of Microarray Data and
Satellite Radiance Data, TR 1064, September 2002, available at http://www.stat.wisc.edu/~wahba (see TRLIST).
- Lee, Y. (2002), Multicategory Support Vector Machines, Theory, and
Application to the Classification of Microarray Data and Satellite Radiance
Data, Ph.D. Thesis, TR 1063, September 2002, available at http://www.stat.wisc.edu/~wahba (see TRLIST).
- Lewis, D. and Gale, W. (1994),
Training Text Classifiers by Uncertainty Sampling, in
Proceedings of the Seventeenth Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval.
- Ling, C.X. and Li, C. (1998), Data Mining for Direct Marketing:
Problems and Solutions, in Proceedings of the Fourth ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining,
pp. 73-79.
- Maragoudakis, M., Kermanidis, K., Tasikas, A., Fakotakis, N. and
Kokkinakis, G. (2004), Bayesian Induction of Verb Subcategorization Frames in
Imbalanced Heterogeneous Data, Journal of Quantitative Linguistics,
ISSN: 0929-6174, Swets & Zeitlinger (accepted for publication).
- Nickerson, A., Japkowicz, N. and Milios, E. (2001),
Using Unsupervised Learning to Guide Resampling in Imbalanced Data Sets,
in Proceedings of the Eighth International Workshop on Artificial
Intelligence and Statistics.
- Nugroho, A.S., Kuroyanagi, S. and Iwata, A. (2002),
A Solution for Imbalanced Training Sets Problem by CombNET-II
and Its Application on Fog Forecasting, in
Transactions on Information and Systems, The Institute of Electronics,
Information and Communication Engineers (IEICE), Vol. E85-D, No. 7,
pp. 1165-1174, July 2002.
- Pazzani, M., Merz, C., Murphy, P., Ali, K., Hume, T. and Brunk, C.
(1994), Reducing Misclassification Costs, in
Proceedings of the Eleventh International Conference on Machine Learning,
pp. 217-225.
- Saerens, M., Latinne, P. and Decaestecker, C. (2002), Adjusting the
outputs of a classifier to new a priori probabilities: A simple
procedure, Neural Computation, 14(1), pp. 21-41.
- Riddle, P., Segal, R. and Etzioni, O. (1991),
Representation Design and Brute-Force Induction in a Boeing Manufacturing
Domain, Applied Artificial Intelligence, Volume 8,
pp. 125-147.
- Wahba, G. (2002), Soft and Hard Classification by Reproducing Kernel Hilbert
Space Methods, TR 1067, October 2002; also in Proceedings of the National
Academy of Sciences, 99, pp. 16524-16530, available at http://www.stat.wisc.edu/~wahba (see TRLIST).
- Weiss, G.M. and Provost, F. (2003), Learning when Training Data
are Costly: The Effect of Class Distribution on Tree Induction, Journal of
Artificial Intelligence Research, Volume 19, pp. 315-354.
- Yan, L., Dodier, R., Mozer, M.C. and Wolniewicz, R. (2003), Optimizing
Classifier Performance via an Approximation to the Wilcoxon-Mann-Whitney
Statistic, in Proceedings of the 20th International Conference on Machine
Learning.
- Zadrozny, B. and Elkan, C. (2001), Learning and Making Decisions When
Costs and Probabilities are Both Unknown, in Proceedings of the
Seventh International Conference on Knowledge Discovery and Data Mining.