Sampling approaches to learning from imbalanced datasets: active learning, cost sensitive learning and beyond.

Naoki Abe
TJ Watson Research Center, IBM



Many approaches to learning from imbalanced datasets are based on some form of sampling, including the well-known under-sampling and over-sampling methods. In this talk, we will review and compare some of these methods, drawing where appropriate on recent progress made on the subject with colleagues (B. Zadrozny and J. Langford). The methods covered include applications of active learning and cost-sensitive learning.
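
For readers unfamiliar with the two baseline methods named above, the following is a minimal sketch of plain random under-sampling and over-sampling. It is an illustration only, not the speaker's method: the function names, the `seed` parameter, and the choice to balance every class to the size of the rarest (or most common) class are all assumptions made for this example.

```python
import random
from collections import Counter


def _group_by_class(examples, labels):
    """Collect the examples belonging to each class label."""
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_undersample(examples, labels, seed=0):
    """Hypothetical helper: keep a random subset of each class,
    sized to match the rarest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(examples, labels)
    target = min(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        # Sample without replacement, so no example is repeated.
        out.extend((x, y) for x in rng.sample(xs, target))
    rng.shuffle(out)
    return out


def random_oversample(examples, labels, seed=0):
    """Hypothetical helper: pad each class with randomly repeated
    examples until it matches the most common class."""
    rng = random.Random(seed)
    by_class = _group_by_class(examples, labels)
    target = max(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        # Draw the extra copies with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs + extra)
    rng.shuffle(out)
    return out


if __name__ == "__main__":
    # Toy data: 90 examples of class 0 and 10 of class 1.
    examples = list(range(100))
    labels = [0] * 90 + [1] * 10
    print(Counter(y for _, y in random_undersample(examples, labels)))
    print(Counter(y for _, y in random_oversample(examples, labels)))
```

Under-sampling discards majority-class data, while over-sampling repeats minority-class examples; trade-offs of this kind are what motivate comparing the sampling methods.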