Abstract: Imbalanced learning deals with data sets in which samples are unevenly distributed among classes: most samples fall into a few classes, while the remaining classes contain only a few samples. Synthetic oversampling methods such as the Majority Weighted Minority Oversampling Technique (MWMOTE) address this problem. MWMOTE generates synthetic samples from weighted informative minority class samples by means of a clustering approach, using average-linkage agglomerative clustering to form the clusters. Agglomerative clustering, however, is not appropriate for large databases: it has high time complexity and is highly sensitive to noise. The proposed system therefore introduces a clustering algorithm that scales to large databases, namely Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH). BIRCH clusters incoming multi-dimensional metric data dynamically and produces the best clustering achievable with the available resources. Another approach, Random Under Sampling (RUS), reduces the size of the majority class by randomly eliminating majority class data points from the training data set. The combined use of oversampling and under sampling is called the Re-sampling Technique. The two methods are compared on 14 data sets taken from the UCI repository. Experimental results show that the proposed system is efficient in terms of time complexity and produces high-quality results.
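As an illustration of the Random Under Sampling step described in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `random_under_sample`, the `seed` parameter, and the choice to shrink every larger class down to the size of the smallest class are assumptions made for this example only.

```python
import random
from collections import Counter


def random_under_sample(samples, labels, seed=0):
    """Randomly drop points from larger classes until all classes match the smallest one.

    Assumption (not from the paper): the target size is the smallest class count.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # The smallest class determines how many points each class keeps.
    target = min(len(points) for points in by_class.values())

    out_samples, out_labels = [], []
    for y, points in by_class.items():
        # Larger classes are sampled without replacement; the smallest is kept whole.
        kept = rng.sample(points, target) if len(points) > target else list(points)
        out_samples.extend(kept)
        out_labels.extend([y] * len(kept))
    return out_samples, out_labels


if __name__ == "__main__":
    # Toy data: 8 majority-class points (label 0) and 2 minority-class points (label 1).
    X = [[i, i + 1] for i in range(10)]
    y = [0] * 8 + [1] * 2
    X_bal, y_bal = random_under_sample(X, y)
    print(Counter(y_bal))
```

Because under sampling discards majority class information, it is typically paired with an oversampling method, which is the combination the abstract refers to as the Re-sampling Technique.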
S. Lavanya and S. Palaniswami, 2016. Hierarchical Sampling Techniques for Imbalanced Datasets. Asian Journal of Information Technology, 15: 2887-2896.