ETRI Knowledge Sharing Platform

Title
Simple yet Effective Ensemble Feature Selection Using Hierarchical Binning
Authors
Jinho Park, Dohun Kim, Wonjong Kim
Issue Date
2026-04
Citation
Applied Sciences (Switzerland), v.16, no.7, pp.1-26
ISSN
2076-3417
Publisher
Multidisciplinary Digital Publishing Institute (MDPI)
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.3390/app16073404
Abstract
Feature selection is essential for improving classification performance and reducing overfitting in high-dimensional learning tasks. However, conventional importance-based methods often suffer from instability, model bias, and sensitivity to threshold settings. To address these limitations, we propose EFSHB (Ensemble Feature Selection using Hierarchical Binning), a hybrid ensemble framework that integrates importance-based sorting, bin-level greedy evaluation, iterative hierarchical refinement, and union-based integration of model-wise selected features. At each iteration, five tree-based models independently perform bin-wise greedy selection, and their selected subsets are merged through a union operation to form the feature set for the next iteration. This iterative process progressively refines the feature space while mitigating model-specific bias and promoting robust predictive performance across heterogeneous models. EFSHB was evaluated on nine high-dimensional benchmark datasets, including biomedical gene-expression, synthetic, proteomics, and speech-feature data. Across all datasets, EFSHB achieved the highest or near-highest classification accuracy, outperforming traditional Greedy Feature Selection (GFS), binning-based GFS (GFSB), and hierarchical binning GFS (GFSHB). On average, EFSHB improved accuracy for all classifiers, achieving mean gains of 14.0% over GFS and 13.3% over GFSHB. EFSHB also provided balanced feature reduction by avoiding excessive feature retention while preserving complementary informative features identified across models. In terms of computational efficiency, EFSHB reduced average feature selection time from 266 min (GFS) to 11 min, corresponding to a 24-fold speed-up. These results demonstrate that EFSHB achieves robust predictive performance and high computational efficiency, making it suitable for diverse high-dimensional applications.
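The abstract outlines the EFSHB loop: each tree-based model ranks features by importance, partitions the ranking into bins, greedily keeps bins that improve cross-validated accuracy, and the union of the model-wise selections becomes the feature space for the next iteration. The following is a minimal Python sketch of that loop under my own assumptions — the bin count, cross-validation setup, greedy acceptance rule, and choice of three (rather than five) tree-based models are illustrative defaults, not the paper's exact settings.

```python
# Hedged sketch of the EFSHB iteration described in the abstract.
# Assumptions (not from the paper): n_bins=5, 3-fold CV accuracy as the
# greedy criterion, and three scikit-learn tree ensembles as base models.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier,
                              ExtraTreesClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import cross_val_score


def bin_greedy_select(model, X, y, features, n_bins=5, cv=3):
    """Rank `features` by the model's importances, split the ranking into
    bins, and greedily keep each bin only if adding it improves CV accuracy."""
    model.fit(X[:, features], y)
    order = np.argsort(model.feature_importances_)[::-1]
    ranked = [features[i] for i in order]
    bins = np.array_split(ranked, n_bins)
    selected, best = list(bins[0]), -np.inf
    for b in bins:
        trial = sorted(set(selected) | set(b))
        score = cross_val_score(model, X[:, trial], y, cv=cv).mean()
        if score > best:  # keep the bin only when accuracy improves
            selected, best = trial, score
    return set(selected)


def efshb(X, y, models, n_iter=2):
    """Each iteration: every model performs bin-wise greedy selection on the
    current feature space; the union of their picks feeds the next round."""
    feats = list(range(X.shape[1]))
    for _ in range(n_iter):
        union = set()
        for m in models:
            union |= bin_greedy_select(m, X, y, feats)
        feats = sorted(union)
    return feats


X, y = make_classification(n_samples=200, n_features=40, n_informative=8,
                           random_state=0)
models = [RandomForestClassifier(n_estimators=50, random_state=0),
          ExtraTreesClassifier(n_estimators=50, random_state=0),
          GradientBoostingClassifier(n_estimators=50, random_state=0)]
chosen = efshb(X, y, models)
print(len(chosen), "of", X.shape[1], "features kept")
```

The union step is what the abstract credits for mitigating model-specific bias: a feature discarded by one model survives if any other model retains it, while features no model keeps are pruned from the next iteration's search space.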
Keyword
ensemble feature selection, AI-driven data processing, machine learning for data management, high-dimensional data analysis, big data analytics, hierarchical binning
KSP Keywords
Benchmark datasets, Classification Performance, Computational Efficiency, Data Management, Data analysis, Data processing, Ensemble feature selection, Feature data, Feature retention, Feature space, Greedy selection
This work is distributed under the terms of the Creative Commons License (CC BY).