ETRI Knowledge Sharing Platform

Title
Simple yet Effective Ensemble Feature Selection Using Hierarchical Binning
Authors
Jinho Park, Dohun Kim, Wonjong Kim
Issue Date
2026-04
Citation
Applied Sciences (Switzerland), v.16, no.7, pp.1-26
ISSN
2076-3417
Publisher
Multidisciplinary Digital Publishing Institute (MDPI)
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.3390/app16073404
Abstract
Feature selection is essential for improving classification performance and reducing overfitting in high-dimensional learning tasks. However, conventional importance-based methods often suffer from instability, model bias, and sensitivity to threshold settings. To address these limitations, we propose EFSHB (Ensemble Feature Selection using Hierarchical Binning), a hybrid ensemble framework that integrates importance-based sorting, bin-level greedy evaluation, iterative hierarchical refinement, and union-based integration of model-wise selected features. At each iteration, five tree-based models independently perform bin-wise greedy selection, and their selected subsets are merged through a union operation to form the feature set for the next iteration. This iterative process progressively refines the feature space while mitigating model-specific bias and promoting robust predictive performance across heterogeneous models. EFSHB was evaluated on nine high-dimensional benchmark datasets, including biomedical gene-expression, synthetic, proteomics, and speech-feature data. Across all datasets, EFSHB achieved the highest or near-highest classification accuracy, outperforming traditional Greedy Feature Selection (GFS), binning-based GFS (GFSB), and hierarchical binning GFS (GFSHB). On average, EFSHB improved accuracy for all classifiers, achieving mean gains of 14.0% over GFS and 13.3% over GFSHB. EFSHB also provided balanced feature reduction by avoiding excessive feature retention while preserving complementary informative features identified across models. In terms of computational efficiency, EFSHB reduced average feature selection time from 266 min (GFS) to 11 min, corresponding to a 24-fold speed-up. These results demonstrate that EFSHB achieves robust predictive performance and high computational efficiency, making it suitable for diverse high-dimensional applications.
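The abstract outlines the EFSHB loop: each tree-based model ranks features by importance, partitions the ranking into bins, greedily keeps bins that improve cross-validated accuracy, and the union of the model-wise selections becomes the feature space for the next iteration. The following is a minimal Python sketch of that loop under my own assumptions — the bin count, cross-validation setup, greedy acceptance rule, and choice of three (rather than five) tree-based models are illustrative defaults, not the paper's exact settings.

```python
# Hedged sketch of the EFSHB iteration described in the abstract.
# Assumptions (not from the paper): n_bins=5, 3-fold CV accuracy as the
# greedy criterion, and three scikit-learn tree ensembles as base models.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier,
                              ExtraTreesClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import cross_val_score


def bin_greedy_select(model, X, y, features, n_bins=5, cv=3):
    """Rank `features` by the model's importances, split the ranking into
    bins, and greedily keep each bin only if adding it improves CV accuracy."""
    model.fit(X[:, features], y)
    order = np.argsort(model.feature_importances_)[::-1]
    ranked = [features[i] for i in order]
    bins = np.array_split(ranked, n_bins)
    selected, best = list(bins[0]), -np.inf
    for b in bins:
        trial = sorted(set(selected) | set(b))
        score = cross_val_score(model, X[:, trial], y, cv=cv).mean()
        if score > best:  # keep the bin only when accuracy improves
            selected, best = trial, score
    return set(selected)


def efshb(X, y, models, n_iter=2):
    """Each iteration: every model performs bin-wise greedy selection on the
    current feature space; the union of their picks feeds the next round."""
    feats = list(range(X.shape[1]))
    for _ in range(n_iter):
        union = set()
        for m in models:
            union |= bin_greedy_select(m, X, y, feats)
        feats = sorted(union)
    return feats


X, y = make_classification(n_samples=200, n_features=40, n_informative=8,
                           random_state=0)
models = [RandomForestClassifier(n_estimators=50, random_state=0),
          ExtraTreesClassifier(n_estimators=50, random_state=0),
          GradientBoostingClassifier(n_estimators=50, random_state=0)]
chosen = efshb(X, y, models)
print(len(chosen), "of", X.shape[1], "features kept")
```

The union step is what the abstract credits for mitigating model-specific bias: a feature discarded by one model survives if any other model retains it, while features no model keeps are pruned from the next iteration's search space.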
Keyword
ensemble feature selection, AI-driven data processing, machine learning for data management, high-dimensional data analysis, big data analytics, hierarchical binning
KSP Keywords
Benchmark datasets, Classification Performance, Computational Efficiency, Data Management, Data analysis, Data processing, Ensemble feature selection, Feature data, Feature retention, Feature space, Greedy selection
This work is distributed under the terms of the Creative Commons License (CC BY).