
Original Research
Received: 14 Apr 2026, Accepted: 02 Jun 2026
 


Android Malware Detection Using CTGAN-Based Data Augmentation and Autoencoder-Driven Feature Extraction

Shirina Samreen.

SUPPLEMENTARY FILES :

  • Supplementary file - 1

  • Abstract
    The rapid growth of Android applications has led to a significant increase in malware threats, making accurate and robust detection mechanisms essential for mobile security. However, challenges such as class imbalance and high-dimensional feature spaces limit the effectiveness of traditional machine learning approaches.

    This work proposes a machine learning pipeline for Android malware detection that integrates generative data augmentation and deep feature extraction with classical classification models. Conditional Tabular Generative Adversarial Networks (CTGAN) are used to synthetically balance TUANDROMD, a permission- and API-based feature dataset developed at Tezpur University from real benign and malicious Android applications. An autoencoder then learns compact, discriminative latent representations from the original 241 numerical features, reducing dimensionality and redundancy. The extracted features are used to train several machine learning classifiers, including Logistic Regression, Random Forest, and XGBoost, so that model performance can be compared.
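As a rough illustration of what "synthetically balancing" a dataset involves, the sketch below computes how many synthetic rows each class would need for every class to match the majority class. This is only the bookkeeping step that would precede sampling from a generator such as CTGAN; it does not train or call one. The function name `synthetic_rows_needed` and the label counts are hypothetical and are not taken from the paper or from TUANDROMD.

```python
from collections import Counter


def synthetic_rows_needed(labels):
    """Return {class: rows to generate} so every class matches the largest one."""
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical label distribution, NOT the actual TUANDROMD class counts.
labels = ["malware"] * 8 + ["benign"] * 3
needed = synthetic_rows_needed(labels)
print(needed)
```

In practice the resulting per-class budget would be passed to a conditional generator, which is asked for that many rows of the under-represented class.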

    The models are assessed using accuracy, precision, recall, and F1-score under stratified validation and holdout testing. Four experimental configurations are investigated: (i) baseline classification using raw features, (ii) CTGAN-based data augmentation, (iii) autoencoder-based feature extraction, and (iv) CTGAN-based augmentation followed by autoencoder-driven feature extraction. Experimental results show that the combined CTGAN and autoencoder pipeline significantly improves minority-class detection while maintaining high overall accuracy. These findings indicate that integrating generative augmentation with learned feature representations is an effective strategy for handling high-dimensional, imbalanced Android malware datasets.
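The four metrics named above can be computed directly from predictions. The minimal sketch below, using made-up labels rather than results from the paper, shows why minority-class precision, recall, and F1-score are reported alongside accuracy on imbalanced data: two classifiers with the same accuracy can differ sharply in how well they detect the minority class. The function `binary_metrics` and the toy label vectors are assumptions introduced for illustration.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy plus precision, recall and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Define a metric as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data: 8 majority-class samples (0) and 2 minority-class samples (1).
y_true = [0] * 8 + [1] * 2

# Classifier A always predicts the majority class.
always_majority = binary_metrics(y_true, [0] * 10, positive=1)

# Classifier B finds one minority sample at the cost of one false alarm.
mixed = binary_metrics(y_true, [0] * 7 + [1] + [1, 0], positive=1)

print(always_majority)
print(mixed)
```

Both toy classifiers reach 0.8 accuracy, but only the second has non-zero recall on the minority class, which is the kind of difference the minority-class metrics are meant to expose.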

    Key words: Android Malware Detection; CTGAN; Autoencoder; Data Augmentation; Feature Extraction; Ensemble Learning; Imbalanced Data


     


    How to Cite this Article
    Pubmed Style

    Shirina Samreen. Android Malware Detection Using CTGAN-Based Data Augmentation and Autoencoder-Driven Feature Extraction. Journal of Engineering and Applied Sciences. 2026; 13(Recent Trends in Computational Modelling of Thermo-Fluid Systems and Nanofluid Applications): 33-45. doi:10.5455/jeas.2025060104


    Web Style

    Shirina Samreen. Android Malware Detection Using CTGAN-Based Data Augmentation and Autoencoder-Driven Feature Extraction. https://jecasmu.org/?mno=317366 [Access: June 27, 2026]. doi:10.5455/jeas.2025060104

