Artificial Intelligence

Proactive Insider Threat Detection in Cloud Environments Using Machine Learning

Abstract

Insider threats represent one of the most persistent and difficult-to-detect security challenges in cloud-based enterprise environments. Unlike external attacks, insider activities exploit legitimate access credentials, which renders traditional signature-based detection mechanisms largely ineffective. This paper presents an end-to-end insider threat detection framework that combines unsupervised machine learning with Security Information and Event Management (SIEM) visualization in a simulated cloud context. The study uses the CERT Insider Threat Dataset v6.2, developed and maintained by the Software Engineering Institute (SEI) at Carnegie Mellon University. This dataset is widely used as a benchmark in academic insider threat research because it models realistic enterprise user behavior and provides ground-truth labels. A structured data pipeline is designed that covers data preprocessing, feature engineering through label encoding, and controlled threat injection to emulate four realistic insider attack behaviors: excessive data access (GetObject spamming), privilege escalation (AssumeRole misuse), mass deletion (DeleteObject bombing), and abnormal login activity. Two unsupervised anomaly detection models, Isolation Forest and One-Class Support Vector Machine (SVM), are trained on normal behavioral patterns and evaluated against both authentic and injected anomalous events. Experimental results show that Isolation Forest achieves a weighted F1-score of 0.92 with balanced precision and recall, while the One-Class SVM achieves an F1-score of 0.81 with higher recall than Isolation Forest. The F1-score is adopted as the primary evaluation metric because it suits highly imbalanced security datasets, where both false negatives and excessive false positives carry significant operational cost. Model predictions are exported and ingested into Splunk Cloud, where custom dashboards provide real-time, analyst-oriented threat visualization.
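The abstract names label encoding, controlled threat injection, Isolation Forest scoring, and the F1-score, but gives no implementation details. The following is a minimal, standard-library-only Python sketch of how those steps might fit together; it is not the authors' code. The per-window count features, the names `window_features` and `IsolationForestSketch`, the synthetic "normal" data (not CERT data), the injected magnitudes, and the 95th-percentile threshold are all hypothetical. The forest follows the original Isolation Forest algorithm (Liu, Ting and Zhou, 2008): random feature, random split value, and score 2^(-E[h(x)]/c(ψ)). A real pipeline would more likely use a library implementation such as scikit-learn's `IsolationForest`, and the One-Class SVM is not sketched here. Note also that `f1_score` below computes binary F1 on the anomaly class, not the weighted F1 reported above.

```python
import math
import random

# Label encoding, assumed to mirror scikit-learn's LabelEncoder (classes sorted).
def label_encode(values):
    """Map each distinct categorical value to an integer code."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

CODES = label_encode(["GetObject", "AssumeRole", "DeleteObject", "Login"])

def window_features(events, login_hour):
    """Hypothetical features for one user-window: per-activity counts plus login hour."""
    counts = [0] * len(CODES)
    for activity in events:
        counts[CODES[activity]] += 1
    return counts + [login_hour]

def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(X, depth, max_depth, rng):
    """Grow one isolation tree: random varying feature, random split value in its range."""
    if depth >= max_depth or len(X) <= 1:
        return ("leaf", len(X))
    varying = [q for q in range(len(X[0])) if min(x[q] for x in X) != max(x[q] for x in X)]
    if not varying:
        return ("leaf", len(X))
    q = rng.choice(varying)
    p = rng.uniform(min(x[q] for x in X), max(x[q] for x in X))
    left = [x for x in X if x[q] < p]
    right = [x for x in X if x[q] >= p]
    return ("node", q, p, build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(x, node, depth=0):
    """Edges traversed to reach a leaf, plus c(size) for the unexpanded leaf."""
    if node[0] == "leaf":
        return depth + c(node[1])
    _, q, p, left, right = node
    return path_length(x, left if x[q] < p else right, depth + 1)

class IsolationForestSketch:
    def __init__(self, n_trees=100, sample_size=64, seed=0):
        self.n_trees, self.sample_size = n_trees, sample_size
        self.rng = random.Random(seed)

    def fit(self, X):
        # Trees are grown only from (assumed) normal behaviour windows.
        self.psi = min(self.sample_size, len(X))
        max_depth = math.ceil(math.log2(self.psi))
        self.trees = [build_tree(self.rng.sample(X, self.psi), 0, max_depth, self.rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """s(x) = 2^(-E[h(x)] / c(psi)); values close to 1 suggest anomalies."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c(self.psi))

def f1_score(y_true, y_pred):
    """Binary F1 for the anomaly class (the paper reports a weighted F1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Synthetic stand-in for normal windows (NOT CERT data).
rng = random.Random(42)

def normal_window():
    events = (["GetObject"] * rng.randint(5, 15) + ["AssumeRole"] * rng.randint(0, 1)
              + ["DeleteObject"] * rng.randint(0, 2) + ["Login"])
    return window_features(events, login_hour=rng.randint(8, 18))

normal = [normal_window() for _ in range(200)]
train, held_out = normal[:150], normal[150:]
injected = [  # controlled injection of the four behaviours (hypothetical magnitudes)
    window_features(["GetObject"] * 400 + ["Login"], 10),     # GetObject spamming
    window_features(["AssumeRole"] * 25 + ["Login"], 11),     # AssumeRole misuse
    window_features(["DeleteObject"] * 300 + ["Login"], 14),  # DeleteObject bombing
    window_features(["Login"] * 12, 3),                       # abnormal login activity
]

forest = IsolationForestSketch(seed=7).fit(train)
train_scores = sorted(forest.score(x) for x in train)
threshold = train_scores[int(0.95 * len(train_scores))]  # assumed 95th-percentile cut-off

y_true = [0] * len(held_out) + [1] * len(injected)
scores = [forest.score(x) for x in held_out + injected]
y_pred = [int(s > threshold) for s in scores]
print(f"threshold={threshold:.3f}  F1={f1_score(y_true, y_pred):.2f}")
```

Because the trees are grown only from normal windows, an injected window that lies outside the training range on a single feature is routed like the most extreme training points on that feature; windows that are extreme on several features at once are isolated in fewer splits and therefore score higher.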
A reference architecture for direct pipeline integration using Splunk's HTTP Event Collector (HEC) is also proposed, with the explicit acknowledgment that full live cloud deployment remains future work. For real-world enterprise and cloud environments, the framework shows practical applicability: it enables early detection of malicious insider behavior from existing audit logs and SIEM infrastructure, which reduces reliance on manual rule creation and lowers deployment costs. The results indicate that unsupervised anomaly detection combined with SIEM-based visualization offers a scalable and operationally feasible foundation for proactive insider threat mitigation, particularly in resource-constrained organizations that lack continuous labeled data or advanced threat intelligence feeds.
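The proposed HEC integration is described only at the architecture level. As a hedged illustration, the sketch below serializes model predictions into Splunk HTTP Event Collector event payloads and builds (but does not send) the authenticated POST request, using only the Python standard library. The endpoint URL, token, `sourcetype`, `index`, prediction record layout, and event field names are placeholders chosen for illustration. The `/services/collector/event` path, the `Authorization: Splunk <token>` header, and batching several events as concatenated JSON objects follow Splunk's HEC conventions, but the exact host and port differ between Splunk Cloud and Splunk Enterprise deployments.

```python
import json
import urllib.request

# Placeholders (assumptions): substitute a real HEC endpoint and token.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def build_hec_batch(predictions, sourcetype="insider_threat:prediction", index="main"):
    """Serialize predictions as HEC event objects, one JSON object per line.

    HEC accepts a batch as concatenated JSON objects rather than a JSON array.
    The sourcetype and event field names here are hypothetical.
    """
    events = []
    for p in predictions:
        events.append(json.dumps({
            "time": p["time"],  # epoch seconds
            "sourcetype": sourcetype,
            "index": index,
            "event": {
                "user": p["user"],
                "model": p["model"],
                "anomaly_score": p["score"],
                "is_anomaly": p["is_anomaly"],
            },
        }))
    return "\n".join(events)

def make_hec_request(body, url=HEC_URL, token=HEC_TOKEN):
    """Build, but do not send, an authenticated HEC POST request."""
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Authorization": f"Splunk {token}", "Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical prediction records for illustration.
sample_predictions = [
    {"time": 1700000000, "user": "user_001", "model": "IsolationForest", "score": 0.71, "is_anomaly": True},
    {"time": 1700000060, "user": "user_002", "model": "OneClassSVM", "score": 0.12, "is_anomaly": False},
]
body = build_hec_batch(sample_predictions)
req = make_hec_request(body)
# urllib.request.urlopen(req) would transmit the batch to the collector.
```

Keeping serialization separate from transport makes the payload easy to inspect offline; a live deployment would still need TLS certificate handling, retries, and checks on the collector's response.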

DOI: doi.org/10.63721/26JPAIR0125
