Cosine Similarity and Data Balancing Technique for the Augmentation of ML Models in the Identification of Cerebro Vascular Accident
| Published in | 2024 International Conference on IoT Based Control Networks and Intelligent Systems (ICICNIS) pp. 1476 - 1482 |
|---|---|
| Main Authors | , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 17.12.2024 |
| Subjects | |
| DOI | 10.1109/ICICNIS64247.2024.10823348 |
| Summary: | Cerebrovascular accident, commonly referred to as stroke, is a leading cause of mortality and irreversible disability. Prompt detection of this condition is essential in mitigating its adverse effects. Stroke is classified as either hemorrhagic, arising from the rupture of blood vessels, or ischemic, arising from a plaque-like clot that develops within the arteries, so it is imperative that the condition be recognized as quickly as possible. The International Stroke Trial (IST) dataset, restricted to the salient features relevant to the subject matter, comprises a total of 19,435 rows and 112 columns. Data preprocessing techniques, including data imputation, data labeling, and various other methods, are applied to the assembled data to convert the raw data into a usable format. The dataset also exhibits class imbalance, a phenomenon that is often inherent in clinical datasets. To address this issue, the Synthetic Minority Oversampling Technique (SMOTE) is applied, which also makes it possible to explore how training on skewed data can yield high accuracy that misrepresents the model's robustness. In addition, cosine similarity is employed to identify highly correlated features. Because early identification of ischemic stroke is highly recommended, machine learning algorithms such as Support Vector Machine (SVM), K-Nearest Neighbour (KNN), AdaBoost classifier (ABC), and Logistic Regression (LR) are implemented. The SVM, when applied in conjunction with SMOTE and cosine similarity, achieved an accuracy of 86.4%. |
|---|---|
| DOI: | 10.1109/ICICNIS64247.2024.10823348 |
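
The two techniques named in the summary, cosine similarity for spotting near-duplicate features and SMOTE for rebalancing classes, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names, the feature names, the 0.95 threshold, and the greedy keep-or-drop rule are all assumptions made for illustration, and the oversampler only reproduces the core interpolation idea of SMOTE rather than the full algorithm.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def drop_similar_features(columns, threshold=0.95):
    """Greedily keep a feature only if it is not too similar to one already kept.

    `columns` maps a (hypothetical) feature name to its list of values,
    one value per row. The threshold is an assumed value, not from the paper.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(cosine_similarity(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between a minority-class
    row and one of its k nearest minority neighbours.

    Requires at least two minority rows.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base` by Euclidean distance.
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical binary features: f_a_scaled is a rescaled copy of f_a.
columns = {
    "f_a": [1, 0, 1, 0],
    "f_a_scaled": [2, 0, 2, 0],
    "f_b": [0, 1, 0, 1],
}
print(drop_similar_features(columns))  # ['f_a', 'f_b']

# Three minority-class rows, extended with five synthetic ones.
minority_rows = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(len(smote_like_oversample(minority_rows, n_new=5)))  # 5
```

Raw cosine similarity is high for any pair of strictly positive features, so in practice the columns would probably need to be centred or encoded first. Synthetic rows should also be generated from the training split only, so that the reported accuracy is not inflated by synthetic copies leaking into the test set.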