Faculty Members Publication
Can a Simple Approach Perform Better for Cross-Project Defect Prediction?
We introduce a transfer learning technique, correlation alignment, in software defect prediction.
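As a rough illustration of what correlation alignment does, the sketch below whitens source-project features and re-colours them with the target-project covariance. It is a minimal NumPy sketch, not the paper's implementation: the function names, the regularisation term `eps`, and the random feature matrices are all assumptions made for the example.

```python
import numpy as np

def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals ** power) @ eigvecs.T

def coral(source, target, eps=1.0):
    """Align the second-order statistics of `source` features to those of `target`."""
    dim = source.shape[1]
    # eps * I keeps both covariance matrices invertible (an assumed regulariser).
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)
    whitened = source @ _sym_matrix_power(cov_s, -0.5)  # remove source correlations
    return whitened @ _sym_matrix_power(cov_t, 0.5)     # apply target correlations

# Hypothetical feature matrices standing in for two projects' defect metrics.
rng = np.random.default_rng(0)
source = rng.normal(size=(300, 3)) @ np.array([[2.0, 0.3, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 0.7]])
target = rng.normal(size=(300, 3)) @ np.array([[0.5, 0.0, 0.0], [0.2, 1.5, 0.0], [0.1, 0.4, 1.0]])
aligned = coral(source, target, eps=1e-6)
```

A classifier trained on `aligned` (with the source labels) would then be applied to the target project; that step is omitted here.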
Automatic Regression Parameter Selection: A Divide and Conquer based Approach
Manual selection of the optimal hyperparameter in regression (Lasso, Ridge, Elastic Net) is time-consuming as well as error-prone. In this work we introduce a "divide and conquer" based approach to select the hyperparameter automatically and efficiently.
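The summary above does not describe the algorithm itself. One plausible reading of a divide-and-conquer search is to split the hyperparameter range into parts, keep the region around the best candidate, and repeat. The sketch below follows that reading only; the function name, the log-scale range, and the toy loss are hypothetical, and the search is only reliable when the loss has a single minimum in the range.

```python
def divide_and_conquer_search(loss, low, high, parts=4, tol=1e-3):
    """Repeatedly narrow [low, high] around the grid point with the lowest loss.

    `loss` maps log10(alpha) to a validation error and is assumed unimodal.
    """
    while high - low > tol:
        step = (high - low) / parts
        candidates = [low + i * step for i in range(parts + 1)]
        best = min(candidates, key=loss)
        # Keep only the neighbourhood of the best candidate (the "divide" step).
        low, high = max(low, best - step), min(high, best + step)
    return (low + high) / 2

# Hypothetical validation error with its minimum at log10(alpha) = 0.5.
def toy_loss(log_alpha):
    return (log_alpha - 0.5) ** 2

best_log_alpha = divide_and_conquer_search(toy_loss, low=-4.0, high=4.0)
best_alpha = 10 ** best_log_alpha
```

In practice `toy_loss` would be replaced by a cross-validated error of the regression model at the given regularisation strength.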
Threat and abusive language detection on social media in Bengali language
Abstract:
Threat and abusive languages spread quickly through social media, which can be controlled if we can detect and remove them. Since there are many social media platforms such as Facebook, Twitter, and Instagram, and a huge number of social media users, we need a robust and effective automatic system to identify threat and abusive languages. In our proposed system, Machine Learning and Natural Language Processing techniques have been implemented to build an automatic system. Previous research on Bengali abusive language detection used Multinomial Naïve Bayes (MNB) and Support Vector Machine (SVM) algorithms and considered Bengali Unicode characters to build their system. We considered both Unicode emoticons and Unicode Bengali characters as valid input in our proposed system. Besides the MNB and SVM algorithms, we implemented a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). Among the three algorithms, SVM with a linear kernel performed best, with 78% accuracy.
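To make the idea of accepting both Bengali characters and emoticons concrete, the sketch below keeps only characters from the Unicode Bengali block (U+0980 to U+09FF) and the Emoticons block (U+1F600 to U+1F64F). The exact ranges and the function names are assumptions for illustration; the abstract does not state which code points the authors accepted.

```python
BENGALI_BLOCK = (0x0980, 0x09FF)      # Unicode "Bengali" block
EMOTICON_BLOCK = (0x1F600, 0x1F64F)   # Unicode "Emoticons" block (assumed range)

def is_valid_char(ch):
    """Return True for Bengali letters/signs, emoticons, and whitespace."""
    code = ord(ch)
    return (
        BENGALI_BLOCK[0] <= code <= BENGALI_BLOCK[1]
        or EMOTICON_BLOCK[0] <= code <= EMOTICON_BLOCK[1]
        or ch.isspace()
    )

def clean_text(text):
    """Drop every character outside the accepted ranges and collapse whitespace."""
    kept = "".join(ch for ch in text if is_valid_char(ch))
    return " ".join(kept.split())

# The Latin word and punctuation are removed; the Bengali word and emoticon stay.
cleaned = clean_text("ভালো movie 😀!!")
```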
The Challenges and Approaches during the Detection of Cyberbullying Text for Low-resource Language: A Literature Review
Abstract:
Objective: The primary intent of this paper is to review studies on the detection of five variants of cyberbullying text (abusive, hateful, aggressive, bully, and toxic comments or texts) in the Bengali language, taken as a sample low-resource language, to gain a comprehensive understanding of the challenges and state-of-the-art approaches used to identify these types of text. Materials: We searched for articles on cyberbullying text detection in the Bengali language published from 2017 to July 2021, since no research in this domain was found before 2017. We then screened the candidates by inspecting the title, abstract, and full text to select the studies included in this review. Results: After applying different levels of filtering to the initial search results, 28 domain-centric papers were retained out of 2,745 documents. We first analyze the context of each study in depth and then give a comparative review of research challenges and approaches, as well as directions for future work on the detection of cyberbullying text in the Bengali language. Conclusion: In this paper, we discuss five variants of cyberbullying text (abusive text, hateful speech, aggressive text, bully text, and toxic comments over the web) and their detection process by studying the existing literature in this domain. We present advice on dataset preparation, preprocessing and feature extraction tasks, and classifier selection that may aid comprehensive research for better detection.
Link: https://ph01.tci-thaijo.org/index.php/ecticit/article/view/248039
Opinion Mining: Is Feature Engineering Still Relevant?
Abstract:
This paper manifests the experimentation with sentiment polarity detection over Stanford's IMDB movie review dataset using a Support Vector Machine classifier (SVM). Our prime motivation was to find out the best possible combinations of classic features and preprocessing techniques for the classification of positive and negative opinions. We also explored two variants of kernels with numerous parameter settings for the classifier in the hope of getting the best SVM model. Our best model achieved an accuracy score of 85.45%. The results indicate that a model with a non-linear Radial Basis Function (RBF) kernel leads to the highest accuracy. The features that contributed the most are stemmed word n-grams.
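For readers unfamiliar with word n-gram features, the following sketch shows how unigrams and bigrams can be extracted from a review. It is an illustration only: stemming is omitted because the abstract does not name the stemmer, and the function name and sample sentence are hypothetical.

```python
def word_ngrams(text, n_values=(1, 2)):
    """Return all word n-grams of the requested sizes, in order of appearance."""
    tokens = text.lower().split()
    grams = []
    for n in n_values:
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams

features = word_ngrams("A truly great movie")
# Four unigrams followed by three bigrams such as "truly great".
```

Counts or TF-IDF weights of such n-grams would typically form the input vector for an SVM classifier.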
Education Certification and Verified Documents Sharing System by Blockchain
The emergence of new and improved technological advances created severe problems in the security state of the educational certification system. Throughout this paper, a proposal has been made to improve security. Here, Blockchain technology has been introduced as reliable secure storage for the educational certification system, providing an additional facility to the users. That is the validation and authentication of the student’s academic records. Moreover, for security purposes, Blockchain technology can replace the traditional academic certification system and contribute to a new model for sharing student information. After completion of data inclusion and hashing, the blocks will be inserted into the Blockchain network. This proposed model enhances document security and fraud reduction and additionally reduces a significant amount of authentication time almost up to double the current speed. With this system, we will get a certification process in which all data will be digitalized and secured in an unbreakable database with proper authentication and with a noticeable amount of time efficiency.
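The hashing and block-insertion step mentioned above can be pictured with a small hash chain. This is a minimal sketch, not the proposed system: the record fields, the SHA-256 choice, and the function names are assumptions, and a real blockchain network adds consensus and distribution on top of this.

```python
import hashlib
import json

def make_block(record, previous_hash):
    """Bundle a record with the previous block's hash and compute this block's hash."""
    payload = json.dumps({"record": record, "previous_hash": previous_hash}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"record": record, "previous_hash": previous_hash, "hash": digest}

def is_chain_valid(chain):
    """Check every stored hash and every link to the preceding block."""
    for index, block in enumerate(chain):
        expected = make_block(block["record"], block["previous_hash"])["hash"]
        if block["hash"] != expected:
            return False
        if index > 0 and block["previous_hash"] != chain[index - 1]["hash"]:
            return False
    return True

# Hypothetical academic records; the first block links to an all-zero hash.
chain = [make_block({"student": "S-001", "degree": "BSc"}, "0" * 64)]
chain.append(make_block({"student": "S-002", "degree": "MSc"}, chain[-1]["hash"]))
valid_before_tampering = is_chain_valid(chain)
```

Changing any stored record afterwards makes `is_chain_valid` return `False`, which is the property that supports fraud detection.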
An ML-based decision support system for reliable diagnosis of ovarian cancer by leveraging explainable AI
Ovarian cancer (OC) is one of the most prevalent types of cancer in women. Early and accurate diagnosis is crucial for the survival of the patients. However, the majority of women are diagnosed in advanced stages due to the lack of effective biomarkers and accurate screening tools. While previous studies sought a common biomarker, our study suggests different biomarkers for the premenopausal and postmenopausal populations. This can provide a new perspective in the search for novel predictors for the effective diagnosis of OC. Genetic algorithm has been utilized to identify the most significant biomarkers. The XGBoost classifier is then trained on the selected features and high ROC-AUC scores of 0.864 and 0.911 have been obtained for the premenopausal and postmenopausal populations, respectively. Lack of explainability is one major limitation of current AI systems. The stochastic nature of the ML algorithms raises concerns about the reliability of the system as it is difficult to interpret the reasons behind the decisions. To increase the trustworthiness and accountability of the diagnostic system as well as to provide transparency and explanations behind the predictions, explainable AI has been incorporated into the ML framework. SHAP is employed to quantify the contributions of the selected biomarkers and determine the most discriminative features. Merging SHAP with the ML models enables clinicians to investigate individual decisions made by the model and gain insights into the factors leading to that prediction. Thus, a hybrid decision support system has been established that can eliminate the bottlenecks caused by the black-box nature of the ML algorithms providing a safe and trustworthy AI tool. The diagnostic accuracy obtained from the proposed system outperforms the existing methods as well as the state-of-the-art ROMA algorithm by a substantial margin which signifies its potential to be an effective tool in the differential diagnosis of OC.
A CNN Based Model for Plant Disease Classification using Transfer Learning
Global food security is seriously threatened by plant diseases, which annually cause large losses in agricultural productivity. Early diagnosis and accurate classification of plant diseases are required for disease management programs to be implemented promptly and efficiently. In the area of plant disease classification, Convolutional Neural Networks (CNN) have demonstrated encouraging results in recent years. In this study, we propose a CNN based approach for plant disease classification using a MobileNetV2 based model and transfer learning. The proposed model leverages the MobileNetV2 architecture, known for its lightweight and efficient design, making it well-suited for resource-constrained environments. The pre-trained MobileNetV2 model is modified using transfer learning to accommodate the goal of classifying plant diseases. The model benefits from the characteristics that have been learned from a large-scale dataset through the use of pre-trained weights, leading to improved generalization and reduced training time. We use a standard plant disease dataset with a filtering method as a preprocessing strategy in extended trials to assess the efficiency of the proposed approach. The performance of the model is compared using several cutting-edge techniques, including VGG16, AlexNet and InceptionV3. The experimental findings show that the suggested model performs competitively in classifying plant diseases, surpassing other approaches with an accuracy of 98.56%.
A Transformer Based Model for Twitter Sentiment Analysis using RoBERTa
In recent years, social media platforms, particularly Twitter, have emerged as crucial sources of public opinion and sentiment. Analyzing sentiment in Twitter data presents a significant challenge due to the platform's inherent characteristics, such as brevity, informality, and the prevalence of slang and emojis. This research paper proposes a method for Twitter sentiment analysis by leveraging the power of a transformer-based model called RoBERTa. The proposed strategy employs RoBERTa due to its exceptional performance in various natural language processing tasks. Our system captures intricate contextual information and semantic nuances in tweets, making it well-suited for sentiment analysis on this challenging platform. To build an effective sentiment analysis system, the architecture is fine-tuned using a large corpus of Twitter data annotated with sentiment labels. Additionally, we explore various strategies to handle the unique characteristics of Twitter data, including tokenization, handling of hashtags, user mentions, and URLs, as well as the incorporation of emojis and emoticons. We compare the performance of our model with three other standard machine learning and deep learning models, namely Decision Tree (DT), Support Vector Machine (SVM), and Long Short-Term Memory (LSTM), in order to show that our model is superior at correctly analyzing Twitter sentiment. The model achieves an accuracy of 96.78%, highlighting its effectiveness in understanding and classifying sentiment within the context of tweets.
Enhancing E-Commerce Text Classification: A GRU-Based Approach for Improved Product Understanding
In the burgeoning landscape of e-commerce, the ability to accurately classify product texts is paramount for enhancing user experience and driving business success. Traditional approaches to text classification often struggle with the nuances and complexities inherent in e-commerce product descriptions. In this paper, we propose a novel approach utilizing Gated Recurrent Unit (GRU) to address these challenges and improve product understanding in e-commerce text classification tasks. Our model leverages the inherent sequential nature of product descriptions, effectively capturing long-range dependencies and semantic relationships within the text. We use a standard dataset in extended trials to demonstrate the superiority of our GRU-based approach over conventional methods in terms of classification accuracy and robustness across diverse product categories. Furthermore, we conduct comprehensive analyses to gain insights into the inner workings of our model and its ability to learn meaningful representations of e-commerce text data. The performance of the model is compared using several cutting-edge techniques, including Support Vector Machine (SVM), Random Forest (RF), and Long Short-Term Memory (LSTM) in order to show that our model is superior at correctly classifying e-commerce texts. The experimental findings show that the suggested model performs competitively in classifying e-commerce texts, surpassing other approaches with an accuracy of 98.35%. Our findings underscore the potential of GRU-based approaches for advancing the state-of-the-art in e-commerce text classification, offering promising avenues for future research and practical applications in the domain.
Conference Papers
Nishat Tasnim, Asraf Ullah Rahat, Dr. Md. Musfique Anwar “Retrieving Top K% Relevant Patterns for Relation Extraction in Bangla using Distant Supervision”, International Conference on Signal Processing, Information, Communication and System (SPICSON) 2024.
Journal Publication
Nishat Tasnim, Asraf Ullah Rahat, Tanjim Taharat Aurpa, Dr. Md. Musfique Anwar “Bangla-REX: A Distinct Dataset for Bangla Relation Extraction”, Data in Brief, 2025.
Conference proceedings
- M. A. K. Rifat, A. Kabir, and A. Huq, “An Explainable Machine Learning Approach to Traffic Accident Fatality Prediction,” Procedia Computer Science, vol. 246, pp. 1905–1914, 2024, doi: https://doi.org/10.1016/j.procs.2024.09.704. [Presented at the 28th International Conference on Knowledge Based and Intelligent Information and Engineering Systems (KES 2024), as part of a special issue.]
Prevalence and User Perception of Dark Patterns: A Case Study on E-Commerce Websites of Bangladesh
Y. Sazid and K. Sakib
19th International Conference on Evaluation of Novel Approaches to Software Engineering | ENASE 2024
Commit Classification into Maintenance Activities Using In-Context Learning Capabilities of LLMs
Y. Sazid, S. Kuri, K. S. Ahmed, and A. Satter
19th International Conference on Evaluation of Novel Approaches to Software Engineering | ENASE 2024
Automated Detection of Dark Patterns Using In-Context Learning Capabilities of GPT-3
Y. Sazid, M. M. N. Fuad, and K. Sakib
30th Asia-Pacific Software Engineering Conference | APSEC 2023
Journal
Sanzana Karim Lora, M. Sohel Rahman, Rifat Shahriyar, “ConVerSum: A Contrastive Learning based Approach for Data-Scarce Solution of Cross-Lingual Summarization Beyond Direct Equivalents”, ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 24, Issue 5, Article No.: 50, Pages 1 - 22, May 2025.
Journal
Sanzana Karim Lora, G. M. Shahariar, Tamanna Nazmin, Noor Nafeur Rahman, Rafsan Rahman, Miyad Bhuiyan, Faisal Muhammad Shah; “Ben-Sarc: A Self-Annotated Corpus for Sarcasm Detection from Bengali Social Media Comments and Its Baseline Evaluation”, Natural Language Processing.
Journal
Sanzana Karim Lora, Ishrat Jahan, Rahad Hussain, Rifat Shahriyar, A.B.M Alim Al Islam; “A transformer-based generative adversarial learning to detect sarcasm from Bengali text with correct classification of confusing text”, Heliyon, volume 9, issue 12.
Conference Paper
Sanzana Karim Lora, Istiak Ahmed, Muhammad Abdullah Adnan, “Short Paper: A Cloud-based Distributed Approach for Social Media Sentiment Analysis using Machine Learning with Distributed Hyperparameter Tuning”, 11th International Conference on Networking, Systems, and Security. (11th NSysS 2024), Khulna, Bangladesh, ACM, New York, NY, USA.
Conference Paper
Sanzana Karim Lora, Nusrat Jahan, Shahana Alam Antora, Nazmus Sakib, “Detecting Emotion of Users’ Analyzing Social Media Bengali Comments Using Deep Learning Techniques”, in Proceedings of the 2nd International Conference on Advanced Information and Communication Technology 2020 (ICAICT 2020), Dhaka, Bangladesh, 2020, pp. 88-93.
Structure and dynamics of financial networks by feature ranking method
MI Rakib, A Nobi, JW Lee (2021). Structure and dynamics of financial networks by feature ranking method. Published on: Scientific Reports - 11, pp: 17618. doi.org/10.1038/s41598-021-
Genome-wide identification and prediction of SARS-CoV-2 mutations show an abundance of variants: Integrated study of bioinformatics and deep neural learning
MS Hossain, A.Q.M. SU Pathan, MN Islam, MIQ Tonmoy, MI Rakib, NM Bahadur, et al (2021). Genome-wide identification and prediction of SARS-CoV-2 mutations show an abundance of variants: Integrated study of bioinformatics and deep neural learning. Published on: Informatics in Medicine Unlocked – 27, pp: 100798.
Feature ranking and network analysis of global financial indices
MI Rakib, MJ Hossain, A Nobi (2022). Feature ranking and network analysis of global financial indices. Published on: Plos One – 17(6), pp: e0269483. doi.org/10.1371/journal.pone.
Identification of comorbidities, genomic associations, and molecular mechanisms for COVID-19 using bioinformatics approaches
SBS Omit, S Akhter, HK Rana, ARM MH Rana, NK Podder, MI Rakib, A Nobi (2023). Identification of comorbidities, genomic associations, and molecular mechanisms for COVID-19 using bioinformatics approaches. Published on: BioMed Research International – 2023, pp: 6996307. doi.org/10.1155/2023/6996307 [Q2, IF: 3.41]
Modular structures of trade flow networks in international commodities
ZM Koli, A Nobi, MI Rakib, MJ Alam, JW Lee (2023). Modular structures of trade flow networks in international commodities. Published on: Sustainability – 15(22), pp: 15786. doi.org/10.3390/su152215786 [Q1, IF: 3.9]
Change in hierarchy of the financial networks: A study on firms of an emerging market in Bangladesh
MI Rakib, MJ Alam, N Akter, KH Tuhin, A Nobi (2024). Change in hierarchy of the financial networks: A study on firms of an emerging market in Bangladesh. Published on: Plos One – 19(5), pp: e0301725. doi.org/10.1371/journal.pone.
Long Short-Term Memory Autoencoder Based Network of Financial Indices
KH Tuhin, A Nobi, MI Rakib, JW Lee (2025). Long Short-Term Memory Autoencoder Based Network of Financial Indices. Published on: Humanit Soc Sci Commun–12(100). https://doi.org/10.1057/
Structure of global financial networks before and during COVID-19 based on mutual information
SS Hassan, MI Rakib, KH Tuhin, A Nobi (2023). Structure of global financial networks before and during COVID-19 based on mutual information. Published in: The Proceedings of International Conference on Machine Intelligence and Emerging Technologies (MIET 2022), LNICST – 491, Springer. doi.org/10.1007/978-3-031-
Entropy and relative entropy in the composition of commodities
MI Rakib, M Akter, SA Milu, A Nobi (2024). Entropy and relative entropy in the composition of commodities. Published in: The Proceedings of 26th International Conference on Computer and Information Technology (ICCIT), Cox’s Bazar, IEEE Bangladesh. PP. 1-6. doi.org/10.1109/ICCIT60459.
Effect of network size on comparing different stock networks
KH Tuhin, A Nobi, MJ Sadique, MI Rakib, JW Lee (2024). Effect of network size on comparing different stock networks. Published on: Plos One – 18(12), pp: e0288733. doi.org/10.1371/journal.pone.
Greener and energy-efficient data center for blockchain-based cryptocurrency mining
Cryptocurrency mining data centers consume 100-200 times more energy than conventional office areas annually. Regulating power consumption, cooling mechanisms, and thermal control performance is crucial to creating a greener and more energy-efficient crypto-mining data center. This paper presents a new cryptocurrency mining data center design that is both environmentally friendly and energy-efficient. The design considers popular green and energy-saving data center cooling and temperature management approaches, as well as cost-effective operations. The total monthly cost of the proposed data center is 358025 USD, with renewable energy generating 68520 kW of electricity. The monthly profit from Bitcoin mining is 3200806.969 USD, while Ethereum mining is 2317353.503 USD. The PUE number is 1.04, and the DCiE is 96.15 percent. These statistics help determine the model’s conclusion.
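The two efficiency figures quoted above are related: DCiE is the reciprocal of PUE expressed as a percentage. The short check below reproduces the reported DCiE from the reported PUE; the function name is only illustrative.

```python
def dcie_from_pue(pue):
    """DCiE (in percent) = IT energy / total facility energy = 100 / PUE."""
    return 100.0 / pue

dcie = dcie_from_pue(1.04)
print(f"DCiE = {dcie:.2f}%")  # prints "DCiE = 96.15%"
```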
An Energy-Efficient Virtual Machine Scheduling Algorithm in Cloud Data Center
Power consumption has a significant influence on resource allocation, which has a negative effect on the environment. To lessen the negative effects, an effective resource allocation algorithm is needed. In this paper, we propose a hybrid approach for energy-efficient scheduling of virtual machines (VMs) in cloud data centers called Energy Efficient Particle Swarm Optimization (EE-PSO), which combines Genetic Algorithms (GA) and Particle Swarm Optimization (PSO). This integration makes use of both algorithms' advantages to improve scheduling effectiveness and energy consumption. The novelty lies in how GA and PSO are combined, which differs from previous attempts where these algorithms were either used separately or in a less integrated manner. EE-PSO represents a significant step toward achieving the goals of decreasing power usage and promoting green …
A Comparative Study of Convolutional Neural Network Architectures for Detecting Prostate
Prostate cancer is one of the most prevalent cancers among men, so early detection is crucial for effective treatment. This study aims to detect prostate cancer using four Convolutional Neural Network (CNN) architectures. We evaluated our trained models and found a low Root Mean Square Error (RMSE) of 2.9960 on the validation set, indicating that our model can accurately detect prostate cancer in medical images. Our study suggests a promising prostate cancer detection model that could help improve patients' early diagnosis and treatment outcomes.
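For reference, RMSE is the square root of the mean squared difference between predictions and ground truth. The sketch below computes it on made-up numbers; these values are purely illustrative and are unrelated to the study's data.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error between two equally long sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative values only: errors are 1, 0 and -2, so the mean square is 5/3.
example_rmse = rmse([3, 5, 2], [2, 5, 4])
```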
Multi-Class Brain Tumour Classification with CNN on MRI Scans
The process of classifying brain tumors using MRI data is difficult to do accurately. The challenges and limitations of manual analysis in treatments can be overcome by implementing Computer-Aided Diagnosis (CAD) systems for categorizing brain tumors. This study assesses the performance of five well-known Transfer Learning CNN architectures - VGG19, VGG16, GoogleNet, ResNet-50, and DenseNet-121 - using magnetic resonance imaging (MRI) scans. The main goal is to classify brain malignancies into four groups: no tumor, pituitary, meningioma, and glioma. The focus of this research for the classification task was on data preparation, training on preprocessed data, and comparing each model using evaluation metrics such as f1-score, accuracy, recall, and precision. Following preprocessing and training, the evaluation of the five CNN models revealed promising results for the VGG models, with both …
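The evaluation metrics named above can be computed per class from true and predicted labels. The following sketch uses the four class names from the abstract with invented labels; the function names and the sample predictions are assumptions for illustration.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class in a multi-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Invented labels, not results from the study.
y_true = ["glioma", "glioma", "meningioma", "pituitary", "no tumor", "no tumor"]
y_pred = ["glioma", "meningioma", "meningioma", "pituitary", "no tumor", "glioma"]
glioma_scores = per_class_metrics(y_true, y_pred, "glioma")
overall_accuracy = accuracy(y_true, y_pred)
```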
SwiftCNN: A Deep Learning Model for B-ALL Diagnosis and Subtype Classification from Blood Smear Images
The most advanced methods available today are convolutional neural networks (CNNs), which are frequently used for image categorization tasks. This article uses sophisticated neural network models to explore the categorization of peripheral blood smear pictures for B-ALL diagnosis and its subtypes. We introduce a method for classifying images using a modified VGG19 model. Images are pre-processed before being input into the multi-class classification algorithms. We discovered throughout this study that the suggested methods improve model performance. Our study focused on two types of images, benign and malignant, as well as three subtypes of malignant lymphoblasts: Early Pre-B, Pre-B, and Pro-B ALL. The model has a validation loss of 0.1499 and a validation accuracy of 94.63%, whereas its training loss and training accuracy are 0.1127 and 96.97%, respectively. These findings demonstrate how well the VGG19-based model …
White Blood Cells to Classify Leukemic Blood Images Using Deep Learning and Image Processing
White Blood Cell (WBC) count is a significant task in identifying leukemia, a widely known malignancy that can be devastating gradually. Infantile WBCs existing in the sponge tissues of bone marrow affect the superfluous expansion, which in turn produces leukemia cancer. Deep learning and image processing techniques can be applied in this field to detect leukemic blood and generate outstanding outcomes. Leukemia occurs from the leukocyte blood type, which is one kind of white blood cell. This proposed system introduces a method of classifying leukemic blood images and counting the number of white blood cells in a leukemic blood image. This is a hybrid procedure combining deep learning and image processing techniques. A collection of 221 blood cell images available on a website known as ‘RaabinData’ is used, which were collected from patients at the Takht-e Tavoos Medical School …
Deep Learning Models for Classification of Red Blood Cells in Microscopy Images for Anemia Diagnosis
Anemia, a condition affecting human red blood cells (RBCs), presents in various forms. Different blood cell types and anemia variants exist. Addressing such a significant challenge requires integrating pathophysiology, advanced technology, and a comprehensive understanding of RBC classifications. In this pursuit, we utilize Deep Learning (DL) models to establish connections and propose innovative solutions to pathophysiological issues related to anaemia diagnosis through RBC classification. The customized Convolutional Neural Network (CNN) demonstrates exceptional performance, boasting a Training Accuracy of 99.42% and a Test Accuracy of 98.88%. Furthermore, the Training Loss is impressively low at 0.0232, while the Validation Loss remains minimal at 0.0964. The associated confusion matrix attests to the model's robust performance, affirming its accuracy in classification tasks.
Flattening the Recall Line Using a Voting Classifier for Forest Cover Type Data
To address the challenge of flattening the recall line in Forest Cover Type data classification, this study focuses on the application of a Voting Classifier. Forest cover is crucial for biodiversity preservation and climate regulation, and accurate classification of forest cover types is essential for effective forest management. The paper utilizes a dataset containing attributes related to forest cover, and models such as K-Nearest Neighbors (KNN), Extra Tree, Random Forest (RF), and Extreme Gradient Boosting (XGBoost) are employed. However, the individual performance of the models varies for recall. To overcome this, a Voting Classifier is introduced, which combines the predictions from multiple models using a majority or weighted vote. The experiments demonstrate the effectiveness of the Voting Classifier in flattening the recall line and enhancing the accuracy of forest cover type classification.
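The majority and weighted voting described above can be sketched in a few lines. This is an illustration with invented predictions and weights, not the study's models or data; the function names are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of labels per model; returns one label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def weighted_vote(predictions, weights):
    """Sum each model's weight behind its predicted label and pick the heaviest label."""
    combined = []
    for votes in zip(*predictions):
        scores = Counter()
        for label, weight in zip(votes, weights):
            scores[label] += weight
        combined.append(max(scores, key=scores.get))
    return combined

# Invented cover-type predictions from three models for four samples.
knn = [1, 2, 2, 3]
extra_tree = [1, 2, 3, 3]
random_forest = [2, 2, 3, 1]

majority = majority_vote([knn, extra_tree, random_forest])
weighted = weighted_vote([knn, extra_tree, random_forest], weights=[0.2, 0.2, 0.7])
```

With equal influence the two agreeing models win each sample; with a heavy weight on one model, that model can override the other two, which is the trade-off a weighted vote introduces.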
Green Task Scheduling Algorithm in Green-Cloud
Cloud-dedicated servers can better meet green computing standards by being ecologically friendly. “Green cloud computing” refers to utilizing information technology and other technological achievements to help the environment. Task scheduling is one of the biggest challenges in cloud-based systems that must be addressed to improve system efficacy and user experience. The primary goal of this study is to develop an algorithm that focuses on minimizing green cloud computing execution times while remaining environmentally friendly. We compared task scheduling algorithms based on execution time, such as FCFS, SJF, and Round Robin, to the approach we recommended, the generalized priority (GP) algorithm. We evaluated our technique using the CloudSim 3.0.3 simulator. The algorithm we suggested has the shortest runtime of all the algorithms, at 97.91.
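To show why scheduling order matters, the sketch below compares the average waiting time of FCFS and SJF on a single machine when all tasks arrive together. It is a toy model with invented task lengths; it does not use CloudSim and does not implement the generalized priority algorithm, whose details are not given in the abstract.

```python
def average_waiting_time(burst_times):
    """Average time tasks spend waiting when run back to back in the given order."""
    waited, elapsed = 0, 0
    for burst in burst_times:
        waited += elapsed   # this task waited for everything scheduled before it
        elapsed += burst
    return waited / len(burst_times)

bursts = [8, 3, 5, 1]  # hypothetical task lengths, all arriving at time 0

fcfs_wait = average_waiting_time(bursts)          # First Come First Served
sjf_wait = average_waiting_time(sorted(bursts))   # Shortest Job First
```

Here FCFS gives an average wait of 8.75 time units and SJF gives 3.5, since running short tasks first keeps fewer tasks waiting behind long ones.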