SMS Spam Classification

Project Domain / Category

Data Science/Machine Learning

Abstract / Introduction

SMS (Short Message Service) is one of the most cost-effective and widely used services on mobile networks. It offers a high response rate and a high level of anonymity, and it is a reliable and personal service. These same qualities have encouraged unsolicited SMS, commonly known as spam SMS, which causes a variety of problems for mobile users. Identifying spam messages is one of the harder problems in internet and wireless networks.

In this project, we will use text classification algorithms in Python to identify and classify spam messages. By applying suitable algorithms (such as NaiveBayes, NaiveBayesMultinomial, and J48, among others) to the SMS data set, we will measure accuracy, running time, and error rate, and compare which approach is better for text categorization. The SMS spam collection data set includes both spam and non-spam texts.
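The comparison described above (accuracy, time, and error rate per algorithm) could be organised around a small evaluation helper. The following is a minimal sketch, not the project's prescribed method: the `evaluate` helper, the two stand-in classifiers, and the tiny `TRAIN` and `TEST` lists are all hypothetical and exist only to show the shape of the comparison. In the real project, Naive Bayes or a decision tree would be plugged in through the same `fit` and `predict` pair.

```python
import time

def evaluate(name, fit, predict, train_rows, test_rows):
    """Train one classifier, then report accuracy, error rate, and elapsed time."""
    start = time.perf_counter()
    model = fit(train_rows)
    predictions = [predict(model, text) for _, text in test_rows]
    elapsed = time.perf_counter() - start
    correct = sum(1 for (label, _), p in zip(test_rows, predictions) if label == p)
    accuracy = correct / len(test_rows)
    return {"name": name, "accuracy": accuracy,
            "error_rate": 1 - accuracy, "seconds": elapsed}

# Stand-in classifier 1 (hypothetical): always predict the most common training label.
def fit_majority(train_rows):
    labels = [label for label, _ in train_rows]
    return max(sorted(set(labels)), key=labels.count)

def predict_majority(model, text):
    return model

# Stand-in classifier 2 (hypothetical): flag words seen only in spam training messages.
def fit_keywords(train_rows):
    spam_words, other_words = set(), set()
    for label, text in train_rows:
        target = spam_words if label == "spam" else other_words
        target.update(text.lower().split())
    return spam_words - other_words

def predict_keywords(model, text):
    return "spam" if model & set(text.lower().split()) else "ham"

# Made-up messages for illustration; "ham" is used here to mean non-spam.
TRAIN = [
    ("spam", "free prize call now"),
    ("spam", "claim free cash"),
    ("ham", "call me at home"),
    ("ham", "lunch at noon"),
    ("ham", "see you now"),
]
TEST = [
    ("spam", "free cash today"),
    ("ham", "see you at lunch"),
]

results = [
    evaluate("majority baseline", fit_majority, predict_majority, TRAIN, TEST),
    evaluate("keyword rule", fit_keywords, predict_keywords, TRAIN, TEST),
]
best = max(results, key=lambda r: r["accuracy"])
for r in results:
    print(f'{r["name"]}: accuracy={r["accuracy"]:.2f}, '
          f'error={r["error_rate"]:.2f}, time={r["seconds"]:.6f}s')
print("best by accuracy:", best["name"])
```

Keeping every algorithm behind the same `fit` and `predict` interface means each one is timed and scored in exactly the same way, which keeps the comparison fair.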

Functional requirements:

  1. Pre-processing • Most real-world data is incomplete and noisy, with missing values, so the data must be pre-processed before it is used.
  2. Feature Selection • After the pre-processing stage, we apply a feature selection method, which in this case is the Best First Feature Selection algorithm.
  3. Use Spam Filtering Algorithms
     • Handle Data: Load the dataset and split it into training and test datasets.
     • Summarize Data: Summarize the properties of the training dataset so that probabilities can be calculated and predictions generated.
     • Make a Prediction: Use the dataset summaries to make a single prediction.
     • Make Predictions: Generate predictions for a test dataset, given a summarized training dataset.
     • Assess Accuracy: Measure the accuracy of the predictions made for a test dataset as the percentage of all predictions that were correct.
  4. Train and Test • Split the data into 75 percent training data and 25 percent testing data.
  5. Confusion Matrix • Create a confusion matrix table to describe the performance of each classification model.
  6. Accuracy • Compare the accuracy of all algorithms.
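The steps listed above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: it uses a hand-written multinomial Naive Bayes with add-one smoothing, the `MESSAGES` list is made-up data standing in for the real SMS data set, all function names are hypothetical, and the Best First feature selection step is left out.

```python
import math
import random
import re
from collections import Counter, defaultdict

# Made-up corpus for illustration only; "ham" means non-spam.
MESSAGES = [
    ("spam", "WIN a FREE prize now, call now!"),
    ("spam", "Free entry: claim your cash prize today"),
    ("spam", "URGENT! You won free cash, call to claim"),
    ("spam", "Claim your free ringtone now"),
    ("ham", "Are we still meeting for lunch today?"),
    ("ham", "I'll call you when I get home"),
    ("ham", "Can you send me the lecture notes?"),
    ("ham", "See you at home tonight"),
]

def tokenize(text):
    """Pre-processing: lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z']+", text.lower())

def split_dataset(rows, train_ratio=0.75, seed=0):
    """Handle data: shuffle a copy and split it 75/25 into train and test."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

def summarize(train_rows):
    """Summarize data: count messages per class and words per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for label, text in train_rows:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab

def predict(summary, text):
    """Make a prediction: return the class with the highest log-probability."""
    class_counts, word_counts, vocab = summary
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def get_predictions(summary, test_rows):
    """Make predictions: classify every message in the test set."""
    return [predict(summary, text) for _, text in test_rows]

def accuracy(test_rows, predictions):
    """Assess accuracy: percentage of predictions that match the true label."""
    correct = sum(1 for (label, _), p in zip(test_rows, predictions) if label == p)
    return 100.0 * correct / len(test_rows)

def confusion_matrix(test_rows, predictions, labels=("ham", "spam")):
    """Confusion matrix as matrix[actual][predicted] -> count."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for (actual, _), predicted in zip(test_rows, predictions):
        matrix[actual][predicted] += 1
    return matrix

train, test = split_dataset(MESSAGES)
summary = summarize(train)
predictions = get_predictions(summary, test)
print("accuracy %:", accuracy(test, predictions))
print("confusion matrix:", confusion_matrix(test, predictions))
```

With only eight made-up messages the reported accuracy is not meaningful. The sketch is meant to show how each listed step maps to one small function, so that another algorithm can replace `summarize` and `predict` without changing the split, accuracy, or confusion matrix code.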

Tools:
Anaconda Python
Knowledge of Artificial Intelligence concepts and Machine Learning is required.
