How to Create a Fake News Detection System?

The world is changing at a rapid pace. The digital world undoubtedly has many benefits, but it also has drawbacks, and today's digital environment poses numerous challenges. One of them is fake news.

Data is crucial today: one widely cited estimate projected that by 2020 about 1.7 megabytes of data would be generated every second for every person on Earth. This massive volume of data has driven technologies that are changing the world. One of them is machine learning, which we use here to detect fake news.

What is Fake News?

At its simplest, fake news is false or misleading information presented as news. Nowadays, fake news spreads like wildfire because people share it without verifying it. It is frequently created to promote or impose specific beliefs, often in service of a political agenda.

Online advertising revenue depends on drawing users to media organizations' websites, which gives publishers an incentive to favor attention-grabbing stories over accurate ones. As a result, it is vital to be able to recognize fake news.

How to Create a Fake News Detection System?

Python provides a number of libraries for building a working fake news detection system. The steps below walk through the whole process, from loading the data to classifying a new piece of news.

Step 1: Importing Libraries

Fake_News_Detection_System_Img_1.

The table below describes these libraries:

Libraries: Functionality

pandas: A Python library offering fast, flexible, and expressive data structures that make working with "relational" or "labeled" data simple and intuitive.

numpy: A Python package for working with arrays. It also provides matrices, Fourier transforms, and functions for linear algebra.

seaborn: A plotting library built on top of Matplotlib. It is useful for visualizing statistical distributions.

matplotlib: A plotting library for Python and its numerical mathematics extension NumPy. It offers an object-oriented API for embedding charts in applications that use general-purpose GUI toolkits such as Tkinter, wxPython, Qt, or GTK.

sklearn: The scikit-learn machine learning library. It includes a variety of classification, regression, and clustering methods, such as support vector machines, random forests, gradient boosting, k-means, and DBSCAN, and is built to work with Python's NumPy and SciPy scientific and numerical libraries.

train_test_split(): Splits a dataset into a training subset and a testing subset so that a model can be evaluated on data it did not see during training. It is a quick and simple way to compare the performance of different machine learning models on the same prediction problem.

accuracy_score: Computes the fraction of predictions that match the true labels. In multilabel classification it computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.

classification_report: Summarizes how good a classification algorithm's predictions are. It reports precision, recall, F1-score, and support for each class; these metrics are derived from the counts of true positives, false positives, true negatives, and false negatives.

re: Python's regular expression module. Its functions let you check whether a given string matches a given pattern, known as a regular expression, and search for or replace matching text.

string: A standard-library module of string constants and helpers. It is used here for string.punctuation when cleaning text. Much of the data you might examine is unstructured, human-readable text, and it must be preprocessed before you can analyze it programmatically. (For heavier natural language processing, the NLTK library, or Natural Language Toolkit, is a common choice.)
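Since the original import code was shown only as an image, the imports for this step might look roughly like the sketch below. The aliases (pd, np) are common conventions rather than something the article confirms, and the two plotting libraries are left commented out here as an assumption, because this sketch draws no charts.

```python
import re       # regular expressions, used when cleaning text
import string   # string constants such as string.punctuation

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Plotting libraries from the table; uncomment them if you plan to draw charts.
# import seaborn as sns
# import matplotlib.pyplot as plt
```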

Step 2: Importing the Dataset

Link for the Dataset: click here to download the dataset.

Fake_News_Detection_System_Img_2

The table below explains the functions used in this code:

Function: Description

pd.read_csv: Reads a comma-separated values (CSV) file into a DataFrame. It also optionally supports iterating over the file or reading it in chunks.

head(): Returns the first five rows of the DataFrame by default.
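As a minimal sketch of these two calls, the example below reads a small in-memory CSV instead of a file on disk so that it runs anywhere. The column names (title, text, subject, date) and the sample rows are assumptions made up for illustration; with the real dataset you would pass the path of the downloaded file to pd.read_csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded CSV file (columns are assumed).
sample_csv = io.StringIO(
    "title,text,subject,date\n"
    "Headline A,Made-up story one,News,2017-01-01\n"
    "Headline B,Made-up story two,News,2017-01-02\n"
    "Headline C,Made-up story three,Politics,2017-01-03\n"
    "Headline D,Made-up story four,Politics,2017-01-04\n"
    "Headline E,Made-up story five,News,2017-01-05\n"
    "Headline F,Made-up story six,News,2017-01-06\n"
)

data_fake = pd.read_csv(sample_csv)

print(data_fake.head())    # first five rows by default
print(data_fake.head(2))   # pass a number to choose how many rows to show
```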

Result:

1: Fake news data

2: True news data

Fake_News_Detection_System_Img_3

Fake_News_Detection_System_Img_4.

Step 3: Assigning Classes to the Dataset

Fake_News_Detection_System_Img_5
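One way to assign classes is to add a label column to each DataFrame, as sketched below. The column name "class" and the convention 0 = fake, 1 = true are assumptions for this example; the tiny DataFrames stand in for the two loaded datasets.

```python
import pandas as pd

# Tiny stand-ins for the two loaded datasets.
data_fake = pd.DataFrame({"text": ["made-up story one", "made-up story two"]})
data_true = pd.DataFrame({"text": ["verified report one", "verified report two"]})

# Assumed label convention: 0 = fake news, 1 = true news.
data_fake["class"] = 0
data_true["class"] = 1

print(data_fake)
print(data_true)
```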

Step 4: Checking Number of Rows and Columns in the Dataset

Fake_News_Detection_System_Img_6

The table below explains what is used in this code:

Function: Description

shape: An attribute (not a method, so it takes no parentheses) that returns a tuple with the number of rows and columns in the dataset.
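A short illustration of shape on two stand-in DataFrames (the data is invented for the example):

```python
import pandas as pd

data_fake = pd.DataFrame({"text": ["a", "b", "c"], "class": [0, 0, 0]})
data_true = pd.DataFrame({"text": ["d", "e"], "class": [1, 1]})

# shape is a (rows, columns) tuple; note there are no parentheses after it.
print(data_fake.shape, data_true.shape)

n_rows, n_cols = data_fake.shape
```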

Result:

Fake_News_Detection_System_Img_7

Step 5: Manual Testing for Both Datasets

Fake_News_Detection_System_Img_8

        

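Manual testing

In this tutorial, manual testing means setting aside a handful of rows from each dataset before training, so that the finished model can later be checked by hand on articles it has never seen. More generally, in software engineering, manual testing is the process of manually checking software for faults: a tester acts like an end user and exercises most of the application's capabilities to ensure proper behavior.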
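The hold-out idea can be sketched as follows. This is an assumed approach, not necessarily the code from the original screenshot: the variable names and the hold-out size of 5 rows are hypothetical, and the DataFrames are synthetic.

```python
import pandas as pd

# Synthetic stand-ins for the two labeled datasets.
data_fake = pd.DataFrame({"text": [f"made-up story {i}" for i in range(20)], "class": 0})
data_true = pd.DataFrame({"text": [f"verified report {i}" for i in range(20)], "class": 1})

N_HOLDOUT = 5  # hypothetical number of rows kept aside per dataset

# Keep the last rows for manual checks, then remove them from the training data.
data_fake_manual_testing = data_fake.tail(N_HOLDOUT)
data_fake = data_fake.iloc[:-N_HOLDOUT]

data_true_manual_testing = data_true.tail(N_HOLDOUT)
data_true = data_true.iloc[:-N_HOLDOUT]

print(data_fake.shape, data_fake_manual_testing.shape)
```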

Step 6: Assigning Classes to the Dataset

Fake_News_Detection_System_Img_9

Step 7: Merging Both Datasets

Fake_News_Detection_System_Img_10

The table below explains the function used in this code:

pd.concat: Concatenates two or more datasets along an axis. Here it stacks the two datasets on top of each other to form a single one.
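A minimal sketch of the merge on two stand-in DataFrames. Passing ignore_index=True is a choice made for this example so that the combined rows get a fresh 0..n-1 index; the name data_merge is hypothetical.

```python
import pandas as pd

data_fake = pd.DataFrame({"text": ["made-up story one", "made-up story two"], "class": 0})
data_true = pd.DataFrame({"text": ["verified report one", "verified report two"], "class": 1})

# axis=0 stacks the rows of the second DataFrame under those of the first.
data_merge = pd.concat([data_fake, data_true], axis=0, ignore_index=True)

print(data_merge)
```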

Result:

Fake_News_Detection_System_Img_11

Step 8: Dropping Unwanted Columns

Fake_News_Detection_System_Img_12

The table below explains the function used in this code:

drop: The drop() method removes the specified rows or columns. To remove columns, name them and specify the column axis (axis='columns' or axis=1).
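For example, assuming the merged data has title, subject, and date columns that the model does not need (these column names are an assumption), they could be dropped like this:

```python
import pandas as pd

data_merge = pd.DataFrame({
    "title": ["Headline A", "Headline B"],
    "text": ["made-up story", "verified report"],
    "subject": ["News", "Politics"],
    "date": ["2017-01-01", "2017-01-02"],
    "class": [0, 1],
})

# drop() returns a new DataFrame; the original is left unchanged.
data = data_merge.drop(["title", "subject", "date"], axis="columns")

print(data.columns.tolist())
```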

Step 9: Create a Function to Clean Text

Fake_News_Detection_System_Img_13.

The table below explains the functions used in this code:

Function: Description

lower(): Returns a copy of the string converted to lowercase.

re.escape(): Escapes all special characters in a string so that they are matched literally in a regular expression.

re.sub(): The sub() function from Python's regular expression (re) module. It returns a string in which every match of the supplied pattern is replaced by the replacement string. The re module must be imported before this function can be used.

string.punctuation: A pre-initialized string constant (not a function) that contains all ASCII punctuation characters.
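The four building blocks in the table can be combined into a cleaning function along the lines below. This is a sketch under assumptions: the function name clean_text and the exact set of patterns (URLs, HTML tags, punctuation, tokens containing digits) are illustrative choices, not necessarily what the original screenshot contained.

```python
import re
import string


def clean_text(text):
    """Lowercase the text and strip URLs, HTML tags, punctuation and digit tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"<.*?>", " ", text)                    # HTML tags
    # re.escape makes every punctuation character literal inside the [...] class.
    text = re.sub("[%s]" % re.escape(string.punctuation), " ", text)
    text = re.sub(r"\w*\d\w*", " ", text)                 # tokens containing digits
    text = re.sub(r"\s+", " ", text)                      # collapse whitespace
    return text.strip()


print(clean_text("BREAKING: Visit https://example.com NOW!!! <b>2nd</b> update"))
```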

Step 10: Applying Function to Text Column and Assigning X and Y

Fake_News_Detection_System_Img_14

The table below explains the function used in this code:

apply(): This method serves a purpose similar to Python's map(): it applies a function passed as an argument across a pandas object. On a single column (a Series), it calls the function on each value. On a whole DataFrame, you need to specify the axis along which the function should be applied (axis=0 for each column, axis=1 for each row).
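A small sketch of this step follows. The clean_text function here is a deliberately simplified stand-in for the cleaning function from Step 9, and the column names text and class are assumptions.

```python
import pandas as pd


def clean_text(text):
    # Simplified stand-in for the Step 9 cleaning function.
    return text.lower().strip()


data = pd.DataFrame({
    "text": ["  Breaking NEWS ", "Official Report"],
    "class": [0, 1],
})

# Series.apply calls clean_text once per value in the column.
data["text"] = data["text"].apply(clean_text)

x = data["text"]    # features: the cleaned article text
y = data["class"]   # target: the label for each article

print(x.tolist(), y.tolist())
```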

Step 11: Defining Training and Testing Data and Splitting Them Into a 75-25 Percent Ratio

Fake_News_Detection_System_Img_15
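A 75-25 split can be sketched with train_test_split by setting test_size=0.25. The toy data and the random_state value (used only to make the split repeatable) are assumptions for this example.

```python
from sklearn.model_selection import train_test_split

x = [f"article text {i}" for i in range(8)]
y = [0, 1, 0, 1, 0, 1, 0, 1]

# test_size=0.25 keeps 75 percent of the rows for training.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42
)

print(len(x_train), len(x_test))
```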

Step 12: Converting Raw Data Into Matrix for Further Process

Fake_News_Detection_System_Img_16

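The table below explains the functions used in this code:

TfidfVectorizer: Turns a collection of raw documents into a matrix of TF-IDF features. TF-IDF (term frequency-inverse document frequency) gives more weight to words that are frequent in a document but rare across the whole collection.

fit_transform: Learns the vocabulary and IDF weights from the training data and returns the training data transformed into a TF-IDF matrix. The test data is then converted with transform() so that it uses the same learned vocabulary.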
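The sketch below shows the usual pattern on invented sentences: fit_transform on the training text, then transform (not fit_transform) on the test text. The variable names xv_train and xv_test are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

x_train = [
    "aliens secretly control the city council",
    "city council approves new transit budget",
    "miracle pill cures every disease overnight",
]
x_test = ["council approves miracle budget"]

vectorizer = TfidfVectorizer()
xv_train = vectorizer.fit_transform(x_train)  # learn vocabulary and IDF, then transform
xv_test = vectorizer.transform(x_test)        # reuse the learned vocabulary

# Both matrices have one row per document and one column per vocabulary word.
print(xv_train.shape, xv_test.shape)
```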

Step 13: Creating First Model

Fake_News_Detection_System_Img_17.

The table below explains the functions used in this code:

LogisticRegression: Based on a collection of independent variables, logistic regression estimates the likelihood of an event occurring, such as voting or not voting. Because the outcome is a probability, the dependent variable is limited to values between 0 and 1.

LR.fit: Trains the logistic regression model on the training data. Linear regression fits a line to the data in order to predict a new quantity, whereas logistic regression fits a decision boundary that optimally separates the two classes. The input data is given by X with n examples, and the output by y with one label for each input.
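A minimal sketch of training the first model on a toy corpus. The sentences and labels are invented, the label convention 0 = fake, 1 = true is an assumption, and the name LR follows the table above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "aliens secretly control the city council",
    "miracle pill cures every disease overnight",
    "celebrity clone spotted at secret moon base",
    "city council approves new transit budget",
    "health agency publishes annual vaccination report",
    "space agency schedules satellite launch for spring",
]
labels = [0, 0, 0, 1, 1, 1]  # assumed convention: 0 = fake, 1 = true

vectorizer = TfidfVectorizer()
xv_train = vectorizer.fit_transform(texts)

LR = LogisticRegression()
LR.fit(xv_train, labels)

# Each row holds the estimated probability of class 0 and of class 1.
probabilities = LR.predict_proba(xv_train)
print(probabilities.round(2))
```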

Step 14: Checking the Model Accuracy and Classification Report

Fake_News_Detection_System_Img_18.
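Checking accuracy and printing the classification report could look roughly like this. The data is a toy example, so the numbers it prints say nothing about performance on the real dataset; zero_division=0 is passed only to keep the report quiet if a class is never predicted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

x_train = [
    "aliens secretly control the city council",
    "miracle pill cures every disease overnight",
    "celebrity clone spotted at secret moon base",
    "city council approves new transit budget",
    "health agency publishes annual vaccination report",
    "space agency schedules satellite launch for spring",
]
y_train = [0, 0, 0, 1, 1, 1]
x_test = ["secret miracle pill hidden by aliens", "council publishes annual budget report"]
y_test = [0, 1]

vectorizer = TfidfVectorizer()
xv_train = vectorizer.fit_transform(x_train)
xv_test = vectorizer.transform(x_test)

LR = LogisticRegression().fit(xv_train, y_train)
pred_lr = LR.predict(xv_test)

accuracy = accuracy_score(y_test, pred_lr)   # same value as LR.score(xv_test, y_test)
report = classification_report(y_test, pred_lr, zero_division=0)

print(accuracy)
print(report)
```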

Result:

Logistic Regression Model Accuracy:

Fake_News_Detection_System_Img_19.

Classification Report:

Fake_News_Detection_System_Img_20.

Step 15: Creating a Second Model

Fake_News_Detection_System_Img_21

The table below explains the class used in this code:

DecisionTreeClassifier: The DecisionTreeClassifier class can perform multi-class classification on a dataset. If several classes share the same, highest probability, the classifier predicts the class with the lowest index among them.
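The second model can be trained in the same way as the first; only the estimator changes. In this sketch the name DT, the toy corpus, and random_state=0 (for repeatable tree construction) are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier

texts = [
    "aliens secretly control the city council",
    "miracle pill cures every disease overnight",
    "city council approves new transit budget",
    "health agency publishes annual vaccination report",
]
labels = [0, 0, 1, 1]  # assumed convention: 0 = fake, 1 = true

vectorizer = TfidfVectorizer()
xv_train = vectorizer.fit_transform(texts)

DT = DecisionTreeClassifier(random_state=0)
DT.fit(xv_train, labels)

# Predict the class of a new, unseen sentence.
new_features = vectorizer.transform(["council secretly approves miracle pill"])
pred_dt = DT.predict(new_features)
print(pred_dt)
```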

Step 16: Checking the Model Accuracy and Classification Report

Fake_News_Detection_System_Img_22.

Result:

DecisionTreeClassifier Model Accuracy:

Fake_News_Detection_System_Img_23

Classification Report:

Fake_News_Detection_System_Img_24

Step 17: Checking Fake News

Fake_News_Detection_System_Img_25

Fake_News_Detection_System_Img_26.

Here you enter any piece of news text as input to check whether the models classify it as fake or not.
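A checking function of this kind might be sketched as below. Everything here is illustrative: the function names output_label and manual_testing, the label strings, the simplified cleaning step, and the toy training corpus are all assumptions, and the 0 = fake, 1 = true convention is carried over from the earlier sketches.

```python
import re
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


def clean_text(text):
    # Simplified cleaning: lowercase, strip punctuation, collapse whitespace.
    text = re.sub("[%s]" % re.escape(string.punctuation), " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def output_label(n):
    # Assumed convention: 0 = fake, anything else = not fake.
    return "Fake News" if n == 0 else "Not Fake News"


def manual_testing(news, vectorizer, models):
    """Clean one news string, vectorize it, and ask every model for a label."""
    features = vectorizer.transform([clean_text(news)])
    return {name: output_label(model.predict(features)[0]) for name, model in models.items()}


texts = [
    "Aliens secretly control the city council!",
    "Miracle pill cures every disease overnight!",
    "City council approves new transit budget.",
    "Health agency publishes annual vaccination report.",
]
labels = [0, 0, 1, 1]

vectorizer = TfidfVectorizer()
xv_train = vectorizer.fit_transform([clean_text(t) for t in texts])
models = {
    "LR": LogisticRegression().fit(xv_train, labels),
    "DT": DecisionTreeClassifier(random_state=0).fit(xv_train, labels),
}

result = manual_testing("Council approves budget for new transit line", vectorizer, models)
for name, label in result.items():
    print(f"{name} prediction: {label}")
```

For interactive use, the news string could come from input() instead of being hard-coded.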

Example:

Fake_News_Detection_System_Img_27.

Conclusion

Overall, this article explains how you can create your own fake news detection system using Python. If you want to enhance your skills further, you can check out Simplilearn’s Professional Certificate Program in AI and Machine Learning. This course will help you hone the essential skills and make you job-ready. Do you have any questions for us? Please mention them in the comments section of the "Your Own Fake News Detection System" article, and we'll have our experts answer them at the earliest.

About the Author

Mayank Banoula

Mayank is a Research Analyst at Simplilearn. He is proficient in machine learning and artificial intelligence with Python.
