How to Create a Fake News Detection System?

The world is changing at a rapid pace. Without question, the digital world has many benefits, but it also has certain drawbacks. There are numerous challenges in today's digital environment.

Data is crucial today: by 2020, an estimated 1.7 megabytes of data was being generated every second for every person on Earth. This massive volume of data is driving technologies that are changing the world. One of them is machine learning, which we will use to detect fake news.

What is Fake News?

At its most basic, fake news is information that misleads people. Nowadays it spreads like wildfire, and people share it without verifying it. It is often created to promote or reinforce particular beliefs, frequently in service of a political agenda.

Fake news can also be profitable. Media websites earn online advertising revenue by drawing users to their pages, and sensational false stories attract clicks. For all these reasons, it is vital to be able to recognize fake news.


How to Create a Fake News Detection System?

Python provides a range of libraries for building a working fake news detection system. The steps below walk through the whole process, from importing the libraries and the dataset to training two models and using them to check a piece of news.

Step 1: Importing Libraries.
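The code for this step is not reproduced in the article. Below is a minimal sketch of the imports, assuming the conventional aliases `pd` and `np`. Seaborn, Matplotlib, and NLTK are left out because the sketches in this article do not plot anything or use NLTK. They can be imported the same way if needed.

```python
# Standard library modules used for text cleaning
import re       # regular expressions
import string   # provides string.punctuation

# Data handling
import numpy as np
import pandas as pd

# Model selection and evaluation utilities from scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# For plotting you would also add, for example:
#   import seaborn as sns
#   import matplotlib.pyplot as plt
```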


The libraries used in this step are described below:

pandas: A Python library that offers fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data simple and intuitive.

NumPy: A Python library for working with arrays. It also provides functions for linear algebra, Fourier transforms, and matrices.

Seaborn: A plotting library built on top of Matplotlib. It is used to visualize distributions in the data.

Matplotlib: A plotting library for Python and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications built with general-purpose GUI toolkits such as Tkinter, wxPython, Qt, or GTK.

scikit-learn (sklearn): A machine learning library that includes a variety of classification, regression, and clustering algorithms, such as support vector machines, random forests, gradient boosting, k-means, and DBSCAN. It is designed to work with Python's NumPy and SciPy scientific and numerical libraries.

train_test_split: Splits the data into training and testing subsets so that a machine learning model can be evaluated on examples it has not seen during training. This quick and simple procedure lets us estimate how well a model will perform on new data and compare the performance of different models.

accuracy_score: Computes the fraction of predictions that are correct. In multilabel classification, it computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.

classification_report: Builds a text report of the main classification metrics (precision, recall, F1-score, and support) for each class, showing how well the classifier's predictions match the true labels. These metrics are computed from the counts of true positives, false positives, true negatives, and false negatives.

re: Python's regular expression module. Its functions let you check whether a given string matches a given regular expression, and search for or replace matching text.

NLTK: The Natural Language Toolkit, a Python library for natural language processing (NLP). Much of the data you might analyze is unstructured, human-readable text, and it must be preprocessed before it can be evaluated programmatically.


Step 2: Importing the Dataset

Link for the Dataset: click here to download the dataset.


The functions used in this code are described below:

read_csv(): Reads a comma-separated values (CSV) file into a DataFrame. It also supports optionally iterating over the file or breaking it into chunks.

head(): Returns the first five rows of a DataFrame by default.


1: Fake news data

2: True news data
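Here is a sketch of this step. It assumes the dataset consists of two CSV files, called `Fake.csv` and `True.csv` here (hypothetical file names), with `title`, `text`, `subject`, and `date` columns. So that the example runs on its own, it reads two tiny in-memory CSV strings instead of the real files.

```python
import io
import pandas as pd

# Tiny stand-ins for the real files. With the downloaded dataset you would call
# pd.read_csv("Fake.csv") and pd.read_csv("True.csv") (hypothetical file names).
FAKE_CSV = """title,text,subject,date
Shock claim,Aliens endorse candidate,politics,2017-01-01
Miracle cure,One fruit cures every disease,health,2017-01-02
"""
TRUE_CSV = """title,text,subject,date
Budget passed,Parliament approved the annual budget,politicsNews,2017-01-03
Rates unchanged,The central bank left interest rates unchanged,worldnews,2017-01-04
"""

df_fake = pd.read_csv(io.StringIO(FAKE_CSV))
df_true = pd.read_csv(io.StringIO(TRUE_CSV))

# head() returns the first five rows by default (these toy frames have only two)
print(df_fake.head())
print(df_true.head())
```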



Step 3: Assigning Classes to the Dataset
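The code for this step is not shown. Below is a minimal sketch that adds a label column to each dataset. The column name `class` and the encoding (0 for fake news, 1 for true news) are assumptions for illustration.

```python
import pandas as pd

# Small stand-ins for the fake and true news DataFrames
df_fake = pd.DataFrame({"text": ["Aliens endorse candidate", "One fruit cures every disease"]})
df_true = pd.DataFrame({"text": ["Parliament approved the budget", "Rates left unchanged"]})

# Add a label column to every row: 0 = fake, 1 = true (assumed encoding)
df_fake["class"] = 0
df_true["class"] = 1

print(df_fake)
print(df_true)
```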


Step 4: Checking Number of Rows and Columns in the Dataset


The attribute used in this code is described below:

shape: Returns a tuple containing the number of rows and columns in the DataFrame, so it shows at a glance how large each dataset is.
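Below is a short illustration of `shape` on a small stand-in DataFrame. The data is made up for the example.

```python
import pandas as pd

df_fake = pd.DataFrame({
    "title": ["a", "b", "c"],
    "text": ["x", "y", "z"],
    "class": [0, 0, 0],
})

# shape is an attribute, not a method, so no parentheses are needed
rows, cols = df_fake.shape
print(df_fake.shape)  # (3, 3)
print(f"{rows} rows, {cols} columns")
```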



Step 5: Manual Testing for Both Datasets



Manual testing: The process of checking software for faults by hand. The tester acts like an end user, exercising most of the application's features to ensure that it behaves correctly.
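The article does not show the code for this step. One way to apply the idea here, assumed for illustration, is to set aside the last few rows of each dataset before training. Those rows can then be fed to the trained model by hand at the end and are dropped from the training data. The sketch holds out 10 rows, an arbitrary choice, and shows only the fake news data. The true news data would be handled the same way.

```python
import pandas as pd

# Stand-in data: 15 fake articles labeled 0
df_fake = pd.DataFrame({"text": [f"fake story {i}" for i in range(15)], "class": 0})

N_MANUAL = 10  # hypothetical number of rows to hold out

# Keep the last N rows aside for manual testing...
df_fake_manual_testing = df_fake.tail(N_MANUAL).copy()
# ...and remove them from the data used for training and evaluation
df_fake = df_fake.drop(df_fake_manual_testing.index)

print(df_fake.shape)                 # (5, 2)
print(df_fake_manual_testing.shape)  # (10, 2)
```

Dropping by the held-out rows' index labels avoids hard-coding row numbers, so the same code works whatever the dataset size.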


Step 6: Assigning Classes to the Dataset


Step 7: Merging Both Datasets


The function used in this code is described below:

concat(): Combines the two datasets into a single DataFrame by stacking their rows.
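Below is a minimal sketch of stacking the two labeled datasets with `pd.concat`, using tiny stand-in DataFrames. Passing `ignore_index=True` is an added choice that renumbers the combined rows from 0.

```python
import pandas as pd

df_fake = pd.DataFrame({"text": ["Aliens endorse candidate"], "class": [0]})
df_true = pd.DataFrame({"text": ["Parliament approved the budget"], "class": [1]})

# axis=0 stacks rows; ignore_index=True gives the result a fresh 0..n-1 index
df_merge = pd.concat([df_fake, df_true], axis=0, ignore_index=True)
print(df_merge)
```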



Step 8: Dropping Unwanted Columns


The method used in this code is described below:

drop(): Removes the specified rows or columns. Passing axis='columns' (or axis=1) tells drop() to remove the listed columns.
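Below is a sketch of this step. It assumes that `title`, `subject`, and `date` are the unwanted columns, so that only the article text and its label remain.

```python
import pandas as pd

df_merge = pd.DataFrame({
    "title": ["Shock claim", "Budget passed"],
    "text": ["Aliens endorse candidate", "Parliament approved the budget"],
    "subject": ["politics", "politicsNews"],
    "date": ["2017-01-01", "2017-01-03"],
    "class": [0, 1],
})

# Keep only the text and its label; axis="columns" makes drop() remove columns
df = df_merge.drop(["title", "subject", "date"], axis="columns")
print(df.columns.tolist())  # ['text', 'class']
```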

Step 9: Create a Function to Clean Text


The functions used in this code are described below:

lower(): Returns a copy of the string with all characters converted to lowercase.

escape(): The re.escape() function escapes every character in a string that has a special meaning in regular expressions, so that it is matched literally.

sub(): The re.sub() function from Python's regular expression (re) module returns a string in which every match of the given pattern is replaced by the replacement string. The re module must be imported before this function can be used.

punctuation: string.punctuation is a pre-initialized string constant that contains all ASCII punctuation characters, namely !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
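The exact cleaning rules in the original code are not shown. The function below is a sketch that combines the operations described above: lower(), re.sub(), re.escape(), and string.punctuation. The name `clean_text` and the specific patterns (URLs, HTML tags, words containing digits) are illustrative choices.

```python
import re
import string

def clean_text(text: str) -> str:
    """Normalize a news article before vectorization (illustrative rules)."""
    text = text.lower()                                # lowercase everything
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # remove URLs
    text = re.sub(r"<.*?>", "", text)                  # remove HTML tags
    # re.escape() makes each punctuation character match literally inside [...]
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)
    text = re.sub(r"\w*\d\w*", "", text)               # remove words containing digits
    text = re.sub(r"\s+", " ", text).strip()           # collapse whitespace
    return text

print(clean_text("BREAKING: Visit https://example.com NOW!!! 100% <b>true</b>"))
# breaking visit now true
```

HTML tags are removed before punctuation. Otherwise stripping `<`, `>`, and `/` would leave fragments such as `btrueb` behind.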

Step 10: Applying Function to Text Column and Assigning X and Y


The method used in this code is described below:

apply(): Calls a function passed as an argument on the data, much like Python's built-in map(). On a Series, such as a single text column, the function is applied to every value. On a whole DataFrame, you need to specify the axis the function should work along: axis=0 (the default) for each column, or axis=1 for each row.
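Below is a sketch of applying a cleaning function to the text column and then separating the features (x) from the labels (y). A simplified `clean_text` is redefined here so the block runs on its own. The name is hypothetical, and the label encoding (0 = fake, 1 = true) is assumed.

```python
import re
import string
import pandas as pd

def clean_text(text: str) -> str:
    # Simplified cleaning: lowercase, strip punctuation, collapse whitespace
    text = text.lower()
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)
    return re.sub(r"\s+", " ", text).strip()

df = pd.DataFrame({
    "text": ["Aliens ENDORSE candidate!", "Parliament approved the budget."],
    "class": [0, 1],
})

# Series.apply() calls clean_text once for every value in the column
df["text"] = df["text"].apply(clean_text)

x = df["text"]   # features: the cleaned article text
y = df["class"]  # labels: 0 = fake, 1 = true (assumed encoding)
print(x.tolist())  # ['aliens endorse candidate', 'parliament approved the budget']
```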


Step 11: Defining Training and Testing Data and Splitting Them Into a 75-25 Percent Ratio.
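Below is a minimal sketch of a 75/25 split with `train_test_split`, using 20 made-up articles. Setting `random_state=42` is an added choice that makes the shuffle reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

x = pd.Series([f"article {i}" for i in range(20)])
y = pd.Series([i % 2 for i in range(20)])

# test_size=0.25 keeps 75% of the rows for training and 25% for testing
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42
)

print(len(x_train), len(x_test))  # 15 5
```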


Step 12: Converting Raw Data Into a Matrix for Further Processing.


The class and method used in this code are described below:

TfidfVectorizer: Converts a collection of raw documents into a matrix of TF-IDF features.

fit_transform(): Learns the parameters from the training data (for TfidfVectorizer, the vocabulary and the inverse document frequency weights) and transforms that data in a single step.
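Below is a minimal sketch of vectorizing toy training and test text. The test text is converted with `transform()` only, so it reuses the vocabulary and weights learned from the training set and never influences them.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

x_train = ["aliens endorse candidate", "parliament approved the budget"]
x_test = ["parliament endorse aliens"]

vectorizer = TfidfVectorizer()
# Learn the vocabulary and IDF weights from the training text, then transform it
xv_train = vectorizer.fit_transform(x_train)
# Transform the test text with the already-fitted vocabulary and weights
xv_test = vectorizer.transform(x_test)

print(xv_train.shape)  # (2, 7): 2 documents, 7 distinct words
print(sorted(vectorizer.vocabulary_))
```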

Step 13: Creating the First Model.


The class used in this code is described below:

LogisticRegression: Based on a set of independent variables, logistic regression estimates the probability of an event occurring, such as voting or not voting. Because the outcome is a probability, the predicted value is bounded between 0 and 1.

Linear regression fits a line to the data in order to predict a new quantity, whereas logistic regression fits a line that best separates the two classes. The input data is given by X with n examples, and the output by y, with one output for each input.
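Below is a sketch of training scikit-learn's `LogisticRegression` on TF-IDF features. The six training sentences and the label encoding (0 = fake, 1 = true) are made up for illustration. The real model would be trained on the full training split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data: 0 = fake, 1 = true (assumed encoding)
x_train = [
    "aliens endorse candidate in shock claim",
    "miracle fruit cures every disease",
    "secret plot hidden by celebrities",
    "parliament approved the annual budget",
    "central bank left interest rates unchanged",
    "minister announced new trade agreement",
]
y_train = [0, 0, 0, 1, 1, 1]

vectorizer = TfidfVectorizer()
xv_train = vectorizer.fit_transform(x_train)

LR = LogisticRegression()
LR.fit(xv_train, y_train)

xv_new = vectorizer.transform(["parliament approved the trade agreement"])
print(LR.predict(xv_new))        # predicted class label
print(LR.predict_proba(xv_new))  # probability of class 0 and of class 1
```

`predict_proba` exposes the probabilities between 0 and 1 described above. `predict` returns the class whose probability is higher.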

Step 14: Checking the Model Accuracy and Classification Report



Logistic Regression Model Accuracy:


Classification Report:
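The article's output for this step is not reproduced. Below is a sketch of how the accuracy and the classification report are computed, using hypothetical test labels and predictions. The score() method of a fitted scikit-learn classifier returns the same mean accuracy as accuracy_score.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical true labels and model predictions (0 = fake, 1 = true)
y_test = [0, 0, 1, 1, 1, 0]
pred_lr = [0, 1, 1, 1, 0, 0]

print("Accuracy:", accuracy_score(y_test, pred_lr))  # 4 of 6 correct, about 0.667
print(classification_report(y_test, pred_lr))
```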



Step 15: Creating a Second Model.


The class used in this code is described below:

DecisionTreeClassifier: A scikit-learn class that can perform multi-class classification on a dataset. If several classes share the same highest probability, the classifier predicts the class with the lowest index among them.
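Below is a sketch of training and evaluating a `DecisionTreeClassifier` on the same kind of TF-IDF features. All sentences and labels are made up. Setting `random_state=0` is an added choice that makes the tree reproducible.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Toy data: 0 = fake, 1 = true (assumed encoding)
x_train = [
    "aliens endorse candidate in shock claim",
    "miracle fruit cures every disease",
    "secret plot hidden by celebrities",
    "parliament approved the annual budget",
    "central bank left interest rates unchanged",
    "minister announced new trade agreement",
]
y_train = [0, 0, 0, 1, 1, 1]
x_test = ["secret miracle cure hidden", "bank announced new budget"]
y_test = [0, 1]

vectorizer = TfidfVectorizer()
xv_train = vectorizer.fit_transform(x_train)
xv_test = vectorizer.transform(x_test)

DT = DecisionTreeClassifier(random_state=0)
DT.fit(xv_train, y_train)  # sparse TF-IDF matrices are accepted directly

pred_dt = DT.predict(xv_test)
print("Accuracy:", accuracy_score(y_test, pred_dt))
```

An unrestricted tree fits its training data perfectly, so on a real dataset it is worth comparing its test accuracy with the logistic regression model's test accuracy, not its training accuracy.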

Step 16: Checking the Model Accuracy and Classification Report



DecisionTreeClassifier Model Accuracy:


Classification Report:


Step 17: Checking Fake News



Here you enter any news text to check whether the model classifies it as fake or not.
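Below is a sketch of a manual check. The helper names `output_label` and `manual_testing` and the label strings are hypothetical, and the model is a toy logistic regression trained on made-up sentences. The essential point is that new text must go through the same fitted vectorizer before prediction. In the full pipeline, the text would also be passed through the cleaning function from Step 9 first.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data: 0 = fake, 1 = true (assumed encoding)
x_train = [
    "aliens endorse candidate in shock claim",
    "miracle fruit cures every disease",
    "secret plot hidden by celebrities",
    "parliament approved the annual budget",
    "central bank left interest rates unchanged",
    "minister announced new trade agreement",
]
y_train = [0, 0, 0, 1, 1, 1]

vectorizer = TfidfVectorizer()
LR = LogisticRegression().fit(vectorizer.fit_transform(x_train), y_train)

def output_label(n: int) -> str:
    # Map the numeric class back to a readable label
    return "Fake News" if n == 0 else "Not A Fake News"

def manual_testing(news: str) -> str:
    # Reuse the fitted vectorizer, then predict with the trained model
    features = vectorizer.transform([news])
    return output_label(LR.predict(features)[0])

print(manual_testing("miracle fruit cures every disease"))
```

In an interactive session, the news text could come from `input()` instead of a hard-coded string.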





Overall, this article explained how you can create your own fake news detection system using Python. If you want to enhance your skills further, you can check out Simplilearn's Professional Certificate Program in AI and Machine Learning, which will help you hone the essential skills and become job-ready. Do you have any questions for us? Please mention them in the comment section of the "Your Own Fake News Detection System" article, and our experts will answer them at the earliest.

About the Author

Mayank Banoula

Mayank is a Research Analyst at Simplilearn. He is proficient in machine learning and artificial intelligence with Python.
