The IoT Academy Blog

Machine Learning Aided Differentiation of Real and Fake News

  • Published on September 14th, 2022

Technology has advanced rapidly since the start of the current millennium. This has brought a proliferation of news channels across media, namely electronic (including online and television) and print. The growing number of platforms and channels has set the stage for ever-increasing competition. Sensationalism, sometimes fueled by fake news, has become a new way to attract audience attention, especially in electronic media. For billions of people, the Internet has emerged as the primary medium for consuming information, and what we read and see online shapes our opinions and worldview. Access to information is vital to democracy, and fake news inflicts death by a thousand cuts on it by constantly distorting the truth.
While the spread of fake news is funded, supported, and encouraged by several vested interests and further fueled by human behavior, the technology that helps spread it can also be used to combat it. Can the algorithms that amplify fake news also be used to suppress it and promote critical thinking on a mass scale? Machine learning holds promise for distinguishing real news from fake news.

What is Fake News?


Fake news, a kind of yellow journalism, is material that may be a hoax and is typically disseminated via social media and other online media. It is often driven by political agendas and is frequently used to advance or impose particular views. Such stories may contain false or exaggerated claims, can be made viral by algorithms, and can leave users trapped in a filter bubble.

About Fake News Detection Using Python


We'll start by importing NumPy, pandas, and re. "re" is Python's built-in package for regular expressions, which describe a search pattern as a sequence of characters. Then we import the stop words. Stop words are words that carry little meaning on their own, such as "a" and "an". We import them from nltk.corpus, where "nltk" is the Natural Language Toolkit and "corpus" is its collection of text resources, including stop-word lists.
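
The cleaning idea described above can be sketched in a few lines. This is only an illustration: the tiny stop-word set and the clean_text helper are hypothetical stand-ins, and a real project would more likely take the list from nltk.corpus.stopwords.words("english").

```python
import re

# Hypothetical mini stop-word list for illustration only; NLTK's
# stopwords.words("english") provides a much fuller one.
STOP_WORDS = {"a", "an", "the", "is", "in", "on", "of", "to", "and"}

def clean_text(text):
    """Keep letters only, lowercase the text, and drop stop words."""
    # Replace every character that is not a letter with a space
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    tokens = letters_only.lower().split()
    return " ".join(tok for tok in tokens if tok not in STOP_WORDS)

cleaned = clean_text("BREAKING: The Senate is voting on a bill in 2022!")
print(cleaned)
```

The regular expression strips punctuation and digits before the stop words are filtered out, so only the content-bearing words are left for the later steps.
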
Lemmatization converts a word into its base form. It is more context-aware than stemming, another procedure that reduces a word to a root form. For this we import WordNetLemmatizer from nltk.stem.wordnet, which relies on WordNet's built-in morphy function; nltk.stem is the NLTK package that holds its stemmers and lemmatizers. The string module is also imported for its constants and classes.
TfidfVectorizer is then imported from sklearn.feature_extraction.text. TF-IDF stands for term frequency-inverse document frequency, and the vectorizer converts text into a meaningful set of numbers that the machine learning algorithm uses to fit the model and make predictions. It lives in the sklearn.feature_extraction package, which extracts features in a format that machine learning algorithms support. Since this is a binary classification problem, logistic regression is a natural choice for separating real from fake messages, although the steps below use a PassiveAggressiveClassifier. Next, we import nltk and download the stop words.
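
To make the "text into numbers" step concrete, here is a small sketch on an invented three-document corpus (the documents are assumptions, not rows from the dataset). It shows that a term found in every document receives a lower weight than a term found in only one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, invented for illustration
docs = [
    "senate passes budget bill",
    "aliens endorse budget bill",
    "senate debates budget",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)   # sparse matrix: documents x terms

vocab = vectorizer.vocabulary_           # maps each term to its column index
first_doc = tfidf[0].toarray().ravel()

# "budget" appears in every document, "passes" in only one
weight_budget = first_doc[vocab["budget"]]
weight_passes = first_doc[vocab["passes"]]
print(tfidf.shape)
print(weight_budget < weight_passes)
```

Each row of the matrix is one document and each column is one term, which is the numeric form a classifier can work with.
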
This advanced Python project on fake news detection works with both fake and real news. Using sklearn, we build a TfidfVectorizer on our dataset. We then initialize a PassiveAggressiveClassifier and fit the model. Finally, the accuracy score and confusion matrix tell us how well the model performs.
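
Logistic regression is mentioned above as an option for this binary problem. A minimal sketch of that alternative follows, assuming a handful of invented headlines and scikit-learn's make_pipeline helper (neither is part of the walkthrough in this post).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented headlines and labels, far too few for a real model
texts = [
    "senate passes budget bill",
    "central bank holds rates steady",
    "aliens endorse presidential candidate",
    "miracle pill cures every disease overnight",
]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

# The pipeline vectorizes the text and then fits the classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

new_headline = ["senate debates budget bill"]
predictions = model.predict(new_headline)
probabilities = model.predict_proba(new_headline)   # one column per class
print(predictions, probabilities.round(2))
```

Unlike a hard label alone, predict_proba gives a probability for each class, which can be useful when a borderline article should be flagged for review rather than classified outright.
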

Fake News Dataset


We will call the dataset for this Python project news.csv. It has a shape of 7796×4. The first column identifies the news item, the second and third contain the title and text, and the fourth holds a label marking the item as REAL or FAKE. The dataset takes up 29.2 MB.
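
Before training, it can help to check that a file matches this layout. The sketch below uses a tiny in-memory stand-in for news.csv; the rows are invented, and the column names "id" and "title" are assumptions (only "text" and "label" are used by the code in this post).

```python
import io
import pandas as pd

# Tiny stand-in for news.csv with the four-column layout described above
sample_csv = io.StringIO(
    "id,title,text,label\n"
    "1,Budget vote,The senate passed the budget bill.,REAL\n"
    "2,Alien summit,Aliens endorsed the budget bill.,FAKE\n"
    "3,Rate decision,The central bank held rates steady.,REAL\n"
)
df = pd.read_csv(sample_csv)

print(df.shape)                              # (rows, columns)
label_counts = df["label"].value_counts()    # how balanced the classes are
missing_text = df["text"].isna().sum()       # rows without article text
print(label_counts)
print(missing_text)
```

A heavily imbalanced label column or many rows with missing text would be worth addressing before fitting any model.
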


Steps To Detect Fake News Using Python


Follow the steps below to detect fake news and complete your first advanced Python project:

1. Make the necessary imports:

    import numpy as np
    import pandas as pd
    import itertools
    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import PassiveAggressiveClassifier
    from sklearn.metrics import accuracy_score, confusion_matrix


2. Now load the data into a DataFrame, then get its shape and first five records.

```
# Read the data
df = pd.read_csv('news.csv')

# Get the shape and the first five records
print(df.shape)
print(df.head())
```
3. Get the labels from the DataFrame.

    # DataFlair - Get the labels
    labels = df.label
    print(labels.head())


4. Now split the dataset into training and test sets.

    # DataFlair - Split the dataset
    x_train, x_test, y_train, y_test = train_test_split(df['text'], labels, test_size=0.3, random_state=7)
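
As a quick illustration of what test_size and random_state control, the sketch below splits ten invented documents twice with the same settings. The data is made up; only the train_test_split call mirrors the step above.

```python
from sklearn.model_selection import train_test_split

# Ten invented documents with alternating labels
texts = [f"document {i}" for i in range(10)]
labels = ["FAKE", "REAL"] * 5

split_a = train_test_split(texts, labels, test_size=0.3, random_state=7)
split_b = train_test_split(texts, labels, test_size=0.3, random_state=7)

x_train, x_test, y_train, y_test = split_a
print(len(x_train), len(x_test))   # 7 training documents, 3 test documents
print(split_a == split_b)          # the same seed reproduces the same split
```

Fixing random_state makes the split repeatable, so accuracy figures from different runs are computed on the same test documents.
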

5. Let's initialize a TfidfVectorizer with English stop words and a maximum document frequency of 0.7 (terms that appear in a larger share of documents are discarded).

Now fit and transform the vectorizer on the training set, and transform the test set with it.

    # DataFlair - Initialize the TfidfVectorizer
    tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)

    # DataFlair - Fit and transform the training set, transform the test set
    tfidf_train = tfidf_vectorizer.fit_transform(x_train)
    tfidf_test = tfidf_vectorizer.transform(x_test)
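
The difference between fit_transform and transform, and the effect of max_df, can be seen on a toy example. The documents below are invented; the vectorizer settings match the step above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented training and test documents
train_docs = [
    "budget vote shocks senate",
    "budget deal reached",
    "budget talks stall",
    "senate confirms judge",
]
test_docs = ["martians disrupt budget vote"]

# max_df=0.7 drops terms found in more than 70% of the training documents
vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
train_matrix = vectorizer.fit_transform(train_docs)  # learns the vocabulary
test_matrix = vectorizer.transform(test_docs)        # reuses that vocabulary

vocab = vectorizer.vocabulary_
print("budget" in vocab)     # in 3 of 4 documents (75%), so it is dropped
print("martians" in vocab)   # never seen during fitting, so it is ignored
print(train_matrix.shape[1] == test_matrix.shape[1])
```

Calling only transform on the test set keeps the test documents from influencing the vocabulary or the weights, and it guarantees that both matrices have the same columns.
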

6. Next, we initialize a PassiveAggressiveClassifier. This is an online learning algorithm that stays passive when an example is classified correctly with enough margin and updates the model aggressively when it is not. We fit it on tfidf_train and y_train.

We then predict on the TF-IDF features of the test set and calculate the accuracy using accuracy_score() from sklearn.metrics.

    # DataFlair - Initialize a PassiveAggressiveClassifier
    pac = PassiveAggressiveClassifier(max_iter=50)
    pac.fit(tfidf_train, y_train)

    # DataFlair - Predict on the test set and calculate the accuracy
    y_pred = pac.predict(tfidf_test)
    score = accuracy_score(y_test, y_pred)
    print(f'Accuracy: {round(score*100, 2)}%')
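
For intuition about what a passive-aggressive learner does, here is a minimal NumPy sketch of the basic update rule with hinge loss. It is not scikit-learn's implementation: the pa_update helper, the +1/-1 label encoding, and the two feature vectors are assumptions made for illustration.

```python
import numpy as np

def pa_update(w, x, y):
    """One basic passive-aggressive step for a label y in {-1, +1}.

    Assumes x is not the zero vector.
    """
    loss = max(0.0, 1.0 - y * np.dot(w, x))   # hinge loss on this sample
    if loss == 0.0:
        return w                               # passive: margin already met
    tau = loss / np.dot(x, x)                  # step size that closes the gap
    return w + tau * y * x                     # aggressive: correct the error

# Two invented feature vectors (think of TF-IDF rows) and their labels
samples = [(np.array([1.0, 0.0]), 1), (np.array([0.0, 2.0]), -1)]

w = np.zeros(2)
for x, y in samples:
    w = pa_update(w, x, y)

# Revisiting the first sample changes nothing: its margin is now satisfied
w_again = pa_update(w, samples[0][0], samples[0][1])
print(w)
print(np.array_equal(w, w_again))
```

The weights only move when a sample violates the margin, which is what makes this family of algorithms suitable for streams of incoming articles.
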


7. We obtained an accuracy of 92.82% with this model. Finally, let's print the confusion matrix to see the number of true and false positives and negatives.

    # DataFlair - Build the confusion matrix
    print(confusion_matrix(y_test, y_pred, labels=['FAKE', 'REAL']))

There are 589 true positives, 587 true negatives, 42 false positives, and 49 false negatives with this model.
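
The four counts above are enough to derive the usual summary metrics. The sketch below does the arithmetic; treating FAKE as the positive class is an assumption, since the post does not say which class is positive.

```python
# Counts reported above, assuming FAKE is the positive class
tp, tn, fp, fn = 589, 587, 42, 49

total = tp + tn + fp + fn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that were found

print(f"Test articles: {total}")
print(f"Accuracy:  {accuracy:.2%}")
print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
```

The accuracy computed from these counts rounds to the 92.82% reported in step 7, while precision and recall show how the errors are split between the two classes.
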

Summary


In this blog, we looked at machine learning aided differentiation of real and fake news and walked through how to use Python to identify fake news.
