Python is a popular scripting language created by Guido van Rossum and first released in 1991. It is highly readable, interactive, high-level, object-oriented, and interpreted. It often uses English keywords where other languages use punctuation, and it has fewer syntactic constructs than many other programming languages.

Some of the features of Python include:

  • It uses newlines to end a statement.
  • It relies on whitespace (indentation) to define scope.
  • It supports procedural, object-oriented, and functional programming styles.

In this article, we take a closer look at Internet access in Python. We will discuss the urllib.request module and its urlopen() function, which let a Python program fetch resources from the Internet.

What Is Urllib?

To open URLs, we can use Python's urllib package. It defines the classes and functions that help with URL actions; its urllib.request module is the one used for opening and fetching URLs.

The urlopen() function provides a fairly simple interface and can retrieve URLs over a variety of protocols. The urllib.request module also offers a slightly more complex interface for handling common scenarios such as basic authentication, cookies, and proxies. These services are provided by objects called handlers and openers.
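
As a rough illustration of handlers and openers, the sketch below builds an opener that keeps cookies between requests and sends a custom User-Agent header. The function name make_opener and the header value are assumptions made for this example and are not part of the article's code; build_opener() and HTTPCookieProcessor come from the standard library.

```python
import http.cookiejar
import urllib.request


def make_opener(user_agent="example-client/0.1"):
    """Return an opener that stores cookies and sends a custom User-Agent.

    The name make_opener and the default header value are illustrative.
    """
    # The cookie jar collects cookies set by servers; the handler replays them.
    cookie_jar = http.cookiejar.CookieJar()
    cookie_handler = urllib.request.HTTPCookieProcessor(cookie_jar)

    # build_opener() combines our handler with the default ones.
    opener = urllib.request.build_opener(cookie_handler)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, cookie_jar


opener, cookie_jar = make_opener()
# opener.open(url) can now be used in place of urllib.request.urlopen(url).
```

Keeping the cookie jar as a separate return value makes it possible to inspect which cookies were collected after a few requests.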

Python can also access and retrieve data from the internet in formats such as JSON, HTML, and XML. You can then work with this data directly in Python.
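
For instance, a JSON document can be fetched and turned into Python objects in a few lines. This is a minimal sketch: the function name fetch_json is hypothetical, the body is assumed to be UTF-8, and the demonstration uses a data: URL (which urlopen() also understands) as a stand-in for a real web address so that it runs without a network connection.

```python
import base64
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Fetch a URL and parse the response body as JSON (illustrative helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # read() returns bytes; the body is assumed to be UTF-8 encoded JSON.
        return json.loads(response.read().decode("utf-8"))


# Stand-in for a real endpoint: a data: URL carrying a small JSON document.
payload = json.dumps({"language": "Python", "versions": [2, 3]})
encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
demo_url = "data:application/json;base64," + encoded

result = fetch_json(demo_url)
print(result["language"])  # prints: Python
```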

Fetching URLs With urllib.request

We use urllib.request in the following way:

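import urllib.request

# Replace '<some url>/' with the address you want to fetch.
with urllib.request.urlopen('<some url>/') as response:
    # read() returns the response body as bytes.
    html = response.read()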

To temporarily store a URL resource in a file, we can use the tempfile.NamedTemporaryFile() and shutil.copyfileobj() functions.

Syntax

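import shutil
import tempfile
import urllib.request

with urllib.request.urlopen('https://www.python.org/') as response:
    # delete=False keeps the file on disk after the with block closes it.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        shutil.copyfileobj(response, tmp)

# Reopen the saved copy by name; remove it yourself when you are done.
with open(tmp.name) as html:
    pass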
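
The same idea can be wrapped in a small helper that returns the path of the saved copy. This is a sketch under assumptions: the name save_to_temp_file and the .html suffix are made up for illustration, and a data: URL stands in for a real page so the example runs offline.

```python
import os
import shutil
import tempfile
import urllib.request


def save_to_temp_file(url, suffix=".html"):
    """Copy the resource at url into a temporary file and return its path."""
    with urllib.request.urlopen(url) as response:
        # delete=False leaves the file in place after it is closed.
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            shutil.copyfileobj(response, tmp)
    return tmp.name


# A data: URL is used here in place of an address such as https://www.python.org/.
path = save_to_temp_file("data:text/html,%3Ch1%3EHello%3C/h1%3E")
try:
    with open(path, "rb") as saved:
        content = saved.read()
finally:
    # The caller is responsible for removing a file created with delete=False.
    os.remove(path)

print(content)  # prints: b'<h1>Hello</h1>'
```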

How to Open a URL Using Urllib

Make sure you are connected to the Internet, then import the urllib.request module.

Code

import urllib.request

webUrl = urllib.request.urlopen('https://www.python.org/')
# getcode() returns the HTTP status code of the response.
print("result: " + str(webUrl.getcode()))

Output

result: 200 

If running the code prints 200, the HTTP request was processed successfully, which also confirms that the internet connection is working.

The steps are highlighted below:

  • Import the urllib.request module.
  • Declare the variable webUrl and assign it the result of calling the urlopen() function.
  • The URL we are opening is https://www.python.org/.
  • Next, print the result code.
  • The result code is retrieved by calling the getcode() function on the webUrl variable.
  • Convert it to a string so that it can be concatenated with the "result: " string.
  • A standard HTTP status code of 200 indicates that the request was handled successfully.
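
The steps above assume the request succeeds. As a hedged sketch of what handling other outcomes might look like, the helper below reports the status code together with its standard reason phrase and catches the two exception types raised by urllib. The names describe_status and check_url are hypothetical and do not appear in the article's code.

```python
import urllib.error
import urllib.request
from http import HTTPStatus


def describe_status(code):
    """Return the code with its standard reason phrase, e.g. '200 OK'."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # Not one of the status codes known to the http module.
        return f"{code} (unrecognised status code)"


def check_url(url, timeout=10):
    """Request url and describe the outcome as a short string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return describe_status(response.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return describe_status(err.code)
    except urllib.error.URLError as err:
        # No usable answer: unknown scheme, DNS failure, refused connection.
        return f"request failed: {err.reason}"


print(describe_status(200))  # prints: 200 OK
print(describe_status(404))  # prints: 404 Not Found
```

HTTPError is caught before URLError because it is a subclass of it; with the order reversed, error statuses would be reported as generic failures. Calling check_url('https://www.python.org/') should give '200 OK' when the site is reachable.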

How to Read the HTML of a URL in Python?

By calling the read() function on the response, we can retrieve the HTML of the page and print it directly to the console.

Code (Python 3)

import urllib.request

webUrl = urllib.request.urlopen('https://www.python.org/')
print("result: " + str(webUrl.getcode()))

# read() returns the body of the response, here the HTML of the page.
htmldata = webUrl.read()
print(htmldata)

Output

result: 200

<!DOCTYPE html>

<!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->

<!--[if IE 7]>      <html class="no-js ie7 lt-ie8 lt-ie9">          <![endif]-->

<!--[if IE 8]>      <html class="no-js ie8 lt-ie9">                 <![endif]-->

<!--[if gt IE 8]><!-->

<html class="js no-touch geolocation fontface generatedcontent svg formvalidation placeholder boxsizing retina flexslide" lang="en" dir="ltr" data-darkreader-mode="dynamic" data-darkreader-scheme="dark" style=""><script type="text/javascript" async="" src="https://ssl.google-analytics.com/ga.js"></script>

………….………….………….………….

  <li class="tier-2 element-4" role="treeitem"><a href="/about/help/" title="">Help</a></li>

  <li class="tier-2 element-5" role="treeitem"><a href="http://brochure.getpython.info/" title="">Python Brochure</a></li>

………….………….

………….

……

class="darkreader darkreader--sync" media="screen"></style><style type="text/css">#__wikibuy__ .__wikibuy.__onTop,#earny-root,#honeyContainer,#piggyWrapper,body~div:not(#gdx-bubble-host){position:absolute!important;z-index:100000!important}body[data-shop-url="https://www.honeybum.com"] header>.header{z-index:99999}.mm-slideout{z-index:auto}.sorry-for-this__empty-styles{position:relative;z-index:10000}</style><style class="darkreader darkreader--sync" media="screen"></style><div style="all: initial;"></div></div></body><grammarly-desktop-integration data-grammarly-shadow-root="true"></grammarly-desktop-integration></html>

Code (Python 2)

import urllib2

def main():
    webUrl = urllib2.urlopen("https://www.python.org/")
    print "result: " + str(webUrl.getcode())
    # read() returns the full HTML of the page as a string.
    data = webUrl.read()
    print data

if __name__ == "__main__":
    main()

Output

The output is the same as for the Python 3 code above: the line result: 200, followed by the HTML source of the page.

The steps are highlighted below:

  • Call the read() function on the webUrl variable.
  • The read() function returns the contents of the response.
  • The result is stored in a variable (htmldata in the Python 3 code, data in the Python 2 code) that holds the complete content fetched from the URL.
  • Run the code, and the data will be printed in HTML format.
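
One detail worth noting: in Python 3, read() returns bytes, so printing the result shows a b'...' literal rather than plain text. A possible way to get a string is to decode the bytes with the character set announced in the response headers. The sketch below assumes UTF-8 when no character set is declared; the function name read_text is hypothetical, and a data: URL stands in for a real page so the example runs offline.

```python
import base64
import urllib.request


def read_text(url, default_charset="utf-8"):
    """Fetch url and return its body decoded to str (illustrative helper)."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes
        # get_content_charset() reads the charset from the Content-Type header.
        charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset)


# Stand-in for a real page: a small HTML document served from a data: URL.
html_bytes = "<!DOCTYPE html><title>Caf\u00e9</title>".encode("utf-8")
demo_url = ("data:text/html;charset=utf-8;base64,"
            + base64.b64encode(html_bytes).decode("ascii"))

text = read_text(demo_url)
print(type(text).__name__)  # prints: str
```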

Learn Python Development Online

To access the Internet from Python and fetch data from different websites, we can use the urllib.request module and its urlopen() function, both of which ship with Python. To learn more about Python and its various libraries, consider exploring Python concepts in greater depth.

To learn more about mobile and software development with Python, check out Simplilearn’s Python Development Training.

About the Author

Simplilearn

Simplilearn is one of the world’s leading providers of online training for Digital Marketing, Cloud Computing, Project Management, Data Science, IT, Software Development, and many other emerging technologies.
