Welcome to lesson four, Python Environment Set up and Essentials, of the Data Science with Python tutorial, which is part of the Python for Data Science Certification Training Course.
In this lesson, we will learn how to install the Anaconda Python distribution and the Jupyter notebook it supports. We will also go through some basic Python concepts that will come in handy in the upcoming sections.
By the end of this lesson on Python Environment Set up and Essentials, you will have learned:
How to install Anaconda and Jupyter notebook
Some of the important data types supported by Python
Data structures such as lists, tuples, sets, and dicts
Slicing and accessing the four data structures
A few basic operators and functions
Some important control flow statements
We have already seen how Python and its libraries can efficiently tackle every stage of data analytics and why it is such a popular tool among data scientists.
Although there are several Python distributions, one of the most popular and preferred is Anaconda, for the reasons shown below:
For all these reasons and more, we recommend that you use Anaconda, even if you already have a different Python platform installed on your system.
Currently, there are two versions of Python:
Python 2.7
Python 3.5
You can download and use either of them, though the 2.7 version is preferable for this tutorial, because many of the advanced libraries and modules still support only Python 2.7, and this support is still growing. In this tutorial, we will use the 2.7 version.
Let us see how to install Anaconda Python Distribution on different platforms.
Platform | Website URL | Installer |
---|---|---|
Windows | https://www.continuum.io/downloads | Graphical installer |
Mac OS | https://www.continuum.io/downloads | Graphical installer, or command-line installer (Python 2.7): bash Anaconda2-4.0.0-MacOSX-x86_64.sh |
Linux | https://www.continuum.io/downloads | Command-line installer (Python 2.7): bash Anaconda2-4.0.0-Linux-x86_64.sh |
Jupyter is an open-source, interactive, web-based Python interface for data science and scientific computing. Some of its advantages are:
It provides rich and powerful Python language support.
It allows you to create and share your Jupyter notebooks and also contribute to other notebooks online.
It has several interactive widgets, which make data manipulation and data visualization easier in real time.
It integrates with big data platforms such as Hadoop and Spark, which makes data analysis more productive and efficient.
Let us create a basic Python Jupyter notebook as shown in the following image.
The notebook runs as a web application on port 8888 of localhost.
As seen in the image, we have first imported the sys module and verified the version of the downloaded Anaconda platform.
We can also import the ‘platform’ library to view the Python version, as seen in line 3.
Next, we try to create and print a test string.
As you can see in line 5 of the code, the Python interpreter successfully generates the string output.
We can also try out some basic mathematical operations, such as addition and multiplication, to test if the installation is working well.
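The checks described above can be sketched roughly as follows. This is a minimal sketch rather than the exact notebook from the original screenshot, so the test string and the numbers are assumptions chosen for illustration.

```python
import sys
import platform

# Full interpreter version string (includes build details).
print(sys.version)

# Short version number of the running interpreter.
python_version = platform.python_version()
print(python_version)

# A test string (hypothetical content) to confirm that output works.
test_string = "Hello, Jupyter"
print(test_string)

# Basic arithmetic as a quick sanity check of the installation.
total = 2 + 3
product = 4 * 5
print(total)
print(product)
```

If each cell prints a value without an error, the installation is most likely working.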
A variable can be assigned or bound to any value. When you assign a value to a variable, you're creating references and not duplicates of the value.
Let us see how values are assigned to variables.
Remember that the variable should appear on the left, followed by an equal sign, and then the value itself. For example:
y = 2.1
As we can see in the image, no matter what the data type of the assigned value is (integer, float or string), you don't need to separately specify the data type of the variable.
The variable directly takes on the data type of the assigned value.
Let's look at an example.
Here we assign a string value and an integer value to two variables. To print these variables, type print, followed by the variable name.
To view the data type of a variable, use print together with the type() function and specify the variable name within the parentheses.
Consider the image shown below:
Now, let's see if we can access a variable without actually defining it. We can see that this throws an error.
Let's fix it by assigning a value to the variable. This proves that a variable can only be accessed if it is defined or has an assignment. You can also make multiple assignments simultaneously. To view the variables, type their names.
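A short sketch of these steps is given here. The variable names and values are hypothetical, and print is written with parentheses so that the same lines run on both Python 2.7 and Python 3.

```python
# Hypothetical names and values, chosen only for illustration.
city = "Boston"     # the variable takes on the string type
count = 10          # the variable takes on the integer type

print(city)
print(count)
print(type(city))
print(type(count))

# Accessing a name that has never been assigned raises NameError.
try:
    print(undefined_variable)
    was_defined = True
except NameError:
    was_defined = False
print(was_defined)

# Several variables can be assigned in a single statement.
a, b, c = 1, 2.5, "three"
print(a)
print(b)
print(c)
```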
Remember that when you assign a value to a variable, you are creating references and not duplicates of the value. Let's try to understand what that means. Consider the image shown below:
In this example, we're assigning x the value 7. Since 7 is an integer, x takes on the integer data type as well. Behind the scenes, an integer value 7 is created and stored in memory.
Then a name or variable ‘x’ is created. It is assigned a reference to the memory location where the value 7 is stored, so ‘x’ refers to 7 and therefore holds the value 7.
If we increment x by one, the reference of the name x is looked up and the value at that reference is retrieved. The 7+1 calculation occurs, producing a new data element 8, which is stored in a fresh memory location with a new reference.
The variable ‘x’ now refers to this new address. The old value is no longer needed by x and can therefore be discarded.
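One way to observe this rebinding is with the built-in id() function, which returns an identifier for the object a name currently refers to. The snippet below is only an illustrative sketch of the idea.

```python
x = 7
first_id = id(x)     # identity of the object that x refers to (the value 7)

x = x + 1            # a new object, 8, is produced and x is rebound to it
second_id = id(x)

print(x)
print(first_id != second_id)   # the name x now refers to a different object

# Assigning one name to another copies the reference, not the value.
y = x
print(y is x)
```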
Let us look at some basic data types in Python. The two main numeric Python types are:
Integer
Float
Floats are numbers with a decimal point. The name is short for floating-point numbers, the term used in most programming languages. The size of the integer that can be stored as an ‘int’ depends on your platform, that is, whether it is 32-bit or 64-bit.
However, Python 2.7 automatically converts larger integers to the long type.
Consider the image shown below:
Here you can see that we divide two numbers. It's important to remember that in Python 2.7, if the numerator and denominator are both integers, then the result will also be an integer, even if the accurate answer is a float.
However, if either the numerator or the denominator is a float, then the result will also be a float.
So you can see that mathematical operations in Python largely depend on the data types of the numbers involved.
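The difference can be sketched as follows. In Python 2.7, 7 / 2 with two integers gives 3; in Python 3 the same expression gives 3.5, so the sketch uses the floor-division operator //, which produces the integer result on both versions. The numbers are assumptions chosen for illustration.

```python
# Floor division of two integers gives an integer result.
# (In Python 2.7, 7 / 2 behaves the same way for integer operands.)
int_result = 7 // 2

# If either operand is a float, the result is a float.
float_result = 7 / 2.0

print(int_result)     # 3
print(float_result)   # 3.5
```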
Python has extremely powerful and flexible built-in string processing capabilities. There are multiple ways to create string objects.
You can enclose them in single quotes, double quotes, or triple double quotes. All three ways generate similar outputs, as seen in the following image.
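As a small sketch, with a hypothetical string value, the three quoting styles produce equal string objects:

```python
single = 'data science'
double = "data science"
triple = """data science"""

print(single)
print(double)
print(triple)

# All three literals create the same string value.
all_equal = (single == double == triple)
print(all_equal)   # True
```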
Python also supports the None and Boolean data types. None is Python's null value. If a function does not explicitly return a value, it implicitly returns None.
The two Boolean values in Python are written as True and False. Comparisons and other conditional expressions evaluate to either True or False.
In this example, a variable is assigned a value ‘None’. To check if the assignment occurred correctly, you can use the keyword ‘is’ as shown in the image.
You can see that it returns a boolean value, which is ‘True’ in this case. Now assign an integer value to the same variable and check it again. As expected, it returns False this time.
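These checks can be sketched as follows. The variable and function names are hypothetical.

```python
value = None
is_none_before = value is None
print(is_none_before)   # True

value = 42
is_none_after = value is None
print(is_none_after)    # False

# A function without an explicit return statement returns None.
def do_nothing():
    pass

implicit_result = do_nothing()
print(implicit_result is None)   # True
```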
You can also type cast a number from one data type to another. Consider the image shown below:
Let's define a float as shown and then print its data value. To cast this float value to an integer, use the int() function as shown in the image.
You can see that it generates an integer output.
Similarly, to cast a float value to a string, use the str() function, and it will return a string output.
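A brief sketch of both casts, using an assumed float value:

```python
price = 12.75          # hypothetical float value
print(price)

as_int = int(price)    # int() drops the fractional part
print(as_int)          # 12

as_str = str(price)    # str() returns the text form of the number
print(as_str)          # 12.75
print(type(as_str))
```

Note that int() truncates rather than rounds, so 12.75 becomes 12.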
A tuple is a one-dimensional, fixed-length, immutable sequence of Python objects. Immutable implies that its content cannot be modified. The easiest way to create a tuple is to provide a comma-separated sequence of values.
In this example, a tuple is assigned comma-separated values of mixed data types enclosed within parentheses. Let's view it by referencing the tuple object.
To access the tuple element at index one, use the syntax shown here:
Indexing starts from zero. If you try to modify the value at a specified index, Python throws an error since a tuple is immutable. As you saw earlier, you can use the index of an element to view and access it. To access elements with the help of positive indices, count from the left starting with zero.
You can also use negative indices by counting from the right starting with negative one. Negative indices are useful because they let you easily refer to elements at the end of a long tuple.
We have seen how to access individual elements in a tuple. We can also access a range or slice of elements within a tuple. Slicing allows you to create a subset of the tuple. To slice a tuple, mention the indices of the first element and that of the element immediately after the last element.
This is because while the first index is inclusive, the second one is not. For example, here you can see that referencing indices 1:4 creates a tuple subset with elements from index one to three.
You can also use negative indices to slice tuples as shown here.
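The tuple operations described above can be sketched as follows. The tuple contents are hypothetical and are not the values from the original screenshots.

```python
# A tuple of mixed data types (hypothetical values).
record = ("Kelly", 28, 5.6, "NY", True)
print(record)

second = record[1]        # positive index, counted from the left
last = record[-1]         # negative index, counted from the right
print(second)             # 28
print(last)               # True

middle = record[1:4]      # indices 1 to 3; index 4 is excluded
tail = record[-3:-1]      # third-from-last up to, but not including, the last
print(middle)             # (28, 5.6, 'NY')
print(tail)               # (5.6, 'NY')

# Tuples are immutable, so item assignment raises TypeError.
try:
    record[1] = 30
    modified = True
except TypeError:
    modified = False
print(modified)           # False
```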
In contrast to tuples, the length of lists is variable and their contents can be modified. They can be defined using square brackets or using the list() function.
Here a list is defined using comma-separated values of mixed data types. You can view the content of the list by just referring to the list object. You can use the append method to add a value to the list. Note that this value gets added to the end of the list.
You can also remove a particular item with the remove method by referring to the element value.
You can see in the output of line 164 that “Mark” is no longer a part of the list.
You can use the pop method to simultaneously view and remove the value at a particular index. Similarly, use the insert method to insert a value at a particular index.
Just like tuples, you can access the elements in a list through indices. We know that positive indices are counted from the left starting with zero. By providing the positive index, you can access the specified list element.
Consider the image shown below:
Recall that negative indices are counted from the right starting with negative one. Since -2 refers to the second element from the right in the list, the value 11 is generated as the output.
Just as we sliced tuples, we can also slice lists. Recall that to slice a tuple, the indices of the first element and the element immediately after the last element must be specified.
The same is applicable to lists. You can also use negative indices to slice lists as shown below:
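The list operations covered in this part can be sketched together as follows. The list contents are assumptions for illustration and do not reproduce the original screenshots.

```python
# A list of mixed data types (hypothetical values).
items = ["Mark", 10, "Ann", 11, 3.5]

items.append("Zoe")        # adds the value to the end of the list
items.remove("Mark")       # removes the first element equal to "Mark"
popped = items.pop(1)      # returns and removes the element at index 1
items.insert(1, "Bob")     # inserts a value at index 1
print(popped)              # Ann
print(items)               # [10, 'Bob', 11, 3.5, 'Zoe']

first = items[0]                 # positive index from the left
second_from_right = items[-2]    # negative index from the right
print(first)                     # 10
print(second_from_right)         # 3.5

subset = items[1:3]              # indices 1 and 2
negative_subset = items[-3:-1]   # third-from-last up to, but excluding, the last
print(subset)                    # ['Bob', 11]
print(negative_subset)           # [11, 3.5]
```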
The dictionary (dict) is likely the most important built-in Python data structure. Dictionaries are mappings of a set of keys to a set of values. Keys are listed together with the values assigned to them.
This forms key-value pairs. The keys can be of any immutable type and the values can be of any type. A dictionary is a flexibly sized collection of key-value pairs where keys and values are Python objects. You can define, modify, view, look up, and delete the key-value pairs in the dictionary.
The difference between view and lookup is that View allows you to view the entire object while Lookup lets you access a particular element within the object. You can create a dictionary using colons to separate keys and values enclosed within curly brackets.
In this example, the dictionary contains three key-value pairs separated by colons and enclosed within curly brackets. You can view the content of the dictionary by referring to the dict object. Use the keys method to view all the keys. Similarly, use the values method to view all the values present in the dictionary.
You can also access and modify individual elements in a dict. To access the value, pass the key name to the dict object. In the example shown above, passing the name ‘Kelly’ retrieves the associated value which is the email id.
Similarly, passing id retrieves the values associated with it. Each key gives access to exactly one value, although that value can itself be a collection. Use the update method to update the value for a corresponding key.
In the above image, you can see how the values for id are updated.
You can also delete a key with the help of the del statement. You can see that the id key and the values associated with it have been deleted from the dict.
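A minimal sketch of these dictionary operations is shown below. The keys echo the ones mentioned in the text, but the email addresses and id values are made up for illustration.

```python
# Three key-value pairs (all values are hypothetical).
contacts = {
    "Kelly": "kelly@example.com",
    "Sam": "sam@example.com",
    "id": [101, 102],
}
print(contacts)

print(sorted(contacts.keys()))     # view all keys

kelly_email = contacts["Kelly"]    # look up one value by its key
print(kelly_email)                 # kelly@example.com

contacts.update({"id": [201, 202]})   # update the value for a key
updated_ids = contacts["id"]
print(updated_ids)                    # [201, 202]

del contacts["id"]                 # delete a key and its value
print("id" in contacts)            # False
```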
A set is an unordered collection of unique elements. You can think of them as dicts but without values. You can create a set using either the set function or by listing all the elements within curly brackets.
If you check the object's type, you can see that it shows up as a set. To view the set, type its name. Note that BMW and GM which are mentioned twice, appear only once since a set only contains unique elements.
Let's understand set operations through an example. Create two separate sets of the auto survey. Now try to generate a combined survey report using the or operator (|), which performs a union operation. Note that the combined survey report does not contain any duplicate values. Use the and operator (&), which performs an intersection operation, to view the common elements between both sets.
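The set behavior described above can be sketched as follows. The survey contents are assumptions, apart from BMW and GM, which the text mentions.

```python
# Duplicates are dropped when a set is created.
cars = set(["BMW", "GM", "Ford", "BMW", "GM"])
print(type(cars))
print(len(cars))            # 3

# Two hypothetical survey sets, written with curly brackets.
survey_a = {"BMW", "GM", "Ford"}
survey_b = {"GM", "Toyota", "Honda"}

combined = survey_a | survey_b    # union: every element, no duplicates
common = survey_a & survey_b      # intersection: elements in both sets
print(sorted(combined))           # ['BMW', 'Ford', 'GM', 'Honda', 'Toyota']
print(sorted(common))             # ['GM']
```

The results are passed through sorted() only for printing, because a set has no defined order.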
We can now look at some basic operators.
The ‘in’ operator is used to generate a boolean value to indicate whether a given value is present in the container or not. You can use it to verify the presence of both strings and substrings or characters.
The ‘plus’ operator produces a new tuple, list, or string whose value is the concatenation of its arguments. Here you can see how two tuples, lists, and strings are concatenated.
The ‘multiplication’ operator produces a new tuple, list, or string that repeats the original content. Please note that it does not actually multiply the values. It only repeats the values for the specified number of times.
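The three operators can be sketched with a few assumed values:

```python
# 'in' tests membership and returns a Boolean.
has_item = 3 in [1, 2, 3]
has_substring = "data" in "data science"
print(has_item)         # True
print(has_substring)    # True

# '+' concatenates two sequences of the same type into a new one.
joined_tuple = (1, 2) + (3, 4)
joined_list = [1, 2] + [3]
joined_string = "data" + " science"
print(joined_tuple)     # (1, 2, 3, 4)
print(joined_list)      # [1, 2, 3]
print(joined_string)    # data science

# '*' repeats the content; it does not multiply the values.
repeated_list = [1, 2] * 3
repeated_string = "ab" * 2
print(repeated_list)    # [1, 2, 1, 2, 1, 2]
print(repeated_string)  # abab
```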
Functions are the primary and most important method of code organization and reuse in Python. Each function can have some number of positional arguments and some number of keyword arguments.
A function is usually created using the keyword ‘def’.
A few basic properties of a function are:
The outcome of the function is communicated by a return statement.
Arguments in parentheses are basically assignments.
Here are a few key considerations while dealing with functions.
A function always returns a value.
If no return value is defined, the function returns ‘None’.
Function overloading is not permitted. Function overloading means having more than one function with the same name but different parameters. Some programming languages, such as Java, permit this, but Python does not; a later definition with the same name simply replaces the earlier one.
You can use a function to return a single value or multiple values.
Consider the code shown below:
In the first example, a single value, which is the sum of the two numbers is returned.
In the second example, three values i.e. the age, height, and weight are returned using the same function.
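The two cases can be sketched as follows. The function names and the returned values are hypothetical, not those of the original code image.

```python
def add_numbers(first, second):
    """Return a single value: the sum of two numbers."""
    return first + second


def get_profile():
    """Return three values at once (packed into a tuple)."""
    age, height, weight = 30, 175, 70   # hypothetical values
    return age, height, weight


total = add_numbers(2, 3)
print(total)    # 5

# Multiple return values can be unpacked into separate variables.
age, height, weight = get_profile()
print(age)      # 30
print(height)   # 175
print(weight)   # 70
```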
Python has built-in sequence functions to make the computations faster and easier. Here are some examples of built-in sequence functions, which we would be using in the tutorial.
These include:
enumerate
This keeps track of indices and the corresponding data. It lets you loop over a sequence with an automatic counter.
sorted
It returns a new sorted list for the given sequence.
reversed
This iterates over the data in reverse order.
zip
It creates a list of tuples by pairing up elements of lists, tuples, or other sequences.
Let's take a look at the enumerate built-in function.
In this example, a list of stores is assigned to a store list. The enumerate function is then used to print each position, or index, and its corresponding data element. We can also use this function to create dicts.
Pass the list to the enumerate function to create a dict of key-value pairs from each name and its index. The output returns the food store names and their corresponding index positions.
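A minimal sketch of both uses follows. The name store_list and the store names are assumptions.

```python
# Hypothetical list of stores.
store_list = ["grocery", "bakery", "pharmacy"]

# enumerate yields (index, element) pairs, so no manual counter is needed.
for index, store in enumerate(store_list):
    print("{0}: {1}".format(index, store))

# Build a dict that maps each store name to its index position.
store_index = {store: index for index, store in enumerate(store_list)}
print(store_index["bakery"])   # 1
```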
As the name suggests, the sorted function is mainly used to sort values, both numbers and strings.
Consider the code shown below:
In the first example, a list with random values is sorted.
In the second example, the string value “the data science” is sorted character by character, producing a sorted list of the characters present in the string.
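Both cases can be sketched as follows. The numbers in the first list are assumed, while the string is the one mentioned in the text.

```python
# Sorting a list of numbers returns a new list; the original is unchanged.
numbers = [7, 2, 9, 4, 1]
sorted_numbers = sorted(numbers)
print(sorted_numbers)   # [1, 2, 4, 7, 9]
print(numbers)          # [7, 2, 9, 4, 1]

# Sorting a string returns a list of its characters in sorted order.
# Spaces sort before letters.
sorted_chars = sorted("the data science")
print(sorted_chars)
```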
Next, let's see how to use the reversed and zip built-in functions.
First, create a list of numbers using the range function. Here the range is 15. Now use the reversed function to view the list in reverse order.
In the second example, let's declare two lists. The first one is for subjects, with the values ‘math’, ‘statistics’, and ‘algebra’.
The second list is for the subject count, with the values ‘one’, ‘two’, and ‘three’.
Now use the zip function to pair the data elements of subjects and subject_count.
The output is a list of tuples.
The type function will return the type of the variable, which is a list in this case.
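These two examples can be sketched as follows. The results are wrapped in list() because in Python 3 both range and zip return lazy objects rather than lists; in Python 2.7 the extra list() call is harmless.

```python
# A list of the numbers 0 to 14.
numbers = list(range(15))
reversed_numbers = list(reversed(numbers))
print(reversed_numbers)

subjects = ["math", "statistics", "algebra"]
subject_count = ["one", "two", "three"]

# zip pairs up the elements position by position.
pairs = list(zip(subjects, subject_count))
print(pairs)         # [('math', 'one'), ('statistics', 'two'), ('algebra', 'three')]
print(type(pairs))
```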
The if statement is one of the most well-known types of control flow statements. It checks a condition and, if the condition is true, executes the code in the block that follows.
Consider the code shown below:
Here, if age is more than 18, ‘adult’ is printed.
An if statement can optionally be followed by one or more ‘elif’ blocks and a catch-all ‘else’ block, which runs only if all of the conditions are false. If any of the conditions is true, no further elif or else blocks will be reached.
In this example, since the marks equal 81, grade B is printed out.
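Both examples can be sketched as follows. The age value and the grade boundaries are assumptions, since the original code image is not available; only the marks value of 81 and the resulting grade B come from the text.

```python
age = 20   # hypothetical value
status = "unknown"
if age > 18:
    status = "adult"
print(status)   # adult

marks = 81
# Hypothetical grade boundaries; only the first true branch runs.
if marks >= 90:
    grade = "A"
elif marks >= 80:
    grade = "B"
elif marks >= 70:
    grade = "C"
else:
    grade = "D"
print(grade)    # B
```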
For loops are used to iterate over a collection like a list, tuple, or an iterator. In this example, a for loop is used to iteratively print out the list of stock tickers.
The ‘continue’ statement skips the rest of the current iteration and moves on to the next one when a condition is met, while the ‘break’ statement is used to exit the loop.
A while loop specifies a condition and a block of code that is to be executed until the condition evaluates to false or the loop is explicitly ended with break.
In this example, the while loop exits after printing a temperature value greater than 95°F.
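The loop constructs above can be sketched as follows. The ticker symbols, the starting temperature, and the step size are assumptions chosen for illustration.

```python
# Hypothetical list of stock tickers.
tickers = ["AAPL", "GOOG", "MSFT", "AMZN"]
printed = []
for ticker in tickers:
    if ticker == "GOOG":
        continue          # skip this ticker and move to the next one
    if ticker == "AMZN":
        break             # leave the loop entirely
    printed.append(ticker)
    print(ticker)

# The loop body runs while the condition is true, so the last value
# printed is the first one greater than 95.
temperature = 90
readings = []
while temperature <= 95:
    temperature += 3
    readings.append(temperature)
    print(temperature)
```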
Handling Python errors or exceptions gracefully is an important part of building robust programs and algorithms. In data analysis applications, many functions only work on certain kinds of input.
In this example, we have created a function that accepts a number and returns its float value. It works fine for numeric values, but the moment you pass a non-numeric string to the function, it throws a ValueError.
We can use the try-except block to handle the exception. This helps generate a graceful exit of the program or algorithm, as shown here.
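A minimal sketch of the idea follows. The function names are hypothetical, and returning None on failure is just one possible way to handle the error.

```python
def to_float(value):
    """Convert the input to a float; raises ValueError for text like 'abc'."""
    return float(value)


def safe_float(value):
    """Same conversion, but handle the error instead of crashing."""
    try:
        return float(value)
    except ValueError:
        # Assumed fallback: report the problem and return None.
        print("Could not convert input to float")
        return None


print(to_float("3.5"))      # 3.5
print(safe_float("abc"))    # prints the message, then None
```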
In this lesson, we learned the following topics:
How to download the Python 2.7 version of Anaconda and install Jupyter notebook.
When you assign values to variables, you create references and not duplicates.
Integers, floats, strings, None, and Boolean are some of the data types supported by Python.
Tuples, lists, dicts, and sets are some of the data structures of Python.
You can use indices to access individual or a range of elements in a data structure.
The “in”, “+”, and “*” are some of the basic operators.
Functions are the primary and most important method of code organization and reuse in Python.
The conditional “if”, “elif” statements, “while” and “for” loops and exception handling are some important control flow statements.
With this, we have come to an end of this lesson on Python Environment Set up and Essentials. The next lesson focuses on Mathematical Computing with Python (NumPy).