An Introduction To AWS SageMaker

Many data scientists develop, train, and deploy machine learning models in a hosted environment. Traditionally, however, they had no easy way to scale resources up or down as needed. AWS SageMaker addresses this by helping developers build and train models and get them into production faster, with less effort and at lower cost.

In this article on AWS SageMaker, you will get an in-depth understanding of:

  • What is AWS?
  • What is AWS SageMaker?
  • Advantages of AWS SageMaker
  • Machine Learning With AWS SageMaker
  • How to Train a Model With AWS SageMaker?
  • How to Validate a Model With AWS SageMaker?
  • Companies Using SageMaker Service
  • Demo - How to build and train a model using SageMaker?

And before jumping into SageMaker, here's a primer on "What is AWS".

What is AWS?

Amazon Web Services (AWS) is an on-demand cloud platform offered by Amazon that provides services over the internet. AWS services can be used to build, monitor, and deploy any type of application in the cloud. For machine learning applications, this is where AWS SageMaker comes into play.

What is AWS SageMaker?

Amazon SageMaker is a cloud-based machine-learning platform that helps users create, design, train, tune, and deploy machine-learning models in a production-ready hosted environment. SageMaker comes with a number of advantages, which are covered in the next section.

Advantages of AWS SageMaker

Some of the advantages of SageMaker are below:

  • It enhances the productivity of a machine learning project
  • It helps create and manage compute instances in the least amount of time
  • It inspects raw data and automatically creates, trains, and deploys models with complete visibility
  • It can reduce the cost of building machine learning models by up to 70%
  • It reduces the time required for data labeling tasks
  • It helps store all ML components in one place
  • It is highly scalable and trains models faster

Machine Learning With AWS SageMaker

Now, let’s have a look at the concept of Machine Learning With AWS SageMaker and understand how to build, test, tune, and deploy a model.

The following diagram shows how machine learning works with AWS SageMaker.

[Diagram: machine learning with AWS SageMaker]

Build

  • It provides more than 15 widely used ML algorithms for training
  • It lets you select the required server size for your notebook instance
  • A user can write code (for creating model training jobs) using a notebook instance
  • Choose and optimize the necessary algorithm, such as
  • AWS SageMaker helps developers customize machine learning instances through the Jupyter notebook interface

Test and Tune 

  • Set up and import required libraries
  • Define a few environment variables and manage them for training the model
  • Train and tune the model with Amazon SageMaker
  • SageMaker implements hyperparameter tuning by searching for a suitable combination of algorithm hyperparameters
  • SageMaker uses Amazon S3 to store data as it’s safe and secure.

Note: S3 is used for storing and retrieving data over the internet.

  • SageMaker uses ECR for managing Docker containers as it is highly scalable.

Note: ECR helps a user store, manage, and deploy Docker container images.

  • SageMaker divides the training data and stores it in Amazon S3, whereas the training algorithm code is stored in ECR
  • Later, SageMaker sets up a cluster, trains the model on the input data, and stores the results in Amazon S3 as well

Note: If you want predictions for a small amount of data at a time, use Amazon SageMaker hosting services. If you want predictions for an entire dataset, use Amazon SageMaker batch transform.
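
The hyperparameter tuning step listed above can be pictured with a small, self-contained sketch. This is only an illustration of the idea of searching over combinations of hyperparameters, not how SageMaker implements tuning. The function names, the search space, and the toy scoring function are all assumptions made up for this example.

```python
import itertools


def grid_search(objective, search_space):
    """Evaluate every combination in search_space and return the best one.

    objective: callable that takes a dict of hyperparameters and returns a
               score where lower is better (for example, validation error).
    search_space: dict mapping each hyperparameter name to candidate values.
    """
    names = sorted(search_space)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_error(params):
    """Hypothetical stand-in for 'train a model and measure its validation error'."""
    return (params["eta"] - 0.2) ** 2 + (params["max_depth"] - 5) ** 2


# Hypothetical search space; real tuning jobs define ranges per algorithm.
search_space = {"eta": [0.1, 0.2, 0.3], "max_depth": [3, 5, 7]}
best_params, best_score = grid_search(toy_validation_error, search_space)
print(best_params, best_score)
```

In a real tuning job, each evaluation would be a full training run, so the number of combinations tried has a direct effect on cost.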

Deploy

  • Once tuning is done, models can be deployed to SageMaker endpoints
  • The endpoints serve real-time predictions
  • Now, evaluate your model and determine whether you have achieved your business goals
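
As a rough sketch of the real-time prediction step above, the snippet below serializes a few feature rows to CSV and shows how they might be sent to an endpoint through the boto3 `sagemaker-runtime` client. The helper names, the sample feature values, and the assumption that the model accepts `text/csv` input are illustrative only. The `predict` function needs AWS credentials and an existing endpoint, so it is defined but not called here.

```python
import csv
import io


def to_csv_payload(rows):
    """Serialize feature rows (lists of numbers) into a header-less CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue().rstrip("\n")


def predict(endpoint_name, rows):
    """Send rows to a deployed endpoint and return the raw response body.

    Requires boto3, AWS credentials, and an endpoint that accepts text/csv.
    """
    import boto3  # imported lazily so to_csv_payload works without AWS access

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_payload(rows),
    )
    return response["Body"].read().decode("utf-8")


# Made-up feature rows, used only to show the payload format.
payload = to_csv_payload([[35, 1, 0.5], [52, 0, 1.25]])
print(payload)
```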

How to Train a Model With AWS SageMaker?

Model training in SageMaker is done on machine learning compute instances.

  • When a user trains a model in Amazon SageMaker, they create a training job.
  • A training job comprises:
      • S3 bucket (input): The URL of the Amazon S3 bucket where the training data is stored
      • Compute resources: The machine learning compute instances that SageMaker uses for training
      • S3 bucket (output): The URL of the Amazon S3 bucket where the output will be stored
      • Code image: The Amazon Elastic Container Registry (ECR) path where the training code is stored
  • The input data is fetched from the specified Amazon S3 bucket
  • Once the training job is built, Amazon SageMaker launches the ML compute instances
  • Then, it trains the model with the training code and dataset
  • SageMaker stores the output and model artifacts in the AWS S3 bucket
  • If the training code fails, the helper code performs the remaining tasks
  • The inference code consists of a linear sequence of containers that process inference requests on data
  • Amazon Elastic Container Registry (ECR) is a registry that helps users store, manage, and deploy container images

Note: Container images are packaged, ready-to-run applications.

  • Once the model is trained, the output is stored in the specified Amazon S3 bucket
  • To prevent your algorithm container from contending for memory, SageMaker reserves memory for its critical system processes on your ML compute instances, so not all of an instance's memory is available to your algorithm
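
The parts of a training job described above map fairly directly onto the request accepted by the boto3 `create_training_job` call. The following is a minimal sketch that only builds the request dictionary. The job name, bucket, role ARN, image path, instance type, volume size, and time limit are all placeholder assumptions, not values from this article.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               instance_type="ml.m5.xlarge", instance_count=1):
    """Assemble the keyword arguments for a SageMaker CreateTrainingJob request."""
    return {
        "TrainingJobName": job_name,
        # Code image: where the training code lives in ECR
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        # Input S3 bucket: where the training data is read from
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        # Output S3 bucket: where model artifacts are written
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Compute resources: the ML compute instances used for training
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 10,  # assumed size for this sketch
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},  # assumed limit
    }


# All values below are hypothetical placeholders.
request = build_training_job_request(
    job_name="example-training-job",
    image_uri="<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
)
print(sorted(request))
```

With valid values and AWS credentials, the request could then be submitted with `boto3.client("sagemaker").create_training_job(**request)`.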

How to Validate a Model With SageMaker?

You can evaluate your model using historical data (offline) or live data (online):

1. Offline Testing

Use historical data to send requests to the model through Jupyter notebook in Amazon SageMaker for evaluation.

2. Online Testing with Live Data

Deploy multiple models to an Amazon SageMaker endpoint and direct a portion of the live traffic to the model you want to validate.

3. Validating Using a "Holdout Set"

Here, a part of the data is set aside, which is called a "holdout set". The model is trained on the remaining input data and then evaluated on the holdout set, which shows how well it generalizes to data it has not seen during training.

4. K-fold Validation

Here, the input data is split into k parts. Each part is used once as the validation data for testing the model, while the other k − 1 parts are used as training data. This gives k training runs, and the results of those runs are combined to evaluate the model.
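
The holdout and k-fold strategies above can be sketched in a few lines of plain Python. This is a simplified illustration with made-up function names. It does not shuffle the data, which a real evaluation usually would.

```python
def holdout_split(data, holdout_fraction=0.2):
    """Set aside the last holdout_fraction of the data as a holdout set."""
    holdout_size = round(len(data) * holdout_fraction)
    cut = len(data) - holdout_size
    return data[:cut], data[cut:]


def k_fold_splits(data, k):
    """Yield (training, validation) pairs, one per fold.

    The data is divided into k folds. Each fold serves once as validation
    data while the other k - 1 folds form the training data.
    """
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and the number of samples")
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation


samples = list(range(10))
train_part, holdout_part = holdout_split(samples)
splits = list(k_fold_splits(samples, k=5))
for training, validation in splits:
    print(len(training), "training samples,", len(validation), "validation samples")
```

Every sample appears in exactly one validation fold, so each training run is tested on data it did not train on.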

Companies Using SageMaker Service

Let's consider the example of ProQuest.

ProQuest is a global information-content and technology company that provides valuable content such as eBooks and newspapers to its users.

ProQuest used AWS SageMaker to create a content recommendation system. With the help of SageMaker, ProQuest was able to create a better video user experience and provide the most relevant search results.

Demo - Steps to Build and Train a Machine Learning Model Using AWS SageMaker

Let us create a SageMaker notebook instance:

  • To create a notebook instance, use either the SageMaker console or the CreateNotebookInstance API
  • First, open the SageMaker console at https://console.aws.amazon.com/sagemaker/.
  • Once the console opens, select Notebook instances -> Create notebook instance. This opens the Create notebook instance page
  • On this page, enter the following information:
  • In the Notebook instance name and tags fields, type a suitable name and tag for your notebook instance.

  • Next, in Notebook instance type, select an appropriate instance size for your project.
  • In the Elastic Inference option, choose none to skip it. Otherwise, select an inference accelerator type if you plan to run inference from the notebook instance
  • (Optional) In the additional configuration options, you can specify the ML storage volume size in GB for the notebook instance.

Create an IAM Role

  • Next, specify the IAM role for the SageMaker model. You can either select an existing IAM role in your account or Create a new role.

  • Now, choose whether users of the notebook instance get root access. To enable it, select Enable. To disable it, select Disable.
  • Finally, click Create notebook instance.

Within a few minutes, SageMaker creates a Machine Learning Notebook instance and attaches a storage volume. 

Note: This notebook instance has a preconfigured Jupyter notebook server and predefined libraries.
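
The console steps above have a rough equivalent in the CreateNotebookInstance API mentioned earlier. The sketch below only assembles the request. The role ARN, instance type, volume size, and tag are hypothetical placeholders, and the instance name is borrowed from the demo's later output message.

```python
def build_notebook_instance_request(name, role_arn, instance_type="ml.t3.medium",
                                    volume_size_gb=5, root_access=True, tags=None):
    """Assemble the keyword arguments for a CreateNotebookInstance request."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,       # the notebook instance size
        "RoleArn": role_arn,                 # the IAM role SageMaker assumes
        "VolumeSizeInGB": volume_size_gb,    # ML storage volume size, in GB
        "RootAccess": "Enabled" if root_access else "Disabled",
        "Tags": [{"Key": key, "Value": value} for key, value in (tags or {}).items()],
    }


# Hypothetical values for illustration only.
notebook_request = build_notebook_instance_request(
    name="MySageMakerInstance",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    tags={"project": "demo"},
)
print(notebook_request)
```

With AWS credentials in place, the request could be sent with `boto3.client("sagemaker").create_notebook_instance(**notebook_request)`.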

Prepare Data Using AWS SageMaker

  • Now, use the Amazon SageMaker notebook to prepare the data you need to train your ML model. (Note: Wait until the notebook instance status changes from Pending to InService.)
  • Once Jupyter opens, go to the Files tab and select New -> conda_python3.
  • Before you train and deploy the ML model, import the necessary libraries into your Jupyter notebook environment
  • Copy the following code into a code cell in your notebook and select Run

# import libraries
import boto3, re, sys, math, json, os, sagemaker, urllib.request
from sagemaker import get_execution_role
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import Image
from IPython.display import display
from time import gmtime, strftime
# SageMaker Python SDK v1 import; on SDK v2 use: from sagemaker.serializers import CSVSerializer
from sagemaker.predictor import csv_serializer

# Define IAM role
role = get_execution_role()
prefix = 'sagemaker/DEMO-xgboost-dm'
containers = {'us-west-2': '433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest',
              'us-east-1': '811284229777.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest',
              'us-east-2': '825641698319.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest',
              'eu-west-1': '685385470294.dkr.ecr.eu-west-1.amazonaws.com/xgboost:latest'} # each region has its XGBoost container
my_region = boto3.session.Session().region_name # set the region of the instance
print("Success - the MySageMakerInstance is in the " + my_region + " region. You will use the " + containers[my_region] + " container for your SageMaker endpoint.")
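
One thing to watch in the cell above is that the container lookup only covers four regions, so a notebook running anywhere else would fail with a bare `KeyError`. A small helper such as the one below can make that failure easier to read. The function name and error message are assumptions for this sketch, and the dictionary simply repeats the entries from the cell above.

```python
# Same region-to-container mapping as in the cell above.
XGBOOST_CONTAINERS = {
    'us-west-2': '433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest',
    'us-east-1': '811284229777.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest',
    'us-east-2': '825641698319.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest',
    'eu-west-1': '685385470294.dkr.ecr.eu-west-1.amazonaws.com/xgboost:latest',
}


def container_for_region(region, containers=XGBOOST_CONTAINERS):
    """Return the XGBoost container for a region, or raise a descriptive error."""
    try:
        return containers[region]
    except KeyError:
        listed = ", ".join(sorted(containers))
        raise ValueError(
            f"No XGBoost container listed for region {region!r}; "
            f"listed regions: {listed}"
        ) from None


print(container_for_region('us-east-1'))
```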

  • Next, create an Amazon S3 bucket to store your data by copying the code below into the next code cell in your notebook

bucket_name = 'dummydemo' # <--- CHANGE THIS VARIABLE TO A UNIQUE NAME FOR YOUR BUCKET
s3 = boto3.resource('s3')
try:
    if my_region == 'us-east-1':
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': my_region})
    print('S3 bucket created successfully')
except Exception as e:
    print('S3 error: ', e)

  • Select Run. If you get an error, change the S3 bucket name and run the cell again.
  • If the bucket is created, the cell prints the following:

Output:

S3 bucket created successfully
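
Because bucket names must be globally unique, a name like 'dummydemo' is often already taken. The sketch below shows one possible way to generate a name that is unlikely to collide and to check it against a subset of the S3 naming rules (3 to 63 characters, lowercase letters, digits, dots, and hyphens, starting and ending with a letter or digit). The helper names are made up for this example, and the check does not cover every S3 rule.

```python
import re
import uuid

# Lowercase letters, digits, dots, and hyphens; 3 to 63 characters in total;
# must start and end with a letter or digit.
_BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Check a bucket name against a subset of the S3 naming rules."""
    return bool(_BUCKET_NAME_PATTERN.fullmatch(name)) and ".." not in name


def unique_bucket_name(prefix):
    """Append a random suffix so the name is unlikely to be taken already."""
    suffix = uuid.uuid4().hex[:12]
    return f"{prefix}-{suffix}"


candidate = unique_bucket_name("dummydemo")
print(candidate, is_valid_bucket_name(candidate))
```

The generated name can then be assigned to `bucket_name` before running the bucket creation cell.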

With this, we reach the end of this article about the AWS SageMaker.

Conclusion

You should now have a clear picture of AWS SageMaker and its benefits, how machine learning works with SageMaker, how to train and validate a model with it, and how companies are using it.

About the Author

Sana Afreen

Sana Afreen is a Senior Research Analyst at Simplilearn and works on several latest technologies. She holds a degree in B. Tech Computer Science. She has also achieved certification in Advanced SEO. Sana likes to explore new places for their cultures, traditions, and cuisines.
