Secure Networking

Amazon SageMaker allows you to create resources attached to your Amazon Virtual Private Cloud (VPC). This lets you govern access to SageMaker resources and your data sets using familiar tools such as security groups, route tables, and VPC endpoints. With these network-layer tools you can build a secure network environment in which you explicitly control data ingress and egress for your data science environment. Let’s look at these tools in more detail.


Private Network Environment

Let’s begin with your VPC, which will host Amazon SageMaker and the other components of your data science environment. Your VPC provides a familiar set of network-level controls for governing the ingress and egress of data. We will begin this workshop by creating a VPC with no Internet Gateway (IGW), so all subnets will be private, with no Internet connectivity. Network connectivity to AWS services or your own shared services will be provided through VPC endpoints and AWS PrivateLink. Security groups will control traffic between resources, letting you group like resources together and manage their ingress and egress traffic.

Virtual Private Cloud

A Virtual Private Cloud (VPC) gives you a self-contained network environment that you control. When initially created, the VPC does not allow network traffic into or out of the VPC. Only by adding VPC endpoints, Internet Gateways (IGW), or virtual private gateways (VGW) do you begin to configure your private network environment to communicate with the wider world. For the rest of these labs we will assume that no access to the Internet is required and that all communication with AWS services will go explicitly through private connectivity to the AWS APIs over VPC endpoints. We also recommend creating multiple subnets in your VPC across multiple Availability Zones to support highly available deployments and resilient architectures.
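As a rough illustration of the multi-subnet recommendation, the sketch below carves one private subnet per Availability Zone out of a VPC address range using Python's standard ipaddress module. The CIDR block, the /24 subnet size, and the zone names are assumptions chosen for the example; the lab's own values may differ.

```python
import ipaddress


def plan_subnets(vpc_cidr, availability_zones, new_prefix=24):
    """Assign one non-overlapping subnet CIDR to each Availability Zone.

    Hypothetical planning helper: it only computes address ranges and does
    not call any AWS API.
    """
    network = ipaddress.ip_network(vpc_cidr)
    # subnets() yields consecutive, non-overlapping blocks of the given size.
    candidates = network.subnets(new_prefix=new_prefix)
    return {zone: str(next(candidates)) for zone in availability_zones}


# Assumed example values, not taken from the lab.
plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"])
for zone, cidr in plan.items():
    print(zone, cidr)
```

Placing each subnet in a different zone means that the loss of one zone still leaves a subnet available for training jobs and endpoints.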

To find out more about VPCs and VPC concepts such as route tables, subnets, security groups, and network access control lists, please visit the AWS documentation.

VPC Endpoints

A VPC endpoint allows you to establish a private, secure connection between your VPC and an AWS service without requiring you to configure an Internet Gateway, NAT device, or VPN connection. Using VPC endpoints, your VPC resources can communicate with AWS services without traffic ever leaving the AWS network. A VPC endpoint is a highly available virtual device that is managed on your behalf. As a VPC resource, an interface endpoint is given IP addresses within your VPC, and you assign security groups to it to control which resources can communicate with the endpoint. (Gateway endpoints, which are available for Amazon S3, are instead reached through entries in your route tables.)
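To make the security group control concrete, here is a minimal sketch of an ingress rule that lets only members of a workload security group reach an interface endpoint over HTTPS. The dictionary follows the IpPermissions shape accepted by the EC2 authorize_security_group_ingress API; the group IDs and the helper name are hypothetical.

```python
def https_ingress_from(source_group_id):
    """Build an ingress rule allowing HTTPS (TCP 443) from one security group.

    Hypothetical helper; AWS service APIs are served over HTTPS, so port 443
    is the only port the endpoint needs to accept.
    """
    return {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "UserIdGroupPairs": [
            {
                "GroupId": source_group_id,
                "Description": "HTTPS from data science workloads",
            }
        ],
    }


# Assumed placeholder IDs; real IDs come from your own VPC.
endpoint_group_id = "sg-0endpoint000000000"
rule = https_ingress_from("sg-0workload000000000")

# With boto3 installed and credentials configured, the rule could be applied as:
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId=endpoint_group_id, IpPermissions=[rule])
print(rule)
```

Referencing a source security group rather than a CIDR range keeps the rule valid as instances and notebooks come and go.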

VPC Endpoint

In addition to security groups, you can also apply VPC endpoint policies to your endpoints. An endpoint policy is an IAM resource policy that is attached to a VPC endpoint and governs which APIs can be called on an AWS service through that endpoint. For example, if the following endpoint policy were applied to an Amazon S3 VPC endpoint, it would allow only read access to objects in the specified S3 bucket. Actions against other buckets, and any other actions against this bucket, would be denied.

{
  "Statement": [
    {
      "Sid": "Access-to-specific-bucket-only",
      "Principal": "*",
      "Action": [
        "s3:GetObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::my_secure_bucket",
        "arn:aws:s3:::my_secure_bucket/*"
      ]
    }
  ]
}
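The effect of this policy can be sketched in a few lines of Python. The evaluator below is a deliberately simplified illustration, not IAM's actual evaluation logic: it handles only Allow statements, exact action names, and wildcard resource patterns, and both function names are hypothetical.

```python
from fnmatch import fnmatchcase


def build_read_only_policy(bucket_name):
    """Return the endpoint policy shown above for an arbitrary bucket name."""
    return {
        "Statement": [
            {
                "Sid": "Access-to-specific-bucket-only",
                "Principal": "*",
                "Action": ["s3:GetObject"],
                "Effect": "Allow",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ]
    }


def is_allowed(policy, action, resource_arn):
    """Simplified check: allowed only if some Allow statement matches.

    Anything not explicitly allowed is treated as denied. Real IAM
    evaluation also considers explicit denies, conditions, principals,
    and wildcards in action names.
    """
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        if action not in statement["Action"]:
            continue
        if any(fnmatchcase(resource_arn, p) for p in statement["Resource"]):
            return True
    return False


policy = build_read_only_policy("my_secure_bucket")
print(is_allowed(policy, "s3:GetObject", "arn:aws:s3:::my_secure_bucket/train.csv"))
print(is_allowed(policy, "s3:PutObject", "arn:aws:s3:::my_secure_bucket/train.csv"))
print(is_allowed(policy, "s3:GetObject", "arn:aws:s3:::another_bucket/train.csv"))
```

Only the first request matches the Allow statement; the write to the same bucket and the read from a different bucket both fall through to the implicit deny.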

VPC endpoint policies, together with security groups, let you implement defense in depth: security groups control which resources can reach an endpoint, and the endpoint policy controls which API actions and resources can be used through it.

In this lab you will create a VPC with endpoints for the following services:

  • Amazon S3, for reading and writing data
  • Amazon SageMaker, for creating training jobs and hosted models
  • AWS Security Token Service (AWS STS), for obtaining temporary credentials
  • Amazon CloudWatch Logs, for writing out log data from VPC-based resources
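The list above can be expressed as a set of endpoint requests. The sketch below builds the keyword arguments that the EC2 create_vpc_endpoint API accepts, without calling AWS. The choice of a gateway endpoint for Amazon S3 and interface endpoints for the rest is an assumption, as are the helper name and the placeholder IDs; the lab's CloudFormation template is the authoritative definition.

```python
# (service name suffix, endpoint type); the types are assumptions for this sketch.
ENDPOINT_SERVICES = [
    ("s3", "Gateway"),
    ("sagemaker.api", "Interface"),
    ("sts", "Interface"),
    ("logs", "Interface"),
]


def endpoint_requests(region, vpc_id, subnet_ids, security_group_ids, route_table_ids):
    """Build one create_vpc_endpoint request per service (hypothetical helper)."""
    requests = []
    for suffix, endpoint_type in ENDPOINT_SERVICES:
        request = {
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{suffix}",
            "VpcEndpointType": endpoint_type,
        }
        if endpoint_type == "Gateway":
            # Gateway endpoints are wired into route tables.
            request["RouteTableIds"] = list(route_table_ids)
        else:
            # Interface endpoints get network interfaces in your subnets.
            request["SubnetIds"] = list(subnet_ids)
            request["SecurityGroupIds"] = list(security_group_ids)
            request["PrivateDnsEnabled"] = True
        requests.append(request)
    return requests


# Assumed placeholder identifiers.
for request in endpoint_requests(
    "us-east-1", "vpc-0example", ["subnet-a", "subnet-b"], ["sg-0endpoint"], ["rtb-0private"]
):
    # With boto3: boto3.client("ec2").create_vpc_endpoint(**request)
    print(request["ServiceName"], request["VpcEndpointType"])
```

Invoking a hosted model uses a separate service name, com.amazonaws.<region>.sagemaker.runtime, which would need its own interface endpoint if inference calls are made from inside the VPC.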

Let’s now use AWS CloudFormation to create an Amazon VPC, security groups, and VPC endpoints, giving your data science teams a precisely scoped, secure network environment. The VPC will have no IGW or NAT gateway attached, and it will have multiple subnets across two Availability Zones. VPC endpoints will be created and attached to the VPC, and security groups will be applied to control which VPC resources can communicate with each other.
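For orientation, the following sketch generates a minimal CloudFormation template of this shape in Python and checks that it contains no Internet-facing gateway resources. It is not the template used in the lab: the logical names, CIDR blocks, and helper functions are assumptions, and security groups and endpoints are omitted for brevity.

```python
import json

# Resource types that would give the VPC a path to the Internet.
FORBIDDEN_TYPES = {"AWS::EC2::InternetGateway", "AWS::EC2::NatGateway"}


def build_template(vpc_cidr, subnet_cidrs):
    """Build a template with one VPC and one private subnet per CIDR block."""
    resources = {
        "DataScienceVpc": {  # hypothetical logical name
            "Type": "AWS::EC2::VPC",
            "Properties": {
                "CidrBlock": vpc_cidr,
                # DNS support is needed for private DNS on interface endpoints.
                "EnableDnsSupport": True,
                "EnableDnsHostnames": True,
            },
        }
    }
    for index, cidr in enumerate(subnet_cidrs):
        resources[f"PrivateSubnet{index + 1}"] = {
            "Type": "AWS::EC2::Subnet",
            "Properties": {
                "VpcId": {"Ref": "DataScienceVpc"},
                "CidrBlock": cidr,
                # Pick the Nth Availability Zone of the stack's region.
                "AvailabilityZone": {"Fn::Select": [index, {"Fn::GetAZs": ""}]},
                "MapPublicIpOnLaunch": False,
            },
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


def internet_paths(template):
    """Return logical names of any IGW or NAT gateway resources found."""
    return sorted(
        name
        for name, resource in template["Resources"].items()
        if resource["Type"] in FORBIDDEN_TYPES
    )


template = build_template("10.0.0.0/16", ["10.0.0.0/24", "10.0.1.0/24"])
assert internet_paths(template) == []
print(json.dumps(template, indent=2))
```

A check like internet_paths can be run against any candidate template before deployment to confirm that the environment stays private.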