Lab 1: Best Practice as Code

Before you can begin creating templates for deployment by the Project Administration team, you will need a shared services VPC to host a mirror of the Python Package Index (PyPI) for use by data science teams. The mirror will host a collection of approved Python packages. The concepts of a shared services VPC and a PyPI mirror are not detailed in this workshop; they are assumed as common practice among many AWS customers. After you have created the shared services VPC and PyPI mirror, you will then, as the Cloud Platform Engineering team, create a Service Catalog portfolio which the project administrators can use to easily deploy data science environments in support of new projects.

This lab assumes that other recommended security practices, such as enabling AWS CloudTrail and capturing VPC Flow Logs, are already in place. The contents of this lab focus solely on controls and guardrails directly related to data science resources.

Shared Services architecture

In this section you will get started quickly by deploying a shared PyPI mirror for use by data science project teams. In addition to deploying the shared service, this template will also create IAM roles for use by AWS Service Catalog and by the project administrators who are responsible for creating data science project environments.

The shared PyPI mirror will be hosted in a shared services VPC and exposed to project environments using a PrivateLink-powered endpoint. The mirror will host approved Python packages and can be used by all internal Python applications, such as machine learning code running on SageMaker.
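Once the mirror's PrivateLink endpoint is in place, a project environment can point pip at the mirror instead of the public index. A minimal sketch of the pip configuration, assuming a hypothetical endpoint DNS name (substitute the DNS name of your interface endpoint):

```ini
# /etc/pip.conf (or ~/.pip/pip.conf) inside a project environment.
# The hostname below is a placeholder -- replace it with the DNS name
# of the PrivateLink endpoint for your shared PyPI mirror.
[global]
index-url = https://pypi.shared-services.internal.example/simple
trusted-host = pypi.shared-services.internal.example
```

With this in place, `pip install` on a SageMaker notebook in the project VPC resolves packages through the shared services endpoint rather than the public internet.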

The resulting architecture will look like this:

Shared Services Architecture

Deploy your shared service

As a cloud platform engineering team member, deploy the CloudFormation template linked below to provision the shared services VPC and IAM roles.

Launch the template for your Region:

  • Oregon (us-west-2): Deploy to AWS
  • Ohio (us-east-2): Deploy to AWS
  • N. Virginia (us-east-1): Deploy to AWS
  • Ireland (eu-west-1): Deploy to AWS
  • London (eu-west-2): Deploy to AWS
  • Sydney (ap-southeast-2): Deploy to AWS

Deployment should take around 5 minutes.

Step-by-Step instructions

Create Project Portfolio

With the shared services VPC online and available, you now need to provide the project administration team with a configured Service Catalog portfolio for provisioning data science project environments. To start, visit the AWS Service Catalog console and create a portfolio. Grant the DataScienceAdmin role permission to access the portfolio, and then use the appropriate CloudFormation template linked below to create a Data Science Environment product. Ensure that the product has a launch constraint applied that uses the IAM role ServiceCatalogLaunchRole to launch the product upon request. This gives the Service Catalog service the permissions needed to create a Data Science Environment.

Service Catalog Product Templates by Region:

  • ap-southeast-2
  • eu-west-1
  • eu-west-2
  • us-east-1
  • us-east-2
  • us-west-2
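The portfolio, principal association, product, and launch constraint described above can also be expressed in CloudFormation. A hedged sketch follows; the role names (DataScienceAdmin, ServiceCatalogLaunchRole) are those created by the shared services template, but the product template URL is a placeholder:

```yaml
# Sketch only: the S3 TemplateURL is a placeholder -- use the product
# template for your Region from the list above.
Resources:
  DataSciencePortfolio:
    Type: AWS::ServiceCatalog::Portfolio
    Properties:
      DisplayName: Data Science Project Portfolio
      ProviderName: Cloud Platform Engineering

  AdminPortfolioAccess:
    Type: AWS::ServiceCatalog::PortfolioPrincipalAssociation
    Properties:
      PortfolioId: !Ref DataSciencePortfolio
      PrincipalType: IAM
      PrincipalARN: !Sub arn:aws:iam::${AWS::AccountId}:role/DataScienceAdmin

  DataScienceProduct:
    Type: AWS::ServiceCatalog::CloudFormationProduct
    Properties:
      Name: Data Science Environment
      Owner: Cloud Platform Engineering
      ProvisioningArtifactParameters:
        - Info:
            LoadTemplateFromURL: https://example-bucket.s3.amazonaws.com/ds-environment.yaml

  ProductAssociation:
    Type: AWS::ServiceCatalog::PortfolioProductAssociation
    Properties:
      PortfolioId: !Ref DataSciencePortfolio
      ProductId: !Ref DataScienceProduct

  LaunchConstraint:
    Type: AWS::ServiceCatalog::LaunchRoleConstraint
    Properties:
      PortfolioId: !Ref DataSciencePortfolio
      ProductId: !Ref DataScienceProduct
      RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/ServiceCatalogLaunchRole
```

Performing the same steps in the console, as described above, produces an equivalent result; the template form is useful if you later want the portfolio itself managed as code.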
Step-by-Step instructions

Review team resources

In addition to the Service Catalog portfolio and product, you have also created the following AWS resources to support the project administration team. Take a moment to review these resources and their configuration.

  • AWS IAM roles

    The IAM roles for the project administration team and the Service Catalog have been created. Visit the AWS IAM console and review the permissions granted to these two roles.

  • AWS Lambda Detective Control

    An AWS Lambda function has been deployed and configured to execute whenever an Amazon SageMaker resource is deployed. The function acts as a detective control, inspecting launched resources to ensure that they are configured correctly. To inspect the function and its triggers, visit the AWS Lambda console. Can you determine exactly which types of events will cause the function to execute?

  • Parameters added to Parameter Store

    A collection of parameters has been added to Parameter Store. Can you see which parameters have been added? How would you use these values?

  • Shared Services VPC

    The template has created a VPC that will house your shared applications. Visit the Amazon VPC console and see which services are accessible from within the VPC.

  • PyPI Mirror Service

    A Python package mirror has been deployed as a containerised service in the shared services VPC. The service runs on a cluster managed by Amazon Elastic Container Service (Amazon ECS) with AWS Fargate, which means there are no Amazon EC2 instances for you to manage. Visit the Amazon ECS console to check whether the service is up and running. You can also view the task logs from the container through the ECS console to check its status.
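The exact rules enforced by the Lambda detective control are for you to discover in the console, but a control of this kind typically receives a CloudTrail-sourced event via Amazon EventBridge and inspects the request parameters. A minimal, hypothetical sketch: the event shape mirrors a CloudTrail record for a SageMaker CreateTrainingJob call, and the two checks shown are illustrative, not necessarily the workshop's exact rules:

```python
# Hypothetical sketch of a detective control: flag SageMaker training
# jobs launched without a VPC attachment or volume encryption.
# The event shape mirrors a CloudTrail record delivered via EventBridge;
# the specific checks are illustrative only.

def evaluate_training_job(event):
    """Return a list of compliance findings for a CreateTrainingJob event."""
    params = event.get("detail", {}).get("requestParameters", {})
    findings = []
    if "vpcConfig" not in params:
        findings.append("Training job is not attached to a VPC")
    if "volumeKmsKeyId" not in params.get("resourceConfig", {}):
        findings.append("Training volume is not encrypted with a KMS key")
    return findings


def lambda_handler(event, context):
    findings = evaluate_training_job(event)
    # A real control might stop the offending job or publish findings
    # to an SNS topic; this sketch simply reports the result.
    return {"compliant": not findings, "findings": findings}
```

Invoked with an event whose `requestParameters` lacks both `vpcConfig` and an encrypted `resourceConfig`, the handler returns `compliant: False` with two findings.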

With these resources created you can now move on to Lab 2 where you will, as a project administrator, deploy a secure data science environment for a new project team.