Storage on AWS
© 2015 Amazon Web Services, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services, Inc.
Agenda
• Storage Primer
• Block Storage
• Shared File Systems
• Object Store
• On-Premises Storage Integration
• Structured Data Store
0 Storage Primer
Block vs File vs Object
Block Storage
• Raw storage
• Data organized as an array of unrelated blocks
• Host file system places data on disk, e.g. Microsoft NTFS, Unix ZFS
File Storage
• Unrelated data blocks managed by a file (serving) system
• Native file system places data on disk
Object Storage
• Stores virtual containers that encapsulate the data, data attributes, metadata and object IDs
• API access to data
• Metadata-driven, policy-based, etc.
Structured storage: Databases
Relational Databases
• Static schema
• Highly structured table organization
• Rigid data format
Document Store
• Dynamic schema
• Key/value database
• Collection of complex documents
• Arbitrary, nested data format
Storage Characteristics
Some of the ways we look at storage:
• Durability: measure of expected data loss
• Availability: measure of expected downtime
• Security: security measures in place
• Cost: amount per storage unit, e.g. $/GB
• Scalability: upward flexibility
• Performance: performance metrics
• Integration: ability to interact with other services
AWS has a variety of storage options
• Amazon EBS (Elastic Block Store)
• Amazon Elastic File System (EFS)
• Amazon EC2 Instance Store (Ephemeral Volumes)
• Amazon S3 (Simple Storage Service)
• Amazon Glacier
• AWS Storage Gateway
• AWS Import/Export Snowball
AWS also has a variety of database options
• Amazon EC2 (self-managed)
• Amazon RDS (Relational Database Service)
• Amazon DynamoDB
• Amazon ElastiCache
• Amazon Redshift
1 Block Storage
Amazon EBS
• Persistent block-level storage for EC2
• Pay only for what you provision
• Native redundancy and write cache
• Consistent and low-latency performance
• Optimized for random I/O
• Native support for encryption at rest (data volumes)
Amazon EBS
• Network-attached block device
– Independent data lifecycle
– Virtual disks
– Multiple volumes per EC2 instance
– Only one EC2 instance at a time per volume
– Can be detached from an instance and attached to a different one
• Raw block devices
– Unformatted block devices
– Ideal for databases, filesystems
• Available in multiple types
EBS Volume Types Comparison
Magnetic
• Performance: lowest cost
• Use cases: infrequent data access
• Media: magnetic (HDD)
• Max IOPS: about 100 on average, with the ability to burst to hundreds of IOPS
• Price: $0.05/GB/month + $0.05 per million I/O requests
General Purpose (SSD)
• Performance: burstable
• Use cases: boot volumes, small to medium DBs, dev & test
• Media: SSD
• Max IOPS: baseline 3 IOPS/GB, burstable to 3,000 IOPS
• Price: $0.10/GB/month; I/O operations free
Provisioned IOPS (SSD)
• Performance: predictable
• Use cases: I/O-intensive relational & NoSQL databases
• Media: SSD
• Max IOPS: consistently performs at the provisioned level, up to 20,000 IOPS
• Price: $0.125/GB/month + $0.065 per provisioned IOPS per month
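To make the price row concrete, here is a minimal sketch that estimates a monthly bill for one volume of each type from the list prices above. The function name, the type keys, the 100 GB size, the 10 million I/O requests and the 1,000 provisioned IOPS are all hypothetical inputs chosen for illustration; snapshot storage, data transfer and regional price differences are ignored.

```python
"""Rough monthly EBS cost estimate using the 2015 list prices from the table (sketch)."""

# Prices from the comparison table (USD).
MAGNETIC_PER_GB = 0.05          # per GB-month
MAGNETIC_PER_MILLION_IO = 0.05  # per million I/O requests
GP_SSD_PER_GB = 0.10            # per GB-month; I/O is free
PIOPS_PER_GB = 0.125            # per GB-month
PIOPS_PER_IOPS = 0.065          # per provisioned IOPS-month


def monthly_cost(volume_type: str, size_gb: int,
                 io_requests: int = 0, provisioned_iops: int = 0) -> float:
    """Estimated monthly cost in USD for one volume (hypothetical helper and type keys)."""
    if volume_type == "magnetic":
        return size_gb * MAGNETIC_PER_GB + io_requests / 1_000_000 * MAGNETIC_PER_MILLION_IO
    if volume_type == "gp-ssd":
        return size_gb * GP_SSD_PER_GB
    if volume_type == "piops-ssd":
        return size_gb * PIOPS_PER_GB + provisioned_iops * PIOPS_PER_IOPS
    raise ValueError(f"unknown volume type: {volume_type}")


# Hypothetical 100 GB volume under each pricing model.
for vt, kwargs in [("magnetic", {"io_requests": 10_000_000}),
                   ("gp-ssd", {}),
                   ("piops-ssd", {"provisioned_iops": 1_000})]:
    print(vt, round(monthly_cost(vt, 100, **kwargs), 2))
```

In this example the provisioned IOPS ($65.00), not the capacity ($12.50), dominate the Provisioned IOPS bill, which is why sizing IOPS carefully matters more than sizing gigabytes.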
IOPS Token Bucket Model
• Each token represents an “I/O credit” that pays for one read or one write.
• A bucket is associated with each General Purpose (SSD) volume and can hold up to 5.4 million tokens.
• Tokens accumulate at a rate of 3 per configured GB per second, up to the capacity of the bucket.
• Tokens can be spent at up to 3,000 per second per volume.
• The baseline performance of the volume equals the rate at which tokens accumulate: 3 IOPS per GB.
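The bucket model above can be sketched as a small per-second simulation. It uses only the slide's figures (a 5.4 million credit bucket, 3 credits per GB per second, a 3,000 IOPS spend cap); the function names are hypothetical, and the sketch assumes the bucket starts full and ignores any other limits a real volume may have.

```python
"""Minimal token-bucket sketch of the General Purpose (SSD) burst model (slide figures only)."""

BUCKET_CAPACITY = 5_400_000   # max I/O credits per volume
CREDITS_PER_GB_PER_SEC = 3    # accumulation rate = baseline IOPS per GB
MAX_BURST_IOPS = 3_000        # max credits that can be spent per second


def baseline_iops(size_gb: int) -> int:
    """Baseline IOPS equals the credit accumulation rate."""
    return CREDITS_PER_GB_PER_SEC * size_gb


def burst_seconds(size_gb: int, demand_iops: int = MAX_BURST_IOPS) -> float:
    """Seconds a full bucket can sustain `demand_iops` before dropping to baseline.

    Returns infinity when demand does not exceed the baseline, because credits
    then accumulate at least as fast as they are spent.
    """
    demand = min(demand_iops, MAX_BURST_IOPS)
    net_drain = demand - baseline_iops(size_gb)
    if net_drain <= 0:
        return float("inf")
    return BUCKET_CAPACITY / net_drain


def simulate(size_gb: int, demand_iops: int, seconds: int) -> list[int]:
    """Step the bucket once per second; return the IOPS actually served each second."""
    credits = BUCKET_CAPACITY          # assumption: the bucket starts full
    refill = baseline_iops(size_gb)
    served = []
    for _ in range(seconds):
        credits = min(BUCKET_CAPACITY, credits + refill)   # accumulate, capped
        spend = min(demand_iops, MAX_BURST_IOPS, credits)  # spend, capped
        credits -= spend
        served.append(spend)
    return served


print(baseline_iops(100))   # 300
print(burst_seconds(100))   # 2000.0
```

For a 100 GB volume the baseline is 300 IOPS, so a full bucket sustains a 3,000 IOPS burst for roughly 5,400,000 / 2,700 = 2,000 seconds (about 33 minutes) before the volume falls back to its baseline.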
EBS Provisioned IOPS
• EBS-optimized instances
– Dedicated storage throughput
• Predictable performance
– 100 to 20,000 IOPS per volume
– Single-digit millisecond latency
• Performance design
– Delivers within 10% of provisioned IOPS, 99.9% of the time
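One way to reason about the design goal above is to check monitored per-second IOPS samples against it. This is a sketch under a stated interpretation, not an AWS definition: a sample counts as "within 10%" if it reaches at least 90% of the provisioned IOPS, and the goal is met when at least 99.9% of samples qualify. The function name and the sample data are hypothetical.

```python
"""Check per-second IOPS samples against the Provisioned IOPS design goal (sketch).

Interpretation (an assumption): "within 10%" means at least 90% of provisioned
IOPS, and "99.9% of the time" means at least 99.9% of the samples.
"""


def meets_piops_goal(samples: list[int], provisioned_iops: int,
                     tolerance: float = 0.10, target_fraction: float = 0.999) -> bool:
    """Hypothetical helper: True if enough samples reach the tolerance floor."""
    if not samples:
        raise ValueError("need at least one sample")
    floor = provisioned_iops * (1 - tolerance)
    within = sum(1 for s in samples if s >= floor)
    return within / len(samples) >= target_fraction


# Hypothetical hour of samples for a 4,000 IOPS volume with one slow second.
hour = [4_000] * 3_599 + [3_000]
print(meets_piops_goal(hour, 4_000))  # True
```

Under this reading, an hour of samples tolerates at most three slow seconds, since 0.1% of 3,600 seconds is 3.6 seconds.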
Enhanced Throughput for PIOPS & GP2 Volumes
• Maximum attainable throughput to each volume was doubled to 128 MB/s of read or write traffic
• An I/O request of up to 256 KB is now counted as a single I/O operation (IOP)
• In many cases you can configure the block size used by your application
– Larger blocks need fewer IOPS for the same throughput, which can dramatically reduce your storage costs
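The cost effect of the 256 KB rule can be worked through with a little arithmetic. This sketch assumes that a request larger than 256 KB is counted as ceil(size / 256 KB) operations and treats 1 MB as 1,024 KB; the 100 MB/s target, the block sizes and the function names are hypothetical, and the $0.065 per provisioned IOPS-month figure comes from the price table.

```python
"""How request size changes the IOPS (and cost) needed for a target throughput (sketch)."""
import math

MAX_IO_KB = 256                 # requests up to this size count as one I/O operation
PRICE_PER_PIOPS_MONTH = 0.065   # USD per provisioned IOPS-month (2015 list price)


def ops_per_request(request_kb: int) -> int:
    """Operations one request counts as (assumption: larger requests are split)."""
    return math.ceil(request_kb / MAX_IO_KB)


def iops_needed(throughput_mb_s: float, request_kb: int) -> int:
    """Operations per second needed to sustain `throughput_mb_s` with fixed-size requests."""
    requests_per_sec = throughput_mb_s * 1024 / request_kb
    return math.ceil(requests_per_sec * ops_per_request(request_kb))


for block_kb in (16, 64, 256):
    iops = iops_needed(100, block_kb)
    print(f"{block_kb:>3} KB blocks: {iops:>5} IOPS, ${iops * PRICE_PER_PIOPS_MONTH:,.2f}/month")
```

At 100 MB/s, moving from 16 KB to 256 KB blocks cuts the requirement from 6,400 to 400 IOPS in this model, a 16x reduction in the provisioned-IOPS line of the bill.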
Amazon EBS at 20,000 IOPS
• Provisioned IOPS (SSD)
– Max volume size 16 TB
– Max I/O rate 20,000 IOPS
– Max throughput 320 MB/s
• General Purpose (SSD)
– Max volume size 16 TB
– Max I/O rate 10,000 IOPS
– Max throughput 160 MB/s
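The two per-volume limits above interact: small I/Os hit the IOPS ceiling before the throughput ceiling, and large I/Os do the reverse. A minimal sketch, assuming fixed-size I/O and decimal units (1 MB = 1,000 KB), and ignoring the instance's own EBS bandwidth; the dictionary keys and function names are hypothetical.

```python
"""Which limit binds first, IOPS or throughput? Uses the per-volume maxima above (sketch).

Units are decimal (1 MB = 1,000 KB), so 20,000 IOPS x 16 KB = 320 MB/s.
"""

LIMITS = {
    # hypothetical keys -> (max IOPS, max throughput in MB/s)
    "piops-ssd": (20_000, 320),
    "gp-ssd": (10_000, 160),
}


def effective_throughput(volume_type: str, io_kb: float) -> float:
    """Best-case MB/s for fixed-size I/O, capped by both the IOPS and throughput limits."""
    max_iops, max_mb_s = LIMITS[volume_type]
    iops_bound = max_iops * io_kb / 1000
    return min(iops_bound, max_mb_s)


def crossover_io_kb(volume_type: str) -> float:
    """I/O size at which both limits are reached at the same time."""
    max_iops, max_mb_s = LIMITS[volume_type]
    return max_mb_s * 1000 / max_iops


for vt in LIMITS:
    print(vt, crossover_io_kb(vt), effective_throughput(vt, 4), effective_throughput(vt, 64))
```

Both volume types cross over at 16 KB in this model: below that I/O size the IOPS limit is what you run into, above it the throughput limit.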
EBS Snapshots
[Figure: within an Availability Zone, EBS volumes attached to EC2 instances are saved to Amazon S3 with "Create Snapshot"; new EBS volumes for other EC2 instances are created with "Clone From Snapshot".]
How Do Snapshots Work?
[Figure: timeline of Snapshots 1, 2 and 3 of an EBS volume; volume Blocks 1 to 4 are stored in Amazon S3 as Chunks 1 to 4.]
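EBS snapshots are incremental: after the first full copy, each snapshot stores only the blocks that changed since the previous one, while still referencing the unchanged chunks so that any single snapshot can restore the whole volume. The toy model below shows that idea with in-memory dictionaries. The class and method names are hypothetical, the sketch detects changes by comparing block contents (a simplification), and real snapshots are managed entirely by AWS.

```python
"""Toy model of incremental, chunk-based volume snapshots (hypothetical names)."""
from dataclasses import dataclass, field


@dataclass
class SnapshotStore:
    chunks: dict[int, bytes] = field(default_factory=dict)         # chunk id -> data ("S3" side)
    snapshots: list[dict[int, int]] = field(default_factory=list)  # per snapshot: block -> chunk id
    _next_id: int = 0

    def take(self, volume: dict[int, bytes]) -> int:
        """Snapshot `volume`, storing only blocks that differ from the previous snapshot."""
        previous = self.snapshots[-1] if self.snapshots else {}
        manifest = {}
        for block, data in volume.items():
            old_id = previous.get(block)
            if old_id is not None and self.chunks[old_id] == data:
                manifest[block] = old_id           # unchanged: reuse the existing chunk
            else:
                self.chunks[self._next_id] = data  # new or changed: store a new chunk
                manifest[block] = self._next_id
                self._next_id += 1
        self.snapshots.append(manifest)
        return len(self.snapshots) - 1

    def restore(self, snap_no: int) -> dict[int, bytes]:
        """Rebuild the full volume from any single snapshot's manifest."""
        return {block: self.chunks[cid] for block, cid in self.snapshots[snap_no].items()}


store = SnapshotStore()
volume = {1: b"a", 2: b"b", 3: b"c", 4: b"d"}
store.take(volume)          # first snapshot: all 4 blocks stored
volume[2] = b"B"
store.take(volume)          # second snapshot: only block 2 stored
print(len(store.chunks))    # 5
```

The second snapshot adds one chunk instead of four, yet restoring it still returns all four blocks, which is why deleting an older snapshot does not break newer ones in this model.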
EC2 Instance Store (Ephemeral Volumes)
• Free with your EC2 instance
– SAS and SSD options
– Size/type based on instance type
• Local, direct-attached resource
• Consistent sequential reads and writes
• Use only for non-persistent data
2 Shared File Systems
Elastic File System (EFS)
• Fully managed file system for EC2 instances
• Provides standard file system semantics
• Works with standard operating system APIs
• Sharable across thousands of instances
• Elastically grows to petabyte scale
• Delivers performance for a wide variety of workloads
• Highly available and durable
• NFS v4-based
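EFS – Mounting
[Figure: several EC2 instances mounting the same EFS file system.]
EFS DNS name: availability-zone.file-system-id.efs.aws-region.amazonaws.com
Mount on machine (the mount point must exist first):
mkdir -p ~/efs-mount-point
sudo mount -t nfs4 mount-target-DNS:/ ~/efs-mount-point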
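When mounting from automation rather than by hand, it helps to assemble the DNS name and the command from the template above. A minimal sketch; the helper names and the Availability Zone, file system ID, region and mount path are hypothetical placeholders. The argument list suits `subprocess.run`, which does not expand `~`, so the sketch uses an absolute mount path.

```python
"""Build the EFS mount-target DNS name and mount command from the template (sketch)."""
import shlex


def efs_dns_name(availability_zone: str, file_system_id: str, region: str) -> str:
    # Template: availability-zone.file-system-id.efs.aws-region.amazonaws.com
    return f"{availability_zone}.{file_system_id}.efs.{region}.amazonaws.com"


def mount_command(dns_name: str, mount_point: str) -> list[str]:
    # Mirrors: sudo mount -t nfs4 mount-target-DNS:/ ~/efs-mount-point
    return ["sudo", "mount", "-t", "nfs4", f"{dns_name}:/", mount_point]


# Hypothetical identifiers, for illustration only.
dns = efs_dns_name("us-west-2a", "fs-12345678", "us-west-2")
cmd = mount_command(dns, "/home/ec2-user/efs-mount-point")
print(shlex.join(cmd))
```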
3 Object Stores
Amazon S3 (Simple Storage Service)
• Web-accessible object store
• Pay for exactly what you use
• Highly durable (designed for 99.999999999% durability)
• Limitlessly scalable
• Natively online
• Two flavors:
– Standard Storage: $0.0300* per GB/month
– Standard - Infrequent Access Storage (minimum billable object size 128 KB): $0.0125* per GB/month + data retrieval cost
* US East (N. Virginia) pricing
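The durability figure is easier to reason about as an expected loss count. This sketch treats 99.999999999% as an annual per-object design figure, which is how AWS describes it, and applies the 128 KB minimum from the Standard - Infrequent Access bullet. The helper names and the object counts and sizes are hypothetical.

```python
"""Back-of-envelope numbers for the S3 figures above (sketch)."""

ANNUAL_LOSS_RATE = 1e-11    # 1 - 0.99999999999, the 11-nines design durability
IA_MIN_BILLABLE_KB = 128    # Standard - IA minimum billable object size


def expected_objects_lost_per_year(num_objects: int) -> float:
    """Hypothetical helper: average objects lost per year at the design durability."""
    return num_objects * ANNUAL_LOSS_RATE


def ia_billable_kb(object_kb: float) -> float:
    """Objects smaller than 128 KB are billed as if they were 128 KB."""
    return max(object_kb, IA_MIN_BILLABLE_KB)


lost = expected_objects_lost_per_year(10_000_000)
print(f"{lost:.4f} objects/year, about one every {1 / lost:,.0f} years")
print(ia_billable_kb(10), ia_billable_kb(500))  # 128 500
```

Storing 10 million objects, you would on average expect to lose one object roughly every 10,000 years under this design target; it is a design goal, not a guarantee. The second line shows why many tiny objects are a poor fit for Standard - IA.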
Amazon S3 (Simple Storage Service)
• Parallel I/O for max speed (Multipart Upload, Ranged GETs)
• Resource-level IAM permissions
• Bucket Policies & ACLs
• Direct access through APIs
• Server-Side Encryption
• Static Website Hosting
• Data Lifecycle Rules
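Parallel I/O works by splitting one object into byte ranges that are fetched (Ranged GETs) or uploaded (Multipart Upload parts) concurrently. The sketch below only plans the ranges; issuing the requests, for example with an AWS SDK, is left out. The helper names and the 20 MiB object and 8 MiB part size are hypothetical; the 5 MiB minimum part size (except the last part) and the 10,000-part maximum are S3's multipart limits.

```python
"""Split an object into byte ranges for parallel Ranged GETs or Multipart Upload (sketch)."""

MIB = 1024 * 1024
MIN_PART = 5 * MIB      # S3 multipart minimum part size (all parts except the last)
MAX_PARTS = 10_000      # S3 multipart maximum number of parts


def byte_ranges(size: int, part_size: int) -> list[tuple[int, int]]:
    """Inclusive (start, end) offsets covering `size` bytes in `part_size` chunks."""
    return [(start, min(start + part_size, size) - 1)
            for start in range(0, size, part_size)]


def range_headers(size: int, part_size: int) -> list[str]:
    """HTTP Range header values, one per parallel GET."""
    return [f"bytes={s}-{e}" for s, e in byte_ranges(size, part_size)]


def check_multipart_plan(size: int, part_size: int) -> None:
    """Raise if the plan breaks the multipart part-size or part-count limits."""
    if part_size < MIN_PART and size > part_size:
        raise ValueError("parts other than the last must be at least 5 MiB")
    if len(byte_ranges(size, part_size)) > MAX_PARTS:
        raise ValueError("multipart uploads allow at most 10,000 parts")


print(range_headers(20 * MIB, 8 * MIB))
check_multipart_plan(20 * MIB, 8 * MIB)   # passes silently
```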
Amazon Glacier
• Low-cost archival storage
• Secure
– SSL & AES-256
• Durable
– Designed for 99.999999999% durability
• Optimized for data archiving and backup
– Suitable for RTO measured in hours
– Includes storage costs and retrieval costs
• $0.007 per GB/month (US East pricing)
• Integrated with S3
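Because Glacier and Standard - IA add retrieval costs, storage price alone never tells the whole story, but it does show the scale of the difference. A minimal sketch using the per-GB prices quoted on these slides (US East, 2015) and treating 1 TB as 1,024 GB; retrieval pricing is not on the slides, so it is deliberately left out, and the helper name is hypothetical.

```python
"""Annual storage-only cost of the same data in each class (prices from the slides, sketch)."""

PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.0300,
    "S3 Standard - IA": 0.0125,
    "Glacier": 0.007,
}


def annual_storage_cost(gb: float, storage_class: str) -> float:
    # Storage charges only: retrieval, request and transfer fees are excluded.
    return gb * PRICE_PER_GB_MONTH[storage_class] * 12


for cls in PRICE_PER_GB_MONTH:
    print(f"{cls:<18} ${annual_storage_cost(1024, cls):,.2f}/year for 1 TB")
```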
Amazon CloudFront
• Easy-to-use content delivery network (CDN)
• Pay-as-you-go pricing
• Multiple origins: S3, EC2, on-premises
• Worldwide network of 53+ edge locations
• Video streaming
• Geo restriction
• Custom SSL certificates
• Dynamic content
• POST/PUT
4 On-Premises Storage Integration
AWS Storage Gateway
• VM appliance run on-premises
• Creates iSCSI volume mount points
• Directly interfaces with S3 or Glacier
• Gateway-Stored Volumes
• Gateway-Cached Volumes
• Virtual Tape Library
AWS Import/Export Snowball
• Petabyte-scale data transport
• Uses secure appliances
• Economical and fast
• Faster than the Internet for significant data sets
• Import into S3
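The "faster than the Internet" claim is simple arithmetic. This sketch assumes a sustained 80% link utilization and decimal units (1 TB = 10^12 bytes); the link speeds, the 100 TB data set and the helper name are hypothetical, and appliance shipping time, which depends on logistics, is not modelled.

```python
"""How long would the same data take over the network? (sketch)"""

SECONDS_PER_DAY = 86_400


def transfer_days(data_bytes: float, link_bits_per_sec: float, utilization: float = 0.8) -> float:
    """Days to push `data_bytes` over a link at the given average utilization."""
    effective_bps = link_bits_per_sec * utilization
    return data_bytes * 8 / effective_bps / SECONDS_PER_DAY


TB = 10**12
for link_name, bps in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
    print(f"100 TB over {link_name}: {transfer_days(100 * TB, bps):.1f} days")
```

At 1 Gbps, 100 TB takes about 11.6 days of fully dedicated transfer in this model, and nearly four months over 100 Mbps, which is the regime where shipping an appliance wins.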
5 Structured Data Stores
Amazon RDS
A fully managed SQL database service
• Choice of database engines, including Amazon Aurora
• Simple to deploy and scale
• Reliable and cost-effective
• Without any operational burden
If you host your databases on-premises
• You: app optimization, scaling, high availability, database backups, DB software patches, DB software installs, OS patches, OS installation, server maintenance, rack & stack, power, HVAC, net
If you host your databases in EC2
• You: app optimization, scaling, high availability, database backups, DB software patches, DB software installs, OS patches
• AWS: OS installation, server maintenance, rack & stack, power, HVAC, net
If you choose a managed DB service like RDS
• You: app optimization
• AWS: scaling, high availability, database backups, DB software patches, DB software installs, OS patches, OS installation, server maintenance, rack & stack, power, HVAC, net
Traditional Database Architecture
[Figure: Client Tier, App/Web Tier and a single RDBMS: one database for all workloads (key-value access, complex queries, transactions, analytics).]
Cloud Data Tier Architecture
[Figure: Client Tier, App/Web Tier and a Data Tier made up of a cache, a data warehouse, NoSQL and an RDBMS: the best database for each workload.]
Workload-Driven Data Store Selection
• Hot reads: cache (Amazon ElastiCache)
• Key/value, simple queries: NoSQL (Amazon DynamoDB)
• Complex queries & transactions: RDBMS (Amazon RDS)
• Analytics: data warehouse (Amazon Redshift)
Amazon DynamoDB
• Fully managed NoSQL database service
• Massively scalable, distributed key/value store
• Reserved capacity model
• Fast and predictable
• Built-in fault tolerance
• Strongly consistent reads available (reads are eventually consistent by default)
• Unlimited potential storage and throughput
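DynamoDB throughput is sized in read and write capacity units, and the consistency choice changes the read cost. The sketch below applies DynamoDB's documented sizing rules, which are not on the slide: one read unit covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one write unit covers one write per second of an item up to 1 KB. The helper names, item sizes and request rates are hypothetical.

```python
"""Provisioned throughput sizing for DynamoDB capacity units (sketch)."""
import math

READ_UNIT_KB = 4    # one read unit: one strongly consistent read/s of up to 4 KB
WRITE_UNIT_KB = 1   # one write unit: one write/s of up to 1 KB


def read_capacity_units(item_kb: float, reads_per_sec: int, strongly_consistent: bool = True) -> int:
    """Hypothetical helper: read units needed for fixed-size items."""
    units_per_read = math.ceil(item_kb / READ_UNIT_KB)
    total = units_per_read * reads_per_sec
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_kb: float, writes_per_sec: int) -> int:
    """Hypothetical helper: write units needed for fixed-size items."""
    return math.ceil(item_kb / WRITE_UNIT_KB) * writes_per_sec


print(read_capacity_units(6, 100))                             # 200
print(read_capacity_units(6, 100, strongly_consistent=False))  # 100
print(write_capacity_units(1.5, 10))                           # 20
```

A 6 KB item rounds up to two 4 KB read units, so requesting strong consistency doubles the read capacity compared with eventually consistent reads in this example.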
Amazon ElastiCache
• In-memory cache in the cloud
• Improves latency and throughput for read-heavy workloads
• Supports open-source caching engines
– Memcached
– Redis
• Examples
– Caching of MySQL database query results
– Caching of complex query post-processing results
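Caching query results is usually done with the cache-aside pattern: look in the cache first, and on a miss run the query and store the result with a time-to-live. In this sketch a plain dictionary stands in for ElastiCache (a real deployment would use a Memcached or Redis client against the cluster endpoint), and the class, function and query names are hypothetical. An injectable clock keeps the expiry behaviour easy to demonstrate.

```python
"""Cache-aside for query results, with a dict standing in for ElastiCache (sketch)."""
import time
from typing import Any, Callable


class QueryCache:
    """Hypothetical in-process cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}   # key -> (expiry time, value)

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return a cached value, or call `loader` (the database query) on a miss."""
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader()
        self._entries[key] = (now + self.ttl, value)
        return value


# Hypothetical "database" that counts how often it is actually queried.
db_calls = 0


def run_query() -> list[tuple[int, str]]:
    global db_calls
    db_calls += 1
    return [(1, "alice"), (2, "bob")]


fake_now = [0.0]
cache = QueryCache(ttl_seconds=60, clock=lambda: fake_now[0])
cache.get_or_load("SELECT id, name FROM users", run_query)   # miss: queries the DB
cache.get_or_load("SELECT id, name FROM users", run_query)   # hit: served from cache
fake_now[0] = 61.0
cache.get_or_load("SELECT id, name FROM users", run_query)   # expired: queries again
print(db_calls)  # 2
```

The TTL is the design trade-off here: a longer TTL offloads more reads from the database but serves staler results.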
Amazon Redshift
• Fast and powerful, petabyte-scale data warehouse
– Fully managed
– Highly parallel
– Columnar data store
• Data warehouse-type queries
– Aggregations, historical analysis
– BI tool integration
• Grow with your data
– 160 GB to 1.6 PB
• SSD and SAS options
– SSD provides 10-15x perf at 5.5x the cost/TB/year
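The columnar layout is what makes aggregation queries cheap: a query that sums one column reads only that column's values instead of every full row. This toy sketch with plain Python lists illustrates the layout difference only; it is not how Redshift stores, compresses or distributes data, and the sample records are hypothetical.

```python
"""Row layout vs columnar layout: an aggregation touches only the columns it needs (sketch)."""

rows = [  # hypothetical sales records: (order_id, region, amount)
    (1, "us-east", 120.0),
    (2, "eu-west", 80.0),
    (3, "us-east", 45.5),
]

# Columnar layout: one contiguous list per column.
columns = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# Row store: every full record is visited to sum one field.
row_total = sum(r[2] for r in rows)

# Column store: only the "amount" column is read.
col_total = sum(columns["amount"])

# A grouped aggregation reads just the two columns it needs.
by_region: dict[str, float] = {}
for region, amount in zip(columns["region"], columns["amount"]):
    by_region[region] = by_region.get(region, 0.0) + amount

print(row_total == col_total, by_region)  # True {'us-east': 165.5, 'eu-west': 80.0}
```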
Using Multiple Storage Options Together
• EBS + S3: snapshots
• S3 + EC2 Instance Store: caching
• S3 + CloudFront: edge caching
• S3 + Glacier: data lifecycle archiving
• RDS + ElastiCache: cached queries
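For the S3 + Glacier pairing, archiving is driven by a lifecycle rule on the bucket. This minimal sketch only builds the rule as data and makes no AWS call. The dictionary mirrors the S3 lifecycle configuration fields (Rules, Filter, Status, Transitions, Expiration); the `archive_rule` helper, the `logs/` prefix and the 30/365-day thresholds are hypothetical, and the ordering check is a sanity check added by this sketch rather than a documented S3 rule.

```python
"""An S3 lifecycle rule that archives objects to Glacier, built as a Python dict (sketch).

With boto3 the result could be passed as
s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config).
"""


def archive_rule(rule_id: str, prefix: str, glacier_after_days: int, expire_after_days: int) -> dict:
    """Hypothetical helper: transition to Glacier, then expire."""
    if expire_after_days <= glacier_after_days:
        raise ValueError("expiration should come after the Glacier transition")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical policy: move logs to Glacier after 30 days, delete after one year.
config = {"Rules": [archive_rule("archive-logs", "logs/", 30, 365)]}
print(config["Rules"][0]["Transitions"])  # [{'Days': 30, 'StorageClass': 'GLACIER'}]
```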
It’s all about choice: performance-oriented or cost-oriented
Any Questions?