Amazon S3 – A Summary


A Summary of S3

Amazon S3 is the core object storage service on AWS, allowing you to store an unlimited amount of data with very high durability. It can be used to back up and archive data, host websites and mobile and cloud apps, support disaster recovery and store data for big data analytics.

Amazon S3 can be integrated with many other AWS cloud services, including AWS IAM, AWS KMS, Amazon EC2, EBS, EMR, DynamoDB, Redshift, SQS, Lambda and CloudFront.

Object storage differs from traditional block storage and file storage. Block storage manages data at a device level as addressable blocks, while file storage manages data at the operating system level as files and folders. Object storage manages data as objects that contain both data and metadata, manipulated by an API.

S3 buckets are containers for objects stored in Amazon S3. Bucket names must be globally unique: you cannot use a name that anyone else in the world already has. Each bucket is created in a specific region, and data does not leave that region unless the user explicitly copies it.

Objects are stored in buckets. An object can be up to 5 TB in size and can contain any kind of data. Objects contain both data and metadata and are identified by keys. Each Amazon S3 object can be addressed by a unique URL formed from the web services endpoint, the bucket name and the object key.
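
As a rough illustration of how the endpoint, bucket name and object key combine into a URL, here is a minimal Python sketch. The function name, the bucket name and the default endpoint are assumptions made for the example, and it uses the virtual-hosted style (bucket name in the host); a path-style URL would place the bucket after the endpoint instead.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, key: str, endpoint: str = "s3.amazonaws.com") -> str:
    """Build a virtual-hosted-style URL: https://<bucket>.<endpoint>/<key>."""
    # Percent-encode the key but keep "/" so prefix-style keys stay readable.
    return f"https://{bucket}.{endpoint}/{quote(key, safe='/')}"


# Hypothetical bucket and key, for illustration only.
print(s3_object_url("my-example-bucket", "photos/2016/cat picture.jpg"))
```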

Amazon S3 has a minimalistic API (create/delete a bucket, read/write/delete objects, list keys in a bucket) and uses a REST interface based on the standard HTTP verbs GET, PUT, POST and DELETE. You can also use SDK wrapper libraries, the AWS CLI, and the AWS Management Console to work with Amazon S3.
Amazon S3 is highly durable and highly available, designed for 99.999999999% (11 nines) durability of objects over a given year and 99.99% (four nines) availability.
Amazon S3 is eventually consistent, but offers read-after-write consistency for new object PUTs. Amazon S3 objects are private by default, accessible only to the owner. Objects can be marked publicly readable to make them accessible on the web. Controlled access may be provided to others using ACLs, AWS IAM policies and Amazon S3 bucket policies.

Static websites can be hosted in an Amazon S3 bucket. Prefixes and delimiters may be used in key names to organize and navigate data hierarchically, much like a traditional file system.
Amazon S3 offers several storage classes suited to different use cases: Standard is designed for general-purpose data needing high performance and low latency. Standard-IA is for less frequently accessed data. RRS offers lower redundancy at lower cost for easily reproduced data. Amazon Glacier offers low-cost durable storage for archives and long-term backups that are rarely accessed and can accept a three- to five-hour retrieval time. Object lifecycle management policies can be used to automatically move data between storage classes based on time.
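
The prefix-and-delimiter navigation mentioned above can be imitated locally. This is a small in-memory sketch, not a call to S3: the function name and the sample keys are made up for the example, and it only mimics the way a delimiter listing groups keys into "folders" (common prefixes).

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Return (objects, common_prefixes) the way a delimiter listing groups keys."""
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the first delimiter after the prefix acts like a folder.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical flat key names; S3 itself stores no real directories.
keys = ["index.html", "logs/2016/01.txt", "logs/2016/02.txt", "logs/readme.txt", "img/a.png"]
print(list_keys(keys))                  # top level of the bucket
print(list_keys(keys, prefix="logs/"))  # one level down, inside "logs/"
```

The keys stay a flat list throughout; the hierarchy exists only in how the listing is grouped.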

Amazon S3 data can be encrypted using server-side or client-side encryption, and encryption keys can be managed with AWS KMS. Versioning and MFA Delete can be used to protect against accidental deletion. Cross-region replication can be used to automatically copy new objects from a source bucket in one region to a target bucket in another region. Pre-signed URLs grant time-limited permission to download objects and can be used to protect media and other web content from unauthorized “web scraping.” Multipart upload can be used to upload large objects, and Range GETs can be used to download portions of an Amazon S3 object or Amazon Glacier archive.
Server access logs can be enabled on a bucket to track requestor, object, action, and response.
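
Multipart upload and Range GETs both come down to splitting an object into contiguous byte ranges. The sketch below only computes the values an HTTP Range header would carry; the function name and the sizes are assumptions for illustration, and no request is actually sent.

```python
def byte_ranges(total_size: int, chunk_size: int):
    """Yield HTTP Range header values that together cover total_size bytes."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # Range ends are inclusive
        yield f"bytes={start}-{end}"


# A 25-byte object fetched in 10-byte pieces (tiny numbers to keep it readable).
print(list(byte_ranges(total_size=25, chunk_size=10)))
```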

Amazon S3 event notifications can be used to send an Amazon SQS or Amazon SNS message or to trigger an AWS Lambda function when an object is created or deleted.
Amazon Glacier can be used as a standalone service or as a storage class in Amazon S3. Amazon Glacier stores data in archives, which are contained in vaults. You can have up to 1,000 vaults, and each vault can store an unlimited number of archives. Amazon Glacier vaults can be locked for compliance purposes.


Understanding CloudFront


What is a CDN

  • A Content Delivery Network (CDN) is a system of distributed servers (a network) that delivers webpages and other web content to a user based on the geographic location of the user, the origin of the webpage and a content delivery server.

CloudFront – Terminology

  • Edge Location – This is the location where content will be cached or saved. This is separate from an AWS Region/AZ.
  • Origin – This is the origin of all the files that the CDN will distribute. This can be an S3 Bucket, an EC2 Instance, an Elastic Load Balancer or Route 53.
  • Distribution – This is the name given to the CDN, which consists of a collection of Edge Locations.

What is CloudFront

  • Amazon CloudFront can be used to deliver your entire website, including dynamic, static, streaming and interactive content, using a global network of edge locations. Requests for your content are automatically routed to the Edge Location closest to the user for the best possible performance. The more users that access the content from one location, the quicker the content will load.
  • Amazon CloudFront is optimized to work with other AWS services, like Amazon S3, Amazon EC2, Elastic Load Balancing and Amazon Route 53. CloudFront also works seamlessly with any non-AWS origin server that stores the original, definitive versions of your files.


Lifecycle and Glacier


Study notes for AWS Glacier.

  • Lifecycle rules can be configured to archive or move files to Glacier after a certain period of time.
  • After a certain period of time, objects can be migrated to IA storage and then to Glacier before being expired.
  • Lifecycle management can be used in conjunction with versioning.
  • It can be applied to current versions of objects as well as previous versions.
  • The following actions can be configured:
    • Transition to the Standard-IA storage class (objects must be at least 128 KB and 30 days past the creation date).
    • Archive to the Glacier storage class (30 days after IA, if relevant).
  • Glacier is not available in Singapore or South America.
  • You can permanently delete objects using lifecycle management policies.
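
The Standard to IA to Glacier to expiry sequence described above can be written down as a lifecycle rule. The sketch below only builds the rule as a Python dictionary shaped like the lifecycle configuration S3 accepts, as far as I understand that format; the function name, the "logs/" prefix and the 365-day expiry are assumptions for illustration, and nothing is sent to AWS.

```python
import json


def lifecycle_rule(prefix: str, ia_after_days: int = 30,
                   glacier_after_days: int = 60, expire_after_days: int = 365) -> dict:
    """Build one lifecycle rule: Standard -> Standard-IA -> Glacier -> expire."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("transitions must happen in order: IA, then Glacier, then expiry")
    return {
        "ID": f"archive-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            # 30 days after creation, then 30 days after the move to IA.
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},  # permanent deletion
    }


config = {"Rules": [lifecycle_rule("logs/")]}
print(json.dumps(config, indent=2))
```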


S3 – Cross Region Replication


Notes for S3 – Amazon Solutions Architect Associate (AWS)

  • In order for cross region replication to work, versioning needs to be enabled on both the source and destination buckets.
  • You can replicate the entire bucket or specific sub-folders.
  • On the bucket you are replicating to, you can change the storage class.
  • Once replication has been enabled, only new objects and objects that are subsequently updated or changed will be replicated.
  • The best way to copy the contents of an existing bucket to a new one would be to use the command line tool.
  • The source and destination buckets must be in different regions; you cannot replicate to a bucket that's in the same region.
  • You cannot replicate to multiple buckets or daisy chain them at this time.
  • Delete markers are replicated.
  • Deleting individual versions or delete markers will NOT be replicated.
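
The preconditions in these notes can be turned into a small pre-flight check. This is a local sketch with made-up names (Bucket, replication_problems) rather than any AWS API, and the destination versioning check reflects my understanding of the replication requirements.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    name: str
    region: str
    versioning_enabled: bool


def replication_problems(source: Bucket, destination: Bucket) -> list:
    """Return the reasons cross region replication could not be enabled."""
    problems = []
    if not source.versioning_enabled:
        problems.append("versioning is not enabled on the source bucket")
    if not destination.versioning_enabled:
        problems.append("versioning is not enabled on the destination bucket")
    if source.region == destination.region:
        problems.append("source and destination are in the same region")
    return problems


# Hypothetical buckets for illustration.
src = Bucket("notes-eu", "eu-west-1", versioning_enabled=True)
dst = Bucket("notes-us", "us-east-1", versioning_enabled=False)
print(replication_problems(src, dst))
```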


S3 101 – Continued


Exam Tips for Creating an S3 Bucket

  • Buckets are a universal namespace: you cannot have the same name as someone else using AWS, because each bucket is assigned a unique DNS name.
  • A successful upload of an object to S3 returns an HTTP 200 code, which you will see when uploading with command line utilities.
  • Encryption
    • Client Side Encryption.
    • Server Side Encryption.
      • Amazon S3 Managed Keys (SSE-S3)
      • KMS (SSE-KMS)
      • Customer Provided Keys (SSE-C)
  • Control access to buckets using either a bucket ACL or bucket policies.
  • By default, buckets and all objects stored within them are private.
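
Because everything is private by default, public access has to be granted explicitly. The sketch below builds a bucket policy document that allows anyone to read objects; the function name and bucket name are assumptions for illustration, and the policy is only printed, not applied to any bucket.

```python
import json


def public_read_policy(bucket: str) -> str:
    """Return a bucket policy (as JSON) that lets anyone GET objects in the bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",                          # anyone, signed in or not
            "Action": "s3:GetObject",                  # read objects only
            "Resource": f"arn:aws:s3:::{bucket}/*",    # every key in this bucket
        }],
    }
    return json.dumps(policy, indent=2)


# Hypothetical bucket name.
print(public_read_policy("my-example-bucket"))
```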


S3 – Versioning Exam Tips

  • Stores all versions of an object (file). This includes all writes, even if you delete the object.
  • Great backup tool.
  • Once enabled, versioning cannot be disabled, only suspended.
  • Integrates with lifecycle rules.
  • Versioning's MFA Delete capability, which uses Multi-Factor Authentication, can be used to provide an additional layer of security.
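
To make the "stores all versions, even if you delete the object" point concrete, here is a toy in-memory model of a versioned bucket. It is not the S3 API: the class, its methods and the "v1", "v2" style version IDs are invented for the example, and a delete is modelled as a marker placed on top of the existing versions.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the real S3 API)."""

    _DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}             # key -> list of (version_id, value), oldest first
        self._ids = itertools.count(1)

    def put(self, key, value):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, value))
        return version_id

    def delete(self, key):
        # A plain delete erases nothing; it stacks a delete marker on top.
        return self.put(key, self._DELETE_MARKER)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is not None:
            matches = [v for vid, v in versions if vid == version_id]
        else:
            matches = [versions[-1][1]] if versions else []
        if not matches or matches[0] is self._DELETE_MARKER:
            raise KeyError(key)
        return matches[0]


bucket = VersionedBucket()
first = bucket.put("report.txt", "draft")
bucket.put("report.txt", "final")
bucket.delete("report.txt")
print(bucket.get("report.txt", version_id=first))  # the older version is still retrievable
```

After the delete, a plain `bucket.get("report.txt")` raises `KeyError`, while any specific version can still be fetched by its ID.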


S3 101


Revision Notes for Amazon Solutions Architect Associate.

What is S3?

  • S3 is a safe place to store your files.
  • It is object based storage, meaning you can store files such as images, videos and documents.
  • It is not a place to install an operating system; for that you will need block based storage.
  • S3 is designed to withstand failure. Your data is spread across multiple devices and multiple facilities.
  • Your files can be anywhere from 0 bytes in size all the way up to 5 TB, and storage is virtually unlimited.
  • Files are stored in buckets, which are essentially folders.
  • Names of buckets MUST be unique; you cannot share the same name as another user.
  • Your bucket will be assigned a DNS name when you create it. It will always begin with https://s3- followed by the region you created your bucket in, and then your bucket name.
  • When you upload a file successfully into S3 you will receive an HTTP 200 code.

Data Consistency Model for S3

  • Read after write consistency for PUTs of new objects. – Meaning when you put a new object in S3 you get immediate consistency: you will be able to read it straight away.
  • Eventual consistency for overwrite PUTs and DELETEs (can take a while to propagate). – Meaning changes to or deletions of existing files will take a while to show up, as S3 needs to update all the disks / locations your files have been written to.

S3 is a simple key-value store

  • S3 is object based. Objects consist of the following:
    • Key (This is simply the name of the object)
    • Value (This is the data and is made up of a sequence of bytes)
    • Version ID (Important for versioning)
    • Metadata (Data about the data you are storing)
    • Subresources
      • Access control lists
      • Torrent
The Basics

  • S3 is built for 99.99% availability.
  • Amazon guarantees 99.9% uptime.
  • S3 is designed for 99.999999999% (11 nines) durability of your data, so losing a file is extremely unlikely.
  • Tiered storage options are available.
  • Lifecycle management gives you the option to move or archive files to different storage tiers after a certain period of time.
  • Versioning: you can have 1 file with several different versions.
  • Encryption.
  • Secure your data with policies and Access Control Lists.
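
The percentages above are easier to appreciate as numbers. This is a back-of-the-envelope sketch, assuming a 365-day year and treating the durability figure as an independent per-object annual probability; the function names are made up for the example.

```python
def expected_annual_losses(object_count: int, durability: float = 0.99999999999) -> float:
    """Average number of objects lost per year at the given annual durability."""
    return object_count * (1.0 - durability)


def max_downtime_minutes_per_year(availability: float) -> float:
    """Minutes per 365-day year a service may be unavailable at this availability."""
    return (1.0 - availability) * 365 * 24 * 60


# 10 million objects at 11 nines: about 0.0001 objects lost per year on average,
# which is one object every 10,000 years or so.
print(expected_annual_losses(10_000_000))
# 99.99% availability allows roughly 52.6 minutes of downtime a year;
# the 99.9% guarantee allows roughly 525.6 minutes (under 9 hours).
print(max_downtime_minutes_per_year(0.9999))
print(max_downtime_minutes_per_year(0.999))
```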

Storage Classes / Tiers

  • S3 – 99.99% availability, 99.999999999% durability, stored redundantly across multiple devices and locations. It is designed to sustain the loss of 2 facilities concurrently.
  • S3 – IA (Infrequently Accessed) – For data that is accessed less frequently but requires rapid access when needed. The storage fee is lower than S3, but you are charged a retrieval fee.
  • Reduced Redundancy Storage (RRS) – Designed to provide 99.99% durability and 99.99% availability of objects over a given year. Best used for data you can generate again.
  • Glacier – Very cheap, but used for archival only. It takes 3 to 5 hours to restore from Glacier.

What is Glacier?

Glacier is an extremely low-cost storage service for data archival. Amazon Glacier stores data for as little as $0.01 per gigabyte per month, and is optimized for data that is infrequently used and for which a retrieval time of 3 to 5 hours is acceptable.

S3 Vs Glacier

S3 – Charges – What are you charged for?

  • Storage
  • The number of requests
  • Storage Management Pricing
  • Data transfer pricing: uploading data is free, but transferring data to a different region is chargeable

What is S3 Transfer Acceleration?

Amazon S3 Transfer Acceleration enables fast, easy and secure transfers of files over long distances between your end users and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, it is routed to Amazon S3 over an optimized network path.


IAM – Revision Notes


Some notes I have taken down for the AWS Solutions Architect Associate Exam.

  • IAM is global; it is not tied down to a region.
  • Amazon's version of Active Directory.
  • You can define User, Role or Group policies.
  • Policies can be added to Groups, Users and Roles.
  • Can be integrated with Microsoft Active Directory.
  • Allows you to manage access to Compute, Storage, DB and Application Services.
  • Can be defined to only allow users access to what they require.
  • Multi-Factor Authentication can be configured with AWS.
  • New users have no access when first created. Users will need to be added to a group, given a role or have permissions set once the account is created.
  • You can create and customize your own password rotation policies, setting passwords to expire every 90 days and also setting the required complexity of the passwords.
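
The "no access until granted" behaviour can be sketched as a tiny policy evaluator. This is a deliberately simplified model of how I understand IAM decisions to work (default deny, an explicit Allow grants access, an explicit Deny always wins); the function name, the sample statements and the use of shell-style wildcards via fnmatch are assumptions for illustration, not the real IAM engine.

```python
from fnmatch import fnmatchcase


def is_allowed(statements, action: str, resource: str) -> bool:
    """Simplified evaluation: default deny, and an explicit Deny beats any Allow."""
    allowed = False
    for stmt in statements:
        if fnmatchcase(action, stmt["Action"]) and fnmatchcase(resource, stmt["Resource"]):
            if stmt["Effect"] == "Deny":
                return False
            allowed = True
    return allowed


new_user = []  # a freshly created user has no policies attached
read_only = [  # hypothetical least-privilege policy for one bucket
    {"Effect": "Allow", "Action": "s3:Get*", "Resource": "arn:aws:s3:::my-example-bucket/*"},
    {"Effect": "Deny", "Action": "s3:*", "Resource": "arn:aws:s3:::my-example-bucket/secret/*"},
]

obj = "arn:aws:s3:::my-example-bucket/notes.txt"
print(is_allowed(new_user, "s3:GetObject", obj))   # nothing granted yet
print(is_allowed(read_only, "s3:GetObject", obj))  # covered by the Allow
print(is_allowed(read_only, "s3:PutObject", obj))  # no matching Allow
print(is_allowed(read_only, "s3:GetObject", "arn:aws:s3:::my-example-bucket/secret/key.pem"))  # Deny wins
```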

AWS CloudTrail can be configured alongside IAM to log information about who made requests to access specific resources in your AWS account.

Free Service
Unlike some of the other Services provided by AWS, IAM is free to use and can be integrated with other services.

IAM can be accessed through different methods such as the Web Console, Command Line tools, the AWS SDKs and the IAM HTTPS API.


AWS Services Overview


List of AWS Services

A quick overview of some of the AWS services and what they do. I found this very useful when studying for the AWS certifications.

I will be covering a lot more on AWS as I delve deeper into the course and start looking at other courses as well.

Name of Service | Easy naming convention | Use this to?
EC2 | Amazon Virtual Machines | Hosting your virtual machines
IAM | Users, Keys and Certs manager | Similar to Active Directory, it's used to manage users, policies and keys
S3 | Amazon Unlimited FTP Server | Hosts files such as images, assets for websites and backups, and can share these files between services
VPC | A Virtual Private Network | Create and configure your own network
Lambda | AWS App Scripts | Run snippets of code such as JS, Java or Python
API Gateway | API Proxy | Allows you to monitor and block specific types of traffic that is targeting your apps
RDS | Amazon SQL | This will be where you host your databases. MySQL, PostgreSQL and Oracle are supported
Route53 | Amazon DNS / Domains | Buy new domain names and configure the DNS records for them
SES | Amazon Transactional Email | Allows you to send one-off emails such as password resets and notifications
CloudFront | Amazon CDN | Allows a website to load faster by distributing content to a server local to the user
CloudSearch | Amazon Full Text Search | Pulls in data from other services and allows you to search for something across every instance
DynamoDB | Amazon NoSQL | Automatically scalable database
ElastiCache | Amazon Memcached | Caches specific data so that it can be called for more quickly
Elastic Transcoder | Amazon Cut Pro | Allows you to transcode audio and video
SQS | Amazon Queue | Store data that can be processed in the future. Allows data to be called upon more quickly
WAF | AWS Firewall | Blocks bad requests to CloudFront protected sites
Cognito | Amazon OAuth as a service | Allows users to log in with social media accounts, e.g. Google, Facebook
Device Farm | AWS Android | Allows you to test your apps against various smartphone operating systems
Mobile Analytics | Mobile Analytics.. One that makes sense! | Track what users are doing within your app
SNS | Amazon Messenger | Send mobile notifications, emails and/or SMS messages
CodeCommit | Amazon GitHub | Version control for your code
CodeDeploy | | Send code from CodeCommit to your EC2 instances
CodePipeline | Amazon Continuous Integration | Run automated tests with your code and make amendments if it passes your tests
EC2 Container Service | Amazon Docker as a Service | Put a Docker file into an EC2 instance so you can run a website
Elastic BeanStalk | Amazon Platform as a Service | Move your app from 3rd party services to Amazon when it gets too expensive
AppStream | Amazon Citrix | Puts a copy of a Windows application on a Windows machine and allows people to connect to that machine
Direct Connect | Amazon Direct Connect | Pay an ISP and AWS to get directly connected to the cloud.. Faster solution
Directory Service | Amazon Directory Service | Link applications to Microsoft Active Directory
WorkDocs | Amazon File Share | Share Word documents with colleagues
WorkMail | Amazon Email | Email solution for your business
WorkSpaces | Amazon Remote Computer | Provides a Windows box that you can remotely connect to
Service Catalog | Amazon Setup Already | Give users access to apps that you have already configured
Storage Gateway | Cloud Drive | For external storage without paying for excess hardware
Data Pipeline | Amazon ETL | Manage data from other locations in AWS. Can be scheduled and you can set up alerts for when it fails
Elastic Map Reduce | Amazon Hadooper | Analyse massive files of raw data that are being kept in S3
Glacier | Really slow S3 | Used for archiving old data - to be used for long term archiving. You will not be able to get data back quickly unless you pay a fee
Kinesis | Amazon High Throughput | Ingests a lot of data very quickly so it can be referred back to later
RedShift | Amazon Data Warehouse | Store analytic data, process it and dump it back out
Machine Learning | SkyNet.. | Predict future behavior from existing data for problems like fraud detection or market research
SWF | Amazon EC2 Queue | A web service that keeps the current state of your workflow
Snowball | AWS NAS Drive | An Amazon hard drive that you would connect to your network and transfer data between your network and AWS
Cloud Formation | Amazon Services Setup | Configure multiple AWS services at once
CloudTrail | Amazon Logging | Log who is doing what in your AWS environment
CloudWatch | Amazon Status Pager | Get alerts about AWS services failing or disconnecting
Config | AWS Configuration Management | Track AWS changes
OpsWorks | Amazon Chef | Handle running your application with things like auto-scaling
Trusted Advisor | Amazon Pennypincher | Find out where all your money is going in AWS
Inspector | AWS Auditing | Scans your AWS setup to see if it is secure