S3 vs. EBS: Choosing the Right AWS Storage for Your Workload

When building on AWS, one of the most common architectural decisions is choosing between S3 and EBS — both store data, but they are fundamentally different tools designed for entirely different jobs. Using the wrong one is like bringing a filing cabinet to a construction site: technically it holds things, but it's the wrong tool for the context.

TL;DR — Quick Comparison

| Dimension | Amazon S3 | Amazon EBS |
| --- | --- | --- |
| Storage Type | Object storage | Block storage |
| Access Model | HTTP/S API (REST) | Mounted as a local disk (OS-level) |
| Attachment | Not attached to any instance; accessed over the internet or a VPC endpoint | Attached to a single EC2 instance (within the same AZ) |
| Latency | Milliseconds (network I/O) | Sub-millisecond (local block I/O) |
| Scalability | Virtually unlimited, auto-scales | Fixed provisioned size (can be resized manually) |
| Durability | 99.999999999% (11 nines), replicated across ≥3 AZs | Replicated within a single AZ (99.8–99.9% for most types; 99.999% for io2) |
| Use Case | Static assets, backups, data lakes, ML datasets | OS boot volumes, databases, transactional workloads |
| Pricing Model | Pay per GB stored + requests | Pay per GB provisioned + provisioned IOPS/throughput (io1/io2, and gp3 above baseline) |
| Persistence | Independent of any compute resource | Independent of instance stop/start; root volumes are deleted on termination unless DeleteOnTermination is disabled |

The Core Architectural Difference

The distinction is not just about features — it's about the storage abstraction layer each service operates at.

  • EBS (Block Storage): Presents raw storage blocks to the OS, just like a physical hard drive. The OS formats it with a filesystem (ext4, xfs, NTFS) and reads/writes at the block level. Applications see it as /dev/xvdf or a drive letter — they have no idea it's network-attached.
  • S3 (Object Storage): Stores data as discrete objects (file + metadata + unique key) in a flat namespace (buckets). There is no filesystem. You interact via HTTP PUT/GET/DELETE API calls. You cannot "open" a file and append a single byte — you must re-upload the entire object.
Analogy: Think of EBS as a private office desk drawer — only you (your EC2 instance) can open it, you can edit documents in place, and it's fast. S3 is the company's shared document archive room — anyone with a key card (IAM permissions) can retrieve a document, but you must check out the whole file, edit it, and re-file it. The archive room is massive and never runs out of space; your desk drawer is fixed-size but instant to access.
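The "edit in place" vs. "re-upload the whole object" distinction is easiest to see in code. A quick Python sketch (no AWS involved; a plain file plays the block-storage role and a dict stands in for a bucket, purely for illustration):

```python
# Illustrative sketch (no real AWS calls): contrast in-place block edits
# with whole-object replacement in a flat key namespace.
import os
import tempfile

# Block storage (EBS-like): the filesystem lets you seek and patch bytes in place.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"hello world")
with open(path, "r+b") as f:   # open for update, no truncation
    f.seek(6)
    f.write(b"earth")          # overwrite 5 bytes at offset 6
with open(path, "rb") as f:
    print(f.read())            # b'hello earth'
os.remove(path)

# Object storage (S3-like): objects are immutable. "Editing" means GET the
# whole object, modify it locally, and PUT the whole thing back under its key.
bucket = {}                                     # flat key -> bytes namespace
bucket["reports/2024/q1.txt"] = b"hello world"  # PUT
body = bucket["reports/2024/q1.txt"]            # GET (entire object)
bucket["reports/2024/q1.txt"] = body.replace(b"world", b"earth")  # re-PUT
print(bucket["reports/2024/q1.txt"])            # b'hello earth'
```

Note that the "path" in the object key (`reports/2024/`) is just a naming convention; there are no real directories in the flat namespace.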

Architecture: How Each Service Connects to EC2

```mermaid
graph LR
  subgraph AZ1["Availability Zone A"]
    EC2["EC2 Instance"]
    EBS_ROOT["EBS Root Volume /dev/xvda (gp3)"]
    EBS_DATA["EBS Data Volume /dev/xvdf (io2)"]
    EC2 -- "Block I/O (sub-ms latency)" --> EBS_ROOT
    EC2 -- "Block I/O (sub-ms latency)" --> EBS_DATA
  end
  subgraph REGION["AWS Region (us-east-1)"]
    S3["Amazon S3 Bucket (Multi-AZ replicated)"]
    VPCEndpoint["VPC Gateway Endpoint"]
  end
  EC2 -- "HTTPS via VPC Endpoint" --> VPCEndpoint
  VPCEndpoint --> S3
  Lambda["AWS Lambda"] -- "HTTPS API" --> S3
  OnPrem["On-Premises"] -- "HTTPS API" --> S3
```
  1. EC2 Instance has an EBS root volume (/dev/xvda) attached directly within the same Availability Zone — this is the boot disk.
  2. A second EBS data volume can be attached for databases or high-IOPS workloads, also within the same AZ.
  3. The same EC2 instance accesses S3 over the network (ideally via a VPC Gateway Endpoint to avoid public internet traversal and reduce cost).
  4. S3 is a regional service — objects are replicated across multiple AZs automatically, making it inherently more durable than a single EBS volume.
  5. Other services (Lambda, another EC2 in a different AZ, on-premises systems) can all access the same S3 bucket simultaneously — EBS cannot be shared this way (EBS Multi-Attach is limited to io1/io2 volumes and specific use cases).

Can You Use S3 as the Main Disk for EC2?

No — not directly, and not as a replacement for EBS. Here is the precise technical reason:

  • EC2 requires a block device to boot from. The OS kernel needs to read raw blocks to load the bootloader, kernel, and root filesystem. S3 does not expose a block device interface.
  • AWS does not provide a native mechanism to mount S3 as a POSIX-compliant block device for use as a boot volume.
  • Tools exist that present a bucket as a filesystem: the open-source s3fs-fuse, and AWS's own Mountpoint for Amazon S3. Neither is a substitute for EBS. They implement only a subset of POSIX semantics (Mountpoint, for example, is optimized for reading large objects and sequential writes of new ones), cannot serve as a boot volume, and s3fs-fuse in particular has significant performance and consistency caveats that make it unsuitable for production block-storage workloads.

The correct pattern is: EBS for the OS and runtime disk; S3 for application data that needs to be shared, archived, or accessed at scale.

Deep Dive: When to Use Each

Use EBS When:

  • Running a relational database (PostgreSQL, MySQL) directly on EC2 — requires low-latency, consistent block I/O.
  • Your application needs a traditional filesystem with random read/write access (e.g., log files being actively written).
  • You need sub-millisecond latency — EBS io2 Block Express volumes offer the lowest latency for I/O-intensive workloads.
  • Running boot volumes — always EBS (or instance store, which is ephemeral).

Use S3 When:

  • Storing static website assets, images, videos, or documents.
  • Building a data lake — S3 integrates natively with Athena, Glue, EMR, and Redshift Spectrum.
  • Storing application backups or EBS snapshots (EBS snapshots are stored in S3 internally, though not directly accessible as objects).
  • Distributing content globally via CloudFront with S3 as the origin.
  • Storing ML training datasets consumed by SageMaker.
  • Any workload requiring multi-region or cross-account access to the same data.

EBS Volume Types — Choosing the Right Tier

| Volume Type | Best For | Max Performance |
| --- | --- | --- |
| gp3 (General Purpose SSD) | Most workloads, boot volumes, dev/test | 16,000 IOPS |
| io2 Block Express | Mission-critical databases (Oracle, SAP HANA) | 256,000 IOPS |
| st1 (Throughput Optimized HDD) | Big data, log processing (sequential reads) | 500 MB/s throughput |
| sc1 (Cold HDD) | Infrequently accessed data, lowest-cost block storage | 250 MB/s throughput |

Note: Exact IOPS limits and pricing vary. Always verify current specifications in the official AWS EBS documentation.
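Notice that the SSD tiers are rated in IOPS while the HDD tiers are rated in throughput. The two are linked by I/O size: effective throughput ≈ IOPS × I/O size. A back-of-envelope sketch, using gp3's documented baseline of 3,000 IOPS and 125 MB/s (verify current numbers against AWS docs):

```python
# Back-of-envelope: a block volume is bounded by both an IOPS ceiling and a
# throughput ceiling; which one binds depends on the I/O size.
def effective_throughput_mbps(iops_limit, throughput_limit_mbps, io_size_kib):
    """Achievable MB/s given both limits and a fixed I/O size."""
    iops_bound = iops_limit * io_size_kib / 1024  # MB/s if IOPS is the cap
    return min(iops_bound, throughput_limit_mbps)

# gp3 baseline: 3,000 IOPS, 125 MB/s
print(effective_throughput_mbps(3000, 125, 16))    # small 16 KiB I/O: 46.875 (IOPS-bound)
print(effective_throughput_mbps(3000, 125, 1024))  # large 1 MiB I/O: 125 (throughput-bound)
```

This is why databases (many small random I/Os) care about IOPS, while log processing (large sequential reads) is happy on the cheaper throughput-oriented st1.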

S3 Storage Classes — Optimizing Cost

```mermaid
graph LR
  S3["S3 Object Upload"] --> Q{"Access Pattern?"}
  Q -- "Frequent" --> STD["S3 Standard"]
  Q -- "Unknown" --> IT["S3 Intelligent-Tiering"]
  Q -- "Infrequent" --> IA["S3 Standard-IA"]
  Q -- "Rare / Archive" --> GL{"Retrieval Speed?"}
  GL -- "Milliseconds" --> GI["Glacier Instant Retrieval"]
  GL -- "Minutes" --> GF["Glacier Flexible Retrieval"]
  GL -- "Hours" --> GDA["Glacier Deep Archive"]
```
  1. S3 Standard — Default. Frequently accessed data. Highest storage cost, lowest retrieval cost.
  2. S3 Intelligent-Tiering — AWS automatically moves objects between access tiers based on usage patterns. Ideal when access patterns are unpredictable.
  3. S3 Standard-IA / One Zone-IA — Infrequent access. Lower storage cost but retrieval fee applies.
  4. S3 Glacier Instant / Flexible / Deep Archive — Long-term archival. Retrieval times range from milliseconds to hours. Lowest storage cost.
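The decision above can be condensed into a small function. This is only a sketch of the selection logic; the string constants mirror the `StorageClass` values the S3 API uses, and in practice lifecycle rules or Intelligent-Tiering usually automate this choice:

```python
# Sketch: map an access pattern (and, for archives, a required retrieval
# speed) to an S3 storage class name.
def pick_storage_class(access, retrieval=None):
    if access == "frequent":
        return "STANDARD"
    if access == "unknown":
        return "INTELLIGENT_TIERING"
    if access == "infrequent":
        return "STANDARD_IA"
    if access == "archive":
        return {"milliseconds": "GLACIER_IR",   # Glacier Instant Retrieval
                "minutes": "GLACIER",           # Glacier Flexible Retrieval
                "hours": "DEEP_ARCHIVE"}[retrieval]
    raise ValueError(f"unknown access pattern: {access}")

print(pick_storage_class("unknown"))           # INTELLIGENT_TIERING
print(pick_storage_class("archive", "hours"))  # DEEP_ARCHIVE
```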

Practical Implementation: Attaching EBS and Accessing S3 from EC2

Attach and Mount an EBS Volume (CLI)

```bash
# Step 1: Attach the EBS volume to your EC2 instance
aws ec2 attach-volume \
  --volume-id vol-0abcd1234efgh5678 \
  --instance-id i-0abcd1234efgh5678 \
  --device /dev/xvdf \
  --region us-east-1

# Step 2: SSH into the instance, then format the volume (first time only).
# On Nitro instances the volume appears as an NVMe device such as
# /dev/nvme1n1 regardless of the requested device name; check with lsblk.
# Confirm the volume is empty first: "sudo file -s /dev/xvdf" reports
# "data" when no filesystem is present.
sudo mkfs -t xfs /dev/xvdf

# Step 3: Create a mount point and mount the volume
sudo mkdir /data
sudo mount /dev/xvdf /data

# Step 4: Make the mount persistent across reboots
# Get the UUID of the device
sudo blkid /dev/xvdf

# Add to /etc/fstab (replace UUID with the actual value from blkid output);
# "nofail" lets the instance boot even if the volume is detached
echo "UUID=your-uuid-here  /data  xfs  defaults,nofail  0  2" | sudo tee -a /etc/fstab
```

Access S3 from EC2 Using IAM Role (Best Practice)

A least-privilege IAM policy to attach to the EC2 instance role. Note that `s3:ListBucket` applies to the bucket ARN itself, while the object-level actions apply to `bucket/*`; that is why both `Resource` entries are required:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3BucketAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-application-bucket",
        "arn:aws:s3:::my-application-bucket/*"
      ]
    }
  ]
}
```

From the EC2 instance (with the role attached), the AWS CLI picks up temporary credentials automatically from the instance metadata service, so no access keys are needed:

```bash
# Upload a file to S3
aws s3 cp /data/report.csv s3://my-application-bucket/reports/report.csv

# Download a file from S3
aws s3 cp s3://my-application-bucket/reports/report.csv /data/report.csv

# Sync a local directory to S3
aws s3 sync /data/exports/ s3://my-application-bucket/exports/
```

Create a VPC Gateway Endpoint for S3 (Avoid Public Internet)

```bash
# Create a VPC Gateway Endpoint for S3
# This routes S3 traffic through AWS's private network — no NAT Gateway needed
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abcd1234efgh5678 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0abcd1234efgh5678 \
  --region us-east-1
```

Decision Flowchart: S3 or EBS?

```mermaid
graph TD
  START(["New Storage Requirement"]) --> Q1{"Does EC2 need to boot from it?"}
  Q1 -- "Yes" --> EBS1["Use EBS (Root Volume, gp3)"]
  Q1 -- "No" --> Q2{"Does the app need random read/write at block level?"}
  Q2 -- "Yes" --> Q3{"Latency critical? Database workload?"}
  Q3 -- "Yes" --> EBS2["Use EBS io2 (High IOPS)"]
  Q3 -- "No" --> EBS3["Use EBS gp3 (General Purpose)"]
  Q2 -- "No" --> Q4{"Need to share data across services or regions?"}
  Q4 -- "Yes" --> S3A["Use Amazon S3"]
  Q4 -- "No" --> Q5{"Large scale, unstructured data or backups?"}
  Q5 -- "Yes" --> S3B["Use Amazon S3 (+ appropriate storage class)"]
  Q5 -- "No" --> EBS4["Use EBS gp3 (Default choice)"]
```
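The same decision tree, condensed into a function. This is a sketch of the flowchart's logic only; real decisions also weigh cost, durability requirements, and access patterns:

```python
# Sketch of the S3-or-EBS decision tree as straight-line checks,
# in the same order as the flowchart.
def choose_storage(boot=False, block_io=False, latency_critical=False,
                   shared=False, large_unstructured=False):
    if boot:
        return "EBS gp3 (root volume)"
    if block_io:
        return "EBS io2" if latency_critical else "EBS gp3"
    if shared or large_unstructured:
        return "Amazon S3"
    return "EBS gp3"  # safe default for anything disk-like

print(choose_storage(block_io=True, latency_critical=True))  # EBS io2
print(choose_storage(shared=True))                           # Amazon S3
```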

Glossary

| Term | Definition |
| --- | --- |
| Block Storage | Storage that exposes raw data blocks to the OS, which formats them with a filesystem. Enables random read/write at the block level. |
| Object Storage | Storage that manages data as discrete objects (data + metadata + key). Accessed via API; no filesystem abstraction. |
| IOPS | Input/Output Operations Per Second — the key performance metric for block storage, critical for database workloads. |
| VPC Gateway Endpoint | A horizontally scaled, redundant VPC component that enables private connectivity from your VPC to S3 or DynamoDB without traversing the public internet. |
| EBS Multi-Attach | A feature allowing a single io1 or io2 EBS volume to be attached to multiple EC2 instances within the same AZ simultaneously. Requires application-level coordination for write consistency. |

Wrap-Up & Next Steps

The rule of thumb is simple: EBS is your disk, S3 is your warehouse. Use EBS for anything that needs to behave like a local hard drive — OS, databases, active application files. Use S3 for anything that needs to be stored at scale, shared across services, or accessed via API. Never try to force S3 into the role of a block device for production workloads.

For further reading, consult the official AWS documentation for Amazon S3 and Amazon EBS.
