S3 vs. EBS: Choosing the Right AWS Storage for Your Workload

When building on AWS, one of the most common architectural decisions is choosing between S3 and EBS — both store data, but they are fundamentally different tools designed for entirely different jobs. Using the wrong one is like bringing a filing cabinet to a construction site: technically it holds things, but it's the wrong tool for the context.

TL;DR — Quick Comparison

| Dimension | Amazon S3 | Amazon EBS |
| --- | --- | --- |
| Storage Type | Object storage | Block storage |
| Access Model | HTTP/S API (REST) | Mounted as a local disk (OS-level) |
| Attachment | Not attached to any instance; accessed over the internet or a VPC endpoint | Attached to a single EC2 instance (within the same AZ) |
| Latency | Milliseconds (network I/O) | Sub-millisecond (local block I/O) |
| Scalability | Virtually unlimited, auto-scales | Fixed provisioned size (can be resized manually) |
| Durability | 99.999999999% (11 nines), replicated across ≥3 AZs | Replicated within a single AZ (99.8–99.9% for most types; 99.999% for io2) |
| Use Case | Static assets, backups, data lakes, ML datasets | OS boot volumes, databases, transactional workloads |
| Pricing Model | Pay per GB stored + requests | Pay per GB provisioned + provisioned IOPS/throughput (io1/io2, and gp3 above baseline) |
| Persistence | Independent of any compute resource | Independent of instance stop/start; root volumes are deleted on termination unless DeleteOnTermination is disabled |

The Core Architectural Difference

The distinction is not just about features — it's about the storage abstraction layer each service operates at.

  • EBS (Block Storage): Presents raw storage blocks to the OS, just like a physical hard drive. The OS formats it with a filesystem (ext4, xfs, NTFS) and reads/writes at the block level. Applications see it as /dev/xvdf or a drive letter — they have no idea it's network-attached.
  • S3 (Object Storage): Stores data as discrete objects (file + metadata + unique key) in a flat namespace (buckets). There is no filesystem. You interact via HTTP PUT/GET/DELETE API calls. You cannot "open" a file and append a single byte — you must re-upload the entire object.
Analogy: Think of EBS as a private office desk drawer — only you (your EC2 instance) can open it, you can edit documents in place, and it's fast. S3 is the company's shared document archive room — anyone with a key card (IAM permissions) can retrieve a document, but you must check out the whole file, edit it, and re-file it. The archive room is massive and never runs out of space; your desk drawer is fixed-size but instant to access.
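The "edit in place" vs. "re-upload the whole object" distinction is easiest to see in code. A quick Python sketch (no AWS involved; a plain file plays the block-storage role and a dict stands in for a bucket, purely for illustration):

```python
# Illustrative sketch (no real AWS calls): contrast in-place block edits
# with whole-object replacement in a flat key namespace.
import os
import tempfile

# Block storage (EBS-like): the filesystem lets you seek and patch bytes in place.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"hello world")
with open(path, "r+b") as f:   # open for update, no truncation
    f.seek(6)
    f.write(b"earth")          # overwrite 5 bytes at offset 6
with open(path, "rb") as f:
    print(f.read())            # b'hello earth'
os.remove(path)

# Object storage (S3-like): objects are immutable. "Editing" means GET the
# whole object, modify it locally, and PUT the whole thing back under its key.
bucket = {}                                     # flat key -> bytes namespace
bucket["reports/2024/q1.txt"] = b"hello world"  # PUT
body = bucket["reports/2024/q1.txt"]            # GET (entire object)
bucket["reports/2024/q1.txt"] = body.replace(b"world", b"earth")  # re-PUT
print(bucket["reports/2024/q1.txt"])            # b'hello earth'
```

Note that the "path" in the object key (`reports/2024/`) is just a naming convention; there are no real directories in the flat namespace.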

Architecture: How Each Service Connects to EC2

```mermaid
graph LR
  subgraph AZ1["Availability Zone A"]
    EC2["EC2 Instance"]
    EBS_ROOT["EBS Root Volume /dev/xvda (gp3)"]
    EBS_DATA["EBS Data Volume /dev/xvdf (io2)"]
    EC2 -- "Block I/O (sub-ms latency)" --> EBS_ROOT
    EC2 -- "Block I/O (sub-ms latency)" --> EBS_DATA
  end
  subgraph REGION["AWS Region (us-east-1)"]
    S3["Amazon S3 Bucket (Multi-AZ replicated)"]
    VPCEndpoint["VPC Gateway Endpoint"]
  end
  EC2 -- "HTTPS via VPC Endpoint" --> VPCEndpoint
  VPCEndpoint --> S3
  Lambda["AWS Lambda"] -- "HTTPS API" --> S3
  OnPrem["On-Premises"] -- "HTTPS API" --> S3
```
  1. EC2 Instance has an EBS root volume (/dev/xvda) attached directly within the same Availability Zone — this is the boot disk.
  2. A second EBS data volume can be attached for databases or high-IOPS workloads, also within the same AZ.
  3. The same EC2 instance accesses S3 over the network (ideally via a VPC Gateway Endpoint to avoid public internet traversal and reduce cost).
  4. S3 is a regional service — objects are replicated across multiple AZs automatically, making it inherently more durable than a single EBS volume.
  5. Other services (Lambda, another EC2 in a different AZ, on-premises systems) can all access the same S3 bucket simultaneously — EBS cannot be shared this way (EBS Multi-Attach is limited to io1/io2 volumes and specific use cases).

Can You Use S3 as the Main Disk for EC2?

No — not directly, and not as a replacement for EBS. Here is the precise technical reason:

  • EC2 requires a block device to boot from. The OS kernel needs to read raw blocks to load the bootloader, kernel, and root filesystem. S3 does not expose a block device interface.
  • AWS does not provide a native mechanism to mount S3 as a POSIX-compliant block device for use as a boot volume.
  • Tools exist that present a bucket as a filesystem: the open-source s3fs-fuse, and AWS's own Mountpoint for Amazon S3. Neither is a substitute for EBS. They implement only a subset of POSIX semantics (Mountpoint, for example, is optimized for reading large objects and sequential writes of new ones), cannot serve as a boot volume, and s3fs-fuse in particular has significant performance and consistency caveats that make it unsuitable for production block-storage workloads.

The correct pattern is: EBS for the OS and runtime disk; S3 for application data that needs to be shared, archived, or accessed at scale.

Deep Dive: When to Use Each

Use EBS When:

  • Running a relational database (PostgreSQL, MySQL) directly on EC2 — requires low-latency, consistent block I/O.
  • Your application needs a traditional filesystem with random read/write access (e.g., log files being actively written).
  • You need sub-millisecond latency — EBS io2 Block Express volumes offer the lowest latency for I/O-intensive workloads.
  • Running boot volumes — always EBS (or instance store, which is ephemeral).

Use S3 When:

  • Storing static website assets, images, videos, or documents.
  • Building a data lake — S3 integrates natively with Athena, Glue, EMR, and Redshift Spectrum.
  • Storing application backups or EBS snapshots (EBS snapshots are stored in S3 internally, though not directly accessible as objects).
  • Distributing content globally via CloudFront with S3 as the origin.
  • Storing ML training datasets consumed by SageMaker.
  • Any workload requiring multi-region or cross-account access to the same data.

EBS Volume Types — Choosing the Right Tier

| Volume Type | Best For | Max Performance |
| --- | --- | --- |
| gp3 (General Purpose SSD) | Most workloads, boot volumes, dev/test | 16,000 IOPS |
| io2 Block Express | Mission-critical databases (Oracle, SAP HANA) | 256,000 IOPS |
| st1 (Throughput Optimized HDD) | Big data, log processing (sequential reads) | 500 MB/s throughput |
| sc1 (Cold HDD) | Infrequently accessed data, lowest-cost block storage | 250 MB/s throughput |

Note: Exact IOPS limits and pricing vary. Always verify current specifications in the official AWS EBS documentation.
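Notice that the SSD tiers are rated in IOPS while the HDD tiers are rated in throughput. The two are linked by I/O size: effective throughput ≈ IOPS × I/O size. A back-of-envelope sketch, using gp3's documented baseline of 3,000 IOPS and 125 MB/s (verify current numbers against AWS docs):

```python
# Back-of-envelope: a block volume is bounded by both an IOPS ceiling and a
# throughput ceiling; which one binds depends on the I/O size.
def effective_throughput_mbps(iops_limit, throughput_limit_mbps, io_size_kib):
    """Achievable MB/s given both limits and a fixed I/O size."""
    iops_bound = iops_limit * io_size_kib / 1024  # MB/s if IOPS is the cap
    return min(iops_bound, throughput_limit_mbps)

# gp3 baseline: 3,000 IOPS, 125 MB/s
print(effective_throughput_mbps(3000, 125, 16))    # small 16 KiB I/O: 46.875 (IOPS-bound)
print(effective_throughput_mbps(3000, 125, 1024))  # large 1 MiB I/O: 125 (throughput-bound)
```

This is why databases (many small random I/Os) care about IOPS, while log processing (large sequential reads) is happy on the cheaper throughput-oriented st1.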

S3 Storage Classes — Optimizing Cost

```mermaid
graph LR
  S3["S3 Object Upload"] --> Q{"Access Pattern?"}
  Q -- "Frequent" --> STD["S3 Standard"]
  Q -- "Unknown" --> IT["S3 Intelligent-Tiering"]
  Q -- "Infrequent" --> IA["S3 Standard-IA"]
  Q -- "Rare / Archive" --> GL{"Retrieval Speed?"}
  GL -- "Milliseconds" --> GI["Glacier Instant Retrieval"]
  GL -- "Minutes" --> GF["Glacier Flexible Retrieval"]
  GL -- "Hours" --> GDA["Glacier Deep Archive"]
```
  1. S3 Standard — Default. Frequently accessed data. Highest storage cost, lowest retrieval cost.
  2. S3 Intelligent-Tiering — AWS automatically moves objects between access tiers based on usage patterns. Ideal when access patterns are unpredictable.
  3. S3 Standard-IA / One Zone-IA — Infrequent access. Lower storage cost but retrieval fee applies.
  4. S3 Glacier Instant / Flexible / Deep Archive — Long-term archival. Retrieval times range from milliseconds to hours. Lowest storage cost.
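The decision above can be condensed into a small function. This is only a sketch of the selection logic; the string constants mirror the `StorageClass` values the S3 API uses, and in practice lifecycle rules or Intelligent-Tiering usually automate this choice:

```python
# Sketch: map an access pattern (and, for archives, a required retrieval
# speed) to an S3 storage class name.
def pick_storage_class(access, retrieval=None):
    if access == "frequent":
        return "STANDARD"
    if access == "unknown":
        return "INTELLIGENT_TIERING"
    if access == "infrequent":
        return "STANDARD_IA"
    if access == "archive":
        return {"milliseconds": "GLACIER_IR",   # Glacier Instant Retrieval
                "minutes": "GLACIER",           # Glacier Flexible Retrieval
                "hours": "DEEP_ARCHIVE"}[retrieval]
    raise ValueError(f"unknown access pattern: {access}")

print(pick_storage_class("unknown"))           # INTELLIGENT_TIERING
print(pick_storage_class("archive", "hours"))  # DEEP_ARCHIVE
```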

Practical Implementation: Attaching EBS and Accessing S3 from EC2

Attach and Mount an EBS Volume (CLI)

```bash
# Step 1: Attach the EBS volume to your EC2 instance
aws ec2 attach-volume \
  --volume-id vol-0abcd1234efgh5678 \
  --instance-id i-0abcd1234efgh5678 \
  --device /dev/xvdf \
  --region us-east-1

# Step 2: SSH into the instance, then format the volume (first time only).
# On Nitro instances the volume appears as an NVMe device such as
# /dev/nvme1n1 regardless of the requested device name; check with lsblk.
# Confirm the volume is empty first: "sudo file -s /dev/xvdf" reports
# "data" when no filesystem is present.
sudo mkfs -t xfs /dev/xvdf

# Step 3: Create a mount point and mount the volume
sudo mkdir /data
sudo mount /dev/xvdf /data

# Step 4: Make the mount persistent across reboots
# Get the UUID of the device
sudo blkid /dev/xvdf

# Add to /etc/fstab (replace UUID with the actual value from blkid output);
# "nofail" lets the instance boot even if the volume is detached
echo "UUID=your-uuid-here  /data  xfs  defaults,nofail  0  2" | sudo tee -a /etc/fstab
```

Access S3 from EC2 Using IAM Role (Best Practice)

A least-privilege IAM policy to attach to the EC2 instance role. Note that `s3:ListBucket` applies to the bucket ARN itself, while the object-level actions apply to `bucket/*`; that is why both `Resource` entries are required:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3BucketAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-application-bucket",
        "arn:aws:s3:::my-application-bucket/*"
      ]
    }
  ]
}
```

From the EC2 instance (with the role attached), the AWS CLI picks up temporary credentials automatically from the instance metadata service, so no access keys are needed:

```bash
# Upload a file to S3
aws s3 cp /data/report.csv s3://my-application-bucket/reports/report.csv

# Download a file from S3
aws s3 cp s3://my-application-bucket/reports/report.csv /data/report.csv

# Sync a local directory to S3
aws s3 sync /data/exports/ s3://my-application-bucket/exports/
```

Create a VPC Gateway Endpoint for S3 (Avoid Public Internet)

```bash
# Create a VPC Gateway Endpoint for S3
# This routes S3 traffic through AWS's private network — no NAT Gateway needed
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abcd1234efgh5678 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0abcd1234efgh5678 \
  --region us-east-1
```

Decision Flowchart: S3 or EBS?

```mermaid
graph TD
  START(["New Storage Requirement"]) --> Q1{"Does EC2 need to boot from it?"}
  Q1 -- "Yes" --> EBS1["Use EBS (Root Volume, gp3)"]
  Q1 -- "No" --> Q2{"Does the app need random read/write at block level?"}
  Q2 -- "Yes" --> Q3{"Latency critical? Database workload?"}
  Q3 -- "Yes" --> EBS2["Use EBS io2 (High IOPS)"]
  Q3 -- "No" --> EBS3["Use EBS gp3 (General Purpose)"]
  Q2 -- "No" --> Q4{"Need to share data across services or regions?"}
  Q4 -- "Yes" --> S3A["Use Amazon S3"]
  Q4 -- "No" --> Q5{"Large scale, unstructured data or backups?"}
  Q5 -- "Yes" --> S3B["Use Amazon S3 (+ appropriate storage class)"]
  Q5 -- "No" --> EBS4["Use EBS gp3 (Default choice)"]
```
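The same decision tree, condensed into a function. This is a sketch of the flowchart's logic only; real decisions also weigh cost, durability requirements, and access patterns:

```python
# Sketch of the S3-or-EBS decision tree as straight-line checks,
# in the same order as the flowchart.
def choose_storage(boot=False, block_io=False, latency_critical=False,
                   shared=False, large_unstructured=False):
    if boot:
        return "EBS gp3 (root volume)"
    if block_io:
        return "EBS io2" if latency_critical else "EBS gp3"
    if shared or large_unstructured:
        return "Amazon S3"
    return "EBS gp3"  # safe default for anything disk-like

print(choose_storage(block_io=True, latency_critical=True))  # EBS io2
print(choose_storage(shared=True))                           # Amazon S3
```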

Glossary

| Term | Definition |
| --- | --- |
| Block Storage | Storage that exposes raw data blocks to the OS, which formats them with a filesystem. Enables random read/write at the block level. |
| Object Storage | Storage that manages data as discrete objects (data + metadata + key). Accessed via API; no filesystem abstraction. |
| IOPS | Input/Output Operations Per Second — the key performance metric for block storage, critical for database workloads. |
| VPC Gateway Endpoint | A horizontally scaled, redundant VPC component that enables private connectivity from your VPC to S3 or DynamoDB without traversing the public internet. |
| EBS Multi-Attach | A feature allowing a single io1 or io2 EBS volume to be attached to multiple EC2 instances within the same AZ simultaneously. Requires application-level coordination for write consistency. |

Wrap-Up & Next Steps

The rule of thumb is simple: EBS is your disk, S3 is your warehouse. Use EBS for anything that needs to behave like a local hard drive — OS, databases, active application files. Use S3 for anything that needs to be stored at scale, shared across services, or accessed via API. Never try to force S3 into the role of a block device for production workloads.

For further reading, consult the official AWS documentation for Amazon S3 and Amazon EBS.
