Interview Questions: Storage & Databases

Storage and database questions separate people who can name AWS services from people who can pick the right one. Interviewers want to hear trade-offs: when object storage beats a block disk, why a standby database does not give you more read capacity, and how a bad key choice quietly throttles a fast NoSQL table. This page gives a model answer for each common question, plus the deeper follow-up the interviewer is really probing for, so you can defend your choice instead of just listing features.

What are the S3 storage classes, and how do you choose?

Amazon S3 (Simple Storage Service) is object storage: you store files (“objects”) in “buckets” and access them over HTTP. Except in the One Zone classes, S3 stores each object redundantly across at least three Availability Zones. Amazon designs it for 99.999999999% (“eleven nines”) durability. The classes differ mainly in how often you expect to read the data and how fast you need it back.

Storage class	Best for	Retrieval	Relative cost
S3 Standard	Frequently accessed data, websites, active datasets	Instant	Highest storage, lowest access
S3 Intelligent-Tiering	Unknown or changing access patterns	Instant (except the optional archive tiers)	Small per-object monitoring fee, no retrieval fees
S3 Standard-IA (Infrequent Access)	Backups read a few times a year	Instant, but per-GB read fee	Cheaper storage, retrieval fee
S3 Glacier Instant Retrieval	Archives needed instantly but rarely	Instant	Cheap storage, higher read fee
S3 Glacier Deep Archive	Long-term compliance archives	Within 12 hours (standard) or 48 hours (bulk)	Cheapest storage of all

When to use which: Use Standard for anything served to users. Use Intelligent-Tiering when you genuinely do not know the access pattern. It moves objects between tiers automatically for a small per-object monitoring fee, so you do not have to guess. (Objects smaller than 128 KB are never auto-tiered and stay at Frequent Access rates.) Use Glacier Deep Archive for “keep for 7 years for the auditors” data you hope never to read.
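The rules of thumb above can be written as a small decision function. This is a minimal sketch. The function name and the read-frequency thresholds (monthly or more for Standard, roughly quarterly for Standard-IA) are illustrative assumptions, not AWS guidance. Only the returned strings are real `--storage-class` values.

```python
from typing import Optional

def pick_storage_class(reads_per_month: Optional[float],
                       serves_users: bool = False,
                       needs_instant: bool = True) -> str:
    """Return an S3 --storage-class value for a workload (rule-of-thumb sketch).

    reads_per_month: expected reads of the data per month, or None if unknown.
    The thresholds below are illustrative assumptions, not AWS rules.
    """
    if serves_users:
        return "STANDARD"             # user-facing data: no retrieval fees
    if reads_per_month is None:
        return "INTELLIGENT_TIERING"  # unknown pattern: let S3 move objects
    if reads_per_month >= 1:
        return "STANDARD"             # read monthly or more (assumed threshold)
    if not needs_instant:
        return "DEEP_ARCHIVE"         # can wait hours: cheapest storage
    if reads_per_month >= 0.25:
        return "STANDARD_IA"          # a few reads a year (assumed threshold)
    return "GLACIER_IR"               # rare reads that must still be instant
```

For example, `pick_storage_class(0.05, needs_instant=False)` returns `"DEEP_ARCHIVE"`, while the same call with `needs_instant=True` returns `"GLACIER_IR"`.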

Set a class at upload time with the CLI.

aws s3 cp backup-2026-06.tar.gz s3://my-app-backups/ \
  --storage-class STANDARD_IA

Output:

upload: ./backup-2026-06.tar.gz to s3://my-app-backups/backup-2026-06.tar.gz
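Setting a class at upload is a one-time choice. To age objects down automatically, S3 accepts a lifecycle configuration document. The sketch below builds one for the example backups bucket, assuming a Standard → Standard-IA → Deep Archive → delete path. The rule ID and the day counts (30, 365, and 2555 days, about seven years) are illustrative assumptions.

```python
import json

def backup_lifecycle_config(ia_after_days: int = 30,
                            archive_after_days: int = 365,
                            expire_after_days: int = 2555) -> dict:
    """Build an S3 lifecycle configuration (sketch; day counts are examples).

    Objects start in STANDARD, move to STANDARD_IA, then DEEP_ARCHIVE,
    and are finally deleted.
    """
    # S3 does not allow transitions to STANDARD_IA before day 30.
    if ia_after_days < 30:
        raise ValueError("Standard-IA transitions must be at least 30 days")
    if not ia_after_days < archive_after_days < expire_after_days:
        raise ValueError("each step must happen after the previous one")
    return {
        "Rules": [
            {
                "ID": "age-out-backups",      # hypothetical rule name
                "Filter": {"Prefix": ""},     # empty prefix: whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }

if __name__ == "__main__":
    print(json.dumps(backup_lifecycle_config(), indent=2))
```

To apply it, save the printed JSON to `lifecycle.json`, then run `aws s3api put-bucket-lifecycle-configuration --bucket my-app-backups --lifecycle-configuration file://lifecycle.json`.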

The cost gotcha interviewers dig for: the cheaper tiers cost less to store, but they charge a per-GB retrieval fee and have a minimum storage duration. For example, the minimum is 30 days for Standard-IA and 180 days for Deep Archive, and if you delete an object earlier you are still billed for the minimum. If you put frequently read data in Standard-IA to “save money,” the retrieval fees can exceed the storage savings and cost more than Standard would. The real answer is “match the class to the access frequency, and use a lifecycle policy to age objects down automatically.”
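The retrieval-fee trap is easiest to see with arithmetic. The sketch below compares monthly cost per GB in Standard and Standard-IA. The three prices are hypothetical placeholders in the rough shape of S3 pricing, not quoted rates. Request fees and minimum-duration charges are ignored.

```python
# Hypothetical prices for illustration only; real prices vary by Region
# and change over time. Request fees and minimum durations are ignored.
STANDARD_STORAGE = 0.023   # $ per GB-month
IA_STORAGE = 0.0125        # $ per GB-month
IA_RETRIEVAL = 0.01        # $ per GB read back

def monthly_cost_per_gb(storage_class: str, reads_per_month: float) -> float:
    """Storage plus retrieval cost for one GB that is read back in full
    reads_per_month times."""
    if storage_class == "STANDARD":
        return STANDARD_STORAGE                  # no per-GB retrieval fee
    if storage_class == "STANDARD_IA":
        return IA_STORAGE + IA_RETRIEVAL * reads_per_month
    raise ValueError(f"unsupported class: {storage_class}")

def breakeven_reads_per_month() -> float:
    """Reads per month at which Standard-IA stops being cheaper."""
    return (STANDARD_STORAGE - IA_STORAGE) / IA_RETRIEVAL

if __name__ == "__main__":
    for reads in (0.1, 1, 5):
        std = monthly_cost_per_gb("STANDARD", reads)
        ia = monthly_cost_per_gb("STANDARD_IA", reads)
        print(f"{reads} reads/month: Standard ${std:.4f}, Standard-IA ${ia:.4f}")
    print(f"break-even: {breakeven_reads_per_month():.2f} reads/month")
```

With these assumed prices, Standard-IA is cheaper only while each GB is read back less than about 1.05 times a month. At five full reads a month it costs nearly three times as much as Standard.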

Is S3 strongly consistent?

Yes. Since December 2020, S3 has provided strong read-after-write consistency for object operations. After you successfully write, overwrite, or delete an object, any subsequent read or list request reflects the change, with no stale window. This matters in interviews because older tutorials still say S3 is “eventually consistent,” which is now outdated. The correct modern answer: object reads and lists are strongly consistent with no special configuration needed. Only bucket configuration changes (such as enabling versioning) are still eventually consistent.

EBS vs EFS vs S3 — when do you use each?

These are three completely different storage shapes, and mixing them up is a classic red flag.

Amazon EBS (Elastic Block Store) is a virtual hard disk attached to an EC2 instance. It behaves like a local drive: you format it and mount it. Each volume lives in a single AZ and can attach only to instances in that AZ. Apart from io1/io2 volumes using Multi-Attach, a volume attaches to one instance at a time.
Amazon EFS (Elastic File System) is a shared network file system (NFS). Many EC2 instances can mount it at once and see the same files. With the default Regional storage, it stores data redundantly across multiple AZs (a cheaper One Zone option also exists).
Amazon S3 is object storage accessed over an API, not a mounted disk. You cannot run a database or OS off it directly.

Feature	EBS	EFS	S3
Type	Block (a disk)	File (shared NFS)	Object (API)
Attaches to	One instance (usually)	Many instances	Anything via HTTP
Scope	Single AZ	Multi-AZ	Region-wide
Best for	Boot volumes, databases	Shared content for a fleet	Backups, media, static sites, data lakes
Cost shape	Pay for provisioned GB	Pay for used GB	Pay for used GB, lowest

When to use which: Use EBS for a single server’s disk — the OS and the data files of a database like MySQL. Use EFS when several servers must read and write the same files at once (a shared upload folder for an autoscaling web fleet). Use S3 for storing and serving files your application fetches by key — images, backups, logs, big data.

Tip: A neat one-liner: “EBS is a disk for one machine, EFS is a shared drive for many machines, S3 is a giant filing cabinet you talk to over the network.”
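The one-liner boils down to two questions: does the software need a mounted file system, and how many instances share the data? Below is a minimal sketch under that simplification. The function and enum names are hypothetical.

```python
from enum import Enum

class Storage(Enum):
    EBS = "EBS"   # block disk for one instance
    EFS = "EFS"   # shared NFS file system
    S3 = "S3"     # object storage over an HTTP API

def choose_storage(needs_mounted_filesystem: bool, instances_sharing: int) -> Storage:
    """Encode 'EBS for one machine, EFS for many, S3 for objects by key'.

    needs_mounted_filesystem: the software expects a disk or path (an OS,
        a database's data files, an app that writes into a folder).
    instances_sharing: how many instances must read/write the same files.
    """
    if not needs_mounted_filesystem:
        return Storage.S3    # fetched by key over the network
    if instances_sharing > 1:
        return Storage.EFS   # many instances mount the same files
    return Storage.EBS       # one server's own disk (boot volume, MySQL data)
```

A single MySQL server maps to `choose_storage(True, 1)`, which returns `Storage.EBS`. An autoscaling fleet sharing an upload folder maps to `choose_storage(True, 4)`, which returns `Storage.EFS`.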

RDS Multi-AZ vs read replicas — what is the difference?

This is one of the most common database questions, and people mix the two up constantly. Amazon RDS (Relational Database Service) runs managed relational databases that you query with SQL (Structured Query Language), such as MySQL, PostgreSQL, and SQL Server.

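Multi-AZ is about availability, not performance. In the classic Multi-AZ instance deployment, AWS keeps a synchronous standby copy in a second AZ. The standby is not readable; it exists only to take over. If the primary fails, RDS automatically promotes the standby (failover usually completes in 60–120 seconds) and points the same DNS endpoint at it. Your app never reads from the standby. (The newer Multi-AZ DB cluster option does add two readable standbys, but interviewers usually mean the classic setup.)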
Read replicas are about scaling reads. AWS keeps one or more asynchronous copies that you can read from. You point reporting queries and read-heavy traffic at them to take load off the primary. They can be in the same Region or another Region.

Aspect	Multi-AZ	Read replica
Purpose	High availability / failover	Read scaling
Replication	Synchronous	Asynchronous
Can you read it?	No	Yes
On primary failure	Auto-promotes standby	Manual promotion possible
Helps with	Downtime	Slow read performance

Create a read replica with the CLI.

aws rds create-db-instance-read-replica \
  --db-instance-identifier mydb-read-1 \
  --source-db-instance-identifier mydb-primary

Output:

{
  "DBInstance": {
    "DBInstanceIdentifier": "mydb-read-1",
    "DBInstanceStatus": "creating",
    "ReadReplicaSourceDBInstanceIdentifier": "mydb-primary"
  }
}

The trap: “We’re getting slow reads, so let’s turn on Multi-AZ.” Wrong. Multi-AZ does nothing for read load because the standby is invisible to your app. The fix for read load is a read replica; the fix for downtime is Multi-AZ. Strong candidates also mention Amazon Aurora, AWS’s high-performance MySQL- and PostgreSQL-compatible engine. Aurora supports up to 15 read replicas that share one storage layer with the primary, so replica lag is typically measured in milliseconds rather than seconds.
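In application code, “point read-heavy traffic at replicas” usually means routing by query type. The sketch below shows one hedged way to do it. `DbRouter` and the endpoint hostnames are hypothetical, and a real app would hold a connection pool per endpoint. The Multi-AZ standby never appears in the router, because in a classic Multi-AZ deployment it cannot serve reads.

```python
import random
from dataclasses import dataclass, field

@dataclass
class DbRouter:
    """Route writes to the primary and spread reads across read replicas.

    Replicas are asynchronous, so a read that must see a write the app just
    made (read-your-writes) should pass require_fresh=True to use the primary.
    """
    primary: str
    replicas: list[str] = field(default_factory=list)
    rng: random.Random = field(default_factory=random.Random)

    def endpoint_for(self, is_write: bool, require_fresh: bool = False) -> str:
        if is_write or require_fresh or not self.replicas:
            return self.primary                # only the primary accepts writes
        return self.rng.choice(self.replicas)  # spread reads across replicas

if __name__ == "__main__":
    # Hypothetical hostnames; real RDS endpoints appear in describe-db-instances output.
    router = DbRouter(
        primary="mydb-primary.example.internal",
        replicas=["mydb-read-1.example.internal"],
    )
    print(router.endpoint_for(is_write=True))   # primary
    print(router.endpoint_for(is_write=False))  # the only replica
```

The `require_fresh` flag is the design choice worth defending in an interview. Sending every read to a replica is the fastest option, but it can return data that is a moment behind the primary.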

SQL vs NoSQL — when do you pick DynamoDB over RDS?

SQL (relational) databases, such as the engines RDS runs, store data in tables with rows and columns, enforce a fixed schema, support joins, and give you transactions across tables. NoSQL databases like Amazon DynamoDB store flexible key-value or document items and scale horizontally to enormous request rates with single-digit millisecond latency.

Pick relational (RDS/Aurora) when your data is highly related, you need joins and complex or ad hoc queries, and consistency across tables matters: orders, accounting, inventory. Pick DynamoDB when your access patterns are known and simple (look up an item by a known key), you need massive scale, and you want a serverless database with no servers to manage: user sessions, shopping carts, IoT event streams, leaderboards.

Warning: DynamoDB is fast only if you query by key. Without a secondary index, the only way to “find all items where X” is a Scan, which reads (and bills you for) every item in the table. If you cannot describe your queries up front, a relational database is usually the safer choice.
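A toy in-memory model shows why the key matters. `ToyTable` is a hypothetical stand-in, not the DynamoDB API. Its `get_item` is a single hash lookup, while `scan_where` must examine every item. That mirrors what a DynamoDB Scan with a filter does, and what it is billed for.

```python
from dataclasses import dataclass

@dataclass
class ToyTable:
    """Hypothetical key-value table used to compare key lookups with scans."""
    items: dict  # partition key -> item (a dict of attributes)

    def get_item(self, key):
        # Direct lookup by key: cost does not grow with table size.
        return self.items.get(key)

    def scan_where(self, attr, value):
        # No index on attr: every item must be read, matching or not.
        examined, matches = 0, []
        for item in self.items.values():
            examined += 1
            if item.get(attr) == value:
                matches.append(item)
        return matches, examined

if __name__ == "__main__":
    carts = ToyTable({
        f"user-{i}": {"userId": f"user-{i}", "status": "abandoned" if i % 10 == 0 else "open"}
        for i in range(100_000)
    })
    print(carts.get_item("user-42"))
    matches, examined = carts.scan_where("status", "abandoned")
    print(f"{len(matches)} matches after examining {examined} items")
```

Running it reports 10000 matches after examining 100000 items. The filter narrows the result, not the work.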

How do you design a DynamoDB partition key to avoid hot partitions?

DynamoDB splits your table across many physical partitions and uses each item’s partition key to decide which partition stores it. Each partition has its own throughput ceiling (roughly 3,000 read units and 1,000 write units per second). A hot partition happens when too much traffic hits one key value. That partition is overloaded while others sit idle, and you get throttled even though your table’s total capacity looks fine.

The fix is to choose a partition key with high cardinality — many distinct values that spread requests evenly.
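A toy simulation makes the hot-partition effect concrete. It assumes keys are hashed and taken modulo a fixed count of eight partitions. DynamoDB's actual hash function and partition map are internal, so `partition_for` and `NUM_PARTITIONS` are illustrative assumptions.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed; DynamoDB manages the real partition count

def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Toy placement: hash the key, then take it modulo the partition count."""
    digest = hashlib.sha256(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def load_per_partition(keys) -> Counter:
    """Count how many requests land on each partition."""
    return Counter(partition_for(k) for k in keys)

if __name__ == "__main__":
    n = 10_000
    # Low cardinality: 80% of requests use "active", 20% use "inactive".
    by_status = load_per_partition("inactive" if i % 5 == 0 else "active" for i in range(n))
    # High cardinality: one distinct userId per request.
    by_user = load_per_partition(f"user-{i}" for i in range(n))
    print("status key:", sorted(by_status.values(), reverse=True))
    print("userId key:", sorted(by_user.values(), reverse=True))
```

With `status` as the key, all 10,000 requests land on at most two partitions, most of them on whichever partition holds "active". With `userId`, they spread roughly evenly across all eight.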

Bad key: status, which has only a few values like active/inactive, so everything piles onto at most two partitions.
Good key: userId or orderId — millions of distinct values spread load evenly.
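Time-series trap: using today’s date (2026-06-15) as the key sends all of today’s writes to one partition. Fix it by adding a suffix such as 2026-06-15#7 (“write sharding”) to spread writes across several keys. The trade-off is on the read side: fetching a whole day now means querying every suffix and merging the results.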
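Write sharding can be sketched in a few lines. This version derives the suffix from a hash of the item's ID (a “calculated” suffix) rather than picking one at random. A reader that knows the ID can then go straight to its shard. The shard count of 10 and the function names are illustrative assumptions.

```python
import hashlib

SHARDS = 10  # assumed; size it to the write rate you need to spread

def sharded_key(day: str, item_id: str, shards: int = SHARDS) -> str:
    """Write side: turn a date key into a key like '2026-06-15#7'.

    The suffix comes from a hash of item_id, so the same item always maps
    to the same shard, while different items spread across all shards.
    """
    digest = hashlib.sha256(item_id.encode()).digest()
    suffix = int.from_bytes(digest[:4], "big") % shards
    return f"{day}#{suffix}"

def keys_for_day(day: str, shards: int = SHARDS) -> list[str]:
    """Read side: every key that must be queried, then merged, to read a whole day."""
    return [f"{day}#{n}" for n in range(shards)]

if __name__ == "__main__":
    for order_id in ("order-1001", "order-1002", "order-1003"):
        print(order_id, "->", sharded_key("2026-06-15", order_id))
    print(keys_for_day("2026-06-15", shards=3))
```

A random suffix spreads writes just as well. The drawback is that a single-item read would have to query every shard.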

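Tip: If you must query or aggregate by a low-cardinality field, make it the partition key of a Global Secondary Index (GSI). Keep the table’s own partition key high-cardinality so base-table writes stay spread out. A GSI has its own partitions, so at high write volume you should shard its key too (for example, active#3). A throttled GSI can throttle writes to the base table.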

Best Practices

Match each S3 object to a storage class by access frequency, and use lifecycle policies to age data into cheaper tiers automatically.
Use EBS for one server, EFS for many servers sharing files, and S3 for API-accessed objects — never force-fit one shape.
Reach for Multi-AZ for availability and read replicas for read scaling — they solve different problems.
Enable encryption at rest on EBS and RDS (S3 already encrypts new objects by default), and keep S3 Block Public Access on unless a bucket truly must be public.
Choose high-cardinality DynamoDB partition keys and apply write sharding for time-series or low-cardinality access patterns.
Pick relational for related data and complex queries, and DynamoDB for known-key lookups at massive scale.