
S3 Glacier & Archival

Some data needs to be kept for years but is almost never read again — think old backups, finished project files, compliance records, and medical or legal archives you must retain by law. Storing that data in the regular, instantly-available S3 tier wastes money. S3 Glacier is a family of archival storage classes inside Amazon S3 (Simple Storage Service, AWS’s object storage) that store data for a fraction of the cost in exchange for slower access. The trade is simple: the cheaper the tier, the longer you wait to get your data back.

What “archival” really means

In normal S3, when you ask for an object you get it back in milliseconds. With archival tiers, your data is moved to cold storage that is much cheaper to keep but is not always ready to read instantly. For two of the three Glacier tiers, you must first restore (also called “retrieve”) an object — a request that AWS processes in the background and that can take minutes or hours — before you can download it.

This is the core idea to internalize: Glacier optimizes for storage cost, not access speed. Use it for data you are confident you will rarely or never touch.

The three Glacier tiers

There are three archival storage classes. All of them store data with the same eleven nines of durability (99.999999999%). They differ in retrieval time, price, and the minimum storage duration you are billed for.

Storage class	Retrieval time	Best for	Minimum billed storage duration
Glacier Instant Retrieval	Milliseconds	Archives you might still need quickly, accessed roughly once a quarter	90 days
Glacier Flexible Retrieval	1 minute to 12 hours, depending on retrieval option	Backups and disaster-recovery (DR) copies you rarely read	90 days
Glacier Deep Archive	Within 12 hours (Standard) or 48 hours (Bulk)	Long-term compliance data, “set and forget” for 7–10 years	180 days

Glacier Instant Retrieval

This tier has the highest storage price of the three, and each read carries a per-GB retrieval fee. In return, reads come back in milliseconds, just like S3 Standard. Use it when you want archive-level storage prices but cannot tolerate a wait. Old media files that a user might re-download at any moment are a good example.

Glacier Flexible Retrieval

Formerly called simply “Glacier.” Its storage is cheaper than Instant Retrieval, but you must request a restore before you can read an object. You choose a retrieval speed for each request:

- Expedited: typically 1–5 minutes, and the most expensive option.
- Standard: 3–5 hours.
- Bulk: 5–12 hours, the cheapest option, and often free.

It suits backups you rarely need but may occasionally want back quickly.
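You pick the retrieval speed through the `Tier` field of the `--restore-request` JSON that `aws s3api restore-object` accepts. The helper below is a minimal sketch that builds that JSON and rejects typos before they reach AWS. The function name `restore_request` is hypothetical and is not part of the AWS CLI.

```sh
#!/bin/sh
# Hypothetical helper (not part of the AWS CLI): print the --restore-request JSON
# for `aws s3api restore-object`, rejecting unknown tiers and bad day counts.
# Usage: restore_request DAYS TIER    where TIER is Expedited, Standard, or Bulk
restore_request() {
  rr_days=$1
  rr_tier=$2
  case "$rr_tier" in
    Expedited|Standard|Bulk) ;;
    *) echo "unknown retrieval tier: $rr_tier" >&2; return 1 ;;
  esac
  # Accept only whole numbers >= 1 with no leading zero (valid JSON numbers).
  case "$rr_days" in
    ''|0*|*[!0-9]*) echo "days must be a positive integer: $rr_days" >&2; return 1 ;;
  esac
  printf '{"Days":%s,"GlacierJobParameters":{"Tier":"%s"}}\n' "$rr_days" "$rr_tier"
}

restore_request 7 Bulk
```

To use it, pass the output straight to the CLI: `--restore-request "$(restore_request 1 Expedited)"`. Expedited applies only to Flexible Retrieval. Deep Archive accepts only Standard and Bulk.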

Glacier Deep Archive

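This is the cheapest storage class in S3, often quoted at around $0.00099 per GB per month. That works out to roughly $1 per TB per month, though exact prices vary by region. Restores complete within 12 hours with the Standard option, or within 48 hours with Bulk. Expedited retrieval is not available. Use this tier for data you are legally required to keep for years and realistically expect never to read.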
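The "roughly $1 per TB" figure follows from multiplying size by the per-GB price. The sketch below does that arithmetic with `awk`. It assumes the $0.00099 per GB-month figure quoted above and 1 TB = 1024 GB. Check the current price for your region before relying on the result.

```sh
#!/bin/sh
# Monthly storage cost = size in GB x price in USD per GB-month.
# The price is the Deep Archive figure quoted in the text (an assumption;
# real prices vary by region and over time).
monthly_cost() {
  # $1 = size in GB, $2 = price in USD per GB-month; prints dollars to 2 places
  awk -v gb="$1" -v price="$2" 'BEGIN { printf "%.2f\n", gb * price }'
}

DEEP_ARCHIVE_PRICE=0.00099
monthly_cost 1024 "$DEEP_ARCHIVE_PRICE"    # 1 TB  -> 1.01
monthly_cost 51200 "$DEEP_ARCHIVE_PRICE"   # 50 TB -> 50.69
```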

When to use Glacier (and when not to)

Use Glacier when data is accessed rarely, you can tolerate (or never expect to need) a delay, and you plan to keep it for a long time. The longer the retention and the rarer the access, the deeper (cheaper) the tier you should pick.

Do NOT use Glacier when data is read often, or when you need it back within seconds (Instant Retrieval is the one exception). Also avoid it when you might delete or move the data within a few months, because early-deletion penalties will wipe out any savings (more on this below).

Moving data into Glacier with a lifecycle rule

You rarely upload directly to Glacier. The normal pattern is to transition objects automatically using an S3 Lifecycle policy — a set of rules that change an object’s storage class as it ages.

Console steps

Open the S3 console and click your bucket name.
Go to the Management tab.
Under Lifecycle rules, choose Create lifecycle rule.
Give the rule a name (e.g. archive-old-logs) and choose a prefix like logs/ to limit its scope, or apply it to the whole bucket.
Under Lifecycle rule actions, tick Move current versions of objects between storage classes.
Add transitions: e.g. to Glacier Flexible Retrieval after 90 days, then to Glacier Deep Archive after 365 days.
Click Create rule.

CLI equivalent

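Save the rule to a file, then apply it. In the S3 API, and therefore in the AWS CLI, the three classes are named GLACIER_IR, GLACIER (Flexible Retrieval), and DEEP_ARCHIVE. Note that put-bucket-lifecycle-configuration replaces the bucket's entire existing lifecycle configuration, so include any rules you want to keep.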

cat > lifecycle.json <<'JSON'
{
  "Rules": [
    {
      "ID": "archive-old-logs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 90,  "StorageClass": "GLACIER" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}
JSON

aws s3api put-bucket-lifecycle-configuration \
  --bucket my-archive-bucket-0a1b2c3d \
  --lifecycle-configuration file://lifecycle.json

Output:

(no output on success; exit code 0)

Verify it was stored:

aws s3api get-bucket-lifecycle-configuration --bucket my-archive-bucket-0a1b2c3d

Output:

{
    "Rules": [
        {
            "ID": "archive-old-logs",
            "Filter": { "Prefix": "logs/" },
            "Status": "Enabled",
            "Transitions": [
                { "Days": 90, "StorageClass": "GLACIER" },
                { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
            ]
        }
    ]
}

Getting your data back (restoring)

For Flexible Retrieval and Deep Archive you must restore an object before downloading. This kicks off a background job and creates a temporary copy you can read for a set number of days.

aws s3api restore-object \
  --bucket my-archive-bucket-0a1b2c3d \
  --key logs/2024/app.log.gz \
  --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Bulk"}}'

This says: make the object available for 7 days, using the cheap Bulk tier. Check progress with head-object — ongoing-request="true" means it is still restoring.

aws s3api head-object --bucket my-archive-bucket-0a1b2c3d --key logs/2024/app.log.gz

Output:

{
    "Restore": "ongoing-request=\"true\"",
    "StorageClass": "DEEP_ARCHIVE",
    "ContentLength": 524288000
}

Once ongoing-request="false", download it normally with aws s3 cp. Glacier Instant Retrieval needs no restore — you read it like any other object.
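You can automate the restore-then-download sequence with a polling loop. The sketch below checks the `Restore` field from `head-object` and downloads once it reports `ongoing-request="false"`. A finished restore also carries an `expiry-date="..."` value. The function names and the 15-minute default interval are hypothetical choices. The sketch assumes AWS CLI v2 is configured, and that `--output text` prints `None` when the object has no `Restore` field.

```sh
#!/bin/sh
# Pure helper: succeed only if a Restore header value says the restore is done.
restore_done() {
  case "$1" in
    *'ongoing-request="false"'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: poll head-object until the restore finishes, then copy.
# Usage: wait_and_download BUCKET KEY DEST [INTERVAL_SECONDS]
wait_and_download() {
  bucket=$1 key=$2 dest=$3 interval=${4:-900}   # default: poll every 15 minutes
  while :; do
    # Extract just the Restore field (assumed to print "None" if absent).
    status=$(aws s3api head-object --bucket "$bucket" --key "$key" \
               --query Restore --output text) || return 1
    if restore_done "$status"; then
      aws s3 cp "s3://$bucket/$key" "$dest"
      return
    fi
    case "$status" in
      None) echo "no restore requested for $key" >&2; return 1 ;;
    esac
    echo "still restoring: $status"
    sleep "$interval"
  done
}

# Example (not run here):
# wait_and_download my-archive-bucket-0a1b2c3d logs/2024/app.log.gz ./app.log.gz
```

Because Bulk restores take hours, a long poll interval keeps the number of head-object requests low.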

The big gotcha: cheap to store, expensive to wake up

Deep Archive’s tiny storage price hides three costs that bite if you misjudge your access pattern:

Cost warning: Glacier Deep Archive charges for three things:

1. A per-GB retrieval fee, plus a per-request fee, when you restore.
2. Data-transfer-out charges when you download to the internet.
3. A 180-day minimum storage charge. If you delete or move an object before 180 days, you still pay as if it had stayed the full 180 days.

Glacier Instant Retrieval and Flexible Retrieval carry a 90-day minimum.

Suppose you archive 5 TB to Deep Archive and discover next week that you need it all back. You pay the retrieval fee and the egress fee. If you then delete the archived copy, you are still billed for the full 180 days of storage. For data you might actually need soon, leaving it in S3 Standard (or S3 Standard-IA) is often cheaper overall. Archive only what you are confident is truly cold.
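The minimum-duration rule can be checked in numbers. The sketch below returns how many days of storage you are billed for when an object leaves an archival class after a given number of days. It uses the minimums from the table (90 days for GLACIER_IR and GLACIER, 180 for DEEP_ARCHIVE). The function names are hypothetical, and the sketch ignores retrieval and transfer fees.

```sh
#!/bin/sh
# Minimum billed storage duration (days) per archival class, from the table above.
min_days() {
  case "$1" in
    GLACIER_IR|GLACIER) echo 90 ;;
    DEEP_ARCHIVE)       echo 180 ;;
    *) echo "unknown storage class: $1" >&2; return 1 ;;
  esac
}

# Days billed when an object of class $1 is deleted or moved after $2 days:
# the larger of the actual days stored and the class minimum.
billed_days() {
  min=$(min_days "$1") || return 1
  if [ "$2" -lt "$min" ]; then echo "$min"; else echo "$2"; fi
}

# The 5 TB example: archived to Deep Archive, needed back and deleted after 7 days.
billed_days DEEP_ARCHIVE 7    # -> 180, i.e. 173 days more than it was stored
billed_days GLACIER 120       # -> 120, already past the 90-day minimum
```

Multiplying the billed days by your per-GB-day price gives the storage charge you cannot avoid.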

Best practices

Match the tier to real access patterns. Look at S3 Storage Lens or access logs before choosing; do not guess.
Never archive data you expect to keep for less than the tier's minimum storage duration (90 or 180 days). The early-deletion charge cancels the savings.
Use Bulk retrieval when you are not in a hurry — it is the cheapest and often free for Flexible Retrieval.
Automate transitions with lifecycle rules rather than changing storage classes by hand; it scales and avoids mistakes.
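Consider S3 Intelligent-Tiering if your access pattern is unpredictable. It moves objects between access tiers automatically based on actual use, and you can also enable optional Archive Access and Deep Archive Access tiers. It charges no retrieval fees, only a small per-object monitoring and automation fee.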
Set a realistic restore expiry (Days) so the temporary copy lasts long enough to use but does not linger and add cost.
Tag and document archived data so future engineers know what it is before paying to restore it.
