“Cold” cloud storage isn’t cheap at scale. Parking 10 PB in AWS/GCP/Azure with just 2.5% monthly access still lands you in the $3–8M range over 10 years and $6–30M over 20 years, once you include storage, retrieval, and egress. A tape-backed object store sized for the same 10 PB (using realistic server/library/media/ops costs) plus a $500k one-time egress hit comes out around $2.3M over 10 years and $4.1M over 20 years—roughly $1–10M cheaper than staying in cloud cold tiers, depending on which tier you’re in. Cloud cold tiers = perpetual rent + metered reads. Tape = CapEx + stable OpEx, no per-GB retrieval tax. Over long horizons, the tape TCO curve flattens; the cloud curve doesn’t. Repatriation starts to make sense when you have: (a) ≥5–10 PB, (b) 10+ year retention, (c) low-but-steady access, and (d) real requirements for sovereignty, governance, and preservation.
The punchline: keep fast-changing workloads in the cloud; move decade-scale archives to a preservation tier you own. Otherwise you’re renting your institutional memory indefinitely.

What 10 PB of Cold Data Really Costs in AWS, GCP, Azure vs Tape Over 20 Years

2025/12/10 01:41

What a 10 PB, 20-year archive really costs in AWS/GCP/Azure vs a tape-backed object store

Why long-term preservation is back on the board agenda

Most cloud bills started life as rounding errors.

A few terabytes here, a backup bucket there, a “just in case” archive for compliance. It looked cheap, flexible, and—critically—someone else’s problem.

Fast-forward a few years:

  • You’ve got petabytes sitting in “cold” tiers.
  • Finance is staring at a 7-figure annual cloud line item for data that almost never changes.
  • Legal and compliance are asking awkward questions about sovereignty, retention, and deletion guarantees.
  • Sustainability is no longer a feel-good slide; it’s an emissions target with your company’s name on it.

This is where preservation storage (data you must keep for 5–20+ years) diverges from operational storage (data you actively use to run the business). Treat them the same, and you’ll overspend on one or under-protect the other.

Cloud cold storage is, in many ways, a rental model that works brilliantly for elastic workloads:

  • Short-term experiments
  • Bursty analytics
  • Seasonal apps
  • Disaster recovery copies you hope to never touch

But for regulatory archives, historical content, medical images, logs, and R&D datasets that must live for a decade or more, the economics and risk profile start to look very different.

That’s why repatriation—moving data out of the cloud and into an on-premise or hosted environment you control—is back in serious conversations. Not because “cloud was a mistake,” but because not all data belongs on a permanent subscription model.

This article walks through a concrete scenario: 10 PB of cold data with 2.5% monthly access, compared across AWS, GCP, Azure, and a tape-backed object store over 5-, 10-, and 20-year horizons.

We’ll keep the math visible, the tone honest, and the conclusion practical.

The scenario: 10 PB under glass

Let’s pin down the workload so the numbers aren’t hand-wavy.

  • Total archive size: 10 PB = 10,000 TB ≈ 10,000,000 GB
  • Access rate: 2.5% of the data accessed per month → 0.025 × 10,000,000 GB = 250,000 GB/month → ~3,000,000 GB/year retrieved

So you have:

  • A large, relatively stable corpus (10 PB)
  • Low but non-zero access (monthly reads for audits, investigations, restorations, etc.)
  • A long horizon: 5, 10, 20 years

We’ll compare:

Cloud cold storage classes

  • AWS
    • S3 Glacier Instant Retrieval
    • S3 Glacier Flexible Retrieval
    • S3 Glacier Deep Archive
  • Google Cloud Platform
    • Nearline
    • Coldline
    • Archive
  • Azure
    • Blob Storage, Cool access tier

On-premise: a tape-backed object store

Using a cost profile based on LTO.org-style numbers for a mid-size tape infrastructure:

  • Hardware (Server): $35,000
  • Hardware (Disk storage / cache): $25,000
  • Hardware (Tape library + drives): $80,000
  • Media: $63,600
  • Software: $35,000

Initial CapEx = $238,600

Ongoing:

  • Operations staff: $5,000/month = $60,000/year
  • Maintenance: $2,000/year
  • Floor space: $4,500/month = $54,000/year, with 4% annual uplift
  • Energy: not specified; we’ll treat it as zero in the math (real life: add maybe $10–15k/year).

Refresh cycles:

  • Server + disk: every 5 years → $60,000 at years 5, 10, 15, 20
  • Tape hardware + media: every 7 years → $143,600 at years 7, 14, 21

And we’ll add a one-time cloud egress cost to pull the full 10 PB home.

For egress we’ll assume a large-volume blended rate of $0.05/GB for cloud → on-prem transfers. Real discount deals will vary, but public pricing for big volumes lands in that neighborhood.

Cloud cold storage: the actual cost of “keeping it just in case”

Let’s start with AWS, then we’ll look at GCP and Azure at a high level.

Anatomy of cloud cold storage cost

For each cloud class, your annual cost has two main components:

  1. Storage rent:

Annual storage cost=Data (GB)×Price ($/GB-month)×12

  2. Retrieval + egress tolls:

Annual retrieval cost=Data retrieved annually (GB)×(retrieval fee+egress fee)

In our case:

  • Data stored: 10,000,000 GB
  • Data retrieved annually: 3,000,000 GB
  • Egress fee: $0.05/GB (assumed)
  • Retrieval fee: varies by storage class
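
The two formulas above can be sketched in a few lines of Python. This is a rough model using the article's rounded assumptions, not live vendor quotes; the function name is illustrative:

```python
# Illustrative cost model for a cloud cold tier. Prices are the
# article's rounded assumptions, not current list prices.

def annual_cloud_cost(stored_gb, price_per_gb_month,
                      retrieved_gb_per_year, retrieval_fee, egress_fee):
    """Storage rent plus retrieval + egress tolls, per year."""
    storage_rent = stored_gb * price_per_gb_month * 12
    tolls = retrieved_gb_per_year * (retrieval_fee + egress_fee)
    return storage_rent + tolls

# S3 Glacier Deep Archive with this scenario's inputs:
total = annual_cloud_cost(10_000_000, 0.00099, 3_000_000, 0.02, 0.05)
print(round(total))  # → 328800 per year
```

Swapping in a different tier's storage price and retrieval fee gives that tier's annual run rate with no other changes.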

We’ll use typical US region prices (rounded) as of 2025 for illustration:

AWS (US region; approximate figures based on AWS advertised pricing; your mileage may vary)

  • S3 Glacier Instant Retrieval
    • Storage: ~$0.004/GB-month
    • Retrieval: ~$0.03/GB (class fee)
  • S3 Glacier Flexible Retrieval
    • Storage: ~$0.0036/GB-month
    • Retrieval: ~$0.01/GB (standard retrieval)
  • S3 Glacier Deep Archive
    • Storage: $0.00099/GB-month (AWS public doc)
    • Retrieval: ~$0.02/GB (typical deep archive retrieval)

GCP and Azure have similar shapes—slightly different list prices, but the same pattern: low storage, non-trivial retrieval and egress.


Worked example: AWS S3 Glacier Deep Archive

Let’s fully show the math for the cheapest major cloud cold tier: S3 Glacier Deep Archive.

Inputs

  • Data stored: 10,000,000 GB
  • Storage price (S): $0.00099 per GB-month
  • Annual retrieval volume: 3,000,000 GB
  • Retrieval fee (R): $0.02 per GB
  • Egress fee (E): $0.05 per GB

Step 1 – Annual storage cost

Annual storage=10,000,000×0.00099×12

  • 10,000,000 × 0.00099 = 9,900
  • 9,900 × 12 = $118,800/year

Step 2 – Annual retrieval + egress cost

Annual retrieval=3,000,000×(0.02+0.05)

  • Retrieval + egress = 0.02 + 0.05 = $0.07/GB
  • 3,000,000 × 0.07 = $210,000/year

Step 3 – Total annual cost

Annual total=118,800+210,000=$328,800

Step 4 – 5, 10, 20-year totals

  • 5 years: 5 × 328,800 = $1,644,000
  • 10 years: 10 × 328,800 = $3,288,000
  • 20 years: 20 × 328,800 = $6,576,000

So even in the cheapest AWS cold class, 10 PB with modest ongoing access is:

  • ~$1.64M over 5 years
  • ~$3.29M over 10 years
  • ~$6.58M over 20 years

…and that’s ignoring:

  • Minimum storage duration penalties (90–180 days)
  • Per-request fees
  • Staging copies into S3 Standard while restores are “rehydrated”

You pay rent forever and tolls every time you read.

High-level look at other cloud cold tiers

Using similar reasoning and current public price examples (storage + retrieval + egress), you end up in this ballpark for 10 PB and the same access pattern:

All numbers rounded to nearest $0.1M, assuming similar $0.05/GB egress and representative retrieval fees.

  • AWS
    • Glacier Instant: ~$3.6M (5 yr), $7.2M (10 yr), $14.4M (20 yr)
    • Glacier Flexible: ~$3.1M, $6.1M, $12.2M
    • Glacier Deep Archive: ~$1.6M, $3.3M, $6.6M
  • GCP (Nearline / Coldline / Archive at typical prices of $0.010 / $0.007 / $0.004 per GB-month)
    • Nearline: ~$6.9M, $13.8M, $27.6M
    • Coldline: ~$5.3M, $10.5M, $21.0M
    • Archive: ~$3.9M, $7.8M, $15.6M
  • Azure Blob Cool (≈ $0.01–0.0115/GB-month plus retrieval + egress; Azure also has Cold and Archive tiers, but we model Cool here)
    • Blob Cool: ~$7.8M, $15.6M, $31.2M

The broad pattern:

  • Anything “Instant” or “Cool”: easily $6–15M+ over 10–20 years.
  • Deepest archive tiers (AWS Deep Archive, GCP Archive, Azure Archive) lower the storage part, but retrieval + egress still bite.
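
As a sanity check, the ballpark figures above can be reproduced by looping the same storage + retrieval + egress formula over each tier. The storage prices and per-tier retrieval fees below are the article's illustrative assumptions, not current list prices:

```python
# Rough 20-year totals per tier, using the article's illustrative
# figures (not live vendor pricing).

TIERS = {  # name: ($/GB-month storage, $/GB retrieval fee)
    "AWS Glacier Deep Archive": (0.00099, 0.02),
    "GCP Nearline":             (0.010,   0.01),
    "GCP Coldline":             (0.007,   0.02),
    "GCP Archive":              (0.004,   0.05),
    "Azure Blob Cool":          (0.0115,  0.01),
}

STORED_GB, RETRIEVED_GB_YR, EGRESS = 10_000_000, 3_000_000, 0.05

for name, (storage_price, retrieval_fee) in TIERS.items():
    annual = (STORED_GB * storage_price * 12
              + RETRIEVED_GB_YR * (retrieval_fee + EGRESS))
    print(f"{name}: ~${annual * 20 / 1e6:.1f}M over 20 years")
```

Deep Archive lands near $6.6M and Blob Cool near $31.2M over 20 years, matching the list above.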

Tape-backed object store: your own cold cloud

Now let’s look at the tape-backed object store using the cost profile outlined above.

One-time CapEx

  • Server hardware: $35,000
  • Disk storage (cache): $25,000
  • Tape library + drives: $80,000
  • Tape media: $63,600
  • Software: $35,000

Initial CapEx=35,000+25,000+80,000+63,600+35,000=$238,600

So $238,600 to stand up the environment.

Annual OpEx (baseline)

  • Operations staff: $5,000/month → $60,000/year
  • Maintenance: $2,000/year
  • Floor space: $4,500/month → $54,000/year, with 4% annual increase

Ignoring energy for now, your year-1 baseline OpEx:

Year 1 OpEx=60,000+2,000+54,000=$116,000

From year 2 onward, floor space increases:

Floor space in year n=54,000×(1.04)^(n−1)

Everything else (ops + maintenance) we’ll keep flat for simplicity.

Refresh cycles

  • Every 5 years (5, 10, 15, 20): $60,000 for server + disk upgrades
  • Every 7 years (7, 14, 21…): $143,600 for tape hardware + new media (if needed)

These are big, but they happen only a few times over 20 years.
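
The CapEx, flat ops/maintenance, escalating floor space, and refresh schedule can be rolled into one small TCO function. A sketch under the article's assumptions (energy excluded, as in the text):

```python
# Tape-backed object store TCO: CapEx, flat ops + maintenance,
# floor space escalating 4%/year, 5- and 7-year refresh cycles.

CAPEX = 238_600
OPS_AND_MAINT = 60_000 + 2_000          # flat per year
FLOOR_Y1, FLOOR_UPLIFT = 54_000, 1.04

def tape_tco(years, egress=0):
    total = CAPEX + egress
    for n in range(1, years + 1):
        total += OPS_AND_MAINT
        total += FLOOR_Y1 * FLOOR_UPLIFT ** (n - 1)
        if n % 5 == 0:
            total += 60_000              # server + disk refresh
        if n % 7 == 0:
            total += 143_600             # tape hardware + media refresh
    return total

for y in (5, 10, 20):
    print(f"{y} yr: ${tape_tco(y, egress=500_000) / 1e6:.2f}M")
```

With the $500k one-time egress included, it reproduces the ~$1.40M / $2.27M / $4.11M totals used in the comparison.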

Putting it together: TCO over 5, 10, 20 years

Running the math with:

  • Initial CapEx
  • Annual OpEx (with escalating floor space)
  • Scheduled refreshes

You get approximate totals:

  • 5-year TCO (tape env only): ~$0.90M
  • 10-year TCO: ~$1.77M
  • 20-year TCO: ~$3.61M

Now add the one-time egress to bring the 10 PB home:

  • 10,000,000 GB × $0.05/GB = $500,000

So “tape + migration” TCO:

| Horizon | Tape env only | + 10 PB egress | Total Tape + Egress TCO |
|----|----|----|----|
| 5 yr | $0.90M | $0.50M | $1.40M |
| 10 yr | $1.77M | $0.50M | $2.27M |
| 20 yr | $3.61M | $0.50M | $4.11M |

Remember: that’s with no energy cost added. Even if you add $10–15k/year for power and cooling, the totals barely move compared to multi-million cloud bills.

Head-to-head: cloud vs tape for 10 PB

Versus AWS S3 Glacier Deep Archive

Compare “cheapest viable cloud cold” vs “your own tape cloud”:

| Horizon | AWS S3 Glacier Deep Archive | Tape + Egress | Cloud − Tape (extra spend) |
|----|----|----|----|
| 5 yr | $1.64M | $1.40M | +$0.24M |
| 10 yr | $3.29M | $2.27M | +$1.02M |
| 20 yr | $6.58M | $4.11M | +$2.47M |

Interpretation:

  • Over 5 years, tape wins, but by a modest margin (~$240k).
  • Over 10 years, tape+egress is about $1M cheaper.
  • Over 20 years, tape+egress is roughly $2.5M cheaper.

And that’s against the cheapest of the mainstream cloud cold tiers.

Versus more expensive tiers (Instant, Nearline, Cool), tape wins by multiple millions.
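
Under the same assumptions you can also locate the break-even year, i.e. when cumulative tape + egress spend drops below cumulative Deep Archive spend. This is a sketch reusing the article's annual figures, not a discounted-cash-flow model:

```python
# Cumulative Deep Archive spend vs cumulative tape + egress spend,
# using the article's figures (no discounting or growth modeled).

CLOUD_ANNUAL = 328_800                   # Deep Archive worked example
TAPE_UPFRONT = 238_600 + 500_000         # CapEx + one-time egress

def tape_cumulative(years):
    total = TAPE_UPFRONT
    for n in range(1, years + 1):
        total += 62_000 + 54_000 * 1.04 ** (n - 1)   # ops/maint + floor
        total += 60_000 if n % 5 == 0 else 0          # 5-yr refresh
        total += 143_600 if n % 7 == 0 else 0         # 7-yr refresh
    return total

breakeven = next(y for y in range(1, 21)
                 if tape_cumulative(y) < CLOUD_ANNUAL * y)
print(breakeven)  # → 4
```

Despite the front-loaded CapEx and egress, the tape curve crosses below the cheapest cloud cold tier within the first five years.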

Pros and cons: this isn’t just about dollars

Cloud cold storage – PROs

  • No CapEx, no capacity planning
    • You “flip a switch,” you pay monthly.
    • Good when future capacity needs are uncertain.
  • Global durability and availability baked in
    • Multi-AZ, multi-region options; you inherit their durability and uptime SLAs.
  • Operational simplicity
    • No hardware lifecycle management, no tape handling, no real estate decisions.
  • Elastic recovery and DR use cases
    • Easy to wire into cloud-native DR patterns, cross-region replication, etc.

Cloud cold storage – CONs

  • Permanent rental model
    • The longer you keep data, the more the cloud wins—not you.
    • For 10–20-year archives, the TCO behaves like a perpetual tax.
  • Retrieval + egress penalties
    • You pay every time you actually use your preserved data.
    • Harder to justify new analytics or reuse of old data when the meter is running.
  • Pricing opacity and volatility
    • New tiers, new knobs, “discounts” that add complexity.
    • It’s easy to end up paying for the wrong tier or a misconfigured lifecycle.
  • Lock-in
    • The more archive data you park, the harder it becomes politically and financially to leave.

Tape-backed object store – PROs

  • TCO flattens over time
    • After you absorb CapEx and migration, your run-rate is stable, and the 5/7-year refresh cycles are predictable.
    • The 10–20-year graph looks more like a plateau than an ever-increasing slope.
  • No per-GB retrieval fees
    • You can read as much as you want; the cost is operational, not metered.
    • Incentivizes more reuse of historical data instead of hoarding it in the dark.
  • Control and sovereignty
    • Physical control over media.
    • You decide destruction procedures, air-gapping, and chain of custody.
  • Preservation-friendly
    • Well-designed tape workflows (with checksums, periodic scrubbing, and migration) are still considered among the most robust strategies for long-term digital preservation.

Tape-backed object store – CONs

  • Up-front CapEx and project risk
    • You need budget, design time, and internal champions to execute.
    • Migration from cloud introduces scheduling, bandwidth, and cutover risk.
  • Operational complexity
    • Tape libraries, drives, media management, and monitoring are not “set and forget.”
    • You need people (or managed services) with relevant skills.
  • Physical dependency
    • Libraries and robotics can fail.
    • You must design for spare parts, vendor support windows, and disaster recovery.
  • Scalability and agility trade-offs
    • Scaling beyond the initial design requires deliberate planning and additional CapEx.
    • You don’t get the same “spin up another region” flexibility you have in cloud.

When does repatriation make sense?

From a C-suite / IT management lens, you repatriate preservation data when:

  1. The dataset is large enough
    • Order of magnitude: ≥ 5–10 PB and growing.
    • Below that, the complexity may outweigh the savings unless there are strong non-financial drivers.
  2. The access pattern is low but steady
    • You do touch the data (audits, research, investigations, training sets) but not daily.
    • Enough to be penalized by retrieval/egress, but not enough to justify hot storage.
  3. The retention horizon is long
    • 10+ years is where the cloud rent really starts to look ugly compared to “owning the house.”
    • Regulatory or institutional mandates often live in this band (records, medical, scientific, cultural).
  4. Governance and sovereignty matter
    • You need to prove, with confidence, that:
      • Data hasn’t been altered.
      • Retention and deletion policies are actually enforced.
      • Data is stored in specific jurisdictions under your operational control.
  5. You can staff or source the operational competency
    • Either in-house tape and archive expertise, or a managed service provider who can run the environment with clear SLAs.

If you check most of those boxes, cloud cold storage for preservation becomes less “strategic agility” and more “expensive comfort blanket.”

A practical decision framework for the board and CIO

If you’re trying to decide what to do with a 10 PB archive sitting in Glacier/Coldline/Blob Cool, here’s a simple roadmap you can put in front of the C-suite.

Step 1 – Inventory and segment the archive

  • How much of the 10 PB is:
    • Regulatory / compliance data?
    • High-value institutional memory (R&D, design, media, logs)?
    • Low-value “just in case” junk?
  • Segment by:
    • Retention requirement (years)
    • Access pattern (frequency, type of queries)
    • Business owner and risk tolerance

Step 2 – Build a real baseline of current cloud cost

  • Pull 12–24 months of actual bills.
  • Separate:
    • Storage charges (by tier)
    • Retrieval charges
    • Egress charges
    • Per-request charges
  • Project those costs over 5 / 10 / 20 years using your actual growth assumptions, not vendor marketing examples.

This is where the 10 PB, 2.5% monthly access math above gives you a sanity check: if your model says your TCO will be $800k over 20 years, the model—not the math—is wrong.

Step 3 – Design an on-prem (or hosted) preservation tier

  • Size the tape environment (drives, slots, media) and disk cache for:
    • Current 10 PB + expected growth.
    • Monthly access pattern (enough drives/cache to keep the SLA reasonable).
  • Get real quotes for:
    • Hardware (library, drives, servers, storage)
    • Software (object storage, HSM, preservation workflows)
    • Support contracts
    • Floor space and power

Compare the multi-year CapEx/OpEx profile against your cloud baseline.

Step 4 – Plan the migration and cutover

  • Decide how quickly you want to move 10 PB out:
    • Network only?
    • Cloud provider’s bulk export devices?
    • Hybrid (start with the hottest subsets)?
  • Take into account:
    • Network capacity and impact on other workloads.
    • Order of operations (which buckets/collections first).
    • Parallel validation and fixity checking (hash verification).
    • Temporary storage needed for staging.

This is where the one-time $500k egress is a financial decision: pay now, buy back long-term control.

Step 5 – Embed governance and preservation practices

Regardless of where the data lives, for long-term preservation you need:

  • Clear retention schedules and legal holds.
  • Fixity checks (hash verification) over time.
  • Format migration plans (file formats, codecs, schemas will age).
  • Auditability (who accessed, what was restored, what was changed).

A tape-backed object store can become your institutional memory tier, but only if governance and process are designed along with the hardware.
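
Fixity checking is straightforward to prototype. Here is a minimal sketch: record SHA-256 digests at ingest, then re-verify on a schedule. The manifest layout and names are illustrative, not any specific preservation tool's format:

```python
# Minimal fixity-check sketch: a JSON manifest of SHA-256 digests,
# verified against the files on disk. Illustrative layout only.
import hashlib
import json
import pathlib

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def write_manifest(root, manifest="manifest.json"):
    """Record a digest for every file under root."""
    digests = {str(p): sha256_of(p)
               for p in pathlib.Path(root).rglob("*") if p.is_file()}
    pathlib.Path(manifest).write_text(json.dumps(digests, indent=2))

def verify_manifest(manifest="manifest.json"):
    """Return the paths whose current digest no longer matches."""
    digests = json.loads(pathlib.Path(manifest).read_text())
    return [p for p, d in digests.items() if sha256_of(p) != d]
```

Run write_manifest at ingest and verify_manifest periodically; any path it returns has drifted from its recorded digest and needs repair from a second copy.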

Step 6 – Revisit every 5 years

  • On each tech refresh cycle (5 or 7 years), reassess:
    • Storage media roadmap (LTO generations, new cold media technologies).
    • Cloud pricing changes (cloud isn’t standing still).
    • Regulatory changes (retention, privacy, jurisdiction).
    • Sustainability targets.

The decision you make today is not irrevocable; what matters is not sleepwalking into paying millions in rent for what could have been an owned, well-run archival estate.

The bottom line for the C-suite

For operational workloads, cloud is still a powerful tool.

For 10 PB of long-term preservation data with modest ongoing access, cloud cold tiers look a lot like a lifetime subscription to your own history, with a toll booth every time you try to read it.

A tape-backed object store, properly designed and operated, turns that into:

  • A predictable, flattening TCO curve over 20 years.
  • No per-GB read penalties, encouraging reuse and analysis.
  • Improved sovereignty, governance, and preservation control.

The decision is not “cloud bad, tape good.” It’s:

If you’re holding 10 PB of “cold” data in the cloud today and expecting to keep it there for 10–20 years, you’re not just storing history—you’re renting it.

The question for the C-suite is simple: do you want to keep paying rent on your institutional memory, or own it?
