
Legacy Migration Starts with Understanding, not Inventory

An alternative to scale and direct legacy migration activities


The Default Playbook

Legacy migration has an almost universal playbook:

  • Step 1 - Asset Discovery: Export all assets in the environment to see what there is to migrate. The output is usually a massive spreadsheet or dashboard. Leadership sees the number and it becomes the anchor: we have X thousand resources to migrate.

  • Step 2 - Categorize: Tag the assets by domain, owner, and the type of migration they will likely require. This is usually done manually.

  • Step 3 - Assign: Now that we know what we have, and who owns it, it's time to assign each slice to a team and say: These are yours. Assess them, decide what to do with each one, and report back.

  • Step 4 - Teams Investigate and Act: Each team is expected to review their assigned assets, determine what's still needed, plan the migration, rewrite what's valuable, and flag the rest for deletion.

  • Step 5 - Track and Report Progress: A program manager tracks completion rates. Dashboards show how many assets have been categorized, how many migrated, how many deleted, and the progress is measured as a percentage of the original inventory.

The Hidden Assumption

Every step in this playbook sounds reasonable on its own. Together, they rest on assumptions that rarely hold:

Every asset has an owner

The plan assumes that for each resource, someone in the organization knows what it is, why it exists, and whether it still matters. In practice, organic environments accumulate resources that outlive the teams or individuals who created them. The owner left, the project ended, but the assets are still running and still being billed.

Assets map to products or business domains

The plan assumes you can draw lines from resources to business capabilities:

  • these tables belong to marketing analytics.

  • those belong to finance reporting.

  • ...

In reality, only a small percentage of assets map cleanly to such groups. Most are organized around pipelines, ad-hoc SQL, and intermediate computation steps, not around products.

Inventory equals understanding

This is the deepest assumption. Listing everything feels like progress. But any environment painful enough (in cost or maintenance) to justify migration has accumulated significant scale. The inventory will contain hundreds of thousands of assets with no inherent grouping. It feels like the first logical step. It's actually the first illusion of control.

Teams can self-serve classification

The plan assumes you can hand teams a filtered list and they'll sort it out: keep, migrate, delete. But that requires two things:

  1. Teams actually recognize the assets, which usually means reverse-engineering their history.

  2. The rest of the organization, dependent on those resources, holds still. But what if they change their process during the migration, and with it, their requirements?

Effort scales with volume

The intuition is: twice as many assets, twice as much work. But in reality, effort scales with ambiguity. A thousand well-structured, well-known assets might be easier to migrate than a handful of orphaned, unnamed ones. The cost is in the investigation and understanding, not the count.

What You Actually Find

I recently faced this exact situation, and the reality was eye-opening. A data warehouse that had accumulated tables, data pipelines, and scheduled queries over the past 10 years. The tables alone numbered over half a million.

No reasonable grouping would make that number workable. Not even if every technical team member focused solely on migration and parked all other work.

So I asked a different question: do we really need to migrate all of this? Asking teams wasn't an option, for all the reasons above. But I could observe the environment directly. Are any of these tables actually being read?

And the result was surprising. Over 99% of the tables hadn't been accessed in the past 90 days. A large portion had never been accessed at all. Of those that were accessed, only a small percentage showed consistent, ongoing usage.
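The liveness check behind these numbers can be sketched in a few lines. A minimal sketch, assuming access logs are available as (table, read timestamp) pairs; real warehouses expose this differently (audit logs, `INFORMATION_SCHEMA` views, etc.), and the function name and 90-day threshold are illustrative choices, not a specific tool's API:

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)
NOW = datetime(2024, 6, 1, tzinfo=timezone.utc)  # fixed "now" for the example

def partition_by_usage(tables, access_log, now=NOW):
    """Split the inventory into never-read, stale, and active tables.

    tables: iterable of table names (the full inventory)
    access_log: iterable of (table_name, read_timestamp) pairs
    """
    # Reduce the log to each table's most recent read.
    last_read = {}
    for name, ts in access_log:
        if name not in last_read or ts > last_read[name]:
            last_read[name] = ts

    never, stale, active = [], [], []
    for name in tables:
        ts = last_read.get(name)
        if ts is None:
            never.append(name)          # no read on record at all
        elif now - ts > STALE_AFTER:
            stale.append(name)          # last read more than 90 days ago
        else:
            active.append(name)         # genuinely in use
    return never, stale, active
```

The point of the sketch is that no human judgment is involved: one pass over the access log classifies the entire inventory, and only the `active` bucket needs a person to look at it.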

We stopped asking teams to review the full inventory. The migration was not about moving half a million assets to a new platform. It was about finding the ones that were still alive.

Why Understanding IS the Migration

The default playbook frames migration as logistics: X thousand assets need to move from A to B, and progress is the percentage that have moved. This treats every asset as a unit of work. But the real unit of work is not the asset. It is the decision: does this asset still matter?

That reframing changes the shape of the project. A logistics project scales with volume: twice the assets, twice the effort. A decision project scales with ambiguity. And ambiguity is not evenly distributed. In our case, one query against 90 days of usage data made the decision for 99% of the environment. No team meetings, no reverse-engineering, no spreadsheets. The remaining 1% was the only part that needed human judgment at all.

This is why inventory-first feels right but leads to pointless activities. An inventory gives leadership a number they can put on a slide: 500,000 assets to migrate. But that number tells you nothing about how many decisions you actually face. It treats an orphaned table untouched for five years the same as a pipeline feeding a daily business report. Usage data makes them obviously different. The inventory makes them equal.

The default playbook also misidentifies what a legacy environment is. It assumes a system: something designed, something you can pick up and relocate. But a ten-year-old data warehouse is not a system. It is an accumulation: layers of decisions made by people who are no longer around, solving problems that may no longer exist. You do not relocate an accumulation. You find what is still alive inside it and build forward from there. Martin Fowler describes the general pattern as a Strangler Fig: the new system grows around the old one while the old one atrophies. But in a mature legacy environment, most of the atrophy has already happened. The tables are already dead. The pipelines already stopped. You just haven't confirmed it yet.

Understanding the environment does not prepare you for the migration. It is the migration. Once you know what is alive, the rest is cleanup.

An Alternative: Quarantine Before You Classify

The alternative requires a different starting point: let usage tell you what matters, instead of asking teams to figure it out from a spreadsheet.

Start with what you can act on immediately. Usage data (who accessed what, and when) separates the living parts of the environment from the dead ones, without requiring anyone to understand what each asset is.

Once you have that separation, introduce a quarantine window. If a resource hasn't been accessed for a defined period (say 90 days), block access to it and notify the teams. If nobody requests restoration within another 90 days, back it up and delete it. If someone does need it, you've just identified a genuinely valuable asset: route it to the appropriate team and start a proper migration lifecycle for it.
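The quarantine window described above is essentially a small state machine. A sketch under the same assumptions (90-day thresholds, a restore-request signal); the state names and `next_state` function are illustrative, not an existing library:

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    QUARANTINED = "quarantined"  # access blocked, owning teams notified
    DELETED = "deleted"          # backed up first, then removed

QUARANTINE_AFTER = timedelta(days=90)  # unused this long -> block access
DELETE_AFTER = timedelta(days=90)      # quarantined this long -> back up and delete

def next_state(state, last_access, quarantined_at, restore_requested, now):
    """Advance one asset through the quarantine lifecycle."""
    if state is State.ACTIVE and now - last_access >= QUARANTINE_AFTER:
        return State.QUARANTINED
    if state is State.QUARANTINED:
        if restore_requested:
            # Someone needed it: a genuinely valuable asset.
            # Route it to the owning team and start a proper migration.
            return State.ACTIVE
        if now - quarantined_at >= DELETE_AFTER:
            return State.DELETED
    return state
```

Run periodically (say, daily) over every asset, this loop is the whole triage process: assets that matter raise their hand via a restore request, and everything else ages out on its own.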

The important assets identify themselves. Instead of teams sifting through hundreds of thousands of items, the environment surfaces its own priorities through actual usage. The noise falls away on its own.

Postpone permanent deletion as long as practically possible. Data loss is the one mistake you can't reverse. Quarantine simulates deletion before you commit to it, and catches seasonal patterns that a 90-day snapshot would miss.

With this model, you can report meaningful progress monthly: deadwood percentage, reduction in active resources, cleanup projections. Metrics an executive can act on, without requiring teams to manually work through irrelevant assets.
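Those monthly numbers fall out of the lifecycle states directly. A minimal sketch, assuming assets are tracked as a mapping of name to lifecycle state; the function and field names are illustrative:

```python
def progress_report(states):
    """Monthly snapshot from a mapping of asset name -> lifecycle state string."""
    total = len(states)
    counts = {}
    for s in states.values():
        counts[s] = counts.get(s, 0) + 1

    active = counts.get("active", 0)
    deadwood = total - active  # everything quarantined or already deleted
    return {
        "total": total,
        "active": active,
        "deadwood_pct": round(100 * deadwood / total, 1) if total else 0.0,
        "deleted": counts.get("deleted", 0),
    }
```

Note what the report does not contain: a percentage of the original inventory "migrated". Progress is measured as shrinkage of the decision space, not movement of assets.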

The Question to Ask About Your Own Environment

Before committing to the default playbook, ask one question about your legacy environment: what percentage of your assets have been accessed in the last 90 days?

If the answer is low, and it may be far lower than you expect, you are not facing an inventory problem. You are facing an archaeology problem. Archaeology does not start with a catalog. It starts with a question: what here is still alive?