In every ERP transformation programme I've been part of, there's one workstream that consistently gets underestimated, under-resourced, and overlooked until it's too late: data migration.
It's never the glamorous part of the programme. Nobody gets excited about data cleansing. There are no flashy demos to show the steering committee. But time and again, it's the workstream with the power to derail even the most meticulously planned ERP implementation — and it regularly does.
The irony is that everyone involved in ERP delivery knows this. Yet programme after programme, the same mistakes are repeated. Data migration is treated as a technical exercise to be dealt with later, when in reality it's a strategic activity that should be embedded from day one.
The design-data disconnect
At the heart of most data migration failures is a fundamental disconnect between how the new system is designed and the reality of the data that will feed it.
Process architects design the future state assuming clean, structured, complete data. And why wouldn't they? During the design phase, legacy data stays exactly where it is — in the old system. The new process flows are built in a vacuum, tested with manufactured data, and signed off by stakeholders who've never had to reconcile a decade's worth of inconsistencies in a customer master file.
This creates what I call the design-data disconnect. Process architects and data migration teams work in silos. When a process changes — a new approval step, a different tax calculation, a restructured chart of accounts — it creates a silent requirement for data that might not even exist in the legacy system. Nobody tells the migration team. Nobody updates the mapping specification. The gap sits there, invisible, until the moment the data actually hits the new system.
And that moment is almost always during testing.
The lifecycle of data decay in ERP projects
The way data migration failures unfold follows a depressingly predictable pattern.
The design blind spot
During the design phase, architects build what they believe are perfect processes. The logic is sound, the workflows are elegant, the configuration is thorough. But all of this assumes perfect data. Legacy data — with its duplicate records, inconsistent formats, missing fields, and years of accumulated workarounds — remains hidden in the old system. Nobody is forced to confront it yet.
This is the most dangerous phase, because decisions are being made that will create data requirements nobody has identified. A seemingly simple process change ("let's calculate shipping by warehouse zone") can create a dependency on a data field that doesn't exist in the source system. But because the process team and the data team aren't in the same room, the gap goes undetected.
The testing explosion
During SIT and UAT, migrated data finally meets the new system logic. Clean process design collides with messy legacy reality. The result is a wave of transaction failures — not because the software is broken, but because the data inputs are invalid.
Testing halts. Defect logs fill up. The project team scrambles to work out whether each failure is a software issue, a configuration issue, or a data issue. In my experience, the majority turn out to be data. But by this point, the programme has lost weeks, confidence is eroding, and the go-live date is under pressure.
The trust deficit
Even if the technical issues are resolved, the damage often extends beyond the programme team. When an end user — a warehouse manager, a finance controller, a production planner — sees incorrect data in the new system on day one, trust evaporates.
If a warehouse manager sees zero stock for a product they know is sitting on the shelf, they stop trusting the system's reports entirely. It doesn't matter that the software works perfectly and the processes are well designed. The data told them a lie on day one, and they'll build shadow spreadsheets for months afterwards. Recovering from that trust deficit takes far longer than fixing the data itself.
How process changes create silent data requirements
The design-data disconnect isn't abstract — it manifests in very specific, very predictable ways. Here are three patterns I see repeatedly.
The mandatory field ambush
The process team decides the new ERP must automatically calculate shipping costs based on warehouse zone. It's a sensible process improvement. But your legacy data doesn't have a "zone" field — it only has address text. Nobody tells the migration team they need to derive zone from address, or that zone is now a mandatory field.
During testing, every shipping transaction fails because the system hits a null value where it expects a zone code. The tester logs it as a software defect. The developers investigate. Eventually, someone realises the issue is a missing data attribute that nobody told the migration team to extract. Days lost.
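The cheapest defence against this ambush is a pre-load validation pass that checks newly mandatory fields before records ever reach the target system. A minimal sketch, assuming records are plain dicts; the field names ("zone", "customer_id") are hypothetical, matching the example above:

```python
# Sketch: pre-load validation for newly mandatory fields. The mandatory-field
# list must be maintained in lockstep with the process design, which is exactly
# the traceability discipline most programmes lack.

MANDATORY_FIELDS = ["customer_id", "zone"]  # hypothetical; updated per design change

def validate_mandatory(records, fields=MANDATORY_FIELDS):
    """Split records into loadable rows and rows that would fail in the target."""
    good, bad = [], []
    for rec in records:
        missing = [f for f in fields if not rec.get(f)]
        if missing:
            bad.append({"record": rec, "missing": missing})
        else:
            good.append(rec)
    return good, bad
```

Run against a trial extract, this turns "every shipping transaction fails in SIT" into a defect report produced in minutes, weeks before testing starts.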
Logic drift
To simplify the product catalogue, the business decides to consolidate five product categories into two. The process team updates the configuration. But the data migration team is still mapping legacy codes one-to-five to the old structure, because nobody updated the mapping specification.
When the process is tested, reporting is skewed. Automated workflows don't trigger because the data values don't match the new process logic. The numbers look wrong, but nobody can immediately pinpoint why — because the root cause is a mapping that's two versions behind the process design.
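Logic drift is mechanically detectable if the mapping specification lives in code rather than a document. One way to sketch the check, with illustrative category codes that don't come from any real system:

```python
# Sketch: detect a mapping specification that has drifted behind the process
# design. If the design consolidates categories, any mapping entry pointing at
# a retired target value is flagged immediately rather than discovered in test.

LEGACY_TO_TARGET = {"CAT1": "A", "CAT2": "A", "CAT3": "B", "CAT4": "B", "CAT5": "OLD"}
VALID_TARGET_CATEGORIES = {"A", "B"}  # the consolidated structure from process design

def stale_mappings(mapping, valid_targets):
    """Return legacy codes whose target value no longer exists in the design."""
    return {src: tgt for src, tgt in mapping.items() if tgt not in valid_targets}
```

Wiring a check like this into the build means a mapping that is "two versions behind" fails loudly the day the design changes, not during reporting reconciliation.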
Granularity mismatch
A new finance process requires line-item level tax detail for regulatory compliance. But the legacy system only stored tax at the invoice header level. You cannot decompose a header-level number into line-item detail after the fact — not without significant manual effort or business rules that didn't exist in the old system.
The result: you cannot test the new process because the historical data being migrated isn't granular enough to feed the new engine. The programme faces a choice between delaying go-live, reducing scope, or accepting that historical data won't comply with the new process requirements.
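Where the business does accept a reconstruction rule, one common candidate is proportional allocation of the header amount across lines. This is strictly an illustrative assumption: whether proportional allocation is acceptable for regulatory purposes is a decision for finance and the auditors, not the migration team.

```python
# Sketch: one possible business rule for decomposing header-level tax into
# line-level amounts, allocating in proportion to line value and pushing the
# rounding remainder onto the last line so the lines always sum to the header.

def allocate_header_tax(header_tax, line_values):
    """Spread a header tax amount across lines proportionally to line value."""
    total = sum(line_values)
    allocated = [round(header_tax * v / total, 2) for v in line_values[:-1]]
    allocated.append(round(header_tax - sum(allocated), 2))
    return allocated
```

The remainder-on-last-line trick matters: naive per-line rounding can leave lines that don't reconcile to the header by a cent, which is precisely the kind of discrepancy that destroys user trust on day one.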
When data migration failures make headlines
Data migration problems don't just delay projects — at scale, they can destroy businesses. These cases demonstrate what happens when data issues aren't treated as a strategic programme risk.
Target Canada
One of the most cited examples of data-driven failure led to Target's complete exit from the Canadian market in 2015. Rather than migrating legacy data from its US systems, Target decided to start fresh with manually entered data — keyed in by entry-level employees working under extreme time pressure.
The result was catastrophic. An estimated 70% of the data contained errors. Incorrect product dimensions meant items wouldn't fit on shelves. Wrong manufacturer codes prevented orders from reaching suppliers. The supply chain collapsed: shelves sat empty while warehouses overflowed with the wrong stock. Target Canada filed for bankruptcy less than two years after launch.
The lesson: manual data entry is not a migration strategy. Even "fresh" data needs governance, validation, and quality controls.
Lidl
In 2018, German grocery giant Lidl scrapped a seven-year SAP implementation after spending an estimated €500 million. The root cause was a fundamental data model mismatch: Lidl traditionally valued inventory at purchase prices, while the standard SAP retail module used retail prices.
Rather than restructuring its data to fit the software — or configuring the software to match its data — Lidl attempted to force both sides through excessive customisation. The system became too brittle to maintain or roll out, and the project was eventually abandoned entirely.
The lesson: data model alignment must be resolved in the design phase, not during implementation. If the fundamental data structures don't match, no amount of customisation will paper over the gap.
Revlon
After acquiring Elizabeth Arden, Revlon attempted to consolidate both companies onto a new SAP S/4HANA system in 2018. The integration of historical data from two separate legacy systems proved far more complex than anticipated.
On go-live, the system could not accurately record or locate inventory at a key manufacturing plant. Revlon was unable to fulfil an estimated $64 million in net sales because it had effectively "lost" visibility of its own stock. The failure triggered investor lawsuits and a significant drop in the company's stock price.

The lesson: merging data from multiple legacy systems requires dedicated integration architecture, not just parallel migration streams. The complexity is multiplicative, not additive.
National Grid
In 2012, National Grid launched a new SAP system that had been tested using standard "happy path" scenarios. When Hurricane Sandy struck the US East Coast shortly after go-live, the system was immediately overwhelmed by complex data scenarios it had never been tested against — emergency pay calculations, irregular shift patterns, and vendor reimbursement logic.
The system miscalculated pay rates across the organisation, leaving some employees significantly underpaid and others overpaid. A backlog of 15,000 unpaid supplier invoices accumulated within two months. The total cost of fixing the botched rollout reportedly approached $1 billion.
The lesson: data migration testing must include stress scenarios and edge cases, not just clean data flows. The real world doesn't send clean data.
The common threads
Across every one of these failures — and the hundreds of less publicised ones that happen every year — the same patterns emerge.
Manual entry is not a strategy
Target Canada proved that humans keying data under pressure will always introduce errors at scale. Automated extraction, transformation, and validation are non-negotiable.
Data model alignment is a design decision
Lidl demonstrated that you cannot resolve a fundamental structural mismatch through customisation. The data model conversation must happen during process design.
Multiple sources multiply complexity
Revlon showed that merging data from separate systems requires its own architecture and governance. Each additional source doesn't add complexity — it multiplies it.
Testing must reflect reality
National Grid proved that "happy path" testing with clean data gives false confidence. Migration testing must include edge cases, historical anomalies, and stress scenarios.
Trial migrations: the dress rehearsals that save your go-live
Think of an ERP migration like a massive theatrical performance. You wouldn't show up on opening night and hope the actors know their lines — you'd run multiple dress rehearsals.
In ERP terms, these rehearsals are trial migrations: full-scale mock go-lives where you move data from the old system to the new one. Doing this early and often is vital, and yet a surprising number of programmes leave their first real migration attempt until a few weeks before cutover.
Finding the data landmines
You don't know what you don't know. A trial migration reveals that a phone number field in your legacy system allows 50 characters, but the new ERP only accepts 15. It surfaces the customer records with missing postcodes, the supplier masters with duplicate tax codes, and the inventory balances that don't reconcile. Find these during a trial in month four and you have time to fix them. Find them during the final go-live and you're in crisis mode.
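Many of these landmines can be found by a simple profiling pass before the first trial. A minimal sketch, assuming records are dicts and the target field widths are known; the limits shown (phone: 15) mirror the example above and are hypothetical:

```python
# Sketch: profile a legacy extract against target schema limits, so truncation
# failures surface in a report rather than as load errors during a trial run.

TARGET_LIMITS = {"phone": 15, "postcode": 10}  # hypothetical target field widths

def oversized_values(records, limits=TARGET_LIMITS):
    """Yield (row_index, field, value) for any value exceeding its target width."""
    for i, rec in enumerate(records):
        for field, max_len in limits.items():
            value = rec.get(field) or ""
            if len(value) > max_len:
                yield (i, field, value)
```

The same pattern extends naturally to missing mandatory fields, invalid code values, and unparseable dates: one profiling report per trial, trending toward zero findings as go-live approaches.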
Timing the cutover window
Moving millions of rows of data takes time — sometimes days. The business needs to know exactly how long the "blackout period" will be: the window between shutting down the old system and going live on the new one. Regular trial migrations are the only reliable way to measure this. Each trial refines the timing, identifies bottlenecks in the load sequence, and builds confidence in the cutover plan. Without this, your go-live timeline is a guess.
Proving the process design
This is where trial migrations connect directly to the design-data disconnect described earlier. A trial migration is the only genuine test of whether the new processes can actually run using real migrated data. If a process change has created a silent data requirement — a new mandatory field, a changed category structure, a different level of granularity — the trial migration will expose it. And it will expose it in a controlled environment where you have time to respond, rather than on go-live day when you don't.
The golden rule
Automate the migration logic early, run trials regularly, and iterate. Every trial should make the process smarter. By the time you reach the final go-live migration, it should be a non-event — because every error, every edge case, every data anomaly was found and resolved months ago. If your final migration is the most stressful day of the programme, you haven't rehearsed enough.
The ETL engine: your migration's backbone
You cannot simply copy and paste data into a modern ERP. Legacy data is messy, fragmented, stored across multiple systems, and structured in ways that bear no resemblance to the target. You need a formal ETL (Extract, Transform, Load) process to act as the filter and translator between old and new.
Extract
Pull data from legacy systems — which often means extracting from multiple databases, flat files, spreadsheets, and occasionally systems that haven't been documented in a decade. The extraction layer needs to handle all of these sources and produce a consistent, auditable output.
Transform — where the real work happens
This is the step that separates a professional migration from a disaster. Transformation is where raw legacy data is cleansed, restructured, and enriched to meet the requirements of the new system.
Cleansing
Fixing the inconsistencies that have accumulated over years: standardising formats, correcting typos, resolving duplicates, and applying consistent naming conventions. "Co." becomes "Company." "St" becomes "Street." Three different spellings of the same supplier become one.
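Cleansing rules like these belong in code or configuration, not in a one-off spreadsheet, so that every trial run applies them identically. A minimal sketch using simple substitution rules; the rules themselves would come from the domain experts:

```python
# Sketch: rule-driven cleansing. Each rule is a (pattern, replacement) pair,
# applied in order; adding a rule once fixes the issue in every subsequent run.
import re

CLEANSING_RULES = [
    (r"\bCo\.?$", "Company"),   # trailing "Co." / "Co" becomes "Company"
    (r"\bSt\b", "Street"),      # standalone "St" becomes "Street"
    (r"\s{2,}", " "),           # collapse repeated whitespace
]

def cleanse(value):
    value = value.strip()
    for pattern, replacement in CLEANSING_RULES:
        value = re.sub(pattern, replacement, value)
    return value
```

Real rule sets grow to hundreds of entries, which is exactly why they must be versioned and replayable rather than applied by hand.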
Mapping
Translating old codes and structures into new ones. Legacy department "101" becomes "Finance." This mapping must be maintained in lockstep with the process design — every time the target structure changes, the mapping must be updated.
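A mapping step should fail loudly on any legacy code with no agreed target, so an unmapped value surfaces as a defect instead of loading silently as a blank or a default. A sketch, with illustrative codes:

```python
# Sketch: apply a code mapping and raise on unmapped values. The raised error
# points straight at the fix: update the mapping specification, not the code.

DEPARTMENT_MAP = {"101": "Finance", "102": "Procurement"}  # illustrative

def map_code(legacy_code, mapping=DEPARTMENT_MAP):
    """Translate a legacy code to its target value, refusing to guess."""
    try:
        return mapping[legacy_code]
    except KeyError:
        raise ValueError(
            f"No target mapping for legacy code {legacy_code!r}; "
            "update the mapping specification"
        ) from None
```

Refusing to guess is a deliberate design choice: a silent default is how logic drift hides until testing.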
Enrichment
Adding data the new system requires but the legacy system never captured. The new mandatory "warehouse zone" field. The line-item tax detail. The supplier risk rating. This is often the most time-consuming step, because you're creating data that has never existed.
Load
Push the cleansed, transformed data into the new ERP in the correct sequence. Sequencing matters: you can't load an invoice before you've loaded the customer it belongs to. You can't load a stock transaction before the product master exists. The load sequence must respect the referential integrity of the target system, and it must handle failures gracefully — identifying which records failed, why, and allowing them to be corrected and reloaded without repeating the entire batch.
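The sequencing and graceful-failure requirements can be sketched as an explicitly ordered load runner. Object names and loader functions here are placeholders; a real engine would derive the order from the target system's dependency graph:

```python
# Sketch: run loads in dependency order, collecting rejected records per object
# without halting the whole batch, so failures can be corrected and reloaded
# without repeating everything that already succeeded.

LOAD_SEQUENCE = ["customers", "products", "open_invoices", "stock_transactions"]

def run_load(loaders, data, sequence=LOAD_SEQUENCE):
    """Each loader takes a list of records and returns the records it rejected."""
    failures = {}
    for obj in sequence:
        rejected = loaders[obj](data.get(obj, []))
        if rejected:
            failures[obj] = rejected  # candidates for fix-and-reload
    return failures
```

Note that customers load before invoices and products before stock transactions, matching the referential-integrity constraints described above.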
Why manual migration fails
Without a repeatable, automated ETL process, you're relying on spreadsheets and manual uploads. This is dangerous for two reasons.
First, it's not repeatable. If someone fixes a data issue manually in a spreadsheet during trial migration three, that fix doesn't carry forward. They'll need to remember to apply it again for trial four, and again for the final go-live. In a migration involving millions of records, this is untenable.
Second, it lacks an audit trail. When something goes wrong — and it will — you need to know exactly what transformation was applied, when, by whom, and why. A spreadsheet passed between three people's inboxes doesn't give you that.
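An audit trail falls out naturally once transformations go through a single chokepoint. A minimal in-memory sketch; a real engine would persist the log to a database and record the run identifier and operator as well:

```python
# Sketch: record every transformation applied to a record, so each change is
# traceable to a named rule and a timestamp. Kept in memory for illustration.
import datetime

audit_log = []

def apply_rule(record, field, rule_name, fn):
    """Apply fn to record[field], logging old value, new value, rule and time."""
    old = record.get(field)
    new = fn(old)
    audit_log.append({
        "field": field, "rule": rule_name, "old": old, "new": new,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    record[field] = new
    return record
```

When a number looks wrong during reconciliation, the log answers "what was applied, when, and under which rule" in one query instead of an inbox archaeology exercise.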
The lesson from Target Canada applies here directly: manual data entry is not a migration strategy. Automated, repeatable, auditable ETL is not optional — it's foundational.
The expertise gap
Even when organisations recognise that data migration is critical, they often lack the right people to execute it properly. The skills required span two worlds that rarely overlap — and most organisations have people in one camp or the other, but not both.
The domain experts
Most businesses have people who understand the data intimately. The finance controller who knows why certain GL codes were used as workarounds. The warehouse manager who can explain the logic behind a product categorisation that makes no sense on paper. The procurement lead who knows which supplier records are duplicates and which are genuinely separate entities trading under similar names.
These people are invaluable. They carry years of institutional knowledge about what the data actually means, where the bodies are buried, and which records can be trusted. But they're not data engineers. Their instinct when asked to "sort out the data" is to open a spreadsheet — and we've already discussed why that doesn't scale. They can define the rules, but they can't operationalise them into a repeatable, automated migration process.
The technical staff
On the other side, some organisations have technical staff who understand databases, SQL, scripting, and ETL tooling. They can build extraction routines, write transformation logic, and automate load sequences. They have the skills to build the migration engine.
But they lack the domain context to know what the transformations should be. They don't know that department "101" should map to "Finance" rather than "Administration." They weren't in the workshop where the business decided to consolidate five product categories into two. They don't understand why a particular customer record has three slightly different addresses, or which one is correct. They're often on the fringes of process design discussions — informed after the fact, if at all — and left to interpret mapping documents that are incomplete, ambiguous, or already out of date.
The gap in the middle
The result is a capability gap at the exact point where domain knowledge and technical execution need to converge. Domain experts define rules in spreadsheets that can't be operationalised. Technical staff build pipelines based on mapping documents that don't reflect the latest process decisions. Neither group has full visibility of the other's constraints, and the migration workstream falls into the space between them.
This gap is self-reinforcing. Because domain experts can't see how their rules translate into ETL logic, they don't appreciate the impact of a late change to a mapping specification. Because technical staff aren't embedded in process design, they can't anticipate which data requirements are about to shift. Each side assumes the other has it covered, and the migration workstream drifts quietly out of alignment with the programme.
This is why data migration benefits from a dedicated team that bridges both worlds — people who can sit in process design workshops, understand the business implications of a change to the chart of accounts, and then translate that into an updated ETL specification the same day. That combination of skills is rare internally, and it's one of the reasons organisations consistently underestimate the effort, complexity, and specialist capability that data migration demands.
What good looks like
The organisations that handle data migration well do a few things differently:
They start early
Data profiling and quality assessment begin during the design phase — not after it. The migration team is in the room when process decisions are being made.
They treat data as a first-class workstream
Data migration has its own governance, plan, dedicated resources, and steering representation. It's not a sub-task of the technical team.
They establish traceability
Every process change triggers a review of the data mapping specification. When a new field becomes mandatory, the migration team knows the same day.
They bridge the expertise gap
Domain knowledge and technical migration capability sit in the same team — not in separate silos. The people building the ETL engine understand the data.
They build a robust ETL engine early
Automated extraction, transformation, and loading — repeatable, auditable, and maintained alongside the process design. Not spreadsheets.
They run trial migrations relentlessly
Not one mock go-live at the end — multiple iterations throughout, each refining the data, the process, and the timing.
They plan for trust
Day-one data quality determines whether end users trust the new system. They invest in data stewardship, reconciliation, and rapid-response teams.
The bottom line
Data migration is not a technical exercise. It's a programme-level strategic risk that needs to be governed, funded, and managed with the same rigour as process design, change management, and system integration.
If your programme treats data as someone else's problem — as a task to be squeezed into the final weeks before go-live — you are building on foundations that will crack under pressure. The examples above prove that the cost of getting it wrong isn't a few weeks of rework. It's failed go-lives, lost revenue, regulatory exposure, and in extreme cases, business failure.
The silent killer earns its name precisely because nobody hears it coming until it's too late.
