The generator that mostly works

On the political economy of infrastructure too functional to fix and too degraded to last

The Shades Community Generator dims for the third time this season. It dims, recovers, dims again, and then holds at about three-quarters of its usual output for the rest of the evening. The lights in the counting houses flicker. The aldermen of the Merchant Quarter, whose own district is connected to a different substation, note the incident with mild irritation and carry on. The generator is still running. It ran last month. It will probably run next month. The maintenance visit last quarter found some wear, noted it, and recommended a part replacement. The recommendation went on the list.

The list is long. Everything on it is for infrastructure that is still running. At the top of the list is the Dolly Sisters pumping station, which has been cycling with a fault in its control mechanism for eleven months. Below it is the Seamstresses’ Quarter transformer, which runs hot and has been doing so since summer. Below that is the Shades generator. The Patrician looks at the list each month and looks at the budget beside it and defers everything on it, because everything on it is, in the most technically accurate sense available, still working.

The Shades generator fails completely in February. It fails during the coldest week of the year. The emergency repair contract, signed under duress, costs four times what the scheduled part replacement would have cost. The generator stays down until the replacement parts arrive, because they have to be sourced from a supplier in Quirm, and the winter road is slow. The Shades is on candles for eleven days. By the time the generator is restored, the disruption has produced two secondary failures in the distribution network that had been running under additional strain to compensate, and the repair budget for the quarter is exhausted.

The part replacement was on the list for nine months.

The threshold problem

The neglect spiral is a well-known mechanism: deferred maintenance makes infrastructure more likely to fail, and failures make deferral more tempting by depleting the budget and political capacity that could have funded prevention. But there is a specific feature of the spiral that receives less attention than the spiral itself. It is a timing problem, and it operates in the relationship between infrastructure condition and political signal.

Infrastructure generates political will for investment when it fails visibly. Visible failure creates urgency, accountability pressure, public attention, and the emergency contract budgets that preventive investment never managed to unlock. But the failure that is politically visible is already the expensive failure. It is the failure that has crossed the threshold from degradation into collapse, from manageable into emergency. By the time the political signal arrives, the cheap prevention window has closed.

The loop runs precisely because the political feedback mechanism is calibrated to emergency, not to degradation. Infrastructure that is degrading but still functional sends a weak signal. Infrastructure that has failed sends an overwhelming one. The difference in signal strength between these two states is much larger than the difference in actual condition. A generator at seventy per cent capacity and declining produces almost no political signal. The same generator at zero capacity produces immediate emergency mobilisation. The transition between these states can happen in a single cold evening, but the political response behaves as though the previous state, the functional-but-degrading state, had been fine.

This means that every budget cycle in which functional-but-degrading infrastructure competes against visibly failing infrastructure will produce the same result. The visibly failing infrastructure gets the resource. The functional-but-degrading infrastructure gets the list. Each month on the list moves the infrastructure closer to the threshold. When it crosses the threshold, it generates the political signal that the list could never produce. It is then repaired, expensively, under duress, while the things it was connected to absorb the consequences.
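The dynamic can be made concrete with a toy simulation. What follows is a minimal sketch under invented assumptions, not a model of any real network: condition is counted in whole steps, every asset loses one step per budget cycle, a preventive visit costs 1 unit, and an emergency repair costs 4 (the multiplier from the Shades contract). The names `simulate`, `FULL`, `WARN` and the starting conditions are all hypothetical.

```python
FULL = 6             # condition of a freshly repaired asset (assumed scale)
WARN = 2             # condition at which an engineer would recommend action
PREVENTIVE_COST = 1  # scheduled visit (assumed unit cost)
EMERGENCY_COST = 4   # same work under duress, four times the price


def simulate(initial, cycles, budget, preventive):
    """Run budget cycles; return (total spent, number of complete failures).

    initial    -- starting condition of each asset, in whole steps
    budget     -- units available per cycle
    preventive -- if False, only failed assets ever receive money
    """
    cond = list(initial)
    spent = 0
    failures = 0
    for _ in range(cycles):
        remaining = budget
        # Every asset wears by one step per cycle; zero means failed.
        cond = [max(c - 1, 0) for c in cond]
        # Failed assets claim the budget first: their signal is overwhelming.
        for i, c in enumerate(cond):
            if c == 0 and remaining >= EMERGENCY_COST:
                cond[i] = FULL
                remaining -= EMERGENCY_COST
                spent += EMERGENCY_COST
                failures += 1
        # Degrading assets are served only under the preventive policy.
        if preventive:
            for i in sorted(range(len(cond)), key=lambda i: cond[i]):
                if 0 < cond[i] <= WARN and remaining >= PREVENTIVE_COST:
                    cond[i] = FULL
                    remaining -= PREVENTIVE_COST
                    spent += PREVENTIVE_COST
    return spent, failures


assets = [3, 5, 6]  # hypothetical starting conditions for three assets
reactive = simulate(assets, cycles=12, budget=4, preventive=False)
planned = simulate(assets, cycles=12, budget=4, preventive=True)
print("reactive:", reactive)
print("planned: ", planned)
```

Under these made-up numbers the reactive policy pays for six emergency repairs over twelve cycles, while the preventive policy spends less than half as much and records no complete failures. The figures are arbitrary; the ordering is the point.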

Individual incidents look like anomalies

There is a psychological dimension to the mechanism that reinforces the timing problem. A system that is degrading does not announce its condition. It produces occasional anomalies: a fault that resolves itself, a lower output for a few hours, a maintenance visit that notes wear and recommends action. Each individual anomaly has a plausible explanation that does not require interpreting it as a signal of structural decline. The generator ran hot because it was an unusually warm August. The transformer cycled strangely because of a spike in the distribution network. The pump fault resolved because the maintenance team tightened a fitting.

These explanations are often accurate for the individual incident. What they cannot capture is the pattern: the fault rate is rising, the recovery time after each fault is increasing, the maintenance visits are finding progressively more wear each time. The signal that the system is approaching failure is a statistical pattern, visible in data that is distributed across multiple maintenance records, multiple incident logs, and multiple quarters of budget reports. It requires someone to hold all of this together and assess it as a trend, rather than treating each incident as a discrete event.
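Holding the records together need not be elaborate. As a minimal sketch, assuming the fault counts and recovery times have already been pulled out of the maintenance records and incident logs into one series per quarter, a least-squares slope is enough to show whether the trend is rising. The function names and the quarterly figures below are hypothetical, chosen only for illustration.

```python
from statistics import mean


def trend_slope(values):
    """Least-squares slope of values against their position (change per period)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to estimate a trend")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def is_deteriorating(faults, recovery_hours):
    """True when both the fault count and the recovery time are trending upward."""
    return trend_slope(faults) > 0 and trend_slope(recovery_hours) > 0


# Hypothetical quarterly figures for one generator, oldest first.
faults_per_quarter = [1, 1, 2, 2, 3, 4]
recovery_hours = [0.5, 0.5, 1.0, 1.5, 2.0, 3.5]

print(is_deteriorating(faults_per_quarter, recovery_hours))
```

No single quarter in these series looks alarming on its own. The slope across all six is what carries the signal, which is why it has to be computed over the whole record rather than read off the latest incident.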

In most operational environments, nobody does this. Maintenance records are filed. Incident logs are reviewed in the immediate aftermath of each incident and then archived. Budget reports are produced quarterly and compared to the previous quarter, not to the fault trend over three years. The information that the generator is deteriorating is available in principle. In practice, it exists in a form that makes the deterioration invisible to anyone making resource allocation decisions.

The engineer who carries out the maintenance visit knows. Engineers usually know. The engineer writes it on the list and moves on to the next site.

France, Winter 2022

The French nuclear fleet had been the foundation of European power export for decades. France operated more than fifty reactors, generated roughly seventy per cent of its electricity from nuclear, and was reliably a net exporter of power to its neighbours. The infrastructure was ageing, certainly. The reactor fleet had an average age that had been rising for years. Maintenance had been deferred in some cases, scheduled in others, and the backlog of required work had been growing.

Each individual deferral was defensible. A reactor taken offline for maintenance costs money, both in the maintenance work itself and in replacement power purchased from the grid while it is unavailable. With power prices rising across Europe, the financial incentive to keep reactors generating rather than offline for maintenance was strong. EDF’s financial position made each additional deferral attractive relative to the immediate alternative.

In 2022, the aggregate arrived. A combination of scheduled maintenance already overdue, the discovery of stress corrosion cracking in cooling circuit pipes at multiple sites, and the deferred works that had accumulated across the fleet meant that by winter, France had approximately half of its nuclear capacity offline simultaneously. A country that had been exporting power to the rest of Europe became a net importer. Emergency measures were announced. Citizens and businesses were asked to reduce consumption. The government that had been operating on the assumption that the fleet was functional-but-ageing found itself managing a power security emergency in a country whose energy infrastructure was not expected to be a vulnerability.

EDF was eventually renationalised at a cost of approximately ten billion euros, a sum that had been politically unavailable through years of incremental deterioration and that materialised within weeks once the aggregate cost of the deferrals became impossible to absorb. The political will that could not be found for preventive investment appeared immediately for emergency rescue.

The Shades generator, at continental scale.

Italy, September 2003

The largest blackout in Italian history left fifty-six million people without power for up to eighteen hours. The proximate cause was a high-voltage line in Switzerland that tripped after contact with a tree, triggering a cascade through the interconnected European grid that Italy’s domestic system could not absorb. But Italy’s inability to absorb the cascade was not an accident of geography. It was the predictable outcome of two decades of deferred domestic generation investment.

Italy had been chronically dependent on imported power from France and Switzerland since the early 1980s. The political and regulatory environment following the 1987 post-Chernobyl referendum, which resulted in the closure of Italy’s nuclear plants, had produced a generation system structurally reliant on cross-border transmission rather than domestic capacity. The imported power was reliable, it was competitively priced, and it mostly worked. Engineers in the transmission system had been modelling the dependency risk for years. The political case for domestic generation investment had been made in technical forums and had not, consistently, been made successfully in budget forums, because the imported power continued to arrive.

Until the night it didn’t, simultaneously, for every household and business in the country, because a single external event found a system with no margin to absorb it. The cascade ran through infrastructure that had been technically functional-but-dependent for twenty years. The failure took eighteen hours. The dependency had been accumulating since before many of the people affected were born.

Bucharest’s heating network

Under Nicolae Ceaușescu, Romania built one of the largest district heating systems in Europe. Bucharest’s RADET network served the overwhelming majority of the city’s flats with centrally generated heat, distributed through buried pipes across the urban area. The system was designed for a planned economy in which maintenance costs were political decisions and infrastructure was permanent.

The economy changed. The maintenance infrastructure did not keep pace. By the early 2000s, the RADET network was losing a substantial fraction of its generated heat to pipe leakage before it reached consumers. Residents in poorly served districts received insufficient heat through winters; the system operated with chronic pressure instability; localised failures were frequent enough to be a regular feature of winter life in affected neighbourhoods.

And yet the network mostly worked. Heat arrived most of the time. The political case for comprehensive replacement, which would require enormous capital investment and significant disruption to the residents dependent on the system during transition, competed against the observation that the system, despite its problems, was delivering something. Each winter that passed without a complete collapse was evidence that it remained functional. Each winter that passed without investment left it marginally worse than it had been the previous year.

The city of Bucharest has been managing the RADET network in its degraded state for three decades. The system has cycled through emergency interventions and partial upgrades. It has never received the comprehensive investment that would have been cheaper, in aggregate, than three decades of emergency maintenance, energy waste, and the costs imposed on residents by unreliable supply. It has never generated the political signal, a single catastrophic failure of the whole system at once, that would have unlocked that investment. It has been, throughout, mostly working.

The budget that cannot see the future

The timing problem in the neglect spiral has a structural counterpart in how infrastructure budgets are constructed and defended. Preventive maintenance expenditure is discretionary: the infrastructure is running, the maintenance is scheduled rather than urgent, and in any budget competition it will lose to expenditure on things that are visibly and urgently failing. Reactive maintenance expenditure, by contrast, is mandatory: the infrastructure has failed, services are disrupted, there is no alternative to spending.

The cost difference between these two categories, for most infrastructure types, is substantial. Emergency engineering contracts carry premiums that scheduled work does not. Parts sourced under urgency cost more than parts ordered in advance. Failures in one system generate secondary failures in connected systems that would not have occurred if the original failure had been prevented. The social and economic costs of disruption, which preventive maintenance avoids, are not borne by the maintenance budget at all: they fall on the businesses and households that experience the failure.

A budget that can see only the costs it directly controls will always find that preventive maintenance is expensive and deferrable. A budget that can see the total cost of each deferral, including the emergency contract premium, the secondary failure costs, and the economic disruption cost, would reach a different conclusion about which expenditure is discretionary. The information required to make that second calculation exists, in principle, in the engineering records, the incident logs, and the economic impact assessments that follow significant failures. It is rarely assembled into a form that speaks to budget decisions. The people who understand the condition of the infrastructure are engineers. The people making the budget decisions are not, in most cases, engineers. The translation between technical condition and financial risk is a function that most organisations perform badly if at all.
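The two views of the same deferral can be put side by side. This is a rough sketch, assuming the failure probability and the downstream costs can be estimated at all, which is itself the hard part. The `Deferral` class, its field names, and every number in the example are hypothetical. The narrow view is assumed to count only the emergency contract, since that is the one cost the maintenance budget is certain to pay.

```python
from dataclasses import dataclass


@dataclass
class Deferral:
    """One deferred maintenance item. All fields are illustrative assumptions."""
    preventive_cost: float         # scheduled work, paid now
    emergency_multiplier: float    # premium on the same work done under duress
    secondary_failure_cost: float  # damage to connected systems
    disruption_cost: float         # borne by households and businesses
    failure_probability: float     # chance of failure this period if deferred

    def cost_seen_by_budget(self) -> float:
        """Expected cost of deferring, counting only the emergency contract."""
        emergency = self.preventive_cost * self.emergency_multiplier
        return self.failure_probability * emergency

    def total_expected_cost(self) -> float:
        """Expected cost of deferring, counting everyone who ends up paying."""
        emergency = self.preventive_cost * self.emergency_multiplier
        return self.failure_probability * (
            emergency + self.secondary_failure_cost + self.disruption_cost
        )


# Hypothetical item: a part replacement with a one-in-five chance of failure.
item = Deferral(
    preventive_cost=100.0,
    emergency_multiplier=4.0,
    secondary_failure_cost=150.0,
    disruption_cost=500.0,
    failure_probability=0.2,
)

# Narrow view: deferring looks cheaper than doing the work.
print(item.cost_seen_by_budget() < item.preventive_cost)
# Wide view: deferring is the expensive option.
print(item.total_expected_cost() > item.preventive_cost)
```

With these invented figures the maintenance budget sees an expected cost of about 80 against a certain cost of 100, so deferral wins. Counting the secondary failures and the disruption raises the expected cost of deferral to about 210, and the same decision reverses.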

Breaking the loop

The neglect spiral breaks in one reliable way: catastrophic visible failure. The generator that mostly works fails completely, in winter, visibly, with consequences that cannot be attributed to anything other than the infrastructure it represents. The emergency response unlocks resources that the maintenance budget could never justify. The infrastructure is repaired or replaced, expensively, under duress.

After the repair, one of two things happens. Either the lesson is extracted and invested in structural change: better monitoring systems, different maintenance funding models, longer-cycle infrastructure assessment, a political framework for preventive investment that does not require a visible crisis to justify it. Or the emergency passes, the pressure on resources returns, the new infrastructure is entered on the maintenance list, and the spiral begins again. The second outcome is considerably more common than the first.

The reason is that the catastrophic failure did not change the underlying structure of the feedback. The political signal still arrives at the emergency threshold, not at the degradation threshold. The budget process still treats preventive maintenance as discretionary and reactive maintenance as mandatory. The information gap between engineering knowledge and budget decision still exists. The catastrophe produced a momentary alignment of urgency, resource, and political will. That alignment did not persist after the emergency.

In Ankh-Morpork, the Shades generator is repaired. The maintenance contract is renewed. The list is reset. Next winter, the Dolly Sisters pumping station is at the top of it.

The list is always long. Everything on it is still working.