The Safety Systems That Prevent Irrigation Disasters

What Can Go Wrong

Real irrigation disasters happen more often than people think. They rarely make the news, but facility managers, golf course superintendents, and property owners know them all too well:

A stuck solenoid valve runs a zone for 72 hours straight, saturating the soil so deeply that water seeps into a nearby basement. Thousands of gallons wasted, plus structural damage.
A controller firmware crash leaves valves open indefinitely. Nobody notices until the water bill arrives—or worse, until the landscape is flooded.
A broken field wire makes a valve appear off in the controller software when it's actually stuck open. The dashboard says everything is fine while a zone quietly runs around the clock.
A network outage disconnects the smart controller from the cloud, and with it, all scheduling intelligence. Some systems fail open (valves stay in their last state) rather than fail safe.

These aren't hypothetical edge cases. They're the reason insurance companies and facility managers are cautious about "smart" irrigation. The smarter the system, the more important the safety net.

That's why we designed five independent safety layers, each capable of preventing a disaster on its own, even if every other layer fails.

Five independent safety layers protect against every failure mode

Layer 1: Current Sensing — Detecting Stuck and Broken Valves

Irrigation solenoid valves draw a characteristic electrical current when energized, typically 200–400 mA at 24 VAC. By monitoring the return current on the solenoid circuit with a precision ADC (analog-to-digital converter), the controller can detect two critical fault conditions in real time.

Overcurrent (Short Circuit / Stuck Valve)

If the measured current exceeds the expected threshold, it indicates a short circuit in the field wiring or a solenoid that has mechanically stuck in a way that draws excessive power. To prevent false triggers from electrical noise or transient spikes, the controller requires three consecutive overcurrent readings (roughly 300 ms in total) before it acts. Once the fault is confirmed, it performs an emergency shutoff: all zones turn off within milliseconds.

Undercurrent (Broken Wire / Valve Not Opening)

The inverse problem is equally dangerous. If a zone is commanded ON but draws no current, the field wire is broken or the solenoid has failed. The valve isn't opening, which means the zone isn't getting water—but from the software's perspective, everything looks normal.

The system uses a retry cycle to confirm the fault: turn all zones off, wait 300 ms for the solenoids to de-energize, re-enable the suspected zone, and re-check the current. If the reading is still too low after the retry, the fault is confirmed and the operator is alerted. An undercurrent fault does not shut down the whole system (other zones may still be working correctly), but it is flagged immediately so it can be addressed.
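The retry cycle can be sketched in a few lines of Python. This is an illustration only, not the controller's firmware: the function name, the callback parameters, and the 50 mA threshold are assumptions made for the example (the article says only that the zone "draws no current"). A real implementation would presumably also wait out the inrush period before the second reading.

```python
import time
from typing import Callable

UNDERCURRENT_MA = 50.0    # assumed threshold; the article only says "no current"
DEENERGIZE_WAIT_S = 0.3   # 300 ms de-energization wait, from the text


def confirm_undercurrent(
    zone: int,
    read_current_ma: Callable[[int], float],
    set_zone: Callable[[int, bool], None],
    all_zones_off: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if the zone still draws too little current after one retry."""
    if read_current_ma(zone) >= UNDERCURRENT_MA:
        return False                  # current is present, nothing to confirm
    all_zones_off()                   # step 1: turn all zones off
    sleep(DEENERGIZE_WAIT_S)          # step 2: let the solenoids de-energize
    set_zone(zone, True)              # step 3: re-enable only the suspected zone
    return read_current_ma(zone) < UNDERCURRENT_MA   # step 4: re-check the current
```

A True result would raise the operator alert. It deliberately does not call any shutoff routine, matching the behavior described above.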

Inrush grace period: The current sensing system includes a configurable grace period (default 1 second) after a valve opens. Solenoid inrush current can spike 3–5× above steady-state as the plunger moves and the magnetic field establishes. Without this grace period, every single valve activation would trigger a false overcurrent alarm. The system waits for the current to stabilize before it starts monitoring.
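The interaction between the grace period and the three-reading rule is easier to see in code. The sketch below is a hypothetical model, not the actual firmware: the class name, the 600 mA limit, and the caller-supplied timestamps are assumptions, while the 1-second grace period and the three consecutive readings come from the text.

```python
from dataclasses import dataclass


@dataclass
class OvercurrentMonitor:
    limit_ma: float = 600.0   # assumed limit, above the 200-400 mA steady state
    grace_s: float = 1.0      # default inrush grace period, from the text
    required_hits: int = 3    # consecutive readings required, from the text
    opened_at: float = 0.0
    hits: int = 0

    def valve_opened(self, now: float) -> None:
        """Record when the valve was energized and clear the hit counter."""
        self.opened_at = now
        self.hits = 0

    def sample(self, current_ma: float, now: float) -> bool:
        """Return True when an emergency shutoff should be triggered."""
        if now - self.opened_at < self.grace_s:
            return False              # inrush window: readings are ignored
        if current_ma > self.limit_ma:
            self.hits += 1            # another consecutive overcurrent reading
        else:
            self.hits = 0             # any normal reading breaks the streak
        return self.hits >= self.required_hits
```

Resetting the counter on every normal reading is what makes the readings "consecutive": a single noise spike between two normal samples never accumulates toward a shutoff.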

Layer 2: Communication Watchdog — What Happens When the Network Fails

The local controller monitors its communication link with the edge AI computer that runs the optimization engine. These two devices communicate over a local connection—not the internet—but the link can still fail due to cable issues, software crashes on the edge computer, or power supply problems.

If no valid command is received within 5 seconds, the controller initiates a fail-safe sequence:

All local relay valves shut off immediately. No zone continues running without active supervision from the AI coordinator.
If two-wire decoders are connected (common on large golf course and commercial systems), a broadcast reset command is sent to all field decoders, ensuring even remote valves close.
The system saves its current state to flash memory so it can recover gracefully after the communication link is restored.
New zone activation requests are rejected. However, OFF commands are always accepted—safety first. You can always turn things off, even during a fault condition.

This means a network failure, an AI crash, or a power loss at the edge computer results in a clean shutdown, not a runaway system. The irrigation stops and waits for supervision to resume.
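As a rough sketch of this fail-safe sequence, the model below tracks the time of the last valid command and applies the four steps when the 5-second timeout expires. The class, its method names, the in-memory snapshot standing in for the flash write, and the explicit link_restored call are all assumptions made for illustration. The article does not say how the controller decides that the link is back.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

COMM_TIMEOUT_S = 5.0  # from the text


@dataclass
class CommWatchdog:
    zones: List[bool]
    last_valid_at: float = 0.0
    failsafe: bool = False
    saved_state: Optional[List[bool]] = None   # stand-in for the flash write
    broadcast_decoder_reset: Optional[Callable[[], None]] = None

    def tick(self, now: float) -> None:
        """Enter fail-safe mode if no valid command arrived within the timeout."""
        if self.failsafe or now - self.last_valid_at <= COMM_TIMEOUT_S:
            return
        running = list(self.zones)               # remember what was running
        self.zones = [False] * len(self.zones)   # local relay valves off
        if self.broadcast_decoder_reset is not None:
            self.broadcast_decoder_reset()       # close two-wire decoder valves
        self.saved_state = running               # persist state for recovery
        self.failsafe = True

    def command(self, zone: int, on: bool, now: float) -> bool:
        """Apply a command and return whether it was accepted."""
        self.tick(now)
        if self.failsafe:
            if on:
                return False                     # activations are rejected
            self.zones[zone] = False             # OFF is always accepted
            return True
        self.zones[zone] = on
        self.last_valid_at = now
        return True

    def link_restored(self, now: float) -> None:
        """Assumed hook: called once the edge computer is reachable again."""
        self.failsafe = False
        self.last_valid_at = now
```

Keeping the OFF path outside every other check is the design point worth copying: no fault state should ever be able to block a request to close a valve.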

Layer 3: Escalating Software Watchdog

Even the local controller's firmware can hang. An unexpected interrupt, memory corruption, or an unhandled edge case in the real-time loop could cause the main program to freeze. Inspired by the OpenSprinkler open-source controller project, this layer implements a graduated response to firmware hangs:

Normal operation: The main control loop "feeds" (resets) the software watchdog every few seconds, confirming the firmware is running correctly.
Hang detection: An independent timer checks every 8 seconds whether the watchdog has been fed. If 15 consecutive checks fail—meaning 120 seconds have passed without a feed—the system concludes the firmware has hung.
Graceful response: Before resetting, the system saves all current relay states to non-volatile storage. Then it performs a software reset.
Recovery: On reboot, the saved state allows the system to notify the AI coordinator which zones were running when the crash occurred. Brief hangs (under 2 minutes) are tolerated—the system recovers automatically without operator intervention.

The 120-second threshold is deliberately generous. Momentary slowdowns from garbage collection, flash writes, or burst network traffic shouldn't trigger a reset. But a genuine hang—a firmware crash that locks the control loop—is caught and recovered within two minutes.
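The counting logic behind the 120-second threshold (15 missed checks at 8-second intervals) can be sketched as follows. This is a simplified model rather than the real firmware: the class name and the save_state and reset callbacks are hypothetical, and scheduling check every 8 seconds is left to the caller.

```python
from typing import Callable

CHECK_INTERVAL_S = 8      # how often the independent timer runs, from the text
MAX_MISSED_CHECKS = 15    # 15 checks x 8 s = 120 s without a feed


class SoftwareWatchdog:
    def __init__(self, save_state: Callable[[], None], reset: Callable[[], None]):
        self._save_state = save_state
        self._reset = reset
        self.fed = False
        self.missed = 0

    def feed(self) -> None:
        """Called by the main control loop to show it is still running."""
        self.fed = True

    def check(self) -> None:
        """Called every CHECK_INTERVAL_S seconds by the independent timer."""
        if self.fed:
            self.fed = False
            self.missed = 0           # a brief hang is forgiven once the loop recovers
            return
        self.missed += 1
        if self.missed >= MAX_MISSED_CHECKS:
            self._save_state()        # persist relay states before resetting
            self._reset()             # then perform the software reset
            self.missed = 0
```

Because a single feed clears the counter, a loop that stalls for 100 seconds and then recovers is never reset, which is the tolerance for brief hangs described above.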

Layer 4: Hardware Watchdog (Last Resort)

What if the software watchdog itself hangs? If the firmware crash is severe enough to lock up the entire async runtime—including the timer that monitors the software watchdog—then software-based recovery is impossible.

This is where the hardware watchdog timer (RWDT) takes over. It runs on dedicated silicon, completely independent of all software. The main firmware must reset this hardware timer at least once every 30 seconds. If it fails to do so, for any reason, the hardware watchdog resets the entire controller.

This is the ultimate backstop. It cannot be disabled by software bugs because it runs in dedicated hardware that the CPU cannot override once armed. Even a complete firmware crash that locks up the CPU results in a full hardware reset within 30 seconds. After reset, the controller boots into its safe default state (all valves off) and re-establishes communication with the AI coordinator.

Layer 5: Max-On Timers

Every zone has a configurable maximum on-time (default: 30 minutes). A dedicated relay control task checks every second whether any zone has exceeded its limit. Zones that have timed out are forced off, regardless of what the AI optimizer or schedule says.

This is the simplest safety mechanism, but possibly the most important. Even if every other layer fails simultaneously—current sensing is bypassed, both watchdogs are hung, the communication link is down—no zone can physically run longer than its max-on time.

For most residential and commercial zones, 30 minutes is far more water than any zone should ever receive in a single run. For large rotors on athletic fields, the limit can be extended. But there is always a hard ceiling, enforced independently of all other logic.
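A minimal sketch of the max-on check, assuming a once-per-second tick and per-zone limits stored in a simple list. The class and method names are hypothetical. Only the 30-minute default and the one-second check interval come from the text.

```python
from typing import List, Optional

DEFAULT_MAX_ON_S = 30 * 60   # 30-minute default, from the text


class MaxOnEnforcer:
    def __init__(self, num_zones: int):
        self.started_at: List[Optional[float]] = [None] * num_zones
        self.max_on_s: List[float] = [float(DEFAULT_MAX_ON_S)] * num_zones

    def zone_on(self, zone: int, now: float) -> None:
        """Start timing a zone; a repeated ON does not restart the clock."""
        if self.started_at[zone] is None:
            self.started_at[zone] = now

    def zone_off(self, zone: int) -> None:
        self.started_at[zone] = None

    def tick(self, now: float) -> List[int]:
        """Called once per second; returns the zones that were forced off."""
        forced = []
        for zone, start in enumerate(self.started_at):
            if start is not None and now - start > self.max_on_s[zone]:
                self.started_at[zone] = None   # hard ceiling reached: force off
                forced.append(zone)
        return forced
```

Ignoring repeated ON requests in zone_on matters: if every ON command restarted the clock, a misbehaving scheduler could keep a zone running indefinitely by re-sending the same command.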

All Five Layers at a Glance

Layer	Monitors	Trigger	Response	Protects Against
Current Sensing	Solenoid current	3 consecutive readings (~300 ms)	Emergency shutoff (overcurrent); operator alert (undercurrent)	Stuck/broken valves
Comm Watchdog	Edge computer link	5 seconds	All zones off	Network/AI failure
Software Watchdog	Main loop feed	120 seconds	Software reset	Firmware hangs
Hardware Watchdog	CPU execution	30 seconds	Hardware reset	Complete lockup
Max-On Timer	Zone duration	Configurable (default 30 min)	Force zone off	Any software bug

Notice that these layers are independent. They don't share code, don't share timers, and don't depend on each other. A failure in one layer doesn't compromise the others. This is the same defense-in-depth philosophy used in industrial safety systems and aviation.

Non-Volatile State Persistence

One more critical feature ties all of these layers together: the system maintains a dual-bank flash storage system that persists relay states across unexpected reboots.

The dual-bank design alternates writes between two flash memory regions, each protected with CRC32 checksums. If a power loss corrupts one bank mid-write, the other bank still contains the last valid state. After a watchdog reset or power cycle:

The system knows exactly which zones were running at the time of the failure, and for how long.
It can notify the AI coordinator of interrupted sessions so the optimizer can account for water that was partially delivered.
It tracks the reboot cause (watchdog timeout, brownout detection, normal power-on) for diagnostics. Repeated watchdog resets indicate a firmware bug that needs investigation.

This means even a total power failure is recoverable. The system doesn't lose track of what it was doing. When power returns, it boots into a safe state (all valves off), reads its last known state from flash, reports the interruption to the AI coordinator, and resumes normal operation.
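The dual-bank idea can be illustrated with Python's standard zlib.crc32 and struct modules. The record layout here (a sequence number plus a relay bitmask) is an assumption chosen for brevity. The real record evidently holds more, such as run durations and the reboot cause, and the two byte strings merely stand in for flash regions.

```python
import struct
import zlib
from typing import List, Optional, Tuple

PAYLOAD = struct.Struct("<II")   # assumed layout: sequence number, relay bitmask
CRC = struct.Struct("<I")        # CRC32 of the payload


def encode(seq: int, mask: int) -> bytes:
    body = PAYLOAD.pack(seq, mask)
    return body + CRC.pack(zlib.crc32(body))


def decode(raw: bytes) -> Optional[Tuple[int, int]]:
    """Return (seq, mask), or None if the record is torn or corrupted."""
    if len(raw) != PAYLOAD.size + CRC.size:
        return None
    body = raw[:PAYLOAD.size]
    (crc,) = CRC.unpack(raw[PAYLOAD.size:])
    if zlib.crc32(body) != crc:
        return None
    return PAYLOAD.unpack(body)


class DualBankStore:
    def __init__(self) -> None:
        self.banks: List[bytes] = [b"", b""]   # stand-ins for two flash regions
        self.seq = 0

    def save(self, mask: int) -> None:
        """Write to the bank that does NOT hold the newest valid record."""
        self.seq += 1
        self.banks[self.seq % 2] = encode(self.seq, mask)

    def load(self) -> Optional[int]:
        """Return the relay bitmask from the newest bank that passes its CRC."""
        valid = [rec for rec in map(decode, self.banks) if rec is not None]
        if not valid:
            return None
        return max(valid, key=lambda rec: rec[0])[1]
```

Because each write goes to the older bank, a power loss in the middle of a write can damage only the record being replaced. The previous record stays intact in the other bank and is the one load returns.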

Why This Matters for Facilities

For golf courses, commercial properties, and municipal irrigation systems, irrigation failures have real, tangible costs. A single stuck valve incident can mean:

Water waste: Thousands of gallons at commercial water rates
Landscape damage: Waterlogged turf, root rot, fungal disease from oversaturation
Property damage: Flooding, erosion, water intrusion into structures
Liability: Slip hazards, damage to neighboring properties, regulatory fines in drought-restricted areas

Traditional "smart" controllers often add complexity without adding safety. A WiFi-connected timer that loses its connection might keep running its last schedule indefinitely. A cloud-dependent system might go dark during an outage. These failure modes are actually worse than a simple mechanical timer, which at least fails predictably.

The bottom line: these five safety layers ensure that the worst case is a brief interruption in irrigation, never a flood. The system is designed so that every failure mode results in valves closing, not staying open.

This is what separates industrial-grade irrigation control from consumer-grade "smart" products. It's not about adding more features—it's about ensuring that when something goes wrong (and eventually, something always does), the system fails safe.

