In the traditional primary/backup DNS model, a data center outage is detected, the operations team receives an alert, the zone record is updated, the service is reloaded and clients wait for the new DNS answer to propagate. This chain looks straightforward in a runbook; in a real incident, the delays introduced by decision-making, access control, approval and execution stretch RTO considerably.
In many organizations, health checking and DNS operate as separate systems. A monitoring tool sees that a DC is unreachable, but the DNS server continues to answer with the same IP addresses. The bridge between them is typically a script, a manual runbook or a separate automation layer. That gap becomes the weakest link at the moment of failover.
Failback carries equal risk. If a DC bounces in and out quickly, the DNS answer can flip repeatedly — clients are scattered across different data centers and traffic may return before state synchronization is complete. A simple "remove when down, re-add when up" logic is not enough.
The correct model evaluates DC health through boolean scenario logic, reduces flap risk with consecutive success/failure thresholds and makes the DNS response the natural output of that decision. The same model must also cover manual cutover for planned maintenance, DR conditions and a fail-safe answer for the case where all DCs are unhealthy.
TR7 DC Failover delivers this model: it automatically refreshes the DNS answer when a DC health scenario changes and ties the entire failover process to DNS TTL and operator-defined health parameters.
TR7 implements DC failover decisions through health scenarios, boolean condition logic, flap protection and a manual cutover mechanism.
When the health check state for a DC changes, the associated scenario is re-evaluated. If the scenario result changes, the relevant DNS records are regenerated and the unhealthy DC is removed from the answer.
Conditions within a group combine with AND logic; groups combine with OR logic. Each health check can also be used as a negated condition, enabling inverse scenarios such as "activate this record when this check is unhealthy."
While a DC is in transition, the previous evaluation result can be preserved. This behavior helps prevent short-lived up/down fluctuations from continuously changing the DNS answer.
During planned maintenance, an operator can take a DC offline with maintenance mode. Even if the DC appears healthy, it can be excluded from the DNS answer so traffic is directed to another DC.
DC Failover is the GTM failover layer that automatically manages DNS responses across multiple data centers based on health state.
TR7 can evaluate DC records as a priority chain ordered by array position. When the primary DC is unhealthy, the secondary takes over; when the secondary is also unhealthy, the tertiary steps in, and longer chains work the same way. The model is not limited to two endpoints. This structure simplifies multi-stage continuity design in financial, government and large-scale SaaS environments.
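The priority chain can be sketched in a few lines. This is an illustration only, not TR7's implementation: the `DcRecord` type, the `pick_active` function and the sample addresses are hypothetical, and only the "array position is the priority" rule comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DcRecord:
    """Hypothetical record: one DC's address plus its current scenario result."""
    name: str
    address: str
    healthy: bool


def pick_active(chain: Sequence[DcRecord]) -> Optional[DcRecord]:
    """Return the first healthy record; array position is the priority."""
    for record in chain:
        if record.healthy:
            return record
    return None  # every tier is down; a fail-safe answer would be needed


chain = [
    DcRecord("DC1", "192.0.2.10", healthy=False),
    DcRecord("DC2", "198.51.100.10", healthy=False),
    DcRecord("DC3", "203.0.113.10", healthy=True),
]
active = pick_active(chain)
print(active.name if active else "no healthy DC")
```

Because the function only walks the list in order, adding a fourth or fifth tier requires no change to the selection logic.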
TR7 can evaluate health signals at DC level: wanAccess, lanAccess, access, internet and maintenanceMode. WAN reachability, LAN reachability, general access state, internet access and manual maintenance status are each modeled separately. A DC is therefore assessed across multiple access dimensions, not just a single ping result. The DNS answer reflects a more realistic picture of DC health.
requiredSuccess and requiredFailure determine how many consecutive results are needed before a DC is declared up or down. This model prevents unnecessary DNS changes caused by transient packet loss, brief network interruptions or momentary service slowdowns. Operators can use tighter thresholds for critical services and more tolerant ones for noisier links. RTO is planned together with these thresholds and the check interval.
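A minimal sketch of consecutive-threshold flap protection follows. The `FlapGuard` class and its method names are hypothetical and chosen for illustration; only the idea of requiring a run of consecutive results, as configured by requiredSuccess and requiredFailure, is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class FlapGuard:
    """Hypothetical consecutive-result counter for one DC."""
    required_success: int
    required_failure: int
    is_up: bool = True
    _streak: int = 0  # consecutive results that contradict the current state

    def observe(self, check_ok: bool) -> bool:
        """Feed one check result and return the (possibly updated) state."""
        if check_ok == self.is_up:
            self._streak = 0          # result agrees with the state: reset
            return self.is_up
        self._streak += 1
        needed = self.required_success if check_ok else self.required_failure
        if self._streak >= needed:
            self.is_up = check_ok     # enough consecutive results: flip state
            self._streak = 0
        return self.is_up


guard = FlapGuard(required_success=2, required_failure=3)
history = [guard.observe(r) for r in [False, True, False, False, False, True, True]]
print(history)
```

In this run the single early failure is ignored, the DC is marked down only after three failures in a row, and it returns only after two successes in a row.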
noResponse mode keeps a passive DC silent under normal conditions. onlyNew mode can prevent a DC that has been down for a long time from answering with stale data when it comes back up. This behavior ensures that during failover, only DCs in the correct state produce DNS answers — not merely those that are reachable. It is an important protection layer in environments where stale-data risk is a concern.
Per-record DR mode allows specific records to become active only when a DR condition is met. The drCond scenario or drIfNoRecords flag triggers the DR record when primary and secondary targets are exhausted. This model keeps remote disaster-recovery IP addresses out of normal DNS answers while holding them on standby for critical situations. The DR strategy becomes controlled at the DNS level.
If no DC is healthy, a response can be generated from the fallbackRecords array. These records can point to a maintenance page, a static emergency endpoint or an alternative recovery service. FailSafe behavior ensures DNS produces a controlled last-resort answer instead of returning nothing. Operators define these records according to their organization's crisis plan.
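The interaction between normal records, DR records and fallbackRecords might be sketched as below. The evaluation order (normal, then DR, then fallback), the dictionary layout and the `healthy` key are assumptions made for illustration; only the names drIfNoRecords and fallbackRecords come from the text.

```python
from typing import Dict, List


def build_answer(records: List[Dict], fallback_records: List[str]) -> List[str]:
    """Assumed order: healthy normal records, then DR records, then fallback."""
    normal = [r["address"] for r in records
              if not r.get("drIfNoRecords") and r["healthy"]]
    if normal:
        return normal
    # No normal record is left, so DR records flagged drIfNoRecords may answer.
    dr = [r["address"] for r in records
          if r.get("drIfNoRecords") and r["healthy"]]
    if dr:
        return dr
    # Nothing is healthy: return the controlled last-resort answer.
    return list(fallback_records)


records = [
    {"address": "192.0.2.10", "healthy": False},
    {"address": "198.51.100.10", "healthy": False},
    {"address": "203.0.113.10", "healthy": True, "drIfNoRecords": True},
]
print(build_answer(records, fallback_records=["192.0.2.200"]))
```

With both normal targets down, the answer contains only the DR address; if the DR record were also unhealthy, the fallback address would be returned instead of an empty answer.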
TR7 can store local health check and scenario state data at the file level. After a restart or service reload, the previous state is restored so evaluation does not begin from scratch. This approach reduces unnecessary oscillation in failover decisions during a transient restart. It is especially useful for maintaining consistency during maintenance operations that restart the GTM service.
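File-level state persistence of this kind can be sketched as follows. The JSON layout, the function names and the atomic-replace approach are assumptions for illustration and do not describe TR7's actual on-disk format.

```python
import json
import os
import tempfile
from typing import Dict


def save_state(path: str, state: Dict[str, dict]) -> None:
    """Write state to a temp file, then atomically replace the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # readers never see a half-written file


def load_state(path: str) -> Dict[str, dict]:
    """Return the saved state, or an empty dict on first start."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}
```

Restoring the last known state and streak counters on startup means a restart does not by itself reset a DC to "up" or "down".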
wanAccess and lanAccess target lists can be defined per DC. Multiple access targets give a more accurate picture of a DC's external and internal reachability. A transient issue with a single target does not necessarily mark the entire DC as down. This structure enables more comprehensive modeling of data center health.
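One way to aggregate several targets into a single dimension result is shown below. The text does not specify the aggregation rule, so the "at least `min_reachable` targets respond" rule, the function name and the sample addresses are all assumptions.

```python
from typing import Mapping


def dimension_reachable(results: Mapping[str, bool], min_reachable: int = 1) -> bool:
    """Treat the dimension as up when at least `min_reachable` targets respond."""
    return sum(results.values()) >= min_reachable


wan_results = {"198.51.100.1": True, "203.0.113.1": False}  # sample probe results
print(dimension_reachable(wan_results))
```

With the default of one reachable target, a single failing probe does not mark the WAN dimension as down.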
When maintenanceMode is activated, the relevant DC is consciously taken offline. This is useful during patches, maintenance windows, migrations or controlled DR tests. The operator can remove the DC from the DNS answer — even when it is healthy — and redirect traffic to another DC. When maintenance is complete, the mode is disabled and normal evaluation resumes.
DC state can be expressed as ok, noInternet, noAccess, noWan or noLan. This classification shows which access dimension is problematic rather than just saying "down." Operations teams can distinguish internet egress, WAN reachability and LAN reachability issues more quickly. The reason behind a failover decision becomes more readable.
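The classification can be illustrated with a small function. The state names are taken from the text; the precedence in which the dimensions are checked and the function signature are assumptions, since the source does not define them.

```python
def classify_dc(internet: bool, access: bool, wan: bool, lan: bool) -> str:
    """Return the first failing dimension; the precedence here is an assumption."""
    if not internet:
        return "noInternet"
    if not access:
        return "noAccess"
    if not wan:
        return "noWan"
    if not lan:
        return "noLan"
    return "ok"


print(classify_dc(internet=True, access=True, wan=False, lan=True))
```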
When the health check state changes, the associated scenario can be re-evaluated immediately. Records bound to the scenario enter the dynamic config regeneration pipeline and the DNS answer is updated. This behavior reduces the need for manual zone edits or external scripts. Changes are grouped by a short debounce to prevent unnecessary repeated regeneration.
In an HA cluster scenario, DNS config writes are controlled through the master role. If the master node fails, the standby node can take over the role after a defined safety period. This model helps prevent two nodes from producing different DNS configs simultaneously. GTM behavior stays aligned with cluster state.
A DC failover operation is planned together with the check interval, consecutive thresholds, HC ID structure, scenario conditions, regeneration pipeline and RTO parameters.
accessPeriod defines how frequently DC health checks run. It can be configured in seconds or minutes. A shorter period provides faster detection; a longer period gives a quieter, lower-noise evaluation.
requiredSuccess defines how many consecutive successes are needed before a DC is considered up. requiredFailure defines how many consecutive failures are needed before a DC is considered down. These two values set the balance between failover speed and flap protection.
wanAccess and lanAccess lists define the access targets for a DC. This allows evaluation of whether a DC is reachable not only from the outside but also from the internal network. The distinction is particularly important in inter-DC and hybrid routing scenarios.
Automatic HC records follow the format `auto|
Conditions within a group combine with AND logic; groups combine with OR logic. This structure supports a wide range of decision models, from simple primary-down checks to complex multi-dimension DC health scenarios. Operators are not limited to a single check result.
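The AND-within-group, OR-across-groups rule, including negated conditions, can be sketched as below. The `Condition` type, the `evaluate` function and the check IDs are hypothetical; they are not TR7's configuration schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Condition:
    check_id: str          # hypothetical health check identifier
    negate: bool = False   # True means "match when the check is unhealthy"


def evaluate(groups: List[List[Condition]], health: Dict[str, bool]) -> bool:
    """OR across groups, AND across the conditions inside each group."""
    return any(
        all(health[c.check_id] != c.negate for c in group)
        for group in groups
    )


# Example scenario: activate a standby record when DC1 has no internet,
# OR when DC1 has lost both WAN and LAN reachability.
standby_scenario = [
    [Condition("dc1-internet", negate=True)],
    [Condition("dc1-wan", negate=True), Condition("dc1-lan", negate=True)],
]
health = {"dc1-internet": True, "dc1-wan": False, "dc1-lan": False}
print(evaluate(standby_scenario, health))
```

Here the first group does not match because internet access is healthy, but the second group matches because both negated conditions hold, so the scenario evaluates to true.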
When HC state changes, the scenario is re-evaluated, bound records are identified and dynamic config regeneration is triggered. The pipeline runs with a short debounce so rapid successive changes are merged into a single regeneration pass. The DNS answer is re-rendered according to the current health state.
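The debounce step can be sketched with an explicit clock so the behavior is easy to follow. The `RegenDebouncer` class, its method names and the window length are hypothetical; the source states only that rapid changes are merged into a single regeneration pass.

```python
from typing import Callable, Optional, Set


class RegenDebouncer:
    """Hypothetical debouncer: merges changes that arrive within `window` seconds."""

    def __init__(self, window: float, regenerate: Callable[[Set[str]], None]):
        self.window = window
        self.regenerate = regenerate
        self.pending: Set[str] = set()
        self.deadline: Optional[float] = None

    def notify(self, record: str, now: float) -> None:
        """Record a change and push the deadline back by one window."""
        self.pending.add(record)
        self.deadline = now + self.window

    def tick(self, now: float) -> bool:
        """Run one regeneration pass if the quiet period has elapsed."""
        if self.deadline is None or now < self.deadline:
            return False
        batch, self.pending, self.deadline = self.pending, set(), None
        self.regenerate(batch)
        return True


passes = []
debouncer = RegenDebouncer(window=1.0, regenerate=passes.append)
debouncer.notify("app.example.com", now=0.0)
debouncer.notify("api.example.com", now=0.5)   # arrives inside the window
debouncer.tick(now=1.2)                        # still quiet period: no pass
debouncer.tick(now=1.5)                        # one pass covering both records
print(passes)
```

Passing the time in as an argument keeps the sketch deterministic; a real service would more likely drive the same logic from a timer or event loop.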
RTO depends on accessPeriod, requiredFailure count, regeneration debounce duration and client DNS TTL behavior. Rather than claiming a single fixed time, the failover window should be planned to match service requirements. Critical services benefit from shorter TTL and more frequent checks.
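As a rough planning aid, the four factors can be added up into an approximate upper bound. The formula below is a simplifying assumption, not a TR7 guarantee: it ignores check timeouts, resolvers that do not honor TTL and client-side caching, and the function name and sample values are illustrative.

```python
def estimate_failover_window(access_period_s: float, required_failure: int,
                             debounce_s: float, ttl_s: float) -> float:
    """Approximate worst case: detection time + debounce + TTL expiry."""
    detection_s = access_period_s * required_failure
    return detection_s + debounce_s + ttl_s


# Sample values: 10 s checks, 3 consecutive failures, 2 s debounce, 30 s TTL.
print(estimate_failover_window(10, 3, 2, 30))
```

With these sample values the estimate is 62 seconds; halving the TTL or the check period shows quickly which parameter dominates the window for a given service.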
DC1 is defined as primary and DC2 as passive standby. When the internet or access scenario for DC1 fails, DC1 records are removed from the DNS answer and DC2 begins responding.
Financial institutions can build a DC1 → DC2 → DC3 sequential failover chain. Each tier is evaluated by its own health scenario and an unhealthy DC is automatically removed from the DNS answer.
During the maintenance window, DC1 is placed into maintenance mode and traffic is directed to DC2. When maintenance is complete, maintenance mode is disabled and normal health evaluation resumes.
When primary and secondary DCs are both unhealthy, DR mode records can be activated. In this scenario the remote disaster-recovery site remains passive under normal conditions and is added to the DNS answer only when the defined conditions are met.
When a DC that has been down for an extended period comes back online, it may not be desirable for it to respond with outdated data. The onlyNew behavior keeps an out-of-date DC passive, reducing the risk of publishing stale records.
The nearest DC is first selected by country or region, then if that DC becomes unhealthy, the standby DC is activated. This model combines performance-based steering with continuity decisions in a single GTM configuration.
Health scenario, DNS response and manual cutover are unified in a single decision pipeline. Let's walk through a live setup with your own DC architecture.