Backend health in the classic load balancing world is always observed from the protocol layer. The ADC sends an HTTP request, receives 200 OK, adds the server to the pool. Or it runs a TCP probe and counts the server healthy if the port is open. Maybe it even validates content: 'does the response contain this word?'
This approach looks at the outer shell, not the inside. It doesn't see the disk filling to 98%, memory spilling into swap and performance collapsing, GC pauses stretching, or the database connection pool filling up. The server still returns HTTP 200, but user requests wait seconds for a response.
Worse, the protocol probe observes only from the outside; the server doesn't report its own state. The ADC sees only what the probe sees. Resource pressure on the server or a critical process stuck in a restart loop doesn't factor into routing.
ETM Server Telemetry closes that gap. The agent on the server continuously reports its own internal state to the ADC. Routing decisions follow live data from the inside, not just the protocol probe from the outside.
TR7 ETM models server health as a live input to ADC routing. This approach unifies client security and server observability on the same platform.
The agent on the server measures CPU load, memory usage, swap pressure, disk IO saturation, and network throughput in real time. Data flows to the ADC at intervals of seconds, and routing decisions draw on this fresh data.
Health of critical application processes, restart loops, garbage collection duration, open file handle count, thread pool state, and application-specific metrics (e.g., database connection pool) are observed. Unhealthy applications don't receive traffic.
The ADC load balancing algorithm can run against the ETM-derived health score. A server with high CPU receives less traffic; an IO-saturated server stops receiving new connections; a server falling into swap is automatically pulled from the pool.
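The three reactions described above (less traffic on high CPU, no new connections on IO saturation, removal on swap) can be sketched as a small decision function. This is an illustrative sketch, not TR7's implementation: the `HostTelemetry` fields, the 80% CPU threshold, and the state names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HostTelemetry:
    # Hypothetical field names; the real ETM schema is not shown in this page.
    cpu_pct: float       # instantaneous CPU utilisation, 0-100
    io_saturated: bool   # True when disk IO is saturated
    swap_used_mb: float  # swap currently in use


def routing_action(t: HostTelemetry, base_weight: int = 100,
                   cpu_high_pct: float = 80.0) -> Tuple[str, int]:
    """Return (state, weight) for one server. Thresholds are assumed values."""
    if t.swap_used_mb > 0:
        return ("removed", 0)    # swapping: pulled from the pool
    if t.io_saturated:
        return ("draining", 0)   # keep existing connections, accept no new ones
    if t.cpu_pct >= cpu_high_pct:
        # Scale the weight down in proportion to the remaining CPU headroom.
        weight = max(1, round(base_weight * (100.0 - t.cpu_pct) / 100.0))
        return ("active", weight)
    return ("active", base_weight)
```

The checks are ordered from most to least severe, so a server that is both swapping and CPU-bound is removed rather than merely down-weighted.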
The same ETM agent used for client security runs on servers too. Operations uses one agent, one management plane, one telemetry model. No separate backend observability tool needs deployment.
Server telemetry is not just observability — it is the live data source for ADC routing decisions.
Core count, sustained load average, instantaneous utilization percentage, and thermal state are measured in real time. Per-core anomalies or thermal throttling appear in routing decisions.
Total and available memory, swap usage, page fault rate, and OOM killer activity are observed. A server falling into swap is automatically moved to a lower-priority pool; one showing OOM risk is pulled from traffic.
Disk fill ratio, IO wait time, IOPS count, queue depth, and SMART error counts are observed. When disk fill crosses a threshold or IO saturation is high, the server is pulled from traffic.
Whether defined critical processes are running, last restart time, and restart loop count are observed. A continuously restarting application is pulled from traffic; the pool is flagged for operator intervention.
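One plausible way to recognise a restart loop is to count restarts inside a sliding time window. The sketch below assumes restart timestamps are available as seconds since the epoch; the function name, the 5-minute window, and the limit of 3 restarts are illustrative choices, not documented ETM settings.

```python
from typing import Sequence


def in_restart_loop(restart_times: Sequence[float], now: float,
                    window_s: float = 300.0, max_restarts: int = 3) -> bool:
    """True when at least max_restarts restarts fall inside the last window_s seconds."""
    recent = [t for t in restart_times if 0.0 <= now - t <= window_s]
    return len(recent) >= max_restarts
```

A process that restarted three times in the last five minutes would be treated as looping and pulled from traffic; three restarts spread over several hours would not.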
Application-specific metrics on the server — application runtime metrics (GC time, event loop lag), database connection pool saturation, queue depth — are pulled through the agent. ADC can include these metrics in routing decisions.
Network interface throughput, packet loss rate, retransmit count, and active TCP connection count are observed. A network-saturated server is automatically flagged with reduced weight.
Validity duration of server TLS certificates, hash integrity of critical configuration files, and changes in certificate stores are observed. Certificates nearing expiry trigger operator alerts and can also factor into routing policy.
Drift from server configuration baseline is caught instantly. Unauthorized configuration change, unexpected user account creation, or new service start arrives at ETM as an event. The signal feeds both security and operational decisions.
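Hash-based integrity checking against a baseline can be illustrated in a few lines. This sketch assumes the baseline is a mapping from item name to SHA-256 digest and that current contents are available as bytes; both the data shapes and the function names are hypothetical, and the agent's real mechanism may differ.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def drifted_items(baseline: Dict[str, str], current: Dict[str, bytes]) -> List[str]:
    """Names that changed, disappeared, or appeared relative to the baseline."""
    drift = []
    for name in sorted(set(baseline) | set(current)):
        expected = baseline.get(name)
        data = current.get(name)
        # Missing from either side, or digest mismatch, counts as drift.
        if expected is None or data is None or sha256_hex(data) != expected:
            drift.append(name)
    return drift
```

Each name returned would correspond to one drift event of the kind described above.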
ETM telemetry can be converted to a 0–100 health score per server. ADC load balancing algorithms (round-robin, least-conn, weighted least-conn) use this score as a weight. Servers with declining scores receive less traffic; servers falling below threshold leave the pool.
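To make the score-as-weight idea concrete, here is a minimal weighted least-connections selector that uses the health score as the weight and drops servers below a threshold. The `ScoredBackend` type, the `min_score` default of 30, and the selection formula are assumptions for illustration; the page does not specify how the ADC combines score and connection count.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ScoredBackend:
    name: str
    health_score: int   # 0-100, assumed to be derived from ETM telemetry
    active_conns: int


def pick_backend(backends: Iterable[ScoredBackend],
                 min_score: int = 30) -> Optional[ScoredBackend]:
    """Weighted least-connections with the health score as the weight."""
    eligible = [b for b in backends
                if b.health_score >= min_score and b.health_score > 0]
    if not eligible:
        return None  # every server is below the threshold
    # The lowest connections-per-weight ratio wins, so a server with a
    # declining score needs fewer connections before it stops being chosen.
    return min(eligible, key=lambda b: b.active_conns / b.health_score)
```

With equal connection counts, a server scored 90 is preferred over one scored 45, and a server scored 20 is never selected while the threshold is 30.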
Disk fill growth rate, memory consumption trends, restart frequency, and similar signals can be interpreted predictively. Traffic is softly drained from a server whose failure probability is rising, before it actually fails.
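As one simple example of predictive interpretation, a least-squares line fitted to recent disk fill samples gives an estimate of the time remaining until the disk is full. The linear model and the function name are assumptions chosen for the sketch; the page does not say which prediction method ETM uses.

```python
from typing import List, Optional, Tuple


def hours_until_full(samples: List[Tuple[float, float]],
                     full_pct: float = 100.0) -> Optional[float]:
    """Estimate hours until disk fill reaches full_pct.

    samples: (hours_since_start, disk_fill_pct) pairs, oldest first.
    Returns None when there is no usable upward trend.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp
    # Ordinary least-squares slope, in percentage points per hour.
    slope = sum((t - mean_t) * (y - mean_y) for t, y in samples) / var_t
    if slope <= 0:
        return None  # flat or shrinking usage
    _, latest_fill = samples[-1]
    return max(0.0, (full_pct - latest_fill) / slope)
```

A routing policy could start draining a server once this estimate drops below a chosen horizon, well before a fixed fill threshold is crossed.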
Server telemetry is the live data source for ADC routing intelligence — integration, scalability, and audit included.
Telemetry flows to the ADC control plane periodically. The load balancing algorithm can run against the ETM score; custom routing decisions can trigger off ETM events. Operators can bind ETM metrics to policy language without writing custom scripts.
Protocol-based active health probes (HTTP, TCP, Oracle) continue to operate; ETM telemetry adds an 'inside view' layer alongside that probe. Routing decisions evaluate both sources: 'is it responding?' (protocol probe) + 'is it healthy?' (ETM).
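The two-source evaluation can be expressed as a small function that asks both questions in order. The return labels, the `min_score` default, and the fallback to probe-only routing when no telemetry is available are assumptions for this sketch, not documented behaviour.

```python
from typing import Optional


def routing_decision(probe_ok: bool, etm_score: Optional[int],
                     min_score: int = 30) -> str:
    """Combine 'is it responding?' (probe) with 'is it healthy?' (ETM score)."""
    if not probe_ok:
        return "out"         # not responding: never routable
    if etm_score is None:
        return "probe_only"  # responding, but no inside view available
    return "in" if etm_score >= min_score else "out"
```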
Which metrics are collected at what period is configurable by server role. CPU/RAM/IO for web servers, connection pool and query latency for database servers, GC and thread pool for application servers can have different priority profiles.
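Role-based collection profiles might look like the mapping below. The metric groupings follow the roles named above, but the key names, the interval values, and the default profile are illustrative assumptions rather than actual ETM configuration syntax.

```python
# Hypothetical per-role collection profiles; intervals are example values.
ROLE_PROFILES = {
    "web": {"interval_s": 5, "metrics": ["cpu", "memory", "disk_io"]},
    "database": {"interval_s": 5, "metrics": ["connection_pool", "query_latency"]},
    "application": {"interval_s": 10, "metrics": ["gc_time", "thread_pool"]},
}

# Used for any role without a dedicated profile (an assumed fallback).
DEFAULT_PROFILE = {"interval_s": 30, "metrics": ["cpu", "memory"]}


def profile_for(role: str) -> dict:
    """Return the collection profile for a server role."""
    return ROLE_PROFILES.get(role, DEFAULT_PROFILE)
```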
The agent on the server is designed for minimal resource usage. It runs without measurable performance degradation even on high-throughput backends. Metric collection intensity is configurable.
Telemetry can be streamed to enterprise monitoring and observability platforms. When the organization wants to use its standard observability stack instead of ETM's own management UI, the data flow is supported.
Thousands of servers can be observed from a single TR7 cluster. In multi-region setups, Central Management presents fleets across regions through one console.
When the application backend's database connection pool approaches saturation, ETM signals ADC. ADC gradually reduces weight on that server; new connections route elsewhere. Users don't see timeouts, and the backend gets room to recover.
When a server's disk fill passes 95% due to backup or log accumulation, ETM raises an event. The server is automatically pulled from the pool and the operator is alerted to intervene. This prevents a full disk and the service collapse that follows.
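A gradual weight reduction of this kind can be modelled as a linear ramp between a soft and a hard saturation threshold. The 70% and 95% thresholds, the linear shape, and the function name are assumptions made for the sketch.

```python
def weight_for_pool(saturation: float, base_weight: int = 100,
                    soft: float = 0.70, hard: float = 0.95) -> int:
    """Map connection pool saturation (0.0-1.0) to a routing weight."""
    if saturation <= soft:
        return base_weight   # healthy: full weight
    if saturation >= hard:
        return 0             # saturated: no new connections
    # Between the thresholds, ramp the weight down linearly.
    remaining = (hard - saturation) / (hard - soft)
    return max(1, round(base_weight * remaining))
```

Because the weight falls in small steps instead of dropping to zero at one threshold, traffic shifts away before users start seeing timeouts.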
A server with continuously rising memory and high OOM risk can be moved to lower weight by ETM before it crashes. Traffic shifts smoothly to other servers; the problem is solved before becoming an incident.
When a TLS certificate on the server has fewer than 30 days remaining, ETM raises an operator alert. Until the certificate is renewed, the server can be excluded from critical traffic, or the alarm can be escalated. This removes the risk of a surprise certificate failure.
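The 30-day check itself is simple date arithmetic. In this sketch the certificate's expiry time is passed in as a timezone-aware `datetime`; the function name and the three status labels are hypothetical.

```python
from datetime import datetime, timezone


def cert_status(not_after: datetime, now: datetime, warn_days: int = 30) -> str:
    """Classify a certificate by whole days remaining until not_after."""
    remaining_days = (not_after - now).days
    if remaining_days < 0:
        return "expired"
    if remaining_days < warn_days:
        return "alert"   # fewer than warn_days left: notify the operator
    return "ok"


# Example: a certificate with 19 days left falls inside the alert window.
example = cert_status(datetime(2024, 1, 20, tzinfo=timezone.utc),
                      datetime(2024, 1, 1, tzinfo=timezone.utc))
```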
Let's see ETM Server Telemetry in your own backend — a deployment session over a pilot server group.