Introduction
Load balancing is the backbone of modern application delivery. When thousands—or millions—of users access your application simultaneously, a single server cannot handle the load. Load balancers distribute incoming traffic across multiple backend servers, ensuring high availability, optimal performance, and fault tolerance.
But how does a load balancer decide which server should handle each request? The answer lies in load balancing algorithms. Choosing the right algorithm can mean the difference between a responsive application and one plagued by timeouts and uneven server loads.
This guide explores load balancing algorithms in two categories: distribution algorithms that determine how traffic is spread across servers, and persistence methods that ensure session continuity for stateful applications.
Why Algorithm Choice Matters
The right load balancing algorithm directly impacts application performance, user experience, and infrastructure efficiency:
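- Response time: choosing the optimal algorithm improves response times compared with basic Round Robin (source: NGINX Performance Study).
- Availability: the enterprise availability requirement allows only about 52 minutes of downtime per year, roughly 99.99% uptime (source: industry-standard SLA).
- Capacity: intelligent distribution across a server pool increases capacity compared with a single server (source: Load Balancing Best Practices).
Algorithm Categories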
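Load balancing algorithms fall into two main categories, distribution and persistence. Performance-based and hybrid approaches build on these, giving four families that serve different architectural needs: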
Distribution Algorithms
Determine how traffic is spread across servers—sequentially, randomly, or based on real-time server metrics like connections or response time.
Persistence Methods
Ensure session continuity by routing requests from the same client, URI, or user ID to the same server consistently.
Performance-Based
Advanced algorithms that consider server health metrics—response time, queue depth, connection errors—for optimal routing decisions.
Hybrid Approaches
Combine distribution with persistence, or use multi-criteria selection (like Fastest+) for sophisticated load balancing scenarios.
Distribution Algorithms
Distribution algorithms focus on spreading traffic across your server pool. The choice depends on your server infrastructure, request characteristics, and performance requirements.
Round Robin
Round Robin is the simplest and most widely deployed load balancing algorithm. It works exactly as its name suggests: requests are distributed to servers in a circular, sequential order. The first request goes to Server 1, the second to Server 2, and so on. After reaching the last server, the cycle starts over.
This algorithm assumes all servers have equal capacity and all requests require similar processing power. It requires no state tracking beyond knowing which server is next in the rotation, making it extremely efficient with minimal computational overhead.
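Because the only state is the position in the rotation, Round Robin fits in a few lines. The Python sketch below is illustrative only; the class and server names are hypothetical and it is not TR7's implementation:

```python
from itertools import cycle


class RoundRobin:
    """Hand out servers in a fixed circular order."""

    def __init__(self, servers: list) -> None:
        if not servers:
            raise ValueError("server pool must not be empty")
        # cycle() holds the only state Round Robin needs: the next position.
        self._rotation = cycle(servers)

    def next_server(self) -> str:
        return next(self._rotation)


lb = RoundRobin(["server1", "server2", "server3"])
picks = [lb.next_server() for _ in range(5)]
print(picks)  # ['server1', 'server2', 'server3', 'server1', 'server2']
```

A production load balancer would also skip servers that health checks mark as down; the sketch leaves that out to keep the rotation logic visible.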
Round Robin excels in homogeneous environments where servers have identical specifications and requests are relatively uniform—such as serving static content, simple API endpoints, or stateless microservices.
Round Robin: At a Glance
| Aspect | Details |
|---|---|
| How it works | Sequential distribution: Server 1 → Server 2 → Server 3 → repeat |
| Best for | Homogeneous servers, uniform request loads, stateless applications |
| Strengths | Simple, predictable, minimal computational overhead, easy to debug |
| Weaknesses | Ignores server load and capacity differences |
| Use cases | Static content, CDN edge nodes, stateless APIs |
Weighted Round Robin
Weighted Round Robin extends the basic algorithm by assigning a weight to each server based on its capacity. Servers with higher weights receive proportionally more traffic. If Server A has weight 3 and Server B has weight 1, Server A handles three requests for every one that Server B handles.
This algorithm is essential for heterogeneous server environments. Organizations often run a mix of hardware—powerful new servers alongside older machines, or cloud instances with different vCPU counts. Weighted Round Robin ensures that a 16-core server handles more traffic than a 4-core server.
Weights are typically configured based on CPU cores, memory, or benchmark testing. While more flexible than simple Round Robin, this algorithm still does not account for real-time server load—a heavily weighted server under high load will continue receiving traffic based on its weight, not its current capacity.
Start with weights proportional to CPU cores or memory. For example, if Server A has 8 cores and Server B has 4 cores, assign weights 8 and 4 (or 2 and 1). Monitor and adjust based on actual performance metrics—throughput, response times, and error rates.
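The text does not specify how weighted turns are interleaved. One well-known approach is the "smooth" weighted round robin used by NGINX, which spreads a heavy server's turns across the cycle instead of sending them back to back. A minimal sketch assuming that technique, using the 3:1 example from above (server names are hypothetical):

```python
from typing import Dict


class SmoothWeightedRoundRobin:
    """Weighted rotation that interleaves picks (the 'smooth' variant used by NGINX)."""

    def __init__(self, weights: Dict[str, int]) -> None:
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("need at least one server, all weights positive")
        self._weights = dict(weights)
        self._total = sum(weights.values())
        self._current = {name: 0 for name in weights}  # running score per server

    def next_server(self) -> str:
        # 1. Every server's score grows by its weight.
        for name, weight in self._weights.items():
            self._current[name] += weight
        # 2. The highest score wins (max() keeps the first server on ties).
        chosen = max(self._current, key=lambda n: self._current[n])
        # 3. The winner pays back the total weight, letting the others catch up.
        self._current[chosen] -= self._total
        return chosen


wrr = SmoothWeightedRoundRobin({"server-a": 3, "server-b": 1})
order = [wrr.next_server() for _ in range(8)]
print(order)
# ['server-a', 'server-a', 'server-b', 'server-a', 'server-a', 'server-a', 'server-b', 'server-a']
```

Over every four requests, server-a receives three and server-b one, matching the weights. Scaling all weights by the same factor (8 and 4 instead of 2 and 1) produces the same pattern.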
Least Connection
Least Connection takes a dynamic approach: each new request goes to the server with the fewest active connections at that moment. Unlike Round Robin, this algorithm adapts to real-time conditions—if one server is processing many slow requests, new requests are routed to less busy servers.
This algorithm is particularly well suited to servers handling long-lived sessions. Database connections (SQL), directory services (LDAP), and applications with persistent connections benefit significantly from Least Connection distribution.
The load balancer tracks active connections to each backend server. When a new request arrives, it queries this connection table and selects the server with the lowest count. This small overhead is negligible compared to the performance benefits for connection-heavy workloads.
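The connection table described above can be modelled as a dictionary of counters. A minimal sketch with hypothetical names, showing why the algorithm adapts: a server becomes eligible again as soon as its connections close:

```python
class LeastConnection:
    """Track in-flight connections and route each new one to the least busy server."""

    def __init__(self, servers: list) -> None:
        self.active = {server: 0 for server in servers}

    def acquire(self) -> str:
        # min() returns the first server with the lowest count, so ties go to pool order.
        server = min(self.active, key=lambda s: self.active[s])
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Must be called when a connection closes, or the counts drift upward.
        self.active[server] -= 1


lb = LeastConnection(["db1", "db2"])
first = lb.acquire()   # db1 (both idle, pool order wins)
second = lb.acquire()  # db2
lb.release(first)      # db1's connection closes
third = lb.acquire()   # db1 again: 0 active connections vs db2's 1
print(first, second, third)  # db1 db2 db1
```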
Least Connection: At a Glance
| Aspect | Details |
|---|---|
| How it works | Routes to server with fewest active connections |
| Best for | Long session times, persistent connections, database workloads |
| Strengths | Adapts to real-time load, prevents server overload, handles slow requests gracefully |
| Weaknesses | Slight overhead for connection tracking |
| Use cases | SQL databases, LDAP directories, WebSocket applications, APIs with long-running requests |
First
The First algorithm takes a unique approach: it sends all traffic to the first server in the pool until that server reaches its maximum connection limit. Only then does traffic flow to the next server.
This algorithm is useful for active-passive configurations where you want one primary server to handle all load while others remain on standby. It's also valuable for licensing scenarios where you want to maximize utilization of a single licensed server before engaging additional capacity.
First provides predictable behavior and simplifies troubleshooting, since you know exactly which server is handling traffic. However, it provides no load distribution benefits and relies on correctly configured maximum connection limits to spill traffic over to the next server.
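The First algorithm's selection rule is a single ordered scan. A sketch assuming each server reports its active and maximum connection counts, and that servers failing health checks have already been removed from the list (the tuple layout and function name are hypothetical):

```python
from typing import List, Optional, Tuple


def pick_first(pool: List[Tuple[str, int, int]]) -> Optional[str]:
    """pool holds (name, active_connections, max_connections) in configured order."""
    for name, active, max_conn in pool:
        # Stay on the earliest server that still has room below its limit.
        if active < max_conn:
            return name
    return None  # every server is at its connection limit


print(pick_first([("primary", 42, 100), ("backup", 0, 100)]))   # primary
print(pick_first([("primary", 100, 100), ("backup", 0, 100)]))  # backup
```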
First: At a Glance
| Aspect | Details |
|---|---|
| How it works | First server receives all load until max connections reached |
| Best for | Active-passive setups, license optimization, predictable routing |
| Strengths | Simple, predictable, maximizes single server utilization |
| Weaknesses | No load distribution, depends on max connection configuration |
| Use cases | Primary-backup configurations, licensed software, capacity overflow scenarios |
Random
The Random algorithm selects servers randomly for each incoming request. However, unlike pure randomization, this implementation considers both server weights and response times in its selection probability.
This weighted random approach provides statistical load distribution while avoiding the predictable patterns of Round Robin. Over time, servers receive traffic proportional to their weights, but the random element prevents synchronized request patterns that can cause periodic load spikes.
Random selection is particularly effective in large server pools where the law of large numbers ensures even distribution. It's also useful when you want to avoid the "thundering herd" problem where multiple clients simultaneously target the same server.
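Weighted random selection maps directly onto Python's `random.choices`. The source does not say how TR7 folds response time into the probability, so dividing each weight by the server's average response time is purely an illustrative assumption here, as are the names:

```python
import random
from typing import Dict, Tuple


def pick_random(pool: Dict[str, Tuple[int, float]], rng: random.Random) -> str:
    """pool maps server name -> (configured weight, average response time in ms)."""
    names = list(pool)
    # ASSUMPTION: weight / response time is one illustrative way to favour faster servers.
    scores = [weight / max(rt_ms, 1e-3) for weight, rt_ms in pool.values()]
    return rng.choices(names, weights=scores, k=1)[0]


rng = random.Random(42)  # seeded so the demo is repeatable
pool = {"cache1": (1, 10.0), "cache2": (1, 10.0), "cache3": (2, 10.0)}
counts = {name: 0 for name in pool}
for _ in range(10_000):
    counts[pick_random(pool, rng)] += 1
print(counts)  # cache3 receives roughly half of the 10,000 picks
```

With equal response times the selection reduces to plain weighted random, so cache3 (weight 2) gets about twice the traffic of each weight-1 server over many requests, while short runs can still be uneven.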
Random: At a Glance
| Aspect | Details |
|---|---|
| How it works | Random server selection considering weight and response time |
| Best for | Large server pools, avoiding synchronized patterns |
| Strengths | Prevents thundering herd, statistically even distribution, considers performance |
| Weaknesses | Less predictable than Round Robin, may have short-term imbalances |
| Use cases | High-traffic applications, large clusters, cache servers |
Fastest
The Fastest algorithm routes requests to the server with the best response time. The load balancer continuously monitors server performance and directs traffic to whichever server is currently responding most quickly.
This approach optimizes for user experience by ensuring requests go to the most responsive server. It automatically adapts to changing conditions—if a server becomes slow due to high CPU, memory pressure, or external dependencies, traffic shifts to faster alternatives.
Fastest is ideal for latency-sensitive applications where response time directly impacts user experience or business metrics. E-commerce checkout flows, real-time APIs, and interactive applications all benefit from response-time-based routing.
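Routing to the fastest server needs a smoothed view of response time, so that a single slow request does not swing all traffic. The source does not say how TR7 measures this; the sketch assumes an exponentially weighted moving average (EWMA), a common choice, with a hypothetical smoothing factor `alpha`:

```python
from typing import Dict, List, Optional


class Fastest:
    """Route to the server with the lowest smoothed (EWMA) response time."""

    def __init__(self, servers: List[str], alpha: float = 0.3) -> None:
        self.alpha = alpha  # weight given to the newest sample
        self.ewma_ms: Dict[str, Optional[float]] = {s: None for s in servers}

    def record(self, server: str, response_ms: float) -> None:
        prev = self.ewma_ms[server]
        # The first sample seeds the average; later samples are blended in.
        if prev is None:
            self.ewma_ms[server] = response_ms
        else:
            self.ewma_ms[server] = self.alpha * response_ms + (1 - self.alpha) * prev

    def pick(self) -> str:
        # Send traffic to unmeasured servers first so every server gets a measurement.
        for server, avg in self.ewma_ms.items():
            if avg is None:
                return server
        return min(self.ewma_ms, key=lambda s: self.ewma_ms[s])


lb = Fastest(["app1", "app2"])
lb.record("app1", 120.0)
lb.record("app2", 80.0)
print(lb.pick())          # app2
lb.record("app2", 400.0)  # app2 slows down: 0.3*400 + 0.7*80, about 176 ms
print(lb.pick())          # app1
```

Lower `alpha` values react more slowly to change but resist noise; higher values shift traffic faster.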
Fastest+
Fastest+ is the most sophisticated algorithm, offering two-tier optimization with configurable criteria. You select a primary metric (Opt-1) for server selection, and a secondary metric (Opt-2) that breaks ties when multiple servers have equal primary values.
Available optimization criteria include: Least Response Time, Least Connection Time, Least Queue Time, Least Queues, Least Connection Error, Least Aborted Connections, and Least Used Connections. This flexibility allows fine-tuned optimization for your specific workload characteristics.
For example, you might configure Opt-1 as "Least Response Time" and Opt-2 as "Least Connection Error". The algorithm first selects servers with the best response times, then among those, chooses the one with fewest connection errors. This multi-criteria approach handles complex production scenarios where single metrics are insufficient.
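Because Opt-2 is consulted only on ties, Fastest+ selection amounts to a lexicographic comparison. A sketch with two hypothetical metric fields standing in for Least Response Time and Least Connection Error; it treats only exactly equal values as ties, whereas a real implementation might group near-equal values (an assumption the source does not cover):

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    response_ms: float      # stands in for Least Response Time
    connection_errors: int  # stands in for Least Connection Error


def fastest_plus(stats: dict, opt1: str, opt2: str) -> str:
    """Select on opt1; opt2 only decides between servers tied on opt1."""
    # Tuples compare element by element, so opt2 matters only when opt1 values are equal.
    return min(stats, key=lambda name: (getattr(stats[name], opt1), getattr(stats[name], opt2)))


stats = {
    "web1": ServerStats(response_ms=50.0, connection_errors=4),
    "web2": ServerStats(response_ms=50.0, connection_errors=0),
    "web3": ServerStats(response_ms=65.0, connection_errors=0),
}
print(fastest_plus(stats, "response_ms", "connection_errors"))  # web2
```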
Fastest+ Optimization Options
| Option | Description | Best For |
|---|---|---|
| Least Response Time | Server responding fastest to requests | Latency-sensitive applications |
| Least Connection Time | Server establishing connections fastest | High connection churn workloads |
| Least Queue Time | Server with shortest request queue wait | Bursty traffic patterns |
| Least Queues | Server with fewest queued requests | Avoiding request backlogs |
| Least Connection Error | Server with fewest failed connections | Reliability-critical applications |
| Least Aborted Connections | Server with fewest client disconnects | Long-running request workloads |
| Least Used Connections | Server with lowest connection utilization | Connection-pooled applications |
Fastest+ uses the secondary criterion (Opt-2) only when multiple servers tie on the primary criterion (Opt-1). This ensures optimal selection even in homogeneous environments where servers often have similar performance characteristics.
Persistence Methods (Self-Persistent)
Persistence methods ensure that related requests from the same client, session, or context always reach the same backend server. This is essential for stateful applications that store session data locally rather than in a shared store.
Source (IP Persistence)
Source persistence uses a hash of the client's source IP address for server selection. The hash value is combined with server weights to determine routing. The same client IP always produces the same hash, ensuring consistent routing to the same server.
This method provides session persistence without requiring cookies or application-level changes. All requests from a specific IP address go to the same server, maintaining any session state stored on that server.
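One way to combine an IP hash with server weights is to give each server a number of hash slots equal to its weight. The hash function (SHA-256), the slot layout, and the function name are assumptions for illustration, not TR7's documented method:

```python
import hashlib
import ipaddress


def pick_by_source_ip(client_ip: str, weights: dict) -> str:
    """Map a client IP to a server; a server with weight w owns w hash slots."""
    # hashlib gives a hash that is stable across processes, unlike built-in hash().
    packed = ipaddress.ip_address(client_ip).packed
    h = int.from_bytes(hashlib.sha256(packed).digest()[:8], "big")
    slot = h % sum(weights.values())
    # Walk the pool; each server covers `weight` consecutive slots.
    for server, weight in weights.items():
        if slot < weight:
            return server
        slot -= weight
    raise RuntimeError("unreachable: slot is always below the total weight")


weights = {"app1": 3, "app2": 1}
first = pick_by_source_ip("203.0.113.7", weights)
print(first == pick_by_source_ip("203.0.113.7", weights))  # True: same IP, same server
```

Because the slot is the hash modulo the total weight, adding or removing a server changes the modulus and moves many existing clients to different servers.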
Source persistence has limitations in NAT environments, where many users share one IP address, and with mobile users, whose IP addresses may change. For these scenarios, application-layer methods that key on a user or session identifier (URL Param, HDR) provide better results.
URI (Path Persistence)
URI persistence hashes the request URI path to determine server routing. The URI text up to a specified length (or until the '?' character if query parameters exist) is hashed and combined with server weights. Same URIs always route to the same server.
Configuration options include URI character length and URI depth (number of path segments to consider). For example, with depth 2, both '/api/users/123' and '/api/users/456' would hash the same '/api/users' prefix.
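Key extraction is where the length and depth options come into play. A sketch assuming depth counts path segments (which reproduces the '/api/users' example above) and that length is applied after depth; weights are omitted to keep the focus on the key, and the function names are hypothetical:

```python
import hashlib
from typing import List


def uri_hash_key(uri: str, length: int = 0, depth: int = 0) -> str:
    """Return the portion of the URI that is hashed (0 means no limit)."""
    path = uri.split("?", 1)[0]  # stop at the query string
    if depth:
        # Keep only the first `depth` segments: depth 2 turns /api/users/123 into /api/users.
        segments = [seg for seg in path.split("/") if seg]
        path = "/" + "/".join(segments[:depth])
    if length:
        path = path[:length]  # then cap the number of characters
    return path


def pick_by_uri(uri: str, servers: List[str], length: int = 0, depth: int = 0) -> str:
    key = uri_hash_key(uri, length, depth).encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return servers[h % len(servers)]


servers = ["cache1", "cache2", "cache3"]
print(uri_hash_key("/api/users/123", depth=2))  # /api/users
print(uri_hash_key("/img/logo.png?v=3"))        # /img/logo.png
print(pick_by_uri("/api/users/123", servers, depth=2)
      == pick_by_uri("/api/users/456", servers, depth=2))  # True
```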
This method is excellent for caching scenarios where you want all requests for the same resource to hit the same server, maximizing cache efficiency. It's also useful for sharded backends where different URI patterns map to different data partitions.
URL Param (Parameter Persistence)
URL Param persistence extracts a specified parameter from the URL (or POST body) and uses its value for server routing. This is typically used to track user IDs, session tokens, or other application-specific identifiers. Same parameter values always route to the same server.
You configure the URL parameter name to extract and optionally enable POST parameter checking for form submissions. This provides application-aware persistence that follows user sessions regardless of IP address changes.
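A sketch of URL Param persistence using the standard library's query-string parser. The function names, the choice of SHA-256, and returning `None` when the parameter is missing (so the caller can fall back to a distribution algorithm) are assumptions for illustration:

```python
import hashlib
from typing import List, Optional
from urllib.parse import parse_qs, urlsplit


def param_value(url: str, name: str, post_body: str = "", check_post: bool = False) -> Optional[str]:
    # Look for the parameter in the query string first.
    values = parse_qs(urlsplit(url).query).get(name)
    if not values and check_post:
        # Optionally also check a form-encoded (application/x-www-form-urlencoded) POST body.
        values = parse_qs(post_body).get(name)
    return values[0] if values else None


def pick_by_param(url: str, servers: List[str], name: str,
                  post_body: str = "", check_post: bool = False) -> Optional[str]:
    value = param_value(url, name, post_body, check_post)
    if value is None:
        return None  # no identifier: the caller falls back to a distribution algorithm
    h = int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")
    return servers[h % len(servers)]


servers = ["app1", "app2", "app3"]
a = pick_by_param("/cart?user_id=42", servers, "user_id")
b = pick_by_param("/checkout?step=2&user_id=42", servers, "user_id")
c = pick_by_param("/login", servers, "user_id", post_body="user_id=42", check_post=True)
print(a == b == c)  # True: the same user ID always maps to the same server
```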
This method is ideal for applications that embed session or user identifiers in URLs or form data. It provides more reliable persistence than IP-based methods for mobile users or those behind NAT.
HDR (Header Persistence)
HDR persistence examines a specified HTTP header in each request and routes based on its content. Requests with the same header value always go to the same server. You configure which header name to inspect.
Common use cases include routing based on custom session headers, API keys, tenant identifiers in multi-tenant applications, or JWT tokens. This provides maximum flexibility for applications that manage their own session identifiers.
HDR persistence is particularly valuable for API-first architectures and microservices where session state is managed through headers rather than cookies. It integrates smoothly with token-based authentication systems.
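Header persistence follows the same hash-and-route pattern, with one detail worth handling explicitly: HTTP header names are case-insensitive. A sketch with hypothetical names, keyed on a tenant header:

```python
import hashlib
from collections.abc import Mapping
from typing import List, Optional


def pick_by_header(headers: Mapping, header_name: str, servers: List[str]) -> Optional[str]:
    # HTTP header names are case-insensitive, so match them lowercased.
    wanted = header_name.lower()
    value = next((v for k, v in headers.items() if k.lower() == wanted), None)
    if value is None:
        return None  # header absent: the caller falls back to normal distribution
    h = int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")
    return servers[h % len(servers)]


servers = ["app1", "app2", "app3"]
x = pick_by_header({"X-Tenant-ID": "acme"}, "X-Tenant-ID", servers)
y = pick_by_header({"x-tenant-id": "acme", "Accept": "*/*"}, "X-Tenant-ID", servers)
print(x == y)  # True
```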
Hash (Advanced Custom Persistence)
Hash persistence is the most powerful and flexible method, allowing you to build custom persistence keys from virtually any element in the traffic flow. The load balancer maintains a hash table (up to 3 million entries by default) mapping custom key values to backend servers, with configurable expiration (default 7 days).
The hash key can be constructed from hundreds of available variables: client IP and port, timestamps, SSL certificate fields, frontend information, URL path and method, HTTP headers, request body content, WAF processing results, and many more. You can combine multiple variables and apply transformation functions to create precisely the persistence logic your application requires.
For example, you could create a hash key that extracts the country from the client IP, checks whether it is in a specific list, and then combines the result with the username from the SSL client certificate. All requests that produce the same hash value from this combination (users from the same region with the same certificate identity) are always directed to the same backend server. This gives very granular persistence control while keeping application session state intact. This level of customization supports persistence scenarios that single-variable methods such as Source, URI, or HDR cannot express, and it is one of TR7's differentiating capabilities.
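The hash table behind this method is essentially a bounded key-to-server map with expiry. A sketch whose defaults mirror the documented 3 million entries and 7-day expiration; least-recently-used eviction when the table is full, and expiry measured from the time an entry is stored, are assumptions, and the class name is hypothetical:

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class StickTable:
    """Bounded key -> server map with expiry (defaults: 3 million entries, 7 days)."""

    def __init__(self, max_entries: int = 3_000_000, ttl_s: float = 7 * 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_entries = max_entries
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: OrderedDict = OrderedDict()  # key -> (server, expires_at)

    def lookup(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        server, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: forget the mapping
            return None
        self._entries.move_to_end(key)  # ASSUMPTION: LRU order for eviction
        return server

    def store(self, key: str, server: str) -> None:
        self._entries[key] = (server, self.clock() + self.ttl_s)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used key


now = [0.0]  # fake clock so the demo does not wait 7 days
table = StickTable(max_entries=2, ttl_s=60, clock=lambda: now[0])
table.store("alice", "app2")
print(table.lookup("alice"))  # app2
now[0] = 61.0
print(table.lookup("alice"))  # None (entry expired)
```

On a miss, the load balancer would choose a server with its distribution algorithm and store the result, so later requests with the same key stick to that server.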
Hash Key Building Blocks
The hash key can be constructed from any combination of these traffic elements:
| Category | Available Variables | Example Use Case |
|---|---|---|
| Network Layer | Client IP, Client Port, Server IP, Server Port | Geo-based routing, network segment affinity |
| SSL/TLS | Certificate CN, Certificate DN, SNI, Cipher Suite | Client certificate-based routing, mTLS scenarios |
| HTTP Request | Method, Path, URL, Query Parameters, Host Header | Content-based routing, API versioning |
| HTTP Headers | Any header value (Authorization, X-Tenant-ID, etc.) | Multi-tenant routing, API key affinity |
| Request Body | POST parameters, JSON fields, Form data | Transaction-based persistence |
| Context | Time, Date, Frontend name, WAF decision, GeoIP country | Time-based routing, compliance routing |
Hash keys support functions for transformation: string manipulation (substring, regex), encoding (base64, URL encode), lookups (GeoIP country, ASN), and conditional logic. Combine these to build complex persistence rules. For example: 'If client is from EU countries AND has valid client certificate, build hash key from certificate CN; otherwise build from Authorization header'—ensuring requests matching the same conditions always reach the same backend server.
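The conditional rule quoted above can be expressed as a small key-building function. The `RequestContext` fields stand in for the load balancer's GeoIP and certificate variables, and the country list is a hypothetical subset; none of these names are TR7 syntax:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical, partial country list used only for illustration.
EU_COUNTRIES = {"DE", "FR", "IT", "ES", "NL"}


@dataclass
class RequestContext:
    geoip_country: str              # country code from a GeoIP lookup of the client IP
    cert_valid: bool                # client certificate passed verification
    cert_cn: Optional[str] = None   # certificate Common Name, if any
    headers: Dict[str, str] = field(default_factory=dict)


def build_hash_key(ctx: RequestContext) -> Optional[str]:
    """EU client with a valid certificate: key on the CN; otherwise key on Authorization."""
    if ctx.geoip_country in EU_COUNTRIES and ctx.cert_valid and ctx.cert_cn:
        return "cn:" + ctx.cert_cn  # prefixes keep the two key sources from colliding
    auth = ctx.headers.get("Authorization")
    return "auth:" + auth if auth else None


print(build_hash_key(RequestContext("DE", True, "alice")))  # cn:alice
print(build_hash_key(RequestContext("US", True, "alice", {"Authorization": "Bearer abc"})))  # auth:Bearer abc
```

Every request that yields the same key string would then map to the same backend through the persistence table.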
Persistence Methods Comparison
| Method | Based On | Configuration | Best For |
|---|---|---|---|
| Source | Client IP address hash | None (automatic) | Simple web applications, legacy systems |
| URI | Request path hash | URI length, URI depth | Caching, content routing, sharded backends |
| URL Param | URL/POST parameter value | Parameter name, POST check option | Session tracking, user-specific routing |
| HDR | HTTP header value | Header name | API authentication, multi-tenant apps, JWT routing |
| New Cookie | LB-managed cookie | Cookie name, max-idle, max-life | No app changes needed, session timeout control |
| Current Cookie | Existing app cookie | Cookie name to track | Leverage existing app sessions |
| Hash | Custom key expression | Key variables and functions; defaults of 3M entries and 7-day TTL | Complex multi-factor persistence, ultimate flexibility |
Algorithm Selection Guide
Selecting the right algorithm depends on your specific requirements. This comparison highlights the key trade-offs:
| Algorithm | Load Awareness | Complexity | Persistence | Primary Use Case |
|---|---|---|---|---|
| Round Robin | None | Minimal | No | Homogeneous stateless workloads |
| Weighted Round Robin | Static (weights) | Low | No | Mixed server capacities |
| Least Connection | Dynamic (connections) | Medium | No | Long sessions, databases |
| First | Static (max connections) | Minimal | No | Active-passive, license optimization |
| Random | Dynamic (response time) | Low | No | Large clusters, cache servers |
| Fastest | Dynamic (response time) | Medium | No | Latency-sensitive applications |
| Fastest+ | Multi-criteria | High | No | Complex production environments |
| Source | Via weights | Low | Yes (IP) | Simple session persistence |
| URI | Via weights | Medium | Yes (path) | Caching, content routing |
| URL Param | Via weights | Medium | Yes (user ID) | User session tracking |
| HDR | Via weights | Medium | Yes (header) | API routing, multi-tenant |
Choosing Your Algorithm
Use Distribution Algorithms When
- Application is stateless or uses a shared session store
- You need to spread load across server pool
- Servers have different capacities (use Weighted)
- Response time optimization is critical (use Fastest/Fastest+)
Use Persistence Methods When
- Application stores session data locally on servers
- Backend services are sharded by user or content
- Caching efficiency requires consistent routing
- Token or header-based authentication is used
Use Fastest+ When
- A single metric doesn't capture your optimization goal
- You need tie-breaking logic for homogeneous servers
- Both performance and reliability matter
- Complex production environment with varied workloads
Implementation Best Practices
Regardless of the algorithm chosen, these practices ensure optimal load balancing performance:
Implement Robust Health Checks
Configure active health checks that verify application functionality, not just TCP connectivity. A server responding to pings but returning 500 errors should be removed from rotation.
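Two ideas from this practice can be sketched: a probe should pass only on a successful HTTP status plus the expected content, and a server should leave rotation after several consecutive failures rather than one. The fall/rise thresholds and all names are assumptions modelled on common load balancer settings:

```python
from typing import Optional


def probe_ok(status: Optional[int], body: str, expect: str) -> bool:
    """Pass only on a 2xx status AND the expected content; None means the request failed."""
    return status is not None and 200 <= status < 300 and expect in body


class HealthState:
    """Mark a server down after `fall` consecutive failures, up after `rise` successes."""

    def __init__(self, fall: int = 3, rise: int = 2) -> None:
        self.fall = fall
        self.rise = rise
        self.up = True
        self._streak = 0  # consecutive results that disagree with the current state

    def record(self, ok: bool) -> bool:
        if ok == self.up:
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= (self.fall if self.up else self.rise):
                self.up = not self.up
                self._streak = 0
        return self.up


state = HealthState(fall=3, rise=2)
# A server that accepts TCP connections but answers 500 is removed on the third failure.
print([state.record(probe_ok(500, "Internal Server Error", "OK")) for _ in range(3)])
# [True, True, False]
print([state.record(probe_ok(200, "status: OK", "OK")) for _ in range(2)])
# [False, True]
```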
Monitor and Adjust Weights
For weighted algorithms, review weight assignments quarterly or after infrastructure changes. Benchmark servers under realistic load to determine accurate capacity ratios.
Choose Persistence Wisely
Design applications to be stateless when possible, storing sessions in external stores like Redis. If persistence is required, choose the method that best matches your session identifier (IP, URI, parameter, or header).
Test Failover Scenarios
Regularly test server failures to ensure graceful failover. Verify that algorithm behavior is correct when servers are removed and re-added to the pool.
Use Fastest+ for Complex Scenarios
When single-metric algorithms don't meet your needs, configure Fastest+ with primary and secondary criteria. Start with Least Response Time as Opt-1 and Least Connection Error as Opt-2.
Frequently Asked Questions
Can persistence methods still account for differences in server capacity?
Yes, persistence methods (Source, URI, URL Param, HDR) already incorporate server weights in their routing decisions. This means you get both session affinity and capacity-aware distribution. The hash-based selection is weighted according to server configurations.
When should I use Fastest instead of Fastest+?
Use Fastest when response time is your only concern and servers have clearly different performance characteristics. Use Fastest+ when you need more nuanced selection: for example, optimizing for response time but preferring servers with fewer errors when response times are similar.
What happens to sessions when a persistent server fails?
When a persistent server becomes unavailable, requests are re-routed to another server based on the persistence algorithm's hash redistribution. The session data on the failed server is lost unless your application uses external session storage. Health checks ensure failed servers are removed from the pool quickly.
How does Least Connection differ from Least Used Connections in Fastest+?
Least Connection as a standalone algorithm routes to the server with the fewest active connections. 'Least Used Connections' in Fastest+ considers connection utilization relative to the server's capacity, making it more appropriate for heterogeneous server environments.
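The difference shows up when servers have different capacities. A sketch assuming "utilization" means active connections divided by the server's maximum connections (an interpretation, not a documented formula; function names are hypothetical):

```python
from typing import Dict


def least_connection(active: Dict[str, int]) -> str:
    # Raw count: fewest active connections wins.
    return min(active, key=lambda s: active[s])


def least_used(active: Dict[str, int], max_conn: Dict[str, int]) -> str:
    # Utilization = active / capacity, so a large server can carry more connections.
    return min(active, key=lambda s: active[s] / max_conn[s])


active = {"big": 40, "small": 8}
capacity = {"big": 200, "small": 10}
print(least_connection(active))      # small (8 < 40)
print(least_used(active, capacity))  # big (20% utilized vs 80%)
```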
Which persistence method works best for mobile applications?
For mobile applications, avoid Source (IP) persistence since mobile devices frequently change IP addresses. Use URL Param persistence with a user ID or session token, or HDR persistence with an authentication header. These methods follow the user's session regardless of network changes.
Conclusion
Load balancing algorithms are foundational to application delivery, yet often overlooked during architecture decisions. The right algorithm ensures even server utilization, optimal response times, and resilient failover—while the wrong choice can create hotspots, degrade performance, and complicate troubleshooting.
For most production environments, start with Least Connection for dynamic workloads or Weighted Round Robin for static capacity distribution. As your monitoring capabilities mature, explore Fastest+ for multi-criteria optimization. Choose persistence methods based on your session management strategy—Source for simplicity, or URI/URL Param/HDR for application-aware routing.
Intelligent Traffic Distribution
TR7 Load Balancer supports all major algorithms including advanced Fastest+ with multi-criteria optimization, plus flexible persistence options for session-aware routing. Optimize your application delivery with enterprise-grade load balancing.