Introduction

Load balancing is the backbone of modern application delivery. When thousands—or millions—of users access your application simultaneously, a single server cannot handle the load. Load balancers distribute incoming traffic across multiple backend servers, ensuring high availability, optimal performance, and fault tolerance.

But how does a load balancer decide which server should handle each request? The answer lies in load balancing algorithms. Choosing the right algorithm can mean the difference between a responsive application and one plagued by timeouts and uneven server loads.

This guide explores load balancing algorithms in two categories: distribution algorithms that determine how traffic is spread across servers, and persistence methods that ensure session continuity for stateful applications.

Why Algorithm Choice Matters

The right load balancing algorithm directly impacts application performance, user experience, and infrastructure efficiency:

  • 40% latency reduction: response-time improvement from optimal algorithm selection compared with basic Round Robin (NGINX Performance Study)
  • 99.99% uptime target: the enterprise availability requirement, which allows only about 52 minutes of downtime per year (industry-standard SLA)
  • 53% user abandonment: users leave if a page takes more than 3 seconds to load (Google Web Performance Study)
  • 3x throughput gain: capacity increase from intelligent distribution compared with a single server (Load Balancing Best Practices)

Algorithm Categories

Load balancing algorithms fall into two main categories, distribution algorithms and persistence methods, and two further groupings build on them:

Distribution Algorithms

Determine how traffic is spread across servers—sequentially, randomly, or based on real-time server metrics like connections or response time.

Persistence Methods

Ensure session continuity by routing requests from the same client, URI, or user ID to the same server consistently.

Performance-Based

Advanced algorithms that consider server health metrics—response time, queue depth, connection errors—for optimal routing decisions.

Hybrid Approaches

Combine distribution with persistence, or use multi-criteria selection (like Fastest+) for sophisticated load balancing scenarios.

Distribution Algorithms

Distribution algorithms focus on spreading traffic across your server pool. The choice depends on your server infrastructure, request characteristics, and performance requirements.

Round Robin

Round Robin is the simplest and most widely deployed load balancing algorithm. It works exactly as its name suggests: requests are distributed to servers in a circular, sequential order. The first request goes to Server 1, the second to Server 2, and so on. After reaching the last server, the cycle starts over.

This algorithm assumes all servers have equal capacity and all requests require similar processing power. It requires no state tracking beyond knowing which server is next in the rotation, making it extremely efficient with minimal computational overhead.

Round Robin excels in homogeneous environments where servers have identical specifications and requests are relatively uniform—such as serving static content, simple API endpoints, or stateless microservices.
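The rotation can be sketched in a few lines of Python. The class and server names are illustrative, not a TR7 API; the point is that the only state kept is the index of the next server.

```python
class RoundRobin:
    """Hands out servers in a fixed circular order."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = servers
        self.next_index = 0  # the only state: which server is next

    def pick(self) -> str:
        server = self.servers[self.next_index % len(self.servers)]
        self.next_index += 1
        return server


lb = RoundRobin(["server1", "server2", "server3"])
print([lb.pick() for _ in range(5)])
# ['server1', 'server2', 'server3', 'server1', 'server2']
```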

Round Robin: At a Glance

Aspect | Details
How it works | Sequential distribution: Server 1 → Server 2 → Server 3 → repeat
Best for | Homogeneous servers, uniform request loads, stateless applications
Strengths | Simple, predictable, zero computational overhead, easy to debug
Weaknesses | Ignores server load and capacity differences
Use cases | Static content, CDN edge nodes, stateless APIs

Weighted Round Robin

Weighted Round Robin extends the basic algorithm by assigning a weight to each server based on its capacity. Servers with higher weights receive proportionally more traffic. If Server A has weight 3 and Server B has weight 1, Server A handles three requests for every one that Server B handles.

This algorithm is essential for heterogeneous server environments. Organizations often run a mix of hardware, such as powerful new servers alongside older machines, or cloud instances with different vCPU counts. With appropriate weights, Weighted Round Robin sends more traffic to a 16-core server than to a 4-core server.

Weights are typically configured based on CPU cores, memory, or benchmark testing. While more flexible than simple Round Robin, this algorithm still does not account for real-time server load—a heavily weighted server under high load will continue receiving traffic based on its weight, not its current capacity.

Setting Weights

Start with weights proportional to CPU cores or memory. For example, if Server A has 8 cores and Server B has 4 cores, assign weights 8 and 4 (or 2 and 1). Monitor and adjust based on actual performance metrics—throughput, response times, and error rates.
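One common way to implement weighting is the "smooth" weighted round robin used by NGINX's upstream module, sketched below in Python for the Server A (weight 3) / Server B (weight 1) example. The source does not say which variant TR7 uses, so treat this as one possible implementation; the class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int
    current: int = 0  # running score used by the smooth algorithm


class SmoothWeightedRoundRobin:
    """Each pick: add every weight to its score, choose the highest score,
    then subtract the total weight from the winner."""

    def __init__(self, backends: list[Backend]) -> None:
        self.backends = backends
        self.total_weight = sum(b.weight for b in backends)

    def pick(self) -> str:
        for b in self.backends:
            b.current += b.weight
        best = max(self.backends, key=lambda b: b.current)  # ties go to the first listed
        best.current -= self.total_weight
        return best.name


lb = SmoothWeightedRoundRobin([Backend("A", 3), Backend("B", 1)])
print([lb.pick() for _ in range(8)])
# ['A', 'A', 'B', 'A', 'A', 'A', 'B', 'A']
```

A naive implementation would send A's three requests back to back; the smooth variant slots B's turn between them while keeping the same 3:1 ratio.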

Least Connection

Least Connection takes a dynamic approach: each new request goes to the server with the fewest active connections at that moment. Unlike Round Robin, this algorithm adapts to real-time conditions—if one server is processing many slow requests, new requests are routed to less busy servers.

This algorithm is particularly recommended for servers handling long-lived sessions. Database connections (SQL), directory services (LDAP), and applications with persistent connections benefit significantly from Least Connection distribution.

The load balancer tracks active connections to each backend server. When a new request arrives, it queries this connection table and selects the server with the lowest count. This small overhead is negligible compared to the performance benefits for connection-heavy workloads.
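A minimal Python sketch of the connection table described above, assuming the load balancer is told when each connection opens and closes. Names are hypothetical, and ties are broken by pool order, which is an assumption rather than documented behavior.

```python
class LeastConnections:
    """Tracks active connections per server and picks the least loaded one."""

    def __init__(self, servers: list[str]) -> None:
        self.active = {s: 0 for s in servers}  # the connection table

    def acquire(self) -> str:
        # min() scans in pool order, so ties go to the first server listed
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1


lb = LeastConnections(["db1", "db2"])
slow = lb.acquire()   # db1 now holds a long-running query
quick = lb.acquire()  # db2
lb.release(quick)     # db2's request finishes first
print(lb.acquire())   # db2, because db1 is still busy
```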

Least Connection: At a Glance

Aspect | Details
How it works | Routes to the server with the fewest active connections
Best for | Long-lived sessions, persistent connections, database workloads
Strengths | Adapts to real-time load, prevents server overload, handles slow requests gracefully
Weaknesses | Slight overhead for connection tracking
Use cases | SQL databases, LDAP directories, WebSocket applications, APIs with long-running requests

First

The First algorithm takes a unique approach: it sends all traffic to the first server in the pool until that server reaches its maximum connection limit. Only then does traffic flow to the next server.

This algorithm is useful for active-passive configurations where you want one primary server to handle all load while others remain on standby. It's also valuable for licensing scenarios where you want to maximize utilization of a single licensed server before engaging additional capacity.

First provides predictable behavior and simplifies troubleshooting, since you know exactly which server is handling traffic. However, it does not distribute load, and it depends entirely on correctly configured max connection limits to overflow traffic to the next server.
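The selection rule can be sketched as follows (HAProxy exposes a similar mode as `balance first`). The `Server` fields and the behavior when every server is full are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    name: str
    max_conn: int
    active: int = 0


def pick_first(pool: list[Server]) -> Optional[Server]:
    """Return the first server in pool order that is below its connection limit."""
    for server in pool:
        if server.active < server.max_conn:
            return server
    return None  # all servers full: queue or reject (assumed, not from the source)


pool = [Server("primary", max_conn=2), Server("backup", max_conn=2)]
picked = []
for _ in range(3):
    server = pick_first(pool)
    server.active += 1
    picked.append(server.name)
print(picked)  # ['primary', 'primary', 'backup']
```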

First: At a Glance

Aspect | Details
How it works | The first server receives all load until its max connections are reached
Best for | Active-passive setups, license optimization, predictable routing
Strengths | Simple, predictable, maximizes single-server utilization
Weaknesses | No load distribution, depends on max connection configuration
Use cases | Primary-backup configurations, licensed software, capacity overflow scenarios

Random

The Random algorithm selects servers randomly for each incoming request. However, unlike pure randomization, this implementation considers both server weights and response times in its selection probability.

This weighted random approach provides statistical load distribution while avoiding the predictable patterns of Round Robin. Over time, servers receive traffic proportional to their weights, but the random element prevents synchronized request patterns that can cause periodic load spikes.

Random selection is particularly effective in large server pools where the law of large numbers ensures even distribution. It's also useful when you want to avoid the "thundering herd" problem where multiple clients simultaneously target the same server.
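The source says Random weighs its choice by server weight and response time but does not give the formula. The sketch below assumes, hypothetically, a selection probability proportional to weight divided by average response time, and uses Python's `random.choices`.

```python
import random


def pick_random(servers: list[tuple[str, int, float]], rng: random.Random) -> str:
    """servers holds (name, weight, avg_response_ms) tuples.

    Hypothetical scoring: probability proportional to weight / response time,
    so heavier and faster servers are chosen more often.
    """
    names = [name for name, _, _ in servers]
    scores = [weight / max(resp_ms, 0.001) for _, weight, resp_ms in servers]
    return rng.choices(names, weights=scores, k=1)[0]


rng = random.Random(7)  # seeded only so the demo is repeatable
pool = [("cache1", 1, 10.0), ("cache2", 1, 40.0)]
picks = [pick_random(pool, rng) for _ in range(10_000)]
print(picks.count("cache1") / len(picks))  # roughly 0.8
```

Because each pick is independent, short runs can be uneven, which is the short-term imbalance noted in the table below; over many requests the shares converge on the score ratio.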

Random: At a Glance

Aspect | Details
How it works | Random server selection considering weight and response time
Best for | Large server pools, avoiding synchronized patterns
Strengths | Prevents thundering herd, statistically even distribution, considers performance
Weaknesses | Less predictable than Round Robin, may have short-term imbalances
Use cases | High-traffic applications, large clusters, cache servers

Fastest

The Fastest algorithm routes requests to the server with the best response time. The load balancer continuously monitors server performance and directs traffic to whichever server is currently responding most quickly.

This approach optimizes for user experience by ensuring requests go to the most responsive server. It automatically adapts to changing conditions—if a server becomes slow due to high CPU, memory pressure, or external dependencies, traffic shifts to faster alternatives.

Fastest is ideal for latency-sensitive applications where response time directly impacts user experience or business metrics. E-commerce checkout flows, real-time APIs, and interactive applications all benefit from response-time-based routing.
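A sketch of response-time routing in Python. How TR7 measures "best response time" is not specified; this version assumes an exponentially weighted moving average (EWMA) per server with a hypothetical smoothing factor `alpha`, so one slow response nudges the average instead of flipping routing outright.

```python
from typing import Optional


class Fastest:
    """Routes to the server with the lowest smoothed response time."""

    def __init__(self, servers: list[str], alpha: float = 0.3) -> None:
        self.alpha = alpha  # weight given to the newest sample (assumed value)
        self.avg_ms: dict[str, Optional[float]] = {s: None for s in servers}

    def record(self, server: str, response_ms: float) -> None:
        prev = self.avg_ms[server]
        if prev is None:
            self.avg_ms[server] = response_ms
        else:
            # exponentially weighted moving average smooths out one-off spikes
            self.avg_ms[server] = self.alpha * response_ms + (1 - self.alpha) * prev

    def pick(self) -> str:
        # Unmeasured servers go first so every server gets a baseline.
        for server, avg in self.avg_ms.items():
            if avg is None:
                return server
        return min(self.avg_ms, key=lambda s: self.avg_ms[s])


lb = Fastest(["app1", "app2"])
lb.record("app1", 120.0)
lb.record("app2", 40.0)
print(lb.pick())          # app2
lb.record("app2", 400.0)  # app2 slows down: its average rises to about 148 ms
print(lb.pick())          # app1
```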

Fastest+

Fastest+ is the most sophisticated algorithm, offering two-tier optimization with configurable criteria. You select a primary metric (Opt-1) for server selection, and a secondary metric (Opt-2) that breaks ties when multiple servers have equal primary values.

Available optimization criteria include: Least Response Time, Least Connection Time, Least Queue Time, Least Queues, Least Connection Error, Least Aborted Connections, and Least Used Connections. This flexibility allows fine-tuned optimization for your specific workload characteristics.

For example, you might configure Opt-1 as "Least Response Time" and Opt-2 as "Least Connection Error". The algorithm first selects servers with the best response times, then among those, chooses the one with fewest connection errors. This multi-criteria approach handles complex production scenarios where single metrics are insufficient.
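The two-tier rule maps naturally onto lexicographic tuple comparison. This Python sketch uses the Opt-1/Opt-2 pair from the example above; the `ServerStats` fields are hypothetical stand-ins for the metrics the load balancer collects.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ServerStats:
    name: str
    response_ms: float
    connection_errors: int


def pick_fastest_plus(
    servers: list[ServerStats],
    opt1: Callable[[ServerStats], float],
    opt2: Callable[[ServerStats], float],
) -> ServerStats:
    # Tuples compare element by element, so opt2 only decides between
    # servers whose opt1 values are exactly equal.
    return min(servers, key=lambda s: (opt1(s), opt2(s)))


servers = [
    ServerStats("web1", response_ms=20.0, connection_errors=5),
    ServerStats("web2", response_ms=20.0, connection_errors=1),
    ServerStats("web3", response_ms=35.0, connection_errors=0),
]
best = pick_fastest_plus(
    servers,
    opt1=lambda s: s.response_ms,        # Opt-1: Least Response Time
    opt2=lambda s: s.connection_errors,  # Opt-2: Least Connection Error
)
print(best.name)  # web2
```

Note that web3, despite having zero errors, is never considered: it loses on the primary criterion, so the secondary one is not consulted.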

Fastest+ Optimization Options

Option | Description | Best For
Least Response Time | Server responding fastest to requests | Latency-sensitive applications
Least Connection Time | Server establishing connections fastest | High connection churn workloads
Least Queue Time | Server with shortest request queue wait | Bursty traffic patterns
Least Queues | Server with fewest queued requests | Avoiding request backlogs
Least Connection Error | Server with fewest failed connections | Reliability-critical applications
Least Aborted Connections | Server with fewest client disconnects | Long-running request workloads
Least Used Connections | Server with lowest connection utilization | Connection-pooled applications

Two-Tier Selection

Fastest+ consults the secondary criterion (Opt-2) only when multiple servers tie on the primary criterion (Opt-1). This makes the secondary criterion most useful in homogeneous environments, where servers often have similar performance characteristics and therefore tie on the primary metric.

Persistence Methods (Self-Persistent)

Persistence methods ensure that related requests from the same client, session, or context always reach the same backend server. This is essential for stateful applications that store session data locally rather than in a shared store.

Source (IP Persistence)

Source persistence uses a hash of the client's source IP address for server selection. The hash value is combined with server weights to determine routing. The same client IP always produces the same hash, ensuring consistent routing to the same server.

This method provides session persistence without requiring cookies or application-level changes. All requests from a specific IP address go to the same server, maintaining any session state stored on that server.

Source persistence has limitations in NAT environments, where many users share one IP address and all land on the same server, and with mobile users, whose IP addresses may change mid-session. For these scenarios, application-layer methods that follow a user or session identifier (URL Param or HDR) provide better results.
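One simple way to combine a hash with server weights is to expand each server into as many slots as its weight and index the slot list by the hash value. The sketch below assumes that scheme and uses CRC-32 as the hash; TR7's actual hash function and weighting scheme are not described in the source, and the function name is hypothetical.

```python
import zlib


def weighted_hash_pick(key: str, servers: list[tuple[str, int]]) -> str:
    """Map a persistence key to a server, in proportion to server weights."""
    # Expand weights into slots: [("a", 2), ("b", 1)] -> ["a", "a", "b"]
    slots = [name for name, weight in servers for _ in range(weight)]
    # crc32 gives the same value in every process; Python's built-in
    # hash() of a str is randomized per process, so it is not used here.
    return slots[zlib.crc32(key.encode()) % len(slots)]


servers = [("app1", 3), ("app2", 1)]
first = weighted_hash_pick("203.0.113.7", servers)
again = weighted_hash_pick("203.0.113.7", servers)
print(first == again)  # True: the same client IP always lands on the same server
```

With this modulo mapping, adding or removing a server changes the slot count and can remap a large share of clients, not only those on the affected server; consistent hashing is the usual alternative when that matters.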

URI (Path Persistence)

URI persistence hashes the request URI path to determine server routing. The URI text up to a specified length (or until the '?' character if query parameters exist) is hashed and combined with server weights. Identical URIs always route to the same server.

Configuration options include URI character length and URI depth (number of path segments to consider). For example, with depth 2, both '/api/users/123' and '/api/users/456' would hash the same '/api/users' prefix.
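The key-extraction step can be sketched as below. The function name is hypothetical, and the order in which depth and length are applied (depth first, then truncation) is an assumption; the resulting string would then be hashed and combined with server weights as described above.

```python
from typing import Optional


def uri_hash_key(uri: str, max_len: Optional[int] = None, depth: Optional[int] = None) -> str:
    """Build the string that URI persistence would hash."""
    path = uri.split("?", 1)[0]  # stop at '?': the query string is not hashed
    if depth is not None:
        segments = [seg for seg in path.split("/") if seg]
        path = "/" + "/".join(segments[:depth])  # keep the first `depth` segments
    if max_len is not None:
        path = path[:max_len]  # then truncate to the configured length
    return path


print(uri_hash_key("/api/users/123", depth=2))  # /api/users
print(uri_hash_key("/api/users/456", depth=2))  # /api/users
print(uri_hash_key("/img/logo.png?v=3"))        # /img/logo.png
```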

This method is excellent for caching scenarios where you want all requests for the same resource to hit the same server, maximizing cache efficiency. It's also useful for sharded backends where different URI patterns map to different data partitions.

URL Param (Parameter Persistence)

URL Param persistence extracts a specified parameter from the URL (or POST body) and uses its value for server routing. This is typically used to track user IDs, session tokens, or other application-specific identifiers. Requests carrying the same parameter value always route to the same server.

You configure the URL parameter name to extract and optionally enable POST parameter checking for form submissions. This provides application-aware persistence that follows user sessions regardless of IP address changes.
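A sketch of the parameter lookup using Python's `urllib.parse`. The function name is hypothetical, and the sketch assumes a form-encoded POST body. What the load balancer does when the parameter is missing (for example, falling back to another algorithm) is not stated in the source, so the sketch simply returns `None`.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def url_param_key(url: str, param: str, post_body: Optional[str] = None) -> Optional[str]:
    """Return the value URL Param persistence would hash, or None if absent."""
    values = parse_qs(urlsplit(url).query).get(param)
    if values is None and post_body is not None:
        # Optional POST check: also look in a form-encoded request body
        values = parse_qs(post_body).get(param)
    return values[0] if values else None


print(url_param_key("/cart?user_id=42&page=2", "user_id"))                   # 42
print(url_param_key("/login", "user_id", post_body="user_id=7&remember=1"))  # 7
print(url_param_key("/login", "user_id"))                                    # None
```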

This method is ideal for applications that embed session or user identifiers in URLs or form data. It provides more reliable persistence than IP-based methods for mobile users or those behind NAT.

HDR (Header Persistence)

HDR persistence examines a specified HTTP header in each request and routes based on its content. Requests with the same header value always go to the same server. You configure which header name to inspect.

Common use cases include routing based on custom session headers, API keys, tenant identifiers in multi-tenant applications, or JWT tokens. This provides maximum flexibility for applications that manage their own session identifiers.

HDR persistence is particularly valuable for API-first architectures and microservices where session state is managed through headers rather than cookies. It integrates smoothly with token-based authentication systems.
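Header extraction is straightforward; the main subtlety is that HTTP header names are case-insensitive. The function below is a hypothetical sketch, with request headers modeled as a plain dictionary; the extracted value would then be hashed like the other persistence keys.

```python
from typing import Mapping, Optional


def header_key(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Return the value HDR persistence would hash, or None if the header is missing."""
    wanted = name.lower()  # HTTP header names are case-insensitive
    for header_name, value in headers.items():
        if header_name.lower() == wanted:
            return value.strip()
    return None


request_headers = {"Host": "api.example.com", "x-tenant-id": "acme"}
print(header_key(request_headers, "X-Tenant-ID"))  # acme
print(header_key(request_headers, "X-Api-Key"))    # None
```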

Hash (Advanced Custom Persistence)

Hash persistence is the most powerful and flexible method, allowing you to build custom persistence keys from virtually any element in the traffic flow. The load balancer maintains a hash table (up to 3 million entries by default) mapping custom key values to backend servers, with configurable expiration (default 7 days).

The hash key can be constructed from hundreds of available variables: client IP and port, timestamps, SSL certificate fields, frontend information, URL path and method, HTTP headers, request body content, WAF processing results, and many more. You can combine multiple variables and apply transformation functions to create precisely the persistence logic your application requires.

For example, you could build a hash key that extracts the country from the client IP, checks whether it is in a specific list, and combines the result with the username from the SSL client certificate. All requests that produce the same key from this combination (users from the same region with the same certificate identity) are always directed to the same backend server. This gives very granular persistence control while preserving application session state. This level of customization enables persistence scenarios that other load balancers cannot easily achieve, and it is one of TR7's differentiating capabilities.
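The hash table described above can be sketched as a dictionary of key → (server, expiry) entries, using the stated defaults of 3 million entries and a 7-day TTL. Two behaviors are assumptions, not documented: expiry counts from when the entry was stored (lookups do not extend it), and when the table is full the oldest-stored entry is evicted. Class and function names are hypothetical.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class PersistenceTable:
    """Maps hash keys to servers with a TTL and a size cap."""

    def __init__(self, max_entries: int = 3_000_000,
                 ttl_seconds: float = 7 * 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = OrderedDict()  # key -> (server, expires_at)

    def lookup(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        server, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: forget the mapping
            return None
        return server

    def store(self, key: str, server: str) -> None:
        self._entries[key] = (server, self.clock() + self.ttl_seconds)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the oldest-stored key


def route(table: PersistenceTable, key: str, choose: Callable[[], str]) -> str:
    """Reuse the stored server for this key, or choose one and remember it."""
    server = table.lookup(key)
    if server is None:
        server = choose()
        table.store(key, server)
    return server


# Key built as in the example above: region from GeoIP plus certificate user.
# The "|" separator and the field values are hypothetical.
key = "|".join(["EU", "alice"])
table = PersistenceTable()
print(route(table, key, lambda: "web1"))  # web1 (first request: choose and store)
print(route(table, key, lambda: "web2"))  # web1 (later requests stick)
```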

Hash Key Building Blocks

The hash key can be constructed from any combination of these traffic elements:

Category | Available Variables | Example Use Case
Network Layer | Client IP, Client Port, Server IP, Server Port | Geo-based routing, network segment affinity
SSL/TLS | Certificate CN, Certificate DN, SNI, Cipher Suite | Client certificate-based routing, mTLS scenarios
HTTP Request | Method, Path, URL, Query Parameters, Host Header | Content-based routing, API versioning
HTTP Headers | Any header value (Authorization, X-Tenant-ID, etc.) | Multi-tenant routing, API key affinity
Request Body | POST parameters, JSON fields, Form data | Transaction-based persistence
Context | Time, Date, Frontend name, WAF decision, GeoIP country | Time-based routing, compliance routing

Hash Key Expressions

Hash keys support functions for transformation: string manipulation (substring, regex), encoding (base64, URL encode), lookups (GeoIP country, ASN), and conditional logic. Combine these to build complex persistence rules. For example: 'If client is from EU countries AND has valid client certificate, build hash key from certificate CN; otherwise build from Authorization header'—ensuring requests matching the same conditions always reach the same backend server.
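The conditional rule in the example above could look like this in Python. The GeoIP lookup and certificate validation are assumed to have happened already and are passed in as arguments; `EU_COUNTRIES` is an abbreviated, illustrative list, not a complete one, and the function name is hypothetical.

```python
from typing import Mapping, Optional

# Abbreviated, illustrative subset of EU country codes.
EU_COUNTRIES = {"DE", "FR", "IT", "ES", "NL"}


def build_hash_key(country: Optional[str], cert_cn: Optional[str],
                   cert_valid: bool, headers: Mapping[str, str]) -> Optional[str]:
    """EU client with a valid certificate: key on the certificate CN.
    Otherwise key on the Authorization header (None if there is neither)."""
    if country in EU_COUNTRIES and cert_valid and cert_cn:
        return "cn:" + cert_cn  # prefixes keep the two key spaces from colliding
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), None)
    return "auth:" + auth if auth else None


print(build_hash_key("DE", "alice", True, {}))                              # cn:alice
print(build_hash_key("US", "alice", True, {"Authorization": "Bearer t1"}))  # auth:Bearer t1
```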

Persistence Methods Comparison

Method | Based On | Configuration | Best For
Source | Client IP address hash | None (automatic) | Simple web applications, legacy systems
URI | Request path hash | URI length, URI depth | Caching, content routing, sharded backends
URL Param | URL/POST parameter value | Parameter name, POST check option | Session tracking, user-specific routing
HDR | HTTP header value | Header name | API authentication, multi-tenant apps, JWT routing
New Cookie | LB-managed cookie | Cookie name, max-idle, max-life | No app changes needed, session timeout control
Current Cookie | Existing app cookie | Cookie name to track | Leverage existing app sessions
Hash | Custom key expression | Key variables, functions, 3M entries, 7-day TTL | Complex multi-factor persistence, ultimate flexibility

Algorithm Selection Guide

Selecting the right algorithm depends on your specific requirements. This comparison highlights the key trade-offs:

Algorithm | Load Awareness | Complexity | Persistence | Primary Use Case
Round Robin | None | Minimal | No | Homogeneous stateless workloads
Weighted Round Robin | Static (weights) | Low | No | Mixed server capacities
Least Connection | Dynamic (connections) | Medium | No | Long sessions, databases
First | None | Minimal | No | Active-passive, license optimization
Random | Dynamic (response time) | Low | No | Large clusters, cache servers
Fastest | Dynamic (response time) | Medium | No | Latency-sensitive applications
Fastest+ | Multi-criteria | High | No | Complex production environments
Source | Via weights | Low | Yes (IP) | Simple session persistence
URI | Via weights | Medium | Yes (path) | Caching, content routing
URL Param | Via weights | Medium | Yes (user ID) | User session tracking
HDR | Via weights | Medium | Yes (header) | API routing, multi-tenant

Choosing Your Algorithm

Use Distribution Algorithms When

  • Application is stateless or uses shared session store
  • You need to spread load across server pool
  • Servers have different capacities (use Weighted)
  • Response time optimization is critical (use Fastest/Fastest+)

Use Persistence Methods When

  • Application stores session data locally on servers
  • Backend services are sharded by user or content
  • Caching efficiency requires consistent routing
  • Token or header-based authentication is used

Use Fastest+ When

  • Single metric doesn't capture your optimization goal
  • You need tie-breaking logic for homogeneous servers
  • Both performance and reliability matter
  • Complex production environment with varied workloads

Implementation Best Practices

Regardless of the algorithm chosen, these practices ensure optimal load balancing performance:

01. Implement Robust Health Checks

Configure active health checks that verify application functionality, not just TCP connectivity. A server responding to pings but returning 500 errors should be removed from rotation.
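A sketch of an application-level check in Python: the probe treats only 2xx responses as healthy, so a server that accepts TCP connections but returns 500 errors fails. The rise/fall thresholds, which keep a single failed probe from flapping a server in and out of rotation, and the `/healthz` path are hypothetical parameters, not TR7 settings.

```python
import http.client
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Application-level check: healthy only on a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, and HTTP errors such as 500
        # (urllib raises HTTPError, an OSError subclass) all count as failures.
        return False


class HealthState:
    """Marks a server down after `fall` consecutive failed probes and back
    up after `rise` consecutive successes, so one blip does not flap it."""

    def __init__(self, probe: Callable[[], bool], rise: int = 2, fall: int = 3) -> None:
        self.probe = probe
        self.rise = rise
        self.fall = fall
        self.healthy = True
        self._streak = 0  # consecutive results that disagree with the current state

    def check(self) -> bool:
        ok = self.probe()
        if ok == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= (self.rise if ok else self.fall):
                self.healthy = ok
                self._streak = 0
        return self.healthy


# Against a real server: HealthState(lambda: http_probe("http://10.0.0.5:8080/healthz"))
results = iter([False, False, False, True, True])  # simulated probe outcomes
state = HealthState(lambda: next(results), rise=2, fall=3)
print([state.check() for _ in range(5)])  # [True, True, False, False, True]
```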

02. Monitor and Adjust Weights

For weighted algorithms, review weight assignments quarterly or after infrastructure changes. Benchmark servers under realistic load to determine accurate capacity ratios.

03. Choose Persistence Wisely

Design applications to be stateless when possible, storing sessions in external stores like Redis. If persistence is required, choose the method that best matches your session identifier (IP, URI, parameter, or header).

04. Test Failover Scenarios

Regularly test server failures to ensure graceful failover. Verify that algorithm behavior is correct when servers are removed and re-added to the pool.

05. Use Fastest+ for Complex Scenarios

When single-metric algorithms don't meet your needs, configure Fastest+ with primary and secondary criteria. Start with Least Response Time as Opt-1 and Least Connection Error as Opt-2.

Frequently Asked Questions

Do persistence methods take server weights into account?

Yes, persistence methods (Source, URI, URL Param, HDR) already incorporate server weights in their routing decisions. This means you get both session affinity and capacity-aware distribution. The hash-based selection is weighted according to server configurations.

When should I use Fastest instead of Fastest+?

Use Fastest when response time is your only concern and servers have clearly different performance characteristics. Use Fastest+ when you need more nuanced selection, for example optimizing for response time but preferring servers with fewer errors when response times are similar.

What happens to persistent sessions when a server goes down?

When a persistent server becomes unavailable, requests are re-routed to another server based on the persistence algorithm's hash redistribution. The session data on the failed server is lost unless your application uses external session storage. Health checks ensure failed servers are removed from the pool quickly.

How does Least Connection differ from Least Used Connections in Fastest+?

Least Connection as a standalone algorithm routes to the server with the fewest active connections. 'Least Used Connections' in Fastest+ considers connection utilization relative to the server's capacity, making it more appropriate for heterogeneous server environments.

Which persistence method works best for mobile applications?

For mobile applications, avoid Source (IP) persistence, since mobile devices frequently change IP addresses. Use URL Param persistence with a user ID or session token, or HDR persistence with an authentication header. These methods follow the user's session regardless of network changes.

Conclusion

Load balancing algorithms are foundational to application delivery, yet often overlooked during architecture decisions. The right algorithm ensures even server utilization, optimal response times, and resilient failover—while the wrong choice can create hotspots, degrade performance, and complicate troubleshooting.

For most production environments, start with Least Connection for dynamic workloads or Weighted Round Robin for static capacity distribution. As your monitoring capabilities mature, explore Fastest+ for multi-criteria optimization. Choose persistence methods based on your session management strategy—Source for simplicity, or URI/URL Param/HDR for application-aware routing.

Intelligent Traffic Distribution

TR7 Load Balancer supports all major algorithms including advanced Fastest+ with multi-criteria optimization, plus flexible persistence options for session-aware routing. Optimize your application delivery with enterprise-grade load balancing.
