Load Balancing
Updated: Mar 2026
Scope: Xray routing/balancer config generation, per-platform behavior, Smart Load Balancer scoring
TL;DR
| Platform | Uses backend routing/balancers/observatory? | How it picks a server |
|---|---|---|
| iOS | YES — full passthrough | xray-core (LibXray) executes backend config as-is |
| Android | NO — completely ignores | Builds own config with hardcoded leastPing |
| Desktop | NO — own routing | Single outbound per country, no load balancing |
Key rule: Changes to the balancer strategy in backend properties affect iOS only.
1. Backend: XrayConfigResponse
CountryServiceImpl.java:979-994 assembles:
XrayConfigResponse response = new XrayConfigResponse();
response.setOutbounds(outbounds);
// routing depends on split/base mode
XrayRoutingConfig routing = split
    ? xrayRoutingService.generateSplitTunnelingConfig(tags, XRAY_DOMAINS_PLACEHOLDER)
    : xrayRoutingService.generateRoutingConfigForServers(tags);
response.setRouting(routing);
// observatory/burstObservatory depends on strategy
var observatory = xrayRoutingService.generateObservatoryForServers(tags);
if (observatory != null) response.setObservatory(observatory);
var burstObservatory = xrayRoutingService.generateBurstObservatoryForServers(tags);
if (burstObservatory != null) response.setBurstObservatory(burstObservatory);
DTO fields (XrayConfigResponse.java):
- List<OutboundBeanResponse> outbounds — VPN servers for the country
- XrayRoutingConfig routing — routing rules + balancer definition
- @JsonInclude(NON_NULL) XrayObservatory observatory — for leastPing
- @JsonInclude(NON_NULL) XrayBurstObservatory burstObservatory — for leastLoad
- @JsonInclude(NON_NULL) String stringImageData
2. Balancer Strategies (XrayRoutingServiceImpl.java)
Set in application-prod.properties:
| Strategy | Balancer tag | Observatory | Probe URL | Interval |
|---|---|---|---|---|
| leastPing | least-ping-balancer + leastPing strategy | observatory (standard) | https://1.1.1.1/cdn-cgi/trace | 120s |
| leastLoad | load-balancer + leastLoad strategy (expected=3, maxRTT=1000ms, tolerance=0.5) | burstObservatory | https://connectivitycheck.gstatic.com/generate_204 | 30s, sampling=5 |
| random | random-balancer + random strategy | none (null) | none | n/a |
Current production strategy: leastPing (after the random incident, see section 7).
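The strategy table can be read as a mapping from one property value to three config fragments. The sketch below is a Python illustration, not the Java in XrayRoutingServiceImpl.java. The function name `build_balancer_config` is hypothetical, and the JSON field names (`probeUrl`, `pingConfig`, `settings`) follow xray-core conventions as an assumption about what the backend emits.

```python
from typing import Any, Optional


def build_balancer_config(strategy: str, tags: list[str]) -> dict[str, Optional[dict[str, Any]]]:
    """Return the balancer, observatory and burstObservatory fragments for one strategy.

    Hypothetical helper: values come from the strategy table, field names are assumed.
    """
    if strategy == "leastPing":
        return {
            "balancer": {"tag": "least-ping-balancer", "selector": tags,
                         "strategy": {"type": "leastPing"}},
            # Standard observatory: one probe per outbound every 120s.
            "observatory": {"subjectSelector": tags,
                            "probeUrl": "https://1.1.1.1/cdn-cgi/trace",
                            "probeInterval": "120s"},
            "burstObservatory": None,
        }
    if strategy == "leastLoad":
        return {
            "balancer": {"tag": "load-balancer", "selector": tags,
                         "strategy": {"type": "leastLoad",
                                      "settings": {"expected": 3,
                                                   "maxRTT": "1000ms",
                                                   "tolerance": 0.5}}},
            "observatory": None,
            # Burst observatory: 5 samples on a 30s interval.
            "burstObservatory": {"subjectSelector": tags,
                                 "pingConfig": {
                                     "destination": "https://connectivitycheck.gstatic.com/generate_204",
                                     "interval": "30s",
                                     "sampling": 5}},
        }
    if strategy == "random":
        # No probing at all: both observatory fields stay null and are
        # dropped from the response by @JsonInclude(NON_NULL).
        return {
            "balancer": {"tag": "random-balancer", "selector": tags,
                         "strategy": {"type": "random"}},
            "observatory": None,
            "burstObservatory": None,
        }
    raise ValueError(f"unknown strategy: {strategy}")
```

Only one of `observatory` and `burstObservatory` is ever non-null, which matches the null checks in the assembly code in section 1.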
Routing rules: base mode
- block-rule → geosite:category-ads-all → outbound block
- direct-rule → geoip:private → outbound direct
- api-direct-rule → domain:shiva-app.io → outbound direct
- main-rule → tcp,udp → balancerTag (everything else through balancer)
Routing rules: split mode
- block-rule → ads → block
- direct-rule → private IP → direct
- vpn-rule → specified domains → balancerTag (only these domains through VPN)
- default-direct → tcp,udp → direct (everything else direct)
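The two rule sets above could be expressed as xray routing rules roughly as follows. This is a sketch under assumptions. The function names are hypothetical, the split-mode ads and private-IP matchers are assumed to be the same `geosite`/`geoip` entries as in base mode, and the real output of XrayRoutingServiceImpl.java may differ in field layout.

```python
from typing import Any


def base_rules(balancer_tag: str) -> list[dict[str, Any]]:
    """Base mode: everything not blocked or direct goes through the balancer."""
    return [
        {"type": "field", "domain": ["geosite:category-ads-all"], "outboundTag": "block"},   # block-rule
        {"type": "field", "ip": ["geoip:private"], "outboundTag": "direct"},                # direct-rule
        {"type": "field", "domain": ["domain:shiva-app.io"], "outboundTag": "direct"},      # api-direct-rule
        {"type": "field", "network": "tcp,udp", "balancerTag": balancer_tag},               # main-rule
    ]


def split_rules(balancer_tag: str, domains: list[str]) -> list[dict[str, Any]]:
    """Split mode: only the listed domains use the balancer, the rest goes direct."""
    return [
        {"type": "field", "domain": ["geosite:category-ads-all"], "outboundTag": "block"},   # block-rule
        {"type": "field", "ip": ["geoip:private"], "outboundTag": "direct"},                # direct-rule
        {"type": "field", "domain": list(domains), "balancerTag": balancer_tag},            # vpn-rule
        {"type": "field", "network": "tcp,udp", "outboundTag": "direct"},                   # default-direct
    ]
```

The only structural difference between the modes is which rule carries `balancerTag`: the catch-all rule in base mode, the domain rule in split mode.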
3. iOS: Full Passthrough
File: ios-app/Shiva/ShivaPacketTunnelProvider/ConfigBuilder.swift
iOS receives JSON from backend and passes it to LibXray almost unchanged:
static func buildConfig(fromFile sourceFileURL: URL, saveTo destFileURL: URL,
                        vpnMode: String, domains: [String], inbound: InboundProxy) throws {
    let data = try Data(contentsOf: sourceFileURL)
    guard var json = try JSONSerialization.jsonObject(with: data) as? [String: Any] else { ... }
    // Adds SOCKS inbound (for tun2socks)
    var inbounds = json["inbounds"] as? [[String: Any]] ?? []
    inbounds.append(inboundObject)
    json["inbounds"] = inbounds
    // For split mode: replaces "domainPlaceholder" with real domains
    if vpnMode == "split" { ... replaceDomains(in: json) ... }
    let newData = try JSONSerialization.data(withJSONObject: json, options: [.prettyPrinted])
    try newData.write(to: destFileURL)
}
What iOS does with backend config:
- outbounds → passed as-is
- routing → passed as-is (including balancers, strategy)
- observatory → passed as-is
- burstObservatory → passed as-is
- Adds: only SOCKS inbound
- Modifies: only domain: ["domainPlaceholder"] → real domains (split mode)
LibXray version: 0.0.1755156260 (xray-core ~Aug 2025, v25.7.25).
Consequence: Backend fully controls balancer behavior for iOS. Any strategy change affects iOS immediately. If backend sends an unsupported strategy — iOS breaks.
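To make the passthrough concrete, here is a Python model of the transformation described above. It is not the Swift implementation. `build_ios_config` is a hypothetical name, and the sketch assumes the placeholder lives in `routing.rules[].domain`, which the source does not spell out.

```python
import copy
from typing import Any


def build_ios_config(backend: dict[str, Any], inbound: dict[str, Any],
                     vpn_mode: str, domains: list[str]) -> dict[str, Any]:
    """Model of the iOS passthrough: add one inbound, swap the domain placeholder."""
    cfg = copy.deepcopy(backend)  # leave the backend payload untouched
    # Append the SOCKS inbound; the backend response normally has no inbounds.
    cfg["inbounds"] = list(cfg.get("inbounds") or []) + [inbound]
    if vpn_mode == "split":
        # Assumed location of the placeholder: a rule whose domain list is
        # exactly ["domainPlaceholder"].
        for rule in cfg.get("routing", {}).get("rules", []):
            if rule.get("domain") == ["domainPlaceholder"]:
                rule["domain"] = list(domains)
    return cfg
```

Everything else (outbounds, balancers, observatory, burstObservatory) is returned exactly as received, which is why a strategy the bundled xray-core cannot handle fails on the device rather than being filtered out.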
4. Android: Own Config, Hardcoded leastPing
Mapper file: MapperCountriesWithConfigsV2ToServerConfigList.kt
Android parses the backend response but extracts only outbounds:
from.countries.forEach { countryWithConfig ->
    countryWithConfig.baseConfig?.let { config ->
        val outbounds = config.outbounds.map { outbound ->
            OutBoundConfigUIEntity.Outbound(
                outboundBean = mapOutboundBean(outbound), // address, port, users, streamSettings
                ...
            )
        }
    }
}
routing, balancers, observatory, burstObservatory — completely discarded at the mapping stage.
Android config construction (V2rayConfigUtil.kt)
// HARDCODED observatory — leastPing, 45s, Firefox URL
v2rayConfig.observatory = V2rayConfig.ObservatoryBean(
    subjectSelector = outboundTags,
    probeUrl = "http://detectportal.firefox.com/success.txt",
    probeInterval = "45s",
    enableConcurrency = true
)
// HARDCODED routing with balancer strategy = "leastPing"
setupRoutingWithBalancer(v2rayConfig, outboundTags, "leastPing")
Android fallback strategy (V2RayServiceManager.kt:163-215)
- Try leastPing
- After 2 seconds, probe detectportal.firefox.com/success.txt
- If probe fails → fallback to random balancer
- Final fallback → try servers one by one
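The fallback chain above can be sketched as a small loop. This is a Python model, not the Kotlin in V2RayServiceManager.kt. `connect_with_fallback`, `start` and `probe` are hypothetical names. The sketch also assumes the same 2-second wait and probe are repeated after the random balancer and after each single server, which the source states only for the leastPing step.

```python
import time
from typing import Callable, Optional, Sequence


def connect_with_fallback(
    tags: Sequence[str],
    start: Callable[[str, Sequence[str]], None],   # (mode, outbound tags) -> starts the core
    probe: Callable[[], bool],                      # True if the probe URL is reachable
    settle_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the mode that passed the probe, or None if nothing worked."""
    # Steps 1-3: balancer over all outbounds, leastPing first, then random.
    for strategy in ("leastPing", "random"):
        start(strategy, tags)
        sleep(settle_seconds)
        if probe():
            return strategy
    # Final fallback: one server at a time, no balancer.
    for tag in tags:
        start("single", [tag])
        sleep(settle_seconds)
        if probe():
            return f"single:{tag}"
    return None
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.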
Differences from backend config:
| Parameter | Backend | Android |
|---|---|---|
| domainStrategy | AsIs | IPIfNonMatch |
| Observatory probe URL | https://1.1.1.1/cdn-cgi/trace | http://detectportal.firefox.com/success.txt |
| Probe interval | 120s | 45s |
| DNS | none | Google DNS + googleapis.cn → googleapis.com |
| Stats | — | enabled |
| Mux | from config | force-disabled |
Xray core version: libv2ray (xray-core v25.7.25, comment //NOTE: xray core version v25.7.25).
5. Desktop: Single Outbound, No Balancing
File: desktop-app/shiva-daemon/src/xray_config.rs
Desktop creates config for one outbound with a simple routing rule:
pub fn create_with_outbound(
    log_config: &log_config::Config,
    inbound_socks_proxy_addr: &SocketAddr,
    outbound: outbound::Outbound,
) -> Config {
    let inbound = new_inbound_socks(inbound_socks_proxy_addr);
    let rule = routing::Rule {
        r#type: routing::RuleType::Field,
        inbound_tag: vec![inbound.tag.clone()],
        outbound_tag: outbound.tag.clone(),
    };
    let routing = routing::Routing {
        domain_strategy: routing::DomainStrategy::AsIs,
        domain_matcher: routing::DomainMatcher::Hybrid,
        rules: vec![rule, api_rule],
    };
    // Linux only: fwmark for binding to physical interface
    Config {
        outbounds, // single outbound
        routing: Some(routing), // simple routing inbound→outbound
        dns: None,
        ...
    }
}
Outbound passthrough (shiva-common/src/xray/outbound.rs): unknown fields in the backend outbound are passed through.
Desktop does not support: balancing, observatory, split-tunneling, DNS, block/direct outbounds.
6. Platform Comparison Table
| Aspect | Backend generates | iOS uses | Android uses | Desktop uses |
|---|---|---|---|---|
| outbounds | All servers for country | as-is | outboundBean only (settings, streamSettings) | Single outbound |
| routing.rules | block/direct/main | as-is | Own (inbound→balancer) | Own (inbound→outbound) |
| routing.balancers | Per strategy | as-is | Own (leastPing hardcoded) | None |
| routing.domainStrategy | AsIs | as-is | IPIfNonMatch | AsIs |
| observatory | For leastPing | as-is | Own (45s, Firefox URL) | None |
| burstObservatory | For leastLoad | as-is | None | None |
| DNS | None | None | Own (Google DNS) | None |
| Mux | From config | as-is | Force-disabled | None |
| Fallback | None | None | leastPing → random → single | None |
| Xray version | — | v25.7.25 (LibXray) | v25.7.25 (libv2ray) | From bundle |
7. Known Incidents
Incident 1: random strategy broke iOS (16 Feb 2026)
Timeline:
1. 96cd9100 (15 Feb): Changed leastPing → leastLoad + burstObservatory
2. 90421a0a (16 Feb 06:04): Changed leastLoad → random
3. iOS stopped connecting to all countries
4. Android continued working (builds its own config)
5. 882c600d (16 Feb): Rolled back to leastPing — iOS recovered
Cause: iOS passes random to LibXray as-is. xray-core Aug 2025 either doesn't support random strategy in balancer or handles it incorrectly.
Incident 2: Countries with 5 WL outbounds stopped working (16 Feb 2026)
Problem: After the rollback to leastPing — DE, GB, LV (5 WL outbounds each) failed to connect. Single-outbound countries worked.
Cause: leastPing observatory sends a probe every 120s to each outbound. 5 outbounds pointing to the same upstream server (through different WL proxies) → 5 parallel probe connections → instability.
Fix: Left 1 WL server per country. Everything worked.
8. Smart Load Balancer (VCS Side)
The VPN Config Service (sync_fast.py) computes server quality metrics every 60 seconds and syncs them to MySQL every 5 minutes. This runs independently of the xray balancer strategies above: it governs which outbounds appear in the backend response.
Composite Score Formula
@staticmethod
def _compute_score(metrics, tcp_count, bw_ema, status, proxy_load, proxy_cores, proxy_bw_ema) -> float:
    """Composite score: 0 = best, 999 = DOWN"""
    if status == 'DOWN':
        return 999.0
    if not metrics:
        return 1.0  # no data: neutral
    cpu_factor = metrics['load1'] / max(metrics.get('cpu_cores', 1), 1)
    mem_factor = 1 - (metrics.get('mem_avail_bytes', 0) / max(metrics.get('mem_total_bytes', 1), 1))
    bw_factor = (bw_ema or 0) / 1000  # normalized to 1 Gbps
    session_factor = tcp_count / 500
    score = (session_factor * 0.35 +
             cpu_factor * 0.30 +
             bw_factor * 0.25 +
             mem_factor * 0.10)
    # WL proxy: take max of server score and proxy score
    if proxy_load is not None:
        p_cores = max(proxy_cores or 1, 1)
        proxy_cpu_factor = proxy_load / p_cores  # proxy load per core
        proxy_score = proxy_cpu_factor * 0.5 + ((proxy_bw_ema or 0) / 1000) * 0.5
        score = max(score, proxy_score)
    return round(score, 4)
Weights: sessions 35% + CPU 30% + bandwidth 25% + memory 10%. Range: 0 = best, 999 = DOWN. Fallback: no metrics → 1.0 (neutral). DOWN → 999.0.
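A worked example helps check the weights. The block below restates the formula as a standalone function so it can be run on its own. The input numbers are invented for illustration, and the per-core proxy CPU factor (`proxy_load / proxy_cores`) is an assumption about how the proxy term is derived.

```python
from typing import Optional


def compute_score(metrics: Optional[dict], tcp_count: int, bw_ema: Optional[float], status: str,
                  proxy_load: Optional[float] = None, proxy_cores: Optional[int] = None,
                  proxy_bw_ema: Optional[float] = None) -> float:
    """Standalone restatement of the composite score (0 = best, 999 = DOWN)."""
    if status == "DOWN":
        return 999.0
    if not metrics:
        return 1.0
    cpu_factor = metrics["load1"] / max(metrics.get("cpu_cores", 1), 1)
    mem_factor = 1 - metrics.get("mem_avail_bytes", 0) / max(metrics.get("mem_total_bytes", 1), 1)
    bw_factor = (bw_ema or 0) / 1000
    session_factor = tcp_count / 500
    score = session_factor * 0.35 + cpu_factor * 0.30 + bw_factor * 0.25 + mem_factor * 0.10
    if proxy_load is not None:
        # Assumption: proxy CPU factor is the proxy's load average per core.
        proxy_cpu_factor = proxy_load / max(proxy_cores or 1, 1)
        proxy_score = proxy_cpu_factor * 0.5 + ((proxy_bw_ema or 0) / 1000) * 0.5
        score = max(score, proxy_score)
    return round(score, 4)


GIB = 2 ** 30
# Invented sample: load1 2.0 on 4 cores, 4 of 16 GiB free, 250 Mbps EMA, 100 TCP sessions.
metrics = {"load1": 2.0, "cpu_cores": 4, "mem_avail_bytes": 4 * GIB, "mem_total_bytes": 16 * GIB}

# sessions 0.2*0.35 + cpu 0.5*0.30 + bw 0.25*0.25 + mem 0.75*0.10
# = 0.07 + 0.15 + 0.0625 + 0.075 = 0.3575
direct = compute_score(metrics, 100, 250.0, "HEALTHY")

# Same server behind a busy WL proxy: proxy score = 1.5*0.5 + 0.1*0.5 = 0.8,
# which exceeds 0.3575, so the proxy dominates the result.
via_proxy = compute_score(metrics, 100, 250.0, "HEALTHY",
                          proxy_load=3.0, proxy_cores=2, proxy_bw_ema=100.0)
```

Taking the maximum means a lightly loaded server behind an overloaded proxy is ranked by the proxy, not by the server.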
EMA Bandwidth
Bandwidth uses an Exponential Moving Average to smooth spikes:
EMA_ALPHA = 0.3
bw_ema = bw_mbps if old_ema is None else round(EMA_ALPHA * bw_mbps + (1 - EMA_ALPHA) * old_ema, 2)
The EMA is recalculated every sync_online_interval_seconds (default 60s).
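To show how much a single spike moves the average with `EMA_ALPHA = 0.3`, here is a short worked run. The helper name `update_ema` and the sample bandwidth values are invented for illustration.

```python
from typing import Optional

EMA_ALPHA = 0.3


def update_ema(bw_mbps: float, old_ema: Optional[float]) -> float:
    """One EMA step: the first sample seeds the average, later samples are blended in."""
    if old_ema is None:
        return bw_mbps
    return round(EMA_ALPHA * bw_mbps + (1 - EMA_ALPHA) * old_ema, 2)


ema = None
history = []
for sample in (100.0, 300.0, 100.0):   # steady, one spike, back to steady
    ema = update_ema(sample, ema)
    history.append(ema)
# history is 100.0, then 0.3*300 + 0.7*100 = 160.0, then 0.3*100 + 0.7*160 = 142.0
```

A 300 Mbps spike lifts the average only to 160, below the 200 Mbps turbo threshold, and the average is still at 142 one interval after traffic returns to 100.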
is_turbo / is_fast Thresholds
is_turbo = (bw_ema or 0) >= 200 and status != 'DOWN' # ≥200 Mbps EMA
is_fast = is_turbo and score < 0.5 and status == 'HEALTHY' # turbo + low load
Note (as of Mar 2026): Dynamic is_turbo/is_fast from DB is disabled on the backend side — hardcoded lists are active until EMA stabilizes (1-2 weeks post-deploy).
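The two flags can be checked against a few sample servers. The wrapper name `classify` and the sample values are invented. The only status strings taken from the source are `DOWN` and `HEALTHY`, and `DEGRADED` below is a hypothetical third value.

```python
from typing import Optional


def classify(bw_ema: Optional[float], score: float, status: str) -> tuple[bool, bool]:
    """Return (is_turbo, is_fast) using the thresholds above."""
    is_turbo = (bw_ema or 0) >= 200 and status != "DOWN"
    is_fast = is_turbo and score < 0.5 and status == "HEALTHY"
    return is_turbo, is_fast


samples = {
    "fast": classify(250.0, 0.3575, "HEALTHY"),       # high bandwidth, low load
    "turbo_only": classify(250.0, 0.8, "HEALTHY"),    # high bandwidth, loaded
    "degraded": classify(250.0, 0.1, "DEGRADED"),     # hypothetical status value
    "slow": classify(150.0, 0.1, "HEALTHY"),          # below 200 Mbps
    "no_data": classify(None, 0.1, "HEALTHY"),        # missing EMA counts as 0
    "down": classify(300.0, 0.1, "DOWN"),
}
```

`is_fast` is a strict subset of `is_turbo`: a server can be turbo without being fast, never the reverse.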
MySQL Sync (every 5 minutes)
UPDATE server SET
health_status = %s, tcp_sessions = %s, load_avg_1m = %s,
score = %s, bandwidth_avg_mbps = %s,
is_turbo = %s, is_fast = %s
WHERE ip = %s
Columns added by V115 Flyway migration: score, bandwidth_avg_mbps, is_turbo, is_fast.
9. ServerRankingCache (Backend Side)
ServerRankingCache.java uses the DB score for weighted random selection:
- Weight formula: 1 / (1 + score). Lower score → higher weight → picked more often
- Fallback (legacy): if no score in DB, falls back to original capacity-based weight: freeSessions * cpuMultiplier * bwMultiplier
- Refresh: ServerMetricsScheduler rebuilds rankings every 60s from VPN Config Service data
- Feature flag: server.balancing.enabled=false, so server selection is currently off (metrics collected, scoring active, selection gating disabled)
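The weight formula can be tried out with a few scores. This is a Python sketch of score-weighted random selection, not the Java in ServerRankingCache.java. The function names and the server labels are hypothetical.

```python
import random


def weights_from_scores(scores: dict[str, float]) -> dict[str, float]:
    """weight = 1 / (1 + score): score 0 -> 1.0, score 1 -> 0.5, score 999 -> 0.001."""
    return {server: 1.0 / (1.0 + score) for server, score in scores.items()}


def pick_server(scores: dict[str, float], rng: random.Random) -> str:
    """Draw one server, with probability proportional to its weight."""
    weights = weights_from_scores(scores)
    servers = list(weights)
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]


scores = {"idle": 0.0, "busy": 1.0, "down": 999.0}
rng = random.Random(0)                      # fixed seed so the run is repeatable
picks = [pick_server(scores, rng) for _ in range(10_000)]
counts = {s: picks.count(s) for s in scores}
```

With these scores the idle server should get roughly two thirds of the picks and the DOWN server well under one percent. A DOWN server keeps a small non-zero weight, so excluding it entirely would need a separate filter.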
Outbound Shuffle
Outbounds in /api/v2/countries response are deterministically shuffled per account, rotating every minute. This distributes load across servers even without active balancer selection.
As of 4 Mar 2026, outbounds are also sorted by score (best first) before shuffle — CountryServiceImpl.java.
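One way to get a shuffle that is stable per account and rotates every minute is to seed a PRNG from the account id and the current minute. The sketch below assumes that approach. The seed derivation, the function name and the outbound dictionaries are illustrative, not taken from CountryServiceImpl.java.

```python
import hashlib
import random
from typing import Any


def shuffle_for_account(outbounds: list[dict[str, Any]], account_id: str,
                        epoch_seconds: int) -> list[dict[str, Any]]:
    """Sort by score (best first), then apply a per-account, per-minute shuffle."""
    minute = epoch_seconds // 60
    # Assumed seed: hash of account id and minute bucket, so the order is
    # identical for one account within a minute and changes in the next one.
    digest = hashlib.sha256(f"{account_id}:{minute}".encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    ordered = sorted(outbounds, key=lambda o: o["score"])
    random.Random(seed).shuffle(ordered)
    return ordered


servers = [{"tag": "s1", "score": 0.4}, {"tag": "s2", "score": 0.1}, {"tag": "s3", "score": 0.9}]
first = shuffle_for_account(servers, "account-42", 1_772_600_000)
again = shuffle_for_account(servers, "account-42", 1_772_600_030)   # same minute bucket
```

In this sketch the sort gives the shuffle a canonical input, so the result does not depend on the order in which the database returns rows.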
How Score Affects What a Client Gets
VCS sync_fast.py (60s)
  → compute composite score per server
  → write to PostgreSQL (servers.score)
  → sync to MySQL (server.score) every 5 min
Backend ServerMetricsScheduler (60s)
  → read scores from VCS API
  → update ServerRankingCache (weight = 1/(1+score))
CountryServiceImpl
  → query servers for country
  → sort by score (best first)
  → apply deterministic per-account shuffle
  → return top N outbounds to client
10. Recommendations for Backend Refactoring
Problem 1: Backend doesn't know which platform it's generating a config for.
- iOS needs full valid xray config (routing + balancer + observatory)
- Android only needs outbounds (ignores the rest)
- Desktop needs a single outbound
Proposed solution — platform-aware config:
GET /api/v2/countries?platform=ios → full config (current behavior)
GET /api/v2/countries?platform=android → outbounds only, no routing/observatory
GET /api/v2/countries?platform=desktop → single best outbound per country
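The proposed `platform` parameter amounts to trimming one full config per country. A minimal sketch of that trimming, assuming the outbounds are already ordered best first. `config_for_platform` is a hypothetical name and the endpoint does not accept this parameter today.

```python
from typing import Any


def config_for_platform(config: dict[str, Any], platform: str) -> dict[str, Any]:
    """Trim a full per-country xray config to what each platform actually uses."""
    if platform == "ios":
        # Full passthrough: routing, balancers and observatory are all needed.
        return dict(config)
    if platform == "android":
        # Android rebuilds routing and observatory itself.
        return {"outbounds": list(config["outbounds"])}
    if platform == "desktop":
        # Assumes outbounds are sorted best first, so the head is the best one.
        return {"outbounds": list(config["outbounds"][:1])}
    raise ValueError(f"unknown platform: {platform}")
```

Rejecting unknown values keeps a typo in the query string from silently receiving the full iOS config.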
Problem 2: One strategy for all platforms.
- leastPing — only verified-safe strategy for iOS
- random — breaks iOS (confirmed)
- leastLoad — not fully tested on iOS
Problem 3: Too many outbounds per upstream.
- Many outbounds to the same upstream = observatory probe overload
- Recommendation: ≤2 outbounds per country for iOS
- Android handles it (own observatory with 45s interval)
Problem 4: Observatory probe URL.
- Backend uses https://1.1.1.1/cdn-cgi/trace — can be blocked
- Android uses http://detectportal.firefox.com/success.txt — more reliable
11. Source Files
| Platform | File | Role |
|---|---|---|
| Backend | service/impl/XrayRoutingServiceImpl.java | Routing/balancer/observatory generation |
| Backend | service/impl/CountryServiceImpl.java:913-997 | XrayConfigResponse assembly |
| Backend | dto/response/xray/XrayConfigResponse.java | Response DTO |
| iOS | ShivaPacketTunnelProvider/ConfigBuilder.swift | Config passthrough to LibXray |
| Android | v2rayng/util/V2rayConfigUtil.kt | Own xray config construction |
| Android | v2rayng/service/V2RayServiceManager.kt:163-215 | Fallback logic (leastPing→random) |
| Android | data/mapper/MapperCountriesWithConfigsV2ToServerConfigList.kt | Backend response mapping |
| Desktop | shiva-daemon/src/xray_config.rs | Single-outbound config creation |
| Desktop | shiva-common/src/xray/outbound.rs | Unknown fields passthrough |
| VCS | vpn-config-service/app/services/sync_fast.py | Score/EMA/turbo computation |