
Load Balancing

Updated: Mar 2026
Scope: Xray routing/balancer config generation, per-platform behavior, Smart Load Balancer scoring


TL;DR

| Platform | Uses backend routing/balancers/observatory? | How it picks a server |
|---|---|---|
| iOS | YES, full passthrough | xray-core (LibXray) executes backend config as-is |
| Android | NO, completely ignores it | Builds own config with hardcoded leastPing |
| Desktop | NO, own routing | Single outbound per country, no load balancing |

Key rule: Changes to the balancer strategy in backend properties affect iOS only.


1. Backend: XrayConfigResponse

CountryServiceImpl.java:979-994 assembles:

XrayConfigResponse response = new XrayConfigResponse();
response.setOutbounds(outbounds);

// routing depends on split/base mode
XrayRoutingConfig routing = split
    ? xrayRoutingService.generateSplitTunnelingConfig(tags, XRAY_DOMAINS_PLACEHOLDER)
    : xrayRoutingService.generateRoutingConfigForServers(tags);
response.setRouting(routing);

// observatory/burstObservatory depends on strategy
var observatory = xrayRoutingService.generateObservatoryForServers(tags);
if (observatory != null) response.setObservatory(observatory);

var burstObservatory = xrayRoutingService.generateBurstObservatoryForServers(tags);
if (burstObservatory != null) response.setBurstObservatory(burstObservatory);

DTO fields (XrayConfigResponse.java):

  • List<OutboundBeanResponse> outbounds: VPN servers for the country
  • XrayRoutingConfig routing: routing rules + balancer definition
  • @JsonInclude(NON_NULL) XrayObservatory observatory: for leastPing
  • @JsonInclude(NON_NULL) XrayBurstObservatory burstObservatory: for leastLoad
  • @JsonInclude(NON_NULL) String stringImageData
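The effect of the NON_NULL annotations on the JSON a client receives can be sketched in Python as follows. This is an illustration only: `serialize_response` is a hypothetical helper, not the backend's code, and `stringImageData` is left out for brevity. The point is that a field set to null on the Java side is absent from the payload, so clients should check for key presence rather than for a null value.

```python
import json


def serialize_response(outbounds, routing, observatory=None, burst_observatory=None):
    """Illustrative stand-in for XrayConfigResponse serialization (hypothetical helper)."""
    body = {"outbounds": outbounds, "routing": routing}
    optional = {"observatory": observatory, "burstObservatory": burst_observatory}
    # Mimic @JsonInclude(NON_NULL): drop optional fields that are None.
    body.update({key: value for key, value in optional.items() if value is not None})
    return json.dumps(body)


# leastPing-style response: observatory present, burstObservatory omitted entirely.
payload = json.loads(
    serialize_response([], {"rules": []}, observatory={"probeInterval": "120s"})
)
```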


2. Balancer Strategies (XrayRoutingServiceImpl.java)

Set in application-prod.properties:

# leastLoad or random also possible
xray.balancer.strategy=leastPing

| Strategy | Balancer tag | Observatory | Probe URL | Interval |
|---|---|---|---|---|
| leastPing | least-ping-balancer + leastPing strategy | observatory (standard) | https://1.1.1.1/cdn-cgi/trace | 120s |
| leastLoad | load-balancer + leastLoad strategy (expected=3, maxRTT=1000ms, tolerance=0.5) | burstObservatory | https://connectivitycheck.gstatic.com/generate_204 | 30s, sampling=5 |
| random | random-balancer + random strategy | none (null) | none | |

Current production strategy: leastPing (after the random incident, see section 7).
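As a rough illustration of the strategy table, the Python sketch below maps a strategy name to the balancer and observatory fragments it implies. The function name is hypothetical, and the JSON key names (`selector`, `subjectSelector`, `probeUrl`, `pingConfig`, and so on) follow the xray-core config schema as commonly documented; the exact output of XrayRoutingServiceImpl is not shown in this document, so treat the shapes as assumptions.

```python
def build_balancing(strategy, tags):
    """Return (balancer, observatory, burst_observatory) for a strategy name.

    Hypothetical helper; values come from the strategy table, key names are assumed.
    """
    if strategy == "leastPing":
        balancer = {"tag": "least-ping-balancer", "selector": list(tags),
                    "strategy": {"type": "leastPing"}}
        observatory = {"subjectSelector": list(tags),
                       "probeUrl": "https://1.1.1.1/cdn-cgi/trace",
                       "probeInterval": "120s"}
        return balancer, observatory, None
    if strategy == "leastLoad":
        balancer = {"tag": "load-balancer", "selector": list(tags),
                    "strategy": {"type": "leastLoad",
                                 "settings": {"expected": 3, "maxRTT": "1000ms",
                                              "tolerance": 0.5}}}
        burst = {"subjectSelector": list(tags),
                 "pingConfig": {
                     "destination": "https://connectivitycheck.gstatic.com/generate_204",
                     "interval": "30s", "sampling": 5}}
        return balancer, None, burst
    if strategy == "random":
        # No observatory of either kind is generated for random.
        balancer = {"tag": "random-balancer", "selector": list(tags),
                    "strategy": {"type": "random"}}
        return balancer, None, None
    raise ValueError(f"unknown balancer strategy: {strategy}")


balancer, observatory, burst = build_balancing("leastPing", ["de-1", "de-2"])
```

Only one of `observatory` and `burst` is ever non-null, which matches the NON_NULL handling of the two response fields.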

Routing rules — base mode

  1. block-rule: geosite:category-ads-all → outbound block
  2. direct-rule: geoip:private → outbound direct
  3. api-direct-rule: domain:shiva-app.io → outbound direct
  4. main-rule: tcp,udp → balancerTag (everything else through balancer)

Routing rules — split mode

  1. block-rule → ads → block
  2. direct-rule → private IP → direct
  3. vpn-rule → specified domains → balancerTag (only these domains through VPN)
  4. default-direct → tcp,udp → direct (everything else direct)
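The two rule sets can be sketched side by side in Python. This is a hedged illustration: `build_rules` is a hypothetical name, the rule field names (`type`, `domain`, `ip`, `network`, `outboundTag`, `balancerTag`) are taken from the xray-core routing schema, and the real generator may emit additional fields. Rule order matters because xray-core applies the first matching rule.

```python
def build_rules(balancer_tag, split=False, domains=None):
    """Build the ordered rule list for base or split mode (illustrative sketch)."""
    rules = [
        # block-rule: ads go to the block outbound
        {"type": "field", "domain": ["geosite:category-ads-all"], "outboundTag": "block"},
        # direct-rule: private IP ranges bypass the VPN
        {"type": "field", "ip": ["geoip:private"], "outboundTag": "direct"},
    ]
    if split:
        # vpn-rule: only the listed domains go through the balancer
        rules.append({"type": "field", "domain": list(domains or []),
                      "balancerTag": balancer_tag})
        # default-direct: everything else goes direct
        rules.append({"type": "field", "network": "tcp,udp", "outboundTag": "direct"})
    else:
        # api-direct-rule: the app's own API domain goes direct
        rules.append({"type": "field", "domain": ["domain:shiva-app.io"],
                      "outboundTag": "direct"})
        # main-rule: everything else goes through the balancer
        rules.append({"type": "field", "network": "tcp,udp", "balancerTag": balancer_tag})
    return rules


base_rules = build_rules("least-ping-balancer")
split_rules = build_rules("least-ping-balancer", split=True, domains=["domainPlaceholder"])
```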

3. iOS: Full Passthrough

File: ios-app/Shiva/ShivaPacketTunnelProvider/ConfigBuilder.swift

iOS receives JSON from backend and passes it to LibXray almost unchanged:

static func buildConfig(fromFile sourceFileURL: URL, saveTo destFileURL: URL,
                         vpnMode: String, domains: [String], inbound: InboundProxy) throws {
    let data = try Data(contentsOf: sourceFileURL)
    guard var json = try JSONSerialization.jsonObject(with: data) as? [String: Any] else { ... }

    // Adds SOCKS inbound (for tun2socks)
    var inbounds = json["inbounds"] as? [[String: Any]] ?? []
    inbounds.append(inboundObject)
    json["inbounds"] = inbounds

    // For split mode: replaces "domainPlaceholder" with real domains
    if vpnMode == "split" { ... replaceDomains(in: json) ... }

    let newData = try JSONSerialization.data(withJSONObject: json, options: [.prettyPrinted])
    try newData.write(to: destFileURL)
}

What iOS does with backend config:

  • outbounds → passed as-is
  • routing → passed as-is (including balancers, strategy)
  • observatory → passed as-is
  • burstObservatory → passed as-is
  • Adds: only SOCKS inbound
  • Modifies: only domain: ["domainPlaceholder"] → real domains (split mode)
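For readers who do not read Swift, the same transformation can be approximated in Python. This is a sketch under assumptions: `build_ios_config` and the SOCKS inbound values are hypothetical, and the placeholder is assumed to appear as the whole `domain` list of a routing rule, as the list above describes.

```python
import copy


def build_ios_config(backend_json, inbound, vpn_mode, domains):
    """Approximate ConfigBuilder.buildConfig: add one inbound, swap the placeholder."""
    config = copy.deepcopy(backend_json)  # never mutate the backend payload
    config.setdefault("inbounds", []).append(inbound)
    if vpn_mode == "split":
        for rule in config.get("routing", {}).get("rules", []):
            if rule.get("domain") == ["domainPlaceholder"]:
                rule["domain"] = list(domains)
    # outbounds, balancers, observatory, burstObservatory are left untouched
    return config


backend = {
    "outbounds": [{"tag": "de-1"}],
    "routing": {"rules": [{"type": "field", "domain": ["domainPlaceholder"],
                           "balancerTag": "least-ping-balancer"}]},
}
# Hypothetical SOCKS inbound; the real inboundObject is built elsewhere in the app.
socks = {"tag": "socks-in", "protocol": "socks", "listen": "127.0.0.1", "port": 10808}
result = build_ios_config(backend, socks, "split", ["example.com"])
```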

LibXray version: 0.0.1755156260 (xray-core ~Aug 2025, v25.7.25).

Consequence: Backend fully controls balancer behavior for iOS. Any strategy change affects iOS immediately. If backend sends an unsupported strategy — iOS breaks.


4. Android: Own Config, Hardcoded leastPing

Mapper file: MapperCountriesWithConfigsV2ToServerConfigList.kt

Android parses the backend response but extracts only outbounds:

from.countries.forEach { countryWithConfig ->
    countryWithConfig.baseConfig?.let { config ->
        val outbounds = config.outbounds.map { outbound ->
            OutBoundConfigUIEntity.Outbound(
                outboundBean = mapOutboundBean(outbound),  // address, port, users, streamSettings
                ...
            )
        }
    }
}

routing, balancers, observatory, burstObservatory — completely discarded at the mapping stage.

Android config construction (V2rayConfigUtil.kt)

// HARDCODED observatory — leastPing, 45s, Firefox URL
v2rayConfig.observatory = V2rayConfig.ObservatoryBean(
    subjectSelector = outboundTags,
    probeUrl = "http://detectportal.firefox.com/success.txt",
    probeInterval = "45s",
    enableConcurrency = true
)

// HARDCODED routing with balancer strategy = "leastPing"
setupRoutingWithBalancer(v2rayConfig, outboundTags, "leastPing")

Android fallback strategy (V2RayServiceManager.kt:163-215)

  1. Try leastPing
  2. After 2 seconds, probe detectportal.firefox.com/success.txt
  3. If probe fails → fallback to random balancer
  4. Final fallback → try servers one by one
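The fallback chain above can be sketched as a loop over attempts. This Python version is an illustration, not a port of V2RayServiceManager.kt: `start` and `probe` are hypothetical callbacks, and it assumes the connectivity probe is repeated after every attempt, which the step list does not state explicitly.

```python
import time


def connect_with_fallback(outbound_tags, start, probe, sleep=time.sleep, delay=2.0):
    """Try leastPing, then random, then each server alone.

    start(strategy, tags) launches the core with that setup (hypothetical callback).
    probe() returns True when the connectivity check succeeds (hypothetical callback).
    Returns the (strategy, tags) that worked, or None if every attempt failed.
    """
    attempts = [("leastPing", list(outbound_tags)), ("random", list(outbound_tags))]
    attempts += [("single", [tag]) for tag in outbound_tags]
    for strategy, tags in attempts:
        start(strategy, tags)
        sleep(delay)  # give the core time to come up before probing
        if probe():
            return strategy, tags
    return None


# Simulated run: leastPing and random fail, the first single server succeeds.
probe_results = iter([False, False, True])
calls = []
outcome = connect_with_fallback(
    ["a", "b"],
    start=lambda strategy, tags: calls.append((strategy, list(tags))),
    probe=lambda: next(probe_results),
    sleep=lambda seconds: None,  # skip real waiting in the example
)
```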

Differences from backend config:

| Parameter | Backend | Android |
|---|---|---|
| domainStrategy | AsIs | IPIfNonMatch |
| Observatory probe URL | https://1.1.1.1/cdn-cgi/trace | http://detectportal.firefox.com/success.txt |
| Probe interval | 120s | 45s |
| DNS | none | Google DNS + googleapis.cn → googleapis.com |
| Stats | | enabled |
| Mux | from config | force-disabled |
Xray core version: libv2ray (xray-core v25.7.25, comment //NOTE: xray core version v25.7.25).


5. Desktop: Single Outbound, No Balancing

File: desktop-app/shiva-daemon/src/xray_config.rs

Desktop creates config for one outbound with a simple routing rule:

pub fn create_with_outbound(
    log_config: &log_config::Config,
    inbound_socks_proxy_addr: &SocketAddr,
    outbound: outbound::Outbound,
) -> Config {
    let inbound = new_inbound_socks(inbound_socks_proxy_addr);

    let rule = routing::Rule {
        r#type: routing::RuleType::Field,
        inbound_tag: vec![inbound.tag.clone()],
        outbound_tag: outbound.tag.clone(),
    };

    let routing = routing::Routing {
        domain_strategy: routing::DomainStrategy::AsIs,
        domain_matcher: routing::DomainMatcher::Hybrid,
        rules: vec![rule, api_rule],
    };
    // Linux only: fwmark for binding to physical interface
    Config {
        outbounds,            // single outbound
        routing: Some(routing), // simple routing inbound→outbound
        dns: None,
        ...
    }
}

Outbound passthrough (shiva-common/src/xray/outbound.rs):

#[serde(flatten)]
pub extra: HashMap<String, Value>,  // preserves unknown fields

Desktop does not support: balancing, observatory, split-tunneling, DNS, block/direct outbounds.


6. Platform Comparison Table

| Aspect | Backend generates | iOS uses | Android uses | Desktop uses |
|---|---|---|---|---|
| outbounds | All servers for country | as-is | outboundBean only (settings, streamSettings) | Single outbound |
| routing.rules | block/direct/main | as-is | Own (inbound→balancer) | Own (inbound→outbound) |
| routing.balancers | Per strategy | as-is | Own (leastPing hardcoded) | None |
| routing.domainStrategy | AsIs | as-is | IPIfNonMatch | AsIs |
| observatory | For leastPing | as-is | Own (45s, Firefox URL) | None |
| burstObservatory | For leastLoad | as-is | None | None |
| DNS | None | None | Own (Google DNS) | None |
| Mux | From config | as-is | Force-disabled | None |
| Fallback | None | None | leastPing → random → single | None |
| Xray version | | v25.7.25 (LibXray) | v25.7.25 (libv2ray) | From bundle |

7. Known Incidents

Incident 1: random strategy broke iOS (16 Feb 2026)

Timeline:

  1. 96cd9100 (15 Feb): Changed leastPing → leastLoad + burstObservatory
  2. 90421a0a (16 Feb 06:04): Changed leastLoad → random
  3. iOS stopped connecting to all countries
  4. Android continued working (builds its own config)
  5. 882c600d (16 Feb): Rolled back to leastPing; iOS recovered

Cause: iOS passes random to LibXray as-is. xray-core Aug 2025 either doesn't support random strategy in balancer or handles it incorrectly.

Incident 2: Countries with 5 WL outbounds stopped working (16 Feb 2026)

Problem: After the rollback to leastPing, DE, GB, and LV (5 WL outbounds each) failed to connect, while single-outbound countries kept working.

Cause: The leastPing observatory probes every outbound every 120s. With 5 outbounds pointing to the same upstream server (through different WL proxies), each probe round opens 5 parallel probe connections to that server, which led to instability.

Fix: Left 1 WL server per country. Everything worked.


8. Smart Load Balancer (VCS Side)

The VPN Config Service (sync_fast.py) computes server quality metrics every 60 seconds, writes them to PostgreSQL, and syncs them to MySQL every 5 minutes. This runs independently of the xray balancer strategies above: it governs which outbounds appear in the backend response.

Composite Score Formula

@staticmethod
def _compute_score(metrics, tcp_count, bw_ema, status, proxy_load, proxy_cores, proxy_bw_ema) -> float:
    """Composite score: 0 = best, 999 = DOWN"""
    if status == 'DOWN':
        return 999.0
    if not metrics:
        return 1.0  # no data — neutral

    cpu_factor = metrics['load1'] / max(metrics.get('cpu_cores', 1), 1)
    mem_factor = 1 - (metrics.get('mem_avail_bytes', 0) / max(metrics.get('mem_total_bytes', 1), 1))
    bw_factor = (bw_ema or 0) / 1000   # normalized to 1 Gbps
    session_factor = tcp_count / 500

    score = (session_factor * 0.35 +
             cpu_factor   * 0.30 +
             bw_factor    * 0.25 +
             mem_factor   * 0.10)

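    # WL proxy: take max of server score and proxy score
    if proxy_load is not None:
        p_cores = max(proxy_cores or 1, 1)
        # Assumed definition (missing from this excerpt): proxy load per core, mirroring cpu_factor
        proxy_cpu_factor = proxy_load / p_cores
        proxy_score = proxy_cpu_factor * 0.5 + ((proxy_bw_ema or 0) / 1000) * 0.5
        score = max(score, proxy_score)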

    return round(score, 4)

Weights: sessions 35% + CPU 30% + bandwidth 25% + memory 10%. Range: 0 = best, 999 = DOWN. Fallback: no metrics → 1.0 (neutral). DOWN → 999.0.
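A worked example makes the weighting concrete. The sketch below recomputes only the server-side part of the score (no WL proxy term) for one made-up server; `weighted_score` and the sample values are illustrative, while the weights and normalizations are the ones listed above.

```python
WEIGHTS = {"sessions": 0.35, "cpu": 0.30, "bandwidth": 0.25, "memory": 0.10}


def weighted_score(load1, cpu_cores, mem_avail, mem_total, bw_ema_mbps, tcp_count):
    """Server-side composite score (illustrative re-statement of the formula)."""
    factors = {
        "sessions": tcp_count / 500,              # 500 sessions -> 1.0
        "cpu": load1 / max(cpu_cores, 1),         # load per core
        "bandwidth": bw_ema_mbps / 1000,          # 1 Gbps -> 1.0
        "memory": 1 - mem_avail / max(mem_total, 1),
    }
    return round(sum(WEIGHTS[name] * factors[name] for name in WEIGHTS), 4)


# Sample server: load 2.0 on 4 cores, half the RAM free, 250 Mbps EMA, 100 sessions.
# 0.20*0.35 + 0.50*0.30 + 0.25*0.25 + 0.50*0.10 = 0.07 + 0.15 + 0.0625 + 0.05
score = weighted_score(load1=2.0, cpu_cores=4, mem_avail=4 * 2**30,
                       mem_total=8 * 2**30, bw_ema_mbps=250, tcp_count=100)
print(score)  # 0.3325
```

None of the factors is clamped, so a heavily loaded but healthy server can score above 1.0 (for example, 1000 TCP sessions alone contribute 0.70). The value 999 is reserved for DOWN and is not an upper bound of the formula.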

EMA Bandwidth

Bandwidth uses an Exponential Moving Average to smooth spikes:

EMA_ALPHA = 0.3
bw_ema = bw_mbps if old_ema is None else round(EMA_ALPHA * bw_mbps + (1 - EMA_ALPHA) * old_ema, 2)

The EMA is recalculated every sync_online_interval_seconds (default 60s).
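To see how much smoothing an alpha of 0.3 gives, the sketch below feeds a short series with one spike through the same update rule. The wrapper `update_ema` and the sample values are illustrative only.

```python
EMA_ALPHA = 0.3


def update_ema(old_ema, bw_mbps):
    """Same update rule as above, wrapped in a function for the example."""
    if old_ema is None:
        return bw_mbps
    return round(EMA_ALPHA * bw_mbps + (1 - EMA_ALPHA) * old_ema, 2)


ema = None
history = []
for sample in [100, 100, 400, 100]:  # Mbps, with a single spike to 400
    ema = update_ema(ema, sample)
    history.append(ema)
print(history)  # [100, 100.0, 190.0, 163.0]
```

The 400 Mbps spike lifts the EMA only to 190, and the following normal sample pulls it back to 163, so a single burst does not reach the 200 Mbps is_turbo threshold by itself.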

is_turbo / is_fast Thresholds

is_turbo = (bw_ema or 0) >= 200 and status != 'DOWN'   # ≥200 Mbps EMA
is_fast   = is_turbo and score < 0.5 and status == 'HEALTHY'  # turbo + low load

Note (as of Mar 2026): Dynamic is_turbo/is_fast from DB is disabled on the backend side — hardcoded lists are active until EMA stabilizes (1-2 weeks post-deploy).

MySQL Sync (every 5 minutes)

UPDATE server SET
    health_status = %s, tcp_sessions = %s, load_avg_1m = %s,
    score = %s, bandwidth_avg_mbps = %s,
    is_turbo = %s, is_fast = %s
WHERE ip = %s

Columns added by V115 Flyway migration: score, bandwidth_avg_mbps, is_turbo, is_fast.


9. ServerRankingCache (Backend Side)

ServerRankingCache.java uses the DB score for weighted random selection:

  • Weight formula: 1 / (1 + score) — lower score → higher weight → picked more often
  • Fallback (legacy): if no score in DB, falls back to original capacity-based weight: freeSessions * cpuMultiplier * bwMultiplier
  • Refresh: ServerMetricsScheduler rebuilds rankings every 60s from VPN Config Service data
  • Feature flag: server.balancing.enabled=false — server selection is currently off (metrics collected, scoring active, selection gating disabled)
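The weight formula can be tried out with a small weighted random pick. This is a sketch, not ServerRankingCache itself: `selection_weight` and `pick_server` are hypothetical names, and only the `1 / (1 + score)` formula comes from the list above.

```python
import random


def selection_weight(score):
    """Lower score gives higher weight: 0 -> 1.0, 1.0 -> 0.5, 999 -> 0.001."""
    return 1 / (1 + score)


def pick_server(scores, rng=random):
    """Weighted random pick from a dict of server -> composite score (illustrative)."""
    servers = list(scores)
    weights = [selection_weight(scores[server]) for server in servers]
    return rng.choices(servers, weights=weights, k=1)[0]


# Simulate 10,000 picks with a fixed seed to see the resulting distribution.
rng = random.Random(0)
counts = {"idle": 0, "busy": 0, "down": 0}
for _ in range(10_000):
    counts[pick_server({"idle": 0.0, "busy": 1.0, "down": 999.0}, rng)] += 1
```

With these weights the idle server is chosen roughly twice as often as the busy one, and a DOWN server (score 999) is picked only a handful of times in 10,000, so it is nearly but not strictly excluded by the weight alone.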

Outbound Shuffle

Outbounds in /api/v2/countries response are deterministically shuffled per account, rotating every minute. This distributes load across servers even without active balancer selection.

As of 4 Mar 2026, outbounds are also sorted by score (best first) before shuffle — CountryServiceImpl.java.
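One way to get a shuffle that is stable per account and rotates every minute is to seed a random generator from the account ID and the current minute. The sketch below assumes exactly that; the real seed derivation in CountryServiceImpl.java is not shown in this document, so the hashing scheme and the name `shuffle_for_account` are hypothetical.

```python
import hashlib
import random


def shuffle_for_account(outbounds, account_id, now_epoch_seconds):
    """Deterministic shuffle keyed by account and minute (assumed scheme)."""
    minute = int(now_epoch_seconds // 60)
    digest = hashlib.sha256(f"{account_id}:{minute}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    shuffled = list(outbounds)  # leave the caller's list untouched
    rng.shuffle(shuffled)
    return shuffled


servers = ["s1", "s2", "s3", "s4"]
first = shuffle_for_account(servers, "acc-1", 120)   # minute 2
again = shuffle_for_account(servers, "acc-1", 179)   # still minute 2
```

Repeated requests within the same minute return the same order for a given account, while different accounts and later minutes generally get different orders.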

How Score Affects What a Client Gets

VCS sync_fast.py (60s)
  → compute composite score per server
  → write to PostgreSQL (servers.score)
  → sync to MySQL (server.score) every 5 min

Backend ServerMetricsScheduler (60s)
  → read scores from VCS API
  → update ServerRankingCache (weight = 1/(1+score))

CountryServiceImpl
  → query servers for country
  → sort by score (best first)
  → apply deterministic per-account shuffle
  → return top N outbounds to client

10. Recommendations for Backend Refactoring

Problem 1: Backend doesn't know which platform it's generating a config for.

  • iOS needs full valid xray config (routing + balancer + observatory)
  • Android only needs outbounds (ignores the rest)
  • Desktop needs a single outbound

Proposed solution — platform-aware config:

GET /api/v2/countries?platform=ios     → full config (current behavior)
GET /api/v2/countries?platform=android → outbounds only, no routing/observatory
GET /api/v2/countries?platform=desktop → single best outbound per country
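A minimal sketch of what the platform switch could do to an already assembled config is shown below. It is a proposal illustration only: `shape_for_platform` is a hypothetical function, and the desktop branch assumes the caller passes outbounds ordered best-first.

```python
def shape_for_platform(config, platform):
    """Trim a full per-country config to what each platform actually uses."""
    if platform == "ios":
        return config  # full config, current behavior
    if platform == "android":
        return {"outbounds": config["outbounds"]}  # routing/observatory dropped
    if platform == "desktop":
        # Assumes outbounds are already ordered best-first.
        return {"outbounds": config["outbounds"][:1]}
    raise ValueError(f"unknown platform: {platform}")


full = {
    "outbounds": [{"tag": "de-1"}, {"tag": "de-2"}],
    "routing": {"balancers": [{"tag": "least-ping-balancer"}]},
    "observatory": {"probeInterval": "120s"},
}
android_view = shape_for_platform(full, "android")
desktop_view = shape_for_platform(full, "desktop")
```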

Problem 2: One strategy for all platforms.

  • leastPing: only verified-safe strategy for iOS
  • random: breaks iOS (confirmed)
  • leastLoad: not fully tested on iOS

Problem 3: Too many outbounds per upstream.

  • Many outbounds to the same upstream = observatory probe overload
  • Recommendation: ≤2 outbounds per country for iOS
  • Android handles it (own observatory with 45s interval)

Problem 4: Observatory probe URL.

  • Backend uses https://1.1.1.1/cdn-cgi/trace, which can be blocked
  • Android uses http://detectportal.firefox.com/success.txt, which is more reliable


11. Source Files

| Platform | File | Role |
|---|---|---|
| Backend | service/impl/XrayRoutingServiceImpl.java | Routing/balancer/observatory generation |
| Backend | service/impl/CountryServiceImpl.java:913-997 | XrayConfigResponse assembly |
| Backend | dto/response/xray/XrayConfigResponse.java | Response DTO |
| iOS | ShivaPacketTunnelProvider/ConfigBuilder.swift | Config passthrough to LibXray |
| Android | v2rayng/util/V2rayConfigUtil.kt | Own xray config construction |
| Android | v2rayng/service/V2RayServiceManager.kt:163-215 | Fallback logic (leastPing→random) |
| Android | data/mapper/MapperCountriesWithConfigsV2ToServerConfigList.kt | Backend response mapping |
| Desktop | shiva-daemon/src/xray_config.rs | Single-outbound config creation |
| Desktop | shiva-common/src/xray/outbound.rs | Unknown fields passthrough |
| VCS | vpn-config-service/app/services/sync_fast.py | Score/EMA/turbo computation |