# System Architecture Overview

This document is the single source of truth for the platform’s architecture, protocols, data flows, and operational details. It supersedes previous scattered docs.

## Contents
- Components & Topology
- Decentralized Layer (Membership, Replication, Metrics)
- Upload/Conversion Pipeline
- Content View & Purchase Flow
- API Surface (selected endpoints)
- Data Keys & Schemas
- Configuration & Defaults
- Observability & Metrics
- Sequence Diagrams (Mermaid)

---

## Components & Topology

- Backend API: Sanic-based service (Telegram bots embedded) with PostgreSQL (SQLAlchemy + Alembic).
- Storage: Local FS for uploaded/derived data; IPFS used for discovery/pinning; tusd for resumable uploads.
- Converter workers: Dockerized ffmpeg pipeline (convert_v3, convert_process) driven by background tasks.
- Frontend: Vite + TypeScript client served via nginx container.
- Decentralized overlay (in-process DHT): Membership, replication lease management, windowed content metrics.

```mermaid
flowchart LR
  Client -- TWA/HTTP --> Frontend
  Frontend -- REST --> API[Backend API]
  API -- tus hooks --> tusd
  API -- SQL --> Postgres
  API -- IPC --> Workers[Converter Workers]
  API -- IPFS --> IPFS
  API -- DHT --> DHT[(In-Process DHT)]
  DHT -- CRDT Merge --> DHT
```

---

## Decentralized Layer

### Identity & Versions
- NodeID = blake3(Ed25519 public key), ContentID = blake3(encrypted_blob)
- schema_version = v1 embedded into DHT keys/records.

### Membership
- Signed `/api/v1/network.handshake` with Ed25519; includes:
  - Node info, capabilities, metrics, IPFS metadata.
  - reachability_receipts: (issuer, target, ASN, timestamp, signature).
- State: LWW-Set for members + receipts, HyperLogLog for population estimate.
- Island filtering: nodes with `reachability_ratio < q` are excluded (`k=5`, `q=0.6`, TTL=600s).
- N_estimate: `max(valid N_local reports)` across sufficiently reachable peers.

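
The island filter and population estimate described above can be sketched as follows. This is a minimal illustration, not the service's actual code: the `PeerReport` type, its field names, and the reading of `k` as a minimum receipt count (suggested by `DHT_MIN_RECEIPTS=5`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable

MIN_RECEIPTS = 5        # `k` in the text; assumed to be a minimum receipt count
MIN_REACHABILITY = 0.6  # `q` in the text


@dataclass(frozen=True)
class PeerReport:
    """Hypothetical per-peer summary; field names are illustrative only."""
    node_id: str
    receipts: int              # assumed: number of valid, unexpired receipts
    reachability_ratio: float
    n_local: int               # the peer's own population report


def is_sufficiently_reachable(
    peer: PeerReport, k: int = MIN_RECEIPTS, q: float = MIN_REACHABILITY
) -> bool:
    # Peers below either threshold are treated as an island and ignored.
    return peer.receipts >= k and peer.reachability_ratio >= q


def estimate_population(own_n_local: int, peers: Iterable[PeerReport]) -> int:
    # N_estimate = max over the local report and all valid peer reports.
    valid = [p.n_local for p in peers if is_sufficiently_reachable(p)]
    return max([own_n_local, *valid])


peers = [
    PeerReport("a", receipts=6, reachability_ratio=0.9, n_local=40),
    PeerReport("b", receipts=2, reachability_ratio=0.9, n_local=500),  # too few receipts
    PeerReport("c", receipts=7, reachability_ratio=0.3, n_local=900),  # below q
]
print(estimate_population(10, peers))  # 40
```

Taking the maximum only over filtered reports keeps a poorly connected node from inflating `N_estimate` with an unverifiable `N_local`.
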
### Replication & Leases
- Compute prefix `p = max(0, round(log2(N_estimate / R_target)))` with `R_target ≥ 3`.
- Responsible nodes: first `p` bits of NodeID equal first `p` bits of ContentID.
- Leader = min NodeID among responsible.
- Leader maintains `replica_leases` with TTL=600s and diversity: ≥3 IP first octets and ≥3 ASN if available.
- Rendezvous ranking: blake3(ContentID || NodeID) for candidate selection.
- Heartbeat interval 60s, miss threshold 3 → failover within ≤180s.

### Metrics (Windowed CRDT)
- On view: PN-Counter for views; HyperLogLog for uniques (ViewID = blake3(ContentID || device_salt)); G-Counter for watch_time, bytes_out, completions.
- Keys are windowed by hour; commutative merges ensure deterministic convergence.

```mermaid
stateDiagram-v2
  [*] --> Discover
  Discover: Handshake + receipts
  Discover --> Active: k ASN receipts & TTL ok
  Active --> Leader: Content prefix p elects min NodeID
  Leader --> Leased: Assign replica_leases (diversity)
  Leased --> Monitoring: Heartbeats every 60s
  Monitoring --> Reassign: Missed 3 intervals
  Reassign --> Leased
```

---

## Upload & Conversion Pipeline

1) Client uploads via `tusd` (resumable). Backend receives hooks (`/api/v1/upload.tus-hook`).
2) Encrypted content is registered; converter workers derive preview/low/high (for media) or original (for binaries).
3) Derivative metadata stored in DB and surfaced via `/api/v1/content.view`.

```mermaid
sequenceDiagram
  participant C as Client
  participant T as tusd
  participant B as Backend
  participant W as Workers
  participant DB as PostgreSQL
  C->>T: upload chunks
  T->>B: hooks (pre/post-finish)
  B->>DB: create content record
  B->>W: enqueue conversion
  W->>DB: store derivatives
  C->>B: GET /content.view
  B->>DB: resolve latest derivatives
  B-->>C: display_options + status
```

---

## Content View & Purchase Flow

- `/api/v1/content.view/` resolves content and derivatives:
  - For binary content without previews: present original only when licensed.
  - For audio/video: use preview/low for unauthenticated users; decrypted_low/high for licensed users.
- Frontend shows processing state when derivatives are pending.
- Purchase options (TON/Stars) remain in a single row (UI constraint).
- Cover art layout: fixed square slot; image fits without stretching; background follows page color, not black.

```mermaid
flowchart LR
  View[content.view] --> Resolve[Resolve encrypted/decrypted rows]
  Resolve --> Derivations{Derivatives ready?}
  Derivations -- No --> Status[processing/pending]
  Derivations -- Yes --> Options
  Options -- Binary + No License --> Hidden["Original hidden"]
  Options -- Media + No License --> PreviewLow["Preview/Low"]
  Options -- Licensed --> Decrypted["Decrypted Low/High or Original"]
```

---

## Selected APIs

- `GET /api/system.version` – liveness/protocol version.
- `POST /api/v1/network.handshake` – signed membership exchange.
- `GET /api/v1/content.view/` – resolves display options, status, and downloadability.
- `GET /api/v1.5/storage/` – static file access.
- `POST /api/v1/storage` – legacy upload endpoint.

---

## Data Keys & Schemas

- MetaKey(content_id): tracks `replica_leases`, `leader`, `conflict_log`, `revision`.
- MembershipKey(node_id): LWW-Set of members & receipts, HyperLogLog population, N_reports.
- MetricKey(content_id, window_id): PN-/G-/HLL serialized state.

All DHT records are signed and merged via deterministic CRDT strategies + LWW dominance (logical_counter, timestamp, node_id).

---

## Configuration & Defaults

- Network: `NODE_PRIVACY`, `PUBLIC_HOST`, `HANDSHAKE_INTERVAL_SEC`, TLS verify, IPFS peering.
- DHT: `DHT_MIN_RECEIPTS=5`, `DHT_MIN_REACHABILITY=0.6`, `DHT_MEMBERSHIP_TTL=600`, `DHT_REPLICATION_TARGET=3`, `DHT_LEASE_TTL=600`, `DHT_HEARTBEAT_INTERVAL=60`, `DHT_HEARTBEAT_MISS_THRESHOLD=3`, `DHT_MIN_ASN=3`, `DHT_MIN_IP_OCTETS=3`, `DHT_METRIC_WINDOW_SEC=3600`.
- Conversion resources: `CONVERT_*` limits (CPU/mem), `MAX_CONTENT_SIZE_MB`.

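
To show how the DHT defaults above feed the replication rules, here is a small sketch that reads a subset of the settings and derives the failover window, the prefix length `p`, and the prefix-match test for responsible nodes. The `DhtConfig` class and the helper names are hypothetical; only the environment variable names, the default values, and the formula for `p` come from this document.

```python
import math
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DhtConfig:
    """Hypothetical holder for a subset of the DHT_* settings."""
    replication_target: int = 3
    lease_ttl: int = 600
    heartbeat_interval: int = 60
    heartbeat_miss_threshold: int = 3
    metric_window_sec: int = 3600

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "DhtConfig":
        def read(name: str, default: int) -> int:
            return int(env.get(name, default))

        return cls(
            replication_target=read("DHT_REPLICATION_TARGET", 3),
            lease_ttl=read("DHT_LEASE_TTL", 600),
            heartbeat_interval=read("DHT_HEARTBEAT_INTERVAL", 60),
            heartbeat_miss_threshold=read("DHT_HEARTBEAT_MISS_THRESHOLD", 3),
            metric_window_sec=read("DHT_METRIC_WINDOW_SEC", 3600),
        )

    @property
    def failover_window_sec(self) -> int:
        # 60s interval * 3 misses = 180s with the documented defaults.
        return self.heartbeat_interval * self.heartbeat_miss_threshold


def prefix_bits(n_estimate: int, r_target: int) -> int:
    # p = max(0, round(log2(N_estimate / R_target)))
    if n_estimate <= 0:
        return 0
    return max(0, round(math.log2(n_estimate / r_target)))


def first_bits(data: bytes, p: int) -> int:
    # Interpret the bytes as a big-endian integer and keep the top p bits.
    if p == 0:
        return 0
    return int.from_bytes(data, "big") >> (len(data) * 8 - p)


def is_responsible(node_id: bytes, content_id: bytes, p: int) -> bool:
    # A node is responsible when its first p bits match the content's.
    return first_bits(node_id, p) == first_bits(content_id, p)


cfg = DhtConfig.from_env({})           # empty mapping, so defaults apply
print(cfg.failover_window_sec)         # 180
print(prefix_bits(24, cfg.replication_target))  # 3
```

Note that Python's built-in `round` rounds exact halves to the nearest even integer; whether the real implementation uses the same tie-breaking rule is not stated here.
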
---

## Observability & Metrics

Prometheus (exported in-process):
- dht_replication_under / dht_replication_over / dht_leader_changes_total
- dht_merge_conflicts_total
- dht_view_count_total / dht_unique_view_estimate / dht_watch_time_seconds

Logs track replication conflict_log entries and structured HTTP errors (with session_id/error_id).

---

## Sequence Diagrams (Consolidated)

### Membership & N_estimate
```mermaid
sequenceDiagram
  participant A as Node A
  participant B as Node B
  A->>B: POST /network.handshake {nonce, ts, signature}
  B->>B: verify ts, nonce, signature
  B->>B: upsert member, store receipts
  B-->>A: {node, known_public_nodes, n_estimate, signature}
  A->>A: merge, then recompute N_estimate = max(N_local, peers)
```

### Replication Leader Election
```mermaid
sequenceDiagram
  participant L as Leader
  participant Peers as Responsible Nodes
  L->>L: compute p from N_estimate
  L->>Peers: rendezvous scores for ContentID
  L->>L: assign leases (diversity)
  Peers-->>L: heartbeat every 60s
  L->>L: reassign on 3 misses (≤180s)
```

### Metrics Publication
```mermaid
sequenceDiagram
  participant C as Client
  participant API as Backend
  participant M as MetricsAggregator
  participant D as DHT
  C->>API: GET content.view?watch_time&bytes_out
  API->>M: record_view(delta)
  M->>D: merge MetricKey(ContentID, Window)
  M->>API: update gauges
```

---

## Run & Test

```bash
# Spin up services
docker compose -f /home/configs/docker-compose.yml --env-file /home/configs/.env up -d --build

# Backend unit tests (DHT integration)
cd uploader-bot
python3 -m unittest discover -s tests/dht
```
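
As an illustration of the kind of property a test under `tests/dht` might check, the sketch below defines a tiny G-Counter and asserts that merges are commutative and idempotent, which is what lets windowed metrics converge regardless of merge order. The `GCounter` class, the `window_id` helper, and the test names are hypothetical stand-ins written for this example; they are not the project's actual classes or tests.

```python
import unittest
from typing import Dict, Optional


class GCounter:
    """Hypothetical grow-only counter keyed by node ID."""

    def __init__(self, counts: Optional[Dict[str, int]] = None) -> None:
        self.counts: Dict[str, int] = dict(counts or {})

    def increment(self, node_id: str, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("G-Counter increments must be non-negative")
        self.counts[node_id] = self.counts.get(node_id, 0) + amount

    def merge(self, other: "GCounter") -> "GCounter":
        # Per-node maximum: commutative, associative, and idempotent.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter(
            {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys}
        )

    @property
    def value(self) -> int:
        return sum(self.counts.values())


def window_id(timestamp: float, window_sec: int = 3600) -> int:
    # Hourly windows by default, matching DHT_METRIC_WINDOW_SEC=3600.
    return int(timestamp) // window_sec


class GCounterMergeTest(unittest.TestCase):
    def test_merge_is_commutative_and_idempotent(self) -> None:
        a = GCounter({"n1": 3})
        b = GCounter({"n1": 1, "n2": 5})
        self.assertEqual(a.merge(b).counts, b.merge(a).counts)
        self.assertEqual(a.merge(a).counts, a.counts)
        self.assertEqual(a.merge(b).value, 8)

    def test_window_id_groups_by_hour(self) -> None:
        self.assertEqual(window_id(3600), window_id(7199))
        self.assertNotEqual(window_id(7199), window_id(7200))
```

Saved as a `test_*.py` file, a case like this would be picked up by the `unittest discover` command above.
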