# System Architecture Overview
This document is the single source of truth for the platform’s architecture, protocols, data flows, and operational details. It supersedes previous scattered docs.
## Contents
- Components & Topology
- Decentralized Layer (Membership, Replication, Metrics)
- Upload/Conversion Pipeline
- Content View & Purchase Flow
- API Surface (selected endpoints)
- Data Keys & Schemas
- Configuration & Defaults
- Observability & Metrics
- Sequence Diagrams (Mermaid)
## Components & Topology
- Backend API: Sanic-based service (Telegram bots embedded) with PostgreSQL (SQLAlchemy + Alembic).
- Storage: Local FS for uploaded/derived data; IPFS used for discovery/pinning; tusd for resumable uploads.
- Converter workers: Dockerized ffmpeg pipeline (convert_v3, convert_process) driven by background tasks.
- Frontend: Vite + TypeScript client served via nginx container.
- Decentralized overlay (in-process DHT): Membership, replication lease management, windowed content metrics.
```mermaid
flowchart LR
    Client -- TWA/HTTP --> Frontend
    Frontend -- REST --> API[Backend API]
    API -- tus hooks --> tusd
    API -- SQL --> Postgres
    API -- IPC --> Workers[Converter Workers]
    API -- IPFS --> IPFS
    API -- DHT --> DHT[(In-Process DHT)]
    DHT -- CRDT Merge --> DHT
```
## Decentralized Layer

### Identity & Versions

- `NodeID = blake3(Ed25519 public key)`, `ContentID = blake3(encrypted_blob)`.
- `schema_version = v1` embedded into DHT keys/records.
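A minimal sketch of this identity derivation; `hashlib.blake2b` stands in for blake3 here, since blake3 is a third-party dependency (`blake3` on PyPI):

```python
import hashlib

def node_id(ed25519_pubkey: bytes) -> str:
    # NodeID = hash of the node's Ed25519 public key (blake3 in production)
    return hashlib.blake2b(ed25519_pubkey, digest_size=32).hexdigest()

def content_id(encrypted_blob: bytes) -> str:
    # ContentID = hash of the encrypted blob, so identity is ciphertext-stable
    return hashlib.blake2b(encrypted_blob, digest_size=32).hexdigest()
```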
### Membership

- Signed `/api/v1/network.handshake` with Ed25519; includes:
  - Node info, capabilities, metrics, IPFS metadata.
  - `reachability_receipts`: (issuer, target, ASN, timestamp, signature).
- State: LWW-Set for members + receipts, HyperLogLog for population estimate.
- Island filtering: nodes with `reachability_ratio < q` are excluded (`k=5`, `q=0.6`, TTL=600s).
- N_estimate: `max(valid N_local reports)` across sufficiently reachable peers.
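A minimal sketch of island filtering and N_estimate, assuming each member is a dict with hypothetical fields `receipt_asns`, `reachability_ratio`, and `n_local`:

```python
K_MIN_RECEIPTS = 5        # DHT_MIN_RECEIPTS
Q_MIN_REACHABILITY = 0.6  # DHT_MIN_REACHABILITY

def reachable_members(members):
    # exclude "islands": too few distinct-ASN receipts or low reachability
    return [
        m for m in members
        if len(m["receipt_asns"]) >= K_MIN_RECEIPTS
        and m["reachability_ratio"] >= Q_MIN_REACHABILITY
    ]

def n_estimate(n_local: int, members) -> int:
    # max of valid N_local reports across sufficiently reachable peers
    return max([n_local] + [m["n_local"] for m in reachable_members(members)])
```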
### Replication & Leases

- Compute prefix `p = max(0, round(log2(N_estimate / R_target)))` with `R_target ≥ 3`.
- Responsible nodes: the first `p` bits of NodeID equal the first `p` bits of ContentID.
- Leader = min NodeID among responsible nodes.
- Leader maintains `replica_leases` with TTL=600s and diversity: ≥3 distinct IP first octets and ≥3 ASNs where available.
- Rendezvous ranking: `blake3(ContentID || NodeID)` for candidate selection.
- Heartbeat interval 60s, miss threshold 3 → failover within ≤180s.
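The prefix and rendezvous rules above can be sketched as follows (`hashlib.blake2b` stands in for blake3):

```python
import hashlib
import math

def replication_prefix(n_estimate: int, r_target: int = 3) -> int:
    # p = max(0, round(log2(N_estimate / R_target)))
    return max(0, round(math.log2(n_estimate / r_target)))

def _bits(hex_id: str) -> str:
    # expand a hex identifier into its bit string
    return bin(int(hex_id, 16))[2:].zfill(len(hex_id) * 4)

def is_responsible(node_id_hex: str, content_id_hex: str, p: int) -> bool:
    # responsible when the first p bits of NodeID match those of ContentID
    return _bits(node_id_hex)[:p] == _bits(content_id_hex)[:p]

def rendezvous_order(content_id: bytes, node_ids):
    # rank candidates by hash(ContentID || NodeID); lowest digest first
    return sorted(node_ids, key=lambda n: hashlib.blake2b(content_id + n).digest())
```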
### Metrics (Windowed CRDT)
- On view: PN-Counter for views; HyperLogLog for uniques (ViewID = blake3(ContentID || device_salt)); G-Counter for watch_time, bytes_out, completions.
- Keys are windowed by hour; commutative merges ensure deterministic convergence.
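A sketch of the merge semantics, assuming a G-Counter is a per-node dict and a PN-Counter is an (increments, decrements) pair of G-Counters:

```python
def window_id(ts: float, window_sec: int = 3600) -> int:
    # hourly windows by default (DHT_METRIC_WINDOW_SEC)
    return int(ts // window_sec)

def merge_g_counter(a: dict, b: dict) -> dict:
    # per-node max: commutative, associative, idempotent -> deterministic merge
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def merge_pn_counter(a, b):
    # a PN-Counter merges its increment and decrement halves independently
    return (merge_g_counter(a[0], b[0]), merge_g_counter(a[1], b[1]))

def pn_value(counter) -> int:
    return sum(counter[0].values()) - sum(counter[1].values())
```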
```mermaid
stateDiagram-v2
    [*] --> Discover
    Discover: Handshake + receipts
    Discover --> Active: k ASN receipts & TTL ok
    Active --> Leader: Content prefix p elects min NodeID
    Leader --> Leased: Assign replica_leases (diversity)
    Leased --> Monitoring: Heartbeats every 60s
    Monitoring --> Reassign: Missed 3 intervals
    Reassign --> Leased
```
## Upload & Conversion Pipeline

- Client uploads via `tusd` (resumable); the backend receives hooks (`/api/v1/upload.tus-hook`).
- Encrypted content is registered; converter workers derive preview/low/high (for media) or original (for binaries).
- Derivative metadata is stored in the DB and surfaced via `/api/v1/content.view`.
```mermaid
sequenceDiagram
    participant C as Client
    participant T as tusd
    participant B as Backend
    participant W as Workers
    participant DB as PostgreSQL
    C->>T: upload chunks
    T->>B: hooks (pre/post-finish)
    B->>DB: create content record
    B->>W: enqueue conversion
    W->>DB: store derivatives
    C->>B: GET /content.view
    B->>DB: resolve latest derivatives
    B-->>C: display_options + status
```
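The hook step can be sketched as below; the payload shape (`{"Upload": {"ID", "Size", "MetaData"}}`) follows tusd's hook convention, and the return values are illustrative rather than the service's actual contract:

```python
MAX_CONTENT_SIZE_MB = 512  # hypothetical default; see Configuration & Defaults

def handle_tus_hook(hook_name: str, payload: dict) -> dict:
    # dispatch on the tusd hook name received at /api/v1/upload.tus-hook
    upload = payload["Upload"]
    if hook_name == "pre-create":
        if upload.get("Size", 0) > MAX_CONTENT_SIZE_MB * 1024 * 1024:
            return {"reject": True, "reason": "exceeds MAX_CONTENT_SIZE_MB"}
        return {"reject": False}
    if hook_name == "post-finish":
        # register the encrypted blob and enqueue conversion (stubbed here)
        return {"enqueue_conversion": upload["ID"]}
    return {}
```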
## Content View & Purchase Flow

- `/api/v1/content.view/<content_address>` resolves content and derivatives:
  - For binary content without previews: present the original only when licensed.
  - For audio/video: use preview/low for unlicensed viewers; decrypted_low/high for licensed users.
- Frontend shows processing state when derivatives are pending.
- Purchase options (TON/Stars) remain in a single row (UI constraint).
- Cover art layout: fixed square slot; image fits without stretching; background follows page color, not black.
```mermaid
flowchart LR
    View[content.view] --> Resolve[Resolve encrypted/decrypted rows]
    Resolve --> Derivations{Derivatives ready?}
    Derivations -- No --> Status[processing/pending]
    Derivations -- Yes --> Options
    Options -- Binary + No License --> Hidden[Original hidden]
    Options -- Media + No License --> Preview[Preview/Low]
    Options -- Licensed --> Full[Decrypted Low/High or Original]
```
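The selection rules above can be sketched as a pure function; `kind` and the derivative names are assumptions about the schema, not the exact DB columns:

```python
def display_options(kind: str, derivatives: set, licensed: bool) -> list:
    # binary content: original only, and only when licensed
    if kind == "binary":
        return ["original"] if licensed and "original" in derivatives else []
    # audio/video: decrypted variants for licensed users, preview/low otherwise
    if licensed:
        return [d for d in ("decrypted_low", "decrypted_high") if d in derivatives]
    return [d for d in ("preview", "low") if d in derivatives]
```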
## Selected APIs

- `GET /api/system.version` – liveness/protocol version.
- `POST /api/v1/network.handshake` – signed membership exchange.
- `GET /api/v1/content.view/<content_address>` – resolves display options, status, and downloadability.
- `GET /api/v1.5/storage/<file_hash>` – static file access.
- `POST /api/v1/storage` – legacy upload endpoint.
## Data Keys & Schemas

- `MetaKey(content_id)`: tracks `replica_leases`, `leader`, `conflict_log`, `revision`.
- `MembershipKey(node_id)`: LWW-Set of members & receipts, HyperLogLog population, `N_reports`.
- `MetricKey(content_id, window_id)`: PN-/G-/HLL serialized state.
All DHT records are signed and merged via deterministic CRDT strategies + LWW dominance (logical_counter, timestamp, node_id).
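The LWW dominance rule reduces to a lexicographic comparison over the triple listed above:

```python
def lww_dominates(a: dict, b: dict) -> bool:
    # deterministic total order used to break merge ties:
    # logical_counter first, then timestamp, then node_id
    def key(record):
        return (record["logical_counter"], record["timestamp"], record["node_id"])
    return key(a) > key(b)
```

Because the final component (`node_id`) is unique per node, two distinct records never tie, which keeps merges deterministic across replicas.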
## Configuration & Defaults

- Network: `NODE_PRIVACY`, `PUBLIC_HOST`, `HANDSHAKE_INTERVAL_SEC`, TLS verification, IPFS peering.
- DHT: `DHT_MIN_RECEIPTS=5`, `DHT_MIN_REACHABILITY=0.6`, `DHT_MEMBERSHIP_TTL=600`, `DHT_REPLICATION_TARGET=3`, `DHT_LEASE_TTL=600`, `DHT_HEARTBEAT_INTERVAL=60`, `DHT_HEARTBEAT_MISS_THRESHOLD=3`, `DHT_MIN_ASN=3`, `DHT_MIN_IP_OCTETS=3`, `DHT_METRIC_WINDOW_SEC=3600`.
- Conversion resources: `CONVERT_*` limits (CPU/mem), `MAX_CONTENT_SIZE_MB`.
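A sketch of carrying these defaults in one typed object, with hypothetical attribute names; only a few environment overrides are shown:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class DHTConfig:
    # defaults mirror the DHT_* values listed above
    min_receipts: int = 5
    min_reachability: float = 0.6
    membership_ttl_sec: int = 600
    replication_target: int = 3
    lease_ttl_sec: int = 600
    heartbeat_interval_sec: int = 60
    heartbeat_miss_threshold: int = 3
    min_asn: int = 3
    min_ip_octets: int = 3
    metric_window_sec: int = 3600

def load_dht_config() -> DHTConfig:
    # override a default from the environment, e.g. DHT_LEASE_TTL=300
    return DHTConfig(
        min_receipts=int(os.environ.get("DHT_MIN_RECEIPTS", 5)),
        min_reachability=float(os.environ.get("DHT_MIN_REACHABILITY", 0.6)),
        lease_ttl_sec=int(os.environ.get("DHT_LEASE_TTL", 600)),
        # remaining keys follow the same pattern
    )
```

Note that the defaults are internally consistent: `heartbeat_interval_sec * heartbeat_miss_threshold` gives the 180 s failover bound stated earlier.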
## Observability & Metrics
Prometheus (exported in-process):
- dht_replication_under / dht_replication_over / dht_leader_changes_total
- dht_merge_conflicts_total
- dht_view_count_total / dht_unique_view_estimate / dht_watch_time_seconds
Logs track replication conflict_log entries and HTTP structured errors (with session_id/error_id).
## Sequence Diagrams (Consolidated)

### Membership & N_estimate
```mermaid
sequenceDiagram
    participant A as Node A
    participant B as Node B
    A->>B: POST /network.handshake {nonce, ts, signature}
    B->>B: verify ts, nonce, signature
    B->>B: upsert member; store receipts
    B-->>A: {node, known_public_nodes, n_estimate, signature}
    A->>A: merge; recompute N_estimate = max(N_local, peers)
```
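The freshness and replay checks in the verify step can be sketched as below; `max_skew_sec` is a hypothetical bound, and the Ed25519 check is stubbed since it requires a third-party library (e.g. PyNaCl):

```python
import time

SEEN_NONCES: set = set()

def verify_ed25519(msg: dict) -> bool:
    # placeholder: verify msg["signature"] over the payload with the peer's key
    return True

def accept_handshake(msg: dict, max_skew_sec: int = 30) -> bool:
    # checks run before the signature verification
    if abs(time.time() - msg["ts"]) > max_skew_sec:
        return False  # stale timestamp
    if msg["nonce"] in SEEN_NONCES:
        return False  # replayed nonce
    SEEN_NONCES.add(msg["nonce"])
    return verify_ed25519(msg)
```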
### Replication Leader Election
```mermaid
sequenceDiagram
    participant L as Leader
    participant Peers as Responsible Nodes
    L->>L: compute p from N_estimate
    L->>Peers: rendezvous scores for ContentID
    L->>L: assign leases (diversity)
    Peers-->>L: heartbeat every 60s
    L->>L: reassign on 3 misses (≤180s)
```
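One way to sketch the diversity-aware lease assignment (the greedy strategy here is an assumption, not the confirmed implementation): walk candidates in rendezvous order, prefer nodes that add a new IP first octet and a new ASN, then relax the constraint if needed:

```python
def assign_leases(ranked, r_target: int = 3):
    # ranked: candidate dicts in rendezvous order, e.g. {"ip": ..., "asn": ...}
    chosen, octets, asns = [], set(), set()
    for node in ranked:
        octet = node["ip"].split(".")[0]
        if octet in octets or node["asn"] in asns:
            continue  # would not add diversity; revisit in the fallback pass
        chosen.append(node)
        octets.add(octet)
        asns.add(node["asn"])
        if len(chosen) == r_target:
            return chosen
    for node in ranked:  # fallback: relax diversity to still reach R_target
        if node not in chosen and len(chosen) < r_target:
            chosen.append(node)
    return chosen
```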
### Metrics Publication
```mermaid
sequenceDiagram
    participant C as Client
    participant API as Backend
    participant M as MetricsAggregator
    participant D as DHT
    C->>API: GET content.view?watch_time&bytes_out
    API->>M: record_view(delta)
    M->>D: merge MetricKey(ContentID, Window)
    M->>API: update gauges
```
## Run & Test

```bash
# Spin up services
docker compose -f /home/configs/docker-compose.yml --env-file /home/configs/.env up -d --build

# Backend unit tests (DHT integration)
cd uploader-bot
python3 -m unittest discover -s tests/dht
```
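A hypothetical example in the style of the tests discovered by the command above (the actual suites live under `tests/dht`):

```python
import unittest

class TestGCounterMerge(unittest.TestCase):
    """Illustrative CRDT merge test; not taken from the real suite."""

    @staticmethod
    def merge(a: dict, b: dict) -> dict:
        # per-node max, as used for G-Counter state
        return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

    def test_merge_is_commutative_and_idempotent(self):
        a, b = {"n1": 3}, {"n1": 1, "n2": 2}
        self.assertEqual(self.merge(a, b), self.merge(b, a))
        self.assertEqual(self.merge(a, a), a)
```

Run with `python3 -m unittest` from the directory containing the test module.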