Indexation

Stored content types

  • local/content_bin - binary content stored only locally (or the indexer has not found it on chain)
  • onchain/content - content stored onchain
  • onchain/content_unknown - content stored onchain, but we don't have a private key to decrypt it

A content item may have multiple types, for example local/content_bin and onchain/content.

However, the content cover, content metadata, and decrypted content are always stored locally.

Content Ownership Proof NFT Values Cell Deserialization

values:^[
    content_hash:uint256
    metadata:^[
        offchain?:int1 = always 1
        https://my-public-node-1.projscale.dev/*:bytes
    ]
    content:^[
        content_cid:^Cell = b58encoded CID
        cover_cid:^Cell = b58encoded CID
        metadata_cid:^Cell = b58encoded CID
    ]
]
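
As a reading aid, the layout above can be mirrored by a small in-memory record. This is only a sketch: the class name `OwnershipProofValues` and the field name `metadata_url` are hypothetical, and actual cell parsing (which needs a TON library) is deliberately left out.

```python
from dataclasses import dataclass

UINT256_MAX = (1 << 256) - 1


@dataclass(frozen=True)
class OwnershipProofValues:
    """Hypothetical in-memory view of the `values` cell after deserialization."""

    content_hash: int    # uint256
    metadata_url: bytes  # off-chain metadata URL bytes (the offchain flag is always 1)
    content_cid: str     # base58-encoded CID
    cover_cid: str       # base58-encoded CID
    metadata_cid: str    # base58-encoded CID

    def __post_init__(self) -> None:
        # Reject values that could not have come from a uint256 field.
        if not 0 <= self.content_hash <= UINT256_MAX:
            raise ValueError("content_hash must fit in uint256")
```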

Available content statuses

  • UPLOAD_TO_BTFS - content is stored locally and all content parts still have to be uploaded to BTFS. This status means that payment has already been received.

Upload content flow

  1. User uploads content to server (/api/v1/storage)
  2. User uploads content cover to server (/api/v1/storage)
  3. User sends /api/v1/blockchain.sendNewContentMessage to the server and accepts the transaction in the wallet
  4. The indexer receives the transaction, indexes the content, and sends a Telegram notification to the user.

Network Index & Sync (v3)

This document describes the simplified, production-ready stack for content discovery and sync:

  • Upload via tus → stream encrypt (ENCF v1, AES-256-GCM, 1 MiB chunks) → ipfs add --cid-version=1 --raw-leaves --chunker=size-1048576 --pin.
  • Public index exposes only encrypted sources (CID) and safe metadata; no plaintext ids.
  • Nodes full-sync by pinning encrypted CIDs; keys are auto-granted to trusted peers for preview/full access.

ENCF v1 (Encrypted Content Format)

Unencrypted header and framed body; same bytes on all nodes ⇒ stable CID.

Header (all big endian):

MAGIC(4):  'ENCF'
VER(1):    0x01
SCHEME(1): 0x03 = AES_GCM (0x01 AES_GCM_SIV legacy, 0x02 AES_SIV legacy)
CHUNK(4):  plaintext chunk bytes (1048576)
SALT_LEN(1)
SALT(N)
RESERVED(5): zeros
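
A minimal parsing sketch for the header layout above, using only the standard library. The names `EncfHeader` and `parse_encf_header` are hypothetical and not taken from the project's code; rejecting non-zero reserved bytes is a strictness choice made here, not something the format description mandates.

```python
import struct
from dataclasses import dataclass

MAGIC = b"ENCF"
SCHEMES = {0x01: "AES_GCM_SIV", 0x02: "AES_SIV", 0x03: "AES_GCM"}
RESERVED_LEN = 5

# MAGIC(4) VER(1) SCHEME(1) CHUNK(4) SALT_LEN(1), big endian, no padding.
_FIXED = struct.Struct(">4sBBIB")


@dataclass(frozen=True)
class EncfHeader:
    version: int
    scheme: int
    chunk_bytes: int
    salt: bytes
    size: int  # total header length, i.e. offset of the first body frame


def parse_encf_header(buf: bytes) -> EncfHeader:
    if len(buf) < _FIXED.size:
        raise ValueError("truncated header")
    magic, version, scheme, chunk_bytes, salt_len = _FIXED.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if version != 0x01:
        raise ValueError(f"unsupported version {version}")
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme {scheme:#04x}")
    salt_end = _FIXED.size + salt_len
    end = salt_end + RESERVED_LEN
    if len(buf) < end:
        raise ValueError("truncated salt or reserved bytes")
    if buf[salt_end:end] != b"\x00" * RESERVED_LEN:
        raise ValueError("reserved bytes must be zero")
    return EncfHeader(version, scheme, chunk_bytes, buf[_FIXED.size:salt_end], end)
```

With a 16-byte salt the header is 11 + 16 + 5 = 32 bytes long, so the first frame starts at offset 32.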

Body: repeated frames [p_len:4][cipher][tag(16)] where p_len <= CHUNK for last frame.
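
The frame layout can be walked with a short generator. This sketch assumes that the ciphertext of a frame is exactly p_len bytes long (the AEAD adds only the 16-byte tag) and that p_len is a big-endian 32-bit integer like the header fields; `iter_frames` is a hypothetical helper name.

```python
import struct
from typing import Iterator, Tuple

TAG_LEN = 16


def iter_frames(body: bytes, chunk_bytes: int) -> Iterator[Tuple[int, bytes, bytes]]:
    """Yield (frame_idx, ciphertext, tag) for every frame in an ENCF body."""
    offset = 0
    idx = 0
    while offset < len(body):
        if offset + 4 > len(body):
            raise ValueError("truncated frame length")
        (p_len,) = struct.unpack_from(">I", body, offset)
        if p_len > chunk_bytes:
            raise ValueError("frame larger than CHUNK")
        start = offset + 4
        end = start + p_len + TAG_LEN
        if end > len(body):
            raise ValueError("truncated frame")
        yield idx, body[start:start + p_len], body[start + p_len:end]
        offset = end
        idx += 1
```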

AES-GCM (scheme 0x03) encrypts each frame with deterministic nonce = HMAC_SHA256(salt, u64(frame_idx))[:12]. Legacy scheme 0x01 keeps AES-GCM-SIV with the same nonce derivation.
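
The nonce derivation can be sketched with the standard library alone. Two details are assumptions rather than facts from this document: the salt is used as the HMAC key, and u64(frame_idx) is an 8-byte big-endian integer (matching the big-endian header). `frame_nonce` is a hypothetical helper name.

```python
import hashlib
import hmac
import struct


def frame_nonce(salt: bytes, frame_idx: int) -> bytes:
    """Derive the 12-byte per-frame nonce: HMAC_SHA256(salt, u64(frame_idx))[:12]."""
    msg = struct.pack(">Q", frame_idx)  # assumed: unsigned 64-bit, big endian
    return hmac.new(salt, msg, hashlib.sha256).digest()[:12]
```

Because the nonce depends only on the salt and the frame index, every node that encrypts the same plaintext with the same key and salt produces identical bytes, which is what keeps the CID stable.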

For new uploads (v2025-09), the pipeline defaults to AES-256-GCM. Legacy AES-GCM-SIV/AES-SIV content is still readable: the decoder auto-detects the scheme byte.

Local encryption/decryption helpers

python -m app.core.crypto.cli encrypt --input demo.wav --output demo.encf \
  --key AAAAEyHSVws5O8JGrg3kUSVtk5dQSc5x5e7jh0S2WGE= --salt-bytes 16

python -m app.core.crypto.cli decrypt --input demo.encf --output demo.wav \
  --wrapped-key <ContentKey.key_ciphertext_b64>

Because we use standard AES-GCM, you can also re-hydrate frames manually with tools like openssl aes-256-gcm. The header exposes chunk_bytes and salt; derive the per-frame nonce via HMAC_SHA256(salt, idx), where idx is the frame number (0-based), and feed the 12-byte prefix as the IV.

API

  • GET /api/v1/content.index → { items:[...], schema, ETag } with signed items.
  • GET /api/v1/content.delta?since=ISO8601 → { items:[...], next_since, schema } with ETag.
  • POST /api/v1/sync.pin (NodeSig required) → queue/pin CID.
  • POST /api/v1/keys.request (NodeSig required) → sealed DEK for trusted peers.
  • GET /api/v1/content.derivatives?cid= → local ready derivatives (low/high/preview).
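
As an illustration of incremental polling, the sketch below builds a content.delta request. Sending the previously seen ETag in an If-None-Match header is an assumption based on standard HTTP conditional requests; this document only states that the endpoint works "with ETag". The helper name and the base URL in the usage note are hypothetical.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_delta_request(
    base_url: str, since: str, etag: Optional[str] = None
) -> urllib.request.Request:
    """Build GET /api/v1/content.delta?since=... with an optional conditional header."""
    query = urllib.parse.urlencode({"since": since})
    url = f"{base_url.rstrip('/')}/api/v1/content.delta?{query}"
    req = urllib.request.Request(url, method="GET")
    if etag:
        # Assumed: the server honours standard conditional requests.
        req.add_header("If-None-Match", etag)
    return req
```

A caller would pass the next_since value from the previous response as since, for example build_delta_request("https://node.example", next_since, last_etag), and then hand the request to urllib.request.urlopen.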

NodeSig

Canonical string:

METHOD\nPATH\nSHA256(body)\nTS\nNONCE\nNODE_ID
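
A sketch of how the canonical string could be assembled before signing. The encoding details are assumptions: SHA256(body) is taken as a lowercase hex digest and TS as integer Unix seconds. The signature algorithm is not specified in this document, so signing itself is omitted; `canonical_string` is a hypothetical helper name.

```python
import hashlib


def canonical_string(
    method: str, path: str, body: bytes, ts: int, nonce: str, node_id: str
) -> str:
    """Join the NodeSig fields with newlines in the documented order."""
    body_hash = hashlib.sha256(body).hexdigest()  # assumed: lowercase hex
    return "\n".join([method.upper(), path, body_hash, str(ts), nonce, node_id])
```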

Headers: X-Node-Id, X-Node-Ts, X-Node-Nonce, X-Node-Sig. Window ±120s, nonce cache ~10min; replay → 401.
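
On the receiving side, the time window and nonce cache could look roughly like this. It is a sketch under assumptions: X-Node-Ts is Unix seconds, the cache is keyed by (node id, nonce), and entries expire after 600 s. `ReplayGuard` is a hypothetical name; a False result is where the handler would answer 401.

```python
import time
from typing import Dict, Optional, Tuple

WINDOW_SEC = 120      # accepted clock skew, in seconds
NONCE_TTL_SEC = 600   # roughly 10 minutes


class ReplayGuard:
    def __init__(self) -> None:
        # (node_id, nonce) -> expiry time
        self._seen: Dict[Tuple[str, str], float] = {}

    def check(self, node_id: str, ts: int, nonce: str, now: Optional[float] = None) -> bool:
        """Return True if the request is fresh and the nonce has not been seen."""
        now = time.time() if now is None else now
        # Drop expired nonces so the cache does not grow without bound.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        if abs(now - ts) > WINDOW_SEC:
            return False
        key = (node_id, nonce)
        if key in self._seen:
            return False  # replay
        self._seen[key] = now + NONCE_TTL_SEC
        return True
```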

Sync daemon

  • Jitter 0-30 s per peer; uses ETag/since.
  • Disk watermark (SYNC_DISK_LOW_WATERMARK_PCT) stops pin burst.
  • Pins run concurrently (up to SYNC_MAX_CONCURRENT_PINS), with a findprovs and swarm/connect step before pinning.
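
The three points above can be sketched with asyncio. All function names here are hypothetical. The watermark semantics are an assumption: pinning is allowed only while the percentage of free disk space stays above SYNC_DISK_LOW_WATERMARK_PCT. The actual pin call (including any findprovs or swarm/connect step) is passed in as a coroutine and not implemented.

```python
import asyncio
import random
import shutil
from typing import Awaitable, Callable, Iterable, List


def peer_jitter(max_sec: float = 30.0) -> float:
    """Random per-peer delay in seconds, in the range 0 to max_sec."""
    return random.uniform(0.0, max_sec)


def disk_has_room(path: str, low_watermark_pct: float) -> bool:
    """Assumed semantics: keep pinning only while free space % is above the watermark."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total > low_watermark_pct


async def pin_batch(
    cids: Iterable[str],
    pin: Callable[[str], Awaitable[None]],
    max_concurrent: int,
    has_room: Callable[[], bool],
) -> List[str]:
    """Pin CIDs with bounded concurrency, skipping work once the disk is low."""
    sem = asyncio.Semaphore(max_concurrent)
    pinned: List[str] = []

    async def worker(cid: str) -> None:
        async with sem:
            if not has_room():
                return  # watermark reached: stop the burst
            await pin(cid)
            pinned.append(cid)

    await asyncio.gather(*(worker(c) for c in cids))
    return pinned
```

A daemon loop would sleep for peer_jitter() before contacting each peer, then call pin_batch with max_concurrent taken from SYNC_MAX_CONCURRENT_PINS.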

Keys policy

KEY_AUTO_GRANT_TRUSTED_ONLY=1: only nodes with KnownNode.meta.role=='trusted' get the DEK automatically. The preview lease TTL is set via KEY_GRANT_PREVIEW_TTL_SEC.