## Indexation
### Stored content types
- `local/content_bin` - binary content stored only locally (or the indexer has not found it on chain)
- `onchain/content` - content stored onchain
- `onchain/content_unknown` - content stored onchain, but we don't have a private key to decrypt it
A content item may have multiple types, for example `local/content_bin` and `onchain/content`.
However, the `content cover`, `content metadata` and `decrypted content` are always stored locally.
### Content Ownership Proof NFT Values Cell Deserialization
```text
values:^[
  content_hash:uint256
  metadata:^[
    offchain?:int1 = always 1
    https://my-public-node-1.projscale.dev/*:bytes
  ]
  content:^[
    content_cid:^Cell = b58encoded CID
    cover_cid:^Cell = b58encoded CID
    metadata_cid:^Cell = b58encoded CID
  ]
]
```
### Available content statuses
- `UPLOAD_TO_BTFS` - content is stored locally and all content parts are to be uploaded to BTFS. This status means that payment has already been received.
### Upload content flow
1. User uploads the content to the server (`/api/v1/storage`).
2. User uploads the content cover to the server (`/api/v1/storage`).
3. User sends `/api/v1/blockchain.sendNewContentMessage` to the server and confirms the transaction in their wallet.
4. The indexer receives the transaction, indexes the content, and sends a Telegram notification to the user.
# Network Index & Sync (v3)
This document describes the simplified, production-ready stack for content discovery and sync:
- Upload via tus → stream encrypt (ENCF v1, AES-256-GCM, 1 MiB chunks) → `ipfs add --cid-version=1 --raw-leaves --chunker=size-1048576 --pin`.
- Public index exposes only encrypted sources (CID) and safe metadata; no plaintext ids.
- Nodes full-sync by pinning encrypted CIDs; keys are auto-granted to trusted peers for preview/full access.
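For reference, the `ipfs add` step of the pipeline can be driven from Python via `subprocess`. This is a minimal sketch: the flags are the ones listed above, `--quieter` (print only the final CID) is an addition for easier capture, and the helper names `ipfs_add_argv`/`ipfs_add` are hypothetical. Actually running `ipfs_add` requires a local `ipfs` CLI with an initialized repo.

```python
import shlex
import subprocess
from typing import List

# Flags taken from the pipeline description: CIDv1, raw leaves,
# fixed 1 MiB chunks (same size as an ENCF plaintext chunk), pin on add.
IPFS_ADD_FLAGS = [
    "--cid-version=1",
    "--raw-leaves",
    "--chunker=size-1048576",
    "--pin",
]


def ipfs_add_argv(encf_path: str, quieter: bool = True) -> List[str]:
    """Build the `ipfs add` command line for an already-encrypted ENCF file."""
    argv = ["ipfs", "add", *IPFS_ADD_FLAGS]
    if quieter:
        # --quieter makes `ipfs add` print only the final CID.
        argv.append("--quieter")
    argv.append(encf_path)
    return argv


def ipfs_add(encf_path: str) -> str:
    """Run `ipfs add` and return the CID it prints (needs a local ipfs CLI)."""
    result = subprocess.run(
        ipfs_add_argv(encf_path), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(ipfs_add_argv("demo.encf")))
```

Using a fixed 1 MiB chunker that matches the ENCF chunk size keeps the DAG layout deterministic, which is what lets every node arrive at the same CID for the same encrypted bytes.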
## ENCF v1 (Encrypted Content Format)
Unencrypted header and framed body; same bytes on all nodes ⇒ stable CID.
Header (all fields big-endian):
```
MAGIC(4): 'ENCF'
VER(1): 0x01
SCHEME(1): 0x03 = AES_GCM (0x01 AES_GCM_SIV legacy, 0x02 AES_SIV legacy)
CHUNK(4): plaintext chunk bytes (1048576)
SALT_LEN(1)
SALT(N)
RESERVED(5): zeros
```
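The header layout above can be serialized and parsed with Python's `struct` module. This is a minimal sketch: field order and sizes come from the layout, while `build_header`/`parse_header` are hypothetical names, not functions of `app.core.crypto`.

```python
import struct
from typing import Dict, Tuple

MAGIC = b"ENCF"
VERSION = 0x01
SCHEME_AES_GCM = 0x03
DEFAULT_CHUNK = 1048576             # plaintext bytes per frame (1 MiB)
RESERVED = bytes(5)                 # RESERVED(5): zeros
FIXED = ">4sBBIB"                   # MAGIC, VER, SCHEME, CHUNK, SALT_LEN (big-endian)
FIXED_LEN = struct.calcsize(FIXED)  # 11 bytes


def build_header(salt: bytes, scheme: int = SCHEME_AES_GCM,
                 chunk: int = DEFAULT_CHUNK) -> bytes:
    """Serialize an ENCF v1 header."""
    if len(salt) > 255:
        raise ValueError("SALT_LEN is a single byte")
    return struct.pack(FIXED, MAGIC, VERSION, scheme, chunk, len(salt)) + salt + RESERVED


def parse_header(data: bytes) -> Tuple[Dict[str, object], int]:
    """Parse an ENCF v1 header; return (fields, offset of the first body byte)."""
    if len(data) < FIXED_LEN:
        raise ValueError("truncated header")
    magic, ver, scheme, chunk, salt_len = struct.unpack_from(FIXED, data, 0)
    if magic != MAGIC or ver != VERSION:
        raise ValueError("not an ENCF v1 stream")
    salt_end = FIXED_LEN + salt_len
    body_start = salt_end + len(RESERVED)
    if len(data) < body_start:
        raise ValueError("truncated header")
    if data[salt_end:body_start] != RESERVED:
        raise ValueError("RESERVED bytes must be zero")
    fields = {"scheme": scheme, "chunk_bytes": chunk, "salt": data[FIXED_LEN:salt_end]}
    return fields, body_start
```

With a 16-byte salt the header is 11 + 16 + 5 = 32 bytes, so the first frame starts at offset 32.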
Body: repeated frames `[p_len:4][cipher][tag(16)]`. Every frame except the last carries a full chunk (`p_len == CHUNK`); only the last frame may be shorter (`p_len <= CHUNK`).
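The frame layout can be walked with a small parser. This sketch assumes `p_len` is an unsigned 32-bit big-endian integer like the header fields, and relies on the fact that AES-GCM ciphertext is exactly as long as its plaintext, so `cipher` occupies `p_len` bytes; `iter_frames` is a hypothetical helper.

```python
import struct
from typing import Iterator, Tuple

TAG_LEN = 16  # tag(16)


def iter_frames(body: bytes, chunk_bytes: int) -> Iterator[Tuple[int, bytes, bytes]]:
    """Yield (p_len, ciphertext, tag) for each [p_len:4][cipher][tag(16)] frame."""
    pos = 0
    saw_short = False
    while pos < len(body):
        if saw_short:
            raise ValueError("only the last frame may be shorter than CHUNK")
        if pos + 4 > len(body):
            raise ValueError("truncated frame length")
        (p_len,) = struct.unpack_from(">I", body, pos)  # assumed big-endian u32
        if p_len > chunk_bytes:
            raise ValueError("frame larger than CHUNK")
        pos += 4
        # AES-GCM ciphertext is exactly as long as the plaintext it encrypts.
        end = pos + p_len + TAG_LEN
        if end > len(body):
            raise ValueError("truncated frame body")
        yield p_len, body[pos:pos + p_len], body[pos + p_len:end]
        saw_short = p_len < chunk_bytes
        pos = end
```

Rejecting a short frame that is not the last one catches truncated or spliced streams before any decryption is attempted.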
AES-GCM (scheme `0x03`) encrypts each frame with a deterministic nonce, `nonce = HMAC_SHA256(salt, u64(frame_idx))[:12]`. Legacy scheme `0x01` uses AES-GCM-SIV with the same nonce derivation.
For new uploads (v2025-09), the pipeline defaults to AES-256-GCM. Legacy AES-GCM-SIV/AES-SIV content is still readable: the decoder auto-detects the scheme byte.
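Nonce derivation and scheme detection need only the standard library. In this sketch the big-endian encoding of `u64(frame_idx)` is an assumption (the header is big-endian, but the nonce input encoding is not spelled out), and `frame_nonce`/`detect_scheme` are illustrative names.

```python
import hashlib
import hmac
import struct

# SCHEME byte values from the ENCF v1 header description.
SCHEMES = {
    0x01: "AES-GCM-SIV (legacy)",
    0x02: "AES-SIV (legacy)",
    0x03: "AES-256-GCM",
}


def detect_scheme(scheme_byte: int) -> str:
    """Map the header's SCHEME byte to a cipher name; reject unknown values."""
    if scheme_byte not in SCHEMES:
        raise ValueError(f"unsupported ENCF scheme 0x{scheme_byte:02x}")
    return SCHEMES[scheme_byte]


def frame_nonce(salt: bytes, frame_idx: int) -> bytes:
    """nonce = HMAC_SHA256(salt, u64(frame_idx))[:12].

    Assumption: u64 is encoded as 8 big-endian bytes.
    """
    msg = struct.pack(">Q", frame_idx)
    return hmac.new(salt, msg, hashlib.sha256).digest()[:12]
```

To decrypt a `0x03` frame, the nonce and `ciphertext + tag` would then go to an AES-GCM implementation, for example `AESGCM(key).decrypt(nonce, ciphertext + tag, None)` from the `cryptography` package. Whether ENCF binds any associated data is not stated here, so passing `None` is an assumption.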
### Local encryption/decryption helpers
```
python -m app.core.crypto.cli encrypt --input demo.wav --output demo.encf \
--key AAAAEyHSVws5O8JGrg3kUSVtk5dQSc5x5e7jh0S2WGE= --salt-bytes 16
python -m app.core.crypto.cli decrypt --input demo.encf --output demo.wav \
--wrapped-key <ContentKey.key_ciphertext_b64>
```
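Because the scheme is standard AES-GCM, frames can also be decrypted manually with any AES-GCM implementation. Note that the `openssl enc` command-line utility does not support AEAD modes such as GCM, so a crypto library is the practical route. The header exposes `chunk_bytes` and the salt; derive each frame's nonce as `HMAC_SHA256(salt, u64(idx))`, where `idx` is the 0-based frame number, and use the first 12 bytes as the IV.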
## API
- `GET /api/v1/content.index` → `{ items:[...], schema, ETag }` with signed items.
- `GET /api/v1/content.delta?since=ISO8601` → `{ items:[...], next_since, schema }` with ETag.
- `POST /api/v1/sync.pin` (NodeSig required) → queue/pin CID.
- `POST /api/v1/keys.request` (NodeSig required) → sealed DEK for trusted peers.
- `GET /api/v1/content.derivatives?cid=` → local ready derivatives (low/high/preview).
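As an illustration of incremental sync against `content.delta`, the helper below builds a conditional request with `urllib`. It assumes the server honours the standard `If-None-Match` header for the ETag it returns and answers `304 Not Modified` when nothing has changed; the base URL and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional, Tuple
from urllib.error import HTTPError


def build_delta_request(base_url: str, since: str,
                        etag: Optional[str] = None) -> urllib.request.Request:
    """Build GET /api/v1/content.delta?since=<ISO8601>, conditional on a cached ETag."""
    query = urllib.parse.urlencode({"since": since})
    url = f"{base_url.rstrip('/')}/api/v1/content.delta?{query}"
    headers = {"Accept": "application/json"}
    if etag:
        headers["If-None-Match"] = etag  # standard HTTP revalidation header
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_delta(base_url: str, since: str,
                etag: Optional[str] = None) -> Tuple[Optional[Any], Optional[str]]:
    """Return (payload, etag); payload is None when the server answers 304."""
    req = build_delta_request(base_url, since, etag)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.load(resp), resp.headers.get("ETag")
    except HTTPError as err:
        if err.code == 304:  # nothing changed since the last poll
            return None, etag
        raise
```

A poller would persist `next_since` from the payload and the returned ETag between calls, then pass both back on the next poll.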
## NodeSig
Canonical string:
```
METHOD\nPATH\nSHA256(body)\nTS\nNONCE\nNODE_ID
```
Headers: `X-Node-Id`, `X-Node-Ts`, `X-Node-Nonce`, `X-Node-Sig`.
Timestamps must fall within ±120 s of server time; nonces are cached for ~10 min; a replayed request is rejected with 401.
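A sketch of building the canonical string, the NodeSig headers, and the receiver-side replay checks. The signature algorithm behind `X-Node-Sig` is not specified here, so signing is delegated to a caller-supplied `sign` function. Rendering `SHA256(body)` as lowercase hex and upper-casing the method are assumptions, and `ReplayGuard` is a hypothetical in-memory cache (several workers would need a shared store).

```python
import hashlib
import secrets
import time
from typing import Callable, Dict, Optional, Tuple

WINDOW_SEC = 120        # accepted clock skew
NONCE_TTL_SEC = 600     # "~10min" nonce cache


def canonical_string(method: str, path: str, body: bytes,
                     ts: int, nonce: str, node_id: str) -> bytes:
    """METHOD\\nPATH\\nSHA256(body)\\nTS\\nNONCE\\nNODE_ID as UTF-8 bytes."""
    body_hash = hashlib.sha256(body).hexdigest()  # assumed lowercase hex
    parts = [method.upper(), path, body_hash, str(ts), nonce, node_id]
    return "\n".join(parts).encode("utf-8")


def nodesig_headers(method: str, path: str, body: bytes, node_id: str,
                    sign: Callable[[bytes], str],
                    ts: Optional[int] = None, nonce: Optional[str] = None) -> Dict[str, str]:
    """Build the four NodeSig headers; `sign` wraps the node's private key."""
    ts = int(time.time()) if ts is None else ts
    nonce = secrets.token_hex(16) if nonce is None else nonce
    message = canonical_string(method, path, body, ts, nonce, node_id)
    return {
        "X-Node-Id": node_id,
        "X-Node-Ts": str(ts),
        "X-Node-Nonce": nonce,
        "X-Node-Sig": sign(message),
    }


class ReplayGuard:
    """In-memory check of the ±120 s window and the nonce cache."""

    def __init__(self) -> None:
        self._seen: Dict[Tuple[str, str], float] = {}

    def accept(self, node_id: str, ts: int, nonce: str,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if abs(now - ts) > WINDOW_SEC:
            return False  # stale or future timestamp -> 401
        # Forget nonces older than the cache TTL.
        self._seen = {k: t for k, t in self._seen.items() if now - t < NONCE_TTL_SEC}
        key = (node_id, nonce)
        if key in self._seen:
            return False  # replay -> 401
        self._seen[key] = now
        return True
```

Keeping nonces for longer than the full ±120 s window (240 s) means a captured request cannot be replayed while its timestamp is still acceptable.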
## Sync daemon
- Jitter of 0–30 s per peer; uses ETag/`since` for incremental fetches.
- Disk watermark (`SYNC_DISK_LOW_WATERMARK_PCT`) stops pin bursts.
- Pins run concurrently (up to `SYNC_MAX_CONCURRENT_PINS`), each preceded by `findprovs` → `swarm/connect`.
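The daemon behaviour above can be approximated with `asyncio`. This sketch assumes `SYNC_DISK_LOW_WATERMARK_PCT` is a minimum percentage of free disk (its exact semantics are not stated here), uses hypothetical default values, and leaves the `pin` coroutine (which would run `findprovs`, `swarm/connect` and the actual pin) to the caller.

```python
import asyncio
import random
import shutil
from typing import Awaitable, Callable, Iterable, List

SYNC_MAX_CONCURRENT_PINS = 4         # hypothetical default
SYNC_DISK_LOW_WATERMARK_PCT = 10.0   # hypothetical: minimum % of disk left free


def disk_free_pct(path: str = "/") -> float:
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


async def pin_burst(cids: Iterable[str],
                    pin: Callable[[str], Awaitable[None]],
                    max_concurrent: int = SYNC_MAX_CONCURRENT_PINS,
                    free_pct: Callable[[], float] = disk_free_pct,
                    low_watermark_pct: float = SYNC_DISK_LOW_WATERMARK_PCT) -> List[str]:
    """Pin CIDs with bounded concurrency; skip the rest once disk runs low."""
    sem = asyncio.Semaphore(max_concurrent)
    pinned: List[str] = []

    async def worker(cid: str) -> None:
        async with sem:
            if free_pct() < low_watermark_pct:
                return  # watermark reached: stop this burst
            await pin(cid)  # caller's coroutine: findprovs, swarm/connect, pin
            pinned.append(cid)

    await asyncio.gather(*(worker(cid) for cid in cids))
    return pinned


async def sync_peer(run_sync: Callable[[], Awaitable[None]],
                    max_jitter_sec: float = 30.0) -> None:
    """Wait a random 0..max_jitter_sec before syncing one peer."""
    await asyncio.sleep(random.uniform(0.0, max_jitter_sec))
    await run_sync()
```

Checking the watermark inside the semaphore means the disk is re-sampled just before each pin starts, so a burst stops soon after free space crosses the threshold rather than only at the start.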
## Keys policy
`KEY_AUTO_GRANT_TRUSTED_ONLY=1`: only nodes with `KnownNode.meta.role == 'trusted'` receive the DEK automatically. The preview lease TTL is set via `KEY_GRANT_PREVIEW_TTL_SEC`.
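The grant rule can be expressed as a small predicate. In this sketch `KnownNode` is a hypothetical stand-in for the real model, the default flag value of `1` and the 3600 s preview TTL are invented for illustration, and returning `False` when the flag is off is a conservative assumption, not documented behaviour.

```python
import os
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class KnownNode:
    """Hypothetical stand-in for the KnownNode model; only meta['role'] is read."""
    node_id: str
    meta: Dict[str, Any] = field(default_factory=dict)


def auto_grant_allowed(node: KnownNode) -> bool:
    """Apply the KEY_AUTO_GRANT_TRUSTED_ONLY=1 rule to a requesting peer."""
    if os.environ.get("KEY_AUTO_GRANT_TRUSTED_ONLY", "1") != "1":
        # Behaviour with the flag off is not described; decline (assumption).
        return False
    return node.meta.get("role") == "trusted"


def preview_lease_expires_at(now: Optional[float] = None) -> float:
    """Expiry time of a preview key lease, driven by KEY_GRANT_PREVIEW_TTL_SEC."""
    ttl = int(os.environ.get("KEY_GRANT_PREVIEW_TTL_SEC", "3600"))  # 3600 is hypothetical
    return (time.time() if now is None else now) + ttl
```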