## Indexation
### Stored content types
- `local/content_bin` - binary content stored only locally (or the indexer has not found it on chain)
- `onchain/content` - content stored onchain
- `onchain/content_unknown` - content stored onchain, but we don't have a private key to decrypt it

A content item may have multiple types, for example `local/content_bin` and `onchain/content`.
However, `content cover`, `content metadata` and `decrypted content` are always stored locally.
### Content Ownership Proof NFT Values Cell Deserialization
```text
values:^[
  content_hash:uint256
  metadata:^[
    offchain?:int1 = always 1
    https://my-public-node-1.projscale.dev/*:bytes
  ]
  content:^[
    content_cid:^Cell = b58encoded CID
    cover_cid:^Cell = b58encoded CID
    metadata_cid:^Cell = b58encoded CID
  ]
]
```
### Available content statuses
- `UPLOAD_TO_BTFS` - content is stored locally and all content parts must be uploaded to BTFS. This status means that payment has already been received.
### Upload content flow
1. The user uploads the content to the server (`/api/v1/storage`).
2. The user uploads the content cover to the server (`/api/v1/storage`).
3. The user sends `/api/v1/blockchain.sendNewContentMessage` to the server and accepts the transaction in the wallet.
4. The indexer receives the transaction, indexes the content, and sends a Telegram notification to the user.
# Network Index & Sync (v3)
This document describes the simplified, production-ready stack for content discovery and sync:
- Upload via tus → stream encrypt (ENCF v1, AES-GCM-SIV, 1 MiB chunks) → `ipfs add --cid-version=1 --raw-leaves --chunker=size-1048576 --pin`.
- Public index exposes only encrypted sources (CID) and safe metadata; no plaintext ids.
- Nodes full-sync by pinning encrypted CIDs; keys are auto-granted to trusted peers for preview/full access.
## ENCF v1 (Encrypted Content Format)
Unencrypted header and framed body; same bytes on all nodes ⇒ stable CID.
Header (all big endian):
```
MAGIC(4): 'ENCF'
VER(1): 0x01
SCHEME(1): 0x01 = AES_GCM_SIV (0x02 AES_SIV legacy)
CHUNK(4): plaintext chunk bytes (1048576)
SALT_LEN(1)
SALT(N)
RESERVED(5): zeros
```
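The header layout above can be sketched with Python's `struct` module. This is a minimal illustration rather than the project's implementation: the names `EncfHeader` and `parse_header` are hypothetical, and the code assumes the fields are packed back to back with no padding, as the listing suggests.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

MAGIC = b"ENCF"
# Fixed-size prefix: MAGIC(4) VER(1) SCHEME(1) CHUNK(4) SALT_LEN(1), big-endian.
_PREFIX = struct.Struct(">4sBBIB")
RESERVED_LEN = 5  # trailing zero bytes after the salt


@dataclass(frozen=True)
class EncfHeader:
    version: int
    scheme: int
    chunk_size: int
    salt: bytes

    def pack(self) -> bytes:
        """Serialize the header: prefix, salt, then reserved zeros."""
        if len(self.salt) > 255:
            raise ValueError("salt must fit in a one-byte length field")
        prefix = _PREFIX.pack(
            MAGIC, self.version, self.scheme, self.chunk_size, len(self.salt)
        )
        return prefix + self.salt + bytes(RESERVED_LEN)


def parse_header(data: bytes) -> Tuple[EncfHeader, int]:
    """Return the parsed header and the offset at which the body starts."""
    if len(data) < _PREFIX.size:
        raise ValueError("truncated header")
    magic, version, scheme, chunk_size, salt_len = _PREFIX.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    body_offset = _PREFIX.size + salt_len + RESERVED_LEN
    if len(data) < body_offset:
        raise ValueError("truncated header")
    salt = data[_PREFIX.size:_PREFIX.size + salt_len]
    return EncfHeader(version, scheme, chunk_size, salt), body_offset
```

With a 16-byte salt the header is 11 + 16 + 5 = 32 bytes, so the first frame would start at offset 32.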
Body: repeated frames `[p_len:4][cipher][tag(16)]`, where `p_len == CHUNK` for every frame except the last, for which `p_len <= CHUNK`.
AES-GCM-SIV is applied per frame with a deterministic `nonce = HMAC_SHA256(salt, u64(frame_idx))[:12]`; AAD is unused.
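The nonce derivation and frame layout can be illustrated with the standard library alone. The sketch below makes several assumptions that the text does not state: the salt is used as the HMAC key, `u64(frame_idx)` and `p_len` are big-endian like the header, and the ciphertext of a frame has the same length as its plaintext (`p_len`). The function names are hypothetical, and the AES-GCM-SIV decryption step itself is left out.

```python
import hashlib
import hmac
import struct
from typing import Iterator, Tuple

TAG_LEN = 16
NONCE_LEN = 12


def frame_nonce(salt: bytes, frame_idx: int) -> bytes:
    """nonce = HMAC_SHA256(salt, u64(frame_idx))[:12]."""
    # Assumption: salt is the HMAC key and the index is a big-endian u64.
    msg = struct.pack(">Q", frame_idx)
    return hmac.new(salt, msg, hashlib.sha256).digest()[:NONCE_LEN]


def iter_frames(body: bytes) -> Iterator[Tuple[int, bytes, bytes]]:
    """Yield (frame_idx, ciphertext, tag) for each [p_len:4][cipher][tag(16)] frame."""
    offset = 0
    idx = 0
    while offset < len(body):
        if offset + 4 > len(body):
            raise ValueError("truncated frame length")
        (p_len,) = struct.unpack_from(">I", body, offset)
        offset += 4
        # Assumption: ciphertext length equals the plaintext length p_len.
        end = offset + p_len + TAG_LEN
        if end > len(body):
            raise ValueError("truncated frame")
        yield idx, body[offset:offset + p_len], body[offset + p_len:end]
        offset = end
        idx += 1
```

Because the nonce depends only on the salt and the frame index, every node that encrypts the same plaintext with the same key and salt produces identical bytes, which is what keeps the CID stable.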
## API
- `GET /api/v1/content.index` → `{ items:[...], schema, ETag }` with signed items.
- `GET /api/v1/content.delta?since=ISO8601` → `{ items:[...], next_since, schema }` with ETag.
- `POST /api/v1/sync.pin` (NodeSig required) → queue/pin CID.
- `POST /api/v1/keys.request` (NodeSig required) → sealed DEK for trusted peers.
- `GET /api/v1/content.derivatives?cid=` → local ready derivatives (low/high/preview).
## NodeSig
Canonical string:
```
METHOD\nPATH\nSHA256(body)\nTS\nNONCE\nNODE_ID
```
Headers: `X-Node-Id`, `X-Node-Ts`, `X-Node-Nonce`, `X-Node-Sig`.
Window ±120s, nonce cache ~10min; replay → 401.
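A rough sketch of the canonical string and the replay checks follows. The signature scheme is not stated here, so verification of `X-Node-Sig` is omitted. Rendering `SHA256(body)` as lowercase hex, treating `TS` as Unix seconds, and keying the nonce cache by `(node_id, nonce)` are all assumptions, and `ReplayGuard` is a hypothetical in-memory helper.

```python
import hashlib
import time
from typing import Dict, Optional, Tuple

WINDOW_SEC = 120     # accepted clock skew, ±120 s
NONCE_TTL_SEC = 600  # nonce cache lifetime, ~10 min


def canonical_string(
    method: str, path: str, body: bytes, ts: int, nonce: str, node_id: str
) -> str:
    """Build METHOD\\nPATH\\nSHA256(body)\\nTS\\nNONCE\\nNODE_ID."""
    # Assumption: the body hash is lowercase hex and TS is Unix seconds.
    body_hash = hashlib.sha256(body).hexdigest()
    return "\n".join([method.upper(), path, body_hash, str(ts), nonce, node_id])


class ReplayGuard:
    """Hypothetical in-memory check for the time window and nonce reuse."""

    def __init__(self) -> None:
        self._seen: Dict[Tuple[str, str], float] = {}

    def check(
        self, node_id: str, ts: int, nonce: str, now: Optional[float] = None
    ) -> bool:
        """Return True if the request is fresh; False should map to HTTP 401."""
        now = time.time() if now is None else now
        # Drop expired nonces so the cache stays bounded.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        if abs(now - ts) > WINDOW_SEC:
            return False
        key = (node_id, nonce)
        if key in self._seen:
            return False
        self._seen[key] = now + NONCE_TTL_SEC
        return True
```

A server would call `ReplayGuard.check` only after the signature over `canonical_string(...)` has been verified, so that unauthenticated requests cannot fill the nonce cache.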
## Sync daemon
- Jitter 0-30 s per peer; uses ETag/`since`.
- Disk watermark (`SYNC_DISK_LOW_WATERMARK_PCT`) stops pin burst.
- Pinned concurrently (`SYNC_MAX_CONCURRENT_PINS`) with pre-pin `findprovs` and `swarm/connect`.
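
The daemon behaviour listed above might look roughly like the following `asyncio` sketch. It is illustrative only: `pin_batch`, `peer_jitter` and `disk_free_pct` are hypothetical names, the actual pin call is injected as `pin_one`, and the watermark is assumed to be a minimum percentage of free disk space below which no new pins are started.

```python
import asyncio
import random
import shutil
from typing import Awaitable, Callable, Iterable, List


def disk_free_pct(path: str = "/") -> float:
    """Free disk space as a percentage of the total."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def peer_jitter(max_sec: float = 30.0) -> float:
    """Random delay in [0, max_sec] to apply before polling a peer."""
    return random.uniform(0.0, max_sec)


async def pin_batch(
    cids: Iterable[str],
    pin_one: Callable[[str], Awaitable[None]],
    max_concurrent: int,
    low_watermark_pct: float,
    free_pct: Callable[[], float] = disk_free_pct,
) -> List[str]:
    """Pin CIDs with bounded concurrency; skip new pins once free space is low."""
    sem = asyncio.Semaphore(max_concurrent)
    pinned: List[str] = []

    async def worker(cid: str) -> None:
        async with sem:
            # Assumption: the watermark is a minimum percentage of free space.
            if free_pct() < low_watermark_pct:
                return
            await pin_one(cid)
            pinned.append(cid)

    await asyncio.gather(*(worker(c) for c in cids))
    return pinned
```

Here `max_concurrent` and `low_watermark_pct` would be read from `SYNC_MAX_CONCURRENT_PINS` and `SYNC_DISK_LOW_WATERMARK_PCT`. Checking the watermark inside the semaphore means a burst stops as soon as space runs low, not only between batches.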
## Keys policy
`KEY_AUTO_GRANT_TRUSTED_ONLY=1`: only nodes with `KnownNode.meta.role == 'trusted'` receive the DEK automatically. The preview lease TTL is set via `KEY_GRANT_PREVIEW_TTL_SEC`.