Preface
This is not a Web3 project. It doesn't use blockchain and it's not based on monetary transactions. There are no cryptocurrencies involved.
This document hasn't been fully translated as of now. You can check out the original docs here.
Why another network?
Centralization of basic services
The current WWW landscape is centralized. Most people use the same search engine (Google Search), the same video sharing platform (YouTube), the same social network (Facebook), etc. This approach has its issues:
- it's optimized for project commercialization, not for promoting small and independent creative endeavours
- freedom of speech is being progressively limited
- users are subjected to invasive profiling and tracking in order to maximize ad revenue at the cost of user privacy
What's the main cause?
This centralized state is caused by many factors. The most prominent ones stem primarily from core design decisions of the World Wide Web.
Designed by programmers, for programmers
In an age where even basic knowledge of filesystem structure has been abstracted away by mobile operating systems, one cannot assume that the end user is able to set up a dedicated HTTP server on their own, much less acquire an open port and a domain name.
Designed for mainframes
The WWW was designed at a time when a mainframe server was necessary for any kind of non-trivial computation, while the end users' terminals were relegated to downloading and previewing content. It was not designed with edge computing in mind.
Designed without mobile internet in mind
Today everyone is always connected, and mobile phones are the primary way of accessing the internet.
Designed for static content delivery
Dynamic content is a CGI-based hack on top of existing old HTTP infrastructure, even though it forms the basis of current web interaction.
The solution
We propose a new protocol for distributed communication. It inverts most of the current WWW-based applications' assumptions. In particular:
- it does away with client-server architecture in favor of distributed P2P
- data is distributed on end-user devices, not on centralized servers
- all user traffic is end-to-end encrypted by default
- users are the content providers, not servers
- users store sensitive data on their own devices, not on external servers
- the network allows dynamic content delivery without the use of external CGI tools
- UI data processing is done on end user devices, not on centralized servers.
While the WWW created primitives for static content delivery, the proposed network architecture creates primitives for dynamic content delivery. This is achieved through a system of objects and tags, which can be effectively added to the network thanks to a hybrid Kademlia-Gossipsub overlay network.
Design principles
The protocol described in this document is based on a set of well-defined design principles. If a problem had multiple solutions, the one that followed those principles best was chosen.
- Whenever possible complicated behaviors aren't specified in the protocol directly, but are an emergent property of other behaviors
- Keep It Simple, Stupid - the protocol should provide the bare minimum set of primitives required to build other, more complicated behaviors by the application. Every implemented primitive's usage is maximized
- Occam's Razor - every primitive should work in a straightforward manner. Everything that's not necessary to understand or implement it needs to be cut.
- Local processing is preferred over sending remote requests (e.g. local garbage collecting vs sending periodic network cleanup requests)
- The load needs to be balanced evenly between end users (e.g. no superpeers)
- Increasing anonymity can't come at the cost of network performance. Perfect anonymity is useless if no one will use the service because it's too slow.
- Loss of anonymity is caused by the user's actions, not by the system.
- User authentication is based on public-key cryptography and there is no centralized authority for creating new identities. Every connection is based on a zero-trust model by default.
Layer overview
Session layer
This layer describes low-level primitives for transferring data over streams. Data transport is encrypted based on long-term public keys gathered both through Kademlia search and out-of-band exchange.
Peer discovery
This layer implements a structured network topology that allows for efficient search and for sending requests to other peers. It also decides who stores what.
Object transfer
This layer introduces the notion of an extensible object and describes the set of mandatory base objects. It's responsible for data fragmentation and the immediate transfer mode, along with other things. It also defines an extensible RPC interface.
Application modules
A set of objects necessary to build apps based on the previous layer. It allows applications to define their own subprotocols.
Peer Discovery
The Peer Discovery layer is responsible for finding peers storing requested content. This is done by creating a new virtual network topology (i.e. an overlay network). This layer is both performance and reliability critical as it provides a basis for the rest of the distributed system.
One also needs to define what exactly is the result of a peer discovery search. A naive solution based on a simple Kademlia search will return nodes whose ID is the most similar to the requested one. This is not enough for the purposes of this project, as described in the Motivation section.
Therefore, it is assumed that a search can return any peer if it knows the path to the destination.
Description
Kademlia
The peer discovery layer is defined by hierarchical groups with access control. Every group is defined by its own Kademlia network with the following additional abstractions:
- Route caching that allows for early termination of future Kademlia requests
- Generalized search target - users don't search for a single best-fit host that's tasked with managing specific content - they search for any host from the content's management group instead.
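The difference between a classic Kademlia lookup and the generalized search target can be sketched as follows. This is a minimal illustration, not the protocol's actual lookup routine: `xor_distance` is the standard Kademlia metric, while `closest_peers` and `find_group_member` are hypothetical helper names, and the management group is modeled as a plain in-memory set.

```rust
use std::collections::HashSet;

type Hash = [u8; 32];

/// Standard Kademlia XOR distance between two 256-bit identifiers.
/// Comparing the resulting arrays lexicographically orders them
/// as big-endian integers.
fn xor_distance(a: &Hash, b: &Hash) -> Hash {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = a[i] ^ b[i];
    }
    out
}

/// Classic Kademlia step: the `k` known peers closest to `target`.
fn closest_peers(known: &[Hash], target: &Hash, k: usize) -> Vec<Hash> {
    let mut sorted = known.to_vec();
    sorted.sort_by_key(|id| xor_distance(id, target));
    sorted.truncate(k);
    sorted
}

/// Generalized search target (hypothetical helper): the search is
/// satisfied by ANY known peer that belongs to the content's
/// management group, not only by the single closest host.
fn find_group_member(known: &[Hash], group_members: &HashSet<Hash>) -> Option<Hash> {
    known.iter().copied().find(|id| group_members.contains(id))
}
```

Under these assumptions, a lookup would keep running `closest_peers` rounds only while `find_group_member` returns `None`.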
Groups
By default the system is divided into global, continent and country-level groups. Furthermore, the users can create their own private groups that belong to some other public group. This implies that users can't create another global-level group.
A public group is defined as a private group whose signing key is publicly known. Both types of groups are managed through the same system. Every group contains its own Kademlia network with extensions, as previously stated. Users are assigned to correct groups based on their IP address.
Route caching
Base Kademlia requests return the user closest to a given content identifier. This user serves as the primary gateway to the content management group. If the content management group is composed of multiple nodes the gateway returns the ID and address of a random node from this group.
Users are required to cache their requests in routing shortcut tables. If a future Kademlia request passes through them, they return not only their closest peers but also all the cached members of the management group that they know.
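One possible shape for a routing shortcut table is sketched below. The names `ShortcutTable`, `remember`, `respond` and `LookupResponse` are assumptions made for illustration and are not part of the specification; the node identifier is modeled as a plain `Hash`.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

type Hash = [u8; 32];

#[derive(Clone, Debug, PartialEq)]
struct Node {
    id: Hash, // assumption: node identifiers are plain hashes
    addr: SocketAddr,
}

/// Hypothetical reply to a lookup that passed through this peer.
struct LookupResponse {
    closest: Vec<Node>,       // ordinary Kademlia result
    group_members: Vec<Node>, // cached shortcut entries, possibly empty
}

/// Hypothetical routing shortcut table:
/// content hash -> known members of its management group.
#[derive(Default)]
struct ShortcutTable {
    cached: HashMap<Hash, Vec<Node>>,
}

impl ShortcutTable {
    /// Record a management group member learned from an earlier request.
    fn remember(&mut self, content: Hash, member: Node) {
        let members = self.cached.entry(content).or_default();
        if !members.contains(&member) {
            members.push(member);
        }
    }

    /// Answer a request: the closest peers plus every cached member.
    fn respond(&self, content: &Hash, closest: Vec<Node>) -> LookupResponse {
        LookupResponse {
            closest,
            group_members: self.cached.get(content).cloned().unwrap_or_default(),
        }
    }
}
```

A requester that receives a non-empty `group_members` list can stop iterating early, which is the early termination that route caching is meant to enable.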
Management groups
Management groups serve content with a hash that's the same as the group's identifier. They also store objects related to that hash (e.g. Pins as defined by the upper layer). Users that belong to this group are responsible for distributed sharding. Every group member knows something about the hash, and they synchronize that knowledge through the Gossipsub protocol.
Anyone can join a management group. A member can ask other nodes from the High Availability group for help if it detects an increased load on the content.
External nodes can help with content delivery by becoming seeders, i.e. by notifying other members about their readiness to provide content with the same hash. It is not required for them to also store related content (e.g. Pins). Seeders aren't part of the distributed sharding, so they also aren't stored in any routing shortcut tables.
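The split between members and seeders can be modeled roughly as below. This is a sketch under stated assumptions: `ManagementGroup` and its methods are hypothetical names, membership is kept in local sets, and the Gossipsub synchronization described above is left out entirely.

```rust
use std::collections::BTreeSet;

type Hash = [u8; 32];
type UserId = Hash;

/// Hypothetical local view of one management group.
struct ManagementGroup {
    id: Hash,                  // equal to the hash of the served content
    members: BTreeSet<UserId>, // take part in distributed sharding
    seeders: BTreeSet<UserId>, // only serve content with this hash
}

impl ManagementGroup {
    fn new(id: Hash) -> Self {
        ManagementGroup { id, members: BTreeSet::new(), seeders: BTreeSet::new() }
    }

    /// A user joins as a full member; a former seeder is promoted.
    fn join(&mut self, user: UserId) {
        self.seeders.remove(&user);
        self.members.insert(user);
    }

    /// An external node announces that it can serve the content.
    fn announce_seeder(&mut self, user: UserId) {
        if !self.members.contains(&user) {
            self.seeders.insert(user);
        }
    }

    /// Everyone who can deliver the content itself.
    fn content_providers(&self) -> Vec<UserId> {
        self.members.union(&self.seeders).copied().collect()
    }

    /// Only full members may appear in routing shortcut tables.
    fn shortcut_candidates(&self) -> Vec<UserId> {
        self.members.iter().copied().collect()
    }
}
```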
High availability nodes
Inactive users automatically join the High Availability group. All users present there signal their readiness to help with other requests.
API - Kademlia groups
Base types
- A BLAKE3 hash that's used as both a content identifier and its respective content management group identifier.
type Hash = [u8; 32];
- User identifier
type UserId = Hash;
- User/Kademlia group identifier
type GroupId = Hash;
- User downloaded content (a byte buffer)
type Content = Vec<u8>;
Group structures
- Basic node descriptor - its identifier and address (IP and port)
struct Node { id: NodeId, addr: SocketAddr }
- Group owners can give out signed group access certificates to other users. If the signing key becomes public the whole group will also turn public.
struct GroupAccessToken { binding: GroupAccessBinding, group_owner_signature: Signature }
- User access is limited to a given UTC timestamp.
type UnixTimestamp = u64;
struct GroupAccessBinding { group: GroupId, recipient: UserId, revocation_date: UnixTimestamp }
- User group validity certificate - the group definition signed by its owner.
struct UserGroup { definition: GroupDefinition, signature: Signature }
- Hierarchical group definition - its description along with the owner's access certificate to the group higher in the hierarchy. The hierarchy finishes with the top-level global group. It's a root node for other sub-groups. Other global groups can't be created.
struct GroupDefinition { id: GroupId, owner: UserId, parent: GroupId, parent_membership_proof: GroupAccessToken }
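The structures above suggest two checks a peer might run when it is shown a group: whether the owner's access binding matches the definition and has not yet been revoked, and whether the chain of parents actually ends at the global group. The sketch below illustrates both under several assumptions that the document does not state: `Signature` is a placeholder byte array and is never verified, the root group is identified by a known `GroupId` with no definition of its own, a binding stops being valid at exactly `revocation_date`, and `proof_matches` and `path_to_root` are hypothetical helper names.

```rust
use std::collections::HashMap;

type Hash = [u8; 32];
type UserId = Hash;
type GroupId = Hash;
type UnixTimestamp = u64;
type Signature = [u8; 64]; // placeholder: no cryptography in this sketch

struct GroupAccessBinding { group: GroupId, recipient: UserId, revocation_date: UnixTimestamp }
struct GroupAccessToken { binding: GroupAccessBinding, group_owner_signature: Signature }
struct GroupDefinition { id: GroupId, owner: UserId, parent: GroupId, parent_membership_proof: GroupAccessToken }

/// Assumption: access ends at `revocation_date` (exclusive upper bound).
fn binding_is_active(binding: &GroupAccessBinding, now: UnixTimestamp) -> bool {
    now < binding.revocation_date
}

/// The owner's proof must target the declared parent, be issued to the
/// owner, and still be active. Signature verification is omitted.
fn proof_matches(def: &GroupDefinition, now: UnixTimestamp) -> bool {
    let binding = &def.parent_membership_proof.binding;
    binding.group == def.parent
        && binding.recipient == def.owner
        && binding_is_active(binding, now)
}

/// Walk parent links until the root is reached. Returns `None` for an
/// unknown group or a cycle in the parent links.
fn path_to_root(
    start: GroupId,
    root: GroupId,
    groups: &HashMap<GroupId, GroupDefinition>,
) -> Option<Vec<GroupId>> {
    let mut path = vec![start];
    let mut current = start;
    while current != root {
        let def = groups.get(&current)?;
        if path.contains(&def.parent) {
            return None; // cycle: the hierarchy never reaches the root
        }
        current = def.parent;
        path.push(current);
    }
    Some(path)
}
```

A real implementation would also verify `group_owner_signature` against the parent group's signing key before trusting any of these fields.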