PliantDb User's Guide

PliantDb is an ACID-compliant document database written in Rust. It takes heavy inspiration from CouchDB, but makes no effort to be compatible with CouchDB. Its goal is to be a general-purpose database that simplifies development and deployment by providing reliable building blocks that are lightweight enough for hobby projects running with minimal resources, yet scalable for when your hobby project becomes a deployed product.

This user's guide provides a walkthrough to help you understand how PliantDb works. It is meant to be supplemental to the documentation. If you learn best by exploring examples, many are available in pliantdb/examples. If, however, you learn best by taking a guided tour of how something works, this guide is specifically for you.

If you have any feedback on this guide, please file an issue, and we will try to address any issues or shortcomings.

Thank you for exploring PliantDb.

Concepts

This is a list of common concepts that will be used throughout this book as well as the documentation.

Document

A Document is a single piece of stored data. Each document is stored within a Collection, and has a unique ID within that Collection. A Document also contains a revision ID as well as a digest matching the current contents of the document.

When a Document is updated, PliantDb will check that the revision information passed matches the currently stored information. If not, a conflict error will be returned. This simple check ensures that if two writers try to update the document simultaneously, one will succeed and the other will receive an error.
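
To see this conflict check in action, here is a hedged sketch of an update, using the BlogPost type from the View example later in this guide. The get, set_contents, and update calls are assumptions based on the Connection and Document documentation; only collection() and contents() appear elsewhere in this guide:

let mut doc = db
    .collection::<BlogPost>()
    .get(1)
    .await?
    .expect("document not found");
let mut post = doc.contents::<BlogPost>()?;
post.title = String::from("An Updated Title");
doc.set_contents(&post)?;
// If another writer saved a new revision after we fetched `doc`, the stored
// revision no longer matches, and PliantDb returns a conflict error.
if let Err(err) = db.update(&mut doc).await {
    eprintln!("update failed, possibly a conflict: {:?}", err);
}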

PliantDb provides APIs for storing serde-compatible data structures using the CBOR format. CBOR supports a wider range of data types than JSON and is a more efficient format. It also offers a bit more resilience when parsing structures that have changed than some other encoding formats, but care still needs to be taken when updating structures that represent already-stored data.

If you would prefer to manually manage the data stored inside of a Document, you can directly manage the contents field. PliantDb will not interact with the contents of a Document. Only code that you write will parse or update the stored data.

Collection

A Collection is a group of Documents and associated functionality. The goal of a Collection is to encapsulate the logic for a set of data in such a way that Collections could be designed to be shared and reused in multiple Schemas or applications.

Each Collection must have a unique CollectionId.

A Collection can contain one or more Views.
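
To make this concrete, here is a hedged sketch of a Collection implementation for the BlogPost type used in the next section. The method names and signatures here are assumptions; consult the Collection documentation for the real trait:

impl Collection for BlogPost {
    fn collection_id() -> CollectionId {
        // Each Collection needs a unique ID.
        CollectionId::from("blog-posts")
    }

    fn define_views(schema: &mut Schematic) {
        // Register the Views that belong to this Collection.
        schema.define_view(BlogPostsByCategory);
    }
}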

View

A View is a map/reduce-powered method of quickly accessing information inside of a Collection. A View can only belong to one Collection.

Views define two important associated types: a Key type and a Value type. You can think of these as the equivalent entries in a map/dictionary-like collection that supports more than one entry for each Key. The Key is used to filter the View's results, and the Value is used by your application or the reduce() function.

Views are a powerful, yet abstract concept. Let's look at a concrete example: blog posts with categories.

#[derive(Serialize, Deserialize, Debug)]
pub struct BlogPost {
    pub title: String,
    pub body: String,
    pub category: Option<String>,
}

While category should be an enum, let's first explore using String and upgrade to an enum at the end (it requires one additional step). Let's implement a View that will allow users to find blog posts by their category as well as count the number of posts in each category.

#[derive(Debug)]
pub struct BlogPostsByCategory;

impl View for BlogPostsByCategory {
    type Collection = BlogPost;
    type Key = Option<String>;
    type Value = u32;

    fn map(&self, document: &Document<'_>) -> MapResult<Self::Key, Self::Value> {
        let post = document.contents::<BlogPost>()?;
        Ok(Some(document.emit_key_and_value(post.category.clone(), 1)))
    }

    fn reduce(
        &self,
        mappings: &[MappedValue<Self::Key, Self::Value>],
        _rereduce: bool,
    ) -> Result<Self::Value, Error> {
        Ok(mappings.iter().map(|mapping| mapping.value).sum())
    }
}

Map

The first line of the map function calls Document::contents() to deserialize the stored BlogPost. The second line returns an emitted Key and Value -- in our case a clone of the post's category and the value 1_u32. With the map function, we're able to use query() and query_with_docs():

    let rust_posts = db
        .view::<BlogPostsByCategory>()
        .with_key(Some(String::from("Rust")))
        .query_with_docs().await?;

The above queries the Database for all documents in the BlogPost Collection that emitted a Key of Some("Rust").

Reduce

The second function to learn about is the reduce() function. It is responsible for turning an array of Key/Value pairs into a single Value. In some cases, PliantDb might need to call reduce() with values that have already been reduced once. When that happens, rereduce is set to true.

In this example, we're using the built-in Iterator::sum() function to turn our Value of 1_u32 into a single u32 representing the total number of documents.

    let rust_post_count = db
        .view::<BlogPostsByCategory>()
        .with_key(Some(String::from("Rust")))
        .reduce().await?;

Understanding Re-reduce

Let's examine this data set:

Document ID | BlogPost Category
1           | Some("Rust")
2           | Some("Rust")
3           | Some("Cooking")
4           | None

When updating views, each view entry is reduced and the value is cached. These are the view entries:

View Entry ID   | Reduced Value
Some("Rust")    | 2
Some("Cooking") | 1
None            | 1

When a reduce query is issued for a single key, the value can be returned without further processing. But, if the reduce query matches multiple keys, the View's reduce() function will be called with the already reduced values with rereduce set to true. For example, retrieving the total count of blog posts:

    let total_post_count = db
        .view::<BlogPostsByCategory>()
        .reduce().await?;

Once PliantDb has gathered each key's reduced value, it needs to further reduce that list into a single value. To accomplish this, the View's reduce() function is invoked with rereduce set to true, and with mappings containing:

Key             | Value
Some("Rust")    | 2
Some("Cooking") | 1
None            | 1

This produces a final value of 4.

How does PliantDb make this efficient?

When saving Documents, PliantDb does not immediately update related views. It instead notes what documents have been updated since the last time the View was indexed.

When a View is accessed, the queries include an AccessPolicy. If you aren't overriding it, UpdateBefore is used. This means that when the query is evaluated, PliantDb will first check if the index is out of date due to any updated data. If it is, it will update the View before evaluating the query.

If you want results quickly and are willing to accept data that might be out of date, the access policies UpdateAfter and NoUpdate can be used, depending on your needs.

If multiple simultaneous queries are being evaluated for the same View and the View is outdated, PliantDb ensures that only a single view indexer will execute while both queries wait for it to complete.
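
For example, a query that favors speed over freshness might look like the sketch below. The with_access_policy builder method is an assumption; check the View query builder documentation for the exact name:

    let posts = db
        .view::<BlogPostsByCategory>()
        .with_access_policy(AccessPolicy::NoUpdate)
        .query().await?;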

Using arbitrary types as a View Key

In our previous example, we used String for the Key type. The reason is important: Keys must be sortable by our underlying storage engine, which means special care must be taken. Most serialization formats do not guarantee binary sort order. Instead, PliantDb exposes the Key trait. On that documentation page, you can see that PliantDb implements Key for many built-in types.

Using an enum as a View Key

The easiest way to expose an enum is to derive num_traits::FromPrimitive and num_traits::ToPrimitive using num-derive, and add an impl EnumKey line:

#[derive(Serialize, Deserialize, Debug, num_derive::FromPrimitive, num_derive::ToPrimitive)]
pub enum Category {
    Rust,
    Cooking,
}

impl EnumKey for Category {}

The View code remains unchanged, although the associated Key type can now be set to Option<Category>. The queries can now use the enum instead of a String:

    let rust_post_count = db
        .view::<BlogPostsByCategory>()
        .with_key(Some(Category::Rust))
        .reduce().await?;

PliantDb will convert the enum to a u64 and use that value as the Key. A u64 was chosen to ensure fairly wide compatibility even with some extreme usages of bitmasks. If you wish to customize this behavior, you can implement Key directly.

Implementing the Key trait

The Key trait declares two functions: as_big_endian_bytes() and from_big_endian_bytes(). The intention is to convert the type to bytes using network byte order for numerical types; for non-numerical types, the bytes need to be stored in a binary-sortable order.

Here is how PliantDb implements Key for EnumKey:

impl<T> Key for T
where
    T: EnumKey,
{
    fn as_big_endian_bytes(&self) -> anyhow::Result<Cow<'_, [u8]>> {
        self.to_u64()
            .ok_or_else(|| anyhow::anyhow!("Primitive::to_u64() returned None"))?
            .as_big_endian_bytes()
            .map(|bytes| Cow::Owned(bytes.to_vec()))
    }

    fn from_big_endian_bytes(bytes: &[u8]) -> anyhow::Result<Self> {
        let primitive = u64::from_big_endian_bytes(bytes)?;
        Self::from_u64(primitive)
            .ok_or_else(|| anyhow::anyhow!("Primitive::from_u64() returned None"))
    }
}

By implementing Key you can take full control of converting your view keys.
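
For instance, here is a minimal sketch of a custom key: a hypothetical Priority newtype wrapping a u8, delegating to u8's implementation (assuming u8 is among the built-in types that implement Key):

pub struct Priority(pub u8);

impl Key for Priority {
    fn as_big_endian_bytes(&self) -> anyhow::Result<Cow<'_, [u8]>> {
        // A single byte is already binary-sortable; delegate to u8's encoding.
        self.0
            .as_big_endian_bytes()
            .map(|bytes| Cow::Owned(bytes.to_vec()))
    }

    fn from_big_endian_bytes(bytes: &[u8]) -> anyhow::Result<Self> {
        u8::from_big_endian_bytes(bytes).map(Self)
    }
}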

Schema

A Schema is a group of one or more Collections. A Schema can be instantiated as a Database. The Schema describes how a set of data behaves, and a Database is a set of data on-disk.
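
As a hypothetical sketch (the trait's method names are assumptions; consult the Schema documentation for the actual signatures), grouping collections into a schema might look like:

pub struct Blog;

impl Schema for Blog {
    fn define_collections(schema: &mut Schematic) {
        // A Schema is simply the list of Collections a Database will contain.
        schema.define_collection::<BlogPost>();
    }
}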

Database

A Database is a set of stored data. Each Database is described by a Schema. Unlike the other concepts, this concept corresponds to multiple types, such as the Database type in pliantdb-local and the equivalent remote type used by pliantdb-client. All of these types implement the Connection trait.

Server

A Server oversees one or more Schemas and named Databases. Over time, this concept will be extended to have support for other features including users and permissions.

There are two ways to initialize a PliantDb server:

  • Storage: A local, file-based server implementation with no networking capabilities.
  • Server: A networked server implementation, written using Storage. This server supports QUIC- and WebSocket-based protocols. The QUIC protocol is preferred, but it uses UDP which many load balancers don't support. If you're exposing PliantDb behind a load balancer, WebSockets may be the only option depending on your host's capabilities.

Client

A Client is used to access a Server over a network connection.

PubSub

The Publish/Subscribe pattern enables developers to design systems that produce and receive messages. It is implemented for PliantDb through the PubSub and Subscriber traits.

A common example of what PubSub enables is implementing a simple chat system. Each chat participant can subscribe to messages on the chat topic, and when any participant publishes a chat message, all subscribers will receive a copy of that message.
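
Condensed into a sketch, a publish/subscribe round-trip looks roughly like this (the method names are assumptions based on the PubSub and Subscriber traits; see the working example below):

let subscriber = db.create_subscriber().await?;
subscriber.subscribe_to("chat").await?;
db.publish("chat", &String::from("Hello, everyone!")).await?;
// Every subscriber to the "chat" topic receives its own copy of the message.
let message = subscriber.receiver().recv_async().await?;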

A working example of PubSub is available at pliantdb/examples/pubsub.rs.

Use cases of PliantDb

Single database model (No networking)

This use case is most similar to utilizing SQLite for your database. In this mode, PliantDb directly interacts with files on your disk to provide your database. Unlike other file-based databases, however, it's easy to migrate to any of these scenarios from this starting position:

graph LR
  code{{Rust Code}}
  local[(pliantdb-local::Database)]
  code <--> local

A working example of how to use a local database can be found at pliantdb/examples/basic-local.rs.

Multi-database model (No networking)

This model is most similar to using multiple SQLite databases. In this mode, you interact with a Storage that you spawn within your code.

graph LR
  code{{Rust Code}}
  local[(pliantdb-local::Storage)]
  code <--> local

If you look at the source behind Database::open_local, you'll see that the single-database model is using Storage under the hood.

Server model (QUIC or WebSockets)

This model is most similar to using other document databases, like CouchDB or MongoDB. In this mode, you interact with a Client that connects via either QUIC or WebSockets with a server. From the server code's perspective, this model is the same as the multi-database model, except that the server is listening for and responding to network traffic.

graph LR
  client-code{{Rust Client Code}}
  server-code{{Rust Server Code}}
  client[[pliantdb-client]]
  server[[pliantdb-server]]
  local[(pliantdb-local)]
  client-code <--> client
  client <-. network .-> server
  server <--> local
  server-code <--> server

A working example of this model can be found at pliantdb/examples/server.rs. When writing client/server applications that utilize PliantDb, you can run the PliantDb server within your server application. This means that your server can still interact with PliantDb without using networking. Regardless of whether you run any other server code, your PliantDb server will be accessible through a Client over the network.

Coming Later: API Platform model (QUIC or WebSockets)

If you're finding yourself developing an API for your application, and all of the consumers of this API are already connected to PliantDb, you may want to take advantage of the platform feature. This is not implemented yet, but the vision is that by implementing a few callbacks to handle and respond to your own serde-compatible request type, you can implement a custom API that can be used directly from clients. And, by taking advantage of the permissions model that will be developed, you can even expose this API over the internet safely.

graph LR
  client-code{{Rust Client Code}}
  server-code{{Rust Server Code}}
  client[[pliantdb-client]]
  server[[pliantdb-server]]
  platform[[pliantdb-platform]]
  local[(pliantdb-local)]
  client-code <--> client
  client <-. network .-> server
  server <--> local
  server-code <--> server
  server-code <--> platform
  platform <--> server

Coming Later: Cluster model

When you're at the stage of scaling beyond a single server, you will be able to upgrade your server to a cluster using the hypothetical pliantdb-cluster crate. The clustering model is still being designed, but the goal is something similar to:

graph LR
  client-code{{Rust Client Code}}
  server-code{{Rust Server Code}}
  client[[pliantdb-client]]
  server1[[server 1]]
  server2[[server 2]]
  server3[[server 3]]
  cluster[[pliantdb-cluster]]
  client-code <--> client
  client <-. network .-> cluster
  server-code <--> cluster
  cluster <--> server1
  cluster <--> server2
  cluster <--> server3
  server1 <--> server2
  server2 <--> server3
  server1 <--> server3

In this model, the local storage element is hidden; each server has its own storage. This model is very similar from the viewpoint of your server and client code -- the primary difference is that the server-side connection is established using the cluster crate. From the client's perspective, the cluster behaves as a single entity -- sending a request to any server node will yield the same result within the cluster.

All features of PliantDb will be designed to work in cluster mode seamlessly. PubSub will ensure that subscribers will receive messages regardless of which server they're connected to.

Overview

PliantDb aims to offer the majority of its functionality in local operation. The networked server adds some functionality on top of the local version, but its main function is to add the ability to use networking to talk to the database.

This model makes it easy to transition a local database to a networked database server. Start with whatever model fits your needs today, and when your needs change, PliantDb will adapt.

When to use the Local Integration

  • You're only going to access databases from one process at a time. PliantDb is designed for concurrency and can scale with the capabilities of the hardware. However, the underlying storage layer that PliantDb is built upon, sled, does not support multiple processes writing its data simultaneously. If you need to access the database from multiple processes, the server integration is what you should use. While it doesn't offer IPC communication today, a pull request adding that functionality (along with the corresponding unit tests) would be accepted.
  • You have no public API/PubSub/access needs or have implemented those with another stack.

When to use the Server Integration

  • You need to access databases from more than one process or machine.
  • You are OK with downtime due to loss of service when the single server is offline. If you need to have a highly-available database, you should use the Cluster Integration (Coming Soon).
  • Your database load can be met with a single machine. If you have enough load that you need to share the processing power of multiple servers, you should use the Cluster Integration (Coming Soon).

Coming Soon: When to use the Cluster Integration

  • You need to access databases from more than one machine.
  • You need a highly-available setup.
  • You need/want to split load between multiple machines.

Integrating PliantDb Locally

PliantDb supports multiple databases and multiple schemas. However, for many applications, you only need a single database.

If you only want a single database, the setup is straightforward (from pliantdb/examples/basic-local.rs):

let db = Database::<Message>::open_local(
    "basic.pliantdb", 
    &Configuration::default()
).await?;

Under the hood, PliantDb is creating a multi-database Storage with a local Database named default for you. If you need to switch to a multi-database model, you can open the storage and access the default database (adapted from pliantdb/examples/basic-local.rs):

let storage = Storage::open_local(
    "basic.pliantdb",
    &Configuration::default()
).await?;
storage.register_schema::<Message>().await?;
let db = storage.database::<Message>("default").await?;

You can register multiple schemas so that databases can be purpose-built.
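
As a sketch, registering a second, hypothetical Blog schema and opening a database for each might look like this (it assumes named databases can be opened the same way as "default" above; creating a new named database may require an explicit create step, as shown with Server::create_database later in this guide):

storage.register_schema::<Message>().await?;
storage.register_schema::<Blog>().await?;
// Each named database is bound to exactly one schema.
let messages = storage.database::<Message>("messages").await?;
let blog = storage.database::<Blog>("blog").await?;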

Common Traits

To help your code transition between different modes of accessing PliantDb, you can use these common traits to make your methods accept any style of PliantDb access.

For example, pliantdb/examples/basic-local.rs uses this helper method to insert a record:

async fn insert_a_message<C: Connection>(
    connection: &C,
    value: &str,
) -> anyhow::Result<()> {
    connection
        .collection::<Message>()
        .push(&Message {
            contents: String::from(value),
            timestamp: SystemTime::now(),
        })
        .await?;
    Ok(())
}

Integrating the networked PliantDb Server

To access PliantDb over the network, you're going to be writing two pieces of code: the server code and the client code.

Your PliantDb Server

The first step is to create a Server, which uses local Storage under the hood. This means that if you're already using PliantDb in local mode, you can swap your usage of Storage with Server in your server code without running your database through any tools. Here's the setup code from pliantdb/examples/server.rs:

    let server = Server::open(
        Path::new("server-data.pliantdb"),
        Configuration::default(),
    )
    .await?;
    if server.certificate().await.is_err() {
        server
            .install_self_signed_certificate("example-server", true)
            .await?;
    }
    let certificate = server.certificate().await?;
    server.register_schema::<Shape>().await?;
    match server.create_database::<Shape>("my-database").await {
        Ok(()) => {}
        Err(Error::DatabaseNameAlreadyTaken(_)) => {}
        Err(err) => panic!(
            "Unexpected error from server during create_database: {:?}",
            err
        ),
    }

Once you have a server initialized, calling listen_on will begin listening for connections on the port specified. This uses the preferred native protocol, which is built on UDP. If you find that UDP is not working for your setup or want to put PliantDb behind a load balancer that doesn't support UDP, you can enable WebSocket support and call listen_for_websockets_on.

You can call both, but since these functions don't return until the server is shut down, you should spawn them instead:

let task_server = server.clone();
tokio::spawn(async move {
    task_server.listen_on(5645).await
});
let task_server = server.clone();
tokio::spawn(async move {
    task_server.listen_for_websockets_on("localhost:8080").await
});

If you're not running any of your own code on the server, and you're only using one listening method, you can just await the listen method of your choice in your server's main.

From the Client

The Client can support both the native protocol and WebSockets. It determines which protocol to use based on the scheme in the URL:

  • pliantdb://host:port will connect using the native PliantDb protocol.
  • ws://host:port will connect using WebSockets.

Here's how to connect, from pliantdb/examples/server.rs:

Client::new(
    Url::parse("pliantdb://localhost:5645")?,
    Some(certificate),
)
.await?

This is using a pinned certificate to connect. Other methods are supported, but better certificate management is coming soon.
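
To connect over WebSockets instead, swap the scheme in the URL. As a sketch, assuming no pinned certificate is needed for a WebSocket connection:

Client::new(
    Url::parse("ws://localhost:8080")?,
    None,
)
.await?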

Common Traits

Integrating into a PliantDb Cluster

Coming Soon.

The goal of this feature is to make clustering simple. We hope to provide an experience that allows someone who is operating a networked server to easily upgrade to a cluster. Two modes of operation are planned:

One-leader mode

When setting up a cluster initially, you will begin with one-leader mode. In this mode, you can add as many nodes to the cluster as you wish, but only one node will be processing all of the data updates. All nodes can handle requests, but requests that can't be served locally will be forwarded to the leader. This allows for the use of read-replicas to alleviate load in some read-heavy situations.

Another benefit of this mode is that it supports a two-node configuration. If you're scaling your app and need a reliable backup for quicker disaster recovery, you can operate a read-replica and manually fail over when the situation arises.

If you decide to allow automatic failover in this mode, there is a chance for data loss, as the leader does not wait for read-replicas to synchronize data. Any transactions that committed and were not synchronized before the outage occurred would not be on the other servers. Thus, this mode is not intended for high-availability configurations, although some users may elect to use it in such a configuration knowing these limitations.

Quorum mode

Once you have a cluster with at least 3 nodes, you can switch the cluster into quorum mode. For any given N nodes, all requests must reach an agreed response from N / 2 + 1 members. For example, in a cluster of 3 nodes, there must be 2 successful responses before a client can receive a response to its request.

In quorum mode, your data is divided into shards, and those shards are replicated throughout the cluster onto at least 3 nodes (configurable). Initially, with just 3 nodes available, the only benefit is having a highly-available cluster with no data loss when a single node goes down.

As you add more nodes to your cluster, however, you can re-balance your databases to move shards. The author of PliantDb did not enjoy this process in CouchDB and aims to make these tools easy and effortless to use. Ideally, there would be a low-maintenance mode that would allow the cluster to re-shard itself automatically during allowed maintenance periods, ensuring data is distributed more evenly amongst the cluster.

Additional long-term dreams of quorum mode include the ability to customize node selection criteria on a per-database basis. The practical use of node selection is to ensure that at least 3 unique nodes are picked for each shard. However, allowing custom logic to evaluate which nodes should be selected for any database would allow ultimate flexibility. For example, if you have a globally deployed application, and you have some data that is geographically specific, you could locate each region's database on nodes within those locations' data centers.

When?

Clustering is an important part of the design of Cosmic Verge. As such, it is a priority for us to work on. But, the overall game is a very large project, so we hesitate to make any promises on timelines.

Connection

Coming soon.

This is an async trait, which unfortunately yields messy documentation.

ServerConnection

Coming soon.

This is an async trait, which unfortunately yields messy documentation.

PubSub Traits

Coming soon.

These are async traits, which unfortunately yield messy documentation: PubSub and Subscriber.

Key-Value Store

Coming soon.

This is an async trait, which unfortunately yields messy documentation.