Skip to content

Bitmap in Rust

Posted on:January 12, 2025 at 07:00 PM

Table of contents

Open Table of contents

Introduction

In the last blog post, we took a look at database indices. Bitmaps are one of the many data structures that can be used to power such an index.

Time to implement a bitmap ourselves. I use Rust to do this, though any other programming language could be used. The full code can be found at https://github.com/christianschleifer/bitmaps-in-rust.

Requirements

Be it in a hobby project like this, or a project at work, it’s often a good idea to state the requirements for the solution one is building first thing. We’ll focus on the functional requirements, which define the behavior our bitmap should support.

Our bitmap should allow

We can also use Rust’s type system to express these requirements. Let’s focus on the first two requirements first.

// src/lib.rs

pub trait Bitmap {
    fn set(&mut self, index: u32);

    fn get(&self, index: u32) -> bool;
}

SimpleBitmap

The Bitmap trait just specifies the functionality that a bitmap must offer. Now let’s implement the Bitmap trait. We’ll start with a trivial implementation. This will not be a production-read bitmap. There is a myriad of things that could be improved upon. To explain the concepts, it can be better to keep things simple, though.

Firstly, we need a way to store an array of bits. Rust does not provide a bit-sized data type. While we could use the 8-bit byte data type, I’ll opt for the 32-bit u32 data type (this will make comparison with the more optimized bitmap of the next blog post a bit easier). Instead of a fixed-size array, I’ll use the dynamically sized Vec to hold the elements of type u32. This way, we can grow our bitmap when needed.

// src/lib.rs

pub use simple_bitmap::SimpleBitmap;
mod simple_bitmap;

// src/simple_bitmap.rs

pub struct SimpleBitmap {
    bits: Vec<u32>,
}

Creating the Bitmap

We don’t want to expose the bits field to users of our bitmap. Still, they need a way to construct an instance of the bitmap. That’s why we need a publicly exposed constructor method, conventionally named new.

// src/simple_bitmap.rs

impl SimpleBitmap {
    pub fn new() -> Self {
        Self { bits: Vec::new() }
    }
}

Setting Bits

Time to implement setting a bit. Let’s start with writing a test making use of the to-be-implemented set method. We’ll extend this test later when we implement getting a bit.

// src/simple_bitmap.rs

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn it_sets_bits() {
        // given
        let mut bm = SimpleBitmap::new();

        // when
        bm.set(31);
        bm.set(32);

        // then
        // todo: check validity once get is implemented
    }
}

What do we have to do when our bitmap is called with bm.set(31) or bm.set(32)? We first have to check in which u32 we should set the bit. Each u32 holds 32 bits. So the first u32 in our bits vector can store values for indices 0 up to and including 31. The second u32 can store values for indices 32 up to and including 63 and so on. Bear in mind that we also have to check whether we allocated enough u32s to access the u32 at the given index.

For the provided index 31, we have to access the u32 at index 0 in our bits vector. For the provided index 32, it’s the u32 at index 1. We can calculate this by dividing the provided index by 32, i.e. the number of bits that fit into a u32, using integer division.

// src/simple_bitmap.rs

#[test]
fn get_bits_vector_index() {
    assert_eq!(31 / 32, 0);
    assert_eq!(32 / 32, 1);
}

Now that we figured out which u32 to access in our bits vector, we need to think about which bit of the u32 we need to modify. This, we can do by either using the modulo operation, or by performing a bitwise and operation with the number 31_u32 (see rust operators).

// src/simple_bitmap.rs

#[test]
#[allow(clippy::identity_op)]
fn get_bit_index_in_u32_using_modulo() {
    assert_eq!(31 % 32, 31);
    assert_eq!(32 % 32, 0);
}

#[test]
fn get_bit_index_in_u32_using_bitwise_and() {
    assert_eq!(31 & 31, 31);
    assert_eq!(32 & 31, 0);
}

Another mental model for what we are doing here is to think of splitting the provided index into bits [0..27], i.e. the most significant 27 bits, and [27..32], i.e. the least significant 5 bits (explanation of most and least significant bits). The 27 MSBs represent the index in the bits vector. The 5 LSBs represent the bit position in the u32.

Note that I’m using the least-significant-bit-last notation here, which means that leftmost bit is the most significant bit, while the rightmost bit is the least significant bit.

# 31_u32

# |                u32                |
# | 000000000000000000000000000 11111 |
# | --------- 27-msb ---------- 5-lsb |
#               |                 |
#               V                 |
#     index 0 in `bits` vector    |
#                                 V
#                             31st bit
#                             position

# 32_u32

# |                u32                |
# | 000000000000000000000000001 00000 |
# | --------- 27-msb ---------- 5-lsb |
#               |                 |
#               V                 |
#     index 1 in `bits` vector    |
#                                 V
#                             0th bit
#                             position

Accordingly, the bits vector of our SimpleBitmap should look like this after having performed both bm.set(31) and bm.set(32).

# |     index 0 in `bits` vector     |     index 1 in `bits` vector    |
# |               u32                |                u32              |
# | 10000000000000000000000000000000 | 0000000000000000000000000000001 |
#   |                                                                |
#   V                                                                V
#  31st bit                                                       0th bit
#  position                                                       position

Time to put our thinking into code:

// src/simple_bitmap.rs

impl Bitmap for SimpleBitmap {
    fn set(&mut self, index: u32) {
        let u32_index_in_bits_vec = (index / 32) as usize;
        let bit_index_in_u32 = index & 0b11111;

        let stored_u32 = self.bits[u32_index_in_bits_vec];

        let modified_u32 = stored_u32 | (0b1 << bit_index_in_u32);

        self.bits[u32_index_in_bits_vec] = modified_u32;
    }

    // ...
}

I hope it’s possible to follow the code after having walked through how we must choose the bit we want to set. Let’s take a closer look at let modified_u32 = stored_u32 | (0b1 << bit_index_in_u32); though. What’s happening here?

(0b1 << bit_index_in_u32) is creating a bitmask, where only the bit at position bit_index_in_u32 is set to 1. Doing the bitwise or operation with the stored_u32 preservers all bits that are set in stored_u32, and additionally sets the bit at position bit_index_in_u32 to 1.

Our test should pass now:

cargo test it_sets_bits
   Compiling bitmaps v0.1.0 (/Users/christian/codebase/bitmaps)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.19s
     Running unittests src/lib.rs (target/debug/deps/bitmaps-40c0e78a209450ab)

running 1 test
test simple_bitmap::tests::it_sets_bits ... FAILED

failures:

---- simple_bitmap::tests::it_sets_bits stdout ----
thread 'simple_bitmap::tests::it_sets_bits' panicked at src/simple_bitmap.rs:26:35:
index out of bounds: the len is 0 but the index is 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    simple_bitmap::tests::it_sets_bits

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 6 filtered out; finished in 0.00s

error: test failed, to rerun pass `--lib`

Oops! index out of bounds: the len is 0 but the index is 0.

We forgot to check whether there is already enough u32 integers in the bits vector when indexing into the vector using the calculated u32_index_in_bits_vec. We need to add a check, and extend the vector with zeros if there is not enough u32s yet.

// src/simple_bitmap.rs

impl Bitmap for SimpleBitmap {
    fn set(&mut self, index: u32) {
        let u32_index_in_bits_vec = (index / 32) as usize;
        let bit_index_in_u32 = index & 0b11111;

        // if there is too little u32s in the bits vec, it has to be extended
        if u32_index_in_bits_vec >= self.bits.len() {
            self.bits.resize(u32_index_in_bits_vec + 1, 0);
        }

        let stored_u32 = self.bits[u32_index_in_bits_vec];

        let modified_u32 = stored_u32 | (0b1 << bit_index_in_u32);

        self.bits[u32_index_in_bits_vec] = modified_u32;
    }

    // ...
}

Now, our test passes:

cargo test it_sets_bits
   Compiling bitmaps v0.1.0 (/Users/christian/codebase/bitmaps)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.18s
     Running unittests src/lib.rs (target/debug/deps/bitmaps-40c0e78a209450ab)

running 1 test
test simple_bitmap::tests::it_sets_bits ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 6 filtered out; finished in 0.00s

Getting Bits

There is little sense in storing information without a way to retrieve it - we need a get method.

We’ll start by renaming and extending our existing it_sets_bits test.

// src/simple_bitmap.rs

#[test]
fn it_sets_and_gets_bits() {
    // given
    let mut bm = SimpleBitmap::new();

    // when
    bm.set(31);
    bm.set(32);

    // then
    assert!(!bm.get(0));
    assert!(bm.get(31));
    assert!(bm.get(32));
}

After having implemented set, get is a comparatively easy task.

// src/simple_bitmap.rs

impl Bitmap for SimpleBitmap {
    // ...

    fn get(&self, index: u32) -> bool {
        let u32_index_in_bits_vec = (index / 32) as usize;

        if let Some(bucket) = self.bits.get(u32_index_in_bits_vec) {
            let bit_index_in_u32 = index & 0b11111;

            ((bucket >> bit_index_in_u32) & 0b1) == 1
        } else {
            false
        }
    }

    // ...
}

And with this, our modified test also passes.

cargo test it_sets_and_gets_bits
   Compiling bitmaps v0.1.0 (/Users/christian/codebase/bitmaps)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.21s
     Running unittests src/lib.rs (target/debug/deps/bitmaps-40c0e78a209450ab)

running 1 test
test simple_bitmap::tests::it_sets_and_gets_bits ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 5 filtered out; finished in 0.00s

Bit Unions

Remember the Islay distillery example from the last blog post?

We built indices for the region column of our distilleries table:

// Islay       --> [1, 0, 0, 0, 1, 1, 0, 1, 1, 0]
// Speyside    --> [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
// Highlands   --> [0, 0, 1, 1, 0, 0, 0, 0, 0, 1]

Building unions for these indices, we can implement queries like

SELECT * FROM distilleries
WHERE region = 'Speyside' OR region = 'Highlands';

Let’s use our SimpleBitmap to do this, thereby also meeting the third requirement that we stated in the beginning.

// src/simple_bitmap.rs

#[test]
fn it_builds_bit_unions() {
    // given
    // Speyside    --> [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
    let mut speyside_bm = SimpleBitmap::new();
    speyside_bm.set(1);
    speyside_bm.set(6);

    // Highlands   --> [0, 0, 1, 1, 0, 0, 0, 0, 0, 1]
    let mut highlands_bm = SimpleBitmap::new();
    highlands_bm.set(2);
    highlands_bm.set(3);
    highlands_bm.set(9);

    // when
    let speyside_or_highlands = speyside_bm | highlands_bm;

    // then
    // Union       --> [0, 1, 1, 1, 0, 0, 1, 0, 0, 1]
    assert!(!speyside_or_highlands.get(0));
    assert!(speyside_or_highlands.get(1));
    assert!(speyside_or_highlands.get(2));
    assert!(speyside_or_highlands.get(3));
    assert!(!speyside_or_highlands.get(4));
    assert!(!speyside_or_highlands.get(5));
    assert!(speyside_or_highlands.get(6));
    assert!(!speyside_or_highlands.get(7));
    assert!(!speyside_or_highlands.get(8));
    assert!(speyside_or_highlands.get(9));
    assert!(!speyside_or_highlands.get(10));
}

When we try to compile this, we get the following compiler error:

error[E0369]: no implementation for `simple_bitmap::SimpleBitmap | simple_bitmap::SimpleBitmap`
   --> src/simple_bitmap.rs:133:49
    |
133 |         let speyside_or_highlands = speyside_bm | highlands_bm;
    |                                     ----------- ^ ------------ simple_bitmap::SimpleBitmap
    |                                     |
    |                                     simple_bitmap::SimpleBitmap
    |
note: an implementation of `BitOr` might be missing for `simple_bitmap::SimpleBitmap`

The error message is very helpful and points us to the exact problem. As a solution, we could either merely implement BitOr for SimpleBitmap, or additionally add a supertrait for our Bitmap trait. Let’s do the latter.

// src/lib.rs

pub trait Bitmap: Sized + BitOr {
    fn set(&mut self, index: u32);

    fn get(&self, index: u32) -> bool;
}

I’ll skip the reason why we have to add the Sized trait as a supertrait. If you’re interested, you can read the reason in the language reference.

Let’s implement the BitOr trait for SimpleBitmap.

// src/simple_bitmap.rs

impl BitOr for SimpleBitmap {
    type Output = SimpleBitmap;

    fn bitor(self, rhs: Self) -> Self::Output {
        // allocate enough capacity for the larger of both vecs
        let mut union = Vec::with_capacity(cmp::max(self.bits.len(), rhs.bits.len()));

        let mut left_iter = self.bits.iter();
        let mut right_iter = rhs.bits.iter();

        // iterate over both iterators and perform the bitwise or operation as long as both iters yield u32s and
        // add the result to the union vector
        for (left, right) in iter::zip(&mut left_iter, &mut right_iter) {
            union.push(left | right);
        }

        // if there is u32s remaining in left, add the u32s to the union vector
        for left in left_iter {
            union.push(*left);
        }

        // if there is u32s remaining in right, add the u32s to the union vector
        for right in right_iter {
            union.push(*right);
        }

        SimpleBitmap { bits: union }
    }
}

And with this, our test passes:

cargo test it_builds_bit_unions
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.08s
     Running unittests src/lib.rs (target/debug/deps/bitmaps-40c0e78a209450ab)

running 1 test
test simple_bitmap::tests::it_builds_bit_unions ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 6 filtered out; finished in 0.00s

Summary

In this blog post, we have built a straightforward implementation of a bitmap, which allows us to set and get bits. It also provides the functionality to do unions between bitmaps. While our bitmap is functionally correct, it can be fairly inefficient in certain cases. Can you think of some already?

We’ll take a detailed look in the next blog post of this series, and implement a different version of a bitmap which mitigates some of the problems of our SimpleBitmap.