Collections in Rust - Vectors

The most popular data structures used in general purpose programming are efficiently implemented in the standard collection library of the Rust computer language. Two libraries should be able to communicate with each other using the standard implementations without undergoing a considerable amount of data conversion.

I'll get this out of the way: just use a hash table or a Vec. The majority of use cases for general data processing and storage are covered by these two categories. They are incredibly skilled at what they do. All of the other collections in the standard library have particular uses for which they are the best option, but these uses are comparably borderline specialised. Vec and HashMap are certainly a decent enough starting point even when they aren't exactly ideal.

Rust's collections can be grouped into four major categories:

  • Sequences: Vec, VecDeque, LinkedList
  • Maps: HashMap, BTreeMap
  • Sets: HashSet, BTreeSet
  • Misc: BinaryHeap

When Should You Use Which Collection?

These are fairly brief and high-level breakdowns of when each collection should be taken into account. On their own documentation sites, individual collections' strengths and drawbacks are covered in great detail.

Use a Vec when:

  • You don't care about any characteristics of the actual data being stored; you just want to gather things up to be processed or sent somewhere else later.
  • You only intend to add to (or very close to) the end of a sequence of components in a specific order.
  • Furthermore, you desire a stack.
  • A resizable array is what you need.
  • A heap-allocated array is what you need.

Use a VecDeque when:

  • A Vec that permits effective insertion at both ends of the sequence is what you need.
  • You desire a Queue.
  • You desire a double-ended queue(deque).

Use a LinkedList when:

  • You don't want to accept amortization and need a Vec or VecDeque of indeterminate size.
  • Lists should be efficiently split and appended.
  • You are positive that a doubly linked list is what you really, really want.

Use a HashMap when:

  • You wish to link any number of keys to any number of values.
  • You desire a cache.
  • You only need a map and nothing else.

Use a BTreeMap when:

  • A key-sorted map is what you need.
  • You need to be able to quickly access a variety of entries.
  • You want to know which key-value pair is the smallest or largest.
  • Finding the biggest or smallest key that is smaller or bigger than something is your goal.

Use the Set variant of any of these Maps when:

  • Just keep track of the keys you've seen.
  • Your keys don't have any significant worth.
  • You simply desire a set.

Use a BinaryHeap when:

  • Even though there are many elements you want to store, you only ever want to process the "biggest" or "most important" one at once.
  • You desire a priority queue.

Vectors

We'll start by examining the VecT> collection type, usually referred to as a vector. With vectors, we may store several values in a single data structure that arranges them all side by side in memory. Only the same type of value can be stored in a vector. They can in handy when you have a list of things, such the prices of the products in your shopping basket or the text lines in a file.

Creating a New Vector

As demonstrated below, we can use the Vec::new function to generate a fresh, empty vector.

let v: Vec<i32> = Vec::new();

There is a type annotation here, as you can see. Rust has no idea what kind of elements we want to store because we aren't adding any values to this vector. This is a crucial idea. Generics are used to implement vectors. Be aware that the standard library's Vec<T> type can hold any type, and that the type that a particular vector holds is specified within angle brackets.

Rust frequently infers the kind of value we wish to store once we insert values, therefore you hardly ever need to apply this type annotation in more realistic code. Since creating a Vec<T> with initial values is more frequent, Rust offers the vec! macro for convenience. The macro will build a new vector with the values we supply it with. Here, a new Vec<i32> is created and its values are 1, 2, and 3:

let v = vec![1, 2, 3];

Rust may infer that the type of v is Vec<i32> because we've provided initial values of type i32, therefore the type annotation isn't required. After that, we'll examine vector modification.

Updating a Vector

The push method can be used to create a vector and then add elements to it:

let mut v = Vec::new();
v.push(5);
v.push(6);
v.push(7);
v.push(8);

As with any variable, as discussed in Chapter 3, if we want to be able to change its value, we need to make it mutable using the mut keyword. The numbers we place inside are all of type i32, and Rust infers this from the data, so we don’t need the Vec<i32> annotation.

Dropping a Vector Drops Its Elements

A vector will be freed when it exits scope, just like any other struct:

{
    let v = vec![1, 2, 3, 4];
    // do stuff with v
} // <- v goes out of scope and is freed here

The integers it contains will be cleaned up when the vector is dropped because all of its contents will also be dropped. This may seem like a simple idea, but it can become a little more involved once references to the vector's elements are introduced. Next, let's deal with that!

Reading Elements of Vectors

Knowing how to read a vector's contents is a good next step once you have mastered the creation, updating, and destruction of vectors. A value kept in a vector can be referred to in two different ways. For more clarity, we've annotated the types of the values that these functions return in the examples.

Below are some examples of how to get a value in a vector using the get method or indexing syntax:

let v = vec![1, 2, 3, 4, 5];
let third: &i32 = &v[2];
let third: Option<&i32> = v.get(2);

Two points should be noted. First, as vectors are indexed by number, starting at zero, we utilise the index value of 2 to obtain the third element. Second, there are two distinct ways to obtain the third element: either by passing the index as an input to the get method, which returns an Option<&T>, or by using & and [], which yields a reference.

You can pick how the programme responds when you try to use an index value for which the vector lacks an element because Rust provides two ways to access an element. Let's look at what happens if a programme tries to access an element at index 100 in a vector that has five elements as an example:

let v = vec![1, 2, 3, 4, 5];
let does_not_exist = &v[100];
let does_not_exist = v.get(100);

The first [] method in this code will throw an error when you run it because it refers to an element that doesn't exist. When you want your programme to treat attempts to access an element past the end of the vector as a fatal error that causes the programme to crash, this way works well.

When an index that is outside the vector is supplied to the get method, it returns None without throwing a fit. If accessing an element beyond the vector's range infrequently occurs under typical conditions, you would use this method. Your code will then include logic to handle having either Some(&element) or None. The index, for instance, can originate from a person entering a number. You might inform the user of the amount of items in the current vector and give them another opportunity to enter a correct value if they unintentionally enter a value that is too large and the application receives a None value. This would be more user-friendly than having a typo cause the software to crash!

Invalid References

The borrow checker ensures that any references to the contents of the vector that the programme has that are still valid by enforcing the ownership and borrowing restrictions. Recall the prohibition on having references to both mutable and immutable objects in the same scope. This restriction when we attempt to add an element to the end of a vector while holding an immutable reference to the first element.

let mut v = vec![1, 2, 3, 4, 5];
let first = &v[0];
v.push(6);

Compiling this code will result in this error:

error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
 -->
  |
4 |     let first = &v[0];
  |                  - immutable borrow occurs here
5 |
6 |     v.push(6);
  |     ^ mutable borrow occurs here
7 |
8 | }
  | - immutable borrow ends here

Why should a reference to the first element be concerned with what alters at the vector's end? This issue occurs because of how vectors operate. If there isn't enough space to arrange all the elements next to one another where the vector was, adding a new element to the end of the vector may necessitate allocating new memory and moving the existing elements to the new space. The first element's reference would thus be referring to deallocated memory in that scenario. Programs are prevented from coming to that situation by the borrowing rules.

Iterating Over the Values in a Vector

Instead of using indexes to access each element individually, we can iterate through all of the elements in a vector if we wish to access each one in turn. In a vector of i32 values, how to obtain immutable references to each element and print them out using a for loop:

let v = vec![100, 32, 57];
for i in &v {
    println!("{}", i);
}

A mutable vector's elements can all be changed by iterating over mutable references to each one of them. 50 will be added to each element by the for loop:

let mut v = vec![100, 32, 57];
for i in &mut v {
    *i += 50;
}

Before using the += operator, we must use the dereference operator (*) to access the value in I in order to change the value that the mutable reference refers to.

Using an Enum to Store Multiple Types

Vectors can only store values of the same type, as we said at the start of this chapter. There are situations where it would be useful to store a list of many types of things, but this can be problematic. Since an enum's variations are defined under the same enum type, we can define and utilise an enum whenever we need to hold elements of a different type in a vector.

Let's take an example where we need to retrieve values from a row in a spreadsheet that has integers, floating-point numbers, and strings in some of the columns. The multiple value types can be represented by distinct enum variations, all of which will be regarded as belonging to the same type: the enum. After that, we can create a vector that contains that enum and ultimately contains various types.

enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}
let row = vec![
    SpreadsheetCell::Int(3),
    SpreadsheetCell::Text(String::from("blue")),
    SpreadsheetCell::Float(10.12),
];

Rust needs to know the types that will be in the vector at compile time so that it can calculate how much memory will be required on the heap to hold each element. The fact that we can explicitly state which kinds are permitted in this vector is a further benefit. There would be a probability that one or more of the types might result in errors when operations were carried out on the elements of the vector if Rust permitted a vector to hold any type. By combining an enum with a match expression, Rust will make sure that we always handle every scenario at compilation time.

In this blog, we learned about vectors in rust and how to use them, next we will learn about Hash Maps.

I sincerely hope that the majority of you find the approach covered here to be helpful. Thank you for reading, and please feel free to leave any comments or questions in the comments section below.

Post a Comment

0 Comments