Collections in Rust - Hash Maps

In the last blog, we learned about collections in rust and String collection and its usage and how it helps us in saving data of variable length unlike arrays which need to know the data length during compile time. Today, we will learn about Hash Maps and how they help us in saving data collections.

Hash Maps

The hash map is the final of our common collections. A mapping of keys of type K to values of type V is stored in a hash map of type HashMap<K, V>. It accomplishes this by storing these keys and values in memory using a hashing technique. This type of data structure is supported by numerous computer languages, but they frequently go by different names, such as hash, map, object, hash table, or associative array, to mention a few.

When you wish to look up data without using an index, as you can with vectors, but rather by using a key that can be of any kind, hash maps can be handy. For instance, you could use a hash map in a game to keep track of each team's score, with each key representing a team name and each value representing the team's score. You can find a team's score by providing its name.

In this section, we'll go over the fundamentals of hash maps' API, however the standard library's functions for HashMap<K, V> actually provide a tonne of extra goodies. As always, more details are available in the standard library documentation. The hash map is the final of our common collections. A mapping of keys of type K to values of type V is stored in the type HashMap<K, V>. It accomplishes this by storing these keys and values in memory using a hashing technique. This type of data structure is supported by numerous computer languages, but they frequently go by different names, such as hash, map, object, hash table, or associative array, to mention a few.

When you wish to look up data without using an index, as you can with vectors, but rather by using a key that can be of any kind, hash maps can be handy. For instance, you could use a hash map in a game to keep track of each team's score, with each key representing a team name and each value representing the team's score. You can find a team's score by providing its name.

In this section, we'll go over the fundamentals of hash maps' API, however the standard library's functions for HashMap<K, V> actually provide a tonne of extra goodies. As always, more details are available in the standard library documentation.

Creating a New Hash Map

With new, an empty hash map can be created, and insert can be used to add elements. We are tracking the results of two teams named Blue and Yellow. The Yellow squad will start with 50 points, while the Blue team will have 10 points:

#![allow(unused_variables)]
fn main() {
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
}

Keep in mind that we must first utilize the HashMap from the standard library's collections section. This is the least used of our three common collections, thus it is not one of the features that the prelude automatically brings into scope. Hash maps have less support from the standard library as well; for instance, there isn't a built-in macro to create them.

Hash maps store its data on the heap similarly to vectors. This hash map's values are of type i32 and its keys are of type String. Similar to vectors, hash maps are homogeneous in that all the keys and values must be of the same type.

Utilizing the collect method on a vector of tuples, where each tuple consists of a key and its value, is another approach of creating a hash map. Data can be collected using the collect method into a variety of collection types, including HashMap. The zip technique can be used to construct a vector of tuples where "Blue" is paired with 10, for instance, if the team names and initial scores were in two separate vectors. Then, we can convert that vector of tuples into a HashMap using the collect method.

#![allow(unused_variables)]
fn main() {
use std::collections::HashMap;
let teams  = vec![String::from("Blue"), String::from("Yellow")];
let initial_scores = vec![10, 50];
let scores: HashMap<_, _> = teams.iter().zip(initial_scores.iter()).collect();
}

Due to the fact that it might be collected into a variety of different data structures, Rust requires the type annotation HashMap<_, _> in this case. Rust can determine the kinds that the hash map includes based on the types of the data in the vectors, therefore we use underscores for the type arguments for the key and value types instead.

Hash Maps and Ownership

Values are copied into the hash map for types like i32 that implement the Copy trait. The values will be relocated, and the hash map will become the owner of any owned values, such as Strings.

use std::collections::HashMap;
let field_name = String::from("Favorite color");
let field_value = String::from("Blue");
let mut map = HashMap::new();
map.insert(field_name, field_value);
// field_name and field_value are invalid at this point, try using them and
// see what compiler error you get!

The variables field_name and field_value, which were added to the hash map by the call to insert, cannot be used after that point.

Values won't be added to the hash map if references to such values are inserted there instead. The references must point to values that are at least as current as the hash map.

Accessing Values in a Hash Map

By giving the get method the hash map's key, we may obtain a value from it.

use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
let team_name = String::from("Blue");
let score = scores.get(&team_name);

Here, the Blue team's value for the score will determine the outcome, which will be some (&10). The reason the result is wrapped in Some is that get returns an Option<&V>; if the hash map doesn't include a value for that key, get will return None.

Similar to how we iterate over vectors, we can use a for loop to traverse over each key/value combination in a hash map:

use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
for (key, value) in &scores {
    println!("{}: {}", key, value);
}

This code will print each pair in an arbitrary order:

Yellow: 50
Blue: 10

Updating a Hash Map

Although the total number of keys and values is expandable, only one value can ever be assigned to a given key at once. We must choose how to handle the situation when a key already has a value assigned when we want to alter the data in a hash map. The old value could be fully disregarded in favour of the new value. If the key already has a value, we may maintain it and disregard the new value and just add the new value in that case. Alternately, we might blend the previous and current values. Let's examine how to carry out each of these.

Overwriting a Value

A hash map will replace the value associated with a key if we enter a key and a value before replacing the key with a new value. The hash map will only contain one key/value combination even if code calls insert twice, since we are inserting the value for the Blue team's key both times:

use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Blue"), 25);
println!("{:?}", scores);

This code will print {"Blue": 25}. The original value of 10 has been overwritten.

Only Insert If the Key Has No Value

It's usual practise to determine whether a specific key has a value and, if not, to add one. The key we wish to check is a parameter that is taken by a particular API for hash maps called entry. The entry function returns an enum named Entry, which represents a value that may or may not exist. Say that we want to determine whether the key for the Yellow team is related with a value. If not, we wish to insert the value 50 for the Red team and the Blue team, respectively.

use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.entry(String::from("Yellow")).or_insert(50);
scores.entry(String::from("Blue")).or_insert(50);
println!("{:?}", scores);

If the key for the relevant Entry exists, the or insert method on Entry is defined to return the value for that key; otherwise, it inserts the parameter as the new value for that key and returns the changed Entry. This method plays better with the borrow checker and is considerably cleaner than developing the logic from scratch.

Running the code will print "Yellow": 50, "Blue": 10 in the output. Because the Yellow team doesn't already have a value, the first call to entry will insert the key for the Yellow team with the value 50. The Blue team already knows the number 10, thus the second call for entries won't alter the hash map.

Updating a Value Based on the Old Value

Looking up a key's value and updating it depending on the old value is another frequent application for hash maps. Below example displays code that counts the number of times each word appears in a given passage of text. To keep track of how many times a word has appeared, we create a hash map with the words as keys and increment the value. When encountering a word for the first time, we first insert the value 0:

use std::collections::HashMap;
let text = "hello world wonderful world";
let mut map = HashMap::new();
for word in text.split_whitespace() {
    let count = map.entry(word).or_insert(0);
    *count += 1;
}
println!("{:?}", map);

The output of this code is {"world": 2, "hello": 1, "wonderful": 1}. The value for this key is really returned by the or insert method as a mutable reference (&mut V). Here, that changeable reference is stored in the count variable; hence, before assigning to that value, count must first be dereferenced using the asterisk (*). All of these modifications are secure and permitted by the borrowing rules because the mutable reference exits the for loop at that point.

Hashing Function

HashMap by default employs a cryptographically safe hashing mechanism that can defend against DoS attacks. Although this is not the quickest hashing algorithm on the market, the trade-off for increased security is worthwhile. You can change to a different function by specifying a different hasher if you profile your code and discover that the built-in hash function is too sluggish for your needs. A type that implements the BuildHasher trait is said to be a hasher. Crates.io has libraries contributed by other Rust users that provide hashers implementing a number of popular hashing algorithms, so you don't necessarily need to construct your own hasher from scratch.

Summary

In programmes where you need to store, access, and modify data, vectors, strings, and hash maps will offer a significant amount of the functionality you require. Here are some exercises that you should be able to complete now:

  • Return the mean (average), median (the value that appears most frequently when the list is sorted), and mode (the value that most frequently appears) of the list given an array of integers.
  • String to pig latin conversion First becomes "irst-fay" by moving the first consonant of each syllable to the end of the word and adding "ay." Words that begin with a vowel instead have the suffix "hay" appended (for example, "apple" becomes "apple-hay"). Remember the specifics of UTF-8 encoding!
  • Create a text interface that allows a user to add employee names to a department in a business using a hash map and vectors. "Add Sally to Engineering," or "Add Amir to Sales," as examples. The user can then access an alphabetically sorted list of all the employees in a department or the entire company.

For these exercises, the methods that vectors, strings, and hash maps have are described in the standard library API documentation.

I sincerely hope that the majority of you find the approach covered here to be helpful. Thank you for reading, and please feel free to leave any comments or questions in the comments section below.

Post a Comment

0 Comments