Rust String vs &str

Why are there several kinds of strings in Rust? Why not just one String type?

These questions come up often online, and newcomers to Rust, particularly those used to dynamic programming languages, can find the String and &str types confusing. So let's discuss the two string types and clear up some of that confusion.

It will make sense in the end, because Rust really does need both types. They are quite different and have different uses.

String

String owns the memory that holds its contents.

Use String to return strings produced inside of functions and (often) to store strings in structs and enums.

If you have a String, passing a reference to it is enough to convert it to &str.
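As a minimal sketch of that conversion (the helper name shout is made up for illustration and is not part of the original post):

```rust
// Hypothetical helper: accepts any string slice and returns a new, owned String.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    let owned: String = String::from("hello");

    // `&owned` is a &String; Rust converts it to &str at the call site.
    let loud = shout(&owned);

    // `as_str()` performs the same conversion explicitly.
    let slice: &str = owned.as_str();

    println!("{} {}", loud, slice);
}
```

Either form only borrows the String, so owned remains usable afterwards.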

&str

&str does not own any memory; it is simply a reference to a string (or a slice of one) that lives somewhere else.

Prefer &str for function parameters: it accepts string slices and makes it clear that the function won't change the string.

You can use to_owned() or to_string() to copy an existing &str into a new String (they are effectively the same; use whichever makes your code clearer to read and more consistent). Both create a new String and copy the memory.
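A small sketch of copying a slice into a new String follows. The helper with_suffix is a hypothetical name used only for this example:

```rust
// Hypothetical helper: copies the slice into a new String and extends the copy.
fn with_suffix(slice: &str, suffix: &str) -> String {
    let mut owned = slice.to_owned(); // allocates and copies the bytes
    owned.push_str(suffix);           // only the copy is modified
    owned
}

fn main() {
    let slice: &str = "hello";

    // Three equivalent ways to get an owned copy of a &str.
    let a: String = slice.to_owned();
    let b: String = slice.to_string();
    let c: String = String::from(slice);
    assert_eq!(a, b);
    assert_eq!(b, c);

    // The original slice is untouched by changes to the copy.
    let extended = with_suffix(slice, ", world");
    println!("{} / {}", slice, extended);
}
```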

Memory and ownership

We must first briefly discuss how Rust handles memory. Although I'll make an effort to keep this short, it will be crucial when we explore the distinction between String and &str later on.

You may already be aware that Rust has no garbage collector. Instead, the compiler inserts the code that allocates and deallocates memory at specific points in the program. To deallocate safely, the compiler needs a way to recognize when something is no longer in use.

Rust calls this mechanism "ownership" (and "borrowing"). Consider this code:

fn main() {
    let owned_string = get_string();
    print_string(&owned_string);
}
fn get_string() -> String {
    let s = String::from("Hello, World!");
    // In Rust we return s by omitting the semicolon
    s
}
fn print_string(my_string: &str) {
    println!("{}", my_string);
}

In get_string(), the memory for a String is allocated on the heap and assigned to the variable s. We can now say that s "owns" that memory. The Rust compiler keeps track of the scope of s so that it can determine when to deallocate the memory. That would happen when s goes out of scope, unless s is "moved" first (which you can think of as a transfer of ownership). In the example above, s is in fact moved, into the variable owned_string in main().

When get_string() returns, the heap memory is neither copied nor deallocated. Instead, ownership of that memory passes to the caller (in this case, to the variable owned_string).

To determine when to deallocate, the Rust compiler now keeps track of the new owner, owned_string. In this example, the memory is released at the end of the main() function.

A String never has more than one owner. The memory can be temporarily "borrowed" by others (see the line print_string(&owned_string);), but it is only deallocated when the owner goes out of scope (unless it is moved, in which case it is deallocated when the new owner goes out of scope, and so on).
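The difference between borrowing and moving can be sketched as follows. The function names consume and inspect are assumptions introduced for this illustration:

```rust
// Takes ownership: the String is dropped when this function returns.
fn consume(s: String) -> usize {
    s.len()
}

// Borrows: the caller keeps ownership.
fn inspect(s: &str) -> usize {
    s.len()
}

fn main() {
    let first = String::from("Hello, World!");

    // Borrow: `first` is still the owner and stays usable.
    let borrowed_len = inspect(&first);

    // Move: `second` is now the single owner of the heap memory.
    let second = first;
    // println!("{}", first); // would not compile: `first` was moved

    // Move into a function: the memory is freed when `consume` returns.
    let consumed_len = consume(second);

    println!("{} {}", borrowed_len, consumed_len);
}
```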

Borrowing and references

So now let's look at the other function:

fn print_string(my_string: &str) {
    println!("{}", my_string);
}

The & sign makes this a reference to a str (a string slice). Because you hardly ever use str without &, it's simpler to think of &str as a single thing: "a reference to a string slice."

Think about the String from earlier, which owns its allocated memory. What if we wanted to give someone a reference to that string (or a portion of it) without letting them change it? In Rust, & denotes what is known as an "immutable borrow". It is similar to a "reference" in other languages, but Rust goes a step further by letting you state whether the contents may be modified through it.

Since the whole string was passed in anyway in the example above, we could have used &String instead of &str in the function signature, and it would still have worked. But that would prevent other callers from passing in a substring (at least without copying its memory into another String). A string slice &str is significantly more useful here.
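To illustrate why the &str signature is more flexible, here is a sketch that reuses print_string from the post. The helper first_word is a hypothetical addition:

```rust
fn print_string(my_string: &str) {
    println!("{}", my_string);
}

// Hypothetical helper: returns a slice of the input, without copying.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let owned_string = String::from("Hello, World!");

    print_string(&owned_string);             // the whole String
    // Ranges are byte offsets and must fall on character boundaries.
    print_string(&owned_string[0..5]);       // a substring, no copy
    print_string(first_word(&owned_string)); // a slice returned by a function
    print_string("a string literal");        // literals are already &str
}
```

With a &String parameter, only the first of these four calls would compile.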

&str is essentially a pointer plus a length, pointing directly into the memory of the string. You'll see the line print_string(&owned_string); in the original code. The function asked for a &str, but we passed a reference to a String. This works because Rust can automatically convert a &String into a &str (a "deref coercion"); the conversion simply produces a pointer to the same memory together with its length. Additionally, &str is an immutable reference, so the memory cannot be changed through it.
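The "pointer plus length" picture can be checked with a short sketch. The helper shares_buffer is a made-up name, and the size check reflects how current compilers lay out a &str rather than a documented guarantee:

```rust
use std::mem::size_of;

// Hypothetical helper: true if the slice starts at the String's own buffer.
fn shares_buffer(owner: &String, slice: &str) -> bool {
    owner.as_ptr() == slice.as_ptr()
}

fn main() {
    let owned_string = String::from("Hello, World!");

    // A &String is converted to &str automatically here.
    let slice: &str = &owned_string;

    // Same address: nothing was copied.
    println!("same buffer: {}", shares_buffer(&owned_string, slice));

    // The slice carries its own length in bytes.
    println!("length: {}", slice.len());

    // Assumption: on current compilers a &str is two machine words (pointer + length).
    println!("two words: {}", size_of::<&str>() == 2 * size_of::<usize>());
}
```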

In fact, Rust guarantees that nobody (not even the owner) can modify the memory it points to for as long as the & reference is still in use, which makes the &str safe to use (even across threads!). Rust also ensures that the owned String (to which the &str points) cannot go out of scope while the reference is still alive (otherwise the reference would be a dangling pointer). These are powerful guarantees of Rust's borrow checker, and they prevent all kinds of nasty bugs you can run into in other languages, where you create a string in one place, pass a reference to it somewhere else, and the code holding the reference unintentionally changes the original.
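A minimal sketch of the first guarantee follows. The function build_greeting is a hypothetical name, and the rejected line is left commented out so that the example compiles:

```rust
// Hypothetical helper, shown only to illustrate the borrow rules.
fn build_greeting() -> String {
    let mut owned_string = String::from("Hello");

    // Immutable borrow of the String's contents.
    let view: &str = &owned_string;

    // owned_string.push_str("!"); // rejected if uncommented: `view` is still used below

    let seen = view.len(); // last use of `view`
    assert_eq!(seen, 5);

    // `view` is no longer used, so the owner may modify the String again.
    owned_string.push_str(", World!");
    owned_string
}

fn main() {
    println!("{}", build_greeting());
}
```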

Difference between String and &str

String holds a string in memory and owns the memory for it.

&str is just a reference to another string, but it doesn't own the memory for it.

When to use either one

In structs and enums, use String wherever you want the struct or enum to own the contents. Also use String when returning a string that was allocated inside the function (Rust will not let you return a reference in that case, because the memory would be deallocated at the end of the function and the returned reference would be a dangling pointer).
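Both cases might look like the sketch below. The User type and make_greeting function are assumptions made up for this example:

```rust
// Hypothetical struct: it owns its name, so the field is a String.
struct User {
    name: String,
}

impl User {
    // Accepts a slice and copies it into owned storage.
    fn new(name: &str) -> User {
        User { name: name.to_owned() }
    }

    // Hands out a borrowed view of the owned data.
    fn name(&self) -> &str {
        &self.name
    }
}

// The string is created inside the function, so it is returned as a String.
fn make_greeting(user: &User) -> String {
    format!("Hello, {}!", user.name())
}

fn main() {
    let user = User::new("Ferris");
    println!("{}", make_greeting(&user));
}
```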

For function parameters, use &str unless you explicitly want to move a String into the function and give up ownership of it (this tends to be rare).
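As a sketch of that choice, count_vowels only reads its argument, while push_name keeps the string and therefore takes ownership. Both function names are hypothetical:

```rust
// Borrowing is enough: the function only reads the text.
fn count_vowels(text: &str) -> usize {
    text.chars().filter(|c| "aeiouAEIOU".contains(*c)).count()
}

// Taking ownership makes sense here: the function stores the String.
fn push_name(names: &mut Vec<String>, name: String) {
    names.push(name);
}

fn main() {
    let greeting = String::from("Hello, World!");
    println!("{}", count_vowels(&greeting)); // `greeting` is still usable afterwards

    let mut names: Vec<String> = Vec::new();
    push_name(&mut names, greeting); // `greeting` is moved into the vector
    println!("{}", names.len());
}
```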

Conclusion

Use String when you need to own the memory, such as when you need to return a string that you constructed in a function.

Use the &str type when you need an immutable reference to memory that belongs to another String variable (or to a string literal in your code).

I sincerely hope that the majority of you find the approach covered here to be helpful. Thank you for reading, and please feel free to leave any comments or questions in the comments section below.
