% Strings

- Strings are an important concept for any programmer to master. Rust's string
+ Strings are an important concept for any programmer to master. Rust’s string
handling system is a bit different from other languages, due to its systems
focus. Any time you have a data structure of variable size, things can get
- tricky, and strings are a re-sizable data structure. That being said, Rust's
+ tricky, and strings are a re-sizable data structure. That being said, Rust’s
strings also work differently than in some other systems languages, such as C.

- Let's dig into the details. A *string* is a sequence of Unicode scalar values
- encoded as a stream of UTF-8 bytes. All strings are guaranteed to be
- validly encoded UTF-8 sequences. Additionally, strings are not null-terminated
- and can contain null bytes.
+ Let’s dig into the details. A ‘string’ is a sequence of Unicode scalar values
+ encoded as a stream of UTF-8 bytes. All strings are guaranteed to be a valid
+ encoding of UTF-8 sequences. Additionally, unlike some systems languages,
+ strings are not null-terminated and can contain null bytes.

- Rust has two main types of strings: `&str` and `String`.
+ Rust has two main types of strings: `&str` and `String`. Let’s talk about
+ `&str` first. These are called ‘string slices’. String literals are of the type
+ `&'static str`:

- The first kind is a `&str`. These are called *string slices*. String literals
- are of the type `&str`:
-
- ```{rust}
- let string = "Hello there."; // string: &str
+ ```rust
+ let string = "Hello there."; // string: &'static str
```

- This string is statically allocated, meaning that it's saved inside our
+ This string is statically allocated, meaning that it’s saved inside our
compiled program, and exists for the entire duration it runs. The `string`
binding is a reference to this statically allocated string. String slices
have a fixed size, and cannot be mutated.

- A `String`, on the other hand, is a heap-allocated string. This string
- is growable, and is also guaranteed to be UTF-8. `String`s are
- commonly created by converting from a string slice using the
- `to_string` method.
+ A `String`, on the other hand, is a heap-allocated string. This string is
+ growable, and is also guaranteed to be UTF-8. `String`s are commonly created by
+ converting from a string slice using the `to_string` method.

- ```{rust}
+ ```rust
let mut s = "Hello".to_string(); // mut s: String
println!("{}", s);

@@ -54,8 +52,78 @@ fn main() {
Viewing a `String` as a `&str` is cheap, but converting the `&str` to a
`String` involves allocating memory. No reason to do that unless you have to!

- That's the basics of strings in Rust! They're probably a bit more complicated
- than you are used to, if you come from a scripting language, but when the
- low-level details matter, they really matter. Just remember that `String`s
- allocate memory and control their data, while `&str`s are a reference to
- another string, and you'll be all set.
+ ## Indexing
+
+ Because strings are valid UTF-8, strings do not support indexing:
+
+ ```rust,ignore
+ let s = "hello";
+
+ println!("The first letter of s is {}", s[0]); // ERROR!!!
+ ```
+
+ Usually, access to a vector with `[]` is very fast. But, because each character
+ in a UTF-8 encoded string can be multiple bytes, you have to walk over the
+ string to find the nᵗʰ letter of a string. This is a significantly more
+ expensive operation, and we don’t want to be misleading. Furthermore, ‘letter’
+ isn’t something defined in Unicode, exactly. We can choose to look at a string as
+ individual bytes, or as codepoints:
+
+ ```rust
+ let hachiko = "忠犬ハチ公";
+
+ for b in hachiko.as_bytes() {
+     print!("{}, ", b);
+ }
+
+ println!("");
+
+ for c in hachiko.chars() {
+     print!("{}, ", c);
+ }
+
+ println!("");
+ ```
+
+ This prints:
+
+ ```text
+ 229, 191, 160, 231, 138, 172, 227, 131, 143, 227, 131, 129, 229, 133, 172,
+ 忠, 犬, ハ, チ, 公,
+ ```
+
+ As you can see, there are more bytes than `char`s.
+
+ You can get something similar to an index like this:
+
+ ```rust
+ # let hachiko = "忠犬ハチ公";
+ let dog = hachiko.chars().nth(1); // kinda like hachiko[1]
+ ```
+
+ This emphasizes that we have to go through the whole list of `chars`.
+
+ ## Concatenation
+
+ If you have a `String`, you can concatenate a `&str` to the end of it:
+
+ ```rust
+ let hello = "Hello ".to_string();
+ let world = "world!";
+
+ let hello_world = hello + world;
+ ```
+
+ But if you have two `String`s, you need an `&`:
+
+ ```rust
+ let hello = "Hello ".to_string();
+ let world = "world!".to_string();
+
+ let hello_world = hello + &world;
+ ```
+
+ This is because `&String` can automatically coerce to a `&str`. This is a
+ feature called ‘[`Deref` coercions][dc]’.
+
+ [dc]: deref-coercions.html
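
The Indexing section added in this diff distinguishes bytes from `char`s and notes that finding the nth `char` means walking the string. A minimal sketch of that distinction follows. The helper names `nth_char` and `byte_offset_of` are hypothetical and do not appear in the chapter; they only wrap the standard `chars` and `char_indices` iterators.

```rust
/// Hypothetical helper: returns the n-th `char` of `s`, walking from the start.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

/// Hypothetical helper: byte offset at which the n-th `char` begins.
fn byte_offset_of(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(offset, _)| offset)
}

fn main() {
    let hachiko = "忠犬ハチ公";

    // `len` counts bytes, while `chars().count()` counts Unicode scalar values.
    println!("{} bytes, {} chars", hachiko.len(), hachiko.chars().count());

    // Linear-time lookup of the second `char`.
    println!("{:?}", nth_char(hachiko, 1));

    // Range slicing uses byte offsets, and each offset must fall on a
    // `char` boundary, so we look the offset up instead of guessing it.
    if let Some(start) = byte_offset_of(hachiko, 1) {
        println!("{}", &hachiko[start..]);
    }
}
```

Both helpers return `Option`, so asking for a position past the end yields `None` rather than a panic.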
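
The Concatenation section in this diff shows `String + &str` and the `&` needed when the right-hand side is a `String`. The sketch below illustrates one consequence the text leaves implicit: `+` takes the left operand by value, while the right operand is only borrowed. The helper names `concat` and `join_borrowed` are hypothetical, chosen for illustration only.

```rust
/// Hypothetical helper: joins with `+`, which consumes `left` and borrows `right`.
fn concat(left: String, right: &str) -> String {
    left + right
}

/// Hypothetical helper: joins with `format!`, which borrows both inputs.
fn join_borrowed(left: &str, right: &str) -> String {
    format!("{}{}", left, right)
}

fn main() {
    let hello = "Hello ".to_string();
    let world = "world!".to_string();

    // `&world` is a `&String`; it coerces to `&str` at the call site.
    let hello_world = concat(hello, &world);
    println!("{}", hello_world);

    // `hello` has been moved and can no longer be used here,
    // but `world` was only borrowed and is still available.
    println!("{}", world);

    // When both inputs must stay usable, borrowing both is one option.
    let again = join_borrowed("Hello ", &world);
    println!("{}", again);
}
```

Borrowing both sides costs a fresh allocation for the result, whereas `+` can reuse the left operand's buffer, so which form fits depends on whether the left `String` is still needed afterwards.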