Commit 9a5d842

Add documentation on struct Data performance refinements
1 parent 7426830 commit 9a5d842

15 files changed

Lines changed: 249 additions & 0 deletions
@@ -0,0 +1,249 @@
# Performance Refinement of Data

* Author(s): Philippe Hausler <phausler@apple.com>

## Introduction

In Swift 3, the Foundation team introduced `Data`, a new struct type that represents `NSData` as exposed to Swift.

`Data` allows developers to interact with binary data with value semantics, which is often more appropriate than using pointers like `UnsafeMutablePointer`. An encapsulating type that abstracts the common operations is often advantageous for tasks like parsing, where types like `String` may not be appropriate or may have hidden performance "gotchas".

`Data` can easily be a performance-critical type for many tasks. It is an appropriate common currency for passing around a safe, managed buffer of bytes that interoperates well with existing Cocoa APIs, so it should be tuned for performance as much as possible.

## Motivation

There are several outstanding performance improvements that can be made to `Data`. Issues regarding `Data`'s performance have been raised with the Foundation team both publicly and privately, and we would like to address them.

`Data` should be as fast as possible. Currently, most of the backing of `Data` is implemented in Foundation. While that implementation is quite fast for reallocations and other operations, it means that calls cross between Swift and Objective-C on every access, even for something as simple as `count`.

No inlining can be performed across this Swift–Objective-C boundary. Even when we have full control over both the backing implementation and the caller, what would normally be a single offset load instruction becomes many instructions just to perform an `objc_msgSend` (not to mention ARC operations). Even though these calls are heavily optimized, they will never be as fast as a single instruction.

## Proposed solution

To make `Data` as fast as possible, its implementation needs to be inlined, since inlining is one of the best performance optimizations Swift can offer. Doing so requires a rethink of how `Data` and `NSData` interact: the `Data` struct will need to adopt certain attributes that allow it to be inlined into call sites. This not only reduces function dispatch overhead but also lets optimization passes such as cold-path and branch elimination do the best possible thing for each call site.

`Data` will adopt the annotation `@inline(__always)` in key locations and use a non-Objective-C backing class to store the pointer and length (and other internal ivars). That backing object reduces the number of capture points and avoids the extra retains and releases that were added for mutation detection.

Instead of using `_MutablePairBoxing` (which uses closures to map invocations to references or to apply mutations with copy-on-write semantics), the new backing implementation can provide copy-on-write semantics without any `Unmanaged` "dancing". That "dancing" complicates the code, which can be made considerably simpler. Furthermore, avoiding this "dance" can reduce the calls to retain and release to zero when applying mutations to uniquely referenced storage, as well as when mapping non-mutating invocations to the backing storage.
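
The copy-on-write pattern described here can be illustrated with a minimal sketch. This is not Foundation's implementation: `ByteStorage` and `ByteBuffer` are hypothetical names, and the sketch only assumes a final Swift class that stores a pointer and a length, with the standard library's `isKnownUniquelyReferenced` used to detect shared storage.

```swift
// Hypothetical non-Objective-C backing class holding a pointer and a length.
final class ByteStorage {
    let bytes: UnsafeMutablePointer<UInt8>
    let length: Int

    init(_ source: [UInt8]) {
        length = source.count
        bytes = UnsafeMutablePointer<UInt8>.allocate(capacity: max(length, 1))
        bytes.initialize(from: source, count: length)
    }

    init(copying other: ByteStorage) {
        length = other.length
        bytes = UnsafeMutablePointer<UInt8>.allocate(capacity: max(length, 1))
        bytes.initialize(from: other.bytes, count: length)
    }

    deinit { bytes.deallocate() }
}

// Hypothetical value type with copy-on-write semantics over ByteStorage.
struct ByteBuffer {
    private var storage: ByteStorage

    init(_ bytes: [UInt8]) { storage = ByteStorage(bytes) }

    // Non-mutating reads map directly onto the backing storage.
    var count: Int { return storage.length }

    subscript(index: Int) -> UInt8 {
        get {
            precondition(index >= 0 && index < storage.length, "index out of range")
            return storage.bytes[index]
        }
        set {
            precondition(index >= 0 && index < storage.length, "index out of range")
            // Copy only if another value still shares this storage.
            if !isKnownUniquelyReferenced(&storage) {
                storage = ByteStorage(copying: storage)
            }
            storage.bytes[index] = newValue
        }
    }
}

var a = ByteBuffer([1, 2, 3])
var b = a      // shares the same storage; no bytes are copied
b[0] = 9       // storage is shared, so `b` copies before writing
a[1] = 7       // `a` is uniquely referenced again and is written in place
```

In the uniquely referenced case the setter performs no copy and needs no closure or `Unmanaged` bookkeeping, which is the simplification the paragraph above is aiming for.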

Foundation should still support subclasses of the reference type `NSData` when wrapping that reference type in a struct. This effectively means there are five types of backing for `Data`: Swift-implemented, immutable `NSData`, mutable `NSMutableData`, custom subclasses of `NSData`, and custom subclasses of `NSMutableData`. These specific cases are delineated to offer the most efficient inline cases possible.

Foundation can enforce a "no dynamic dispatch needed" contract with itself for the standard class cluster members of `NSData` and `NSMutableData`. For those cases, `bytes` and `length` do not need a dynamic dispatch on every access, and their values can be cached until the data is mutated or disposed of. When a custom subclass is used, all bets are off and every access requires a dynamic call out.

In short, fetching the `count` of a `Data` can be optimized to a single branch and a load from an offset, and the same optimization can be applied to many other methods on `Data`.
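
As a rough sketch of how the five backing kinds could drive a fast path for `count`, consider the following. All names here (`DynamicByteSource`, `BackingKind`, `Backing`, `ResizableSource`) are hypothetical, and a plain protocol stands in for a custom `NSData` subclass so that the example needs no Objective-C runtime.

```swift
// Stand-in for an object whose length must be asked for dynamically,
// such as a custom NSData subclass (hypothetical protocol).
protocol DynamicByteSource: AnyObject {
    var length: Int { get }
}

// The five kinds of backing listed above (case names are illustrative).
enum BackingKind {
    case swift
    case immutable
    case mutable
    case customReference(DynamicByteSource)
    case customMutableReference(DynamicByteSource)
}

final class Backing {
    let kind: BackingKind
    var cachedLength: Int

    init(kind: BackingKind, cachedLength: Int) {
        self.kind = kind
        self.cachedLength = cachedLength
    }

    var count: Int {
        @inline(__always)
        get {
            switch kind {
            case .swift, .immutable, .mutable:
                // Known implementations: the cached value is trusted.
                return cachedLength
            case .customReference(let source), .customMutableReference(let source):
                // Unknown subclasses: ask the object on every access.
                return source.length
            }
        }
    }
}

// A custom source whose length can change behind the wrapper's back.
final class ResizableSource: DynamicByteSource {
    var length: Int
    init(length: Int) { self.length = length }
}

let fast = Backing(kind: .swift, cachedLength: 16)
let source = ResizableSource(length: 4)
let slow = Backing(kind: .customReference(source), cachedLength: 0)
source.length = 8   // only the dynamic path observes this change
```

The custom cases cannot use the cached value because nothing prevents a subclass from changing its answer between calls, which is why the text treats them separately.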

## Bridging to and from Objective-C

Many `Data` values originate from SDKs written in Objective-C. For many other types, such as `Array`, `Set`, `Dictionary`, or `String`, the objects returned are not very large: arrays may hold a handful of objects, strings may be only a few hundred characters long, and so on. In these cases it makes sense to "eagerly" bridge those reference types into a more inlinable version (there are exceptions, but this is generally the case).

`Data` does not follow this rule. It often comes from files on disk (which can be exceedingly large) or from network downloads. These cases would definitely suffer from an "eager" O(n) bridge, due not only to the duplicated memory allocation needed to hold both backing stores but also to the byte copy from the reference type to the value type. `Data` should be fast no matter where it came from, unless its dynamic dispatch requirements are truly unknown.

To build a `Data` that is amenable to inline optimizations, the bytes pointer and length need to be cached for the lifetime of the object. When `as` is used to bridge a custom reference to `Data`, dynamic dispatch must occur on every call to `count` and every time the bytes are touched. If, however, the `Data` is known to come from a source whose dynamic dispatch expectations we control, that dynamic dispatch can be elided, and behavior can be preserved by mimicking the Objective-C implementation in Swift.

Bridging in the other direction also offers some distinct performance optimizations.

When the lifespan of the call out to Objective-C is well known, a Swift-constructed `Data` can easily pass an `NSData` that wraps the backing buffer without copying it. It is the responsibility of the Objective-C API either not to retain the object past the scope of the call or to copy it when a long-lasting reference is needed. Any Objective-C method or function that takes an `NSData` and simply retains it or unsafely stores it past the call is likely incorrect, and has bugs no matter which language it was invoked from. When a `Data` created in Swift is bridged, only the wrapper `NSData` needs to be allocated; no O(n) copy needs to occur (unless the callee holds a reference, as stated above).
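
The no-copy wrapping described above can be approximated with the existing `NSData(bytesNoCopy:length:freeWhenDone:)` initializer. The sketch below is only an illustration of the idea, not the bridging code itself: `checksum(of:)` is a hypothetical stand-in for an Objective-C API that reads the data only for the duration of the call, and `checksumWithoutCopy(_:)` is likewise an invented helper.

```swift
import Foundation

// Hypothetical callee: reads the bytes during the call and keeps no reference.
func checksum(of data: NSData) -> Int {
    let bytes = data.bytes.assumingMemoryBound(to: UInt8.self)
    var sum = 0
    for i in 0..<data.length {
        sum += Int(bytes[i])
    }
    return sum
}

// Wraps a Swift-owned buffer in an NSData without copying its contents.
func checksumWithoutCopy(_ values: [UInt8]) -> Int {
    let length = values.count
    // Swift-owned buffer standing in for the backing store of a `Data`.
    let buffer = UnsafeMutableRawPointer.allocate(byteCount: max(length, 1), alignment: 1)
    // The buffer is released only after the wrapper is no longer in use.
    defer { buffer.deallocate() }
    buffer.copyMemory(from: values, byteCount: length)

    // Only the wrapper object is allocated here; `freeWhenDone: false`
    // leaves ownership of the buffer with the caller.
    let wrapper = NSData(bytesNoCopy: buffer, length: length, freeWhenDone: false)
    return checksum(of: wrapper)
}

let sum = checksumWithoutCopy([0xDE, 0xAD, 0xBE, 0xEF])
```

This arrangement is only safe because the callee does not keep the `NSData` beyond the call; a callee that needs the bytes later would have to copy them, as the paragraph above requires.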

The final bridging case is when a `Data` is obtained from Objective-C and then passed directly back to Objective-C. The compiler can potentially apply "peephole" optimizations to direct call-out cases such as `returnsAData() as NSData`, but these are only really enforceable in a limited scope (sometimes just the same line of code). Because the backing store can hold a reference to the reference type, the bridge method can pass that same reference back to Objective-C when the data has not been mutated. For mutated versions, a copy of the mutated version can be passed along instead (taking advantage of any optimizations that dynamic dispatch affords for calls to `copy`).
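
A simplified model of this round trip is sketched below. `BridgedBytes` is a hypothetical type, not part of Foundation, and for brevity it copies on every mutation rather than implementing full copy-on-write; the point is only that an unmutated value can hand back the very object it was created from.

```swift
import Foundation

// Hypothetical value wrapper that remembers the reference it came from.
struct BridgedBytes {
    private var storage: NSData
    private var isMutated = false

    init(referencing reference: NSData) {
        storage = reference
    }

    var count: Int { return storage.length }

    // Simplification: every mutation builds a fresh object.
    mutating func append(_ byte: UInt8) {
        let copy = storage.mutableCopy() as! NSMutableData
        var value = byte
        copy.append(&value, length: 1)
        storage = copy
        isMutated = true
    }

    func bridgeToReference() -> NSData {
        if isMutated {
            // Mutated: pass along a copy of the mutated version.
            return storage.copy() as! NSData
        }
        // Not mutated: pass back the original object, no allocation or copy.
        return storage
    }
}

let reference = NSData(bytes: [1, 2, 3] as [UInt8], length: 3)
let untouched = BridgedBytes(referencing: reference)
var changed = untouched
changed.append(4)
```

Here `untouched.bridgeToReference()` returns `reference` itself, while `changed.bridgeToReference()` returns a distinct four-byte object.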

## Detailed performance breakdown

Each graph below compares the Swift 3 `Data` with the new version of `Data` for each of the inline cases. The horizontal axis in each graph represents N, and the vertical axis represents the sampled duration in nanoseconds. Each data point in the plots is an average over 100 iterations (unless otherwise specified) per value of N. The attached graphs were generated from optimized builds on a Mac Pro (Late 2013) 3.5 GHz 6-Core Intel Xeon E5 with 16 GB 1866 MHz DDR3.

```swift
import Foundation

// Returns a `Data` holding N random bytes.
// `baseAddress` is force-unwrapped, so N is expected to be greater than zero.
func createSampleData(ofLength N: Int) -> Data {
    var buffer = [UInt8](repeating: 0, count: N)
    return buffer.withUnsafeMutableBytes { (buffer: UnsafeMutableRawBufferPointer) -> Data in
        // Fill the array's storage with random bytes, then copy them into a `Data`.
        arc4random_buf(buffer.baseAddress!, N)
        return Data(bytes: buffer.baseAddress!, count: N)
    }
}

// Returns an `NSData` holding N random bytes.
func createSampleDataReference(ofLength N: Int) -> NSData {
    var buffer = [UInt8](repeating: 0, count: N)
    return buffer.withUnsafeMutableBytes { (buffer: UnsafeMutableRawBufferPointer) -> NSData in
        arc4random_buf(buffer.baseAddress!, N)
        return NSData(bytes: buffer.baseAddress, length: N)
    }
}

// Returns an array of N random bytes.
func createSampleArray(ofLength N: Int) -> [UInt8] {
    var buffer = [UInt8](repeating: 0, count: N)
    buffer.withUnsafeMutableBytes { (buffer: UnsafeMutableRawBufferPointer) -> Void in
        arc4random_buf(buffer.baseAddress!, N)
    }
    return buffer
}
```
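
The harness that produced the measurements is not shown in this document. As a rough idea of how the "start measuring" and "end measuring" markers in the snippets below could be realized, here is a minimal sketch; `averageNanoseconds(iterations:_:)` is a hypothetical helper, and the use of `DispatchTime` as the clock is an assumption.

```swift
import Dispatch

// Hypothetical harness: runs `body` `iterations` times and returns the
// mean duration of a single run in nanoseconds.
func averageNanoseconds(iterations: Int = 100, _ body: () -> Void) -> Double {
    precondition(iterations > 0, "at least one iteration is required")
    var total: UInt64 = 0
    for _ in 0..<iterations {
        let start = DispatchTime.now().uptimeNanoseconds   // start measuring
        body()
        let end = DispatchTime.now().uptimeNanoseconds     // end measuring
        total += end - start
    }
    return Double(total) / Double(iterations)
}

// Usage: time a cheap operation on a 1 KB byte array.
let bytes = [UInt8](repeating: 0xFF, count: 1024)
var sink = 0   // accumulates results so the measured work has an observable effect
let mean = averageNanoseconds { sink += bytes.count }
print("mean duration: \(mean) ns")
```

For operations as cheap as reading `count`, the cost of reading the clock is likely to dominate each sample, so a real harness would probably time many accesses per sample.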

### Accessing count

This should be an O(1) operation. The y axis is measured in nanoseconds sampled over 100000 iterations.

```swift
// setup
let data = createSampleData(ofLength: N)
// start measuring
_ = data.count
// end measuring
```

![Comparison of Data.count](./images/access_count.png)

### Subscripting

This should be an O(1) operation. The y axis is measured in nanoseconds sampled over 100000 iterations.

```swift
// setup
// (`index` is not defined in this snippet; it is presumably a valid index into `data`)
let data = createSampleData(ofLength: N)
// start measuring
_ = data[index]
// end measuring
```

![Getting subscript](./images/getting_subscript.png)

---

```swift
// setup
// (`index` is not defined in this snippet; it is presumably a valid index into `data`)
var data = createSampleData(ofLength: N)
// start measuring
data[index] = 0x00
// end measuring
```

![Setting subscript](./images/setting_subscript.png)

### Appending

This should be an O(N) operation.

```swift
// setup
let dataToAppend = createSampleData(ofLength: N)
var data = Data()
// start measuring
data.append(dataToAppend)
// end measuring
```

![Appending N bytes](./images/append_n_bytes_with_data.png)

---

```swift
// setup
let arrayToAppend = createSampleArray(ofLength: N)
var data = Data()
// start measuring
data.append(contentsOf: arrayToAppend)
// end measuring
```

![Appending N bytes from Array](./images/append_array_of_bytes.png)

The new version is still O(N), just with a much smaller constant multiplier.

---

```swift
// setup
var data = Data()
// start measuring
for _ in 0..<N {
    data.append(contentsOf: [0xFF, 0xFE, 0xFD, 0xFC, 0xFB, 0xFA])
}
// end measuring
```

![Appending N arrays](./images/append_n_arrays.png)

### Replacing sub ranges

```swift
// setup
// (`replacement` is not defined in this snippet; its contents and length are not stated)
var data = createSampleData(ofLength: N)
// start measuring
data.replaceSubrange(0..<N, with: replacement)
// end measuring
```

![Replacing full subrange](./images/replace_entire_subrange.png)

---

```swift
// setup
// (`replacement` is not defined in this snippet; its contents and length are not stated)
var data = createSampleData(ofLength: N)
// start measuring
data.replaceSubrange(0..<min(N, 5), with: replacement)
// end measuring
```

![Replacing partial subrange](./images/replace_fixed_subrange.png)

### Growth of count

```swift
// setup
var data = Data()
// start measuring
data.count = N
// end measuring
```

![Growth from 0 to 1..<100000 bytes](./images/grow_small.png)

```swift
// setup
var data = Data()
// start measuring
data.count = N
// end measuring
```

![Growth from 0 to 1000000..<10000000 bytes](./images/grow_large.png)

```swift
// setup
// (`starting` is not defined in this snippet; per the graph title it is 1000000)
var data = Data()
data.count = starting
// start measuring
data.count = N
// end measuring
```

![Growth from 1000000 to 1000000..<10000000 bytes](./images/grow_from_mid_to_large.png)

### Bridging to reference types

This should be an O(1) operation. When bridging to a reference type, the previous implementation was a bit faster. The only real extra overhead here is the allocation of the `NSData` object, since the Swift-backed `Data` has no existing reference type to pass along. A few extra optimizations can be made in this path to close the difference of approximately 150 nanoseconds. In practice, `Data` is usually bridged back out to Objective-C for operations like writing to a file or socket, whose cost dwarfs that 150-nanosecond differential.

```swift
// setup
let data = createSampleData(ofLength: N)
// start measuring
_ = data as NSData
// end measuring
```

![Bridge to ObjC](./images/bridge_to_objectivec.png)

### Bridging from reference types

This should be an O(1) operation.

```swift
// setup
let data = createSampleDataReference(ofLength: N)
// start measuring
_ = data as Data
// end measuring
```

![Bridge From ObjC](./images/bridge_from_objectivec.png)

* Docs/images/DataDesign.png (106 KB)
* Docs/images/access_count.png (44.4 KB)
* Docs/images/append_array_of_bytes.png (54.9 KB)
* Docs/images/append_n_arrays.png (63.7 KB)
* (file name not shown) (56.1 KB)
* (file name not shown) (53.8 KB)
* Docs/images/bridge_to_objectivec.png (51.8 KB)
* Docs/images/getting_subscript.png (45.6 KB)
* (file name not shown) (64.5 KB)
* Docs/images/grow_large.png (71 KB)
* Docs/images/grow_small.png (53.2 KB)
* (file name not shown) (62.7 KB)
* (file name not shown) (56 KB)
* Docs/images/setting_subscript.png (48.2 KB)