Add "sortedPrefix(_:by:)" to Collection #9

Merged
merged 31 commits into from
Dec 4, 2020
Commits
5429d3b
Add partial sort algorithm
rakaramos Oct 8, 2020
4362197
Add in place partial sorting
rockbruno Oct 9, 2020
f299df1
Guide docs
rockbruno Oct 9, 2020
6cd2870
Use Indexes
rockbruno Oct 9, 2020
63b2dd0
Merge pull request #1 from rakaramos/guide
rakaramos Oct 9, 2020
88216e1
Add partial sort tests
rakaramos Oct 9, 2020
afe7111
Indent up to 80 columns
rakaramos Oct 9, 2020
4652ae7
Fix heapify stopping before it should
rockbruno Oct 9, 2020
37d494a
Update PartialSort.md
rockbruno Oct 9, 2020
83d5f1e
Update PartialSort.md
rockbruno Oct 9, 2020
bf31ba1
Update PartialSort.swift
rockbruno Oct 9, 2020
acb3583
Cleaning up iterators logic
rockbruno Oct 9, 2020
6227bd8
Update PartialSort.swift
rockbruno Oct 9, 2020
d4a2e6b
Cleaning docs
rockbruno Oct 9, 2020
62ee6f2
Change implementation and name
rakaramos Oct 21, 2020
f674851
DocDocs
rockbruno Oct 21, 2020
5bdea96
Merge remote-tracking branch 'origin/fix-algo' into docdocs
rockbruno Oct 21, 2020
dd15b5a
Docs
rockbruno Oct 21, 2020
7ac3915
Merge pull request #3 from rakaramos/docdocs
rockbruno Oct 21, 2020
c68537f
Docs
rockbruno Oct 21, 2020
e8504fd
Optimize
rockbruno Oct 21, 2020
36e9a39
Fix header and remove assert
rakaramos Oct 28, 2020
1d22ef9
Add more tests (#4)
rakaramos Oct 31, 2020
62096e1
Update PartialSortTests.swift
rockbruno Oct 31, 2020
d0c1ccd
Merge pull request #5 from rakaramos/rockbruno-patch-1
rockbruno Oct 31, 2020
23bf863
Update Sources/Algorithms/PartialSort.swift
rockbruno Nov 1, 2020
379609b
Update Sources/Algorithms/PartialSort.swift
rockbruno Nov 1, 2020
435a38c
Update Sources/Algorithms/PartialSort.swift
rockbruno Nov 1, 2020
70973a2
Documentation fixes
rockbruno Nov 1, 2020
70a263c
Add tests for massive inputs
rockbruno Dec 2, 2020
1d3dcaf
isLastElement
rockbruno Dec 2, 2020
Binary file added Guides/Resources/SortedPrefix/FewElements.png
Binary file added Guides/Resources/SortedPrefix/ManyElements.png
48 changes: 48 additions & 0 deletions Guides/SortedPrefix.md
@@ -0,0 +1,48 @@
# Sorted Prefix

[[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/SortedPrefix.swift) |
[Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/PartialSortTests.swift)]

Returns the first k elements of this collection when it's sorted.

If you need to sort a collection but only need access to a prefix of its elements, using this method can give you a performance boost over sorting the entire collection. The order of equal elements is guaranteed to be preserved.

```swift
let numbers = [7,1,6,2,8,3,9]
let smallestThree = numbers.sortedPrefix(3, by: <)
// [1, 2, 3]
```
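
For reference, the result is expected to match what you would get by sorting the whole collection and then taking a prefix. The sketch below shows that baseline using only the standard library; the helper name `referenceSortedPrefix` is hypothetical and not part of this package.

```swift
// Hypothetical reference helper (an assumption for illustration, not part of
// the package): sorts everything, then keeps the first `count` elements.
func referenceSortedPrefix<C: Collection>(
  _ elements: C,
  _ count: Int,
  by areInIncreasingOrder: (C.Element, C.Element) throws -> Bool
) rethrows -> [C.Element] {
  Array(try elements.sorted(by: areInIncreasingOrder).prefix(count))
}

let numbers = [7, 1, 6, 2, 8, 3, 9]
let smallestThree = referenceSortedPrefix(numbers, 3, by: <)
// [1, 2, 3]
let largestThree = referenceSortedPrefix(numbers, 3, by: >)
// [9, 8, 7]
```

Passing `>` instead of `<` yields the largest elements, since the predicate defines what "sorted" means.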

## Detailed Design

This adds the `Collection` method shown below:

```swift
extension Collection {
    public func sortedPrefix(_ count: Int, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows -> [Element]
}
```

Additionally, a version of this method for `Comparable` types is also provided:

```swift
extension Collection where Element: Comparable {
    public func sortedPrefix(_ count: Int) -> [Element]
}
```

### Complexity

The algorithm used is based on [Soroush Khanlou's research on this matter](https://khanlou.com/2018/12/analyzing-complexity/). The total complexity is `O(k log k + nk)`, which results in a runtime close to `O(n)` when k is small. If k is large (10% of the collection or more), we fall back to sorting the entire collection, so the worst case is realistically `O(n log n)`.
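
As a rough sanity check of the stated bound, here is a sketch of where the two terms come from. It assumes the implementation sorts the first k elements and then, for each remaining element, performs at most one binary search and one array insertion into a buffer of k elements:

```latex
\begin{align*}
T(n, k) &= \underbrace{O(k \log k)}_{\text{sort the first } k \text{ elements}}
  + \underbrace{(n - k)}_{\text{remaining elements}} \cdot
    \bigl(\underbrace{O(\log k)}_{\text{binary search}}
      + \underbrace{O(k)}_{\text{array insertion}}\bigr) \\
&= O(k \log k + nk)
\end{align*}
```

The `nk` term is a worst case: elements that compare greater than the current largest kept element are skipped after a single comparison.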

Here are some benchmarks we made that demonstrate how this implementation (SmallestM) behaves as k increases (before implementing the fallback):

![Benchmark](Resources/SortedPrefix/FewElements.png)
![Benchmark 2](Resources/SortedPrefix/ManyElements.png)

### Comparison with other languages

**C++:** The `<algorithm>` library defines a `partial_sort` function that rearranges a range in place so that its first k elements are the smallest ones in sorted order, typically using a partial heap sort.

**Python:** The `heapq` module provides priority queue operations, including `nsmallest`, that can be used to achieve the same result.

4 changes: 4 additions & 0 deletions README.md
@@ -28,6 +28,10 @@ Read more about the package, and the intent behind it, in the [announcement on s
- [`randomStableSample(count:)`, `randomStableSample(count:using:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/RandomSampling.md): Randomly selects a specific number of elements from a collection, preserving their original relative order.
- [`uniqued()`, `uniqued(on:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Unique.md): The unique elements of a collection, preserving their order.

#### Partial sorting

- [`sortedPrefix(_:by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/SortedPrefix.md): Returns the first k elements of a sorted collection.

#### Other useful operations

- [`chunked(by:)`, `chunked(on:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Chunked.md): Eager and lazy operations that break a collection into chunks based on either a binary predicate or when the result of a projection changes.
99 changes: 99 additions & 0 deletions Sources/Algorithms/SortedPrefix.swift
@@ -0,0 +1,99 @@
//===----------------------------------------------------------------------===//
//
// This source file is part of the Swift Algorithms open source project
//
// Copyright (c) 2020 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
//
//===----------------------------------------------------------------------===//

extension Collection {
  /// Returns the first k elements of this collection when it's sorted using
  /// the given predicate as the comparison between elements.
  ///
  /// This example partially sorts an array of integers to retrieve its three
  /// smallest values:
  ///
  ///     let numbers = [7,1,6,2,8,3,9]
  ///     let smallestThree = numbers.sortedPrefix(3, by: <)
  ///     // [1, 2, 3]
  ///
  /// If you need to sort a collection but only need access to a prefix of its
  /// elements, using this method can give you a performance boost over sorting
  /// the entire collection. The order of equal elements is guaranteed to be
  /// preserved.
  ///
  /// - Parameter count: The k number of elements to prefix.
  /// - Parameter areInIncreasingOrder: A predicate that returns true if its
  ///   first argument should be ordered before its second argument;
  ///   otherwise, false.
  ///
  /// - Complexity: O(k log k + nk)
  public func sortedPrefix(
    _ count: Int,
    by areInIncreasingOrder: (Element, Element) throws -> Bool
  ) rethrows -> [Self.Element] {
    assert(count >= 0, """
      Cannot prefix with a negative amount of elements!
      """
    )

    // Do nothing if we're prefixing nothing.
    guard count > 0 else {
      return []
    }

    // Make sure we are within bounds.
    let prefixCount = Swift.min(count, self.count)

    // If we're attempting to prefix more than 10% of the collection, it's
    // faster to sort everything.
    guard prefixCount < (self.count / 10) else {
      return Array(try sorted(by: areInIncreasingOrder).prefix(prefixCount))
    }

    var result = try self.prefix(prefixCount).sorted(by: areInIncreasingOrder)
    for e in self.dropFirst(prefixCount) {
      if let last = result.last, try areInIncreasingOrder(last, e) {
        continue
      }
Comment on lines +59 to +61
Member

There's still a logic issue here — if e is equal to result.last, execution will pass by this continue statement. That's a problem, because the call to partitioningIndex then returns endIndex, which becomes invalid after the call to result.removeLast(). What you want to ensure is that e is strictly less than result.last before proceeding.

Test case:

Array(repeating: 1, count: 100).sortedPrefix(5)
// Fatal error: Array index is out of range

Crap... That's another case that we did have covered in the tests, but the prefix wasn't low enough to trigger the algorithm. I added more high input cases, hopefully it will work now.
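
One way to read the suggestion above ("ensure that e is strictly less than result.last before proceeding") is sketched below. This is a standalone illustration, not the code merged in this pull request. The name `sortedPrefixSketch` is hypothetical, a hand-written binary search stands in for `partitioningIndex`, and the 10% fallback is left out so that the insertion path runs regardless of input size.

```swift
// Hypothetical standalone sketch; `sortedPrefixSketch` is not the package's API.
func sortedPrefixSketch<T>(
  _ elements: [T],
  _ count: Int,
  by areInIncreasingOrder: (T, T) throws -> Bool
) rethrows -> [T] {
  let prefixCount = min(max(count, 0), elements.count)
  guard prefixCount > 0 else { return [] }

  var result = try elements.prefix(prefixCount).sorted(by: areInIncreasingOrder)
  for e in elements.dropFirst(prefixCount) {
    // Proceed only if `e` is strictly less than the largest kept element.
    // Elements equal to it are skipped, so the index found below always
    // lies before the last position and stays valid after `removeLast()`.
    guard let last = result.last, try areInIncreasingOrder(e, last) else {
      continue
    }

    // Binary search for the first kept element that `e` should precede.
    var low = 0
    var high = result.count
    while low < high {
      let mid = low + (high - low) / 2
      if try areInIncreasingOrder(e, result[mid]) {
        high = mid
      } else {
        low = mid + 1
      }
    }

    result.removeLast()
    result.insert(e, at: low)
  }
  return result
}

let allOnes = sortedPrefixSketch(Array(repeating: 1, count: 100), 5, by: <)
// [1, 1, 1, 1, 1]
```

With the strict guard, the test case from the comment above no longer reaches the removal and insertion steps at all, because every later `1` is equal to the last kept element and is skipped.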

      let insertionIndex =
        try result.partitioningIndex { try areInIncreasingOrder(e, $0) }
      let isLastElement = insertionIndex == result.endIndex
      result.removeLast()
      if isLastElement {
        result.append(e)
      } else {
        result.insert(e, at: insertionIndex)
      }
    }

    return result
  }
}

extension Collection where Element: Comparable {
  /// Returns the first k elements of this collection when it's sorted in
  /// ascending order.
  ///
  /// This example partially sorts an array of integers to retrieve its three
  /// smallest values:
  ///
  ///     let numbers = [7,1,6,2,8,3,9]
  ///     let smallestThree = numbers.sortedPrefix(3)
  ///     // [1, 2, 3]
  ///
  /// If you need to sort a collection but only need access to a prefix of its
  /// elements, using this method can give you a performance boost over sorting
  /// the entire collection. The order of equal elements is guaranteed to be
  /// preserved.
  ///
  /// - Parameter count: The k number of elements to prefix.
  ///
  /// - Complexity: O(k log k + nk)
  public func sortedPrefix(_ count: Int) -> [Element] {
    return sortedPrefix(count, by: <)
  }
}
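
The guard `prefixCount < (self.count / 10)` uses integer division, which makes it easy to write tests that never reach the insertion loop. The small sketch below mirrors that condition so the two paths can be told apart; the helper name `takesInsertionPath` is hypothetical and exists only for illustration.

```swift
// Hypothetical helper mirroring the guards in `sortedPrefix(_:by:)`:
// returns true when the insertion loop would run, false when the method
// returns early (count of zero) or falls back to sorting everything.
func takesInsertionPath(collectionCount: Int, requestedCount: Int) -> Bool {
  let prefixCount = min(requestedCount, collectionCount)
  return prefixCount > 0 && prefixCount < collectionCount / 10
}

// The documentation example has 7 elements, and 7 / 10 == 0,
// so it always takes the full-sort fallback.
let docExample = takesInsertionPath(collectionCount: 7, requestedCount: 3)
// false
let reviewCase = takesInsertionPath(collectionCount: 100, requestedCount: 5)
// true
```

Under this reading, a collection needs more than ten times as many elements as the requested prefix before the partial sort is exercised, which is consistent with the review test case using 100 elements and a prefix of 5.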