Description
I thought it might be best to create a new issue to discuss alternative implementations for the list of strings proposed in #268. As @arjenmarkus wrote there:
This proposal considers the features that would be useful for this derived type, a list of strings. It does not prescribe the methods for implementing the type. Given the ease by which arrays are handled in Fortran, it is possible to use allocatable arrays, but perhaps a linked list may prove to be more efficient, at least in certain scenarios of use. Therefore the proposal concentrates on the usage.
I thought it might be helpful to expand on this topic using some analogies with the C++ library. In C++, the standard template library provides the special container std::string
as a sequence of characters supporting fast random access, and fast insert/delete at back of the container.
In Fortran we can reach similar behavior/usage using the character(len=:), allocatable
type:
character(len=:), allocatable :: str
! Automatic allocation upon initialization
str = "Hello"
! "Insertion" at the back
str = str//", world!" ! "Hello, world!"
! Fast random access
str(8:8) = "W" ! "Hello, World!"
! "Deletion"
str = str(1:len(str)-1) ! "Hello, World"
There are of course also very large differences between the C++ std::string
and the Fortran allocatable character sequences in terms of memory management and other things. Some of the differences can be inferred from the numerous member functions of std::string
for access, modification, and string operations.
Moving one step further to a "lists of strings" in C++ there are several options. For "fixed-size" lists one has:
// Built-in (fixed) size array
std::string colour[4] = { "Blue", "Red",
"Orange", "Yellow" };
// Array of strings (safer, and easier to use than the built-in array)
std::array<std::string, 4> colour { "Blue", "Red", "Orange",
"Yellow" };
For dynamic lists of strings the most commonly used option in C++ is
// Vector of strings (the "go to" container)
std::vector<std::string> colour {"Blue", "Red", "Orange"};
// Strings can be added at any time with push_back
colour.push_back("Yellow")
However, the STL library also contains other specialized containers such as: std::deque
, std::list
, and std::forward_list
. Quoting from the textbook C++ Primer:
The idea of having different containers is they offer different performance trade-offs relative to
- The costs to add or delete elements to the container
- The costs to perform non-sequential access to elements of the container
The performance aspects of different sequential containers are summarized neatly in this table copied out of the same book:
Container | Description |
---|---|
vector |
Flexible-size array. Supports fast random access. Inserting or deleting elements other than at the back may be slow. |
deque |
Double-ended queue. Supports fast random access. Fast insert/delete at front or back. |
list |
Doubly linked list. Supports only bidirectional sequential access. Fast insert/delete at any point in the list. |
forward_list |
Singly linked list. Supports only sequential access in one direction. Fast insert/delete at any point in the list. |
array |
Fixed-size array. Supports fast random access. Cannot add or remove elements. |
string |
A specialized container, similar to vector , that contains characters.Fast random access. Fast insert/delete at back. |
The beauty of the C++ containers is they share many common operations. If you can avoid random access, and instead use only iterators to access the elements you can switch easily between some containers. A table of the supported operations can be found here.
I believe the current implementation in #311 is most similar to the std::vector
and uses contiguous storage for the underlying string elements. The comments from @esterjo (#268 (comment) and #241 (comment)) already address the fact that other implementations might be more suitable in some usage cases in order to avoid memory fragmentation or support other operations.
Given that there is not much precedence for such string containers in Fortran, I find it hard to judge which implementation is most suitable for typical Fortran usage cases. Getting some feedback from the community could help guide further (experimental) implementation efforts.