Skip to content

Latest commit

 

History

History
113 lines (76 loc) · 9.62 KB

README.org

File metadata and controls

113 lines (76 loc) · 9.62 KB

Introduction

The C++ Standardization Committee has tasked Study Group 9 (SG9) with researching the idea of using ranges instead or in addition to iterators.[fn:1] This is an attempt to (re)implement the STL with ranges instead of iterators. The purpose of this implementation is only to get the discussion going, either in the negative (rejecting ranges) or the positive (further discussion of ranges). The implementation is purposefully as close to the original[fn:2] as possible. This closeness allows one to directly compare the implementation of STL with ranges against an implementation with iterators. Sometimes, this direct approach may result in implementations that are suboptimal for ranges, but this is a topic for a further discussion.

The implementation is unfinished at the moment, but it covers a sizable chunk of STL. The status of the implementation is discussed in more detail later on in this document. This README file provides a very rudimentary discussion of the implementation that will hopefully be expanded by SG9.

Ranges

A big question that SG9 has to answer is why ranges? That is, what problem do ranges solve? This implementation will hopefully help to investigate that problem. Next, we discuss the differences and similarities between ranges and iterators.

Ranges “vs.” Iterators

Iterators are a generalization of pointers, allowing abstract algorithms without performance penalty oven algorithms actually implemented with pointers. However, virtually all algorithms in STL operate on and return ranges. Ranges can be kept as pairs of iterators or as a single iterator with some implicit knowledge on how many times can the iterator be advanced (bounded vs. counted ranges). Even when an algorithm (such as say find) returns a “position” in a range as a single iterator, this iterator is not independent, and it forms two bounded ranges with the original “first” and “last” iterators. Thus, there is no way of escaping ranges, even with the “position” iterators returned by some algorithms.

Iterators allow direct manipulation of the details of ranges. For example, two ranges can share iterators: [first, last1) and [first, last2) are two different ranges, but a smart programmer can “optimize” by sharing the first iterator. With ranges as primitives, such sharing is probably not possible without some support from the compiler or without greatly complicating the interface of ranges.

One of the basic objections against ranges is that they cannot represent “positions.” In this implementation, we contribute a possible remedy to this problem: subrange operations. It is important to note that “position” iterators are usually used to form new ranges. For example, given a call auto result = find(first, last, 2) creates two new ranges, [first, result) and [result,last). Without subrange operations, one may have to implement two different versions of find, each returning one of these ranges. Subrange operations, on the other hand, allow one natural implementation that returns the “remainder” range representing [result,last) and extracting the “skipped” range with r.before(result) where r is the range searched. Subrange operations make the concept of ranges a bit more “dangerous” as the programmer must know of the subrange relation between ranges. Subrange operations on ranges that are not related by a subrange relation are erroneous and may lead to unexpected results.

Range Concepts

Ranges are structured after the STL iterators as in current standard into output, input, forward, bidirectional, and random access ranges. Range concepts are “implemented” for testing in the test_ranges.h file. Currently, we do not give a detailed description of ranges (only due to time constraints). Here we discuss only a few interesting differences between ranges and iterators.

For most part, the basic interface of ranges consists of functions empty(), front(), pop_front() and for bidirectional ranges back() and pop_back(). For random access ranges, we implement slicing with + and - operators, which pop from the front and from the back of the range, respectively. So, to take of 3 elements from the front and 2 from the back of a range r one would write r+3-2. There is no reason why slicing could not be implemented through a different interface, and the current interface is only a first try.

The big change from the previous discussions of ranges is the idea of a subrange relation that we introduce. A range is useful on its own, but often we are interested in questions such as does one range come before another? Does it come after? Do they intersect? Furthermore, it is often useful to combine ranges that are a subrange of a common parent range. For example, given ranges r1 and r2, we may be interested in a range r3 that encompasses r1, r2, and everything in between them as it appears in some range x of which r1, r2, and r3 are all subranges. Iterators allow such queries and operations naturally because an iterator range is just a pair of iterators. Iterators can be freely paired into new ranges as long as the resulting range is valid. With range primitives, all the range slicing, dicing, and combining operation and queries have to be part of the interface. It’s a somewhat of a philosophical question whether ranges should allow such iterator-like operations at all. This implementation of a range based STL is an experiment to investigate how useful or not useful such capabilities are.

Subrange Queries and Operations

Let r1 and r2 be two ranges which represent two iterator ranges, [f, l) and [b, e). There are 8 different ways in which these ranges can be combined (assuming that they are both subranges of a common superrange):

  • [f, b)
  • [f, e)
  • [l, b)
  • [l, e)
  • [b, f)
  • [b, l)
  • [e, f)
  • [e, l)

Only some of these situations are valid, depending on preconditions. For example, we cannot construct a range [f, b) if [f, l) comes after [b, e) (i.e., f > e for random access iterators). We can notice that the first 4 combinations are just a mirror image of the last 4. We provide 4 operations to implement these combinations:

  • begin_begin
  • begin_end
  • end_begin
  • end_end

The names indicate how ranges are combined. The part of the name before an underscore signifies which end of the range of the left is used to construct a resulting range, and the part of the name after the underscore signifies which end of the range on the right is used to construct the resulting range. The names are a first choice, and may not be very pleasing to everyone. There may be a much better way to name these operations, or maybe one could even conceive some use of operators, but as for now we stick with these names to test the basic functionality of ranges. The following list shows how to use the operations to construct the range combinations enumerated before:

  • [f, b) is r1.begin_begin(r2)
  • [f, e) is r1.begin_end(r2)
  • [l, b) is r1.end_begin(r2)
  • [l, e) is r1.end_end(r2)
  • [b, f) is r2.begin_begin(r1)
  • [b, l) is r2.begin_end(r1)
  • [e, f) is r2.end_begin(r1)
  • [e, l) is r2.end_end(r1)

All of these operations are equivalent in complexity to their iterator counterparts, and they amount to basically pairing iterators under the hood (or some other data structures in terms of which iterators are themselves implemented).

We also need to be able to compare ranges, and the situation is similar to combining them. Ranges provide two functions ominously called begin() and end(). These functions can only be used for comparison using ==== for all ranges, and <, > for random access ranges. For example, one could write:

r1.begin() == r.2.end();

r1 == r2 becomes just a wrapper for:

r1.begin() == r2.begin() && r1.end() == r2.end()

Same for < and >.

This seems to cover all the possibilities, and it also is much more principled than what we have implemented before. begin_begin and end_end still have to return different types than the original range for infinite ranges.

Issues

The main issue with ranges that we observed until now is optimization. Ranges do not allow for sharing between different ranges as is possible with iterators. Furthermore, some operations may do more than is necessary. One example are the front_equal and back_equal operations we discussed before.

Status

Range algorithms are implemented in the include/algorithms-range file. Currently, the only ranges we implemented are iterator wrapper ranges implemented in the include/range file.

The implementation is highly experimental. Not all of the STL algorithms are translated, but a big chunk is. The implementations were directly translated from their iterator versions, and no thought was put into making them more palatable and perhaps more optimal with ranges.

All the translated algorithms are tested, and the tests can give an idea on how to use the algorithms. Studying of the algorithms themselves vs. the iterator counterparts gives a good idea on how to use ranges.

Further hints on what can be done with ranges can be taken from the D Phobos Library, but until now we only directly translated the iterator-based STL without attempting to extend it.

Footnotes

Note: Footnotes do not render properly on GitHub.

[fn:1] http://www.open-std.org/pipermail/ranges/2013-January/000009.html

[fn:2] “libc++” C++ Standard Library at http://libcxx.llvm.org/