A few weeks ago, I was reviewing some code, and found something similar to this:
 
std::string host = ...;

int port = 23;
size_t pos = host.find(":");
if (pos != host.npos)
{
    port = atoi(host.substr(pos + 1).c_str());
    host = host.substr(0, pos);
}

Looks reasonable enough. This code parses a parameter in “host:port” format. It's perfectly decent where it appears. It will run rarely, once per process invocation, so performance is irrelevant. It’s an acceptable way to achieve what’s intended.

But suppose this code were in a tight inner loop. Suppose it were part of a parser that needs to digest hundreds of megabytes of data, where performance matters.

In that case, this code does two suboptimal things:
  • A heap allocation to copy the “port” portion of the string into. Why not just read the number from the original string?
  • Another heap allocation to copy the “host” portion of the string. Again, why not just read from the original string?
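
Both copies can be avoided even while staying with std::string. The following is a minimal sketch, assuming a C++17 compiler (for std::from_chars); the helper name SplitHostPort is hypothetical and not from the code I reviewed. It parses the digits directly from the original buffer and truncates the string in place:

```cpp
#include <charconv>
#include <cstddef>
#include <string>
#include <system_error>

// Hypothetical helper: splits "host:port" in place.
// Returns the parsed port, or defaultPort if no port is present or parseable.
int SplitHostPort(std::string& host, int defaultPort)
{
    int port = defaultPort;
    std::size_t pos = host.find(':');
    if (pos != std::string::npos)
    {
        // Parse the digits straight out of the original buffer; no substr() copy.
        char const* first = host.data() + pos + 1;
        char const* last  = host.data() + host.size();
        int parsed = 0;
        if (std::from_chars(first, last, parsed).ec == std::errc())
            port = parsed;

        // Shrinking reuses the existing buffer in mainstream implementations.
        host.resize(pos);
    }
    return port;
}
```

Note one behavioral difference from the atoi version: if the text after the colon is not a number, this sketch keeps the default port, whereas atoi would yield 0. std::from_chars also does not skip leading whitespace.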

Seq as an improvement over std::string const&

There are two purposes for which C++ programmers commonly use strings:
  1. To store and own character content. This is what string and wstring do.
  2. To pass character content without passing ownership. This is what string const& does.
I argue that passing string const& is almost always a mistake. It constrains the string provider in ways that aren't necessary for the consumer to read the string: the provider must have, or construct, an actual string object. All you really need is to pass a pointer and a length. You need a lightweight Seq object.

A Seq object, essentially, is this:
 
struct Seq
{
    byte const* p { nullptr };
    size_t n { 0 };
};

In practice, a useful Seq implementation will also contain numerous methods that a user can use to read from the Seq. My implementation has these, among others:
 
struct Seq
{
    ...
    uint ReadByte (...)
    uint ReadHexEncodedByte (...)
    uint ReadUtf8Char (...)
    Seq ReadBytes (...)
    Seq ReadUtf8_MaxBytes (...)
    Seq ReadUtf8_MaxChars (...)
    Seq ReadToByte (...)
    Seq ReadToFirstOf (...)
    Seq ReadToFirstOfType (...)
    Seq ReadToFirstNotOf (...)
    Seq ReadToFirstNotOfType (...)
    Seq ReadToString (...)
    Seq ReadLeadingNewLine (...)
    uint64 ReadNrUInt64 (...)
    int64 ReadNrSInt64 (...)
    uint32 ReadNrUInt32 (...)
    uint16 ReadNrUInt16 (...)
    byte ReadNrByte (...)
    uint64 ReadNrUInt64Dec (...)
    int64 ReadNrSInt64Dec (...)
    uint32 ReadNrUInt32Dec (...)
    uint16 ReadNrUInt16Dec (...)
    byte ReadNrByteDec (...)
    double ReadDouble (...)
    Time ReadIsoStyleTimeStr (...)
    ...
};

You get the idea. All the basic primitives you'd need to read character content belong in Seq.

The basic benefit of Seq is that it's lightweight, containing only a pointer and a length, and can point not just to a whole string, but also a substring. It does not require unnecessary functionality, like a whole string object, just to pass a sequence of characters without ownership.
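
To make the "whole string or substring" point concrete, here is a small sketch, assuming C++14 or later. The helpers SeqOf and SubSeq are hypothetical names chosen for illustration, and the byte alias is a guess at the typedef used above; a real implementation would likely offer constructors instead.

```cpp
#include <cstddef>
#include <string>

using byte = unsigned char;   // assumption: `byte` is an unsigned 8-bit type

struct Seq
{
    byte const* p { nullptr };
    std::size_t n { 0 };
};

// Hypothetical helper: a Seq covering a whole std::string. Nothing is copied;
// the Seq is only valid while `s` is alive and unmodified.
inline Seq SeqOf(std::string const& s)
{
    return Seq { reinterpret_cast<byte const*>(s.data()), s.size() };
}

// Hypothetical helper: a Seq covering `count` bytes starting at `offset`,
// clamped to the bounds of `s`. Again, no bytes are copied.
inline Seq SubSeq(Seq s, std::size_t offset, std::size_t count)
{
    if (offset > s.n) offset = s.n;
    if (count > s.n - offset) count = s.n - offset;
    return Seq { s.p + offset, count };
}
```

Given std::string s = "example.com:8080", the call SubSeq(SeqOf(s), 12, 4) refers to the four port digits inside s itself. The price of this cheapness is that a Seq never extends the lifetime of what it points to.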

A second, and even more central, benefit is that it serves as a focal point for a powerful set of string reading methods that leverage each other, allowing for both elegant and efficient string reading.

Using Seq, the earlier "host:port" example can be rewritten like this:
 
Seq hostPort = ...;

Seq host = hostPort.ReadToByte(':');
uint32 port = 23;
if (hostPort.n)
    port = hostPort.Drop(1).ReadNrUInt32Dec();

This is not more complex than the string version. Yet this version does its task without unnecessary heap allocations, and would be much more efficient if implemented where performance matters.
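
For readers who want to try this, below is a minimal, self-contained sketch of just the three methods the example uses. The signatures and behavior are my assumptions for illustration, not the actual implementation: Drop is assumed to return a reference so calls can be chained, ReadNrUInt32Dec is assumed to return 0 when no digits are present, and the Str method and ParseHostPort wrapper are hypothetical additions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

using byte = unsigned char;      // assumption: stand-in for `byte`
using uint32 = std::uint32_t;    // assumption: stand-in for `uint32`

struct Seq
{
    byte const* p { nullptr };
    std::size_t n { 0 };

    Seq() = default;
    Seq(byte const* ptr, std::size_t len) : p(ptr), n(len) {}
    // Explicit, so a Seq is never silently bound to a temporary string.
    explicit Seq(std::string const& s)
        : p(reinterpret_cast<byte const*>(s.data())), n(s.size()) {}

    // Skips up to `count` bytes from the front; returns *this to allow chaining.
    Seq& Drop(std::size_t count)
    {
        if (count > n) count = n;
        p += count;
        n -= count;
        return *this;
    }

    // Consumes bytes up to, but not including, the first `b` (or everything
    // if `b` is absent) and returns the consumed part as a new Seq.
    Seq ReadToByte(byte b)
    {
        void const* hit = n ? std::memchr(p, b, n) : nullptr;
        std::size_t len = hit
            ? static_cast<std::size_t>(static_cast<byte const*>(hit) - p)
            : n;
        Seq result(p, len);
        Drop(len);
        return result;
    }

    // Consumes leading decimal digits; returns 0 if there are none.
    // Overflow is not detected in this sketch.
    uint32 ReadNrUInt32Dec()
    {
        uint32 value = 0;
        while (n && *p >= '0' && *p <= '9')
        {
            value = value * 10 + static_cast<uint32>(*p - '0');
            Drop(1);
        }
        return value;
    }

    // Hypothetical convenience: copies the bytes out, for when ownership is needed.
    std::string Str() const
    {
        return std::string(reinterpret_cast<char const*>(p), n);
    }
};

// The "host:port" snippet wrapped in a hypothetical function so it can be called.
uint32 ParseHostPort(Seq hostPort, Seq& host, uint32 defaultPort)
{
    host = hostPort.ReadToByte(':');
    uint32 port = defaultPort;
    if (hostPort.n)
        port = hostPort.Drop(1).ReadNrUInt32Dec();
    return port;
}
```

The only allocation in this sketch is inside Str, and the parsing path never calls it.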

So, it's like std::string_view?

std::string_view, proposed as a C++ extension and since standardized in C++17, implements a similar concept. The main differences:
  • std::string_view is mainly the lightweight reference. It lacks a powerful library of string reading methods. Seq, as the example above shows, emphasizes stream-like reading: read methods consume part of the Seq and return the part that was read as another Seq. A fully useful Seq implementation covers the basic primitives of string reading in an elegant way.
  • std::string_view is an std::long_inconvenient_name. However, this is understandable given a standard library designed by dark warlocks whose mystical powers derive from conjuring, and causing the world to use, long inconvenient names. :)
I emphasize the use of Seq as the default for string passing and reading, not a special case. This is encouraged by giving it a practical name, and building a library of string reading methods around it.

std::string_view could do the same, but it needs more power than just remove_prefix and remove_suffix.
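
As a rough illustration of what that extra power might look like, here is a sketch of consuming reads layered on std::string_view as free functions, assuming a C++17 compiler. ReadToChar and ReadUInt32Dec are hypothetical names; nothing like them exists in the standard library.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical: consumes characters up to, but not including, the first `c`
// (or everything if `c` is absent) and returns the consumed part.
inline std::string_view ReadToChar(std::string_view& s, char c)
{
    std::size_t pos = s.find(c);
    if (pos == std::string_view::npos) pos = s.size();
    std::string_view result = s.substr(0, pos);
    s.remove_prefix(pos);
    return result;
}

// Hypothetical: consumes a leading decimal number. On failure it returns
// `fallback` and consumes nothing.
inline std::uint32_t ReadUInt32Dec(std::string_view& s, std::uint32_t fallback)
{
    std::uint32_t value = 0;
    auto res = std::from_chars(s.data(), s.data() + s.size(), value);
    if (res.ec != std::errc())
        return fallback;
    s.remove_prefix(static_cast<std::size_t>(res.ptr - s.data()));
    return value;
}
```

With these, the "host:port" example becomes ReadToChar(hostPort, ':'), then hostPort.remove_prefix(1) if anything remains, then ReadUInt32Dec(hostPort, 23). It works, but the reading primitives live outside the type, which is exactly the gap described above.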