This code parses a parameter in “host:port” format. It looks reasonable enough, and it's perfectly decent where it appears: it runs rarely, once per process invocation, so performance is irrelevant. It's an acceptable way to achieve what's intended.
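std::string host = ...;       // the "host:port" parameter
int port = 23;                // default port if none is given
size_t pos = host.find(':');
if (pos != std::string::npos)
{
    port = atoi(host.substr(pos + 1).c_str());  // substr() copies into a new string
    host = host.substr(0, pos);                 // and so does this one
}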
But suppose this code were in a tight inner loop, as part of a parser that needs to digest hundreds of megabytes of data. Then performance is relevant.
In that case, this code does two suboptimal things:
- It copies the “port” portion of the string into a new string, which can require a heap allocation. Why not just read the number from the original string?
- It copies the “host” portion into yet another new string, again potentially allocating. Again, why not just read from the original string?
Seq as an improvement over std::string const&
There are two purposes for which C++ programmers commonly use strings:
- To store and own character content. This is what string and wstring do.
- To pass character content without passing ownership. This is what string const& does.
Using string const& for the second purpose is almost always a mistake. It constrains the string provider in ways that aren't necessary for the consumer to read the string: the caller must supply a std::string object, even when the characters live in a literal, a buffer, or the middle of another string. All you really need is to pass a pointer and a length. You need a lightweight Seq object.
A Seq object, essentially, is this:
struct Seq
{
    byte const* p { nullptr };
    size_t n { 0 };
};
In practice, a useful Seq implementation will also contain numerous methods that a user can use to read from the Seq. My implementation has these, among others:
struct Seq
{
    ...
    uint ReadByte (...)
    uint ReadHexEncodedByte (...)
    uint ReadUtf8Char (...)
    Seq ReadBytes (...)
    Seq ReadUtf8_MaxBytes (...)
    Seq ReadUtf8_MaxChars (...)
    Seq ReadToByte (...)
    Seq ReadToFirstOf (...)
    Seq ReadToFirstOfType (...)
    Seq ReadToFirstNotOf (...)
    Seq ReadToFirstNotOfType (...)
    Seq ReadToString (...)
    Seq ReadLeadingNewLine (...)
    uint64 ReadNrUInt64 (...)
    int64 ReadNrSInt64 (...)
    uint32 ReadNrUInt32 (...)
    uint16 ReadNrUInt16 (...)
    byte ReadNrByte (...)
    uint64 ReadNrUInt64Dec (...)
    int64 ReadNrSInt64Dec (...)
    uint32 ReadNrUInt32Dec (...)
    uint16 ReadNrUInt16Dec (...)
    byte ReadNrByteDec (...)
    double ReadDouble (...)
    Time ReadIsoStyleTimeStr (...)
    ...
};
You get the idea. All the basic primitives you'd need to read character content belong in Seq.
The basic benefit of Seq is that it's lightweight, containing only a pointer and a length, and can point not just to a whole string, but also to a substring. It does not require unnecessary machinery, such as a whole string object, just to pass a sequence of characters without ownership.
A second, and even more central, benefit is that it serves as a focal point for a powerful set of string reading methods that leverage each other, allowing for both elegant and efficient string reading.
Using Seq, the earlier "host:port" example can be rewritten like this:
Seq hostPort = ...;
Seq host = hostPort.ReadToByte(':');   // reads up to, but not including, the ':'
uint32 port = 23;
if (hostPort.n)
    port = hostPort.Drop(1).ReadNrUInt32Dec();   // skip the ':' and read the digits
This is not more complex than the string version. Yet this version does its task without unnecessary heap allocations, and would be much more efficient in code where performance matters.
So, it's like std::string_view?
The proposed C++ extension std::string_view implements a similar concept. Main differences:
- std::string_view is mainly the lightweight reference. It lacks a powerful library of string reading methods. Seq, as in the example above, emphasizes stream-like reading: read methods consume part of the Seq and return the part that was read as another Seq. A fully useful Seq implementation covers the basic primitives of string reading in an elegant way.
- std::string_view is an std::long_inconvenient_name. However, this is understandable given a standard library designed by dark warlocks whose mystical powers derive from conjuring, and causing the world to use, long inconvenient names. :)
- Seq is meant as a default for string passing and reading, not a special case. This is encouraged by giving it a practical name and building a library of string reading methods around it. std::string_view could do the same, but it needs more power than just remove_prefix and remove_suffix.
Comment on Nov 15, 2015 at 15:54 by Unknown
Comment on Nov 15, 2015 at 16:55 by denisbider
If strong input validation is required, I use full-fledged parser machinery. This also uses Seq internally, but I don't use e.g. ReadNrUInt32Dec() to validate.
Comment on Nov 17, 2015 at 08:20 by Marco
Comment on Nov 17, 2015 at 08:39 by J
Another improvement to the initial code is to replace
host = host.substr(0, pos);
with
host.resize(pos);
Comment on Nov 21, 2015 at 16:08 by denisbider
The name string_span seems misleading. It suggests either:
- a span that requires an underlying string object (which it does not: it can be a span of characters from a string literal, from a vector, or from manually managed storage);
- or maybe, even, a span of string objects.
In HTML, the tag "span" is used for inline character content. It seems to me it would be quite intuitive to use the word for this purpose in C++.
The array span is also a useful concept that I can easily imagine using. It seems that also needs a practical name. Maybe that could be a seq or a sequence.
Or maybe the array span can stay span, and string_span / wstring_span could become cspan / wcspan (the latter for wchar_t).
Comment on Nov 23, 2015 at 14:03 by Arne Mertz
Comment on Nov 23, 2015 at 16:02 by denisbider
struct Seq
{
    // Calls a free function f(*this, args...) and returns *this for chaining.
    template <class F, typename... Args>
    Seq& Fun(F f, Args&&... args)
    { f(*this, std::forward<Args>(args)...); return *this; }
};
Now you can do:
Seq(str).Fun(ReadThis, arg1).Fun(ReadThat, arg2);
but that is less readable than:
Seq(str).ReadThis(arg1).ReadThat(arg2);
There is a proposal by Bjarne Stroustrup to unify f(x,a) and x.f(a), so that the two are equivalent and interchangeable. If that were implemented, the free function chaining problem would be solved.
Unfortunately, at the last standards meeting (search for the keyword "Unified"), only part of the proposal was adopted: f(x,a) finds x.f(a), but not the other way around. This avoids solving the free function chaining problem... For now. :-|
Comment on Nov 23, 2015 at 16:05 by denisbider