Re: svn status tabbed output [was: Re: svn status should not show unmodified files in changelists]

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Thu, 7 Nov 2019 19:25:26 +0000

Julian Foad wrote on Thu, Nov 07, 2019 at 17:53:43 +0000:
> We have said the outputs (except XML) simply are not intended to be parsed.

We've said that? I think what we said is that the non-XML outputs _may_ be
parsed, but with care, since they may change in minor releases as functionality
gets added. (Example: when we added tree conflicts, we added a 7th column to
the output of 'svn st'.) That's why we say «%d lines» at the end of every
«svn log | grep '^r[0-9]* '» line.

> Tab-separated output seems a rather arbitrary addition to the flock, though
> it's an OK choice in isolation for this particular use.

The rationale was to offer a line-based format for use in shell scripts and
pipelines. Shell scripts are basically the reason we added --show-item in the
first place, too.

We could use NUL separators instead of tabs.

> Better consistency with other subcommands would be achieved by using
> space-separated output with field widths chosen per field, like we chose
> them for "status" and "list".

That's not possible in this case: just consider «--show-item=foo,url,relative-url,bar».

> Or item-per-line format like the original "info" output.

Firstly, I'm not sure this would be worth implementing; the default 'info'
output serves well enough for that.

Secondly, that would be an unusual/unnatural interface for scripts. Line-based
formats usually operate on the premise that all lines are parsed the same way.
(Example: the output of «netstat».) «svn info», with or without the RFC822
headers at the start of each line, is the exception to the rule. The
corresponding parsing idioms (for example, Python's 'fileinput' module, and
«for line in open('foo')») operate one line at a time, not N lines at a time.
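
To illustrate the difference, here is a minimal Python sketch. The sample
lines are made up for the example (not captured from a real svn run), and the
function names are hypothetical. The first parser treats every line the same
way; the second has to carry state across lines to know where a record ends.

```python
def parse_tsv_lines(lines):
    """Stateless: every line is parsed the same way."""
    for line in lines:
        yield line.rstrip("\n").split("\t")


def parse_info_blocks(lines):
    """Stateful: 'Key: value' lines accumulate until a blank line ends a record."""
    record = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # A blank line closes the current record, if there is one.
            if record:
                yield record
                record = {}
            continue
        key, _, value = line.partition(": ")
        record[key] = value
    if record:
        yield record


# Hypothetical sample data, for illustration only.
tsv_sample = ["42\tfile\n", "43\tdir\n"]
info_sample = [
    "Revision: 42\n", "Node Kind: file\n", "\n",
    "Revision: 43\n", "Node Kind: directory\n",
]

tsv_rows = list(parse_tsv_lines(tsv_sample))
info_records = list(parse_info_blocks(info_sample))
```

The first function drops straight into a «for line in …» loop; the second
needs a buffer and an end-of-record rule before it can emit anything.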

> Why should selecting items not be orthogonal to the output format?

There's a distinction between the default and XML outputs on the one hand,
and TSV output on the other hand: In the latter it's harder to select just
the particular fields one cares about.

Without the ability to select items to be output, a consumer of the
tab-separated format that cares just about a particular field would have to
either hardcode magic numbers (as in «awk '{print $1, $13}'») or deal with
a header line (meaning the parsing would need to be stateful, which is exactly
the problem we started with, and things like «xargs -n1 svn info
--show-item=foo,bar --» would print multiple header lines). On the other hand,
in the default and XML formats selecting just the fields one cares about is easy.
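
A rough Python sketch of those two options, under loud assumptions: the
header line, the column order and the URLs below are invented for the
example, and no svn release is claimed to print such a header. It shows a
positional selector with hardcoded indexes, a header-driven selector that has
to remember the header, and what happens to the latter when several outputs
are concatenated.

```python
def select_by_position(lines, positions):
    """Stateless: pick fields by hardcoded 0-based column positions."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield [fields[i] for i in positions]


def select_by_header(lines, names):
    """Stateful: the first line is a header that maps names to columns."""
    columns = None
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if columns is None:
            columns = [fields.index(name) for name in names]
            continue
        yield [fields[i] for i in columns]


# Hypothetical data: header, column order and URLs are assumptions.
header = "revision\tkind\turl\n"
row_a = "42\tfile\thttps://svn.example.org/repo/a\n"
row_b = "43\tdir\thttps://svn.example.org/repo/b\n"

by_position = list(select_by_position([row_a, row_b], [2]))
single_run = list(select_by_header([header, row_a, row_b], ["url"]))
# Two runs concatenated, as with xargs -n1: the second header leaks in as data.
concatenated = list(select_by_header([header, row_a, header, row_b], ["url"]))
```

In the concatenated case the second header line comes out as the bogus row
["url"], unless the consumer adds yet more state to detect and skip it.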

So, selecting output fields is more important for TSV than for RFC822 or XML.

> The XML output should also support --show-item. It's arbitrarily
> inconsistent that it presently doesn't.

See above: consumers of the XML output can easily ignore the parts they don't
care about, even without --show-item support. On the other hand, I could see
a case for having «svn info --rfc822 --show-item=foo,bar», which would generate
the default output format but print only some of the lines, for interactive use.

More generally, I see your point that selecting fields and selecting output
format should be orthogonal. That does make sense from an abstract (pure
mathematics) point of view, but more pragmatically, I don't think it's as high
priority to support «--show-item=… --xml» as to support multiple arguments
to «--show-item=…» in TSV mode. In fact, with info-cmd.c as it stands, supporting
«--show-item=… --xml» would not _reduce_ complexity but _increase_ it, since
several different receiver functions would need to become aware of --show-item.

> These days, JSON would be a reasonable choice of output format as an
> alternative to XML. Consider if we offered it, how would we select it?
> Perhaps a --json flag? Then why not a --tsv flag for TSV format,

Yes, we could do this. We could add --json and --tsv, as well as --rfc822 that
would select the default output mode explicitly. We could generalize that to
--output-mode=(rfc822|json|xml|tsv). We could make --show-item work in
conjunction with any of json/xml/tsv/rfc822. For compatibility reasons, «svn
info» would default to --rfc822, but --show-item would imply --tsv unless one
of --json/--xml/--rfc822 was passed. We would then have «svn info --rfc822
--show-item=depth» that prints "Depth: empty" (with the RFC822 header) and no
other lines, and we'd be able to do «svn info --json --show-item=foo» to save a
couple of microseconds to the process on the other end of the pipe.
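
The defaulting rule described above could be sketched as follows. This is
Python purely for illustration: the function name is hypothetical, and the
json, tsv and rfc822 modes are the ideas floated in this thread, not options
svn is known to have.

```python
def resolve_output_mode(explicit_mode=None, show_items=None):
    """Pick the output mode under the rule sketched in the paragraph above.

    explicit_mode: "rfc822", "json", "xml", "tsv", or None if no mode flag
                   was passed (all but xml are hypothetical flags).
    show_items:    list of --show-item values, or None/empty if absent.
    """
    if explicit_mode is not None:
        # An explicitly requested mode always wins.
        return explicit_mode
    if show_items:
        # --show-item alone keeps today's behaviour: line-based output.
        return "tsv"
    # Plain «svn info» keeps its default format.
    return "rfc822"


plain = resolve_output_mode()
items_only = resolve_output_mode(show_items=["depth"])
items_rfc822 = resolve_output_mode("rfc822", ["depth"])
items_json = resolve_output_mode("json", ["foo"])
```

Under these assumptions, existing invocations keep their current output, and
a mode flag only changes anything when it is passed explicitly.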

I don't see how any of these visions, leaving aside whether they're good ideas
or not, is a blocker to the patch I posted. You're saying we could add other
features, and I'm sure we could, but that's not the right question to ask. The
question to ask is whether the patch adds value, and whether it adds something
that we won't want to support until 2.x. I don't see any concern of the sort
in all your points. The patch as it stands is forward-compatible with all your
ideas.

> and expect that too to be made available to other subcommands?

Which other commands might use TSV?

> We might want to allow using multiple options, like "--show-item=revision
> --show-item=kind" in addition to comma-separated values.

I don't think this is high priority.

Daniel
Received on 2019-11-07 20:25:36 CET
