On 8/12/14 4:39 PM, Branko Čibej wrote:
> This thing against flags and booleans that change API behaviour does have a
> very good argument going for it: it makes the code less obvious for the sake of
> saving a few function names.
>
> Consider the standard example, which (these days) would be an API to a DOM-like
> object that can be displayed or not; to query its state, you'd have a method called
>
> bool is_visible();
>
> To change the state, the "flaggy" design would be
>
> void set_visibility(bool);
>
> but the obviously more readable design uses two methods,
>
> void show();
> void hide();
>
> With the former, you always have to worry about what the boolean parameter
> actually means; with the latter, there can be no doubt.
>
> So in the case of displaying the contents of checksums, first of all something
> called
>
> svn_checksum_to_cstring_display
>
> is clearly a broken API, especially if it's used to format the checksum for the
> protocol, i.e., has nothing to do with displaying it at all. The docstring says
> nothing about displaying; and there's no difference between this function and
> svn_checksum_to_cstring, other than that the latter accepts (and possibly
> returns) a NULL, where the former does not.
Agree with you so far. Yes svn_checksum_to_cstring() accepts and returns NULL.
>
> Then we have svn_checksum_serialize and svn_checksum_deserialize, which I would
> expect to be always used for the protocol;
These are used by the wc indirectly by way of svn_sqlite__bind_checksum. The
key difference here is they include the type and the digest itself in the
output, allowing the svn_checksum_t value to be stored in the database and then
recreated from that, without assuming any particular type. This is probably
what we should have been doing for the protocol and the repository, but we
didn't and this is a relatively new (1.7) API. I'd suggest that FSX should
start using this but since it's probably going to be more binary there's
probably going to be more compact serialization used for it.
> and svn_checksum_parse_hex, where
> it's not clear if it's the inverse of svn_checksum_to_cstring, or the _display
> variant, or both, or WTF.
svn_checksum_parse_hex takes a hash type (which is usually inferred from where
the hex string is coming from) and a hex string and produces a svn_checksum_t,
it can be used to parse the output from either svn_checksum_to_cstring or
svn_checksum_to_cstring_display.
The difference between svn_checksum_to_cstring and
svn_checksum_to_cstring_display is how they handle a digest that is all zeros.
svn_checksum_to_cstring() returns NULL while svn_checksum_to_cstring_display
returns a string with the appropriate number of zeros in it. There are places
where one or the other variant is useful.
svn_checksum_parse_hex handles both NULL or a string with all zeros as the
digest the same.
> In other words, at least one of these APIs needs to be deprecated because it
> has a terrible name, and all of them need better documentation.
The documentation is sufficient to understand what's going on but it's not
entirely obvious until you read all of it. So yes we could improve this by
adding hints like that svn_checksum_parse_hex() can handle the output of either
svn_checksum_to_cstring() or svn_checksum_to_cstring_display().
> If we want a
> function that's intended specifically for printing fingerprints, then by all
> means call it
>
> svn_checksum_fingerprint
>
> although, really,
A fingerprint is just another name for a hash/checksum/digest. The problem
here is that there are several variations of how to display this data. It is
highly useful to try and display this data in the common format used for the
context. For instance, almost everything displays X.509 cert fingerprints as
SHA1 hashes with upper case hex digits with a colon between the characters for
each byte.
On the other hand you have software like GPG and PGP where the standard is to
use upper case hex digits with a space between every 2 bytes of the hash.
Then you have software like sha1sum, md5sum, etc... that just gives you a
stream of all lower case hex digits with no separators.
If you're wanting to manually verify that two match you can sit there and try
to read one and compare it to the other which is an error prone operation even
if you are using a format that has some sort of periodic separator. Or if you
happen to have both in the computer you can copy one or the other out and put
it in a editor and then search to see if the other one matches (or use the
search in your terminal). This of course only works if you have them formatted
the same.
So creating a function named svn_checksum_fingerprint() does absolutely nothing
to improve the situation. It's just as ambiguous as the current situation.
If we want to avoid flags and booleans then I guess we need to come up with
names for the styles like:
svn_checksum_fingerprint_x509style()
or
svn_checksum_fingerprint_uppercolons()
etc...
I'll try to come up with some final names, but can we at least agree that as
long as they're reasonably descriptive we don't need a week long thread to
agree to names?
> I don't know why such a thing would have to be in
> libsvn_subr, or perhaps in svn_checksum.h; I could imagine such a thing living
> in subr but exposed in svn_cmdline.h, called svn_cmdline_fingerprint or some such.
What does this have anything to do with the command line? The svn_cmdline.h
stuff is all things that either handle input or output to a terminal. The few
functions we have that handle character set encoding are there so they can
handle the detection of the output encoding (done in svn_cmdline_init()) and
use the correct encoding automagically.
I see no reason why this sort of formatting function shouldn't return UTF-8 and
then the command line tools use the svn_cmdline_* functions to output it.
Whatever formatting functions we provide for svn_checksum_t should live in
svn_checksum.h.
Received on 2014-08-13 04:34:59 CEST