On Aug 19, 2005, at 11:08 PM, Matthew L Daniel wrote:
>> I don't believe that there are any hook scripts available to filter a
>> file's contents as it is being checked out though.
>>
>
> I believe (with an Apache-hosted svn repo) a well-crafted mod_filter
> could accomplish this task. Whether the files would be marked as
> "dirty" in the WC would depend on the exact mechanism svn uses for
> determining such a detail. In my head, so long as the file svn uses
> for the "revert" command matches the "checked out" copy, it wouldn't
> be dirty.
I'm pretty sure this will work as long as the user never performs a
diff of the modified file against a specific revision in the
repository. At that point the watermark will show up in the diff
output as an ordinary change. The only way I can see to get around this would be to
craft the mod_filter to also apply the watermark to the data sent by
diff, but that would require your mod_filter to have intimate
knowledge of the entire history of the file that it's watermarking.
Another (probably easier) solution would be to just make sure the
watermarked file is one that is unlikely to ever get diffed. For
instance, you might watermark a comment in some extremely stable
interface code that no one is allowed to touch (well, I guess a
comment wouldn't work if you want the binary to be watermarked, but
you get the idea). Since it won't change over time, no one is likely
to have any interest in performing a comparison of that file. Of
course, you'll also have to make sure it's appropriately segregated
from the active development code to ensure that it's unlikely to get
caught in a diff of the whole directory.
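
To illustrate why a diff gives the game away, here is a minimal sketch. The file contents and the one-character substitution are hypothetical, and Python's difflib stands in for svn's own diff:

```python
import difflib

# Hypothetical example: the copy stored in the repository versus the
# watermarked copy that was handed to one particular user.
repo_copy = ["int count = 0;\n", "int total = 0;\n"]
marked_copy = ["int counT = 0;\n", "int total = 0;\n"]

diff = list(difflib.unified_diff(
    repo_copy, marked_copy, fromfile="repository", tofile="working-copy"))

# Keep only the added/removed lines, skipping the "---"/"+++" file headers.
changed = [line for line in diff
           if line.startswith(("-", "+"))
           and not line.startswith(("---", "+++"))]

print("".join(diff))
```

Any changed line that the user did not write is exactly the watermark, so the comparison points straight at it.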
>
> As far as the actual watermarking (in a non-intrusive way, and there's
> the rub), sequentially altering local variables to construct a "bible
> code" (sorry, I don't have the actual cryptographic glossary in front
> of me, but when you take the 2nd letter of word 1, the 4th letter of
> word 3, etc, and it spells out a message) is one idea. Conceptually,
> one would only need to replace 33 tokens to be able to represent a
> GUID in the code.
Since there is presumably a single point of generation for the
watermark, you probably don't need anywhere near 33 tokens. Assuming
you use upper- and lower-case letters in your substitutions (to ensure
that the result is still compilable), that's 52 possibilities for
each modified character. That gives you a little more than 7 million
possibilities (52^4 = 7,311,616) with just 4 substituted characters,
which is a sufficiently high number to create a unique watermark for
every coder/revision pairing in all but the largest projects.
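
The arithmetic can be checked with a short sketch. It assumes the watermark is simply a sequence number written in base 52 over the ASCII letters; the names encode, decode and WIDTH are made up for illustration and are not part of any svn or Apache API:

```python
import string

ALPHABET = string.ascii_letters  # 52 symbols: a-z followed by A-Z
WIDTH = 4                        # number of substituted characters

def encode(mark_id, width=WIDTH):
    """Write mark_id as a fixed-width base-52 string of letters."""
    if not 0 <= mark_id < len(ALPHABET) ** width:
        raise ValueError("mark_id does not fit in the given width")
    letters = []
    for _ in range(width):
        mark_id, digit = divmod(mark_id, len(ALPHABET))
        letters.append(ALPHABET[digit])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(letters))

def decode(letters):
    """Recover the integer from a string produced by encode()."""
    value = 0
    for ch in letters:
        value = value * len(ALPHABET) + ALPHABET.index(ch)
    return value

capacity = len(ALPHABET) ** WIDTH
print(capacity)
```

Each coder/revision pairing would get the next unused number, and the four letters would be spread over whichever characters are being substituted.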
>
> I responded because I like the challenge of this question, but I feel
> it's a terrible idea to try and solve this problem using an SCM tool.
This is a terrible problem to have to solve using any tool, because
source code is such an inherently difficult item to watermark.
Watermarking works in things like images or sound files because
there's a lot of noise where unique data can be easily hidden.
Source code on the other hand contains almost zero noise, so the only
way to hide something is to put it where no one will look (since if
they do look they'll probably see it). That said, if you have to
solve the problem, I don't know of any way to do it without involving
the SCM tool. Since the whole purpose of the SCM tool is to not
allow anonymous random changes to get into the code, you pretty much
have to let the SCM know at least something about random changes that
you do want to allow in. For instance, the watermark would be pretty
useless if its introduction were listed in the log for a given file.
-Bill
>
> -- /v\atthew
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: users-help@subversion.tigris.org
>
>
Received on Mon Aug 22 02:09:11 2005