On Tue, Feb 28, 2012 at 1:07 AM, Daniel Shahaf <danielsh_at_elego.de> wrote:
> Jason Wong wrote on Mon, Feb 27, 2012 at 07:36:39 -0800:
>> On Thu, Feb 16, 2012 at 12:14 PM, Daniel Shahaf <danielsh_at_elego.de> wrote:
>> > The output from these two tells me two things:
>> > 1. The minfo-cnt value is reasonable (within a typical ballpark).
>> > That's relevant since minfo-cnt abnormalities were seen in another
>> > instance of the bug.
>> > 2. Everything else looks correct: the 'id:'/'pred:' headers are accurate,
>> > and the 'count:' header was incremented correctly. The 'count:' header
>> > does, however, indicate that your repository has _in the past_ triggered
>> > an instance of the bug.
>> This is true. We have seen the bug happen before. The first occurrence
>> that we saw was on Dec. 7th, 2011, a few days after we went
>> from 1.6.16 to 1.7.1.
>> At the time, we did not know about the cause and the developer who
>> had encountered the error didn't report it and was able to work
> Well, install fail2ban and have it mail you when that string appears
> in the logs? I'll do so too...
>> around it. From the Apache logs we have:
>> [Wed Dec 07 15:16:36 2011] [error] [client 10.2.3.1] predecessor
>> count for the root node-revision is wrong: found 59444,
>> committing r59478 [409, #160004]
>> [Wed Dec 07 15:33:47 2011] [error] [client 10.2.3.2] predecessor
>> count for the root node-revision is wrong: found 59482,
>> committing r59516 [409, #160004]
>> [Wed Dec 07 15:35:19 2011] [error] [client 10.2.3.3] predecessor
>> count for the root node-revision is wrong: found 59488,
>> committing r59522 [409, #160004]
> As Stefan mentioned, these represent commit attempts that were rejected
> in order to prevent a new instance of the bug from entering the history.
>> [Wed Dec 07 15:44:10 2011] [error] [client 10.2.3.4] predecessor
>> count for the root node-revision is wrong: found 59505,
>> committing r59539 [409, #160004]
>> Of the IPs above, the last line is from the build machine. The others
>> were from developer workstations. I mentioned the most recent two
>> times first as we were actually aware of the issue at that time and
>> it was recent so we knew to start looking into it. Between Dec. 7 and
>> Jan. 31, the bug has occurred 12 times, 3 of those times from the
>> build server. The rest are from workstations. This month, it has only
>> occurred once and it was from the build server.
> What percentage of your commits are from the build server?
> Is there anything noteworthy about commits that were in progress around
> the time the bug occurred? (their svn:date's would be near the time
> stamp in the httpd log)
>> Each of these times, the error has occurred in different parts of
>> the repository.
>> Replies above. Sorry about the delay in replying. I have been really
>> busy of late. I will try and get the results this week, if not, it
>> will most likely be next week.
> No problem.
Not sure if I should be replying to this post or the latest post, but
here is where we currently stand.
I have had a developer here create a build of the latest SVN code
with the changes you mentioned in r1294470 for the svnadmin verify
command. We ran 'svnadmin verify' against every revision of the
hotcopy of our repository that was taken when we first brought this
issue to the forums, and we are now tracking down each of the
reported revisions to see what actions were being performed at
those times.
The results show 25 "predecessor count is wrong" error messages, the
first of which appeared on January 26, 2011. Around that time, the
following events occurred:
Jan. 14, 2011 - svn upgraded from 1.6.6 to 1.6.15
Jan. 14, 2011 - Apache HTTP server upgraded from 2.2.15 to 2.2.17
Jan. 21, 2011 - repository was pruned to delete some binary files.
Between January 2011 and our upgrade to 1.7.1 in December, we had
about 14,000 revisions and saw only 25 instances of this
node-revision issue. All of these occurred while we were running
svn 1.6.15 or 1.6.16.
From what I could find, fail2ban does not appear to have a Windows
port, and my production environment is currently hosted on Windows.
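In the absence of fail2ban, something along these lines could
probably watch for the string instead. This is only a rough sketch in
Python, not anything we are running: the pattern is taken from the
Apache log lines quoted earlier in this thread, and the names Hit,
find_hits and scan_file are made up for illustration. Whitespace is
matched loosely so that it also copes with wrapped excerpts.

```python
import re
from typing import Iterator, List, NamedTuple


class Hit(NamedTuple):
    """One 'predecessor count ... is wrong' entry (hypothetical helper type)."""
    when: str     # timestamp text exactly as it appears in the log
    client: str   # client address from the [client ...] field
    found: int    # predecessor count reported in the message
    rev: int      # revision number that was being committed


# Assumption: the message format matches the excerpts quoted in this thread.
# \s+ is used between words so that wrapped lines are matched as well.
PATTERN = re.compile(
    r"\[(?P<when>[^\]]+)\]\s+\[error\]\s+\[client\s+(?P<client>[^\]]+)\]\s+"
    r"predecessor\s+count\s+for\s+the\s+root\s+node-revision\s+is\s+wrong:\s+"
    r"found\s+(?P<found>\d+),\s+committing\s+r(?P<rev>\d+)"
)


def find_hits(log_text: str) -> Iterator[Hit]:
    """Yield one Hit per matching error entry in the given log text."""
    for m in PATTERN.finditer(log_text):
        yield Hit(m["when"], m["client"], int(m["found"]), int(m["rev"]))


def scan_file(path: str) -> List[Hit]:
    """Read a whole error log and return every matching entry."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return list(find_hits(fh.read()))
```

Running scan_file on the Apache error log from a scheduled task and
mailing any hits would roughly approximate the fail2ban suggestion;
the mail step is left out here since it depends on the local setup.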
Received on 2012-03-01 19:02:16 CET