Revision graph cache implementation

Suppose the following repository

Action Path Copy src path Copy src revision
Revision 1
A /trunk    
A /branches    
A /tags    
Revision 2
A /trunk/foo.txt    
A /trunk/bar.txt    
Revision 3
D /trunk    
Revision 4
A /branches/1.0 /trunk 2

Algorithm to build the database

This database has three tables: files, change_paths and revisions. I'm going to explain step by step how this tables are built. The revisions table stores the information about authors, comments and dates. This table won't be shown in the examples because its information it stores is not useful to understand the cache.

In each revision I'm going to show the log information and then how appear the files and change_paths tables after the update. The rows inserted or the fields updated are shown in green.

In this algorithm all the log messages are stored and processed at the same time. All branches and change paths are calculated at the same time.

Revision 1

The user creates three directories

Action Path Copy src path Copy src revision
A /trunk    
A /branches    
A /tags    

Files table

File id Revision from Revision to Path
1 1 NULL /trunk
2 1 NULL /branches
3 1 NULL /tags

Change paths table

Path Revision id Copy src revision Copy src path Action File id
/trunk 1 NULL NULL A 1
/branches 1 NULL NULL A 2
/tags 1 NULL NULL A 3

Revision 2

The user creates two files under the /trunk directory

Action Path Copy src path Copy src revision
A /trunk/foo.txt    
A /trunk/bar.txt    

Files table

File id Revision from Revision to Path
1 1 NULL /trunk
2 1 NULL /branches
3 1 NULL /tags
4 2 NULL /trunk/foo.txt
5 2 NULL /trunk/bar.txt

Change paths table

Path Revision id Copy src revision Copy src path Action File id
/trunk 1 NULL NULL A 1
/branches 1 NULL NULL A 2
/tags 1 NULL NULL A 3
/trunk/foo.txt 2 NULL NULL A 4
/trunk/bar.txt 2 NULL NULL A 5

Revision 3

The user deletes the /trunk directory.

The problem here is that the behaviour is not lineal. There is one log message in this revision but three rows are updated in each table.

Action Path Copy src path Copy src revision
D /trunk    

Files table

Every row with a path starting with /trunk is marked as deleted at the current revision number.

File id Revision from Revision to Path
1 1 3 /trunk
2 1 NULL /branches
3 1 NULL /tags
4 2 3 /trunk/foo.txt
5 2 3 /trunk/bar.txt

Change paths table

Path Revision id Copy src revision Copy src path Action File id
/trunk 1 NULL NULL A 1
/branches 1 NULL NULL A 2
/tags 1 NULL NULL A 3
/trunk/foo.txt 2 NULL NULL A 4
/trunk/bar.txt 2 NULL NULL A 5
/trunk/foo.txt 3 NULL NULL D 4
/trunk/bar.txt 3 NULL NULL D 5
/trunk 3 NULL NULL D 1

Revision 4

A branch is created. The problem here is similar to the problem when a directory is deleted. One row is inserted for each subfolder and subfile of the directory copied.

Action Path Copy src path Copy src revision
A /branches/1.0 /trunk 2

Files table

File id Revision from Revision to Path
1 1 3 /trunk
2 1 NULL /branches
3 1 NULL /tags
4 2 3 /trunk/foo.txt
5 2 3 /trunk/bar.txt
1 4 NULL /branches/1.0
4 4 NULL /branches/1.0/foo.txt
5 4 NULL /branches/1.0/bar.txt

Change paths table

Path Revision id Copy src revision Copy src path Action File id
/trunk 1 NULL NULL A 1
/branches 1 NULL NULL A 2
/tags 1 NULL NULL A 3
/trunk/foo.txt 2 NULL NULL A 4
/trunk/bar.txt 2 NULL NULL A 5
/trunk/foo.txt 3 NULL NULL D 4
/trunk/bar.txt 3 NULL NULL D 5
/trunk 3 NULL NULL D 1
/branches/1.0 4 2 /trunk A 1
/branches/1.0/foo.txt 4 2 /trunk/foo.txt A 4
/branches/1.0/bar.txt 4 2 /trunk/bar.txt A 5

Queries to build the graph

Suppose the user wants to see the graph for the /branches/1.0/bar.txt file. The user is working at revision 4.

With this row in the files table we know that the file_id for that file is: 4.

File id Revision from Revision to Path
4 4 NULL /branches/1.0/foo.txt

Now we query for all change_paths for that file_id

Path Revision id Copy src revision Copy src path Action File id
/trunk/foo.txt 2 NULL NULL A 4
/trunk/foo.txt 3 NULL NULL D 4
/branches/1.0/foo.txt 4 2 /trunk/foo.txt A 4

This result joined with the revisions table, that stores the information about author, comments and dates, gives us all the information to build the graph.