Re: [Reminder] Subversion a mentor for Google Summer of Code

From: Sachin Garg <schngrg_at_gmail.com>
Date: 2006-05-09 08:19:25 CEST

I am available to work on this if, for some reason, your proposal
doesn't get selected. If it does get selected, I will be happy to help
you with it.

Best of luck.

Sachin Garg [India]
www.sachingarg.com | www.c10n.info

On 5/8/06, Qi Fred <fred.qi@gmail.com> wrote:
>
> I have submitted a proposal for this task to Summer of Code 2006.
> My proposal follows:
> -------------------------------------
>
>
>
> Name: Qi, Fei
> Email: fred.qi@gmail.com
> IM: fred.qi@gmail.com (gtalk)
> Languages: Chinese (native);
> English (fluent in reading, writing and speaking).
>
>
> * PROJECT TITLE
> ----------------------------------------------------------------------
> Compressed or optional text base storage in Subversion
> ----------------------------------------------------------------------
>
>
> * SUMMARY
>
> In Subversion, diffs and deltas are computed offline from the
> locally cached text bases. The text bases of a working copy are
> unmodified copies of its files at the base revision. This design,
> however, roughly doubles the storage space needed on the client
> side. Two feasible ways to reduce that storage are: (a) compress
> the text bases, and (b) disable caching of text bases for some or
> all of the files in the working copy. I propose to add a text-base
> management mechanism that combines the two solutions.
>
> The following features are planned to be implemented:
>
> - By setting options in the runtime configuration files, users can
> (a) switch between original and compressed text bases, and
> (b) enable or disable caching of large binary files.
>
> - By setting a special property on a file, users can choose one of
> three caching modes: original, compressed, or excluded (caching
> disabled). Note that a text base can be excluded on the client side
> only if the file is binary.
>
>
> * DETAILS of PROJECT
>
> Compressed or optional text-base storage has been discussed in
> Subversion's development community for a long time:
> - SoC description:
> http://subversion.tigris.org/project_tasks.html
> - issue 525:
> http://subversion.tigris.org/issues/show_bug.cgi?id=525
> - issue 908:
> http://subversion.tigris.org/issues/show_bug.cgi?id=908
> These discussions provide the starting point for this proposal.
>
> ** Implementations of the Two Solutions
>
> In my opinion, the two solutions have similar effects but differ in
> essence. Using compressed text bases does NOT affect Subversion's
> working model; it only adds the runtime cost of compressing and
> decompressing the text bases, so its implementation is relatively
> straightforward. Disabling the caching of text bases, however,
> changes Subversion's working model, because comparison (diff) and
> delta generation depend directly on the text bases.
>
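The claim that compression leaves the working model untouched can be illustrated with a minimal Python sketch (not Subversion code). It assumes zlib as the compressor; the class name, method names and the '.svn-base.z' file suffix are hypothetical. Diff and delta code would only ever see the decompressed bytes returned by `read_base()`, which is why nothing else in the client would need to change.

```python
import os
import tempfile
import zlib


class CompressedTextBaseStore:
    """Hypothetical store that keeps every text base zlib-compressed on disk."""

    def __init__(self, root):
        self.root = root

    def _path(self, name):
        # Hypothetical layout: one compressed file per versioned file.
        return os.path.join(self.root, name + ".svn-base.z")

    def write_base(self, name, data):
        # Compression happens only on the way to disk.
        with open(self._path(name), "wb") as f:
            f.write(zlib.compress(data, 6))

    def read_base(self, name):
        # Callers (diff, delta generation) always get the original bytes back.
        with open(self._path(name), "rb") as f:
            return zlib.decompress(f.read())

    def stored_size(self, name):
        return os.path.getsize(self._path(name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = CompressedTextBaseStore(tmp)
        original = b"int main(void) { return 0; }\n" * 1000
        store.write_base("main.c", original)
        assert store.read_base("main.c") == original
        print(len(original), "bytes stored as", store.stored_size("main.c"))
```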
> If a file without a cached text base has been modified and is to be
> committed, there are three (or more) possible working cycles:
>
> 1) abort and warn the user
> - abort the commit process
> - prompt the user to enable caching of the corresponding file
> - the user enables caching
> - restart the commit process
>
> 2) temporarily download the base revision
> - request the base revision from the server
> - temporarily download the base revision
> - generate the deltas and commit the changes
> - remove the base file since caching is disabled
>
> 3) make Subversion work without cached text bases
> - split large binary files into small blocks of, for example, 32KB
> - store short message digests of all blocks locally
> - detect changes by comparing digests of corresponding blocks
> - send only the changed blocks to the server or request and
> download only the changed blocks to the client.
> - generate deltas and commit changes (on server or client side).
>
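Working cycle 3 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Subversion code: it uses fixed-size 32KB blocks and SHA-1 digests (chosen over MD5 because of the collision concern raised below), and the function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 32 * 1024  # the 32KB example block size from the proposal


def block_digests(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and return one SHA-1 digest per block."""
    return [
        hashlib.sha1(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(stored_digests, new_data, block_size=BLOCK_SIZE):
    """Return indices of blocks in new_data whose digest differs from the stored one.

    Blocks beyond the end of the stored list (the file grew) count as changed.
    If the file shrank, the caller can detect it by comparing block counts.
    """
    new_digests = block_digests(new_data, block_size)
    return [
        i for i, digest in enumerate(new_digests)
        if i >= len(stored_digests) or digest != stored_digests[i]
    ]
```

For example, for a 100KB file whose byte at offset 40KB changes, `changed_blocks(block_digests(old), new)` returns `[1]`, so only the second block would need to be transferred. One caveat of fixed-offset blocks: inserting a single byte near the start of the file shifts every later block, so all of them are reported as changed; rsync-style rolling checksums are the usual remedy.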
> Each of the working cycles above solves the problem introduced by
> disabling text-base caching. The first is easy to implement but
> requires inconvenient manual steps. The latter two require changes
> on both the client and the server side. The drawback of the second
> is the heavy network load during a commit; however, since large
> files seldom change, the second cycle is still feasible. The third
> raises the issue of collisions in message digest algorithms: there
> is a report of different contents producing the same MD5 digest
> (http://eprint.iacr.org/2004/199.pdf), but no collisions have yet
> been found for SHA-1. Some investigation is needed to avoid
> collisions. I prefer to implement the third working model.
>
> Based on these considerations, I suggest adding a section of
> runtime configuration options and a special property for managing
> text bases.
>
> ** Runtime Configurations for text-base Management
>
> I suggest adding a new section, 'text-base', to the runtime
> configuration options. This section provides options for managing
> text bases on the client side:
>
> - compressed: This is a boolean option (yes/no) that tells the
> Subversion client whether to cache compressed or original text
> bases. Set it to 'yes' to cache text bases in compressed format.
>
> - exclude-large-bins: This is a boolean switch (yes/no). Set it to
> 'yes' to have Subversion automatically disable caching for large
> binary files. A file is considered large if its size exceeds the
> threshold specified by the option 'exclusion-threshold'.
>
> - exclusion-threshold: This option should be a positive number. It
> is the size above which a binary file is considered large enough to
> turn off caching of its text base. The suggested default value is
> 512KB.
>
> - digest-block-size: This option specifies the size of the blocks
> that binary files are split into. It should be a positive number;
> the suggested default value is 32KB.
>
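To make the proposed options concrete, here is a minimal Python sketch of how a client might read them. Subversion's runtime 'config' file is INI-style, so the sample is parsed with `configparser`; the 'text-base' section and its options are the proposal's, not existing Subversion options, and the helper names, the defaults table and the K/M size suffixes are assumptions.

```python
import configparser

SAMPLE_CONFIG = """
[text-base]
compressed = yes
exclude-large-bins = yes
exclusion-threshold = 512K
digest-block-size = 32K
"""

# Assumed defaults, taken from the values suggested in the proposal.
DEFAULTS = {
    "compressed": "no",
    "exclude-large-bins": "no",
    "exclusion-threshold": "512K",
    "digest-block-size": "32K",
}


def parse_size(text):
    """Parse '512K', '2M' or a plain byte count into a positive integer."""
    text = text.strip().upper()
    units = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in units:
        value = int(text[:-1]) * units[text[-1]]
    else:
        value = int(text)
    if value <= 0:
        raise ValueError("size must be a positive number: %r" % text)
    return value


def load_text_base_options(config_text):
    parser = configparser.ConfigParser()
    parser.read_dict({"text-base": DEFAULTS})  # defaults first
    parser.read_string(config_text)            # user settings override them
    section = parser["text-base"]
    return {
        "compressed": section.getboolean("compressed"),
        "exclude_large_bins": section.getboolean("exclude-large-bins"),
        "exclusion_threshold": parse_size(section["exclusion-threshold"]),
        "digest_block_size": parse_size(section["digest-block-size"]),
    }


def should_exclude(options, is_binary, size):
    """A binary file larger than the threshold is not cached when the switch is on."""
    return (options["exclude_large_bins"] and is_binary
            and size > options["exclusion_threshold"])
```

With the sample above, `should_exclude(opts, True, 600 * 1024)` is true, while a 600KB text file, or a 100KB binary one, would still be cached.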
> ** Special Property for text-base Management
>
> A special property, 'svn:text-base', is proposed. It indicates how
> Subversion stores the text base of the corresponding file. Its value
> can be one of the following:
>
> - original: This causes Subversion to store the corresponding text
> base in its original format.
>
> - compressed: This causes Subversion to store the text base in
> compressed format.
>
> - excluded: This causes Subversion to work without a cached text
> base. This value applies only to binary files.
>
>
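The interaction between the property and the runtime options is not fully specified above, so the following Python sketch assumes one possible rule: a valid 'svn:text-base' value overrides the runtime configuration, except that 'excluded' on a non-binary file is ignored. The enum and function names are hypothetical.

```python
from enum import Enum


class TextBaseMode(Enum):
    ORIGINAL = "original"
    COMPRESSED = "compressed"
    EXCLUDED = "excluded"


def effective_mode(prop_value, is_binary, compressed_by_default, exclude_large):
    """Choose the caching mode for one file (hypothetical precedence rule).

    prop_value: value of the proposed 'svn:text-base' property, or None if unset
    is_binary: whether the file is treated as binary
    compressed_by_default: the proposed 'compressed' runtime option
    exclude_large: True if 'exclude-large-bins' is on and the file
        exceeds 'exclusion-threshold'
    """
    if prop_value is not None:
        # Raises ValueError for anything other than the three proposed values.
        mode = TextBaseMode(prop_value.strip())
        # Assumption: 'excluded' is ignored on non-binary files.
        if is_binary or mode is not TextBaseMode.EXCLUDED:
            return mode
    if is_binary and exclude_large:
        return TextBaseMode.EXCLUDED
    if compressed_by_default:
        return TextBaseMode.COMPRESSED
    return TextBaseMode.ORIGINAL
```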
> * SCHEDULE
>
> This summer, my main task is to finish my Ph.D. dissertation.
> According to my plan, I can work on this project 3~4 hours a day,
> 4~5 days a week (about 12~20 hours per week). My detailed schedule
> follows ('+' indicates a milestone):
>
> May 22:
> - commence with project.
> W01 (May 22 ~ May 28):
> - communicate with mentors to confirm the proposal and goals
> - read the related code and documentation in Subversion
> W02 (May 29 ~ Jun. 4):
> - sketch the framework of text-base management
> - prepare test cases
> - implement the user interface
> W03 (Jun. 5 ~ Jun. 11):
> - implement the compressed IO based on svn_stream_compressed()
> - add logging support
> W04 (Jun. 12 ~ Jun. 18):
> - implement compressed text-base support in the checkout/update
> commands
> W05 (Jun. 19 ~ Jun. 25):
> - implement compressed text-base support in the commit/diff commands
> +W06 (Jun. 26 ~ Jul. 2): (Mid-program evaluations, Jun. 30)
> - finish the compressed text bases management
> - commence the working model without cached text bases
> W07 (Jul. 3 ~ Jul. 9):
> - function(s) for splitting files into blocks
> - function(s) for generating message digests of blocks of files
> (apr-util provides the MD4 and MD5 algorithms)
> W08 (Jul. 10 ~ Jul. 16):
> - comparison based on message digests of blocks
> - support in checkout/update commands
> W09 (Jul. 17 ~ Jul. 23):
> - request blocks on client side
> - receive blocks on client side
> W10 (Jul. 24 ~ Jul. 30):
> - send blocks on server side
> W11 (Jul. 31 ~ Aug. 6):
> - generation of deltas from blocks
> - finish the commit command on client side
> +WW (Aug. 7 ~ Aug. 21):
> - finish the optional caching support
> - write a final report
> - pencils down
>
>
> * Experiences with Subversion and Programming
>
> ** Experiences with Subversion
>
> I have been a Subversion user for more than a year and a half.
> Subversion is a great version control system that outperforms all
> the ones I used before I entered the world of Subversion. I am very
> familiar with its commands and configuration.
>
> When I heard of SoC 2006, I subscribed to the development mailing
> list and downloaded the Subversion source code. I have read the
> 'Hacker's Guide to Subversion' and the documentation in some of the
> header files.
>
> ** Experiences with Programming
>
> I have been using C/C++ as my main development language for more
> than eight years. Though most of my development work is done on
> Windows, I have experience developing communication programs on
> Unix/Linux.
>
> I am a good team player. I have participated in several projects;
> the three main ones are listed below (more details are available on
> my resume web page):
>
> - SportsPartner project: This project aims to track the players and
> analyze their actions in sports (soccer) games. I am the team
> leader and key algorithm developer.
>
> - NightView project: This project aims to design and implement a
> vision-based pedestrian detector to improve the safety of night
> driving. I am a consultant on this research and development project.
>
> - Microarray Image Analysis: This project aims to detect and
> quantify the intensities of spots on scanned microarray images. My
> task is to design and implement the algorithm that detects and
> recognizes the regular grid structures on such images.
>
>
> * BIOGRAPHY
>
> I received a B.Eng. from Northwestern Polytechnical University,
> Xi'an, China, in July 2000. I am now a Ph.D. candidate majoring in
> control science and engineering at the Department of Automation,
> Tsinghua University, Beijing, China. I expect to receive my Ph.D.
> degree in Jan. 2007.
>
> My resume is available at the following addresses:
> - HTML format: http://fred.qi.googlepages.com/resume.html
> - PDF format: http://fred.qi.googlepages.com/cv-qf.pdf
>
>
> * OTHER PROJECTS in SoC 2006
>
> I plan to apply for one or two other projects mentored by the Boost
> organization, but I prefer to work on this project.
> -----
> Best regards,
> Fei Qi
>
>
> On 5/8/06, Sachin Garg <schngrg@gmail.com> wrote:
> > I looked at bug ID 908, which asks for the local copy in text-base
> > to be stored compressed. I did a little digging around in the code
> > and feel it shouldn't be very hard to implement, and it will at
> > least make my life easier.
> >
> > I am not going through the Google Summer of Code program (I am no
> > longer a student either :-) but would like to implement this feature
> > (assuming no one has already started working on it).
> >
> > I am a long-time Subversion user (on Windows, TortoiseSVN) but new
> > to the Subversion code, so I will need some guidance if you guys want
> > me to work on this.
> >
> > Some quick questions:
> >
> > # Is libsvn_wc/ the only place where I will need to edit code, or do I
> > need to look in other directories too? Which ones?
> >
> > # Do we already have a compression library (zlib?) linked in subversion?
> >
> > # How much additional delay is this expected to cause during
> > checkouts and commits? Should I use something lightweight like zlib,
> > or would it be fine to use bzip2, which can give better compression
> > but will be slower?
> >
> > # Do we want files in text-base to be always compressed, or do we want
> > text-base compression to be optional?
> >
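The zlib-versus-bzip2 question can be answered empirically by measuring both on real working-copy files. Below is a minimal sketch using Python's standard `zlib` and `bz2` modules (bindings to the same libraries); the function name and the compression levels (6 for zlib, 9 for bzip2) are assumptions, and the results depend entirely on the input data.

```python
import bz2
import time
import zlib


def compare_codecs(data, repeats=3):
    """Return {codec: (compressed_size, best_compress_secs, best_decompress_secs)}."""
    codecs = {
        "zlib-6": (lambda d: zlib.compress(d, 6), zlib.decompress),
        "bzip2-9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        best_c = best_d = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            packed = compress(data)
            best_c = min(best_c, time.perf_counter() - start)

            start = time.perf_counter()
            unpacked = decompress(packed)
            best_d = min(best_d, time.perf_counter() - start)
        if unpacked != data:
            raise RuntimeError(name + " did not round-trip the data")
        results[name] = (len(packed), best_c, best_d)
    return results
```

Calling `compare_codecs()` on the contents of a few typical text bases gives the compressed size and best-of-three timings for each codec, which is the trade-off checkouts and commits would pay.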
> > Bug no. 525 (optional text-base storage) is slightly related; maybe I
> > can come up with a design that makes it easier to implement 525 too,
> > such as implementing text-base access as a layer that can have
> > multiple implementations:
> >
> > 1. Direct file read
> > 2. Read compressed file
> > 3. Fetch from server
> >
> >
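The layered design above can be sketched in Python as an abstract provider with one subclass per listed implementation. This is an illustration only: the class names are hypothetical, and `fetch` stands in for a repository-access call that this sketch does not define.

```python
import abc
import os
import zlib


class TextBaseProvider(abc.ABC):
    """Hypothetical interface: everything that needs a pristine copy goes through here."""

    @abc.abstractmethod
    def read(self, path):
        """Return the pristine (base-revision) contents of path as bytes."""


class DirectFileProvider(TextBaseProvider):
    # 1. Direct file read: an uncompressed copy on disk, as today.
    def __init__(self, base_dir):
        self.base_dir = base_dir

    def read(self, path):
        with open(os.path.join(self.base_dir, path), "rb") as f:
            return f.read()


class CompressedFileProvider(DirectFileProvider):
    # 2. Read compressed file: same layout, but the copy on disk is zlib data.
    def read(self, path):
        return zlib.decompress(super().read(path))


class ServerFetchProvider(TextBaseProvider):
    # 3. Fetch from server: nothing stored locally. `fetch(path, revision)`
    # is an assumed callback standing in for a real repository-access call.
    def __init__(self, fetch, revision):
        self.fetch = fetch
        self.revision = revision

    def read(self, path):
        return self.fetch(path, self.revision)
```

Diff and commit code would depend only on `read()`, so choosing a provider per file would not require changes to its callers.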
> > Another possible todo item (which runs in opposite direction from the
> > above items :-)
> >
> > Just like SVN stores text-base for local diffs, how about generalizing
> > it to store the N previous revisions and change log entries? Storing
> > additional revisions shouldn't result in too much bloat, as we can
> > probably store just the diffs, and it would let more operations be local.
> >
> > Sachin Garg [India]
> > www.sachingarg.com | www.c10n.info
>
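The idea of keeping N previous revisions as diffs can be sketched with reverse deltas: the newest revision is kept in full (as the text base is today) and each older revision is stored as edits against the one after it. This minimal Python sketch works on lists of lines using `difflib.SequenceMatcher`; the function and class names are hypothetical, and a real implementation could instead reuse Subversion's existing binary delta (svndiff) format.

```python
import difflib


def make_delta(reference, wanted):
    """Encode `wanted` as edits against `reference` (both are lists of lines).

    Runs shared with `reference` are stored as (start, end) ranges; only
    lines that differ are stored verbatim, so the delta stays small when
    the two revisions are similar.
    """
    ops = []
    matcher = difflib.SequenceMatcher(a=reference, b=wanted, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", wanted[j1:j2]))
        # "delete": lines only in `reference`; nothing needs to be stored.
    return ops


def apply_delta(reference, ops):
    """Rebuild the `wanted` lines from `reference` and a delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(reference[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class ReverseDeltaHistory:
    """Hypothetical local history: newest revision in full, older ones as deltas."""

    def __init__(self, lines, max_previous):
        self.latest = list(lines)
        self.deltas = []  # deltas[0] rebuilds the revision just before `latest`
        self.max_previous = max_previous

    def commit(self, new_lines):
        new_lines = list(new_lines)
        # Store the outgoing revision as a delta against the new one.
        self.deltas.insert(0, make_delta(new_lines, self.latest))
        del self.deltas[self.max_previous:]  # keep at most N older revisions
        self.latest = new_lines

    def get(self, back=0):
        """Return the revision `back` steps before the newest one."""
        if not 0 <= back <= len(self.deltas):
            raise IndexError("revision not kept locally")
        lines = self.latest
        for ops in self.deltas[:back]:
            lines = apply_delta(lines, ops)
        return lines
```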

Received on Tue May 9 08:20:04 2006
