To perform CVS imports into fossil we need at least the ability to parse CVS files, i.e. RCS files, with slight differences.

For the general architecture of the import facility we have two major paths to choose between. One is to use an external tool which processes a CVS repository and drives fossil through its CLI to insert the found changesets. The other is to integrate the whole facility into the fossil binary itself.

I dislike the second choice. It may be faster, as the implementation can use all internal functionality of fossil to perform the import, but it would also bloat the binary with functionality that is not needed most of the time. This becomes especially obvious if more importers are written, e.g. for monotone, bazaar, mercurial, bitkeeper, git, SVN, Arch, etc. Keeping all this out of the core fossil binary is IMHO more beneficial in the long term, also from a maintenance point of view: the tools can evolve separately. This is especially important for CVS, as the importer will have to deal with lots of broken repositories, each broken in its own way.

However, nothing speaks against looking for the parts common to all possible import tools and having these in the fossil core, as a general backend all importers may use. Something like that has already been proposed: the deconstruct|reconstruct methods. For us only reconstruct is actually important. Taking an unordered collection of files (data and manifests), it generates a proper fossil repository. With that method implemented, an import tool only has to generate the necessary collection and then leave the main work of filling the database to fossil itself. The disadvantage of this method, however, is that it gobbles up a lot of temporary space in the filesystem to hold all unique revisions of all files in their expanded form.

It might be worthwhile to consider an extension of 'reconstruct' which is able to incrementally add a set of files to an existing fossil repository that already contains revisions. In that case the import tool can be changed to incrementally generate the collection for a particular revision, import it, and so iterate over all revisions in the origin repository. This of course also depends on the origin repository itself, i.e. on how well it supports such an incremental export.

This also leads to a possible method for performing the import using only existing functionality ('reconstruct' has not been implemented yet): instead of generating an unordered collection for each revision, generate a properly set up workspace and simply commit it. This additionally requires the rm, add and update commands, to remove old files, enter new files, and point the workspace at the correct parent revision from which the new revision is derived. The relative efficiency (in time) of these incremental methods versus importing a complete collection of files encoding the entire origin repository is, however, not clear.

----------------------------------
reconstruct

The core logic for handling content is in the file "content.c", in particular the functions 'content_put' and 'content_deltify'. One of the main users of these functions is in the file "checkin.c", see the function 'commit_cmd'. The logic is clear: the newly modified files are simply stored without delta-compression, using 'content_put', and should fossil have an id for the _previous_ revision of a committed file, it uses 'content_deltify' to convert the already stored data for that revision into a delta with the just-stored new revision as origin.
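To make that flow concrete, here is a minimal, self-contained C sketch of the behaviour just described, including the "give up when a delta is not worthwhile" check discussed in more detail below. It is not fossil's actual code: commit_file(), pretend_delta_size() and the two thresholds are invented stand-ins for the real 'content_put'/'content_deltify' machinery, chosen only for illustration.

  /*
   * Sketch of the commit-time storage flow: the new revision is stored
   * plain, and the previously stored revision is turned into a delta
   * against it.  All names and thresholds are hypothetical.
   */
  #include <stdio.h>
  #include <stddef.h>

  /* One stored revision of one file, reduced to what matters here. */
  typedef struct {
      size_t size;      /* size of the expanded content            */
      int    is_delta;  /* 0 = stored plain, 1 = stored as a delta */
  } Rev;

  /* Invented thresholds, for illustration only. */
  #define MIN_SIZE  50  /* do not bother deltifying tiny files         */
  #define MIN_GAIN   4  /* delta must be at least 4x smaller than old  */

  /* Stand-in for the delta generator: reports a made-up delta size so
   * the example stays self-contained. */
  static size_t pretend_delta_size(size_t oldsize, size_t newsize)
  {
      (void)newsize;
      return oldsize / 10;          /* assume deltas compress well */
  }

  /* Commit-time flow: store the new revision plain (cf. content_put),
   * then try to convert the previous revision into a reverse delta
   * (cf. content_deltify), giving up when the file is small or the
   * delta would not save enough. */
  static void commit_file(Rev *newrev, Rev *oldrev)
  {
      newrev->is_delta = 0;                     /* the leaf stays plain */
      if (oldrev == NULL) return;               /* first revision of the file */
      if (oldrev->size < MIN_SIZE || newrev->size < MIN_SIZE) return;

      size_t dsize = pretend_delta_size(oldrev->size, newrev->size);
      if (dsize * MIN_GAIN <= oldrev->size) {
          oldrev->is_delta = 1;                 /* old revision becomes a reverse delta */
      }
      /* otherwise the old revision is left plain */
  }

  int main(void)
  {
      Rev r1 = { 4000, 0 }, r2 = { 4100, 0 };
      commit_file(&r1, NULL);   /* initial checkin */
      commit_file(&r2, &r1);    /* second checkin: r1 becomes a delta of r2 */
      printf("r1 %s, r2 %s\n",
             r1.is_delta ? "delta" : "plain",
             r2.is_delta ? "delta" : "plain");
      return 0;
  }

Running this prints "r1 delta, r2 plain", i.e. the older revision ends up stored as a delta of the newer one, which is exactly the effect described above.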
In other words, fossil produces reverse deltas, with leaf revisions stored just zip-compressed (plain) and older revisions using both zip- and delta-compression. Of note is that the underlying logic in 'content_deltify' gives up on delta compression if the involved files are not large enough, or if the achieved compression factor is not high enough. In that case the old revision of the file is left plain. The scheme can thus be called a 'truncated reverse delta'.

The manifest is created and committed after the modified files. It uses the same logic as the regular files: the new leaf is stored plain, and the storage of the parent manifest is modified to be a delta with the current manifest as origin. Further note that for the checkin of a merge result only the primary parent is modified in that way. The secondary parent, the one merged into the current revision, is not touched. I.e. from the storage layer's point of view that revision is still a leaf and its data is kept stored plain, not delta-compressed.

Now the "reconstruct" can be done like so:

- Scan the files in the indicated directory and look for a manifest.
- When a manifest has been found, parse its contents and follow the chain of parent links to locate the root manifest (the one with no parent).
- Import the files referenced by the root manifest, then the manifest itself. This can be done using a modified form of 'commit_cmd' which does not have to construct a manifest on its own from vfile, vmerge, etc.
- After that recursively apply the import of the previous step to the children of the root, and so on.

For an incremental "reconstruct" the collection of files would not be a single tree with a root, but a forest, and the roots to look for are not manifests without a parent, but manifests whose parent is already present in the repository. After one such root has been found and processed, the still unprocessed files have to be searched for further roots, and only when no more roots are found will the remaining files be considered superfluous.

We can use the functions in "manifest.c" for parsing and for following the parental chain. Hm. But we have no direct child information. So the above algorithm has to be modified: we have to scan all manifests before we start importing, and we have to create a reverse index from each manifest to its children so that we can perform the import from the root down to the leaves.
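A small, self-contained sketch of that two-pass idea follows. The Manifest struct, import_manifest() and the fixed-size arrays are invented placeholders (real code would use the parser in "manifest.c" and a modified commit path), and it covers only the non-incremental, single-root case.

  /*
   * Sketch of the modified "reconstruct" order: scan all manifests
   * first, build a reverse index from each manifest to its children,
   * then import from the root down to the leaves (breadth-first).
   * All names and sizes are hypothetical.
   */
  #include <stdio.h>

  #define MAXM 16           /* small fixed size, for illustration only */

  typedef struct {
      int id;               /* index of this manifest                  */
      int parent;           /* index of the parent manifest, -1 = root */
  } Manifest;

  /* Placeholder for "import the files referenced by this manifest,
   * then the manifest itself". */
  static void import_manifest(const Manifest *m)
  {
      printf("import manifest %d (parent %d)\n", m->id, m->parent);
  }

  /* Import all manifests root-first, using a reverse (parent ->
   * children) index built in a single scan over the collection. */
  static void reconstruct(const Manifest *all, int n)
  {
      int child[MAXM][MAXM];      /* child[p] = children of manifest p */
      int nchild[MAXM] = {0};
      int queue[MAXM], head = 0, tail = 0;

      /* Pass 1: scan everything, build the reverse index, find the root. */
      for (int i = 0; i < n; i++) {
          if (all[i].parent < 0) {
              queue[tail++] = i;                /* root: no parent */
          } else {
              int p = all[i].parent;
              child[p][nchild[p]++] = i;
          }
      }

      /* Pass 2: import from the root towards the leaves. */
      while (head < tail) {
          int cur = queue[head++];
          import_manifest(&all[cur]);
          for (int k = 0; k < nchild[cur]; k++) {
              queue[tail++] = child[cur][k];    /* children after their parent */
          }
      }
  }

  int main(void)
  {
      /* Tiny example history: 0 is the root, 1 and 2 are its children,
       * 3 is a child of 1. */
      Manifest all[] = { {0,-1}, {1,0}, {2,0}, {3,1} };
      reconstruct(all, 4);
      return 0;
  }

The same first pass would also be the natural place to detect the roots of the incremental case, i.e. manifests whose parent is already present in the repository.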