

		     arch Compared to Subversion

Note: I don't intend to either slam or misrepresent Subversion.  I
think it's an interesting project.  If there are errors in my
characterization of Subversion in this FAQ or the feature comparison
chart, please point them out so I can correct them.  In that spirit:

* RECENT CORRECTIONS

  ** The Subversion team points out that BDB databases can be easily
     replicated too, so arch's advantages regarding disaster recovery
     have been clarified.

  ** The Subversion team points out that their design is not tied
     specifically to BDB.  This has been noted.

  ** Dan Berlin points out that for local repositories, 
     Subversion does not require that you configure
     Apache with the dav module.


* RECENT ADDITIONS

  ** Because arch has distributed repositories, it is better suited
     for detached operation.  (This was pointed out by a soon-to-be
     delivered paper from the CPCMS group at Johns Hopkins SRL lab; the
     paper is about CPCMS, but the point applies equally well to arch.)


* How does arch compare to Subversion?

  Superficially, arch and Subversion have some important points in
  common.  Both systems provide repository transactions with ACID
  properties.  Both provide tree-based revisions, rather than
  file-based revisions.  Both provide history-sensitive merging
  operations.  Both support adding, removing, and renaming files and
  directories.  Both provide an inexpensive tagging operation (which,
  in both cases, is also used to create new branches).

  Yet Subversion and arch have many differences, and those differences
  are the topic of this document.  The subtopics here are:


	** Smart Servers vs. Smart Clients
   
  	   In Subversion, a lot of revision control "smarts" are
	   built into the server.  In arch, the smarts reside entirely
	   within the clients.  Therefore....
   
	*** arch is very fast
	*** arch is scalable
	*** arch servers are easy to administer
	*** arch is resilient when servers fail
	*** arch is better able to recover from server disasters


	** Trees in a Database vs. Trees in a File System

	   In Subversion, your projects are stored in a directory
   	   tree, but that tree is locked up in a Subversion-specific
   	   database: the only access to it is via Subversion's command
   	   set and web interface.  In arch, complete trees of all of
   	   your past revisions are stored (in a space efficient
   	   manner) on an ordinary file system.  Therefore:

	*** arch access to past revisions is faster
	*** with arch, you can use standard tools to access past revisions
	*** with arch, you can support any web protocols you care for
   

	** Centralized Control vs. Open Source Best Practices

	   Subversion is designed with the idea that projects should
	   reside entirely within a single repository:  You can't form
	   branches or easily take advantage of Subversion's
   	   capabilities unless you are granted write access to that
   	   database.  Subversion provides low level operations for 
	   revision control, but little beyond that.

	   arch is designed with the idea that in the real world,
	   especially the world of Open Source processes, projects
	   transcend organizational boundaries and are developed
	   asynchronously at sites that are only loosely coordinated.
	   Arch facilitates inter-organizational development in two
	   ways: with distributed repositories and with high-level
	   revision control operations that reflect the way
	   programmers work.

   	*** arch knows how to support development on branches
	*** arch has strong support for detached operation
	*** arch automates ChangeLog maintenance
	*** arch facilitates extensive change reporting
	*** arch supports process automation
   

	** arch is Smaller and Simpler

	   Small is beautiful, especially when it does more than Big.
  
	*** arch depends on fewer external projects


Here are the details:


   
** Smart Servers vs. Smart Clients

   In Subversion, a lot of the "smarts" of revision control are built
   into the server.  In arch, the server is a relentlessly dumb file
   system, possibly accessed via ordinary FTP.

   In Subversion, all branches must reside on a single server.  In
   arch, some branches can be on one server, others on other servers,
   and still others stored locally.

   These differences have many implications.  

*** arch is very fast

   Because so much work is done on the client side, where arch caches
   information about past revisions in an optimal format, many
   operations in arch are blindingly fast.  For example, checking out
   a past revision from your local cache takes about 2x the time it
   takes to recursively copy the files in that revision.  Retrieving
   an individual file from a past revision is essentially
   instantaneous.  Computing a patch set against a past revision is
   as fast as computing a patch set against an ordinary directory.


*** arch is scalable

   As the number of programmers working on a project increases, the
   demands on the revision control system increase somewhere between
   linearly and exponentially (consider, for example, the number of
   subsets of programmers who may be cooperating on particular sets of
   changes).  
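
   To make the upper end of that range concrete, here is a small
   sketch (illustrative arithmetic only; the function name is made up
   for this FAQ) that counts the groups of two or more programmers
   who could be cooperating on a set of changes:

```python
def collaborating_groups(n: int) -> int:
    """Number of subsets of n programmers that contain at least two people."""
    # There are 2**n subsets in total; drop the empty set and the n singletons.
    return 2 ** n - n - 1


for n in (2, 5, 10, 20):
    print(n, collaborating_groups(n))
```

   Ten programmers already allow 1013 such groups, and twenty allow
   more than a million, while the head count itself has only doubled.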

   In Subversion, the entire revision control burden of
   this increase is placed on a single server, and a single server
   administrator.  

   In arch, the repositories and branches are distributed, allowing
   the number of servers and server administrators to grow as needed
   to keep up with development.

   As a simple example, consider the problem of preparing difficult
   merges for the "trunk" line of a particular project.  If there are
   conflicts to resolve, and the merge takes time, a new problem
   arises: the "trunk" line will (or should) continue to change even
   while the merge is being prepared.  Now suppose that, in a large
   project, several different merges are being prepared
   simultaneously.  Each developer working on a merge needs the full
   facilities of the revision control system to do their work: they
   should be able to update against the changing "trunk".  They should
   be able to form revisions of their partially complete merges.  In
   some situations, two developers working on separate merges should
   be able to form new branches in which they combine their merge
   efforts before adding them to the trunk.  In Subversion, all of
   those developers will need write access to the shared, central
   repository.  Administering access control to the sub-trees on that
   server, therefore, becomes critical.  Up-time and availability for
   that server are critical: if it goes off-line, all of the
   developers are stalled.  Performance of that server is critical: if
   it becomes bogged down computing diffs, all the developers suffer.

   But with arch, very few developers need write access to the shared
   server.  The others can use their own repositories.  Access control
   administration for the shared server is simplified and if it goes
   off-line for some reason, developers can still get useful work
   done.  The shared server does nothing more than exchange files over
   FTP: a task that is not particularly resource intensive.


*** arch servers are easy to administer

   An arch server is an ordinary FTP daemon.  

   arch does not require you to administer a Berkeley (or other)
   database just to set up a repository.  Rather than relying on a
   complex, general purpose transaction engine, arch implements (in a
   simple way) a very particular set of specialized transactions that
   are exactly right for updating a revision repository.  arch
   implements its transactions on top of ordinary unix system calls
   such as `mkdir' and `rename'.  You'll never have to rebuild indexes
   or kick-start wedged database servers just to keep arch running.
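
   As a rough illustration of the idea, a commit can be made atomic
   with nothing more than `mkdir' and `rename'.  This is a sketch
   under stated assumptions, not arch's actual archive layout or
   locking protocol: the `+staging-' prefix, the flat file layout,
   and the function name are all invented for the example.

```python
import os
import tempfile


def commit_revision(archive: str, name: str, files: dict) -> bool:
    """Publish revision `name` atomically; return False if it cannot be claimed."""
    staging = os.path.join(archive, "+staging-" + name)  # hypothetical naming
    final = os.path.join(archive, name)
    try:
        # mkdir either creates the directory or fails, so it doubles as a
        # lock: only one committer can claim a given revision name.
        os.mkdir(staging)
    except FileExistsError:
        return False
    if os.path.exists(final):
        # The revision was already published by an earlier commit.
        os.rmdir(staging)
        return False
    for filename, text in files.items():
        with open(os.path.join(staging, filename), "w") as f:
            f.write(text)
    # rename publishes the complete revision in a single step.
    os.rename(staging, final)
    return True


archive = tempfile.mkdtemp()
first = commit_revision(archive, "patch-1", {"log": "first change\n"})
second = commit_revision(archive, "patch-1", {"log": "duplicate\n"})
print(first, second)
```

   Readers only ever look under the final name, so they see either no
   revision or a complete one.  A crash can leave a stale staging
   directory behind, which a real implementation would have to detect
   and clean up.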



*** arch is resilient when servers fail

   When a Subversion server crashes or is otherwise inaccessible,
   nobody has access to old revisions until the server is back up
   again and nobody can perform any commits.  When an arch server
   crashes, everybody actively working on the project still has
   extensive access to past revisions and, importantly, can perform
   commits to private branches -- only commits to shared branches have
   to wait for an arch server to return.  More precisely, only commits
   to the subset of shared branches residing on a particular server
   have to wait -- commits to shared branches residing elsewhere can
   continue.


*** arch is better able to recover from server disasters

   Because arch remote repositories are:

   	1. Easily replicated (as are Subversion's)
	2. Trivially administered
	3. Not resource intensive

   creating repository mirrors is a lightweight operation that doesn't
   impose a significant cost on mirror sites.  You can set up a bunch
   of geographically scattered arch mirrors without having to
   configure or administer BDB at those mirror sites, and without
   having to reserve lots of disk space for log files at those sites.
   Thus, the cost of insuring yourself against server disasters is
   lower for arch.



** Trees in a Database vs. Trees in a File System

   In Subversion, all of your past revisions are "locked up" in a 
   custom database built over layers such as the Berkeley DB or a 
   relational database.  (The Subversion team is quick to point out
   that their back-end storage manager is designed to be an
   exchangeable component -- thus, future versions of Subversion may
   relax this constraint.)

   In arch, the official copies of your revisions, created by commit
   transactions and similar operations, are stored as ordinary
   compressed tar files of revision source and revision patch sets.
   In a pinch, you could even reconstruct one of those past revisions
   "by hand", using nothing more than ordinary shell tools.

   Additionally, in arch, many of your past revisions are cached as
   complete, ordinary file system trees -- containing literal copies
   of the revisions (but stored in a space efficient way).

   Therefore:

*** arch access to past revisions is faster

   You can instantly find the root of a file system tree containing a past
   revision from your cache.  This makes many of arch's built-in
   operations very fast, and:

*** with arch, you can use standard tools to access past revisions

   You can use ordinary tools, such as `find', `grep', and `diff' to
   examine past revisions.  You can browse past revisions with `ls' or
   your favorite file manager.
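
   The same holds for any general-purpose file API.  A minimal
   sketch, assuming two past revisions are available as ordinary
   directories (the helper names are hypothetical, the demo trees are
   throwaway stand-ins for real revisions, and only top-level files
   are compared):

```python
import filecmp
import os
import tempfile


def grep_tree(root: str, needle: str) -> list:
    """Relative paths of files under `root` whose text contains `needle`."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as f:
                if needle in f.read():
                    hits.append(os.path.relpath(path, root))
    return sorted(hits)


def changed_files(old_root: str, new_root: str) -> list:
    """Top-level files present in both trees whose contents differ."""
    common = sorted(set(os.listdir(old_root)) & set(os.listdir(new_root)))
    # shallow=False forces a byte-by-byte comparison of file contents.
    _same, different, _errors = filecmp.cmpfiles(
        old_root, new_root, common, shallow=False)
    return sorted(different)


# Two stand-in revision trees that differ in one file.
old = tempfile.mkdtemp()
new = tempfile.mkdtemp()
for root, body in ((old, "int main() { return 0; }\n"),
                   (new, "int main() { puts(\"hello\"); return 0; }\n")):
    with open(os.path.join(root, "main.c"), "w") as f:
        f.write(body)
    with open(os.path.join(root, "README"), "w") as f:
        f.write("unchanged\n")

print(changed_files(old, new))
print(grep_tree(new, "hello"))
```

   Nothing here knows anything about revision control: the trees are
   read with the same calls used for any other directory.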

*** with arch, you can support any web protocols you care for

   If you want to provide fast and searchable network access to the
   individual files in past revisions, because they are available as
   ordinary file system trees, you can use whatever tools and
   protocols you choose.  arch is shipped with web tools that build an
   extensive HTML-based interface to past revisions allowing you to
   examine branch/merge histories, patch sets, individual files, log
   entries and changelogs in great detail.




** Centralized Control vs. Open Source Best Practices

*** arch knows how to support development on branches

   When a project has more than a few programmers, careful control of
   the "trunk" line of revisions becomes critical.  Often, for
   example, you will only want to permit changes to the trunk once
   they have passed testing.  In the world of Open Source processes,
   where development typically crosses organizational boundaries, you
   will often have many important contributors with no (write) access
   to the trunk line at all.

   Consequently, your revision control system should provide excellent
   support for developing on "branches" off the trunk, merging changes
   from branches to trunk only when they are ready.  And those
   branches must not be constrained to reside only in the repository
   that holds the trunk line.  That's a central problem for revision
   control to solve: how to support branching and merging when most
   development takes place on branches, and the trunk is used to
   synchronize those branches.

   arch has a high-level merging command that solves the branch
   merging problem perfectly (the "star-merge" command).  That command
   recognizes the pattern of development on branches from a trunk, and
   simplifies merging by computing merge paths that avoid spurious
   conflicts.

   In addition, arch encourages developers and contributing
   organizations to create private branches in private repositories,
   where they can, for example, work out the details of complex
   changes and complex merges before touching the shared server at
   all.  Private branches are also useful for maintaining
   site-specific customizations of the trunk line.


*** arch has strong support for detached operation

   Distributed repositories permit programmers to load up their 
   laptops with sources, detach from all networks, and still commit 
   revisions with the aim of later merging those commits into a
   shared repository.  Local revision libraries and archive mirrors
   mean that detached programmers can still access shared revisions
   for purposes such as merging or study.

*** arch automates ChangeLog maintenance

   arch knows how to automatically maintain accurate, complete, and
   hyperlinked ChangeLogs -- allowing you to read the history of any
   particular branch, and how it relates to the histories of other
   branches. 


*** arch facilitates extensive change reporting

   When multiple, loosely cooperating organizations contribute to a
   project, how can they avoid chaos?  Each organization must be able
   to remain informed of the changes being made by the others.  arch
   provides an extensible event triggering system that can be used,
   for example, to send email notices as new revisions are committed.
   arch includes a web-based browser for studying the detailed history
   of the branches and revisions in any set of archives.


*** arch supports process automation

   Through an event triggering mechanism, and a fully programmatic
   interface to arch's commands, arch provides excellent support for
   process automation.  You can easily, for example, impose arbitrary,
   automatically tested pre-conditions on commits to selected
   branches.  You can automate such tasks as configuration
   reconstruction or distribution production.  Using arch's detailed
   email notification service, you can integrate repository events
   into organizational processes, such as testing routines or progress
   tracking.
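
   The shape of such a pre-condition can be sketched as follows.
   This is a hypothetical illustration, not arch's trigger interface:
   the class, the branch names, and the fields of the commit record
   are all invented for the example.

```python
from typing import Callable, Dict, List

# A check receives a description of the pending commit and returns
# True to allow it.
Check = Callable[[dict], bool]


class CommitGate:
    """Holds the pre-conditions registered for each branch (hypothetical)."""

    def __init__(self) -> None:
        self._checks: Dict[str, List[Check]] = {}

    def require(self, branch: str, check: Check) -> None:
        """Register one more pre-condition for commits to `branch`."""
        self._checks.setdefault(branch, []).append(check)

    def may_commit(self, branch: str, commit: dict) -> bool:
        # Branches with no registered checks accept every commit.
        return all(check(commit) for check in self._checks.get(branch, []))


gate = CommitGate()
# Commits to the trunk need a non-empty log entry and a clean test run.
gate.require("trunk", lambda c: bool(c.get("log", "").strip()))
gate.require("trunk", lambda c: c.get("tests_passed") is True)

print(gate.may_commit("trunk", {"log": "fix typo", "tests_passed": True}))
print(gate.may_commit("trunk", {"log": "", "tests_passed": True}))
print(gate.may_commit("private", {"log": ""}))
```

   Only the selected branch is guarded; commits to other branches,
   such as a private one, pass through unchecked.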



** arch is Smaller and Simpler

   Even though arch is more capable than Subversion (as described
   above), the implementation of arch is much smaller, and much
   simpler.

   arch is primarily a small-multiple-of-10K lines of shell, sed, and
   awk scripts -- designed to be portable to any Posix environment.
   The code is easy to understand and extend.


*** arch depends on fewer external projects

   For the most part, arch depends only on very generic, widely ported
   shell tools.  Remote use requires an FTP server, but any reasonably
   sane server implementation should work.

   arch is not based on bleeding-edge W3C standards or the libraries
   being developed to implement them.  You can adopt arch without
   inheriting fears about where the latest and "greatest" technology
   will be next year.  (Dan Berlin points out that, for local
   repositories, Subversion does not require Apache with the dav
   module.)




# tag: Tom Lord Wed Jan 30 20:16:35 2002 (=FAQS/subversion)
#
