DryDock SyncKit Guide

Deepak Giridharagopal
Principal Developer

Updated: 2003/10/23 00:55:32
Version: 1.6

Abstract

This guide shows you how to customize DryDock behaviour during synchronization. It will give a detailed description of both the SimpleSyncKit and the ARLSyncKit packages included in the distribution.


1. Reading list

First things first. Though it's not strictly necessary, it's probably a good idea to at least read the DryDock overview documentation. It'll help you immensely if you have a decent handle on what goes on during DryDock's synchronization phase.

Second, you should take a look at the configuration section of the DryDock installation guide. That will familiarize you with all of the variables at your disposal when you start writing custom sync scripts.

2. Anatomy lessons

The synchronization daemon 

When the Webware application server is started, DryDock spawns off a thread that is dedicated to synchronization (the synchronization daemon). It is this thread that is responsible for pushing out approved changes to the production web tree. When a sync is requested (either through user activity or automatically), the sync daemon goes through the following sequence of events:

  1. Copies pre-approved files. The first thing the daemon does during a sync is handle any files marked as pre-approved. It queries the database to obtain a list of pre-approved files, and then copies the current state of each into the export archive.
  2. Calls the SyncKit. The daemon then loads the configured SyncKit, and makes a call to its entry-point function (we'll talk about that in a sec).
  3. Blocks. The thread then blocks, waiting for completion of the SyncKit entry-point function.
  4. Resets the next-sync timestamp. Finally, the thread notes the time the sync was completed and sets the next mandatory sync to an hour from then.

As a developer of SyncKits, you should be primarily interested in steps 2 and 3.
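
To make the ordering concrete, here's a rough sketch of one sync pass in Python. This is not DryDock's actual daemon code: run_one_sync, start_sync_thread, and the two callables passed in are hypothetical stand-ins, and the sketch is written for a current Python 3. What DryDock does when the kit raises an error isn't covered here; in this sketch the error simply propagates.

```python
import threading
import time

SYNC_INTERVAL_SECONDS = 60 * 60  # "an hour from then", per step 4


def run_one_sync(copy_preapproved, kit_main, now=time.time):
    """Run one sync pass and return the time of the next mandatory sync.

    copy_preapproved and kit_main are hypothetical callables standing in
    for DryDock's pre-approved copy step and the SyncKit entry point.
    """
    copy_preapproved()  # step 1: pre-approved files go to the export archive
    kit_main()          # steps 2 and 3: call the kit and wait for it to return
    return now() + SYNC_INTERVAL_SECONDS  # step 4: schedule the next sync


def start_sync_thread(copy_preapproved, kit_main, results):
    """Run a single pass on a dedicated thread and record the next-sync time."""
    def worker():
        results.append(run_one_sync(copy_preapproved, kit_main))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The point to take away is that kit_main() is an ordinary blocking call: nothing after it happens until your kit returns.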

The SyncKit 

A SyncKit is a Python package that consists, at minimum, of a file called SyncMain.py. This needs to be a Python module that defines a function main(), which is the aforementioned SyncKit entry-point. So a bare-bones SyncKit would look like this:

Code listing 2.1: A bare-bones SyncKit named 'BareBonesSyncKit'

// These are the files that appear in the Kit
# cd BareBonesSyncKit  
# ls
SyncMain.py  __init__.py

# cat SyncMain.py
def main():
  pass

Important: The __init__.py is REQUIRED. It defines the directory as a Python package. Creating the file is as simple as doing a touch __init__.py.

When this kit is invoked by the synchronization daemon, it will just immediately return control back to the sync daemon. Of course, this isn't interesting at all, but now you can make some sense of the example kits shipped with the DryDock distribution.
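
If you're wondering what "invoked" might look like from the other side, here's a minimal sketch that builds the bare-bones kit in a scratch directory and calls its entry point. The helper names (make_bare_bones_kit, load_and_run_kit) are made up for this example, and DryDock's real loader may well work differently; the sketch assumes a current Python 3.

```python
import importlib
import os
import sys


def make_bare_bones_kit(parent_dir, kit_name="BareBonesSyncKit"):
    """Create the two files of a bare-bones kit under parent_dir."""
    kit_dir = os.path.join(parent_dir, kit_name)
    os.makedirs(kit_dir)
    # An empty __init__.py marks the directory as a Python package.
    open(os.path.join(kit_dir, "__init__.py"), "w").close()
    with open(os.path.join(kit_dir, "SyncMain.py"), "w") as handle:
        handle.write("def main():\n  pass\n")
    return kit_dir


def load_and_run_kit(parent_dir, kit_name):
    """Import <kit_name>.SyncMain from parent_dir and call its main()."""
    if parent_dir not in sys.path:
        sys.path.insert(0, parent_dir)
    importlib.invalidate_caches()  # pick up files created after startup
    module = importlib.import_module(kit_name + ".SyncMain")
    return module.main()
```

Calling load_and_run_kit() on the bare-bones kit returns None straight away, which is exactly the "immediately return control" behaviour described above.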

3. Dissecting SimpleSyncKit

The SimpleSyncKit is designed to simply copy DryDock's production web tree to another location in a manner that's somewhat atomic. The kit manages two directories. A directory called live_web holds a copy of DryDock's production web tree. Every time a sync is executed, the script copies the current production web tree into live_web. The second directory, old_web, contains the production web tree from the last synchronization. You know, as a backup.

So during a sync, SimpleSyncKit takes DryDock's image of the production web tree (the location of this image is defined by the EXPORT_DIRECTORY variable in your DryDock Config.py file) and copies it to a temporary location. It then takes the current live_web and renames it old_web (making it the backup copy). Finally, it renames the temporary directory to live_web.

You can set up Apache to serve files from live_web. This way, it always serves up the most up-to-date content.

Note: Why not just point Apache at your EXPORT_DIRECTORY? Partly because if DryDock's export directory gets hosed, DryDock won't work correctly. But mostly it's just for the sake of the example. :)

So let's take a look at how this gets accomplished. First, there's the SyncMain.py file (remember how every SyncKit needs one?).

SyncMain.py 

Code listing 3.1: SyncMain.py

from DryDock.Config import Config
from DryDock.Sync import SyncUtils

def main():
  result = SyncUtils.spawn_sync_process( Config.APPDIR +
      '/DryDock/Config/SyncKit/sync.sh' )

  if result != 0:
    raise SyncUtils.SyncError( "sync.sh returned an error code of %s" % result )

There's nothing terribly complicated going on here. The only interesting bit happens inside main(). When this kit is invoked by the sync daemon, it executes a file called sync.sh. If sync.sh returns a non-zero return code, then an error is raised. Notice that this function really doesn't do anything at all...all of the work is done in this mysterious sync.sh shell script, which we'll dissect in a second.

So what is this spawn_sync_process function that executes sync.sh? Well, you pass the function the path to a script, and it executes that script. Okay, well what's so special about that? The function takes all of the variables in your DryDock Config.py file and turns them into environment variables that you can use in the script that is executed. This means that you don't have to hard-code directory names or the like into the script. To get a feel for how this is used, let's look at the guts of sync.sh.
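
To illustrate the idea (and only the idea), here's a sketch of how a function like this could export config values as prefixed environment variables before running a script. It is not the real SyncUtils code: build_sync_env and spawn_script are hypothetical names, the config is assumed to be a plain dictionary, and the sketch targets a current Python 3.

```python
import os
import subprocess


def build_sync_env(config_vars, base_env=None):
    """Return a copy of the environment with each config variable added as DD_<NAME>."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in config_vars.items():
        # The DD_ prefix keeps config values from clobbering existing variables.
        env["DD_" + name] = str(value)
    return env


def spawn_script(script_path, config_vars):
    """Run script_path with the DD_ variables set and return its exit code."""
    completed = subprocess.run([script_path], env=build_sync_env(config_vars))
    return completed.returncode
```

With a config of {"WORKING_DIR": "/var/drydock/work"}, the script would see DD_WORKING_DIR set to /var/drydock/work, while any WORKING_DIR variable already in the environment is left alone.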

sync.sh 

Here it is in all its annotated glory:

Note: When reading the script below, you'll see references to variables that start with $DD_. These variables are ones automatically generated from your Config.py file. For example, the variable $DD_WORKING_DIR contains the value of the WORKING_DIR variable you set in your Config.py file. The config variables start with DD_ so they won't overwrite any custom environment variables you've set up that might have the same name.

Code listing 3.2: sync.sh

#! /bin/sh
# live_web will hold a copy of the production web tree
WEB_LIVE_DIR=${DD_WORKING_DIR}/live_web
# old_web will hold a copy of the production web tree from
# the previous sync
WEB_OLD_DIR=${DD_WORKING_DIR}/old_web
# temp_web is, you guessed it, a temporary directory
WEB_TEMP_DIR=${DD_WORKING_DIR}/temp_web

# Where we'll be logging the output of commands to
LOGFILE=${DD_LOG_DIRECTORY}/simple_sync_kit.log

# OK, here we go!

# Write the time to the log file specified above
echo  Start: `date` >>"$LOGFILE" 2>&1

# Change into the export directory
cd "$DD_EXPORT_DIRECTORY" >>"$LOGFILE" 2>&1
# If the 'cd' failed, then note it in the log and exit
if [ $? != 0 ]; then
  echo  "Error changing into export directory" >>"$LOGFILE"
  exit 1
fi

# If 'temp_web' or 'old_web' have lingered from the last
# sync, delete them
rm -rf "$WEB_OLD_DIR" "$WEB_TEMP_DIR" >>"$LOGFILE" 2>&1

# Copy the export tree over to the temp directory
cp -Rv "$DD_EXPORT_DIRECTORY" "$WEB_TEMP_DIR" >>"$LOGFILE" 2>&1
# If the copy failed, then note it in the log and exit
if [ $? != 0 ]; then
  echo  "Error copying export directory" >>"$LOGFILE"
  exit 2
fi

# Rename the current live_web directory to old_web
if [ -w "$WEB_LIVE_DIR" -o -f "$WEB_LIVE_DIR" -o -L "$WEB_LIVE_DIR" ]; then
  mv "$WEB_LIVE_DIR" "$WEB_OLD_DIR" >>"$LOGFILE" 2>&1
  # If the move failed, then note it in the log and exit
  if [ $? != 0 ]; then
    echo  "Error renaming live_web" >>"$LOGFILE"
    exit 3
  fi
fi

# Make the temp directory (which holds the current production tree)
# the new live_web dir
mv "$WEB_TEMP_DIR" "$WEB_LIVE_DIR" >>"$LOGFILE" 2>&1
# If the move failed, then note it in the log and exit
if [ $? != 0 ]; then
  echo  "Error updating live_web" >>"$LOGFILE"
  exit 4
fi

# Make a note in the log that we've finished
echo  Done: `date` >>"$LOGFILE"

# Exit with a return code of 0. Remember that SyncMain.py
# checks the return code and anything other than a '0' will tell
# the sync daemon that something went wrong and to react accordingly.
exit 0

It looks way more complicated than it actually is. Most of the code is just error checking, which is tedious but vital. In any case, that's all there is to SimpleSyncKit.
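
If shell isn't your thing, the same copy-then-rename dance can be sketched in Python. This is an illustrative equivalent, not code shipped with DryDock: rotate_web_tree is a made-up name, and the two directory arguments stand in for the EXPORT_DIRECTORY and WORKING_DIR config values. It assumes a current Python 3.

```python
import os
import shutil


def rotate_web_tree(export_dir, working_dir):
    """Copy export_dir to live_web, keeping the previous copy as old_web."""
    live = os.path.join(working_dir, "live_web")
    old = os.path.join(working_dir, "old_web")
    temp = os.path.join(working_dir, "temp_web")

    # Clear anything that lingered from the last sync.
    for leftover in (old, temp):
        shutil.rmtree(leftover, ignore_errors=True)

    # The slow part (copying) happens off to the side in temp_web...
    shutil.copytree(export_dir, temp)

    # ...so the visible switch is just two quick renames.
    if os.path.lexists(live):
        os.rename(live, old)
    os.rename(temp, live)
    return live
```

Doing the copy first is what makes the swap "somewhat atomic": live_web is only missing for the brief moment between the two renames. Errors surface as Python exceptions here rather than as exit codes.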

4. Customizing SimpleSyncKit

What's involved? 

For 99% of people out there, you can just take SimpleSyncKit and hack the sync.sh script to tailor it to your own environment. You don't even have to make it a shell script: you can alter the SyncMain.py file to call a .pl file instead and write your sync routines in Perl. The only restriction is that whatever your script is called, it must have the executable bit (+x) set.
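
A forgotten chmod +x is an easy mistake to make, so you may want to check for it before handing the script off. Here's a small sketch of such a check; check_sync_script is a hypothetical helper, not part of DryDock, written for a current Python 3. It only looks at the permission bits on the file, so it can't tell you whether the Webware user in particular is allowed to run it.

```python
import os
import stat

ANY_EXECUTE_BIT = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def check_sync_script(script_path):
    """Return None if script_path looks runnable, else a short error message."""
    if not os.path.isfile(script_path):
        return "%s does not exist or is not a regular file" % script_path
    if not os.stat(script_path).st_mode & ANY_EXECUTE_BIT:
        return "%s is missing the executable bit (try chmod +x)" % script_path
    return None
```

You could call this at the top of main() and raise an error with the returned message, which gives you a clearer failure than whatever the operating system reports when it refuses to run the file.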

Example: rsync to a remote machine 

If you read the DryDock Best Practices Guide, you'll see that you can get the most network security out of DryDock if you serve up production content from a machine separate from the one DryDock runs on. But in order to do so, we'll have to hack our SimpleSyncKit slightly.

In the following code, we'll use Rsync to copy the files over to a directory web on a remote machine called www-external.

Code listing 4.1: SyncKit that uses rsync

#! /bin/sh
LOGFILE=${DD_LOG_DIRECTORY}/simple_sync_kit.log

echo  Start: `date` >>"$LOGFILE" 2>&1

# Mirror the export directory to the remote machine
rsync -az -e ssh --delete "$DD_EXPORT_DIRECTORY" www-external:"web" >>"$LOGFILE" 2>&1
if [ $? != 0 ]; then
  echo  "Error mirroring export directory" >>"$LOGFILE"
  exit 1
fi

echo  Done: `date` >>"$LOGFILE"
exit 0

Important: This won't work out-of-the-box. Since we're telling rsync to use SSH as its transport, you'll have to set up both the local machine and www-external to let your Webware user log in via SSH without a password.
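
If you'd rather skip the shell script and drive rsync straight from Python, something along these lines might do. Treat it as a sketch under assumptions: build_rsync_command and mirror_export are made-up names, the host and directory defaults are just the ones from the example above, and the code targets a current Python 3. The one addition is OpenSSH's BatchMode=yes option, which makes ssh fail outright instead of prompting when key-based login isn't set up.

```python
import subprocess


def build_rsync_command(export_dir, remote_host, remote_dir):
    """Build the rsync argument list from the example above, plus BatchMode."""
    return [
        "rsync", "-az",
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        "-e", "ssh -o BatchMode=yes",
        "--delete",
        export_dir,
        "%s:%s" % (remote_host, remote_dir),
    ]


def mirror_export(export_dir, log_path, remote_host="www-external", remote_dir="web"):
    """Run rsync, appending its output to log_path; return rsync's exit code."""
    with open(log_path, "a") as log:
        return subprocess.call(
            build_rsync_command(export_dir, remote_host, remote_dir),
            stdout=log, stderr=subprocess.STDOUT,
        )
```

Passing the command as a list rather than a single string means paths with spaces don't need any extra quoting.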

For a more elaborate example, you can take a look at ARLSyncKit, which is included in the DryDock distribution. It contains the scripts we use here at the laboratory to sync up our web servers.

Pitfalls 

By and large, customizing your sync behaviour is simple (no, really!). Pretty much anything you can do in a shell, you can wrap up into a sync script. But, of course, there are a few things to keep in mind:

  1. The sync daemon blocks. Remember that the sync daemon waits patiently for your SyncKit to finish before it can continue. So if your sync script uploads files to your web server via a 300 baud modem or something, be aware that you're holding up the line. Your shell script can always fork and exec a long-running task, if necessary.
  2. No user interaction. If you're using spawn_sync_process to launch a shell script (which you probably are), keep in mind that the script is executed behind-the-scenes, with no user interaction allowed. If you accidentally put in a command that prompts for a password, the sync thread will hang. If you're using SSH, use a pre-shared keypair. Or use expect or the like.
  3. The sync kit is executed as the Webware user. Keep this in mind when you're writing your scripts...they will be run as the user defined in WEBWARE_IDENTITY in your Config.py file.
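
One way to guard against the first two pitfalls, if you launch commands from Python yourself, is to close standard input and put a time limit on the command. The following is a sketch, not something DryDock provides: run_without_prompts and SyncTimeout are hypothetical names, and it assumes a current Python 3.

```python
import subprocess


class SyncTimeout(Exception):
    """Raised when the sync command does not finish in time."""


def run_without_prompts(command, timeout_seconds):
    """Run command with no stdin and a time limit; return its exit code."""
    try:
        completed = subprocess.run(
            command,
            stdin=subprocess.DEVNULL,  # reads from stdin hit end-of-file at once
            timeout=timeout_seconds,   # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        raise SyncTimeout(
            "%r did not finish within %s seconds" % (command, timeout_seconds)
        )
    return completed.returncode
```

Closing stdin only helps with commands that read their prompt from standard input. Tools that talk to the terminal directly, as ssh does for passwords, are only caught by the timeout, so a pre-shared keypair is still the right fix there.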

Serving up production web content 

Once you've figured out where you'll be syncing your production web tree to (be it a different directory on the same machine, or (ideally) a second machine entirely), you'll need to configure your web server to point at this new location.

So, for example, if you sync to a different machine, then you'll need to have a web server running on that box that points to the directory you sync to.

Testing 

An easy way to test your SyncKit is to add the following to the bottom of your SyncMain.py file:

Code listing 4.2

if __name__ == "__main__":
  main()

Then, to run your SyncKit from the command line:

Code listing 4.3

// Replace "path_to_DryDock" with the path to your DryDock source directory.
# export PYTHONPATH=/path_to_DryDock/
# cd /path_to_DryDock/DryDock/Config/YourSyncKit
# python SyncMain.py
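
When you run a kit by hand like this, it can be handy to get a proper exit status back in the shell. Here's one possible wrapper, offered as a sketch: run_kit_for_testing is a made-up helper, not part of DryDock, and it assumes a current Python 3.

```python
import traceback


def run_kit_for_testing(main):
    """Call a SyncKit main() and turn the outcome into a shell exit status."""
    try:
        main()
    except Exception:
        # Show what went wrong, then report failure to the shell.
        traceback.print_exc()
        return 1
    print("sync finished without errors")
    return 0
```

In the __main__ block at the bottom of SyncMain.py you would then call sys.exit(run_kit_for_testing(main)) instead of calling main() directly (with import sys at the top of the file), so that echo $? in the shell tells you whether the sync worked.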

5. Resources

At this point, you should be all ready to do some mad customizin'. If you've got problems or didn't understand what the hell I've been talking about in this guide, don't hesitate to drop an email to the DryDock users mailing list. We're all nice people on there. :)



Outline

1. Reading list

2. Anatomy lessons
- The synchronization daemon
- The SyncKit

3. Dissecting SimpleSyncKit
- SyncMain.py
- sync.sh

4. Customizing SimpleSyncKit
- What's involved?
- Example: rsync to a remote machine
- Pitfalls
- Serving up production web content
- Testing

5. Resources

