A DryDock history lesson
Deepak Giridharagopal, Principal Developer
Updated: 2003/10/23 00:55:32
Version: 1.3
Abstract
This guide describes the failings we (ARL) experienced with our old web publishing system. It also discusses the motivations behind DryDock's conception and some of the technology choices we made. For an overview of how DryDock works, see the DryDock Overview Guide instead.
1. The problem, in a nutshell
Note: This information parrots the content found in the 2003 LISA Conference paper on DryDock.
Information is valuable. While nearly all organizations go to great expense to protect their networks, a much smaller percentage have formal safeguards against the accidental dissemination of sensitive information. In a web server environment where many users can update different parts of a web site at once, auditing the server for inappropriate content becomes an increasingly difficult system administration task. Most system administrators can't easily determine why a particular file exists on a web server, or who in management authorized its publication. How can administrators be expected to safeguard information if they can't tell which documents are fit for the public and which ones aren't?
This was the situation in our organization, Applied Research Laboratories, The University of Texas at Austin (ARL:UT).
2. The situation at ARL
Since 1994, ARL:UT has had a publicly accessible web server. Initially, we served simple, static pages. There was relatively little web server traffic, and, like many organizations at the time, we didn't concentrate on the security of our network or our information. Our web server resided on our internal network with full access to our file server and other intranet resources, and employees were trusted to only serve documents that were appropriate for public viewing.
By 2001, our web presence had grown in both traffic and size by several orders of magnitude. Our site's much larger scale made it impossible for administrators to effectively police its content. Our formal publishing policy and guidelines were conceived for paper documents, not web pages. Their checks and balances were inadequate when applied to our web architecture, which allowed users to publish pages without even a cursory review. Many staff members were unaware that web pages even fell under these guidelines. We found ourselves unable to track who published specific files and who had deemed those files fit for the public. Since much of ARL:UT's research is sensitive and proprietary, we needed to strictly regulate the flow of information from inside our organization onto our public web server.
To ensure that only material suitable for public viewing appeared on our web site, we needed to force documents to undergo an approval process -- only files that successfully completed the process would move to the web server. Furthermore, for any publicly viewable file, we needed a way to determine who authorized its publication, when the file was published, and for what reason. We needed this information not only for currently published files, but for all previous versions as well. A web publishing system that provided these features would enforce our information security policies by ensuring that publicly viewable content is acceptable and well accounted for.
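These requirements can be pictured as a small data model. The following Python sketch is purely illustrative and is not DryDock's actual implementation: every class, field, and method name in it is hypothetical, and tying an approval to a SHA-256 digest of the file's content is an assumption made here so that an edited file needs a fresh approval.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Approval:
    """One approved version of a file (all field names are hypothetical)."""
    path: str
    version: int
    digest: str            # SHA-256 of the approved content
    approver: str          # who authorized publication
    reason: str            # why it was published
    approved_at: datetime  # when it was approved


@dataclass
class PublicationLog:
    """Append-only record of approvals, keyed by file path."""
    _history: dict = field(default_factory=dict)

    def approve(self, path, content, approver, reason, when=None):
        """Record a new approved version of `path` and return the record."""
        versions = self._history.setdefault(path, [])
        record = Approval(
            path=path,
            version=len(versions) + 1,
            digest=hashlib.sha256(content).hexdigest(),
            approver=approver,
            reason=reason,
            approved_at=when or datetime.now(timezone.utc),
        )
        versions.append(record)
        return record

    def current(self, path):
        """Return the latest approval for `path`, or None if there is none."""
        versions = self._history.get(path)
        return versions[-1] if versions else None

    def history(self, path):
        """Return every approval for `path`, oldest first."""
        return list(self._history.get(path, []))

    def is_publishable(self, path, content):
        """True only if `content` matches the latest approved version."""
        latest = self.current(path)
        if latest is None:
            return False
        return latest.digest == hashlib.sha256(content).hexdigest()
```

In this sketch, a file that was never approved, or that changed after its last approval, is not publishable, and `history()` answers the "who, when, and why" questions for earlier versions as well as the current one.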
3. Why we rolled our own solution
In late 2001, we searched for tools that would put our new web publishing plan into service. Of the countless managed web publishing solutions on the market at the time, we found none that, out-of-the-box:
- implemented a role-based approval process appropriate for our organization's managerial structure
- gave us the thorough versioning and auditing capabilities we needed
- didn't mix approved and unapproved content on our public web server
- had a friendly, web-based user interface that managers could understand
- was minimally intrusive for both our system administrators and our web developers
- used technologies we were already familiar with
Sweeping, monolithic content management systems such as Vignette StoryServer were inappropriate for our environment. ARL:UT comprises many autonomous groups that have their own web development methods and practices. While the need for a formal information security process was universally recognized, such a system was both politically infeasible and far too expensive.
Portal and weblog systems such as PostNuke, Tiki, or Plone focus on creating highly dynamic, interactive web sites. Thus, they frequently offer collaborative features such as article syndication, Wiki systems, forums, and user commentary. At ARL:UT, however, we needed a strict web presence that was decidedly static and non-collaborative, which made many of these systems' features unnecessary. All we wanted was software that would allow approved documents through to the web server while blocking all unauthorized updates. All of these packages required so much customization that it was easier for us to build our own solution, tailored specifically to our environment.
4. Technologies we liked
Having resigned ourselves to writing a custom tool, we began looking at platforms upon which we could base our application. We were particularly interested in the Zope application server. Written in Python, Zope has been used to build many complex and dynamic web sites. Its features include user management, web-based administration, searching, clustering, and syndication. As with the aforementioned packages, however, a great deal of Zope's dynamic componentry was of no use to us, and much of Zope's functionality fell far outside the scope of simple publication oversight. Though these issues weren't intractable, when combined with Zope's steep learning curve, they led us to look at other, less complex and less ambitious platforms.
We settled on WebKit, the Webware for Python application server. WebKit uses a design pattern fashioned after Sun's Java Servlets architecture (a paradigm we were familiar with), and includes little extraneous, dynamic componentry we'd have to work around. We could implement all of the heavy-lifting functionality in plain Python and use a small number of servlets to expose a web interface -- we found such an architecture much more workable with WebKit than with Zope.
Using WebKit, we devised an application that would give us the security and auditing features we required. Several months later, we put DryDock into production. DryDock has been managing our web publishing for over a year and a half now.
5. Resources
For information on exactly how DryDock solved these problems, see the DryDock overview.