Crash Recovery for Unattended Computers

We increasingly rely on computers set up as servers to provide web pages, software downloads, and other files at any hour of the day or night. We also rely on computers to perform mundane tasks such as data processing or mathematical calculations. And we expect computers set up for these specialized services to do their jobs unattended, with minimal intervention.

But an unattended computer can mean unresolved problems. Even the most uncomplicated and well-tested software is not completely immune to crashing. And a crash on an unattended computer, remote server, or public kiosk can be more than a minor inconvenience.

Sophisticated Circuits, Inc. is a leading provider of innovative hardware and software products that monitor an unattended computer and respond to failures automatically and independently.

Each of these highly integrated crash detection and recovery devices acts like a full-time sentry, watching the computer to make sure it keeps running and taking over to restart the system when necessary.

Types of Crashes

When the computer is running normally, the system software and applications run in parallel. The system software takes care of the low-level bookkeeping and manages resources needed by the applications. The applications communicate with the system (as indicated by the yellow arrows in the diagram), asking for resources and sharing processing time.

System Running Diagram

Because the system software and the applications are separate layers, two types of crashes can afflict a computer: system crashes and application crashes.

System Crashes

A system crash occurs at a very low level and brings the entire system to a halt. Because the applications rely on the system for the resources they need, they stop running too. These crashes often produce the dreaded Windows "blue screen" or the Macintosh "system error" bomb dialog box. Alternatively, the system, including the mouse and keyboard, may simply "freeze".

System Crash Diagram

After a system crash, the system does not respond to software commands, so the Restart command cannot be executed. Recovery requires external intervention. One option is to switch the computer off and on to reset it. On a limited number of Macs, you can instead press the Command-Control-Power key combination to force a reset.

Application Crashes

An application crash occurs within a single process. It can produce an "application error" or "unexpectedly quit" dialog box, or the system may appear to stop responding. Often the crash does not affect the rest of the computer. The system and other applications may keep running, especially applications that normally run in the background.

Application Crash Diagram

After an application crash, parts of the system may still respond to software commands. Recovery is sometimes possible with the Restart command. In some cases, however, the system is damaged badly enough that an external restart is required.

Detecting System Crashes

Kick-off!, Rebound! and PowerKey Pro use a patented combination of hardware and software to detect and recover from system crashes. (For automatic crash detection, USB PowerKey Pro must have the Rebound! upgrade and ADB PowerKey Pro must have the Server Restart Option [SRO].)

To detect system crashes, the software periodically sets (or "tickles") an internal system timer in the hardware, as indicated by the purple arrow in the diagram. As long as the system is running normally, the software keeps up this periodic communication.

System Running with PowerKey

The hardware's system timer runs independently, always counting down from the value set by the software. If the system crashes, the software can no longer reset the timer, so the countdown continues. When the timer reaches zero, the hardware concludes that the computer has crashed and restarts it.

System Crash with PowerKey
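The countdown described above works like a classic watchdog timer. The Python sketch below is a simplified, hypothetical model of that behavior, not Sophisticated Circuits' firmware or driver code. The class name, the 120-second timeout and the 30-second tickle interval are assumptions chosen for illustration.

```python
class HardwareWatchdog:
    """Toy model of the hardware system timer (a watchdog timer).

    Hypothetical illustration: time is advanced explicitly with tick() so the
    behavior is easy to follow; real hardware counts down on its own clock.
    """

    def __init__(self, timeout_s: int) -> None:
        self.timeout_s = timeout_s   # countdown value chosen in the software settings
        self.remaining = timeout_s
        self.restarts = 0

    def tickle(self) -> None:
        # Sent by the monitoring software while the system runs normally:
        # reload the countdown to its full value.
        self.remaining = self.timeout_s

    def tick(self, seconds: int = 1) -> None:
        # The hardware keeps counting down whether or not the software runs.
        for _ in range(seconds):
            self.remaining -= 1
            if self.remaining <= 0:
                self._restart_computer()

    def _restart_computer(self) -> None:
        # Stand-in for the hardware resetting the computer.
        self.restarts += 1
        self.remaining = self.timeout_s  # the countdown starts over after the reboot


wd = HardwareWatchdog(timeout_s=120)

# Healthy system: the software tickles the timer every 30 s for 10 minutes.
for _ in range(20):
    wd.tick(30)
    wd.tickle()
print(wd.restarts)  # 0

# System crash: the tickles stop, and 120 s later the hardware restarts.
wd.tick(120)
print(wd.restarts)  # 1
```

The key design point is that the hardware never asks whether the system is healthy. It only notices that the tickles have stopped, so a crash that prevents all software from running is still detected.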

Recovering From System Crashes

Software settings determine how long the hardware will wait before deciding the computer has crashed, and what steps to take when it does. The most common response is simply to restart the computer. Each product has different restarting capabilities:

Kick-off!, Rebound! and the Admin version of USB PowerKey Pro can make multiple attempts to restart the system. ADB models of PowerKey Pro can also be programmed to take additional actions in response to a crash, such as switching other outlets on or off or launching applications or AppleScripts.
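Making multiple restart attempts amounts to a bounded retry loop. The Python sketch below is a hypothetical model, not the products' code. The `restart` hook, the default of three attempts and the way success is reported are all assumptions.

```python
from typing import Callable


def recover(restart: Callable[[], bool], max_attempts: int = 3) -> int:
    """Try to restart the computer up to max_attempts times.

    restart is a hypothetical hook: it resets the computer and reports
    whether the system came back up (for example, whether the software
    resumed tickling the timer). Returns the number of the attempt that
    succeeded, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if restart():
            return attempt
    return 0  # all attempts failed; what happens next is up to the caller


# The first two restarts fail and the third brings the system back.
outcomes = iter([False, False, True])
print(recover(lambda: next(outcomes)))  # 3

# With only two attempts allowed, the same sequence gives up.
outcomes = iter([False, False, True])
print(recover(lambda: next(outcomes), max_attempts=2))  # 0
```

Bounding the loop matters on an unattended machine. A computer with a hardware fault should not be power-cycled forever.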

Detecting Application Crashes

When an application crashes, other tasks can continue without interruption, so no system crash is detected. To detect application problems, our software includes special Application Timers, or AppTimers. These receive status information from many popular server applications, much as our system crash detection does.

While running normally, applications with this support "tickle" their own AppTimer within the Kick-off!, Rebound! or PowerKey software at regular intervals, as indicated by the pink arrow. If the application crashes, it stops updating its AppTimer. When the AppTimer expires, the software reacts.

Application Crash with PowerKey
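Because each application tickles only its own AppTimer, the monitoring software in effect keeps one deadline per application. The Python sketch below models that idea under stated assumptions: the `AppTimers` class is hypothetical, and "MyDatabase" stands in for a custom application. It is not how the Kick-off!, Rebound! or PowerKey software stores its timers internally.

```python
class AppTimers:
    """Toy model of per-application AppTimers kept by the monitoring software."""

    def __init__(self) -> None:
        self._deadline: dict[str, float] = {}  # application name -> expiry time (s)

    def tickle(self, app: str, timeout_s: float, now: float) -> None:
        # A healthy application resets only its own timer.
        self._deadline[app] = now + timeout_s

    def expired(self, now: float) -> list[str]:
        # Applications that stopped tickling before their timer ran out.
        return sorted(app for app, deadline in self._deadline.items() if now >= deadline)


timers = AppTimers()
timers.tickle("WebSTAR", 300, now=0)
timers.tickle("MyDatabase", 300, now=0)    # hypothetical custom application
timers.tickle("WebSTAR", 300, now=240)     # WebSTAR is still running and tickles again
print(timers.expired(now=320))  # ['MyDatabase']
```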

Mac OS 9 only: Kick-off! and Rebound! can also monitor all running applications and can detect whenever any application unexpectedly quits.
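One way to picture this kind of monitoring is to compare snapshots of the running applications. The Python function below is a hypothetical illustration only. The source does not say how Kick-off! and Rebound! tell an unexpected quit from a normal one, so the `quit_normally` set is an assumption of this sketch.

```python
def unexpected_quits(before: set[str], after: set[str], quit_normally: set[str]) -> set[str]:
    """Applications that were running, are gone now, and did not quit normally.

    before/after are snapshots of the running applications taken one polling
    interval apart; quit_normally holds applications known to have quit on
    request (how that is known is an assumption of this sketch).
    """
    return (before - after) - quit_normally


before = {"Finder", "WebSTAR", "SimpleText"}
after = {"Finder"}
print(unexpected_quits(before, after, quit_normally={"SimpleText"}))  # {'WebSTAR'}
```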

Recovering From Application Crashes

To take advantage of the custom application crash recovery system, an application must be specifically written to support our software. Some applications, such as WebSTAR, use a plug-in. In others, such as AppleShare IP 6.x, the support is built in and completely automatic and transparent. See the list of third-party support to learn how these and other programs support Sophisticated Circuits' products.

Kick-off!, Rebound! or PowerKey Pro must be configured to respond when a monitored application crashes. Kick-off! and Rebound! use simple check boxes in the Application Crashes panel of their control panels. With ADB PowerKey Pro, you must create an event that combines the Trigger "When Timer Expires" with the Action "Restart".

PowerKey Pro Screen Shot
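An ADB PowerKey Pro event pairs a trigger with an action. The pairing can be modeled as a small rule table. In the Python sketch below, only the names "When Timer Expires" and "Restart" come from the product. The `Event` class and `fire` function are hypothetical and do not describe PowerKey Pro's internal format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    trigger: str  # e.g. "When Timer Expires"
    action: str   # e.g. "Restart"


def fire(events: list[Event], trigger: str,
         actions: dict[str, Callable[[], None]]) -> list[str]:
    """Run the action of every event whose trigger just occurred."""
    performed = []
    for event in events:
        if event.trigger == trigger:
            actions[event.action]()
            performed.append(event.action)
    return performed


log: list[str] = []
events = [Event(trigger="When Timer Expires", action="Restart")]
actions = {"Restart": lambda: log.append("restart requested")}

print(fire(events, "When Timer Expires", actions))  # ['Restart']
print(log)  # ['restart requested']
```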

Direct application support works together with System Crash Detection to provide a double defense against crashes: if the system Restart command fails, the hardware can take over and restart the computer.
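The two layers can be followed on a timeline. The Python sketch below is a simplified, hypothetical model. It assumes that if the software Restart fails, the system stops tickling the hardware timer at that same moment. The 300-second AppTimer and 180-second hardware timeout are example values.

```python
def restart_timeline(last_app_tickle_s: int, app_timeout_s: int,
                     restart_succeeds: bool, hw_timeout_s: int) -> tuple[str, int]:
    """Which layer restarts the computer, and when (seconds since t=0).

    Assumes, for illustration, that if the software Restart fails the
    system stops tickling the hardware timer at that same moment.
    """
    t_app_expired = last_app_tickle_s + app_timeout_s  # AppTimer reaches zero
    if restart_succeeds:
        return ("software Restart", t_app_expired)
    # Restart hung: the hardware timer now runs down to zero on its own.
    return ("hardware restart", t_app_expired + hw_timeout_s)


# The application last tickled its AppTimer at t=600 s.
print(restart_timeline(600, 300, restart_succeeds=True, hw_timeout_s=180))
# ('software Restart', 900)
print(restart_timeline(600, 300, restart_succeeds=False, hw_timeout_s=180))
# ('hardware restart', 1080)
```

In the worst case, recovery takes the AppTimer period plus the hardware timeout.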

Scripting the AppTimer on Macintosh Computers

Using AppleScript, custom Macintosh software, such as databases and interactive presentations, can communicate directly with the Kick-off!, Rebound! or PowerKey software. You can set the AppTimer to a specific number of seconds, as in the AppleScript example below:

        -- Set the AppTimer in the PowerKey Extension to 300 seconds (5 minutes)
        tell application "PowerKey Extension"
            set appTimer to 300
        end tell

This sample script sets the AppTimer in PowerKey to 300 seconds (5 minutes), after which the timer immediately begins counting down. You can then program the application to send this script every 60 seconds. While the application is running normally, the AppTimer is reset every minute to the full 300 seconds. If the application fails to repeat the AppleScript within the allotted time, the AppTimer counts down to zero, triggering the Kick-off!, Rebound! or PowerKey software to react.
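The timing in this paragraph can be checked with a short Python model of the AppTimer. It is an illustration only. The function name is hypothetical, and the model treats each run of the script as an instantaneous reset at the listed time.

```python
def app_timer_expiry(timeout_s: int, ping_times_s: list[int]) -> int:
    """Time (seconds) at which the AppTimer reaches zero.

    The timer is first set at t=0 and starts counting down at once. Each
    ping received before the timer runs out resets it to the full timeout_s.
    """
    deadline = timeout_s
    for t in sorted(ping_times_s):
        if t >= deadline:
            break  # this ping came too late: the timer had already expired
        deadline = t + timeout_s
    return deadline


# The application re-sends the script every 60 s, then hangs after t=600 s.
pings = list(range(60, 601, 60))
print(app_timer_expiry(300, pings))  # 900
```

While the pings keep arriving every 60 seconds, the deadline always stays at least 240 seconds in the future. Once they stop, the AppTimer expires 300 seconds after the last one.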

To keep system overhead down, the application should ping the Extension no more often than once every 30 to 60 seconds. The AppTimer should also be set to a value significantly longer than the ping interval, so that it won't reach zero while the system is temporarily busy with other tasks.
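These two guidelines can be turned into a simple settings check. In the Python sketch below, the 30-second floor comes from the guideline above. The safety factor of 3 is a hypothetical choice, because the text only says "significantly higher". The 300/60 pairing from the AppleScript example passes with room to spare.

```python
MIN_PING_INTERVAL_S = 30  # guideline: ping no more often than every 30-60 s
SAFETY_FACTOR = 3         # hypothetical; the text only says "significantly higher"


def check_settings(ping_interval_s: int, app_timer_s: int) -> list[str]:
    """Return warnings for a ping interval / AppTimer combination."""
    warnings = []
    if ping_interval_s < MIN_PING_INTERVAL_S:
        warnings.append("pinging too often: unnecessary system overhead")
    if app_timer_s < SAFETY_FACTOR * ping_interval_s:
        warnings.append("AppTimer too short: a busy system could let it reach zero")
    return warnings


print(check_settings(60, 300))  # []
print(len(check_settings(10, 20)))  # 2
```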

The AppTimer can also be set with an AppleEvent. The event class and event ID are documented in the user manual for each product.

Adding Direct Support for AppTimers

Macintosh developers can add direct support for the custom application monitoring in Kick-off!, Rebound! and PowerKey to their products by using our Software Developer's Kit (SDK). Windows developers will find the Kick-off! SDK information included with the Kick-off! for Windows software.