Comparison between Checkpoint Schemes – Basic Computer Science

Table 1. Comparison between uncoordinated, coordinated, and quasi-synchronous checkpointing

Coordinated checkpointing algorithms can be classified according to the following schemes:

• All-process checkpointing: Every process in the system must participate in every checkpointing session.

• Minimum-process checkpointing: These algorithms force only those processes that have communicated with the initiator, directly or indirectly, since the last checkpoint to take new checkpoints.

• Blocking: Blocking algorithms force all relevant processes in the system to block their underlying computation during checkpointing latency.

• Non-blocking: In non-blocking algorithms, application processes are not blocked while checkpoints are being taken.
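
To make the difference between all-process and minimum-process checkpointing concrete, the following is a minimal sketch of how the set of participating processes could be computed. It assumes a hypothetical `depends_on` map recording, for each process, the processes from which it has received messages since its last checkpoint; the function and variable names are illustrative and are not taken from any particular published algorithm.

```python
from collections import deque


def minimum_process_set(initiator, depends_on):
    """Return the processes that must checkpoint together with the initiator.

    depends_on[p] is assumed to hold the processes from which p has
    received a message since its last checkpoint (hypothetical bookkeeping).
    """
    must_checkpoint = {initiator}
    queue = deque([initiator])
    while queue:
        p = queue.popleft()
        # Follow direct dependencies; repeating this gives the indirect ones.
        for q in depends_on.get(p, ()):
            if q not in must_checkpoint:
                must_checkpoint.add(q)
                queue.append(q)
    return must_checkpoint


# P1 received from P2, and P2 received from P3; P4 and P5 only talked to each other.
deps = {"P1": {"P2"}, "P2": {"P3"}, "P3": set(), "P4": {"P5"}, "P5": set()}
participants = minimum_process_set("P1", deps)
print(sorted(participants))  # ['P1', 'P2', 'P3']
```

In this toy example an all-process scheme would checkpoint all five processes, whereas the minimum-process scheme leaves P4 and P5 undisturbed.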

Mobile computing faces many new challenges, such as low wireless bandwidth, frequent disconnections, and a lack of stable storage at mobile nodes. These issues make traditional checkpointing techniques unsuitable for mobile distributed systems. A good checkpointing algorithm for mobile systems needs to have the following characteristics [10]. It should impose low memory overheads on mobile hosts (MHs) and low overheads on wireless channels. The disconnection of an MH should not lead to an infinite wait state. The algorithm should avoid waking an MH that is in doze mode. It should also be non-blocking and minimum-process.

There is a tradeoff between the coordinated and uncoordinated checkpointing approaches for mobile systems. Some authors advocate coordinated checkpointing [1-4,8,11,13,15,17,21,24-26] because it is free from the domino effect; others advocate uncoordinated checkpointing because of the high synchronization overhead of the coordinated approach. Strictly uncoordinated checkpointing, however, is poorly suited to mobile computing, and even to distributed systems in general, for several reasons. If local checkpoints are taken frequently, each process accumulates multiple checkpoints, which require a large amount of stable storage and introduce considerable communication overhead in mobile computing systems. The storage and communication overheads can be reduced by taking local checkpoints less frequently, but this increases recovery time because a longer rollback and replay is needed. Even though some algorithms have been proposed to reduce the number of checkpoints saved on stable storage, a process in an uncoordinated scheme must still keep many more checkpoints to ensure correctness. So, if the synchronization overhead of the coordinated approach is reduced, it can become quite effective for mobile systems [27].

In coordinated checkpointing, processes take checkpoints in such a manner that the resulting global state is consistent. Most such algorithms follow a two-phase commit structure [1,2,4,8,13,19,26,27,31]. In the first phase, processes take tentative checkpoints; in the second phase, these are made permanent. The main advantage is that each process needs to store only one permanent checkpoint and at most one tentative checkpoint. In the case of a fault, processes roll back to the last checkpointed state.
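
The two-phase structure described above can be sketched as follows. This is a simplified, single-machine illustration and not any specific published protocol: the `Process` class, the `reachable` flag (standing in for a disconnected host), and the `coordinated_checkpoint` function are all hypothetical names introduced here, and message passing is replaced by direct method calls.

```python
import copy


class Process:
    """Toy process holding its state plus at most two checkpoints."""

    def __init__(self, name, state, reachable=True):
        self.name = name
        self.state = state
        self.reachable = reachable             # False simulates a disconnected host
        self.permanent = copy.deepcopy(state)  # last committed checkpoint
        self.tentative = None                  # set only during a session

    def take_tentative(self):
        """Phase 1: save a tentative checkpoint and report success."""
        if not self.reachable:
            return False
        self.tentative = copy.deepcopy(self.state)
        return True

    def commit(self):
        """Phase 2 (success): the tentative checkpoint becomes permanent."""
        self.permanent = self.tentative
        self.tentative = None

    def abort(self):
        """Phase 2 (failure): discard the tentative checkpoint."""
        self.tentative = None

    def rollback(self):
        """Recovery: restore the last permanent checkpoint."""
        self.state = copy.deepcopy(self.permanent)


def coordinated_checkpoint(processes):
    """Run one two-phase session; return True if it committed."""
    # Phase 1: every process is asked for a tentative checkpoint.
    replies = [p.take_tentative() for p in processes]
    committed = all(replies)
    # Phase 2: commit everywhere or abort everywhere.
    for p in processes:
        if committed:
            p.commit()
        else:
            p.abort()
    return committed


p1, p2 = Process("P1", {"x": 1}), Process("P2", {"x": 2})
p1.state["x"] = 10
p2.state["x"] = 20
print(coordinated_checkpoint([p1, p2]))  # True
p1.state["x"] = 99   # work done after the checkpoint
p1.rollback()        # a fault occurs; return to the committed state
print(p1.state)      # {'x': 10}
```

Each process holds one permanent checkpoint at all times and a tentative one only while a session is in progress, which mirrors the storage advantage noted above. If any process fails to take its tentative checkpoint, the session aborts and the previous permanent checkpoints remain in place.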