Programmer-Transparent Coordination of Recovering Concurrent Processes: Philosophy and Rules for Efficient Implementation
June 1988 (vol. 14 no. 6)
pp. 810-821

An approach to the coordination of cooperating concurrent processes, each capable of error detection and recovery, is presented. Error detection, rollback, and retry in a process are specified by a well-structured language construct called the recovery block. Recovery points of processes must be properly coordinated to prevent a disastrous avalanche of process rollbacks. The approach relies on an intelligent processor system (which runs the processes) capable of establishing and discarding the recovery points of interacting processes in a well-coordinated manner, such that a process never makes two consecutive rollbacks without a retry between them, and every process rollback is a minimum-distance rollback. Following a discussion of the underlying philosophy of the author's approach, basic rules for reducing storage and time overhead in such a processor system are discussed. Examples are drawn from systems in which processes communicate through monitors.
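
For readers unfamiliar with the recovery block construct named in the abstract, the following is a minimal single-process sketch of its general shape: save a recovery point on entry, run an alternate, apply an acceptance test, and roll back and retry with the next alternate on failure. It is an illustration under stated assumptions, not the paper's implementation. The names `recovery_block`, `RecoveryBlockFailure`, `faulty_sort`, `safe_sort`, and `is_sorted` are hypothetical, the process state is assumed to be a plain dictionary, and the coordination of recovery points across interacting processes, which is the subject of the paper, is not modeled.

```python
import copy


class RecoveryBlockFailure(Exception):
    """Raised when every alternate fails the acceptance test (hypothetical name)."""


def recovery_block(state, acceptance_test, alternates):
    """Run alternates in order until one passes the acceptance test.

    `state` is assumed to be a dict that the alternates mutate in place.
    """
    # Establish a recovery point: a private copy of the state on entry.
    recovery_point = copy.deepcopy(state)
    for alternate in alternates:
        try:
            alternate(state)
            if acceptance_test(state):
                # Success: the recovery point is no longer needed.
                return state
        except Exception:
            # An exception is treated here as a detected error.
            pass
        # Rollback: restore the state saved at the recovery point,
        # then retry with the next alternate.
        state.clear()
        state.update(copy.deepcopy(recovery_point))
    raise RecoveryBlockFailure("all alternates failed the acceptance test")


def faulty_sort(s):
    # Hypothetical buggy primary: reverses instead of sorting.
    s["items"] = s["items"][::-1]


def safe_sort(s):
    # Hypothetical alternate that is assumed to be correct.
    s["items"] = sorted(s["items"])


def is_sorted(s):
    items = s["items"]
    return all(a <= b for a, b in zip(items, items[1:]))


result = recovery_block({"items": [3, 1, 2]}, is_sorted, [faulty_sort, safe_sort])
```

In this sketch the primary fails the acceptance test, the state is rolled back to the recovery point, and the alternate is then tried on the restored state. Each rollback is followed by a retry, which mirrors, within a single process only, the property the abstract describes for coordinated processes.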

Index Terms:
programmer transparent coordination; system recovery; storage reduction; data structures; recovering concurrent processes; error detection; language construct; recovery block; intelligent processor system; process rollback; minimum-distance rollback; time overhead; fault tolerant computing; multiprocessing programs; programming theory; supervisory programs
K.H. Kim, "Programmer-Transparent Coordination of Recovering Concurrent Processes: Philosophy and Rules for Efficient Implementation," IEEE Transactions on Software Engineering, vol. 14, no. 6, pp. 810-821, June 1988, doi:10.1109/32.6160