Introduction: Fault Classification, Types of Redundancy, Fault-Tolerance Metrics
Hardware Fault Tolerance: Fault rate, Reliability, MTTF, Canonical and Resilient structures, Reliability evaluation techniques, Processor level techniques, Byzantine failures
Information Redundancy: Coding techniques, Resilient Disk Systems, Data replication, Algorithm-based fault tolerance
Fault-Tolerant Networks: Network topologies and their resilience, Fault-tolerant routing
Software Fault Tolerance: Single-version fault tolerance, N-version programming, Recovery blocks, Conditions and assertions, Exception handling, Fault-tolerant remote procedure calls
Checkpointing: Analytical models of checkpointing, Checkpointing in shared-memory systems, Checkpointing in real-time systems
Case studies: NonStop systems, Itanium
Defect tolerance in VLSI circuits: Basic yield models, Yield enhancement through redundancy
Faults in Cryptographic Systems: Security attacks, Countermeasures
Prerequisites: Programming and Data Structures, Computer Organization and Architecture
Textbook:
I. Koren, C. Mani Krishna, Fault-Tolerant Systems, Morgan Kaufmann
Reference Books:
D. K. Pradhan, Fault-Tolerant Computer System Design, Prentice Hall
E. Dubrova, Fault-Tolerant Design, Springer, 2013
K. S. Trivedi, Probability and Statistics with Reliability, Queuing, and Computer Science Applications, John Wiley