Hardware redundancy is a technique that achieves fault tolerance by provisioning additional hardware resources, and it is a core method in the reliability design of computer systems. Its basic principle is to use redundant physical devices or components to eliminate single points of failure, so that the system continues to operate normally when an individual hardware component fails [1] [3-4].
By implementation method, hardware redundancy is classified into static redundancy, dynamic redundancy, and hybrid redundancy. Static redundancy masks faults through a voting mechanism (as in triple modular redundancy, TMR), dynamic redundancy achieves failover by switching from a primary unit to a standby unit, and hybrid redundancy combines the characteristics of both [1] [4] [8]. Common applications include dual power supplies, RAID disk arrays, and dual network interface cards. In servers, aerospace, and industrial control, fault detection and recovery are implemented through master-slave architectures, dual-core lockstep architectures, and similar schemes [3] [5-7].
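The difference between static and dynamic redundancy can be illustrated with a minimal Python sketch. It is a software analogy under stated assumptions, not a description of any particular hardware design: the names `majority_vote` and `run_with_failover` are hypothetical, module outputs are modeled as plain values, and fault detection in the primary unit is modeled as a raised exception.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Static redundancy: mask a faulty module by voting.

    Returns the value produced by a strict majority of the modules,
    or None when no strict majority exists (the fault cannot be masked).
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def run_with_failover(primary: Callable[[], int],
                      standby: Callable[[], int]) -> int:
    """Dynamic redundancy: use the primary unit, switch to the standby on a fault.

    Assumption: a detected fault is modeled as an exception raised by the
    primary. Real systems typically rely on heartbeats, watchdogs, or
    self-checks to detect the fault before switching.
    """
    try:
        return primary()
    except Exception:
        return standby()
```

With three modules, `majority_vote([7, 7, 9])` returns `7` because the two healthy modules outvote the faulty one, while `majority_vote([1, 2, 3])` returns `None` because no two modules agree. The voter hides the fault without needing to locate it, whereas the failover path depends on the fault being detected first, which is the practical distinction between the two classes.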
The origins of redundancy technology can be traced back to von Neumann's research on building reliable systems from unreliable components. As the reliability requirements of computer systems increased, it gradually developed into a complete body of techniques. Safety-critical industrial systems commonly adopt triple redundancy; for example, the flight control computer of the Boeing 777 uses a 2oo3 (two-out-of-three) architecture with heterogeneous processors, which marks the deepening application of hardware redundancy in safety-critical systems.