
Efficient Techniques to Overcome Scaled-CMOS Reliability Challenges


Overview

This research is motivated by an imminent paradigm shift in hardware design, driven by the growing problem of hardware failures in future technologies. Apart from high-end mainframes and safety-critical applications, the traditional design paradigm assumes that no gate or interconnect will ever operate incorrectly during the lifetime of a design. This assumption will become infeasible in future technologies. One way to overcome this barrier is to accept that transistors and interconnects will be imperfect and to design robust, failure-aware systems. For this philosophy to be adopted in most future systems, not only in mainframes, its costs must be extremely small compared to those of duplication or Triple Modular Redundancy (TMR).
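For reference, the classical redundancy baseline mentioned above can be sketched as a small behavioral model in Python. This is an illustration, not a hardware design: three copies of a hypothetical 8-bit module (`add_one`) compute the same result, a fault is injected into one copy, and a bitwise 2-of-3 majority voter masks it. The names `majority_vote`, `tmr` and `add_one` are assumptions made for this sketch.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


def tmr(module, x: int, fault_mask: int = 0, faulty_copy: int = 0) -> int:
    """Run three copies of `module` on the same input.

    A fault is modeled by XOR-ing `fault_mask` into the output of one copy.
    """
    outputs = [module(x) for _ in range(3)]
    outputs[faulty_copy] ^= fault_mask
    return majority_vote(*outputs)


def add_one(x: int) -> int:
    """Hypothetical 8-bit module protected by TMR."""
    return (x + 1) & 0xFF


# Copy 1 produces a corrupted result (two bits flipped); the voter still outputs 42.
print(tmr(add_one, 41, fault_mask=0b1000_0001, faulty_copy=1))  # 42
```

The voter masks any fault confined to a single copy, but only by roughly tripling the hardware and adding the voter on top, which is the overhead the approach described here seeks to avoid.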

Our central vision is to develop enabling technologies and tools spanning multiple abstraction levels to design globally optimized robust systems targeting a wide range of applications without incurring the high cost of classical redundancy.


Specific projects include:

Built-In Error Resilience

Architecture-aware circuit design techniques for correcting radiation-induced soft errors and erratic bit errors in latches, flip-flops and combinational logic.
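One widely used way to harden a storage element against a single upset is to store the bit in two copies and drive the output through a C-element, which changes its output only when both inputs agree and otherwise holds its previous value. The page does not say which circuits this project uses, so the following is only a behavioral sketch of that general idea; `CElement`, `DuplicatedLatch` and `upset` are hypothetical names.

```python
class CElement:
    """Muller C-element: the output follows the inputs only when they agree; otherwise it holds."""

    def __init__(self, initial: int = 0):
        self.out = initial

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.out = a
        return self.out


class DuplicatedLatch:
    """Hypothetical model: one bit stored in two latch copies that feed a C-element."""

    def __init__(self):
        self.copies = [0, 0]
        self.c_element = CElement()

    def write(self, d: int) -> int:
        # A normal write updates both copies, so the C-element passes the new value.
        self.copies = [d, d]
        return self.c_element.update(*self.copies)

    def upset(self, idx: int) -> int:
        # Model a radiation-induced soft error flipping one stored copy.
        self.copies[idx] ^= 1
        return self.c_element.update(*self.copies)


latch = DuplicatedLatch()
print(latch.write(1))  # 1
print(latch.upset(0))  # 1: the copies disagree, so the C-element holds the correct value
print(latch.write(0))  # 0: the next write overwrites the corrupted copy
```

This model masks an upset that corrupts only one of the two copies; an upset that flips both copies, or the C-element itself, is not modeled.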

Circuit Failure Prediction and Self-Correction

Circuit failure prediction uses dedicated circuits to predict the occurrence of a circuit failure before errors actually appear in system data and states. In contrast, traditional error detection identifies a failure only after errors have appeared in system data and states. Circuit failure prediction is therefore ideally suited to major reliability challenges such as circuit aging and early-life failures (also called infant mortality), and it enables early self-correction.
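To make the difference between prediction and detection concrete, the sketch below assumes a simplified aging model: a critical path's delay grows slowly over time, and a hypothetical checker raises a warning once the delay enters a guard band just before the clock period, well before the path actually fails timing. The clock period, guard band and drift rate are invented numbers, and a real sensor would observe signal transitions near the clock edge rather than read a delay value; `StabilityChecker` and `aging_timeline` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class StabilityChecker:
    """Hypothetical aging sensor with a guard band before the clock edge (setup time ignored)."""
    clock_period_ps: float
    guard_band_ps: float

    def check(self, path_delay_ps: float) -> str:
        if path_delay_ps > self.clock_period_ps:
            return "error"    # data arrives too late: a timing error reaches system state
        if path_delay_ps > self.clock_period_ps - self.guard_band_ps:
            return "predict"  # still correct but close to failing: time to self-correct
        return "ok"


def aging_timeline(checker, initial_delay_ps, drift_ps_per_month, months):
    """Return the first month in which each status is observed under linear delay drift."""
    first_seen = {}
    for month in range(months):
        status = checker.check(initial_delay_ps + drift_ps_per_month * month)
        first_seen.setdefault(status, month)
    return first_seen


checker = StabilityChecker(clock_period_ps=1000.0, guard_band_ps=50.0)
print(aging_timeline(checker, initial_delay_ps=900.0, drift_ps_per_month=5.0, months=36))
# {'ok': 0, 'predict': 11, 'error': 21}
```

In this toy scenario the warning arrives ten months before the first timing error, which is the window in which self-correction (for example, lowering the clock frequency) could act.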

Online Self-test

Online self-test is a special kind of self-test in which a system tests itself during normal operation without any downtime visible to the end user. It is ideally suited to circuit failure prediction, error detection based on periodic self-test, and the system diagnostics required for effective self-repair.
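The page describes online self-test only at a high level. A common building block for self-test in general is a pseudo-random pattern generator plus a response signature compared against a stored golden value; the sketch below models that idea for a hypothetical 4-bit circuit, assuming the test runs in some idle window (scheduling is not modeled). The circuit, the stuck-at fault and all function names are illustrative, and the signature register is a simplified shift-and-XOR stand-in for a multiple-input signature register (MISR).

```python
from typing import Callable, Iterator


def lfsr_patterns(seed: int = 0b0001, count: int = 15) -> Iterator[int]:
    """4-bit Fibonacci LFSR (x^4 + x^3 + 1); visits all 15 nonzero states."""
    state = seed
    for _ in range(count):
        yield state
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        state = ((state << 1) | feedback) & 0xF


def signature(cut: Callable[[int], int]) -> int:
    """Apply the LFSR patterns to the circuit under test and compress its responses."""
    sig = 0
    for pattern in lfsr_patterns():
        sig = ((sig << 1) ^ cut(pattern)) & 0xFFFF
    return sig


def good_cut(x: int) -> int:
    return (x + 3) & 0xF  # hypothetical 4-bit circuit under test: add 3


def faulty_cut(x: int) -> int:
    return (x + 3) & 0xE  # same circuit with output bit 0 stuck-at-0


GOLDEN = signature(good_cut)  # computed once from a known-good design


def online_self_test(cut: Callable[[int], int]) -> bool:
    """True if the circuit's signature matches the golden signature."""
    return signature(cut) == GOLDEN


print(online_self_test(good_cut))    # True
print(online_self_test(faulty_cut))  # False
```

Because the compression is linear and the 15 responses are shifted into a 16-bit register, any response that differs only in bit 0 changes the signature, so this particular fault cannot alias; real signature registers trade such guarantees for a small aliasing probability.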

Application-aware Robust System Design

Application-aware design techniques exploit the fact that a large class of future killer applications, such as Recognition, Mining and Synthesis (RMS), are inherently error resilient because of their probabilistic nature. These techniques produce globally optimized robust systems that efficiently combine a large pool of low-cost, ultra-fast, ultra-low-power and hence unreliable hardware that is no longer constrained by worst-case design.
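As a small illustration of what "inherently error resilient" can mean, the sketch below computes a statistical aggregate (a mean, standing in for the kind of computation common in RMS workloads) over data that has passed through hypothetical unreliable hardware. The error model is an assumption: about 1% of 8-bit values suffer one bit flip, confined to the four low-order bits. `inject_bit_flips` is an illustrative name, and the random seed is arbitrary.

```python
import random


def inject_bit_flips(values, error_rate, max_bit, rng):
    """Hypothetical error model: flip one random bit in positions 0..max_bit of some values."""
    corrupted = []
    flips = 0
    for v in values:
        if rng.random() < error_rate:
            v ^= 1 << rng.randint(0, max_bit)
            flips += 1
        corrupted.append(v)
    return corrupted, flips


rng = random.Random(2008)
data = [rng.randint(0, 255) for _ in range(10_000)]
noisy, flips = inject_bit_flips(data, error_rate=0.01, max_bit=3, rng=rng)

exact = sum(data) / len(data)
approx = sum(noisy) / len(noisy)
# Each flip changes one value by at most 2**3 = 8, so the mean moves by at most 8 * flips / N.
print(f"{flips} corrupted values, mean {exact:.2f} (exact) vs {approx:.2f} (noisy)")
```

The bound on the error holds only because this model confines flips to low-order data bits; errors in control flow or address computation, which a real system would also need to handle, are not modeled.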


Copyright © 2008 Stanford University