Efficient Techniques to Overcome Scaled-CMOS Reliability Challenges
Overview
This research is motivated by an imminent paradigm shift in hardware design resulting from the growing problem of hardware failures in future technologies. Except in high-end mainframes and safety-critical applications, the traditional design paradigm assumes that no gate or interconnect will ever operate incorrectly during the lifetime of a design. That assumption will not hold in future technologies. One way to break this barrier is to accept that transistors and interconnects will be imperfect, and to design robust systems that are failure-aware. For this philosophy to be adopted in most future systems, not only in mainframes, the associated costs must be far smaller than those of duplication or Triple Modular Redundancy (TMR).
Our central vision is to develop enabling technologies and tools spanning multiple abstraction levels to design globally optimized robust systems targeting a wide range of applications without incurring the high cost of classical redundancy.
Specific projects include:
Built-In Error Resilience
Architecture-aware circuit design techniques for correcting radiation-induced soft errors and erratic bit errors in latches, flip-flops and combinational logic.
Circuit Failure Prediction and Self-Correction
Circuit failure prediction anticipates a circuit failure before errors actually appear in system data and states. This is in contrast to traditional error detection, where a failure is detected only after errors appear in system data and states. Circuit failure prediction is well suited to major reliability challenges such as circuit aging and early-life failures (also called infant mortality), and it enables early self-correction.
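The difference between predicting a failure and detecting an error can be illustrated with a small sketch. It assumes one plausible mechanism, a sensor that watches the delay of a critical path and raises a warning once the signal arrives inside a guardband just before the clock edge. The names AgingSensor, guardband_ps and first_warning, and all delay values, are hypothetical and chosen only for illustration; this is not a description of the actual circuits used in the projects.

```python
from dataclasses import dataclass


@dataclass
class AgingSensor:
    """Hypothetical model of a delay sensor on one critical path."""
    clock_period_ps: float  # arrival later than this is an actual error
    guardband_ps: float     # warning window just before the clock edge

    def classify(self, path_delay_ps: float) -> str:
        # Past the clock edge: wrong data is captured (error detection case).
        if path_delay_ps > self.clock_period_ps:
            return "error"
        # Inside the guardband: still correct, but a failure is predicted.
        if path_delay_ps > self.clock_period_ps - self.guardband_ps:
            return "predicted"
        return "ok"


def first_warning(sensor, delays_ps):
    """Return (index, state) of the first sample that is not 'ok', else None."""
    for i, delay in enumerate(delays_ps):
        state = sensor.classify(delay)
        if state != "ok":
            return i, state
    return None


if __name__ == "__main__":
    sensor = AgingSensor(clock_period_ps=1000.0, guardband_ps=50.0)
    # Made-up delay samples for a path that slows down as it ages.
    delays = [900, 920, 945, 960, 990, 1010]
    print(first_warning(sensor, delays))  # (3, 'predicted')
```

With these made-up numbers the warning is raised at sample 3, while the first actual timing error would only occur at sample 5. That interval is the window in which self-correction could act before any system data or state is corrupted.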
Online Self-test
Online self-test is a special kind of self-test where a system tests itself during normal operation without any downtime visible to the end-user. It is ideal for circuit failure prediction, error detection based on periodic self-test, and system diagnostics required for effective self-repair.
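One way to picture self-test without visible downtime is a round-robin schedule on a multi-core system: in each time slot one core is taken out of service and tested while the remaining cores keep running the workload. The sketch below only illustrates that scheduling idea under this assumption. The function name schedule_self_test and the notion of fixed time slots are hypothetical and are not taken from the projects described here.

```python
def schedule_self_test(num_cores, num_slots):
    """Return a list of (core_under_test, active_cores), one entry per slot.

    Exactly one core is isolated for test in each slot; all other cores
    stay available, so the system as a whole never goes offline.
    """
    if num_cores < 2:
        raise ValueError("need at least two cores to test one without downtime")
    schedule = []
    for slot in range(num_slots):
        under_test = slot % num_cores  # rotate through the cores in order
        active = [c for c in range(num_cores) if c != under_test]
        schedule.append((under_test, active))
    return schedule


if __name__ == "__main__":
    for under_test, active in schedule_self_test(num_cores=4, num_slots=5):
        print(f"testing core {under_test}, running on cores {active}")
```

After num_cores slots every core has been tested once, and in every slot num_cores - 1 cores remain available to the workload.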
Application-aware Robust System Design
Application-aware design techniques exploit the fact that a large class of future killer applications, such as Recognition, Mining and Synthesis (RMS), is inherently error resilient because of its probabilistic nature. This makes it possible to design globally optimized robust systems that efficiently combine a large pool of low-cost, ultra-fast, ultra-low-power and, hence, unreliable hardware that is no longer constrained by worst-case design.
Selected Publications
- T.W. Chen, K. Kim, Y. Kim and S. Mitra, “Gate-Oxide Early Life Failure Prediction,” IEEE VLSI Test Symp., 2008
- Y. Li, S. Makar and S. Mitra, “CASP: Concurrent Autonomous Chip Self-Test using Stored Test Patterns,” Design Automation and Test in Europe, 2008
- S. Mitra, “Globally Optimized Robust Systems to Overcome Scaled CMOS Challenges,” Design Automation and Test in Europe, 2008 (Invited)
- S. Mitra, “Circuit Failure Prediction for Robust System Design in Scaled CMOS,” International Reliability Physics Symp., 2008 (Invited)
- M. Agarwal, B. Paul, M. Zhang and S. Mitra, “Circuit Failure Prediction and Its Application to Transistor Aging,” IEEE VLSI Test Symp., 2007
- S. Mitra and M. Agarwal, “Circuit Failure Prediction to Overcome Scaled CMOS Reliability Challenges,” Intl. Test Conf., 2007 (Invited)
- P. Relangi and S. Mitra, “Erratic Bit Errors in Latches,” Intl. Reliability Physics Symp. (IRPS), 2007
- S. Seshia, W. Li and S. Mitra, “Verification Guided Soft Error Resilience,” Design Automation and Test in Europe, 2007
- S. Mitra, M. Zhang, N. Seifert, B. Gill, S. Waqas and K.S. Kim, “Combinational Logic Soft Error Correction,” Intl. Test Conf., 2006
- M. Zhang, S. Mitra, T.M. Mak, N. Seifert, Q. Shi, K.S. Kim, N. Shanbhag, N. Wang and S.J. Patel, “Sequential Element Design with Built-In Soft Error Resilience,” IEEE Trans. VLSI, 2006
- S. Mitra, M. Zhang, N. Seifert, T.M. Mak and K.S. Kim, “Soft Error Resilient System Design through Error Correction,” IFIP SOC-VLSI, 2006
- S. Mitra, M. Zhang, T.M. Mak, N. Seifert, V. Zia and K.S. Kim, “Logic Soft Errors: A Major Barrier to Robust Platform Design,” Intl. Test Conf., 2005
- S. Mitra, T. Karnik, N. Seifert and M. Zhang, “Logic Soft Errors in Sub-65nm Technologies: Design and CAD Challenges,” Design Automation Conf., 2005
- S. Mitra, N. Seifert, M. Zhang, Q. Shi and K.S. Kim, “Robust System Design with Built-In Soft Error Resilience,” IEEE Computer, Vol. 38, Number 2, pp. 43-52, Feb. 2005
 Copyright © 2008 Stanford University