Abstract:
Generally, this disclosure provides error management across hardware and software layers to enable hardware and software to deliver reliable operation in the face of errors and hardware variation due to aging, manufacturing tolerances, etc. In one embodiment, an error management module is provided that gathers information from the hardware and software layers, and detects and diagnoses errors. A hardware or software recovery technique may be selected to provide efficient operation, and, in some embodiments, the hardware device may be reconfigured to prevent future errors and to permit the hardware device to operate despite a permanent error.