Debugging on Steroids

Alan Sguigna

April 3, 2011
9:18 am

ScanWorks Embedded Diagnostics is embedded firmware that uses a CPU’s debug port to access a system’s architecturally visible registers, memory and I/O. Acting as an “embedded JTAG-based debugger”, it can be operated remotely, from anywhere and at any time, to troubleshoot the most difficult hardware and software failures.

The Technology

In the past we’ve defined “embedded instrumentation” as logic inserted into an IC that performs chip and circuit board validation, test and debug functions. Within commercial CPUs, the debug port acts as an embedded instrument; in fact, it can be considered the most powerful instrument on the board, because it controls the processor that runs the system’s main software and can reach virtually every device in the system.

Historically, “JTAG debugger” tools have been used to troubleshoot a system during initial board bring-up. These tools typically connect to a CPU’s JTAG port (and, where applicable, to sideband signals such as PREQ_N on Intel processors and HRESET on Freescale processors) through a hardware pod and cables, and run application software on a workstation to perform the processor emulation function. Although extremely powerful, they are limited by their need for a local hardware connection to the unit under test.

ScanWorks Embedded Diagnostics removes the need for local external hardware. The emulation function is embedded in firmware on the circuit board’s service processor, be it a baseboard management controller (BMC), an FPGA, or another device, and the service processor’s I/O is wired to the appropriate debug port signals on the main CPU. Once the technology is embedded, debug port control can be exercised anywhere, at any time, on an unlimited number of systems: a regular debugger on steroids. Released from the hardware tether, design and technical support engineers can remotely troubleshoot device drivers, BIOS and operating system kernels, as well as catastrophic or intermittent hardware and software failures that cause system crashes or hangs.
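To make the service-processor arrangement a little more concrete, here is a minimal sketch, written in Python for readability, of the lowest layer such firmware needs: walking the standard IEEE 1149.1 TAP state machine by driving TMS and TDI and sampling TDO. It is not ScanWorks code, and it stops well short of processor emulation, which layers vendor-specific debug-port commands (and sideband signals such as PREQ_N or HRESET) on top. The `clock_bit` hook, the `JtagBitBanger` class and the `ShiftRegisterTarget` stub are all hypothetical; on real hardware `clock_bit` would set the GPIO lines wired to the CPU’s debug port, sample TDO and pulse TCK.

```python
from typing import Callable, List

# IEEE 1149.1 TAP controller transitions: state -> (next if TMS=0, next if TMS=1)
TAP_NEXT = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_DR_SCAN": ("CAPTURE_DR", "SELECT_IR_SCAN"),
    "CAPTURE_DR": ("SHIFT_DR", "EXIT1_DR"),
    "SHIFT_DR": ("SHIFT_DR", "EXIT1_DR"),
    "EXIT1_DR": ("PAUSE_DR", "UPDATE_DR"),
    "PAUSE_DR": ("PAUSE_DR", "EXIT2_DR"),
    "EXIT2_DR": ("SHIFT_DR", "UPDATE_DR"),
    "UPDATE_DR": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_IR_SCAN": ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR": ("SHIFT_IR", "EXIT1_IR"),
    "SHIFT_IR": ("SHIFT_IR", "EXIT1_IR"),
    "EXIT1_IR": ("PAUSE_IR", "UPDATE_IR"),
    "PAUSE_IR": ("PAUSE_IR", "EXIT2_IR"),
    "EXIT2_IR": ("SHIFT_IR", "UPDATE_IR"),
    "UPDATE_IR": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
}


class JtagBitBanger:
    """Drives a TAP one TCK cycle at a time and tracks the controller state.

    `clock_bit(tms, tdi) -> tdo` is a hypothetical hardware hook: it is assumed
    to set the TMS and TDI pins, sample TDO, and pulse TCK once.
    """

    def __init__(self, clock_bit: Callable[[int, int], int]) -> None:
        self.clock_bit = clock_bit
        self.state = "TEST_LOGIC_RESET"  # only trustworthy after reset()

    def _clock(self, tms: int, tdi: int = 0) -> int:
        tdo = self.clock_bit(tms, tdi)
        self.state = TAP_NEXT[self.state][tms]
        return tdo

    def reset(self) -> None:
        # Five cycles with TMS=1 reach Test-Logic-Reset from any state;
        # one more with TMS=0 parks the controller in Run-Test/Idle.
        for _ in range(5):
            self._clock(1)
        self._clock(0)

    def shift(self, bits: List[int], ir: bool = False) -> List[int]:
        """Shift `bits` (LSB first) into IR or DR; return the bits seen on TDO."""
        if self.state != "RUN_TEST_IDLE":
            raise RuntimeError("call reset() first; shifts start from Run-Test/Idle")
        if not bits:
            raise ValueError("need at least one bit to shift")
        # Run-Test/Idle -> Select-DR-Scan [-> Select-IR-Scan] -> Capture -> Shift
        for tms in ([1, 1, 0, 0] if ir else [1, 0, 0]):
            self._clock(tms)
        captured = []
        for i, bit in enumerate(bits):
            last = i == len(bits) - 1
            # TMS=1 on the final bit moves Shift -> Exit1 as that bit is clocked in
            captured.append(self._clock(1 if last else 0, bit))
        self._clock(1)  # Exit1 -> Update (the device latches the shifted value)
        self._clock(0)  # Update -> Run-Test/Idle
        return captured


class ShiftRegisterTarget:
    """Test stub standing in for a real CPU: a TAP with one n-bit data register."""

    def __init__(self, dr_len: int, dr_value: int) -> None:
        self.state = "TEST_LOGIC_RESET"
        self.dr_len, self.dr = dr_len, dr_value

    def clock_bit(self, tms: int, tdi: int) -> int:
        tdo = 0
        if self.state == "SHIFT_DR":
            tdo = self.dr & 1  # LSB leaves on TDO...
            self.dr = (self.dr >> 1) | (tdi << (self.dr_len - 1))  # ...TDI enters at MSB
        self.state = TAP_NEXT[self.state][tms]
        return tdo


if __name__ == "__main__":
    target = ShiftRegisterTarget(dr_len=8, dr_value=0xA5)
    jtag = JtagBitBanger(target.clock_bit)
    jtag.reset()
    print(jtag.shift([1, 1, 0, 0, 1, 1, 0, 0]), hex(target.dr), jtag.state)
```

In the demo, the stub’s 0xA5 comes out on TDO least-significant bit first while 0x33 is shifted in to replace it, and both the host’s tracked state and the stub end in Run-Test/Idle. Keeping the state table separate from the pin hook means the same logic can be exercised against a simulated target before it touches real debug-port wiring.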

The Benefits

Imagine being able, during board bring-up, to set a breakpoint on, say, 50 prototype systems in the lab to increase the probability of catching an extremely intermittent bug.

Imagine being able, after a new software release is deployed to the field, to dump the kernel state (registers, memory, stack traceback, I/O, etc.) at the precise moment systems start locking up and customers start complaining.

Imagine being able to say to customers, “We have this new resident utility which retrieves forensic information during system failures and 'calls home' in the event of virtually any soft or hard failure.”

More explicitly, the value proposition for embedded diagnostics is fourfold:

1. Improving Profitability

A technology company’s profitability hinges on time-to-market. Market windows keep shrinking, and a design that doesn’t ship on time will collide with other designs already in the pipeline; many projects are canceled outright if their schedules slip too far. Shorten time-to-market and you accelerate profit.

During board bring-up and initial prototype production, having a powerful debugger installed on many systems increases the chances of catching bugs and identifying their root causes. Problems that appear at scale are difficult or impossible to reproduce on a single system or a small setup. Embedded Diagnostics gets the bugs out sooner, so you ship sooner.
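The intuition behind running the debugger on many systems at once can be put in numbers. Assuming, purely for illustration, that an intermittent bug shows up on any one system during a test run with an independent probability p, the chance that at least one of N instrumented systems catches it is 1 - (1 - p)^N. The sketch below evaluates that; the function names and the 1% figure are hypothetical, and real failures are rarely fully independent, so treat the result as a rough guide.

```python
def p_catch(p_single: float, n_systems: int) -> float:
    """Chance that at least one of n independent systems hits the bug."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    return 1.0 - (1.0 - p_single) ** n_systems


def systems_needed(p_single: float, target: float) -> int:
    """Smallest number of systems whose combined catch probability reaches target."""
    if not (0.0 < p_single <= 1.0 and 0.0 <= target < 1.0):
        raise ValueError("need 0 < p_single <= 1 and 0 <= target < 1")
    n = 0
    while p_catch(p_single, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    p = 0.01  # hypothetical: the bug appears in 1% of single-system runs
    for n in (1, 10, 50):
        print(n, round(p_catch(p, n), 3))
    print("systems for 90% confidence:", systems_needed(p, 0.90))
```

With p = 1%, one system catches the bug in about 1% of runs, while 50 systems catch it in roughly 39.5% of runs; reaching 90% takes about 230 systems.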

2. Enhancing Productivity

This applies to both Customer Support and Failure Analysis. We’ve all been there: technicians spend countless hours trying to reproduce customer problems and identify their root causes, both in the field and in the repair depot, while customer dissatisfaction continues to mount. Better diagnostics that identify the root cause at the time of failure cut the time and effort technicians spend resolving problems.

3. Reducing Costs

Having enhanced diagnostics capability in field systems will reduce OEMs’ No Trouble Found (NTF) rates.

When a system encounters a problem in the field, one or more suspect field-replaceable units are often pulled out and sent back to the OEM for replacement. Once in the repair depot, they are tested to identify or confirm the nature of the problem. If the board tests OK in the lab, it’s a dilemma. Do we scrap the board? Do we refurbish it and send it back out? To the same customer? Or maybe a different customer? How many times does this particular board serial number come back into the factory before it is finally scrapped?

The cost of NTF is staggering. Accenture[1] reports that return rates in the consumer electronics industry range from 11% to 20%, and that more than two-thirds of these returns can be characterized as NTF. Imagine being able to shrink the NTF rate by even 10%. For most large firms, this usually translates into millions of dollars in savings.
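The arithmetic behind the NTF claim is easy to sketch. Accenture’s figures give the share of units shipped that come back as NTF (return rate times NTF share); trimming that by a relative 10% and multiplying by a per-return handling cost gives the saving. The shipment volume and cost per return below are hypothetical placeholders, not figures from the report, and the “more than two-thirds” NTF share is taken at its 2/3 lower bound.

```python
def ntf_savings(units: int, return_rate: float, ntf_share: float,
                reduction: float, cost_per_return: float) -> float:
    """Money saved by avoiding a fraction `reduction` of NTF returns."""
    ntf_returns = units * return_rate * ntf_share  # returns that test OK in the depot
    return ntf_returns * reduction * cost_per_return


if __name__ == "__main__":
    units = 1_000_000  # hypothetical annual shipments
    cost = 100.0       # hypothetical handling cost per NTF return, in dollars
    for rate in (0.11, 0.20):  # Accenture's reported return-rate range
        ntf_pct = rate * 2 / 3 * 100
        saved = ntf_savings(units, rate, 2 / 3, 0.10, cost)
        print(f"return rate {rate:.0%}: NTF = {ntf_pct:.1f}% of units, saving ${saved:,.0f}")
```

With these made-up inputs, NTF returns amount to roughly 7.3% to 13.3% of units shipped, and a 10% reduction saves about $0.73 million to $1.33 million a year; the figure scales linearly with volume and with cost per return.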

4. Gaining a Competitive Edge

Not all high-availability systems have extensive built-in self-test and diagnostic capabilities. Such capabilities are a hallmark of quality, and the ability to diagnose field problems in situ separates “average” from “excellent” post-sales service. Ultimately it comes down to the OEM being able to tell its customers and prospective buyers that it has the right tools to keep their systems up and running, a competitive advantage over OEMs who cannot rightfully make that claim. So the next time a Request for Proposal (RFP) asks about the uptime (e.g., “carrier-grade” 99.999%+ availability) of an OEM’s systems and the technology that supports it, OEMs that have deployed embedded debugging capabilities can respond with confidence.

[1] Accenture, Big Trouble with “No Trouble Found” Returns, 2008, http://www.accenture.com/us-en/Pages/insight-no-trouble-found-electronics-high-tech.aspx.