As part of the Open Compute Project, Microsoft Azure is leading the charge in providing the technical information needed to democratize the server industry. Recently, Microsoft put into the public domain the full schematics and board files for an Intel Xeon Scalable Processor (Skylake-SP) server design. Reviewing the schematics provides great insight into the hardware design implementation needed to support at-scale debug via embedded JTAG run-control.
This past month, Microsoft published technical collateral on server motherboards based upon Intel Xeon Scalable Processors, AMD EPYC, and Cavium ThunderX2. There is also some information available on the Qualcomm Centriq 2400, but it is lightweight and was published back in February 2017. Presumably, fuller Qualcomm information will be put into the public domain once it is mature.
This initiative is called “Project Olympus” and provides an open-source hardware description for ODMs and VARs to build upon.
The Intel motherboard Electrical Collateral provides a wealth of valuable information, including the board schematics, CAD files, and Bill of Materials. The schematics themselves are 238 pages in length, and completely describe the electrical design. The Index lists a JTAG Block Diagram on page 19, but that diagram is strangely missing. It is still possible to glean the overall connections between the BMC and the CPU debug port by looking at various pages within the schematics. Overall, the implementation is similar to what I’ve described in previous articles on earlier architectures, most notably “Intel x86 Design for Debug Guidelines”, which is itself derived from the public Intel document “Debug Port Design Guide for UP/DP Systems”.
Looking at sheet 123 of the schematics, we can see GPIO of the BMC assigned to the following signals:
- BMC_XDP_JTAG_SEL manages whether the JTAG_BMC signals can pass through to the XDP.
- BMC_XDP_PRSNT_IN_N allows the BMC to enable debug mode.
- BMC_XDP_OBSFN_A0 is the equivalent of PREQ, which initiates probe mode.
- BMC_XDP_OBSFN_A1 is the equivalent of PRDY, which acknowledges probe mode entry.
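To make the roles of these four signals concrete, here is a minimal sketch of how BMC firmware might sequence them to request probe mode. It is an illustration only: the `SimulatedGpio` class, the `enter_probe_mode` function, the timeout values, and all signal polarities are assumptions of mine (the code works in logical levels, and only the `_N` suffix hints at an active-low pin). Only the net names come from the schematics.

```python
import time

# Net names from the schematic; everything else below is hypothetical.
JTAG_SEL = "BMC_XDP_JTAG_SEL"
PRSNT_N = "BMC_XDP_PRSNT_IN_N"
PREQ = "BMC_XDP_OBSFN_A0"
PRDY = "BMC_XDP_OBSFN_A1"


class SimulatedGpio:
    """In-memory stand-in for BMC GPIO, with a toy CPU that answers PREQ with PRDY."""

    def __init__(self, cpu_responds=True):
        self.levels = {JTAG_SEL: 0, PRSNT_N: 1, PREQ: 0, PRDY: 0}
        self.cpu_responds = cpu_responds

    def write(self, name, level):
        self.levels[name] = level
        if name == PREQ:
            # Toy target: acknowledge only if debug mode was enabled first.
            debug_enabled = self.levels[PRSNT_N] == 0
            ack = bool(level) and debug_enabled and self.cpu_responds
            self.levels[PRDY] = 1 if ack else 0

    def read(self, name):
        return self.levels[name]


def enter_probe_mode(gpio, timeout_s=0.1, poll_s=0.01):
    """Request probe mode and wait for the acknowledge; return True on success."""
    gpio.write(JTAG_SEL, 1)  # let BMC JTAG pass through to the XDP (polarity assumed)
    gpio.write(PRSNT_N, 0)   # enable debug mode (active-low assumed from the _N suffix)
    gpio.write(PREQ, 1)      # logical assert of the probe-mode request
    deadline = time.monotonic() + timeout_s
    while True:
        if gpio.read(PRDY):  # target acknowledged probe-mode entry
            return True
        if time.monotonic() >= deadline:
            break
        time.sleep(poll_s)
    gpio.write(PREQ, 0)      # withdraw the request on timeout
    return False
```

On real hardware the `SimulatedGpio` object would be replaced by whatever GPIO access layer the BMC firmware provides, and the polarities would need to be checked against the schematics.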
Further, the JTAG slave pins of the ASPEED AST2400/AST1250 are also used to master JTAG back to the processor when needed, as seen in sheet 124:
Looking at the big picture, the overall topology and application of these connections can be seen pictorially as follows:
What is the purpose of these interfaces? The ASPEED BMC is expected to master JTAG and the other sideband signals to initiate hardware-assisted debug and test of the servers at scale, unconstrained by the need to physically connect external hardware and cables to the target. There are three primary use cases:
- Embedded run-control, such as for crash forensics or for in-situ, cover-on debug (in contrast to in-lab, cover-off debug). Here run-control is deeply embedded in the target, with no remote host.
- Remote run-control, such as for in-situ, cover-on debug that includes Intel CScripts. This involves a remote host, for either automated (i.e. via an external rack manager) CScripts execution, or interactive CScripts / hands-on debug.
- Boundary-scan test (BST), such as for embedded manufacturing test (in contrast to external manufacturing test).
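All three use cases rest on the BMC acting as a JTAG master, which at the lowest level means driving TMS to walk each device’s TAP controller through the state machine defined by IEEE 1149.1. The sketch below models that state machine in software. The transition table follows the standard, while the function names (`walk`, `reset_tap`, `tms_path`) are hypothetical helpers of mine and are not taken from the Olympus collateral or any vendor tool.

```python
from collections import deque

# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP_NEXT = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE": ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_DR": ("CAPTURE_DR", "SELECT_IR"),
    "CAPTURE_DR": ("SHIFT_DR", "EXIT1_DR"),
    "SHIFT_DR": ("SHIFT_DR", "EXIT1_DR"),
    "EXIT1_DR": ("PAUSE_DR", "UPDATE_DR"),
    "PAUSE_DR": ("PAUSE_DR", "EXIT2_DR"),
    "EXIT2_DR": ("SHIFT_DR", "UPDATE_DR"),
    "UPDATE_DR": ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_IR": ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR": ("SHIFT_IR", "EXIT1_IR"),
    "SHIFT_IR": ("SHIFT_IR", "EXIT1_IR"),
    "EXIT1_IR": ("PAUSE_IR", "UPDATE_IR"),
    "PAUSE_IR": ("PAUSE_IR", "EXIT2_IR"),
    "EXIT2_IR": ("SHIFT_IR", "UPDATE_IR"),
    "UPDATE_IR": ("RUN_TEST_IDLE", "SELECT_DR"),
}


def walk(state, tms_bits):
    """Apply one TMS value per TCK cycle and return the resulting TAP state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


def reset_tap(state):
    """Five TCK cycles with TMS=1 reach Test-Logic-Reset from any state."""
    return walk(state, [1] * 5)


def tms_path(start, goal):
    """Breadth-first search for the shortest TMS sequence from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for tms in (0, 1):
            nxt = TAP_NEXT[state][tms]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [tms]))
    raise ValueError(f"no path from {start} to {goal}")
```

For example, `tms_path("TEST_LOGIC_RESET", "SHIFT_DR")` yields the TMS sequence a master would clock out before shifting data through a data register; an embedded run-control or boundary-scan engine layers instruction and data scans on top of moves like this.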
In short, the Project Olympus servers will offer an unprecedented level of debug, forensics, and test capability for troubleshooting the most intermittent problems, the ones that are hardest to isolate and reproduce.
This capability is also available on the Cavium ThunderX2 and AMD EPYC Olympus designs, as can be seen from the following excerpts from the motherboard specifications respectively:
ASSET’s solution in this space is ScanWorks Embedded Diagnostics (SED), which provides a higher-performance, more flexible, and more powerful platform than competing solutions. For more information, please register for our SED eBook.