JTAG-based debug of AMD servers: webinar notes

Earlier this year, I did a webinar on using our SourcePoint JTAG-based debugger on AMD EPYC platforms. Below is the transcript of the audio, for those who prefer to read some of the material.

What we’re going to do today, in terms of agenda, is go through a short introduction to JTAG and its applicability for use cases in general, and for debugging in particular. On AMD, we’ll look at the access mechanisms that are supported for JTAG access and control. And then the emphasis is going to be on the SourcePoint application: a visual application used for firmware debug, which is very close to the hardware.

And we have a demo lined up where we’ll see some of the things that you can do with JTAG and with SourcePoint, in terms of debugging firmware, UEFI, coreboot, building a Type 1 hypervisor and so on with the actual application right in front of us. And then after the demo, we’ll wrap up and hopefully, there’ll be a bunch of questions that we can go through at Q&A on the back end.

Agenda for SourcePoint AMD webinar

JTAG is actually a very mature technology. I’m sure everybody on the call knows all about it. But it actually turned 30 years old back on February 15, 2020. And JTAG is used as an access and control mechanism to access engines or embedded instrumentation within chips. And JTAG is ubiquitous; it’s inside any chip of any size or complexity. So it’s literally in billions of devices out there and globally in all electronic systems.

It’s actually a very simple architecture, a finite state machine, as in the top diagram on the right. And behind every pin, or most pins, there is a handful of gates, a couple of handfuls of gates, which constitute a boundary scan cell. These cells allow serial access routed around the perimeter of the device, and parallel data-in, data-out access through the pins. So the boundary scan cells and JTAG mediate between the outside world and the system logic inside of the chips.
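To make the state machine concrete, here is a minimal Python sketch of the 16-state TAP controller defined by IEEE 1149.1. The transition table follows the standard; the function name `walk` is just an illustrative helper and is not part of any tool shown in this webinar.

```python
# TAP controller transitions from IEEE 1149.1.
# Each state maps to (next state if TMS=0, next state if TMS=1),
# sampled on the rising edge of TCK.
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state, tms_bits):
    """Clock a sequence of TMS values into the TAP and return the final state."""
    for tms in tms_bits:
        state = TAP[state][tms]
    return state


# From reset, TMS = 0,1,0,0 lands in Shift-DR, where data is shifted
# serially through the selected register (for example the boundary scan chain).
print(walk("Test-Logic-Reset", [0, 1, 0, 0]))
```

One nice property of this design is that holding TMS high for five clocks returns the controller to Test-Logic-Reset from any state, which is how a probe can always get back to a known starting point.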

As they relate to CPUs, there’s a bunch of applications that have grown around this simple and elegant technology over the last 30 years: test, debug, validation and programming. Today, we’re going to focus on the debug pillar, and talk about what’s called run control: the Intel In-Target Probe (ITP), the AMD Hardware Debug Tool (HDT) and the Arm CoreSight specification. These are the specifications that revolve around using JTAG in and out to access the debug logic inside of these CPUs.

So why are we even talking about JTAG as a debugging utility? Why not use just OS-based, agent-based debuggers? Well, JTAG lives at the hardware, it actually lives at the interface between the hardware, the firmware and the software itself.

And so, if you’re working at a very, very low level and you need the most powerful debugging mechanism, out-of-band debugging mechanism, then JTAG is the way to go. Certainly, if the platform, if the system that you’re working on is wedged, then JTAG is the only way in. An agent-based debugger is of limited utility at that point. And typically, because of its inherent nature, most of the silicon providers, Intel, Arm, AMD all use JTAG as access and control in early silicon validation before the firmware is even ready. So, there’s obviously some reuse and utility of being able to use the same control as what is used by your silicon supplier.

People often express this application as “hardware-based debug”. JTAG-based debug, hardware-based debug, run control: these are the terms you typically hear. Run control means halting the system or halting the processor, then setting a breakpoint and single stepping through the code. And there’s a good debugger sitting on top providing a visual representation of these commands, doing the halt, go, setting breakpoints and all that. Typically, there’s also a scripting language on a command line which is sitting on top of the GUI.

A typical example is over on the right-hand side: you can issue commands interactively, directly on a command line with the platform, and even build these commands and control variables into macros and use them for regression-type testing or stress testing.

The typical topology, and what you’re going to see in the demo today, is the SourcePoint debugger, which connects over either a USB or Ethernet interface to a hardware controller. The hardware controller is the ECM-XDP3E. And that in turn is local and connected over JTAG and some sideband signals, the HDT interface, to the EPYC server. For the demonstration, we’re working with an AMD “Rome” platform. It’s a CRB, a customer reference board, called the Ethanol-X, that some attendees might be familiar with.

There are two primary access mechanisms to the platform. What we’re going to be seeing today is the SourcePoint example, whereby we connect locally on the bench top. It is also possible to take the AMD run-control firmware and embed it within a service processor or BMC. And that’s our ScanWorks Embedded Diagnostics, what we call our SED product.

And SourcePoint, if you’ve ever used it in the past, you know that we cover both Intel and AMD on the x86 side, really a visual, source level symbolic debugger. Support for multi-core, multi-socket, multi-processor, multi-thread, you’ll see demonstrations of some of this functionality right during the demo section in a minute.

Okay, you should all be seeing the SourcePoint screen right now, and I’ll be hitting play. And what we have, I’m actually virtual-machining into a local copy of SourcePoint that’s directly in front of and hooked up to that Rome server I was talking about. This is an unlocked Rome server part that we have here. It’s on the Ethanol-X platform. And so we don’t use the debug unlock capability. This is just an unlocked part. But we could initiate Secure Debug Unlock if we needed to.

The windows within SourcePoint: what I’m highlighting here is the Viewpoint window. Here, we see all the processors, and cores in this instance, associated with this customer reference board, the Ethanol-X. It’s a 64-core platform. As you can see, we’re in a Running state, but some of the cores have been fused off; they’re showing up as Not Active. And this is actually just characteristic of this particular platform.

We’re looking at Core zero, the processor zero, the bootstrap processor, right now, because we’re actually sitting in the UEFI shell. I’ve got a serial console here, and we’re sitting in the shell. I hit Enter a few times. And just to show you that it’s operational, you can type some of the shell commands to illustrate that we’re in fact running live. And over here, I also have a VGA connection right into the platform as sort of a backup thing.

You can see we’re running. We had previously halted. In the Code window on the left, you can see we’re in the shell; there’s not a heck of a lot… Not a lot going on right now. We’re in a sleep state waiting for something to happen. I hit that button at the top left; it’s the stop button. And that stops the entire platform.

Here’s a macro button that actually… that halts all the cores, as you can see from the Viewpoint window here, and changes them from a Running state to a Stopped state. In the General Registers window, you can see the general purpose registers here. And the ones in green are the ones that have changed since the previous halt. So it’s very helpful in terms of seeing what’s changed, and the ones that are black are the ones that have not changed since the prior halt, or the prior break.
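The green/black highlighting amounts to comparing two register snapshots. A minimal Python sketch of that idea follows; the function name and the register values are made up for illustration and are not taken from SourcePoint or from the demo platform.

```python
def changed_registers(previous, current):
    """Return the names of registers whose value differs from the previous halt."""
    return {name for name, value in current.items() if previous.get(name) != value}


# Hypothetical snapshots taken at two consecutive halts.
halt_1 = {"RAX": 0x0, "RBX": 0x7F00, "RCX": 0x10, "RIP": 0x1000}
halt_2 = {"RAX": 0x0, "RBX": 0x7F00, "RCX": 0x0F, "RIP": 0x1004}

for name in sorted(halt_2):
    colour = "green" if name in changed_registers(halt_1, halt_2) else "black"
    print(f"{name:4} {halt_2[name]:#018x}  ({colour})")
```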

You can see where the Instruction Pointer is; that’s what’s showing up in the Code window up there on the top left. And let’s see, what else do I want to tell you about: the Command window. Can’t forget that one. It’s the way you interact directly with run control and some of the built-in SourcePoint command language and macro capabilities. I can invoke, for example, the AX command, which shows hex zero.

And if you look at the general registers window again, over on the right hand side, you can see EAX was in fact 0. And AX was zero as well, of course. All right, the breakpoint window is here, I’ve actually placed a hardware breakpoint at the entry to DXEMain, you can tell it’s a hardware breakpoint because it’s in red here.

And we’re going to do some interesting things in terms of… Well, first off, showing you that in fact, the platform is halted. You can’t hear this or see this, but I’m tapping madly on my keyboard and nothing is happening because the platform is actually in a halted state. Which is as it should be. So let’s hit the Go button and put it back into a running state.

And let’s look back at the consoles. Okay, it’s back and running. And those commands that we type are in fact fully operational, again, as you would expect. So, we’re going to stop again, because things get interesting, of course, when you’re in a halted state, as far as JTAG is concerned. That’s a reset button that we’re going to hit right now. All right, so we’re going to debug this thing directly out of reset. I’ve initiated a hardware reset on the Ethanol-X platform. It’s going to take a few moments for the platform to reset itself. Okay, it has stopped as you would expect.

But actually I’ve stopped in La La Land as it were. I’ve actually stopped prior to the reset vector here. We’re at what AMD calls IntChk1. There’s no code. We’re actually not in the x86 world just yet, for some folks that might understand AMD architecture. But you can see the instruction pointer from the general registers window is set to 1710.

The single step button here takes us to IntChk2, which shows code FF at the address FFFF0093. Again, not much to see at this point, so we’ll take one more step. And that will take us to the reset vector. This should be more familiar to folks that have done x86 debug on Intel platforms. In this case, it’s at 76FFFFF0, with a couple of NOP instructions which we can single step through. And then we do that short jump over here and keep single stepping.

And you can see from the General Registers window again, some of the applicable registers have changed their value, as we step on one by one through the code. So easy enough to track exactly what is going on by looking at the general purpose registers. Here, we’ve got tool tips all over the place just by the way. I’m going to show you a run to DXEmain now. So I’m going to click on the Go button. So we’re going to jump from the reset vector to the entry to DXEmain and bang, you just saw it really fast. You might have missed it, but we had a lot of text scrolling off the serial console.

And here we are, you can see in the Code window that we have stopped at the entry point to the DXEmain procedure there. The yellow bar and arrow is the instruction pointer and the red blob on the left is where I placed the hardware breakpoint associated with DXEmain. So we’re right at the entry point. And the tool tip shows you some good information in terms of addressing and the source path.

Essentially we ran through a lot of code going from reset to DXEmain, and you can see all the general registers have been highlighted in green in this instance. All right, I’m going to show a little bit now of what JTAG run-control brings to the table on AMD for managing all of the architecturally visible registers. Hover over the value of a register and you can see the tooltip, in this case the subfields associated with the register.

And the EFLAGS register is the most interesting one in terms of the number of available subfields there. And you can also right click if you wish and expand on the contents of the registers, click any of the subfields, and change them in any way you see fit. Very helpful, for example, in the debugging process, for escaping a dead loop or whatever.
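As a rough illustration of what those EFLAGS subfields are, here is a small Python sketch that splits a raw value into its named flags. The bit positions follow the x86 architecture definition; the function name and the sample value are assumptions made for the example and have nothing to do with how SourcePoint implements its tooltip.

```python
# Single-bit EFLAGS fields and their bit positions (x86 architecture).
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18,
    "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value):
    """Split a raw EFLAGS value into its named subfields."""
    fields = {name: (value >> bit) & 1 for name, bit in EFLAGS_BITS.items()}
    fields["IOPL"] = (value >> 12) & 0b11  # two-bit I/O privilege level
    return fields


# Sample value chosen for illustration: PF, ZF and IF set.
flags = decode_eflags(0x246)
print({name: bit for name, bit in flags.items() if bit})
```

Changing a subfield from the debugger is then just a matter of setting or clearing the corresponding bit, for example `value | (1 << 8)` to set the trap flag TF.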

We’ve got access to the floating point registers, the segment registers and so on. All of this is very similar to Intel. There are four hardware breakpoints available on the AMD platforms, the same as Intel. And many more software breakpoints are available, of course, on the platforms.
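The reason for that asymmetry is architectural: x86 has four debug address registers (DR0 to DR3), whereas a software breakpoint is conventionally made by patching a breakpoint opcode (INT3, byte 0xCC) over the first byte of an instruction. The Python sketch below models that general x86 technique under those assumptions; the class and method names are hypothetical, and it does not claim to show how SourcePoint itself implements breakpoints.

```python
INT3 = 0xCC          # x86 one-byte breakpoint opcode
MAX_HARDWARE = 4     # one per debug address register, DR0..DR3


class Breakpoints:
    """Toy model of hardware versus software breakpoints on x86."""

    def __init__(self, memory):
        self.memory = memory      # bytearray standing in for target memory
        self.hardware = []        # addresses loaded into DR0..DR3
        self.software = {}        # address -> original byte that was patched

    def set_hardware(self, address):
        if len(self.hardware) >= MAX_HARDWARE:
            raise RuntimeError("all four debug address registers are in use")
        self.hardware.append(address)

    def set_software(self, address):
        # Save the original opcode byte, then patch in INT3.
        if address not in self.software:
            self.software[address] = self.memory[address]
            self.memory[address] = INT3

    def clear_software(self, address):
        # Restore the byte that INT3 replaced.
        self.memory[address] = self.software.pop(address)


code = bytearray([0x90, 0x90, 0xEB, 0xFE])   # NOP, NOP, JMP $ (illustrative bytes)
bp = Breakpoints(code)
bp.set_software(1)
print(code.hex())   # second byte is now cc
bp.clear_software(1)
print(code.hex())   # original bytes restored
```

Because the patching approach needs writable memory, hardware breakpoints are the natural choice when code is still executing from flash or otherwise read-only memory.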

And for the MSR we’ve actually imported the contents of the publicly available MSR documentation. You see here the name, value, address and then description. It’s easy enough to find what you’re looking for. And again, the ones that have changed since the prior break are highlighted in green and all the rest of them are in black. We’ve got access to memory typing, the machine check and so on and so forth. Secure virtual machine, if you’re doing some work, debugging the memory encryption technologies and so on.

All right. Back to the Code window for a second, just wanted to show we’re just looking at the code in source view as it were. We could see where this particular, the DXEMain procedure lives within DXEMain.c. We’ve got all the sources, symbols loaded up associated with the UEFI image on this platform. You can scroll through and look at stuff as you wanted to and look at the tooltips associated with the variables.

You can also look at the code from a Mixed perspective, which interleaves or mixes the C code itself and the associated assembly language. This is the way I typically like to work, I like to see both. Or if you’re really hardcore and don’t really need to look at C code, there’s the assembly language directly available to you as well. You can single step of course, in any of the modes, if you’re in assembly language mode, you can single step and go from one assembly instruction to the next.

If you’re in C code, or in Mixed mode, you can use the step over or step out of buttons there right beside the step by step ones if you’re debugging at a source level as opposed to the assembly language level. All right, let’s see breakpoints.

Let’s go back into Mixed mode. And let’s look at breakpoints for a minute, because this is really the hallmark of the professional debugger. With SourcePoint, on AMD, as with Intel, you have a plethora of different breakpoints that you can initiate, as displayed here. You’ve got four hardware breakpoints available, and however many software breakpoints you need.

And when you hit a break that you’ve defined, you can initiate a command or a macro there at the bottom, which allows for regression testing or running an algorithm after you’ve actually hit the break. You can put those breakpoints in via the breakpoint window, or you can do it from within the code window itself. So, you have a choice.

The latter might be more convenient. And the easy way to do that is to go to the code that you’re interested in and set a break, which will appear at the next available instruction. And we have that green blob. We can go from software (green blob) to hardware (red blob), essentially, by toggling the breakpoint type via right click and select. Of course, there are keyboard shortcuts for any of those.

All right. Let’s hit the Go button. And then of course, we jumped from the entry point to DXEMain to just a little bit further down, right to the call to AllocateRuntimeCopyPool. Also look over at the General Registers window: you can see the contents that have changed.

The next thing I want to show is the functionality inherent in inspecting memory within SourcePoint. Click on the Memory button at the top and you can see inside the window that you’ve got a number of different ways that you can specify the specific memory that you’re looking for: linear, physical, segment, offset or even symbolic by hitting the magnifying glass and browsing through the symbol library.

Let’s have a look at what the Memory window looks like. Just for grins, I’m going to pull out the instruction pointer contents, go back, open the memory address window, and copy and paste that in. And you see the Memory window here. All right. Yeah, there are lots of different ways you can customize the view. We’re looking at 8-bit right now; you can specify 16-bit, 32-bit or 64-bit if you’re looking at the contents as double words or whatnot, and specify how you want to see any associated text or characters, in ASCII or Unicode. Nothing much to see here right now, because I’m just looking at code.
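To show what switching the display width means for the same underlying bytes, here is a short Python sketch. It assumes little-endian grouping, which is how x86 stores multi-byte values; the function names and the sample bytes are illustrative only.

```python
def view(data, width_bits):
    """Format raw bytes as little-endian hex values of the given width."""
    step = width_bits // 8
    if step not in (1, 2, 4, 8):
        raise ValueError("width must be 8, 16, 32 or 64 bits")
    usable = len(data) - len(data) % step   # ignore a trailing partial value
    return [
        f"{int.from_bytes(data[i:i + step], 'little'):0{step * 2}X}"
        for i in range(0, usable, step)
    ]


def ascii_column(data):
    """Printable ASCII rendering of the bytes, with '.' for everything else."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


sample = bytes([0x4D, 0x5A, 0x90, 0x00])   # made-up memory contents
print(view(sample, 8))     # ['4D', '5A', '90', '00']
print(view(sample, 16))    # ['5A4D', '0090']
print(view(sample, 32))    # ['00905A4D']
print(ascii_column(sample))  # MZ..
```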

All right, let’s have a look at some of the powers of source level debug associated with SourcePoint. I’m back in DXEMain again, at the entry point where I placed that hardware breakpoint right at the beginning. And I’ve also put a software breakpoint if you look carefully at the breakpoints window down below, you’ll see that I’ve got a software breakpoint at CoreInstallConfigurationTable plus hex18.

All right, so it’s a little further down that procedure, and the InstallConfigurationTable is invoked a handful of times from within DXEMain. And what it does is it updates the EFI system table configuration table with the contents. Clicking on Symbols shows me a little bit of what we can do with that power. Looking at Globals, Locals, the Stack or the Classes associated with the image. You can see that we’ve got a rich set of contents here.

Here’s what’s local to DXEMain. If you want it, you can also get some more information in addition to seeing the name and value: you can look at the address and the actual type of that symbol. Those symbols, that’s what’s on the stack. And there are the classes available with the structs that are now accessible therein.

Just let’s see, let’s display some of the power and the functionality of symbol search. You right click in any of these windows, any of these tabs actually. And using the find command, you can look inside the code right inside the image.

And you’ve got a lot of control over what you’re looking for, like starting with, containing, wildcard pattern or even putting in a regular expression. And you’ll see how fast it is, because as you can imagine, there are a lot of symbols inside this image.

I’ve got a search string, and we’re looking for “coreinstall” something. And you can see I found four instances of it here. Let’s move this over a little bit. You can see, one of which is my CoreInstallConfigurationTable function. And if I wanted to, I could open a Code window associated with that symbol, within that function, at a breakpoint.
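The search modes mentioned above (starting with, containing, wildcard pattern, regular expression) can be sketched in a few lines of Python. This is only an illustration of the matching logic: the function name, the mode strings, the case-insensitive behaviour and every symbol in the list other than CoreInstallConfigurationTable are assumptions, not SourcePoint’s actual implementation or the actual contents of the image.

```python
import fnmatch
import re


def find_symbols(symbols, pattern, mode):
    """Return the symbols matching pattern under the given search mode."""
    needle = pattern.lower()
    if mode == "starts_with":
        return [s for s in symbols if s.lower().startswith(needle)]
    if mode == "containing":
        return [s for s in symbols if needle in s.lower()]
    if mode == "wildcard":
        return [s for s in symbols if fnmatch.fnmatchcase(s.lower(), needle)]
    if mode == "regex":
        rx = re.compile(pattern, re.IGNORECASE)
        return [s for s in symbols if rx.search(s)]
    raise ValueError(f"unknown mode: {mode}")


# Illustrative symbol list.
symbols = [
    "CoreInstallConfigurationTable",
    "CoreInstallProtocolInterface",
    "CoreAllocatePool",
    "DxeMain",
]

print(find_symbols(symbols, "coreinstall", "starts_with"))
print(find_symbols(symbols, "*table", "wildcard"))
print(find_symbols(symbols, r"install.*table$", "regex"))
```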

Let’s get into some source level debug fun, then.

I’ve got a software breakpoint inside of my root CoreInstallConfigurationTable routine.

Its first invocation is installing the DXE services table. Right, and then we do the HOBs, and then we do the memory information table. Then I hit Go again, and we jump to that point, which will be the first invocation of CoreInstallConfigurationTable. And there, it happened so fast, you might have missed it actually.

Now you can see I’ve actually stopped; the instruction pointer is now at hex 18 down from the start of the function. I’ve got a couple of parameters coming in, pointers to GUID and Table here. And let’s see SourcePoint’s symbol capability. Let’s copy that variable, that pointer, into a Watch window. In fact, I’ve already taken the liberty of copying both Table and GUID, the two incoming parameters, to the Watch window. And you can see the values associated with each on the initial entry point.

If I hit Go again, it’s going to run out of CoreInstallConfigurationTable, go back into DXEMain and get to the second invocation of CoreInstallConfigurationTable. Actually, I hope you saw that, but the second time in, that’s when we’re installing the HOBs list right into the EFI’s configuration table.

So hopefully, that has been a helpful and easy demonstration of just what you would get with, for example, a Visual Studio type of debugging environment, but at a very low level and using JTAG. I’m going to wrap up the demonstration now with a look at what we have in terms of documentation and some of the power of the command language, the macro language that we have available.

Under the Help menu, click on index, and you can search a ton of information. Let’s look at the Go command for example. It’s not just Go and Stop. It’s actually a command with a bunch of possible parameters.

You can preset, with a P0 or P1, the actual processor viewpoint, that is, the processor that you want the Go command to operate on. And then there are a bunch of parameters as defined below: you can go forever, go till a software breakpoint, address, event, and so on. And all of the documentation, of course, has some good examples on the back end for each of these commands that allow you to understand exactly how to use the function.

To illustrate I’m going to show how the Go command can be used to actually run the target and then halt it. I hit Go in the Command window.

As you can see, we’re at the software breakpoint again, as we were. Let’s get a little bit more complicated. ‘go til 80IO’ means it’s going to run the target until it hits the next Port 80 read or write. And if you glance back up at the Code window now, you’ll see that the source actually changed. Let’s put this in Mixed mode so we can see it a little bit better. The instruction pointer, the yellow bar, is just one instruction down from the OUT instruction. EAX, or rather the AX register, contains C38, as you can see from the General Registers window, and DX contains hex 80, port 80. And that shows the OUT instruction.

We could also, even more powerfully, put conditional logic in here and put that to use, and do a while loop which runs forever. And we’re going to run until the next Port 80 read or write and then we’re going to wait and then we’re going to display the AX register contents at that time. And it just keeps going forever. As you can see here. It’s updating the Code Window in real time, Viewpoint, General Registers window, all update in real time and I’ve canceled it since otherwise it would go for a fairly long time.
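The logic of that loop (run until the next Port 80 access, then display AX, and repeat) can be modelled in Python against a recorded list of I/O accesses. To be clear, this is a simulation written for these notes and not SourcePoint command language: the function names are hypothetical, and apart from C38, which is the value seen in the demo, the trace values are made up.

```python
def run_until_port80(io_trace, start):
    """Return (index, value) of the next port 0x80 access at or after start, or None."""
    for i in range(start, len(io_trace)):
        port, value = io_trace[i]
        if port == 0x80:
            return i, value
    return None


def watch_post_codes(io_trace):
    """Repeatedly run to the next port 0x80 access and record the value written."""
    codes = []
    position = 0
    while True:
        hit = run_until_port80(io_trace, position)
        if hit is None:
            break  # on real hardware the loop keeps going until it is cancelled
        index, value = hit
        codes.append(value)
        position = index + 1  # resume just past the access we stopped on
    return codes


# (port, value) pairs standing in for the target's I/O activity.
trace = [(0x70, 0x0A), (0x80, 0xC38), (0x3F8, 0x41), (0x80, 0xC39)]
print([hex(code) for code in watch_post_codes(trace)])
```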

But you can see that just with one simple command, command string as it were, initiated on the platform, you can do a lot of really cool things. For Documentation, if you really want to, you can use the help files, online files, or you can crawl through 645 pages of documentation associated with our Users Guide. This is actually how I prefer to consume, read content, sit on the couch. And yeah, go through a book from beginning to end.

There’s plenty of information, well written and easy to understand, covering all of the basic functionality associated with the GUI. Half of the 645 pages is associated with the GUI itself and the capabilities of SourcePoint as you’ve seen it so far, and the other half of the documentation is a description of the command language, the macro language.

I’ll show you an example of the macro buttons defined at the very top under the title bar for SourcePoint. Buttons can be associated with macros. You can write these yourself or use the built-in macros built into the application. This is the macro button at the top that dumps the HOBs.

And just hitting that button invokes this macro. You can see the command language, the macro language, is very ‘C’-like. If you’re a ‘C’ programmer and know a little about x86 debug, it should be fairly simple and easy to understand and to code up your own applications.

Wrapping up, for any more information on this topic, by all means search for content on our website.

Thanks, everyone. If there are any other unanswered questions or anything else that anyone wants to know about SourcePoint AMD, feel free to drop me an email.

Alan Sguigna