Switch Dual Shock adapter part 5: Analysis!

Posted on May 3, 2023

In this series of posts, I’m attempting to make a Dual Shock to Switch controller adapter. It will plug into the Switch Dock’s USB port.

Last time, I got the ATmega masquerading as a Pro Controller and controlling the Switch. This time: debugging and performance improvements!

Although I ended the last post with control of the Switch working, there are a few problems:

  • Sometimes, when I plug it in to the Switch, it just doesn’t work. The blinking LED freezes, indicating that my halt(...) function is being called, indicating that something unexpected has happened during communication.
  • When plugged into the USB port on the Switch dock (rather than the USB-C port on the Switch itself) it hardly ever works. This is obviously problematic because the point of this (besides having some fun making it) is that it will allow me to sit on the couch and play games on the TV using the Dual Shock.
  • If it does work it usually stays working for 10-20 minutes (and sometimes for hours!) - but then it, again, mysteriously halt(...)s.

And there are a couple of other questions that have been nagging at me:

  • I’ve already calculated that the answer to this is ‘yes’, but it would be nice to see hard visual evidence: Can this really actually transmit controller state updates fast enough to control games?
  • It feels like the answer to this is also ‘yes’, but it would also be nice to get hard evidence before continuing: With all the processing the ATmega is already doing, will there really also be enough time to read the Dual Shock’s state and translate it to Pro Controller format?

In this post, I’m going to investigate all of this.

Debugging

I have an oscilloscope with a logic analyzer add-on[1] that I’ve hardly ever used burning a hole in my pocket, uh, workbench.

The wonderful Ben Eater made a few excellent videos where he hooks up oscilloscope probes to a USB bus here, here, and here. They’re well worth a watch if you want to learn more about how USB signaling works - or you’re just interested in tech workings in general (and, really, I’d question why you’re still reading these posts if you’re not…). But I wanted to start out simpler, so I added a few debug signals on some of the unused pins of the ATmega.

Here’s our old friend, the ATmega8A pin out, from the ATmega8A datasheet:

ATmega8A Pin Out Diagram

It looks like pins 15-19 are good candidates for ‘debug pins’. They’re marked PB1, PB2, PB3, PB4, and PB5, meaning that they correspond to bits 1 to 5 of the 8-bit PORTB variable - nice and consecutive, easy to keep track of in code - and they are physically next to each other on the top right of the chip on the breadboard, so easy to keep track of in the real world too.

If you’re unfamiliar with PORT variables - or need a refresher, I wrote a little about them in a footnote to part 2 of this series.

I’m actually using one of these bits already: the one-second-flashing LED is connected to pin 19 - or PB5 (PORTB bit 5). This blinking LED has been around since I started this project, and besides looking nice on the breadboard it serves to show that the system is not in an error state: it stops flashing when the code calls halt(). Before these changes, I was still using the Arduino library’s digitalWrite(...) function and LED_BUILTIN pin definition to set its state. This felt less and less in line with the rest of the code, so I took the opportunity to change that to use the PORTB variable directly too.

To set this up, I added this to the setup() function:

// We will use Port B bits 1-4 (pins 15-18) as 'debug' output, and
// bit 5 (pin 19) as our one-second-blinking 'status' LED.
// Configure them as output, and ensure they start as zero.
DDRB  |= 0b00111110;
PORTB &= 0b11000001;

Here I’m using a bitwise ‘or’ to set bits 1-5[2] of DDRB - the Data Direction Register for port B - to 1, without disturbing the other pins. This tells the ATmega that the corresponding pins (PB1-PB5) should be configured as outputs (0 would mean input - the default state).

The next line uses a bitwise ‘and’ to set bits 1-5 of PORTB to 0 without disturbing the other pins. This ensures they start out in a low state.
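To see why those two masks leave the other pins alone, here is a minimal host-side sketch. It assumes DDRB and PORTB can be modelled as plain bytes (on the ATmega they are hardware registers), and the FakePortB type and the starting values in the usage note are made up for illustration.

```cpp
#include <cstdint>

// Host-side stand-ins for the AVR registers (hypothetical: on the real chip
// DDRB and PORTB are memory-mapped hardware registers, not plain variables).
struct FakePortB {
    uint8_t ddr;   // data direction: 1 = output, 0 = input
    uint8_t port;  // output latch
};

// Mask covering bits 1-5 (PB1-PB5).
constexpr uint8_t kDebugAndLedMask = 0b00111110;

// Mirrors the two setup() lines: make bits 1-5 outputs, then drive them low.
inline void configureDebugPins(FakePortB &b) {
    b.ddr  |= kDebugAndLedMask;                         // set bits 1-5, keep the rest
    b.port &= static_cast<uint8_t>(~kDebugAndLedMask);  // clear bits 1-5, keep the rest
}
```

Starting from a ddr of 0b10000001 and a port of 0b11111111, calling configureDebugPins(...) leaves bits 0, 6 and 7 exactly as they were in both bytes.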

Then I bracketed interesting portions of my code with statements like this[3]:

// Set the pin corresponding to bit 3 of PORTB to output high.
PORTB |= (1 << 3);

/* Do something */

// Set the pin corresponding to bit 3 of PORTB to output low.
PORTB &= ~(1 << 3);

This will mean that when we look at oscilloscope traces, we’ll be able to tell when the code is executing by looking to see if the waveform is ‘high’.

I used:

  • Bit 1 to bracket the entire main loop.
  • Bit 2 to bracket the sendReportBlocking(...) function - the function that loops sending 8-byte chunks of a single report to the host until a whole report is done.
  • Bit 3 to bracket the usbFunctionWriteOut(...) function - the callback function that V-USB calls to provide data it’s received for an OUT endpoint.
  • Bit 4 to bracket the entire reception of a report. Because receiving a report takes more than one call to usbFunctionWriteOut(...), this isn’t a straightforward bracketing of code like bits 1-3. Instead, I set it to 1 when the first part of a report is received, and set it to 0 when a report has been fully received and processed.
  • Bit 5 to blink the LED. Bit 5 corresponds to pin 19 - the pin the LED is already connected to. I re-wrote the routine that does the blinking to use PORTB directly instead of Arduino’s digitalWrite(...).
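For the straightforward cases (bits 1-3), the set-then-clear bracketing could also be wrapped in a small scope guard, so the pin goes low again even on an early return. This is a hypothetical helper, not what the code in the repo does, and the name DebugPinScope is invented here. On the ATmega it would cost a few more cycles than a bare PORTB |= statement, which matters when the timing itself is what you're measuring.

```cpp
#include <cstdint>

// Sets one bit of a port register on construction and clears it on
// destruction, so a scope is "bracketed" even if it returns early.
// Hypothetical helper: the post's code uses plain PORTB |= / &= statements.
class DebugPinScope {
public:
    DebugPinScope(volatile uint8_t &port, uint8_t bit)
        : port_(port), mask_(static_cast<uint8_t>(1u << bit)) {
        port_ = port_ | mask_;  // drive the debug pin high
    }
    ~DebugPinScope() {
        port_ = port_ & static_cast<uint8_t>(~mask_);  // drive it low again
    }
    DebugPinScope(const DebugPinScope &) = delete;
    DebugPinScope &operator=(const DebugPinScope &) = delete;

private:
    volatile uint8_t &port_;
    uint8_t mask_;
};
```

On the desktop the "port" is just a volatile uint8_t variable; on the AVR the same constructor should accept PORTB, since that macro expands to a volatile 8-bit lvalue.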

Here’s the diff in the GitHub repo.

Then, I hooked up these pins to lines 1-4 of my logic analyzer and, for good measure, also hooked the USB DATA- and DATA+ lines (pins 4 and 5 on the ATmega - PD2 and PD3) up to lines 15 and 16.

Here’s what that looks like on my desk. In the picture the breadboard is just connected to my Mac rather than the Switch - to check it’s working before I lug the Switch and its dock out to the shed.

My shed desk with oscilloscope, breadboard and Mac

Here’s a close-up of the breadboard - you can see the logic analyzer’s little grabbers connected to the pins.

What does the oscilloscope screen show? Thanks to the scope’s ability to save PNG files, here’s a screenshot:

Oscilloscope screenshot - description in main text

The breadboard is still attached to my Mac, not the Switch. When connected to my Mac, it still sends state reports to the IN (“in” to the host - the Switch) endpoint periodically - but it doesn’t receive anything on the OUT (“out” from the host) endpoint because it’s only the Switch’s custom protocol that uses it.

There are six tracks on screen. They show the state of the probes over time (time is the horizontal axis, state the vertical; each line is either high or low at any time).

There’s one track for D1 - connected to PB1 (pin 15), and so on up to D4 - connected to PB4 (pin 18). D14 and D15 show the state of the DATA- and DATA+ USB lines.

Looking at the top track, there are some solid looking bits, and some bits where the state stays high for a ‘long’ time. The solid looking bits are actually the state oscillating really fast between low and high. They look solid in the output because there’s not enough horizontal resolution to show the transitions. These rapid oscillations correspond to the main loop() running while usbInterruptIsReady() is returning false. It switches bit 1 high, runs its code - which does hardly anything because usbInterruptIsReady() returns false - then switches bit 1 low again, and loops.

The long highs visually correspond to a long high in the second track too. This is the sendReportBlocking(...) call inside the main loop. sendReportBlocking(...) sets its bit high when it’s called, both tracks stay high until an entire report is sent, at which point sendReportBlocking(...) sets its debug bit low again, and we return to the main loop - which starts looping really fast again because usbInterruptIsReady() is returning false.

So far, this makes a lot of sense!

The horizontal (time) resolution is set so that the width of each ‘box’ in the background represents 10 milliseconds. This is shown in the oscilloscope’s interface - note the ‘H 10.0ms’ in a box on the top-right. So each of those high periods looks like it’s about 15ms? Luckily, I don’t need to judge this for myself. The scope can measure statistics about what it’s displaying, and I’ve set it up to show some at the bottom of the screen. There are three values. ‘Period2’ is the period of track 2 - 16.00ms. ‘Freq2’ is the frequency - 62.5Hz. ‘+Width2’ is the width of a high (+) section - 14.00ms.

The ATmega is managing to send 62.5 updates per second. Most Switch games run at about 30 frames per second. 62 updates per second is probably fine for controlling a 30Hz game like Breath of the Wild. It’s arguably okay for controlling a 60Hz game like Metroid Dread. It would be better if it were higher though.
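As a quick worked check of those numbers, here is the arithmetic as a tiny sketch. The function names are made up; the 16ms period and the 30Hz and 60Hz frame rates are the figures quoted above.

```cpp
// Convert a measured period into a rate, and a rate into updates per frame.
constexpr double hertzFromPeriodMs(double periodMs) {
    return 1000.0 / periodMs;
}

constexpr double updatesPerFrame(double updateHz, double frameHz) {
    return updateHz / frameHz;
}
```

With a 16ms period this gives 62.5Hz, which works out to a little over two updates per frame at 30Hz but only a little over one per frame at 60Hz. That is why the margin feels thin for a 60Hz game.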

Let’s zoom in!

Oscilloscope screenshot - description in main text

Now, we have ‘H 2.0ms’ - 2ms per horizontal division.

Take a look at D14, the USB DATA- line. There are short downward spikes every 1ms. These are what V-USB refers to as ‘SOF’s - ‘Start-Of-Frame’[4], and what the USB spec calls ‘Low-Speed keep-alives’. They’re just short downward spikes, or pulses. There should be one every millisecond, and that’s what we see. This 1ms interval is being used to accurately set the ATmega’s internal clock, as discussed in part 2 of this series.

In addition to that, we can see data transfer happening every two milliseconds - those bursts on both the DATA- and DATA+ lines. That is the ‘interrupt’ to IN endpoint 1 - the one controller state reports are sent on. In the configuration descriptor I set up in the previous part of this series, we asked for a 2ms poll interval for this endpoint.

Perhaps we could ask for a 1ms period? Be able to send data after every SOF pulse instead of every second one? Get double the update rate? All I can say is that doesn’t seem to work. I think it’s because it doesn’t leave time for other traffic[5].
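The link between the poll interval and the update rate can be sketched as arithmetic. This assumes a 64-byte report, which is an assumption (the report size isn't stated in this post), and that exactly one 8-byte chunk goes out per poll interval. The function names are invented for illustration.

```cpp
#include <cstdint>

// Number of 8-byte low-speed interrupt packets needed for one report.
constexpr uint32_t chunksPerReport(uint32_t reportBytes, uint32_t chunkBytes = 8) {
    return (reportBytes + chunkBytes - 1) / chunkBytes;  // round up
}

// Time for one whole report if exactly one chunk goes out per poll interval.
constexpr uint32_t reportPeriodMs(uint32_t reportBytes, uint32_t pollIntervalMs) {
    return chunksPerReport(reportBytes) * pollIntervalMs;
}
```

Under those assumptions, reportPeriodMs(64, 2) is 16, which lines up with the 16.00ms period the scope measured. A 1ms interval would halve that to 8ms, if the host and device could actually sustain it.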

Let’s attach the breadboard to a Switch!

…And take a look at the scope again:

Oscilloscope screenshot - description in main text

This trace was taken with the ‘controller’ in full flow, alternating left and right presses every second.

It’s quite similar to the Mac trace. Which I suppose is unsurprising - I’m not sure what I was expecting really.

The only difference I can see is that actual data packet transfer starts closer to the SOF ‘spike’ on DATA- than it does when connected to the Mac. Here’s the Mac trace zoomed in further, to 20µs per division:

Oscilloscope screenshot - description in main text

There are over six 20µs boxes between the spike and the data packet transmission beginning.

But on the Switch…

Oscilloscope screenshot - description in main text

…they’re much closer together - in fact, you’d be forgiven for not noticing that the SOF spike was separate from the transmission, even when zoomed in this far. They look maybe 5-10µs apart.

This is interesting, but I don’t think terribly important.

Questions

Time to stop playing around and try to answer some questions. I’m not sure I’ll be able to get to the bottom of all the things I listed at the start of this post with the scope, but there are two things I’d like to investigate while I’ve got this set up:

  1. With all this processing and USB handling the ATmega is doing, will there really also be enough time to read the Dual Shock’s state and translate it to Pro Controller format?
  2. Can we see anything that might indicate why this fails so much more often when attached to the dock instead of directly to the Switch?

Processing time

There’s almost enough in the above traces to answer question 1.

Ideally, we’d read the Dual Shock state as soon as possible before transmitting it to the Switch. It looks like we have some time in the main loop, in those solid-looking blocks where we’re just looping waiting for usbInterruptIsReady() to return true. They look about 2ms long. Reverse-engineered Dual Shock protocol information suggests it usually communicates at a ‘bit rate’ of 250kHz - and can go up to at least 500kHz - which means that in 2ms we could transfer at least 500 bits. That’s about 62 bytes - and the same documents say that Dual Shock state is at most 21 bytes long - so it should fit fine in that 2ms.
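Here is that budget as a small sketch. It is optimistic by design: it assumes bytes are clocked back to back, with no gaps between bytes and no waiting for acknowledgements, which a real Dual Shock exchange may well need. The function names are made up.

```cpp
#include <cstdint>

// Bits that fit in a time window at a given bit rate.
constexpr uint32_t bitsInWindow(uint32_t bitRateHz, uint32_t windowUs) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(bitRateHz) * windowUs) / 1000000u);
}

// Microseconds needed to clock out a number of bytes, ignoring any
// inter-byte gaps or acknowledge waits (an optimistic assumption).
constexpr uint32_t transferTimeUs(uint32_t bytes, uint32_t bitRateHz) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(bytes) * 8u * 1000000u) / bitRateHz);
}
```

At 250kHz a 2000µs window holds 500 bits (62 whole bytes), and a 21-byte state read takes 672µs, so even a fairly generous allowance for per-byte overhead should still fit.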

In reality we have even more free time, if we need it. Things might be clearer if I change the debug output a bit.

I changed the sendReportBlocking(...) loop to look more like the main loop - and toggle its bit at the start and end of every loop. Here’s the diff - here’s what the scope output looks like now:

Oscilloscope screenshot - description in main text

The CPU is ‘available’ in all those solid looking blocks in track 2 too - that’s when the loop in sendReportBlocking(...) is spinning waiting for usbInterruptIsReady() to return true.

Zooming in again:

Oscilloscope screenshot - description in main text

There are also a few places where it’s not solid. During these times, if there’s also traffic on the USB lines (D14 and D15), sendReportBlocking(...) is stopped because V-USB is servicing its interrupt - and so monopolizing the processor - to deal with the USB transmission. It could be sending or receiving - there’s not enough here to know which unless we really zoomed in and decoded the USB traffic, and I’m not going to get into that now. If there’s no traffic on the USB lines, either usbInterruptIsReady() has returned true, and it’s calling usbSetInterrupt(...) to supply data to be sent, or it’s calling usbPoll(), and V-USB is doing something inside that call. If it’s recently received data, that might include calling our code in usbFunctionWriteOut() - but it doesn’t in this shot. If it did we’d see activity on lines D3 and D4, thanks to our debug bits.

Anyway - we’re getting pretty in depth here. I think question 1 is answered - there should still be plenty of time to talk to the Dual Shock.

Controller down!

Now, to Question 2: why does the system sometimes not work - especially when connected to the dock?

At first, I didn’t think my oscilloscope based debugging was providing much useful information about this - but there actually is something interesting when the system’s in this stalled state. Take a look at this trace:

Oscilloscope screenshot - description in main text

USB traffic looks okay - 1ms gaps, transmissions every 2ms - but look at D1, D3 and D4 - they’re all permanently high!

We haven’t looked much at D3 and D4 so far. They correspond to calls to usbFunctionWriteOut(...) - implying that we’ve stalled inside a call to usbFunctionWriteOut(...). That’s kind of weird - it implies that something has happened when processing data sent to us from the Switch.

It’s time for debugging with something more verbose than an oscilloscope.

Verbose Debugging

Take 1

My first attempt at more verbose debugging - the one pictured at the top of this post - was an ill-conceived attempt to use an 8x2 LCD display, with an adapter to fit it on a breadboard, to display debug text.

This worked - and could keep up surprisingly well with the stream of data coming in the USB connection. But soon I started abbreviating output to the point of meaninglessness, filming it in slow-mo to try to read the output, and it quickly became clear how bad an idea it actually was. 16 Characters is not enough for everyone.

So I guess the picture of the breadboard like this was kind of clickbait. Sorry. But isn’t the tiny breadboard display cute?! I wonder if I can find a use for it in the final product…

Take 2

The alternative (and what really should’ve been the obvious first thing to try) is serial output.

The ATmega has a hardware serial port, and it’s easy to use. Pin 2 is the receive pin, pin 3 is the transmit pin. Just connect it up and run a serial terminal on a connected computer to communicate with it. I attached a cheap USB<->TTL Serial adapter to the breadboard:

Breadboard with a plethora of things plugged into it

The serial adapter is to the top left of the ATmega, with jumper wires leading down to ground, and over to pins 2 and 3 (the other three pins on the adapter are unused and not connected to anything).

You might also notice a bunch more additions - three long orange jumpers connecting pins 17, 18, and 19 of the ATmega to the right of the board, where they’re then connected to some long wires that are in turn connected to a ribbon cable that is connected to my trusty ISP shield[6]. This allows me to reprogram the ATmega without removing it from the breadboard. Moving it to and from the breadboard with the logic analyzer clips attached to it was growing frustrating enough that taking the time to set this up seemed like a good idea - and I’ll be able to continue to use this setup throughout the project. Maybe I’ll rig up something better to connect the ribbon cable directly to the breadboard later…

The breadboard is looking really busy now - but remember that most of this isn’t actually part of the device I’m building - the serial port, the oscilloscope clips, the ISP connection are all ‘extra’. The circuit under this is still really simple[7].

Here is the code diff where I added serial debug output, and also fixed a few smaller bugs I found while using it.

This is a fairly large diff thanks partially to the ‘few other bugs’. Going into all of it would be a series of posts in itself, and most of it’s not very interesting, so here are notes on the changes, and serial debugging with V-USB in general:

Serial debugging with V-USB:

  • V-USB has some verbose debug output if you’re prepared to get down to USB-protocol level debugging - but it is overwhelming. It will log everything received and sent from and to the bus! Add a DEBUG_LEVEL=2 to your build flags to switch it on. I added a change to do this in platformio.ini - and tried to use it, debugging the packets by hand, for a bit. But then I commented it out - it is really too verbose for regular use.
  • 250000 bits per second is about the fastest you can reliably go with the 12.8MHz ATmega, and also works on a 16MHz one (foreshadowing!).
  • Don’t print too much to the serial port! It will eat up too much CPU time and disrupt USB communication, ironically causing failures when you’re trying to debug them. Trust me, this gets very frustrating before you figure out what’s going on.

Serial debugging in my code:

  • I added some verbose output in halt(...), and in a few other places where I thought it would always be useful. To do this logging, I just used Serial.print(...) and its kin directly.
  • halt(...)’s verbose output includes a log of the last few commands received from the Switch. I insert these into a buffer at the end of usbFunctionWriteOutInternal(...)
  • For other debug output, I added and used a debugPrint(...) macro that I can switch off easily. This at least enables me to switch off most serial output when I get suspicious I’m using too much CPU printing…
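A switchable debug macro of that kind might look something like the sketch below. It is a simplified, host-side stand-in: the DEBUG_PRINT_ENABLED flag and the fakeSerialPrint(...) function are assumptions made up for illustration, and this version takes a single string argument only (the real debugPrint(...) is also called with a value and a base). On the ATmega the enabled branch would forward to Serial.print(...) instead.

```cpp
#include <string>

// Set to 0 (e.g. via a build flag) to compile all debugPrint(...) calls away.
#ifndef DEBUG_PRINT_ENABLED
#define DEBUG_PRINT_ENABLED 1
#endif

// Host-side stand-in for the serial port (hypothetical; on the ATmega this
// would be Serial.print(...)).
inline std::string &fakeSerialLog() {
    static std::string log;
    return log;
}

inline void fakeSerialPrint(const char *text) {
    fakeSerialLog() += text;
}

#if DEBUG_PRINT_ENABLED
#define debugPrint(...) fakeSerialPrint(__VA_ARGS__)
#else
#define debugPrint(...) do { } while (0)
#endif
```

Because the disabled form expands to an empty statement, the arguments are never evaluated, so switching it off removes the CPU cost of building the output as well as of sending it.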

Other code changes and fixes:

  • I renamed the functions with names like report...(...) and ...report(...) to have consistent naming.
  • There are a few other changes in usbFunctionWriteOutInternal(...) to fix how some commands are decoded and/or responded to. If the Switch receives anything it’s not expecting, it just halts communication, so fixing things in here was just trial and error, and peering at logs. One thing of note, if you’re interested in Pro Controller protocol, is that I needed to change the way the acknowledgements were sent for ‘UART’ commands. Even though most commands are expected to send an ack with its third byte derived from the report ID ored with 0x80 on success, some seemingly use different numbers. I switched to hard-coding them all instead of trying to derive them.

Unfortunately, things were still unstable - especially with the dock in use! And I couldn’t see rhyme nor reason to it - no patterns in the logs of commands sent or replied to.

Carefully combing through debug output, what seemed to be happening was that the ATmega would receive what looked like the first packet of a report to usbFunctionWriteOut(...) (so, a command from the Switch) - but when the next packet was received it would look (to my human eyes) like the start of another, different command. This would confuse my code greatly. It would treat the new packet as part of the unfinished command - which of course caused all sorts of havoc.

Was it supposed to be possible, somehow, to get parallel input streams to one endpoint? Was that something I needed to account for in my code? Was V-USB discarding some incoming packets erroneously? Or maybe something - my code or V-USB - is supposed to be able to deduce when a packet is intended to be the first packet in a transfer (maybe there’s a flag somewhere?), and ignore partially received ones?

I spent a lot of time debugging this - getting into the USB spec, and changing bits of V-USB to try to figure things out or try out theories.

Finally, I tried implementing the USB_RX_USER_HOOK macro in usbconfig.h. It’s documented as:

…a hook if you want to do unconventional things. If it is defined, it’s inserted at the beginning of received message processing. If you eat the received message and don’t want default processing to proceed, do a return after doing your things. One possible application (besides debugging) is to flash a status LED on each packet.

I implemented a function (cleverly named usbFunctionRxHook(...)), and in it just logged what it received, and then manually decoded the logs according to the USB spec. And I found the problem, at last!

It turns out that there’s a ‘HALT’ setup packet that can be sent to an endpoint - and when it is sent, any in-flight ‘I/O Request Packets’ are supposed to be abandoned[8]. But V-USB doesn’t tell its client code (i.e. our code) when this happens! It just ignores these packets.

I guess that, usually, this isn’t a problem. Most low-speed USB reports are less than eight bytes long, so they don’t consist of more than one packet, and it’s therefore impossible that they would need to be abandoned.

But, in our case, most transfers our OUT endpoint receives (all Switch ‘UART commands’, for example) are at least two packets long. If the Switch sends a ‘HALT’, we need to clear out any partially-received reports - and not reply to them.
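The failure and the fix can be modelled on a desktop machine. This is a hypothetical sketch, not the code from the repo: the ReportAccumulator name and the 16-byte report size are invented to keep the example short.

```cpp
#include <cstdint>
#include <cstring>

// Collects 8-byte packets until a whole report has arrived.
// Hypothetical model: the names and the 16-byte report size are made up.
struct ReportAccumulator {
    static constexpr uint8_t kReportBytes = 16;
    uint8_t buffer[kReportBytes] = {};
    uint8_t accumulated = 0;

    // Returns true when this packet completes a report.
    bool addPacket(const uint8_t *data, uint8_t len) {
        if (accumulated + len > kReportBytes) {
            len = static_cast<uint8_t>(kReportBytes - accumulated);  // clamp
        }
        std::memcpy(buffer + accumulated, data, len);
        accumulated = static_cast<uint8_t>(accumulated + len);
        if (accumulated == kReportBytes) {
            accumulated = 0;  // ready for the next report
            return true;
        }
        return false;
    }

    // Called when the host signals a halt: drop any half-received report.
    void abandon() { accumulated = 0; }
};
```

Without the abandon() call, the first packet of a new command would be glued onto the first packet of the abandoned one and treated as a complete report. That is the havoc described above.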

So, I renamed usbFunctionWriteOutInternal(...), and added a new argument:

void usbFunctionWriteOutOrAbandon(uchar *data,
                                  uchar len,
                                  bool shouldAbandonAccumulatedReport)
{
    
    [...]
    
    if(shouldAbandonAccumulatedReport) {
        // The host has told us it's stalling the endpoint.
        // Abandon reception of any in-progress reports - we're not going to
        // get the rest of it :-(
        accumulatedReportBytes = 0;
        PORTB &= ~(1 << 4); // Debug signal that we've stopped processing a report.
        return;
    }
    
    [...]

I call this with shouldAbandonAccumulatedReport as true from usbFunctionRxHook(...), if I detect that a ‘HALT’ was sent to our endpoint:

void usbFunctionRxHook(const uchar *data, const uchar len)
{
    if(usbRxToken == USBPID_SETUP) {
        const usbRequest_t *request = (const usbRequest_t *)data;
        if((request->bmRequestType & USBRQ_RCPT_MASK) == USBRQ_RCPT_ENDPOINT &&
            request->bRequest == USBRQ_CLEAR_FEATURE &&
            request->wIndex.bytes[0] == 1) {
            // This is a clear of ENDPOINT_HALT for OUT endpoint 1
            // (i.e. the one to us from the host).
            // We need to abandon any old in-progress report reception - we
            // won't get the rest of the report from before the stall.
            //
            // We could also check the request->wValue here for the specific
            // feature that's being cleared - but HALT is the only feature that
            // _can_ be cleared on an interrupt endpoint, so it's not actually
            // necessary to check.
            debugPrint("\n!Clear HALT ");
            debugPrint(request->wIndex.bytes[0], 16);
            debugPrint("!\n");
            usbFunctionWriteOutOrAbandon(NULL, 0, true);
        }
    }

    if(usbCrc16(data, len + 2) != 0x4FFE) {
        halt(0, "CRC error!");
    }
}

I also added a CRC check in there while debugging. It was one of my earlier debugging attempts. I’ve never seen it fail, but we seem to have enough time for it and it would be nice to know if it ever fails during later development[9].
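The 0x4FFE in that check is not arbitrary. With the CRC-16 that USB uses for data packets, running the CRC over the payload plus its own appended CRC bytes always gives the same constant. The sketch below is a plain bit-by-bit reference version written for the desktop, not V-USB’s optimized routine, and the crc16Usb name is made up.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-16 as used for USB data packets: reflected polynomial 0xA001,
// initial value 0xFFFF, result inverted. A plain reference version, not
// V-USB's optimized assembly.
inline uint16_t crc16Usb(const uint8_t *data, size_t len) {
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1) ? static_cast<uint16_t>((crc >> 1) ^ 0xA001)
                            : static_cast<uint16_t>(crc >> 1);
        }
    }
    return static_cast<uint16_t>(~crc);
}
```

To try it, compute the CRC of some payload bytes, append it low byte first, and run the function again over the payload plus those two bytes: an intact packet yields 0x4FFE, and any other value means something was corrupted in transit.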

Here are the diffs for this change.

Time is relative. 1ms SOF timing doubly so.

Having fixed all this, things were still not stable when connected to the Switch dock. I noticed that, sometimes, very occasionally, I was getting garbage characters in the serial debug output. I thought my connection was just unreliable and maybe a bit noisy.

The other thing that can cause garbage in serial transmission, though, is timing errors.

Hmm. Timing errors. Is it possible that the syncing-the-8MHz-clock-to-the-USB-clock routines we added in part 2 are not working? Surely not! We did math and everything! There were charts and graphs!

I switched to using a 16MHz timing crystal[10] and, boom, the problems went away. You might notice that crystal and its associated capacitors in the breadboard above.

Ick.

Well, I guess I could just keep using the crystal. It just takes up two IO pins I probably don’t need - and crystals are hardly expensive.

But I’d really like to understand this! I thought that the USB spec mandated a 1ms keep-alive (SOF, for “Start of Frame” to V-USB, “low speed keep-alive” in USB speak), and it had to be precise. We’re synchronizing our clock to that. Surely it should work! Charts and graphs!

I tried logging the number of SOFs counted between every interrupt. There should always be two, since the interrupt is being serviced every two milliseconds. But sometimes there were three! Or four!

Printing the time elapsed, according to the ATmega’s internal clock, between each interrupt seemed to indicate that, yes, the interrupts are being serviced at the correct rate. So, yes, we are sometimes receiving more than two SOFs per 2ms.

Hmm.

I broke out and pored over the USB spec again. It’s surprisingly quiet on Low Speed USB’s keep-alive behavior[11].

The summary is that, although it does seem to say that a Low-speed keep-alive should happen at the start of every frame - in line with Full Speed SOF packets - there are a couple of ‘at least’s in there. Maybe there are extra ones? Maybe the dock is inserting them?

Hmmm. Syncing timing to these might not be as good an idea as the V-USB docs and online discussion (including mine, in part 2 of this series - ha!) suggest.

Investigating the extra Keep-Alive SOFs

Woo-hoo, this is an opportunity for the oscilloscope again! Let’s toggle a pin every time V-USB counts a SOF, and see what that looks like on a trace, and how it compares with the DATA- line, where we can see the keep-alive pulses with our own eyes.

The next logical bit to use for debug output is PORTB bit 6 - but I unfortunately can’t use it. PORTB bits 6 and 7 are only available if you’re not using a timing crystal (d’oh!) because they’re assigned to pins 9 and 10 - where the crystal is connected.

Instead, I used PORTC bit 0, which is at least physically near to the pins the oscilloscope is already connected to, on pin 23.

I set it up as output. V-USB already has a facility for running some assembly code on SOF reception, so I can hook in there.

In main.cpp’s setup():

    DDRC  |= 0b00000001;
    PORTC &= 0b11111110;

and in the appropriate place in usbconfig.h (search for USB_SOF_HOOK):

#ifdef __ASSEMBLER__
macro sofHookAssemblerMacro
    push YH         // The docs say we're only allowed to use YL, but we need 
                    // two registers for this so we save the current value of 
                    // YH.
    ldi YH, 1       // Load '1' into YH.
    in YL, PORTC    // Load PORTC into YL.
    eor YL, YH      // Exclusive-or them together to toggle bit 0.
    out PORTC, YL   // Write the result out to PORTC.
    pop YH          // Restore YH.
endm
#define USB_SOF_HOOK                    sofHookAssemblerMacro
#endif

My very first AVR assembly!
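For comparison, the same toggle expressed in C++ is a single exclusive-or. This is only a host-side model: the port parameter stands in for PORTC, the function name is made up, and the real hook has to stay in assembly because that is where V-USB runs it.

```cpp
#include <cstdint>

// Same effect as the assembly hook: flip bit 0, leave bits 1-7 untouched.
// 'port' stands in for PORTC (hypothetical host-side model).
inline void toggleSofDebugBit(volatile uint8_t &port) {
    port = port ^ 0x01;
}
```

Calling it twice returns the byte to its original value, which is why two back-to-back SOF detections would show up on the scope as a narrow blip rather than a level change.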

Here’s the full diff of this change.

I connected a new logic probe to pin 23. Let’s take a look at the scope!

Oscilloscope trace - description in main text

So far, so good - this looks correct. You can see that there’s a new solid (and precise looking!) 1ms wave in track D5. (The faint lines are the previous time period - they fade out over time when the oscilloscope is free-running as it was when I took this shot.)

But then I set it to trigger a snapshot only when it sees pulses on D5 with a width of less than 500µs - and, well, look at this!

Oscilloscope trace - description in main text

This shows a weird tiny downward spike in D5, shortly after USB traffic (the solid-looking parts of D14 and D15 - the USB DATA- and DATA+ lines). This spike implies that our USB_SOF_HOOK code is being run twice here - once to switch the output low, then again to switch it high again. This matches the occasional two extra SOFs in the logging.

This isn’t the only pattern, there are also sometimes blips like this:

Oscilloscope trace - description in main text

This looks like it might be one extra flip - again shortly after USB traffic. But the line looks thick - maybe there is actually an odd number of flips. Three? (Surely not five‽).

Let’s take these one at a time. Zooming in on one of the traces of the first variety:

Oscilloscope trace - description in main text

At 10µs resolution, all of the preceding USB traffic and both the flips fit nicely on the screen at once.

Ugh. I’m going to have to decode these USB packets by hand to actually see what’s going on, aren’t I?…

Oscilloscope trace - description in main text

I annotated the trace. I’ll explain it below, so don’t feel the need to read it from the screenshot. If you really want to see the annotations, you might need to zoom in. The key to understanding it is that USB uses an encoding scheme where a 0-1 or 1-0 transition counts as a 0, and no transition counts as a 1.
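That decoding rule (NRZI) is simple enough to sketch. The version below works on the USB spec’s ‘J’ and ‘K’ names for the two line states rather than on raw DATA-/DATA+ levels, and it deliberately ignores bit stuffing, so it’s an illustration of the rule rather than a full packet decoder. The nrziDecode name is made up.

```cpp
#include <string>

// Decode a sequence of line states ('J' or 'K') into bits using NRZI:
// a change of state is a 0, no change is a 1. 'idle' is the state the line
// was in before the first symbol. Bit stuffing is not handled here.
inline std::string nrziDecode(const std::string &states, char idle = 'J') {
    std::string bits;
    char previous = idle;
    for (char state : states) {
        bits += (state == previous) ? '1' : '0';
        previous = state;
    }
    return bits;
}
```

Feeding it the SYNC pattern that starts every packet, "KJKJKJKK", gives "00000001": seven transitions, then one bit time with no transition.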

From left to right: first, there’s an IN packet - the Switch telling us to transmit our data; next a DATA packet - the ATmega sending controller state to the host; finally an ACKnowledge packet - the Switch acknowledging reception.

But D5, our toggles-on-SOF debug line - is showing that V-USB thinks it received two SOF keep-alive pulses during the ACKnowledge packet.

That’s weird. V-USB counts any downward spike on the DATA- line as a SOF if it doesn’t detect a USB SYNC packet following it. There is a sync packet though. V-USB is somehow entirely missing it![12]

The only reason this might happen I can think of is that V-USB’s packet reception routine - the one that fires when an interrupt detects traffic on DATA- - is failing to fire at the start of the ACK packet. If it instead fired during the ACK packet, it would fail (‘correctly’) to detect the SYNC pattern it looks for at the start of every packet. Because it treats any downward pulse on the DATA- line that’s not followed by a SYNC as a SOF, it would treat every failure to find the SYNC as indicating a SOF.

Let’s test that. I think that there’s enough slop in the packet detection routines that we could afford to toggle another debug port bit at the start.

I added code to set PORTC bit 1 high while V-USB’s USB processing is running. This time, there were no appropriate hooks, so I needed to modify the V-USB code. Now:

  • Bit 0 is toggled when V-USB detects a SOF keep-alive pulse.
  • Bit 1 is set while V-USB is in its interrupt routine (i.e. processing and/or transmitting USB traffic). This processing may be called again by itself if more traffic is detected before it returns.
  • Bit 2 is toggled on and off at the end of every reception routine, so that I can see when control is really returning to other code.
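To make the three bit operations concrete, here is a sketch that models the 8-bit port as a plain Python integer. The real code does this on the ATmega's PORTC register in C; the helper names and constants below are made up for the illustration.

```python
SOF_BIT = 0     # toggled on each detected SOF keep-alive
ISR_BIT = 1     # high while V-USB's interrupt routine runs
RETURN_BIT = 2  # pulsed at the end of every reception routine


def set_bit(port, bit):
    """Set one bit without disturbing the others."""
    return (port | (1 << bit)) & 0xFF


def clear_bit(port, bit):
    """Clear one bit without disturbing the others."""
    return port & ~(1 << bit) & 0xFF


def toggle_bit(port, bit):
    """Flip one bit without disturbing the others."""
    return (port ^ (1 << bit)) & 0xFF


port = 0b00000000
port = set_bit(port, ISR_BIT)        # interrupt routine entered: 0b00000010
port = toggle_bit(port, SOF_BIT)     # SOF detected:              0b00000011
port = clear_bit(port, ISR_BIT)      # interrupt routine done:    0b00000001
port = toggle_bit(port, RETURN_BIT)  # pulse on...                0b00000101
port = toggle_bit(port, RETURN_BIT)  # ...and off again:          0b00000001
print(f"{port:08b}")  # 00000001
```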

Here’s what that looks like. I’ve cleaned up the display to show only bits 0, 1, 2 (defined above) as the tracks D5, D6 and D7, followed by DATA- and DATA+.

Oscilloscope trace - description in main text

The long high period on D6 represents the main reception and transmission of the IN packet. D6 goes low (and there’s a spike on D7) when that ends, and V-USB’s interrupt routine relinquishes control. Shortly after that, the SYNC pattern starts on D14 (and D15). D6 should go high, representing V-USB’s interrupt routine starting, shortly after that - but it doesn’t. It doesn’t actually go high until after the SYNC pattern is over! It makes sense that, after that, it would not recognize a SYNC - and so erroneously count a SOF as having happened.

Why is it taking so long for the interrupt routine to fire? I’m pretty sure there’s not a hardware bug in the ATmega’s interrupt system (these things have been around for over twenty years, after all!), so the only logical explanation must be that interrupts are disabled for a short period.

What might cause that?

There’s nowhere outside of setup() in my code that’s disabling interrupts. I couldn’t see anything in the V-USB source doing it at all.

That left the system frameworks. I know from previous experience that calls to the Arduino framework’s time-related routines - like micros() and millis() - disable interrupts for short periods, in order that they can read the entire 16-bit variables containing the current time without the chance of being interrupted13. I tried wrapping all my calls to millis() in debug bits and looking with the scope - but I didn’t see them toggling during the problem periods, and even when they were, they toggled so quickly that they wouldn’t explain the size of the delay.
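The hazard those routines guard against can be shown with a toy model. This sketch assumes a 16-bit value read as two separate bytes, low byte first, with an interrupt changing the value between the two reads. The function name is invented for the example, and the byte order is an assumption.

```python
def read_16bit_torn(value_before, value_after):
    """Model a two-step read of a 16-bit value.

    The low byte is read before an interrupt updates the value,
    the high byte is read after it.
    """
    low = value_before & 0xFF
    high = (value_after >> 8) & 0xFF
    return (high << 8) | low


before = 0x00FF     # counter value when the read starts
after = before + 1  # interrupt increments it to 0x0100 mid-read
torn = read_16bit_torn(before, after)
print(hex(torn))  # 0x1ff
```

The result, 0x01FF, is neither the old value nor the new one. Disabling interrupts around the read makes both bytes come from the same snapshot.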

So - time to dive into the Arduino framework itself! PlatformIO automatically uses the MiniCore implementation of it for the ATmega8a. PlatformIO installed it, on my Mac, in ~/.platformio/packages/framework-arduino-avr-minicore/cores/MiniCore

I looked through the MiniCore code for calls to sei() (which enables interrupts) and cli() (which disables them) - but didn’t see anything that could be being triggered by my code.

So, it looks like interrupts are not being disabled by either my code or the frameworks I’m using. What else could be causing them to be disabled?

Interrupts are automatically disabled while an interrupt handler is being run. The interrupt V-USB is using, INT0, or ‘External Interrupt Request 0’ is the highest priority interrupt in the ATmega8 though…

I scratched my head for a bit and got some coffee.

Of course! If another interrupt fired during V-USB’s interrupt code, or shortly after it returned, interrupts would be disabled until that other handler completed - and the firing of V-USB’s interrupt would be delayed until then, no matter its priority.

What other interrupt might be firing? Well, the only one MiniCore seems to use is the ‘Timer/Counter0 Overflow’ interrupt. ‘Timer/Counter 0’ is a hardware counter on the ATmega that counts up independent of the processor - and can fire an interrupt when it reaches 255 and overflows back to 0. MiniCore uses this to update the variables it uses to back its timing and delay functions.

The routine that updates these variables is in MiniCore’s wiring.c file. Here’s a link to it in the MiniCore project on GitHub. It’s in the ISR(TIMER0_OVF_vect) function.

I altered it - locally, inside ~/.platformio/packages/framework-arduino-avr-minicore/cores/MiniCore - to add yet another debug bit: I switched PORTC bit 3 on and off at the start and end of the routine, and hooked it up to line D8:

Oscilloscope trace - description in main text

Bingo!

There it is on line D8 on the right: MiniCore’s timer overflow interrupt handler is running while the SYNC is happening on the USB data lines, and it’s delaying V-USB’s interrupt long enough that V-USB’s code doesn’t get to see the SYNC properly.

This happens more frequently than I first thought it would: the long period that interrupts are disabled during V-USB’s processing of USB traffic means that any timing overflow that happens during that period will be queued by the ATmega to be processed immediately after V-USB’s routine returns. So any Timer 0 overflow during USB traffic will cause V-USB to miss any traffic that immediately follows.
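A very simplified timeline model of that situation is sketched below. It is a toy, not a cycle-accurate AVR simulation: it ignores the few cycles the hardware needs to enter a handler, and the arrival time of the D- edge is a made-up number. The only measured figure is the 3.4µs handler length (written as 3400ns so the arithmetic stays in integers). The `noblock` flag models a handler that re-enables interrupts as soon as it starts.

```python
def int0_start_ns(edge_ns, timer_start_ns, timer_len_ns, noblock):
    """Return the time at which the INT0 handler can begin.

    If the D- edge arrives while a blocking timer handler is running,
    INT0 has to wait until that handler finishes.
    """
    timer_end_ns = timer_start_ns + timer_len_ns
    edge_during_timer_isr = timer_start_ns <= edge_ns < timer_end_ns
    if edge_during_timer_isr and not noblock:
        return timer_end_ns
    return edge_ns


TIMER_ISR_NS = 3400  # measured width of the D8 pulse
EDGE_NS = 1000       # hypothetical: SYNC starts 1µs into the timer handler

# The queued timer handler starts at t=0, right after V-USB's routine returns.
blocked = int0_start_ns(EDGE_NS, 0, TIMER_ISR_NS, noblock=False)
preempting = int0_start_ns(EDGE_NS, 0, TIMER_ISR_NS, noblock=True)
print(blocked - EDGE_NS)     # 2400
print(preempting - EDGE_NS)  # 0
```

In this model the blocking handler adds 2.4µs of latency for this particular edge, while a handler that allows nested interrupts adds none.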

According to the scope’s measurements, the width of the D8 pulse corresponding to the timer interrupt handler is 3.4µs. The same code needs to be run regardless of processor speed, so presumably when running at 12.8MHz the delay would be even worse - a quarter as long again!

The V-USB docs say:

Interrupt latency: The application must ensure that the USB interrupt is not disabled for more than 25 cycles (this is for 12 MHz, faster clocks allow longer latency). This implies that all interrupt routines must either have the “ISR_NOBLOCK” attribute set (see “avr/interrupt.h”) or be written in assembler with “sei” as the first instruction.

At 12MHz, 25 cycles are 2.08µs. Obviously, this is shorter than 3.4µs!
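The arithmetic behind these figures is easy to check. The sketch below assumes the 3.4µs pulse was measured while running from the 16MHz crystal, and that the handler takes the same number of cycles at any clock speed. The helper names are invented for the example.

```python
def cycles_to_us(cycles, clock_hz):
    """Duration in microseconds of a number of clock cycles."""
    return cycles / clock_hz * 1e6


def us_to_cycles(us, clock_hz):
    """Number of clock cycles that fit in a duration in microseconds."""
    return us * 1e-6 * clock_hz


# V-USB's documented limit: 25 cycles at 12MHz.
budget_us = cycles_to_us(25, 12_000_000)

# The measured timer handler: 3.4µs, assumed to be at 16MHz.
handler_cycles = us_to_cycles(3.4, 16_000_000)

# The same number of cycles at the 12.8MHz internal oscillator setting.
handler_us_at_12_8 = cycles_to_us(handler_cycles, 12_800_000)

print(f"{budget_us:.2f}")           # 2.08
print(f"{handler_cycles:.1f}")      # 54.4
print(f"{handler_us_at_12_8:.2f}")  # 4.25
```

So the handler costs roughly 54 cycles, more than twice the 25-cycle figure, and stretches to about 4.25µs at 12.8MHz.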

Now, what to do?… One option is to ignore this and just keep using the 16MHz timing crystal. When using it, V-USB’s ‘quirk’ of not requiring ACK packets - just assuming successful data transmission - ironically means that, even though we’re sometimes not properly receiving ACKs, overall communication is pretty reliable. It’s only when we’re basing processor timing on SOF counting - like when tuning the internal oscillator to run at 12.8MHz - that this becomes problematic in practice.

I guess a timing overflow could happen just before any transmission occasionally. But in that case we won’t send an ACK, and the Switch will send again (unlike V-USB, it probably does do proper error handling). But it’s icky to be relying on that - and it would cause a small (probably not noticeable14) delay in controller state transmission.

I considered getting rid of the Arduino framework entirely, but I am using delay() during setup(), and all the serial debugging is using the Arduino Serial class.

Can I, as the V-USB docs hint, just add ISR_NOBLOCK to MiniCore’s Timer0 interrupt handler? That would mean that other interrupts would be allowed to interrupt the Timer0 handler. So it would allow V-USB’s interrupt handler to execute during the Timer0 interrupt handler. At first this seems a little perilous, but I’ve convinced myself that it’s alright.

The potential problem is that, if MiniCore’s interrupt handler is interrupted while it’s in the middle of updating its timing variables, Arduino-defined timing routines like millis() would use these partially-updated variables. That’s okay, though, because V-USB doesn’t use any of the Arduino timing routines. I don’t need to worry about my main program code, because main program code still can’t run at the same time as the interrupt routine; only other interrupts can.

So, I decided to do it.

I didn’t want to alter the MiniCore code directly - it lives outside of my PlatformIO project and is globally used for all PlatformIO projects. So, instead, I used PlatformIO’s ‘middleware’ feature15 to create a new file derived from wiring.c, wiring-ISR_NOBLOCK.c, altered to add the ISR_NOBLOCK to its interrupt handler, and compile that instead of wiring.c.

I made a new Python file named timer0_interrupt_allower.py:

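Import("env")  # PlatformIO/SCons idiom: makes the build environment available as `env`
import os

def preprocess_wiring_c_to_add_ISR_NOBLOCK(node):
    # The original wiring.c in the framework package.
    fileFrom = node.srcnode().get_abspath()

    # The patched copy is written alongside the build node for wiring.c.
    dirTo = node.dir.get_abspath()
    fileTo = os.path.splitext(node.get_abspath())[0] + "-ISR_NOBLOCK.c"

    os.makedirs(dirTo, exist_ok=True)

    # Copy the file line by line, adding ISR_NOBLOCK to the Timer0 overflow handler.
    with open(fileFrom, 'r') as infile, open(fileTo, 'w') as outfile:
        for line in infile:
            outfile.write(line.replace("TIMER0_OVF_vect", "TIMER0_OVF_vect, ISR_NOBLOCK"))

    # Returning a different file makes PlatformIO compile it in place of the original.
    return env.File(fileTo)

# Run the function above for any source file matching */wiring.c.
env.AddBuildMiddleware(preprocess_wiring_c_to_add_ISR_NOBLOCK, "*/wiring.c")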

and added it to the existing ‘extra_scripts’ setting in platformio.ini:

extra_scripts =
    pre:v-usb_platformio_helper.py
    pre:timer0_interrupt_allower.py

Now, when compiling, I see that PlatformIO says it’s compiling .pio/build/ATmega8/FrameworkArduino/wiring-ISR_NOBLOCK.c.o instead of wiring.c.o.
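For anyone wondering what the string replacement in the script actually does to the source, here is a tiny standalone demonstration. The sample lines are illustrative stand-ins, not copied from MiniCore's wiring.c, and `add_noblock` is a name invented for this sketch.

```python
def add_noblock(line):
    """Apply the same textual substitution the middleware script uses."""
    return line.replace("TIMER0_OVF_vect", "TIMER0_OVF_vect, ISR_NOBLOCK")


# Illustrative input lines (hypothetical, for demonstration only).
handler_line = "ISR(TIMER0_OVF_vect)\n"
other_line = "unsigned long m = timer0_millis;\n"

print(add_noblock(handler_line), end="")  # ISR(TIMER0_OVF_vect, ISR_NOBLOCK)
print(add_noblock(other_line), end="")    # unsigned long m = timer0_millis;
```

Because it is a plain text substitution, it would also alter any other line that happens to mention TIMER0_OVF_vect, so it is worth checking the generated file once by eye.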

Does it work? Yes! Take a look at this trace:

Oscilloscope trace - description in main text

Now, the V-USB interrupt routine (lines D6 and D7) is firing while the Timer0 interrupt routine (line D8) is live. And the toggle-on-SOF-detection line, line D5, is correctly not toggling!

And if I switch the oscilloscope back to triggering on pulses of less than 500µs on the D5 line (signifying too-fast SOF detection) it now never fires!
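The same check the scope trigger performs can be written down in a few lines. This sketch takes a list of timestamps at which the SOF debug line toggled and reports any gaps shorter than the threshold. The timestamps are invented for the example; real keep-alives arrive about 1000µs apart.

```python
def short_intervals(toggle_times_us, threshold_us=500):
    """Return (start, end) pairs of consecutive toggles closer than the threshold."""
    return [
        (earlier, later)
        for earlier, later in zip(toggle_times_us, toggle_times_us[1:])
        if later - earlier < threshold_us
    ]


healthy = [0, 1000, 2000, 3000]        # one toggle per 1ms frame
faulty = [0, 1000, 1040, 1060, 2000]   # two spurious 'SOFs' inside a packet

print(short_intervals(healthy))  # []
print(short_intervals(faulty))   # [(1000, 1040), (1040, 1060)]
```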

The ultimate proof of whether all this works is whether I can remove the 16MHz timing crystal, and switch back to the internal oscillator calibrated to 12.8MHz using the 1ms SOF rate.

And trying that, it seems like I can - even when connected to the Switch dock, the system connects and seems stable for long periods.

Hurray!

I planned to end here - but before I finished writing this, I just had to try out one more thing….

Epilogue: 120Hz

I’m not entirely happy with the 62Hz update rate. I said above that it was ‘fine’, and it probably is - but it was still nagging at me. I’d really like to get at least 120 updates per second - twice the 60FPS frame rate of the fastest-updating Switch games.16

While I was reading up on USB concepts and the USB spec, especially while debugging the ‘lost packets’ above, I noticed that there are two ways in USB to signify the end of a multi-packet transmission: either the entire expected size has been transmitted, or a packet shorter than the maximum packet size is transmitted.

The Switch certainly isn’t transmitting a full 64 bytes to us when it sends us commands, despite what its HID descriptor says - most are one or two packets long. But when my code sends data to the Switch, it always sends 64 bytes - 8 full 8-byte packets.

But there’s actually not that much useful data in them. The actual amount of useful data changes based on what command we’re replying to. Regular ‘HID-like’ updates are just 13 bytes long (less than two full packets). Some of the more complex replies (like SPI reads) get up into the forty-something bytes - but they’re basically only seen just after the controller is plugged in.

Could I just stop report transmission with a short packet after the ‘useful’ part was sent, rather than transmitting the entire 64 bytes every time?

I altered the code to keep track of how ‘full’ each report was (in a new sReportLengths[2] static to go along with actual report in sReports[2]), and only transmit the size that had been filled in, signifying the end of transmission with a short or 0-sized packet. Here’s the satisfyingly short diff.
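The idea can be sketched in Python, independent of the real C code. The function below splits only the filled-in part of a report into 8-byte packets, and appends a zero-length packet when the data happens to end exactly on a packet boundary (otherwise the host could not tell the transfer had finished early). The function name and parameters are made up for this illustration; it is not the project's implementation.

```python
def split_into_packets(report, used_length, max_packet=8, report_size=64):
    """Split the used part of a report into packets of at most max_packet bytes.

    A transfer ends when the full report_size has been sent, or when a
    packet shorter than max_packet (possibly empty) is sent.
    """
    payload = bytes(report[:used_length])
    packets = [payload[i:i + max_packet] for i in range(0, len(payload), max_packet)]
    ended_on_full_packet = used_length % max_packet == 0
    if used_length < report_size and ended_on_full_packet:
        packets.append(b"")  # zero-length packet marks the early end
    return packets


report = bytes(range(64))

print([len(p) for p in split_into_packets(report, 13)])  # [8, 5]
print([len(p) for p in split_into_packets(report, 16)])  # [8, 8, 0]
print(len(split_into_packets(report, 64)))               # 8
```

A 13-byte update goes out as two packets instead of eight, which is where the time saving comes from.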

I’m not sure this is actually in compliance with the HID spec17 - but it works! Now we’re getting updates in excess of 120Hz! Here’s the oscilloscope proof.

Oscilloscope trace - description in main text

Note D1’s 120Hz rate. Plenty good enough for Metroid.

The End

Well, this got a bit long. But the analysis and debugging have been very successful! More reliable, faster communication - and I’m pretty sure there’s more than enough time left for communication with the Dual Shock.

And while I do think it would have been possible to work out everything I worked out above with timing code and serial logging, the oscilloscope was actually very useful despite what I thought at the start.

Next time: less oscilloscope, less USB, and more Dual Shock. And hopefully, by the end, I’ll be playing Zelda!

You can follow the code for this series on GitHub. The main branch there will grow as this series does. The blog_post_5 tag corresponds to the code as it is (was) at the end of this post.

All the posts in this series will be listed here as they appear.


  1. A logic analyzer is like an oscilloscope, in that it shows the voltages of things connected to its probes in an x-is-time, y-is-voltage way, but it’s tailored to digital signals only. It can only show square waves - every signal is treated as ‘high’ or ‘low’. This lets us see nice clean, square, waveforms, and also typically allows for way more probes to be connected at once. My Rigol MSO1074Z can show up to four channels in regular analog oscilloscope mode (which is amazing for a supposedly ‘budget’ oscilloscope) - but can handle up to sixteen in logic analyzer mode! ↩︎

  2. Bit numbering in binary literals starts at 0 on the right - the least significant bit - and counts up leftwards. So for 8-bit binary literals like the ones I use in this post, the rightmost bit is bit 0 and the leftmost one is bit 7. ↩︎

  3. If you’re confused about what these are doing, they’re setting the appropriate bits to 1 or 0 without disturbing the other bits. Check out the Wikipedia article on ‘bitwise’ operators for more information. ↩︎

  4. These DATA- downward strobes are referred to as ‘Start-Of-Frame’ or ‘SOF’ packets in V-USB parlance - both in the documentation and the source - for example the usbSofCount variable. They’re therefore also referred to as ‘SOFs’ in much of V-USB related online discourse. But SOF packets are actually a Full-Speed USB concept - and they are data packets with content (like frame serial numbers), not just downward strobes. In Low-Speed USB (which is what V-USB implements), the short D- strobes are actually referred to, in the USB spec, as ‘keep-alives’. ↩︎

  5. The USB spec seems ambiguous about whether values of less than 10 are allowed for interrupts when operating at Low-Speed:

    5.7.4 Interrupt Transfer Bus Access Constraints

    …An endpoint for an interrupt pipe specifies its desired bus access period. A full-speed endpoint can specify a desired period from 1 ms to 255 ms. Low-speed endpoints are limited to specifying only 10 ms to 255 ms…

    …but:

    9.6.6 Endpoint

    bInterval: …For full-/low-speed interrupt endpoints, the value of this field may be from 1 to 255….

    ↩︎
  6. More on ISP programming in the first post in this series. ↩︎

  7. The very observant might notice another addition. More on that silver capsule later. ↩︎

  8. This is the relevant part of the USB spec:

    5.7.5 Interrupt Transfer Data Sequences

    […] If a halt condition is detected on an interrupt pipe due to transmission errors or a STALL handshake being returned from the endpoint, all pending IRPs are retired. Removal of the halt condition is achieved via software intervention through a separate control pipe. This recovery will reset the data toggle bit to DATA0 for the endpoint on both the host and the device. Interrupt transactions are retried due to errors detected on the bus that affect a given transfer.

    ↩︎
  9. A USB client is meant to verify the CRC of received packets before replying with an ACK packet - but V-USB unfortunately can’t because there’s not enough processing time to do it and send the ACK fast enough to meet USB’s timing spec. It just acknowledges every packet received. If you run your chip at 18MHz, though, there is just enough time and you can set USB_CFG_CHECK_CRC to 1 in usbconfig.h to enable it. ↩︎

  10. …by altering the fuse settings in platformio.ini and flashing the fuses again - and adding the crystal and associated capacitors to the breadboard. ↩︎

  11. How Low-Speed keep-alive works is scattered throughout the document. Here’s all I could find in the spec regarding Low-Speed keep-alives:

    5.3.3 Frames and Microframes

    USB establishes a 1 millisecond time base called a frame on a full-/low-speed bus and a 125 μs time base called a microframe on a high-speed bus.

    […]

    7.1.7.1 Low-/Full-speed Signaling Levels

    […Big table that has a ‘4’ footnote on a mention of keep-alives…]

    Note 4: The keep-alive is a low-speed EOP

    7.1.7.6 Suspending

    […]

    In the absence of any other bus traffic, the SOF token (refer to Section 8.4.3) will occur once per (micro)frame to keep full-/high-speed devices from suspending. In the absence of any low- speed traffic, low-speed devices will see at least one keep-alive (defined in Table 7-2) in every frame in which an SOF occurs, which keeps them from suspending.

    8.4.3.1 USB Frames and Microframes

    USB defines a full-speed 1 ms frame time indicated by a Start Of Frame (SOF) packet each and every 1ms period with defined jitter tolerances. USB also defines a high-speed microframe with a 125 μs frame time with related jitter tolerances (See Chapter 7). SOF packets are generated (by the host controller or hub transaction translator) every 1ms for full-speed links. […]

    11.8.4.1 Low-speed Keep-alive

    All hub ports to which low-speed devices are connected must generate a low-speed keep-alive strobe, generated at the beginning of the frame, which consists of a valid low-speed EOP (described in Section 7.1.13.2). The strobe must be generated at least once in each frame in which an SOF is received. This strobe is used to prevent low-speed devices from suspending if there is no other low-speed traffic on the bus. The hub can generate the keep-alive on any valid full-speed token packet. The following rules for generation of a low-speed keep-alive must be adhered to:

    • A keep-alive must minimally be derived from each SOF. It is recommended that a keep-alive be generated on any valid full-speed token.
    • The keep-alive must start by the eighth bit after the PID of the full-speed token.
    ↩︎
  12. The reason this isn’t catastrophic when using the 16MHz crystal is that V-USB doesn’t actually track ACK packets at all. It just assumes successful transfer of everything it sends. So the fact that it doesn’t think it receives an ACK after this transfer is moot. ↩︎

  13. On ATmega (I think on all AVR systems?), a 16 bit value has to be read from memory by the processor in two 8 bit chunks, so there’s a chance that, if you don’t disable interrupts, it might read one half before an interrupt fires, and the other half from after it returns. ↩︎

  14. Although maybe an intermittent 2ms delay would be noticeable when playing Metroid?… ↩︎

  15. Previously used in part 1 to skip some files and add some compiler flags. ↩︎

  16. I know enough to say “Nyquist’s theory” here, but not enough to be sure it applies… ↩︎

  17. https://stackoverflow.com/questions/76075655/is-there-defined-behaviour-if-usb-hid-report-transmission-cleanly-concludes-early ↩︎