Well, yes, that's what I was trying to call out. The question is, should this functionally make a difference? A full-blown NPU and your basic M0 will still add 2 and 2 to get 4, in 32 bits.

First of all, there are big differences:

Both chips are Cortex-M based, although the Arduino Nano uses an nRF, which is a Cortex-M4. This has CMSIS, and that is the only meaningful difference I can think of within TFLite Micro.
RP2040 = Cortex-M0+
nRF52840 = Cortex-M4 with a floating-point unit, DSP extensions, and the full Thumb-2 instruction set
I dug into TFLite Micro and found some relevant ops, like Depthwise Conv, that are implemented in CMSIS-NN, but these should still have numerical parity with the reference ops. However, I say "should", and there are assumptions there. Perhaps TFLM was designed only with the M4 in mind. There could also be memory mishaps somewhere? (I'm more skeptical of this one.)
Any other thoughts here? I'm trying to isolate the differences from printouts, but it's still not clear.
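For comparing the printouts, something like the host-side script below might help. It is only a minimal sketch, and it assumes each board prints the same tensor as plain integers separated by commas or whitespace, with no labels or indices mixed in. The function names (`parse_dump`, `compare_dumps`) and the sample values are made up for illustration; they are not real output from either board.

```python
import re


def parse_dump(text):
    """Pull every integer out of a serial printout, in order.

    Assumes the dump contains only tensor values (no labels or indices),
    since any other digits in the text would be picked up as values too.
    """
    return [int(tok) for tok in re.findall(r"-?\d+", text)]


def compare_dumps(a_text, b_text, tolerance=0):
    """Compare two dumps element by element and summarize the differences."""
    a = parse_dump(a_text)
    b = parse_dump(b_text)
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")

    diffs = [abs(x - y) for x, y in zip(a, b)]
    mismatches = [i for i, d in enumerate(diffs) if d > tolerance]
    return {
        "count": len(a),
        "max_abs_diff": max(diffs, default=0),
        "first_mismatch": mismatches[0] if mismatches else None,
        "num_mismatches": len(mismatches),
    }


if __name__ == "__main__":
    # Hypothetical placeholder values, not measurements from real hardware.
    rp2040_dump = "12, -3, 127, 0, -128"
    nrf52840_dump = "12, -3, 126, 0, -128"
    print(compare_dumps(rp2040_dump, nrf52840_dump))
```

Running this on the output of each layer in turn, rather than only on the final output, should show the first op where the two boards stop agreeing. A small `tolerance` (for example 1) separates off-by-one rounding differences from outright wrong values.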
You mentioned the RP2350, and one of my next ideas is to straight up try that, as the M33 shares a lot more with the M4 than with the M0+ from what I've read.
Statistics: Posted by mckinnonbuilding — Fri May 16, 2025 2:40 am