Blinky for the CH32V002 in 32 bytes
The CH32V002 is a low-end 32-bit RISC-V microcontroller by WCH. Its core supports the RV32EC instruction set, meaning it has 16 registers and supports a set of 16-bit compressed instructions (in addition to the base ISA’s 32-bit instructions). Each compressed instruction is a shorter encoding of a common 32-bit instruction, so it can be substituted in to reduce code size. With fewer bits to work with, many of them only accept a limited range of registers as sources or destinations (s0-s1, a0-a5), or a more limited range of immediates, or both. They are an interesting set of instructions to try to fit code into!
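To make the register restriction concrete: many compressed encodings carry only a 3-bit register field, which the RISC-V C extension maps onto x8 to x15. The Python sketch below just spells out that mapping; the helper name `compressed_reg` is made up for illustration and is not part of Bronzebeard or any toolchain.

```python
# ABI names of the 16 integer registers available on RV32E (x0..x15).
RV32E_ABI_NAMES = [
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
]


def compressed_reg(field: int) -> str:
    """Map a 3-bit compressed register field to an ABI name (hypothetical helper)."""
    if not 0 <= field <= 7:
        raise ValueError("compressed register fields are 3 bits wide")
    # The 3-bit field selects one of x8..x15.
    return RV32E_ABI_NAMES[8 + field]


reachable = [compressed_reg(f) for f in range(8)]
print(reachable)
```

This is why the listings in this post keep their working values in a0, a1, a3 and a5: those registers are reachable from the short encodings.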
I set myself the goal of toggling PA1 at approximately 1 Hz by any means necessary, with any other side effects allowed. I found that introducing side effects only became necessary below the 50 to 60 byte mark.
This program can be assembled using Bronzebeard.
Initial attempt, 36 bytes:
RCC_BASE = 0x40021000
RCC_PB2PCENR_OFFSET = 24
GPIOA_BASE = 0x40010800
GPIOA_BASE_SHIFTED = (0x40010800 << 1)
GPIOA_CFGLR_OFFSET = 0
GPIOA_BSHR_OFFSET = 16
GPIOA_BCR_OFFSET = 20
init:
# By default SYSCLK is 24 MHz
# Enable GPIOA clock (and GPIOC, and write to RO reserved bit as side effect)
# Side effects are due to abusing the constant in a0 and using it for 3 different things
lui a5, %hi(RCC_BASE) # equivalent to li a5, RCC_BASE
c.li a0, 0x16
c.sw a0, RCC_PB2PCENR_OFFSET(a5)
# Put PA1 into push-pull output (and modify PA0 mode as side effect)
# these two instructions (6 bytes) save 2 bytes compared to naive "li a5, GPIOA_BASE",
# which needs lui + addi (8 bytes) because the low 12 bits of GPIOA_BASE are non-zero
lui a5, %hi(GPIOA_BASE_SHIFTED)
c.srli a5, 1
c.sw a0, GPIOA_CFGLR_OFFSET(a5)
# Toggle PA1 (and two other pins as side effect) at roughly 1 Hz
main_loop:
c.sw a0, GPIOA_BSHR_OFFSET(a5)
c.jal busy_wait
# it is odd to me that the chip has both set/reset and reset-only control registers,
# but it does allow us to do this
c.sw a0, GPIOA_BCR_OFFSET(a5)
c.jal busy_wait
c.j main_loop
busy_wait:
# use the address 0x40010800 as the cycle count
# 500 ms @ 24 MHz = 12 million cycles
# I would have guessed 2 cycles per loop iteration, so the loop needs about 6 million iterations
# 0x40010800 >> 7 gives us about 8.4 million
# but in practice this takes roughly 8 times longer than expected, so bump to >> 10
# (flash fetch and branching being slow?)
c.mv a3, a5
c.srli a3, 10
busy_wait_1:
c.addi a3, -1
c.bnez a3, busy_wait_1
c.jr ra
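The listing above reuses the constant 0x40010800 twice: once pre-shifted so that a single lui can load it, and once as the seed for the delay counter. The arithmetic can be checked with a short Python sketch. It is not part of the program, and the cycles-per-iteration figure it prints is only what the comments imply if each half period really is close to 500 ms; it is not a datasheet value.

```python
GPIOA_BASE = 0x40010800
SYSCLK_HZ = 24_000_000                 # default clock, per the listing's comment
HALF_PERIOD_CYCLES = SYSCLK_HZ // 2    # 500 ms worth of cycles

# lui only sets the upper 20 bits, so the pre-shifted constant
# must have its low 12 bits clear for the trick to work.
shifted = (GPIOA_BASE << 1) & 0xFFFFFFFF
assert shifted & 0xFFF == 0
lui_immediate = shifted >> 12
recovered = (lui_immediate << 12) >> 1   # lui, then c.srli by 1

# Loop counts produced by the two shift amounts discussed in the comments.
iterations_shift7 = GPIOA_BASE >> 7
iterations_shift10 = GPIOA_BASE >> 10

# Assumption: the >> 10 delay lasts about 500 ms, as the post reports.
implied_cycles_per_iteration = HALF_PERIOD_CYCLES / iterations_shift10

print(hex(shifted), hex(lui_immediate), hex(recovered))
print(iterations_shift7, iterations_shift10)
print(round(implied_cycles_per_iteration, 1))
```

The shift by one bit works here because GPIOA_BASE has bit 31 clear and, once doubled, has nothing left in its low 12 bits.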
I realised it’s possible to get a nice round 32 bytes if the loop toggles instead of doing a separate set and reset, since then you can inline the busy wait:
# Blinky for the CH32V002 (PA1, approx. 1 Hz) in 32 bytes
RCC_BASE = 0x40021000
RCC_PB2PCENR_OFFSET = 24
GPIOA_BASE = 0x40010800
GPIOA_BASE_SHIFTED = (0x40010800 << 1)
GPIOA_CFGLR_OFFSET = 0
GPIOA_OUTDR_OFFSET = 12
init:
lui a5, %hi(RCC_BASE)
c.li a0, 0x16
c.sw a0, RCC_PB2PCENR_OFFSET(a5)
lui a5, %hi(GPIOA_BASE_SHIFTED)
c.srli a5, 1
c.sw a0, GPIOA_CFGLR_OFFSET(a5)
c.li a1, 0x2
main_loop:
c.xor a0, a1
c.sw a0, GPIOA_OUTDR_OFFSET(a5)
c.mv a3, a5
c.srli a3, 10
busy_wait:
c.addi a3, -1
c.bnez a3, busy_wait
c.j main_loop
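The toggle in main_loop can be traced by hand. The Python sketch below replays the first few passes through the loop, assuming that bit n of OUTDR drives pin n (the usual layout for this style of GPIO peripheral, but worth checking against the reference manual). The variable names are illustrative only.

```python
PA1_MASK = 0x2   # value loaded by "c.li a1, 0x2"
a0 = 0x16        # value left in a0 by the init code

written = []
for _ in range(4):
    a0 ^= PA1_MASK       # c.xor a0, a1
    written.append(a0)   # c.sw a0, GPIOA_OUTDR_OFFSET(a5)

# Bit 1 is PA1; bits 2 and 4 are the side-effect bits that stay set.
pa1_levels = [(value >> 1) & 1 for value in written]
side_effect_bits = [value & 0x14 for value in written]

print([hex(value) for value in written])
print(pa1_levels)
```

Only bit 1 changes between writes, so PA1 alternates while the other bits carried in a0 are rewritten with the same value each time.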
Artful Bytes gets their blinky down to 48 bytes, but they’re targeting a different chip and ISA, so it’s not quite a fair comparison.
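For reference, the 36 and 32 byte figures for the two listings in this post can be tallied from the mnemonics alone, since every "c." instruction is 2 bytes and the only other instruction used, lui, is 4. A small Python sketch of that count (the list and function names are made up for illustration):

```python
def size_of(mnemonic: str) -> int:
    """Bytes taken by one instruction: 2 if compressed, 4 otherwise."""
    return 2 if mnemonic.startswith("c.") else 4


# Mnemonics in program order, copied from the two listings.
FIRST_ATTEMPT = [
    "lui", "c.li", "c.sw", "lui", "c.srli", "c.sw",
    "c.sw", "c.jal", "c.sw", "c.jal", "c.j",
    "c.mv", "c.srli", "c.addi", "c.bnez", "c.jr",
]
TOGGLE_VERSION = [
    "lui", "c.li", "c.sw", "lui", "c.srli", "c.sw", "c.li",
    "c.xor", "c.sw", "c.mv", "c.srli",
    "c.addi", "c.bnez", "c.j",
]

first_size = sum(size_of(m) for m in FIRST_ATTEMPT)
toggle_size = sum(size_of(m) for m in TOGGLE_VERSION)
print(first_size, toggle_size)
```

The toggle version adds one c.li and one c.xor but drops a c.sw, both c.jal calls and the c.jr, which is where the 4 bytes go.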