Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

piolib: support transfer lengths that don't fit in 16 bits #107

Closed
jepler opened this issue Dec 16, 2024 · 14 comments
Closed

piolib: support transfer lengths that don't fit in 16 bits #107

jepler opened this issue Dec 16, 2024 · 14 comments

Comments

@jepler
Copy link

jepler commented Dec 16, 2024

struct rp1_pio_sm_xfer_data_args {
    uint16_t sm;
    uint16_t dir;
    uint16_t data_bytes;
    void *data;
};

The structure used for data transfers is limited to a 16-bit count of data_bytes.

Our application would like to do larger transfers. Right now we are working around this limitation by submitting several buffers of data sequentially, but it would be preferable if we could submit all the data in one go:

constexpr size_t MAX_XFER = 32768;

void pio_sm_xfer_data_large(PIO pio, int sm, int direction, size_t size, uint32_t *databuf) {
    while(size) {
        size_t xfersize = std::min(size_t{MAX_XFER}, size);
        int r = pio_sm_xfer_data(pio, sm, direction, xfersize, databuf);
        if (r) {
            perror("pio_sm_xfer_data (reboot may be required)");
            abort();
        }
        size -= xfersize;
        databuf += xfersize / sizeof(*databuf);
    }
}```
@pelwell
Copy link
Collaborator

pelwell commented Dec 16, 2024

That's a lot of data - what's your use case?

I'm not against in principle, it's just a matter of not breaking anything in the process. It might be that, due to the alignment requirements, switching to uint32_t would just work.

@jepler
Copy link
Author

jepler commented Dec 16, 2024

I'm driving HUB75 matrices with PIO. Ideally I'd submit a whole frame with one ioctl, but even a 64x32 matrix with 10 bitplanes exceeds 64kB in the encoding I'm using. And folks like to use several of these matrices in daisy chain mode.

Right now I'm just submitting 32kB at a time and trying to determine whether some artifacts I'm seeing are due to a flaw in my code or are due to a pause in the bitstream between ioctls. (a smaller 32x16 display that just needs one data submission per frame doesn't have these artifacts)

@jepler
Copy link
Author

jepler commented Dec 16, 2024

(@ladyada ping for interest)

pelwell added a commit to pelwell/utils that referenced this issue Dec 16, 2024
In many places, the number of bytes in a data transfer is stored as a
32-bit (or greater) value, but the ioctl API to the kernel stores it in
a 16-bit field, alongside which is a 16-bit gap due to the alignment
requirements of the pointer that follows.

Increase the size of data_bytes to a uint32_t, so that it can handle
larger transfers. Since using different sizes as either end can lead
to unpredictable results, a separate IOCTL is used for larger
transfers, allowing the kernel support to be determined and
backwards compatibility to be maintained.

See: raspberrypi#107

Signed-off-by: Phil Elwell <[email protected]>
pelwell added a commit to pelwell/utils that referenced this issue Dec 16, 2024
In many places, the number of bytes in a data transfer is stored as a
32-bit (or greater) value, but the ioctl API to the kernel stores it in
a 16-bit field, alongside which is a 16-bit gap due to the alignment
requirements of the pointer that follows.

Increase the size of data_bytes to a uint32_t, so that it can handle
larger transfers. Since using different sizes as either end can lead
to unpredictable results, a separate IOCTL is used for larger
transfers, allowing the kernel support to be determined and
backwards compatibility to be maintained.

See: raspberrypi#107

Signed-off-by: Phil Elwell <[email protected]>
pelwell added a commit to pelwell/utils that referenced this issue Dec 16, 2024
In many places, the number of bytes in a data transfer is stored as a
32-bit (or greater) value, but the ioctl API to the kernel stores it in
a 16-bit field, alongside which is a 16-bit gap due to the alignment
requirements of the pointer that follows.

Increase the size of data_bytes to a uint32_t, so that it can handle
larger transfers. Since using different sizes as either end can lead
to unpredictable results, a separate IOCTL is used for larger
transfers, allowing the kernel support to be determined and
backwards compatibility to be maintained.

See: raspberrypi#107

Signed-off-by: Phil Elwell <[email protected]>
pelwell added a commit to pelwell/linux that referenced this issue Dec 16, 2024
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
@pelwell
Copy link
Collaborator

pelwell commented Dec 16, 2024

There are now two PRs for this: one for piolib (#108), and one for the kernel driver (raspberrypi/linux#6543). Both are needed for larger transfers to work. The library should simply fail to do larger transfers with a non-updated kernel.

pelwell added a commit to pelwell/linux that referenced this issue Dec 16, 2024
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
@jepler
Copy link
Author

jepler commented Dec 17, 2024

Thanks!

pelwell added a commit to raspberrypi/linux that referenced this issue Dec 17, 2024
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
pelwell added a commit to pelwell/utils that referenced this issue Dec 17, 2024
In many places, the number of bytes in a data transfer is stored as a
32-bit (or greater) value, but the ioctl API to the kernel stores it in
a 16-bit field, alongside which is a 16-bit gap due to the alignment
requirements of the pointer that follows.

Increase the size of data_bytes to a uint32_t, so that it can handle
larger transfers. Since using different sizes as either end can lead
to unpredictable results, a separate IOCTL is used for larger
transfers, allowing the kernel support to be determined and
backwards compatibility to be maintained.

See: raspberrypi#107

Signed-off-by: Phil Elwell <[email protected]>
pelwell added a commit that referenced this issue Dec 17, 2024
In many places, the number of bytes in a data transfer is stored as a
32-bit (or greater) value, but the ioctl API to the kernel stores it in
a 16-bit field, alongside which is a 16-bit gap due to the alignment
requirements of the pointer that follows.

Increase the size of data_bytes to a uint32_t, so that it can handle
larger transfers. Since using different sizes as either end can lead
to unpredictable results, a separate IOCTL is used for larger
transfers, allowing the kernel support to be determined and
backwards compatibility to be maintained.

See: #107

Signed-off-by: Phil Elwell <[email protected]>
@pelwell
Copy link
Collaborator

pelwell commented Dec 17, 2024

That's all been merged now. The driver change missed yesterdays rpi-update release, but you can install a kernel containing the updated driver using either sudo rpi-update pulls/6543 or sudo rpi-updates rpi-6.6.y.

pelwell added a commit to raspberrypi/linux that referenced this issue Dec 17, 2024
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
pelwell added a commit to raspberrypi/linux that referenced this issue Dec 17, 2024
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
popcornmix pushed a commit to raspberrypi/linux that referenced this issue Dec 20, 2024
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
popcornmix pushed a commit to raspberrypi/linux that referenced this issue Jan 2, 2025
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
popcornmix pushed a commit to raspberrypi/linux that referenced this issue Jan 2, 2025
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
popcornmix pushed a commit to raspberrypi/linux that referenced this issue Jan 6, 2025
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
popcornmix pushed a commit to raspberrypi/linux that referenced this issue Jan 10, 2025
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
popcornmix pushed a commit to raspberrypi/linux that referenced this issue Jan 13, 2025
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
popcornmix pushed a commit to raspberrypi/linux that referenced this issue Jan 17, 2025
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
@jepler
Copy link
Author

jepler commented Jan 17, 2025

This appears to be working, thanks! I don't know if this issue should be closed now, or if it ought to wait for a released kernel to support it?

@pelwell
Copy link
Collaborator

pelwell commented Jan 17, 2025

The kernel will be released as sure as night follows day, so let's close this now.

@pelwell pelwell closed this as completed Jan 17, 2025
popcornmix pushed a commit to raspberrypi/linux that referenced this issue Jan 20, 2025
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
popcornmix pushed a commit to raspberrypi/linux that referenced this issue Jan 24, 2025
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
@henryjliu
Copy link

Is there something else I need to do to make this work besides sudo rpi-updates rpi-6.6.y and compiling?

I made a sample:
#define BUF_SIZE 32768*2
uint8_t databuf[BUF_SIZE];
pio_sm_config_xfer(pio, sm, PIO_DIR_TO_SM, BUF_SIZE, 1);
pio_sm_xfer_data(pio, sm, PIO_DIR_TO_SM, BUF_SIZE, databuf);

This works fine for BUF_SIZE 32768 but BUF_SIZE 32768*2 seg faults first time and locks the os requiring power cycle the 2nd time.

@jepler
Copy link
Author

jepler commented Jan 25, 2025

I think pio_sm_config_xfer is still limited to values that fit in 16 bits. Try with your xfer configured for 32kB as before, but then when it comes to the actual xfer send your whole BUF_SIZE.

@henryjliu
Copy link

This works now if I set pio_sm_config_xfer(pio, sm, PIO_DIR_TO_SM, 32768, 2);//this is max size but
pio_sm_xfer_data(pio, sm, PIO_DIR_TO_SM, 32768*4, databuf);

though I'm confused what pio_sm_config_xfer(pio, sm, PIO_DIR_TO_SM, 32768, 2) does exactly. Does it just basically split into multiple 32768 transitions or is this number just ignored?

@jepler
Copy link
Author

jepler commented Jan 27, 2025

The kernel code handling this API call is https://github.com/raspberrypi/linux/blob/rpi-6.6.y/drivers/misc/rp1-pio.c#L606

(but be aware there's some translation of arguments between the userspace pio_sm_config_xfer call and the arguments here, where the truncation to 16 bits of the config_xfer length argument occurs)

From what I understand reading it, this allocates DMA buffers in the Linux kernel's address space. The numbers given are the buffer size & the count of buffers, and there can be up to 4 buffers.

When the transmission actually occurs, data is copied from userspace to the buffer, and then the dma is submitted (https://github.com/raspberrypi/linux/blob/a18d9ced4965462cb7b3b4252ada440395105308/drivers/misc/rp1-pio.c#L726). When the number of buffers is greater than 1, more an one dma can be submitted, which likely reduces the gaps between the dma transmissions.

Reasons for using fewer or smaller buffers? I'm not sure. Linux DMA-capable memory might be limited, and the returns for larger buffer sizes seemed to be modest in my own testing. However, I was throughput-oriented in my testing (though I would have expected 5ms gaps as you noted in #117 to be apparent in my application)

@henryjliu
Copy link

I've been thinking about how to do it to implement a ping pong buffer and I guess way one way could be use 2 sm. First reads in number of bytes to shift with mov x, num_bytes and then does decrement until end reached. Afterwards can wait on irq 0 and sets irq1. Second sm waits on irq 1 and does same thing. Irq is not available in userspace but seems assembly could should still work for interprocess communication shouldn't it?

@pelwell
Copy link
Collaborator

pelwell commented Jan 30, 2025

There are two PRs - one in the kernel (raspberrypi/linux#6640) and one here (#119) - that together support larger DMA buffers. You will still end up with small gaps between the buffers - I'm not aware of a way of adding DMA descriptors to a chain while the controller is running - but you get to decide how frequent they are. Another approach would be to use cyclic transfers, which is how ALSA manages transfers to I2S soundcards, but that's another level of complexity.

popcornmix pushed a commit to raspberrypi/linux that referenced this issue Feb 3, 2025
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
popcornmix pushed a commit to raspberrypi/linux that referenced this issue Feb 3, 2025
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
6by9 pushed a commit to 6by9/linux that referenced this issue Feb 6, 2025
Provide remote access to the PIO hardware in RP1. There is a single
instance, with 4 state machines.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Support larger data transfers

Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: More logical probe sequence

Sort the probe function initialisation into a more logical order.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Minor cosmetic tweaks

No functional change.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Add in-kernel DMA support

Add kernel-facing implementations of pio_sm_config_xfer and
pio_xm_xfer_data.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Handle probe errors

Ensure that rp1_pio_open fails if the device failed to probe.

Link: raspberrypi#6593

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: SM_CONFIG_XFER32 = larger DMA bufs

Add an ioctl type - SM_CONFIG_XFER32 - that takes uints for the buf_size
and buf_count values.

Signed-off-by: Phil Elwell <[email protected]>

misc/rp1-pio: Fix copy/paste error in pio_rp1.h

As per the subject, there was a copy/paste error that caused
pio_sm_unclaim from a driver to result in a call to
pio_sm_claim. Fix it.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Fix parameter checks wihout client

Passing bad parameters to an API call without a pio pointer will cause
a NULL pointer exception when the persistent error is set. Guard
against that.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Convert floats to 24.8 fixed point

Floating point arithmetic is not supported in the kernel, so use fixed
point instead.

Signed-off-by: Phil Elwell <[email protected]>
popcornmix pushed a commit to raspberrypi/linux that referenced this issue Feb 10, 2025
Provide remote access to the PIO hardware in RP1. There is a single
instance, with 4 state machines.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Support larger data transfers

Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: More logical probe sequence

Sort the probe function initialisation into a more logical order.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Minor cosmetic tweaks

No functional change.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Add in-kernel DMA support

Add kernel-facing implementations of pio_sm_config_xfer and
pio_xm_xfer_data.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Handle probe errors

Ensure that rp1_pio_open fails if the device failed to probe.

Link: #6593

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: SM_CONFIG_XFER32 = larger DMA bufs

Add an ioctl type - SM_CONFIG_XFER32 - that takes uints for the buf_size
and buf_count values.

Signed-off-by: Phil Elwell <[email protected]>

misc/rp1-pio: Fix copy/paste error in pio_rp1.h

As per the subject, there was a copy/paste error that caused
pio_sm_unclaim from a driver to result in a call to
pio_sm_claim. Fix it.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Fix parameter checks wihout client

Passing bad parameters to an API call without a pio pointer will cause
a NULL pointer exception when the persistent error is set. Guard
against that.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Convert floats to 24.8 fixed point

Floating point arithmetic is not supported in the kernel, so use fixed
point instead.

Signed-off-by: Phil Elwell <[email protected]>
popcornmix pushed a commit to raspberrypi/linux that referenced this issue Feb 10, 2025
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
popcornmix pushed a commit to raspberrypi/linux that referenced this issue Feb 10, 2025
Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants