-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reimplement readv to deal with short reads #12674
base: master
Are you sure you want to change the base?
Conversation
The original implementation seems to have an assumption that the file is a regular file. This commit re-implements it so that it can deal with tty, sockets, etc.
@yamt nice! Is there some example showing the failure? Should it be added to apps/testing/ ? Another question: did you run ostest and LTS locally to confirm there is not known side effects? |
{ | ||
return ntotal; | ||
} | ||
buffer = malloc(total_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we avoid malloc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can't think of any trivial ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way to avoid malloc is to read directly into each iov and avoid the copy. In other words: iterate iov array. For each iov, if iov_len > 0, read() iov_len bytes into that iov. If any read() returns less than iov_len bytes, break from loop. Keep running counter of total bytes read.
I don't know if this is permissible under POSIX but it (1) avoids malloc, and (2) avoids copying. Even if there is added overhead from multiple calls to read(), it might be better than overhead of malloc(), free(), and copying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't work. please read the discussions in this PR.
{ | ||
/* NOTE: read() is a cancellation point */ | ||
|
||
nread = read(fildes, buffer, remaining); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we can't read the data piece by piece?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because, in general, you should not keep reading after a short read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we can break out by returning the length we really got so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I don't get the problem from your commit message. could you explain more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider that you readv() on a tty, using a huge buffer, say, 1024 bytes.
if you got some data, say, 1 byte, readv() should return the 1 byte without waiting for the following 1023 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like struct file_operations
read and write methods should actually operate with vectors (the same as file_read
and file_write
), but such a change basically would require rewriting thousands lines of the code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yamt @xiaoxiang781216 how does Linux, QNX, MacOS, etc implement it? Although the specification doesn't require some assumptions, it is important to follow what other OSes are doing, to avoid breaking applications when porting it to NuttX
Linux improve the driver model by changing argument from void * to iovec*. The official approach is extending file_operation_s to support iovec like what Linux or NuttX sock_intf_s(#2959).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no.
- these tricks are not thread-safe.
- we should preserve packet boundary for some kind of fds like datagram socket.
socket already improve this case by: #2959. But file_operation_s doesn't have the similar patch, so it's hard to modify readv to utilize this new capability from socket layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but adding
malloc
and trying to avoid memory leak with nested cancellation point is also conceptually wrong.
it's just inefficient.
it's far better than the original implementation, which is broken in a user-visible way.
I do not have a better solution right not and just noticed and
writev
seems also to suffer from the same problem.
right. writev/preadv/pwritev have the same problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if someone plans to do the ideal solution (ie. make ~everything iovec-based) anytime soon, it's great.
otherwise, i'd suggest to merge this PR for now.
micropython repl (with micropython/micropython#13676) on toywasm and wamr.
no. |
well, the original implementation was actually broken for regular files as well wrt read/write atomicity. |
if (iovcnt == 1) | ||
{ | ||
return read(fildes, iov->iov_base, iov->iov_len); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this optimization necessary? Would it be better to remove it and use common logic below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the "common logic" below involves malloc. it's better to avoid it when easy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yamt Agreed, but if we can eliminate malloc (see other feedback here) then this optimization can be eliminated also.
I see why iovec-based multiple reads could be a problem (e.g., read data available == size of iovec, then we get stuck in blocked read). Ok, I am fine to merge this PR with malloc, and maybe someone can solve how to remove malloc later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see why iovec-based multiple reads could be a problem (e.g., read data available == size of iovec, then we get stuck in blocked read). Ok, I am fine to merge this PR with malloc, and maybe someone can solve how to remove malloc later.
i guess that tricks involving multiple read() calls for iovcnt>1 are all broken in one way or another.
implementing readv() on the top of read() is a design mistake. it should be the opposite.
* buffer reads. | ||
*/ | ||
nread = read(fildes, buffer, total_size); | ||
if (nread == -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to do "if (nread < 0)" (defensive coding)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you want that way, we should return -1 instead of nread i guess.
anyway, it isn't this PR is about.
{ | ||
return ntotal; | ||
} | ||
buffer = malloc(total_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way to avoid malloc is to read directly into each iov and avoid the copy. In other words: iterate iov array. For each iov, if iov_len > 0, read() iov_len bytes into that iov. If any read() returns less than iov_len bytes, break from loop. Keep running counter of total bytes read.
I don't know if this is permissible under POSIX but it (1) avoids malloc, and (2) avoids copying. Even if there is added overhead from multiple calls to read(), it might be better than overhead of malloc(), free(), and copying.
can we make a decision? |
ping |
I would prefer to just read the first memory block, which conform the spec and don't need malloc. |
it doesn't conform the spec. |
Why? |
consider:
|
if spec enforce the implementation must read all available data, please point out the statement.
since udp already support sendmsg, the right fix is mapping writev to sendmsg by checking the fd type is socket.
it's caller responsibility to ensure the atomicity if readv/writev just can finish the partial job. |
It's better to do the different action base on the fd type before we add readv/writev to file_operation:
of course, it may need to move readv/writev from libc to fs/vfs to do this type of dispatch. |
i don't know where/if it's explicitly specified in the spec.
udp is just an example.
no. |
see #12674 (comment) |
i don't think it's a good idea to block an obvious fix like this for months just saying "there can be a better solution" w/o actually providing any better solutions. if this PR was merged two months ago, i might even have implemented the ideal iov-based solution at least partly in the two months. |
But the spec explicitly allows return the short length. Caller always need handle the short return correctly.
Since this solution isn't perfect, especially the allocation happens in the read/write path, I hesitate to merge this change, but other maintainers could merge it if they think it's OK. |
whatever the standard allows, i don't think it's a good idea to break the semantics which real applications have been relying on for decades. |
currently, nuttx implements readv/writev on the top of read/write. while it might work for the simplest cases, it's broken by design. for example, it's impossible to make it work correctly for files which need to preserve data boundaries without allocating a single contiguous buffer. (udp socket, some character devices, etc) this change is a start of the migration to a better design. that is, implement read/write on the top of readv/writev. to avoid a single huge change, following things will NOT be done in this commit: * fix actual bugs caused by the original readv-based-on-read design. (cf. apache#12674) * adapt filesystems/drivers to actually benefit from the new interface. (except a few trivial examples) * eventually retire the old interface. * retire read/write syscalls. implement them in libc instead. * pread/pwrite/preadv/pwritev (except the introduction of struct uio, which is a preparation to back these variations with the new interface.)
currently, nuttx implements readv/writev on the top of read/write. while it might work for the simplest cases, it's broken by design. for example, it's impossible to make it work correctly for files which need to preserve data boundaries without allocating a single contiguous buffer. (udp socket, some character devices, etc) this change is a start of the migration to a better design. that is, implement read/write on the top of readv/writev. to avoid a single huge change, following things will NOT be done in this commit: * fix actual bugs caused by the original readv-based-on-read design. (cf. apache#12674) * adapt filesystems/drivers to actually benefit from the new interface. (except a few trivial examples) * eventually retire the old interface. * retire read/write syscalls. implement them in libc instead. * pread/pwrite/preadv/pwritev (except the introduction of struct uio, which is a preparation to back these variations with the new interface.)
currently, nuttx implements readv/writev on the top of read/write. while it might work for the simplest cases, it's broken by design. for example, it's impossible to make it work correctly for files which need to preserve data boundaries without allocating a single contiguous buffer. (udp socket, some character devices, etc) this change is a start of the migration to a better design. that is, implement read/write on the top of readv/writev. to avoid a single huge change, following things will NOT be done in this commit: * fix actual bugs caused by the original readv-based-on-read design. (cf. apache#12674) * adapt filesystems/drivers to actually benefit from the new interface. (except a few trivial examples) * eventually retire the old interface. * retire read/write syscalls. implement them in libc instead. * pread/pwrite/preadv/pwritev (except the introduction of struct uio, which is a preparation to back these variations with the new interface.)
currently, nuttx implements readv/writev on the top of read/write. while it might work for the simplest cases, it's broken by design. for example, it's impossible to make it work correctly for files which need to preserve data boundaries without allocating a single contiguous buffer. (udp socket, some character devices, etc) this change is a start of the migration to a better design. that is, implement read/write on the top of readv/writev. to avoid a single huge change, following things will NOT be done in this commit: * fix actual bugs caused by the original readv-based-on-read design. (cf. apache#12674) * adapt filesystems/drivers to actually benefit from the new interface. (except a few trivial examples) * eventually retire the old interface. * retire read/write syscalls. implement them in libc instead. * pread/pwrite/preadv/pwritev (except the introduction of struct uio, which is a preparation to back these variations with the new interface.)
currently, nuttx implements readv/writev on the top of read/write. while it might work for the simplest cases, it's broken by design. for example, it's impossible to make it work correctly for files which need to preserve data boundaries without allocating a single contiguous buffer. (udp socket, some character devices, etc) this change is a start of the migration to a better design. that is, implement read/write on the top of readv/writev. to avoid a single huge change, following things will NOT be done in this commit: * fix actual bugs caused by the original readv-based-on-read design. (cf. #12674) * adapt filesystems/drivers to actually benefit from the new interface. (except a few trivial examples) * eventually retire the old interface. * retire read/write syscalls. implement them in libc instead. * pread/pwrite/preadv/pwritev (except the introduction of struct uio, which is a preparation to back these variations with the new interface.)
This would fix readv/writev issues mentioned in apache#12674. (only for this specific driver though. with this approach, we basically have to fix every single drivers and filesystems.) Lightly tested on the serial console, using micropython REPL on toywasm with esp32s3-devkit:toywasm, which used to be suffered by the readv issue.
This would fix readv/writev issues mentioned in apache#12674. (only for this specific driver though. with this approach, we basically have to fix every single drivers and filesystems.) Lightly tested on the serial console, using micropython REPL on toywasm with esp32s3-devkit:toywasm, which used to be suffered by the readv issue.
This would fix readv/writev issues mentioned in apache#12674. (only for this specific driver though. with this approach, we basically have to fix every single drivers and filesystems.) Lightly tested on the serial console, using micropython REPL on toywasm with esp32s3-devkit:toywasm, which used to be suffered by the readv issue.
This would fix readv/writev issues mentioned in apache#12674. (only for this specific driver though. with this approach, we basically have to fix every single drivers and filesystems.) Lightly tested on the serial console, using micropython REPL on toywasm with esp32s3-devkit:toywasm, which used to be suffered by the readv issue.
This would fix readv/writev issues mentioned in apache#12674. (only for this specific driver though. with this approach, we basically have to fix every single drivers and filesystems.) Lightly tested on the serial console, using micropython REPL on toywasm with esp32s3-devkit:toywasm, which used to be suffered by the readv issue.
since the better approach is merged(#13498), let's close this pr. |
This would fix readv/writev issues mentioned in apache#12674. (only for this specific driver though. with this approach, we basically have to fix every single drivers and filesystems.) Lightly tested on the serial console, using micropython REPL on toywasm with esp32s3-devkit:toywasm, which used to be suffered by the readv issue.
currently, nuttx implements readv/writev on the top of read/write. while it might work for the simplest cases, it's broken by design. for example, it's impossible to make it work correctly for files which need to preserve data boundaries without allocating a single contiguous buffer. (udp socket, some character devices, etc) this change is a start of the migration to a better design. that is, implement read/write on the top of readv/writev. to avoid a single huge change, following things will NOT be done in this commit: * fix actual bugs caused by the original readv-based-on-read design. (cf. apache#12674) * adapt filesystems/drivers to actually benefit from the new interface. (except a few trivial examples) * eventually retire the old interface. * retire read/write syscalls. implement them in libc instead. * pread/pwrite/preadv/pwritev (except the introduction of struct uio, which is a preparation to back these variations with the new interface.)
This would fix readv/writev issues mentioned in apache#12674. (only for this specific driver though. with this approach, we basically have to fix every single drivers and filesystems.) Lightly tested on the serial console, using micropython REPL on toywasm with esp32s3-devkit:toywasm, which used to be suffered by the readv issue.
This would fix readv/writev issues mentioned in apache#12674. (only for this specific driver though. with this approach, we basically have to fix every single drivers and filesystems.) Lightly tested on the serial console, using micropython REPL on toywasm with esp32s3-devkit:toywasm, which used to be suffered by the readv issue.
This would fix readv/writev issues mentioned in apache#12674. (only for this specific driver though. with this approach, we basically have to fix every single drivers and filesystems.) Lightly tested on the serial console, using micropython REPL on toywasm with esp32s3-devkit:toywasm, which used to be suffered by the readv issue.
currently, nuttx implements readv/writev on the top of read/write. while it might work for the simplest cases, it's broken by design. for example, it's impossible to make it work correctly for files which need to preserve data boundaries without allocating a single contiguous buffer. (udp socket, some character devices, etc) this change is a start of the migration to a better design. that is, implement read/write on the top of readv/writev. to avoid a single huge change, following things will NOT be done in this commit: * fix actual bugs caused by the original readv-based-on-read design. (cf. apache/nuttx#12674) * adapt filesystems/drivers to actually benefit from the new interface. (except a few trivial examples) * eventually retire the old interface. * retire read/write syscalls. implement them in libc instead. * pread/pwrite/preadv/pwritev (except the introduction of struct uio, which is a preparation to back these variations with the new interface.)
currently, nuttx implements readv/writev on the top of read/write. while it might work for the simplest cases, it's broken by design. for example, it's impossible to make it work correctly for files which need to preserve data boundaries without allocating a single contiguous buffer. (udp socket, some character devices, etc) this change is a start of the migration to a better design. that is, implement read/write on the top of readv/writev. to avoid a single huge change, following things will NOT be done in this commit: * fix actual bugs caused by the original readv-based-on-read design. (cf. apache/nuttx#12674) * adapt filesystems/drivers to actually benefit from the new interface. (except a few trivial examples) * eventually retire the old interface. * retire read/write syscalls. implement them in libc instead. * pread/pwrite/preadv/pwritev (except the introduction of struct uio, which is a preparation to back these variations with the new interface.)
currently, nuttx implements readv/writev on the top of read/write. while it might work for the simplest cases, it's broken by design. for example, it's impossible to make it work correctly for files which need to preserve data boundaries without allocating a single contiguous buffer. (udp socket, some character devices, etc) this change is a start of the migration to a better design. that is, implement read/write on the top of readv/writev. to avoid a single huge change, following things will NOT be done in this commit: * fix actual bugs caused by the original readv-based-on-read design. (cf. apache/nuttx#12674) * adapt filesystems/drivers to actually benefit from the new interface. (except a few trivial examples) * eventually retire the old interface. * retire read/write syscalls. implement them in libc instead. * pread/pwrite/preadv/pwritev (except the introduction of struct uio, which is a preparation to back these variations with the new interface.)
Summary
The original implementation seems to have an assumption that the file is a regular file.
This commit re-implements it so that it can deal with tty, sockets, etc.
Impact
Testing