
feat: Add buffer size, replication, blocksize as file open options (#135) #136

Merged (3 commits) on Nov 13, 2024

Conversation

@Banyc (Contributor) commented Nov 13, 2024

Previously, OpenOptions did not provide a way for users to configure HDFS-specific options for its file open API; those options were simply left at their default values. According to a comment in the code, support for them was intended to be added in the future.

After this change, users can specify those parameters through with-prefixed builder methods.
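A minimal usage sketch of the described builder style (the parameter types were still under discussion at this point, so the exact signatures are illustrative; the Client::connect/open_file entry points follow the hdrs README):

use hdrs::Client;

fn open_for_read(fs: &Client) -> std::io::Result<()> {
    let _file = fs
        .open_file()
        .read(true)
        .with_buffer_size(0) // 0: use the buffer size from the HDFS configuration
        .with_replication(0) // 0: use the configured default replication
        .with_blocksize(0)   // 0: use the configured default block size
        .open("/tmp/example.txt")?;
    Ok(())
}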

@Banyc (Contributor, Author) commented Nov 13, 2024

as per issue #135

@Xuanwo (Owner) commented Nov 13, 2024

Seems the check failed on the latest Rust. Would you like to start a new PR for that?

@Banyc (Contributor, Author) commented Nov 13, 2024

@Xuanwo yeah sure

Banyc added a commit to Banyc/hdrs that referenced this pull request Nov 13, 2024
@Banyc (Contributor, Author) commented Nov 13, 2024

@Xuanwo here you go

/// Pass `0` if you want to use the default configured values.
///
/// `0` by default.
pub fn with_buffer_size(&mut self, buffer_size: c_int) -> &mut Self {
@Xuanwo (Owner):

Hi, how about using Rust's own types and converting them into c_int instead?

@Banyc (Contributor, Author):

There is a split over which primitive Rust type c_int really maps to across platforms: "avr" and "msp430" say it is i16, but the rest of the platforms mean i32. I'm not confident whether we should take the strictest route and use i16, or follow i32, which is the common one.

@Xuanwo (Owner):

My current idea is to use usize here and return an error during opening.
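A rough sketch of that idea (the field and method names are hypothetical; the point is just the checked usize-to-c_int conversion deferred to open time):

use std::ffi::c_int;
use std::io;

struct OpenOptions {
    buffer_size: usize,
}

impl OpenOptions {
    // The builder takes a plain Rust usize instead of an FFI type.
    pub fn with_buffer_size(&mut self, buffer_size: usize) -> &mut Self {
        self.buffer_size = buffer_size;
        self
    }

    // Convert to the C type only at the FFI boundary, failing if the value doesn't fit.
    fn open(&self) -> io::Result<()> {
        let _buffer_size: c_int = self
            .buffer_size
            .try_into()
            .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "buffer_size too large"))?;
        // ... pass the converted value to libhdfs here ...
        Ok(())
    }
}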

@Xuanwo (Owner):

Please keep in mind that Hadoop itself may not be able to run on avr or msp430.

@Banyc (Contributor, Author):

Oh yeah, you're right. I'll start working on it.

/// Pass `0` if you want to use the default configured values.
///
/// `0` by default.
pub fn with_replication(&mut self, replication: c_short) -> &mut Self {
@Xuanwo (Owner):

The same.

@Banyc (Contributor, Author):

c_short currently maps uniquely to i16 on all platforms. But in case a new platform comes along in the future where it refers to another type, I think we should stick with c_short for maximum compatibility.

@Xuanwo (Owner):

As mentioned in #136 (comment), using FFI types in the public API of a high-level crate like hdrs doesn't sit well with me.

/// Pass `0` if you want to use the default configured values.
///
/// `0` by default.
pub fn with_blocksize(&mut self, blocksize: i32) -> &mut Self {
@Xuanwo (Owner):

What will happen if blocksize < 0? And what does 0 mean, the block size from the HDFS config?

@Banyc (Contributor, Author):

Strictly speaking, the type should be a u31, going by the valid values in the "Block Size" section. 0 is defined as the default in the C API, and using the web API as a cross-reference, 0 means "Specified in the configuration".

Since there isn't a u31 available, u16 seems too restrictive while u32 outright admits invalid values, so I suggest we just follow the C API convention and let users be informed via Error::last_os_error() instead.

@Banyc (Contributor, Author):

... if users pass in any invalid values.

@Xuanwo (Owner):

We can verify those things while calling open.

tests/main.rs (review thread resolved)
@Banyc (Contributor, Author) commented Nov 13, 2024

I have revised the integer types and changed all three to usize. However, 0 is still a valid API value, acting as a sentinel for "Specified in the configuration".
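For reference, a sketch of how the revised API might read end to end, with all three options taking usize and 0 acting as the "use the configured value" sentinel (the connect string and path are placeholders):

use hdrs::Client;

fn main() -> std::io::Result<()> {
    let fs = Client::connect("default")?;
    let _file = fs
        .open_file()
        .create(true)
        .write(true)
        .with_buffer_size(0) // 0: buffer size comes from the HDFS configuration
        .with_replication(2) // explicit replication factor for this file
        .with_blocksize(0)   // 0: block size comes from the HDFS configuration
        .open("/tmp/hello.txt")?;
    Ok(())
}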

@Xuanwo (Owner) left a review:

Perfect! Thank you @Banyc for working on this.

@Xuanwo merged commit ce2ddc3 into Xuanwo:main on Nov 13, 2024. 4 checks passed.