
Potential problem with large (> 2GB) files on Linux. #2739

Open
1 of 6 tasks
alexiv1965 opened this issue Nov 6, 2024 · 4 comments
Assignees
Labels
waiting Waiting for the original poster (in most cases) or something else

Comments


alexiv1965 commented Nov 6, 2024

Proposal:

Almost all read actions in searchd go through sphReadThrottled() (the same applies to write actions). Note that the Linux read() syscall, which is called underneath, cannot transfer a single chunk larger than 2GB per call; see the read(2) man page.

Here is how the "step size" for a single read action is selected. If the rt_merge_maxiosize config option is set, it limits the size of every read/write action to its value. But this option is not set by default, and very few admins will take it into account. So the most common scenario is a zero value for the g_iMaxIOSize global variable, and hence the whole size is read in one chunk. Thereafter it'll produce a very unclear read error.

I propose setting a strict (non-zero) upper limit for the g_iMaxIOSize value, so that Linux read() can do its job without error. This limit should be slightly below 2GB; see the read(2) man page for details.
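The cap proposed above could look like the sketch below. The loop splits an arbitrarily large request into pieces that each fit within a single read() call (on Linux, read() transfers at most 0x7ffff000 bytes per call, per the read(2) man page). The names ReadChunked and MAX_IO_CHUNK are hypothetical; the real code path is sphReadThrottled()/g_iMaxIOSize, which this only approximates:

```cpp
#include <unistd.h>
#include <cassert>
#include <cstdint>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Linux read() transfers at most 0x7ffff000 (~2GB minus one page) bytes per
// call, so any per-call cap must stay at or below that. Hypothetical constant.
static const int64_t MAX_IO_CHUNK = 0x7ffff000;

// Hypothetical helper: read iCount bytes from iFD in chunks no larger than
// iCap, so no single read() call exceeds the kernel's per-call limit.
static int64_t ReadChunked ( int iFD, void * pBuf, int64_t iCount, int64_t iCap = MAX_IO_CHUNK )
{
	int64_t iDone = 0;
	char * p = static_cast<char*> ( pBuf );
	while ( iDone<iCount )
	{
		int64_t iChunk = iCount - iDone;
		if ( iChunk>iCap )
			iChunk = iCap;
		ssize_t iGot = ::read ( iFD, p + iDone, static_cast<size_t> ( iChunk ) );
		if ( iGot<0 )
			return -1; // real I/O error
		if ( iGot==0 )
			break; // EOF
		iDone += iGot;
	}
	return iDone;
}
```

With g_iMaxIOSize clamped to a value like MAX_IO_CHUNK instead of zero, even a multi-gigabyte file (such as a large global.idf) would be read in several valid calls instead of one failing call.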

By the way: the subsequent call to sphRead() has a very rough fix for a Windows warning: the read chunk size is cast to int. Without the solution described above, this leads to even more obscure read errors: a large positive value is cast to a negative int, and then, in the call to read(), it is cast back to a very large positive value. I suppose this should also be fixed.
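The truncation chain described above can be demonstrated in isolation. The function name below is hypothetical and the 3GB request is an example value; it only reproduces the two casts, not the actual sphRead() code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>

// Demonstration of the cast chain described above. Truncating a >2GB length
// to a 32-bit int drops the high bits; on common two's-complement platforms
// the result wraps negative, and converting that negative int back to the
// size_t that read() expects sign-extends it into an enormous positive value.
inline size_t TruncateLikeSphRead ( int64_t iWanted, int * pTruncated )
{
	int iTruncated = static_cast<int> ( iWanted );	// e.g. 3GB -> -1073741824
	*pTruncated = iTruncated;
	return static_cast<size_t> ( iTruncated );		// -> ~1.8e19 on 64-bit, not 3GB
}
```

So instead of asking the kernel for 3GB, the code ends up asking for roughly 18 exabytes, and read() fails with an error that has nothing obviously to do with file size.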

Once again: the same option and global variable govern both read and write actions.

Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated
  • OpenAPI YAML updated and issue created to rebuild clients
@sanikolaev
Collaborator

The related message in Telegram: https://t.me/manticore_chat/7200/16974

@sanikolaev
Collaborator

> Thereafter it'll produce very unclear read error.

Please provide more details about it. What was the error in your case?

@sanikolaev added the `waiting` label Nov 7, 2024
@alexiv1965
Author

alexiv1965 commented Nov 8, 2024

That is the error in the log: "global IDF unavailable - IGNORING". It concerns only my case: I tried to set up a very large global.idf file for the index, but this file could not be read. In other situations there will be other error messages.

@sanikolaev
Collaborator

Can you share this idf file by uploading it to our write-only S3 storage? https://manual.manticoresearch.com/Reporting_bugs#Uploading-your-data
