Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing column separator in the file #5

Open
wants to merge 3 commits into
base: worms
Choose a base branch
from

Conversation

maxpatiiuk
Copy link
Member

No description provided.

Copy link
Member

@grantfitzsimmons grantfitzsimmons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could have introduced an extra tab character:

image

Fixed the columns being off bug by refactoring the code (original implementation used string concatenation - bad idea - I belive I used it because I had to save on ram - needed to fit into a 2GB ram limit on that VM - funny thing is, ChatGPT says this is actually eating more ram - well, we didn't have ChatGPT back then)

> When you store the data as a nested array, each element in the outer array is a reference to a separate inner array, and each inner array contains the strings. This means that when you append a new string to an inner array, you are only modifying that specific inner array, and not creating a new string object or copying the entire array.
>
> On the other hand, if you store the data as a single string, every time you append a new string, you would need to create a new string object with the concatenated values of the existing string and the new string. This can consume a lot of memory, especially if the strings are large.

That's why programmers should avoid premature optimization, should test the impact of their changes (or stop worrying about changes if there is no noticeable impact) and read how the code is executed on the low-level to make correct tradeoffs
Fixed duplicate "Skipping node with wrong rank order" messages
Made "Skipping node with wrong rank order" include more information
@maxpatiiuk
Copy link
Member Author

Fixed duplicate "Skipping node with wrong rank order" messages
Made "Skipping node with wrong rank order" include more information
Fixed the columns being off bug by refactoring the code (original implementation used string concatenation - bad idea - I belive I used it because I had to save on ram - needed to fit into a 2GB ram limit on that VM - funny thing is, ChatGPT says this is actually eating more ram - well, we didn't have ChatGPT back then)

When you store the data as a nested array, each element in the outer array is a reference to a separate inner array, and each inner array contains the strings. This means that when you append a new string to an inner array, you are only modifying that specific inner array, and not creating a new string object or copying the entire array.

On the other hand, if you store the data as a single string, every time you append a new string, you would need to create a new string object with the concatenated values of the existing string and the new string. This can consume a lot of memory, especially if the strings are large.

That's why programmers should avoid premature optimization, should test the impact of their changes (or stop worrying about changes if there is no noticeable impact) and read how the code is executed on the low-level to make correct tradeoffs

--

To test:
Make sure this is not present in the file when selected Mollusca -> Bivalvia (notice the glued URL and incertae sedis):

https://www.marinespecies.org/aphia.php?p=taxdetails&id=105incertae sedis

Make sure columns are not off by one:
image

@maxpatiiuk
Copy link
Member Author

If this code works for worms, I can do the same modification for catalog of line

maxpatiiuk added a commit that referenced this pull request May 2, 2023
Same as #5, but for CoL instead
of WoRMS
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants