
Identify TENDL 2017 files for GROUPR processing and data extraction #68

Merged
merged 15 commits into from
Aug 6, 2024

Conversation

@eitan-weinstein (Contributor) commented Aug 5, 2024

Closes #51 and #52.

Adds methods to tendl_processing.py that search through a directory for pairs of ENDF and PENDF files corresponding to the same isotope.

Modifies the example case in process_fendl3.2.py, generalizing it to work with all of the files found by the search methods in the new script.

@gonuke (Member) commented Aug 5, 2024

This looks like it needs a merge/rebase to resolve a conflict

@gonuke (Member) left a comment

This is another nice, well-contained addition.

I'm not sure that file-handling needs to be its own file, but that's fine. It's all pretty specific to TENDL files, so it might make sense to put those methods in that file.

Comment on lines 18 to 33
isomer_id = ''

upper_case_letters = [char for char in stem if char.isupper()]
lower_case_letters = [char for char in stem if char.islower()]
numbers = [str(char) for char in stem if char.isdigit()]

if len(lower_case_letters) == 0:
    lower_case_letters = ['']
elif len(lower_case_letters) > 1:
    isomer_id = lower_case_letters[-1]
    lower_case_letters = lower_case_letters[:-1]

element = upper_case_letters[0] + lower_case_letters[0]
A = ''.join(numbers) + isomer_id

return element, A
Member:

It seems like there must be a simpler way to do this? The files are always named n-Z[z]AAA[m], right, where

  • Z is an uppercase letter
  • [z] is an optional lowercase letter
  • AAA is a three-digit mass number
  • [m] is an optional lowercase letter 'm'

so why not something like:

Z_start = 2
Z_end = Z_start + 1
if not stem[Z_end].isdigit():
    Z_end += 1
element = stem[Z_start:Z_end]

A_start = Z_end
A_end = A_start + 3
if not stem[-1].isdigit():
    A_end += 1
A = stem[A_start:A_end]

Member:

regular expressions can make this even simpler in terms of number of lines, although maybe with a higher cognitive burden
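For reference, a minimal sketch of the regex approach, assuming the n-Z[z]AAA[m] stem format described above; the pattern, the function name, and the example stems are illustrative, not the actual implementation.

```python
import re

# One group for the element symbol (uppercase letter plus optional
# lowercase letter) and one for the mass number (three digits plus
# an optional isomer marker 'm').
STEM_PATTERN = re.compile(r'^n-([A-Z][a-z]?)(\d{3}m?)$')

def get_isotope(stem):
    """Split a TENDL file stem into (element, mass number) strings."""
    match = STEM_PATTERN.match(stem)
    if match is None:
        raise ValueError(f'unrecognized stem: {stem!r}')
    return match.groups()
```

With this pattern, a stem like 'n-Fe056' yields ('Fe', '056') and 'n-Co058m' yields ('Co', '058m'), matching the element/A split performed by the original code.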

directory = Path(directory)

file_info = {}
for file in directory.iterdir():
Member:

did you consider using directory.glob("*.endf")?
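A self-contained sketch of what the glob-based filtering buys over iterdir(); the file names here are made up for the demonstration.

```python
import tempfile
from pathlib import Path

# glob('*.endf') selects only the ENDF files, so no manual suffix
# filtering is needed when walking the directory.
with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    for name in ('n-Fe056.endf', 'n-Fe056.pendf', 'notes.txt'):
        (directory / name).touch()
    endf_files = [p.name for p in directory.glob('*.endf')]
```

Here endf_files ends up containing only 'n-Fe056.endf'; the PENDF and unrelated files are never visited.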

Comment on lines 23 to 24
endf_path.rename(TAPE20)
pendf_path.rename(TAPE21)
Member:

won't this result in gradually overwriting all the files in the directory? Don't we want to copy the files to these names for processing?

Comment on lines 23 to 24
endf_path.rename(TAPE20)
pendf_path.rename(TAPE21)
Member:

how did our prior testing work without a PENDF file? Do we really need a pendf file?

eitan-weinstein (Contributor, author):

We need an ENDF and a PENDF file to run GROUPR

Comment on lines 37 to 39
gendf_data = tp.iterate_MTs(MTs, endftk_file_obj, mt_dict, pKZA)
cumulative_data = concat([cumulative_data, gendf_data],
                         ignore_index=True)
Member:

Seeing how this is used across multiple nuclides, I'm not convinced that a pandas dataframe offers much more than a list of dictionaries. I think gendf_data could look like:

[ {'Parent KZA' : pkza, 'Daughter KZA' : dkza, etc.... } ,
  {'Parent KZA' : pkza, 'Daughter KZA' : dkza, etc.... } ,  ... ]

and cumulative_data could be a local variable that gets appended with the new data. It will require a change to iterate_MTs() that I think will make that a little simpler too.

At the end, this list could be used to create a new dataframe and export to CSV:

pd.DataFrame(cumulative_data).to_csv(filename)
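A runnable sketch of the suggested pattern; the KZA values and the column set are hypothetical stand-ins for whatever iterate_MTs() would actually produce.

```python
import pandas as pd

# Accumulate rows as plain dictionaries while looping over nuclides,
# then build the DataFrame once at the very end.
cumulative_data = []
for pkza, dkza in [(260560, 250550), (260560, 260570)]:  # made-up KZA pairs
    cumulative_data.append({'Parent KZA': pkza, 'Daughter KZA': dkza})

df = pd.DataFrame(cumulative_data)
# df.to_csv('gendf_data.csv') would then write the cumulative result once
```

This keeps the per-nuclide loop free of repeated DataFrame concatenation, which is also cheaper than calling concat() on every iteration.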

@gonuke (Member) left a comment

Two last things...

Comment on lines 52 to 59
for file in (p for p in dir.glob('*') if p.suffix in {'.tendl', '.endf'}):
    if file.is_file() and file.with_suffix('.pendf').is_file():
        element, A = get_isotope(file.stem)
        file_info[f'{element}{A}'] = {
            'Element' : element,
            'Mass Number' : A,
            'File Paths' : (file, file.with_suffix('.pendf'))
        }
Member:

I was thinking that the glob could allow you to be more selective about which files you are even looking for:

Suggested change

Original:

for file in (p for p in dir.glob('*') if p.suffix in {'.tendl', '.endf'}):
    if file.is_file() and file.with_suffix('.pendf').is_file():
        element, A = get_isotope(file.stem)
        file_info[f'{element}{A}'] = {
            'Element' : element,
            'Mass Number' : A,
            'File Paths' : (file, file.with_suffix('.pendf'))
        }

Suggested:

for suffix in ['tendl', 'endf']:
    for file in dir.glob(f'*.{suffix}'):
        if file.with_suffix('.pendf').is_file():
            element, A = get_isotope(file.stem)
            file_info[f'{element}{A}'] = {
                'Element' : element,
                'Mass Number' : A,
                'File Paths' : (file, file.with_suffix('.pendf'))
            }

Comment on lines 23 to 24
shutil.copy(endf_path, TAPE20)
shutil.copy(pendf_path, TAPE21)
Member:

I think this should work from Pathlib (trying to reduce number of dependencies, even if they're built in; I also think it's a more modern solution???):

Suggested change

Original:

shutil.copy(endf_path, TAPE20)
shutil.copy(pendf_path, TAPE21)

Suggested:

Path(TAPE20).write_bytes(endf_path.read_bytes())
Path(TAPE21).write_bytes(pendf_path.read_bytes())
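A self-contained sketch of the pathlib-only copy, using made-up file names and contents. One caveat worth noting: unlike shutil.copy, write_bytes does not carry over the source file's permission bits.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    endf_path = Path(tmp) / 'n-Fe056.endf'   # hypothetical source file
    endf_path.write_text('ENDF data')
    tape20 = Path(tmp) / 'tape20'
    # Copy the contents; the source file stays in place, unlike rename().
    tape20.write_bytes(endf_path.read_bytes())
    copied_text = tape20.read_text()
    source_survives = endf_path.is_file()
```

Because the source file survives, re-running the processing loop over the directory sees the same set of ENDF/PENDF pairs each time, which addresses the earlier concern about rename() gradually consuming the input files.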

@gonuke (Member) left a comment

Fixing indentation.

src/DataLib/fendl32B_retrofit/tendl_processing.py (outdated; resolved)
@gonuke gonuke merged commit f5c4f37 into svalinn:main Aug 6, 2024

Successfully merging this pull request may close these issues.

Define file locations for all possible files in workflow
2 participants