
Identify TENDL 2017 files for GROUPR processing and data extraction #68

Merged
merged 15 commits into from
Aug 6, 2024

Conversation

@eitan-weinstein (Contributor) commented Aug 5, 2024

Closes #51 and #52.

Adds methods to tendl_processing.py that search through a directory for pairs of ENDF and PENDF files corresponding to the same isotope.

Modifies the example case in process_fendl3.2.py, generalizing it to work with all of the files found by the search methods in the new script.

@gonuke (Member) commented Aug 5, 2024

This looks like it needs a merge/rebase to resolve a conflict

@gonuke (Member) left a comment

This is another nice, well-contained addition.

I'm not sure that file-handling needs to be its own file, but that's fine. It's all pretty specific to TENDL files, so it might make sense to put those methods in that file.

Comment on lines 18 to 33
isomer_id = ''

upper_case_letters = [char for char in stem if char.isupper()]
lower_case_letters = [char for char in stem if char.islower()]
numbers = [str(char) for char in stem if char.isdigit()]

if len(lower_case_letters) == 0:
    lower_case_letters = ['']
elif len(lower_case_letters) > 1:
    isomer_id = lower_case_letters[-1]
    lower_case_letters = lower_case_letters[:-1]

element = upper_case_letters[0] + lower_case_letters[0]
A = ''.join(numbers) + isomer_id

return element, A
Member:

It seems like there must be a simpler way to do this? The files are always named n-Z[z]AAA[m], right, where

  • Z is an uppercase letter
  • [z] is an optional lowercase letter
  • AAA is a three-digit mass number
  • [m] is an optional lowercase letter 'm'

so why not something like:

Z_start = 2
Z_end = Z_start + 1
if not stem[Z_end].isdigit():
    Z_end += 1
element = stem[Z_start:Z_end]

A_start = Z_end
A_end = A_start + 3
if not stem[-1].isdigit():
    A_end += 1
A = stem[A_start:A_end]

Member:

regular expressions can make this even simpler in terms of number of lines, although maybe with a higher cognitive burden
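For reference, a minimal sketch of the regex approach, assuming the n-Z[z]AAA[m] stem format described above; the pattern, the function name, and the example stems are illustrative, not the actual implementation.

```python
import re

# One group for the element symbol (uppercase letter plus optional
# lowercase letter) and one for the mass number (three digits plus
# an optional isomer marker 'm').
STEM_PATTERN = re.compile(r'^n-([A-Z][a-z]?)(\d{3}m?)$')

def get_isotope(stem):
    """Split a TENDL file stem into (element, mass number) strings."""
    match = STEM_PATTERN.match(stem)
    if match is None:
        raise ValueError(f'unrecognized stem: {stem!r}')
    return match.groups()
```

With this pattern, a stem like 'n-Fe056' yields ('Fe', '056') and 'n-Co058m' yields ('Co', '058m'), matching the element/A split performed by the original code.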

directory = Path(directory)

file_info = {}
for file in directory.iterdir():
Member:

did you consider using directory.glob("*.endf")?
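A self-contained sketch of what the glob-based filtering buys over iterdir(); the file names here are made up for the demonstration.

```python
import tempfile
from pathlib import Path

# glob('*.endf') selects only the ENDF files, so no manual suffix
# filtering is needed when walking the directory.
with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    for name in ('n-Fe056.endf', 'n-Fe056.pendf', 'notes.txt'):
        (directory / name).touch()
    endf_files = [p.name for p in directory.glob('*.endf')]
```

Here endf_files ends up containing only 'n-Fe056.endf'; the PENDF and unrelated files are never visited.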

Comment on lines 23 to 24
endf_path.rename(TAPE20)
pendf_path.rename(TAPE21)
Member:

won't this result in gradually overwriting all the files in the directory? Don't we want to copy the files to these names for processing?

Comment on lines 23 to 24
endf_path.rename(TAPE20)
pendf_path.rename(TAPE21)
Member:

how did our prior testing work without a PENDF file? Do we really need a pendf file?

eitan-weinstein (Contributor, author):

We need an ENDF and a PENDF file to run GROUPR

Comment on lines 37 to 39
gendf_data = tp.iterate_MTs(MTs, endftk_file_obj, mt_dict, pKZA)
cumulative_data = concat([cumulative_data, gendf_data],
                         ignore_index=True)
Member:

Seeing how this is used across multiple nuclides, I'm not convinced that a pandas dataframe offers much more than a list of dictionaries. I think gendf_data could look like:

[ {'Parent KZA' : pkza, 'Daughter KZA' : dkza, etc.... } ,
  {'Parent KZA' : pkza, 'Daughter KZA' : dkza, etc.... } ,  ... ]

and cumulative_data could be a local variable that gets appended with the new data. It will require a change to iterate_MTs() that I think will make that a little simpler too.

At the end, this list could be used to create a new dataframe and export to CSV:

pd.DataFrame(cumulative_data).to_csv(filename)
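A runnable sketch of the suggested pattern; the KZA values and the column set are hypothetical stand-ins for whatever iterate_MTs() would actually produce.

```python
import pandas as pd

# Accumulate rows as plain dictionaries while looping over nuclides,
# then build the DataFrame once at the very end.
cumulative_data = []
for pkza, dkza in [(260560, 250550), (260560, 260570)]:  # made-up KZA pairs
    cumulative_data.append({'Parent KZA': pkza, 'Daughter KZA': dkza})

df = pd.DataFrame(cumulative_data)
# df.to_csv('gendf_data.csv') would then write the cumulative result once
```

This keeps the per-nuclide loop free of repeated DataFrame concatenation, which is also cheaper than calling concat() on every iteration.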

@gonuke (Member) left a comment

Two last things...

Comment on lines 52 to 59
for file in (p for p in dir.glob('*') if p.suffix in {'.tendl', '.endf'}):
    if file.is_file() and file.with_suffix('.pendf').is_file():
        element, A = get_isotope(file.stem)
        file_info[f'{element}{A}'] = {
            'Element' : element,
            'Mass Number' : A,
            'File Paths' : (file, file.with_suffix('.pendf'))
        }
Member:

I was thinking that the glob could allow you to be more selective about which files you are even looking for:

Suggested change

Original:

for file in (p for p in dir.glob('*') if p.suffix in {'.tendl', '.endf'}):
    if file.is_file() and file.with_suffix('.pendf').is_file():
        element, A = get_isotope(file.stem)
        file_info[f'{element}{A}'] = {
            'Element' : element,
            'Mass Number' : A,
            'File Paths' : (file, file.with_suffix('.pendf'))
        }

Suggested:

for suffix in ['tendl', 'endf']:
    for file in dir.glob(f'*.{suffix}'):
        if file.with_suffix('.pendf').is_file():
            element, A = get_isotope(file.stem)
            file_info[f'{element}{A}'] = {
                'Element' : element,
                'Mass Number' : A,
                'File Paths' : (file, file.with_suffix('.pendf'))
            }

Comment on lines 23 to 24
shutil.copy(endf_path, TAPE20)
shutil.copy(pendf_path, TAPE21)
Member:

I think this should work from Pathlib (trying to reduce number of dependencies, even if they're built in; I also think it's a more modern solution???):

Suggested change

Original:

shutil.copy(endf_path, TAPE20)
shutil.copy(pendf_path, TAPE21)

Suggested:

Path(TAPE20).write_bytes(endf_path.read_bytes())
Path(TAPE21).write_bytes(pendf_path.read_bytes())
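A self-contained sketch of the pathlib-only copy, using made-up file names and contents. One caveat worth noting: unlike shutil.copy, write_bytes does not carry over the source file's permission bits.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    endf_path = Path(tmp) / 'n-Fe056.endf'   # hypothetical source file
    endf_path.write_text('ENDF data')
    tape20 = Path(tmp) / 'tape20'
    # Copy the contents; the source file stays in place, unlike rename().
    tape20.write_bytes(endf_path.read_bytes())
    copied_text = tape20.read_text()
    source_survives = endf_path.is_file()
```

Because the source file survives, re-running the processing loop over the directory sees the same set of ENDF/PENDF pairs each time, which addresses the earlier concern about rename() gradually consuming the input files.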

@gonuke (Member) left a comment

Fixing indentation.

src/DataLib/fendl32B_retrofit/tendl_processing.py (outdated; resolved)
@gonuke gonuke merged commit f5c4f37 into svalinn:main Aug 6, 2024

Successfully merging this pull request may close these issues.

Define file locations for all possible files in workflow
2 participants