fparser will potentially read many unrelated and unexpected files. #385

hiker · 2023-01-12T03:00:54Z

When a use statement is encountered, fparser will try to find & parse the sources for this module:

readfortran.py:
================
    def find_module_source_file(self, mod_name):
...
            fn = None
            for d in self.include_dirs:
                fn = get_module_file(mod_name, d)
                if fn is not None:
                    return fn
...
utils.py:
=========
module_file_extensions = [".f", ".f90", ".f95", ".f03", ".f08"]
...
def get_module_file(name, directory, _cache={}):
    fn = _cache.get(name, None)
    if fn is not None:
        return fn
    if name.endswith("_module"):
        ...  Not used in psyclone
    files = []
    for ext in module_file_extensions:
        files += glob.glob(os.path.join(directory, "*" + ext))
    for fn in files:
        if module_in_file(name, fn):
            _cache[name] = fn
            return fn
    return None
...
def module_in_file(name, filename):
    name = name.lower()
    pattern = re.compile(r"\s*module\s+(?P<name>[a-z]\w*)", re.I).match
    encoding = {"encoding": "UTF-8"}
    f = io.open(filename, "r", **encoding)
    for line in f:
        m = pattern(line)
        if m and m.group("name").lower() == name:
            f.close()
            return filename
    f.close()

The constructor of FortranReaderBase sets . as include directory, and the constructor of FortranFileReader appends the directory of the parsed file if no explicit include directories are set:

        if include_dirs is None:
            self.include_dirs.insert(0, os.path.dirname(self.id))

The result of this is that whenever a use statement is encountered, fparser will read any .f, .f90,... file in the current directory (and the directory in which the parsed file is) and run a regexp search to find the module. In my example, I had >300 automatically created .f90 files in my current directory, resulting in a significant slowdown.

A quick look indicates that this seems to be used to verify import statements, e.g. make sure if item x is imported from a module, x is indeed exported from the module. See class Use(Statement), around line 1295 for details. This is part of the analyze phase of fparser, which had the interesting comment in fparser parsefortran.py:

        Attempts to analyse the parsed Fortran. It is not clear what for.

I didn't check what else besides import statements will be checked by that analyze phase.

Summary: the way fparser is used in PSyclone results in potentially reading many unexpected (and unrelated) files (all .f/.f90/.. files in the current directory), which can result in a slowdown. LFRic is luckily not badly affected at the moment: when parsing the algorithm layer, an include path is provided by PSyclone (which is the kernel search directory):
parse/utils.py:

        reader = FortranFileReader(filename, include_dirs=config.include_paths)

This will overwrite the default include search path in fparser (in LFRic this include path is the .../kernel/ directory).So parsing the .x90 files will only read .f90 files in the .../kernel directory, and all files that directory are .F90, so de-facto this will not read any additional files (as long as there are no .f90 kernel files) - the only overhead is some glob searches for non-existent files.

BUT, when parsing a kernel file, which PSyclone does, no include directory is set in parse/kernel.py:

def get_kernel_parse_tree
...
        parse_tree = fpapi.parse(filepath)

Therefore the defaults will be used, which is the directory in which the kernel is, and .. So for any kernel all .f/.f90/... files in .../kernel and . will be searched. Again, since all kernels in LFRic are .F90, no files are found, nothing else is read.

So we are mostly fine for now (till LFRic adds kernels with .f90 extension 😁, and/or the build system should have .f90 files in the current directory). Actually, why are the files called F90? As far as I can tell they do not contain preprocessor directives 😁

There is still parse/lfric_builtins.f90, which will be read (again) while searching for kernel.mod while parsing the same file. When building gungho, I had 1762 reads of that file (i.e. reading this file to find modules while this file is being parsed by PSyclone).

I don't think not finding the module files does any harm ... .since atm already the modules files are not found anyway.
Does the analyze phase actually contribute anything? Maybe we can make fparser a little bit faster by disabling this??
We should plan for the fact that at some stage we might have source files in the current directory (i.e. consider non LFRic apps that might build in the source tree??), or lower case f90 files in the directory of the parsed files. Maybe add an option to disable searching for module files?? Or at least document this issue and make it more explicit to avoid surprises.

The text was updated successfully, but these errors were encountered:

arporter · 2023-01-12T09:43:58Z

Wow. Thanks very much for digging into this @hiker. Just to clarify, does fparser actually proceeds to parse any file returned by find_module_source_file?

hiker · 2023-01-12T12:41:37Z

Yes, see one/statement.py, around line 1295:

            fn = self.reader.find_module_source_file(self.name)
            if fn is not None:
...
                reader = FortranFileReader(
                    fn,
                    include_dirs=self.reader.include_dirs,
                    source_only=self.reader.source_only,
                )
                parser = FortranParser(reader)
                parser.parse()
                parser.block.a.module.update(modules)
                parser.analyze()

And then a few lines further down use this information to check the imported (and renamed) symbols with the symbols provided in the module that was just read.

This is all done in analyze, so disabling the analyze phase when calling fparser should help (though ... I just noticed that ithe code above is calling analyze for the just read module as well .... gee, that might read in half of LFRic ... if it would find the files 😁 )

arporter · 2023-01-12T13:09:25Z

Ah, if it's in one/xxx then once we move Rupert's new metadata handling into 'production' and drop fparser1 we should be OK?? (He says hopefully.)

hiker · 2023-01-12T13:31:36Z

I just had a look, and it appears that you are right - there seems to be no analysis step in two at all, and can't see it loading any modules (I didn't realise the caller was actually in one in the debugger, I saw only Statement.py). Dang, I could have used that time a lot better :)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fparser will potentially read many unrelated and unexpected files. #385

fparser will potentially read many unrelated and unexpected files. #385

hiker commented Jan 12, 2023

arporter commented Jan 12, 2023

hiker commented Jan 12, 2023

arporter commented Jan 12, 2023 •

edited

Loading

hiker commented Jan 12, 2023

fparser will potentially read many unrelated and unexpected files. #385

fparser will potentially read many unrelated and unexpected files. #385

Comments

hiker commented Jan 12, 2023

arporter commented Jan 12, 2023

hiker commented Jan 12, 2023

arporter commented Jan 12, 2023 • edited Loading

hiker commented Jan 12, 2023

arporter commented Jan 12, 2023 •

edited

Loading