Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fparser will potentially read many unrelated and unexpected files. #385

Open
hiker opened this issue Jan 12, 2023 · 4 comments
Open

fparser will potentially read many unrelated and unexpected files. #385

hiker opened this issue Jan 12, 2023 · 4 comments

Comments

@hiker
Copy link
Collaborator

hiker commented Jan 12, 2023

When a use statement is encountered, fparser will try to find & parse the sources for this module:

readfortran.py:
================
    def find_module_source_file(self, mod_name):
...
            fn = None
            for d in self.include_dirs:
                fn = get_module_file(mod_name, d)
                if fn is not None:
                    return fn
...
utils.py:
=========
module_file_extensions = [".f", ".f90", ".f95", ".f03", ".f08"]
...
def get_module_file(name, directory, _cache={}):
    fn = _cache.get(name, None)
    if fn is not None:
        return fn
    if name.endswith("_module"):
        ...  Not used in psyclone
    files = []
    for ext in module_file_extensions:
        files += glob.glob(os.path.join(directory, "*" + ext))
    for fn in files:
        if module_in_file(name, fn):
            _cache[name] = fn
            return fn
    return None
...
def module_in_file(name, filename):
    name = name.lower()
    pattern = re.compile(r"\s*module\s+(?P<name>[a-z]\w*)", re.I).match
    encoding = {"encoding": "UTF-8"}
    f = io.open(filename, "r", **encoding)
    for line in f:
        m = pattern(line)
        if m and m.group("name").lower() == name:
            f.close()
            return filename
    f.close()

The constructor of FortranReaderBase sets . as include directory, and the constructor of FortranFileReader appends the directory of the parsed file if no explicit include directories are set:

        if include_dirs is None:
            self.include_dirs.insert(0, os.path.dirname(self.id))

The result of this is that whenever a use statement is encountered, fparser will read any .f, .f90,... file in the current directory (and the directory in which the parsed file is) and run a regexp search to find the module. In my example, I had >300 automatically created .f90 files in my current directory, resulting in a significant slowdown.

A quick look indicates that this seems to be used to verify import statements, e.g. make sure if item x is imported from a module, x is indeed exported from the module. See class Use(Statement), around line 1295 for details. This is part of the analyze phase of fparser, which had the interesting comment in fparser parsefortran.py:

        Attempts to analyse the parsed Fortran. It is not clear what for.

I didn't check what else besides import statements will be checked by that analyze phase.


Summary: the way fparser is used in PSyclone results in potentially reading many unexpected (and unrelated) files (all .f/.f90/.. files in the current directory), which can result in a slowdown. LFRic is luckily not badly affected at the moment: when parsing the algorithm layer, an include path is provided by PSyclone (which is the kernel search directory):
parse/utils.py:

        reader = FortranFileReader(filename, include_dirs=config.include_paths)

This will overwrite the default include search path in fparser (in LFRic this include path is the .../kernel/ directory).So parsing the .x90 files will only read .f90 files in the .../kernel directory, and all files that directory are .F90, so de-facto this will not read any additional files (as long as there are no .f90 kernel files) - the only overhead is some glob searches for non-existent files.

BUT, when parsing a kernel file, which PSyclone does, no include directory is set in parse/kernel.py:

def get_kernel_parse_tree
...
        parse_tree = fpapi.parse(filepath)

Therefore the defaults will be used, which is the directory in which the kernel is, and .. So for any kernel all .f/.f90/... files in .../kernel and . will be searched. Again, since all kernels in LFRic are .F90, no files are found, nothing else is read.

So we are mostly fine for now (till LFRic adds kernels with .f90 extension 😁, and/or the build system should have .f90 files in the current directory). Actually, why are the files called F90? As far as I can tell they do not contain preprocessor directives 😁

There is still parse/lfric_builtins.f90, which will be read (again) while searching for kernel.mod while parsing the same file. When building gungho, I had 1762 reads of that file (i.e. reading this file to find modules while this file is being parsed by PSyclone).

  1. I don't think not finding the module files does any harm ... .since atm already the modules files are not found anyway.
  2. Does the analyze phase actually contribute anything? Maybe we can make fparser a little bit faster by disabling this??
  3. We should plan for the fact that at some stage we might have source files in the current directory (i.e. consider non LFRic apps that might build in the source tree??), or lower case f90 files in the directory of the parsed files. Maybe add an option to disable searching for module files?? Or at least document this issue and make it more explicit to avoid surprises.
@arporter
Copy link
Member

Wow. Thanks very much for digging into this @hiker. Just to clarify, does fparser actually proceeds to parse any file returned by find_module_source_file?

@hiker
Copy link
Collaborator Author

hiker commented Jan 12, 2023

Yes, see one/statement.py, around line 1295:

            fn = self.reader.find_module_source_file(self.name)
            if fn is not None:
...
                reader = FortranFileReader(
                    fn,
                    include_dirs=self.reader.include_dirs,
                    source_only=self.reader.source_only,
                )
                parser = FortranParser(reader)
                parser.parse()
                parser.block.a.module.update(modules)
                parser.analyze()

And then a few lines further down use this information to check the imported (and renamed) symbols with the symbols provided in the module that was just read.

This is all done in analyze, so disabling the analyze phase when calling fparser should help (though ... I just noticed that ithe code above is calling analyze for the just read module as well .... gee, that might read in half of LFRic ... if it would find the files 😁 )

@arporter
Copy link
Member

arporter commented Jan 12, 2023

Ah, if it's in one/xxx then once we move Rupert's new metadata handling into 'production' and drop fparser1 we should be OK?? (He says hopefully.)

@hiker
Copy link
Collaborator Author

hiker commented Jan 12, 2023

I just had a look, and it appears that you are right - there seems to be no analysis step in two at all, and can't see it loading any modules (I didn't realise the caller was actually in one in the debugger, I saw only Statement.py). Dang, I could have used that time a lot better :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants