Extract signature files from implementation file #16450

nojaf · 2023-12-18T13:14:01Z

nojaf
Dec 18, 2023
Collaborator

Continuing the conversation of #16436 (comment).

Short recap: in this experiment, I've extracted in-memory signature files from the implementation files.
Having those could potentially be very beneficial for graph-based type-checking.

One small example I did:

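sequential:

|Phase name                          |Elapsed |Duration| WS(MB)|  GC0  |  GC1  |  GC2  |Handles|Threads|
|------------------------------------|--------|--------|-------|-------|-------|-------|-------|-------|
|Typecheck                           |  2.1427|  1.5314|    492|      1|      0|      0|    488|     59|

graph-based (with in-memory signatures):

|Phase name                          |Elapsed |Duration| WS(MB)|  GC0  |  GC1  |  GC2  |Handles|Threads|
|------------------------------------|--------|--------|-------|-------|-------|-------|-------|-------|
|Typecheck                           |  1.1238|  0.5907|    502|      1|      0|      0|    491|     59|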

Treat this with caution, as the data is insufficient for definitive conclusions. However, it hints at the potential benefits when every file has a signature. Interestingly, this is achieved without any actual signature files on disk, thus avoiding redundancy.

</recap>

Replying to Vlad's fair critique:

> Subjectively, though, it feels very unnatural in F# to me, since HM explicitly doesn't require any types to be specified (only in some corner cases, when compiler has trouble inferring it).

Indeed, there's a mix of emotions regarding F#'s magic. Explicit top-level typing does seem suitable for large enterprise-level codebases. Typing everything at the top level somewhat resembles using signature files, which some already employ, for example in the compiler itself.

I'm not advocating for an immediate shift towards this approach. Even if it became standard overnight, there would be a significant effort required to update existing codebases with complete typings. Without a dedicated migration tool, I doubt it would gain widespread acceptance.

Conversely, this method would allow for parallel type-checking of every implementation file, subtly addressing the initial challenge of relocating certain checks to a post-inference stage, depending on how you look at it.

charlesroddie · 2023-12-19T12:05:33Z

charlesroddie
Dec 19, 2023

This is what I have been looking for for a long time without taking the time to articulate it, so thanks!

Signature files address the performance side but not readability, since they separate documentation (type annotations and xml docs) and implementations into different files, making it hard to read code since you have to read in two places simultaneously.

To me the approach of fully annotating objects in code would give the perfect combination of compiler performance and code readability.

0 replies

nojaf · 2023-12-21T09:56:27Z

nojaf
Dec 21, 2023
Collaborator Author

We've talked about this a bit more inside G-Research, and there does seem to be an interest in exploring this.

It would be great if it can be applied incrementally on a file-per-file level. Turning it on for an entire project might be overkill. The same rules more or less apply when considering a signature file, so not absolutely every file would be a good candidate to be fully top-level typed.
You probably want a signature file when:

- the file is larger than 1000 lines.
- the public surface area of a file is smaller than the private area.
- the file takes a long time to type-check.
- the file has many dependencies.
- the file is part of the longest chain in the graph.

In G-Research/fsharp-analyzers#47, I started playing around with finding untyped top-level functions, to get an initial idea of what changes a file would need before a signature file could be extracted from it.

Of course, knowing what is still missing doesn't put the file in the desired end state. Some automated process to convert the file would be ideal. As this involves multiple modifications, I don't think this is easy to pull off in a command-line tool. This might be more interesting to have as IDE actions. I might experiment with something in FSAC.

Afterwards, if your files were processed and are now fully typed, you still want to have some sort of awareness in your IDE. Warnings should be raised when a non-private function is missing a type annotation and the IDE should make it very visible whether a function is exposed or not.

I'll chronicle some findings in this discussion. Feel free to reach out to me if there is any interest in collaborating on this experiment.

2 replies

charlesroddie Dec 21, 2023

Great, so there are 3 aspects to this:

1. Analyzer to detect untyped top-level objects
2. Compiler improvements to avoid processing expressions beyond the explicit top-level information unless required
3. (Not essential but important for ease of use): tool to add the type information automatically.

@nojaf @vzarytovskii in the current state of the compiler, if 1. is done and objects are explicitly typed, would this already lead to faster compilation times? Intuitively the compiler would have less inference to do. And the benefits if any would come from annotation of inputs (generic parameters and inputs to functions and methods) rather than outputs?

nojaf Dec 21, 2023
Collaborator Author

No, the experiment would need to be merged in for faster compilation. And this would need to be hooked into the Transparent compiler for type-checking in the IDE to improve.

So, right now, just typing all your files won't make any difference.

The main benefit of this approach comes from more parallelization in the project and faster checking of dependent files.

Imagine a project with files A.fs, B.fs, C.fs, D.fs, E.fs. E depends on D, D on C and so on.
If everything is top-level typed, all files could be type-checked in parallel during compilation.

When you open file E.fs in the IDE, you would not need to wait until all files are type-checked. It would have the same effect as if all files had signatures: in that case, the editor tooling uses that signature information to type-check E.fs itself. Think of those signatures as the tl;dr version of each file.
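To make the A.fs to E.fs example concrete, here is a rough back-of-the-envelope model. All timings are invented, and the model assumes there are enough cores and that checking a signature is much cheaper than checking its implementation. It is only a sketch of the reasoning, not how the compiler actually schedules work.

```fsharp
// Hypothetical per-file costs in milliseconds; these are not measurements.
type FileCost = { Name: string; SigMs: int; ImplMs: int }

let files =
    [ { Name = "A.fs"; SigMs = 20; ImplMs = 400 }
      { Name = "B.fs"; SigMs = 20; ImplMs = 300 }
      { Name = "C.fs"; SigMs = 20; ImplMs = 500 }
      { Name = "D.fs"; SigMs = 20; ImplMs = 200 }
      { Name = "E.fs"; SigMs = 20; ImplMs = 600 } ]

// Without signatures, each file waits for the full check of its predecessor,
// so the chain is checked one file after another.
let sequentialMs = files |> List.sumBy (fun f -> f.ImplMs)

// With a signature per file, only the cheap signatures form a chain.
// Every implementation can then be checked independently, so with enough
// cores the wall-clock time is bounded by the slowest implementation.
let signatureChainMs = files |> List.sumBy (fun f -> f.SigMs)
let slowestImplMs = files |> List.map (fun f -> f.ImplMs) |> List.max
let parallelMs = signatureChainMs + slowestImplMs

printfn "sequential: %d ms, with signatures: at most %d ms" sequentialMs parallelMs
```

In this toy model the chain drops from 2000 ms to at most 700 ms, which is the same shape of improvement as in the measurements, though the real numbers depend on the project.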

nojaf · 2023-12-21T10:06:21Z

nojaf
Dec 21, 2023
Collaborator Author

One edge case I can already see is that constraints would need to be listed explicitly.
Consider:

let areEqual a b = a.Equals(b)

Add type annotations:

let areEqual (a:'a) (b:'b):bool = a.Equals(b)

would lead to the in-memory signature of

val areEqual: 'a -> 'b -> bool

But this doesn't work because 'a needs equality constraint:

  B.fsi(4, 15): [FS0340] The signature and implementation are not compatible because the declaration of the type parameter 'a' requires a constraint of the form 'a: equality

It needs to produce:

val areEqual<'a, 'b when 'a : equality>: 'a -> 'b -> bool

so the type annotation needs to be:

let areEqual<'a, 'b when 'a : equality> (a:'a) (b:'b):bool = a.Equals(b)

I think all the information is present to deduce this in the transformation step, however, it is a non-trivial situation to implement.
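A smaller sketch of the same kind of situation, using the `=` operator instead of `.Equals` (a substitution made here for illustration, with made-up function names, not the example above): the compiler quietly infers the equality constraint when the type parameter is only written inline, whereas an extracted signature would have to spell it out.

```fsharp
// No explicit type parameter list: the compiler infers 'a : equality
// from the use of (=) and adds it to the generalized type.
let areSame (a: 'a) (b: 'a) : bool = a = b

// The fully explicit form, matching what a signature would need to state:
// val areSameExplicit<'a when 'a : equality> : 'a -> 'a -> bool
let areSameExplicit<'a when 'a : equality> (a: 'a) (b: 'a) : bool = a = b

printfn "%b %b" (areSame 1 1) (areSameExplicit "x" "y")
```

Both functions end up with the same constrained type. The difference is only whether the constraint is visible in the source text, which is what a purely syntactic extraction step would have to work from.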

0 replies

nojaf · 2023-12-26T09:51:34Z

nojaf
Dec 26, 2023
Collaborator Author

Alright, I was able to annotate everything in Fantomas.Core.

I first detected every missing type information via an analyzer.
Then I was able to add in most of the types via a code fix in FsAutocomplete.

The nice thing here is that I was able to reuse the detection algorithm in both the analyzer and the code fix.

After adding some missing tweaks to the compiler experiment, I was able to compile the project:

Before (sequential):

--------------------------------------------------------------------------------------------------------
|Phase name                          |Elapsed |Duration| WS(MB)|  GC0  |  GC1  |  GC2  |Handles|Threads|
|------------------------------------|--------|--------|-------|-------|-------|-------|-------|-------|
|Import mscorlib+FSharp.Core         |  0.3465|  0.3309|    181|      0|      0|      0|    435|     43|
|Parse inputs                        |  0.6338|  0.2784|    302|      0|      0|      0|    495|     59|
|Import non-system references        |  0.6700|  0.0333|    334|      0|      0|      0|    499|     59|
|Typecheck                           |  4.3637|  3.6906|   1168|      2|      1|      1|    590|     79|
|Typechecked                         |  4.3685|  0.0019|   1168|      0|      0|      0|    590|     79|
|Write Interface File                |  4.3716|  0.0001|   1168|      0|      0|      0|    590|     79|
|Write XML doc signatures            |  4.3813|  0.0072|   1169|      0|      0|      0|    590|     79|
|Write XML docs                      |  4.3901|  0.0063|   1170|      0|      0|      0|    590|     79|
|Encode Interface Data               |  4.5232|  0.1303|   1221|      0|      0|      0|    591|     79|
|Optimizations                       |  5.0245|  0.4983|   1458|      0|      0|      0|    734|     79|
|Ending Optimizations                |  5.0276|  0.0001|   1458|      0|      0|      0|    734|     79|
|Encoding OptData                    |  5.0396|  0.0092|   1460|      0|      0|      0|    734|     79|
|TailCall Checks                     |  5.0611|  0.0186|   1464|      0|      0|      0|    734|     79|
|TAST -> IL                          |  6.1621|  1.0981|   1859|      0|      0|      0|    734|     79|
|>Write Started                      |  6.1830|  0.0097|   1861|      0|      0|      0|    736|     79|
|>Module Generation Preparation      |  6.1894|  0.0035|   1863|      0|      0|      0|    736|     79|
|>Module Generation Pass 1           |  6.2208|  0.0286|   1872|      0|      0|      0|    736|     79|
|>Module Generation Pass 2           |  6.5756|  0.3518|   2047|      0|      0|      0|    738|     79|
|>Module Generation Pass 3           |  6.5850|  0.0065|   2048|      0|      0|      0|    738|     79|
|>Module Generation Pass 4           |  6.5911|  0.0033|   2051|      0|      0|      0|    738|     79|
|>Finalize Module Generation Results |  6.5940|  0.0002|   2051|      0|      0|      0|    738|     79|
|>Generated Tables and Code          |  6.5999|  0.0033|   2051|      0|      0|      0|    738|     79|
|>Layout Header of Tables            |  6.6031|  0.0006|   2051|      0|      0|      0|    738|     79|
|>Build String/Blob Address Tables   |  6.6170|  0.0114|   2056|      0|      0|      0|    738|     79|
|>Sort Tables                        |  6.6201|  0.0003|   2056|      0|      0|      0|    738|     79|
|>Write Header of tablebuf           |  6.6317|  0.0089|   2058|      0|      0|      0|    738|     79|
|>Write Tables to tablebuf           |  6.6344|  0.0000|   2058|      0|      0|      0|    738|     79|
|>Layout Metadata                    |  6.6370|  0.0000|   2058|      0|      0|      0|    738|     79|
|>Write Metadata Header              |  6.6394|  0.0001|   2058|      0|      0|      0|    738|     79|
|>Write Metadata Tables              |  6.6420|  0.0001|   2058|      0|      0|      0|    738|     79|
|>Write Metadata Strings             |  6.6449|  0.0003|   2058|      0|      0|      0|    738|     79|
|>Write Metadata User Strings        |  6.6483|  0.0010|   2059|      0|      0|      0|    738|     79|
|>Write Blob Stream                  |  6.6521|  0.0013|   2060|      0|      0|      0|    738|     79|
|>Fixup Metadata                     |  6.6546|  0.0000|   2060|      0|      0|      0|    738|     79|
|>Generated IL and metadata          |  6.6716|  0.0146|   2062|      0|      0|      0|    738|     79|
|>PDB: Defined 22 documents          |  6.6752|  0.0009|   2062|      0|      0|      0|    738|     79|
|>PDB: Sorted 10449 methods          |  6.7140|  0.0362|   2074|      0|      0|      0|    738|     79|
|>PDB: Created                       |  6.7289|  0.0118|   2075|      0|      0|      0|    738|     79|
|>Layout image                       |  6.7349|  0.0031|   2075|      0|      0|      0|    738|     79|
|>Writing Image                      |  6.7399|  0.0023|   2075|      0|      0|      0|    738|     79|
|>Finalize PDB                       |  6.7430|  0.0004|   2075|      0|      0|      0|    738|     79|
|>Signing Image                      |  6.7456|  0.0001|   2075|      0|      0|      0|    737|     79|
|>Generate PDB Info                  |  6.7488|  0.0000|   2075|      0|      0|      0|    737|     79|
|Write .NET Binary                   |  6.7517|  0.5866|   2075|      0|      0|      0|    737|     79|
--------------------------------------------------------------------------------------------------------

After (extracted signatures for each file + graph based type-checking):

--------------------------------------------------------------------------------------------------------
|Phase name                          |Elapsed |Duration| WS(MB)|  GC0  |  GC1  |  GC2  |Handles|Threads|
|------------------------------------|--------|--------|-------|-------|-------|-------|-------|-------|
|Import mscorlib+FSharp.Core         |  0.3374|  0.3224|    180|      0|      0|      0|    434|     45|
|Parse inputs                        |  0.6275|  0.2814|    302|      0|      0|      0|    491|     61|
|Import non-system references        |  0.6644|  0.0338|    334|      0|      0|      0|    495|     61|
|Typecheck                           |  2.1096|  1.4423|   1339|      2|      1|      1|    747|     81|
|Typechecked                         |  2.1148|  0.0020|   1339|      0|      0|      0|    747|     81|
|Write Interface File                |  2.1177|  0.0001|   1339|      0|      0|      0|    747|     81|
|Write XML doc signatures            |  2.1271|  0.0067|   1340|      0|      0|      0|    747|     81|
|Write XML docs                      |  2.1350|  0.0052|   1340|      0|      0|      0|    747|     81|
|Encode Interface Data               |  2.2668|  0.1291|   1387|      0|      0|      0|    748|     81|
|Optimizations                       |  2.7794|  0.5098|   1618|      0|      0|      0|    891|     81|
|Ending Optimizations                |  2.7823|  0.0000|   1618|      0|      0|      0|    891|     81|
|Encoding OptData                    |  2.7947|  0.0094|   1620|      0|      0|      0|    891|     81|
|TailCall Checks                     |  2.8171|  0.0194|   1624|      0|      0|      0|    891|     81|
|TAST -> IL                          |  3.9544|  1.1345|   2022|      0|      0|      0|    894|     82|
|>Write Started                      |  3.9747|  0.0096|   2024|      0|      0|      0|    896|     82|
|>Module Generation Preparation      |  3.9810|  0.0034|   2025|      0|      0|      0|    896|     82|
|>Module Generation Pass 1           |  4.0132|  0.0293|   2036|      0|      0|      0|    896|     82|
|>Module Generation Pass 2           |  4.4350|  0.4188|   2068|      1|      1|      0|    594|     82|
|>Module Generation Pass 3           |  4.4437|  0.0058|   2068|      0|      0|      0|    594|     82|
|>Module Generation Pass 4           |  4.4489|  0.0025|   2069|      0|      0|      0|    594|     82|
|>Finalize Module Generation Results |  4.4517|  0.0002|   2069|      0|      0|      0|    594|     82|
|>Generated Tables and Code          |  4.4573|  0.0031|   2069|      0|      0|      0|    594|     82|
|>Layout Header of Tables            |  4.4605|  0.0006|   2069|      0|      0|      0|    594|     82|
|>Build String/Blob Address Tables   |  4.4734|  0.0104|   2070|      0|      0|      0|    594|     82|
|>Sort Tables                        |  4.4766|  0.0004|   2070|      0|      0|      0|    594|     82|
|>Write Header of tablebuf           |  4.4884|  0.0092|   2072|      0|      0|      0|    594|     82|
|>Write Tables to tablebuf           |  4.4912|  0.0000|   2072|      0|      0|      0|    594|     82|
|>Layout Metadata                    |  4.4937|  0.0000|   2072|      0|      0|      0|    594|     82|
|>Write Metadata Header              |  4.4963|  0.0001|   2072|      0|      0|      0|    594|     82|
|>Write Metadata Tables              |  4.4988|  0.0002|   2072|      0|      0|      0|    594|     82|
|>Write Metadata Strings             |  4.5014|  0.0004|   2072|      0|      0|      0|    594|     82|
|>Write Metadata User Strings        |  4.5047|  0.0010|   2074|      0|      0|      0|    594|     82|
|>Write Blob Stream                  |  4.5084|  0.0014|   2075|      0|      0|      0|    594|     82|
|>Fixup Metadata                     |  4.5108|  0.0000|   2058|      0|      0|      0|    594|     82|
|>Generated IL and metadata          |  4.5282|  0.0148|   2059|      0|      0|      0|    594|     82|
|>PDB: Defined 22 documents          |  4.5320|  0.0009|   2059|      0|      0|      0|    594|     82|
|>PDB: Sorted 10452 methods          |  4.5676|  0.0330|   2062|      0|      0|      0|    594|     82|
|>PDB: Created                       |  4.5828|  0.0123|   2063|      0|      0|      0|    594|     82|
|>Layout image                       |  4.5887|  0.0032|   2063|      0|      0|      0|    594|     82|
|>Writing Image                      |  4.5938|  0.0023|   2063|      0|      0|      0|    594|     82|
|>Finalize PDB                       |  4.5969|  0.0005|   2063|      0|      0|      0|    594|     82|
|>Signing Image                      |  4.5997|  0.0001|   2063|      0|      0|      0|    593|     82|
|>Generate PDB Info                  |  4.6022|  0.0000|   2063|      0|      0|      0|    593|     82|
|Write .NET Binary                   |  4.6047|  0.6474|   2063|      1|      1|      0|    593|     82|
--------------------------------------------------------------------------------------------------------

Typecheck went from 3.6906 to 1.4423 (about 61% less time).
Total compilation went from 7.3383 to 5.2521 (about 28% less time).
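As a quick check of these percentages, the reduction can be computed as (before - after) / before. The sketch below uses only figures from this comment, and the helper name `reduction` is made up for illustration. Note that the last Elapsed values in the two tables are 6.7517 and 4.6047; if those are taken as the totals instead, the same formula gives roughly 32%.

```fsharp
// Percentage of time saved going from `before` to `after`.
let reduction (before: float) (after: float) : float =
    (before - after) / before * 100.0

// Typecheck durations from the two tables.
let typecheck = reduction 3.6906 1.4423
// Totals as quoted in the comment.
let quotedTotal = reduction 7.3383 5.2521
// Alternative reading: last Elapsed value of each table.
let lastElapsed = reduction 6.7517 4.6047

printfn "Typecheck:    %.1f%% less time" typecheck
printfn "Quoted total: %.1f%% less time" quotedTotal
printfn "Last elapsed: %.1f%% less time" lastElapsed
```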

5 replies

vzarytovskii Dec 26, 2023
Maintainer

Those are nice numbers. @nojaf what do you have in mind for having it as a feature of the compiler?
My only fear is that we will start encouraging people to type everything explicitly (as I mentioned before, see quote in the OP).

We have been experimenting with caching whole inferred typecheck results in the intermediate build results (like we do with the generated parser and lexer in the compiler), which can (with effective hashing) be reused between builds and typechecks. It feels like a natural evolution for this feature, without requiring people to type everything explicitly.

Smaug123 Dec 26, 2023

(FWIW I think it's strongly preferable to type functions explicitly anyway, and I'm certainly not alone - just a couple of days ago the creator of Austral said the same thing in https://blog.lambdaclass.com/austral/ , search term "type inference". As evidence in favour, I submit the fact that Ionide by default inserts type annotations, which is a workaround for the fact that the code author did not bother :P personally as a reader I do not enjoy being forced to run a type unification algorithm in my head while I'm reading code, and I'd enable an analyser which forces top-level functions to be typed. I view this issue simply as "here is empirical evidence that the compiler can be adjusted so that this practice helps the compiler; I claim that it also helps the reader, and for the same reason".)

vzarytovskii Dec 27, 2023
Maintainer

> (FWIW I think it's strongly preferable to type functions explicitly anyway, and I'm certainly not alone - just a couple of days ago the creator of Austral said the same thing in https://blog.lambdaclass.com/austral/ , search term "type inference". As evidence in favour, I submit the fact that Ionide by default inserts type annotations, which is a workaround for the fact that the code author did not bother :P personally as a reader I do not enjoy being forced to run a type unification algorithm in my head while I'm reading code, and I'd enable an analyser which forces top-level functions to be typed. I view this issue simply as "here is empirical evidence that the compiler can be adjusted so that this practice helps the compiler; I claim that it also helps the reader, and for the same reason".)

I understand this pov, I just don't want users to think that they have to type everything explicitly to have better perf/understanding of the code. I do believe that both should be solvable by tooling, one way or another. After all, if I wanted to write types everywhere, I could've been just writing C# 😆

nojaf Dec 28, 2023
Collaborator Author

> what do you have in mind for having it as a feature of compiler?

I can't say I have it really figured out yet, but I see all of this as a natural evolution of --test:GraphBasedChecking. Type-checking can be done a lot more in parallel when signatures are present. Signature files really grew on me this past year, but alas, I haven't made any friends there.
I understand the frustrations people have with them.

There's a group of developers who wouldn't mind more types in top-level functions, much like what we see with signature files. This experiment aims to integrate them with the implementation files. Beyond the potential performance benefits, it could enhance readability, especially outside of an IDE. Consider web-based tools for PR reviews, where you only get the text without any IDE features - it can be tough to grasp what's happening at first glance.

The key takeaway is that some folks might find this appealing even without any additional tooling benefits. For projects that reach a certain level of complexity, adopting this approach could be a worthwhile decision.

> We have been experimenting with caching whole inferred typecheck results in the intermediate build results

Thanks for sharing, it's really interesting! Could you provide links to these experiments for more context? It would help strengthen the discussion. I'm still a bit sceptical about the incremental approach, especially if the first compilation remains slow. That being said, perhaps with some form of serialization, things could be jump-started.

vzarytovskii Dec 28, 2023
Maintainer

> There's a group of developers who wouldn't mind more types in top-level functions, much like what we see with signature files. This experiment aims to integrate them with the implementation files. Beyond the potential performance benefits, it could enhance readability, especially outside of an IDE. Consider web-based tools for PR reviews, where you only get the text without any IDE features - it can be tough to grasp what's happening at first glance.

Yeah, that's fair, I wish we had better online tooling (such as utilising LSIF and treesitter more, so we had real incentive to double down on them).

> > We have been experimenting with caching whole inferred typecheck results in the intermediate build results
>
> Thanks for sharing, it's really interesting! Could you provide links to these experiments for more context?

It's not fully ready yet, I will need to tinker with them more. Short summary: I cache typecheck and parse results in the obj folder of the project, with incremental hashes, which allow them to be (fully or partially) reused between builds/checks.

> It would help strengthen the discussion. I'm still a bit sceptical about the incremental approach, especially if the first compilation remains slow. That being said, perhaps with some form of serialization, things could be jump-started.

Personally, I found that it's not really a problem in most of the cases (slower first check).
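The caching idea summarised above (results stored with incremental hashes and reused between builds) could be pictured roughly as follows. This is a hypothetical sketch and not the maintainer's implementation: the key scheme, the helper names `hashText`, `cacheKey` and `checkWithCache`, and the in-memory dictionary standing in for the obj folder are all assumptions.

```fsharp
open System
open System.Collections.Generic
open System.Security.Cryptography
open System.Text

// Hash of arbitrary text, rendered as a hex string with dashes.
let hashText (text: string) : string =
    use sha = SHA256.Create()
    let bytes = Encoding.UTF8.GetBytes(text)
    BitConverter.ToString(sha.ComputeHash(bytes))

// Assumed key scheme: a file's own text combined with the keys of the files
// it depends on, so that a change invalidates everything downstream.
let cacheKey (source: string) (dependencyKeys: string list) : string =
    hashText (String.concat "\n" (source :: dependencyKeys))

// Stand-in for results persisted in the obj folder.
let cache = Dictionary<string, string>()

// Returns the result and whether it came from the cache.
let checkWithCache (key: string) (typeCheck: unit -> string) : string * bool =
    match cache.TryGetValue(key) with
    | true, cached -> cached, true
    | _ ->
        let result = typeCheck ()
        cache.[key] <- result
        result, false

let keyA = cacheKey "module A\nlet x = 1" []
let keyB = cacheKey "module B\nlet y = A.x" [ keyA ]
// Editing A.fs changes keyA, and therefore the key of B.fs as well.
let keyBAfterEdit = cacheKey "module B\nlet y = A.x" [ cacheKey "module A\nlet x = 2" [] ]

let _, firstHit = checkWithCache keyB (fun () -> "val y: int")
let _, secondHit = checkWithCache keyB (fun () -> "val y: int")
printfn "first check hit: %b, second check hit: %b" firstHit secondHit
```

The first check misses and pays the full cost, which is the "slower first check" mentioned above; later checks with unchanged inputs reuse the stored result.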

ScottArbeit · 2023-12-26T20:39:34Z

ScottArbeit
Dec 26, 2023

I'm regularly compiling around 24,000+ lines of F#, across 7 projects, in about 15s (for Grace). It takes that long if I make a change to the "Shared" project the others all rely on; it's much faster if it's one of the "leaf node" projects on the compilation tree.

I always love it when things go faster, like compilation in my favorite language. With that said, if going to the trouble of creating external signature files that I have to keep synced up with source code is only going to save me 2s per compilation (or whatever), I'll never do it.

It would take an awful lot of 2s to add up to the hours I'd spend on keeping those files synced, and the hours I'd spend debugging something when it'll come down to some weird issue involving the files being out of sync. I'm thinking of the pain it can be to generate and sync up OpenAPI specifications and imagining that this is analogous.

I love that this is an area for exploration, and I'm not trying to discourage you, it might lead somewhere awesome. I imagine a future where we have LLM's involved in our compilation steps, and I wouldn't object to using an LLM to generate the signature files automatically - only when it's detected that regenerating would be required - that then improved compilation time. But I'll never create those files by hand, and if there's a way to infer this data and cache it (as suggested above), yay! If not, meh, I won't miss it.

1 reply

charlesroddie Dec 26, 2023

@ScottArbeit the reference to "signature files" in the title is an implementation detail. There is no requirement to keep files synced here and possibly the "files" do not exist as files. The title should be something like "extract signatures from fully-typed implementation files". The example from the linked issue is let f (a: int) : int =. The compiler can then get the signatures without processing the implementation code.

nojaf · 2024-01-08T16:28:54Z

nojaf
Jan 8, 2024
Collaborator Author

I'm still exploring this experiment and the results are a bit mixed.

I've started by changing the compiler in my fork so that only marked files are being processed to extract signatures.
I don't believe it will be practical to have everything typed for every file in the project.
When a file takes less than 200 ms to type-check, I consider that to be fast enough. Files that take more than 500 ms are worth adding types to.

When selecting which files to start typing, I'm looking at the longest path in the graph. (Detected via this script). I found this to be an efficient way to speed up the graph while keeping things pragmatic. It is possible to win 500 ms to 2000 ms in the Typecheck phase but lose some of that gain in the later phases. I should do some more digging for good numbers.

Addressing type-checking still makes the most sense as it is timewise still the largest factor.
However, I'm not sure this is making a big enough dent in the total compilation.
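For readers who want to try the "longest path in the graph" selection on their own project, a minimal sketch follows. It is not the script referred to above; the dependency graph, the durations, and all names are invented for illustration.

```fsharp
// Hypothetical dependency graph: file -> files it depends on.
let dependencies : Map<string, string list> =
    Map.ofList
        [ "A.fs", []
          "B.fs", [ "A.fs" ]
          "C.fs", [ "A.fs" ]
          "D.fs", [ "B.fs"; "C.fs" ]
          "E.fs", [ "D.fs" ] ]

// Hypothetical type-check durations in milliseconds.
let durationMs : Map<string, int> =
    Map.ofList [ "A.fs", 150; "B.fs", 700; "C.fs", 100; "D.fs", 900; "E.fs", 250 ]

// Most expensive dependency chain ending at `file`, as (total ms, files in order).
// Plain recursion without memoization, which is fine for a small sketch.
let rec longestPathTo (file: string) : int * string list =
    let bestPrefix =
        dependencies.[file]
        |> List.map longestPathTo
        |> List.fold (fun best c -> if fst c > fst best then c else best) (0, [])
    fst bestPrefix + durationMs.[file], snd bestPrefix @ [ file ]

// The critical path is the most expensive chain over all files.
let criticalMs, criticalPath =
    dependencies
    |> Map.toList
    |> List.map (fun (file, _) -> longestPathTo file)
    |> List.maxBy fst

// Files on that path above the 500 ms threshold are the first candidates.
let candidates = criticalPath |> List.filter (fun f -> durationMs.[f] > 500)

printfn "%d ms: %s" criticalMs (String.concat " -> " criticalPath)
printfn "candidates: %s" (String.concat ", " candidates)
```

With these made-up numbers the critical path is A.fs -> B.fs -> D.fs -> E.fs at 2000 ms, and B.fs and D.fs are the files worth typing first.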

0 replies
