Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Naming of CSR/CSC-related things (LayoutCS and the CS record) #25819

Closed
bradcray opened this issue Aug 26, 2024 · 15 comments · Fixed by #26137
Closed

Naming of CSR/CSC-related things (LayoutCS and the CS record) #25819

bradcray opened this issue Aug 26, 2024 · 15 comments · Fixed by #26137

Comments

@bradcray
Copy link
Member

bradcray commented Aug 26, 2024

In past PRs, we've converted most of our standard user-facing distributions from classes to records, such that:

  • class Block defined in the BlockDist module became record blockDist
  • class Cyclic defined in the CyclicDist module became record cyclicDist

and so on… Notably missing in that effort were the layouts, particularly the CSR/CSC layout used for sparse computation as an alternative to the default COO layout. We simply didn't get to it due to lack of time. In #25812, I'm working on making up for that.

This issue asks what we should name the module and layout record going forward. Sparse is still an unstable feature in Chapel, so technically we don't need to stabilize these names now, but if we get them right now, it'll be fewer changes later as sparse becomes stable.

Currently, the module is named LayoutCS and the class is simply named CS (with arguments, such as compressRows=true or compressRows=false to select between CSR or CSC).

Q1: What should we name the module?

We could retain the current name, but I was noticing that it swaps the order of the Distribution/Layout classification compared to the distributions, so might be more consistent to take this opportunity to change it to CSLayout or CsLayout or CompressedSparseLayout or … This also _may_help with transitioning code between the old and the new, though that shouldn't be a primary consideration.

Q2: What should we name the record?

If we simply changed the current name to a style guide appropriate name, we could just change it to cs. If we followed the approach of the distributions, we could name it csLayout or compressedSparseLayout or layoutCS or whatever the de-capitalized version of the module name is.

Q2a: Or, should we not expose the record name directly, only through aliases?

Users in the field will be most familiar with CSR/CSC as concepts, not some sort of cs(rows=true) initializer like the one we have now. So we could also imagine not worrying about the name of the record and just coming up with names for the aliases to the record, like:

  • 'csr', 'csrLayout', 'CSR' (technically a style guide violation)
  • 'csc', 'cscLayout', 'CSC' (" " ")

[I was originally thinking of this as a nicety to put on top of the record, so not necessary to discuss from the start, but Engin pointed out below that we don't have to tell the users what the record name is, which is a great point].

@bradcray bradcray changed the title Naming of CSR/CSC-related things Naming of CSR/CSC-related things (LayoutCS and the CS record) Aug 26, 2024
@jeremiah-corrado
Copy link
Contributor

My inclination is to make things as symmetric as possible with the naming scheme used for the distributions. So, I'd be in favor of changing to CsLayout for the module name and csLayout for the type name.

Note that I like CSLayout better than CsLayout for the module name and would be okay with making an exception to the standard module style guide for that case (like we did for locale.numPUs, as noted here)

@e-kayrakli
Copy link
Contributor

e-kayrakli commented Aug 26, 2024

I came here to push my own agenda :p In general, I am not a big fan of "CS" or "Compressed Sparse" as standalone terms in the interface because I don't see them used like that elsewhere. Not that there are many languages with layouts, but still.

Q1: What should we name the module?

My ideal choice is to have SparseLayouts. We have distribution-specific modules, and my suggestion diverges from that. That being said, layouts are not technically limited to sparse ones (a col-major layout would be a non-sparse alternative). Moreover, there could be a lot of similarities between sparse layouts which could make implementation easier if they are on the same module. Finally, I can imagine different sparse layouts being used in conjunction with one another, whereas the same story is not that common for distributions.

This avoids using "CS" etc in the name as I mentioned above.

But if this is more than intended at the moment, then, I am for CsLayout > CompressedSparseLayout (EDIT I like CompressedSparseLayout better now. #25819 (comment)) . CSLayout is wrong per our style guide. We renamed GPUDiagnostics to GpuDiagnostics, for example. (Typed this part before seeing @jeremiah-corrado's comment)

Q2: What should we name the record?

I want to hide the record behind type aliases if possible. So, I would choose any name for the record. Then would have

type csrLayout = something(compressedRow=true, ?);  // I can't recall if there are other fields
type cscLayout = something(compressedRow=false, ?); // ? may be unnecessary

as the user-facing interface for types.

As implied by my name choices above; if this is too big of a change at this moment, I would go for csLayout.

@bradcray
Copy link
Member Author

Note that I like CSLayout better than CsLayout for the module name and would be okay with making an exception to the standard module style guide for that case

I tend to agree. I've been wondering whether the standard module style guide should simply be updated to permit an identifier that starts with an acronym to use all-caps for that acronym and the next word so that this wouldn't be an exception. Basically extend the "is the entire symbol name" exception to "is the first part, or entire, symbol name" (similarly, it seems like the numPUs exception could be made non-exceptional about "if the natural spelling of the 'word' contains a mix of upper- and lower-case letters, that mix can be preserved").

@bradcray
Copy link
Member Author

My ideal choice is to have SparseLayouts.

Does that suggest that all future (standard) sparse layouts would need to be added to this module? (e.g., say I wanted to add a tridiagonal/k-diagonal layout of sparse layout that stored dense 2D blocks). I worry that would be pretty non-scalable in terms of code maintenance and compile times.

there could be a lot of similarities between sparse layouts which could make implementation easier if they are on the same module.

The similar code could also be refactored into a helper module without the user being any the wiser though (and as you probably know, this already happens with our COO and CSR/CSC sparse layouts… though in some cases I worry it goes a little too far).

I want to hide the record behind type aliases if possible.

Yeah, I want to do that as well (though I was thinking about going full to csr/csc), though that felt like it was orthogonal to the critical questions here, so I didn't bring it up. But I see that your point is that it's not critical if we choose not to tell teh user about the record type. Let me edit the OP to indicate that as a possibility now that you've made it one for me.

@e-kayrakli
Copy link
Contributor

I've been wondering whether the standard module style guide should simply be updated to permit an identifier that starts with an acronym to use all-caps for that acronym and the next word so that this wouldn't be an exception. Basically extend the "is the entire symbol name" exception to "is the first part, or entire, symbol name" (similarly, it seems like the numPUs exception could be made non-exceptional about "if the natural spelling of the 'word' contains a mix of upper- and lower-case letters, that mix can be preserved").

I was involved in the exact same design discussion and voted alongside what you're thinking. However, majority was very clearly in the other direction. In my recollection, CSR/CSC didn't came up in that discussion heavily, but HTML was used as a similar example. Argument for not adopting the style you refer to was when you have something like CSRLayout, it is not very clear how many words/abbreviations there are and where they begin as CSRL could be seen as jumble of characters. I tend to agree that that's a good argument, and come to like the decision we made in that discussion. I believe there was relatively equal amount of weight on both sides of the design spectrum when it comes to other languages' style guides. So, where we are is not really bizarre at all. With that background, I am for sticking to the current style guide. IOW, I prefer CsLayout over CSLayout.

Does that suggest that all future (standard) sparse layouts would need to be added to this module? (e.g., say I wanted to add a tridiagonal/k-diagonal layout of sparse layout that stored dense 2D blocks). I worry that would be pretty non-scalable in terms of code maintenance and compile times.

I can see that it could be problematic maintenance-wise, but I don't think we'll add that many standard sparse layouts ever. Those could come in package/mason modules in my view. I was mostly thinking about BCSR/BCSC versions of these layouts.

I am not too sure about the impact on compilation times, which would be more worrisome. It feels like most of the unnecessary stuff will be dropped by scope resolution. I am not very confident in this assessment.

The similar code could also be refactored into a helper module without the user being any the wiser though (and as you probably know, this already happens with our COO and CSR/CSC sparse layouts… though in some cases I worry it goes a little too far).

That is true. Without delving too much into specifics, I agree that this is mostly an implementation detail rather than an interface design.

@e-kayrakli
Copy link
Contributor

Thinking about it some more -- I am liking CompressedSparseLayout more and more, if do not go with SparseLayout. It is obvious now, but they are almost the same thing while the former can cover for BCSR etc cases as I'd like while leaving Brad's tridiagonal layouts in their own modules. Implying much less worrisome code scalability challenges.

I like the name spelled out as CsLayout is not at all clear as per my statement about "cs" not being used as standalone terms anywhere.

@ShreyasKhandekar
Copy link
Contributor

My initial thoughts here align with what Engin suggested. I liked SparseLayout the most since it is smaller than CompressedSparseLayout but I also think the latter is more descriptive of what kind of layouts we have in the module.

Now, I also think that SparseLayout might be too broad of a name because it might suggest that BSR/DIAgonal/DictionaryOfKeys and all the other sparse layouts also belong in this module. I would restrict the module to just CSC/CSR and add other implementations elsewhere if we want to. But that's implementation details, not relevant.

Moreover, cs(rows=true) seems a roundabout way making a CSR layout (especially because CSR is a common term used to describe this) and that's even more true for cs(rows=false) for CSC.

I would like to go with the suggestion in 2a in the OP to not expose the name of the CS record and only use type aliases which mention the words csr/csc.

@jeremiah-corrado
Copy link
Contributor

I would like to go with the suggestion in 2a in the OP

I'd also be in favor of this option.

@bradcray
Copy link
Member Author

bradcray commented Sep 4, 2024

@jeremiah-corrado , @e-kayrakli , @ShreyasKhandekar : Belatedly, thanks for the quick responses here, and what feels like a potential consensus path forward. Here's what I'm currently implementing, where I'm looking for feedback along the lines of:

  • Could we make the following change / adjustment: …
  • This looks good for 2.2!
  • I don't think we should live with this question for more time before we proceed (i.e., don't make any changes here for 2.2)
  1. I've added a new CompressedSparseLayout module to replace the LayoutCS module

  2. It contains a no-doc'd record csLayout that is the closest equivalent to the old CS class and used to implement the user-facing layouts. I originally named this compressedSparseLayout, but that name got old fast and since it isn't user-facing, I took the shorter name

  3. It also contains type aliases, csrLayout and cscLayout which are implemented in terms of csLayout.

  4. The LayoutCS module is retained, but generates a deprecation message when used:
    'LayoutCS' and the 'CS' layout are deprecated; please use 'CompressedSparseLayout' and 'csrLayout' or 'cscLayout' instead

  5. The CS class is retained, but generates a deprecation message when used:
    'CS' is deprecated, please use 'CompressedSparseLayout.[csrLayout|cscLayout]' instead

I've got enough testing passing with this new approach that I feel pretty confident that I could plumb it through all remaining tests; but if we were to rename any of the new things, that'd be good to settle before doing that full conversion.

Thanks!

@jeremiah-corrado
Copy link
Contributor

IMO, this looks good for 2.2!

@e-kayrakli
Copy link
Contributor

This looks good for 2.2!

@ShreyasKhandekar
Copy link
Contributor

Just so I understand the user facing side of this change properly and to also make sure we see it to make sure it looks right. Making this change would have users writing code that looks like the following:

use CompressedSparseLayout;
var parentDom = {1..100,1..100};
var cscSparseDom: sparse subdomain(parentDom) dmapped new cscLayout(somethingGoesHere?); 
var csrSparseDom: sparse subdomain(parentDom) dmapped new csrLayout(somethingGoesHere?); 

OR, do we still have to use the dmap type with these layouts?

If it's the first, I'm for making these changes for 2.2!

@bradcray
Copy link
Member Author

bradcray commented Sep 4, 2024

@ShreyasKhandekar : You're correct. Ultimately, dmapped will be replaced with… whatever we decided to replace it with (I can never recall), but this will be the syntactic pattern. No arguments are required where you have somethingGoesHere?, but there is an optional argument that lets you specify whether you want to keep the indices in a given row/col sorted or not (some CSR/CSC implementations do, others don't).

@bradcray
Copy link
Member Author

Though I didn't land the CSR/CSC renaming and recordization effort in time for 2.2 (so close…), I did manage to wrap it up now that release notes are out and so am now looking for a reviewer for #26137 where I think someone from the crew on this issue (@jeremiah-corrado , @e-kayrakli , @ShreyasKhandekar ) would be best-suited to review it.

Note that the size of the patch is large, but that much of the code in CompressedSparseLayout.chpl is a properly indented clone of LayoutCS.chpl and doesn't need to be (re-)reviewed at this point. I suspect being a little clever with diff commands would permit you to focus on the relevant parts (which should mostly be the record types).

Size: 4/7 if we take the code clone into account, more like 6/7 otherwise
Complexity: 3/7
Pride: 5/7

@ShreyasKhandekar
Copy link
Contributor

I can take a look soon, unless somebody beats me to it :)

bradcray added a commit that referenced this issue Oct 31, 2024
…nd rename `LayoutCS` module (#26137)

[reviewed by @ShreyasKhandekar ]

As part of our ongoing conversion of domain maps from classes to
records, this PR changes the `CS` class that has been used to represent
sparse CSR/CSC domains and arrays into a pair of records, `csrLayout`
and `cscLayout`. Simultaneously, it renames the `LayoutCS.chpl` module
to `CompressedSparseLayout.chpl` to (a) move `Layout` from a prefix to a
suffix, similar to the other domain maps and (b) make the name a bit
more self-descriptive (since 'CS' isn't generally recognized as being
the prefix of CSR/CSC). This approach also made it easy to generate
reasonable deprecation warnings for the old approach by essentially
starting from a code clone of the LayoutCS module and attaching
`@deprecated` attributes to its module and class declarations.

Despite these improvements, I've left the new module and record type as
`@unstable` since sparse as a whole is unstable and the routines and
identifiers used here haven't been reviewed in much detail (though the
record names were OK'd by an ad hoc design subteam who wrestled with the
question).

With this change, I believe `DefaultDist()` is the only class-based
domain map remaining and the only thing standing in the way of removing
the `dmap` type (?).

Most of the diff here is updating module and test code from `dmapped new
dmap(new CS())` to one of `dmapped new csrLayout()` or `dmapped new
cscLayout()`, preserving the `sortedIndices` argument when used.

For the implementation of the record itself, I took a similar approach
to the records defining blockDist, cyclicDist, etc. by preserving the
old classes and simply wrapping them in the record, combined with using
a few helper records—one designed as a means of providing common
capabilities between the csrLayout/cscLayout record types and,
hopefully, `DefaultDist()` once it's converted to a record; the other
(`csLayout`) providing common code between csr/cscLayout since they're
basically just transposes of one another.

As a result, CompressedSparseLayout.chpl is essentially:
* a code clone of LayoutCS.chpl
* promoted the module from a `prototype` to a true module by adding
`throws` decorators to the `dsiSerialWrite()` calls
* added better documentation for the module relative to what we had
before
* fixed the module's indentation
* renamed `LayoutCSDefaultToSorted` to `csLayoutSortByDefault` to better
math the module name
* updated the `isCSType()` query to work with the new records
* changed the `CS` class into a `CSImpl` class that the records wrap
* removed its default for `compressRows` in the process because it's
unnecessary and could just mask mistakes
* added a `chpl_layoutHelper` record that is similar to the
`chpl_distHelp` record used for the Block, Cyclic, etc. conversions,
whose goal is to factor things that used to be inherited from `BaseDist`
into a common record that could be used by multiple types; currently
it's only this one, but maybe `DefaultDist` will be able to make use of
it as well? Or maybe it's overkill / putting the cart before the horse.
* added the `csLayout` record, which simply defines initializers and
`==` / `!=` operators
* added the `csrLayout` / `cscLayout` records, which simply define
initializers
* added a `dsiGetDist()` call to convert a class back into a
distribution record value

Taking stock of other changes required:

* modules/layouts/LayoutCS.chpl: marked the module and `CS` class as
deprecated
* modules/Makefile: added new entry to ensure that the new module is
documented
* dists/BlockDist.chpl, dists/SparseBlockDist.chpl,
packages/LinearAlgebra.chpl: updated to use the new module and type
names
* test/*.chpl:  For all tests, I essentially converted:
  * uses/imports of `LayoutCS` into `CompressedSparseLayout`
* `dmap(new CS())` and `dmap(new CS(compressRows=true))` into
`csrLayout()`
  * `dmap(new CS(compressRows=false))` into `cscLayout()`
* in cases like the above that passed `sortIndices`, I preserved those
arguments
  * references to `unmanaged CS` into `csrLayout`
* cases that chose between the two using a param variable into a
conditional that chooses between the types
* cases that have to support both `cs*Layout` and `DefaultDist` to wrap
the latter in a dmap+unmanaged and the former not
  * cases that extended the `CS` class to extend `CSImpl` instead
* test/*.compopts: For cases that passed in `CS` as a `config type`, I
changed to `csrLayout`

In terms of new tests, I:
* added a test of the deprecation messages in
`test/deprecated/testLayoutCS.chpl`
* added a test to make sure the new types worked as expected w.r.t.
using parens or not in `test/sparse/CS/cscTypes.chpl` and
`test/sparse/CS/csrTypes.chpl` (where I thought that this wasn't working
at some point in the past during this effort)

And I added some futures for things that surprised me or seemed wrong as
I was developing all of the above (some of which were resolved before
this was even merged):
* #26142: `test/types/records/generic/dispatchWithinParamLoop.chpl`
demonstrates that resolution within a param loop over a bool range is
not working as expected (and
`test/types/records/generic/dispatchWithinIntParamLoop.chpl` as a
workaround that works as expected)
* #26140
`test/types/records/generic/typeAliasFullyDefaultedGeneric.chpl`: shows
that declaring a type alias of two complete types is treated as a
generic type, incorrectly I would argue
* #26141:
`test/types/records/generic/typeAliasFullyDefaultedGeneric2.chpl`: shows
that printing out a generic type doesn't seem to give an intuitive
result

Resolves #25819
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
@lydia-duncan @e-kayrakli @bradcray @ShreyasKhandekar @jeremiah-corrado and others