use row major when building attributes #307

Merged: 8 commits into EnzymeAD:main on Dec 1, 2024

Conversation

Pangoraw (Collaborator)

No description provided.

@mofeing (Collaborator) commented Nov 27, 2024

discussing this with @wsmoses in today's meeting, we reached the conclusion that the column- vs row-major layout is not forced by MLIR but is dialect-specific (e.g. a Julia MLIR dialect could still use column-major layout).

furthermore, I just found that column-major layouts can be represented by affine maps (https://mlir.llvm.org/docs/Dialects/Builtin/#affine-map-layout). do we know if affine maps can be used on tensors? the only examples I found are with memrefs. also, is StableHLO compatible with them?

I still believe this PR is useful, but I would suggest making the conversion from column- to row-major optional. maybe a kwarg that defaults to false?
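As a rough illustration of that suggestion, here is a hedged sketch; the helper name and the row_major keyword are assumptions for illustration, not an API from this PR:

# Hypothetical kwarg-gated conversion (assumed names): permute the data to
# row-major order only when asked, otherwise keep Julia's column-major layout.
maybe_row_major(x::AbstractArray; row_major::Bool=false) =
    row_major ? permutedims(x, ndims(x):-1:1) : x

maybe_row_major(reshape(1:6, 2, 3); row_major=true)  # returns the 3×2 permuted copy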

test/basic.jl: review comment (outdated, resolved)

@Pangoraw (Collaborator, Author)

we reached the conclusion that the column- vs row-major layout is not forced by MLIR but is dialect-specific

I agree, but here this is about building shaped builtin attributes through the C API. Row-major order is needed for the data and the shape to be interpreted consistently. A dialect is then free to reinterpret the attribute as it sees fit within its ops.

but I would suggest making the conversion from column- to row-major optional. maybe a kwarg that defaults to false?

I don't really see why one would not want to convert to row-major here. We can add it later when/if a potential Julia dialect needs transposed attributes.
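
For reference, a minimal sketch of what that row-major conversion does to Julia's column-major data (the example values are illustrative; the permutedims call mirrors the to_row_major helper added in this PR):

# Julia stores arrays column-major, so vec(A) walks down each column.
# Reversing the dimensions with permutedims makes the linearized data
# row-major, matching how the shaped attribute pairs the data with its shape.
A = [1 2 3;
     4 5 6]

vec(A)                              # column-major order: 1, 4, 2, 5, 3, 6
vec(permutedims(A, ndims(A):-1:1))  # row-major order:    1, 2, 3, 4, 5, 6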

@mofeing (Collaborator) commented Nov 29, 2024

I agree, but here this is about building shaped builtin attributes through the C API. Row-major order is needed for the data and the shape to be interpreted consistently. A dialect is then free to reinterpret the attribute as it sees fit within its ops.

i see your point but I'm not sure of all the consequences. how about we discuss it in the next meeting?

@wsmoses (Member) commented Nov 29, 2024

Yeah okay I can agree with that logic (since then the builtin attribute conversion is nice, which is distinct from the builtin op semantics).

I'm okay with this PR

@@ -492,6 +492,9 @@ function Base.fill(::Core.Type{Attribute}, value, shape)
     return Base.fill(value, shaped_type)
 end
+
+to_row_major(x) = permutedims(x, ndims(x):-1:1)
+to_row_major(x::AbstractVector) = x

Review comment (Member) on to_row_major:

From the error logs it looks like this also needs a 0-dim specialization
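
A minimal sketch of such a specialization; the zero-dimensional method below is an assumption about the fix, not necessarily the code that landed:

# Methods from this PR's diff:
to_row_major(x) = permutedims(x, ndims(x):-1:1)
to_row_major(x::AbstractVector) = x

# Hypothetical 0-dim method (assumption): a zero-dimensional array has nothing
# to permute, so it can be returned unchanged, like the vector case.
to_row_major(x::AbstractArray{T,0}) where {T} = x

to_row_major(fill(1.0))  # a 0-dim array passes straight through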

@mofeing (Collaborator) commented Nov 29, 2024

Yeah okay I can agree with that logic (since then the builtin attribute conversion is nice, which is distinct from the builtin op semantics).

I'm okay with this PR

do you know if affine maps can be associated with dense array / dense elements attributes?

@wsmoses (Member) commented Nov 29, 2024

Not attributes directly, but as part of a type

@mofeing (Collaborator) commented Nov 29, 2024

okay, and can they be associated with a tensor? the only examples I've seen are with memrefs

@mofeing (Collaborator) commented Nov 29, 2024

(since then the builtin attribute conversion is nice, which is distinct from the builtin op semantics)

yeah, but we pass the values to stablehlo.constant as an attribute. the only change I would like is to make this conversion optional (I don't mind if the default is true or false) so that we can choose whether to do the permutedims on the Julia side or on the MLIR side.

@wsmoses (Member) commented Nov 29, 2024

I don’t think tensors do, and also that would still be an issue here, since having a different type means you couldn’t use such a memref where a regular one is needed.

still cool to use, but I think this is necessary regardless

@Pangoraw (Collaborator, Author)

Note that the fix here affects not only stablehlo.constant but also other ops that have dense attributes, such as here:

padding = Reactant.MLIR.IR.DenseElementsAttribute(
    reshape(collect(padding), (num_spatial_dims, 2))
)

If we want to avoid doing the transpose in Julia, it should be possible to do it at the promotion level:

function promote_to(x)
    cst = stablehlo.constant(reshape(x, :))
    cst = stablehlo.reshape(cst, reverse(size(x)))
    cst = stablehlo.transpose(cst, ndims(x):-1:1)
    return cst
end

@wsmoses (Member) commented Nov 29, 2024

Note that the fix here affects not only stablehlo.constant but also other ops that have dense attributes, such as here:

padding = Reactant.MLIR.IR.DenseElementsAttribute(
    reshape(collect(padding), (num_spatial_dims, 2))
)

If we want to avoid doing the transpose in Julia, it should be possible to do it at the promotion level:

function promote_to(x)
    cst = stablehlo.constant(reshape(x, :))
    cst = stablehlo.reshape(cst, reverse(size(x)))
    cst = stablehlo.transpose(cst, ndims(x):-1:1)
    return cst
end

Yeah but frankly that’s the more intuitive thing anyways

@mofeing (Collaborator) left a comment

okay, then let's merge it like this and revisit in the future if we need it

@@ -109,7 +109,7 @@ function NNlib.conv!(
 #! format: on
 
 padding = Reactant.MLIR.IR.DenseElementsAttribute(
-    reshape(collect(padding), (num_spatial_dims, 2))
+    reshape(collect(padding), (2, num_spatial_dims))'

Review comment (Collaborator):

don't call '/adjoint because it will conjugate complex matrices

Suggested change
-    reshape(collect(padding), (2, num_spatial_dims))'
+    transpose(reshape(collect(padding), (2, num_spatial_dims)))
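
For reference, a small illustration (not from the PR) of the difference the reviewer points out:

# adjoint (') conjugates complex entries in addition to swapping the axes;
# transpose only swaps the axes. For real element types the two agree.
A = [1+2im  3+0im;
     4+0im  5im]

A'            # conjugate transpose: the (1,1) entry is 1 - 2im, the (2,2) entry is -5im
transpose(A)  # plain transpose: the (1,1) entry stays 1 + 2im, the (2,2) entry stays 5im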

Reply (Collaborator, Author):

I did not know about that. As you said, it does not apply here but I will be cautious in the future 👍

@@ -163,7 +163,7 @@ function reduce_window(f, x::AnyTracedRArray{T,N}, pdims; init) where {T,N}
 end
 
 padding = Reactant.MLIR.IR.DenseElementsAttribute(
-    reshape([padding..., 0, 0, 0, 0], (N, 2))
+    reshape([padding..., 0, 0, 0, 0], (2, N))'

Review comment (Collaborator):

Suggested change
-    reshape([padding..., 0, 0, 0, 0], (2, N))'
+    transpose(reshape([padding..., 0, 0, 0, 0], (2, N)))

@mofeing (Collaborator) left a comment

ahh, NNlib.padding just returns a tuple of ints... so it's alright if you call adjoint

@Pangoraw merged commit 5731c0b into EnzymeAD:main on Dec 1, 2024 (24 of 38 checks passed).
@Pangoraw deleted the row-major branch on Dec 1, 2024 at 09:32.