
Has anyone successfully built on Windows 10? #44

Open
wrench1997 opened this issue Jul 30, 2024 · 19 comments

Comments


wrench1997 commented Jul 30, 2024

I have been trying for a few days: swapping in CUDA 12.1 and cuDNN and rebuilding ninja from scratch, but Windows 10 still reports an error. The Windows compatibility is very poor.


wrench1997 commented Jul 30, 2024

I seem to have succeeded, though there were many problems. The key is the build.ninja parameters:
```ninja
ninja_required_version = 1.3
cxx = cl
nvcc = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc

cflags = -DTORCH_EXTENSION_NAME=slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\torch\csrc\api\include -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\TH -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IC:\ProgramData\Anaconda3\envs\py310torch\Include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /std:c++17 -DSLSTM_HIDDEN_SIZE=128 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=4 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__
post_cflags =
cuda_cflags = -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\torch\csrc\api\include -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\TH -IC:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IC:\ProgramData\Anaconda3\envs\py310torch\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17 -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=128 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=4 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__
cuda_post_cflags =
cuda_dlink_post_cflags =
ldflags = /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@yahxz torch.lib /LIBPATH:C:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\ProgramData\Anaconda3\envs\py310torch\libs "/LIBPATH:C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/lib" cublas.lib "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib\x64" cudart.lib

rule compile
command = cl /showIncludes $cflags -c $in /Fo$out $post_cflags
deps = msvc

rule cuda_compile
depfile = $out.d
deps = msvc
command = $nvcc --generate-dependencies-with-compile --dependency-output $out.d $cuda_cflags -c $in -o $out $cuda_post_cflags

rule link
command = "link.exe" $in /nologo $ldflags /out:$out

build slstm.o: compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm.cc
build slstm_forward.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_forward.cu
build slstm_backward.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_backward.cu
build slstm_backward_cut.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_backward_cut.cu
build slstm_pointwise.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_pointwise.cu
build blas.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\util\blas.cu
build cuda_error.cuda.o: cuda_compile C$:\ProgramData\Anaconda3\envs\py310torch\lib\site-packages\xlstm\blocks\slstm\src\util\cuda_error.cu

build slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0.pyd: link slstm.o slstm_forward.cuda.o slstm_backward.cuda.o slstm_backward_cut.cuda.o slstm_pointwise.cuda.o blas.cuda.o cuda_error.cuda.o

default slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0.pyd

```

@Adapter525

Have you successfully built on Windows 10? Did you succeed?


wrench1997 commented Aug 2, 2024

> Have you successfully built on Windows 10? Did you succeed?

of course
[screenshots of the successful build attached]

@vanclouds7

I still get an error even though I made every change you posted. Do you mind taking a look?

`build.ninja`:

```ninja
ninja_required_version = 1.3
cxx = cl
nvcc = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc

cflags = -DTORCH_EXTENSION_NAME=slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include\torch\csrc\api\include -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -ID:\Anaconda\envs\xlstm\include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /std:c++17 -DSLSTM_HIDDEN_SIZE=128 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=4 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__
post_cflags =
cuda_cflags = -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include\torch\csrc\api\include -ID:\Anaconda\envs\xlstm\Lib\site-packages\torch\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -ID:\Anaconda\envs\xlstm\include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17 -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=128 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=4 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__
cuda_post_cflags =
cuda_dlink_post_cflags =
ldflags = /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@yahxz torch.lib /LIBPATH:D:\Anaconda\envs\xlstm\Lib\site-packages\torch\lib /LIBPATH:D:\Anaconda\envs\xlstm\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\lib" cublas.lib "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\lib\x64" cudart.lib

rule compile
command = cl /showIncludes $cflags -c $in /Fo$out $post_cflags
deps = msvc

rule cuda_compile
depfile = $out.d
deps = msvc
command = $nvcc --generate-dependencies-with-compile --dependency-output $out.d $cuda_cflags -c $in -o $out $cuda_post_cflags

rule link
command = "link.exe" $in /nologo $ldflags /out:$out

build slstm.o: compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm.cc
build slstm_forward.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\site-packages\xlstm\blocks\slstm\src\cuda\slstm_forward.cu
build slstm_backward.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_backward.cu
build slstm_backward_cut.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_backward_cut.cu
build slstm_pointwise.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\cuda\slstm_pointwise.cu
build blas.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\util\blas.cu
build cuda_error.cuda.o: cuda_compile D$:\Anaconda3\envs\xlstm\lib\site-packages\xlstm\blocks\slstm\src\util\cuda_error.cu

build slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0.pyd: link slstm.o slstm_forward.cuda.o slstm_backward.cuda.o slstm_backward_cut.cuda.o slstm_pointwise.cuda.o blas.cuda.o cuda_error.cuda.o

default slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0.pyd
```

[screenshots of the error attached]


wrench1997 commented Aug 3, 2024

@vanclouds7 Please ensure that ninja, CUDA, and cuDNN are installed, and add the following directories to your INCLUDE and LIB environment variables (a sketch of setting them follows after the lists):
INCLUDE

D:\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include
C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\shared
C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\ucrt
C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\um
C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\winrt
LIB

D:\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\lib\x64
C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\um\x64
C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\ucrt\x64
C:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\Lib

You can run `ninja -v` to get more verbose output.
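
If it helps, here is a minimal sketch of prepending those directories before the extension is JIT-compiled. The MSVC and Windows SDK version numbers below are just the ones from my machine, so adjust them to whatever is installed on yours:

```python
# Prepend MSVC and Windows SDK paths to INCLUDE/LIB so that the cl.exe/link.exe
# invocations launched by ninja can find the headers and import libraries.
import os

msvc = r"D:\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133"
sdk_inc = r"C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0"
sdk_lib = r"C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0"

def _prepend(var, paths):
    existing = [p for p in os.environ.get(var, "").split(os.pathsep) if p]
    os.environ[var] = os.pathsep.join(paths + existing)

_prepend("INCLUDE", [rf"{msvc}\include", rf"{sdk_inc}\shared", rf"{sdk_inc}\ucrt",
                     rf"{sdk_inc}\um", rf"{sdk_inc}\winrt"])
_prepend("LIB", [rf"{msvc}\lib\x64", rf"{sdk_lib}\um\x64", rf"{sdk_lib}\ucrt\x64"])

import xlstm  # the sLSTM CUDA extension is JIT-compiled on first use
```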

@wrench1997
Copy link
Author

wrench1997 commented Aug 3, 2024

@vanclouds7 I forgot one thing (see the screenshot):

[screenshot attached]

Delete `"extra_ldflags": [f"-L{os.environ['CUDA_LIB']}", "-lcublas"],` from the load arguments. That is Linux syntax.
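
For reference, a hedged sketch of what a Windows-friendly replacement can look like; it mirrors the `/LIBPATH:` form suggested later in this thread, and assumes `CUDA_HOME`/`CUDA_PATH` points at your CUDA install:

```python
import os

# MSVC link.exe syntax instead of the GCC-style -L / -lcublas pair.
CUDA_HOME = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")

myargs = {
    # ... the other load() kwargs stay as they are ...
    "extra_ldflags": [f"/LIBPATH:{CUDA_HOME}/lib/x64", "cublas.lib"],
}
```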

@vanclouds7

@wrench1997
I believe I've done everything you asked, but there's still an error.
[screenshots of the error attached]

@wrench1997

@vanclouds7 Go to the slstm build directory and look at the error output from the build.ninja run there. Also, I noticed that you did not specify your CUDA version in the Python variable.


gutaihai commented Aug 6, 2024

@vanclouds7
Edit torch/utils/cpp_extension.py: in the function _write_ninja_file(), add a line at the end,
_maybe_write('build.ninja', content), like:

[screenshot attached]
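
If you would rather not edit the installed PyTorch file, a monkeypatch achieves the same effect. This is my sketch, assuming _write_ninja_file() takes the output path as its first argument (true in recent PyTorch versions):

```python
# Copy every generated .ninja file next to your script so you can run `ninja -v` on it.
import shutil
import torch.utils.cpp_extension as cpp_ext

_orig_write_ninja_file = cpp_ext._write_ninja_file

def _write_ninja_file_and_copy(path, *args, **kwargs):
    _orig_write_ninja_file(path, *args, **kwargs)
    shutil.copy(path, "build.ninja")  # drop a copy in the workspace root

cpp_ext._write_ninja_file = _write_ninja_file_and_copy
```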

Now the code will generate build.ninja in your workspace root. You can run `ninja -v` to see what is wrong.
In my case, I solved the problem by editing xlstm\blocks\slstm\src\cuda_init.py, load():
(torch dir)/include was missing while including ATen/ATen.h
(torch dir)/include/torch/csrc/api/include was missing while including torch/all.h

        # TORCH_HOME = os.path.abspath(torch.__file__).replace('\\__init__.py', '')
        # edit: add to `extra_cflags`
        f"-I{TORCH_HOME}/include",
        f"-I{TORCH_HOME}/include/torch/csrc/api/include",

(torch dir)/lib was missing while searching for c10.lib
-Xptxas -O3 raised an error, so I removed it from extra_cuda_cflags in myargs
As @wrench1997 said, f"-L{os.environ['CUDA_LIB']}", "-lcublas" doesn't work on Windows; replace it with
f"/LIBPATH:{CUDA_HOME}/lib/x64", "cublas.lib"

        # CUDA_HOME = os.environ.get('CUDA_HOME') or os.environ.get('CUDA_PATH')
        # edit: add to `extra_ldflags`
        f"/LIBPATH:{TORCH_HOME}/lib",
        f"/LIBPATH:{CUDA_HOME}/lib/x64", "cublas.lib",

Lastly, run pip uninstall xlstm to remove the installed xlstm package from your env, to make sure the edited files are the ones that actually get used.
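
Putting those pieces together, a hedged sketch of what the edited load() arguments in xlstm\blocks\slstm\src\cuda_init.py can end up looking like. The existing flags are abbreviated and the exact kwargs in your xlstm version may differ; this just builds the myargs dict that cuda_init.py passes on to torch.utils.cpp_extension.load():

```python
import os
import torch

# Derive the torch install dir and the CUDA install dir (Windows paths assumed).
TORCH_HOME = os.path.abspath(torch.__file__).replace('\\__init__.py', '')
CUDA_HOME = os.environ.get('CUDA_HOME') or os.environ.get('CUDA_PATH')

myargs = {
    "extra_cflags": [
        # ... existing flags ...
        f"-I{TORCH_HOME}/include",                         # for ATen/ATen.h
        f"-I{TORCH_HOME}/include/torch/csrc/api/include",  # for torch/all.h
    ],
    "extra_cuda_cflags": [
        # ... existing flags, with the "-Xptxas -O3" line removed ...
    ],
    "extra_ldflags": [
        f"/LIBPATH:{TORCH_HOME}/lib",                      # for c10.lib etc.
        f"/LIBPATH:{CUDA_HOME}/lib/x64", "cublas.lib",     # MSVC-style, replaces -L/-lcublas
    ],
}
# myargs is then forwarded to torch.utils.cpp_extension.load(..., **myargs)
# together with the module name and source list that cuda_init.py already builds.
```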

@Adapter525

Could you share your code and environment with me? Thank you a lot!


@Adapter525

Hello, could I add you on WeChat (Liz18326042653)? Thank you very much.


gutaihai commented Aug 6, 2024

@Adapter525 I just uploaded my solution, you can give it a try. Hope it works for you.

@kristinaste

> And add the include files and dynamic libraries:

Could you please indicate where to add these files and libraries? I feel like I am following all the instructions, but the build still fails.

@wrench1997

> > And add the include files and dynamic libraries:
>
> Could you please indicate where to add these files and libraries? I feel like I am following all the instructions, but the build still fails.

Hello, what problem are you seeing? Can you send me the error message?

@kristinaste

error_log.txt (attached)
[screenshots attached]

> Hello, what problem are you seeing? Can you send me the error message?

Hi, yes, I attached the error log, as well as the build.ninja file and the cuda_init.py file. The .ninja_log is not very informative.


gutaihai commented Sep 4, 2024

@kristinaste
I just tried xlstm 1.0.5 on Windows 11 and the compatibility has improved. Just two steps make it work:
Step 1: disable the line "-Xptxas -O3" in "extra_cuda_cflags";
Step 2: replace the content of "extra_ldflags" with f"/LIBPATH:{CUDA_HOME}/lib/x64", "cublas.lib", where CUDA_HOME is obtained via CUDA_HOME = os.environ.get('CUDA_HOME') or os.environ.get('CUDA_PATH').
[screenshot attached]
Good luck!
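
In code terms, a rough sketch of those two edits inside cuda_init.py; the flag list below is abbreviated and the exact contents depend on your xlstm version:

```python
import os

CUDA_HOME = os.environ.get('CUDA_HOME') or os.environ.get('CUDA_PATH')

extra_cuda_cflags = [
    "--use_fast_math",
    # "-Xptxas -O3",  # Step 1: disabled on Windows, it raises an error here
    # ... other existing flags ...
]

# Step 2: MSVC-style linker flags instead of the Linux -L/-lcublas pair.
extra_ldflags = [f"/LIBPATH:{CUDA_HOME}/lib/x64", "cublas.lib"]
```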


@wrench1997

> error_log.txt (attached) [screenshots attached]
>
> Hi, yes, I attached the error log, as well as the build.ninja file and the cuda_init.py file. The .ninja_log is not very informative.

Here are my relevant changes. Make sure both link.exe and cl.exe can be run directly from the command line.
build_copy_ninjia.txt (attached)
cpp_extension.txt (attached, see line 1876)
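
A quick way to check that prerequisite from the same shell/conda env you build in; every tool should resolve to a path rather than NOT FOUND:

```python
# Sanity check: these tools must all be on PATH for the ninja-driven build to work.
import shutil

for tool in ("cl", "link", "nvcc", "ninja"):
    print(f"{tool:6s} -> {shutil.which(tool) or 'NOT FOUND'}")
```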
