Unable to trace SDXL VAE decoder with a different dimension #34

Open
neo opened this issue Sep 8, 2023 · 6 comments

neo commented Sep 8, 2023

In torch-neuronx/inference/hf_pretrained_sdxl_1024_inference.ipynb, I tried changing the latent shape from [1, 4, 128, 128] to [1, 4, 104, 152] (an 832x1216 image instead of 1024x1024), but it didn't work. More specifically, I was able to trace the UNet and post_quant_conv with that shape, but not the decoder.
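For reference, the SDXL VAE works on latents that are 8x smaller than the image in each spatial dimension (in diffusers the pipeline computes `vae_scale_factor = 2 ** (len(vae.config.block_out_channels) - 1)`, which is 8 for SDXL), so [1, 4, 104, 152] is the latent for an 832x1216 image, matching the `sdxl_compile_dir_832x1216` directory in the compiler log. A minimal sketch of that mapping, assuming a scale factor of 8 and 4 latent channels; `latent_shape` is a hypothetical helper, not part of the notebook:

```python
def latent_shape(height: int, width: int, batch: int = 1,
                 channels: int = 4, scale: int = 8) -> tuple:
    """Return the (N, C, H, W) latent shape for a height x width image.

    Assumption: the SDXL VAE downsamples each spatial dimension by `scale`
    (8) and uses `channels` (4) latent channels.
    """
    if height % scale or width % scale:
        raise ValueError(f"{height}x{width} is not divisible by {scale}")
    return (batch, channels, height // scale, width // scale)


print(latent_shape(1024, 1024))  # the notebook's default shape
print(latent_shape(832, 1216))   # the shape tried in this issue
```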

Here's the error I got:

2023-09-08T21:17:33Z Too many instructions after unroll for function sg0000 !
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File <timed exec>:10

File /opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuronx/xla_impl/trace.py:323, in trace(func, example_inputs, states, input_output_aliases, compiler_workdir, compiler_args, options)
    320     compiler_workdir = context.name
    322 with context:
--> 323     neff_filename, metaneff, flattener, packer = _trace(
    324         func,
    325         example_inputs,
    326         states,
    327         input_output_aliases,
    328         compiler_workdir,
    329         compiler_args,
    330         options,
    331     )
    332     return create_neuron_model(
    333         neff_filename,
    334         metaneff,
   (...)
    338         input_output_aliases,
    339     )

File /opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuronx/xla_impl/trace.py:416, in _trace(func, example_inputs, states, input_output_aliases, compiler_workdir, compiler_args, options)
    413     handle.write(hlo.SerializeToString())
    415 # Compile HLO to NEFF
--> 416 neff_filename = hlo_compile(model_dir, compiler_workdir, compiler_args)
    418 metaneff = hlo_metaneff(hlo, input_parameter_names, updated_input_output_aliases)
    420 return neff_filename, metaneff.SerializeToString(), flattener, packer

File /opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuronx/xla_impl/trace.py:281, in hlo_compile(filename, compiler_workdir, compiler_args)
    274     elif status == -11:
    275         logger.warning(
    276             "The neuronx-cc (neuron compiler) crashed (SEGFAULT). "
    277             "This is likely due to a bug in the compiler.  "
    278             "Please lodge an issue at 'https://github.com/aws/aws-neuron-sdk/issues'"
    279         )
--> 281     raise RuntimeError(f"neuronx-cc failed with {status}")
    283 return neff_filename

RuntimeError: neuronx-cc failed with 70
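
The traceback shows that `hlo_compile` turns a nonzero `neuronx-cc` exit status into this `RuntimeError`, and singles out `-11` as a segfault. That follows the Python `subprocess` convention that a negative return code `-N` means the child was killed by signal `N` (POSIX only), whereas a positive value such as `70` is an exit code chosen by the compiler itself; here the compiler exited on its own right after reporting "Too many instructions after unroll", so this does not look like a crash. A small sketch for telling the two cases apart; `describe_status` is a hypothetical helper, not part of `torch_neuronx`:

```python
import signal


def describe_status(status: int) -> str:
    """Describe a subprocess return code using POSIX conventions."""
    if status < 0:
        # subprocess reports "terminated by signal N" as return code -N.
        try:
            name = signal.Signals(-status).name
        except ValueError:
            name = f"signal {-status}"
        return f"killed by {name}"
    # A non-negative code is whatever the program passed to exit().
    return f"exited with status {status}"


print(describe_status(-11))  # killed by SIGSEGV
print(describe_status(70))   # exited with status 70
```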

And here is the output printed before the error:
2023-09-08T21:17:23Z Running DoNothing
2023-09-08T21:17:23Z DoNothing finished after 0.000 seconds
2023-09-08T21:17:23Z Running CanonicalizeIR
2023-09-08T21:17:23Z CanonicalizeIR finished after 0.018 seconds
2023-09-08T21:17:23Z Running ExpandBatchNorm
2023-09-08T21:17:23Z ExpandBatchNorm finished after 0.057 seconds
2023-09-08T21:17:23Z Running ResolveComplicatePredicates
2023-09-08T21:17:23Z ResolveComplicatePredicates finished after 0.017 seconds
2023-09-08T21:17:23Z Running AffinePredicateResolution
2023-09-08T21:17:23Z AffinePredicateResolution finished after 0.019 seconds
2023-09-08T21:17:23Z Running EliminateDivs
2023-09-08T21:17:23Z EliminateDivs finished after 0.018 seconds
2023-09-08T21:17:23Z Running PerfectLoopNest
2023-09-08T21:17:23Z PerfectLoopNest finished after 0.016 seconds
2023-09-08T21:17:23Z Running Simplifier
2023-09-08T21:17:24Z Simplifier finished after 0.223 seconds
2023-09-08T21:17:24Z Running GenericAccessSimplifier
2023-09-08T21:17:24Z GenericAccessSimplifier finished after 0.015 seconds
2023-09-08T21:17:24Z Running TCTransform
2023-09-08T21:17:24Z TCTransform finished after 0.027 seconds
2023-09-08T21:17:24Z Running CommuteConcat
2023-09-08T21:17:24Z CommuteConcat finished after 0.016 seconds
2023-09-08T21:17:24Z Running TensorOpFusion
2023-09-08T21:17:24Z TensorOpFusion finished after 0.018 seconds
2023-09-08T21:17:24Z Running TensorOpTransform
2023-09-08T21:17:24Z TensorOpTransform finished after 0.060 seconds
2023-09-08T21:17:24Z Running LowerTensorOp
2023-09-08T21:17:24Z LowerTensorOp finished after 0.017 seconds
2023-09-08T21:17:24Z Running MemcpyElimination
2023-09-08T21:17:25Z MemcpyElimination finished after 1.058 seconds
2023-09-08T21:17:25Z Running LoopFusion
2023-09-08T21:17:26Z LoopFusion finished after 1.182 seconds
2023-09-08T21:17:26Z Running Simplifier
2023-09-08T21:17:26Z Simplifier finished after 0.112 seconds
2023-09-08T21:17:26Z Running Delinearization
2023-09-08T21:17:26Z Delinearization finished after 0.052 seconds
2023-09-08T21:17:26Z Running DeadStoreElimination
2023-09-08T21:17:28Z DeadStoreElimination finished after 1.288 seconds
2023-09-08T21:17:28Z Running Simplifier
2023-09-08T21:17:28Z Simplifier finished after 0.116 seconds
2023-09-08T21:17:28Z Running LICM
2023-09-08T21:17:28Z LICM finished after 0.064 seconds
2023-09-08T21:17:28Z Running Delinearization
2023-09-08T21:17:28Z Delinearization finished after 0.019 seconds
2023-09-08T21:17:28Z Running LoopFusion
2023-09-08T21:17:28Z LoopFusion finished after 0.224 seconds
2023-09-08T21:17:28Z Running SimplifySlice
2023-09-08T21:17:28Z SimplifySlice finished after 0.007 seconds
2023-09-08T21:17:28Z Running LICM
2023-09-08T21:17:28Z LICM finished after 0.019 seconds
2023-09-08T21:17:28Z Running Simplifier
2023-09-08T21:17:28Z Simplifier finished after 0.114 seconds
2023-09-08T21:17:28Z Running ValueNumbering
2023-09-08T21:17:28Z ValueNumbering finished after 0.036 seconds
2023-09-08T21:17:28Z Running LICM
2023-09-08T21:17:28Z LICM finished after 0.018 seconds
2023-09-08T21:17:28Z Running PadElimination
2023-09-08T21:17:28Z PadElimination finished after 0.001 seconds
2023-09-08T21:17:28Z Running Delinearization
2023-09-08T21:17:28Z Delinearization finished after 0.058 seconds
2023-09-08T21:17:28Z Running LoopFusion
2023-09-08T21:17:29Z LoopFusion finished after 0.218 seconds
2023-09-08T21:17:29Z Running GenericAccessSimplifier
2023-09-08T21:17:29Z GenericAccessSimplifier finished after 0.007 seconds
2023-09-08T21:17:29Z Running Simplifier
2023-09-08T21:17:29Z Simplifier finished after 0.111 seconds
2023-09-08T21:17:29Z Running LICM
2023-09-08T21:17:29Z LICM finished after 0.018 seconds
2023-09-08T21:17:29Z Running ValueNumbering
2023-09-08T21:17:29Z ValueNumbering finished after 0.024 seconds
2023-09-08T21:17:29Z Running TCTransform
2023-09-08T21:17:29Z TCTransform finished after 0.010 seconds
2023-09-08T21:17:29Z Running CommuteConcat
2023-09-08T21:17:29Z CommuteConcat finished after 0.008 seconds
2023-09-08T21:17:29Z Running RecognizeOpIdiom
2023-09-08T21:17:29Z RecognizeOpIdiom finished after 0.047 seconds
2023-09-08T21:17:29Z Running MaskPropagation
2023-09-08T21:17:29Z MaskPropagation finished after 0.023 seconds
2023-09-08T21:17:29Z Running Recompute
2023-09-08T21:17:29Z Recompute finished after 0.001 seconds
2023-09-08T21:17:29Z Running DeadCodeElimination
2023-09-08T21:17:29Z DeadCodeElimination finished after 0.008 seconds
2023-09-08T21:17:29Z Running DoNothing
2023-09-08T21:17:29Z DoNothing finished after 0.000 seconds
2023-09-08T21:17:29Z Running MutateDataType
2023-09-08T21:17:29Z MutateDataType finished after 0.006 seconds
2023-09-08T21:17:29Z Running AutoCastTCInputs
2023-09-08T21:17:29Z AutoCastTCInputs finished after 0.015 seconds
2023-09-08T21:17:29Z Running GenericAccessSimplifier
2023-09-08T21:17:29Z GenericAccessSimplifier finished after 0.009 seconds
2023-09-08T21:17:29Z Running Simplifier
2023-09-08T21:17:29Z Simplifier finished after 0.114 seconds
2023-09-08T21:17:29Z Running LegalizeCCOpLayout
2023-09-08T21:17:29Z LegalizeCCOpLayout finished after 0.008 seconds
2023-09-08T21:17:29Z Running DelinearIndices
2023-09-08T21:17:29Z DelinearIndices finished after 0.018 seconds
2023-09-08T21:17:29Z Running Delinearization
2023-09-08T21:17:29Z Delinearization finished after 0.017 seconds
2023-09-08T21:17:29Z Running DelinearIndices
2023-09-08T21:17:29Z DelinearIndices finished after 0.018 seconds
2023-09-08T21:17:29Z Running DeadCodeElimination
2023-09-08T21:17:29Z DeadCodeElimination finished after 0.008 seconds
2023-09-08T21:17:29Z Running InferIntrinsicOnCC
2023-09-08T21:17:29Z InferIntrinsicOnCC finished after 0.099 seconds
2023-09-08T21:17:29Z Running ResolveAccessConflict
2023-09-08T21:17:29Z ResolveAccessConflict finished after 0.065 seconds
2023-09-08T21:17:29Z Running LICM
2023-09-08T21:17:29Z LICM finished after 0.056 seconds
2023-09-08T21:17:29Z Running LocalLayoutOpt
2023-09-08T21:17:29Z LocalLayoutOpt finished after 0.053 seconds
2023-09-08T21:17:29Z Running DelinearIndices
2023-09-08T21:17:29Z DelinearIndices finished after 0.019 seconds
2023-09-08T21:17:29Z Running OrigLayoutTilingPipeline
2023-09-08T21:17:29Z Running GlobalLayoutOpt
2023-09-08T21:17:31Z GlobalLayoutOpt finished after 1.704 seconds
2023-09-08T21:17:31Z Running CanonicalizeDAG
2023-09-08T21:17:31Z CanonicalizeDAG finished after 0.082 seconds
2023-09-08T21:17:31Z Running FlattenAxesForTiling
2023-09-08T21:17:31Z FlattenAxesForTiling finished after 0.075 seconds
2023-09-08T21:17:31Z Running SundaSizeTiling
2023-09-08T21:17:33Z SundaSizeTiling finished after 1.930 seconds
2023-09-08T21:17:33Z OrigLayoutTilingPipeline finished after 3.809 seconds
2023-09-08T21:17:33Z Running TilingProfiler
2023-09-08T21:17:33Z TilingProfiler finished after 0.094 seconds
2023-09-08T21:17:33Z 
2023-09-08T21:17:33Z Diagnostic information:
2023-09-08T21:17:33Z   NeuronX Compiler version 2.9.0.40+07376825f
2023-09-08T21:17:33Z   
2023-09-08T21:17:33Z   Python version 3.8.10
2023-09-08T21:17:33Z   HWM version 2.9.0.2-f79d59e7b
2023-09-08T21:17:33Z   NumPy version 1.21.6
2023-09-08T21:17:33Z   
2023-09-08T21:17:33Z   Running on AMI ami-0d08bfe808787640a
2023-09-08T21:17:33Z   Running in region use1-az5
2023-09-08T21:17:33Z 
2023-09-08T21:17:33Z Diagnostic logs stored in /home/ubuntu/log-neuron-cc.txt

Lastly, the contents of log-neuron-cc.txt:
2023-09-08T21:17:22Z INFO 238269 [root]: /opt/aws_neuron_venv_pytorch/bin/neuronx-cc compile sdxl_compile_dir_832x1216/vae_decoder/model --framework XLA --target trn1 --output sdxl_compile_dir_832x1216/vae_decoder/graph.neff
2023-09-08T21:17:22Z INFO 238334 [root]: TVM/Relay detected
2023-09-08T21:17:22Z INFO 238334 [root]: Pipeline: Frontend HHChecker WalrusDriver BIRLinker Kelper
2023-09-08T21:17:22Z INFO 238334 [root]: Intermediate files stored in /home/ubuntu/neuronxcc-5l2tcm31, output in /home/ubuntu
2023-09-08T21:17:22Z INFO 238334 [pipeline.Pipeline.0]: Job Pipeline len(in_states) 1
2023-09-08T21:17:22Z INFO 238334 [pipeline.Pipeline.0]: Processing input #0
2023-09-08T21:17:22Z INFO 238334 [pipeline.Pipeline.0]: Running pipeline Pipeline.0
2023-09-08T21:17:22Z INFO 238334 [pipeline.Pipeline.0]: Starting job job.Frontend.0
2023-09-08T21:17:22Z INFO 238334 [job.Frontend.0]: Job Frontend len(in_states) 1
2023-09-08T21:17:22Z INFO 238334 [job.Frontend.0]: Processing input #0
2023-09-08T21:17:22Z INFO 238334 [job.Frontend.0]: Start model loading
2023-09-08T21:17:22Z INFO 238334 [job.Frontend.0]: IR signature: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 for model
2023-09-08T21:17:22Z INFO 238334 [job.Frontend.0]: Executing: /opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/neuronxcc/starfish/bin/hlo2penguin --input /home/ubuntu/sdxl_compile_dir_832x1216/vae_decoder/model --out-dir ./ --output penguin.py --layers-per-module=1 --coalesce-all-gathers=false --coalesce-reduce-scatters=false --coalesce-all-reduces=false --emit-tensor-level-dropout-ops --emit-tensor-level-rng-ops
2023-09-08T21:17:22Z INFO 238334 [job.Frontend.0]: 
Histogram before graph level optimizations:
total HLO instructions: 1614
           broadcast       452  28.00% ################################################################
             reshape       364  22.55% ###################################################
            constant       294  18.22% #########################################
            multiply       167  10.35% #######################
                 add       113   7.00% ################
           transpose        57   3.53% ########
         convolution        35   2.17% ####
 batch-norm-training        30   1.86% ####
   get-tuple-element        30   1.86% ####
                tanh        29   1.80% ####
              divide        16   0.99% ##
                call        15   0.93% ##
                 dot         6   0.37% 
              reduce         2   0.12% 
         exponential         1   0.06% 
           parameter         1   0.06% 
            subtract         1   0.06% 
               tuple         1   0.06% 

INFO: IoStatistics: total inputs: 1
INFO: IoStatistics: total outputs: 1
INFO: IoStatistics: total passthrough tensors: 0
INFO: IoStatistics: total outputs read from: 0
INFO: IoStatistics: total redundant outputs: 0
Replaced 0 dropout sequences with OffloadedDropout
INFO: HloMacCount has found 5025528358400
INFO: Traffic has found 12393472
INFO: AIF 810996.04

Histogram after graph level optimizations:
total HLO instructions: 758
            constant       143  18.87% ################################################################
            multiply       118  15.57% ####################################################
                 add       113  14.91% ##################################################
           broadcast       110  14.51% #################################################
             reshape        73   9.63% ################################
           transpose        49   6.46% #####################
         convolution        35   4.62% ###############
 batch-norm-training        30   3.96% #############
   get-tuple-element        30   3.96% #############
                tanh        29   3.83% ############
         custom-call        15   1.98% ######
                 dot         6   0.79% ##
              reduce         2   0.26% 
         exponential         1   0.13% 
           parameter         1   0.13% 
              divide         1   0.13% 
            subtract         1   0.13% 
               tuple         1   0.13% 

HLO Ops used in computation: add batch-norm-training broadcast constant convolution custom-call divide dot exponential get-tuple-element multiply parameter reduce reshape subtract tanh transpose tuple 
Invoking RemoveOptimizationBarriers pass
Invoking NeuronInstCombine pass.
Total SqrtMul sequences deleted = 0

2023-09-08T21:17:22Z INFO 238334 [job.Frontend.0]: Start tensorization
2023-09-08T21:17:22Z WARNING 238334 [job.Frontend.0]: TVM not detected.
2023-09-08T21:17:23Z INFO 238334 [job.Frontend.0]: Num parallel jobs: 1
2023-09-08T21:17:23Z INFO 238334 [root/Tensorizer/All]: Enter time region
2023-09-08T21:17:23Z INFO 238334 [Tensorizer]: Frontend found a single CU. Switching to flat flow.
2023-09-08T21:17:23Z INFO 238334 [Tensorizer]: Building model from Penguin script "penguin.py"...
2023-09-08T21:17:23Z INFO 238334 [Tensorizer]: Tensorizer options: --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=matmult-bf16 --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --sunda-batchnorm --enable-tritium-loopfusion --keep-remat-dma-transpose --enable-softmax-kernel
2023-09-08T21:17:23Z INFO 238334 [Tensorizer]: Building model from Penguin script "penguin.py"...
2023-09-08T21:17:23Z INFO 238334 [Tensorizer]: Successfully built model.
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/DoNothing]: Running DoNothing
2023-09-08T21:17:23Z INFO 238334 [DoNothing]: Finished (changed=True)
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR
2023-09-08T21:17:23Z INFO 238334 [CanonicalizeIR]: Finished (changed=True)
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.018 seconds
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm
2023-09-08T21:17:23Z INFO 238334 [ExpandBatchNorm]: Finished (changed=True)
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.057 seconds
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates
2023-09-08T21:17:23Z INFO 238334 [ResolveComplicatePredicates]: Finished (changed=False)
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.017 seconds
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution
2023-09-08T21:17:23Z INFO 238334 [AffinePredicateResolution]: Finished (changed=False)
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.019 seconds
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs
2023-09-08T21:17:23Z INFO 238334 [EliminateDivs]: Finished (changed=False)
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.018 seconds
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest
2023-09-08T21:17:23Z INFO 238334 [PerfectLoopNest]: Finished (changed=False)
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.016 seconds
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2023-09-08T21:17:24Z INFO 238334 [Simplifier]: Finished (changed=True)
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.223 seconds
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier
2023-09-08T21:17:24Z INFO 238334 [GenericAccessSimplifier]: Finished (changed=False)
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.015 seconds
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/TCTransform]: Running TCTransform
2023-09-08T21:17:24Z INFO 238334 [TCTransform]: Finished (changed=True)
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.027 seconds
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat
2023-09-08T21:17:24Z INFO 238334 [CommuteConcat]: Finished (changed=False)
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.016 seconds
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/TensorOpFusion]: Running TensorOpFusion
2023-09-08T21:17:24Z INFO 238334 [TensorOpFusion]: Finished (changed=True)
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/TensorOpFusion]: TensorOpFusion finished after 0.018 seconds
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/TensorOpTransform]: Running TensorOpTransform
2023-09-08T21:17:24Z INFO 238334 [TensorOpTransform]: Finished (changed=True)
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.060 seconds
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/LowerTensorOp]: Running LowerTensorOp
2023-09-08T21:17:24Z INFO 238334 [LowerTensorOp]: Finished (changed=True)
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.017 seconds
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/MemcpyElimination]: Running MemcpyElimination
2023-09-08T21:17:25Z INFO 238334 [MemcpyElimination]: Finished (changed=True)
2023-09-08T21:17:25Z USER 238334 [sg0000/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 1.058 seconds
2023-09-08T21:17:25Z USER 238334 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion
2023-09-08T21:17:26Z INFO 238334 [LoopFusion]: Finished (changed=True)
2023-09-08T21:17:26Z USER 238334 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 1.182 seconds
2023-09-08T21:17:26Z USER 238334 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2023-09-08T21:17:26Z INFO 238334 [Simplifier]: Finished (changed=False)
2023-09-08T21:17:26Z USER 238334 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.112 seconds
2023-09-08T21:17:26Z USER 238334 [sg0000/Tensorizer/Delinearization]: Running Delinearization
2023-09-08T21:17:26Z INFO 238334 [Delinearization]: Finished (changed=True)
2023-09-08T21:17:26Z USER 238334 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.052 seconds
2023-09-08T21:17:26Z USER 238334 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination
2023-09-08T21:17:28Z INFO 238334 [DeadStoreElimination]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 1.288 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2023-09-08T21:17:28Z INFO 238334 [Simplifier]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.116 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LICM]: Running LICM
2023-09-08T21:17:28Z INFO 238334 [LICM]: Finished (changed=True)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LICM]: LICM finished after 0.064 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Delinearization]: Running Delinearization
2023-09-08T21:17:28Z INFO 238334 [Delinearization]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.019 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion
2023-09-08T21:17:28Z INFO 238334 [LoopFusion]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.224 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/SimplifySlice]: Running SimplifySlice
2023-09-08T21:17:28Z INFO 238334 [SimplifySlice]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.007 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LICM]: Running LICM
2023-09-08T21:17:28Z INFO 238334 [LICM]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LICM]: LICM finished after 0.019 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2023-09-08T21:17:28Z INFO 238334 [Simplifier]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.114 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering
2023-09-08T21:17:28Z INFO 238334 [ValueNumbering]: Finished (changed=True)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.036 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LICM]: Running LICM
2023-09-08T21:17:28Z INFO 238334 [LICM]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LICM]: LICM finished after 0.018 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/PadElimination]: Running PadElimination
2023-09-08T21:17:28Z INFO 238334 [PadElimination]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/PadElimination]: PadElimination finished after 0.001 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Delinearization]: Running Delinearization
2023-09-08T21:17:28Z INFO 238334 [Delinearization]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.058 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion
2023-09-08T21:17:29Z INFO 238334 [LoopFusion]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.218 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier
2023-09-08T21:17:29Z INFO 238334 [GenericAccessSimplifier]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.007 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2023-09-08T21:17:29Z INFO 238334 [Simplifier]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.111 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LICM]: Running LICM
2023-09-08T21:17:29Z INFO 238334 [LICM]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LICM]: LICM finished after 0.018 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering
2023-09-08T21:17:29Z INFO 238334 [ValueNumbering]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.024 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/TCTransform]: Running TCTransform
2023-09-08T21:17:29Z INFO 238334 [TCTransform]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.010 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat
2023-09-08T21:17:29Z INFO 238334 [CommuteConcat]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.008 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom
2023-09-08T21:17:29Z INFO 238334 [RecognizeOpIdiom]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.047 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation
2023-09-08T21:17:29Z INFO 238334 [MaskPropagation]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.023 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Recompute]: Running Recompute
2023-09-08T21:17:29Z INFO 238334 [Recompute]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Recompute]: Recompute finished after 0.001 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination
2023-09-08T21:17:29Z INFO 238334 [DeadCodeElimination]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.008 seconds
2023-09-08T21:17:29Z INFO 238334 [Tensorizer]: After optimization: 138 statements
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DoNothing]: Running DoNothing
2023-09-08T21:17:29Z INFO 238334 [DoNothing]: Finished (changed=True)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/MutateDataType]: Running MutateDataType
2023-09-08T21:17:29Z INFO 238334 [MutateDataType]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/MutateDataType]: MutateDataType finished after 0.006 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/AutoCastTCInputs]: Running AutoCastTCInputs
2023-09-08T21:17:29Z INFO 238334 [AutoCastTCInputs]: Finished (changed=True)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/AutoCastTCInputs]: AutoCastTCInputs finished after 0.015 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier
2023-09-08T21:17:29Z INFO 238334 [GenericAccessSimplifier]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.009 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2023-09-08T21:17:29Z INFO 238334 [Simplifier]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.114 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout
2023-09-08T21:17:29Z INFO 238334 [LegalizeCCOpLayout]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.008 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices
2023-09-08T21:17:29Z INFO 238334 [DelinearIndices]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.018 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Delinearization]: Running Delinearization
2023-09-08T21:17:29Z INFO 238334 [Delinearization]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.017 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices
2023-09-08T21:17:29Z INFO 238334 [DelinearIndices]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.018 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination
2023-09-08T21:17:29Z INFO 238334 [DeadCodeElimination]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.008 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC
2023-09-08T21:17:29Z INFO 238334 [InferIntrinsicOnCC]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.099 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict
2023-09-08T21:17:29Z INFO 238334 [ResolveAccessConflict]: Finished (changed=True)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.065 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LICM]: Running LICM
2023-09-08T21:17:29Z INFO 238334 [LICM]: Finished (changed=True)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LICM]: LICM finished after 0.056 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt
2023-09-08T21:17:29Z INFO 238334 [LocalLayoutOpt]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.053 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices
2023-09-08T21:17:29Z INFO 238334 [DelinearIndices]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.019 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/OrigLayoutTilingPipeline]: Running OrigLayoutTilingPipeline
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/GlobalLayoutOpt]: Running GlobalLayoutOpt
2023-09-08T21:17:31Z INFO 238334 [GlobalLayoutOpt]: Finished (changed=True)
2023-09-08T21:17:31Z USER 238334 [sg0000/Tensorizer/GlobalLayoutOpt]: GlobalLayoutOpt finished after 1.704 seconds
2023-09-08T21:17:31Z USER 238334 [sg0000/Tensorizer/CanonicalizeDAG]: Running CanonicalizeDAG
2023-09-08T21:17:31Z INFO 238334 [CanonicalizeDAG]: Finished (changed=True)
2023-09-08T21:17:31Z USER 238334 [sg0000/Tensorizer/CanonicalizeDAG]: CanonicalizeDAG finished after 0.082 seconds
2023-09-08T21:17:31Z USER 238334 [sg0000/Tensorizer/FlattenAxesForTiling]: Running FlattenAxesForTiling
2023-09-08T21:17:31Z INFO 238334 [FlattenAxesForTiling]: Finished (changed=True)
2023-09-08T21:17:31Z USER 238334 [sg0000/Tensorizer/FlattenAxesForTiling]: FlattenAxesForTiling finished after 0.075 seconds
2023-09-08T21:17:31Z USER 238334 [sg0000/Tensorizer/SundaSizeTiling]: Running SundaSizeTiling
@jeffhataws
Contributor

Hi @neo , thank you for raising the issue. We are aware of compilation issues with different input shapes and are working to fix them in an upcoming release.
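For reference, here is a minimal sketch of what the shape change from the original report means for the VAE decoder. `decoder_output_shape` is a hypothetical helper, not part of torch-neuronx or diffusers. SDXL's VAE has four `block_out_channels`, and diffusers derives the spatial scale factor from that as 2 ** (4 - 1) = 8. The decoder is therefore the traced component that emits the full-resolution image.

```python
# Hypothetical helper (not a torch-neuronx or diffusers API).
# SDXL's VAE config has block_out_channels = [128, 256, 512, 512]; diffusers
# derives the spatial scale factor as 2 ** (len(block_out_channels) - 1) = 8.
VAE_SCALE_FACTOR = 2 ** (len([128, 256, 512, 512]) - 1)


def decoder_output_shape(latent_shape, scale=VAE_SCALE_FACTOR, out_channels=3):
    """Map a latent shape [N, 4, H, W] to the decoded image shape [N, 3, H*8, W*8]."""
    n, c, h, w = latent_shape
    if c != 4:
        raise ValueError(f"SDXL latents have 4 channels, got {c}")
    return [n, out_channels, h * scale, w * scale]


# Compare the shape from the notebook with the one that failed to trace.
for latents in ([1, 4, 128, 128], [1, 4, 104, 152]):
    image = decoder_output_shape(latents)
    print(latents, "->", image, "| latent positions:", latents[2] * latents[3])
```

The [1, 4, 128, 128] input decodes to a 1024x1024 image, and [1, 4, 104, 152] decodes to 832x1216. The non-square latent actually has slightly fewer positions (15808 vs 16384), so total size alone does not obviously account for the "Too many instructions after unroll" error.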

@neo
Author

neo commented Sep 11, 2023

Not sure how related this is, but I also tried the same thing with stabilityai/stable-diffusion-xl-refiner-1.0, and it consistently breaks while tracing the unet (almost immediately after starting, before the log .txt file is even created). There was no error message; the kernel just died...

@neo
Author

neo commented Sep 11, 2023

And while I have you, a semi-related question: I saw that the previous SD samples all include a step that also compiles the text encoder, but this SDXL example doesn't. Is that considered no longer needed, or is it just not included because we haven't gotten there yet?

@aws-mvaria

Apologies for the late reply to your latest question: it was not included because we hadn't gotten there yet. That said, in an upcoming release we'll be tracing it in our samples for improved performance.

@neo
Author

neo commented Nov 20, 2023

Thank you for the response!

I chatted with one of the Neuron team members earlier and got the suggestion to use optimum-neuron from Hugging Face, which traces every component and has been working quite well for us 😊

However, it would still be great to see the samples do this at a lower level so people can learn what's going on under the hood.

@neo
Author

neo commented Nov 28, 2023

Could I also ask that, when making the sample for the SDXL text encoder, you provide examples covering attention_mask, output_hidden_states, and return_dict? They are all required and set explicitly by compel: https://github.com/damian0815/compel/blob/v2.0.2/src/compel/embeddings_provider.py#L390-L393

Thanks!
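Below is a minimal sketch of one way a traced text encoder could be exposed to callers like compel that pass `attention_mask`, `output_hidden_states`, and `return_dict`. Every detail here is an assumption, not something from the samples or torch-neuronx. `TracedTextEncoderAdapter` is a hypothetical name. The traced graph is assumed to take `(input_ids, attention_mask)` and return a flat tuple `(last_hidden_state, pooler_output, *hidden_states)`, which is `CLIPTextModel`'s layout when called with `output_hidden_states=True, return_dict=False`. A real drop-in replacement would likely also need whatever other encoder attributes compel reads, and this sketch does not provide them.

```python
from types import SimpleNamespace


class TracedTextEncoderAdapter:
    """Hypothetical wrapper giving a traced text encoder an HF-style call signature.

    Assumes the traced callable takes (input_ids, attention_mask) positionally and
    returns a flat tuple (last_hidden_state, pooler_output, *hidden_states).
    """

    def __init__(self, traced_fn, num_hidden_states):
        self.traced_fn = traced_fn
        self.num_hidden_states = num_hidden_states

    def __call__(self, input_ids, attention_mask=None,
                 output_hidden_states=False, return_dict=True):
        # The traced graph has a fixed signature, so the mask must always be passed.
        if attention_mask is None:
            raise ValueError("this traced encoder expects an attention_mask")
        last_hidden_state, pooler_output, *hidden = self.traced_fn(input_ids, attention_mask)
        if len(hidden) != self.num_hidden_states:
            raise ValueError(
                f"expected {self.num_hidden_states} hidden states, got {len(hidden)}")
        # The graph always computes hidden states; only expose them on request.
        hidden_states = tuple(hidden) if output_hidden_states else None
        if return_dict:
            # Attribute access only; transformers' ModelOutput also supports indexing.
            return SimpleNamespace(last_hidden_state=last_hidden_state,
                                   pooler_output=pooler_output,
                                   hidden_states=hidden_states)
        # Mirrors the tuple CLIPTextModel returns when return_dict=False.
        result = (last_hidden_state, pooler_output)
        if output_hidden_states:
            result += (hidden_states,)
        return result


# Usage with a stand-in for a traced model (strings instead of tensors).
def fake_traced(input_ids, attention_mask):
    return ("last", "pooled", "h0", "h1", "h2")


encoder = TracedTextEncoderAdapter(fake_traced, num_hidden_states=3)
out = encoder([0, 1], [1, 1], output_hidden_states=True, return_dict=True)
print(out.hidden_states[-2])  # penultimate layer, which the diffusers SDXL pipeline reads
```

In this sketch, the encoder is traced once with every output enabled and the results are filtered in Python. This is because a traced graph bakes in keyword flags at trace time and cannot change its outputs per call. SDXL's second encoder (`CLIPTextModelWithProjection`) returns `text_embeds` first, so it would need a different unpacking.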

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants