Question about merge_input function - Does it really include different resolutions? #231

pldlgb · 2025-01-08T03:40:52Z

Pyramid-Flow/pyramid_dit/flux_modules/modeling_pyramid_flux.py, line 242 in a012faa:

    Sample: From low resolution to high resolution

Hi, I have a question regarding the merge_input function in your code. Specifically, the docstring mentions:

def merge_input(self, sample, encoder_hidden_length, encoder_attention_mask):
    """
        Merge the input video with different resolutions into one sequence
        Sample: From low resolution to high resolution
    """

However, when looking at the implementation, it seems to me that this function might not actually handle different resolutions, but rather incorporates historical frame information. Could you please clarify if this function indeed processes inputs of varying resolutions, or if it only deals with historical conditions from past frames?

Thank you for your time and for providing this project!

feifeiobama · 2025-01-08T05:26:06Z

Great observation! It only deals with history conditions of different resolutions. The "input" here refers to the transformer input, not the user input.

pldlgb · 2025-01-08T06:03:20Z

My understanding is that the historical conditions here should all have the same resolution. For example, if there are two frames of historical conditions, they should both be 16x24, and there won't be cases where 16x20 is mixed with 32x40.

feifeiobama · 2025-01-08T06:23:17Z

Our model compresses earlier history conditions to a lower resolution to save memory and compute. For further details, please refer to the temporal pyramid part of our paper.
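To make the idea of compressed history concrete, here is a minimal sketch in plain Python. It is not the repository's merge_input: the helper names (avg_pool2x, merge_pyramid, make_frame), the 2x average pooling, the "one extra pooling step per frame of age" schedule, and the max_levels cap are all assumptions chosen for illustration. It only shows how frames at several resolutions can end up in one flattened token sequence.

```python
def avg_pool2x(grid):
    """Halve height and width of a 2D grid by averaging each 2x2 block."""
    h, w = len(grid), len(grid[0])
    return [
        [
            (grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]) / 4.0
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


def merge_pyramid(frames, max_levels=2):
    """Flatten frames into one token sequence, oldest frame first.

    frames: list of 2D grids, oldest first, all at the same full resolution.
    The last frame is treated as the current one and keeps full resolution.
    A frame k steps in the past is pooled min(k, max_levels) times
    (hypothetical schedule, for illustration only).
    Returns (tokens, shapes); shapes[i] is the (h, w) of frame i after pooling.
    """
    tokens, shapes = [], []
    newest = len(frames) - 1
    for idx, grid in enumerate(frames):
        for _ in range(min(newest - idx, max_levels)):
            grid = avg_pool2x(grid)
        shapes.append((len(grid), len(grid[0])))
        tokens.extend(value for row in grid for value in row)
    return tokens, shapes


def make_frame(h, w, value):
    """Build a constant h x w grid so each frame is easy to spot in the output."""
    return [[float(value)] * w for _ in range(h)]


# Three 8x8 frames: two history frames followed by the current frame.
frames = [make_frame(8, 8, v) for v in (1, 2, 3)]
tokens, shapes = merge_pyramid(frames)
print(shapes)       # per-frame resolution after pooling
print(len(tokens))  # total length of the merged sequence
```

In this toy setup the oldest frame contributes the fewest tokens and the current frame the most, so a single merged sequence mixes several resolutions even though every frame started at the same size.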

pldlgb · 2025-01-08T06:50:54Z

I understand what you mean, but each call to merge_input receives latents of only one resolution as input. It can be either a low-resolution latent or a high-resolution latent, but within a single merge_input call there is only one unified resolution, correct?
