Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
nvidia paper

What's the difference between MM-DiT and cross-attention conditioning? For example, SD3.5 compared to Sana?
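To make the comparison concrete, here is a rough NumPy sketch of the two conditioning styles, not either model's actual code. In cross-attention conditioning (the style Sana uses for text), image tokens form the queries and text tokens only supply keys and values, so the text stream passes through the block unchanged. In MM-DiT-style joint attention (the style SD3 describes), each modality has its own projection weights, but attention runs once over the concatenated sequence, so both streams are updated. All names here (`make_weights`, `cross_attention_block`, `joint_attention_block`) are hypothetical, and the sketch assumes a single head with no normalization, MLP, or timestep modulation. It also leaves out Sana's linear self-attention.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Plain single-head scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def make_weights(rng, d):
    # Hypothetical helper: random q/k/v projection matrices of size d x d.
    return {name: rng.standard_normal((d, d)) / np.sqrt(d) for name in ("q", "k", "v")}


def cross_attention_block(img, txt, w):
    # Image tokens are the queries; text tokens only provide keys and values.
    # The text stream is returned untouched.
    q = img @ w["q"]
    k = txt @ w["k"]
    v = txt @ w["v"]
    return img + attention(q, k, v), txt


def joint_attention_block(img, txt, w_img, w_txt):
    # Separate projections per modality, then one attention pass over the
    # concatenated [image; text] sequence. Both streams get a residual update.
    q = np.concatenate([img @ w_img["q"], txt @ w_txt["q"]])
    k = np.concatenate([img @ w_img["k"], txt @ w_txt["k"]])
    v = np.concatenate([img @ w_img["v"], txt @ w_txt["v"]])
    out = attention(q, k, v)
    n = img.shape[0]
    return img + out[:n], txt + out[n:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.standard_normal((16, 8))  # 16 image tokens, width 8
    txt = rng.standard_normal((4, 8))   # 4 text tokens, width 8
    _, txt_cross = cross_attention_block(img, txt, make_weights(rng, 8))
    _, txt_joint = joint_attention_block(img, txt, make_weights(rng, 8), make_weights(rng, 8))
    print("text changed by cross-attention:", not np.allclose(txt_cross, txt))
    print("text changed by joint attention:", not np.allclose(txt_joint, txt))
```

The practical difference the sketch shows is that joint attention lets text tokens attend to image tokens and be updated layer by layer, while cross-attention keeps the text embeddings fixed and only reads from them.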

The default value is "cute cat play with duck".