Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

IVRY

duckyo

nvidiapaper

What's the difference between MM-DiT and cross-attention conditioning? For example, SD3.5 compared to Sana?
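A rough way to see the two schemes side by side is the NumPy sketch below. It is a heavily simplified illustration, not code from either model: single head, no normalization, timestep modulation, MLP, or output projections, and all function and variable names (`cross_attention_block`, `mmdit_joint_attention_block`, `make_weights`) are made up for this example. With cross-attention conditioning (the style Sana uses for text), image tokens supply the queries and text tokens supply only keys and values, so the text features stay fixed. In an MM-DiT block (the style used in SD3-family models), each modality has its own projection weights, but attention runs once over the concatenated image and text sequence, so both streams get updated.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Plain scaled dot-product attention, single head.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def make_weights(rng, dim):
    # Hypothetical helper: random q/k/v projection matrices.
    return {name: rng.normal(size=(dim, dim)) / np.sqrt(dim) for name in ("q", "k", "v")}


def cross_attention_block(img, txt, w):
    # Image tokens are the queries; text tokens are only keys/values.
    # The text tokens pass through unchanged.
    q = img @ w["q"]
    k = txt @ w["k"]
    v = txt @ w["v"]
    return img + attention(q, k, v), txt


def mmdit_joint_attention_block(img, txt, w_img, w_txt):
    # Separate projection weights per modality, then one attention pass
    # over the concatenated sequence. Both streams are updated.
    q = np.concatenate([img @ w_img["q"], txt @ w_txt["q"]])
    k = np.concatenate([img @ w_img["k"], txt @ w_txt["k"]])
    v = np.concatenate([img @ w_img["v"], txt @ w_txt["v"]])
    out = attention(q, k, v)
    n_img = img.shape[0]
    return img + out[:n_img], txt + out[n_img:]


rng = np.random.default_rng(0)
dim = 8
img = rng.normal(size=(16, dim))  # 16 image (latent patch) tokens
txt = rng.normal(size=(4, dim))   # 4 text tokens

img_ca, txt_ca = cross_attention_block(img, txt, make_weights(rng, dim))
img_mm, txt_mm = mmdit_joint_attention_block(
    img, txt, make_weights(rng, dim), make_weights(rng, dim)
)

print("cross-attn: text updated?", not np.allclose(txt_ca, txt))
print("MM-DiT:     text updated?", not np.allclose(txt_mm, txt))
```

Note that this only contrasts how the text conditioning enters the block. Sana's image self-attention is a linear attention variant, which the sketch does not model.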

prompt

Default value: "cute cat play with duck"