Draw PEPE combines the strengths of DALL-E 2 and Latent Diffusion and adds new ideas of its own. It uses the CLIP model as its text and image encoder and builds a diffusion image prior that maps between CLIP's text and image latent spaces. On top of this sits a generative model based on a multimodal VAE, which jointly optimizes the log-likelihood of the image and the text while accounting for the correlations between the two modalities.
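The page does not say how the prior is trained. As a rough illustration only, the Python sketch below shows the forward-noising step applied to a CLIP image embedding and a DALL-E 2-style loss, in which the prior is trained to predict the clean image embedding directly. The linear noise schedule, the 768-dimensional embedding width, and `placeholder_prior` are assumptions for illustration; in a real system the placeholder would be a transformer conditioned on the CLIP text embedding.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed linear noise schedule; Draw PEPE's actual schedule is not stated.
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    # alpha_bar_t = prod_{s <= t} (1 - beta_s)
    alpha_bars, acc = [], 1.0
    for beta in betas:
        acc *= 1.0 - beta
        alpha_bars.append(acc)
    return alpha_bars


def noise_embedding(z0, t, alpha_bars, rng):
    # Forward process q(z_t | z_0): shrink the clean embedding and add Gaussian noise.
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(z0, eps)]
    return zt, eps


def prior_loss(pred_z0, z0):
    # Mean squared error between the predicted clean embedding and the target.
    return sum((p - x) ** 2 for p, x in zip(pred_z0, z0)) / len(z0)


def placeholder_prior(zt, t, text_embedding):
    # Hypothetical stand-in for the learned prior: it ignores zt and t and
    # simply returns the text embedding as its guess for the clean image embedding.
    return list(text_embedding)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 768  # hypothetical CLIP embedding width
    text_emb = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    image_emb = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    alpha_bars = cumulative_alphas(linear_betas())
    t = rng.randrange(len(alpha_bars))
    zt, _ = noise_embedding(image_emb, t, alpha_bars, rng)
    pred = placeholder_prior(zt, t, text_emb)
    print(f"t={t}, loss={prior_loss(pred, image_emb):.3f}")
```

During training, the placeholder would be replaced by the real network and the loss minimized over random timesteps; at sampling time, the prior runs the process in reverse to turn a text embedding into an image embedding for the decoder.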
We use a transformer with 20 layers, 32 attention heads, and a hidden size of 2048 to run the diffusion process in these latent spaces. This is intended to improve the model's visual quality, and it opens up new possibilities for blending images and for editing images through text.
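To put those hyperparameters in perspective, here is a small Python sketch that derives the per-head dimension and a rough weight count. It assumes a standard transformer block with a 4x feed-forward expansion and counts only the attention and MLP weight matrices (no biases, layer norms, or embeddings). `PriorTransformerConfig` is a hypothetical name, not part of Draw PEPE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorTransformerConfig:
    num_layers: int = 20
    num_heads: int = 32
    hidden_size: int = 2048
    mlp_ratio: int = 4  # assumption: standard 4x feed-forward expansion

    @property
    def head_dim(self) -> int:
        # Each attention head works on hidden_size / num_heads dimensions.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    def params_per_layer(self) -> int:
        d = self.hidden_size
        attention = 4 * d * d                # Q, K, V and output projections (weights only)
        mlp = 2 * d * (self.mlp_ratio * d)   # up- and down-projections (weights only)
        return attention + mlp

    def approx_params(self) -> int:
        # Ignores biases, layer norms, and embeddings.
        return self.num_layers * self.params_per_layer()


cfg = PriorTransformerConfig()
print(cfg.head_dim)                         # 64
print(f"{cfg.approx_params() / 1e9:.2f}B")  # 1.01B
```

Under these assumptions each layer holds about 50.3M weights, so the 20-layer stack comes to roughly 1B parameters.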
Using a deep-learning encoder-decoder architecture, Draw PEPE turns a simple text description into a striking, accurate rendering of what the user describes. Users can also adjust the parameters of the output image to further customize their masterpiece. As a sign of our commitment to PEPE, Draw PEPE only accepts prompts that include the name "PEPE". Try AI Draw PEPE at https://pepe.style/pepe.
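The prompt rule can be expressed as a one-line check. The Python sketch below is hypothetical: the page only says prompts must include the name "PEPE", so matching case-insensitively and as a whole word (so "Pepe's hat" passes but "PEPEs" does not) are assumptions, and `accepts_prompt` is an illustrative name rather than Draw PEPE's actual validator.

```python
import re

# Hypothetical rule: "PEPE" must appear as a whole word, in any letter case.
_PEPE = re.compile(r"\bpepe\b", re.IGNORECASE)


def accepts_prompt(prompt: str) -> bool:
    # Return True when the prompt mentions PEPE, False otherwise.
    return bool(_PEPE.search(prompt))


print(accepts_prompt("PEPE riding a bike"))    # True
print(accepts_prompt("a frog riding a bike"))  # False
```

A stricter reading of the rule would be a plain `"PEPE" in prompt` test, which rejects lowercase spellings.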