Create realistic images from text, combining concepts and styles.

DALL-E is a 12-billion-parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text-image pairs. It shows a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images.

Overview of a prompt in DALL-E

Like GPT-3, DALL-E is a transformer language model. It receives the text and the image as a single stream of up to 1280 tokens and is trained with maximum likelihood to generate all of the tokens, one after another. This training procedure lets DALL-E not only generate an image from scratch but also regenerate any rectangular region of an existing image that extends to the bottom-right corner, in a way that is consistent with the text prompt.
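The layout of that token stream can be sketched in a few lines of Python. This is an illustration only, not DALL-E's code: the split of the 1280 tokens into 256 text tokens followed by a 32 x 32 grid of 1024 image tokens is taken from the published description of the model rather than from the text above, and the names `TEXT_TOKENS`, `IMAGE_GRID`, and `region_positions` are hypothetical.

```python
# Assumed layout: 256 caption tokens followed by a 32 x 32 grid of image
# tokens in row-major order. All names here are illustrative.
TEXT_TOKENS = 256
IMAGE_GRID = 32
IMAGE_TOKENS = IMAGE_GRID * IMAGE_GRID        # 1024
STREAM_LENGTH = TEXT_TOKENS + IMAGE_TOKENS    # 1280


def region_positions(top, left):
    """Return the stream positions of the image tokens inside the rectangle
    that runs from grid cell (top, left) to the bottom-right corner."""
    if not (0 <= top < IMAGE_GRID and 0 <= left < IMAGE_GRID):
        raise ValueError("top and left must lie inside the image grid")
    return [
        TEXT_TOKENS + row * IMAGE_GRID + col
        for row in range(top, IMAGE_GRID)
        for col in range(left, IMAGE_GRID)
    ]


# Positions a sampler would resample to redraw the bottom-right quadrant;
# every other position keeps its original token.
quadrant = region_positions(16, 16)
print(len(quadrant), "of", STREAM_LENGTH, "positions would be regenerated")
```

Under these assumptions, regenerating a region means keeping the caption and the untouched image tokens fixed and sampling only the listed positions in order, so each new token is conditioned on everything that precedes it in the stream.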

Image generation by DALL-E

DALL-E can produce plausible images for a wide variety of sentences that explore the compositional structure of language. The samples shown for each caption in the visuals are the top 32 of 512 generated candidates, selected after reranking with CLIP.
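The reranking step amounts to a top-k selection over scored candidates. The sketch below assumes each candidate already has a caption-image similarity score; in practice those scores would come from CLIP, which is not called here. The function name `rerank_top_k` and the placeholder scores are hypothetical.

```python
def rerank_top_k(samples, scores, k=32):
    """Return the k samples with the highest scores, best first.

    `scores[i]` is assumed to be a similarity score between the caption
    and `samples[i]` (for example from CLIP); higher means a better match.
    """
    if len(samples) != len(scores):
        raise ValueError("samples and scores must have the same length")
    # Sort candidate indices by score, highest first, then keep the first k.
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    return [samples[i] for i in order[:k]]


# Placeholder data: 512 candidate ids with made-up scores.
candidates = list(range(512))
fake_scores = [(i * 37) % 512 for i in candidates]
best = rerank_top_k(candidates, fake_scores, k=32)
print(len(best), "candidates kept out of", len(candidates))
```

Sorting indices rather than the samples themselves keeps the function independent of what a sample is, whether an image array, a file path, or an id.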
