The news cycles have been lit up with AI image generators, and Stable Diffusion is in solid company. But, as with much of this AI technology, a big question remains: what exactly is it? Stable Diffusion bears similarities to famous software like Midjourney and DALL-E, but some core differences emerge when you look a bit deeper.
Stable Diffusion is the odd one out in most conversations about big-name AI art applications. It doesn’t quite have the same cultural cachet as the popular Midjourney, but it can get quite close. DALL-E caught an early windfall thanks to meme culture on Twitter (now X) and Instagram and has enjoyed continued popularity.
Today’s guide takes a closer look at Stable Diffusion as a whole and how it stacks up against two of the most popular AI image generators online.
What Is Stable Diffusion?
Stable Diffusion is a deep learning AI image generator that saw public release in 2022. Users can apply the software to a variety of image creation and manipulation tasks, from AI upscaling to generating whole new images from scratch.
The software originally comes from the Ludwig Maximilian University of Munich and has received donations of both money and compute resources. Stable Diffusion is decidedly different from other AI image generators, at least to an extent. With many AI image generators, you’re relegated to using cloud resources.
However, Stable Diffusion can be installed and run locally on any PC with a compatible NVIDIA GPU and its CUDA cores. You’ll want at least 4 GB of VRAM, but this places Stable Diffusion in the hands of many users, provided they have the drive space and compatible hardware.
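As a rough illustration of that hardware guideline, here is a minimal sketch of checking whether a local GPU clears the roughly 4 GB VRAM bar before attempting an install. It assumes PyTorch, which the article does not name; treat it as one possible approach rather than an official requirement check.

```python
def meets_vram_requirement(min_gb: float = 4.0) -> bool:
    """Return True if a CUDA-capable GPU with at least `min_gb` GB of VRAM is present."""
    try:
        import torch  # assumed dependency; the article doesn't prescribe a toolkit
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    # total_memory is reported in bytes for the first CUDA device
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    return total_bytes >= min_gb * 1024**3


if __name__ == "__main__":
    print("GPU meets the 4 GB VRAM guideline:", meets_vram_requirement())
```

On a machine without PyTorch or without an NVIDIA GPU, the function simply returns False, which mirrors the article’s point: no compatible hardware, no local install.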
How It Works
If you have used Midjourney or DALL-E, the software functions in a similar manner. Upon entering a cloud instance or starting the software on your PC, you’re presented with the txt2img prompt. From there, you can enter a prompt and get a generated image that tries to match the parameters of the prompt itself.
Looking further, Stable Diffusion uses a diffusion model developed explicitly for the software. The model was trained on the LAION-5B data set, which comprises billions of image-text pairs scraped from the web. This is a publicly available data set anyone can use to train their own AI models.
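As a sketch of what a local txt2img call can look like, here is a minimal example using Hugging Face’s `diffusers` library. This is an assumption on our part: the article doesn’t name a specific toolchain, and the model ID, step count, and guidance value below are illustrative defaults, not settings prescribed by the software.

```python
def build_txt2img_args(prompt: str, steps: int = 30, guidance: float = 7.5) -> dict:
    """Bundle the basic txt2img parameters: the prompt, the number of
    denoising steps, and how strictly the model should follow the prompt."""
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": guidance,
    }


if __name__ == "__main__":
    # Heavy dependencies are imported lazily so the sketch can be read
    # without them installed; both libraries are assumed, not mandated.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # illustrative model ID
        torch_dtype=torch.float16,
    ).to("cuda")
    args = build_txt2img_args("pen and ink drawing of a mystical underwater world")
    image = pipe(**args).images[0]  # a PIL image
    image.save("output.png")
```

The cloud services expose the same idea through a text box: one prompt in, one or more candidate images out.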
Stable Diffusion vs. DALL-E
The first real comparison to make is against DALL-E. DALL-E serves as the image-driven counterpart to the ever-popular ChatGPT. They are both administered and developed by OpenAI, serving as sister products utilizing the same general framework.
DALL-E Mini made the rounds just a few years ago as part of a social media craze. This led to a widespread interest in AI image generators and their applications. Now, DALL-E Mini is little more than a toy. The release of DALL-E 2 has proved to be somewhat successful, although not to the same extent as ChatGPT.
Methods of Image Generation
As we discussed previously, Stable Diffusion uses a diffusion model for training purposes and the creation of images. DALL-E 2 uses the same generative pre-trained transformer, or GPT, technology as its sibling, ChatGPT. However, instead of working with written text alone, the model was trained on billions of text-image pairs.
The GPT model at the heart of DALL-E 2 is further extended by the use of a diffusion model. This brings its image generation capabilities more in line with what you’ll see in Stable Diffusion. However, you can only run DALL-E 2 as an online service; there isn’t an option for local installation. The two still generate images in similar ways, even if the core functionality differs.
Stable Diffusion’s premium model is provided by Clipdrop and can be had for around $9 a month. You can circumvent this entirely by installing the software locally on your computer. For users with fairly powerful PCs, this is a net savings aside from the cost of electricity. Mac users, or users without a dedicated GPU, will likely be better served by the premium subscription, however.
The premium sub brings unlimited image generation, faster generation, and selectable quality modes. However, you can get those same features through local installation. So, the service itself is free, provided you have the right computer for it.
Quality of Image Generation
To test the image generation capability of both generators, the same prompt was used, and the overall results were then compared.
For testing purposes, a fairly detailed prompt was utilized, reading as follows:
Pen and ink drawing of a mystical underwater world, with schools of fish, coral reefs, and a mermaid, highly detailed linework, high-contrast, stylized
As you can likely see from the Stable Diffusion image above, there’s a fair amount of detail in the image itself. The mermaid is missing, but there is a decidedly sketchy quality when using the latest SDXL model to generate an image. There is plenty of linework to be seen, but it does err more on the side of abstraction rather than realism. Perhaps drawn images aren’t Stable Diffusion’s strong suit.
DALL-E 2 included the mermaid, but is a little too abstract for most tastes. It did fit within the parameters of the assignment in terms of plenty of linework. However, like Stable Diffusion, the image is missing that certain flair of realism to it. The results could likely be better refined through the use of negative prompts and a more focused generative text to start things.
As it stands, the edge has to go to Stable Diffusion. While it didn’t nail the prompt, it got close enough to make a difference. DALL-E 2’s image generation just isn’t at the same level of detail when compared to the newer SDXL model.
Stable Diffusion vs. Midjourney
Midjourney is the AI image generator to beat, at least in terms of public awareness. Midjourney has enjoyed a fair degree of popularity as the premier AI image generator since its release. You’ll find stunning works of art created with Midjourney, and there are users subscribed to the service who seem to solely specialize in tailoring prompts for the absolute best results.
Methods of Image Generation
The actual model used for image generation by Midjourney isn’t fully disclosed. The image generator entered public use in 2022 and is solely accessible through Discord. You could perhaps argue that most of the artwork generated by Midjourney is similar in scope to the art generated by the likes of DALL-E 2.
It is known that Midjourney is using a diffusion model, much like Stable Diffusion and DALL-E 2. However, the team behind Midjourney is likely using an entirely different data set for training the model. This leads to sharper results, and in some cases, better realism when generating images from a text prompt.
The basic pricing for Midjourney is in line with the other paid image generation services you’ll find online. A basic plan will run you $10 a month and nets 200 generations or so. The standard plan is more robust, offering unlimited image generation and access to the member gallery.
You’ll also find a pro and mega plan, running for $60 and $120 respectively. These are fairly expensive plans and are not recommended for the average user. Both of the high-tier plans are intended for businesses, teams, and other professionals relying on AI image generation as a source of income.
Quality of Image Generation
As with the Stable Diffusion and DALL-E 2 comparison, the same prompt is being used to compare the quality of images from both sources. The prompt used for this comparison is a little more refined, and reads as follows:
1990s home movie footage of iridescent artificial intelligence carnival
First up is Stable Diffusion XL, which seems to have understood the analog film aspect at the very least. There are numerous flaws to point out here. The people depicted in the image have a distinct lack of coherent faces, hands, or any other defining attributes. Clothing styles suggest the late 1970s or early 1980s rather than the 1990s.
That said, this is a fairly impressive attempt that could be further tweaked until something usable comes forward.
The Midjourney image above is more coherent as a whole, but still lacking something. When it gets down to it, you could readily see Midjourney being the more usable AI image generator. However, the overall pricing structure blunts that edge. With Stable Diffusion, you can download additional models and training data to your home PC and go from there.
The same cannot be said for Midjourney, which is a paid service restricted to usage solely through the company’s own Discord server.
So, does Stable Diffusion stack up favorably against popular AI image generators? It certainly does, but you will have to temper your expectations. As something for a proof-of-concept or a mockup, an image generator like Midjourney is going to do wonders. However, the limitations of any artificial intelligence model are going to be seen when it comes to things like anatomy, lettering, and other details real artists wouldn’t neglect.
The image featured at the top of this post is ©Lightspring/Shutterstock.com.