Stable Diffusion XL and Midjourney are both excellent modern AI image generators. While Stable Diffusion has the advantage of being released under a FOSS license, Midjourney has historically been one or two steps ahead in terms of overall output image quality. Since the last big release of Stable Diffusion XL however, the gap appears to have narrowed a lot so I decided to do a brief head to head comparison of the two.
I am picking prompts and results from the Midjourney Community Showcase page. These will have been cherry-picked by their creators to make good use of the strengths of Midjourney. For Stable Diffusion I’m using the latest release of Stable Diffusion XL with the webui setup. I will do my best using the available options to get the best possible results out of Stable Diffusion without any adjustments to the prompt itself. Keep in mind that this may be an uphill battle for stable diffusion and my choice of result may be a matter of personal preference.
Stable Diffusion XL vs Midjourney comparison
Prompt: an illustration of a man curly hair dressed in futuristic clothing, in the style of white and gold, hyper-realistic sci-fi, kingcore, rtx on, delicate gold detailing, detailed world-building, photo-realistic
Both Midjourney and Stable Diffusion XL generated impressive results. Midjourney is maybe a bit more “Sci-Fi”, as requested by the prompt. The last image was generated using CFG Scale 1. A lower CFG scale is supposed to cause the model to interpret the prompt more loosely, causing more creative results. In this instance, the result still adheres quite closely to the prompt, albeit with some changes in perspective.
Prompt: john wick guests stars on the animatrix
For this prompt, Stable Diffusion XL decided to go for higher visual complextiy than Midjourney. This means there is more detail in the end results, but also more glitchiness. Setting CFG Scale to 1 changes the art style completely. People who know about John Wick please tell me if the last picture contains any John Wick references.
Prompt: artefacts, double exposure, beautiful women reflecting on store window, outside, bright morning sun, high contrast, analog, 35mm, Leica
In this very complex and challenging prompt, both models struggle. Reflections and refractions are apparently very difficult to get right. Midjourney appears to ignore parts of the prompt to get a somewhat plausible image. On the other hand, Stable Diffusion XL tries to incorporate all aspects of the prompt and then fails harder. In either case, the reflections are highly implausible.
Prompt: a giraffe parked inside a trailer, in the style of stop-motion animation, vintage-inspired designs, animated gifs, kestutis kasparavicius
For this prompt, Midjourney basically ignored parts of the prompt again. Looking at some of the works of Kestutis Kasparavicius, clearly only Stable Diffusion XL derived inspiration from it. Midjourney didn’t even put a giraffe into the end result. On the other hand, Stable Diffusion XL is more glitchy, especially with CFG Scale 1. That may be caused by adhering to the actual prompt, even though there isn’t a lot of source material to draw from. In my opinion, this is the better approach as opposed to simply ignoring large parts of the prompt.
It’s hard to conclusively say that one model results in significantly better images than the other from this small test. One observation is however, that Stable Diffusion tends to try and interpret the prompt fully even if that causes glitches. Midjourney is (ironically?) more stable, with fewer obvious glitches, but it appears to avoid problematic components of the prompts. That may make Midjourney a bit easier to use, but also less flexible, though more testing is needed. Ultimately, while they both have their strengths, both models are more or less on the same level of quality at this point in time. It will be exciting to observe future developments for both.