After using an Nvidia Jetson Nano 4GB with LineageOS for a while as my open-source Android TV box, I noticed some issues with HDR streaming at high bitrates. Although the server-side encoding worked fine, the streams were sometimes unstable, breaking off after a few seconds. I decided to try direct streaming with VLC over SMB as well as with Kodi, but both had their own issues with displaying HDR correctly. I ended up solving the issue using Findroid, an alternative Jellyfin client for Android. Can it replace the Jellyfin app on a TV?
Easy setup
Findroid is available on F-Droid, so the install was unproblematic on my degoogled Android box. On my phone running GrapheneOS, I received an error toast in F-Droid, but the app installed fine. It is also available on Google Play and on GitHub. The setup was fairly easy as well: just log in, no extra steps needed. It also supports the Jellyfin “Quick Connect” feature, which means you won’t have to type the password with the on-screen keyboard or a separately attached one. That is especially nice on a TV.
Usability issues on TV
With my moderately sized video collection, browsing performance seems to be similar to the Jellyfin TV app. Opening a collection is fairly quick. I did notice a few minor visual glitches, though. However, the biggest usability issue by far is remote control navigation on a TV: the directional keys often do not lead to the UI elements you would expect. As is not uncommon for Material Design, it is also not always obvious which element is currently highlighted. Clearly Findroid is not optimized for TV use yet, as the developer mentions in the GitHub README. It is, however, a planned feature, and the app is currently undergoing a rewrite, so there is hope.
With that being said, the interface is structured logically and shows all the information you would really need. It integrates well with Jellyfin’s features, like keeping track of which episodes you have watched. Searching and sorting work fine, and despite this app being optimized for mobile, it visually already works quite well on TV. I ended up using a mouse cursor to mitigate the usability issues, and with that it’s definitely usable. On mobile, with touch input, the usability issues are nonexistent.
Findroid explicitly supports only direct streaming, with no transcoding. It somehow also does that better than any other app I have tried. In my case that means a few seconds of buffering when opening a very large video file, but after that it streams perfectly: no stuttering, no issues with HDR, and of course perfect quality.
Findroid: Conclusion
So is this the better Jellyfin App? Not really, but I’ll still be using it on my TV.
Which one is better depends on the situation. If you have access to a mouse or a similar solution, Findroid is technically better for local streaming. That is especially true if you are having issues with HDR, for instance. It works even better on phones.
However, when streaming over the internet or a VPN, you’ll probably want transcoding, and that is where you will want to use the official app. If you don’t have any issues with that app or don’t need HDR, it will also work better on TVs in general.
AV1 encoding allows for video files to be stored and streamed at much higher quality for the same file size or at a much smaller file size with the same quality compared to older codecs such as H.264, HEVC, and VP9. A lot has been written about its potential for streaming HD and higher-resolution video, but as a fan of both self-hosting and offline files, I also see use cases for lower resolutions. For example, I am interested in how much less storage I need to take a number of movies offline with me on my phone.
After some experimentation, I have settled on good-quality 480p as the minimum resolution for an enjoyable movie-watching experience on a 6-inch phone display—that is about DVD resolution. The next challenge was to find the optimal parameters for encoding the video files to ensure sufficient quality at the smallest possible file size.
The Experiment
For my experiment, I decided to use my RTX 40-series graphics card’s NVENC encoder at the default “p4,” fast “p1,” and highest quality “p7” presets. This should be representative of what I would do for real-world low-resolution AV1 encoding of movies to take with me. CPU encoding might yield even higher quality, but it would likely be dramatically slower (although I haven’t yet tested the exact difference).
As a test sample, I’m using a snippet of one of my favorite movies, Children of Men. The input file is H.264-encoded, 1080p24 resolution, and the video bitrate hovers between 3 and 5 Mbit/s—so the quality is “fine” but not great. My goal is to achieve comparable subjective quality at a lower resolution and on a smaller screen with the lowest possible bitrate in AV1.
I’m using variations of the following script to crop out a scene from the input file, re-encode it with AV1, and then extract a screenshot at every CQ value. I’m running it once with each preset.
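#!/bin/bash
# Cut a 60-second snippet starting at 570 s (stream copy, nothing is re-encoded here).
ffmpeg -hwaccel cuda -ss 570 -i "Children of Men.mp4" -t 60 -c:v copy -c:a copy temp_snippet.mp4
# Encode the snippet at 480p once per CQ value, then grab a frame at the 30-second mark.
# For the p1 and p7 runs, change -preset p4 and the "4" in the file names.
for cq in {0..51}; do
    ffmpeg -hwaccel cuda -i temp_snippet.mp4 -vf "scale=-2:480" -c:v av1_nvenc -b:v 0 -map 0:v:0 -preset p4 -cq "${cq}" "output4_cq${cq}.mkv"
    ffmpeg -hwaccel cuda -i "output4_cq${cq}.mkv" -ss 00:00:30 -vframes 1 "screenshot4_cq${cq}.png"
done
rm temp_snippet.mp4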
Since encoding quality is subjective and the ideal settings vary based on the content of the video, you should use this as a guideline for your own tests. I’m trying to address the lack of data specifically for low-resolution AV1 encoding. Adjust the variables to your needs and preferences.
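If you would rather run all three presets in one go, the sweep can be wrapped in two small functions. This is only a sketch built from the same ffmpeg arguments as the script above; the function names `encode_one` and `sweep` and the `FFMPEG` override variable are my own placeholders, not part of ffmpeg.

```shell
#!/bin/bash
# Sketch: run the CQ sweep for several NVENC presets in one pass.
# Set FFMPEG=echo for a dry run that only prints the ffmpeg arguments.

encode_one() {
    # $1 = input clip, $2 = preset (p1..p7), $3 = CQ value
    "${FFMPEG:-ffmpeg}" -hwaccel cuda -i "$1" -vf "scale=-2:480" \
        -c:v av1_nvenc -b:v 0 -map 0:v:0 -preset "$2" -cq "$3" \
        "output${2#p}_cq${3}.mkv"
}

sweep() {
    # $1 = input clip, remaining arguments = presets to test
    input="$1"; shift
    for preset in "$@"; do
        cq=0
        while [ "$cq" -le 51 ]; do
            encode_one "$input" "$preset" "$cq"
            cq=$((cq + 1))
        done
    done
}

# Example: sweep temp_snippet.mp4 p1 p4 p7
```

Running it with `FFMPEG=echo` first is a cheap way to check the generated file names before committing to 156 encodes.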
The Results
The input file is 2.9GB for 110 minutes of runtime, averaging 26MB per minute of video. 480p has only 14% of the number of pixels per frame compared to 1080p. Thus, I’m excluding all results where the output file of the one-minute clip is larger than 14% of 26MB. That is 3.6MB—anything larger is completely pointless. I will address the p1 and p4 results here; p7 is still fast with NVENC and will yield the best results.
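The cutoff is simple arithmetic, and it can be reproduced for other source files with a few lines of shell. This is a sketch using the figures from the text. The 14% pixel share is taken as given here, and the exact share depends on the frame dimensions of your source and output. The function names are placeholders of mine.

```shell
# mb_per_minute SOURCE_MB RUNTIME_MIN
# Average size per minute of video, rounded to whole MB.
mb_per_minute() {
    awk -v mb="$1" -v min="$2" 'BEGIN { printf "%.0f\n", mb / min }'
}

# budget_mb MB_PER_MINUTE PIXEL_SHARE_PERCENT
# Size budget for a one-minute clip, scaled by the share of pixels kept.
budget_mb() {
    awk -v mb="$1" -v pct="$2" 'BEGIN { printf "%.1f\n", mb * pct / 100 }'
}

# Figures from the text: 2.9 GB source, 110 minutes, 14 % pixel share.
# mb_per_minute 2900 110   # prints 26
# budget_mb 26 14          # prints 3.6
```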
Fastest Preset Results
At preset 1, the highest quality file that just barely stays under this threshold was encoded at CQ37:
Unsurprisingly, it looks basically as good as the source, minus the resolution.
You can save 50% of the original storage at CQ44:
The result is noticeably softer—look at Jasper’s hair. To my taste, this is a bit too soft, even on a small screen.
I found that at p1, I would only want to go down to about CQ39 before quality drops too much:
In motion, this looks close enough to the source to be acceptable to me.
At these settings, you only save about 23% of storage. That is not really enough for me to bother with re-encoding the movies I want to take with me. Even though the encode is extremely fast, I would only do this if I was in a hurry. In that case, I would just scale the source down, which is even faster.
Default Preset Results
At p4 and CQ37, quality looks about the same as the source—unsurprisingly:
Basically perfect sharpness and motion.
What’s much more interesting is the result for CQ44:
Sharpness is way better than CQ44 at p1 and still just about good enough for me to use for low-quality AV1 encoding.
Some blocking and banding become more visible at this bitrate since sharpness is so much better. It doesn’t show up too much in the screenshot but is slightly visible in motion on flat-colored surfaces. One example is Jasper’s black shirt.
Still, at this setting, you save about 50% of storage, which is significant. This looks almost as good to me as CQ39 at p1.
Going any further with the CQ setting results in a rapid loss of quality.
Slowest Preset Results
The p7 results will be the most interesting. In practice, I probably wouldn’t mind encoding at this preset since it’s still fast enough—that is, if it yields even better results than p4.
Going right into CQ44, the result is only slightly sharper than at p4, but the blocking is also reduced. There is a little more detail in the dark shirt. Unfortunately, I found that while sharpness is preserved better at p7 at higher CQ values, the blocking increases quickly. Even at CQ45, it’s visibly stronger.
However! This blocking is only very noticeable in stills and when comparing side by side with lower CQ values. In motion, I found it to be much less of a distraction than a blurry picture.
I found CQ47 to be tolerable in motion. Thus, this is the perfect balance for low-resolution AV1 encoding of movies. Using that as a baseline, I re-encoded the entire movie and tested if it held up in faster-paced scenes on my phone. And it did!
Conclusion: Low-Resolution AV1 Encoding
The resulting movie is almost one-third the file size of the H.264-encoded source (but downscaled). In a direct comparison, it is noticeably, but acceptably, softer. I find the blocking to be basically unnoticeable in motion and on the small screen. The bitrate is a tiny 210 kbit/s on average. I also found that a modern GPU can comfortably encode AV1 at 480p at nearly 1000 FPS. That is fast enough that there is no justification to use anything but p7, even if the difference compared to p4 is fairly modest.
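To get a feel for what an average bitrate means in storage terms, you can convert it into an approximate stream size. The sketch below assumes the 210 kbit/s figure refers to the video stream alone and reuses the 110-minute runtime mentioned earlier; audio and container overhead would come on top. The function name is a placeholder.

```shell
# size_mb_from_bitrate KBIT_PER_S RUNTIME_MIN
# Integer estimate of the stream size in MB (8 bits per byte, 1000 kB per MB).
size_mb_from_bitrate() {
    echo $(( $1 * $2 * 60 / 8 / 1000 ))
}

# size_mb_from_bitrate 210 110   # prints 173
```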
In 2024 it is more evident than ever that we live in a world of noise and distraction, channeled to us through networked smart devices – particularly our smartphones. All it takes is one look at any public or even private space to realize that the default action for a majority of the population in developed countries is to use their phones whenever they are not actively engaged in a specific activity. Now, smartphone use can mean a lot of different things. Oftentimes we use our devices for communication or learning. However, in reality, most time on smartphones is spent on ‘entertainment’ and ‘social media’ – and everyone who has ever seen a stranger use their phone knows that this means mindless doomscrolling. Are dumbphones a possible solution?
Figures: Share of smartphone time in Taiwan | Average time spent per day using smartphones in China | Apps in which smartphone users in Brazil spend the most time per day
It’s a natural reaction when looking at these or one’s personal app usage statistics to feel a sense of alarm. Many people report that the time they spend on social media feels wasted or turns into a complete blur. No one remembers the hundreds of TikToks or YouTube Shorts they scroll through every day. Thus, no long- or short-term value is generated from these activities. This is what has caused new communities and movements to appear and grow since the late phases of the pandemic. They are united by the desire to disconnect and regain control over their time. One of them is the dumbphone movement – a growing group of people who have chosen to replace their smartphones altogether [1]. Instead, they use simpler devices with fewer capabilities, often no or very limited app support, bad cameras and small screens.
Understanding the Dumbphone Resurgence
The dumbphone movement is not just about nostalgia for flip phones and early-2000s tech. It’s a deliberate choice to step away from the constant exposure to smartphones, which have become integral to our lives. They are also notorious for fostering habits like doomscrolling, social media addiction, and endless notifications.
The resurgence of dumbphones is driven by a desire for simplicity, intentional living, and a return to a time when our devices served specific, limited functions rather than acting as constant companions. Dumbphone users force themselves to compromise on or forsake many of the functions any smartphone comes with. Taking pictures will require deliberately bringing a camera. Music may require a dedicated MP3 player or at the very least a media collection saved on the phone. Instead of “Tap to Pay” they will use cash or physical cards. The goal of disconnection is achieved by brute force. A dumbphone user doesn’t have to deal with hundreds of notifications – there are none other than messages or calls.
Dumbphones often also come with some inherent advantages. Their simple hardware makes them rugged and cheap. Battery life is often better, and their lack of functions may make them easier to use. These phones may also be somewhat more secure and private due to their smaller attack surface, depending on the exact phones you are comparing and how they are used. Social media is typically funded by data harvesting, which straight up isn’t as much of a concern on a phone that doesn’t support social media apps. However, dumbphones typically run proprietary software (as do most smartphones), so the manufacturer may still be employing tracking.
A Better Solution than Dumbphones?
While the appeal of dumbphones is understandable, the movement seems like a bit of a knee-jerk reaction that stems from a lack of technical understanding of smartphones and computers in general. Smartphones are not inherently harmful; they are powerful tools when used mindfully and/or correctly. The issue lies in our habits and the ways in which we allow technology to dominate our lives. It’s unfair to blame people for becoming addicted to these technologies. They were designed by expert psychologists to be as addictive as possible. However, by taking a smart and defensive stance, we can benefit from smartphones while minimizing their risks. Therefore, rather than discarding modern phone technology altogether, a more balanced approach might involve learning to use our smartphones in a way that aligns with minimalist principles or is based on a better understanding of that technology [2]. That means first analyzing what you are currently doing on your phone and which part of your phone or phone usage is problematic for you. Next, one should define a specific goal, such as cutting screen time or social media usage. And then, rather than choosing the nuclear option of getting rid of the entire device, one should first try mindfulness or software solutions to achieve one’s goals. For instance, simply uninstalling apps, disabling certain notifications, changing screen settings such as using monochrome filters, or using parental controls or tools like Digital Wellbeing should help most people achieve their goals. If enhanced privacy and security are your goals, you might consider getting rid of or sandboxing Google Play Services. For that, consider using GrapheneOS or LineageOS.
My approach
I have personally chosen this philosophy and approach. It has allowed me to minimize my time spent on social media to near zero without missing out on any core smartphone features. GrapheneOS offers significantly more security than stock smartphones and most dumbphones with proprietary firmware.
The Drawbacks of Going Dumb
But why are dumbphones not the best solution for phone addiction and digital overload? If they achieve the main goal, what’s wrong with using them?
Well, first of all, it’s obvious that one will be missing out on a lot of modern technology in the process. Much of that technology can’t be shrugged off as simply a convenience. One will lose access to an excellent camera right in their pocket. Any media consumption will be compromised, even if it would not fall into the category of mindless consumption. Communication will get harder. Some people choose to resort to unencrypted text messages. Many pick what I would call a compromised dumbphone – a non-smartphone that still comes with WhatsApp & Co. [3] These often will also have preinstalled Facebook and web browser apps. Still others choose smartphones with unconventional form factors reminiscent of older phones, like flip phones.
These, at least to me, are not true dumbphone users, but they often give similar reasons for using such phones as actual dumbphone users do. They claim that these form factors, with their smaller screens, help them lower their phone usage and feed their nostalgic desire for a more tactile experience. Their attitude is closer to where I would suggest someone go in order to achieve a better digital lifestyle, but the same can be achieved with a regular smartphone as well [4].
Compromises
Dumbphone users still rely on many of the same technologies as smartphone users, but they must compensate for their device’s lack of capabilities. That is, at the very least, inconvenient. Instead of managing a single device, they now need to keep multiple devices charged and on hand or risk missing out. Dedicated media players, a camera, a navigation device, or an e-reader are just a few such examples. This complexity can lead to higher costs, as purchasing and maintaining multiple devices can be more expensive than one smartphone.
While some may view having fewer capabilities as an advantage, I see it as a limitation. Having options and choosing not to use them is more flexible than not having those options at all. Additionally, using a smartphone doesn’t prevent someone from also using other dedicated devices. Interestingly, many dumbphone users on platforms like YouTube still own and use smartphones [5][6][7], suggesting that their chosen solution is more complicated than necessary. Often, their problem could be solved with software adjustments or by selecting a different smartphone that better aligns with their needs.
The Nostalgia Factor behind Dumbphones
Many people apparently also prefer dumbphones for their nostalgic and tactile feel, as seen in communities like r/dumbphones. Their lack of capability is sometimes used as a social justification to engage in other nostalgic but irrational consumption, for example using a dedicated camera when one normally wouldn’t, or using an iPod or Walkman for listening to music. In my opinion, as an adult one should be able to be irrational in a conscious manner. That means it’s fine for you to use a smartphone and a Walkman at the same time just because you feel like it or you like fidgeting with physical buttons. Many dumbphone users appear to make their quirky choice of phone a part of their identity.
The Dumbphone Economy
But let’s say you have chosen the dumbphone life: please do not buy one of these newfangled “premium” dumbphones, such as the Punkt MP02. This phone, like many others in this category, leans heavily on digital detoxing and minimalism as part of its marketing. They charge an exorbitant premium with no inherent benefit. They are trying to capitalize on a movement that is inherently at least somewhat anti-consumerist. Brands like these try to monetize and milk this current trend. Instead, I would steer you toward buying any old used phone or a cheap new dumbphone, like the modern Nokias. They cost a fraction of the aforementioned premium options and do exactly the same things. Buying more new and expensive devices seems like it goes against the spirit of the movement to some degree.
Finding the Balance
The resurgence of dumbphones reflects a growing desire for simplicity and intentionality in a world overwhelmed by digital distractions. While these devices offer a straightforward escape from the pitfalls of modern smartphone use, it’s important to recognize that there are more targeted alternative solutions. A more balanced approach that involves mindful usage, setting boundaries, and embracing some of the useful aspects of modern technology is probably a better fit for most people. Either way, recognizing the numerous issues behind smartphone overuse is very important. Therefore, the dumbphone movement is on the right track, even if I disagree with their conclusion.
It’s no secret that Android Smart TVs suck, or rather that their smart features are poorly implemented. They are usually powered by bottom of the barrel hardware. They are stuffed full of proprietary software. And they stop receiving updates and security patches long before the lifetime of the actual TV is over.
That more or less forces a security-conscious user to air gap and replace them if they still wish to use basic smart features on their TV. That’s what I did to my 5-year-old Philips OLED, and I decided on the Nvidia SHIELD TV Pro, based on positive reviews online. My overall experience has been positive: performance is good and I’m getting more updates. It also comes with better features like a good remote and AI upscaling.
Alas, it’s still liable to be discontinued well before I plan on replacing my actual TV, and before the hardware itself is obsolete. Also, the OS is still beholden to the manufacturer and not the owner. Google has started implementing more and more aggressive ads right on the home screen. The entire OS sends telemetry to various vendors. And of course, you can never know what it’s actually doing: it runs constantly in a standby mode, the code can’t be audited by the user, and it even has a microphone in the remote.
A better, more long-term solution
My criteria are:
Needs to be able to play 4K HDR smoothly at high bitrates of 100+ Mbit/s.
Needs to be able to stream comfortably from Jellyfin.
No spying.
Software should remain updatable for at least 3 more years.
YouTube should work.
Some kind of remote control.
Affordable.
Similarly low power draw.
My first thought was Kodi, likely on a Raspberry Pi, or alternatively a mini multimedia PC. Some research revealed that a Pi 5 could probably just about handle my performance requirements, though not perfectly. Cost-wise, both options could be had for under 100€, and both would have excellent long-term viability. However, why pay any money if I already have a possible solution sitting in my drawer: an unused Jetson Nano 4GB, running on a Tegra X1 SoC similar to the one in my Shield TV Pro and the Nintendo Switch, with its standout feature being a fairly competent GPU.
Nvidia pretty much abandoned the Jetson Nano. Up-to-date Linux options are limited, and the available kernels are very outdated. The CUDA SDK that is required for programming its GPU doesn’t support it anymore. OpenCL on the GPU doesn’t work either. With these factors in mind I didn’t have much use for it anymore – until now. Luckily there is an up-to-date port of Android TV available, specifically LineageOS, so I decided to install it.
Installing Android TV
I decided on a plain install without GApps. For my network connection I used Ethernet. WiFi should work with a dongle, but a wired connection should be a bit faster and more reliable. Next, I sideloaded F-Droid with adb and installed the Aurora Store. I then installed the YouTube app, the Jellyfin client, Termux, VLC and a web browser. I did all that with a keyboard and mouse connected, but both my TV remote and my Logitech Harmony Hub phone app worked perfectly out of the box via HDMI-CEC.
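For reference, sideloading an APK with adb boils down to two commands. The sketch below assumes that ADB debugging is enabled on the box and that it is reachable over the network on the default port 5555. The IP address, the APK file name, the function name and the `ADB` override variable are all placeholders of mine.

```shell
# sideload_apk HOST APK
# Connects to the box over the network and installs one APK.
# Set ADB=echo for a dry run that only prints the adb arguments.
sideload_apk() {
    "${ADB:-adb}" connect "${1}:5555" &&
        "${ADB:-adb}" install "$2"
}

# Example (hypothetical address and file name):
# sideload_apk 192.168.1.50 F-Droid.apk
```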
Testing
At first, I noticed HDR being flaky in Jellyfin. It turns out that the auto bitrate adjustment picked a value that was too low for HDR. I set it manually to the highest limit and it has been working fine since. I also had issues with some files stuttering. That I could easily resolve by switching Jellyfin to using libVLC rather than ExoPlayer. Next there were some audio issues, such as muffled sound or none at all for certain video files. I fixed it by changing the settings to always down-mix 5.1 to stereo which is what I’m using anyway. I also enabled bitstreaming DTS.
Further testing revealed that YouTube still failed to stream HDR. I’ve been unable to find a solution for that as of yet, but this one I can live with. I also noticed that the fan I had attached to the Jetson didn’t want to spin, so I removed it. This indicated to me that the board was running in its default, more efficient mode and not the significantly faster MaxP mode. I didn’t quickly find a way of changing that, but since performance is smooth throughout, I don’t feel the need to use more electricity anyway.
Comparison to SHIELD TV Pro
You definitely lose some features compared to using a Shield TV Pro, the big ones to me being:
AI upscaling (I can live without but it is a loss, maybe some day it will be added to Jellyfin serverside)
YouTube HDR (likely fixable)
Chromecast (Could be added with GApps, but since YouTube device linking still works the main reason I would want to cast is covered)
The standby power consumption will also likely be higher, since I haven’t yet found a way to get the Jetson to sleep. Kodi would likely have trouble with convenient sleep handling as well, though I didn’t test it. You’ll probably want to hide your Jetson too, since it just doesn’t look as sexy as a SHIELD.
What you do gain though is privacy, security and flexibility. I also enjoy having four more USB Ports to charge my controllers.
Should you get a Jetson Nano instead of a SHIELD for Android TV?
Probably not. If you already have one, this is a great use for it. The entire setup took me a couple of hours, but in the end it works nearly as well as the SHIELD while being way more private and secure. If you don’t already have one though, you should probably do more research and consider a Raspberry Pi 5 or a multimedia mini-PC with Kodi or even Android TV first. While it’s hard for me to say whether they will provide a better experience overall, they will definitely have a longer remaining lifespan due to them being so much more popular.
2025 Update
As of 2025 I am still using this setup. I have since installed a small Noctua fan on the Jetson because it was running fairly hot, and I have partially switched to Findroid to mitigate some HDR streaming issues I was having with other apps.
In late 2023 we live in an era of super cheap storage. Be it flash-based or spinning rust, it can regularly be had for 30€ and 13€ per TB respectively. Just a few years ago you had to pick very specific high-density (for the time) disks to get under 20€ per TB. This isn’t necessarily practical if you only need moderate amounts of storage but want good redundancy. The true budget option is used magnetic storage, going for well under 3€ per TB. But how can you safely make use of worn, old disks? That’s where ZFS comes into play.
Used HDD pricing in 2023
The idea of using smaller used disks with high redundancy comes from the dramatic savings that can be achieved. I have seen several lots of 10-40 disks priced as low as 2.50€/TB. Now, using this many disks at once is somewhat impractical, even with high redundancy and (hot-)spares. But ironically, selling a part of such a lot as individual disks will increase their value, since most people don’t want the hassle of buying this many disks. Although individual disks (of this specific type) can go for similarly low prices, they can also go for up to 15€. Either way you would want to keep some spares.
Reliability and redundancy with ZFS
Now the reason why people are reluctant to buy used HDDs is their limited lifetime. Being mechanical in nature, they will die eventually. These super cheap lots of enterprise disks especially will have had a hard life with extremely high runtime in data centers. However:
These lots usually advertise their disks as 100% health.
They are enterprise grade and designed for very high uptime and high reliability.
HDDs have come a long way. Even these older models are likely to be much more reliable than the ones that caused HDDs’ reputation for high failure rates.
With that being said, you will want to plan for failing disks, data corruption and bit rot. You should do that for any kind of storage, but especially for used hard drives. And the best system that gives you both redundancy and corruption resilience at home is ZFS. ZFS is a file system with native support for RAID-style redundancy, that is, spreading data across multiple disks through mirroring or parity so that the pool survives disk failures. Unlike traditional RAID systems, however, it also provides resiliency against bit rot.
Bit rot is the degradation and corruption of data that can occur on many media. It can have many causes, but basically it means that any storage medium will lose and corrupt data over time, and the only protection against that is redundancy plus a system that can repair data from it. Traditional RAID will protect against failing hard disks, but it has no way of knowing which bits may have rotted. ZFS does: it stores a checksum for every block, so it can detect a corrupted copy and repair it from the redundant data.
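In practice, this check-and-repair pass is triggered with a scrub, which reads every block in the pool and verifies it against its checksum. A minimal sketch, assuming a pool named `tank` (a placeholder) and using my own wrapper name and `ZPOOL` override variable:

```shell
# scrub_pool POOL
# Starts a scrub, then prints the pool status including any errors found.
# Set ZPOOL=echo for a dry run that only prints the zpool arguments.
scrub_pool() {
    "${ZPOOL:-zpool}" scrub "$1" &&
        "${ZPOOL:-zpool}" status -v "$1"
}

# Example (hypothetical pool name):
# scrub_pool tank
```

The scrub itself runs in the background, so the status output right after starting it shows progress rather than the final result. With used disks it makes sense to schedule this regularly.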
How does it work?
There are already many good ZFS tutorials out there, but to keep it theoretical, for very good resilience with used disks you may want to use RAIDZ3, meaning having the capacity of 3 disks as redundancy. If you had two four-port SAS-controllers (which can be had for well under 5€ each) you might have eight disks connected and keep two as cold spares. You could sell off the rest of the lot or keep it for future expansion. The ZFS calculator suggests you would then have 12TB of usable capacity, with cold spares and very high reliability for under 40€ all in. Sounds like a good deal to me.
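As a rough sketch of what that looks like on the command line: a RAIDZ3 vdev keeps three disks’ worth of parity, so the raw data capacity is the number of disks minus three, times the disk size. The disk size and device names below are placeholders, since they depend on the lot you buy, and the example is not meant to reproduce the calculator figure above. The function names are my own; the real usable capacity will be somewhat lower due to file system overhead.

```shell
# raidz_data_tb DISKS PARITY DISK_TB
# Raw data capacity in TB before file system overhead.
raidz_data_tb() {
    echo $(( ($1 - $2) * $3 ))
}

# raidz3_create_cmd POOL DISK...
# Prints (does not run) the zpool command that would create the pool.
raidz3_create_cmd() {
    pool="$1"; shift
    echo "zpool create $pool raidz3 $*"
}

# Example with hypothetical values: eight 3 TB disks in RAIDZ3.
# raidz_data_tb 8 3 3                          # prints 15
# raidz3_create_cmd tank sda sdb sdc sdd sde sdf sdg sdh
```

Printing the command instead of running it is deliberate: `zpool create` overwrites whatever is on the listed disks, so it is worth double-checking the device names first.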
Stable Diffusion XL and Midjourney are both excellent modern AI image generators. While Stable Diffusion has the advantage of being released under a FOSS license, Midjourney has historically been one or two steps ahead in terms of overall output image quality. Since the last big release of Stable Diffusion XL, however, the gap appears to have narrowed a lot, so I decided to do a brief head-to-head comparison of the two.
Methodology
I am picking prompts and results from the Midjourney Community Showcase page. These will have been cherry-picked by their creators to make good use of the strengths of Midjourney. For Stable Diffusion I’m using the latest release of Stable Diffusion XL with the webui setup. I will do my best to get the best possible results out of Stable Diffusion using the available options, without any adjustments to the prompt itself. Keep in mind that this may be an uphill battle for Stable Diffusion and that my choice of result may be a matter of personal preference.
Stable Diffusion XL vs Midjourney comparison
Prompt: an illustration of a man curly hair dressed in futuristic clothing, in the style of white and gold, hyper-realistic sci-fi, kingcore, rtx on, delicate gold detailing, detailed world-building, photo-realistic
Images: Original picture by morra 69, created using Midjourney | First attempt, Stable Diffusion XL | Second attempt | Same prompt, CFG Scale set to 1
Both Midjourney and Stable Diffusion XL generated impressive results. Midjourney is maybe a bit more “Sci-Fi”, as requested by the prompt. The last image was generated using CFG Scale 1. A lower CFG scale is supposed to cause the model to interpret the prompt more loosely, causing more creative results. In this instance, the result still adheres quite closely to the prompt, albeit with some changes in perspective.
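For context, this is roughly what the CFG scale does under the hood. In the standard formulation of classifier-free guidance, the model makes one noise prediction with the prompt and one without it, and the scale $s$ determines how far the result is pushed toward the prompted prediction. The notation below is my own shorthand ($x_t$ for the noisy image, $c$ for the prompt, $\varnothing$ for the empty prompt), and I can only speak for Stable Diffusion here, not for Midjourney’s internals.

```latex
\hat{\epsilon}(x_t, c) = \epsilon_\theta(x_t, \varnothing)
  + s \,\bigl( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \bigr)
```

At $s = 1$ the two unconditional terms cancel and the result is just the plain prompted prediction $\epsilon_\theta(x_t, c)$, without any extra push. That fits the observation that the image still follows the prompt, only less forcefully than at the default scale of 7.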
Prompt: john wick guest stars on the animatrix
Images: Original Midjourney image by bortispananas | Stable Diffusion, CFG Scale 7 | CFG Scale 7 | CFG Scale 1
For this prompt, Stable Diffusion XL decided to go for higher visual complexity than Midjourney. This means there is more detail in the end results, but also more glitchiness. Setting CFG Scale to 1 changes the art style completely. People who know about John Wick, please tell me if the last picture contains any John Wick references.
Prompt: artefacts, double exposure, beautiful women reflecting on store window, outside, bright morning sun, high contrast, analog, 35mm, Leica
Images: Original Midjourney picture by user Hugo (the prompt is partially ignored, but the result is plausible) | SDXL, CFG Scale 7 (the reflections don’t make sense) | Second attempt (reflections are implausible again) | CFG Scale 1 (loses the thread completely)
In this very complex and challenging prompt, both models struggle. Reflections and refractions are apparently very difficult to get right. Midjourney appears to ignore parts of the prompt to get a somewhat plausible image. On the other hand, Stable Diffusion XL tries to incorporate all aspects of the prompt and then fails harder. In either case, the reflections are highly implausible.
Prompt: a giraffe parked inside a trailer, in the style of stop-motion animation, vintage-inspired designs, animated gifs, kestutis kasparavicius
Image captions: Original Midjourney image by diannedunn | Attempt 1, Stable Diffusion XL | Attempt 2, Stable Diffusion XL | CFG Scale 1
For this prompt, Midjourney basically ignored parts of the prompt again. Judging by some of the works of Kestutis Kasparavicius, only Stable Diffusion XL clearly drew inspiration from them. Midjourney didn’t even put a giraffe into the end result. On the other hand, Stable Diffusion XL is more glitchy, especially with CFG Scale 1. That may be caused by adhering to the actual prompt, even though there isn’t a lot of source material to draw from. In my opinion, this is the better approach as opposed to simply ignoring large parts of the prompt.
Conclusion
It’s hard to say conclusively from this small test that one model produces significantly better images than the other. One observation, however, is that Stable Diffusion tends to try to interpret the prompt fully, even if that causes glitches. Midjourney is (ironically?) more stable, with fewer obvious glitches, but it appears to avoid problematic components of the prompts. That may make Midjourney a bit easier to use, but also less flexible, though more testing is needed. Ultimately, while they both have their strengths, both models are more or less on the same level of quality at this point in time. It will be exciting to observe future developments for both.
I am currently evaluating a number of ways of integrating large language models into my Linux command line. Shell-AI (shai) is one of the easier ones to set up. With Shell-AI, you can simply input your intent in plain English (or other supported languages), and it will suggest single-line commands that achieve your desired outcome. It is designed to work on Linux, macOS, and Windows, though I only tested it on Linux. It’s backed by OpenAI’s GPT LLM – which is problematic for a number of reasons but also means the overall quality of the responses is cutting edge.
Features
Natural Language Input: Describe what you want to do in plain English (or other supported languages).
Command Suggestions: Get single-line command suggestions that accomplish what you asked for. Select a suggestion, dismiss or regenerate in-place.
Cross-Platform: Works on Linux, macOS, and Windows.
Shell-AI result quality
I have thrown a few benchmarks and a few hours of real world use at Shell-AI. As expected, the LLM component, which is based on gpt-3.5-turbo by default (although any OpenAI model can be configured), is top notch. Indeed, shai was able to answer most of the questions I would usually have had to Google with reasonable solutions. It also saves time by avoiding the need for copy-pasting and context switching. The surrounding implementation that wraps the GPT API is decent as well, providing multiple options and making it easy to select one. It asks for confirmation before executing each command. However, it doesn’t feature a built-in option to ask for clarification. For instance, quite often the output will feature a command chain that may be hard to understand. An option to ask GPT for an explanation would be nice, since Shell-AI’s output strips out any of the standard GPT fluff around the actual one-liner code. As a result, I found Shell-AI to be a terrible tool for learning, and a fairly risky one to use, since you end up confirming commands you may not fully understand.
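The select-then-confirm flow described above can be sketched roughly as follows. This is not Shell-AI's actual code: the function names `pick_suggestion` and `run_if_confirmed` are hypothetical, and the hardcoded suggestions stand in for whatever the model would return.

```python
import subprocess


def pick_suggestion(suggestions, index):
    """Return the suggestion at index, or None if the index is out of range."""
    if 0 <= index < len(suggestions):
        return suggestions[index]
    return None


def run_if_confirmed(command, confirm):
    """Run command in a shell only when confirm(command) returns True."""
    if command is None or not confirm(command):
        return None
    return subprocess.run(command, shell=True, capture_output=True, text=True)


# Hypothetical suggestions, standing in for the model's reply.
suggestions = ["echo hello", "echo hello world"]

chosen = pick_suggestion(suggestions, 0)
# The demo auto-confirms; a real tool would ask the user here.
result = run_if_confirmed(chosen, confirm=lambda cmd: True)
print(result.stdout.strip())
```

For interactive use, `confirm` could be something like `lambda cmd: input(f"Run {cmd!r}? [y/N] ").lower() == "y"`. An "explain this command" step would fit naturally between picking and confirming, which is exactly the part I was missing.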
OpenAI Backend issues
Shell-AI uses OpenAI’s GPT AI as a backend. That means:
You have to have an API key and pay for each call.
You need to be online at all times.
There are very serious privacy concerns despite shai itself being FOSS.
Response times are kinda slow, reducing the overall time-saving effect. With gpt-3.5-turbo, which is supposed to be the fastest current option, response time is around 8 seconds. You can choose other models, but they will be even slower and the quality gains aren’t really relevant.
Conclusion
While Shell-AI is mildly interesting and can save a significant amount of time in some situations, I won’t be keeping it around. The main issue for me is privacy, but the slow response times limit its overall usefulness as well.