Red swan tabs

8/10/2023

A VAE (variational autoencoder) is part of the neural network model that encodes and decodes images to and from the smaller latent space, so that computation can be faster. You don't need to install a VAE file to run Stable Diffusion: any model you use, whether v1, v2, or custom, already has a default VAE built in.

When people talk about downloading and using a VAE, they mean using an improved version of it. This happens when the model trainer further fine-tunes the VAE part of the model with additional data. Instead of releasing a whole new model, which is a big file, they release only the tiny part that has been updated. An improved VAE decodes the image from the latent space more faithfully. It helps with rendering eyes and text, where all the fine details matter.

Stability AI released two variants of fine-tuned VAE decoders, EMA and MSE. (EMA stands for Exponential Moving Average and MSE for Mean Squared Error; the names refer to how each decoder was fine-tuned.)

Stability AI's comparison between EMA, MSE, and the original decoder.

Which one should you use? Stability's assessment with 256×256 images is that EMA produces sharper images while MSE's images are smoother.

In my own testing of Stable Diffusion v1.4 and v1.5 with 512×512 images, I saw good improvements in rendering eyes in some images, especially when the faces are small. I didn't see any improvement in rendering text, but I don't think many people are using Stable Diffusion for that reason anyway.

Below is a comparison between the original, EMA, and MSE using the Stable Diffusion v1.5 model. The improved VAEs either do better or change nothing.

Comparison of VAE between original, EMA, and MSE using SD v1.5. (The prompt can be found here.) Enlarge and compare the difference. Note that the garbled eyes in the original image are recovered.

Improvements to text generation are not as clear. (I added "holding a sign that says Stable Diffusion" to the prompt.)

Comparison of VAE between original, EMA, and MSE. The improvement to text rendering is unclear.

You can also use these VAEs with a custom model. I tested with some anime models but didn't see any improvements.

As a final note, EMA and MSE are compatible with Stable Diffusion v2.0. You can use them, but the effect is minimal because v2.0 is already very good at rendering eyes. I encourage you to do your own test.

Currently, there are two improved versions of the VAE released by Stability. They are ready to use in the Colab Notebook included in the Quick Start Guide. You only need to go through the trouble of setting it up once; after that, the art creation workflow stays the same.

You should use a VAE if you are in the camp of taking all the little improvements you can get, for example if you are already using face restoration like CodeFormer to fix eyes. You don't need to use a VAE if you are happy with the results you are getting; perhaps your model has already incorporated the improvement.
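The "smaller latent space" mentioned above can be made concrete. In the Stable Diffusion v1 architecture, the VAE downsamples each spatial dimension by a factor of 8 and produces 4 latent channels, so a 512×512 RGB image becomes a 64×64×4 latent. A quick sketch of the arithmetic (pure Python, numbers from the SD v1 architecture):

```python
# Sketch: why diffusing in latent space is cheaper (SD v1 numbers).
# The VAE encoder downsamples height and width by a factor of 8
# and produces 4 latent channels instead of 3 RGB channels.

def latent_shape(height, width, downscale=8, latent_channels=4):
    """Shape of the latent tensor the VAE encoder produces."""
    return (latent_channels, height // downscale, width // downscale)

image_values = 512 * 512 * 3          # values in the visible image
c, h, w = latent_shape(512, 512)      # (4, 64, 64)
latent_values = c * h * w

print(latent_shape(512, 512))         # (4, 64, 64)
print(image_values // latent_values)  # 48x fewer values to denoise
```

The diffusion U-Net therefore works on about 48× fewer values than it would in pixel space, which is the main reason latent diffusion is fast enough to run on consumer GPUs.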
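The reason a downloaded VAE file is so small, as described above, is that it contains only the autoencoder's weights; in v1-style checkpoints those live under the `first_stage_model.` key prefix, separate from the U-Net. Below is a minimal sketch of how a tool might override just that part, with plain numbers standing in for weight tensors and all key names beyond the prefix chosen for illustration:

```python
# Sketch: replacing only the VAE portion of a loaded checkpoint.
# In v1-style Stable Diffusion checkpoints, the VAE weights sit under
# the "first_stage_model." prefix; a separately released VAE file
# contains just those weights. Numbers stand in for real tensors.

def apply_vae(checkpoint: dict, vae_weights: dict) -> dict:
    """Return a copy of `checkpoint` with its VAE weights replaced."""
    merged = dict(checkpoint)
    for key, value in vae_weights.items():
        merged["first_stage_model." + key] = value
    return merged

checkpoint = {
    "model.diffusion_model.out.weight": 1.0,        # U-Net (untouched)
    "first_stage_model.decoder.conv_out.weight": 2.0,  # original VAE
}
vae_mse = {"decoder.conv_out.weight": 3.0}          # fine-tuned decoder

merged = apply_vae(checkpoint, vae_mse)
print(merged["first_stage_model.decoder.conv_out.weight"])  # 3.0
print(merged["model.diffusion_model.out.weight"])           # 1.0
```

Because only the decoder half of the VAE affects how latents turn into pixels, swapping it changes fine details like eyes without altering the composition the U-Net produces.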