Enhancing Product Photography with AI: A Tech Note from Planningo
The age of searching for the image you want is over. Now is the age of creating it yourself.
Claims that AI can create images as realistic as photographs have been around for years, but only recently have models emerged that actually produce realistic, natural-looking results.
Some notable examples include DALL-E, Midjourney, and Stable Diffusion. It has been reported that major companies like Google and Meta are also developing their own AI systems for image generation.
Planningo, which focuses on creating product content, researched which AI to use and how to use it so that product content can be created easily. If AI makes this possible, we expect it to be widely used for preparing detailed product pages and promotional materials for online stores.
The representative text-to-image models we identified are as follows.
DALL-E
DALL-E is a text-to-image model created by OpenAI, the company best known for ChatGPT. It focuses on generating images from text descriptions, creatively producing an image that matches a given sentence or phrase. For example, if the text input is “banana-shaped sofa,” it generates an image of a banana-shaped sofa placed in a living room. DALL-E turns text into visual elements to create creative and unique images.
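As a rough illustration only (not Planningo's workflow), this is roughly how such a text-to-image request looks with the OpenAI Python SDK; the model name, prompt, and size below are assumptions for the example.

```python
# Minimal sketch: generate one image from a text prompt with the OpenAI SDK.
# Assumes OPENAI_API_KEY is set in the environment; model name is illustrative.
from openai import OpenAI

client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="a banana-shaped sofa in a cozy living room",
    size="1024x1024",
    n=1,
)

print(response.data[0].url)  # URL of the generated image
```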
Midjourney
Midjourney is an AI model that utilizes Generative Adversarial Networks (GANs). GANs are used to generate realistic images that are difficult to distinguish from real ones, and Midjourney uses this technology to create realistic yet creative images. There was also controversy when an AI-generated artwork won first place at an art competition, and that artwork was created with Midjourney.
Stable Diffusion
Stable Diffusion is an image generation model developed by Stability AI, focusing on stable image generation. This model aims to alleviate the instability issues that occur during the image generation process and achieve more consistent image generation.
In addition, there is ControlNet, which can partially steer how an image generation model behaves. After researching the features of these models, we decided to use Stable Diffusion for our service. Of course, the other image generation models also perform excellently.
So why did we choose Stable Diffusion?
In practice, Stable Diffusion was the only real option for us. What we need for generating product content is not text-to-image but image-to-image: to create product content, a natural background has to be generated around the product, starting from the input product image, so an image must be available as an input. The existence of ControlNet was also significant. Through ControlNet, Planningo was able to create more product-specific images with the Stable Diffusion model.
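As a minimal sketch of the image-to-image idea described above, the following uses the open-source diffusers library; the checkpoint, prompt, and strength value are illustrative assumptions, not Planningo's actual pipeline or settings.

```python
# Minimal image-to-image sketch with diffusers: the product photo is the
# starting point instead of pure noise, and the prompt describes the scene.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical input file name; any RGB product photo works.
product = Image.open("product.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="product photo on a marble table, soft studio lighting",
    image=product,
    strength=0.6,        # how much the model may redraw around the product
    guidance_scale=7.5,
).images[0]

result.save("product_with_background.png")
```

Lowering strength keeps more of the original photo intact; raising it lets the model redraw more of the scene.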
What is ControlNet?
ControlNet is a model that gives control over how Stable Diffusion generates images. We wanted to create a more natural background around the product and generate an image that looks as if the product had originally been placed in that scene. For this, we used the canny (edge detection) variant of ControlNet.
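Below is a minimal sketch of how a canny-based ControlNet can be combined with Stable Diffusion using the diffusers library; the checkpoints, prompt, and edge thresholds are assumptions for illustration, not Planningo's production configuration.

```python
# Minimal ControlNet (canny) sketch: edges extracted from the product photo
# constrain where Stable Diffusion may draw, so the product outline is kept.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Build the canny edge map that ControlNet uses as a conditioning image.
product = np.array(Image.open("product.png").convert("RGB").resize((512, 512)))
edges = cv2.Canny(product, 100, 200)
edges = np.stack([edges] * 3, axis=-1)   # single channel -> 3-channel image
control_image = Image.fromarray(edges)

result = pipe(
    prompt="product standing on a wooden shelf, natural morning light",
    image=control_image,
    num_inference_steps=30,
).images[0]

result.save("product_in_scene.png")
```

The canny edge map keeps the product's outline fixed while the rest of the scene is free to change, which is what makes the result look as if the product had been photographed in the generated background.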