14-17. Trying out "VQGAN+CLIP" and "DALL·E mini", which generate images from text

What we will do

Two methods for generating images from text, "VQGAN+CLIP" and "DALL·E mini", have been released, so let's try generating fashion images with them.

Katherine Crowson and Ryan Murdoch reportedly combined "VQGAN", which uses a Transformer model to generate high-resolution images, with "CLIP", which links text and images.

Ryan Murdoch's gallery

Katherine Crowson's GitHub
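To make the combination described above more concrete, here is a minimal toy sketch of the general idea as it is commonly described: a latent vector is adjusted step by step so that the embedding of the generated image moves closer to the embedding of the text. This is not the authors' code. The real VQGAN decoder and CLIP encoders are replaced here by a fixed 3x3 linear map and a hand-picked vector, and every name (W, embed_image, optimize_latent, and so on) is hypothetical.

```python
import math
from typing import List

Vector = List[float]

# ASSUMPTION: toy stand-in for "generate an image from the latent, then
# encode it". The real pipeline uses a VQGAN decoder and a CLIP image encoder.
W = [
    [1.0, 0.5, 0.0],
    [0.0, 1.0, 0.5],
    [0.5, 0.0, 1.0],
]


def norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def embed_image(latent: Vector) -> Vector:
    """Map a latent vector to a toy 'image embedding' (matrix-vector product)."""
    return [sum(w * z for w, z in zip(row, latent)) for row in W]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def similarity_gradient(latent: Vector, text_embedding: Vector) -> Vector:
    """Gradient of the cosine similarity with respect to the latent."""
    u = embed_image(latent)
    nu, nt = norm(u), norm(text_embedding)
    dot = sum(x * t for x, t in zip(u, text_embedding))
    # Derivative of the cosine similarity with respect to the image embedding u.
    grad_u = [t / (nu * nt) - dot * x / (nu ** 3 * nt)
              for x, t in zip(u, text_embedding)]
    # Chain rule through the linear map: grad_z = W^T grad_u.
    return [sum(W[i][j] * grad_u[i] for i in range(len(W)))
            for j in range(len(latent))]


def optimize_latent(latent: Vector, text_embedding: Vector,
                    steps: int = 500, lr: float = 0.05) -> Vector:
    """Gradient ascent on the similarity between image and text embeddings."""
    z = list(latent)
    for _ in range(steps):
        grad = similarity_gradient(z, text_embedding)
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z


if __name__ == "__main__":
    text_embedding = [0.0, 1.0, 0.0]   # hypothetical embedding of a prompt
    start = [1.0, 0.0, 0.0]            # hypothetical initial latent
    result = optimize_latent(start, text_embedding)
    print("similarity before:", cosine(embed_image(start), text_embedding))
    print("similarity after: ", cosine(embed_image(result), text_embedding))
```

In this toy setting, supplying a base image would correspond to choosing the starting latent from an existing image rather than arbitrarily.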

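Boris Dayma and colleagues reportedly reproduced "DALL·E", which generates images from text, in a more lightweight form.

DALL·E mini

GitHub

Both can generate images from text, but "VQGAN+CLIP" also appears to accept a base image as input.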

Text	VQGAN+CLIP	DALL·E mini
fractal dress	text only / text + base image	text only
A woman is wearing a dress with a fractal pattern. She is standing and reading a book.	text + base image	text only
peacock dress	text only / text + base image	text only
A woman is wearing a dress with a peacock feather pattern. She is standing and reading a book.	text + base image	text only
lightning dress	text only / text + base image	text only
A woman is wearing a lightning patterned dress. She is standing and reading a book.	text + base image	text only

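For the base image, I used the following image, borrowed from the web.

The output images shown here are several images combined and arranged side by side. The tools originally output them individually.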
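Combining individually saved outputs into one sheet, as described above, can be done in a few lines. The following is a minimal sketch, not the procedure actually used for this post. It assumes the Pillow library is installed; the function names grid_offsets and combine_images and the file names in the usage note are hypothetical.

```python
from typing import List, Tuple


def grid_offsets(n_images: int, columns: int,
                 tile_w: int, tile_h: int) -> Tuple[Tuple[int, int], List[Tuple[int, int]]]:
    """Return the canvas size and the top-left paste offset of each tile."""
    if n_images <= 0 or columns <= 0:
        raise ValueError("n_images and columns must be positive")
    rows = -(-n_images // columns)  # ceiling division
    canvas = (columns * tile_w, rows * tile_h)
    offsets = [((i % columns) * tile_w, (i // columns) * tile_h)
               for i in range(n_images)]
    return canvas, offsets


def combine_images(paths: List[str], columns: int, out_path: str) -> None:
    """Paste the images at `paths` onto one sheet, filling row by row."""
    from PIL import Image  # Pillow; imported here so grid_offsets works without it

    tiles = [Image.open(p).convert("RGB") for p in paths]
    tile_w, tile_h = tiles[0].size  # every tile is resized to the first one's size
    canvas_size, offsets = grid_offsets(len(tiles), columns, tile_w, tile_h)
    sheet = Image.new("RGB", canvas_size, "white")
    for tile, offset in zip(tiles, offsets):
        sheet.paste(tile.resize((tile_w, tile_h)), offset)
    sheet.save(out_path)
```

For example, combine_images(["out_0.png", "out_1.png", "out_2.png"], columns=3, out_path="sheet.png") would place three outputs in a single row, assuming those files exist.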

fractal dress

fractal dress + base image

A woman is wearing a dress with a fractal pattern. She is standing and reading a book. + base image

peacock dress

peacock dress + base image

A woman is wearing a dress with a peacock feather pattern. She is standing and reading a book. + base image

lightning dress

lightning dress + base image

A woman is wearing a lightning patterned dress. She is standing and reading a book. + base image

fractal dress

A woman is wearing a dress with a fractal pattern. She is standing and reading a book.

peacock dress

A woman is wearing a dress with a peacock feather pattern. She is standing and reading a book.

lightning dress

A woman is wearing a lightning patterned dress. She is standing and reading a book.

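What makes these methods appealing is that they can generate images that do not exist anywhere in the world, and they are expected to give designers new inspiration. Design schemes that make use of AI will likely continue to develop.

For now, however, even if clothing like this were released, the "designed by AI" label would take center stage, and the clothing would not be judged on the same footing as existing designs. I think we need to wait for the AI boom to settle down.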