I Gave My Wife AI-Generated Art of Her For Valentine’s Day

A few months ago, Jason Lengstorf from Learn with Jason tweeted a thread featuring AI art with his likeness. I was instantly amazed by the possibilities of this technology. I promptly added it to my to-do list and mostly forgot about it.

In early February, with Valentine’s Day rapidly approaching, I began wondering what I should do for my wife this year. We have a tradition of getting takeout and eating at home – sushi was the choice for this year. We also try to do something special for each other. It doesn’t need to be big or expensive – just enough to make the other smile.

Let’s Skip to the Good Part

The results were better than I ever expected. It took a few days of learning and tweaking, but eventually the images started to look like my wife. For reference, here’s a picture of us on her birthday last year at the Red Butte Garden concert series (which is our absolute favorite concert series here in Salt Lake).

Waiting for “The Head And The Heart” to perform

I had a few ideas on what type of image I wanted to create. She was an art major for a time during college, and art nouveau has always been one of her favorite styles. Our first-ever date was to see Spirited Away, making a Ghibli-inspired portrait another contender (one I would try and fail at by the way).

Like with everything in my life, I created a spreadsheet (😂) of ideas based on things she likes. I didn’t know what I’d be able to generate and wanted to cast a wide net. Stardew Valley? Kirby? Zelda? Moulin Rouge? Spirited Away? Spider-man? Crazy Rich Asians? The Devil Wears Prada? Wandavision? Manga style? Pixar style? Comic book style? What would be possible – and what would actually look good?

I’ll detail the process of getting here later in this article, but first the results!

Art nouveau, Sandman, Punk Rainbow

My absolute favorite was this combination of art nouveau, with a bit of punk rock mixed with Delirium from Sandman. I was amazed at how much this captured her likeness.

Art Nouveau French Poster #1

This more traditional art nouveau was less photorealistic and had more of an 1880s vibe. I love the smirk on this one.

Stained Glass Mermaid

At some point, I stumbled on a stained glass prompt that someone else wrote. Changing the subject was all that was needed to create this.

Vintage Japanese Ukiyo

This retro Japanese and ukiyo-inspired portrait was posted on Reddit. A few changes of “woman” to “Wife person” (the 2-word identifier for her) and we had this.

Minimalistic Art Nouveau Portrait

Whether highly detailed or minimalistic, what was important was to find results that were Marilyn-y enough. This minimalistic art-nouveau portrait shows what’s possible with a smaller color palette.

Art Nouveau French Poster #2

I’ve been trying to bring more color into my life recently. Fewer gray shirts and bolder color choices. This next one was generated using the same prompt as Art Nouveau French Poster #1 above, only with a different seed. I absolutely love the variety of colors.

Generating each of these images only took about 15 seconds once everything was prepared.

On the morning of Valentine’s Day, I pulled up my laptop and let my wife know I had a gift for her. We went through these photos one by one, pausing to see how each one attempted to capture her likeness. I shared another dozen photos created on MidJourney as well, but none of those were created using her actual face.

The morning was filled with laughs, gasps, wide eyes, and some side eye. It achieved what I had set out to do: make my wife smile.

If we rewind a week before, there was a problem: I had absolutely no idea how to do any of this. Sure, I’d joined the MidJourney Discord, but only to create abstract art to use for a few posts on this blog (1, 2, 3). I had no idea how to add someone’s likeness to the mix.

For the rest of this post, I’m going to share the specifics of what I learned and show you how to create similar artwork that will delight, impress and possibly terrify the loved ones in your life.

What You’ll Need to Know and Do

Here are the steps at a very high level. Some of these will be run on your local computer, and some of them on a cloud computer. The goal at the end of this is to be able to generate AI art locally – without even needing an internet connection.

#1 Get The Stable Diffusion Web UI Running Locally

To generate custom images with someone’s likeness, you’ll need to run everything locally. I recommend downloading and running Stable Diffusion web UI. It’s a Python app that you run from the command line that creates a locally accessible website you can use to generate images.

Set the checkpoint you want to use and other settings.

I was able to set it up and run it on my M1 MacBook Pro using the Installation on Apple Silicon instructions. The only part I needed to change was locking in a specific version of fastapi after downloading the code but before running anything. Adding this one line to the requirements.txt file fixed everything and we were off!

fastapi==0.90.1

Well, almost. Once you run ./webui.sh, you’ll get a message saying you need to download at least one model to run. So… What’s a model?

#2 Using Your First AI Model

Models are the magic behind AI Art. This is an AI model, not a runway model. You’ll also see these called “checkpoints”, or files with the .ckpt extension.

These checkpoint files contain the model’s weights: the numbers it learned during training.

Stable Diffusion’s breakthrough is that its creators found a way to feed 2 billion+ images (and descriptions of those images) into an algorithm and have it generate a file that’s only 2 GB, but contains the essence of all of the images.

Then, they asked people to generate images based on that model and have people rate the results. This further trained the model to understand which results were “good”.

There are models behind MidJourney and DALL-E as well. In those cases the models are private and consumers can’t download and use them.

Fortunately, there are a bunch of existing models you can download to use on your local computer. There are models that include a little bit of everything, down to the ultra-specific. You can download a model to create Avatar characters, watercolor art, Ghibli characters, faces, hands, water, clouds, and, well, every body part.

Side note: There’s a lot still to be figured out about the ethics of AI art. Currently, artists and creators are having their artwork stolen and used to feed these models. In a lot of ways, AI Art feels like the Napster days of digital music. The genie is out of the bottle, but we still need to find a way to compensate creators and put restrictions on the use of our own likenesses.

While searching for a model to use I started with the deliberate_v11 model, deliberate_v11.ckpt. A content warning here: this model contains everything including NSFW content. If you want a model that is more SFW, sd_v1-5_vae.ckpt is a good starting point.

Download one or both of these files and put them in the models/Stable-diffusion folder of the WebUI project. Start the WebUI from the command line. At the top of the page, there’s a place to set your checkpoint (model).
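If you want to double-check that the files landed in the right place, here’s a minimal sketch that lists the checkpoints in that folder along with their sizes. The function name list_checkpoints is my own invention for illustration, and the path assumes you run it from the root of the WebUI project.

```python
from pathlib import Path


def list_checkpoints(models_dir):
    """Return (file name, size in GB) for each .ckpt file in models_dir, sorted by name."""
    folder = Path(models_dir)
    if not folder.is_dir():
        # Nothing to list if the folder does not exist yet.
        return []
    found = []
    for path in sorted(folder.glob("*.ckpt")):
        size_gb = path.stat().st_size / 1024**3
        found.append((path.name, round(size_gb, 2)))
    return found


# Assumed location: the current directory is the root of the WebUI checkout.
for name, size_gb in list_checkpoints("models/Stable-diffusion"):
    print(f"{name}: {size_gb} GB")
```

If nothing prints, the folder is empty or you’re running it from a different directory.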

Most models I’ve used are around 2 GB to 8 GB. The larger models are trained on more data or include multiple models merged together. There’s no way to know what images went into them.

This is effectively what a model is. It’s a representation of the rules that make up a much larger set of data. You can’t generate an exact version of the Mona Lisa, but you can ask for “The Mona Lisa, starry night style”.

That doesn’t mean you’ll get exactly what you want.

“The Mona Lisa, starry night style”, sd_v1-5_vae, seed 330838989

This isn’t great. There are multiple subjects and a picture frame. Looking at this you can start to guess what went wrong. The model was likely fed with thousands of photographs of tourists taking selfies in front of the Mona Lisa. A part of the model thinks “Mona Lisa” means 2 faces and a picture frame.

Stable Diffusion has a few solutions for this.

One is generating multiple images. You can generate an infinite number of images based on this text – all different. Some of them won’t have a frame or multiple heads.

Another option is negative prompts. Negative prompts allow you to give it keywords you don’t want to appear in the final image. You might think “oh, we can just add a negative prompt for ‘frame’, ‘multiple faces’, and ‘selfie’ and it’ll fix this!”. Sadly that’s not always the case. I intended for that to be the case when I was writing this, but it doesn’t always work out. 😅

A third option is to use the img2img mode. With this mode, you can upload a picture (the original Mona Lisa in this case), and use that as a starting point. For the top image of my wife above I used img2img mode with an art nouveau image I love.

Sometimes the best solution is to create more images and then find the one that looks best. Here’s the same prompt (“The Mona Lisa, starry night style”) with a negative prompt (“multiple faces, frame, gallery, selfie, picture frame, duplicate, border, double”), generating 16 random versions.

16 random images for “The Mona Lisa, starry night style”.

Some of these are better than others. That brings me to the next step.

#3 Learn How to Write Detailed Prompts

If there is one takeaway from this article, here it is:

You’ll spend most of your time iterating on your prompt.

I’ve found that in order to generate the ideal image in my head I often need to iterate on it dozens of times with minor tweaks. That’s one of the wonderful parts about this – it’s all local. You can iterate on it as much as you want. It’s part of having a safe sandbox to work in, which enhances creativity.

One other thing to note here: Stable Diffusion will generate this image based on a random seed. If you copy this seed and rerun the prompt with it, you’ll get the same image again. If one of these images in this contact sheet is amazing, I can rerun it and tweak the prompt to create what I’m looking for.
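To see why a seed makes a result repeatable, here’s a small analogy using Python’s built-in random module. This is not how Stable Diffusion generates its noise internally (it uses its own number generator); starting_noise is a made-up helper that only shows the general idea that the same seed reproduces the same “random” starting point.

```python
import random


def starting_noise(seed, count=4):
    """Return `count` pseudo-random values from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


# The same seed always reproduces the same values...
same_a = starting_noise(330838989)
same_b = starting_noise(330838989)
# ...while a different seed gives a different starting point.
other = starting_noise(4222144451)

print(same_a == same_b)
print(same_a == other)
```

The two seeds are the ones quoted in this post; any integers would behave the same way.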

Learning to write prompts is its own skill that takes practice, trial, and error. I recommend reading through the Stable Diffusion Prompt book by OpenArt.ai. It’s free, and teaches the basics of how to format your prompt. The tl;dr is that you want something like this:

A <style> of <subject>, <modifiers & adjectives>, by <author>

You can specify multiple subjects, styles and authors, but this is a good starting point. There are a bunch of YouTube tutorials on writing prompts (this is a good starting point), and likely a ton of courses.
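If it helps to think of that template as code, here’s a rough sketch. build_prompt is a hypothetical helper of mine, not something from the prompt book or the WebUI. It just glues the pieces together in the order shown above.

```python
def build_prompt(style, subject, modifiers=(), authors=()):
    """Assemble 'A <style> of <subject>, <modifiers>, by <author>' from its parts."""
    parts = [f"A {style} of {subject}"]
    parts.extend(modifiers)
    if authors:
        parts.append("by " + " and ".join(authors))
    return ", ".join(parts)


prompt = build_prompt(
    style="minimalistic portrait",
    subject="a woman",
    modifiers=["art nouveau style", "flowing hair"],
    authors=["Alphonse Mucha"],
)
print(prompt)
# A minimalistic portrait of a woman, art nouveau style, flowing hair, by Alphonse Mucha
```

Keeping the pieces separate like this makes it easier to change one modifier at a time while you iterate.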

Before you get too far, I’d recommend you go into your WebUI settings and enable logging. You’ll want it to save every prompt you write, the seed, the negative prompt, and the output. That way if you do generate something you’re proud of you’ll be able to look back and see why.

I recommend trying to write some on your own but also copying some prompts by others. I’ve found a few places useful for finding prompts:

Stable Diffusion Subreddit
Lexica
OpenArt.ai
Stable Diffusion Prompts Page
MidJourney’s Discord

If you see something you like, try to recreate it. Many shared prompts don’t include the exact seed used to generate them. That just means you won’t be able to recreate the image exactly. But you might end up with something even better!

#4 Prepare Photos of Your Loved One

This is the part that gets a little tricky. It’s still relatively easy, but I had to stretch a little outside of my comfort zone.

First, you’re going to need to prepare a bunch of images of the person you want to generate images of. For this, I used the JoePenna/Dreambooth-Stable-Diffusion walkthrough, which recommends the following mix of shots:

2-3 Full Body
3-5 Upper body
5-12 Close-up on the face

The more different these images are the better. I originally created a model with my wife smiling in every image and it generated images that were similar to that.
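Here’s a quick sketch for checking your photo pile against the recommended counts listed above. The category names and the check_photo_mix helper are made up for illustration; only the ranges come from the walkthrough.

```python
# Recommended (minimum, maximum) photo counts, taken from the list above.
RECOMMENDED_MIX = {
    "full_body": (2, 3),
    "upper_body": (3, 5),
    "close_up": (5, 12),
}


def check_photo_mix(counts):
    """Return notes for every category that falls outside the recommended range."""
    notes = []
    for category, (low, high) in RECOMMENDED_MIX.items():
        have = counts.get(category, 0)
        if have < low:
            notes.append(f"{category}: have {have}, add at least {low - have} more")
        elif have > high:
            notes.append(f"{category}: have {have}, consider trimming to {high}")
    return notes


notes = check_photo_mix({"full_body": 2, "upper_body": 1, "close_up": 14})
for note in notes:
    print(note)
```

An empty result means your mix is within the recommended ranges. It says nothing about variety, which you’ll still need to judge by eye.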

You’ll need to crop these down to only the subject. There’s no need for an entire background or other people. Trim the images to just the body, upper body or face. For the face ones, I included my wife’s entire head. I just used Preview for cropping them down.

I’ve seen a few guides that require specific heights and widths for these images, but all of mine were different sizes.

According to the guide, I needed to upload these to have a public URL. You can use Imgur, your own hosting space or anywhere else with a public link.

Once those are set you’re ready to create the model.

#5 Creating A Personalized Model

The process of creating the model looks like this:

Head over to the JoePenna/Dreambooth-Stable-Diffusion repo, starting at the “Vast.AI Instructions” section.
Spin up a server in the cloud.
Create a Python notebook.
Clone the repo.
Go through the instructions (~2 hours).
Save the ckpt model locally.
Turn off your Vast.ai server.

This step isn’t free. Training 2 models (because I messed up at first) cost me about $13, partly because I had the server running while I was choosing and cropping images (step #4 above). Hopefully, it’ll go faster for you if you already have those images ready.

There’s not much I can add to JoePenna’s guide. If you make a mistake you can always delete your entire server and start over or head over to the Discord server and ask for help there.

One tip: you’ll be asked to make a few choices. Here’s what I decided on:

Download the 1.5 model from Hugging Face – I used the one provided (panopstor/EveryDream/sd_v1-5_vae.ckpt).
Regularization Images – skipped since we’re providing our own.
Download pre-generated regularization images – I used “person_ddim” here. Originally I used the “woman” one. “person” worked better, but by that point, I was also writing better prompts, so 🤷.
In the Training section, make sure you indicate you’re training a face and name your subject (token).

Make sure you remember your token and class_word. If your class_word is person and your token is FirstnameLastname, then you’ll need to write your prompts in the format “An oil portrait of FirstnameLastname person sitting in a garden”. If you forget these your model will be useless.
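Since every prompt needs that token and class_word pair, here’s a small sketch for swapping it into a prompt you borrowed from someone else. Both function names are hypothetical, and “FirstnameLastname person” is just the placeholder from the paragraph above.

```python
import re


def subject_phrase(token, class_word):
    """Return the two-part identifier the custom model was trained on."""
    return f"{token} {class_word}"


def personalize(prompt, generic_subject, token, class_word):
    """Replace a generic subject word (e.g. 'woman') with the trained identifier."""
    pattern = rf"\b{re.escape(generic_subject)}\b"
    replacement = subject_phrase(token, class_word)
    # A function is used as the replacement so the text is inserted literally.
    return re.sub(pattern, lambda match: replacement, prompt)


borrowed = "An oil portrait of woman sitting in a garden"
personalized = personalize(borrowed, "woman", "FirstnameLastname", "person")
print(personalized)
# An oil portrait of FirstnameLastname person sitting in a garden
```

The word-boundary match keeps it from touching longer words that merely contain the generic subject.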

There are instructions to save the model to Google Drive, but you can skip that if you download the model locally. Save this file in the models/Stable-diffusion folder and start using it!

If I ask for “An oil portrait of Wife person”, I already start to see her likeness in the results (although not the most flattering).

A few oil portraits generated from the sd_1.5 model + a custom model.

These were generated with the model from the guide, using no negative prompt and a batch size of 4. They’re exactly what I asked for and almost nothing more.

#6 Merging the Wife Model into another Model

At this point, your model is based on the sd_v1-5_vae.ckpt model if you’ve followed along. You could use this and create some very targeted images. In fact, I think you’ll probably want to pause here and just try it out.

When you want a different style, you can merge your custom model into another model using the Checkpoint Merger in the WebUI:

In this case, I wanted to merge together the deliberate_v11 model + my wife’s specific part of the model created earlier.

To do this you want to merge together A + (B-C). “A” is the new model you want to use. “B” is my wife model, which was generated using the sd_v1-5_vae.ckpt model. “C” is the same sd_v1-5_vae.ckpt model that was used to generate it originally.

For this configuration, you’ll want to “Add difference”.

You can optionally bake in a VAE. I had absolutely no idea what this meant, but after some research, I found these are often used for additional clarity.

VAE stands for variational autoencoder. It is the part of the neural network model that encodes and decodes the images to and from the smaller latent space, so that computation can be faster.
How to use a VAE

Think about it as a way to improve the fine details of an image. The VAE I found and used for the images I created is the vae-ft-mse-840000-ema-pruned.ckpt VAE. Download this file and put it in the models/VAE directory.

You can decide to bake this into your new model, or head into settings and use it in addition to your model.
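To make the “smaller latent space” from the quote above a bit more concrete, here’s some back-of-the-envelope arithmetic. It assumes the usual Stable Diffusion v1 layout, where the VAE shrinks each side of the image by a factor of 8 and the latent has 4 channels. Both function names are mine.

```python
def latent_shape(width, height, downscale=8, latent_channels=4):
    """Return (channels, height, width) of the latent for an image of this pixel size."""
    if width % downscale or height % downscale:
        raise ValueError("width and height must be multiples of the downscale factor")
    return (latent_channels, height // downscale, width // downscale)


def compression_ratio(width, height, image_channels=3):
    """How many times fewer values the latent holds than the RGB image."""
    channels, lat_h, lat_w = latent_shape(width, height)
    return (width * height * image_channels) / (channels * lat_h * lat_w)


# 576 x 864 is the image size used for the portraits in this post.
print(latent_shape(576, 864))       # (4, 108, 72)
print(compression_ratio(576, 864))  # 48.0
```

Under those assumptions, the model works on roughly 48 times fewer numbers than the final image contains. The VAE decodes that small grid back to full-size pixels, which is why it affects fine detail.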

If I ask this new model for “An oil portrait of Wife person” I get completely different results.

A few oil portraits generated with deliberate_v11 + wife model.

Both have a unique style to them. One of the fun parts for me was trying out different models and seeing which ones looked the most like her.

There’s a setting during the model merge for “multiplier” which indicates how much it should weigh the model you’re adding (the wife model). The lower this is, the less it’ll weigh the new model. After some experimentation, I set this to 0.9, indicating that it should weigh the wife model heavily.
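Here’s a toy sketch of the A + (B-C) merge with the multiplier applied. As I understand it, “Add difference” computes A + (B - C) × multiplier for every weight. Real checkpoints hold huge tensors rather than single numbers, and add_difference is a simplified stand-in of my own, not the WebUI’s actual code.

```python
def add_difference(model_a, model_b, model_c, multiplier):
    """Merge weight dicts as A + (B - C) * multiplier, key by key."""
    merged = {}
    for key, a_value in model_a.items():
        if key in model_b and key in model_c:
            merged[key] = a_value + (model_b[key] - model_c[key]) * multiplier
        else:
            # Keys missing from B or C carry over from A unchanged.
            merged[key] = a_value
    return merged


# Toy "models": one number per named weight instead of large tensors.
new_style = {"w1": 1.0, "w2": 2.0}  # A: the model whose style you want
wife = {"w1": 0.5, "w2": 3.0}       # B: custom model trained on top of C
base = {"w1": 0.0, "w2": 2.0}       # C: the base model B was trained from

merged = add_difference(new_style, wife, base, multiplier=0.9)
print(merged)
```

B minus C isolates what training on the photos changed, and the multiplier scales how much of that change gets added to A. At 0 you’d get A back untouched; at 1 the full difference is applied.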

Putting it All Together

After quite a bit of model tweaking, prompt tweaking, and research I ended up with a few styles I liked. If you’re curious to create something like this for yourself, here are the settings I used for it. It was a combination of pieces I copied from various other prompts and tweaked over and over again until it felt right.

Mode: img2img with this base image: https://imgur.com/a/J17udIG
Model: deliberate_v11 + custom wife model
Prompt: beautiful Wife person as delirium from sandman, art nouveau style, Alphonse Mucha, retro, flowing hair, flowing dress, line art, over the shoulder, nature, purple hair, petite, snthwve style nvinkpunk drunken, (hallucinating colorful soap bubbles), by jeremy mann, by sandra chevrier, by dave mckean and richard avedon and maciej kuciara, punk rock, tank girl, high detailed, 8k
Negative prompt: cartoon, 3d, ((disfigured)), ((bad art)), ((deformed)),((extra limbs)),((close up)),((b&w)), wierd colors, blurry, (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), Photoshop, video game, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, body out of frame, blurry, bad art, bad anatomy, 3d render
Sampling method: Euler A
Size: 576w, 864h
Seed: 4222144451
Denoising strength: 0.75
CFG Scale: 7
Restore faces: on
VAE: vae-ft-mse-840000-ema-pruned
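If you’d like to keep settings like these in your own notes, here’s one possible way to record a run as a single line of JSON. The GenerationSettings class and its field names are hypothetical; they aren’t the WebUI’s log format or API. The two prompts are cut short to keep the example readable.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class GenerationSettings:
    """Hypothetical record of one generation run; field names are illustrative only."""
    mode: str
    model: str
    prompt: str
    negative_prompt: str
    sampler: str
    width: int
    height: int
    seed: int
    denoising_strength: float
    cfg_scale: float
    restore_faces: bool
    vae: str


settings = GenerationSettings(
    mode="img2img",
    model="deliberate_v11 + custom wife model",
    prompt="beautiful Wife person as delirium from sandman, art nouveau style",  # shortened
    negative_prompt="cartoon, 3d, ((disfigured)), ((bad art))",  # shortened
    sampler="Euler A",
    width=576,
    height=864,
    seed=4222144451,
    denoising_strength=0.75,
    cfg_scale=7,
    restore_faces=True,
    vae="vae-ft-mse-840000-ema-pruned",
)

# One JSON line per run is easy to append to a notes file and search later.
line = json.dumps(asdict(settings))
restored = GenerationSettings(**json.loads(line))
print(restored == settings)
```

Saving the seed alongside the prompt is the important part, since that’s what lets you regenerate the same image later.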

One thing that was interesting to me with the result was how important it was to weigh my wife’s model at 0.9. Using this exact same prompt with a 0.5 weight gave a completely different image. You can still spot some inspiration from my wife’s likeness, but it doesn’t have the same soul.

Next Steps

The AI art field is only going to grow. As someone who’s only drawn a handful of things I’m proud of in my life, I’m captivated by the ability to generate artwork using only words. I’m also excited by how many people who lack the skills (or drive) to create art in other ways will find they can be creative in this medium.

To say I’m scared of the potential for misuse is an understatement. This technology is so incredibly easy to use that the scale of misuse is hard to even fathom. Even for this post, I ran everything by my wife before posting it here.

Having this as a secret project gave me something to work towards. If I had been trying to learn how to generate AI art for the purpose of a blog post or just for fun I wouldn’t have learned as quickly or had anywhere near as much fun.

I’d like to do something else in the AI art space, but I’m not sure what. Having a challenge has been a lot of fun. Having a community to learn from has been amazing. The MidJourney Discord has a daily theme that’s a lot of fun to follow and learn from. Something I think would help is having a community with a weekly or monthly challenge with everyone sharing their creations along the way. If you’re aware of something like that, I’d love to hear about it.