…

三日坊主すぎる。ちゃんとログ取ろう。

AI が…

だいぶ熱いそうで、Chat GPT 使ってみたり Stable diffusion 使ってみたりしてる。

GUI からは操作できるんだけど、システム化しようと思ったら GUI からしか使えないのは辛い。

ひとまず pubsub に job 入れると非同期で画像を生成して storage に上げる、みたいな仕組みを作りたい。

Stable diffusion

とりあえずコマンドラインから実行できるようにしたい。

https://github.com/CompVis/stable-diffusion をずっと見てたんだけど、v1 だった・・・
https://github.com/Stability-AI/stablediffusion が最新

README の通り以下を実行する。

conda install pytorch==1.12.1 torchvision==0.13.1 -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .

conda は brew で入れた。

んで pip3 install -r requirements.txt で必要なものをインストールする。

pip3 install -r requirements.txt

生成っぽいコマンドを実行。大きいと遅いので一旦小さめで吐き出してみる。

$ python3 scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt ~/Downloads/anything-v4.5-pruned-fp32.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 256 --W 256
Global seed set to 42
Loading model from /Users/sat8bit/Downloads/anything-v4.5-pruned-fp32.ckpt
Global Step: 0
No module 'xformers'. Proceeding without it.
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Downloading (…)ip_pytorch_model.bin: 100%|█████████████████████████████████████████████████████████| 3.94G/3.94G [01:35<00:00, 41.4MB/s]
Traceback (most recent call last):
  File "/Users/sat8bit/Works/src/github.com/Stability-AI/stablediffusion/scripts/txt2img.py", line 388, in <module>
    main(opt)
  File "/Users/sat8bit/Works/src/github.com/Stability-AI/stablediffusion/scripts/txt2img.py", line 219, in main
    model = load_model_from_config(config, f"{opt.ckpt}", device)
  File "/Users/sat8bit/Works/src/github.com/Stability-AI/stablediffusion/scripts/txt2img.py", line 35, in load_model_from_config
    m, u = model.load_state_dict(sd, strict=False)
  File "/Users/sat8bit/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
        size mismatch for model.diffusion_model.input_blocks.1.1.proj_in.weight: copying a param with shape torch.Size([320, 320, 1, 1]) from checkpoint, the shape in current model is torch.Size([320, 320]).
        size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for model.diffusion_model.input_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([320, 320, 1, 1]) from checkpoint, the shape in current model is torch.Size([320, 320]).
        size mismatch for model.diffusion_model.input_blocks.2.1.proj_in.weight: copying a param with shape torch.Size([320, 320, 1, 1]) from checkpoint, the shape in current model is torch.Size([320, 320]).
        size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for model.diffusion_model.input_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([320, 320, 1, 1]) from checkpoint, the shape in current model is torch.Size([320, 320]).
        size mismatch for model.diffusion_model.input_blocks.4.1.proj_in.weight: copying a param with shape torch.Size([640, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 640]).
        size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for model.diffusion_model.input_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([640, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 640]).
        size mismatch for model.diffusion_model.input_blocks.5.1.proj_in.weight: copying a param with shape torch.Size([640, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 640]).
        size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).

ぐぬぬ。引き続き調査が必要。
手元で動かすにはそもそもまだ足りてない知識が多すぎる。。

Chat GPT の API を使ってみよう。

ちょっと気分を変えて Chat GPT の API を叩いてみる。

$ curl https://api.openai.com/v1/completions -H "Content-Type: application/json" -H "Authorization: Bearer $OPENAI_API_KEY" -d '{
  "model": "text-davinci-003",
  "prompt": "Correct this to standard English:\n\nShe no went to the market.",
  "temperature": 0,
  "max_tokens": 60,
  "top_p": 1.0,
  "frequency_penalty": 0.0,
  "presence_penalty": 0.0
}'
{
    "error": {
        "message": "You exceeded your current quota, please check your plan and billing details.",
        "type": "insufficient_quota",
        "param": null,
        "code": null
    }
}

ああそうか無料枠ないのか・・・とりあえずカードを指して、API Key を作り直して再度サンプルを叩いてみる。

url https://api.openai.com/v1/completions -H "Content-Type: application/json" -H "Authorization: Bearer $OPENAI_API_KEY" -d '{
  "model": "text-davinci-003",
  "prompt": "Correct this to standard English:\n\nShe no went to the market.",
  "temperature": 0,
  "max_tokens": 60,
  "top_p": 1.0,
  "frequency_penalty": 0.0,
  "presence_penalty": 0.0
}'
{"id":"cmpl-7MtFPbCYga67PNDTKnmTirogYPJYF","object":"text_completion","created":1685689943,"model":"text-davinci-003","choices":[{"text":"\n\nShe did not go to the market.","index":0,"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":15,"completion_tokens":10,"total_tokens":25}}

おー簡単。すごい。日本語の文章を作ってみる。

$ curl https://api.openai.com/v1/completions -H "Content-Type: application/json" -H "Authorization: Bearer $OPENAI_API_KEY" -d '{
  "model": "text-davinci-003",
  "prompt": "初学者向けの Python の記事を markdown で作ってください。",
  "temperature": 0,
  "max_tokens": 600,
  "top_p": 1.0,
  "frequency_penalty": 0.0,
  "presence_penalty": 0.0
}'
{"id":"cmpl-7MtIMuBf3SUNGxysIrDnL4IZnDlEZ","object":"text_completion","created":1685690126,"model":"text-davinci-003","choices":[{"text":"\n\n# Python とは\nPython はオープンソースのプログラミング言語です。Python は、様々なプログラミングタスクを容易に行うことができるように設計されています。Python は、Web アプリケーション開発、データ分析、機械学習など、さまざまな分野で使用されています。\n\n# Python の基本\nPython は、非常に簡単な文法を持つプログラミング言語です。Python でプログラミングを行うには、次のような基本的な概念を理解する必要があります。\n\n* 変数: プログラムで使用するデータを格納するための仮想的な箱です。\n* 条件分岐: プログラムが特定の条件を満たした場合に実行される処理を定義します。\n* ループ: プログラムが特定の処理を繰り返し実行するための仕組みです。\n* 関数: プログラムで何度も使用する処理をまとめて定義します。\n\n# Python の学習方法\nPython を学習するためには、次のような方法があります。\n\n* オンラインコース: Python の基本的な概念を学ぶために、オンラインコースを受講することができます。\n* チュートリアル: Python の基本的な概念を学ぶ","index":0,"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":31,"completion_tokens":600,"total_tokens":631}}

なんか生成できそう。 jq 食わせておけばよかった。。

生成された markdown を整形。

# Python とは
Python はオープンソースのプログラミング言語です。Python は、様々なプログラミングタスクを容易に行うことができるように設計されています。Python は、Web アプリケーション開発、データ分析、機械学習など、さまざまな分野
  使用されています。

# Python の基本
Python は、非常に簡単な文法を持つプログラミング言語です。Python でプログラミングを行うには、次のような基本的な概念を理解する必要があります。

* 変数: プログラムで使用するデータを格納するための仮想的な箱です。
* 条件分岐: プログラムが特定の条件を満たした場合に実行される処理を定義します。
* ループ: プログラムが特定の処理を繰り返し実行するための仕組みです。
* 関数: プログラムで何度も使用する処理をまとめて定義します。

# Python の学習方法
Python を学習するためには、次のような方法があります。

* オンラインコース: Python の基本的な概念を学ぶために、オンラインコースを受講することができます。
* チュートリアル: Python の基本的な概念を学ぶ

ここまでで 25 + 631 = 656 トークン。 Pricing 見てみる。

davinci は以下の通りなので、

$0.0200 / 1K tokens

0.0200 * 656 / 1000 = 0.01312 なので、円に直すと 1.3円くらい。

なんかコンテンツを自動で生成して継続的に拡充するコンテンツとかは面白いかも？まだよくわからない

その他

markdown の preview の font を変える

markdown の preview (特にコードブロックが等幅になってないの）が見づらすぎて耐えられなくなったので font 変える方法を模索。

markdown.styles という設定値で css を指定できるらしい。

準備しないといけないのか・・って思ったけどよく考えたら hugo 用の css があるのでそれを使う。

  "markdown.styles": ["themes/hugo-notepadium/assets/css/style.css"]

それっぽくなったけどなんかなぁって思い、もう少し詳しく devtools で調べてみる。

あれ？ editor の font がデフォルトで当たるようになってる・・・

editor の Font は HackGenNerd Console を指定していたんだけど、これが当たってないのかな？

FontBook で見てみたらフォントがインストールされてないではないか・・・！

ということでちゃんとインストールされているフォントを当てたら治った。

  "editor.fontFamily": "HackGen Console NF",

２年前のときはインストールされていたのか、或いは設定したつもりでやっていたのか・・・

VScode で書いている markdown への画像貼り付け

２年前に log 取る用の環境整備した際に書いた記事読みながら書いてる。

https://sat8bit.github.io/posts/vscode-cheatsheet/

画像貼り付けできなかったっけな・・って思ったら Paste Image のショートカットは Cmd + Option + V だった。えらい。

Daily/2023/06/02

…

AI が…

Stable diffusion

Chat GPT の API を使ってみよう。

その他

markdown の preview の font を変える

VScode で書いている markdown への画像貼り付け

Copilot のレコメンドが面白かった

Next