多様な未来製作所 TAKAGI-1 みくすと 2025/05/27

2025/05/27

Twitter

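It has been shown that even an AI trained only on text internally develops an ability to understand images and audio.
In other words, by "reading," the AI reportedly acquired circuits that serve as the basis for "seeing" and "hearing."

Conventional wisdom has been to build separate models: an image AI for images and an audio AI for audio. This time, however, researchers at Stanford University have shown that "a single language model may be able to do many things."

That is, instead of building a new AI from scratch, an existing language AI could potentially be applied to a variety of tasks with only minor adjustments.

In the experiments, the body of the AI was left almost untouched. Only a small part around the input, plus the output, was adjusted, and that was enough for it to classify inputs such as "this is a photo of a cat" or "this is classical music."

The researchers also reportedly observed a consistent trend: the larger the AI, the more this ability improves.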
posted at 12:40:11
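
As a rough illustration of the setup described in the post above (model body left as-is, only the input side and the output adjusted), here is a minimal sketch in plain Python. It is not the paper's code: `FrozenBackboneClassifier`, the layer sizes, the mean-pooling step, and the single random matrix standing in for a pretrained text model are all assumptions made for illustration.

```python
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def count(matrix):
    """Count the entries of a matrix."""
    return sum(len(row) for row in matrix)


class FrozenBackboneClassifier:
    """Hypothetical toy: a frozen 'text model' between two small trainable layers."""

    def __init__(self, patch_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # Trainable: maps each input patch into the backbone's embedding space.
        self.input_proj = make_matrix(hidden_dim, patch_dim, rng)
        # Frozen: stand-in for the pretrained text model (assumed never updated).
        self.backbone = make_matrix(hidden_dim, hidden_dim, rng)
        # Trainable: maps the pooled representation to class scores.
        self.head = make_matrix(num_classes, hidden_dim, rng)

    def trainable_parameters(self):
        return count(self.input_proj) + count(self.head)

    def frozen_parameters(self):
        return count(self.backbone)

    def forward(self, patches):
        """Return one score per class for a list of flattened patches."""
        hidden = []
        for patch in patches:
            embedded = matvec(self.input_proj, patch)
            # ReLU after the frozen layer, a stand-in for the model's nonlinearity.
            hidden.append([max(0.0, h) for h in matvec(self.backbone, embedded)])
        # Mean-pool over patches, then classify.
        pooled = [sum(column) / len(hidden) for column in zip(*hidden)]
        return matvec(self.head, pooled)


model = FrozenBackboneClassifier(patch_dim=16, hidden_dim=64, num_classes=10)
scores = model.forward([[0.5] * 16 for _ in range(4)])
print(len(scores), model.trainable_parameters(), model.frozen_parameters())
# prints: 10 1664 4096
```

In a real setup the frozen part would presumably be a pretrained LLM with far more parameters than the two small layers around it, and only `input_proj` and `head` would be updated during training.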
Large Language Models Implicitly Learn to See and Hear Just By Reading
https://doi.org/10.48550/arXiv.2505.17091
Prateek Verma, Mert Pilanci
(Stanford University)

This paper presents a fascinating find: By training an auto-regressive LLM model on text tokens, the text model inherently develops internally an ability to understand images and audio, thereby developing the ability to see and hear just by reading. Popular audio and visual LLM models fine-tune text LLM models to give text output conditioned on images and audio embeddings. On the other hand, our architecture takes in patches of images, audio waveforms or tokens as input. It gives us the embeddings or category labels typical of a classification pipeline. We show the generality of text weights in aiding audio classification for datasets FSD-50K and GTZAN. Further, we show this working for image classification on CIFAR-10 and Fashion-MNIST, as well on image patches. This pushes the notion of text-LLMs learning powerful internal circuits that can be utilized by activating necessary connections for various applications rather than training models from scratch every single time.
posted at 12:40:19
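
The abstract says the architecture takes patches of images or audio waveforms as input. A minimal sketch of what such input preparation could look like is below. The function names, the non-overlapping layout, and the patch and frame sizes are assumptions for illustration; the abstract does not state the sizes actually used.

```python
def image_to_patches(image, patch_size):
    """Split a 2D grayscale image (list of rows) into flattened,
    non-overlapping square patches, in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[top + dy][left + dx]
                for dy in range(patch_size)
                for dx in range(patch_size)
            ])
    return patches


def waveform_to_frames(samples, frame_size):
    """Split a 1D waveform into non-overlapping frames,
    dropping any trailing samples that do not fill a frame."""
    usable = len(samples) - len(samples) % frame_size
    return [samples[i:i + frame_size] for i in range(0, usable, frame_size)]


# A 28x28 grid (the size of a Fashion-MNIST image) filled with dummy values.
image = [[row * 28 + col for col in range(28)] for row in range(28)]
patches = image_to_patches(image, patch_size=7)
frames = waveform_to_frames(list(range(10)), frame_size=4)
print(len(patches), len(patches[0]), frames)
# prints: 16 49 [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Each flattened patch or frame would then be mapped to an embedding before reaching the text model.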
Related post: https://twitter.com/ai_database/status/1924441004402430093
posted at 12:40:26
Related article:
Using multimodal LLMs to give "meaning" to image anomaly detection and improve accuracy: going beyond just finding anomalies
https://ai-data-base.com/archives/89515
posted at 12:40:35
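Japan Post has released a free official API that converts postal codes to addresses, no ken_all required. Is this for real!?!? The site design looks sharp too; I wonder if their DX team put in the work. https://guide-biz.da.pf.japanpost.jp/api/ https://x.com/izutorishima/status/1926776123339194793/photo/1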
posted at 12:52:13
