TAKAGI-1 みくすと 2025/05/27

Twitter @atene_gakudo

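It has been shown that even an AI trained only on text internally develops an "ability to understand images and audio."
In other words, by "reading," the AI had reportedly acquired the circuits that underlie the abilities to "see" and "hear."

Until now, the conventional approach was to build separate AIs for images and for audio. This time, however, researchers at Stanford University have shown that "a single language model may be able to do many different things."

Put differently, rather than building a new AI from scratch, an existing language AI might be applied to a variety of tasks with only slight adjustments.

In the experiments, the body of the AI was left almost entirely as is, and only a small part of the input side and the output were adjusted. With that alone, the model became able to make classifications such as "this is a photo of a cat" or "this is classical music."

A consistent trend was also reportedly confirmed: the larger the AI, the more this ability improves.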
posted at 12:40:11
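
As a rough illustration of "leave the body as is, adjust only the input side and the output," here is a minimal pure-Python sketch. Everything in it is an assumption made for illustration: the "body" is a small fixed random matrix standing in for frozen pretrained weights (it is not an LLM), the data are two toy 2-D clusters, and names such as `train_frozen_body` are hypothetical. It is not the paper's architecture or training recipe.

```python
import math
import random


def make_toy_data(rng, n_per_class=20):
    """Two well-separated 2-D clusters standing in for input features."""
    data = []
    for label, center in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            x = [center + rng.uniform(-0.3, 0.3) for _ in range(2)]
            data.append((x, label))
    return data


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def forward(x, w_in, body, w_out, bias):
    z = matvec(w_in, x)                                   # trainable input projection
    h = [math.tanh(a) for a in matvec(body, z)]           # frozen body (stand-in)
    logit = sum(w * v for w, v in zip(w_out, h)) + bias   # trainable output head
    p = 0.5 * (1.0 + math.tanh(0.5 * logit))              # numerically stable sigmoid
    return h, p


def mean_loss(data, w_in, body, w_out, bias):
    total = 0.0
    for x, y in data:
        _, p = forward(x, w_in, body, w_out, bias)
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(data)


def train_frozen_body(seed=0, hidden=4, epochs=200, lr=0.1):
    rng = random.Random(seed)
    data = make_toy_data(rng)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
    body = [[rng.uniform(-1.0, 1.0) for _ in range(hidden)] for _ in range(hidden)]
    body_before = [row[:] for row in body]  # snapshot to verify the body stays frozen
    w_out, bias = [0.0] * hidden, 0.0
    loss_before = mean_loss(data, w_in, body, w_out, bias)

    for _ in range(epochs):
        for x, y in data:
            h, p = forward(x, w_in, body, w_out, bias)
            d_logit = p - y                                # d(loss)/d(logit) for log loss
            d_h = [d_logit * w for w in w_out]
            d_a = [dh * (1.0 - hv * hv) for dh, hv in zip(d_h, h)]
            # The gradient passes through the frozen body to reach w_in ...
            d_z = [sum(body[i][j] * d_a[i] for i in range(hidden))
                   for j in range(hidden)]
            # ... but only w_in, w_out and bias are ever updated.
            for j in range(hidden):
                for k in range(2):
                    w_in[j][k] -= lr * d_z[j] * x[k]
            w_out = [w - lr * d_logit * hv for w, hv in zip(w_out, h)]
            bias -= lr * d_logit

    correct = sum((forward(x, w_in, body, w_out, bias)[1] > 0.5) == (y == 1)
                  for x, y in data)
    return {
        "body_unchanged": body == body_before,
        "loss_before": loss_before,
        "loss_after": mean_loss(data, w_in, body, w_out, bias),
        "accuracy": correct / len(data),
    }


result = train_frozen_body()
```

The point of the sketch is only the division of labor: the middle matrix is read during both the forward and backward pass but never written to, while the small input projection and the output head absorb all of the learning.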
Large Language Models Implicitly Learn to See and Hear Just By Reading
https://doi.org/10.48550/arXiv.2505.17091
Prateek Verma, Mert Pilanci
（Stanford University）

This paper presents a fascinating find: By training an auto-regressive LLM model on text tokens, the text model inherently develops internally an ability to understand images and audio, thereby developing the ability to see and hear just by reading. Popular audio and visual LLM models fine-tune text LLM models to give text output conditioned on images and audio embeddings. On the other hand, our architecture takes in patches of images, audio waveforms or tokens as input. It gives us the embeddings or category labels typical of a classification pipeline. We show the generality of text weights in aiding audio classification for datasets FSD-50K and GTZAN. Further, we show this working for image classification on CIFAR-10 and Fashion-MNIST, as well on image patches. This pushes the notion of text-LLMs learning powerful internal circuits that can be utilized by activating necessary connections for various applications rather than training models from scratch every single time.
posted at 12:40:19
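
The abstract mentions feeding "patches of images" to the model. As a small illustration of what that input step can look like, the sketch below cuts a 2-D grid into flattened, non-overlapping patches. The function name `patchify`, the 4x4 toy image, and the 2x2 patch size are assumptions for illustration; the paper's actual patch size and preprocessing are not stated in the text above.

```python
def patchify(image, patch):
    """Split a 2-D grid (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):          # walk patch rows, top to bottom
        for left in range(0, width, patch):      # walk patch columns, left to right
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


# Toy 4x4 "image" whose pixel values are 0..15 in row-major order.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = patchify(image, 2)
# patches[0] is the top-left 2x2 block, flattened: [0, 1, 4, 5]
```

Each flattened patch would then play the role that a token plays for text input.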
Related post: https://twitter.com/ai_database/status/1924441004402430093
posted at 12:40:26
Related article:
Using multimodal LLMs to give "meaning" to image anomaly detection and improve accuracy: going beyond mere detection
https://ai-data-base.com/archives/89515
posted at 12:40:35
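Japan Post has released a free official API, as opposed to ken_all, that converts postal codes to addresses. You've got to be kidding!?!? The site design is slick too; I wonder if the DX team put in the work. https://guide-biz.da.pf.japanpost.jp/api/ https://x.com/izutorishima/status/1926776123339194793/photo/1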
posted at 12:52:13

Twitter @takagi1

#GQuuuuuuX #ジークアクス https://x.com/ShimakazeYamato/status/1926836564551725203/photo/1
posted at 05:46:07
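At the San Marino pavilion in Commons C at Expo 2025 Osaka, Kansai, the world's oldest manuscript on the life of Saint Marinus (San Marino), a candidate for UNESCO's Memory of the World register, is on display.
Me: "Is it a replica?"
?: "It's the real thing. It was brought here with several security guards escorting it."
This exhibit deserves to be better known. By the way, I was able to read it with the Google Lens translation feature lol https://x.com/polca220/status/1926793218466705590/photo/1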
posted at 06:18:38
I made a site that shows the status of same-day Expo reservations at a glance
http://expo.ebii.net https://x.com/s__hrimp/status/1926896804206887167/photo/1
posted at 06:22:18
"Chu! 可愛くてごめん" played on Omani instruments is just way too unfair lololol

And it sounds really good lol

#たかちょの大阪万博旅 https://x.com/takacho_01/status/1925402921333924204/video/1
posted at 06:31:47
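The Bahrain pavilion is low-key, overshadowed by the Turkmenistan pavilion, but it is among the most beautiful of the wooden pavilions at this Expo. It was designed by Lina Ghotmeh, a Lebanese-born woman architect.
#大阪・関西万博 https://x.com/ken_ta_rou/status/1926494442388594994/photo/1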
posted at 12:45:48
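It's true!!!!!!!!!!
Deux Murasame: I couldn't tell whether the character was male or female, but

a girl, confirmed!!!!

In the Italian subtitles,
"that child" is rendered as "ragazza (girl)"!!!!!!!!

Thank you, Italians!!!!!!!!!!!!!!!!! https://x.com/hahaha3/status/1925139475841360135/photo/1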
posted at 12:47:02
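Jirobei-niisan, who in the mood of a drinking party ends up being given a kyomei (comic-verse pen name) by Ota Nanpo that plays on Otomo no Yakamochi of the Thirty-Six Immortals of Poetry lol #大河べらぼう https://x.com/kysn/status/1926603731795149059/photo/1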
posted at 12:47:37
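It is often labeled a "counterweight," but its actual purpose is aerodynamic: the gun barrel protruding from the turret catches the airflow from forward flight and tends to swing downwind, and this device cancels that force by creating a source of air drag on the opposite side as well. https://twitter.com/F4F_4B/status/1926945794906575309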
posted at 12:50:34
Hokusai Manga is amazing
What a drawing manual
And this was in the mid-Edo period! https://x.com/eki_itsuki/status/1926932551253270547/photo/1
posted at 12:51:12

Hatena Bookmark

溝上法律特許事務所 <Osaka>: On newspaper "headlines" and the Copyright Act, the Unfair Competition Prevention Act, and tort liability
[copyright]
