Table of Contents
- 1. Introduction
- 2. Method
- 3. Experiments and Results
- 4. Review Analysis
- 5. Future Work
- 6. Further Reading
1. Introduction
Vision Transformers (ViTs) have transformed computer vision but suffer from quadratic computational complexity due to the self-attention mechanism. Conventional token pruning methods focus mainly on token importance, keeping attentive tokens and discarding inattentive ones. However, this ignores global token diversity, which is essential for model expressiveness. This paper introduces a new token partition-and-merge method that jointly optimizes for both token importance and token diversity.
Key Performance Results
DeiT-S: 35% FLOPs reduction with only a 0.2% accuracy drop
DeiT-T: 40% FLOPs reduction with a 0.1% accuracy improvement
2. Method
2.1 Token Partitioning
Based on the class token's attention scores, we split tokens into attentive and inattentive groups. The attention score of token $i$ is computed as $A_i = \text{softmax}\left(\frac{Q_{cls}K^T}{\sqrt{d}}\right)_i$, where $Q_{cls}$ is the query of the class token and $K$ is the key matrix whose $i$-th row $K_i$ is the key of token $i$.
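The partition step can be illustrated with a minimal PyTorch sketch; the tensor shapes, the fixed keep ratio, and the helper name `partition_tokens` are assumptions made for this example rather than the authors' reference implementation:

```python
# Minimal sketch of the class-token attention split (Section 2.1).
# Shapes, keep_ratio, and the function name are illustrative assumptions.
import torch

def partition_tokens(q_cls, keys, keep_ratio=0.7):
    """Split patch tokens into attentive and inattentive groups.

    q_cls: (B, d)    query of the class token
    keys:  (B, N, d) keys of the N patch tokens
    """
    d = q_cls.shape[-1]
    # A = softmax(q_cls @ K^T / sqrt(d)) over the N patch tokens
    scores = torch.softmax(
        torch.einsum("bd,bnd->bn", q_cls, keys) / d ** 0.5, dim=-1
    )
    n_keep = max(1, int(keys.shape[1] * keep_ratio))
    order = scores.argsort(dim=-1, descending=True)
    return order[:, :n_keep], order[:, n_keep:], scores  # attentive, inattentive, A
```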
2.2 Token Merging
We keep the most locally discriminative tokens from the attentive group, while merging similar inattentive tokens with a clustering algorithm. The merging step reduces information loss while promoting token diversity.
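One way to realise the merging step is a single nearest-centre assignment followed by attention-weighted averaging; this sketch, including the seeding of centres with the highest-scoring inattentive tokens and the name `merge_inattentive`, is an illustrative assumption rather than the paper's exact clustering algorithm:

```python
# Illustrative merge of similar inattentive tokens (Section 2.2).
# The clustering choice (one nearest-centre pass, attention-weighted means)
# is an assumption; the text only states that a clustering algorithm is used.
import torch
import torch.nn.functional as F

def merge_inattentive(tokens, scores, num_clusters=4):
    """tokens: (N, d) inattentive token embeddings, scores: (N,) their A_i."""
    # Seed the cluster centres with the highest-scoring inattentive tokens.
    centres = tokens[scores.topk(num_clusters).indices]            # (K, d)
    # Assign each token to its nearest centre by cosine similarity.
    assign = (F.normalize(tokens, dim=-1) @ F.normalize(centres, dim=-1).T).argmax(-1)
    merged = []
    for k in range(num_clusters):
        mask = assign == k
        if mask.any():
            w = scores[mask] / scores[mask].sum()                  # attention weights
            merged.append((w.unsqueeze(-1) * tokens[mask]).sum(0))
        else:
            merged.append(centres[k])                              # empty cluster: keep seed
    return torch.stack(merged)                                     # (K, d) merged tokens
```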
2.3 Optimization Objective
The overall objective combines importance preservation with diversity promotion: $L = \alpha L_{imp} + \beta L_{div}$, where $L_{imp}$ ensures that important tokens are preserved and $L_{div}$ encourages diversity by regularizing the clustering.
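The combined objective can be expressed in a few lines; the diversity regulariser below, which penalises pairwise cosine similarity among merged tokens, is a hypothetical stand-in for $L_{div}$, since this summary does not reproduce the paper's exact definitions:

```python
# Sketch of the combined objective L = alpha * L_imp + beta * L_div (Section 2.3).
# The concrete diversity term is an assumption, not the paper's definition.
import torch
import torch.nn.functional as F

def diversity_penalty(merged_tokens):
    """Penalise pairwise similarity among merged tokens (stand-in for L_div)."""
    z = F.normalize(merged_tokens, dim=-1)                     # (K, d)
    sim = z @ z.T                                              # (K, K) cosine similarities
    off_diag = sim - torch.eye(z.shape[0], device=z.device)
    return off_diag.abs().mean()

def total_loss(l_imp, merged_tokens, alpha=1.0, beta=0.5):
    return alpha * l_imp + beta * diversity_penalty(merged_tokens)
```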
3. Experiments and Results
3.1 Experimental Setup
We evaluate our method on ImageNet-1K using the DeiT-S and DeiT-T backbones. Baselines include DynamicViT and EViT for importance-based pruning, and simple clustering for diversity-based methods.
3.2 Performance Comparison
Our method achieves the best performance across a range of keep rates. On DeiT-S, we reduce FLOPs by 35% with only a 0.2% accuracy drop, outperforming importance-only methods, which suffer much larger accuracy degradation at low keep rates.
3.3 Ablation Study
Ablations confirm that both the importance and diversity components are necessary. Removing either component degrades performance, with diversity being especially important at low keep rates.
4. Review Analysis
Key Insight
The central contribution is the recognition that token diversity is not merely a nice-to-have; it is non-negotiable for preserving model expressiveness during pruning. While most work chases attention scores, this finding exposes a key flaw in importance-only methods: they produce echo chambers of near-duplicate attentive tokens.
Conceptual Framework
The method follows a clean three-stage pipeline: split tokens by attention, preserve the important local features, then merge strategically to retain global context. This is not merely an incremental improvement; it is an architectural rethinking that addresses the fundamental tension between efficiency and representational power.
Strengths and Weaknesses
Strengths: the dual optimization objective is mathematically principled, the experimental results are convincing across architectures, and the method couples theoretical insight with a practical implementation. The fact that DeiT-T improves in accuracy while reducing computation is remarkable.
Weaknesses: the clustering overhead is non-trivial, and the method assumes static importance scores, which may be less effective under shifting input conditions. Compared with dynamic token selection methods such as DynamicViT, there is a potential latency trade-off that should be addressed.
Actionable Insights
For practitioners: apply this method to any ViT deployment where the compute budget matters. For researchers: diversity preservation should become a standard consideration in all efficient-transformer research; it may be the missing piece needed to make ViTs truly scalable.
5. Future Work
This approach has important implications for real-time vision, edge computing, and large-scale vision systems. Its principles could extend beyond classification to object detection, segmentation, and video understanding, where computational efficiency is critical.
6. Further Reading
- Vaswani et al. "Attention Is All You Need" (2017)
- Dosovitskiy et al. "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (2020)
- Liu et al. "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" (2021)
- Wang et al. "Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions" (2021)