Zaɓi Harshe

DafnyBench: Ma'auni don Tabbatar da Software na Yau da Kullun

DafnyBench shine mafi girman ma'auni don horarwa da kimanta tsarin kwaikwayon na'ura don tabbatar da software na yau da kullun, yana ɗauke da shirye-shirye sama da 750 tare da layukan lamba sama da 53,000.
computationaltoken.com | PDF Size: 0.5 MB
Kima: 4.5/5
Kimarku
Kun riga kun ƙididdige wannan takarda
Murfin Takardar PDF - DafnyBench: Ma'auni don Tabbatar da Software na Yau da Kullun

Tsarin Abubuwan Ciki

750+

Shirye-shirye a cikin Ma'auni

53,000+

Layukan Lamba

68%

Mafi kyawun Kimar Nasarar

10x

Rage Farashin Tabbatarwa

1 Gabatarwa

Manyan Samfuran Harshe (LLMs) suna hanzarta ci gaban software ta hanyar haɗin gwiwar matukin jirgi da kayan aikin haɗa shirye-shirye, amma tabbatar da amincin lamba yana ci gaba da zama kalubale. Tabbatarwa na yau da kullun yana ba da hujjar lissafi cewa software ya cika ƙayyadaddun bayanai, duk da haka amfani da shi yana iyakance ta hanyar tsada mai yawa da ƙwararrun koyo. DafnyBench yana magance wannan gibi a matsayin mafi girman ma'auni don horarwa da kimanta tsarin ML a cikin tabbatarwa na yau da kullun.

2 Ayyukan Da suka Danganta

Ma'auni na yau da kullun kamar Clover (shirye-shirye 66) da dafny-synthesis (shirye-shirye 153) ba su isa don horon ML na zamani ba. Ma'auni na tabbatar da ka'idar lissafi sun ƙunshi ka'idodi sama da 100,000 tare da nasarorin AI sama da 82%, suna nuna buƙatar irin wannan sikelin a cikin tabbatar da software.

3 Gina Ma'auni

3.1 Tsarin Bayanai

DafnyBench ya ƙunshi shirye-shirye sama da 750 tare da kusan layukan lambar Dafny 53,000, wanda ya wuce ma'auni na baya duka biyun girma da rikitarwa.

3.2 Bukatun Nuni

Yawancin shirye-shirye suna buƙatar ƙarin nuni don mai tabbatar da ka'idar kai tsaye. Waɗannan nuni suna jagorantar tsarin tabbatarwa kuma suna wakiltar ƙarin ilimin da ake buƙata bayan ainihin aiwatarwa.

4 Kimanta Aikin LLM

4.1 Tsarin Gwaji

Ana gwada ikon GPT-4 da Claude 3 na samar da nuni ta atomatik don injin tabbatar da Dafny. Kimantawa yana auna yawan nasara a cikin rikitattun shirye-shirye daban-daban da buƙatun nuni.

4.2 Bincike na Sakamako

Mafi kyawun samfuri da tsarin ƙarfafawa sun sami kimar nasara ta 68%. Aiki yana inganta tare da martanin saƙon kuskure amma yana lalacewa tare da ƙaruwar rikitarwar lamba da buƙatun nuni. Yiwuwar nasarar tabbatarwa yana biye da: $P_{success} = \frac{1}{1 + e^{-(\alpha - \beta \cdot C)}}$ inda $C$ ke wakiltar rikitarwar lamba kuma $\alpha$, $\beta$ siffofi ne na musamman na samfuri.

Yawan Nasarar Tabbatarwa vs. Rikitarwar Lamba

Zanen yana nuna alaƙar sabanin tsaka-tsaki tsakanin rikitarwar lamba da yawan nasarar tabbatarwa. Shirye-shiryen da ke buƙatar nuni sama da layuka 50 suna nuna ƙimar nasara ƙasa da 50%, yayin da sauƙaƙan shirye-shirye ke samun nasarar tabbatarwa har zuwa 85%.

5 Ƙarshe da Ayyukan Gaba

DafnyBench yana ba da damar haɓaka cikin sauri a cikin sarrafa tabbatarwa na yau da kullun. Aikin gaba ya haɗa da faɗaɗa bambancin ma'auni, inganta samarwar nuni na LLM, da haɗa tabbatarwa kai tsaye cikin hanyoyin tarawa.

6 Bincike na Fasaha

Ra'ayi na Manazin Masana'antu

Kai Tsaye Ga Matsala (Cutting to the Chase)

DafnyBand ba wani atisaye ne na ilimi kawai ba—yunkuri ne na dabarun kafa hanyar haɗi tsakanin lambar da AI ta samar da software mai shirye don samarwa. Ƙimar nasara ta 68% ta bayyana duka alƙawari da kuma gaskiyar mai raɗaɗi: yayin da LLMs zasu iya taimakawa wajen tabbatarwa, mun yi nisa da cikakken sarrafa amincin kai tsaye.

Sarkar Ma'ana (Logical Chain)

Binciken yana bin ci gaba mai ban sha'awa: gano matsalar toshewar tabbatarwa na yau da kullun → gane ƙarancin bayanan horarwa na ML → gina babban ma'auni → gwada iyawar LLM na yanzu → kafa tushe don ingantawa na gaba. Wannan yayi daidai da yanayin hangen nesa na kwamfuta bayan gabatarwar ImageNet, inda ma'auni na daidaitaccen tsari ya hanzarta ci gaba ta hanyar oda-oda na girma.

Abubuwan Haske da Matsaloli (Highlights and Pain Points)

Abubuwan Haske: Girman bai taɓa yin irinsa ba—layukan lamba 53,000 da aka tabbatar sun fi girman ƙoƙarin baya. Mayar da hankali kan Dafny yana da dabaru, yana amfani da tsarinsa mai kama da Python don ɗaukar kowa. Tsarin martanin saƙon kuskure yana nuna fahimtar injiniyan aiki.

Matsaloli: Ƙimar nasara ta 68%, ko da yake abin burgewa, tana nufin kashi 32% na gazawar—wanda ba a yarda da shi ba don tsarin mahimmancin muhimmanci. Rarraba rikitarwar ma'auni ba a bayyane ba, yana sa ya yi wahala a tantance inda ake buƙatar ingantaccen mafi yawan buƙata. Kamar yawancin ma'auni na ilimi, yana iya fuskantar haɗarin wuce gona da iri yayin da samfuran suka inganta don wannan takamaiman bayanan.

Abubuwan Kafa Aiki (Actionable Insights)

Ga ƙungiyoyin injiniya: Fara haɗa kayan aikin tabbatarwa na yau da kullun yanzu, ko da a wani ɓangare. Rage farashin daga 10x zuwa kusan sifiri yana zuwa da sauri fiye da yadda yawancin ƙungiyoyi suka sani. Ga masu bincike: Mayar da hankali kan lamuran gazawar—fahimtar dalilin da ya sa kashi 32% na shirye-shirye suka ƙi tabbatarwa zai bayyana iyakoki na asali a cikin hanyoyin na yanzu. Ga masu saka hannun jari: Kayan aikin tabbatarwa na yau da kullun yana wakiltar babbar dama yayin da amincin software ya zama abin da ba za a iya sasantawa ba a cikin tsarin kai tsaye, kiwon lafiya, da kuɗi.

Wannan aikin yana kan haɗuwa da yawancin sauye-sauye masu canzawa: masana'antar AI, rikicin amincin software a cikin tsarin mahimmanci, da kuma balagaggen hanyoyin na yau da kullun. Kama da yadda ImageNet ya kawo juyin juya hali ga hangen nesa na kwamfuta, DafnyBench yana da yuwuwar haifar da irin wannan ci gaban a cikin tabbatar da software. Maganar ma'auni na tabbatar da ka'idar lissafi suna samun nasarorin kashi 82% yana nuna muna kusan shekaru 4-5 daga irin wannan aikin a cikin tabbatar da software, dangane da tarihin ci gaban ma'auni kamar waɗanda aka kwatanta a cikin takardar CycleGAN da sauran ingantaccen sauri.

Hanyar fasaha ta amfani da nuni azaman maƙasudai na tsaka-tsaki na tabbatarwa yana da fahimta musamman. Yana haifar da matsalar koyo mai sauƙi ga LLMs yayin kiyaye ƙaƙƙarfan cikakken tabbatarwa. Wannan tsari mai yadudduka yayi daidai da nasarar dabarun a wasu yankuna na AI, kamar amfani da hanyoyin kulawa a cikin tsarin canzawa wanda ya haifar da ci gaban bincike a cikin sarrafa harshe na halitta.

Duk da haka, binciken ya bar tambayoyin da ba a amsa ba game da haɓakawa bayan yanayin Dafny da kuma farashin lissafi na tabbatarwa a sikeli. Yayin da ƙungiyoyi kamar NASA da kamfanonin motoci ke ƙara ba da umarnin tabbatarwa na yau da kullun don tsarin aminci mai mahimmanci, tasirin tattalin arziƙin rage farashin tabbatarwa daga 10x zuwa kusan sifiri za a iya auna shi cikin biliyoyin daloli kuma, mafi mahimmanci, hana bala'o'i.

7 Aiwar Lamba

Misalin Tabbatar da Dafny

method ComputeSum(n: int) returns (sum: int)
  requires n >= 0
  ensures sum == n * (n + 1) / 2
{
  sum := 0;
  var i := 0;
  while i <= n
    invariant sum == i * (i - 1) / 2
    invariant i <= n + 1
  {
    sum := sum + i;
    i := i + 1;
  }
}

Wannan hanyar Dafny tana lissafin jimlar lambobin farko na n tare da tabbatarwa na yau da kullun. Magana ta requires tana ƙayyadaddun sharuɗɗan gaba, ensures tana ƙayyadaddun sharuɗɗan baya, kuma invariant yana kula da daidaiton madauki.

8 Aikace-aikacen Gaba

Haɗa tabbatarwa na yau da kullun cikin masu tarawa azaman mataki na ƙarshe na yau da kullun. Tabbatar da tsarin kai tsaye don motoci da sararin samaniya. Tabbatar da kwangilar wayo don aikace-aikacen blockchain. Tabbacin software na na'urar likita. Kare abubuwan more rayuwa masu mahimmanci.

9 Nassoshi

  1. Leino, K. R. M. (2010). Dafny: Mai tabbatar da shiri ta atomatik don daidaiton aiki. LPAR-16.
  2. Brown, T. B., et al. (2020). Samfuran harshe masu koyo kaɗan. NeurIPS.
  3. Irving, G., et al. (2016). DeepMath-Samfuran jeri mai zurfi don zaɓin farko. NeurIPS.
  4. Avizienis, A., et al. (2004). Asali ra'ayoyi da taxonomy na dogaro da amintaccen kwamfuta. IEEE Transactions.
  5. Zhu, J. Y., et al. (2017). Fassarar hoto zuwa hoto mara bi da bi ta amfani da hanyoyin sadarwar adawa da juna. ICCV.
  6. Amazon Web Services (2023). Tabbatarwa na Yau da Kullun a cikin Tsarin Samarwa.
  7. Microsoft Research (2022). Aiwatar da Hanyoyin Na Yau da Kullun a Sikeli.