




A Beginner's Guide to Large Language Models
Part 1

Contributors: Annamalai Chockalingam, Ankur Patel, Shashank Verma, Tiffany Yeung

Table of Contents
Preface
Glossary
Introduction to LLMs
What Are Large Language Models (LLMs)?
Foundation Language Models vs. Fine-Tuned Language Models
Evolution of Large Language Models
Neural Networks
Transformers
How Enterprises Can Benefit From Using Large Language Models
Challenges of Large Language Models
Ways to Build LLMs
How to Evaluate LLMs
Notable Companies in the LLM Field
Popular Startup-developed LLM Apps
Preface

Language has been integral to human society for thousands of years. A long-prevailing theory, the laryngeal descent theory (LDT), suggests that speech, and thus language, may have evolved about 200,000 to 300,000 years ago, while newer research suggests it could have emerged even earlier.

Regardless of when it first appeared, language remains the cornerstone of human communication. It has taken on an even greater role in today's digital age, where an unprecedented portion of the population can communicate via both text and speech across the globe.

This is underscored by the fact that 347.3 billion email messages are sent and received worldwide every day, and that five billion people, or over 63% of the entire world population, send and receive text messages.

Language has therefore become a vast trove of information that can help enterprises extract valuable insights, identify trends, and make informed decisions. For example, enterprises can analyze texts like customer reviews to identify their products' best-selling features and fine-tune their future product development.

Similarly, language production, as opposed to language analysis, is also becoming an increasingly important tool for enterprises. Creating blog posts, for example, can help enterprises raise brand awareness to a previously unheard-of extent, while composing emails can help them attract new stakeholders or partners at an unmatched speed.

However, both language analysis and production are time-consuming processes that can distract employees and decision-makers from more important tasks. For instance, leaders often need to sift through vast amounts of text to make informed decisions, instead of making them based on key information extracted for them.

Enterprises can minimize these and other problems, such as the risk of human error, by employing large language models (LLMs) for language-related tasks. LLMs can help enterprises accelerate and largely automate their efforts related to both language production and analysis, saving valuable time and resources while improving accuracy and efficiency.

Unlike previous solutions, such as rule-based systems, LLMs are incredibly versatile and can easily be adapted to a wide range of language-related tasks, like generating content or summarizing legal documentation.
The goal of this book is to help enterprises understand what makes LLMs so groundbreaking compared to previous solutions and how they can benefit from adopting or developing them. It also aims to help enterprises get a head start by outlining the most crucial steps in LLM development, training, and deployment.

To achieve these goals, the book is divided into three parts:

> Part 1 defines LLMs and outlines the technological and methodological advancements over the years that made them possible. It also tackles more practical topics, such as how enterprises can develop their own LLMs and the most notable companies in the LLM field. This should help enterprises understand how adopting LLMs can unlock cutting-edge possibilities and revolutionize their operations.

> Part 2 discusses five major use cases of LLMs within enterprises, including content generation, summarization, and chatbot support. Each use case is illustrated with real-life apps and case studies to show how LLMs can solve real problems and help enterprises achieve specific objectives.

> Part 3 is a practical guide for enterprises that want to build, train, and deploy their own LLMs. It provides an overview of necessary prerequisites and the possible trade-offs of different development and deployment methods. ML engineers and data scientists can use this as a reference throughout their LLM development processes.

Hopefully, this will inspire enterprises that have not yet adopted or developed their own LLMs to do so soon in order to gain a competitive advantage and offer new SOTA services or products. As usual, the greatest benefits will be reserved for early adopters and truly visionary innovators.
Glossary

Deep learning systems
Systems that rely on neural networks with many hidden layers to learn complex patterns.

Generative AI
AI programs that can generate new content, like text, images, and audio, rather than just analyze it.

Large language models (LLMs)
Language models that recognize, summarize, translate, predict, and generate text and other content. They're called large because they are trained on large amounts of data and have many parameters, with popular LLMs reaching hundreds of billions of parameters.

Natural language processing (NLP)
The ability of a computer program to understand and generate text in natural language.

Long short-term memory neural network (LSTM)
A special type of RNN with more complex cell blocks that allow it to retain more past inputs.

Natural language generation (NLG)
A part of NLP that refers to the ability of a computer program to generate human-like text.

Natural language understanding (NLU)
A part of NLP that refers to the ability of a computer program to understand human-like text.

Neural network (NN)
A machine learning algorithm in which the parameters are organized into consecutive layers. The learning process of NNs is inspired by the human brain. Much like humans, NNs "learn" important features via representation learning and require less human involvement than most other approaches to machine learning.

Perception AI
AI programs that can process and analyze but not generate data, mainly developed before 2020.

Recurrent neural network (RNN)
A neural network that processes data sequentially and can memorize past inputs.
Rule-based system
A system that relies on human-crafted rules to process data.

Traditional machine learning
Traditional machine learning uses a statistical approach, drawing probability distributions of words or other tokens based on a large annotated corpus. It relies less on rules and more on data.

Transformer
A type of neural network architecture designed to process sequential data non-sequentially.

Structured data
Data that is quantitative in nature, such as phone numbers, and can be easily standardized and adjusted to a pre-defined format that ML algorithms can quickly process.

Unstructured data
Data that is qualitative in nature, such as customer reviews, and difficult to standardize. Such data is stored in its native formats, like PDF files, before use.

Fine-tuning
A transfer learning method used to improve model performance on selected downstream tasks or datasets. It's used when the target task is similar to the pre-training task and involves copying the weights of a PLM and tuning them on desired tasks or data.

Customization
A method of improving model performance by modifying only one or a few selected parameters of a PLM instead of updating the entire model. It involves using parameter-efficient techniques (PEFT).

Parameter-efficient techniques (PEFT)
Techniques like prompt learning, LoRA, and adapter tuning, which allow researchers to customize PLMs for downstream tasks or datasets while preserving and leveraging the existing knowledge of PLMs. These techniques are used during model customization and allow for quicker training and often more accurate predictions.

Prompt learning
An umbrella term for two PEFT techniques, prompt tuning and p-tuning, which help customize models by inserting virtual token embeddings among discrete or real token embeddings.

Adapter tuning
A PEFT technique that involves adding lightweight feed-forward layers, called adapters, between existing PLM layers and updating only their weights during customization while keeping the original PLM weights frozen.

Open-domain question answering
Answering questions from a variety of different domains, like legal, medical, and financial, instead of just one domain.

Extractive question answering
Answering questions by extracting the answers from existing texts or databases.
Throughput
A measure of model efficiency and speed. It refers to the amount of data or the number of predictions that a model can process or generate within a pre-defined timeframe.

Latency
The amount of time a model needs to process input and generate output.

Data Readiness
The suitability of data for use in training, based on factors such as data quantity, structure, and quality.
Introduction to LLMs

A large language model is a type of artificial intelligence (AI) system that is capable of generating human-like text based on the patterns and relationships it learns from vast amounts of data. Large language models use a machine learning technique called deep learning to analyze and process large sets of data, such as books, articles, and web pages.

Large language models unlocked numerous unprecedented possibilities in the field of NLP and AI. This was most notably demonstrated by the release of OpenAI's GPT-3 in 2020, the then-largest language model ever developed.

These models are designed to understand the context and meaning of text and can generate text that is grammatically correct and semantically relevant. They can be trained on a wide range of tasks, including language translation, summarization, question answering, and text completion.

GPT-3 made it evident that large-scale models can accurately perform a wide, and previously unheard-of, range of NLP tasks, from text summarization to text generation. It also showed that LLMs could generate outputs that are nearly indistinguishable from human-created text, all while learning on their own with minimal human intervention.
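As a tiny, hedged illustration of this kind of prompt-driven text generation, here is a minimal sketch assuming the Hugging Face transformers library is installed; GPT-2 is used only as a small, freely downloadable stand-in for GPT-3, which is available only through OpenAI's API.

```python
# A minimal sketch, assuming the Hugging Face transformers library.
# GPT-2 stands in for the much larger GPT-3 discussed in this guide.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Large language models can help enterprises", max_new_tokens=30)
print(result[0]["generated_text"])  # the prompt continued with model-generated text
```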
This represented an enormous improvement over earlier, mainly rule-based models that could neither learn on their own nor successfully solve tasks they weren't trained on. It is no surprise, then, that many other enterprises and startups soon started developing their own LLMs or adopting existing LLMs in order to accelerate their operations, reduce expenses, and streamline workflows.

Part 1 is intended to provide a solid introduction and foundation for any enterprise that is considering building or adopting its own LLM.
What Are Large Language Models (LLMs)?

Large language models (LLMs) are deep learning algorithms that can recognize, extract, summarize, predict, and generate text based on knowledge gained during training on very large datasets.

They're also a subset of a more general technology called language models. All language models have one thing in common: they can process and generate text that sounds like natural language. This is known as performing tasks related to natural language processing (NLP).
Although all language models can perform NLP tasks, they differ in other characteristics, such as their size. Unlike other models, LLMs are considered large for two reasons:

1. They're trained using large amounts of data.
2. They comprise a huge number of learnable parameters (i.e., representations of the underlying structure of training data that help models perform tasks on new or never-before-seen data).
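To make "learnable parameters" concrete, here is a minimal sketch assuming PyTorch and the Hugging Face transformers library; GPT-2 is used only as a small, publicly downloadable stand-in for the much larger models discussed below.

```python
# A minimal sketch, assuming PyTorch and Hugging Face transformers are installed.
# GPT-2 is a small illustrative stand-in for models like MT-NLG or GPT-3 Davinci.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Every weight and bias tensor in the network is a block of learnable parameters;
# summing their element counts gives the "number of parameters" people quote.
num_params = sum(p.numel() for p in model.parameters())
print(f"GPT-2 has roughly {num_params / 1e6:.0f} million learnable parameters")
```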
Table 1 showcases two large language models, MT-NLG and GPT-3 Davinci, to help clarify what's considered large by contemporary standards.

Table 1. Comparison of MT-NLG and GPT-3

Large Language Model | Number of parameters | Number of tokens in the training data
NVIDIA Model: Megatron-Turing Natural Language Generation Model (MT-NLG) | 530 billion | 270 billion
OpenAI Model: GPT-3 Davinci Model | 175 billion | 499 billion
Since the quality of a model heavily depends on the model size and the size of the training data, larger language models typically generate more accurate and sophisticated responses than their smaller counterparts.
Figure 1. Answer Generated by GPT-3.
However, the performance of large language models doesn't just depend on the model size or data quantity. The quality of the data matters, too.

For example, LLMs trained on peer-reviewed research papers or published novels will usually perform better than LLMs trained on social media posts, blog comments, or other unreviewed content. Low-quality data like user-generated content may lead to all sorts of problems, such as models picking up slang, learning incorrect spellings of words, and so on.

In addition, models need very diverse data in order to perform various NLP tasks. However, if the model is intended to be especially good at solving a particular set of tasks, it should be fine-tuned using a more relevant and narrower dataset. Doing so transforms a foundation language model, one that's good at performing various NLP tasks across a broad set of domains, into a fine-tuned model that specializes in performing tasks in a narrowly scoped domain.
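To make that narrowing step concrete, here is a minimal fine-tuning sketch assuming the Hugging Face transformers and datasets libraries; the model name, the file domain_corpus.txt, and the hyperparameters are illustrative placeholders, not recommendations from this guide.

```python
# A minimal fine-tuning sketch, assuming Hugging Face transformers and datasets.
# "distilgpt2" and "domain_corpus.txt" are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# A narrow, domain-specific text file stands in for the broad pre-training corpus.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # updates the foundation model's weights on the narrow dataset
```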
Foundation Language Models vs. Fine-Tuned Language Models

Foundation language models, such as the aforementioned MT-NLG and GPT-3, are what is usually referred to when discussing LLMs. They're trained on vast amounts of data and can perform a wide variety of NLP tasks, from answering questions and generating book summaries to completing and translating sentences.

Thanks to their size, foundation models can perform well even when they have little domain-specific data at their disposal. They have good general performance across tasks but may not excel at any one specific task.

Fine-tuned language models, on the other hand, are large language models derived from foundation LLMs. They're customized for specific use cases or domains and, thus, become better at performing more specialized tasks.

Apart from the fact that fine-tuned models can perform specific tasks better than foundation models, their biggest strength is that they are lighter and, generally, easier to train. But how does one actually fine-tune a foundation model for specific objectives?

Currently, the most popular method is customizing a model using parameter-efficient customization techniques, such as p-tuning, prompt tuning, adapters, and so on. Customization is far less time-consuming and expensive than fine-tuning the entire model, although it may lead to somewhat poorer performance than other methods. Customization methods are further discussed in Part 3.
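As an illustration of one such technique, here is a minimal sketch of low-rank adaptation (LoRA), assuming the Hugging Face transformers and peft libraries are installed; the base model and settings are illustrative assumptions, not values recommended by this guide.

```python
# A minimal parameter-efficient customization sketch using LoRA,
# assuming the Hugging Face peft and transformers libraries.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("distilgpt2")  # illustrative base PLM

# Small, trainable low-rank matrices are injected into selected weight matrices;
# the original PLM weights stay frozen, which is what makes the method parameter-efficient.
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
peft_model = get_peft_model(base_model, lora_config)

# Typically well under 1% of the weights end up trainable.
peft_model.print_trainable_parameters()
```

Because the foundation model's weights stay frozen, only the small set of adapter weights needs to be stored for each downstream task, while the shared foundation model remains unchanged.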
Evolution of Large Language Models

AI systems were historically about processing and analyzing data, not generating it. They were oriented more toward perceiving and understanding the world around us than toward generating new information. This distinction marks the main difference between Perception AI and Generative AI, with the latter becoming increasingly prevalent since around 2020, after companies started adopting transformer models and developing increasingly robust LLMs at a large scale.

The advent of large language models further fueled a revolutionary paradigm shift in the way NLP models are designed, trained, and used. To truly understand this, it may be helpful to compare large language models to previous NLP models and how they worked. For this purpose, let's briefly explore three regimes in the history of NLP: pre-transformer NLP, transformer NLP, and LLM NLP.

1. Pre-transformer NLP was mainly marked by models that relied on human-crafted rules rather than machine learning algorithms to perform NLP tasks. This made them suitable for simpler tasks that didn't require too many rules, like text classification, but unsuitable for more complex tasks, such as machine translation. Rule-based models also performed poorly in edge-case scenarios because they couldn't make accurate predictions or classifications for never-before-seen data for which no clear rules were set. This problem was somewhat solved with simple neural networks, such as RNNs and LSTMs, developed during the later phases of this period. RNNs and LSTMs could memorize past data to a certain extent and, thus, provide context-dependent predictions and classifications. However, RNNs and LSTMs could not make predictions over long spans of text, limiting their effectiveness.

2. Transformer NLP was set in motion by the rise of the transformer architecture in 2017. Transformers could generalize better than the then-prevailing RNNs and LSTMs, capture more context, and process more data at once. These improvements enabled NLP models to understand longer sequences of data and perform a much wider range of tasks. However, from today's point of view, models developed during this period had limited capabilities, mainly due to the general lack of large-scale datasets and adequate computational resources. They also mainly sparked attention among researchers and experts in the field but not the general public, as they weren't user-friendly or accurate enough to become commercialized.

3. LLM NLP was mainly initiated by the launch of OpenAI's GPT-3 in 2020. Large language models like GPT-3 were trained on massive amounts of data, which allowed them to produce more accurate and comprehensive NLP responses compared to previous models. This unlocked many new possibilities and brought us closer to achieving what many consider "true" AI. LLMs also made NLP models much more accessible to non-technical users, who could now solve a variety of NLP tasks just by using natural-language prompts. NLP technology was finally democratized.

The switch from one methodology to another was largely driven by relevant technological and methodological advancements, such as the advent of neural networks, attention mechanisms, and transformers, and by developments in the field of unsupervised and self-supervised learning. The following sections briefly explain these concepts, as understanding them is crucial for truly understanding how LLMs work and how to build new LLMs from scratch.
Neural Networks

Neural networks (NNs) are machine learning algorithms loosely modeled after the human brain. Like the biological human brain, artificial neural networks consist of neurons, also called nodes, that are responsible for all model functions, from processing input to generating output.

The neurons are further organized into layers, vertically stacked components of NNs that perform specific tasks related to input and output sequences.
Every neural network has at least three layers:

> The input layer accepts data and passes it to the rest of the network.

> The hidden layer, or multiple hidden layers, performs specific functions that make the final output of an NN possible. These functions can include identifying or classifying data, generating new data, and other functions depending on the specific NLP task in question.

> The output layer generates a prediction or classification based on the input.
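The following is a minimal sketch of that three-layer structure, assuming PyTorch; the layer sizes are arbitrary and purely illustrative.

```python
# A minimal sketch of the input/hidden/output structure, assuming PyTorch.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),   # input layer: accepts a 16-dimensional input vector
    nn.ReLU(),
    nn.Linear(32, 32),   # hidden layer: learns intermediate representations
    nn.ReLU(),
    nn.Linear(32, 2),    # output layer: produces scores for a 2-class prediction
)

x = torch.randn(1, 16)   # one example with 16 input features
print(model(x))          # raw output scores (logits) for the two classes
```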
When LLMs were first developed, they were based on simpler NN architectures with fewer layers, mainly recurrent neural networks (RNNs) and long short-term memory networks (LSTMs). Unlike other neural networks, RNNs and LSTMs could take into account the context, position, and relationships between words even if they were far apart in a data sequence. Simply put, this meant they could memorize and consider past data when generating output, which resulted in more accurate solutions to many NLP tasks, especially sentiment analysis and text classification.
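To show what "memorizing past inputs" looks like in code, here is a minimal sketch assuming PyTorch; it only illustrates how an LSTM carries a hidden state across the steps of a sequence.

```python
# A minimal sketch, assuming PyTorch: an LSTM reads a sequence step by step
# while carrying a hidden state that summarizes everything seen so far.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

sequence = torch.randn(1, 5, 8)        # 1 sequence, 5 time steps, 8 features per step
outputs, (hidden, cell) = lstm(sequence)

print(outputs.shape)  # torch.Size([1, 5, 16]): one context-aware output per time step
print(hidden.shape)   # torch.Size([1, 1, 16]): the final "memory" of the whole sequence
```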
The biggest advantage that neural networks like RNNs and LSTMs had over traditional, rule-based systems was that they were capable of learning on their own with little to no human involvement. They analyze data to create their own rules, rather than learning the rules first and applying them to data later. This is also known as representation learning and is inspired by human learning processes.

Representations, or features, are hidden patterns that neural networks can extract from data. To exemplify this, let's imagine we're training an NN-based model on a dataset containing the following tokens:

"cat," "cats," "dog," "dogs"

After analyzing these tokens, the model may identify a representation that one could formulate as:

Plural nouns have the suffix "-s."

The model will then extract this representation and apply it to new or edge-case scenarios whose data distribution follows that of the training data. For example, we can assume that the model will correctly classify tokens like "chairs" or "table" as plural or singular even if it had not encountered them before. Once it encounters irregular nouns that don't follow the extracted representation, the model will update its parameters to reflect new representations, such as:

Plural nouns are followed by plural verbs.

This approach enables NN-based models to generalize better than rule-based systems and successfully perform a wider range of tasks.
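As a toy, hedged illustration of this idea (an invented example, not taken from this guide), the sketch below gives a single-neuron model only the last letter of each word and lets it discover the "-s means plural" pattern from labeled examples instead of being handed the rule.

```python
# A toy representation-learning sketch in plain NumPy: a single neuron sees only the
# last letter of each word (as a one-hot vector) and learns that "s" signals a plural.
import numpy as np

words  = ["cat", "cats", "dog", "dogs", "bird", "birds", "fox", "foxes"]
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)   # 0 = singular, 1 = plural

def last_letter_features(word):
    vec = np.zeros(26)
    vec[ord(word[-1]) - ord("a")] = 1.0
    return vec

x = np.array([last_letter_features(w) for w in words])

w = np.zeros(26)                                            # learnable parameters
b = 0.0
for _ in range(2000):                                       # gradient descent, logistic loss
    p = 1 / (1 + np.exp(-(x @ w + b)))
    grad = p - labels
    w -= 0.1 * (x.T @ grad) / len(words)
    b -= 0.1 * grad.mean()

# The extracted "representation" generalizes to words never seen during training.
for new_word in ["chairs", "table"]:
    prob = 1 / (1 + np.exp(-(last_letter_features(new_word) @ w + b)))
    print(new_word, "plural" if prob > 0.5 else "singular")
```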
However, a network's ability to extract representations depends heavily on the number of neurons and layers it comprises. The more neurons neural networks have, the more complex representations they can extract. That's why, today, most large language models use deep learning neural networks with multiple hidden layers and, thus, a higher number of neurons.
Figure 2 shows a side-by-side comparison of a single-layer neural network and a deep learning neural network.

Figure 2. Comparison of Single-Layer vs. Deep Learning Neural Network

While this may seem like an obvious choice today, consider that developing deep neural networks did not make sense before hardware evolved to handle massive workloads. This only became possible after ~1999, when NVIDIA introduced "the world's first GPU," or graphics processing unit, to the wider market, or, more precisely, after a wildly successful CNN called AlexNet popularized their use in deep learning in 2012.

GPUs have a highly parallelizable architecture, which enabled the rapid advances in deep learning systems that are seen today. Among other advancements, the advent of GPUs ushered in the development of a new type of neural network that would revolutionize the field of NLP: transformers.
Transformers

While RNNs and LSTMs have their advantages, especially compared to traditional models, they also have some limitations that make them unsuitable for more complex NLP tasks, such as machine translation. Their main limitation is the inability to process longer data sequences and, thus, to consider the overall context of the input sequence. Because LSTMs and RNNs cannot handle much context well, their outputs are prone to being inaccurate or nonsensical. These and other challenges have been largely overcome with the advent of new, special neural networks called transformers.

Transformers were first introduced in 2017 by Vaswani et al. in a paper titled "Attention Is All You Need." The title alluded to attention mechanisms, which would become the key component of transformers.
"We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely." - Vaswani et al., "Attention Is All You Need"
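As a hedged, minimal illustration of the attention idea (a sketch of standard scaled dot-product attention, not code from the paper), the snippet below shows how every position in a sequence can directly attend to every other position, instead of passing information along step by step the way an RNN does.

```python
# A minimal sketch of scaled dot-product attention, assuming PyTorch.
# Shapes are illustrative; real transformers add multiple heads, masking, and learned projections.
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # Each query scores every key at once, so distant positions interact directly.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)   # attention weights sum to 1 per query
    return weights @ v                        # weighted mix of values, one per position

seq_len, d_model = 6, 16                      # a 6-token sequence with 16-dim embeddings
x = torch.randn(seq_len, d_model)
output = scaled_dot_product_attention(x, x, x)  # self-attention: queries, keys, values from x
print(output.shape)                             # torch.Size([6, 16])
```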