
A Beginner’s Guide to Large Language Models

Part 1

Contributors:

Annamalai Chockalingam, Ankur Patel, Shashank Verma, Tiffany Yeung

Table of Contents

Preface
Glossary
Introduction to LLMs
What Are Large Language Models (LLMs)?
Foundation Language Models vs. Fine-Tuned Language Models
Evolution of Large Language Models
Neural Networks
Transformers
How Enterprises Can Benefit From Using Large Language Models
Challenges of Large Language Models
Ways to Build LLMs
How to Evaluate LLMs
Notable Companies in the LLM Field
Popular Startup-developed LLM Apps


Preface

Language has been integral to human society for thousands of years. A long-prevailing theory, the laryngeal descent theory (LDT), suggests that speech, and thus language, may have evolved about 200,000 to 300,000 years ago, while newer research suggests it could have happened even earlier.

Regardless of when it first appeared, language remains the cornerstone of human communication. It has taken on an even greater role in today’s digital age, where an unprecedented portion of the population can communicate via both text and speech across the globe.

This is underscored by the fact that 347.3 billion email messages are sent and received worldwide every day, and that five billion people, or over 63% of the entire world population, send and receive text messages.

Language has therefore become a vast trove of information that can help enterprises extract valuable insights, identify trends, and make informed decisions. As an example, enterprises can analyze texts like customer reviews to identify their products’ best-selling features and fine-tune their future product development.

Similarly, language production, as opposed to language analysis, is also becoming an increasingly important tool for enterprises. Creating blog posts, for example, can help enterprises raise brand awareness to a previously unheard-of extent, while composing emails can help them attract new stakeholders or partners at an unmatched speed.

However, both language analysis and production are time-consuming processes that can distract employees and decision-makers from more important tasks. For instance, leaders often need to sift through vast amounts of text to make informed decisions, rather than having the key information extracted for them.

Enterprises can minimize these and other problems, such as the risk of human error, by employing large language models (LLMs) for language-related tasks. LLMs can help enterprises accelerate and largely automate their efforts related to both language production and analysis, saving valuable time and resources while improving accuracy and efficiency.

Unlike previous solutions, such as rule-based systems, LLMs are incredibly versatile and can be easily adapted to a wide range of language-related tasks, like generating content or summarizing legal documentation.


The goal of this book is to help enterprises understand what makes LLMs so groundbreaking compared to previous solutions and how they can benefit from adopting or developing them. It also aims to help enterprises get a head start by outlining the most crucial steps of LLM development, training, and deployment.

To achieve these goals, the book is divided into three parts:

> Part 1 defines LLMs and outlines the technological and methodological advancements over the years that made them possible. It also tackles more practical topics, such as how enterprises can develop their own LLMs and the most notable companies in the LLM field. This should help enterprises understand how adopting LLMs can unlock cutting-edge possibilities and revolutionize their operations.

> Part 2 discusses five major use cases of LLMs within enterprises, including content generation, summarization, and chatbot support. Each use case is exemplified with real-life apps and case studies, so as to show how LLMs can solve real problems and help enterprises achieve specific objectives.

> Part 3 is a practical guide for enterprises that want to build, train, and deploy their own LLMs. It provides an overview of necessary prerequisites and possible trade-offs associated with different development and deployment methods. ML engineers and data scientists can use this as a reference throughout their LLM development processes.

Hopefully, this will inspire enterprises that have not yet adopted or developed their own LLMs to do so soon in order to gain a competitive advantage and offer new state-of-the-art (SOTA) services or products. As usual, the greatest benefits will be reserved for early adopters and truly visionary innovators.


Glossary

Deep learning systems: Systems that rely on neural networks with many hidden layers to learn complex patterns.

Generative AI: AI programs that can generate new content, like text, images, and audio, rather than just analyze it.

Large language models (LLMs): Language models that recognize, summarize, translate, predict, and generate text and other content. They’re called large because they are trained on large amounts of data and have many parameters, with popular LLMs reaching hundreds of billions of parameters.

Natural language processing (NLP): The ability of a computer program to understand and generate text in natural language.

Long short-term memory neural network (LSTM): A special type of RNN with more complex cell blocks that allow it to retain more past inputs.

Natural language generation (NLG): A part of NLP that refers to the ability of a computer program to generate human-like text.

Natural language understanding (NLU): A part of NLP that refers to the ability of a computer program to understand human-like text.

Neural network (NN): A machine learning algorithm in which the parameters are organized into consecutive layers. The learning process of NNs is inspired by the human brain. Much like humans, NNs “l(fā)earn” important features via representation learning and require less human involvement than most other approaches to machine learning.

Perception AI: AI programs that can process and analyze data but not generate it, mainly developed before 2020.

Recurrent neural network (RNN): A neural network that processes data sequentially and can memorize past inputs.


Rule-based system: A system that relies on human-crafted rules to process data.

Traditional machine learning: Machine learning that uses a statistical approach, drawing probability distributions of words or other tokens based on a large annotated corpus. It relies less on rules and more on data.

Transformer: A type of neural network architecture designed to process sequential data non-sequentially.

Structured data: Data that is quantitative in nature, such as phone numbers, and can be easily standardized and adjusted to a pre-defined format that ML algorithms can quickly process.

Unstructured data: Data that is qualitative in nature, such as customer reviews, and difficult to standardize. Such data is stored in its native formats, like PDF files, before use.

Fine-tuning: A transfer learning method used to improve model performance on selected downstream tasks or datasets. It’s used when the target task is similar to the pre-training task and involves copying the weights of a PLM and tuning them on the desired tasks or data.

Customization: A method of improving model performance by modifying only one or a few selected parameters of a PLM instead of updating the entire model. It involves using parameter-efficient techniques (PEFT).

Parameter-efficient techniques (PEFT): Techniques like prompt learning, LoRA, and adapter tuning that allow researchers to customize PLMs for downstream tasks or datasets while preserving and leveraging the existing knowledge of PLMs. These techniques are used during model customization and allow for quicker training and often more accurate predictions.

Prompt learning: An umbrella term for two PEFT techniques, prompt tuning and p-tuning, which help customize models by inserting virtual token embeddings among discrete or real token embeddings.

Adapter tuning: A PEFT technique that involves adding lightweight feed-forward layers, called adapters, between existing PLM layers and updating only their weights during customization while keeping the original PLM weights frozen.

Open-domain question answering: Answering questions from a variety of different domains, like legal, medical, and financial, instead of just one domain.

Extractive question answering: Answering questions by extracting the answers from existing texts or databases.


Throughput: A measure of model efficiency and speed. It refers to the amount of data or the number of predictions that a model can process or generate within a pre-defined time frame.

Latency: The amount of time a model needs to process input and generate output.

Data readiness: The suitability of data for use in training, based on factors such as data quantity, structure, and quality.


Introduction to LLMs

A large language model is a type of artificial intelligence (AI) system that is capable of generating human-like text based on the patterns and relationships it learns from vast amounts of data. Large language models use a machine learning technique called deep learning to analyze and process large sets of data, such as books, articles, and web pages.

Large language models unlocked numerous unprecedented possibilities in the field of NLP and AI. This was most notably demonstrated by the release of OpenAI’s GPT-3 in 2020, then the largest language model ever developed.

These models are designed to understand the context and meaning of text and can generate text that is grammatically correct and semantically relevant. They can be trained on a wide range of tasks, including language translation, summarization, question answering, and text completion.

GPT-3 made it evident that large-scale models can accurately perform a wide, and previously unheard-of, range of NLP tasks, from text summarization to text generation. It also showed that LLMs could generate outputs that are nearly indistinguishable from human-created text, all while learning on their own with minimal human intervention.

This presented an enormous improvement over earlier, mainly rule-based models that could neither learn on their own nor successfully solve tasks they weren’t trained on. It is no surprise, then, that many other enterprises and startups soon started developing their own LLMs or adopting existing LLMs in order to accelerate their operations, reduce expenses, and streamline workflows.

Part 1 is intended to provide a solid introduction and foundation for any enterprise that is considering building or adopting its own LLM.

What Are Large Language Models (LLMs)?

Large language models (LLMs) are deep learning algorithms that can recognize, extract, summarize, predict, and generate text based on knowledge gained during training on very large datasets.

They’re also a subset of a more general technology called language models. All language models have one thing in common: they can process and generate text that sounds like natural language. This is known as performing tasks related to natural language processing (NLP).


Although all language models can perform NLP tasks, they differ in other characteristics, such as their size. Unlike other models, LLMs are considered large for two reasons:

1. They’re trained using large amounts of data.

2. They comprise a huge number of learnable parameters (i.e., representations of the underlying structure of the training data that help models perform tasks on new or never-before-seen data).

Table 1 showcases two large language models, MT-NLG and GPT-3 Davinci, to help clarify what’s considered large by contemporary standards.

Table 1. Comparison of MT-NLG and GPT-3

NVIDIA Model: Megatron-Turing Natural Language Generation Model (MT-NLG): 530 billion parameters, 270 billion tokens in the training data.

OpenAI Model: GPT-3 Davinci Model: 175 billion parameters, 499 billion tokens in the training data.

Since the quality of a model heavily depends on the model size and the size of the training data, larger language models typically generate more accurate and sophisticated responses than their smaller counterparts.


Figure 1. Answer Generated by GPT-3.

However, the performance of large language models doesn’t just depend on the model size or data quantity. The quality of the data matters, too.

For example, LLMs trained on peer-reviewed research papers or published novels will usually perform better than LLMs trained on social media posts, blog comments, or other unreviewed content. Low-quality data like user-generated content may lead to all sorts of problems, such as models picking up slang, learning incorrect spellings of words, and so on.

In addition, models need very diverse data in order to perform various NLP tasks. However, if the model is intended to be especially good at solving a particular set of tasks, it can be fine-tuned using a more relevant and narrower dataset. By doing so, a foundation language model is transformed from one that’s good at performing various NLP tasks across a broad set of domains into a fine-tuned model that specializes in performing tasks in a narrowly scoped domain.


Foundation Language Models vs. Fine-Tuned Language Models

Foundation language models, such as the aforementioned MT-NLG and GPT-3, are what is usually referred to when discussing LLMs. They’re trained on vast amounts of data and can perform a wide variety of NLP tasks, from answering questions and generating book summaries to completing and translating sentences.

Thanks to their size, foundation models can perform well even when they have little domain-specific data at their disposal. They have good general performance across tasks but may not excel at performing any one specific task.

Fine-tuned language models, on the other hand, are large language models derived from foundation LLMs. They’re customized for specific use cases or domains and, thus, become better at performing more specialized tasks.

Apart from the fact that fine-tuned models can perform specific tasks better than foundation models, their biggest strength is that they are lighter and, generally, easier to train. But how does one actually fine-tune a foundation model for specific objectives?

Currently, the most popular method is customizing a model using parameter-efficient customization techniques, such as p-tuning, prompt tuning, adapters, and so on. Customization is far less time-consuming and expensive than fine-tuning the entire model, although it may lead to somewhat poorer performance than other methods. Customization methods are further discussed in Part 3.
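To make the idea concrete, below is a minimal sketch of one parameter-efficient technique, LoRA, using the Hugging Face transformers and peft libraries with the publicly available GPT-2 checkpoint. These tool choices are assumptions made purely for illustration, not a recipe prescribed by this guide; the point is simply that only a small set of added weights is trained while the foundation model stays frozen.

    # Hedged sketch: LoRA-style customization of a small foundation model.
    # Assumes `pip install transformers peft` and the public "gpt2" checkpoint.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base_model = AutoModelForCausalLM.from_pretrained("gpt2")   # pre-trained foundation model

    lora_config = LoraConfig(
        r=8,                         # rank of the low-rank update matrices
        lora_alpha=16,               # scaling factor for the LoRA updates
        target_modules=["c_attn"],   # attach LoRA to GPT-2's attention projections
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base_model, lora_config)
    model.print_trainable_parameters()   # typically well under 1% of all weights are trainable
    # Training then proceeds as usual (e.g., with transformers.Trainer) on the
    # domain-specific dataset; only the small LoRA matrices are updated.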

Evolution of Large Language Models

AI systems were historically about processing and analyzing data, not generating it. They were oriented more toward perceiving and understanding the world around us than toward generating new information. This distinction marks the main difference between Perception AI and Generative AI, with the latter becoming increasingly prevalent since around 2020, after companies started adopting transformer models and developing increasingly robust LLMs at a large scale.

The advent of large language models further fueled a revolutionary paradigm shift in the way NLP models are designed, trained, and used. To truly understand this, it may be helpful to compare large language models to previous NLP models and how they worked. For this purpose, let’s briefly explore three regimes in the history of NLP: pre-transformers NLP, transformers NLP, and LLM NLP.

1. Pre-transformers NLP was mainly marked by models that relied on human-crafted rules rather than machine learning algorithms to perform NLP tasks. This made them suitable for simpler tasks that didn’t require too many rules, like text classification, but unsuitable for more complex tasks, such as machine translation. Rule-based models also performed poorly in edge-case scenarios because they couldn’t make accurate predictions or classifications for never-before-seen data for which no clear rules were set. This problem was somewhat solved with simple neural networks, such as RNNs and LSTMs, developed during the later phases of this period. RNNs and LSTMs could memorize past data to a certain extent and, thus, provide context-dependent predictions and classifications. However, RNNs and LSTMs could not make predictions over long spans of text, limiting their effectiveness.

2. Transformers NLP was set in motion by the rise of the transformer architecture in 2017. Transformers could generalize better than the then-prevailing RNNs and LSTMs, capture more context, and process more data at once. These improvements enabled NLP models to understand longer sequences of data and perform a much wider range of tasks. However, from today’s point of view, models developed during this period had limited capabilities, mainly due to the general lack of large-scale datasets and adequate computational resources. They also mainly sparked attention among researchers and experts in the field but not the general public, as they weren’t user-friendly or accurate enough to become commercialized.

3. LLM NLP was mainly initiated by the launch of OpenAI’s GPT-3 in 2020. Large language models like GPT-3 were trained on massive amounts of data, which allowed them to produce more accurate and comprehensive NLP responses compared to previous models. This unlocked many new possibilities and brought us closer to achieving what many consider “true” AI. Also, LLMs made NLP models much more accessible to non-technical users, who could now solve a variety of NLP tasks just by using natural-language prompts, as sketched in the example below. NLP technology was finally democratized.
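As an illustration of that accessibility, the snippet below turns a plain-English prompt into generated text. It assumes the Hugging Face transformers library and the small, publicly available GPT-2 checkpoint, chosen only so the sketch is runnable; GPT-3-class models are typically reached through hosted APIs instead.

    # Hedged sketch: solving an NLP task with nothing but a natural-language prompt.
    # Assumes `pip install transformers` and the public "gpt2" checkpoint.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    prompt = "Explain in one sentence why enterprises analyze customer reviews:"
    result = generator(prompt, max_new_tokens=40, do_sample=False)
    print(result[0]["generated_text"])   # prompt plus the model's continuation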

The switch from one methodology to another was largely driven by relevant technological and methodological advancements, such as the advent of neural networks, attention mechanisms, and transformers, and developments in the field of unsupervised and self-supervised learning. The following sections will briefly explain these concepts, as understanding them is crucial for truly understanding how LLMs work and how to build new LLMs from scratch.

Neural Networks

Neural networks (NNs) are machine learning algorithms loosely modeled after the human brain. Like the biological human brain, artificial neural networks consist of neurons, also called nodes, that are responsible for all model functions, from processing input to generating output.

The neurons are further organized into layers, vertically stacked components of NNs that perform specific tasks related to input and output sequences.


Every neural network has at least three layers:

> The input layer accepts data and passes it to the rest of the network.

> The hidden layer, or multiple hidden layers, performs specific functions that make the final output of an NN possible. These functions can include identifying or classifying data, generating new data, and other functions depending on the specific NLP task in question.

> The output layer generates a prediction or classification based on the input, as illustrated in the sketch after this list.
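The sketch below shows that three-layer structure in code, written in PyTorch purely as an assumed example framework: data enters through an input of fixed size, passes through one hidden layer that learns intermediate features, and exits through an output layer that produces class scores.

    # Hedged sketch: the minimal input -> hidden -> output structure described above.
    import torch
    import torch.nn as nn

    class TinyNetwork(nn.Module):
        def __init__(self, n_inputs: int, n_hidden: int, n_outputs: int):
            super().__init__()
            self.hidden = nn.Linear(n_inputs, n_hidden)    # hidden layer: learns intermediate features
            self.output = nn.Linear(n_hidden, n_outputs)   # output layer: produces predictions

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h = torch.relu(self.hidden(x))   # non-linearity lets the hidden layer model complex patterns
            return self.output(h)

    net = TinyNetwork(n_inputs=10, n_hidden=32, n_outputs=2)
    scores = net(torch.randn(4, 10))   # 4 example inputs, 10 features each
    print(scores.shape)                # torch.Size([4, 2]): one score per class per example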

When LLMs were first developed, they were based on simpler NN architectures with fewer layers, mainly recurrent neural networks (RNNs) and long short-term memory networks (LSTMs). Unlike other neural networks, RNNs and LSTMs could take into account the context, position, and relationships between words even if they were far apart in a data sequence. Simply put, this meant they could memorize and consider past data when generating output, which resulted in more accurate solutions to many NLP tasks, especially sentiment analysis and text classification.

The biggest advantage that neural networks like RNNs and LSTMs had over traditional, rule-based systems was that they were capable of learning on their own with little to no human involvement. They analyze data to create their own rules, rather than learning the rules first and applying them to data later. This is also known as representation learning and is inspired by human learning processes.

Representations, or features, are hidden patterns that neural networks can extract from data. To exemplify this, let’s imagine we’re training an NN-based model on a dataset containing the following tokens:

“cat,” “cats,” “dog,” “dogs”

After analyzing these tokens, the model may identify a representation that one could formulate as:

Plural nouns have the suffix “-s.”

The model will then extract this representation and apply it to new or edge-case scenarios whose data distribution follows that of the training data. For example, the assumption can be made that the model will correctly classify tokens like “chairs” or “table” as plural or singular even if it had not encountered them before. Once it encounters irregular nouns that don’t follow the extracted representation, the model will update its parameters to reflect new representations, such as:

Plural nouns are followed by plural verbs.

This approach enables NN-based models to generalize better than rule-based systems and successfully perform a wider range of tasks.
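The toy script below makes this concrete, with PyTorch assumed as the framework and the four-token dataset taken from the example above. A single-neuron model is never told the “-s means plural” rule, yet it infers something equivalent from simple letter counts and then applies it to “chairs” and “table,” which it has never seen.

    # Hedged sketch: a tiny model "discovering" the plural -s feature from data.
    import torch
    import torch.nn as nn

    def letter_counts(word: str) -> torch.Tensor:
        """Represent a word as a 26-dimensional vector of letter counts (a..z)."""
        vec = torch.zeros(26)
        for ch in word.lower():
            if ch.isalpha():
                vec[ord(ch) - ord("a")] += 1.0
        return vec

    words = ["cat", "cats", "dog", "dogs"]
    labels = torch.tensor([0.0, 1.0, 0.0, 1.0])           # 0 = singular, 1 = plural
    features = torch.stack([letter_counts(w) for w in words])

    torch.manual_seed(0)
    model = nn.Linear(26, 1)                               # a single "neuron" suffices for this toy task
    optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
    loss_fn = nn.BCEWithLogitsLoss()

    for _ in range(500):                                   # the model extracts its own rule from the data
        optimizer.zero_grad()
        loss = loss_fn(model(features).squeeze(1), labels)
        loss.backward()
        optimizer.step()

    for word in ["chairs", "table"]:                       # tokens never seen during training
        prob = torch.sigmoid(model(letter_counts(word))).item()
        print(word, "-> plural" if prob > 0.5 else "-> singular")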

However, their ability to extract representations is very much dependent on the number of neurons and layers comprising a network. The more neurons neural networks have, the more complex representations they can extract. That’s why, today, most large language models use deep learning neural networks with multiple hidden layers and, thus, a higher number of neurons.


Figure 2 shows a side-by-side comparison of a single-layer neural network and a deep learning neural network.

Figure 2. Comparison of Single-Layer vs. Deep Learning Neural Network

While this may seem like an obvious choice today, consider that developing deep neural networks did not make sense before the hardware evolved to be able to handle massive workloads. This only became possible after about 1999, when NVIDIA introduced “the world’s first GPU,” or graphics processing unit, to the wider market, or, more precisely, after a wildly successful CNN called AlexNet popularized their use in deep learning in 2012.

GPUs had a highly parallelizable architecture, which enabled the rapid advances in deep learning systems that are seen today. Among other advancements, the advent of GPUs ushered in the development of a new type of neural network that would revolutionize the field of NLP: transformers.

Transformers

While RNNs and LSTMs have their advantages, especially compared to traditional models, they also have some limitations that make them unsuitable for more complex NLP tasks, such as machine translation. Their main limitation is the inability to process longer data sequences and, thus, consider the overall context of the input sequence. Because LSTMs and RNNs cannot handle too much context well, their outputs are prone to being inaccurate or nonsensical. This and other challenges have been largely overcome with the advent of new, special neural networks called transformers.

Transformers were first introduced in 2017 by Vaswani et al. in a paper titled “Attention Is All You Need.” The title alluded to attention mechanisms, which would become the key component of transformers.

ABeginner’sGuidetoLargeLanguageModels15

“We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.” - Vaswani et al., “Attention Is All You Need”
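To give a flavor of what “based solely on attention mechanisms” means, the sketch below implements scaled dot-product attention, the core operation of the transformer. NumPy is assumed purely for illustration; production transformers add learned projections, multiple heads, masking, and positional information on top of this.

    # Hedged sketch: scaled dot-product attention, the transformer's core operation.
    import numpy as np

    def scaled_dot_product_attention(queries, keys, values):
        """Each output position is a weighted mix of all value vectors, with the
        weights determined by how strongly its query matches every key."""
        d_k = queries.shape[-1]
        scores = queries @ keys.T / np.sqrt(d_k)         # query/key similarity
        scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
        return weights @ values, weights

    # A toy "sentence" of 4 tokens, each an 8-dimensional vector. In self-attention,
    # queries, keys, and values all come from the same sequence.
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(4, 8))
    output, attention = scaled_dot_product_attention(tokens, tokens, tokens)
    print(attention.round(2))   # 4x4 matrix: how much each token attends to every other token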
