Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

Junjie Wang1,2,5*, Mingyang Chen3*, Binbin Hu2,5, Dan Yang2,5, Ziqi Liu2,5,
Yue Shen2,5, Peng Wei2,5, Zhiqiang Zhang2,5, Jinjie Gu2,5, Jun Zhou2,5,
Jeff Z. Pan4, Wen Zhang1,5†, Huajun Chen1,5†

1 Zhejiang University, 2 Ant Group, 3 Baichuan Inc., 4 The University of Edinburgh
5 Zhejiang University-Ant Group Joint Laboratory of Knowledge Graph
{wangjj2018,zhang.wen,huajunsir}@, chenmingyang@
/j.z.pan/
/zjukg/LPKG
Abstract
arXiv:2406.14282v3 [cs.CL] 23 Oct 2024
Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs' performance by combining step-wise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fine-tuning. Previous work has relied on manual annotation and knowledge distillation from teacher LLMs, which are time-consuming and not accurate enough. In this paper, we introduce a novel framework for enhancing LLMs' planning capabilities by using planning data derived from knowledge graphs (KGs). LLMs fine-tuned with this data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval. Evaluations on multiple datasets, including our newly proposed benchmark, highlight the effectiveness of our framework and the benefits of KG-derived planning data.
1 Introduction
The past few years have witnessed significant innovations in LLMs (Ouyang et al., 2022; Touvron et al., 2023; Chowdhery et al., 2023; AI@Meta, 2024). While LLMs excel in many natural language processing tasks, they still face challenges, particularly the smaller models, in handling complex question-answering (QA) tasks (Press et al., 2023; Shao et al., 2023; Yao et al., 2022; Xiong et al., 2024a; Huang et al., 2024).
To improve the performance of LLMs on complex QA tasks, past research has tried various methods: (1) employing carefully designed prompt strategies to guide the model in reasoning, such as the Chain of Thought (CoT) (Kojima et al., 2022; Wei et al., 2022) and Tree of Thought (ToT) (Yao et al., 2024) methods; (2) utilizing retrieval techniques to obtain supplemental information from external knowledge sources (Lewis et al., 2020; Guu et al., 2020); (3) combining prompt strategies with retrieval enhancements, as exemplified by methods like ReAct (Yao et al., 2022) and Self-Ask (Press et al., 2023). The third approach has garnered widespread research interest due to its integration of the advantages of the first two methods. The fundamental idea of this class of methods is to guide LLMs in breaking down a complex question into multiple simpler sub-questions and then use a retrieval-augmented generation (RAG) (Huang et al., 2023, 2024) method to answer each sub-question, thereby deducing the answer to the original complex question. However, planning for complex questions is non-trivial, especially for smaller LLMs (with fewer than 10 billion parameters), which often require supervised fine-tuning (Aksitov et al., 2023; Chen et al., 2023a; Qin et al., 2023).

* Equal contribution.
† Corresponding authors.

Figure 1: An example of a KG pattern, its grounded instance, and verbalized planning process. [Figure: a Spouse/Sports pattern grounded on Fran Walsh and Fluminense for the complex question "What sports have Fluminense and Fran Walsh's spouse played in?", verbalized as Q1: "Who is Fran Walsh's spouse?" (A1: Ans_1), Q2: "What sports does {Ans_1} play?" (A2: Ans_2), Q3: "What sports does Fluminense play?" (A3: Ans_3); Final Answer: A2 & A3.]
This raises a widely concerning issue: how to obtain supervised data for learning the planning ability on complex questions. Manual annotation is time-consuming and labor-intensive, making it difficult to scale. Most existing methods attempt to distill knowledge from teacher LLMs (Yao et al., 2022; Aksitov et al., 2023), which places excessive trust in the teacher LLMs and, in reality, cannot guarantee the accuracy of the distilled knowledge. These challenges inspire us to explore new ways of obtaining supervised planning data.
Knowledge Graphs (KGs) (Pan et al., 2017b,a) usually store accurate knowledge in a structured way. We find that a KG pattern can be viewed as the abstract of a complex question, as shown in Figure 1, which reveals the connection between question planning and patterns. This opens up the possibility of constructing training data to enhance the planning capabilities of LLMs using KGs. Specifically, we start by grounding predefined patterns in an open-domain KG to extract numerous instances, which we then verbalize into complex questions and corresponding sub-questions in natural language. In this way, we effectively create a large amount of accurate planning data for fine-tuning. Being fine-tuned with these planning data, LLMs' capability of generating plans for complex questions is enhanced, resulting in better final answers by parsing and executing these plans. We refer to this innovative framework as Learning to Plan from Knowledge Graphs (LPKG).
Additionally, we construct a Comprehensive Logical QA benchmark, CLQA-Wiki, from a subset of Wikidata (Vrandecic and Krötzsch, 2014) via grounding rich patterns as aforementioned. Existing complex QA benchmarks (Yang et al., 2018; Ho et al., 2020; Press et al., 2023; Trivedi et al., 2022) primarily focus on multi-hop and comparison-type questions and lack logical operations. Furthermore, most questions are labeled with only one answer, whereas in reality, they often have multiple correct answers. The CLQA-Wiki benchmark evenly covers multi-hop, comparison, intersection, and union types of questions, which is more comprehensive and challenging for complex QA evaluation.
Our contributions can be summarized as follows: (1) We introduce a novel framework, LPKG, that enhances the planning ability of LLMs using data constructed from KG patterns; (2) We develop a comprehensive and challenging evaluation benchmark, named CLQA-Wiki, to more effectively assess the performance of LLMs on complex QA tasks; (3) Our proposed framework LPKG achieves better results than popular baselines on multiple conventional complex QA benchmarks, and we verify the effectiveness of the introduction of KG-sourced planning data.
2 Related Works
Reasoning and Planning with LLMs  In the context of LLMs, reasoning typically involves decomposing complex questions into sub-questions (Mialon et al., 2023; Hao et al., 2023). Prominent techniques include Chain-of-Thought (CoT) prompting (Wei et al., 2022), which elicits rationales that lead to the final answers, and its extensions using self-consistency (Wang et al., 2023) or automated demonstration selection (Zhang et al., 2023). Other methods, such as ReAct (Yao et al., 2022), generate reasoning steps sequentially by integrating planning, with additional strategies like Tree of Thoughts (ToT) (Yao et al., 2024), Reasoning via Planning (RAP) (Hao et al., 2023), and other methods (Khot et al., 2023; Zhou et al., 2023) facilitating complex question decomposition through varied planning approaches. Unlike most methods that rely on in-context learning through prompt engineering, our approach generates planning data from KGs to fine-tune LLMs, thereby enhancing their planning capabilities.
Retrieval-Augmented Generation  Retrieval-Augmented Generation (RAG) can enhance LLMs by incorporating external data, allowing models to access up-to-date information and factual knowledge to mitigate hallucinations (Gao et al., 2023; Guu et al., 2020; Lewis et al., 2020). Each module in the RAG pipeline can be optimized, for instance, through retriever tuning (Shi et al., 2023; Lin et al., 2023), self-reflection during retrieval (Asai et al., 2023; Yan et al., 2024), or query refinement (Chan et al., 2024). To address multi-hop questions, iterative RAG models (Shao et al., 2023; Feng et al., 2023; Press et al., 2023) have been developed, which iteratively conduct retrieval-enhanced generation and generation-enhanced retrieval. However, the multiple RAG steps in existing methods are not optimized and rely heavily on in-context learning. Our approach uses planning data from KGs to facilitate more efficient RAG.
LLMs with KGs  In the existing realm of LLMs, KGs are primarily utilized as sources of structured factual knowledge (Pan et al., 2023). For example, Think-on-Graph (Sun et al., 2023) extracts relevant triples from KGs to assist in QA. Reasoning on Graph (RoG) (Luo et al., 2023) generates relation-based plans and retrieves corresponding paths from these graphs. While aiding in KGQA tasks where answers are directly sourced from KGs, these graphs also support rationale generation. Chain-of-Knowledge (CoK) (Li et al., 2024) further leverages KGs along with other heterogeneous sources to generate faithful rationales. Unlike previous studies, our approach constructs planning data for complex questions from KGs, recognizing that patterns within KGs inherently represent multi-step plans. This data is utilized to enhance the planning capabilities of LLMs.

Figure 2: Overview of our Learning to Plan from Knowledge Graph (LPKG) framework. [Figure: Step 1 (Data Construction) grounds KG patterns (e.g., Spouse and Sports relations around Fran Walsh and Fluminense) in the knowledge graph and verbalizes them into sub-questions Q1-Q3 and a final complex question; Step 2 (Planning LLM Tuning and Inference) fine-tunes the planning LLM on code-formatted input/output pairs and infers plans for new questions such as "Which regions border Drake Bell's birthplace and Santa Ana at the same time?"; Step 3 (Plan Parsing and Execution) parses each plan line and executes it with a retriever, a QA LLM, and set operations to produce the final answer. An example output plan:

Sub_Question_1: str = "What is the Spouse of Fran Walsh?"
Info_1: str = Search(query=Sub_Question_1)
Ans_1: str = Get_Answer(query=Sub_Question_1, info=Info_1)
Sub_Question_2: str = f"What sports does {Ans_1} play?"
Info_2: str = Search(query=Sub_Question_2)
Ans_2: str = Get_Answer(query=Sub_Question_2, info=Info_2)
Sub_Question_3: str = "What sports does Fluminense play?"
Info_3: str = Search(query=Sub_Question_3)
Ans_3: str = Get_Answer(query=Sub_Question_3, info=Info_3)
Inter_Results1: str = Intersection(Answer1=Ans_2, Answer2=Ans_3)
Final_Answer: str = Finish_The_Plan(Answer=Inter_Results1)]
Complex Logical Query in KGs  Recent research on complex logical queries in KGs primarily focuses on first-order logic (FOL) queries that incorporate operations like conjunctions, disjunctions, negation, and existential quantifiers within incomplete KGs (Hamilton et al., 2018; Ren et al., 2020; Ren and Leskovec, 2020; Arakelyan et al., 2021; Chen et al., 2022; Xu et al., 2022; Xiong et al., 2024b; Wu et al., 2024). These works define diverse patterns to assess the capability of logical operations in vector spaces, specifically targeting logical forms rather than natural language. Nonetheless, their methodologies for pattern definition and extraction inspire our approach to deriving complex questions from KGs.
3 Method

3.1 Overview

As shown in Figure 2, there are 3 steps in our Learning to Plan from Knowledge Graphs (LPKG) framework. (1) In the data construction step, we construct planning data from KGs. Specifically, we define some basic KG patterns as shown in Figure 3. We ground patterns in an existing KG to extract instances. For each extracted instance, we sequentially verbalize the sub-queries within the instance into natural language sub-questions according to their order in the instance, eventually assembling them into a complex question. Afterward, we build input and output templates for planning data, where complex questions are concatenated to the input prompt, and sub-questions are filled into the corresponding positions in the output text according to the type of patterns. (2) In the planning LLM tuning and inference step, we fine-tune LLMs on such planning data to enable the LLMs to follow instructions to infer the plan for each question in the downstream test sets. (3) In the third step, such a plan will be parsed and executed, thereby obtaining the final answer to each question.
3.2 Construction of Planning Data

Basic KG Patterns.  Inspired by previous work on complex logic queries within KGs (Ren and Leskovec, 2020), we define the basic KG patterns as shown in Figure 3. The set of KG patterns is denoted as P = {1p, 2p, 3p, 2i, 3i, 2u, ip, pi, compare}. Specifically, p, i, and u respectively indicate projection, intersection, and union. 1p, 2p, and 3p represent queries that span from one to three hops; 2i and 3i respectively represent the intersection of two sub-queries and three sub-queries; 2u represents the union of two sub-queries; and ip and pi represent complex queries that combine two-hop with intersection logic. In addition, we also combine pairs of triples that have numeric tail entities and the same relations to construct comparison patterns, denoted as compare.

Figure 3: Basic KG patterns. [Figure: schematic graph structures of the 1p, 2p, 3p, 2u, 2i, 3i, pi, ip, and compare patterns.]
Grounding.  Given a KG, we first ground these patterns in it to extract instances:

I_pat = f_pat(KG),  pat ∈ P    (1)

where I_pat are the instances grounded from knowledge graph KG for pattern pat, and f_pat is the corresponding extraction function. For example, an instance of the 2p pattern can be "(Inkheart, (cast member, educated at))". To best meet the needs of open-domain QA, we use Wikidata15k (Chen et al., 2023b), a subset of the open-domain KG Wikidata, as KG.
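As a minimal illustration of the grounding step in Eq. (1) (not the authors' released extraction tool; the toy triple store below is an assumption for demonstration), a 2p instance can be extracted by chaining two relation projections through a shared intermediate entity:

```python
# Toy sketch of grounding the 2p pattern (anchor entity + two chained
# relations). Real grounding runs over Wikidata15k; these triples are
# illustrative only.
from itertools import product

TRIPLES = [
    ("Inkheart", "cast member", "Brendan Fraser"),
    ("Brendan Fraser", "educated at", "University of Toronto"),
    ("Fran Walsh", "spouse", "Peter Jackson"),
    ("Peter Jackson", "residence", "Wellington"),
]

def ground_2p(triples):
    """Extract 2p instances (anchor, (r1, r2)) such that a chain
    anchor --r1--> mid --r2--> answer exists in the KG."""
    instances = set()
    for (h1, r1, t1), (h2, r2, t2) in product(triples, triples):
        if t1 == h2:  # chain the two projections through a shared entity
            instances.add((h1, (r1, r2)))
    return instances

instances = ground_2p(TRIPLES)
# contains ("Inkheart", ("cast member", "educated at"))
```

The same enumerate-and-filter idea extends to the other patterns (e.g., 2i grounds two triples sharing a tail entity), with f_pat choosing the matching condition.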
Verbalization.  Subsequently, based on the grounded instances, we need to verbalize them bottom-up into sub-questions and assemble them into complex questions. There are several methods for this step, such as a template-based method, manual annotation, or utilizing an LLM. Since the template-based approach often lacks fluency in language expression, and the manual method is time-consuming and labor-intensive, we opt for an LLM-based method. Specifically, we write a small number of verbalization examples for each pattern type. These examples are used as demonstrations De1 to fill in the prompt. Finally, we concatenate a grounded instance i ∈ I_pat to the prompt, asking an LLM to verbalize it into a natural language question:

{{Q_s^n}_{n=1}^N, Q_c} = llm(concat(De1, i))    (2)

where {Q_s^n}_{n=1}^N and Q_c represent the resulting sub-questions and complex question respectively, and concat is string-level concatenation. We use GPT-4 as llm here. It is important to note that here the llm's role is merely to transform the data format; the sub-questions and complex question still originate from the structure of the KG itself, without introducing any knowledge from the llm in the task of question planning. The prompt we use can be found in Appendix C.1.
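The prompt assembly in Eq. (2) can be sketched as follows. This is a simplified illustration: `call_llm` stands in for the GPT-4 API call, and the demonstration text and line-prefix output format are our assumptions, not the actual prompt from Appendix C.1:

```python
# Sketch of LLM-based verbalization: few-shot demonstrations (De1) are
# concatenated with a grounded instance, and the LLM reply is parsed
# into sub-questions and a complex question.
DEMONSTRATIONS = """\
Instance: (Inkheart, (cast member, educated at))
Sub_Q1: Who is a cast member of Inkheart?
Sub_Q2: Where was {Ans_1} educated at?
Complex_Q: Where were the cast members of Inkheart educated?
"""

def call_llm(prompt: str) -> str:
    # Placeholder for a GPT-4 call; returns a canned reply for the demo.
    return ("Sub_Q1: Who is the spouse of Fran Walsh?\n"
            "Sub_Q2: What sports does {Ans_1} play?\n"
            "Complex_Q: What sports does Fran Walsh's spouse play?")

def verbalize(instance: tuple) -> tuple:
    prompt = DEMONSTRATIONS + f"Instance: {instance}\n"  # concat(De1, i)
    reply = call_llm(prompt)
    sub_qs, complex_q = [], ""
    for line in reply.splitlines():
        key, _, text = line.partition(": ")
        if key.startswith("Sub_Q"):
            sub_qs.append(text)
        elif key == "Complex_Q":
            complex_q = text
    return sub_qs, complex_q

sub_qs, complex_q = verbalize(("Fran Walsh", ("spouse", "sports")))
```

Because the instance is copied into the prompt verbatim, the LLM only rewrites structure into language; the entities and relations come from the KG.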
Filling.  We then extract sub-questions and complex questions from the output of the llm. Subsequently, we build a set of planning templates T_pat for the planning process of questions corresponding to each pattern. The {Q_s^n}_{n=1}^N obtained in the previous step will be filled into fixed positions in T_pat corresponding to their pattern type, thereby obtaining the output for training. The Q_c obtained in the previous step is concatenated to the end of a fixed instruction Ins and some planning demonstrations De2 (also constructed from KGs), thus obtaining the input for the training data:

x = concat(Ins, De2, Q_c)    (3)
y = T_pat.fill({Q_s^n}_{n=1}^N),  pat ∈ P    (4)

where .fill is a filling function of the templates T_pat. Inspired by (Aksitov et al., 2023), we use a code-formatted input x and output y here (shown in "Input" and "Output" in Figure 2) to facilitate formatting and the subsequent parsing and execution of the output plan (more details in Appendix C.2). In the end, we obtain 9000 training data entries D_train = {x_n, y_n}, with 1000 entries for each pattern. We randomly select 100 items from the training set for manual verification, with an accuracy rate of over 95%.
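A minimal sketch of the template filling in Eq. (4) for a 2p pattern. The template text mirrors the code-formatted plan style of Figure 2, but this exact helper is our illustration, not the authors' released templates:

```python
# Sketch of filling verbalized sub-questions into a code-formatted plan
# template (the y side of a training pair). The 2p template below is an
# illustrative reconstruction of the paper's plan format.
TEMPLATE_2P = (
    'Sub_Question_1: str = "{q1}"\n'
    "Info_1: str = Search(query=Sub_Question_1)\n"
    "Ans_1: str = Get_Answer(query=Sub_Question_1, info=Info_1)\n"
    'Sub_Question_2: str = f"{q2}"\n'
    "Info_2: str = Search(query=Sub_Question_2)\n"
    "Ans_2: str = Get_Answer(query=Sub_Question_2, info=Info_2)\n"
    "Final_Answer: str = Finish_The_Plan(Answer=Ans_2)"
)

def fill_2p(sub_questions: list) -> str:
    q1, q2 = sub_questions  # q2 keeps its {Ans_1} placeholder
    return TEMPLATE_2P.format(q1=q1, q2=q2)

plan = fill_2p(["Who is the spouse of Fran Walsh?",
                "What sports does {Ans_1} play?"])
```

Since `str.format` substitutes in a single pass, the `{Ans_1}` placeholder inside the second sub-question survives into the training output, where it is later resolved at execution time.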
3.3 Fine-tuning and Inference of Planning LLMs

We use the obtained training data D_train to fine-tune the planning LLM M_p directly with the standard next-token training objective:

E_{(x,y)∈D_train} log p_{M_p}(y|x)    (5)

The fine-tuned planning LLM M_p can be used to infer the plan P for each question Q_test in the downstream test set:

P = M_p(concat(Ins, De2, Q_test))    (6)

where Ins and De2 are the same as the contents in Equation (3). It should be noted that in the
Type         Count   Type              Count
2p question  200     3p question       200
2i question  200     3i question       200
ip question  50      pi question       50
2u question  200     compare question  100

Table 1: Distribution of CLQA-Wiki.
multi-hop questions, the specific sub-questions in the second and third hops need to be constructed based on the answers to the previous hop's sub-questions. Since the plan P is output all at once, M_p cannot know the answers to the previous hop's sub-questions when outputting the plans. Therefore, we use a placeholder to replace the answer to the previous hop's sub-questions, allowing the planning to proceed smoothly (as shown in Tables 9, 10, 13, and 14 in Appendix C.1). These placeholders will then be filled in during the subsequent parsing and execution process.
3.4 Plan Parsing and Execution

The obtained plan P needs to be parsed and executed to obtain the final answer to Q_test. Due to our adoption of code-formatted input and output for fine-tuning M_p, the plan P here is also highly formatted code, which facilitates parsing each step of the plan and executing it. In particular:

• When a step includes a "Search" function, we call an external retrieval tool.

• When a step includes a "Get_Answer" function, we invoke an external QA LLM M_QA to get answers for a sub-question based on the retrieved information. The possible placeholders in sub-questions will be filled with previous answers. We ask the QA LLM to organize answers in the form of a list (the prompt is shown in Table 7 in Appendix C.3).

• When "Intersection" or "Union" appears in a step, we run actual intersection or union functions. This can be easily completed due to the list-format answers from the previous step.

It is important to note that the planning LLM M_p and the QA LLM M_QA are completely decoupled in our framework. Here we can use any off-the-shelf LLM to handle the task of QA. Ultimately, we can obtain the answer to Q_test.
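The parsing-and-execution loop above can be sketched as follows. This is a simplified stand-in, not the authors' implementation: `search` and `qa_llm` are stubbed with canned data, and the regex-based line parser is our illustration of the idea:

```python
import re

# Stub tools; in the real framework these call a retriever and a QA LLM.
def search(query: str) -> str:
    return f"passages about: {query}"

def qa_llm(query: str, info: str) -> list:
    canned = {
        "Who is the spouse of Fran Walsh?": ["Peter Jackson"],
        "What sports does Peter Jackson play?": ["rugby", "cricket"],
        "What sports does Fluminense play?": ["football", "rugby"],
    }
    return canned.get(query, [])

def execute_plan(plan: str) -> list:
    """Execute a code-formatted plan line by line, binding each
    variable (Sub_Question_i, Info_i, Ans_i, ...) in `env`."""
    env = {}
    for line in plan.splitlines():
        m = re.match(r"(\w+):\s*str\s*=\s*(.+)", line.strip())
        if not m:
            continue
        var, expr = m.groups()
        if expr[0] == '"' or expr.startswith('f"'):
            text = expr.strip('f"')
            # fill {Ans_i} placeholders with answers from previous hops
            for name, val in env.items():
                if isinstance(val, list) and val:
                    text = text.replace("{%s}" % name, val[0])
            env[var] = text
        elif expr.startswith("Search("):
            env[var] = search(env["Sub_Question_" + var.split("_")[1]])
        elif expr.startswith("Get_Answer("):
            idx = var.split("_")[1]
            env[var] = qa_llm(env["Sub_Question_" + idx], env["Info_" + idx])
        elif expr.startswith("Intersection("):
            a, b = re.findall(r"=(\w+)", expr)
            env[var] = sorted(set(env[a]) & set(env[b]))
        elif expr.startswith("Union("):
            a, b = re.findall(r"=(\w+)", expr)
            env[var] = sorted(set(env[a]) | set(env[b]))
        elif expr.startswith("Finish_The_Plan("):
            env["Final_Answer"] = env[re.findall(r"=(\w+)", expr)[0]]
    return env["Final_Answer"]

PLAN = '''\
Sub_Question_1: str = "Who is the spouse of Fran Walsh?"
Info_1: str = Search(query=Sub_Question_1)
Ans_1: str = Get_Answer(query=Sub_Question_1, info=Info_1)
Sub_Question_2: str = f"What sports does {Ans_1} play?"
Info_2: str = Search(query=Sub_Question_2)
Ans_2: str = Get_Answer(query=Sub_Question_2, info=Info_2)
Sub_Question_3: str = "What sports does Fluminense play?"
Info_3: str = Search(query=Sub_Question_3)
Ans_3: str = Get_Answer(query=Sub_Question_3, info=Info_3)
Inter_Results1: str = Intersection(Answer1=Ans_2, Answer2=Ans_3)
Final_Answer: str = Finish_The_Plan(Answer=Inter_Results1)
'''

print(execute_plan(PLAN))  # → ['rugby']
```

Because every step binds a named variable, the executor resolves `{Ans_1}`-style placeholders exactly when the needed answer becomes available, which is why the plan can be generated in one pass.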
4 New Benchmark: CLQA-Wiki

The conventional complex QA datasets include HotPotQA (Yang et al., 2018), 2WikiMultihopQA (Ho et al., 2020), MuSiQue (Trivedi et al., 2022), and Bamboogle (Press et al., 2023). Despite their widespread use in evaluating the QA performance of language models, we identify some problems with these datasets:

(1) All these datasets are primarily focused on multi-hop and comparison-type questions. The types of questions are not balanced and comprehensive enough, and less attention is paid to questions involving intersection and union logic, which are also very common in reality.

(2) Except for MuSiQue, the questions in the other three datasets only have one answer, whereas many questions in reality often have multiple answers. For example, the answer to the intersection question "Which country borders with Russia and China at the same time?" is a set [Mongolia, Kazakhstan, North Korea].

In light of this, we aim to construct a new testing benchmark that embodies more comprehensive logic and allows for an unrestricted number of answers to more thoroughly evaluate the performance of language models on various logical questions. Considering the detailed pattern structures and unrestricted number of answer entities in KGs, we construct a test set based on Wikidata15k.

Similar to the method used to construct the planning data, we extract instances from Wikidata15k (which do not appear in the training data) and use GPT-4 to do the verbalization. Moreover, for each instance, we can obtain all the answer entities from Wikidata15k, which we then designate as the answers to the questions. After manual quality checks, we obtain a test set called CLQA-Wiki, which contains 1,200 pieces of data featuring a variety of comprehensive logical QA pairs. The question types and their distribution are listed in Table 1. It is worth noting that we have constructed 9 types of testing questions so far, and for newly defined patterns, we can also quickly construct corresponding questions using the above method, showing the better scalability of our dataset.
5 Experiment

We aim to answer the following research questions in our experiments:

• RQ1: Can LPKG outperform baseline methods on conventional complex QA datasets?

• RQ2: Can planning data derived from KGs help improve the planning ability of LLMs?

• RQ3: Can planning data derived from KGs be more helpful in improving the LLMs' planning ability compared to normal distillation methods?

• RQ4: Can LPKG outperform baseline methods on the new benchmark CLQA-Wiki?
5.1 Experimental Settings

Datasets  We first conduct experiments on four conventional complex QA datasets: HotPotQA (Yang et al., 2018), 2WikiMultiHopQA (2WikiMQA) (Ho et al., 2020), MuSiQue (Trivedi et al., 2022), and Bamboogle (Press et al., 2023). Among them, HotPotQA, 2WikiMQA, and MuSiQue contain complete train sets, development sets, and test sets, while Bamboogle is a small dataset that only contains 125 test items. Similar to previous methods (Shao et al., 2023; Aksitov et al., 2023), we respectively extract the first 500 entries from the development sets of HotPotQA and 2WikiMQA. For MuSiQue, we follow Press et al. (2023) to use only the 2-hop questions in the development set. And for Bamboogle, we use all of its data as test data. Finally, we conduct testing on our benchmark CLQA-Wiki.
Baselines  We compare our framework to various baselines:

• Direct: directly input the original question into the LLM.

• CoT: following Kojima et al. (2022), we instruct the LLM to first "think step by step" and then give the final answers.

• Direct RAG: the prompt sent to the LLM contains the original question and retrieved information related to the original question.

• ReAct (Yao et al., 2022): answering questions through iterative planning, action, and observation. The action here is the retrieval tool and the observation is the retrieved information. The planning and QA are conducted on a single LLM.

• Self-Ask (Press et al., 2023): similar to ReAct, it first instructs the LLM to judge whether sub-questions are needed. If so, it requests the LLM to generate the sub-questions, then conducts external retrieval based on the sub-questions, and allows the LLM to provide answers based on the retrieved information.

• ICL-LPKG: a variant of the LPKG framework. The planning LLM is not fine-tuned; it instead uses in-context learning to do planning with some KG-sourced planning demonstrations.
Evaluation Metrics  Exact Match (EM) is used as the evaluation metric on HotPotQA, 2WikiMQA, Bamboogle, and MuSiQue, while on CLQA-Wiki we use Recall and Precision.
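For CLQA-Wiki's multi-answer questions, Recall and Precision compare the predicted and gold answer sets. A minimal sketch of these standard metrics (our own formulation; the exact normalization used in the paper is an assumption):

```python
# Set-level Recall/Precision for multi-answer QA, plus single-answer
# Exact Match. Normalization (stripping, lowercasing) is an assumption.
def _norm(ans: str) -> str:
    return ans.strip().lower()

def exact_match(pred: str, gold: str) -> int:
    return int(_norm(pred) == _norm(gold))

def recall_precision(pred: list, gold: list) -> tuple:
    p, g = {_norm(a) for a in pred}, {_norm(a) for a in gold}
    hit = len(p & g)
    recall = hit / len(g) if g else 0.0
    precision = hit / len(p) if p else 0.0
    return recall, precision

r, p = recall_precision(["Mongolia", "Kazakhstan"],
                        ["Mongolia", "Kazakhstan", "North Korea"])
# r = 2/3, p = 1.0
```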
Implementation Details  All baselines are conducted with gpt-3.5-turbo-1106 (GPT-3.5). The prompts of "Direct", "CoT", and "Direct RAG" are written by ourselves. ReAct and Self-Ask are replicated based on their source code with the GPT-3.5 API. To facilitate assessment, we ask the model to only output concise answer phrases. In our framework: (1) For pattern grounding, we use Wikidata15k as KG, which contains about 15k entities and 263 relations. The extraction tool in grounding is modified from existing works (Ren and Leskovec, 2020). (2) For the planning LLM M_p, we choose CodeQwen1.5-7B-Chat and Llama3-8B-Instruct; one excels at coding while the other excels at commonsense reasoning. We fine-tune them with LoRA tuning, running on 4x 80G A100 GPUs for about 3 hours. The fine-tuning is conducted for 2 epochs, with a learning rate of 5e-5 and a cosine learning rate scheduler. (3) For retrieval, following previous works (Shao