
Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

Junjie Wang1,2,5*, Mingyang Chen3*, Binbin Hu2,5, Dan Yang2,5, Ziqi Liu2,5,
Yue Shen2,5, Peng Wei2,5, Zhiqiang Zhang2,5, Jinjie Gu2,5, Jun Zhou2,5,
Jeff Z. Pan4, Wen Zhang1,5†, Huajun Chen1,5

1Zhejiang University, 2Ant Group, 3Baichuan Inc., 4The University of Edinburgh
5Zhejiang University-Ant Group Joint Laboratory of Knowledge Graph

{wangjj2018,zhang.wen,huajunsir}@, chenmingyang@


/j.z.pan/

/zjukg/LPKG

Abstract

arXiv:2406.14282v3 [cs.CL] 23 Oct 2024

Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs' performance by combining step-wise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fine-tuning. Previous work has relied on manual annotation and knowledge distillation from teacher LLMs, which are time-consuming and not accurate enough. In this paper, we introduce a novel framework for enhancing LLMs' planning capabilities by using planning data derived from knowledge graphs (KGs). LLMs fine-tuned with this data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval. Evaluations on multiple datasets, including our newly proposed benchmark, highlight the effectiveness of our framework and the benefits of KG-derived planning data.

1 Introduction

The past few years have witnessed significant innovations in LLMs (Ouyang et al., 2022; Touvron et al., 2023; Chowdhery et al., 2023; AI@Meta, 2024). While LLMs excel in many natural language processing tasks, they still face challenges, particularly the smaller models, in handling complex question-answering (QA) tasks (Press et al., 2023; Shao et al., 2023; Yao et al., 2022; Xiong et al., 2024a; Huang et al., 2024).

*Equal contribution.
†Corresponding authors.

[Figure 1: An example of a KG pattern, its grounded instance, and verbalized planning process. The pattern (Spouse → Sports) ∩ Sports is grounded for the question "What sports have Fluminense and Fran Walsh's spouse played in?" and verbalized as: Q1: Who is Fran Walsh's spouse? A1: Ans_1; Q2: What sports does {Ans_1} play? A2: Ans_2; Q3: What sports does Fluminense play? A3: Ans_3; Final Answer: A2 & A3.]

To improve the performance of LLMs on complex QA tasks, past research has tried various methods: (1) employing carefully designed prompt strategies to guide the model in reasoning, such as the Chain-of-Thought (CoT) (Kojima et al., 2022; Wei et al., 2022) and Tree-of-Thoughts (ToT) (Yao et al., 2024) methods; (2) utilizing retrieval techniques to obtain supplemental information from external knowledge sources (Lewis et al., 2020; Guu et al., 2020); (3) combining prompt strategies with retrieval enhancements, as exemplified by methods like ReAct (Yao et al., 2022) and Self-Ask (Press et al., 2023). The third approach has garnered widespread research interest due to its integration of the advantages of the first two methods. The fundamental idea of this class of methods is to guide LLMs in breaking down a complex question into multiple simpler sub-questions and then use a retrieval-augmented generation (RAG) (Huang et al., 2023, 2024) method to answer each sub-question, thereby deducing the answer to the original complex question. However, planning for complex questions is non-trivial, especially for smaller LLMs (with fewer than 10 billion parameters), which often require supervised fine-tuning (Aksitov et al., 2023; Chen et al., 2023a; Qin et al., 2023).

This raises a widely concerning issue: how to obtain supervised data for learning the planning ability on complex questions. Manual annotation is time-consuming and labor-intensive, making it difficult to scale. Most existing methods attempt to distill knowledge from teacher LLMs (Yao et al., 2022; Aksitov et al., 2023), which places excessive trust in the teacher LLMs and, in reality, cannot guarantee the accuracy of the distilled knowledge. These challenges inspire us to explore new ways of obtaining supervised planning data.

Knowledge Graphs (KGs) (Pan et al., 2017b,a) usually store accurate knowledge in a structured way. We find that a KG pattern can be viewed as the abstract of a complex question, as shown in Figure 1, which reveals the connection between question planning and patterns. This opens up the possibility of constructing training data to enhance the planning capabilities of LLMs using KGs. Specifically, we start by grounding predefined patterns in an open-domain KG to extract numerous instances, which we then verbalize into complex questions and corresponding sub-questions in natural language. In this way, we effectively create a large amount of accurate planning data for fine-tuning. Being fine-tuned with these planning data, LLMs' capability of generating plans for complex questions is enhanced, resulting in better final answers obtained by parsing and executing these plans. We refer to this innovative framework as Learning to Plan from Knowledge Graphs (LPKG).

Additionally, we construct a Comprehensive Logical QA benchmark, CLQA-Wiki, from a subset of Wikidata (Vrandecic and Krötzsch, 2014) via grounding rich patterns as aforementioned. Existing complex QA benchmarks (Yang et al., 2018; Ho et al., 2020; Press et al., 2023; Trivedi et al., 2022) primarily focus on multi-hop and comparison-type questions and lack logical operations. Furthermore, most questions are labeled with only one answer, whereas in reality, they often have multiple correct answers. The CLQA-Wiki benchmark evenly covers multi-hop, comparison, intersection, and union types of questions, which is more comprehensive and challenging for complex QA evaluation.

Our contributions can be summarized as follows: (1) We introduce a novel framework, LPKG, that enhances the planning ability of LLMs using data constructed from KG patterns; (2) We develop a comprehensive and challenging evaluation benchmark, named CLQA-Wiki, to more effectively assess the performance of LLMs on complex QA tasks; (3) Our proposed framework LPKG achieves better results than popular baselines on multiple conventional complex QA benchmarks, and we verify the effectiveness of introducing KG-sourced planning data.

2 Related Works

Reasoning and Planning with LLMs  In the context of LLMs, reasoning typically involves decomposing complex questions into sub-questions (Mialon et al., 2023; Hao et al., 2023). Prominent techniques include Chain-of-Thought (CoT) prompting (Wei et al., 2022), which elicits rationales that lead to the final answers, and its extensions using self-consistency (Wang et al., 2023) or automated demonstration selection (Zhang et al., 2023). Other methods, such as ReAct (Yao et al., 2022), generate reasoning steps sequentially by integrating planning, with additional strategies like Tree-of-Thoughts (ToT) (Yao et al., 2024), Reasoning via Planning (RAP) (Hao et al., 2023), and other methods (Khot et al., 2023; Zhou et al., 2023) facilitating complex question decomposition through varied planning approaches. Unlike most methods that rely on in-context learning through prompt engineering, our approach generates planning data from KGs to fine-tune LLMs, thereby enhancing their planning capabilities.

Retrieval-Augmented Generation  Retrieval-Augmented Generation (RAG) can enhance LLMs by incorporating external data, allowing models to access up-to-date information and factual knowledge to mitigate hallucinations (Gao et al., 2023; Guu et al., 2020; Lewis et al., 2020). Each module in the RAG pipeline can be optimized, for instance, through retriever tuning (Shi et al., 2023; Lin et al., 2023), self-reflection during retrieval (Asai et al., 2023; Yan et al., 2024), or query refinement (Chan et al., 2024). To address multi-hop questions, iterative RAG models (Shao et al., 2023; Feng et al., 2023; Press et al., 2023) have been developed, which iteratively conduct retrieval-enhanced generation and generation-enhanced retrieval. However, the multiple RAG steps in existing methods are not optimized and rely heavily on in-context learning. Our approach uses planning data from KGs to facilitate more efficient RAG.

LLMs with KGs  In the existing realm of LLMs, KGs are primarily utilized as sources of structured factual knowledge (Pan et al., 2023). For example, Think-on-Graph (Sun et al., 2023) extracts relevant triples from KGs to assist in QA. Reasoning on Graph (RoG) (Luo et al., 2023) generates relation-based plans and retrieves corresponding paths from these graphs. While aiding in KGQA tasks where answers are directly sourced from KGs, these graphs also support rationale generation. Chain-of-Knowledge (CoK) (Li et al., 2024) further leverages KGs along with other heterogeneous sources to generate faithful rationales. Unlike previous studies, our approach constructs planning data for complex questions from KGs, recognizing that patterns within KGs inherently represent multi-step plans. This data is utilized to enhance the planning capabilities of LLMs.

[Figure 2: Overview of our Learning to Plan from Knowledge Graphs (LPKG) framework. Step 1: Data Construction — patterns are grounded in the KG, verbalized into sub-questions (e.g., Q1: What is the spouse of Fran Walsh? Q2: What sports does {Ans_1} play? Q3: What sports does Fluminense play?) and a complex question, then filled into code-formatted input/output templates. Step 2: Planning LLM Tuning and Inference — the planning LLM is fine-tuned (SFT) on this data and infers code-formatted plans such as Sub_Question_1:str="...", Info_1:str=Search(query=Sub_Question_1), Ans_1:str=Get_Answer(query=Sub_Question_1,info=Info_1), ..., Inter_Results1:str=Intersection(Answer1=Ans_2,Answer2=Ans_3), Final_Answer:str=Finish_The_Plan(Answer=Inter_Results1). Step 3: Plan Parsing and Execution — the plan is parsed and executed with a retrieval tool, a QA LLM, and set operations to produce the final answer.]

Complex Logical Query in KGs  Recent research on complex logical queries in KGs primarily focuses on first-order logic (FOL) queries that incorporate operations like conjunctions, disjunctions, negation, and existential quantifiers within incomplete KGs (Hamilton et al., 2018; Ren et al., 2020; Ren and Leskovec, 2020; Arakelyan et al., 2021; Chen et al., 2022; Xu et al., 2022; Xiong et al., 2024b; Wu et al., 2024). These works define diverse patterns to assess the capability of logical operations in vector spaces, specifically targeting logical forms rather than natural language. Nonetheless, their methodologies for pattern definition and extraction inspire our approach to deriving complex questions from KGs.

3 Method

3.1 Overview

As shown in Figure 2, there are three steps in our Learning to Plan from Knowledge Graphs (LPKG) framework. (1) In the data construction step, we construct planning data from KGs. Specifically, we define some basic KG patterns as shown in Figure 3. We ground patterns in an existing KG to extract instances. For each extracted instance, we sequentially verbalize the sub-queries within the instance into natural language sub-questions according to their order in the instance, eventually assembling them into a complex question. Afterward, we build input and output templates for planning data, where complex questions are concatenated to the input prompt, and sub-questions are filled into the corresponding positions in the output text according to the type of patterns. (2) In the planning LLM tuning and inference step, we fine-tune LLMs on such planning data to enable them to follow instructions to infer the plan for each question in the downstream test sets. (3) In the third step, such a plan is parsed and executed, thereby obtaining the final answer to each question.

3.2 Construction of Planning Data

Basic KG Patterns.  Inspired by previous work on complex logic queries within KGs (Ren and Leskovec, 2020), we define the basic KG patterns as shown in Figure 3. The set of KG patterns is denoted as P = {1p, 2p, 3p, 2i, 3i, 2u, ip, pi, compare}. Specifically, p, i, and u respectively indicate projection, intersection, and union. 1p, 2p, and 3p represent queries that span from one to three hops; 2i and 3i respectively represent the intersection of two sub-queries and three sub-queries; 2u represents the union of two sub-queries; and ip and pi represent complex queries that combine two-hop with intersection logic. In addition, we also combine pairs of triples that have numeric tail entities and the same relations to construct comparison patterns, denoted as compare.

[Figure 3: Basic KG patterns — 1p, 2p, 3p, 2u, 2i, 3i, pi, ip, and compare.]
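For concreteness, one possible in-code representation of this pattern set is sketched below; the paper does not prescribe a data structure, so the encoding here is our own illustration.

```python
# A sketch (ours, not the authors' code) of the pattern set P. Projection
# patterns are relation-chain lengths; i/u patterns combine branches; the
# compare pattern pairs numeric triples sharing a relation.
PATTERNS = {
    "1p": ("projection", 1),              # anchor -r1-> answer
    "2p": ("projection", 2),              # two-hop chain
    "3p": ("projection", 3),              # three-hop chain
    "2i": ("intersection", ["1p", "1p"]),
    "3i": ("intersection", ["1p", "1p", "1p"]),
    "2u": ("union", ["1p", "1p"]),
    "ip": ("projection_after", "2i"),     # intersect, then project one hop
    "pi": ("intersection", ["2p", "1p"]), # two-hop branch ∩ one-hop branch
    "compare": ("compare", 2),            # two numeric-tailed triples, same relation
}
```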

Grounding.  Given a KG, we first ground these patterns in it to extract instances:

$$I_{pat} = f_{pat}(\mathrm{KG}), \quad pat \in P \qquad (1)$$

where $I_{pat}$ are the instances of pattern $pat$ grounded in the knowledge graph $\mathrm{KG}$, and $f_{pat}$ is the corresponding extraction function. For example, an instance of the 2p pattern can be "(Inkheart, (cast member, educated at))". To best meet the needs of open-domain QA, we use Wikidata15k (Chen et al., 2023b), a subset of the open-domain KG Wikidata, as the KG.

Verbalization.  Subsequently, based on the grounded instances, we need to verbalize them bottom-up into sub-questions and assemble them into complex questions. There are several methods for this step, such as a template-based method, manual annotation, or utilizing an LLM. Since the template-based approach often lacks fluency in language expression, and the manual method is time-consuming and labor-intensive, we opt for an LLM-based method. Specifically, we write a small number of verbalization examples for each pattern type. These examples are used as demonstrations $De_1$ to fill in the prompt. Finally, we concatenate a grounded instance $i \in I_{pat}$ to the prompt, asking an LLM to verbalize it into a natural language question:

$$\{\{Q_s^n\}_{n=1}^{N},\ Q_c\} = \mathrm{llm}(\mathrm{concat}(De_1, i)) \qquad (2)$$

where $\{Q_s^n\}_{n=1}^{N}$ and $Q_c$ represent the resulting sub-questions and complex question respectively, and concat is string-level concatenation. We use GPT-4 as the llm here. It is important to note that the llm's role is merely to transform the data format; the sub-questions and complex question still originate from the structure of the KG itself, without introducing any knowledge from the llm into the task of question planning. The prompt we use can be found in Appendix C.1.

Filling.  We then extract sub-questions and complex questions from the output of the llm. Subsequently, we build a set of planning templates $T_{pat}$ for the planning process of questions corresponding to each pattern. The $\{Q_s^n\}_{n=1}^{N}$ obtained in the previous step are filled into fixed positions in $T_{pat}$ corresponding to their pattern type, thereby obtaining the output for training. The $Q_c$ obtained in the previous step is concatenated to the end of a fixed instruction $Ins$ and some planning demonstrations $De_2$ (also constructed from KGs), thus obtaining the input for the training data:

$$x = \mathrm{concat}(Ins, De_2, Q_c) \qquad (3)$$

$$y = T_{pat}.\mathrm{fill}(\{Q_s^n\}_{n=1}^{N}), \quad pat \in P \qquad (4)$$

where .fill is a filling function of the templates $T_{pat}$. Inspired by Aksitov et al. (2023), we use a code-formatted input $x$ and output $y$ here (shown as "Input" and "Output" in Figure 2) to facilitate formatting and the subsequent parsing and execution of the output plan (more details in Appendix C.2). In the end, we obtain 9,000 training data entries $D_{train} = \{x_n, y_n\}$, with 1,000 entries for each pattern. We randomly select 100 items from the training sets for manual verification, with an accuracy rate of over 95%.

3.3 Fine-tuning and Inference of Planning LLMs

We use the obtained training data $D_{train}$ to fine-tune the planning LLM $M_p$ directly with the standard next-token training objective:

$$\mathbb{E}_{(x,y) \in D_{train}} \log p_{M_p}(y \mid x) \qquad (5)$$

The fine-tuned planning LLM $M_p$ can be used to infer the plan $P$ for each question $Q_{test}$ in the downstream test set:

$$P = M_p(\mathrm{concat}(Ins, De_2, Q_{test})) \qquad (6)$$

where $Ins$ and $De_2$ are the same as the contents in Equation (3).

Type             Count    Type                 Count
2p question      200      3p question          200
2i question      200      3i question          200
ip question      50       pi question          50
2u question      200      compare question     100

Table 1: Distribution of CLQA-Wiki.

It should be noted that in multi-hop questions, the specific sub-questions in the second and third hops need to be constructed based on the answers to the previous hop's sub-questions. Since our plan $P$ outputs all steps at once, $M_p$ cannot know the answers to the previous hop's sub-questions when outputting the plan. Therefore, we use a placeholder to stand in for the answer to the previous hop's sub-questions, allowing the planning to proceed smoothly (as shown in Tables 9, 10, 13, and 14 in Appendix C.1). These placeholders are then filled in during the subsequent parsing and execution process.

3.4 Plan Parsing and Execution

The obtained plan $P$ needs to be parsed and executed to obtain the final answer to $Q_{test}$. Due to our adoption of code-formatted input and output for fine-tuning $M_p$, $P$ is also highly formatted code, which facilitates parsing each step of the plan and executing it. In particular:

• When a step includes a "Search" function, we call an external retrieval tool.

• When a step includes a "Get_Answer" function, we invoke an external QA LLM $M_{QA}$ to get answers for a sub-question based on the retrieved information. Any placeholders in the sub-question are filled with previous answers. We ask the QA LLM to organize answers in the form of a list (the prompt is shown in Table 7 in Appendix C.3).

• When "Intersection" or "Union" appears in a step, we run actual intersection or union functions. This is straightforward because the answers from the previous step are in list form.

It is important to note that the planning LLM $M_p$ and the QA LLM $M_{QA}$ are completely decoupled in our framework: any off-the-shelf LLM can handle the QA task. Ultimately, we obtain the answer to $Q_{test}$.
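Putting the three cases together, a minimal parse-and-execute loop might look like the sketch below. This is our illustration of Step 3, not the authors' released executor; the search and qa_llm callables are hypothetical stand-ins for the external retrieval tool and the decoupled QA LLM $M_{QA}$.

```python
# A minimal parse-and-execute sketch for code-formatted plans (Step 3).
import re

def execute_plan(plan: str, search, qa_llm):
    env = {}  # maps plan variable names to resolved values
    for line in plan.splitlines():
        if ":str=" not in line:
            continue
        name, expr = [s.strip() for s in line.split(":str=", 1)]
        if expr.startswith("Search("):
            query_var = re.search(r"query=(\w+)", expr).group(1)
            env[name] = search(env[query_var])
        elif expr.startswith("Get_Answer("):
            m = re.search(r"query=(\w+),\s*info=(\w+)", expr)
            # qa_llm is asked to return answers as a list of strings.
            env[name] = qa_llm(env[m.group(1)], env[m.group(2)])
        elif expr.startswith("Intersection(") or expr.startswith("Union("):
            a, b = re.findall(r"Answer\d=(\w+)", expr)
            op = set.intersection if expr.startswith("Inter") else set.union
            env[name] = sorted(op(set(env[a]), set(env[b])))
        elif expr.startswith("Finish_The_Plan("):
            return env[re.search(r"Answer=(\w+)", expr).group(1)]
        else:
            # A (possibly f-string) sub-question literal: strip quotes and
            # fill placeholders like {Ans_1} with previously resolved answers.
            text = expr.strip('f"')
            env[name] = re.sub(
                r"\{(\w+)\}",
                lambda m: str(env.get(m.group(1), m.group(0))),
                text,
            )
    return env.get("Final_Answer")
```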

4 New Benchmark: CLQA-Wiki

The conventional complex QA datasets include HotPotQA (Yang et al., 2018), 2WikiMultihopQA (Ho et al., 2020), MuSiQue (Trivedi et al., 2022), and Bamboogle (Press et al., 2023). Despite their widespread use in evaluating the QA performance of language models, we identify some problems with these datasets:

(1) All these datasets are primarily focused on multi-hop and comparison-type questions. The types of questions are not balanced and comprehensive enough, and less attention is paid to questions involving intersection and union logic, which are also very common in reality.

(2) Except for MuSiQue, the questions in the other three datasets have only one answer, whereas many questions in reality often have multiple answers. For example, the answer to the intersection question "Which country borders with Russia and China at the same time?" is a set [Mongolia, Kazakhstan, North Korea].

In light of this, we aim to construct a new testing benchmark that embodies more comprehensive logic and allows for an unrestricted number of answers, to more thoroughly evaluate the performance of language models on various logical questions. Considering the detailed pattern structures and unrestricted number of answer entities in KGs, we construct a test set based on Wikidata15k.

Similar to the method used to construct the planning data, we extract instances from Wikidata15k (which do not appear in the training data) and use GPT-4 to do the verbalization. Moreover, for each instance, we can obtain all the answer entities from Wikidata15k, which we then designate as the answers to the questions. After manual quality checks, we obtain a test set called CLQA-Wiki, which contains 1,200 pieces of data featuring a variety of comprehensive logical QA pairs. The question types and their distribution are listed in Table 1. It is worth noting that we have constructed 9 types of testing questions so far; for newly defined patterns, we can also quickly construct corresponding questions using the above method, demonstrating the scalability of our dataset.

5 Experiment

We aim to answer the following research questions in our experiments:

• RQ1: Can LPKG outperform baseline methods on conventional complex QA datasets?
• RQ2: Can planning data derived from KGs help improve the planning ability of LLMs?
• RQ3: Can planning data derived from KGs be more helpful in improving LLMs' planning ability compared to normal distillation methods?
• RQ4: Can LPKG outperform baseline methods on the new benchmark CLQA-Wiki?

5.1 Experimental Settings

Datasets  We first conduct experiments on four conventional complex QA datasets: HotPotQA (Yang et al., 2018), 2WikiMultiHopQA (2WikiMQA) (Ho et al., 2020), MuSiQue (Trivedi et al., 2022), and Bamboogle (Press et al., 2023). Among them, HotPotQA, 2WikiMQA, and MuSiQue contain complete train sets, development sets, and test sets, while Bamboogle is a small dataset that only contains 125 test data points. Similar to previous methods (Shao et al., 2023; Aksitov et al., 2023), we respectively extract the first 500 entries from the development sets of HotPotQA and 2WikiMQA. For MuSiQue, we follow Press et al. (2023) and use only the 2-hop questions in the development set. For Bamboogle, we use all of its data as test data. Finally, we conduct testing on our benchmark CLQA-Wiki.

Baselines  We compare our framework to various baselines:
• Direct: Directly input the original question into the LLM.
• CoT: Following Kojima et al. (2022), we instruct the LLM to first "think step by step" and then give the final answers.
• Direct RAG: The prompt sent to the LLM contains the original question and retrieved information related to the original question.
• ReAct (Yao et al., 2022): Answers questions through iterative planning, action, and observation. The action here is the retrieval tool, and the observation is the retrieved information. Planning and QA are conducted on a single LLM.
• Self-Ask (Press et al., 2023): Similar to ReAct, it first instructs the LLM to judge whether sub-questions are needed. If so, it requests the LLM to generate the sub-questions, then conducts external retrieval based on the sub-questions, and lets the LLM provide answers based on the retrieved information.
• ICLPKG: A variant of the LPKG framework in which the planning LLM is not fine-tuned, instead using in-context learning to plan with some KG-sourced planning demonstrations.

Evaluation Metrics  Exact Match (EM) is used as the evaluation metric on HotPotQA, 2WikiMQA, Bamboogle, and MuSiQue, while on CLQA-Wiki we use Recall and Precision.
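On CLQA-Wiki, where gold answers are sets, Recall and Precision can be computed per question as in the sketch below; this is our illustration, and the lowercase/whitespace normalization is an assumption, since the authors' exact normalization is not specified.

```python
# A small sketch (ours) of set-based Recall and Precision for CLQA-Wiki,
# assuming predictions and gold answers are lists of strings.
def recall_precision(pred: list[str], gold: list[str]) -> tuple[float, float]:
    p = {a.lower().strip() for a in pred}   # normalization is an assumption
    g = {a.lower().strip() for a in gold}
    hit = len(p & g)
    recall = hit / len(g) if g else 0.0
    precision = hit / len(p) if p else 0.0
    return recall, precision

# recall_precision(["Mongolia", "Kazakhstan"],
#                  ["Mongolia", "Kazakhstan", "North Korea"])
# -> (0.666..., 1.0)
```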

Implementation Details  All baselines are conducted with gpt-3.5-turbo-1106 (GPT-3.5). The prompts of "Direct", "CoT", and "Direct RAG" are written by ourselves. ReAct and Self-Ask are replicated based on their source code with the GPT-3.5 API. To facilitate assessment, we ask the model to output only concise answer phrases. In our framework: (1) For pattern grounding, we use Wikidata15k as the KG, which contains about 15k entities and 263 relations. The extraction tool used in grounding is modified from existing works (Ren and Leskovec, 2020). (2) For the planning LLM $M_p$, we choose CodeQwen1.5-7B-Chat and Llama3-8B-Instruct; one excels at coding while the other excels at commonsense reasoning. We fine-tune them with LoRA tuning, running on 4x 80G A100 GPUs for about 3 hours. The fine-tuning is conducted for 2 epochs, with a learning rate of 5e-5 and a cosine learning rate scheduler. (3) For retrieval, following previous works (Shao
