Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

Junjie Wang1,2,5*, Mingyang Chen3*, Binbin Hu2,5, Dan Yang2,5, Ziqi Liu2,5,
Yue Shen2,5, Peng Wei2,5, Zhiqiang Zhang2,5, Jinjie Gu2,5, Jun Zhou2,5,
Jeff Z. Pan4, Wen Zhang1,5†, Huajun Chen1,5†

1 Zhejiang University, 2 Ant Group, 3 Baichuan Inc., 4 The University of Edinburgh
5 Zhejiang University-Ant Group Joint Laboratory of Knowledge Graph
{wangjj2018,zhang.wen,huajunsir}@, chenmingyang@
/j.z.pan/
/zjukg/LPKG
Abstract
arXiv:2406.14282v3 [cs.CL] 23 Oct 2024
Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs' performance by combining step-wise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fine-tuning. Previous work has relied on manual annotation and knowledge distillation from teacher LLMs, which are time-consuming and not accurate enough. In this paper, we introduce a novel framework for enhancing LLMs' planning capabilities by using planning data derived from knowledge graphs (KGs). LLMs fine-tuned with this data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval. Evaluations on multiple datasets, including our newly proposed benchmark, highlight the effectiveness of our framework and the benefits of KG-derived planning data.
1 Introduction
The past few years have witnessed significant innovations in LLMs (Ouyang et al., 2022; Touvron et al., 2023; Chowdhery et al., 2023; AI@Meta, 2024). While LLMs excel in many natural language processing tasks, they still face challenges, particularly the smaller models, in handling complex question-answering (QA) tasks (Press et al., 2023; Shao et al., 2023; Yao et al., 2022; Xiong et al., 2024a; Huang et al., 2024).
To improve the performance of LLMs on complex QA tasks, past research has tried various methods: (1) employing carefully designed prompt strategies to guide the model in reasoning, such as the Chain of Thought (CoT) (Kojima et al., 2022; Wei et al., 2022) and Tree of Thought (ToT) (Yao et al., 2024) methods; (2) utilizing retrieval techniques to obtain supplemental information from external knowledge sources (Lewis et al., 2020; Guu et al., 2020); (3) combining prompt strategies with retrieval enhancements, as exemplified by methods like ReAct (Yao et al., 2022) and Self-Ask (Press et al., 2023). The third approach has garnered widespread research interest due to its integration of the advantages of the first two methods. The fundamental idea of this class of methods is to guide LLMs in breaking down a complex question into multiple simpler sub-questions and then use a retrieval-augmented generation (RAG) (Huang et al., 2023, 2024) method to answer each sub-question, thereby deducing the answer to the original complex question. However, planning for complex questions is non-trivial, especially for smaller LLMs (with fewer than 10 billion parameters), which often require supervised fine-tuning (Aksitov et al., 2023; Chen et al., 2023a; Qin et al., 2023).

* Equal contribution.
† Corresponding authors.

Figure 1: An example of a KG pattern, its grounded instance, and verbalized planning process. [Figure: a Spouse/Sports pattern grounded on Fran Walsh and Fluminense for the complex question "What sports have Fluminense and Fran Walsh's spouse played in?", verbalized as Q1: "Who is Fran Walsh's spouse?" (A1: Ans_1), Q2: "What sports does {Ans_1} play?" (A2: Ans_2), Q3: "What sports does Fluminense play?" (A3: Ans_3); Final Answer: A2 & A3.]
This raises a widely concerning issue: how to obtain supervised data for learning the planning ability on complex questions. Manual annotation is time-consuming and labor-intensive, making it difficult to scale. Most existing methods attempt to distill knowledge from teacher LLMs (Yao et al., 2022; Aksitov et al., 2023), which places excessive trust in the teacher LLMs and, in reality, cannot guarantee the accuracy of the distilled knowledge. These challenges inspire us to explore new ways of obtaining supervised planning data.
Knowledge Graphs (KGs) (Pan et al., 2017b,a) usually store accurate knowledge in a structured way. We find that a KG pattern can be viewed as the abstract of a complex question, as shown in Figure 1, which reveals the connection between question planning and patterns. This opens up the possibility of constructing training data to enhance the planning capabilities of LLMs using KGs. Specifically, we start by grounding predefined patterns in an open-domain KG to extract numerous instances, which we then verbalize into complex questions and corresponding sub-questions in natural language. In this way, we effectively create a large amount of accurate planning data for fine-tuning. Being fine-tuned with these planning data, LLMs' capability of generating plans for complex questions is enhanced, resulting in better final answers by parsing and executing these plans. We refer to this innovative framework as Learning to Plan from Knowledge Graphs (LPKG).
Additionally, we construct a Comprehensive Logical QA benchmark, CLQA-Wiki, from a subset of Wikidata (Vrandecic and Krötzsch, 2014) via grounding rich patterns as aforementioned. Existing complex QA benchmarks (Yang et al., 2018; Ho et al., 2020; Press et al., 2023; Trivedi et al., 2022) primarily focus on multi-hop and comparison-type questions and lack logical operations. Furthermore, most questions are labeled with only one answer, whereas in reality, they often have multiple correct answers. The CLQA-Wiki benchmark evenly covers multi-hop, comparison, intersection, and union types of questions, which is more comprehensive and challenging for complex QA evaluation.
Our contributions can be summarized as follows: (1) We introduce a novel framework, LPKG, that enhances the planning ability of LLMs using data constructed from KG patterns; (2) We develop a comprehensive and challenging evaluation benchmark, named CLQA-Wiki, to more effectively assess the performance of LLMs on complex QA tasks; (3) Our proposed framework LPKG achieves better results than popular baselines on multiple conventional complex QA benchmarks, and we verify the effectiveness of the introduction of KG-sourced planning data.
2 Related Works
Reasoning and Planning with LLMs  In the context of LLMs, reasoning typically involves decomposing complex questions into sub-questions (Mialon et al., 2023; Hao et al., 2023). Prominent techniques include Chain-of-Thought (CoT) prompting (Wei et al., 2022), which elicits rationales that lead to the final answers, and its extensions using self-consistency (Wang et al., 2023) or automated demonstration selection (Zhang et al., 2023). Other methods, such as ReAct (Yao et al., 2022), generate reasoning steps sequentially by integrating planning, with additional strategies like Tree of Thoughts (ToT) (Yao et al., 2024), Reasoning via Planning (RAP) (Hao et al., 2023), and other methods (Khot et al., 2023; Zhou et al., 2023) facilitating complex question decomposition through varied planning approaches. Unlike most methods that rely on in-context learning through prompt engineering, our approach generates planning data from KGs to fine-tune LLMs, thereby enhancing their planning capabilities.
Retrieval-Augmented Generation  Retrieval-Augmented Generation (RAG) can enhance LLMs by incorporating external data, allowing models to access up-to-date information and factual knowledge to mitigate hallucinations (Gao et al., 2023; Guu et al., 2020; Lewis et al., 2020). Each module in the RAG pipeline can be optimized, for instance, through retriever tuning (Shi et al., 2023; Lin et al., 2023), self-reflection during retrieval (Asai et al., 2023; Yan et al., 2024), or query refinement (Chan et al., 2024). To address multi-hop questions, iterative RAG models (Shao et al., 2023; Feng et al., 2023; Press et al., 2023) have been developed, which iteratively conduct retrieval-enhanced generation and generation-enhanced retrieval. However, the multiple RAG steps in existing methods are not optimized and rely heavily on in-context learning. Our approach uses planning data from KGs to facilitate more efficient RAG.
LLMs with KGs  In the existing realm of LLMs, KGs are primarily utilized as sources of structured factual knowledge (Pan et al., 2023). For example, Think-on-Graph (Sun et al., 2023) extracts relevant triples from KGs to assist in QA. Reasoning on Graph (RoG) (Luo et al., 2023) generates relation-based plans and retrieves corresponding paths from these graphs. While aiding in KGQA tasks where answers are directly sourced from KGs, these graphs also support rationale generation. Chain-of-Knowledge (CoK) (Li et al., 2024) further leverages KGs along with other heterogeneous sources to generate faithful rationales. Unlike previous studies, our approach constructs planning data for complex questions from KGs, recognizing that patterns within KGs inherently represent multi-step plans. This data is utilized to enhance the planning capabilities of LLMs.

Figure 2: Overview of our Learning to Plan from Knowledge Graph (LPKG) framework. [Figure: Step 1 (Data Construction) grounds KG patterns (e.g., Spouse and Sports relations around Fran Walsh and Fluminense) in the knowledge graph and verbalizes them into sub-questions Q1-Q3 and a final complex question; Step 2 (Planning LLM Tuning and Inference) fine-tunes the planning LLM on code-formatted input/output pairs and infers plans for new questions such as "Which regions border Drake Bell's birthplace and Santa Ana at the same time?"; Step 3 (Plan Parsing and Execution) parses each plan line and executes it with a retriever, a QA LLM, and set operations to produce the final answer. An example output plan:

Sub_Question_1: str = "What is the Spouse of Fran Walsh?"
Info_1: str = Search(query=Sub_Question_1)
Ans_1: str = Get_Answer(query=Sub_Question_1, info=Info_1)
Sub_Question_2: str = f"What sports does {Ans_1} play?"
Info_2: str = Search(query=Sub_Question_2)
Ans_2: str = Get_Answer(query=Sub_Question_2, info=Info_2)
Sub_Question_3: str = "What sports does Fluminense play?"
Info_3: str = Search(query=Sub_Question_3)
Ans_3: str = Get_Answer(query=Sub_Question_3, info=Info_3)
Inter_Results1: str = Intersection(Answer1=Ans_2, Answer2=Ans_3)
Final_Answer: str = Finish_The_Plan(Answer=Inter_Results1)]
Complex Logical Query in KGs  Recent research on complex logical queries in KGs primarily focuses on first-order logic (FOL) queries that incorporate operations like conjunctions, disjunctions, negation, and existential quantifiers within incomplete KGs (Hamilton et al., 2018; Ren et al., 2020; Ren and Leskovec, 2020; Arakelyan et al., 2021; Chen et al., 2022; Xu et al., 2022; Xiong et al., 2024b; Wu et al., 2024). These works define diverse patterns to assess the capability of logical operations in vector spaces, specifically targeting logical forms rather than natural language. Nonetheless, their methodologies for pattern definition and extraction inspire our approach to deriving complex questions from KGs.
3 Method

3.1 Overview

As shown in Figure 2, there are 3 steps in our Learning to Plan from Knowledge Graphs (LPKG) framework. (1) In the data construction step, we construct planning data from KGs. Specifically, we define some basic KG patterns as shown in Figure 3. We ground patterns in an existing KG to extract instances. For each extracted instance, we sequentially verbalize the sub-queries within the instance into natural language sub-questions according to their order in the instance, eventually assembling them into a complex question. Afterward, we build input and output templates for planning data, where complex questions are concatenated to the input prompt, and sub-questions are filled into the corresponding positions in the output text according to the type of patterns. (2) In the planning LLM tuning and inference step, we fine-tune LLMs on such planning data to enable the LLMs to follow instructions to infer the plan for each question in the downstream test sets. (3) In the third step, such a plan will be parsed and executed, thereby obtaining the final answer to each question.
3.2 Construction of Planning Data

Basic KG Patterns.  Inspired by previous work on complex logic queries within KGs (Ren and Leskovec, 2020), we define the basic KG patterns as shown in Figure 3. The set of KG patterns is denoted as P = {1p, 2p, 3p, 2i, 3i, 2u, ip, pi, compare}. Specifically, p, i, and u respectively indicate projection, intersection, and union. 1p, 2p, and 3p represent queries that span from one to three hops; 2i and 3i respectively represent the intersection of two sub-queries and three sub-queries; 2u represents the union of two sub-queries; and ip and pi represent complex queries that combine two-hop with intersection logic. In addition, we also combine pairs of triples that have numeric tail entities and the same relations to construct comparison patterns, denoted as compare.

Figure 3: Basic KG patterns. [Figure: schematic graph structures of the 1p, 2p, 3p, 2u, 2i, 3i, pi, ip, and compare patterns.]
Grounding.  Given a KG, we first ground these patterns in it to extract instances:

I_pat = f_pat(KG),  pat ∈ P    (1)

where I_pat are the instances grounded from knowledge graph KG for pattern pat, and f_pat is the corresponding extraction function. For example, an instance of the 2p pattern can be "(Inkheart, (cast member, educated at))". To best meet the needs of open-domain QA, we use Wikidata15k (Chen et al., 2023b), a subset of the open-domain KG Wikidata, as KG.
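As a minimal illustration of the grounding step in Eq. (1) (not the authors' released extraction tool; the toy triple store below is an assumption for demonstration), a 2p instance can be extracted by chaining two relation projections through a shared intermediate entity:

```python
# Toy sketch of grounding the 2p pattern (anchor entity + two chained
# relations). Real grounding runs over Wikidata15k; these triples are
# illustrative only.
from itertools import product

TRIPLES = [
    ("Inkheart", "cast member", "Brendan Fraser"),
    ("Brendan Fraser", "educated at", "University of Toronto"),
    ("Fran Walsh", "spouse", "Peter Jackson"),
    ("Peter Jackson", "residence", "Wellington"),
]

def ground_2p(triples):
    """Extract 2p instances (anchor, (r1, r2)) such that a chain
    anchor --r1--> mid --r2--> answer exists in the KG."""
    instances = set()
    for (h1, r1, t1), (h2, r2, t2) in product(triples, triples):
        if t1 == h2:  # chain the two projections through a shared entity
            instances.add((h1, (r1, r2)))
    return instances

instances = ground_2p(TRIPLES)
# contains ("Inkheart", ("cast member", "educated at"))
```

The same enumerate-and-filter idea extends to the other patterns (e.g., 2i grounds two triples sharing a tail entity), with f_pat choosing the matching condition.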
Verbalization.  Subsequently, based on the grounded instances, we need to verbalize them bottom-up into sub-questions and assemble them into complex questions. There are several methods for this step, such as a template-based method, manual annotation, or utilizing an LLM. Since the template-based approach often lacks fluency in language expression, and the manual method is time-consuming and labor-intensive, we opt for an LLM-based method. Specifically, we write a small number of verbalization examples for each pattern type. These examples are used as demonstrations De1 to fill in the prompt. Finally, we concatenate a grounded instance i ∈ I_pat to the prompt, asking an LLM to verbalize it into a natural language question:

{{Q_s^n}_{n=1}^N, Q_c} = llm(concat(De1, i))    (2)

where {Q_s^n}_{n=1}^N and Q_c represent the resulting sub-questions and complex question respectively, and concat is string-level concatenation. We use GPT-4 as llm here. It is important to note that here the llm's role is merely to transform the data format; the sub-questions and complex question still originate from the structure of the KG itself, without introducing any knowledge from the llm in the task of question planning. The prompt we use can be found in Appendix C.1.
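The prompt assembly in Eq. (2) can be sketched as follows. This is a simplified illustration: `call_llm` stands in for the GPT-4 API call, and the demonstration text and line-prefix output format are our assumptions, not the actual prompt from Appendix C.1:

```python
# Sketch of LLM-based verbalization: few-shot demonstrations (De1) are
# concatenated with a grounded instance, and the LLM reply is parsed
# into sub-questions and a complex question.
DEMONSTRATIONS = """\
Instance: (Inkheart, (cast member, educated at))
Sub_Q1: Who is a cast member of Inkheart?
Sub_Q2: Where was {Ans_1} educated at?
Complex_Q: Where were the cast members of Inkheart educated?
"""

def call_llm(prompt: str) -> str:
    # Placeholder for a GPT-4 call; returns a canned reply for the demo.
    return ("Sub_Q1: Who is the spouse of Fran Walsh?\n"
            "Sub_Q2: What sports does {Ans_1} play?\n"
            "Complex_Q: What sports does Fran Walsh's spouse play?")

def verbalize(instance: tuple) -> tuple:
    prompt = DEMONSTRATIONS + f"Instance: {instance}\n"  # concat(De1, i)
    reply = call_llm(prompt)
    sub_qs, complex_q = [], ""
    for line in reply.splitlines():
        key, _, text = line.partition(": ")
        if key.startswith("Sub_Q"):
            sub_qs.append(text)
        elif key == "Complex_Q":
            complex_q = text
    return sub_qs, complex_q

sub_qs, complex_q = verbalize(("Fran Walsh", ("spouse", "sports")))
```

Because the instance is copied into the prompt verbatim, the LLM only rewrites structure into language; the entities and relations come from the KG.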
Filling.  We then extract sub-questions and complex questions from the output of the llm. Subsequently, we build a set of planning templates T_pat for the planning process of questions corresponding to each pattern. The {Q_s^n}_{n=1}^N obtained in the previous step will be filled into fixed positions in T_pat corresponding to their pattern type, thereby obtaining the output for training. The Q_c obtained in the previous step is concatenated to the end of a fixed instruction Ins and some planning demonstrations De2 (also constructed from KGs), thus obtaining the input for the training data:

x = concat(Ins, De2, Q_c)    (3)
y = T_pat.fill({Q_s^n}_{n=1}^N),  pat ∈ P    (4)

where .fill is a filling function of the templates T_pat. Inspired by (Aksitov et al., 2023), we use a code-formatted input x and output y here (shown in "Input" and "Output" in Figure 2) to facilitate formatting and the subsequent parsing and execution of the output plan (more details in Appendix C.2). In the end, we obtain 9000 training data entries D_train = {x_n, y_n}, with 1000 entries for each pattern. We randomly select 100 items from the training set for manual verification, with an accuracy rate of over 95%.
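A minimal sketch of the template filling in Eq. (4) for a 2p pattern. The template text mirrors the code-formatted plan style of Figure 2, but this exact helper is our illustration, not the authors' released templates:

```python
# Sketch of filling verbalized sub-questions into a code-formatted plan
# template (the y side of a training pair). The 2p template below is an
# illustrative reconstruction of the paper's plan format.
TEMPLATE_2P = (
    'Sub_Question_1: str = "{q1}"\n'
    "Info_1: str = Search(query=Sub_Question_1)\n"
    "Ans_1: str = Get_Answer(query=Sub_Question_1, info=Info_1)\n"
    'Sub_Question_2: str = f"{q2}"\n'
    "Info_2: str = Search(query=Sub_Question_2)\n"
    "Ans_2: str = Get_Answer(query=Sub_Question_2, info=Info_2)\n"
    "Final_Answer: str = Finish_The_Plan(Answer=Ans_2)"
)

def fill_2p(sub_questions: list) -> str:
    q1, q2 = sub_questions  # q2 keeps its {Ans_1} placeholder
    return TEMPLATE_2P.format(q1=q1, q2=q2)

plan = fill_2p(["Who is the spouse of Fran Walsh?",
                "What sports does {Ans_1} play?"])
```

Since `str.format` substitutes in a single pass, the `{Ans_1}` placeholder inside the second sub-question survives into the training output, where it is later resolved at execution time.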
3.3 Fine-tuning and Inference of Planning LLMs

We use the obtained training data D_train to fine-tune the planning LLM M_p directly with the standard next-token training objective:

E_{(x,y)∈D_train} log p_{M_p}(y|x)    (5)

The fine-tuned planning LLM M_p can be used to infer the plan P for each question Q_test in the downstream test set:

P = M_p(concat(Ins, De2, Q_test))    (6)

where Ins and De2 are the same as the contents in Equation (3). It should be noted that in the
Type         Count   Type              Count
2p question  200     3p question       200
2i question  200     3i question       200
ip question  50      pi question       50
2u question  200     compare question  100

Table 1: Distribution of CLQA-Wiki.
multi-hop questions, the specific sub-questions in the second and third hops need to be constructed based on the answers to the previous hop's sub-questions. Since the plan P is output all at once, M_p cannot know the answers to the previous hop's sub-questions when outputting the plans. Therefore, we use a placeholder to replace the answer to the previous hop's sub-questions, allowing the planning to proceed smoothly (as shown in Tables 9, 10, 13, and 14 in Appendix C.1). These placeholders will then be filled in during the subsequent parsing and execution process.
3.4 Plan Parsing and Execution

The obtained plan P needs to be parsed and executed to obtain the final answer to Q_test. Due to our adoption of code-formatted input and output for fine-tuning M_p, the plan P here is also highly formatted code, which facilitates parsing each step of the plan and executing it. In particular:

• When a step includes a "Search" function, we call an external retrieval tool.

• When a step includes a "Get_Answer" function, we invoke an external QA LLM M_QA to get answers for a sub-question based on the retrieved information. The possible placeholders in sub-questions will be filled with previous answers. We ask the QA LLM to organize answers in the form of a list (the prompt is shown in Table 7 in Appendix C.3).

• When "Intersection" or "Union" appears in a step, we run actual intersection or union functions. This can be easily completed due to the list-format answers from the previous step.

It is important to note that the planning LLM M_p and the QA LLM M_QA are completely decoupled in our framework. Here we can use any off-the-shelf LLM to handle the task of QA. Ultimately, we can obtain the answer to Q_test.
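The parsing-and-execution loop above can be sketched as follows. This is a simplified stand-in, not the authors' implementation: `search` and `qa_llm` are stubbed with canned data, and the regex-based line parser is our illustration of the idea:

```python
import re

# Stub tools; in the real framework these call a retriever and a QA LLM.
def search(query: str) -> str:
    return f"passages about: {query}"

def qa_llm(query: str, info: str) -> list:
    canned = {
        "Who is the spouse of Fran Walsh?": ["Peter Jackson"],
        "What sports does Peter Jackson play?": ["rugby", "cricket"],
        "What sports does Fluminense play?": ["football", "rugby"],
    }
    return canned.get(query, [])

def execute_plan(plan: str) -> list:
    """Execute a code-formatted plan line by line, binding each
    variable (Sub_Question_i, Info_i, Ans_i, ...) in `env`."""
    env = {}
    for line in plan.splitlines():
        m = re.match(r"(\w+):\s*str\s*=\s*(.+)", line.strip())
        if not m:
            continue
        var, expr = m.groups()
        if expr[0] == '"' or expr.startswith('f"'):
            text = expr.strip('f"')
            # fill {Ans_i} placeholders with answers from previous hops
            for name, val in env.items():
                if isinstance(val, list) and val:
                    text = text.replace("{%s}" % name, val[0])
            env[var] = text
        elif expr.startswith("Search("):
            env[var] = search(env["Sub_Question_" + var.split("_")[1]])
        elif expr.startswith("Get_Answer("):
            idx = var.split("_")[1]
            env[var] = qa_llm(env["Sub_Question_" + idx], env["Info_" + idx])
        elif expr.startswith("Intersection("):
            a, b = re.findall(r"=(\w+)", expr)
            env[var] = sorted(set(env[a]) & set(env[b]))
        elif expr.startswith("Union("):
            a, b = re.findall(r"=(\w+)", expr)
            env[var] = sorted(set(env[a]) | set(env[b]))
        elif expr.startswith("Finish_The_Plan("):
            env["Final_Answer"] = env[re.findall(r"=(\w+)", expr)[0]]
    return env["Final_Answer"]

PLAN = '''\
Sub_Question_1: str = "Who is the spouse of Fran Walsh?"
Info_1: str = Search(query=Sub_Question_1)
Ans_1: str = Get_Answer(query=Sub_Question_1, info=Info_1)
Sub_Question_2: str = f"What sports does {Ans_1} play?"
Info_2: str = Search(query=Sub_Question_2)
Ans_2: str = Get_Answer(query=Sub_Question_2, info=Info_2)
Sub_Question_3: str = "What sports does Fluminense play?"
Info_3: str = Search(query=Sub_Question_3)
Ans_3: str = Get_Answer(query=Sub_Question_3, info=Info_3)
Inter_Results1: str = Intersection(Answer1=Ans_2, Answer2=Ans_3)
Final_Answer: str = Finish_The_Plan(Answer=Inter_Results1)
'''

print(execute_plan(PLAN))  # → ['rugby']
```

Because every step binds a named variable, the executor resolves `{Ans_1}`-style placeholders exactly when the needed answer becomes available, which is why the plan can be generated in one pass.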
4 New Benchmark: CLQA-Wiki

The conventional complex QA datasets include HotPotQA (Yang et al., 2018), 2WikiMultihopQA (Ho et al., 2020), MuSiQue (Trivedi et al., 2022), and Bamboogle (Press et al., 2023). Despite their widespread use in evaluating the QA performance of language models, we identify some problems with these datasets:

(1) All these datasets are primarily focused on multi-hop and comparison-type questions. The types of questions are not balanced and comprehensive enough, and less attention is paid to questions involving intersection and union logic, which are also very common in reality.

(2) Except for MuSiQue, the questions in the other three datasets only have one answer, whereas many questions in reality often have multiple answers. For example, the answer to the intersection question "Which country borders with Russia and China at the same time?" is a set [Mongolia, Kazakhstan, North Korea].

In light of this, we aim to construct a new testing benchmark that embodies more comprehensive logic and allows for an unrestricted number of answers to more thoroughly evaluate the performance of language models on various logical questions. Considering the detailed pattern structures and unrestricted number of answer entities in KGs, we construct a test set based on Wikidata15k.

Similar to the method used to construct the planning data, we extract instances from Wikidata15k (which do not appear in the training data) and use GPT-4 to do the verbalization. Moreover, for each instance, we can obtain all the answer entities from Wikidata15k, which we then designate as the answers to the questions. After manual quality checks, we obtain a test set called CLQA-Wiki, which contains 1,200 pieces of data featuring a variety of comprehensive logical QA pairs. The question types and their distribution are listed in Table 1. It is worth noting that we have constructed 9 types of testing questions so far, and for newly defined patterns, we can also quickly construct corresponding questions using the above method, showing the better scalability of our dataset.
5 Experiment

We aim to answer the following research questions in our experiments:

• RQ1: Can LPKG outperform baseline methods on conventional complex QA datasets?

• RQ2: Can planning data derived from KGs help improve the planning ability of LLMs?

• RQ3: Can planning data derived from KGs be more helpful in improving the LLMs' planning ability compared to normal distillation methods?

• RQ4: Can LPKG outperform baseline methods on the new benchmark CLQA-Wiki?
5.1 Experimental Settings

Datasets  We first conduct experiments on four conventional complex QA datasets: HotPotQA (Yang et al., 2018), 2WikiMultiHopQA (2WikiMQA) (Ho et al., 2020), MuSiQue (Trivedi et al., 2022), and Bamboogle (Press et al., 2023). Among them, HotPotQA, 2WikiMQA, and MuSiQue contain complete train sets, development sets, and test sets, while Bamboogle is a small dataset that only contains 125 test items. Similar to previous methods (Shao et al., 2023; Aksitov et al., 2023), we respectively extract the first 500 entries from the development sets of HotPotQA and 2WikiMQA. For MuSiQue, we follow Press et al. (2023) to use only the 2-hop questions in the development set. And for Bamboogle, we use all of its data as test data. Finally, we conduct testing on our benchmark CLQA-Wiki.
Baselines  We compare our framework to various baselines:

• Direct: directly input the original question into the LLM.

• CoT: following Kojima et al. (2022), we instruct the LLM to first "think step by step" and then give the final answers.

• Direct RAG: the prompt sent to the LLM contains the original question and retrieved information related to the original question.

• ReAct (Yao et al., 2022): answering questions through iterative planning, action, and observation. The action here is the retrieval tool and the observation is the retrieved information. The planning and QA are conducted on a single LLM.

• Self-Ask (Press et al., 2023): similar to ReAct, it first instructs the LLM to judge whether sub-questions are needed. If so, it requests the LLM to generate the sub-questions, then conducts external retrieval based on the sub-questions, and allows the LLM to provide answers based on the retrieved information.

• ICL-LPKG: a variant of the LPKG framework. The planning LLM is not fine-tuned; it instead uses in-context learning to do planning with some KG-sourced planning demonstrations.
Evaluation Metrics  Exact Match (EM) is used as the evaluation metric on HotPotQA, 2WikiMQA, Bamboogle, and MuSiQue, while on CLQA-Wiki we use Recall and Precision.
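For CLQA-Wiki's multi-answer questions, Recall and Precision compare the predicted and gold answer sets. A minimal sketch of these standard metrics (our own formulation; the exact normalization used in the paper is an assumption):

```python
# Set-level Recall/Precision for multi-answer QA, plus single-answer
# Exact Match. Normalization (stripping, lowercasing) is an assumption.
def _norm(ans: str) -> str:
    return ans.strip().lower()

def exact_match(pred: str, gold: str) -> int:
    return int(_norm(pred) == _norm(gold))

def recall_precision(pred: list, gold: list) -> tuple:
    p, g = {_norm(a) for a in pred}, {_norm(a) for a in gold}
    hit = len(p & g)
    recall = hit / len(g) if g else 0.0
    precision = hit / len(p) if p else 0.0
    return recall, precision

r, p = recall_precision(["Mongolia", "Kazakhstan"],
                        ["Mongolia", "Kazakhstan", "North Korea"])
# r = 2/3, p = 1.0
```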
Implementation Details  All baselines are conducted with gpt-3.5-turbo-1106 (GPT-3.5). The prompts of "Direct", "CoT", and "Direct RAG" are written by ourselves. ReAct and Self-Ask are replicated based on their source code with the GPT-3.5 API. To facilitate assessment, we ask the model to only output concise answer phrases. In our framework: (1) For pattern grounding, we use Wikidata15k as KG, which contains about 15k entities and 263 relations. The extraction tool in grounding is modified from existing works (Ren and Leskovec, 2020). (2) For the planning LLM M_p, we choose CodeQwen1.5-7B-Chat and Llama3-8B-Instruct; one excels at coding while the other excels at commonsense reasoning. We fine-tune them with LoRA tuning, running on 4x 80G A100 GPUs for about 3 hours. The fine-tuning is conducted for 2 epochs, with a learning rate of 5e-5 and a cosine learning rate scheduler. (3) For retrieval, following previous works (Shao