arXiv:2207.04429v1 [cs.RO] 10 Jul 2022
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action

Dhruv Shah+β, B?a?ej Osiński+βω, Brian Ichterγ, Sergey Levineβγ — βUC Berkeley, ωUniversity of Warsaw, γRobotics at Google
Abstract: Goal-conditioned policies for robotic navigation can be trained on large, unannotated datasets, providing for good generalization to real-world settings. However, particularly in vision-based settings where specifying goals requires an image, this makes for an unnatural interface. Language provides a more convenient modality for communication with robots, but contemporary methods typically require expensive supervision, in the form of trajectories annotated with language descriptions. We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories, while still providing a high-level interface to the user. Instead of utilizing a labeled instruction following dataset, we show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data. We instantiate LM-Nav on a real-world mobile robot and demonstrate long-horizon navigation through complex, outdoor environments from natural language instructions.
Keywords: instruction following, language models, vision-based navigation
1 Introduction
One of the central challenges in robotic learning is to enable robots to perform a wide variety of tasks on command, following high-level instructions from humans. This requires robots that can understand human instructions, and are equipped with a large repertoire of diverse behaviors to execute such instructions in the real world. Prior work on instruction following in navigation has largely focused on learning from trajectories annotated with textual instructions [1–5]. This enables understanding of textual instructions, but the cost of data annotation impedes wide adoption. On the other hand, recent work has shown that learning robust navigation is possible through goal-conditioned policies trained with self-supervision. These utilize large, unlabeled datasets to train vision-based controllers via hindsight relabeling [6–11]. They provide scalability, generalizability, and robustness, but usually involve a clunky mechanism for goal specification, using locations or images. In this work, we aim to combine the strengths of both approaches, enabling a self-supervised system for robotic navigation to execute natural language instructions by leveraging the capabilities of pre-trained models without any user-annotated navigational data. Our method uses these models to construct an "interface" that humans can use to communicate desired tasks to robots. This system enjoys the impressive generalization capabilities of the pre-trained language and vision-language models, enabling the robotic system to accept complex high-level instructions.
Our main observation is that we can utilize off-the-shelf pre-trained models trained on large corpora of visual and language datasets—that are widely available and show great few-shot generalization capabilities—to create this interface for embodied instruction following. To achieve this, we combine the strengths of two such robot-agnostic pre-trained models with a pre-trained navigation model. We use a visual navigation model (VNM: ViNG [11]) to create a topological "mental map" of the environment using the robot's observations. Given free-form textual instructions, we use a pre-trained large language model (LLM: GPT-3 [12]) to decode the instructions into a sequence of textual landmarks. We then use a vision-language model (VLM: CLIP [13]) for grounding these textual landmarks in the topological map, by inferring a joint likelihood over the landmarks and nodes. A novel search algorithm is then used to maximize a probabilistic objective and find a plan for the robot, which is then executed by the VNM.

+These authors contributed equally; order decided by a coin flip. Check out the project page for experiment videos, code, and a user-friendly Colab notebook that runs in your browser: /view/lmnav

Figure 1: Embodied instruction following with LM-Nav: Our system takes as input a set of raw observations from the target environment and free-form textual instructions (left), deriving an actionable plan using three pre-trained models: a large language model (LLM) for extracting landmarks, a vision-and-language model (VLM) for grounding, and a visual navigation model (VNM) for execution. This enables LM-Nav to follow textual instructions in complex environments purely from visual observations (right) without any fine-tuning.
Our primary contribution is Large Model Navigation, or LM-Nav, an embodied instruction following system that combines three large, independently pre-trained models—a self-supervised robotic control model that utilizes visual observations and physical actions (VNM), a vision-language model that grounds images in text but has no context of embodiment (VLM), and a large language model that can parse and translate text but has no sense of visual grounding or embodiment (LLM)—to enable long-horizon instruction following in complex, real-world environments. We present the first instantiation of a robotic system that combines pre-trained vision-and-language models with a goal-conditioned controller to derive actionable plans without any fine-tuning in the target environment. Notably, all three models are trained on large-scale datasets, with self-supervised objectives, and used off-the-shelf with no fine-tuning—no human annotations of the robot navigation data are necessary to train LM-Nav. We show that LM-Nav is able to successfully follow natural language instructions in new environments over the course of 100s of meters of complex, suburban navigation, while disambiguating paths with fine-grained commands.
2 Related Work
Early works in augmenting navigation policies with natural language commands use statistical machine translation [14] to discover data-driven patterns that map free-form commands to a formal language defined by a grammar [15–19]. However, these approaches tend to operate on structured state spaces. Our work is closely inspired by methods that instead reduce this task to a sequence prediction problem [1, 20, 21]. Notably, our goal is similar to the task of VLN—leveraging fine-grained instructions to control a mobile robot solely from visual observations [1, 2].
However, most recent approaches to VLN use a large dataset of simulated trajectories—over 1M demonstrations—annotated with fine-grained language labels in indoor [1, 3–5, 22] and driving scenarios [23–28], and rely on sim-to-real transfer for deployment in simple indoor environments [29, 30].
However, this necessitates building a photo-realistic simulator resembling the target environment, which can be challenging for unstructured environments, especially for the task of outdoor navigation. Instead, LM-Nav leverages free-form textual instructions to navigate a robot in complex, outdoor environments without access to any simulation or any trajectory-level annotations.

Recent progress in using large-scale models of natural language and images trained on diverse data has enabled applications in a wide variety of textual [31–33], visual [13, 34–38], and embodied domains [39–44]. In the latter category, Shridhar et al. [39], Khandelwal et al. [44], and Jang et al. [40] fine-tune embeddings from pre-trained models on robot data with language labels, Huang et al. [41] assume that the low-level agent can execute textual instructions (without addressing control),
and Ahn et al. [42] assume that the robot has a set of text-conditioned skills that can follow atomic textual commands. All of these approaches require access to low-level skills that can follow rudimentary textual commands, which in turn requires language annotations for robotic experience and a strong assumption on the robot's capabilities. In contrast, we combine these pre-trained vision and language models with pre-trained visual policies that do not use any language annotations [11, 45], without fine-tuning these models in the target environment or for the task of VLN.
Data-driven approaches to vision-based mobile robot navigation often use photorealistic simulators [46–49] or supervised data collection [50] to learn goal-reaching policies directly from raw observations. Self-supervised methods for navigation [6–11, 51] can instead use unlabeled datasets of trajectories by automatically generating labels using onboard sensors and hindsight relabeling. Notably, such a policy can be trained on large, diverse datasets and generalize to previously unseen environments [45, 52]. Being self-supervised, such policies are adept at navigating to desired goals specified by GPS locations or images, but are unable to parse high-level instructions such as free-form text. LM-Nav uses self-supervised policies trained in a large number of prior environments, augmented with pre-trained vision and language models for parsing natural language instructions, and deploys them in novel real-world environments without any fine-tuning.
3 Preliminaries

LM-Nav consists of three large, pre-trained models for processing language, associating images with language, and visual navigation.

Large language models are generative models based on the Transformer architecture [53], trained on large corpora of internet text. LM-Nav uses the GPT-3 LLM [12] to parse textual instructions into a sequence of landmarks.
Vision-and-language models refer to models that can associate images and text, e.g., image captioning, visual question-answering, etc. [54–56]. We use the CLIP VLM [13], a model that jointly encodes images and text into an embedding space that allows it to determine how likely some string is to be associated with a given image. We can jointly encode a set of landmark descriptions t obtained from the LLM and a set of images i_k to obtain their VLM embeddings {T, I_k} (see Fig. 3). Computing the cosine similarity between these embeddings, followed by a softmax operation, results in probabilities P(i_k | t), corresponding to the likelihood that image i_k corresponds to the string t. LM-Nav uses this probability to align landmark descriptions with images.

Figure 2: LM-Nav uses the VLM to infer a joint probability distribution over textual landmarks and image observations. The VNM constitutes an image-conditioned distance function and policy that can control the robot. (Panel (a): CLIP VLM; panel (b): ViNG VNM.)
Visual navigation models learn navigation behavior and navigational affordances directly from visual observations [11, 51, 57–59], associating images and actions through time. We use the ViNG VNM [11], a goal-conditioned model that predicts temporal distances between pairs of images and the corresponding actions to execute (see Fig. 3). This provides an interface between images and embodiment. The VNM serves two purposes: (i) given a set of observations in the target environment, the distance predictions from the VNM can be used to construct a topological graph G(V, E) that represents a "mental map" of the environment; (ii) given a "walk", comprising a sequence of connected subgoals to a goal node, the VNM can navigate the robot along this plan. The topological graph G is an important abstraction that allows a simple interface for planning over past experience in the environment and has been successfully used in prior work to perform long-horizon navigation [52, 60, 61]. To deduce connectivity in G, we use a combination of learned distance estimates, temporal proximity (during data collection), and spatial proximity (using GPS measurements). For every connected pair of vertices {v_i, v_j}, we assign this distance estimate to the corresponding edge weight D(v_i, v_j). For more details on the construction of this graph, see Appendix B.
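As a rough illustration of this connectivity rule, the sketch below assembles such a graph with networkx; the distance predictor, thresholds, and GPS radius are illustrative assumptions (the paper's actual criteria are in Appendix B).

import math
import networkx as nx

def haversine_m(p, q):
    """Approximate great-circle distance in meters between (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = math.sin((lat2 - lat1) / 2) ** 2 \
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 6371000 * 2 * math.asin(math.sqrt(a))

def build_graph(observations, vnm_distance, max_timesteps=20.0, gps_radius_m=15.0):
    """observations: list of dicts with keys 'image', 'gps', 'time_idx'.
    `vnm_distance` stands in for ViNG's (unspecified) distance head."""
    G = nx.DiGraph()
    for i, obs in enumerate(observations):
        G.add_node(i, image=obs["image"], gps=obs["gps"])
    for i, a in enumerate(observations):
        for j, b in enumerate(observations):
            if i == j:
                continue
            d = vnm_distance(a["image"], b["image"])  # predicted timesteps
            adjacent = abs(a["time_idx"] - b["time_idx"]) == 1       # temporal proximity
            nearby = haversine_m(a["gps"], b["gps"]) < gps_radius_m  # spatial proximity
            # Connect nodes the VNM believes are reachable, or that were
            # adjacent during collection / nearby according to GPS.
            if d < max_timesteps or adjacent or nearby:
                G.add_edge(i, j, weight=d)  # edge weight D(v_i, v_j)
    return G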
Figure 3: System overview: (a) VNM uses a goal-conditioned distance function to infer connectivity between the set of raw observations and constructs a topological graph. (b) LLM translates natural language instructions into a sequence of textual landmarks. (c) VLM infers a joint probability distribution over the landmark descriptions and nodes in the graph, which is used by (d) a graph search algorithm to derive the optimal walk through the graph. (e) The robot drives following the walk in the real world using the VNM policy.
4 LM-Nav: Instruction Following with Pre-Trained Models

LM-Nav combines the components discussed earlier to follow textual instructions in the real world. The LLM parses free-form instructions into a list of landmarks (Sec. 4.2), the VLM associates these landmarks with nodes in the graph by estimating the probability P(l_i | v_j) that each node corresponds to each landmark (Sec. 4.3), and the VNM is then used to infer how effectively the robot can navigate between each pair of nodes in the graph, which we convert into a probability P(v_i, v_j) derived from the estimated temporal distances. To find the optimal "walk" on the graph that both (i) adheres to the provided instructions and (ii) minimizes traversal cost, we derive a probabilistic objective (Sec. 4.1) and show how it can be optimized using a graph search algorithm (Sec. 4.4). This optimal walk is then executed in the real world by using the actions produced by the VNM model.
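The overall pipeline therefore has a simple shape. The outline below is a hedged sketch with hypothetical helper names, one per component; the helpers are themselves sketched under the corresponding subsections, while `execute_walk` (the VNM policy loop) is assumed and not shown.

def lm_nav(instruction, graph, start_node):
    landmarks = extract_landmarks(instruction)                     # LLM (Sec. 4.2)
    log_probs = ground_landmarks(landmarks, graph)                 # VLM (Sec. 4.3): P(l_i | v_j)
    destination, _ = graph_search(graph, log_probs, start_node)    # search (Sec. 4.4)
    execute_walk(destination)                                      # VNM policy (assumed)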
4.1 Problem Formulation
Algorithm 1: Graph Search

1: Input: Landmarks (l_1, l_2, ..., l_n).
2: Input: Graph G(V, E).
3: Input: Starting node S.
4: ?i = 0, ..., n, ?v ∈ V: Q[i, v] = ?∞
5: Q[0, S] = 0
6: Dijkstra_algorithm(G, Q[0, *])
7: for i in 1, 2, ..., n do
8:   ?v ∈ V: Q[i, v] = Q[i?1, v] + CLIP(l_i, v)
9:   Dijkstra_algorithm(G, Q[i, *])
10: end for
11: destination = argmax(Q[n, *])
12: return backtrack(destination, Q[n, *])
We formulate the task of instruction following on the graph as that of maximizing the probability of successfully executing a walk that matches the instruction. As we will discuss in Section 4.2, we first parse the instruction into a list of landmarks ? = (l_1, l_2, ..., l_n) that should be visited in order. Recall that the VNM is used to build a topological graph that represents the connectivity of the environment from previously seen observations, with nodes {v_i} corresponding to previously seen images. For a walk w = (v_1, v_2, ..., v_T), we factorize the probability that it corresponds to the given instruction into: (i) P_l, the probability that the walk visits all landmarks from the description; (ii) P_t, the probability that the walk can be executed successfully. Let P(l_i | v_j) denote the probability that node v_j corresponds to the landmark description l_i. Then we have:

P_l(w \mid \ell) = \max_{1 \le t_1 \le t_2 \le \dots \le t_n \le T} \prod_{1 \le k \le n} P(l_k \mid v_{t_k}),    (1)

where t_1, t_2, ..., t_n is an assignment of a subsequence of the walk's nodes to the landmark descriptions.
To obtain the probability P_t(w), we must convert the distance estimates provided by the VNM model into probabilities. This has been studied in the literature on goal-conditioned policies [62, 63]. A simple model based on a discounted MDP formulation is to model the probability of successfully reaching the goal as γ raised to the power of the number of timesteps, which corresponds to a probability of termination of 1 ? γ at each timestep. We then have

P_t(w) = \prod_{1 \le j < T} P(v_j, v_{j+1}) = \prod_{1 \le j < T} \gamma^{D(v_j, v_{j+1})},    (2)

where D(v_j, v_{j+1}) refers to the length (in the number of timesteps) of the edge between nodes v_j and v_{j+1}, which is provided by the VNM model. The final probabilistic objective that our system needs to maximize becomes:

P_M(w) = P_t(w) \, P_l(w \mid \ell) = \prod_{1 \le j < T} \gamma^{D(v_j, v_{j+1})} \max_{1 \le t_1 \le \dots \le t_n \le T} \prod_{1 \le k \le n} P(l_k \mid v_{t_k}).    (3)
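As a quick numeric illustration of Eqn. 2 (the discount factor and edge lengths here are made-up values, not the paper's):

import math

gamma = 0.95                # made-up discount; the paper does not state its value here
edge_lengths = [4, 7, 3]    # D(v_j, v_{j+1}) in timesteps, as predicted by the VNM
P_t = math.prod(gamma ** d for d in edge_lengths)  # = gamma ** 14 ≈ 0.488
# In log space this product becomes -alpha * total_distance with alpha = -log(gamma),
# which is exactly the form the graph search in Sec. 4.4 optimizes (Eqn. 4).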
4.2 Parsing Free-Form Textual Instructions

The user specifies the route they want the robot to take using natural language, while the objective above is defined in terms of a sequence of desired landmarks. To extract this sequence from the user's natural language instruction, we employ a standard large language model, which in our prototype is GPT-3 [12]. We used a prompt with 3 examples of correct landmark extractions, followed by the description to be translated by the LLM. Such an approach worked for the instructions that we tested it on. Examples of instructions, together with landmarks extracted by the model, can be found in Fig. 4. The appropriate selection of the prompt, including those 3 examples, was required for more nuanced cases. For details of the "prompt engineering" please see Appendix A.
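A sketch of how such a query might look, using the legacy OpenAI completions API through which GPT-3 was served; the in-context example and model name below are placeholders, and the paper's actual prompt with its 3 examples is given in Appendix A.

import openai  # legacy completions-style client, contemporary with GPT-3

PROMPT = """Extract the landmarks from the instruction, in the order they should be visited.

Instruction: Go to the picnic table, then take a left at the fire hydrant.
Landmarks: 1. a picnic table, 2. a fire hydrant

Instruction: {instruction}
Landmarks:"""

def extract_landmarks(instruction):
    response = openai.Completion.create(
        model="text-davinci-002",   # placeholder GPT-3 engine name
        prompt=PROMPT.format(instruction=instruction),
        temperature=0.0,            # deterministic parsing
        max_tokens=128,
    )
    return response["choices"][0]["text"].strip()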
4.3 Visually Grounding Landmark Descriptions

As discussed in Sec. 4.1, a crucial element of selecting the walk through the graph is computing P(l_i | v_j), the probability that landmark description l_i refers to node v_j (see Equation 1). With each node containing an image taken during initial data collection, this probability can be computed using CLIP [13] in the way described in Sec. 3, as a retrieval task. As presented in Fig. 2, to employ CLIP to compute P(l_i | v_j), we use the image at node v_j and caption prompts of the form "This is a photo of a [l_i]". The resulting probability P(l_i | v_j), together with the inferred edge distances, will be used to select the optimal walk in the graph.
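Concretely, grounding can reuse the CLIP helper sketched in Sec. 3 to fill a landmarks × nodes matrix of these probabilities; the prompt template is the one quoted above, while the function names are carried over from the earlier sketches rather than taken from the paper's code.

import torch

def ground_landmarks(landmarks, node_image_paths):
    """Return a (num_landmarks, num_nodes) tensor of P(l_i | v_j)."""
    rows = []
    for landmark in landmarks:
        caption = f"This is a photo of a {landmark}"  # template from the text above
        rows.append(image_text_probs(node_image_paths, caption))  # softmax over nodes
    return torch.stack(rows)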
4.4 Graph Search for the Optimal Walk

As described in Sec. 4.1, LM-Nav aims at finding a walk w = (v_1, v_2, ..., v_T) that maximizes the probability of successful execution while adhering to the given instructions. We formalized this as the probability P_M defined by Eqn. 3. We can define a function R(w, τ) for a monotonically increasing sequence of indices τ = (t_1, t_2, ..., t_n):

R(w, \tau) := \sum_{i=1}^{n} \log P(l_i \mid v_{t_i}) - \alpha \sum_{j=1}^{T-1} D(v_j, v_{j+1}), \quad \text{where } \alpha = -\log\gamma,    (4)

which has the property that w maximizes P_M if and only if there exists τ such that (w, τ) maximizes R. In order to find such (w, τ), we employ dynamic programming. In particular, we define a helper function Q(i, v) for i ∈ {0, 1, ..., n}, v ∈ V:

Q(i, v) = \max_{w = (v_1, v_2, \ldots, v_j),\, v_j = v;\ \tau = (t_1, t_2, \ldots, t_i)} R(w, \tau).    (5)

Q(i, v) represents the maximal value of R for a walk ending in v that has visited the landmarks up to index i. The base case Q(0, v) visits none of the landmarks, and its value of R is simply equal to minus the (α-weighted) length of the shortest path from node S. For i > 0 we have:

Q(i, v) = \max\Big( Q(i-1, v) + \log P(l_i \mid v),\; \max_{w \in \mathrm{nbrs}(v)} Q(i, w) - \alpha \cdot D(v, w) \Big).    (6)
Figure 4: Qualitative examples of LM-Nav in real-world environments executing textual instructions (left). The landmarks extracted by the LLM (highlighted in text) are grounded into visual observations by the VLM (center; overhead image not available to the robot). The resulting walk of the graph is executed by the VNM (right).
The base case for DP is to compute Q(0, *). Then, in each step of the DP, i = 1, 2, ..., n, we compute Q(i, v). This computation resembles Dijkstra's algorithm [64]. In each iteration, we pick the node v with the largest value of Q(i, v) and update its neighbors based on Eqn. 6. Algorithm 1 summarizes this search process. The result of this algorithm is a walk w = (v_1, v_2, ..., v_T) that maximizes the probability of successfully carrying out the instruction. Given such a walk, the VNM can execute the path by using its action estimates to sequentially navigate to these nodes.
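Below is a runnable sketch of this search under the same assumptions as the earlier graph sketch (a networkx DiGraph whose edge attribute 'weight' holds D, with integer node ids, and log_probs[i][v] = log P(l_i | v) from the VLM); the predecessor bookkeeping needed to recover the full walk is noted but omitted for brevity.

import heapq
import math

def _dijkstra_relax(graph, Q, alpha):
    """Propagate scores along edges, maximizing Q[v] - alpha * D. A max-version
    of Dijkstra, valid because alpha * D is a nonnegative edge cost."""
    best = dict(Q)
    heap = [(-q, v) for v, q in best.items() if q > -math.inf]
    heapq.heapify(heap)
    while heap:
        neg_q, v = heapq.heappop(heap)
        if -neg_q < best[v]:
            continue  # stale heap entry
        for w in graph.successors(v):
            cand = best[v] - alpha * graph[v][w]["weight"]
            if cand > best[w]:
                best[w] = cand
                heapq.heappush(heap, (-cand, w))
    return best

def graph_search(graph, log_probs, start, alpha=0.05):
    """alpha = -log(gamma); 0.05 is a placeholder value. Returns the best final
    node and its score Q[n, v]; recovering the walk itself additionally requires
    recording predecessors in each relaxation (Algorithm 1, line 12)."""
    Q = {v: -math.inf for v in graph.nodes}
    Q[start] = 0.0
    Q = _dijkstra_relax(graph, Q, alpha)            # base case: Q[0, *]
    for i in range(len(log_probs)):                 # Algorithm 1, lines 7-10
        Q = {v: Q[v] + log_probs[i][v] for v in graph.nodes}  # visit landmark i at v
        Q = _dijkstra_relax(graph, Q, alpha)        # ... or route through a neighbor
    destination = max(Q, key=Q.get)                 # Algorithm 1, line 11
    return destination, Q[destination]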
5 System Evaluation

We now describe our experiments deploying LM-Nav in a variety of outdoor settings to follow high-level natural language instructions with a small ground robot. For all experiments, the weights of the LLM, VLM, and VNM are frozen—there is no fine-tuning or annotation in the target environment. We evaluate the complete system, as well as the individual components of LM-Nav, to understand its strengths and limitations. Our experiments demonstrate the ability of LM-Nav to follow high-level instructions, disambiguate paths, and reach goals that are up to 800 m away.
5.1 Mobile Robot Platform

We implement LM-Nav on a Clearpath Jackal UGV platform (see Fig. 1 (right)). The sensor suite consists of a 6-DoF IMU, a GPS unit for approximate localization, wheel encoders for local odometry, and front- and rear-facing RGB cameras with a 170° field of view for capturing visual observations and localization in the topological graph. The LLM and VLM queries are pre-computed on a remote workstation and the computed path is commanded to the robot wirelessly. The VNM runs on-board and only uses forward RGB images and unfiltered GPS measurements.
5.2 Following Instructions with LM-Nav

In each evaluation environment, we first construct the graph by manually driving the robot and collecting image and GPS observations. The graph is constructed automatically using the VNM from this data, and in principle such data could also be obtained from past traversals, or even with autonomous exploration methods [45]. Once the graph is constructed, the robot can carry out instructions in that environment. We tested our system on 20 queries, in environments of varying difficulty, corresponding to a total combined length of over 6 km. Instructions include a set of prominent landmarks in the environment that can be identified from the robot's observations, e.g. traffic cones, buildings, stop signs, etc.
Fig. 4 shows qualitative examples of the path taken by the robot. Note that the overhead image and spatial localization of the landmarks is not available to the robot and is shown for visualization only. In Fig. 4(a), LM-Nav is able to successfully localize the simple landmarks from its prior traversal and find a short path to the goal. While there are multiple stop signs in the environment, the objective in Eqn. 3 causes the robot to pick the correct stop sign in context, so as to minimize overall travel distance. Fig. 4(b) highlights LM-Nav's ability to parse complex instructions with multiple landmarks specifying the route—despite the possibility of a shorter route directly to the final landmark that ignores instructions, the robot finds a path that visits all of the landmarks in the correct order.

[Figure text: "Go straight toward the white building. Continue straight passing by a white truck until you reach a stop sign."]
Disambiguation with instructions. Since the objective of LM-Nav is to follow instructions, and not merely to reach the final goal, different instructions may lead to different traversals. Fig. 5 shows an example where modifying the instruction can disambiguate multiple paths to the goal. Given the shorter prompt (blue), LM-Nav prefers the more direct path. On specifying a more fine-grained route (magenta), LM-Nav takes an alternate path that passes a different set of landmarks.
Missing landmarks. While LM-Nav is effective at parsing landmarks from instructions, localizing them on the graph, and finding a path to the goal, it relies on the assumption that the landmarks (i) exist in the environment, and (ii) can be identified by the VLM. Fig. 4(c) illustrates a case where the executed path fails to