Robust Fine-tuning of Zero-shot Models via Variance Reduction

Beier Zhu, Jiequan Cui, Hanwang Zhang
Nanyang Technological University
arXiv:2411.06966v1 [cs.CV] 11 Nov 2024
beier002@e.ntu.edu.sg, hanwangzhang@.sg
Abstract

When fine-tuning zero-shot models like CLIP, our desideratum is for the fine-tuned model to excel in both in-distribution (ID) and out-of-distribution (OOD). Recently, ensemble-based models (ESMs) have been shown to offer significant robustness improvement, while preserving high ID accuracy. However, our study finds that ESMs do not solve the ID-OOD trade-offs: they achieve peak performance for ID and OOD accuracy at different mixing coefficients. When optimized for OOD accuracy, the ensemble model exhibits a noticeable decline in ID accuracy, and vice versa. In contrast, we propose a sample-wise ensembling technique that can simultaneously attain the best ID and OOD accuracy without the trade-offs. Specifically, we construct a Zero-Shot Failure (ZSF) set containing training samples incorrectly predicted by the zero-shot model. For each test sample, we calculate its distance to the ZSF set and assign a higher weight to the fine-tuned model in the ensemble if the distance is small. We term our method Variance Reduction Fine-tuning (VRF), as it effectively reduces the variance in ensemble predictions, thereby decreasing the residual error. On ImageNet and five derived distribution shifts, our VRF further improves the OOD accuracy by 1.5-2.0 pp over the ensemble baselines while maintaining or increasing ID accuracy. VRF achieves similarly large robustness gains (0.9-3.1 pp) on other distribution shift benchmarks. Codes are available in /BeierZhu/VRF.
1 Introduction
To ensure the reliability of machine learning systems, it is essential to develop models that can generalize to unseen, out-of-distribution environments. Large pre-trained models such as CLIP [20] and ALIGN [10] have recently shown remarkable robustness against challenging distribution shifts. However, it is widely acknowledged that these improvements in robustness are most pronounced in the zero-shot setting, while conventional fine-tuning of these models often compromises robustness when compared to zero-shot performance [28, 15, 14]. This phenomenon is known as the ID-OOD trade-offs, i.e., improving performance on in-distribution (ID) data can sometimes lead to decreased performance on out-of-distribution (OOD) data [12, 25].
In recent years, ensemble-based models (ESMs) have demonstrated significant success in addressing the ID-OOD dilemma [17, 28, 14, 31]. Specifically, denote the input as x, the zero-shot model as P(y|x; θ_zs) and the fine-tuned model as P(y|x; θ_ft). Existing ESMs typically employ the output-space ensemble (OSE) [14, 31], which outputs P(y|x; θ_ose) = α P(y|x; θ_ft) + (1 − α) P(y|x; θ_zs), and the weight-space ensemble (WSE) [28, 17], which outputs P(y|x; θ_wse) = P(y|x; α θ_ft + (1 − α) θ_zs), where α ∈ [0, 1]. Compared to fine-tuned models, ESMs offer significant accuracy enhancements under distribution shift, while maintaining high ID accuracy.
However, ESMs cannot fully address the ID-OOD trade-offs. In Figure 1(a), by varying the mixing coefficient α, we plot the ID-OOD frontier curves (pink line) for the CLIP ViT-B/16 model on
38th Conference on Neural Information Processing Systems (NeurIPS 2024).
Figure 1: (a) ID-OOD frontier curves for the CLIP ViT-B/16 model on the ID (ImageNet) and OOD (IN-{V2,R,A,Sketch} and ObjectNet) datasets by varying the mixing coefficient α. The ensemble model achieves its best ID and OOD performance at different α values. Our method VRF simultaneously attains the best ID and OOD accuracy, outperforming the ensemble by 3.6% on OOD and 1.6% on ID at its optimal performance points. (b) Relationship between the ratio of fine-tuned to zero-shot accuracy (Acc_ft/Acc_zs) and the distance d(x) to the ZSF set: the ratio decreases as d(x) increases.
ImageNet [3] (ID) and five derived distribution-shifted datasets (OOD): ImageNet-V2 [21], ImageNet-R [7], ImageNet-A [9], ImageNet-Sketch [27] and ObjectNet [1]. We find that the ensemble model achieves its optimal ID and OOD performance at different α values: the best ID accuracy is achieved at α = 0.5 and the best OOD accuracy is obtained at α = 0.3. When the ensemble model reaches its optimal value for OOD, the performance on ID decreases by 3.6% relative to its peak. Similarly, when the ensemble model is optimized for ID, the performance on OOD decreases by 1.6% relative to its best value – the ID-OOD trade-offs still persist for ESMs. This raises a natural question:
Can ensemble-based models simultaneously attain the best ID and OOD accuracy?
In this paper, we affirmatively answer this question by proposing a sample-wise ensembling technique, dubbed variance reduction fine-tuning (VRF). This method is motivated by an empirical finding illustrated in Fig. 1(b). For each sample in the training dataset, if the fine-tuned model correctly predicts the label while the zero-shot model fails, we collect its feature representation in the fine-tuned model into the zero-shot failure (ZSF) set. We then measure the distance d(x) of each test sample x to the ZSF set. Based on this distance, test samples are grouped into bins, and we compute the accuracy ratio Acc_ft/Acc_zs within each bin (details in Section C.7), observing that the ratio decreases as d(x) increases. Intuitively, the closer a sample is to the ZSF set, the more likely it is that the zero-shot model makes incorrect predictions, whereas the fine-tuned model is more likely to be accurate, leading to a higher weight for the fine-tuned model, and vice versa.
As depicted by the orange diamond in Fig. 1(a), by leveraging the sample-wise weights, our VRF simultaneously attains the best ID and OOD accuracy. In Section 5, we show that on a variety of different models and tasks, our VRF approach consistently outperforms the existing fine-tuning and ensembling methods, including linear probing, end-to-end fine-tuning, LP-FT [15], OSE and WSE [28]. In specific, on ImageNet and five derived distribution shifts, our VRF further improves the OOD accuracy by 1.5-2.0 pp over the ensemble baselines while maintaining or increasing ID accuracy. Furthermore, in Section 4, we justify our approach by demonstrating that it effectively minimizes the variance of the ensemble models, resulting in reduced residual error.
2 Related Work

Mitigating ID-OOD trade-offs. Improving performance on in-distribution data can sometimes lead to a decrease in performance on out-of-distribution data, and vice versa. This phenomenon is known as the ID-OOD trade-offs. Xie et al. [29] leverage auxiliary information as outputs of auxiliary tasks to pre-train a model to reduce OOD error. Khani and Liang [12] show that self-training on large amounts of unlabeled data can mitigate such trade-offs by removing spurious features. Tripuraneni et al. [25] tackle this problem by learning representations that are robust across diverse tasks. However, these methods usually necessitate additional unlabeled data or auxiliary information. In contrast, our VRF is a straightforward variation of fine-tuning that does not require any extra data.
Robust fine-tuning of zero-shot models. Vision-language models like CLIP [20] have demonstrated outstanding improvements in robustness. It is commonly acknowledged that conventional fine-tuning methods often compromise robustness when compared to zero-shot performance. Therefore, enhancing downstream robustness has been the focus of subsequent works [15, 28, 5, 19, 6, 30]. Kumar et al. [15] show that a two-stage process of linear probing followed by full fine-tuning can alleviate feature distortion, leading to stronger OOD performance without sacrificing ID accuracy. Wortsman et al. [28] propose a method of weight interpolation between the zero-shot and the fine-tuned models to improve both ID and OOD accuracy. Goyal et al. [5] demonstrate that mimicking the contrastive pre-training objective to fine-tune the zero-shot models outperforms tuning via the traditional supervised cross-entropy loss. However, the ID-OOD trade-offs are still observed with these methods. In contrast, our method VRF can simultaneously achieve the best ID and OOD accuracy.
3 Methods

3.1 Setup
Task: Consider a classification setting where the goal is to map an instance x ∈ X to a label y ∈ Y = [K]. We are provided with a zero-shot model f(·; θ_zs), a downstream dataset D = {(x_i, y_i)}, and a fine-tuned model f(·; θ_ft) which is trained on D. Below, we outline the implementation of the zero-shot and fine-tuned models:
• Zero-shot models (ZS): We investigate CLIP models [20] as our zero-shot models. CLIP models are pre-trained using image-text pairs {(x1, t1), ..., (xB, tB)} from the Internet. The objective of the CLIP models is to train a visual encoder Φv and a text encoder Φt such that the cosine similarity ⟨Φv(xi), Φt(ti)⟩ is maximized relative to unmatched pairs. CLIP models perform zero-shot inference for K classes by matching x with potential class names {c1, ..., cK}. Concretely, by extending the class name {ck} to a prompt tk = “a photo of a {ck}”, the zero-shot model outputs the score (logit) for class k as f(x; θ_zs)_k = ⟨Φv(x), Φt(tk)⟩. The predicted probabilities can be calculated using the softmax function, i.e., P(y|x; θ_zs) = softmax(f(x; θ_zs))_y. The model outputs the label as pred(f(x; θ_zs)) = argmax_i f(x; θ_zs)_i.
• Linear classifiers (LC): We learn a linear classifier on top of the visual embedding Φv(x) while freezing the visual encoder Φv. The parameters of the linear classifier are optimized to minimize the cross-entropy loss on D.
• End-to-end fine-tuning (E2E-FT): We update both the linear classifier and the visual encoder by minimizing the cross-entropy loss on D.
• Linear probing then full fine-tuning (LP-FT) [15]: We employ a two-phase fine-tuning approach: initially training a linear classifier, followed by full fine-tuning starting from the solution derived from training the linear classifier.
• Output-space ensemble (OSE): We perform linear interpolation of the outputs between a zero-shot model and a fine-tuned model (e.g., E2E-FT or LP-FT):

P(y|x; θ_ose) = α P(y|x; θ_ft) + (1 − α) P(y|x; θ_zs), where α ∈ [0, 1].   (1)
• Weight-space ensemble (WSE) [28]: We combine the weights through linear interpolation between a zero-shot model and a fine-tuned model:

P(y|x; θ_wse) = P(y|x; α θ_ft + (1 − α) θ_zs), where α ∈ [0, 1].   (2)
Algorithm 1 Variance Reduction Fine-tuning
1: Given: Training dataset D, a zero-shot model f_zs and a fine-tuned model f_ft.
2: Build the zero-shot failure set V using Eq. (3). ▷ Step 1: Identification
3: Inference Stage:
4: Given a test sample x, compute its feature representation v, zero-shot prediction P_zs(y|x) and fine-tuned model prediction P_ft(y|x).
5: Compute the k-NN distance to V as d(x) using Eq. (4). ▷ Step 2: Distance Calculation
6: Compute the weight ω(x) using Eq. (6).
7: Return P_vrf(y|x) using Eq. (5). ▷ Step 3: Sample-Wise Ensembling
3.2 Variance Reduction Fine-tuning

We now present our proposed method, VRF, which consists of three steps. First, before the inference stage, we collect the Zero-Shot Failure (ZSF) set. Second, for a given test sample, we calculate its distance to the ZSF set. Third, we assign weights to combine predictions from the zero-shot and fine-tuned models based on this distance.
Step 1 (Identification). For each x_i in the training dataset D, if the fine-tuned model correctly predicts the label while the zero-shot model fails, we collect its feature representation v_i = Φv(x_i; θ_ft) from the fine-tuned model to form the zero-shot failure set V. Specifically, V is defined as:

V = {v_i s.t. y_i = pred(f_ft(x_i)) and y_i ≠ pred(f_zs(x_i))}.   (3)

Here, f_zs(·) and f_ft(·) are used to denote f(·; θ_zs) and f(·; θ_ft), respectively, for simplicity.
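Eq. (3) amounts to a boolean mask over the training set. A minimal sketch (the four toy samples and predictions below are fabricated to show which rows enter V):

```python
import numpy as np

def build_zsf_set(features, labels, pred_ft, pred_zs):
    """Eq. (3): keep the fine-tuned features v_i = Phi_v(x_i; theta_ft) of
    training samples that the fine-tuned model predicts correctly and
    the zero-shot model predicts incorrectly."""
    mask = (pred_ft == labels) & (pred_zs != labels)
    return features[mask]

# Toy data: 4 samples with 2-dim fine-tuned features.
feats   = np.array([[1., 0.], [0., 1.], [1., 1.], [0., 0.]])
labels  = np.array([0, 1, 0, 1])
pred_ft = np.array([0, 1, 0, 0])  # fine-tuned model correct on samples 0, 1, 2
pred_zs = np.array([1, 1, 1, 1])  # zero-shot model wrong on samples 0, 2, 3
V = build_zsf_set(feats, labels, pred_ft, pred_zs)
# Only samples 0 and 2 satisfy both conditions, so |V| = 2.
```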
Step 2 (Distance Calculation). The key empirical observation underpinning VRF is that in the vicinity of the ZSF set, a test sample typically exhibits lower zero-shot accuracy (Acc_zs) and higher fine-tuned accuracy (Acc_ft); the accuracy ratio Acc_ft/Acc_zs decreases as the distance from the sample to the ZSF set increases. In this paper, we adopt non-parametric density estimation using nearest neighbors [24] to measure the distance of a test sample to the set V. Specifically, during inference, we derive the feature representation v of a test sample x, and compute the ℓ2 distances ∥v − v_i∥_2 w.r.t. v_i ∈ V. We reorder V according to increasing ℓ2 distance and denote the ordered sequence as V′ = (v_(1), v_(2), ..., v_(|V|)). The distance of x to V is defined as the ℓ2 distance to the k-th nearest neighbor (k-NN), i.e.,

d(x; V, k) = ∥v − v_(k)∥_2.   (4)

If there is no ambiguity, we use d(x) to denote d(x; V, k) for readability. Since the features in CLIP models are ℓ2-normalized, d(x) is bounded between [0, 2].
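A brute-force NumPy version of Eq. (4), for clarity (the paper uses Faiss for efficient k-NN search; the toy set V below is illustrative):

```python
import numpy as np

def knn_distance(v, V, k=1):
    """Eq. (4): l2 distance from the query feature v to its k-th
    nearest neighbor among the rows of the ZSF set V."""
    dists = np.linalg.norm(V - v, axis=1)  # l2 distance to every v_i
    return np.sort(dists)[k - 1]           # k-th smallest (1-indexed)

# Three l2-normalized ZSF features; for normalized vectors the distance
# is bounded by [0, 2], matching the remark in the text.
V = np.array([[1., 0.], [0., 1.], [-1., 0.]])
v = np.array([1., 0.])
d1 = knn_distance(v, V, k=1)  # 0.0: v coincides with a ZSF feature
d3 = knn_distance(v, V, k=3)  # 2.0: antipodal point, the maximum distance
```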
Step 3 (Sample-Wise Ensembling). We implement sample-wise output-space ensembling in the form:

P_vrf(y|x) = ω(x) · P_ft(y|x) + (1 − ω(x)) · P_zs(y|x),   (5)

where ω(x) ∈ (0, 1). We use the distance to the ZSF set d(x) to determine the weight ω. As shown by the blue line in Fig. 2, a smaller value of d(x) corresponds to a larger Acc_ft/Acc_zs ratio, and vice versa. Therefore, we set the weight ω to be inversely proportional to d(x). Given that ω is bounded between 0 and 1, we employ a sigmoid function σ(·) as:

ω(x) = σ(−(d(x) − a)/b),   (6)

where a, b > 0 are two hyper-parameters swept using accuracy on the ID validation set. We visualize the weight curve in green in Fig. 2, setting a = 1.5 and b = 0.6. We summarize the whole process in Algorithm 1.

Figure 2: Relationship between the accuracy ratio Acc_ft/Acc_zs, the distance to the ZSF set d(x), and the weight ω(x) with a = 1.5, b = 0.6.
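Eqs. (5) and (6) combine into a few lines; the probability vectors and distances below are made-up values chosen to show the weight shifting between the two models:

```python
import numpy as np

def vrf_weight(d, a=1.5, b=0.6):
    """Eq. (6): omega(x) = sigmoid(-(d(x) - a) / b). A small distance
    to the ZSF set pushes the weight toward 1 (trust the FT model)."""
    return 1.0 / (1.0 + np.exp((d - a) / b))

def vrf_predict(p_ft, p_zs, d, a=1.5, b=0.6):
    """Eq. (5): sample-wise output-space ensemble with weight omega(x)."""
    w = vrf_weight(d, a, b)
    return w * p_ft + (1 - w) * p_zs

p_ft = np.array([0.9, 0.1])  # fine-tuned model's predicted probabilities
p_zs = np.array([0.2, 0.8])  # zero-shot model's predicted probabilities
near = vrf_predict(p_ft, p_zs, d=0.3)  # close to ZSF set: follows FT model
far  = vrf_predict(p_ft, p_zs, d=1.9)  # far from ZSF set: follows ZS model
```

At d = a the weight is exactly 0.5, and b controls how sharply the weight transitions around that point.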
4 Justification

We now prove that our VRF can effectively reduce the variance of the combined model, resulting in lower errors compared to ensembling using a constant mixing coefficient.
4.1 Background
The outputs of a well-trained classifier are expected to approximate the a posteriori class distribution. Apart from the irreducible error (Bayes error), the residual error of a classifier can be broken down into bias and variance components. In specific, for a test sample x, the probability output of a classifier parameterized by θ can be expressed as:

P(y|x; θ) = P(y|x) + β_y + η_y(x),   (7)

where P(y|x) denotes the true a posteriori distribution, β_y is the label bias of P(y|x; θ), which is independent of the input x, and η_y(x) is the residual error related to the given input x. In this study, we primarily attribute the residual error to the variance term (i.e., β_y = 0), as the label bias problem in foundation models has been effectively addressed by Zhu et al. [31]. Tumer et al. [26] have proven that the expected residual error E is given by:

E = V[η_y(x)] / s,   (8)

where s is a constant factor related to the derivative of the true a posteriori distribution and is independent of the trained model, and V[η_y(x)] is the variance.
4.2 Variance Reduction Fine-tuning Leads to Lower Residual Error

Let us shift our focus to the effects of combining the zero-shot and fine-tuned models. Let g_zs(·) and g_ft(·) be two functions that produce weights for ensembling the models. Subject to the constraint that g_zs(x) + g_ft(x) = 1, the residual error of the combined classifier is obtained by:

P_vrf(y|x) = g_zs(x) P_zs(y|x) + g_ft(x) P_ft(y|x) = P(y|x) + g_zs(x) · η_zs(x) + g_ft(x) · η_ft(x),   (9)

where the last two terms form the combined residual η_vrf(x), and we omit the subscript y of η for readability. The variance of η_vrf(x) can be expressed as:

V[η_vrf(x)] = g_zs(x)² · V[η_zs(x)] + g_ft(x)² · V[η_ft(x)].   (10)
Here, we assume the residual errors are independent, following the assumption of previous studies of CLIP fine-tuning [14, 31]. We further explore the case of correlated residual errors in Section B.
According to Eq. (8), the reduction in variance can be readily translated into a reduction in error rates. To obtain the smallest variance V[η_vrf(x)], we minimize Eq. (10) using a Lagrange multiplier to enforce the constraint that g_zs(x) + g_ft(x) = 1, and obtain the optimal weight function g_ft as:

g_ft(x) = V[η_zs(x)] / (V[η_zs(x)] + V[η_ft(x)]) = (1 + V[η_ft(x)]/V[η_zs(x)])^(−1).   (11)

Since the optimal g_ft(x) grows as d(x) shrinks (a smaller distance d(x) corresponds to a larger Acc_ft/Acc_zs ratio, as shown in Fig. 2), we design the weighting function g_ft(x) = ω(x) ∝ d(x)^(−1) as in Eq. (6).
5 Experiments

5.1 Experimental Setup
Datasets with distribution shifts. We provide the results for ImageNet [3] and its five derived distribution shifts: (1) ImageNet-V2 (IN-V2) [21]: test images sampled a decade after the original ImageNet. (2) ImageNet-R (IN-R) [7]: contains renditions (e.g., art, cartoons, graffiti). (3) ImageNet-Sketch (IN-Sketch) [27]: consists of sketches rather than natural photos. (4) ImageNet-A (IN-A) [9]: collects real-world images that are misclassified by ResNet models. (5) ObjectNet [1]: a test set featuring objects with diverse backgrounds, rotations, and imaging viewpoints. We extend our analysis to include a standard distribution shift benchmark [15, 14, 4]: CIFAR-10 → STL-10, where the ID is CIFAR-10 [13] and the OOD is STL-10 [2]. We removed the “monkey” class from STL-10, as it does not exist in CIFAR-10. In addition, we also consider subpopulation shifts, where the ID data contains a few sub-categories, and the OOD data comprises different sub-categories within the
Table 1: Accuracy of various methods on ImageNet and derived distribution shifts for CLIP ViT-B/32.

| Method | IN | IN-V2 | IN-Sketch | IN-A | IN-R | ObjectNet | Avg shifts |
|---|---|---|---|---|---|---|---|
| Zero-shot [20] | 63.3 | 55.9 | 42.3 | 31.5 | 69.3 | 43.5 | 48.5 |
| Linear classifier [20] | 75.4 | 63.4 | 38.8 | 26.1 | 58.7 | 41.5 | 45.7 |
| E2E-FT [28] | 76.2 | 64.2 | 38.7 | 21.0 | 57.1 | 40.1 | 44.2 |
| + Weight-space ensemble [28] | 77.9 | 67.2 | 45.1 | 28.8 | 66.4 | 45.1 | 50.5 |
| + Output-space ensemble | 77.3 | 66.0 | 44.2 | 27.1 | 68.4 | 44.4 | 50.0 |
| + VRF (ours) | 77.6 | 66.7 | 47.0 | 29.2 | 70.9 | 46.3 | 52.0 |
| Δ | +0.3 | +0.7 | +2.8 | +2.1 | +2.5 | +1.9 | +2.0 |
| LP-FT [15] | 76.9 | 64.8 | 39.9 | 25.7 | 69.9 | 42.6 | 48.6 |
| + Weight-space ensemble [28] | 78.0 | 67.0 | 44.8 | 31.2 | 65.8 | 46.1 | 51.0 |
| + Output-space ensemble | 77.8 | 66.3 | 44.0 | 29.5 | 66.2 | 45.5 | 50.3 |
| + VRF (ours) | 77.8 | 66.7 | 46.1 | 31.0 | 70.0 | 46.3 | 51.8 |
| Δ | +0.0 | +0.4 | +2.1 | +1.5 | +3.8 | +0.8 | +1.5 |
Table 2: Accuracy of various methods on ImageNet and derived distribution shifts for CLIP ViT-B/16.

| Method | IN | IN-V2 | IN-Sketch | IN-A | IN-R | ObjectNet | Avg shifts |
|---|---|---|---|---|---|---|---|
| Zero-shot [20] | 68.3 | 61.9 | 48.3 | 50.1 | 77.6 | 54.2 | 58.4 |
| Linear classifier [20] | 79.3 | 69.1 | 44.8 | 44.3 | 66.7 | 51.1 | 55.2 |
| E2E-FT [28] | 81.3 | 70.6 | 45.1 | 36.6 | 65.6 | 50.5 | 53.7 |
| + Weight-space ensemble [28] | 82.5 | 73.1 | 51.6 | 47.6 | 75.1 | 55.7 | 60.6 |
| + Output-space ensemble | 82.2 | 72.0 | 50.6 | 46.8 | 76.7 | 54.9 | 60.2 |
| + VRF (ours) | 82.3 | 72.1 | 52.9 | 48.4 | 78.7 | 56.4 | 61.8 |
| Δ | +0.1 | +0.1 | +2.3 | +1.6 | +2.0 | +1.5 | +1.6 |
| LP-FT [15] | 81.5 | 70.7 | 46.7 | 41.4 | 66.4 | 52.4 | 55.5 |
| + Weight-space ensemble [28] | 82.4 | 73.0 | 51.5 | 50.6 | 74.2 | 56.6 | 61.2 |
| + Output-space ensemble | 82.1 | 72.3 | 50.9 | 50.9 | 74.9 | 55.7 | 60.9 |
| + VRF (ours) | 82.1 | 72.3 | 52.9 | 51.2 | 78.8 | 57.2 | 62.4 |
| Δ | +0.0 | +0.0 | +2.0 | +0.3 | +3.9 | +1.5 | +1.5 |
same parent category. Following [15, 14], we adopt the Entity-30 dataset [23], which aims to categorize images into one of 30 entity categories, such as “vehicle” and “insect”.
Baselines. We adopt two models: CLIP ViT-B/32 and a larger ViT-B/16 from OpenAI [20]. The default model used in ablation studies is the CLIP ViT-B/16. In addition to the zero-shot models, we compare our approach against five standard methods for adapting pre-trained models: (1) linear classifier [20], (2) E2E-FT, (3) LP-FT [15], (4) OSE, and (5) WSE [28]. The descriptions of these methods are included in Section 3.1.
Implementation details. When fine-tuning E2E-FT models, we adhere to Wortsman et al. [28], employing the default PyTorch AdamW optimizer for 10 epochs with weight decay of 0.1 and a cosine-annealing learning rate schedule with 500 warm-up steps. Unless specified, we use a learning rate of 3×10^−5 and gradient clipping at norm 1. When fine-tuning LP-FT, we first adopt the settings of Wortsman et al. [28] to train the linear classifier, then fully fine-tune the models at a learning rate of 1×10^−5. To efficiently perform the k-NN search, we use the Faiss library [11]. Denoting the size of the ZSF set as |V|, we scale k according to a percentage p% of the sample set, where k = floor(p% · |V|). In this paper, p is set to 0.1, a value consistent with the default setting proposed by Sun et al. [24]. Note that all the hyperparameters, e.g., α, a, b, are searched using the accuracy on the in-distribution (ID) validation set. Derived distribution shift datasets are only for evaluation and not for hyperparameter sweeps. See Appendix C.1 for further experimental details.
Table 3: Accuracy of various methods on CIFAR-10 → STL-10 and Entity-30.

(a) CLIP ViT-B/32

| Method | CIFAR→STL ID | CIFAR→STL OOD | Entity-30 ID | Entity-30 OOD |
|---|---|---|---|---|
| Zero-shot [20] | 88.3 | 97.1 | 65.2 | 66.5 |
| Linear classifier | 95.0 | 96.6 | 93.3 | 68.1 |
| E2E-FT [28] | 97.9 | 93.5 | 94.4 | 65.1 |
| + WSE [28] | 98.2 | 95.7 | 94.6 | 68.8 |
| + OSE | 97.9 | 95.9 | 94.4 | 66.4 |
| + VRF (ours) | 97.8 | 97.3 | 94.5 | 69.5 |
| Δ | -0.1 | +1.4 | +0.1 | +3.1 |
| LP-FT [15] | 97.9 | 95.0 | 94.6 | 67.7 |
| + WSE [28] | 98.1 | 96.4 | 94.8 | 68.8 |
| + OSE | 98.1 | 96.4 | 94.7 | 68.5 |
| + VRF (ours) | 98.1 | 97.5 | 94.8 | 70.1 |
| Δ | +0.0 | +1.1 | +0.1 | +1.6 |

(b) CLIP ViT-B/16

| Method | CIFAR→STL ID | CIFAR→STL OOD | Entity-30 ID | Entity-30 OOD |
|---|---|---|---|---|
| Zero-shot [20] | 90.1 | 98.4 | 68.3 | 68.2 |
| Linear classifier | 95.8 | 97.7 | 95.3 | 69.6 |
| E2E-FT [28] | 98.6 | 96.1 | 96.9 | 68.2 |
| + WSE [28] | 98.7 | 97.8 | 97.2 | 71.9 |
| + OSE | 98.6 | 96.6 | 97.0 | 71.5 |
| + VRF (ours) | 98.6 | 98.4 | 97.0 | 72.7 |
| Δ | +0.0 | +1.8 | +0.0 | +1.2 |
| LP-FT [15] | 98.5 | 96.3 | 96.9 | 68.8 |
| + WSE [28] | 98.7 | 97.9 | 97.3 | 72.1 |
| + OSE | 98.6 | 97.7 | 97.2 | 71.8 |
| + VRF (ours) | 98.6 | 98.6 | 97.4 | 72.9 |
| Δ | +0.0 | +0.9 | +0.2 | +1.1 |
Figure 3: ID-OOD frontier curves by varying the mixing coefficient α for the CLIP ViT-B/16. (a) CIFAR-10 (ID) and STL-10 (OOD) results. (b) Entity-30 results. Panels (a.1, b.1) show the frontier curves; panels (a.2, b.2) show the accuracy ratio Acc_ft/Acc_zs versus d(x).
5.2 Results
ImageNet and its five shifted distribution results. In Tables 1 and 2, we report the ID-OOD accuracies of fine-tuning baselines for CLIP ViT-B/32 and CLIP ViT-B/16 models, respectively. For OSE and WSE, we choose the mixing coefficient α with the highest ID validation accuracy. To enhance clarity in the results, we denote the improvement over OSE as Δ in Tables 1 and 2. We observe that our VRF boosts the accuracy of fine-tuned models, including ensembling baseline models, across five ImageNet distribution-shifted datasets, while maintaining or improving the ImageNet in-distribution performance. For instance, in Table 1, when ensembling with the E2E-FT model, our VRF outperforms the OSE model by 2.0% on distribution shifts while increasing the ID accuracy by 0.3%. Compared to WSE models, our VRF achieves a delta of 1.2% on distribution shifts, while maintaining ID performance within 0.2%, as shown in the E2E-FT part of Table 2.
CIFAR-10 → STL-10 and Entity-30 results. We report the accuracy of various methods in Table 3 (a, b). We note that fine-tuning baselines can enhance the accuracy on CIFAR-10 compared to the zero-shot models. However, this improvement comes at the expense of reduced accuracy on STL-10. For instance, E2E-FT leads to a decrease of approximately 3.6% in STL-10 accuracy, as shown in Table 3(a). Previous ensemble methods can mitigate the degradation to some extent, but the STL-10 performance still lags behind the zero-shot performance; e.g., in Table 3(b), the accuracy of E2E-FT + WSE is 97.8% whereas the zero-shot performance is 98.4%. In contrast, our VRF simultaneously improves accuracy on both CIFAR-10 and STL-10. Similarly, for Entity-30, our VRF can further improve the OOD performance when compared to the WSE and OSE methods.

In addition, we plot the ID-OOD frontier curves in Figure 3 (a.1 & b.1). Similar to the results on ImageNet (Figure 1(a)), the ensemble model achieves its best ID and OOD performances at different α values. For instance, on the CIFAR-10 benchmark, when the ensemble model attains its optimal ID value at α = 0.7, the OOD performance decreases by 2.0% relative to its peak.
Table 4: Results of VRF for linear-probed models using CLIP ViT-B/16 models.

| Method | ImageNet ID | ImageNet OOD | CIFAR-10 ID | CIFAR-10 OOD | Entity-30 ID | Entity-30 OOD |
|---|---|---|---|---|---|---|
| Zero-shot classifier [20] | 68.3 | 58.4 | 90.1 | 98.4 | 68.3 | 68.2 |
| Linear classifier | 79.3 | 55.2 | 95.8 | 97.7 | 95.3 | 69.6 |
| WSE/OSE | 79.9 | 57.8 | 95.8 | 97.7 | 95.5 | 70.5 |
| VRF (ours) | 79.8 | 58.5 | 95.8 | 98.4 | 95.4 | 71.4 |
Conversely, when the optimal OOD value is reached at α = 0.3, the performance on ID diminishes by 2.7% from its best. In contrast, our VRF simultaneously attains the best ID and OOD performance.
We also analyze the relation between the accuracy ratio Acc_ft/Acc_zs and the distance d(x) in Figure 3 (a.2 & b.2). Consistent with the findings from ImageNet (Figure 1(b)), we observe that the ratio decreases as d(x) increases, which further supports our design of assigning a higher weight to fine-tuned models when d(x) is small.