FCOS: A Complete and Detailed Walkthrough

Uploaded by: 秋*** · Upload date: 2022-11-29 · Format: DOCX · Pages: 16


Document summary

The FCOS head: the cls branch and the bbox branch are essentially the same as in RetinaNet, except that the anchor count A is gone and the regression targets are different. The overall network structure is still the same as RetinaNet. In terms of computation flow, the difference as I see it is that RetinaNet concatenates the outputs of every RPN head first, whereas FCOS predicts on each level separately and concatenates the per-level results afterwards. This may be because concatenating up front is inconvenient in FCOS, since the network has an extra centerness branch. Below I record my reading of the FCOS test code and then the training code. The backbone and the FPN are skipped; the focus is on the head.

Test code flow

The image first goes through the backbone (ResNet-50 + FPN) to produce features:

    features = self.backbone(images.tensors)

FPN output shapes:

    features[0].shape  # torch.Size([1, 256, 100, 136])
    features[1].shape  # torch.Size([1, 256, 50, 68])
    features[2].shape  # torch.Size([1, 256, 25, 34])
    features[3].shape  # torch.Size([1, 256, 13, 17])
    features[4].shape  # torch.Size([1, 256, 7, 9])

    proposals, proposal_losses = self.rpn(images, features, targets)

The features then pass through the head defined in the FCOSModule() class, which produces the cls, reg and centerness outputs.

box_cls:

    box_cls[0].shape  # torch.Size([1, 80, 100, 136])
    box_cls[1].shape  # torch.Size([1, 80, 50, 68])
    box_cls[2].shape  # torch.Size([1, 80, 25, 34])
    box_cls[3].shape  # torch.Size([1, 80, 13, 17])
    box_cls[4].shape  # torch.Size([1, 80, 7, 9])

box_reg:

    box_regression[0].shape  # torch.Size([1, 4, 100, 136])
    box_regression[1].shape  # torch.Size([1, 4, 50, 68])
    box_regression[2].shape  # torch.Size([1, 4, 25, 34])
    box_regression[3].shape  # torch.Size([1, 4, 13, 17])
    box_regression[4].shape  # torch.Size([1, 4, 7, 9])

centerness:

    centerness[0].shape  # torch.Size([1, 1, 100, 136])
    centerness[1].shape  # torch.Size([1, 1, 50, 68])
    centerness[2].shape  # torch.Size([1, 1, 25, 34])
    centerness[3].shape  # torch.Size([1, 1, 13, 17])
    centerness[4].shape  # torch.Size([1, 1, 7, 9])

Computing locations

locations is derived from the features that come out of the backbone and FPN. The first FPN level is used for the analysis below:

    locations = self.compute_locations(features)

    def compute_locations(self, features):
        locations = []
        for level, feature in enumerate(features):
            h, w = feature.size()[-2:]
            locations_per_level = self.compute_locations_per_level(
                h, w, self.fpn_strides[level], feature.device
            )
            locations.append(locations_per_level)
        return locations

The arguments that matter are the H and W of the feature map and the stride of the level. H and W determine how many receptive-field center points there are, and the stride determines the size of the square cell centered on each of those points. The details are as follows:

    def compute_locations_per_level(self, h, w, stride, device):
        # Top-left pixel coordinate of every cell, along x and along y
        shifts_x = torch.arange(
            0, w * stride, step=stride,
            dtype=torch.float32, device=device
        )
        shifts_y = torch.arange(
            0, h * stride, step=stride,
            dtype=torch.float32, device=device
        )
        shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
        shift_x = shift_x.reshape(-1)
        shift_y = shift_y.reshape(-1)
        # Shift from the top-left corner of each cell to its center
        locations = torch.stack((shift_x, shift_y), dim=1) + stride // 2
        return locations

With stride=8, neighbouring center points are 8 pixels apart horizontally. Note that the values printed below correspond to a 100 x 100 grid (10000 locations, i.e. roughly an 800 x 800 input) rather than the 100 x 136 map listed above, so they appear to come from a different test image.

    shifts_x
    tensor([  0.,   8.,  16.,  24.,  32.,  40.,  48.,  56.,  64.,  72.,  80.,  88.,
            ...,
            768., 776., 784., 792.], device='cuda:0')   # 100 values, abridged

shifts_y prints the same 100 values. The resulting locations coordinates:

    locations
    tensor([[  4.,   4.],
            [ 12.,   4.],
            [ 20.,   4.],
            ...,
            [780., 796.],
            [788., 796.],
            [796., 796.]], device='cuda:0')

The first entry is (4, 4) because it is a center point. Mapping a feature-map cell back to the original image gives the top-left corner of that cell, so the line

    locations = torch.stack((shift_x, shift_y), dim=1) + stride // 2

takes the top-left coordinates and adds stride // 2 to both x and y to obtain the center coordinates.

The second FPN level works the same way. Its stride is 16 rather than 8, so each point covers a 16 x 16 pixel cell and the centers are 8, 8 + 16, 8 + 16 * 2, ...:

    tensor([[  8.,   8.],
            [ 24.,   8.],
            [ 40.,   8.],
            ...,
            [760., 792.],
            [776., 792.],
            [792., 792.]], device='cuda:0')

Third FPN level, stride=32:

    tensor([[ 16.,  16.],
            [ 48.,  16.],
            [ 80.,  16.],
            ...,
            [720., 784.],
            [752., 784.],
            [784., 784.]], device='cuda:0')

Fourth FPN level, stride=64:

    tensor([[ 32.,  32.],
            [ 96.,  32.],
            [160.,  32.],
            [224.,  32.],
            [288.,  32.],
            ...,
            [672., 800.],
            [736., 800.],
            [800., 800.]], device='cuda:0')

Fifth level, stride=128:

    tensor([[ 64.,  64.],
            [192.,  64.],
            [320.,  64.],
            [448.,  64.],
            [576.,  64.],
            [704.,  64.],
            [832.,  64.],
            [ 64., 192.],
            [192., 192.],
            ...,
            [ 64., 832.],
            [192., 832.],
            [320., 832.],
            [448., 832.],
            [576., 832.],
            [704., 832.],
            [832., 832.]], device='cuda:0')

One thing to note: from the first level to the fifth, the area covered by each point grows step by step while the feature map shrinks. Fewer points are therefore mapped back onto the original image, but each one covers a larger area.

What we have so far are the locations, i.e. the coordinates obtained by mapping each of the five FPN feature maps back onto the original image:

    locations = self.compute_locations(features)

Combining the outputs

With the three network outputs and the mapped center points (call such a point A) available, the results are combined:

    self._forward_test(
        locations, box_cls, box_regression,
        centerness, images.image_sizes
    )

The output of box_cls is: [figure missing in the source]

The values l, t, r, b are the distances from each center point A to the four sides of the box. Since (x, y) is the coordinate of A, once l, t, r, b are known the top-left corner of the box is x0 = x - l, y0 = y - t and the bottom-right corner is x1 = x + r, y1 = y + b.

The _forward_test code:

    def _forward_test(self, locations, box_cls, box_regression, centerness, image_sizes):
        boxes = self.box_selector_test(
            locations, box_cls, box_regression,
            centerness, image_sizes
        )
        return boxes, {}

    def forward(self, locations, box_cls, box_regression, centerness, image_sizes):
        """
        Arguments:
            anchors: list[list[BoxList]]
            box_cls: list[tensor]
            box_regression: list[tensor]
            image_sizes: list[(h, w)]
        Returns:
            boxlists (list[BoxList]): the post-processed anchors, after
                applying box decoding and NMS
        """
        sampled_boxes = []
        # One pass per FPN level: l=locations, o=class scores, b=box regression, c=centerness
        for _, (l, o, b, c) in enumerate(zip(locations, box_cls, box_regression, centerness)):
            sampled_boxes.append(
                self.forward_for_single_feature_map(
                    l, o, b, c, image_sizes
                )
            )

        boxlists = list(zip(*sampled_boxes))
        boxlists = [cat_boxlist(boxlist) for boxlist in boxlists]
        if not self.bbox_aug_enabled:
            boxlists = self.select_over_all_levels(boxlists)

        return boxlists

This calls forward_for_single_feature_map:

    def forward_for_single_feature_map(
            self, locations, box_cls,
            box_regression, centerness,
            image_sizes):
        """
        Arguments:
            anchors: list[BoxList]
            box_cls: tensor of size N, A * C, H, W
            box_regression: tensor of size N, A * 4, H, W
        """
        N, C, H, W = box_cls.shape

        # put in the same format as locations
        box_cls = box_cls.view(N, C, H, W).permute(0, 2, 3, 1)
        box_cls = box_cls.reshape(N, -1, C).sigmoid()
        box_regression = box_regression.view(N, 4, H, W).permute(0, 2, 3, 1)
        box_regression = box_regression.reshape(N, -1, 4)
        centerness = centerness.view(N, 1, H, W).permute(0, 2, 3, 1)
        centerness = centerness.reshape(N, -1).sigmoid()

    box_cls.shape         # torch.Size([1, 10000, 80])
    box_regression.shape  # torch.Size([1, 10000, 4])
    centerness.shape      # torch.Size([1, 10000])

After the classification output has gone through the sigmoid, candidate_inds applies a threshold: every score not above self.pre_nms_thresh is set to False.

    candidate_inds = box_cls > self.pre_nms_thresh
    # candidate_inds.shape
    # torch.Size([1, 10000, 80])

The next two lines puzzled me a little:

    pre_nms_top_n = candidate_inds.view(N, -1).sum(1)
    pre_nms_top_n = pre_nms_top_n.clamp(max=self.pre_nms_top_n)

Read literally, they count for each image how many (location, class) scores passed the threshold and then cap that count at self.pre_nms_top_n, which is the number of candidates kept further down.

Then comes the key step: centerness is multiplied element-wise into box_cls. Note that this box_cls has already been passed through the sigmoid.

    # multiply the classification scores with centerness scores
    box_cls = box_cls * centerness[:, :, None]

The intent is to give every point on the feature map a single weight regardless of its class, so all 80 class channels of a point are multiplied by that point's centerness:

    box_cls.shape                 # torch.Size([1, 10000, 80])
    centerness.shape              # torch.Size([1, 10000])
    centerness[:, :, None].shape  # torch.Size([1, 10000, 1])

Before the multiplication:

    box_cls
    tensor([[[0.0050, 0.0027, 0.0050, ..., 0.0025, 0.0026, 0.0037],
             [0.0032, 0.0013, 0.0027, ..., 0.0013, 0.0014, 0.0023],
             [0.0027, 0.0012, 0.0021, ..., 0.0013, 0.0014, 0.0022],
             ...,
             [0.0027, 0.0021, 0.0042, ..., 0.0010, 0.0014, 0.0025],
             [0.0030, 0.0021, 0.0044, ..., 0.0010, 0.0015, 0.0022],
             [0.0060, 0.0038, 0.0070, ..., 0.0022, 0.0027, 0.0035]]],
           device='cuda:0')

After the multiplication:

    box_cls
    tensor([[[0.0015, 0.0008, 0.0015, ..., 0.0008, 0.0008, 0.0011],
             [0.0014, 0.0006, 0.0012, ..., 0.0006, 0.0006, 0.0010],
             [0.0009, 0.0004, 0.0007, ..., 0.0005, 0.0005, 0.0008],
             ...,
             [0.0008, 0.0006, 0.0012, ..., 0.0003, 0.0004, 0.0007],
             [0.0009, 0.0006, 0.0014, ..., 0.0003, 0.0005, 0.0007],
             [0.0016, 0.0010, 0.0019, ..., 0.0006, 0.0007, 0.0009]]],
           device='cuda:0')

As a check, centerness[0][0] is 0.3064, so every entry in the first row of box_cls is multiplied by 0.3064 (for example 0.0050 * 0.3064 ≈ 0.0015):

    centerness[0, 0]
    tensor(0.3064, device='cuda:0')

I did not fully follow the block after this. As far as I can tell, for each image it keeps the scores that passed the threshold, recovers the location index and class of each one, keeps at most pre_nms_top_n of them, and decodes the boxes from the locations and the l, t, r, b values:

    results = []
    for i in range(N):
        per_box_cls = box_cls[i]
        per_candidate_inds = candidate_inds[i]
        per_box_cls = per_box_cls[per_candidate_inds]

        # Column 0: index of the location, column 1: class channel (labels start at 1)
        per_candidate_nonzeros = per_candidate_inds.nonzero()
        per_box_loc = per_candidate_nonzeros[:, 0]
        per_class = per_candidate_nonzeros[:, 1] + 1

        per_box_regression = box_regression[i]
        per_box_regression = per_box_regression[per_box_loc]
        per_locations = locations[per_box_loc]

        per_pre_nms_top_n = pre_nms_top_n[i]

        if per_candidate_inds.sum().item() > per_pre_nms_top_n.item():
            per_box_cls, top_k_indices = \
                per_box_cls.topk(per_pre_nms_top_n, sorted=False)
            per_class = per_class[top_k_indices]
            per_box_regression = per_box_regression[top_k_indices]
            per_locations = per_locations[top_k_indices]

        # (x - l, y - t, x + r, y + b)
        detections = torch.stack([
            per_locations[:, 0] - per_box_regression[:, 0],
            per_locations[:, 1] - per_box_regression[:, 1],
            per_locations[:, 0] + per_box_regression[:, 2],
            per_locations[:, 1] + per_box_regression[:, 3],
        ], dim=1)

Back in the forward loop shown earlier, sampled_boxes holds what each FPN level produced: bbox coordinates, classes and confidences, wrapped in the BoxList class:

    0: BoxList(num_boxes=1000, image_width=1066, image_height=800, mode=xyxy)
    1: BoxList(num_boxes=984, image_width=1066, image_height=800, mode=xyxy)
    2: BoxList(num_boxes=492, image_width=1066, image_height=800, mode=xyxy)
    3: BoxList(num_boxes=138, image_width=1066, image_height=800, mode=xyxy)
    4: BoxList(num_boxes=14, image_width=1066, image_height=800, mode=xyxy)

Printing the output of the first level:

    boxlists[0][0].__dict__
    {'bbox': tensor([[684.2922, 1...='cuda:0'),
     'extra_fields': {'labels': tensor([1, 1, 1, ...='cuda:0'),
                      'scores': tensor([0.2035, 0.20...='cuda:0')},
     'mode': 'xyxy',
     'size': (1066, 800)}

    'bbox': tensor([[684.2922, 199.6761, 731.8024, 234.6115],
                    [596.9469, 203.1248, 666.8824, 258.1436],
                    [602.8936, 203.0078, 665.6298, 258.0040],

The bboxes of all levels are concatenated and passed through NMS:

    result = boxlist_ml_nms(boxlists[i], self.nms_thresh)

    def boxlist_ml_nms():
        ...
        keep = _box_ml_nms(boxes, scores, labels.float(), nms_thresh)
        return result

The result after NMS is filtered further, because the number of detections per image is limited to at most 100. For one particular image, result had length 478, meaning 478 objects were detected, so filtering was needed:

    def select_over_all_levels(self, boxlists):
        num_images = len(boxlists)
        results = []
        for i in range(num_images):
            # multiclass nms
            result = boxlist_ml_nms(boxlists[i], self.nms_thresh)
            number_of_detections = len(result)

            # Limit to max_per_image detections **over all classes**
            if number_of_detections > self.fpn_post_nms_top_n > 0:
                cls_scores = result.get_field("scores")
                image_thresh, _ = torch.kthvalue(
                    cls_scores.cpu(),
                    number_of_detections - self.fpn_post_nms_top_n + 1
                )
                keep = cls_scores >= image_thresh.item()
                keep = torch.nonzero(keep).squeeze(1)
                result = result[keep]
            results.append(result)
        return results

In the torch.kthvalue call, y, i = torch.kthvalue(x, k, dim) returns the k-th smallest value along dimension dim, so the threshold above is the score of the fpn_post_nms_top_n-th highest detection.

The bboxes predicted at the model's input size are then resized to the size of the original image:

    # always single image is passed at a time
    prediction = predictions[0]

    # reshape prediction (a BoxList) into the original image size
    height, width = original_image.shape[:-1]
    prediction = prediction.resize((width, height))

    def run_on_opencv_image(self, image):
        predictions = self.compute_prediction(image)
        top_predictions = self.select_top_predictions(predictions)

Next, a confidence threshold is applied per class. The implementation in select_top_predictions:

    def select_top_predictions(self, predictions):
        scores = predictions.get_field("scores")
        labels = predictions.get_field("labels")
        # Labels start at 1, so shift by one to index the per-class thresholds
        thresholds = self.confidence_thresholds_for_classes[(labels - 1).long()]
        keep = torch.nonzero(scores > thresholds).squeeze(1)
        predictions = predictions[keep]
        scores = predictions.get_field("scores")
        _, idx = scores.sort(0, descending=True)
        return predictions[idx]

The scores and labels above each have 100 entries:

    labels
    tensor([ 1,  1,  1,  1, 25, 25, 25, 25, 25, 25, 68, 68, 68, 36, 40, 68, 25, 27,
            28, 25, 28, 68, 28, 42, 68, 28, 26,  3, 27, 25, 27, 39, 39, 42, 39, 42,
            27, 36, 25,  1,  1,  1,  1,  1,  1, 25,  1, 25, 27, 25, 14, 57, 28, 35,
            39,  3, 27, 25, 27, 39, 27, 28, 35, 39, 25, 39, 39, 39, 34,  1,  1,  1,
             1,  1,  1,  1,  1,  1, 25,  1,  3,  1, 27, 25, 27,  1,  1,  1,  1,  1,
             1,  1,  8,  1,  1,  1,  1,  1,  1,  1])
    labels.shape  # torch.Size([100])

After the per-class threshold filtering, 15 bboxes remain:

    predictions
    BoxList(num_boxes=15, image_width=640, image_height=480, mode=xyxy)

Sorting by score in descending order then gives predictions[idx].

The very last step is to draw the boxes and then the labels:

    result = self.overlay_boxes(result, top_predictions)
    result = self.overlay_class_names(result, top_predictions)

At this point the following call has finished and the whole prediction process is complete:

    composite = coco_demo.run_on_opencv_image(img)

IoU loss + centerness loss in the training code

First, the labels.

cls_loss, the classification loss: in my view it resembles the loss of an FCN segmentation network, except that the labels are not obtained by upsampling the feature map. Instead, each point of the feature map is mapped onto the original image, and its class label is decided by whether the mapped point falls inside the target region. Once the labels are known, the loss is computed between them and the network's class scores of the same size (the test path above applies a sigmoid to these scores, not a softmax). The tensors certainly have to be flattened for this, and the author uses focal loss here.

box_reg_loss_func, the regression loss, is an IoU loss. The target is the l, t, r, b distances between the center point and the sides of the GT box; pred is the l, t, r, b predicted by the network; the IoU loss is computed between the two. In the code the regression loss only counts locations with a positive label, which is achieved by multiplying each position by a 0 or 1 label afterwards to cancel the loss at the other positions.

    class IOULoss(nn.Module):
        def __init__(self, loss_type="iou"):
            super(IOULoss, self).__init__()
            self.loss_type = loss_type

        def forward(self, pred, target, weight=None):
            pred_left = pred[:, 0]
            pred_top = pred[:, 1]
            pred_right = pred[:, 2]
            pred_bottom = pred[:, 3]

            target_left = target[:, 0]

[The document preview is cut off here.]
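
The location grid described in the walkthrough can be reproduced without PyTorch. The following is a minimal sketch in plain Python, assuming the same ordering as the meshgrid and reshape code (x varies fastest); `grid_centers` is a hypothetical helper name and not part of the FCOS code base.

```python
def grid_centers(h, w, stride):
    """Return the (x, y) image-space center of every cell of an h x w feature map.

    Cell (row, col) maps to the top-left pixel (col * stride, row * stride);
    adding stride // 2 moves from that corner to the cell center.
    Cells are listed row by row, so x varies fastest.
    """
    half = stride // 2
    return [
        (col * stride + half, row * stride + half)
        for row in range(h)
        for col in range(w)
    ]


# A 100 x 100 map at stride 8, the grid size implied by the printed tensors.
centers = grid_centers(100, 100, 8)
print(centers[:3])   # first three centers along the top row
print(centers[-1])   # bottom-right center
```

Running the same helper with (13, 13, 64) or (7, 7, 128) gives the coarser grids of the deeper levels, which makes it easy to see the point count shrinking while the spacing grows.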
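
The box decoding and the centerness re-weighting from the test path can be checked by hand with a few numbers. The sketch below is plain Python; `decode_box` and `rescore` are illustrative names (assumptions, not functions from the FCOS repository), the regression order (l, t, r, b) follows the indexing used when `detections` is built, and the distances in the first example are made up.

```python
def decode_box(location, ltrb):
    """Turn a center location and (l, t, r, b) distances into (x0, y0, x1, y1)."""
    x, y = location
    l, t, r, b = ltrb
    return (x - l, y - t, x + r, y + b)


def rescore(class_scores, centerness):
    """Scale every per-class score of one location by that location's centerness."""
    return [s * centerness for s in class_scores]


# Location (12, 4) on the stride-8 level, with made-up distances for illustration.
box = decode_box((12.0, 4.0), (2.0, 1.0, 10.0, 6.0))
print(box)  # (10.0, 3.0, 22.0, 10.0)

# First location from the printed tensors: centerness 0.3064, scores 0.0050 and 0.0027.
scaled = rescore([0.0050, 0.0027], 0.3064)
print([round(s, 4) for s in scaled])
```

Because the same centerness multiplies every class score of a location, the ranking of classes at that location is unchanged; only the ranking between locations shifts.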
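
The per-image cap on detections relies on a k-th-value threshold. As a rough illustration of that idea only, here is a plain-Python sketch; `keep_top_k_by_threshold` is a hypothetical name, and a sorted list stands in for `torch.kthvalue`.

```python
def keep_top_k_by_threshold(scores, k):
    """Keep the indices of the k highest scores via a k-th-value threshold.

    The (n - k + 1)-th smallest score equals the k-th largest one, and every
    score at or above it is kept. Ties at the threshold can make the result
    longer than k. If there are k or fewer scores, everything is kept.
    """
    n = len(scores)
    if not (n > k > 0):
        return list(range(n))
    threshold = sorted(scores)[n - k]  # (n - k + 1)-th smallest, 0-indexed
    return [i for i, s in enumerate(scores) if s >= threshold]


scores = [0.9, 0.1, 0.5, 0.7, 0.3]
print(keep_top_k_by_threshold(scores, 2))  # [0, 3]
```

The surviving indices stay in their original order; sorting by score happens later, in the step that selects the top predictions.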
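
Since the IOULoss listing is cut off in the source, the following is only a sketch of how an IoU loss on (l, t, r, b) distances can be computed, written in plain Python for a single location. It assumes the common -log(IoU) form, strictly positive distances, and that pred and target are measured from the same point; the function name `ltrb_iou_loss` is hypothetical, and the code is not claimed to match the original class line for line.

```python
import math


def ltrb_iou_loss(pred, target):
    """-log(IoU) of two boxes given as (l, t, r, b) distances from one shared point."""
    pl, pt, pr, pb = pred
    tl, tt, tr, tb = target

    pred_area = (pl + pr) * (pt + pb)
    target_area = (tl + tr) * (tt + tb)

    # Both boxes contain the shared point, so the overlap reaches
    # min(l) to the left, min(r) to the right, and likewise vertically.
    w_intersect = min(pl, tl) + min(pr, tr)
    h_intersect = min(pt, tt) + min(pb, tb)
    area_intersect = w_intersect * h_intersect
    area_union = pred_area + target_area - area_intersect

    return -math.log(area_intersect / area_union)


print(ltrb_iou_loss((4, 4, 4, 4), (4, 4, 4, 4)))  # identical boxes: IoU = 1
print(ltrb_iou_loss((2, 2, 2, 2), (4, 4, 4, 4)))  # pred covers a quarter of target: IoU = 0.25
```

Working with distances from a shared point keeps the intersection a simple sum of minima, with no need to convert to corner coordinates first.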
