
SSD Detection (Part 5)


        c_conv10_2 = c_conv10_2.view(batch_size, -1, self.n_classes)  # (N, 36, n_classes)
        c_conv11_2 = self.cl_conv11_2(conv11_2_feats)  # (N, 4 * n_classes, 1, 1)
        c_conv11_2 = c_conv11_2.permute(0, 2, 3, 1).contiguous()  # (N, 1, 1, 4 * n_classes)
        c_conv11_2 = c_conv11_2.view(batch_size, -1, self.n_classes)  # (N, 4, n_classes)
        # A total of 8732 boxes
        # Concatenate in this specific order
        locs = torch.cat([l_conv4_3, l_conv7, l_conv8_2, l_conv9_2, l_conv10_2, l_conv11_2], dim=1)  # (N, 8732, 4)
        classes_scores = torch.cat([c_conv4_3, c_conv7, c_conv8_2, c_conv9_2, c_conv10_2, c_conv11_2], dim=1)  # (N, 8732, n_classes)
        return locs, classes_scores
This may look complicated, but it essentially takes all the feature maps obtained from the base VGG-16 and the auxiliary convolutions, and applies convolutional layers to predict classes and bounding boxes for each of them.
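The permute-then-view pattern used above can be illustrated with a minimal, self-contained sketch. The batch size, feature-map size, and class count below are made-up example values; 4 priors per position matches the conv10_2 branch:

```python
import torch

# Hypothetical class-prediction map from a 3x3 feature map with
# 4 priors per position and 21 classes (example values only).
n_classes, batch_size = 21, 2
feats = torch.randn(batch_size, 4 * n_classes, 3, 3)  # (N, 4*n_classes, 3, 3)

# Move channels last so each position's predictions become contiguous,
# then flatten to one row per prior box.
out = feats.permute(0, 2, 3, 1).contiguous()  # (N, 3, 3, 4*n_classes)
out = out.view(batch_size, -1, n_classes)     # (N, 3*3*4, n_classes) = (N, 36, n_classes)
print(out.shape)  # torch.Size([2, 36, 21])
```

The `contiguous()` call is required because `permute` only changes strides, while `view` needs the underlying memory laid out in the new order.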
Putting It All Together
Now let's combine everything and look at the final architecture, shown below.
class SSD300(nn.Module):
    """
    The SSD300 network – encapsulates the base VGG network, auxiliary, and prediction convolutions.
    """

    def __init__(self, n_classes, device):
        super(SSD300, self).__init__()
        self.n_classes = n_classes
        self.device = device
        self.base = VGGBase()
        self.aux_convs = AuxiliaryConvolutions()
        self.pred_convs = PredictionConvolutions(n_classes)
        # Since lower-level features (conv4_3_feats) have considerably larger scales, we take the L2 norm and rescale
        # Rescale factor is initially set at 20, but is learned for each channel during back-prop
        self.rescale_factors = nn.Parameter(torch.FloatTensor(1, 512, 1, 1))  # there are 512 channels in conv4_3_feats
        nn.init.constant_(self.rescale_factors, 20)
        # Prior boxes
        self.priors_cxcy = self.create_prior_boxes()
        self.to(device)

    def forward(self, image):
        """
        Forward propagation.

        :param image: images, a tensor of dimensions (N, 3, 300, 300)
        :return: 8732 locations and class scores (i.e. w.r.t. each prior box) for each image
        """
        # Run VGG base network convolutions
        conv4_3_feats, conv7_feats = self.base(image)  # (N, 512, 38, 38), (N, 1024, 19, 19)
        # Rescale conv4_3 after L2 norm
        norm = conv4_3_feats.pow(2).sum(dim=1, keepdim=True).sqrt()  # (N, 1, 38, 38)
        conv4_3_feats = conv4_3_feats / norm  # (N, 512, 38, 38)
        conv4_3_feats = conv4_3_feats * self.rescale_factors  # (N, 512, 38, 38)
        # Run auxiliary convolutions
        # (N, 512, 10, 10), (N, 256, 5, 5), (N, 256, 3, 3), (N, 256, 1, 1)
        conv8_2_feats, conv9_2_feats, conv10_2_feats, conv11_2_feats = self.aux_convs(conv7_feats)
        # Run prediction convolutions
        # (N, 8732, 4), (N, 8732, n_classes)
        locs, classes_scores = self.pred_convs(conv4_3_feats, conv7_feats, conv8_2_feats,
                                               conv9_2_feats, conv10_2_feats, conv11_2_feats)
        return locs, classes_scores
Note that the lower-level features (conv4_3_feats) have considerably larger scales, so we take the L2 norm and rescale them. The rescale factor is initially set to 20, but is learned for each channel during back-propagation.
Loss Function
The localization loss is the Smooth L1 loss, while the classification loss is the well-known cross-entropy loss.
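The two terms can be sketched as follows. This is only a minimal illustration of how the losses combine, not the full MultiBox loss: the matching and hard negative mining steps are omitted, and all shapes and values are made up for the example.

```python
import torch
import torch.nn as nn

# Sketch: combine the two loss terms, assuming matching has already produced,
# for each of the P positive priors, a target box offset and a target class.
smooth_l1 = nn.SmoothL1Loss()          # localization loss (positive priors only)
cross_entropy = nn.CrossEntropyLoss()  # classification loss

P, n_classes = 8, 21                   # example values
pred_locs = torch.randn(P, 4)          # predicted offsets for positive priors
true_locs = torch.randn(P, 4)          # matched ground-truth offsets
pred_scores = torch.randn(P, n_classes)
true_classes = torch.randint(0, n_classes, (P,))

alpha = 1.0  # weighting between the two terms (1 in the SSD paper)
loss = cross_entropy(pred_scores, true_classes) + alpha * smooth_l1(pred_locs, true_locs)
print(loss.item())
```

In the real loss, the classification term is also computed over the selected negative priors, as described in the matching strategy below.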
Matching Strategy
During training, we need to decide which of the generated prior boxes should correspond to ground-truth boxes and be included in the loss computation. We therefore match each ground-truth box to the prior box with the highest Jaccard overlap. In addition, we also match any prior box whose overlap with a ground truth is at least 0.5, which allows the network to predict high scores for multiple overlapping boxes.
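The Jaccard overlap (intersection-over-union) used for matching can be computed pairwise as in this sketch, where boxes are assumed to be in (x_min, y_min, x_max, y_max) form and the `jaccard` helper is an illustrative name, not part of the article's code:

```python
import torch

def jaccard(a, b):
    """Pairwise Jaccard overlap (IoU) between boxes a (n, 4) and b (m, 4) -> (n, m)."""
    lt = torch.max(a[:, None, :2], b[None, :, :2])  # intersection top-left corners
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])  # intersection bottom-right corners
    wh = (rb - lt).clamp(min=0)                     # zero if the boxes don't intersect
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

priors = torch.tensor([[0., 0., 2., 2.], [1., 1., 3., 3.]])
truth = torch.tensor([[0., 0., 2., 2.]])
overlaps = jaccard(priors, truth)  # first prior overlaps fully (1.0), second partially (1/7)
```

Each ground truth is matched to its `argmax` prior along one axis of this matrix, then any remaining prior whose column maximum exceeds 0.5 is also marked positive.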
After the matching step, most of the prior/default boxes end up as negative samples. To avoid a severe imbalance between positives and negatives, we keep at most a 3:1 ratio of negatives to positives, which leads to faster optimization and more stable training. Once again, the localization loss is computed only on positive (non-background) priors.
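The 3:1 hard negative mining can be sketched as below. All sizes are made-up example values, and the per-prior confidence losses and the boolean positive mask are assumed to have been computed already:

```python
import torch

n_priors = 20
conf_loss = torch.rand(n_priors) + 0.1       # per-prior classification loss (example values)
positive = torch.zeros(n_priors, dtype=torch.bool)
positive[:4] = True                          # pretend 4 priors matched an object

n_hard_negatives = 3 * int(positive.sum())   # keep at most 3 negatives per positive
neg_loss = conf_loss.clone()
neg_loss[positive] = 0.                      # exclude positives from the negative ranking
hard_neg_idx = neg_loss.sort(descending=True).indices[:n_hard_negatives]

keep = positive.clone()
keep[hard_neg_idx] = True                    # positives + hardest negatives
n_kept = int(keep.sum())                     # 4 positives + 12 negatives = 16
```

Only the `keep` priors contribute to the classification loss; ranking the negatives by their loss selects the ones the network currently finds hardest.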
