Adversarial Example Detection
姜育剛、馬興軍、吳祖煊

Recap: Week 3
1. Adversarial Examples
2. Adversarial Attacks
3. Adversarial Vulnerability Understanding

In-class Adversarial Attack Competition
https://codalab.lisn.upsaclay.fr/competitions/15669?secret_key=77cb8986-d5bd-4009-82f0-7dde2e819ff8

In-class Adversarial Attack Competition (accounts for 30% of the grade)
- Register with your school email; entries from other accounts will not be graded.
- Schedule: Phase 1 runs October 1 to October 28. Phase 2 is the evaluation stage; students do not take part.
- Students without a GPU can use Google Colab.
- Scores are assigned by rank: 1st place receives 30 points, last place 15 points.

Adversarial Example Detection (AED)
- A binary classification problem: clean (y=0) or adversarial (y=1)?
- An anomaly detection problem: benign (y=0) or abnormal (y=1)?

Principles for AED
1. All binary classification methods can be applied to AED.
2. All anomaly detection methods can be applied to AED.
3. Use as much information as you can: input statistics, manual features, training data, attention maps, transformations, mixup, denoising, and more on the input side; activations, deep features, probabilities, logits, gradients, the loss landscape, uncertainty, and more on the model side.
4. Leverage the unique characteristics of adversarial examples: they are twins of clean samples yet strangers in prediction, i.e., extremely close to the clean sample in input space but far away in the model's output.
5. Build detectors on existing understandings of adversarial vulnerability: high-dimensional pockets, local linearity, tilted decision boundaries.

In the end, AED is still feature engineering! A minimal detector following principles 1 and 3 is sketched below.
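
A minimal sketch (not from the slides): once clean and adversarial inputs are mapped to features, any off-the-shelf binary classifier can act as the detector. The Gaussian "features" below are synthetic stand-ins for real signals such as logits, deep activations, or gradient statistics.

```python
# Sketch of AED as plain binary classification (principles 1 and 3): train
# an off-the-shelf classifier on features of clean (y=0) and adversarial
# (y=1) inputs. The Gaussian features are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
feats_clean = rng.normal(0.0, 1.0, size=(500, 10))  # stand-in clean features
feats_adv = rng.normal(0.5, 1.2, size=(500, 10))    # stand-in adversarial features

X = np.vstack([feats_clean, feats_adv])
y = np.concatenate([np.zeros(500), np.ones(500)])   # 0 = clean, 1 = adversarial

detector = LogisticRegression(max_iter=1000).fit(X, y)
scores = detector.predict_proba(X)[:, 1]
print("train AUC:", roc_auc_score(y, scores))       # detection quality as ROC-AUC
```

In practice the detector is trained on features of known attacks and evaluated on held-out or unseen attacks, which is exactly where the challenges below arise.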

Challenges in AED
- The diversity of the adversarial examples used to train the detector determines its detection performance.
- Detectors are themselves machine learning models, so they too are vulnerable to adversarial attacks.
- Detectors need to detect both existing and unknown attacks.
- Detectors need to be robust to adaptive attacks.

Existing Methods
1. Secondary Classification Methods
2. Principal Component Analysis (PCA)
3. Distribution Detection Methods
4. Prediction Inconsistency
5. Reconstruction Inconsistency
6. Trapping Based Detection

Secondary Classification Methods
- Adversarial Retraining: treat adversarial examples as a new class and retrain the classifier with an extra output, as sketched below. (Grosse et al., "On the (Statistical) Detection of Adversarial Examples," arXiv:1702.06280)
- Adversarial Classification: train a separate detector with clean samples as class 0 and adversarial samples as class 1. (Gong et al., "Adversarial and clean data are not twins," arXiv:1704.04960)
- Cascade Classifiers: train a detector for each intermediate layer of the network and cascade them. (Metzen, Jan Hendrik, et al., "On detecting adversarial perturbations," arXiv:1702.04267, 2017)
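
A hedged sketch of the adversarial-retraining idea, assuming a toy MNIST-shaped network and a placeholder perturbation (both hypothetical); the essential point is the extra (K+1)-th output class reserved for adversarial inputs.

```python
# Sketch of adversarial retraining (Grosse et al.): give a K-class
# classifier one extra output so adversarial inputs can be assigned to
# class K. The model, data, and "attack" below are all placeholders.
import torch
import torch.nn as nn

num_classes = 10
model = nn.Sequential(                          # stand-in for a real classifier
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, num_classes + 1),            # K clean classes + 1 adversarial class
)

x_clean = torch.randn(32, 1, 28, 28)            # placeholder clean batch
x_adv = x_clean + 0.1 * torch.sign(torch.randn_like(x_clean))  # placeholder "adversarial" batch
y_clean = torch.randint(0, num_classes, (32,))
y_adv = torch.full((32,), num_classes, dtype=torch.long)       # adversarial label = K

x, y = torch.cat([x_clean, x_adv]), torch.cat([y_clean, y_adv])
loss = nn.CrossEntropyLoss()(model(x), y)       # retrain on the merged dataset
loss.backward()
```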

Principal Component Analysis (PCA)
- Hendrycks and Gimpel observed that the last few principal components differentiate adversarial examples; in their plot (omitted here), a clean sample (blue) and an adversarial example (yellow) separate clearly in the trailing coefficients. A tail-component score in this spirit is sketched after this list. ("Early methods for detecting adversarial images," arXiv:1608.00530, 2016)
- Carlini and Wagner later showed this separation is an artifact caused by the black background of MNIST images. ("Adversarial examples are not easily detected: Bypassing ten detection methods," AISec 2017)
- Dimensionality Reduction: train the classifier on PCA-reduced data as a defense. (Bhagoji, Arjun Nitin, Daniel Cullina, and Prateek Mittal, "Dimensionality reduction as a defense against evasion attacks on machine learning classifiers," arXiv:1704.02654, 2017)
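
A sketch of the tail-component heuristic on synthetic data (not the authors' code): fit PCA on clean inputs and score a test input by the energy on the last few components. Keep in mind Carlini and Wagner's caveat that this cue can be a dataset artifact.

```python
# Sketch of the PCA tail-component heuristic (Hendrycks & Gimpel): fit PCA
# on clean data and score a test input by the energy on the last principal
# components. Data is synthetic; real inputs would be flattened images.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
clean = rng.normal(size=(1000, 64))            # stand-in flattened clean images
pca = PCA().fit(clean)

def tail_energy(x, k=8):
    """Energy of x on the last k principal components."""
    coeffs = pca.transform(x.reshape(1, -1))[0]
    return float(np.sum(coeffs[-k:] ** 2))

print(tail_energy(clean[0]))                                   # clean baseline
print(tail_energy(clean[0] + rng.normal(scale=0.5, size=64)))  # perturbed input
```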

Distribution Detection Methods
- Maximum Mean Discrepancy (MMD): a statistical two-sample test of whether two datasets, one clean and one suspect, come from the same distribution. (Grosse et al., "On the (Statistical) Detection of Adversarial Examples," arXiv:1702.06280)
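
The MMD formula on the original slide did not survive extraction; the standard biased empirical estimate of the squared MMD between samples X = {x_1, ..., x_n} and Y = {y_1, ..., y_m} under a kernel κ is:

```latex
\widehat{\mathrm{MMD}}^2(X, Y) =
  \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \kappa(x_i, x_j)
  - \frac{2}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \kappa(x_i, y_j)
  + \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} \kappa(y_i, y_j)
```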

- Kernel Density Estimation (KDE): adversarial examples lie in low-density regions of the clean data distribution, so a density estimate computed in deep-feature space can flag them, as sketched below. (Feinman, Reuben, et al., "Detecting adversarial samples from artifacts," arXiv:1703.00410, 2017)
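
A sketch of the KDE score, assuming synthetic stand-ins for the deep features of one class; a low log-density marks the input as suspicious.

```python
# Sketch of the KDE score (Feinman et al.): fit a density model on the
# deep features of clean samples, then flag inputs whose features land
# in low-density regions. Features here are synthetic stand-ins.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
clean_feats = rng.normal(size=(1000, 32))       # stand-in deep features of one class
kde = KernelDensity(kernel="gaussian", bandwidth=1.0).fit(clean_feats)

test_feat = rng.normal(loc=3.0, size=(1, 32))   # a point far from the clean cluster
print("log-density:", kde.score_samples(test_feat)[0])  # low value -> suspicious
```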

Bypassing Ten Detection Methods
- Carlini and Wagner demonstrated that ten published detectors, including the KDE and PCA approaches above, can be bypassed by attacks that adapt to the detector. ("Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods," AISec 2017)

Local Intrinsic Dimensionality (LID)
(Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality. Ma et al., ICLR 2018)
- Definition (Local Intrinsic Dimensionality): adversarial examples lie in high-dimensional subspaces.
- Adversarial subspaces and expansion dimension:
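
The definition that followed was lost in extraction. As stated in Ma et al. (ICLR 2018), for distances from a reference point with a smooth cumulative distribution function F:

```latex
\mathrm{LID}_F(r) = \lim_{\epsilon \to 0^{+}}
  \frac{\ln F\big((1+\epsilon)\,r\big) - \ln F(r)}{\ln (1+\epsilon)}
  = \frac{r\,F'(r)}{F(r)},
\qquad
\mathrm{LID}_F = \lim_{r \to 0^{+}} \mathrm{LID}_F(r)
```

Intuitively, LID_F(r) is the growth rate of the log neighborhood mass as the radius grows; adversarial regions exhibit higher growth rates.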

Estimation of LID
(Estimating local intrinsic dimensionality. Amsaleg et al., KDD 2015)
- Hill (MLE) estimator (Hill 1975; Amsaleg et al. 2015), based on Extreme Value Theory: nearest-neighbor distances are extreme events, and the lower tail of the distance distribution follows a Generalized Pareto Distribution (GPD). The estimator is given below.
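
The formula itself was lost in extraction; the Hill (MLE) estimate, with r_i(x) the distance from x to its i-th nearest neighbor among k neighbors, is:

```latex
\widehat{\mathrm{LID}}(x) =
  - \left( \frac{1}{k} \sum_{i=1}^{k} \ln \frac{r_i(x)}{r_k(x)} \right)^{-1}
```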

Interpretation of LID for adversarial subspaces:
- LID directly measures the expansion rate of local distance distributions.
- The expansion of an adversarial subspace is higher than that of the normal data subspace.
- LID assesses the space-filling capability of the subspace, based on the distribution of distances from the example to its neighbors. A code sketch of the estimator follows.
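
A sketch of that estimator in code, on synthetic features; in the actual method the estimate is computed per layer within each mini-batch.

```python
# Sketch of the Hill/MLE estimator of LID used by Ma et al.: estimate the
# LID of a query point from distances to its k nearest neighbors in a
# batch. The data is synthetic stand-in deep features.
import numpy as np

def lid_mle(query, batch, k=20):
    """Maximum-likelihood LID estimate from k-nearest-neighbor distances."""
    dists = np.linalg.norm(batch - query, axis=1)
    dists = np.sort(dists)
    dists = dists[dists > 0][:k]           # drop the zero self-distance if present
    return -1.0 / np.mean(np.log(dists / dists[-1]))

rng = np.random.default_rng(0)
batch = rng.normal(size=(1000, 64))        # stand-in mini-batch of deep features
print(lid_mle(batch[0], batch))            # adversarial points tend to score higher
```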

Empirical observations (plots omitted):
- The LID scores of adversarial examples (red in the original plots) are higher than those of clean examples.
- LID computed at deeper layers is more discriminative.

Experiments & Results (detection AUC, %):

Dataset   Feature  FGM    BIM-a  BIM-b  JSMA   Opt
MNIST     KD       78.12  98.14  98.61  68.77  95.15
MNIST     BU       32.37  91.55  25.46  88.74  71.30
MNIST     LID      96.89  99.60  99.83  92.24  99.24
CIFAR-10  KD       64.92  68.38  98.70  85.77  91.35
CIFAR-10  BU       70.53  81.60  97.32  87.36  91.39
CIFAR-10  LID      82.38  82.51  99.78  95.87  98.94
SVHN      KD       70.39  77.18  99.57  86.46  87.41
SVHN      BU       86.78  84.07  86.93  91.33  87.13
SVHN      LID      97.61  87.55  99.72  95.07  97.60

Generalization: detectors trained on FGSM, tested on other attacks (AUC, %):

Train\Test  FGM    BIM-a  BIM-b  JSMA   Opt
KD          64.92  69.15  89.71  85.72  91.22
BU          70.53  81.67  2.65   86.79  91.27
LID         82.38  82.30  91.61  89.93  93.32

Detectors trained on the simple FGSM attack can detect more complex attacks.

An Improved Detector Based on LID (arXiv:2212.06776)

Mahalanobis Distance (MD)
(Mahalanobis, Prasanta Chandra. "On the generalized distance in statistics." National Institute of Science of India, 1936)
The MD between two data points is defined below.
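
The formula was lost in extraction; the standard definition is:

```latex
d_M(x, y) = \sqrt{(x - y)^{\top} \Sigma^{-1} (x - y)}
```

where Σ is the covariance matrix of the underlying distribution; with Σ = I it reduces to the Euclidean distance.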

Lee et al., "A simple unified framework for detecting out-of-distribution samples and adversarial attacks," NeurIPS 2018:
- Fit a class-conditional Gaussian to the deep features of each class, then score a test input by its Mahalanobis distance to the nearest class mean; a sketch follows.
- Experiments & Results: the score detects both out-of-distribution samples and adversarial examples (result tables omitted).
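
A hedged sketch of the resulting detector, assuming synthetic features and labels: fit one Gaussian per class with a shared covariance, then score by the distance to the nearest class mean (Lee et al. additionally ensemble layers and add input preprocessing, omitted here).

```python
# Sketch of the Mahalanobis-distance score of Lee et al. (NeurIPS 2018):
# per-class Gaussians on deep features with a shared covariance; score a
# test feature by its distance to the nearest class mean.
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 32))             # stand-in deep features
labels = rng.integers(0, 10, size=1000)         # stand-in class labels

means = np.stack([feats[labels == c].mean(axis=0) for c in range(10)])
centered = feats - means[labels]
cov = centered.T @ centered / len(feats)        # shared (tied) covariance
prec = np.linalg.inv(cov + 1e-6 * np.eye(32))   # regularized inverse

def md_score(f):
    """Negative Mahalanobis distance to the closest class mean (higher = cleaner)."""
    diffs = means - f
    d2 = np.einsum("ci,ij,cj->c", diffs, prec, diffs)
    return -float(d2.min())

print(md_score(feats[0]))
print(md_score(rng.normal(loc=3.0, size=32)))   # outlier scores much lower
```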

Prediction Inconsistency

Bayesian Uncertainty (BU)
- Keep dropout active at test time and measure the variance of the stochastic predictions; adversarial examples show higher uncertainty than clean ones, as sketched below. (Feinman, Reuben, et al., "Detecting adversarial samples from artifacts," arXiv:1703.00410, 2017)
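
A sketch of the BU score with a placeholder dropout network: run several stochastic forward passes and use the variance of the class probabilities as the score.

```python
# Sketch of Bayesian Uncertainty (Feinman et al.): keep dropout active at
# test time, run T stochastic forward passes, and use the variance of the
# predicted probabilities as the uncertainty score. Model/data are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(                        # stand-in classifier with dropout
    nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Dropout(p=0.5), nn.Linear(128, 10),
)
model.train()                                 # train mode keeps dropout stochastic

x = torch.randn(1, 1, 28, 28)                 # placeholder input
with torch.no_grad():
    probs = torch.stack([model(x).softmax(dim=1) for _ in range(50)])

print("uncertainty:", probs.var(dim=0).sum().item())  # higher -> more suspicious
```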

Feature Squeezing
- Squeeze the input, e.g., by bit-depth reduction, and compare the model's predictions on the original and squeezed versions. (Xu et al., "Feature squeezing: Detecting adversarial examples in deep neural networks," arXiv:1704.01155, 2017)
- Reducing input dimensionality improves robustness.
- The prediction inconsistency before and after squeezing can detect adversarial examples, as sketched below.
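
A sketch of the squeezing test with a placeholder model: quantize the input to a lower bit depth and take the L1 distance between the prediction vectors as the detection score.

```python
# Sketch of feature squeezing (Xu et al.): reduce the input's bit depth,
# rerun the model, and use the L1 distance between the two prediction
# vectors as the detection score. The model is a placeholder.
import torch
import torch.nn as nn

def squeeze_bit_depth(x, bits=4):
    """Quantize inputs in [0, 1] to the given bit depth."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.Softmax(dim=1))
x = torch.rand(1, 1, 28, 28)                  # placeholder input in [0, 1]

with torch.no_grad():
    score = (model(x) - model(squeeze_bit_depth(x))).abs().sum().item()
print("inconsistency:", score)  # above a validation-chosen threshold -> adversarial
```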

Random Transformation
- The predictions on adversarial examples change after random image transformations, while predictions on clean inputs remain stable. (Tian et al., "Detecting adversarial examples through image transformation," AAAI 2018)

Log-Odds
- Add random noise to the input and test how the pairwise logit differences (log-odds) respond; clean and adversarial inputs respond differently. (Roth et al., "The odds are odd: A statistical test for detecting adversarial examples," ICML 2019)

Turning a Weakness into a Strength
(Hu et al., "A new defense against adversarial images: Turning a weakness into a strength," NeurIPS 2019)
- Principle 1: the gradients at adversarial examples are more uniform.
- Principle 2: adversarial examples are hard to attack a second time.
- Test criterion 1: random noise should not change the prediction.
- Test criterion 2: attacking the input again should require a larger perturbation.

Reconstruction Inconsistency