Data Poisoning: Attacks and Defenses
姜育剛, 馬興軍, 吳祖煊

Recap: Week 5
1. Adversarial Defense
- Early Defense Methods
- Early Adversarial Training Methods
- Advanced Adversarial Training Methods
- Remaining Challenges and Recent Progress
Adversarial Attack Competition
- Link: https://codalab.lisn.upsaclay.fr/competitions/15669?secret_key=77cb8986-d5bd-4009-82f0-7dde2e819ff8

Data Poisoning: Attacks and Defenses (Outline)
- A Brief History of Data Poisoning
- Data Poisoning Attacks
- Data Poisoning Defenses
- Poisoning for Data Protection
- Future Research

A Recap of the Attack Taxonomy
- Attack timing: poisoning attack vs. evasion attack
- Attacker's goal: targeted attack vs. untargeted attack
- Attacker's knowledge: black-box, white-box, gray-box
- Universality: individual vs. universal

Data Poisoning is a Training-Time Attack
- Evasion (causation) attack: test-time attack; changes the input example.
- Poisoning attack: training-time attack; changes the classification boundary.

Outline: A Brief History of Data Poisoning; Data Poisoning Attacks; Data Poisoning Defenses; Poisoning for Data Protection; Future Research

A Brief History: The Earliest Work
- Kearns and Li. "Learning in the presence of malicious errors." SIAM Journal on Computing, 1993.

Poisoning Intrusion Detection Systems: The Attack Model
- Barreno, Marco, et al. "Can machine learning be secure?" ASIACCS, 2006.

Poisoning Intrusion Detection Systems: The Defense Model
- Barreno, Marco, et al. "Can machine learning be secure?" ASIACCS, 2006.

Subvert Your Spam Filter
- Nelson, Blaine, et al. "Exploiting machine learning to subvert your spam filter." LEET 8.1 (2008): 9.
- Usenet dictionary attack: add legitimate words into spam emails.
- Poisoning just 1% of the training data can subvert a spam filter.

The Concept of Poisoning Attack
- Biggio, Nelson and Laskov. "Poisoning attacks against support vector machines." arXiv:1206.6389 (2012).
- A single attack data point caused the classification error to rise from the initial error rate of 2-5% to 15-20%.

Outline: A Brief History of Data Poisoning; Data Poisoning Attacks; Data Poisoning Defenses; Poisoning for Data Protection; Future Research

Attack Pipeline
- Contaminate a small number of training samples (the fewer the better) before model training.
- Outcome: a useless model, or a model under the attacker's control.
- Poisoning attack != backdoor attack; data poisoning is only one way to implement a backdoor attack.

Attack Pipeline
- Liu, Ximeng, et al. "Privacy and security issues in deep learning: A survey." IEEE Access 9 (2020): 4566-4593.

Attack Types
- Core idea: how to influence the training process.

Label Poisoning
- Incorrect labels break supervised learning!
- Variants: random labels, label flipping, partial label flipping; what about self-supervised learning?
- Biggio, Nelson and Laskov. "Poisoning attacks against support vector machines." arXiv:1206.6389 (2012).
- Zhang and Zhu. "A game-theoretic analysis of label flipping attacks on distributed support vector machines." CISS, 2017.
- Label Flipping Attack (a "calling a deer a horse" attack, 指鹿為馬): mislabel training samples so the learner absorbs a wrong class boundary.
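As a concrete illustration, the sketch below poisons a label vector by randomly re-assigning a fraction of the labels. It is a minimal NumPy example; the function name and arguments are illustrative and not taken from any of the cited papers.

```python
import numpy as np

def flip_labels(y, flip_rate=0.1, num_classes=10, seed=0):
    """Label-flipping poisoning (sketch): replace a fraction of the labels
    with a wrong class chosen uniformly at random."""
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    n_flip = int(flip_rate * len(y))
    flip_idx = rng.choice(len(y), size=n_flip, replace=False)
    for i in flip_idx:
        wrong_classes = [c for c in range(num_classes) if c != y[i]]
        y_poisoned[i] = rng.choice(wrong_classes)
    return y_poisoned, flip_idx

# usage (hypothetical): y_bad, idx = flip_labels(y_train, flip_rate=0.05)
```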

p-Tampering Attacks (a "covert substitution" attack, 暗度陳倉(cāng))
- Mahloujifar and Mahmoody. "Blockwise p-tampering attacks on cryptographic primitives, extractors, and learners." TCC, 2017.
- Mahloujifar, Mahmoody and Mohammed. "Universal multi-party poisoning attacks." ICML, 2019.
- (Figure: an online-updated model; each incoming 'dog' example is tampered with probability Pr = 0.8, which induces a bias shift.)

Feature Space Poisoning
- Shafahi, Ali, et al. "Poison frogs! Targeted clean-label poisoning attacks on neural networks." NeurIPS 2018.
- Feature Collision Attack (a "feint east, strike west" attack, 聲東擊西): a white-box data poisoning method.
- It manipulates features rather than labels: the poison looks like a 'dog' in pixel space but sits on a 'fish' in feature space.
- Pros and cons: requires knowledge of the target model; very effective against transfer learning; much weaker against training from scratch.
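A minimal PyTorch sketch of this feature-collision objective: minimize the feature distance to the target while staying close to the base image in pixel space. The names `feature_extractor`, `base_img`, and `target_img` are assumed to be supplied by the caller, and the hyper-parameters are illustrative only.

```python
import torch

def craft_feature_collision_poison(feature_extractor, base_img, target_img,
                                   beta=0.1, lr=0.01, steps=1000):
    """Craft one clean-label poison (sketch): keep the poison close to
    `base_img` in pixel space (so its label stays believable) while colliding
    with `target_img` in feature space.

    Objective: || f(x) - f(x_t) ||^2 + beta * || x - x_b ||^2
    """
    feature_extractor.eval()
    with torch.no_grad():
        target_feat = feature_extractor(target_img)   # fixed target feature
    poison = base_img.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([poison], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        feat_loss = (feature_extractor(poison) - target_feat).pow(2).sum()
        pix_loss = beta * (poison - base_img).pow(2).sum()
        (feat_loss + pix_loss).backward()
        opt.step()
        poison.data.clamp_(0, 1)                      # keep a valid image
    return poison.detach()
```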

Convex Polytope Attack (a "besieged on all sides" attack, 四面楚歌)
- Zhu, Chen, et al. "Transferable clean-label poisoning attacks on deep neural nets." ICML 2019.
- Goal: improve the transferability of clean-label poisoning to different DNNs.
- Idea: find a set of poison samples whose features enclose the target sample inside a convex hull.
- The "surrounding" samples are found with the help of m pre-trained models, under a convex-combination weight constraint.
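Written out, the enclosing condition is a convex-combination constraint imposed simultaneously on all m pre-trained feature extractors. The notation below is assumed for illustration and follows the verbal description above: phi^(i) is the i-th extractor, x_t the target, x_p^(j) the poisons, x_b^(j) their clean base images.

```latex
\min_{\{x_p^{(j)}\}} \; \sum_{i=1}^{m}
  \frac{\left\| \phi^{(i)}(x_t) - \sum_{j=1}^{k} c_j^{(i)}\, \phi^{(i)}(x_p^{(j)}) \right\|^2}
       {\left\| \phi^{(i)}(x_t) \right\|^2}
\quad \text{s.t.} \quad
c_j^{(i)} \ge 0,\;\; \sum_{j=1}^{k} c_j^{(i)} = 1,\;\;
\left\| x_p^{(j)} - x_b^{(j)} \right\|_\infty \le \epsilon .
```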

Convex Polytope Attack vs. Feature Collision Attack
- Zhu, Chen, et al. "Transferable clean-label poisoning attacks on deep neural nets." ICML 2019.
- Illustration with an SVM: the blue fluffy ball is the target sample (not present in the training set); the red fluffy balls are the surrounding poison samples; the plain red/blue points are ordinary training samples.

Bi-level Optimization Attack
- Mei and Zhu. "Using machine teaching to identify optimal training-set attacks on machine learners." AAAI 2015.
- Poisoning is inherently a bi-level optimization: the effect of the poison is only known after a model has been trained on it.
- It is a max-min problem:
  - inner minimization: update (train) the model on the poisoned data;
  - outer maximization: generate stronger poison against the updated model.
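In symbols, with D_c the clean training set, D_p the poison set and theta the model parameters (notation assumed here for illustration), the max-min structure reads:

```latex
\max_{D_p}\; \mathcal{L}_{\mathrm{adv}}\!\left(\theta^{*}(D_p)\right)
\qquad \text{s.t.} \qquad
\theta^{*}(D_p) \;=\; \arg\min_{\theta}\; \mathcal{L}_{\mathrm{train}}\!\left(D_c \cup D_p;\ \theta\right).
```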

MetaPoison
- Huang, W. Ronny, et al. "MetaPoison: Practical general-purpose clean-label data poisoning." NeurIPS 2020.
- One advanced bi-level optimization attack:
  - clean-label: does not modify any class labels;
  - targeted attack; performance on the validation set is unchanged;
  - uses meta-learning to find highly effective poison samples;
  - can attack both fine-tuning and end-to-end training;
  - successfully attacks a commercial model, the Google Cloud AutoML API.

MetaPoison
- Huang, W. Ronny, et al. "MetaPoison: Practical general-purpose clean-label data poisoning." NeurIPS 2020.
- A bi-level min-min optimization attack.
- K-step optimization strategy: the inner loop unrolls several training steps ("look ahead"), the outer loop takes one step on the poison.
- Uses m models with periodic re-initialization to increase exploration.
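The sketch below illustrates the look-ahead idea only: unroll k differentiable SGD steps on clean plus poisoned batches with torch.func.functional_call (PyTorch 2.x assumed), then take one outer step on the poison perturbation. All tensors (x_base, y_base, x_clean, y_clean, x_target, y_adv) and hyper-parameters are illustrative placeholders; per the slide, the full method additionally uses m models with periodic re-initialization.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def metapoison_outer_step(model, delta, x_base, y_base, x_clean, y_clean,
                          x_target, y_adv, k=2, inner_lr=0.1,
                          outer_lr=0.01, eps=8 / 255):
    """One outer update of the poison perturbation `delta` (look-ahead sketch).

    Inner: unroll k differentiable SGD steps on clean + poisoned batches.
    Outer: one signed-gradient step on `delta` so that the looked-ahead model
    assigns the attacker's label `y_adv` to `x_target` (labels stay clean).
    """
    delta = delta.detach().requires_grad_(True)
    params = {n: p.detach().clone().requires_grad_(True)
              for n, p in model.named_parameters()}

    for _ in range(k):  # 'look ahead': simulate k training steps
        x = torch.cat([x_clean, (x_base + delta).clamp(0, 1)])
        y = torch.cat([y_clean, y_base])          # poisons keep their clean labels
        loss = F.cross_entropy(functional_call(model, params, (x,)), y)
        grads = torch.autograd.grad(loss, list(params.values()),
                                    create_graph=True)
        params = {name: p - inner_lr * g
                  for (name, p), g in zip(params.items(), grads)}

    # adversarial loss on the target AFTER the simulated training steps
    adv_loss = F.cross_entropy(functional_call(model, params, (x_target,)), y_adv)
    grad_delta, = torch.autograd.grad(adv_loss, delta)
    with torch.no_grad():
        delta = (delta - outer_lr * grad_delta.sign()).clamp(-eps, eps)
    return delta.detach()
```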

MetaPoison
- Huang, W. Ronny, et al. "MetaPoison: Practical general-purpose clean-label data poisoning." NeurIPS 2020.
- (Figure: the poison-crafting optimization stage vs. the effect after poisoning.)
- Poisoning only 0.1% of the training data already achieves a high attack success rate (ASR).
- Example: a 'dog' target poisoned so that it is predicted as 'bird'.

Witches' Brew: Idea
- Geiping, Jonas, et al. "Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching." ICLR 2021.
- Still a min-min bi-level optimization problem.
- Trick: when crafting the poison samples, make their training gradient match that of the target sample.
- Intuition: poisons that trigger the same gradients as the target during training act like the target, i.e. the poisons become "more like" the target from the optimizer's point of view.
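A compact PyTorch sketch of such a gradient-matching objective: the negative cosine similarity between the gradient produced by the poisoned batch and the adversarial gradient of the target. The function is illustrative; in the caller one would set x_poison = (x_base + delta) with delta requiring gradients, back-propagate this loss into delta, and project delta onto a small epsilon-ball.

```python
import torch
import torch.nn.functional as F

def gradient_matching_loss(model, x_poison, y_poison, x_target, y_adv):
    """Gradient-matching poisoning objective (sketch).

    Align the training gradient of the poisoned batch (which keeps its clean
    labels `y_poison`) with the gradient that would push the model to label
    `x_target` as the attacker-chosen class `y_adv`.
    Returns 1 - cosine_similarity, so minimizing it aligns the two gradients.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # fixed "desired" gradient: what a training step on (x_target, y_adv) would do
    target_loss = F.cross_entropy(model(x_target), y_adv)
    target_grad = torch.autograd.grad(target_loss, params)

    # gradient actually produced by the poisoned batch (differentiable w.r.t. pixels)
    poison_loss = F.cross_entropy(model(x_poison), y_poison)
    poison_grad = torch.autograd.grad(poison_loss, params, create_graph=True)

    dot = sum((tg * pg).sum() for tg, pg in zip(target_grad, poison_grad))
    t_norm = torch.sqrt(sum(tg.pow(2).sum() for tg in target_grad))
    p_norm = torch.sqrt(sum(pg.pow(2).sum() for pg in poison_grad))
    return 1 - dot / (t_norm * p_norm + 1e-12)
```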

Witches' Brew: Experimental Results
- Geiping, Jonas, et al. "Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching." ICLR 2021.

Generative Attack
- Generative adversarial network (GAN): train the generator once, then produce poisons without limit.
- (Figure: standard GAN structure, /machine-learning/gan/gan_structure)

Autoencoder-based Generative Attack
- Yang, Chaofei, et al. "Generative poisoning attack method against neural networks." arXiv:1703.01340 (2017).
- (Figure: MNIST poisons crafted three ways: starting from a normal '5' with the direct gradient method; starting from random noise with the direct gradient method; starting from a normal '5' with the generative method.)

pGAN
- Yang, Chaofei, et al. "Generative poisoning attack method against neural networks." arXiv:1703.01340 (2017).
- Involves three models: D (discriminator), G (generator), and C (classifier).
- The adversarial loss is the same as in a standard GAN; the classification loss combines the loss on the original data and on the generated (poison) data.

pGAN
- (Figure: green = normal (target) class, clean samples; blue = normal class, clean samples; red = poisoned class, poison samples.)
- pGAN can generate poison samples that lie genuinely close to the target class.

Differentiated Attack: Which Samples Are Most Effective to Poison?
- Koh et al. "Stronger data poisoning attacks break data sanitization defenses." Machine Learning 111.1 (2022): 1-47.
- Define a measure of each training sample's influence, then poison the samples with the largest influence.

Outline: A Brief History of Data Poisoning; Data Poisoning Attacks; Data Poisoning Defenses; Poisoning for Data Protection; Future Research

Data Poisoning Defense: Robust Learning with Trimmed Loss
- Shen, Yanyao, and Sujay Sanghavi. "Learning with bad training data via iterative trimmed loss minimization." ICML 2019.
- Samples with low loss are treated as good; samples with high loss as bad.
- Train the model as much as possible on the low-loss samples.
- Problematic samples covered: noisy labels, systematic noise, bad data from generative models, backdoor samples.

Data Poisoning Defense: Robust Learning with Trimmed Loss
- Trimmed loss minimization is itself a min-min problem:
  - inner minimization: select the subset S of samples with the lowest loss;
  - outer minimization: train the model on the subset S.
- Shen, Yanyao, and Sujay Sanghavi. "Learning with bad training data via iterative trimmed loss minimization." ICML 2019.
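A minimal PyTorch sketch of one such alternation: score every sample under the current model, keep the lowest-loss fraction, and update only on that subset. It assumes the whole training set fits in memory as tensors x_train, y_train; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def trimmed_loss_epoch(model, optimizer, x_train, y_train,
                       keep_ratio=0.8, batch_size=128):
    """One round of iterative trimmed loss minimization (sketch):
    rank all samples by their loss under the current model, keep the
    `keep_ratio` fraction with the LOWEST loss, and update only on them."""
    model.eval()
    with torch.no_grad():
        losses = F.cross_entropy(model(x_train), y_train, reduction='none')
    n_keep = int(keep_ratio * len(x_train))
    keep_idx = torch.argsort(losses)[:n_keep]      # low-loss = trusted subset S

    model.train()
    perm = keep_idx[torch.randperm(n_keep)]
    for start in range(0, n_keep, batch_size):
        idx = perm[start:start + batch_size]
        optimizer.zero_grad()
        F.cross_entropy(model(x_train[idx]), y_train[idx]).backward()
        optimizer.step()
    return keep_idx
```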

Deep Partition Aggregation (DPA)
- Levine, Alexander, and Soheil Feizi. "Deep partition aggregation: Provable defense against general poisoning attacks." ICLR 2021.
- Divide and conquer; suitable when only a small number of samples are poisoned.
- Partition the training set into k even subsets, train one base classifier on each subset, and decide by majority vote.
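A sketch of the partition-train-vote pipeline. `make_model` and `train_fn` are assumed caller-supplied helpers, and partitioning here is simply by sample index so that each training point, and hence each poison, falls into exactly one partition.

```python
import torch

def dpa_train(make_model, train_fn, x_train, y_train, k=50):
    """Deep Partition Aggregation, training stage (sketch): split the data
    into k disjoint partitions and train one base classifier per partition,
    so a single poisoned sample can corrupt at most one base classifier."""
    part_id = torch.arange(len(x_train)) % k       # deterministic partitioning
    models = []
    for i in range(k):
        mask = part_id == i
        m = make_model()
        train_fn(m, x_train[mask], y_train[mask])
        models.append(m)
    return models

def dpa_predict(models, x, num_classes=10):
    """Aggregation stage: majority vote over the base classifiers; the vote
    margin indicates how many poisoned samples the prediction can tolerate."""
    votes = torch.zeros(len(x), num_classes)
    with torch.no_grad():
        for m in models:
            preds = m(x).argmax(dim=1)
            votes[torch.arange(len(x)), preds] += 1
    return votes.argmax(dim=1)
```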

Anti-Backdoor Learning (ABL)
- Li, Yige, et al. "Anti-backdoor learning: Training clean models on poisoned data." NeurIPS 2021.
- Observation: samples that are learned fast are not good samples.
- (Figure: training loss on clean samples (blue) vs. poisoned examples (yellow).)
- Across the 10 poisoning-based backdoor attacks studied, the poisoned samples are essentially fitted in the earliest stage of training: their loss drops much faster than that of clean samples.

Anti-Backdoor Learning (ABL)
- Strategy: first isolate the suspicious samples, then unlearn them.
- Problem formulation and overview of ABL; LGA: local gradient ascent, GGA: global gradient ascent.
- Li, Yige, et al. "Anti-backdoor learning: Training clean models on poisoned data." NeurIPS 2021.
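A much-simplified sketch of the two stages in PyTorch: isolate the lowest-loss samples as suspected poisons, then run gradient ascent on them so the model forgets the backdoor correlation. The real ABL uses the loss-guided LGA/GGA objectives mentioned above during early training; everything here, including function names and thresholds, is illustrative.

```python
import torch
import torch.nn.functional as F

def isolate_suspects(model, x_train, y_train, isolation_ratio=0.01):
    """ABL stage 1 (sketch): early in training, the samples with the LOWEST
    loss are flagged as likely backdoor/poisoned samples."""
    model.eval()
    with torch.no_grad():
        losses = F.cross_entropy(model(x_train), y_train, reduction='none')
    n_iso = max(1, int(isolation_ratio * len(x_train)))
    return torch.argsort(losses)[:n_iso]           # indices of suspected poisons

def unlearn_suspects(model, optimizer, x_train, y_train, suspect_idx,
                     steps=100, batch_size=64):
    """ABL stage 2 (sketch): gradient ASCENT on the isolated samples, so the
    model unlearns the trigger-to-label correlation."""
    model.train()
    for _ in range(steps):
        pick = suspect_idx[torch.randint(len(suspect_idx), (batch_size,))]
        optimizer.zero_grad()
        loss = -F.cross_entropy(model(x_train[pick]), y_train[pick])  # maximize
        loss.backward()
        optimizer.step()
```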

Outline: A Brief History of Data Poisoning; Data Poisoning Attacks; Data Poisoning Defenses; Poisoning for Data Protection; Future Research

Unlearnable Examples
- https://exposing.ai/megaface/
- The Internet is awash with personal data.

Personal Data Are Used For Training Commercial Models
- Datasets collected from the Internet: without the owners' awareness [1]; used to train commercial models [2]; raising privacy concerns [3].
- [1] Prabhu & Abeba. "Large image datasets: A pyrrhic win for computer vision?" arXiv:2006.16923, 2020.
- [2] Hill, Kashmir. "The Secretive Company That Might End Privacy as We Know It." NYTimes, 2020.
- [3] Shan, Shawn, et al. "Fawkes: Protecting personal privacy against unauthorized deep learning models." USENIX Security Symposium, 2020.

Unlearnable Examples
- Goal: make data unlearnable (unusable) for machine learning.
- Approach: modify the training images so that they become useless for training.
- Huang, Hanxun, et al. "Unlearnable examples: Making personal data unexploitable." ICLR 2021.

Adversarial Noise = Error-Maximizing Noise
- Adversarial noises are small and imperceptible to human eyes (Szegedy et al. 2013; Goodfellow et al. 2014).
- Adversarial examples fool DNNs at test time by maximizing errors; adversarial noise can mislead ML models.
- What if there is NO error left to learn?
- Huang, Hanxun, et al. "Unlearnable examples: Making personal data unexploitable." ICLR 2021.

Generating Error-Minimizing Noise
- To influence how the model trains, this must again be formulated as a bi-level optimization problem.
- Huang, Hanxun, et al. "Unlearnable examples: Making personal data unexploitable." ICLR 2021.
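The inner noise step can be sketched as "reverse PGD": instead of maximizing the training loss (adversarial noise), repeatedly descend on it with respect to the perturbation so the poisoned image appears already learned. In the full method this alternates with ordinary training steps on the model (the min-min loop); names and hyper-parameters below are illustrative.

```python
import torch
import torch.nn.functional as F

def error_minimizing_noise(model, x, y, delta, pgd_steps=10,
                           alpha=2 / 255, eps=8 / 255):
    """Inner noise step of unlearnable examples (sketch): projected gradient
    DESCENT on the training loss w.r.t. the perturbation, so the poisoned
    images look 'already learned' and provide no error signal."""
    model.eval()
    delta = delta.clone().detach()
    for _ in range(pgd_steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta = (delta - alpha * grad.sign()).clamp(-eps, eps)  # descend
    return delta.detach()
```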

Sample-wise Noise
- Each sample has its own noise (image + per-sample noise = unlearnable example).
- Huang, Hanxun, et al. "Unlearnable examples: Making personal data unexploitable." ICLR 2021.

Class-wise Noise
- All samples of a class share one noise pattern (image + per-class noise = unlearnable example).
- A pattern to explain? Why can a single noise image remove the error signal for an entire class of data?
- Huang, Hanxun, et al. "Unlearnable examples: Making personal data unexploitable." ICLR 2021.

Experiments
- Comparing the effect of different noises on training: error-minimizing noise can create unlearnable examples in both the sample-wise and the class-wise settings.
- Huang, Hanxun, et al. "Unlearnable examples: Making personal data unexploitable." ICLR 2021.

Experiments
- Is the noise transferable to other models? Yes.
- Is the noise transferable to other datasets? Yes.
- Is the noise robust to data augmentation? Yes.

Experiments
- What percentage of the data needs to be unlearnable? Unfortunately, 100% of the training data needs to be poisoned.

Experiments
- How about protecting only part of the data, or just one class? The unlearnable examples will simply not contribute to model training.

Protecting Face Images
- No more facial recognition? Only if everyone posts unlearnable images.

Conclusion & Limitations
- A new and exciting research problem: unlearnable examples via error-minimizing noise.
- Limitations: representation learning; adversarial training (the latter has since been addressed by an ICLR 2022 work, Robust Unlearnable Examples).

Related works:
- Cherepanova et al. "LowKey: Leveraging Adversarial Attacks to Protect Social Media Users from Facial Recognition." ICLR, 2021.
- Fowl et al. "Adversarial Examples Make Strong Poisons." NeurIPS, 2021.
- Fowl et al. "Preventing unauthorized use of proprietary data: Poisoning for secure dataset release." arXiv:2103.02683.
- Radiya-Dixit and Tramèr. "Data Poisoning Won't Save You From Facial Recognition." arXiv:2106.14851.
- Shan et al. "Fawkes: Protecting privacy against unauthorized deep learning models." USENIX Security, 2021.

Unlearnable Clusters: Protection Against Unknown Labelling

Existing Approaches
- Error-minimizing noise drives the training loss down to 0, so the model believes "there is nothing left to learn" [1].
- Adversarial Poisoning exploits the concept of non-robust features [2] to make the model learn wrong non-robust features [3].
- [1] Unlearnable examples: Making personal data unexploitable. ICLR 2021.
- [2] Adversarial Examples Are Not Bugs, They Are Features. NeurIPS 2019.
- [3] Adversarial examples make strong poisons. NeurIPS 2021.

Universal Adversarial Perturbation (UAP)
- A Universal Adversarial Perturbation (UAP) is a class-wise perturbation that fools the model once applied to any image [1]. It can both "overwrite" the original semantic features of an image and work "independently" of the image content [2].
- [1] Universal adversarial perturbations. CVPR 2017.
- [2] ImageNet Pre-training Also Transfers Non-robustness. AAAI 2023.

Unlearnable Clusters (UCs)
- Break uniformity and discrepancy without relying on any label information (no classification layer needed).
- K-means initialization; disrupting discrepancy and uniformity.

Methodology and Experiments
- Dataset: 37-class Pets.

Transferable Unlearnable Examples

Linear Separability
- Availability Attacks Create Shortcuts. Yu, Da, et al. KDD 2022.
- Class-wise perturbations (EMN): t-SNE visualization in input space, with each (32 x 32 x 3) image reshaped to a 3072-dimensional vector.

Linear Separability: Quantifying Linear Separability
- Availability Attacks Create Shortcuts. Yu, Da, et al. KDD 2022.
- Input x: the perturbation; label y: the label of the image the perturbation was added to.
- Models: a linear model and a two-layer neural network.
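A small scikit-learn sketch of this probe: fit a linear classifier on the flattened perturbations alone, labelled with the classes of the images they were added to; a near-perfect test accuracy means the noise itself forms a linearly separable shortcut. The function name and the choice of logistic regression are illustrative (the slide also mentions probing with a two-layer network).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def perturbation_separability(perturbations, labels, seed=0):
    """Quantify linear separability (sketch): train a linear classifier on the
    flattened perturbations ONLY, using the labels of the images they were
    added to, and report held-out accuracy."""
    x = perturbations.reshape(len(perturbations), -1)   # e.g. 32*32*3 -> 3072
    x_tr, x_te, y_tr, y_te = train_test_split(
        x, labels, test_size=0.2, random_state=seed, stratify=labels)
    clf = LogisticRegression(max_iter=2000)
    clf.fit(x_tr, y_tr)
    return clf.score(x_te, y_te)
```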

Linearly Separable Perturbations
- Availability Attacks Create Shortcuts. Yu, Da, et al. KDD 2022.
- Goal: to show that simple linear separability is already enough for poisoning.
- Synthetic noise (SN): synthetically generated, linearly separable perturbations. (Figure: example images, 'horse', 'horse', 'cat'.)
