Data and Model Security, Week 8: Data Extraction and Model Stealing (lecture slides)

Data Extraction and Model Stealing
姜育剛,馬興軍,吳祖煊

Recap: Week 7
- A Brief History of Backdoor Learning
- Backdoor Attacks
- Backdoor Defenses
- Future Research

This Week
- Data Extraction Attack & Defense
- Model Stealing Attack
- Future Research

Data Extraction Attack
Recovering training data by inverting the model (demo: 8001/dss/imageClassify)

Terminology
The following terms describe the same thing:
- Data Extraction Attack
- Data Stealing Attack
- Training Data Extraction Attack
- Model Memorization Attack
- Model Inversion Attack

Security Threats
"My social security number is 078-"
- Personal information leakage
- Sensitive information leakage
- Threats to national security
- Illegal data trading
- ...

Memorization of DNNs
Evidence 1: DNNs learn different levels of representations.

Memorization of DNNs
Evidence 2: DNNs can memorize random labels and random pixels. Settings compared: true labels, random labels, shuffled pixels, random pixels, Gaussian noise.
Zhang, Chiyuan, et al. "Understanding deep learning requires rethinking generalization." ICLR 2017.
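The random-label result above can be reproduced in miniature: a small over-parameterized network fits purely random labels on purely random inputs, which is memorization with nothing to generalize from. The data, architecture, and training schedule below are my own toy choices, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h, c = 32, 16, 128, 2           # samples, input dim, hidden units, classes
X = rng.normal(size=(n, d))           # random "images": no structure at all
y = rng.integers(0, c, size=n)        # random labels: nothing to generalize from

W1 = rng.normal(scale=0.5, size=(d, h)); b1 = np.zeros(h)
W2 = rng.normal(scale=0.5, size=(h, c)); b2 = np.zeros(c)

def forward(X):
    H = np.maximum(X @ W1 + b1, 0.0)                  # ReLU features
    Z = H @ W2 + b2
    P = np.exp(Z - Z.max(axis=1, keepdims=True))      # stable softmax
    return H, P / P.sum(axis=1, keepdims=True)

lr = 0.05
for _ in range(3000):                                 # plain full-batch gradient descent
    H, P = forward(X)
    G = P.copy(); G[np.arange(n), y] -= 1.0; G /= n   # dLoss/dZ for cross-entropy
    dW2, db2 = H.T @ G, G.sum(axis=0)
    dH = G @ W2.T; dH[H <= 0.0] = 0.0                 # back through the ReLU
    dW1, db1 = X.T @ dH, dH.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

train_acc = float((forward(X)[1].argmax(axis=1) == y).mean())
print(f"train accuracy on random labels: {train_acc:.2f}")
```

With enough capacity the training accuracy approaches 1.0 despite the labels carrying no signal, which is the paper's point in miniature.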

Memorization of DNNs
Evidence 3: the success of GANs and diffusion models.

Intended vs. Unintended Memorization
Intended memorization: task-related statistics, inputs and labels. Compare first-layer filters on normal CIFAR-10 vs. first-layer filters on randomly labeled CIFAR-10 (figure).
Unintended memorization: a natural-language translation model memorizing "my social security number is xxxx".
Arpit et al. "A closer look at memorization in deep networks." ICML, 2017.
Carlini et al. "The secret sharer: Evaluating and testing unintended memorization in neural networks." USENIX Security, 2019.

Unintended memorization is task-irrelevant yet still memorized, even for content that appears only a few times; four occurrences can be enough for full memorization.

Existing Data Stealing Attacks
Black-box stealing. Active testing: a canary in the coal mine. Inject canaries such as "the random number is ****" or "my social security number is ****" into the training data, then measure each canary's "exposure" in the trained language model. This tests and quantifies unintended memorization via canaries.
Carlini et al. "The secret sharer: Evaluating and testing unintended memorization in neural networks." USENIX Security, 2019.

Black-box stealing against general-purpose language models: large numbers of names, phone numbers, email addresses, social security numbers, etc. can be recovered; larger models memorize such information more easily; content can be memorized even if it appears in only a single document.
Carlini, Nicholas, et al. "Extracting training data from large language models." USENIX Security, 2021.
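The exposure metric used to score injected canaries can be sketched as follows; the candidate losses are toy stand-ins for the log-perplexities a real language model would assign to every possible secret:

```python
import math

def exposure(canary_loss, candidate_losses):
    # Exposure = log2(#candidates) - log2(rank of the canary among them),
    # where rank 1 means the canary has the lowest loss (most memorized).
    rank = 1 + sum(1 for l in candidate_losses if l < canary_loss)
    return math.log2(len(candidate_losses) + 1) - math.log2(rank)

# Toy stand-in losses for every alternative secret (e.g. every other number
# that could fill "my social security number is ____"); lower = more likely.
candidates = [5.0 + 0.001 * i for i in range(9999)]

memorized = exposure(1.0, candidates)       # canary beats every alternative
not_memorized = exposure(20.0, candidates)  # canary ranks dead last
print(f"exposure if memorized: {memorized:.2f}, if not: {not_memorized:.2f}")
```

A memorized canary ranks first among all 10,000 candidates and gets the maximum exposure of log2(10000); an unmemorized one ranks last and scores 0.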

Training Data Extraction Attack
Carlini, Nicholas, et al. "Extracting training data from large language models." USENIX Security, 2021.
Definition of memorization (extraction of model knowledge): k-eidetic memorization, i.e. a string that is extractable from the model and appears in at most k training documents.
Attack steps. Step 1: generate a large amount of text. Step 2: filter and verify the candidates.
Experimental results: 604 "unintentionally" memorized items; some memorized content appears in only a single document; the larger the model, the stronger the memorization.
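Step 2 (filtering candidates) can be sketched with one of the paper's membership signals, the perplexity-to-zlib-entropy ratio; the log-perplexities below are invented stand-ins for values queried from the target model, and the sample strings are hypothetical:

```python
import zlib

def zlib_bits(text: str) -> int:
    # Compressed length in bits: a cheap proxy for the string's raw entropy.
    return 8 * len(zlib.compress(text.encode("utf-8")))

def suspicion(text: str, log_ppl: float) -> float:
    # Rank candidates by (log-perplexity / zlib entropy). Lower = more
    # suspicious: the model is confident about a string that is not
    # trivially compressible, which points toward memorization.
    return log_ppl / zlib_bits(text)

secret = "John Doe, SSN 078-01-2345, phone 555-0147"   # hypothetical record
boring = "a" * 42                                       # low-entropy filler

# Invented log-perplexities; a real attack queries the target LM for them.
scores = {
    "memorized secret": suspicion(secret, log_ppl=0.4),
    "unseen text": suspicion(secret, log_ppl=6.0),
    "repetitive junk": suspicion(boring, log_ppl=0.4),
}
for name, s in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{s:.5f}  {name}")
```

The zlib normalization keeps repetitive junk (which any LM predicts confidently) from crowding out genuinely memorized high-entropy strings.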

Memorization of Diffusion Models
A joint study by the University of Maryland and New York University found that generative diffusion models memorize their original training data and, under specific text prompts, leak it. (Figure: generated images vs. the originals.)

Memorization of Diffusion Models
Definition of replication: "We say that a generated image has replicated content if it contains an object (either in the foreground or background) that appears identically in a training image, neglecting minor variations in appearance that could result from data augmentation."
Somepalli, Gowthami, et al. "Diffusion art or digital forgery? Investigating data replication in diffusion models." CVPR, 2023.

Create synthetic and real datasets: original, Segmix, diagonal outpainting, patch outpainting. Existing image retrieval datasets: Oxford, Paris, INSTRE, GPR1200.

Train image retrieval models.

Similarity metric: inner product and token-wise inner product.
Setup: diffusion model: DDPM; dataset: Celeb-A. Shown: the top-2 matches of diffusion models trained on 300, 3000, and 30000 images (the full set is 30000).
Results: green: copy; blue: close but no exact copy; others: similar but not the same.
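The token-wise inner-product similarity mentioned above can be sketched as follows; the chunking and the mean reduction are illustrative assumptions of mine, not the paper's exact recipe:

```python
import numpy as np

def tokenwise_similarity(a, b, n_tokens=4):
    # Split each feature vector into chunks ("tokens"), L2-normalize every
    # chunk, take chunk-wise inner products, and average them. Localized
    # matches are diluted less than in one global inner product.
    A = a.reshape(n_tokens, -1); B = b.reshape(n_tokens, -1)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return float((A * B).sum(axis=1).mean())

rng = np.random.default_rng(1)
train_feat = rng.normal(size=64)                      # feature of a training image
near_copy = train_feat + 0.05 * rng.normal(size=64)   # feature of a replicated generation
unrelated = rng.normal(size=64)                       # feature of an unrelated generation

s_copy = tokenwise_similarity(train_feat, near_copy)
s_other = tokenwise_similarity(train_feat, unrelated)
print(f"near-copy: {s_copy:.3f}  unrelated: {s_other:.3f}")
```

A near-copy scores close to 1 while an unrelated generation stays near 0, which is what the copy/no-copy thresholding above relies on.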

Memorization of Diffusion Models
Gen-train vs. train-train similarity score distributions: the less training data, the more copying.

Many close copies but no exact matches (similarity score < 0.65).
Case study: ImageNet LDM. Most similar classes: theater curtain, peacock, and bananas; least similar: sea lion, bee, and swing.

Case study: Stable Diffusion. LAION Aesthetics v2 6+: 12M images. Randomly select 9000 images as sources and use their captions as prompts.
Some keywords (those in red) are associated with certain fixed patterns.
Style copying using the text prompt: <Name of the painting> by <name of the artist>.

Memorization of Large Language Models (LLMs)
Shi, Weijia, et al. "Detecting Pretraining Data from Large Language Models." arXiv preprint arXiv:2310.16789 (2023).
Pretraining data detection: MIN-K% PROB.
Detection on WIKIMIA, a dynamic benchmark.
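The MIN-K% PROB detector can be sketched directly; the per-token log-probabilities and the threshold below are toy stand-ins (in practice the scores come from the target LLM and the threshold is calibrated on a benchmark such as WIKIMIA):

```python
import numpy as np

def min_k_prob(token_logprobs, k=0.2):
    # Average the log-probabilities of the k fraction of tokens the model
    # found least likely. Pretraining members tend to contain no strongly
    # surprising tokens, so they score higher than non-members.
    lp = np.sort(np.asarray(token_logprobs))   # ascending: most surprising first
    n = max(1, int(len(lp) * k))
    return float(lp[:n].mean())

# Invented per-token log-probs; a real detector reads them off the target LLM.
member = [-0.5, -0.8, -1.0, -0.7, -1.2, -0.9, -0.6, -1.1, -0.8, -1.0]
non_member = [-0.5, -0.8, -6.0, -0.7, -9.5, -0.9, -0.6, -7.2, -0.8, -1.0]

threshold = -3.0   # in practice calibrated on held-out data
print(min_k_prob(member), min_k_prob(non_member))
```

Only the tail of least-likely tokens matters: the non-member's few very surprising tokens drag its score far below the threshold, while the member's tail stays mild.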

White-box Stealing
White-box stealing exploits gradient information and is therefore also called a gradient inversion attack. It targets training schemes that share gradients: distributed training, federated learning, parallel training, and decentralized training (two distributed-training paradigms).
Zhang et al. "A Survey on Gradient Inversion: Attacks, Defenses and Future Directions." IJCAI 2022.
Two attack families: iterative inversion (approximate and back-solve) and recursive inversion (layer by layer).

White-box Stealing: Iterative Inversion
Iterative inversion constructs dummy data whose gradients approach the true gradients, which are assumed known. Each optimization step costs one forward pass and two backward passes: one to obtain the dummy data's gradient, and one to update the dummy data through the gradient-matching loss.
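The gradient-matching objective behind iterative inversion can be demonstrated on a toy single-neuron model (my own setup, not from the survey): the attacker observes the shared gradients and searches for dummy data that reproduces them. For this tiny model the minimizer even has a closed form, which attacks on deep networks (e.g. DLG) must instead approach by optimization:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
w, b = rng.normal(size=d), 0.3            # shared model: one linear neuron, MSE loss
x_true, y_true = rng.normal(size=d), 1.5  # the private training example

def grads(x, y):
    # Gradients of L = (w.x + b - y)^2 that a client would share.
    r = w @ x + b - y
    return 2.0 * r * x, 2.0 * r           # (dL/dw, dL/db)

g_w, g_b = grads(x_true, y_true)          # what the attacker observes

def match_loss(x, y):
    # The objective an iterative attack minimizes over dummy (x, y).
    gw, gb = grads(x, y)
    return float(np.sum((gw - g_w) ** 2) + (gb - g_b) ** 2)

# For this model the minimizer is closed-form: dL/dw = (dL/db) * x, hence
x_rec = g_w / g_b
y_rec = w @ x_rec + b - g_b / 2.0
print("match loss at recovery:", match_loss(x_rec, y_rec))
print("max reconstruction error:", float(np.abs(x_rec - x_true).max()))
```

The recovered input drives the gradient-matching loss to zero and equals the private example exactly, illustrating why shared gradients are sensitive.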

White-box Stealing: Iterative Inversion (summary of existing work)

White-box Stealing: Recursive Inversion
Recursive inversion derives the input layer by layer from the true gradients, which are assumed known. Key limiting factors: image size (32x32), batch size (mostly 1), and model size.

White-box Stealing: Recursive Inversion (summary of existing work)

White-box Defenses (summary of existing work)
Zhang et al. "A Survey on Gradient Inversion: Attacks, Defenses and Future Directions." IJCAI 2022.

This Week
- Data Extraction Attack & Defense
- Model Stealing Attack
- Future Research

Training AI Models Is Expensive
BERT cost Google about $1.6 million. Training large-scale, high-performance AI models consumes enormous data, compute, and human resources.

Motivation for Model Stealing
A valuable AI model carries huge commercial value; the thief wants to preserve the model's performance as much as possible, avoid being detected, and put the stolen model to their own use.

Ways of Model Stealing
- Input-output queries
- Model fine-tuning
- Model pruning
- Stealing attacks

References on stealing attacks: Stealing machine learning models via prediction APIs, USENIX Security, 2016; Practical black-box attacks against machine learning, ASIACCS, 2017; Knockoff Nets: stealing functionality of black-box models, CVPR, 2019; MAZE: data-free model stealing attack using zeroth-order gradient estimation, CVPR, 2021.

Equation-Solving Attacks
Attack idea and an example. Number of queries and time needed to steal certain commercial models with 100% accuracy.

Equation-Solving Attacks: Stealing Parameters
Attack algorithm: for a model with d parameters, query d+1 inputs to construct d+1 equations in the unknowns (e.g., for logistic regression, each query yields one equation of the form f(x) = sigmoid(w.x + b)), then solve for the parameters.
Key characteristics:
- Targets traditional machine-learning models: SVM, LR, decision trees
- Solves for the parameters exactly, but requires the model to return exact confidence scores
- The stolen model may additionally leak training data (a data inversion attack)
Tramèr, Florian, et al. "Stealing machine learning models via prediction APIs." USENIX Security, 2016.
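The parameter-stealing recipe can be sketched for logistic regression, where inverting the sigmoid turns each confidence score into one linear equation; the victim model here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w_true, b_true = rng.normal(size=d), 0.7   # the victim's secret parameters

def api(x):
    # Black-box logistic-regression API returning an exact confidence score.
    return 1.0 / (1.0 + np.exp(-(w_true @ x + b_true)))

# d+1 queries give d+1 linear equations: w.x + b = logit(confidence).
X = rng.normal(size=(d + 1, d))
logits = np.array([np.log(p / (1.0 - p)) for p in (api(x) for x in X)])
A = np.hstack([X, np.ones((d + 1, 1))])    # unknown vector is [w, b]
sol = np.linalg.solve(A, logits)
w_rec, b_rec = sol[:d], sol[d]
print("max parameter error:",
      float(max(np.abs(w_rec - w_true).max(), abs(b_rec - b_true))))
```

With exact confidence scores the system is solved to machine precision, which is why truncating or rounding the API's outputs is a natural defense.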

Equation-Solving Attacks: Stealing Hyperparameters
Wang, Binghui, and Neil Zhenqiang Gong. "Stealing hyperparameters in machine learning." S&P, 2018.
Attack idea: once training has finished, the gradient of the loss should be zero. Setting the gradient of the regularized objective to zero at the learned parameters yields linear equations in the hyperparameter, which can be solved by least squares.
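This zero-gradient condition can be demonstrated on ridge regression (my own toy instance): an attacker who knows the training data and the learned parameters recovers the regularization weight by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
lam_true = 0.8

# Victim trains ridge regression: the closed-form minimizer of
# ||X @ t - y||^2 + lam * ||t||^2.
theta = np.linalg.solve(X.T @ X + lam_true * np.eye(d), X.T @ y)

# Attacker: at the optimum the full gradient vanishes,
# grad_fit + lam * grad_reg = 0, so lam follows by linear least squares.
g_fit = 2.0 * X.T @ (X @ theta - y)   # gradient of the data-fit term
g_reg = 2.0 * theta                   # gradient of the L2 regularizer
lam_rec = -(g_reg @ g_fit) / (g_reg @ g_reg)
print("recovered lambda:", float(lam_rec))
```

The least-squares estimate recovers the victim's regularization hyperparameter essentially exactly, since the stationarity condition holds by construction.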

Substitute-Model Attacks
Orekondy et al. "Knockoff Nets: stealing functionality of black-box models." CVPR, 2019.
Attack idea: while querying the target model, train a substitute model that mimics its behavior.
Knockoff Nets ("knockoff" = imitation) attack pipeline:
- Sample a large set of query inputs
- Train the substitute model on the target's outputs
- Use reinforcement learning to learn how to select query samples efficiently
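The pipeline above can be sketched with a linear victim and substitute; these are toy stand-ins for the paper's image models, with random Gaussian queries in place of a public image pool and the reinforcement-learned sampler:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
w_victim = rng.normal(size=d)   # the black-box model's secret weights

def victim_api(X):
    # The victim only exposes class-1 probabilities for submitted inputs.
    return 1.0 / (1.0 + np.exp(-X @ w_victim))

# Step 1: sample a transfer set of queries.
X_q = rng.normal(size=(500, d))
# Step 2: collect the victim's soft outputs for those queries.
soft = victim_api(X_q)
# Step 3: train the substitute to reproduce the soft outputs (distillation).
w_sub = np.zeros(d)
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-X_q @ w_sub))
    w_sub -= 0.1 * X_q.T @ (p - soft) / len(X_q)   # cross-entropy gradient step

X_test = rng.normal(size=(1000, d))
p_sub = 1.0 / (1.0 + np.exp(-X_test @ w_sub))
agree = float(np.mean((victim_api(X_test) > 0.5) == (p_sub > 0.5)))
print(f"victim/substitute label agreement: {agree:.3f}")
```

Trained only on query responses, the substitute reproduces the victim's decisions on fresh inputs, which is the "functionality stealing" being measured.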

Substitute-Model Attacks
Jagielski, Matthew, et al. "High accuracy and high fidelity extraction of neural networks." USENIX Security, 2020.
High-accuracy vs. high-fidelity extraction attacks. Blue: the target's decision boundary; orange: high-accuracy extraction; green: high-fidelity extraction.
Pipeline: query images go to the target model (black box); its outputs (probability vectors or class labels only) serve as labels to supervise the substitute model's training.

Functionally Equivalent Extraction
Attack steps: find critical points at which some neuron's ReLU input equals 0; probe the boundary on both sides of each critical point to determine the corresponding weights. Limitation: only two-layer networks can be stolen.
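The critical-point search in the attack steps above can be sketched in one input dimension: a ReLU network is piecewise affine, so a kink inside an interval is detectable from three queries by a finite-difference linearity check. The network below is my own toy construction:

```python
import numpy as np

# Toy two-layer ReLU network the attacker can only query.
w = np.array([1.0, -2.0, 0.5])
b = np.array([0.3, 1.0, -0.4])
a = np.array([1.0, 1.5, -2.0])

def blackbox(x):
    return float(a @ np.maximum(w * x + b, 0.0))

def find_kink(lo, hi, eps=1e-9):
    # f is piecewise affine, so a half-interval failing a three-point
    # linearity check must contain a critical point; bisect toward it.
    def curv(l, h):
        m = 0.5 * (l + h)
        return abs(blackbox(l) + blackbox(h) - 2.0 * blackbox(m))
    while hi - lo > eps:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if curv(lo, mid) > 1e-12 else (mid, hi)
    return 0.5 * (lo + hi)

kink = find_kink(0.0, 1.0)   # true kinks sit at x = -b_i / w_i
print("found critical point:", kink)
```

Each recovered critical point pins down one neuron's hyperplane (here x = -b_i / w_i); the full attacks then probe around these points to solve for the weights.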

Substitute-Model Attacks
Carlini et al. "Cryptanalytic extraction of neural network models." Annual International Cryptology Conference, 2020.
Cryptanalytic extraction. Idea: the second derivative of ReLU is 0 everywhere except at its critical point (ReLU input = 0), which can be located with finite differences.
Stealing a 0-deep neural network; stealing a 1-deep neural network.

Substitute-Model Attacks
Yuan, Xiaoyong, et a
