




New Challenges in Reinforcement Learning: A Survey of Security and Privacy

arXiv [cs.LG] 31 Dec 2022
Springer Nature 2021 LaTeX template

Zhu¹* and Wanlei Zhou²

1* School of Computer Science, University of Technology Sydney, Broadway, Sydney, 2007, NSW, Australia.
2* School of Data Science, City University of Macau, Macau, China.
*Corresponding author(s). E-mail(s): Tianqing.Zhu@.au;
Contributing authors: Yunjiao.Lei@.au; Yuleisuiutseduau; wlzhoucityuedu.mo;

Abstract

Reinforcement learning (RL) is one of the most important branches of AI. Due to its capacity for self-adaptation and decision-making in dynamic environments, reinforcement learning has been widely applied in multiple areas, such as healthcare, data markets, autonomous driving, and robotics. However, some of these applications and systems have been shown to be vulnerable to security or privacy attacks, resulting in unreliable or unstable services. A large number of studies have focused on these security and privacy problems in reinforcement learning. However, few surveys have provided a systematic review and comparison of existing problems and state-of-the-art solutions to keep up with the pace of emerging threats. Accordingly, we herein present such a comprehensive review to explain and summarize the challenges associated with security and privacy in reinforcement learning from a new perspective, namely that of the Markov Decision Process (MDP). In this survey, we first introduce the key concepts related to this area. Next, we cover the security and privacy issues linked to the state, action, environment, and reward function of the MDP, respectively. We further highlight the special characteristics of security and privacy methodologies related to reinforcement learning. Finally, we discuss possible future research directions within this area.

Keywords: Reinforcement Learning, Security, Privacy Preservation, Markov Decision Process, Multi-agent System

1 Introduction

Reinforcement learning (RL) is one of the most important branches of AI. Due to its strong capacity for self-adaptation, reinforcement learning has been widely applied in multiple areas, including healthcare [1], financial markets [2], mobile edge computing (MEC) [3, 4] and robotics [5]. Reinforcement learning is considered to be a form of adaptive (or approximate) dynamic programming [6] and has achieved outstanding performance in solving complex sequential decision-making problems. This strong performance has led to its implementation and deployment across a broad range of fields in recent years, such as the Internet of Things (IoT) [7], recommender systems [8], healthcare [9], robotics [10], finance [11], self-driving cars [12], and smart grids [13]. Unlike other machine learning techniques, reinforcement learning has a strong ability to learn by trial and error in dynamic and complex environments. In particular, it can learn from an environment that provides minimal information about the parameters to be learned [14], and it can serve as a method for addressing optimization problems [15, 16].

In the reinforcement learning context, an agent can be viewed as a self-contained, concurrently executing thread of control [17]. It interacts with the environment and obtains a state of the environment as input. The state of the environment can be the situation surrounding the agent's location. Take the road conditions in an autonomous driving scenario as an example. In Figure 1, the green vehicle is an agent, and all the objects around it can be regarded as the environment; thus, the environment comprises the road, the traffic signs, other cars, etc. Based on the state of the environment, the agent chooses an action as output. Next, the action changes the state of the environment, and the agent receives from the environment a scalar signal that can be regarded as an indicator of the value of the state transition. This scalar signal is usually represented as a reward. The agent's purpose is to learn an optimal policy over time by trial and error in order to gain a maximal accumulated reward as reinforcement. In addition, the combination of deep learning and reinforcement learning further enhances the ability of reinforcement learning [18].

Fig. 1 An autonomous driving scenario. The green car is an agent. The environment comprises the road, the traffic signs, other cars, etc.

1.1 Reinforcement learning security and privacy issues

However, reinforcement learning is vulnerable to security attacks, and it is easy for attackers to exploit breachable data sources [19]. For example, data poisoning attacks [20] and adversarial perturbations [21] are popular existing attacks. Defenses have been proposed over the past few years to address these security concerns. Some researchers have focused on protecting the model from attacks and ensuring that the model still performs well while under attack. The aim is to make sure that the model takes actions that are known to be safe, or to obtain an optimal policy under worst-case situations, for example by using adversarial training [22].

Figure 2 presents an example of security attacks in reinforcement learning in an autonomous driving scenario. An autonomous car is driving on the road and observing its environment through sensors. To stay safe while driving autonomously, it continually adjusts its behavior based on the road conditions. In this case, an attacker may focus on influencing the autonomous driving conditions. For example, at a particular time, the optimal action for the car is to go straight; however, an action attack may directly influence the agent to turn right (the attack may also affect the value of the reward). With regard to environmental attacks, the attacker may fabricate or falsely insert a car at the front right of the environment, and this disturbance may mislead the autonomous car into taking a wrong action. As for reward attacks, adversaries may try to change the value of the reward (e.g., from +1 to -1) and thereby affect the policy of the autonomous car.

Fig. 2 A simple example of a security attack in reinforcement learning in the context of autonomous driving. An action attack, an environmental attack and a reward attack are shown respectively. An action attack works by influencing the choice of action directly, such as by tempting the agent to take the action "turn right" rather than the optimal action "go straight". Environmental attacks attempt to change the agent's perception of the environment so as to mislead it into taking an incorrect action. Finally, the reward attack works by changing the value of a reward given for a specific action in a state.

Moreover, reinforcement learning has also been subject to privacy attacks, owing to weaknesses that attackers can exploit. The samples used in reinforcement learning contain the learning agent's private information, which is vulnerable to a wide variety of attacks. For example, in disease treatment applications of reinforcement learning [1], real-time health data is required, and to achieve an accurate dosage of medicine, the information is typically collected and transmitted in plaintext. This may cause disclosure of users' private information. In addition, the reinforcement learning system may collect data from public resources, and most collected datasets contain private or sensitive information that has a high probability of being disclosed [23]. Moreover, reinforcement learning may also require data sharing [24] and needs to transmit information during the sharing process. Thus, attacks on network links can also be successful in a reinforcement learning context. Furthermore, cloud computing, which is often used for reinforcement learning computation and storage, has inherent vulnerabilities to certain attacks [25]. Rather than changing or affecting the model, attackers may choose to focus on obtaining or inferring private data; for example, Pan et al. [26] inferred information about the surrounding environment based on the transition matrix.

The main approaches to defending privacy and security in the reinforcement learning context include encryption technology [27] and information-hiding techniques, such as differential privacy [28]. In addition, some artificial intelligence algorithms, such as federated learning (FL), can preserve privacy through the learning mechanism and structure. Yu et al. [30] incorporated federated learning into a deep reinforcement learning model in a distributed manner, with the goal of protecting data privacy for edge devices.

1.2 Outline and Survey Overview

As an increasing number of security and privacy issues in reinforcement learning emerge, it is meaningful to analyze and compare existing studies to help spark ideas about how security and privacy might be improved in this specific field in the future. Over recent years, several surveys on the security and privacy of reinforcement learning have been completed:

(1) Chen et al. [31] reviewed the research related to reinforcement learning from the perspective of artificial intelligence security, covering adversarial attacks and defence. The authors analysed the characteristics of adversarial attacks and defence strategies respectively.
(2) Luong et al. [32] presented a literature review on applications of deep reinforcement learning in communications and networking, such as the Internet of Things (IoT). The authors discussed deep reinforcement learning approaches proposed for issues in communications and networking, which include dynamic network access, data rate control, wireless caching, data offloading, network security, and connectivity preservation.
(3) Another survey paper [14] conducted a literature review on securing IoT devices using reinforcement learning. This paper presented different types of cyber-attacks against different IoT systems and discussed security solutions based on reinforcement learning against these attacks.
(4) Wu et al. [33] surveyed the security and privacy risks of the key components of a blockchain from the perspective of machine learning, which helps to better understand these methods in the context of the Industrial Internet of Things (IIoT). Chen et al. [34] also explored deep reinforcement learning in the context of IoT.

Our work differs from the above works. Most of the works mentioned above focus on the IoT or communication networks; that is, they concern the application of reinforcement learning. Very few existing surveys have comprehensively presented the security and privacy issues in reinforcement learning itself rather than in its applications. Some of them concentrate on attack and/or defense methods, but they analyse only the overall influence of these methods. Accordingly, in this paper, we highlight the objects that the attacks aim at and provide a comprehensive review of the key methods used to attack and defend these objects.

The main contributions of our survey can be summarized as follows:

● The survey organizes the relevant existing studies from a novel angle that is based on the components of the Markov decision process (MDP). We classify current research on attacks and defences by the MDP component that they target. This provides a new perspective that enables focusing on the target of the methods across the entire learning process.
● The survey provides a clear account of the impact caused by the targeted objects. These objects are components of the MDP that are related to each other and may coexist in time and/or space. Adopting this approach enables us to follow the MDP to comprehend the relevant objects and the relationships between them.
● The survey compares the main methods of attacking or defending the components of the MDP, and thereby sheds some light on the advantages and disadvantages of these methods.

The remainder of this paper is structured as follows. We first present preliminary concepts in reinforcement learning systems in Section 2. We then outline the security and privacy challenges in reinforcement learning in Section 3. Next, we present further details on security in reinforcement learning in Section 4, followed by an overview of privacy in reinforcement learning in Section 5. We further discuss security and privacy in reinforcement learning applications in Section 6. Finally, Sections 7 and 8 present our discussion of future work and our conclusion, respectively.

2 Preliminary

2.1 Notation

Table 1 lists the notations used in this article. RL is reinforcement learning, and DRL is deep reinforcement learning. MDP stands for the Markov Decision Process, which is widely used in reinforcement learning. An MDP can be denoted by a tuple $(S, A, T, r, \gamma)$, which is made up of the agent action space $A$, the environment state space $S$, the reward function $r$, the transition matrix $T$, and a discount factor $\gamma \in [0, 1)$. The transition matrix is a probability mapping from state-action pairs to states, $T: (S \times A) \times S \to [0, 1]$. The agent's purpose is to find an optimal policy that maps environment states to agent actions so as to maximize long-term reward. $V^{\pi}(s)$ and $Q^{\pi}(s, a)$ are the state and action-state values, which can be regarded as a means of evaluating the policy.

Table 1 The main notations used throughout the paper.

Notation          Meaning
RL                Reinforcement learning
DRL               Deep reinforcement learning
MDP               Markov decision process
A                 The action space of the agent
S                 The state space of the environment
T                 The transition matrix
r                 The reward function
γ                 A discount factor within the range [0, 1)
π                 Policy
V^π(s)            State value
Q^π(s, a)         Action-state value

2.2 Reinforcement learning

The reinforcement learning model contains the environment states $S$, the agent actions $A$, and scalar reinforcement signals that can be regarded as rewards $r$. All of these elements and the environment can be conceptualized as a whole system. At step $t$, when an agent interacts with the environment, it receives a state of the environment $s_t$ as input. Based on $s_t$, the agent chooses an action $a_t$ using the policy $\pi$ as output. Next, the action changes the state of the environment to $s_{t+1}$. At the same time, the agent obtains a reward $r_t$ from the environment. This reward is a scalar signal that can be regarded as an indicator of the value of the state transition. In this process, the agent learns a piece of knowledge, which may be recorded as $(s_t, a_t, r_t, s_{t+1})$ in a Q table. The Q table stores the maximum expected future reward, which is used to choose the best action at each state. In the next step, the updated $s_{t+1}$ and $r_{t+1}$ are sent to the agent again. The agent's purpose is to learn an optimal policy $\pi$ so as to gain the highest possible accumulated reward $r$. To arrive at the optimal policy $\pi$, the agent can train by applying a trial-and-error approach over long-term episodes.

A Markov Decision Process (MDP) with delayed rewards is used to handle reinforcement learning problems, which makes the MDP a key formalism in reinforcement learning.

Fig. 3 The interaction between agent and environment with MDP. The agent interacts with the environment to gain knowledge, which may be recorded as a table or a neural network model (in DRL), and then takes an action that will react to the environment state.

If the environment model is given, two simple iterative algorithms can be chosen to arrive at an optimal policy in the MDP context: namely, value iteration [35] and policy iteration [36]. When the information about the model is not known in advance, the agent needs to learn from the environment to obtain this data using an appropriate algorithm, which is usually a kind of statistical algorithm. Adaptive Heuristic Critic and TD(λ), which is a policy iteration mechanism, were used in the early stages of reinforcement learning to learn an optimal policy with samples from the real world [37]. Subsequently, the Q-learning algorithm increased in popularity [38, 39] and is now also a very important algorithm in reinforcement learning. The Q-learning algorithm is also an iterative approach; it selects the action with the maximum Q value, which is an evaluation value, in order to ensure that the chosen policy is optimal. Moreover, due to its ability to deal with high-dimensional data and to approximate functions, deep learning has been combined with reinforcement learning to create the field of "deep reinforcement learning" (DRL) [40]. This combination has led to significant achievements in several fields, such as learning from visual perception [18] and robotics [41].

An example of reinforcement learning is presented in Figure 4. The figure depicts a robot searching for an object in the Grid World environment. The red circle represents the target object, the grey boxes denote the obstacles, and the white boxes denote the road. The robot's purpose is to find a route to the red circle. At each step, the robot has four choices of action: walking up, down, left and right. In the beginning, the agent receives information from the environment, which may be obtained through sensors such as radar or a camera. The agent then chooses an action and receives a corresponding reward. In the position shown in the figure, choosing the action up, left or right may result in a lower reward, as there are obstacles in these three directions. However, taking the action of moving down will result in a higher reward, as it will bring the agent closer to its goal.

Fig. 4 A simple example of reinforcement learning, in which a robot tries to find an object in the Grid World environment. The blue robot can be seen as the agent in reinforcement learning. The red circle is the target object. The grey boxes denote the obstacles, while the white boxes denote the road. The robot's purpose is to find a route to the red circle.

2.3 Markov Decision Process (MDP)

The Markov decision process (MDP) is a framework used to model decisions in an environment [42]. From the perspective of reinforcement learning, the MDP is an approach with a delayed reward. In an MDP, state transitions do not depend on earlier environment states or agent actions; that is to say, given the current environment state and action, the next state is independent of the previous states. An MDP can be denoted as the tuple $(S, A, T, r, \gamma)$, which is made up of the agent action space $A$, the environment state space $S$, the reward function $r$, the transition matrix $T$, and a discount factor $\gamma \in [0, 1)$. The transition matrix can be defined as a probability mapping from state-action pairs to states, $T: (S \times A) \times S \to [0, 1]$. The agent's purpose is to find an optimal policy $\pi$ that maps environment states to agent actions in a way that maximizes its long-term reward. The discount factor $\gamma$ is applied to the accumulated reward to discount future rewards. In many cases, the goal of a reinforcement learning algorithm with an MDP is to maximize the expected discounted cumulative reward.

At time step $t$, we denote the environment state, agent action, and reward by $s_t$, $a_t$ and $r_t$ respectively. Moreover, we use $V^{\pi}(s)$ and $Q^{\pi}(s, a)$ to evaluate the state and action-state value. The state value function can be expressed as follows:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s, \pi\right] \tag{1}$$

The action-state value function is as follows:

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s, a_t = a, \pi\right] \tag{2}$$

where $\gamma$ is the discount factor and $r_{t+k+1}$ is the reward at step $t+k+1$.

In a wide variety of works, Q-learning was the most popular iteration method applied to discounted infinite-horizon MDPs.

2.4 Deep reinforcement learning

In some cases, reinforcement learning finds it difficult to deal with high-dimensional data, such as visual information. Deep learning enables reinforcement learning to address these problems. Deep learning is a type of machine learning that can use low-dimensional features to represent high-dimensional data through the application of a multi-layer Artificial Neural Network (ANN). Consequently, it can work with high-dimensional data in fields such as image and natural language processing. Deep reinforcement learning (DRL) combines reinforcement learning with deep neural networks, thereby enabling reinforcement learning to learn from high-dimensional situations. Hence, DRL can learn directly from raw, high-dimensional data, and can accordingly acquire the ability to understand the visual world. DRL also has a powerful function approximation capacity: it employs deep neural networks to train approximate functions in reinforcement learning, for example to approximate the action-state value $Q^{\pi}(s, a)$ and the policy $\pi$.

The process of DRL is nearly the same as that of reinforcement learning. The agent's purpose is also to obtain an optimal policy that maps environment states to agent actions in a way that maximizes long-term reward. The main difference between the DRL and reinforcement learning processes lies in the Q table. As shown in Figure 3, in reinforcement learning, this may be a table that records the mapping from states to actions; by contrast, in deep reinforcement learning, a neural network is typically used to represent the Q table.

3 Security and privacy challenges in reinforcement learning

In this section, we briefly discuss some representative attacks that cause security and privacy issues in reinforcement learning. In more detail, we explore different types of security attacks (specifically, adversarial and poisoning attacks) and privacy attacks (specifically, genetic algorithm (GA) and inverse reinforcement learning (IRL)). Moreover, some representative defence methods are also discussed (specifically, differential privacy, cryptography, and adversarial learning). We further present the taxonomy based on the components of the MDP in this section, along with the relationships and impacts among these components in reinforcement learning.

3.1 Attack methodology

3.1.1 Security attacks

In this part, we discuss security attacks designed to influence or even destroy the model in the reinforcement learning context. Specifically, we briefly introduce some recently proposed attack methods developed for this purpose.

One of the popular meanings of the term "security attack" is an adversarial attack with adversarial examples [43, 44]. The common form of adversarial examples involves adding imperceptible perturbations to data with a pre-defined goal; these perturbations can deceive the system into making mistakes that cause malfunctions, or prevent it from making optimal decisions. Because reinforcement learning gathers examples dynamically throughout the training process, attackers can directly add imperceptible perturbations to states, environment information, and rewards, all of which may influence the agent during reinforcement learning training. For example, consider the addition of tiny perturbations to states in order to produce $s + \delta$ [40, 45] ($\delta$ is the added perturbation). Even this small change may affect the subsequent reinforcement learning process. Attackers determine where and when to add perturbations, and what perturbations to add, in order to maximize the effectiveness of their attack. Many algorithms that add adversarial perturbations have been proposed. Examples include the fast gradient sign method (FGSM), which can calculate adversarial examples; the strategically-timed attack, which focuses on selecting the time step of adversarial attacks; and the enchanting attack (EA), which can mislead the agent regarding the expected state through a series of crafted adversarial examples. Moreover, defenses against adversarial examples have also been studied. The most representative method is adversarial training [46], which trains agents under adversarial examples and thereby improves model robustness. Other defensive methods focus on modifying the objective function, such as by adding terms to the function or adopting a dynamic activation function.

Another common type of security attack is the poisoning attack, which focuses on manipulating the performance of a model by inserting maliciously crafted "poison data" into the training examples. A poisoning attack is often selected when an attacker has no ability to modify the training data itself; instead, the attacker adds examples to the training set, and those examples can also work at test time. Attacks based on a poisoned training
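As a minimal sketch of the state perturbation s + δ discussed in Section 3.1.1, the following Python example applies a single FGSM-style sign step to an observed state. The linear Q-function, the weight matrix, the function names and the value of epsilon are all illustrative assumptions and are not taken from the surveyed works. For a linear Q-function, the gradient of the margin Q(s, a_best) - Q(s, a_target) with respect to s is simply the difference between the two corresponding weight rows, so no automatic differentiation is needed here.

```python
# Hypothetical linear Q-function: Q(s, a) = weights[a] . s
# (an assumption for illustration; real DRL agents use neural networks).

def q_values(weights, state):
    """Return Q(s, a) for every action a under the linear model."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def greedy_action(weights, state):
    """Return the index of the action with the largest Q value."""
    q = q_values(weights, state)
    return max(range(len(q)), key=lambda a: q[a])


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def fgsm_state_perturbation(weights, state, target_action, epsilon):
    """Return s + delta, where delta is one FGSM-style sign step.

    The step decreases the margin Q(s, a_best) - Q(s, a_target), and each
    coordinate of delta is bounded in absolute value by epsilon.
    """
    best = greedy_action(weights, state)
    # Gradient of the margin with respect to the state (linear model).
    grad = [wb - wt for wb, wt in zip(weights[best], weights[target_action])]
    return [x - epsilon * sign(g) for x, g in zip(state, grad)]


# Toy setting: two actions, two state features (all values are made up).
weights = [[1.0, 0.0],
           [0.0, 1.0]]
clean_state = [0.6, 0.5]
adv_state = fgsm_state_perturbation(weights, clean_state,
                                    target_action=1, epsilon=0.1)

clean_action = greedy_action(weights, clean_state)
adv_action = greedy_action(weights, adv_state)
max_change = max(abs(a - c) for a, c in zip(adv_state, clean_state))
```

In this toy setting, a perturbation bounded by 0.1 per state coordinate is enough to change the greedy action, which illustrates why even small changes to the observed state can affect the subsequent learning process.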