
2024 Artificial Intelligence (AI) Technology Tutorial

Lecture | Title | Notes

1 | Course introduction | Overview and system/AI basics
2 | Overview of AI systems (System perspective of System for AI) | System for AI: a historic view; fundamentals of neural networks; fundamentals of System for AI
3 | Fundamentals of deep neural network computation frameworks (Computation frameworks for DNN) | Backprop and AD, tensor, DAG, execution graph. Papers and systems: PyTorch, TensorFlow
4 | Matrix computation and computer architecture (Computer architecture for matrix computation) | Matrix computation, CPU/SIMD, GPGPU, ASIC/TPU. Papers and systems: BLAS, TPU
5 | Distributed training algorithms | Data parallelism, model parallelism, distributed SGD. Papers and systems:
6 | Distributed training systems | MPI, parameter servers, all-reduce, RDMA. Papers and systems: Horovod
7 | Scheduling and resource management systems for heterogeneous computing clusters (Scheduling and resource management system) | Running DNN jobs on a cluster: containers, resource allocation, scheduling. Papers and systems: KubeFlow, OpenPAI, Gandiva, HiveD
8 | Deep learning inference systems (Inference systems) | Efficiency, latency, throughput, and deployment
9 | Computation graph compilation and optimization | IR, sub-graph pattern matching, matrix multiplication and memory optimization. Papers and systems: XLA, MLIR, TVM, NNFusion
10 | Model compression and sparsification (Efficiency via compression and sparsity) | Model compression, sparsity, pruning
11 | Automated machine learning systems (AutoML systems) | Hyperparameter tuning, NAS. Papers and systems: Hyperband, SMAC, ENAS, AutoKeras, NNI
12 | Reinforcement learning systems | Theory of RL, systems for RL. Papers and systems: A3C, RLlib, AlphaZero
13 | Model security and privacy protection (Security and privacy) | Federated learning, security, privacy. Papers and systems: DeepFake
14 | Optimizing computer systems with AI techniques (AI for systems) | AI for traditional systems problems and for system algorithms. Papers and systems: Learned Indexes, Learned query path
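All-reduce recurs throughout the distributed training lectures. As a minimal single-process sketch (no MPI, NCCL, or Horovod calls; the function `ring_allreduce` is hypothetical), the ring variant can be simulated as a reduce-scatter phase followed by an all-gather phase over n workers:

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate ring all-reduce (sum) over n workers in one process.

    buffers[r] is worker r's local gradient vector; all vectors have the
    same length and are split into n chunks. Returns a new list in which
    every worker holds the element-wise sum of all inputs.
    """
    n = len(buffers)
    length = len(buffers[0])
    data = [list(b) for b in buffers]  # copy so the inputs are not mutated
    # Chunk c covers indices [bounds[c], bounds[c + 1]).
    bounds = [c * length // n for c in range(n + 1)]

    def chunk(c: int) -> range:
        return range(bounds[c % n], bounds[c % n + 1])

    # Phase 1, reduce-scatter: in step s, worker r sends chunk (r - s) to
    # worker r + 1, which adds it into its own copy. After n - 1 steps,
    # worker r holds the fully reduced chunk (r + 1) mod n.
    for s in range(n - 1):
        # Snapshot what every worker sends before anyone updates.
        sends = [[data[r][i] for i in chunk(r - s)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            for k, i in enumerate(chunk(r - s)):
                data[dst][i] += sends[r][k]

    # Phase 2, all-gather: in step s, worker r forwards its completed chunk
    # (r + 1 - s) to worker r + 1, which overwrites its stale copy.
    for s in range(n - 1):
        sends = [[data[r][i] for i in chunk(r + 1 - s)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            for k, i in enumerate(chunk(r + 1 - s)):
                data[dst][i] = sends[r][k]
    return data


workers = [[1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0], [100.0, 200.0, 300.0, 400.0, 500.0]]
print(ring_allreduce(workers))
```

Each worker sends 2(n - 1) chunks of roughly L/n elements, so per-worker traffic stays near 2L regardless of n; that bandwidth property is the usual reason for choosing the ring algorithm.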

Lab | Title | Notes

Lab 1 (weeks 1, 2) | Introductory examples of frameworks and tools | A simple end-to-end AI example from a system perspective; understand the system from debugger info and system logs
Lab 2 (week 3) | Customize a new tensor operation (Customize operators) | Design and implement a customized operator (both forward and backward) in Python
Lab 3 (week 4) | CUDA implementation and optimization | Add a CUDA implementation for the customized operator
Lab 4 (weeks 5, 6) | AllReduce implementation and optimization | Improve the implementation of one of Horovod's AllReduce operators
Lab 5 (weeks 7, 8) | Configure containers to prepare for cloud training or inference (Configure containers for customized training and inference) | Configure containers
Lab 6 | Learn to use a scheduling and management system (Scheduling and resource management system) | Get familiar with OpenPAI or KubeFlow
Lab 7 | Distributed training exercise (Distributed training) | Try different kinds of all-reduce implementations
Lab 8 | AutoML system exercise (AutoML) | Search for a new neural network (NN) structure for image/NLP tasks
Lab 9 | Reinforcement learning system exercise (RL systems) | Configure and get familiar with one of the following RL systems: RLlib, …
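Lab 2 calls for an operator with both a forward and a backward pass. A framework-agnostic sketch, assuming a hypothetical element-wise operator y = x * sigmoid(x) (SiLU) rather than any operator the lab prescribes: forward saves sigmoid(x) as context, backward applies the chain rule, and a finite-difference check confirms the gradient. In PyTorch the same two methods would typically live in a `torch.autograd.Function` subclass.

```python
import math
from typing import List, Tuple


class SiLUOp:
    """Hypothetical custom operator: y = x * sigmoid(x), element-wise."""

    @staticmethod
    def forward(x: List[float]) -> Tuple[List[float], List[float]]:
        # Save sigmoid(x) as context so backward does not recompute it.
        sig = [1.0 / (1.0 + math.exp(-v)) for v in x]
        y = [v * s for v, s in zip(x, sig)]
        return y, sig

    @staticmethod
    def backward(x: List[float], sig: List[float], grad_y: List[float]) -> List[float]:
        # dy/dx = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x)); the chain
        # rule multiplies it by the upstream gradient grad_y.
        return [g * (s + v * s * (1.0 - s)) for v, s, g in zip(x, sig, grad_y)]


def numeric_grad(x: List[float], eps: float = 1e-6) -> List[float]:
    """Central finite differences of sum(forward(x)) with respect to each x_i."""
    grads = []
    for i in range(len(x)):
        plus = list(x)
        plus[i] += eps
        minus = list(x)
        minus[i] -= eps
        f_plus = sum(SiLUOp.forward(plus)[0])
        f_minus = sum(SiLUOp.forward(minus)[0])
        grads.append((f_plus - f_minus) / (2 * eps))
    return grads


if __name__ == "__main__":
    x = [-2.0, -0.5, 0.0, 0.5, 2.0]
    y, sig = SiLUOp.forward(x)
    analytic = SiLUOp.backward(x, sig, [1.0] * len(x))
    # Largest disagreement between analytic and numeric gradients (should be tiny).
    print(max(abs(a - b) for a, b in zip(analytic, numeric_grad(x))))
```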

Deep learning is changing the world: self-driving, surveillance detection, translation, medical diagnostics, games, personal assistants, and art, built on image recognition, speech recognition, natural language, generative models, and reinforcement learning.

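[Figure: one training step of an image classifier. Input images (cat, dog, honey badger) pass through a network with weights w1 to w5 that scores the classes Cat, Dog, and Raccoon; the loss (error) against the label Dog yields the gradients ∂error/∂w1 through ∂error/∂w5, which are used to update the weights.]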
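The training step described above (score the classes, measure the error, compute ∂error/∂w, update w) can be sketched in a few lines. This assumes a hypothetical 2-feature input, a linear softmax classifier over the three classes, cross-entropy as the error, and plain SGD with learning rate 0.5; none of these choices come from the slide.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
CLASSES = ["Cat", "Dog", "Raccoon"]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grad(W: Matrix, x: List[float], label: int) -> Tuple[float, Matrix]:
    """Cross-entropy error of a linear softmax classifier and d(error)/dW."""
    logits = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    p = softmax(logits)
    loss = -math.log(p[label])
    # For softmax + cross-entropy, d(error)/d(logit_k) = p_k - [k == label].
    dlogits = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    # Chain rule through logit_k = sum_j W[k][j] * x[j].
    grad = [[d * xj for xj in x] for d in dlogits]
    return loss, grad


def sgd_step(W: Matrix, grad: Matrix, lr: float) -> Matrix:
    """Move every weight against its gradient."""
    return [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W, grad)]


if __name__ == "__main__":
    W = [[0.0, 0.0] for _ in CLASSES]  # hypothetical 3x2 weights, zero-initialized
    x = [1.0, 2.0]                     # hypothetical features of a dog image
    label = CLASSES.index("Dog")
    for step in range(3):
        loss, grad = loss_and_grad(W, x, label)
        print(step, round(loss, 4))  # the error shrinks each step
        W = sgd_step(W, grad, lr=0.5)
```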

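Three drivers behind deep learning: massive (labeled) data, e.g., 14M images; advances in deep learning algorithms, languages, and frameworks; and computing power, e.g., RDMA.

Deep learning advances together with systems: programming languages, optimization, computer architecture, parallel computing, and distributed systems.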

E.g., the image classification problem:

Dataset | Samples | Categories
MNIST | 60K | 10
ImageNet | 16M | 1,000
Web images | Billions | Open-ended

[Figure: test error rate (%) over time. Model milestones: LeNet (1998; convolution, max-pooling, softmax); AlexNet (2012; ReLU, dropout), 16.4%; Inception (2015; batch normalization), 6.7%; ResNet (2015; residual connections), 3.57%; EfficientNet (2019; found via NAS), 3.1%.]

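[Figure: computing power behind image recognition, speech recognition, natural language, and reinforcement learning, shown as performance (ops/sec) from 1960 to 2019. Under Moore's law, CPUs go from ENIAC (5K ops) to the Xeon E5 (~500 Gops), a 10^8x gain; GPUs and dedicated hardware go further: TPU v1 at 90 Tops, V100 at 125 Tops, TPU v3 at 360 Tops. The chart also marks a 10^5x factor.]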

[Figure: evolution of deep learning frameworks. Custom-purpose machine learning algorithms gave way to Theano, DistBelief, and Caffe, then MXNet, TensorFlow, CNTK, and PyTorch, with language frontends (Swift for TensorFlow) and compiler backends (TVM, TensorFlow XLA) added on top. Frameworks run on algebra and linear algebra libraries over CPUs, GPUs, and dense matmul engines, targeting GPUs, FPGAs, and special AI accelerators (TPU, Graphcore, other ASICs).]

Deep learning frameworks provide easier ways to leverage various libraries.

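Machine learning language and compiler:

A powerful compiler infrastructure: code optimization, sparsity optimization, hardware targeting.

A full-featured programming language for ML, expressive and flexible: control flow, recursion, sparsity.

[Figure: the language and compiler layer sits between the AI framework and the hardware (algebra and linear algebra libraries, CPU, GPU, dense matmul engine), which must support SIMD and MIMD execution, sparsity, control flow and dynamicity, and associated memory.]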

AI system stack:

Experience: end-to-end AI user experiences; model, algorithm, pipeline, experiment, tool, and life-cycle management.

Frameworks: programming interfaces (computation graph, automatic gradient calculation); IR and compiler infrastructure.

Runtime: deep learning runtime (optimizer, planner, executor).

Architecture (single node and cloud): hardware APIs (GPU, CPU, FPGA, ASIC); resource management/scheduler; scalable network stack (RDMA, IB, NVLink).

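How the classes map onto the AI system stack:

Broader AI systems ecosystem: new machine learning paradigms (RL, class 12); automated machine learning (AutoML, class 11); security and privacy (class 13); model inference, compression, and optimization (class 10).

Deep learning algorithms and frameworks: efficient new general-purpose AI algorithms for broad use; support for and evolution of multiple deep learning frameworks; DNN compilation architecture and optimization.

Core system software and hardware: runtime and optimization environment for deep learning jobs; general-purpose resource management and scheduling system; new hardware and related high-performance network and compute stack.

Classes 3 through 8 label the framework and core-system boxes.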

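(2) Start training

Define the network structure:

Fully connected: usually used for the last few layers of a classification model.

Convolutional neural network: usually used for data with strong locality, such as images and speech.

Recurrent neural network: usually used for sequential and structured data, such as text and knowledge graphs.

Transformer neural network: usually used for sequence data, such as text.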
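To make the contrast between the first two layer types concrete, here is a minimal pure-Python sketch (no framework; the function names are hypothetical). A fully connected layer gives every output its own weight on every input, while a 1D convolution slides one small shared kernel over neighbouring inputs, which is why it suits data with strong locality.

```python
from typing import List


def fully_connected(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """y_i = sum_j W[i][j] * x[j] + b[i]: every output sees every input."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """'Valid' 1D convolution (cross-correlation, as in most DL frameworks):
    each output depends only on len(kernel) neighbouring inputs, and the
    same kernel weights are reused at every position."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
# Fully connected 5 -> 2 needs 5 * 2 weights + 2 biases = 12 parameters.
print(fully_connected(x, [[1.0] * 5, [0.0, 0.0, 0.0, 0.0, 1.0]], [0.0, 1.0]))  # [15.0, 6.0]
# A width-3 convolution needs only 3 shared weights, whatever the input length.
print(conv1d(x, [1.0, 0.0, -1.0]))  # [-2.0, -2.0, -2.0]
```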

# A recursive TreeBank model in a dozen lines of JPL code
# Walk the tree, accumulating embedding vecs
# A word embedding model is used at the leaf node to map the word
# index into a high-dimensional semantic word representation.
# Get semantic representations for the left and right children.
# A composition function is used to learn the semantic
# representation of the phrase at the internal node.
# Map the tree embedding to sentiment
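The comments above outline a recursive sentiment model over parse trees, but the JPL code itself is not reproduced. A rough Python equivalent that follows the same steps, assuming a toy embedding table, a tanh composition function, and a linear sentiment readout (all weights hypothetical and untrained):

```python
import math
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical 2-dimensional word-embedding table: word index -> vector.
EMBEDDINGS = {0: [0.5, -0.1], 1: [0.2, 0.4], 2: [-0.3, 0.8]}
# Hypothetical composition weights: concat(left, right) (4 values) -> 2 values.
W_COMP = [[0.5, 0.1, -0.2, 0.3], [0.0, 0.4, 0.6, -0.1]]
# Hypothetical sentiment readout: 2 values -> one score.
W_SENT = [1.0, -1.0]


@dataclass
class Tree:
    word: Optional[int] = None      # set on leaf nodes only
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def embed(node: Tree) -> List[float]:
    """Walk the tree recursively, returning the node's semantic vector."""
    if node.word is not None:
        # Leaf node: map the word index to its embedding.
        return EMBEDDINGS[node.word]
    # Internal node: get the left and right children's representations...
    children = embed(node.left) + embed(node.right)  # list concatenation
    # ...and compose them into a phrase representation.
    return [math.tanh(sum(w * c for w, c in zip(row, children))) for row in W_COMP]


def sentiment(root: Tree) -> float:
    """Map the whole-tree embedding to a sentiment probability."""
    score = sum(w * v for w, v in zip(W_SENT, embed(root)))
    return 1.0 / (1.0 + math.exp(-score))


# "w0 (w1 w2)": a hypothetical three-word parse tree.
tree = Tree(left=Tree(word=0), right=Tree(left=Tree(word=1), right=Tree(word=2)))
print(sentiment(tree))
```

Because the computation follows the shape of each input tree, the graph differs from sample to sample, which makes batching and static graph optimization harder.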

More diverse structures

More powerful modeling capability

More complex dependencies

Finer-grained computation patterns

[Figure: deep learning framework architecture. Front end: language bindings for Python, Lua, R, C++. Graph definition (IR): y = x * w + b. Optimization: batching, cache, overlap. Execution runtime: CPU, GPU, RDMA devices.]
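The split between a front end that defines a graph and a runtime that executes it can be shown with a toy version of y = x * w + b. This assumes a hypothetical `Graph` class, not any real framework API: the front end only records nodes, and execution happens later in dependency order.

```python
from typing import Callable, Dict, List, Tuple

OPS: Dict[str, Callable[[float, float], float]] = {
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
}


class Graph:
    """Hypothetical minimal IR: a list of (output, op, inputs) nodes."""

    def __init__(self) -> None:
        self.nodes: List[Tuple[str, str, Tuple[str, str]]] = []

    def op(self, out: str, kind: str, a: str, b: str) -> str:
        # Front end: record the operation; nothing is computed yet.
        self.nodes.append((out, kind, (a, b)))
        return out

    def run(self, feeds: Dict[str, float]) -> Dict[str, float]:
        # Runtime: nodes were appended in dependency order, so one pass over
        # the list is a valid topological execution.
        values = dict(feeds)
        for out, kind, (a, b) in self.nodes:
            values[out] = OPS[kind](values[a], values[b])
        return values


g = Graph()
xw = g.op("xw", "mul", "x", "w")
y = g.op("y", "add", xw, "b")
print(g.run({"x": 3.0, "w": 2.0, "b": 1.0})[y])  # 7.0
```

Because the whole graph is known before execution, a runtime can apply optimizations such as batching, caching, or overlapping transfers with compute before running it.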

[Figure: TensorFlow data-flow graph (DFG) as the intermediate representation. A * node combines x and y into a, a + node combines a and z into b, and a Σ node reduces b to c.]

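[Figure: adding gradient backpropagation to the data-flow graph (DFG). Each forward operator (*, +, Σ) gets a matching gradient operator, and the gradient nodes produce ∂b, ∂a, ∂z, ∂x, and ∂y in reverse order.]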
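The gradient nodes mirror the forward nodes in reverse order. Assuming the forward graph computes a = x * y (element-wise), b = a + z, and c = Σ b, a hand-written reverse-mode pass looks like this (function names are hypothetical):

```python
from typing import List, Tuple

Vec = List[float]


def forward(x: Vec, y: Vec, z: Vec) -> Tuple[Vec, Vec, float]:
    a = [xi * yi for xi, yi in zip(x, y)]  # * node
    b = [ai + zi for ai, zi in zip(a, z)]  # + node
    c = sum(b)                             # Σ node
    return a, b, c


def backward(x: Vec, y: Vec) -> Tuple[Vec, Vec, Vec]:
    """Gradients of c with respect to x, y, z, visiting nodes in reverse."""
    n = len(x)
    dc = 1.0
    # Gradient of Σ: every element of b contributes to c with weight 1.
    db = [dc] * n
    # Gradient of +: addition passes the gradient through to both inputs.
    da = list(db)
    dz = list(db)
    # Gradient of *: d(x*y)/dx = y and d(x*y)/dy = x (needs forward inputs).
    dx = [dai * yi for dai, yi in zip(da, y)]
    dy = [dai * xi for dai, xi in zip(da, x)]
    return dx, dy, dz


x, y, z = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
print(forward(x, y, z)[2])  # 22.0
print(backward(x, y))       # ([3.0, 4.0], [1.0, 2.0], [1.0, 1.0])
```

The gradient of the * node reads x and y from the forward pass, which is why training must keep forward activations in memory (or recompute them).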

[Figure: TensorFlow partitions the combined forward and gradient data-flow graph, whose nodes are operators, into CPU code and GPU code.]
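When a graph is split across devices, TensorFlow replaces each cross-device edge with a Send/Recv pair. A hypothetical sketch of that partitioning step, assuming every node already has a device assignment (the placement below is invented for illustration):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (producer, consumer)


def partition(edges: List[Edge], device: Dict[str, str]) -> Dict[str, List[Edge]]:
    """Split a dataflow graph by device.

    Edges whose endpoints share a device stay in that device's subgraph.
    A cross-device edge u -> v becomes u -> send_u_v on u's device and
    recv_u_v -> v on v's device; the runtime pairs each send with its recv.
    """
    subgraphs: Dict[str, List[Edge]] = defaultdict(list)
    for u, v in edges:
        du, dv = device[u], device[v]
        if du == dv:
            subgraphs[du].append((u, v))
        else:
            tag = f"{u}_{v}"
            subgraphs[du].append((u, f"send_{tag}"))
            subgraphs[dv].append((f"recv_{tag}", v))
    return dict(subgraphs)


# Forward graph a = x * y, b = a + z, c = sum(b), with a hypothetical placement.
edges = [("x", "a"), ("y", "a"), ("a", "b"), ("z", "b"), ("b", "c")]
placement = {"x": "CPU", "y": "CPU", "z": "CPU", "a": "GPU", "b": "GPU", "c": "CPU"}
for dev, sub in sorted(partition(edges, placement).items()):
    print(dev, sub)
```

This sketch creates one pair per edge; a production runtime would also share a single Send/Recv pair among all consumers of the same tensor on the same destination device.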

IDE: programming with VS Code, Jupyter Notebook.

Language: integrated with mainstream programming languages: PyTorch and TensorFlow inside Python.

Compiler:
Intermediate representation: basic data structure: Tensor; basic computation: DAG; advanced features: control flow; general IRs: MLIR.
Compilation: lexical analysis: Token; parsing: AST; semantic analysis: symbolic AD; code optimization; code generation.
Optimization: user controlled: mini-batch; data parallelism and model parallelism; loop nest analysis: pipeline parallelism, control flow; dataflow analysis: CSP, arithmetic, fusion; hardware-dependent optimizations: matrix computation, layout; resource allocation and scheduler: memory, recomputation, …

Runtimes: single node: cuDNN; multi-node: parameter servers, all-reduce; computation cluster resource management and job scheduler.

Hardware: hardware accelerators: CPU/GPU/ASIC/FPGA; network accelerators: RDMA/IB/NVLink.

Layers: Experience, Frameworks, Architecture.
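Of the optimizations listed, fusion is the simplest to illustrate. A hypothetical sketch, not the pass of any specific compiler: a chain of element-wise operators is fused into one function, so the input is traversed once instead of once per operator and no intermediate lists are materialized.

```python
from typing import Callable, List

ElemOp = Callable[[float], float]


def run_unfused(x: List[float], ops: List[ElemOp]) -> List[float]:
    # One "kernel" per op: each pass reads and writes a full intermediate list.
    for op in ops:
        x = [op(v) for v in x]
    return x


def fuse(ops: List[ElemOp]) -> ElemOp:
    # Compose the chain into a single scalar function.
    def fused(v: float) -> float:
        for op in ops:
            v = op(v)
        return v
    return fused


def run_fused(x: List[float], ops: List[ElemOp]) -> List[float]:
    f = fuse(ops)
    return [f(v) for v in x]  # a single pass, no intermediates


chain = [lambda v: v * 2.0, lambda v: v + 1.0, lambda v: max(v, 0.0)]  # scale, bias, ReLU
data = [-3.0, -0.25, 0.5, 4.0]
assert run_fused(data, chain) == run_unfused(data, chain)
print(run_fused(data, chain))  # [0.0, 0.5, 2.0, 9.0]
```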


import "tensorflow/core/framework/to";
import "tensorflow/core/framework/op_to";
import "tensorflow/core/framework/tensor_to


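// Syntactically similar to LLVM:
// (current upstream dialect names: func, cf, arith; the callee is declared so the module verifies)
func.func private @thingToCall(i32) -> i32

func.func @testFunction(%arg0: i32) -> i32 {
  %x = func.call @thingToCall(%arg0) : (i32) -> i32
  cf.br ^bb1
^bb1:
  %y = arith.addi %x, %x : i32
  return %y : i32
}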

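Deep learning depends heavily on data scale and model scale.

Image:
Model | Year | Layers | Compute | Error
AlexNet | 2012 | 8 | 1.4 GFLOP | 16%
ResNet | 2015 | 152 | 22.6 GFLOP | 3.5%

Speech:
Model | Year | Training data | Compute | Error
DeepSpeech 1 | 2014 | 7,000 hrs | 80 GFLOP | 8%
DeepSpeech 2 | 2015 | 12,000 hrs | 465 GFLOP | 5%

Faster training speeds up the development of deep learning models.

Deploying deep learning models at scale requires faster and more efficient inference (inference performance, serving latency).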

Different architectures: CNN, RNN, Transformer, …

High computation resource requirements: model size, …

Different goals: latency, throughput, accuracy, …

Be transparent to various user requirements.

Apply transparently over heterogeneous hardware environments.

Scale-out, local efficiency, memory effectiveness.

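Systems, algorithms, and hardware must be designed together:

Algorithm level: model structure, whether the model can be compressed or sparsified, batch size, learning algorithm.

System level: parallelization at every level, deduplication, overlap, scheduling and resource…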
