




Efficient Methods and Hardware for Deep Learning
Song Han
Stanford University
Deep Learning is Changing Our Lives
Self-Driving · Machine Translation · AlphaGo · Smart Robots
Models are Getting Larger

Image recognition (16× model growth):
- 2012 AlexNet: 8 layers, 1.4 GFLOP, ~16% error
- 2015 ResNet: 152 layers, 22.6 GFLOP, ~3.5% error

Speech recognition (10× growth in training ops):
- 2014 Deep Speech 1: 80 GFLOP, 7,000 hrs of data, ~8% error
- 2015 Deep Speech 2: 465 GFLOP, 12,000 hrs of data, ~5% error

Dally, NIPS'2016 workshop on Efficient Methods for Deep Neural Networks
The First Challenge: Model Size
Hard to distribute large models through over-the-air update.
The Second Challenge: Speed

Network      Error rate  Training time
ResNet-18    10.76%      2.5 days
ResNet-50    7.02%       5 days
ResNet-101   6.21%       1 week
ResNet-152   6.16%       1.5 weeks

Such long training times limit ML researchers' productivity.
Training time benchmarked with fb.resnet.torch using four M40 GPUs.
The Third Challenge: Energy Efficiency
AlphaGo: 1920 CPUs and 280 GPUs, $3000 electric bill per game.
On mobile: drains battery. In the data center: increases TCO.
The Problem of Large DNNs
Hardware engineers suffer from the large model size:
larger model => more memory references => more energy.

Energy per operation (45 nm):
32-bit int ADD        0.1 pJ
32-bit float ADD      0.9 pJ
32-bit register file  1 pJ
32-bit int MULT       3.1 pJ
32-bit float MULT     3.7 pJ
32-bit SRAM cache     5 pJ
32-bit DRAM memory    640 pJ

A DRAM access costs roughly three orders of magnitude more than an arithmetic operation.
How to make deep learning more efficient?
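To see why memory references dominate, here is a back-of-envelope estimate using the per-operation energies from the table above. The layer shape (AlexNet's FC-6) and the one-fetch-per-weight accounting are illustrative assumptions, not measurements from the talk:

```python
# Rough 45nm per-operation energies from the table above, in pJ.
ENERGY_PJ = {
    "int_add": 0.1, "float_add": 0.9, "register": 1.0,
    "int_mult": 3.1, "float_mult": 3.7, "sram": 5.0, "dram": 640.0,
}

def fc_energy_uj(rows, cols, weights_in_sram=False):
    """Energy for one dense float matrix-vector product, counting one
    multiply, one add, and one weight fetch per parameter (pJ -> uJ)."""
    n = rows * cols
    fetch = ENERGY_PJ["sram"] if weights_in_sram else ENERGY_PJ["dram"]
    per_weight = ENERGY_PJ["float_mult"] + ENERGY_PJ["float_add"] + fetch
    return n * per_weight / 1e6

dram = fc_energy_uj(4096, 9216)                          # weights in DRAM
sram = fc_energy_uj(4096, 9216, weights_in_sram=True)    # weights on chip
print(f"DRAM-resident: {dram:.0f} uJ, SRAM-resident: {sram:.0f} uJ, "
      f"ratio {dram / sram:.0f}x")
```

The arithmetic itself is a rounding error next to the DRAM traffic, which is the motivation for shrinking models until they fit on chip.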
Improve the Efficiency of Deep Learning by Algorithm-Hardware Co-Design
Application as a Black Box
Algorithm (SPEC 2006) → Hardware
Open the Box before Hardware Design
Algorithm ↔ Hardware
Breaks the boundary between algorithm and hardware.
What's in the Box: Deep Learning 101
Training dataset → Training → Model (CNN, RNN, LSTM, …) with weights/activations → Inference on test data.
Training hardware vs. inference hardware.
Proposed Paradigm
Conventional: Training → Inference (slow, power-hungry).
Proposed: Training [Han et al. ICLR'17] → Compression (pruning, quantization) [Han et al. NIPS'15, ICLR'16] → Accelerated Inference [Han et al. ISCA'16; Han et al. FPGA'17, best paper award] (fast, power-efficient).
The Goal & Trade-off
Small · Fast · Accurate · Energy Efficient
Agenda
- Model Compression (Small)
  - Pruning [NIPS'15]
  - Trained Quantization [ICLR'16]
- Hardware Acceleration (Fast, Efficient)
  - EIE Accelerator [ISCA'16]
  - ESE Accelerator [FPGA'17]
- Efficient Training (Accurate)
  - Dense-Sparse-Dense Regularization [ICLR'17]

Compression · Acceleration · Regularization
Learning both Weights and Connections for Efficient Neural Networks
Han et al. NIPS 2015
Pruning Neural Networks
[LeCun et al. NIPS'89] [Han et al. NIPS'15]
Pruning · Trained Quantization · Huffman Coding
[Han et al. NIPS'15] Pruning Neural Networks
Analogy: in -0.01x² + x + 1, the small quadratic term can be dropped with little effect.
AlexNet: 60 million → 6M connections, 10× fewer connections.
[Han et al. NIPS'15] Pruning Neural Networks
[Figure: accuracy loss (0.5% to -4.5%) vs. parameters pruned away (40%–100%)]
[Han et al. NIPS'15] Retrain to Recover Accuracy
[Figure: accuracy loss vs. parameters pruned away (40%–100%), comparing pruning alone against pruning + retraining]
[Han et al. NIPS'15] Iteratively Retrain to Recover Accuracy
[Figure: accuracy loss vs. parameters pruned away, comparing pruning, pruning + retraining, and iterative pruning and retraining; iterating prune/retrain keeps accuracy loss near zero at around 90% pruned]
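The iterative prune-and-retrain loop above can be sketched in a few lines. Everything below — the toy least-squares model, the pruning schedule, and the learning rate — is made up for illustration; the real method prunes a trained network's layers by weight magnitude and retrains with SGD:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(256, 64)), rng.normal(size=256)  # toy regression task
W = rng.normal(size=64)                                  # stand-in "layer"

def train_step(W, mask, lr=0.1):
    """One gradient step on the least-squares loss; pruned weights are
    masked back to zero so they stay pruned during retraining."""
    grad = X.T @ (X @ W - y) / len(y)
    return (W - lr * grad) * mask

mask = np.ones_like(W)
for target in (0.5, 0.7, 0.9):               # prune more each round
    thresh = np.quantile(np.abs(W), target)  # magnitude threshold
    mask = (np.abs(W) > thresh).astype(W.dtype)
    W = W * mask                             # prune
    for _ in range(100):                     # retrain surviving weights
        W = train_step(W, mask)

sparsity = 1.0 - mask.mean()                 # roughly 90% pruned away
```

The key point from the slides is the schedule: pruning 90% in one shot hurts accuracy, while alternating moderate pruning with retraining recovers it.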
[Han et al. NIPS'15] Pruning RNN and LSTM
*Karpathy et al., "Deep Visual-Semantic Alignments for Generating Image Descriptions"
[Han et al. NIPS'15] Pruning RNN and LSTM
- Original: a basketball player in a white uniform is playing with a ball
  Pruned 90%: a basketball player in a white uniform is playing with a basketball
- Original: a brown dog is running through a grassy field
  Pruned 90%: a brown dog is running through a grassy area
- Original: a man is riding a surfboard on a wave
  Pruned 90%: a man in a wetsuit is riding a wave on a beach
- Original: a soccer player in red is running in the field
  Pruned 95%: a man in a red shirt and black and white black shirt is running through a field
[Han et al. NIPS'15] Pruning Changes Weight Distribution
[Figure: weight histograms before pruning, after pruning, and after retraining]
Conv5 layer of AlexNet; representative of other network layers as well.
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Han et al., ICLR 2016 (Best Paper)
[Han et al. ICLR'16] Trained Quantization
32-bit → 4-bit: 8× less memory footprint.
Example: the weights 2.09, 2.12, 1.92, 1.87 all share the quantized value 2.0.
[Han et al. ICLR'16]
Before trained quantization: continuous weight values.
After trained quantization: discrete weight values; after further training, the shared centroids shift to recover accuracy.
[Figure: weight-value histograms before quantization, after quantization, and after retraining]
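Trained quantization can be sketched as k-means weight sharing followed by centroid fine-tuning, where each cluster's shared value is updated with the summed gradients of its members. The layer weights, the synthetic gradient, and the folded-in learning rate below are illustrative stand-ins for a real training setup:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=1024)   # stand-in for one layer's weights
bits = 4
k = 2 ** bits                     # 16 shared values -> 4-bit indices

# k-means over the weight values (a few Lloyd iterations,
# linearly initialized centroids)
centroids = np.linspace(weights.min(), weights.max(), k)
for _ in range(20):
    idx = np.argmin(np.abs(weights[:, None] - centroids[None, :]), axis=1)
    for c in range(k):
        if np.any(idx == c):
            centroids[c] = weights[idx == c].mean()

# fine-tune: group gradients by cluster index and update the shared value
grad = rng.normal(size=weights.shape) * 0.01   # stand-in for dL/dW
for c in range(k):
    centroids[c] -= grad[idx == c].sum()       # learning rate folded in

quantized = centroids[idx]   # decoded weights: only 16 distinct values
```

After this, each weight is stored as a 4-bit index into the centroid table, which is where the 8× footprint reduction comes from.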
[Han et al. ICLR'16] Bits Per Weight
[Figure: accuracy vs. number of bits per weight]
[Han et al. ICLR'16] Pruning + Trained Quantization
[Figure: accuracy vs. compression ratio for AlexNet on ImageNet, combining pruning and trained quantization]
[Han et al. ICLR'16] Huffman Coding
- Infrequent weights: use more bits to represent
- Frequent weights: use fewer bits to represent
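A minimal sketch of this stage: build a Huffman code over the stream of quantized weight indices and compare its cost against a fixed-width code. The skewed index stream and the `huffman_code_lengths` helper below are made up for illustration; only code lengths are computed, not the actual bit strings:

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Standard heap-based Huffman construction; returns symbol -> code
    length in bits (depths of the leaves in the merge tree)."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate single-symbol case
        return {next(iter(freq)): 1}
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    uid = len(heap)                         # tie-breaker so dicts never compare
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**c1, **c2}.items()}
        heapq.heappush(heap, (n1 + n2, uid, merged))
        uid += 1
    return heap[0][2]

stream = [0] * 70 + [1] * 20 + [2] * 6 + [3] * 4   # skewed index stream
lengths = huffman_code_lengths(stream)
total_bits = sum(lengths[s] for s in stream)
# a fixed 2-bit code would need 200 bits for these 100 symbols;
# Huffman needs fewer because index 0 dominates
```

The more skewed the post-quantization index distribution, the larger the win, which is why Huffman coding is applied last in the pipeline.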
[Han et al. ICLR'16] Summary of Deep Compression
[Figure: pipeline of pruning → trained quantization → Huffman coding]
[Han et al. ICLR'16] Results: Compression Ratio

Network     Original size  Compressed size  Ratio  Original acc.  Compressed acc.
LeNet-300   1070 KB        27 KB            40×    98.36%         98.42%
LeNet-5     1720 KB        44 KB            39×    99.20%         99.26%
AlexNet     240 MB         6.9 MB           35×    80.27%         80.30%
VGGNet      550 MB         11.3 MB          49×    88.68%         89.09%
GoogleNet   28 MB          2.8 MB           10×    88.90%         88.92%
ResNet-18   44.6 MB        4.0 MB           11×    89.24%         89.28%

Compressed models fit in cache!
Can we make compact models to begin with?
SqueezeNet fire module: 1×1 convolution filters → ReLU → 1×1 and 3×3 convolution filters → ReLU
Compressing SqueezeNet

Network     Approach          Size     Ratio  Top-1 acc.  Top-5 acc.
AlexNet     -                 240 MB   1×     57.2%       80.3%
AlexNet     SVD               48 MB    5×     56.0%       79.4%
AlexNet     Deep Compression  6.9 MB   35×    57.2%       80.3%
SqueezeNet  -                 4.8 MB   50×    57.5%       80.3%
SqueezeNet  Deep Compression  0.47 MB  510×   57.5%       80.3%
Results: Speedup
[Figure: layer-wise speedup of the compressed models on CPU, GPU, and mobile GPU]
Results: Energy Efficiency
[Figure: layer-wise energy efficiency of the compressed models on CPU, GPU, and mobile GPU]
Industrial Impact of Deep Compression
"At Baidu, our #1 motivation for compressing networks is to bring down the size of the binary file. As a mobile-first company, we frequently update various apps via different app stores. We're very sensitive to the size of our binary files, and a feature that increases the binary size by 100 MB will receive much more scrutiny than one that increases it by 10 MB." — Andrew Ng
Challenges
- Online de-compression while computing
  - Special-purpose logic
- Computation becomes irregular
  - Sparse weight
  - Sparse activation
  - Indirect lookup
- Parallelization becomes challenging
  - Synchronization overhead
  - Load imbalance
  - Scalability
Having Opened the Box, How to Design the Hardware?
Algorithm → Hardware
Breaks the boundary between algorithm and hardware.
EIE: Efficient Inference Engine on Compressed Deep Neural Network
Han et al. ISCA 2016
How to reduce the memory footprint?
Related Work

             Eyeriss [1]  TPU [2]        DaDiannao [3]  EIE [this work]
Institution  MIT          Google         CAS            Stanford
Key idea     Dataflow     8-bit integer  eDRAM          Compression

[1] Yu-Hsin Chen, et al. "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks." ISSCC 2016
[2] Norm Jouppi, "Google supercharges machine learning tasks with TPU custom chip", 2016
[3] Yunji Chen, et al. "DaDianNao: A machine-learning supercomputer." Micro 2014
[4] Song Han et al. "EIE: Efficient Inference Engine on Compressed Deep Neural Network", ISCA 2016
[Han et al. ISCA'16] EIE: Efficient Inference Engine
Exploit three properties of compressed networks:
- Sparse weight (0 * A = 0): 90% static sparsity → 10× less computation, 5× less memory footprint
- Sparse activation (W * 0 = 0): 70% dynamic sparsity → 3× less computation
- Weight sharing (2.09, 1.92 ⇒ 2): 4-bit weights → 8× less memory footprint
[Han et al. ISCA'16] EIE: Parallelization on Sparsity

The running example is $b = \mathrm{ReLU}(W a)$ with a sparse activation vector $a = (0, a_1, 0, a_3)$ and a sparse weight matrix

\[
W = \begin{pmatrix}
w_{0,0} & w_{0,1} & 0 & w_{0,3}\\
0 & 0 & w_{1,2} & 0\\
0 & w_{2,1} & 0 & w_{2,3}\\
0 & 0 & 0 & 0\\
0 & 0 & w_{4,2} & w_{4,3}\\
w_{5,0} & 0 & 0 & 0\\
0 & 0 & 0 & w_{6,3}\\
0 & w_{7,1} & 0 & 0
\end{pmatrix}.
\]

ReLU zeroes the negative outputs, leaving $b = (b_0, b_1, 0, b_3, 0, b_5, b_6, 0)$.
[Han et al. ISCA'16] EIE: Parallelization on Sparsity
A central control unit drives a 4×4 array of processing elements (PEs). The rows of W are interleaved across the PEs — PE0 owns rows 0 and 4, PE1 rows 1 and 5, PE2 rows 2 and 6, PE3 rows 3 and 7 — so each PE computes the output entries of b = ReLU(W a) for its own rows.
[Han et al. ISCA'16] EIE: Parallelization on Sparsity
Logically, each PE sees its interleaved rows of W; physically, it stores only its nonzeros in a compressed, column-major layout. For PE0 in the example:

Virtual weight:  W0,0  W0,1  W4,2  W0,3  W4,3
Relative index:  0     1     2     0     0
Column pointer:  0     1     2     3

That is, the values are kept in column order, the row positions are encoded as small relative indices, and per-column pointers index into the value array.
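This storage scheme can be modeled in software. The sketch below uses per-PE CSC-style arrays with plain (absolute) local row indices rather than EIE's 4-bit relative encoding, and the weight values and activation vector are made up to match the slide's sparsity pattern:

```python
import numpy as np

N_PE = 4

def build_pe_slices(W):
    """Split W row-wise (row r goes to PE r % N_PE) and compress each
    slice into (values, local_rows, col_ptr), column-major."""
    slices = []
    for pe in range(N_PE):
        sub = W[pe::N_PE, :]                     # this PE's rows
        vals, rows, ptr = [], [], [0]
        for j in range(sub.shape[1]):            # column-major scan
            nz = np.flatnonzero(sub[:, j])
            vals.extend(sub[nz, j]); rows.extend(nz)
            ptr.append(len(vals))
        slices.append((np.array(vals), np.array(rows), np.array(ptr)))
    return slices

def spmv(slices, a, n_rows):
    """b = W @ a, skipping zero activations and zero weights."""
    b = np.zeros(n_rows)
    for j in np.flatnonzero(a):                  # broadcast nonzero a[j]
        for pe, (vals, rows, ptr) in enumerate(slices):
            for k in range(ptr[j], ptr[j + 1]):  # this PE's column-j nonzeros
                b[rows[k] * N_PE + pe] += vals[k] * a[j]
    return b

W = np.array([[1., 2., 0., 3.],                  # the slide's 8x4 sparsity
              [0., 0., 4., 0.],                  # pattern, made-up values
              [0., 5., 0., 6.],
              [0., 0., 0., 0.],
              [0., 0., 7., 8.],
              [9., 0., 0., 0.],
              [0., 0., 0., 10.],
              [0., 11., 0., 0.]])
a = np.array([0., 1., 0., 2.])                   # sparse activation vector
b = spmv(build_pe_slices(W), a, W.shape[0])
assert np.allclose(b, W @ a)
```

The loop structure mirrors the hardware: the outer loop over nonzero activations is the broadcast, and the inner per-PE loops touch only stored nonzeros.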
[Han et al. ISCA'16] Dataflow
[Figure sequence: the sparse matrix-vector product b = ReLU(W a) stepping through the nonzero activations. Each nonzero activation a_j is broadcast to all PEs; every PE multiplies it with the nonzero weights it holds in column j and accumulates the products into its output registers b_i. Skipping the zero activations and zero weights yields the dynamic and static computation savings.]
[Han et al. ISCA'16] EIE Architecture
Weight decode: the compressed DNN model stores 4-bit encoded (virtual) weights; a look-up table decodes each into a 16-bit real weight for the ALU.
Sparse-format address: each 4-bit relative index is accumulated into a 16-bit absolute index.
The ALU combines the decoded weights with the input image activations in memory and produces the prediction result.
[Han et al. ISCA'16] Micro Architecture for each PE
Pipeline stages: Pointer Read (even/odd pointer SRAM banks) → Sparse Matrix Access (sparse-matrix SRAM, column start/end address registers, weight decoder) → Arithmetic Unit (address accumulate over relative indices, bypass path) → Act R/W (source/destination activation registers, activation SRAM); an activation queue with leading-nonzero detection feeds the front, and ReLU closes the back.
[Han et al. ISCA'16] The next slides highlight each PE pipeline stage in turn:
- Load Balance: the activation queue in front of each PE buffers nonzero activations, so uneven nonzero counts across PEs do not stall the array.
- Activation Sparsity: leading-nonzero detection feeds only nonzero activations (value + index) into the queue.
- Weight Sparsity: the pointer read stage fetches column start/end pointers from the even/odd pointer SRAM banks, and sparse matrix access reads only the stored nonzeros.
- Weight Sharing: the weight decoder expands each encoded 4-bit weight into its decoded 16-bit value.
- Address Accumulate: relative indices are summed into absolute output addresses.
- Arithmetic: the arithmetic unit multiplies and accumulates, with a bypass path.
- Write Back: results move through the source/destination activation registers into the activation SRAM.
- ReLU and non-zero detection close the loop, producing the sparse activation vector for the next layer.
- What's Special: the combination of these sparsity- and sharing-aware stages is what distinguishes EIE from dense accelerators.
[Han et al. ISCA'16] Post-Layout Result of EIE

Technology        45 nm
# PEs             64
On-chip SRAM      8 MB
Max model size    84 million parameters
Static sparsity   10×
Dynamic sparsity  3×
Quantization      4-bit
ALU width         16-bit
Area              40.8 mm²
MxV throughput    81,967 layers/s
Power             586 mW

1. Post-layout result.
2. Throughput measured on AlexNet FC-7.
[Han et al. ISCA'16] Benchmark
- CPU: Intel Core i7-5930k
- GPU: NVIDIA TitanX
- Mobile GPU: NVIDIA Jetson TK1

Layer            Size        Weight density  Activation density  FLOP reduction  Description
AlexNet-6        4096×9216   9%              35%                 33×             AlexNet for image classification
AlexNet-7        4096×4096   9%              35%                 33×
AlexNet-8        1000×4096   25%             38%                 10×
VGG-6            4096×25088  4%              18%                 100×            VGG-16 for image classification
VGG-7            4096×4096   4%              37%                 50×
VGG-8            1000×4096   23%             41%                 10×
NeuralTalk-We    600×4096    10%             100%                10×             RNN and LSTM for image caption
NeuralTalk-Wd    8791×600    11%             100%                10×
NeuralTalk-LSTM  2400×1201   10%             100%                10×
[Han et al. ISCA'16] Speedup on EIE
[Figure: speedup relative to the dense CPU baseline for CPU/GPU/mGPU (dense and compressed) versus EIE on Alex-6/7/8, VGG-6/7/8, NT-We/Wd/LSTM. EIE reaches hundreds-fold speedups over the dense CPU baseline on individual layers (up to ~1000×), while running the compressed models directly on CPU/GPU/mGPU gains only small factors.]
[Han et al. ISCA'16] Energy Efficiency on EIE
[Figure: energy efficiency relative to the dense CPU baseline for CPU/GPU/mGPU (dense and compressed) versus EIE on the same layers. EIE is four to five orders of magnitude more energy-efficient than the dense CPU baseline, reaching tens of thousands of times on individual layers.]
[Han et al. ISCA'16] Comparison: Throughput
[Figure: throughput (layers/s, log scale) across platforms]

Platform  Core i7-5930k  TitanX  Tegra K1  A-Eye  DaDianNao  TrueNorth  EIE (64 PEs)  EIE (256 PEs)
Process   22 nm          28 nm   28 nm     28 nm  28 nm      28 nm      45 nm         28 nm
Type      CPU            GPU     mGPU      FPGA   ASIC       ASIC       ASIC          ASIC