Machine Perception and Interaction Group (MPIG) .cn
跟我學CS231 (7) (Learn CS231n with Me, Part 7) — 袁洪慧, MPIG Open Seminar 0220, WeChat official account: mpig_robot

Recurrent Neural Network

Contents: applications of RNNs; types of RNNs; the recurrent neural network; forward propagation in an RNN; truncated backpropagation; LSTM; other RNN variants; summary.

Applications of RNNs
Tasks built around sequential features are natural fits for an RNN: sentiment analysis, keyword extraction, speech recognition, machine translation, and stock analysis.

Types of RNNs
A "vanilla" neural network maps a single fixed-size input to a single fixed-size output. Recurrent neural networks instead process sequences, for example:
- image captioning: image -> sequence of words
- sentiment classification: sequence of words -> sentiment
- machine translation: sequence of words -> sequence of words
- video classification at the frame level

With a recurrent neural network we usually want to predict a vector at some (or every) time step. We can process a sequence of vectors x by applying a recurrence formula at every time step:

    h_t = f_W(h_{t-1}, x_t)

where h_t is the new state, f_W is some function with parameters W, h_{t-1} is the old state, and x_t is the input vector at that time step. Notice that the same function and the same set of parameters are used at every time step.

(Simple) recurrent neural network: the state consists of a single "hidden" vector h, updated as

    h_t = tanh(W_hh h_{t-1} + W_xh x_t),    y_t = W_hy h_t

Unrolled forward pass of an RNN
Unrolling the RNN in time gives its forward pass as a computational graph: the same weight matrix is re-used at every time step.
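As a concrete illustration of the recurrence above, here is a minimal NumPy sketch of the simple RNN step; the class name, the weight names (W_hh, W_xh, W_hy) and the sizes are illustrative choices, not something specified in the slides.

    import numpy as np

    class VanillaRNNCell:
        """Minimal simple-RNN cell: h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t."""

        def __init__(self, input_dim, hidden_dim, output_dim, seed=0):
            rng = np.random.default_rng(seed)
            self.W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
            self.W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
            self.W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))

        def step(self, h_prev, x_t):
            # New state from old state and current input; the same W at every step.
            h_t = np.tanh(self.W_hh @ h_prev + self.W_xh @ x_t)
            y_t = self.W_hy @ h_t          # predicted vector at this time step
            return h_t, y_t

    # Process a sequence of vectors x by applying the same cell at every time step.
    rng = np.random.default_rng(1)
    cell = VanillaRNNCell(input_dim=4, hidden_dim=8, output_dim=3)
    h = np.zeros(8)
    for x_t in (rng.normal(size=4) for _ in range(5)):
        h, y = cell.step(h, x_t)

Unrolling this loop over the whole sequence is exactly the computational graph discussed next.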

RNN computational graphs come in several shapes: many to many, many to one, and one to many. A sequence-to-sequence model combines two of them, many-to-one plus one-to-many. Many to one: encode the input sequence in a single vector. One to many: produce the output sequence from that single input vector.
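A rough sketch of that many-to-one plus one-to-many composition, in plain NumPy; all names and sizes are hypothetical, and a real model would use separate learned encoder and decoder parameters rather than the shared weights used here for brevity.

    import numpy as np

    rng = np.random.default_rng(0)
    H, D = 8, 4                                     # hidden size, feature size (illustrative)
    W_hh = rng.normal(scale=0.1, size=(H, H))
    W_xh = rng.normal(scale=0.1, size=(H, D))
    W_hy = rng.normal(scale=0.1, size=(D, H))

    def rnn_step(h, x):
        return np.tanh(W_hh @ h + W_xh @ x)

    # Many to one: encode the input sequence in a single vector.
    inputs = [rng.normal(size=D) for _ in range(6)]
    h = np.zeros(H)
    for x in inputs:
        h = rnn_step(h, x)
    context = h                                     # one vector summarizing the sequence

    # One to many: produce an output sequence from that single input vector.
    h_dec, y = context, np.zeros(D)
    outputs = []
    for _ in range(5):
        h_dec = rnn_step(h_dec, y)                  # decoder state update
        y = W_hy @ h_dec                            # emit an output, fed back in next step
        outputs.append(y)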

Truncated backpropagation
Gradients for an RNN are computed with backpropagation through time: the loss is backpropagated through the unrolled network. Gradient clipping sets a threshold on the gradient; gradient values exceeding the threshold are cut, so the update step can never become too large, which makes training easier to converge. The concrete rule (scale the gradient when its norm is too big) is spelled out below. In truncated backpropagation through time, instead of unrolling over the whole sequence, we run forward and backward over chunks of the sequence, carrying the hidden state forward between chunks.
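A hedged PyTorch sketch of truncated backpropagation through time on a toy regression task; the data, the chunk length of 20, the layer sizes and the clipping threshold are all made-up illustrative values.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    seq = torch.randn(100, 1, 4)                    # (time, batch, features), toy data
    targets = torch.randn(100, 1, 3)

    rnn = nn.RNN(input_size=4, hidden_size=8)       # tanh RNN, like the vanilla cell above
    readout = nn.Linear(8, 3)
    params = list(rnn.parameters()) + list(readout.parameters())
    optimizer = torch.optim.SGD(params, lr=0.01)
    loss_fn = nn.MSELoss()

    chunk_len = 20
    h = torch.zeros(1, 1, 8)                        # (layers, batch, hidden)
    for start in range(0, seq.size(0), chunk_len):
        x_chunk = seq[start:start + chunk_len]
        y_chunk = targets[start:start + chunk_len]

        out, h = rnn(x_chunk, h)
        loss = loss_fn(readout(out), y_chunk)

        optimizer.zero_grad()
        loss.backward()
        # Gradient clipping: rescale the gradient if its norm exceeds the threshold.
        nn.utils.clip_grad_norm_(params, max_norm=1.0)
        optimizer.step()

        # Carry the hidden state forward in time, but detach it so backpropagation
        # only runs through the current chunk (this is the "truncated" part).
        h = h.detach()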

Vanilla RNN gradient flow: computing the gradient of h0 involves many repeated factors of W (and repeated tanh) [Bengio et al., "Learning long-term dependencies with gradient descent is difficult", IEEE Transactions on Neural Networks, 1994; Pascanu et al., "On the difficulty of training recurrent neural networks", ICML 2013]. If the largest singular value of W is greater than 1, gradients explode; if it is less than 1, gradients vanish. Gradient clipping handles the exploding case: scale the gradient if its norm is too big.
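The effect of that largest singular value, and the clipping rule, can be seen in a small NumPy experiment. The symmetric matrix, the dimension and the threshold are arbitrary choices made only so the illustration is clean (for a symmetric matrix the largest singular value equals the spectral radius that governs repeated multiplication).

    import numpy as np

    rng = np.random.default_rng(0)

    def clip_by_norm(grad, threshold):
        # "Scale the gradient if its norm is too big":
        # if ||g|| > threshold, return g * threshold / ||g||, otherwise g unchanged.
        norm = np.linalg.norm(grad)
        return grad * (threshold / norm) if norm > threshold else grad

    def repeated_factor_norm(target_sv, steps=50, dim=8):
        # Backprop through a vanilla RNN multiplies the gradient by (roughly) W once
        # per time step; here we mimic that with a symmetric W whose largest singular
        # value is set to target_sv, ignoring the tanh factors.
        A = rng.normal(size=(dim, dim))
        W = (A + A.T) / 2
        W *= target_sv / np.linalg.svd(W, compute_uv=False)[0]
        g = np.ones(dim)
        for _ in range(steps):
            g = W @ g
        return np.linalg.norm(g)

    print(repeated_factor_norm(1.2))    # blows up       -> exploding gradients
    print(repeated_factor_norm(0.8))    # shrinks to ~0  -> vanishing gradients
    print(np.linalg.norm(clip_by_norm(np.full(8, 100.0), threshold=5.0)))   # clipped to ~5.0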

Simple RNNs are not used much in practice, for two reasons. First, the longer the input, the deeper the unrolled network, and the most common difficulties in training such a deep network are exploding and vanishing gradients. Second, a simple RNN predicts the next word from the preceding words, but in more complex settings the relevant context can lie much further back: in "I grew up in France … I speak fluent French", predicting "French" requires remembering "France" from much earlier.

As the gap between the relevant context and the point where it is needed keeps growing, a simple RNN loses the ability to learn to connect information that far apart.

LSTM (Long Short-Term Memory)
Comparing the block diagrams of a plain RNN and an LSTM shows the core idea of the LSTM: besides the hidden state it carries a cell state, controlled by gates that can be understood step by step. The forget gate decides what to discard from the cell; the input gate decides what new information to write; because the LSTM also needs to remember things, the diagrams show this cell ("memory") being updated; and the output gate decides what part of the cell to expose as the new hidden state.

Other RNN variants
GRU (Gated Recurrent Unit): the GRU is another network with almost the same functionality as the LSTM. The final model is simpler than the standard LSTM, and it is also a very popular variant.
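To make the gates concrete, here is a minimal NumPy sketch of one LSTM step; the weight layout (one matrix per gate acting on the concatenated [h, x]) and the toy sizes are assumptions for illustration, not the exact parameterization used in the slides.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, Wf, Wi, Wg, Wo):
        z = np.concatenate([h_prev, x_t])      # previous hidden state and current input
        f = sigmoid(Wf @ z)                    # forget gate: how much of the cell to keep
        i = sigmoid(Wi @ z)                    # input gate: how much new content to write
        g = np.tanh(Wg @ z)                    # candidate cell content
        o = sigmoid(Wo @ z)                    # output gate: how much of the cell to expose
        c_t = f * c_prev + i * g               # additive cell update (the "memory")
        h_t = o * np.tanh(c_t)                 # new hidden state
        return h_t, c_t

    # Toy usage with illustrative sizes (biases omitted for brevity).
    rng = np.random.default_rng(0)
    H, D = 8, 4
    Wf, Wi, Wg, Wo = (rng.normal(scale=0.1, size=(H, H + D)) for _ in range(4))
    h, c = np.zeros(H), np.zeros(H)
    for x_t in (rng.normal(size=D) for _ in range(5)):
        h, c = lstm_step(x_t, h, c, Wf, Wi, Wg, Wo)

The additive form of the cell update c_t = f * c_prev + i * g is what the summary below calls "additive interactions": gradients can flow through the cell without being squashed at every step. A GRU follows the same gating idea but merges the forget and input gates into a single update gate and drops the separate cell state, which is why it ends up simpler than a standard LSTM.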

Summary
RNNs allow a lot of flexibility in architecture design. Vanilla RNNs are simple but don't work very well; it is common to use an LSTM or a GRU instead, because their additive interactions improve gradient flow. The backward flow of gradients in an RNN can explode or vanish: exploding gradients are controlled with gradient clipping, and vanishing gradients are controlled with additive interactions (LSTM). Better and simpler architectures are a hot topic of current research.
