




School of Computer Science and Information Technology, Shanxi University
Institute of Big Data Science and Industry, Shanxi University
December 2017 — Knowledge Engineering

OUTLINE
Generative Adversarial Nets (GANs)
Deep Convolutional Generative Adversarial Networks (DCGAN)
Conditional Generative Adversarial Nets (CGAN)

Supervised learning often achieves better training results than unsupervised learning, but in the real world the labels it requires are relatively scarce. Researchers have therefore never stopped exploring better unsupervised learning strategies, hoping to learn representations, and even knowledge, of the real world from massive amounts of unlabeled data, and thereby understand it better.

There are many ways to evaluate unsupervised learning, and generation is the most direct one: only when we can generate/create our real world can we claim to understand it completely. The generative models that generation tasks rely on, however, face two major difficulties. First, we need a great deal of prior knowledge to model the real world, including what priors and what distributions to choose, and the quality of this modeling directly determines how well the generative model performs. Second, real-world data are often complex, and the computation required to fit such models can be enormous, even prohibitive.

The Generative Adversarial Networks (GANs) proposed by Ian Goodfellow neatly sidestep both difficulties. Every GAN framework contains a pair of models: a generative model (G) and a discriminative model (D). Because of D, the G in a GAN no longer needs prior knowledge of the real data or complex modeling; it can still learn to approximate the real data, until the data it generates are so realistic that even D cannot tell them apart.

The model optimization formula from the paper (each training step samples a minibatch of m examples x_1, ..., x_m and a minibatch of m noise samples z_1, ..., z_m, then updates the discriminator and the generator in turn):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

Code walkthrough and experimental results:
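The snippets below reference network parameters (D_W1, D_b1, ..., G_W2, G_b2), the parameter lists theta_D and theta_G, and the placeholders X and Z, which were defined on an earlier slide ("see the weight definitions above"). A minimal sketch of those definitions, assuming a 784-dimensional MNIST input, a 128-unit hidden layer, and 100-dimensional noise; the initializer and exact sizes are assumptions:

    import tensorflow as tf
    import numpy as np

    # Discriminator parameters: 784 -> 128 -> 1 (sizes assumed from the MNIST setting)
    D_W1 = tf.Variable(tf.random_normal([784, 128], stddev=0.1))
    D_b1 = tf.Variable(tf.zeros([128]))
    D_W2 = tf.Variable(tf.random_normal([128, 1], stddev=0.1))
    D_b2 = tf.Variable(tf.zeros([1]))
    theta_D = [D_W1, D_W2, D_b1, D_b2]

    # Generator parameters: 100 -> 128 -> 784
    G_W1 = tf.Variable(tf.random_normal([100, 128], stddev=0.1))
    G_b1 = tf.Variable(tf.zeros([128]))
    G_W2 = tf.Variable(tf.random_normal([128, 784], stddev=0.1))
    G_b2 = tf.Variable(tf.zeros([784]))
    theta_G = [G_W1, G_W2, G_b1, G_b2]

    # Placeholders for real images and generator noise
    X = tf.placeholder(tf.float32, shape=[None, 784])
    Z = tf.placeholder(tf.float32, shape=[None, 100])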
    # Define the discriminator
    def discriminator(x):
        # Compute D_h1 = ReLU(x*D_W1 + D_b1); the input is a 784-element vector
        D_h1 = tf.nn.relu(tf.matmul(x, D_W1) + D_b1)
        # Compute the output of the third layer. Because a sigmoid is used, the
        # output is a scalar in (0, 1) (see the weight definitions above),
        # i.e. a judgment of whether the input image is real (=1) or fake (=0)
        D_logit = tf.matmul(D_h1, D_W2) + D_b2
        D_prob = tf.nn.sigmoid(D_logit)
        # Return the probability of being real together with the input to the
        # sigmoid; D_logit is returned so it can be fed into
        # tf.nn.sigmoid_cross_entropy_with_logits() when building a loss function
        return D_prob, D_logit

    # Define a function that draws an m*n random matrix with uniformly
    # distributed entries; the sampled z is the generator's input
    def sample_Z(m, n):
        return np.random.uniform(-1., 1., size=[m, n])

    # Define the generator
    def generator(z):
        # First layer: compute y = z*G_W1 + G_b1, then apply the activation
        # G_h1 = ReLU(y); G_h1 is the activation fed to the next layer
        G_h1 = tf.nn.relu(tf.matmul(z, G_W1) + G_b1)
        # The next two statements compute the output-layer activation, a
        # 784-element vector that can be reshaped to 28x28 to form an image
        G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
        G_prob = tf.nn.sigmoid(G_log_prob)
        return G_prob

    # Feed the real images and the generated images into the discriminator
    G_sample = generator(Z)
    D_real, D_logit_real = discriminator(X)
    D_fake, D_logit_fake = discriminator(G_sample)

    # Discriminator and generator losses as in the original paper
    D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
    G_loss = -tf.reduce_mean(tf.log(D_fake))

    # Optimize both with Adam; var_list restricts which weights each loss updates
    D_solver = tf.train.AdamOptimizer().minimize(D_loss, var_list=theta_D)
    G_solver = tf.train.AdamOptimizer().minimize(G_loss, var_list=theta_G)
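To show how these pieces fit together, and the schedule of k discriminator steps per generator step mentioned in the disadvantages below, a minimal training-loop sketch; the batch size, k = 1, iteration count, and MNIST loader are assumptions, not part of the original slides:

    # Minimal training loop sketch (assumed: TF1 session style, MNIST loader
    # from tensorflow.examples.tutorials; k discriminator steps per G step)
    from tensorflow.examples.tutorials.mnist import input_data

    mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
    mb_size, Z_dim, k = 128, 100, 1

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    for it in range(100000):
        for _ in range(k):  # update D k times per G update
            X_mb, _ = mnist.train.next_batch(mb_size)
            _, D_loss_curr = sess.run(
                [D_solver, D_loss],
                feed_dict={X: X_mb, Z: sample_Z(mb_size, Z_dim)})
        _, G_loss_curr = sess.run(
            [G_solver, G_loss],
            feed_dict={Z: sample_Z(mb_size, Z_dim)})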
Advantages of GANs:
1. Judging by actual results, they appear to produce better samples than other models (sharper, clearer images).
2. The adversarial framework can train any kind of generator network. Most other frameworks require the generator network to have some particular functional form, such as a Gaussian output layer. Importantly, all the other frameworks require the generator to spread non-zero mass everywhere, whereas GANs can learn to generate points only on a thin manifold close to the data.
3. There is no need to design a model that obeys any particular factorization; any generator network and any discriminator will work.
4. No repeated Markov-chain sampling is needed, no inference is needed during learning, and the problem of approximating intractable probabilities is avoided.

Disadvantages of GANs:
1. Non-convergence. The basic problem is that all of the theory says GANs should do well at a Nash equilibrium, but gradient descent is only guaranteed to reach a Nash equilibrium in the convex case. When both players are represented by neural networks, it is possible for them to keep adjusting their strategies forever without ever actually reaching an equilibrium [Ian Goodfellow (OpenAI) on Quora].
2. Hard to train: the collapse problem. GAN training can collapse, with the generator degenerating and always producing the same sample point, unable to continue learning [Improved Techniques for Training GANs].
3. No prior modeling, so the model is too free to control. Compared with other generative models, the adversarial approach no longer requires a hypothesized data distribution but samples from a distribution directly, so in theory it can approximate the real data exactly; this is also GANs' greatest advantage. However, the price of not modeling in advance is too much freedom: for larger images with more pixels, the simple GAN approach becomes hard to control. In the GAN paper [Goodfellow Ian, Pouget-Abadie J.], each round of parameter updates trains D for k steps before G is updated once, partly for this reason (see the training-loop sketch above).

OUTLINE
Generative Adversarial Nets (GANs)
Deep Convolutional Generative Adversarial Networks (DCGAN)
Conditional Generative Adversarial Nets (CGAN)
In this work we introduce the conditional version of generative adversarial nets, which can be constructed by simply feeding the data, y, we wish to condition on to both the generator and discriminator. We show that this model can generate MNIST digits conditioned on class labels. We also illustrate how this model could be used to learn a multi-modal model, and provide preliminary examples of an application to image tagging in which we demonstrate how this approach can generate descriptive tags which are not part of training labels.

Generative adversarial nets can be extended to a conditional model if both the generator and discriminator are conditioned on some extra information y. y could be any kind of auxiliary information, such as class labels or data from other modalities. We can perform the conditioning by feeding y into both the discriminator and generator as an additional input layer. In the generator, the prior input noise pz(z) and y are combined in a joint hidden representation, and the adversarial training framework allows for considerable flexibility in how this hidden representation is composed. In the discriminator, x and y are presented as inputs to a discriminative function (embodied again by an MLP in this case).
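A minimal sketch of this conditioning in the same TF1 style as the GAN code above; the label placeholder y, the 10-class one-hot encoding, and the weight names/shapes are assumptions for the MNIST setting:

    # Conditional GAN sketch (assumed shapes: 100-d noise, 10-class one-hot
    # labels, 784-d images; weights mirror the unconditional model above)
    y = tf.placeholder(tf.float32, shape=[None, 10])

    CG_W1 = tf.Variable(tf.random_normal([100 + 10, 128], stddev=0.1))
    CG_b1 = tf.Variable(tf.zeros([128]))
    CG_W2 = tf.Variable(tf.random_normal([128, 784], stddev=0.1))
    CG_b2 = tf.Variable(tf.zeros([784]))

    CD_W1 = tf.Variable(tf.random_normal([784 + 10, 128], stddev=0.1))
    CD_b1 = tf.Variable(tf.zeros([128]))
    CD_W2 = tf.Variable(tf.random_normal([128, 1], stddev=0.1))
    CD_b2 = tf.Variable(tf.zeros([1]))

    def cond_generator(z, y):
        # Combine the prior noise z and the condition y in a joint hidden layer
        inputs = tf.concat([z, y], axis=1)
        G_h1 = tf.nn.relu(tf.matmul(inputs, CG_W1) + CG_b1)
        return tf.nn.sigmoid(tf.matmul(G_h1, CG_W2) + CG_b2)

    def cond_discriminator(x, y):
        # Present x and y together as inputs to the discriminative MLP
        inputs = tf.concat([x, y], axis=1)
        D_h1 = tf.nn.relu(tf.matmul(inputs, CD_W1) + CD_b1)
        D_logit = tf.matmul(D_h1, CD_W2) + CD_b2
        return tf.nn.sigmoid(D_logit), D_logit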
OUTLINE
Generative Adversarial Nets (GANs)
Deep Convolutional Generative Adversarial Networks (DCGAN)
Conditional Generative Adversarial Nets (CGAN)

In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.

In this paper, we make the following contributions:
• We propose and evaluate a set of constraints on the architectural topology of Convolutional GANs that make them stable to train in most settings. We name this class of architectures Deep Convolutional GANs (DCGAN).
• We use the trained discriminators for image classification tasks, showing competitive performance with other unsupervised algorithms.
• We visualize the filters learnt by GANs and empirically show that specific filters have learned to draw specific objects.
• We show that the generators have interesting vector arithmetic properties allowing for easy manipulation of many semantic qualities of generated samples.

Background: Historical attempts to scale up GANs using CNNs to model images have been unsuccessful. We also encountered difficulties attempting to scale GANs using CNN architectures commonly used in the supervised literature. However, after extensive model exploration we identified a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models. Core to our approach is adopting and modifying three recently demonstrated changes to CNN architectures.

Architecture guidelines for stable Deep Convolutional GANs:
• Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
• Use batchnorm in both the generator and the discriminator.
• Remove fully connected hidden layers for deeper architectures.
• Use ReLU activation in the generator for all layers except for the output, which uses Tanh.
• Use LeakyReLU activation in the discriminator for all layers.
APPROACH AND MODEL ARCHITECTURE
The first is the all convolutional net, which replaces deterministic spatial pooling functions (such as max-pooling) with strided convolutions. We use this approach in our generator, allowing it to learn its own spatial upsampling, and in our discriminator.

The second is the trend towards eliminating fully connected layers on top of convolutional features. The strongest example of this is global average pooling, which has been utilized in state of the art image classification models (Mordvintsev et al.). We found global average pooling increased model stability but hurt convergence speed. A middle ground of directly connecting the highest convolutional features to the input and output respectively of the generator and discriminator worked well. The first layer of the GAN, which takes a uniform noise distribution Z as input, could be called fully connected as it is just a matrix multiplication, but the result is reshaped into a 4-dimensional tensor and used as the start of the convolution stack. For the discriminator, the last convolution layer is flattened and then fed into a single sigmoid output. See Fig. 1 for a visualization of an example model architecture.

Generator model: (architecture figure; see Fig. 1)

Discriminator model:

    h0 = lrelu(conv2d(image, self.df_dim, name='d_h0_conv'))
    h1 = lrelu(self.d_bn1(conv2d(h0, self.df_dim*2, name='d_h1_conv')))
    h2 = lrelu(self.d_bn2(conv2d(h1, self.df_dim*4, name='d_h2_conv')))
    h3 = lrelu(self.d_bn3(conv2d(h2, self.df_dim*8, name='d_h3_conv')))
    h4 = linear(tf.reshape(h3, [self.batch_size, -1]), 1, 'd_h4_lin')
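The generator figure itself did not survive extraction. As a counterpart to the discriminator above, a minimal sketch in the same style, assuming the lrelu-like helpers (linear, deconv2d, batch-norm wrappers g_bn0..g_bn3) of the codebase the discriminator snippet appears to come from, a 100-d uniform noise input z, and a 64x64x3 output; all of these names and sizes are assumptions:

    # Generator sketch: project-and-reshape, then a stack of fractional-strided
    # convolutions; batchnorm + ReLU everywhere except the Tanh output layer,
    # per the DCGAN guidelines above
    z_ = linear(z, self.gf_dim*8*4*4, 'g_h0_lin')
    h0 = tf.nn.relu(self.g_bn0(tf.reshape(z_, [-1, 4, 4, self.gf_dim*8])))
    h1 = tf.nn.relu(self.g_bn1(deconv2d(h0, [self.batch_size, 8, 8, self.gf_dim*4], name='g_h1')))
    h2 = tf.nn.relu(self.g_bn2(deconv2d(h1, [self.batch_size, 16, 16, self.gf_dim*2], name='g_h2')))
    h3 = tf.nn.relu(self.g_bn3(deconv2d(h2, [self.batch_size, 32, 32, self.gf_dim], name='g_h3')))
    h4 = deconv2d(h3, [self.batch_size, 64, 64, 3], name='g_h4')
    return tf.nn.tanh(h4)  # Tanh on the output layer only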
Third is Batch Normalization (Ioffe & Szegedy, 2015), which stabilizes learning by normalizing the input to each unit to have zero mean and unit variance. This helps deal with training problems that arise due to poor initialization and helps gradient flow in deeper models. Directly applying batchnorm to all layers, however, resulted in sample oscillation and model instability. This was avoided by not applying batchnorm to the generator output layer and the discriminator input layer.

The ReLU activation (Nair & Hinton, 2010) is used in the generator with the exception of the output layer which uses the Tanh function. Within the discriminator we found the leaky rectified activation (Maas et al., 2013) (Xu et al., 2015) to work well, especially for higher resolution modeling. This is in contrast to the original GAN paper, which used the maxout activation (Goodfellow et al., 2013).

Training details:
1. Mini-batch training with a batch size of 128.
2. All weights are initialized randomly from a normal distribution N(0, 0.02).
3. The slope of the LeakyReLU is 0.2.
4. While earlier GANs used momentum to accelerate training, DCGAN uses the Adam optimizer with tuned hyperparameters.
5. Learning rate = 0.0002.
6. The momentum term beta1 is reduced from 0.9 to 0.5 to prevent oscillation and instability.
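A short sketch of how these settings map onto the TF1 optimizer used earlier; the loss and variable-list names here are illustrative, not from the original slides:

    # DCGAN optimizer settings: learning rate 0.0002, beta1 lowered to 0.5
    d_optim = tf.train.AdamOptimizer(learning_rate=0.0002, beta1=0.5) \
                .minimize(d_loss, var_list=d_vars)
    g_optim = tf.train.AdamOptimizer(learning_rate=0.0002, beta1=0.5) \
                .minimize(g_loss, var_list=g_vars)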
4.1 LSUN
As visual quality of samples from generative image models has improved, concerns of over-fitting and memorization of training samples have risen. To demonstrate how our model scales with more data and higher resolution generation, we train a model on the LSUN bedrooms dataset containing a little over 3 million training examples. Recent analysis has shown that there is a direct link between how fast models learn and their generalization performance (Hardt et al., 2015). We show samples from one epoch of training (Fig. 2), mimicking online learning, in addition to samples after convergence (Fig. 3), as an opportunity to demonstrate that our model is not producing high quality samples via simply overfitting/memorizing training examples. No data augmentation was applied to the images.

4.1.1 DEDUPLICATION
To further decrease the likelihood of the generator memorizing input examples (Fig. 2) we perform a simple image de-duplication process. We fit a 3072-128-3072 de-noising dropout-regularized ReLU autoencoder on 32x32 downsampled center-crops of training examples. The resulting code layer activations are then binarized via thresholding the ReLU activation, which has been shown to be an effective information preserving technique.
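A minimal sketch of the binarize-and-hash idea this paragraph describes; only the 128-d code layer and the thresholding step come from the text, while the threshold value and the encode() helper (the autoencoder's forward pass) are assumptions:

    import numpy as np

    def binary_code(code_activations, threshold=0.0):
        # Threshold the 128-d ReLU code layer into a binary vector
        # (threshold value assumed; the text only says "binarized")
        return (code_activations > threshold).astype(np.uint8)

    def dedup_key(code_activations):
        # Pack the 128 bits into a hashable key for linear-time de-duplication
        return np.packbits(binary_code(code_activations)).tobytes()

    # Usage sketch: drop training images whose binary codes collide
    # seen = set()
    # for img in images:
    #     key = dedup_key(encode(img))  # encode() = assumed autoencoder pass
    #     if key in seen: continue      # likely near-duplicate; skip it
    #     seen.add(key)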