Analysis of Generative Deep Learning Models and Features of Their Implementation on the Example of WGAN

Ya. O. Isaienkov; O. B. Mokin

doi:10.31649/1997-9266-2022-160-1-82-94

Authors

Ya. O. Isaienkov, Vinnytsia National Technical University
O. B. Mokin, Vinnytsia National Technical University

DOI:

https://doi.org/10.31649/1997-9266-2022-160-1-82-94

Keywords:

data generation, generative adversarial network, autoencoder, deep learning, GAN, WGAN

Abstract

The paper examines the architecture, training process, and areas of application of generative deep learning models. The main tasks of such models include data generation (images, music, text, video), style transfer between data samples, data quality improvement, data clustering, and anomaly detection. Although the outputs of generative models are commonly used for entertainment, they can also serve as training data for other machine learning models, as a source of new ideas for creative professions, and as a tool for anonymizing sensitive data, among other uses. The article analyzes the advantages and disadvantages of basic generative models: autoencoders, variational autoencoders, generative adversarial networks (GAN), Wasserstein GAN (WGAN), StyleGAN, StyleGAN2, and BigGAN. The paper also describes a step-by-step implementation of a generative model, using WGAN as the example, that starts from the basic architecture and then adds more complex elements. These elements include conditional generation, which makes it possible to select the desired class, and bilinear upsampling, which addresses the so-called ‘checkerboard effect’. The final model, named CWGAN-GP_128, generates realistic images of dandelions and marigolds at a resolution of 128×128 pixels. The model was trained on the authors' own data set of 900 photos (450 per class). During training, the images are augmented with affine transformations such as rotations and inversions. It is emphasized that although the outputs of generative models are often easy to evaluate visually, the rapid progress of GANs makes automated quality assessment of generated data an increasingly pressing problem. The final model is publicly available, and its results can be viewed on the authors' website, thisflowerdoesnotexist.herokuapp.com.
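
The ‘checkerboard effect’ mentioned in the abstract can be illustrated with a small one-dimensional toy example. The sketch below is not the authors' implementation: it assumes that the bilinear resampling step named in the abstract means interpolating the feature map first and then applying an ordinary convolution, and all function names (`transposed_conv_1d`, `linear_upsample_1d`, `conv_same_1d`) are hypothetical. It compares a strided transposed convolution, whose kernel copies overlap unevenly, with interpolation followed by convolution, which covers every output position equally.

```python
import math


def transposed_conv_1d(x, kernel, stride):
    """Transposed convolution: every input value adds a scaled copy of the kernel to the output."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out


def linear_upsample_1d(x, factor=2):
    """Linear interpolation, used here as a 1-D stand-in for bilinear upsampling (assumption)."""
    n = len(x)
    out = []
    for j in range(n * factor):
        # Map the output sample centre back to input coordinates and clamp to the valid range.
        pos = min(max((j + 0.5) / factor - 0.5, 0.0), n - 1.0)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        t = pos - lo
        out.append(x[lo] * (1.0 - t) + x[hi] * t)
    return out


def conv_same_1d(x, kernel):
    """Ordinary convolution with edge replication; assumes an odd-length kernel."""
    pad = len(kernel) // 2
    padded = [x[0]] * pad + list(x) + [x[-1]] * pad
    return [
        sum(padded[i + k] * w for k, w in enumerate(kernel))
        for i in range(len(x))
    ]


if __name__ == "__main__":
    flat = [1.0, 1.0, 1.0, 1.0]   # a perfectly uniform input signal
    kernel = [1.0, 1.0, 1.0]      # uniform weights isolate the overlap pattern
    print(transposed_conv_1d(flat, kernel, stride=2))
    print(conv_same_1d(linear_upsample_1d(flat, factor=2), kernel))
```

For the uniform input, the transposed convolution with kernel size 3 and stride 2 produces interior values that alternate between 1 and 2, because the kernel size is not divisible by the stride. The interpolate-then-convolve path returns the same value at every position. In two dimensions the same uneven overlap appears along both axes, which gives the checkerboard pattern.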

Author Biographies

Ya. O. Isaienkov, Vinnytsia National Technical University

Post-Graduate Student of the Chair of System Analysis and Information Technologies

O. B. Mokin, Vinnytsia National Technical University

Dr. Sc. (Eng.), Professor, Professor of the Chair of System Analysis and Information Technologies

