
Derivation of the Backpropagation Algorithm: Fully Connected Neural Networks

Published 2020-12-09

Description:

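Backpropagation is the general-purpose method for training artificial neural networks and is used at scale throughout modern deep learning. Fully connected neural networks (multilayer perceptrons, MLP), convolutional neural networks (CNN), and recurrent neural networks (RNN) each have their own version of it. The algorithm follows from the chain rule for differentiating multivariate composite functions and computes the gradients of the parameters of every layer recursively. The "error" in the algorithm's name refers to the gradient of the loss function with respect to the intermediate output of each layer. Starting from the output layer, backpropagation uses a recurrence to compute the error of the current layer from the error of the layer after it, computes the gradients of the current layer's parameters from that error, and then propagates the error term to the preceding layer.

A single neuron computes

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

The neuron receives the input vector $(x_1, \dots, x_n)$; the vector $(w_1, \dots, w_n)$ holds the weights that combine the inputs, and $b$ is the bias, a scalar. The neuron forms the weighted sum of its inputs, adds the bias, and passes the result through an activation function to produce its output. For brevity we write the formulas in vector and matrix form. Each neuron receives the vector $\mathbf{x}$ from the neurons of the previous layer, has weight vector $\mathbf{w}$ and bias $b$, and outputs

$$y = f(\mathbf{w}^T \mathbf{x} + b)$$

That is, it takes the inner product of the input vector and the weight vector, adds the bias, and applies a function $f$ to the result. This function is called the activation function; a typical choice is the sigmoid function. Why an activation function is needed, and which functions can serve as one, was covered in the earlier public-account article "Understanding Activation Functions in Neural Networks" (理解神经网络的激活函数).

A neural network generally has several layers. The first is the input layer, which corresponds to the input vector; its number of neurons equals the dimension of the feature vector, and it does no processing, only passing the input vector on to the next layer. In the middle are one or more hidden layers. The last is the output layer, whose number of neurons equals the number of classes; its output values are used for the classification prediction.

[Figure: a simple three-layer fully connected network]

Consider a simple example network with 3 layers. The first layer is the input layer, with input vector $\mathbf{x}$ and 3 neurons, written in components as $(x_1, x_2, x_3)$; it passes the data to the next layer unchanged. The middle layer has 4 neurons, receives the vector $\mathbf{x}$, and outputs the vector $\mathbf{y} = (y_1, y_2, y_3, y_4)$. The third layer is the output layer; it receives the vector $\mathbf{y}$ and outputs the vector $\mathbf{z} = (z_1, z_2)$. The weight matrix from the first layer to the second is $W^{(1)}$, and from the second to the third is $W^{(2)}$. Each row of a weight matrix is one weight vector: the connection weights from all neurons of the previous layer to one neuron of the current layer. The superscript denotes the layer index.

If the sigmoid function is used as the activation function, the outputs of the second layer are

$$y_i = \frac{1}{1 + \exp\left(-\left(w_{i1}^{(1)} x_1 + w_{i2}^{(1)} x_2 + w_{i3}^{(1)} x_3 + b_i^{(1)}\right)\right)}, \quad i = 1, \dots, 4$$

and the outputs of the third layer are

$$z_i = \frac{1}{1 + \exp\left(-\left(w_{i1}^{(2)} y_1 + w_{i2}^{(2)} y_2 + w_{i3}^{(2)} y_3 + w_{i4}^{(2)} y_4 + b_i^{(2)}\right)\right)}, \quad i = 1, 2$$

Substituting $\mathbf{y}$ into the second formula expresses the output vector $\mathbf{z}$ as a function of the input vector $\mathbf{x}$. Different choices of weight matrices and biases realize different mappings, so a neural network is simply a composite function.

The core problem is this: once the structure of the network (the number of layers and the number of neurons per layer) is fixed, how do we obtain the weight matrices and biases? These parameters are obtained by training, and computing what training needs is the central task of this derivation.

A simple example

Taking the 3-layer network above as an example, we first derive the gradients of the loss function with respect to all parameters of the network. Suppose the training set contains $m$ samples $(\mathbf{x}_i, \mathbf{t}_i)$, where $\mathbf{x}_i$ is an input vector and $\mathbf{t}_i$ is its label vector. We want to determine the network's mapping $\mathbf{z} = h(\mathbf{x})$. Which function explains the training samples well? The answer is that the network's predictions should be as close as possible to the labels, that is, the prediction error on the training set should be minimized. With the mean squared error, the objective is

$$L = \frac{1}{2m} \sum_{i=1}^{m} \left\| h(\mathbf{x}_i) - \mathbf{t}_i \right\|^2$$

Here $h(\mathbf{x}_i)$ and $\mathbf{t}_i$ are both vectors, and each summand is the squared norm of a vector, the sum of squares of its components. This error is also called the Euclidean distance loss; other loss functions, such as cross-entropy or contrastive loss, can be used instead. The variables of the objective are the weight matrices and bias vectors of all layers. In general the objective is not guaranteed to be convex, so this is not a convex optimization problem, and there is a risk of getting stuck in local minima and saddle points (these concepts were covered in detail in the earlier public-account articles "Understanding Gradient Descent" (理解梯度下降法) and "Understanding Convex Optimization" (理解凸优化)). This has long been a criticism of neural networks. The problem can be solved with gradient descent, which requires the gradient of the loss with respect to every weight matrix and bias vector; the key is how to compute these gradients.

We first simplify the problem and consider the loss for a single sample:

$$L = \frac{1}{2} \left\| h(\mathbf{x}) - \mathbf{t} \right\|^2$$

Unless stated otherwise, this single-sample loss is used from here on. Once the gradient of the single-sample loss is known, averaging these gradients over the samples gives the gradient of the whole objective.

$W^{(1)}$ and $\mathbf{b}^{(1)}$ are fed into the later layer, so they are inner variables of the composite function; we first consider the outer parameters $W^{(2)}$ and $\mathbf{b}^{(2)}$. The weight matrix $W^{(2)}$ is a $2 \times 4$ matrix whose two rows are the vectors $\mathbf{w}_1^{(2)}$ and $\mathbf{w}_2^{(2)}$; $\mathbf{b}^{(2)}$ is a 2-dimensional column vector with elements $b_1^{(2)}$ and $b_2^{(2)}$. The input of the network is the vector $\mathbf{x}$, and the output after the first mapping is the vector $\mathbf{y}$.

First compute the partial derivative of the loss with respect to each element of the weight matrix. Expanding the Euclidean distance loss gives

$$\frac{\partial L}{\partial w_{ij}^{(2)}} = \frac{\partial}{\partial w_{ij}^{(2)}} \, \frac{1}{2}\left[ \left( f(\mathbf{w}_1^{(2)} \mathbf{y} + b_1^{(2)}) - t_1 \right)^2 + \left( f(\mathbf{w}_2^{(2)} \mathbf{y} + b_2^{(2)}) - t_2 \right)^2 \right]$$

If $i = 1$, that is, we differentiate with respect to an element of the first row of the weight matrix, the second squared term is constant with respect to $w_{1j}^{(2)}$. By the chain rule,

$$\frac{\partial L}{\partial w_{1j}^{(2)}} = \left( f(\mathbf{w}_1^{(2)} \mathbf{y} + b_1^{(2)}) - t_1 \right) f'(\mathbf{w}_1^{(2)} \mathbf{y} + b_1^{(2)}) \, y_j$$

If $i = 2$, that is, we differentiate with respect to an element of the second row, then similarly

$$\frac{\partial L}{\partial w_{2j}^{(2)}} = \left( f(\mathbf{w}_2^{(2)} \mathbf{y} + b_2^{(2)}) - t_2 \right) f'(\mathbf{w}_2^{(2)} \mathbf{y} + b_2^{(2)}) \, y_j$$

Both cases can be written as

$$\frac{\partial L}{\partial w_{ij}^{(2)}} = \left( f(\mathbf{w}_i^{(2)} \mathbf{y} + b_i^{(2)}) - t_i \right) f'(\mathbf{w}_i^{(2)} \mathbf{y} + b_i^{(2)}) \, y_j$$

The first subscript $i$ selects row $i$ of the weight matrix and component $i$ of the bias vector; the second subscript $j$ selects component $j$ of the vector $\mathbf{y}$. The result can therefore be viewed as a column vector multiplied by a row vector. In matrix form:

$$\nabla_{W^{(2)}} L = \left( \left( f(W^{(2)} \mathbf{y} + \mathbf{b}^{(2)}) - \mathbf{t} \right) \odot f'(W^{(2)} \mathbf{y} + \mathbf{b}^{(2)}) \right) \mathbf{y}^T$$

Here $\odot$ is the element-wise product of two vectors, and the second product is a matrix product. $f(W^{(2)} \mathbf{y} + \mathbf{b}^{(2)}) - \mathbf{t}$ is a 2-dimensional column vector, and so is $f'(W^{(2)} \mathbf{y} + \mathbf{b}^{(2)})$, so their $\odot$ product is again a 2-dimensional column vector. $\mathbf{y}$ is a column vector with 4 elements, so its transpose is a 4-dimensional row vector, and the product of the 2-dimensional column vector with $\mathbf{y}^T$ is a $2 \times 4$ matrix, exactly the size of $W^{(2)}$.

In this formula the partial derivative with respect to a weight is made of 3 parts: the error between the network output and the true label, $f(W^{(2)} \mathbf{y} + \mathbf{b}^{(2)}) - \mathbf{t}$; the derivative of the activation function, $f'(W^{(2)} \mathbf{y} + \mathbf{b}^{(2)})$; and the input of the current layer, $\mathbf{y}$. The network output, the derivative of the activation function, and the input of the current layer are all available from the forward pass, so the gradient can be computed efficiently. Averaging the partial derivatives over all training samples gives the total partial derivative.

The partial derivative with respect to the bias is obtained the same way. If $i = 1$, the second squared term is constant with respect to $b_1^{(2)}$, so

$$\frac{\partial L}{\partial b_1^{(2)}} = \left( f(\mathbf{w}_1^{(2)} \mathbf{y} + b_1^{(2)}) - t_1 \right) f'(\mathbf{w}_1^{(2)} \mathbf{y} + b_1^{(2)})$$

and the case $i = 2$ is analogous. Together:

$$\frac{\partial L}{\partial b_i^{(2)}} = \left( f(\mathbf{w}_i^{(2)} \mathbf{y} + b_i^{(2)}) - t_i \right) f'(\mathbf{w}_i^{(2)} \mathbf{y} + b_i^{(2)})$$

In matrix form:

$$\nabla_{\mathbf{b}^{(2)}} L = \left( f(W^{(2)} \mathbf{y} + \mathbf{b}^{(2)}) - \mathbf{t} \right) \odot f'(W^{(2)} \mathbf{y} + \mathbf{b}^{(2)})$$

The derivative with respect to the bias has two parts, the error between the prediction and the true value and the derivative of the activation function; the only difference from the weight gradient is that the factor $\mathbf{y}^T$ is absent.

Next we compute the partial derivatives with respect to $W^{(1)}$ and $\mathbf{b}^{(1)}$. Because these sit in the inner layer of the composite function, the situation is more involved. $W^{(1)}$ is a $4 \times 3$ matrix with 4 row vectors $\mathbf{w}_1^{(1)}, \mathbf{w}_2^{(1)}, \mathbf{w}_3^{(1)}, \mathbf{w}_4^{(1)}$. The bias $\mathbf{b}^{(1)}$ is a 4-dimensional vector with components $b_1^{(1)}, b_2^{(1)}, b_3^{(1)}, b_4^{(1)}$. The partial derivative of the loss with respect to an element of $W^{(1)}$ is

$$\frac{\partial L}{\partial w_{ij}^{(1)}} = \frac{\partial}{\partial w_{ij}^{(1)}} \, \frac{1}{2}\left[ \left( f(\mathbf{w}_1^{(2)} \mathbf{y} + b_1^{(2)}) - t_1 \right)^2 + \left( f(\mathbf{w}_2^{(2)} \mathbf{y} + b_2^{(2)}) - t_2 \right)^2 \right]$$

where $\mathbf{y} = f(W^{(1)} \mathbf{x} + \mathbf{b}^{(1)})$. Both squared terms contain $\mathbf{y}$, so both depend on $W^{(1)}$. For brevity, let

$$u_1^{(2)} = \mathbf{w}_1^{(2)} \mathbf{y} + b_1^{(2)}, \qquad u_2^{(2)} = \mathbf{w}_2^{(2)} \mathbf{y} + b_2^{(2)}$$

By the chain rule,

$$\frac{\partial L}{\partial w_{ij}^{(1)}} = \left( f(u_1^{(2)}) - t_1 \right) f'(u_1^{(2)}) \frac{\partial u_1^{(2)}}{\partial w_{ij}^{(1)}} + \left( f(u_2^{(2)}) - t_2 \right) f'(u_2^{(2)}) \frac{\partial u_2^{(2)}}{\partial w_{ij}^{(1)}}$$

Here $f(u_k^{(2)}) - t_k$ and $f'(u_k^{(2)})$ are scalars, each $u_k^{(2)}$ is the inner product of two vectors plus a bias, and every component of $\mathbf{y}$ is a function of $W^{(1)}$. Next,

$$\frac{\partial u_1^{(2)}}{\partial w_{ij}^{(1)}} = \mathbf{w}_1^{(2)} \frac{\partial \mathbf{y}}{\partial w_{ij}^{(1)}}$$

where $\partial \mathbf{y} / \partial w_{ij}^{(1)}$ is a vector: each component of $\mathbf{y}$ differentiated with respect to $w_{ij}^{(1)}$. When $i = 1$:

$$\frac{\partial \mathbf{y}}{\partial w_{1j}^{(1)}} = \left[ f'(\mathbf{w}_1^{(1)} \mathbf{x} + b_1^{(1)}) \, x_j, \; 0, \; 0, \; 0 \right]^T$$

since the last 3 components are constant with respect to $w_{1j}^{(1)}$. Similarly, when $i = 2$:

$$\frac{\partial \mathbf{y}}{\partial w_{2j}^{(1)}} = \left[ 0, \; f'(\mathbf{w}_2^{(1)} \mathbf{x} + b_2^{(1)}) \, x_j, \; 0, \; 0 \right]^T$$

and likewise for $i = 3$ and $i = 4$. Putting these together:

$$\frac{\partial u_1^{(2)}}{\partial w_{ij}^{(1)}} = w_{1i}^{(2)} f'(\mathbf{w}_i^{(1)} \mathbf{x} + b_i^{(1)}) \, x_j, \qquad \frac{\partial u_2^{(2)}}{\partial w_{ij}^{(1)}} = w_{2i}^{(2)} f'(\mathbf{w}_i^{(1)} \mathbf{x} + b_i^{(1)}) \, x_j$$

Letting $\mathbf{u}^{(1)} = W^{(1)} \mathbf{x} + \mathbf{b}^{(1)}$ and $\mathbf{u}^{(2)} = W^{(2)} \mathbf{y} + \mathbf{b}^{(2)}$ and combining, we get

$$\frac{\partial L}{\partial w_{ij}^{(1)}} = \left[ \left( f(u_1^{(2)}) - t_1 \right) f'(u_1^{(2)}) \, w_{1i}^{(2)} + \left( f(u_2^{(2)}) - t_2 \right) f'(u_2^{(2)}) \, w_{2i}^{(2)} \right] f'(u_i^{(1)}) \, x_j$$

In matrix form:

$$\nabla_{W^{(1)}} L = \left( \left( W^{(2)} \right)^T \left( \left( f(\mathbf{u}^{(2)}) - \mathbf{t} \right) \odot f'(\mathbf{u}^{(2)}) \right) \odot f'(\mathbf{u}^{(1)}) \right) \mathbf{x}^T$$

Finally, for the bias, the same steps give

$$\frac{\partial L}{\partial b_i^{(1)}} = \left[ \left( f(u_1^{(2)}) - t_1 \right) f'(u_1^{(2)}) \, w_{1i}^{(2)} + \left( f(u_2^{(2)}) - t_2 \right) f'(u_2^{(2)}) \, w_{2i}^{(2)} \right] f'(u_i^{(1)})$$

and in matrix form:

$$\nabla_{\mathbf{b}^{(1)}} L = \left( W^{(2)} \right)^T \left( \left( f(\mathbf{u}^{(2)}) - \mathbf{t} \right) \odot f'(\mathbf{u}^{(2)}) \right) \odot f'(\mathbf{u}^{(1)})$$

We now have the partial derivatives of this simple network with respect to all of its parameters, and next we generalize the approach. The results above show a pattern: the gradient formulas for the output layer's weight matrix and bias vector share the factor $\left( f(\mathbf{u}^{(2)}) - \mathbf{t} \right) \odot f'(\mathbf{u}^{(2)})$, and the hidden layer has a similar shared factor.

The complete algorithm

Now consider the general case. Suppose there are $m$ training samples $(\mathbf{x}_i, \mathbf{t}_i)$, where $\mathbf{x}_i$ is an input vector and $\mathbf{t}_i$ is its label vector. The goal of training is to minimize the error between the labels and the network's predictions. With the mean squared error, the objective is

$$L(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left\| h(\mathbf{x}_i) - \mathbf{t}_i \right\|^2$$

where $\theta$ is the set of all parameters of the network, including the weights and biases of every layer. This is an unconstrained optimization problem and can be solved with gradient descent. The error function above is defined over the whole training set; gradient descent that uses all training samples in every iteration is called batch gradient descent. When the number of samples is large, using all of them in every iteration is too expensive. To address this, single-sample (stochastic) gradient descent can be used. We write the loss as a sum of single-sample losses:

$$L(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| h(\mathbf{x}_i) - \mathbf{t}_i \right\|^2$$

and define the loss for a single sample $(\mathbf{x}_i, \mathbf{t}_i)$ as

$$L_i = L(\theta, \mathbf{x}_i, \mathbf{t}_i) = \frac{1}{2} \left\| h(\mathbf{x}_i) - \mathbf{t}_i \right\|^2$$

When iterating on a single sample, the parameter update at iteration $k$ of gradient descent is

$$\theta_{k+1} = \theta_k - \eta \nabla_{\theta} L_i(\theta_k)$$

To iterate on all samples, compute the gradient of the total loss from the single-sample gradients, as their mean:

$$\nabla_{\theta} L = \frac{1}{m} \sum_{i=1}^{m} \nabla_{\theta} L_i$$

Gradient descent requires initial values for the optimization variables. They are usually initialized randomly, for example by sampling from a normal distribution $N(0, \sigma^2)$, where $\sigma$ is a small positive number.

One key problem remains. The objective is a multi-layer composite function: every layer has a weight matrix and a bias vector, and the output of each layer is the input of the next. Computing the gradient of the loss with respect to all weights and biases directly is therefore complicated, and it has to be done recursively using the differentiation rules for composite functions.

Several important results

Before the derivation, we look at the derivatives of a few composite functions. Consider the linear map

$$\mathbf{y} = W \mathbf{x}$$

where $\mathbf{x}$ is an $n$-dimensional vector, $W$ is an $m \times n$ matrix, and $\mathbf{y}$ is an $m$-dimensional vector.

Question 1: Suppose there is a function $g(\mathbf{y})$. Treating $\mathbf{x}$ as constant and $\mathbf{y}$ as a function of $W$, how do we compute the gradient $\nabla_W g$ from the gradient $\nabla_{\mathbf{y}} g$? By the chain rule, since $w_{ij}$ affects only $y_i$ and no other $y_k$, $k \neq i$, we have

$$\frac{\partial g}{\partial w_{ij}} = \sum_{k=1}^{m} \frac{\partial g}{\partial y_k} \frac{\partial y_k}{\partial w_{ij}} = \frac{\partial g}{\partial y_i} \frac{\partial \left( \sum_{l=1}^{n} w_{il} x_l \right)}{\partial w_{ij}} = \frac{\partial g}{\partial y_i} x_j$$

For all elements of $W$:

$$\begin{bmatrix} \frac{\partial g}{\partial w_{11}} & \cdots & \frac{\partial g}{\partial w_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial g}{\partial w_{m1}} & \cdots & \frac{\partial g}{\partial w_{mn}} \end{bmatrix} = \begin{bmatrix} \frac{\partial g}{\partial y_1} x_1 & \cdots & \frac{\partial g}{\partial y_1} x_n \\ \vdots & \ddots & \vdots \\ \frac{\partial g}{\partial y_m} x_1 & \cdots & \frac{\partial g}{\partial y_m} x_n \end{bmatrix}$$

In matrix form:

$$\nabla_W g = \left( \nabla_{\mathbf{y}} g \right) \mathbf{x}^T$$

Question 2: Treating $W$ as constant and $\mathbf{y}$ as a function of $\mathbf{x}$, how do we compute $\nabla_{\mathbf{x}} g$ from $\nabla_{\mathbf{y}} g$? Every $x_i$ affects every $y_j$, so by the chain rule

$$\frac{\partial g}{\partial x_i} = \sum_{j=1}^{m} \frac{\partial g}{\partial y_j} \frac{\partial y_j}{\partial x_i} = \sum_{j=1}^{m} w_{ji} \frac{\partial g}{\partial y_j}$$

In matrix form:

$$\nabla_{\mathbf{x}} g = W^T \nabla_{\mathbf{y}} g$$

This is a symmetric result: the forward mapping multiplies the vector $\mathbf{x}$ by the matrix $W$ to get $\mathbf{y}$, and the gradient computation multiplies the gradient with respect to $\mathbf{y}$ by the transpose of $W$ to get the gradient with respect to $\mathbf{x}$.

Question 3: Suppose there is a vector-to-vector mapping: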
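The gradient formulas for the small example network described above (3 inputs, 4 hidden neurons, 2 outputs, sigmoid activation, squared-error loss) can be checked numerically. What follows is a minimal sketch, not code from the original document: every function name, the input `x`, the label `t`, and the initialization scale 0.1 are assumptions chosen for illustration. It computes the weight and bias gradients using the error terms of the two layers and compares the weight gradients against central finite differences.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def affine(W, v, b):
    # W v + b, with W stored as a list of rows
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def forward(W1, b1, W2, b2, x):
    y = [sigmoid(u) for u in affine(W1, x, b1)]  # hidden layer output
    z = [sigmoid(u) for u in affine(W2, y, b2)]  # network output
    return y, z


def loss(W1, b1, W2, b2, x, t):
    # Single-sample loss: half the squared distance between output and label
    _, z = forward(W1, b1, W2, b2, x)
    return 0.5 * sum((zi - ti) ** 2 for zi, ti in zip(z, t))


def backward(W1, b1, W2, b2, x, t):
    y, z = forward(W1, b1, W2, b2, x)
    # For the sigmoid, f'(u) = f(u) * (1 - f(u)), so f' comes from y and z.
    delta2 = [(zi - ti) * zi * (1.0 - zi) for zi, ti in zip(z, t)]
    grad_W2 = [[d * yj for yj in y] for d in delta2]  # delta2 y^T
    back = [sum(W2[k][i] * delta2[k] for k in range(len(delta2)))
            for i in range(len(y))]                   # W2^T delta2
    delta1 = [s * yi * (1.0 - yi) for s, yi in zip(back, y)]
    grad_W1 = [[d * xj for xj in x] for d in delta1]  # delta1 x^T
    # The bias gradients are the error terms themselves.
    return grad_W1, delta1, grad_W2, delta2


def numeric_grad(M, i, j, loss_fn, eps=1e-6):
    # Central difference on one entry of matrix M (restored afterwards)
    saved = M[i][j]
    M[i][j] = saved + eps
    up = loss_fn()
    M[i][j] = saved - eps
    down = loss_fn()
    M[i][j] = saved
    return (up - down) / (2.0 * eps)


# Hypothetical data: small Gaussian initial weights, one made-up sample.
rng = random.Random(0)
W1 = [[rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(4)]
b1 = [rng.gauss(0.0, 0.1) for _ in range(4)]
W2 = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(2)]
b2 = [rng.gauss(0.0, 0.1) for _ in range(2)]
x, t = [0.5, -1.0, 2.0], [1.0, 0.0]

grad_W1, grad_b1, grad_W2, grad_b2 = backward(W1, b1, W2, b2, x, t)
loss_fn = lambda: loss(W1, b1, W2, b2, x, t)
max_diff = max(
    abs(G[i][j] - numeric_grad(M, i, j, loss_fn))
    for M, G in ((W1, grad_W1), (W2, grad_W2))
    for i in range(len(M))
    for j in range(len(M[0]))
)
print("largest analytic-vs-numeric difference:", max_diff)
```

A small difference suggests the analytic gradients agree with the finite-difference estimates. Plain lists are used here only to keep the sketch dependency-free; an array library would express the same products more compactly.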


