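Density estimation is a fundamental problem in statistics. Methods divide into parametric and nonparametric estimation; among the latter, histogram, kernel, and k-nearest-neighbor estimators have received the most attention[1-2]. As wavelet theory has matured, properties such as orthogonality, compact support, and multiresolution analysis (MRA) have made wavelet methods an active topic in nonparametric statistics and econometrics in recent years. In 1988, Doukhan[3] first proposed the concept of wavelet density estimation. Many researchers subsequently studied it and proved convergence rates[4-5]. However, a single (scalar) wavelet cannot, apart from the Haar wavelet, be orthogonal, symmetric, and compactly supported at the same time, which causes considerable difficulty in applications; biorthogonal wavelets and multiwavelets were proposed in response.
Locke et al.[6] were the first to apply multiwavelets to density estimation: they gave an expression for the estimator and compared wavelet and multiwavelet estimates in simulation experiments. Huang Shouyong et al.[7] established the convergence rate of the linear multiwavelet density estimator, with proof. Building on the multiwavelet density estimator of Ref. [6] and the construction of a class of bivariate multiwavelets given by Lyu Jun et al.[8], this paper proposes a class of bivariate multiwavelet density estimators and proves an upper bound on their convergence rate in the sense of mean integrated squared error (MISE). Experiments on simulated and real data support the feasibility of the method and the estimator.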
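1 Wavelet density estimation
1.1 Bivariate single-wavelet density estimation
Let $\{(X_i, Y_i)\}_{i=1}^n$ be sample observations of the bivariate random variable $(X, Y)$ with density function $f(x, y)$, where $f(x, y) \in L^2(\mathbf{R}^2)$ and $L^2(\mathbf{R}^2)$ denotes the space of square-integrable functions on $\mathbf{R}^2$. Bivariate wavelet functions for $L^2(\mathbf{R}^2)$ are constructed as tensor products of univariate wavelet functions; the scaling function $\varphi(x, y)$ and the wavelet functions $\psi_l(x, y)$, $l = 1, 2, 3$, are defined as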
$ \begin{array}{l} \varphi (x,y) = \varphi (x)\varphi (y)\\ \psi_l(x,y) = \left\{ \begin{array}{ll} \varphi (x)\psi (y), & l = 1\\ \psi (x)\varphi (y), & l = 2\\ \psi (x)\psi (y), & l = 3 \end{array} \right. \end{array} $ |
Denote
$ \begin{array}{l} \varphi_{j,k_1,k_2}(x,y) = 2^j\varphi (2^j x - k_1, 2^j y - k_2)\\ \psi_{l,j,k_1,k_2}(x,y) = 2^j\psi_l(2^j x - k_1, 2^j y - k_2) \end{array} $ |
where $j \in \mathbf{Z}$ is the resolution level and $k_1, k_2 \in \mathbf{Z}$ are the translation parameters.
For any integers $J > j_0$, $j_0 \in \mathbf{Z}$, the definition of multiresolution analysis[9] implies that the projection of a real-valued bivariate function $f(x, y) \in L^2(\mathbf{R}^2)$ onto the space $V_J$ can be expanded as
$ \begin{array}{l} P_J f(x,y) = \sum\limits_{k_1,k_2} \alpha_{J,k_1,k_2}\varphi_{J,k_1,k_2}(x,y)\\ \quad = \sum\limits_{k_1,k_2} \alpha_{j_0,k_1,k_2}\varphi_{j_0,k_1,k_2}(x,y) + \sum\limits_{l = 1}^3 \sum\limits_{j = j_0}^{J - 1} \sum\limits_{k_1,k_2} \beta_{l,j,k_1,k_2}\psi_{l,j,k_1,k_2}(x,y) \end{array} $ | (1) |
By the orthogonality of the scaling and wavelet functions,
$ \begin{array}{l} \alpha_{j_0,k_1,k_2} = \langle f(x,y),\varphi_{j_0,k_1,k_2}(x,y)\rangle = \int\!\!\int f(x,y)\varphi_{j_0,k_1,k_2}(x,y){\rm d}x{\rm d}y\\ \beta_{l,j,k_1,k_2} = \langle f(x,y),\psi_{l,j,k_1,k_2}(x,y)\rangle = \int\!\!\int f(x,y)\psi_{l,j,k_1,k_2}(x,y){\rm d}x{\rm d}y \end{array} $ |
Because $f$ is the density of $(X, Y)$, the definition of mathematical expectation gives
$ \begin{array}{l} \alpha_{j_0,k_1,k_2} = \int\!\!\int f(x,y)\varphi_{j_0,k_1,k_2}(x,y){\rm d}x{\rm d}y = E(\varphi_{j_0,k_1,k_2}(X,Y))\\ \beta_{l,j,k_1,k_2} = \int\!\!\int f(x,y)\psi_{l,j,k_1,k_2}(x,y){\rm d}x{\rm d}y = E(\psi_{l,j,k_1,k_2}(X,Y)) \end{array} $ |
Given the sample $\{(X_i, Y_i)\}_{i=1}^n$, the method of moments gives the coefficient estimates
$ \begin{array}{*{20}{l}} {{{\hat \alpha }_{{j_0},{k_1},{k_2}}} = \frac{1}{n}\sum\limits_{i = 1}^n {{\varphi _{{j_0},{k_1},{k_2}}}} ({X_i},{Y_i})}\\ {{{\hat \beta }_{l,j,{k_1},{k_2}}} = \frac{1}{n}\sum\limits_{i = 1}^n {{\psi _{l,j,{k_1},{k_2}}}} ({X_i},{Y_i})} \end{array} $ |
The sample estimate of Eq. (1) is then
$ \begin{array}{l} \hat f_J(x,y) = \sum\limits_{k_1,k_2} \hat \alpha_{J,k_1,k_2}\varphi_{J,k_1,k_2}(x,y)\\ \quad = \sum\limits_{k_1,k_2} \hat \alpha_{j_0,k_1,k_2}\varphi_{j_0,k_1,k_2}(x,y) + \sum\limits_{l = 1}^3 \sum\limits_{j = j_0}^{J - 1} \sum\limits_{k_1,k_2} \hat \beta_{l,j,k_1,k_2}\psi_{l,j,k_1,k_2}(x,y) \end{array} $ | (2) |
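The scaling-only form of Eq. (2) (the first sum, at level $J$) can be sketched in a few lines. The sketch below assumes the Haar scaling function $\varphi = 1_{[0,1)}$ in place of the Daubechies bases used in the experiments, and assumes the data lie in $[0,1)^2$; the function name `haar_density_estimate` and its arguments are illustrative and not part of the source.

```python
import numpy as np

def haar_density_estimate(samples, J, grid):
    """Evaluate f_J = sum_k alpha_hat_{J,k1,k2} * phi_{J,k1,k2} at the grid points.

    Assumption: Haar scaling function, so phi_{J,k1,k2} equals 2^J on the cell
    [k1/2^J, (k1+1)/2^J) x [k2/2^J, (k2+1)/2^J) and 0 elsewhere.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    m = 2 ** J
    # Cell index (k1, k2) of every sample; points on the upper edge go to the last cell.
    idx = np.clip(np.floor(samples * m).astype(int), 0, m - 1)
    counts = np.zeros((m, m))
    np.add.at(counts, (idx[:, 0], idx[:, 1]), 1.0)
    # Moment estimate: alpha_hat = (1/n) * sum_i phi_{J,k1,k2}(X_i, Y_i).
    alpha_hat = (2.0 ** J) * counts / len(samples)
    # At a grid point only the basis function of its own cell is nonzero.
    g = np.clip(np.floor(grid * m).astype(int), 0, m - 1)
    return (2.0 ** J) * alpha_hat[g[:, 0], g[:, 1]]
```

With the Haar basis this estimator coincides with a histogram on a dyadic grid, which makes it a convenient reference point before moving to smoother bases.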
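1.2 Bivariate multiwavelet density estimation
A multiwavelet is a special kind of wavelet whose basis functions are generated by vector-valued functions[10-12]. Let the multiscaling vector function and the multiwavelet vector function be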
$ \begin{array}{*{20}{l}} {\mathit{\boldsymbol{ \boldsymbol{\varPhi} }} = {{[{\mathit{\boldsymbol{ \boldsymbol{\varPhi} }}^1},{\mathit{\boldsymbol{ \boldsymbol{\varPhi} }}^2}, \cdots ,{\mathit{\boldsymbol{ \boldsymbol{\varPhi} }}^r}]}^{\rm{T}}},r \in {\bf{Z}}}\\ {\mathit{\boldsymbol{ \boldsymbol{\varPsi} }} = {{[{\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}^1},{\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}^2}, \cdots ,{\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}^r}]}^{\rm{T}}},r \in {\bf{Z}}} \end{array} $ |
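The class of bivariate multiscaling functions and bivariate multiwavelet functions used here is constructed by the method of Ref. [8], namely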
$ \begin{array}{*{20}{l}} {\mathit{\boldsymbol{ \boldsymbol{\varPhi} }}(x,y) = \mathit{\boldsymbol{ \boldsymbol{\varPhi} }}(x)\varphi (y)}\\ {{\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}_l}(x,y) = \left\{ {\begin{array}{*{20}{l}} {\mathit{\boldsymbol{ \boldsymbol{\varPhi} }}(x)\psi (y),}&{l = 1}\\ {\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}(x)\varphi (y),}&{l = 2}\\ {\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}(x)\psi (y),}&{l = 3} \end{array}} \right.} \end{array} $ |
Denote
$ \begin{array}{l} \boldsymbol{\varPhi}_{j,k_1,k_2}(x,y) = 2^j\boldsymbol{\varPhi}(2^j x - k_1, 2^j y - k_2)\\ \boldsymbol{\varPsi}_{l,j,k_1,k_2}(x,y) = 2^j\boldsymbol{\varPsi}_l(2^j x - k_1, 2^j y - k_2) \end{array} $ |
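Definition 1 A multiresolution analysis of multiplicity $r$ is a nested sequence of closed subspaces $V_j$ of $L^2(\mathbf{R}^2)$ satisfying:
(1) $V_j \subset V_{j+1}$, $j \in \mathbf{Z}$;
(2) $\overline{\bigcup_{j \in \mathbf{Z}} V_j} = L^2(\mathbf{R}^2)$ and $\bigcap_{j \in \mathbf{Z}} V_j = \{0\}$;
(3) $h(x, y) \in V_j \Leftrightarrow h(2x, 2y) \in V_{j+1}$, $j \in \mathbf{Z}$;
(4) $h(x, y) \in V_j \Leftrightarrow h(x - 2^{-j}k_1, y - 2^{-j}k_2) \in V_j$, $k_1, k_2 \in \mathbf{Z}$;
(5) there exist $r$ functions $\boldsymbol{\varPhi}^1(x, y), \boldsymbol{\varPhi}^2(x, y), \cdots, \boldsymbol{\varPhi}^r(x, y)$ such that $\{\boldsymbol{\varPhi}^w(x - k_1, y - k_2), 1 \le w \le r, k_1, k_2 \in \mathbf{Z}\}$ is an orthonormal basis of $V_0$.
Let $W_j$ be the orthogonal complement of $V_j$ in $V_{j+1}$. Then $L^2(\mathbf{R}^2)$ decomposes into the direct sum of the spaces $W_j$, that is,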
$ {L^2}({\mathit{\boldsymbol{R}}^2}) = \mathop \oplus \limits_{j \in {\bf{Z}}} {W_j} $ |
Hence the projection of any real-valued bivariate function $f(x, y) \in L^2(\mathbf{R}^2)$ onto the space $V_J$ can be expanded as
$ \begin{array}{l} P_J f = \sum\limits_{k_1,k_2} \underline{\boldsymbol{\alpha}}_{J,k_1,k_2}^{\rm T}\boldsymbol{\varPhi}_{J,k_1,k_2}(x,y)\\ \quad = \sum\limits_{k_1,k_2} \underline{\boldsymbol{\alpha}}_{j_0,k_1,k_2}^{\rm T}\boldsymbol{\varPhi}_{j_0,k_1,k_2}(x,y) + \sum\limits_{l = 1}^3 \sum\limits_{j = j_0}^{J - 1} \sum\limits_{k_1,k_2} \underline{\boldsymbol{\beta}}_{l,j,k_1,k_2}^{\rm T}\boldsymbol{\varPsi}_{l,j,k_1,k_2}(x,y)\\ \quad = \sum\limits_{k_1,k_2} \left( \begin{array}{c} \alpha_{j_0,k_1,k_2}^1\\ \vdots \\ \alpha_{j_0,k_1,k_2}^r \end{array} \right)^{\rm T} \left( \begin{array}{c} \boldsymbol{\varPhi}_{j_0,k_1,k_2}^1(x,y)\\ \vdots \\ \boldsymbol{\varPhi}_{j_0,k_1,k_2}^r(x,y) \end{array} \right) + \sum\limits_{l = 1}^3 \sum\limits_{j = j_0}^{J - 1} \sum\limits_{k_1,k_2} \left( \begin{array}{c} \beta_{l,j,k_1,k_2}^1\\ \vdots \\ \beta_{l,j,k_1,k_2}^r \end{array} \right)^{\rm T} \left( \begin{array}{c} \boldsymbol{\varPsi}_{l,j,k_1,k_2}^1(x,y)\\ \vdots \\ \boldsymbol{\varPsi}_{l,j,k_1,k_2}^r(x,y) \end{array} \right) \end{array} $ | (3) |
By the orthogonality of the multiscaling and multiwavelet functions,
$ \begin{array}{l} \underline{\boldsymbol{\alpha}}_{j_0,k_1,k_2} = \langle f(x,y),\boldsymbol{\varPhi}_{j_0,k_1,k_2}(x,y)\rangle = \int\!\!\int f(x,y)\boldsymbol{\varPhi}_{j_0,k_1,k_2}(x,y){\rm d}x{\rm d}y = E(\boldsymbol{\varPhi}_{j_0,k_1,k_2}(X,Y))\\ \quad \buildrel \Delta \over = \left( \begin{array}{c} E(\boldsymbol{\varPhi}_{j_0,k_1,k_2}^1(X,Y))\\ \vdots \\ E(\boldsymbol{\varPhi}_{j_0,k_1,k_2}^r(X,Y)) \end{array} \right) = \left( \begin{array}{c} \int\!\!\int f(x,y)\boldsymbol{\varPhi}_{j_0,k_1,k_2}^1(x,y){\rm d}x{\rm d}y\\ \vdots \\ \int\!\!\int f(x,y)\boldsymbol{\varPhi}_{j_0,k_1,k_2}^r(x,y){\rm d}x{\rm d}y \end{array} \right)\\ \underline{\boldsymbol{\beta}}_{l,j,k_1,k_2} = \langle f(x,y),\boldsymbol{\varPsi}_{l,j,k_1,k_2}(x,y)\rangle = \int\!\!\int f(x,y)\boldsymbol{\varPsi}_{l,j,k_1,k_2}(x,y){\rm d}x{\rm d}y = E(\boldsymbol{\varPsi}_{l,j,k_1,k_2}(X,Y))\\ \quad \buildrel \Delta \over = \left( \begin{array}{c} E(\boldsymbol{\varPsi}_{l,j,k_1,k_2}^1(X,Y))\\ \vdots \\ E(\boldsymbol{\varPsi}_{l,j,k_1,k_2}^r(X,Y)) \end{array} \right) = \left( \begin{array}{c} \int\!\!\int f(x,y)\boldsymbol{\varPsi}_{l,j,k_1,k_2}^1(x,y){\rm d}x{\rm d}y\\ \vdots \\ \int\!\!\int f(x,y)\boldsymbol{\varPsi}_{l,j,k_1,k_2}^r(x,y){\rm d}x{\rm d}y \end{array} \right) \end{array} $ |
Given the sample $\{(X_i, Y_i)\}_{i=1}^n$, the method of moments again gives the coefficient estimates
$ \begin{array}{*{20}{l}} {{{\underline {\hat \alpha } }_{{j_0},{k_1},{k_2}}} = \frac{1}{n}\sum\limits_{i = 1}^n {{\mathit{\boldsymbol{ \boldsymbol{\varPhi} }}_{{j_0},{k_1},{k_2}}}} ({X_i},{Y_i})}\\ {{{\underline {\hat \beta } }_{l,j,{k_1},{k_2}}} = \frac{1}{n}\sum\limits_{i = 1}^n {{\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}_{l,j,{k_1},{k_2}}}} ({X_i},{Y_i})} \end{array} $ |
This yields the sample estimate of Eq. (3):
$ \begin{array}{l} \hat f_J(x,y) = \sum\limits_{k_1,k_2} \underline{\boldsymbol{\hat \alpha}}_{J,k_1,k_2}^{\rm T}\boldsymbol{\varPhi}_{J,k_1,k_2}(x,y)\\ \quad = \sum\limits_{k_1,k_2} \underline{\boldsymbol{\hat \alpha}}_{j_0,k_1,k_2}^{\rm T}\boldsymbol{\varPhi}_{j_0,k_1,k_2}(x,y) + \sum\limits_{l = 1}^3 \sum\limits_{j = j_0}^{J - 1} \sum\limits_{k_1,k_2} \underline{\boldsymbol{\hat \beta}}_{l,j,k_1,k_2}^{\rm T}\boldsymbol{\varPsi}_{l,j,k_1,k_2}(x,y) \end{array} $ | (4) |
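The scaling-only form of Eq. (4) can be illustrated with a small multiplicity-2 example. The sketch below is not the CL2*Db4 construction of the paper: it assumes the Legendre (Alpert-type) multiscaling pair $\boldsymbol{\varPhi}(x) = [1, \sqrt{3}(2x - 1)]^{\rm T}$ on $[0,1)$ for the $x$ direction and the Haar scaling function for the $y$ direction, and assumes data in $[0,1)^2$. All function names are hypothetical.

```python
import numpy as np

def multiscaling(t):
    """Assumed multiscaling vector Phi(t) = [1, sqrt(3)(2t - 1)] on [0, 1); shape (len(t), 2)."""
    t = np.asarray(t, dtype=float)
    inside = ((t >= 0.0) & (t < 1.0)).astype(float)
    return np.stack([inside, np.sqrt(3.0) * (2.0 * t - 1.0) * inside], axis=-1)

def basis_values(points, J):
    """Cell index (k1, k2) of each point and the values Phi^w_{J,k1,k2} (w = 1, 2) there."""
    pts = np.asarray(points, dtype=float)
    m = 2 ** J
    k = np.clip(np.floor(pts * m).astype(int), 0, m - 1)
    local_x = pts[:, 0] * m - k[:, 0]          # 2^J x - k1, lies in [0, 1)
    # The Haar factor phi(2^J y - k2) equals 1 inside the point's own cell.
    vals = (2.0 ** J) * multiscaling(local_x)
    return k, vals

def multiwavelet_density_estimate(samples, J, grid):
    """f_J(x, y) = sum_k alpha_hat_{J,k}^T Phi_{J,k}(x, y) with moment-estimated coefficients."""
    m = 2 ** J
    k, vals = basis_values(samples, J)
    alpha_hat = np.zeros((m, m, 2))
    np.add.at(alpha_hat, (k[:, 0], k[:, 1]), vals)   # sum Phi_{J,k}(X_i, Y_i) per cell
    alpha_hat /= len(vals)                           # divide by n
    kg, vg = basis_values(grid, J)
    return np.sum(alpha_hat[kg[:, 0], kg[:, 1]] * vg, axis=1)
```

Because the second component is linear in $x$, this basis reproduces densities that are linear in $x$ within each cell, which a single Haar scaling function cannot do; this is the kind of gain in approximation power that motivates vector-valued bases.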
Lemma 1[4] A real-valued bivariate function $f(x, y) \in L^2(\mathbf{R}^2)$ belongs to the Besov ball $B_{p,m}^s(M)$ of radius $M$ if and only if there exists a constant $M^*$, depending on $M$, such that the wavelet coefficients of $f(x, y)$ satisfy
$ \begin{array}{l} 2^{j_0(1 - 2/p)}\left( \sum\limits_{k_1,k_2 \in \mathbf{Z}} |\alpha_{j_0,k_1,k_2}|^p \right)^{1/p} + \left( \sum\limits_{l = 1}^3 \sum\limits_{j = j_0}^\infty \left( 2^{j(s + 1 - 2/p)}\left( \sum\limits_{k_1,k_2 \in \mathbf{Z}} |\beta_{l,j,k_1,k_2}|^p \right)^{1/p} \right)^m \right)^{1/m} \le M^* < \infty \end{array} $ |
where $s$ is the smoothness parameter and $p$, $m$ are the norm indices of the space, with $s > 0$, $1 \le p \le \infty$, $1 \le m \le \infty$.
Lemma 2 Let $\{(X_i, Y_i)\}_{i=1}^n$ be sample observations of a bivariate random variable, and let $J > j_0$, $k_1, k_2 \in \mathbf{Z}$, $1 \le w \le r$. Then $\hat \alpha_{J,k_1,k_2}^w$ is an unbiased estimator of $\alpha_{J,k_1,k_2}^w$.
Proof: Since the observations are identically distributed,
$ \begin{array}{l} E[\hat \alpha_{J,k_1,k_2}^w] = E\left[ \frac{1}{n}\sum\limits_{i = 1}^n \boldsymbol{\varPhi}_{J,k_1,k_2}^w(X_i,Y_i) \right] = E[\boldsymbol{\varPhi}_{J,k_1,k_2}^w(X_1,Y_1)]\\ \quad = \int\!\!\int \boldsymbol{\varPhi}_{J,k_1,k_2}^w(x,y)f(x,y){\rm d}x{\rm d}y = \alpha_{J,k_1,k_2}^w \end{array} $ |
Hence
$ E[\hat \alpha _{J,{k_1},{k_2}}^w] = \alpha _{J,{k_1},{k_2}}^w $ |
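Lemma 3 Suppose the density function of the bivariate random variable is bounded above, and let $J > j_0$, $k_1, k_2 \in \mathbf{Z}$, $1 \le w \le r$. Then there exists a constant $C > 0$ such that $E(\hat \alpha_{J,k_1,k_2}^w - \alpha_{J,k_1,k_2}^w)^2 \le C/n$.
Proof:
From the scaling-coefficient estimate in Section 1.2,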
$ \hat \alpha _{J,{k_1},{k_2}}^w = \frac{1}{n}\sum\limits_{i = 1}^n {\mathit{\boldsymbol{ \boldsymbol{\varPhi} }}_{J,{k_1},{k_2}}^w} ({X_i},{Y_i}) $ |
By Lemma 2,
$ \begin{array}{l} E(\hat \alpha_{J,k_1,k_2}^w - \alpha_{J,k_1,k_2}^w)^2 = E(\hat \alpha_{J,k_1,k_2}^w - E(\hat \alpha_{J,k_1,k_2}^w))^2 = {\rm Var}(\hat \alpha_{J,k_1,k_2}^w) \end{array} $ |
Since the observations are independent and identically distributed and the basis functions have unit $L^2$ norm, Hölder's inequality gives
$ \begin{array}{l} {\rm Var}(\hat \alpha_{J,k_1,k_2}^w) = {\rm Var}\left( \frac{1}{n}\sum\limits_{i = 1}^n \boldsymbol{\varPhi}_{J,k_1,k_2}^w(X_i,Y_i) \right) = \frac{1}{n}{\rm Var}(\boldsymbol{\varPhi}_{J,k_1,k_2}^w(X,Y))\\ \quad = \frac{1}{n}\left[ E(\boldsymbol{\varPhi}_{J,k_1,k_2}^w(X,Y))^2 - (E\boldsymbol{\varPhi}_{J,k_1,k_2}^w(X,Y))^2 \right]\\ \quad = \frac{1}{n}\left[ \int\!\!\int (\boldsymbol{\varPhi}_{J,k_1,k_2}^w(x,y))^2 f(x,y){\rm d}x{\rm d}y - (\alpha_{J,k_1,k_2}^w)^2 \right]\\ \quad \le \frac{1}{n}\left[ \left\| f(x,y) \right\|_\infty \int\!\!\int (\boldsymbol{\varPhi}_{J,k_1,k_2}^w(x,y))^2{\rm d}x{\rm d}y - (\alpha_{J,k_1,k_2}^w)^2 \right] \le C\frac{1}{n} \end{array} $ |
In summary,
$ E{(\hat \alpha _{J,{k_1},{k_2}}^w - \alpha _{J,{k_1},{k_2}}^w)^2} \le C\frac{1}{n} $ |
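Lemmas 2 and 3 can be checked numerically in a simple special case. The sketch below assumes the uniform density on $[0,1)^2$ and the Haar scaling function (so $r = 1$), for which $\alpha_{J,0,0} = 2^{-J}$ and the variance bound reads $\left\| f \right\|_\infty / n = 1/n$; the function names and default arguments are illustrative only.

```python
import numpy as np

def alpha_hat_haar(samples, J, k1, k2):
    """Moment estimate of the Haar scaling coefficient alpha_{J,k1,k2}."""
    m = 2 ** J
    in_cell = (np.floor(samples[:, 0] * m) == k1) & (np.floor(samples[:, 1] * m) == k2)
    # phi_{J,k1,k2} equals 2^J on its cell, so the sample mean is 2^J * (fraction in cell).
    return (2.0 ** J) * in_cell.mean()

def simulate(n=500, J=2, reps=4000, seed=0):
    """Monte Carlo bias and mean squared error of alpha_hat for uniform data."""
    rng = np.random.default_rng(seed)
    est = np.array([alpha_hat_haar(rng.random((n, 2)), J, 0, 0) for _ in range(reps)])
    alpha = 2.0 ** (-J)                 # exact coefficient for the uniform density
    bias = est.mean() - alpha           # Lemma 2: should be close to 0
    mse = np.mean((est - alpha) ** 2)   # Lemma 3: should not exceed about 1/n
    return bias, mse, 1.0 / n

bias, mse, bound = simulate()
print(f"bias={bias:.5f}  mse={mse:.6f}  bound={bound:.6f}")
```

In this case the exact variance is $(1 - 2^{-2J})/n$, so the simulated mean squared error should sit slightly below the bound.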
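Theorem 1 Suppose the density function of the bivariate random variable $(X, Y)$ is bounded above and $f(x, y) \in B_{p,m}^s(M)$ with $s > 0$, $p = 2$, $m \ge 1$, and let $J$ satisfy $2^J \le n^{1/(2s+2)}$, $1 \le w \le r$. Then there exists a constant $C > 0$ such that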
$ E\left\| {f(x,y) - {{\hat f}_J}(x,y)} \right\|_2^2 \le C{n^{ - 2s/(2 + 2s)}} $ |
Proof:
$ \begin{array}{l} E\left\| f(x,y) - \hat f_J(x,y) \right\|_2^2 \le \left\| f(x,y) - P_J f \right\|_2^2 + E\left\| P_J f - \hat f_J(x,y) \right\|_2^2 = A + B \end{array} $ |
where
$ \begin{array}{l} A = \left\| f(x,y) - P_J f \right\|_2^2 = \left\| \sum\limits_{l = 1}^3 \sum\limits_{j = J}^\infty \sum\limits_{k_1,k_2} \underline{\boldsymbol{\beta}}_{l,j,k_1,k_2}^{\rm T}\boldsymbol{\varPsi}_{l,j,k_1,k_2} \right\|_2^2\\ \quad = \sum\limits_{l = 1}^3 \sum\limits_{j = J}^\infty \sum\limits_{k_1,k_2} \underline{\boldsymbol{\beta}}_{l,j,k_1,k_2}^{\rm T}\langle \boldsymbol{\varPsi}_{l,j,k_1,k_2},\boldsymbol{\varPsi}_{l,j,k_1,k_2}\rangle \underline{\boldsymbol{\beta}}_{l,j,k_1,k_2} = \sum\limits_{w = 1}^r \sum\limits_{l = 1}^3 \sum\limits_{j = J}^\infty \sum\limits_{k_1,k_2} (\beta_{l,j,k_1,k_2}^w)^2 \end{array} $ |
By Lemma 1,
$ A \le C{2^{ - 2Js}} $ |
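The step from Lemma 1 to this bound can be spelled out as follows. This is a sketch that assumes Lemma 1 is applied to each component $w$ of the multiwavelet coefficients (the lemma is stated for scalar coefficients); the explicit constant is one admissible choice, not one given in the source. With $p = 2$ the weight $2^{j(s + 1 - 2/p)}$ reduces to $2^{js}$, and each term of the $\ell^m$ sum is bounded by the whole sum, so

$ \begin{array}{l} \sum\limits_{k_1,k_2} (\beta_{l,j,k_1,k_2}^w)^2 \le (M^*)^2 2^{-2js},\quad j \ge j_0\\ A = \sum\limits_{w = 1}^r \sum\limits_{l = 1}^3 \sum\limits_{j = J}^\infty \sum\limits_{k_1,k_2} (\beta_{l,j,k_1,k_2}^w)^2 \le 3r(M^*)^2\sum\limits_{j = J}^\infty 2^{-2js} = \frac{3r(M^*)^2}{1 - 2^{-2s}}2^{-2Js} \end{array} $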
A direct calculation gives
$ \begin{array}{l} B = E\left\| P_J f - \hat f_J(x,y) \right\|_2^2 = E\left\| \sum\limits_{k_1,k_2} (\underline{\boldsymbol{\hat \alpha}}_{J,k_1,k_2} - \underline{\boldsymbol{\alpha}}_{J,k_1,k_2})^{\rm T}\boldsymbol{\varPhi}_{J,k_1,k_2}(x,y) \right\|_2^2\\ \quad = E\int\!\!\int (P_J f - \hat f_J(x,y))^2{\rm d}x{\rm d}y\\ \quad = \sum\limits_{k_1,k_2} E\left( (\underline{\boldsymbol{\hat \alpha}}_{J,k_1,k_2} - \underline{\boldsymbol{\alpha}}_{J,k_1,k_2})^{\rm T}\langle \boldsymbol{\varPhi}_{J,k_1,k_2},\boldsymbol{\varPhi}_{J,k_1,k_2}\rangle (\underline{\boldsymbol{\hat \alpha}}_{J,k_1,k_2} - \underline{\boldsymbol{\alpha}}_{J,k_1,k_2}) \right)\\ \quad = \sum\limits_{w = 1}^r \sum\limits_{k_1,k_2} E(\hat \alpha_{J,k_1,k_2}^w - \alpha_{J,k_1,k_2}^w)^2 \end{array} $ |
By Lemma 3, and because only of order $2^{2J}$ translates $(k_1, k_2)$ contribute at level $J$ when $f$ and the basis functions are compactly supported,
$ B \le C{2^{2J}}\frac{1}{n} $ |
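Since $2^J \le n^{1/(2s+2)}$, the term $B$ is at most $Cn^{-2s/(2+2s)}$; taking $J$ to be the largest such integer, so that $2^J$ is of order $n^{1/(2s+2)}$, the term $A$ obeys the same bound. Therefore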
$ E\left\| {f(x,y) - {{\hat f}_J}(x,y)} \right\|_2^2 \le C{n^{ - 2s/(2 + 2s)}} $ |
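The resolution rule in the theorem balances the two terms $2^{-2Js}$ and $2^{2J}/n$. A minimal sketch, assuming $J$ is taken as the largest integer with $2^J \le n^{1/(2s+2)}$ and ignoring the unknown constants $C$; the function names are illustrative.

```python
import math

def choose_resolution(n, s):
    """Largest integer J with 2**J <= n**(1/(2s+2))."""
    return math.floor(math.log2(n) / (2.0 * s + 2.0))

def risk_bound_terms(n, s, J):
    """Bias-type term 2^(-2Js) and variance-type term 2^(2J)/n, constants omitted."""
    return 2.0 ** (-2 * J * s), 2.0 ** (2 * J) / n

if __name__ == "__main__":
    for n in (256, 10_000, 1_000_000):
        J = choose_resolution(n, s=1.0)
        print(n, J, risk_bound_terms(n, 1.0, J))
```

When $n^{1/(2s+2)}$ is an exact power of two the two terms coincide; otherwise they differ by a bounded factor, which is absorbed into $C$.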
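This section uses simulations and a real-data example to illustrate the feasibility of the proposed method, and compares the wavelet and multiwavelet density estimators by root-mean-square error (RMSE), defined as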
$ {E_{{\rm{RMS}}}} = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^n {(\hat f(} {X_i},{Y_i}) - f({X_i},{Y_i}){)^2}} $ |
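The RMSE above is evaluated at the sample points. A minimal sketch, assuming the estimated and true density values at $(X_i, Y_i)$ are already available as arrays; the function name `rmse` is illustrative.

```python
import numpy as np

def rmse(f_hat_values, f_values):
    """Root-mean-square error between estimated and true density values at the samples."""
    f_hat_values = np.asarray(f_hat_values, dtype=float)
    f_values = np.asarray(f_values, dtype=float)
    return float(np.sqrt(np.mean((f_hat_values - f_values) ** 2)))
```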
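Example 1 Let the bivariate random variable $(X, Y)$ be uniformly distributed on $[0, 1] \times [0, 1]$, with density function
$ f(x,y) = \left\{ \begin{array}{ll} 1, & 0 \le x \le 1, 0 \le y \le 1\\ 0, & {\rm otherwise} \end{array} \right. $ |
This example uses the bivariate multiwavelet formed from CL2*Db4 (* denotes the product) and the bivariate single wavelet formed from Db4*Db8 to estimate the density, with resolution level J = 4 and sample size n = 10 000.
Fig. 1 shows the true density of the uniform distribution, and Figs. 2 and 3 show the multiwavelet and single-wavelet estimates, respectively; the RMSE is 0.164 for the multiwavelet and 0.388 for the single wavelet. Figs. 1-3 show that both estimates capture the behavior of uniformly distributed data. Because the scaling functions are compactly supported, however, the estimate becomes biased near the boundary when the translation vector is large, and the multiwavelet estimate deviates markedly less than the single-wavelet estimate.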
Example 2 Let the bivariate random variable $(X, Y)$ follow a normal distribution truncated to $x \in [0, 1]$, $y \in [0, 1]$, with density function
$ f(x,y) = \frac{1}{0.9974^2} \cdot \frac{1}{2\pi /36}{\rm e}^{ - \frac{(x - 0.5)^2 + (y - 0.5)^2}{2/36}} $ |
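The density of Example 2 corresponds to independent normal components with mean 0.5 and standard deviation 1/6, restricted to the unit square, so each axis covers three standard deviations on either side of the mean. The sketch below evaluates the density and checks that its total mass is close to 1; the constant 0.9974 is taken as printed in the source, and the function names are illustrative.

```python
import math
import numpy as np

SIGMA = 1.0 / 6.0   # so that 2 * SIGMA**2 = 2/36
TRUNC = 0.9974      # per-axis normalizing constant as printed in the source

def truncated_normal_density(x, y):
    """Density of Example 2 on [0, 1]^2; zero outside the square."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    gauss = np.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / (2.0 * SIGMA ** 2))
    gauss = gauss / (2.0 * math.pi * SIGMA ** 2)
    inside = (x >= 0.0) & (x <= 1.0) & (y >= 0.0) & (y <= 1.0)
    return np.where(inside, gauss / TRUNC ** 2, 0.0)

def total_mass():
    """Exact integral of the density over [0, 1]^2, computed with the error function."""
    per_axis = math.erf(0.5 / (SIGMA * math.sqrt(2.0)))   # P(|Z| <= 3) for a standard normal Z
    return per_axis ** 2 / TRUNC ** 2

if __name__ == "__main__":
    print(float(truncated_normal_density(0.5, 0.5)), total_mass())
```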
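This example uses two bivariate multiwavelets, formed from STT*Db4 and CL2*Db4, and the bivariate single wavelet formed from Db4*Db8 in simulation experiments on normally distributed data, with resolution level J = 4 and sample size n = 10 000.
Fig. 4 shows the true normal density, and Figs. 5-7 show the density estimates obtained with the different multiwavelet bases and with the single wavelet. Figs. 4-7 show that, for a normal density that vanishes at the boundary, the multiwavelet estimates are accurate, reflect the distribution of the data faithfully, and fit better near the boundary. Table 1 and Fig. 8 show that, as the sample size grows, the error and running time of the multiwavelet estimator are always smaller than those of the single-wavelet estimator in these experiments, and the advantage is more pronounced for large samples.
The linear expressions (Eqs. (2) and (4)) show that the information about the density is carried by the coefficients and the basis functions, so the quality of the estimate depends on the choice of the resolution level J. Table 2 reports the effect of the resolution level on the estimation error: as J increases, the error first decreases and then increases, and the optimal resolution level differs between bases.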
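The dependence of the error on the resolution level can be reproduced qualitatively with a much simpler basis. The sketch below is an illustration under stated assumptions, not the experiment behind Table 2: it uses the Haar scaling function instead of the multiwavelet bases, draws data from the truncated normal density of Example 2 by rejection sampling, and evaluates the RMSE at the sample points. All names and default values are hypothetical.

```python
import numpy as np

SIGMA = 1.0 / 6.0

def true_density(pts):
    """Truncated normal density of Example 2 at points inside [0, 1)^2."""
    d2 = np.sum((pts - 0.5) ** 2, axis=1)
    return np.exp(-d2 / (2 * SIGMA ** 2)) / (2 * np.pi * SIGMA ** 2) / 0.9974 ** 2

def sample_truncated_normal(n, rng):
    """Rejection sampling: keep normal draws that fall inside [0, 1)^2."""
    out = np.empty((0, 2))
    while len(out) < n:
        draw = rng.normal(0.5, SIGMA, size=(2 * n, 2))
        keep = np.all((draw >= 0.0) & (draw < 1.0), axis=1)
        out = np.vstack([out, draw[keep]])
    return out[:n]

def haar_estimate_at_samples(pts, J):
    """Haar linear estimate at level J, evaluated at the sample points themselves."""
    m = 2 ** J
    idx = np.floor(pts * m).astype(int)
    counts = np.zeros((m, m))
    np.add.at(counts, (idx[:, 0], idx[:, 1]), 1.0)
    # f_J = 2^J * alpha_hat with alpha_hat = 2^J * count / n on each cell.
    return (4.0 ** J) * counts[idx[:, 0], idx[:, 1]] / len(pts)

def rmse_by_level(n=2000, levels=range(0, 8), seed=0):
    """RMSE at the sample points for each resolution level in `levels`."""
    rng = np.random.default_rng(seed)
    pts = sample_truncated_normal(n, rng)
    truth = true_density(pts)
    return {J: float(np.sqrt(np.mean((haar_estimate_at_samples(pts, J) - truth) ** 2)))
            for J in levels}

if __name__ == "__main__":
    for J, err in rmse_by_level().items():
        print(J, round(err, 3))
```

Low levels oversmooth and high levels leave too few observations per basis function, so the error is expected to be smallest at an intermediate level; the location of that minimum depends on the basis and the sample size.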
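Example 3 As a real-data example, the bivariate multiwavelet is used to estimate the joint density of the eruption duration and the interval between eruptions of a geyser in Yellowstone National Park, USA. The data set is publicly available at www.geyserstudy.org/geyser.aspx?pGeyserNo=OLDFAITHFU. This example uses n = 1 922 observations, the bivariate multiwavelet formed from CL2*Db4 as the basis, and resolution level J = 3; the results are shown in Figs. 9-11.
Figs. 9 and 10 are histograms of the interval between eruptions and of the eruption duration, respectively, with the durations on the horizontal axes normalized; Fig. 11 shows the corresponding density estimate. The sample in this example is smaller and the data are more irregular, no longer following a known distribution. Figs. 9-11 show that the bivariate multiwavelet density estimate agrees with the trend of the histograms and reflects the distribution of the data, which indicates that the method also applies to more general data and is effective in practice.
4 Conclusions
This paper studied probability density estimation with a class of bivariate multiwavelet functions, gave the linear multiwavelet estimator, and proved an upper bound on its convergence rate in the sense of mean integrated squared error. In the simulation experiments, multiwavelet density estimates were computed for bivariate data from different distributions and compared with single-wavelet estimates; the results indicate that the proposed method reflects the true distribution of the data well and outperforms the single wavelet under certain conditions. Finally, the analysis of real data shows that the method is effective in practical applications.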
[1] HARVEY A, ORYSHCHENKO V. Kernel density estimation for time series data[J]. International Journal of Forecasting, 2012, 28(1): 3-14.
[2] KUNG Y H, LIN P S, KAO C H. An optimal k-nearest neighbor for density estimation[J]. Statistics and Probability Letters, 2012, 82(10): 1786-1791. DOI:10.1016/j.spl.2012.05.017
[3] DOUKHAN P. Formes de Toeplitz associées à une analyse multi-échelle[J]. C R Acad Sci Paris, 1988, 306: 663-666.
[4] CHESNEAU C, DEWAN I, DOOSTI H. Nonparametric estimation of a two dimensional continuous-discrete density function by wavelets[J]. Statistical Methodology, 2014, 18: 64-78. DOI:10.1016/j.stamet.2013.09.002
[5] SHIRAZI E, DOOSTI H. Multivariate wavelet-based density estimation with size-biased data[J]. Statistical Methodology, 2015, 27: 12-19. DOI:10.1016/j.stamet.2015.05.002
[6] LOCKE J B, PETER A M. Multiwavelet density estimation[J]. Applied Mathematics and Computation, 2013, 219(11): 6002-6015. DOI:10.1016/j.amc.2012.11.099
[7] HUANG S Y, ZHU B K, LI X G, et al. Estimation methodology for a multiwavelet density function[J]. Journal of Beijing University of Chemical Technology (Natural Science), 2015, 42(3): 125-128. (in Chinese)
[8] LYU J, SHI X Y, XUAN Y N, et al. On the construction of r weight two element orthogonal multi-wavelet[J]. Journal of Shandong University of Technology (Natural Science Edition), 2017, 31(1): 28-33, 38. (in Chinese) DOI:10.3969/j.issn.1672-6197.2017.01.006
[9] LIU M C. Wavelet analysis and it's applications[M]. 2nd ed. Beijing: Tsinghua University Press, 2013: 133-136. (in Chinese)
[10] CHENG Z X, ZHANG L L. Analysis of multiwavelet and application[J]. Journal of Engineering Mathematics, 2001, 18(1): 99-107. (in Chinese) DOI:10.3969/j.issn.1005-3085.2001.01.017
[11] CHENG Z X, YANG S Z, ZHANG L L. The study and evolution of the theory of multiwavelets[J]. Journal of Engineering Mathematics, 2001, 18(5): 1-16. (in Chinese)
[12] LIU Z G, ZENG Y D, QIAN Q Q. Denoising of electric power system signals based on different multiwavelets[J]. Proceedings of the CSEE, 2004, 24(1): 30-34. (in Chinese) DOI:10.3321/j.issn:0258-8013.2004.01.006