Journal of Beijing University of Chemical Technology (Natural Science), 2020, Vol. 47, Issue 4: 121-127. DOI: 10.13543/j.bhxbzr.2020.04.018

Cite this article

WANG ShuYuan, CUI LiHong. Estimation of a density function based on a class of two-dimensional multiwavelets[J]. Journal of Beijing University of Chemical Technology (Natural Science), 2020, 47(4): 121-127. DOI: 10.13543/j.bhxbzr.2020.04.018.

First author

WANG ShuYuan, female, born in 1993, master's student.

Corresponding author

CUI LiHong, E-mail: mathcui@163.com

Article history

Received: 2020-01-08
Estimation of a density function based on a class of two-dimensional multiwavelets
WANG ShuYuan, CUI LiHong
College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing 100029, China
Abstract: Based on i.i.d. random samples, we propose a density estimation method that uses a class of two-dimensional multiwavelets. First, we give the linear multiwavelet estimator. Then, we establish and prove an upper bound on the convergence of the linear estimator, in the sense of the mean integrated squared error (MISE), over the Besov ball $B_{p,m}^{s}(M)$. Finally, experiments on simulated and real data illustrate the effectiveness of the method and the estimator.
Key words: a class of two-dimensional multiwavelets; density estimation; order of convergence
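Introduction

Density estimation is a fundamental problem in statistics. It divides into parametric and nonparametric estimation; for the latter, histogram estimation, kernel estimation, and k-nearest-neighbor estimation have all been major research topics [1-2]. As wavelet theory has matured, its attractive properties, such as orthogonality, compact support, and multiresolution analysis (MRA), have made wavelet analysis an active topic in nonparametric statistics and econometrics in recent years. In 1988, Doukhan [3] first proposed the concept of wavelet density estimation. Many researchers subsequently studied it and proved convergence rates [4-5]. However, a single wavelet cannot be orthogonal, symmetric, and compactly supported at the same time, which causes considerable difficulty in practice; for this reason, biorthogonal wavelets and multiwavelets were proposed in turn.

Locke et al. [6] were the first to apply multiwavelets to density estimation; they gave the expression for the estimator and compared wavelet and multiwavelet estimates in simulation experiments. Huang et al. [7] gave the convergence rate of linear multiwavelet density estimation together with its proof. Building on the multiwavelet density estimation of Ref. [6] and the construction of a class of bivariate multiwavelets given by Lyu et al. [8], this paper proposes a density estimator based on a class of bivariate multiwavelets and proves that it admits an upper bound on convergence in the MISE sense. Experiments on simulated and real data demonstrate the feasibility of the method and the estimator.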

1 Wavelet density estimation

1.1 Bivariate single-wavelet density estimation

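Suppose $\{(X_i, Y_i)\}_{i=1}^{n}$ are sample observations of the bivariate random variable $(X, Y)$ with density function $f(x, y)$, where $f(x, y) \in L^2(\mathbf{R}^2)$ and $L^2(\mathbf{R}^2)$ is the space of square-integrable functions on $\mathbf{R}^2$. Bivariate wavelet functions on $L^2(\mathbf{R}^2)$ are constructed as tensor products of univariate wavelet functions; the scaling function $\varphi(x, y)$ and the wavelet functions $\psi_l(x, y)$ are defined as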

$ \begin{array}{*{20}{l}} {\varphi (x,y) = \varphi (x)\varphi (y)}\\ {{\psi _l}(x,y) = \left\{ {\begin{array}{*{20}{l}} {\varphi (x)\psi (y),}&{l = 1}\\ {\psi (x)\varphi (y),}&{l = 2}\\ {\psi (x)\psi (y),}&{l = 3} \end{array}} \right.} \end{array} $

$ \begin{array}{*{20}{l}} {{\varphi _{j,{k_1},{k_2}}}(x,y) = {2^j}\varphi ({2^j}x - {k_1},{2^j}y - {k_2})}\\ {{\psi _{l,j,{k_1},{k_2}}}(x,y) = {2^j}{\psi _l}({2^j}x - {k_1},{2^j}y - {k_2})} \end{array} $

where $j$ is the resolution level, $k_1$ and $k_2$ are translation parameters, and $j \in \mathbf{Z}$, $k_1, k_2 \in \mathbf{Z}$.

For any integers $J > j_0$, $j_0 \in \mathbf{Z}$, it follows from the definition of multiresolution analysis [9] that the projection of a bivariate real-valued function $f(x, y) \in L^2(\mathbf{R}^2)$ onto the space $V_J$ can be expanded as

$ \begin{array}{l} {P_J}f(x,y) = \sum\limits_{{k_1},{k_2}} {{\alpha _{J,{k_1},{k_2}}}} {\varphi _{J,{k_1},{k_2}}}(x,y)\\ \quad = \sum\limits_{{k_1},{k_2}} {{\alpha _{{j_0},{k_1},{k_2}}}} {\varphi _{{j_0},{k_1},{k_2}}}(x,y) + \sum\limits_{l = 1}^3 {\sum\limits_{j = {j_0}}^{J - 1} {\sum\limits_{{k_1},{k_2}} {{\beta _{l,j,{k_1},{k_2}}}} } } {\psi _{l,j,{k_1},{k_2}}}(x,y) \end{array} $ (1)

By the orthogonality of the scaling and wavelet functions,

$ \begin{array}{l} {\alpha _{{j_0},{k_1},{k_2}}} = \langle f(x,y),{\varphi _{{j_0},{k_1},{k_2}}}(x,y)\rangle = \int {\int {f(x,y)} } {\varphi _{{j_0},{k_1},{k_2}}}(x,y){\rm{d}}x{\rm{d}}y\\ {\beta _{l,j,{k_1},{k_2}}} = \langle f(x,y),{\psi _{l,j,{k_1},{k_2}}}(x,y)\rangle = \int {\int {f(x,y)} } {\psi _{l,j,{k_1},{k_2}}}(x,y){\rm{d}}x{\rm{d}}y \end{array} $

By the definition of mathematical expectation,

$ \begin{array}{l} {\alpha _{{j_0},{k_1},{k_2}}} = \int {\int {f(x,y)} } {\varphi _{{j_0},{k_1},{k_2}}}(x,y){\rm{d}}x{\rm{d}}y = E({\varphi _{{j_0},{k_1},{k_2}}}(X,Y))\\ {\beta _{l,j,{k_1},{k_2}}} = \int {\int {f(x,y)} } {\psi _{l,j,{k_1},{k_2}}}(x,y){\rm{d}}x{\rm{d}}y = E({\psi _{l,j,{k_1},{k_2}}}(X,Y)) \end{array} $

Let the sample data be $\{(X_i, Y_i)\}_{i=1}^{n}$; the method of moments then gives the coefficient estimates

$ \begin{array}{*{20}{l}} {{{\hat \alpha }_{{j_0},{k_1},{k_2}}} = \frac{1}{n}\sum\limits_{i = 1}^n {{\varphi _{{j_0},{k_1},{k_2}}}} ({X_i},{Y_i})}\\ {{{\hat \beta }_{l,j,{k_1},{k_2}}} = \frac{1}{n}\sum\limits_{i = 1}^n {{\psi _{l,j,{k_1},{k_2}}}} ({X_i},{Y_i})} \end{array} $

The sample estimate of Eq. (1) is then

$ \begin{array}{l} {{\hat f}_J}(x,y) = \sum\limits_{{k_1},{k_2}} {{{\hat \alpha }_{J,{k_1},{k_2}}}} {\varphi _{J,{k_1},{k_2}}}(x,y)\\ \quad = \sum\limits_{{k_1},{k_2}} {{{\hat \alpha }_{{j_0},{k_1},{k_2}}}} {\varphi _{{j_0},{k_1},{k_2}}}(x,y) + \sum\limits_{l = 1}^3 {\sum\limits_{j = {j_0}}^{J - 1} {\sum\limits_{{k_1},{k_2}} {{{\hat \beta }_{l,j,{k_1},{k_2}}}} } } {\psi _{l,j,{k_1},{k_2}}}(x,y) \end{array} $ (2)
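The first form of the estimator in Eq. (2), written purely with scaling coefficients at level $J$, can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses the Haar scaling function on $[0, 1)$ instead of the Db4 and Db8 bases used later in the paper, it assumes the data lie in the unit square, and the function names (`haar_phi`, `scaling_coefficients`, `density_estimate`) are hypothetical.

```python
import numpy as np

def haar_phi(t):
    """Haar scaling function (assumption: stands in for Db4/Db8): indicator of [0, 1)."""
    return ((t >= 0.0) & (t < 1.0)).astype(float)

def scaling_coefficients(samples, J):
    """Moment estimates alpha_hat[k1, k2] = (1/n) * sum_i phi_{J,k1,k2}(X_i, Y_i)."""
    n = samples.shape[0]
    K = 2 ** J                      # translations k = 0, ..., 2^J - 1 cover [0, 1)
    k = np.arange(K)
    # Each coordinate contributes 2^{J/2} phi(2^J t - k); the product carries the factor 2^J.
    px = 2 ** (J / 2) * haar_phi(K * samples[:, 0][:, None] - k[None, :])   # shape (n, K)
    py = 2 ** (J / 2) * haar_phi(K * samples[:, 1][:, None] - k[None, :])   # shape (n, K)
    return px.T @ py / n            # shape (K, K)

def density_estimate(alpha_hat, J, x, y):
    """Evaluate f_hat_J(x, y) = sum_{k1,k2} alpha_hat[k1, k2] * phi_{J,k1,k2}(x, y)."""
    K = 2 ** J
    k = np.arange(K)
    px = 2 ** (J / 2) * haar_phi(K * np.atleast_1d(x)[:, None] - k[None, :])
    py = 2 ** (J / 2) * haar_phi(K * np.atleast_1d(y)[:, None] - k[None, :])
    return np.einsum("ia,ab,ib->i", px, alpha_hat, py)

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(10_000, 2))     # uniform test data on the unit square
alpha_hat = scaling_coefficients(samples, J=2)
values = density_estimate(alpha_hat, 2, np.array([0.1, 0.5, 0.9]), np.array([0.2, 0.5, 0.7]))
```

With the Haar basis the estimator reduces to a two-dimensional histogram with cell width $2^{-J}$, which makes the role of the resolution level easy to see; smoother bases change the shape of each term but not the structure of the sum.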
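1.2 Density estimation with a class of bivariate multiwavelets

A multiwavelet is a special kind of wavelet whose basis functions are vector-valued functions [10-12]. Suppose the multiscaling vector function and the multiwavelet vector function are, respectively,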

$ \begin{array}{*{20}{l}} {\mathit{\boldsymbol{ \boldsymbol{\varPhi} }} = {{[{\mathit{\boldsymbol{ \boldsymbol{\varPhi} }}^1},{\mathit{\boldsymbol{ \boldsymbol{\varPhi} }}^2}, \cdots ,{\mathit{\boldsymbol{ \boldsymbol{\varPhi} }}^r}]}^{\rm{T}}},r \in {\bf{Z}}}\\ {\mathit{\boldsymbol{ \boldsymbol{\varPsi} }} = {{[{\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}^1},{\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}^2}, \cdots ,{\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}^r}]}^{\rm{T}}},r \in {\bf{Z}}} \end{array} $

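Here the class of bivariate multiscaling functions and the class of bivariate multiwavelet functions are constructed by the method of Ref. [8], namely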

$ \begin{array}{*{20}{l}} {\mathit{\boldsymbol{ \boldsymbol{\varPhi} }}(x,y) = \mathit{\boldsymbol{ \boldsymbol{\varPhi} }}(x)\varphi (y)}\\ {{\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}_l}(x,y) = \left\{ {\begin{array}{*{20}{l}} {\mathit{\boldsymbol{ \boldsymbol{\varPhi} }}(x)\psi (y),}&{l = 1}\\ {\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}(x)\varphi (y),}&{l = 2}\\ {\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}(x)\psi (y),}&{l = 3} \end{array}} \right.} \end{array} $

$ \begin{array}{*{20}{l}} {{\boldsymbol{\varPhi}_{j,{k_1},{k_2}}}(x,y) = {2^j}\boldsymbol{\varPhi}({2^j}x - {k_1},{2^j}y - {k_2})}\\ {{\boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}}(x,y) = {2^j}{\boldsymbol{\varPsi}_l}({2^j}x - {k_1},{2^j}y - {k_2})} \end{array} $

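Definition 1  A multiresolution analysis of multiplicity $r$ is a nested sequence of closed subspaces $V_j$ of $L^2(\mathbf{R}^2)$ that satisfies the following conditions:

(1) $V_j \subset V_{j+1}$, $j \in \mathbf{Z}$;

(2) $\overline{\bigcup\limits_{j \in \mathbf{Z}} V_{j}} = L^{2}(\mathbf{R}^{2})$, $\bigcap\limits_{j \in \mathbf{Z}} V_{j} = \{0\}$;

(3) $h(x, y) \in V_j \Leftrightarrow h(2x, 2y) \in V_{j+1}$, $j \in \mathbf{Z}$;

(4) $h(x, y) \in V_j \Leftrightarrow h(x - k_1, y - k_2) \in V_j$, $j \in \mathbf{Z}$, $k_1, k_2 \in \mathbf{Z}$;

(5) there exist $r$ functions $\boldsymbol{\varPhi}^1(x, y), \boldsymbol{\varPhi}^2(x, y), \cdots, \boldsymbol{\varPhi}^r(x, y)$ such that $\{\boldsymbol{\varPhi}^w(x - k_1, y - k_2),\ 1 \le w \le r,\ k_1, k_2 \in \mathbf{Z}\}$ is an orthonormal basis of the space $V_0$.

Let $W_j$ be the orthogonal complement of $V_j$ in $V_{j+1}$; then $L^2(\mathbf{R}^2)$ decomposes into the direct sum of the spaces $W_j$, that is,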

$ {L^2}({\mathit{\boldsymbol{R}}^2}) = \mathop \oplus \limits_{j \in {\bf{Z}}} {W_j} $

Hence the projection of any bivariate real-valued function $f(x, y) \in L^2(\mathbf{R}^2)$ onto the space $V_J$ can be expanded as

$ \begin{array}{l} {P_J}f = \sum\limits_{{k_1},{k_2}} \underline{\boldsymbol{\alpha}}_{J,{k_1},{k_2}}^{\rm{T}} {\boldsymbol{\varPhi}_{J,{k_1},{k_2}}}(x,y)\\ \quad = \sum\limits_{{k_1},{k_2}} \underline{\boldsymbol{\alpha}}_{{j_0},{k_1},{k_2}}^{\rm{T}} {\boldsymbol{\varPhi}_{{j_0},{k_1},{k_2}}}(x,y) + \sum\limits_{l = 1}^3 {\sum\limits_{j = {j_0}}^{J - 1} {\sum\limits_{{k_1},{k_2}} \underline{\boldsymbol{\beta}}_{l,j,{k_1},{k_2}}^{\rm{T}} } } {\boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}}(x,y)\\ \quad = \sum\limits_{{k_1},{k_2}} {{\left( {\begin{array}{c} {\alpha _{{j_0},{k_1},{k_2}}^1}\\ \vdots \\ {\alpha _{{j_0},{k_1},{k_2}}^r} \end{array}} \right)}^{\rm{T}}} \left( {\begin{array}{c} {\boldsymbol{\varPhi}_{{j_0},{k_1},{k_2}}^1(x,y)}\\ \vdots \\ {\boldsymbol{\varPhi}_{{j_0},{k_1},{k_2}}^r(x,y)} \end{array}} \right)\\ \qquad + \sum\limits_{l = 1}^3 {\sum\limits_{j = {j_0}}^{J - 1} {\sum\limits_{{k_1},{k_2}} {{\left( {\begin{array}{c} {\beta _{l,j,{k_1},{k_2}}^1}\\ \vdots \\ {\beta _{l,j,{k_1},{k_2}}^r} \end{array}} \right)}^{\rm{T}}} } } \left( {\begin{array}{c} {\boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}^1(x,y)}\\ \vdots \\ {\boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}^r(x,y)} \end{array}} \right) \end{array} $ (3)

By the orthogonality of the multiscaling and multiwavelet functions,

$ \begin{array}{l} \underline{\boldsymbol{\alpha}}_{{j_0},{k_1},{k_2}} = \langle f(x,y),{\boldsymbol{\varPhi}_{{j_0},{k_1},{k_2}}}(x,y)\rangle = \int {\int {f(x,y)} } {\boldsymbol{\varPhi}_{{j_0},{k_1},{k_2}}}(x,y){\rm{d}}x{\rm{d}}y = E({\boldsymbol{\varPhi}_{{j_0},{k_1},{k_2}}}(X,Y))\\ \quad \buildrel \Delta \over = \left( {\begin{array}{c} {E(\boldsymbol{\varPhi}_{{j_0},{k_1},{k_2}}^1(X,Y))}\\ \vdots \\ {E(\boldsymbol{\varPhi}_{{j_0},{k_1},{k_2}}^r(X,Y))} \end{array}} \right) = \left( {\begin{array}{c} {\int {\int {f(x,y)} } \boldsymbol{\varPhi}_{{j_0},{k_1},{k_2}}^1(x,y){\rm{d}}x{\rm{d}}y}\\ \vdots \\ {\int {\int {f(x,y)} } \boldsymbol{\varPhi}_{{j_0},{k_1},{k_2}}^r(x,y){\rm{d}}x{\rm{d}}y} \end{array}} \right)\\ \underline{\boldsymbol{\beta}}_{l,j,{k_1},{k_2}} = \langle f(x,y),{\boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}}(x,y)\rangle = \int {\int {f(x,y)} } {\boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}}(x,y){\rm{d}}x{\rm{d}}y = E({\boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}}(X,Y))\\ \quad \buildrel \Delta \over = \left( {\begin{array}{c} {E(\boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}^1(X,Y))}\\ \vdots \\ {E(\boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}^r(X,Y))} \end{array}} \right) = \left( {\begin{array}{c} {\int {\int {f(x,y)} } \boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}^1(x,y){\rm{d}}x{\rm{d}}y}\\ \vdots \\ {\int {\int {f(x,y)} } \boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}^r(x,y){\rm{d}}x{\rm{d}}y} \end{array}} \right) \end{array} $

Let the sample data be $\{(X_i, Y_i)\}_{i=1}^{n}$; the method of moments likewise gives the coefficient estimates

$ \begin{array}{*{20}{l}} {{{\underline {\hat \alpha } }_{{j_0},{k_1},{k_2}}} = \frac{1}{n}\sum\limits_{i = 1}^n {{\mathit{\boldsymbol{ \boldsymbol{\varPhi} }}_{{j_0},{k_1},{k_2}}}} ({X_i},{Y_i})}\\ {{{\underline {\hat \beta } }_{l,j,{k_1},{k_2}}} = \frac{1}{n}\sum\limits_{i = 1}^n {{\mathit{\boldsymbol{ \boldsymbol{\varPsi} }}_{l,j,{k_1},{k_2}}}} ({X_i},{Y_i})} \end{array} $

The sample estimate of Eq. (3) is therefore

$ \begin{array}{l} {{\hat f}_J}(x,y) = \sum\limits_{{k_1},{k_2}} \underline{\boldsymbol{\hat \alpha}}_{J,{k_1},{k_2}}^{\rm{T}} {\boldsymbol{\varPhi}_{J,{k_1},{k_2}}}(x,y)\\ \quad = \sum\limits_{{k_1},{k_2}} \underline{\boldsymbol{\hat \alpha}}_{{j_0},{k_1},{k_2}}^{\rm{T}} {\boldsymbol{\varPhi}_{{j_0},{k_1},{k_2}}}(x,y) + \sum\limits_{l = 1}^3 {\sum\limits_{j = {j_0}}^{J - 1} {\sum\limits_{{k_1},{k_2}} \underline{\boldsymbol{\hat \beta}}_{l,j,{k_1},{k_2}}^{\rm{T}} } } {\boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}}(x,y) \end{array} $ (4)
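The vector-valued coefficients in Eq. (4) can be illustrated with a small sketch. It is not the CL2 or STT construction used in the experiments: as an assumption for illustration, the multiscaling vector in $x$ is the pair of orthonormal Legendre functions $1$ and $\sqrt{3}(2x-1)$ on $[0, 1)$ (so $r = 2$), and the scalar factor in $y$ is the Haar scaling function. The names `multiscaling`, `basis`, `fit`, and `evaluate` are hypothetical.

```python
import numpy as np

def multiscaling(t):
    """Legendre multiscaling vector with r = 2 on [0, 1) (assumption, not CL2/STT).
    Returns an array whose last axis indexes the component w."""
    inside = ((t >= 0.0) & (t < 1.0)).astype(float)
    return np.stack([inside, np.sqrt(3.0) * (2.0 * t - 1.0) * inside], axis=-1)

def haar_phi(t):
    """Haar scaling function used for the scalar factor in y (assumption)."""
    return ((t >= 0.0) & (t < 1.0)).astype(float)

def basis(points, J):
    """Phi^w_{J,k1,k2}(x, y) = 2^J Phi^w(2^J x - k1) phi(2^J y - k2) at every point.
    Returns shape (n, r, K, K) with K = 2^J translations per axis."""
    K = 2 ** J
    k = np.arange(K)
    bx = multiscaling(K * points[:, 0][:, None] - k[None, :])    # (n, K, r)
    by = haar_phi(K * points[:, 1][:, None] - k[None, :])         # (n, K)
    return 2.0 ** J * np.einsum("iaw,ib->iwab", bx, by)

def fit(samples, J):
    """Moment estimates: each vector coefficient is a sample mean of the basis vector."""
    return basis(samples, J).mean(axis=0)                         # (r, K, K)

def evaluate(alpha_hat, points, J):
    """f_hat_J(x, y) = sum over k1, k2 and w of alpha_hat^w * Phi^w_{J,k1,k2}(x, y)."""
    return np.einsum("wab,iwab->i", alpha_hat, basis(points, J))

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(5_000, 2))
alpha_hat = fit(samples, J=1)
values = evaluate(alpha_hat, np.array([[0.1, 0.2], [0.6, 0.7]]), J=1)
```

Compared with a single scaling function, each cell now carries $r$ coefficients, so the estimate can follow a slope inside a cell without raising the resolution level.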
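2 Proof of the convergence rate

Lemma 1[4]  A bivariate real-valued function $f(x, y)$ in $L^2(\mathbf{R}^2)$ belongs to $B_{p,m}^{s}(M)$, where $M$ is the radius of the ball, if and only if there exists a constant $M^*$, depending on $M$, such that the wavelet coefficients of $f(x, y)$ satisfy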

$ \begin{array}{l} {2^{{j_0}(1 - 2/p)}}{\left( {\sum\limits_{{k_1},{k_2} \in {\bf{Z}}} | {\alpha _{{j_0},{k_1},{k_2}}}{|^p}} \right)^{1/p}} + {\left( {\sum\limits_{l = 1}^3 {\sum\limits_{j = {j_0}}^\infty {{{\left( {{2^{j(s + 1 - 2/p)}}{{\left( {\sum\limits_{{k_1},{k_2} \in {\bf{Z}}} | {\beta _{l,j,{k_1},{k_2}}}{|^p}} \right)}^{1/p}}} \right)}^m}} } } \right)^{1/m}} \le {M^*} < \infty \end{array} $

where $s$ is the smoothness parameter, $p$ and $m$ are the norm indices of the space, and $s>0$, $1 \le p \le \infty$, $1 \le m \le \infty$.

Lemma 2  Suppose $\{(X_i, Y_i)\}_{i=1}^{n}$ are sample observations of a bivariate random variable, $J > j_0$, $k_1, k_2 \in \mathbf{Z}$, and $1 \le w \le r$. Then $E\left[{\hat \alpha _{J, {k_1}, {k_2}}^w} \right] = \alpha _{J, {k_1}, {k_2}}^w$.

Proof:

$ \begin{array}{l} E[\hat \alpha _{J,{k_1},{k_2}}^w] = E\left[ {\frac{1}{n}\sum\limits_{i = 1}^n {\boldsymbol{\varPhi}_{J,{k_1},{k_2}}^w} ({X_i},{Y_i})} \right] = E[\boldsymbol{\varPhi}_{J,{k_1},{k_2}}^w({X_1},{Y_1})]\\ \quad = \int {\int {\boldsymbol{\varPhi}_{J,{k_1},{k_2}}^w} } (x,y)f(x,y){\rm{d}}x{\rm{d}}y = \alpha _{J,{k_1},{k_2}}^w \end{array} $

Hence

$ E[\hat \alpha _{J,{k_1},{k_2}}^w] = \alpha _{J,{k_1},{k_2}}^w $

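Lemma 3  Suppose the density function of the bivariate random variable is bounded above, $J > j_0$, $k_1, k_2 \in \mathbf{Z}$, and $1 \le w \le r$. Then there exists a constant $C>0$ such that $E\left(\hat{\alpha}_{J, k_{1}, k_{2}}^{w}-\alpha_{J, k_{1}, k_{2}}^{w}\right)^{2} \leqslant C \frac{1}{n}$.

Proof:

From the estimate of the scaling coefficients in Section 1.2,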

$ \hat \alpha _{J,{k_1},{k_2}}^w = \frac{1}{n}\sum\limits_{i = 1}^n {\mathit{\boldsymbol{ \boldsymbol{\varPhi} }}_{J,{k_1},{k_2}}^w} ({X_i},{Y_i}) $

By Lemma 2,

$ \begin{array}{l} E{(\hat \alpha _{J,{k_1},{k_2}}^w - \alpha _{J,{k_1},{k_2}}^w)^2} = E{(\hat \alpha _{J,{k_1},{k_2}}^w - E(\hat \alpha _{J,{k_1},{k_2}}^w))^2} = {\rm{Var}}(\hat \alpha _{J,{k_1},{k_2}}^w) \end{array} $

Applying Hölder's inequality gives

$ \begin{array}{l} {\rm{Var}}(\hat \alpha _{J,{k_1},{k_2}}^w) = {\rm{Var}}\left( {\frac{1}{n}\sum\limits_{i = 1}^n {\boldsymbol{\varPhi}_{J,{k_1},{k_2}}^w} ({X_i},{Y_i})} \right) = \frac{1}{n}{\rm{Var}}(\boldsymbol{\varPhi}_{J,{k_1},{k_2}}^w(X,Y))\\ \quad = \frac{1}{n}\left[ {E{{(\boldsymbol{\varPhi}_{J,{k_1},{k_2}}^w(X,Y))}^2} - {{(E\,\boldsymbol{\varPhi}_{J,{k_1},{k_2}}^w(X,Y))}^2}} \right]\\ \quad = \frac{1}{n}\left[ {\int {\int {{{(\boldsymbol{\varPhi}_{J,{k_1},{k_2}}^w(x,y))}^2}} } f(x,y){\rm{d}}x{\rm{d}}y - {{(\alpha _{J,{k_1},{k_2}}^w)}^2}} \right]\\ \quad \le \frac{1}{n}\left[ {{{\left\| {f(x,y)} \right\|}_\infty }\int {\int {{{(\boldsymbol{\varPhi}_{J,{k_1},{k_2}}^w(x,y))}^2}} } {\rm{d}}x{\rm{d}}y - {{(\alpha _{J,{k_1},{k_2}}^w)}^2}} \right] \le C\frac{1}{n} \end{array} $

Therefore

$ E{(\hat \alpha _{J,{k_1},{k_2}}^w - \alpha _{J,{k_1},{k_2}}^w)^2} \le C\frac{1}{n} $
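Lemmas 2 and 3 can be checked numerically in a simple case. The following Monte Carlo sketch is an illustration only, under the assumption of uniform data on the unit square and a Haar tensor-product scaling function (not the bases used in the paper); there the true coefficient is $2^{-J}$ and $n\,{\rm Var}(\hat\alpha) = 1 - 2^{-2J} \le \|f\|_\infty = 1$. The values of `J`, `n`, `reps`, `k1`, and `k2` are arbitrary choices.

```python
import numpy as np

J, n, reps = 2, 200, 2000     # resolution level, sample size, Monte Carlo replications
k1, k2 = 1, 2                 # one fixed translation (arbitrary choice)
rng = np.random.default_rng(1)

def phi_Jk(x, y):
    """Haar tensor-product scaling function phi_{J,k1,k2}(x, y) (assumption)."""
    in_x = (2**J * x - k1 >= 0) & (2**J * x - k1 < 1)
    in_y = (2**J * y - k2 >= 0) & (2**J * y - k2 < 1)
    return 2.0**J * (in_x & in_y)

# Draw `reps` independent samples of size n from the uniform density on [0, 1)^2.
xy = rng.uniform(size=(reps, n, 2))
alpha_hats = phi_Jk(xy[..., 0], xy[..., 1]).mean(axis=1)   # one estimate per replication

alpha_true = 2.0 ** (-J)                  # exact coefficient for the uniform density
mean_estimate = alpha_hats.mean()         # should be close to alpha_true (Lemma 2)
scaled_variance = n * alpha_hats.var()    # should stay below a constant C (Lemma 3)
```

In this setting the scaled variance settles near $1 - 2^{-2J}$, so the constant $C = 1$ works, in line with the bound $C/n$.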

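Theorem 1  Suppose the density function of the bivariate random variable $(X, Y)$ is bounded above and $f(x, y) \in B_{p,m}^{s}(M)$ with $s>0$, $p=2$, $m \ge 1$; let $J$ satisfy $2^J \sim n^{1/(2s+2)}$ and $1 \le w \le r$. Then there exists a constant $C>0$ such that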

$ E\left\| {f(x,y) - {{\hat f}_J}(x,y)} \right\|_2^2 \le C{n^{ - 2s/(2 + 2s)}} $

Proof:

$ \begin{array}{l} E\left\| {f(x,y) - {{\hat f}_J}(x,y)} \right\|_2^2 \le \left\| {f(x,y) - {P_J}f} \right\|_2^2 + E\left\| {{P_J}f - {{\hat f}_J}(x,y)} \right\|_2^2 = A + B \end{array} $

where

$ \begin{array}{l} A = \left\| {f(x,y) - {P_J}f} \right\|_2^2 = \left\| {\sum\limits_{l = 1}^3 {\sum\limits_{j = J}^\infty {\sum\limits_{{k_1},{k_2}} \underline{\boldsymbol{\beta}}_{l,j,{k_1},{k_2}}^{\rm{T}} } } {\boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}}} \right\|_2^2\\ \quad = \sum\limits_{l = 1}^3 {\sum\limits_{j = J}^\infty {\sum\limits_{{k_1},{k_2}} \underline{\boldsymbol{\beta}}_{l,j,{k_1},{k_2}}^{\rm{T}} } } \langle {\boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}},{\boldsymbol{\varPsi}_{l,j,{k_1},{k_2}}}\rangle \underline{\boldsymbol{\beta}}_{l,j,{k_1},{k_2}}\\ \quad = \sum\limits_{w = 1}^r {\sum\limits_{l = 1}^3 {\sum\limits_{j = J}^\infty {\sum\limits_{{k_1},{k_2}} {{{(\beta _{l,j,{k_1},{k_2}}^w)}^2}} } } } \end{array} $

By Lemma 1,

$ A \le C{2^{ - 2Js}} $

A direct calculation gives

$ \begin{array}{l} B = E\left\| {{P_J}f - {{\hat f}_J}(x,y)} \right\|_2^2 = E\left\| {\sum\limits_{{k_1},{k_2}} {{{(\underline{\boldsymbol{\hat \alpha}}_{J,{k_1},{k_2}} - \underline{\boldsymbol{\alpha}}_{J,{k_1},{k_2}})}^{\rm{T}}}} {\boldsymbol{\varPhi}_{J,{k_1},{k_2}}}(x,y)} \right\|_2^2\\ \quad = E\int {\int {{{({P_J}f - {{\hat f}_J}(x,y))}^2}} } {\rm{d}}x{\rm{d}}y\\ \quad = \sum\limits_{{k_1},{k_2}} E \left( {{{(\underline{\boldsymbol{\hat \alpha}}_{J,{k_1},{k_2}} - \underline{\boldsymbol{\alpha}}_{J,{k_1},{k_2}})}^{\rm{T}}}\langle {\boldsymbol{\varPhi}_{J,{k_1},{k_2}}},{\boldsymbol{\varPhi}_{J,{k_1},{k_2}}}\rangle (\underline{\boldsymbol{\hat \alpha}}_{J,{k_1},{k_2}} - \underline{\boldsymbol{\alpha}}_{J,{k_1},{k_2}})} \right)\\ \quad = \sum\limits_{w = 1}^r {\sum\limits_{{k_1},{k_2}} E } {(\hat \alpha _{J,{k_1},{k_2}}^w - \alpha _{J,{k_1},{k_2}}^w)^2} \end{array} $

By Lemma 3,

$ B \le C{2^{2J}}\frac{1}{n} $

Moreover, since $2^J \sim n^{1/(2s+2)}$, we obtain

$ E\left\| {f(x,y) - {{\hat f}_J}(x,y)} \right\|_2^2 \le C{n^{ - 2s/(2 + 2s)}} $
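The last step can be spelled out as follows. The calculation only substitutes the choice of $J$ into the two bounds $A \le C 2^{-2Js}$ and $B \le C 2^{2J}/n$, reading the condition on $J$ as $2^J$ being of the order $n^{1/(2s+2)}$; the constant $C$ may change from line to line.

$ \begin{array}{l} A \le C{2^{ - 2Js}} = C{\left( {{n^{1/(2s + 2)}}} \right)^{ - 2s}} = C{n^{ - 2s/(2s + 2)}}\\ B \le C{2^{2J}}\frac{1}{n} = C{n^{2/(2s + 2) - 1}} = C{n^{ - 2s/(2s + 2)}}\\ A + B \le C{n^{ - 2s/(2s + 2)}} \end{array} $

Both terms are therefore of the same order, which is why this choice of $J$ balances the approximation error $A$ against the stochastic error $B$.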
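3 Experimental analysis

This section uses simulations and a real-data example to illustrate the feasibility of the proposed method, and compares wavelet and multiwavelet density estimation by the root-mean-square error, defined as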

$ {E_{{\rm{RMS}}}} = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^n {(\hat f(} {X_i},{Y_i}) - f({X_i},{Y_i}){)^2}} $
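The error measure above is straightforward to compute once the estimate and the true density have been evaluated at the sample points. A minimal sketch follows; the function name `rmse` and the toy values are illustrative assumptions.

```python
import numpy as np

def rmse(f_hat_values, f_values):
    """Root-mean-square error between estimated and true density values,
    both evaluated at the same n sample points (X_i, Y_i)."""
    f_hat_values = np.asarray(f_hat_values, dtype=float)
    f_values = np.asarray(f_values, dtype=float)
    return float(np.sqrt(np.mean((f_hat_values - f_values) ** 2)))

# Toy values only: three estimated values against a true density equal to 1.
error = rmse([1.0, 1.2, 0.8], [1.0, 1.0, 1.0])
```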

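Example 1  Let the bivariate random variable $(X, Y)$ follow a uniform distribution with $x \in [0, 1]$, $y \in [0, 1]$ and density function

$ f(x,y) = \left\{ {\begin{array}{*{20}{l}} {1,}&{0 \le x \le 1,0 \le y \le 1}\\ {0,}&{{\rm{otherwise}}} \end{array}} \right. $

This example uses the bivariate multiwavelet formed by CL2*Db4 (* denotes the product) and the bivariate single wavelet formed by Db4*Db8 to estimate the density function, with resolution level $J=4$ and sample size $n$=10 000.

Fig.1 shows the true density function of the uniform distribution, and Figs.2 and 3 show the multiwavelet and single-wavelet estimates, respectively; the multiwavelet error is 0.164 and the single-wavelet error is 0.388. Figs.1-3 show that both estimates faithfully describe the behavior of uniformly distributed random data. However, because the scaling functions are compactly supported, the estimate is asymptotically biased at the boundary when the translation vector is large, and the multiwavelet deviates noticeably less than the single wavelet.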

Fig.1 Density function of a uniform distribution
Fig.2 Multiwavelet (CL2*Db4) density estimation
Fig.3 Single wavelet (Db4*Db8) density estimation

Example 2  Let the bivariate random variable $(X, Y)$ follow a normal distribution with $x \in [0, 1]$, $y \in [0, 1]$ and density function

$ f(x,y) = \frac{{\frac{1}{{2\pi /36}}{{\rm{e}}^{ - \frac{{{{(x - 0.5)}^2} + {{(y - 0.5)}^2}}}{{2/36}}}}}}{{0.9974^{2}}} $
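The role of the denominator can be checked with a short calculation. The sketch below assumes the density is a normal distribution with mean 0.5 and standard deviation 1/6 in each coordinate, truncated to the unit square, and reads the denominator as $0.9974^2$; the names `density`, `one_dim_mass`, and `total_mass` are illustrative.

```python
import math

SIGMA = 1.0 / 6.0    # standard deviation implied by the exponent 2/36 = 2 * SIGMA**2
MU = 0.5

def density(x, y):
    """Density of Example 2 on the unit square, zero outside it (assumed truncation)."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        return 0.0
    gauss = math.exp(-((x - MU) ** 2 + (y - MU) ** 2) / (2.0 / 36.0)) / (2.0 * math.pi / 36.0)
    return gauss / 0.9974 ** 2

# Probability that a N(0.5, 1/36) variable falls in [0, 1], i.e. within three
# standard deviations of the mean.
one_dim_mass = math.erf(0.5 / (SIGMA * math.sqrt(2.0)))

# Total mass of the truncated, renormalized bivariate density on the unit square.
total_mass = one_dim_mass ** 2 / 0.9974 ** 2
peak = density(0.5, 0.5)
```

Under these assumptions the one-dimensional mass is about 0.9973, so dividing by $0.9974^2$ makes the density integrate to approximately 1 over the unit square.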

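This example uses the two bivariate multiwavelets formed by STT*Db4 and CL2*Db4 and the bivariate single wavelet formed by Db4*Db8 in a simulation with normally distributed data, with resolution level $J=4$ and sample size $n$=10 000.

Fig.4 shows the true normal density function, and Figs.5-7 show the density estimates obtained with the different multiwavelet and single-wavelet bases. Figs.4-7 show that, for a normal distribution that vanishes at the boundary, the multiwavelet estimates are more accurate, reflect the distribution of the data objectively, and fit better at the boundary. Table 1 and Fig.8 show that, as the sample size increases, the error and run time of multiwavelet density estimation remain smaller than those of the single wavelet, and the advantage is more pronounced for large samples.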

Fig.4 Density function of a normal distribution
Fig.5 Multiwavelet (STT*Db4) density estimation
Fig.6 Multiwavelet (CL2*Db4) density estimation
Fig.7 Single wavelet (Db4*Db8) density estimation
Table 1 Run times of the estimates for different sample sizes
Fig.8 Multiwavelet (STT*Db4) and single wavelet (Db4*Db8) density estimation errors for different sample sizes

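The linear expressions (Eqs. (2) and (4)) show that the information about the density function is carried by the coefficients and the basis functions, so the quality of the estimate depends on the choice of the resolution level $J$. Table 2 gives the effect of the resolution level on the estimation error: as $J$ increases, the error first decreases and then increases, and the optimal resolution level differs between basis functions.

Table 2 Estimation errors at different resolution levels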
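The dependence of the error on $J$ can be reproduced qualitatively with a simple scan. The sketch below is not the experiment behind Table 2: it assumes a Haar scaling basis (for which the level-$J$ linear estimator is a histogram with cell width $2^{-J}$), a truncated normal sample of size 2 000 generated by rejection, and the density of Example 2 with the denominator read as $0.9974^2$. All names are illustrative.

```python
import numpy as np

def haar_estimate_at_samples(samples, J):
    """Level-J Haar linear estimator evaluated at the sample points themselves."""
    n = samples.shape[0]
    K = 2 ** J
    cells = np.minimum((samples * K).astype(int), K - 1)    # cell index per coordinate
    counts = np.zeros((K, K))
    np.add.at(counts, (cells[:, 0], cells[:, 1]), 1.0)
    # alpha_hat = 2^J * counts / n and phi_{J,k} = 2^J on its cell, so f_hat = 4^J * counts / n.
    return (4.0 ** J) * counts[cells[:, 0], cells[:, 1]] / n

def true_density(samples):
    """Truncated normal density of Example 2 (assumed normalization 0.9974^2)."""
    r2 = ((samples - 0.5) ** 2).sum(axis=1)
    return np.exp(-r2 / (2.0 / 36.0)) / (2.0 * np.pi / 36.0) / 0.9974 ** 2

rng = np.random.default_rng(2)
draws = rng.normal(0.5, 1.0 / 6.0, size=(4000, 2))
# Rejection step: keep draws inside the unit square, then take the first 2000.
samples = draws[np.all((draws >= 0.0) & (draws <= 1.0), axis=1)][:2000]

errors = {}
for J in range(0, 7):
    diff = haar_estimate_at_samples(samples, J) - true_density(samples)
    errors[J] = float(np.sqrt(np.mean(diff ** 2)))
best_J = min(errors, key=errors.get)
```

A small $J$ oversmooths (large bias) and a large $J$ leaves too few observations per cell (large variance), so the smallest error occurs at an intermediate level, consistent with the trend described above.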

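Example 3  In the real-data analysis, the bivariate multiwavelet is used to estimate the joint density of the eruption duration and the interval between eruptions of a geyser in Yellowstone National Park, USA; the data set is publicly available at www.geyserstudy.org/geyser.aspx?pGeyserNo=OLDFAITHFU. This example uses $n$=1 922 samples, the bivariate multiwavelet basis formed by CL2*Db4, and resolution level $J=3$; the results are shown in Figs.9-11.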

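Fig.9 Histogram of the geyser interval duration
Fig.10 Histogram of the geyser eruption duration
Fig.11 Multiwavelet (CL2*Db4) density estimation

Figs.9 and 10 are histograms of the geyser's interval duration and eruption duration, respectively, with the durations on the horizontal axes normalized; Fig.11 is the corresponding density estimate. The sample in this example is smaller and the data are more irregular, no longer following a known distribution. Figs.9-11 show that the estimate obtained with the class of bivariate multiwavelets agrees with the trend of the histograms and objectively reflects the true distribution of the data, which indicates that the method also applies to more general data analysis and is effective in practice.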

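4 Conclusions

This paper studies probability density estimation with a class of bivariate multiwavelet functions, gives the linear multiwavelet estimator, and proves that it admits an upper bound on convergence in the MISE sense. In the simulation experiments, multiwavelet density estimation was applied to bivariate data from different distributions and compared with single-wavelet estimation, which verified that the proposed method reflects the true distribution of the data well and outperforms the single wavelet under certain conditions. Finally, the analysis of real data shows that the method is effective in practice.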

References
[1] HARVEY A, ORYSHCHENKO V. Kernel density estimation for time series data[J]. International Journal of Forecasting, 2012, 28(1): 3-14.
[2] KUNG Y H, LIN P S, KAO C H. An optimal k-nearest neighbor for density estimation[J]. Statistics and Probability Letters, 2012, 82(10): 1786-1791. DOI:10.1016/j.spl.2012.05.017
[3] DOUKHAN P. Formes de Toeplitz associées à une analyse multi-échelle[J]. C R Acad Sci Paris, 1988, 306: 663-666.
[4] CHESNEAU C, DEWAN I, DOOSTI H. Nonparametric estimation of a two dimensional continuous-discrete density function by wavelets[J]. Statistical Methodology, 2014, 18: 64-78. DOI:10.1016/j.stamet.2013.09.002
[5] SHIRAZI E, DOOSTI H. Multivariate wavelet-based density estimation with size-biased data[J]. Statistical Methodology, 2015, 27: 12-19. DOI:10.1016/j.stamet.2015.05.002
[6] LOCKE J B, PETER A M. Multiwavelet density estimation[J]. Applied Mathematics and Computation, 2013, 219(11): 6002-6015. DOI:10.1016/j.amc.2012.11.099
[7] HUANG S Y, ZHU B K, LI X G, et al. Estimation methodology for a multiwavelet density function[J]. Journal of Beijing University of Chemical Technology (Natural Science), 2015, 42(3): 125-128. (in Chinese)
[8] LYU J, SHI X Y, XUAN Y N, et al. On the construction of r weight two element orthogonal multi-wavelet[J]. Journal of Shandong University of Technology (Natural Science Edition), 2017, 31(1): 28-33, 38. (in Chinese) DOI:10.3969/j.issn.1672-6197.2017.01.006
[9] LIU M C. Wavelet analysis and its applications[M]. 2nd ed. Beijing: Tsinghua University Press, 2013: 133-136. (in Chinese)
[10] CHENG Z X, ZHANG L L. Analysis of multiwavelet and application[J]. Journal of Engineering Mathematics, 2001, 18(1): 99-107. (in Chinese) DOI:10.3969/j.issn.1005-3085.2001.01.017
[11] CHENG Z X, YANG S Z, ZHANG L L. The study and evolution of the theory of multiwavelets[J]. Journal of Engineering Mathematics, 2001, 18(5): 1-16. (in Chinese)
[12] LIU Z G, ZENG Y D, QIAN Q Q. Denoising of electric power system signals based on different multiwavelets[J]. Proceedings of the CSEE, 2004, 24(1): 30-34. (in Chinese) DOI:10.3321/j.issn:0258-8013.2004.01.006