
gridfit

PURPOSE

gridfit: estimates a surface on a 2d grid, based on scattered data

SYNOPSIS

function [zgrid,xgrid,ygrid] = gridfit(x,y,z,xnodes,ynodes,varargin)

DESCRIPTION

 gridfit: estimates a surface on a 2d grid, based on scattered data

 Retrieved from the MATLAB File Exchange on 12 Sept 2010 from:
 http://www.mathworks.de/matlabcentral/fileexchange/8998
 By Thijs Damsma

          Replicates are allowed. All methods extrapolate to the grid
          boundaries. Gridfit uses a modified ridge estimator to
          generate the surface, where the bias is toward smoothness.

          Gridfit is not an interpolant. Its goal is a smooth surface
          that approximates your data, but allows you to control the
          amount of smoothing.

 usage #1: zgrid = gridfit(x,y,z,xnodes,ynodes);
 usage #2: [zgrid,xgrid,ygrid] = gridfit(x,y,z,xnodes,ynodes);
 usage #3: zgrid = gridfit(x,y,z,xnodes,ynodes,prop,val,prop,val,...);

 Arguments: (input)
  x,y,z - vectors of equal lengths, containing arbitrary scattered data
          The only constraint on x and y is that they cannot ALL fall
          on a single line in the x-y plane. Replicate points will be
          treated in a least squares sense.

          ANY points containing a NaN are ignored in the estimation

  xnodes - vector defining the nodes in the grid in the independent
          variable (x). xnodes need not be equally spaced. xnodes
          must completely span the data. If they do not, then the
          'extend' property is applied, adjusting the first and last
          nodes to be extended as necessary. See below for a complete
          description of the 'extend' property.

          If xnodes is a scalar integer, then it specifies the number
          of equally spaced nodes between the min and max of the data.

  ynodes - vector defining the nodes in the grid in the independent
          variable (y). ynodes need not be equally spaced.

          If ynodes is a scalar integer, then it specifies the number
          of equally spaced nodes between the min and max of the data.

          Also see the extend property.
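
          For instance, a sketch (the scattered data below are invented
          purely for illustration) showing both ways to specify nodes:

            x = rand(200,1);  y = rand(200,1);     % scattered sample points
            z = sin(2*pi*x) + cos(2*pi*y);         % sample values
            % explicit node vectors (need not be equally spaced)
            zg1 = gridfit(x,y,z, 0:0.05:1, [0 .1 .25 .5 .75 .9 1]);
            % scalar integers: 25 equally spaced nodes spanning the data
            zg2 = gridfit(x,y,z, 25, 25);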

  Additional arguments follow in the form of property/value pairs.
  Valid properties are:
    'smoothness', 'interp', 'regularizer', 'solver', 'maxiter'
    'extend', 'tilesize', 'overlap'

  Any UNAMBIGUOUS shortening (even down to a single letter) is
  valid for property names. All properties have default values,
  chosen (I hope) to give a reasonable result out of the box.

   'smoothness' - scalar or vector of length 2 - determines the
          eventual smoothness of the estimated surface. A larger
          value here means the surface will be smoother. Smoothness
          must be a non-negative real number.

          If this parameter is a vector of length 2, then it defines
          the relative smoothing to be associated with the x and y
          variables. This allows the user to apply a different amount
          of smoothing in the x dimension compared to the y dimension.

          Note: the problem is normalized in advance so that a
          smoothness of 1 MAY generate reasonable results. If you
          find the result is too smooth, then use a smaller value
          for this parameter. Likewise, bumpy surfaces suggest use
          of a larger value. (Sometimes, use of an iterative solver
          with too small a limit on the maximum number of iterations
          will result in non-convergence.)

          DEFAULT: 1
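
          For example, a sketch (noisy sample data invented for
          illustration) comparing several smoothness settings:

            x = rand(500,1);  y = rand(500,1);
            z = x.^2 - y + 0.05*randn(500,1);                  % noisy data
            zrough  = gridfit(x,y,z,30,30,'smoothness',0.2);   % follows the data closely
            zsmooth = gridfit(x,y,z,30,30,'smoothness',10);    % stiffer, smoother surface
            zaniso  = gridfit(x,y,z,30,30,'smoothness',[1 5]); % relatively more smoothing in y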


   'interp' - character, denotes the interpolation scheme used
          to interpolate the data.

          DEFAULT: 'triangle'

          'bilinear' - use bilinear interpolation within the grid
                     (also known as tensor product linear interpolation)

          'triangle' - split each cell in the grid into triangles, then
                     use linear interpolation inside each triangle

          'nearest' - nearest neighbor interpolation. This will
                     rarely be a good choice, but I included it
                     as an option for completeness.
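
          A short sketch (sample data invented for illustration) that
          exercises each interpolation choice:

            x = rand(300,1);  y = rand(300,1);  z = exp(x - y);
            ztri = gridfit(x,y,z,0:.1:1,0:.1:1,'interp','triangle');  % the default
            zbil = gridfit(x,y,z,0:.1:1,0:.1:1,'interp','bilinear');
            znn  = gridfit(x,y,z,0:.1:1,0:.1:1,'interp','nearest');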


   'regularizer' - character flag, denotes the regularization
          paradigm to be used. There are currently three options.

          DEFAULT: 'gradient'

          'diffusion' or 'laplacian' - uses a finite difference
              approximation to the Laplacian operator (i.e., del^2).

              We can think of the surface as a plate, wherein the
              bending rigidity of the plate is specified by the user
              as a number relative to the importance of fidelity to
              the data. A stiffer plate will result in a smoother
              surface overall, but fit the data less well. I've
              modeled a simple plate using the Laplacian, del^2. (A
              projected enhancement is to do a better job with the
              plate equations.)

              We can also view the regularizer as a diffusion problem,
              where the relative thermal conductivity is supplied.
              Here interpolation is seen as a problem of finding the
              steady temperature profile in an object, given a set of
              points held at a fixed temperature. Extrapolation will
              be linear. Both paradigms are appropriate for a Laplacian
              regularizer.

          'gradient' - attempts to ensure the gradient is as smooth
              as possible everywhere. It's subtly different from the
              'diffusion' option, in that here the directional
              derivatives are biased to be smooth across cell
              boundaries in the grid.

              The gradient option uncouples the terms in the Laplacian:
              think of it as two separate penalties, one per direction,
              instead of a single combined one. Why are they different
              at all? The terms in the Laplacian can balance each other;
              a saddle such as x^2 - y^2 has zero Laplacian everywhere,
              even though it curves strongly in both directions.

          'springs' - uses a spring model connecting nodes to each
              other, as well as connecting data points to the nodes
              in the grid. This choice will cause any extrapolation
              to be as constant as possible.

              Here the smoothing parameter is the relative stiffness
              of the springs connecting the nodes to each other compared
              to the stiffness of a spring connecting the lattice to
              each data point. Since all springs have a rest length
              (length at which the spring has zero potential energy)
              of zero, any extrapolation will be minimized.

          Note: The 'springs' regularizer tends to drag the surface
          towards the mean of all the data, so too large a smoothing
          parameter may be a problem.
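
          A sketch (sample data invented for illustration) comparing the
          three regularizers on the same data:

            x = rand(400,1);  y = rand(400,1);  z = x + y.^2;
            zgrad = gridfit(x,y,z,40,40,'regularizer','gradient');   % the default
            zlap  = gridfit(x,y,z,40,40,'regularizer','diffusion');  % Laplacian / plate model
            zspr  = gridfit(x,y,z,40,40,'regularizer','springs');    % near-constant extrapolation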


   'solver' - character flag - denotes the solver used for the
          resulting linear system. Different solvers will have
          different solution times depending upon the specific
          problem to be solved. Up to a certain size grid, the
          direct \ solver will often be speedy, until memory
          swapping causes problems.

          What solver should you use? Problems with a significant
          amount of extrapolation should avoid lsqr. \ may be
          best numerically for small smoothness parameters and
          high extents of extrapolation.

          Large numbers of points will slow down the direct
          \, but when applied to the normal equations, \ can be
          quite fast. Since the equations generated by these
          methods will tend to be well conditioned, the normal
          equations are not a bad choice of method to use. Beware
          when a small smoothing parameter is used, since this will
          make the equations less well conditioned.

          DEFAULT: 'normal'

          '\' - uses matlab's backslash operator to solve the sparse
                     system. 'backslash' is an alternate name.

          'symmlq' - uses matlab's iterative symmlq solver

          'lsqr' - uses matlab's iterative lsqr solver

          'normal' - uses \ to solve the normal equations.


   'maxiter' - only applies to iterative solvers - defines the
          maximum number of iterations for an iterative solver

          DEFAULT: min(10000,length(xnodes)*length(ynodes))
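
          A sketch (sample data invented for illustration) contrasting
          the default direct solve with an iterative lsqr solve:

            x = rand(1000,1);  y = rand(1000,1);
            z = sin(3*x).*cos(3*y);
            % direct solve of the normal equations (the default)
            zn = gridfit(x,y,z,80,80,'solver','normal');
            % iterative lsqr solve, capped at 5000 iterations
            zi = gridfit(x,y,z,80,80,'solver','lsqr','maxiter',5000);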


   'extend' - character flag - controls whether the first and last
          nodes in each dimension are allowed to be adjusted to
          bound the data, and whether the user is warned when such
          an adjustment is necessary.

          DEFAULT: 'warning'

          'warning' - Adjust the first and/or last node in
                     x or y if the nodes do not FULLY contain
                     the data. Issue a warning message to this
                     effect, reporting the amount of adjustment
                     applied.

          'never'  - Issue an error message when the nodes do
                     not absolutely contain the data.

          'always' - automatically adjust the first and last
                     nodes in each dimension if necessary.
                     No warning is given when this option is set.
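
          A sketch (sample data invented for illustration) where the
          nodes deliberately fail to span the data:

            x = rand(100,1);  y = rand(100,1);  z = x.*y;   % data in (0,1) x (0,1)
            % end nodes are pushed outward silently to bound the data
            zg = gridfit(x,y,z, 0.2:0.1:0.8, 0.2:0.1:0.8, 'extend','always');
            % with 'extend','never' the same call would error;
            % 'warning' (the default) adjusts the nodes and warns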


   'tilesize' - grids which are simply too large to solve for
          in one single estimation step can be built as a set
          of tiles. For example, a 1000x1000 grid will require
          the estimation of 1e6 unknowns. This is likely to
          require more memory (and time) than you have available.
          But if your data is dense enough, then you can model
          it locally using smaller tiles of the grid.

          My recommendation for a reasonable tilesize is
          roughly 100 to 200. Tiles of this size take only
          a few seconds to solve normally, so the entire grid
          can be modeled in a finite amount of time. The minimum
          tilesize can never be less than 3, although even this
          size tile is so small as to be ridiculous.

          If your data is so sparse that some tiles contain
          insufficient data to model, then those tiles will
          be left as NaNs.

          DEFAULT: inf


   'overlap' - Tiles in a grid have some overlap, so they
          can minimize any problems along the edge of a tile.
          In this overlapped region, the grid is built using a
          bi-linear combination of the overlapping tiles.

          The overlap is specified as a fraction of the tile
          size, so an overlap of 0.20 means there will be a 20%
          overlap of successive tiles. I do allow a zero overlap,
          but it must be no more than 1/2.

          0 <= overlap <= 0.5

          Overlap is ignored if the tilesize is greater than the
          number of nodes in both directions.

          DEFAULT: 0.20
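
          A sketch (dense sample data invented for illustration) of a
          tiled fit of a large grid:

            x = rand(5e4,1);  y = rand(5e4,1);  z = tanh(4*(x - y));
            % build a 500 x 500 grid from overlapping 120-node tiles
            % rather than solving for all 250000 unknowns at once
            zg = gridfit(x,y,z, 500, 500, 'tilesize',120, 'overlap',0.25);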


   'autoscale' - Some data may have widely different scales on
          the respective x and y axes. If this happens, then
          the regularization may experience difficulties. 
          
          autoscale = 'on' will cause gridfit to scale the x
          and y node intervals to a unit length. This should
          improve the regularization procedure. The scaling is
          purely internal. 

          autoscale = 'off' will disable automatic scaling

          DEFAULT: 'on'
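
          A sketch (sample data invented for illustration) with widely
          different x and y scales:

            x = 1e5*rand(200,1);             % x spans roughly 0 to 1e5
            y = rand(200,1);                 % y spans roughly 0 to 1
            z = x/1e5 + y;
            zon  = gridfit(x,y,z,50,50,'autoscale','on');   % the default
            zoff = gridfit(x,y,z,50,50,'autoscale','off');  % raw scales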


 Arguments: (output)
  zgrid   - (nx,ny) array containing the fitted surface

  xgrid, ygrid - as returned by meshgrid(xnodes,ynodes)
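
  For example (a sketch with invented sample data), the three outputs
  plug straight into surf:

   x = rand(250,1);  y = rand(250,1);  z = exp(x + 2*y);
   [zgrid,xgrid,ygrid] = gridfit(x,y,z, 0:.1:1, 0:.1:1);
   surf(xgrid,ygrid,zgrid)      % xgrid,ygrid come from meshgrid, so surf plots directly
   xlabel('x'), ylabel('y')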


 Speed considerations:
  Remember that gridfit must solve a LARGE system of linear
  equations. There will be as many unknowns as the total
  number of nodes in the final lattice. While these equations
  may be sparse, solving a system of 10000 equations may take
  a second or so. Very large problems may benefit from the
  iterative solvers or from tiling.
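
  As a rough feel for the problem size (node counts chosen arbitrarily
  for illustration):

   xnodes = linspace(0,2,200);
   ynodes = linspace(0,1,300);
   nUnknowns = numel(xnodes)*numel(ynodes)   % 60000 unknowns, one per node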


 Example usage:

  x = rand(100,1);
  y = rand(100,1);
  z = exp(x+2*y);
  xnodes = 0:.1:1;
  ynodes = 0:.1:1;

  g = gridfit(x,y,z,xnodes,ynodes);

 Note: this is equivalent to the following call:

  g = gridfit(x,y,z,xnodes,ynodes, ...
              'smooth',1, ...
              'interp','triangle', ...
              'solver','normal', ...
              'regularizer','gradient', ...
              'extend','warning', ...
              'tilesize',inf);


 Author: John D'Errico
 e-mail address: woodchips@rochester.rr.com
 Release: 2.0
 Release date: 5/23/06
