DATAMANIP commands


PROGRAM NAME

kstats - Compute Statistics of Data Object

DESCRIPTION

kstats computes the mean, variance, standard deviation, RMS level, skew, kurtosis, minimum value, maximum value, and integral (positive, negative, and total sums) of the Input data object (i). Information such as the number of unmasked or ungated points (positive, negative, zero-valued, and total counts), the coordinates of the minimum and maximum values, and the dimensions of the data object are also provided.
If there is a validity mask associated with the source object, the calculation is performed using only data that have corresponding mask values not equal to zero. The mask data is not transferred to the output object.


If the source object contains map data as well as value data, then the value data will be mapped through the map (the value data type becomes that of the map) before the operation is performed. The destination object contains value data with no associated map.

If the data has a validity mask as well as a map, the mapped data is also masked. When mapping a value datum that has a mask value of zero (invalid), the mapping process will first look for an entry in the map that corresponds to the "invalid" value. If one does not exist (for example, the value is NaN), the mapped data is set to the pad value. In either case, the data remains masked. The mask is resized to correspond to the size of the mapped value data to properly mask it. For example, the value data may have the dimensions of

	width = 100, 
	height = 50, 
	elements = depth = time = 1.  

If the map dimensions are

	map width = 3, 
	map height = any value, 
	map element = map depth = map time = 1, 

the resulting value (and mask) size will be

	width = 100, 
	height = 50, 
	elements = 3, 
	depth = time = 1.

All statistics and information will reflect the mapping. For example, if the dimensions of the data are requested (wsize, hsize, etc), kstats returns the dimensionality after mapping.
This program fails if the input object lacks both map data and value data.
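
The size arithmetic in the example above can be sketched in Python. This is only an illustration of the dimension bookkeeping described here, not kstats code: the function name mapped_size and the dictionary layout are hypothetical, and the sketch assumes, as in the example, that the value data has elements = 1 before mapping.

```python
def mapped_size(value_size, map_width):
    """Return the value (and mask) size after mapping.

    value_size: dict with keys 'width', 'height', 'depth', 'time', 'elements'.
    map_width:  width of the map; each value datum expands to this many elements.
    Assumption: elements == 1 before mapping, as in the documented example.
    """
    if value_size["elements"] != 1:
        raise ValueError("this sketch only covers the elements == 1 case")
    result = dict(value_size)          # leave the caller's dict untouched
    result["elements"] = map_width     # only the elements dimension changes
    return result


before = {"width": 100, "height": 50, "depth": 1, "time": 1, "elements": 1}
after = mapped_size(before, map_width=3)
print(after)
```

Under these assumptions the map height does not enter the result, which matches the example: only the map width determines the new elements size.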


All statistics are calculated as double, and the Binary Output data type is double.

Location and time data do not affect the statistics calculations, and are not transferred to the output object.

A flag can be set for each statistic and information option so that a subset of all available options can be calculated and stored or printed. The flags are mutually exclusive with the Calculate All Statistics flag (all). (The command line flag names are: mean, var, sd, rms, skew, kur, minval, maxval, wmin, hmin, dmin, tmin, emin, wmax, hmax, dmax, tmax, emax, sum, psum, nsum, pts, ppts, npts, zpts, wsize, hsize, dsize, tsize, and esize.) If no flag is specified when running kstats from the command line, all statistics and information will be stored.

Statistics are computed according to the equations given below. In the equations, N is the number of samples, SUM is the sum from i=0 to i=N-1, and x(i) is the sample value of x at i.

Mean (-mean) mean = (1/N) * SUM(x(i)) i=0..N-1

Variance (-var) variance = (1/(N-1)) * SUM((x(i) - mean)**2) i=0..N-1

Standard Deviation (-sd) standard deviation = sqrt(variance)

RMS (-rms) RMS = sqrt(1/N * SUM(x(i)**2)) i=0..N-1
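
The four equations above translate directly into a few lines of Python. The following is a minimal sketch of the formulas as written, not the kstats implementation; the function name basic_stats is hypothetical, and the sketch requires at least two samples because of the 1/(N-1) factor (how kstats itself treats N = 1 is not stated here).

```python
import math


def basic_stats(x):
    """Return (mean, variance, standard deviation, RMS) of a list of numbers."""
    n = len(x)
    mean = sum(x) / n
    # Variance uses the 1/(N-1) normalisation given in the equation above.
    variance = sum((v - mean) ** 2 for v in x) / (n - 1)
    sd = math.sqrt(variance)
    # RMS uses 1/N and the raw values, not the deviations from the mean.
    rms = math.sqrt(sum(v * v for v in x) / n)
    return mean, variance, sd, rms


mean, variance, sd, rms = basic_stats([1.0, 2.0, 3.0, 4.0])
print(mean, variance, sd, rms)
```

Note that the variance is normalised by N-1 while the RMS is normalised by N, so the two are not related by a simple shift of the mean.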

Skewness (-skew) Skewness is a measure of the tendency of the deviations from the mean to be larger in one direction than in the other. It is a measure of the asymmetry of a distribution about its mean. A positive skew value signifies a distribution whose tail extends out towards more positive x, and a negative skew value signifies a distribution whose tail extends out towards more negative x. Population skewness is unitless and is defined as:

E[((x-mean)/(stddev))**3],
where stddev is the standard deviation of the data. kstats computes the sample skewness as:

skew = 1/N * SUM( ((x(i) - mean)/stddev) **3 ) i=0..N-1

If the variance is equal to zero, skewness will be set to 0.0.

Kurtosis (-kur) Kurtosis is a unitless measure of the tail heaviness of a distribution and is defined as:

E[((x-mean)/(stddev))**4] - 3,
where stddev is the standard deviation of the data. kstats computes the sample kurtosis as:

kurtosis= (1/N * SUM( ((x(i) - mean)/stddev) **4)) - 3 i=0..N-1

If the variance is equal to zero, kurtosis will be set to 0.0.
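
The sample skewness and kurtosis formulas above differ only in the exponent and the final "- 3", so both can be sketched with one helper. This is an illustrative Python sketch, not kstats source; the function names are hypothetical, and it assumes that stddev is the square root of the 1/(N-1) variance defined earlier, which the text implies but does not state outright.

```python
import math


def standardized_moment(x, k):
    """Return (1/N) * SUM(((x(i) - mean) / stddev) ** k), or None if variance is 0.

    Assumption: stddev = sqrt(variance) with the 1/(N-1) variance, so at
    least two samples are needed.
    """
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / (n - 1)
    if variance == 0.0:
        return None                      # caller applies the 0.0 fallback
    stddev = math.sqrt(variance)
    return sum(((v - mean) / stddev) ** k for v in x) / n


def skew(x):
    """Sample skewness; 0.0 when the variance is zero, as documented."""
    m3 = standardized_moment(x, 3)
    return 0.0 if m3 is None else m3


def kurtosis(x):
    """Sample kurtosis (with the -3 offset); 0.0 when the variance is zero."""
    m4 = standardized_moment(x, 4)
    return 0.0 if m4 is None else m4 - 3.0


print(skew([0.0, 0.0, 0.0, 10.0]))    # tail towards positive x, so positive
print(kurtosis([1.0, 2.0, 3.0]))
print(skew([5.0, 5.0, 5.0]), kurtosis([5.0, 5.0, 5.0]))
```

Constant data is the case the zero-variance rule exists for: without it, both formulas would divide by a zero standard deviation.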

Sums or Integrals (-sum, -psum, -nsum) total integral = SUM x(i)
positive part of integral = SUM x(i), for all x(i) > 0
negative part of integral = SUM x(i), for all x(i) < 0
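
The three sums, together with the point counts selected by -pts, -ppts, -npts, and -zpts, can be sketched as follows. This Python fragment is only an illustration of the definitions above; the function name sums_and_counts and the returned dictionary layout are hypothetical, although the keys reuse the command line flag names.

```python
def sums_and_counts(x):
    """Return the total, positive, and negative sums plus the point counts."""
    positives = [v for v in x if v > 0]
    negatives = [v for v in x if v < 0]
    return {
        "sum": sum(x),                    # total integral
        "psum": sum(positives),           # positive part of integral
        "nsum": sum(negatives),           # negative part of integral
        "pts": len(x),                    # all contributing points
        "ppts": len(positives),
        "npts": len(negatives),
        "zpts": len(x) - len(positives) - len(negatives),
    }


result = sums_and_counts([-2.0, 0.0, 3.0, 5.0, -1.0])
print(result)
```

By construction, sum = psum + nsum and pts = ppts + npts + zpts, which is a convenient sanity check on printed output.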

Define Processing Unit To support analysis of subregions within the data object, such as lines, planes, volumes, and vectors, an option for defining processing units, or regions, is provided. These regions are defined by the settings of the Processing Unit options, which can be either the Whole Data Set (whole), or any combination of Width (w), Height (h), Depth (d), Time (t), and Elements (e). Statistics for each region will be computed and printed separately. If none of these flags are supplied, then a single set of statistics will be computed for the entire data object.
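
One way to picture processing units is to group the data points by the coordinates of every dimension that is not part of the unit; each group is then one region. The Python sketch below illustrates that idea on a tiny data set and computes only the mean per region. It is not kstats code: the names region_means and DIMS and the dictionary-of-coordinates layout are hypothetical, and treating -whole as "all five dimensions in the unit" is an assumption made for the sketch.

```python
from collections import defaultdict

DIMS = ("w", "h", "d", "t", "e")


def region_means(points, unit):
    """Compute one mean per region.

    points: dict mapping (w, h, d, t, e) coordinates to values.
    unit:   set of dimension names that make up the processing unit.
    """
    # Dimensions outside the unit identify which region a point belongs to.
    outside = [i for i, name in enumerate(DIMS) if name not in unit]
    groups = defaultdict(list)
    for coord, value in points.items():
        key = tuple(coord[i] for i in outside)
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# A 2 x 2 image with 3 elements per pixel (depth = time = 1).
points = {
    (w, h, 0, 0, e): float(10 * e + w + h)
    for w in range(2) for h in range(2) for e in range(3)
}

per_plane = region_means(points, {"w", "h"})   # one region per element plane
per_pixel = region_means(points, {"e"})        # one region per pixel vector
whole = region_means(points, set(DIMS))        # a single region
print(len(per_plane), len(per_pixel), len(whole))
```

With width and height as the unit there are three regions (one per element plane); with elements as the unit there are four (one per pixel).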

Gating Input If a gating source object is supplied, the operation is only performed where the gating value is non-zero.
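
Gating and the validity mask both act as point-by-point filters on which samples contribute. The following Python sketch shows that selection on flat lists. It is an illustration only: the function name contributing_values is hypothetical, and it assumes the mask and the gating object have the same number of points as the data, which this page does not specify.

```python
def contributing_values(values, mask=None, gate=None):
    """Return the values that would contribute to the statistics.

    A point is dropped when its mask value is zero (invalid) or when its
    gating value is zero. Either filter may be omitted.
    """
    kept = []
    for i, v in enumerate(values):
        if mask is not None and mask[i] == 0:
            continue                      # masked out as invalid
        if gate is not None and gate[i] == 0:
            continue                      # gated out
        kept.append(v)
    return kept


values = [1.0, 2.0, 3.0, 4.0]
kept = contributing_values(values, mask=[1, 0, 1, 1], gate=[1, 1, 0, 1])
print(kept)
```

The point counts reported by kstats refer to the points that survive this selection, which is why they are described as unmasked or ungated points.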

ASCII Output The ASCII Output (f) option allows the user to specify a device or file for printing the specified information in formatted ASCII. A filename of # will send the output to stderr. If neither an ASCII Output nor a Binary Output is supplied, the formatted ASCII will automatically go to stdout.

Binary Output If the Binary Output (-o) is selected, the selected statistics and information are stored as double-precision floating-point data in the given file.

Each statistic is stored as an element of an N-D vector defined along the "elements" dimension of the output object. N is the number of statistics and information specified by the user. The order in which the information is stored in the statistics vector is given in the output object's comment attribute.

When M sets of statistics are calculated for M regions of the input data object (see Define Processing Unit above), the statistics vectors are stored along the width dimension of the output statistics object, so the output object has dimensionality width = M, elements = N (height = depth = time = 1). For example, if the input data object has dimensionality width = 256, height = 256, elements = 7 (depth = time = 1), and the user defines the analysis regions to be width-height, the output statistics object has dimensionality width = 7, elements = N (height = depth = time = 1). The vector at w=0 contains information pertaining to the first region (starting at w=0, h=0, d=0, t=0, e=0), the vector at w=1 contains information about the second region (starting at w=0, h=0, d=0, t=0, e=1), and so forth.
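
The output dimensionality can be estimated before running kstats. The Python sketch below assumes that M is the product of the sizes of all dimensions that are not part of the processing unit. That rule is inferred from the width-height example above rather than stated by this page, and the function name output_size is hypothetical.

```python
from math import prod


def output_size(input_size, unit, n_stats):
    """Return the expected dimensionality of the binary output object.

    input_size: dict of dimension name -> size for the (mapped) input.
    unit:       set of dimension names in the processing unit.
    n_stats:    number of statistics and information flags requested (N).
    """
    # Assumed rule: one region per combination of the remaining dimensions.
    m = prod(size for name, size in input_size.items() if name not in unit)
    return {"width": m, "height": 1, "depth": 1, "time": 1, "elements": n_stats}


input_size = {"width": 256, "height": 256, "depth": 1, "time": 1, "elements": 7}
# Width-height regions with two statistics requested, e.g. -mean and -sd.
out = output_size(input_size, {"width", "height"}, n_stats=2)
print(out)
```

For the documented example this gives width = 7, with one statistics vector per element plane.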

REQUIRED ARGUMENTS

-i
type: infile
desc: Input object

OPTIONAL ARGUMENTS

-igate
type: infile
desc: Gating input data object
default: {none}
-f
type: outfile
desc: Formatted ASCII output file
default: {none}
-o
type: outfile
desc: Binary output statistics file
default: {none}

Mutually Exclusive Group; if desired, specify ONE of:

-whole
type: flag
desc: compute single set of statistics for entire data set
OR

AT LEAST ONE OF the Group:

-w
type: flag
desc: include width in processing unit
AND/OR
-h
type: flag
desc: include height in processing unit
AND/OR
-d
type: flag
desc: include depth in processing unit
AND/OR
-t
type: flag
desc: include time in processing unit
AND/OR
-e
type: flag
desc: include elements in processing unit

Mutually Exclusive Group; if desired, specify ONE of:

-all
type: flag
desc: compute and store all statistics information
OR

AT LEAST ONE OF the Group:

-mean
type: flag
desc: compute mean
AND/OR
-sum
type: flag
desc: compute total integral
AND/OR
-var
type: flag
desc: compute variance
AND/OR
-wmin
type: flag
desc: store width coordinate of the minimum value
AND/OR
-psum
type: flag
desc: compute positive part of integral
AND/OR
-sd
type: flag
desc: compute standard deviation
AND/OR
-hmin
type: flag
desc: store height coordinate of the minimum value
AND/OR
-nsum
type: flag
desc: compute negative part of integral
AND/OR
-rms
type: flag
desc: compute root mean square
AND/OR
-dmin
type: flag
desc: store depth coordinate of the minimum value
AND/OR
-pts
type: flag
desc: compute total number of contributing points
AND/OR
-skew
type: flag
desc: compute skewness
AND/OR
-tmin
type: flag
desc: store time coordinate of the minimum value
AND/OR
-ppts
type: flag
desc: compute number of positive contributing points
AND/OR
-kur
type: flag
desc: compute kurtosis
AND/OR
-emin
type: flag
desc: store elements coordinate of the minimum value
AND/OR
-npts
type: flag
desc: compute number of negative contributing points
AND/OR
-minval
type: flag
desc: compute minimum value
AND/OR
-zpts
type: flag
desc: compute number of zero-valued contributing points
AND/OR
-maxval
type: flag
desc: compute maximum value
AND/OR
-wmax
type: flag
desc: store width coordinate of the maximum value
AND/OR
-wsize
type: flag
desc: store size of object's width dimension
AND/OR
-hmax
type: flag
desc: store height coordinate of the maximum value
AND/OR
-hsize
type: flag
desc: store size of object's height dimension
AND/OR
-dmax
type: flag
desc: store depth coordinate of the maximum value
AND/OR
-dsize
type: flag
desc: store size of object's depth dimension
AND/OR
-tmax
type: flag
desc: store time coordinate of the maximum value
AND/OR
-tsize
type: flag
desc: store size of object's time dimension
AND/OR
-emax
type: flag
desc: store elements coordinate of the maximum value
AND/OR
-esize
type: flag
desc: store size of object's elements dimension

EXAMPLES

kstats -i object.xv -f ascii -all

Will create an ASCII statistics file. Since no processing unit flag (-w, -h, -d, -t, or -e) is specified, kstats defaults to calculating statistical information for the object as a whole.

kstats -i object.xv -f ascii_file -e

Will create an ASCII statistics file. The statistical information is now calculated by slicing the object along the element dimension. For example, on a 3-band RGB color image, this would compute the statistics for each RGB vector.

SEE ALSO

RESTRICTIONS

kstats cannot compute complex statistics at this time. If you wish to process the real and imaginary components of a complex data set independently, first separate components using kcmplx2real.

In the current implementation, the statistics are accumulated. This can result in overflow. kstats will be rewritten in the future to better handle large values and large data sets.

REFERENCES

COPYRIGHT

Copyright (C) 1993 - 1997, Khoral Research, Inc. ("KRI") All rights reserved.