int lkkmeans(kobject in_obj1,
             kobject in_obj2,
             int n,
             int k,
             int map_flag,
             int spectrum_flag,
             kobject out_obj1,
             kobject out_obj2,
             kobject out_obj3,
             kobject out_obj4,
             kfile *statsfile)
The K-means algorithm is based on minimization of the sum of the squared distances from all points in a cluster to a cluster center. The user chooses K initial cluster centers and the input vectors are iteratively distributed among the K cluster domains. New cluster centers are computed from these results, such that the sum of the squared distances from all points in a cluster to the new cluster center is minimized.
Although the K-means algorithm does not strictly converge in a continuous space, it may converge in a discrete space; alternatively, a practical upper limit can be placed on the number of iterations. The user specifies the maximum number of iterations with the n argument.
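The iteration described above, capped at a maximum number of passes, can be sketched as follows. This is a minimal illustration and not the lkkmeans source: the function name kmeans_sketch, the use of plain double arrays in place of kobject data, and the choice to leave an empty cluster's center unchanged are all assumptions made for the example.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdlib.h>

/* Squared Euclidean distance between two dim-element vectors. */
static double sq_dist(const double *a, const double *b, size_t dim)
{
    double sum = 0.0;
    for (size_t i = 0; i < dim; i++) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/*
 * Hypothetical helper, not part of the kdatamanip library.
 *   data:    nvec * dim values, one input vector per row
 *   centers: k * dim values, initialized by the caller, updated in place
 *   labels:  nvec entries, receives the cluster index of each vector
 * Returns the number of passes performed (at most max_iter),
 * or -1 if memory could not be allocated.
 */
int kmeans_sketch(const double *data, size_t nvec, size_t dim,
                  double *centers, size_t k, int max_iter, int *labels)
{
    assert(k > 0 && dim > 0);
    double *sums = malloc(k * dim * sizeof *sums);
    size_t *counts = malloc(k * sizeof *counts);
    if (sums == NULL || counts == NULL) {
        free(sums);
        free(counts);
        return -1;
    }

    int iter = 0;
    while (iter < max_iter) {
        int changed = 0;
        for (size_t i = 0; i < k * dim; i++) sums[i] = 0.0;
        for (size_t c = 0; c < k; c++) counts[c] = 0;

        /* Assignment step: give each vector to its nearest center. */
        for (size_t v = 0; v < nvec; v++) {
            size_t best = 0;
            double best_d = DBL_MAX;
            for (size_t c = 0; c < k; c++) {
                double d = sq_dist(data + v * dim, centers + c * dim, dim);
                if (d < best_d) { best_d = d; best = c; }
            }
            if (iter == 0 || labels[v] != (int)best) changed = 1;
            labels[v] = (int)best;
            counts[best]++;
            for (size_t e = 0; e < dim; e++)
                sums[best * dim + e] += data[v * dim + e];
        }
        iter++;
        if (!changed) break;  /* assignments are stable: stop early */

        /* Update step: the mean minimizes the sum of squared distances. */
        for (size_t c = 0; c < k; c++) {
            if (counts[c] == 0) continue;  /* assumption: leave empty clusters as-is */
            for (size_t e = 0; e < dim; e++)
                centers[c * dim + e] = sums[c * dim + e] / (double)counts[c];
        }
    }
    free(sums);
    free(counts);
    return iter;
}
```

In this sketch the max_iter parameter plays the role of the n argument: the loop stops either when no vector changes cluster or when the cap is reached, whichever comes first.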
There are two ways to specify the initial cluster centers. If the in_obj2 argument is non-NULL, then the cluster centers are read from that object. The vectors are assumed to be stored along the E direction. Only the first K centers (as specified by the k argument) will be read. If the in_obj2 argument is NULL, then the first K vectors in the in_obj1 object will be used as the initial cluster centers.
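The selection rule for the initial centers can be sketched with plain arrays. The function init_centers and its parameters are hypothetical: user_centers stands in for the data read from in_obj2 and data for the vectors in in_obj1. Returning -1 when fewer than K vectors are available is an assumption of this example, since the behavior of lkkmeans in that case is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the kdatamanip library.
 * Copies the first k vectors from user_centers if it is non-NULL,
 * otherwise from data. Each vector has dim elements, stored row by row.
 * Returns 0 on success, -1 if the chosen source holds fewer than k vectors.
 */
int init_centers(const double *data, size_t nvec,
                 const double *user_centers, size_t n_user,
                 size_t dim, size_t k, double *centers)
{
    assert(data != NULL && centers != NULL);
    const double *src = user_centers ? user_centers : data;
    size_t avail = user_centers ? n_user : nvec;
    if (avail < k)
        return -1;  /* assumption: report an error rather than guess */
    memcpy(centers, src, k * dim * sizeof *centers);
    return 0;
}
```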
It should be noted that it is possible to specify an initial cluster that lies at a sufficient distance from all input vectors that it will have no vectors assigned to it during a pass of the K-means algorithm. If this happens, lkkmeans will reinitialize the value of that cluster to the mean value of a moving pair of the existing cluster centers, thus avoiding degeneracy.
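One possible reading of this reinitialization is sketched below. The exact way lkkmeans selects the "moving pair" is not specified here, so the cursor scheme is an assumption: a caller-owned index picks two distinct, adjacent centers (skipping the empty one) and advances after each use. The function name reinit_empty_center is hypothetical, and the sketch requires at least three clusters so that two other centers exist.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the kdatamanip library.
 * Replaces the center of the empty cluster with the mean of two other
 * centers. *pair is a cursor that advances on every call, so successive
 * reinitializations draw on different pairs (an assumed interpretation
 * of "moving pair").
 */
void reinit_empty_center(double *centers, size_t k, size_t dim,
                         size_t empty, size_t *pair)
{
    assert(k >= 3 && empty < k);

    /* Pick two distinct centers, neither of which is the empty one. */
    size_t a = *pair % k;
    if (a == empty) a = (a + 1) % k;
    size_t b = (a + 1) % k;
    if (b == empty) b = (b + 1) % k;

    for (size_t e = 0; e < dim; e++)
        centers[empty * dim + e] =
            0.5 * (centers[a * dim + e] + centers[b * dim + e]);

    *pair = (a + 1) % k;  /* move the pair for the next call */
}
```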
If map_flag and spectrum_flag are false, the out_obj1 object will contain a value segment specifying the cluster number to which each input vector was assigned. If map_flag is true, then a map segment will also be generated. The final cluster centers will be stored row by row in the map. The values in the value segment can be interpreted as "pointing" to a particular row in the map where the associated cluster for that input vector can be found.
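The "pointing" relationship between the value segment and the map can be illustrated with a small lookup. The function center_for_vector is hypothetical and works on plain arrays rather than kobject segments. It assumes cluster numbers are zero-based row indices, which is not confirmed here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the kdatamanip library.
 *   values: one cluster number per input vector (the value segment)
 *   map:    map_rows rows of map_width doubles, one cluster center per row
 * Returns a pointer to the map row holding the center for vector v.
 */
const double *center_for_vector(const short *values, const double *map,
                                size_t map_width, size_t map_rows, size_t v)
{
    short row = values[v];
    assert(row >= 0 && (size_t)row < map_rows);  /* assumes zero-based rows */
    return map + (size_t)row * map_width;
}
```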
If spectrum_flag is true, then the out_obj1 output object will contain a special map segment (regardless of map_flag) with additional information required for use with the spectrum program in the most general sense. Here, not only are the cluster centers stored, but so are the number of vectors associated with each cluster and the packed upper triangle of the covariance matrix for each cluster. See the spectrum manual for additional information on how this data is used and on the additional capabilities that become available when the extra data is supplied by setting spectrum_flag to true.
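Because a covariance matrix is symmetric, only its upper triangle needs to be stored. The sketch below shows one common packing, row-major order over the elements with i <= j. The actual element order and normalization used in the spectrum map are not given here, so both are assumptions of this example, as are the function names packed_index and cluster_covariance_packed.

```c
#include <assert.h>
#include <stddef.h>

/* Index of element (i, j), i <= j, in the row-major packed upper
 * triangle of a d x d symmetric matrix (assumed layout). */
size_t packed_index(size_t i, size_t j, size_t d)
{
    assert(i <= j && j < d);
    return i * d - i * (i + 1) / 2 + j;
}

/*
 * Hypothetical helper, not part of the kdatamanip library.
 * Computes the covariance of the vectors labelled `cluster` about `mean`
 * and stores its upper triangle in packed[dim * (dim + 1) / 2].
 * Assumption: the sum is divided by the member count (population form).
 * Returns the number of vectors in the cluster.
 */
size_t cluster_covariance_packed(const double *data, const int *labels,
                                 size_t nvec, size_t dim, int cluster,
                                 const double *mean, double *packed)
{
    size_t npacked = dim * (dim + 1) / 2;
    size_t count = 0;

    for (size_t i = 0; i < npacked; i++) packed[i] = 0.0;

    for (size_t v = 0; v < nvec; v++) {
        if (labels[v] != cluster) continue;
        const double *x = data + v * dim;
        count++;
        for (size_t i = 0; i < dim; i++)
            for (size_t j = i; j < dim; j++)
                packed[packed_index(i, j, dim)] +=
                    (x[i] - mean[i]) * (x[j] - mean[j]);
    }
    if (count > 0)
        for (size_t i = 0; i < npacked; i++) packed[i] /= (double)count;
    return count;
}
```

For a 3 x 3 matrix this layout stores the six elements (0,0), (0,1), (0,2), (1,1), (1,2), (2,2) in that order, which halves the storage for large vectors compared with the full matrix.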
If out_obj2 is non-NULL, the associated data object will contain the cluster centers (mean vectors), stored row by row in the value segment. The dimensions of the value segment will be WxHx1x1x1 where W is the number of elements in each mean vector and H is the number of clusters.
If out_obj3 is non-NULL, the associated data object will contain the cluster variances, stored row by row in the value segment. The dimensions of the value segment will be WxHx1x1x1 where W is the number of elements in each vector of variances and H is the number of clusters.
If out_obj4 is non-NULL, the associated data object will contain the cluster membership counts, stored row by row in the value segment. The dimensions of the value segment will be 1xHx1x1x1 where H is the number of clusters. The membership counts simply state the number of vectors that were present in the input object that were assigned to each of the final cluster centers.
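The per-cluster variances and membership counts can be derived from the final assignments as sketched below. The names cluster_stats and variance_trace are hypothetical, the arrays stand in for the value segments of the output objects, and dividing by the member count (rather than the count minus one) is an assumption. The trace of a covariance matrix equals the sum of its diagonal, that is, the sum of the per-element variances.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the kdatamanip library.
 *   variances: k rows of dim values, one row per cluster
 *   counts:    k entries, number of input vectors assigned to each cluster
 */
void cluster_stats(const double *data, const int *labels, size_t nvec,
                   size_t dim, const double *means, size_t k,
                   double *variances, int *counts)
{
    for (size_t i = 0; i < k * dim; i++) variances[i] = 0.0;
    for (size_t c = 0; c < k; c++) counts[c] = 0;

    /* Accumulate squared deviations from each vector's cluster mean. */
    for (size_t v = 0; v < nvec; v++) {
        assert(labels[v] >= 0 && (size_t)labels[v] < k);
        size_t c = (size_t)labels[v];
        counts[c]++;
        for (size_t e = 0; e < dim; e++) {
            double d = data[v * dim + e] - means[c * dim + e];
            variances[c * dim + e] += d * d;
        }
    }

    /* Assumption: population variance (divide by the member count). */
    for (size_t c = 0; c < k; c++) {
        if (counts[c] == 0) continue;
        for (size_t e = 0; e < dim; e++)
            variances[c * dim + e] /= (double)counts[c];
    }
}

/* Trace of a cluster's covariance matrix: the sum of its variances. */
double variance_trace(const double *variances, size_t dim, size_t cluster)
{
    double trace = 0.0;
    for (size_t e = 0; e < dim; e++)
        trace += variances[cluster * dim + e];
    return trace;
}
```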
If statsfile is non-NULL, an ASCII file will be written to the associated file descriptor, containing statistics obtained during the execution of lkkmeans. The output includes the following information: Total Number of K-means Iterations, Total Number of Clusters, Number of Vectors Per Cluster, Cluster Center Values, Cluster Center Variance Values, Trace of Covariance Matrix.
Results obtained by the K-means algorithm can be influenced by the number and choice of initial cluster centers and the geometrical properties of the data.
For the out_obj2 and out_obj3 output objects, the data will be stored as type KDOUBLE. For the out_obj4 output object, the data will be stored as type KINT. For the out_obj1 output object, the value data will be stored as type KSHORT and all map data as type KDOUBLE.
lkkmeans was converted from the K1.5 lkmeans routine, which was written by Tom Sauer and Charlie Gage, with assistance and ideas from Dr. Don Hush, University of New Mexico, Dept. of EECE. Significant modifications were made to the algorithm by Scott Wilson during conversion to K2.
none
none
$DATAMANIP/objects/library/kdatamanip/src/lkkmeans.c