org.das2.qds.util.AutoHistogram

A self-configuring histogram that dynamically adjusts its range and bin size as data is added. It also tries to identify outlier points, which are available as a Map from each value to the number of times it was observed. For each bin, a running mean and variance are kept, which are useful for identifying continuous bins and for computing total moments. It was introduced to support the automatic cadence algorithm, but should be generally useful in data discovery.
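The per-bin running mean and variance mentioned above can be kept in a single pass. The following is a minimal sketch using Welford's method, a common technique for this; it is not taken from AutoHistogram's source, and the class name BinStats and its methods are hypothetical.

```java
/** Illustrative only: running count, mean and variance for one bin (Welford's method). */
final class BinStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean

    /** Folds one value into the running statistics. */
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    long count() {
        return count;
    }

    double mean() {
        return mean;
    }

    /** Population (divide-by-n) variance; returns 0 for an empty bin. */
    double variance() {
        return count > 0 ? m2 / count : 0.0;
    }
}
```

Adding the values 1, 2, 3 and 4 to one BinStats instance gives a mean of 2.5 and a population variance of 1.25. Whether AutoHistogram uses the population or the sample form of the variance is not stated here, so the divide-by-n choice is an assumption.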


USER_PROP_BIN_START


USER_PROP_BIN_WIDTH


USER_PROP_INVALID_COUNT


USER_PROP_OUTLIERS


USER_PROP_MIN_GT_ZERO


USER_PROP_TOTAL

Long, total number of valid points.


binOf

binOf( QDataSet hist, double d ) → int

Convenience method for getting the bin index of a value from a completed histogram's metadata. Note that this is inefficient, since it must do HashMap lookups to get the bin width and bin start, so use it with care.

Parameters

hist - the completed histogram, whose metadata provides the bin start and bin width.
d - the value to locate.

Returns:

the index of the bin for the point.
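The arithmetic behind a bin lookup on uniform bins can be sketched as follows. This is an illustration under assumptions, not the library's code: the class BinIndex and its parameter names are hypothetical, and how the real binOf treats values outside the histogram's range is not described here.

```java
/** Illustrative only: map a value to the index of a uniform-width bin. */
final class BinIndex {
    /**
     * @param binStart lower edge of bin 0 (hypothetical parameter name)
     * @param binWidth width of every bin, assumed positive
     * @param d the value to locate
     * @return the index of the bin containing d; negative when d is below binStart
     */
    static int binOf(double binStart, double binWidth, double d) {
        // floor, not a plain cast, so values below binStart map to negative indices
        return (int) Math.floor((d - binStart) / binWidth);
    }
}
```

With a bin start of 10.0 and a bin width of 2.0, the value 15.0 falls in bin 2. When many values need to be located, reading the bin start and bin width once and reusing them in a calculation like this avoids the repeated lookups.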



doit

doit( QDataSet ds ) → QDataSet

Returns:

org.das2.qds.QDataSet



getHistogram

getHistogram( ) → DDataSet

Gets the histogram of the data accumulated thus far.

Returns:

org.das2.qds.DDataSet



moments

moments( QDataSet hist ) → RankZeroDataSet

Returns the mean of the dataset that has been histogrammed, with the standard deviation attached as a property.

Parameters

hist - a rank 1 dataset containing the count in each bin. DEPEND_0 holds the labels for each bin. The property "means" holds a rank 1 dataset containing the mean for each bin. The property "stddevs" holds the standard deviation within each bin.

Returns:

rank 0 dataset (a Datum) whose value is the mean, and the property("stddev") contains the standard deviation
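One way per-bin counts, means and standard deviations can be combined into a total mean and standard deviation is sketched below. It is an assumption about the calculation, not the library's implementation: the class TotalMoments is hypothetical, plain arrays stand in for the QDataSet and its "means" and "stddevs" properties, and each per-bin standard deviation is assumed to be the population (divide-by-n) form.

```java
/** Illustrative only: combine per-bin counts, means and standard deviations. */
final class TotalMoments {
    /**
     * @return {mean, stddev} over all points, or {NaN, NaN} when every bin is empty
     */
    static double[] combine(double[] counts, double[] means, double[] stddevs) {
        double n = 0.0;
        double weightedSum = 0.0;
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) { // skip empty bins, whose mean may be undefined
                n += counts[i];
                weightedSum += counts[i] * means[i];
            }
        }
        if (n == 0.0) {
            return new double[] {Double.NaN, Double.NaN};
        }
        double mean = weightedSum / n;
        double sumSquares = 0.0;
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                // spread within the bin plus the bin's offset from the total mean
                double offset = means[i] - mean;
                sumSquares += counts[i] * (stddevs[i] * stddevs[i] + offset * offset);
            }
        }
        return new double[] {mean, Math.sqrt(sumSquares / n)};
    }
}
```

For two bins holding the points {1, 2} and {3, 4}, the inputs are counts {2, 2}, means {1.5, 3.5} and stddevs {0.5, 0.5}; the combined mean is 2.5 and the combined variance is 1.25, matching a direct calculation over the four points.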



monoExtent

monoExtent( QDataSet dep0 ) → QDataSet

Fast extent calculation that only works when the data is monotonic. Returns null if there is no valid data.

Parameters

dep0 - the monotonic dataset.

Returns:

rank 1 bins dataset or null
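The reason monotonic data allows a fast extent is that only the first and last valid values need to be read. A minimal sketch of that idea follows, assuming a plain double array with NaN marking invalid values; the class MonoExtent is hypothetical, and the real method works on a QDataSet whose validity rules are not described here.

```java
/** Illustrative only: extent of monotonic data, with NaN marking invalid values. */
final class MonoExtent {
    /** @return {min, max}, or null when no element is valid */
    static double[] extent(double[] values) {
        int first = 0;
        while (first < values.length && Double.isNaN(values[first])) {
            first++;
        }
        if (first == values.length) {
            return null; // empty, or nothing but invalid values
        }
        int last = values.length - 1;
        while (Double.isNaN(values[last])) {
            last--; // terminates at 'first' at the latest
        }
        double a = values[first];
        double b = values[last];
        // min/max so that monotonically decreasing data is handled as well
        return new double[] {Math.min(a, b), Math.max(a, b)};
    }
}
```

Only the ends of the array are inspected, so the result is wrong if the input is not monotonic; a general extent needs a full scan.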



peakIds

peakIds( QDataSet hist ) → QDataSet

Returns a list of all the peaks in the histogram. A peak is defined as a local maximum, together with the adjacent bins that are consistent with the peak population and do not belong to another peak.

Parameters

hist - the result of AutoHistogram.

Returns:

QDataSet covariant with hist.
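The general shape of such a labelling can be sketched as below. The test for "consistent with the peak population" is not specified here, so this sketch substitutes a hypothetical rule: an adjacent bin joins a peak if its count does not rise away from the peak and is at least a given fraction of the peak's count. The class PeakIds, the fraction parameter and the use of -1 for bins outside any peak are all assumptions.

```java
/** Illustrative only: label each bin with the id of the peak it belongs to, or -1. */
final class PeakIds {
    static int[] peakIds(int[] counts, double fraction) {
        int n = counts.length;
        int[] ids = new int[n];
        java.util.Arrays.fill(ids, -1);
        int[] seedStart = new int[n];
        int[] seedEnd = new int[n];
        int numPeaks = 0;
        // Pass 1: a run of equal counts that is higher than both neighbours seeds a peak.
        for (int i = 0; i < n; ) {
            int j = i;
            while (j + 1 < n && counts[j + 1] == counts[i]) {
                j++;
            }
            boolean risesInto = i == 0 || counts[i - 1] < counts[i];
            boolean fallsAfter = j == n - 1 || counts[j + 1] < counts[i];
            if (counts[i] > 0 && risesInto && fallsAfter) {
                java.util.Arrays.fill(ids, i, j + 1, numPeaks);
                seedStart[numPeaks] = i;
                seedEnd[numPeaks] = j;
                numPeaks++;
            }
            i = j + 1;
        }
        // Pass 2: grow each seed over unclaimed, non-rising bins that stay above the threshold.
        for (int k = 0; k < numPeaks; k++) {
            double threshold = fraction * counts[seedStart[k]];
            for (int p = seedStart[k] - 1; p >= 0 && ids[p] == -1 && counts[p] > 0
                    && counts[p] <= counts[p + 1] && counts[p] >= threshold; p--) {
                ids[p] = k;
            }
            for (int p = seedEnd[k] + 1; p < n && ids[p] == -1 && counts[p] > 0
                    && counts[p] <= counts[p - 1] && counts[p] >= threshold; p++) {
                ids[p] = k;
            }
        }
        return ids;
    }
}
```

For the counts {0, 2, 8, 9, 3, 0, 1, 6, 5, 0} and a fraction of 0.5, the local maxima are the bins holding 9 and 6. The bin holding 8 joins the first peak and the bin holding 5 joins the second, giving the ids {-1, -1, 0, 0, -1, -1, -1, 1, 1, -1}. The result has one entry per bin, in the same sense that the real return value is covariant with hist.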



peaks

peaks( QDataSet hist ) → QDataSet

Returns a list of all the peaks in the histogram. See peakIds for how peaks are identified. Once the bins of a peak have been identified, the mean and stddev of each peak are returned. Note that the stddev typically reads low, probably because the tails have been removed.

Parameters

hist - the result of AutoHistogram

Returns:

QDataSet rank 1 dataset with length equal to the number of identified peaks



simpleRange

simpleRange( QDataSet hist2 ) → QDataSet

Returns the simple range: the min and the max that contain the data.

Parameters

hist2 - the result of AutoHistogram.

Returns:

rank 1 bins dataset showing the min and max. value(0) is the min, value(1) is the max.
