Applying a Data Cleaning Procedure

Riffyn includes several powerful data grouping, summarization, and cleaning procedures under the Clean Data panel in Measure Mode.

To create a new data cleaning procedure, go to the Cleaned Data tab within the Collect Data tab of an experiment.


Click the wrench icon to access the data cleaning procedures. From there, you can either select an existing procedure to apply or create a new one.

Data can be grouped using exact matches or other algorithms such as K-Means Clustering, Density Based Spatial Clustering, or a Sliding Window (Fixed Interval) Alignment. Additionally, data can be summarized using mean, mode, minimum, maximum, or median.

A summary of the methods available for grouping:

  • Exact Match Alignment - this method will align data based on an exact match (typically a label).
  • K-Means Clustering - this method partitions observations into K clusters, where each observation belongs to the cluster with the nearest mean (the cluster center). More information about k-means clustering.
  • Density Based Spatial Clustering - given a set of points in some space, this method groups together points that are closely packed and marks points that lie alone in low-density regions as outliers. Epsilon (a value between 0 and 1) controls the signal-to-noise ratio allowed when creating a cluster. Minimum points sets the minimum number of points that each cluster must contain. More information about density based clustering.
  • Fixed Interval Alignment - given a starting point (for example, a start time), this method groups data points into increments of StepSize. For example, if a data set has many time points at approximately 1-second intervals, a starting point of 0 and a StepSize of 30 will group those data points into 30-second-long clusters.
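
The Fixed Interval Alignment example above can be sketched in a few lines of Python. This is only an illustration of the idea, not Riffyn's implementation: the function name fixed_interval_groups, its parameters, and the choice to skip points that fall before the starting point are all assumptions made for this sketch.

```python
import math
from collections import defaultdict


def fixed_interval_groups(times, start, step_size):
    """Group time points into consecutive bins of width step_size, beginning at start.

    Hypothetical helper for illustration only; not part of any Riffyn API.
    """
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    groups = defaultdict(list)
    for t in times:
        if t < start:
            continue  # assumption: points before the starting point are ignored
        # Bin 0 covers [start, start + step_size), bin 1 the next interval, and so on.
        index = math.floor((t - start) / step_size)
        groups[index].append(t)
    return dict(groups)


# Roughly one reading per second for 90 seconds.
times = [i + 0.1 for i in range(90)]
groups = fixed_interval_groups(times, start=0, step_size=30)

for index in sorted(groups):
    low = index * 30
    print(f"{low}-{low + 30} s: {len(groups[index])} points")
```

With a starting point of 0 and a step size of 30, the 90 readings fall into three groups of 30 points each, one per 30-second interval.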

Summarize Data: no clustering takes place, and properties are summarized using mean, min, max, mode, or median.
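
As a rough illustration of the five summary options, the sketch below applies each one to a small list of readings using Python's standard statistics module. The SUMMARIES mapping, the summarize function, and the sample values are hypothetical and are not taken from Riffyn.

```python
import statistics

# Hypothetical mapping from summary name to the function that computes it.
SUMMARIES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
    "minimum": min,
    "maximum": max,
}


def summarize(values, method):
    """Reduce a list of property values to a single number using the named method."""
    return SUMMARIES[method](values)


# Made-up readings for one property.
readings = [2.0, 4.0, 4.0, 5.0, 10.0]
summary = {name: summarize(readings, name) for name in SUMMARIES}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The same reduction could be applied to each group produced by one of the grouping methods, giving one summarized value per group.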
