Using Data Cleaning Procedures

Data cleaning in Riffyn serves two purposes: summarizing data sets before export, and reducing the size of large data sets.

Summarizing lets you group runs by a common value, such as a resource name or a property value, so that the export contains one summarized row per group rather than every individual run.
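The idea of summarizing by a common value can be sketched outside Riffyn in a few lines of Python. This is only an illustration of the concept, not how Riffyn implements data cleaning: the record layout, the field names (`resource`, `titer`), and the helper `summarize_by` are all hypothetical, and the mean is just one possible summary statistic.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical run records; the field names are illustrative only.
runs = [
    {"run": "R1", "resource": "Strain A", "titer": 1.0},
    {"run": "R2", "resource": "Strain A", "titer": 3.0},
    {"run": "R3", "resource": "Strain B", "titer": 5.0},
]

def summarize_by(records, key, field):
    """Group records by `key` and return the mean of `field` per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[field])
    return {name: mean(values) for name, values in groups.items()}

# Three runs collapse to one summarized value per resource name.
summary = summarize_by(runs, "resource", "titer")
print(summary)  # {'Strain A': 2.0, 'Strain B': 5.0}
```

Three run rows become two summary rows, one per resource name.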

Reducing a large data set (for example, a run with many multi-valued properties) before export is useful when the export table does not need every multi-value data point. A typical use case is time course data: if data was recorded every second for 5 days, each run would contribute 432,000 data points (5 × 24 × 3,600) to the export data table, likely far more than a typical analysis needs. Data cleaning could summarize this down to one data point per hour, for a total of 120 rows. The reduced data set is faster to export and easier to analyze, while still conveying similar information to the original 432,000 data points.
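The arithmetic in the time course example can be checked with a minimal Python sketch. It assumes readings arrive as `(elapsed_seconds, value)` pairs and that each hour is summarized by its mean; both are assumptions made for illustration, and the function `summarize_hourly` is hypothetical rather than part of Riffyn.

```python
from statistics import mean

SECONDS_PER_HOUR = 3600

def summarize_hourly(readings):
    """Collapse per-second readings into one mean value per hour.

    `readings` is a list of (elapsed_seconds, value) pairs.
    Returns a list of (hour_index, mean_value) pairs sorted by hour.
    """
    buckets = {}
    for elapsed_seconds, value in readings:
        hour = elapsed_seconds // SECONDS_PER_HOUR
        buckets.setdefault(hour, []).append(value)
    return [(hour, mean(values)) for hour, values in sorted(buckets.items())]

# Made-up time course: one reading per second for 5 days.
total_seconds = 5 * 24 * SECONDS_PER_HOUR
readings = [(t, float(t % 60)) for t in range(total_seconds)]

hourly = summarize_hourly(readings)
print(len(readings), "->", len(hourly))  # 432000 -> 120
```

Other summaries (median, minimum, maximum, first or last value) would follow the same bucketing pattern; which one is appropriate depends on the analysis.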

Multi-value data sets are typically very large and should be cleaned before export. If your data set is too large, Riffyn will suggest applying a data cleaning procedure.

Data sets can be further reduced through the use of Custom Join Rules during export, which can also be added to the default export method for a process.
