Cross-step formulas, fixed properties, & data joining (Data Flow)

In addition to documenting, tracking, and designing Material Flows and Workflows, Riffyn SDE allows you to design data flows. This is sometimes the most unfamiliar and anxiety-inducing aspect of process and experiment design for new users.

Why would data flow cause anxiety? Because the usual tools of science, like Excel, don't give you much opportunity to track data as a flow across your experiment. Researchers often just look at the data on one step at a time, and maybe copy and paste some data together from a few steps. Or, if they're brave, they might set up some VLOOKUPs in Excel to join data from disparate steps. But in general, the attention paid to data flow is very limited. So when Riffyn SDE gives you the opportunity to extend and formalize data flows, it can be a bit disorienting.

Let's use this article to give you the lay of the land.


What's the "data" we are flowing?

Everything in Riffyn SDE is technically "data," but when we talk about flowing data in an experiment, we mean transferring the property data of resources (the measurements and attributes of samples, materials, equipment) from step to step along a process.


How does data flow in Riffyn SDE?

There are three primary ways to flow data along or between experiments: (1) fixed properties, (2) cross-step formulas, and (3) data joining on export.


(1) Fixed properties are attributes of a resource (like serial number, make, model, capacity, color, etc.) that do not change (or at least change rarely). Moreover, those attributes are global, in that they are true whenever and wherever the resource gets used. Therefore, fixed properties ride along (i.e., "flow") with resources to whatever experiment or step they are assigned. That means data recorded into a fixed property in one step automatically flows into every other experiment or step that uses the resource.

The alternative to fixed properties is...drum-roll...non-fixed properties. They are for things like temperature, pressure, cell count, etc. that change from step to step or experiment to experiment. This type of property captures data locally to a particular step and experiment. So when data is saved to a non-fixed property of a resource on a step in an experiment, the data will not flow along with the resource to other places of use. When you use the resource elsewhere, it will initially have no data on it (until you collect more data specific to that step).
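
The difference between the two property types can be sketched in a few lines of Python. This is only an illustrative model, not Riffyn SDE's actual data model or API: the Resource class, its record and view methods, and the example property names are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical resource model (an assumption, not Riffyn SDE's schema)."""
    name: str
    # Fixed properties are global: one value, visible wherever the resource is used.
    fixed: dict = field(default_factory=dict)
    # Non-fixed properties are local: stored per (experiment, step).
    local: dict = field(default_factory=dict)

    def record(self, experiment, step, prop, value, is_fixed=False):
        """Save a value to a fixed (global) or non-fixed (local) property."""
        if is_fixed:
            self.fixed[prop] = value
        else:
            self.local.setdefault((experiment, step), {})[prop] = value

    def view(self, experiment, step):
        """Return the property data visible on a given experiment and step."""
        return {**self.fixed, **self.local.get((experiment, step), {})}


bioreactor = Resource("Bioreactor 7")
# Both values are recorded on the same step of the same experiment.
bioreactor.record("Exp A", "Ferment", "serial_number", "BR-0007", is_fixed=True)
bioreactor.record("Exp A", "Ferment", "temperature_C", 37.0)

view_a = bioreactor.view("Exp A", "Ferment")  # sees both properties
view_b = bioreactor.view("Exp B", "Harvest")  # sees only the fixed property
print(view_a)
print(view_b)
```

In this sketch the serial number shows up in both views, while the temperature stays behind on the step where it was recorded.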

Use of fixed properties is described here.


(2) Cross-step formulas are formulas that use data from other steps in an experiment as input arguments to the formula. For example, if you were calculating the concentration of succinate in a fermentation broth, you might grab the volume of the broth from the ferment step, and the sample concentration of succinate from the downstream HPLC step, and combine them in a single formula on the ferment step. This type of calculation only applies to runs that are linked together across steps via output-input resources.

Thus cross-step formulas are essentially flowing data across steps for related runs and resources. This is often a good way to transfer information between non-fixed properties on different steps. It's also very handy for naming samples based on acquired experiment data.
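
To make the idea concrete, here is a minimal Python sketch of what a cross-step calculation does conceptually. It does not use Riffyn SDE's formula syntax; the run records, field names, and the choice to multiply volume by concentration (giving total succinate in grams) are all assumptions for illustration only.

```python
# Hypothetical run data for two steps. Each ferment run produces a sample,
# and each HPLC run consumes one. That output-input link ties the runs together.
ferment_runs = {
    "F1": {"broth_volume_L": 2.0, "output_sample": "S1"},
    "F2": {"broth_volume_L": 1.5, "output_sample": "S2"},
    "F3": {"broth_volume_L": 1.8, "output_sample": "S3"},  # never assayed
}
hplc_runs = {
    "H1": {"input_sample": "S1", "succinate_g_per_L": 12.0},
    "H2": {"input_sample": "S2", "succinate_g_per_L": 10.0},
}


def total_succinate_g(ferment_runs, hplc_runs):
    """Combine ferment-step and HPLC-step data for linked runs only."""
    # Index the downstream runs by the sample they received as input.
    hplc_by_sample = {run["input_sample"]: run for run in hplc_runs.values()}
    results = {}
    for run_id, run in ferment_runs.items():
        linked = hplc_by_sample.get(run["output_sample"])
        if linked is None:
            # No linked downstream run, so the formula has nothing to compute.
            results[run_id] = None
        else:
            results[run_id] = run["broth_volume_L"] * linked["succinate_g_per_L"]
    return results


totals = total_succinate_g(ferment_runs, hplc_runs)
print(totals)
```

Note that run F3 gets no result: its sample was never passed to an HPLC run, so there is no output-input link for the calculation to follow.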

Use of cross-step formulas is described here.


(3) Joining data is a data analysis procedure that links up all the information from runs that are spread across multiple steps in an experiment. The result is a single data table where each row traces the entire genealogy of processing steps of a sample or reaction from the beginning to the end of the experiment. For example, the first few columns of a particular row might contain the material preparation data, like water volume and mass of salts added, while the last few columns of the row might contain the assay data, like spectral absorbance measurements of samples generated from the reaction or cell culture. Thus, by scanning across a row in that table, you can see the "flow" of data from end to end in an experiment.
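
The shape of such a joined table can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the export feature itself: the step names, run records, and the join_steps and prefixed helpers are hypothetical, and the sketch keeps only rows that can be traced all the way to the last step.

```python
# Hypothetical runs on three steps, linked by output -> input resources.
prep = [
    {"run": "P1", "output": "M1", "water_mL": 100, "salt_g": 5.0},
    {"run": "P2", "output": "M2", "water_mL": 100, "salt_g": 10.0},
    {"run": "P3", "output": "M3", "water_mL": 100, "salt_g": 15.0},  # unused
]
culture = [
    {"run": "C1", "input": "M1", "output": "S1", "temp_C": 30},
    {"run": "C2", "input": "M2", "output": "S2", "temp_C": 37},
]
assay = [
    {"run": "A1", "input": "S1", "absorbance": 0.42},
    {"run": "A2", "input": "S2", "absorbance": 0.61},
]


def prefixed(step, run):
    """Prefix column names with the step name; drop the link fields."""
    return {f"{step}.{k}": v for k, v in run.items() if k not in ("input", "output")}


def join_steps(steps):
    """Join runs across steps (ordered upstream to downstream) into wide rows."""
    first_name, first_runs = steps[0]
    # Each partial row remembers the resource it currently ends on.
    rows = [(run.get("output"), prefixed(first_name, run)) for run in first_runs]
    for name, runs in steps[1:]:
        by_input = {}
        for run in runs:
            by_input.setdefault(run["input"], []).append(run)
        next_rows = []
        for resource, row in rows:
            # Extend the row with every downstream run that consumed its resource.
            for run in by_input.get(resource, []):
                next_rows.append((run.get("output"), {**row, **prefixed(name, run)}))
        rows = next_rows
    return [row for _, row in rows]


table = join_steps([("prep", prep), ("culture", culture), ("assay", assay)])
for row in table:
    print(row)
```

Each printed row reads left to right from preparation data to assay data, which is the end-to-end "flow" described above.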

Data joining operations also allow you to connect subsets of data (like certain samples with certain time points of a reaction) in an exact or "fuzzy" (non-exact) way. In addition, data joining can be combined with grouping and summarization (cleaning) operations to slim down your data and prep it for more useful analysis. For example, you might compute the average pH of fermentation broth over a 24-hour interval before joining it with the succinate concentration measured in samples taken once every 24 hours.
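
The pH example can be sketched as a summarize-then-fuzzy-join in plain Python. Everything here is an assumption for illustration: the readings, the sample times, the one-hour matching tolerance, and the summarize_ph and fuzzy_join helpers are hypothetical and do not represent Riffyn SDE's export options.

```python
import math
from statistics import mean

# Hypothetical data: (hour, pH) readings and (hour, succinate g/L) samples.
ph_readings = [(6, 7.0), (12, 6.8), (18, 6.6), (24, 6.4),
               (30, 6.2), (36, 6.0), (42, 5.8), (48, 5.6)]
samples = [(24.5, 3.1), (47.8, 9.4), (80.0, 12.0)]


def summarize_ph(readings, interval_h=24):
    """Group readings into intervals and average them, keyed by interval end."""
    buckets = {}
    for hour, ph in readings:
        # Hours in (0, 24] fall in the interval ending at 24, and so on.
        interval_end = math.ceil(hour / interval_h) * interval_h
        buckets.setdefault(interval_end, []).append(ph)
    return {end: mean(values) for end, values in buckets.items()}


def fuzzy_join(samples, summary, tolerance_h=1.0):
    """Match each sample to the nearest interval end within a time tolerance."""
    rows = []
    for hour, concentration in samples:
        nearest = min(summary, key=lambda end: abs(end - hour))
        matched = abs(nearest - hour) <= tolerance_h
        rows.append({
            "sample_hour": hour,
            "succinate_g_per_L": concentration,
            # Leave the pH empty when nothing is close enough to match.
            "mean_pH": summary[nearest] if matched else None,
        })
    return rows


summary = summarize_ph(ph_readings)
joined = fuzzy_join(samples, summary)
for row in joined:
    print(row)
```

The samples were not taken at exactly hour 24 or 48, so an exact join would find no matches; the tolerance is what lets the first two samples pick up their interval's average pH, while the sample at hour 80 stays unmatched.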

Use of data joining operations on export is described here.


What does all this data flowing get me?

The purpose of these flow operations is to integrate your experiment data across processing steps, people, time, and geography, giving you an integrated view of cause and effect in your experiments. Such integrated data sets can be very large and sometimes complex, often too large for tools like Excel. That is why we recommend you get a more powerful third-party application designed for data analytics.




