
Thinking about improving data updates #647

Closed
@monfera

tl;dr
There's more and more code that couples plotting logic with the incremental propagation of changes; see, for example, everything going on in Plotly.restyle. It would be good to discuss ways to improve the situation. Handwritten propagation code leads to a tangle, so a small, simple library focused on change propagation, e.g. MobX, would be worth looking into.

Plotting turns a stream of user intent into a stream of side effects such as DOM updates

Plotting can be conceived of as a black box:

  • input streams are plot specifications, typically the payloads in Plotly.plot, Plotly.restyle, Plotly.relayout, animation inducing user calls as well as DOM events such as window.resize and mousedown
  • output is a stream of side effecting operations, e.g. DOM mutations, WebGL API calls, and sometimes event callbacks
  • currently, some output is provided by encouraging users to read directly from internal object state but it's something to move away from, by providing a query API and/or event callbacks with meaningful data, so I'll ignore this

The word 'stream' highlights that user pointer operations, restyle/relayout calls, animation etc. generally make plotting a temporal process, rather than something that can be modeled as a function from some input JSON to an output SVG, even if some uses are as simple as this special case.

Plotting logic is a directed acyclic graph of computation nodes

We have multiple pieces of input (e.g. data[0].x) on one end and DOM mutating calls on the other. In between, however, there's a complex calculation that can be thought of as a DAG. For example,

  • the above x vector serves as the basis for calculating a [min, max] domain that will determine the bounds of the X axis
  • the x vector is also trivially input to scatter point positions, however, a scale transform converts domain values to e.g. pixel coordinates
  • for things like the boxplot, there may be various aggregations building atop of the x vector
  • aesthetics might depend on things like how long the x vector is; maybe defaulting from scatterplot to a density plot at some threshold

All such calculations themselves can be input to downstream calculations.
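To make the DAG idea concrete, here is a minimal sketch in which each node is a plain function and the edges are the arguments passed between them. The names extent, linearScale and pointPositions are hypothetical and chosen for illustration; they are not plotly.js internals.

```javascript
// Hypothetical node functions; none of these names are plotly.js internals.

// Node 1: the [min, max] domain of the input vector.
function extent(values) {
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  return [min, max];
}

// Node 2: a linear scale from the domain to a pixel range.
function linearScale([d0, d1], [r0, r1]) {
  const span = d1 - d0 || 1; // avoid dividing by zero for a degenerate domain
  return (v) => r0 + ((v - d0) / span) * (r1 - r0);
}

// Node 3: pixel positions of the scatter points.
function pointPositions(values, scale) {
  return values.map(scale);
}

// Wiring the nodes: x feeds both the domain and the positions.
const x = [2, 4, 10];
const domain = extent(x);
const scale = linearScale(domain, [0, 400]);
const positions = pointPositions(x, scale);
```

Written this way, everything is recomputed on every call; the dependency structure is visible, but nothing is cached yet.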

Plotting needs to be economical

While it would be possible to make a single function whose inputs are {domRoot, userIntentHistory}, it's impractical: response times with a naive implementation would be too high (keeping userIntentHistory itself has only a modest impact on size). There's no way to recompute everything from scratch and expect a 60FPS frame rate when turning a WebGL plot or animating something.

This means that there needs to be some kind of caching, therefore state management. The sole purpose of maintaining state is caching (besides this, we may retain userIntentHistory to allow time travel, and of course the output streams are linked to calls that modify the DOM).

Means of reducing recomputation costs

Ideally we'd like to

  • Only recompute what's strictly needed. For example, if I add a new highest value to vector x, it needs to lead to an increased visible X axis domain, provided the axis is set to automatic. However, if the newly inserted value is inside the bounds, there's no need to recalculate anything that depends only on the [min, max] domain. Sure, sometimes full recalculation does no harm, because it is fast or because speed doesn't matter, but there are cases when it's useful to be fairly granular about recalculations due to some specific performance need. Solving these specific performance needs one by one, without a formal change propagation approach, is brittle.
  • It may even be useful, necessary and easy to pick calculation algorithms to be incremental. For example, a newly arriving X value can be directly used to update the [min, max] bounds, as opposed to inserting it in the preexisting large vector and applying the vector extent calculation. Similarly, many types of aggregates can be calculated on-line as well as batch. For example, mean, variance and standard deviation.
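Both points can be sketched with two small accumulators. createRunningExtent and createRunningStats are hypothetical helpers, not existing plotly.js functions; the second one uses Welford's online algorithm, which is one well-known way to keep a running mean and variance, and is an assumption here rather than anything the codebase does today.

```javascript
// Hypothetical helper: a running [min, max] that reports whether it changed,
// so callers can skip everything downstream of the domain when it did not.
function createRunningExtent() {
  let min = Infinity;
  let max = -Infinity;
  return {
    push(v) {
      const changed = v < min || v > max;
      if (v < min) min = v;
      if (v > max) max = v;
      return changed;
    },
    get() {
      return [min, max];
    },
  };
}

// Hypothetical helper: running mean / variance via Welford's online algorithm.
function createRunningStats() {
  let n = 0;
  let mean = 0;
  let m2 = 0; // running sum of squared deviations from the mean
  return {
    push(v) {
      n += 1;
      const delta = v - mean;
      mean += delta / n;
      m2 += delta * (v - mean);
    },
    mean() {
      return n > 0 ? mean : NaN;
    },
    variance() {
      return n > 1 ? m2 / (n - 1) : NaN; // sample variance
    },
    stdDev() {
      return Math.sqrt(this.variance());
    },
  };
}

const extentX = createRunningExtent();
// 5 lies inside [2, 10], so the last push reports no change.
const changes = [2, 10, 5].map((v) => extentX.push(v));

const stats = createRunningStats();
[1, 2, 3, 4].forEach((v) => stats.push(v));
```

Neither accumulator revisits earlier values, so each new point costs constant time regardless of how long the x vector already is.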

Some possible tools

Handwritten userland JavaScript isn't well suited to managing a dependency graph: given enough nodes and optimization rounds, there will inevitably be cache invalidation issues, and potentially memory leaks. Keeping things consistent and in sync is also a challenge, especially in the presence of asynchronous events. Most importantly, coupling the plot logic aspect with the incremental recalculation aspect makes both aspects hard to decipher, debug and further develop.

There are a lot of tools, inspired by Functional Reactive Programming, that provide some kind of framework for calculating and propagating values that change over time in response to input. Without endorsing any of these excellent libraries (xstream, most.js etc.), perhaps MobX would feel closest to the current architecture: it gives you objects whose properties act like calculated spreadsheet cells, and it is object-oriented, as @etpinard suggested in the 2.0 wishlist. Investigation would be needed to see how it fits, though. All these libs are around 10k compressed.
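To illustrate what "calculated spreadsheet cells" could mean for plotting, here is a hand-rolled sketch with explicit dependencies, cached values and invalidation. It is deliberately not the MobX API: cell and computed are hypothetical names, and a real library would track dependencies automatically and handle many more cases.

```javascript
// Hypothetical sketch of spreadsheet-like cells; this is NOT the MobX API.

// An input cell: holds a value and invalidates its dependents when it changes.
function cell(value) {
  const dependents = new Set();
  return {
    dependents,
    get() {
      return value;
    },
    set(next) {
      if (next === value) return; // nothing changed, nothing to propagate
      value = next;
      dependents.forEach((d) => d.invalidate());
    },
  };
}

// A computed cell: caches its result and recomputes lazily when marked dirty.
function computed(inputs, fn) {
  const dependents = new Set();
  let cached;
  let dirty = true;
  const node = {
    dependents,
    invalidate() {
      if (dirty) return; // already marked, and so are the dependents
      dirty = true;
      dependents.forEach((d) => d.invalidate());
    },
    get() {
      if (dirty) {
        cached = fn(...inputs.map((i) => i.get()));
        dirty = false;
      }
      return cached;
    },
  };
  inputs.forEach((i) => i.dependents.add(node));
  return node;
}

let domainRuns = 0; // counts how often the domain is actually recomputed
const x = cell([2, 4, 10]);
const width = cell(400);
const domain = computed([x], (xs) => {
  domainRuns += 1;
  return [Math.min(...xs), Math.max(...xs)];
});
const positions = computed([x, domain, width], (xs, [d0, d1], w) =>
  xs.map((v) => ((v - d0) / (d1 - d0)) * w)
);

positions.get(); // computes the domain once, then the positions
width.set(800); // invalidates the positions only
const resized = positions.get(); // the domain is served from its cache
```

The plotting logic lives in the two functions passed to computed, while the caching and invalidation live in cell and computed, which is the separation of aspects argued for above.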

History

We've touched on related topics in the past; a few inspirations:
