Thinking about improving data updates #647

**tl;dr**
A growing amount of code couples plotting logic with incremental change propagation; for example, see everything going on in `Plotly.restyle`. It would be good to discuss ways to improve the situation. Hand-written propagation code leads to a tangle, and a small, simple library focused on change propagation, e.g. [MobX](https://docs.google.com/presentation/d/1tP0VWjproXsZex3R8zWCA7Bra7RtMcu4daJXCUbs8pc/edit#slide=id.g1102c17e6b_0_31), would be worth looking into.

**Plotting turns a stream of user intent into a stream of side effects such as DOM updates**

Plotting can be conceived of as a black box:
- input streams are plot specifications, typically the payloads of `Plotly.plot`, `Plotly.restyle`, `Plotly.relayout` and animation-inducing user calls, as well as DOM events such as `window.resize` and `mousedown`
- output is a stream of side-effecting operations, e.g. DOM mutations, WebGL API calls, and sometimes event callbacks
- currently, some output is exposed by encouraging users to read internal object state directly; this is something to move away from by providing a query API and/or event callbacks with meaningful data, so I'll ignore it here

The word 'stream' highlights that user pointer operations, restyle/relayout, animation etc. _generally_ make plotting a temporal process, rather than something that can be modeled as a function from some input JSON to an output SVG, even if some uses are as simple as this special case.
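To make the 'stream' framing concrete, here is a minimal sketch. The intent shapes (`{ type: "restyle", update }` etc.), the effect descriptors and the `step`/`run` helpers are hypothetical and are not plotly.js APIs. Each incoming intent is folded into the current state and yields a list of side effects, rather than the whole plot being a single JSON-to-SVG function.

```javascript
// Hypothetical sketch: plotting as a fold over a stream of user intents.
// Intent and effect shapes are illustrative assumptions, not plotly.js APIs.

// One step: given the current state and a new intent, return the next
// state plus a list of side-effect descriptors (e.g. DOM mutations).
function step(state, intent) {
  switch (intent.type) {
    case "restyle":
      return {
        state: { ...state, style: { ...state.style, ...intent.update } },
        effects: [{ kind: "redrawTraces", keys: Object.keys(intent.update) }],
      };
    case "relayout":
      return {
        state: { ...state, layout: { ...state.layout, ...intent.update } },
        effects: [{ kind: "redrawAxes", keys: Object.keys(intent.update) }],
      };
    case "resize":
      return {
        state: { ...state, size: intent.size },
        effects: [{ kind: "resizeSvg", size: intent.size }],
      };
    default:
      // Unknown intents change nothing and produce no effects.
      return { state, effects: [] };
  }
}

// Run a finite stream of intents. In a real plot the stream is unbounded
// and effects would be applied as they arrive instead of being collected.
function run(initialState, intents) {
  let state = initialState;
  const effectLog = [];
  for (const intent of intents) {
    const result = step(state, intent);
    state = result.state;
    effectLog.push(...result.effects);
  }
  return { state, effectLog };
}

const { state, effectLog } = run(
  { style: {}, layout: {}, size: null },
  [
    { type: "restyle", update: { "marker.color": "red" } },
    { type: "resize", size: { width: 400, height: 300 } },
  ]
);
console.log(effectLog.map((e) => e.kind)); // [ 'redrawTraces', 'resizeSvg' ]
```

Because the state at any point is determined by the intents seen so far, keeping the intent history is enough to replay the plot. This is also why recomputing everything from that history on every event is the naive baseline.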

**Plotting logic is a directed acyclic graph of computation nodes**

We have multiple pieces of input (e.g. `data[0].x`) on the input side and DOM-mutating calls as the output. In between, there's complex calculation that can be thought of as a DAG. For example,
- the above `x` vector serves as the basis for calculating a `[min, max]` domain that will determine the bounds of the X axis
- the `x` vector is also, trivially, an input to scatter point positions; however, a `scale` transform converts domain values to e.g. pixel coordinates
- for things like the boxplot, there may be various aggregations built on top of the `x` vector
- aesthetics might depend on things like the length of the `x` vector, e.g. switching the default from a scatterplot to a density plot above some threshold

All such calculations can themselves be inputs to downstream calculations.
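The `x` → extent → scale → pixel positions chain from the list above can be written as an explicit DAG. The following is a minimal sketch: the node names, the hard-coded inputs, the linear scale and the `evaluate` helper are illustrative assumptions, not how plotly.js is organized. Each node declares its dependencies, and evaluation pulls dependencies first, computing each node once per pass.

```javascript
// Hypothetical sketch: the x -> extent -> scale -> pixel positions chain
// expressed as an explicit DAG of computation nodes.
// Node names and the linear scale are illustrative assumptions.

const graph = {
  // Source nodes hold user input and have no dependencies.
  x: { deps: [], fn: () => [3, 1, 4, 1, 5] },
  width: { deps: [], fn: () => 500 },
  // [min, max] domain that bounds the X axis.
  extent: { deps: ["x"], fn: (x) => [Math.min(...x), Math.max(...x)] },
  // Linear scale mapping domain values to pixel coordinates (assumes hi > lo).
  scale: {
    deps: ["extent", "width"],
    fn: ([lo, hi], width) => (v) => ((v - lo) / (hi - lo)) * width,
  },
  // Scatter point positions depend on both x and the scale.
  pixelX: { deps: ["x", "scale"], fn: (x, scale) => x.map(scale) },
};

// Evaluate a node by first evaluating its dependencies (depth-first),
// computing each node at most once per pass. Assumes the graph is acyclic.
function evaluate(graph, name, results = new Map()) {
  if (results.has(name)) return results.get(name);
  const node = graph[name];
  const args = node.deps.map((dep) => evaluate(graph, dep, results));
  const value = node.fn(...args);
  results.set(name, value);
  return value;
}

console.log(evaluate(graph, "pixelX")); // [ 250, 0, 375, 0, 500 ]
```

Making the edges explicit is what later allows answering "what must be recomputed when `x` changes?" mechanically, instead of encoding that knowledge by hand at each call site.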

**Plotting needs to be economical**

While it would be possible to make a single function whose inputs are `{domRoot, userIntentHistory}`, it's impractical: response times with a naive implementation would be too high (keeping `userIntentHistory` around has only a modest size impact). There's no way to recompute everything from scratch and still expect a 60FPS frame rate when rotating a WebGL plot or animating something.

This means there needs to be some kind of caching, and therefore state management. Caching is essentially the only reason to maintain state (apart from optionally retaining `userIntentHistory` to allow time travel, and, of course, the DOM itself, which the output streams modify).
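As a minimal illustration of "state exists only for caching", here is a sketch of a derived value that is recomputed only when one of its inputs changes. It assumes inputs are treated as immutable, so a new array reference means "changed". `memoizeLast` and `computeCount` are hypothetical helpers, not plotly.js functions.

```javascript
// Hypothetical sketch: cache a derived value and recompute it only when one
// of its inputs changes, comparing inputs by reference (===). This assumes
// inputs are immutable: an update produces a new array or object.

function memoizeLast(fn) {
  let lastArgs = null; // inputs of the most recent computation
  let lastValue; // cached result for lastArgs
  let computeCount = 0; // how many times fn actually ran
  const cached = (...args) => {
    const same =
      lastArgs !== null &&
      lastArgs.length === args.length &&
      args.every((arg, i) => arg === lastArgs[i]);
    if (!same) {
      lastValue = fn(...args);
      lastArgs = args;
      computeCount += 1;
    }
    return lastValue;
  };
  cached.computeCount = () => computeCount;
  return cached;
}

const extentOf = memoizeLast((x) => [Math.min(...x), Math.max(...x)]);

let x = [3, 1, 4];
extentOf(x); // computes
extentOf(x); // same array reference: served from the cache
x = [...x, 9]; // immutable update produces a new reference
extentOf(x); // recomputes
console.log(extentOf.computeCount()); // 2
```

The weakness is also visible here: if `x` were mutated in place with `x.push(9)`, the reference would not change and the cache would silently return a stale extent. Hand-maintained caches invite exactly this kind of invalidation bug.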

**Means of reducing recomputation costs**

Ideally we'd like to 
- Only recompute what's strictly needed. For example, if I add a new highest value to vector `x`, the visible X axis domain needs to grow, provided the axis range is set to automatic. However, if the newly inserted value falls inside the current bounds, there's no need to recalculate anything that depends only on the `[min, max]` domain. Sure, sometimes coarse recalculation does no harm, because it's fast enough or speed doesn't matter, but there _are_ cases where it's useful to be fairly granular about recalculation to meet a specific performance need. Solving these needs one by one, without a formal change propagation approach, is brittle.
- Sometimes it's useful, necessary and even easy to choose incremental calculation algorithms. For example, a newly arriving X value can be used directly to update the `[min, max]` bounds, as opposed to inserting it into the preexisting large vector and rerunning the vector `extent` calculation. Similarly, many aggregates, e.g. mean, variance and standard deviation, can be calculated _on-line_ as well as in _batch_.
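Both points can be sketched together. The following is a minimal example, assuming a hypothetical `OnlineStats` class and a stand-in `recomputeAxis` step. Each new value updates `[min, max]` in constant time. Mean and population variance use Welford's on-line algorithm, a standard technique chosen here for illustration. The extent-dependent step runs only when a value actually moves the bounds.

```javascript
// Hypothetical sketch: on-line statistics for a growing x vector.
// - min/max are updated in O(1) per new value instead of rescanning x;
// - mean and population variance use Welford's on-line algorithm;
// - a downstream step that depends only on [min, max] runs only when the
//   extent actually changes (values inside the bounds skip it).

class OnlineStats {
  constructor() {
    this.n = 0;
    this.min = Infinity;
    this.max = -Infinity;
    this.mean = 0;
    this.m2 = 0; // running sum of squared deviations from the mean
  }

  // Adds one value; returns true if the [min, max] extent changed.
  push(v) {
    const changed = v < this.min || v > this.max;
    if (v < this.min) this.min = v;
    if (v > this.max) this.max = v;
    this.n += 1;
    const delta = v - this.mean;
    this.mean += delta / this.n;
    this.m2 += delta * (v - this.mean);
    return changed;
  }

  // Population variance (divides by n); 0 while empty.
  get variance() {
    return this.n > 0 ? this.m2 / this.n : 0;
  }

  get stdDev() {
    return Math.sqrt(this.variance);
  }
}

// Stand-in for an expensive recomputation that depends only on the extent,
// e.g. ticks for an automatically ranged axis (hypothetical).
let axisRecomputes = 0;
function recomputeAxis(stats) {
  axisRecomputes += 1;
  return [stats.min, stats.max];
}

const stats = new OnlineStats();
for (const v of [5, 2, 9, 4, 4, 4, 5, 7]) {
  if (stats.push(v)) recomputeAxis(stats);
}
console.log(axisRecomputes); // 3 (only 5, 2 and 9 changed the extent)
console.log(stats.mean.toFixed(6), stats.variance.toFixed(6)); // 5.000000 4.000000
```

Eight values arrive, but the axis step runs three times. The aggregates never rescan the vector.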

**Some possible tools**

Handwritten userland JavaScript isn't well suited to managing a dependency graph: given enough nodes and optimization rounds, cache invalidation issues and, potentially, memory leaks become inevitable. Keeping things consistent and in sync is also a challenge, especially in the presence of asynchronous events. Most importantly, coupling the plot logic aspect with the incremental recalculation aspect makes both hard to decipher, debug and further develop.

There are a lot of tools, inspired by Functional Reactive Programming, that provide some kind of framework for calculating and propagating values that change over time in response to input. Without endorsing any of these excellent libraries ([xstream](https://github.com/staltz/xstream), [most.js](https://github.com/cujojs/most) etc.), perhaps [MobX](https://github.com/mobxjs/mobx) would feel closest to the current architecture: it gives you objects whose properties act like calculated spreadsheet cells, and it is object-oriented, as @etpinard suggested in the 2.0 wishlist. Investigation would be needed to see how well it fits. All these libs are around 10k compressed.
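The 'spreadsheet cell' idea can be illustrated in a few dozen lines. This is a rough sketch in the spirit of MobX, not its API or internals: `observable`, `computed`, `get`/`set` and `runs` are hypothetical names chosen here. Computed cells record which cells they read while evaluating, cache their result, and are marked stale only when something they read changes. Evaluation is lazy, on the next `get()`.

```javascript
// Hypothetical sketch of "spreadsheet cell" semantics in the spirit of MobX.
// This is NOT the MobX API. Observable cells hold input values; computed
// cells derive from other cells, track dependencies automatically, and
// cache their value until a dependency changes (re-evaluated lazily).

let activeComputed = null; // computed cell currently being evaluated, if any

function observable(initial) {
  let value = initial;
  const cell = {
    dependents: new Set(),
    get() {
      // Reading inside a computed registers a dependency edge.
      if (activeComputed) {
        cell.dependents.add(activeComputed);
        activeComputed.sources.add(cell);
      }
      return value;
    },
    set(next) {
      if (next === value) return; // unchanged: nothing to invalidate
      value = next;
      for (const d of [...cell.dependents]) d.invalidate();
    },
  };
  return cell;
}

function computed(fn) {
  const cell = {
    dependents: new Set(),
    sources: new Set(),
    dirty: true,
    value: undefined,
    runs: 0, // how many times fn has been evaluated
    invalidate() {
      if (cell.dirty) return; // already stale; dependents already notified
      cell.dirty = true;
      for (const d of [...cell.dependents]) d.invalidate();
    },
    get() {
      if (activeComputed) {
        cell.dependents.add(activeComputed);
        activeComputed.sources.add(cell);
      }
      if (cell.dirty) {
        // Drop old subscriptions; dependencies are re-tracked on each run.
        for (const s of cell.sources) s.dependents.delete(cell);
        cell.sources.clear();
        const outer = activeComputed;
        activeComputed = cell;
        try {
          cell.value = fn();
        } finally {
          activeComputed = outer;
        }
        cell.dirty = false;
        cell.runs += 1;
      }
      return cell.value;
    },
  };
  return cell;
}

const xs = observable([2, 1, 6]);
const width = observable(500);
const extent = computed(() => {
  const x = xs.get();
  return [Math.min(...x), Math.max(...x)];
});
const axisLabel = computed(() => {
  const [lo, hi] = extent.get();
  return `x: ${lo}..${hi}`;
});
const pxPerUnit = computed(() => {
  const [lo, hi] = extent.get();
  return width.get() / (hi - lo);
});

console.log(axisLabel.get(), pxPerUnit.get()); // x: 1..6 100
width.set(600); // stales pxPerUnit only
console.log(pxPerUnit.get(), extent.runs, axisLabel.runs); // 120 1 1
xs.set([2, 1, 6, 11]); // new array: stales extent and both dependents
console.log(axisLabel.get(), pxPerUnit.get(), extent.runs); // x: 1..11 60 2
```

The dependency bookkeeping lives in one small place instead of being interleaved with plotting code. Libraries like MobX additionally handle concerns this sketch omits, such as batching several updates into one transaction and not notifying observers when a recomputed value turns out unchanged.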

**History**

We've touched on related topics in the past; a few inspirations:
- Wishlist suggestions such as using more OO; not storing data in the DOM; using pure functions; more complete data in callbacks: https://github.com/plotly/plotly.js/issues/420
- Customer-filed PRs for faster (re)calculation and much faster incremental calculation, e.g. on large WebGL meshes
- Most of the `Plotly.restyle` function, whose 500 lines do a great deal of manual work, e.g. https://github.com/plotly/plotly.js/blob/master/src/plot_api/plot_api.js#L1736-L1814 and https://github.com/plotly/plotly.js/pull/617#discussion_r67130816
- Previous discussions e.g. on the animation topic