Description
- merge Node/MapNode/BaseInterface into a single construct
- a workflow can be a node in the graph
- a MapNode can support a workflow
- file inputs automatically encode full paths
- expansion happens during execution, rather than before execution as it does currently
- this will allow better integration of MapNode-like expansion
- also allow for much greater scalability
- much better semantic control
- allow a sequence of nodes or a subgraph to run completely in memory without writing anything to disk
- allow chunking workflows depending on the execution environment
- better handling of reruns
- assuming files on disk are immutable, we can greatly speed up re-execution by eliminating parts of the graph during hash checks
- this will require an input hash and an output hash; in fact, I have been thinking of breaking the input hash into hash([connection hash, parameter hash]) (see the hashing sketch after this list)
- make the workflow engine depend only on scipy and networkx
- or, even better, make it pure Python if we can figure out a simple set of sparse matrix operations (see the scheduling sketch after this list)
- allow URLs as inputs, with checking/caching of URLs (see the URL caching sketch after this list)
- Interfaces
- all type checking happens within interfaces
- incorporate some CWL ideas into the interface specification (we already cover most of them)
- read/write CWL from interfaces/workflows (see the CWL export sketch after this list)
- support local and remote services
- file i/o from/to a remote store (file system, ssh, s3, http(s), dropbox, gdrive, etc.)
- interfaces as containers (docker, vagga, etc.) and remote services (see the container execution sketch after this list)
- version matching for reproducibility
- flexible and extensible type checking, including file types, file content, etc. (see the type-checking sketch after this list)
- simpler workflow specification
- simpler non-Pythonic support (CWL, DSLs, etc.)
- possible FASTR-like magic: wf.connect(a, 'out', b, 'in') --> b.in = a.out (see the connection-magic sketch below)
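
A minimal sketch of the split input hash mentioned above, purely illustrative and assuming nothing about the eventual API: the parameter hash covers literal input values, the connection hash covers the output hashes of upstream nodes, and the two combine into the input hash used for rerun checks. All names (`parameter_hash`, `connection_hash`, `input_hash`) are hypothetical.

```python
import hashlib
import json


def _sha256(obj):
    """Deterministically hash any JSON-serializable object."""
    return hashlib.sha256(
        json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


def parameter_hash(params):
    """Hash the literal (non-connected) input values of a node."""
    return _sha256(params)


def connection_hash(upstream_output_hashes):
    """Hash the output hashes of the upstream nodes feeding this node."""
    return _sha256(sorted(upstream_output_hashes))


def input_hash(params, upstream_output_hashes):
    """hash([connection hash, parameter hash]), as proposed above."""
    return _sha256([connection_hash(upstream_output_hashes),
                    parameter_hash(params)])


# With immutable outputs on disk, a node whose stored input hash and
# output hash both still match can be pruned from the graph before execution.
```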
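
A sketch of scheduling the graph in pure Python, assuming a workflow can be reduced to a mapping from each node to its set of predecessors; the in-degree bookkeeping below stands in for the sparse matrix operations mentioned above, and `execution_order` is a hypothetical name.

```python
def execution_order(dependencies):
    """Kahn-style topological sort over a dict {node: set_of_predecessors}.

    Returns a list of 'waves'; all nodes within a wave can run concurrently.
    """
    remaining = {node: set(preds) for node, preds in dependencies.items()}
    for preds in dependencies.values():          # ensure every node is a key
        for pred in preds:
            remaining.setdefault(pred, set())

    waves = []
    while remaining:
        ready = {node for node, preds in remaining.items() if not preds}
        if not ready:
            raise ValueError("cycle detected in workflow graph")
        waves.append(sorted(ready))
        for node in ready:
            del remaining[node]
        for preds in remaining.values():
            preds -= ready
    return waves


# Example: realign -> smooth -> (model, qc)
print(execution_order({"smooth": {"realign"},
                       "model": {"smooth"},
                       "qc": {"smooth"}}))
# [['realign'], ['smooth'], ['model', 'qc']]
```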
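
A sketch of URL inputs with checking/caching, assuming a simple content-addressed local cache; `fetch_url` and the cache location are hypothetical and not part of any existing API.

```python
import hashlib
import os
import urllib.request


def fetch_url(url, cache_dir="~/.cache/wf_inputs"):
    """Return a local path for `url`, downloading it only on a cache miss.

    The cache key is the hash of the URL itself; a stronger scheme could
    also verify a checksum of the downloaded content.
    """
    cache_dir = os.path.expanduser(cache_dir)
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    local_path = os.path.join(cache_dir, key)
    if not os.path.exists(local_path):
        with urllib.request.urlopen(url) as response, \
                open(local_path, "wb") as handle:
            handle.write(response.read())
    return local_path


# Hypothetical usage inside a node: any input that looks like a URL is
# resolved to a cached local file before the interface runs.
# local_file = fetch_url("https://example.org/sub-01_T1w.nii.gz")
```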
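
A sketch of writing CWL from an interface, assuming the interface can expose its command, inputs, and outputs as plain data; the toy spec fields and `interface_to_cwl` are illustrative only (CWL documents can be serialized as JSON as well as YAML).

```python
import json


def interface_to_cwl(spec):
    """Render a minimal CommandLineTool document from a toy interface spec."""
    inputs = {}
    for name, info in spec["inputs"].items():
        binding = ({"prefix": info["flag"]} if info.get("flag")
                   else {"position": 1})
        inputs[name] = {"type": info["type"], "inputBinding": binding}
    outputs = {
        name: {"type": info["type"], "outputBinding": {"glob": info["glob"]}}
        for name, info in spec["outputs"].items()
    }
    return json.dumps({
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "baseCommand": spec["command"],
        "inputs": inputs,
        "outputs": outputs,
    }, indent=2)


# Illustrative interface spec for a brain-extraction command.
print(interface_to_cwl({
    "command": "bet",
    "inputs": {"in_file": {"type": "File"}},
    "outputs": {"out_file": {"type": "File", "glob": "*_brain.nii.gz"}},
}))
```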
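
A sketch of running an interface inside a container, assuming Docker is available on the host; the helper name, image, and mount layout are hypothetical, and a real implementation would also cover other runtimes such as vagga.

```python
import subprocess


def run_in_container(cmdline, image, workdir):
    """Run a command inside a Docker container, mounting `workdir` read-write.

    A fuller implementation would also handle user IDs, environment
    variables, and version pinning of the image for reproducibility.
    """
    docker_cmd = [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/work",   # expose the node's working directory
        "-w", "/work",              # run the command from inside it
        image,
    ] + list(cmdline)
    return subprocess.run(docker_cmd, check=True)


# Hypothetical usage: run a containerized brain-extraction command.
# run_in_container(["bet", "in.nii.gz", "out_brain.nii.gz"],
#                  image="some/fsl-image:6.0", workdir="/tmp/node-0001")
```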
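
A sketch of flexible, extensible type checking at the interface level, with a declarative file input that can check existence, extension, and (optionally) content; all class and function names here are illustrative.

```python
import os


class FileInput:
    """Declarative file input with extensible validation hooks."""

    def __init__(self, extensions=None, must_exist=True, content_check=None):
        self.extensions = extensions or []
        self.must_exist = must_exist
        self.content_check = content_check  # callable(path) -> bool

    def validate(self, path):
        if self.must_exist and not os.path.exists(path):
            raise ValueError(f"{path} does not exist")
        if self.extensions and not any(path.endswith(ext)
                                       for ext in self.extensions):
            raise ValueError(f"{path} must have one of {self.extensions}")
        if self.content_check and not self.content_check(path):
            raise ValueError(f"{path} failed content validation")
        return path


def looks_like_gzipped(path):
    """Cheap content check: gzip magic bytes, e.g. for .nii.gz files."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


in_file = FileInput(extensions=[".nii.gz"], content_check=looks_like_gzipped)
# in_file.validate("sub-01_T1w.nii.gz")
```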
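
A sketch of the FASTR-like connection magic, with the caveat that `in` is a reserved word in Python, so the example uses named ports on small input/output proxy objects; assigning an output reference to an input attribute records the same edge that wf.connect would. Everything here is hypothetical.

```python
class _OutputRef:
    """Lightweight reference to a named output of a node."""

    def __init__(self, node, port):
        self.node, self.port = node, port


class _Inputs:
    """Assigning an _OutputRef to an attribute records a workflow edge."""

    def __init__(self, node):
        object.__setattr__(self, "_node", node)

    def __setattr__(self, port, value):
        if isinstance(value, _OutputRef):
            self._node.workflow.connect(value.node, value.port,
                                        self._node, port)
        else:
            object.__setattr__(self, port, value)


class _Outputs:
    """Attribute access produces output references instead of values."""

    def __init__(self, node):
        self._node = node

    def __getattr__(self, port):
        return _OutputRef(self._node, port)


class Node:
    def __init__(self, name, workflow):
        self.name, self.workflow = name, workflow
        self.inputs, self.outputs = _Inputs(self), _Outputs(self)


class Workflow:
    def __init__(self):
        self.edges = []

    def connect(self, src, src_port, dst, dst_port):
        self.edges.append((src.name, src_port, dst.name, dst_port))


wf = Workflow()
a, b = Node("a", wf), Node("b", wf)
b.inputs.in_file = a.outputs.out   # same as wf.connect(a, 'out', b, 'in_file')
print(wf.edges)                    # [('a', 'out', 'b', 'in_file')]
```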