Bad memory management - sequences stored in dataset object #72

Closed
@jacoblambert

Description

Running simple code such as

from pandaset import DataSet

dataset = DataSet('...')
sequences = dataset.sequences()
for seq_num, sequence in enumerate(sequences):
    print("Sequence {}, {} of {}".format(sequence, seq_num, len(sequences)))
    seq = dataset[sequence]
    seq.load()  # load this sequence's data into memory
    del seq     # drop the local name after use

quickly leads to a SIGKILL because the process runs out of memory. Why? Because loaded sequences are also stored in the DataSet object, so after deleting seq you can still access the loaded data through dataset[sequence] without calling .load() again.
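To illustrate the mechanism, here is a minimal sketch with hypothetical stand-in classes (FakeSequence and FakeDataSet are made up for this example and are not pandaset's actual code). It assumes the container keeps its sequence objects in a dict, and shows that `del` on the local name leaves the loaded object alive.

```python
import gc
import weakref


class FakeSequence:
    """Hypothetical stand-in for a sequence; not pandaset's class."""

    def __init__(self):
        self.data = None

    def load(self):
        # Pretend this is a large payload read from disk.
        self.data = bytearray(1024)
        return self


class FakeDataSet:
    """Hypothetical stand-in that keeps its sequences in a dict."""

    def __init__(self, names):
        self._sequences = {name: FakeSequence() for name in names}

    def __getitem__(self, name):
        return self._sequences[name]


dataset = FakeDataSet(["001", "002"])
seq = dataset["001"]
seq.load()
ref = weakref.ref(seq)  # lets us observe whether the object gets freed

del seq       # removes only the local name, not the object
gc.collect()

still_alive = ref() is not None
still_loaded = dataset["001"].data is not None
print(still_alive, still_loaded)
```

Both flags stay true because the dict inside the container still references the loaded object, so the garbage collector cannot reclaim it.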

So is there any practical way of iterating through the data? The DataSet class does not support item deletion, and the sequence class does not support copying. The only way I've found is to delete the dataset object on every iteration, which slows things down unnecessarily.

It seems an .unload() method would be simple enough to add. Thank you.
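A rough sketch of what such a method could look like, again using hypothetical stand-in classes rather than pandaset's real ones. The names `unload`, `lidar`, and `_sequences` are assumptions made for illustration; the idea is only that dropping the loaded data from the cached object keeps at most one sequence in memory at a time.

```python
class FakeSequence:
    """Hypothetical stand-in for a sequence; not pandaset's class."""

    def __init__(self):
        self.lidar = None

    def load(self):
        # Pretend this reads a large amount of sensor data.
        self.lidar = [0.0] * 1000
        return self

    def unload(self):
        # Drop the reference so the data can be garbage collected.
        self.lidar = None


class FakeDataSet:
    """Hypothetical stand-in that caches its sequences in a dict."""

    def __init__(self, names):
        self._sequences = {name: FakeSequence() for name in names}

    def __getitem__(self, name):
        return self._sequences[name]

    def sequences(self):
        return list(self._sequences)

    def unload(self, name):
        # Assumed API: release the loaded data of one sequence.
        self._sequences[name].unload()


dataset = FakeDataSet(["001", "002", "003"])
loaded_counts = []
for name in dataset.sequences():
    seq = dataset[name]
    seq.load()
    # ... process seq here ...
    dataset.unload(name)
    # Count how many sequences still hold data after unloading.
    loaded_counts.append(
        sum(dataset[n].lidar is not None for n in dataset.sequences())
    )

print(loaded_counts)
```

With this pattern the dataset object is created once and reused, which avoids the cost of rebuilding it on every iteration.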
