
HDF5 issue with matlab: Large .mat files need to be saved with -v 7.3 #2162

Closed
@gladomat

Description


Summary

I'm running a Bayesian analysis with spatial priors using Per Síden's code, and the resulting SPM.mat files are massive for ultra-high-field data (upwards of 2 GB). I had to change the SPM code to make the saves possible, for example in spm_contrasts.m line 313: save('SPM.mat', 'SPM', '-v7.3').

In nipype, however, I get an error:

File "/home/raid2/mihai/.local/lib/python2.7/site-packages/scipy/io/matlab/mio.py", line 65, in mat_reader_factory
    raise NotImplementedError('Please use HDF reader for matlab v7.3 files')
NotImplementedError: Please use HDF reader for matlab v7.3 files

Actual behavior

Nipype crashes because it can't open or save -v7.3 .mat files.
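
For reference, here is a stripped-down reproduction of the read failure outside of nipype (the path is hypothetical; the file is an SPM.mat saved with -v7.3):

import scipy.io

try:
    scipy.io.loadmat('/path/to/SPM.mat')  # hypothetical path to a -v7.3 SPM.mat
except NotImplementedError as err:
    # scipy's MAT reader only supports the pre-7.3 formats; -v7.3 files are HDF5
    print(err)  # "Please use HDF reader for matlab v7.3 files"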

Expected behavior

Nipype should be able to save and open -v7.3 .mat files natively.
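
Since -v7.3 .mat files are just HDF5 containers, something along these lines could serve as a fallback. This is only a sketch of the idea, not nipype's actual API, and the path is hypothetical:

import h5py

with h5py.File('/path/to/SPM.mat', 'r') as f:  # hypothetical path
    print(list(f.keys()))  # top-level MATLAB variables, e.g. ['SPM']
    # each field comes back as an HDF5 group/dataset and would still need to be
    # mapped onto the structures nipype expects from scipy.io.loadmat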

How to replicate the behavior

I changed the line mentioned above in spm_contrasts.m to save -v7.3 .mat files and ran a second-level analysis. At the calculate-contrasts step I encounter the error above.

Script/Workflow details

Sorry, it's kind of long. I've shortened the subject list to one subject.

from os.path import join as opj
import os
from nipype.interfaces.io import SelectFiles, DataSink, DataGrabber
from nipype.interfaces.spm import (OneSampleTTestDesign, EstimateModel,
                                   EstimateContrast, Threshold)
from nipype.interfaces.utility import IdentityInterface
from nipype.pipeline.engine import Workflow, Node
import time

# Start timing
start = time.time()

experiment_dir = '/data/pt_nmc002/TDMGB_FMRI_bbreg_native/' # location of experiment folder
output_dir = '2nd_level/spespk_frequentist_33-subs'     # name of 2nd-level output folder
input_dir_cons = 'normed_contrasts'  # name of norm output folder

subject_list = [
    "sub-01",
]

# list of contrast identifiers
contrasts = ['con_0001']
con_names = ['all sounds']

fwhmlist = [1, 2]

# One Sample T-Test Design - creates one sample T-Test Design
onesamplettestdes = Node(OneSampleTTestDesign(),
                         overwrite=True,
                         name="onesampttestdes")
# EstimateModel - estimate the parameters of the model
# Even for second level it should be 'Classical': 1.
level2estimate = Node(EstimateModel(estimation_method={'Classical': 1}),
                      name="level2estimate")
# EstimateContrast - estimates simple group contrast
level2conestimate = Node(EstimateContrast(group_contrast=True),
                         name="level2conestimate")
cont1 = ['Group', 'T', ['mean'], [1]]
level2conestimate.inputs.contrasts = [cont1]

## Create the 2nd level pipeline
l2analysis = Workflow(name='l2analysis', base_dir=experiment_dir+'/tmp')
l2analysis.config['execution']['crashdump_dir'] = experiment_dir + '/tmp/l2analysis/crash_files'

# Connect up the 2nd-level analysis components
l2analysis.connect([(onesamplettestdes, level2estimate, [('spm_mat_file',
                                                          'spm_mat_file')] ),
                    (level2estimate, level2conestimate, [('spm_mat_file',
                                                          'spm_mat_file'),
                                                         ('beta_images',
                                                          'beta_images'),
                                                         ('residual_image',
                                                          'residual_image')]),
                    ])

# collect all the con images for each contrast.
contrast_ids = list(range(1, len(contrasts) + 1))
l2source = Node(DataGrabber(infields=['fwhm', 'con']), name="l2source")
# we use .*i* to capture both .img (SPM8) and .nii (SPM12)
l2source.inputs.template = os.path.abspath(experiment_dir + '/*/' + \
                            input_dir_cons + '/*/_fwhm_%d/con_%04d*.*i*')
# iterate over all contrast images
l2source.iterables = [('fwhm', fwhmlist),
                      ('con', contrast_ids)]
l2source.inputs.sort_filelist = True

# Datasink - creates output folder for important outputs
datasink = Node(DataSink(base_directory=experiment_dir,
                         container=output_dir),
                name="datasink")

# Use the following DataSink output substitutions
substitutions = [('_con_1_fwhm_1', 'all_sounds_fwhm_1'),
                 ('_con_1_fwhm_2', 'all_sounds_fwhm_2')]
datasink.inputs.substitutions = substitutions

# Connect the DataGrabber and DataSink to the workflow
l2analysis.connect([(l2source, onesamplettestdes, [('outfiles', 'in_files')]),
                    (level2conestimate, datasink, [('spm_mat_file',
                                                    '@spm_mat'),
                                                   ('spmT_images',
                                                    '@T'),
                                                   ('con_images',
                                                    '@con')]),
                    ])

l2analysis.write_graph(dotfilename='l2analysis_frequentist.dot', graph2use='orig', format='pdf')
l2analysis.run('MultiProc', plugin_args={'n_procs': 21})

# Time again and spit out difference.
stop = time.time()
if (stop-start) < 3600:
    print('Elapsed time: %.2f mins.' %((stop-start)/60))
else:
    print('Elapsed time: %.2f hours.' %((stop-start)/60/60))

Platform details:

Linux 4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:19:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

nipype version: 0.12.1

Execution environment

Native Linux with conda-installed Python 2.7
