Skip to content

fix issue 676 : implemented ExcelReport class #730

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 5, 2019

Conversation

alixdamman
Copy link
Collaborator

Not documented yet but ready to start the discussion.

@alixdamman alixdamman requested a review from gdementen December 7, 2018 08:55
Copy link
Contributor

@gdementen gdementen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • looks mostly good. Can we distribute the .crtx files?
  • do you use all those .crtx files? If not, remove the unused ones.

"Please provide one for at least 'width' or 'heigth'")
width = cls._get_width(width)
height = cls._get_height(height)
cls._default_size = (width, height)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You missunderstood my comment about this 😢. Let's talk about it tomorrow...

@alixdamman alixdamman added this to the 0.31 milestone Apr 26, 2019
@alixdamman alixdamman force-pushed the master branch 3 times, most recently from 9264074 to 01669f2 Compare May 10, 2019 10:03
@alixdamman
Copy link
Collaborator Author

alixdamman commented Jun 13, 2019

Using win32com instead of xwlings leads to problems when exporting NaN values:

    demo = load_example_data('demo')
    pop_be = demo.pop['Belgium']
    pop_be_nan = pop_be.astype('O')
    pop_be_nan[2013] = nan
    with open_excel('test_nan.xlsx', overwrite_file=True) as wb:
        wb['Sheet1'] = ''
        wb['Sheet1']['A1'] = pop_be_nan.dump()
        wb.save()

        xl_wb = wb.api
        data_sheet = xl_wb.Worksheets(1)
        data_range = data_sheet.Range
        data_cells = data_sheet.Cells
        data_range(data_cells(5, 1), data_cells(7, 4)).Value = pop_be_nan.dump()
        wb.save()

produces an Excel file with the following values:

gender\time | 2013 | 2014       | 2015
Male            |          | 5493792 | 5524068
Female         |          | 5687048 | 5713206

gender\time | 2013   | 2014       | 2015
Male            | 65535 | 5493792 | 5524068
Female         | 65535 | 5687048 | 5713206

FWIW, I think the magic done by xlwings is here : https://github.com/xlwings/xlwings/blob/master/xlwings/_xlwindows.py#L1040

I suggest to use slwings to dump the data but use win32com when it comes to generate graphical elements. What do you think?

@gdementen
Copy link
Contributor

gdementen commented Jun 14, 2019

Using win32com instead of xwlings leads to problems when exporting NaN values:

[snip]

FWIW, I think the magic done by xlwings is here : https://github.com/xlwings/xlwings/blob/master/xlwings/_xlwindows.py#L1040

I suggest to use slwings to dump the data but use win32com when it comes to generate graphical elements. What do you think?

We could workaround the issue easily (by doing essentially what xlwings does). Whether it is worth it or not depends on the speed difference and the time it takes to generate the reports. I mean, if it's smaller than a 10% difference, or the total time for our biggest reports is smaller than 1 second, it's probably not worth avoiding xlwings, but in my tests I think I remember the difference was important both relatively and in seconds. That said, there was multiple factors involved (early binding vs late binding, disabling screen updates, etc.) so I am unsure I timed and/or remember the right thing!

@alixdamman
Copy link
Collaborator Author

We could workaround the issue easily (by doing essentially what xlwings does).

Yes but you know how it goes. We start with a small workaround (we essentially copy/paste what xlwings does), then users complain because some special values were not taken into account. So, we adapt our workaround (again by copy/pasting some code from xlwings), etc, etc... All of this involving more code to maintain while xlwings does the job.

I think using

   app = wb.app.api
   app.ScreenUpdating = False

already improve quite the performance. Once the early binding implemented, I think it should be enough.

In any case, there is still the possibility to go back to win32com in the future if we note that the performance is really too bad. But for the moment, I would like to avoid more hacky code to maintain.

@gdementen
Copy link
Contributor

Oh well, I know that pretty well. Honestly, it's all because at some point JD compared the timings of the APL code and the Python code and I wanted to make sure we are faster 😉. Before you scrap the win32com code, could you please time it on a realistic example and then time again the same example once you have switched to xlwings (and report the result here)? Then we can have some facts to remember.

FWIW, the screenupdating thing is done by default in larray for a while now.

@alixdamman
Copy link
Collaborator Author

If APL is faster, I can guess it's essentially because special values are not handled.
JD is not the only user of larray.

Before you scrap the win32com code, could you please time it on a realistic example and then time again the same example once you have switched to xlwings (and report the result here)? Then we can have some facts to remember.

I already made the move in my local branch and then added new commits but switching from xlwings to win32com involved to modify a few lines of code. I'll make the benchmark but after I'm happy with the API I end up with.

@alixdamman
Copy link
Collaborator Author

Well, OK, you win. I made the following benchmark script:

    from xlwings import xlplatform

    def dump2(array):
        return [[xlplatform.prepare_xl_data_element(x) for x in row] for row in array.dump()]


    demo = load_example_data('demography')
    pop = demo.pop[2015, 'M', 'BE'].astype(float)
    pop[0] = nan
    print(pop.info)

    with open_excel(os.path.join(TESTDATADIR, 'test_pop.xlsx'), overwrite_file=True) as wb:
        # ---- win32com ----

        xl_wb = wb.api
        data_sheet = xl_wb.Worksheets(1)
        data_range = data_sheet.Range
        data_cells = data_sheet.Cells

        stmt = "data_range(data_cells(1, 1), data_cells(pop.shape[0] + 1, pop.shape[1] + 1)).Value = pop.dump()"
        t = timeit.timeit(stmt, globals=globals(), number=200)
        print('win32com - current    ', t)
        wb.save()

    with open_excel(os.path.join(TESTDATADIR, 'test_pop2.xlsx'), overwrite_file=True) as wb:
        # ---- win32com ----

        xl_wb = wb.api
        data_sheet = xl_wb.Worksheets(1)
        data_range = data_sheet.Range
        data_cells = data_sheet.Cells

        stmt = "data_range(data_cells(1, 1), data_cells(pop.shape[0] + 1, pop.shape[1] + 1)).Value = dump2(pop)"
        t = timeit.timeit(stmt, globals=globals(), number=200)
        print('win32com + check value', t)
        wb.save()

    with open_excel(os.path.join(TESTDATADIR, 'test_pop_xw.xlsx'), overwrite_file=True) as wb:
        # ---- xlwings ----

        data_sheet = wb.sheets[0]
        data_range = data_sheet.range
        data_cells = data_sheet.cells

        stmt = "data_range(data_cells(1, 1), data_cells(pop.shape[0] + 1, pop.shape[1] + 1)).value = pop.dump()"
        t = timeit.timeit(stmt, globals=globals(), number=200)
        print('xlwings               ', t)
        wb.save()

Results are:

3 x 121
 geo [3]: 'BruCap' 'Fla' 'Wal'
 age [121]: 0 1 2 ... 118 119 120
dtype: float64
memory used: 2.84 Kb
win32com - current     1.4690143459999998
win32com + check value 1.3786742059999995
xlwings                8.167913915

@gdementen
Copy link
Contributor

@alixdamman can you finalize this by next week or do we schedule it for 0.32?

@alixdamman
Copy link
Collaborator Author

Depends if you have time to review the PR

@alixdamman
Copy link
Collaborator Author

Depends also if we agree on the api. I am not sure it is a good idea to by default add each new graph to the right and having a method newline to start a new row of graphs. We may want to use something more like subplot from matplotlib and forcing users to specify the place of the graph in a predefined grid. What do you think?

@gdementen
Copy link
Contributor

Requiring users to use newline bothers me too, but the subplot api where you must (AFAIK!) specify the exact grid position for each "item" is painful. It is nice to have in some situations but I don't want that to be the only way to work. On the other hand, specifying the number of graphs per line once and then calling add_graph repeatedly (and it would call "newline" automatically) seems like a nice improvement.

Besides, it goes in the direction of something I suggested to Marie when I reviewed some of the code in demo (and she found it interesting and readable):

    dashboard.add_graphs({"Taux de fécondité - {igeo} - {inat}": icf},
                         axis_per_loop_variable={'igeo': AX_GEO, 'inat': P.AX_NAT2},
                         template='CompPP_FEC_1GEO_nat1',
                         graphs_per_line=len(P.AX_NAT2))

It should not be too hard to implement such a method. Something like this untested code should do the trick:

def add_graphs(self, array_per_title, axis_per_loop_variable, template=None, graphs_per_line=1):
    loop_variable_names = axis_per_loop_variable.keys()
    axes = axis_per_loop_variable.values()
    titles = array_per_title.keys()
    arrays = array_per_title.values()
    i = 0
    for loop_variables_values, arrays_chunk in la.zip_array_items(arrays, axes=axes):
        loop_variables_dict = dict(zip(loop_variable_names, loop_variable_values))
        for title_template, array_chunk in arrays_chunk:
            title = title_template.format(loop_variables_dict)
            self.add_graph(array_chunk, title=title, template=template)
            i += 1
            if i % graphs_per_line == 0:
                self.newline()

@alixdamman
Copy link
Collaborator Author

I like the idea. I will start to implement it.
Although, I find the argument name axis_per_loop_variable not very intuitive.
It is hard to guess what it does at first sight.

@gdementen
Copy link
Contributor

the weird name was meant to make it clear in which direction the mapping goes. Maybe swap keys and values and name it simply: "loop_variables"?
loop_variables={AX_GEO: 'igeo', P.AX_NAT2: 'inat'}

@alixdamman
Copy link
Collaborator Author

what bothered me is the meaning of 'variable'. How these variables are used?

@gdementen
Copy link
Contributor

they are used to construct the titles. in the example above, they are used to fill "{igeo}" and "{inat}".

@alixdamman alixdamman requested a review from gdementen August 5, 2019 08:02
Copy link
Contributor

@gdementen gdementen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (modulo the test failures of course). It seems strange to me that the test actually passed on Python2???

@gdementen
Copy link
Contributor

also this needs a mention in the changelog

… Excel file with multiple graphs at once
@alixdamman alixdamman merged commit 4a9c954 into larray-project:master Aug 5, 2019
@alixdamman alixdamman deleted the dashboard_676 branch August 5, 2019 12:14
alixdamman added a commit to alixdamman/larray that referenced this pull request Aug 7, 2019
alixdamman added a commit to alixdamman/larray that referenced this pull request Aug 7, 2019
alixdamman added a commit that referenced this pull request Aug 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants