fix issue 676 : implemented ExcelReport class #730

alixdamman · 2018-12-07T08:55:19Z

Not documented yet but ready to start the discussion.

gdementen

looks mostly good. Can we distribute the .crtx files?
do you use all those .crtx files? If not, remove the unused ones.

larray/inout/xw_reporting.py

larray/tests/test_excel.py

larray/inout/xw_reporting.py

gdementen · 2019-02-06T12:06:18Z

larray/inout/xw_reporting.py

+                                 "Please provide one for at least 'width' or 'heigth'")
+            width = cls._get_width(width)
+            height = cls._get_height(height)
+            cls._default_size = (width, height)


You missunderstood my comment about this 😢. Let's talk about it tomorrow...

larray/inout/xw_reporting.py

larray/inout/xw_excel.py

alixdamman · 2019-06-13T14:23:31Z

Using win32com instead of xwlings leads to problems when exporting NaN values:

    demo = load_example_data('demo')
    pop_be = demo.pop['Belgium']
    pop_be_nan = pop_be.astype('O')
    pop_be_nan[2013] = nan
    with open_excel('test_nan.xlsx', overwrite_file=True) as wb:
        wb['Sheet1'] = ''
        wb['Sheet1']['A1'] = pop_be_nan.dump()
        wb.save()

        xl_wb = wb.api
        data_sheet = xl_wb.Worksheets(1)
        data_range = data_sheet.Range
        data_cells = data_sheet.Cells
        data_range(data_cells(5, 1), data_cells(7, 4)).Value = pop_be_nan.dump()
        wb.save()

produces an Excel file with the following values:

gender\time | 2013 | 2014       | 2015
Male            |          | 5493792 | 5524068
Female         |          | 5687048 | 5713206

gender\time | 2013   | 2014       | 2015
Male            | 65535 | 5493792 | 5524068
Female         | 65535 | 5687048 | 5713206

FWIW, I think the magic done by xlwings is here : https://github.com/xlwings/xlwings/blob/master/xlwings/_xlwindows.py#L1040

I suggest to use slwings to dump the data but use win32com when it comes to generate graphical elements. What do you think?

gdementen · 2019-06-14T07:02:01Z

Using win32com instead of xwlings leads to problems when exporting NaN values:

[snip]

FWIW, I think the magic done by xlwings is here : https://github.com/xlwings/xlwings/blob/master/xlwings/_xlwindows.py#L1040

I suggest to use slwings to dump the data but use win32com when it comes to generate graphical elements. What do you think?

We could workaround the issue easily (by doing essentially what xlwings does). Whether it is worth it or not depends on the speed difference and the time it takes to generate the reports. I mean, if it's smaller than a 10% difference, or the total time for our biggest reports is smaller than 1 second, it's probably not worth avoiding xlwings, but in my tests I think I remember the difference was important both relatively and in seconds. That said, there was multiple factors involved (early binding vs late binding, disabling screen updates, etc.) so I am unsure I timed and/or remember the right thing!

alixdamman · 2019-06-14T07:14:08Z

We could workaround the issue easily (by doing essentially what xlwings does).

Yes but you know how it goes. We start with a small workaround (we essentially copy/paste what xlwings does), then users complain because some special values were not taken into account. So, we adapt our workaround (again by copy/pasting some code from xlwings), etc, etc... All of this involving more code to maintain while xlwings does the job.

I think using

   app = wb.app.api
   app.ScreenUpdating = False

already improve quite the performance. Once the early binding implemented, I think it should be enough.

In any case, there is still the possibility to go back to win32com in the future if we note that the performance is really too bad. But for the moment, I would like to avoid more hacky code to maintain.

gdementen · 2019-06-14T09:05:02Z

Oh well, I know that pretty well. Honestly, it's all because at some point JD compared the timings of the APL code and the Python code and I wanted to make sure we are faster 😉. Before you scrap the win32com code, could you please time it on a realistic example and then time again the same example once you have switched to xlwings (and report the result here)? Then we can have some facts to remember.

FWIW, the screenupdating thing is done by default in larray for a while now.

alixdamman · 2019-06-14T09:38:55Z

If APL is faster, I can guess it's essentially because special values are not handled.
JD is not the only user of larray.

Before you scrap the win32com code, could you please time it on a realistic example and then time again the same example once you have switched to xlwings (and report the result here)? Then we can have some facts to remember.

I already made the move in my local branch and then added new commits but switching from xlwings to win32com involved to modify a few lines of code. I'll make the benchmark but after I'm happy with the API I end up with.

alixdamman · 2019-06-17T14:43:55Z

Well, OK, you win. I made the following benchmark script:

    from xlwings import xlplatform

    def dump2(array):
        return [[xlplatform.prepare_xl_data_element(x) for x in row] for row in array.dump()]


    demo = load_example_data('demography')
    pop = demo.pop[2015, 'M', 'BE'].astype(float)
    pop[0] = nan
    print(pop.info)

    with open_excel(os.path.join(TESTDATADIR, 'test_pop.xlsx'), overwrite_file=True) as wb:
        # ---- win32com ----

        xl_wb = wb.api
        data_sheet = xl_wb.Worksheets(1)
        data_range = data_sheet.Range
        data_cells = data_sheet.Cells

        stmt = "data_range(data_cells(1, 1), data_cells(pop.shape[0] + 1, pop.shape[1] + 1)).Value = pop.dump()"
        t = timeit.timeit(stmt, globals=globals(), number=200)
        print('win32com - current    ', t)
        wb.save()

    with open_excel(os.path.join(TESTDATADIR, 'test_pop2.xlsx'), overwrite_file=True) as wb:
        # ---- win32com ----

        xl_wb = wb.api
        data_sheet = xl_wb.Worksheets(1)
        data_range = data_sheet.Range
        data_cells = data_sheet.Cells

        stmt = "data_range(data_cells(1, 1), data_cells(pop.shape[0] + 1, pop.shape[1] + 1)).Value = dump2(pop)"
        t = timeit.timeit(stmt, globals=globals(), number=200)
        print('win32com + check value', t)
        wb.save()

    with open_excel(os.path.join(TESTDATADIR, 'test_pop_xw.xlsx'), overwrite_file=True) as wb:
        # ---- xlwings ----

        data_sheet = wb.sheets[0]
        data_range = data_sheet.range
        data_cells = data_sheet.cells

        stmt = "data_range(data_cells(1, 1), data_cells(pop.shape[0] + 1, pop.shape[1] + 1)).value = pop.dump()"
        t = timeit.timeit(stmt, globals=globals(), number=200)
        print('xlwings               ', t)
        wb.save()

Results are:

3 x 121
 geo [3]: 'BruCap' 'Fla' 'Wal'
 age [121]: 0 1 2 ... 118 119 120
dtype: float64
memory used: 2.84 Kb
win32com - current     1.4690143459999998
win32com + check value 1.3786742059999995
xlwings                8.167913915

gdementen · 2019-08-01T11:11:24Z

@alixdamman can you finalize this by next week or do we schedule it for 0.32?

alixdamman · 2019-08-01T12:58:06Z

Depends if you have time to review the PR

alixdamman · 2019-08-01T13:02:13Z

Depends also if we agree on the api. I am not sure it is a good idea to by default add each new graph to the right and having a method newline to start a new row of graphs. We may want to use something more like subplot from matplotlib and forcing users to specify the place of the graph in a predefined grid. What do you think?

gdementen · 2019-08-01T15:07:00Z

Requiring users to use newline bothers me too, but the subplot api where you must (AFAIK!) specify the exact grid position for each "item" is painful. It is nice to have in some situations but I don't want that to be the only way to work. On the other hand, specifying the number of graphs per line once and then calling add_graph repeatedly (and it would call "newline" automatically) seems like a nice improvement.

Besides, it goes in the direction of something I suggested to Marie when I reviewed some of the code in demo (and she found it interesting and readable):

    dashboard.add_graphs({"Taux de fécondité - {igeo} - {inat}": icf},
                         axis_per_loop_variable={'igeo': AX_GEO, 'inat': P.AX_NAT2},
                         template='CompPP_FEC_1GEO_nat1',
                         graphs_per_line=len(P.AX_NAT2))

It should not be too hard to implement such a method. Something like this untested code should do the trick:

def add_graphs(self, array_per_title, axis_per_loop_variable, template=None, graphs_per_line=1):
    loop_variable_names = axis_per_loop_variable.keys()
    axes = axis_per_loop_variable.values()
    titles = array_per_title.keys()
    arrays = array_per_title.values()
    i = 0
    for loop_variables_values, arrays_chunk in la.zip_array_items(arrays, axes=axes):
        loop_variables_dict = dict(zip(loop_variable_names, loop_variable_values))
        for title_template, array_chunk in arrays_chunk:
            title = title_template.format(loop_variables_dict)
            self.add_graph(array_chunk, title=title, template=template)
            i += 1
            if i % graphs_per_line == 0:
                self.newline()

alixdamman · 2019-08-02T06:28:29Z

I like the idea. I will start to implement it.
Although, I find the argument name axis_per_loop_variable not very intuitive.
It is hard to guess what it does at first sight.

gdementen · 2019-08-02T07:37:32Z

the weird name was meant to make it clear in which direction the mapping goes. Maybe swap keys and values and name it simply: "loop_variables"?
loop_variables={AX_GEO: 'igeo', P.AX_NAT2: 'inat'}

alixdamman · 2019-08-02T07:45:55Z

what bothered me is the meaning of 'variable'. How these variables are used?

gdementen · 2019-08-02T07:49:21Z

they are used to construct the titles. in the example above, they are used to fill "{igeo}" and "{inat}".

larray/tests/test_excel.py

larray/inout/xw_reporting.py

doc/source/api.rst

larray/inout/xw_reporting.py

gdementen

LGTM (modulo the test failures of course). It seems strange to me that the test actually passed on Python2???

gdementen · 2019-08-05T11:09:41Z

also this needs a mention in the changelog

… Excel file with multiple graphs at once

alixdamman requested a review from gdementen December 7, 2018 08:55

gdementen reviewed Dec 12, 2018

View reviewed changes

gdementen reviewed Feb 6, 2019

View reviewed changes

gdementen requested changes Feb 8, 2019

View reviewed changes

larray/inout/xw_excel.py Outdated Show resolved Hide resolved

larray/inout/xw_excel.py Outdated Show resolved Hide resolved

alixdamman added this to the 0.31 milestone Apr 26, 2019

alixdamman force-pushed the master branch 3 times, most recently from 9264074 to 01669f2 Compare May 10, 2019 10:03

gdementen force-pushed the master branch from b1f2c1d to cb86d28 Compare June 26, 2019 15:17

alixdamman force-pushed the dashboard_676 branch from 65a34e0 to 3cdbae2 Compare June 28, 2019 14:10

gdementen assigned alixdamman Aug 1, 2019

gdementen reviewed Aug 2, 2019

View reviewed changes

alixdamman requested a review from gdementen August 5, 2019 08:02

gdementen reviewed Aug 5, 2019

View reviewed changes

gdementen approved these changes Aug 5, 2019

View reviewed changes

bump version to 0.31-dev

f533a3c

alixdamman force-pushed the dashboard_676 branch from 9f6fa15 to 944d05d Compare August 5, 2019 11:59

fix larray-project#676 : implemented ExcelReport class to generate an…

c5c62fe

… Excel file with multiple graphs at once

alixdamman force-pushed the dashboard_676 branch from 944d05d to c5c62fe Compare August 5, 2019 12:06

alixdamman merged commit 4a9c954 into larray-project:master Aug 5, 2019

alixdamman deleted the dashboard_676 branch August 5, 2019 12:14

alixdamman added a commit to alixdamman/larray that referenced this pull request Aug 7, 2019

fixed documentation issues introduced by PR larray-project#730

e2e4aec

alixdamman added a commit to alixdamman/larray that referenced this pull request Aug 7, 2019

fixed documentation issues introduced by PR larray-project#730

28ae78b

alixdamman added a commit that referenced this pull request Aug 7, 2019

fixed documentation issues introduced by PR #730

6103500

fix issue 676 : implemented ExcelReport class #730

fix issue 676 : implemented ExcelReport class #730

Uh oh!

Conversation

alixdamman commented Dec 7, 2018

Uh oh!

gdementen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gdementen Feb 6, 2019

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alixdamman commented Jun 13, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gdementen commented Jun 14, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alixdamman commented Jun 14, 2019

Uh oh!

gdementen commented Jun 14, 2019

Uh oh!

alixdamman commented Jun 14, 2019

Uh oh!

alixdamman commented Jun 17, 2019

Uh oh!

gdementen commented Aug 1, 2019

Uh oh!

alixdamman commented Aug 1, 2019

Uh oh!

alixdamman commented Aug 1, 2019

Uh oh!

gdementen commented Aug 1, 2019

Uh oh!

alixdamman commented Aug 2, 2019

Uh oh!

gdementen commented Aug 2, 2019

Uh oh!

alixdamman commented Aug 2, 2019

Uh oh!

gdementen commented Aug 2, 2019

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gdementen left a comment

Choose a reason for hiding this comment

Uh oh!

gdementen commented Aug 5, 2019

Uh oh!

Uh oh!

alixdamman commented Jun 13, 2019 •

edited

Loading

gdementen commented Jun 14, 2019 •

edited

Loading