-
Notifications
You must be signed in to change notification settings - Fork 6
fix issue 676 : implemented ExcelReport class #730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- looks mostly good. Can we distribute the .crtx files?
- do you use all those .crtx files? If not, remove the unused ones.
larray/inout/xw_reporting.py
Outdated
"Please provide one for at least 'width' or 'heigth'") | ||
width = cls._get_width(width) | ||
height = cls._get_height(height) | ||
cls._default_size = (width, height) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You missunderstood my comment about this 😢. Let's talk about it tomorrow...
9264074
to
01669f2
Compare
Using win32com instead of xwlings leads to problems when exporting NaN values: demo = load_example_data('demo')
pop_be = demo.pop['Belgium']
pop_be_nan = pop_be.astype('O')
pop_be_nan[2013] = nan
with open_excel('test_nan.xlsx', overwrite_file=True) as wb:
wb['Sheet1'] = ''
wb['Sheet1']['A1'] = pop_be_nan.dump()
wb.save()
xl_wb = wb.api
data_sheet = xl_wb.Worksheets(1)
data_range = data_sheet.Range
data_cells = data_sheet.Cells
data_range(data_cells(5, 1), data_cells(7, 4)).Value = pop_be_nan.dump()
wb.save() produces an Excel file with the following values:
FWIW, I think the magic done by xlwings is here : https://github.com/xlwings/xlwings/blob/master/xlwings/_xlwindows.py#L1040 I suggest to use slwings to dump the data but use win32com when it comes to generate graphical elements. What do you think? |
We could workaround the issue easily (by doing essentially what xlwings does). Whether it is worth it or not depends on the speed difference and the time it takes to generate the reports. I mean, if it's smaller than a 10% difference, or the total time for our biggest reports is smaller than 1 second, it's probably not worth avoiding xlwings, but in my tests I think I remember the difference was important both relatively and in seconds. That said, there was multiple factors involved (early binding vs late binding, disabling screen updates, etc.) so I am unsure I timed and/or remember the right thing! |
Yes but you know how it goes. We start with a small workaround (we essentially copy/paste what xlwings does), then users complain because some special values were not taken into account. So, we adapt our workaround (again by copy/pasting some code from xlwings), etc, etc... All of this involving more code to maintain while xlwings does the job. I think using app = wb.app.api
app.ScreenUpdating = False already improve quite the performance. Once the early binding implemented, I think it should be enough. In any case, there is still the possibility to go back to win32com in the future if we note that the performance is really too bad. But for the moment, I would like to avoid more hacky code to maintain. |
Oh well, I know that pretty well. Honestly, it's all because at some point JD compared the timings of the APL code and the Python code and I wanted to make sure we are faster 😉. Before you scrap the win32com code, could you please time it on a realistic example and then time again the same example once you have switched to xlwings (and report the result here)? Then we can have some facts to remember. FWIW, the screenupdating thing is done by default in larray for a while now. |
If APL is faster, I can guess it's essentially because special values are not handled.
I already made the move in my local branch and then added new commits but switching from xlwings to win32com involved to modify a few lines of code. I'll make the benchmark but after I'm happy with the API I end up with. |
Well, OK, you win. I made the following benchmark script: from xlwings import xlplatform
def dump2(array):
return [[xlplatform.prepare_xl_data_element(x) for x in row] for row in array.dump()]
demo = load_example_data('demography')
pop = demo.pop[2015, 'M', 'BE'].astype(float)
pop[0] = nan
print(pop.info)
with open_excel(os.path.join(TESTDATADIR, 'test_pop.xlsx'), overwrite_file=True) as wb:
# ---- win32com ----
xl_wb = wb.api
data_sheet = xl_wb.Worksheets(1)
data_range = data_sheet.Range
data_cells = data_sheet.Cells
stmt = "data_range(data_cells(1, 1), data_cells(pop.shape[0] + 1, pop.shape[1] + 1)).Value = pop.dump()"
t = timeit.timeit(stmt, globals=globals(), number=200)
print('win32com - current ', t)
wb.save()
with open_excel(os.path.join(TESTDATADIR, 'test_pop2.xlsx'), overwrite_file=True) as wb:
# ---- win32com ----
xl_wb = wb.api
data_sheet = xl_wb.Worksheets(1)
data_range = data_sheet.Range
data_cells = data_sheet.Cells
stmt = "data_range(data_cells(1, 1), data_cells(pop.shape[0] + 1, pop.shape[1] + 1)).Value = dump2(pop)"
t = timeit.timeit(stmt, globals=globals(), number=200)
print('win32com + check value', t)
wb.save()
with open_excel(os.path.join(TESTDATADIR, 'test_pop_xw.xlsx'), overwrite_file=True) as wb:
# ---- xlwings ----
data_sheet = wb.sheets[0]
data_range = data_sheet.range
data_cells = data_sheet.cells
stmt = "data_range(data_cells(1, 1), data_cells(pop.shape[0] + 1, pop.shape[1] + 1)).value = pop.dump()"
t = timeit.timeit(stmt, globals=globals(), number=200)
print('xlwings ', t)
wb.save() Results are:
|
65a34e0
to
3cdbae2
Compare
@alixdamman can you finalize this by next week or do we schedule it for 0.32? |
Depends if you have time to review the PR |
Depends also if we agree on the api. I am not sure it is a good idea to by default add each new graph to the right and having a method |
Requiring users to use newline bothers me too, but the subplot api where you must (AFAIK!) specify the exact grid position for each "item" is painful. It is nice to have in some situations but I don't want that to be the only way to work. On the other hand, specifying the number of graphs per line once and then calling add_graph repeatedly (and it would call "newline" automatically) seems like a nice improvement. Besides, it goes in the direction of something I suggested to Marie when I reviewed some of the code in demo (and she found it interesting and readable): dashboard.add_graphs({"Taux de fécondité - {igeo} - {inat}": icf},
axis_per_loop_variable={'igeo': AX_GEO, 'inat': P.AX_NAT2},
template='CompPP_FEC_1GEO_nat1',
graphs_per_line=len(P.AX_NAT2)) It should not be too hard to implement such a method. Something like this untested code should do the trick: def add_graphs(self, array_per_title, axis_per_loop_variable, template=None, graphs_per_line=1):
loop_variable_names = axis_per_loop_variable.keys()
axes = axis_per_loop_variable.values()
titles = array_per_title.keys()
arrays = array_per_title.values()
i = 0
for loop_variables_values, arrays_chunk in la.zip_array_items(arrays, axes=axes):
loop_variables_dict = dict(zip(loop_variable_names, loop_variable_values))
for title_template, array_chunk in arrays_chunk:
title = title_template.format(loop_variables_dict)
self.add_graph(array_chunk, title=title, template=template)
i += 1
if i % graphs_per_line == 0:
self.newline() |
I like the idea. I will start to implement it. |
the weird name was meant to make it clear in which direction the mapping goes. Maybe swap keys and values and name it simply: "loop_variables"? |
what bothered me is the meaning of 'variable'. How these variables are used? |
they are used to construct the titles. in the example above, they are used to fill "{igeo}" and "{inat}". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (modulo the test failures of course). It seems strange to me that the test actually passed on Python2???
also this needs a mention in the changelog |
9f6fa15
to
944d05d
Compare
… Excel file with multiple graphs at once
944d05d
to
c5c62fe
Compare
Not documented yet but ready to start the discussion.