Description
Preconditions (*)
- Magento 2.3 and Elasticsearch 5 configured.
- Ability to debug ProductDataMapper.php file - https://github.com/magento/magento2/blob/2.3-develop/app/code/Magento/Elasticsearch/Model/Adapter/BatchDataMapper/ProductDataMapper.php
Steps to reproduce (*)
- Create a searchable product attribute with a lot of values. The most common example is brands. We had around 300 values.
- Install a lot of products that use values of this attribute. In our case it was ~180k, but for debugging you may use sample data.
- Run the reindex: bin/magento indexer:reindex catalogsearch_fulltext (or trigger it in a way that lets you debug it)
Expected result (*)
- The reindex runs fine and takes a sane amount of time.
Actual result (*)
- (In our case, on a development PC) with Elasticsearch 2.3 the reindex takes ~40m to complete. With Elasticsearch 5 it completed in 8h.
The issue comes from this method:
    private function getValuesLabels(Attribute $attribute, array $attributeValues): array
    {
        $attributeLabels = [];
        foreach ($attribute->getOptions() as $option) {
            if (\in_array($option->getValue(), $attributeValues)) {
                $attributeLabels[] = $option->getLabel();
            }
        }
        return $attributeLabels;
    }
For each product, Magento runs this code, passing in an array of attribute value IDs. It is used for both multiselect and single-select attributes, so $attributeValues may look like [123, 456] (simplified; the real value looks a little different), but for brands it mostly contains one value.
$attribute->getOptions() returns an array with 300+ values, and each one is compared against the $attributeValues array to retrieve the label for this specific attribute.
Even if one search takes milliseconds (well, mostly hundreds of milliseconds), multiply that by the number of products and you will see performance drop by 5 hours here.
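To put a rough number on that multiplication, here is a small sketch using the figures from this report (~180k products, ~300 options). The helper name countOptionChecks is made up for illustration, and it assumes the method runs once per product for this one attribute.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of Magento): estimates how many options the
// foreach visits during a full reindex, assuming getValuesLabels() is called
// once per product for a single attribute.
function countOptionChecks(int $productCount, int $optionCount): int
{
    // Every call walks the full option list, and each iteration calls in_array().
    return $productCount * $optionCount;
}

$checks = countOptionChecks(180000, 300);
echo number_format($checks), " option checks for one attribute\n";
```

Every additional select or multiselect attribute with a large option list adds its own product of the same kind.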
There are 2 ways to improve this foreach:
- fast - cache attribute values by ID somewhere and check whether each value inside $attributeValues exists as a key, or cache the values that the function returns (will require joining the array). The class is used for every product, so the pre-cached value will be reused correctly without recalculating it for each product.
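The "fast" option could look roughly like the sketch below. It is a standalone illustration, not a patch for ProductDataMapper: StubOption, StubAttribute and CachedLabelResolver are hypothetical stand-ins so the snippet runs outside Magento, and keying the cache by attribute code is an assumption. It also assumes the values are plain scalars, while the real values look a little different, as noted above.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for an attribute option (not the real Magento class).
final class StubOption
{
    private $value;
    private $label;

    public function __construct(string $value, string $label)
    {
        $this->value = $value;
        $this->label = $label;
    }

    public function getValue(): string { return $this->value; }
    public function getLabel(): string { return $this->label; }
}

// Hypothetical stand-in for the attribute; counts getOptions() calls for the demo.
final class StubAttribute
{
    private $code;
    private $options;
    public $getOptionsCalls = 0;

    public function __construct(string $code, array $options)
    {
        $this->code = $code;
        $this->options = $options;
    }

    public function getAttributeCode(): string { return $this->code; }

    public function getOptions(): array
    {
        $this->getOptionsCalls++;
        return $this->options;
    }
}

final class CachedLabelResolver
{
    // attribute code => [option value => label], built once per attribute
    private $labelsByAttribute = [];

    public function getValuesLabels(StubAttribute $attribute, array $attributeValues): array
    {
        $code = $attribute->getAttributeCode();
        if (!isset($this->labelsByAttribute[$code])) {
            $map = [];
            foreach ($attribute->getOptions() as $option) {
                $map[(string) $option->getValue()] = $option->getLabel();
            }
            $this->labelsByAttribute[$code] = $map;
        }

        // One key lookup per product value instead of a scan over all options.
        $labels = [];
        foreach ($attributeValues as $value) {
            $key = (string) $value;
            if (isset($this->labelsByAttribute[$code][$key])) {
                $labels[] = $this->labelsByAttribute[$code][$key];
            }
        }
        return $labels;
    }
}

$brand = new StubAttribute('brand', [
    new StubOption('123', 'Acme'),
    new StubOption('456', 'Globex'),
]);
$resolver = new CachedLabelResolver();
$first = $resolver->getValuesLabels($brand, ['456']);
$second = $resolver->getValuesLabels($brand, ['123', '999']);
```

Two differences from the original method are worth checking before adopting something like this: labels come back in the order of $attributeValues rather than in option order, and values are matched as string keys instead of through the loose in_array comparison. If labels can differ per store, the cache key would probably need to include the store as well.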