Elastic Search 5 Indexing Performance Issue #20966

Closed
@DarthFly

Description

Preconditions (*)

  1. Magento 2.3 with Elasticsearch 5 configured.
  2. Ability to debug ProductDataMapper.php file - https://github.com/magento/magento2/blob/2.3-develop/app/code/Magento/Elasticsearch/Model/Adapter/BatchDataMapper/ProductDataMapper.php

Steps to reproduce (*)

  1. Create a searchable product attribute with a lot of values. The most common case is brands; we had around 300 values.
  2. Create a lot of products that use values of this attribute. In our case it was ~180k, but for debugging you may use sample data.
  3. Run the reindex: bin/magento indexer:reindex catalogsearch_fulltext (or trigger it in a way that lets you debug it).

Expected result (*)

  1. The reindex runs fine and takes a sane amount of time.

Actual result (*)

  1. In our case, on a development PC, the reindex takes ~40 min to complete with Elasticsearch 2.3. With Elasticsearch 5 it completed in 8 h.

The issue comes from this method:

private function getValuesLabels(Attribute $attribute, array $attributeValues): array
{
    $attributeLabels = [];
    foreach ($attribute->getOptions() as $option) {
        if (\in_array($option->getValue(), $attributeValues)) {
            $attributeLabels[] = $option->getLabel();
        }
    }
    return $attributeLabels;
}

Magento runs this code for each product, passing an array of attribute option IDs. The method is used for both multiselect and single-select attributes, so $attributeValues may look like [123, 456] (simplified; the real value looks a little different), but for brands it mostly contains one value.
$attribute->getOptions() returns an array with 300+ options, and each one is compared against the $attributeValues array to retrieve the label for this specific attribute.

Even if one such search takes milliseconds (well, mostly hundreds of milliseconds), multiply it by the number of products and you will see a performance drop of about 5 hours here.

There are two ways to make this foreach fast: cache the attribute's option labels keyed by option ID and check whether each value in $attributeValues exists as a key, or cache the values the function returns (this requires joining the array to build a cache key). The class instance is reused for every product, so a pre-cached value is used correctly without being recalculated for each product.
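
A minimal sketch of both ideas combined, not the actual Magento fix. The Option and Attribute classes below are hypothetical stand-ins so the example runs outside Magento, and CachedLabelResolver is an invented name. It assumes option values are scalars and unique within an attribute. Note that it matches by array key instead of the loose comparison in_array() does, so edge cases such as "1.0" vs "1" would behave differently.

```php
<?php
// Hypothetical stand-ins for Magento's option and attribute objects.
final class Option
{
    private $value;
    private $label;

    public function __construct(string $value, string $label)
    {
        $this->value = $value;
        $this->label = $label;
    }

    public function getValue(): string { return $this->value; }
    public function getLabel(): string { return $this->label; }
}

final class Attribute
{
    /** Counts getOptions() calls; for illustration only. */
    public $optionLoads = 0;
    private $code;
    private $options;

    /** @param Option[] $options */
    public function __construct(string $code, array $options)
    {
        $this->code = $code;
        $this->options = $options;
    }

    public function getAttributeCode(): string { return $this->code; }

    /** @return Option[] */
    public function getOptions(): array
    {
        $this->optionLoads++;
        return $this->options;
    }
}

final class CachedLabelResolver
{
    /** @var array attribute code => [option value => label] (first approach) */
    private $labelsByValue = [];

    /** @var array "code:value1,value2" => labels (second approach) */
    private $resultCache = [];

    public function getValuesLabels(Attribute $attribute, array $attributeValues): array
    {
        $code = $attribute->getAttributeCode();

        // Second approach: reuse a previously computed result for the same values.
        $resultKey = $code . ':' . implode(',', $attributeValues);
        if (isset($this->resultCache[$resultKey])) {
            return $this->resultCache[$resultKey];
        }

        // First approach: build the value => label map once per attribute.
        if (!isset($this->labelsByValue[$code])) {
            $this->labelsByValue[$code] = [];
            foreach ($attribute->getOptions() as $option) {
                $this->labelsByValue[$code][(string) $option->getValue()] = $option->getLabel();
            }
        }

        // Key lookup instead of scanning all options; array_intersect_key()
        // keeps the option order, like the original loop does.
        $wanted = array_flip(array_map(static function ($value): string {
            return (string) $value;
        }, $attributeValues));
        $labels = array_values(array_intersect_key($this->labelsByValue[$code], $wanted));

        return $this->resultCache[$resultKey] = $labels;
    }
}

$brand = new Attribute('brand', [
    new Option('1', 'Acme'),
    new Option('2', 'Globex'),
    new Option('3', 'Initech'),
]);
$resolver = new CachedLabelResolver();
var_dump($resolver->getValuesLabels($brand, [3, 1]));
var_dump($resolver->getValuesLabels($brand, [2]));
var_dump($brand->optionLoads); // options are loaded once, not once per call
```

With the map in place, each product costs one key lookup per selected value instead of a pass over all 300+ options. The result cache on top mostly helps single-value attributes such as brands, where the same key repeats across many products.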

Metadata

Labels

Component: Elasticsearch
Fixed in 2.4.x (The issue has been fixed in 2.4-develop branch)
Issue: Clear Description (Gate 2 Passed. Manual verification of the issue description passed)
Issue: Confirmed (Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed)
Issue: Format is valid (Gate 1 Passed. Automatic verification of issue format passed)
Issue: Ready for Work (Gate 4. Acknowledged. Issue is added to backlog and ready for development)
Reproduced on 2.2.x (The issue has been reproduced on latest 2.2 release)
Reproduced on 2.3.x (The issue has been reproduced on latest 2.3 release)
