“Your catalog is way too big, your e-commerce won’t work.” Well... wrong!
We shared our approach to huge catalogs and our experience in running an e-commerce site with 700K products.
Following this unusual case, we decided to go further and created a 1M-product catalog to test how Magento responds to such a large amount of data. We talked about Magento’s potential and analyzed the weak points that emerged during our development work.
Giuliana Benedetti - Can Magento handle 1M products?
1.
2. About me
• Project Manager @ Webformat
• Magento and TYPO3 projects
• Requirements analysis
• Planning of development and support activities
3. What’s in the menu?
• Huge catalog
• What we did
• What we are doing
4. Once upon a time...
• The project began as a migration from a proprietary platform to Magento 1 Community
• Shoes and accessories e-commerce
• We developed the integration with their management software, which handled product master data, warehouse stock and order records
• Integration with Amazon and eBay
5. Products Database
• The original products database counted around 150k products
• Configurable products
• On average, 10 simple products for each configurable
• Virtual products
6. Continued Growth
• In just one year we reached 700K products stored in Magento
• 66k configurable products
7. Challenges faced
Alignment between catalog and management software
Updating warehouse
Reindexing
Generating images
Server response time
Backoffice operations
Third-party module integration
Feed export for Google Shopping & Co.
Marketplace synchronization
Disk space
8. Updating the catalog (1/2)
• Initially, with 150k products, this is what we planned:
• Massive initial import
• Frequent updates during the day via web service
• As the catalog grew, the data exchange volumes via web service became unsustainable. The exchange procedure needed a redesign.
9. Updating the catalog (2/2)
• Today we have 700k products
• Based on Magmi and CSV file exchange (product master data)
• Nighttime update: only the DIFF
• Full catalog update only in exceptional cases
• The client accepted that new products are published with a one-day delay
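The nightly DIFF can be illustrated with a minimal sketch. It assumes two CSV snapshots keyed by a `sku` column; the column names and sample rows are hypothetical and not taken from the project. Only new or changed rows are kept for the import; products that disappeared from the export are not handled here.

```python
import csv
import io


def load_rows(csv_text, key="sku"):
    """Index the rows of a CSV export by their key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def catalog_diff(previous_csv, current_csv, key="sku"):
    """Return only the rows that are new or changed since the previous export."""
    previous = load_rows(previous_csv, key)
    current = load_rows(current_csv, key)
    return [row for sku, row in current.items() if previous.get(sku) != row]


# Hypothetical snapshots: A1 changed price, B2 is unchanged, C3 is new.
yesterday = "sku,name,price\nA1,Sneaker,59.90\nB2,Boot,89.00\n"
today = "sku,name,price\nA1,Sneaker,49.90\nB2,Boot,89.00\nC3,Sandal,29.00\n"

changed = catalog_diff(yesterday, today)
print([row["sku"] for row in changed])
```

Importing only the changed rows keeps the nightly file small even when the full catalog holds hundreds of thousands of products.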
10. Warehouse update (1/2)
• No warehouse fully dedicated to web
• Shared with the offline shops
• It’s not possible to update the stock only at night and rely on that figure during the day
• Frequent updates
11. Warehouse update (2/2)
• Every 15 minutes, an update from the management software loads the DIFF
• Stock update only
• Via CSV file, written directly to the database (Magmi)
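A stock-only file of this kind could be produced along the following lines. This is a sketch under assumptions: the column names `sku`, `qty` and `is_in_stock` and the sample quantities are illustrative, and the real file layout depends on the importer configuration.

```python
import csv
import io


def stock_csv(stock_changes):
    """Build a stock-only CSV from a {sku: quantity} mapping of changed items."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sku", "qty", "is_in_stock"])  # assumed column names
    for sku, qty in sorted(stock_changes.items()):
        writer.writerow([sku, qty, 1 if qty > 0 else 0])
    return buffer.getvalue()


# Hypothetical DIFF: only the two SKUs whose stock changed in the last cycle.
payload = stock_csv({"B2": 0, "A1": 12})
print(payload)
```

Leaving every other attribute out of the file keeps each 15-minute cycle limited to stock data.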
12. Reindex (1/2)
• The bigger the catalog, the slower the reindex
• Initially, the reindex was launched after each update (every 15 minutes)
• After a while, the reindex became too time-consuming: a new update cycle was starting while the reindex from the previous update was still running.
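Overlapping cycles like this are commonly guarded against with a lock file. The sketch below is a generic illustration, not the fix the team adopted; the lock path and the job function are hypothetical.

```python
import os
import tempfile


def run_exclusive(lock_path, job):
    """Run job() only if no other run holds the lock; return True if it ran."""
    try:
        # O_EXCL makes the creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # the previous cycle is still running: skip this one
    try:
        os.write(fd, str(os.getpid()).encode())
        job()
    finally:
        os.close(fd)
        os.remove(lock_path)
    return True


lock_path = os.path.join(tempfile.mkdtemp(), "reindex.lock")
results = []


def reindex():
    # Simulate the next 15-minute cycle firing while this one is still running.
    results.append(run_exclusive(lock_path, lambda: None))


results.append(run_exclusive(lock_path, reindex))
print(results)
```

A skipped cycle costs nothing, because the next one picks up the accumulated changes.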
13. Reindex (2/2)
• Solution:
• All reindexes except the stock reindex have been disabled during the day
• All other reindexes are now performed after the nighttime import
• Today a full reindex takes around 75 minutes and generates a heavy
load on the database
14. Catalog_url_rewrite (1/2)
• Magento 1 has a critical point in its URL rewrite process:
• All product URLs are rewritten, including simple products that are «Not visible individually» and exist only to be associated with a configurable.
• With a 700k-product catalog, this meant:
• Creating millions of rows in the catalog_url_rewrite table
• A URL rewrite process that takes hours to complete
15. Catalog_url_rewrite (2/2)
• A patch has been installed to skip URL generation for simple products that are not visible individually
• Module Dnd_Patchindexurl: https://www.magentocommerce.com/magento-connect/dn-d-patch-index-url-1.html
• Now the reindex process takes around 20 minutes
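The idea behind the patch can be sketched as a simple filter. This only illustrates the principle and does not reproduce the module's code; the sample products are hypothetical, while the visibility values follow Magento 1 conventions (1 = Not visible individually, 4 = Catalog, Search).

```python
NOT_VISIBLE_INDIVIDUALLY = 1  # Magento 1 visibility value


def needs_url_rewrite(product):
    """A product needs its own URL only if customers can reach it directly."""
    return product["visibility"] != NOT_VISIBLE_INDIVIDUALLY


# Hypothetical catalog slice: one configurable with its associated simples.
catalog = [
    {"sku": "SHOE", "type": "configurable", "visibility": 4},
    {"sku": "SHOE-40", "type": "simple", "visibility": 1},
    {"sku": "SHOE-41", "type": "simple", "visibility": 1},
    {"sku": "LACES", "type": "simple", "visibility": 4},
]

to_rewrite = [p["sku"] for p in catalog if needs_url_rewrite(p)]
print(to_rewrite)
```

With roughly 10 simple products per configurable, skipping the hidden simples removes most of the rows the rewrite process would otherwise generate.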
16. Images generation (1/2)
• One of the main problems we had to face was product thumbnail generation, done by ImageMagick
• Every day hundreds of products are published
• We verified that the frontend CPUs were often under stress because of the ImageMagick processes and the write operations on the database
17. Images generation (2/2)
• Our solution was to generate the thumbnails during the massive import, so ImageMagick runs as part of the import procedure
• The images are generated at night and saved on a dedicated server, without interfering with user navigation
• Today we have around 881K images saved
18. Server response time
• With such a huge catalog, some categories hold hundreds of products
• The first load of a page that is not yet cached is slow
• We activated caching with Redis and Varnish
• That was not enough: the first load was still too slow
19. Solutions 1/2
• Moved the cache clearing process to the night
• At 8 in the morning, website navigation started to suffer
• We scheduled a job to pre-cache all the critical pages
• Minimized cache invalidation
• Clear the cache only for products whose stock quantity was updated via web service
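A pre-caching job of this kind boils down to requesting each critical page once, so the first real visitor gets a cached response. The sketch below assumes a hypothetical list of URLs; the fetch function is injectable, so the example runs here without touching the network.

```python
import urllib.request


def http_get(url, timeout=10):
    """Request a page and return its HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def warm_cache(urls, fetch=http_get):
    """Request each URL once; record the status or the error per URL."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as error:  # urllib.error.URLError is an OSError
            results[url] = error
    return results


# Hypothetical critical pages; a stub fetch stands in for real HTTP requests.
critical = ["https://shop.example/shoes", "https://shop.example/bags"]
report = warm_cache(critical, fetch=lambda url: 200)
print(report)
```

In a real job the default `http_get` would be used and the run scheduled right after the nightly cache clearing.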
20. Solutions 2/2
• Client training to handle cache clearing better
• Minimized the number of filters in layered navigation
• Each filter increases the reindex time and the number of uncached page combinations
21. Backoffice operations
• Initially all catalog update activities were performed from the Magento backoffice
• Problems:
• Frequent reindexes
• Frequent cache updates
• Server load (the backoffice product list filters are CPU-intensive and put load on MySQL)
• Common operations were slowed down
• Several backoffice (BE) users ended up working concurrently
22. Solutions
• Initially a new backoffice server was introduced
• The MySQL load problem was not solved, and neither were reindexing and re-caching.
• We introduced a new process to manage the catalog, using an Excel file
• This improved the efficiency of the people managing the master data
• Massive Excel file import performed every 3 days via FTP
• Categories are still handled from the backoffice
23. Third party modules integration
• Critical point
• Not all the modules found in the Marketplace are developed in an optimal way
• They «simply» load the product collection without pagination
• They execute nested queries
• They loop over collections, initializing all products unnecessarily
• …
• Extensive profiling and optimization work was needed
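The pagination point can be sketched independently of Magento: instead of loading the whole collection, a module asks for one page at a time and processes it before requesting the next. The `fetch_page` function and the SKU list below are hypothetical stand-ins for a paged database query.

```python
def iter_pages(fetch_page, page_size=500):
    """Yield items page by page, so only one page is held in memory at a time."""
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        if not batch:
            return
        yield from batch
        if len(batch) < page_size:
            return  # a short page means there is nothing left to fetch
        page += 1


# Hypothetical data source standing in for a paged query.
all_skus = [f"SKU-{n:04d}" for n in range(1200)]
requested_pages = []


def fetch_page(page, page_size):
    requested_pages.append(page)
    start = (page - 1) * page_size
    return all_skus[start:start + page_size]


exported = sum(1 for _ in iter_pages(fetch_page, page_size=500))
print(exported, requested_pages)
```

Memory use stays bounded by the page size rather than by the catalog size, which is what matters at 700K products.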
24. Feed export (Google Shopping & Co.) 1/2
• As the catalog grew, the feed export time increased as well
• In the very beginning, the exports were handled by a Magento module
25. Feed export (Google Shopping & Co.) 2/2
• Solution steps:
• The module has been replaced with highly optimized ad-hoc procedures
• The export jobs run on the backoffice server during the night, so as not to load the frontend
• A MySQL slave has been introduced as the data source, so as not to load the master and, as a consequence, the website
26. Marketplace synch
• We are using M2E Pro
• Client side: EAN code full check
• Tech side: handling the automatic synchronization process
• An automatic full synchronization is too heavy. When to synchronize?
• What to synchronize?
• Magmi
27. Disk space (1/2)
• Well, here we are: even if disk space is quite cheap, using too much of it is not convenient.
• Very heavy data exchange logs
• Frequent data exchange and huge amount of data
• Log files were growing fast
• Hourly log rotation was activated
• Logs are archived after a few days
28. Disk space (2/2)
• High image quantity, continuously growing
• Huge feed export
• Huge CSV import files
• …
• Solutions applied:
• Constant monitoring activated
• Activated automatic procedures to clean up logs, old images, expired feeds, etc.
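An automatic cleanup procedure can be as simple as deleting files older than a retention period. The sketch below is a minimal example; the 7-day retention and the file names are assumptions chosen for illustration.

```python
import os
import tempfile
import time


def remove_older_than(directory, max_age_days, now=None):
    """Delete regular files last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for entry in list(os.scandir(directory)):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)


# Hypothetical log directory with one stale and one recent file.
workdir = tempfile.mkdtemp()
for name in ("old.log", "fresh.log"):
    with open(os.path.join(workdir, name), "w") as handle:
        handle.write("data exchange log\n")
ten_days_ago = time.time() - 10 * 86400
os.utime(os.path.join(workdir, "old.log"), (ten_days_ago, ten_days_ago))

removed = remove_older_than(workdir, max_age_days=7)
print(removed)
```

The same function can be pointed at feed exports or import files, each with its own retention period.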
29. Challenges to be faced
Elasticsearch integration
Growing catalog, up to 1M products
More sales, more page views
Magento 2 migration
30. Elasticsearch
• For two reasons:
• Improve the search functionality offered to the client
• Minimize the load produced by the Magento internal search engine
• Critical issues to be faced:
• Catalog index time
• Only configurable products?
• What about the sizes?
31. 1M products
• Expected growth: in 1 year we’ll have 1M products
• At the moment we are performing tests with fake products
• We didn’t detect any other critical aspects
• So far, we have had to further optimize some data exchange and feed generation procedures
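Fake products for this kind of test can be generated with a few lines of code. The sketch below assumes the catalog shape described earlier (about 10 simple products per configurable); the SKU scheme and field names are hypothetical.

```python
def fake_catalog(configurables, simples_per_configurable=10):
    """Yield fake configurable products, each followed by its simple products."""
    for c in range(configurables):
        parent = f"CFG-{c:06d}"
        yield {"sku": parent, "type": "configurable", "visibility": 4}
        for s in range(simples_per_configurable):
            # Simples exist only as variants, so they are not visible individually.
            yield {"sku": f"{parent}-{s:02d}", "type": "simple", "visibility": 1}


rows = list(fake_catalog(3))
print(len(rows), rows[0]["sku"], rows[1]["sku"])
```

Because the function is a generator, the rows can be streamed straight into a CSV file without holding a million products in memory.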
32. More sales, more page views
• Sessions are increasing, so the number of uncached page views is increasing
• Pre-caching extension
• Increasing Varnish cache TTL
• Minimize products in categories and filters used
• Sales are increasing, and so is the frequency of out-of-stock products
• To be evaluated: the impact of new reindex and re-caching policies on the client
33. What if..?
• We’re planning a Magento 2 migration with the client
• We started our tests by migrating the current Magento 1 environment (700K products) to a Magento 2 installation
• We collected the results and are still performing further tests
34. HW specs
All tests were run on a VirtualBox VM with Ubuntu Linux 16.04.1 LTS, 8 GB RAM, 1 x 2.60 GHz CPU
The LAMP configuration featured PHP 5.6, Apache 2.4.18, MySQL 14.14
The migration was performed from Magento version 1.9.2.2 to 2.1.3
35. Magento 2 migration (1/4)
• DB migration time: 1h 20'
• BE performance:
BE operation            Magento 1 with cache    Magento 2 with cache
Access to catalog       almost 5'               7''
Access to product       3''                     10''
Access to categories    7''                     6''
Product searching       1'5''                   3''
36. Magento 2 migration (2/4)
• FE performance for catalog browsing:
FE operation                     Magento 1 with cache    Magento 2 with cache
Catalog browsing / categories    30''                    7''
38. Magento 2 migration (4/4)
• We had some issues with the Catalog Fullsearch reindex (Magento 2)
• We had to apply a patch: https://github.com/magento/magento2/issues/5146
• The Catalog Fullsearch reindex takes around 2 hours without the patch and took around 1 hour with the patch applied, so the times are quite comparable
02:12:37
39. Catalog URL rewrite
• M1 with Dnd_Patchindexurl module: 00:14:34
• M1 without Dnd_Patchindexurl module: 01:03:50
• M2: no catalog URL rewrite reindex. URL rewrites are handled when the product is saved
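The two Magento 1 timings above can be compared with a short calculation. The sketch converts the hh:mm:ss values reported on this slide to seconds and derives the speedup, which works out to roughly 4.4 times.

```python
def to_seconds(hms):
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


without_patch = to_seconds("01:03:50")  # M1 without Dnd_Patchindexurl
with_patch = to_seconds("00:14:34")     # M1 with Dnd_Patchindexurl
speedup = without_patch / with_patch
print(without_patch, with_patch, round(speedup, 1))
```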
42. Yes, we can!
• It’s possible, but not without effort
• Large initial analysis
• Special attention to optimization processes
• What about Magento 2?