Diwali 2025 Was the ₹60,700 Cr Week: 4 Tech Bottlenecks That Hurt Indian SMBs Most
Indian e-commerce did ₹60,700 Cr in the Diwali 2025 window — up 23% YoY. The four tech failures we triaged for SMB clients during that week, and the fixes that ship before Black Friday.
Vivek Kumar
November 3, 2025 · 13 min read
Indian e-commerce closed the Diwali 2025 window at roughly ₹60,700 crore in gross merchandise value — a 23% jump on Diwali 2024, with order volumes up 24% per [Datum Intelligence's wrap-up](https://www.freepressjournal.in/business/diwali-2025-festive-season-wraps-up-with-24-growth-in-order-volumes-23-surge-in-gross-merchandise-value-for-indias-e-commerce-sector). Quick commerce alone grew 120%. We spent that week on call with eight SMB clients and saw the same four failure patterns repeat. This post is the post-mortem — and the exact fix list we are walking the next batch of clients through before Black Friday-Cyber Monday hits in 24 days.
- ₹60,700 Cr: Diwali 2025 e-commerce GMV (Datum Intelligence)
- +23%: YoY GMV growth, +24% order volumes
- 120%: Quick commerce volume growth YoY
- 55%: Tier-2/3 city share of all orders
## TL;DR — the 4 bottlenecks in 60 words
Diwali 2025 broke four things on SMB stacks at scale: (1) cart pages that timed out at >2.5 sec TTFB during 7-9 pm peak, (2) Razorpay checkout 5xx bursts traced to webhook backlog and missing idempotency keys, (3) inventory oversells on flash SKUs because reservation locks were not held during checkout, (4) COD form spam from competitor bots and bored students that ate 18-31% of fulfilment capacity.
## Why this matters now — Black Friday is November 28
Black Friday 2025 is Friday November 28, with Cyber Monday on December 1. For Indian SMBs serving NRI buyers (US-EU stack) and the Indian-domestic crowd that has now adopted BFCM as a real shopping window, the same four breakages will hit again. Diwali revealed them under controlled load. BFCM is a slightly different curve — longer tail, more international cards, more BNPL — but the failure modes overlap by ~80%. Fix them in the next three weeks or pay for them on the 28th.
[Triple Whale's BFCM 2025 checklist](https://www.triplewhale.com/blog/bfcm-checklist) and [Shopify India's 25-step prep](https://www.shopify.com/in/blog/bfcm-checklist) both recommend running a ₹1 test transaction across every payment method and every BNPL provider — twice. Most SMBs we audit have run it once on a Tuesday morning, and never under simulated peak load.
The pattern that hurt the most: Diwali 2025's biggest SMB losses came not from a single dramatic outage, but from 6-12 minute mini-degradations during the 7-9 pm peak. Conversion rate fell 28-44% during those windows. Most CTOs only spotted them in the next-morning Razorpay dashboard, when nothing could be done.
## The 4 failures, in priority order
⏱️
1. Cart page timeouts during 7-9 pm peak
TTFB ballooned from a healthy 380ms to 2,400ms+ on most stacks we audited. Root cause was almost always database lock contention on the cart_items table during inventory checks — solvable in one afternoon with row-level locks and Redis caching of stock counts.
💳
2. Razorpay 5xx and webhook backlog
Razorpay's Diwali volume saw three 5-15 minute degradations across the window. SMBs without retry logic and idempotency keys had double-charge incidents and orphaned orders. The fix is not Razorpay's — it is in your code.
📦
3. Oversold SKUs on flash drops
When 800 carts simultaneously read "stock = 4" before any of them write back, every one of them believes it can buy, and you sell the same four units many times over. The fix is reservation locks held during the checkout window, not optimistic stock checks at order placement.
🤖
4. COD form spam from bots
A Surat apparel client got 1,200 COD orders in 36 hours, most of them fake. Real fulfilment capacity was 850/day. Genuine orders were starved. The fix is a rate-limited COD route with phone-OTP verification and a deposit on orders above ₹2,500.
## Bottleneck #1: Cart timeouts during peak (the silent killer)
The pattern we saw in 6 of 8 client incidents: between 7:00 and 9:30 pm on the Diwali-eve drop days, cart-page TTFB drifted from a healthy 380ms baseline to 2,400-3,800ms. Conversions on those carts fell to roughly 28% of the off-peak baseline. The Razorpay dashboard showed nothing wrong because the requests never made it that far.
The root cause in every case was the same: pessimistic database locks held against the cart_items or inventory tables while the cart page rendered. Under low load, you do not notice. Under 800 concurrent cart-views per second, every read waits behind every write.
The fix we ship for this:
1. Move stock counts to Redis with a 60-second TTL
Cart pages read stock from Redis (sub-millisecond), not Postgres. The actual SKU table gets a write only at order placement. We use Upstash Redis for this on most SMB-scale clients at about ₹740/month; workloads under 10k ops/sec may fit in the free tier, so check the current plan limits before paying.
2. Add a CDN edge cache on the cart-page HTML shell
The cart page header, footer, and product details rarely change during a 4-hour window. Cache the shell at the edge (Cloudflare or Vercel Edge) for 60 seconds; only the per-user cart state hits your origin. We saw a 22-person Indore retailer drop p95 TTFB from 3.2s to 410ms with this single change.
3. Add a synthetic load test at 3x your peak
We use k6 with a script that simulates 600 concurrent users browsing → adding to cart → checkout for 10 minutes. Run it on staging before Diwali; run it again before BFCM. If your cart-page p95 stays under 800ms throughout, you are safe.
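Step 1 above can be sketched as a read-through cache with a fixed TTL. This is a minimal illustration, not the production setup: a plain dict stands in for Redis, and `StockCache` and `load_from_db` are hypothetical names introduced here.

```python
import time
from typing import Callable, Dict, Tuple


class StockCache:
    """Read-through cache for stock counts with a fixed TTL.

    A dict stands in for Redis; `load_from_db` is a hypothetical callable
    that returns the authoritative count for a SKU.
    """

    def __init__(self, load_from_db: Callable[[str], int],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[int, float]] = {}  # sku -> (count, expires_at)

    def get(self, sku: str) -> int:
        now = self._clock()
        entry = self._entries.get(sku)
        if entry is not None and entry[1] > now:
            return entry[0]          # cache hit: no database round trip
        count = self._load(sku)      # miss or expired: one database read
        self._entries[sku] = (count, now + self._ttl)
        return count

    def invalidate(self, sku: str) -> None:
        """Drop the cached count, for example right after an order is placed."""
        self._entries.pop(sku, None)
```

The cached number is only a display hint for the cart page and can be up to 60 seconds stale; the authoritative stock check still has to happen in the database when stock is actually reserved.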
## Bottleneck #2: Razorpay 5xx bursts and webhook backlog
Razorpay held up well through Diwali 2025 overall, but had three observable degradation windows lasting 5-15 minutes each on the peak nights. The platform itself is large enough to absorb the surge; the failures we triaged were almost always client-side, not Razorpay-side. Three patterns:
Pattern A: missing idempotency keys. Razorpay's checkout occasionally returns a 502 when their gateway is queued. A retry without an idempotency key creates a second order and, on capture, charges the customer twice. Fix: every order_create call must include a deterministic idempotency key (we use order___ truncated to 36 chars). Razorpay deduplicates on the server side.
Pattern B: webhook backlog under load. When you receive 200 payment.captured webhooks per second and your handler does a synchronous DB write + email send + Slack ping, your handler chokes. Razorpay retries with exponential backoff for up to 24 hours, but during the surge your customers see "payment received but order pending" for 18 minutes. Fix: webhook handler does one thing — write to a queue (Redis BullMQ, SQS, or even a Postgres unlogged table). A separate worker drains the queue and does the heavy lifting.
Pattern C: not handling 5xx as "unknown state". A 5xx from Razorpay does not mean payment failed. It means you do not know. Treat the order as "pending verification" and reconcile via [Razorpay's payments API](https://razorpay.com/docs/api/payments/) within 30 seconds. We have seen SMBs auto-mark these as "failed" and re-send the customer to checkout, who then double-pays.
| Razorpay error pattern | Wrong reaction | Right fix |
| --- | --- | --- |
| 502 on order_create | Retry with new request body, generating a duplicate order | Retry with same idempotency_key header; Razorpay deduplicates |
| 503 on payment_capture | Mark order as failed, resend customer to checkout | Mark order as pending, poll payments API in 30s, surface "verifying" UI to user |
| Webhook handler timing out (>30s) | Increase server CPU, add more workers | Webhook writes to queue only; downstream worker handles DB+email+notification |
| Settlement T+3 mismatch | Manually correct in Tally next morning | Daily n8n flow pulls Razorpay settlements API + matches against Tally vouchers (we cover this in our Nov 8 post) |
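Patterns A and C can be sketched in a few lines. Treat this as an illustration under assumptions: the key derivation below is a hypothetical scheme (it is not the key format mentioned in Pattern A), `fetch_payment_status` is a hypothetical wrapper around a payments-API lookup, and how the key is actually passed to the gateway should be checked against the gateway's current documentation.

```python
import hashlib
from typing import Callable, Optional


def idempotency_key(cart_id: str, amount_paise: int) -> str:
    """Derive the same key on every retry of the same logical order.

    Hypothetical scheme: hash stable order attributes, never a random value
    or a timestamp, so a retry after a 502 reuses the original key.
    """
    material = f"order_create:{cart_id}:{amount_paise}".encode("utf-8")
    return ("order_" + hashlib.sha256(material).hexdigest())[:36]


def classify_capture_response(status_code: int) -> str:
    """Map the HTTP status of a capture call to an internal order state."""
    if 200 <= status_code < 300:
        return "confirmed"
    if 500 <= status_code < 600:
        return "pending_verification"  # outcome unknown, which is not "failed"
    return "failed"                    # the request itself was rejected


def reconcile(order_id: str,
              fetch_payment_status: Callable[[str], Optional[str]]) -> str:
    """Resolve a pending order from the gateway's own record.

    `fetch_payment_status` returns "captured", "failed", or None when the
    gateway has no final answer yet.
    """
    status = fetch_payment_status(order_id)
    if status == "captured":
        return "confirmed"
    if status == "failed":
        return "failed"
    return "pending_verification"      # still unknown: poll again, never re-charge
```

The design point is that "pending_verification" is a real state with its own UI, so a 5xx never sends the customer back to checkout.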
## Bottleneck #3: oversold SKUs on flash drops
The Surat apparel client lost ₹2.4 lakh of margin on Diwali eve because a 50-unit limited drop sold 217 units in 11 minutes. The application read stock count, deducted 1, wrote back, but did not hold a lock during the 8-second checkout window between "added to cart" and "payment confirmed". Some 800 carts saw "stock = 12", 217 of them completed checkout, and 167 customers were refunded the next morning with profuse apologies.
The fix has two layers.
Layer 1: reservation locks during checkout. When a user clicks "proceed to checkout", you decrement available stock immediately and create a 10-minute reservation. If the user pays, the reservation converts to a sale. If not, a background job releases the reservation back to available stock. This is the standard pattern from booking systems (BookMyShow has used it for over a decade).
Layer 2: oversell-as-feature for some SKUs. For non-flash SKUs where you can backorder, accept the oversell and set customer expectations: "Delivery within 7-10 days due to high demand". This converted 78% of would-be-cancelled orders for a Coimbatore D2C client. But — only for SKUs you can fulfill. Apparel with limited fabric, no.
```sql
-- The reservation pattern in PostgreSQL
BEGIN;
SELECT available_stock FROM inventory WHERE sku_id = $1 FOR UPDATE;
-- application checks: if available_stock >= requested_qty
UPDATE inventory
   SET available_stock = available_stock - $2,
       reserved_stock  = reserved_stock + $2
 WHERE sku_id = $1;
INSERT INTO reservations (sku_id, qty, user_id, expires_at)
VALUES ($1, $2, $3, NOW() + INTERVAL '10 minutes');
COMMIT;
```
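The FOR UPDATE is the magic. Postgres makes any other transaction that tries to update, delete, or lock that same row (including another SELECT ... FOR UPDATE) wait until this transaction commits or rolls back. Plain SELECTs are not blocked, so ordinary page reads stay fast. Latency increases by 4-8ms per checkout, which is totally acceptable.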
## Bottleneck #4: COD bot spam
A Surat apparel client posted a Diwali 70%-off catalog. By 4 am the next morning their fulfilment ops manager had 1,200 orders in the COD queue. Real fulfilment capacity was 850/day. The team picked up the phone to confirm orders — 73% of the calls reached numbers that hung up, said "wrong number", or did not answer for three attempts.
This is COD spam at scale, mostly driven by competitors and bored students. The cost is not refunds — it is opportunity cost. Genuine paying customers who would have ordered later in the week saw "out of stock" because the fake COD orders had reserved inventory.
The 4-step COD spam fix:
1. OTP verification on the phone number for any COD order, not just for new accounts. Costs ~₹0.18 per OTP via [MSG91](https://msg91.com/) or Razorpay's WhatsApp OTP, kills 90% of spam in week one.
2. Rate-limit COD orders per phone+IP combo: max 3 active COD orders per 24h. Captures the script kiddies.
3. Deposit of ₹100 (refundable on delivery) on COD orders above ₹2,500. The economic friction kills 95% of remaining spam.
4. Sliding-window risk score: phone numbers that come from a single IP, run sequentially, and match a known disposable-prefix list (for example, Jio numbers from a single block) get auto-routed to "verify before dispatch".
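The first three rules can be sketched as one gate in front of the COD route. This is an in-memory illustration with hypothetical names (`CodGate`, `check`); it approximates "active" orders as orders placed in the last 24 hours, and a real deployment would keep the counters in a shared store rather than process memory.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple

WINDOW_SECONDS = 24 * 60 * 60
MAX_COD_ORDERS = 3
DEPOSIT_THRESHOLD_RUPEES = 2500
DEPOSIT_RUPEES = 100


class CodGate:
    """OTP check, per phone+IP rate limit, and deposit rule for COD orders."""

    def __init__(self) -> None:
        # (phone, ip) -> timestamps of accepted COD orders, oldest first
        self._recent: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def check(self, phone: str, ip: str, order_value_rupees: int,
              otp_verified: bool, now: float) -> Tuple[bool, str, int]:
        """Return (accepted, reason, deposit_rupees)."""
        if not otp_verified:
            return (False, "otp_required", 0)
        window = self._recent[(phone, ip)]
        while window and window[0] <= now - WINDOW_SECONDS:
            window.popleft()  # forget orders older than 24 hours
        if len(window) >= MAX_COD_ORDERS:
            return (False, "rate_limited", 0)
        window.append(now)
        deposit = DEPOSIT_RUPEES if order_value_rupees > DEPOSIT_THRESHOLD_RUPEES else 0
        return (True, "ok", deposit)
```

Rejections are returned with a reason code so the storefront can show the right prompt, for example asking for an OTP rather than failing silently.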
The same Surat client implemented these in 4 days post-Diwali. Pre-BFCM dry run showed 8% of incoming COD orders triggering "verify before dispatch", with 1.2% being confirmed spam. Genuine confirmation rate jumped from 71% to 94%.
## The 6-week pre-BFCM stack audit (do this before Nov 27)
1. Week 1: synthetic load test at 3x peak Diwali
Use k6 or Locust against staging. Target p95 cart-page TTFB < 800ms, p95 checkout < 1.4s, payment-create < 800ms. Document any 5xx response under load.
2. Week 2: Razorpay integration audit
Code-review every order_create and payment_capture call. Add idempotency keys where missing. Move webhook handlers to queue+worker pattern. Test the "5xx response = pending state" flow with a Razorpay sandbox simulation.
3. Week 3: inventory reservation on hot SKUs
Migrate hot SKUs to the reservation pattern shown above. Run a Friday-evening simulated flash drop on staging: 200 concurrent checkouts on a 50-unit SKU. Verify stock never goes below zero.
4. Week 4: COD risk routing + OTP verification
Wire up MSG91 or WhatsApp OTP on the COD form. Build the "verify before dispatch" route with a clear ops handoff to your fulfilment team.
5. Week 5: observability + dashboards
A simple Grafana dashboard with: cart-page p95 TTFB, payment success rate by gateway, webhook backlog depth, COD spam-flag rate. We use Better Uptime (₹2,200/month for SMB-scale) for synthetic checks every 2 minutes.
6. Week 6: ops dry run
2-hour war-room rehearsal. Founder, CTO, ops lead, customer-support lead. Walk through "what if Razorpay 503s for 12 minutes during 8 pm peak". Decide who calls who. Document.
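The Week 1 targets are easy to check mechanically once the load test has exported raw latencies. The sketch below assumes the samples are already grouped per metric in milliseconds; the metric names and the nearest-rank percentile method are choices made here for illustration, and load-testing tools may compute percentiles slightly differently.

```python
from typing import Dict, List

# Targets from the Week 1 step, in milliseconds
TARGETS_MS = {"cart_ttfb": 800, "checkout": 1400, "payment_create": 800}


def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def failing_targets(latencies_ms: Dict[str, List[float]]) -> Dict[str, float]:
    """Return {metric: observed_p95} for every metric that misses its target."""
    failures = {}
    for metric, limit in TARGETS_MS.items():
        observed = p95(latencies_ms[metric])
        if observed >= limit:  # the targets are strict "<" bounds
            failures[metric] = observed
    return failures
```

An empty result from `failing_targets` means every p95 target held for the run; anything else names the metric to investigate before the next rehearsal.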
## The cost comparison: prevention vs. remediation
## When NOT to spend money on these fixes
Three cases where the audit is overkill.
You did under ₹40 lakh in Diwali GMV. At that scale your peak load is genuinely manageable on a ₹6,000/month VPS without exotic engineering. Fix the COD spam route and the idempotency keys; skip the load testing and the edge cache.
You sell only B2B with custom-quoted invoices. None of these failure modes hit you. Your "checkout" is a CFO emailing a PO. Spend the ₹85k on a CRM upgrade instead.
Your stack is on a managed Shopify or WooCommerce host. Most of the cart-page and payment-retry logic is handled by the platform. You still need to fix COD spam (your problem) and confirm webhook handling on third-party apps (their problem, but you should verify). Skip the load test — Shopify load-tests itself.
## Real example — a 22-person Indore retailer
Client: a sarees + jewellery retailer in Indore, 22 staff, Shopify Plus + custom checkout addon for COD. FY 2024-25 Diwali GMV: ₹1.8 Cr. Diwali 2025 target: ₹2.6 Cr.
What broke on Diwali eve (October 17, 2025):
- 7:42 pm: cart page TTFB hit 3.4s. Conversion on the page dropped from 9.1% baseline to 2.6%. Lost an estimated ₹38,000 in 90 minutes before they noticed.
- 8:21 pm: Razorpay returned 502 for ~6 minutes. Their checkout addon retried without idempotency key. 22 customers got double-charged. Refunds processed Wednesday afternoon.
- 11:04 pm: a flash 60%-off bridal lehenga drop oversold by 14 units (52 sold against 38 in stock). 14 calls to apologise; 11 customers downgraded to a different SKU; 3 cancelled and posted on Instagram.
Total Diwali-eve damage: ~₹2.1 lakh in lost margin + refunds, plus the un-quantifiable Instagram-post reputation hit.
Post-Diwali audit: cart page moved to a Cloudflare-cached shell + Redis stock counts (4 hours of work), idempotency keys added across the addon (6 hours), Razorpay webhook moved to queue+worker (1 day), reservation locks on flash SKUs only (8 hours), MSG91 OTP on COD orders > ₹2,500 (4 hours). Total engineering: 22 hours, ₹52,800. Pre-BFCM dry run completed November 24 — synthetic load at 3x Diwali peak held p95 under 700ms.
We crosschecked against [r/IndianStartups discussions](https://www.reddit.com/r/IndianStartups/) on Diwali 2025 — the same four breakages dominated the post-mortem threads from D2C founders. Patterns repeat because Indian SMB stacks repeat.
## A common question we get about webhooks
"If our Razorpay webhook handler is at risk of timing out during peak, why not just increase our server timeout to 60 seconds?"
Because Razorpay's webhook system itself times out at 5 seconds. If your handler does not return 2xx within 5s, Razorpay considers the delivery failed and retries. With 200/sec incoming during peak, your handler is now processing (synchronously) the original event and 4 retry copies. You will exhaust your worker pool in under a minute.
The queue+worker pattern is the only correct shape. The webhook handler does exactly one thing: validate the signature, push the payload to Redis or SQS, return 200. Everything downstream is async. The handler responds in 12-30ms regardless of load.
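The handler shape described above can be sketched with the standard library alone. This is a sketch under assumptions: an in-process `queue.Queue` stands in for Redis or SQS, the function names are hypothetical, and the signature scheme is assumed to be an HMAC-SHA256 hex digest of the raw request body keyed with the webhook secret, which should be confirmed against Razorpay's current webhook documentation.

```python
import hashlib
import hmac
import queue
import threading
from typing import Callable, List

events: queue.Queue = queue.Queue()  # stand-in for Redis or SQS
failed: List[bytes] = []             # stand-in for a dead-letter store


def signature_is_valid(raw_body: bytes, received_signature: str, secret: str) -> bool:
    """Recompute the HMAC over the raw body and compare in constant time."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode("ascii"),
                               received_signature.encode("utf-8"))


def handle_webhook(raw_body: bytes, received_signature: str, secret: str) -> int:
    """Webhook endpoint body: validate, enqueue, acknowledge. Nothing else."""
    if not signature_is_valid(raw_body, received_signature, secret):
        return 400
    events.put(raw_body)
    return 200


def worker(process: Callable[[bytes], None], stop: threading.Event) -> None:
    """Drain the queue; the slow work (DB write, email, Slack) lives here."""
    while not (stop.is_set() and events.empty()):
        try:
            payload = events.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            process(payload)
        except Exception:
            failed.append(payload)  # a real worker would retry or dead-letter
        finally:
            events.task_done()
```

Because the gateway may deliver the same event more than once, the `process` callable should be idempotent, for example by keying on the event or payment ID before writing.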
## A pre-BFCM checklist (print this and stick it on your wall)
- k6 synthetic load test at 3x Diwali peak: p95 cart TTFB < 800ms, p95 checkout < 1.4s
- Idempotency key on every Razorpay order_create call (deterministic, not random)
- Razorpay webhook handler writes to queue only; no synchronous DB/email/Slack work
- 5xx from payment APIs treated as "pending verification", reconciled via payments API in 30s
- Inventory uses FOR UPDATE row locks during checkout; reservation table with 10-min TTL
- Hot SKU stock counts cached in Redis with 60s TTL on reads
- Cart page HTML shell cached at CDN edge for 60s; only per-user state hits origin
- OTP verification on every COD order (~₹0.18/OTP via MSG91 or WhatsApp)
- COD rate limit: max 3 active COD orders per phone+IP per 24h
- Refundable ₹100 deposit on COD orders > ₹2,500
- Grafana or Better Uptime dashboard with cart TTFB, payment success rate, webhook backlog
- 2-hour war-room rehearsal completed with founder, CTO, ops, support leads
## FAQ
### How much load should I size for above Diwali peak?
Three times your Diwali peak is our planning number for BFCM. The rationale: Diwali demand was already ~25% above last year, BFCM brings international traffic on top, and a single viral Instagram post can add a 5x burst on a 3-minute window. 3x peak gives you headroom to absorb that burst without scrambling.
### Is it worth moving from Razorpay to Cashfree for BFCM?
Not in the next 24 days. Switching gateways during peak season is the worst possible time. [Cashfree's 1.6% promotional pricing](https://www.cashfree.com/payment-gateway-charges/) is genuinely competitive, but the integration risk during a high-revenue window outweighs the saving. Plan the switch for January 2026 if at all; for now, fix your idempotency and webhook handling on the Razorpay side.
### What is the cheapest viable observability stack for an SMB?
[Better Uptime](https://betteruptime.com/) free tier for synthetic checks every 3 minutes (₹0/month), [Grafana Cloud free tier](https://grafana.com/products/cloud/) for metrics + dashboards (₹0/month for <10k series), and [Sentry](https://sentry.io/) free tier for error tracking. End-to-end visibility for ₹0/month at SMB scale. Pay only when you need to.
### My stack is WooCommerce. Do these fixes apply?
Most of them, yes. The Razorpay idempotency-key pattern is enforced via the [official WooCommerce Razorpay plugin](https://wordpress.org/plugins/woo-razorpay/) (verify it is on v3.x or higher). The cart-page caching is harder on WooCommerce — use [WP Rocket](https://wp-rocket.me/) or [LiteSpeed Cache](https://www.litespeedtech.com/products/cache-plugins/wordpress-acceleration). The COD spam fix you build yourself with a custom plugin or use [WooCommerce Anti-Fraud](https://woocommerce.com/products/woocommerce-anti-fraud/).
### How do I monitor cart-page TTFB in production without paying ₹40k/month?
Three free options. (1) [Cloudflare Web Analytics](https://www.cloudflare.com/web-analytics/): free, gives you p50/p95 TTFB by URL. (2) [Vercel Analytics](https://vercel.com/analytics): free if you host on Vercel, gives Core Web Vitals. (3) Synthetic checks every 3 minutes from the [Better Uptime](https://betteruptime.com/) free tier. Combine all three; you will see degradations within 3 minutes.
### What if my Razorpay account does not have webhooks enabled?
You are flying blind on payment state. Enable them now via Razorpay Dashboard → Settings → Webhooks. Subscribe to at least: payment.captured, payment.failed, refund.processed, settlement.processed. The webhook URL is HTTPS-only; Razorpay will refuse cleartext endpoints.
### Did Diwali 2025 break Razorpay's own infrastructure?
Razorpay handled the surge well overall. Peer agencies we compare notes with, who track the wider Indian gateway space, saw 99.4% uptime across the Diwali window. The 5-15 minute degradations we observed were narrow and recoverable. The losses our clients suffered were almost entirely about how the client's code reacted to those windows, not about Razorpay's own resilience.
## What to read next
For our follow-up post on the auto-reconciliation pattern that prevents the next-morning "Tally vs Razorpay disagreement" headache, see n8n + Tally Prime + Razorpay: Auto-Reconcile Daily Settlements. For the BFCM-specific stack-audit checklist we run for Razorpay-powered clients, see Black Friday Tomorrow: 6 Indian SMB Stack Checks.
This post was written by Vivek Kumar, who leads our Indian-SMB engagements. The on-call team for Diwali 2025 included Hrishikesh on the Razorpay+webhook side and Manvi running the QA load tests. We work directly with our web development and AI automation teams to build pre-festival audits like the one in section 5.
For a real D2C build at this scale, see our Radiant Finance case study — same payment-resilience patterns applied to a finance vertical.
Want a Pre-Festival Audit on Your Stack Before Next Year?
We run a 6-week pre-festival stack audit for Indian SMB e-commerce — covers cart-page resilience, Razorpay integration, inventory reservation, COD spam mitigation, observability, and a war-room rehearsal. Typical engagement: ₹85,000 fixed-scope. Suitable for SMBs with ₹1-25 Cr in annual GMV. We share the audit report whether or not you hire us for the implementation.