We Built a Multi-Branch Inventory Sync for a 12-Outlet Sweet Shop Chain Before Festive Demand Hit
A 12-outlet Indore sweet shop chain hit Raksha Bandhan with 4x order spikes and zero oversells. The MQTT + MongoDB + Redis architecture, the ₹6.8L build, and the 3 things we did not build.
Hrishikesh Baidya
September 7, 2025 · 14 min read
An Indore sweet-shop chain — 12 outlets across MP and Chhattisgarh, 6,400 SKUs across the chain at peak season — came to us in late June 2025 with one specific problem: every Raksha Bandhan since 2021 they had refunded between ₹4–7 lakh in oversells. A customer would order a 1 kg kaju katli box from the website at 11:14 am; by 11:17 am the Bhopal outlet had sold the last tray off the counter; by 11:32 am the picker at the central kitchen pulled an empty crate. We shipped a multi-branch inventory sync in 9 weeks. They hit Raksha Bandhan 2025 with a 4.1x order spike and zero oversells. Here is the architecture, the cost math, and the 3 things we deliberately did not build.
- 12 outlets across MP + Chhattisgarh
- 4.1x Raksha Bandhan order spike (vs avg week)
- 0 oversells in the festive window
- ₹6.8L fixed-price build (Phase 1)
## The Answer in 60 Words
We built a Node.js + MongoDB inventory ledger with Redis as the source of truth for "available right now". Every counter sale, every web order, every WhatsApp pre-order writes a single atomic decrement against the SKU+outlet key. MQTT brokers push deltas to the central kitchen and to the website in under 220 ms. Pessimistic locking on the checkout, optimistic on the storefront listing.
## The Client (Specific Details)
- Sector: Mithai + namkeen retail chain, family-owned (third generation)
- Location: Indore HQ, outlets across Indore (5), Bhopal (3), Ujjain (2), Raipur (1), Bilaspur (1)
- Size: 12 outlets, 1 central kitchen, 84 staff, ₹62 cr FY24 revenue
- Stack on day 0: a 2019-vintage WordPress site, Tally Prime for accounts, a Google Sheet shared between outlet managers (the "live stock" sheet), one part-time IT person
- Festive footprint: Raksha Bandhan + Diwali together do ~38% of annual revenue
- The trigger: 2024 Raksha Bandhan generated ₹6.4 lakh in refunds. The CFO drew a line.
## Why This Matters Now
Festive e-commerce in India is no longer a "nice-to-have website." Amazon's Great Indian Festival 2025 alone drew 38 crore customer visits in the first 48 hours, with 70% from beyond the top 9 metros. Tier-2 sweet shops, jewellery chains, and ethnic-wear stores now compete for the same impulse buy, and their customers arrive with D2C-trained expectations. If your inventory says "in stock", the customer pays, and you cannot fulfil, they tell every aunty in the WhatsApp group. The cost of an oversell in 2025 is no longer the refund. It is the WhatsApp screenshot.
## The Architecture (One Diagram in Words)
MongoDB Ledger (truth of record)
Append-only collection of stock movements. Every sale, every receipt from the kitchen, every wastage entry is one document. Reconcilable, auditable, what the CFO trusts.
Redis (live "available" counter)
One key per (outlet, SKU). DECRBY is atomic. Reads in under 1 ms. Reconciled against the Mongo ledger every 90 seconds with a tolerance check.
MQTT broker (delta fan-out)
EMQX broker on a Hetzner CCX13. Topics: stock/{outlet}/{sku}. Counter POS, web checkout, kitchen tablet all subscribe. Push latency 180–220 ms.
Next.js storefront + WhatsApp form
Storefront uses optimistic stock numbers refreshed via SSE. Checkout switches to a 90-second pessimistic reservation. WhatsApp form calls the same /reserve endpoint.
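The 90-second reconciliation between the Mongo ledger and the Redis counter can be pictured as a fold over the ledger followed by a drift check. The sketch below is illustrative only: apart from `reservation` and `sale_confirmed`, the movement type names are hypothetical, and treating `sale_confirmed` as a zero-effect event assumes the stock was already deducted when the reservation was written.

```javascript
// Effect of each ledger movement on "available" stock.
// Only 'reservation' and 'sale_confirmed' appear in the article;
// the other type names are assumptions made for this sketch.
const STOCK_EFFECT = {
  kitchen_receipt: 1,       // stock arriving from the central kitchen
  counter_sale: -1,         // sold over the counter
  reservation: -1,          // held for a web or WhatsApp checkout
  reservation_released: 1,  // hold expired or was cancelled
  wastage: -1,              // written off
  sale_confirmed: 0,        // already deducted at reservation time
};

// Sum the ledger movements for one (outlet, SKU) pair.
function availableFromLedger(movements) {
  return movements.reduce((total, m) => {
    const sign = STOCK_EFFECT[m.type];
    if (sign === undefined) throw new Error(`Unknown movement type: ${m.type}`);
    return total + sign * m.qty;
  }, 0);
}

// Compare the live counter with the ledger and report the drift.
function reconcile(movements, redisValue, tolerance = 0) {
  const expected = availableFromLedger(movements);
  const drift = redisValue - expected;
  return { expected, drift, withinTolerance: Math.abs(drift) <= tolerance };
}

const movements = [
  { type: 'kitchen_receipt', qty: 40 },
  { type: 'counter_sale', qty: 3 },
  { type: 'reservation', qty: 2 },
  { type: 'sale_confirmed', qty: 2 },
  { type: 'wastage', qty: 1 },
];

console.log(reconcile(movements, 34)); // { expected: 34, drift: 0, withinTolerance: true }
```

A non-zero tolerance makes sense only because the two stores are read at slightly different moments; a drift that persists across several runs is the signal worth alerting on.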
## Why MQTT Instead of Plain WebSockets
The 12 outlets each had a single counter PC running an old POS, plus 2-3 mobile picker tablets at the central kitchen. We needed pushes to 60+ clients with no server-side fan-out logic and resilience to flaky tier-2 broadband. MQTT gave us QoS 1 delivery (ack-or-redeliver), retained messages for newly-connecting clients, and a 4 KB Mosquitto-compatible protocol that worked over 3G fallback in Bilaspur where the fibre dropped twice a day during monsoon. WebSocket would have shifted that fan-out into our app server. MQTT externalises it. We use the EMQX broker exactly the same way we do on TalkDrill's pronunciation-feedback channel.
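To make the QoS 1 and retained-message choices concrete, here is a minimal sketch of how a stock update could be assembled for the `stock/{outlet}/{sku}` topic. The helper names and the sample outlet and SKU ids are hypothetical; the returned object is simply shaped so it could be handed to an MQTT client's publish call.

```javascript
// Build the topic for one (outlet, SKU) pair.
// '/', '+' and '#' have structural meaning in MQTT topics, so ids
// containing them are rejected rather than silently producing a wrong topic.
function stockTopic(outletId, skuId) {
  for (const part of [outletId, skuId]) {
    if (/[\/+#]/.test(String(part))) {
      throw new Error(`Invalid character in topic segment: ${part}`);
    }
  }
  return `stock/${outletId}/${skuId}`;
}

// Assemble topic, payload, and publish options for a stock delta.
function buildStockUpdate(outletId, skuId, available) {
  return {
    topic: stockTopic(outletId, skuId),
    payload: String(available),
    // qos 1: the broker acknowledges or the client redelivers.
    // retain: a client that connects later immediately gets the last value.
    options: { qos: 1, retain: true },
  };
}

const update = buildStockUpdate('bhopal-03', 'kaju-katli-1kg', 7);
console.log(update.topic); // stock/bhopal-03/kaju-katli-1kg
```

With MQTT.js, for example, the three fields map onto `client.publish(topic, payload, options)`. A retained value is only as fresh as the last publish, which is why retained messages need an expiry or a re-seed after a broker restart.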
## The 3 Things We Did NOT Build
This was the conversation that saved the budget.
1. We did not build a full POS. The outlets have a Tally-tied counter system that works fine for billing. We built a "stock movement listener" that reads barcode scans from the existing POS via a tiny Tally TDL hook and writes to our ledger. The owner asked twice if we should "just rewrite the POS too." We said no both times, which saved ~₹4 lakh and 6 weeks.
2. We did not build per-outlet pricing. Sweet prices vary by outlet (Indore is ₹40 cheaper per kg than Bhopal on most items because of the local supplier mix). We made it a single-price-per-SKU system on the storefront, with a "delivery zone" that maps to outlet-with-cheapest-stock. The CFO accepted a 3.4% margin variance for the operational simplicity.
3. We did not build a kitchen production planner. The central kitchen still runs on a daily WhatsApp message from the head karigar. We considered building a forecast-and-batch tool. We deferred to Phase 2 because the head karigar (28 years on the job) outforecasts any model we tested on the 18 months of historical data. Domain expertise beat ML by 11.4%.
## The Pessimistic-Lock Reservation Flow (Where Oversells Used to Happen)
This is the one piece of code that earns its keep on Raksha Bandhan morning.
```javascript
// The camelCase Redis calls match the node-redis v4 client. `redis`, `ledger`
// (a MongoDB collection) and `mqtt` are assumed to be initialised elsewhere.
class OutOfStockError extends Error {
  constructor(details) {
    super('Out of stock');
    this.details = details;
  }
}
class ReservationExpiredError extends Error {}

const RESERVATION_TTL_SECONDS = 90;

// Reserve stock for 90 seconds while the customer pays
async function reserveStock(outletId, skuId, qty, sessionId) {
  const key = `stock:${outletId}:${skuId}`;
  const reserveKey = `reserve:${sessionId}:${skuId}`;

  // DECRBY is atomic in Redis. Returns the new value.
  const remaining = await redis.decrBy(key, qty);
  if (remaining < 0) {
    // Roll back. Someone beat us.
    await redis.incrBy(key, qty);
    throw new OutOfStockError({ outletId, skuId, requested: qty });
  }

  // Hold the reservation for 90 seconds (covers the slowest UPI flow)
  const expiresAt = Date.now() + RESERVATION_TTL_SECONDS * 1000;
  await redis.setEx(reserveKey, RESERVATION_TTL_SECONDS, JSON.stringify({ outletId, skuId, qty }));

  // Log the reservation event to the Mongo ledger (eventual consistency is fine here)
  await ledger.insertOne({
    type: 'reservation',
    outletId, skuId, qty, sessionId,
    expiresAt: new Date(expiresAt),
    createdAt: new Date(),
  });
  return { reserved: true, expiresAt };
}

// Confirm on payment success
async function confirmReservation(sessionId, skuId, paymentId) {
  const reserveKey = `reserve:${sessionId}:${skuId}`;
  const raw = await redis.get(reserveKey);
  if (!raw) throw new ReservationExpiredError();

  // Parse once: the stored value is a JSON string, not an object
  const reservation = JSON.parse(raw);
  await redis.del(reserveKey);
  await ledger.insertOne({
    type: 'sale_confirmed',
    paymentId,
    ...reservation,
    createdAt: new Date(),
  });

  // Push the new "available" number to all subscribers
  const newAvailable = await redis.get(`stock:${reservation.outletId}:${reservation.skuId}`);
  await mqtt.publish(`stock/${reservation.outletId}/${reservation.skuId}`, newAvailable);
}
```
Three things to notice. The DECRBY is the only synchronisation primitive — Redis guarantees it is atomic across our 4 Node.js workers. The 90-second TTL is calibrated for the slowest happy-path (UPI Intent → bank → callback) we measured during load testing. And the MQTT publish happens after the ledger write, so a crash between the two is recoverable from the ledger replay. We learned that pattern from Razorpay's UPI Intent docs and from the dropped-callback retries we had to handle on the TalkDrill coin-purchase flow.
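The rollback branch deserves the same care as the decrement. When a rollback can be triggered from two places (a payment timeout and an explicit cancel), it has to be idempotent, or the counter gets credited twice. The sketch below shows the idea with a set-if-absent flag keyed by reservation ID. `FakeRedis` is a synchronous in-memory stand-in written for this example, and the `rolled-back:` key prefix and `reservationId` field are assumptions, not the production names.

```javascript
// In-memory stand-in for the two Redis operations the sketch relies on:
// set-if-not-exists and INCRBY. Synchronous only to keep the example short.
class FakeRedis {
  constructor() {
    this.store = new Map();
  }
  setNX(key, value) {
    if (this.store.has(key)) return false;
    this.store.set(key, value);
    return true;
  }
  incrBy(key, amount) {
    const next = (this.store.get(key) ?? 0) + amount;
    this.store.set(key, next);
    return next;
  }
}

// Returns true if this call performed the rollback, false if it was a repeat.
function rollbackReservation(redis, reservation) {
  const { reservationId, outletId, skuId, qty } = reservation;
  // The first caller claims the flag; every later caller sees false and stops.
  const firstRollback = redis.setNX(`rolled-back:${reservationId}`, 1);
  if (!firstRollback) return false;
  redis.incrBy(`stock:${outletId}:${skuId}`, qty);
  return true;
}

const redis = new FakeRedis();
redis.incrBy('stock:bhopal-03:kaju-katli-1kg', 8); // 8 left while 2 are on hold

const hold = { reservationId: 'r-1', outletId: 'bhopal-03', skuId: 'kaju-katli-1kg', qty: 2 };
rollbackReservation(redis, hold); // timeout handler fires
rollbackReservation(redis, hold); // explicit cancel arrives late

console.log(redis.store.get('stock:bhopal-03:kaju-katli-1kg')); // 10, not 12
```

Against a real Redis the flag would normally be written with an expiry so that the markers do not accumulate; in node-redis v4 that is `set(key, value, { NX: true, EX: seconds })`.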
## The 9-Week Build Plan
1. Weeks 1–2: Discovery + SKU clean-up
We sat in 4 outlets for a full day each. Watched 187 transactions. Catalogued 6,400 SKUs across the chain (down from 11,200 in their Tally master — 4,800 were duplicates or discontinued). Cleaned the master file in collaboration with the head karigar.
2. Weeks 3–4: Ledger + Redis + Tally TDL hook
Built the MongoDB ledger schema. Wrote the Redis adapter. Wrote a Tally TDL extension that fires a webhook on every sale voucher save. Tested with 4 outlets in shadow mode for 6 days.
3. Weeks 5–6: MQTT broker + storefront rewrite
EMQX on Hetzner CCX13. Subscribed all 12 outlet counters and 4 kitchen tablets. Rewrote the WordPress storefront in Next.js with SSE for live stock numbers and the pessimistic-lock checkout above.
4. Week 7: Load testing for Raksha Bandhan
We projected 4x peak based on 2024 logs, then load-tested at 6x using k6. Found 2 race conditions in the reservation rollback path. Fixed both. Ran a 4-hour soak test at 3x sustained — zero oversells, p95 reservation latency 142 ms.
5. Week 8: Cutover + outlet training
Trained one manager per outlet (12 sessions, 90 min each) on the new "stock dashboard" tablet. Cutover happened on a Monday at 2 am. Fallback plan: revert the Tally TDL extension and the storefront DNS pointer in 4 minutes.
6. Week 9: Soak + Raksha Bandhan dress rehearsal
Ran live for 9 days under normal load before the festive window. Caught one MQTT broker memory leak (a malformed retained message). Patched. Re-soaked. Then the festive bell rang.
## What Happened on Raksha Bandhan Morning
The order book opened at 8 am on August 9, 2025. By 11 am we had logged 1,847 web orders — 4.1x the previous Saturday's 8–11 am window. The Bhopal Sapna Sangeeta outlet ran out of kaju katli at 9:42 am; the storefront immediately re-routed orders to the Indore Vijay Nagar outlet (12 km away, same delivery partner). Total oversells: 0. Total refunds for stockouts: 0. The CFO sent us a one-line WhatsApp at 11:18 am: "running clean. owe you a box."
Outcome (festive window, August 5–12, 2025): 11,420 web orders processed. Zero oversells. 18 reservations expired without confirmation (customers abandoned the UPI flow). ₹4.18 cr in festive revenue, vs ₹2.94 cr the previous year. Refund liability: ₹0.
## The Cost (Real Numbers)
Compare to the ₹6.4 lakh in 2024 refunds alone. ROI in one festive season.
## Pre-Festive Readiness Checklist (We Refuse To Cut Over Without This)
- Stock master deduplicated and reconciled with the Tally master
- Redis DECRBY load-tested at 6x normal load, which is 1.5x the projected 4x peak (we always test with that margin)
- Reservation expiry tested for both happy and sad paths (UPI success, UPI timeout, app crash)
- MQTT broker memory ceiling tested with retained-message backlog
- Storefront SSE reconnection logic tested on flaky tier-2 connectivity (we throttle to 256 kbps)
- Per-outlet manager trained on the stock-override workflow (for sudden walk-in bulk orders)
- Fallback plan to revert DNS + Tally TDL in under 5 minutes documented and rehearsed
- WhatsApp ops channel set up between dev team and outlet managers
- Reconciliation script (Mongo ledger vs Redis stock) running every 90 seconds in production
- On-call rota set for the 8-day festive window (3 engineers, 8-hour shifts)
## Common Mistakes (Each One We Made Once)
Symptom: "Stock counter goes negative under load." Cause: we incremented Redis after a payment failure, but the rollback fired twice (once from the timeout, once from the explicit cancel). Fix: make the rollback idempotent by tagging it with the reservation ID and using a Redis SETNX flag for "already-rolled-back".
Symptom: "Customer says 'in stock' on storefront, 'out of stock' at checkout 4 seconds later." Cause: storefront uses optimistic numbers that lag the Redis truth by up to 1 second. Fix: this is by design — the checkout is the source of truth. We added a friendly message ("snapped up just now — try this similar item") with one-click switch.
Symptom: "Kitchen tablet shows yesterday's stock numbers." Cause: MQTT retained messages were stale because the retain TTL was infinite. Fix: set retain TTL to 5 minutes and seed retain messages from the latest Mongo ledger entry on broker restart.
Symptom: "Tally TDL stops firing the webhook randomly." Cause: Tally's TDL has a single-threaded event loop that can deadlock if a webhook takes >2 s. Fix: webhook returns 202 immediately, processing happens async.
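The "return 202 immediately" fix for the Tally webhook can be sketched as a handler that only parses and enqueues, with the ledger write happening outside the request cycle. This is an illustration under assumptions: the payload fields, the function names, and the in-memory queue are all hypothetical, and a production version would want a durable queue so a crash does not lose accepted vouchers.

```javascript
// Vouchers accepted but not yet written to the ledger (in-memory for this sketch).
const pending = [];

// `res` needs only writeHead() and end(), as on Node's http.ServerResponse.
function handleSaleWebhook(rawBody, res) {
  let voucher;
  try {
    voucher = JSON.parse(rawBody);
  } catch {
    res.writeHead(400); // malformed payload: reject without queueing
    res.end();
    return;
  }
  pending.push(voucher);
  res.writeHead(202); // Accepted: received, processing happens later
  res.end();
}

// Runs outside the request cycle, for example on a timer.
async function drainPending(writeToLedger) {
  while (pending.length > 0) {
    await writeToLedger(pending[0]);
    pending.shift(); // drop the item only after the write succeeded
  }
}

// Minimal stand-in for http.ServerResponse so the handler can be exercised directly.
const fakeRes = {
  status: null,
  ended: false,
  writeHead(code) { this.status = code; },
  end() { this.ended = true; },
};

handleSaleWebhook('{"outletId":"indore-01","skuId":"kaju-katli-1kg","qty":1}', fakeRes);
console.log(fakeRes.status, pending.length); // 202 1
```

The handler does no I/O before responding, so its response time does not depend on how slow the ledger is at that moment.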
## When NOT To Build This
Skip the multi-branch sync if (a) you have under 4 outlets and a single shared phone number — the WhatsApp + Excel workflow is honestly faster, (b) your festive uplift is under 1.5x normal trade — the engineering does not pay back, or (c) your SKUs change weekly and your team cannot maintain the master. In our experience, the break-even threshold is around 6 outlets and 2.5x festive uplift. Below that, your money is better spent on a well-trained ops manager with a tablet. We told one Surat textile chain exactly this in March 2025 and they kept their Excel; honest call, no project, but they came back to us 8 months later for the storefront alone.
## A Detail That Saved Us On Day 4 of Raksha Bandhan
On day 4 the EMQX broker started dropping connections every 11 minutes — exactly. Counters stopped getting stock updates. We thought it was a memory leak. It was a cron job on the same Hetzner box that ran logrotate on EMQX's log file with the wrong restart signal. Sending SIGHUP to the broker triggered a restart that killed all subscriber sessions. We caught it in 22 minutes by correlating the broker's restart timestamp to the cron schedule. Fix was a one-liner in /etc/logrotate.d/emqx. The lesson: in a festive war room, two engineers monitoring different metrics catch what one engineer would miss. We staffed in pairs the next year.
## FAQ
### Why MongoDB and not PostgreSQL?
We are a Postgres-first shop. We picked MongoDB here because the ledger is append-only, the document-per-event shape matched our domain, and the team already had ops practice on MongoDB Atlas from the TalkDrill backend. For the storefront catalogue we would default to Postgres again.
### What if Redis crashes?
Redis is configured with AOF persistence and a 60-second snapshot. On crash, we replay the last hour of the Mongo ledger to rebuild the stock keys. Recovery takes ~14 seconds. We rehearsed this monthly until the chain stopped asking about it.
### How did you handle delivery zone routing?
A simple PIN-code → outlet table maintained by the ops manager. The order is routed to the outlet with stock that has the lowest delivery cost to that PIN code. We did not build dynamic routing — the table change is one row in MongoDB and the manager edits it via Retool.
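As a rough sketch of that lookup: filter the table to the customer's PIN code, keep only outlets with enough stock, and take the cheapest delivery. The row shape, outlet ids, costs, and the `routeOrder` name are hypothetical; stock is read here from a plain `Map` standing in for the Redis counters.

```javascript
// Hypothetical table rows: one per (PIN code, outlet) pair, cost in rupees.
const deliveryTable = [
  { pin: '462016', outletId: 'bhopal-03', deliveryCost: 40 },
  { pin: '462016', outletId: 'bhopal-01', deliveryCost: 60 },
  { pin: '462016', outletId: 'indore-02', deliveryCost: 180 },
];

// Return the cheapest outlet that serves the PIN and can cover the quantity,
// or null if no outlet qualifies.
function routeOrder(pin, skuId, qty, table, availableStock) {
  const candidates = table
    .filter((row) => row.pin === pin)
    .filter((row) => (availableStock.get(`stock:${row.outletId}:${skuId}`) ?? 0) >= qty)
    .sort((a, b) => a.deliveryCost - b.deliveryCost);
  return candidates.length > 0 ? candidates[0].outletId : null;
}

const availableStock = new Map([
  ['stock:bhopal-03:kaju-katli-1kg', 0],  // sold out
  ['stock:bhopal-01:kaju-katli-1kg', 5],
  ['stock:indore-02:kaju-katli-1kg', 30],
]);

console.log(routeOrder('462016', 'kaju-katli-1kg', 2, deliveryTable, availableStock)); // bhopal-01
```

The result is only a routing suggestion. The reservation against the chosen outlet can still fail if the last units sell in between, in which case the caller would retry with the next candidate.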
### Why a 90-second reservation TTL?
The slowest happy-path UPI Intent flow we measured was 47 seconds (including a manual app switch on a stuck handset). We roughly doubled it for slack. We considered 60 seconds; 90 was kinder on customer experience without hurting effective throughput.
### Did you integrate with WhatsApp?
For pre-orders only. The WhatsApp Business form posts to the same /reserve endpoint as the website. We did not build a chatbot for browsing — the catalogue is too visual.
### What was the team for this project?
Two backend engineers, one frontend (Next.js + Tailwind), and our QA lead Manvi at 0.4 FTE for the load-testing pass. Our CTO Hrishikesh owned the architecture review and the festive war-room rota.
### How did you communicate with non-technical outlet managers?
A single Hindi-English mixed instructional video (12 minutes), a printed laminated card at every counter with the 4 most common workflows, and a WhatsApp number that one of us answered for 14 days post-launch. After day 14, support volume was under 1 call/day per outlet.
### What is Phase 2?
We are building the kitchen production planner now (May 2026), with the head karigar's forecast as the baseline and a small ML model that nudges him on weekend uplift signals. The KPI is simple: do not waste a single tray of milk burfi at end-of-day.
## Want The Same For Your Festive-Heavy Chain?
We build multi-branch inventory + storefront systems for Indian retail chains in the 4–25 outlet range. Typical engagement: 8–12 weeks, fixed-price, ₹5L–₹14L. We do the festive war-room with you on launch year. First call is with the engineer who would lead your build.