How to Audit Your 3PL's Delivery Performance
Your 3PL says they’re hitting 97% on-time. Your customers say otherwise. Three complaints this week, two WISMO (“where is my order?”) tickets, one chargeback.
Someone’s numbers are wrong. Or more likely, you’re both measuring different things. Here’s how to figure out what’s actually happening, using data you already have. No new tools. About an hour of work.
Step 1: Spot-Check 10 Orders in Shopify
Go to your Shopify admin. Orders > Fulfilled > last 30 days. Click into 10-15 orders at random.
For each one, look at the timeline on the right side. You’ll see:
- Order created timestamp
- Fulfilled timestamp (when Shopify got the fulfillment notification from your 3PL)
- Carrier tracking events (first scan, in transit, delivered)
Count the hours between “Order created” and the first carrier scan. Not the “Fulfilled” timestamp. The first actual carrier movement.
Red flags that jump out fast:
- Big gap between “Fulfilled” and first carrier scan. If your 3PL marks orders fulfilled at 2 PM but the carrier doesn’t scan until the next morning, that’s 16+ hours of hidden delay. Your 3PL’s report says “shipped same day.” The package didn’t move until tomorrow.
- Fulfilled timestamps clustering at the same time daily. That’s batch processing. If every order shows “fulfilled” at 4:00 PM regardless of when it was placed, your 3PL is processing in waves, not real-time. Not necessarily bad, but it means a 9 AM order waits 7 hours before anything happens.
- Weekend orders with no activity until Monday. If your store takes orders 7 days a week but your 3PL only ships 5, every Friday night order is already 48+ hours old before it gets touched. Depending on your SLA, that might be fine. But your customers don’t know it’s the weekend at the warehouse.
- Tracking stuck on “label created” for days. The label exists but the package hasn’t moved. Could be a carrier pickup issue, could be the 3PL creating labels during processing and staging packages for a later pickup.
This isn’t scientific. It’s a gut check. But most brands have never clicked through individual order timelines, and the patterns show up in 10 minutes.
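If you’d rather jot the timestamps down than eyeball them, the gut check can be sketched in a few lines of Python. This is a minimal sketch, not a Shopify integration: the three orders are made up, the field names (`created`, `fulfilled`, `first_scan`) are hypothetical, and it assumes you copied every timestamp in the same time zone.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # assumed format for hand-copied timestamps

def hours_between(start: str, end: str) -> float:
    """Hours elapsed between two 'YYYY-MM-DD HH:MM' timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 3600

def spot_check(orders):
    """Return created-to-first-scan hours per order, plus the most common
    hour of day in the 'fulfilled' timestamps (a hint of batch processing)."""
    hours = {o["order"]: hours_between(o["created"], o["first_scan"]) for o in orders}
    fulfilled_hours = Counter(datetime.strptime(o["fulfilled"], FMT).hour for o in orders)
    top_hour, count = fulfilled_hours.most_common(1)[0]
    return hours, top_hour, count

# Made-up sample data, for illustration only.
orders = [
    {"order": "#1001", "created": "2024-03-04 09:00", "fulfilled": "2024-03-04 16:00", "first_scan": "2024-03-05 08:00"},
    {"order": "#1002", "created": "2024-03-04 13:30", "fulfilled": "2024-03-04 16:00", "first_scan": "2024-03-05 08:00"},
    {"order": "#1003", "created": "2024-03-05 10:15", "fulfilled": "2024-03-05 16:05", "first_scan": "2024-03-05 19:00"},
]

hours, top_hour, count = spot_check(orders)
for name, h in hours.items():
    print(f"{name}: {h:.1f} h from order created to first carrier scan")
print(f"{count} of {len(orders)} orders were marked fulfilled in the {top_hour}:00 hour")
```

If nearly every order lands in the same “fulfilled” hour, you’re probably looking at the batch-processing pattern described above.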
Step 2: Pull Your 3PL’s Numbers (and Read the Fine Print)
Log into your 3PL’s dashboard and find their performance report for the same 30-day window. ShipBob, ShipHero, Red Stag, Deliverr: they all have some version of this.
You’ll see a number. Usually 95-99%. Looks great.
Write it down. Then write down the answers to these three questions:
1. When does their clock start? Some 3PLs start counting from order receipt. Others from acknowledgment. Others from a daily cutoff. If your customer ordered at 9 PM and the 3PL’s clock doesn’t start until their 10 AM cutoff the next morning, they got 13 free hours. The customer experienced those 13 hours as waiting.
2. What counts as “shipped”? Label created? Packed and on the dock? Carrier scanned? One merchant told us their “on-time rate” dropped from 98% to 73% after switching 3PLs. Exact same service quality. The new provider just measured “shipped” differently.
3. What’s in the denominator? Are cancelled orders excluded? Backordered SKUs? International shipments? A 97% on-time rate calculated from 800 of your 1,000 orders tells a different story than 97% of all 1,000.
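The clock-start and denominator questions are easy to put numbers on. The sketch below uses the figures from the examples above. Both function names are hypothetical, and the denominator math assumes the worst case, that none of the excluded orders were on time, so treat the result as a floor, not an estimate.

```python
from datetime import datetime, timedelta

def rate_over_all_orders(reported_rate, counted_orders, total_orders):
    """Re-express a reported on-time rate against every order placed.

    Worst-case assumption: none of the excluded orders were on time.
    """
    on_time = reported_rate * counted_orders
    return on_time / total_orders

def free_hours(order_time, cutoff_hour):
    """Hours between order placement and the next daily cutoff,
    i.e. time the customer waits before the 3PL's clock starts."""
    cutoff = order_time.replace(hour=cutoff_hour, minute=0, second=0, microsecond=0)
    if cutoff < order_time:
        cutoff += timedelta(days=1)  # today's cutoff already passed
    return (cutoff - order_time).total_seconds() / 3600

# 97% reported on 800 of 1,000 orders
print(f"{rate_over_all_orders(0.97, 800, 1000):.1%}")

# Order placed at 9 PM, 3PL clock starts at the 10 AM cutoff
print(free_hours(datetime(2024, 3, 4, 21, 0), cutoff_hour=10))
```

With these inputs the floor works out to 77.6% and the free window to 13 hours. Your real number sits somewhere between the floor and the reported rate, which is exactly why the exclusions matter.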
You’re not looking for a gotcha. Most 3PLs aren’t gaming these numbers. They built reporting around their own operational definitions, which make sense from their side of the warehouse wall. The problem is those definitions don’t match what your customer experiences. And they probably don’t match what Shopify shows you.
You now have their number and their definitions. You’ll need both for Step 4.
Step 3: Check the Carrier Timestamps
This is the detective step most ops teams skip. Shopify tracks carrier events for 100+ carriers natively. That data is sitting in your store right now.
If you already use a delivery tracking tool like AfterShip, you have the carrier side covered: pickup, transit, delivery. That data is useful for this step. But the gap most ops teams are missing is everything before the carrier scan: when did the 3PL receive the order, how long did pick and pack take, how long did the package sit before pickup? The carrier data can’t answer those questions directly, but it can help you triangulate.
The export: Go to Orders > Export in Shopify. In your CSV, the columns you need are Created at (order placed), Fulfilled at (3PL marked it shipped), and Shipping Name / tracking number. Open it in Google Sheets or Excel.
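If a spreadsheet feels clumsy, the same export can be read with Python’s standard library. This is a rough sketch under a few assumptions you should verify against your own file: that the order number column is called Name, that timestamps look like `2024-03-04 09:00:00 -0500`, and that extra line-item rows leave the order-level timestamp columns blank. The sample rows are made up.

```python
import csv
import io
from datetime import datetime

# Assumption: export timestamps look like "2024-03-04 09:00:00 -0500".
SHOPIFY_FMT = "%Y-%m-%d %H:%M:%S %z"

def load_orders(csv_file):
    """Return one dict per order with parsed Created at / Fulfilled at times."""
    orders = []
    for row in csv.DictReader(csv_file):
        # Assumption: additional line-item rows repeat the order "Name" but
        # leave "Created at" empty, so they are skipped here.
        if not row.get("Created at"):
            continue
        fulfilled = row.get("Fulfilled at") or None
        orders.append({
            "name": row["Name"],
            "created_at": datetime.strptime(row["Created at"], SHOPIFY_FMT),
            "fulfilled_at": datetime.strptime(fulfilled, SHOPIFY_FMT) if fulfilled else None,
        })
    return orders

# Made-up stand-in for the exported file.
sample = io.StringIO(
    "Name,Created at,Fulfilled at\n"
    "#1001,2024-03-04 09:00:00 -0500,2024-03-04 16:00:00 -0500\n"
    "#1001,,\n"
    "#1002,2024-03-04 21:00:00 -0500,\n"
)

for o in load_orders(sample):
    if o["fulfilled_at"] is None:
        print(o["name"], "not fulfilled yet")
    else:
        hours = (o["fulfilled_at"] - o["created_at"]).total_seconds() / 3600
        print(o["name"], f"{hours:.1f} h from created to fulfilled")
```

For a real export, swap the `sample` object for something like `open("orders_export.csv", newline="", encoding="utf-8")`, where the filename is whatever you saved the export as.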
Three things to check:
The label-to-scan gap. For 20-30 orders, click the tracking number and look at the first carrier event timestamp. Compare it to the Fulfilled at time in your spreadsheet. You’re looking for a pattern. If Fulfilled at is consistently 6+ hours before the first carrier scan, your 3PL is marking orders shipped well before the carrier has the package. This is the single most common source of discrepancy between your 3PL’s numbers and your reality.
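To see whether the gap is a pattern or a one-off, it helps to summarize it. A minimal sketch, assuming you’ve typed each Fulfilled at time next to the first carrier scan you read off the tracking page, both in the same time zone. The four pairs are invented, and the 6-hour threshold is just the rule of thumb from above.

```python
from datetime import datetime
from statistics import median

FMT = "%Y-%m-%d %H:%M"  # assumed format for hand-copied timestamps
THRESHOLD_HOURS = 6     # the "6+ hours" rule of thumb

def label_to_scan_gaps(pairs):
    """pairs: (fulfilled_at, first_carrier_scan) strings in the same time zone."""
    gaps = []
    for fulfilled, scan in pairs:
        delta = datetime.strptime(scan, FMT) - datetime.strptime(fulfilled, FMT)
        gaps.append(delta.total_seconds() / 3600)
    return gaps

def summarize(gaps):
    """Median gap and the share of orders at or above the threshold."""
    over = sum(1 for g in gaps if g >= THRESHOLD_HOURS)
    return {"median_h": median(gaps), "share_over_threshold": over / len(gaps)}

# Made-up sample data, for illustration only.
pairs = [
    ("2024-03-04 14:00", "2024-03-05 06:00"),
    ("2024-03-04 14:00", "2024-03-04 18:00"),
    ("2024-03-05 15:30", "2024-03-06 07:30"),
    ("2024-03-06 11:00", "2024-03-06 19:00"),
]

print(summarize(label_to_scan_gaps(pairs)))
```

A high share over the threshold across 20-30 orders is the “consistently” you’re looking for. One or two stragglers is just a bad pickup day.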
Slow transit outliers. Sort your CSV by Created at. For delivered orders, check how many days from first carrier scan to delivery. You don’t need a pivot table. Just scan the tracking pages for your last 20-30 orders and note the transit days. If most ground shipments land in 3-4 days but you see a cluster of 7-8 day deliveries, click into those. Same carrier and service level? Same destination region? If the slow ones are all going to the same area, it could be a warehouse location issue. If they’re random, check whether the carrier or service level changed on those orders.
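If you noted the transit dates in a list, grouping the slow ones takes a few lines. This is a sketch with hypothetical field names and made-up shipments. The cutoff of 6 days is an assumption chosen to sit between the 3-4 day norm and the 7-8 day cluster in the example above, so pick your own.

```python
from collections import defaultdict
from datetime import date

def transit_days(first_scan: date, delivered: date) -> int:
    """Calendar days from first carrier scan to delivery."""
    return (delivered - first_scan).days

def slow_orders_by(shipments, key, slow_after_days=6):
    """Group shipments slower than `slow_after_days` by a field
    such as 'region' or 'carrier'."""
    groups = defaultdict(list)
    for s in shipments:
        days = transit_days(s["first_scan"], s["delivered"])
        if days > slow_after_days:
            groups[s[key]].append((s["order"], days))
    return dict(groups)

# Made-up sample data, for illustration only.
shipments = [
    {"order": "#1001", "carrier": "Carrier A", "region": "Northeast",
     "first_scan": date(2024, 3, 4), "delivered": date(2024, 3, 7)},
    {"order": "#1002", "carrier": "Carrier A", "region": "Pacific NW",
     "first_scan": date(2024, 3, 4), "delivered": date(2024, 3, 12)},
    {"order": "#1003", "carrier": "Carrier A", "region": "Pacific NW",
     "first_scan": date(2024, 3, 5), "delivered": date(2024, 3, 12)},
    {"order": "#1004", "carrier": "Carrier B", "region": "Southeast",
     "first_scan": date(2024, 3, 5), "delivered": date(2024, 3, 9)},
]

print(slow_orders_by(shipments, "region"))
print(slow_orders_by(shipments, "carrier"))
```

In this invented sample both slow orders share a region, which points toward warehouse location. Run it by carrier and by service level too before drawing a conclusion.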
Exception patterns. As you click through tracking pages, count how many show “exception,” “delay,” or “address correction” events. If more than 1 in 10 have exceptions, that’s a signal. Group them by type:
- Address corrections → label data quality issue (3PL side)
- Package dimension rejections → packing or label issue (3PL side)
- Missed pickups / “tendered to carrier, not picked up” → carrier coordination issue
- Weather / network delays → carrier issue, not actionable
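The grouping above can be sketched as a simple keyword lookup. Treat the keywords as assumptions: carrier status text varies a lot, so adjust them to the wording you actually see on your tracking pages. The sample events are made up, and the sketch assumes you note at most one exception per order.

```python
from collections import Counter

# Hypothetical keyword-to-owner mapping based on the groupings above.
RULES = [
    ("address", "3PL: label data quality"),
    ("dimension", "3PL: packing or label"),
    ("not picked up", "carrier coordination"),
    ("missed pickup", "carrier coordination"),
    ("weather", "carrier: not actionable"),
    ("network", "carrier: not actionable"),
]

def classify(event_text):
    """Map a tracking exception message to the likely owner of the problem."""
    text = event_text.lower()
    for keyword, owner in RULES:
        if keyword in text:
            return owner
    return "unclassified"

def exception_report(orders_checked, exception_events):
    """Count exceptions by owner and flag a rate above 1 in 10."""
    counts = Counter(classify(e) for e in exception_events)
    rate = len(exception_events) / orders_checked
    return counts, rate, rate > 0.10

# Made-up sample: 25 tracking pages checked, 4 showed an exception.
events = [
    "Address correction required",
    "Tendered to carrier, not picked up",
    "Weather delay",
    "Address corrected by carrier",
]

counts, rate, flagged = exception_report(25, events)
print(dict(counts))
print(f"exception rate: {rate:.0%}, above 1 in 10: {flagged}")
```

Anything that lands in “unclassified” is worth reading by hand before you assign it to either side.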
The point of Step 3 isn’t a perfect analysis. It’s building a picture of what actually happened to your packages versus what your 3PL’s report says happened.
Step 4: Reconcile the Three Numbers
You now have three views of the same 30 days:
- Your Shopify spot check (Step 1)
- Your 3PL’s reported performance (Step 2)
- Carrier timestamp data (Step 3)
They won’t match. Put your 3PL’s report next to your spreadsheet and look for these patterns:
Their number is 10+ points higher than yours. You calculated fulfillment time from Created at to first carrier scan. They calculated from cutoff-adjusted receipt to label creation. Neither is wrong. The gap is definitions: when the clock starts, what “shipped” means, what’s in the denominator. This is the most common outcome and the most fixable. Getting the calculation right is its own challenge, but it starts with agreeing on what you’re both measuring.
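To make the definitions gap concrete, here is a small sketch that scores the same four orders two ways: once from cutoff-adjusted receipt to label creation, once from Created at to first carrier scan. Everything in it is hypothetical, including the field names, the 24-hour SLA, and the 10 AM cutoff baked into the sample timestamps.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"
SLA = timedelta(hours=24)  # hypothetical SLA; use whatever your contract says

def on_time_rate(orders, start_key, end_key, sla=SLA):
    """Share of orders where end - start falls within the SLA."""
    hits = sum(
        1 for o in orders
        if datetime.strptime(o[end_key], FMT) - datetime.strptime(o[start_key], FMT) <= sla
    )
    return hits / len(orders)

# Made-up orders. "clock_start" is the first 10 AM cutoff after the order.
orders = [
    {"created": "2024-03-04 21:00", "clock_start": "2024-03-05 10:00",
     "label": "2024-03-05 15:00", "first_scan": "2024-03-06 08:00"},
    {"created": "2024-03-05 09:00", "clock_start": "2024-03-05 10:00",
     "label": "2024-03-05 14:00", "first_scan": "2024-03-05 18:00"},
    {"created": "2024-03-05 11:00", "clock_start": "2024-03-06 10:00",
     "label": "2024-03-06 13:00", "first_scan": "2024-03-06 18:00"},
    {"created": "2024-03-06 08:00", "clock_start": "2024-03-06 10:00",
     "label": "2024-03-06 16:00", "first_scan": "2024-03-07 07:00"},
]

print("3PL definition: ", on_time_rate(orders, "clock_start", "label"))
print("Your definition:", on_time_rate(orders, "created", "first_scan"))
```

Same packages, same warehouse, and the two definitions disagree by half in this invented sample. Real gaps are usually smaller, but the mechanism is the same.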
Their report says “shipped same day” but your tracking shows next-day first scan. You found this in Step 3: the label-to-scan gap. Many 3PLs create labels during packing so boxes are ready for carrier pickup later. Not malicious, just a different definition of “shipped.” But when they report “same day” and the carrier doesn’t have the package until tomorrow, that’s a real gap your customer experiences.
Your fulfillment numbers look fine but deliveries are still late. The 3PL got the package out on time. Transit took longer than expected. Go back to your Step 3 notes on slow transit outliers. If the slow deliveries share a carrier, a service level, or a destination region, you have a specific question for your next quarterly business review (QBR): “Are we using the carrier and service level we agreed on? Did warehouse routing change?” This is a carrier or routing problem, not a warehouse problem.
Averages look fine but you’re still getting WISMO tickets. Look at the worst 5% of your Step 3 orders. If most orders ship in 10 hours but 1 in 20 takes 72+ hours, that’s 5 angry customers per day at 100 orders/day. Averages hide this. A tail metric like P95 (the time within which 95% of orders ship) catches it. Your 3PL’s aggregate report will never surface this. Your customer support inbox already has.
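Here is the arithmetic behind that example, as a minimal sketch. It uses the numbers above (95 orders at 10 hours, 5 at 72 hours) and looks at the slowest 5% directly instead of computing a percentile, because percentile formulas interpolate differently right at the boundary. The function name and output keys are hypothetical.

```python
import math
from statistics import mean, median

def tail_summary(ship_hours, tail_percent=5):
    """Compare the average with the slowest `tail_percent` of orders."""
    ordered = sorted(ship_hours)
    tail_size = max(1, math.ceil(len(ordered) * tail_percent / 100))
    tail = ordered[-tail_size:]
    return {
        "mean_h": mean(ordered),
        "median_h": median(ordered),
        "tail_orders": tail_size,
        "fastest_in_tail_h": tail[0],
    }

# One day at 100 orders/day: 95 ship in 10 hours, 5 take 72 hours.
ship_hours = [10] * 95 + [72] * 5

print(tail_summary(ship_hours))
```

The mean comes out at 13.1 hours and the median at 10, both of which look healthy. The slowest five orders all took 72 hours. That is the whole problem in one line of output.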
Step 5: Have the Conversation
You have specific data now. Here’s what to do with it.
If the numbers genuinely look good: Set a recurring calendar reminder to repeat this audit quarterly. Performance degrades slowly. The thin ice pattern, where everything looks fine while your safety margin quietly evaporates, is the most common way brands get blindsided.
If there’s a definitions mismatch: Bring your three data sources to the next QBR. Not accusations, just: “We’re seeing different numbers because we’re measuring differently. Here’s what we’d like to align on.” Most 3PLs respond well to specific, data-backed requests. They deal with clients who complain without evidence all the time. Showing up with timestamps puts you in a different category.
If performance is genuinely slipping: Document it. Two weeks of data, not one bad day. Then the conversation is straightforward: “Here’s what we’re seeing. We need a plan to get to X by Y date.” If you’re evaluating alternatives, you need this baseline from your current provider anyway. You can’t define “better” without knowing where you are now.
If you use multiple 3PLs: This audit becomes a comparison exercise. Run it for each provider with the same methodology. The same definitions, same data sources, same time window. Otherwise you’re comparing one 3PL’s label-creation time against another’s carrier-scan time and drawing the wrong conclusions.
Making It Stick
This audit takes about an hour the first time. Quarterly, it keeps you honest. The Shopify spot-check alone (Step 1) takes 10 minutes and catches the worst problems on its own. Put it on your calendar.
The manual version works at low volume and for quarterly check-ins. If you’re shipping hundreds of orders daily across multiple providers, the spreadsheet reconciliation breaks down: too many orders to click through, too many tracking pages to check, too many CSVs to keep in sync. At that point you need something that stitches Shopify orders, 3PL timestamps, and carrier events together automatically and alerts you when the numbers drift.
That’s what 3PL Pulse does. One timeline per order, your SLA definitions, your alert thresholds. Try it free to see what your numbers actually look like.