A loan tape is the spreadsheet a seller hands you to describe a pool: one row per loan, a few hundred columns covering FICO score, loan-to-value (LTV), debt-service coverage ratio (DSCR), occupancy, balance, rate, vesting, and status. It is the single document that the entire bid is built on, which is exactly why it deserves more suspicion than it usually gets. I spent a decade on the other side of this problem, first building Pagaya's consumer-loan origination platform at roughly $10 billion a year of flow, where the whole business was deciding which loans to buy from which originators and being right about it at scale, and now at Fundable, where my co-founder and I build the AI platform that fund and asset managers run their diligence and operations on. This post is the checklist I actually use when I look at a tape, written for the analyst or principal who has a pool in front of them and a bid due Friday.
Start from the premise that the tape is wrong
Not maliciously wrong, usually — just stale and self-reported, which amounts to the same risk. The tape is assembled from the originator's loan-origination system, and that system captured the borrower and the collateral at the moment of funding and then mostly froze them in time. FICO scores age. LTVs are computed against original appraisals that may be two years old in a market that has moved twenty percent in either direction. Vesting is listed as an individual when the title quietly moved into an LLC after a refinance. Occupancy says owner-occupied because that's what the borrower attested at application, regardless of what's true now. The single most useful mental shift in tape diligence is to stop treating the tape as data and start treating it as a set of claims that each need to be corroborated — some cheaply, some expensively, but none for free.
The fields that quietly lie
Some columns are reliable because they're mechanical: current unpaid principal balance, note rate, and maturity date are pulled straight from the servicing system and rarely diverge from the loan documents. The fields that get buyers hurt are the ones that need an external source of truth to verify and almost never get one.
Occupancy is the classic one, because owner-occupancy fraud is one of the most common misrepresentations in residential lending, and a non-owner-occupied property carries different default behavior and different pricing than the tape implies. LTV is only as good as the appraisal behind it and the date of that appraisal, and a pool marked at 65% LTV against appraisals from the last cycle peak is not a 65% LTV pool. Lien position sounds binary until you pull title and find an intervening lien: a tax lien, a mechanic's lien, a second that was supposed to be subordinated and wasn't. DSCR on an investment property is computed from a rent figure that was true the month the loan funded and says nothing about whether the tenant is still paying. And status, the field that says "current," is the one I trust least of all, because "current" on the originator's tape and "current" in the county recorder's office are two different facts that diverge more often than anyone wants to admit.
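To put a number on the stale-appraisal point, here is a minimal sketch that re-marks a loan's LTV after an assumed move in property values since the appraisal date. The function name, the loan figures, and the 20% decline are all illustrative assumptions, not data from any real pool.

```python
def remarked_ltv(balance: float, original_value: float, market_change: float) -> float:
    """Return LTV after adjusting the original appraised value by market_change.

    market_change is a fraction: -0.20 means values fell 20% since the appraisal.
    """
    current_value = original_value * (1.0 + market_change)
    if current_value <= 0:
        raise ValueError("adjusted value must be positive")
    return balance / current_value


# Hypothetical loan: $650,000 balance against a $1,000,000 peak-cycle appraisal.
tape_ltv = remarked_ltv(650_000, 1_000_000, 0.0)        # 0.65, as the tape would state it
stressed_ltv = remarked_ltv(650_000, 1_000_000, -0.20)  # about 0.81 after a 20% decline
print(f"tape LTV {tape_ltv:.2%}, re-marked LTV {stressed_ltv:.2%}")
```

A flat percentage haircut is the crudest possible stand-in for a current valuation. The point is only that the same balance reads very differently once the denominator is refreshed.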
Reconcile the tape against the documents, not against itself
The diligence that most buyers run is statistical sampling — pull ten or twenty percent of the files, have a reviewer check the documents against the tape, extrapolate the error rate to the pool, and sign off. That catches gross errors, but it has two structural weaknesses: the sample is small, so anything below the sampling threshold is invisible, and the review is a human reading PDFs, so it's slow, expensive, and inconsistent across reviewers. The principle that matters is simple to state and hard to operationalize at scale — you cannot validate the tape from the tape, you have to validate it against the underlying chain of evidence, which means reconciling the numbers on the spreadsheet against the bank statements, the appraisal, the title commitment, the note, and the public record for every loan, not for a sample of them.
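The blind spot in sampling can be quantified. As a rough sketch, assuming the reviewer draws a simple random sample without replacement and that a sampled bad loan is always caught, the chance that the sample contains none of the bad loans follows the hypergeometric formula C(N - k, n) / C(N, n). The function name and the pool sizes below are hypothetical.

```python
from math import comb


def prob_sample_misses_all(pool_size: int, bad_loans: int, sample_size: int) -> float:
    """Probability that a random sample drawn without replacement
    contains none of the bad loans: C(N - k, n) / C(N, n)."""
    if not (0 <= bad_loans <= pool_size and 0 <= sample_size <= pool_size):
        raise ValueError("bad_loans and sample_size must be between 0 and pool_size")
    return comb(pool_size - bad_loans, sample_size) / comb(pool_size, sample_size)


# Hypothetical 1,000-loan pool reviewed with a 20% sample.
for bad in (1, 3, 10):
    p = prob_sample_misses_all(1_000, bad, 200)
    print(f"{bad} bad loan(s): {p:.1%} chance the sample sees none of them")
```

With a single bad loan in the pool, a 20% sample misses it 80% of the time, and with three bad loans the sample still misses all of them roughly half the time. Real reviews are less reliable than this idealized model, since a sampled file can be read and the problem still missed.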
To make that concrete, two catches from our diligence work in the last couple of months, both real. In the first, we ran tape-versus-documents validation for a residential-mortgage servicer affiliated with a $60B-AUM institutional credit investor, pulling public-record enrichment on every loan in a non-QM pool, and flagged one loan with an active pre-foreclosure filing already sitting in the county recorder's office — the originator's tape said "current," the buyer's analysts had reviewed the file and signed off, and both had missed it. In the second, a non-QM originator was sending a tape to that same buyer, and we cross-referenced the borrower's bank-statement PDFs against the liquid-assets line of the tape and found the stated ending balance exceeded the statements by $1.5 million — not a rounding error, but the difference between a creditworthy borrower and a fraud risk. Neither of those is exotic. Both are the predictable result of a tape that nobody reconciled against the documents underneath it.
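Both catches reduce to the same mechanical comparison once the evidence has been extracted. The sketch below is an illustration of that comparison only, not the production workflow: the `TapeRow` fields, the `find_exceptions` function, the 1% tolerance, and every loan in the sample data are hypothetical, and it assumes the statement balances and public-record filings have already been pulled into plain Python structures.

```python
from dataclasses import dataclass


@dataclass
class TapeRow:
    loan_id: str
    status: str           # seller-reported, e.g. "current"
    liquid_assets: float  # liquid-assets figure stated on the tape


def find_exceptions(tape, statement_totals, preforeclosure_ids, tolerance=0.01):
    """Compare each tape row with document and public-record evidence.

    statement_totals: loan_id -> ending balance summed from the bank statements.
    preforeclosure_ids: loan_ids with an active filing in the public record.
    tolerance: relative overstatement allowed before flagging (assumed 1%).
    """
    exceptions = []
    for row in tape:
        # Tape says "current" but the public record shows an active filing.
        if row.status == "current" and row.loan_id in preforeclosure_ids:
            exceptions.append((row.loan_id, "status_conflict", None))
        documented = statement_totals.get(row.loan_id)
        if documented is None:
            exceptions.append((row.loan_id, "missing_statements", None))
        elif row.liquid_assets - documented > tolerance * documented:
            gap = row.liquid_assets - documented
            exceptions.append((row.loan_id, "assets_overstated", gap))
    return exceptions


# Hypothetical data: one overstated balance, one status conflict, one clean loan.
tape = [
    TapeRow("L-001", "current", 2_000_000.0),
    TapeRow("L-002", "current", 150_000.0),
    TapeRow("L-003", "current", 80_000.0),
]
statement_totals = {"L-001": 500_000.0, "L-002": 150_000.0, "L-003": 80_000.0}
preforeclosure_ids = {"L-002"}

for exception in find_exceptions(tape, statement_totals, preforeclosure_ids):
    print(exception)
```

The comparison itself is trivial. The hard and expensive part is producing `statement_totals` and `preforeclosure_ids` reliably for every loan, which is exactly the step that sampling skips.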
The checklist, loan by loan
When I work a tape, this is the order of operations, structured so the cheapest disqualifying signals run first and the expensive judgment work runs only on what survives.
Public-record enrichment on the whole pool, first. Before anyone opens a single PDF, pull county-recorder and public-record data on every property — pre-foreclosure and notice-of-default filings, tax delinquency, mechanic's and tax liens, recent transfers, and ownership changes. This is the cheapest signal per loan and the one most likely to disqualify a loan outright, and running it on the full pool instead of a sample is the single biggest upgrade most diligence processes can make.
Title and lien position. Confirm the lien position the tape claims, check for intervening liens, and verify vesting matches the borrower of record — a surprising share of "first lien" loans have something the buyer didn't price.
Collateral and value. Re-derive LTV against a current valuation rather than the original appraisal, and pay attention to appraisal date and method — a desktop or AVM valuation at origination is a weaker anchor than a full interior appraisal, and the gap matters most in volatile segments.
Income, assets, and capacity. For non-QM in particular, reconcile the bank-statement, DSCR, or asset-depletion figures the loan was underwritten on against the actual documents, because this is where the underwriting was already non-standard and the variance is widest.
Status and performance, against an external source. Don't accept "current" on faith — corroborate it against payment history and public-record events, since a loan can be current on the last reported payment and already in pre-foreclosure on the courthouse calendar.
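The ordering above can be sketched as a short-circuiting pipeline: checks run from cheapest to most expensive, and a loan stops at its first failure so the costly steps only run on what survives. Everything here is a hypothetical illustration, including the dictionary keys, the check functions, and the 0.05 LTV tolerance, which is an arbitrary placeholder and not a market standard.

```python
from typing import Callable, Optional

Loan = dict  # hypothetical: one tape row merged with the evidence gathered so far
Check = Callable[[Loan], Optional[str]]  # returns a reason on failure, else None


def public_record(loan: Loan) -> Optional[str]:
    return "active pre-foreclosure filing" if loan.get("preforeclosure_filing") else None


def title(loan: Loan) -> Optional[str]:
    if loan.get("title_lien_position") != loan.get("tape_lien_position"):
        return "lien position differs from tape"
    return None


def collateral(loan: Loan) -> Optional[str]:
    # 0.05 is an arbitrary illustrative tolerance.
    if loan["current_ltv"] - loan["tape_ltv"] > 0.05:
        return "re-derived LTV exceeds tape LTV"
    return None


def capacity(loan: Loan) -> Optional[str]:
    if loan["documented_assets"] < loan["tape_assets"]:
        return "documents do not support stated assets"
    return None


def status(loan: Loan) -> Optional[str]:
    if loan["tape_status"] == "current" and loan["missed_payments"] > 0:
        return "payment history contradicts tape status"
    return None


# Cheapest disqualifying signal first, expensive judgment work last.
CHECKS = [
    ("public_record", public_record),
    ("title", title),
    ("collateral", collateral),
    ("capacity", capacity),
    ("status", status),
]


def first_failure(loan: Loan, checks=CHECKS):
    """Return (check_name, reason) for the first failing check, or None if all pass."""
    for name, check in checks:
        reason = check(loan)
        if reason is not None:
            return name, reason
    return None
```

Stopping at the first failure suits a bid deadline, where a disqualified loan needs no further spend. A reviewer who wants the full exception list for repricing rather than exclusion would collect every failure instead of returning early.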
What clean diligence is actually worth
Diligence is expensive today because it's manual. Institutional buyers running third-party-review work are paying roughly $200 to $500 per loan for what amounts to a labor-intensive document review, and block-trade diligence on a $500M tape can be a $15M exercise before the buyer has even closed. As a result, most buyers ration diligence, sample instead of reviewing the whole pool, and accept a known blind spot in exchange for getting the bid out on time. That tradeoff is the actual cost of manual diligence, and it's the one that AI changes: not the analysis itself, since a good analyst could always find a misstated balance or a missed pre-foreclosure, but the unit economics of doing that analysis on every loan instead of a sample. When reviewing the whole pool costs roughly the same as reviewing ten percent of it, the rational diligence strategy changes, and the blind spot that buyers have been pricing in as unavoidable risk stops being unavoidable.
The takeaway
A loan tape is a seller's best, stalest, most self-interested summary of a pool, and the buyers who consistently price pools well are not the ones with the sharpest models — they're the ones who treat the tape as a set of claims to be corroborated against the documents and the public record, loan by loan, before the bid goes out. The expensive surprises in whole-loan buying are almost never genuinely unknowable at bid time; they're sitting in a county recorder's office or on page three of a bank statement, waiting for someone to reconcile the tape against the evidence underneath it. If you're bidding on pools and you're sampling because diligencing the whole tape by hand is too slow and too expensive, that's exactly the constraint worth a conversation.