Misclassification of U.S. property record types is the primary driver of pipeline inaccuracy, manual QA cost, and client data quality failures at real estate data platforms. The U.S. recording system spans 3,144 county jurisdictions with no uniform instrument taxonomy, format standard, or indexing convention, making record-type literacy the most consequential engineering capability a data operations team can build.
More than 100 million property instruments are filed annually across 3,000+ U.S. county recording jurisdictions, each with its own rules, formats, and indexing conventions. For data operations teams, that fragmentation surfaces daily as misclassified instruments, broken pipelines, and inaccurate client data.
Most failures share a single root cause: treating all real estate document types as one undifferentiated category. Platforms that classify property records by their specific structural characteristics, and account for how those characteristics vary across states, produce fundamentally better data.
Written for data operations leaders and directors accountable for pipeline accuracy at scale. Engineers: see Sections 7–8 and the appendices. Product managers: see Sections 7 and 9.
3,144
U.S. county and county-equivalent recording jurisdictions, each operating under its own recording rules.
50
separate state recording statutes governing what instruments must be filed, in what format, and how they are indexed.
$16.2B
in title insurance premiums written in 2024, a measure of the transaction volume driving recording instrument filings.
92%+
of the U.S. population now in e-recording counties, yet format variation persists even within digital recording systems.
U.S. property recording is fragmented because it was never designed as a national system. It evolved over 400 years as a patchwork of colonial-era statutes, state laws, and county-level administrative practices. Understanding this history is essential context for any data operations leader building a pipeline that must work across the full recording landscape.
1634
Year U.S. property recording began: a Massachusetts Bay Colony ordinance required land transfers to be recorded publicly
400+
Years of independent county-level administration that produced today’s 3,000+ distinct recording formats
92%+
of the U.S. population now resides in counties that accept e-recording, but format standardization remains incomplete
The foundations of the U.S. property recording system were laid in the Massachusetts Bay Colony in 1634, when the colonial government required that land transfers be recorded with a local registrar to provide public notice and prevent fraudulent double-conveyances. Virginia followed with its own recording statute in 1640, and the other colonies established similar requirements throughout the seventeenth century.
Critically, each colony developed its recording system independently, with different administrative structures, different requirements for what must be recorded, and different indexing methods. When the United States was established, the Constitution left property law as a state function.
The colonial-era recording systems became state systems, and the states delegated day-to-day administration to counties. The result was that the recording infrastructure of a new nation was not designed from the ground up; it was inherited from thirteen separate colonial experiments and expanded westward as new states entered the union.
As the United States expanded westward through the nineteenth century, new states established their own recording systems based on the practices of the states their settlers came from, modified by local land conditions, legal traditions, and administrative preferences.
States in the South adopted practices from Virginia and the Carolinas. Midwestern states drew on New England and mid-Atlantic models. Western states, influenced by Spanish and Mexican land law in former territories, developed distinctive instruments, including California’s Grant Deed and Texas’s Vendor’s Lien Deed, that have no direct equivalent in eastern state recording systems.
By the early twentieth century, the United States had more than 3,000 county recording jurisdictions operating under 48 different state recording statutes. The fragmentation that data platforms must navigate today was already fully formed more than a century ago.
For most of the twentieth century, property records were entirely paper-based. Documents were submitted physically to county recorder offices, stamped with a recording date and instrument number, entered in a grantor-grantee index maintained in large ledger books, and stored in filing cabinets or bound volumes. The quality of the index depended entirely on the accuracy of the clerk entering the data by hand.
This paper era created the legacy document challenge that data platforms face today. An estimated 40 to 60 percent of the records that data platforms must process (particularly for historical ownership searches, chain of title construction, and pre-1980 encumbrance research) exist only as scanned images of paper originals.
Field layouts are non-standard. Handwriting is common. Instrument numbers are formatted inconsistently. Legal descriptions span multiple lines in formats that OCR systems frequently misread.
| Recording Era | Approx. Period | Document Format | Primary Processing Challenge |
|---|---|---|---|
| Handwritten ledger era | Pre-1920 | Handwritten on paper; clerk-transcribed index entries | OCR accuracy near zero; manual transcription required; significant name spelling variation |
| Typewritten paper era | 1920–1970 | Typed on paper; grantor-grantee index in ledger books | OCR generally workable but inconsistent; non-standard layouts; no parcel-level indexing in most counties |
| Early photocopy era | 1970–1985 | Carbon copies and photocopies; first microfilm archives | Image quality degradation; microfilm scan artifacts; partial digital indexes with paper originals |
| Early digitization era | 1985–2000 | Scanned paper documents; first electronic indexes | Index available digitally; document images are variable-quality scans; instrument numbers begin standardizing |
| E-recording era | 2000–present | Native digital in a growing number of counties; XML-structured in PRIA-compliant jurisdictions | Dual-format pipelines required; PRIA-compliant and non-compliant documents coexist in the same county systems |
The Uniform Land Transactions Act (1970s) attempted to create a model recording statute that states could adopt. Very few adopted it. The Uniform Electronic Transactions Act (UETA) and ESIGN (both circa 2000) created the legal foundation for electronic recording but left implementation details to individual states and counties, perpetuating format variation even as documents moved from paper to digital.
PRIA’s electronic recording standards represent the most serious current effort at standardization. PRIA’s XML-based data interchange standards define common field names and structures for the most common instruments. However, PRIA standards are voluntary.
As of 2024, PRIA estimates that approximately 60 percent of U.S. recording jurisdictions accept electronic submissions, but full PRIA XML compliance is a much smaller subset of that figure.
Operational implication:
When a data operations leader asks why their pipeline requires separate classification logic for two adjacent counties in the same state, the answer is 400 years of independent local administration. Building classification architecture that treats each county’s format as a legitimate variant, not a deviation from a standard that does not actually exist, is the only approach that scales.
U.S. property records are organized at the county level, with recording authority distributed across more than 3,000 jurisdictions. There is no single national standard for how records are filed, indexed, or formatted, creating significant variation in document structure, field naming, and data completeness that intelligent property document processing platforms must navigate at scale.
Property recording in the United States is a function of state law administered at the county level. Each state defines which instruments must be recorded, what information they must contain, and how they must be indexed. Counties then implement those requirements with discretion that has, over decades, produced a recording landscape of extraordinary inconsistency.
At the federal level, certain instruments, most notably IRS tax liens and federal court judgments, are filed directly with county recorders under federal statutory authority. These filings coexist with state-level instruments in the same county recording system but often follow different formatting conventions than locally originated documents.
At the state level, recording statutes define the legal framework: which instruments provide constructive notice when recorded, what must be included for a document to be accepted for recording, and what indexing requirements apply. States vary significantly in these requirements: some mandate grantor-grantee indexing only, others require parcel-level indexing, and a growing number have adopted digital submission standards that partially align with PRIA guidelines.
At the county level, recorders implement state law with considerable operational discretion. The result is that two adjacent counties in the same state may use entirely different document labels for the same instrument, index the same fields under different names, and produce records in formats (digital, scanned, or paper-based) that require completely different ingestion approaches.
The Property Records Industry Association (PRIA) has developed model standards for electronic recording and document data interchange that, if uniformly adopted, would significantly reduce cross-county format variation. PRIA’s XML-based standards provide a common data structure for key recording instruments and have been adopted in whole or in part by a growing number of jurisdictions.
However, adoption remains uneven. Many counties, particularly smaller rural jurisdictions and those with legacy paper-based systems, have not implemented PRIA-aligned digital recording. For data platforms, this means that PRIA-compliant ingestion logic cannot be assumed across a full county coverage map. Classification systems must be built to handle PRIA-structured records, partially structured records, and fully unstructured scanned documents within the same pipeline.
Operational implication:
A data platform covering all 3,000+ U.S. counties cannot rely on format consistency. Pipeline architecture must treat each county’s recording format as a distinct variant, not an exception, and build classification logic that handles the full spectrum from structured digital records to handwritten historical documents.
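One way to picture this is a router that chooses an ingestion path from what a record actually contains rather than from what its county is expected to send. The sketch below is illustrative only: the `RawRecord` fields and the path names are hypothetical, not part of any PRIA specification or county interface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RawRecord:
    # All field names here are assumptions made for illustration.
    county_fips: str
    xml_payload: Optional[str] = None      # structured e-recording data, if any
    index_fields: dict = field(default_factory=dict)  # county index entries
    image_path: Optional[str] = None       # scanned or native image, if any


def choose_ingestion_path(rec: RawRecord) -> str:
    """Pick a processing path from the content that actually arrived."""
    if rec.xml_payload:
        return "structured"            # parse fields directly from XML
    if rec.index_fields:
        return "partially_structured"  # use index fields, OCR the remainder
    if rec.image_path:
        return "unstructured"          # full OCR plus classification
    return "manual_review"             # nothing machine-readable arrived
```

Routing on content means a county that mixes structured and scanned records needs no special case: each record takes the path its own payload supports.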
The main types of property deeds in the United States are Warranty Deeds, Quitclaim Deeds, Grant Deeds, and Trustee’s Deeds. Each conveys ownership interests but differs in the warranties provided, the states where it is commonly used, and the data fields it contains. These differences have direct implications for how data platforms must classify and extract these instruments.
Conveyance documents are the instruments through which ownership of real property transfers from one party to another. They represent one of the highest-volume document categories in most county recording systems, second only to mortgage and financing instruments, and form the backbone of chain of title data. Misclassifying or mis-extracting conveyance instruments has immediate downstream consequences for any data product that depends on accurate ownership history, including those that power automated title search workflows.
A Warranty Deed is a deed in which the grantor guarantees clear title to the grantee and agrees to defend against any future claims on that title. It is the most common conveyance instrument in most U.S. states and typically contains the following key data fields: grantor name, grantee name, legal description of the property, consideration amount, recording date, and instrument number.
State-level naming variations create classification complexity. In some jurisdictions, the instrument is labeled a General Warranty Deed; in others, it appears as a Statutory Warranty Deed or simply a Deed. Platforms relying on document label matching alone will misclassify a significant percentage of warranty deeds that do not use the expected label.
A Quitclaim Deed is a deed in which the grantor conveys whatever interest they hold in the property without warranty of title and without guarantee that any interest actually exists. Quitclaim deeds are commonly used in family transfers, divorce settlements, and corrections of prior recording errors. They are structurally similar to warranty deeds but carry significantly different implications for chain of title integrity.
From a data quality perspective, quitclaim deeds introduce risk. Because they convey no warranty, they cannot be used to confirm that the grantor actually held the interest being conveyed. A platform that treats a quitclaim deed the same as a warranty deed in chain of title construction will produce ownership records that overstate the reliability of the title chain. Classification logic should tag quitclaim deeds distinctly and flag them for downstream consumers who use the data for title or risk analysis.
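A minimal sketch of that tagging step follows. The canonical type names, the `warranty_level` values, and the `needs_title_review` flag are assumptions introduced for illustration; the point is only that the warranty level travels with each link in the chain so downstream consumers can see it.

```python
from dataclasses import dataclass

# Assumed mapping from canonical deed type to warranty level.
WARRANTY_LEVEL = {
    "WARRANTY_DEED": "full",
    "GRANT_DEED": "limited",
    "QUITCLAIM_DEED": "none",
}


@dataclass
class Conveyance:
    deed_type: str
    grantor: str
    grantee: str
    recording_date: str  # ISO date, so string order is chronological


def build_chain(conveyances):
    """Return chain links in recording order, flagging no-warranty transfers."""
    chain = []
    for c in sorted(conveyances, key=lambda c: c.recording_date):
        level = WARRANTY_LEVEL.get(c.deed_type, "unknown")
        chain.append({
            "grantor": c.grantor,
            "grantee": c.grantee,
            "recording_date": c.recording_date,
            "deed_type": c.deed_type,
            "warranty_level": level,
            # Quitclaims and unrecognized types are surfaced, not hidden.
            "needs_title_review": level in ("none", "unknown"),
        })
    return chain
```

Keeping the flag on the link, rather than filtering quitclaims out, preserves the full ownership history while letting title and risk consumers apply their own rules.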
A Grant Deed is a deed used primarily in California, Nevada, and a small number of other western states. It provides an implied warranty that the grantor has not previously conveyed the property and that the property is free from encumbrances created by the grantor. A Grant Deed is not the same as a Warranty Deed (its warranty is narrower), but it is frequently mislabeled or misclassified as a warranty deed by platforms that do not have California-specific classification logic.
A Trustee’s Deed is a deed conveying property held in trust, most commonly issued following a non-judicial foreclosure sale, when the trustee under a deed of trust conveys the foreclosed property to the successful bidder. Trustee’s deeds sit at the intersection of conveyance and foreclosure document types, which makes them particularly prone to misclassification in pipelines that treat these categories as mutually exclusive.
36%
of U.S. real estate transactions had title issues discovered and resolved before closing, including ownership gaps, recording errors, and undisclosed liens. This figure, from ALTA’s most recent claims analysis, has been consistent across multiple survey years and reflects the ongoing operational cost of incomplete deed chain data.
Industry example:
Cook County, Illinois and Los Angeles County, California both record deed instruments but use entirely different field structures and indexing formats. Cook County uses a grantor-grantee index with document type codes that do not always match common instrument names. Los Angeles County’s recorder uses a separate property document type taxonomy. A data platform ingesting both counties must maintain distinct classification and extraction logic for each; a single universal deed parser will produce field-level errors in both.
A mortgage is a two-party security instrument in which a borrower pledges real property as collateral to a lender. A deed of trust is a three-party instrument in which the borrower conveys title to a neutral third-party trustee to hold as security for the lender until the loan is repaid. The practical distinction matters for data platforms because the two instruments are filed under different names, have different key fields, and dominate in different states.
Mortgage and financing instruments represent the highest-volume document category in most county recording systems, and the one with the greatest variety of associated instruments. The Mortgage Bankers Association forecast approximately 5.46 million mortgage originations in 2025, totaling approximately $2.05 trillion in volume, each generating a primary security instrument plus a chain of subsequent instruments (assignments, releases, and modifications) that must all be classified and linked to the originating transaction.
A mortgage is a security instrument in which the borrower (mortgagor) retains title to the property while pledging it as collateral to the lender (mortgagee). If the borrower defaults, the lender must pursue judicial foreclosure, a court-supervised process that can take months or years depending on the state. Mortgages are the dominant security instrument in states that follow judicial foreclosure procedures, including New York, Florida, Illinois, and New Jersey.
Key data fields in a mortgage instrument include borrower name, lender name, loan amount, interest rate (in some jurisdictions), property legal description, maturity date, and recording date. Lien position indicators (first mortgage, second mortgage) are critical fields for downstream analytics applications and are frequently absent or inconsistently populated in county recording data.
A deed of trust is a security instrument used in non-judicial foreclosure states including California, Texas, Arizona, and approximately 35 other states. Unlike a mortgage, a deed of trust involves three parties: the borrower (trustor), the lender (beneficiary), and a neutral trustee who holds nominal title to the property as security. If the borrower defaults, the trustee can conduct a non-judicial foreclosure sale without court involvement, making the process significantly faster than judicial foreclosure.
From a classification standpoint, deeds of trust and mortgages are functionally equivalent security instruments but are labeled differently in recording systems. A platform covering both mortgage and deed-of-trust states must maintain state-aware classification logic that correctly identifies both instrument types and maps their fields to a common output schema.
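The mapping to a common output schema can be sketched as a small lookup. The source field names (`mortgagor`, `trustor`, and so on) follow the party names used above, but the exact keys a county index exposes will differ, so treat every name here as a placeholder.

```python
# Hypothetical source-to-output field names; real county indexes will differ.
FIELD_MAP = {
    "MORTGAGE": {"mortgagor": "borrower", "mortgagee": "lender"},
    "DEED_OF_TRUST": {
        "trustor": "borrower",
        "beneficiary": "lender",
        "trustee": "trustee",
    },
}
SHARED_FIELDS = ("loan_amount", "legal_description", "recording_date")


def to_security_instrument(instrument_type: str, fields: dict) -> dict:
    """Map a mortgage or deed of trust onto one output schema."""
    mapping = FIELD_MAP[instrument_type]  # KeyError for unsupported types
    out = {
        "instrument_type": instrument_type,
        "borrower": None,
        "lender": None,
        "trustee": None,  # stays None for two-party mortgages
    }
    for src, dst in mapping.items():
        out[dst] = fields.get(src)
    for name in SHARED_FIELDS:
        out[name] = fields.get(name)
    return out
```

The original instrument type is kept in the output so that consumers who care about the judicial versus non-judicial distinction do not lose it in normalization.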
An Assignment of Mortgage is an instrument that transfers the interest in a mortgage and the right to receive payments under it from the original lender to a new holder, typically as part of the secondary mortgage market. Tracking assignment chains is essential for any data product that reports current lien holder information. Assignment chains frequently break in recording data due to gaps in filing, recording delays, or assignments executed but never recorded.
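A break in an assignment chain can be detected by walking the recorded assignments in order and checking that each assignor is the holder implied by the records before it. The sketch below assumes names have already been normalized, which is a substantial simplification; the tuple layout is hypothetical.

```python
def current_holder(original_lender, assignments):
    """Walk recorded assignments in order; report the holder and any breaks.

    `assignments` is a list of (recording_date, assignor, assignee) tuples
    with ISO dates. A break is an assignment whose assignor is not the
    holder implied by the earlier records.
    """
    holder = original_lender
    breaks = []
    for date, assignor, assignee in sorted(assignments):
        if assignor != holder:
            breaks.append({"date": date, "expected": holder, "found": assignor})
        holder = assignee
    return holder, breaks
```

A non-empty `breaks` list does not prove the reported holder is wrong, only that an intermediate assignment is missing from the record, so the lien holder field should carry lower confidence.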
A Release or Satisfaction of Mortgage is the instrument filed when a mortgage has been paid in full and the lender releases the lien. It is one of the most operationally important documents for encumbrance clearance and one of the most difficult to match accurately to its originating instrument. Matching a release to its originating mortgage requires linking instrument numbers, borrower names, and legal descriptions across a filing gap that may span decades and multiple county recording system migrations.
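As a rough sketch of the matching described above, a release can be scored against each open mortgage on the three linking fields. The weights, the threshold, and the dictionary keys are illustrative assumptions, and the exact-match comparison after normalization stands in for the fuzzier matching a production system would likely need.

```python
def normalize(text):
    """Uppercase and strip punctuation and extra spaces for loose comparison."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.upper())
    return " ".join(cleaned.split())


def match_release(release, open_mortgages, min_score=2):
    """Return the best-matching open mortgage for a release, or None.

    Assumed scoring: a cited instrument number counts 2; borrower name
    and legal description count 1 each.
    """
    best, best_score = None, 0
    for m in open_mortgages:
        score = 0
        if release.get("references_instrument") == m["instrument_number"]:
            score += 2
        if normalize(release["borrower"]) == normalize(m["borrower"]):
            score += 1
        if normalize(release["legal_description"]) == normalize(m["legal_description"]):
            score += 1
        if score > best_score:
            best, best_score = m, score
    return best if best_score >= min_score else None
```

Returning `None` below the threshold sends weak candidates to exception handling instead of silently closing the wrong lien.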
Operational implication:
Mortgage instrument volume – originations, assignments, modifications, and releases – dwarfs every other document category in most county recording systems. Platforms that do not build dedicated classification and matching logic for the full mortgage instrument chain will accumulate open lien errors that compound over time and degrade the accuracy of every downstream data product that depends on encumbrance status.
Liens recorded against real property in the United States fall into four primary categories: tax liens (federal and state), mechanic’s liens, judgment liens, and HOA liens. Each arises from a different legal basis, is filed under different procedures, and carries different implications for title integrity and property value. These differences require dedicated classification logic in real estate data platforms.
Lien and encumbrance documents are among the most consequential records in the property data ecosystem, and among the most inconsistently classified. Unlike deed and mortgage instruments, which follow broadly predictable structural conventions, lien documents vary significantly in format, label, and filing procedure depending on the type of lien, the state, and the county. A platform that routes all lien instruments to a single classification category will produce encumbrance data with systematic gaps.
A tax lien is a legal claim against real property for unpaid taxes. Federal tax liens, filed by the IRS under the Federal Tax Lien Act, attach to all property and rights to property of a taxpayer who has neglected or refused to pay a federal tax liability. They are recorded with county recorders and must be searched in title and data workflows that require encumbrance clearance.
State and county tax liens arise from unpaid property taxes and are administered at the county level. Unlike federal tax liens, which are filed by a federal agency and follow a consistent federal format, state and county tax liens follow state-specific procedures and are labeled differently across jurisdictions, appearing variously as Tax Lien, Certificate of Delinquency, Notice of Tax Lien, or similar labels depending on the state.
A mechanic’s lien is a security interest in real property granted to contractors, subcontractors, and material suppliers who have provided labor or materials for improvements to the property and have not been paid. Mechanic’s lien procedures vary significantly by state, including the timeframe within which the lien must be filed after work is completed, the notice requirements that must be satisfied before filing, and the priority rules that govern the lien’s relationship to existing mortgages.
This variation creates a multi-format classification problem. A mechanic’s lien filed in California follows a different statutory procedure and uses different document labels than one filed in Texas or New York. Platforms covering all three states must maintain state-specific classification logic for mechanic’s liens or accept systematic misclassification of a lien type that can represent significant financial exposure for property owners and lenders.
A judgment lien is a lien that attaches to real property when a court enters a monetary judgment against the property owner and the judgment is recorded or docketed with the county recorder or clerk. Judgment liens are among the most frequently missed instruments in automated processing pipelines because they originate in the court system rather than the real estate transaction ecosystem, and may be filed under the judgment debtor’s name rather than the property address.
The impact of a missed judgment lien on downstream data quality is significant. Any data product that reports encumbrance status or supports title analysis without capturing judgment liens will systematically underreport the liens attached to affected properties, an error that compounds as judgment liens accumulate against properties held by individuals with civil litigation history.
An HOA lien is a lien filed by a homeowners association against a property for unpaid assessments, dues, or fees. HOA liens have grown substantially in prevalence over the past two decades as the proportion of U.S. housing in planned communities and condominium developments has increased. They are inconsistently formatted across counties (some counties record them as separate instruments, while others fold them into a broader assessment lien category) and are frequently underrepresented in data platforms that have not explicitly built classification logic for them.
$124K
Average value of a federal tax lien filed by the IRS against real property in FY 2023. Federal tax liens are the highest-value lien category in county recording systems and among the most frequently misclassified by platforms that confuse them with state or county tax liens.
Hypothetical example:
Consider a data platform processing property records across two adjacent counties in a southeastern state. County A records mechanic’s liens as ‘Claim of Lien’ documents; County B records the same instrument as ‘Materialman’s Lien.’ A platform relying on document label matching alone will classify the same instrument type differently in each county, producing inconsistent encumbrance data for any client whose coverage area spans both jurisdictions.
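The two-county scenario can be handled with an alias table keyed by county and recorded label, so that both labels resolve to one canonical type. This is a minimal sketch: `COUNTY_A` and `COUNTY_B` are placeholders for the hypothetical counties, and `MECHANICS_LIEN` and `UNCLASSIFIED` are invented canonical names.

```python
# Keys are (county, recorded label); county names are placeholders.
LABEL_ALIASES = {
    ("COUNTY_A", "CLAIM OF LIEN"): "MECHANICS_LIEN",
    ("COUNTY_B", "MATERIALMAN'S LIEN"): "MECHANICS_LIEN",
}


def canonical_type(county: str, label: str) -> str:
    """Resolve a county-specific label to a canonical instrument type."""
    # Normalize case, curly apostrophes, and repeated whitespace.
    cleaned = " ".join(label.upper().replace("’", "'").split())
    return LABEL_ALIASES.get((county.upper(), cleaned), "UNCLASSIFIED")
```

Keying on the county as well as the label matters because the same label can denote different instruments in different jurisdictions; an unknown pair falls through to `UNCLASSIFIED` for review rather than being guessed.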
The primary documents filed during the U.S. foreclosure process are the Lis Pendens (judicial foreclosure states), Notice of Default, Notice of Trustee Sale, and the post-foreclosure conveyance instruments (REO Deed or Sheriff’s Deed). The documents filed and their sequence depend on whether the state uses judicial or non-judicial foreclosure procedures.
Foreclosure and distressed property documents are among the most consequential instruments in the real estate recording ecosystem, and among the most misclassified in data pipelines that have not been specifically trained on them. ATTOM’s Year-End 2025 Foreclosure Market Report recorded 367,460 properties with foreclosure filings in 2025, up 14 percent from 2024, each generating a sequence of documents whose accurate classification is essential for any data product that tracks distressed property inventory, lien status, or chain of title integrity.
A lis pendens is a notice recorded in county property records indicating that litigation is pending that affects the title to a specific property. In judicial foreclosure states, the filing of a lis pendens is typically the first recorded instrument in the foreclosure sequence: it provides constructive notice that the lender has initiated legal proceedings against the borrower. Lis pendens filings are not exclusive to foreclosure; they can be filed in any real property litigation, which means classification logic must distinguish foreclosure-related lis pendens from other litigation notices.
A Notice of Default (NOD) is the formal recorded notice that a borrower has defaulted on a loan obligation and that the lender or trustee intends to pursue foreclosure. NODs are primarily used in non-judicial foreclosure states and serve as the first step in the trustee sale process. The NOD triggers a statutory waiting period (typically 90 days in California, for example) during which the borrower may cure the default before the sale proceeds.
A Notice of Trustee Sale (NTS) is the recorded notice announcing the date, time, and location of a foreclosure auction. It is filed after the NOD waiting period has expired and the borrower has not cured the default. The NTS represents a distinct stage in the foreclosure timeline from the NOD; confusing the two in classification produces distressed property data that misrepresents the foreclosure stage of affected properties. In states that use NTS filings, the instrument is typically labeled Notice of Trustee’s Sale, Notice of Sale, or Notice of Foreclosure Sale depending on the jurisdiction.
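To make the stage distinction concrete, the sketch below derives a property's non-judicial foreclosure stage from the document types recorded against it. The stage names and canonical type strings are assumptions, and the logic deliberately ignores cures, rescissions, and cancelled sales, which a real pipeline would have to handle.

```python
# Assumed ordering of non-judicial stages; later stages supersede earlier ones.
STAGE_ORDER = ["NOTICE_OF_DEFAULT", "NOTICE_OF_TRUSTEE_SALE", "TRUSTEES_DEED"]
STAGE_NAME = {
    "NOTICE_OF_DEFAULT": "pre_foreclosure",
    "NOTICE_OF_TRUSTEE_SALE": "auction_scheduled",
    "TRUSTEES_DEED": "foreclosure_completed",
}


def foreclosure_stage(doc_types):
    """Return the furthest stage reached given the recorded document types."""
    reached = [t for t in doc_types if t in STAGE_ORDER]
    if not reached:
        return "none"
    furthest = max(reached, key=STAGE_ORDER.index)
    return STAGE_NAME[furthest]
```

If an NTS is misclassified as an NOD, this function reports `pre_foreclosure` for a property that already has an auction date, which is exactly the stage error described above.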
A Real Estate Owned (REO) Deed is the instrument conveying a foreclosed property from the lender to a buyer following a completed foreclosure sale where the lender took title. A Sheriff’s Deed performs the same function in judicial foreclosure states, conveying the foreclosed property following a court-ordered sheriff’s sale. Both instruments represent the end of the foreclosure chain and the beginning of a new ownership record. A platform that does not correctly link these post-foreclosure conveyance instruments to the preceding foreclosure document chain will produce ownership histories with structural gaps that misrepresent the circumstances of the conveyance.
Operational implication:
Misclassified foreclosure documents are among the most consequential pipeline errors a real estate data platform can produce. A property incorrectly shown as having a completed foreclosure, or one where an active foreclosure is missed, will produce materially inaccurate data for any downstream application that uses distressed property status, encumbrance history, or ownership chain integrity as inputs.
Beyond deeds, mortgages, liens, and foreclosure instruments, county property records contain a range of supporting document types (including plats, easements, covenants, probate instruments, and UCC filings) that affect property value, title integrity, and data completeness. These ancillary record types are disproportionately underrepresented in data platform pipelines that have not built classification logic specifically for them.
For data operations teams managing large-scale ingestion pipelines, ancillary record types present a specific operational challenge: they are individually low-volume compared to deeds and mortgages, but their collective absence from structured outputs creates systematic data gaps that affect clients whose use cases depend on complete encumbrance and property characteristic data.
A plat is a recorded map depicting the division of land into lots, blocks, streets, and easements within a subdivision. Plats establish the legal boundaries of individual parcels and are the foundational document for parcel-level property data. For data platforms building or maintaining parcel databases, accurate plat classification and the extraction of lot dimensions, easement locations, and subdivision names are essential for parcel-level data completeness.
An easement is a recorded right granted to a party (typically a utility company, neighbor, or government entity) to use a portion of a property for a specific purpose. Easements are encumbrances that affect the property’s use and value and must be reflected in complete property records. Covenants, conditions, and restrictions (CC&Rs) are recorded agreements that restrict how a property may be used; they are common in planned communities and condominium developments.
Both easements and covenants are frequently deprioritized in automated ingestion pipelines because they do not affect ownership transfer or lien status directly. However, their absence from structured outputs creates material gaps for clients whose applications assess property usability, development potential, or HOA compliance.
Probate and estate instruments, including personal representative deeds, affidavits of survivorship, and court orders conveying property through estate administration, are low-volume but high-complexity document types. They are disproportionately misclassified by platforms that have not trained classification logic specifically on them, because they use legal language and document structures that differ significantly from standard conveyance instruments.
Uniform Commercial Code (UCC) financing statements are primarily filed for personal property security interests, but when the collateral includes fixtures or improvements attached to real property, they are filed with the county recorder as real property records. UCC filings with real property collateral are a growing category as commercial real estate financing structures evolve and as agricultural and energy-related financing involves real property fixtures more frequently.
Industry note:
Generic intelligent document processing tools not specifically trained on real estate document types consistently underperform on ancillary record types. Misclassified plats, easements, and probate instruments create silent data gaps: errors that do not generate processing exceptions but produce incomplete outputs that only become visible when a client queries data that should exist but does not.
Real estate-specific classification systems that are pre-trained on the full range of instrument types, including ancillary records, handle these categories as standard cases rather than exceptions. Section 9 covers what this capability difference means for pipeline architecture decisions.
Mortgage and financing instruments account for the largest share of annual county recording volume in the United States, typically representing 45 to 55 percent of all recorded instruments in active real estate markets. Conveyance documents (deeds) represent 20 to 30 percent. Lien instruments, foreclosure documents, and ancillary records make up the remainder. Understanding this volume distribution is essential for prioritizing pipeline investment and exception management resources.
One of the most consistent gaps in real estate data platform architecture is the mismatch between where pipeline investment is concentrated and where recording volume actually lives. Many platforms invest heavily in deed classification because deeds are the most conceptually familiar instrument, while underinvesting in mortgage instrument classification, which typically represents two to three times the recording volume of deeds in active transaction markets.
The following benchmarks are derived from publicly available county recording volume data, ALTA industry reports, and MBA mortgage origination statistics. They represent typical distribution ranges for active real estate markets. Rural and lower-transaction counties may have different distributions, with higher proportions of deed instruments and lower mortgage volumes.
| Document Category | Typical Share of Annual Recording Volume | Peak Volume Conditions | Pipeline Investment Priority |
|---|---|---|---|
| Mortgage & Financing Instruments (originations, assignments, releases, modifications) | 45–55% | Refinance boom periods (2020–2021 saw 3–4x normal volumes per MBA data) | Highest – volume leader; assignment chain errors compound over time |
| Conveyance Documents (deeds of all types) | 20–30% | Strong purchase market; probate and estate settlement peaks in Q1 | High – foundation of ownership chain; misclassification affects all downstream products |
| Lien Instruments (tax, mechanic’s, judgment, HOA) | 10–15% | Construction boom periods drive mechanic’s lien spikes; recession periods drive tax and judgment lien increases | High – disproportionate downstream impact relative to volume; most frequently missed category |
| Foreclosure Documents (lis pendens, NOD, NTS, REO/sheriff’s deeds) | 3–8% | Post-recession periods (2009–2012 saw 10x+ normal foreclosure volume per CoreLogic data) | High – misclassification produces most consequential data quality errors |
| Ancillary & Supporting Records (plats, easements, covenants, UCC, probate) | 8–15% | Subdivision development cycles drive plat recording volume | Medium – individually low volume but collectively material; most frequently absent from pipeline coverage |
+17%
Projected increase in total U.S. mortgage origination volume in 2025 vs. 2024, reaching approximately $2.05 trillion. For data platforms, this translates directly to a 17% increase in mortgage instrument recording volume that pipelines must absorb without proportional increases in manual review capacity.
Recording volume by document type is not stable across time. It responds directly to macroeconomic conditions, interest rate cycles, and regional real estate market activity, often in ways that create sudden, significant shifts in the document mix a pipeline must process.
| Market Condition | Document Types That Spike | Approximate Volume Multiplier | Operational Impact |
|---|---|---|---|
| Refinance boom (low interest rate environment) | Mortgage originations, assignments, releases | 2–4x normal volume (MBA, 2020–2021 data) | Exception queues grow faster than headcount can absorb; misclassification rates rise as volume overwhelms manual review capacity |
| Purchase market peak (low inventory, high demand) | Warranty deeds, new mortgage originations | 1.5–2x normal volume | Deed classification errors most visible; chain of title construction under time pressure |
| Post-recession foreclosure wave | Lis pendens, NOD, NTS, REO deeds | 5–10x normal volume (CoreLogic, 2009–2012 data) | Foreclosure document classification accuracy becomes most critical data quality variable; misclassification rates must stay low despite volume spikes |
| Construction boom | Mechanic’s liens, plats, subdivision maps | Not specified | Mechanic’s lien label variation and state-specific filing rule complexity amplified at scale |
| Rising interest rate environment (purchase slowdown) | Probate/estate instruments, distressed sale deeds | Modest increase (10–20%) | Lower overall volume but higher proportion of complex instruments requiring manual review |
Volume benchmarks have a direct and underappreciated relationship to exception queue management. A pipeline that processes 10,000 documents per day with a 2 percent misclassification rate generates 200 exception records per day.
If a refinance boom doubles mortgage instrument volume, and mortgage instruments are the category most likely to require exception handling due to assignment chain complexity, the exception queue can grow to 400 to 600 records per day without any change in the underlying misclassification rate.
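The queue arithmetic above can be sketched in a few lines of Python. Only the 10,000-document, 2 percent baseline comes from the text; the category split and the per-category rates are illustrative assumptions chosen so that the blended baseline rate stays at 2 percent.

```python
def daily_exceptions(volume_by_category, rate_by_category):
    """Expected exception records per day: sum of volume x misclassification rate."""
    return sum(volume_by_category[c] * rate_by_category[c] for c in volume_by_category)

# Baseline from the text: 10,000 documents/day at a 2% misclassification rate.
# The split across categories is an assumption for illustration.
baseline_volume = {"mortgage": 5000, "deed": 2500, "other": 2500}
uniform_rate = {"mortgage": 0.02, "deed": 0.02, "other": 0.02}
baseline = daily_exceptions(baseline_volume, uniform_rate)

# Refinance boom: mortgage instrument volume doubles, rates unchanged.
boom_volume = {**baseline_volume, "mortgage": 10000}
boom_uniform = daily_exceptions(boom_volume, uniform_rate)

# Hypothetical skew: mortgage instruments carry most of the exception load,
# while the blended baseline rate is still 2% (200 exceptions per day).
skewed_rate = {"mortgage": 0.035, "deed": 0.005, "other": 0.005}
baseline_skewed = daily_exceptions(baseline_volume, skewed_rate)
boom_skewed = daily_exceptions(boom_volume, skewed_rate)

print(round(baseline), round(boom_uniform), round(baseline_skewed), round(boom_skewed))
```

Under these assumed rates, the boom-day queue is larger when exceptions are concentrated in mortgage instruments than when the rate is uniform, even though no per-category rate has changed. The exact figure depends entirely on how a given pipeline's exceptions are distributed across categories.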
Platforms that manage exception queues manually will face a structural capacity problem during volume spikes. The only scalable response is reducing the misclassification rate before volume spikes occur, through document-type-aware classification that does not degrade under volume pressure, rather than increasing manual review headcount reactively.
Volume benchmark takeaway:
Invest pipeline resources proportional to recording volume, not to the conceptual familiarity of a document type. Mortgage instruments are the highest-volume category and the one most likely to generate compounding data quality errors if classification is inadequate. Lien instruments are the most consequential category relative to their volume. Foreclosure documents are the most likely to spike suddenly. Pipeline architecture should reflect all three of these realities.
Property recording formats differ across U.S. states and counties in three primary dimensions: the security instrument used (mortgage vs. deed of trust), the specific conveyance and lien instruments recognized under state law, and the recording format (digital, hybrid, or paper-based). Each dimension requires distinct classification and extraction logic in a real estate data platform.
For data operations leaders managing multi-county pipelines, state and county recording variation is not an edge case to be handled by exception logic. It is the structural reality of the U.S. property recording landscape. A platform covering 500 counties is managing 500 distinct recording environments, each with its own document taxonomy, field conventions, and submission format. The only scalable response is a classification architecture built around that variation, not despite it.
The most operationally significant state-level variation for real estate data platforms is the division between states that use mortgages as the primary security instrument and states that use deeds of trust. This division affects not only the primary security instrument but the entire foreclosure document chain associated with it.
| Security Instrument | Foreclosure Type | Key States |
|---|---|---|
| Mortgage (two-party) | Judicial foreclosure | New York, Florida, Illinois, New Jersey, Ohio, Pennsylvania |
| Deed of Trust (three-party) | Non-judicial foreclosure | California, Texas, Arizona, Colorado, Nevada, Virginia, Washington |
| Both instruments used | Varies by instrument | Georgia, North Carolina, Mississippi, Montana |
Beyond the mortgage/deed of trust division, several states use conveyance or financing instruments that are structurally distinct from standard national equivalents and require dedicated classification logic:
The format of source documents (fully digital, scanned paper, or hybrid) has direct implications for ingestion strategy, OCR requirements, and expected field-level accuracy. As of 2024, the U.S. recording landscape spans all three categories:
| Recording Format | Characteristics | Pipeline Implications |
|---|---|---|
| Fully digital (e-recording) | Structured XML or standardized digital submission. Fields are discrete and machine-readable. Growing adoption, particularly in large metro counties. | Highest extraction accuracy. Lower OCR dependency. PRIA-aligned counties may offer field-level structured data directly. |
| Scanned paper (digitized) | Physical documents scanned to image files (TIFF, PDF). Text extracted via OCR. Quality varies by scanner, document age, and handwriting prevalence. | OCR accuracy is the primary accuracy constraint. Handwritten fields require specialized extraction logic. Historical records pre-1990 often in this category. |
| Hybrid | Mix of e-recorded and scanned documents. Common in mid-size counties transitioning to digital recording. Recent documents may be digital; older records are scanned. Simplifile, CSC, and eRecording Partners Network (ePN) are the primary e-recording service providers tracking county adoption nationally. | Pipeline must handle both formats within the same county. Classification logic must be format-aware, not just document-type-aware. |
The following reference table provides a high-level overview of the primary security instrument, foreclosure type, and recording format status for all 50 U.S. states. This table is intended as a planning reference for data operations leaders and product managers mapping pipeline coverage requirements.
| State | Primary Security Instrument | Foreclosure Type | E-Recording Status |
|---|---|---|---|
| Alabama | Mortgage / Deed of Trust | Non-judicial | Partial |
| Alaska | Deed of Trust | Non-judicial | Partial |
| Arizona | Deed of Trust | Non-judicial | Widespread |
| Arkansas | Deed of Trust / Mortgage | Both | Partial |
| California | Deed of Trust | Non-judicial | Widespread |
| Colorado | Deed of Trust | Non-judicial | Widespread |
| Connecticut | Mortgage | Judicial | Partial |
| Delaware | Mortgage | Judicial | Partial |
| Florida | Mortgage | Judicial | Widespread |
| Georgia | Security Deed | Non-judicial | Widespread |
| Hawaii | Mortgage | Judicial | Partial |
| Idaho | Deed of Trust | Non-judicial | Not specified |
| Illinois | Mortgage | Judicial | Widespread |
| Indiana | Mortgage | Judicial | Partial |
| Iowa | Mortgage | Judicial | Partial |
| Kansas | Mortgage | Judicial | Partial |
| Kentucky | Mortgage | Judicial | Partial |
| Louisiana | Mortgage | Judicial | Not specified |
| Maine | Mortgage | Judicial | Partial |
| Maryland | Deed of Trust | Non-judicial | Widespread |
| Massachusetts | Mortgage | Non-judicial | Widespread |
| Michigan | Mortgage | Non-judicial | Partial |
| Minnesota | Mortgage | Non-judicial | Not specified |
| Mississippi | Deed of Trust | Non-judicial | Partial |
| Missouri | Deed of Trust | Non-judicial | Partial |
| Montana | Deed of Trust | Non-judicial | Partial |
| Nebraska | Deed of Trust | Non-judicial | Partial |
| Nevada | Deed of Trust | Non-judicial | Widespread |
| New Hampshire | Mortgage | Non-judicial | Partial |
| New Jersey | Mortgage | Judicial | Partial |
| New Mexico | Mortgage | Judicial | Partial |
| New York | Mortgage | Judicial | Widespread |
| North Carolina | Deed of Trust | Non-judicial | Not specified |
| North Dakota | Mortgage | Judicial | Partial |
| Ohio | Mortgage | Judicial | Partial |
| Oklahoma | Mortgage | Judicial | Partial |
| Oregon | Deed of Trust | Non-judicial | Widespread |
| Pennsylvania | Mortgage | Judicial | Widespread |
| Rhode Island | Mortgage | Non-judicial | Partial |
| South Carolina | Mortgage | Judicial | Partial |
| South Dakota | Mortgage | Judicial | Partial |
| Tennessee | Deed of Trust | Non-judicial | Partial |
| Texas | Deed of Trust | Non-judicial | Widespread |
| Utah | Deed of Trust | Non-judicial | Widespread |
| Vermont | Mortgage | Judicial | Partial |
| Virginia | Deed of Trust | Non-judicial | Widespread |
| Washington | Deed of Trust | Non-judicial | Widespread |
| West Virginia | Deed of Trust | Non-judicial | Partial |
| Wisconsin | Mortgage | Judicial | Partial |
| Wyoming | Mortgage / Deed of Trust | Both | Partial |
Data operations teams building or validating county coverage should consult the following independent, non-proprietary sources for county-level recording format and e-recording status information:
| Source | What It Provides | URL |
|---|---|---|
| PRIA eRecording Hub | County-by-county e-recording status, updated as new counties enable electronic submission. Authoritative source for recording office contact information and submission format requirements. | pria.us |
| Simplifile eRecording County Directory | Real-time directory of counties accepting e-recording via the Simplifile network, one of the three major e-recording submission platforms. Useful for validating digital submission availability. | simplifile.com/erecording-counties |
| Individual County Recorder Websites | Primary source for county-specific document type taxonomies, recording fees, format requirements, and indexing conventions. No single aggregator fully replaces direct county recorder documentation. | Varies by county |
| ALTA Best Practices | Guidance on recording standards, gap period management, and document handling procedures that inform data platform design for title-adjacent use cases. | alta.org/title-insurance-and-settlement-company-best-practices |
| RESO (Real Estate Standards Organization) | Develops data standards for real estate data interchange, including property data field definitions that complement PRIA recording standards in pipeline output schema design. | reso.org/data-dictionary |
| U.S. Census Bureau, Government Units Survey | Authoritative count of county and county-equivalent jurisdictions; the definitive source for jurisdiction count data used throughout this guide. | census.gov/govs/cog |
The most critical data fields across real estate property records are those that establish identity (grantor, grantee), location (legal description, parcel number), and transaction context (recording date, instrument number, consideration amount). These fields form the foundation of chain of title construction, lien status determination, and property ownership history, and are the fields most likely to contain errors in legacy scanned documents.
For data operations leaders, field-level accuracy is the metric that matters most to downstream clients. A record correctly classified but with a misspelled grantee name or a truncated legal description will produce ownership data that fails match logic in analytics applications. The following section defines the critical fields by document category and identifies the fields where extraction errors are most common and most consequential.
| Document Type | Critical Fields | County-Variable Fields | Common Error Patterns |
|---|---|---|---|
| Warranty / Grant Deed | Grantor, Grantee, Legal Description, Recording Date, Instrument Number | Consideration Amount, Documentary Transfer Tax, Assessor Parcel Number | Grantee name abbreviation; legal description truncation in OCR; APN format variation |
| Quitclaim Deed | Grantor, Grantee, Legal Description, Recording Date | Consideration Amount (often $0 or nominal) | Nominal consideration misread as actual sale price; grantor/grantee role confusion in family transfers |
| Mortgage / Deed of Trust | Borrower, Lender, Loan Amount, Recording Date, Maturity Date | Interest Rate, Lien Position, Trustee Name (DoT only) | Not specified |
| Assignment of Mortgage | Assignor, Assignee, Original Instrument Reference, Recording Date | Loan Amount, Property Address | Broken instrument reference chain; assignee name abbreviations inconsistent with originating lender name |
| Release / Satisfaction | Releasing Party, Original Instrument Reference, Recording Date | Loan Amount, Payoff Date | Unmatched releases due to instrument number format changes; partial releases mislabelled as full satisfactions |
| Tax Lien (Federal) | Taxpayer Name, IRS Serial Number, Tax Period, Recording Date | Property Address (not always present) | Taxpayer name variant matching failures; multiple liens for same taxpayer not linked |
| Mechanic’s Lien | Claimant, Property Owner, Property Description, Lien Amount, Recording Date | Contractor License Number, Work Completion Date | State-specific label variation; lien amount OCR errors in handwritten filings |
| Lis Pendens | Plaintiff, Defendant, Case Number, Court, Recording Date | Property Address, Loan Reference | Case number format variation by county; defendant name matching to property owner record |
| Notice of Default | Trustor, Beneficiary, Trustee, Default Amount, Recording Date | Loan Reference, Property Address | Default amount OCR errors; trustee name inconsistency across foreclosure chain documents |
| Notice of Trustee Sale | Trustee, Property Description, Sale Date, Opening Bid, Recording Date | Loan Reference, Beneficiary Name | Sale date extraction errors; opening bid amount OCR failures in scanned documents |
Real estate data platforms should handle multiple property record types by building document-type-aware classification at the ingestion layer, normalizing field-level outputs to a common schema by document category, implementing confidence scoring on all extracted fields, and aligning output structures to PRIA standards where applicable. Platforms that route all records through a single generic extraction pipeline will produce accuracy levels that do not scale to the demands of analytics-ready data clients.
The architecture decisions that determine a data platform’s accuracy and scalability are made at the document classification layer before a single field is extracted. A pipeline that correctly classifies every incoming instrument by document type can apply targeted extraction logic, validate outputs against document-type-specific field rules, and route exceptions to the right human reviewers. A pipeline that misclassifies instruments routes them to the wrong extraction logic and produces errors that propagate downstream invisibly.
The most common architectural failure in real estate data pipelines is relying on keyword or label matching to classify incoming documents. A keyword-matching approach classifies documents based on text patterns found in the document: if the word ‘mortgage’ appears, the document is classified as a mortgage. This approach fails in several predictable ways: it misclassifies instruments with non-standard labels, it cannot distinguish between instruments that share vocabulary (a Release of Mortgage and a Mortgage share key terms), and it produces classification errors that are difficult to detect because they do not generate processing exceptions.
A document-type-aware classification approach, by contrast, classifies instruments based on structural characteristics (the combination of fields present, the document layout, the relationship between parties named, and the legal language used) rather than surface-level text patterns. This approach generalizes across the full range of county-level label variations and produces classification accuracy that does not degrade as coverage expands.
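The contrast between the two approaches can be illustrated with a deliberately small Python sketch. The fingerprint sets below are hypothetical and borrow field names from the critical-fields table earlier in this guide; a production classifier would also weigh layout and legal language, and the step that detects which fields are present is assumed to exist upstream.

```python
def classify_by_keyword(text: str) -> str:
    """Naive keyword approach: the first matching word decides the type."""
    lowered = text.lower()
    if "mortgage" in lowered:
        return "mortgage"
    if "deed" in lowered:
        return "deed"
    return "unknown"

# Hypothetical structural fingerprints: fields an instrument of each type carries.
FINGERPRINTS = {
    "mortgage": {"borrower", "lender", "loan_amount", "maturity_date"},
    "assignment_of_mortgage": {"assignor", "assignee", "original_instrument_reference"},
    "release_of_mortgage": {"releasing_party", "original_instrument_reference"},
}

def classify_by_structure(fields_present: set[str]) -> str:
    """Return the type whose fingerprint is fully present, preferring the most specific."""
    matches = [
        (len(fingerprint), name)
        for name, fingerprint in FINGERPRINTS.items()
        if fingerprint <= fields_present
    ]
    return max(matches)[1] if matches else "unknown"

release_text = "RELEASE OF MORTGAGE: the undersigned releases the lien recorded as ..."
release_fields = {"releasing_party", "original_instrument_reference", "recording_date"}

print(classify_by_keyword(release_text))      # keyword match on "mortgage"
print(classify_by_structure(release_fields))  # structural match on the release fingerprint
```

The keyword function labels the release as a mortgage because both share vocabulary, while the structural function separates them by the parties and references each instrument carries.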
Published outcomes from real estate-specific AI classification deployments demonstrate the accuracy gap between generic and domain-trained approaches. Hitech i2i, one platform built specifically for real estate document classification, reports 99% field-level accuracy across 150+ document types and 1,000+ county formats in production deployments processing millions of records annually. This performance level is not achievable with keyword-matching or generic IDP systems applied to the same document set.
Multi-county normalization, the process of mapping the diverse field names, value formats, and data structures produced by different counties into a single consistent output schema, is one of the most resource-intensive ongoing operations in real estate data platform maintenance. Counties change their recording systems, adopt new document management platforms, and modify their indexing conventions without advance notice. A normalization architecture that requires manual intervention for each county format change does not scale.
The scalable approach is to build normalization logic at the document-type level rather than the county level: defining a standard output schema for each document type and mapping each county’s field conventions to that schema as a classification-time operation. When a county changes its format, only the county-specific mapping rule requires updating; the downstream output schema remains consistent.
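A minimal sketch of this pattern in Python follows. The county identifiers and source labels are invented for illustration; the canonical field names follow the critical fields listed for warranty and grant deeds earlier in this guide.

```python
# Canonical output schema for one document type (deed).
CANONICAL_DEED_FIELDS = (
    "grantor", "grantee", "legal_description", "recording_date", "instrument_number",
)

# Hypothetical county-specific label maps: the only part that changes
# when a county alters its format.
COUNTY_FIELD_MAPS = {
    "county_a": {
        "Seller": "grantor", "Buyer": "grantee", "Legal": "legal_description",
        "Rec Date": "recording_date", "Doc No": "instrument_number",
    },
    "county_b": {
        "GRANTOR NAME": "grantor", "GRANTEE NAME": "grantee",
        "LEGAL DESC": "legal_description", "RECORDED": "recording_date",
        "INSTRUMENT #": "instrument_number",
    },
}

def normalize_deed(county: str, raw: dict) -> dict:
    """Map a county's raw labels onto the canonical deed schema; absent fields stay None."""
    mapping = COUNTY_FIELD_MAPS[county]
    record = {field: None for field in CANONICAL_DEED_FIELDS}
    for label, value in raw.items():
        canonical = mapping.get(label)
        if canonical in record:
            record[canonical] = value
    return record

deed_a = normalize_deed("county_a", {"Seller": "Jane Doe", "Buyer": "John Roe", "Doc No": "2024-000123"})
deed_b = normalize_deed("county_b", {"GRANTOR NAME": "ACME LLC", "GRANTEE NAME": "Jane Doe"})
print(list(deed_a) == list(deed_b))  # both records share one field structure
```

Because downstream consumers only ever see `CANONICAL_DEED_FIELDS`, a county format change is absorbed by editing one entry in the mapping table.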
Confidence scoring assigns a reliability score to each extracted real estate data field based on the extraction method, source document quality, and field-level validation results. A confidence score below a defined threshold triggers human review before the field value reaches downstream outputs, preventing low-certainty extractions from propagating as if they were verified data.
Source-linked data attaches a provenance reference to each extracted field, recording the source document, the recording date, and the extraction method. Analytics-ready data clients increasingly require source linkage as a baseline: it allows downstream applications to trace any data point back to its originating instrument, audit data quality, and resolve discrepancies between competing source records.
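One way to carry both ideas in a single record is sketched below in Python. The class name, attribute names, and the 0.90 threshold are assumptions made for illustration, not a published schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExtractedField:
    """One extracted value with its confidence score and provenance reference."""
    name: str
    value: str
    confidence: float        # 0.0 to 1.0
    source_document: str     # identifier of the originating instrument
    recording_date: str      # ISO date string of the source record
    extraction_method: str   # e.g. "structured_xml" or "ocr"

def split_for_review(fields, threshold=0.90):
    """Separate fields that may flow downstream from those needing human review."""
    accepted = [f for f in fields if f.confidence >= threshold]
    review = [f for f in fields if f.confidence < threshold]
    return accepted, review

fields = [
    ExtractedField("grantee", "Jane Doe", 0.98, "2024-000123", "2024-03-15", "structured_xml"),
    ExtractedField("legal_description", "LOT 4 BLK 2 ...", 0.71, "2024-000123", "2024-03-15", "ocr"),
]
accepted, review = split_for_review(fields)
print([f.name for f in accepted], [f.name for f in review])
```

Keeping provenance on the field itself, rather than on the document, means a reviewer or a downstream client can trace any single value back to its source without a separate lookup.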
Platform implication:
Data platforms that implement confidence scoring and source-linked data outputs reduce client escalations, accelerate dispute resolution, and position themselves to serve the most demanding analytics and risk management use cases: segments where data quality requirements are highest and switching costs are lowest.
The following five recommendations are written for data operations leaders managing real estate data pipelines at scale. Each recommendation is actionable within the existing operational structure of a data platform and each addresses a root cause of the accuracy and scalability failures described in this guide.
~50%
of reported losses on lender title insurance policies in 2025 emerged from just three categories: Fraud, Forgery, and Lien Priority – all of which trace directly to failures in encumbrance data completeness and accuracy upstream of the title search.
Map which document types your pipeline currently classifies reliably, which are routed to manual review or default categories, and which are absent from your outputs entirely. A document coverage audit will typically reveal three to five instrument categories that your pipeline misclassifies systematically: categories whose absence has been quietly degrading output quality for months or years without generating obvious processing errors.
Quantify the operational cost of each gap: manual review hours per week, error correction cycles per month, and, most importantly, client escalations or data quality complaints that trace back to misclassified instruments. A coverage audit that produces a cost figure for each gap transforms a technical discussion about classification logic into a business case for investment.
If your current pipeline routes incoming documents based on text patterns, labels, or keyword matching, replace that routing logic with classification built around instrument structure. Define the structural characteristics that distinguish each document type (the combination of parties, fields, and legal language that is unique to each instrument) and build your classification logic around those structural fingerprints rather than surface-level text.
Measure the accuracy delta before and after the transition using a held-out validation set of labeled documents. The accuracy improvement on ambiguous instruments (those with non-standard labels, unusual party structures, or multi-instrument filings) will be the most significant, and the most valuable for clients whose use cases depend on complete and accurate encumbrance data.
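The measurement itself is simple; a minimal Python sketch follows. The labels and predictions are made-up placeholders standing in for a real held-out validation set.

```python
def accuracy(predictions, labels):
    """Fraction of documents whose predicted type equals the labeled type."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Placeholder held-out set: true labels plus each classifier's output.
labels = ["mortgage", "release_of_mortgage", "assignment_of_mortgage", "quitclaim_deed"]
keyword_predictions = ["mortgage", "mortgage", "mortgage", "quitclaim_deed"]
structural_predictions = ["mortgage", "release_of_mortgage", "assignment_of_mortgage", "quitclaim_deed"]

before = accuracy(keyword_predictions, labels)
after = accuracy(structural_predictions, labels)
accuracy_delta = after - before
print(before, after, accuracy_delta)
```

Running the same function on the ambiguous subset alone (non-standard labels, multi-instrument filings) gives the figure most relevant to encumbrance-dependent clients.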
Define a canonical output schema for each document type in your pipeline, specifying mandatory fields, optional fields, and acceptable null handling rules for each. Apply this schema consistently across all counties, so that a warranty deed processed from Cook County, Illinois and one processed from Los Angeles County, California produce outputs in the same field structure with the same naming conventions.
Consistent output schemas reduce the integration burden on downstream clients, enable systematic data quality monitoring across your full county coverage map, and allow you to detect normalization failures (counties where your extraction is producing systematically different outputs from the canonical schema) before those failures reach client-facing data products.
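A schema of this kind can be checked mechanically. The Python sketch below assumes a simple convention in which `None` marks a missing value; the schema layout and the field names are illustrative, drawn from the deed fields discussed in this guide.

```python
# Hypothetical canonical schema for a warranty deed.
DEED_SCHEMA = {
    "mandatory": {"grantor", "grantee", "legal_description", "recording_date", "instrument_number"},
    "optional": {"consideration_amount", "assessor_parcel_number"},
}

def schema_violations(record: dict, schema: dict):
    """Return (missing mandatory fields, fields outside the schema) for one record."""
    present = {name for name, value in record.items() if value is not None}
    missing = schema["mandatory"] - present
    unexpected = set(record) - schema["mandatory"] - schema["optional"]
    return sorted(missing), sorted(unexpected)

record = {
    "grantor": "ACME LLC",
    "grantee": None,                 # null in a mandatory field
    "legal_description": "LOT 1",
    "recording_date": "2024-01-02",
    "instrument_number": "2024-000001",
    "sale_price": 100,               # not part of the canonical schema
}
missing, unexpected = schema_violations(record, DEED_SCHEMA)
print(missing, unexpected)
```

Counting violations per county over time is one way to surface the normalization failures described above before they reach client-facing products.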
Add a confidence score to every extracted field in your output data. Configure confidence thresholds by field type (the acceptable confidence floor for a grantee name should be higher than for a documentary transfer tax amount) and route any extraction below threshold to human review before it enters downstream outputs.
Track confidence score trends by county, document type, and source document format over time. Counties where confidence scores are systematically low indicate normalization logic that needs updating. Document types with consistently low confidence on specific fields identify extraction logic that needs retraining. Confidence scoring transforms field-level accuracy from an outcome you discover retrospectively into a variable you actively manage.
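A simple aggregation is enough to start tracking these trends. In the Python sketch below, the row layout, the county names, and the 0.90 floor are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

def mean_confidence_by_group(rows, keys=("county", "document_type")):
    """Average field confidence for each combination of the grouping keys."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["confidence"])
    return {group: mean(scores) for group, scores in groups.items()}

def flag_low_groups(averages, floor=0.90):
    """Groups whose mean confidence falls below the floor, sorted for stable output."""
    return sorted(group for group, avg in averages.items() if avg < floor)

rows = [
    {"county": "county_a", "document_type": "warranty_deed", "confidence": 0.98},
    {"county": "county_a", "document_type": "warranty_deed", "confidence": 0.96},
    {"county": "county_b", "document_type": "warranty_deed", "confidence": 0.80},
    {"county": "county_b", "document_type": "warranty_deed", "confidence": 0.70},
]
averages = mean_confidence_by_group(rows)
low_groups = flag_low_groups(averages)
print(low_groups)
```

Adding a date bucket or the source document format to `keys` turns the same function into a trend view by period or by format.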
For all output fields where PRIA data interchange standards provide field-level guidance, align your output schema to the PRIA standard. PRIA-aligned outputs maximize compatibility with downstream analytics applications, title plant systems, and data licensees who are building their own PRIA-compliant ingestion pipelines. They also position your platform for formal PRIA certification, which is an increasingly recognized quality signal among data operations buyers.
Where your current county coverage includes PRIA-compliant e-recording jurisdictions, validate that your extraction outputs for those counties match the PRIA field structure exactly. Discrepancies between PRIA-structured source data and non-PRIA output schemas in your pipeline indicate normalization logic that is adding unnecessary variance to structured data that did not require normalization.
The following questions represent the most common follow-up inquiries from data operations leaders, data engineers, and product managers after engaging with the material in this guide. Each answer is written to be direct and actionable.
A platform with national coverage should be able to classify a minimum of 80 to 100 distinct document types to avoid systematic data gaps. A platform targeting the highest data quality standards should build toward 150+ document types, accounting for state-specific instruments, historical variants, and multi-instrument filing formats.
Generic IDP tools typically classify 20 to 30 document types, a coverage gap that produces silent data errors in every county that uses instruments outside that narrow taxonomy.
The most common root cause is misclassification at the document level, before extraction begins. When a document is routed to the wrong extraction template (because the classification system identified it as a warranty deed when it is actually a quitclaim deed, or as a mortgage when it is actually a deed of trust), every extracted field is drawn from the wrong schema.
The result is extraction that looks successful but produces structurally incorrect outputs. Fixing extraction logic without fixing upstream classification logic does not resolve the underlying problem.
Pre-digital records require a three-stage approach: (1) image quality assessment to determine whether standard OCR is viable or whether enhanced preprocessing is needed; (2) document type classification based on structural layout rather than text extraction, since handwritten text is unreliable for classification; and (3) field extraction using form-aware models trained specifically on historical document layouts rather than modern instrument templates.
Platforms that route pre-digital records through the same pipeline as modern e-recorded documents will produce systematic extraction failures on historical documents typically affecting grantee names, legal descriptions, and instrument numbers most severely.
The execution date is the date on which a document was signed by the parties. The recording date is the date on which it was accepted and recorded by the county recorder. The gap between the two, which can range from same-day in e-recording jurisdictions to several months in backlogged county offices, creates a data currency problem.
A platform that indexes records by execution date will show transactions as occurring earlier than they are publicly knowable. A platform that indexes by recording date will accurately reflect when the public record was established. For any data product that supports title search, risk analysis, or transaction monitoring, recording date is the authoritative timestamp and the two dates must be captured and distinguished as separate fields.
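Capturing the two dates as separate fields makes the gap directly measurable. The short Python sketch below uses invented instrument numbers and dates, and treats a negative gap as a sign of an extraction error rather than a valid value.

```python
from datetime import date

def recording_gap_days(execution_date: date, recording_date: date) -> int:
    """Days between signing and public recording; negative suggests an extraction error."""
    return (recording_date - execution_date).days

instruments = [
    {"instrument_number": "2024-000200", "execution_date": date(2024, 3, 1), "recording_date": date(2024, 3, 15)},
    {"instrument_number": "2024-000150", "execution_date": date(2024, 3, 5), "recording_date": date(2024, 3, 5)},
]

gaps = {
    inst["instrument_number"]: recording_gap_days(inst["execution_date"], inst["recording_date"])
    for inst in instruments
}
# Index order follows the recording date, the authoritative timestamp.
by_recording_date = [
    inst["instrument_number"] for inst in sorted(instruments, key=lambda i: i["recording_date"])
]
print(gaps, by_recording_date)
```

Note that the instrument signed first is recorded second here, which is exactly the ordering difference that indexing by execution date would hide.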
Recording volume affects pipeline accuracy indirectly, through exception queue management. As volume increases, the absolute number of documents requiring manual review grows even if the misclassification rate stays constant. During refinance booms, mortgage instrument volume can increase 2 to 4 times above baseline, overwhelming review capacity and causing exceptions to be resolved less thoroughly or cleared without review.
The correct response to anticipated volume spikes is to reduce the misclassification rate before the spike occurs, through pipeline investment in document-type-aware classification, not to increase manual review capacity reactively. A platform that manages exceptions manually at baseline volume will not be able to scale that model during a volume event.
See title search turnaround time benchmarks for how volume spikes affect TAT across pipeline types.
PRIA compliance for a data platform output means structuring extracted data fields to conform to PRIA’s XML-based data interchange standards, using PRIA field names, data types, and value formats for the instruments PRIA has standardized.
It is worth pursuing for two reasons: (1) downstream analytics clients and title plant systems that are building PRIA-compliant ingestion pipelines will integrate with your data more easily, reducing client onboarding friction; and (2) PRIA alignment positions your platform for formal PRIA certification, which is an increasingly recognized quality signal among enterprise data operations buyers. PRIA compliance is not a binary state: platforms can begin with the highest-volume instruments (mortgage, deed, assignment) and expand coverage incrementally.
Confidence thresholds should be calibrated to the downstream consequence of an error in each field, not set uniformly across all fields. Fields with high downstream consequence (grantee name, grantor name, legal description, instrument number) should have high confidence thresholds (typically 90 to 95 percent) because errors in these fields propagate into ownership chains, search indexes, and client-facing data products.
Fields with lower downstream consequence (documentary transfer tax, consideration amount in non-arms-length transactions) can tolerate lower thresholds. The practical test: if an error in a given field would generate a client escalation or affect a title or risk analysis outcome, that field needs a high confidence threshold.
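Per-field calibration can be expressed as a small lookup table, as in the Python sketch below. The high-consequence values sit in the 90 to 95 percent range given above; the 0.75 figure for lower-consequence fields and the default are assumptions that each platform would tune for itself.

```python
# Thresholds keyed by field name. High-consequence fields use the 90-95% range
# from the text; the lower values are illustrative assumptions.
FIELD_THRESHOLDS = {
    "grantee": 0.95,
    "grantor": 0.95,
    "instrument_number": 0.95,
    "legal_description": 0.90,
    "documentary_transfer_tax": 0.75,
    "consideration_amount": 0.75,
}
DEFAULT_THRESHOLD = 0.90  # assumed fallback for fields not listed

def needs_review(field_name: str, confidence: float) -> bool:
    """True when the extraction falls below the threshold for its field."""
    return confidence < FIELD_THRESHOLDS.get(field_name, DEFAULT_THRESHOLD)

# The same score is acceptable for one field and not for another.
print(needs_review("grantee", 0.92), needs_review("documentary_transfer_tax", 0.92))
```

Keeping the thresholds in data rather than in code lets operations staff adjust them as escalation patterns change, without a pipeline release.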
The most effective validation approach is schema consistency monitoring: comparing the distribution of field values, null rates, and data types for each document type across all counties in the pipeline on a rolling basis. A warranty deed processed in Maricopa County, Arizona and one processed in Broward County, Florida should produce outputs with the same field structure, the same null handling, and comparable value distributions for fields like legal description length and grantee name format.
Systematic divergence between counties in any of these dimensions indicates normalization logic that is applying different rules to the same document type, a pipeline inconsistency that will show up as data quality variation in downstream client products.
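A first version of this monitoring can compare null rates alone. In the Python sketch below, the sample records and the 10-point tolerance are assumptions for illustration; value distributions and data types would be compared the same way.

```python
def null_rates(records, fields):
    """Share of records in which each field is null (None or absent)."""
    if not records:
        raise ValueError("records must be non-empty")
    total = len(records)
    return {f: sum(1 for r in records if r.get(f) is None) / total for f in fields}

def divergent_fields(rates_a, rates_b, tolerance=0.10):
    """Fields whose null rate differs between two counties by more than the tolerance."""
    return sorted(f for f in rates_a if abs(rates_a[f] - rates_b[f]) > tolerance)

FIELDS = ("grantee", "legal_description")

# Made-up normalized warranty deed outputs from two counties.
county_a = [{"grantee": "A", "legal_description": "LOT 1"}] * 4
county_b = [
    {"grantee": "B", "legal_description": "LOT 2"},
    {"grantee": None, "legal_description": "LOT 3"},
    {"grantee": None, "legal_description": "LOT 4"},
    {"grantee": "C", "legal_description": "LOT 5"},
]

rates_a = null_rates(county_a, FIELDS)
rates_b = null_rates(county_b, FIELDS)
flagged = divergent_fields(rates_a, rates_b)
print(rates_a, rates_b, flagged)
```

A flagged field is a prompt to inspect that county's mapping rules, not proof of an error; some divergence reflects genuine differences in what counties record.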
Robust platforms should index by both and link them. Grantor-grantee indexing is how county recording systems work and how chain of title is legally constructed; it is the authoritative structure for ownership history research. Parcel-level indexing is how property analytics applications work; it allows all instruments affecting a specific parcel to be retrieved regardless of party names.
The challenge is that not all counties provide assessor parcel numbers (APNs) in recorded documents, particularly in older records, so parcel linkage must be constructed through legal description matching or external parcel database joins. Platforms that support only one indexing method will systematically underserve clients whose use cases require the other.
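The dual-index idea can be sketched as a small in-memory structure. Everything here is hypothetical (the `Instrument` fields, the `DualIndex` class, and the whitespace-and-case name normalization); the sketch only shows that the same instrument is reachable by party name and by parcel, and that instruments lacking an APN are queued for later linkage rather than dropped.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instrument:
    instrument_number: str
    doc_type: str
    grantor: str
    grantee: str
    apn: Optional[str] = None  # absent in many older recorded documents


class DualIndex:
    """Index instruments by party name and by parcel, sharing the same objects."""

    def __init__(self):
        self.by_party = defaultdict(list)   # normalized name -> [(role, Instrument)]
        self.by_parcel = defaultdict(list)  # APN -> [Instrument]
        self.unlinked = []                  # needs legal-description or external join

    @staticmethod
    def _norm(name):
        # Simplistic normalization (assumption): uppercase, collapse whitespace.
        return " ".join(name.upper().split())

    def add(self, inst):
        self.by_party[self._norm(inst.grantor)].append(("grantor", inst))
        self.by_party[self._norm(inst.grantee)].append(("grantee", inst))
        if inst.apn:
            self.by_parcel[inst.apn].append(inst)
        else:
            self.unlinked.append(inst)
```

The `unlinked` list is where legal description matching or an external parcel database join would plug in; that step is deliberately left out of the sketch.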
Multi-instrument filings, in which two or more legal instruments are recorded as a single package (such as a deed and mortgage recorded simultaneously at closing), require a pipeline capable of document segmentation before classification. A pipeline that treats the entire multi-instrument package as a single document will apply one classification and one extraction template to what is actually two or more distinct instruments, producing incomplete or structurally merged outputs.
Document segmentation, the ability to identify and separate individual instruments within a multi-document submission, is a prerequisite for accurate classification and extraction of bundled filings. It is one of the capabilities that most clearly differentiates real estate-specific IDP systems from generic document processing tools.
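As a rough illustration of segmentation before classification, the sketch below splits a package of page texts wherever a page opens with a standalone instrument title. This is a deliberately naive heuristic, not how any particular IDP system works: the `TITLES` map, the five-line header window, and the function names are assumptions, and real systems typically combine layout, recording stamps, and learned classifiers.

```python
# Hypothetical title lines that mark the start of a new instrument.
TITLES = {
    "WARRANTY DEED": "warranty_deed",
    "QUITCLAIM DEED": "quitclaim_deed",
    "DEED OF TRUST": "deed_of_trust",
    "MORTGAGE": "mortgage",
}


def detect_title(page_text, header_lines=5):
    """Return a doc type if one of the first lines is exactly a known title."""
    for line in page_text.splitlines()[:header_lines]:
        key = " ".join(line.upper().split())
        if key in TITLES:
            return TITLES[key]
    return None


def segment(pages):
    """Group consecutive pages into instruments.

    A new segment starts on any page with a detected title; pages without a
    title are attached to the preceding segment.
    """
    segments = []
    for page_no, text in enumerate(pages, start=1):
        doc_type = detect_title(text)
        if doc_type is not None or not segments:
            segments.append({"doc_type": doc_type, "pages": [page_no]})
        else:
            segments[-1]["pages"].append(page_no)
    return segments
```

Requiring the title to occupy a whole line keeps body text such as "this Mortgage is given to..." from triggering a false split. Each resulting segment can then be classified and extracted with its own template.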
The diversity of U.S. property record types is not going to simplify. More than 3,000 county jurisdictions, each with its own recording conventions, document taxonomies, and digitization timelines, represent a structural reality that data platforms must build around rather than wait to see resolved.
The platforms that deliver analytics-ready property data at scale are those that have treated document-type awareness as a foundational architecture decision rather than an ongoing exception-handling problem. They classify each incoming instrument by type before extraction, apply targeted field logic for each instrument category, normalize outputs to consistent schemas across all counties, and score the confidence of every field they deliver to downstream clients.
The document types covered in this guide (conveyance instruments, mortgage and financing instruments, lien and encumbrance documents, foreclosure records, and ancillary record types) each carry specific structural characteristics, state-level variations, and field-level data quality risks. Building classification logic that accounts for those characteristics is what separates a pipeline that produces accurate data from one that produces plausible-looking data that fails at the field level when clients test it against primary sources.
AI-powered document classification has made this level of document-type specificity achievable at scale. Platforms that have not yet built classification logic around the full range of U.S. property record types are competing with those that have, and the difference shows up directly in the accuracy metrics, client retention rates, and operational cost structures that determine long-term viability in the real estate data market.
The following terms are defined as they are used in U.S. property recording and real estate data processing contexts.
The full state-by-state recording instrument reference table, covering all 50 U.S. states with primary security instrument, foreclosure type, and e-recording status, is included in Section 7 of this guide.
For county-level recording format detail, consult the PRIA County E-Recording Directory and individual county recorder websites. County recorder contact information and recording format status are updated periodically by PRIA and by state recorder associations.
The following quick reference table summarizes the mandatory fields, county-variable fields, and most common error patterns for the primary real estate document types. Use this table as a validation reference when defining output schemas or reviewing extraction logic for a new county coverage area.
| Document Type | Mandatory Fields | County-Variable Fields | Top Error Pattern |
|---|---|---|---|
| Warranty Deed | Grantor, Grantee, Legal Description, Recording Date, Instrument # | APN, Consideration Amount, Transfer Tax | Legal description truncation in OCR |
| Quitclaim Deed | Grantor, Grantee, Legal Description, Recording Date | Consideration ($0 nominal) | Grantor/grantee role confusion |
| Grant Deed (CA) | Grantor, Grantee, Legal Description, Recording Date | Consideration, Documentary Transfer Tax | Misclassified as Warranty Deed |
| Trustee’s Deed | Trustee, Grantee, Property Description, Sale Date, Recording Date | Opening Bid, Beneficiary Name | Confused with standard conveyance deed |
| Mortgage | Borrower, Lender, Loan Amount, Recording Date | Interest Rate, Lien Position, Maturity Date | Lien position absent; maturity date OCR error |
| Deed of Trust | Trustor, Beneficiary, Trustee, Loan Amount, Recording Date | Interest Rate, Lien Position | Trustee name variation across chain documents |
| Assignment of Mortgage | Assignor, Assignee, Original Instr. Ref., Recording Date | Loan Amount, Property Address | Broken instrument reference chain |
| Release / Satisfaction | Releasing Party, Original Instr. Ref., Recording Date | Loan Amount, Payoff Date | Unmatched release due to instrument # format change |
| Federal Tax Lien | Taxpayer Name, IRS Serial #, Tax Period, Recording Date | Property Address | Taxpayer name variant match failure |
| Mechanic’s Lien | Claimant, Property Owner, Property Description, Amount, Date | License #, Work Completion Date | State label variation; mislabeled as other lien |
| Judgment Lien | Plaintiff, Defendant, Case #, Court, Amount, Recording Date | Property Address | Missed when filed under debtor name, not property address |
| HOA Lien | HOA Name, Property Owner, Amount, Recording Date | Assessment Period, Property Address | Inconsistent county filing category |
| Lis Pendens | Plaintiff, Defendant, Case #, Court, Recording Date | Property Address, Loan Reference | Case # format variation by county |
| Notice of Default | Trustor, Beneficiary, Trustee, Default Amount, Recording Date | Loan Reference, Property Address | Default amount OCR error |
| Notice of Trustee Sale | Trustee, Property Description, Sale Date, Opening Bid, Date | Beneficiary, Loan Reference | Sale date extraction error in scanned documents |
| REO / Sheriff’s Deed | Grantor (Trustee/Sheriff), Grantee, Property Description, Date | Sale Price, Judgment Reference | Not linked to foreclosure chain; treated as new conveyance |
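
The mandatory-field column of the table above can be encoded as a validation reference. The sketch below covers only four of the document types, and the snake_case field keys and the `missing_mandatory` helper are hypothetical names chosen for illustration; an actual output schema would define its own.

```python
# Mandatory fields transcribed from the quick reference table for a subset
# of document types. The snake_case keys are assumed naming, not a standard.
MANDATORY_FIELDS = {
    "Warranty Deed": {
        "grantor", "grantee", "legal_description",
        "recording_date", "instrument_number",
    },
    "Quitclaim Deed": {
        "grantor", "grantee", "legal_description", "recording_date",
    },
    "Mortgage": {
        "borrower", "lender", "loan_amount", "recording_date",
    },
    "Deed of Trust": {
        "trustor", "beneficiary", "trustee", "loan_amount", "recording_date",
    },
}


def missing_mandatory(doc_type, record):
    """Return the sorted mandatory fields that are absent or empty in record."""
    try:
        required = MANDATORY_FIELDS[doc_type]
    except KeyError:
        raise ValueError(f"no mandatory-field rule for document type: {doc_type!r}")
    return sorted(f for f in required if record.get(f) in (None, ""))
```

Raising on an unknown document type, rather than returning an empty list, keeps an unclassified instrument from silently passing validation when coverage is extended to a new county.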