Your engineering team is talented. They can build anything. So when someone suggests building internal scrapers for Secretary of State verification, the instinct is to say yes. After all, how hard can it be to pull data from 50 government websites?
The answer, as one Tier 1 prospect described it: "Obviously really complicated... maintenance nightmare."
The hidden costs of building internal scrapers extend far beyond initial development. They include ongoing maintenance as states change their websites, the opportunity cost of engineering time not spent on core product features, and the operational risk when a critical integration breaks during high-volume periods. The build vs. buy decision for verification infrastructure isn't really about whether your team can build it. It's about whether they should.
The 50-State Problem
Building a single state integration sounds manageable. Building 50 of them—each with different website architectures, data formats, response times, and change frequencies—is a fundamentally different challenge.
Every state is different
Secretary of State websites vary dramatically:
• Technology stacks: Some states use modern frameworks with structured APIs. Others run legacy systems from the 1990s with inconsistent HTML.
• Data formats: Business names, registration numbers, status codes, and officer information are formatted differently in every state.
• Authentication requirements: Some states require session management, CAPTCHA solving, or rate limiting that complicates automated access.
• Response times: Pennsylvania might return results in 2 seconds. California might take 90 seconds during peak hours.
A scraper built for Delaware won't work in Texas. The logic that parses Nevada's results will break on New York's data structure. You're not building one integration—you're building 50 independent systems that happen to serve the same purpose.
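To make that concrete, here is a minimal sketch of what 50-state coverage implies in code: a separate parser per state behind a shared interface. Every selector, field name, and response format below is hypothetical, not taken from any real state site.

```python
# Illustrative sketch only: state codes are real, but every field name,
# selector, and response format below is hypothetical.
import json
import re
from dataclasses import dataclass

@dataclass
class BusinessRecord:
    name: str
    registration_number: str
    status: str

def parse_delaware(raw: str) -> BusinessRecord:
    # Hypothetical: a state that returns structured JSON
    data = json.loads(raw)
    return BusinessRecord(data["entityName"], data["fileNumber"], data["status"])

def parse_texas(raw: str) -> BusinessRecord:
    # Hypothetical: a state that returns legacy HTML you have to regex apart
    name = re.search(r"Entity Name:\s*([^<]+)", raw).group(1).strip()
    number = re.search(r"Filing #\s*(\d+)", raw).group(1)
    status = re.search(r"Status:\s*(\w+)", raw).group(1)
    return BusinessRecord(name, number, status)

# ...48 more parsers, each with its own quirks, selectors, and failure modes
PARSERS = {"DE": parse_delaware, "TX": parse_texas}

def verify(state: str, raw_response: str) -> BusinessRecord:
    return PARSERS[state](raw_response)
```

Each parser carries its own assumptions about the page it scrapes, which is exactly why a change in one state never helps you in another.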
The maintenance multiplier
According to research from Crawlbase, website structure changes are among the most persistent challenges in web scraping: "Websites often change their HTML structure and API endpoints... These frequent changes hinder scrapers from carrying out their tasks."¹
For Secretary of State websites specifically, changes happen constantly:
• Redesigns: States periodically modernize their business search interfaces
• Security updates: New CAPTCHA implementations, rate limiting, or bot detection
• Data structure changes: Field names, page layouts, and result formats evolve
• Infrastructure changes: URL structures, session handling, and error responses shift
GroupBWT's analysis of scraping challenges found that "scraper breakage rates have climbed sharply. In some industries, 10–15% of crawlers now require weekly fixes due to DOM shifts, fingerprinting, or endpoint throttling."²
Apply that breakage rate to 50 state integrations: 5-8 scrapers breaking every week, each requiring engineering attention to diagnose and fix.
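A quick back-of-the-envelope check of where that estimate comes from:

```python
# Back-of-the-envelope: GroupBWT's 10-15% weekly breakage rate, applied
# to 50 independent state scrapers.
total_scrapers = 50
for rate in (0.10, 0.15):
    print(f"{rate:.0%} weekly breakage -> {rate * total_scrapers:.1f} scrapers broken per week")
# 10% weekly breakage -> 5.0 scrapers broken per week
# 15% weekly breakage -> 7.5 scrapers broken per week
```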
The True Cost Calculation
The build vs. buy decision comes down to total cost of ownership, not just initial development. Most teams dramatically underestimate the ongoing costs.
Development costs
Building 50 state integrations from scratch requires:
• Initial development: 2-4 weeks per state for research, development, and testing = 100-200 weeks of engineering time
• Edge case handling: Each state has quirks that emerge only in production
• Error handling: Timeout logic, retry mechanisms, fallback strategies
• Data normalization: Converting 50 different data formats into a consistent schema
At a median software developer salary of $133,080 per year according to the U.S. Bureau of Labor Statistics³, plus benefits and overhead (typically 1.25-1.4x base salary), the fully-loaded cost of one engineer is approximately $166,000-$186,000 annually.
Even at the low end of that range, two engineer-years for the initial build = $332,000-$372,000 before you've processed a single verification.
Maintenance costs
The rule of thumb in software development: maintenance costs 15-20% of initial development costs annually. According to product leadership analysis, "For a feature that took 6 months to build, that's 1-2 months of engineering time every year."⁴
For 50 state integrations:
• Breakage response: 5-8 hours per week diagnosing and fixing broken scrapers
• Monitoring: Automated alerts, manual verification, quality assurance
• Updates: Adapting to state website changes, new security measures, data format shifts
• Documentation: Keeping internal knowledge current as systems evolve
Conservative estimate: 0.5 FTE dedicated to scraper maintenance = $83,000-$93,000 annually, indefinitely.
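A rough cost model pulling those figures together; every number is an estimate derived from the ranges above, not a quote.

```python
# Rough cost model using the figures above; every number is an estimate.
BASE_SALARY = 133_080                    # BLS median software developer salary
OVERHEAD = (1.25, 1.40)                  # benefits + overhead multiplier

fully_loaded = [round(BASE_SALARY * m, -3) for m in OVERHEAD]
print(fully_loaded)                      # [166000, 186000] per engineer, per year

initial_build = [2 * c for c in fully_loaded]           # ~2 engineer-years up front
maintenance_per_year = [0.5 * c for c in fully_loaded]  # 0.5 FTE, indefinitely
print(initial_build)                     # [332000, 372000]
print(maintenance_per_year)              # [83000.0, 93000.0]
```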
Opportunity cost
Every hour your engineers spend fixing a California scraper is an hour they aren't building features that differentiate your product. This is the cost that doesn't appear on any budget line but determines competitive outcomes.
As one prospect put it, that means "throwing engineers on verification" instead of core product development. That's not just a cost; it's a strategic mistake.
The fintech and lending space moves fast. The time your team spends maintaining verification infrastructure is time your competitors spend building better underwriting models, faster funding workflows, and superior borrower experiences.
The "One Change in California" Problem
Here's what the build vs. buy calculation misses until you've lived it: the operational risk of scraper dependency.
Scenario: California redesigns
California processes more business filings than any other state. Your lending operation depends on California verifications for 20% of your volume. On a Tuesday morning, California launches a website redesign.
Your scraper breaks. Every California verification fails. Your underwriting queue backs up. Deals that should fund today are stuck waiting for manual verification. Your team scrambles to reverse-engineer the new site structure, test fixes, and deploy updates.
How long does this take? Days, not hours. And during that time, your funding pipeline is clogged.
The cascading failure
Scraper failures rarely happen in isolation. When California breaks, your engineering team focuses on California. Meanwhile:
• Pennsylvania quietly changed a CSS class, and nobody noticed
• Nevada added rate limiting, and your requests are being throttled
• Florida is returning partial data that looks complete downstream
By the time you fix California, three other states need attention. This is the "maintenance nightmare" in action.
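Catching those silent failures requires its own monitoring layer. Here is a minimal sketch of the kind of health checks involved; the field names and rules are illustrative, not a real implementation.

```python
# Sketch of the silent-failure checks 50 scrapers end up needing.
# Required fields and thresholds are illustrative only.
REQUIRED_FIELDS = {"name", "registration_number", "status"}

def looks_healthy(results: list[dict]) -> bool:
    if not results:
        return False                  # a quiet CSS change often yields zero matches
    for record in results:
        if REQUIRED_FIELDS - record.keys():
            return False              # partial data that "looks complete" downstream
    return True

def looks_throttled(status_code: int, elapsed_seconds: float) -> bool:
    # Rate limiting shows up as 429s or responses that suddenly crawl
    return status_code == 429 or elapsed_seconds > 60
```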
Redundancy requires resources
Robust systems need redundancy. For scrapers, that means:
• Multiple parsing strategies per state
• Fallback data sources where available
• Monitoring and alerting infrastructure
• On-call engineering rotation
Building this redundancy multiplies your development and maintenance costs—or you accept the operational risk of single points of failure across all 50 states.
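At a high level, that redundancy looks something like the sketch below: try each strategy in order, fall back where possible, and page someone when everything fails. The strategy and alert callables are placeholders, not a real implementation.

```python
# High-level sketch of per-state redundancy: try each strategy in order,
# fall back where possible, and alert when everything fails.
import logging

logger = logging.getLogger("sos_verification")

def verify_with_fallbacks(state: str, query: str, strategies, alert):
    """strategies: ordered callables, e.g. primary parser, backup parser,
    alternate data source. alert: pages the on-call engineer."""
    failures = []
    for strategy in strategies:
        try:
            return strategy(state, query)
        except Exception as exc:      # broad by design: any breakage triggers fallback
            failures.append(f"{strategy.__name__}: {exc}")
            logger.warning("Strategy %s failed for %s: %s", strategy.__name__, state, exc)
    alert(state, failures)            # every layer failed: page someone
    raise RuntimeError(f"All verification strategies failed for {state}")
```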
What "Not Our Core Business" Really Means
The phrase "not our core business" appears repeatedly when Tier 1 prospects discuss verification infrastructure. It's not an admission of incapability. It's strategic clarity.
The competitive advantage question
Ask yourself: Does building Secretary of State scrapers give you a competitive advantage?
If you're a lending platform, your competitive advantage comes from:
• Better risk assessment models
• Faster funding decisions
• Superior borrower experience
• Efficient capital deployment
Verification infrastructure is necessary but not differentiating. Every lender needs it. No lender wins deals because their SOS scraper is 5% faster than competitors'.
The resource allocation question
Engineering talent is expensive and scarce. According to Forrester's research cited in industry analysis, "67% of failed software implementations stem from incorrect build vs. buy decisions."⁵
The question isn't whether your team can build verification infrastructure. The question is whether verification infrastructure is the highest-value use of their time.
For most lending operations, the answer is no. The highest-value use of engineering time is building the proprietary systems that actually differentiate your business.
The API Alternative
What does buying look like in practice?
Integration complexity
A well-designed verification API reduces 50 state integrations to one:
• Single endpoint: One API call handles all states
• Normalized data: Consistent response format regardless of source
• Handled edge cases: Timeout logic, retries, and error handling built in
• Maintained by specialists: When California changes, someone else fixes it
Integration time: Days to weeks, not months to years.
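As a sketch, the integration surface collapses to a single call. The endpoint, parameters, and response fields below are hypothetical placeholders, not any specific vendor's API.

```python
# Hypothetical single-endpoint verification call. The URL, parameters, and
# response fields are placeholders, not any specific vendor's API.
import requests

def verify_business(name: str, state: str, api_key: str) -> dict:
    response = requests.get(
        "https://api.example-verification.com/v1/business/search",  # placeholder URL
        params={"name": name, "state": state},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()  # same normalized schema regardless of which state answered

record = verify_business("Acme Capital LLC", "CA", api_key="YOUR_KEY")
print(record.get("status"), record.get("registration_number"))
```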
Cost comparison
Compare the total cost of ownership:
Build:
• Initial development: $332,000-$372,000 (2 engineer-years)
• Annual maintenance: $83,000-$93,000 (0.5 FTE)
• 5-year TCO: $747,000-$837,000
• Plus: Opportunity cost of diverted engineering focus
Buy:
• Integration development: $10,000-$25,000 (1-2 weeks of engineering)
• Annual API costs: Variable based on volume, typically $24,000-$120,000/year
• 5-year TCO: $130,000-$625,000
• Plus: Engineering time freed for core product work
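The same comparison as a quick calculation, using the estimated ranges above:

```python
# Quick 5-year TCO comparison using the estimated ranges above.
YEARS = 5

build_initial = (332_000, 372_000)
build_maintenance = (83_000, 93_000)       # per year
build_tco = tuple(i + YEARS * m for i, m in zip(build_initial, build_maintenance))

buy_integration = (10_000, 25_000)
buy_api = (24_000, 120_000)                # per year, volume-dependent
buy_tco = tuple(i + YEARS * a for i, a in zip(buy_integration, buy_api))

print(f"Build 5-year TCO: ${build_tco[0]:,}-${build_tco[1]:,}")   # $747,000-$837,000
print(f"Buy 5-year TCO:   ${buy_tco[0]:,}-${buy_tco[1]:,}")       # $130,000-$625,000
```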
The math favors buying for almost every lending operation except the very largest with dedicated infrastructure teams.
Risk transfer
When you buy, you transfer operational risk. If California's website changes, it's not your emergency. If Pennsylvania implements new bot detection, you don't diagnose it. If a state adds CAPTCHA requirements, you don't solve them.
The vendor's job is maintaining coverage. Your job is building your business.
When Building Makes Sense
To be clear: building in-house isn't always wrong. It makes sense when:
• Verification is your core product: If you're building a KYB platform, the infrastructure is the business
• Scale justifies dedicated teams: Processing millions of verifications monthly with dedicated infrastructure engineering
• Unique requirements exist: Proprietary data sources, specialized processing, or competitive differentiation from verification itself
• You have existing infrastructure: Teams already maintaining large-scale scraping operations
For most lending operations processing thousands to tens of thousands of verifications monthly, these conditions don't apply. The economics favor buying.
Making the Decision
The 2024 Standish Group CHAOS study found that "more than 35% of large enterprise custom software initiatives are abandoned, and only 29% are delivered successfully."⁶
Those odds should inform your build vs. buy analysis. Custom development projects fail at alarming rates, and verification infrastructure is particularly prone to underestimated complexity.
The decision framework
Build if:
• Verification is core to your competitive advantage
• You have dedicated infrastructure engineering capacity
• Your scale justifies the fixed costs
• You're prepared for ongoing maintenance commitment
Buy if:
• Verification is necessary but not differentiating
• Engineering resources are better deployed on core product
• You need coverage faster than you can build it
• Operational risk transfer has value
For most lenders, the answer is buy.
Comparing Your Options
If you've decided to buy, the next question is which solution fits your needs. The verification market includes direct-to-source providers like Cobalt, broader KYB platforms, and database aggregators. Each has different strengths, coverage models, and pricing structures.
For a detailed comparison of how different verification providers stack up on coverage, data freshness, and pricing, see our analysis on comparing Cobalt vs Middesk features.
Closing the Loop
The build vs. buy decision connects directly to operational outcomes. Teams that build verification infrastructure spend years developing and maintaining systems that don't differentiate their business. Teams that buy free their engineering resources to focus on what matters.
The "maintenance nightmare" isn't hypothetical. It's the lived experience of teams who underestimated the complexity of 50-state coverage and now dedicate engineering resources to scraper upkeep instead of product innovation.
The alternative: treat verification as infrastructure you buy, not software you build. Use API-based verification to eliminate the operational burden, then redirect engineering focus to automating manual underwriting tasks that actually move the needle on your business.
See how Cobalt automates this →
Sources
• Crawlbase | 10 Web Scraping Challenges (+ Solutions) in 2025
• GroupBWT | Web Scraping Challenges & Compliance in 2025
• U.S. Bureau of Labor Statistics | Software Developers: Occupational Outlook Handbook
• Medium | The Product Leader's Guide to Buying vs. Building Software
• Full Scale | Build vs. Buy Software Development: A Comprehensive Decision Framework for 2025
• Neontri | Build vs. Buy Software: A 3-Model Decision Framework