Rethinking Third-Party Risk in the Age of AI

Traditional vendor risk management is a point-in-time snapshot that's outdated in weeks. I built a system using specialized AI agents that assesses security, compliance, and financial stability across our entire vendor portfolio—without expensive commercial tools.

AI · Third-Party Risk

Andrew Brosman

2/11/2026 · 11 min read

Traditional vendor risk management has a fatal flaw: your assessment expires the moment you complete it. You spend weeks gathering questionnaires, reviewing certifications, and analyzing SOC 2 reports, only to produce a risk assessment that represents a single point in time. By the time your vendor makes it through procurement, that carefully compiled risk profile might already be obsolete. Meanwhile, your vendor portfolio keeps growing, and the idea of reassessing hundreds of relationships annually becomes a resourcing fantasy. Many compliance frameworks advocate continuous monitoring as the answer, but expensive tooling puts that level of oversight out of reach for many companies.

In the age of AI, the walls are coming down and security governance teams can build a better third-party risk management system to solve these problems. Not as a replacement for human judgment, but as a way to conduct more thorough reviews and make continuous, comprehensive vendor monitoring actually feasible. What started as an experiment in using AI to accelerate manual searches evolved into a swarm of specialized agents that continuously assess and monitor vendor risk across multiple dimensions. Here's how it works and what I learned building it.

The Problem with Traditional Third-Party Risk Management

Many organizations ask for a SOC 2 during vendor onboarding and then... hope for the best. While compliance reports and certifications have their place, they provide limited, point-in-time assurance by their very nature. Some companies manage third-party risk through large questionnaires, testing their vendors' knowledge and capability to handle security threats. This approach has its own flaws, though, chief among them a trust-without-verification mentality, and that's if (a huge if) the questionnaire gets completed at all. Effectiveness aside, questionnaires can also become blockers in procurement, creating an endless back-and-forth until the form is filled out to the customer's satisfaction.

More mature organizations may use specialized third-party risk management applications to do all of the above, with a layer of continuous monitoring on top. Applications like BitSight or SecurityScorecard help manage the vendor lifecycle and also provide alerts for security incidents and relevant vulnerabilities. I'm not knocking these tools; they solve many of the issues with traditional third-party risk management. But the underlying logic of these systems is not overly complicated, and the same capabilities can be built in-house, with customization that a commercial application's interface would otherwise limit.

Breaking Down Vendor Risk into Dimensions

Vendor risk isn't one thing; it's a composite of distinct risk attributes that can be evaluated independently. Traditional vendor risk management typically looks at likelihood and impact to arrive at a vendor risk score or tier. I believe there's more value in breaking these simplifications apart into distinct risk attributes. These can be customized to fit your needs, but some to consider are:

Data Sensitivity - What type of data does this vendor access? PII? Payment information? Health records? Each category carries different regulatory obligations and breach implications.

Business Criticality - How essential is this vendor to operations? Is this a SaaS platform running your customer service, or a one-off consultant from last year? Criticality affects both the impact of vendor failure and how much due diligence is justified.

Security Posture - What's the vendor's demonstrated commitment to security? Certifications matter, but so do breach history, public security practices, and how they respond to incidents.

Financial Stability - Can this vendor maintain service? A critical vendor entering bankruptcy is a different risk than a low-stakes relationship with a struggling startup.

Regulatory Compliance - Does the vendor operate in a regulated industry? Are they subject to oversight that provides assurance? Have they demonstrated compliance with relevant frameworks?

Operational Resilience - How does the vendor handle business continuity? Do they have redundancy? Geographic concentration risk? Disaster recovery capabilities?

Here's the key insight: each of these dimensions requires different evaluation methods and sources. Security posture might be assessed through vulnerability databases and security news. Financial stability needs earnings reports and credit ratings. Compliance status requires regulatory filings and certification databases.
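To make that concrete, here's a minimal sketch of how the dimensions could be represented in code, each carrying its own sources and a default weight. The source lists and weight values are illustrative placeholders, not the numbers from our system.

```python
from dataclasses import dataclass


@dataclass
class RiskDimension:
    """One independently evaluated attribute of vendor risk."""
    name: str
    sources: list[str]   # where evidence for this dimension comes from
    base_weight: float   # default weight before any context adjustment


# Dimension names from the framework above; sources and weights are assumptions.
DIMENSIONS = [
    RiskDimension("data_sensitivity", ["data inventory", "contract scope"], 0.20),
    RiskDimension("business_criticality", ["service catalog", "owner interviews"], 0.15),
    RiskDimension("security_posture", ["CVE databases", "breach filings", "security news"], 0.20),
    RiskDimension("financial_stability", ["earnings reports", "credit ratings"], 0.15),
    RiskDimension("regulatory_compliance", ["regulatory filings", "certification databases"], 0.15),
    RiskDimension("operational_resilience", ["status pages", "outage history"], 0.15),
]

# Default weights should sum to 1 before any contextual re-weighting.
assert abs(sum(d.base_weight for d in DIMENSIONS) - 1.0) < 1e-9
```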

Trying to build one AI model to evaluate all of this is like asking one person to simultaneously be a security researcher, financial analyst, and compliance auditor. Better to have specialists.

From AI Assistant to Agent Swarm

Phase 1: AI-Assisted Search

Initially, I just wanted AI to help standardize vendor research. Instead of each analyst Googling vendors differently and making subjective judgments, I built a system where AI would conduct structured searches and summarize findings against our risk framework.

This worked. It was faster than manual research and more consistent. But it was still just an augmented manual process—a helpful assistant who could search quickly but needed constant direction.

Phase 2: Realizing Generic Doesn't Cut It

The limitation became obvious when I watched the AI assess a key software vendor. It did a competent job finding security certifications and recent news. But it treated compliance the same way it treated general security posture—as just another checkbox. It didn't understand how to weigh SOC 2 compliance in the context of the business criticality of the software, or for that matter, what the software did for the company.

Different risk dimensions don't just need different data sources; they need different evaluation logic, different weightings of evidence, and different understanding of what "good" looks like.

Phase 3: Building Specialized Agents

I rebuilt the system as a collection of specialized agents, each owning one risk dimension:

The Security Agent monitors vulnerability databases, security news feeds, breach notification filings, and GitHub repos for exposed credentials. It understands that a credential leak is different from a phishing incident, which is different from a ransomware attack. It knows which security certifications are meaningful versus theater.

The Financial Agent tracks earnings reports, credit ratings, funding announcements, and leadership changes. It recognizes warning signs like declining revenue, executive turnover, or down rounds. It distinguishes between a startup's expected burn rate and an established company's cash flow.

The Compliance Agent monitors regulatory filings, certification statuses, and audit reports. It knows which regulations apply to which industries and which certifications are relevant for which data types.

The Operational Resilience Agent assesses business continuity practices, infrastructure dependencies, and geographic concentration risk. It tracks outages, understands the difference between planned maintenance and reliability issues, and monitors for concentration risk in the vendor's own supply chain.

Each agent is essentially an AI specialist with domain-specific knowledge, curated sources, evaluation criteria tailored to what matters in its domain, and the ability to recognize nuance and context.
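In code, the specialist pattern can be as simple as a base class that each agent subclasses with its own sources and domain instructions. This is a sketch under assumptions: `llm` stands in for whatever model client you use (anything exposing a `complete(prompt)` call), and the prompt here is a compressed illustration of the real domain expertise.

```python
from abc import ABC, abstractmethod


class RiskAgent(ABC):
    """A domain specialist: curated sources plus domain-specific evaluation logic."""
    dimension: str
    sources: list[str]

    def __init__(self, llm):
        self.llm = llm  # assumed client exposing complete(prompt) -> str

    @abstractmethod
    def system_prompt(self) -> str:
        """Domain expertise encoded as instructions to the model."""

    def assess(self, vendor: str, evidence: list[str]) -> str:
        # Combine domain instructions with evidence gathered from this
        # agent's curated sources, then let the model evaluate.
        prompt = (
            f"{self.system_prompt()}\n\nVendor: {vendor}\n"
            f"Evidence from {', '.join(self.sources)}:\n"
            + "\n".join(f"- {e}" for e in evidence)
        )
        return self.llm.complete(prompt)


class SecurityAgent(RiskAgent):
    dimension = "security_posture"
    sources = ["CVE databases", "breach notification filings", "security news"]

    def system_prompt(self) -> str:
        return ("You are a security researcher. A credential leak, a phishing "
                "incident, and a ransomware attack carry different severities; "
                "weigh certification substance over certification theater.")
```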

How It Actually Works

Initial Assessment

When a new vendor enters the system, each specialized agent conducts its domain-specific research. The Security Agent searches for breach history and certifications. The Financial Agent pulls credit information and funding status. The Compliance Agent identifies applicable regulations and verification status.

Each agent produces a dimensional score and a structured summary of findings. These aren't just numbers—they include the reasoning, the sources consulted, and the specific evidence that informed the assessment.
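One way to structure that output is a small record per dimension. The field names and the 0-10 scale below are my assumptions, not a spec:

```python
from dataclasses import dataclass


@dataclass
class DimensionalAssessment:
    dimension: str       # e.g. "security_posture"
    score: float         # 0 (low risk) to 10 (high risk); scale is illustrative
    confidence: float    # 0-1: how sure the agent is, given available evidence
    reasoning: str       # why the agent scored it this way
    sources: list[str]   # everything consulted, for the audit trail
    evidence: list[str]  # the specific findings that informed the score
```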

Aggregation and Risk Scoring

The dimensional scores feed into an overall risk calculation, but not through simple averaging. The weights are dynamic based on context:

  • A vendor handling payment data gets higher weight on Security and Compliance

  • A mission-critical vendor gets higher weight on Financial Stability and Operational Resilience

  • A vendor in a regulated industry gets higher weight on Compliance

  • A vendor with access to sensitive data gets higher weight on Security

This produces a composite risk score, but more importantly, it produces a risk profile—a multi-dimensional view that shows exactly where the risks concentrate.
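A hedged sketch of that context-aware weighting: start from base weights, bump the dimensions the vendor's context makes riskier, then renormalize. The adjustment sizes are illustrative, not tuned values from our system.

```python
# Base weights per dimension; values are illustrative assumptions.
BASE_WEIGHTS = {
    "data_sensitivity": 0.20, "business_criticality": 0.15,
    "security_posture": 0.20, "financial_stability": 0.15,
    "regulatory_compliance": 0.15, "operational_resilience": 0.15,
}


def contextual_weights(handles_payments: bool, mission_critical: bool,
                       regulated: bool, sensitive_data: bool) -> dict[str, float]:
    """Shift weight toward the dimensions this vendor's context makes riskier."""
    w = dict(BASE_WEIGHTS)
    if handles_payments:
        w["security_posture"] += 0.10
        w["regulatory_compliance"] += 0.10
    if mission_critical:
        w["financial_stability"] += 0.10
        w["operational_resilience"] += 0.10
    if regulated:
        w["regulatory_compliance"] += 0.05
    if sensitive_data:
        w["security_posture"] += 0.05
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}  # renormalize so weights sum to 1


def composite_score(dimension_scores: dict[str, float],
                    weights: dict[str, float]) -> float:
    """Weighted composite; the full dimensional profile is kept alongside it."""
    return sum(weights[d] * s for d, s in dimension_scores.items())
```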

Continuous Monitoring

I built a monitoring bot that runs continuous searches across all vendors, looking for material changes in news, security incidents, regulatory actions, and financial events. It's pretty simple: alerts trigger when keywords indicate something significant has changed.
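A minimal sketch of that bot, assuming you have some `search(query)` function and an alerting channel to inject; the keyword lists are illustrative, not the real watchlists.

```python
import time

# Illustrative keyword lists; a real deployment would tune these per dimension.
ALERT_KEYWORDS = {
    "security": ["data breach", "ransomware", "credentials exposed"],
    "financial": ["bankruptcy", "down round", "layoffs"],
    "regulatory": ["enforcement action", "consent decree", "fine"],
}


def scan_vendor(vendor, search, notify):
    """search(query) -> list[str] and notify(msg) are stand-ins for whatever
    search API and alerting channel are available."""
    for dimension, keywords in ALERT_KEYWORDS.items():
        for kw in keywords:
            for hit in search(f'"{vendor}" "{kw}"'):
                notify(f"[{dimension}] {vendor}: matched '{kw}' -> {hit}")


def run_forever(vendors, search, notify, interval_s=86400):
    """One pass per day across the whole portfolio."""
    while True:
        for vendor in vendors:
            scan_vendor(vendor, search, notify)
        time.sleep(interval_s)
```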

Here's where this gets interesting: this general monitoring approach could be expanded into the specialized agent swarm I described earlier. Instead of one bot monitoring everything generically, you could build out dedicated agents for each risk dimension:

The Security Agent could run targeted daily searches for breach disclosures, CVE publications, certification status changes, and vulnerability reports specific to each vendor's technology stack.

The Financial Agent could monitor earnings releases, credit rating changes, executive departures, funding announcements, and bankruptcy filings—distinguishing between routine quarterly reports and material financial events.

The Compliance Agent could watch for regulatory enforcement actions, certification lapses, audit report publications, and changes in applicable regulations for each vendor's industry.

The Operational Resilience Agent could track service outages, infrastructure changes, geographic expansion or consolidation, and supply chain disruptions that might affect vendor reliability.

Each specialized agent wouldn't just flag events—it would reassess its dimension of risk and produce an updated evaluation. If the change is material enough, it could trigger a full reassessment across all dimensions. A vendor suffers a breach? The Security Agent picks it up within hours, reassesses security posture, and potentially escalates the overall risk score. A critical vendor announces layoffs? The Operational Resilience Agent flags the potential service delivery risk.
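The escalation logic itself can stay simple. A hedged sketch, assuming each agent exposes a `reassess()` method and some `full_reassessment()` hook exists:

```python
MATERIALITY_THRESHOLD = 1.5  # illustrative: score delta that warrants escalation


def handle_event(vendor: str, agent, event: str, previous_score: float,
                 full_reassessment) -> float:
    """On a flagged event, reassess only the agent's own dimension; escalate
    to a full cross-dimension reassessment if the change is material."""
    new_score = agent.reassess(vendor, event)  # dimension-level re-evaluation
    if abs(new_score - previous_score) >= MATERIALITY_THRESHOLD:
        full_reassessment(vendor)  # e.g. a breach triggers a full review
    return new_score
```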

The beauty of starting with a general monitoring bot is that you can incrementally specialize it. Begin with broad keyword monitoring across all risk types, then gradually split off dedicated agents as you identify which dimensions need deeper, more nuanced evaluation. The infrastructure is the same—continuous, event-driven monitoring—but the intelligence layer gets progressively smarter as you add domain expertise to each agent.

Here's What Changed

Speed and Scale

Vendor assessments that used to take hours now need perhaps an hour of analyst review after the agents complete their research, freeing up resources for daily check-ins. More importantly, we can now actually monitor our full vendor portfolio continuously. Previously, we might reassess our top 50 vendors annually. Now every vendor is monitored every day, with the system automatically escalating those that show concerning changes.

Earlier Detection and Better Visibility

We caught a security incident at a critical supplier that no one else was aware of. The monitoring bot flagged the incident, notified the security team, and mobilized a response. Previously, we might not have known until the vendor's annual review or until they notified us (which they're required to do, but delays happen). Timing matters: we were able to immediately review what data they had access to, implement additional monitoring, and engage them before they'd even notified us.

Better Resource Allocation

The system doesn't replace human expertise; it directs it. Instead of analysts spending time on routine vendor research, they're focusing on high-risk situations flagged by the agents. Instead of annual reviews being a checkbox exercise, they're focused conversations about specific risk dimensions that matter.

Defensible Decision-Making

Every risk assessment is documented with sources, reasoning, and evidence. When procurement asks why we're flagging a vendor, we can show exactly what the Security Agent found, which sources it consulted, and why that matters for our risk framework. When auditors ask how we monitor third parties, we can demonstrate continuous monitoring with an audit trail of agent assessments and updates.

Every rating has to have a confidence score—an easy way for an analyst to identify if the AI is unsure of its findings. Low confidence scores either mean the agent needs better sources or the vendor needs manual investigation. This prevents the system from making confident-sounding assessments based on insufficient data and helps analysts know where to focus their validation efforts.

This transparency also makes the system easier to calibrate. When an agent's assessment seems off, we can examine its reasoning and adjust its evaluation criteria or sources. The system gets smarter over time.

The Challenges

Source Reliability

Not all information sources are equally reliable, and agents can be gullible. It's important to build in source credibility weighting and teach the agents to distinguish between verified press releases and random claims on X.

Different domains have different source reliability hierarchies. The Financial Agent should trust SEC filings over news articles. The Security Agent should trust CVE databases over blog posts. Building these hierarchies into each agent's evaluation logic was essential.
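Encoding those hierarchies can be as simple as per-domain credibility tiers that discount each finding. The tiers and multipliers below are assumptions for illustration, not calibrated values.

```python
# Per-domain source hierarchies; tier values are illustrative, not calibrated.
SOURCE_CREDIBILITY = {
    "financial": {"sec_filing": 1.0, "credit_rating": 0.9, "news_article": 0.5},
    "security": {"cve_database": 1.0, "breach_filing": 0.9, "blog_post": 0.4},
}


def weighted_evidence(domain: str, findings: list[tuple[str, float]]) -> float:
    """findings: (source_type, raw_signal in 0-1). Each signal is discounted
    by how much this domain trusts its source; unknown sources get 0.2."""
    tiers = SOURCE_CREDIBILITY[domain]
    return sum(tiers.get(source, 0.2) * signal for source, signal in findings)
```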

False Positives and Alert Fatigue

When you monitor everything continuously, you catch everything—including a lot of noise. A routine leadership change can be flagged as financial instability. Minor security vulnerabilities in tangential systems can create Security Agent alerts.

We had to tune each agent's threshold for what constitutes a material change. This is ongoing calibration—too sensitive and analysts ignore alerts, too conservative and you miss real risks.
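In practice that calibration lives in per-dimension thresholds you keep adjusting. A sketch, with values that are placeholders rather than recommendations:

```python
# Per-dimension materiality thresholds; calibration is ongoing.
# Lower = more sensitive (more alerts); higher = more conservative.
MATERIALITY = {
    "security_posture": 0.5,      # err sensitive: missed breaches are costly
    "financial_stability": 1.5,   # routine leadership changes shouldn't alert
    "regulatory_compliance": 1.0,
    "operational_resilience": 1.0,
}


def is_material(dimension: str, score_delta: float) -> bool:
    return abs(score_delta) >= MATERIALITY[dimension]
```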

The Aggregation Problem

How do you combine six different dimensional risk scores into one overall score that actually means something? Simple averaging doesn't work because dimensions aren't equally important. Weighted averaging requires knowing the right weights, which vary by context.

Our solution is to maintain both a composite score (using context-aware weighting) and the full dimensional profile. The composite score helps with prioritization and trending, but analysts always have access to the underlying dimensional scores and can override when context demands it.

Handling Uncertainty

Agents don't always find complete information. A new startup might have no financial history. A foreign vendor might have limited public security information. Early on, agents would sometimes hallucinate information to fill gaps or make confident assessments based on insufficient evidence.

We had to explicitly teach agents to express uncertainty, to document when information is unavailable, and to distinguish between "we assessed this and found no issues" versus "we couldn't find information to assess this." Risk scoring now explicitly accounts for information availability (confidence scores)—insufficient data is itself a risk factor.
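One simple way to encode "insufficient data is itself a risk factor" is to blend low-confidence scores toward a conservative baseline. The penalty value below is an illustrative assumption:

```python
def confidence_adjusted_score(score: float, confidence: float,
                              unknown_penalty: float = 3.0,
                              max_score: float = 10.0) -> float:
    """Push the score toward 'riskier' as confidence drops, so a vendor we
    couldn't assess never looks as safe as one we assessed and cleared."""
    return min(max_score, score + (1.0 - confidence) * unknown_penalty)
```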

What This Means for Third-Party Risk Management

Traditional TPRM treats vendor assessment as a discrete project: gather information, analyze it, make a decision, file the report. This made sense when gathering information was labor-intensive and analysis was manual. But when AI can gather and synthesize information continuously, the limiting factor is no longer data collection.

The job of a TPRM program should be to continuously monitor all vendors and direct human attention to where it's needed. The specialized agent swarm makes this possible because each agent can maintain awareness of its domain across the entire vendor portfolio, freeing up analysts to focus on validation of AI analysis, deeper dives into flagged vendors, and escalation.

Previously, comprehensive monitoring was impractical—you could only afford to do it for your highest-risk vendors. Now, monitoring scales independently of vendor count. The constraint is building good agents and tuning them properly, not the volume of vendors to monitor.

Lessons for Building AI Risk Systems

If you're considering building something similar, here's what I'd prioritize:

Start with the risk framework, not the AI. The dimensional risk model came first. The AI is just the mechanism for evaluating against that framework. If your risk framework is poorly defined, AI won't fix it—it'll just scale the confusion. Spend a lot of time here to get the weights right and to ensure the output makes sense.

Specialization beats generalization. One agent trying to do everything will be mediocre at all of it. Specialized agents, each excellent in their domain, produce better results even if aggregating them is more complex. It reduces hallucination and makes error checking that much easier.

Build transparency into the system from the start. Every assessment should show its sources and reasoning. This isn't just for audit trails—it's how you debug and improve agent performance.

Tune for the right error profile. False negatives (missing real risks) are more dangerous than false positives (over-flagging). Tune your agents to be sensitive, then build escalation workflows that handle the noise.

Continuous monitoring requires continuous calibration. The agents need ongoing tuning as you learn what works. Source reliability changes. Risk frameworks evolve. New threats emerge. This isn't a build-it-and-forget-it system.

Humans still decide. The agents inform decisions; they don't make them. Every material risk finding should go through human review. The goal is to make humans more effective, not to replace judgment with automation.

What's Next

As AI gets better, a TPRM system built on it will only get better. Right now, the system is reactive: it detects changes and reassesses risk. The next frontier is predictive: identifying risk trajectories before they materialize.

Eventually, I want the system to be extremely specific and prescriptive: not just "this vendor is high risk" but "this vendor is high risk because of X, and you should consider Y and Z as mitigation." I want the mitigations to be realistic ways for teams to decrease risk, ranked by effort to implement. The specialized agents already understand their domains well enough to suggest remediation. The challenge is aggregating those domain-specific recommendations into coherent, prioritized action plans.
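Since this part isn't built yet, consider the structure below purely hypothetical: aggregation could start with something as plain as ranking each agent's suggested mitigations by risk reduced per unit of effort.

```python
from dataclasses import dataclass


@dataclass
class Mitigation:
    action: str            # e.g. "require SSO and scoped API tokens"
    risk_reduction: float  # estimated composite-score reduction if implemented
    effort_days: float     # rough implementation cost for the team


def prioritize(mitigations: list[Mitigation]) -> list[Mitigation]:
    """Rank domain agents' suggestions by risk reduced per unit of effort."""
    return sorted(mitigations,
                  key=lambda m: m.risk_reduction / m.effort_days, reverse=True)
```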

The ultimate goal is a TPRM system that doesn't just alert you to problems but tells you exactly what to do about them, ranked by what's actually feasible for your team to implement. AI makes third-party risk management scale and frees up analyst time for the work that matters: understanding risk and implementing mitigation measures.