AI agents ‘underperform’ on confidentiality handling

July 9, 2025

1067

New research led by Salesforce AI scientist Kung-Hsiang Huang highlights key limitations in the current use of AI agents within customer relationship management (CRM) workflows — particularly in B2B settings where data privacy and multi-step coordination are critical.

The study found that large language model (LLM) agents had a 58% success rate on simple, single-step CRM tasks but struggled significantly with complexity. When tasks required multiple steps or follow-up actions, the success rate dropped to just 35%.

Weaknesses in multi-step execution

The biggest issue? AI agents faltered when they needed to proactively gather information or clarify incomplete inputs — something that’s commonplace in B2B marketing and customer engagement. According to the report, agents were more effective when they engaged in clarification dialogues, suggesting that models still lack the initiative to resolve ambiguity unless prompted.

This insight is particularly relevant for B2B marketers using AI in lead qualification, support workflows, or customer onboarding, where information is often incomplete or evolving.

Confidentiality concerns

A more pressing concern for marketers, however, is the agents’ poor handling of sensitive data. The study revealed that LLMs, including those used in agent-based workflows, show “low confidentiality awareness.” In short, they do not intuitively understand what counts as private or how to manage it responsibly.

While marketers can attempt to correct this via prompts that instruct the AI to avoid sharing or using sensitive information, the solution is unreliable. The safeguards tend to degrade over time, especially in longer conversations. The report also found that open-source models performed particularly poorly on this front, struggling to follow nuanced, layered instructions.

For B2B brands working with personally identifiable information (PII), proprietary business data or confidential client insights, these findings raise serious concerns about compliance, security and brand risk.

Where agents do perform well

Despite these issues, LLM agents showed strong performance in structured workflow execution — such as executing predefined actions or pulling simple reports — with a success rate of up to 83% for single-turn tasks. That suggests opportunities still exist for using AI in tightly scoped, low-risk CRM functions, provided guardrails are in place.

Key implications for B2B marketers

Caution with sensitive data: Do not deploy LLM agents in workflows involving PII, financial data, or proprietary business information without robust testing and safeguards.
Avoid complex conversations: Limit the use of AI agents to simpler, single-action tasks unless your model is specifically designed to handle iterative problem-solving.
Prompt design matters: Effective prompting and periodic reinforcement can improve outcomes — but they are not a fail-safe, especially for long or multi-turn interactions.
Use in structured workflows: Agents can add value in standardised, repeatable marketing or sales support tasks — such as sending follow-ups or tagging leads — where decision-making is limited.

The bottom line

Despite their rapid evolution, AI agents are not yet ready to replace humans in CRM tasks that require contextual judgment, confidentiality, or flexible dialogue. For B2B marketers, particularly those working with high-value accounts or sensitive data, the message is clear: tread carefully, and don’t confuse automation with maturity.

Want the latest B2B marketing news straight to your inbox? Subscribe to our free weekly newsletter!

Interested in sales, marketing or business skills courses and training? Check out our training partner, Learning Room.