How Attackers Are Adding AI Voice Cloning to Microsoft Teams Attacks

https://thehackernews.com/expert-insights/2026/06/how-attackers-are-adding-ai-voice.html

Publish Date: 2026-06-08 03:03:00

Source Domain: thehackernews.com

Microsoft Teams’ cross-tenant collaboration feature, which lets external accounts message employees directly, is enabled by default in most enterprise deployments, and most organizations have never audited or restricted it. That default has become one of the more reliable social engineering entry points security teams now have to defend.
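A first audit of that setting can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `ExternalAccessPolicy` and `audit_external_access` are hypothetical names, and the fields are a simplified model of a tenant's external-access settings, not Microsoft's actual property names. In practice the values would be read from the Teams admin center or Microsoft's management tooling.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalAccessPolicy:
    """Hypothetical, simplified model of a tenant's Teams external-access settings.

    Field names are illustrative assumptions, not Microsoft's property names.
    """
    allow_external_federation: bool = True  # open by default, per the article
    allowed_domains: list[str] = field(default_factory=list)  # empty = any domain
    allow_consumer_accounts: bool = True  # assumed toggle for unmanaged accounts


def audit_external_access(policy: ExternalAccessPolicy) -> list[str]:
    """Return human-readable findings for settings that leave the tenant open."""
    findings = []
    if policy.allow_external_federation and not policy.allowed_domains:
        findings.append("External access is open to every domain; consider an allowlist.")
    if policy.allow_consumer_accounts:
        findings.append("Unmanaged consumer accounts can contact employees.")
    return findings


if __name__ == "__main__":
    # The default-constructed policy mirrors the open-by-default posture.
    for finding in audit_external_access(ExternalAccessPolicy()):
        print("-", finding)
```

Restricting external access to an explicit list of partner domains closes the open-to-anyone path while keeping legitimate cross-tenant collaboration working.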

The base attack is straightforward. An attacker creates an external Teams account, identifies a target through LinkedIn or a company directory, and sends a message posing as IT helpdesk staff. The message cites an urgent account issue (an MFA problem, a security alert, a failed login) and asks the employee to open Quick Assist, a built-in Microsoft remote assistance tool, and approve a session.
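The lure follows a recognizable pattern: an external sender, a helpdesk persona, an urgent account problem, and a request to start Quick Assist. A defender might screen for that combination. The Python below is a hypothetical heuristic, not a product feature. The `TeamsMessage` record, the `looks_like_helpdesk_lure` function, and the keyword lists are all illustrative assumptions, and keyword matching alone will miss reworded lures.

```python
import re
from dataclasses import dataclass

# Illustrative keyword patterns; a real deployment would tune these to its own traffic.
HELPDESK_TERMS = re.compile(r"\b(help ?desk|it support|service desk|it team)\b", re.IGNORECASE)
URGENCY_TERMS = re.compile(
    r"\b(mfa|security alert|failed login|account (issue|locked)|urgent)\b", re.IGNORECASE
)
REMOTE_ASSIST_TERMS = re.compile(
    r"\b(quick ?assist|remote (session|assistance)|security code)\b", re.IGNORECASE
)


@dataclass
class TeamsMessage:
    """Hypothetical message record: the sender's tenant plus the message text."""
    sender_tenant: str
    text: str


def looks_like_helpdesk_lure(msg: TeamsMessage, home_tenant: str) -> bool:
    """Flag external messages that combine helpdesk impersonation, urgency,
    and a request to start a remote-assistance session."""
    if msg.sender_tenant == home_tenant:
        return False  # internal messages are out of scope for this heuristic
    patterns = (HELPDESK_TERMS, URGENCY_TERMS, REMOTE_ASSIST_TERMS)
    return all(p.search(msg.text) for p in patterns)
```

Requiring all three signals, and only from outside the home tenant, trades some recall for fewer false positives on ordinary internal IT conversations.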

What has changed recently is the layer added on top of that initial contact: an AI-generated voice that sounds like someone the target already knows.

How the Base Attack Chain Unfolds

Once Quick Assist access is established, the attack follows a consistent sequence:

  1. Reconnaissance via Command Prompt and PowerShell to map the environment and assess access level
  2. Malware delivery through DLL side-loading, hiding malicious code inside trusted application processes to avoid detection
  3. Persistence through registry modifications that survive reboots
  4. Command-and-control over HTTPS, blending with normal web traffic
  5. Lateral movement via Windows Remote Management (WinRM) toward domain controllers
  6. Data collection and exfiltration using tools like Rclone, automatically prioritizing high-value files
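Each stage leaves process activity behind, so the chain can be approximated as an ordered-sequence match over one host's endpoint telemetry. This sketch assumes a hypothetical `ProcessEvent` record and illustrative executable names for each stage. It covers only the stages that map cleanly to a process (remote access, reconnaissance, lateral movement, exfiltration). It ignores DLL side-loading, registry persistence, and HTTPS command-and-control, which need other telemetry.

```python
from dataclasses import dataclass

# Illustrative executable names per stage; assumptions, not an exhaustive IOC list.
STAGES = [
    ("remote_access", {"quickassist.exe"}),
    ("recon", {"cmd.exe", "powershell.exe"}),
    ("lateral_movement", {"winrs.exe", "wsmprovhost.exe"}),
    ("exfiltration", {"rclone.exe"}),
]


@dataclass
class ProcessEvent:
    """Hypothetical process-start event from a single host."""
    timestamp: float  # seconds since epoch
    image: str        # executable name, e.g. "powershell.exe"


def matched_stages(events: list[ProcessEvent], window_s: float = 3600) -> list[str]:
    """Return the stages seen in order, starting from a remote-access session.

    Each stage must occur after the previous one and within window_s seconds
    of the first (remote-access) event; matching stops at the first miss.
    """
    matched: list[str] = []
    start = None
    stage_idx = 0
    for ev in sorted(events, key=lambda e: e.timestamp):
        if stage_idx == len(STAGES):
            break  # full chain already matched
        name, images = STAGES[stage_idx]
        if ev.image.lower() not in images:
            continue  # not the stage we are waiting for
        if start is None:
            start = ev.timestamp  # anchor the window on the remote-access event
        elif ev.timestamp - start > window_s:
            break  # next stage arrived too late to count
        matched.append(name)
        stage_idx += 1
    return matched
```

Requiring the stages in order, and within a window of the initial Quick Assist session, keeps routine administrator PowerShell use from matching on its own. The one-hour default window is an arbitrary assumption.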

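Apart from the side-loaded payload, every tool here is legitimate: built-in Windows utilities such as Quick Assist, PowerShell, and WinRM, plus widely available tools such as Rclone. No novel exploit or custom malware is required. The attack’s effectiveness rests entirely on a human decision made in the first few minutes of contact.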

The AI Voice Layer

AI voice synthesis tools can generate voice replicas from recorded audio samples. Quality and required sample length vary across tools; production-quality results typically require between 30 seconds and several minutes of…