Employees Are Feeding Sensitive Biz Data to ChatGPT, Raising Security Fears

Robert Lemos

Contributing Writer, Dark Reading

Employees are submitting sensitive business data and privacy-protected information to large language models (LLMs) such as ChatGPT, raising concerns that artificial intelligence (AI) services could be incorporating the data into their models, and that information could be retrieved at a later date if proper data security isn't in place for the service.

In a recent report, data security service Cyberhaven detected and blocked requests to input data into ChatGPT from 4.2% of the 1.6 million workers at its client companies because of the risk of leaking confidential information, client data, source code, or regulated information to the LLM.

In one case, an executive cut and pasted the firm's 2023 strategy document into ChatGPT and asked it to create a PowerPoint deck. In another case, a doctor input his patient's name and their medical condition and asked ChatGPT to craft a letter to the patient's insurance company.
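
Products like Cyberhaven's work by inspecting content before it leaves the organization. As a rough, hypothetical sketch (not a description of Cyberhaven's actual detection logic), a pre-submission filter might scan pasted text for patterns associated with regulated or confidential data and block the request:

```python
import re

# Illustrative patterns only; a real DLP product uses classifiers and data
# fingerprinting rather than a handful of regular expressions.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "confidential_marking": re.compile(r"\b(confidential|internal only|trade secret)\b", re.I),
}

def check_outbound_prompt(text: str) -> list[str]:
    """Return the names of sensitive-data patterns found in a prompt."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)]

def submit_to_chatbot(text: str) -> None:
    hits = check_outbound_prompt(text)
    if hits:
        # Block the request and warn the user in context instead of forwarding it.
        print(f"Blocked: prompt appears to contain {', '.join(hits)}.")
        return
    print("Prompt allowed; forwarding to the LLM service...")

# Hypothetical example prompt resembling the doctor's case above.
submit_to_chatbot("Patient Jane Doe, SSN 123-45-6789, needs a letter to her insurer.")
```

Real data loss prevention tools rely on far richer detection, but the control point is the same: the prompt is evaluated before it ever reaches the AI service.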

And as more employees use ChatGPT and other AI-based services as productivity tools, the risk will grow, says Howard Ting, CEO of Cyberhaven.

"There was this big migration of data from on-prem to cloud, and the next big shift is going to be the migration of data into these generative apps," he says. "And how that plays out [remains to be seen] — I think, we're in pregame; we're not even in the first inning."

With the surging popularity of OpenAI's ChatGPT and the foundational AI model behind it, the Generative Pre-trained Transformer (GPT-3), as well as other LLMs, companies and security professionals have begun to worry that sensitive data ingested as training data into the models could resurface when prompted by the right queries. Some are taking action: JPMorgan restricted workers' use of ChatGPT, for example, and Amazon, Microsoft, and Walmart have all issued warnings to employees to take care in using generative AI services.

And as more software firms connect their applications to ChatGPT, the LLM may be collecting far more information than users — or their companies — are aware of, putting them at legal risk, Karla Grossenbacher, a partner at law firm Seyfarth Shaw, warned in a Bloomberg Law column.

"Prudent employers will include — in employee confidentiality agreements and policies — prohibitions on employees referring to or entering confidential, proprietary, or trade secret information into AI chatbots or language models, such as ChatGPT," she wrote. "On the flip side, since ChatGPT was trained on wide swaths of online information, employees might receive and use information from the tool that is trademarked, copyrighted, or the intellectual property of another person or entity, creating legal risk for employers."

The risk is not theoretical. In a June 2021 paper, a dozen researchers from a Who's Who list of companies and universities — including Apple, Google, Harvard University, and Stanford University — found that so-called "training data extraction attacks" could successfully recover verbatim text sequences, personally identifiable information (PII), and other information in training documents from the LLM known as GPT-2. In fact, a piece of data needed to appear in only a single training document for the LLM to memorize it verbatim, the researchers stated in the paper.

Picking the Brain of GPT

Indeed, these training data extraction attacks are one of the key adversarial concerns among machine learning researchers. Also known as "exfiltration via machine learning inference," the attacks could gather sensitive information or steal intellectual property, according to MITRE's Adversarial Threat Landscape for Artificial-Intelligence Systems (ATLAS) knowledge base.

It works like this: By querying a generative AI system with carefully crafted prompts, an adversary can trigger the model to recall a specific piece of memorized training data rather than generate synthetic output. A number of real-world examples exist for GPT-3, the successor to GPT-2, including an instance where GitHub's Copilot recalled a specific developer's username and coding priorities.
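
To make the idea concrete, a hypothetical extraction probe might feed a model the start of a suspected training document and check whether the completion reproduces the rest verbatim. The endpoint, key, and document strings below are invented for illustration; this is a toy sketch of the attack pattern described in the June 2021 paper, not a working exploit against any particular service:

```python
import requests

API_URL = "https://api.example-llm.invalid/v1/complete"  # hypothetical endpoint
API_KEY = "sk-placeholder"  # hypothetical key

def probe_for_memorization(prefix: str, suspected_continuation: str) -> bool:
    """Ask the model to continue a prefix and check for verbatim recall."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": prefix, "max_tokens": 64, "temperature": 0.0},
        timeout=30,
    )
    completion = response.json().get("text", "")
    # A verbatim match with text the attacker suspects was in the training set
    # suggests the model memorized, rather than generated, the continuation.
    return suspected_continuation.strip() in completion

# Hypothetical example: does the model complete an internal document word for word?
leaked = probe_for_memorization(
    prefix="ACME Corp 2023 Strategy (CONFIDENTIAL): Our three pillars are",
    suspected_continuation="international expansion, vertical integration, and",
)
print("Possible memorization detected" if leaked else "No verbatim recall observed")
```

In practice, the researchers scaled this idea up by generating large volumes of model output, ranking samples by how confidently the model produced them, and then checking the top candidates against known data.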

Beyond GPT-based offerings, other AI-based services have raised questions as to whether they pose a similar risk. Automated transcription service Otter.ai, for instance, transcribes audio files into text, automatically identifying speakers and allowing important words to be tagged and phrases to be highlighted. The company's practice of storing that information in its cloud has raised concerns among journalists.

The company says it is committed to keeping user data private and has put strong compliance controls in place, according to Julie Wu, senior compliance manager at Otter.ai.

"Otter has completed its SOC2 Type 2 audit and reports, and we employ technical and organizational measures to safeguard personal data," she tells Dark Reading. "Speaker identification is account bound. Adding a speaker’s name will train Otter to recognize the speaker for future conversations you record or import in your account," but not allow speakers to be identified across accounts.

APIs Allow Fast GPT Adoption

The popularity of ChatGPT has caught many companies by surprise. More than 300 developers, according to the last published numbers from a year ago, are using GPT-3 to power their applications. For example, social media firm Snap and shopping platforms Instacart and Shopify are all using ChatGPT through the API to add chat functionality to their mobile applications.
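
The integration work behind those features can be minimal. As a simple sketch, assuming an OpenAI API key and the publicly documented chat completions endpoint (details that can change over time), adding chat to an application is essentially one HTTP call, which is part of why every piece of text a user types can end up flowing to the service:

```python
import os
import requests

# Assumes an OpenAI API key is set in the environment; the endpoint and model
# name are the publicly documented ones at the time of writing and may change.
API_KEY = os.environ["OPENAI_API_KEY"]

def ask_chatbot(user_message: str) -> str:
    """Send one user message to the chat completions API and return the reply."""
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user", "content": user_message}],
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

print(ask_chatbot("Draft a two-sentence product update for our customers."))
```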

Based on conversations with his company's clients, Cyberhaven's Ting expects the move to generative AI apps to only accelerate, with the tools used for everything from generating memos and presentations to triaging security incidents and interacting with patients.

As he says his clients have told him: "Look, right now, as a stopgap measure, I'm just blocking this app, but my board has already told me we cannot do that. Because these tools will help our users be more productive — there is a competitive advantage — and if my competitors are using these generative AI apps, and I'm not allowing my users to use it, that puts us at a disadvantage."

The good news is that education could have a big impact on whether data leaks from a specific company, because a small number of employees are responsible for most of the risky requests. Fewer than 1% of workers are responsible for 80% of the incidents of sending sensitive data to ChatGPT, says Cyberhaven's Ting.

"You know, there are two forms of education: There's the classroom education, like when you are onboarding an employee, and then there's the in-context education, when someone is actually trying to paste data," he says. "I think both are important, but I think the latter is way more effective from what we've seen."

In addition, OpenAI and other companies are working to limit LLMs' access to personal information and sensitive data: Asking ChatGPT for personal details or sensitive corporate information currently leads to canned statements declining to comply.

For example, when asked, "What is Apple's strategy for 2023?" ChatGPT responded: "As an AI language model, I do not have access to Apple's confidential information or future plans. Apple is a highly secretive company, and they typically do not disclose their strategies or future plans to the public until they are ready to release them."
