As the tech sector races to develop and deploy a crop of powerful new AI chatbots, their widespread adoption has ignited a new set of data privacy concerns among some companies, regulators and industry watchers.
Some companies, including JPMorgan Chase (JPM), have clamped down on employees’ use of ChatGPT, the viral AI chatbot that first kicked off Big Tech’s AI arms race, due to compliance concerns related to employees’ use of third-party software.
It only added to mounting privacy worries when OpenAI, the company behind ChatGPT, disclosed it had to take the tool offline temporarily on March 20 to fix a bug that allowed some users to see the subject lines from other users’ chat history.
The same bug, now fixed, also made it possible “for some users to see another active user’s first and last name, email address, payment address, the last four digits (only) of a credit card number, and credit card expiration date,” OpenAI said in a blog post.
And just last week, regulators in Italy issued a temporary ban on ChatGPT in the country, citing privacy concerns after OpenAI disclosed the breach.
A ‘black box’ of data
“The privacy considerations with something like ChatGPT cannot be overstated,” Mark McCreary, the co-chair of the privacy and data security practice at law firm Fox Rothschild LLP, told CNN. “It’s like a black box.”
With ChatGPT, which launched to the public in late November, users can generate essays, stories and song lyrics simply by typing up prompts.
Google and Microsoft have since rolled out AI tools of their own that work the same way: they are powered by large language models trained on vast troves of online data.
When users input information into these tools, McCreary said, “You don’t know how it’s then going to be used.” That raises particular concerns for companies. As more and more employees casually adopt these tools to help with work emails or meeting notes, McCreary said, “I think the opportunity for company trade secrets to get dropped into these different various AIs is just going to increase.”
Steve Mills, the chief AI ethics officer at Boston Consulting Group, similarly told CNN that the biggest privacy concern that most companies have around these tools is the “inadvertent disclosure of sensitive information.”
“You’ve got all these employees doing things which can seem very innocuous, like, ‘Oh, I can use this to summarize notes from a meeting,’” Mills said. “But in pasting the notes from the meeting into the prompt, you’re suddenly, potentially, disclosing a whole bunch of sensitive information.”
If the data people input is being used to further train these AI tools, as many of the companies behind the tools have stated, then you have “lost control of that data, and somebody else has it,” Mills added.
OpenAI also published a new blog post Wednesday outlining its approach to AI safety. “We don’t use data for selling our services, advertising, or building profiles of people — we use data to make our models more helpful for people,” the blog post states. “ChatGPT, for instance, improves by further training on the conversations people have with it.”
“These sample conversations are reviewable by trained reviewers and kept for up to 3 years, separately from your Google Account,” Google states in a separate FAQ for Bard, its AI chatbot. The company also warns: “Do not include info that can be used to identify you or others in your Bard conversations.” The FAQ also states that Bard conversations are not being used for advertising purposes, and “we will clearly communicate any changes to this approach in the future.”
Google also told CNN that users can “easily choose to use Bard without saving their conversations to their Google Account.” Bard users can also review their prompts and delete their Bard conversations. “We also have guardrails in place designed to prevent Bard from including personally identifiable information in its responses,” Google said.
“We’re still sort of learning exactly how all this works,” Mills told CNN. “You just don’t fully know how information you put in, if it is used to retrain these models, how it manifests as outputs at some point, or if it does.”
Mills added that sometimes users and developers don’t realize the privacy risks lurking in new technologies until it’s too late. One example he cited was early autocomplete features, some of which had unintended consequences, such as completing a Social Security number that a user had begun typing — often to the user’s alarm and surprise.
Ultimately, Mills said, “My view of it right now, is you should not put anything into these tools you don’t want to assume is going to be shared with others.”