As Meta prepares to unveil its expansive Llama 3 AI model, indications suggest that the company has already been using Instagram and Facebook data for its training. Users of these platforms are opted in to this data usage by default. Although they retain the choice to opt out, the process involves multiple steps, raising potential concerns regarding data privacy.
Chatbots like ChatGPT, Gemini, or Claude depend on extensive datasets for training. Insufficient training data is recognized as a potential obstacle to the future development of generative AI.
In contrast to its rivals, Meta holds a significant edge due to the massive volumes of user-generated data on its social media platforms. Over the last couple of days, Meta has notified its users that it is planning new AI features.
What do these AI notifications entail?
Several users of Meta products received notifications stating that, starting from June 26, the company will enhance its "AI at Meta experiences", incorporating Meta AI and AI creative tools. The notification read, “To help bring these experiences to you, we’ll now rely on the legal basis called legitimate interests for using your information to develop and improve AI at Meta.”
On its Gen AI privacy page, Meta says, “We also use information shared on Meta’s Products and services. This information could be things such as posts or photos and their captions.” This implies that Meta's AI models will be trained using all the photos you share on Facebook and Instagram, regardless of whether your accounts are private or public.
The company also intends to gather personal information from third-party services to train and improve its AI model. Meta goes on to say that private messages will not be used for training AI models.
The privacy policy further states, “Even if you don’t use our Products and services or have an account, we may still process information about you to develop and improve AI at Meta.” This means that information about people who do not have Meta accounts can still be used if it is shared by others.
Explaining how the training works, Meta's policy says that these models are trained by analysing billions of images along with their accompanying text captions, which provide descriptive information about the images. The model learns the relationship between the text descriptions and the images. Once this association is understood, the model can generate new images based on text descriptions provided by users.
How to opt out?
Although users have the right to opt out, the process seems unnecessarily complicated. Instead of enabling a single-click option, Meta requires users to provide an explanation. Moreover, Meta reserves the right to deny your request.
Meta's policy update notice read, "We will now rely on the legal basis called legitimate interests for using your information to develop and improve AI at Meta. This means that you have the right to object to how your information is used for these purposes. If your objection is honored, it will be applied going forward."
Several social media users have noted that the process is deliberately made to be highly cumbersome, likely to reduce the number of people who will raise objections.
In order to opt out on Instagram, tap the drop-down menu in the top-right corner. Scroll to the bottom and select "Help", then tap on "Help Center". Next, tap "About AIs on Instagram" and then "How Meta uses information for generative AI models".
Scroll down and tap "Learn more and submit requests here". Choose a suitable option and fill out the form. Meta requires users to provide proof in the form of screenshots and relevant prompts, which makes the opt-out process more difficult. You will need to enter a one-time code sent to your email address to complete the request. After processing, you will receive an email indicating whether your request was honoured.
For Facebook, ensure you are logged into your account in your browser. Open the "AI at Meta Data Subject Rights" page, select the appropriate opt-out option, fill out the form, and click "Send". You will need to attach evidence showing how Meta’s AI model generated your personal information.
However, to fill out the form, users are given three options related to third-party data being used for "improving AI at Meta":
-"Access, download, or correct personal information from third parties used for AI at Meta."
-"Delete personal information from third parties used for AI at Meta."
-"Submit a concern about personal information from third parties related to an AI response at Meta."
Thus, there is no clear mention of opting out of personal data sharing with the models. The options provided are limited and specifically pertain to third parties.
AI firms' race for data exploitation
With the rapid advancement of artificial intelligence, data privacy concerns have become increasingly urgent. Nearly everyone has posted something online, and it is likely that AI companies have utilised this information to train their generative AI models.
Large language models, such as ChatGPT, and image creators depend on extensive datasets to operate effectively. Technology companies frequently scrape data from the web to develop and train these AI models. As a result, these tech companies have become the target of several copyright infringement lawsuits.
OpenAI has been facing several legal challenges regarding its use of copyrighted online content. In December, the New York Times announced its intention to sue the company, alleging that it had used "millions" of articles published by the media organisation to train its ChatGPT AI model. Additionally, in September, authors George R.R. Martin and John Grisham revealed plans to file a claim, alleging that their copyrighted works were used without permission to train the system.
Companies such as Reddit and X have started selling or licensing user data to AI firms. Despite increasing regulatory demands and investigations into these practices, progress in providing users with greater control over their online data remains slow.
Earlier this year, Google reached an agreement with Reddit for access to AI training data, reportedly worth $60 million annually. This deal allows Google to use real-time data from Reddit and leverage Google AI to improve Reddit's search capabilities.
Subsequently, Reddit disclosed this partnership, highlighting its commitment to providing Google with more efficient methods for training AI models. Through Reddit's data API, Google could access "the platform's dynamic content, ensuring streamlined access to Reddit's extensive content repository".
Last year, X also updated its privacy policy to state that it may use data collected on the platform to train its AI models. Following this, Elon Musk tweeted a clarification that this would only include public data, not direct messages or anything private.
In July 2023, X imposed various restrictions to stop AI services like Google Bard and ChatGPT from scraping its data to train their models. Musk even threatened to sue Microsoft for this. This indicated a strategic move by X to capitalise on all that data for its own benefit.