Cybersecurity

How Do Data Owners Say No? A Case Study of Data Consent Mechanisms in Web-Scraped Vision-Language AI Training Datasets


Researchers have found that web-scraped datasets used to train vision-language AI models often disregard data owners' wishes regarding consent, raising ethical concerns and legal risks such as copyright infringement lawsuits. Analyzing DataComp, a dataset of 12.8 billion text-image pairs, the study reveals that many samples come from sites whose terms prohibit scraping, and that many carry copyright notices or watermarks. The findings indicate that current AI data collection practices need to improve to respect data owners' consent.

Read the full article at arXiv cs.CR (Cryptography & Security)


Singapore turns AI scrutiny towards chatbots, personal data, and digital twins

Singapore is enhancing regulations around the use of personal data in generative AI, requiring clearer consent from users and transparency from companies. This move aims to build trust and set a standard for responsible AI usage across Southeast Asia...

Ali Nemati

Sustainability & Climate · Jun 22 · 28 sec read

America's data center backlash is bipartisan - can it stay that way?

Opposition to new data centers, driven by concerns about energy consumption, water scarcity, and land use, is emerging as a bipartisan issue across the U.S. This backlash is impacting project development and reshaping local and state politics, as com...

Ali Nemati

Cybersecurity · Apr 13 · 26 sec read

Citizen Lab: Webloc tracked 500M devices for global law enforcement

Citizen Lab reported that law enforcement agencies globally used the Webloc tool to track up to 500 million devices via advertising data, raising serious privacy and legal concerns. This practice allows extensive surveillance without adequate oversig...

Ali Nemati

Tech & Gadgets · Mar 30 · 24 sec read

OkCupid settles FTC case on alleged misuse of its users' personal data

OkCupid has settled a long-standing FTC lawsuit over alleged sharing of user data with Clarifai without proper disclosure or consent. This settlement underscores the importance for tech professionals to adhere strictly to privacy policies and transpa...

Ali Nemati

Cybersecurity · Mar 10 · 23 sec read

Law enforcement disrupts Tycoon 2FA phishing-as-a-service platform

Law enforcement disrupted Tycoon 2FA, a phishing-as-a-service platform responsible for sending millions of fraudulent emails to over 500,000 organizations globally each month. This action significantly reduces the risk of account takeovers and follow...

Ali Nemati
