Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Hello - I'm EA-adjacent and have a cursory understanding of AI alignment issues. Thought I'd toss out a naive question!

AI systems rely on huge amounts of training data. Many people seem reluctant to share their data with these systems. How promising are efforts to limit or delay the power of AI systems by putting up legal barriers so that they can't scrape the internet for training data?

For example, I could imagine laws requiring anyone scraping the internet to ensure that they are not collecting data from people who have denied consent to have their data scraped. Even if few people deny consent in practice, the process of keeping their data out, or removing it later on, could be costly. This could at least buy time.

Answers

ChristianKl

Dec 11, 2022


This is basically the discussion at https://www.lesswrong.com/posts/vsuMu98Rwde5krxSJ/should-we-push-for-requiring-ai-training-data-to-be-licensed

If you can successfully argue in court that it's a copyright violation to train on data without having acquired a license, that would make scraping significantly harder.

Otherwise, European citizens whose names are known to an AI system could make GDPR requests to learn what data is stored about them, and then ask for that data to be deleted.

Comments
Dacyn

-"For example, I could imagine laws requiring anyone scraping the internet to ensure that they are not collecting data from people who have denied consent to have their data scraped."

In practice this is already the case: anyone who doesn't want their data scraped can put up a robots.txt file saying so, and I imagine big companies like OpenAI respect robots.txt. I guess there could be advantages to making it a legal rule, but I don't think it matters too much.
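To illustrate the robots.txt mechanism: a well-behaved scraper checks the site's robots.txt before fetching a page, and a site owner can disallow specific crawlers by user-agent. This minimal sketch uses Python's standard-library `urllib.robotparser`; the `GPTBot` user-agent (OpenAI's crawler) and the example URL are illustrative, not drawn from the thread.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks one named crawler but allows everyone else.
# (User-agent strings and paths here are illustrative.)
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Normally the parser is pointed at https://example.com/robots.txt via
# set_url() + read(); parse() lets us feed the rules directly instead.
rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A compliant scraper consults can_fetch() before downloading a page.
print(rp.can_fetch("GPTBot", "https://example.com/some-post"))       # blocked
print(rp.can_fetch("SomeOtherBot", "https://example.com/some-post")) # allowed
```

Note the limitation the comment hints at: robots.txt is purely advisory. Nothing in the protocol prevents a non-compliant scraper from ignoring it, which is why a legal rule would change the enforcement picture even if it changed little in practice for companies that already comply.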