this post was submitted on 16 Apr 2026

9 points (80.0% liked)


Are we committing crimes when scraping data online? For example, publicly available music? (jeferson.me)

submitted 1 month ago by Shin@piefed.social to c/programming@programming.dev


I'm trying to reason this through, but I've hit a limit.

I have the feeling that scraping the internet for publicly accessible data, for example open and public music on Spotify, wouldn't be a crime, but distributing it would be. By the same token, this is seen as a crime, while Google does the same thing and nothing happens. Even worse, if this gets regulated, Google would have a huge advantage over everyone else.

So, my deeper question is: "Is copyright dead?"


one_old_coder@piefed.social 17 points 1 month ago* (last edited 1 month ago)

It's illegal but all AI companies do it more than you'll ever do. You have my permission.

I still buy on Bandcamp because they deserve it.

TehPers@beehaw.org 1 point 1 month ago

It's illegal

Sauce? Also, where?

esc@piefed.social 12 points 1 month ago

I couldn't care less about copyright and 'crimes' of copying.

thingsiplay@lemmy.ml 1 point 1 month ago

So you don't care about GPL and Open Source then?

middlemanSI@lemmy.world -6 points 1 month ago

I guess you've never created an original anything? Maybe I read that wrong..

Luminous5481@anarchist.nexus 10 points 1 month ago

I've written more than one piece of software, and plenty of WordPress themes. I always release them without a license, for anyone to use however they want. Copyright is capitalist nonsense and only exists to gatekeep creative freedom and stifle innovation.

middlemanSI@lemmy.world 4 points 1 month ago

I hate capitalism and the way it devalues people, reducing them to consumers. The fact remains we live in it, and have to eat. If you release everything to AI crawlers, what do you eat, assuming you don't lay tiles for a living, which would make you "rich" but very busy..

eager_eagle@lemmy.world 1 point 1 month ago* (last edited 1 month ago)

You should release it into the public domain; unfortunately, without a license, others can't "legally" use anything just because they have access to it.

yeah, it's absolutely stupid capitalist theater

Luminous5481@anarchist.nexus 2 points 1 month ago

I don't care about laws or legality, and neither should you.

thingsiplay@lemmy.ml 3 points 1 month ago

I think everyone should care about laws and legality. It's bad advice to tell anyone not to care.

Shin@piefed.social 6 points 1 month ago

With the slow death of copyright, what else is left? And if it's not dead, how can we reclaim it? I have so many questions, and I can't focus on a single thing :(

thingsiplay@lemmy.ml 1 point 1 month ago

Copyright is not dying; that is what AI companies and those who don't care want you to believe, so that you stop caring too. Copyright is an important law around the world. Just because there are loopholes, current difficulties, and unclear areas does not mean it's dead or dying. It just means that (as always) it needs some adjustments and clarification to adapt to new technology.

Shin@piefed.social 1 point 1 month ago

I have the impression that copyright isn't for the "small guy", but for "big tech".

thingsiplay@lemmy.ml 0 points 1 month ago

It's also for the small guy, so the big guy can't steal your ideas and use them without compensation.

Shin@piefed.social 1 point 1 month ago

The logs on my blog say otherwise.

The logs on my git repository say otherwise too.

thingsiplay@lemmy.ml 0 points 1 month ago

What exactly do you mean?

Shin@piefed.social 0 points 1 month ago

That I have logs that prove some companies are scraping my posts and code when they should not.
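Logs like these are usually web-server access logs. As a minimal sketch, assuming the common Apache/Nginx "combined" log format and a hand-picked list of crawler user-agent tokens (both assumptions; the token list is illustrative, not exhaustive), this is one way to count requests from self-identified AI crawlers:

```python
import re
from collections import Counter

# Apache/Nginx "combined" log format:
# IP ident user [time] "request" status size "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Example crawler user-agent tokens (an assumption; adjust for your own logs).
CRAWLER_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")


def count_crawler_hits(lines):
    """Return a Counter of requests per crawler token seen in the user-agent field."""
    hits = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line.strip())
        if match is None:
            continue  # not a combined-format line; skip it
        agent = match.group("agent").lower()
        for token in CRAWLER_TOKENS:
            if token.lower() in agent:
                hits[token] += 1
                break  # count each request at most once
    return hits
```

Pass it any iterable of lines, such as an open log file. Keep in mind that the user-agent header is self-reported: this only counts crawlers that announce themselves, and a scraper can claim to be anything.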

thingsiplay@lemmy.ml 1 point 1 month ago

I know they do this. But it does not change the fact that copyright is for the small guy AND for the big guys.

Shin@piefed.social 1 point 1 month ago

I don't get it, then. I've explicitly said on my blogs and pages: "do not scrape". My robots.txt also explicitly states: "bots, do not enter". And yet they scrape my data, even when it's clear that this isn't welcome. So my copyright is already violated: I can get parts of my text out of Gemini and OpenAI, so it's already in their systems.

Copyright is already broken. Are you suggesting that I should try to sue them? I don't follow, man... really sorry.
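For context on the robots.txt point: robots.txt is an advisory convention, so it only has an effect if the crawler chooses to check it. The sketch below uses Python's standard `urllib.robotparser` to show how a well-behaved crawler would read a robots.txt that blocks two AI crawlers (`GPTBot`, `CCBot`) and asks everyone else to wait between requests. The file contents and the `example.com` URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two AI crawlers, slow down everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/blog/post"
for agent in ("GPTBot", "CCBot", "SomeOtherBot"):
    verdict = "allowed" if parser.can_fetch(agent, url) else "disallowed"
    print(f"{agent}: {verdict}")

# Crawl-delay is a non-standard extension; the parser still exposes it (None if absent).
print("Crawl-delay for SomeOtherBot:", parser.crawl_delay("SomeOtherBot"))
```

Run as-is, this reports `GPTBot` and `CCBot` as disallowed, `SomeOtherBot` as allowed, and a crawl delay of 10. Nothing in the protocol forces a crawler to run this check, which is exactly the gap being described here.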

thingsiplay@lemmy.ml 2 points 1 month ago

It's not copyright that is broken, but its enforcement. Just because people violate copyright does not mean it is not meant for you as well. The same goes for any other law: just because someone breaks it does not mean it was not meant for you. Copyright is there for you too; that is its purpose.

Shin@piefed.social 0 points 1 month ago

Gotcha... I think I understand your point. Still, the lack of enforcement of this law for the small guy, and the dozens of cases of the big guy using it to screw the small guy, suggest otherwise... But I think we disagree on the philosophical implications of the topic, not on the topic itself.

And before I forget, thanks a lot for clarifying your point of view. Even though we don't fully agree on the topic, I appreciate your effort in reasoning with me.

thingsiplay@lemmy.ml 2 points 1 month ago

Glad we understood each other's points of view. I honestly didn't know how to express myself better. And thank you as well for staying calm, without the typical forum toxicity. :-) I guess there isn't much else to discuss on this particular point, since we cleared up the earlier misunderstandings.

Ultimately, the law is there, the execution is not.

Shin@piefed.social 2 points 1 month ago

Yeah... maybe the revision shouldn't be to the law itself, but to the fact that economic power can easily translate into legal and governmental power, which could be a major issue... But this is a total sidestep from the topic... and another thing for my already dizzy mind to think about :D

esc@piefed.social 3 points 1 month ago

I did, but obtaining a monopoly on it would go against my beliefs. Anyway, originality is overrated and very hard to measure, especially now.

misk@piefed.social 5 points 1 month ago* (last edited 1 month ago)

Not only is copyright dead, but so is licensing in general. This means there’ll be less original work from both commercial and non-commercial projects. Commercially, there won’t be ways to profit, so why bother? On the libre licensing front, why would you contribute code to GPL-licensed projects or release art under Creative Commons if it’s going to be license-washed anyway?

Luminous5481@anarchist.nexus 4 points 1 month ago

No, in almost all cases internet piracy is not a crime; it is a civil issue. Now, if you were scraping information that wasn't public, that could be a crime depending on the circumstances.

pinball_wizard@lemmy.zip 3 points 1 month ago* (last edited 1 month ago)

Legally, check your local laws, or just be sure to cover your ass with Tor or a VPN with an anonymous core.

Ethically, just obey Wheaton's Law: "Don't be a dick."

With web scraping, I can think of two ways Wheaton's law applies:

1. Scrapers should blend into existing background web traffic. They should be slow enough not to overwhelm the scraped site's servers. This requires babysitting any new scraper until you are sure it is tuned to be safe for the scraped site.
2. Any scraped content shouldn't be re-hosted in a way that harms the original content creators. Sharing is lovely. Harming artists sucks. Finding the right balance between preservation and respect can take some thought, but it's usually a pretty wide road.
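The first point (stay slow enough not to overwhelm the site) can be sketched as a per-host rate limiter. This is a minimal illustration, not a complete scraper: the `PoliteThrottle` class and its 5-second default are hypothetical choices, and the clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time
from urllib.parse import urlparse


class PoliteThrottle:
    """Keep at least `min_delay` seconds between requests to the same host.

    The 5-second default is an assumption, not a standard: tune it per site,
    and use a larger value if the site publishes a Crawl-delay.
    """

    def __init__(self, min_delay=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.clock = clock      # injectable so the logic can be tested without waiting
        self.sleep = sleep
        self.last_request = {}  # host -> time of the previous request to that host

    def wait(self, url):
        """Sleep if needed, then record and return the time of this request."""
        host = urlparse(url).netloc
        previous = self.last_request.get(host)
        if previous is not None:
            remaining = self.min_delay - (self.clock() - previous)
            if remaining > 0:
                self.sleep(remaining)
        now = self.clock()
        self.last_request[host] = now
        return now
```

Call `throttle.wait(url)` immediately before each request. Requests to different hosts don't delay each other, so a crawler spread across many sites stays reasonably fast while each individual site only sees a trickle.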

thingsiplay@lemmy.ml 2 points 1 month ago

As far as I understand, Google scrapes data, processes it, and uses it for commercial purposes. It's a company, not a private person scraping and using data for personal purposes. A very important distinction.

Shin@piefed.social 1 point 1 month ago

Since it's a company, it should not use our data, right? right? It's my data, it can't use my post for training, right? It's not fair use... right?

TehPers@beehaw.org 2 points 1 month ago* (last edited 1 month ago)

There's no obvious answer to your question without more information (for example, where are you?) but I'm not aware of scraping being illegal anywhere, with some exceptions. For example, in the US (where I am), as long as you're not doing "illegal hacking" to scrape your data, you're probably fine.

There are TOSs that websites like to impose as well. If you have to agree to one to access any data, you should follow it. Breaking the TOS isn't really "illegal" in a criminal sense (in the US), but you may expose yourself to anything from being blocked from the site to a lawsuit. Bypassing blocks might also be illegal, though you'd have to speak to a lawyer to know more about that.

Shin@piefed.social 1 point 1 month ago

That's the point: my focus is on Europe in general. Since European countries need to align their laws to some degree, there are different levels, but the baseline is the same.

Most public data, like all the music on Spotify, doesn't require a cookie. So in theory I could scrape all of Spotify's music to "listen later". This wouldn't be "illegal", but if that's the case, Anna's Archive should be "fine"... (I know that they are distributing it, and that's the fight.)

But if they scraped the music, and I scrape it too, we would have the same "dataset". So if I download Anna's "dataset", would it be any different from mine? If I prefer to download Anna's dataset instead of scraping it myself, would that be illegal? They aren't selling it (unlike Google).

There are way too many questions in my head :(

TehPers@beehaw.org 2 points 1 month ago

This wouldn't be "illegal", but if that's the case, Anna's Archive should be "fine"... (I know that they are distributing it, and that's the fight.)

I don't know much about European law, but redistribution changes things a lot here in the US. At least here, it then gets into copyright law, and you'd be reproducing copyrighted works without authorization (the Internet Archive attempted to get around this with books by getting legitimate copies of the books, digitizing them, then "lending" the digital copies of those books).

So if I prefer to download Anna's dataset instead of scraping it myself, would that be illegal?

No idea in Europe. In the US, it might be, depending on what the contents of the work are. I believe Anna's Archive would count as piracy in this case, though scraping directly from Spotify might not, because Spotify distributes the music with authorization from the copyright holder. It gets pretty confusing, honestly.

Regardless, if you aren't doing things at large scale, even if you are breaking a law by downloading pirated content, it's unlikely anyone will care. People usually only really start caring if you start redistributing stuff, so as long as you aren't hosting what you're scraping, you're unlikely to run into any trouble.

phonics@lemmy.world 1 point 1 month ago

Google would have some kind of licence in place I suspect. But what about people with photographic memory?