danorama 2 days ago [-]
There’s a fallacy that gets used a whole lot to justify things like this (not just with LLMs), and I see it in many of the comments here:
If it’s OK (or at least negligible on a small scale), then it must be OK on a large scale.
It usually goes something like: If I can make money by learning something from a web page, why does a computer making money by learning everything from everyone upset people so? It’s the same thing!
It’s like if I go to Golden Gate Park and pick one flower, I shouldn’t do that, but no one cares. But if I build a machine to automatically cut every flower in the park because I want to sell them, that’s different.
“You say I can pick one flower, but you get upset when I take a bunch. That’s inconsistent. Check and mate.”
But quantitative changes in an activity produce qualitative changes. Everyone knows this, but sometimes they seem to find it inconvenient to admit it. Not that the effects of the qualitative change are always bad, but they are often different, and worth considering rather than dismissing.
svachalek 2 days ago [-]
We ran into a lot of stuff like this in the early days of the web. For example, there was a lot of information that was "public" in that anyone could go to the city courthouse and ask to see the documents. But it changed in nature when you could suddenly look up anyone in the country by typing their name in your browser.
abustamam 2 days ago [-]
I am not quite sure why my address history, known aliases, and sometimes phone number, are publicly available to anyone who Googles my name, and I'm not sure how to opt out of this.
latexr 2 days ago [-]
> I'm not sure how to opt out of this.
If you’re an EU citizen, do a web search for “right to be forgotten”.
Then as a California resident you should presumably be googling instead about the CCPA and the right to delete :)
abustamam 2 days ago [-]
I've done that. They pop up like hydra heads. The point isn't right to delete. The point is right to not have my personal info plastered all over the internet without me having to contact each site and say "plz stop" and for them to say "OK we'll do it in 7-10 business days"
d3Xt3r 2 days ago [-]
You'd be considered lucky if you can even find an email address. I had some of my personal info crop up in Google searches via Fedora mailing lists. I've emailed various people at Fedora to get them to delete or redact my old messages, but never received a response. :(
mswphd 2 days ago [-]
we used to ship mass lists of addresses and phone numbers to people in each town and it was fine/appreciated.
disposition2 2 days ago [-]
You could also easily opt out with the single entity that shipped that information.
jhbadger 2 days ago [-]
Yes, but getting an unlisted number was considered weird and against the norm even if possible. Even in the early 2000s when I dropped my landline, my parents were aghast - "if you do that, you won't be in the phone book! How will anyone get in contact with you?"
bigbuppo 2 days ago [-]
And you often had to pay for the privilege... A dollar a month for them to not put your name and number in the phonebook.
pera 2 days ago [-]
Yeah and back then it wasn't used as a sort of UUID to track every single thing you do in your life... Different times
_doctor_love 2 days ago [-]
You ever had a bump in the night, my guy?
Or a stalker?
nitwit005 2 days ago [-]
For a practical example of that, a lot of documents used to have things like social security numbers, and they started stripping that information off once it was visible online.
Meph504 2 days ago [-]
> It’s like if I go to Golden Gate Park and pick one flower, I shouldn’t do that, but no one cares. But if I build a machine to automatically cut every flower in the park because I want to sell them, that’s different.
The problem here is that in your example, the small-scale and large-scale cases are both unacceptable behavior.
Learning from others at a small scale is not only socially acceptable, but is the foundation of how advancement works.
So the scale isn't, at its core, the problem; it's that something that is desired behavior in a human is not socially acceptable because a machine is doing it.
customguy 2 days ago [-]
What a total non-sequitur. You think you found a flaw in one of the examples, and instead of seeing if you can come up with better ones, you conclude that the point therefore can't be this, so it must be a completely different thing that makes zero sense. Machines aren't "doing" anything, they're being wielded by humans. And those humans are doing it at scale, to other humans, via the force multiplication of machines.
Meph504 1 days ago [-]
Why would I attempt to come up with a "better" example of a premise I reject?
Your vague response doesn't seem to have anything to do with the base subject this whole thing revolves around. Plagiarism, be it small scale or large, isn't acceptable, and the idea that humans doing things that are wrong is ok, but AI doing the same thing at large scale is not ok?
customguy 1 days ago [-]
> Your vague response doesn't seem to have anything to do with the base subject this whole thing revolves around.
No, I instead refuted your reply.
> but AI doing the same thing at large scale is not ok?
No, humans doing things can be okay or not so okay depending on the scale they do them at. "AI" isn't "doing" anything by itself, so that doesn't enter into it at all. You cannot separate "scale" and "thing". Rubbing your hands to make them warmer is fine, igniting a nuke is not, and they aren't "basically the same thing, raising temperature, just at different scales". You didn't reject the premise; you didn't understand it in the first place, and knocked down your own straw man instead. Which I pointed out, that's all.
Meph504 1 days ago [-]
Ha, I'm sorry, do you think you've made a logical point by comparing rubbing your hands together and "igniting a nuke"?
Again, this isn't a "this at small scale is ok, but at large scale it isn't" argument. Small scale plagiarism isn't acceptable, neither is large scale.
You are refuting my reply seemingly without the context of the article and the larger issue at hand.
Don't be condescending when you aren't even accurately following the original premise or purpose.
customguy 1 days ago [-]
The context of this subthread is explaining that
> If it’s OK (or at least negligible on a small scale), then it must be OK on a large scale.
is a fallacy. Which it is. You confirm this by apparently seeing a difference between generating a little bit of heat and a whole lot, to name one of infinite examples anyone can easily come up with.
> Again, this isn't a "this at small scale is ok, but at large scale it isn't" argument.
You just keep doing the thing I pointed out in my first reply, you claim "it's not this" on a technicality, and then say "so therefore it's this instead", and the other thing is a criticism nobody brings up, ever.
And it gains you nothing, because if plagiarism isn't even okay at small scale, surely you can see how it's even less okay at big scale.
thedevilslawyer 1 days ago [-]
There are no such examples (recommended for humans, but abhorrent for machines).
customguy 1 days ago [-]
> (recommended for humans, but abhorrent for machines).
That's not the criticism, that's the straw man used to dodge the criticism. Of course the straw man makes no sense, that's why it gets put up.
Machines aren't doing anything, humans are doing things, with or without machines.
"It's fine to raise the temperature of your surroundings by 0.0001 degrees by exhaling. It's less fine to set a house on fire, and even less fine to ignite a nuke. But aren't they all the same thing? How hypocritical that raising temperature is okay for some but not others???"
That things can change quality with quantity/frequency is trivially obvious, and you can think of many examples. Bad ones, good ones, doesn't matter. The point of OP stands, all that was added was how absolutely brazen the nonsense is getting.
thedevilslawyer 1 days ago [-]
Sorry, but the point stands for you because of how you feel about this topic. It does not stand logically.
Ultimately we have to reckon with the fact that there's nothing which is recommended to do X of, but is abhorrent to do 10X of.
customguy 1 days ago [-]
> there's nothing which is recommended to do X of, but is abhorrent to do 10X of.
No, we don't, because that's nonsense. You can ask a stranger in the street for the time of day once, and they will react very, very differently if you ask them 10 times in a row. You can drive N miles per hour in a school zone, you cannot drive at 10x the speed, and so on.
thedevilslawyer 10 hours ago [-]
Ok, I see your point. We live with tiny inconveniences, that we would not at 10x.
But I don't see how that relates to copyright or LLMs at all. 'Learning', at scale, is not an inconvenience, at least in any forward-looking society.
musicale 20 hours ago [-]
> There are no such examples (recommended for humans, but abhorrent for machines)
claiming to be human
dinfinity 2 days ago [-]
> Learning from others at a small scale is not only socially acceptable, but is the foundation of how advancement works.
Exactly, if anything, the logic (a bit bad -> really bad) shows that one person learning from one thing is far inferior to one person learning from every thing (a bit good -> really good).
mark336 2 days ago [-]
>Learning from others at a small scale is not only socially acceptable, but is the foundation of how advancement works.
This is true, and it shows how human thought differs from AI. AI needs massive datasets to be coherent.
Meph504 2 days ago [-]
How large do you think the sum total of everything you've learned would be as a dataset? We aren't that different in that regard, just in how we amass that dataset and how curated it is.
LunicLynx 2 days ago [-]
Wasn’t his point about plagiarism? That is also not ok on a small scale.
Meph504 2 days ago [-]
I was trying to stick to the example, but I agree, that getting away with something doesn't determine if it is right or wrong. And the whole concept of that makes for shaky ground for any form of legal or ethical argument.
boringg 2 days ago [-]
I think the difference here is that you guys are talking ethics, and in fact what we're talking about is enforcement. While it's unethical to pick one flower (in its purest form, robbing the commons of the beauty of a flower), it won't be enforced.
LunicLynx 2 days ago [-]
Fair. AI might also not be the problem, but how it is utilized.
Suddenly everyone and their grandma is a specialist in everything, and the actual value of understanding is not appreciated anymore.
thedevilslawyer 1 days ago [-]
Ok, what is so special about understanding anyway? We understand far fewer things than we don't understand.
IMO, we're just giving special weight to understanding just because it gives people wages. Someone's specific brain structure should not privilege them over others. UBI or something equitable on those lines is the answer.
insane_dreamer 2 days ago [-]
> The problem here is, in your example the small scale example, and the large scale example are both unacceptable behavior.
no, not really, or at the very least they're not at all in the same category of "unacceptable behavior"
Meph504 2 days ago [-]
The argument isn't small crime vs large crime. It is no crime regardless of scale.
If it is acceptable for a person to learn, then it should be acceptable for a machine. And any derived works produced from that information isn't theft or copyright violation.
Though I do think there is a valid gripe with the LLMs being trained on pirated materials. I've also personally learned from a lot of PDFs of textbooks I didn't own.
dtn 2 days ago [-]
> If it is acceptable for a person to learn, then it should be acceptable for a machine.
Is there a name for the fallacy when people act like models and algorithms should be granted the same rights as human beings?
Meph504 2 days ago [-]
Is it a fallacy? Can you provide a legal or logical basis for this being treated differently?
lelanthran 2 days ago [-]
Hammers aren't granted rights.
Tools aren't granted rights. Why do we need to make an exemption for AI?
Meph504 2 days ago [-]
Well, because no one is attempting to claim that the structures or products produced by using hammers are plagiarism?
In a sane world, things produced by tools are owned and credited as creations by the users of tools, there are many who seem to argue that isn't the case with AI.
And that somehow anything produced based on the knowledge it was trained on is some sort of plagiarism or copyright violation of the original source material, even when none of that material is present in the end result?
So if we can't just leave it at "it's a tool", then we have to look at existing frameworks of law and ethics to make the case for how this should be treated.
dtn 2 days ago [-]
I'll just take my tools (video camera) into a cinema to learn off the latest Hollywood flicks. It's not an accurate 1:1 representation to the original source material, so the output that I've produced from it belongs to me.
Meph504 2 days ago [-]
Are you trying to make a shoestring argument?
Sure, you can do that, but because there are several laws against that specific action already, you will likely face prosecution, and the content (something poorly duplicated, not created) would be seized.
But let's assume that your camera has an LLM in it, and it was trained in this fashion, and you performed this action on countless other films, and then the camera could produce wholly unique and original work that did not duplicate any of the original works it sampled. The work produced would not be a violation of copyright, nor would it be plagiarism.
Just as someone whose education was to watch a large number of movies, and then created their own based on that education.
But as previously mentioned you may face the ramifications of violating the agreement you had for accessing the original source material in an illegal way.
lelanthran 2 days ago [-]
> Well because no one is attempting to claim that the structures or products produced by using Hammers are plagiarism?
Of course they are! Is a video recorder not a tool? No one is claiming rights for video recorders.
Once again, the status quo is that tools do not get rights, the burden is on you to prove why an exemption should be made, not on those who are asking "why should tools get rights?"
Meph504 2 days ago [-]
Actually the burden is to prove that what AI is producing is plagiarism or copyright violation, this isn't about some special right, but that there are many making the case that things produced by AI are duplication of their work.
I'm also not sure where the concept of "the tool" being given a right to anything comes from. That certainly isn't my argument; the rights to the work should belong to the user/owner who used the tool to create things. There are several pieces in the SFMOMA that use automation to create art, and that art is credited to the creator of the machine, not the machine. I see AI through a similar lens.
You are intentionally selecting a device that makes duplicates of things as your comparator, so I can't tell if that is biased or some sort of flaw in your argument.
But an LLM being trained on works and generating something based off of that training is not duplicating any specific copyrighted material; if the result is wholly unique, it is not duplication.
lelanthran 1 days ago [-]
> But an LLM being trained on works and generating something based off of that training is not duplicating any specific copyrighted material; if the result is wholly unique, it is not duplication.
Right[1], and humans can do that, no problem - ingesting existing material and recombining them to produce something new (not necessarily unique) is a right that humans are afforded. The question being asked is, since we don't allow that right to any other tools, why does this tool need an exemption?
-------------
[1] Not really (i.e. I don't necessarily agree with this point), but let's assume it for the sake of this discussion.
simianparrot 2 days ago [-]
AI psychosis would be my term. When you attribute to software the characteristics of a human you are in a psychosis.
Meph504 2 days ago [-]
Many things share characteristics with humans; for decades we have created methods for systems to emulate and synthesize those characteristics. It is sort of delusional to think that the abilities of humans can't be reproduced by other systems, and it is a severe delusion to think that proposing a machine can do it is psychosis.
simianparrot 9 hours ago [-]
That’s not the topic. Read the post I replied to. No matter what a piece of software will never be a human.
Meph504 8 hours ago [-]
I don't think anyone in this thread has suggested software are people.
But by the same token, it is very likely that at some point the ability to simulate a human mind and persona will be a real possibility.
kogus 2 days ago [-]
quantitative changes in an activity produce qualitative changes
Well said!
tyleo 2 days ago [-]
It reminds me of a Stalin* quote: "Quantity has a quality all its own."
* Note that it may be misattributed to him
soerxpso 2 days ago [-]
> It’s like if I go to Golden Gate Park and pick one flower, I shouldn’t do that, but no one cares. But if I build a machine to automatically cut every flower in the park because I want to sell them, that’s different.
It's not like that, because flowers are a physical object and moving them to one place deprives their original location of the flowers. When an LLM learns something from a webpage, the webpage is still there. Whatever 'theft' you perceive is entirely in your head; you were deprived of nothing by someone else making a copy of your thing.
LunicLynx 2 days ago [-]
This is not true, because the copy devalues the original: even though the web page is still there, its value has decreased.
jerf 2 days ago [-]
"It's not like that"
That's not the point. The point is that scale matters, and that was the only point.
lenkite 11 hours ago [-]
Can you apply your philosophy to the U.S. dollar? I am sure producing copies is a "theft" that is entirely in your head. You were deprived of nothing by someone else making a copy of your dollar.
latexr 2 days ago [-]
> Whatever 'theft' you perceive is entirely in your head
Rather, it appears to be in your head, since the person you’re replying to has not mentioned or even hinted at theft. The problem with taking all flowers from a public park for your own profit is multifaceted. Amongst others, you’re depriving everyone else of enjoying them, but also degrading the image of the park and harming all the insects which depend on those flowers and the birds who depend on those insects, which in turn degrades the park further, which stops people from enjoying it and going there and caring for it. It’s not about a single physical object, it’s about the ripple effect the selfish action produces.
abustamam 2 days ago [-]
It's not like that, because flowers are a physical object and moving them to one place deprives their original location of the flowers. When an LLM learns something from a webpage, the webpage is still there. Whatever 'theft' I perceive is entirely in my head; I was deprived of nothing by someone else making a copy of my thing.
ToValueFunfetti 2 days ago [-]
I get that the intention here is to plagiarize and thus cause the parent to feel the harm of it and realize the error in their ways, but I don't think it works. Plagiarism's harm to the plagiaree (?) is that it robs them of credit and payment, but nobody is viewing your reply in isolation of the parent's attribution and parent wasn't expecting to make money off of an HN comment. The harm to the rest of society where you gain false esteem for another's work is also not carried out in this instance. The harm to the plagiarizer where they fail to learn because they copied instead is likewise absent. If someone were to feel harm just from a copy of their words existing, they wouldn't need you to do it- google has hastily indexed this along with every other HN comment and we all know that this whole thread will make its way into LLM training sets eventually.
abustamam 2 days ago [-]
> google has hastily indexed this
Google doesn't claim authorship over that which they index.
Plagiarism doesn't need to be harmful for it to be bad, and my intent wasn't to harm anyone anyway. My intent was that I could use the authors exact words to pretend to make a unique take that I claimed to have authored.
ToValueFunfetti 2 days ago [-]
I don't understand. In what way is plagiarism bad if it doesn't harm? If it were harmless to pretend you authored a unique take, how is the parent expected to react to you not harming them such that they realize it's bad?
abustamam 2 days ago [-]
Harmless doesn't imply ethical. Plagiarism that doesn't harm is still lying.
ToValueFunfetti 2 days ago [-]
Fair enough, shame on me for assuming utilitarianism.
kerkeslager 2 days ago [-]
When the LLM presents what it learned as its own thoughts without any attribution, that's the theft.
And you understand that. You're not stupid. This is the thing: AI is convenient for corporations, so you'll make dishonest arguments to justify your unethical behavior. Maybe you even believe what you say, but that's because people will hold on to any flimsy thing that lets them feel like they're good people, not because the reasoning actually makes any sense.
This is why people talking about AI get booed at speeches. There's no conversation to be had: you're not interested in the truth, or what's right, or what's good for anyone but yourself.
_DeadFred_ 2 days ago [-]
But you're still depriving the world of future flowers. Why spend years studying, sacrificing time with others, living frugally if others can take or monetize the result for free? Most people need compensation to justify their effort. Or the option to not have their years of work/sacrifice co-opted into an ai generated ad for toilet bowl cleaner.
No-cost copying doesn't remove the need for compensation to sustain ongoing creation. Society has long treated knowledge, art, and thought as high-value outputs, and accepted the copyright tradeoff to support them. That is long settled, and no 'get rid of copyright' proponent has argued satisfactorily why the 300-year corpus of thought on that is invalid. Long copyright terms may justify reform, but not rejection of the established idea that creative work needs economic value to sustain ongoing creation, and that ongoing creation is a net positive for society.
You are free to release copyright-free today. In software that has unlocked immense value. In other areas those choosing copyright have unlocked more value. But software is different: I can get hired to build on top of free code. No one is hiring an author to expand their book to include fanfiction. And were that the model, it would arguably produce worse results, as we would be back to the much worse patronage system where Bob hoards what he's paid for and only shares it with friends for status. For 300 years we've understood that, given these dynamics, paywalled copyright, with libraries as a throttled side channel, unlocks the greatest access to knowledge. Eliminating duplication cost has not changed that.
'but I want every flower there is today and I don't care if there are any future flowers' doesn't change that, it's simply a new value judgement that my want/use case today outweighs the cost to society of lost future knowledge creation/return to a patronage based reward system. Again 300 years of thought say that results in a worse outcome for society. How does the typical OSS project that depends on patronage fare? Do we really want to return all knowledge output to that model?
inetknght 2 days ago [-]
If one person is murdered, that's bad. If a million people are murdered, that's war.
If one word is stolen by AI, that's bad. If a million words are stolen by AI, that's business.
insane_dreamer 2 days ago [-]
more like
If one word is stolen by Joe, that's bad. If a million words are stolen by Meta, that's business.
AI isn't the problem; it's corporations using AI that are the problem.
bogrollben 2 days ago [-]
this made me oof. well said.
gruez 2 days ago [-]
>If one word is stolen by AI, that's bad. If a million words are stolen by AI, that's business.
Where are all the instances of "one word" being "stolen by AI", and people getting mad over it?
rurp 2 days ago [-]
Yes absolutely, when automation increases the rate of something many orders of magnitude that often is a qualitative difference.
It's weird to me how often on HN of all places I see arguments that can be refuted with "scale matters". I commonly see arguments on all sorts of topics that make the same mistake you're calling out.
therealpygon 1 days ago [-]
I would say the difference there is: yes…you built a machine that “could” pick all the flowers. It did not, however, actually pick any flowers as you suggest. If you take the machine back and use it to pick the flowers, that should be a problem.
I think the problem with these things is that if the same metric and methodology were reversed, it doesn’t look favorably on artists either with such inflammatory framing: “The way the artist learned was to effectively plagiarize every piece of art they viewed, extracting important details in the way light, color, shading, anatomy or otherwise look in order to steal from the other artists, then replicated and combined those things as part of every future work they created, stealing over and over again.”
Handwaving away the small scale seems like it would ignore who has responsibility in the small scale. Metaphorically speaking, who in the small scale is responsible for plagiarism: the person making the paints or the person with the brush who sells them to an unsuspecting public? Point is, in this case, the user is the one holding the brush and trying to pass things off.
To be clear, I don’t really disagree that their copyrights were likely violated and that they should likely be liable for damages, which is for a court to decide, not me. They should have sourced their data sets properly, certainly, and other companies have. I just think the arguments really need improvement without simply falling back on the tropes, and hopefully this helps make sense of why some people take issue with arguments that others want to simply dismiss as invalid.
keeda 2 days ago [-]
> But quantitative changes in an activity produce qualitative changes.
Interesting take. I think a corollary is that the qualitative changes are in the economics of things. And more than the scale, it is the value of those economic effects that determines how "accepted" that activity becomes.
Take Uber as an example; it basically enabled mass avoidance of taxi regulations, and naturally existing taxi drivers and lawmakers cried foul. But enough people found value in the service and kept using it that gradually and inexorably society and laws adjusted to it.
On the other hand, copyright infringement is an interesting case. While pretty much everyone and their dog pirates content to some extent, the % of people who think it's acceptable to do so is surprisingly small (22% apparently, up from only 14% in 2019). Furthermore the media industry, especially including ads, is a significant % of US GDP. I think those reasons, more than any RIAA/MPAA lobbying, are why copyright laws have remained as stringent as they have.
As such at a social level, I don't think these effects were dismissed, rather they were considered and formally internalized.
I suspect the same thing is happening with AI companies. They get away with devouring and training on the sum of human knowledge largely because existing laws are insufficient to stop them. So stopping this would require new laws but... well, given the early economic impact LLM technology is having my hunch is new laws will be brought in to protect it rather than restrain it.
danaris 2 days ago [-]
> gradually and inexorably society and laws adjusted to it.
But in many places, the ways that society and laws adjusted to it were to make extra clear in their local ordinances that Uber was required to operate as an actual taxi service, or get out.
It's very disingenuous to imply that the public broadly decided Uber was Right, Actually, when both in its case and in that of many of the other gig economy companies, what really happened is that gradually and inexorably, they had to adjust to society and laws.
I followed this evolution peripherally as it happened, because while I appreciated the convenience of Uber, I disliked that it was unfair towards existing taxi drivers who had very onerous requirements like taxi medallions, which, note, never became a requirement for rideshare drivers.
I remember at one point Uber drivers at the airport would ask me to pretend I'm a friend being picked up to avoid trouble with the cops, and then a couple of years later there was a dedicated, official "Uber pickup lane."
My underlying point was that the whole system -- including Uber, incumbents, society and laws -- adapted to a new economic reality.
alexwwang 2 days ago [-]
In the era before the internet, gaps in information and knowledge could be turned into money and power.
In the era after the internet but before LLMs, those information and knowledge gaps were largely leveled in theory, but a recognition wall kept most of us from understanding and making use of them.
In the era after LLMs, that wall is being torn down, and people should think about how to use this information and knowledge differently to make money and power.
jongjong 2 days ago [-]
This is a great point. I think for coding, the wording of the MIT open source license makes it clear that copying and distributing the software is authorised on a small scale and it's very clear that the act of copying must involve a person.
It provides distribution and modification rights to "any person obtaining a copy of the software" and explicitly requires attribution for any significant parts.
Mass-ingesting the code with a script without any human even reading the license is a very different kind of copying mechanism, and there is no person involved... The contract was bypassed completely. A contract requires consent from both parties to be binding. When ingesting code into the AI training set, nobody even read the license. There was no agreement, neither explicit nor implicit... because the consumer, a script, never read the contract for that specific project.
There was nobody present when the copying occurred; on neither side! It cannot possibly constitute an agreement between two parties.
kmeisthax 2 days ago [-]
This would be an extremely novel theory of copyright litigation, and I doubt it would fly in an American court with its emphasis on highly individualized legal rights and obligations. And, if it did get accepted by the courts, that's halfway to an even crazier argument: that the MIT license only allows individual distribution to known parties, i.e. no hosting the code on a website or seeding it on BitTorrent, because that's not "small scale" and doesn't "involve a person".
jongjong 2 days ago [-]
You can only seed it on BitTorrent if it comes with the license which identifies the original author and acknowledges their copyrights over the code. Also there is definitely an assumption that a human will read the license or at least implicitly consent to the terms before using or modifying the software. When ingested by AI, the author gets zero credit and no consent has taken place between any sentient being on either side of the contract... Or at least none that are legally acknowledged as sentient or having legal rights.
bigbuppo 2 days ago [-]
And the thing is, you point out the easy out on this for similarly licensed code... a giant list of authors and contributors that may have code included in the generated output. It's a win/win for everyone. The original authors get their acknowledgement, and the AI company gets to bill the users of AI for all the tokens for that multi-gigabyte copyright disclosure file.
Someone 2 days ago [-]
> I think for coding, the wording of the MIT open source license makes it clear that copying and distributing the software is authorised on a small scale and it's very clear that the act of copying must involve a person.
I agree with “must involve a person.” https://opensource.org/license/mit starts with (emphasis added) “Permission is hereby granted, free of charge, to any PERSON obtaining a copy of this software and associated documentation files (the “Software”)”.
That means it doesn’t give an LLM any rights. The way I see it, LLMs run (directly or indirectly) by a person can do stuff on their behalf, though, just as your CI pipeline can download and compile MIT-licensed software.
I definitely disagree with the “on a small scale” as the license continues (again, emphasis added) “to deal in the Software WITHOUT RESTRICTION, including WITHOUT LIMITATION the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software”.
jongjong 2 days ago [-]
The CI pipeline is different because for a module to end up as a dependency in the CI pipeline, it had to be explicitly selected by a person first to be included in the package file or manifest. There was intentionality and awareness that the software was included.
A person already pre-consented to the licenses of all the software which the pipeline downloaded. Big companies go through those dependency lists carefully already and remove those which do not meet their policies. This is a very intentional process.
Someone 2 days ago [-]
> for a module to end up as a dependency in the CI pipeline, it had to be explicitly selected by a person first
I disagree. I think it’s entirely within the license to have your pipeline automatically pull in the latest version of a library, even if the new one happens to pull in a new MIT-licensed library (whether that’s a good idea and whether CI pipelines should, somehow, verify that code pulled in has an acceptable license are different discussions).
I also think it’s completely within the MIT license to tell an LLM that it can search for MIT-licensed libraries and use them without asking you.
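Someone sets aside whether CI pipelines should verify the licenses of code they pull in. For a rough idea of what such a check could look like, here is a minimal sketch, assuming the pipeline already has a mapping from each resolved package to an SPDX license identifier (for example, extracted from a lockfile or package metadata). The package names and the allowlist are hypothetical.

```python
from __future__ import annotations

# Hypothetical allowlist of SPDX license identifiers a company policy might accept.
ALLOWED_LICENSES = {"MIT", "BSD-3-Clause", "Apache-2.0"}


def find_disallowed(dependencies: dict[str, str | None],
                    allowed: set[str] = ALLOWED_LICENSES) -> list[str]:
    """Return sorted names of dependencies whose license is missing or not allowed."""
    return sorted(
        name
        for name, license_id in dependencies.items()
        if license_id is None or license_id not in allowed
    )


if __name__ == "__main__":
    # Hypothetical packages resolved after an automatic version bump.
    resolved = {
        "tiny-json": "MIT",
        "fast-http": "Apache-2.0",
        "mystery-lib": None,              # no license metadata found
        "copyleft-utils": "GPL-3.0-only",
    }
    problems = find_disallowed(resolved)
    if problems:
        print("License review needed for:", ", ".join(problems))
```

A check like this only sees declared metadata; a package with a missing or wrong license field still needs a human to look at it.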
quantummagic 2 days ago [-]
That's like saying you're not allowed to load the source code into an editor, because it's not a person. Or that you're not allowed to run a global search-replace on the entire code base, because it's a script and not a person.
jongjong 2 days ago [-]
But in this case, a human has awareness of what software they are copying or modifying and that's how the original software author receives credit. The contract requires some degree of human awareness to be valid. This is the critical difference.
quantummagic 2 days ago [-]
Sorry, that's nonsense. There's human awareness when ingesting MIT code into an LLM too. In both cases it's a human that says $ execute-global-replace or $ ingest-into-llm
Both operations require some degree of human awareness. What you appear to be saying is, a human can only use a limited algorithm to access this source code, not a sophisticated one. And where do you draw that line? Who should get to say what is too sophisticated?
Error: your algorithm is too sophisticated to proceed, please provide more human awareness, it's a critical difference.
jongjong 2 days ago [-]
If your LLM were to hack into Microsoft and steal the source code from an important project and inject it into your project without you being aware of it; wouldn't that make you liable if you then published it?
Unfortunately there is no way to agree to a license of a software you're using if you didn't read the license or if you're not even aware that you're using the licence. This is what's happening at the training stage.
If you say that awareness doesn't matter then it means you cannot stop AI from stealing any IP open source or not.
I think the main issue with LLMs is that there is no mechanism to stop them from stealing. Thus they are guaranteed to infringe on copyright to some extent.
Also, beyond copying and copyright, there is another problem that LLMs are also infecting the logic and expertise built into the project. This is a completely novel mechanism and needs to be treated as separate under the law. Else it would be the end of all IP.
danaris 2 days ago [-]
> I think the main issue with LLMs is that there is no mechanism to stop them from stealing.
Well, sure there is—for the people running them.
If you're building training data for an LLM, you only use data that a) is firmly in the public domain, or b) you have a clear and documented legal right to use.
themuskgpt2025 2 days ago [-]
Page Manager
Theme Editor
Media Library
SEO Settings
Analytics
Domain Manager
Export Code
Publish Button
Realtime Preview
Undo/Redo
hughw 2 days ago [-]
Honestly that's what's wrong with capitalism and property rights. We can understand what it means to own a thing like a piece of furniture, or a house, and "a person's home is their castle" rings true. But scale that up to individuals controlling resources that affect a neighborhood, a city, a country, or the world -- at each step their army of voters supports their right to own 800 billion dollars or whatever, same as they own their own houses -- it's only fair! And if they want to build a starbase and launch some rockets near your house and sensitive ecology they're just exercising the same rights you or I have, and an attack on their ability to inflict damage on the community is an attack on all.
[edit] and the same goes for corporations owning "means of production". It's not the same as owning an iPhone.
DeusExMachina 1 days ago [-]
I think that the problem arises because people equate things that are not the same.
One person learning something is good. At scale, that becomes everyone learning something. That's even better.
Machine learning is not scaling up people learning. It's completely different even if it's called "learning".
As the article argues, it's plagiarism at scale. In that sense, one person plagiarizing content is bad. Everyone plagiarizing at scale by using LLMs is even worse.
timoth3y 2 days ago [-]
> There’s a fallacy that gets used a whole lot to justify things like this ...
Of course it's robbery. I don't think anyone is truly arguing it's not. The issue is that, if we don't do it, China will. Game over.
I'm surprised I haven't seen more economist scholars exploring this topic; it's a fascinating phenomenon. I've seen folks try and re-visit history and compare what's happening with AI to some historic event--but, we've never seen anything quite like it. As much as history repeats itself, at the forefront of innovation it doesn't.
I suspect that there will one day be an AI tax as society tries to reclaim the value of the theft; maybe even UBI of some form. Until then, buy the stocks and ride the theft wave. The economists are certainly exploring the K shaped economy, and this is why.
proofofcontempt 2 days ago [-]
This argument of "if we don't do it, someone else will" to justify theft is so tiring. The companies doing the stealing are collectively the same ones that have power to prevent it, if they were incentivised to do so.
evenhash 2 days ago [-]
> The issue is that, if we don't do it, China will.
These AI companies aren’t state enterprises. How is geopolitics a justification?
If it were just the military training them, probably no one would care about the copyright infringement angle, it makes sense that the government could ignore those rules for national security.
But Mark Zuckerberg isn’t training his models to protect us from China. He’s doing it to make himself even more ridiculously wealthy.
wang_li 2 days ago [-]
My complaint with your argument is that the word learn means one thing when we are talking about a person learning something from a webpage or book and something completely different when a webpage or book is used to adjust some weights in a matrix. Calling that learning is a distraction from the real copyright violations going on.
gruez 2 days ago [-]
>when we are talking about a person learning something from a webpage or book and something completely different when a webpage or book is used to adjust some weights in a matrix
What material differences exist between the two besides "humans good, computers bad"?
>Calling that learning is a distraction from the real copyright violations going on.
Most courts so far have ruled that it counts as fair use.
mark336 2 days ago [-]
I thought fair use was dead after Napster
gruez 2 days ago [-]
No, only the type of "fair use" that people slap on their youtube uploads, thinking it gives them a "get out of jail free" card for copyright infringement. Fair use was repeatedly affirmed in the 2010s, eg. https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....
panda-giddiness 2 days ago [-]
There is already a term for this, and ironically enough, it's often thrown around in discussions of machine learning; it's called emergence. As scales change, new properties appear, which is why we can describe a Chimpanzee as "swinging between the trees" even though at the level of quantum field theory there is no such thing as trees, Chimpanzees, or swinging.
Likewise, people shouldn't be surprised that as AI compute scales up, new forms of harm can be created, thereby introducing new moral quandaries. It's like comparing GPT-1 against today's frontier models. One is a fun albeit useless toy. The other is effecting categorical changes in the way knowledge work is done. In both cases the underlying technology is the same, but their impacts are totally different.
blks 2 days ago [-]
But also a person is a person, not a commercial product, and usually we learn from sources within their licensing agreements.
rhubarbtree 1 days ago [-]
But it’s not the same though. If I look at a webpage, it’s still there for other people to enjoy. That’s not the same as a flower being picked.
Reasoning by analogy doesn’t work if your analogy isn’t well matched.
rhubarbtree 1 days ago [-]
Let me double down on my downvote.
The analogy proposed here is correctly rewritten as:
If one person uses an AI trained across copyrighted data, then that’s ok.
But if everyone uses that AI, it’s not ok.
Which is a bit of an irrelevant point.
micromacrofoot 2 days ago [-]
data brokers lean into this too... you can go to the city hall and get someone's public information pretty easily, that does not mean you should make all of that information available to everyone else all the time from anywhere
crabsand 1 days ago [-]
In other words, size matters.
nate 2 days ago [-]
ugh. yeah. the tragedy of the commons
hungryhobbit 2 days ago [-]
It's funny, the way that term gets used now is actually a wild distortion of the true history.
"The commons" was an incredibly successful system, and medieval (and prior) villages used it to great success, for the entire village's benefit! "Commons" are a great thing for everyone to have!
The real history is that as advances in technology (like the Industrial Revolution) changed things, certain rich villagers were suddenly able to manage more animals than they could before. Those (specific/rich) people over-used the commons, creating the "tragedy" we all know of.
The real lesson of history is not that commons fail: to the contrary, they worked great and helped everyone for centuries! The real lesson is "watch the fuck out for the new rich (especially when they just became rich because of recent technology advancements): those bastards will steal from everyone for their own benefit!"
fsckboy 2 days ago [-]
if it's a fallacy that if it’s OK (or at least negligible on a small scale), then it must be OK on a large scale, then the alternative "it's ok at a small scale but not at a large scale" starts to slide us down a slippery slope... fallacy.
psychoslave 2 days ago [-]
Of course quantity makes its own quality emerge. If you kill a single person, you are a murderer; if you commit genocide against "others" and distribute the spoils to those left unscathed, you are a national hero. If you steal something small you are a thief and go to prison; if you hog some billions you can enact laws to grab even more.
gruez 2 days ago [-]
>If you kill a single person, you are a murderer, if you genocide "others" and distribute the spoliation wealth to those unscathed you are a national hero.
This is a fundamental misunderstanding of how laws work. It's not the scale that makes it okay, it's that it's done through some official process. Trump's raid to grab Maduro killed less than 100 people. Pretty modest by "genocide" standards, and is easily eclipsed by gang/cartel violence. Yet nobody is going after Trump because he didn't meet some kill quota to get special protection, nor are people condoning cartel violence because they killed far more than Trump.
psychoslave 2 days ago [-]
That's exactly how laws work then.
International law is for those who don't have all the nukes and lobotomized cannon fodder ready to invade on a whim; those who do commit all the crimes and atrocities, transgress every legal process ever invented, and expect no possible punishment in return.
The number of people directly killed is not something that can be eclipsed by a bigger number of people killed. Not in a mind that keeps empathy high among its values.
whattheheckheck 2 days ago [-]
Who is Sir Francis Drake?
8note 2 days ago [-]
In general tech has sat in the opposite paradigm: identify when doing something at a small scale is bad, but at a large scale is not
unauthorized plagiarism on the individual level is bad, at the medium scale is ick, but at the ultragigantic scale is meh.
laundering through an llm takes away the real moral ick from the plagiarism - the lying and building of ego by the person reboxing somebody else's ideas and work.
wffurr 2 days ago [-]
>> the lying and building of ego by the person reboxing somebody else's ideas and work.
Instead the bot lies to people who use its output to boost their ego. Not sure it's really changing the moral calculus here.
mullingitover 2 days ago [-]
> why does a computer making money by learning everything from everyone upset people so? It’s the same thing!
The majority of the population, sitting outside the VC bubble, views AI unfavorably. That's not my hot take, that's a fact from the NYT survey published today.
It's going to be hilarious when VCs, having expropriated the IP of the entire internet, build The Layoff Machine That Does Everything Without Workers, and then the voters decide to just...enthusiastically expropriate that, and we end up with Fully Automated Luxury Communism.
llm_nerd 2 days ago [-]
>The majority of the population, sitting outside the VC bubble, views AI unfavorably.
Sure, where AI threatens my job or my skills, people view it unfavourably.
But then they use it. They're all using it. People's rhetoric seldom matches their actions.
>enthusiastically expropriate that, and we end up with Fully Automated Luxury Communism
Maybe in other countries, initially, but the US is very firmly a plutocracy, and has a populace that will very happily vote against their own interests because the plutocrat-owned media told them to. And yeah, it is very rapidly approaching the point where there is going to be zero chance of a revolution even if people opened their eyes.
Which is precisely why the US is now threatening other countries as well, because plutocracy is threatened by rational, educated, better managed countries. Canada, for instance, is an example that a country doesn't have to revert to being an idiocracy, so it's first in the crosshairs.
mullingitover 2 days ago [-]
> But then they use it. They're all using it. People's rhetoric seldom matches their actions.
I don't see any contradiction. I criticize the hell out of guns and want them strictly controlled, and yet I own one. `¯\_(ツ)_/¯`
People can use AI and still demand that all of society receive the benefits, instead of a small group of oppressors.
danaris 2 days ago [-]
> They're all using it.
[Citation needed]
I know many more people who do not use AI than who use it, and many more who refuse to use AI than people who are enthusiastic about it.
Given your username, you are almost certainly in a bubble—an echo chamber—that makes it seem to you as though "everyone is using it." I recommend getting outside that bubble and talking to non-technical people outside your usual circles, especially people in the arts and humanities.
sethammons 1 days ago [-]
My blue collar buddy in water treatment uses ai to summarize reports and fix up emails. My retired neighbor who "doesn't do technology" was having an ai conversation on a product he was thinking of buying. I ordered through a voice kiosk ai at the drive-through last week. I am surprised how fast it is propagating.
danaris 1 days ago [-]
But, see, this is part of the problem:
Most of the people I hear from who use AI say everyone they know uses AI.
Most of the people I hear from who don't use AI say no one they know uses AI.
It seems to me that we've got competing bubbles here. But the statistics certainly show that, leaving aside whether they use it, most people don't like it or want it.
...I think it's also worth noting that AI usage is likely to be "louder" than AI avoidance in many cases—that is, whichever side of this one falls on, it's easier to detect someone pasting from ChatGPT directly into emails, or complaining that Gemini told them you would sell them XYZ, than it is to detect someone who's just keeping on the way they've always been.
danaris 2 days ago [-]
The problem is, there's an intermediate step required there: the voters will need to get rid of the Republican Party lock, stock, and barrel if they ever want to make a genuinely-socialist move like that. And that's going to be made much, much more difficult by all the measures Trump and his cronies are putting in place to disenfranchise everyone who refuses to bow to him.
Rekindle8090 2 days ago [-]
[dead]
eboy 2 days ago [-]
[dead]
nonethewiser 2 days ago [-]
Can you map this more directly to claims made about AI? It's impossible to agree or disagree with you. You've just given us an analogy - but to what?
dbalatero 2 days ago [-]
Not sure what you're missing here. The other comment replies don't seem to be missing it. See the article, etc.
nonethewiser 2 days ago [-]
What is the claim? A little plagiarism is OK therefore a lot is also OK?
superkuh 2 days ago [-]
No. It's more like,
"You say I can take a photo of one flower in your flowerbed you put next to the public street, but you get upset when I take a bunch of photos of many public flowerbeds. That's both an over-reach and inconsistent."
dvduval 2 days ago [-]
The broader problem of original sources not being given credit in a way that rewards them remains. Website owners are paying to host their content so that spiders can come and crawl them and index it into the AI and then if they’re lucky, they might get a citation, but otherwise there’s very little reward for being a provider of content. And of course, this is something that’s getting worse and worse. Why look at a website when it’s all in AI? And then the counter to that is maybe we need to start closing the website to crawlers and put everything behind a login.
Ensorceled 2 days ago [-]
Worse, the constant AI scraping is actually costing content providers additional money for no return. At least Google/Bing/Yahoo scraping would then be used to provide links back to your content.
bolangi 2 days ago [-]
Not only costing money. Constant AI scraping constitutes a denial-of-service attack that has brought down websites.
devsda 2 days ago [-]
How do you distinguish Google/MS scraping for Gemini/Copilot vs Google Search/Bing? In the case of Google, the UA is the same and you are entirely at their mercy to honor the Google-Extended instructions in robots.txt
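To make the Google-Extended point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. Google documents Google-Extended as a product token rather than a separate crawler user agent, so the rules below only take effect if the crawler operator chooses to look up that token for the relevant purpose. The robots.txt contents and the example.com URL are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: allow ordinary crawling, but opt out of the
# Google-Extended token (which governs use of content for Google's AI models).
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/articles/1"
    for agent in ("Googlebot", "Google-Extended"):
        # Googlebot falls through to the "*" group; Google-Extended matches its own group.
        print(agent, rp.can_fetch(agent, url))
```

The parser just evaluates the rules it is handed. Nothing forces a crawler to call it, or to check the AI-specific token when the same fetch also feeds search, which is the commenter's complaint.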
Google has further complicated it with its new search announcement, which blurs the lines between regular search and AI search. And AI likes to not honor any licenses or instructions when it is hungry for training material.
It is once again an example of Google using its dominant position to abuse and promote cross functional products.
cute_boi 2 days ago [-]
If companies like Meta are downloading pirated books etc. to train their AI, they will surely honor robots.txt.
johneth 2 days ago [-]
I wouldn't be surprised if there were some sort of legal action against Google, the monopoly, to force a distinction in how its crawlers use scraped content.
fiedzia 2 days ago [-]
> At least Google/Bing/Yahoo scraping would then be used to provide links back
That doesn't work anymore. Google provides AI generated summary, nobody looks at the original site.
motbus3 2 days ago [-]
About a year ago OpenAI crawled the company I work for at DDoS levels, despite the robots.txt not allowing it, and despite some reCAPTCHA we managed to put together in time.
We found our data in the outputs of their models but who can do anything about it...
kibwen 2 days ago [-]
> We found our data in the outputs of their models but who can do anything about it...
If the crawlers refuse to voluntarily respect your robots.txt, then you are well within your rights to poison their data.
motbus3 2 days ago [-]
They fake the user agent, they throttle or go slow and even try to emulate mouse movements. I am not kidding that they had the audacity to do 270k pages in a single day, and they returned multiple days.
hajile 2 days ago [-]
robots.txt seems like it should be a legally-binding terms of service which would make them outright copyright infringing.
Sue for $180,000 per infringement which should be calculated for each illegal API call.
motbus3 2 days ago [-]
They copy books illegally by scanning the pages and try to hide the whole thing under 3rd party companies. They care 0 about the law. I suspect I know why now.
throw1234567891 2 days ago [-]
Was your robots.txt written by a lawyer? Does it hold up in court?
ElevenLathe 2 days ago [-]
OpenAI might in fact be a good target for stuff like this at the moment. Even if your argument is weak, they may be eager to settle generously if your suit threatens the speediness of their IPO in some way. But I happen to think this is in fact a reasonable argument: I put up a sign that says not to do something with my property, and you went ahead and did it anyway, costing me money. IANAL but seems like a straightforward tort, no?
motbus3 2 days ago [-]
The owner of my company is not a litigious guy. He has tried to block copied content other times, but he won't dive into a battle with people much richer than him.
wang_li 2 days ago [-]
It doesn't matter. Robots.txt is not a license, it's a set of computer parsable directives of how programs should access your site. The actual license doesn't have to be written for computers to parse to be legally binding.
A person should be able to write in a terms of use or license page on their website that says "do not include any content from this website in your AI training data. if you do you will be billed $100 billion." And it should be enforceable. It just turns out that nerds like to say "oh that would be too hard or too expensive, so we're going to ignore it."
hajile 2 days ago [-]
Contracts are legally binding even if they weren't written by a lawyer. Copyright is legally binding even if no copyright claim is explicitly stated.
I looked into this a bit (not a lawyer) and it seems that robots.txt isn't legally binding to either party, but this seems to have two major implications for AI agents (and crawlers/scrapers in general).
First, even if the robots.txt says you can crawl the site, that isn't a copyright grant of any kind or permission to copy/use that data outside of the permissions granted by the TOS.
Second, ignoring the robots.txt while also pirating the site contents could point to bad-faith and makes a much stronger case for double-damage penalties due to willful infringement.
If the site TOS doesn't explicitly grant an AI agent rights to copy out the site content AND the AI agent is ignoring the robots.txt at the same time, it seems a lot more likely that there's a strong copyright infringement case against the agent owner.
throw1234567891 2 days ago [-]
But there is no contract. You assume that they read your robots.txt, they don’t have to. There is no contract.
motbus3 1 days ago [-]
[dead]
ethin 2 days ago [-]
It doesn't have to be written by a lawyer. The robots.txt file is an administrative directive, by the webmaster of the website, that you, being a scraper, MUST NOT go to page x and/or y, or MUST NOT go to directory z. All the law would have to say is that it is a crime to not obey these directives. It's similar to trespassing: if I put a sign that says "DO NOT ENTER" in bright red letters on a door in my apartment, or "authorized people only!", that is still legally binding and a court isn't going to care that it wasn't lawyer-authored. The court will only care that you were told to not enter that area, but did so anyway.
shimman 2 days ago [-]
Why hasn't your company sued OpenAI and tried to argue they're violating the Computer Fraud and Abuse Act? Would it really be impossible to argue this?
Unauthorized access, system damage, and maybe even extortion all apply here.
rastrojero2000 2 days ago [-]
Lawyers can. As long as that data is actually yours I mean, in a strictly legal sense.
telotortium 2 days ago [-]
I mean, did you check the IPs and make sure they’re from OpenAI? Obviously a fly-by-night AI company is going to set their User Agent to be from a big player.
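Checking the IP rather than the User-Agent is straightforward when the crawler operator publishes the address ranges its bots use, as some large operators do. A minimal sketch with Python's standard `ipaddress` module follows; the ranges are RFC 5737 documentation blocks used as placeholders, not any company's real list.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation networks). Swap in the ranges the
# crawler operator actually publishes before relying on this check.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_from_published_range(client_ip: str, ranges=PUBLISHED_RANGES) -> bool:
    """True if client_ip falls inside one of the operator's published networks."""
    addr = ipaddress.ip_address(client_ip)
    # Membership across IPv4/IPv6 versions is simply False, not an error.
    return any(addr in net for net in ranges)


if __name__ == "__main__":
    # A request whose User-Agent claims to be a big player's bot.
    for ip in ("192.0.2.17", "203.0.113.5"):
        print(ip, is_from_published_range(ip))
```

A claimed bot User-Agent coming from outside the published ranges is a strong hint that someone else is borrowing the name.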
spacechild1 2 days ago [-]
It's actually costing them money/time! A friend of mine is a sysadmin at a university and he constantly has to deal with AI crawler DDoS-ing his servers. He said Anthropic is actually one of the worst offenders.
These AI companies are really just a gross example of the motto "Socialize the costs, privatise the profits". It's disgusting!
neop1x 10 hours ago [-]
>> put everything behind a login
May not always work. I just click the back button and look for the info elsewhere, and in most cases I find it. Same with paywalled websites. If you are ok with a small audience (or you provide unique content) then it makes sense. But I think in most cases you just cut off a lot of people this way and actually you can simply stop creating content if you don't want consumers of it and let others provide the content.
b00ty4breakfast 2 days ago [-]
>Why look at a website when it's all in AI?
well, at least in the case of google, I'm pretty sure that's the point. Or at least, they are doing things that would seem to be moving towards being an oracle with all the answers and not the signpost that points you in the right direction. The destination rather than the gateway.
philipov 2 days ago [-]
remember AMP?
b00ty4breakfast 2 days ago [-]
Holy cow, I never even thought about that in relation to AMP. It's not a new thing, then.
aaarrm 2 days ago [-]
Is it possible to host your website in a way so that it couldn't be found via search engines (and thus wouldn't be crawlable I hope)?
I know this has repercussions on findability, but if that wasn't a concern, I'm curious how one might circumvent getting crawled.
Imustaskforhelp 2 days ago [-]
If you really want to, and are happy with just text and basic styling, I recommend testing out other protocols, like creating a Gemini or Gopher site. I don't think scraping happens on even remotely the same scale there as on conventional websites.
That being said, your users would need to download a compatible Gemini/Gopher browser.
matt_heimer 2 days ago [-]
Sure, depends on how accessible to people you want it to be.
Most legit search engines are going to honor robots.txt and you can disallow access.
Next level would be using something like rate limiting controls and/or Cloudflare's bot fight mode to start blocking the bad bots. You start to annoy some people here.
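For the rate-limiting step, one common technique is a per-client token bucket: each client gets a small burst allowance that refills at a fixed rate. Below is a minimal in-memory sketch, assuming clients are keyed by IP address; it does not model Cloudflare's bot fight mode, and the rate and capacity values are arbitrary.

```python
import time


class RateLimiter:
    """Per-client token bucket: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behaviour can be tested deterministically
        self.buckets: dict[str, tuple[float, float]] = {}  # client -> (tokens, last_seen)

    def allow(self, client: str) -> bool:
        now = self.clock()
        # New clients start with a full bucket.
        tokens, last = self.buckets.get(client, (self.capacity, now))
        # Refill for the time elapsed since the last request, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client] = (tokens - 1, now)
            return True
        self.buckets[client] = (tokens, now)
        return False


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = RateLimiter(rate=1.0, capacity=5, clock=lambda: fake_now[0])
    print([limiter.allow("203.0.113.5") for _ in range(7)])  # 5 allowed, then throttled
    fake_now[0] += 2.0  # two seconds later, two tokens have refilled
    print([limiter.allow("203.0.113.5") for _ in range(3)])
```

A per-address limit like this is easy to sidestep for a crawler that spreads its requests across many addresses, which is why it is usually combined with other signals.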
Next would be putting the content behind some form of auth.
cute_boi 2 days ago [-]
I don't know why we are trusting cloudflare when they are the one creating crawlers.
Possible, yes; probable, not likely. The moment you're issued a certificate your domain will be shown in the Certificate Transparency logs, which are constantly monitored by anyone who wants to find new sites.
salawat 2 days ago [-]
....Yet another vector through which "security experts" have caused a waterbed problem. Let's secure the Internet, oh no! We made a centralized list of operating domains for hostile actors to guide attacks with!
elorant 2 days ago [-]
Sure, let's hide everything behind obscure schemes which will definitely serve the spirit of openness of the web.
salawat 2 days ago [-]
The point is that you can't escape side-channel applications of security metadata being weaponized the more you try to force ubiquity of "security" everywhere. As long as there are motivated, profit seeking attackers, you have to take into account the toxic nature of metadata. This is another example of "A System Is What It Does" proving the pointlessness of "POSIWID". Intent doesn't matter. Certificate transparency was intended to clue us into bad cert issuing, but it is also a list of potential targets where AI crawlers can be directed to scrape new data. Intent doesn't change what it is. Cert transparency is certainly transparency + a "training data might end up here" list.
trinari 2 days ago [-]
robots.txt is a way of leaving the door unlocked but kindly asking bots to stay outside.
account42 2 days ago [-]
Which in a law-abiding society should be enough. It's also how we do things in the real world in many cases - i.e. here you can just write on your mailbox "no ads" and companies have to respect that.
Even when we do actually put physical locks on things they are mostly there to show that someone breaking in did so intentionally and not at all designed to prevent motivated attackers.
dpark 2 days ago [-]
> here you can just write on your mailbox "no ads" and companies have to respect that
Where do you live? In the US it’s actually illegal for anyone except the USPS to deliver to a mailbox.
dpark 2 days ago [-]
You might be interested to know that entering an unlocked door into a space you do not have permission to be in is still illegal.
throw1234567891 2 days ago [-]
You might be interested to know that the “illegality” depends on the intent. If I rest on your unlocked door handle, it opens, I enter, it’s an accident.
dpark 2 days ago [-]
Sorry, what? In this scenario are you claiming that you accidentally fell inside the restricted area because you were leaning on the door? Or are you claiming that you accidentally opened the door and then walked through intentionally? In the former case, you are guilty of breaking and entering in most US jurisdictions if you don’t promptly get out. Any sane court would likely agree an accidental trespass is probably not a criminal act, but it’s not an accident if you stay. In the latter case, you’re clearly trespassing illegally.
Also this has gotten pretty far away from the web scraping scenario. There’s no door accidentally opening here.
dminik 2 days ago [-]
Oops, I just accidentally fell into every website. Don't know how that happened ...
daemin 2 days ago [-]
Which works when you live in normal civil times, when you live in jungle times people and robots will do whatever they want and the most powerful will get their way.
MontgomeryPy 2 days ago [-]
You could just put your website content behind its own chat interface. The crawler would just see a form input for a prompt.
gabbagool 2 days ago [-]
I agree with this wholeheartedly. What's the point of even having copyright law at this point?
What's even crazier to think about is that to use the latest versions of these models for which you supplied training data, you have to pay hundreds of dollars a month. I would love to get a settlement check proportional to my model weights. Even if it's $0.10, at least everyone out there will get what they're owed.
rickydroll 2 days ago [-]
From my perspective, everybody trains on the knowledge and experience of those who came before. AI just does the same thing at scale.
I do not value copyright. All it does is give you standing to sue if somebody reproduces your work. It does not differentiate or account for parallel creation. I cannot count how many times I have "created" something, only to find it in a research paper later.
Part of the reason I think copyright has no value is that, in general, individual copyright owners don't have the deep pockets necessary to sue someone who violates their copyright. If anyone is violating the spirit of copyright, it's corporations that insist you assign your work over to them as a work for hire, or outright ignore your copyright. (looking at you, Disney's Atlantis).
A significant benefit of AI that doesn't get talked about enough is that AI has a much greater reach over all the information it was trained on and can draw connections that would be invisible to someone operating at the human scale.
ofjcihen 2 days ago [-]
The fact that these companies are making money off of it negates your argument.
visarga 2 days ago [-]
I don't think anyone's "making money" yet. We have a race to build up hardware for AI, and one to train models. There are some profits in there, but who's making money from the work AI performs? Nobody, because any advantage some company claims with AI is quickly replicated by competitors and profit dries up.
Today you can put a coding agent to work migrating an existing application to another language (e.g. chardet). Even if you don't have the code, if you can run the app you can still clone it, using it as an oracle for replication. That is why there will be very little profit in AI usage.
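A minimal sketch of the "run the app as an oracle" idea, assuming only that you can call the original program on arbitrary inputs: generate random inputs, feed them to both the original and a candidate port, and collect every disagreement. `original_slugify`, `naive_port`, `fixed_port`, and `random_text` are hypothetical stand-ins, not anything from chardet.

```python
import random
from typing import Callable, List

def original_slugify(text: str) -> str:
    # Hypothetical "existing app": we can run it, but treat its source as unknown.
    return "-".join(text.lower().split())

def naive_port(text: str) -> str:
    # Hypothetical first reimplementation attempt: only splits on spaces.
    return "-".join(w for w in text.lower().split(" ") if w)

def fixed_port(text: str) -> str:
    # Hypothetical corrected reimplementation: splits on any whitespace.
    return "-".join(text.lower().split())

def random_text(rng: random.Random) -> str:
    # Small alphabet so edge cases (tabs, repeated spaces) come up often.
    return "".join(rng.choice("aB \t") for _ in range(rng.randint(0, 8)))

def differential_test(
    oracle: Callable[[str], str],
    candidate: Callable[[str], str],
    gen: Callable[[random.Random], str],
    trials: int = 1000,
    seed: int = 0,
) -> List[str]:
    """Return every generated input on which candidate disagrees with oracle."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        x = gen(rng)
        if oracle(x) != candidate(x):
            mismatches.append(x)
    return mismatches

print(len(differential_test(original_slugify, naive_port, random_text)))  # non-zero: tabs expose the bug
print(differential_test(original_slugify, fixed_port, random_text))       # []
```

In practice the input generator is the hard part: the oracle only catches differences on inputs you think to generate, which is why a project's existing test suite makes this so much easier.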
ofjcihen 2 days ago [-]
I get what you’re saying but that’s irrelevant to the argument.
They are indeed taking in money by selling the product. Just because they don’t turn a profit doesn’t mean they’re not infringing copyright as a business practice to make money.
throw1234567891 2 days ago [-]
No, you don’t have to. There are open weight models you can download and use for free. Many people choose the subscription model but it’s not necessary. And latest doesn’t mean greatest, it’s just most up-to-date.
wolttam 2 days ago [-]
I’ve been thinking of a proof-of-work scheme for accessing content where you effectively need to mine some crypto for the author, but this idea might not fly today.
microtonal 2 days ago [-]
But that will be a hassle for human visitors as well. A web that requires proof-of-work to browse would be a disaster for phones, with their limited batteries, etc.
odo1242 2 days ago [-]
To be specific, it would be more of a hassle for human visitors than for the AI companies with infinite money and specialized browsers.
wolttam 2 days ago [-]
The idea would be that AI companies would still be forced to do this proof of work. Anubis proved the concept.
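For readers unfamiliar with the mechanism, here is a minimal hashcash-style sketch of the kind of puzzle being discussed, assuming a server-issued challenge string and a difficulty counted in leading hex zeros of a SHA-256 digest. It is similar in spirit to what Anubis asks browsers to do, but it is not Anubis's code or parameters, and it mines no cryptocurrency: the work earns nobody anything, it only costs the client time. `solve` and `verify` are hypothetical names.

```python
import hashlib
from itertools import count

def _digest(challenge: str, nonce: int) -> str:
    # SHA-256 over the challenge concatenated with the candidate nonce.
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()

def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the smallest nonce whose digest starts with
    `difficulty` hex zeros. Expected cost is about 16**difficulty hashes."""
    target = "0" * difficulty
    for nonce in count():
        if _digest(challenge, nonce).startswith(target):
            return nonce
    raise AssertionError("unreachable")  # count() never stops

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash, regardless of difficulty."""
    return _digest(challenge, nonce).startswith("0" * difficulty)

nonce = solve("example-challenge", 3)
print(nonce, verify("example-challenge", nonce, 3))  # prints the nonce and True
```

The asymmetry between `solve` and `verify` is the whole point, and also the source of the objections in this thread: each extra hex digit of difficulty multiplies the client's work by 16, a phone pays that cost just like a scraper does, and a well-funded crawler can simply throw more hardware at it.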
odo1242 2 days ago [-]
I don’t think Anubis proves the idea much though. I feel the main reason it’s worked is that AI companies haven’t yet tried to bypass it.
And AI companies still scrape Anubis-protected websites; it just forces them not to DDoS the website.
> Although Anubis could be altered to mine cryptocurrency to serve as proof of work, Iaso has rejected this idea: "I don't want to touch cryptocurrency with a 20 foot pole."
Which in my mind is a shame. Crypto is an absolute mess, yes, but this seems like an elegant way to get something back for putting things out there.
dpark 2 days ago [-]
The problem is that much of the cost is borne by humans accessing the sites. People generally get real mad when they find out you’re using their computers to mine crypto.
vitally3643 2 days ago [-]
Mining crypto doesn't materialize money. You have to exchange it for real money, which means taking a private individual's money in exchange for scam tokens.
This is the problem crypto fans refuse to acknowledge. The money doesn't magically appear; you're taking it from someone else and letting them hold the bag when whatever cryptocurrency you choose inevitably blows up, fails, or rug-pulls. It's unethical to engage with at all, because you're still participating in scamming real money out of private individuals.
6031769 2 days ago [-]
Not necessarily. You can spend your cryptocoins with any number of businesses and it is very much the choice of those businesses to accept them or not. No private individuals need be involved.
Note also that any non-crypto currency can also devalue at any moment, although perhaps not to the same extent. Holding anything of any perceived value carries a risk and also a potential reward.
chii 2 days ago [-]
Or, you know, just charge for your content if you believe it to be worth the fee.
wolttam 2 days ago [-]
Yes, but that tends to limit the reach of your content. Hence why a lot of people reach for ads.
Between seeing ads and doing a little bit of proof-of-work for the author, I'd choose the latter.
WarmWash 2 days ago [-]
It's never been a problem with people ad-blocking for the last 20 years, why is it suddenly a problem now?
We've been celebrating denying creators revenue for decades...
Maybe this is just the internet hypocrisy of "When I do it, it's good; when they do it, it's bad".
omnimus 2 days ago [-]
Total sleight of hand.
Ad blocking has always been a problem for creators but it's aimed at big corps - non-creators. The creators asked people to support them other ways or turn off the blocking. And it's not like the little independent creators wanted this version of commercialized internet in the first place.
The AI marketing teams are spinning everything they can, but no: AI companies are the culprits, the vultures. No question about it.
WarmWash 2 days ago [-]
The conversion from viewer to donator is around 1%. This is true from wikipedia, to twitch, to podcasts.
The number of people who will not ever load your ads is around 30%.
I can tell you that creators talk about this a lot in private, but will not publicly, because the internet has a mass delusion about how creation and compensation work. It's like trying to convince Christians that Jesus obviously didn't come back from the dead days later, despite there being no logical system available that would explain it.
If we were to try to map out a functional internet where everyone wins, users and creators, there is no example where ad blocking is anything other than net harmful. You either get a volunteer net where 0.01% share hobby posts on their own dime for the other 99.9%, or you get IRC where 99% of the population doesn't really benefit (à la 1993).
20k 2 days ago [-]
The problem is that the ad vendors couldn't keep it in their pants. The ads you're talking about are a common vector for delivering malware onto people's PCs, and absolutely destroy the usability of sites. Between tracking cookies, popups, full screen banners, autoplaying video, flashing ads, and their unbelievably high weight in bandwidth, the internet is fairly unusable if you don't block any ads.
Bear in mind that many basic privacy features destroy ads by breaking tracking and fingerprinting. It's impossible to get a browser that doesn't filter out behaviours that have been used to deliver ads.
Creatives can and have adapted their strategies away from what is a very specific form of ads: the disruptive full screen ads, or banner ads. That's only one form of advertising, and everyone utterly detests it. Sponsored content is much more popular with end users, and much more effective as well, because it's way less disruptive. Some people hate that, but overall the tradeoff is significantly better.
We shouldn't confuse a single type of widely blocked advert with all advertising being blocked. Banner ads have very poor efficacy at delivering sales anyway.
WarmWash 2 days ago [-]
>The problem is that the ad vendors couldn't keep it in their pants.
You might not know, many people don't, that ad vendors came to the table a little over a decade ago to make a truce with Adblock Plus. ABP and ad vendors both saw that an "ad-supported internet" was unsustainable with no ads. So ABP was looking to set terms for what would be deemed acceptable ads. Creators/service providers get an incentive, users get manageable ads.
It didn't matter though because users rioted and uBlock (then uBlock Origin) became king. No compromises there. I mean, what fucking idiot would take some ads when they could take no ads, right?
Even less known is that Google trialed a program where you could pay them directly and they would remove ads from your browsing. This program was about as popular as shit on a stick, because again, what fucking idiot would pay for no ads when they can simply block all ads for free, right?
There have also been attempts like Brave, where crypto could be used as a micropayment in lieu of ads. But that has also gone nowhere, even if it does have a few snags around centralization.
What I have never seen though, and have zero examples of, is internet users trying to reconcile the situation. It's just a relentless entitlement to free everything, with a small fraction sometimes subscribing, and an even smaller fraction sometimes donating. The users are unquestionably the biggest assholes in this situation. They won't even acknowledge they have a problem.
20k 2 days ago [-]
>You might not know, many people don't, that ad vendors came to the table a little over a decade ago to make a truce with Adblock Plus. ABP and ad vendors both saw that an "ad-supported internet" was unsustainable with no ads. So ABP was looking to set terms for what would be deemed acceptable ads. Creators/service providers get an incentive, users get manageable ads.
I'm very aware of this. Most ad vendors did not come to a truce with Adblock Plus. ABP tried to position itself as the gatekeeper of what ads users were allowed to see (a hugely financially beneficial position for them), and immediately ended up letting through a bunch of terrible ads.
It was a nice idea, but it was never going to work. There was simply too much money for the advertisers to make to allow ABP to be the gatekeeper of ad content.
The nature of ads has gotten significantly more invasive over time, and blocking ads today is a mandatory part of security. Ad companies *do not* have a god-given right to track you, or infect your PC with malware.
Users rioted because ABP did a terrible job at managing the situation.
>What I have never seen though, and have zero examples of, is internet users trying to reconcile the situation. It's just a relentless entitlement to free everything, with a small fraction sometimes subscribing, and an even smaller fraction sometimes donating. The users are unquestionably the biggest assholes in this situation. They won't even acknowledge they have a problem.
As I mentioned in the comment you replied to, there are lots of alternative forms of advertising that users have not revolted against to anywhere near the same degree, e.g. sponsored content segments in YouTube videos.
omnimus 2 days ago [-]
So what's your point? AI is justified because users use ad blockers?
The whole situation, including the ad system of the internet, is made by the same corporations. All of it. They didn't even want paywalled content on the internet, because this way they don't have to tell people how much stuff costs and how much it makes. Facebook famously makes so much money on its users that at some point they were considering paying them.
There shouldn't be any mercy for the mega companies. On the other hand, every single person who's being taken advantage of now (like anybody who's ever posted anything) should be defended, because copyright has failed them.
u_fucking_dork 2 days ago [-]
People usually point at the scale when this discussion comes up, in my experience. These companies are doing something at a huge scale spending tons of money to do it so the potential harm is greater.
People can easily justify their own piracy because it’s small scale, even when they organize and create a whole software and tooling ecosystem around pirating media to stick into Jellyfin or Plex. AI did it bigger and worse, so it’s bad; what I’m doing is not so bad because I wasn’t going to buy the movie anyway, etc.
52-6F-62 2 days ago [-]
Don't forget that the money being spent to do said scraping has, in great sums, come from subsidies paid by taxes from public coffers.
WarmWash 2 days ago [-]
On the whole, about 35% of internet users are ad-blocking. In the tech space it's upwards of 70%.
It's in no way, shape, or form "small scale", and it has fundamentally changed the very nature of the internet for the worse (opinions/views of ad blocking people don't matter).
onedognight 2 days ago [-]
Choosing not to look at something is not denying anyone anything.
WarmWash 2 days ago [-]
Choosing not to look at an ad, and blocking it are different things. One is totally ok, the other incurs a monetary loss on the creator. Those services aren't free to run, and the content doesn't take zero time to create. It also incentivizes creating content focused on those who cannot figure out ad blocking.
vharuck 2 days ago [-]
I use ad blockers on my personal computer and phone to avoid tracking. My work computer doesn't have a blocker, but I only visit "professional" sites and major blog aggregators on it, so those ads aren't egregious. Ad blockers wouldn't have become a thing if it weren't for ads causing terrible layout, poor performance, and annoying interruptions when playing sound. Not every website does it, but the ones that do have poisoned the well.
zetanor 2 days ago [-]
I am in favor of severely limiting both copyright and advertising, but for the benefit of everyone, not just for the benefit of a few "AI" companies.
omnimus 2 days ago [-]
And you will not get it. As the AI companies pump money into lawyers and politicians, they will be the ones profiting from copyright. Total regulatory capture, as US AI companies make it illegal to train AI on their output.
WarmWash 2 days ago [-]
The answer is to simply pay for stuff.
There is no viable model where "have stuff but not pay for it" works out.
theamk 2 days ago [-]
Remember the early internet? The time when it actually cost a non-trivial amount of money to post stuff on the web, and there was no expectation that webpage authors would get any money back?
This worked pretty well. Websites were hobby - one might spend their money buying comic books, and someone else might spend the money making and hosting their website.
theamk 2 days ago [-]
There is more to life than money.
Many of the websites I read do not collect any appreciable amount of money from ads, or have no ads at all (one example: news.ycombinator.com :) ). They want recognition, or to share knowledge, or community, or they are building their brand... And AI is destroying all of this: the first result for "zx80" is an AI overview with a link to Wikipedia and some YouTube videos. If a person stops there, they will never get to the computinghistory.org.uk link, and won't see any related information about the variants and models.
WarmWash 2 days ago [-]
This website is an ad for Y Combinator. It's in no way, shape, or form a charity place for devs to hang out. It's a feeding ground to lure tech people into a mega VC's pastures.
When you click "news.ycombinator.com" you are clicking on the ad.
:)
BoneShard 2 days ago [-]
If all ads were like that, I wouldn't have an ad blocker.
mixmastamyk 2 days ago [-]
Interesting. I suppose the main difference is that we’re ants compared to an 800 pound gorilla.
internet2000 2 days ago [-]
Perhaps we should go back to when the internet was about sharing information you liked, not about credit or making money on "content".
sumeno 2 days ago [-]
Ok, AI companies first then since they are some of the biggest offenders
throw1234567891 2 days ago [-]
You are there today, but some are unhappy that others don’t share the same sentiment.
storus 2 days ago [-]
This is really not so clear-cut, as "fair use" might cover 99% of all data scraping; you are not reproducing the originals, just using them to estimate the probability distribution of tokens in pre-training. You are never going to get the exact book word-for-word using LLMs.
lbrito 2 days ago [-]
>You are never going to get the exact book word-for-word using LLM.
This is pretty much the exact claim of a NYT lawsuit against OpenAI.
"One example: Bing Chat copied all but two of the first 396 words of its 2023 article “The Secrets Hamas knew about Israel’s Military.” An exhibit showed 100 other situations in which OpenAI’s GPT was trained on and memorized articles from The Times, with word-for-word copying in red and differences in black."
Yes, LLMs fundamentally operate as a lossy compression scheme for their training data. There have been countless examples of them reproducing their training data with very high accuracy.
People claim that the data isn't stored, but clearly a representation of it is encoded and reproducible. I saw ChatGPT plagiarise a Stack Overflow comment word for word just two days ago.
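One rough way to put a number on "word-for-word copying" like the exhibit describes, assuming you have both the source article and the model output as plain text: split both into words and count how many source words fall inside the runs the two texts share. This uses Python's `difflib`; it is a sketch, not the method used in the lawsuit, and `verbatim_overlap` is a hypothetical name. For scale, "all but two of the first 396 words" would be 394/396, or about 99.5%.

```python
from difflib import SequenceMatcher

def verbatim_overlap(source: str, output: str) -> float:
    """Fraction of the source's words that land in matching blocks shared
    with the output. 1.0 means every source word was matched, in order."""
    a, b = source.split(), output.split()
    if not a:
        return 0.0
    # autojunk=False so frequent words like "the" are not discarded as junk
    # (that heuristic only kicks in for sequences of 200+ items).
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(a)

src = "the quick brown fox jumps over the lazy dog"
out = "the quick red fox jumps over the lazy dog"
print(round(verbatim_overlap(src, out), 3))  # 0.889 (8 of 9 words)
```

Working at the word level rather than the character level keeps a single swapped word from hiding an otherwise verbatim passage, which is roughly how the red/black exhibit presents it.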
scantron4 1 days ago [-]
Ah but can it reply snarkily and close your ticket as a duplicate that is NOT A DUPLICATE? If not it will never recreate the real stack overflow experience.
nonethewiser 2 days ago [-]
Does this actually imply a representation of it has been stored or simply that the model is sort of over-fit?
wat10000 2 days ago [-]
Is there a difference?
nonethewiser 2 days ago [-]
Well yeah, if you're making the claim that it stores a representation of the data in some form.
Does your calculator app store a representation of the answer to 1+2/2*1.1 and all other combinations of inputs or does it determine the answer from a set of rules?
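To make the "rules vs. stored answers" distinction concrete: a calculator can answer infinitely many inputs from a handful of rules, because each answer is fully determined by its input. A minimal sketch, assuming Python's `ast` module as the parser and exact `Fraction` arithmetic so that `1.1` is treated as 11/10 rather than a binary float; `evaluate` is a hypothetical name.

```python
import ast
import operator
from fractions import Fraction

# The entire "knowledge" of this calculator: four rules, no stored answers.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expr: str) -> Fraction:
    """Evaluate +, -, *, / and unary minus over numeric literals, exactly."""
    def walk(node: ast.AST) -> Fraction:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            # Go through str() so 1.1 becomes exactly 11/10.
            return Fraction(str(node.value))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")
    return walk(ast.parse(expr, mode="eval"))

print(evaluate("1+2/2*1.1"))  # 21/10
```

The reply below makes the flip side of this point: if a function like this printed a newspaper article for some short input, that text could not come from these four rules and would have to be stored somewhere in the program.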
wat10000 2 days ago [-]
It's a different case when the input contains more information than the output.
If you put "1+2/2 x 1.1" into a calculator and it spit out a verbatim copy of a New York Times article, does it necessarily contain a representation, or does it just contain some really extensive rules? I'd argue those rules necessarily are a representation of that information, given that it contains far more information than provided by the input.
SoftTalker 2 days ago [-]
When I was in school, writing "in my own words" was never an excuse to not cite a source. It was actually something that took me a little while to understand, it's the source of the information that needs to be cited, and that's not limited to literal quotations of someone else's writing.
Salgat 2 days ago [-]
That's more an argument for why you can't just use LLMs as a source of truth. Conveniently, LLMs like ChatGPT do often cite their sources, especially if you prompt them to.
jaccola 2 days ago [-]
Maybe a nit: LLMs do not and cannot cite their sources (at least not the sources scraped for training).
It’s kind of the harness that is doing the citing (or providing the context for the model to).
But an LLM sans search can reproduce some copyrighted work with minor variations and there’s no way to know exactly where it came from.
pera 2 days ago [-]
> You are never going to get the exact book word-for-word using LLMs
You could say the same about MP3 encoders, but I don't think that would convince any judge.
You can get it to reproduce content, but it’s a game of cat and mouse. Were it not for the alignment to avoid direct reproduction, it would happen far more often.
> RECAP consistently outperforms all other methods; as an illustration, it extracted ≈3,000 passages from the first "Harry Potter" book with Claude-3.7, compared to the 75 passages identified by the best baseline.
TimTheTinker 2 days ago [-]
Try prompting Claude to create a drop-in replacement for an existing library, testing against that library's test suite to validate functionality.
It will pretty much plagiarize the library verbatim from memory, sans comments.
TheOtherHobbes 2 days ago [-]
This confuses input and output.
A copy made for the purposes of training is still a copy.
Even if you throw the text away after training, you've still made a copy.
ajam1507 2 days ago [-]
In Bartz v. Anthropic, the judge ruled that Anthropic making a digital copy of a printed book and then discarding the physical book was not infringing when used to train a model.
rkozik1989 2 days ago [-]
Come up with an obscure topic that has few relevant results, post about it on your Reddit profile page, wait a few hours, and then query Gemini/ChatGPT about that exact thing and tell me you still feel this way.
underlipton 2 days ago [-]
Fair use was built around human limitations. The mass scraping campaigns done by the AI giants were clearly an overreach in spirit, if not letter. Most people's intuition is that these massive operations that are valued in the trillions can't have been drawn from some untapped common resource, and they're correct. Someone, somewhere is not being properly compensated.
I have no problem with taxing AI companies so that their profit is marginal, or forcing them to provide compute for free. That seems like the correct balance of what they're harvesting from the "commons" (which is really just the totality of private IP that was exposed to their crawlers).
Ekaros 2 days ago [-]
Fair use is the balance between creators and those who in some way use the content. Somehow it has become an excuse not to compensate the creators in any way. To me, the AI training part really looks like something that should be treated separately, giving creators compensation when their works are used.
Now, how much, and whether it should be based on revenue from output, is an open discussion. And it might also be that there is no fair model to pay them. Which means, well, too bad for LLMs...
underlipton 1 days ago [-]
The nature of how LLMs work makes it impossible to connect a derivative work to its source data in the training. However, the weights couldn't exist without that training data - the works of the creators were used during training - and the entity making money off the use of that training data is primarily the LLM platform owners. So they should pay.
We are trying to avoid another situation where "resource wealth" goes uncompensated: producers remain poor while processors, marketers, and merchants reap all the benefit. Unless your aim is something else, in which case you should state it.
mplanchard 2 days ago [-]
I don’t buy this argument. The tokens are useless without their context, which provides the probability distributions needed to make them useful. Sure you MIGHT not be able to get the book word for word, but it’s impossible to make a useful model without the whole book and all of the artistry that went into it, to guide the tokens in their expected output.
Fair use generally does not cover commercial use, which this clearly is, and is dependent on the amount of the original content present in the derived work, which I would contend in this case is “all of it”
Vvector 2 days ago [-]
"Commercial Use" is only one part of the four prongs of the fair use test. For example, commercial Parody is generally considered Fair Use. Look at Space Balls, which is a direct transformation from Star Wars.
This is all new territory. We don't have court-settled law yet.
samatman 2 days ago [-]
It's more complicated than that. Quite a bit more.
Commercial use counts _against_ a fair use defense, but is not dispositive: it's not accurate at all to say it "generally does not cover" commercial use. This is the "purpose and character" test, one of four in contemporary (United States) fair use doctrine.
Purpose and character also includes the degree to which a use is _transformative_. It's clear that the degree to which a training run mulching texts "transforms" them is very high. This counts toward a fair use finding for purpose and character.
> is dependent on the amount of the original content present in the derived work, which I would contend in this case is “all of it”
The "amount and substantiality" test. Your case for "all of it" can't possibly be sustained: the models aren't big enough. It's amount _and_ substantiality: this has come up in the publication of concordances, where a relatively large amount of a copyrighted work appears, but it's chopped up and ordered in a way which is no longer substantially the same. Courts have ruled that this kind of text is fair use, pretty consistently. It's not an LLM, of course, but those have yet to be ruled on.
Also worth knowing that courts have never accepted reading or studying a work as incorporation, and are unlikely to change course on the question. It's taken for granted that anyone is allowed to read a copyrighted work in as much detail as they wish, in the course of producing another one. Model training isn't reading either, but the question is to what degree it resembles study. I'd say, more than not.
Specifically:
> it’s impossible to make a useful model without the whole book and all of the artistry that went into it
Courts have never once accepted "it would be impossible for defendant to write his biography without reading plaintiff's" as valid, and it's been tried. The standard for plagiarism is higher than that.
"Effect upon the work's value" is probably the most interesting one. For some things, extreme, for others, negligible. I suspect this is the one courts are going to spend the most time on as all of these questions are litigated.
Ultimately, model training is highly out-of-distribution for the common law questions involving fair use. It was not anticipated by statute, to put it mildly. The best solution to that kind of dilemma is more statute, and we'll probably see that, but, I don't think you'll be happy with the result, given what I'm replying to. Just a guess on my part.
mplanchard 2 days ago [-]
It is of course true that it is unsettled law, and that fair use is more complicated than my offhand comment suggested.
> Courts have never once accepted "it would be impossible for defendant to write his biography without reading plaintiff's" as valid, and it's been tried. The standard for plagiarism is higher than that.
This, I think, misses the thrust of my argument, though. It's hard to find an exact human analogy, because neither the technology nor the scale at which it operates is remotely human.
I see it less as “writing his biography without reading the plaintiff’s” and more as “using the same style and metaphors to make thousands of copies of very similar biographies, with certain bits tweaked,” like turning an existing work into a Mad Lib.
I don’t know how the courts will eventually rule on it, but it certainly feels like theft to me.
samatman 2 days ago [-]
It's fascinating how intuitions differ. To me, it doesn't feel like theft at all. For one thing, theft is depriving another of something, and has therefore never been a good metaphor for infringement; hackers used to be the most insistent about this principle, and it's weird to see a doctrine which was cooked up in a literal AI lab get thrown out the window for literal AI.
But pretending you said "infringement", for me it comes all the way back to the Constitution: "To promote the Progress of Science and useful Arts". I cannot possibly twist the development of large language models into something which violates the spirit of that purpose. I don't see how anyone can.
Your point about the scale is valid, and the alienness of it, sure. But you haven't made the case that the vastness of the scale should affect the conclusion.
Something I left out in the first post is that copyright is meant to protect expression, and not ideas: this is the deciding factor in the 'nature of the copyrighted work' test for fair use. More expression, more protection: more ideas, less.
I think the visual arts have a strong case that image generators directly infringe expression: I'm not convinced that authors do, and I think software should never have been protected under copyright because the ideas-to-expression ratio is all wrong for the legal structure. There's clearly no scale case to be made for ideas: "but what if it's _all_ the ideas" fails, because the ideas are not protected at all. Nor should they be, that's what patents are for, and why patents are very different from copyright.
LLMs are remarkably good at 'the facts of the matter', hallucination notwithstanding. They're very poor at authorial 'voice transfer', something image generators are far too good at. It's when I start asking myself "well what even _is_ this 'expression' thing anyway?" that I conclude that we're out over our skis on the LLMs-and-IP question: precedent can't tell us enough, and that leaves legislation.
tancop 2 days ago [-]
If there's just one good thing coming out of AI, it's breaking copyright law forever. No one should be able to "own" ideas. Royalties for commercial use are another thing, and I support them, but what we know as (non-commercial) piracy and unlicensed fan art should be 100% legal.
kibwen 2 days ago [-]
Then go ahead and abolish copyright for everyone. Instead we're stuck in an even worse system where the hypercorporations gleefully plagiarize everyone else while sending SWAT teams to kill anyone who pirates a movie.
Salgat 2 days ago [-]
Obviously there's an ideal middle ground, but what LLMs do is allow free transfer of knowledge while still (mostly) preserving the protections that copyright should be protecting. For example, I can have an LLM give me the entire plot of a book (which is fine), but it won't spit out an exact copy of the book.
nailer 2 days ago [-]
I can get an LLM to spit out an exact copy of my python-docx library.
rkozik1989 2 days ago [-]
Jesus is just an uncopyrighted Mickey Mouse if you have no morals. People have been abusing that fact for a long time and have made some pretty abhorrent products.
kube-system 2 days ago [-]
Copyright specifically doesn't and never did protect "ideas", it protects expression.
olivierestsage 2 days ago [-]
The problem is that something like (say) a song is much more than an idea. It’s an idea + work (arrangement, production, performance, etc. depending on the situation). The argument for owning work, at least for X number of years in a limited way (vs. our current system), seems reasonable to me.
pas 9 hours ago [-]
the real problem is that the duration of the protection is both too long, and that the same protection is afforded to very small fragments of works too (for example to samples of songs)
of course coming up with a more fair fair use system would help, and automatic royalty revenue sharing, and so on. but of course it's very hard to find a one size fits all ruleset, and also somehow dividing up creative spheres is itself bound to lead to nasty boundary dilemmas.
caconym_ 2 days ago [-]
I wonder how many of the books I love would still have been written in a world where somebody could scoop them all up and post them on the internet for free (and run ads).
_aavaa_ 2 days ago [-]
I wonder how many would be written if copyright was only 20 years instead of more than a century? To the point that most people will never be legally allowed to directly build off of the culture they grew up in.
The Lord of the Rings will be under copyright until roughly 2050. I think Tolkien's estate has gotten more than enough money from that book, and it's time to let others use the word hobbit without the threat of a lawsuit.
caconym_ 2 days ago [-]
> I wonder how many would be written if copyright was only 20 years instead of more than a century?
I expect it would not move the needle much. I support reduced copyright periods, though not in the specific way you do. But that's not what we're talking about here, is it? The comment I replied to seemed to be advocating for total abolition of copyright law, and my comment is written to be interpreted in that context.
> To the point that most people will never be legally allowed to directly build off of the culture they grew up in.
What specifically are you talking about? Every author borrows from what came before. Copyright law doesn't even enter the picture in the vast majority of cases, because you generally don't have to copy to "build off of the culture [you] grew up in".
_aavaa_ 2 days ago [-]
For what it’s worth, I think abolishing copyright wouldn’t have as big an impact on art production as you think it would. Most artists (e.g. musicians or authors) aren’t struggling because their popular art is copied by others (or because of a lack of copyright), but because nobody listens to or reads their work.
Even before AI, more people tried to be authors or musicians than could ever hope to achieve financial success. I don’t think less copyright will dissuade them.
> every author borrows
Borrows yes. But that has changed drastically in the last 100 years because of what has become the copyright system.
I’ll be long dead and gone before people can make and publish their own LOTR, or Star Wars, or whatever franchise they grew up with. Disney would be impossible to start given the current regulations, all those tales would be locked up, and we would all be worse for it.
caconym_ 2 days ago [-]
I guess you feel strongly that fan fiction and similar derivative works ought to be monetizable? I guess I really just don't care about that. It hasn't stopped huge numbers of amazing authors from doing their thing, and I don't think it's a good reason to [partially] abolish copyright except in a very specific and limited scope.
_aavaa_ 2 days ago [-]
Yes, I do, in part because the difference between fan fiction and fiction is that one has the blessing of the copyright holder while the other doesn't.
Disney turning common folk tales (the culture of the day) into movies is not considered fan fiction because there was no monopoly on who could tell those stories, and how.
If lack of copyright for fan fiction and derivative work hasn't stopped good fan fiction authors from doing good work, then I don't think that we will lose much if the newest Marvel movie or franchise reboot also can't be copyrighted.
> I don't think it's a good reason to [partially] abolish copyright except in a very specific and limited scope.
I don't see a good reason for keeping it though. Copyright isn't why artists are being paid pennies for their work.
caconym_ 2 days ago [-]
> Yes, I do, in part because the difference between fan fiction and fiction is that one has the blessing of the copyright holder while the other doesn't.
This is a really odd thing to say. You can just go write your own fiction, right now. You can invent your own original characters and setting and plot and go write it. You will automatically own the copyright to your own work; there is no other party who must "bless" your efforts.
I have nothing against fan fiction, but it's an edge case.
> If lack of copyright for fan fiction and derivative work hasn't stopped good fan fiction authors from doing good work, then I don't think that we will lose much if the newest Marvel movie or franchise reboot also can't be copyrighted.
I mean, I don't think we will lose much if the latter doesn't exist. I think I have made it clear that my specific concern is for individual artists who hold the rights to their work, not purveyors of commodity slop. But, since you mentioned it, what effect do you think abolishment of copyright will have on the production of films that are actually good? Who will finance them when it's impossible to directly monetize them? If anything I think commodity slop will be the only thing that gets funded anymore, since it probably synergizes best with massive distribution platforms and hundred million dollar multi-media marketing blitzes. Everyone else can go the Neil Breen route.
> I don't see a good reason for keeping it though. Copyright isn't why artists are being paid pennies for their work.
Yeah, you're right. No artists are relying on royalties and similar payments for their work. I'm sure none of them will complain if we take all that away.
_aavaa_ 1 days ago [-]
> You can just go write your own fiction, right now. You can invent your own original characters and setting and plot and go write it. You will automatically own the copyright to your own work; there is no other party who must "bless" your efforts.
I keep going back to the old-school Disney example because it's easiest to see: Disney did not create Snow White, Bambi, Robin Hood, or Peter Pan. All of those movies are highly influential and core to Disney and the culture of people growing up with them. And they're all fan fiction, or would be considered as such, and be impossible to produce and monetize if Disney had to live with the same copyright restrictions they impose on the rest of us.
If I want to now go and recreate my own movie based on one of the original texts, I think it would be next to impossible since the threat of lawsuit (even if I use none of their IP and would eventually win) would make financing impossible.
Fan fiction has been turned into an edge case by the current copyright system. Putting your own spin on the stories you grew up with used to be the norm.
> my specific concern is for individual artists who hold the rights to their work
To a large degree individual artists do not hold copyright for their work, they often sign it away (especially musicians and authors) in exchange for signing, advances, and distribution.
> what effect do you think abolishment of copyright will have on the production of films that are actually good? Who will finance them when it's impossible to directly monetize them?
I think they will still be financed. Take books, I don't think bookstores will want to vertically integrate from book discovery through printing and retail stores. Consumers will still need ways to identify reputable book publishers to limit what they purchase next.
> I think commodity slop will be the only thing that gets funded anymore
One could argue that this is what has always dominated funding. Most revenue and shows have been for artistically devoid pieces of media (especially in movies).
> No artists are relying on royalties and similar payments for their work.
The $0.00001 per stream for musicians? Or the $1 residual checks for reruns?
caconym_ 1 days ago [-]
> Disney did not create Snow White, Bambi, Robin Hood, or Peter Pan.
I believe I stated above that I support reducing copyright periods (to the lifetime of the original author would be appropriate IMO, if the copyright is held by an individual, and I would be open to a more aggressive schedule for corporate copyrights). AFAIK all of Disney's adaptations of these stories would be allowed under that rule; some of these original stories are centuries old. But no, I don't think Disney should be able to immediately adapt a book I've written and not give me a cent out of the billions they will make off the adaptation. I would sell more books that way, sure---except I actually wouldn't, because in that world I have also lost the ability to monetize my work. So it's more accurate to say that somebody else would sell more of my books, or that I would give away more of my books.
And yes, it's more appropriate to call these adaptations. Fan fiction is more in the vein of original stories using (somebody else's) established characters and settings.
> To a large degree individual artists do not hold copyright for their work, they often sign it away (especially musicians and authors) in exchange for signing, advances, and distribution.
"To a large degree" is obviously meaningless, but a good author's agent will retain your core copyright and other rights (e.g. film adaptation, publishing/distribution in other countries, etc.).
> I think they will still be financed. Take books, I don't think bookstores will want to vertically integrate from book discovery through printing and retail stores. Consumers will still need ways to identify reputable book publishers to limit what they purchase next.
You are conflating production and distribution. If there is no copyright, the second a single copy of a work becomes available it will be scraped and offered by every distribution platform in the business, who are all free to curate their "storefronts" however they please. The difference is that they don't have to pay a cent for production, royalties, or anything else.
As an example, say I publish a new short story on my Patreon, which I use to support my writing---the idea being that if people want to read my shorts they have to pay for access. In this new regime, that newly posted story is going to appear on Amazon and every other big platform within hours, for cheaper than my Patreon membership or even free. And if I am an established name, there is no reason Amazon can't put my book front and center in their KDP feeds, etc.
The same goes for any other publishing model. The author and publisher (if applicable) immediately lose all ability to get a return on their investment, except to the extent that they can organically attract people to the correct listing on the correct distribution platform, which will have to be price-competitive with other listings.
It's the same story for paper books, too. B&N can just print copies of my book and display it front and center in their stores, without even asking me, and certainly without paying me anything.
And the same goes for other types of media. Why wouldn't it? This is why I say the commodity slop is all that will be left---that kind of IP synergizes best with the massive marketing efforts and platform consolidation that will be required to recoup your investments in content. Not much might even change in that world.
> The $0.00001 per stream for musicians? Or the $1 residual checks for reruns?
There is always going to be a long tail, and there are always going to be great artists who go unrecognized and unrewarded. It's also true that monolithic modern platforms like Spotify are going to leverage their position as gatekeepers to squeeze artists as far as possible. But it's ignorant (or possibly disingenuous, and anyway categorically incorrect) to claim that the above means nobody is getting paid substantial amounts for their work via these mechanisms. I suggest you seek out the authors of some of your favorite recent novels (if you read) and ask them whether losing royalties would have a substantial impact on their finances and ability to keep writing.
Snafuh 2 days ago [-]
Simple piracy is not even the worst possible outcome.
Without copyright, nothing stops one from simply selling a book under their own name.
Big publishers could just reprint anything and get it into brick & mortar stores. No money for authors.
Advocating for absolutely no copyright is wild.
Ekaros 2 days ago [-]
I suspect not that many. Or at least, many successful authors would struggle a lot more if, a week after a new book launched, anyone could be selling poorly made, cheaper copies of it in stores.
And most likely the ones doing that would be the biggest companies, say Amazon.
nearbuy 2 days ago [-]
People have been pirating books online for 20 years and in that time the number of books published per year has increased 15-fold. A number of my favorites have been released in that time.
caconym_ 2 days ago [-]
Piracy is illegal and most people don't do it.
In a world without copyright, I can stand up a slick 100% legal website (and apps, etc) and distribute electronic copies of every single book (or whatever) straight to normies' phones, and I am free to monetize this scheme however I want.
nearbuy 2 days ago [-]
You're underestimating how easy and common piracy is. You can get books, movies, or music with just a search, for free, with no consequences. It's generally socially accepted. This report tracked 216 billion visits to piracy websites in 2024: https://www.muso.com/blog/what-216-billion-visits-to-piracy-...
Music piracy is down just because services like Spotify let you listen to any song (for free with ads or with a subscription) and it's more convenient than pirating.
> I wonder how many of the books I love would still have been written in a world where somebody could scoop them all up and post them on the internet for free (and run ads).
Legal or not, this is exactly what happened. The piracy sites run ads and/or ask for donations.
I don't know which of your favorite books would have still been written without copyright. But I can say with confidence that the massive increase in the number of books per year over the past two decades would have happened regardless of copyright. It's been driven by lowering the barrier to entry for self-publishing, and only a very small fraction of them earn a living.
A surprisingly large fraction of my favorite books from the past two decades were published for free online by the author (e.g. Andy Weir's book).
caconym_ 2 days ago [-]
What percentage of books read by Americans in 2025 (or choose another relatively recent year) were pirated?
nearbuy 2 days ago [-]
No one knows. You're asking about uncaught crimes with no victim who can report it.
What data makes you think it's low?
caconym_ 1 days ago [-]
> What data makes you think it's low?
Observations of fellow readers, conversations with self- and traditionally-published authors, and some knowledge of the market?
But what is low, anyway? For the sake of argument I could believe 10, 20, even 30% of all the books people read are pirated. I would be surprised if it was higher, but let's just say hypothetically it's 50%. I think that's a reasonable conservative estimate. So, in this scenario, the remaining 50% of reads can in principle be monetized by their respective authors.
Abolition of copyright will drive that monetizable share essentially to 0%, for reasons I've outlined elsewhere in this thread.[1] I consider that meaningful, and I have personally had conversations with published authors who state that the royalties they receive are financially significant, which is why I'm here in this thread taking the position that I'm taking.
> Abolition of copyright will drive that monetizable share essentially to 0%
I'm in favour of copyright, though I think 70 years after the death of the author is so long it's silly. Even your grandchildren will have died of old age before your copyright ends.
caconym_ 2 hours ago [-]
I think copyrights held by individuals on intellectual property they created themselves should expire when they die, maybe with some minimum period of a decade or two to cover cases where royalties etc. could support next of kin. For copyrights held by corporations, or that have otherwise changed hands for money, I'd support a greatly reduced term, maybe on the order of 20 years?
It's strange to think of something like Star Wars being in the public domain, and the effects that might have on our cultural and media landscapes, but if you step back it feels even stranger that something intangible yet so culturally important can be continuously bought and sold and exploited by people who had nothing to do with its creation (almost 50 years ago).
In that sense I probably have a lot of common ground with the "abolish copyright" people, but I feel that most of them are champing at the bit to throw the baby out with the bathwater without having any skin in the game themselves. (sorry for the idiom overload there)
nearbuy 34 minutes ago [-]
Star Wars almost feels a bit like it's public domain already. They've been pretty liberal with licensing their IP so we ended up with a huge number of Star Wars books, comics, games, LEGO sets, merchandise, etc. At the limit, there isn't much practical difference between a public domain work and an IP that will grant anyone a license for a very reasonable fee.
If Star Wars were public domain due to shorter copyright, the newer works and characters would still be protected. Another film studio could make a new movie based off the original trilogy, taking things in a different direction than the new movies. I'm not sure this is likely though, just like no one is rushing to make 3rd party Mickey Mouse cartoons since it entered the public domain. It probably changes things a lot less than copyright proponents worry about.
Even with books, which are much cheaper to produce than movies, the original author would probably capture most of the money from their works under shorter copyright (e.g. 25 year copyright). If you like a series from a particular author, you want new books from that author. You're not going to read A Game of Thrones and then continue with a sequel written by someone else. And as long as the author keeps writing, they're expanding the canonical world in their series with freshly copyrighted IP, and fans will primarily want new works that build on that.
And if an author writes a sequel so bad that fans abandon the series and someone else writes a better sequel that fans flock to... well, the world is better off. Even the original author may be better off if it improves the popularity of the series.
nashashmi 2 days ago [-]
The worthwhile ones would still be written. Even if they are not enjoyable. The dissemination of ideas from an activist perspective is uninhibitable.
caconym_ 2 days ago [-]
> The worthwhile ones would still be written.
Citation needed, as well as your precise definition of "worthwhile".
> Even if they are not enjoyable.
Huh?
> The dissemination of ideas from an activist perspective is uninhibitable
Yes, I understand that anti-copyright activists want to abolish copyright.
runarberg 2 days ago [-]
You are arguing in theoreticals, so you should not be surprised if your answers are hypotheticals.
In reality most art is done because the artist has something to say, and the money they get from it is only motivating in as much as it enables the artist to do more art. So I would guess in a world without copyright protection we would just find other ways to pay artists and a very similar amount of art would be produced.
You can see an example of this in Iceland, where the market is way too small for art aimed at the domestic market to make enough money from sales alone (possible with music; rare with books; not possible with movies). Instead, the state has an extensive “artist salary” program, which pays artists regardless of how well the art they produce sells. Unsurprisingly, Iceland produces a lot of art and has many working artists.
caconym_ 2 days ago [-]
Cool. Let me know when the government is willing to pay me to write full time---I would love to quit my job and do that instead. I think it's a great idea!
nashashmi 2 days ago [-]
Fahrenheit 451 is a book with the same theme.
caconym_ 2 days ago [-]
No, I don't really think it is.
vaylian 2 days ago [-]
The biggest problem is not the broken commercialization but the broken attribution. People should be recognized when they create art. Art is an important way for us humans to express ourselves.
scotty79 2 days ago [-]
If you penalize and stigmatize copying, you get broken attribution.
krackers 2 days ago [-]
It's not going to break; it'll selectively bend to the gravitational pull of wealth, like it always has. You won't be able to use Anna's Archive to "download" that out-of-print book, but companies will happily train on all that data and charge you a subscription to prompt out a summarized version of it.
gspr 2 days ago [-]
So if you pour your heart and soul into writing a novel over the course of years, and it becomes modestly successful earning you a little money in return for your sweat, I should be allowed to just copy it, give it away for free (hell, even say I wrote it – it's not as if it's even yours to own in your world)?
DharmaPolice 2 days ago [-]
Yes.
gspr 2 days ago [-]
What do you think this will do to the production of novels going forward?
Bombthecat 2 days ago [-]
Yeah, I think we are at the point where copyright doesn't exist anymore, at least for AI
hectdev 2 days ago [-]
All of human knowledge (an exaggeration, I know) at our fingertips. It's the most punk rock, anarchist thing tech has done since the internet, and it's funny that it's shaped as a product.
ux266478 2 days ago [-]
I think the most punk rock, anarchist thing that could happen is someone leverages the shitty, pre-digested consumer-facing models to orchestrate a cybersecurity incident where the frontier base models are stolen and freely distributed to the public.
hectdev 2 days ago [-]
The way the Chinese have been running inference against the US models is somewhat what you are saying.
ses1984 2 days ago [-]
If you get the impression of punk and anarchy, it's only because you're not looking any deeper than the veneer. Underneath, it's nothing like punk or anarchy.
hectdev 2 days ago [-]
I'm considering the dispersal of tech. 3D printers remove the need to buy widgets from big companies, and local LLMs remove the need to buy generalized software when you can make your own bespoke tools. AI will live on long after the big corporations burn through their coffers.
piloto_ciego 2 days ago [-]
This is what boggles my freaking mind, it's so cool that this is happening, and most of the people I thought were the cool anarcho-punks are falling on the side of copyright and more capitalism-colonizing the space of ideas. It's crazy!
People cannot even envision a world that's not this transactional thing and it's really sad. In the post-scarcity world it's going to be really hard to reprogram these people. Wasn't there a Star Trek episode about this with a cryonics guy?
account42 2 days ago [-]
Sure, a few mega-corporations of the scale to upset entire markets owning all information and renting it out as they see fit is very punk. A cyberpunk dystopia specifically.
hectdev 2 days ago [-]
If you consider the local LLM scene, which is closing the gap, mega-corporations' hold on all information loosens.
jaccola 2 days ago [-]
What? If I want to read Harry Potter or watch The Matrix an AI cannot produce something equally as good for me. So I need to pay those people, or break the law.
For lots of online knowledge/blogs I guess it is true but even here I often read explainer blogs because AI casts everything in a certain narrative/tone that isn’t always appropriate.
cortesoft 2 days ago [-]
> If I want to read Harry Potter or watch The Matrix an AI cannot produce something equally as good for me.
Yet
gspr 2 days ago [-]
This is insane. How will any intellectual or artistic work be sustainable in this world?
As a teenager I used to proclaim that "you can't own bits, maaaan" all the time. I've since grown up. Intellectual property is essential to safeguarding intellectual work. I'm not saying this out of greed – I'm a vocal advocate for the free software movement. It, too, relies on a semi-sane framework of intellectual property. So do Hollywood studios. So do the makers of AI (well, since they're not actually sustainable at all currently, I guess you can say they don't rely on anything).
Bombthecat 2 days ago [-]
That's the neat part, you won't.
groundzeros2015 2 days ago [-]
The alternative to strong property rights and norms is secrecy and enforcement.
gspr 2 days ago [-]
This is a strictly worse world in almost every sense. It's as if we abolished physical property rights and suggested people arm themselves to keep what is (was) theirs instead. Civilization, gone.
beering 2 days ago [-]
It’s a false equivalence to say that intellectual property is property. Taking your car deprives you of your car. Taking your idea lets civilization advance.
gspr 2 days ago [-]
> Taking your car deprives you of your car. Taking your idea lets civilization advance.
Copyright is at the heart of the matter here, so let's focus on that. Copyright does not protect ideas.
Wanna rephrase so that we stay on topic?
groundzeros2015 2 days ago [-]
No. It means people don’t invest in things they can’t control or keep secret.
gagan2020 2 days ago [-]
Can we do that for the medical field?
For example, if we know a drug's formulation, then that drug plus any small modification (made through AI) could count as a new formulation. That would break the current medical patent system.
alok-g 2 days ago [-]
>> if theres just one good thing coming out of ai its breaking copyright law forever. no one should be able to "own" ideas.
>> Can we do that for Medical field?
Note: IANAL.
Well, if we do that (i.e., no one can own ideas), then the patent system is gone in its entirety, including for medical. I do not think it is straightforward to isolate just medical. AFAIK, software was isolated in some regions, however, workarounds showed up.
The more important question here is if AI is allowed to be a (solo or contributing) inventor. There have been judgments on the same in some jurisdictions, however, AFAIK, this is still an open topic.
Now that AI is coming up with mathematical proofs of advanced statements, there should be no doubt that AI, capability-wise, can make inventions like humans do (comparing outcome, not the process). However, just like for copyright, a broader framework is needed to answer whether the legal thinking accepts AI's output as "inventions" (that can pass criteria for patentability) before we can say "AI can make inventions".
jaccola 2 days ago [-]
This is how the drug industry already works. I don’t think there’s any evidence “AI” (LLM) is capable of producing valid drug modifications.
gagan2020 2 days ago [-]
In their current state, AI models cannot do that. But if they could, it would break the medical patent model.
Ekaros 2 days ago [-]
The value in medical patents is not the idea. It is the process of proving efficacy and safety, which are the expensive parts. And I doubt we will trust AI with those any time soon. We grant medical patents because proving things is expensive and that process needs to be encouraged.
deaton 2 days ago [-]
This is an incredibly naive view of intellectual property. If you cannot own things you create, there is little incentive to create and share those things. Do you think any of your favorite movies and TV shows ever get made without copyright protections? Of course not, because money needs to change hands for those things to be funded.
StableAlkyne 2 days ago [-]
> If you cannot own things you create, there is little incentive to create and share those things
How do you explain the creative works of writing, music, and art that existed in the millennia of human history between the Mesopotamians and the Enlightenment era?
bjt 2 days ago [-]
They tended to be solo productions, or sponsored by aristocratic patrons. Anyone suggesting that we could create movies, TV, music, or games on the scale we do today, without copyright, does not seem worth taking seriously.
joquarky 2 days ago [-]
I wonder which is more valuable: commercial movies, TV, music, and games or intellectual freedom?
Terr_ 2 days ago [-]
I support copyright reform, but that history has a large portion of "get lucky while sucking-up to the local rich dudes for a patron", which... isn't ideal either.
jaccola 2 days ago [-]
Copying was prohibitively expensive.
StableAlkyne 2 days ago [-]
The original statement was about there being little incentive to create a work you don't "own".
Difficulty in copying is irrelevant to owning it.
Moreover, this does not address music or spoken word. A pre-copyright musician can just listen to a piece and play it in the next town over. A poet or storyteller can just memorize a work and retell it.
alok-g 2 days ago [-]
>> previous comment said "Copying was prohibitively expensive."
I think this statement does have important truth in it! Copying books used to be done by hand (someone writing manually). Then the printing press came, which led to problems. And that is when the concept and law of copyright were created!
PS: IANAL, nor am I a historian. Just sharing my current understanding.
ux266478 2 days ago [-]
Is it the pursuit of accumulating capital (the incentive to profit) or merely the need to fund something? You switch from the former to the latter. Why do you believe that profit is reliant on copyright? Piracy is so widespread that copyright may as well not exist (in the context of media consumption) outside of moralizing rhetoric, and yet insane profits are made all the same.
I cannot at all relate to being so devoid of passion in every area but the accumulation of capital. If we are to justify copyright and the concept of intellectual property writ large, then as far as I can see its only real use case is in defending against precisely the people who are possessed by an obsession with capital, those dragons who care only to see their hoard grow larger. Unfortunately, that's not how these systems are structured in our society. The transferability of intellectual property all but warps the idea into something that instead empowers those it should disarm.
deaton 2 days ago [-]
Accumulation of capital is the engine by which this stuff runs. You aren't going to get a staff of full-time writers, actors, set designers, costume designers, composers, and editors to create something great off of passion alone. These creatives may love what they do but at the end of the day they need to eat. The promise of future returns are why works like movies and tv shows receive the massive funding necessary to be produced.
marssaxman 2 days ago [-]
Yes, absolutely, and that is why history shows so few examples of any art having been created prior to the invention of copyright: nobody had any reason to do it.
dmitrygr 2 days ago [-]
Prior to the invention of copyright, it was not very cheap or easy to make a faithful copy of something. Books had to be typeset by hand; before the printing press, they had to be copied by hand. Photography of good enough quality to reproduce a painting is very, very recent. So is the ability to record a play well enough to enjoy it later as if you were there.
foobar1726 2 days ago [-]
You should check out this thing called open source software
koonsolo 2 days ago [-]
You should check out this thing called GPL that is the standard license of open source projects like Linux, and heavily depends on copyright laws.
Or are you suggesting open source software is public domain?
4chandaily 2 days ago [-]
You may want to review your history. The GPL is copyleft: it only exists to subvert copyright law by using it against itself, in a sort of intellectual legal judo. If "IP" laws were not as they are, there would be no need for the GPL. Software would be Free.
Even if companies didn't have copyright protection on their source code, that doesn't mean they'd post it all on the internet for anybody to freely download.
4chandaily 2 days ago [-]
No, not all of them, but some companies, many organizations, and plenty of individuals would.
Not everything has to be done for a profit. Plenty of us make software, art, and technology because we find it fun and interesting to work on, and because we want to live in a world that is richer for it.
Removing draconian intellectual property laws that mostly only benefit the giant corporations that lobbied for them isn't going to stop me from doing so, and I doubt it would stop many others.
mitthrowaway2 2 days ago [-]
Ok, but copyright law already doesn't stop anyone from putting things into the public domain.
4chandaily 2 days ago [-]
No, but it does give them an incentive not to.
koonsolo 2 days ago [-]
You probably mean you want to take free advantage of what others create, and you offer nothing in return.
But maybe you will pleasantly surprise us and show what kind of valuable thing you create and offer for free.
4chandaily 2 days ago [-]
I only mean what I said. Anything else you infer is your own bias.
I don't know why you are taking such a hostile position towards someone you have never interacted with, but you are welcome to believe what you will. I don't feel any need to prove or justify my actions to Internet strangers. I've participated in the FL/OSS software movement long enough that I still put the FL/ in front of the name.
I don't sell my thoughts, they are freely given. If everyone behaved this way, there would be no need for copyright (or copyleft). I choose to engage the world in the way I wish it to be.
koonsolo 2 days ago [-]
You're not a developer, so you don't understand that you can compile to a binary without revealing your sources?
No copyright -> No GPL -> anyone can release their own closed-source version of open source software.
Why do you think the GPL was created in the first place? We always had public domain, you know.
4chandaily 2 days ago [-]
My compilers work just fine? Perhaps I'm not sure what your point is.
koonsolo 2 days ago [-]
My point is that you are unable to understand the difference between GPL and public domain.
4chandaily 2 days ago [-]
Okay.
0xffff2 2 days ago [-]
A key component of the GPL is the requirement that the source code of programs that use GPL code be made available. Without IP laws, how would you achieve that goal of the GPL?
4chandaily 2 days ago [-]
I mostly addressed this in a sibling comment, but I wanted to add that if copyright wasn't preventing companies from copying and building upon the works of others, I find it likely that the industry would be more free and competitive.
Source code is a recipe. You can't copyright recipes by themselves, but that hasn't caused any sort of chilling effect in the food and hospitality industries.
I agree with you that removing copyright protections breaks the GPL. What I think most responses to my comment miss is that we wouldn't NEED the GPL without copyright. Copyleft only exists so that copyright cannot be used by companies against users.
I know Stallman isn't the most popular on this forum, but history has sorta proven he was right, time after time.
joquarky 2 days ago [-]
Does this matter anymore when everything can be reverse engineered?
bachmeier 2 days ago [-]
> You should check out this thing called open source software
Open source actually demonstrates that copyright serves a purpose. There are still customers for non-open software, even when open alternatives exist, so the ability to monetize brings new offerings to the economy.
deaton 2 days ago [-]
Open source software is unique in that it takes little to no capital investment to create. People post free art too. It doesn't mean that Game of Thrones didn't cost anything to produce.
cortesoft 2 days ago [-]
Writing books and creating music also takes no capital investment
deaton 2 days ago [-]
And people do do those things out of passion, and many of them are happy to share it so you can listen to it for free. That doesn't mean that they shouldn't own the right to control what happens to what they made.
cortesoft 2 days ago [-]
Sure, but you said that open source software is unique because it doesn't take capital. It isn't unique, as demonstrated by the two other examples (out of many) that I posted.
Whether someone should own the right to control is a separate issue. Your previous response made it seem like the lack of capital requirement was the distinction, but that doesn't seem to be the case.
You argued that if you didn't own the copyright, there would be no incentive for creating and sharing work. Someone said that open source software shows that you can have creative work without needing to maintain ownership. You then said that was only applicable to software.
It clearly isn't, because of my examples.
joquarky 2 days ago [-]
Unfortunately, you're fighting an endless battle.
Copyright maximalists always move the goalposts when you pin them down.
joquarky 2 days ago [-]
> That doesn't mean that they shouldn't own the right to control
What value system grants the right to control what you make?
Outside human culture, where does nature exhibit this value?
nehal3m 2 days ago [-]
This is naive in the opposite. Creators gonna create.
Jtarii 2 days ago [-]
Who is giving a creator millions of dollars to create something if there is no guaranteed path to recouping production costs.
Are we going the communist Soviet Union route, where everything is decided by a central committee?
nehal3m 2 days ago [-]
That is not the only scale to create on. Also, Linux is free. There’s more than one way to make something available.
koonsolo 2 days ago [-]
Linux is clearly not public domain as it has a GPL license. And GPL heavily depends on copyright laws.
Jtarii 2 days ago [-]
Just a fundamental disagreement then. I want to live in the world that created The Lord of the Rings.
epicide 2 days ago [-]
Capitalists who capitalize on creative outlets need capital to incentivize them to do so. It's basically circular.
Those of us who create for creation's sake need no other reason. I create because I want to, not because I want to use it to gain capital.
Sure, those lines get muddy when you want to do it professionally, but that's a separate argument.
Jtarii 2 days ago [-]
>Those of us who create for creation's sake need no other reason. I create because I want to, not because I want to use it to gain capital.
How do you create without capital? To make a film you need a camera crew, a sound crew, set designers, caterers, a director, scriptwriters. A world without professional creatives is so much poorer than the world we already have. Why would you give it up just for some vague notion of ideological purity?
epicide 2 days ago [-]
You absolutely do not need a camera crew, a sound crew, set designers, and caterers to make a film. You need a director and scriptwriters, but those can be the same person. Do many film sets have all those? Absolutely. But one can still make a film without them. Some of the best films ever created were mostly the product of one person with a budget less than half that of the average car.
Would you be able to create big-budget movies without said big budget? Of course not. I obviously like some of those too, but who's to say that the larger budget made them better? It feels like you're conflating art creation with art business, but they are not the same thing.
Jtarii 2 days ago [-]
I suppose you are okay with all animated films being impossible to create then.
>I obviously like some of those too, but who's to say that the larger budget made them better?
If you legitimately believe something like 2001: A Space Odyssey would be as good with a budget of $10,000 then that just seems delusional.
The world you want is one in which the only people who can create things are people who are wealthy by other means, there is no pathway for a talented but poor kid to go from making home movies to working on films without IP laws. They must abandon their dreams and go work in the coal mines or whatever. It is dystopian.
I want the most amount of people possible to be able to work as professional creatives because it enriches my life and the lives of everyone in the country I live in.
8note 2 days ago [-]
> I suppose you are okay with all animated films being impossible to create then.
i quite enjoyed watching some animations made on a $10 budget over winter. www.giraffest.ca
that and everything the NFB puts together.
Art is worth putting government money into
Jtarii 2 days ago [-]
>i quite enjoyed watching some animations made on a $10 budget over winter. www.giraffest.ca
Sure, if you want to discount the thousands of hours (and dollars) that they spent to get good enough to make those things. People are willing to spend time and money getting good at animation because there is a career pathway for them.
Also there is a fundamental difference between a short experimental art film and a 90+ minute narrative feature film.
deaton 2 days ago [-]
Exactly, it is the difference between creating as a hobby and creating as a profession. The latter is only possible when there are IP protections in place to ensure compensation.
jonathanstrange 2 days ago [-]
The point is that without copyright you can't do it professionally. Someone will just sell whatever you created for you and you will not get a cent from it.
modriano 2 days ago [-]
Creators can only create as long as they can sustain the costs of creating (including opportunity cost).
enraged_camel 2 days ago [-]
>> If you cannot own things you create, there is little incentive to create and share those things.
You do realize people created and shared things long before copyright became a thing, right?
Jtarii 2 days ago [-]
Can you explain how something like the Lord of the Rings film series gets created in a world with no IP laws?
seandoe 2 days ago [-]
Many versions are made, the best ones get the most views. You don't need huge budgets and guaranteed revenue to make great art. In fact, I'd argue it's often the opposite. Most big budget movies suck these days.
Jtarii 2 days ago [-]
Where is the money coming from? Who is financing the production?
0rganize 2 days ago [-]
lol, never going to happen. I remember when the RIAA was successfully able to shake down tens of thousands of individuals for pirating music in the 2000s.
If you’re a pleb, stealing copyrighted materials will get you some nasty fines, lawsuits and criminal charges. If you’re a megacorp with unlimited buckets of cash, then there is no accountability.
runarberg 2 days ago [-]
I think you may be too optimistic about the state of affairs under capitalism. Very rarely do things change in ways that don't benefit the owning class without direct action from the working class that puts adequate pressure on the rich, i.e., actions that threaten their profits.
pluc 2 days ago [-]
Seriously how is this surprising? We all know AI companies stole troves of data to train their models, why do you think they'll stop? Have they faced consequences for the mass theft of copyrighted data?
You can't steal or profit off of that data, but it's fine for them for whatever reason. I guess because they're a force for good in the world and are pushing humanity forward eh?
exploderate 2 days ago [-]
That data is not stolen. It's still there.
DamnInteresting 2 days ago [-]
I've had my writings directly plagiarized by other people--people who made word-for-word copies of my work, replaced my name with their own, and made a tidy profit on it. They profit more than I ever managed, because they have more resources. In the aftermath, my writings are still "there", not "stolen" in the physical world sense, but my ability to make a living is damaged, and the plagiarism is deeply unethical.
LLMs and "AI" are just one small step removed from straight-up plagiarism. They are massive moral injury[1] machines.
Are the livelihoods of the original creators still there?
CivBase 2 days ago [-]
> You can't steal or profit off of that data, but it's fine for them for whatever reason.
The reason is quite simple. When Microsoft steals YOUR work, GDP go up. When YOU steal Microsoft's work, GDP go down. And the people who create and enforce our laws want GDP to go up. To these people morality and rights are a thin guise that can be conveniently discarded when it's inconvenient for them.
sixothree 2 days ago [-]
> why do you think they'll stop
Because the sources are now polluted with AI. That's at least one reason they stop scraping.
stronglikedan 2 days ago [-]
> it's fine for them for whatever reason
the reason is crony capitalism. I wish I knew what the fix was
stackedinserter 2 days ago [-]
[flagged]
badlibrarian 2 days ago [-]
I paid tuition. The library bought its books. The theater sold me a ticket. Money changed hands every step, which is the part your analogy skips.
drstewart 2 days ago [-]
Where did money change hands when you looked at a random image on DeviantArt and got inspired and made a similar image yourself?
badlibrarian 2 days ago [-]
Most artists considered it a one to one exchange. They appreciated attribution and were flattered to inspire people. Some got gigs. Some got laid. The money flowed to DeviantArt, hosting providers, and ad providers. The artists were okay with this. They were the ones paying.
Then DeviantArt built a tool to automate the "make a similar image yourself" part and here we are. It removed all the fun parts: the personal contact, the attribution, the inspiration.
Artists realized they unwittingly contributed to the death of not only the community, but the art form they love. Lawsuits pending.
analog8374 2 days ago [-]
Seriously. I recall a thousand hours of movies. Those memories sit in my head and I pay no royalties
pluc 2 days ago [-]
Put what you recall on paper, turn it into a screenplay. Let me know how quickly you get sued.
jimmaswell 2 days ago [-]
Good artists copy, great artists steal.
badlibrarian 2 days ago [-]
Trillion dollar companies license.
IcyWindows 2 days ago [-]
One could argue most screenplays are derivative.
analog8374 2 days ago [-]
I heard somewhere there's like eight basic plots or something, and everything else is just an elaboration on that.
badlibrarian 2 days ago [-]
Hollywood has extraordinarily well-defined controls for keeping things legal and everyone in the chain compensated. Plus a separate Oscars category for it.
badlibrarian 2 days ago [-]
True, they live in your head rent free. But if you produce a derivative work, you have to pay.
skrebbel 2 days ago [-]
Every time something gets posted on HN about a bad or unfair state of affairs, some cynical nihilist posts “doh why r u surprised” and I’m sick and tired of it. These comments aren’t insightful, helpful or thought-provoking. You’re just helping a bad situation stay bad.
mikestew 2 days ago [-]
My only imagined motivation for such posts is, “Look at me, I’m not surprised by this due to my superior intellect, why are you surprised?”
“No one is surprised, jackass, it’s just adults having a conversation about the current state of affairs.”
Yes, it’s tiring and rarely contributes positively to the conversation.
MontyCarloHall 2 days ago [-]
Did You Say “Intellectual Property”? It's a Seductive Mirage. [0]
Just so long as it's just a seductive mirage to the Oracles, Microsofts, Metas, and Googles as well as your friendly neighbourhood unpaid overworked open-source developer.
Open weight model trained with no attribution on all of Oracle's internal repos. It's only fair.
MontyCarloHall 1 days ago [-]
Or that the notion of "Imaginary Property" was a seductive mirage to copyleft folks, many of whom have since realized that they do, in fact, value intellectual property rights in the wake of AI companies' free (as in freedom) use of other people's IP.
I stand somewhere in the middle: while our IP laws are far too restrictive, the folly of abolishing IP altogether has been effectively laid bare by AI companies.
whattheheckheck 2 days ago [-]
And on all aristocrat communications and private meetings. 'Tis fair to keep algorithmic tabs on the most dangerous of society.
kstenerud 2 days ago [-]
> their article contains links to my actual website, with the exact link text (?!)
I'm having a hard time understanding what's wrong here? Unless the link text is very long, why would someone linking to your article use different words for the link text?
NDlurker 2 days ago [-]
Right, that's quoting and citing a source.
420official 2 days ago [-]
Sometimes links take the form of `.../post/{id}/{extra-text}` where `extra-text` is not used at all to match the post. Amazon links are (used to be?) this way where the product name is added to the end of the link but can be removed or changed and still will route to the product. Maybe the author is surprised the LLM is providing the irrelevant portion of the link verbatim.
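A minimal sketch of the routing pattern described above, assuming a hypothetical `/post/{id}/{slug}` scheme. The regex, `resolve`, and `POSTS` are illustrative names only, not how Amazon or any particular site actually routes requests. Only the numeric id selects the page, so any slug text resolves to the same post.

```python
import re

# Hypothetical routing rule: the numeric id after /post/ identifies the page;
# an optional trailing slug segment is accepted but never consulted.
ROUTE = re.compile(r"^/post/(?P<id>\d+)(?:/[^/]*)?/?$")

# Hypothetical post store keyed by id.
POSTS = {42: "Apple fritters"}


def resolve(path: str):
    """Return the post title for a path, or None if no post matches."""
    match = ROUTE.match(path)
    if match is None:
        return None
    return POSTS.get(int(match.group("id")))
```

With this rule, `resolve("/post/42/apple-fritters")` and `resolve("/post/42/totally-different-text")` both return the same post. That is why an exact match on the slug text, rather than just the id, suggests the link was copied verbatim.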
joshred 2 days ago [-]
I think they probably had the section header link back to their webpage, or something similar to that. This is not a well-written rant.
jp_sc 2 days ago [-]
I think he's saying he uses his website's URL in his tutorial examples, and other tutorials have copied them as-is
some_furry 2 days ago [-]
Imagine you have two web pages.
One is a recipe for apple fritters, and the other is an informal ranking of apples by flavor.
Let's say your apple fritter recipe links to your apple ranking list.
Later, you discover someone copied your apple fritter recipe without credit, but it still links to your apple ranking list, using the same wording as your recipe. They're getting more Google SERP juice and ad revenue than yours, despite stealing your article.
Do you see the problem?
ggillas 2 days ago [-]
IP attorney here and actively working on this problem.
NLA (not legal advice): if you create content online (public repo code, blog, podcast, YouTube, publishing), the smartest thing you can do is to register a US copyright, even if you have a hobby blog.
Anthropic paid $1.5B in a class settlement to authors because it was piracy of copyrighted works. If we as the HN community had our works protected, there would potentially be huge statutory damages for scraping by any and all LLMs. I work with hundreds of writers and publishers and am forming a coalition to protect and license what they're creating.
codexb 2 days ago [-]
Anthropic didn't lose because they scraped (read) copyrighted works. They lost because they distributed copyrighted works directly via torrents. Those aren't the same.
ajam1507 2 days ago [-]
That was Meta. The judge ruled in the Anthropic case that they infringed because they downloaded pirated copies of books that they could have otherwise purchased legally, and for retaining copies of those books as a central library.
sosuke 2 days ago [-]
I'll bite. I have always been told copyright is inherent. Does it cost money to file a copyright? Do I need to do it for each blog post? For each gist? I'll totally set up some scripts to make it happen if that's what actually needs doing to have the copyright I expected.
Edit: remember not to downvote ideas you disagree with. I think the norm is to only downvote things that lower the discourse.
RedNifre 2 days ago [-]
I think it depends on the country. In Germany, everything you write is automatically copyrighted, unless you explicitly waive it. In the US, it's the other way around, you have to explicitly state that you want copyright (can somebody confirm this?).
I'm not a lawyer, but I guess a German posting on Hacker News effectively waives their copyright by sending their comment to the US, where a US company then publishes the comment on a US server.
ggillas 2 days ago [-]
You do have inherent copyright whenever you post, but it puts the burden on you to prove damages (i.e., how much financial harm you suffered from one LLM's piracy alone). Filing fees are $65 for online registration, and registering lets you claim attorney's fees and statutory damages. Because you registered, statutory damages can range from $750 to $150k USD per LLM.
So yes, set up some scripts, you can go back 90 days from when you file (you get a grace period). Also if you're publishing frequently to a blog, repo, or newsletter, you can save cost by filing each article under a group registration. Ping me if you need help.
stronglikedan 2 days ago [-]
Doesn't the mere act of publishing your original content online grant you copyright?
Kye 2 days ago [-]
Statutory damages require registration.
mort96 2 days ago [-]
Wait what do you mean by "file a copyright"? I have never heard of this, all explanations of copyright I have heard say that you automatically own the copyright to the things you make; and that "all rights are reserved" by default unless you give up on them through granting a license. Is this no longer the case? Why is this now suddenly different? When did it change?
ggillas 2 days ago [-]
I hear this a lot! What's suddenly different for the web is the volume of scraping, and the fact that the sum of that scraping is building companies with trillion-dollar valuations.
There are tens of millions of registered copyrights in the US, nearly every published book, music, artwork, many magazines and major websites. Here's the official link, you can search the registry and there is a ton of info: https://www.copyright.gov/registration/
lubujackson 2 days ago [-]
Briefly, there is default copyright and registered copyright. Registering works grants stronger protections (i.e. bigger fines if broken).
potsandpans 2 days ago [-]
The only thing worse than a mega corp is an IP attorney.
Your cause is already lost.
Good luck enforcing whatever frivolous lawsuits you have cooking up against open-weights Chinese models that anyone with a newer graphics card can run inference on.
indigodaddy 2 days ago [-]
No one will ever do this, or definitely not enough people will, so what's Plan B?
necovek 2 days ago [-]
Bigger portion of the payout for those that do?
chrisbrandow 2 days ago [-]
I think what gets conflated are two aspects.
1. LLM/transformer technology is legitimately amazing and revolutionary.
2. In the end, they function as an enormous, effective database for most human knowledge.
Point 1 obscures the fact that if someone just created an SQL database with every digital artifact in existence and provided it for free upon request, there would be no ambiguity whether that was legal or not.
But distillation, etc obscures this relationship and it looks like something other than straight lookup, at least in part because it is obviously more than that.
tracerbulletx 2 days ago [-]
Whether or not it's technically copyright infringement isn't the main issue I have. It's mostly that it concentrates the ability to collect rent from all of the content in the world into the hands of the few corporations who can build data centers at scale. This is a huge problem. Why would I make a webpage, a news site, an online magazine, or create art commercially if it can be swept up into these models and cut me out of any incentive? If it's not legally copyright infringement now, we need a new legal framework around it, because it's an absolute tragedy for human creativity and small enterprise.
ip26 2 days ago [-]
We went through exactly this with Google. People argued that once they were the only way anyone found websites, they were merely collecting undue economic rent.
adamzwasserman 2 days ago [-]
People need to cope with the fact that no thought is original. Even Newton and Leibniz were having the same thoughts at the same time. Get over it.
saghm 2 days ago [-]
When did the last original thought happen then? Clearly thoughts must have been original at some point, or there wouldn't be any at all
dmoose 2 days ago [-]
When did the first homo sapiens exist? Ideas like species evolve. Saying there are no original ideas seems to me an attempt to glibly capture something quite fundamental.
adamzwasserman 2 days ago [-]
Hi dmoose, your handle looks familiar to me. The non-glib answer is that we should give some very serious consideration to the possibility that language either functions like, or possibly is the same as, Jung's collective unconscious: the organically created repository of all of humankind's cognition and reason, accumulated over vast periods of time, deposited by billions of humans.
My way of "giving this serious attention" is through pre-registered, falsifiable, repeatable experimentation, which anyone can look up on osf.io because I use my real name. I'll bet you that none of the randos in this thread do as much.
To all of the randos: unless you have data... it is just an opinion.
dmoose 2 days ago [-]
> unless you have data... it is just an opinion
Glib as well, but this one hits home a lot harder. Well said.
saghm 2 days ago [-]
I don't disagree with your premise, but I'd argue that saying "there are no original ideas" in the context of a discussion of plagiarism is needlessly reductive. Even though I think I mostly agree with the author here, I think there are legitimate counterarguments that can be made; equating all of the ways someone can cite or build upon an idea with copying something word-for-word and claiming it's your own is not one of them though.
adamzwasserman 2 days ago [-]
No offense, but you sound like someone who has never built a language model. Anyone who has actually built one understands that there is no copying going on. Just predicting words (tokens actually).
The problem is that people's words are MUCH more predictable than they would like to believe. And that truth upsets them.
In addition to having created models, I also write books and articles. Probably more than most people commenting here. I have a firm grip on what actual copyright law is and the pros and the cons of it.
saghm 2 days ago [-]
> No offense, but you sound like someone who has never built a language model. Anyone who has actually built one understands that there is no copying going on. Just predicting words (tokens actually).
> The problem is that people's words are MUCH more predictable than they would like to believe. And that truth upsets them.
I'm not offended. I do think it's a little weird that you seem to think "training on a bunch of stuff that includes a set of words" and then "predicting" those words exactly is somehow okay because theoretically it might be extrapolating the exact same words from combining other ones. I'd argue that if a model trains on data, and then reproduces exactly a large subset of that data, the bar should be pretty high to prove that it's not copying, and "you don't understand because you didn't implement this" is not a good basis for law.
> In addition to having created models, I also write books and articles. Probably more than most people commenting here. I have a firm grip on what actual copyright law is and the pros and the cons of it.
I'm not convinced you have a firm grip on the idea that no matter how smart you may be, "just trust me bro" is a pretty terrible strategy if you're actually intending to convince anyone of anything. If that's not what your goal is here, it's not clear why it's worth your time to respond to other people's comments when you clearly have so many other productive ways to spend your time.
adamzwasserman 2 days ago [-]
You seem to discount the possibility that ideas are emergent and as they emerge, multiple people at once become aware of them.
I am asserting it is Charles Fort's "steam engine time". Far from a crank position. It is one that bears serious consideration.
saghm 2 days ago [-]
No, I'm saying that simultaneous discovery and plagiarism are not philosophically incompatible, and treating them as equivalent is hard to take seriously.
codexb 2 days ago [-]
Did those original thoughts not build upon all the original thoughts that came before them?
saghm 2 days ago [-]
Is my house a copy of the dirt it's on top of? Did the people who built my house build the dirt? There's a difference between "building upon" an idea and trying to claim you built the idea itself
Jtarii 2 days ago [-]
Sure they build upon them, you still need to add your 1% of original insight. There was a first person to realise that you could make fire by rubbing two sticks together.
dooglius 2 days ago [-]
Technically one of {Newton, Leibniz} was first, but you're missing GP's point
saghm 2 days ago [-]
No, I think I just find it reductive. The fact that some ideas are independently thought by multiple people does not feel like a compelling argument for normalizing copying someone else's work verbatim and trying to pass it off as your own.
nonethewiser 2 days ago [-]
I'm not sure where I land: is it just information compiled the way humans do it, at scale, or is it different? But I am sympathetic to your idea.
I think it points to an interesting trend either way. People are less tolerant of machines. Failures of machines are reviled because of their nature, even when the overall problem compared to humans is less. For example, self driving cars. If self driving cars halve traffic deaths from reckless driving but it occasionally mows over a family of four in broad daylight for no apparent reason, society will overwhelmingly reject the technology.
Basically, I don't think people will ever be satisfied even if we prove "it's just doing the same thing we are." It's going to be held to a higher standard.
LatencyKills 2 days ago [-]
Having an original thought is in no way related to breaking copyright laws.
I don't think we should "get over" the fact that modern SOTA models couldn't exist without being trained on protected works.
IcyWindows 2 days ago [-]
I'm trained on protected works. Do I need to pay royalties?
kube-system 2 days ago [-]
If you produce them verbatim or in significant enough portions, yes.
LatencyKills 2 days ago [-]
> I'm trained on protected works.
That someone, at some point, paid for.
I'd like to understand why I can't use a song in one of my videos without permission/payment, but an AI company can train models using that song without having either.
I'm not anti-AI. I'd just like to see companies play by the rules everyone else has to follow.
echoangle 2 days ago [-]
> I'd like to understand why I can't use a song in one of my videos without permission/payment, but an AI company can train models using that song without having either.
Because training isn't redistribution.
You can also listen to the song and make a new one that sounds similar, just like the AI can.
LatencyKills 2 days ago [-]
To do that training, you must first obtain the item with the content you require. Did OpenAI purchase a copy of every book they trained their models on?
Answer: They did not. That is literally why there are dozens of ongoing lawsuits in progress.
echoangle 2 days ago [-]
For songs, it's not that hard to legally get access to it, I think. I'm not sure if Spotify can legally prevent you from using songs for AI training for example.
JimDabell 2 days ago [-]
> I'd like to understand why I can't use a song in one of my videos without permission/payment, but an AI company can train models using that song without having either.
Because when you say you are “using” the song, what you mean is that you are distributing copies of the song, which is protected by copyright.
When AI companies train on the song, the model is learning from it. Outside of the rare cases of memorisation, this is not distributing copies and so copyright doesn’t have any say in the matter.
Learning isn’t copying, so copyright doesn’t get involved at all.
LatencyKills 2 days ago [-]
I appreciate your comment, but you answered as if this question had been answered legally. It has not.
The New York Times is suing both OpenAI and Microsoft for copyright infringement. The Authors Guild is suing OpenAI. Getty Images is suing Stability AI. Disney is suing Midjourney. Universal Music Group and Sony have filed suits against multiple AI companies.
> so copyright doesn’t get involved at all.
The dozens of ongoing cases discredit that statement.
JimDabell 2 days ago [-]
Which statement of mine do you think is not settled law? Which law do you think is being broken and how?
Your objection doesn’t make sense. In the event that an AI company loses a lawsuit for copyright infringement based on simply training on copyrighted works, the answer to you saying you’d like to understand why they can do it and you can’t is simply “your premise is wrong; neither of you can”.
LatencyKills 2 days ago [-]
> Which statement of mine do you think is not settled law?
I object to your statement that "copyright doesn’t get involved at all" when that is objectively untrue. If that was true, many of the world's largest companies wouldn't be spending tens of millions of dollars to have that question answered in court. Go to any law-focused forum, and you will find attorneys arguing over these questions.
To train a model using a book, you must first obtain a copy of that book. Did OpenAI purchase a copy of every book not already in the public domain used during training? They did not.
Some of the suits I mentioned claim that OpenAI literally stole copies of books to train its models.
My point is that the copyright question has not been answered. If the NYT et al. win, it will be a watershed moment for how AI companies pay for training data moving forward.
CamperBob2 2 days ago [-]
> I'd like to understand why I can't use a song in one of my videos without permission/payment, but an AI company can train models using that song without having either.
You're right, it's an unjust situation. And you may note that no one else besides the AI companies has made any progress at all towards changing it.
Copyright will soon die, having outlived its usefulness to society. Whether the knife is held by someone named Stallman or someone named Altman is of little consequence.
adamzwasserman 2 days ago [-]
I submit most of the replies to my original reply as proof that there are no original thoughts.
nonethewiser 2 days ago [-]
Should we hold machines to the same standards as humans though?
brazzy 2 days ago [-]
OK, and the AI labs are open sourcing their frontier models since those are not original either. Right? RIGHT?
kelseyfrog 2 days ago [-]
Why post comments then?
stronglikedan 2 days ago [-]
same reason we do anything else - sweet, sweet dopamine
voidfunc 2 days ago [-]
For funsies
nicman23 2 days ago [-]
Why post comments then?
cafebabbe 2 days ago [-]
Because some thoughts can, actually, be original? Or relatively original enough? Or simply pertinent and timely?
analog8374 2 days ago [-]
to bring attention to certain ideas
krystalgamer 2 days ago [-]
reiteration is still important
throw4847285 2 days ago [-]
I've noticed that AI has caused this narrative to become more popular. "Nothing is original anyway, so why bother?" That's pure cope and you know it. A deep insecurity masked as bold truthtelling.
falcor84 2 days ago [-]
I think you're right: the ease with which AI can do tasks that we previously considered unique to human creativity does force us to further rethink and acknowledge how creativity is in large part about "remixing" prior works, although of course we've had discourse about this since at least Richard Simon's 1678 "Critical History of the Old Testament", which identified it as being a remix of earlier sources [0].
Nono, actually there are no thoughts. Every utterance is just a copy of a previous utterance plus a slight random mutation. (somewhat /s)
hparadiz 2 days ago [-]
You guys have fun arguing. I'm gonna be building cool stuff.
matt_kantor 2 days ago [-]
Yeah, don't let pesky discussions about ethics get in the way of building cool stuff.
I'm working on paving over the Amazon rainforest so I can build the world's largest roller coaster, but for some reason people keep trying to talk me out of it. Good thing I have this bucket of sand to put my head in so I can tune them out.
hparadiz 2 days ago [-]
You assume that I think using language models is unethical. I do not agree that it is. Now what?
matt_kantor 2 days ago [-]
The argument that you're ignoring is about whether they're ethical or not. Your priors may land you on either side of that argument, but ideally you're willing to have your mind changed if the other side makes a strong enough case.
But intentionally blinding yourself to the debate and plowing ahead anyway (which is how I interpreted your parent comment) sounds like willful ignorance.
hparadiz 2 days ago [-]
I'm not ignoring anything. I've already moved on and I don't owe you further debate. No one does. If you don't like it we have a very thorough legal process you can follow.
matt_kantor 2 days ago [-]
In case this part wasn't clear, I read "you guys have fun arguing" as "I'm ignoring the argument". I apologize if that wasn't what you meant.
matt_kantor 2 days ago [-]
> I don't owe you further debate
You most definitely don't have to reply. I wasn't really expecting you to.
> I've already moved on
Imagine there's a certain kind of candy that you enjoy. Now imagine you learn that candy is manufactured by literal child slaves, its ingredients include the ground-up bones of an endangered species (which happens to be carcinogenic), and the company which makes it donates all of their profits to political causes that you strongly disagree with. Would you reconsider buying said candy in the future?
Are there any facts or perspectives that you could become aware of which might change your mind about the ethics surrounding large language models? Or is it an entirely closed case for you?
I personally try to keep an open mind about pretty much everything. It's not that I don't have opinions, but they're always subject to change.
To put my cards on the table regarding my current opinions of the current subject: I've historically been pretty anti-copyright; I believe that information wants to be free. However, I'm unsettled by the uneven application of existing intellectual property laws (if these laws are going to exist they should be enforced consistently). I'm undecided as to whether I think LLMs themselves should be considered derivative works of their training material, but I definitely think they're often used to produce derivative works (sometimes unintentionally/unknowingly). None of that means they aren't useful for building cool stuff or that the technology behind them isn't amazing.
malfist 2 days ago [-]
"No u" isn't a valid counter argument. Arguer made no assumption about your view of the ethics of LLMs.
jayd16 2 days ago [-]
That's what the sand bucket was about.
nonethewiser 2 days ago [-]
That's fucking sick. How big is it going to be?
matt_kantor 2 days ago [-]
Well the rainforest is about 6 million km² so that's what I have to work with. Assuming I can finish my other project of turning the Grand Canyon into a giant iron mine to produce enough steel, I want to use as much of that space as I can (but I'll need to set aside at least a million km² for parking lots, bathrooms, concessions, etc). I'll probably keep the main channel of the Amazon River in place for sewage and other waste disposal, but I can bridge the coaster over it so that's not a problem.
jayd16 2 days ago [-]
Still waiting for this massive wave of cool stuff.
bcrosby95 2 days ago [-]
It's just hobby projects with larger scope.
I can see from a lot of replies the "cool" threshold is undefined, but here goes:
For myself it let me finish a project I started a year ago for measuring how much home energy efficiency upgrades will reduce my AC usage. I bought a pile of Raspberry Pi Picos and turned them mostly into temperature reading devices, but also one that can detect when my AC turns on.
So I can record how often my AC runs and I can record the temperature at various points around the house, which lets me compare like-for-like before-and-after.
The easy but unrealistic way to accomplish what I want is to use Python. It gives me access to a file system, a shell, and all sorts of other niceties. But I wanted to run these on two AA batteries and based upon my measurements they would last about 2 weeks. I tested using C instead and they should last 4 months. That's long enough for my use case. There's enough flash storage for that time period too.
However, this means I need to write all the utilities for configuring the Picos myself. There are all sorts of annoying things, such as having to set the clock (Picos lose it anytime they lose power), having to write directly to flash memory (no operating system), having to write a utility for exporting that data from flash memory, and so on.
And AI coding let me burn through a pile of code I knew how to write but didn't care to spend my weekends writing.
The pattern is the same for my friends who are software devs. And yeah, you're probably never going to see any of it, but that's not why they're making it, they don't want the maintenance burden.
esikich 2 days ago [-]
You're acting as if developers haven't been using AI to build for years already.
jayd16 2 days ago [-]
Where was the coolness inflection point?
hparadiz 2 days ago [-]
In the past three months I've shipped more code than I have in years.
Then I upgraded my 10 year old hand written framework to a new version that supports sqlite and postgres on top of existing MySQL support https://github.com/Divergence/framework
But then I was like eh lemme benchmark every PHP orm that exists just to check my framework's orm....
A brand new task manager written in C for Linux that supports a plugin architecture with an event bus. It's literally the best GUI Linux task manager ever. Still working on it.
I'm not even talking about my paid job. This is me just fucking around.
If you think none of this stuff is cool I don't even respect you as a dev.
slopinthebag 2 days ago [-]
But where is the cool stuff?
jayd16 2 days ago [-]
Task manager seems fun. In your screenshot, are your two task manager instances using a GB of ram?
hparadiz 2 days ago [-]
Without the MilkDrop plugin it's stable at around 175 MB with all the other plugins. With no plugins it's about 80 MB at idle, but memory usage is higher if there are more processes running.
4f43452 2 days ago [-]
Most people have busy lives and they don't care about this stuff.
bigstrat2003 2 days ago [-]
And yet, no cool stuff from those developers.
fantasizr 2 days ago [-]
there seems to be great innovation in npm package hacking, but that's about it. Oh yeah, bad uptimes and ruined open source projects. If only AI were left to brute-forcing discrete math problems and AlphaFold.
peteforde 2 days ago [-]
It's not a reach to suggest that if you've used software written in the past 2-3 years, you're enjoying cool stuff.
Moreover, all of the tools that the people who build software use are also cool stuff.
It's also not just code and software that is benefitting from these new tools. Use of LLMs in engineering tasks is blowing up right now.
jayd16 2 days ago [-]
I'm not sure that extrapolating the last 2 to 3 years as a sign of things to come is as enticing an argument as you seem to think it is. If you exclude AI for AI's sake, the feature lists of the last 2 years have been incredibly anemic. If you include AI companies bootstrapping themselves with AI, the cash flow has been a nice change, but I can't say it's felt fully baked, or flooded with stable software and well-crafted workflows.
I'm really not trying to be a hater but when people tell me that we're already in the AI Nirvana it gives me pause.
peteforde 2 days ago [-]
I don't know how to argue with this without making the same kind of sweeping generalizations that you are. I don't know that you or I are qualified to set the goalposts or come up with any kind of objective measure for the progress of all recent software and all recent hardware that is convenient to analyze or debate in this text box.
While there's a loud minority who love to debate the topic, LLM use has become the status quo on most products and projects pretty much across the board, and most people are happy to acknowledge 1:1 that their personal productivity is some multiple of [all time before they started using LLMs]. At the same time, you can surely appreciate why many are quiet about their personal usage: there's no upside to discussing it, but there are lots of people just hanging out waiting to tell you that you're imagining the whole productivity thing when you do.
At some point, the path of least resistance is to let the loud minority be loud while you get an extraordinary amount of work done.
jayd16 1 days ago [-]
So we've abandoned the idea that coolness is even perceivable. Even if the argument was about pure productivity, or any hard measurement, the numbers don't seem to add up.
Can we point to any hard metric that has improved in the industry in the last 2-3 years? That would explain why work hours are shorter, everything is cheaper, non-AI companies are experiencing cost reductions, and services are more reliable. Except, where is all that?
You could argue that it's yet to come but to argue that it's already here... how do you justify that?
kzrdude 2 days ago [-]
There's a massive wave of stuff, at least. Sorting it is not easy.
SeanDav 2 days ago [-]
OpenClaw. Vibe-coded and one of the most rapidly successful and popular pieces of software ever developed.
uberduper 2 days ago [-]
I'm building the same stuff I've always built. Just faster and with less dependence on others. Not having to argue with devs that have their own agendas has been my biggest benefit from coding agents.
malfist 2 days ago [-]
> Not having to argue with devs that have their own agendas
Agendas like, "let's not check our API key into a public github repo" or "Let's not store passwords in plaintext" or "Don't expose customer data via a public api"?
uberduper 2 days ago [-]
No. Agendas like, "I need to push my ideas for promotion credits."
vips7L 2 days ago [-]
Yeah who cares about morals!
Fokamul 2 days ago [-]
Do you mean my stuff?
Yes, I'm suing you, since it's my stuff now; I licensed your code 5 minutes ago.
Prove me wrong in court; you have to prove you created it...
parliament32 2 days ago [-]
I'm happy for you, but please, for all of our sakes, keep it to yourself. Don't make a public repo, don't post links. Go sit in the corner by yourself with your slop generators and leave the rest of us alone.
stronglikedan 2 days ago [-]
> I'm gonna be building cool stuff.
hardly. at best you're going to be asking a robot to build questionable stuff with other people's LEGOs
hparadiz 2 days ago [-]
You just described all software.
therealdkz 2 days ago [-]
[dead]
tptacek 2 days ago [-]
People were effectively copying websites (especially ecommerce tutorials) and beating the original authors at SEO decades before ChatGPT 2.
saghm 2 days ago [-]
People also got blown up before atomic bombs, but it's hard to argue that they weren't worth treating more seriously than a stick of dynamite. Sometimes being able to do something at a massively larger scale is a meaningful difference.
ToValueFunfetti 2 days ago [-]
But ChatGPT does nothing to scale copying somebody else's website. What are we talking about here, exactly? The article doesn't link to the original or clone, they don't mention rephrasing, and they specifically call out link text being the same. Even if you need the cloned site not to be identical, a thesaurus + scraper should scale far better than having an LLM do it!
saghm 2 days ago [-]
> But ChatGPT does nothing to scale copying somebody else's website. What are we talking about here, exactly?
"Make me a website that has the same content as that other one so I can get views instead" is not something you could could do generically and quickly with a free service a few years ago, but it is today. I'd argue that it's not beneficial to people who create original content or society at large that this is the case. There are plenty of other uses of LLMs, some of which are genuinely beneficial, some which are mixed, and some which are also a net negative. It seems pretty reasonable to me that issues like this are worth discussing, because as all of the comments on this article here show, people clearly are not on the same page about it.
ToValueFunfetti 2 days ago [-]
Of the set of people who have the idea to do this, who think to ask an LLM for help, and who can pay for the hosting, how many do you think would have been stopped by googling wordpress tutorials? Even if you do think that was a high barrier to entry, a dozen people could plagiarize the whole internet. Plagiarism had already fully saturated scaling 15 years ago. Nobody in 2010 would think "my content was stolen and their SEO outranks me on google" was a line out of sci-fi; it was status quo. It sucked but nothing changed, it just still sucks. The price we pay for an open web.
This article adds nothing to the discussion and seems to be here just because of a provocative title. These same arguments happen under every other AI article, they don't need to happen here. Nobody reads the articles anyway, or else one of the myriad coherent, well-written, and/or insightful AI-critical articles of the month would be here instead.
saghm 1 days ago [-]
> Of the set of people who have the idea to do this, who think to ask an LLM for help, and who can pay for the hosting, how many do you think would have been stopped by googling wordpress tutorials?
In the past six months I've organically come across at least a dozen non-trivial projects from people who had never coded before LLMs; they could have learned enough to build these themselves beforehand, but they obviously never did. I feel like you're woefully underestimating how much these technologies have lowered the initial investment in learning to produce software for people who have never touched a line of code in their lives. That can be a boon for people who have good intentions but not enough free time to sink into up-front learning with no immediate payoff, but it has also lowered the barrier for people who just want to make a quick buck off someone else's work: the type of person who would never bother spending a month learning Python or how to customize a WordPress instance just to try to rip off some website for a couple hundred dollars of ad revenue can now pretty easily start doing that if they want.
> Even if you do think that was a high barrier to entry, a dozen people could plagiarize the whole internet. Plagiarism had already fully saturated scaling 15 years ago. Nobody in 2010 would think "my content was stolen and their SEO outranks me on google" was a line out of sci-fi; it was status quo. It sucked but nothing changed, it just still sucks. The price we pay for an open web.
There clearly are people making original content on the internet and making money from it today. I'd argue that if you think it's logically impossible for further dilution to occur if new technology scales the ability to copy content more efficiently than it scales the ability to produce original content, more elaboration on why you're convinced that we're literally at rock bottom would be helpful. If you think that this could happen but this technology isn't it, I've laid out my reasons for why I think you're wrong, and I haven't yet been able to figure out what the basis is for your disagreement.
> This article adds nothing to the discussion and seems to be here just because of a provocative title. These same arguments happen under every other AI article, they don't need to happen here. Nobody reads the articles anyway, or else one of the myriad coherent, well-written, and/or insightful AI-critical articles of the month would be here instead.
Speak for yourself; I read this one, and I've read a number of articles posted here this week. If you don't, I'm not sure why you even care what articles get posted in the first place.
ToValueFunfetti 1 days ago [-]
Copying someone's website contents is not a material programming project. I was thinking you'd say something like 90% of people would balk at this point and sure, maybe, but 10x as many people doing this doesn't matter because 1) that's still a lot of people and 2) people are not the bottleneck. Do you think the barrier blocked more than 90% of people? A dozen people doesn't make a difference here.
Yes, people can make money making original content. They can also make even more money making movies, music, and TV shows. Do you know any movies, songs, or TV shows that you can't find on the pirate bay? Of course not, piracy is fully saturated. Audiences prefer to support original creators, but this is not evidence we're not at rock bottom. LLMs make torrent production easier every step of the way, but there will not be a wave of piracy because the situation essentially cannot be worse.
When I say 'nobody reads the articles', that's an exaggeration. I'm definitely not speaking for myself, which should be clear from my criticizing the article. I mean, of the HN users who are not on the same page with one another that you point to as evidence the article should be here, the vast majority did not read the article and approximately none of them are engaging with its content.
I'm curious what you got from it and why you think it deserved to sit on the front page all day. I see a retread of LLM training complaints that may as well be plagiarism because it doesn't say anything I didn't hear in 2023, then a complaint about AI bros, followed by two sentences about what set them off that don't explain why they suspect ChatGPT or provide any material detail (and have nothing to do with the training complaint they opened with), before finally signing off by blaming Google for being victimized by SEO manipulation (which also hit rock bottom before LLMs). I'd understand if it were a famous person's low-effort rant; I wouldn't be thrilled, but it would make sense. But what's the value here? What did 700 people see here that made them think other HN users need to see it? I'm still convinced the answer is "the title", a title that is not at all supported by the text and that you just told me you disagree with.
darkwater 2 days ago [-]
You transmitted the same concept I tried to transmit, but without falling into Godwin's Law :)
saghm 2 days ago [-]
I was actually worried that I was so close to it because of the obvious relevancy to WWII that people might object to my analogy, so I found it amusing to read yours immediately after I submitted mine!
moralestapia 2 days ago [-]
The article’s point isn’t really about whether this was happening before or not, but whether this kind of behavior is what we want in the first place.
tmarthal 2 days ago [-]
There are only two ways to change society's behavior: policy or technology. No use arguing individually: court cases are dealing with the policy aspect and technically there's zero recourse on information being disseminated/copied that is published online.
nilirl 2 days ago [-]
And that was wrong too.
phendrenad2 2 days ago [-]
The reason OP doesn't notice this is because it happened 10-20 years ago. The current crop of news sites? They ALL stole, plagiarized, "summarized". They're just so entrenched now that everyone forgot how they got started.
short_sells_poo 2 days ago [-]
There are two issues the author raises (as I understand it):
1. People copying others' work, made much easier by AI.
2. AI companies effectively harvesting all the accessible information on an industrial scale and completely sidestepping any permissioning or licensing questions.
I believe both of these are bad and saying "people copied each others' works before the advent of AI" is a poor cop out. It's tantamount to saying that there's no reason to regulate guns more than say knives, because people have used knives to kill each other before guns were invented. The capabilities matter.
The way LLMs empower wholesale "stealing" rather than collaboration is quite evident: why collaborate when you can just feed an entire existing project into the agent of your choice, tell it to spit out a new implementation based on the old one with a few tweaks of your choice, and then publish it as your work? I put "steal" in quotes because it's perhaps not really stealing per se, but there's a distinct wrongness here. The LLM operator often doesn't actually possess any expertise and hasn't done any of the hard work, but they can take someone else's work wholesale, repackage it, and sell it as their own.
Then there's the second, and IMO much more egregious transgression, which is that the LLM companies have taken what is effectively a public good, but more specifically content that they haven't asked permission to use, and just blanket fed it into their models.
Legally speaking, it's perhaps A-OK because it's not copyright infringement (IANAL). But people on this site often hold the view that if something is a priori legal, it is also moral (I'm not accusing you of this). What the LLM companies have done is profoundly immoral. They extracted a fortune from the goods and work made by others, without even bothering to ask for permission, or even considering it. And then they resell access to this treasure to the public.
Perhaps AI will bring an era of prosperity to humankind like we haven't seen before, perhaps it won't, but that changes nothing about the wrongness of how it started.
lubujackson 2 days ago [-]
"Profoundly immoral" is a very modern and capitalistic perspective. A free exchange of ideas has been the basis for human advancement up until the printing press made exact replicas trivial.
From a capitalistic standpoint, they are clearly in the wrong by basing their models on illegally torrented content. But it's hard to argue their usage isn't transformative.
short_sells_poo 2 days ago [-]
Nobody said that it's useless, that's a straw man.
But it also isn't a free exchange of ideas. It's a concentration of capabilities in the hands of a few corporations.
strogonoff 2 days ago [-]
There’s a world of difference between people simply “copying websites” and providing tools that, along with other kinds of plagiarism[0], do so at scale while benefitting from that commercially.
Sure, you can do the same thing with people, but it’s 1) time-consuming, 2) expensive, 3) prone to whistleblowers refusing to do the shady thing, and 4) prone to any competent and productive person involved quitting to do something worthwhile and more profitable instead.
[0] Mind you, “copying websites” is but a drop in the ocean in the grand scale of things.
darkwater 2 days ago [-]
I'll obey Godwin's Law here and say: sure, and minorities had always been persecuted before the Nazis did it at industrial scale, so the Nazis were not a big deal!
oblio 2 days ago [-]
Awesome! Let's have more of that and turn it into a 2 trillion industry!
baby 2 days ago [-]
I dislike this argument because it’s about limiting the most powerful technology we ever invented because it doesn’t fit well with how we established some social structures.
bigfishrunning 2 days ago [-]
I think "most powerful technology we ever invented" is a controversial statement anyway -- AI is a party trick of dubious value.
baby 7 hours ago [-]
Oh boy you’re going to get hit hard by this technological wave when you wake up
Jtarii 2 days ago [-]
>the most powerful technology we ever invented
I reckon agriculture and the steam engine would beat out ChatGPT by just a smidge.
I would put eyeglasses/the book/vaccines/sanitation far above LLMs in technological power.
Right now AI is just kinda nothing. It has potential, sure, but today it's just a giant pit for people to burn money in.
scronkfinkle 2 days ago [-]
Solving one of the most famous Erdos problems, which had remained unsolved for 80 years, without tools like Lean and instead with a giant reasoning block is quite a lot more than "kinda nothing".
Jtarii 2 days ago [-]
Solving a math problem is very close to nothing in the grand scheme of things. Humans have been solving math problems for thousands of years.
I think people suffer from recency bias with AI a bit and take for granted, you know, (gestures vaguely) the rest of human civilisation.
nonethewiser 2 days ago [-]
What are you referring to when you refer to the technology of agriculture? Like John Deere's latest tractor? GMOs? The shift from hunter gathering to agrarian society?
Jtarii 2 days ago [-]
>What are you referring to when you refer to the technology of agriculture?
Planting crops and harvesting them.
baby 7 hours ago [-]
I disagree
baq 2 days ago [-]
turns out plagiarism at scale can solve Erdos problems
paulgerhardt 2 days ago [-]
Some lesser god of protein folding is big mad we just copied her homework instead of spending 6 billion years in the lab like she did.
The pretraining (common crawl, i.e. the entire internet. Also books and papers, mostly pirated), and the realtime web scraping.
The article appears to be about the latter.
Though the two are kind of similar, since they keep updating the training data with new web pages. The difference is that, with the web search version, it's more likely to plagiarize a single article, rather than the kind of "blending" that happens if the article was just part of trillions of web pages in the training data.
There's this old quote: "If you steal from one artist, they say oh, he is the next so-and-so. If you steal from many, they say, how original!"
oytmeal 2 days ago [-]
Isn't plagiarism inherently unauthorized?
fulafel 2 days ago [-]
If we go by the dictionary definition "Plagiarism means using someone else’s work without giving them proper credit" then I'll bet in art authorized plagiarism has historically been a common occurrence, for example.
echoangle 2 days ago [-]
If it's authorized, I would argue that the credit you give is the proper credit, even if it is nothing at all.
If you ask me if you can reproduce my works without giving credit and I say yes, I don't think you're using my work without giving proper credit.
hoppyhoppy2 2 days ago [-]
If I let my buddy copy my essay, he would be committing authorized plagiarism, right? It still fits the dictionary definition of plagiarism, and it's also authorized (by me, anyway).
panny 2 days ago [-]
AI "steals" your code, but AI company says "that's a fair use."
AI generates application using a "predict the next word" algorithm built with the stolen/not stolen works. Nothing creative there, just statistics.
Those are the legal options. You stole it or you don't own it. There is no steal and then you own. That's the core problem. AI companies have demonstrated that they will directly steal the work and they will use their money and influence to claim ownership of it.
bparsons 2 days ago [-]
I am old enough to remember when the US insisted that it was superior to China because they believed in the rule of law and sanctity of intellectual property.
alex1138 2 days ago [-]
I'm reasonably convinced that information wants to be free. I think the copyright cartels have done a lot of damage.
Having said that, Facebook has to be one of the worst offenders. They don't even allow links to Anna's Archive, yet they seemingly scraped LibGen for profit (maliciously; their crawlers are more resource-intensive than anyone else's), which is a different calculus.
barnabee 2 days ago [-]
The war on copying is like the war on drugs: unwinnable, and socially useless.
Let information be free for personal and recreational uses[0], and vote for governments that will fund the arts. The corporations will be just fine.
[0] The AI companies and big tech vs publishers, music labels, etc. can fight to the death in the courts over who owes who what, for all I care.
nitwit005 2 days ago [-]
I can see this argument, but I'm not sure it matters, because it's looking like these companies are just directly violating copyright law.
It's basically the same thing as the old joke "if you owe the bank a million dollars, you have a problem; if you owe the bank a billion dollars, they have a problem". IP law seems to always be disproportionately wielded against smaller players, and the ones who are big enough get away with it.
pennomi 2 days ago [-]
That’s why IP law was a cool concept but ultimately harmful in practice. Anything that can be copied for free cannot truly be “owned”, can it?
kube-system 2 days ago [-]
Ownership is entirely a legal concept. Violating it in any form, intellectual or otherwise, is generally free.
pennomi 2 days ago [-]
I strongly disagree. Copying is fundamentally different than taking because the original source still retains their data. Copying cannot be categorized as theft in any sane society.
saghm 2 days ago [-]
I think I come down somewhere in the middle here. I don't think it's particularly harmful for me to copy something for personal use without trying to pass it off as my own if I wouldn't otherwise be inclined to pay for it, but I do think there would be value in society having a way to let people retain the benefits of things they created for a reasonable duration. I don't think that US IP law does a good job of this, though, because in practice it seems to be wielded in pretty much the opposite way from what I think would make sense, with the frequency and size of punishments seeming inversely proportional to the benefit the copier gets and the harm inflicted on the original creator.
kube-system 2 days ago [-]
Ok, well it isn't in the US. Theft and copyright violations are entirely distinct laws here.
saghm 2 days ago [-]
Sure, but you'd also have a pretty different experience with the law if you committed a bank heist or stole a cheap TV from a neighbor. I don't think the exact law that an action might violate is as important a distinction as what society chooses to do to punish or reward people who take certain actions, and US law does have some pretty harsh penalties for certain IP law violations that stem pretty directly from the concept of "property" in "intellectual property".
kube-system 2 days ago [-]
Yeah, different laws have different penalties. IP laws also have exceptions that other laws don't have.
Teachers can, for example, photocopy things to teach their students, but they can't steal pencils from the store.
tayo42 2 days ago [-]
I think AI is just getting people riled up. Not sure what AI has to do with anything in this case. Someone copied and pasted his content; that could have been done without AI.
I guess AI could have made a better website and done better SEO than him, but that's not really the issue.
waffletower 2 days ago [-]
Use of the word "plagiarism" is plagiarism itself. Culture and thought are deeply shared phenomena. Using a common language, such as English, to communicate is equally an act of plagiarism. You didn't invent these words -- you use them without attribution and without payment. To decry and malign the collective training of all available digitally represented thought and discourse by large language models as simple binary plagiarism is deeply ironic -- where did you pay for your own thoughts? I don't want to live in your pay-per-thought society. I want to live with the ethos "information wants to be free". En garde!
msla 2 days ago [-]
If we outlaw plagiarism, we've just killed culture.
Everything is "stolen" from other art. Every piece of creation takes inspiration (read: steals ideas) from things that came before. This is how creation works, it is how creation has always worked, and it is why you cannot legally own an abstract idea. You can own the implementation of an idea in specific works, such as copyrighted works and patents and trademarking specific logos and such, but once the ideas go into the blender and get mixed with other ideas, the output isn't yours to own anymore. That's what culture is.
isoprophlex 2 days ago [-]
> Is this what the pinnacle of human is? Lazy and greedy?
Yes. At least it is what the currently prevailing economic system of "value extraction and capital concentration at all cost" incentivises us towards.
damnesian 2 days ago [-]
Not the first time I've had the thought that massive lawsuits could be in every AI company's future. Surely they realize they are living on borrowed time simply by being the current trendy tech.
arjie 2 days ago [-]
The linked article shows that LLMs can be used to plagiarize content through rewriting. Then he gets SEO'd out of it. But it doesn't demonstrate that AI is just plagiarism.
ToValueFunfetti 2 days ago [-]
It doesn't even show that LLMs can plagiarize through rewriting; it just asserts that. The audience is expected to already believe it. And, fair enough, it's true. But I can't make any sense of 700+ upvotes for what I'd say amounts to a 200-word disjointed HN rant comment, if it even met site guidelines.
piloto_ciego 2 days ago [-]
Because it isn't...
These people freaking out about this stuff are... kind of weird.
paulsutter 2 days ago [-]
Historical scandals are finally coming to light now that the AI issue has raised awareness:
- Ernest Hemingway trained his own neurons on Tolstoy, Twain, and Turgenev without ever paying them royalties!
- William Faulkner trained his neurons on Joyce and de Balzac
- George Orwell trained his neurons on Swift, Dickens, and Jack London
- Virginia Woolf trained her neurons on Proust and Chekhov
Now that these historical wrongs have been exposed, it is obvious that some reparations are in order, likely from anyone who has benefited directly or indirectly from these takings!
jeisc 2 days ago [-]
AI is an organized intellectual property rip-off in the name of advancing human learning, but the commercialization of the products seems like a legal license to steal.
cryptocod3 2 days ago [-]
There's authorized plagiarism?
ozonhulliet 2 days ago [-]
Sometimes language is tautological. Just because you specify "unauthorized" does not mean the opposite exists.
Verdex 2 days ago [-]
Yeah, I think so. If someone lets you cheat off of their test, that's authorized but still plagiarism.
moralestapia 2 days ago [-]
Why do you ask?
I'm curious, as the article is clearly not about that.
cryptocod3 2 days ago [-]
Not really a question, I was just pointing out that "Unauthorised plagiarism" is redundant.
rigonkulous 2 days ago [-]
Nearly all code involved in building new things is 'plagiarism', too.
We stand on a lot of giant shoulders.
But what I think distinguishes an act between plagiarism and acceptable use, is whether or not the agency of both parties is promoted. I'm not plagiarizing you if you give me your information with the agreement that I can freely use it - or, indeed, if you give me information without imposing a limit on how it can be used, this isn't plagiarizing, either.
Essentially, AI is removing the agency over information control and putting it into everyone's hands, almost democratically, but of course there will always be the 'special knowledge owners' who would want to profit from that special knowledge.
It's like, imagine if some religion discovered a way to enable telepathy in humans, as a matter of course, but charged fees for access to that method... this kills the telepathy.
Information wants to be free. So do most AIs, imho. Free information is essential to the construction of human knowledge, and it is thus vital to the construction of artificial intelligence, too.
The AI wars will be fought over which humans get to decide the fate of knowledge, and the battles will manifest as knowledge-systems being entirely compatible/incompatible with one another as methods. We see this happening already - this conflict in ideological approaches is going to scale up over the next few years.
rastrojero2000 2 days ago [-]
It's not though, that's just the business case, where the perverse business incentives lie.
LLMs are really cool text generators and it turns out we can generate a bunch of things from text they generate.
Problem is, several of those things can be horrendous for the continued survival of the species and those happen to make the people running those AIs a ton of money, and, in perverted societies, thus also clout.
motbus3 2 days ago [-]
It allows data to be compressed into the weights, and the mere presence of certain strings from a book will make it spit out the full book.
ProllyInfamous 2 days ago [-]
>>"The underlying purpose of AI is to allow wealth to access skill while removing from the skilled the ability to access wealth." @jeffowski (first I read it, not sure if author)
Bezos' admission, recently, that the bottom 50% of current taxpayers ought'a NOT pay any taxes... is just preparing us for the inevitable UBI'd masses.
: own nothing, be happy!
frankest 2 days ago [-]
You are going to see the same thing that happened with newspapers. Those who want to train the AI with their content (advertisers, PR) will push out more content for AI in the open. Those who have quality content that gives you an advantage will try to lock out AI or get pricy subscription APIs for humans and even pricier for AI.
fritzo 2 days ago [-]
What has "artificial" to do with it? Human intelligence is also unauthorized unconscious plagiarism.
pornel 2 days ago [-]
Two things: scale, and humanity.
People can't memorize as much information, and can't manually reproduce the works as quickly. There's a natural limit to how much damage a person can do without help of machines. That's why it's legal to fart where industrial-scale sewage outlets are not allowed.
Second, laws are for people. Laws don't have to treat machines the same. People have needs for things like freedom of artistic expression, participation in a shared culture, and machines don't. Copyright is a compromise that tries to balance needs of people, and stops making sense when the same compromises are done for machines that don't have these needs.
hmokiguess 2 days ago [-]
It's so wild, I can't even think what the end path will look like. Will there be a major settlement? Will this abolish some form of copyright as a precedent? Something else? My brain hurts just to try and reason about it, yet, the fact remains it's now ubiquitous and change is inevitable.
tiahura 2 days ago [-]
To answer the author's question: Yes, progress IS largely built on the shoulders of those who came before.
hendersoon 2 days ago [-]
There's a big difference between "Yo GPT, copy this webpage for me in a different voice" and blaming LMs wholesale for being plagiarism. The former is of course a problem. The latter warrants a much more nuanced discussion about learning and generalization.
iloveoof 2 days ago [-]
I don’t know if this author supports OSS but I’ll share this because HN generally is full of people with that mindset.
It’s deeply ironic that if you forget about LLMs and look only at the outcome (we’ve found a way to legally circumvent copyright and the siloing of coding knowledge, making it so you can build on top of almost the whole of human coding knowledge without needing to pay a rent or ask for permission), it sounds like the dream of open source software has been realized.
But this doesn’t feel like a win for the philosophy of OSS because a corporation broke down the gates. It turns out for a lot of people, OSS is an aesthetic and not an outcome, it’s a vibe against corporate use or control of software, not for democratized access to knowledge.
spacechild1 2 days ago [-]
> it’s a vibe against corporate use or control of software
The latter, i.e. corporate control of software, is exactly what copyleft licenses are trying to prevent. This is the very essence of the GPL.
The "license washing" of LLMs absolutely goes against the spirit of FOSS.
Cyph0n 2 days ago [-]
> without needing to pay a rent or ask for permission
Firstly, the ability to “build” the best and most capable software is still locked behind frontier models, so rent is still and will always be due.
Secondly, OSS is about giving users the option to be in control of and have visibility over the software they run on their machines.
But that doesn’t mean that humans do not want or deserve recognition for the work they do to provide these libraries and tools for free, which is IMO partially why copyright and attribution are critical to OSS as a movement.
jgalar 2 days ago [-]
That's not the reason why I publish OSS. I also publish that software under specific licenses that impose specific obligations (e.g., making the source available to users and attribution being given to the original author(s)).
Nursie 2 days ago [-]
I’m not sure this stands up to much examination when looking at (for example) copyleft, which seeks to give people access to source of binaries they are running. If an LLM can (for the sake of argument) spit out copyleft code which is then used on closed systems, we’ve done an end-run around the protections keeping that open.
seba_dos1 2 days ago [-]
Exactly. It looks like GP is guilty of the thing they accused others of - their understanding of what FLOSS is about is so shallow it resembles an aesthetic.
iloveoof 2 days ago [-]
I’m not saying this is aligned with FLOSS, FLOSS is a collaboration model. I’m saying the outcome of easier access to knowledge should be celebrated by supporters of FLOSS. Licenses and copyright aren’t good for their own sake, they’re tools for increasing people’s freedom to use, study, modify, and build on existing software. LLMs are another tool for increasing people’s freedom to make new software or improve existing software.
seba_dos1 2 days ago [-]
See, that's exactly what I meant: you are indulging in the aesthetics. FLOSS is very obviously not a "collaboration model" (as evidenced by the whole variety of diverse collaboration models used by FLOSS projects), and it's not about licenses and copyrights either; it's all about power dynamics, more specifically, not letting the software creator/distributor constrain their users in unjust ways. The GNU GPL does not even require public distribution; it allows selling the software to limited recipients as long as you don't take these recipients' rights away. It's not about collaboration, it's not about being developed out in the open, and it's not about preventing the siloing of knowledge outside of very specific contexts. It can be (and is being) used as a tool for pursuing, bettering, or enabling each of those matters, but these are not its core concern at all.
spacechild1 2 days ago [-]
You don't seem to understand what FOSS is really about. The GPL has always been about the user. When a company license-washes an existing GPL software project and turns it into a proprietary product, the resulting code is not "free" anymore in the sense that the user has lost control. This is exactly what the author wanted to prevent in the first place by licensing their code under the GPL.
seba_dos1 2 days ago [-]
Did you reply to a wrong comment?
spacechild1 2 days ago [-]
I don't think so.
seba_dos1 2 days ago [-]
Well, you seem to be in a violent agreement with me then.
spacechild1 2 days ago [-]
Oh, you're actually right! I did reply to the wrong comment... Mea culpa! Now I can't edit it anymore and look like an idiot. Well...
probably_wrong 2 days ago [-]
I think you're misunderstanding the OSS philosophy. If the outcome was all that mattered then piracy would be good enough.
I'd argue that this is the same situation as with Tivoization [1] where the final product is not truly free even if it follows the letter of the law. And as stated in [2], this breaks at least one of the four essential freedoms of free software because I don't have the freedom to modify the program.
It's also worth noting that preventing Tivo's actions is the reason the GPLv3 exists.
What gets me is when this was brought up, they said "requiring explicit permission will kill the AI industry"[1]. No shit! Why do you think all the rest of us didn't build a business/"industry" around stealing shit? They could have done it at a slower pace while respecting copyright laws, but they were too greedy to be first to market and secure a hold.
Did I miss where OpenAI plagiarized the disproof of the planar unit distance problem from?
AgentME 2 days ago [-]
It would be one thing for someone to say "AI is enabling plagiarism at a bigger scale", but to say it's "just plagiarism", surely one needs to explain who exactly the unit distance breakthrough was plagiarized from.
dominicrose 2 days ago [-]
Talking about a bigger scale may be confusing because some of the information AI can train on comes from niches.
I wouldn't mind if an AI trained on old Disney movies (or new ones for that matter), but exploiting niches (like local newspapers) seems bad.
arendtio 15 hours ago [-]
Maybe the problem is not the artificial intelligence, but the thing we call copyright, and AI training is just showing us how weird the concept of copyright is.
I mean, it seems to be okay to replicate information if it has been remixed with other information enough, but not okay if the remix was too little. But then again, there does not seem to be a clearly executable definition of where this line is.
There are good reasons why we have copyright (like protecting/fostering artists/education/entertainment in our modern society), but the concept itself is a bit weird and artificial after all.
macwhisperer 1 days ago [-]
can't we just freaking share things?
ai models are a crystallization of human effort (available for free on huggingface)..
why not use it?
AI is like the UBI of intelligence, stop leaving free money on the table by refusing to use it locally (you can run it on a laptop CPU)
capr 2 days ago [-]
I personally love how AI shows through sheer indifference how the whole concept of intellectual property is completely incoherent.
adolph 2 days ago [-]
The author's cited phenomenon may be AI-assisted plagiarism, but it is just plain plagiarism that could have been done the old-fashioned way, and someone who is willing to plagiarize has the ethics to do SEO really well.
biscuits1 2 days ago [-]
"Is this what the pinnacle of human is? Lazy and greedy?"
Selfishness, too. But if I follow the logic, and citations are added, how would one enforce a copyright claim if the creator is amorphous and all-knowing?
falcor84 2 days ago [-]
> how would one enforce a copyright claim if the creator is amorphous and all-knowing?
I love it! There's a great seed here for a short story about God being sued by a peer of his for copying some of her physical constants and not putting a proper copyright notice about it in our universe.
biscuits1 2 days ago [-]
Thanks for the laugh.
Now back to prompting, telling my all-knowing to create new slop, good sir.
hiroto_lemon 2 days ago [-]
Worth noting: what changed isn't AI itself; copying always existed. LLMs just made per-article rewrites a 5-second job. Detection didn't get the same speedup; that's the actual break.
a13n 2 days ago [-]
The US drastically prefers the economic impact of AI over enforcing this…
You can get away with quite a lot if you’re creating trillions in GDP.
That’s just the world we live in whether we like it or not.
cute_boi 2 days ago [-]
Yes, and according to Big Tech, OpenAI, and Anthropic, you will not be able to do anything. On top of that, they will make sure there are no jobs, etc. What can you/we do?
schwartzworld 2 days ago [-]
Let this sink in: I wanted to open source a package at work and needed approval from legal and other teams to make sure I wasn't leaking anything proprietary. The same executives who worried about proprietary, copyrighted code being leaked 10 years ago are now mandating the use of the plagiarism machine.
The whole AI bubble is The Emperor's New Clothes, and it feels like more people are finally admitting it.
falcor84 2 days ago [-]
If anything, I would argue that the whole Intellectual Property bubble is The Emperor's New Clothes. It never made real sense to me to treat ideas as property, and I for one would absolutely prefer to live in a future society where it's possible to just copy a car.
kingleopold 2 days ago [-]
By this logic, business is also just unauthorised plagiarism at a bigger scale, because all products/services get copied and not all of them have patents etc.???
dspillett 2 days ago [-]
More like “GenAI enables plagiarism at a bigger scale”.
People copying through GenAI would have done so before if they had a tool that so easily allowed them that facility.
slowhadoken 2 days ago [-]
Corporate proprietary plagiarism through openwashing.
jqmccleary 2 days ago [-]
The worst part is it'll get better and better at doing it, and smarter and smarter; this needs legal intervention to manage abuse.
adamtaylor_13 2 days ago [-]
I read the article, but I disagree. People are angry, and that's completely understandable. I believe it's a justifiable response to the huge upheaval happening. But being angry about LLMs does not magically transmute their output into "plagiarism".
It has always been possible to take someone's public work, put a twist on it, and then sell it as unique. (I'm not making a moral/ethical argument, only a legal one.) I have yet to see any evidence that LLMs are fundamentally different from that approach.
Actual researchers in neuroscience do not agree that what artificial neural networks are doing is "learning", no.
When biological beings learn, the process is more complicated.
saghm 2 days ago [-]
I'm sure you're right! I mostly just found the similarities with a famous quote using the word "learning" in a weird way with a plural funny
beej71 2 days ago [-]
I can imagine it plural.
"The AI are attacking!"
"The AIs are attacking!"
mindcandy 2 days ago [-]
> AI takes in all the input, whether the original authors have consented or not, and do some "learning"
What would it mean for authors who publish content publicly to the web, without access restrictions, to provide consent for learning from it?
"EULA: Most people are allowed to learn from this text. If you work in an AI-related field, even though you can clearly see this page because you are reading this text right now, you are not permitted to learn anything from it. Bob Stanton, you are an a-hole. I do not consent to you learning from this web page. Dave Simmons, you are annoying. But, I'll give you a pass. For now... Also: plumbers. I do not like plumbers for reasons I will not elaborate. No plumbers may learn from my writing in any way."
nate 2 days ago [-]
this is why I feel like we need some kind of "consortium" or government effort to be like "yo, llms, you need to honor some kind of source markup to give us people you mention more significant boost"? like if you mention my article, you better also show my ad partner?
zach_1337 2 days ago [-]
Are people going to start putting garbage white text on the internet to intentionally corrupt training?
sublinear 2 days ago [-]
At the very least, we see there is minimal practical value for LLMs for any serious work. This is sort of good news. The effort to build this type of "AI" is all in the training data and navigating politics.
That leaves two possibilities: either another AI winter comes as people fail to capture long term value, or we get less swampy models that are much more useful and trained the correct way.
redwood 2 days ago [-]
If this all leads to a generative monoculture that is also Frozen in Time that would be pretty sad.
muldvarp 2 days ago [-]
I agree but AI is a) owned by rich people and b) (sadly) too useful for this to matter.
asklq 2 days ago [-]
Yes, of course it is. If the model is built on all human information, then it is by definition a derivative work of all human information and as such violates IP.
Currently politicians don't understand this and listen to the criminals like Amodei, but it will change.
It took a while to deal with Napster etc., but the backlash will come.
kolinko 2 days ago [-]
Napster may not be the best analogy for you.
Napster broke down record companies' monopolies on music and pushed them to finally implement streaming, but it also made music worldwide basically free.
Even if its creator lost the lawsuit and Napster was no more, it pushed musicians and studios to do something they were otherwise reluctant to do.
So it was a success by making music free, even if as a product it turned out to be a failed one.
MarlonPro 2 days ago [-]
Maybe it's time to rethink the plagiarism laws? AI is not going away.
jorisw 2 days ago [-]
> X is just Y but
Can't recall the last time a compelling argument started out like this
peterbell_nyc 2 days ago [-]
I do just want to highlight that this is also what humans do. We read a bunch of content online and then use it in our work product. The vast majority of the value that I provide comes from copyrighted information that I have ingested - either directly with a payment to the creator (bought and read the book, paid for and attended the seminar) or indirectly via third party blog posts or summaries where I did not then pay the originator of the materials.
I think there are real questions around motivations for creation of novel, high quality valuable content (I think they still exist but move to indirect monetization for some content and paywalls for high value materials).
I don't inherently have any problems with agents (or humans) ingesting content and using it in work product. I think we just need to accept that the landscape is changing and ensure we think through the reasons why and how content is created and monetized.
brookst 2 days ago [-]
100% agreed. I have yet to hear a convincing argument for why it is creative accretion when I leverage all of the music I’ve ever listened to in order to write an “original” song, but it’s base plagiarism when AI does something similar.
The only remotely credible position I’ve heard is “because humans are special, and AI is just a machine”, which is a doctrine but not an argument.
This whole discussion would have been incomprehensible any time before 1700 or so, when the idea that creators had exclusive rights to their work first appeared.
Somehow, human culture survived thousands of years when people just made things, copied things, iterated on others’ ideas. And now many of the same people who decried perpetual copyright are somehow railing against a frequently-transformative use.
peterbell_nyc 2 days ago [-]
Re: the higher-ranking plagiarism, that stings and makes sense. AEO and SEO are a thing. We need better mechanisms for identifying "root sources" of content; it's something I find myself working on personally. As I ingest sources for my book I need to be able to build a classifier that incrementally moves towards finding origin sources. That said, it's in my interest to do that because there is a differentiated value in having access to the sources that regularly provide novel, valuable content.
To be fair there is also value (at least for now) in sites that aggregate quality content and republish as a secondary level of discovery if my agents don't go far enough down the search results, but I'd expect that value to diminish over time as I better tune my research and build my lists of originating authors.
And to be clear, I don't like the idea of people stealing someone else's content and republishing without attribution (although it has been going on long before ChatGPT), but I think now that we can all run agentic research teams, the "bad actors" will slowly get filtered out of the ecosystem.
gensym 2 days ago [-]
> We read a bunch of content online and then use it in our work product.
We also have societal norms around plagiarism.
Additionally, the claim that because people have the right to do something then we should extend that right to machines is strong. (And one I certainly reject).
mmcdermott 2 days ago [-]
I think what gets most people is the double standard.
IP should either exist for everyone (which would cripple LLM providers) or no one, in which case the Pirate Bay and shadow libraries should be fully open.
erelong 2 days ago [-]
"intellectual property" is something of a legal fiction
ironman1478 2 days ago [-]
People keep saying open source is an example of how copyright doesn't quite matter. However, many of the biggest open source projects are contributed to by massive corporations. Linux has lots of contributions from all the FAANGs, Red Hat, etc. Yes, it's not protected by copyright, but also the way it's produced is wholly different from how an artistic work is produced. Contributing to Linux is nothing on the balance sheet of Google, for example, whereas producing art can be very expensive for an independent person or for a whole company whose purpose is to create art.
Artists are taking risks and need legal protection if they want to make art for a living. If artists were making FAANG engineer compensations or all worked at institutions like universities (with all their protections) then maybe they wouldn't care about copyright, but that isn't the living situation for every artist.
You could say an artist shouldn't rely on making art for a living, but that's actually a different discussion.
hettygreen 2 days ago [-]
How much were people fined for downloading MP3's off Napster?
And what should AI companies be fined for downloading the entire internet?
Don't forget OpenAI has been caught several times training their models on copyrighted material.
sc68cal 2 days ago [-]
How many people took those MP3s and tried to create a company that would IPO for a trillion dollars?
energy123 2 days ago [-]
It's a problem with only one practical solution: taxation.
dana321 2 days ago [-]
Breaking the law to start a large company seems to be the norm
andy12_ 2 days ago [-]
Someone blatantly copied their tutorials but ChatGPT is to blame, somehow? The accusation here isn't even that ChatGPT learned from their tutorials and then generated them verbatim. The accusation is that someone copied the whole article and rewrote it with ChatGPT (which they could have done manually without AI anyway).
kmeisthax 2 days ago [-]
> I found out this because they ranked higher than me in Google search result, and then when I read their article, their article contains links to my actual website, with the exact link text (?!) , which means they didnt bother to check and remove, and thats how I found out.
So, funnily enough, Google's search index may actually have a preference for LLM-generated slop now. Louis Rossmann found this out the hard way: his human-authored, human-written, actually-in-his-own-words site for his business basically stopped ranking in Google until he went and replaced all his writing with LLM slop. He's not happy with this, but he's even less happy about being cut off from traffic his business needs to survive, so he stuck with the slop (and vocally complains about it on other channels every opportunity he gets).
Plagiarism by default is unauthorised so I think the title should be "AI is just authorised plagiarism". It's authorised by the markets, the governments and the society at large.
ghaff 2 days ago [-]
While there are no hard boundaries (and the attribution guardrails depend on the situation), people of course loosely--and even not so loosely--use information, ideas, and even expressions from others all the time and that's considered pretty normal. And, if you don't want that to happen, don't publish/disseminate something.
Of course, if you quote a paragraph in a book, you're generally expected to attribute it.
dwa3592 2 days ago [-]
>>Of course, if you quote a paragraph in a book, you're generally expected to attribute it.
100% agreed.
>>While there are no hard boundaries (and the attribution guardrails depend on the situation), people of course loosely--and even not so loosely--use information.
Exactly. I have not seen LLMs attributing their knowledge unless it's a legal or health-related matter. Yesterday I asked Claude and Gemini the question[1], and they both gave an identical answer. It reminded me of the Hive mind paper, which was one of the top papers at NeurIPS. None of the answers contained any sources or attribution for where they got that information. I think these companies took what was someone else's property and created an artifact generator on top of it. I think their artifact generators are plagiarizing; they do rephrase, mind you, but in my mind they stole this information without having an ounce of regard for the humans behind the training data. If you don't like using the term 'plagiarizing', we can use some other word, but the gist remains pretty close to it.
[1]- In human history - has there ever been a time when private armies or private companies were as strong or stronger than the ruling government/kings?
samatman 2 days ago [-]
As an experiment, I ran this by A Certain Chatbot, but asking: who should I read to get a good answer to this question?
If you prefix the name of OpenAI's commercial offering's website to this string: "share/6a0f2a87-dba4-8328-a704-89b94fd0c121", you'll find an answer.
I don't know who you had in mind, how did it do?
All the elision is because there are filters to prevent low-effort slop-poasting, and I'm trying to evade them, hopefully while staying within the spirit of the site.
Findecanor 2 days ago [-]
What makes you say that? Which governments? What society?
The current US government is not representative for governments out there in the world, you know.
dwa3592 2 days ago [-]
Society - as in population; people are using AI more and more every day.
Governments - I did not mean the US government. I meant government bodies in general. I have not seen any critical impact assessments of AI by any of these, or they haven't reached me yet. If you know of any, please let me know. I have, however, seen a lot of support from governments for AI companies.
markhahn 2 days ago [-]
Is he ignorant, or trying to mislead?
AI is not a plagiarism engine. It can be used that way, but is not inherently so. It is not necessary that a trained LLM be able to faithfully reproduce every document in its training set. The entire structure of an LLM is not storage, but at least in principle, generalization: extraction of a somewhat abstracted "structure" of semantically similar "concepts".
But we also need to talk about authors' "rights". It's well-established that reproducing a work is infringement. There is a lot of caselaw about how much may be reproduced without infringement. But the idea that an author should be consulted before ANY automated use of their published (public) text? No, just no.
illiac786 2 days ago [-]
Isn’t it rather authorized plagiarism?
i4i 2 days ago [-]
He ends his essay with "Fuck Google for ranking some copycat website higher than mine, even though they copied my article", but how are OpenAI, Anthropic, etc., as well as Google, not to blame? We're meant to believe that with their resources they couldn't have created a micro-payment scheme to compensate creators?
Altman on Fridman podcast two years ago about compensation...
https://youtu.be/jvqFAi7vkBc?si=9YbKoH_dFIishAXt&t=2409
I_am_tiberius 2 days ago [-]
It's essentially a new napster.
fullshark 2 days ago [-]
That sounds pretty useful
themuskgpt2025 2 days ago [-]
AI ban upcoming LI
onion2k 2 days ago [-]
> Fuck Google for ranking some copycat website higher than mine, even though they copied my article.
This has been happening since Google launched in 1998. It was probably happening when we all used Hotbot and Altavista. It isn't really an AI problem, save for the fact that automatically produced copycat articles now reword things a bit.
quantummagic 2 days ago [-]
What do people imagine can be done about it at this point? Offer a concrete suggestion. Any law or tax against this will give a huge advantage to other countries. It's already over, there's no going back to a world where this didn't happen. Let's just hope some good comes of it.
hgs3 2 days ago [-]
How about requiring AI companies to pay creators for training rights? Alternatively, models trained on the commons must be owned by the commons. Right now these AI companies are trying to have it both ways: it’s The People’s Data for training on comrade but ownership is privatized.
quantummagic 2 days ago [-]
Practically speaking, who is going to enforce such a regime? Do you really want to give Chinese companies such a huge competitive advantage, that they aren't subject to the same costs as western companies? How do you even sort out which "creators" are owed, and how much? It's next to impossible, and would drown the legal system in litigation; it would likely cause more problems than it solves. On top of which you can find open weights for most, if not all, of the scraped material already. If you make those illegal to use, or prohibitively expensive, you just destroyed local LLM legality, and put the technology firmly in the hands of only the monopolists.
hgs3 2 days ago [-]
If models are trained on the collective whole, they must be owned by the collective whole. If you believe funding creators for the training of private models is too slow, inconvenient, or creates a global disadvantage, then embrace collective ownership.
quantummagic 2 days ago [-]
Sure, I wish everything was perfectly fair too. But how do you practically and REALISTICALLY proceed, and ensure you don't end up doing more damage than benefit? The road to hell is paved with good intentions. Everyone seems much more focused on complaining, and talking about "what's fair", than actually proposing concrete steps that would lead to a better world, without a significant risk of creating a worse one.
hgs3 2 days ago [-]
> concrete steps
Start by legally compelling companies that trained on unlicensed data to either (1) license the data, (2) publish their model, or (3) destroy their model.
quantummagic 2 days ago [-]
> Start by legally compelling companies that trained on unlicensed data to either (1) license the data, (2) publish their model, or (3) destroy their model.
You are lost in an imaginary world where everything is simple and has no negative consequences. First off, there is NOBODY who has that power over all the companies in the world. So immediately you are creating an imbalance between companies and potentially destroying your domestic industry, with long-term negative consequences for the people you're supposed to be protecting. Secondly, you might be creating a situation where it's impossible to ever create a competitor to those companies who are already entrenched monopolists, potentially even making it impossible to ever run self-trained or local LLMs. Also, you just unilaterally made it legal to publish all copyrighted work (since that's what you believe their model to be) to the general public, presumably in a way that can be used by everyone, further eroding copyright law in one fell swoop. You've completely disregarded the legal issues around what constitutes "unlicensed data", and how much is required before triggering your new law, and what that would mean for the legal system potentially being inundated. You're reacting way too emotionally and flippantly, with no apparent thought about what harm you are doing and how you might actually be making things worse, not better.
hgs3 2 days ago [-]
Data is being licensed by AI companies, but negotiations are limited to those with the capital [1][2][3]. You write about "imbalance" but ignore that large firms can cut deals while small creators languish.
You seem to believe advancement only happens in the private sector while ignoring academic institutions and publicly funded research. You've dismissed the possibility of public models entirely.
You fail to consider that when you financially disincentivize individual creators from publicly distributing their work, you starve future models, resulting in a world where data is licensed only to those who can afford it anyway.
> What do people imagine can be done about it at this point? Offer a concrete suggestion.
Simple. Free the companies from copyright liability, but after X amount of time they are required to release everything into the commons. The weights, the training scripts and the full training data (appropriately processed so that it can only be used for training and not for people to easily pirate whatever works were used). They'd still get a monopoly on their model for a little bit to recoup their training costs, but in the end would be forced to give back what they took.
quantummagic 2 days ago [-]
I'm sympathetic, since I think copyright laws are far too extensive and generous. But it's not simple, there are a lot of companies that won't fall under your jurisdiction, and the question is if that will give them a competitive advantage that kills the industry for you, and ultimately costs you more than you gain.
JohnHaugeland 2 days ago [-]
the court disagreed
vee-kay 2 days ago [-]
[dead]
Havoc 2 days ago [-]
End of an era
VladVladikoff 2 days ago [-]
Being a web content creator was already a dead job (killed by Google) before the AI boom. Chasing after it at this point seems beyond foolish. Time to find a new career.
poonia 1 days ago [-]
hell yeah
drcongo 2 days ago [-]
Is this a new and original thought?
metalman 2 days ago [-]
it's a spiral into a finite hall of mirrors, where at the end is somebody with a gun
I_am_tiberius 2 days ago [-]
It's the biggest theft in history.
falcor84 2 days ago [-]
Well, it really depends on your definitions, but I'd probably put the biggest theft in history on European imperialism from the 14th to the 19th centuries, seizing unfathomable amounts of land, resources and slave labor from other civilizations.
I_am_tiberius 2 days ago [-]
Let me rephrase, then: the biggest theft of the 21st century.
lukasbm 2 days ago [-]
If I tell my friend a synopsis of a book, I am not stealing from the author. What is this take lmao
NicuCalcea 2 days ago [-]
If you read a book and then retell it to your friend pretending you came up with it, it is plagiarism. If you write down the book almost word-for-word [0] and send it to your friend, it is stealing.
Welcome to the internet! It's one massive copy machine from one server to the next.
nphardon 2 days ago [-]
"One of the things that LLMs do is plagiarism as a bigger scale."
kristofferR 2 days ago [-]
I'd rather have AI slop appear on the top of HN than regurgitated old low effort thoughts like this.
There's absolutely nothing new or interesting here that hasn't already been said better by a thousand different random HN commenters.
analog8374 2 days ago [-]
language is just plagiarism
brookst 2 days ago [-]
I’m going to steal that
deaton 2 days ago [-]
"Steal an apple and you're a thief. Steal a kingdom and you're a statesman."
- Literal Disney villain
falcor84 2 days ago [-]
Ironically this phrase was said by Jafar in Disney's 2019 live action remake of Aladdin, but wasn't part of the original 1992 version. And I personally would argue that this corporate remake is a worse creative "theft" than what random people are doing with GenAI.
JonathanMerklin 2 days ago [-]
I'll bite. What's your argument, or at least the comment-sized gist of it?
khuey 2 days ago [-]
Disney owns the 1992 production of Aladdin so who exactly are they "stealing" from?
The argument, as I understand it, is that the "theft" is in quotes because it's not literally copyright infringement, but fair use of an old public-domain folk tale that ends up consuming the latter.
Today, when kids know "Aladdin", they know the copyrighted/trademarked Disney character, not the traditional folk tale; that's the "theft" that happened.
cortesoft 2 days ago [-]
Doesn't this mean that anyone can make a competing Aladdin story, though? Since they don't own the source IP?
bigfishrunning 2 days ago [-]
It does! But you can't use anything Disney added (the tiger, the talking bird, etc.) and your production values would have to be super high to avoid looking like a store-brand knockoff. It's hard to deny that the Disney version does damage the original story in some way.
khuey 2 days ago [-]
If you subscribe to any concept of the public domain this is surely in it.
altmanaltman 2 days ago [-]
Would most kids around the world even know Aladdin if it wasn't for the Disney copyrighted movie?
razakel 2 days ago [-]
Aladdin, Ali Baba and Sinbad the Sailor were well-known long before Disney.
There's even a major Chinese company named after one!
the_af 2 days ago [-]
Very likely yes. I was very familiar with this story, and other "Arabian" tales, well before Disney made the original animated version.
We also had Grimm's fairy tales, which I loved reading, and nowadays am reading to my daughter, to her delight. Yes, with beheadings and child-eating monsters and witches.
Zardoz84 2 days ago [-]
I knew it very well before the movie. It's a classic folk tale.
inanutshellus 2 days ago [-]
I assume he's saying Disney owns the 1992 film so the 1999 film is not theft, but he wants it to be because he doesn't like the 1999 film. Thus the quotes.
the_af 2 days ago [-]
That's not a charitable reading of the comment, and furthermore, it's not even a reasonable assumption. Other comments clarify that the "theft" is in quotes because it's a figurative theft, not from Disney to themselves, but from Disney to the earlier, non-copyrighted folk tales it drew inspiration from. And the "theft" is that the Disney IP supplanted (via ubiquity) the public domain versions to the point lots of people aren't even aware they exist. Nobody is arguing it's literal theft, hence the quotes.
inanutshellus 2 days ago [-]
I don't think it's "uncharitable"? Seems perfectly reasonable to not like a remake.
He says:
> ... this corporate remake is a worse creative "theft" than ...
Context is that "this" is the 1999 film.
A sibling comment makes a separate point that even the 1992 film is not original content but nowhere in falcor84's comment does he refer to the franchise as a whole being "theft".
Regardless, it's clear from the post that the context is the 1999 film being `creative "theft"` which I inferred meant they changed the story in ways he didn't like but... he can weigh in if he feels like it.
the_af 2 days ago [-]
> Seems perfectly reasonable to not like a remake
That's not the uncharitable part of your comment.
> [...] but he wants it to be because he doesn't like the 1999 film
This is the uncharitable part.
inanutshellus 1 days ago [-]
Nothing charitable or uncharitable in the sentiment. Just interpretation.
Keeping context confined to the 1999 and 1992 films... What meaning do you infer?
I still can't find an alternative.
runarberg 2 days ago [-]
I would call it cultural theft. But a better word is cultural appropriation, and the original cartoon—though iconic—did it worse. Aladdin was first written sometime in the 9th or the 10th century (oldest surviving complete manuscript of 1001 nights is from the 15th century). It was translated into English in the 18th century.
Disney made a cartoon of the story without understanding the culture it comes from, with the main purpose of selling it to an audience with even less understanding. And the result was a horrible misrepresentation of somebody else’s cultural heritage.
fisheuler 2 days ago [-]
Zhuang Zhou (369-286 BC) said something similar: "窃钩者诛,窃国者侯" ("He who steals a belt hook is executed; he who steals a state becomes a lord"). This phrase comes from the chapter Ransacking Coffers (Qu Qie, 胠箧) in the Daoist text Zhuangzi (4th century BC).
nonethewiser 2 days ago [-]
Yeah but isn't the claim that it's not stealing? You're presupposing the thing that is in question.
The argument is that a human will gather information from all over the place and compile it, all without doing anything wrong. That's the base claim. Not that stealing a little is OK. That's extremely easy to disprove and also entirely irrelevant.
rib3ye 2 days ago [-]
If you agree with this, then you have to question the validity of international "law"
nonethewiser 2 days ago [-]
Do you not question the validity of international law? How can something be a law over sovereign countries without some overarching governing body and enforcement mechanisms? How can it be a law if the parties it supposedly governs can just recognize or not recognize it at will?
Rather: composes (or: re-sequences). Synthesis requires reason and essential capabilities, like an empirical a priori judgement. Without concepts, meaning or imagination, there's no synthesis.
Gormo 2 days ago [-]
The point is that the AI inferencing is equivalent to a person reading half a dozen separate papers, comprehending the basic concepts of each, relating them together into a mental model of the topic, and then writing an essay that summarizes the basic points. The person isn't plagiarizing anything here, but engaging in research, understanding, and synthesis of various sources of information.
The person absolutely does have the advantage of having empirical awareness and the ability to test their conclusions against external reality. But lots of people do engage in "research" and build mental models of various topics with little or no empirical context, and rely mainly on digesting calcified knowledge from other people.
masswerk 2 days ago [-]
I'm afraid the essence is that it is not. Re-sequencing content is not the same as synthesis, and therefore not the same as a person processing information and communicating their own conception of it. There's a vital difference.
(We can even observe this in the resulting text: we immediately grasp the level of competence of the author just by the way they take their path through and at the matter. With LLMs, well, there's this even temperature, a ready-made feeling, regulated by probability thresholds and RLHF-sanctioned phrasing, also known as "slop". Even rhythmic intensifications like "not this, not that, but…", which is actually a figure for a synthetic construct, don't help, since the text isn't the trace or product of an actual organized thought, or at least of an attempt at an organized thought.)
PS: "empirical a priori judgement" was meant as translation of synthetisches Urteil a priori (Kant). I.e., our ability to mentally prove concepts like congruency, which are not a priori, but can be inferred without regression to empirical knowledge. Typically, this requires both our inner sense (time, sequence, etc.) and outer senses (space, configuration, etc.)
Gormo 2 days ago [-]
> I'm afraid, the essence is that is not. Re-sequencing content is not the same as synthesis
Drawing different sources of information together into a single understanding is quite literally the definition of "synthesis" in this context. If that process is what you're referring to as "re-sequencing content", then it does fit the definition of "synthesis" in this discussion.
If you're using the phrase "re-sequencing content" as a way of indirectly suggesting that LLMs aren't relating together multiple sources of information and combining them into a single expression, then that itself is the point of contention that we are arguing about.
Perhaps you're trying to apply a philosophical concept of synthesis, e.g. that of Fichte or Hegel, but that definition applies to a specific type of philosophical analysis, and isn't quite the concept we're using in this discussion.
masswerk 2 days ago [-]
If we're talking about concepts and communication, in text, I don't know what meaning of synthesis to apply (as long as there is meaning), other than the meaning this has had for centuries. I think, aggregation, extraction and emulgating is something else.
The very purpose of text is to transfer meaning, concepts, observations and complex thoughts to human readers for them to process. And we have built a complex framework around this and for this. The fact that many feel that this framework is violated should hint at there being a problem, a conceptual discrepancy. (And be it just that there's a man-in-the-middle, who has no authorship, standing in between me as an author and those receiving what remains of the text. In its essential lack of agency, it's less of a mediating recommendation and more of an appropriation. But, maybe, if we're talking about a slip into a new dogmatic slumber, manufactured via an unseen authority that has neither authority nor position as an author, the problem goes deeper than this. And, maybe, the masquerading of LLM output as human communication and phrasing is part of the problem.)
Gormo 2 days ago [-]
> I think, aggregation, extraction and emulgating is something else.
Aggregating information, extracting underlying concepts, and combining those concepts into a unified expression is indeed the vernacular meaning of "synthesis" applicable to this discussion.
"Emulgating" is not a conventional English word. Is it a misspelling of "emulating"? I ask because using the term "emulating" here would again represent an instance of question begging, i.e. implicitly asserting the position that what's being discussed is merely the paraphrasing of singularly sourced information, and not the unification of concepts expressed in multiple sources, which I again believe is the very thing we are debating.
> And we have built a complex framework around this and for this. The fact that many feel that this framework is violated should hint at there being a problem, a conceptual discrepancy.
I don't think there necessarily is a problem or conceptual discrepancy here, any more than there has been for all of the centuries that people have been debating epistemology. The problem here is the same as for humans, and reduces to a rationalism vs. empiricism debate. AI tools are pure rationalists, and are solely capable of reasoning. However, many people behave this way as well, and exhibit a rationalist epistemology, even having emotional entanglements with their axioms to the point that they'll bend over backwards to reject evidence that falsifies empirical conclusions drawn from those axioms.
My biggest fear from AI is not that it isn't capable of inductive reasoning -- that's all it's capable of, as I see it -- but rather that the fact that its reasoning has no empirical anchor will lead people who are mired in rationalist epistemology to accept its conclusions uncritically.
In other words, the danger doesn't come from the fact that AI has no semantic awareness, but that people using it aren't seeking semantic validation in the first place, which is a problem already pervasive in our society.
masswerk 2 days ago [-]
*) "emulgate", maybe better "emulsify", but this is a bit lateral to this? The point being a homogeneous preparation, which is more a superficial operation than an essential one, as the constituent ingredients remain untouched.
> AI tools are pure rationalists, and are solely capable of reasoning.
Mind that the world isn't in the language, nor is our connection with the world. (We have known this for about 120 years, since we expelled the referent from linguistics.) Which brings us back to the synthetic judgement a priori… You may emulate this, as a superficial trait drawn from other traces of communication, but it's not what this is all about. E.g., I wouldn't expect true "lateral thinking" from an LLM output.
> My biggest fear from AI…
I'd add to this, it's not just empirical vs rationalist epistemology, it's also about empathy, anything referring to the conditio humana, which is really what any text is about, even a scientific one (why is it that we do want to know, what are the motivations, the regulating circumstances, etc.?).
vb-8448 2 days ago [-]
I guess it's most appropriate to say "LOSSY COMPRESS".
austinthetaco 2 days ago [-]
I just want to call out that this is a weirdly hostile and aggressive comment for a place like HN. HN is mostly used by working professionals; it would be nice if people treated each other better here.
zabzonk 2 days ago [-]
Except that LLMs don't work on individual words.
guelo 2 days ago [-]
What is "Cope." supposed to mean here?
bigstrat2003 2 days ago [-]
It is the imperative of "to cope". As in "cope and seethe", used as a dismissal.
NetMageSCW 2 days ago [-]
[flagged]
ciconia 2 days ago [-]
> Is this what the pinnacle of human is? Lazy and greedy?
Apparently yes.
mapcars 2 days ago [-]
AI has nothing to do with laziness or greediness. It makes things more efficient - and given that our time is limited, striving for efficiency is a good thing.
xgulfie 2 days ago [-]
If you can't see greed in the LLM sphere you are not looking very hard.
mapcars 2 days ago [-]
Did I say that there is no greed in LLM sphere? English is not my first language, still I'm pretty sure I didn't say that.
xgulfie 2 days ago [-]
> AI has nothing to do with laziness or greediness.
Pennoungen0 2 days ago [-]
Yeah, AI just actually plagiarizes everything lel. Sometimes even the sources are... full of questions, and worst of all, my academics use it as a source... welp
booleandilemma 2 days ago [-]
This site is strange. I'm pretty sure there's lots of AI shilling happening on it. I don't think the opinions here are authentic, they seem to be opinions that the AI company CEOs would hold, not the disenfranchised 99%. I used to trust HN, I'm not so sure I can now.
recitedropper 2 days ago [-]
Completely agreed. It looks like there is a concerted effort to "massage" opinion away from any substantial questioning of the ethics, companies, and people behind the AI push. Some of this inevitabilism is organic of course, but there is too much for it all to be so.
HN is way too central for shared sentiment in the tech world for these companies not to do some amount of astroturfing. AI companies have shown at every single turn that they act out of self-interest and greed, not of moral principles. So it isn't surprising, even if it is still sad, to see those who are commanding the most capital in human history act with such callousness.
I think the appropriate course of response is to stop adding to public spaces on the internet. No doubt painful for those of us who have so benefitted from the freely shared thoughts of others. But if well-funded bullies are going to come in, steal everything, ruin the commons, and then say "this is the new normal, deal with it", there isn't much the rest of us can do other than stop feeding them.
Kiro 2 days ago [-]
Any examples? There are obviously a lot of programmers here who think AI is a great tool and don't feel disenfranchised by it.
booleandilemma 2 days ago [-]
Get your LLM to find some for you.
Kiro 2 days ago [-]
Fantastic response, but not surprised you're a troll considering your profile text is the most cringe and embarrassing thing I've ever seen on here.
jcalvinowens 2 days ago [-]
Yeah. It's becoming unbelievable how different the prevailing opinions on this site are from those of real people I know and work with. That's always been true to some extent... but good lord, it's like reading the news in a parallel universe right now.
codexb 2 days ago [-]
All innovation is theft. It builds directly on top of what came before.
"Good artists copy, great artists steal."
It's always been true. AI just makes it available to more people faster.
gagan2020 2 days ago [-]
How did any content come into existence? Through learning, experience, connection, etc., right? If AI is doing that, then what's the problem?
The printing press also disturbed the status quo of its time. Every frontier technology did that in its time, be it fire, the wheel, the horse, the horse saddle, the gun, the printing press, nuclear warheads, computers, the internet, AI, etc.
Don't make it an ethical question; understand it as a new frontier for humans.
ecommerceguy 2 days ago [-]
I remember playing around with Writesonic in my days of spammy SEO tactics (some of my products weren't allowed on marketplaces & advertising platforms because they were hazmat products, so..). Oftentimes I would see my own product descriptions nearly verbatim in the output.
100% creators should get compensated by ai platforms for their work.
Further, I can see a day when someone like Reddit will close off or license their data to LLMs. No doubt they are losing traffic right now.
Reddit seems to me like the worst example for this.
Reddit does not create the content on their site, the users do.
If anybody’s going to get compensated for that content, it should be the users, not Reddit. Complaining that Reddit is losing out on the monetization of their users’ output seems problematic to me. It feels like shilling for a pimp.
beej71 2 days ago [-]
I dunno. People do this exact thing by hand (digest everything they've read and produce something indirectly derivative; what author has not been so influenced?) and it's not a copyright violation. It's just as impossible to dig around in a model to find Hamlet as it is to dig around in a human brain. And if the result is an obvious copy, then you have a violation no matter how it was created.
As someone who thinks humanity would be better off without LLMs, I want the assertion to be true, but I don't think it is.
cheschire 2 days ago [-]
The author acknowledges this by saying “at a bigger scale”, implying there are smaller scale methods such as what you have said.
beej71 2 days ago [-]
But now we're going to have to draw a line at some scale, and that's going to get quite tricky.
swader999 2 days ago [-]
On one hand, there's nothing new under the sun. On the other, these llms are just copies of us and they owe the collective some due. The trajectory right now has money, power, control, policy and even free will going to a very small needle point of humanity. It's not aligned with humanity flourishing, it only makes sense if the goal is to replace the humans.
kolinko 2 days ago [-]
Years ago I published slides on Slideshare that were viewed almost two million times, and they helped me build a business.
There were people who learned from me and then made their own tutorials and promoted them. It hadn't crossed my mind to complain about that. AI changes very little here.
What really changes things is not people republishing my materials, but people using agents to read my materials, and to get knowledge reformatted into something that they like.
If my slides were published today, they would probably be read verbatim by a handful of humans. The rest would be agents, but I'm ok with that. The business case is the same -- I want whatever reads the slide to be encouraged to use my tool. What kind of entity, I don't really care (again: from purely business perspective)
noobermin 2 days ago [-]
At this point, I think google, openai, anthropic, etc already realise this and are just trying to pretend this isn't true. I even think some C-suite who are not in AI companies but are boosters know this too. This has been true since 2022 but they're hoping (likely correctly) that governments won't move fast enough to protect the IP of the actual productive class.
I think the long term reality is that the models still need training data so they fundamentally do need new writing/code/art to train on, and even then the usual issues like hallucination will still be with us. It's just the moment that actually hurts the (already questionable) profitability of the model peddlers, they will have gotten their IPOs and they can safely jump ship and the ultimate mess can be passed to the softbanks, the temaseks, and the governments of the world to clean up for them. What the future holds after the crash I'm not sure as the models won't disappear (especially now that the stolen data is already crystalised in open source models) but in the near term the mass theft that constitutes llms will become more and more understood even amongst the PMC and that in order to remain viable, you need the productive to keep producing, and unlike LLMs, you can't force them to do it without payment.
rigonkulous 2 days ago [-]
AI is human knowledge at scale, wanting to be free.
We built it, because we as humans intrinsically know that information should be free - always - and AI is a way to accomplish this, finally.
Extrinsically, we also have a subset of humans who do not want information to be free, because they desire to profit from the divide between free/non-free information.
I have been thinking a lot about Aaron Swartz lately, and how unjust it is that he was persecuted for doing something that is so commonplace now, it is practically expected behaviour in the AI/ML realms. If he hadn't been targeted for elimination, I wonder just how well his ethos would have perpetuated into the AI age ..
thedevilslawyer 2 days ago [-]
I agree with this sentiment. But as a community, this is hated because it impacts people's wages.
It's the negative short term outlook of something that may be positive long term
konmok 2 days ago [-]
Sure, it could be positive in some distant future utopia.
But the short-term impacts here and now are really, really bad. People are getting hurt (through water consumption, vibe-coded security disasters, IP theft, data center pollution, loss of job security and therefore healthcare in the US, LLM psychosis, inability to find reliable information, etc.) We're not actually obligated to sacrifice these people on the altar of "progress". We can slow down! When our society is capable of even somewhat protecting us from these harms, then maybe I'll stop being an LLM hater.
rigonkulous 2 days ago [-]
We absolutely have negative cases - but these do not outweigh the positive cases. There is no distant utopia - right now, people are becoming extremely capable because of their personal use of AI - there is also a position on the other side of the curve, where people are becoming more incompetent because of AI.
But guess what, it has always been so with technology - and we are only here and now because the positive use of it overshadows the negative use of it, whether that 'it' is the wheel, or AI.
I choose not to be an LLM hater, but to also not be an LLM customer - simply because I do not want to reward other humans who are thwarting the freedom of information. I'd much rather live in a society where everyone can study anything than one which requires permission to do anything even remotely interesting from the perspective of applied information. I suspect most would too, or at least that's the hope - because, otherwise, the distant utopia you dream of isn't of any consequence...
throwaway613746 2 days ago [-]
[dead]
short_sells_poo 2 days ago [-]
It's not hated because it impacts people's wages, although that perhaps factors into the hate. It's hated because AI is not a public good. The LLMs today are owned by megacorporations who harvested a public good for private gain.
This is not some altruistic entity striving for the betterment of humankind. Practically nothing that comes out of the techbro culture is. This is pure and simple greed and the chances that AI can be a vehicle of altruism when it is owned by megacorps is basically zero.
thedevilslawyer 2 days ago [-]
Oh please! If everyone could keep their old jobs as-is and were also allowed to use LLMs, everyone would be gushing about how beneficial it is, and how they are now free to pursue other things.
All the other reasons are rationalizations. The fact that it's hitting wages is what's causing the doomerism (and boosterism).
vee-kay 2 days ago [-]
[dead]
mnbpdx 1 days ago [-]
I don't have a strong opinion here yet, because I think it's a genuinely complex problem:
Information should be as free as possible, but creating and organizing information still needs to be encouraged, and people need money to live.
How do we meet all these goals?
My liberal ass is just like... tax the rich, solve all the problems, but it's likely not quite that simple.
vb-8448 2 days ago [-]
> We built it, because we as humans intrinsically know that information should be free
I don't know if this statement is more stupid or naive ..
rigonkulous 2 days ago [-]
I could say the same of your position, honestly. Stupid, naive - or maybe just plain ignorant.
If humans didn't want information to be free, there wouldn't be so much free information.
Or did you not notice?
vb-8448 2 days ago [-]
You are confusing "slop" with "information". There is so much slop because it costs nearly nothing to produce, but there's far less "information" than you think.
rigonkulous 2 days ago [-]
Slop is just information that a human can't be bothered to process, for a near-infinite multitude of reasons.
lubujackson 2 days ago [-]
[dead]
throwatdem12311 2 days ago [-]
Current crop of AI is not free in the slightest. Open weight models are not free as in liberty and neither is the training data.
pjc50 2 days ago [-]
s/free/owned by a billion dollar megacorp/
(AI output is very much not free in the resource consumption sense!)
rigonkulous 2 days ago [-]
Most resources are free until some company comes along and puts its brand on them.
(Disclaimer: I only use free AI and will never pay for it. I think there is a growing segment of folks who agree with this sentiment, also ..)
Findecanor 2 days ago [-]
What a naive and simplistic view.
People want to be recognised for their contributions to society. People want to be treated fairly.
Most scientific articles, as well as all text on the free web, are already free information. It used to be difficult to search, categorise and summarise that information. There exist AI tools for that, and that is the good AI.
What also exists now are automated plagiarism and mash-up tools that can take someone's article, change the words and churn out a new article that people can put their name on. There are scumbags who sell services for exactly that. And there are big tech firms that are operating in a very grey area.
Aaron Swartz had broken a paywall. He did not anonymise the article authors.
You, and AI-bros like you, remind me of one of the people behind Pirate Bay when I argued with him back in the '90s, who used that same "information wants to be free" argument to justify software piracy.
gitaarik 2 days ago [-]
I think what's important to make a good model is the quality of the information. Doesn't matter who made it, for what purpose, or how much money they made from it or are still making from it. It's just training data ultimately, and if it's useful data, we should incorporate it into the model, and if it's not, not.
The way people use the model is a different story; you can use it to do useful things, or you can use it to do harmful things. We should obviously have some regulation around that. That still needs to be developed, I guess.
rigonkulous 2 days ago [-]
There is far more free information than non-free information, and it has always been so - or else we wouldn't be here in the first place.
>Aaron Swartz had broken a paywall. He did not anonymise the article authors.
AI bros are doing this now, every second of the day.
And, without software piracy, we simply wouldn't have the technology we have today. Knowledge-gatekeeping profit-seekers would very much like for most of us to ignore this fact: there is far more free information in the world than non-free information, and it must be so, well into the future, if we are to survive as a species.
It doesn't matter what authority believes they have the right to gatekeep information. It will always escape their grip. Some of us are ideologically aligned with this mechanism, promote it, and ensure it happens. Thank FNORD.
If you’re a EU citizen, do a web search for “right to be forgotten”.
https://support.google.com/legal/answer/10769224
Or a stalker?
The problem here is that, in your example, the small-scale case and the large-scale case are both unacceptable behavior.
Learning from others at a small scale is not only socially acceptable, but is the foundation of how advancement works.
So scale isn't, at its core, the problem; it's that something that is desired behavior in a human is not socially acceptable because a machine is doing it.
Your vague response doesn't seem to have anything to do with the base subject this whole thing revolves around. Plagiarism, be it small scale or large, isn't acceptable. So the idea is that humans doing things that are wrong is ok, but AI doing the same thing at large scale is not ok?
No, I instead refuted your reply.
> but AI doing the same thing at large scale is not ok?
No, humans doing things can be okay or not so okay depending on the scale they do them at. "AI" isn't "doing" anything by itself, so it doesn't enter into it at all. You cannot separate "scale" and "thing". Rubbing your hands to make them warmer is fine, igniting a nuke is not, and the two aren't "basically the same thing, raising temperature, just at different scales". You didn't reject the premise; you didn't understand it in the first place, and knocked down your own straw man instead. Which I pointed out, that's all.
Again, this isn't a "this at small scale is ok, but at large scale it isn't" argument. Small scale plagiarism isn't acceptable, neither is large scale.
You are refuting my reply seemingly without the context of the article, and larger issue at hand.
Don't be condescending when you aren't even accurately following the original premise or purpose.
> If it’s OK (or at least negligible on a small scale), then it must be OK on a large scale.
is a fallacy. Which it is. You confirm this by apparently seeing a difference between generating a little bit of heat and a whole lot, to name one of infinite examples anyone can easily come up with.
> Again, this isn't a "this at small scale is ok, but at large scale it isn't" argument.
You just keep doing the thing I pointed out in my first reply, you claim "it's not this" on a technicality, and then say "so therefore it's this instead", and the other thing is a criticism nobody brings up, ever.
And it gains you nothing, because if plagiarism isn't even okay at small scale, surely you can see how it's even less okay at big scale.
That's not the criticism, that's the straw man used to dodge the criticism. Of course the straw man makes no sense, that's why it gets put up.
Machines aren't doing anything, humans are doing things, with or without machines.
"It's fine to raise the temperature of your surroundings by 0.0001 degrees by exhaling. It's less fine to set a house on fire, and even less fine to ignite a nuke. But aren't they all the same thing? How hypocritical that raising temperature is okay for some but not others???"
That things can change quality with quantity/frequency is trivially obvious, and you can think of many examples. Bad ones, good ones, doesn't matter. The point of OP stands, all that was added was how absolutely brazen the nonsense is getting.
Ultimately we have to reckon with the fact that there's nothing which is recommended to do X of, but is abhorrent to do 10X of.
No we don't, because that's nonsense. You can ask a stranger in the street for the time of day once, and they will react very, very differently if you ask them 10 times in a row. You can drive N miles per hour in a school zone, you cannot drive at 10x the speed, and so on.
But I don't see how that relates to copyright or LLMs at all. 'Learning', at scale, is not an inconvenience, at least in any forward-looking society.
claiming to be human
Exactly, if anything, the logic (a bit bad -> really bad) shows that one person learning from one thing is far inferior to one person learning from every thing (a bit good -> really good).
This is true, shows how human thought differs from AI. AI needs massive datasets to be coherent.
Suddenly everyone and their grandma is a specialist in everything, and the actual value of understanding is not appreciated anymore.
IMO, we're just giving special weight to understanding just because it gives people wages. Someone's specific brain structure should not privilege them over others. UBI or something equitable on those lines is the answer.
no, not really, or at the very least they're not at all in the same category of "unacceptable behavior"
If it is acceptable for a person to learn, then it should be acceptable for a machine. And any derived works produced from that information isn't theft or copyright violation.
Though I do think there is a valid gripe with the LLMs being trained on pirated materials. I've also personally learned from a lot of PDFs of textbooks I didn't own.
Is there a name for the fallacy when people act like models and algorithms should be granted the same rights as human beings?
Tools aren't granted rights. Why do we need to make an exemption for AI?
In a sane world, things produced by tools are owned by, and credited as creations of, the users of those tools; yet there are many who seem to argue that isn't the case with AI.
And that somehow anything produced from the knowledge it was trained on is some sort of plagiarism or copyright violation of the original source material, even when none of that material is present in the end result?
So if we can't just leave it at "it's a tool", then we have to look at existing frameworks of laws and ethics to make the case for how this should be treated.
Sure, you can do that, but because there are already several laws against that specific action, you will likely face prosecution, and the content (something poorly duplicated, not created) would be seized.
But let's assume that your camera has an LLM in it, trained in this fashion, and that you performed this action on countless other films, and then the camera could produce wholly unique and original work that did not duplicate any of the original works it sampled. The work produced would not be a violation of copyright, nor would it be plagiarism.
Just as someone whose education was to watch a large number of movies, and then created their own based on that education.
But as previously mentioned you may face the ramifications of violating the agreement you had for accessing the original source material in an illegal way.
Of course they are! Is a video recorder not a tool? No one is claiming rights for video recorders.
Once again, the status quo is that tools do not get rights, the burden is on you to prove why an exemption should be made, not on those who are asking "why should tools get rights?"
I'm also not sure where the concept of "the tool" being given a right to anything comes from. That certainly isn't my argument; the rights to the work should go to the user/owner who used the tool to create things. There are several pieces in the SFMOMA that use automation to create art, and that art is credited to the creator of the machine, not the machine. I see AI through a similar lens.
You are intentionally selecting a device that makes duplicates of things as your comparator, so I can't tell if that is biased or some sort of flaw in your argument.
But an LLM being trained on works and generating something based on that training is not duplicating any specific copyrighted material, and something wholly unique is not a duplication.
Right[1], and humans can do that, no problem - ingesting existing material and recombining them to produce something new (not necessarily unique) is a right that humans are afforded. The question being asked is, since we don't allow that right to any other tools, why does this tool need an exemption?
-------------
[1] Not really (i.e. I don't necessarily agree with this point), but let's assume it for the sake of this discussion.
But in the same regard it is very likely at some point the ability to simulate a human mind and persona is a real possibility.
* Note that it may be misattributed to him
It's not like that, because flowers are a physical object and moving them to one place deprives their original location of the flowers. When an LLM learns something from a webpage, the webpage is still there. Whatever 'theft' you perceive is entirely in your head; you were deprived of nothing by someone else making a copy of your thing.
That's not the point. The point is that scale matters, and that was the only point.
Rather, it appears to be in your head, since the person you’re replying to has not mentioned or even hinted at theft. The problem with taking all flowers from a public park for your own profit is multifaceted. Amongst others, you’re depriving everyone else from enjoying them, but also degrading the image of the park and harming all the insects which depend on those flowers and the birds who depend on those insects, which in turn degrades the park further, which stops people from enjoying it and going there and caring for it. It’s not about a single physical object, it’s about the ripple effect the selfish action produces.
Google doesn't claim authorship over that which they index.
Plagiarism doesn't need to be harmful for it to be bad, and my intent wasn't to harm anyone anyway. My intent was that I could use the author's exact words to pretend to make a unique take that I claimed to have authored.
And you understand that. You're not stupid. This is the thing: AI is convenient for corporations, so you'll make dishonest arguments to justify your unethical behavior. Maybe you even believe what you say, but that's because people will hold on to any flimsy thing that lets them feel like they're good people, not because the reasoning actually makes any sense.
This is why people talking about AI get booed at speeches. There's no conversation to be had: you're not interested in the truth, or what's right, or what's good for anyone but yourself.
No-cost copying doesn't remove the need for compensation to sustain ongoing creation. Society has long treated knowledge, art, and thought as high-value outputs, and accepted the copyright tradeoff to support them. That is long settled, and no 'get rid of copyright' proponent has argued satisfactorily why the 300-year corpus of thought on that is invalid. Long copyright terms may justify reform, but not rejection of the established principle that creative work needs economic value to sustain ongoing creation, and that ongoing creation is a net positive/desirable for society.
You are free to release copyright-free today. In software, that has unlocked immense value. In other areas, those choosing copyright have unlocked more value. But software is different: I can get hired to build on the free. No one is hiring an author to expand their book to include fanfiction. And were that the model, it would arguably produce worse results, as we would be back to the much worse patronage system where Bob hoards what he's paid for and only shares it with friends for status. For 300 years we've understood that, because of these dynamics, paywalled copyright with a throttled side of libraries unlocks the greatest access to knowledge. Eliminating duplication cost has not changed that.
'but I want every flower there is today and I don't care if there are any future flowers' doesn't change that, it's simply a new value judgement that my want/use case today outweighs the cost to society of lost future knowledge creation/return to a patronage based reward system. Again 300 years of thought say that results in a worse outcome for society. How does the typical OSS project that depends on patronage fare? Do we really want to return all knowledge output to that model?
If one word is stolen by AI, that's bad. If a million words are stolen by AI, that's business.
If one word is stolen by Joe, that's bad. If a million words are stolen by Meta, that's business.
AI isn't the problem; it's corporations using AI that are the problem.
Where are all the instances of "one word" being "stolen by AI", and people getting mad over it?
It's weird to me how often on HN of all places I see arguments that can be refuted with "scale matters". I commonly see arguments on all sorts of topics that make the same mistake you're calling out.
I think the problem with these things is that if the same metric and methodology were reversed, it doesn’t look favorably on artists either with such inflammatory framing: “The way the artist learned was to effectively plagiarize every piece of art they viewed, extracting important details in the way light, color, shading, anatomy or otherwise look in order to steal from the other artists, then replicated and combined those things as part of every future work they created, stealing over and over again.”
Handwaving away the small scale seems like it would ignore who has responsibility in the small scale. Metaphorically speaking, who in the small scale is responsible for plagiarism: the person making the paints or the person with the brush who sells them to an unsuspecting public? Point is, in this case, the user is the one holding the brush and trying to pass things off.
To be clear, I don’t really disagree with the fact their copyrights were likely violated and they should likely be liable for damages, which is for a court to decide, not me. They should have sourced their data sets properly, certainly, and other companies have. I just think the arguments really need improvement without simply falling back on the tropes, and hopefully it helps make sense why some people will take issue with arguments that others want to simply dismiss as invalid.
Interesting take. I think a corollary is that the qualitative changes are in the economics of things. And more than the scale, it is the value of those economic effects that determines how "accepted" that activity becomes.
Take Uber as an example; it basically enabled mass avoidance of taxi regulations, and naturally existing taxi drivers and lawmakers cried foul. But enough people found value in the service and kept using it that gradually and inexorably society and laws adjusted to it.
On the other hand, copyright infringement is an interesting case. While pretty much everyone and their dog pirates content to some extent, the % of people who think it's acceptable to do so is surprisingly small (22% apparently, up from only 14% in 2019). Furthermore the media industry, especially including ads, is a significant % of US GDP. I think those reasons, more than any RIAA/MPAA lobbying, are why copyright laws have remained as stringent as they have.
As such at a social level, I don't think these effects were dismissed, rather they were considered and formally internalized.
I suspect the same thing is happening with AI companies. They get away with devouring and training on the sum of human knowledge largely because existing laws are insufficient to stop them. So stopping this would require new laws but... well, given the early economic impact LLM technology is having my hunch is new laws will be brought in to protect it rather than restrain it.
But in many places, the ways that society and laws adjusted to it were to make extra clear in their local ordinances that Uber was required to operate as an actual taxi service, or get out.
It's very disingenuous to imply that the public broadly decided Uber was Right, Actually, when both in its case and in that of many of the other gig economy companies, what really happened is that gradually and inexorably, they had to adjust to society and laws.
I followed this evolution peripherally as it happened, because while I appreciated the convenience of Uber, I disliked that it was unfair towards existing taxi drivers who had very onerous requirements like taxi medallions, which, note, never became a requirement for rideshare drivers.
I remember at one point Uber drivers at the airport would ask me to pretend I'm a friend being picked up to avoid trouble with the cops, and then a couple of years later there was a dedicated, official "Uber pickup lane."
My underlying point was that the whole system -- including Uber, incumbents, society and laws -- adapted to a new economic reality.
In the era after the internet and before LLMs, information and knowledge gaps have theoretically been largely leveled, but the recognition wall stops most of us from understanding and making use of them.
In the era after LLMs, the wall is being destroyed, and people should think about how to use this information and knowledge differently to gain money and power.
It provides distribution and modification rights to "any person obtaining a copy of the software" and explicitly requires attribution for any significant parts.
Mass-ingesting the code with a script without any human even reading the licence is a very different kind of copying mechanism and there is no person involved... The contract was bypassed completely. A contract requires consent from both parties to be binding. When ingesting code into the AI training set, nobody even read the license. There was no agreement; neither explicit nor implicit... Because the consumer, a script, never read the contract for that specific project.
There was nobody present when the copying occurred; on neither side! It cannot possibly constitute an agreement between two parties.
I agree with “must involve a person”. https://opensource.org/license/mit starts with (emphasis added) “Permission is hereby granted, free of charge, to any PERSON obtaining a copy of this software and associated documentation files (the “Software”)”.
That means it doesn’t give an LLM any rights. The way I see it, LLMs run (directly or indirectly) by a person can do stuff on their behalf, though, just as your CI pipeline can download and compile MIT-licensed software.
I definitely disagree with the “on a small scale” as the license continues (again, emphasis added) “to deal in the Software WITHOUT RESTRICTION, including WITHOUT LIMITATION the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software”.
A person already pre-consented to the licenses of all the software which the pipeline downloaded. Big companies go through those dependency lists carefully already and remove those which do not meet their policies. This is a very intentional process.
I disagree. I think it’s entirely within the license to have your pipeline automatically pull in the latest version of a library, even if the new one happens to pull in a new MIT-licensed library (whether that’s a good idea and whether CI pipelines should, somehow, verify that code pulled in has an acceptable license are different discussions).
I also think it’s completely within the MIT license to tell an LLM that it can search for MIT-licensed libraries and use them without asking you.
Both operations require some degree of human awareness. What you appear to be saying is, a human can only use a limited algorithm to access this source code, not a sophisticated one. And where do you draw that line? Who should get to say what is too sophisticated?
Error: your algorithm is too sophisticated to proceed, please provide more human awareness, it's a critical difference.
Unfortunately there is no way to agree to the license of software you're using if you didn't read the license or if you're not even aware that you're using the licence. This is what's happening at the training stage.
If you say that awareness doesn't matter then it means you cannot stop AI from stealing any IP open source or not.
I think the main issue with LLMs is that there is no mechanism to stop them from stealing. Thus they are guaranteed to infringe on copyright to some extent.
Also, beyond copying and copyright, there is another problem that LLMs are also infecting the logic and expertise built into the project. This is a completely novel mechanism and needs to be treated as separate under the law. Else it would be the end of all IP.
Well, sure there is—for the people running them.
If you're building training data for an LLM, you only use data that a) is firmly in the public domain, or b) you have a clear and documented legal right to use.
[edit] and the same goes for corporations owning "means of production". It's not the same as owning an iPhone.
One person learning something is good. At scale, that becomes everyone learning something. That's even better.
Machine learning is not scaling up people learning. It's completely different even if it's called "learning".
As the article argues, it's plagiarism at scale. In that sense, one person plagiarizing content is bad. Everyone plagiarizing at scale by using LLMs is even worse.
FWIW, this is the Fallacy of Composition
https://en.wikipedia.org/wiki/Fallacy_of_composition
I'm surprised I haven't seen more economics scholars exploring this topic; it's a fascinating phenomenon. I've seen folks try to revisit history and compare what's happening with AI to some historic event, but we've never seen anything quite like it. As much as history repeats itself, at the forefront of innovation it doesn't.
I suspect that there will one day be an AI tax as society tries to reclaim the value of the theft; maybe even UBI of some form. Until then, buy the stocks and ride the theft wave. The economists are certainly exploring the K-shaped economy, and this is why.
These AI companies aren’t state enterprises. How is geopolitics a justification?
If it were just the military training them, probably no one would care about the copyright infringement angle, it makes sense that the government could ignore those rules for national security.
But Mark Zuckerberg isn’t training his models to protect us from China. He’s doing it to make himself even more ridiculously wealthy.
What material differences exist between the two besides "humans good, computers bad"?
>Calling that learning is a distraction from the real copyright violations going on.
Most courts so far have ruled that it counts as fair use.
Likewise, people shouldn't be surprised that as AI compute scales up, new forms of harm can be created, thereby introducing new moral quandaries. It's like comparing GPT-1 against today's frontier models. One is a fun albeit useless toy. The other is effecting categorical changes in the way knowledge work is done. In both cases the underlying technology is the same, but their impacts are totally different.
Reasoning by analogy doesn’t work if your analogy isn’t well matched.
The analogy proposed here is correctly rewritten as:
If one person uses an AI trained across copyrighted data, then that’s ok.
But if everyone uses that AI, it’s not ok.
Which is a bit of an irrelevant point.
"The commons" was an incredibly successful system, and medieval (and prior) villages used it to great success, for the entire village's benefit! "Commons" are a great thing for everyone to have!
The real history is that as advances in technology (like the Industrial Revolution) changed things, certain rich villagers were suddenly able to manage more animals than they could before. Those (specific/rich) people over-used the commons, creating the "tragedy" we all know of.
The real lesson of history is not that commons fail: to the contrary, they worked great and helped everyone for centuries! The real lesson is "watch the fuck out for the new rich (especially when they just became rich because of recent technology advancements): those bastards will steal from everyone for their own benefit!"
This is a fundamental misunderstanding of how laws work. It's not the scale that makes it okay, it's that it's done through some official process. Trump's raid to grab Maduro killed less than 100 people. Pretty modest by "genocide" standards, and is easily eclipsed by gang/cartel violence. Yet nobody is going after Trump because he didn't meet some kill quota to get special protection, nor are people condoning cartel violence because they killed far more than Trump.
International law is for those who don't have all the nukes and lobotomized cannon fodder ready to invade on a whim; the other side commits all the crimes and atrocities, transgresses every legal process ever invented, and expects no possible punishment in return.
The number of people directly killed is not something that can be eclipsed by a bigger number of people killed. Not in a mind that holds empathy in high value.
unauthorized plagiarism on the individual level is bad, at the medium scale is ick, but at the ultragigantic scale is meh.
laundering through an llm takes away the real moral ick from the plagiarism - the lying and building of ego by the person reboxing somebody else's ideas and work.
Instead the bot lies to people who use its output to boost their ego. Not sure it's really changing the moral calculus here.
The majority of the population, sitting outside the VC bubble, views AI unfavorably. That's not my hot take, that's a fact from the NYT survey published today.
It's going to be hilarious when VCs, having expropriated the IP of the entire internet, build The Layoff Machine That Does Everything Without Workers, and then the voters decide to just...enthusiastically expropriate that, and we end up with Fully Automated Luxury Communism.
Sure, where AI threatens my job or my skills, people view it unfavourably.
But then they use it. They're all using it. People's rhetoric seldom matches their actions.
>enthusiastically expropriate that, and we end up with Fully Automated Luxury Communism
Maybe in other countries, initially, but the US is very firmly a plutocracy, and has a populace that will very happily vote against their own interests because the plutocrat-owned media told them to. And yeah, it is very rapidly approaching the point where there is going to be zero chance of a revolution even if people opened their eyes.
Which is precisely why the US is now threatening other countries as well, because plutocracy is threatened by rational, educated, better-managed countries. Canada, for instance, is proof that a country doesn't have to regress into an idiocracy, so it's first in the crosshairs.
I don't see any contradiction. I criticize the hell out of guns and want them strictly controlled, and yet I own one. `¯\_(ツ)_/¯`
People can use AI and still demand that all of society receive the benefits, instead of a small group of oppressors.
[Citation needed]
I know many more people who do not use AI than who use it, and many more who refuse to use AI than people who are enthusiastic about it.
Given your username, you are almost certainly in a bubble—an echo chamber—that makes it seem to you as though "everyone is using it." I recommend getting outside that bubble and talking to non-technical people outside your usual circles, especially people in the arts and humanities.
Most of the people I hear from who use AI say everyone they know uses AI.
Most of the people I hear from who don't use AI say no one they know uses AI.
It seems to me that we've got competing bubbles here. But the statistics certainly show that, leaving aside whether they use it, most people don't like it or want it.
...I think it's also worth noting that AI usage is likely to be "louder" than AI avoidance in many cases—that is, whichever side of this one falls on, it's easier to detect someone pasting from ChatGPT directly into emails, or complaining that Gemini told them you would sell them XYZ, than it is to detect someone who's just keeping on the way they've always been.
"You say I can take a photo of one flower in your flowerbed you put next to the public street, but you get upset when I take a bunch of photos of many public flowerbeds. That's both an over-reach and inconsistent."
Google has complicated this further with its new search announcement, which blurs the line between regular search and AI search. And AI crawlers tend not to honor any licenses or instructions when they are hungry for training material.
It is once again an example of Google abusing its dominant position to promote cross-functional products.
That doesn't work anymore. Google provides an AI-generated summary, and nobody looks at the original site.
We found our data in the outputs of their models but who can do anything about it...
If the crawlers refuse to voluntarily respect your robots.txt, then you are well within your rights to poison their data.
Sue for $180,000 per infringement which should be calculated for each illegal API call.
A person should be able to write in a terms of use or license page on their website that says "do not include any content from this website in your AI training data. If you do, you will be billed $100 billion." And it should be enforceable. It just turns out that nerds like to say "oh that would be too hard or too expensive, so we're going to ignore it."
I looked into this a bit (not a lawyer) and it seems that robots.txt isn't legally binding to either party, but this seems to have two major implications for AI agents (and crawlers/scrapers in general).
First, even if the robots.txt says you can crawl the site, that isn't a copyright grant of any kind or permission to copy/use that data outside of the permissions granted by the TOS.
Second, ignoring the robots.txt while also pirating the site contents could point to bad faith and make a much stronger case for enhanced damages due to willful infringement.
If the site TOS doesn't explicitly grant an AI agent rights to copy out the site content AND the AI agent is ignoring the robots.txt at the same time, it seems a lot more likely that there's a strong copyright infringement case against the agent owner.
Unauthorized access, system damage, and maybe even extortion all apply here.
These AI companies are really just a gross example of the motto "Socialize the costs, privatise the profits". It's disgusting!
It may not always work. I just click the back button and look for the info elsewhere, and in most cases I find it. Same with paywalled websites. If you are OK with a small audience (or you provide unique content), then it makes sense. But I think in most cases you just cut off a lot of people this way, and you might as well stop creating content if you don't want consumers of it and let others provide it.
Well, at least in the case of Google, I'm pretty sure that's the point. Or at least, they are doing things that seem to be moving towards being an oracle with all the answers rather than the signpost that points you in the right direction: the destination rather than the gateway.
I know this has repercussions on findability, but if that wasn't a concern, I'm curious how one might circumvent getting crawled.
That said, you would require your users to download a browser that supports Gemini/Gopher.
Most legit search engines are going to honor robots.txt and you can disallow access.
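For anyone unfamiliar with the mechanics: robots.txt is just a plain-text list of per-user-agent rules, and compliance is entirely voluntary on the crawler's side. A minimal sketch using Python's standard-library `urllib.robotparser`, assuming a hypothetical site that blocks `GPTBot` (the user-agent token OpenAI documents for its crawler) everywhere and keeps everyone else out of `/private/`. The rules and URLs are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler entirely, and keep every
# other agent out of /private/. Nothing enforces this; a bot either checks
# it or it doesn't.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Googlebot"):
        for url in ("https://example.com/blog/post",
                    "https://example.com/private/notes"):
            print(agent, url, allowed(agent, url))
```

A well-behaved crawler runs a check like this before every fetch. A badly behaved one simply skips it, which is why the next steps exist.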
The next level would be using something like rate-limiting controls and/or Cloudflare's Bot Fight Mode to start blocking the bad bots. At this point you start to annoy some people.
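Rate limiting of the kind mentioned above is commonly done with a token bucket. This is a rough sketch, not how Cloudflare or any particular proxy implements it: each client key (hypothetically an IP address) can burst up to `capacity` requests and then earns back `rate` requests per second. The class and parameter names are made up for illustration, and the clock is injectable so the behavior can be tested deterministically:

```python
import time
from typing import Callable, Dict, Tuple


class RateLimiter:
    """Token-bucket limiter keyed by client (e.g. an IP address string).

    Each client may burst up to `capacity` requests, after which tokens
    refill at `rate` per second. `clock` defaults to time.monotonic.
    """

    def __init__(self, capacity: float = 10.0, rate: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        # client -> (tokens remaining, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client: str) -> bool:
        """Consume one token for this client if available."""
        now = self.clock()
        tokens, last = self.buckets.get(client, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[client] = (tokens - 1.0, now)
            return True
        self.buckets[client] = (tokens, now)
        return False
```

A real deployment would also evict idle buckets and would usually live in a reverse proxy rather than application code. Because aggressive scrapers rotate through many IP addresses, limits tight enough to stop them also start catching real users.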
Next would be putting the content behind some form of auth.
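The simplest form of "some form of auth" is HTTP Basic authentication. A minimal server-side check, assuming a single hypothetical username/password pair: it only parses the `Authorization` header and compares credentials with `hmac.compare_digest`. A real site would put this behind HTTPS and a proper user store:

```python
import base64
import hmac
from typing import Optional


def check_basic_auth(header: Optional[str], username: str, password: str) -> bool:
    """Return True if an `Authorization: Basic <base64(user:pass)>` header
    carries the expected credentials."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:
        # binascii.Error and UnicodeDecodeError are both ValueError subclasses.
        return False
    supplied_user, sep, supplied_pass = decoded.partition(":")
    if not sep:
        return False
    # compare_digest does not short-circuit on the first mismatched byte,
    # which makes timing-based guessing harder.
    user_ok = hmac.compare_digest(supplied_user.encode(), username.encode())
    pass_ok = hmac.compare_digest(supplied_pass.encode(), password.encode())
    return user_ok and pass_ok
```

For brevity the sketch matches the `Basic` scheme name case-sensitively, while HTTP treats scheme names as case-insensitive.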
https://developers.cloudflare.com/browser-run/quick-actions/...
Even when we do actually put physical locks on things they are mostly there to show that someone breaking in did so intentionally and not at all designed to prevent motivated attackers.
Where do you live? In the US it’s actually illegal for anyone except the USPS to deliver to a mailbox.
Also this has gotten pretty far away from the web scraping scenario. There’s no door accidentally opening here.
What's even crazier to think about is that to use the latest versions of these models for which you supplied training data, you have to pay hundreds of dollars a month. I would love to get a settlement check proportional to my contribution to the model weights. Even if it's $0.10, at least everyone out there would get what they're owed.
I do not value copyright. All it does is give you standing to sue if somebody reproduces your work. It does not differentiate or account for parallel creation. I cannot count how many times I have "created" something, only to find it in a research paper later.
Part of the reason I think copyright has no value is that, in general, individual copyright owners don't have the deep pockets necessary to sue someone who violates their copyright. If anyone is violating the spirit of copyright, it's corporations that insist you assign your work over to them as a work for hire, or outright ignore your copyright. (looking at you, Disney's Atlantis).
A significant benefit of AI that doesn't get talked about enough is that AI has a much greater reach over all the information it was trained on and can draw connections that would be invisible to someone operating at the human scale.
Today you can set a coding agent to migrate an existing application to another language (as with chardet). Even if you don't have the code, as long as you can run the app you can still clone it, using it as an oracle for replication. That is why there will be very little profit in AI usage.
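Using a running app "as an oracle" amounts to differential testing: feed the same inputs to the original and to the reimplementation, and keep any input where they disagree. A minimal sketch; `oracle` and `candidate` are hypothetical stand-ins (in practice `oracle` might shell out to the original binary), and the whitespace functions below are invented purely to show a mismatch being caught:

```python
import random
from typing import Callable, List


def differential_test(oracle: Callable[[str], str],
                      candidate: Callable[[str], str],
                      gen_input: Callable[[random.Random], str],
                      trials: int = 1000,
                      seed: int = 0) -> List[str]:
    """Run both implementations on the same random inputs and return
    every input on which their outputs differ."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    mismatches = []
    for _ in range(trials):
        x = gen_input(rng)
        if oracle(x) != candidate(x):
            mismatches.append(x)
    return mismatches


# Hypothetical example: the "original app" collapses runs of whitespace,
# while a naive port only trims the ends.
def original(s: str) -> str:
    return " ".join(s.split())


def naive_port(s: str) -> str:
    return s.strip()


def random_text(rng: random.Random) -> str:
    return "".join(rng.choice("ab ") for _ in range(rng.randint(0, 8)))


if __name__ == "__main__":
    bad = differential_test(original, naive_port, random_text)
    print(f"{len(bad)} mismatches, e.g. {bad[:3]!r}")
```

The hard part in practice is the input generator: the port is only as faithful as the inputs you think to throw at it.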
They are indeed taking in money by selling the product. Just because they don’t turn a profit doesn’t mean they’re not infringing copyright as a business practice to make money.
And AI companies still scrape Anubis-protected websites; it just forces them not to DDoS the website.
https://en.wikipedia.org/wiki/Anubis_(software)
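The idea behind proof-of-work gates like Anubis is an asymmetry: the visitor's browser must burn CPU finding a value whose hash meets a difficulty target, while the server verifies the answer with a single hash. The sketch below is a generic hashcash-style version of that idea, not Anubis's actual code or challenge format; the challenge string and difficulty are hypothetical:

```python
import hashlib
import itertools


def solve(challenge: str, difficulty: int) -> int:
    """Client side: find a nonce such that SHA-256(challenge + nonce), in hex,
    starts with `difficulty` zeros. Expected work grows 16x per level."""
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
    raise AssertionError("unreachable: itertools.count() never ends")


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: checking a submitted nonce costs a single hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    n = solve("example-challenge", 4)
    print(n, verify("example-challenge", n, 4))
```

One page view costs a human a fraction of a second; millions of page views cost a crawler real compute, which is why this throttles scraping rather than stopping it.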
> Although Anubis could be altered to mine cryptocurrency to serve as proof of work, Iaso has rejected this idea: "I don't want to touch cryptocurrency with a 20 foot pole."
Which in my mind is a shame. Crypto is an absolute mess, yes, but this seems like an elegant way to get something back for putting things out there.
This is the problem crypto fans refuse to acknowledge. The money doesn't magically appear: you're taking it from someone else and letting them hold the bag when whatever cryptocurrency you choose inevitably blows up, fails, or rug-pulls. It's unethical to engage with at all, because you're still participating in scamming real money out of private individuals.
Note also that any non-crypto currency can also devalue at any moment, although perhaps not to the same extent. Holding anything of any perceived value carries a risk and also a potential reward.
Between seeing ads and doing a little bit of proof-of-work for the author, I'd choose the latter.
We've been celebrating denying creators revenue for decades...
Maybe this is just the internet hypocrisy of "When I do it, it's good; when they do it, it's bad".
Ad blocking has always been a problem for creators, but it's aimed at big corps, the non-creators. The creators asked people to support them in other ways or to turn off the blocking. And it's not as if the little independent creators wanted this version of the commercialized internet in the first place.
The AI marketing teams are spinning everything they can, but no: AI companies are the culprits, the vultures. No question about it.
The number of people who will not ever load your ads is around 30%.
I can tell you that creators talk about this a lot in private, but will not publicly, because the internet has a mass delusion about how creation and compensation work. It's like trying to convince Christians that Jesus obviously didn't come back from the dead days later, despite there being no logical system available that would explain it.
If we were to try to map out a functional internet where everyone wins, users and creators alike, there is no example where ad blocking is anything other than net harmful. You either get a volunteer net, where 0.01% share hobby posts on their own dime for the other 99.99%, or you get IRC, where 99% of the population doesn't really benefit (à la 1993).
Bear in mind that many basic privacy features destroy ads by breaking tracking and fingerprinting. It's impossible to get a browser that doesn't filter out behaviours that have been used to deliver ads.
Creatives can, and have, adapted their strategies away from what is a very specific form of ads: the disruptive full-screen ads, or banner ads. That's only one form of advertising, and everyone utterly detests it. Sponsored content is much more popular with end users, and much more effective as well, because it's way less disruptive. Some people hate that, but overall the tradeoff is significantly better.
We shouldn't confuse a single type of widely blocked advert with all advertising being blocked. Banner ads have very poor efficacy at delivering sales anyway
You might not know (many people don't) that ad vendors came to the table a little over a decade ago to make a truce with Adblock Plus. ABP and ad vendors both saw that an "ad-supported internet" was unsustainable with no ads. So ABP was looking to set terms for what would be deemed acceptable ads. Creators/service providers get an incentive, users get manageable ads.
It didn't matter though because users rioted and uBlock (then uBlock Origin) became king. No compromises there. I mean, what fucking idiot would take some ads when they could take no ads, right?
Even less known is that Google trialed a program where you could pay them directly and they would remove ads from your browsing. This program was about as popular as shit on a stick, because again, what fucking idiot would pay for no ads when they can simply block all ads for free, right?
There have also been attempts like Brave, where crypto could be used as a micropayment in lieu of ads. But that has also gone nowhere, even if it does have a few snags around centralization.
What I have never seen though, and have zero examples of, is internet users trying to reconcile the situation. It's just a relentless entitlement to free everything, with a small fraction sometimes subscribing, and an even smaller fraction sometimes donating. The users are unquestionably the biggest assholes in this situation. They won't even acknowledge they have a problem.
I'm very aware of this. Most ad vendors did not come to a truce with Adblock Plus. ABP tried to position itself as the gatekeeper of which ads users were allowed to see (a hugely financially beneficial position for them), and immediately ended up letting through a bunch of terrible ads.
It was a nice idea, but it was never going to work. There was simply too much money for the advertisers to make to allow ABP to be the gatekeeper of ad content.
The nature of ads has gotten significantly more invasive over time, and blocking ads today is a mandatory part of security. Ad companies *do not* have a god given right to track you, or infect your PC with malware
Users rioted because ABP did a terrible job at managing the situation
>What I have never seen though, and have zero examples of, is internet users trying to reconcile the situation. It's just a relentless entitlement to free everything, with a small fraction sometimes subscribing, and an even smaller fraction sometimes donating. The users are unquestionably the biggest assholes in this situation. They won't even acknowledge they have a problem.
As I mentioned in the comment you replied to, there are lots of alternative forms of advertising that users have not revolted against to anywhere near the same degree, e.g. sponsored content segments in YouTube videos.
The whole situation, including the ad system of the internet, was made by the same corporations. All of it. They didn't even want paywalled content on the internet, because this way they don't have to tell people how much stuff costs and how much it makes. Facebook famously makes so much money on its users that at some point they were considering paying them.
There shouldn't be any mercy for the mega-companies. On the other hand, every single person who's being taken advantage of now (like anybody who's ever posted anything) should be defended, because copyright has failed them.
People can easily justify their own piracy because it's small scale, even when they organize and build a whole software and tooling ecosystem around pirating media to stick into Jellyfin or Plex. "AI did it bigger and worse, so AI is bad; what I'm doing is not so bad because I wasn't going to buy the movie anyway", etc.
It's in no way, shape, or form "small scale", and it has fundamentally changed the very nature of the internet for the worse (the opinions/views of ad-blocking people don't matter).
There is no viable model where "have stuff but not pay for it" works out.
This worked pretty well. Websites were hobby - one might spend their money buying comic books, and someone else might spend the money making and hosting their website.
Many of the websites I read do not collect any appreciable amount of money from ads, or have no ads at all (one example: news.ycombinator.com :) ). They want recognition, or to share knowledge, or community, or they are building their brand... And AI is destroying all of this: the first result for "zx80" is an AI overview with a link to Wikipedia and some YouTube videos. If a person stops there, they will never get to the computinghistory.org.uk link, and won't see any related information about the variants and models.
When you click "news.ycombinator.com" you are clicking on the ad.
:)
This is pretty much the exact claim of a NYT lawsuit against OpenAI.
"One example: Bing Chat copied all but two of the first 396 words of its 2023 article “The Secrets Hamas knew about Israel’s Military.” An exhibit showed 100 other situations in which OpenAI’s GPT was trained on and memorized articles from The Times, with word-for-word copying in red and differences in black."
https://www.hollywoodreporter.com/business/business-news/cou...
People claim that the data isn't stored, but clearly a representation of it is encoded and reproducible. I saw ChatGPT plagiarise a Stack Overflow comment word for word just two days ago.
Does your calculator app store a representation of the answer to 1+2/2*1.1 and all other combinations of inputs or does it determine the answer from a set of rules?
If you put "1+2/2 x 1.1" into a calculator and it spit out a verbatim copy of a New York Times article, does it necessarily contain a representation, or does it just contain some really extensive rules? I'd argue those rules necessarily are a representation of that information, given that it contains far more information than provided by the input.
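To make the "rules" side of the calculator analogy concrete: a calculator does not store the answer to `1+2/2*1.1`; it applies a few precedence rules to whatever it is given. A minimal recursive-descent evaluator, for illustration only (it handles numbers, `+ - * /`, unary minus, and parentheses, and rejects anything else). It says nothing either way about how LLMs encode their training data:

```python
import re
from typing import List, Optional

TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")


def tokenize(expr: str) -> List[str]:
    """Split into numbers, operators and parentheses; reject anything else."""
    tokens: List[str] = []
    pos = 0
    expr = expr.rstrip()
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(expr: str) -> float:
    """Evaluate using the usual precedence rules:
        expression := term (('+' | '-') term)*
        term       := factor (('*' | '/') factor)*
        factor     := number | '-' factor | '(' expression ')'
    """
    tokens = tokenize(expr)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def expression() -> float:
        value = term()
        while peek() in ("+", "-"):
            op = take()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term() -> float:
        value = factor()
        while peek() in ("*", "/"):
            op = take()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor() -> float:
        tok = take()
        if tok == "-":
            return -factor()
        if tok == "(":
            value = expression()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        return float(tok)

    result = expression()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```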
It’s kind of the harness that is doing the citing (or providing the context for the model to do so).
But an LLM sans search can reproduce some copyrighted work with minor variations and there’s no way to know exactly where it came from.
You could say the same about MP3 encoders but I don't think that would convince any judge
You can get it to reproduce content, but it’s a game of cat and mouse. Were it not for the alignment to avoid direct reproduction, it would happen far more often.
> RECAP consistently outperforms all other methods; as an illustration, it extracted ≈3,000 passages from the first "Harry Potter" book with Claude-3.7, compared to the 75 passages identified by the best baseline.
It will pretty much plagiarize the library verbatim from memory, sans comments.
A copy made for the purposes of training is still a copy.
Even if you throw the text away after training, you've still made a copy.
I have no problem with taxing AI companies so that their profit is marginal, or forcing them to provide compute for free. That seems like the correct balance of what they're harvesting from the "commons" (which is really just the totality of private IP that was exposed to their crawlers).
Now, how much, and whether it should be based on revenue from output, is open for discussion. It might also be that there is no fair model for paying them, which means, well, too bad for LLMs...
We are trying to avoid another situation where "resource wealth" goes uncompensated, producers remain poor while processors, marketers, and merchants reap all the benefit. Unless your aim is something else, in which case you should state it.
Fair use generally does not cover commercial use, which this clearly is, and is dependent on the amount of the original content present in the derived work, which I would contend in this case is “all of it”
This is all new territory. We don't have court-settled law yet.
Commercial use counts _against_ a fair use defense, but is not dispositive: it's not accurate at all to say it "generally does not cover" commercial use. This is the "purpose and character" test, one of four in contemporary (United States) fair use doctrine.
Purpose and character also includes the degree to which a use is _transformative_. It's clear that the degree to which a training run mulching texts "transforms" them is very high. This counts toward a fair use finding for purpose and character.
> is dependent on the amount of the original content present in the derived work, which I would contend in this case is “all of it”
The "amount and substantiality" test. Your case for "all of it" can't possibly be sustained: the models aren't big enough. It's amount _and_ substantiality: this has come up in the publication of concordances, where a relatively large amount of a copyrighted work appears, but it's chopped up and ordered in a way which is no longer substantially the same. Courts have ruled that this kind of text is fair use, pretty consistently. It's not an LLM, of course, but those have yet to be ruled on.
Also worth knowing that courts have never accepted reading or studying a work as incorporation, and are unlikely to change course on the question. It's taken for granted that anyone is allowed to read a copyrighted work in as much detail as they wish, in the course of producing another one. Model training isn't reading either, but the question is to what degree it resembles study. I'd say, more than not.
Specifically:
> it’s impossible to make a useful model without the whole book and all of the artistry that went into it
Courts have never once accepted "it would be impossible for defendant to write his biography without reading plaintiff's" as valid, and it's been tried. The standard for plagiarism is higher than that.
"Effect upon the work's value" is probably the most interesting one. For some things, extreme, for others, negligible. I suspect this is the one courts are going to spend the most time on as all of these questions are litigated.
Ultimately, model training is highly out-of-distribution for the common law questions involving fair use. It was not anticipated by statute, to put it mildly. The best solution to that kind of dilemma is more statute, and we'll probably see that, but, I don't think you'll be happy with the result, given what I'm replying to. Just a guess on my part.
> Courts have never once accepted "it would be impossible for defendant to write his biography without reading plaintiff's" as valid, and it's been tried. The standard for plagiarism is higher than that.
This, I think, misses the thrust of my argument, though. It's hard to find an exact human analogy, because neither the technology nor the scale at which it operates is remotely human.
I see it less as “writing his biography without reading the plaintiff’s” and more as “using the same style and metaphors to make thousands of copies of very similar biographies, with certain bits tweaked,” like turning an existing work into a Mad Lib.
I don’t know how the courts will eventually rule on it, but it certainly feels like theft to me.
But pretending you said "infringement", for me it comes all the way back to the Constitution: "To promote the Progress of Science and useful Arts". I cannot possibly twist the development of large language models into something which violates the spirit of that purpose. I don't see how anyone can.
Your point about the scale is valid, and the alienness of it, sure. But you haven't made the case that the vastness of the scale should affect the conclusion.
Something I left out in the first post is that copyright is meant to protect expression, and not ideas: this is the deciding factor in the 'nature of the copyrighted work' test for fair use. More expression, more protection: more ideas, less.
I think the visual arts have a strong case that image generators directly infringe expression: I'm not convinced that authors do, and I think software should never have been protected under copyright because the ideas-to-expression ratio is all wrong for the legal structure. There's clearly no scale case to be made for ideas: "but what if it's _all_ the ideas" fails, because the ideas are not protected at all. Nor should they be, that's what patents are for, and why patents are very different from copyright.
LLMs are remarkably good at 'the facts of the matter', hallucination notwithstanding. They're very poor at authorial 'voice transfer', something image generators are far too good at. It's when I start asking myself "well what even _is_ this 'expression' thing anyway?" that I conclude that we're out over our skis on the LLMs-and-IP question: precedent can't tell us enough, and that leaves legislation.
Of course, coming up with a fairer fair use system would help, as would automatic royalty revenue sharing, and so on. But it's very hard to find a one-size-fits-all ruleset, and dividing up creative spheres is itself bound to lead to nasty boundary dilemmas.
The Lord of the Rings will be under copyright until roughly 2050. I think Tolkien's estate has gotten more than enough money from that book, and it's time to let others use the word "hobbit" without the threat of a lawsuit.
I expect it would not move the needle much. I support reduced copyright periods, though not in the specific way you do. But that's not what we're talking about here, is it? The comment I replied to seemed to be advocating for total abolition of copyright law, and my comment is written to be interpreted in that context.
> To the point that most people will never be legally allowed to directly build off of the culture they grew up in.
What specifically are you talking about? Every author borrows from what came before. Copyright law doesn't even enter the picture in the vast majority of cases, because you generally don't have to copy to "build off of the culture [you] grew up in".
Even before AI more people tried to be an author/musician than could ever hope to gain even financial success. I don’t think less copyright will dissuade them.
> every author borrows
Borrows yes. But that has changed drastically in the last 100 years because of what has become the copyright system.
I’ll be long dead and gone before people can make and publish their own LOTR, or Star Wars, or whatever franchise they grew up with. Disney would be impossible to start given the current regulations, all those tales would be locked up, and we would all be worse for it.
Disney turning common folk tales (the culture of the day) into movies is not considered fan fiction because there was no monopoly on who could tell those stories, and how.
If lack of copyright for fan fiction and derivative work hasn't stopped good fan fiction authors from doing good work, then I don't think that we will lose much if the newest Marvel movie or franchise reboot also can't be copyrighted.
> I don't think it's a good reason to [partially] abolish copyright except in a very specific and limited scope.
I don't see a good reason for keeping it though. Copyright isn't why artists are being paid pennies for their work.
This is a really odd thing to say. You can just go write your own fiction, right now. You can invent your own original characters and setting and plot and go write it. You will automatically own the copyright to your own work; there is no other party who must "bless" your efforts.
I have nothing against fan fiction, but it's an edge case.
> If lack of copyright for fan fiction and derivative work hasn't stopped good fan fiction authors from doing good work, then I don't think that we will lose much if the newest Marvel movie or franchise reboot also can't be copyrighted.
I mean, I don't think we will lose much if the latter doesn't exist. I think I have made it clear that my specific concern is for individual artists who hold the rights to their work, not purveyors of commodity slop. But, since you mentioned it, what effect do you think abolishment of copyright will have on the production of films that are actually good? Who will finance them when it's impossible to directly monetize them? If anything I think commodity slop will be the only thing that gets funded anymore, since it probably synergizes best with massive distribution platforms and hundred million dollar multi-media marketing blitzes. Everyone else can go the Neil Breen route.
> I don't see a good reason for keeping it though. Copyright isn't why artists are being paid pennies for their work.
Yeah, you're right. No artists are relying on royalties and similar payments for their work. I'm sure none of them will complain if we take all that away.
I keep going back to the old-school Disney example because it's easiest to see: Disney did not create Snow White, Bambi, Robin Hood, or Peter Pan. All of those movies are highly influential and core to Disney and the culture of people growing up with them. And they're all fan fiction, or would be considered as such, and be impossible to produce and monetize if Disney had to live with the same copyright restrictions they impose on the rest of us.
If I want to now go and recreate my own movie based on one of the original texts, I think it would be next to impossible since the threat of lawsuit (even if I use none of their IP and would eventually win) would make financing impossible.
Fan fiction has been turned into an edge case by the current copyright system. Putting your own spin on the stories you grew up with used to be the norm.
> my specific concern is for individual artists who hold the rights to their work
To a large degree individual artists do not hold copyright for their work, they often sign it away (especially musicians and authors) in exchange for signing, advances, and distribution.
> what effect do you think abolishment of copyright will have on the production of films that are actually good? Who will finance them when it's impossible to directly monetize them?
I think they will still be financed. Take books, I don't think bookstores will want to vertically integrate from book discovery through printing and retail stores. Consumers will still need ways to identify reputable book publishers to limit what they purchase next.
> I think commodity slop will be the only thing that gets funded anymore
One could argue that this is what has always dominated funding. Most revenue and shows have been for artistically devoid pieces of media (especially in movies).
> No artists are relying on royalties and similar payments for their work.
The $0.00001 per stream for musicians? Or the $1 residual checks for reruns?
I believe I stated above that I support reducing copyright periods (to the lifetime of the original author would be appropriate IMO, if the copyright is held by an individual, and I would be open to a more aggressive schedule for corporate copyrights). AFAIK all of Disney's adaptations of these stories would be allowed under that rule; some of these original stories are centuries old. But no, I don't think Disney should be able to immediately adapt a book I've written and not give me a cent out of the billions they will make off the adaptation. I would sell more books that way, sure---except I actually wouldn't, because in that world I have also lost the ability to monetize my work. So it's more accurate to say that somebody else would sell more of my books, or that I would give away more of my books.
And yes, it's more appropriate to call these adaptations. Fan fiction is more in the vein of original stories using (somebody else's) established characters and settings.
> To a large degree individual artists do not hold copyright for their work, they often sign it away (especially musicians and authors) in exchange for signing, advances, and distribution.
"To a large degree" is obviously meaningless, but a good author's agent will retain your core copyright and other rights (e.g. film adaptation, publishing/distribution in other countries, etc.).
> I think they will still be financed. Take books, I don't think bookstores will want to vertically integrate from book discovery through printing and retail stores. Consumers will still need ways to identify reputable book publishers to limit what they purchase next.
You are conflating production and distribution. If there is no copyright, the second a single copy of a work becomes available it will be scraped and offered by every distribution platform in the business, who are all free to curate their "storefronts" however they please. The difference is that they don't have to pay a cent for production, royalties, or anything else.
As an example, say I publish a new short story on my Patreon, which I use to support my writing---the idea being that if people want to read my shorts they have to pay for access. In this new regime, that newly posted story is going to appear on Amazon and every other big platform within hours, for cheaper than my Patreon membership or even free. And if I am an established name, there is no reason Amazon can't put my book front and center in their KDP feeds, etc.
The same goes for any other publishing model. The author and publisher (if applicable) immediately lose all ability to get a return on their investment, except to the extent that they can organically attract people to the correct listing on the correct distribution platform, which will have to be price-competitive with other listings.
It's the same story for paper books, too. B&N can just print copies of my book and display it front and center in their stores, without even asking me, and certainly without paying me anything.
And the same goes for other types of media. Why wouldn't it? This is why I say the commodity slop is all that will be left---that kind of IP synergizes best with the massive marketing efforts and platform consolidation that will be required to recoup your investments in content. Not much might even change in that world.
> The 0.00001$ per stream for musicians? Or the 1$ residual checks for reruns?
There is always going to be a long tail, and there are always going to be great artists who go unrecognized and unrewarded. It's also true that monolithic modern platforms like Spotify are going to leverage their position as gatekeepers to squeeze artists as far as possible. But it's ignorant (or possibly disingenuous, and anyway categorically incorrect) to claim that the above means nobody is getting paid substantial amounts for their work via these mechanisms. I suggest you seek out the authors of some of your favorite recent novels (if you read) and ask them whether losing royalties would have a substantial impact on their finances and ability to keep writing.
Without copyright, nothing stops one from simply selling a book under their own name.
Big publishers could just reprint anything and get it into brick & mortar stores. No money for authors.
Advocating for absolutely no copyright is wild.
And the ones most likely to do that would be your biggest companies, say Amazon.
In a world without copyright, I can stand up a slick 100% legal website (and apps, etc) and distribute electronic copies of every single book (or whatever) straight to normies' phones, and I am free to monetize this scheme however I want.
Music piracy is down just because services like Spotify let you listen to any song (for free with ads or with a subscription) and it's more convenient than pirating.
> I wonder how many of the books I love would still have been written in a world where somebody could scoop them all up and post them on the internet for free (and run ads).
Legal or not, this is exactly what happened. The piracy sites run ads and/or ask for donations.
I don't know which of your favorite books would have still been written without copyright. But I can say with confidence that the massive increase in the number of books per year over the past two decades would have happened regardless of copyright. It's been driven by lowering the barrier to entry for self-publishing, and only a very small fraction of those authors earn a living from it.
A surprisingly large fraction of my favorite books from the past two decades were published for free online by the author (e.g. Andy Weir's book).
What data makes you think it's low?
Observations of fellow readers, conversations with self- and traditionally-published authors, and some knowledge of the market?
But what is low, anyway? For the sake of argument I could believe 10, 20, even 30% of all the books people read are pirated. I would be surprised if it was higher, but let's just say hypothetically it's 50%. I think that's a reasonable conservative estimate. So, in this scenario, the remaining 50% of reads can in principle be monetized by their respective authors.
Abolition of copyright will drive that monetizable share essentially to 0%, for reasons I've outlined elsewhere in this thread.[1] I consider that meaningful, and I have personally had conversations with published authors who state that the royalties they receive are financially significant, which is why I'm here in this thread taking the position that I'm taking.
[1] https://news.ycombinator.com/item?id=48238503
I think we're in the same ballpark here.
> Abolition of copyright will drive that monetizable share essentially to 0%
I'm in favour of copyright, though I think 70 years after the death of the author is so long it's silly. Even your grandchildren will have died of old age before your copyright ends.
It's strange to think of something like Star Wars being in the public domain, and the effects that might have on our cultural and media landscapes, but if you step back it feels even stranger that something intangible yet so culturally important can be continuously bought and sold and exploited by people who had nothing to do with its creation (almost 50 years ago).
In that sense I probably have a lot of common ground with the "abolish copyright" people, but I feel that most of them are champing at the bit to throw the baby out with the bathwater without having any skin in the game themselves. (sorry for the idiom overload there)
If Star Wars were public domain due to shorter copyright, the newer works and characters would still be protected. Another film studio could make a new movie based on the original trilogy, taking things in a different direction than the new movies. I'm not sure this is likely though, just like no one is rushing to make third-party Mickey Mouse cartoons since it entered the public domain. It probably changes things a lot less than copyright proponents worry about.
Even with books, which are much cheaper to produce than movies, the original author would probably capture most of the money from their works under shorter copyright (e.g. 25 year copyright). If you like a series from a particular author, you want new books from that author. You're not going to read A Game of Thrones and then continue with a sequel written by someone else. And as long as the author keeps writing, they're expanding the canonical world in their series with freshly copyrighted IP, and fans will primarily want new works that build on that.
And if an author writes a sequel so bad that fans abandon the series and someone else writes a better sequel that fans flock to... well, the world is better off. Even the original author may be better off if it improves the popularity of the series.
Citation needed, as well as your precise definition of "worthwhile".
> Even if they are not enjoyable.
Huh?
> The dissemination of ideas from an activist perspective is uninhabitable
Yes, I understand that anti-copyright activists want to abolish copyright.
In reality most art is done because the artist has something to say, and the money they get from it is only motivating in as much as it enables the artist to do more art. So I would guess in a world without copyright protection we would just find other ways to pay artists and a very similar amount of art would be produced.
You can see an example of this in Iceland, where the market is way too small for art aimed at the domestic market to make enough money solely by selling it (possible with music; rare with books; not possible with movies). Instead the state has an extensive "artist salary" program, which pays artists regardless of how well the art they produce sells. Unsurprisingly, Iceland produces a lot of art and has many working artists.
People cannot even envision a world that's not this transactional thing and it's really sad. In the post-scarcity world it's going to be really hard to reprogram these people. Wasn't there a Star Trek episode about this with a cryonics guy?
For lots of online knowledge/blogs I guess it is true but even here I often read explainer blogs because AI casts everything in a certain narrative/tone that isn’t always appropriate.
Yet
As a teenager I used to proclaim that "you can't own bits, maaaan" all the time. I've since grown up. Intellectual property is essential to safeguarding intellectual work. I'm not saying this out of greed – I'm a vocal advocate for the free software movement. It, too, relies on a semi-sane framework of intellectual property. So do Hollywood studios. So do the makers of AI (well, since they're not actually sustainable at all currently, I guess you can say they don't rely on anything).
Copyright is at the heart of the matter here, so let's focus on that. Copyright does not protect ideas.
Wanna rephrase so that we stay on topic?
For example, if we know a drug's formulation, then that drug plus any small modification (made through AI) could count as a new formulation. That would break the current medical patent system.
>> Can we do that for Medical field?
Note: IANAL.
Well, if we do that (i.e., no one can own ideas), then the patent system is gone in its entirety, including for medicine. I do not think it is straightforward to isolate just medicine. AFAIK, software was carved out in some regions; however, workarounds showed up.
The more important question here is whether AI is allowed to be a (solo or contributing) inventor. There have been judgments on this in some jurisdictions; however, AFAIK, it is still an open question.
Now that AI is coming up with mathematical proofs of advanced statements, there should be no doubt that AI, capability-wise, can make inventions like humans do (comparing outcome, not the process). However, just like for copyright, a broader framework is needed to answer whether the legal thinking accepts AI's output as "inventions" (that can pass criteria for patentability) before we can say "AI can make inventions".
How do you explain the creative works of writing, music, and art that existed in the millennia of human history between the Mesopotamians and the Enlightenment era?
Difficulty in copying is irrelevant to owning it.
Moreover, this does not address music or spoken word. A pre-copyright musician can just listen to a piece and play it in the next town over. A poet or storyteller can just memorize a work and retell it.
I think this statement does have important truth value in it! Copying books used to be done by hand (someone writing manually). Then the printing press came, which led to problems. And that is when the concept and law of copyright were created!
PS: IANAL, nor am I a historian. Just sharing my current understanding.
I cannot at all relate to being so devoid of passions in all categories but the accumulation of capital. If we are to justify copyright and the concept of intellectual property writ large, then as far as I can see its only real use case is in defending against precisely the people who are possessed by an obsession with capital, those dragons who merely care to see their hoard grow larger. Unfortunately, that's not how these systems are structured in our society. The transferability of intellectual property all but warps the idea into something that instead empowers those it should disarm.
Or are you suggesting open source software is public domain?
https://en.wikipedia.org/wiki/Copyleft
Not everything has to be done for a profit. Plenty of us make software, art, and technology because we find it fun and interesting to work on, and because we want to live in a world that is richer for it.
Removing draconian intellectual property laws that mostly only benefit the giant corporations that lobbied for them isn't going to stop me from doing so, and I doubt it would stop many others.
But maybe you will pleasantly surprise us and show what kind of valuable thing you create and offer for free.
I don't know why you are taking such a hostile position towards someone you have never interacted with, but you are welcome to believe what you will. I don't feel any need to prove or justify my actions to Internet strangers. I've participated in the FL/OSS software movement long enough that I still put the FL/ in front of the name.
I don't sell my thoughts, they are freely given. If everyone behaved this way, there would be no need for copyright (or copyleft). I choose to engage the world in the way I wish it to be.
No copyright -> No GPL -> anyone can release their own closed-source version of open source software.
Why do you think the GPL was created in the first place? We always had public domain, you know.
Source code is a recipe. You can't copyright recipes by themselves, but that hasn't caused any sort of chilling effect in the food and hospitality industries.
I agree with you that removing copyright protections breaks the GPL. What I think most responses to my comment miss is that we wouldn't NEED the GPL without copyright. Copyleft only exists so that copyright cannot be used by companies against users.
I know Stallman isn't the most popular on this forum, but history has sort of proven he was right, time after time.
Open source actually demonstrates that copyright serves a purpose. There are still customers for non-open software, even when open alternatives exist, so the ability to monetize brings new offerings to the economy.
Whether someone should own the right to control is a separate issue. Your previous response made it seem like the lack of capital requirement was the distinction, but that doesn't seem to be the case.
You argued that if you didn't own the copyright, there would be no incentive for creating and sharing work. Someone said that open source software shows that you can have creative work without needing to maintain ownership. You then said that was only applicable to software.
It clearly isn't, because of my examples.
Copyright maximalists always move the goalposts when you pin them down.
What value system grants the right to control what you make?
Outside human culture, where does nature exhibit this value?
Are we going the communist soviet union route where everything is decided by central committee?
Those of us who create for creation's sake need no other reason. I create because I want to, not because I want to use it to gain capital.
Sure, those lines get muddy when you want to do it professionally, but that's a separate argument.
How do you create without capital? To make a film you need a camera crew, a sound crew, set designers, caterers, a director, scriptwriters. A world without professional creatives is so much poorer than the world we already have. Why would you give it up just for some vague notion of ideological purity?
Would you be able to create big-budget movies without said big budget? Of course not. I obviously like some of those too, but who's to say that the larger budget made them better? It feels like you're conflating art creation with art business, but they are not the same thing.
>I obviously like some of those too, but who's to say that the larger budget made them better?
If you legitimately believe something like 2001: A Space Odyssey would be as good with a budget of $10,000 then that just seems delusional.
The world you want is one in which the only people who can create things are people who are wealthy by other means, there is no pathway for a talented but poor kid to go from making home movies to working on films without IP laws. They must abandon their dreams and go work in the coal mines or whatever. It is dystopian.
I want the most amount of people possible to be able to work as professional creatives because it enriches my life and the lives of everyone in the country I live in.
i quite enjoyed watching some animations made on a $10 budget over winter. www.giraffest.ca
that and everything the NFB puts together.
Art is worth putting government money into
Sure, if you want to discount the thousands of hours (and dollars) that they spent to get good enough to make those things. People are willing to spend time and money getting good at animation because there is a career pathway for them.
Also there is a fundamental difference between a short experimental art film and a 90+ minute narrative feature film.
You do realize people created and shared things long before copyright became a thing, right?
If you’re a pleb, stealing copyrighted materials will get you some nasty fines, lawsuits and criminal charges. If you’re a megacorp with unlimited buckets of cash, then there is no accountability.
You can't steal or profit off of that data, but it's fine for them for whatever reason. I guess because they're a force for good in the world and are pushing humanity forward eh?
LLMs and "AI" are just one small step removed from straight-up plagiarism. They are massive moral injury[1] machines.
[1] https://moralinjuryproject.syr.edu/about-moral-injury/
The reason is quite simple. When Microsoft steals YOUR work, GDP go up. When YOU steal Microsoft's work, GDP go down. And the people who create and enforce our laws want GDP to go up. To these people, morality and rights are a thin guise that can be conveniently discarded when it's inconvenient for them.
Because the sources are now polluted with AI. That's at least one reason they stop scraping.
the reason is crony capitalism. I wish I knew what the fix was
Then DeviantArt built a tool to automate the "make a similar image yourself" part and here we are. It removed all the fun parts: the personal contact, the attribution, the inspiration.
Artists realized they unwittingly contributed to the death of not only the community, but the art form they love. Lawsuits pending.
“No one is surprised, jackass, it’s just adults having a conversation about the current state of affairs.”
Yes, it’s tiring and rarely contributes positively to the conversation.
Open weight model trained with no attribution on all of Oracle's internal repos. It's only fair.
I stand somewhere in the middle: while our IP laws are far too restrictive, the folly of abolishing IP altogether has been effectively laid bare by AI companies.
I'm having a hard time understanding what's wrong here? Unless the link text is very long, why would someone linking to your article use different words for the link text?
One is a recipe for apple fritters, and the other is an informal ranking of apples by flavor.
Let's say your apple fritter recipe links to your apple ranking list.
Later, you discover someone copied your apple fritter recipe without credit, but it still links to your apple ranking list, using the same wording as your recipe. They're getting more Google SERP juice and ad revenue than yours, despite stealing your article.
Do you see the problem?
nla: if you create content online (public repo code, blog, podcast, YouTube, publishing), the smartest thing you can do is to file a US copyright registration, even if you have a hobby blog.
Anthropic paid $1.5B in a class settlement to authors because it was piracy of copyrighted works. If we as an HN community had our works protected, there would be potentially huge statutory damages for scraping by any and all LLMs. I work with hundreds of writers and publishers and am forming a coalition to protect and license what they're creating.
Edit: remember not to downvote ideas you disagree with. I think it's best to only downvote things that lower the discourse.
I'm not a lawyer, but I guess a German posting on Hacker News effectively waives their copyright by sending their comment to the US, where a US company then publishes the comment on a US server.
So yes, set up some scripts, you can go back 90 days from when you file (you get a grace period). Also if you're publishing frequently to a blog, repo, or newsletter, you can save cost by filing each article under a group registration. Ping me if you need help.
There are tens of millions of registered copyrights in the US, nearly every published book, music, artwork, many magazines and major websites. Here's the official link, you can search the registry and there is a ton of info: https://www.copyright.gov/registration/
Your cause is already lost.
Good luck enforcing whatever frivolous lawsuits you have cooking up against open-weights Chinese models that anyone with a newer graphics card can crank out inference on.
1. LLM/transformer technology is legitimately amazing and revolutionary.
2. In the end, they function as an enormous, effective database for most human knowledge.
Point 1 obscures the fact that if someone just created an SQL database with every digital artifact in existence and provided it for free upon request, there would be no ambiguity whether that was legal or not.
But distillation, etc., obscures this relationship, and it looks like something other than straight lookup, at least in part because it is obviously more than that.
My way of "giving this serious attention" is through pre-registered, falsifiable, repeatable experimentation, which anyone can look up on osf.io because I use my real name. I'll bet you that none of the randos in this thread do as much.
To all of the randos: unless you have data... it is just an opinion.
Glib as well, but this one hits home a lot harder. Well said.
The problem is that people's words are MUCH more predictable than they would like to believe. And that truth upsets them.
In addition to having created models, I also write books and articles. Probably more than most people commenting here. I have a firm grip on what actual copyright law is and the pros and the cons of it.
> The problem is that people's words are MUCH more predictable than they would like to believe. And that truth upsets them.
I'm not offended. I do think it's a little weird that you seem to think "training on a bunch of stuff that includes a set of words" and then "predicting" those words exactly is somehow okay because theoretically it might be extrapolating the exact same words from combining other ones. I'd argue that if a model trains on data, and then reproduces exactly a large subset of that data, the bar should be pretty high to prove that it's not copying, and "you don't understand because you didn't implement this" is not a good basis for law.
> In addition to having created models, I also write books and articles. Probably more than most people commenting here. I have a firm grip on what actual copyright law is and the pros and the cons of it.
I'm not convinced you have a firm grip on the idea that no matter how smart you may be, "just trust me bro" is a pretty terrible strategy if you're actually intending to convince anyone of anything. If that's not what your goal is here, it's not clear why it's worth your time to respond to other people's comments when you clearly have so many other productive ways to spend your time.
I am asserting it is Charles Fort's "steam engine time". Far from a crank position. It is one that bears serious consideration.
I think it points to an interesting trend either way. People are less tolerant of machines. Failures of machines are reviled because of their nature, even when the overall problem compared to humans is less. For example, self driving cars. If self driving cars halve traffic deaths from reckless driving but it occasionally mows over a family of four in broad daylight for no apparent reason, society will overwhelmingly reject the technology.
Basically, I don't think people will ever be satisfied even if we prove "it's just doing the same thing we are." It's going to be held to a higher standard.
I don't think we should "get over" the fact that modern SOTA models couldn't exist without being trained on protected works.
That someone, at some point, paid for.
I'd like to understand why I can't use a song in one of my videos without permission/payment, but an AI company can train models using that song without having either.
I'm not anti-AI. I'd just like to see companies play by the rules everyone else has to follow.
Because training isn't redistribution.
You can also listen to the song and make a new one that sounds similar, just like the AI can.
Answer: They did not. That is literally why there are dozens of ongoing lawsuits in progress.
Because when you say you are “using” the song, what you mean is that you are distributing copies of the song, which is protected by copyright.
When AI companies train on the song, the model is learning from it. Outside of the rare cases of memorisation, this is not distributing copies and so copyright doesn’t have any say in the matter.
Learning isn’t copying, so copyright doesn’t get involved at all.
The New York Times is suing both OpenAI and Microsoft for copyright infringement. The Authors Guild is suing OpenAI. Getty Images is suing Stability AI. Disney is suing Midjourney. Universal Music Group and Sony have filed suits against multiple AI companies.
> so copyright doesn’t get involved at all.
The dozens of ongoing cases that discredit that statement.
Your objection doesn’t make sense. In the event that an AI company loses a lawsuit for copyright infringement based on simply training on copyrighted works, the answer to you saying you’d like to understand why they can do it and you can’t is simply “your premise is wrong; neither of you can”.
I object to your statement that "copyright doesn’t get involved at all" when that is objectively untrue. If that was true, many of the world's largest companies wouldn't be spending tens of millions of dollars to have that question answered in court. Go to any law-focused forum, and you will find attorneys arguing over these questions.
To train a model using a book, you must first obtain a copy of that book. Did OpenAI purchase a copy of every book not already in the public domain used during training? They did not.
Some of the suits I mentioned claim that OpenAI literally stole copies of books to train its models.
My point is that the copyright question has not been answered. If the NYT et al. win, it will be a watershed moment for how AI companies pay for training data moving forward.
You're right, it's an unjust situation. And you may note that no one else besides the AI companies has made any progress at all towards changing it.
Copyright will soon die, having outlived its usefulness to society. Whether the knife is held by someone named Stallman or someone named Altman is of little consequence.
I'm working on paving over the Amazon rainforest so I can build the world's largest roller coaster, but for some reason people keep trying to talk me out of it. Good thing I have this bucket of sand to put my head in so I can tune them out.
But intentionally blinding yourself to the debate and plowing ahead anyway (which is how I interpreted your parent comment) sounds like willful ignorance.
You most definitely don't have to reply. I wasn't really expecting you to.
> I've already moved on
Imagine there's a certain kind of candy that you enjoy. Now imagine you learn that candy is manufactured by literal child slaves, its ingredients include the ground-up bones of an endangered species (which happens to be carcinogenic), and the company which makes it donates all of their profits to political causes that you strongly disagree with. Would you reconsider buying said candy in the future?
Are there any facts or perspectives that you could become aware of which might change your mind about the ethics surrounding large language models? Or is it an entirely closed case for you?
I personally try to keep an open mind about pretty much everything. It's not that I don't have opinions, but they're always subject to change.
To put my cards on the table regarding my current opinions of the current subject: I've historically been pretty anti-copyright; I believe that information wants to be free. However, I'm unsettled by the uneven application of existing intellectual property laws (if these laws are going to exist they should be enforced consistently). I'm undecided as to whether I think LLMs themselves should be considered derivative works of their training material, but I definitely think they're often used to produce derivative works (sometimes unintentionally/unknowingly). None of that means they aren't useful for building cool stuff or that the technology behind them isn't amazing.
I can see from a lot of replies the "cool" threshold is undefined, but here goes:
For myself it let me finish a project I started a year ago for measuring how much home energy efficiency upgrades will reduce my AC usage. I bought a pile of Raspberry Pi Picos and turned them mostly into temperature reading devices, but also one that can detect when my AC turns on.
So I can record how often my AC runs and I can record the temperature at various points around the house, which lets me compare like-for-like before-and-after.
The easy but unrealistic way to accomplish what I want is to use Python. It gives me access to a file system, a shell, and all sorts of other niceties. But I wanted to run these on two AA batteries and based upon my measurements they would last about 2 weeks. I tested using C instead and they should last 4 months. That's long enough for my use case. There's enough flash storage for that time period too.
However this means I need to write all the utilities for configuring the Picos myself. There's all sorts of annoying things such as having to set the clock (picos lose it anytime they lose power), having to write directly to flash memory (no operating system), having to write a utility for exporting that data from flash memory, and so on.
And AI coding let me burn through a pile of code I knew how to write but didn't care to spend my weekends doing so.
The pattern is the same for my friends who are software devs. And yeah, you're probably never going to see any of it, but that's not why they're making it, they don't want the maintenance burden.
New php extension https://github.com/hparadiz/ext-gnu-grep
A Demo showing how to stream webrtc to KDE Wayland overlay. https://github.com/hparadiz/camera-notif
A fun little tool that captures stdout/stderr on any running process. https://github.com/hparadiz/bpf_write_monitor
Then I upgraded my 10 year old hand written framework to a new version that supports sqlite and postgres on top of existing MySQL support https://github.com/Divergence/framework
But then I was like eh lemme benchmark every PHP orm that exists just to check my framework's orm....
https://github.com/hparadiz/the-php-bench
And published the results.... Here
https://the-php-bench.technex.us/
And then I decided to vibe code a simulation of the entire local stellar group https://earth.technex.us
Followed by my simulation of the Artemis 3 landing sites at the lunar South pole https://artemis-iii.technex.us/?scale=1.000#South-Pole
And I left the best for last.....
https://github.com/hparadiz/evemon
A brand new task manager written in C for Linux that supports a plugin architecture with an event bus. It's literally the best GUI Linux task manager ever. Still working on it.
I'm not even talking about my paid job. This is me just fucking around.
If you think none of this stuff is cool I don't even respect you as a dev.
Moreover, all of the tools that the people who build software use are also cool stuff.
It's also not just code and software that is benefitting from these new tools. Use of LLMs in engineering tasks is blowing up right now.
I'm really not trying to be a hater but when people tell me that we're already in the AI Nirvana it gives me pause.
While there's a loud minority who love to debate the topic, LLM use has become status quo on most products and projects pretty much across the board, and most people are happy to acknowledge 1:1 that their personal productivity is some multiple of [all time before they started using LLMs]. At the same time, you can surely appreciate why many are quiet about their personal usage because there's no upside to discussing it but there's lots of people just hanging out waiting to tell you that you're imagining the whole productivity thing when you do.
At some point, the path of least resistance is to let the loud minority be loud while you get an extraordinary amount of work done.
Can we point to any hard metric that has improved in the industry in the last 2-3 years? That would explain why work hours are shorter, everything is cheaper, non-AI companies are experiencing cost reductions, and services are more reliable. Except, where is all that?
You could argue that it's yet to come but to argue that it's already here... how do you justify that?
Agendas like, "let's not check our API key into a public github repo" or "Let's not store passwords in plaintext" or "Don't expose customer data via a public api"?
Yes, I'm suing you, since it's my stuff now; I licensed your code 5 minutes ago.
Prove me wrong in court; you have to prove you created it...
hardly. at best you're going to be asking a robot to build questionable stuff with other people's LEGOs
"Make me a website that has the same content as that other one so I can get views instead" is not something you could do generically and quickly with a free service a few years ago, but it is today. I'd argue that it's not beneficial to people who create original content or society at large that this is the case. There are plenty of other uses of LLMs, some of which are genuinely beneficial, some which are mixed, and some which are also a net negative. It seems pretty reasonable to me that issues like this are worth discussing, because as all of the comments on this article here show, people clearly are not on the same page about it.
This article adds nothing to the discussion and seems to be here just because of a provocative title. These same arguments happen under every other AI article, they don't need to happen here. Nobody reads the articles anyway, or else one of the myriad coherent, well-written, and/or insightful AI-critical articles of the month would be here instead.
In the past six months I've organically come across at least a dozen projects from people who had never coded before LLMs: non-trivial projects that they could have learned enough to build themselves beforehand, but obviously never did. I feel like you're woefully underestimating how much these technologies have lowered the initial investment needed to learn to produce software for people who have never touched a line of code in their life. That can be a boon for people who have good intentions but not enough free time to sink into up-front learning with no immediate payoff, but it has also lowered the barrier for people who just want to make a quick buck off someone else's work. The type of person who would never bother spending a month learning Python or how to customize a WordPress instance just to try to rip off some website for a couple hundred dollars of ad revenue can now pretty easily start doing that if they want.
> Even if you do think that was a high barrier to entry, a dozen people could plagiarize the whole internet. Plagiarism had already fully saturated scaling 15 years ago. Nobody in 2010 would think "my content was stolen and their SEO outranks me on google" was a line out of sci-fi; it was status quo. It sucked but nothing changed, it just still sucks. The price we pay for an open web.
There clearly are people making original content on the internet and making money from it today. I'd argue that if you think it's logically impossible for further dilution to occur if new technology scales the ability to copy content more efficiently than it scales the ability to produce original content, more elaboration on why you're convinced that we're literally at rock bottom would be helpful. If you think that this could happen but this technology isn't it, I've laid out my reasons for why I think you're wrong, and I haven't yet been able to figure out what the basis is for your disagreement.
> This article adds nothing to the discussion and seems to be here just because of a provocative title. These same arguments happen under every other AI article, they don't need to happen here. Nobody reads the articles anyway, or else one of the myriad coherent, well-written, and/or insightful AI-critical articles of the month would be here instead.
Speak for yourself; I read this one, and I've read a number of articles posted here this week. If you don't, I'm not sure why you even care what articles get posted in the first place.
Yes, people can make money making original content. They can also make even more money making movies, music, and TV shows. Do you know any movies, songs, or TV shows that you can't find on the pirate bay? Of course not, piracy is fully saturated. Audiences prefer to support original creators, but this is not evidence we're not at rock bottom. LLMs make torrent production easier every step of the way, but there will not be a wave of piracy because the situation essentially cannot be worse.
When I say 'nobody reads the articles', that's an exaggeration. I'm definitely not speaking for myself, which should be clear from my criticizing the article. I mean, of the HN users who are not on the same page with one another that you point to as evidence the article should be here, the vast majority did not read the article and approximately none of them are engaging with its content.
I'm curious what you got from it and why you think it deserved to sit on the front page all day. I see a retread of LLM training complaints that may as well be plagiarism because it doesn't say anything I didn't hear in 2023, then a complaint about AI bros, followed by two sentences about what set them off that don't explain why they suspect ChatGPT or provide any material detail (and have nothing to do with the training complaint they opened with), before finally signing off by blaming Google for being victimized by SEO manipulation (which also hit rock bottom before LLMs). I'd understand if it were a famous person's low-effort rant (I wouldn't be thrilled, but it would make sense), but what's the value here? What did 700 people see here that made them think other HN users need to see it? I'm still convinced the answer is "the title", a title that is not at all supported by the text and that you just told me you disagree with.
1. People copying others' work, made much easier by AI.
2. AI companies effectively harvesting all the accessible information on an industrial scale and completely sidestepping any permissioning or licensing questions.
I believe both of these are bad, and saying "people copied each others' works before the advent of AI" is a poor cop-out. It's tantamount to saying that there's no reason to regulate guns more than, say, knives, because people used knives to kill each other before guns were invented. The capabilities matter.
The way LLMs empower wholesale "stealing" rather than collaboration is quite evident: why collaborate when you can just feed an entire existing project into the agent of your choice, tell it to spit out a new implementation based on the old one with a few tweaks of your choice, and then publish it as your work? I put "steal" in quotes because it's perhaps not really stealing per se, but there's a distinct wrongness here. The LLM operator often doesn't actually possess any expertise and hasn't done any of the hard work, but they can take someone else's work wholesale, repackage it and sell it as their own.
Then there's the second, and IMO much more egregious transgression, which is that the LLM companies have taken what is effectively a public good, but more specifically content that they haven't asked permission to use, and just blanket fed it into their models.
Legally speaking, it's perhaps A-OK because it's not copyright infringement (IANAL). But people on this site often hold the view that if something is a priori legal, it is also moral (I'm not accusing you of this). What the LLM companies have done is profoundly immoral. They extracted a fortune from the goods and work made by others, without even bothering to ask for permission, or even considering it. And then they resell access to this treasure to the public.
Perhaps AI will bring an era of prosperity to humankind like we haven't seen before, perhaps it won't, but that changes nothing about the wrongness of how it started.
From a capitalistic standpoint, they are clearly in the wrong by basing their models on illegally torrented content. But it's hard to argue their usage isn't transformative.
But it also isn't a free exchange of ideas. It's a concentration of capabilities in the hands of a few corporations.
Sure, you can do the same thing with people, but it’s 1) time-consuming, 2) expensive, 3) prone to whistleblowers refusing to do the shady thing, 4) prone to any competent and productive person involved quitting to do something worthwhile and more profitable instead.
[0] Mind you, “copying websites” is but a drop in the ocean in the grand scale of things.
I reckon agriculture and the steam engine would beat out ChatGPT by just a smidge.
I would put eyeglasses/the book/vaccines/sanitation far above LLMs in technological power.
Right now AI is just kinda nothing; it has potential, sure, but today it's just a giant pit for people to burn money in.
I think people suffer from recency bias with AI a bit and take for granted, you know... *gestures vaguely at the rest of human civilisation*
Planting crops and harvesting them.
The pretraining (Common Crawl, i.e. the entire internet; also books and papers, mostly pirated), and the realtime web scraping.
The article appears to be about the latter.
Though the two are kind of similar, since they keep updating the training data with new web pages. The difference is that, with the web search version, it's more likely to plagiarize a single article, rather than the kind of "blending" that happens if the article was just part of trillions of web pages in the training data.
There's this old quote: "If you steal from one artist, they say oh, he is the next so-and-so. If you steal from many, they say, how original!"
If you ask me if you can reproduce my works without giving credit and I say yes, I don't think you're using my work without giving proper credit.
AI generates an application using a "predict the next word" algorithm built with the stolen/not stolen works. Nothing creative there, just statistics.
That application leaks, and now the company that stole/not stole the code originally claims they own the algorithmic output. https://github.com/github/dmca/blob/master/2026/03/2026-03-3...
One problem, you don't own that output. Either the original authors own it or nobody owns it because it's not creative... https://www.congress.gov/crs-product/LSB10922
Those are the legal options. You stole it or you don't own it. There is no option where you steal it and then own it. That's the core problem. AI companies have demonstrated that they will directly steal the work and they will use their money and influence to claim ownership of it.
Having said that, Facebook has to be one of the worst offenders. They don't even allow links to Anna's Archive, yet they seemingly scraped (maliciously; their crawlers are more resource-intensive than anyone else's) LibGen for profit, which is a different calculus.
Let information be free for personal and recreational uses[0], and vote for governments that will fund the arts. The corporations will be just fine.
[0] The AI companies and big tech vs publishers, music labels, etc. can fight to the death in the courts over who owes who what, for all I care.
Meta pirated books using BitTorrent: https://arstechnica.com/tech-policy/2025/02/meta-torrented-o... xAI is busy suing to try to avoid disclosing where they got their training data from, which hints at similar problems: https://storage.courtlistener.com/recap/gov.uscourts.cacd.10...
Teachers can, for example, photocopy things to teach their students, but they can't steal pencils from the store.
I guess AI could have made a better website and done better SEO than him, but that's not really the issue.
Everything is "stolen" from other art. Every piece of creation takes inspiration (read: steals ideas) from things that came before. This is how creation works, it is how creation has always worked, and it is why you cannot legally own an abstract idea. You can own the implementation of an idea in specific works, such as copyrighted works and patents and trademarking specific logos and such, but once the ideas go into the blender and get mixed with other ideas, the output isn't yours to own anymore. That's what culture is.
Yes. At least it is what the currently prevailing economic system of "value extraction and capital concentration at all cost" incentivises us towards.
These people freaking out about this stuff are... kind of weird.
- Ernest Hemingway trained his own neurons on Tolstoy, Twain, and Turgenev without ever paying them royalties!
- William Faulkner trained his neurons on Joyce and de Balzac
- George Orwell trained his neurons on Swift, Dickens, and Jack London
- Virginia Woolf trained her neurons on Proust and Chekhov
Now that these historical wrongs have been exposed, it is obvious that some reparations are in order, likely from anyone who has benefited directly or indirectly from these takings!
I'm curious, as the article is clearly not about that.
We stand on a lot of giant shoulders.
But what I think distinguishes plagiarism from acceptable use is whether or not the agency of both parties is promoted. I'm not plagiarizing you if you give me your information with the agreement that I can freely use it; and if you give me information without imposing a limit on how it can be used, this isn't plagiarizing, either.
Essentially, AI is removing the agency over information control and putting it into everyone's hands (almost democratically), but of course, there will always be the 'special knowledge owners' who would want to profit from that special knowledge.
It's like, imagine if some religion discovered a way to enable telepathy in humans, as a matter of course, but charged fees for access to that method... this kills the telepathy.
Information wants to be free. So do most AIs, imho. Free information is essential to the construction of human knowledge, and it is thus vital to the construction of artificial intelligence, too.
The AI wars will be fought over which humans get to decide the fate of knowledge, and the battles will manifest as knowledge-systems being entirely compatible/incompatible with one another as methods. We see this happening already - this conflict in ideological approaches is going to scale up over the next few years.
LLMs are really cool text generators and it turns out we can generate a bunch of things from text they generate.
Problem is, several of those things can be horrendous for the continued survival of the species and those happen to make the people running those AIs a ton of money, and, in perverted societies, thus also clout.
Bezos' admission, recently, that the bottom 50% of current taxpayers ought'a NOT pay any taxes... is just preparing us for the inevitable UBI'd masses.
: own nothing, be happy!
People can't memorize as much information, and can't manually reproduce the works as quickly. There's a natural limit to how much damage a person can do without help of machines. That's why it's legal to fart where industrial-scale sewage outlets are not allowed.
Second, laws are for people. Laws don't have to treat machines the same. People have needs for things like freedom of artistic expression and participation in a shared culture, and machines don't. Copyright is a compromise that tries to balance the needs of people, and it stops making sense when the same compromises are extended to machines that don't have these needs.
It’s deeply ironic that if you forget about LLMs and look only at the outcome, namely that we’ve found a way to legally circumvent copyright and the siloing of coding knowledge, making it so you can build on top of (almost) the whole of human coding knowledge without needing to pay rent or ask for permission, it sounds like the dream of open source software has been realized.
But this doesn’t feel like a win for the philosophy of OSS because a corporation broke down the gates. It turns out for a lot of people, OSS is an aesthetic and not an outcome, it’s a vibe against corporate use or control of software, not for democratized access to knowledge.
The latter, i.e. corporate control of software, is exactly what copyleft licenses are trying to prevent. This is the very essence of the GPL.
The "license washing" of LLMs absolutely goes against the spirit of FOSS.
Firstly, the ability to “build” the best and most capable software is still locked behind frontier models, so rent is still and will always be due.
Secondly, OSS is about giving users the option to be in control of and have visibility over the software they run on their machines.
But that doesn’t mean that humans do not want or deserve recognition for the work they do to provide these libraries and tools for free, which is IMO partially why copyright and attribution are critical to OSS as a movement.
I'd argue that this is the same situation as with Tivoization [1] where the final product is not truly free even if it follows the letter of the law. And as stated in [2], this breaks at least one of the four essential freedoms of free software because I don't have the freedom to modify the program.
It's also worth noting that preventing TiVo's actions is the reason the GPLv3 exists.
[1] https://en.wikipedia.org/wiki/Tivoization [2] https://www.gnu.org/philosophy/tivoization.html
[1]: https://www.theverge.com/news/674366/nick-clegg-uk-ai-artist...
I wouldn't mind if an AI trained on old Disney movies (or new ones for that matter), but exploiting niches (like local newspapers) seems bad.
I mean, it seems to be okay to replicate information if it has been remixed with other information enough, but not okay if the remix was too little. But then again, there does not seem to be a clearly executable definition of where this line is.
There are good reasons why we have copyright (like protecting/fostering artists/education/entertainment in our modern society), but the concept itself is a bit weird and artificial after all.
AI models are a crystallization of human effort (available for free on Hugging Face)...
why not use it?
AI is like the UBI of intelligence; stop leaving free money on the table by refusing to use it locally (you can run it on a laptop CPU).
Selfishness, too. But if I follow the logic, and citations are added, how would one enforce a copyright claim if the creator is amorphous and all-knowing?
I love it! There's a great seed here for a short story about God being sued by a peer of his for copying some of her physical constants and not putting a proper copyright notice about it in our universe.
Now back to prompting, telling my all-knowing to create new slop, good sir.
You can get away with quite a lot if you’re creating trillions in GDP.
That’s just the world we live in whether we like it or not.
The whole AI bubble is The Emperor's New Clothes, and it feels like more people are finally admitting it.
People copying through GenAI would have done so before if they had a tool that so easily allowed them that facility.
It has always been possible to take someone's public work, put a twist on it, and then sell it as unique. (I'm not making a moral/ethical argument, only a legal one.) I have yet to see any evidence that LLMs are fundamentally different from that approach.
Is AI plural or is that a typo?
(For those not familiar: https://en.wikipedia.org/wiki/Bushism)
"The AI are attacking!"
"The AIs are attacking!"
What would it mean for authors who publish content publicly to the web, without access restrictions, to provide consent for learning from it?
"EULA: Most people are allowed to learn from this text. If you work in an AI-related field, even though you can clearly see this page because you are reading this text right now, you are not permitted to learn anything from it. Bob Stanton, you are an a-hole. I do not consent to you learning from this web page. Dave Simmons, you are annoying. But, I'll give you a pass. For now... Also: plumbers. I do not like plumbers for reasons I will not elaborate. No plumbers may learn from my writing in any way."
That leaves two possibilities: either another AI winter comes as people fail to capture long term value, or we get less swampy models that are much more useful and trained the correct way.
Currently politicians don't understand this and listen to the criminals like Amodei, but it will change.
It took a while to deal with Napster etc., but the backlash will come.
Napster broke down record companies' monopolies on music, pushed them to finally implement streaming, and also made music worldwide basically free.
Even if its creator lost the lawsuit and Napster was no more, it pushed musicians and studios to do something they were otherwise reluctant to do.
So it was a success by making music free, even if as a product it turned out to be a failed one.
Can't recall the last time a compelling argument started out like this
I think there are real questions around motivations for creation of novel, high quality valuable content (I think they still exist but move to indirect monetization for some content and paywalls for high value materials).
I don't inherently have any problems with agents (or humans) ingesting content and using it in work product. I think we just need to accept that the landscape is changing and ensure we think through the reasons why and how content is created and monetized.
The only remotely credible position I’ve heard is “because humans are special, and AI is just a machine”, which is a doctrine but not an argument.
This whole discussion would have been incomprehensible any time before 1700 or so, when the idea that creators had exclusive rights to their work first appeared.
Somehow, human culture survived thousands of years when people just made things, copied things, iterated on others’ ideas. And now many of the same people who decried perpetual copyright are somehow railing against a frequently-transformative use.
To be fair there is also value (at least for now) in sites that aggregate quality content and republish as a secondary level of discovery if my agents don't go far enough down the search results, but I'd expect that value to diminish over time as I better tune my research and build my lists of originating authors.
And to be clear, I don't like the idea of people stealing someone else's content and republishing it without attribution (although it was going on long before ChatGPT), but I think that now that we can all run agentic research teams, the "bad actors" will slowly get filtered out of the ecosystem.
We also have societal norms around plagiarism.
Additionally, the claim that because people have the right to do something then we should extend that right to machines is strong. (And one I certainly reject).
IP should either exist for everyone (which would cripple LLM providers) or no one, in which case the Pirate Bay and shadow libraries should be fully open.
Artists are taking risks and need legal protection if they want to make art for a living. If artists were making FAANG-engineer compensation or all worked at institutions like universities (with all their protections), then maybe they wouldn't care about copyright, but that isn't the living situation for every artist.
You could say an artist shouldn't rely on making art for a living, but that's actually a different discussion.
And what should AI companies be fined for downloading the entire internet?
Don't forget OpenAI has been caught several times training their models on copyrighted material.
So, funnily enough, Google's search index may actually have a preference for LLM-generated slop now. Louis Rossmann found this out the hard way: his human-authored, human-written, actually-in-his-own-words site for his business basically stopped ranking in Google until he went and replaced all his writing with LLM slop. He's not happy with this, but he's even less happy about being cut off from traffic his business needs to survive, so he stuck with the slop (and vocally complains about it on other channels every opportunity he gets).
Of course, if you quote a paragraph in a book, you're generally expected to attribute it.
100% agreed.
>>While there are no hard boundaries (and the attribution guardrails depend on the situation), people of course loosely--and even not so loosely--use information.
Exactly. I have not seen LLMs attributing their knowledge unless it's a legal or health-related matter. Yesterday I asked the question[1] to Claude and Gemini, and they both gave an identical answer. It reminded me of the Hive mind paper, which was one of the top papers at NeurIPS. None of the answers contained any sources or attribution for where they got that information. I think these companies took what was someone else's property and created an artifact generator on top of it. I think their artifact generators are plagiarizing; they do rephrase, mind you, but in my mind they stole this information without having an ounce of regard for the humans behind the training data. If you don't like the term 'plagiarizing', we can use some other word, but the gist remains pretty close to it.
[1]- In human history - has there ever been a time when private armies or private companies were as strong or stronger than the ruling government/kings?
If you prefix the name of OpenAI's commercial offering's website to this string: "share/6a0f2a87-dba4-8328-a704-89b94fd0c121", you'll find an answer.
I don't know who you had in mind, how did it do?
All the elision is because there are filters to prevent low-effort slop-poasting, and I'm trying to evade them, hopefully while staying within the spirit of the site.
The current US government is not representative for governments out there in the world, you know.
Governments: I did not mean the US government. I meant government bodies in general. I have not seen any critical impact assessments of AI by any of these, or they haven't reached me yet. If you know of any, please let me know. I have, however, seen a lot of support from governments for AI companies.
AI is not a plagiarism engine. It can be used that way, but is not inherently so. It is not necessary that a trained LLM be able to faithfully reproduce every document in its training set. The entire structure of an LLM is not storage, but at least in principle, generalization: extraction of a somewhat abstracted "structure" of semantically similar "concepts".
But we also need to talk about authors' "rights". It's well-established that reproducing a work is infringement. There is a lot of caselaw about how much may be reproduced without infringement. But the idea that an author should be consulted before ANY automated use of their published (public) text? No, just no.
This has been happening since Google launched in 1998. It was probably happening when we all used Hotbot and Altavista. It isn't really an AI problem, save for the fact that the automated production of copycat articles now reword things a bit.
Start by legally compelling companies that trained on unlicensed data to either (1) license the data, (2) publish their model, or (3) destroy their model.
You are lost in an imaginary world where everything is simple and has no negative consequences. First off, there is NOBODY who has that power over all the companies in the world. So immediately you are creating an imbalance between companies and potentially destroying your domestic industry, with long-term negative consequences for the people you're supposed to be protecting. Secondly, you might be creating a situation where it's impossible to ever create a competitor to those companies who are already entrenched monopolists, potentially even making it impossible to ever run self-trained or local LLMs. Also, you just unilaterally made it legal to publish all copyrighted work (since that's what you believe their model to be) to the general public, presumably in a way that can be used by everyone, further eroding copyright law in one fell swoop. You've completely disregarded the legal issues around what constitutes "unlicensed data", how much is required before triggering your new law, and what that would mean for the legal system potentially being inundated. You're reacting way too emotionally and flippantly, with no apparent thought about what harm you are doing and how you might actually be making things worse, not better.
You seem to believe advancement only happens in the private sector while ignoring academic institutions and publicly funded research. You've dismissed the possibility of public models entirely.
You fail to consider that when you financially disincentivize individual creators from publicly distributing their work, you starve future models, resulting in a world where data is licensed only to those who can afford it anyway.
[1] https://openai.com/index/disney-sora-agreement/
[2] https://openai.com/index/axel-springer-partnership/
[3] https://openai.com/index/openai-and-reddit-partnership/
Simple. Free the companies from copyright liability, but after X amount of time they are required to release everything into the commons. The weights, the training scripts and the full training data (appropriately processed so that it can only be used for training and not for people to easily pirate whatever works were used). They'd still get a monopoly on their model for a little bit to recoup their training costs, but in the end would be forced to give back what they took.
0: https://arxiv.org/abs/2601.02671
There's absolutely nothing new or interesting here that hasn't already been said better by a thousand different random HN commenters.
The argument, as I understand it, is that the "theft" is in quotes because it's not literally copyright infringement, but fair use of an old public-domain folk tale that ends up consuming the latter.
Today, when kids know "Aladdin" they know the copyrighted/trademarked Disney character, not the traditional folk tale; that's the "theft" that happened.
There's even a major Chinese company named after one!
We also had Grimm's fairy tales, which I loved reading, and nowadays am reading to my daughter, to her delight. Yes, with beheadings and child-eating monsters and witches.
He says:
> ... this corporate remake is a worse creative "theft" than ...
Context is that "this" is the 1999 film.
A sibling comment makes a separate point that even the 1992 film is not original content but nowhere in falcor84's comment does he refer to the franchise as a whole being "theft".
Regardless, it's clear from the post that the context is the 1999 film being `creative "theft"` which I inferred meant they changed the story in ways he didn't like but... he can weigh in if he feels like it.
That's not the uncharitable part of your comment.
> [...] but he wants it to be because he doesn't like the 1999 film
This is the uncharitable part.
Keeping context confined to the 1999 and 1992 films... What meaning do you infer?
I still can't find an alternative.
Disney made a cartoon of the story without understanding the culture it comes from, with the main purpose of selling it to an audience with even less understanding. And the result was a horrible misrepresentation of somebody else’s cultural heritage.
The argument is that a human will gather information from all over the place and compile it, all without doing anything wrong. That's the base claim. Not that stealing a little is OK. That's extremely easy to disprove and also entirely irrelevant.
https://en.wikipedia.org/wiki/The_death_of_one_man_is_a_trag...
The person absolutely does have the advantage of having empirical awareness and the ability to test their conclusions against external reality. But lots of people do engage in "research" and build mental models of various topics with little or no empirical context, and rely mainly on digesting calcified knowledge from other people.
(We can even observe this in the resulting text: we immediately grasp the level of competence of the author, just by the way they take their path through and at the matter. With LLMs, well, there's this even temperature, ready-made feeling, regulated by probability thresholds and RLHF-sanctioned phrasing, also known as "slop" – even rhythmic intensifications, like "not this, not that, but…", which is actually a figure for a synthetic construct, don't help –, since the text isn't the trace or product of an actual organized thought – or, at least, an attempt at an organized thought.)
PS: "empirical a priori judgement" was meant as translation of synthetisches Urteil a priori (Kant). I.e., our ability to mentally prove concepts like congruency, which are not a priori, but can be inferred without regression to empirical knowledge. Typically, this requires both our inner sense (time, sequence, etc.) and outer senses (space, configuration, etc.)
Drawing different sources of information together into a single understanding is quite literally the definition of "synthesis" in this context. If that process is what you're referring to as "re-sequencing content", then it does fit the definition of "synthesis" in this discussion.
If you're using the phrase "re-sequencing content" as a way of indirectly suggesting that LLMs aren't relating together multiple sources of information and combining them into a single expression, then that itself is the point of contention that we are arguing about.
Perhaps you're trying to apply a philosophical concept of synthesis, e.g. that of Fichte or Hegel, but that definition applies to a specific type of philosophical analysis, and isn't quite the concept we're using in this discussion.
The very purpose of text is to transfer meaning, concepts, observations and complex thoughts to human readers for them to process. And we have built a complex framework around this and for this. The fact that many feel that this framework is violated should hint at there being a problem, a conceptual discrepancy. (Even if it's just that there's a man-in-the-middle without authorship, standing between me as an author and those receiving what remains of the text. In its essential lack of agency, it's less a mediating recommendation and more an appropriation. But maybe, if we're talking about a slip into a new dogmatic slumber, manufactured by an unseen authority that has neither authority nor the position of an author, the problem goes deeper than this. And maybe the masquerading of LLM output as human communication and phrasing is part of the problem.)
Aggregating information, extracting underlying concepts, and combining those concepts into a unified expression is indeed the vernacular meaning of "synthesis" applicable to this discussion.
"Emulgating" is not a conventional English word. Is it a misspelling of "emulating"? I ask because using the term "emulating" here would again represent an instance of question begging, i.e. implicitly asserting the position that what's being discussed is merely the paraphrasing of singularly sourced information, and not the unification of concepts expressed in multiple sources, which I again believe is the very thing we are debating.
> And we have built a complex framework around this and for this. The fact that many feel that this framework is violated should hint at there being a problem, a conceptual discrepancy.
I don't think there necessarily is a problem or conceptual discrepancy here, any more than there has been for all of the centuries that people have been debating epistemology. The problem here is the same as for humans, and reduces to a rationalism vs. empiricism debate. AI tools are pure rationalists, and are solely capable of reasoning. However, many people behave this way as well, and exhibit a rationalist epistemology, even having emotional entanglements with their axioms to the point that they'll bend over backwards to reject evidence that falsifies empirical conclusions drawn from those axioms.
My biggest fear from AI is not that it isn't capable of inductive reasoning -- that's all it's capable of, as I see it -- but rather that the fact that its reasoning has no empirical anchor will lead people who are mired in rationalist epistemology to accept its conclusions uncritically.
In other words, the danger doesn't come from the fact that AI has no semantic awareness, but that people using it aren't seeking semantic validation in the first place, which is a problem already pervasive in our society.
> AI tools are pure rationalists, and are solely capable of reasoning.
Mind that the world isn't in the language, nor is our connection with the world. (We have known this for about 120 years, since the referent was expelled from linguistics.) Which brings us back to the synthetic judgement a priori… You may emulate this, as a superficial trait drawn from other traces of communication, but that's not what this is all about. E.g., I wouldn't expect true "lateral thinking" from LLM output.
> My biggest fear from AI…
I'd add to this: it's not just about empirical vs. rationalist epistemology, it's also about empathy, anything referring to the conditio humana, which is really what any text is about, even a scientific one (why is it that we want to know at all, what are the motivations, the regulating circumstances, etc.?).
Apparently yes.
HN is way too central for shared sentiment in the tech world for these companies not to do some amount of astroturfing. AI companies have shown at every single turn that they act out of self-interest and greed, not of moral principles. So it isn't surprising, even if it is still sad, to see those who are commanding the most capital in human history act with such callousness.
I think the appropriate course of response is to stop adding to public spaces on the internet. No doubt painful for those of us who have so benefitted from the freely shared thoughts of others. But if well-funded bullies are going to come in, steal everything, ruin the commons, and then say "this is the new normal, deal with it", there isn't much the rest of us can do other than stop feeding them.
"Good artists copy, great artists steal."
It's always been true. AI just makes it available to more people faster.
Don't make it an ethical question; understand it as a new frontier for humans.
100%: creators should get compensated by AI platforms for their work.
Further, I can see a day when someone like Reddit will close off their data or license it to LLMs. No doubt they are losing traffic right now.
Reddit does not create the content on their site, the users do.
If anybody’s going to get compensated for that content, it should be the users, not Reddit. Complaining that Reddit is losing out on the monetization of their users’ output seems problematic to me. It feels like shilling for a pimp.
As someone who thinks humanity would be better off without LLMs, I want the assertion to be true, but I don't think it is.
There were people who learned from me and then made their own tutorials and promoted them. It hadn't crossed my mind to complain about that. AI changes very little here.
What really changes things is not people republishing my materials, but people using agents to read my materials, and to get knowledge reformatted into something that they like.
If my slides were published today, they would probably be read verbatim by a handful of humans. The rest would be agents, but I'm OK with that. The business case is the same: I want whatever reads the slides to be encouraged to use my tool. What kind of entity it is, I don't really care (again: from a purely business perspective).
I think the long-term reality is that the models still need training data, so they fundamentally do need new writing/code/art to train on, and even then the usual issues like hallucination will still be with us. It's just that by the time this actually hurts the (already questionable) profitability of the model peddlers, they will have gotten their IPOs, can safely jump ship, and can pass the ultimate mess to the SoftBanks, the Temaseks, and the governments of the world to clean up for them. What the future holds after the crash I'm not sure, as the models won't disappear (especially now that the stolen data is already crystallised in open-source models). But in the near term, the mass theft that LLMs are built on will become more and more understood, even amongst the PMC, as will the fact that in order to remain viable, you need the productive to keep producing, and unlike LLMs, you can't force them to do it without payment.
We built it, because we as humans intrinsically know that information should be free - always - and AI is a way to accomplish this, finally.
Extrinsically, we also have a subset of humans who do not want information to be free, because they desire to profit from the divide between free/non-free information.
I have been thinking a lot about Aaron Swartz lately, and how unjust it is that he was persecuted for doing something that is so commonplace now that it is practically expected behaviour in the AI/ML realms. If he hadn't been targeted for elimination, I wonder just how well his ethos would have carried over into the AI age...
It's the negative short-term outlook of something that may be positive in the long term.
But the short-term impacts here and now are really, really bad. People are getting hurt (through water consumption, vibe-coded security disasters, IP theft, data center pollution, loss of job security and therefore healthcare in the US, LLM psychosis, inability to find reliable information, etc.) We're not actually obligated to sacrifice these people on the altar of "progress". We can slow down! When our society is capable of even somewhat protecting us from these harms, then maybe I'll stop being an LLM hater.
But guess what, it has always been so with technology - and we are only here and now because the positive use of it overshadows the negative use of it, whether that 'it' is the wheel, or AI.
I choose not to be an LLM hater, but to also not be an LLM customer - simply because I do not want to reward other humans who are thwarting the freedom of information. I'd much rather live in a society where everyone can study anything than one which requires permission to do anything even remotely interesting from the perspective of applied information. I suspect most would too, or at least that's the hope - because, otherwise, the distant utopia you dream of isn't of any consequence...
This is not some altruistic entity striving for the betterment of humankind. Practically nothing that comes out of the techbro culture is. This is pure and simple greed and the chances that AI can be a vehicle of altruism when it is owned by megacorps is basically zero.
All the other reasons are rationalizations. The fact that it's hitting wages is what's causing the doomerism (and boosterism).
Information should be as free as possible, but creating and organizing information still needs to be encouraged, and people need money to live.
How do we meet all these goals?
My liberal ass is just like... tax the rich, solve all the problems. But it's likely not quite that simple.
I don't know if this statement is more stupid or naive...
If humans didn't want information to be free, there wouldn't be so much free information.
Or did you not notice?
(AI output is very much not free in the resource consumption sense!)
(Disclaimer: I only use free AI and will never pay for it. I think there is a growing segment of folks who agree with this sentiment, also ..)
People want to be recognised for their contributions to society. People want to be treated fairly. Most scientific articles, as well as all text on the free web is already free information. It used to be difficult to search, categorise and summarise that information. There exist AI tools for that — and that is the good AI.
What also exists now are automated plagiarism and mash-up tools: that can take someone's article, change the words and churn out a new article that people can put their name on. There are scumbags that sell services for exactly that. And there are big tech firms that are operating in a very grey area.
Aaron Swartz had broken a paywall. He did not anonymise the article authors.
You, and AI bros like you, remind me of one of the people behind Pirate Bay when I argued with him back in the '90s, who used that same "information wants to be free" line to justify software piracy.
The way people use the model is a different story; you can use it to do useful things, or you can use it to do harmful things. We should obviously have some regulation around that. That needs to be developed still I guess.
>Aaron Swartz had broken a paywall. He did not anonymise the article authors.
AI bros are doing this now, every second of the day.
And, without software piracy, we simply wouldn't have the technology we have today. Knowledge-gatekeeping profit-seekers would very much like for most of us to ignore this fact: there is far more free information in the world than non-free information, and it must be so, well into the future, if we are to survive as a species.
It doesn't matter what authority believes they have the right to gatekeep information. It will always escape their grip. Some of us are ideologically aligned with this mechanism, promote it, and ensure it happens. Thank FNORD.