In my experience code generation is most affected by the local context (i.e. the codebase you are working on). On top of that a lot of code is purely mechanical - code generally has to have a degree of novelty to be protected by copyright.
If you had a contributor that plagiarized at a 2-10%, would you really go “eh it has to have a degree of novelty to be a problem” rather than just ban them? The different standards baffle me sometimes.
Imagine how broken it would be otherwise. The first person to write a while loop in any given language would be the owner of it. Anyone else using the same concept would have to write an increasingly convoluted while loop with extra steps.
Where are you seeing the 2-10% figure?
In my experience code generation is most affected by the local context (i.e. the codebase you are working on). On top of that a lot of code is purely mechanical - code generally has to have a degree of novelty to be protected by copyright.
If you had a contributor that plagiarized at a 2-10%, would you really go “eh it has to have a degree of novelty to be a problem” rather than just ban them? The different standards baffle me sometimes.
You can find various rates mentioned here: https://dl.acm.org/doi/10.1145/3543507.3583199 and here: https://www.theatlantic.com/technology/2026/01/ai-memorization-research/685552/
Imagine how broken it would be otherwise. The first person to write a while loop in any given language would be the owner of it. Anyone else using the same concept would have to write an increasingly convoluted while loop with extra steps.
Sounds like an origin story for recursion.