• 0 Posts
  • 46 Comments
Joined 2 years ago
Cake day: June 9th, 2023


  • The whole reason that Google exists today is that their PageRank algorithm was a great way to identify good content. At its most basic, it worked by counting the number of pages that linked to a certain page. More incoming links meant the page was more useful. It didn’t matter how many relevant search terms you stuffed into your page. What mattered were votes from other people, expressed in the form of linking to your page.

    But, that algorithm failed for 2 reasons. One is that it became cheaper and easier to put up sites that linked to sites you wanted to promote. The other was that people stopped blogging on their own blogs, and stopped creating their own websites, and instead used walled gardens like Facebook, Twitter, Reddit, etc. That meant it was hard to measure links back to a site, and that it was easier to create fake links.

    So, now it’s a constant war of SEO people vs. Google Search Quality people, and the Google people are losing. Sometimes there are brief victories for Google which result in good Reddit results appearing higher up. Then the SEO people catch up and pollute Reddit, push Reddit links off the first page, or both.

    It would all be really depressing even if it weren’t for generative AI being used to pollute everything. With LLMs coming in and vomiting their content all over everything, we might be forced back to the bad old days of Yahoo, where individual humans curated lists of good things and 99% of content was invisible.
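The PageRank description above is deliberately simplified. As originally published, PageRank was more than a raw count: each page’s vote is weighted by that page’s own rank and split evenly across everything it links to. A minimal Python sketch contrasting the two, assuming a hypothetical four-page link graph and the commonly cited damping factor of 0.85 (nothing here reflects what Google actually runs today):

```python
from collections import Counter

# Hypothetical four-page web: each key links to the pages in its list.
EXAMPLE_LINKS: dict[str, list[str]] = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}


def inlink_counts(links: dict[str, list[str]]) -> Counter:
    """Naive 'votes': how many distinct pages link to each page."""
    counts = Counter({page: 0 for page in links})
    for targets in links.values():
        for target in set(targets):  # a page votes at most once per target
            counts[target] += 1
    return counts


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Power-iteration PageRank: a vote is worth the voter's own rank,
    split evenly across the distinct pages the voter links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the same small baseline ("random jump") share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            unique = set(targets)
            if unique:
                share = damping * rank[page] / len(unique)
                for target in unique:
                    new_rank[target] += share
            else:
                # Page with no outgoing links: spread its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    print(dict(inlink_counts(EXAMPLE_LINKS)))
    for page, score in sorted(pagerank(EXAMPLE_LINKS).items(),
                              key=lambda item: item[1], reverse=True):
        print(f"{page:>6}: {score:.3f}")
```

On this toy graph both measures put `home` first. The difference is that under PageRank a link from a well-linked page counts for more than a link from a page nobody links to: `orphan` only ever has its baseline share to pass along.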



  • We need to talk about data as a physical object.

    We need to admit that it isn’t and that that’s a terrible metaphor.

    It’s still saved on disks somewhere, whether they’re a traditional HDD or a modern SSD.

    Yes, often multiple copies are saved. Sometimes it is aggregated with other data, sometimes not. Making a new copy is insanely cheap, and, under the hood, even when just moving the data from the hard drive to the computer’s memory, a copy is made automatically. There’s no way to avoid copying the data.

    But, to make it clear, “data” is basically “ideas”, and you can’t really treat ideas as objects. For thousands of years the idea that you could control ideas was ridiculous. You could control the physical object that an idea was expressed on, but if someone took their own time and copied it, that was a new object and the person who made the original had no claim on it.

    Copyright, and its evil friends, is a relatively new concept where the government grants a temporary monopoly on the expression of an idea. Stealing the physical object on which the idea is printed is one thing. But, now you can get in trouble for “stealing” the idea itself. That’s what you’re really talking about with stealing “data”: what you’re supposedly “stealing” is information.

    But, of course it’s not theft. When you copy an idea without permission, the person with the original doesn’t lose it, they just lose control over a copy of that information.

    Treating ideas, data, etc. as physical objects just never works because ideas can be copied without the original person losing anything. This is different from physical objects where my taking it necessarily means that you no longer have it.

    In other words, data was always as physical as words on the page of a book.

    Not at all, because each copy of a book is its own physical object. Copying a book is difficult and requires a printing press. Even a low-fidelity copy like a photocopy requires a photocopy machine, ink and paper. Copying data is essentially free. When copying a book required a printing press, you could sort of pretend that ideas were objects because copying was so burdensome. But, with digital data it’s clearly ridiculous. That doesn’t mean you can’t have laws about data (i.e. information), it just means that those laws are going to have to be completely different from laws about physical objects.

    Why did we accept the change in how ownership worked simply because of a change of storage medium?

    Because copying is essentially free. It’s no longer an object, it’s information.

    But, having said that, the storage medium isn’t a major issue. The real question is when did people start accepting that you could treat ideas as objects. Stealing a book out of someone’s backpack and photocopying a book are completely different crimes. In one case, the person no longer has the object. In the second case, they still have it, but they don’t have control over the copies of it.

    Talking about data as if it’s an object or something you can own is a red herring. The real issue is privacy.

    For instance, say you use a period tracker app owned by a non-profit that is trying to use the data to better understand women’s hormone changes so that they can get better medical care. Great! OK, now what happens if that non-profit goes bankrupt and, as part of the bankruptcy proceedings, sells its data to Meta or Google so that it can afford to make payroll? Well shit, your data is now owned by them, and you’re out of luck.

    A privacy rule handles that situation better. You can give the company access to your private data, and then revoke that access later. If your data is something they own, they can use it however they like. But, if you own your own privacy, it doesn’t matter if the period tracker app gets bought out or goes bankrupt or whatever. The data they have isn’t something they own and can sell, it’s private data that they had temporary access to.


  • They don’t have a monopoly like any of their competition that will easily sustain them.

    Erm, you think Bing is a serious competitor? Aside from search (91.54% of the global search market), Google is part of an ads duopoly that is only held in check by walled gardens like Amazon, TikTok, Wal*Mart, and the various entertainment companies. There’s also Google Maps, used by 77% of users between 16 and 64, and its biggest non-iOS competitor is Waze, which Google also owns. For email, Gmail dominates with 75% of the US email market. As for the user-generated media market, YouTube absolutely dominates that. The closest competitor (Twitch, A.K.A. Amazon) is far behind.

    As for what Google engineers do, it’s mostly not rip-things-up-and-start-over innovation since these are all very mature markets with billions of users. Instead it’s small tweaks that generate hundreds of millions in savings or additional revenue.



  • Which is why the only systems that have ever worked are mixed systems that account for human nature.

    A 100% democratic system would have problems because nobody would have any experience or expertise, so people would govern based on ignorance. A 100% communist system doesn’t work because we don’t have a fair system to allocate resources, and as soon as someone is put in charge of allocating resources, they allocate more for themselves. Even 100% authoritarian systems don’t work because a dictator has to sleep sometime. There may be a figurehead / leader in an authoritarian system, but unless that person delegates some power and control, they’ll be killed and replaced pretty quickly.


  • You’re just redefining the word to make it meaningless.

    You could argue that everything is actually anarchy because there are no “god given” or evolutionarily required hierarchies. You could argue that everything is authoritarian because as soon as two people come in contact there’s a hierarchy established and one person has power over the other. You could argue that everything is democratic or communist, because in any system that doesn’t result in everyone killing everyone else, people make agreements with each other.

    The actual definition of anarchy is really based on how it appears and functions. If nobody is functioning as a leader and there’s no obvious hierarchy, it could be described as anarchy.


  • So if a computer synthesizing Shakespeare is stealing

    Copyright infringement is never stealing. But, as to whether it’s infringing copyright, the difference is that current laws were designed based on human capabilities. If memorizing hundreds of books word for word was a typical human ability, copyright would probably look very different. Instead, normal humans are only capable of memorizing short passages, but they’re capable of spotting patterns, understanding rhythms, and so on.

    The human brain contains something like 100 billion neurons, and many of them are dedicated to things like hearing, seeing, eating, walking, sex, etc. Only a tiny fraction are available for a task like learning to write like Shakespeare or Stephen King. GPT-4 reportedly contains about 2 trillion parameters (OpenAI has never published the figure), and every one of them is dedicated to “writing”. So, we have to think differently about whether what it’s storing is “fair” when it comes to infringing someone’s copyright.

    Personally, I think copyright is currently more harmful than helpful, so I like that LLMs are challenging the system. OTOH, I can understand how it’s upsetting for an artist or a writer to see that SALAMI can reproduce their stuff almost exactly, or produce something in their style so well that it effectively makes them obsolete.


  • One could imagine a computer “thinking” of things the same “way” that we do.

    One can imagine it, but that’s been the impossible nut to crack ever since the first computers. People have been saying that artificial intelligence (what we now want to call AGI instead) was 5 years away since the 1970s, if not earlier.

    The new generative systems seem intelligent, but they’re just really good at predicting the next word. There’s no consciousness there. As good as LLMs are, they can’t plan for the future. They don’t have goals.

    The only interesting twist here is that consciousness / free will might not really exist, at least not in the form most people think of it. So, maybe LLMs are closer to being “thinking” computers not because they’re getting closer to consciousness / free will, but because we’re starting to realize free will was an illusion all along.


  • Generative AI is based on “predicting” and generating the next token. Tune it one way and it will regurgitate its training data exactly. Tune it the other way and the words it comes up with are nonsense. Tune it just right and it comes up with something that seems creative.

    The problem is that the training data is always in there somewhere. It can’t generate something in the style of Shakespeare without containing Shakespeare as reference. That’s probably fine for Shakespeare, which is out of copyright, but if it contains, say, Stephen King’s entire collected works, that’s another issue.

    If a human writer read all of Stephen King’s books then tried to write in the style of King, that would be OK, but that’s because a human can’t memorize everything King has written word-for-word. When a human reads King, they don’t build up a database of “probable next word frequency”. Instead, they build heuristics having to do with how he approaches dialogue, how he reveals character, how he builds tension, etc. They may remember one especially memorable line or two, but the bits they remember, even if written down word-for-word, would probably not be enough to infringe copyright on their own.

    I would bet that we’ve come too far to completely scrap generative AI. Too many billions have been invested, and the companies have too much political power. So, the question is whether there will be significant changes to copyright law. On one side of that fight will be the trillions of dollars behind the entertainment industry. On the other side of that fight will be the trillions of dollars behind the tech industry. Of course, individual artists will be trampled in the process.
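The “tune it one way / tune it the other way” point at the start of this comment most likely refers to the sampling temperature (an assumption; the comment doesn’t name the knob). A minimal Python sketch, using hypothetical next-word scores rather than a real model: the scores are divided by the temperature before a softmax turns them into probabilities, and the next word is drawn from that distribution.

```python
import math
import random

# Hypothetical scores ("logits") a model might assign to candidate next words.
EXAMPLE_LOGITS = {"be": 5.0, "see": 3.0, "dream": 2.5, "banana": 0.5}


def softmax_with_temperature(logits: dict[str, float],
                             temperature: float) -> dict[str, float]:
    """Turn raw scores into probabilities; temperature rescales the scores first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max to avoid overflow in exp()
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next_token(logits: dict[str, float], temperature: float,
                      rng: random.Random) -> str:
    """Draw one next token according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    for temp in (0.1, 1.0, 100.0):
        probs = softmax_with_temperature(EXAMPLE_LOGITS, temp)
        picks = [sample_next_token(EXAMPLE_LOGITS, temp, rng) for _ in range(10)]
        print(temp, {t: round(p, 3) for t, p in probs.items()}, picks)
```

At a very low temperature nearly all the probability lands on the single most likely word, which is roughly the “regurgitate” end of the dial; at a very high temperature the distribution flattens toward uniform, so `banana` becomes about as likely as `be`. Real systems usually combine temperature with other sampling controls, which this sketch leaves out.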




  • It’s really shitty that this trial is being kept secret. Even if it’s a fair trial, it sure doesn’t have the appearance of a fair trial. I guess Google would prefer the appearance of a corrupt trial if the alternative is embarrassing information getting out.

    Having said that, I really don’t get the issue with this:

    when Google executives used “history-off chats” to destroy conversations after 24 hours even after Google was on a litigation hold.

    You’re not allowed to destroy past chats / emails after you’ve been notified you’re on a litigation hold. That makes sense. You can’t shred any documents or delete any emails. But, this seems to be about current / future communications. It sounds like they started a history-off chat after the lawsuit started, and they may (or may not) have discussed things relevant to the case. AFAIK the default is history-off for chats within Google. So, they’d have had to specifically turn on history for any new chat.

    So, what does that mean? If they’re sued, do any current or future communications between executives have to be history-on communications in case something they say turns out to be related to the trial? Are they allowed to chat in person? If they do, is it mandatory that those chats be recorded and transcribed?

    If some communication is allowed to be off the record (say, a personal chat with someone), it seems weird to say “OK, but if you use a text-based program to chat, you have to keep transcripts of that chat and give them to us.”



  • I argue that it’s nothing special.

    I disagree given that as far as I know, Valve is the only company in the world that operates this way. I’ve worked plenty of places where the leadership talked about not having a hierarchy, but none of them could actually pull it off. When push came to shove, there were always bosses and those bosses had bosses, and decisions flowed down from the top. There are probably small communes where they’re able to make decisions using consensus, but Valve is a 1,000-person company that’s a key player in a major industry.


  • The way he runs the company (flat-hierarchy), it’s mostly self-governing

    Is it really? Or is it a dictatorship under Gabe, but he’s a benevolent dictator who very rarely uses his dictatorial powers? Are there any influential people at Valve who don’t share his vision? Or is he using his power to softly, maybe even unconsciously, ensure that everyone influential sees eye-to-eye with him?

    Don’t get me wrong, that’s a good thing. I like Valve and I like its leadership. But, I don’t think there’s any chance it would survive his death or his stepping down.


  • Yes, Apple locks you in, but Microsoft has always been worse. Apple’s options are proprietary, but Microsoft used their monopoly power to destroy their competitors like Netscape, and waged war against Linux. Meanwhile, Apple switched to a variation of NeXTSTEP, which is mostly compatible with Unix and the GNU tools.

    On the iPod / iPhone front, both Google and Apple lock users in to their app stores. Both manufacture un-repairable phones. The non-standard Lightning connector was a pain in the ass, but so was the frequent switching on Android phones from mini-B to micro-B to USB-C. And, until USB-C there was the constant problem of trying to plug in the phone and getting the orientation of the plug wrong, something Apple got right with Lightning.

    Then there’s advertising / surveillance. Google is an ad-tech company so privacy is never going to be high on their list of priorities for their end-users. Meanwhile, Apple led the way with App Tracking Transparency. Yes, Apple still surveils its users, but at least it doesn’t seem to use that data to rent eyeballs the way Google does.

    Google and Apple are both shitty companies, but if you want a modern smartphone you basically have to deal with one of them. Apple and Microsoft are both shitty companies but if you want a desktop or a laptop, without the constant toil of dealing with Linux, they’re your only options. So, it’s about what bothers you more: anticompetitive actions including embracing standards with the aim of destroying them from within, or annoying proprietary stuff? Planned obsolescence and an extreme aversion to fixability with somewhat less surveillance, or a slightly more open system with much more surveillance?

    Really, what’s needed is proper regulators who can rein in all these shitty companies.