modulus

modulus@lemmy.ml · 5 days ago

One of the things you’re missing is that the same techniques are applicable to multimodality. They’ve already released a multimodal model: https://seekingalpha.com/news/4398945-deepseek-releases-open-source-ai-multimodal-model-janus-pro-7b

modulus@lemmy.ml · 8 days ago

Advertising, cryptocoin shit, pay to play… This is an awful idea.

modulus@lemmy.ml · 9 months ago

Definitely, AP is not magic. But if full-fidelity round-tripping is impossible or very difficult even within one protocol, it will only be harder and less likely through a bridge.

modulus@lemmy.ml · 9 months ago

IMO bridging or translation isn’t federation per se. Also it seems unlikely that protocols would converge to that extent. In fact AP implementations are already different enough that even within the same protocol it’s hard to represent all the different activities instances can present.

modulus@lemmy.ml · 9 months ago

I wouldn’t really count Mastodon/Bluesky bridging as federation. They’re incompatible protocols that were never intended to work together (arguably Bluesky was explicitly designed to avoid using AP).

modulus@lemmy.ml · 1 year ago

So, not super sure what this is or how this works. Is the idea that you run the cgi, it sets up static files, and it responds to AP requests like follows, mentions, boosts and such? I realise lots of people don’t like long docs but I didn’t really understand the use case very well.

modulus@lemmy.ml · 1 year ago

On my instance, the following control measures apply:

Only public posts are visible through the web interface.
Only public posts appear on RSS.
Following requires approval.
Authorised fetch is required.

So I think I have reason to feel fairly strongly that follower only posts are not public, and even unlisted posts are reasonably restricted.

modulus@lemmy.ml · 1 year ago

Not saying this won’t have any negative effects on people, but I think it’s a little premature to guess at what it will be like. About 3/4 of the article comments on what it will do to men, and we only find out at the end that women are the majority of users.

modulus@lemmy.ml · 1 year ago

As far as I can tell, this is incorrect. If there’s a post on instance A, a reply from instance B, and someone on instance C follows the OP on A but not the RP on B, they will only see the OP without the reply.

Source: I very often notice this because I run a single-user instance, and when I open a thread it’s incomplete, lacking posts from instances that I have not suspended.
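To make the A/B/C case concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: a post is stored on its home instance and on any instance with a follower of the author, a reply is also delivered to the instance of the post it replies to, and the replier's instance has a copy of the parent. The function name `stored_posts` and the dict layout are made up for illustration; real implementations do more (relays, on-demand fetching, and so on).

```python
from collections import defaultdict

def stored_posts(posts, follows):
    """Return {instance: set of post ids that instance holds a copy of}.

    posts:   list of dicts with keys id, author, home, reply_to (id or None)
    follows: {instance: set of authors followed by someone on that instance}
    """
    homes = {p["id"]: p["home"] for p in posts}
    stored = defaultdict(set)
    for p in posts:
        stored[p["home"]].add(p["id"])          # home instance always has it
        for instance, followed in follows.items():
            if p["author"] in followed:         # delivered to followers' instances
                stored[instance].add(p["id"])
        parent = p["reply_to"]
        if parent is not None:
            stored[homes[parent]].add(p["id"])  # reply reaches the OP's instance
            stored[p["home"]].add(parent)       # replier's instance has the OP
    return dict(stored)

posts = [
    {"id": "op", "author": "alice@A", "home": "A", "reply_to": None},
    {"id": "reply", "author": "bob@B", "home": "B", "reply_to": "op"},
]
follows = {"C": {"alice@A"}}  # someone on C follows the OP but not the replier

view = stored_posts(posts, follows)
```

In this model `view["C"]` contains only the original post, while `view["A"]` has both the post and the reply, which is the incomplete-thread effect described above.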

modulus@lemmy.ml · 1 year ago

The biggest issues for me are:

No centralisation means there’s no canonical single source of truth.
Account migration.
Implementation compatibility.

No single source of truth leads to the weird effect that if you check a post on your instance, it will have different replies from those on a different instance. Only the original instance where it got posted will have a complete reply set–and only if there are no suspensions involved. Some of this is fixable in principle, but there are technical obstacles.

Account migration is possible, but migration of posts and follows is non-trivial. Also, migration between different implementations is usually not possible. It would be nice if people could keep a distinction between their instance and their identity, so that the identity could refer to their own domain, for example.

Last, the issue with implementation compatibility. Ideally it should be possible to use the same account to access different services, and to some extent it works (mastodon can post replies to lemmy or upvote, but not downvote, for example).

modulus@lemmy.ml · 1 year ago

Well, in a way that’s what we’re doing now, and by and large it works, but obviously there’s some leakage, which is impossible to bring down to zero but is worth working to reduce.

The other side of the coin is that the price of this moderation model is subjecting a lot more people to a lot more horrible shit, and I unfortunately don’t know any way around that.

modulus@lemmy.ml · 1 year ago

Perhaps the manual reporting tool is enough? Then that content can be forwarded to the central ms service. I wonder if that API can report back to say whether it is positive.

The problem with a lot of this tooling is you need some sort of accreditation to use it, because it somewhat relies on security through obscurity. As far as I know you can’t just hit MS’s servers and ask “is this CSAM?” If something like that were possible it might work.

Can you elaborate on the hash problem?

Sure. When you have an image, you can do lots of things to it that change it in some way: change the compression, the format, crop it, apply a filter… All of this changes the file, and so it changes an ordinary file hash. A perceptual hash system is based on computer vision techniques, and the idea is that it tries to generate the same hash for pictures that are substantially the same. But this tech is imperfect and the algorithm will probably change over time. If the way the hash gets calculated changes, keeping only the hashes wouldn’t be enough: you’d have to keep the original files to recalculate them, which means storing CSAM, which is ordinarily not allowed, and for good reason.
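A rough illustration of the difference, in Python. This uses a simple "average hash" over a tiny grayscale grid as a stand-in; it is not the algorithm any real detection service uses, and the pixel values and the function names (`average_hash`, `file_hash`, `hamming`) are made up for the example.

```python
import hashlib

def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, 1 if brighter than the mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def file_hash(pixels):
    """Ordinary cryptographic hash of the raw pixel bytes."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()

original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
# Simulate a mild filter: brighten every pixel a little.
brightened = [[min(255, v + 3) for v in row] for row in original]

same_perceptual = average_hash(original) == average_hash(brightened)
same_file_hash = file_hash(original) == file_hash(brightened)
distance = hamming(average_hash(original), average_hash(brightened))
```

Here the SHA-256 digests differ while the toy perceptual hash stays identical. The flip side is the one mentioned above: swap `average_hash` for a different algorithm and the stored values are no longer comparable unless you can recompute them from the originals.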

For a hint on how bad these hashes can get, they are reversible, vulnerable to pre-image attacks, and so on.

Some of this is probably inevitable in this type of system. You don’t want to make it easy for someone to hit the servers with a large number of hashes, and then use IPFS or BitTorrent DHT to retrieve positives (you’d be helping people get CSAM). The problem is hard.

Personally I was thinking of generating a federated set based on user reporting. Perhaps enhanced by checking with the central service as mentioned above. This db can then be synced with trusted instances.

Something like that could work, maybe obscuring some of the hash content (random parts of it) so that it doesn’t become a way to actually find the stuff.
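One possible reading of "obscuring random parts" is sketched below in Python. This is purely an assumption about how it might look, not a vetted design: each shared entry hides a random subset of bit positions and publishes only the remaining bits plus the mask, and a receiving instance checks candidates against the visible bits. The names `obscure` and `may_match`, the 64-bit size, and the choice of 16 hidden bits are all hypothetical.

```python
import random

def obscure(hash_value, bits=64, hidden=16, rng=None):
    """Hide `hidden` randomly chosen bit positions of a hash.

    Returns (visible_bits, mask), where mask has 1s at the positions kept.
    """
    rng = rng or random.Random()
    mask = (1 << bits) - 1
    for pos in rng.sample(range(bits), hidden):
        mask &= ~(1 << pos)            # clear the hidden position in the mask
    return hash_value & mask, mask

def may_match(candidate, published):
    """True if the candidate agrees with the entry on every visible bit."""
    visible, mask = published
    return (candidate & mask) == visible

full_hash = 0x0123456789ABCDEF                    # made-up 64-bit hash
entry = obscure(full_hash, rng=random.Random(0))  # seeded only for repeatability
hit = may_match(full_hash, entry)
```

The trade-off is that a hit becomes a "maybe": anything that differs only in the hidden bits also matches, so positives would still need review or confirmation by whoever holds the full hash.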

Whatever decisions are made have to be well thought through so as not to make the problem worse.

modulus@lemmy.ml · 1 year ago

IMO the hardest part is the legal side, and in fact I’m not very clear how MS skirted that issue other than through lax US enforcement on corporations. In order to have a db like this, one must store stuff that is, ordinarily, illegal to store. Because of the use of imperfect, so-called perceptual hashes, and in case of algorithm updates, I don’t think one can get away with simply storing the hash of the file. Some kind of computer vision/AI-ish solution might work out, but I wouldn’t want to be the person compiling that training set…