• 1 Post
  • 134 Comments
Joined 3 years ago
Cake day: January 17th, 2022






  • Not a weird example. I have my self-hosted video server (PeerTube) and I tinkered with transcription thanks to whisper.cpp locally. It “works” in the sense that most of it is acceptable, but it still makes mistakes. I provide all my content, including hosting, at my own cost and to anyone in the world for free.

    So… I definitely see the value; a sketch of the kind of pipeline I use is below. I’m only saying that it has downsides and that, quality-wise, it’s still poor relative to professional captioning.
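    For the curious, a minimal sketch of that local captioning setup. The binary name (“whisper-cli”), model path and file names are assumptions to adjust to your own build; ffmpeg is used first because whisper.cpp expects 16 kHz mono WAV input:

```python
# Minimal sketch: caption a video with whisper.cpp before uploading to
# PeerTube. Binary name, model path and file names are assumptions.
import subprocess
from pathlib import Path

MODEL = "models/ggml-base.en.bin"  # assumed local whisper.cpp model file

def caption(video: Path) -> Path:
    """Extract audio, run whisper.cpp on it, return the .srt path."""
    wav = video.with_suffix(".wav")
    # Convert to the sample rate and channel count whisper.cpp expects.
    subprocess.run(["ffmpeg", "-y", "-i", str(video),
                    "-ar", "16000", "-ac", "1", str(wav)], check=True)
    # -osrt asks whisper.cpp to write SubRip captions next to the input.
    subprocess.run(["whisper-cli", "-m", MODEL, "-f", str(wav), "-osrt"],
                   check=True)
    return Path(str(wav) + ".srt")  # whisper.cpp appends .srt to the input name

print(caption(Path("talk.mp4")))
```

    The resulting .srt is what then gets uploaded as captions, mistakes included.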



  • Arguable… it’s OKish at best, definitely nowhere near as good as professional. Then IMHO it’s like spotting a spelling mistake in an official document: you instantly look for MORE mistakes, and it becomes distracting. There is something powerful about trust: once it’s broken, it’s hard to get back. Once a spelling, or here a transcription, mistake happens, we brace for more (rationally so) and it becomes a very taxing endeavor.

    So… sure, STT has progressed quite a bit, but it’s STILL not good enough in a lot of cases.

    Case in point, IMHO when there is a choice, most people (everybody?) would rather have human-made captions than AI ones.



  • Some apps are still built this way, e.g. Transmission, the BitTorrent client, but also ALL self-hosted Web apps. Sure, it might feel a bit much to install containers on your phone “just” for that, or to have to go through a REST API despite being on the same actual device, but it still gives you a TON of apps.

    Anyway, yes, I agree that it is often a better model. Still, a lot of apps, e.g. Blender, Inkscape, etc., do provide a CLI, so one can use them with a GUI or without. It’s not decoupled like Transmission, but arguably it covers most needs. A sketch of what the Transmission round-trip looks like is below.
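    To make the decoupling concrete, here is a minimal sketch of “going through the REST API despite being on the same device”, using Transmission’s documented RPC protocol; the default port (9091) and no authentication are assumed:

```python
# Minimal sketch of querying a local Transmission daemon over its RPC API.
import requests

URL = "http://localhost:9091/transmission/rpc"  # assumed default port, no auth

def rpc(method, arguments=None, session_id=""):
    """One RPC call, retrying once to pick up the CSRF session id."""
    resp = requests.post(URL,
                         json={"method": method, "arguments": arguments or {}},
                         headers={"X-Transmission-Session-Id": session_id})
    if resp.status_code == 409:  # daemon replies with the session id to use
        return rpc(method, arguments, resp.headers["X-Transmission-Session-Id"])
    resp.raise_for_status()
    return resp.json()

# List known torrents with their download progress.
result = rpc("torrent-get", {"fields": ["name", "percentDone"]})
for t in result["arguments"]["torrents"]:
    print(f'{t["name"]}: {t["percentDone"]:.0%}')
```

    The 409-then-retry dance is Transmission’s CSRF protection; any GUI, Web app or script in front of the daemon does the same thing, which is exactly the decoupling being discussed.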



  • Cool tech, but I don’t get it. Who cares about ghostly 3D re-creations of moments? It demands so much more than snapping a 2D photo, for a result that can be qualified as “strange” at best.

    I find XR much more interesting for things that are otherwise impossible, say traveling through the solar system or the human body, playing a rhythm game while punching things in the air, etc.

    This is totally overblown. They are not “worlds”: they’re usually 10x10 m spaces at most. Nor are they photorealistic: so much is left out, there is no animation, no physics, etc.

    PS: FWIW I tried some splats in XR, and I also did some photogrammetry a few years ago. Again, it’s an interesting process, but it demands a lot for a result that few non-tech people would genuinely be impressed with, let alone to the point of replacing their holiday photos.


  • Which… is “funny” because even though it is a genuine arms race where two powerful nations are competing… it’s a pointless one.

    Sure, we do get slightly better STT, TTS, some “generation” of “stuff”, as in human-sounding text used for spam and scams, images and now videos without attribution, but the actual hard stuff? Not a lot of real change there.

    Anyway, it’s interesting to see how the chip war unfolds. For now, despite the grand claims from both:

    • the US, with software and models for AI (Claude, OpenAI, etc., driven by VC-backed funding looking for THE next big thing, which does NOT materialize) and hardware, mostly NVIDIA (so happy to sell shovels for the current gold rush), or
    • China, with “cheap”-to-train large models (DeepSeek) and hardware (SMIC, RISC-based chips) to “catch up”, without any large production batch at a comparable yield,

    neither has produced anything genuinely positive, IMHO.


  • Interesting question. I live in Belgium and… well, first of all, I don’t care for Christmas. I do like to celebrate with family and friends, but the religious celebration itself, no. Second, I never actually considered it. I do love snow and ice. I recently took up ice skating and… even though I also love the summer, when I can rollerblade and skate, knowing that something else is coming is a genuine joy.

    So… I can’t speak for others, but I absolutely love the winter, from hot chocolate to waffles outside to ice skating, hikes in the snow and then relaxing by the fireplace. There is just so much to look forward to during that season that I… never dreamt of “a green Christmas”.

    Edit: I actually had one last year, going to Madeira, a Portuguese island west of Morocco, off North Africa, and… that was fine too. Honestly, the truth is I don’t really care where and how, as long as we share a good time.



  • in any way shape or form

    I’d normally accept the challenge if you hadn’t added that. You did, though, and AARON, namely a system (arguably intelligent), did make an image, several images in fact. Whether we like or dislike its aesthetics, or that the way it was done (without a prompt) differs from how it is currently done, remains irrelevant according to your own criteria, which admit none. Anyway, my point with AARON isn’t about this piece of work specifically, rather that there is prior work, and this one is JUST an example. Consequently the starting point is wrong.

    Anyway… even if you do question this, I argued for more, showing that I did try numerous (more than 50) models, including very current ones. It even makes me curious whether you, who are arguing for the capabilities and their progress, have tried more models than I did, and if so, where I can read about it and what you learned from such attempts.



  • Image gen did not exist in any way shape or form before.

    Typical trope while promoting a “new” technology. A classic example is 1972’s AARON https://en.wikipedia.org/wiki/AARON which, despite being based neither on LLMs (so not CLIP) nor even on ML, is still creating novel images. So… image generation has existed since at least the 70s, more than half a century ago. I’m not saying it’s equivalent to the implementations since DALLE (it is not), but to somehow ignore the history of a research field is not doing it justice. I have also been modding https://old.reddit.com/r/computationalcrea/ for 9 years, so since before OpenAI was even founded, just to give some historical context. Also, 2015 means 6 years before CLIP. Again, not to say this is equivalent, solely that generative AI has a long history, and thus pinning its start to grand moments like AlphaGo or DeepBlue (on this topic I can recommend Rematch from Arte)… is very much arbitrary and in no way helps to predict what’s yet to come, both in terms of what’s achievable and even the pace.

    Anyway, I don’t know what you actually tried, but here is a short list of the 58 (as of today) models I tried https://fabien.benetou.fr/Content/SelfHostingArtificialIntelligence and that’s excluding the popular ones, e.g. ChatGPT, Mistral Le Chat, DALLE, etc., which I also tried.

    I might be making “the same mistake” but, as I hope you can see, I do keep on trying what I believe is the state of the art, on a pretty much weekly basis.



  • What an impressive waste of resources. It’s portrayed as THE most important race and yet what has been delivered so far?

    Slightly better TTS or OCR, photo manipulation that is commercially unusable because sources can’t be traced, summarization that can introduce hallucinations… sure, all of that is interesting in terms of academic research, with potentially some use cases… but it’s not as if it didn’t exist before at nearly the same quality, for a fraction of the resources.

    It’s a competition where the “winners” actually don’t win much, quite a ridiculous situation to be in.