Love or hate just please explain why. This isn’t my area of expertise so I’d love to hear your opinions, especially if you’re particularly well versed or involved. If you have any literature, studies or websites let me know.
They’re annoying to be honest.
I used Qwen 3.5 for some research a few weeks ago. At first it seemed great: every sentence was referenced with a link from the internet, so I naturally thought, "well, it's actually researching for me, so no hallucination, good." Then I decided to look into the linked URLs, and it was hallucinating the text AND attaching random URLs to that text (???): nothing the AI output actually appeared on the linked web pages. The subject matched between the output and the URLs, but it wasn't extracting actual text from the pages; it was linking a random URL and hallucinating the text.
Related to code (that's my area, I'm a programmer), I tried using Qwen Code 3.5 to vibe code a personal project that was already initialized and basically working. It just struggles to keep consistency: I spent a lot of hours prompting the LLM, and in the end it produced a messy codebase that's hard to maintain. I also asked it to write tests, and when I checked them manually they were just bizarre: they passed, but they didn't cover the use cases properly, with a lot of hallucination just to make the tests pass. A programmer doing it manually could write better code and at least keep it maintainable, with tests that cover actual use cases and edge cases.
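To make the complaint above concrete, here's a hypothetical illustration of the pattern being described: a "test" that always passes regardless of whether the code is correct, next to one a programmer would actually write. The `apply_discount` function and both tests are made up for this example, not from the project discussed.

```python
def apply_discount(price, pct):
    # Function under test (invented for illustration).
    return round(price * (1 - pct / 100), 2)

# The kind of test described above: it passes no matter what,
# because it only compares the result with itself.
def test_discount_vacuous():
    result = apply_discount(100, 10)
    assert result == result  # always true; covers nothing

# A meaningful test: pins expected values and checks an edge case.
def test_discount_meaningful():
    assert apply_discount(100, 10) == 90.0
    assert apply_discount(100, 0) == 100.0  # edge case: no discount
    assert apply_discount(200, 25) == 150.0

test_discount_vacuous()
test_discount_meaningful()
```

Both tests run green, which is exactly why "the tests pass" is a weak signal on its own: only the second one would catch a regression in `apply_discount`.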
Related to images, I can spot most AI-generated art from very far away. There's something about it I can't put my finger on, but I somehow know it's AI-made.
In conclusion, they're not sustainable: they make half-working things and generate more costs than income, on top of the natural resources they consume.
This is very concerning in my opinion. Given humanity's history, relying on half-done things might lead us into very problematic situations. I'm just saying, the next Chernobyl-scale disaster might have some AI work behind it.
Had the same research issue with multiple models. The website it linked existed and was relevant, but often the specific page was hallucinated or just didn't say what the model claimed it did.
In the end it probably created more work than it saved.
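One way to spot-check citations like the ones described above is to normalize whitespace and case, then test whether the quoted sentence actually appears in the fetched page text. This is a minimal sketch of that idea (the `quote_appears` helper and the toy page text are my own, not from any model's tooling); a real check would also fetch the URL and strip HTML first.

```python
import re

def quote_appears(page_text, quote):
    """Return True if `quote` occurs in `page_text`, ignoring case and
    whitespace differences. A crude check for whether a model's
    'citation' actually says what the model claims it says."""
    norm = lambda s: re.sub(r"\s+", " ", s).strip().lower()
    return norm(quote) in norm(page_text)

# Toy example: the page is on-topic, but never contains the second claim.
page = "Qwen models are released under the Apache 2.0 license.\nSee the repo."
print(quote_appears(page, "released under the  Apache 2.0"))            # True
print(quote_appears(page, "Qwen outperforms Claude on all benchmarks"))  # False
```

It won't catch paraphrased claims, only fabricated verbatim quotes, but even that cheap filter would have flagged the behavior described in this thread.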
Also a programmer, and I find it OK for small stuff, but anything beyond one function is just unmaintainable slop. I tried vibe coding a project just to see what I was missing. It's fine, it did the job, but only if I don't look at the code. It's insecure, inefficient, and unmaintainable.
I agree. I assumed this error was LLM-related, not Qwen-specific. I think LLMs aren't able to match the referenced URL to the text they supposedly extracted from it. They probably do some extensive research (I remember it searched 20-40 sites), but it's up to the LLM whether it uses an exact quote from a given web page or not. So that's the problem…
Also it’s a complete mess to build frontend, if you ask a single landing page or pretty common interface it may be able to build something reasonable good, but for more complex layouts it’ll struggle a lot.
I think this happens because it's hard to test interfaces. I never got deep into frontend testing, but I know there are ways to write actual visual tests. The problem is that the LLM can't easily relate the code to an image: we'd need to take constant screenshots of the result, feed them back to the LLM, and ask it to fix things until the interface matches what you want. We'd need a vision-capable model more than a coding one.
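The render/compare/feed-back loop described above can be sketched as code. Everything here is a hypothetical stand-in: `render_screenshot`, `compare_to_target`, and `ask_llm_to_fix` would wrap a real headless browser, an image diff, and a vision-capable model API in an actual implementation; the stubs just demonstrate the control flow.

```python
def render_screenshot(code):
    # Stand-in: pretend the "screenshot" is just the code string itself.
    # A real version would render in a headless browser and capture pixels.
    return code

def compare_to_target(screenshot, target):
    # Stand-in similarity score in [0, 1]; a real version would diff images.
    return 1.0 if target in screenshot else 0.0

def ask_llm_to_fix(code, screenshot, target):
    # Stand-in: a real version would send the screenshot back to the model.
    return code + " " + target

def refine_layout(code, target, threshold=0.9, max_rounds=5):
    """Iterate: render, compare, feed the result back, until close enough
    or the round budget runs out."""
    for round_num in range(max_rounds):
        shot = render_screenshot(code)
        if compare_to_target(shot, target) >= threshold:
            return code, round_num
        code = ask_llm_to_fix(code, shot, target)
    return code, max_rounds

final_code, rounds = refine_layout("<div>", "sidebar")
print(rounds)  # 1: one fix round was needed before the comparison passed
```

The loop structure is simple; the hard parts this thread points at are the two stubbed pieces, a reliable visual comparison and a model that can actually act on what it sees.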
I mean, you may get good results for average, common layouts, but if you try anything different you'll see LLMs struggle hard.
For context and to your knowledge of the field, is Qwen 3.5 supposed to be cutting edge?
It's the best open-source model, pretty close to Claude on benchmarks.
Is Qwen really Open Source, or do they just let you download weights? (Like LLaMa.)
Not sure now, but it says Apache 2.0 in their GitHub repo.
Qwen 3.5 is one of the best of the open-weight (self-hostable) models right now. It's not as good as some of the extra-massive proprietary models like the bigger Claude models.
ah ok, I have some experience hosting Ollama and of course stable diffusion, but haven’t really messed with too many others, thanks for the insight!
Qwen 3.5 can be run via Ollama.
well now I have something to do this weekend if the weather is poor, thank you!