• 1 Post
  • 14 Comments
Joined 2 years ago
Cake day: July 3rd, 2023

  • This doesn’t account for blinking.

    If your friend blinks, they won’t see the light, and thus would be unable to verify whether the method works or not.

    But how does he know when to open his eyes? He can’t keep them open forever. Say you flash the light once, and that’s his signal to keep his eyes open. Okay, but how long do you wait before starting the experiment? If you do it immediately, he may not have enough time to react. If you wait too long, his eyes will dry out and he’ll blink.

    This is just not going to work. There are too many uncontrolled variables.



  • I used to think like that, but now I’m on the fence since I’ve started working much more closely with packaging. Calling it “Linux” is actually kind of harmful for adoption. Devs who claim their software works on Linux mislead people into thinking it works on any Linux distro, which is rarely true. Most of the time, those devs only test on Ubuntu and no other distro.

    Maybe when Snaps finally die out and Flatpak emerges as the one true standard for desktop apps, that problem will go away once and for all. Until then, I think we should normalize distinguishing Ubuntu, Fedora, Arch, etc. as separate “operating systems” instead of “distros”, which is an unnecessary and misleading term anyway.


  • gamer@lemm.ee to Asklemmy@lemmy.ml: Superbowl sadness (12 hours ago)

    I’m seeing people say that the broadcaster (Fox Sports, of course) injected cheers into the broadcast for Trump, and boos for Taylor Swift. I don’t want to spread misinfo though, so does anyone know if it’s true, or if there’s a way to validate it (e.g. by analyzing the audio)?



  • 96 GB+ of RAM is relatively easy, but for LLM inference you want VRAM. You can achieve that on a consumer PC by using multiple GPUs, although performance will not be as good as having a single GPU with 96GB of VRAM. Swapping out to RAM during inference slows it down a lot.

    On architectures with unified memory (like Apple’s latest machines), the CPU and GPU share memory, so you could actually find a system with a very large amount of memory directly accessible to the GPU. Mac Pros can be configured with up to 192GB of memory, although I doubt it’d be worth it as the GPU probably isn’t powerful enough.

    Also, the 83GB number I gave was with a hypothetical 1-bit quantization of DeepSeek R1, which (if it’s even possible) would probably be really shitty, maybe even shittier than Llama 7B.

    “but how can one enter TB zone?”

    Data centers use NVLink to connect multiple Nvidia GPUs. Idk what the limits are, but it lets you pool the resources of multiple GPUs much more efficiently, and at a much larger scale, than would be possible on consumer hardware. A single Nvidia H200 GPU has 141 GB of VRAM, so you could link them up to build some monster data centers.

    Nvidia also sells prebuilt machines like the HGX B200, which can have 1.4TB of memory in a single system. That’s less than the 2.6TB for unquantized DeepSeek, but for inference-only applications, you could definitely quantize it enough to fit within that limit with little to no quality loss… so if you’re really interested and really rich, you could probably buy one of those for your home lab.
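
    To put rough numbers on the "fits within that limit" claim, here is a minimal sketch that works backwards from a memory budget. It only counts the weights (no KV cache, activations, or runtime overhead, which is an assumption that makes real requirements higher), uses decimal units, and the function names are made up for illustration.

```python
import math

PARAMS = 671e9            # DeepSeek R1 parameter count, as quoted in the thread
HGX_B200_BYTES = 1.4e12   # 1.4TB system memory figure quoted above
H200_BYTES = 141e9        # 141 GB per H200, quoted above


def max_bits_per_param(num_params: float, budget_bytes: float) -> float:
    """Largest average bit width at which the weights alone fit in the budget."""
    return budget_bytes * 8 / num_params


def gpus_needed(num_params: float, bits_per_param: float, gpu_bytes: float) -> int:
    """Minimum number of GPUs whose pooled memory holds the weights alone."""
    return math.ceil(num_params * bits_per_param / 8 / gpu_bytes)


print(f"Max bits/param in 1.4TB: {max_bits_per_param(PARAMS, HGX_B200_BYTES):.2f}")
for bits in (32, 16, 8):
    print(f"{bits:>2}-bit weights: {gpus_needed(PARAMS, bits, H200_BYTES)} x H200")
```

    Under these assumptions the budget works out to a little under 17 bits per parameter, so 16-bit weights would just squeeze in, with very little left over for anything else.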






  • If all you care about is response times, you can easily do that by just using a smaller model. The quality of responses will be poor though, and it’s not feasible to self-host a model like ChatGPT on consumer hardware.

    For some quick math, a small Llama model is 7 billion parameters. Unquantized, that’s 4 bytes per parameter (32-bit floats), meaning it requires 28 billion bytes (28 GB) of memory. You can get that to fit in less memory with quantization, which basically trades quality for lower memory usage (use fewer than 32 bits per param, reducing both precision and memory usage).
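
    The quick math above can be sketched as a one-line formula: parameters times bits per parameter, divided by 8 to get bytes. This is a rough illustration for the weights only; the function name and the list of bit widths are picked for the example, and real quantization formats add some per-block overhead that is ignored here.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


LLAMA_7B_PARAMS = 7e9  # parameter count from the comment above

# 32 bits is the unquantized case; the smaller widths are example quantization levels.
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(LLAMA_7B_PARAMS, bits):.1f} GB")
```

    The 32-bit row reproduces the 28 GB figure, and each halving of the bit width halves the footprint.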

    Inference performance will still vary a lot depending on your hardware, even if you manage to fit it all in VRAM. A 5090 will be faster than an iPhone, obviously.

    … But with a model competitive with ChatGPT, like DeepSeek R1, we’re talking about 671 billion parameters. Even if you quantize down to a useless 1 bit per param, that’d be over 83GB just to fit the model in memory (unquantized, at 32 bits per param, it’s ~2.6TB). Running inference over that many parameters would require serious compute too, much more than a 5090 could handle. This gets into specialized high-end architectures, and it’s not something a typical prosumer would be able to build (or afford).
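
    As a sanity check on those figures, here is a small sketch that runs the same arithmetic for 671 billion parameters and compares the result against a single consumer card. The 32 GB VRAM figure for a 5090 is an assumption added for the comparison, not something stated in the comment, and only the weights are counted.

```python
DEEPSEEK_R1_PARAMS = 671e9  # parameter count quoted in the comment
GPU_VRAM_GB = 32            # assumption: VRAM of one RTX 5090, not from the comment


def weights_gb(num_params: float, bits_per_param: float) -> float:
    """Weights-only footprint in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


for bits in (1, 4, 8, 32):
    gb = weights_gb(DEEPSEEK_R1_PARAMS, bits)
    print(f"{bits:>2} bits/param: {gb:8.1f} GB ({gb / GPU_VRAM_GB:.1f}x one card)")
```

    Even the 1-bit row lands at several times the assumed VRAM of one card, and the 32-bit row comes out at roughly 2.7TB in decimal units, in the same ballpark as the figure above.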

    So the TL;DR is no.



  • I think this comment encapsulates the problem well: laymen who are not involved in the process in any way (on either side) acting like armchair experts and passing harsh judgement. You’re making some very unfair assumptions based on age, and nothing about the actual technical arguments.

    This is why people like Martin feel justified going on social media to publicly complain, because they know they’ll get a bunch of yes-men with no credible arguments to mindlessly harass the developers they disagree with. It’s childish and unproductive, and while I’ve personally respected Martin as a developer for a long time, I don’t believe he’s mature enough to be involved in the Rust for Linux effort (tbf, he’s not the only Rust dev with this attitude). If the project fails, it will be because of this behavior, not because of the “old guys” being stubborn.