• 3 Posts
  • 16 Comments
Joined 1 year ago
cake
Cake day: June 29th, 2023

help-circle
  • Good luck! You can try the huggingface-chat repo, or ollama with this web-ui. Both should be decent, as they have instructions to set up a docker container.

    I believe the Llama 3 models are out there in a torrent somewhere, but I didn’t dig to find it. For the 70B model, you’ll probably need around 64GB of RAM available, but the 7B one should run fine with just 8GB. It will be somewhat slow though, compared to the ChatGPT experience. The self-attention mechanism can be parallelized, which is why you will see much better results on a GPU. According to some others that tested it, if you offload some stuff to RAM, you could see ~10-12 tokens per second on an RTX 3090 for certain 70B models. But more capable ones will be at less than 1 token per second, all depending on the context window you use.

    If you don’t have a GPU available, just give the Phi-3 model a try :D If you quantize it to 4 bits, it can apparently get 12 tokens per second on an iPhone haha. It should play nice with pooling information from a search engine, or a vector database like milvus, qdrant or chroma.


  • What db2 already said. Microsoft just released Phi-3 mini, which could, allegedly, run locally on newer smartphones.

    If I understood correctly, the Rabbit thingy just captures your information locally and then forwards it to their server. So, if you want more power, you could probably do the same by submitting the same info to a bigger open source model than Phi-3, like Llama 3, hosted on your homelab. I believe you can set it up with huggingface/gradio, which sort of provides an API that you could use.

    That way, you don’t need a shitty orange box, and can always get the latest open source models with a few lines of code. There are plenty of open source frameworks in the works at the moment, and I believe that we’re not far off from having multi-modal LLMs running on homelab-level hardware (if you don’t mind a bit of lag).




  • Yeah, it’s the Osaifu-Keitai. Apple has it enabled for all phones on the market, while Android phone manufacturers avoid adding it to theirs outside Japan because they would have to pay fees to Sony for it. The funny part is that Sony itself doesn’t enable it for phones outside Japan, even though FeliCa is a subsidiary of Sony :D Another funny bit is that some phones, like the Pixel, are capable of running it on phones made for other markets. Some users were able to force the Osaifu-Keitai app to think the phone was made in Japan, and that was all it took to enable it (although you’d have to root your phone + the manufacturer should have released their phones in Japan, to ensure the chip is capable). So, yeah, although a few years ago it might have been a specific chip being needed in the phone, nowadays it’s mostly software that doesn’t allow you to use the one you have while in Japan.

    All in all, PASMO/Suica/etc is basically a very limited debit card company haha. I guess Japanese people enjoy using it mainly because it puts a cap on how much they can spend (iirc, about 100 euros allowed at once on the card). Japan is a highly consumerist society, so this format was probably adopted (instead of credit/debit cards) mainly to combat it somewhat :D





  • The Framework 13 inch model should be plenty, especially if you want to dev on the go. Much more lightweight and smaller, and you can connect it to external monitors if the screen size is not big enough. Also, you shouldn’t have issues running Linux on either laptops.

    Instead of going for the 16 version, I would use the extra 900-1000 euros (that’s the amount I saw I could save between the two almost maxed-out models) to make a dedicated server or mini-cluster to run your workloads. Deploy Kubernetes or Proxmox on it, and you’ll also get some more practice on it outside work if you want to run stuff for your home lab. That is only if you don’t want to game on your laptop, but I’d still put that money aside to make a desktop.