How to run LLMs and generate AI pics/videos locally

Idk nothing but bump
Basically having chatgpt run on your computer so you can run it completely uncensored, for as long as you want without any restrictions and generate stuff that chatgpt, gemini etc would reject.
 
Soft dnrd
I'm too iqlet brah
I don't do anything to do with technology because I dislike technology

I still repped all ur posts in the thread because I am indeed mirin the effort
 
Have you tried it?
 
time to buy a mac mini i guess
Mac mini is also very decent but I'd suggest the Studio. With a Mac mini you'll only be able to run smaller LLMs, which is fine for day-to-day tasks, but those models struggle with math, coding and other heavy stuff.
 
Of course, but I run local LLMs more for fiddling around and learning things than to generate porn or for ERP sessions
 
Did read. Good info tbh.
 
What do you do for work?
 
was lowkey joking but I shared the thread with one of my friends who's going to pursue a tech-related career ❤️
 
@TemporaryName
 
too broke for sure 😢
Tbh these baller setups are industrial tools, not toys, and they only make sense if you're making money, building something that many people would use, or learning the inner workings of AI. For the average user, the privacy and censorship trade-offs aren't worth a $5,000 to $10,000 barrier to entry when a monthly subscription provides top-tier performance without the overhead.

You buy the Mac Studio or the RTX 5090 rigs because you need an unmonitored, air-gapped, top-of-the-line powerhouse for professional development or research. Anything else is pointless tbh. But who knows, maybe you are secretly a millionaire and love to set money on fire.
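To put rough numbers on that trade-off, here is a minimal break-even sketch. The $20/month subscription price and the electricity figure are assumptions for illustration only, not numbers from this thread; plug in your own.

```python
import math


def break_even_months(hardware_cost: float,
                      subscription_per_month: float,
                      power_cost_per_month: float = 0.0) -> float:
    """Months until owning the hardware beats paying the subscription.

    All inputs are hypothetical prices in the same currency.
    Returns math.inf if running the rig costs as much as the subscription.
    """
    monthly_saving = subscription_per_month - power_cost_per_month
    if monthly_saving <= 0:
        return math.inf
    return hardware_cost / monthly_saving


# Assumed example: a $5,000 rig versus a $20/month subscription.
months = break_even_months(5000, 20)
print(f"break-even after {months:.0f} months ({months / 12:.1f} years)")
```

On pure cost the subscription wins by a wide margin under these assumptions, which is the point above: the local rig is bought for privacy, control or research, not to save money.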
 
tbh if i was a millionaire i would do this just to generate top tier league of legends rule34
 
@Beastimmung @Frenulum
 
read this, i wanted to create tiktok brainrot so i make money, my computer is ass tho and im too retarded to set the program up.
 
Rent compute in the cloud and make those videos. It's not hard, just watch a few tutorials and you'll get it.
 
@Nothyng @Yani @DrunkenSailor @tuberculosisinmybal
 
@Wuzzdio
 
@polonaecel
 
How much would this all cost?
 
Basically having chatgpt run on your computer so you can run it completely uncensored, for as long as you want without any restrictions and generate stuff that chatgpt, gemini etc would reject.
Wow, how?

And aren't models like Llama too weak to make good pictures? Do we need an API key?
 
I explained exactly how in this thread.
 
@ChadL1te
 
Also, Llama does not make pictures. It is just an LLM like Gemini or ChatGPT: it predicts the next token and does not generate pixels. To generate pixels you want a diffusion model. You use Llama to write the prompt, then hand that prompt over to Flux to actually draw the picture. They are two completely different concepts.
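A minimal sketch of that two-stage hand-off, with the caveat that both stages here are stand-in functions. `fake_llm` and `fake_diffusion` are hypothetical placeholders, not real Llama or Flux calls; in a real setup you would swap in whatever local runtime you use for each stage.

```python
from typing import Callable

# Stage 1: text in, text out (the LLM expands a short idea into a detailed prompt).
PromptWriter = Callable[[str], str]
# Stage 2: text in, image bytes out (the diffusion model draws the picture).
ImageGenerator = Callable[[str], bytes]


def generate_image(idea: str,
                   write_prompt: PromptWriter,
                   draw: ImageGenerator) -> bytes:
    """Run the LLM stage, then pass its output to the diffusion stage."""
    detailed_prompt = write_prompt(idea)
    return draw(detailed_prompt)


# Hypothetical stand-ins so the sketch runs without any model installed.
def fake_llm(idea: str) -> str:
    return f"A highly detailed photo of {idea}, soft lighting, 35mm lens"


def fake_diffusion(prompt: str) -> bytes:
    # Placeholder: returns the prompt as bytes instead of a real image.
    return prompt.encode("utf-8")


image = generate_image("a red fox in snow", fake_llm, fake_diffusion)
print(len(image), "bytes returned by the (fake) image stage")
```

The LLM never touches pixels and the diffusion model never reasons about the idea; the only thing passed between them is a string.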
 
i just use nano banana pro like a chud
Once you get your $400k paycheck after becoming a quant trader or AI researcher, invest in this to become an ascended chud
 
I know, I'm talking about the prompt.

In no world can Llama write a prompt detailed enough to generate a photo like one of the false banners, for example.

DeepSeek's API is better for this purpose and cheaper per token.
 
We are talking about local inference here, not paying a cloud provider. Also, not true: Llama 3.1 405B benchmarks right next to GPT-4 in logic and reasoning. If you can't get a 405B or even a 70B model to write a highly detailed prompt for an image generator like Flux, that is a skill issue on your end. Look into prompting techniques.
 
For everyone asking why bother, give the "Why Build Your Own Sovereign Cloud?" section a quick read - https://wiki.futo.org/index.php/Int..._software#Why_Build_Your_Own_Sovereign_Cloud?

This guide was probably my number one tool on my self-hosting mission. You might wanna try basic network protocols, maybe set up a containerised service, and get familiar with docker/podman.

Self-hosting is a marathon, not a sprint. I'm not finished with mine, it's kinda one of those projects that keeps expanding.

There are numerous ways to scrounge for cheaper hardware, btw. Things like ripping hard drives outta old set-top DVRs, some people are just giving these away, and you can find a TB or two.
 
For everyone asking why bother, give the "Why Build Your Own Sovereign Cloud?" section a quick read - https://wiki.futo.org/index.php/Introduction_to_a_Self_Managed_Life:_a_13_hour_&_28_minute_presentation_by_FUTO_software#Why_Build_Your_Own_Sovereign_Cloud?

This guide was probably my number one tool on my self-hosting mission. You might wanna try basic network protocols, maybe set up a containerised service, and get familiar with docker/podman.

Self-hosting is a marathon, not a sprint. I'm not finished with mine, it's kinda one of those projects that keeps expanding.

There are numerous ways to scrounge for cheaper hardware, btw. Things like ripping hard drives outta old set-top DVRs, some people are just giving these away, and you can find a TB or two.
What does your setup look like btw? What are you running?
 
I just skimmed through the thread. It's very comprehensive. It's basically in my domain of work. Managing infrastructure, deploying containers, and optimizing local compute environments is my day-to-day job. You can ask me about this stuff.
 
Mixed use. Mainly, I use it for cloud storage and media playback. It's not like a traditional enterprise rack.

I suppose that's a nod of approval! For the creator, of course.
 
Option 1 Mac Studio setup

Buy the Mac Studio, specifically one with the Max or Ultra chip and as much memory as you can afford. This is the easiest, cheapest, set-and-forget setup.

View attachment 4995193

On a Mac Studio with 192GB or 256GB of unified memory, the GPU can access most of that memory. This allows you to run large language models like Llama-3 70B or Gemma 4 31B entirely in memory.

You can also run a heavily quantized Llama 3.1 405B. At 4-bit quantization the weights alone are about 203GB (405 billion parameters × 0.5 bytes each), so with KV cache and runtime overhead it should come to around 230GB if my math is correct. That is basically as good as GPT-4, but it will be slow: 1-2 tokens per second, about the speed of fast human typing. The main advantage is that you can get a very decent setup for $4-5k and run complex models that can do coding and math and don't hallucinate as much.
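The memory math can be sketched in a few lines. This is a back-of-the-envelope estimate only: the 15% overhead for KV cache and runtime buffers is an assumed round number, and real quantization formats use slightly more than their nominal bits per weight.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits(params_billions: float, bits_per_weight: float, memory_gb: float,
         overhead_fraction: float = 0.15) -> bool:
    """True if weights plus an assumed overhead fit in the given memory."""
    needed = weight_memory_gb(params_billions, bits_per_weight) * (1 + overhead_fraction)
    return needed <= memory_gb


for name, params in [("70B", 70), ("405B", 405)]:
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params, bits)
        print(f"{name} at {bits}-bit: ~{gb:.1f} GB of weights, "
              f"fits in 256 GB: {fits(params, bits, 256)}")
```

Under these assumptions a 4-bit 405B model needs roughly 233GB in total, which squeezes into 256GB but not into 192GB.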

Option 2 Nvidia AI setup

If you want the absolute best performance possible, plan to do AI training as well, and want something bulletproof, get one or more RTX 5090s. Besides being a gaming beast, the 5090 is extremely capable in AI workloads. In fact, startups and many low-budget workflows use only RTX 5090s for everything.

View attachment 4995197

Almost every AI research paper and library (PyTorch, TensorFlow) is written for NVIDIA/CUDA first. GDDR7 memory provides insane bandwidth, so you generate images and get tokens within seconds, and in 2026 it is the industry norm.

There's another variant of this: the RTX 6000 Ada or the newer RTX Pro 6000. The big advantage of these GPUs, besides more VRAM, is ECC (error-correcting) memory, which is super important if you are doing serious stuff. You can run one at 100% for months and it won't crash, which is great for AI training and fine-tuning, but it only makes sense to buy one if you are doing AI research and your work makes you money. It also costs as much as a car, like $10k each.

View attachment 4995329
View attachment 4995215

The problem with this setup is that the cards are stupid expensive, and at 32-48GB of VRAM you are effectively locked out of the most intelligent models unless you use heavily quantized or distilled versions, or multiple cards.
You are limited to roughly 8B to 30B models, which are still very powerful but not a full ChatGPT replacement unless you spend $15k+ on multiple GPUs.

Local LLM models recommendation for each setup


For Mac Studio with 128-256GB of unified memory


As I said, Llama 3.1 405B with 4-bit quantization is the best for general use imo.



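You can also run Command R+ and the DeepSeek-V3 / R1 mixture-of-experts models, but I would stick to Llama because these are more specialized models for specific use cases. (The full 671B-parameter DeepSeek models need more than 256GB at 4-bit, so they only fit with lower-bit quants.)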




Image and video generation models. I know why most of you want to run local models: it is to generate AI porn. For image generation the Mac Studio is decent but not ideal. To generate pixels you need raw compute (TFLOPS).

View attachment 4995351

Everything, like I said, is optimized for CUDA, and Apple uses Metal, which is a completely different stack. The Mac Studio copes by imitating what an Nvidia card does through a bunch of complex processes that I won't get into, and the result is long wait times, like 30-50 seconds per image. The gold standard right now for images is Flux. Video is a nightmare: even a 5-second clip can take 10-20 minutes on Kling local. If you are on a Mac, you should also look for MLX versions of models on Hugging Face.


For the Nvidia setup with 32-48GB of VRAM

Gemma 4 31B. Released just last month. Somehow very few people talk about it, I think because it got overshadowed by the Gemini-4 release rumours, but it's Google's local masterpiece.


View attachment 4995276

On a 5090 it hits 80+ tokens per second and it demolishes Qwen and other older models. It is basically a super-intelligent instant messenger. It takes full advantage of the CUDA and tensor cores to give you blazing-fast speed and is extremely good for a model with only 31B parameters.
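A rough sanity check on speed claims like this: when generating one token at a time, a dense model has to read all of its weights from memory for every token, so memory bandwidth divided by model size gives an upper bound on tokens per second. The sketch below uses that rule of thumb. The bandwidth figures (about 1792 GB/s for an RTX 5090, about 800 GB/s for an Ultra-class Mac Studio) are assumptions to verify against the spec sheets, and real throughput lands below the bound.

```python
def max_tokens_per_second(model_gb: float, bandwidth_gb_s: float) -> float:
    """Bandwidth-bound ceiling on single-stream decode speed for a dense model."""
    return bandwidth_gb_s / model_gb


# 31B parameters at 4 bits per weight is about 15.5 GB of weights.
print(f"31B 4-bit on ~1792 GB/s: at most {max_tokens_per_second(15.5, 1792):.0f} tok/s")

# 405B parameters at 4 bits per weight is about 202.5 GB of weights.
print(f"405B 4-bit on ~800 GB/s: at most {max_tokens_per_second(202.5, 800):.1f} tok/s")
```

Both ceilings sit a bit above the numbers quoted in this post (80+ tokens per second on the 5090, 1-2 on the Mac with 405B), which is what you would expect once compute and overhead are included.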

And for image and video generation, this is where Nvidia and the CUDA environment earn their price tag. The same Flux image takes 1.5-3 seconds, that's it. You can also run a brand-new thing called real-time SDXL, where the image changes as you type the prompt, which is super cool.



For video generation, most models are trained and built specifically for CUDA, so Nvidia again shines here. A 5-second video clip renders in under 60 seconds.

Only problem is you can't get too much detail without spending a shit ton of money. Like you can't generate an 8k picture without running out of memory

If you are tech savvy and understand AI at the software level, or you are an AI engineer, I also encourage you to fiddle with GGUF files and do some fine-tuning to make the AI behave exactly as you want. Even I'm still learning about these things and experimenting, but be very careful because you can easily give the AI a lobotomy.

For the ultimate baller setup, you can use both the Mac Studio and the Nvidia GPU. This was a pipe dream forever, but just a few weeks back a company called Tiny Corp released officially signed drivers (TinyGPU) that allow Apple Silicon Macs to talk to NVIDIA GPUs over Thunderbolt 4/5. So you get the best of both worlds. The only con is the bandwidth: Thunderbolt is fast but still slower than a direct PCIe slot, though still much better than using Metal.



P.S. Two more cons I forgot to mention: power and noise. The Mac Studio is very efficient, barely sips 230-250W even at full load, is dead silent and doesn't get too hot. The Nvidia setup is the opposite: you would easily need something like a 1600W PSU to run it. If you live in Europe, or in one of those older houses or apartments in America, a 1600W PC plus a monitor and a fridge on the same circuit will trip the breaker and give you a massive electricity bill.

The other thing is that the fans will ramp up to 100%. AI inference is a sustained heavy load, unlike gaming, which comes in spikes, so your PC will sound like a jet engine taking off, and the PC and even the room it is in will get hotter. My mum yells at me every time I turn it on. Just something to keep in mind if you live with someone or have a small room.
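For anyone who wants to check their own circuit and bill, here is a minimal sketch. Everything numeric in it is an assumption: a 120V/15A circuit, the common 80% rule of thumb for continuous loads, made-up monitor and fridge wattages, and a hypothetical electricity price of $0.30 per kWh. Also keep in mind that a 1600W PSU rating is a capacity, not what the PC draws all the time.

```python
def continuous_limit_watts(volts: float, breaker_amps: float,
                           derate: float = 0.8) -> float:
    """Sustained load a circuit can carry, using an assumed 80% derating."""
    return volts * breaker_amps * derate


def monthly_energy_cost(avg_watts: float, hours_per_day: float,
                        price_per_kwh: float, days: int = 30) -> float:
    """Energy cost for a load running a fixed number of hours per day."""
    kwh = avg_watts / 1000 * hours_per_day * days
    return kwh * price_per_kwh


# Hypothetical worst-case loads sharing one circuit, in watts.
loads = {"pc": 1600, "monitor": 50, "fridge": 150}
total = sum(loads.values())
limit = continuous_limit_watts(120, 15)
status = "over the limit" if total > limit else "ok"
print(f"total {total} W vs continuous limit {limit:.0f} W: {status}")

cost = monthly_energy_cost(1600, 4, 0.30)
print(f"1600 W for 4 h/day at $0.30/kWh: about ${cost:.2f} per month")
```

With these assumed numbers the circuit is overloaded at full tilt, which matches the breaker warning above.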

kind of insane how fast AI has developed

within ~6 years it has gone from being unable to imagine an AI, to AI locally running in your background w/ capability to 3d model stuff for you, code, generate images and videos etc

you could insert the prompt "build me a small engine/prototype of a design", it could (probably) generate an image of a small engine, send a 3d model with circuits + code onto the circuits, send it into a 3d printer, and build any small product anonymously, all locally designed in your house, if you gave it about 10 attempts

in probably 1 year just upscaling the 3d printer + capabilities, anyone could anonymously build almost any product at home, no shipping, given they have the money

that one sentence "in the future, entertainment will be generated" really has come true, people are already near generating TV shows and games, soon their own products
 
That is what happens when you throw billions of dollars at the problem
 
If interested in my ramblings @Joeseminate
 
@Atra @sodiumcel
 
Good post
 
I have a 5060 Ti with 16GB of VRAM and Juggernaut XL worked pretty well. Highly recommend.
 
Nice man good to see enthusiasts here
 
That is what happens when you throw billions of dollars at the problem
not sure if you'll agree or disagree w/ me but ig i'll say it anyways

people are always talking about a popping ai bubble, and yet it continues to grow in new ways and have several different applications

obviously progress will eventually slow, but there will (very likely) never be a "bubble pop" where investment values drastically decrease

i'm sure eventually someone will create this AI -> instant product generation at home in a single machine and sell it off to people etc, and there are certainly smarter people who thought of even bigger ideas, AI will slow but still keep growing
 
I have a 5060 Ti with 16GB of VRAM and Juggernaut XL worked pretty well. Highly recommend.
I probably sounded too elitist in the OP even though I didn't mean to. The models that we engineers work with are generally much heavier and more workhorse grade. What I mentioned in the OP are true replacements for frontier LLMs.
 
Nvidia is the best option for sure.

It can play all games at 4K+RT while being an AI powerhouse compared to the mac mini which can't.

hefty price tho
 
Holy Shit! Mirin the effort and the detailed guide bhai.

Unfortunately this doesn't apply since I'm under 18 but at least with this information I can learn and prepare when I do turn 18.

Thanks for the tag, appreciate the information!
 
Basically having chatgpt run on your computer so you can run it completely uncensored, for as long as you want without any restrictions and generate stuff that chatgpt, gemini etc would reject.
Stable Diffusion, but the images generated are usually trash. Grok Imagine was the closest thing you could get for generating NSFW images and it was the absolute best thing on the market, but normies abused it to the point where xAI had to lobotomize and nerf it.
 
