<p>In this episode, James and Frank dive into running AI coding models locally versus in the cloud: BYOK/OpenRouter, VS Code’s chat/agent harness, model runners (Ollama, vLLM), and the practicality of running 27B models on a 3090 with 4‑bit quantization. They share hands-on takeaways: how recent engineering (MT/MTPLX) boosts inference to usable token rates, when auto model selection makes sense, cost and hardware trade‑offs, and why local models can liberate your workflow while still needing smarter, unified tooling.</p>

<h3>Follow Us</h3>

<ul>
<li>Frank: <a href="http://twitter.com/praeclarum" target="_blank" rel="nofollow noopener">Twitter</a>,  <a href="http://praeclarum.org" target="_blank" rel="nofollow noopener">Blog</a>, <a href="http://github.com/praeclarum" target="_blank" rel="nofollow noopener">GitHub</a></li>
<li>James: <a href="http://twitter.com/jamesmontemagno" target="_blank" rel="nofollow noopener">Twitter</a>,  <a href="https://montemagno.com" target="_blank" rel="nofollow noopener">Blog</a>, <a href="http://github.com/jamesmontemagno" target="_blank" rel="nofollow noopener">GitHub</a></li>
<li>Merge Conflict: <a href="http://twitter.com/mergeconflictfm" target="_blank" rel="nofollow noopener">Twitter</a>,  <a href="https://www.facebook.com/mergeconflictfm" target="_blank" rel="nofollow noopener">Facebook</a>, <a href="http://mergeconflict.fm" target="_blank" rel="nofollow noopener">Website</a>, <a href="https://www.mergeconflict.fm/discord" target="_blank" rel="nofollow noopener">Chat on Discord</a></li>
<li>Music: Amethyst Seer - Citrine by <a href="https://soundcloud.com/adventureface" target="_blank" rel="nofollow noopener">Adventureface</a></li>
</ul>

<p>⭐⭐ <a href="https://itunes.apple.com/us/podcast/merge-conflict/id1133064277?mt=2&amp;ls=1" target="_blank" rel="nofollow noopener">Review Us</a> ⭐⭐</p>

<p>Machine transcription available on <a href="http://mergeconflict.fm" target="_blank" rel="nofollow noopener">http://mergeconflict.fm</a></p><p><a rel="payment" href="https://www.patreon.com/mergeconflictfm">Support Merge Conflict</a></p>
      



Merge Conflict

514: Running Local LLMs in VS Code
