<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://krishna-dhulipalla.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://krishna-dhulipalla.github.io/" rel="alternate" type="text/html" /><updated>2026-05-14T19:05:51+00:00</updated><id>https://krishna-dhulipalla.github.io/feed.xml</id><title type="html">Krishna Vamsi Dhulipalla</title><subtitle>Engineering thoughts on LLM systems, Agentic workflows, and Reverse Engineering.</subtitle><entry><title type="html">Building a LeetCode Solution Visualizer for Interview Prep</title><link href="https://krishna-dhulipalla.github.io/projects/interview%20preparation/developer%20tools/2026/05/14/building-a-leetcode-solution-visualizer-for-interview-prep.html" rel="alternate" type="text/html" title="Building a LeetCode Solution Visualizer for Interview Prep" /><published>2026-05-14T15:30:00+00:00</published><updated>2026-05-14T15:30:00+00:00</updated><id>https://krishna-dhulipalla.github.io/projects/interview%20preparation/developer%20tools/2026/05/14/building-a-leetcode-solution-visualizer-for-interview-prep</id><content type="html" xml:base="https://krishna-dhulipalla.github.io/projects/interview%20preparation/developer%20tools/2026/05/14/building-a-leetcode-solution-visualizer-for-interview-prep.html"><![CDATA[<p>Interview prep has a strange failure mode.</p>

<p>You can solve the problem, pass the sample testcase, and still not feel like you understand the solution deeply enough to explain it in an interview.</p>

<p>That gap usually shows up when someone asks:</p>

<ul>
  <li>Why did this pointer move?</li>
  <li>What changed in the hash map?</li>
  <li>What is the window covering right now?</li>
  <li>Why did binary search discard that half?</li>
  <li>What is the stack doing at this line?</li>
</ul>

<p>The normal way to answer those questions is slow:</p>

<ul>
  <li>add print statements</li>
  <li>rerun the code</li>
  <li>mentally connect each print to the current line</li>
  <li>remove the print statements</li>
  <li>repeat when the next testcase behaves differently</li>
</ul>

<p>I wanted something faster than that.</p>

<p>So I built a LeetCode Solution Visualizer.</p>

<hr />

<h2 id="what-the-visualizer-does">What the visualizer does</h2>

<p>The app lets you paste a Python LeetCode-style solution, provide a testcase, and run the code directly in the browser.</p>

<p>Instead of only showing the final answer, it records the execution trace:</p>

<ul>
  <li>current line</li>
  <li>changed variables</li>
  <li>arrays and lists</li>
  <li>maps and dictionaries</li>
  <li>sets</li>
  <li>scalar values</li>
  <li>return value</li>
  <li>whether the result matches the expected output</li>
</ul>
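
<p>One plausible way to capture snapshots like these in CPython, which is what Pyodide runs in the browser, is <code class="language-plaintext highlighter-rouge">sys.settrace</code>. The sketch below is an illustrative assumption, not the app's actual tracer. Just before each line of the traced function runs, it records the line number and a deep copy of the local variables, and it returns the function's result alongside the snapshots.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import copy
import sys


def trace_function(fn, *args):
    """Run fn(*args) and snapshot fn's state each time one of its lines is about to run.

    Each snapshot is {"line": line_number, "locals": deep copy of fn's locals}.
    Deep copies matter: without them, a list mutated later would also appear
    changed in every earlier snapshot.
    """
    snapshots = []
    target = fn.__code__

    def local_tracer(frame, event, arg):
        if event == "line":
            state = {name: copy.deepcopy(value) for name, value in frame.f_locals.items()}
            snapshots.append({"line": frame.f_lineno, "locals": state})
        return local_tracer

    def global_tracer(frame, event, arg):
        # Trace only frames running the user's solution, not helpers it calls.
        return local_tracer if frame.f_code is target else None

    sys.settrace(global_tracer)
    try:
        result = fn(*args)  # the return value is shown alongside the trace
    finally:
        sys.settrace(None)  # always detach the tracer, even if fn raises
    return result, snapshots
</code></pre></div></div>

<p>Deep-copying every snapshot costs memory on long traces, which is one more reason the default view should not render every raw step.</p>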

<p>Then it lets you step through the solution like a debugger, but with a layout focused on algorithm understanding instead of general-purpose debugging.</p>

<p><img src="/assets/blog/leetcode-visualizer-trace-overview.png" alt="LeetCode Solution Visualizer trace overview" /></p>

<p>The goal is not to replace solving problems.</p>

<p>The goal is to shorten the time between “my code passed” and “I can explain exactly why it passed.”</p>

<hr />

<h2 id="the-problem-with-raw-traces">The problem with raw traces</h2>

<p>The first version showed every executed line.</p>

<p>That was technically correct, but not very useful.</p>

<p>For a small binary-search problem like Koko Eating Bananas, a normal execution can produce dozens of snapshots because every loop condition, repeated <code class="language-plaintext highlighter-rouge">for</code> line, and accumulator update becomes a trace event.</p>
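
<p>For reference, here is a standard binary-search solution to Koko Eating Bananas, plus a small hypothetical <code class="language-plaintext highlighter-rouge">count_line_events</code> helper built on <code class="language-plaintext highlighter-rouge">sys.settrace</code>. The helper is an assumption about how line events could be counted, not the app's code. The comments mark the loop condition, the repeated <code class="language-plaintext highlighter-rouge">for</code> line, and the accumulator update, each of which produces trace events.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import sys


def min_eating_speed(piles, h):
    """Koko Eating Bananas: smallest speed k that finishes all piles within h hours."""
    lo, hi = 1, max(piles)
    while hi > lo:                 # loop condition: re-checked (and traced) every iteration
        mid = (lo + hi) // 2
        hours = 0
        for p in piles:            # repeated for line: fires again on every pile
            hours += (p + mid - 1) // mid  # accumulator update: ceil(p / mid)
        if hours > h:
            lo = mid + 1           # too slow: discard the lower half
        else:
            hi = mid               # fast enough: keep mid, search lower
    return lo


def count_line_events(fn, *args):
    """Run fn(*args) and count 'line' trace events for fn and any Python functions it calls."""
    count = 0

    def tracer(frame, event, arg):
        nonlocal count
        if event == "line":
            count += 1
        return tracer

    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    return result, count
</code></pre></div></div>

<p>On LeetCode's first sample (<code class="language-plaintext highlighter-rouge">piles = [3, 6, 7, 11]</code>, <code class="language-plaintext highlighter-rouge">h = 8</code>), the function returns 4 after four binary-search steps, yet the counter reports dozens of line events for that one small input.</p>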

<p>That creates a different problem:</p>

<ul>
  <li>the trace is accurate</li>
  <li>but the learner has to work too hard to find the important changes</li>
</ul>

<p>So I shifted the UI toward variable and data-structure tracking.</p>

<p>The app still keeps the raw event log, but the default view focuses on state updates. If a variable did not change, it usually does not deserve the same visual weight as a line that changed the algorithm state.</p>
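
<p>Collapsing the raw log into state updates can be as simple as diffing consecutive snapshots. This sketch assumes each snapshot is a <code class="language-plaintext highlighter-rouge">(line_number, locals_dict)</code> pair captured just before that line runs, so a change between two snapshots is credited to the earlier line. The function name and input format are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def state_changes(snapshots):
    """Collapse a raw line-by-line trace into the steps that changed state.

    snapshots: list of (line_number, locals_dict) pairs in execution order,
    where each dict is the state *before* that line runs. A difference
    between snapshot i and snapshot i + 1 is credited to the line of snapshot i.
    Returns a list of (line_number, {name: new_value}) pairs.
    """
    missing = object()  # sentinel: the variable did not exist yet
    changes = []
    for (line, before), (_, after) in zip(snapshots, snapshots[1:]):
        changed = {
            name: value
            for name, value in after.items()
            if before.get(name, missing) != value
        }
        if changed:  # lines that changed nothing are dropped from the default view
            changes.append((line, changed))
    return changes
</code></pre></div></div>

<p>Lines that leave every variable as it was, such as an <code class="language-plaintext highlighter-rouge">if</code> test or a final loop check, drop out of this view, while the unfiltered snapshot list remains available as the raw trace.</p>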

<p>That small decision makes the trace feel much closer to how people explain solutions out loud.</p>

<hr />

<h2 id="why-this-helps-in-interview-preparation">Why this helps in interview preparation</h2>

<p>Most interview prep time is not spent writing the final correct code.</p>

<p>A lot of it is spent building intuition:</p>

<ul>
  <li>understanding why a two-pointer solution works</li>
  <li>seeing how a sliding window expands and shrinks</li>
  <li>noticing how a stack represents unmatched state</li>
  <li>confirming that a DP table is being filled in the intended order</li>
  <li>checking edge cases without rewriting the explanation from scratch</li>
</ul>

<p>This tool makes those states visible.</p>

<p>For example, in a sliding-window problem, seeing <code class="language-plaintext highlighter-rouge">left</code>, <code class="language-plaintext highlighter-rouge">right</code>, the active window, the set contents, and the current line together is much easier than reconstructing all of that from print output.</p>

<p><img src="/assets/blog/leetcode-visualizer-sliding-window.png" alt="Sliding window visualization" /></p>
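
<p>As a concrete case, here is Longest Substring Without Repeating Characters written as a sliding window over a set. Besides the answer, this illustrative version returns one <code class="language-plaintext highlighter-rouge">(left, right, window)</code> entry per step, roughly the state a visualizer would draw next to the current line. The step format is an assumption for illustration, not the app's data model.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def longest_unique_window(s):
    """Length of the longest substring of s without repeated characters.

    Also returns one (left, right, window) entry per step. Because the window
    never holds a duplicate, its characters are exactly the contents of `seen`.
    """
    seen = set()
    left = 0
    best = 0
    steps = []
    for right, ch in enumerate(s):
        while ch in seen:          # shrink from the left until the duplicate leaves
            seen.remove(s[left])
            left += 1
        seen.add(ch)               # expand the window to include s[right]
        best = max(best, right - left + 1)
        steps.append((left, right, s[left:right + 1]))
    return best, steps
</code></pre></div></div>

<p>For <code class="language-plaintext highlighter-rouge">"abcabcbb"</code>, the answer is 3, and the step at <code class="language-plaintext highlighter-rouge">right = 3</code> shows the window changing from <code class="language-plaintext highlighter-rouge">"abc"</code> to <code class="language-plaintext highlighter-rouge">"bca"</code> as the duplicate <code class="language-plaintext highlighter-rouge">a</code> leaves from the left.</p>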

<p>That matters because interviews reward explanation, not just code.</p>

<p>If I can replay a solution and watch the state move, I can usually answer the “why” questions much faster.</p>

<hr />

<h2 id="the-time-saving-part">The time-saving part</h2>

<p>The biggest practical win is removing repetitive debugging work.</p>

<p>Without a visualizer, a typical practice loop looks like this:</p>

<ol>
  <li>Write the solution.</li>
  <li>Add prints for the variables you care about.</li>
  <li>Run one testcase.</li>
  <li>Add more prints because the first ones were not enough.</li>
  <li>Try to map each output line back to the source code.</li>
  <li>Delete the prints.</li>
  <li>Start again for another testcase.</li>
</ol>

<p>With the visualizer, the loop becomes:</p>

<ol>
  <li>Write or paste the solution.</li>
  <li>Add the testcase.</li>
  <li>Run trace.</li>
  <li>Step through the state changes.</li>
</ol>

<p>That saves time, but more importantly, it saves attention.</p>

<p>During interview preparation, attention is the scarce resource. I do not want to spend it formatting print statements or counting which loop iteration produced which output line.</p>

<p>I want to spend it understanding the algorithm.</p>

<hr />

<h2 id="why-i-kept-it-deterministic">Why I kept it deterministic</h2>

<p>One thing I deliberately avoided was using an LLM to “explain” the trace.</p>

<p>LLMs can be helpful, but this part of the product needs to be exact. If the UI says an array cell is active or a pointer moved, it should be because the runtime state proves it.</p>

<p>So the app uses deterministic tracing and conservative visualization rules:</p>

<ul>
  <li>a direct subscript like <code class="language-plaintext highlighter-rouge">nums[i]</code> highlights the cell at index <code class="language-plaintext highlighter-rouge">i</code></li>
  <li><code class="language-plaintext highlighter-rouge">for i in range(len(nums))</code> can highlight the indexed element</li>
  <li><code class="language-plaintext highlighter-rouge">enumerate(nums)</code> can connect index and value</li>
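  <li>in direct iteration such as <code class="language-plaintext highlighter-rouge">for x in nums</code>, the current value is highlighted only when it appears exactly once in the list</li>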
  <li>ambiguous cases fall back to exact variable display</li>
</ul>
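
<p>Rules like these can be checked statically before any highlighting happens. The sketch below uses Python's <code class="language-plaintext highlighter-rouge">ast</code> module to find direct <code class="language-plaintext highlighter-rouge">name[index]</code> subscripts and <code class="language-plaintext highlighter-rouge">for i in range(len(arr))</code> loops, and maps a directly iterated value back to a cell only when that value is unambiguous. The function names are hypothetical, and the app may implement these rules differently.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import ast


def direct_subscripts(source):
    """Find name[index] uses where both the container and the index are plain names.

    At runtime, the value of the index tells you exactly which cell to highlight.
    Anything more complex, such as nums[i + 1] or nums[f(i)], is skipped, not guessed.
    """
    pairs = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Name)
            and isinstance(node.slice, ast.Name)  # Python 3.9+ AST layout
        ):
            pairs.add((node.value.id, node.slice.id))
    return sorted(pairs)


def _is_call_to(node, name):
    """True for a single-argument call like name(arg)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
        and len(node.args) == 1
    )


def range_len_loops(source):
    """Find `for i in range(len(arr))` loops, which tie i to the cells of arr."""
    pairs = set()
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.For) and isinstance(node.target, ast.Name)):
            continue
        outer = node.iter
        if _is_call_to(outer, "range") and _is_call_to(outer.args[0], "len"):
            inner = outer.args[0].args[0]
            if isinstance(inner, ast.Name):
                pairs.add((inner.id, node.target.id))
    return sorted(pairs)


def unique_value_index(values, x):
    """Map a directly iterated value back to its cell only if it appears exactly once."""
    hits = [i for i, v in enumerate(values) if v == x]
    return hits[0] if len(hits) == 1 else None  # duplicates: show the variable, don't guess
</code></pre></div></div>

<p>Anything these helpers do not recognize, such as <code class="language-plaintext highlighter-rouge">nums[i + 1]</code>, produces no mapping, which corresponds to falling back to exact variable display.</p>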

<p>That last point is important.</p>

<p>A wrong visualization teaches the wrong intuition. It is better to show less and stay correct than to guess too aggressively.</p>

<hr />

<h2 id="what-i-learned-while-building-it">What I learned while building it</h2>

<p>The hardest part was not running the Python code.</p>

<p>The hard part was deciding what to hide.</p>

<p>A raw trace gives you everything, but everything is not the same as understanding.</p>

<p>The useful version of this app came from reducing noise:</p>

<ul>
  <li>show changed variables clearly</li>
  <li>group data structures by type</li>
  <li>keep raw trace available but collapsed</li>
  <li>make playback controls prominent</li>
  <li>avoid large explanation cards</li>
  <li>let users choose visualization modes for pointers, windows, trees, graphs, and DP</li>
</ul>

<p>That is the difference between a debugger and an interview-prep tool.</p>

<p>A debugger helps you inspect a program.</p>

<p>This app is trying to help you understand an algorithm.</p>

<hr />

<h2 id="where-it-can-go-next">Where it can go next</h2>

<p>There are still several features that could make this more useful:</p>

<ul>
  <li>better recursion visualization</li>
  <li>cleaner tree and graph layouts</li>
  <li>stronger DP table support</li>
  <li>shareable trace URLs</li>
  <li>exporting a trace for notes</li>
  <li>more problem presets</li>
  <li>more precise active-loop and condition context</li>
</ul>

<p>But even in its current form, it already solves the main pain point I had:</p>

<p>I can take a LeetCode solution, run it with a testcase, and quickly see how the algorithm state changes.</p>

<p>That makes practice less about staring at code and more about building intuition.</p>

<hr />

<h2 id="credits">Credits</h2>

<ul>
  <li>LeetCode-style interview problems for the examples and workflows this tool is designed around</li>
  <li>Pyodide for making browser-based Python execution possible</li>
  <li>React and Vite for the frontend foundation</li>
</ul>]]></content><author><name></name></author><category term="Projects" /><category term="Interview Preparation" /><category term="Developer Tools" /><summary type="html"><![CDATA[LeetCode practice is often slowed down by the same problem: you know the code runs, but you do not really see how the variables move. I built a browser-based solution visualizer to make that execution trace easier to inspect, replay, and understand.]]></summary></entry><entry><title type="html">Why Fast AI Power Estimation Matters More Than It Sounds</title><link href="https://krishna-dhulipalla.github.io/engineering/ai%20infrastructure/sustainability/2026/04/28/why-fast-ai-power-estimation-matters-more-than-it-sounds.html" rel="alternate" type="text/html" title="Why Fast AI Power Estimation Matters More Than It Sounds" /><published>2026-04-28T14:00:00+00:00</published><updated>2026-04-28T14:00:00+00:00</updated><id>https://krishna-dhulipalla.github.io/engineering/ai%20infrastructure/sustainability/2026/04/28/why-fast-ai-power-estimation-matters-more-than-it-sounds</id><content type="html" xml:base="https://krishna-dhulipalla.github.io/engineering/ai%20infrastructure/sustainability/2026/04/28/why-fast-ai-power-estimation-matters-more-than-it-sounds.html"><![CDATA[<p>Most AI energy discussions are still framed at the wrong level.</p>

<p>They usually sound like:</p>

<ul>
  <li>model A is bigger than model B</li>
  <li>datacenters consume more power every year</li>
  <li>sustainability should matter more</li>
</ul>

<p>All true.</p>

<p>But for people actually running systems, the more practical question is simpler:</p>

<p><strong>Can we estimate the energy cost of a workload fast enough to make a better decision before we run it?</strong></p>

<p>That is why MIT’s new EnergAIzer work is more useful than it first appears.</p>

<hr />

<h2 id="what-the-mit-work-actually-does">What the MIT work actually does</h2>

<p>The MIT and MIT-IBM Watson AI Lab team built <strong>EnergAIzer</strong>, a framework for estimating GPU power consumption for AI workloads in <strong>seconds</strong> rather than hours or days.</p>

<p>That speed difference is the point.</p>

<p>Traditional approaches often depend on one of:</p>

<ul>
  <li>detailed simulation</li>
  <li>low-level hardware profiling</li>
  <li>slow emulation of how each GPU component gets used over time</li>
</ul>

<p>Those methods can be accurate, but they are too slow when an operator wants to compare many deployment options quickly.</p>

<p>EnergAIzer attacks that bottleneck by modeling the structured patterns that show up in AI kernels and optimized GPU programs. Instead of simulating every detail, it uses those repeated patterns as a scaffold for estimating utilization and then feeds that into a power model.</p>
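
<p>To make the last step concrete, one common shape for utilization-based power models is idle power plus a dynamic term per hardware unit, weighted by how busy that unit is. The sketch below is a generic linear model of that kind with made-up coefficients. It is not EnergAIzer's actual model, and the unit names are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from dataclasses import dataclass, field


@dataclass
class LinearPowerModel:
    """Hypothetical GPU power model: idle power plus a dynamic term per hardware unit.

    power = idle_watts + sum(watts_at_full_use[unit] * utilization[unit])
    where each utilization is a fraction between 0 and 1.
    """
    idle_watts: float
    watts_at_full_use: dict = field(default_factory=dict)

    def power(self, utilization):
        dynamic = sum(self.watts_at_full_use[unit] * u for unit, u in utilization.items())
        return self.idle_watts + dynamic

    def energy_joules(self, utilization, seconds):
        # Energy is average power times runtime, so a slower, lower-power
        # configuration can still cost more energy overall.
        return self.power(utilization) * seconds
</code></pre></div></div>

<p>With an idle draw of 70 W and made-up full-use terms of 250 W for compute and 80 W for memory, a configuration at 60% compute and 50% memory utilization draws about 260 W, so a 10 s run costs about 2,600 J. A configuration at half those utilizations draws only about 165 W, but if it takes 22 s it costs about 3,630 J. Comparing energy, not just power, is what makes fast estimates useful when choosing between deployment options.</p>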

<p>According to the paper, the result is competitive accuracy with much lower turnaround time:</p>

<ul>
  <li>about <strong>8% power error</strong> on NVIDIA Ampere GPUs</li>
  <li>about <strong>7% error</strong> when forecasting NVIDIA H100 power</li>
  <li>estimation wall time reduced from <strong>hours to seconds</strong></li>
</ul>
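
<p>It helps to translate a percentage error into watts. Assuming the reported figures are mean absolute percentage errors (check the paper for its exact metric), this small helper computes that metric. The readings in the note below are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def mean_abs_pct_error(predicted, measured):
    """Average of |predicted - measured| / measured, expressed as a percentage."""
    errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return 100 * sum(errors) / len(errors)
</code></pre></div></div>

<p>For measured readings of 300 W and 400 W with predictions of 276 W and 432 W, the helper returns 8%. On a GPU drawing 400 W, an 8% error is about 32 W: coarse for billing, but often fine for comparing options whose estimates differ by more than that.</p>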

<p>That is not perfect prediction. It is fast-enough prediction for engineering decisions.</p>

<hr />

<h2 id="why-this-is-useful-in-practice">Why this is useful in practice</h2>

<p>The most interesting part of this work is not the benchmark number. It is the operational use case.</p>

<h3 id="1-datacenter-scheduling-gets-smarter">1) Datacenter scheduling gets smarter</h3>

<p>If a team can estimate the energy cost of a workload before running it, then placement decisions become more informed:</p>

<ul>
  <li>which GPU type should run this workload?</li>
  <li>should this run at a different frequency?</li>
  <li>which jobs should be co-located?</li>
  <li>where is power likely to be wasted?</li>
</ul>

<p>That matters because AI infrastructure is no longer constrained only by compute availability. It is constrained by:</p>

<ul>
  <li>power budgets</li>
  <li>cooling limits</li>
  <li>queueing delays</li>
  <li>and cost per useful token or training step</li>
</ul>

<p>Fast estimation makes those constraints easier to manage proactively.</p>

<h3 id="2-model-developers-get-feedback-earlier">2) Model developers get feedback earlier</h3>

<p>A lot of efficiency work happens too late.</p>

<p>Teams build the model, run the pipeline, deploy it, and only then start asking why it is expensive.</p>

<p>If you can estimate energy cost earlier, then architecture and inference decisions become easier to compare before production rollout:</p>

<ul>
  <li>longer context vs shorter context</li>
  <li>batch size tradeoffs</li>
  <li>preprocessing choices</li>
  <li>hardware selection for serving</li>
</ul>

<p>That makes energy part of the engineering loop instead of a postmortem metric.</p>

<h3 id="3-hardware-exploration-gets-cheaper">3) Hardware exploration gets cheaper</h3>

<p>The paper also frames EnergAIzer as useful for architectural exploration.</p>

<p>That matters because hardware teams often need fast estimates for design choices well before a configuration is broadly deployed. A tool that can forecast power behavior for emerging accelerator setups is useful even if the final measurements still require later validation.</p>

<hr />

<h2 id="the-larger-shift-energy-is-becoming-a-systems-problem">The larger shift: energy is becoming a systems problem</h2>

<p>The Daily AI Mail coverage makes a useful broader point here: AI sustainability is increasingly becoming an operations and scheduling problem, not just a clean-energy talking point.</p>

<p>That framing feels right to me.</p>

<p>The hard problem is no longer just “make models more efficient in theory.”</p>

<p>It is:</p>

<ul>
  <li>decide where workloads should run</li>
  <li>estimate the cost of those choices quickly</li>
  <li>and make power-aware decisions without slowing the whole workflow down</li>
</ul>

<p>That is a much more practical problem statement.</p>

<p>And it matches the wider infrastructure pressure around AI. MIT notes the Lawrence Berkeley National Laboratory estimate that data centers could consume up to <strong>12% of total U.S. electricity by 2028</strong>. Once the numbers get that large, power estimation stops being a side concern.</p>

<hr />

<h2 id="what-stage-is-this-at-as-of-april-2026">What stage is this at as of April 2026?</h2>

<p>As of <strong>April 2026</strong>, EnergAIzer looks like a promising research result, not a finished industry standard.</p>

<p>Current state:</p>

<ul>
  <li>the MIT News write-up was published on <strong>April 27, 2026</strong></li>
  <li>the arXiv paper was submitted on <strong>April 22, 2026</strong></li>
  <li>the work is being presented at <strong>ISPASS 2026</strong></li>
  <li>the reported results cover real workloads and real GPUs, but the method still needs broader validation across newer configurations and larger multi-GPU settings</li>
</ul>

<p>The authors also explicitly say the next steps are:</p>

<ul>
  <li>testing newer GPU configurations</li>
  <li>scaling the method to many collaborating GPUs</li>
</ul>

<p>So the right reading today is:</p>

<p><strong>important direction, early but credible stage, strong operational relevance</strong></p>

<p>not</p>

<p><strong>problem solved</strong></p>

<hr />

<h2 id="why-i-think-this-matters">Why I think this matters</h2>

<p>What I like about EnergAIzer is that it is not trying to “solve AI sustainability” with one dramatic claim.</p>

<p>It solves a narrower, more useful problem:</p>

<p><strong>give engineers a fast-enough estimate so they can make better infrastructure choices earlier.</strong></p>

<p>That is exactly the kind of systems work that compounds over time.</p>

<p>If teams can make energy-aware decisions before deployment, then efficiency stops being a slogan and starts becoming part of runtime policy.</p>

<p>That is a much better place for the industry to be.</p>

<hr />

<h2 id="credits">Credits</h2>

<ul>
  <li>MIT News, <a href="https://news.mit.edu/2026/faster-way-to-estimate-ai-power-consumption-0427">“A faster way to estimate AI power consumption”</a></li>
  <li>arXiv, <a href="https://arxiv.org/abs/2604.20105">“EnergAIzer: Fast and Accurate GPU Power Estimation Framework for AI Workloads”</a></li>
</ul>]]></content><author><name></name></author><category term="Engineering" /><category term="AI Infrastructure" /><category term="Sustainability" /><summary type="html"><![CDATA[MIT's EnergAIzer looks modest at first glance: a faster way to estimate GPU power. But that small shift matters because AI efficiency is increasingly becoming a scheduling and capacity-planning problem, not just a research talking point.]]></summary></entry><entry><title type="html">TurboQuant Is Important, but the Real Win Is Narrower Than the Headline</title><link href="https://krishna-dhulipalla.github.io/engineering/llm%20systems/inference/2026/04/28/turboquant-is-important-but-the-real-win-is-narrower.html" rel="alternate" type="text/html" title="TurboQuant Is Important, but the Real Win Is Narrower Than the Headline" /><published>2026-04-28T13:30:00+00:00</published><updated>2026-04-28T13:30:00+00:00</updated><id>https://krishna-dhulipalla.github.io/engineering/llm%20systems/inference/2026/04/28/turboquant-is-important-but-the-real-win-is-narrower</id><content type="html" xml:base="https://krishna-dhulipalla.github.io/engineering/llm%20systems/inference/2026/04/28/turboquant-is-important-but-the-real-win-is-narrower.html"><![CDATA[<p>TurboQuant got attention for a good reason.</p>

<p>It targets one of the most painful inference bottlenecks in modern LLM systems:</p>

<p><strong>the KV cache</strong></p>

<p>As context windows get longer, KV-cache memory becomes one of the main limits on:</p>

<ul>
  <li>how much context you can keep</li>
  <li>how many requests you can serve concurrently</li>
  <li>how expensive inference becomes</li>
</ul>
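
<p>A back-of-the-envelope formula shows why. The KV cache stores one key vector and one value vector per layer, per KV head, per token, and per request, so it grows linearly with both context length and concurrency. The model dimensions in the note below are a hypothetical example.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_value=2):
    """Total KV-cache size in bytes.

    The leading 2 counts one key and one value per position; bytes_per_value=2
    corresponds to 16-bit (fp16/bf16) storage.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value
</code></pre></div></div>

<p>For a hypothetical model with 32 layers, 8 KV heads, and 128-dimensional heads, one 131,072-token sequence in 16-bit precision needs 16 GiB of KV cache, and 8 such concurrent requests need 128 GiB. A 6x reduction would bring the single-sequence figure down to roughly 2.7 GiB.</p>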

<p>So a method that promises much smaller memory usage without retraining deserves attention.</p>

<p>It also deserves a more precise reading than the headlines usually give it.</p>

<hr />

<h2 id="what-innovation-turboquant-actually-brings">What TurboQuant actually brings</h2>

<p>The Google Research post frames TurboQuant as a compression method for both:</p>

<ul>
  <li><strong>KV-cache compression</strong> in large language models</li>
  <li><strong>vector search</strong> over high-dimensional embeddings</li>
</ul>

<p>The key technical idea is not one isolated trick. It is the combination of several pieces that work well together.</p>

<h3 id="1-it-removes-a-specific-quantization-tax">1) It removes a specific quantization tax</h3>

<p>Traditional vector quantization often carries hidden memory overhead because it needs extra quantization constants stored in high precision for each small data block.</p>

<p>That overhead sounds small, but when you scale KV caches across long contexts, layers, and many requests, it becomes expensive.</p>

<p>TurboQuant tries to remove that tax.</p>
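
<p>The arithmetic behind that tax is simple. If every block of values stores its own full-precision constants, each value pays an extra <code class="language-plaintext highlighter-rouge">constants_per_block * constant_bits / block_size</code> bits. The block sizes and 16-bit constants below are illustrative choices typical of block-wise quantization, not TurboQuant's configuration.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def effective_bits(bits_per_value, block_size, constant_bits=16, constants_per_block=1):
    """Bits actually stored per value once per-block constants are included.

    constants_per_block=1 models a scale only; 2 models a scale plus a zero point.
    """
    return bits_per_value + constants_per_block * constant_bits / block_size
</code></pre></div></div>

<p>With 3-bit values in blocks of 32 and one 16-bit scale per block, the effective cost is 3.5 bits per value. Storing a scale and a zero point raises it to 4.0 bits, a third more than the nominal 3 bits.</p>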

<h3 id="2-it-combines-polarquant-and-qjl-in-a-useful-way">2) It combines PolarQuant and QJL in a useful way</h3>

<p>Google describes the method in two stages:</p>

<ul>
  <li><strong>PolarQuant</strong> handles most of the compression by rotating vectors and making them easier to quantize cleanly</li>
  <li><strong>QJL (Quantized Johnson-Lindenstrauss)</strong> uses a tiny 1-bit residual correction step to remove bias in inner-product estimation</li>
</ul>
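
<p>The two-stage idea can be sketched in a few lines of NumPy, under loud simplifying assumptions. The rotation is a generic random orthogonal matrix, and stage 1 is a plain uniform scalar quantizer standing in for PolarQuant, which actually works in polar coordinates. Stage 2 follows the QJL form: keep only the signs of a Gaussian projection <code class="language-plaintext highlighter-rouge">S r</code> of the stage-1 residual <code class="language-plaintext highlighter-rouge">r</code>, plus its norm, and estimate the query's inner product with <code class="language-plaintext highlighter-rouge">r</code> as <code class="language-plaintext highlighter-rouge">sqrt(pi/2) / m * norm(r) * dot(S q, sign(S r))</code>, which is unbiased for Gaussian <code class="language-plaintext highlighter-rouge">S</code>. All function names and parameters are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def random_rotation(d, rng):
    """Random orthogonal matrix: QR of a Gaussian matrix, with a sign fix on the columns."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))


def scalar_quantize(x, bits, clip):
    """Uniform quantizer with 2**bits levels on [-clip, clip]. A stand-in, NOT PolarQuant."""
    step = 2 * clip / (2 ** bits - 1)
    return np.round((np.clip(x, -clip, clip) + clip) / step) * step - clip


def qjl_encode(residual, S):
    """1-bit sketch: signs of a Gaussian projection plus the residual's norm."""
    return np.sign(S @ residual), np.linalg.norm(residual)


def qjl_inner_product(query, signs, norm, S):
    """Unbiased estimate of dot(query, residual) from the 1-bit sketch."""
    m = S.shape[0]
    return np.sqrt(np.pi / 2) / m * norm * ((S @ query) @ signs)


def approx_score(query, key, bits=3, m=256, seed=0):
    """Two-stage estimate of the attention logit dot(query, key).

    Returns (stage-1 estimate, stage-1 estimate plus QJL residual correction).
    """
    rng = np.random.default_rng(seed)
    d = key.shape[0]
    R = random_rotation(d, rng)
    q_rot, k_rot = R @ query, R @ key                  # rotations preserve inner products
    clip = 3 * np.linalg.norm(k_rot) / np.sqrt(d)      # ~3 std devs of a rotated coordinate
    k_hat = scalar_quantize(k_rot, bits, clip)         # stage 1: most of the bits
    S = rng.standard_normal((m, d))
    signs, norm = qjl_encode(k_rot - k_hat, S)         # stage 2: keep only signs of S r
    coarse = q_rot @ k_hat
    return coarse, coarse + qjl_inner_product(q_rot, signs, norm, S)
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">approx_score</code> returns both the stage-1 estimate and the corrected one, so the effect of the 1-bit residual correction can be compared directly against the exact <code class="language-plaintext highlighter-rouge">query @ key</code>.</p>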

<p>That combination matters because compression alone is not enough. For attention to keep working well, the compressed representation still needs to preserve the relationships that matter for attention scores.</p>

<p>That is where TurboQuant looks more careful than many “just quantize harder” stories.</p>

<h3 id="3-it-is-training-free">3) It is training-free</h3>

<p>One reason this work stands out is that it does not ask teams to retrain or fine-tune models first.</p>

<p>That makes it more operationally interesting.</p>

<p>If a method can be layered onto existing models and inference stacks, it becomes easier to imagine real adoption.</p>

<hr />

<h2 id="why-engineers-care-about-this">Why engineers care about this</h2>

<p>The engineering appeal is straightforward.</p>

<p>If KV-cache memory drops enough, then a team can potentially:</p>

<ul>
  <li>run longer contexts on the same hardware</li>
  <li>increase concurrency</li>
  <li>reduce memory pressure</li>
  <li>lower serving cost for long-context tasks</li>
</ul>

<p>That matters for workloads like:</p>

<ul>
  <li>large-document question answering</li>
  <li>long codebase analysis</li>
  <li>extended chat sessions</li>
  <li>retrieval-heavy agent workflows</li>
</ul>

<p>These are exactly the cases where memory, not raw parameter count, often becomes the harder limit.</p>

<hr />

<h2 id="the-reality-is-still-very-good-just-more-specific">The reality is still very good, just more specific</h2>

<p>The Google Research blog reports strong benchmark results:</p>

<ul>
  <li>at least <strong>6x KV-memory reduction</strong></li>
  <li>up to <strong>8x faster attention-logit computation</strong> on H100 GPUs</li>
  <li>high or near-lossless downstream performance on long-context tasks</li>
</ul>

<p>Those are serious results.</p>

<p>But the Two Minute Papers summary adds useful engineering realism around what that means in practice.</p>

<p>The most useful takeaway from that analysis is not “the claims are wrong.”</p>

<p>It is:</p>

<p><strong>the biggest gains seem to show up in the workloads that are actually bottlenecked by KV-cache memory and long-context attention.</strong></p>

<p>That is an important distinction.</p>

<p>Early hands-on results summarized there suggest something closer to:</p>

<ul>
  <li>roughly <strong>30-40% memory reduction</strong> in more realistic usage</li>
  <li>roughly <strong>40% speed improvement</strong> on prompt processing in those same practical settings</li>
</ul>

<p>That is smaller than the headline number, but still highly meaningful.</p>

<p>And honestly, that is often how infrastructure advances work. The lab headline points to the ceiling. The deployment value comes from where the gains remain durable after real constraints show up.</p>

<hr />

<h2 id="what-i-think-the-right-interpretation-is">What I think the right interpretation is</h2>

<p>TurboQuant looks strongest to me in three ways.</p>

<h3 id="1-it-goes-after-a-real-bottleneck">1) It goes after a real bottleneck</h3>

<p>A lot of AI optimization stories feel abstract. This one does not.</p>

<p>KV-cache growth is a concrete cost and capacity problem in long-context inference.</p>

<h3 id="2-it-improves-economics-without-asking-for-retraining">2) It improves economics without asking for retraining</h3>

<p>That makes the idea much more deployable than methods that only look good after heavy model adaptation.</p>

<h3 id="3-it-broadens-the-efficiency-conversation">3) It broadens the efficiency conversation</h3>

<p>The bigger point is not just one algorithm.</p>

<p>It is that <strong>inference efficiency is increasingly about memory movement, cache structure, and data representation</strong>, not only about model weights or FLOPs.</p>

<p>That shift matters.</p>

<hr />

<h2 id="what-stage-is-turboquant-at-as-of-april-2026">What stage is TurboQuant at as of April 2026?</h2>

<p>As of <strong>April 2026</strong>, TurboQuant looks like a strong research result with growing practical interest, but it is still early in deployment terms.</p>

<p>Current stage:</p>

<ul>
  <li>the Google Research post was published on <strong>March 24, 2026</strong></li>
  <li>the paper is accepted at <strong>ICLR 2026</strong></li>
  <li>the underlying paper has been available on arXiv since <strong>April 28, 2025</strong></li>
  <li>community analysis and early reproductions exist</li>
  <li>framework-level adoption still looks early and uneven</li>
</ul>

<p>So the current status is not “universally deployed new standard.”</p>

<p>It is more:</p>

<p><strong>credible technique, meaningful benchmarks, growing external validation, early ecosystem integration</strong></p>

<p>That is already enough to make it important.</p>

<hr />

<h2 id="my-read-on-the-significance">My read on the significance</h2>

<p>I do not think TurboQuant needs exaggerated framing to be impressive.</p>

<p>The innovation is real:</p>

<ul>
  <li>cleaner low-bit compression</li>
  <li>zero-overhead design goals</li>
  <li>strong attention-quality preservation</li>
  <li>relevance for both KV caches and vector search</li>
</ul>

<p>And the practical reality is still strong even if the best-case numbers are not what every workload will see.</p>

<p>For teams working on long-context inference, this looks like one of the more consequential efficiency directions from the last cycle.</p>

<p>Not because it changes everything overnight.</p>

<p>Because it improves one very expensive part of the stack in a way that looks mathematically grounded and operationally useful.</p>

<p>That is enough.</p>

<hr />

<h2 id="credits">Credits</h2>

<ul>
  <li>Google Research, <a href="https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/">“TurboQuant: Redefining AI efficiency with extreme compression”</a></li>
  <li>arXiv, <a href="https://arxiv.org/abs/2504.19874">“TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate”</a></li>
</ul>]]></content><author><name></name></author><category term="Engineering" /><category term="LLM Systems" /><category term="Inference" /><summary type="html"><![CDATA[TurboQuant is one of the more interesting efficiency ideas from this cycle because it targets a real bottleneck: KV-cache memory. The innovation is real. The practical benefit also looks real. But the value is clearest in long-context, memory-bound workloads rather than as a universal compression miracle.]]></summary></entry><entry><title type="html">PostgreSQL Might Be the Most Underrated Tool in Your Stack</title><link href="https://krishna-dhulipalla.github.io/engineering/databases/systems/2026/04/21/postgresql-is-the-most-underrated-tool-in-your-stack.html" rel="alternate" type="text/html" title="PostgreSQL Might Be the Most Underrated Tool in Your Stack" /><published>2026-04-21T15:30:00+00:00</published><updated>2026-04-21T15:30:00+00:00</updated><id>https://krishna-dhulipalla.github.io/engineering/databases/systems/2026/04/21/postgresql-is-the-most-underrated-tool-in-your-stack</id><content type="html" xml:base="https://krishna-dhulipalla.github.io/engineering/databases/systems/2026/04/21/postgresql-is-the-most-underrated-tool-in-your-stack.html"><![CDATA[<p>Modern software teams often assemble a stack by default:</p>

<ul>
  <li>a database</li>
  <li>a cache</li>
  <li>a cron product</li>
  <li>a search tool</li>
  <li>a vector database</li>
  <li>an auth service</li>
  <li>an analytics pipeline</li>
  <li>an API layer sitting in front of all of it</li>
</ul>

<p>Sometimes that is the right call.</p>

<p>But the more I look at PostgreSQL, the more it feels like we often underestimate how much it can already do before we start adding extra tools.</p>

<p>This is not a “replace everything with Postgres” manifesto. It is more that I was surprised to realize there is already a free diamond sitting in the stack, and a lot of us barely use it.</p>

<hr />

<h2 id="why-postgres-gets-underestimated">Why Postgres gets underestimated</h2>

<p>A lot of people still mentally file PostgreSQL under “boring relational database.”</p>

<p>I used to think about it that way too.</p>

<p>But PostgreSQL is closer to a general-purpose data platform, with:</p>

<ul>
  <li>relational data</li>
  <li>JSONB for semi-structured application data</li>
  <li>full-text search</li>
  <li>extensions for vectors, scheduling, crypto, GraphQL, and more</li>
  <li>row-level security and mature indexing options</li>
  <li>a huge operational knowledge base because it has been battle-tested for years</li>
</ul>

<p>What keeps standing out to me is that this is already a lot of capability in one place.</p>

<p>For many projects, that can mean a simpler setup earlier on.</p>

<hr />

<h2 id="the-part-that-made-me-pause">The part that made me pause</h2>

<p>We sometimes pay for separate services for problems that Postgres might already cover well enough.</p>

<p>Not perfectly. Not always at hyperscale. But often well enough.</p>

<h3 id="1-scheduled-jobs">1) Scheduled jobs</h3>

<p>Need recurring cleanup, backfills, rollups, or TTL-style maintenance jobs?</p>

<p><code class="language-plaintext highlighter-rouge">pg_cron</code> can schedule SQL directly inside the database. That is not the same as a full workflow engine, but it did make me wonder how often people reach for a separate scheduler before they need one.</p>

<h3 id="2-search">2) Search</h3>

<p>For a lot of apps, the first search problem is not “we need Elasticsearch.”</p>

<p>It is more like “we need users to find records quickly, handle a bit of fuzziness, and get decent results.”</p>

<p>Postgres already gives you solid primitives: full-text search with <code class="language-plaintext highlighter-rouge">tsvector</code> and GIN indexes, plus the <code class="language-plaintext highlighter-rouge">pg_trgm</code> extension for trigram-based fuzzy matching. If the use case is product search, notes, documents, or internal lookup at a modest scale, that might be enough for much longer than expected.</p>

<h3 id="3-vector-retrieval">3) Vector retrieval</h3>

<p>If you are building retrieval-augmented generation or semantic search, <code class="language-plaintext highlighter-rouge">pgvector</code> changes the conversation quite a bit. Suddenly the default architecture does not always have to be app DB plus separate vector DB from day one.</p>

<p>Having embeddings live next to product data can be simpler and easier to reason about, especially early on.</p>

<h3 id="4-cache-like-behavior">4) Cache-like behavior</h3>

<p>The video also points out that Postgres can imitate some cache use cases with unlogged tables and expiration logic.</p>

<p>That does not mean Postgres is Redis.</p>

<p>It mostly made me think that some teams probably reach for Redis before they have a real Redis problem.</p>

<h3 id="5-api-surface">5) API surface</h3>

<p>With tools like PostgREST or GraphQL layers tied closely to Postgres, a big chunk of CRUD API work can become much thinner. That does not eliminate application logic, but it can remove a lot of repetitive plumbing.</p>

<h3 id="6-auth-adjacent-primitives">6) Auth-adjacent primitives</h3>

<p>Postgres is not a complete auth product in a box, but row-level security, crypto utilities, and token-related patterns can cover more of the access-control side than I used to assume.</p>

<p>That matters because a lot of “auth” problems are really policy and data-access problems.</p>
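<p>Once row-level security is enabled on a table, each policy's <code class="language-plaintext highlighter-rouge">USING</code> expression filters which rows a user can see, and permissive policies are combined with OR. For ordinary roles, having no applicable policy means seeing no rows at all. The Python below is a toy model of that filtering rule only. The <code class="language-plaintext highlighter-rouge">Policy</code> type and the sample predicates are hypothetical. Real policies are SQL expressions that Postgres evaluates inside each query.</p>

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class Policy:
    name: str
    # Predicate over (row, current_user), analogous to a policy's USING expression.
    using: Callable[[Row, str], bool]


def visible_rows(rows: list[Row], policies: list[Policy], current_user: str) -> list[Row]:
    # Permissive policies are OR-ed together; an empty policy list denies everything.
    return [row for row in rows if any(p.using(row, current_user) for p in policies)]


docs = [
    {"id": 1, "owner": "alice"},
    {"id": 2, "owner": "bob"},
    {"id": 3, "owner": "carol", "shared": True},
]
owner_only = Policy("owner_only", lambda row, user: row["owner"] == user)
shared = Policy("shared", lambda row, user: row.get("shared", False))
```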

<hr />

<h2 id="what-openais-architecture-made-me-rethink">What OpenAI’s architecture made me rethink</h2>

<p>The OpenAI engineering post was the biggest reason I wanted to write this at all, because it pushes back on the easy assumption that Postgres is only for smaller workloads.</p>

<p>OpenAI says PostgreSQL has been one of the critical under-the-hood data systems for ChatGPT and the API platform. Over the last year, their PostgreSQL load grew by more than 10x, and they describe scaling it to support read-heavy workloads for roughly 800 million ChatGPT users.</p>

<p>What stood out to me was not just the number. It was how much careful engineering went into making that work.</p>

<p>OpenAI describes a setup centered on:</p>

<ul>
  <li>a single primary Azure PostgreSQL flexible server</li>
  <li>nearly 50 read replicas across regions</li>
  <li>aggressive read offloading</li>
  <li>PgBouncer for connection pooling</li>
  <li>cache-locking to avoid miss storms</li>
  <li>rate limiting across multiple layers</li>
  <li>workload isolation for noisy neighbors</li>
  <li>careful query tuning to avoid expensive joins and ORM-generated badness</li>
</ul>
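<p>Cache locking, also called request coalescing or single-flight, means that when a hot key misses, only one caller recomputes it while the others wait for that result. Without it, every caller hits the database at once. Below is a minimal thread-based Python sketch that assumes an in-process cache. It is not OpenAI's implementation, and a setup with many application servers would need the lock to live somewhere shared.</p>

```python
import threading
from typing import Any, Callable


class SingleFlightCache:
    """In-process cache where concurrent misses on the same key trigger one load."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self.loader = loader  # the expensive call, e.g. a database query
        self.values: dict[str, Any] = {}
        self.key_locks: dict[str, threading.Lock] = {}
        self.guard = threading.Lock()  # protects key_locks itself

    def _lock_for(self, key: str) -> threading.Lock:
        with self.guard:
            return self.key_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> Any:
        if key in self.values:
            return self.values[key]  # fast path: cache hit, no locking
        with self._lock_for(key):
            # Re-check: another thread may have filled the key while we waited.
            if key not in self.values:
                self.values[key] = self.loader(key)
            return self.values[key]
```

<p>The re-check inside the lock is what turns a storm of simultaneous misses into a single load.</p>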

<p>That is what I found most interesting. It is not “Postgres magically scales.” It is more like “Postgres can go very far when the workload shape is understood and the surrounding engineering is careful.”</p>

<p>There is also an important limit in the same article.</p>

<p>OpenAI is explicit that PostgreSQL is not the answer to everything in their stack. They moved shardable, write-heavy workloads to systems like Azure Cosmos DB and now default new workloads there instead of piling every new table onto the existing PostgreSQL deployment.</p>

<p>That nuance felt important to me:</p>

<ul>
  <li>Postgres can do more than I think many of us assume.</li>
  <li>Postgres is still not a free pass to ignore workload shape.</li>
</ul>

<p>For read-heavy systems with good query discipline, replica strategy, caching, and connection management, it seems like it can go much further than a lot of people expect.</p>

<hr />

<h2 id="the-real-takeaway">The real takeaway</h2>

<p>The interesting lesson to me is not “replace your whole tech stack with Postgres.”</p>

<p>It is more “maybe we should slow down before adding new infrastructure.”</p>

<p>Before buying another SaaS product, I think it is worth asking:</p>

<ul>
  <li>Can Postgres already do enough of this?</li>
  <li>Is the simpler architecture better for this stage of the product?</li>
  <li>Are we solving a real problem, or just copying a stack we saw somewhere?</li>
</ul>

<p>I keep coming back to that because modern stacks can make it feel like every feature needs its own tool:</p>

<ul>
  <li>search needs a search company</li>
  <li>AI needs a separate vector platform immediately</li>
  <li>every recurring job needs an external scheduler</li>
  <li>the database should only store rows and nothing more</li>
</ul>

<p>Sometimes that is true. A lot of times, it might just be extra complexity.</p>

<hr />

<h2 id="credits">Credits</h2>

<ul>
  <li>Fireship, <a href="https://www.youtube.com/watch?v=3JW732GrMdg">“I replaced my entire tech stack with Postgres…”</a></li>
  <li>OpenAI, <a href="https://openai.com/index/scaling-postgresql/">“Scaling PostgreSQL to power 800 million ChatGPT users”</a></li>
</ul>]]></content><author><name></name></author><category term="Engineering" /><category term="Databases" /><category term="Systems" /><summary type="html"><![CDATA[PostgreSQL is still treated like 'just the database,' even though it can absorb surprising amounts of search, scheduling, vector, auth-adjacent, and API work. OpenAI's own PostgreSQL scaling story is a good reminder: this tool has far more headroom than most teams assume.]]></summary></entry><entry><title type="html">Building CoreLink AI: An Evidence-Grounded Reasoning Engine That Knows When to Search, Compute, and Stop</title><link href="https://krishna-dhulipalla.github.io/engineering/llm%20systems/agents/2026/04/21/building-corelink-ai.html" rel="alternate" type="text/html" title="Building CoreLink AI: An Evidence-Grounded Reasoning Engine That Knows When to Search, Compute, and Stop" /><published>2026-04-21T15:00:00+00:00</published><updated>2026-04-21T15:00:00+00:00</updated><id>https://krishna-dhulipalla.github.io/engineering/llm%20systems/agents/2026/04/21/building-corelink-ai</id><content type="html" xml:base="https://krishna-dhulipalla.github.io/engineering/llm%20systems/agents/2026/04/21/building-corelink-ai.html"><![CDATA[<p>Most agent demos look convincing right up until the task becomes evidence-heavy.</p>

<p>That is where I kept seeing the same pattern:</p>

<ul>
  <li>the model answered from recall when it should have retrieved</li>
  <li>the runtime called tools without a clear strategy</li>
  <li>the system kept looping even after the evidence quality had clearly degraded</li>
</ul>

<p>I built <strong>CoreLink AI</strong> to address that exact failure mode.</p>

<p>CoreLink is a modular reasoning engine for <strong>evidence-grounded analytical tasks</strong>. The core idea is simple: if correctness depends on finding the right evidence, structuring it properly, and computing over it carefully, then the runtime needs stronger control logic than “prompt the model and hope it reasons well.”</p>

<p>In practice, that meant designing a system that can:</p>

<ul>
  <li>choose retrieval strategies intentionally</li>
  <li>enforce a stronger semantic contract before retrieval begins</li>
  <li>normalize raw evidence into typed structures</li>
  <li>prefer deterministic compute over free-form generation</li>
  <li>acquire missing compute capability in a bounded way when the built-in operation set is not enough</li>
  <li>recover from weak reasoning paths without looping forever</li>
  <li>learn from recent successful and failed strategies across tasks</li>
  <li>refuse to answer when the evidence is not good enough</li>
</ul>

<p>That last point matters more than most people admit.</p>

<hr />

<h2 id="the-3-engineering-lessons-that-shaped-corelink">The 3 engineering lessons that shaped CoreLink</h2>

<h3 id="1-retrieval-is-not-a-single-step-it-is-a-policy-decision">1) Retrieval is not a single step. It is a policy decision.</h3>

<p>One of the most common mistakes in agent systems is treating retrieval as a generic primitive: send a search query, grab top-k results, and let the model sort it out.</p>

<p>That works for shallow tasks. It breaks down on document-heavy analytical work, especially when the answer lives inside tables, multi-page reports, or semi-structured evidence.</p>

<p>CoreLink handles this by selecting retrieval strategies based on the <strong>shape of the task</strong>. But the current architecture pushes this one step earlier: before retrieval begins, the runtime builds a more explicit semantic contract around the question itself.</p>

<p>That includes things like:</p>

<ul>
  <li>evidence period vs publication period</li>
  <li>aggregation period</li>
  <li>display unit basis</li>
  <li>include/exclude constraints</li>
  <li>semantic completeness gaps that should block naive retrieval</li>
</ul>

<p>Only then does strategy selection begin. Depending on the question, the runtime can favor:</p>

<ul>
  <li>table-first retrieval</li>
  <li>text-first retrieval</li>
  <li>hybrid search</li>
  <li>multi-document evidence gathering</li>
</ul>

<p>Instead of assuming one universal search path, the engine treats retrieval as an adaptive stage in the reasoning loop.</p>

<p>This changed the system from “search and summarize” into something closer to <strong>search, test, refine, and only then proceed</strong>.</p>
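<p>Here is a hypothetical sketch of that ordering, not CoreLink's actual planner. The semantic contract is captured as a typed object, retrieval is refused while the contract has completeness gaps, and only then is the contract mapped to a retrieval regime. The field names, the rules, and the strategy labels are my own assumptions, chosen to mirror the lists above.</p>

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SemanticContract:
    # Hypothetical fields mirroring the constraints listed above.
    evidence_period: Optional[str] = None     # e.g. "FY2022"
    publication_period: Optional[str] = None  # e.g. "2023 annual report"
    aggregation_period: Optional[str] = None  # e.g. "annual" vs "monthly"
    display_unit: Optional[str] = None        # e.g. "millions USD"
    include: list[str] = field(default_factory=list)
    exclude: list[str] = field(default_factory=list)
    completeness_gaps: list[str] = field(default_factory=list)
    numeric: bool = False
    source_count: int = 1


def select_strategy(contract: SemanticContract) -> str:
    """Map a contract to a retrieval regime; illustrative rules only."""
    if contract.completeness_gaps:
        # Block naive retrieval until the contract is complete.
        return "blocked:" + ",".join(contract.completeness_gaps)
    if contract.source_count > 1:
        return "multi_document"
    if contract.numeric and (contract.aggregation_period or contract.display_unit):
        return "table_first"
    if contract.numeric:
        return "hybrid"
    return "text_first"
```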

<hr />

<h3 id="2-tool-use-without-bounded-control-is-just-a-more-expensive-hallucination-loop">2) Tool use without bounded control is just a more expensive hallucination loop.</h3>

<p>Adding tools to an agent does not automatically make it reliable.</p>

<p>In fact, tool-rich systems often fail in a more confusing way: they look grounded because they called APIs or fetched documents, but they still produce weak answers because the runtime has no strong policy for:</p>

<ul>
  <li>when to search again</li>
  <li>when to rotate strategy</li>
  <li>when to compute</li>
  <li>when to repair</li>
  <li>when to stop</li>
</ul>

<p>CoreLink uses bounded control loops instead of open-ended tool chaining.</p>

<p>The runtime plans the task, selects a tool family, shortlists candidates, arbitrates the evidence, extracts structured signals, computes where possible, and then validates whether the result is actually strong enough to return.</p>

<p>If the path is weak, it does not blindly repeat the same move. It can:</p>

<ul>
  <li>perform local reselection</li>
  <li>restart within the same document</li>
  <li>restart across documents</li>
  <li>rotate retrieval strategy</li>
  <li>acquire missing compute capability</li>
  <li>invoke bounded repair logic</li>
  <li>fall back to final synthesis only as a bounded last move</li>
</ul>

<p>The current architecture makes this more explicit by turning repair into <strong>typed regime mutation</strong>. The runtime now records what changed, detects when a retry produced no material change, and avoids running the same failed path under a slightly different name.</p>

<p>That distinction is important. Recovery should be <strong>typed and constrained</strong>, not a polite word for “ask the model again.”</p>
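<p>Here is a hypothetical Python sketch of what “typed and constrained” recovery can mean in code. It is not CoreLink's implementation. Each repair is a named mutation of a frozen regime. A mutation that yields an already-tried regime is logged as “no material change” and skipped. The loop has a fixed budget, and final synthesis only happens once that budget is spent. The regime fields and mutation names are illustrative assumptions.</p>

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable


class Mutation(Enum):
    LOCAL_RESELECTION = "local_reselection"
    RESTART_SAME_DOCUMENT = "restart_same_document"
    RESTART_CROSS_DOCUMENT = "restart_cross_document"
    ROTATE_STRATEGY = "rotate_strategy"


@dataclass(frozen=True)
class Regime:
    strategy: str   # e.g. "table_first" or "text_first"
    scope: str      # "same_document" or "cross_document"
    shortlist: int  # how many candidates the arbiter is shown


def mutate(regime: Regime, m: Mutation) -> Regime:
    # Each typed mutation changes one named part of the regime.
    if m is Mutation.LOCAL_RESELECTION:
        return replace(regime, shortlist=regime.shortlist + 5)
    if m is Mutation.RESTART_SAME_DOCUMENT:
        return replace(regime, scope="same_document")
    if m is Mutation.RESTART_CROSS_DOCUMENT:
        return replace(regime, scope="cross_document")
    flipped = "text_first" if regime.strategy == "table_first" else "table_first"
    return replace(regime, strategy=flipped)


def run_bounded(start: Regime, attempt: Callable[[Regime], bool],
                plan: list[Mutation], budget: int = 4) -> tuple[str, list[str]]:
    """Try the start regime, then at most `budget` planned mutations, then stop."""
    current, tried, log = start, {start}, []
    if attempt(current):
        return "answered", log
    for m in plan[:budget]:
        candidate = mutate(current, m)
        if candidate in tried:
            # Same regime under a different label: record it, do not rerun it.
            log.append(f"{m.value}: no material change, skipped")
            continue
        tried.add(candidate)
        current = candidate
        log.append(f"{m.value}: {current}")
        if attempt(current):
            return "answered", log
    # Budget spent: final synthesis is the bounded last move, not another retry.
    return "final_synthesis", log
```

<p>Consider a start regime of <code class="language-plaintext highlighter-rouge">Regime("table_first", "same_document", 5)</code> and a plan that begins with a same-document restart. The first step is logged as a skipped no-op instead of re-running an identical path.</p>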

<hr />

<h3 id="3-if-the-answer-is-not-auditable-it-is-not-production-grade">3) If the answer is not auditable, it is not production-grade.</h3>

<p>For analytical systems, a fluent answer is not the same thing as a trustworthy one.</p>

<p>I wanted the runtime to produce outputs that are backed by visible evidence and, where possible, exact computation. That led to three design choices that became central to the project:</p>

<ul>
  <li><strong>structured evidence extraction</strong></li>
  <li><strong>deterministic compute first</strong></li>
  <li><strong>lightweight capability acquisition for compute</strong></li>
</ul>

<p>Once retrieval produces candidate material, CoreLink normalizes it into evidence that downstream logic can validate and compute over. When the question is numeric, the system prefers deterministic logic instead of free-form model arithmetic.</p>

<p>And when native deterministic compute is not enough, the runtime can synthesize a <strong>small constrained compute function</strong>, validate it against real structured evidence and simple checks, cache it by operation signature, and then use it as a bounded fallback.</p>

<p>That gives the system a useful middle ground between “unsupported” and “let the LLM do the math.” The generated function is still treated as a deterministic artifact: constrained, validated, cached, and traceable in compute provenance.</p>

<p>This is a much more useful reliability pattern than asking a larger model to “be careful with the math.”</p>
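<p>Here is a minimal sketch of that acquisition path, assuming a validate-then-cache design. A candidate function for an operation signature is checked against known input/output pairs and cached only if every check passes. Every decision is appended to a provenance log. In CoreLink the candidate comes from constrained synthesis. Here, a fixed lookup table with one deliberately wrong entry stands in for the generator, and all names are hypothetical.</p>

```python
import math
from typing import Callable, Optional

ComputeFn = Callable[[list[float]], float]
Check = tuple[list[float], float]  # (inputs, expected output)


class CapabilityCache:
    """Validated compute functions cached by operation signature (hypothetical design)."""

    def __init__(self, synthesize: Callable[[str], Optional[ComputeFn]]) -> None:
        self.synthesize = synthesize  # stand-in for constrained code generation
        self.functions: dict[str, ComputeFn] = {}
        self.provenance: list[str] = []

    def get(self, signature: str, checks: list[Check]) -> Optional[ComputeFn]:
        if signature in self.functions:
            self.provenance.append(f"{signature}: cache hit")
            return self.functions[signature]
        fn = self.synthesize(signature)
        # Reject missing candidates and any candidate that fails a known check.
        if fn is None or not all(math.isclose(fn(x), want) for x, want in checks):
            self.provenance.append(f"{signature}: rejected")
            return None
        self.functions[signature] = fn
        self.provenance.append(f"{signature}: synthesized and validated")
        return fn


# Stand-in "generator": a fixed table, including one deliberately wrong function.
CANDIDATES: dict[str, ComputeFn] = {
    "percent_change": lambda v: (v[1] - v[0]) / v[0] * 100,
    "ratio": lambda v: v[0] - v[1],  # buggy on purpose: subtracts instead of dividing
}
cache = CapabilityCache(CANDIDATES.get)
```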

<hr />

<h2 id="the-architecture-mindset">The architecture mindset</h2>

<p>CoreLink is built around a few stable boundaries:</p>

<ul>
  <li><strong>constraint-sensitive semantic planning</strong> to define what the task is really asking for before retrieval starts</li>
  <li><strong>strategy kernel</strong> to choose and rotate retrieval regimes intentionally</li>
  <li><strong>candidate generation and LLM-authoritative evidence arbitration</strong> to narrow evidence intentionally</li>
  <li><strong>structured extraction</strong> to turn raw material into compute-ready signals</li>
  <li><strong>deterministic or synthesized compute</strong> to produce exact outputs when possible</li>
  <li><strong>validation, answerability policy, and typed recovery</strong> to decide whether to finalize, revise, rotate, or fail safely</li>
</ul>

<p>At a high level, the runtime flow looks like this:</p>

<p><code class="language-plaintext highlighter-rouge">intake -&gt; semantic planner -&gt; strategy selector -&gt; candidate generation -&gt; evidence arbiter -&gt; structured extraction -&gt; deterministic or synthesized compute -&gt; validator -&gt; strategy rotation or completion</code></p>

<p>The goal was not to make the runtime look clever. The goal was to make failure modes inspectable.</p>

<p>That is why the system emphasizes:</p>

<ul>
  <li>modular boundaries instead of one giant prompt</li>
  <li>semantic completeness audits before retrieval</li>
  <li>authoritative LLM arbitration over shortlisted evidence instead of layered heuristic tie-breakers</li>
  <li>explicit answerability policy instead of casual fallback answers</li>
  <li>bounded repair instead of recursive improvisation</li>
  <li>journaled strategy outcomes instead of stateless retries</li>
</ul>

<p>Another feature I care about is the <strong>cross-task strategy journal</strong>. The runtime records strategy choice, evidence quality, compute status, validator outcome, and final success or failure, then uses those recent patterns as priors for later tasks in the same process. It is intentionally lightweight and local, but it gives the system memory about what has actually been working.</p>
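<p>One lightweight way such a journal could turn recent outcomes into priors is a smoothed success rate per strategy over a bounded window. The Python below is a hypothetical sketch of that idea, not CoreLink's scoring. It keeps the last <code class="language-plaintext highlighter-rouge">window</code> outcomes in memory and scores each strategy as (successes + 1) / (attempts + 2), so unseen strategies start at 0.5. It then ranks candidates by that prior. The evidence-quality and compute fields are recorded but not used by this simple scorer.</p>

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Outcome:
    strategy: str
    evidence_quality: float  # 0..1, as scored by the validator (assumed scale)
    compute_ok: bool
    success: bool


class StrategyJournal:
    """Process-local journal of recent strategy outcomes, used as selection priors."""

    def __init__(self, window: int = 50) -> None:
        self.recent: deque[Outcome] = deque(maxlen=window)  # oldest entries fall off

    def record(self, outcome: Outcome) -> None:
        self.recent.append(outcome)

    def prior(self, strategy: str) -> float:
        # Laplace-smoothed success rate, so unseen strategies start at 0.5.
        runs = [o for o in self.recent if o.strategy == strategy]
        wins = sum(o.success for o in runs)
        return (wins + 1) / (len(runs) + 2)

    def rank(self, candidates: list[str]) -> list[str]:
        return sorted(candidates, key=self.prior, reverse=True)
```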

<p>I also wanted the runtime to stay flexible across domains. So the architecture leans on <strong>A2A and MCP-style tool integration</strong>, where tool capabilities can be discovered and invoked cleanly without baking domain routing rules into the core engine.</p>

<p>That keeps the reasoning policy separate from the concrete tool surface.</p>

<p><img src="/assets/corelink_diagram.png" alt="CoreLink AI Architecture" />
<em>Figure: CoreLink AI is organized around planning, retrieval strategy, evidence extraction, compute, and bounded recovery rather than a single unconstrained reasoning loop.</em></p>

<hr />

<h2 id="why-officeqa-became-an-important-stress-test">Why OfficeQA became an important stress test</h2>

<p>One of the most useful environments for hardening the engine has been <strong>OfficeQA-style document reasoning</strong>.</p>

<p>This class of workload is uncomfortable in exactly the right ways:</p>

<ul>
  <li>answers are buried inside dense source material</li>
  <li>tables matter as much as prose</li>
  <li>extraction quality directly affects compute quality</li>
  <li>weak retrieval can look plausible for several steps before failing</li>
</ul>

<p>That makes it a strong benchmark for whether the runtime is actually grounded, or just producing convincing language around partial evidence.</p>

<p>Working through these tasks pushed CoreLink toward more disciplined semantic planning, strategy rotation with explicit exhaustion policy, LLM-authoritative evidence arbitration, better evidence normalization, deterministic table-aware compute, compute-capability acquisition, and stronger regression testing through smoke and benchmark harnesses.</p>

<p>In other words, the benchmark did not just measure the system. It shaped the system.</p>

<hr />

<h2 id="the-real-design-goal-know-when-to-stop">The real design goal: know when to stop</h2>

<p>The most underrated feature in agent systems is not deeper reasoning. It is disciplined termination.</p>

<p>CoreLink treats failure-to-answer as a valid terminal state when the evidence is weak, conflicting, or incomplete. But the runtime also sharpens this idea: in benchmark settings where the corpus is assumed to be answerable, an insufficiency answer is not treated as a routine safe fallback. It is treated as a <strong>runtime failure diagnosis</strong> that should only appear after explicit exhaustion proof.</p>

<p>A system that always produces an answer is easy to demo.</p>

<p>A system that can say, with justification, “the evidence is not sufficient to answer this reliably” is much harder to build and much more useful in practice. And a benchmark system that can distinguish <strong>true exhaustion</strong> from <strong>premature surrender</strong> is even better.</p>
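<p>The line between true exhaustion and premature surrender can be written as a small decision function. This is a hypothetical sketch of such a policy, not CoreLink's code. An insufficiency outcome is only reachable once every required strategy has been exhausted. When the corpus is assumed answerable, that outcome is reported as a runtime failure rather than as a safe fallback.</p>

```python
from enum import Enum


class Terminal(Enum):
    ANSWER = "answer"
    CONTINUE = "continue"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"
    RUNTIME_FAILURE = "runtime_failure"


def decide(validated: bool, exhausted: set[str], required: set[str],
           corpus_assumed_answerable: bool) -> Terminal:
    """Hypothetical answerability policy; names and rules are illustrative."""
    if validated:
        return Terminal.ANSWER
    if not required.issubset(exhausted):
        # No exhaustion proof yet: stopping now would be premature surrender.
        return Terminal.CONTINUE
    if corpus_assumed_answerable:
        # Every required strategy failed on an answerable corpus: blame the runtime.
        return Terminal.RUNTIME_FAILURE
    return Terminal.INSUFFICIENT_EVIDENCE
```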

<p>That is the reliability bar I wanted this project to meet.</p>

<hr />

<h2 id="why-this-project-matters-to-me">Why this project matters to me</h2>

<p>CoreLink AI is my attempt to move beyond the usual agent pattern of “LLM + tools + retries.”</p>

<p>I wanted a runtime where:</p>

<ul>
  <li>evidence beats recall</li>
  <li>computation beats improvised arithmetic</li>
  <li>recovery is explicit</li>
  <li>stopping is a first-class decision</li>
  <li>outputs are easier to inspect, debug, and trust</li>
</ul>

<p>There is still plenty to improve, but the architecture now reflects a clearer engineering stance:</p>

<p><strong>reasoning systems should not just be powerful. They should be bounded, inspectable, and honest about uncertainty.</strong></p>

<hr />

<h2 id="project-links">Project links</h2>

<p>I built this as an open-source reasoning engine:</p>

<p><strong>GitHub:</strong> <a href="https://github.com/krishna-dhulipalla/CoreLink-AI">CoreLink AI</a><br />
<strong>Repository README:</strong> <a href="https://github.com/krishna-dhulipalla/CoreLink-AI#readme">Read the project overview</a></p>

<p>If you work on agent reliability, document-grounded reasoning, or evidence-first LLM systems, I would be interested in your feedback.</p>]]></content><author><name></name></author><category term="Engineering" /><category term="LLM Systems" /><category term="Agents" /><summary type="html"><![CDATA[Most agent systems fail in predictable ways: they trust model recall too much, use tools without policy, and keep reasoning long after the evidence ran out. CoreLink AI was built to make those failure modes explicit, bounded, and auditable.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://krishna-dhulipalla.github.io/assets/corelink_diagram.png" /><media:content medium="image" url="https://krishna-dhulipalla.github.io/assets/corelink_diagram.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Why Your Vision Model Is Lying to You (And How to Catch It)</title><link href="https://krishna-dhulipalla.github.io/engineering/mlops/computer%20vision/2026/02/07/vision-model-lying.html" rel="alternate" type="text/html" title="Why Your Vision Model Is Lying to You (And How to Catch It)" /><published>2026-02-07T17:00:00+00:00</published><updated>2026-02-07T17:00:00+00:00</updated><id>https://krishna-dhulipalla.github.io/engineering/mlops/computer%20vision/2026/02/07/vision-model-lying</id><content type="html" xml:base="https://krishna-dhulipalla.github.io/engineering/mlops/computer%20vision/2026/02/07/vision-model-lying.html"><![CDATA[<p>Most people treat computer vision monitoring as just “tracking accuracy.”</p>

<p>I used to think the same—until I deployed models into the messy, unpredictable real world.</p>

<p>What I learned is simple:</p>

<p><strong>Models don’t just fail. They drift conceptually.</strong>
And because they drift in specific ways (lighting changes, camera bumps, weather), they create <em>signals</em> that are easy to miss if you only look at top-line metrics.</p>

<p>This post is a recap of why I built <strong>VIRK (Vision Incident Response Kit)</strong>—a flight recorder for CV pipelines—and the patterns that matter most in production.</p>

<hr />

<h2 id="the-3-biggest-failure-patterns-i-noticed">The 3 biggest failure patterns I noticed</h2>

<h3 id="1-accuracy-is-a-lagging-indicator-and-often-impossible-to-get">1) Accuracy is a lagging indicator (and often impossible to get)</h3>

<p>In production, you rarely have immediate ground truth labels. Waiting for human review means you are reacting days or weeks late.</p>

<p>Instead of waiting for labels, I found that monitoring <strong>embedding drift</strong> gave me a real-time pulse.</p>

<ul>
  <li><strong>High drift magnitude</strong> often preceded accuracy drops.</li>
  <li><strong>Sudden spikes</strong> indicated environmental shocks (e.g., lights going out).</li>
</ul>

<p>This is exactly why <strong>Drift Detection &gt; Accuracy Monitoring</strong> for immediate operational health.</p>

<p><img src="/assets/blog/virk/drift-monitor.png" alt="Drift vs Accuracy" />
<em>Figure: Drift spikes (red) often predict performance degradation long before labels arrive.</em></p>
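<p>Here is a minimal sketch of one simple way to score embedding drift. It is an assumption, not necessarily VIRK's exact metric. It compares the mean embedding of a reference window with the mean embedding of the current window using cosine distance. A spike is flagged when the score jumps several standard deviations above its recent history. The threshold <code class="language-plaintext highlighter-rouge">k = 3</code> and the function names are hypothetical.</p>

```python
import math


def mean_vector(vectors: list[list[float]]) -> list[float]:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def drift_score(reference: list[list[float]], current: list[list[float]]) -> float:
    """Cosine distance between the mean embeddings of two windows (0 = no drift)."""
    a, b = mean_vector(reference), mean_vector(current)
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def is_spike(score: float, history: list[float], k: float = 3.0) -> bool:
    # Flag scores more than k standard deviations above the recent mean.
    mu = sum(history) / len(history)
    sd = math.sqrt(sum((h - mu) ** 2 for h in history) / len(history))
    return score > mu + k * max(sd, 1e-6)
```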

<hr />

<h3 id="2-something-is-wrong-isnt-actionable">2) “Something is wrong” isn’t actionable</h3>

<p>Telling an engineer “the model is drifting” is useless. They need to know <em>why</em>.</p>

<p>I found that generic drift scores were just noise without context. The real signal comes from <strong>fingerprinting the cause</strong>:</p>

<ul>
  <li>Is it a <strong>brightness shift</strong>? (Camera exposure issue)</li>
  <li>Is it <strong>motion blur</strong>? (Camera mounting loose)</li>
  <li>Is it <strong>new semantic classes</strong>? (New product type)</li>
</ul>

<p>So I built a <strong>Fingerprinter</strong> that diagnoses the root cause automatically.</p>
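<p>Here is a hypothetical sketch of what cause fingerprinting can look like, using cheap per-frame statistics rather than anything specific to VIRK. A mean-intensity comparison catches brightness shifts. The variance of a Laplacian response, a widely used sharpness heuristic, catches blur. A set difference between predicted and known labels catches new classes. The tolerances and tag names are assumptions. Note that a variance-of-Laplacian check detects blur in general, not motion blur specifically.</p>

```python
Image = list[list[float]]  # grayscale pixel rows, values 0-255


def mean_brightness(img: Image) -> float:
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)


def laplacian_variance(img: Image) -> float:
    """Variance of a 4-neighbour Laplacian response; low values suggest a blurry frame."""
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, len(img) - 1)
        for x in range(1, len(img[0]) - 1)
    ]
    mu = sum(responses) / len(responses)
    return sum((r - mu) ** 2 for r in responses) / len(responses)


def fingerprint(img: Image, labels: list[str], baseline_brightness: float,
                baseline_sharpness: float, known_labels: set[str],
                brightness_tol: float = 40.0, blur_ratio: float = 0.3) -> list[str]:
    """Return hypothetical cause tags for a frame that tripped the drift alarm."""
    causes = []
    if abs(mean_brightness(img) - baseline_brightness) > brightness_tol:
        causes.append("brightness_shift")  # e.g. a camera exposure issue
    if blur_ratio * baseline_sharpness > laplacian_variance(img):
        causes.append("blur")  # e.g. a loose mount causing motion blur
    if set(labels) - known_labels:
        causes.append("new_semantic_classes")  # e.g. a new product type
    return causes
```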

<hr />

<h3 id="3-reproducibility-is-the-nightmare">3) Reproducibility is the nightmare</h3>

<p>This is the most practical lesson from on-call rotations:</p>

<p><strong>If you can’t reproduce it, you can’t fix it.</strong></p>

<p>In some incidents, the “bad data” was transient. By the time we looked, the stream was back to normal.</p>

<p>That means you need to:</p>

<ul>
  <li>Capture the <strong>exact batch</strong> of images that caused the drift.</li>
  <li>Capture the <strong>metadata</strong> and <strong>model state</strong>.</li>
  <li>Create an <strong>executable replay script</strong>.</li>
</ul>

<p>I automated this with the <strong>Incident Bundler</strong>, which zips up everything needed to replay the failure locally with one command.</p>

<p><img src="/assets/blog/virk/incident-bundle.png" alt="Incident Bundle Structure" />
<em>Figure: An incident bundle contains everything needed for local reproduction: images, manifest, and replay script.</em></p>
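<p>The bundle layout in the figure (images, a manifest, and a replay script) can be sketched with Python's standard <code class="language-plaintext highlighter-rouge">zipfile</code> and <code class="language-plaintext highlighter-rouge">json</code> modules. The file names, manifest fields, and replay stub below are hypothetical, not VIRK's actual format.</p>

```python
import io
import json
import zipfile

REPLAY_SCRIPT = '''\
import json
import sys
import zipfile

# Replay stub: reports what would be re-run. A real script would load the model
# version named in the manifest and run it on every file under images/.
with zipfile.ZipFile(sys.argv[1]) as bundle:
    manifest = json.loads(bundle.read("manifest.json"))
    print("would replay", len(manifest["images"]), "images with", manifest["model_version"])
'''


def build_bundle(images: dict[str, bytes], metadata: dict, model_version: str) -> bytes:
    """Pack images, a JSON manifest, and a replay script into one zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as z:
        for name, data in images.items():
            z.writestr(f"images/{name}", data)
        manifest = {"model_version": model_version, "metadata": metadata,
                    "images": sorted(images)}
        z.writestr("manifest.json", json.dumps(manifest, indent=2))
        z.writestr("replay.py", REPLAY_SCRIPT)
    return buf.getvalue()
```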

<hr />

<h2 id="the-flight-recorder-mindset">The “Flight Recorder” Mindset</h2>

<p>Once you accept that failures are inevitable, the goal shifts from “prevention” to “fastest possible diagnosis.”</p>

<p><strong>High-assurance vision systems need a black box.</strong></p>

<p>So I designed VIRK to sit alongside the inference service:</p>

<ul>
  <li><strong>Async &amp; Non-blocking</strong>: It never slows down the main prediction loop.</li>
  <li><strong>Load Shedding</strong>: If the system is overwhelmed, it drops diagnostics, not predictions.</li>
  <li><strong>Privacy-aware</strong>: It only saves data when an incident is detected.</li>
</ul>
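<p>The non-blocking and load-shedding properties come down to a bounded queue that the inference loop writes to with a non-blocking put. Below is a minimal thread-based sketch, assuming diagnostics run in the same process. The class and function names are hypothetical, and VIRK's actual mechanism may differ.</p>

```python
import queue
import threading
from typing import Any, Callable


class DiagnosticsSidecar:
    """Runs diagnostics off the prediction path; sheds load when its queue is full."""

    def __init__(self, handler: Callable[[Any], None], maxsize: int = 100) -> None:
        self.handler = handler
        self.q: queue.Queue = queue.Queue(maxsize=maxsize)
        self.dropped = 0
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def submit(self, item: Any) -> None:
        # Called from the inference loop: never blocks.
        try:
            self.q.put_nowait(item)
        except queue.Full:
            self.dropped += 1  # drop diagnostics, never predictions

    def _run(self) -> None:
        while True:
            item = self.q.get()
            try:
                self.handler(item)
            finally:
                self.q.task_done()


def predict(model: Callable[[Any], Any], frame: Any, sidecar: DiagnosticsSidecar) -> Any:
    result = model(frame)
    sidecar.submit((frame, result))  # fire-and-forget; the prediction returns regardless
    return result
```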

<hr />

<h2 id="why-this-matters">Why this matters</h2>

<p>If you monitor blindly, production vision systems feel fragile and opaque.</p>

<p>If you monitor <strong>drift + root cause + reproducibility</strong>, incidents become manageable:</p>

<ul>
  <li>You know <strong>when</strong> it’s happening (Drift).</li>
  <li>You know <strong>why</strong> it’s happening (Fingerprint).</li>
  <li>You have the data to <strong>fix it</strong> (Bundler).</li>
</ul>

<p>That’s the reliability standard we need for modern MLOps.</p>

<hr />

<h2 id="project-link-if-youre-curious">Project link (if you’re curious)</h2>

<p>I built this toolkit for myself and open-sourced it:</p>

<p><strong>GitHub:</strong> <a href="https://github.com/krishna-dhulipalla/Vision-Incident-Response-Kit--VIRK-">Vision Incident Response Kit (VIRK)</a><br />
<strong>Documentation:</strong> <a href="https://github.com/krishna-dhulipalla/Vision-Incident-Response-Kit--VIRK-#readme">Read the docs</a></p>

<p>Setup is a single <code class="language-plaintext highlighter-rouge">pip install</code> away. Let me know what you think!</p>]]></content><author><name></name></author><category term="Engineering" /><category term="MLOps" /><category term="Computer Vision" /><summary type="html"><![CDATA[Production failures in computer vision are rarely simple 'wrong predictions.' They are complex conceptual drifts—blur, lighting, camera shifts. Here’s how I built a 'flight recorder' to catch them before they become incidents.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://krishna-dhulipalla.github.io/assets/blog/virk/drift-monitor.png" /><media:content medium="image" url="https://krishna-dhulipalla.github.io/assets/blog/virk/drift-monitor.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">What December Hiring Signals Really Looked Like (And How to Use Them for January)</title><link href="https://krishna-dhulipalla.github.io/market%20notes/2025/12/30/december-hiring-signals.html" rel="alternate" type="text/html" title="What December Hiring Signals Really Looked Like (And How to Use Them for January)" /><published>2025-12-30T00:00:00+00:00</published><updated>2025-12-30T00:00:00+00:00</updated><id>https://krishna-dhulipalla.github.io/market%20notes/2025/12/30/december-hiring-signals</id><content type="html" xml:base="https://krishna-dhulipalla.github.io/market%20notes/2025/12/30/december-hiring-signals.html"><![CDATA[<p>Most people treat December as a write-off for job search.</p>

<p>I used to think the same—until I started tracking hiring changes daily across my company list.</p>

<p>What I learned is simple:</p>

<p><strong>December isn’t dead. It’s just uneven.</strong>
And because it’s uneven, it creates <em>signals</em> that are easy to miss if you only look at job boards.</p>

<p>This post is a December recap (based on daily hiring momentum tracking), plus how I’m preparing for January.</p>

<hr />

<h2 id="the-3-biggest-december-patterns-i-noticed">The 3 biggest December patterns I noticed</h2>

<h3 id="1-hiring-wasnt-steady--it-was-bursty">1) Hiring wasn’t steady — it was “bursty”</h3>

<p>Instead of smooth growth, I saw days with:</p>

<ul>
  <li>high additions,</li>
  <li>sudden removals,</li>
  <li>and short windows where things moved fast.</li>
</ul>

<p>This is exactly why weekly “momentum” is more useful than just counting open roles.</p>

<p><img src="/assets/blog/december-hiring/market-pulse.png" alt="Market pulse: open roles line + daily net changes bars" />
<em>Figure: December movement shows spikes and slowdowns rather than a straight line.</em></p>
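<p>Turning daily diffs into weekly momentum is just a grouped sum. Below is a minimal sketch. It assumes each day is recorded as an <code class="language-plaintext highlighter-rouge">(added, removed)</code> pair and that weeks are ISO calendar weeks. The tracker's actual bucketing may differ.</p>

```python
from collections import defaultdict
from datetime import date


def weekly_momentum(daily: dict[date, tuple[int, int]]) -> dict[tuple[int, int], int]:
    """Sum daily net changes (added - removed) into (ISO year, ISO week) buckets."""
    weeks: dict[tuple[int, int], int] = defaultdict(int)
    for day, (added, removed) in daily.items():
        iso = day.isocalendar()
        weeks[(iso[0], iso[1])] += added - removed
    return dict(weeks)
```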

<hr />

<h3 id="2-churn-mattered-more-than-raw-volume">2) “Churn” mattered more than raw volume</h3>

<p>A big company can add a lot and remove a lot in the same window.</p>

<p>That creates a different reality than “hiring is up”:</p>

<ul>
  <li>teams are backfilling,</li>
  <li>roles are being reposted,</li>
  <li>postings can disappear quickly.</li>
</ul>

<p>So I started watching <strong>Added + Removed together</strong>, not just “Added.”</p>
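<p>One hypothetical way to read adds and removes together is to report three numbers: turnover (adds plus removes), the net change, and an offset score for how much of the activity cancels itself out. The metric below is my illustration, not necessarily what the tracker computes. A company with 50 adds and 48 removes has a net of only +2 but an offset close to 1, which is the churn pattern described above.</p>

```python
def churn_profile(added: int, removed: int) -> dict[str, float]:
    """Summarize one company-window: activity, net change, and how much cancels out."""
    turnover = added + removed
    net = added - removed
    # 1.0 means adds and removes fully offset each other; 0.0 means one-directional.
    offset = 1.0 - abs(net) / turnover if turnover else 0.0
    return {"turnover": float(turnover), "net": float(net), "offset": offset}
```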

<hr />

<h3 id="3-many-roles-were-short-lived">3) Many roles were short-lived</h3>

<p>This is the most practical December lesson:</p>

<p>If roles close fast, your strategy must change.</p>

<p>For at least some companies in my tracking, the durability signal looked like:</p>

<ul>
  <li>median open time measured in days (not weeks),</li>
  <li>a large fraction of roles closing within a week.</li>
</ul>

<p>That implies:</p>

<ul>
  <li>if you’re applying, you can’t “wait until the weekend”</li>
  <li>if you’re networking, earlier is better (you want to be in the loop before the posting)</li>
</ul>

<p><img src="/assets/blog/december-hiring/company-durability.png" alt="Company example: momentum timeline + durability (age buckets)" />
<em>Figure: A company-level view combining daily adds/removes with role lifespan buckets.</em></p>
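<p>Both durability numbers are straightforward to compute once you know when each role first appeared and when it disappeared. Below is a minimal sketch. It assumes a list of <code class="language-plaintext highlighter-rouge">(first_seen, last_seen)</code> dates for roles that have already closed, and it counts “within a week” as seven days or fewer.</p>

```python
from datetime import date
from statistics import median


def durability(roles: list[tuple[date, date]]) -> dict[str, float]:
    """Median days open and share of roles that closed within a week.

    Each tuple is (first_seen, last_seen) for a role that has already closed.
    """
    days_open = [(closed - opened).days for opened, closed in roles]
    within_week = sum(1 for d in days_open if d <= 7)
    return {
        "median_days_open": float(median(days_open)),
        "share_closed_within_week": within_week / len(days_open),
    }
```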

<hr />

<h2 id="weekday-effect-when-jobs-tend-to-appear-and-disappear">Weekday effect: when jobs tend to appear (and disappear)</h2>

<p>Once you track daily diffs, an uncomfortable truth shows up:</p>

<p><strong>Hiring activity is not evenly distributed across the week.</strong></p>

<p>So I started looking at:</p>

<ul>
  <li>which weekdays had the highest additions,</li>
  <li>which had the highest removals,</li>
  <li>and how that changed during the holiday stretch.</li>
</ul>

<p>Even a simple weekday heatmap makes timing visible.</p>

<p><img src="/assets/blog/december-hiring/weekday-heatmap.png" alt="Weekday heatmap: adds/removes/net by weekday" />
<em>Figure: Some weekdays are consistently more “active” than others.</em></p>
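<p>Each heatmap row is a per-weekday sum of the same daily diffs. Below is a minimal sketch, assuming daily <code class="language-plaintext highlighter-rouge">(added, removed)</code> counts keyed by date.</p>

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")  # date.weekday() order


def weekday_totals(daily: dict[date, tuple[int, int]]) -> dict[str, dict[str, int]]:
    """Aggregate daily (added, removed) counts into per-weekday rows for a heatmap."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: {"added": 0, "removed": 0, "net": 0})
    for day, (added, removed) in daily.items():
        row = totals[WEEKDAYS[day.weekday()]]
        row["added"] += added
        row["removed"] += removed
        row["net"] += added - removed
    return dict(totals)
```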

<hr />

<h2 id="booming-vs-freezing-why-december-can-be-misleading">Booming vs freezing: why December can be misleading</h2>

<p>December is full of “false calm.”</p>

<p>A company can look stable because:</p>

<ul>
  <li>it isn’t posting much,</li>
  <li>but it also isn’t removing much.</li>
</ul>

<p>Another company can look active but be:</p>

<ul>
  <li>removing a lot (freeze risk),</li>
  <li>or churning (reposts/backfill).</li>
</ul>

<p>So I tracked a simple distribution:</p>

<ul>
  <li>how many companies were booming vs freezing vs stable each day/week.</li>
</ul>

<p><img src="/assets/blog/december-hiring/booming-freezing.png" alt="Booming vs freezing counts over time" />
<em>Figure: The market mood changes across December; stability can hide churn.</em></p>
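<p>The booming/freezing/stable split needs some classification rule per company and week. The thresholds below are hypothetical, chosen only to illustrate the idea. A company is booming if adds are at least double removes, freezing if removes are at least double adds, and stable otherwise, including when activity is too low to judge. Under this rule, a high-activity company whose adds and removes roughly cancel also lands in “stable”, which is exactly how stability can hide churn.</p>

```python
from collections import Counter


def classify(added: int, removed: int, min_activity: int = 3) -> str:
    """Hypothetical rule: label one company's week from its adds and removes."""
    if added + removed < min_activity:
        return "stable"  # too little activity to call a direction
    if added >= 2 * removed:
        return "booming"
    if removed >= 2 * added:
        return "freezing"
    return "stable"  # balanced activity; may be hiding churn


def mood(week: dict[str, tuple[int, int]]) -> Counter:
    """Count how many companies fall into each bucket for one week."""
    return Counter(classify(added, removed) for added, removed in week.values())
```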

<hr />

<h2 id="what-i-expect-in-january-and-how-im-preparing">What I expect in January (and how I’m preparing)</h2>

<p>This part is not a guarantee, just a plan based on how hiring usually behaves after the holidays and what the December signals suggest.</p>

<h3 id="likely-january-dynamics">Likely January dynamics</h3>

<ul>
  <li><strong>Reactivations:</strong> paused roles reappear</li>
  <li><strong>New focus areas:</strong> fresh headcount priorities show up</li>
  <li><strong>More consistent cadence:</strong> fewer holiday-driven gaps</li>
  <li><strong>Faster closing windows:</strong> early January can move quickly</li>
</ul>

<h3 id="my-january-prep-checklist">My January prep checklist</h3>

<ol>
  <li>Identify companies with <strong>late-December momentum</strong> (they may carry into January)</li>
  <li>Prioritize companies where roles close fast → be ready to act within 48 hours</li>
  <li>For slow-durability companies → prepare targeted networking and referrals</li>
  <li>Use news only when it aligns with spikes/freezes (context, not distraction)</li>
</ol>

<hr />

<h2 id="why-this-matters">Why this matters</h2>

<p>If you’re applying randomly, December feels quiet and discouraging.</p>

<p>If you watch <strong>momentum + durability + weekday patterns</strong>, December becomes useful:</p>

<ul>
  <li>it shows which companies are gearing up,</li>
  <li>which ones are cleaning up,</li>
  <li>and where speed vs networking actually matters.</li>
</ul>

<p>That’s the mindset I’m taking into January.</p>

<hr />

<h2 id="project-link-if-youre-curious">Project link (if you’re curious)</h2>

<p>I built this tracker for myself and open-sourced it:</p>

<p><strong>GitHub:</strong> <a href="https://github.com/krishna-dhulipalla/Hiring-Trend-Tracker">Repo link</a><br />
<strong>Related blog post:</strong> <a href="https://krishna-dhulipalla.github.io/job%20search/2025/12/29/hiring-momentum-dashboard.html">Why I built this</a></p>

<p>Setup instructions are included in the repository.</p>]]></content><author><name></name></author><category term="Market Notes" /><summary type="html"><![CDATA[December isn’t “dead.” It’s noisy, uneven, and full of timing signals. Here’s what I observed by tracking company hiring momentum daily—and how to prepare for January after the holidays.]]></summary></entry><entry><title type="html">The Hiring Momentum Dashboard I Wish Existed</title><link href="https://krishna-dhulipalla.github.io/job%20search/2025/12/29/hiring-momentum-dashboard.html" rel="alternate" type="text/html" title="The Hiring Momentum Dashboard I Wish Existed" /><published>2025-12-29T00:00:00+00:00</published><updated>2025-12-29T00:00:00+00:00</updated><id>https://krishna-dhulipalla.github.io/job%20search/2025/12/29/hiring-momentum-dashboard</id><content type="html" xml:base="https://krishna-dhulipalla.github.io/job%20search/2025/12/29/hiring-momentum-dashboard.html"><![CDATA[<p><img src="/assets/blog/hiring-trend-tracker/dashboard.png" alt="Dashboard" /></p>

<p>Most job search tools answer: <strong>“what roles are open right now?”</strong></p>

<p>I wanted a different answer:</p>

<p><strong>“What are companies actually doing—accelerating, freezing, or quietly shifting—and what should I do about it this week?”</strong></p>

<p>So I built a small tool for myself: a <strong>Hiring Trend Tracker</strong> that watches hiring activity across dozens of companies, then turns it into signals that help with:</p>

<ul>
  <li><em>timing</em> (when to apply vs when to network),</li>
  <li><em>momentum</em> (booming vs freezing vs stable),</li>
  <li><em>durability</em> (how long roles typically stay open),</li>
  <li>and <em>context</em> (news that explains spikes and slowdowns).</li>
</ul>

<hr />

<h2 id="why-i-stopped-obsessing-over-individual-roles">Why I stopped obsessing over individual roles</h2>

<p>Job boards already do role search extremely well.</p>

<p>But they don’t tell you:</p>

<ul>
  <li>whether a company is ramping up or cooling down,</li>
  <li>whether jobs close fast (48h urgency) or stay open for weeks (networking-first),</li>
  <li>whether this week is an “apply week” or a “relationship week,”</li>
  <li>and whether a headline actually correlates with real hiring movement.</li>
</ul>

<p>That’s the gap this project tries to fill.</p>

<hr />

<h2 id="the-momentum-board-attention-without-missing-anyone">The Momentum Board: attention without missing anyone</h2>

<p>Tracking 78+ companies is overwhelming if everything looks equally important.</p>

<p>So the dashboard is intentionally split into two sections:</p>

<h3 id="1-this-week-movers">1) <strong>This Week: Movers</strong></h3>

<p>Only companies with meaningful weekly signals get expanded:</p>

<ul>
  <li><strong>accelerating / volatile churn / freezing signals</strong></li>
  <li>a short “why” statement</li>
  <li>a timing hint</li>
</ul>

<h3 id="2-all-others-collapsed-but-still-present">2) <strong>All Others</strong> (collapsed but still present)</h3>

<p>Everyone else is still visible—just collapsed by default.
You can expand the Stable/Quiet groups anytime.</p>

<p>This keeps the dashboard usable daily without hiding companies.</p>
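<p>Here is a minimal sketch of that split, assuming each company already carries a momentum label. The <code>CompanySignal</code> type, the label strings, and <code>split_board</code> are hypothetical names for illustration, not the tracker's actual code. Movers are returned for expansion, and every remaining company lands in a collapsed group, so nobody disappears from the board.</p>

```python
from dataclasses import dataclass


@dataclass
class CompanySignal:
    name: str
    label: str        # hypothetical labels: "booming", "freezing", "volatile", "stable", "quiet"
    why: str          # short explanation, e.g. "Net +12 in 7d"
    timing_hint: str  # e.g. "Apply within 48h"


# Hypothetical: labels that count as a "meaningful weekly signal".
MOVER_LABELS = {"booming", "volatile", "freezing"}


def split_board(signals):
    """Split companies into expanded movers and collapsed groups; no company is dropped."""
    movers = [s for s in signals if s.label in MOVER_LABELS]
    others = {}
    for s in signals:
        if s.label not in MOVER_LABELS:
            # Group the rest by label so Stable/Quiet can be expanded on demand.
            others.setdefault(s.label, []).append(s.name)
    return movers, others
```

<p>Grouping the non-movers by label (rather than filtering them out) is what keeps the board complete while still drawing attention to the handful of companies that changed this week.</p>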

<p><img src="/assets/blog/hiring-trend-tracker/momentum-board.png" alt="Momentum Board showing This Week: Movers + All Others collapsed" />
<em>Figure: Movers are expanded; everyone else stays visible but collapsed.</em></p>

<hr />

<h2 id="what-momentum-means-in-human-terms">What “momentum” means (in human terms)</h2>

<p>Momentum here is not a buzzword. It’s just:<br />
<strong>what changed this week vs last week, and how consistently it’s changing.</strong></p>

<p>A company might be:</p>

<ul>
  <li><strong>Booming:</strong> sustained adds, open roles trending up</li>
  <li><strong>Freezing:</strong> removals dominate, open roles trending down</li>
  <li><strong>Volatile:</strong> lots of adds/removes (churn), unclear direction</li>
  <li><strong>Stable:</strong> low movement</li>
</ul>

<p>And each label includes a simple explanation:</p>

<ul>
  <li>“Net +X in 7d”</li>
  <li>“Removals spike”</li>
  <li>“High churn”</li>
  <li>“Open roles shifted sharply”</li>
</ul>
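<p>A rough sketch of how such labels could be derived from one week of counts. The thresholds and the function name are assumptions chosen for illustration, and this single-week snapshot ignores the "how consistently" part, which would need several weeks of history.</p>

```python
def classify_momentum(adds_7d, removes_7d, open_now, open_week_ago,
                      churn_threshold=10, net_threshold=3):
    """Hypothetical rule-of-thumb classifier; thresholds are illustrative, not the tracker's."""
    net = adds_7d - removes_7d          # net change in postings over 7 days
    churn = adds_7d + removes_7d        # total movement, regardless of direction
    open_delta = open_now - open_week_ago

    # Lots of movement but no clear direction: churn dominates.
    if churn >= churn_threshold and abs(net) < net_threshold:
        return "volatile", "High churn"
    # Sustained adds and open roles trending up.
    if net >= net_threshold and open_delta > 0:
        return "booming", f"Net +{net} in 7d"
    # Removals dominate and open roles trending down.
    if net <= -net_threshold and open_delta < 0:
        return "freezing", "Removals spike"
    return "stable", "Low movement"
```

<p>Checking churn before direction matters: a company that added 6 and removed 6 roles is not "stable" even though its net change is zero.</p>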

<hr />

<h2 id="job-lifespan-the-most-practical-signal-i-didnt-expect">Job lifespan: the most practical signal I didn’t expect</h2>

<p>One insight changed how I behave immediately:</p>

<p><strong>How long jobs last at a company.</strong></p>

<p>If most postings disappear quickly, the right move is speed.
If postings linger, the right move is networking and targeting.</p>

<p>So for each company I compute:</p>

<ul>
  <li>median days a role stays open</li>
  <li>percent of roles that close in &lt;7 days</li>
  <li>age buckets (0–3 / 4–7 / 8–14 / 15–30 / 30+)</li>
</ul>
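<p>A minimal sketch of these three metrics, assuming the input is a list of <code>(first_seen, last_seen)</code> dates for roles that have already closed. The input shape and <code>lifespan_stats</code> are hypothetical. For simplicity the buckets are applied to the same closed-role lifespans here; the tracker may instead bucket the current age of still-open roles.</p>

```python
from datetime import date
from statistics import median

# Age buckets from the post; "30+" is treated as 31 days or more since "15-30" includes 30.
BUCKETS = [("0-3", 0, 3), ("4-7", 4, 7), ("8-14", 8, 14), ("15-30", 15, 30), ("30+", 31, None)]


def lifespan_stats(postings):
    """postings: list of (first_seen, last_seen) dates for closed roles (hypothetical shape)."""
    if not postings:
        return None  # not enough history to say anything
    days_open = [(last - first).days for first, last in postings]

    counts = {label: 0 for label, _, _ in BUCKETS}
    for d in days_open:
        for label, lo, hi in BUCKETS:
            if d >= lo and (hi is None or d <= hi):
                counts[label] += 1
                break

    fast = sum(1 for d in days_open if d < 7)  # roles that closed in under 7 days
    return {
        "median_days_open": median(days_open),
        "pct_closed_under_7d": 100.0 * fast / len(days_open),
        "age_buckets": counts,
    }
```

<p>The median is used rather than the mean so that one posting left open for months does not make a fast-moving company look slow.</p>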

<p>This turns “job search” into <strong>timing strategy</strong>.</p>

<p><img src="/assets/blog/hiring-trend-tracker/lifespan-buckets.png" alt="Role durability / lifespan chart with age buckets" />
<em>Figure: Roles don’t last equally long across companies; durability changes your strategy.</em></p>

<hr />

<h2 id="timing-intelligence-when-to-apply-vs-when-to-network">Timing Intelligence: when to apply vs when to network</h2>

<p>Some companies post new roles on predictable weekdays.
Some remove roles in predictable bursts.</p>

<p>So the tracker surfaces:</p>

<ul>
  <li>best weekday for <strong>posting</strong></li>
  <li>best weekday for <strong>removals</strong></li>
  <li>a confidence score (do we have enough history?)</li>
</ul>
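<p>One way to sketch the weekday signal, assuming you have a list of dates on which postings (or removals) were observed. The confidence formula below is a hypothetical heuristic, not the tracker's: the top weekday's share of events, scaled down when there are fewer than <code>min_events</code> observations.</p>

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def best_weekday(event_dates, min_events=20):
    """Return (weekday, share, confidence) for a list of posting or removal dates."""
    if not event_dates:
        return None, 0.0, 0.0
    # date.weekday(): Monday == 0 ... Sunday == 6
    counts = Counter(d.weekday() for d in event_dates)
    day, top = counts.most_common(1)[0]
    share = top / len(event_dates)
    # Penalize thin history: full confidence only after min_events observations.
    history_factor = min(1.0, len(event_dates) / min_events)
    return WEEKDAYS[day], share, share * history_factor
```

<p>Run it once on posting dates and once on removal dates to get both weekdays. A strong weekday pattern built on four events still scores low, which is the point of the "do we have enough history?" check.</p>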

<p>The output is intentionally simple:</p>

<ul>
  <li><strong>“Apply within 48h”</strong></li>
  <li><strong>“Apply within 3–5 days”</strong></li>
  <li><strong>“Networking-first (new focus / freeze risk)”</strong></li>
</ul>
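<p>A sketch of how the earlier signals might collapse into one of these three hints. The cutoffs (50% closing within a week, a 3-day or 14-day median) are assumptions for illustration only.</p>

```python
def timing_hint(momentum, median_days_open, pct_closed_under_7d):
    """Map momentum and durability to one of three hints; thresholds are hypothetical."""
    # A freezing company is a poor target for cold applications.
    if momentum == "freezing":
        return "Networking-first (new focus / freeze risk)"
    # Roles vanish fast: speed matters most.
    if pct_closed_under_7d >= 50 or median_days_open <= 3:
        return "Apply within 48h"
    # Roles last a week or two: apply soon, but there is time to tailor.
    if median_days_open <= 14:
        return "Apply within 3–5 days"
    # Roles linger: a referral or conversation likely beats another application.
    return "Networking-first (new focus / freeze risk)"
```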

<hr />

<h2 id="news--hiring-trends-only-when-it-explains-a-signal">News + hiring trends: only when it explains a signal</h2>

<p>News is overwhelming when it’s a feed.</p>

<p>Instead, I only show it when:</p>

<ul>
  <li>it aligns with a hiring spike,</li>
  <li>it explains removals/freezing behavior,</li>
  <li>or it coincides with a role-mix shift.</li>
</ul>
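<p>A minimal sketch of that filter, assuming headlines arrive as <code>(date, title)</code> pairs and that hiring signals (spikes, removal bursts, role-mix shifts) are already reduced to a list of dates. The 3-day window and the function name are hypothetical.</p>

```python
from datetime import date, timedelta


def relevant_headlines(headlines, signal_dates, window_days=3):
    """Keep only headlines dated within window_days of a hiring signal (hypothetical rule)."""
    window = timedelta(days=window_days)
    kept = []
    for published, title in headlines:
        # abs() on a timedelta gives the distance regardless of which came first.
        if any(abs(published - s) <= window for s in signal_dates):
            kept.append((published, title))
    return kept
```

<p>With no signals, nothing is shown, which is exactly the "context, not noise" behavior: a headline earns a slot only when hiring data moved near it.</p>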

<p>So the “news” section becomes:
<strong>context</strong>, not noise.</p>

<p><img src="/assets/blog/hiring-trend-tracker/amazon.png" alt="Example: an Amazon headline shown next to its hiring signal" />
<img src="/assets/blog/hiring-trend-tracker/news-signal.png" alt="Example: a headline linked to a hiring spike/freeze signal" />
<em>Figure: News that lines up with a hiring signal helps explain the behavior, and sometimes hints at where hiring is heading.</em></p>

<hr />

<h2 id="if-you-want-to-try-it">If you want to try it</h2>

<p>I’ve open-sourced the project here:</p>

<p><strong>GitHub:</strong> <a href="https://github.com/krishna-dhulipalla/Hiring-Trend-Tracker">Repo link</a></p>

<p>Setup instructions are already included in the repository.</p>

<hr />

<h2 id="closing-thought">Closing thought</h2>

<p>A job search gets less stressful when you stop treating it like a lottery and start treating it like a market:</p>

<ul>
  <li>watch momentum,</li>
  <li>understand timing,</li>
  <li>and move when signals are real.</li>
</ul>

<p>That’s what I’m building for.</p>]]></content><author><name></name></author><category term="Job Search" /><summary type="html"><![CDATA[I stopped tracking individual job listings and started tracking hiring behavior—momentum, freezes, job lifespan, and timing. Here’s what changed and why it’s actually useful.]]></summary></entry></feed>