<div style="text-align: left; display: inline-block;">
<a href="https://dylandigitalgarden.com/About" style="color: white;">About</a>
</div>
<div style="text-align: left; display: inline-block; margin-left: 10px;">
<a href="https://dylandigitalgarden.com/All+My+Posts" style="color: white;">All My Posts</a>
</div>
<div style="text-align: left; display: inline-block; margin-left: 10px;">
<a href="https://dylandigitalgarden.com/All+My+Thoughts+%26+Experiments" style="color: white;">All My Thoughts &amp; Experiments</a>
</div>
<div style="text-align: left; display: inline-block; margin-left: 10px;">
<a href="https://dylandigitalgarden.com/All+My+Archives" style="color: white;">All My Archives</a>
</div>
## Latest Posts
[[July 31, 2024 LLM & VLM-as-a-Judge]]
> LLM-as-a-Judge is a method for automatically evaluating LLM performance with models instead of human annotation. It involves prompting an LLM to rate attributes on a pointwise scale or to compare outputs pairwise. These evaluation methods have been studied and analyzed for their effectiveness.
[[July 16, 2024 LLMs Evals Thoughts]]
> Evaluating LLMs is important for understanding their abilities and solving real business problems. A good evaluation requires sufficient, high-quality data samples, clear judging criteria, meaningful evaluation tasks, and frequent private benchmarks. The evaluation process should adapt as LLMs develop over time.
<br>
<div style="text-align: right;">
<a href="https://dylandigitalgarden.com/All+My+Posts" style="color: white;">All My Posts</a>
</div>
<br>
## Latest Thoughts & Experiments
[[August 1. Recap for July]]
> In July, I helped my team build a private LLM benchmark that is both confidential and tailored to our specific needs.
[[July 23, DSPy with GPT-4o-mini on MMLU-Pro]]
> DSPy is an optimization framework that enhances prompts and responses from models like GPT-4o-mini. This post showcases the framework and demonstrates how to use its optimizers to improve this cost-effective model. MMLU-Pro is an advanced dataset with complex questions and more answer choices. The evaluation metric checks whether the model's responses match the true answers.
[[July 14, 2024 How to use Yi-Vision with TextGrad]]
> TextGrad is an autograd engine that enhances language models through iterative feedback. It has recently expanded to support multimodal optimization. This guide explains how to adapt TextGrad for use with other models, using Yi-Vision as the example. The steps are tweaking a script, adding the model name to a list, and using ChatExternalClient with the API key. The example code imports an image and uses TextGrad to answer a question about it. It also includes a loss function for evaluating the answer and an optimizer for improving it.
<br>
<div style="text-align: right;">
<a href="https://dylandigitalgarden.com/All+My+Thoughts+%26+Experiments" style="color: white;">All My Thoughts &amp; Experiments</a>
</div>