
Can Agents Actually Self-Improve?
Hosted by: Nikhil Pareek
Thursday, July 9, 2026
17:30 GMT-4
Thursday, July 9, 2026
23:30 GMT-4
Price
Free
Free to attend
About the Event
Overview

A hands-on evening building an agent that provably gets better: simulate the edge cases, score against evals, and gate every release so the next version ships only if it beats the last. You'll leave with the loop, built on open-source tools, running on a laptop.

Event details

"Self-improving agents" is doing a lot of work in pitch decks right now. Some of it is real, a lot of it isn't, and this evening separates the two by building the real version live, in front of you.

Before anything reaches production, we simulate synthetic users and adversarial scenarios to surface where the agent breaks, score those runs against evals, and feed the failures back in. The eval becomes the gate: nothing ships unless the numbers move the right way. Then we keep it running, routing production traffic back through the same evals so regressions show up as dropping scores instead of support tickets.

Full recursive self-improvement is still an open research problem and we won't pretend otherwise, but the bounded version is buildable today, on OpenTelemetry and open-source frameworks you can fork and self-host.

Leading the workshop

Nikhil, founder and CEO of Future AGI, runs the build live, end to end. He spends his days on the exact problem this evening is about: making agents reliable enough to trust in production, and measurable enough to know they're actually improving.

Bring a laptop to:

Instrument a baseline agent with OpenTelemetry and capture its starting eval scores
Simulate synthetic users and adversarial scenarios to break it on purpose
Score the runs against evals so failures become numbers, not anecdotes
Read the traces to find the actual root cause, not guess from the output
Feed the fixes back in, then re-simulate to confirm the scores moved
Gate the ship: promote a new version only if it beats the last one
Route production traffic back through the same evals to keep it improving

Agenda

5:30 - Snacks and hellos
6:00 - Guest speaker (to be announced)
7:00 - Live build: the self-improvement loop that actually ships (bring a laptop)
7:30 - Open debate and Q&A: hype versus what works
8:00 - Networking
8:30 - See you on the next iteration

For engineers and AI product builders running agents in production who want the version that provably improves, built on open tools they can keep.

About Pebblebed

Pebblebed is a technical early-stage VC founded by Pam Vagata (cofounder of OpenAI, ran AI for Stripe, inventor of FBLearner Flow); Keith Adams (founded Facebook AI Research, was chief architect at Slack, 20th engineer at VMware); and Tammie Siew (former Sequoia Southeast Asia investor, former Sequoia- and Notable Capital-backed founder).

About Future AGI

Future AGI is an open-source AI simulation, evaluation, and observability platform. Teams use it to simulate agents before they ship, score them against real failure modes, and keep watching them in production so quality doesn't quietly degrade over time. It's self-hostable and OpenTelemetry-native, with tracing that plugs into 35+ frameworks. The loop you'll build tonight runs on the same open tooling, yours to fork and take home, no account required.
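
For a rough idea of the gating step ("promote a new version only if it beats the last one"), here is a minimal Python sketch. It is an illustration only, not the workshop's code or Future AGI's API: the function name `should_promote`, the scenario names, the score values, and the extra rule that no single scenario may regress are all assumptions made for the example.

```python
from statistics import mean


def should_promote(baseline_scores, candidate_scores, min_gain=0.0):
    """Decide whether a candidate agent version may ship.

    Both arguments map a scenario name to an eval score in [0, 1].
    Hypothetical rule (an assumption, not from the event page): the
    candidate must beat the baseline's mean score by more than
    `min_gain` and must not score lower on any single scenario.
    """
    if not baseline_scores:
        raise ValueError("need at least one scored scenario")
    if baseline_scores.keys() != candidate_scores.keys():
        raise ValueError("both versions must be scored on the same scenarios")

    # Average improvement across the whole simulated suite.
    beats_average = (
        mean(candidate_scores.values())
        > mean(baseline_scores.values()) + min_gain
    )
    # Per-scenario check so a big win in one place cannot hide a regression.
    no_regression = all(
        candidate_scores[name] >= baseline_scores[name]
        for name in baseline_scores
    )
    return beats_average and no_regression


# Made-up scores for three simulated scenarios.
baseline = {"refund_request": 0.70, "prompt_injection": 0.40, "ambiguous_query": 0.60}
improved = {"refund_request": 0.75, "prompt_injection": 0.65, "ambiguous_query": 0.60}
regressed = {"refund_request": 0.95, "prompt_injection": 0.90, "ambiguous_query": 0.50}

print(should_promote(baseline, improved))   # True: higher mean, nothing got worse
print(should_promote(baseline, regressed))  # False: one scenario dropped
```

In a real pipeline the score dictionaries would come from the simulation and eval runs, and the same check could run in CI so a version that fails the gate never gets promoted.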
Venue Information
San Francisco
San Francisco, San Francisco, California