AI Tinkerers London
The event, on the ground floor of challenger bank Monzo, kicked off after some socialising, during which I spoke to a chap who'd been at an e/acc hackathon the previous weekend (where someone had apparently hooked an EEG up to the GPT-4 API, with dubious efficacy). This was my second time attending an AI Tinkerers meetup, and this year they're going monthly. The event series is actively framed as a catalyst by the organiser Louis Knight-Webb.
The talk line-up announced was:
- Task Specific LLMs and LoRA - Dan Cooper at Monzo (challenger bank)
- Stable Diffusion as Game Engine Renderer - Thomas Traum at Traum Inc. (CGI contracting)
- Falcon OS, LLM Operating System - Heiko Hotz at Google (and founder of NLP London)
- Fine-tuning Mistral 7B with Synthetic Data - Felix Brockmeier at Lemon AI (LLM consultancy)
- Fine-tuning Mistral to Outperform GPT-3.5 on Coding Tasks - Sandeep Pani at CodeStory (AI-powered IDE, YC S23)
6. e/hack
There was another talk added at the last minute from Philip Botros, the winner of that weekend's e/acc hackathon, using a combination of Stable Diffusion, LLMs (GPT-4, I think), and ElevenLabs TTS. The project was built around the 'user story' of a 10-year-old girl interacting with a fantasy world via characters, each made real by its own LoRA'd diffusion model, such that asking a question like "how do computers remember?" would receive a multisensory response.
1. Finetuning a "small" 2.7B BioMedLM model
The first talk actually didn't revolve around anything from Monzo (the speaker was a fairly new arrival there), but around work for an unnamed biomed enterprise that wanted to do word sense disambiguation (to clarify polysemy in its corpus). The call to action ran along the lines of "is there still a place for narrow and small LLMs?". This became a theme: smaller BERT-style (encoder-only) Transformers can be finetuned, (Q)LoRA'd, etc. to outperform much larger models on specific tasks, and are far cheaper than calling full-size LLMs indefinitely. Felix Brockmeier would revisit this later.
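For flavour, here's roughly what that recipe looks like with Hugging Face's peft library - a minimal sketch of my own, not from the talk; the base model name, label count, and LoRA hyperparameters are placeholders:

```python
# Minimal sketch: attach a LoRA adapter to a small encoder-only classifier.
# Everything here (base model, labels, hyperparameters) is illustrative.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base = "bert-base-uncased"  # stand-in for a small domain-specific model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

lora = LoraConfig(
    task_type=TaskType.SEQ_CLS,         # keep the classification head trainable
    r=8,                                # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # BERT attention projections to adapt
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()      # typically well under 1% of the weights
```

The appeal is that the trainable-parameter count (and therefore the hardware bill) stays tiny, while task-specific performance can still beat a general-purpose LLM called via API.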
Dan encouraged a 'trust but verify' approach to claims in the literature, mentioning some dodgy NLU claims from Microsoft in biomedical text mining, where binary classification was quietly being assisted by prompt stuffing (putting biomedical expert answers in the prompt and then saying "answer yes or no", which arguably reduces the problem to a rephrasing task rather than a QA task).
He touched on many of the developments that have occurred since his project (fine-tuning BioMedLM on an in-house dataset on a budget of a couple of grand), such as Flash Attention by Tri Dao et al. (mid-2022; Dao is now at Together). He highlighted that it works by keeping the attention computation in the small region of fast on-chip GPU memory, rather than repeatedly reading and writing the full attention matrix to slower memory.
Flash Attention uses tiling to prevent materialization of [i.e. reading and writing] the large N x N attention matrix on (relatively) slow GPU HBM [High Bandwidth Memory]... resulting in a 7.6x speedup on the attention computation.
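To make the memory contrast concrete, here's a toy comparison of my own (not from the talk) using PyTorch 2's scaled_dot_product_attention, which can dispatch to a FlashAttention-style fused kernel on supported GPUs; the shapes are arbitrary:

```python
# Naive attention materialises the full N x N score matrix; the fused kernel
# computes the same result in tiles without ever writing that matrix out.
import math
import torch
import torch.nn.functional as F

N, d = 4096, 64                        # sequence length, head dimension
q = torch.randn(1, 1, N, d)            # (batch, heads, seq, head_dim)
k = torch.randn(1, 1, N, d)
v = torch.randn(1, 1, N, d)

# Naive: the (N, N) intermediate is ~64 MB at fp32 for N=4096, per head.
scores = q @ k.transpose(-2, -1) / math.sqrt(d)
out_naive = torch.softmax(scores, dim=-1) @ v

# Fused: same output (up to numerics), no N x N intermediate held in memory.
out_fused = F.scaled_dot_product_attention(q, k, v)

print(torch.allclose(out_naive, out_fused, atol=1e-4))
```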
2. Stable Diffusion as a renderer
Thomas's talk was a fun group exercise in trying to scry what would be possible if the diffusion generations we were watching (a few seconds each) were instantaneous.
He said that he really liked Fal's demo [of real-time diffusion generation from a canvas] but wished the input was more fully fledged.
The current generative image tools are toy-like and not useful to artists.
We need the GitHub Copilot for artists.
Artists need:
- reliable perspective
- framing
- iterate over a well-defined scene, not unlimited randomness
- a bit of randomness exploration is cool but we know what we want
He made clear that the choice of tool was not particularly important (I thought he was using Blender, but it might have been Cinema4D). The demo was hooked up to some custom middleware, which I think was SAM/ControlNet plus an SDXL prompt.
Just like the Fal TLDraw canvas demo (which I think is called Draw Fast, and was shown by TLDraw's Lu at the December 2023 AITL), the approach was img2img with artistic input controls.
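As a rough sketch of that img2img pattern (this is my own toy version with Hugging Face diffusers, not Thomas's middleware - checkpoint and settings are illustrative):

```python
# Render a rough frame in a 3D tool, then let SDXL img2img restyle it.
# The prompt plus a low `strength` keep the composition under the artist's control.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

init_image = load_image("viewport_render.png")   # e.g. a Blender/Cinema4D viewport grab

frame = pipe(
    prompt="cinematic sci-fi hangar, volumetric light, detailed materials",
    image=init_image,
    strength=0.45,        # low strength = stay close to the input framing/perspective
    guidance_scale=7.0,
).images[0]
frame.save("styled_frame.png")
```

The whole premise of the talk is what happens when that call drops from seconds to effectively zero.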
I'd just seen a teaser of a new diffusion renderer, fal 3d, which takes a wireframe image as input, and had been dissecting Fal's SDK over the weekend, so I was very keen on this.
When Felix's cofounder Clemens asked me later in the night which was my favourite talk, I said this one, and somehow felt obliged to add "even though it's not useful...". I don't think I truly believe this though; it's more a kneejerk insecurity about a relatively early technique: with further speed and quality optimisation this could very quickly become a drafting tool for professional-grade imagery. Not to mention what happens if you transpose it to non-image domains.
(Thinking of music primarily, but who knows if structured generation will mean we can even generate "useful" modalities like documents this way.) It's all very much to play for IMO, hence the thrill.
3. FalconOS: LLM as Operating System
This talk covered ReAct, a prompt engineering technique which doesn't work very well, though it got a lot of attention at the time as if it had solved the problem of models acting as agents. The problem statement given seemed to hew closely to the recent Rabbit r1 pitch: that apps are a bug, not a feature, and that user preferences should be enough to specify the autonomous completion of tasks.
Heiko said he landed on the same "LLM as operating system" analogy as Andrej Karpathy (who popularised it on Twitter in September and in a YouTube video in November).
There was a demo on how an LLM can't tell you distance as the crow flies [except for locations well-known enough to have been web scraped], and that you, or your LLM agent, should use Wolfram Alpha instead.
So far so familiar.
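For completeness, the familiar pattern boils down to something like this - a minimal ReAct-style tool loop of my own sketching, where `llm` is a hypothetical text-completion callable and a local haversine function stands in for Wolfram Alpha:

```python
# Minimal ReAct-style loop (illustrative, not Heiko's demo): the model interleaves
# Thought/Action steps, the runtime executes the tool and appends an Observation.
import math
import re

def crow_flies_km(lat1, lon1, lat2, lon2):
    """Great-circle ("as the crow flies") distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

PROMPT = """Answer the question. You may call one tool:
  distance(lat1, lon1, lat2, lon2) -> great-circle distance in km
Use this format:
Thought: what to do next
Action: distance(...)
Observation: result of the action (filled in by the runtime)
Final Answer: the answer

Question: {question}
"""

def react(llm, question, max_steps=5):
    transcript = PROMPT.format(question=question)
    for _ in range(max_steps):
        step = llm(transcript)                       # hypothetical LLM call
        transcript += step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        action = re.search(r"Action:\s*distance\(([^)]*)\)", step)
        if action:                                   # run the tool, feed back the result
            args = [float(x) for x in action.group(1).split(",")]
            transcript += f"\nObservation: {crow_flies_km(*args):.0f} km\n"
    return None
```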
The novelty came with FalconOS, an open-source LLM operating system that Heiko trained at Google on top of the Falcon 40B model (itself released as open source by TII in mid-2023).
There was a demo hooking it into LangChain, but my impression was that this is still a concept awaiting further advances to be realised. I was surprised by a questioner asking whether LLMs would need to be finetuned on JSON to get better at function calling (a.k.a. structured output). To me, that is solved by guided generation with Outlines - and it hints that maybe I should tinker with that project more and present it, to raise awareness of its usefulness!
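If you haven't seen it, the Outlines approach looks roughly like this (my own sketch; the model choice and schema are illustrative, and the exact API may have shifted between versions):

```python
# Guided generation: decoding is constrained so the output always matches the
# schema, with no JSON finetuning of the model required.
from pydantic import BaseModel
import outlines

class FunctionCall(BaseModel):
    name: str
    city: str
    units: str

model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")
generator = outlines.generate.json(model, FunctionCall)

call = generator("Extract the tool call: what's the weather in London, in celsius?")
print(call)   # parses into a FunctionCall every time
```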
4. Finetuning Mistral with Synthetic Data
Felix's talk covered his work finetuning Mistral with RunPod, LoRA, and so on.
The talk was very good; I unfortunately didn't take notes or photos.
Randomly, he acquired a client (contracting for a Latvian bank) via Discord!
5. Finetuning Mistral to Outperform GPT-3.5 on Coding Tasks
Sandeep's talk was about the synthetic data generation work he kept himself busy with while training a code generation model. He stressed that data quality was key and would be the route to surpassing GPT-4. He was very keen on a tool called Difftastic (a diff tool written in Rust that is structure-aware thanks to tree-sitter, usable as a git diff replacement).