AI Insights & Reading Notes

Updated 03.08.24

Running notes on the mechanics, behavior, and application of large language models, drawn from our own experience as well as from reading recent AI papers & implementations (started by Mario Schlosser, @mariots, marioschlosser.org).

Table of Contents

Each chapter is its own page. We’ll keep updating this as new stuff comes out.

  • Collection of takeaways from various sources inside and outside of Oscar.

    Read more

  • How do large language models actually work? The most recent survey/meta-research papers that discuss what LLMs have been shown to be capable of, plus an explainer of the transformer architecture (the T in GPT).

    Read more

  • This is where most of the action has been over the past few months - instructing LLMs to do something simply by giving them cleverly worded prompts. A minimal example follows this item.

    Read more
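
To make that concrete, here is a minimal few-shot prompt, the simplest of these techniques: a couple of worked examples steer the completion toward the pattern you want. The reviews are invented, and the string works with any completion-style API.

```python
# Few-shot prompting: show the model worked examples, then the real task.
# The reviews are made up; the string can be sent to any completion API.
prompt = """Classify the sentiment of each review as positive or negative.

Review: "The battery died after two days." -> negative
Review: "Setup took five minutes and it just works." -> positive
Review: "The screen scratches if you look at it wrong." ->"""

# The examples establish the pattern, so the completion tends to be the
# single-word label "negative" - no fine-tuning involved.
print(prompt)
```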

  • The latest research on what LLMs are capable of in terms of step-by-step inferencing, planning, and solving more complex problems. The most basic insight is that the largest models are capable of inferencing over 5-10 first-order-logic steps right out of the box, which is somewhat surprising (given that they are autoregressive and don't have an explicitly coded way to plan ahead). See the chain-of-thought sketch after this item.

    Read more
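
A minimal chain-of-thought example, assuming nothing beyond a text-in/text-out model: appending a "think step by step" cue (or a worked multi-step example) nudges the model to emit intermediate steps before the answer.

```python
# Chain-of-thought prompting: a short cue asks the model to lay out its
# intermediate steps, which measurably helps on multi-hop problems.
question = (
    "Tom has 3 boxes with 4 apples each. He gives away 5 apples. "
    "How many apples does he have left?"
)
cot_prompt = f"{question}\nLet's think step by step."

# A typical completion walks the chain: 3 * 4 = 12, then 12 - 5 = 7.
print(cot_prompt)
```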

  • How do LLMs scale and get better? What we know about the emergence of capabilities depending on model size and other hyperparameters. A sketch of a published scaling-law fit follows this item.

    Read more
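
One concrete anchor for this: the Chinchilla paper (Hoffmann et al., 2022) fits pretraining loss as a parametric function of parameter count N and training tokens D. A sketch with the paper's fitted constants (quoted from memory here, so treat the numbers as approximate):

```python
# Chinchilla-style parametric scaling law: predicted pretraining loss as a
# function of parameter count N and training tokens D. Constants are the
# paper's fitted values (approximate).
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

# More data at fixed size still buys a lot: a 7B model on 140B vs 1.4T tokens.
print(predicted_loss(7e9, 140e9))   # ~2.18
print(predicted_loss(7e9, 1.4e12))  # ~2.04
```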

  • How can we connect LLMs to the real world by instructing them to take actions? This has been a hot area of activity. All the AutoGPT and BabyAGI stuff you've probably read about falls into this category. The idea is that you can teach LLMs to inject explicit external calls into their own output (like calls to APIs or “plug-ins”). And then you can loop that onto itself, so LLMs can call LLMs. A sketch of that loop follows this item.

    Read more
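
Here is a minimal sketch of that loop. The `TOOL:`/`ANSWER:` convention and the `call_llm` stub are our own illustrative stand-ins, not any specific framework's API:

```python
import json

def call_llm(prompt: str) -> str:
    """Stand-in for any chat/completion API call; wire up a provider here."""
    raise NotImplementedError

# Toy tool registry; a real agent would expose search, code execution, etc.
TOOLS = {"get_weather": lambda city: json.dumps({"city": city, "temp_f": 71})}

def agent_loop(task: str, max_steps: int = 5) -> str:
    transcript = (
        "You can call a tool by emitting a line 'TOOL: name(arg)'. "
        "When done, emit 'ANSWER: ...'.\n"
        f"Task: {task}\n"
    )
    for _ in range(max_steps):
        step = call_llm(transcript)
        transcript += step + "\n"
        if step.startswith("ANSWER:"):
            return step.removeprefix("ANSWER:").strip()
        if step.startswith("TOOL:"):
            # Parse 'TOOL: get_weather(Paris)', run the tool, and append the
            # observation so the model sees it on the next iteration.
            name, arg = step[5:].strip().rstrip(")").split("(", 1)
            transcript += f"RESULT: {TOOLS[name.strip()](arg)}\n"
    return "no answer within step budget"
```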

  • Is it possible to inspect how LLMs function internally? For example, do they form a model of the world around them - is there a “Paris” neuron? Interpretability research tries to understand how LLMs actually gain the capabilities they have. It comes in two flavors: a) what knowledge and what state of the world LLMs really store inside themselves, and b) various techniques to look under the hood. A minimal probing sketch follows this item.

    Read more
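
One common look-under-the-hood technique is a linear probe: if a simple linear classifier can read a property off a layer's hidden states, that layer plausibly represents the property. A sketch with synthetic activations standing in for real hidden states:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for hidden states extracted from a model layer; we plant
# the "property" along one direction so the probe has something to find.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(1000, 768))
labels = (hidden[:, 42] > 0).astype(int)

# Train the probe on one split, test on another; high held-out accuracy is
# the usual evidence that the representation linearly encodes the property.
probe = LogisticRegression(max_iter=1000).fit(hidden[:800], labels[:800])
print("probe accuracy:", probe.score(hidden[800:], labels[800:]))
```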

  • What do we know about systematic biases and errors that LLMs exhibit? When can we trust their answers, and when can't we?

    Read more

  • What do we know about limitations exhibited by LLMs? What fundamental constraints on their performance exist?

    Read more

  • How do you specialize LLMs for esoteric tasks? GPT-4 has a context window of just 32K tokens, which is about 25 pages of text. That's not enough to really encode complex knowledge. There are various hacks like vector databases (LangChain and GPTIndex fall into this category). But why not just tune the actual LLM and jam more area-specific knowledge into it? That's what fine-tuning is. It comes in three flavors: a) the ideas behind it and when to use it, b) how to make it practical (i.e., how to speed it up and make it use less memory), and c) various applications in the real world. The most interesting recent development here is that you can reduce fine-tuning of a full-scale 7B-parameter LLM to training just ~1.5M weights and still get it to do what you want, rather than re-training all 7B of them. A sketch of that low-rank trick follows this item.

    Read more
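
That 1.5M-out-of-7B trick is low-rank adaptation (LoRA): freeze the pretrained weight matrix and learn only a small low-rank update on top of it. A minimal single-layer sketch (dimensions illustrative):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update.

    Instead of retraining W (d_out x d_in), we learn B @ A with rank r,
    shrinking trainable parameters from d_out*d_in to r*(d_in + d_out).
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # no-op at init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 65,536 trainable params vs ~16.8M frozen in the base layer
```

Applied to a handful of attention matrices across a full model, updates on this order add up to the low single-digit millions of trainable weights, which is how the 1.5M figure comes about.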

  • LLMs have a limited context window (GPT-4 can only deal with 32K tokens at a time). That means both a) external knowledge and b) memories generated by the LLM need to be stored outside the model and retrieved on demand. A minimal retrieval sketch follows this item.

    Read more
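
The standard workaround is the retrieval pattern behind those vector databases: embed text chunks once, embed the query, and paste the nearest neighbors into the prompt. A sketch where `embed` is a hash-based stand-in for a real embedding model:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model: any text -> unit vector works."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

docs = ["claims policy text", "prior-auth rules", "formulary notes"]
index = np.stack([embed(d) for d in docs])     # the "vector database"

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = index @ embed(query)              # cosine similarity (unit vectors)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved chunks get pasted into the prompt, so the model answers with
# knowledge that never has to fit permanently in its context window.
print(retrieve("what do the prior-auth rules say?"))
```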

  • The most recent applications of LLMs in the clinical space.

    Read more

  • Various other applications. Fun ones include Stanford’s paper on generative agents living together in a miniature world and the Voyager paper on using LLMs to successfully direct Minecraft agents.

    Read more

  • Most likely, foundation models will become more and more commoditized, and cloud providers will enable better and better fine-tuning-as-a-service. Which means, to really make use of LLMs, you have to think hard about systems design - how you prompt, store learnings, iterate on LLM output, etc. This section includes clever examples of that.

    Read more

  • This is the newest frontier of LLMs: originally they could only accept text, so how do you get images and sound into them? The answer is that you can get surprisingly far with embeddings, adapters, and/or auto-formalization (see above). A minimal adapter sketch follows this item.

    Read more
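
The adapter route in a nutshell (in the style of LLaVA-like systems): a small trained projection maps a frozen image encoder's output into the LLM's token-embedding space, so image features ride along as pseudo-tokens. Dimensions below are illustrative:

```python
import torch
import torch.nn as nn

# Frozen vision-encoder output (e.g., 257 patch embeddings of width 1024) is
# projected into the LLM's embedding width and prepended to the text tokens.
image_feats = torch.randn(1, 257, 1024)   # stand-in for vision-encoder output
adapter = nn.Linear(1024, 4096)           # typically the only new part trained
text_embeds = torch.randn(1, 12, 4096)    # stand-in for embedded prompt tokens

pseudo_tokens = adapter(image_feats)                    # (1, 257, 4096)
llm_input = torch.cat([pseudo_tokens, text_embeds], 1)  # fed to the LLM as-is
print(llm_input.shape)                                  # torch.Size([1, 269, 4096])
```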

  • Not in the area of LLMs, but this is how Midjourney and others create images, and it is currently the most open-sourced corner of the large AI models. A sketch of the core training step follows this item.

    Read more
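
The core training step of these diffusion models fits in a few lines: noise an image by a random, known amount and train a network to predict that noise; generation then runs the process in reverse. `model` is a placeholder for the denoising network:

```python
import torch

# DDPM-style noise schedule: alpha_bar[t] is how much of the original signal
# survives after t noising steps.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def training_loss(model, x0: torch.Tensor) -> torch.Tensor:
    """One training step: noise x0 to a random step t, predict the noise."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps    # noised image at step t
    return torch.mean((model(x_t, t) - eps) ** 2)   # regress the injected noise
```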