AI Insights & Reading Notes

Updated 03.08.24

Running notes on the mechanics, behavior, and application of large language models, drawn from our own experience as well as from reading recent AI papers & implementations (started by Mario Schlosser, @mariots, marioschlosser.org).

Table of Contents

Each chapter is its own page. We’ll keep updating this as new stuff comes out.

  • Collection of takeaways from various sources inside and outside of Oscar.

    Read more

  • How do large language models actually work? The most recent survey/meta-research papers that discuss what LLMs have been shown to be capable of, plus an explainer of the transformer architecture (the T in GPT).

    Read more

  • This is where most of the action has been over the past few months - instructing LLMs to do something simply by giving them cleverly worded prompts. A minimal example follows this item.

    Read more
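
To make that concrete, here is a minimal few-shot prompt, the simplest of these techniques: a couple of worked examples steer the completion toward the pattern you want. The reviews are invented, and the string works with any completion-style API.

```python
# Few-shot prompting: show the model worked examples, then the real task.
# The reviews are made up; the string can be sent to any completion API.
prompt = """Classify the sentiment of each review as positive or negative.

Review: "The battery died after two days." -> negative
Review: "Setup took five minutes and it just works." -> positive
Review: "The screen scratches if you look at it wrong." ->"""

# The examples establish the pattern, so the completion tends to be the
# single-word label "negative" - no fine-tuning involved.
print(prompt)
```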

  • The latest research on what LLMs are capable of in terms of step-by-step inferencing, planning, and solving more complex problems. The most basic insight is that the largest models are capable of inferencing over 5-10 first-order-logic steps right out of the box, which is somewhat surprising (given that they are autoregressive and don't have an explicitly coded way to plan ahead). See the chain-of-thought sketch after this item.

    Read more
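
A minimal chain-of-thought example, assuming nothing beyond a text-in/text-out model: appending a "think step by step" cue (or a worked multi-step example) nudges the model to emit intermediate steps before the answer.

```python
# Chain-of-thought prompting: a short cue asks the model to lay out its
# intermediate steps, which measurably helps on multi-hop problems.
question = (
    "Tom has 3 boxes with 4 apples each. He gives away 5 apples. "
    "How many apples does he have left?"
)
cot_prompt = f"{question}\nLet's think step by step."

# A typical completion walks the chain: 3 * 4 = 12, then 12 - 5 = 7.
print(cot_prompt)
```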

  • How do LLMs scale and get better? What we know about the emergence of capabilities depending on model size and other hyperparameters. A sketch of a published scaling-law fit follows this item.

    Read more
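
One concrete anchor for this: the Chinchilla paper (Hoffmann et al., 2022) fits pretraining loss as a parametric function of parameter count N and training tokens D. A sketch with the paper's fitted constants (quoted from memory here, so treat the numbers as approximate):

```python
# Chinchilla-style parametric scaling law: predicted pretraining loss as a
# function of parameter count N and training tokens D. Constants are the
# paper's fitted values (approximate).
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

# More data at fixed size still buys a lot: a 7B model on 140B vs 1.4T tokens.
print(predicted_loss(7e9, 140e9))   # ~2.18
print(predicted_loss(7e9, 1.4e12))  # ~2.04
```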

  • How can we connect LLMs to the real world by instructing them to take actions? This has been a hot area of activity. All the AutoGPT and BabyAGI stuff you've probably read about falls into this category. The idea is that you can teach LLMs to inject explicit external calls into their own output (like calls to APIs or “plug-ins”). And then you can loop that onto itself, so LLMs can call LLMs. A sketch of that loop follows this item.

    Read more
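
Here is a minimal sketch of that loop. The `TOOL:`/`ANSWER:` convention and the `call_llm` stub are our own illustrative stand-ins, not any specific framework's API:

```python
import json

def call_llm(prompt: str) -> str:
    """Stand-in for any chat/completion API call; wire up a provider here."""
    raise NotImplementedError

# Toy tool registry; a real agent would expose search, code execution, etc.
TOOLS = {"get_weather": lambda city: json.dumps({"city": city, "temp_f": 71})}

def agent_loop(task: str, max_steps: int = 5) -> str:
    transcript = (
        "You can call a tool by emitting a line 'TOOL: name(arg)'. "
        "When done, emit 'ANSWER: ...'.\n"
        f"Task: {task}\n"
    )
    for _ in range(max_steps):
        step = call_llm(transcript)
        transcript += step + "\n"
        if step.startswith("ANSWER:"):
            return step.removeprefix("ANSWER:").strip()
        if step.startswith("TOOL:"):
            # Parse 'TOOL: get_weather(Paris)', run the tool, and append the
            # observation so the model sees it on the next iteration.
            name, arg = step[5:].strip().rstrip(")").split("(", 1)
            transcript += f"RESULT: {TOOLS[name.strip()](arg)}\n"
    return "no answer within step budget"
```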

  • Is it possible to inspect how LLMs function internally? For example, do they form a model of the world around them - is there a “Paris” neuron? Interpretability research tries to understand how LLMs actually gain the capabilities they have. It comes in two flavors: a) what knowledge and what state of the world LLMs really store inside themselves, and b) various techniques to look under the hood. A minimal probing sketch follows this item.

    Read more
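
One common look-under-the-hood technique is a linear probe: if a simple linear classifier can read a property off a layer's hidden states, that layer plausibly represents the property. A sketch with synthetic activations standing in for real hidden states:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for hidden states extracted from a model layer; we plant
# the "property" along one direction so the probe has something to find.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(1000, 768))
labels = (hidden[:, 42] > 0).astype(int)

# Train the probe on one split, test on another; high held-out accuracy is
# the usual evidence that the representation linearly encodes the property.
probe = LogisticRegression(max_iter=1000).fit(hidden[:800], labels[:800])
print("probe accuracy:", probe.score(hidden[800:], labels[800:]))
```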

  • What do we know about systematic biases and errors that LLMs exhibit? When can we trust their answers, and when can't we?

    Read more

  • What do we know about limitations exhibited by LLMs? What fundamental constraints on their performance exist?

    Read more

  • How do you specialize LLMs for esoteric tasks? GPT-4 has a context window of just 32K tokens, which is about 25 pages of text. That's not enough to really encode complex knowledge. There are various hacks like vector databases (LangChain and GPTIndex fall into this category). But why not just tune the actual LLM and jam more area-specific knowledge into it? That's what fine-tuning is. It comes in three flavors: a) the ideas behind it and when to use it, b) how to make it practical (i.e., how to speed it up and make it use less memory), and c) various applications in the real world. The most interesting recent development here is that you can reduce fine-tuning of a full-scale 7B-parameter LLM to training just ~1.5M weights and still get it to do what you want, rather than re-training all 7B of them. A sketch of that low-rank trick follows this item.

    Read more
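
That 1.5M-out-of-7B trick is low-rank adaptation (LoRA): freeze the pretrained weight matrix and learn only a small low-rank update on top of it. A minimal single-layer sketch (dimensions illustrative):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update.

    Instead of retraining W (d_out x d_in), we learn B @ A with rank r,
    shrinking trainable parameters from d_out*d_in to r*(d_in + d_out).
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # no-op at init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 65,536 trainable params vs ~16.8M frozen in the base layer
```

Applied to a handful of attention matrices across a full model, updates on this order add up to the low single-digit millions of trainable weights, which is how the 1.5M figure comes about.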

  • LLMs have a limited context window (GPT-4 can only deal with 32K tokens at a time). That means both a) external knowledge and b) memories generated by the LLM need to be stored outside the model and retrieved on demand. A minimal retrieval sketch follows this item.

    Read more
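
The standard workaround is the retrieval pattern behind those vector databases: embed text chunks once, embed the query, and paste the nearest neighbors into the prompt. A sketch where `embed` is a hash-based stand-in for a real embedding model:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model: any text -> unit vector works."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

docs = ["claims policy text", "prior-auth rules", "formulary notes"]
index = np.stack([embed(d) for d in docs])     # the "vector database"

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = index @ embed(query)              # cosine similarity (unit vectors)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved chunks get pasted into the prompt, so the model answers with
# knowledge that never has to fit permanently in its context window.
print(retrieve("what do the prior-auth rules say?"))
```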

  • The most recent applications of LLMs in the clinical space.

    Read more

  • Various other applications. Fun ones include Stanford’s paper on generative agents living together in a miniature world and the Voyager paper on using LLMs to successfully direct Minecraft agents.

    Read more

  • Most likely, foundation models will become more and more commoditized, and cloud providers will enable better and better fine-tuning-as-a-service. Which means, to really make use of LLMs, you have to think hard about systems design - how you prompt, store learnings, iterate on LLM output, etc. This section includes clever examples of that.

    Read more

  • This is the newest frontier of LLMs: originally they could only accept text, so how do you get images and sound into them? The answer is that you can get surprisingly far with embeddings, adapters, and/or auto-formalization (see above). A minimal adapter sketch follows this item.

    Read more
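
The adapter route in a nutshell (in the style of LLaVA-like systems): a small trained projection maps a frozen image encoder's output into the LLM's token-embedding space, so image features ride along as pseudo-tokens. Dimensions below are illustrative:

```python
import torch
import torch.nn as nn

# Frozen vision-encoder output (e.g., 257 patch embeddings of width 1024) is
# projected into the LLM's embedding width and prepended to the text tokens.
image_feats = torch.randn(1, 257, 1024)   # stand-in for vision-encoder output
adapter = nn.Linear(1024, 4096)           # typically the only new part trained
text_embeds = torch.randn(1, 12, 4096)    # stand-in for embedded prompt tokens

pseudo_tokens = adapter(image_feats)                    # (1, 257, 4096)
llm_input = torch.cat([pseudo_tokens, text_embeds], 1)  # fed to the LLM as-is
print(llm_input.shape)                                  # torch.Size([1, 269, 4096])
```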

  • Not in the area of LLMs, but this is how Midjourney and others create images, and it is currently the most open-sourced corner of the large AI models. A sketch of the core training step follows this item.

    Read more
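
The core training step of these diffusion models fits in a few lines: noise an image by a random, known amount and train a network to predict that noise; generation then runs the process in reverse. `model` is a placeholder for the denoising network:

```python
import torch

# DDPM-style noise schedule: alpha_bar[t] is how much of the original signal
# survives after t noising steps.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def training_loss(model, x0: torch.Tensor) -> torch.Tensor:
    """One training step: noise x0 to a random step t, predict the noise."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps    # noised image at step t
    return torch.mean((model(x_t, t) - eps) ** 2)   # regress the injected noise
```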