
An LLM-based System that Designs and Runs Social Experiments


Good morning. It’s Monday, April 22nd.

Did you know: Sam Altman, the CEO of OpenAI, turns 39 today?

In today’s email:

  • LLMs Run Social Experiments

  • Groq’s 800 Tokens per Second

  • Providing Examples Boosts LLM Performance

  • 5 New AI Tools

  • Latest AI Research Papers

  • AI Creates Comics

You read. We listen. Let us know what you think by replying to this email.


Today’s trending AI news stories

An LLM-based System that Designs and Runs Social Experiments

Researchers at MIT and Harvard have created a system using large language models to automatically develop and test theories in social science.

The system employs structural causal models as a foundation for designing agents and experiments. These models, which mathematically represent cause-and-effect relationships, guide the system in generating agents and scenarios that vary along the dimensions specified by the explanatory variables. This method allows for a seamless transition from theory to experimental design and data generation.
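As a rough illustration of how explanatory variables can drive an experimental design, the sketch below enumerates one scenario per combination of variable levels. The variable names and levels are hypothetical and are not taken from the paper; the paper's actual pipeline, which also builds LLM agents for each scenario, is not reproduced here.

```python
from itertools import product

# Hypothetical explanatory variables for a bargaining scenario.
# These names and levels are illustrative assumptions, not from the paper.
explanatory_variables = {
    "buyer_budget": [10, 20, 30],
    "seller_min_price": [5, 15],
}


def enumerate_scenarios(variables):
    """Return one scenario dict per combination of variable levels."""
    names = list(variables)
    return [dict(zip(names, levels)) for levels in product(*variables.values())]


scenarios = enumerate_scenarios(explanatory_variables)
for scenario in scenarios:
    print(scenario)
```

Each resulting dictionary would describe one simulated condition, so the set of causes named in the model fixes the shape of the experiment.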

To evaluate the system's performance, the researchers conducted experiments on several social scenarios, including bargaining, bail hearings, job interviews, and auctions.

In some cases, the system autonomously proposed and tested hypotheses, while in others, the researchers provided input by selecting hypotheses and modifying agents. The results demonstrated the system's ability to generate findings through empirical means rather than mere model introspection.

Interestingly, when the LLM was asked to predict the experimental results directly, both as path estimates (the coefficients in the linear structural causal model) and as point predictions, its accuracy fell short of what one would want before relying on it in an autonomous agent. Direct elicitation led to significant overestimation of the path estimates. However, when given the experimentally fitted path estimates and asked to predict outcomes for different combinations of variables, the LLM's predictions were markedly more accurate.
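To make "path estimate" concrete, here is a minimal sketch for the simplest case of one cause and one outcome, where the path estimate is an ordinary least-squares slope. The data are made up for illustration and the function names are hypothetical. In the study it was the LLM that made predictions; this arithmetic only shows what information a fitted path estimate carries.

```python
def fit_path(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Point prediction for a new value of the cause."""
    return intercept + slope * x


# Made-up simulated outcomes (assumption): buyer budget vs. final deal price.
budgets = [10, 20, 30, 40]
deal_prices = [12.0, 17.0, 22.0, 27.0]

intercept, slope = fit_path(budgets, deal_prices)
print(f"path estimate (slope): {slope}")
print(f"predicted price at budget 50: {predict(intercept, slope, 50)}")
```

Once the slope and intercept are known, a prediction for an unseen combination of inputs is a matter of plugging in values, which is one plausible reading of why conditioning on fitted estimates helped.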

The study suggested that LLMs contain latent information about human behavior that can be effectively extracted and explored through the use of structural causal models and simulations. Read the full paper here.

Groq's Breakthrough AI Chip Achieves Blistering 800 Tokens per Second on Meta's LLaMA 3

Groq, a startup developing AI chips, achieved impressive performance in a benchmark test in which its chip processed over 800 tokens per second on Meta's LLaMA 3 model.
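For a sense of scale, the back-of-the-envelope sketch below converts a throughput figure into generation time. The 400-token response length is a hypothetical example, and the calculation ignores time to first token, network latency, and queueing.

```python
def generation_seconds(num_tokens, tokens_per_second):
    """Time to emit num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second


# 800 tokens/s is the figure reported above; 400 tokens is an assumed response length.
seconds = generation_seconds(400, 800)
print(f"{seconds:.2f} s")
```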

This suggests Groq's hardware could significantly improve the speed and efficiency of running large language models compared to existing cloud services.

Groq's Tensor Streaming Processor (TSP) is designed specifically for deep learning tasks. It uses specialized hardware to avoid inefficiencies found in general-purpose processors. As large language models become bigger, efficient processing becomes crucial. Groq's chip could challenge established companies like Nvidia and offer a competitive alternative in the AI processor market. Read more.

Hundreds of Examples in Prompts Significantly Boost LLM Performance, Study Finds

A collaborative study by researchers from Google, DeepMind, and others explores Many-Shot In-Context Learning (ICL) for Large Language Models (LLMs).

Many-Shot ICL places hundreds or thousands of task-specific examples directly in the prompt, yielding performance gains on tasks such as translation, summarization, and question answering.
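A many-shot prompt of the kind described above could be assembled roughly as follows. This is a minimal sketch: the "Input:"/"Output:" layout, the function name, and the toy translation pairs are assumptions for illustration, not the format used in the study.

```python
def build_many_shot_prompt(instruction, examples, query):
    """Concatenate an instruction, solved examples, and the unsolved query."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Toy examples (assumption); a real many-shot prompt would hold hundreds or more.
examples = [
    ("good morning", "bonjour"),
    ("thank you", "merci"),
    ("see you soon", "à bientôt"),
]

prompt = build_many_shot_prompt(
    "Translate English to French.", examples, "good night"
)
print(prompt)
```

The resulting string would be sent to the model as a single prompt; the model's weights stay unchanged, which is the contrast with fine-tuning.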

This approach can reduce the need for time-intensive fine-tuning of LLMs such as Google's Gemini 1.5 Pro. While Many-Shot ICL holds promise for streamlining task-specific adaptation, it requires prompt writers to curate high-quality examples. Read more.


5 new AI-powered tools from around the web

Grimo AI merges Obsidian, GitHub, and Quora, offering a unified platform for knowledge organization, sharing, and learning.

Parny is an AI-powered on-call incident management tool with a social media-style interface. It centralizes alerts, enables real-time calls, and fosters team collaboration.

Direqt integrates AI chatbots into websites for personalized engagement. Conversations drive on-site time, revenue, and user loyalty, enhancing brand differentiation.

DocsHound is knowledge base software tailored for the AI era. It simplifies documentation with modular editing, no-code publishing, and AI-enhanced content creation.

Omnifact is a privacy-focused, GDPR-compliant generative AI platform that enhances productivity, knowledge management, and innovation while prioritizing data security and compliance.

arXiv is a free online library where researchers share pre-publication papers.

AI Creates Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
