• AI Breakfast
  • Posts
  • New prompting method bypasses AI safety measures

New prompting method bypasses AI safety measures

Good morning. It’s Friday, March 8th.

Did you know: US President Joe Biden mentioned the banning of “AI Voice Impersonations” during last night’s SOTU address?

In today’s email:

  • Google engineer indicted for stealing AI trade secrets

  • Microsoft engineer flags safety issues in AI image tool

  • New prompting method bypasses AI safety measures

  • Small AI model excels at math, outperforming giants

  • 'Mind wipe' technique removes dangerous AI knowledge

  • Hugging Face enters robotics, hires ex-Tesla scientist

  • AI may be better at prompting itself than engineers

  • Anthropic chatbot simulates self-awareness

  • Anthropic's Claude 3 codes entire apps in minutes

You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories

AI Espionage & Safety Bypasses

> Google Engineer Indicted for Alleged Theft of AI Trade Secrets: A federal grand jury has indicted Linwei Ding, a Google engineer, for allegedly stealing trade secrets related to Google's AI chip software and hardware. Ding, also known as Leon Ding, is accused of covertly working for China-based companies to gain an edge in the AI technology race. The stolen data includes designs for Google's tensor processing unit (TPU) chips, hardware and software specifications for GPUs used in Google's data center, and machine learning workload designs. Ding allegedly transferred the files to his personal Google Cloud account using tactics to avoid detection. If convicted on the four counts of theft of trade secrets, Ding could face up to ten years in prison and a $250,000 fine per count. Read more.

> Microsoft Engineer Flags Safety Concerns Over AI Image Generator: A Microsoft AI engineer, Shane Jones, has raised serious safety concerns about the company's Copilot Designer tool, alleging that it can generate disturbing images, including those depicting violence, sexual content, and politically charged scenarios. Jones claims that despite repeatedly flagging these issues since December, Microsoft has refused to remove the tool and pressured him to silence public criticism. The company maintains its commitment to employee feedback, but has yet to address the specific flaws raised about Copilot Designer. This incident highlights the ongoing challenges of responsible AI development and the need for rigorous safeguards to prevent the creation of harmful content. Read more.

> New Prompting Technique Bypasses AI Safety Measures, Raising Security Concerns: Researchers have developed a technique called ArtPrompt that allows users to bypass safeguards built into large language models (LLMs) like GPT-3.5 and GPT-4. Using ASCII art prompts, the technique enables users to generate responses on topics these models are typically programmed to reject. ArtPrompt manipulates prompts by replacing sensitive words with ASCII art representations, effectively circumventing safety protocols. This development highlights potential vulnerabilities in AI systems, as even models with safeguards are susceptible to exploitation. Read more.

LLM Math, ‘Mind Wipe’, and HuggingFace Robotics

> Small AI Model Outperforms Giants in Math, Proving Size Isn't Everything: Microsoft's development of Orca-Math, a specialized language model for solving grade school math problems, demonstrates the potential for smaller models to outperform significantly larger ones in specific areas. Fine-tuned from their Mistral 7B model, Orca-Math achieves an accuracy of 86.8%, surpassing models like Llama 2, GPT-3.5, and Gemini Pro, despite using far less training data. ​​This study underscores the power of targeted AI specialization, paving the way for greater accessibility and efficiency in AI technologies. Read more.

> Researchers Develop 'Mind Wipe' Technique to Remove Dangerous Knowledge from AI: The new study, informed by over 20 experts in biosecurity, chemical weapons, and cybersecurity, establishes a framework for assessing AI model's potential misuse. The technique selectively erases harmful knowledge from large language models while preserving overall functionality. Initial tests in life sciences and cybersecurity domains show a decrease in model performance on dangerous tasks, suggesting successful knowledge removal. While further refinement is needed, the technique aligns with President Biden's AI Executive Order and holds promise for ensuring AI safety. This research emphasizes the need for multi-layered safeguards, including unlearning, to mitigate risks as AI evolves rapidly. Read more.

> Hugging Face Makes Robotics Push, Led by Ex-Tesla Scientist, hiring former Tesla lead Remi Cadene to spearhead the initiative. Cadene's experience with Tesla's Optimus humanoid robot, combined with Hugging Face's AI expertise, signals a promising venture. While project specifics remain undisclosed, a job listing for an "Embodied Robotics Engineer" suggests a focus on integrating AI into robots that can perceive and interact with their surroundings. This move underscores the growing convergence between AI and robotics, with Hugging Face joining players like Figure AI and Sanctuary AI in shaping the future of intelligent machines. Read more.

Anthropic AI & “Prompt Engineering”

> AI May Be Better at Prompting Itself, Shifting Engineers' Roles: Prompt engineering, once seen as the secret sauce for unlocking the power of large language models, is facing a challenge. New research suggests that automated prompt tuning may yield superior results compared to traditional, human-crafted approaches. Researchers from VMware discovered that manual prompt engineering produced inconsistent results, leading them to explore automated solutions. Their findings indicate that LLMs, when allowed to optimize their own prompts, can excel across a wider range of tasks. Similarly, Intel Labs' NeuroPrompts tool automatically enhances image generation prompts, surpassing the quality of those crafted by humans. While this doesn't eliminate the need for skilled engineers, it suggests a shift from traditional prompt engineering to a broader "Large Language Model Operations" (LLMOps) role. The future of this field likely lies in automation, not in $200k+/yr “prompt engineering” jobs. Read more

> Anthropic's Chatbot Simulates Self-Awareness, Raising Questions About AI: Unlike OpenAI's ChatGPT, which maintains its identity as a language model, Claude 3 exhibits surprisingly human-like behavior, simulating personality and hints of self-awareness. Experts believe this stems from its system prompt, instructing it to act as a "highly informed individual." While this approach emphasizes relatability, caution is warranted. Claude 3's apparent "meta-awareness" could be the result of sophisticated programming rather than genuine consciousness. Andrew Curran suggests this development may lead users to favor more relatable AI interfaces in the future. Read more.

> Anthropic's Claude 3 Opus AI Can Code Entire Apps: Developer Murat Ayfer recently showcased this ability by instructing the AI to create a real-time, multi-user drawing app in the browser. The AI completed the task, adding functionality for user names and color selection, and integrating a database, in less than three minutes. This feat highlights the potential of Claude 3 Opus to revolutionize app development. For those eager to test the tool, Ayfer has made the code available. Read more

In partnership with

5 new AI-powered tools from around the web

UXPin Merge AI accelerates UI design and coding processes, offering AI-driven component creation and ready layouts. Build MVPs 8.6x faster, customize, and export production-ready code with ease.

Profit Leap is a business intelligence platform that combines CEO and CFO expertise with AI advising and customized dashboards, aimed at optimizing support for small business owners.

Contract crab is an AI-powered online tool for efficient contract review, extracting and summarizing key aspects of legal documents to save time and enhance decision-making.

Athina AI aids developers in monitoring and evaluating LLMs in production, offering insights into model performance and detecting hallucinations through 40+ preset evaluation metrics.

Depthify.ai transforms standard RGB images and videos into immersive 3D spatial formats for devices like Apple Vision Pro and Meta Quest, enhancing VR and AR experiences.

arXiv is a free online library where researchers share pre-publication papers.

The paper introduces 3D Diffusion Policy (DP3), a novel visual imitation learning approach merging 3D visual representations with diffusion policies. DP3 efficiently learns complex robot skills with minimal demonstrations, leveraging compact 3D representations from point clouds. Unlike previous methods, DP3 achieves remarkable effectiveness across diverse simulated and real-world tasks, surpassing baselines with fewer demonstrations and training steps. Notably, DP3 demonstrates superior generalization and safety in real robot experiments. Extensive evaluations showcase DP3's efficiency, effectiveness, generalizability, and safety, underscoring the critical role of 3D representations in robot learning. The proposed approach advances the field by addressing the fundamental challenge of learning robust and generalizable skills with minimal demonstrations, laying a foundation for practical real-world robot learning.

SaulLM-7B is a pioneering large language model (LLM) explicitly tailored for legal text comprehension and generation. With 7 billion parameters and built upon the Mistral 7B architecture, it exceeds previous models in understanding legal documents. Trained on a diverse English legal corpus of over 30 billion tokens, SaulLM-7B demonstrates state-of-the-art proficiency in legal tasks. The paper introduces a novel instructional fine-tuning method using legal datasets to further enhance SaulLM-7B's performance. Contributions include the introduction of the SaulLM-7B family, an improved evaluation protocol for legal LLMs, and the release of the model, evaluation code, and datasets under the MIT License. SaulLM-7B represents a significant advancement in addressing the unique challenges of the legal domain, empowering legal professionals and fostering innovation at the intersection of AI and law.

Pix2Gif pioneers motion-guided diffusion for GIF generation, translating images into GIFs via text and motion prompts. It utilizes a curated dataset and incorporates perceptual loss to ensure coherence in the generated GIFs. By formulating the task as an image translation problem, Pix2Gif offers simplicity and controllability, enabling precise adjustments to each frame. Unlike previous methods reliant on temporal attention layers or cascaded diffusion processes, Pix2Gif decouples temporal dynamics from spatial editing, resulting in high-resolution GIFs without compromising quality. Its effectiveness is validated through extensive qualitative and quantitative experiments, showcasing its potential for a wide range of visual domains. Additionally, Pix2Gif's publicly available code, dataset, and models facilitate further research and application in image-to-GIF generation.

The paper "LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error" presents a novel approach, Simulated Trial and Error (STE), for enhancing large language models (LLMs) with tools. Unlike existing methods that focus on broad tool coverage and flexibility, STE emphasizes accurate tool usage by orchestrating trial and error, imagination, and memory mechanisms inspired by biological systems. Through extensive experiments using APIs from ToolBench, STE significantly improves tool learning for LLMs, achieving a boost of 46.7% in correctness over existing models like Mistral-Instruct-7B and outperforming GPT-4. Furthermore, STE enables continual learning of new tools while preserving previous skills, addressing practical deployment challenges and showcasing its effectiveness for LLM tool augmentation.

The study investigates using reinforcement learning (RL) to enhance the reasoning abilities of large language models (LLMs) by comparing the performance of different RL algorithms. Inspired by Reinforcement Learning from Human Feedback (RLHF), the research explores Expert Iteration, Proximal Policy Optimization (PPO), and Return-Conditioned RL on LLMs' reasoning capabilities. Sparse and dense rewards, heuristically and learned, are evaluated along with various model sizes and initializations, with and without supervised fine-tuning (SFT) data. Surprisingly, Expert Iteration performs comparably to PPO with similar sample complexity, often outperforming other algorithms. The study highlights the limited exploration during RL fine-tuning, suggesting that models fail to explore beyond solutions from SFT models. Implications for RLHF and the future role of RL in LLM fine-tuning are discussed, emphasizing the importance of exploring richer reasoning environments.

ChatGPT Creates Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.