DALL-E 3's system prompt reveals OpenAI's rules for AI image generation

Good morning. It’s Wednesday, October 18th.

In today’s email:

  • AI Ethics and Governance

  • AI Integration and Technological Advancements

  • Scientific and Medical AI Breakthroughs

  • Global Competition and Expansion

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think of this edition by replying to this email, or DM us on Twitter.

Today’s edition is brought to you by:

Tired of explaining the same thing over and over again to your colleagues?

It’s time to delegate that work to AI.

guidde is a GPT-powered tool that helps you explain the most complex tasks in seconds with AI generated documentation.

  • Turn boring documentation into stunning visual guides

  • Save valuable time by creating video documentation 11x faster

  • Share or embed your guide anywhere for your team to see

Simply click capture on our browser extension and the app will automatically generate step-by-step video guides complete with visuals, voiceover and calls to action.

The best part? The extension is 100% free.

Today’s trending AI news stories

AI Ethics and Governance

DALL-E 3's system prompt reveals OpenAI's rules for AI image generation OpenAI’s DALL-E 3 ‘system prompt’ enforces rules for fair and copyright-compliant image generation, including translation of non-English input and a limit of four images at a time. Restrictions also prevent the generation of political or celebrity images. Added measures, such as blocklists and specialized classifiers, filter out sensitive content. View the shared chat link here to explore the rules as generated by ChatGPT.

AI pioneers Yann LeCun and Yoshua Bengio clash in an intense online debate over AI safety and governance LeCun, Meta’s chief AI scientist, emphasized designing AI systems for safety, while Bengio, co-founder of Element AI, advocated for prudence and major investment in AI safety and governance. LeCun urged the silent majority of AI professionals who have faith in the controllable potential of AI to share their views and also advocated for open-source AI platforms. View the entire debate on Facebook here.

New York City unveiled a new plan to use AI to make its government work better The plan includes 37 key actions, such as creating an external advisory network, educating city employees about AI, and establishing policies for algorithmic tools used by city agencies. The city acknowledges potential bias in AI and aims to evaluate risks and tool efficacy. New York City has also introduced an AI chatbot to assist with business services.

EU Plans Stricter Rules for Most Powerful Generative AI Models The European Union is contemplating a three-tiered regulatory approach for controlling powerful generative AI models. The proposed system would categorize AI technology based on its capabilities, imposing stricter vetting requirements for the most advanced models.

NYC Mayor Eric Adams uses AI to make robocalls in languages he doesn’t speak New York City Mayor Eric Adams’s use of AI-powered robocalls in languages he doesn’t speak has raised ethical concerns. While promoting the city’s new AI Action Plan, Adams acknowledged the use of AI-generated calls in various languages, triggering a debate about the ethics of misrepresentation. The calls, promoting city services, did not disclose the AI-generated nature of the voice. Critics have labeled the practice as “Orwellian” and a “creepy vanity project.”

AI Integration and Technological Advancements

YouTube gets new AI-powered ads that let brands target special cultural moments YouTube has introduced “Spotlight Moments,” an AI-powered advertising package enabling brands to target popular cultural events. Leveraging Google AI, advertisers can place ads across relevant videos on curated YouTube channels. GroupM is the first to embrace this technology. YouTube’s other AI initiatives include Video Reach and Video View campaigns, delivering enhanced reach and cost-effectiveness.

GPT-4 is Getting Faster A recent analysis shows GPT-4 getting faster and closing the gap with GPT-3.5. It explores the factors that affect latency, including queueing and processing times, and notes that higher token counts don’t always imply slower responses. Despite its higher cost, GPT-4 is now as fast as GPT-3.5 for most requests. The article teases an upcoming investigation into how rate limits might influence latency.

Mac users are embracing AI apps, study finds, with 42% using AI apps daily A recent study by app subscription service Setapp reveals a substantial increase in AI adoption among Mac users. Approximately 42% of Mac users now report using AI-based apps daily, with 63% believing that AI apps are more beneficial than those without AI. Furthermore, 44% of Mac app developers have already incorporated AI or machine learning models into their apps, and 28% are actively working on such implementations.

Microsoft's HoloAssist dataset brings AI assistants closer to our daily lives The HoloAssist dataset is a groundbreaking resource for developing interactive AI assistants for daily tasks. With over 160 hours of egocentric video and seven sensor streams, the dataset captures human actions and intentions.

This initiative aims to enhance AI’s understanding of the physical world, enabling more effective real-world assistance. Microsoft has made the dataset available to the scientific community on GitHub, signifying a step forward in AI’s capability for everyday life.

Scientific and Medical AI Breakthroughs

AI just spotted its 1st supernova. Could it replace human explosion hunters? A new fully automated machine-learning algorithm called BTSbot has successfully spotted and classified its first supernova, demonstrating the potential for AI to accelerate the analysis and classification of cosmic explosions. The program, developed by Northwestern University, aims to free astronomers from the labor-intensive task of supernova hunting, allowing them to focus more on analyzing these stellar events and developing new hypotheses.

Global Competition and Expansion

ChatGPT competitor Claude 2 launches in more countries Anthropic’s Claude 2, a major competitor to ChatGPT, expands its reach to 95 countries but is yet to launch in the EU, prompting speculation on potential reasons like multilingual support and privacy concerns. With millions of users since its July launch, the chatbot offers limited free access, while full access is available at $20 per month. Amazon’s investment of up to $4 billion in Anthropic positions Amazon as a key player in generative AI, supported by access to Anthropic’s AI models and chatbots.

China's Baidu is trying to rival the US' ChatGPT-4 with Ernie 4.0 Chinese tech giant Baidu launched Ernie 4.0 to compete with OpenAI’s GPT-4, showcasing its capabilities at the Baidu World Conference in Beijing. Baidu’s CEO Robin Li highlighted Ernie 4.0’s improved abilities, demonstrating its capacity to generate various content, including advertising materials and a martial arts novel. Amidst China’s stricter AI regulations, Baidu seeks to integrate AI across its services, aiming to improve user experience beyond conventional search engine functionalities.

Nvidia’s banking on TensorRT to expand its generative AI dominance By optimizing LLMs through TensorRT-LLM, Nvidia significantly improves inference performance, particularly for advanced LLM applications like writing and coding assistants. This move allows Nvidia to provide not only powerful GPUs for training LLMs but also the software needed for faster model execution, ensuring cost-efficient generative AI solutions for users. TensorRT-LLM is accessible to the public for integration via the Nvidia SDK.

5 new AI-powered tools from around the web

LegalNow is a revolutionary AI-powered legal assistant conducting contract management for small businesses. Effortlessly draft and review contracts with customized AI, saving time and costs. Receive real-time legal support and ensure comprehensive protection.

Cosine is an advanced AI knowledge engine that provides comprehensive support for over 50 coding languages. Unlike simple LLM wrappers, it accurately understands your codebase, offering real-time explanations and suggestions for efficient coding.

Just Story It is an innovative AI-powered platform for creating and experiencing audio narratives. With a focus on personalized storytelling, users can effortlessly craft and enjoy engaging audio stories, fostering a vibrant community of creators and listeners.

novelistAI 2.0 revolutionizes digital storytelling with AI-generated novels, non-fiction books, and gamebooks. Explore diverse formats, multi-page chapters, and GPT-4 enhanced content. Enjoy a rich visual experience with customizable cover images, while captivating audiobooks bring stories to life.

Compliance.sh streamlines compliance for organizations by automating 90% of the process, facilitating adherence to ISO 27001, SOC 2 Type II, HIPAA, GDPR, and other standards. It generates policies, handles security questionnaires using AI, and offers ongoing support, simplifying and maintaining regulatory compliance.

arXiv is a free online library where scientists share their research papers before they are published. Here are the top AI papers for today.

Llemma is an open language model specializing in mathematics, built through continued pretraining on the Proof-Pile-2 dataset. It outperforms existing open models on diverse mathematical tasks and formal theorem proving. With 7 billion and 34 billion parameter models, Llemma sets a new benchmark for open-source mathematics-focused language models. Its versatility is shown by its performance on tasks such as chain-of-thought mathematical problem solving, mathematical problem solving with tool use, and formal mathematics, making it a promising tool for various research areas, including algorithmic reasoning and reward modeling. By openly releasing all artifacts, Llemma serves as a platform for future advancements in mathematical reasoning.

PaLI-3, a vision-language model (VLM), emphasizes being smaller, faster, and stronger, rivaling much larger counterparts. The study compares vision backbones pretrained with classification objectives against backbones pretrained contrastively (SigLIP) on web-scale data, and finds that the contrastive approach performs better on multimodal tasks, particularly localization and visually situated text understanding. Its SigLIP image encoder, scaled up to 2 billion parameters, achieves a new state of the art in multilingual cross-modal retrieval, while the full model has only 5 billion parameters. PaLI-3’s strengths span visually situated text tasks, referring expression segmentation, and video question-answering, showing robust generalization. The authors hope the study rekindles research on the fundamental pieces of complex VLMs, potentially fueling the development of next-gen scaled-up models. The findings underscore the significance of contrastive pretraining for robust VLM performance across various tasks.

MiniGPT-v2 is a model designed as a unified interface for diverse vision-language tasks, aiming to effectively handle image description, visual question answering, and visual grounding. To address the complexity of multi-modal instructions, task-specific identifier tokens are introduced during training, enabling the model to distinguish between tasks. The model architecture includes a frozen visual backbone, a linear projection layer, and a large language model. The training process consists of three stages, incorporating weakly-labeled, fine-grained, and multi-modal instructional datasets. Experimental results show MiniGPT-v2 outperforming other vision-language generalist models and setting new state-of-the-art results on several benchmarks.

The 4K4D real-time 4D view synthesis method offers a new approach to high-fidelity, real-time rendering of dynamic 3D scenes at 4K resolution. It uses a 4D point cloud representation that supports hardware rasterization for fast rendering. This representation consists of dynamic geometry and a hybrid appearance model that combines image blending and spherical harmonics to deliver efficient, high-quality results. A differentiable depth peeling algorithm enables hardware-accelerated rendering while allowing the model to be optimized from multi-view RGB videos. Across multiple datasets, including DNA-Rendering, ENeRF-Outdoor, NHR, and Neural3DV, 4K4D consistently outperforms state-of-the-art methods in both rendering quality and real-time performance, making it a significant advancement in dynamic view synthesis.

The paper introduces Context-Aware Meta-Learning (CAML), an approach enabling universal image classification without meta-training or fine-tuning. Drawing inspiration from Large Language Models (LLMs), CAML uses a frozen CLIP model for feature extraction and a Transformer encoder for sequence modeling. The theoretical analysis shows that encoding labels with an Equal Length and Maximally Equiangular Set (ELMES) minimizes class entropy. Empirical evaluations on 11 benchmarks showcase CAML’s superiority over state-of-the-art methods, including Prototypical Networks and MetaOpt. The results indicate CAML’s potential for real-time applications and signify advancements in visual meta-learning.

Thank you for reading today’s edition.

Your feedback is valuable.

Respond to this email and tell us how you think we could add more value to this newsletter.