AI Breakfast
OpenAI Dev Day Recap - Expanded Edition

November 08, 2023

Good morning. It’s Wednesday, November 8th.

In today’s email:

OpenAI Dev Day Recap
AI in Entertainment and Media
AI Investment and Industry Growth
AI Ethics, Policy, and Detection Tools
AI in Robotics and Automation
AI Research, Science, and Predictive Tools
AI in Legal, Governance, and Community Engagement
9 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think of this edition by replying to this email, or DM us on Twitter.

Today’s edition is brought to you by:

Musho

…the trusty Figma Plugin you never knew you needed.

Musho transforms your prompts into nearly-complete, dev-ready layouts with copy that's just shy of miraculous.

Crafts full landing pages from scratch.
Regenerates and iterates on existing designs with a prompt.
Currently specializes in Landing Pages, with aspirations to tackle all design realms. Stay tuned!

Check out Musho on the Figma Community page

Today’s trending AI news stories

OpenAI’s Dev Day Coverage

OpenAI revealed a slew of new enhancements and price reductions across its platform at DevDay, making AI more accessible and capable for developers. Key announcements include:

GPTs offer a revolutionary customization of ChatGPT, allowing anyone to craft an AI for specific tasks—no coding needed. From learning games to professional tasks, GPTs enhance daily interactions by integrating web searches, image creation, and API actions while remaining committed to user privacy and data control.

GPT-4 Turbo with 128K context: GPT-4 Turbo sets a new standard for AI, with an expanded context window, user customization, and innovative vision and sound capabilities at a slashed price.

Assistants API: This new API simplifies creating assistive AI applications by managing state and context, featuring tools like Code Interpreter for running Python code and generating visuals, and Retrieval for augmenting assistant knowledge with external data.

Multimodal Capabilities: Features such as vision, image creation with DALL·E 3, and text-to-speech (TTS) options. DALL·E 3 pricing starts at $0.04 per image and text-to-speech at $0.015 per 1,000 characters, marking a leap forward in AI's accessibility and application.

Function Calling Updates: The model can now call multiple functions in a single message, and it returns the right function parameters more accurately.

Improved Instruction Following and JSON Mode: GPT-4 Turbo performs better on tasks that require precise instruction adherence and introduces a JSON mode that ensures responses are syntactically valid JSON.

Reproducible Outputs and Log Probabilities: The new seed parameter makes model outputs largely reproducible from one request to the next, aiding debugging and testing, while log probabilities help build features like autocomplete.

Updated GPT-3.5 Turbo: An improved version supports a 16K context window and enhancements in instruction following, JSON mode, and parallel function calling.

Customization Options: Enables anyone to craft custom ChatGPT bots through simple conversations, bypassing coding complexities. With the upcoming bot store, users can monetize their unique AI creations.

Pricing Adjustments and Higher Rate Limits: GPT-4 Turbo costs $0.01 per 1,000 input tokens and $0.03 per 1,000 output tokens, a significant reduction from GPT-4: one-third the price for inputs and half the price for outputs. Image processing costs vary by image size.
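To get a feel for what these prices mean in practice, here is a minimal back-of-the-envelope sketch. The GPT-4 Turbo, DALL·E 3, and TTS figures are the ones quoted above; the GPT-4 prices are an assumption, inferred from the stated ratios (three times the input price, twice the output price). The function names are hypothetical helpers, not part of any OpenAI library.

```python
# Prices in USD. The GPT-4 Turbo, DALL-E 3, and TTS figures come from the recap;
# the GPT-4 figures are inferred from the stated ratios and are an assumption.
PRICE_PER_1K_TOKENS = {
    "gpt-4-turbo": {"input": 0.01, "output": 0.03},
    "gpt-4": {"input": 0.03, "output": 0.06},  # inferred: 3x input, 2x output
}
DALLE3_STARTING_PRICE_PER_IMAGE = 0.04
TTS_PRICE_PER_1K_CHARS = 0.015


def chat_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request for the given token counts."""
    price = PRICE_PER_1K_TOKENS[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000


def tts_cost(characters: int) -> float:
    """Return the USD cost of synthesizing the given number of characters."""
    return characters * TTS_PRICE_PER_1K_CHARS / 1000


def image_cost(images: int) -> float:
    """Return the USD cost of images at the starting (lowest) per-image price."""
    return images * DALLE3_STARTING_PRICE_PER_IMAGE


if __name__ == "__main__":
    # Example workload: 10,000 input tokens and 2,000 output tokens.
    for model in PRICE_PER_1K_TOKENS:
        print(f"{model}: ${chat_cost(model, 10_000, 2_000):.2f}")
    print(f"TTS for 5,000 characters: ${tts_cost(5_000):.3f}")
    print(f"10 images: ${image_cost(10):.2f}")
```

For that example workload, the request comes to $0.16 on GPT-4 Turbo versus $0.42 at the inferred GPT-4 prices. Actual bills depend on current pricing, image sizes, and model variants, so treat this as arithmetic on the announced numbers only.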

Replit CEO details path to artificial developer intelligence (ADI), raises new $20M investment from Craft Ventures. Unlike a typical funding round, the deal serves as a liquidity event for long-term employees, and it aligns with Replit’s strategy of providing developers with sophisticated AI tools. These tools, known as AI agents, are meant to boost coding productivity by automating tasks and supporting code deployment, a significant step toward realizing the ADI vision.

Google brings AI-enhanced asset creation to Performance Max, enabling the generation and refinement of creative assets, including text and imagery, for marketing campaigns. This AI-guided creative process aims to give marketers tools to efficiently produce and experiment with high-quality assets, optimizing performance across Google's advertising channels.

Figma sweetens FigJam whiteboard tool with new AI features. The updates include an AI-assisted start for new boards, automatic grouping of sticky notes for efficient sorting, and a summarization tool that condenses the main ideas from extensive brainstorming sessions. This enhancement seeks to simplify the user experience, particularly for those new to the platform or managing large-scale collaborations.

AI in Entertainment and Media

Next-Level NPCs: Microsoft and Inworld AI Forge a Creative Partnership. The two companies announced a suite of AI tools for Xbox developers aimed at enhancing NPC complexity and game narrative depth. The multi-year collaboration features an AI design copilot, a system designed to assist with scripting and character development, and combines Inworld’s AI for dynamic storytelling with Microsoft’s technology. The tools are optional and intended to augment the developer’s toolkit, and they could redefine storytelling within the gaming industry.

YouTube to test generative AI features for its premium users, featuring a comments summarizer for streamlined interaction and a conversational AI for deeper engagement, heralding a new stride in AI-assisted content navigation.

Former Myspace founders introduce a text-to-video generator that uses your selfie to personalize content. Chris DeWolfe and Aber Whitcomb have launched PlaiDay, a text-to-video AI generator, through their startup Plai Labs. Its distinctive feature in the generative AI space is that users can personalize videos with their own selfies. Unlike text-only tools such as ChatGPT, the platform inserts animated user likenesses into various scenarios from simple text prompts, pushing the envelope of individualized AI-driven video content.

AI Investment and Industry Growth

Aleph Alpha raises more than $500 million in a Series B financing round backed by notable investors including Bosch Ventures and SAP. Intent on bolstering its European presence, the company aims to accelerate development of its generative AI products. Aleph Alpha stands out for its commitment to transparent, explainable AI, for running Europe’s fastest AI cluster, and for developing features that make AI outputs verifiable and understandable in strategic and sensitive sectors.

Google DeepMind's RoboVQA speeds up data collection for real-world robot interactions. RoboVQA introduces a fast data collection method for complex robotic tasks that combines human, robot, and teleoperated activities, significantly reducing the need for human intervention in real-world interactions.

AI Ethics, Policy, and Detection Tools

‘ChatGPT detector’ catches AI-generated papers with unprecedented accuracy. The ‘ChatGPT detector,’ employing machine learning, analyzes writing style features with exceptional accuracy, identifying AI-generated chemistry papers. This model, which has been trained specifically on academic writing, outshines general AI detectors by examining elements like sentence length variation and punctuation frequency. Its precision is demonstrated by near-perfect detection rates, vastly exceeding other broader-scoped AI identification tools.
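For a sense of what "writing style features" can look like, here is a minimal sketch that computes the two kinds of signal named above, sentence length variation and punctuation frequency. It is an illustration only: the function name, the naive sentence splitter, and the chosen punctuation set are assumptions, and the detector's real feature set and classifier are not described here.

```python
import re
import statistics


def style_features(text: str) -> dict:
    """Compute a few simple stylometric features for a passage of text."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]  # words per sentence
    total_words = sum(lengths)
    # Count a small, arbitrary set of mid-sentence punctuation marks.
    punctuation = sum(text.count(mark) for mark in ",;:()")
    return {
        "sentence_count": len(sentences),
        "mean_sentence_length": statistics.mean(lengths) if lengths else 0.0,
        # Population standard deviation as a measure of length variation.
        "sentence_length_stdev": statistics.pstdev(lengths) if lengths else 0.0,
        "punctuation_per_100_words": (
            100 * punctuation / total_words if total_words else 0.0
        ),
    }


if __name__ == "__main__":
    sample = "Short one. This sentence is a good deal longer, with a comma; and a semicolon."
    for name, value in style_features(sample).items():
        print(f"{name}: {value:.2f}")
```

In a real detector, features like these would be computed for many labeled human-written and AI-generated passages and fed to a trained classifier; this sketch stops at feature extraction.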

Meta bars political advertisers from using generative AI ads tools. Meta restricts political campaigns and regulated industries from using its generative AI ad tools to prevent misuse. These AI enhancements, capable of creating ad content from simple prompts, raise concerns over potential election misinformation.

Google Bard introduces “Human reviewers,” sparking privacy concerns over conversation monitoring. The “human review” feature, introduced to refine AI-generated conversations, is raising privacy alarms because it could involve the monitoring of user interactions. Despite Google’s assurance of privacy measures, including disassociating data from user accounts and reviewing only a random sample of interactions, concerns persist because reviewed conversations and data aren’t deleted with user activity and are retained for up to three years. Google nonetheless defends the process as essential for enhancing the AI’s quality and safety.

AI in Robotics and Automation

Powered by AI, new system makes human-to-robot communication more seamless. Brown University's Lang2LTL translates complex English directives into robot-readable logic. The AI-driven system lets robots perform sequenced tasks with minimal training and has proved effective in both simulations and real-world tests, a notable step toward user-friendly robot interactions.

AI Research, Science, and Predictive Tools

Ex-Google CEO bets AI will shake up scientific research. Eric Schmidt is financing Future House, a non-profit developing AI tools for the laboratory. Its ambition is to revolutionize scientific research with an AI ‘scientist’ that can autonomously hypothesize and experiment, synthesizing research findings and crafting hypotheses faster than humans.

The Singularity Is Less Than 10 Years Away, Says AI Veteran. AI pioneer Ben Goertzel anticipates that the singularity, a transformative era in which AI surpasses human intellect, will begin within the next 3 to 8 years. Goertzel, who leads SingularityNET, highlights the accelerating progress toward artificial general intelligence (AGI), fueled by both scientific ambition and sizable investments. While skeptics remain doubtful, he suggests that humanity's innate restlessness propels this advancement, with potential societal shifts on the scale of the agricultural revolution.

AI in Legal, Governance, and Community Engagement

An AI just negotiated a contract for the first time ever, and no human was involved. Luminance's specialized legal AI is tailored to company-specific negotiation standards and offers a more focused alternative to general-purpose models such as those from OpenAI. It aims to cut the roughly 80% of their time that legal teams spend on routine document reviews and negotiations. It flags and revises questionable clauses, for example amending a non-compliant six-year contract term to Luminance's standard three years, and logs each modification for transparency.

A ship with 10,000 Nvidia H100 GPUs could become the first ever AI-reliant sovereign territory. Del Complex’s BlueSea Frontier Compute Cluster (BSFCC) is stirring discussions about AI’s role in sovereignty and governance. The ship, housing 10,000 Nvidia H100 GPUs valued at $500 million, is proposed as a roaming, autonomous, AI-driven territory at sea. Claiming exemption from international AI regulations, the BSFCC intends to act as a sovereign entity, operating beyond conventional legal frameworks and possibly offering tax shelter opportunities.

xAI's PromptIDE aims to create a community for sharing and collaborating on AI prompts. xAI's PromptIDE, from Elon Musk's team, is an AI development environment for collaborative prompt engineering with a Python-based SDK for parallel prompting and in-depth analytics. It's designed to democratize access to their AI model Grok-1, allowing users to share and refine AI prompts within a community.

9 new AI-powered tools from around the web

Touring crafts your personal audio journey by merging generative AI with geolocation, 3D insights, speech synthesis, and expert content, delivering customized real-time narrations. Simply wear your headphones, stroll, and immerse in an experience designed for you.

Grok serves up real-time universe understanding with a sassy edge, tapping into the 𝕏 platform for current insights. In early beta, it is a conversational AI learning from user interactions, designed to tackle tough questions.

Detangle.ai simplifies the complexity of legal documents, providing a clear summary and highlighting key elements like bias, obligations, and financial terms. This tool is not a lawyer replacement. It is a clarity assistant for informed discussions, saving you time and legal fees with a per-document pricing model.

ConnectedFlow optimizes eCommerce promotions via AI, enhancing order values and retention while managing overstocks. It integrates with Shopify, fine-tunes discounts, and personalizes campaigns, ensuring business goals are met efficiently.

Vidiofy utilizes AI to convert text content into short-form videos, supporting multiple languages, high-quality voice overs, and a rich media library, streamlining social media content generation for publishers.

Zintlr is a B2B prospecting SaaS, introducing AI-driven Personality Intel for in-depth psychological insights into prospects, alongside a robust database with millions of verified contacts, enhancing lead qualification and connection strategies.

RivalFlowAI enhances content with AI-powered SEO analysis, pinpointing gaps against competitors and generating optimized copy to reclaim search rankings.

Circleback is an AI meeting assistant that transcribes, summarizes, and tracks action items across meetings. It supports over 100 languages and integrates with major platforms while ensuring data privacy.

Pullflow streamlines code reviews with AI, facilitating collaboration within GitHub, Slack, VS Code, and enhancing discussions, while integrating with CI/CD for a seamless pull-request lifecycle.

arXiv is a free online library where scientists share their research papers before they are published. Here are the top AI papers for today.

📄 CogVLM: Visual Expert for Large Language Models

CogVLM is an open-source visual language model that enhances multimodal integration through a trainable visual expert module within its attention and FFN layers. This approach ensures deep fusion of vision and language features, enabling the model to maintain NLP performance while excelling in image-text understanding tasks. It is trained on a mix of publicly available image-text pairs and visual grounding datasets, pushing the envelope in AI-driven visual cognition. The code is available on GitHub.

📄 S-LoRA: Serving Thousands of Concurrent LoRA Adapters

S-LoRA builds on the “pretrain-then-finetune” approach with a system for scalable serving of many task-specific LoRA adapters. It optimizes GPU memory use, minimizes latency, and introduces Unified Paging and custom CUDA kernels for heterogeneous batching. It allows thousands of adapters to run on a single GPU, outperforming existing methods in throughput and scalability, and promises broad application for customizable fine-tuning services in AI.

📄 Relax: Composable Abstractions for End-to-End Dynamic Machine Learning

Relax is a compiler framework that optimizes dynamic machine learning workloads across various environments. It introduces symbolic shape annotations for tracking computations and unifies graphs with tensor programs, enabling potent cross-level optimizations. Relax supports comprehensive compilation for dynamic models, facilitates deployment to diverse hardware, and delivers performance on par with optimized platform-specific systems.

📄 MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning

MFTCoder refines code-generating models by fine-tuning them on several tasks at once, which improves both the quality of the resulting model and the efficiency of training. It outperforms fine-tuning on each task individually, is simpler to use, and exploits the links between coding tasks. MFTCoder also handles imbalanced data and tasks that converge at different speeds. Applied to models like CodeLlama, it achieves a 74.4% success rate on first attempts (pass@1) in code tasks, beating GPT-4’s first-attempt accuracy.

📄 EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision

EmerNeRF is a self-supervised model that parses dynamic scenes by separating moving from static content, learning directly from raw inputs. It also predicts flows within scenes to improve its depiction of dynamic objects, without pre-labeled data or prior flow models. Tested on a new, diverse driving dataset, EmerNeRF outperforms its predecessors in dynamic scene reconstruction, and it is available online for community use and development.

📄 VR-NeRF: High-Fidelity Virtualized Walkable Spaces

VR-NeRF generates realistic walkable VR environments from high-dynamic-range (HDR) captures taken with a bespoke camera rig and rendered with neural radiance fields. The rig captures ultra-detailed scenes, which are then transformed into immersive VR experiences. The system adjusts image detail based on viewer distance to enhance realism, runs on powerful multi-GPU setups for smooth performance, and employs advanced techniques to render these environments efficiently in real time.

📄 LDM3D-VR: Latent Diffusion Model for 3D VR

LDM3D-VR by Intel Labs is an AI model suite for creating VR content, generating realistic RGBD panoramas from text and upscaling low-resolution images to high-definition using latent diffusion. It fine-tunes pre-trained models with high-res datasets, surpassing current methods in depth map synthesis and image quality.

📄 Levels of AGI: Operationalizing Progress on the Path to AGI

The DeepMind team proposes an AGI framework categorizing AI advancements by generality and performance levels, analogous to autonomous driving tiers. This structured approach aims to standardize assessments, risks, and progress tracking toward AGI, outlining a path rather than focusing solely on the end goal. The framework emphasizes capabilities over processes, cognitive tasks, potential versus deployment, ecological validity, and interaction paradigms for responsible AI advancement.

📄 I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models

I2VGen-XL synthesizes high-definition videos from static images using a cascaded approach with diffusion models. The first stage ensures semantic fidelity to the input image while inducing motion, at a lower resolution. The second stage refines the video, enhancing clarity and spatial-temporal coherence at 1280x720 resolution. The model was trained on a dataset of 35 million text-video pairs and 6 billion text-image pairs, enabling the generation of videos with accurate dynamics and detail preservation.

📄 Tell Your Model Where to Attend: Post-Hoc Attention Steering for LLMs

This paper introduces PASTA, a technique that lets language models recognize and prioritize the parts of the input a user has emphasized, significantly improving task performance. PASTA adjusts attention heads post hoc: it is applied after the model is fully trained and does not change any model parameters. It has been shown to boost task accuracy considerably, for example by 22% for the LLaMA-7B model across various tasks.
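The idea of steering attention after training can be illustrated with a minimal sketch. It assumes the steering works by scaling down the attention a head gives to non-emphasized tokens and then renormalizing; the function name, the `alpha` value, and the single-row setup are illustrative assumptions, and the paper's exact procedure (including how it picks which heads to steer) may differ.

```python
def steer_attention(weights, emphasized, alpha=0.01):
    """Re-weight one row of attention so emphasized tokens dominate.

    weights:    attention weights over input tokens (non-negative, sum to 1)
    emphasized: set of token positions the user marked as important
    alpha:      scaling factor for all other tokens (assumed value)
    """
    # Scale down every token the user did not emphasize.
    scaled = [
        w if i in emphasized else alpha * w
        for i, w in enumerate(weights)
    ]
    total = sum(scaled)
    if total == 0:
        # Nothing left to normalize; fall back to the original weights.
        return list(weights)
    # Renormalize so the row is a valid probability distribution again.
    return [w / total for w in scaled]


if __name__ == "__main__":
    row = [0.25, 0.25, 0.25, 0.25]
    steered = steer_attention(row, emphasized={1})
    print([round(w, 4) for w in steered])
```

With four equally weighted tokens and only the second one emphasized, the steered row puts roughly 97% of the attention on that token. No model weights are touched, which is what makes the adjustment post hoc.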

Thank you for reading today’s edition.

Your feedback is valuable.

Respond to this email and tell us how you think we could add more value to this newsletter.