AI Codes Entire Projects From Scratch


Good morning. It’s Wednesday, March 13th.

Did you know Microsoft went public on this day in 1986? A $1,000 initial investment in the stock would be worth nearly $7M today.

In today’s email:

  • Devin AI codes autonomously

  • Pi raises $70M for universal robot AI

  • Command-R: scalable RAG LLM from Cohere

  • Midjourney bans Stability AI over scraping

  • Midjourney adds character consistency

  • xAI open-sourcing Grok model

  • US proposes open-source AI model ban

  • GPT-4.5 Turbo leak: June 2024 launch

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

In partnership with

Hire a world-class AI team for 80% less

Trusted by leading startups and Fortune 500 companies.

Building an AI product is hard. Engineers who understand AI are expensive and hard to find. And there's no way of telling who's legit and who's not.

That's why companies around the world trust AE Studio. We help you craft and implement the optimal AI solution for your business with our team of world-class AI experts from Harvard, Princeton, and Stanford.

Our development, design, and data science teams work closely with founders and executives to create custom software and AI solutions that get the job done for a fraction of the cost.

P.S. Wondering how OpenAI DevDay impacts your business? Let’s talk!

Today’s trending AI news stories

Coding, Robotics, and Automation Platforms

> Cognition Debuts “Devin”, Generative AI Tool That Writes Code and Debugs Projects: Unlike existing code-completion tools, Devin goes beyond assistance by autonomously building entire engineering projects from scratch, including training AI models. Users provide natural-language prompts, and Devin handles the coding and debugging within a secure sandbox environment. Demonstrated by CEO Scott Wu, Devin promises swift, efficient execution of complex engineering tasks; by automating low-level coding work, it frees human engineers to focus on higher-level strategic problems. Though currently in private preview, Devin's ability to autonomously write and debug code signals real disruptive potential, a view reinforced by Cognition's recent $21M Series A funding and the investor confidence it reflects in AI-driven automation of software development. Read more.

> Physical Intelligence Secures $70 Million for Universal Robot AI Development: Physical Intelligence (Pi), co-founded by former Google scientist Karol Hausman, has raised $70 million in seed funding to create a universal AI model for robots. The investment, led by Thrive Capital and including Khosla Ventures, Lux Capital, OpenAI, and Sequoia Capital, signals confidence in Pi's ambitious vision. The company seeks to develop an AI that can power robots of any form or function, enabling them to adapt to diverse tasks. Combining language models and machine control techniques, Pi aims to address the current limitations of AI-powered robots. The team, including renowned experts like Sergey Levine and Chelsea Finn, focuses on software adaptability to differentiate itself from competitors. In a field attracting significant interest and investment, Pi hopes to revolutionize industries with its adaptable AI for robotics. Read more.

> Command-R: Cohere's New Scalable LLM for Business Automation: Command-R is optimized for Retrieval Augmented Generation (RAG) and Tool Use, pairing with Cohere's existing models to retrieve information and generate accurate, grounded responses from large datasets. Multilingual support, longer context lengths (up to 128k tokens), and Tool Use capabilities make it a strong productivity booster for businesses. Committed to both privacy and research, Cohere offers Command-R on hosted APIs and major cloud providers, while also making model weights available to the ML community. Command-R marks a major step toward scalable, enterprise-grade AI solutions. Read more.
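For a sense of what grounded generation with Command-R looks like in practice, here is a minimal sketch using Cohere's Python SDK. The API key, document snippets, and exact response fields are illustrative assumptions, not an official example.

```python
# Minimal sketch: grounded (RAG-style) generation with Command-R via the
# Cohere Python SDK. API key and document snippets are placeholders.
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Documents the model can ground its answer in; in practice these would
# come from your own retriever or from Cohere's embedding/rerank models.
docs = [
    {"title": "Q4 report", "snippet": "Revenue grew 18% year over year."},
    {"title": "Q4 report", "snippet": "Operating costs fell 5% in Q4."},
]

response = co.chat(
    model="command-r",
    message="Summarize how the business performed in Q4.",
    documents=docs,
)

print(response.text)       # grounded answer
print(response.citations)  # spans linking the answer back to the documents
```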

Midjourney Adds Character Consistency, Bans Rivals

> Midjourney Bans Stability AI Employees Over Data Scraping Concerns: Midjourney has banned Stability AI employees from its service after alleged data scraping caused a 24-hour outage. Stability AI's CEO, Emad Mostaque, denies intentional scraping but admits a team member gathered prompts for a personal project. Midjourney has also adopted a new policy of banning entire company teams for overly aggressive automated activity. Critics note the irony of Midjourney's stance, given its own use of potentially scraped training data. Stability AI emphasizes its commitment to proper data practices and downplays any rivalry, pointing to a history of mutual support between the two companies. Read more.

> Midjourney Adds Character Consistency Feature for AI Image Generation: Midjourney is testing a new feature called "character reference," which lets artists create consistent-looking characters across multiple images. Users supply a reference image with the new --cref parameter and adjust how much detail is retained via a character-weight setting, allowing variation in details like clothing and hairstyles while preserving core features. Read more.

Open Source & GPT-4.5 Preview

> Musk's xAI to Open-Source Grok Model: Elon Musk's AI firm, xAI, is open-sourcing its large language model, Grok. Announced on X, the move follows Musk's lawsuit against OpenAI, which he co-founded. Grok, built on xAI's Grok-1 model, shows capabilities similar to GPT-3.5, including text generation and code writing. While details of the open-sourcing process remain unclear, the decision aligns with Musk's advocacy for public access to cutting-edge AI technologies and mirrors similar efforts by AI startup Mistral AI. Notably, xAI has hinted at a more advanced successor to Grok, underscoring its commitment to advancing the field. As xAI prepares the release, questions linger about the company's future monetization strategies, with potential avenues including paid APIs for commercial and industrial users. Read more.

> US Report Proposes Banning Open-Source AI Models, Citing Security Risks: A U.S. government-commissioned report, "An Action Plan to Increase the Safety and Security of Advanced AI," raises significant concerns about the national security risks posed by artificial intelligence. It goes beyond warnings, proposing radical safety measures, including an outright ban on publishing open-source AI models and regulation of models exceeding a specific computational power threshold. The report, drawing on insights from over 200 experts and employees at leading AI companies like OpenAI, Google DeepMind, and Meta, reveals concerns about lax security practices at AI labs: employees cited insufficient safety protocols and a lack of incentives from management to prioritize safety. The report advocates stricter safety testing and proposes measures to mitigate potential misuse and security vulnerabilities in AI systems. Read more.

> Search Engine Leak Suggests OpenAI's GPT-4.5 Turbo Launch in June 2024: A product page for OpenAI's upcoming GPT-4.5 Turbo model was briefly indexed by search engines including Bing and DuckDuckGo, hinting at a potential June 2024 release. The leak suggests notable improvements in scalability and accuracy, including a context window doubled to 256,000 tokens. While it doesn't confirm rumored features like video or 3D capabilities, the leak highlights the anticipation surrounding GPT-4.5 Turbo and its potential to advance the field of artificial intelligence. Read more.

5 new AI-powered tools from around the web

Picurious AI, powered by GPT-4V, transforms photos into learning moments. Snap, solve, and explore images with instant insights, discussions, and object identification. Free on iOS.

Terrakotta offers an AI web-phone for cold outreach. Allows users to speak with contacts and leave AI-generated voicemails with personalized, relevant information to increase callback likelihood.

Tavus for Developers offers Phoenix, an advanced replica and text-to-video model accessible via APIs. Create realistic videos from just 2 minutes of training data.

Firebender is an AI-powered platform that helps sales teams discover ideal leads instantly. It eliminates noisy lists and qualifies 1M+ companies in seconds using natural language filters.

Ion Design converts Figma designs to React code, accelerating development cycles by ~40%. It utilizes a design system of 5,000+ Figma components, learns your codebase structure, and reuses existing components.

arXiv is a free online library where researchers share pre-publication papers.

This study introduces FastV, a method that addresses inefficient attention over visual tokens in large vision-language models (LVLMs) such as LLaVA-1.5 and QwenVL-Chat. By analyzing attention patterns, the authors find that visual tokens receive far less attention than text tokens in deeper layers, motivating a sparser, more efficient approach. FastV learns adaptive attention patterns in early layers and then selectively prunes visual tokens in later stages. Experiments show that FastV reduces FLOPs by up to 45% for LLaVA-1.5-13B without sacrificing performance on tasks like NoCaps and A-OKVQA. Importantly, FastV's adaptability allows the compute-performance trade-off to be customized, even outperforming models with larger parameter counts, and it makes processing higher-resolution images feasible without increasing inference costs.
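As a rough illustration of the pruning idea (a sketch under our own assumptions, not the authors' implementation), visual tokens can be ranked by the attention they receive after an early layer and the least-attended ones dropped for the remaining layers:

```python
# Sketch of FastV-style visual-token pruning: after an early layer K, rank
# image tokens by the average attention they receive and keep only the top
# fraction for subsequent layers. Illustrative, not the authors' code.
import torch

def prune_visual_tokens(hidden_states, attn_weights, visual_idx, keep_ratio=0.5):
    """
    hidden_states: (seq_len, dim) token representations after layer K
    attn_weights:  (heads, seq_len, seq_len) attention weights from layer K
    visual_idx:    LongTensor of positions of the image tokens in the sequence
    """
    # Average attention each visual token receives, over heads and queries.
    received = attn_weights.mean(dim=0).mean(dim=0)[visual_idx]  # (n_visual,)

    k = max(1, int(keep_ratio * len(visual_idx)))
    keep_visual = visual_idx[received.topk(k).indices]

    # Keep all text tokens plus the top-k visual tokens, preserving order.
    all_idx = torch.arange(hidden_states.size(0))
    text_idx = all_idx[~torch.isin(all_idx, visual_idx)]
    kept = torch.sort(torch.cat([text_idx, keep_visual])).values
    return hidden_states[kept], kept

# Example: 576 image tokens followed by 64 text tokens, one attention head.
h = torch.randn(640, 4096)
a = torch.softmax(torch.randn(1, 640, 640), dim=-1)
pruned, kept = prune_visual_tokens(h, a, torch.arange(576), keep_ratio=0.5)
# pruned now holds 64 text tokens + 288 retained image tokens = 352 rows.
```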

Google presents a groundbreaking model-stealing attack, the first of its kind, targeting black-box production language models such as OpenAI’s ChatGPT and Google’s PaLM-2. Using only typical API access, the method extracts precise information, specifically recovering the embedding projection layer of transformer models. Remarkably, for under $20 USD, the attack retrieves the entire projection matrix of OpenAI’s ada and babbage models, confirming hidden dimensions of 1024 and 2048, respectively. The attack also reveals the hidden dimension of GPT-3.5-turbo, with an estimated cost of under $2,000 in queries to recover its full projection matrix. By introducing potential defenses and discussing implications for future work, this research advances understanding of the security landscape surrounding large language models deployed in production environments.
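The core observation behind the hidden-dimension recovery is linear-algebraic: every full logit vector is the image of a hidden state under the final projection matrix, so a stack of logit responses has rank equal to the hidden size. A toy numpy sketch of that idea, using a simulated model rather than any real API, might look like this:

```python
# Toy illustration: logit vectors produced by a projection W (vocab x hidden)
# all lie in a rank-h subspace, so the number of significant singular values
# of a stack of logit responses reveals the hidden size h. Simulated model,
# no real API calls.
import numpy as np

vocab, hidden, n_queries = 5000, 1024, 2000
rng = np.random.default_rng(0)

W = rng.normal(size=(vocab, hidden))  # the unknown projection layer

def query_logits(prompt_id):          # stand-in for one API query
    h = rng.normal(size=hidden)       # hidden state for this prompt
    return W @ h                      # full logit vector

logits = np.stack([query_logits(i) for i in range(n_queries)])  # (n, vocab)
s = np.linalg.svd(logits, compute_uv=False)

# Count singular values well above the numerical noise floor.
estimated_hidden = int((s > s[0] * 1e-6).sum())
print(estimated_hidden)  # ~1024
```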

The paper introduces VidProM, the first dataset of its kind, featuring 1.67 million unique text-to-video prompts and 6.69 million videos generated by four state-of-the-art diffusion models. It highlights the time-consuming and costly curation process and compares VidProM with DiffusionDB, a text-to-image prompt dataset: VidProM offers a wider range of semantically unique prompts, uses the latest OpenAI text-embedding model to embed prompts, and spans a longer collection period. Although it contains fewer data points than DiffusionDB, this is offset by generating videos with multiple diffusion models and substantial computational resources. Analysis of the prompts reveals user preferences for topics such as humans, science fiction, and animals, suggesting various research directions in text-to-video generation. Future plans include adding high-quality videos generated by Sora with detailed prompts, enhancing the dataset's richness.

Synth2 enhances Visual-Language Models (VLMs) by generating synthetic image-text pairs efficiently. Leveraging pre-trained generative models, it improves VLM performance and data efficiency while allowing customization and scalability. By employing class-based prompting, it ensures a diverse range of synthetic captions, addressing challenges of data scarcity and noise. The architecture includes a text-to-image generator pre-trained on a human-annotated dataset, ensuring fair evaluation and eliminating external data effects. Operating at the embedding level, it bypasses costly pixel-space processing, reducing resource consumption. Experiments demonstrate significant performance gains in image captioning, showcasing the potential for synthetic data in VLM training. The approach offers insights into future VLM development, emphasizing the value of synthetic data generation for advancing visual language understanding.

This paper from Amazon introduces Chronos, a novel framework for time-series forecasting that leverages pretrained probabilistic language models. Chronos tokenizes time-series data into a fixed vocabulary through scaling and quantization, enabling the use of existing transformer-based language models. Pretrained Chronos models, based on the T5 family, are trained on diverse datasets, including synthetic data generated via Gaussian processes, which improves generalization. Evaluation across 42 datasets shows that Chronos outperforms traditional methods on datasets included in its training corpus and achieves competitive zero-shot performance on new datasets. Its simplicity, efficiency, and scalability position Chronos as a promising tool for simplifying forecasting pipelines, offering robustness and generalization across diverse time-series domains without task-specific fine-tuning.
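A simplified sketch of that scaling-and-quantization step is shown below; the bin count, value range, and scaling choice are illustrative assumptions, not the exact Chronos configuration.

```python
# Simplified sketch of Chronos-style tokenization: mean-scale a series, then
# quantize values into a fixed vocabulary of bins so a language model can
# treat the series as a sequence of tokens. Parameters are illustrative.
import numpy as np

def tokenize_series(series, n_bins=4096, low=-15.0, high=15.0):
    scale = np.abs(series).mean() + 1e-8       # mean scaling
    scaled = series / scale
    edges = np.linspace(low, high, n_bins - 1) # uniform bin edges
    tokens = np.digitize(scaled, edges)        # token ids in [0, n_bins - 1]
    return tokens, scale

def detokenize(tokens, scale, n_bins=4096, low=-15.0, high=15.0):
    centers = np.linspace(low, high, n_bins)   # representative value per bin
    return centers[tokens] * scale

series = np.array([10.0, 12.0, 9.5, 11.0, 13.5])
tokens, scale = tokenize_series(series)        # feed tokens to the LM
reconstructed = detokenize(tokens, scale)      # map predictions back to values
```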

AI Creates Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.