Apple & Google: Gemini AI in iPhones?
Good morning. It’s Monday, March 18th.
Did you know: Sam Altman gave an interview to Korean news outlets discussing GPT-5? Read a synopsis here
In today’s email:
Grok-1: xAI's massive open-source AI model
xAI tops AI developer salaries
Apple's MM1: Multimodal large language models
Quiet-STaR: AI training mimics human reasoning
Apple & Google: Gemini AI in iPhones?
Nvidia to acquire Israeli AI startup Run:ai
China's robot workforce surges past predictions
Abu Dhabi's MGX may fund OpenAI's chip plans
Nvidia GTC 2024: In-person, generative AI focus
5 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think by replying to this email.
In partnership with
Guide employees to use GenAI apps securely with in-browser app banners
GenAI apps like ChatGPT are incredibly powerful tools, but many organizations are rightly nervous about data privacy. Rather than block GenAI apps outright and fall behind the competition, Push has released a smart tool designed to secure the use of SaaS applications.
With Push’s app banners feature, you can display in-browser banners to users when they access GenAI apps, telling them how to use the apps safely and linking to your organization’s GenAI security policy.
So before you rush to block employees, think about whether there's a way to meet them halfway and keep risks under control without sacrificing productivity. Push intervenes in the right place, at the right time to reinforce policy at the point of access and prompt secure behavior.
Thank you for supporting our sponsors!
Today’s trending AI news stories
xAI, Apple, and Quiet-STaR
> xAI Releases Massive Grok-1 Model as Largest Open-Source AI to Date: Musk's AI company, xAI, has made Grok-1, the largest open-source mixture-of-experts AI model to date, available under the Apache 2.0 license. Grok-1 has 314 billion parameters and was developed from scratch on a technology stack rooted in JAX and Rust; the model activates two expert networks per input token. It's worth noting, however, that Grok-1 in its current form lacks refinement and safety optimization, as it hasn't been fine-tuned with human feedback. Moreover, xAI has not disclosed details of its training data or provided ethical guidelines. Despite these caveats, the Grok-1 release on GitHub includes installation guides and gives the AI community a rare look inside a frontier-scale model.
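To make the routing concrete, here is a minimal mixture-of-experts sketch in the spirit of Grok-1's two-active-experts design. Everything below (function names, shapes, the softmax-over-winners gating) is an illustrative assumption, not xAI's actual implementation:

```python
import numpy as np

def top2_moe_layer(x, gate_w, experts):
    """Mix the outputs of each token's two highest-scoring experts."""
    logits = x @ gate_w                      # (tokens, num_experts) router scores
    out = np.zeros_like(x)
    for i, token in enumerate(x):
        top2 = np.argsort(logits[i])[-2:]    # pick the 2 best experts per token
        gate = np.exp(logits[i][top2])
        gate /= gate.sum()                   # softmax over just the winners
        for g, e in zip(gate, top2):
            out[i] += g * experts[e](token)  # weighted sum of expert outputs
    return out

# Toy usage: 4 linear "experts" on 5 tokens of width 8
rng = np.random.default_rng(0)
d, n_exp = 8, 4
experts = [lambda t, W=rng.normal(size=(d, d)): t @ W for _ in range(n_exp)]
y = top2_moe_layer(rng.normal(size=(5, d)), rng.normal(size=(d, n_exp)), experts)
```

The point of the design is that only two expert networks run per token, so Grok-1's effective compute per token is far below what its 314 billion parameters would suggest.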
> xAI Tops AI Developer Salaries: xAI is reputedly offering the highest salaries to AI developers, surpassing even OpenAI, according to Perplexity CEO Aravind Srinivas. Many applicants are drawn by the chance to work directly with Musk, which poses a challenge for competitors. Srinivas notes that companies offering higher compensation packages, often with stock-based incentives, hold a competitive advantage. Character.ai also stands out for its lucrative offers, while Anthropic and OpenAI maintain competitive compensation structures. Read more.
> Apple Makes Strides in Multimodal AI with Introduction of MM1: Apple introduces MM1, a family of multimodal large language models (MLLMs) with up to 30 billion parameters, reporting state-of-the-art (SoTA) pre-training metrics and competitive fine-tuning results. MM1 stands out for its scale and architectural breadth, spanning dense models and mixture-of-experts variants, and the accompanying paper is unusually transparent about architecture choices and data selection. A key finding is that the diversity of the pre-training data mix has a large impact on model performance, especially in few-shot learning scenarios. The models perform competitively across benchmarks, underscoring the effectiveness of large-scale pre-training and strategic data selection. Read more.
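Since MM1's findings center on the pre-training data mix, here is a tiny sketch of weighted mixture sampling. The source names and ratios below are illustrative assumptions, not Apple's published recipe:

```python
import itertools
import random

# Hypothetical mixture weights over data types like those MM1 studies
MIXTURE = {"interleaved_image_text": 0.45,
           "image_caption_pairs": 0.45,
           "text_only": 0.10}

def sample_pretraining_batch(loaders, batch_size, weights=MIXTURE):
    """Draw a batch whose composition follows the mixture weights."""
    sources = list(weights)
    probs = [weights[s] for s in sources]
    return [next(loaders[random.choices(sources, probs)[0]])
            for _ in range(batch_size)]

# Toy usage with stand-in infinite loaders
loaders = {name: itertools.cycle([f"<{name} example>"]) for name in MIXTURE}
batch = sample_pretraining_batch(loaders, batch_size=8)
```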
> Quiet-STaR Enables Machines to Reason: Researchers at Stanford University have introduced Quiet-STaR, a novel AI training method that could significantly change how language models learn. Unlike conventional training, which rewards surface-level pattern matching, Quiet-STaR guides a model to "think between the lines," mimicking human reasoning by generating internal rationales for potential text continuations and refining them through trial and error. The approach substantially boosts performance on comprehension tasks. Though still early-stage (tested on a 7B model), Quiet-STaR signals a meaningful advance toward AI capable of nuanced context analysis, hypothesis formation, and sophisticated communication. Read more.
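For intuition, here is a heavily simplified sketch of the idea: sample internal "thoughts" at each position, check whether they improve prediction of the following tokens, and reinforce the ones that do. The `model.generate`/`model.logprob` interface is hypothetical, and the real method uses parallel sampling, learned start/end-of-thought tokens, and a more careful objective:

```python
def quiet_star_step(model, tokens, thought_len=12, num_thoughts=4):
    """One simplified Quiet-STaR-style update over a token sequence."""
    losses = []
    for t in range(1, len(tokens) - 1):
        prefix, target = tokens[:t], tokens[t:]
        base = model.logprob(target, context=prefix)   # predict without thinking
        for _ in range(num_thoughts):
            thought = model.generate(prefix, max_new_tokens=thought_len)
            gain = model.logprob(target, context=prefix + thought) - base
            # REINFORCE-style: make thoughts that helped prediction more likely
            losses.append(-gain * model.logprob(thought, context=prefix))
    return sum(losses) / len(losses)
```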
Acquisitions and Collaborations
> Apple + Google Partnership for iPhone AI Upgrade Possible: Apple is in talks with Google to integrate the Gemini AI engine into iPhones, Bloomberg reports. The collaboration would boost the iPhone's capabilities, particularly in generative AI for tasks like image creation and writing. While details are still under wraps, industry observers anticipate a potential announcement at Apple's developer conference in June. Gemini, despite facing scrutiny, continues to gain traction through strategic partnerships: it already powers AI features in Samsung smartphones. Read more.
> Nvidia Eyes Billion-Dollar Acquisition of Israeli AI Startup Run:ai: Nvidia is reportedly in talks to acquire Run:ai, an Israeli startup founded in 2018 that specializes in orchestration and virtualization software for AI workloads on GPUs; the company drew attention with a $75 million Series C round in March 2022. Co-founded by Omri Geller and Dr. Ronen Dar, both with extensive backgrounds in engineering and research, Run:ai offers a Kubernetes-based platform that efficiently allocates computing resources for AI clouds. If finalized, the deal would be Nvidia's most significant acquisition in Israel since its 2019 deal for Mellanox. Nvidia's soaring valuation, exceeding $2 trillion, underscores its dominance in AI chipsets and has helped propel Wall Street to record levels. Read more.
Around the World
> China has 12½ times more robots in its workforce than industry experts predicted, according to an analysis by the Information Technology and Innovation Foundation (ITIF): Despite currently lagging in innovation and software development, China's prioritization of the robotics industry positions it for future leadership. In 2022, China accounted for 52% of global industrial robot installations, driven by robust domestic demand and policy support. While China dominates in production and volume, the US has fallen behind due to insufficient long-term investment. Although China's robotics market benefits from substantial government subsidies, it still relies heavily on foreign technologies, and while it excels in efficiency and cost-effectiveness, it struggles with original development, often imitating established products. Read more.
> Abu Dhabi's MGX in Talks to Fund OpenAI's Chip Manufacturing Plans: Abu Dhabi's newly established sovereign wealth fund, MGX, is engaged in preliminary discussions to potentially fund OpenAI's plans to establish its own semiconductor manufacturing, with the objective of reducing reliance on Nvidia and gaining greater control over AI application components. The UAE's ambition to become a global AI hub is evident through initiatives like appointing the world's first AI minister and establishing an AI university. OpenAI's CEO, Sam Altman, estimates a $7 trillion cost for building a global AI infrastructure and proposes partnerships with sovereign wealth funds like Abu Dhabi's MGX. Apart from MGX, OpenAI is in talks with Singapore's Temasek and major manufacturers like Samsung and TSMC. Read more.
> Jensen Huang Charts the AI Course at GTC 2024: Nvidia's eagerly anticipated GPU Technology Conference (GTC) 2024 returns to an in-person format in San Jose, CA. Chief Executive Jensen Huang will deliver a keynote focused on generative AI, and expected product announcements, including the B100 Blackwell chip, promise faster AI training, though concerns about increased power requirements loom large. With over 300 exhibitors, attendees can explore a growing ecosystem of AI solutions and scout industry disruptors and potential partners. Read more.
5 new AI-powered tools from around the web
LLM Pricing is a price comparison tool for LLMs, built with the help of Claude 3 Sonnet using HTML, JavaScript, and the Cloudflare Workers API.
JustShip is a launch kit for SaaS, AI, and other products that cuts setup time by 25+ hours for iOS devs. Features include authentication, analytics, and databases.
Similarix is an AI layer for S3 that enables semantic search and deduplication. It offers secure, multilingual, easy integration; simplifies image management and asset exploration; and cuts storage costs.
Bidify uses AI to automate RFP proposals for small businesses: it parses RFPs, extracts requirements, and drafts proposals.
Kater is an AI-powered data analysis platform that lets users query data in plain English. It connects to data warehouses, optimizes them for AI, and learns from interactions, reducing the data team's workload.
arXiv is a free online library where researchers share pre-publication papers.
This paper introduces VideoAgent, an AI-driven system for long-form video understanding. Utilizing a large language model (LLM) as an agent, VideoAgent iteratively reasons, retrieves, and aggregates information to comprehend lengthy video sequences efficiently. By emulating human cognitive processes, it achieves superior effectiveness and efficiency, outperforming existing methods on benchmarks like EgoSchema and NExT-QA while using fewer frames on average. The agent-based approach emphasizes decision-making, enabling selective frame sampling and fine-grained query rewriting for enhanced performance. VideoAgent represents a significant advancement in long-form video understanding, highlighting the potential of agent-based systems in complex visual tasks.
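The agent loop is easy to caricature in a few lines. The `llm` and `retrieve_frames` interfaces below are assumptions for illustration, not the authors' API:

```python
def video_agent(question, llm, retrieve_frames, max_rounds=5):
    """Iteratively gather frames until the LLM judges it has enough evidence."""
    frames = retrieve_frames(question, k=5)            # initial sparse sample
    state = llm.summarize(question, frames)            # aggregate what is seen so far
    for _ in range(max_rounds):
        if llm.confident(question, state):             # self-assessed sufficiency
            break
        refined = llm.rewrite_query(question, state)   # fine-grained query rewriting
        frames += retrieve_frames(refined, k=5)        # selective extra sampling
        state = llm.summarize(question, frames)
    return llm.answer(question, state)
```

Because frames are fetched only when the agent decides it needs them, the method answers with far fewer frames than uniform-sampling baselines.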
This paper from Apple Inc. presents an innovative approach called Recurrent Drafter (ReDrafter) to enhance the efficiency of serving large language models (LLMs) through speculative decoding. ReDrafter combines aspects of classic two-model speculative decoding and the single-model approach exemplified by Medusa, leveraging a single lightweight draft head with a recurrent dependency design. Unlike Medusa, which employs multiple draft heads, ReDrafter introduces dependencies among predictive heads inspired by recurrent neural networks (RNNs). This design enables efficient beam search for candidate token sequences and dynamic tree attention without the need for a data-dependent tree attention structure. Empirical evaluations on various open-source language models demonstrate the effectiveness and practicality of ReDrafter in improving LLM inference efficiency under the speculative decoding framework.
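A minimal PyTorch sketch of the recurrent-draft-head idea follows; the dimensions and the GRU cell choice are illustrative assumptions, not Apple's implementation, and the paper uses beam search rather than the greedy loop shown here:

```python
import torch
import torch.nn as nn

class RecurrentDraftHead(nn.Module):
    """One lightweight head that drafts several tokens with a recurrence,
    unlike Medusa's independent per-position heads."""
    def __init__(self, d_model, vocab_size):
        super().__init__()
        self.cell = nn.GRUCell(d_model, d_model)   # recurrence across draft steps
        self.embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def draft(self, h, steps=4):
        """h: (batch, d_model) last hidden state from the target LLM."""
        state, inp, tokens = h, torch.zeros_like(h), []
        for _ in range(steps):
            state = self.cell(inp, state)
            tok = self.out(state).argmax(-1)       # greedy for simplicity
            tokens.append(tok)
            inp = self.embed(tok)                  # feed prediction back in
        return torch.stack(tokens, dim=1)          # (batch, steps) draft tokens

head = RecurrentDraftHead(d_model=64, vocab_size=1000)
draft = head.draft(torch.randn(2, 64))             # 4 proposed tokens per sequence
```

The target model then verifies the drafted tokens in one parallel pass and keeps the longest accepted prefix, which is where the speedup comes from.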
The paper introduces RAFT (Retrieval Augmented Fine Tuning), a training methodology aimed at enhancing Large Language Models (LLMs) for domain-specific question-answering tasks within an "open-book" setting. Unlike traditional supervised fine-tuning or in-context retrieval methods, RAFT leverages a combination of both to adapt LLMs for Retrieval Augmented Generation (RAG) in specialized domains. RAFT trains LLMs to generate answers from a question and a set of retrieved documents, distinguishing between "oracle" documents containing answer-relevant information and "distractor" documents. By incorporating a chain-of-thought style response and citing verbatim from relevant documents, RAFT improves the model's ability to reason and handle imperfect retrievals. Experimental results on various datasets including PubMed, HotpotQA, and Gorilla API Bench demonstrate RAFT's effectiveness in enhancing LLM performance for in-domain RAG tasks, offering a practical approach for adapting LLMs to specialized domains.
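The training-data recipe is simple enough to sketch directly. The hyperparameters below (number of distractors, oracle-drop probability) are illustrative assumptions rather than the paper's exact settings:

```python
import random

def make_raft_example(question, oracle_docs, distractor_pool,
                      cot_answer, num_distractors=4, p_drop_oracle=0.2):
    """Build one RAFT-style training example."""
    docs = random.sample(distractor_pool, num_distractors)
    if random.random() > p_drop_oracle:
        docs += oracle_docs           # usually include the answer-bearing docs...
    random.shuffle(docs)              # ...but never reveal which ones they are
    prompt = "\n\n".join(docs) + "\n\nQuestion: " + question
    return {"prompt": prompt, "target": cot_answer}  # CoT answer cites docs verbatim
```

Occasionally dropping the oracle documents is what teaches the model to cope with imperfect retrieval at test time.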
The paper introduces a new approach, Multi-view ControlNet (MVControl), for controllable text-to-3D generation, filling a gap between text-to-3D and image-to-3D tasks. MVControl enhances pre-trained multi-view diffusion models by integrating additional input conditions like edge, depth, normal, and scribble maps. The architecture incorporates a conditioning module controlling the base diffusion model using local and global embeddings derived from input condition images and camera poses. Leveraging MVControl, the paper proposes an efficient multi-stage 3D generation pipeline utilizing Gaussian splatting and SuGaR representation, which binds Gaussians to mesh triangle faces. SuGaR alleviates geometry issues in 3D Gaussians and enables sculpting fine-grained geometry on the mesh. Extensive experiments demonstrate the method's robustness and ability to generate high-quality, controllable 3D content from text prompts and condition images. This work bridges the gap in controllable 3D generation, offering a promising avenue for future research in 3D vision and graphics.
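As a rough picture of the conditioning module, here is a PyTorch sketch; the layer shapes and the simple additive fusion are assumptions for illustration, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ConditioningModule(nn.Module):
    """Turn a condition image and camera pose into local + global control signals."""
    def __init__(self, d=320):
        super().__init__()
        self.local = nn.Conv2d(3, d, kernel_size=3, padding=1)  # per-pixel features
        self.global_ = nn.Linear(16, d)                         # flattened 4x4 pose

    def forward(self, base_features, cond_image, camera_pose):
        local = self.local(cond_image)               # (B, d, H, W) from edge/depth map
        glob = self.global_(camera_pose.flatten(1))  # (B, d) view-dependent embedding
        return base_features + local + glob[:, :, None, None]  # residual control
```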
The paper introduces MusicHiFi, a novel high-fidelity stereophonic vocoding method aimed at improving audio and music generation models. Traditional vocoders produce low-resolution monophonic audio, limiting effectiveness. MusicHiFi employs a cascade of three GANs to convert low-resolution mel-spectrograms to high-resolution stereophonic audio, integrating vocoding, bandwidth extension (BWE), and mono-to-stereo (M2S) upmixing. Key contributions include a unified GAN-based architecture, a fast BWE method with residual connections, and a mono-preserving M2S module. Objective evaluations demonstrate superior performance in vocoding, BWE, and M2S tasks, with significantly faster inference speed. Subjective listening tests confirm better audio quality and spatialization control compared to existing methods. MusicHiFi offers a promising solution for high-fidelity audio synthesis and spatialization.
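The cascade itself is just three generators in sequence. In the sketch below the callables are stand-ins for the paper's GANs; only the mid/side upmix math is concrete, and it shows why the module is mono-preserving:

```python
def music_hifi_pipeline(mel, vocoder, bwe, m2s):
    """mel-spectrogram -> low-rate mono -> full-band mono -> stereo."""
    mono_lo = vocoder(mel)        # GAN 1: vocoding at low sample rate
    mono_hi = bwe(mono_lo)        # GAN 2: bandwidth extension to full rate
    side = m2s(mono_hi)           # GAN 3: predict only the side channel
    left, right = mono_hi + side, mono_hi - side  # mid/side decode
    return left, right            # (L + R) / 2 == mono_hi, i.e. mono-preserving
```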
AI Creates Comics
Thank you for reading today’s edition.
Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.