Google's New Paid AI System "Gemini Advanced"
Good morning. It’s Monday, February 5th.
Did you know: 20 years ago today, TheFacebook.com was launched.
In today’s email:
Advancements in AI Technology
AI Applications in Various Fields
Ethical and Societal Implications of AI
5 New AI Tools
Latest AI Research Papers
ChatGPT Creates Comics
You read. We listen. Let us know what you think by replying to this email.
In Partnership with INFORMLY
Validate, Launch & Grow your Business with Informly’s AI-Powered Market Insights
Unleash your business's potential with Informly’s 11 comprehensive reports and guides, ranging from Idea Validation Reports, Pitch Decks, MVP Roadmaps, and Launch Plans, to Sales and Marketing Blueprints, Pivot Playbooks, Growth Hacking Guides and more.
Key Benefits:
Free 7-Day Trial: Access your personalised validation report tailored to your idea, audience and industry, for free for 7 days!
Save Time: Save weeks of research and get actionable insights swiftly!
Save Money: Access market intelligence at a fraction of traditional market research costs.
Minimize Risks: Spot weaknesses or flaws early and refine your idea with faster iterations.
Today’s trending AI news stories
Advancements in AI Technology
> Google is set to launch “Gemini Advanced,” which appears to be a formidable competitor to GPT-4, according to leaked information. Set for a February 7 release, this upgraded version of Google’s Bard chatbot will leverage the Gemini Ultra 1.0 model, promising enhanced capabilities for coding, reasoning, and collaboration. Positioned as a paid service, Gemini Advanced will receive regular updates, including expanded multimodal capabilities and improved coding features.
> Google introduces MobileDiffusion, a breakthrough in text-to-image generation for smartphones, generating high-quality images in under a second. With a compact size of 520 million parameters, it’s optimized for mobile efficiency, delivering swift results on Android and iOS devices alike. MobileDiffusion follows the latent-diffusion design of a text encoder, a diffusion UNet, and an image decoder, optimized to reduce resource demands. Notably, it surpasses previous methods in speed and cross-platform performance, marking a leap in democratizing image generation on mobile devices.
> Apple is reportedly in talks to acquire German AI startup Brighter AI, known for its Deep Natural Anonymization technology, which offers more natural image anonymization compared to traditional blurring methods. Apple aims to integrate this technology into its products, particularly the Vision Pro VR/AR headset, to address privacy concerns about discreet image and video capture. This move suggests broader applications beyond VR/AR, potentially enhancing Apple’s mapping services.
> Researchers from the Chinese University of Hong Kong discover that AI models get better with data unrelated to their actual tasks. The Multimodal Pathway Transformer (M2PT) links diverse data sources through “cross-modal re-parameterization,” showing significant improvements in image, point cloud, video, and audio recognition. Despite the success, the theoretical basis for these gains remains open, prompting further investigation. This approach could deepen understanding of neural networks and advance AI capabilities. The findings underscore the potential of leveraging modality-complementary knowledge for enhanced model performance across various domains.
> MIT and IBM scientists have found a clever way around brute-forcing math using physics-enhanced deep surrogate (PEDS) models that blend AI with physics to solve complex equations more efficiently. By combining neural networks with physics simulators, they cut down the need for massive amounts of training data by a huge factor, making predictions with just 1,000 data points. This not only improves accuracy but also opens doors to faster weather forecasts and better-designed nuclear reactors. It’s like teaching a computer to think like a scientist, leading to smarter problem-solving in various fields.
AI Applications in Various Fields
> Hugging Face launches a new Chat Assistant feature, providing accessible AI chatbots in two clicks. Users can customize their chatbots with names, avatars, and descriptions, utilizing various language models like Llama 2 or Mixtral. Advantages include open-source model options, free inference, and easy public sharing. While the feature is in beta, there are plans to enhance it further by incorporating features like RAG and enabling web search, all outlined in the roadmap.
> Adobe Firefly AI is now available for Apple Vision Pro, complementing the previously announced Lightroom app. Firefly AI, a generative tool, creates images based on text descriptions, providing four suggestions per input. Unlike the web version, the Vision Pro app lets users manipulate the generated images in its 3D environment. The app is not yet feature-complete: panoramas and 360-degree imagery are still in development. Pricing details remain undisclosed, but users anticipate the integration’s benefits for creative workflows.
> Adept introduces Fuyu-Heavy, its latest multimodal AI model for digital agents, showcasing prowess in UI understanding and action inference. Adept ranks it as the third most capable multimodal model, competing closely with GPT-4V and Gemini Ultra. Fuyu-Heavy also performs well on traditional benchmarks and is set to power Adept's enterprise product, with lessons from it feeding into its successor. The release marks a significant step in advancing AI capabilities, particularly in understanding and interacting with user interfaces. The model's performance is demonstrated in a video highlighting its UI comprehension skills.
Ethical and Societal Implications of AI
> Chinese scientists have introduced Tong Tong, the world’s first AI child, created at the Beijing Institute for General Artificial Intelligence (BIGAI). Tong Tong, or Little Girl, is a virtual AI avatar representing a significant stride toward realizing an artificial general intelligence (AGI) agent, a machine capable of thinking and reasoning like a human being. Developed under the leadership of Zhu Songchun, a globally recognized scholar in AI, Tong Tong exhibited autonomous actions and a level of independence previously unseen in virtual entities. This marks a noteworthy achievement in AGI research, moving beyond conventional AI models.
> In simulated wargames, AI chatbots, including OpenAI’s GPT-4, have displayed a propensity for violence and nuclear strikes, posing unpredictable risks of escalation. This occurs as the US military integrates AI into planning, with Palantir and Scale AI’s assistance. OpenAI, once opposed to military use, now collaborates with the US Department of Defense.
5 new AI-powered tools from around the web
Quartzite AI streamlines prompt creation for language models like GPT-4 and DALL-E 3, offering advanced Markdown editing, version history, and data management, with a pay-per-use GPT pricing model.
UserSketch simplifies data consolidation, aggregating customer interactions into a unified interface. It enables prompt-based data retrieval within a single tab, optimizing workflow efficiency.
CuServly empowers businesses with AI-driven chatbot support, available 24/7 in 95 languages. It allows easy creation and training of chatbots, reducing support workload and offering powerful analytics insights for better engagement.
Ytube AI is a content transformation platform converting YouTube videos into SEO-friendly written content in multiple languages. Simplify repurposing with video-to-text AI conversion, SEO optimization, customizable output, and diverse export options.
Reggelia is a speech-focused language learning tutor that helps users become fluent in their chosen language. By engaging in conversations and analyzing speech patterns, Reggie offers personalized recommendations for improvement, acting as a live translator and facilitating real language practice.
Latest AI Research Papers
arXiv is a free online library where researchers share pre-publication papers.
StepCoder presents a reinforcement learning (RL) framework for code generation that tackles the challenges of exploration and optimization. Its first component, CCCS (Curriculum of Code Completion Subtasks), breaks a long code-generation task into a curriculum of shorter completion subtasks, facilitating more effective learning. Its second, FGO (Fine-Grained Optimization), masks code segments that were never executed so that only executed code drives optimization. The authors also build APPS+, a manually verified dataset for RL training. Experimental results show that StepCoder outperforms existing methods on the corresponding benchmarks. The work points to RL's potential for more efficient and reliable code generation.
PokéLLMon, developed by researchers at Georgia Tech, is the first LLM-embodied agent to achieve human-parity performance in tactical battle games like Pokémon battles. It incorporates three key strategies: in-context reinforcement learning, knowledge-augmented generation, and consistent action generation. These strategies enable PokéLLMon to exhibit human-like battle strategies and just-in-time decision-making, achieving significant win rates in online battles against human players. The research also addresses challenges such as hallucination and panic switching, laying the groundwork for LLM-embodied agents to excel in various game environments, marking a significant step towards the pursuit of Artificial General Intelligence.
AToM (Amortized Text-to-Mesh) is a feedforward framework for generating high-quality textured meshes from text prompts, producing results in under 1 second. Unlike per-prompt methods, AToM optimizes across multiple prompts simultaneously, reducing training cost by 10x. Key to its success are a novel triplane-based architecture and two-stage amortized optimization, which ensure stable training and scalability. AToM significantly outperforms prior methods in accuracy and generalizability, producing more distinguishable and higher-quality 3D outputs, and it addresses the instability issues of prior approaches.
TravelPlanner is a new benchmark focused on travel planning that challenges language agents to navigate complex real-world scenarios. With over four million data records and 1,225 meticulously curated planning intents, it evaluates language agents' ability to handle multi-constraint tasks. Despite advancements in large language models (LLMs), current agents struggle: the reported success rate is only 0.6%, highlighting the complexity of the challenge. The benchmark nonetheless provides a crucial testbed for future research, aiming to push language agents towards human-level planning capabilities. TravelPlanner offers a significant contribution to AI progress, inspiring innovation in agent development.
This paper compares Transformers and Generalized State Space Models (GSSMs) on the task of copying information from context. Despite GSSMs' advantages in efficiency, Transformers emerge as superior in copying tasks, and can in particular copy sequences of exponential length. The study combines theoretical analysis and empirical evidence, showcasing Transformers' efficiency and generalization in copying tasks. Experiments with pretrained models further confirm Transformers' superiority in copying and retrieving from context. GSSMs have their merits, which suggests hybrid architectures may be worthwhile, but Transformers demonstrate a fundamental edge in practical copying tasks, underscoring their significance in sequence modeling.
ChatGPT Creates Comics
Thank you for reading today’s edition.
Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.