This startup lets AI take over your mouse cursor
Good morning. It’s Friday, February 2nd.
Did you know? On this day in 2012, Facebook (now Meta) filed for its initial public offering. As of this morning, Meta’s market cap has surpassed $1 trillion.
In today’s email:
Advancements in AI Models
AI Applications and Innovations
AI in Industry and Commerce
5 New AI Tools
Latest AI Research Papers
ChatGPT Creates Comics
You read. We listen. Let us know what you think by replying to this email.
In partnership with BAY AREA TIMES
Looking for visuals and charts, rather than words, to understand the daily news?
Bay Area Times is a visual-based newsletter on business and tech, with 250,000+ subscribers.
Today’s trending AI news stories
AI Applications and Innovations
> Twin Labs, a Paris-based startup, is automating repetitive tasks by using GPT-4V to take over your mouse cursor. Unlike text-only LLMs, GPT-4V has been trained on a wide range of software interfaces, so it can infer what the features behind buttons do. Twin Labs aims to simplify tasks like onboarding new employees and reordering items by automating web interactions. The startup raised $3 million in pre-seed funding from investors including Betaworks and plans to offer pre-trained tasks before opening its platform to custom tasks.
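Twin Labs hasn’t published how its agent works, so the following is only a hypothetical sketch of the general observe-decide-act loop that screen-controlling agents of this kind follow: capture the screen, ask a vision model for the next action, perform it, and repeat. The `Screenshot` and `Click` types and the scripted `decide` policy are invented stand-ins; a real system would send an actual screenshot to a model such as GPT-4V and drive the real cursor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Click:
    """A single mouse click at screen coordinates (hypothetical action type)."""
    x: int
    y: int


@dataclass
class Screenshot:
    """Hypothetical stand-in for a screen capture: visible button labels -> (x, y) centers."""
    buttons: Dict[str, Tuple[int, int]]


def run_agent(
    goal: str,
    capture: Callable[[], Screenshot],
    decide: Callable[[str, Screenshot], Optional[Click]],
    click: Callable[[Click], None],
    max_steps: int = 10,
) -> List[Click]:
    """Observe-decide-act loop: stop when the policy returns None or max_steps is hit."""
    history: List[Click] = []
    for _ in range(max_steps):
        shot = capture()             # observe the current screen
        action = decide(goal, shot)  # ask the (vision) model for the next action
        if action is None:           # the model signals the task is finished
            break
        click(action)                # act: move the cursor and click
        history.append(action)
    return history


# Toy usage: a two-screen "reorder" flow with a scripted policy in place of GPT-4V.
screens = [
    Screenshot({"Reorder": (120, 40)}),
    Screenshot({"Confirm": (300, 200)}),
    Screenshot({}),  # nothing left to click
]
state = {"screen": 0}


def capture() -> Screenshot:
    return screens[state["screen"]]


def decide(goal: str, shot: Screenshot) -> Optional[Click]:
    # Scripted placeholder policy: click the first visible button, finish when none remain.
    if not shot.buttons:
        return None
    x, y = next(iter(shot.buttons.values()))
    return Click(x, y)


def click(action: Click) -> None:
    # In this toy UI every click advances to the next screen.
    state["screen"] += 1


steps = run_agent("reorder printer paper", capture, decide, click)
print(steps)  # [Click(x=120, y=40), Click(x=300, y=200)]
```

Passing `capture`, `decide`, and `click` as functions keeps the loop independent of any particular model or automation library, which is what makes it testable with a scripted policy.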
> Meta plans to deploy a custom AI chip, “Artemis,” in its data centers in 2024, aiming to reduce its dependence on Nvidia GPUs. Designed for inference (running already-trained models), Artemis will work alongside Nvidia chips and run the recommendation models behind Meta’s social networks more efficiently. The initiative is part of Meta’s broader strategy to rein in the rising cost of AI workloads. Artemis is a newer iteration of the Meta Training and Inference Accelerator (MTIA) chip family, which Meta introduced in May 2023. The move mirrors an industry trend: companies such as OpenAI and Microsoft are also developing proprietary AI chips to run high-performance models more cost-effectively.
> Google is enhancing Google Maps with generative AI, using large language models (LLMs) to suggest places in response to user queries. The feature draws on detailed information about more than 250 million places and insights from over 300 million contributors. Initially available in the US, the AI-driven recommendations aim to turn Maps into a tool for discovering new places. Google is working with its Local Guides community to integrate the feature thoughtfully and plans to expand access to other users later.
> Google has rolled out ImageFX, an image generator powered by DeepMind’s Imagen 2 that invites comparison with DALL-E 3 and Midjourney. Its “expressive chips” suggest creative variations on a prompt, and the tool includes safeguards such as SynthID watermarking. Imagen 2 is also being integrated into other Google services, including the Search Generative Experience (SGE) and Vertex AI, amid concerns about transparency in AI training data. At the same time, Google is expanding its GenAI toolkit with MusicFX for creating music loops and TextFX for lyric writing, raising questions about copyright and authenticity in AI-generated content. Together, these launches signal a substantial investment in generative AI for both images and audio.
> Midjourney’s version 6 beta introduces “Style References,” a feature for keeping a consistent visual style across images. The feature, also called “Consistent Styles,” lets users specify a desired aesthetic by supplying reference image URLs. Users can adjust the relative weight of each reference and the overall strength of stylization, giving finer control over the output. The update also improves text rendering, adds transparent-background creation, and enables precise color control via hex codes.
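As a rough illustration of how these style controls combine, here is a small helper that assembles a prompt string. It assumes Midjourney’s parameter syntax as we understand it from its documentation: `--sref` followed by one or more image URLs, an optional `::N` suffix giving each reference a relative weight, and `--sw` (assumed range 0 to 1000) for overall stylization strength. The helper name and example URLs are hypothetical, and the example assumes colors are requested by writing hex codes directly into the prompt text.

```python
from typing import Dict, Optional


def build_prompt(
    subject: str,
    style_refs: Dict[str, float],
    style_weight: Optional[int] = None,
) -> str:
    """Append style-reference parameters to a prompt (assumed --sref / --sw syntax).

    style_refs maps each reference image URL to its relative weight, rendered as URL::weight.
    style_weight sets overall stylization strength via --sw.
    """
    if style_weight is not None and not 0 <= style_weight <= 1000:
        raise ValueError("--sw is assumed to accept values from 0 to 1000")
    parts = [subject]
    if style_refs:
        # :g drops a trailing ".0", so a weight of 2.0 renders as "2"
        refs = " ".join(f"{url}::{weight:g}" for url, weight in style_refs.items())
        parts.append(f"--sref {refs}")
    if style_weight is not None:
        parts.append(f"--sw {style_weight}")
    return " ".join(parts)


prompt = build_prompt(
    "a lighthouse at dusk, colors #1E3A5F and #F2A541",
    {"https://example.com/ref-a.png": 2, "https://example.com/ref-b.png": 1},
    style_weight=250,
)
print(prompt)
```

Here the first reference image is weighted twice as heavily as the second, while `--sw 250` raises the overall stylization above the assumed default.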
Advancements in AI Models
> Mistral CEO Arthur Mensch has confirmed the “leak” of a new AI model whose performance edges close to GPT-4’s. The model, first posted on Hugging Face by a user named “Miqu Dev,” drew interest for its capabilities. The development hints at Mistral’s ambition to rival or even surpass GPT-4, potentially reshaping the AI landscape. OpenAI could face heightened competition as enterprises increasingly explore open-source models, underscoring the growing influence of the open-source AI community.
> OpenAI is studying whether GPT-4 could help someone create bioweapons, as part of developing an early-warning system. The system aims to detect biological threat development, serving as a “tripwire” for potential misuse. In a study of 100 participants, including biologists and students, those who used GPT-4 alongside the internet showed slight improvements in accuracy and completeness on biothreat-related tasks compared with those who used the internet alone. However, these effects were not statistically significant. The study’s limitations: it assessed access to information rather than practical application, and it did not examine whether GPT-4 could help develop new bioweapons.
> Nomic AI has introduced Nomic Embed, an open-source text embedding model that surpasses OpenAI’s Ada-002 (text-embedding-ada-002). The model performs well on both short- and long-context tasks, supports an 8,192-token context length, and is designed to be reproducible and auditable. It outperforms competitors on the Massive Text Embedding Benchmark (MTEB), although it falls short on the Jina Long Context Benchmark. Nomic releases the model weights and full training data for auditability. The model is available through the Nomic Atlas Embedding API, which includes one million free tokens for production use, and through Nomic Atlas Enterprise.
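Embedding models like Nomic Embed map text to vectors, and search then ranks documents by cosine similarity to the query vector. The sketch below shows only that ranking step, on small hand-made vectors that are hypothetical stand-ins for real model output (which has hundreds of dimensions). The `search_query`/`search_document` task prefixes reflect our understanding of how nomic-embed-text expects inputs to be labeled; treat them as an assumption and check the model card.

```python
import math
from typing import List, Sequence, Tuple


def with_prefix(texts: Sequence[str], task: str) -> List[str]:
    """Label inputs with a task prefix such as "search_query" (assumed convention)."""
    return [f"{task}: {t}" for t in texts]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank(query_vec: Sequence[float],
         doc_vecs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


print(with_prefix(["what is stem separation?"], "search_query"))
# ['search_query: what is stem separation?']

# Toy 3-d vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
docs = [[1.0, 0.0, 0.9], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
print([i for i, _ in rank(query, docs)])  # [0, 2, 1]
```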
AI in Industry and Commerce
> The FCC is moving to outlaw AI-generated robocalls, aiming to curb the proliferation of unsolicited calls that use artificial voices. The decision follows a recent incident in which a fake robocall imitating President Joe Biden targeted New Hampshire voters. The change, proposed under the Telephone Consumer Protection Act, would empower state attorneys general to take action against spammers. FCC Chairwoman Jessica Rosenworcel highlighted the growing threat of AI-generated scams and the need to protect consumers, particularly vulnerable groups such as seniors. The move has drawn support from organizations like AARP.
> Amazon has introduced Rufus, a generative AI shopping assistant built into its mobile app. Trained on Amazon’s product catalog and web data, Rufus answers customer questions, offers product comparisons, and helps shoppers discover items within the Amazon shopping experience. Features include research assistance, occasion-based shopping, category comparisons, and tailored recommendations. Rufus is in beta and is rolling out gradually to US customers.
> Apple CEO Tim Cook has confirmed that the company is actively developing generative AI features slated for release “later this year.” During Apple’s latest earnings call, Cook emphasized the company’s significant investment in AI, hinting at substantial enhancements across iOS, iPadOS, and macOS. When analysts pressed for specifics, Cook remained tight-lipped, in keeping with Apple’s usual practice of revealing features closer to release.
> Google’s latest Bard update brings Gemini Pro to users worldwide, in over 40 languages and more than 230 countries and territories. Gemini Pro, which Google says was well received by evaluators, strengthens Bard’s understanding and reasoning. A double-check feature now lets users validate responses in multiple languages. Bard also gains an image generation tool powered by Imagen 2, so users can create custom visuals. For responsible use, Bard embeds digitally identifiable watermarks in generated images and applies filters to prevent inappropriate content, in line with Google’s AI Principles.
> A Pinecone study finds that giving open-source models access to external data through retrieval-augmented generation (RAG) can make them perform better than GPT-4. With RAG and sufficient data, LLMs achieved a 13% improvement in response quality on the “Faithfulness” metric, even for information the model had already been trained on. The benefit grew with the amount of data available, in tests using up to one billion documents, highlighting RAG’s potential to improve LLM accuracy.
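Retrieval-augmented generation works by fetching relevant passages at query time and placing them in the model’s prompt. The sketch below shows that retrieve-then-augment step. To stay self-contained it scores documents by simple word overlap, whereas production RAG systems built on vector databases such as Pinecone typically rank by embedding similarity. The function names and prompt wording are illustrative assumptions, and the LLM call itself is omitted.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k documents sharing the most words with the query."""
    q = tokenize(query)
    # sorted() stays stable even with reverse=True, so ties keep their original order
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_rag_prompt(query: str, docs: List[str], k: int = 2) -> str:
    """Augment the question with retrieved context before it is sent to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


corpus = [
    "Rufus is Amazon's generative AI shopping assistant.",
    "Artemis is Meta's custom chip for AI inference.",
    "ImageFX is powered by Imagen 2.",
]
print(build_rag_prompt("What is Meta's Artemis chip?", corpus, k=1))
```

Instructing the model to answer only from the supplied context is one common way RAG prompts try to improve faithfulness to the retrieved sources.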
5 new AI-powered tools from around the web
LALAL.AI, a powerful AI for audio stem separation, boasts 20 times more training, twice the speed, and 70% cleaner stem extraction. It also offers an expanded set of stems and desktop and mobile apps for easier access.
BrainSoup is an AI-driven platform for managing specialized AI agents via natural language. It combines LLMs and Semantic Kernel for personalized, privacy-focused automation.
Promptly AI is a no-code GenAI platform for building apps and workflows. It provides access to LLMs, AI agents, and connectors that streamline GenAI orchestration, with the aim of making AI accessible to everyone.
Learniverse AI offers personalized learning paths with tailored lessons, projects, and quizzes. Users can efficiently create customized learning journeys optimized for their individual goals.
Sprite Fusion simplifies game map creation with its user-friendly, web-based tilemap editor. It integrates with Unity and Godot, allowing effortless importing, drawing, and exporting of maps.
Latest AI Research Papers
arXiv is a free online library where researchers share pre-publication papers.
ReplaceAnything3D presents RAM3D, a method for text-guided 3D scene editing through object replacement. RAM3D swaps objects while keeping the result consistent across views, as demonstrated on a variety of realistic scenes. Given multiview images and text prompts, it can remove, replace, or add objects while blending them seamlessly with the scene background. The method combines LangSAM for object detection and segmentation, text-guided 3D inpainting for object removal, and HiFA for text-to-3D distillation. Unlike 2D editing methods, RAM3D tackles the challenge of multi-view consistency and produces coherent results. The work advances 3D scene editing, making it easier to create and modify 3D content for immersive applications and multimedia experiences.
The paper from Meta FAIR introduces Chain-of-Abstraction (CoA) reasoning, a method that improves large language models’ (LLMs) multi-step reasoning by using external tools more efficiently. CoA trains LLMs to first generate abstract reasoning chains with placeholders, which are then filled in with domain-specific knowledge from tools. This lets LLMs learn more general reasoning strategies, improving performance across domains. Unlike previous methods that pause decoding to wait for each tool response, CoA executes tool calls after a reasoning chain is generated, which speeds up inference. Evaluations on mathematical reasoning and Wikipedia QA tasks show consistent accuracy gains and faster inference. By decoupling general reasoning from domain-specific knowledge, CoA promises to adapt well to new reasoning scenarios.
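To make the two stages concrete, here is a minimal sketch of the filling step under stated assumptions: the LLM’s abstract chain is hard-coded rather than generated, placeholders use a `[expression = yN]` form loosely modeled on the paper’s examples, and a small arithmetic evaluator plays the role of the external tool. It shows why tool calls can wait until the whole chain exists: placeholders are resolved in order, with earlier results substituted into later expressions.

```python
import ast
import operator
import re

# Arithmetic "tool": evaluates +, -, *, / over numeric literals only (no eval()).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expr: str) -> float:
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr.strip(), mode="eval").body)


def fmt(value: float) -> str:
    """Render whole numbers without a trailing .0."""
    return str(int(value)) if float(value).is_integer() else str(value)


SLOT = re.compile(r"\[([^\[\]=]+)=\s*(y\d+)\]")  # matches "[20 + 35 = y1]"
VAR = re.compile(r"\by\d+\b")                    # matches a bare "y1"


def reify(chain: str) -> str:
    """Fill an abstract chain: resolve each [expr = yN] left to right, then
    replace any remaining yN references with their computed values."""
    values = {}

    def fill(match: re.Match) -> str:
        # Substitute already-computed placeholders into this expression first.
        expr = VAR.sub(lambda v: fmt(values[v.group(0)]), match.group(1))
        values[match.group(2)] = calculator(expr)
        return fmt(values[match.group(2)])

    filled = SLOT.sub(fill, chain)
    return VAR.sub(lambda v: fmt(values[v.group(0)]), filled)


abstract_chain = ("Jason had [20 + 35 = y1] apples, then doubled them to "
                  "[y1 * 2 = y2], so he ends with y2 apples.")
print(reify(abstract_chain))
# Jason had 55 apples, then doubled them to 110, so he ends with 110 apples.
```

Because the chain is complete before any tool runs, the model never idles waiting for a tool result, which is the source of the speedup the blurb describes.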
Google researchers introduce MobileDiffusion, an efficient text-to-image diffusion model designed for mobile devices. MobileDiffusion generates high-quality images with sub-second inference on mobile devices, surpassing previous state-of-the-art methods. Its efficiency comes from optimizations to both the architecture and the sampling process, including distillation and diffusion-GAN fine-tuning. The paper examines the architecture in detail, focusing on reducing redundancy and improving computational efficiency while maintaining image quality. Empirical studies demonstrate MobileDiffusion’s effectiveness and its potential for a range of applications. The work tackles the challenge of deploying large-scale generative models on resource-constrained devices, opening new possibilities for on-device image generation.
The paper, by researchers at Apple, investigates the contextual understanding abilities of large language models (LLMs) through a benchmark of four tasks and nine datasets tailored for generative models. It highlights challenges with in-context learning, particularly in grasping nuanced linguistic features, and evaluates the impact of model compression techniques. The study designs prompts for in-context learning evaluation and analyzes LLM performance across tasks and model sizes, including models compressed with post-training quantization. The findings underscore the importance of contextual comprehension in language understanding and shed light on both the limitations and the potential of LLMs. The benchmark is a valuable addition to existing evaluations, offering a comprehensive perspective on contextual language understanding.
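Post-training quantization shrinks a trained model by storing its weights at lower precision without retraining. The blurb doesn’t say which scheme the Apple study evaluated, so the following is only a generic sketch of the simplest variant, symmetric per-tensor round-to-nearest int8 quantization: one scale maps the largest absolute weight to 127, and dequantizing recovers each weight to within half a scale step. Function names and sample weights are illustrative.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w is approximated by q * scale, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to nearest, then clamp to guard against floating-point overshoot.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
print(q)  # [50, -127, 0, 100]
restored = dequantize(q, scale)
# Rounding error is at most half a quantization step per weight.
print(max(abs(w - r) for w, r in zip(weights, restored)) <= scale / 2)  # True
```

Each weight now needs 8 bits instead of 32 (plus one shared float scale), at the cost of the bounded rounding error that benchmarks like this one try to measure in downstream tasks.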
The report introduces Dolma, a three-trillion-token English corpus aimed at fostering open research on language model pretraining. It addresses the lack of transparency around pretraining data by documenting in detail how Dolma was built from sources such as web content, scientific papers, code, social media, and encyclopedic materials. Dolma is meant to support research on how training data shapes model capabilities and limitations. The release includes the Dolma Corpus, a diverse collection drawn from seven sources, and the Dolma Toolkit for data curation. Its design goals are openness, consistency, scale, contribution to open corpora, and minimizing risk of harm.
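Deduplication is one typical curation step a toolkit like Dolma’s supports. As a hedged illustration (not the toolkit’s actual implementation or API), the sketch below drops any paragraph whose case- and whitespace-normalized text has already appeared earlier in the corpus, tracking SHA-1 fingerprints in a Python set. At trillion-token scale, an exact in-memory set would usually give way to a more compact structure such as a Bloom filter.

```python
import hashlib
from typing import List, Set


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())


def fingerprint(text: str) -> str:
    return hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()


def dedup_paragraphs(docs: List[str]) -> List[str]:
    """Keep the first occurrence of each paragraph across all documents;
    documents left with no paragraphs are dropped."""
    seen: Set[str] = set()
    cleaned: List[str] = []
    for doc in docs:
        kept = []
        for para in doc.split("\n"):
            if not para.strip():
                continue  # skip blank lines
            fp = fingerprint(para)
            if fp in seen:
                continue  # duplicate of an earlier paragraph
            seen.add(fp)
            kept.append(para)
        if kept:
            cleaned.append("\n".join(kept))
    return cleaned


pages = ["Hello world\nUnique A", "hello   WORLD\nUnique B", "Hello world"]
print(dedup_paragraphs(pages))  # ['Hello world\nUnique A', 'Unique B']
```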
ChatGPT Creates Comics
Thank you for reading today’s edition.
Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.