AI Funding, CES, and Bard Advanced

AI Breakfast
January 05, 2024

Good morning. It’s Friday, January 5th.

Did you know: 20 years ago today, NASA's Spirit rover successfully landed on Mars?

In today’s email:

AI in Business and Investment
AI in Consumer and Entertainment
AI in Social Impact and Science
AI in Policy and Corporate Strategy
6 New AI Tools
Latest AI Research Papers
ChatGPT Creates Comics

You read. We listen. Let us know what you think by replying to this email.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.

Today’s trending AI news stories

AI in Business and Investment

> Intel, under CEO Pat Gelsinger, is launching a new AI firm, Articul8 AI, independent from its core operations. Developed with Boston Consulting Group, Articul8 AI stems from Intel’s generative AI technology capable of processing text and images. The new entity, led by ex-Intel VP Arun Subramaniyan, addresses concerns around data privacy and cost efficiency in AI deployment. Financial details weren’t disclosed, but investors include DigitalBridge Group, Fin Capital, and others. The spinout aligns with Intel’s strategy of seeking external funding for its divisions, following the separation of Mobileye Global and a planned IPO for its programmable chip unit.

> Robin AI, a UK-based startup, secured $26 million in Series B funding for its AI legal copilot, raising its total to nearly $43 million. Leveraging Anthropic PBC’s Claude LLM, fine-tuned with over 2 million contracts, Robin AI’s technology drastically reduces contract review time and costs. Its Microsoft Word add-in offers features for contract creation, review, and editing. The funding, led by Temasek Holdings, will fuel expansion in the U.S., where the company earns most of its revenue, and growth into Asia-Pacific. Robin AI’s technology is employed by major clients like PepsiCo and PwC.

> Perplexity, an emerging AI search startup, has raised $74 million at a valuation of $520 million. The funding round attracted notable investors including Jeff Bezos and former YouTube CEO Susan Wojcicki. Despite being less than two years old, Perplexity’s “answer engine” has rapidly gained traction and now serves 10 million monthly users. Its website and app visits climbed from 2.2 million in December 2022 to 53 million by November 2023. It is positioned as a competitor to Google’s SGE.
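For readers who want to put the traffic figures in perspective, here is a minimal sketch of the arithmetic. It assumes the 53 million figure refers to November 2023 (11 months after December 2022) and that growth compounded evenly month to month; both are our assumptions, not something the report states, and the function name is ours.

```python
def growth_multiple(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start

# Figures quoted in the item above
visits_dec_2022 = 2.2e6
visits_nov = 53e6

# Assumption: the endpoint is November 2023, i.e. 11 months later
months_elapsed = 11

multiple = growth_multiple(visits_dec_2022, visits_nov)

# Implied average month-over-month growth if the increase compounded evenly
monthly_rate = multiple ** (1 / months_elapsed) - 1

print(f"Visits grew roughly {multiple:.1f}x")
print(f"Implied average monthly growth: {monthly_rate:.1%}")
```

Under those assumptions, visits grew about 24-fold, which works out to roughly a third more traffic each month.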

> OpenAI is proposing annual deals worth up to $5 million with news publishers for the rights to use their copyrighted articles in AI model training. This strategic move marks a significant shift in the AI industry’s approach to data acquisition, steering away from web crawler-based methods towards more ethical and legally sound partnerships. This new trend in AI development not only mirrors previous non-AI media collaborations but also underscores the increasing importance and value of licensed content in the realm of AI.

AI in Consumer and Entertainment

> Microsoft, leading the charge in AI integration, announced the addition of a physical Copilot key to PC keyboards, a move aimed at making 2024 the “year of the AI PC.” The new Copilot key’s purpose is to streamline and enhance user interaction with AI-powered features and tools. Microsoft’s focus on incorporating AI into everyday computing devices reflects its commitment to advancing AI accessibility and utility in personal computing.

> CES 2024 in Las Vegas will center on artificial intelligence, drawing over 130,000 attendees and 4,000 companies. Highlighting the event, Intel unveils AI PCs, predicting a significant upgrade cycle. Keynotes include CEOs from Intel, Qualcomm, and Snap, covering sectors like retail, health, and tech. The show spans 2.4 million square feet, showcasing innovations in consumer electronics. Notable attendees include Best Buy, Walmart, and heavyweights from various technology domains.

> A recent study by Bizreport indicates a significant pay gap in the job market, revealing that AI roles now offer salaries 77.53% higher than other fields. Even those with just a foundational understanding of programming and AI models can thrive. In particular, entry-level positions in AI are remarkably well-paid, earning about 128.23% more than their non-AI counterparts. This trend underscores the growing importance of natural language skills in programming and AI, positioning them as a key competency in the current job market.
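A quick note on reading those percentages: "77.53% higher" means about 1.78 times the comparison salary, and "128.23% more" means about 2.28 times, not 1.28 times. The sketch below shows the conversion. The $50,000 baseline and the function name are hypothetical and chosen only for illustration; the study's actual salary figures are not given here.

```python
def apply_premium(baseline: float, premium_percent: float) -> float:
    """Return the salary that is `premium_percent` percent higher than `baseline`."""
    return baseline * (1 + premium_percent / 100)

# Hypothetical non-AI salary, for illustration only (not from the study)
baseline_salary = 50_000.0

# Premiums quoted in the item above
ai_salary = apply_premium(baseline_salary, 77.53)
entry_ai_salary = apply_premium(baseline_salary, 128.23)

print(f"All AI roles: about ${ai_salary:,.0f}")
print(f"Entry-level AI roles: about ${entry_ai_salary:,.0f}")
```

With that hypothetical baseline, the two premiums come to roughly $88,765 and $114,115.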

AI in Policy and Corporate Strategy

> Google is reportedly developing “Bard Advanced,” a new premium version of its Bard AI, powered by Gemini Ultra, its most powerful language model. This enhanced version, accessible through a Google One subscription, boasts superior skills in complex mathematical and logical reasoning. Intriguing new features under consideration include “Motoko” for bespoke bot creation, an innovative “power up” tool to enrich user queries, a “Gallery” section for Bard exploration, and a “tasks” tab for monitoring extended activities. These updates are yet to be confirmed officially.

> 2024 is poised to be a landmark year in the intersection of generative AI, foundational models, and robotics, largely propelled by Google’s DeepMind Robotics. The focus is on AutoRT, a system designed to boost robots’ awareness of their surroundings through Visual Language Models. This technology equips a group of robots with the ability to perceive and interact with their environment and objects more intelligently. DeepMind is also introducing RT-Trajectory, a training method that uses video inputs. By overlaying a 2D sketch of the robot’s arm movements on videos, it provides practical visual guidance, effectively doubling the success rate of previous training models.

> OpenAI is set to launch its GPT Store next week, a platform where users can share and monetize custom AI agents created using OpenAI’s GPT-4 language model. After a bit of a shuffle in plans, with the initial rollout pushed from November, this launch is exclusively available for ChatGPT Plus and enterprise members. While specifics of compensating creators based on their AI’s popularity are still under wraps, this development is a leap forward in making AI more user-driven and commercially viable. After a hectic period for the company, OpenAI is steaming ahead with this ambitious project.

6 new AI-powered tools from around the web

Netjet.io is a user-friendly, drag-and-drop website builder offering code-free design, AI enhancement, SEO, auto-translation, app integration, lead management, customizable templates, and extensive font options.

IconKit is an AI-driven icon generator providing custom, easy-to-create icons with a user-friendly interface, customizable options, quick turnaround, lifetime credits, and commercial usability.

Luxand.Cloud offers an AI facial recognition API for smooth integration into apps, websites, or software, featuring accurate face comparison, age/gender/emotion detection, secure storage, multi-language support, scalability, and diverse industry applications.

gptengineer.app revolutionizes web app development, enabling rapid prototyping via plain English instructions. It simplifies the design process, offering instant deployment and iterative refinement, bridging the gap between AI tools and user-friendly, no-code solutions.

Imagica offers a no-code platform for AI app creation, featuring chat interfaces, multimodal interactions, and image generation. It supports app publishing, monetization, and smooth integration with Natural OS for innovative interfaces.

OpenTaskAI is a dynamic marketplace connecting AI freelancers with businesses globally. It offers a collaborative platform for AI professionals to meet business needs, enhances AI education in partnership with universities, and provides tools for skill development.

arXiv is a free online library where researchers share pre-publication papers.

📄 GPT-4V(ision) is a Generalist Web Agent, if Grounded

The paper evaluates GPT-4V(ision) as a generalist web agent and reports a 50% success rate on live website tasks when oracle grounding is used, outperforming text-only models. Precise grounding remains a challenge, but the model effectively integrates HTML and visuals for web interactions. The study emphasizes the potential of Large Multimodal Models (LMMs) in web automation, underscores the importance of online evaluations for accurately assessing dynamic web tasks, and points to visual grounding and hallucination reduction as areas for further improvement.

📄 From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

The paper introduces a method for creating full-bodied photorealistic avatars that gesture based on the dynamics of dyadic conversations. Utilizing conversational audio, it generates diverse and expressive facial, body, and hand motions. This is achieved by combining vector quantization for sample diversity with diffusion for high-frequency details. The motion is rendered as photorealistic avatars, capturing nuances like sneers and smirks. A unique multi-view conversational dataset enables photorealistic reconstruction. The model outperformed diffusion- and VQ-only methods, emphasizing the importance of photorealism in accurately assessing motion subtleties and gestures. The code and dataset are publicly available.
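For readers unfamiliar with vector quantization, the toy sketch below shows the basic idea: each continuous vector is replaced by the index of its nearest entry in a small codebook. This is a generic illustration, not the paper's implementation. The codebook, the two-dimensional "motion" vectors, and the function names are all made up for the example.

```python
from typing import List, Sequence

def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(vector: Sequence[float], codebook: List[List[float]]) -> int:
    """Return the index of the codebook entry nearest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))

# Hypothetical 3-entry codebook and three toy "motion" vectors
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
motion_frames = [[0.9, 0.1], [0.1, 0.8], [0.05, -0.02]]

# Encode each frame as a discrete code, then decode by codebook lookup
codes = [quantize(frame, codebook) for frame in motion_frames]
reconstructed = [codebook[c] for c in codes]

print(codes)  # [1, 2, 0]
```

Because the codes are discrete, a model can sample among them to get varied outputs. Per the summary above, the paper then relies on diffusion to add the high-frequency detail that quantization discards.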

📄 Efficient Hybrid Zoom using Camera Fusion on Mobile Phones

The study introduces an efficient system for hybrid zoom super-resolution on mobile devices, leveraging synchronous Wide (W) and Telephoto (T) shots combined with machine learning models. This approach aligns and transfers details from the T image to the W image, overcoming the detail loss common in digital zoom. The system includes an adaptive blending method that deals with depth-of-field mismatches, scene occlusions, flow uncertainties, and alignment errors. The training process uses a dual-phone camera rig for capturing real-world inputs and ground truths, minimizing the domain gap between training and application. The method generates 12-megapixel images in 500 ms on mobile platforms and outperforms state-of-the-art methods in real-world scenarios. The key contributions include an ML-based system for efficient processing on mobile devices, a training strategy using dual-phone camera rigs, and the release of a diverse dataset (Hzsr dataset) for future research. The system's total latency is 521 ms with a peak memory usage of 300 MB, making it highly efficient for mobile applications.

📄 ODIN: A Single Model for 2D and 3D Perception

Microsoft, Stanford and Carnegie Mellon University introduce ODIN (Omni-Dimensional INstance segmentation). This innovative model challenges the traditional belief that 2D and 3D perception require distinct architectures. ODIN can segment and label both 2D RGB images and 3D point clouds using a transformer architecture that alternates between 2D within-view and 3D cross-view information fusion. The key differentiation between 2D and 3D features is achieved through positional encodings of the tokens involved. ODIN demonstrates advanced performance on various 3D instance segmentation benchmarks like ScanNet200, Matterport3D, and AI2THOR, and competitive performance on ScanNet, S3DIS, and COCO.