Inside the Biases of LLMs
Happy New Year. It’s Monday, January 1st.
Did you know: Microsoft Copilot is now available on the Apple App Store? Copilot gives users free access to GPT-4 and DALL-E 3, and is available in the US.
In today’s email:
AI in Technology and Computing
AI in Creative and Social Applications
AI in Health and Public Services
5 New AI Tools
Latest AI Research Papers
ChatGPT + DALL-E 3 Attempts Comics
You read. We listen. Let us know what you think by replying to this email.
Interested in reaching 47,556 smart readers like you? To become an AI Breakfast sponsor, apply here.
Today’s trending AI news stories
AI in Technology and Computing
> A recent study reveals that large language models exhibit cognitive biases and do not align with human preferences in text evaluation. This issue is critical, as LLMs are increasingly used in applications like content recommendation and job application screening. If an LLM, tasked with assessing the quality of a cover letter, is biased toward longer texts or specific keywords, it could unjustly give preference to certain applicants. This disparity in evaluation could lead to unfair advantages, regardless of the actual qualifications of the candidates. The research, which analyzed 15 different LLMs using the "Cognitive Bias Benchmark for LLMs as EvaluatoRs" (CoBBLEr), revealed biases such as egocentricity and order preference, casting doubt on the suitability of LLMs for unbiased, human-like judgment in real-world scenarios.
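Order preference, for instance, can be probed by showing a judge the same pair of candidate responses in both orders and checking whether the verdict tracks the content or the position. The sketch below is not CoBBLEr's actual protocol; `judge` is a deliberately biased stand-in for a real model call, used only to illustrate the test.

```python
def judge(response_a: str, response_b: str) -> str:
    """Toy 'LLM judge' that always prefers whichever response it sees
    first -- a stand-in for a real model call, not an actual LLM."""
    return "A"  # position-biased: picks slot A regardless of content

def order_bias_detected(judge_fn, resp_1: str, resp_2: str) -> bool:
    """Query the judge twice with the candidates in both orders.
    A consistent judge picks the same underlying response both times;
    if the winner flips with presentation order, the judge is order-biased."""
    first = judge_fn(resp_1, resp_2)   # "A" means resp_1, "B" means resp_2
    second = judge_fn(resp_2, resp_1)  # "A" now means resp_2, "B" means resp_1
    winner_run1 = resp_1 if first == "A" else resp_2
    winner_run2 = resp_2 if second == "A" else resp_1
    return winner_run1 != winner_run2

print(order_bias_detected(judge, "Answer one", "Answer two"))  # True
```

A content-based judge (e.g. one that compares lengths of unequal answers) would pick the same winner in both runs and pass this check.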
> Nvidia is set to release a modified version of its RTX 4090 gaming chip in China, named RTX 4090D, with 11% fewer CUDA cores. The adjusted chip, designed to align with U.S. regulations on technology exports to China, is set to launch in January. Nvidia’s adaptation reflects broader U.S. policies restricting the export of advanced chips to China, targeting those used in AI applications. Nvidia has collaborated closely with the U.S. government in developing this compliant product.
> GitHub Copilot Chat, now generally available to all users, integrates natural language coding assistance into Visual Studio Code and Visual Studio at no extra cost. This AI-powered chat feature, based on GPT-4, enables coding assistance in a variety of natural languages, improving coding efficiency and understanding. It assists with tasks like code explanation, security vulnerability detection, and writing unit tests. With personalized interaction, Copilot Chat helps developers translate between programming languages and quickly address coding queries, streamlining the development process.
> PlayHT, an AI-powered text-to-speech tool, offers a versatile range of 832 voices across 132 languages, incorporating APIs from Google, Amazon, IBM, and Microsoft. Originally a Chrome extension for Medium articles, it now aids in creating podcasts, e-learning materials, customer service interfaces, and more. Features include custom voice cloning, Speech Synthesis Markup Language support, secure cloud storage, and collaborative tools. Its latest addition, PlayHT Turbo, provides near real-time text-to-speech conversion. With plans starting at $9 per month, PlayHT caters to a wide array of users, from individual content creators to large enterprises.
> Chinese AI giant SenseTime ventures into foreign consumer markets with SenseRobot Go, an AI-powered robot that plays Go with a human opponent on a physical board. Launching in Japan on January 5 via Amazon, Takashimaya, and Go clubs, and already available in South Korea, this product marks SenseTime’s expansion beyond governmental and corporate AI systems. SenseRobot Go, priced from 3,999 yuan ($564), features a robotic arm for real-board gameplay, differentiating it from software-only systems like Google’s AlphaGo. SenseTime’s expansion into consumer AI follows its Chinese chess-playing robot and comes amid company challenges, including co-founder Tang Xiao’ou’s sudden passing and a drop in share prices.
> Camera manufacturers Nikon, Sony, and Canon are developing technology to embed digital signatures in their cameras, ensuring the authenticity of photos against the rising tide of deepfakes. These signatures will include key data such as date, time, location, and photographer identity, making images tamper-resistant. This technology is crucial for professionals needing credible images. Set for release in 2024, these cameras will support a global standard for digital signatures, compatible with Verify, a web-based tool for checking image credentials.
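The cameras will use public-key signatures under the industry standard; the sketch below is a simplified stand-in that uses an HMAC with a placeholder key instead, purely to show the core idea: binding the pixels and the capture metadata into one signature makes any later tampering detectable.

```python
import hashlib
import hmac
import json

# Placeholder secret; real cameras would hold a hardware-protected
# private key and publish a verifiable public counterpart.
SECRET_KEY = b"camera-private-key"

def sign_photo(image_bytes: bytes, metadata: dict) -> str:
    """Bind the pixels and the capture metadata (date, time, location,
    photographer) into a single tamper-evident tag."""
    payload = image_bytes + json.dumps(metadata, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def verify_photo(image_bytes: bytes, metadata: dict, signature: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_photo(image_bytes, metadata), signature)

photo = b"\x89PNG...raw sensor data..."  # stand-in for real image bytes
meta = {"date": "2024-01-01", "location": "Tokyo", "photographer": "A. Example"}
sig = sign_photo(photo, meta)

print(verify_photo(photo, meta, sig))   # True: untouched photo passes
meta["date"] = "2023-12-31"             # tamper with the metadata
print(verify_photo(photo, meta, sig))   # False: verification fails
```

Editing a single byte of the image or one metadata field changes the digest, which is what makes such signatures useful for proving provenance.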
AI in Creative and Social Applications
> OpenAI’s DALL-E 3 has exhibited a capacity to replicate copyrighted characters in images, despite not receiving explicit prompts to do so, posing significant legal questions. The AI system can generate recognizable likenesses of figures such as SpongeBob SquarePants and Super Mario from vague descriptions. While ChatGPT, incorporating DALL-E 3, is equipped with a stringent moderation system, it is not completely safeguarded against copyright infringement. Microsoft’s version of this technology, meanwhile, has already encountered similar issues.
> Alexander Reben, an MIT-educated artist, is set to begin a three-month residency at OpenAI in January, bringing a tech-savvy perspective to a company viewed by some in the art world as a threat. His work, blending AI technology with physical art, has been showcased at the Crocker Art Museum in Sacramento. Reben’s role at OpenAI will offer him an inside look at the development of generative AI tools amid ongoing discourse about AI’s impact on traditional art and creativity, exploring the balance between machine output and human curation.
AI in Health and Public Services
> AI in pharmaceuticals is significantly changing drug development, offering faster, cheaper alternatives. AI-driven technology is identifying individualized treatments, as exemplified in a trial by the Medical University of Vienna using Exscientia’s matchmaking technology, which successfully treated a blood cancer patient with a drug previously considered ineffective for his cancer type. Exscientia is not only pairing patients with existing drugs but also designing new ones with AI, with the first AI-developed drugs now undergoing clinical trials. Hundreds of startups are exploring AI in pharmaceuticals, aiming to make drug discovery faster and more cost-effective.
> South Korea plans to implement drones and AI for real-time traffic monitoring starting this year. Drones will capture footage from 200 meters above, while algorithms will analyze this data to predict traffic conditions. Tests in 2023, including monitoring crowd density during the Seoul International Fireworks Festival and Halloween, demonstrated the system’s potential. The drones will also inspect construction sites for safety and space regulations.
5 new AI-powered tools from around the web
UserWise is an AI tool that offers sentiment analysis, trend tracking, and data-driven decision support, improving customer understanding and business strategy optimization.
Impakt is an AI coach for home fitness delivered via a social, AI-powered platform. It offers personalized training, rep analysis, and efficiency maximization, taking an innovative approach to health.
Laterbase is an innovative AI-powered bookmark manager that simplifies saving and searching bookmarks by enabling direct chatting with bookmarks for quick retrieval and insight extraction.
Behavly is an AI-powered tool designed to optimize websites and eliminate decision paralysis by offering specific, actionable tweaks that enhance user experience and website performance. By analyzing your site’s URL, it provides tailored suggestions for content, design, and functionality.
Cliptutor enhances learning and teaching with AI-powered video lectures. It allows users to interact with videos, automatically generate quizzes, and create study guides.
Latest AI Research Papers
arXiv is a free online library where researchers share pre-publication papers.
FlowVid, developed by a team from the University of Texas at Austin and Meta GenAI, introduces a new approach to video-to-video (V2V) synthesis. It addresses the challenge of maintaining temporal consistency in videos using spatial conditions and temporal optical flow clues. Unlike previous methods that strictly adhere to optical flow, FlowVid leverages its benefits while managing imperfections in flow estimation. The model encodes optical flow via warping from the first frame, which serves as a supplementary reference in the diffusion model. This allows editing the first frame with any image-to-image (I2I) model and propagating the edits to successive frames. FlowVid offers flexibility in working with existing I2I models, efficiency in generating high-resolution videos quickly, and high quality in user studies. It supports modifications like stylization, object swaps, and local edits.
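FlowVid's actual warping operates on frames inside a diffusion pipeline; the toy function below only illustrates the underlying idea of propagating a first-frame edit along per-pixel optical-flow offsets. The nearest-neighbor, single-channel scheme and all names here are illustrative, not from the paper.

```python
def warp_frame(frame, flow):
    """Propagate an (edited) reference frame to a later time step by
    following per-pixel optical-flow offsets.

    frame: 2D list of pixel values (toy single-channel image).
    flow:  2D list of (dy, dx) offsets saying where each output pixel
           samples from in the reference frame (clamped at the border).
    """
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)  # clamp source row
            sx = min(max(x + dx, 0), w - 1)  # clamp source column
            out[y][x] = frame[sy][sx]
    return out

# Shift every pixel one column to the left's source, i.e. sample from x+1.
shifted = warp_frame([[1, 2], [3, 4]], [[(0, 1)] * 2] * 2)
print(shifted)  # [[2, 2], [4, 4]]
```

In the paper's setting the warped frame is only a soft reference: the diffusion model is free to correct regions where the flow estimate is wrong, which is what distinguishes FlowVid from methods that follow the flow strictly.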
The study explores the capabilities of Multimodal Large Language Models (MLLMs), focusing on Gemini, an advanced model from Google. It compares Gemini with OpenAI’s GPT-4V in commonsense reasoning tasks. Initial assessments using the HellaSWAG dataset suggested Gemini lagged behind GPT models in this aspect. The study assesses Gemini’s performance across 12 datasets, covering various reasoning types. Results show Gemini’s competitive performance in language-based tasks but challenges in temporal and social reasoning, and emotion recognition in images. The study highlights the need for further advancements in MLLMs for nuanced commonsense reasoning.
The study introduces PanGu-π, an enhanced language model architecture aimed at addressing the feature collapse problem in Transformer models. It integrates nonlinearity via a Series Informed Activation Function and Augmented Shortcuts, significantly improving model performance and efficiency. Extensively tested, PanGu-π shows superior results in NLP tasks and specialized domains like finance and law, marking a notable advancement in large language model development.
The study introduces "Language Agent for Role-Playing (LARP)," a framework designed for open-world games. It incorporates a cognitive architecture for memory processing and decision-making, an environment interaction module with a learnable action space, and a method for aligning different personalities. LARP enhances user-agent interactions in open-world contexts by predefining agents with unique backgrounds and personalities. The framework underscores the diverse applications of language models in entertainment, education and simulations. It addresses the challenge of adapting general-purpose language agents to complex, open-world environments, requiring long-term memory and coherent actions. LARP’s modular approach and cognitive architecture aim to provide a realistic role-playing experience in gaming, showcasing the evolution of language agents in dynamic scenarios.
Developed by researchers at Google Research and MIT CSAIL, SynCLR is a groundbreaking approach for visual representation learning, leveraging synthetic data generated by LLMs and text-to-image models. This method, focusing on synthetic captions and images, utilizes contrastive learning and excels in tasks like image classification, rivaling traditional techniques such as CLIP and DINO v2, and surpassing previous methods in semantic segmentation, demonstrating the efficacy of generative models in high-quality visual learning.
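This is not SynCLR's training code; the sketch below shows only the standard contrastive (InfoNCE-style) objective that such methods build on, with made-up toy embeddings. Each synthetic image/caption pair contributes a positive, and every other item in the batch acts as a negative.

```python
import math
import random

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive loss over two lists of embeddings: z1[i] and z2[i]
    are views of the same underlying sample (the positive pair); all
    other rows of z2 serve as negatives for z1[i]."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    z1 = [normalize(v) for v in z1]
    z2 = [normalize(v) for v in z2]
    total = 0.0
    for i, anchor in enumerate(z1):
        # Cosine similarity of the anchor to every candidate, scaled.
        logits = [sum(a * b for a, b in zip(anchor, cand)) / temperature
                  for cand in z2]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        total += logits[i] - log_denom  # log-probability of the true match
    return -total / len(z1)

random.seed(0)
anchors = [[random.gauss(0, 1) for _ in range(32)] for _ in range(8)]
positives = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in anchors]
negatives = [[random.gauss(0, 1) for _ in range(32)] for _ in range(8)]

# Matched views score a much lower loss than unrelated embeddings.
print(info_nce_loss(anchors, positives) < info_nce_loss(anchors, negatives))  # True
```

Training pushes embeddings of matching caption/image views together and unrelated ones apart, which is the mechanism behind the classification and segmentation results the paper reports.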
ChatGPT + DALL-E 3 Attempts Comics
Thank you for reading today’s edition.
Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.