Code Llama 70B Now Boasts 100k Context Windows
Good morning. It’s Wednesday, January 31st.
Did you know that on this day in 1958, Explorer 1 became the first satellite launched by the United States?
In today’s email:
AI Model Development and Innovation
Corporate AI Partnerships and Initiatives
AI Impact on Society and Regulation
AI Research and Advancements
5 New AI Tools
Latest AI Research Papers
ChatGPT Creates Comics
You read. We listen. Let us know what you think by replying to this email.
In partnership with INCOGNI
Your info is on the dark web
While other personal data removal services only focus on one type of data broker, Incogni helps remove your personal information from all broker types, including your information visible on People Search Sites (e.g. WhitePages.com). Our readers exclusively get 55% off Incogni annual plans with code PRIVACY.
Today’s trending AI news stories
AI Model Development and Innovation
> Meta AI introduces Code Llama 70B, an open-source AI model designed to revolutionize code generation. Able to write code in multiple languages, including Python, C++, Java, and PHP, Code Llama 70B outperforms previous models in speed, accuracy, and versatility. Trained on 500 billion tokens of code-related data, the model boasts a larger context window of 100,000 tokens, letting it handle longer and more complex code. With variants optimized for tasks like natural-language instruction understanding and Python programming, Code Llama 70B promises to democratize software development and drive innovation across industries.
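For readers who want to try it, here is a minimal sketch of prompting the instruct variant through Hugging Face transformers; the checkpoint name matches Meta’s release, while the hardware and generation settings are illustrative assumptions:

```python
# A minimal sketch, assuming the Hugging Face checkpoint from Meta's release;
# a 70B model needs multiple GPUs or aggressive quantization to run locally.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-70b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# The instruct variant expects a chat-style prompt; apply_chat_template formats it.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a Python function that checks if a string is a palindrome."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```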
> Nightshade, a tool developed by University of Chicago researchers to prevent AI models from training on artists’ works without consent, garnered a staggering 250,000 downloads in just five days. Ben Zhao, the project lead, expressed astonishment at the overwhelming response, which signals global demand for safeguarding creative content. Nightshade “poisons” AI image models by subtly altering artwork pixels, rendering the images unusable for machine learning. Its predecessor, Glaze, has amassed 2.2 million downloads since April 2023. Zhao hinted at a combined Glaze-Nightshade tool in the pipeline and a potential open-source release.
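Nightshade’s actual optimization is more involved (it bounds perceptual distance and targets text-to-image feature extractors), but the rough idea of a pixel-level poisoning perturbation can be sketched generically: shift an image’s feature representation toward an unrelated anchor concept while clamping the pixel change so it stays visually subtle. In the sketch below, `feature_extractor` is a placeholder for any differentiable image encoder:

```python
# Toy illustration only, not Nightshade's published method: move an image's
# features toward an unrelated "anchor" concept under a small L-inf pixel bound.
import torch

def poison(image, anchor, feature_extractor, eps=8 / 255, steps=50, lr=0.01):
    """Perturb `image` (a [0,1] float tensor) within an L-inf ball of radius eps."""
    target = feature_extractor(anchor).detach()     # features of the decoy concept
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(feature_extractor(image + delta), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                 # keep the edit visually subtle
    return (image + delta).clamp(0, 1).detach()
```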
> Google has released an “AI Opportunity” whitepaper to assist ASEAN governments in harnessing AI’s potential, offering policy recommendations for maximizing its benefits. Suggestions include investing in innovation infrastructure, building an AI-ready workforce, and promoting inclusive adoption. Google commits to supporting these efforts through partnerships and initiatives like AI programs in Singapore, cross-sector industry roundtables, and an AI Policy and Skilling Lab for Southeast Asia.
> Studio, a MasterClass competitor, has launched its AI-powered online music school. Targeting musicians, songwriters, and producers, it offers personalized curriculums, feedback, and an AI coach leveraging OpenAI’s GPT-4. The AI coach, enhanced by two proprietary frameworks, overcomes GPT-4’s educational limitations, ensuring effective long-term planning, pacing, sequencing, and personalization for creative fields like music. Studio’s algorithm also matches students with peer groups for weekly feedback. While priced at $199/month, the platform focuses on experienced creatives seeking skill enhancement. Studio plans to launch more AI-powered schools this year, covering areas like writing, filmmaking, and design.
> Semron, a German startup, is building 3D-scaled chips for local AI processing on mobile devices. Co-founded by engineering graduates of Dresden University of Technology, Semron uses electrical fields instead of electrical currents for computation, promising higher energy efficiency and lower production costs. By leveraging memcapacitors rather than the transistors found in traditional chips, Semron aims to transform AI compute resources, potentially stacking hundreds of layers of memcapacitors on a single chip. Despite competition, Semron has secured €10 million in funding from investors including SquareOne, Join Capital, and OTB Ventures.
Corporate AI Partnerships and Initiatives
> OpenAI CEO Sam Altman is in discussions with Samsung and SK Group in South Korea to forge an AI semiconductor alliance, exploring investment opportunities. Altman is reportedly contemplating manufacturing AI chips and has expressed interest in sourcing High Bandwidth Memory (HBM) from Samsung and SK. Discussions with potential major investors, including G42 and SoftBank Group Corp., are ongoing to establish a multi-billion-dollar global network of AI chip factories. This strategic move aligns with Altman’s vision of shaping OpenAI into a significant player across the entire AI value chain, extending to collaborations with Microsoft and ex-Apple designer Jony Ive.
> Microsoft and OpenAI are reportedly discussing an investment of up to $500 million in Figure AI Inc., a company focused on developing humanoid robots. The potential deal could value Figure AI at $1.9 billion, with Microsoft contributing around $95 million and OpenAI investing $5 million. The startup aims to deploy its AI-powered robot, Figure 01, for tasks deemed hazardous for humans, addressing labor shortages. While the funding round is ongoing, if finalized, Figure AI’s valuation may increase, marking it as a unicorn in the industry. The company previously raised $70 million in a funding round led by Parkway Venture Capital.
> Alibaba Cloud unveils a serverless iteration of its Platform for AI (PAI)-Elastic Algorithm Service (EAS), integrating generative AI to boost deployment and inference of models. The incorporation of vector engine technology into Hologres, Elasticsearch and OpenSearch facilitates access to large language models (LLMs). Cutting inference costs by 50%, it provides an efficient model deployment solution. The serverless version, undergoing beta testing for image generation, will expand its capabilities in March 2024, supporting open-source LLMs and models from Alibaba’s AI model community, ModelScope.
> Google Assistant is undergoing a major transformation with the integration of its AI chatbot, Bard, expected to roll out in March. Utilizing Google’s Gemini family of Large Language Models, the virtual assistant’s enhanced features will include multiple input methods such as voice commands, keyboard typing, and queries with images. Leaked details reveal users can activate the Bard-enabled Google Assistant with commands like “Hey Google.” Despite Google’s earlier announcement about the integration, specifics about the rollout and features have largely been undisclosed until now, creating anticipation and curiosity among users and tech enthusiasts.
AI Impact on Society and Regulation
> Elon Musk’s Neuralink has successfully implanted its brain device in a human for the first time. The patient is reportedly “recovering well.” The device, dubbed “Telepathy,” holds significant promise for individuals with severe paralysis, offering them the potential to communicate and interact with digital platforms using neural signals, and marks a significant leap forward in neurotechnology. Alongside other pioneering companies like Synchron and Precision Neuroscience, Neuralink is spearheading the evolution of brain-computer interface technology, paving the way for transformative applications in healthcare and beyond.
> The New York Times is assembling a team to explore the integration of generative AI into its newsroom processes. Led by Zach Seward, the team will prototype generative AI applications to aid reporting and reader engagement. Job listings for engineers, editors, and designers have been posted, emphasizing collaboration with existing news, product, and technology teams. Despite past disputes over generative AI, the Times now aims to balance AI innovation with journalistic integrity, affirming that human journalists will continue to lead news production. Other news organizations, like Axel Springer and The Associated Press, are also venturing into AI integration.
> Microsoft introduces the Voice Clarity feature in Windows 11, leveraging AI to enhance audio quality during video calls. Previously exclusive to Surface devices, Voice Clarity now extends to all users, minimizing background noise, reverberation, and echo in real-time without requiring additional hardware. Enabled by default, it supports apps using Communications Signal Processing Mode like Phone Link and WhatsApp. This feature not only benefits online meetings but also enhances PC gaming experiences. Additionally, the latest Canary build of Windows 11 offers features like accessing recent Android device photos in Snipping Tool and support for USB 80 Gbps on select devices.
AI Research and Advancements
> Wharton School researchers have explored how different prompting methods affect the diversity of ideas generated by GPT-4. Focusing on idea generation, the study tested minimal prompts, personality-infused prompts, and classic creativity techniques. “Chain of Thought” (CoT) prompting, where the model works through a task in multiple steps, stood out, generating the most unique ideas and almost matching the diversity of ideas produced by groups of students. The study underscores the importance of choosing the right prompting method to maximize idea diversity when using AI for ideation.
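As an illustration, a CoT-style ideation prompt via the OpenAI Python client might look like the sketch below; the wording is ours, not the study’s exact prompt:

```python
# A hedged sketch of Chain-of-Thought-style ideation prompting; the prompt
# text is illustrative, not the study's exact wording.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    temperature=1.0,  # higher temperature tends to raise idea diversity
    messages=[{
        "role": "user",
        "content": (
            "Generate 10 product ideas for college students. Work step by step: "
            "first list unmet student needs, then brainstorm one idea per need, "
            "then refine each into a one-sentence pitch."
        ),
    }],
)
print(response.choices[0].message.content)
```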
> University of Cambridge researchers have developed a robotic sensor with AI capabilities that reads braille at double the speed of most human readers. The robot slid over braille text at 315 words per minute with 87% accuracy, combining camera and sensor data, with machine learning algorithms handling the image processing. Although not designed as assistive technology, its high sensitivity makes it valuable for testing developments in robotic hands and prosthetics. Researchers hope to scale the technology to humanoid hands or skin in the future.
> Arc Search, a new iOS app from The Browser Company, integrates browser, search engine, and AI functionalities to deliver personalized web browsing experiences. By typing queries and tapping "Browse for me," users receive custom-built webpages with relevant information aggregated from multiple sources. The app reflects The Browser Company's vision of unifying browsing, search, and AI chatbot capabilities within a single platform. As part of a broader strategy, Arc Search signals The Browser Company's shift towards AI-driven features and cross-platform syncing. Despite its innovative approach, questions remain about data sourcing, personalization, and monetization strategies.
5 new AI-powered tools from around the web
Momen is a no-code AI app builder empowering users to create and deploy AI-powered applications effortlessly. Integrates context-aware GPTs for accurate responses. Ideal for custom AI knowledge bases, customer service enhancement, and streamlined UI optimization.
GPT Analytics redefines AI insights by tracking GPT usage and conversations, with deep performance metrics, engagement tracking, and smooth integration.
Zerve AI offers a transformative data science development environment combining the strengths of Jupyter, Figma, and VSCode. Experience stable data exploration, Figma-like collaboration, parallelized code, language flexibility, and simplified development.
AI Figma to Code by Anima offers personalized code generation. With the GenAI engine, customize code to match your coding conventions. Start in Figma, then tweak code with free text instructions, presets, or code samples.
Lilac Garden accelerates dataset transformation with LLMs, enhancing data quality for AI practitioners. An open-source tool, Lilac enables efficient clustering, signal computation, and data editing.
Latest AI Research Papers
arXiv is a free online library where researchers share pre-publication papers.
Media2Face pioneers co-speech facial animation synthesis using a Generalized Neural Parametric Facial Asset (GNPFA) and the M2F-D dataset, enabling nuanced expressions from audio, text, and image inputs. By decoupling expressions from identity, GNPFA allows flexible conditioning on diverse modalities, ensuring high fidelity and broad expressiveness in 3D facial animation. Users can generate lifelike facial animations from sources ranging from dialogue to music, enriching the emotional resonance of AI virtual companions and advancing human-centric AI through rich multi-modal conditioning.
The paper introduces StableIdentity, a novel framework for one-shot customized generation in text-to-image models. While recent models excel in human-centric generation, achieving stable identity preservation and flexible editability remains a challenge. StableIdentity incorporates identity and editability priors, leveraging a face encoder for identity representation and celeb names for editability. The proposed masked two-phase diffusion loss optimizes pixel-level details, ensuring stable identity learning. Experiments demonstrate the method's effectiveness, outperforming existing customization approaches. Notably, StableIdentity enables direct injection of identity into video/3D generation without finetuning, showcasing its potential for unifying image, video, and 3D customized generation tasks.
The paper introduces InternLM-XComposer2, a state-of-the-art vision-language model specializing in free-form text-image composition and comprehension. It surpasses previous models by adeptly crafting integrated text-image content from diverse inputs like outlines and reference images. The model employs a Partial LoRA (PLoRA) approach, applying additional parameters exclusively to image tokens to preserve language knowledge integrity. Experimental results demonstrate superior performance across various benchmarks, outperforming existing multimodal models and matching or surpassing advanced models like GPT-4V and Gemini Pro. This advancement signifies remarkable proficiency in multimodal understanding, paving the way for highly customizable content creation.
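The PLoRA mechanism is easy to picture: the frozen pretrained layer serves all tokens, while a trainable low-rank path fires only on image tokens. A toy PyTorch sketch follows; the rank, dimensions, and mask convention are our assumptions, not the paper’s code:

```python
# Toy sketch of Partial LoRA (PLoRA) as described: the low-rank update is
# applied only to image tokens, so text tokens see the frozen weights alone.
import torch
import torch.nn as nn

class PLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base                               # frozen pretrained projection
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)             # LoRA path starts as a no-op

    def forward(self, x: torch.Tensor, image_mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, in); image_mask: (batch, seq), True where tokens are visual
        delta = self.lora_b(self.lora_a(x))
        return self.base(x) + delta * image_mask.unsqueeze(-1)

layer = PLoRALinear(nn.Linear(4096, 4096))
x = torch.randn(1, 16, 4096)
mask = torch.zeros(1, 16, dtype=torch.bool)
mask[:, :8] = True                                     # pretend tokens 0-7 are image tokens
y = layer(x, mask)                                     # text tokens bypass the LoRA path
```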
This preprint introduces Mobile-Agent, an autonomous multi-modal mobile device agent leveraging Multimodal Large Language Models (MLLM). Mobile-Agent utilizes visual perception tools to accurately identify and locate visual and textual elements within app interfaces. It autonomously plans and executes complex tasks based on perceived context, eliminating the need for system-specific customizations. Unlike previous solutions, it relies solely on screenshots, enhancing adaptability across diverse mobile operating environments. The proposed Mobile-Eval benchmark assesses Mobile-Agent's performance, demonstrating remarkable accuracy and completion rates even with challenging instructions. The framework includes visual perception modules for text and icon localization and features self-planning and self-reflection mechanisms.
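Schematically, the perceive-plan-act loop the preprint describes might look like the stub sketch below; every helper is a hypothetical placeholder, not Mobile-Agent’s actual API:

```python
# Hypothetical stubs standing in for Mobile-Agent's components
# (visual perception, MLLM planning, device control).
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "tap", "type", "scroll", or "done"
    target: str = ""   # element found by text/icon localization

def capture_screenshot() -> bytes: ...                  # raw pixels, no system hooks
def locate_elements(screen: bytes) -> list: ...         # OCR + icon detection modules
def plan_next_action(goal: str, elements: list, history: list) -> Action: ...  # MLLM call
def perform(action: Action) -> None: ...                # tap / type / scroll on device

def run_agent(goal: str, max_steps: int = 20) -> None:
    history: list[Action] = []                          # feeds the self-reflection step
    for _ in range(max_steps):
        elements = locate_elements(capture_screenshot())
        action = plan_next_action(goal, elements, history)
        if action.kind == "done":
            return
        perform(action)
        history.append(action)
```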
Tencent researchers introduce an object-driven one-shot fine-tuning method for text-to-image diffusion models, addressing challenges in generative tasks. Leveraging a prototypical embedding initialized by object appearance and class, the Tencent team enhances model generalizability. Their class-characterizing regularization preserves object class knowledge during fine-tuning, and an object-specific loss improves fidelity, allowing one-shot implantation of multiple objects. Outperforming existing methods, Tencent's approach excels in maintaining object identity, enhancing output quality, and extending to multi-object fine-tuning, showcasing its versatility in personalized content generation with just a single input image.
ChatGPT Creates Comics
Thank you for reading today’s edition.
Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.