Microsoft's Big AI Product Announcements
Good morning. It’s Friday, September 22nd.
Did you know: You can browse early discussions about GPT-3 (the precursor to ChatGPT), posted before the model was released, on the platform formerly known as Twitter? The AI world sure was quiet then.
In today’s email:
Microsoft AI
AI Hardware and Chips
Generative AI
AI Ethics and Concerns
5 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think of this edition by replying to this email, or DM us on Twitter.
Today’s edition is brought to you by:
Webinar Alert!
DeepBrain AI and Lenovo present a solution for customizing and running LLMs on the edge.
Providing a private & secure AI Human customer service experience.
Join us September 25 at 8 AM Pacific.
Today’s trending AI news stories
Microsoft AI
Windows 11’s next update arrives on September 26th with Copilot, AI-powered Paint, and more. Other updates include a modernized File Explorer, Ink Anywhere for stylus users, an improved Paint app, backup enhancements, and new features for the Snipping Tool and Photos app. While some expected features are missing, a larger 23H2 update is anticipated later. The update aims to enhance the Windows 11 experience, offering a range of quality-of-life improvements and innovations.
Starting September 26, Microsoft will roll out Copilot as a free update to Windows 11, extending its features to Bing, Edge, and Microsoft 365 Copilot this fall. The upcoming Windows 11 update boasts over 150 new features, integrating Copilot and AI functionalities into apps like Paint, Photos, and Clipchamp. Bing will soon support OpenAI's DALL-E 3 model, offering personalized search results, an enhanced shopping experience, and updates to Bing Chat Enterprise. Microsoft 365 Copilot will be available for enterprise customers from November 1, 2023, accompanied by a new AI assistant, Microsoft 365 Chat. New Surface devices, which support these AI features, are now open for pre-orders.
Here’s a new preview of Windows 11 Copilot released yesterday:
Microsoft unveiled a range of new Surface hardware products and AI initiatives at an event in New York City. Microsoft CEO Satya Nadella highlighted the Copilot AI features in the Office suite and announced that Copilot will be built into Windows. The company also introduced Microsoft 365 Chat, a chatbot that provides summaries and highlights priorities. Additionally, Microsoft showcased the Surface Laptop Go 3, a budget clamshell laptop, and the Surface Laptop Studio 2, its most powerful system yet with 13th-Gen Intel CPUs and Nvidia GPUs.
AI Hardware and Chips
SambaNova’s New Chip Means GPTs for Everyone. SambaNova, an AI startup, has unveiled its SN40L processor, claiming it can support large language models with up to 5 trillion parameters using just eight chips. This processor, built on a 5-nanometer process, features 102 billion transistors and a novel three-tier memory system to handle extensive AI workloads efficiently. SambaNova’s technology stack focuses on running the largest AI models, offering hardware and software solutions to unlock data insights without the need for extensive chip or AI talent acquisition, making it accessible to a broader range of companies.
AMD takes AI inferencing to space with Versal chip. The Versal AI Edge XQRVE2302 is a radiation-tolerant, space-grade chip built as a complete system-on-chip (SoC), making it suitable for space flight. It’s the first adaptive SoC designed specifically for space applications and offers significant size reduction and power savings compared to existing chips. The XQRVE2302 integrates AMD’s enhanced AI Engine technology, optimized for machine learning (ML) applications, making it ideal for anomaly and image detection in space applications. It’s field programmable for flexibility and offers robust security features. Flight-qualified parts are expected by late 2024.
Generative AI
Midjourney's upcoming version 6 promises a big leap in quality, including better text understanding and image generation, aiming to rival OpenAI’s DALL-E 3 in image quality. CEO David Holz suggests that the leap from version 5 to 6 will be substantial. The company is also working on a web version with image-generation capabilities and social features. Future plans include 3D and video generation. Holz is optimistic about the potential for generative AI to enhance video game graphics. Specific release dates for these features are yet to be confirmed.
YouTube rolls out four new AI tools for creators to simplify content creation, including AI Insights which suggests video ideas based on audience preferences, Dream Screen for creating unique backgrounds in YouTube Shorts, assistive search for creator music, and Aloud for dubbing videos into various languages. YouTube’s interest in AI aligns with the growing trend of tech companies introducing AI products, and the platform’s 2023 Culture & Trends Report revealed a significant interest in generative AI tools among viewers and creators.
Amazon’s generative AI leader, Rohit Prasad, discussed the integration of the company’s new large language model (LLM) into Alexa, transforming the device into a “super agent.” Alexa now connects with numerous devices and services through APIs, offering real-time knowledge and utility. Prasad emphasized that Alexa is distinct from chatbots like ChatGPT, focusing on real-world interactions and useful functions, and highlighted the importance of privacy and transparency in data collection. He underscored that Alexa is an AI and should always be recognized as such, despite its increasingly human-like capabilities.
AI Ethics and Concerns
ChatGPT doesn't turn amateurs into great coders, Flappy Bird experiment shows. A study by the DiverSE research group assessed the capabilities of OpenAI’s ChatGPT in assisting non-programmers in creating complex games like Flappy Bird using Python. The results revealed that while ChatGPT had potential for game development, it did not offer a one-size-fits-all solution. Inconsistencies in code quality and usability required direct programming knowledge to rectify issues. The study recommends exploring different programming languages, refining prompts for better control, and streamlining ChatGPT integration into the development environment to improve the user experience.
AI-focused tech firms are locked in a ‘race to the bottom’ that prevents them from pausing AI development to consider potential risks, warns an MIT professor. Max Tegmark, co-founder of the Future of Life Institute, organized a letter in March calling for a six-month pause in the development of powerful AI systems, but it didn’t succeed due to intense competition among tech firms. The letter warned of an “out-of-control race” and urged governments to intervene if a moratorium on systems more powerful than GPT-4 couldn’t be agreed upon. Despite the letter’s lack of success, it’s been credited with raising awareness about AI risks.
5 new AI-powered tools from around the web
NotionApps revolutionizes app creation from Notion databases without coding. Tailor your apps with login, menus, lists, forms, and over 20 components, enhancing data sharing. The tool’s versatility caters to various use cases, from catalogs to employee portals. Experience a seamless Notion app-building journey with NotionApps.
Môveo AI introduces personalized AI virtual agents powered by cutting-edge NLP and Generative AI. Utilize your data, including chatlogs, knowledge base, and documents, to automate a substantial portion of customer support tasks. Their platform, already trusted by enterprises like Kaizen Gaming and Avis, enhances customer experiences and offers a 30-day trial for exploration.
GiveFlag is an AI platform that accelerates business processes by automating tasks in operations and transactions. This innovative solution aims to streamline M&A and VC desks, reducing the exorbitant costs associated with lengthy processes. It offers real-time data analysis, AI-powered document conversations, and a dedicated team of AI personas. GiveFlag’s mission is to revolutionize business efficiency, making it accessible to companies of all sizes, with a strong emphasis on preventing overbilling in the legal and financial services industry.
AskCodi, the ultimate coding companion, empowers developers with interactive workbooks, real-time chat support, IDE extensions, and support for over 50 languages. This platform is designed to enhance coding efficiency, reduce redundancy, and improve code quality, catering to both beginners and experts in the development community.
(Coming soon) DALL-E 3, a breakthrough in text-to-image generation from OpenAI, offers precise and nuanced visuals aligned with your descriptions. It’s seamlessly integrated with ChatGPT, enhancing your creative process. Safety measures and ethical considerations are prioritized, with an upcoming release for ChatGPT Plus and Enterprise users. Creative control and transparency are central to DALL-E 3’s design, and it seems to boast the ability to include text in its results.
arXiv is a free online library where scientists share their research papers before they are published. Here are the top AI papers for today.
📄 A PARADIGM SHIFT IN MACHINE TRANSLATION: Boosting Translation Performance of Large Language Models
This study introduces a novel training approach for Generative Large Language Models (LLMs) in the translation task, specifically designed for moderate model sizes (7B or 13B parameters). The proposed method, named Advanced Language Model-based trAnslator (ALMA), consists of two fine-tuning stages: initial fine-tuning on monolingual data and subsequent fine-tuning on high-quality parallel data. ALMA achieves an average improvement of over 12 BLEU and COMET scores across 10 translation directions, outperforming prior work and even larger LLMs like NLLB-54B and GPT-3.5 text-davinci-003. This approach offers a promising training paradigm for machine translation with smaller LLMs.
The Languini Kitchen is a research collective and codebase designed to empower language modeling researchers with limited computational resources. It introduces an experimental protocol for model comparisons based on accelerator hours, avoiding constraints on critical hyperparameters. The project offers high-quality datasets, and it compares models based on empirical scaling trends, providing baseline models and showcasing a novel LSTM with superior scaling. This work fosters a fair and meaningful platform for language model comparison and aims to democratize language modeling research. It encourages inclusivity, but it also underscores the need for responsible development of advanced language models.
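To make the accelerator-hour idea concrete, here is a minimal sketch of how a fixed hour budget turns into a different token budget for each model, depending on how fast it trains. The function name, model names, and throughput figures are hypothetical and are not taken from the Languini codebase or paper.

```python
def tokens_within_budget(tokens_per_second, accelerator_hours):
    """Tokens a model can train on within a fixed accelerator-hour budget."""
    return int(tokens_per_second * 3600 * accelerator_hours)

# Hypothetical throughputs (tokens/second) measured on the same accelerator.
throughput = {"model_a": 20_000, "model_b": 12_500}
budget_hours = 6  # hypothetical compute budget

# Each model gets the same hours, so faster models see more tokens.
token_budget = {
    name: tokens_within_budget(tps, budget_hours)
    for name, tps in throughput.items()
}
print(token_budget)
```

Under this kind of protocol, a slower architecture is not penalized by a fixed step or parameter count; it simply sees fewer tokens in the same number of hours.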
LongLoRA is introduced as an efficient fine-tuning method for extending the context sizes of large pre-trained language models (LLMs) without the computational expense of full fine-tuning. LongLoRA achieves this by introducing “shift short attention” (S2-Attn) during training, which approximates standard self-attention and reduces computational costs. Additionally, it makes embedding and normalization layers trainable, unlocking the potential for longer context learning while maintaining the original attention architecture during inference. Experimental results demonstrate that LongLoRA efficiently extends context lengths, allowing for models with up to 100k context while being computationally more affordable than traditional methods. A dataset called LongQA is also presented for supervised fine-tuning.
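As a rough illustration of why grouped attention with a shifted pattern is cheaper than full attention, the sketch below splits token positions into fixed-size groups and counts the attended pairs. It assumes attention is restricted to tokens within the same group and that a second pattern is offset by half a group so information can cross group boundaries. The function names, sequence length, and group size are hypothetical, and this is not the authors' implementation.

```python
def attention_groups(seq_len, group_size, shift=0):
    """Split token positions into groups; attention stays within each group."""
    # Roll the positions by `shift`, then cut into consecutive chunks.
    order = [(i + shift) % seq_len for i in range(seq_len)]
    return [order[s:s + group_size] for s in range(0, seq_len, group_size)]

def pair_count(groups):
    """Number of query-key pairs scored when attention is group-local."""
    return sum(len(g) ** 2 for g in groups)

seq_len, group_size = 16, 4  # hypothetical sizes for illustration
plain = attention_groups(seq_len, group_size)
shifted = attention_groups(seq_len, group_size, shift=group_size // 2)

print(plain[0], shifted[0])                # first group of each pattern
print(pair_count(plain), seq_len ** 2)     # group-local pairs vs. full attention
```

In this toy setting, each pattern scores 64 pairs instead of the 256 needed for full attention, and the shifted groups straddle the boundaries of the unshifted ones.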
In this paper, the authors introduce RMT, a novel vision backbone that combines Retentive Networks (RetNet) and Vision Transformers (ViT). RetNet’s retention mechanism, which uses explicit decay related to spatial distance, is adapted for 2D vision tasks in RMT. This Retentive Self-Attention (ReSA) is decomposed along two image axes to reduce computational complexity. Extensive experiments demonstrate RMT’s outstanding performance, achieving a Top1-acc of 84.1% on ImageNet-1k with only 4.5G FLOPs, surpassing other models of similar size and training strategy. RMT also excels in object detection and other downstream tasks, showcasing its potential in computer vision.
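One reason a distance-based decay lends itself to per-axis treatment can be sketched in a few lines. The example assumes the decay is a constant raised to the Manhattan distance between two grid positions; the value of `gamma` and the function names are hypothetical and are not taken from the paper's code.

```python
def decay_2d(p, q, gamma):
    """Decay weight between grid positions p=(row, col) and q=(row, col)."""
    manhattan = abs(p[0] - q[0]) + abs(p[1] - q[1])
    return gamma ** manhattan

def decay_1d(a, b, gamma):
    """Decay weight along a single axis."""
    return gamma ** abs(a - b)

gamma = 0.5            # hypothetical decay rate
p, q = (0, 0), (1, 2)  # two positions on the image grid

full = decay_2d(p, q, gamma)
# The 2D decay factors into a row term times a column term.
separable = decay_1d(p[0], q[0], gamma) * decay_1d(p[1], q[1], gamma)
print(full, separable)
```

Because the exponent is a sum of a row distance and a column distance, the 2D weight equals the product of two 1D weights, which is the kind of structure that allows work to be split along the two image axes.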
LLM-Grounder is a novel approach for 3D visual grounding that utilizes Large Language Models (LLMs) as agents to improve the handling of complex text queries. In contrast to existing models, which often exhibit “bag-of-words” behavior, LLM-Grounder decomposes complex queries into sub-tasks, interacts with visual grounding tools, and leverages spatial and commonsense knowledge. This approach enables robust zero-shot, open-vocabulary 3D visual grounding. Empirical evaluations demonstrate its effectiveness, especially for complex queries. However, there are limitations related to computational cost and latency. Nonetheless, LLM-Grounder sets a new benchmark and paves the way for future research in LLM integration with robotics.
Thank you for reading today’s edition.
Your feedback is valuable.
Respond to this email and tell us how you think we could add more value to this newsletter.