


    Chinese AI Model Emu3 Handles Text, Image, Video Seamlessly

    Source: Science and Technology Daily | 2024-12-17 15:44:35 | Author: Gong Qian

    On October 21, the Beijing Academy of Artificial Intelligence (BAAI), a Chinese non-profit organization engaged in AI R&D, released Emu3, a multimodal AI model that seamlessly integrates text, image, and video modalities into a single, unified framework.

    The BAAI research team said Emu3 is expected to be used in applications such as robot brains, autonomous driving, multimodal dialogue and inference.

    Trained solely on next-token prediction, Emu3 demonstrates that this objective can be a powerful paradigm for multimodal models.

    Existing multimodal AI models are mostly designed for specific tasks, each with its own architecture and methods. In video generation, for instance, many developers use the Diffusion Transformer (DiT) architecture adopted by Sora. Stable Diffusion is used for text-to-image synthesis, Sora for text-to-video generation, and GPT-4V for image-to-text generation.

    In contrast to these models, which combine isolated skills rather than offering an inherently unified ability, Emu3 eliminates the need for diffusion or compositional approaches. BAAI tokenizes images, text, and videos into a discrete space and trains a single transformer from scratch on the resulting sequences.
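
    As a rough illustration of what tokenizing different modalities into a discrete space can look like, the Python sketch below places text token IDs and visual codebook indices in one shared ID range and then builds the (context, target) pairs that next-token prediction trains on. It is a toy under stated assumptions, not Emu3's implementation: the vocabulary sizes, the BOI and EOI marker tokens, and the function names are all hypothetical.

```python
from typing import List, Tuple

# All sizes and special tokens below are illustrative assumptions,
# not values taken from Emu3.
TEXT_VOCAB_SIZE = 1000       # hypothetical number of text tokens
VISUAL_CODEBOOK_SIZE = 512   # hypothetical number of visual codes
BOI = TEXT_VOCAB_SIZE + VISUAL_CODEBOOK_SIZE  # hypothetical "begin of image" marker
EOI = BOI + 1                                 # hypothetical "end of image" marker


def to_shared_ids(text_ids: List[int], visual_codes: List[int]) -> List[int]:
    """Map text tokens and visual codes into one discrete ID space.

    Text IDs keep their values; visual codes are shifted past the text
    vocabulary so the two ranges never collide.
    """
    for t in text_ids:
        if not 0 <= t < TEXT_VOCAB_SIZE:
            raise ValueError(f"text id out of range: {t}")
    for c in visual_codes:
        if not 0 <= c < VISUAL_CODEBOOK_SIZE:
            raise ValueError(f"visual code out of range: {c}")
    shifted = [TEXT_VOCAB_SIZE + c for c in visual_codes]
    return list(text_ids) + [BOI] + shifted + [EOI]


def next_token_pairs(sequence: List[int]) -> List[Tuple[List[int], int]]:
    """Return (context, target) pairs: each token is predicted from its prefix."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


# Example: two text tokens followed by a two-code "image".
sequence = to_shared_ids([5, 42], [0, 511])
pairs = next_token_pairs(sequence)
```

    Because every modality ends up as integers in the same range, a single sequence model can in principle be trained on all of them with one objective, which is the idea the article describes.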

    Emu3 outperforms several well-established task-specific models in both generation and perception tasks, surpassing flagship models such as SDXL and LLaVA.

    In September, BAAI open-sourced the key technologies and models of Emu3 including the chat model and generation model after supervised fine-tuning.

    Emu3 has been receiving rave reviews from overseas developers. "For researchers, a new opportunity has emerged to explore multimodality through a unified architecture, eliminating the need to combine complex diffusion models with large language models. This approach is akin to the transformative impact of transformers in vision-related tasks," AI consultant Muhammad Umair said on a Meta social media platform.

    While next-token prediction is considered a promising path towards artificial general intelligence, it has struggled to excel in multimodal tasks, which have been dominated by diffusion models such as Stable Diffusion and by compositional approaches such as CLIP combined with large language models.

    Raphael Mansuy, co-founder of QuantaLogic, an AI agent platform, thinks that Emu3 has significant implications for AI development. Mansuy wrote on X that Emu3's success suggests several key insights: next-token prediction as a viable path to general multimodal AI; potential for simplified and more scalable model architectures; and a challenge to the dominance of diffusion and compositional approaches.

    Editor:GONG Qian

