On February 22, Chinese Journal of Electronics (CJE), the flagship English-language journal of the Chinese Institute of Electronics, held its signature academic event, the "CJE Frontier Academic Salon" (hereinafter the "Salon"), at Shenzhen University of Advanced Technology (SUAT). More than 150 participants, including experts and scholars from academia and industry as well as SUAT undergraduates, attended the event.
The Salon, themed "Prospects of Multimodal Large Models from the Perspective of DeepSeek," was chaired by Peng Yuxin, Professor at Peking University and a CJE editorial board member in the fields of multimedia, computer vision, graphics, and artificial intelligence. The event invited multiple experts in artificial intelligence to discuss the core innovations of DeepSeek and frontier progress in multimodal large models, aiming to bring together views from academia and industry and to promote frontier academic exchange in artificial intelligence in an open, open-source spirit. The event was co-hosted by Peng Yuxin and Professor Tang Jijun of the SUAT Faculty of Computing Science and Control Engineering.
SUAT Party Secretary Zhu Dijian delivered a speech, introducing SUAT's distinctive features in cultivating innovative talents in computer science and technology.
Pan Yi, Deputy Editor-in-Chief of CJE and Dean of the SUAT Faculty of Computing Science and Control Engineering, introduced the Faculty's talent matrix and its "Computing + X" characteristics.
In the invited report session, Peng Yuxin delivered a report titled "Fine-Grained Multimodal Large Models," introducing two defining characteristics of the real world, fine granularity and multimodality, and systematically presenting the core technologies of the DeepSeek series of large models. He pointed out that the DeepSeek models currently lack fine-grained perception, historical memory, and cross-modal generation capabilities. Addressing these limitations, Peng Yuxin highlighted his team's latest research progress on fine-grained multimodal large models, including the newly developed and open-sourced fine-grained multimodal large model Finedefics, fine-grained multimodal multi-round interactive retrieval for target product recommendation, and text-guided controllable visual content generation.
Professor Lu Jiwen of Tsinghua University emphasized in his report, "Generalizable Full-Modal Large Models," that large model technology is a key hallmark of the intelligent era. Large models are widely applied in content generation, content analysis, content reasoning, and other areas, spanning language large models, vision large models, and full-modal large models. These representative new foundational artificial intelligence technologies signal major transformations in social production, triggering changes in work patterns across many fields. Inspired by the successful application of language large models, developing vision large models can further expand the application boundaries of large model technology, and multimodal foundation models have become a research hotspot for numerous institutions and major companies. Full-modal vision models, by modeling the evolution laws of the external world, will play a greater role in future scenarios such as autonomous driving and embodied intelligence.
Professor Wang Yaowei of Harbin Institute of Technology (Shenzhen), Director of the Vision Institute at Pengcheng Laboratory, delivered a report titled "The Path of Model Innovation under Resource Constraints: Insights from DeepSeek and Application Practices of Pengcheng Series Foundation Models." He pointed out that open-sourcing large models can rapidly spawn upstream and downstream ecosystems, greatly promoting industrial development and shaping society's expectations for the future. Currently, over 100 platforms worldwide have integrated DeepSeek, covering fields such as cloud services, network security, finance, automotive, and university research, and triggering a wave of deployment and adaptation. DeepSeek-R1 innovatively adopts a reasoning technology route based on large-scale reinforcement learning, achieving a breakthrough from an open-source perspective and revitalizing the domestic artificial intelligence ecosystem. Relying on the industry empowerment platform built on the Pengcheng series foundation models, his team has for the first time achieved fully automated model production and cross-platform deployment on domestic platforms, solving engineering problems in large-scale industrial applications of artificial intelligence. Going forward, they will steadfastly pursue systematic and continuous innovation under resource-constrained conditions.
Professor Cheng Mingming of Nankai University delivered a report titled "High-Performance Personalized Image Generation." Starting from several important current open-source trends, he provided a detailed introduction to customized text-to-image generation, and the algorithmic pipeline he described drew great interest from the researchers on site. Using examples of bringing characters from old photos, and even artworks, to life, Cheng Mingming argued that image generation methods may reach a broad user base through the cultural and creative industries. This broad applicability rests on two cores of image generation: high-performance algorithms and personalized information representation. He noted that stitching consistent generated images into video remains difficult when modeling and predicting large-scale motion. Artificial intelligence-generated content (AIGC) is expected to greatly liberate productivity while raising questions for social governance that merit further in-depth research.
Professor Qiu Xipeng of Fudan University emphasized in his report, "Deep Reasoning Technology for Large Models," that the "limited computing power + algorithm innovation" development mode is key to breaking through computing power limitations. Future work should focus on efficiency, pursuing research on new efficient model architectures, efficient reinforcement learning, and efficient utilization of computing power to further enhance model reasoning capabilities; promoting agents centered on general large models is the direction for achieving the universality of future general artificial intelligence. Qiu Xipeng introduced reasoning models built around reinforcement learning, pointing out that DeepSeek-R1 significantly optimizes model efficiency, reduces training and inference costs, and is currently the only product combining strong reasoning capabilities with online search, already showing an initial ability to research complex information and provide answers.
In the roundtable discussion session, the participating experts held in-depth discussions on research questions raised by attendees, such as how high-cost multimodal large models can reduce costs while maintaining performance, theoretical versus engineering innovation in artificial intelligence, and research paradigms for large models.
In the future, SUAT will build more high-level academic exchange platforms, gather wisdom from experts and scholars, assist in tackling technical challenges, and create more opportunities for undergraduates to engage with frontier research.