{"id":260771,"date":"2026-06-01T02:30:00","date_gmt":"2026-06-01T06:30:00","guid":{"rendered":"https:\/\/news-you-need.com\/index.php\/2026\/06\/01\/running-moe-on-mobile-phones-meta-proposes-mobilemoe-speeding-up-iphone-16-pro-by-3-8-times\/"},"modified":"2026-06-01T05:00:15","modified_gmt":"2026-06-01T09:00:15","slug":"running-moe-on-mobile-phones-meta-proposes-mobilemoe-speeding-up-iphone-16-pro-by-3-8-times","status":"publish","type":"post","link":"https:\/\/news-you-need.com\/index.php\/2026\/06\/01\/running-moe-on-mobile-phones-meta-proposes-mobilemoe-speeding-up-iphone-16-pro-by-3-8-times\/","title":{"rendered":"Running MoE on Mobile Phones: Meta Proposes MobileMoE, Speeding Up iPhone 16 Pro by 3.8 Times"},"content":{"rendered":"<p><a href=\"https:\/\/eu.36kr.com\/en\/p\/3831266999887490\">Running MoE on Mobile Phones: Meta Proposes MobileMoE, Speeding Up iPhone 16 Pro by 3.8 Times<\/a><\/p>\n<p><a href=\"https:\/\/eu.36kr.com\/en\/p\/3831266999887490\">https:\/\/eu.36kr.com\/en\/p\/3831266999887490<\/a><\/p>\n<p>Publish Date: 2026-06-01 02:30:00<\/p>\n<p>Source Domain: <a href=\"https:\/\/eu.36kr.com\">eu.36kr.com<\/a><\/p>\n<p>In recent years, <strong>Mixture of Experts (MoE) models<\/strong> have been widely used in large cloud-based models. On mobile phones, however, <strong>Large Language Models (LLMs)<\/strong> still mainly use dense architectures. Mobile devices have faced tighter constraints on memory, computing power, and latency, and edge-side MoE in the range of sub-billion active parameters has lacked systematic research. Now, as the DRAM capacity of mobile devices increases, deploying MoE on smartphones is becoming feasible.<\/p>\n<p>MobileMoE, proposed by the Meta team, <strong>achieves efficient MoE inference on commercial smartphones for the first time<\/strong>. 
The results show that across 14 benchmark tests, at similar memory usage, MobileMoE-S\/M use only 1\/2 to 1\/4 of the inference computation of the dense baselines while achieving comparable or even higher average accuracy. In on-device tests, MobileMoE-S shows its largest speedup on the GPU\/MLX backend of the iPhone 16 Pro, <strong>with a maximum speedup of 3.8 times in the input stage<\/strong>.<\/p>\n<p class=\"img-desc\">Paper link: https:\/\/arxiv.org\/abs\/2605.27358<\/p>\n<p>The research team also proposes a set of edge-side MoE scaling rules for choosing a model structure better suited to mobile phone deployment. MobileMoE <strong>establishes a new Pareto frontier<\/strong> for edge-side large language models, achieving a better trade-off between accuracy and inference computation overhead.<\/p>\n<p class=\"image-wrapper\"><img decoding=\"async\" data-img-size-val=\"1080,401\" src=\"https:\/\/img.36krcdn.com\/hsossms\/20260530\/v2_07bc7efe328b443f8591a86a3a5415dc@000000_oswg196775oswg1080oswg401_img_000?x-oss-process=image\/format,jpg\/interlace,1\"\/><\/p>\n<p class=\"img-desc\">Figure | MobileMoE has established a new Pareto frontier for edge-side large language models.<\/p>\n<h2><strong>How is MobileMoE designed?<\/strong><\/h2>\n<p>MobileMoE is an <strong>MoE language model<\/strong> designed for edge-side deployment. The overall structure is still a <strong>decoder-only Transformer<\/strong>, but the dense feed-forward layer is replaced with an MoE layer. For each token, the router selects the few experts with the highest scores to take part in the computation, alongside a shared expert that always participates. 
The entire training process&#8230;<\/p>\n<p><a href=\"https:\/\/eu.36kr.com\/en\/p\/3831266999887490\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Running MoE on Mobile Phones: Meta Proposes MobileMoE, Speeding Up iPhone 16 Pro by 3.8&#8230;<\/p>\n","protected":false},"author":1,"featured_media":260772,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/img.36krcdn.com\/hsossms\/20260601\/v2_88597d5929e84af49eab594b89dd540c@1743780481@ai_oswg1052705oswg1053oswg495_img_png~tplv-1marlgjv7f-ai-v3:600:400:600:400:q70.jpg","fifu_image_alt":"","footnotes":""},"categories":[120],"tags":[124],"class_list":["post-260771","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-iphone","tag-iphone-16"],"_links":{"self":[{"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts\/260771"}],"collection":[{"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/comments?post=260771"}],"version-history":[{"count":1,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts\/260771\/revisions"}],"predecessor-version":[{"id":260773,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts\/260771\/revisions\/260773"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/media\/260772"}],"wp:attachment":[{"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/media?parent=260771"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/categories?post=260771"},{"taxonomy":"post_tag","embedd
able":true,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/tags?post=260771"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}