

Innovation in Post-Training Paradigm Combining Reinforcement Learning with Scaling Law


Author: Chen Zehao*
Tiangong University, Tianjin, China
*Corresponding author: Chen Zehao; Affiliation: Tiangong University, Tianjin, China
AI应用研究 (AI Application Research), 2025, 3(1), 0-0;
Submitted: 31 March 2025 | Accepted: 20 May 2025 | Published: 27 June 2025
Funding: self-funded; the authors declare no conflicts of interest.
 
Abstract:
In recent years, the continuous growth of deep learning model scale has enabled breakthrough advances across many application domains, driven by increasingly powerful inference capabilities. However, this growth is accompanied by a significant rise in inference cost, limiting deployment in resource-constrained scenarios. To address this, this paper proposes a post-training paradigm that combines Reinforcement Learning (RL) with Scaling Law, with the core objective of optimizing the inference cost of deep learning models. First, the paper analyzes the theoretical foundations of Scaling Law, which describes the relationship between model performance, model scale, and training resources, in order to accurately characterize the performance and resource consumption of models of different scales on inference tasks. Next, a dynamic parameter-adjustment strategy based on reinforcement learning is designed to automatically balance inference performance against resource expenditure, yielding an optimal performance-cost trade-off. Experiments show that, compared with conventional model-optimization strategies, the method significantly reduces inference cost across a range of tasks while maintaining competitive inference performance. The generality of the paradigm is also verified: it adapts to different types of deep learning models and achieves cost optimization in diverse inference scenarios without extensive fine-tuning. These findings offer a new direction for inference applications in resource-constrained environments and open new paths for theoretical research combining Scaling Law with reinforcement learning.
Keywords: Reinforcement learning; Scaling law; Inference cost optimization; Deep learning model
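The full text is not included on this page, but the trade-off the abstract describes, selecting a model configuration by weighing scaling-law-predicted performance against inference cost via reinforcement learning, can be sketched in a minimal, self-contained form. Everything below is an illustrative assumption: the power-law constants, the linear cost model, the penalty weight `lam`, and the epsilon-greedy bandit are hypothetical stand-ins, not the paper's actual method.

```python
import random

# Hypothetical candidate model scales (parameter counts, in billions).
CANDIDATE_SCALES = [1, 7, 13, 70]

def scaling_law_performance(n_params_b, a=1.0, b=0.3, c=0.05):
    # Power-law loss curve L(N) = a * N^(-b) + c, a common scaling-law
    # functional form; lower predicted loss means better quality.
    return a * n_params_b ** (-b) + c

def inference_cost(n_params_b, cost_per_b=1.0):
    # Assumption: inference cost grows linearly with parameter count.
    return cost_per_b * n_params_b

def reward(n_params_b, lam=0.01):
    # Reward trades predicted quality (negative loss) against a
    # cost penalty weighted by lam.
    return -scaling_law_performance(n_params_b) - lam * inference_cost(n_params_b)

def epsilon_greedy_select(steps=2000, eps=0.1, seed=0):
    # A minimal bandit-style RL loop: each arm is a candidate model
    # scale; the agent learns which scale maximizes the reward above.
    rng = random.Random(seed)
    value = {s: 0.0 for s in CANDIDATE_SCALES}
    count = {s: 0 for s in CANDIDATE_SCALES}
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.choice(CANDIDATE_SCALES)   # explore
        else:
            arm = max(CANDIDATE_SCALES, key=lambda s: value[s])  # exploit
        r = reward(arm) + rng.gauss(0, 0.01)     # noisy observation
        count[arm] += 1
        value[arm] += (r - value[arm]) / count[arm]  # running mean
    return max(CANDIDATE_SCALES, key=lambda s: value[s])
```

Here each "arm" is a whole model scale; a realistic version would replace the closed-form reward with measured task accuracy and real latency or cost telemetry, and the action space would cover the paper's dynamic inference-time parameters rather than only scale selection.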
 
--
Content:
Download the full-text PDF for reading, and use it in accordance with the license of this paper.
