Abstract:
In recent years, the continued growth of deep learning model scale has brought breakthrough advances across many application domains, driven by increasingly powerful inference capabilities. However, larger models also incur significantly higher inference costs, limiting their use in resource-constrained scenarios. To address this, this paper proposes a post-training paradigm that combines reinforcement learning (RL) with scaling laws, with the core objective of optimizing the inference cost of deep learning models. First, we analyze the theoretical foundations of scaling laws, which describe the relationship between model performance, model scale, and training resources, in order to accurately capture both the performance and the resource consumption of models of different sizes on inference tasks. We then design an RL-based dynamic parameter adjustment strategy that automatically trades off inference performance against resource expenditure, yielding an optimal performance-cost combination. Experiments show that, compared with conventional model optimization strategies, the proposed method significantly reduces inference cost across a variety of tasks while maintaining competitive inference performance. Furthermore, we verify the generality of the paradigm: it adapts to different types of deep learning models and achieves cost optimization across diverse inference scenarios without extensive fine-tuning. These results not only point to a new direction for inference applications in resource-constrained environments, but also open a new path for theoretical research combining scaling laws with reinforcement learning.
Keywords: Reinforcement learning; Scaling law; Inference cost optimization; Deep learning model
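As a minimal illustration of the power-law form commonly used in the scaling-law literature, L(N) ≈ (N_c / N)^α relating loss to parameter count N, the sketch below fits α and N_c by least squares in log-log space. The constants and data points are synthetic placeholders, not results from this paper:

```python
import numpy as np

# Hypothetical scaling-law form: loss L(N) ≈ (N_c / N)**alpha,
# where N is the parameter count. Synthetic ground truth for illustration only.
true_alpha, true_Nc = 0.076, 8.8e13
N = np.array([1e7, 1e8, 1e9, 1e10, 1e11])
loss = (true_Nc / N) ** true_alpha

# A power law is linear in log-log space:
#   log L = alpha * log N_c - alpha * log N,
# so an ordinary least-squares line fit recovers the exponents.
slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
alpha_hat = -slope                       # slope is -alpha
Nc_hat = np.exp(intercept / alpha_hat)   # intercept is alpha * log N_c

print(f"alpha ~= {alpha_hat:.4f}, N_c ~= {Nc_hat:.3e}")
```

With noiseless synthetic data the fit recovers the generating constants essentially exactly; on real measurements one would fit the same line to empirical (model size, loss) pairs.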