A Hotspot-Sensitive Adaptive Skip List

文韬; 吉锋; 刘丽霞

doi:10.13718/j.cnki.xdzk.2020.12.001

Abstract: This paper studies how the principle of access locality can be used to improve skip list query efficiency, together with an adaptive mechanism for the resulting acceleration strategy. First, the skip list query operation is modified to additionally return the subset of its access path at a specified level. Second, query operations are sampled with reservoir sampling; the sampling results are used to predict the workload at that level, hot-spot regions are selected according to the predicted workload, and acceleration points are placed in the hot-spot regions to speed up queries. Finally, based on the measured improvement in query efficiency and on changes in system load, the SARSA algorithm with a reward shaping mechanism automatically adjusts the level and number of acceleration points, balancing query acceleration against management cost. Experiments show that under skewed access the method achieves lower latency than the native skip list query algorithm, and that when the access pattern changes gradually the method adapts its acceleration strategy to the changing environment.
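
The sampling step named in the abstract can be illustrated with a minimal sketch of classic reservoir sampling (Algorithm R, as in Vitter [6]). This is not the authors' implementation: the function names `reservoir_sample` and `hottest_anchors`, and the idea of ranking sampled anchors by raw count, are assumptions made only for illustration.

```python
import random
from collections import Counter


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from a stream
    whose length is not known in advance (Algorithm R)."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def hottest_anchors(sample, top_k):
    """Hypothetical helper: rank sampled anchor keys by frequency and
    return the top_k most frequent ones as hot-spot candidates."""
    return [anchor for anchor, _ in Counter(sample).most_common(top_k)]
```

Because the reservoir has a fixed size, the memory cost of tracking the workload stays constant no matter how many queries are observed, which is presumably why a sampling approach suits this setting.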

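Figure 1 Lookup operation in a native skip list

Figure 2 Query acceleration in the hotspot-sensitive skip list

Figure 3 Workflow of the hotspot-sensitive adaptive skip list

Figure 4 Skip list regions and acceleration segments

Figure 5 Shortcut value of a region

Figure 6 Test results for the weak-locality scenario

Figure 7 Test results for the strong-locality scenario

Figure 8 Test results for the workload scenario with medium-speed drift

Figure 9 Test results for the workload scenario with high-speed drift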

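Table 1 Definition of the state space

State attribute	Meaning	Example discretization mapping
S1	Let S_JS[m, T, T-1] be the Jensen-Shannon distance (hereafter JS distance, denoted S_JS)^[9] between the sampled workload distributions of level m at times T and T-1, and let S_JS[m, T-1, T-2] be the JS distance between the sampled workload distributions of level m at times T-1 and T-2; S1 = (S_JS[m, T, T-1] + S_JS[m, T-1, T-2])/2	L1: [0, 1/4) L2: [1/4, 1/2) L3: [1/2, 3/4) L4: [3/4, 1]
S2	Let S_JS[m-1, T, T-1] be the JS distance between the sampled workload distributions of level m-1 at times T and T-1, and let S_JS[m-1, T-1, T-2] be the JS distance between the sampled workload distributions of level m-1 at times T-1 and T-2; S2 = (S_JS[m-1, T, T-1] + S_JS[m-1, T-1, T-2])/2	L1: [0, 1/4) L2: [1/4, 1/2) L3: [1/2, 3/4) L4: [3/4, 1]
S3	Let NS be the collection of anchors on acceleration level m of the skip list that workload queries hit during period T, and let p(x) be the frequency of anchor x in NS; $S3 = \frac{-\sum_{x \in NS} p(x) \log_2 p(x)}{\log_2 \lVert \mathrm{unique}(NS) \rVert}$	L1: [0, 1/2) L2: [1/2, 3/4) L3: [3/4, +∞)
S4	Number of nodes on acceleration level m of the skip list during period T	L1: [0, 5K) L2: [5K, 50K) L3: [50K, +∞)
S5	Let C_cpuinfo be the CPU Load Average at time T and C_Cores the total number of CPU cores of the server; S5 = C_cpuinfo/C_Cores	L1: [0, 1/2) L2: [1/2, 3/4) L3: [3/4, +∞)
S6	Number of periods for which the current combination of state attributes [S1, S2, S3, S4, S5] has remained unchanged	L1: [0, 2) L2: [2, 5) L3: [5, +∞)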
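
As a rough illustration of how the attributes S1 and S3 in Table 1 might be computed from sampled anchor keys, here is a minimal sketch. It assumes that the JS distance is the square root of the base-2 Jensen-Shannon divergence (which lies in [0, 1], matching the bins in the table), that every sample is non-empty, and that S3 is taken as 0 when only one distinct anchor appears, a case the table leaves undefined. All function names are hypothetical.

```python
import math
from collections import Counter


def _distribution(samples):
    """Empirical frequency of each key in a non-empty sample."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {key: c / total for key, c in counts.items()}


def js_distance(samples_p, samples_q):
    """Square root of the base-2 Jensen-Shannon divergence (assumed definition)."""
    p, q = _distribution(samples_p), _distribution(samples_q)
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the union of keys.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m, in bits.
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return math.sqrt(max(0.5 * kl(p) + 0.5 * kl(q), 0.0))


def s1(samples_t, samples_t1, samples_t2):
    """S1: mean JS distance over the two most recent period transitions."""
    return (js_distance(samples_t, samples_t1)
            + js_distance(samples_t1, samples_t2)) / 2


def normalized_entropy(anchors):
    """S3: Shannon entropy of anchor frequencies divided by log2 of the
    number of distinct anchors; returns 0.0 for a single distinct anchor."""
    p = _distribution(anchors)
    if len(p) < 2:
        return 0.0
    entropy = -sum(v * math.log2(v) for v in p.values())
    return entropy / math.log2(len(p))


def discretize(value, upper_bounds):
    """Map a value to a 1-based bin index given ascending exclusive upper
    bounds; values at or above the last bound fall into the final bin."""
    for level, bound in enumerate(upper_bounds, start=1):
        if value < bound:
            return level
    return len(upper_bounds) + 1
```

With the bins from Table 1, S1 and S2 would be discretized with `discretize(value, (0.25, 0.5, 0.75))` and S3 with `discretize(value, (0.5, 0.75))`.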

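Table 2 Definition of the action space

Action	Level control strategy	Top-k control strategy
Action1	Move the acceleration-point level down by 1	Tidy (reduced)
Action2	Move the acceleration-point level down by 1	Common (normal)
Action3	Keep the acceleration-point level unchanged	Tidy (reduced)
Action4	Keep the acceleration-point level unchanged	Common (normal)
Action5	Move the acceleration-point level up by 1	Tidy (reduced)
Action6	Move the acceleration-point level up by 1	Common (normal)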
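
The six actions in Table 2 could drive a tabular SARSA learner along the following lines. This is only a sketch under stated assumptions: the hyperparameters, the epsilon-greedy policy, the level clamping in `apply_action`, and the potential values passed to `shaped_reward` are all hypothetical, and the shaping term follows the potential-based form of Ng et al. [7] rather than any formula given in this text.

```python
import random
from collections import defaultdict

# (level delta, top-k mode) for Action1..Action6 in Table 2.
ACTIONS = [(-1, "tidy"), (-1, "common"),
           (0, "tidy"), (0, "common"),
           (1, "tidy"), (1, "common")]


def apply_action(level, action_index, min_level, max_level):
    """Return the new acceleration level (clamped to the valid range,
    an assumption) and the top-k mode selected by the action."""
    delta, mode = ACTIONS[action_index]
    return max(min_level, min(max_level, level + delta)), mode


def shaped_reward(reward, phi_s, phi_next, gamma):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * phi_next - phi_s


class SarsaAgent:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.q = defaultdict(float)  # Q-values keyed by (state, action index)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy choice over the six actions."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)), key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_action):
        """On-policy SARSA update using the action actually taken next."""
        target = reward + self.gamma * self.q[(next_state, next_action)]
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

A state here would be the tuple of discretized attributes (S1, ..., S6) from Table 1; because each attribute has only three or four bins, a plain dictionary is enough to hold the Q-table.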

[1]	PUGH W. Skip Lists: A Probabilistic Alternative to Balanced Trees[J]. Communications of the ACM, 1990, 33(6): 668-676. doi: 10.1145/78973.78977
[2]	MUNRO J, PAPADAKIS T, SEDGEWICK R. Deterministic Skip Lists[C] //Proceedings of the 3rd Annual ACM-SIAM Symposium on Discrete Algorithms, Florida, 1992: 367-375.
[3]	ZHANG J T, WU S, TAN Z Y, et al. S3: A Scalable In-Memory Skip-List Index for Key-Value Store[C]. Proceedings of the VLDB Endowment, 2019, 12(12): 2183-2194.
[4]	KIM C, CHHUGANI J, SATISH N, et al. FAST: Fast Architecture Sensitive Tree Search on Modern CPUs and GPUs[C] //SIGMOD, 2010: 339-350.
[5]	潘恬, 黄涛, 张雪贝. A Fast Cache Lookup Mechanism for Content Routers Based on a Locality-Principle Skip List[J]. Chinese Journal of Computers (计算机学报), 2018, 41(9): 2029-2043. doi: http://www.cnki.com.cn/Article/CJFDTotal-JSJX201809006.htm
[6]	VITTER J S. Random Sampling with a Reservoir[J]. ACM Transactions on Mathematical Software, 1985, 11(1): 37-57. doi: 10.1145/3147.3165
[7]	NG A Y, HARADA D, RUSSELL S J. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping[C] //Proc. of the 16th Int'l Conf. on Machine Learning. New York: Morgan Kaufmann Publishers, 1999: 278-287.
[8]	SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction[M]. 2nd ed. 俞凯, trans. Beijing: Publishing House of Electronics Industry, 2019: 127-129.
[9]	MAJTEY A P, LAMBERTI P W, PRATO D P. Jensen-Shannon Divergence as a Measure of Distinguishability Between Mixed Quantum States[J]. Physical Review, 2005, 72(5): 762-776. doi: http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=VIRT04000005000011000081000001&idtype=cvips&gifs=Yes
[10]	REN C X, WANG C B, YIN C C, et al. The Prediction of Short-Term Traffic Flow Based on the Niche Genetic Algorithm and BP Neural Network[M]. Berlin: Springer, 2012: 775-781.
[11]	HOCHREITER S, SCHMIDHUBER J. Long Short-Term Memory[J]. Neural Computation, 1997, 9(8): 1735-1780. doi: 10.1162/neco.1997.9.8.1735
[12]	CHANG C, LIN C. LIBSVM: A Library for Support Vector Machines[J]. ACM Transactions on Intelligent Systems and Technology, 2011, 2(3): 27-65. doi: http://dl.acm.org/citation.cfm?doid=1961189.1961199
[13]	HOLT C. Forecasting Seasonals and Trends by Exponentially Weighted Moving Averages[J]. International Journal of Forecasting, 2004, 20(1): 5-10. doi: 10.1016/j.ijforecast.2003.09.015
[14]	HAN J W, KAMBER M. Data Mining: Concepts and Techniques[M]. 3rd ed. 范明, 孟小峰, trans. Beijing: China Machine Press, 2012.
