Alibaba's Exploration of Intelligent Operations and Maintenance

CONTENTS
  • PART 01 Background
  • PART 02 What is SDDP
  • PART 03 Key Algorithm
  • PART 04 Deployment and Result

Four Eras of Alibaba Database
  • 2003 - 2004 (Taobao startup): single data center, single application, standalone MySQL
  • 2005 - 2010 (IOE era): multiple data centers in the same city, vertical splitting, commercial IOE
  • 2011 - 2015 (DO之I): cross-region active-active, unitization, AliSQL
  • 2016 - (new opportunities, new challenges): cross-region multi-active, cloud, ML, POLARDB
  • 1,000,000 DBs, 100 BUs, x1000

Background - Do More with Less

Alibaba Database Usage - Alibaba Group
  • Finance, Retailer, Manufacturer, Media and Entertainment, International Clients

Alibaba Database Usage - Public Cloud
  • Singles Day (11/11)

Alibaba Database Overview
Tools and Utilities
  • DTS: Data Migration & DB Transmission
  • ADAM: Migration Evaluation
  • DMS: GUI for Centralized DB Management and DevOps
  • DBS: Efficient and Secure Backup Service
  • HDM: Hybrid Cloud DB Management
  • DBAdvisor: Intelligent Diagnostics and Optimization
Engine (Proprietary and Open Source/Third-party; OLTP, OLAP, NoSQL, GraphDB)
  • POLARDB: Cloud Native DB, Decoupled Compute and Storage with Hardware Acceleration
  • AnalyticDB: PB-grade Data, High Concurrency, High Performance
  • Data Lake Analytics: Serverless Interactive Query Service with Presto and Spark Integration
  • HBase+X-Pack: Multi-model Analysis
  • TSDB: Time Series, Spatial Temporal DB
  • Redis
  • MongoDB
  • AliSQL-MySQL/PG/MariaDB/MS SQL Server
Cloud Database Operation Platform
  • End-to-end Tracking and Monitoring Service
  • Database Expert Service

Challenges
  • All in One
  • Management at Scale: Scheduling, Protection, Runtime Management, Optimization, Backup/Restore, Security
  • Workloads Diversity: SLA-driven, Workload-aware, Agility
  • 100 BUs, 20,000 Developers
  • Super Hotspot: Scalability, Stability, Cost
  • Cloud-Native Database
  • DBPaaS: Large-scale DB Lifecycle Management Infra
  • SDDP: Self-Driving Database Platform

Journey to SDDP at Alibaba
  • -2010: Commercial Database (Oracle)
  • 2010-2017: Open Source Database (AliSQL/MySQL)
  • 2017-now: POLARDB (VLDB 2018, SIGMOD 2019), AnalyticDB (VLDB 2019)
  • Human Labor: DBAs

What is SDDP?
  • Self-detection
  • Self-repairing
  • Self-tuning
  • Self-securing
  • Self-decision-making

Self-Driving Database Platform
  • Database Management Platform
  • Databases Diversity: POLARDB, AnalyticDB, AliSQL, MySQL, NoSQL, etc.
  • Self-driving Capabilities: Self-detection, Self-decision-making, Self-repairing, Self-tuning, Self-securing
  • Minimum/Zero Human Labor

SDDP Philosophy
SDDP = Data (Detection, the eyes) + Machine Learning (Decision Making, the brain) + Execution (Automation, the hands and feet), with feedback.
Learning from millions of DBs to empower DBs with availability, security and performance at scale.

SDDP Levels
  • Level 0: No Automation. Decision making: Human.
  • Level 1: Human Assistance. Decision making: Human. Key capabilities: Statistics Collecting, Monitoring, Alerting, Scripts/Tools.
  • Level 2: Single Point Automation. Decision making: Human + SDDP. Key capabilities: Anomaly Detection, SQL Advisor, Capacity Planning, Health Diagnostic Framework, etc.
  • Level 3: Partial Scenarios Automation. Decision making: SDDP + Human. Key capabilities: Automatic Repairing, Automatic SQL Tuning, Automatic Configuration Tuning, Auto-scaling, Automatic Resource Scheduling, Automatic Data Access Protection, etc.
  • Level 4: Full Automation. Decision making: SDDP. Key capabilities: Fully End-to-End Automation for All Scenarios, No Human Intervention.

Components of SDDP
Alibaba Self-Driving Database Platform (SDDP) provides cloud databases with automatic operation and maintenance.

Key features of SDDP
  • Knobs tuning (e.g. OtterTune)
  • Hot/cold separation
  • Anomaly detection
  • NL2SQL: Human Language, SQL, DBMS
  • ClouDBench: Workload Generator, Workload Replay
  • DB Design recommendation
  • Auto Index/sharding
  • Slow SQL Throttling
[Figure: example workload statements such as "Select ... where A = 1;", "Select ... where B = 2;", "Insert ...;" and "Update ...;", with cold data marked]

iBTune - Motivation
  • Memory usage in the Alibaba production environment: the buffer pool is the largest memory consumer.
  • Memory is the bottleneck among the resources.
[Figure: memory usage for Tmall, Dingding and Hema, with the buffer pool share labeled "Buffer"]
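The SDDP Levels slide earlier in this deck maps each automation level to who makes the decisions. A minimal sketch of that mapping as a lookup table follows; the names `SddpLevel`, `DECISION_MAKER` and `needs_human` are hypothetical and chosen only for illustration, they are not part of SDDP itself.

```python
from enum import IntEnum


class SddpLevel(IntEnum):
    """Automation levels as listed on the SDDP Levels slide."""
    NO_AUTOMATION = 0
    HUMAN_ASSISTANCE = 1
    SINGLE_POINT_AUTOMATION = 2
    PARTIAL_SCENARIOS_AUTOMATION = 3
    FULL_AUTOMATION = 4


# Decision maker per level, copied from the slide's "Decision Making" column.
DECISION_MAKER = {
    SddpLevel.NO_AUTOMATION: "Human",
    SddpLevel.HUMAN_ASSISTANCE: "Human",
    SddpLevel.SINGLE_POINT_AUTOMATION: "Human + SDDP",
    SddpLevel.PARTIAL_SCENARIOS_AUTOMATION: "SDDP + Human",
    SddpLevel.FULL_AUTOMATION: "SDDP",
}


def needs_human(level: SddpLevel) -> bool:
    """Return True if a human still takes part in decisions at this level."""
    return "Human" in DECISION_MAKER[level]


if __name__ == "__main__":
    for level in SddpLevel:
        print(level.value, level.name, DECISION_MAKER[level], needs_human(level))
```

Only Level 4 returns False from `needs_human`, which matches the slide's "No Human Intervention" description.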

SDDP: Self-Driving Database Platform
[Architecture diagram labels: DB system metrics; SQL collection; DBAdvisor; control system; cold/hot model, index model, memory model; model prediction; auto-tuning, slow SQL, space analysis; SQL & DB metrics; parameter update; resource scheduling; anomaly detection; update; DB]

Memory buffer tuning
iBTune: more than 10,000 instances, memory saving of 20 TB.
[Figure: buffer pool share ("Buffer") for Tmall, Taobao and Dingding]

An example: iBTune (individualized Buffer Tuning, VLDB 2019)

iBTune - Motivation
  • DBAs manually use a small number of BP sizes (10 configurations in our case).
  • Each instance's BP size might need to be different, because the query workload is different.
  • Manual tuning is not scalable for large cloud databases, since each instance has a different BP size.
  • iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases.
  • Goal: reduce memory (buffer pool) while guaranteeing the SLA (response time).
[Figure: CDF of individual BP sizes before and after iBTune is applied]

iBTune - Preliminary Attempt
Buffer pool (BP) size is sensitive to the miss ratio: the BP size is reduced from 188G to 80G when the hit ratio goes from 99.968% to 99.950%.
[Figure: response time, hit ratio, CPU usage]

Challenge: a heuristic method (such as shrinking by 10% each time) does not work, since we have to try many times, which makes the system unstable and is unacceptable for mission-critical applications.

Intuition:
  • Calculate the BP size based on the hit ratio (miss ratio) to avoid restarting the system multiple times.
  • Confirm whether the BP size meets the requirement of the SLA.
  • Tolerated miss ratio: t_miss_ratio
  • f1(t_miss_ratio) = new BP size (practical function)
  • f2(t_miss_ratio) = response time (pairwise DNN)

[...] 21G) during holidays and workdays:
  • The red line is the time when the BP size is adjusted.
  • The green lines show the holiday, which lasts 7 days.
  • Predicted RT: only 3 points exceeded, which is acceptable.
  • The IO read metric is the real IO, since all our DB instances have direct IO turned on.

Multiple instances
10 representative instances. The memory saving ranges from 50% to 10%, which strongly supports that a single number does not fit all. Instance 1 has a large increase in RT after the adjustment. We find that one query consumes 99.97% of the total response time, and the lookup value in the WHERE condition changes for this query.

Conclusion & Future Work
SDDP has been widely used at Alibaba. Its key algorithm "iBTune" has been deployed on 10,000 database instances, with a memory saving of 17%.
Future work:
  • Cache preload: the backup node needs to run SQLs to load data into the cache after BP adjustment; perform switching after preload.
  • Buffer increase: currently relies on DBAs; increase the buffer automatically.
  • Multiple parameters tuning: DBMS configuration file.

SDDP Big Picture Architecture, WIP
  • Decision Making: Automatic Repairing, Automatic Tuning, Security Protecting, Resource Scheduling, Backup/Restore, Automatic Scaling, E…
  • Knowledge Base, Domain Knowledge
  • Machine Learning Pipeline: Data Cleaning, Feature Engineering, Model Training, Evaluation, Deployment
  • Resource Management: Sigma, Alibaba Cloud, Kubernetes, Others
  • Database Operating: POLARDB, AliSQL, MySQL, MongoDB, PostgreSQL
  • Perception: Collecting, Processing, Storing, Message Queue, Blink
  • Automation: Action Planner, Action Scheduler, Action Executor
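The "iBTune - Preliminary Attempt" slide argues that shrinking the buffer pool by 10% at a time needs too many adjustments. The sketch below makes that arithmetic concrete using only the numbers quoted on the slide (188G to 80G, hit ratio 99.968% to 99.950%). The function names are hypothetical, and the loop models only the heuristic the slide rejects; it is not the iBTune algorithm.

```python
def miss_ratio_percent(hit_ratio_percent: float) -> float:
    """Convert a hit ratio in percent into the matching miss ratio in percent."""
    return 100.0 - hit_ratio_percent


def heuristic_adjustments(start_gb: float, target_gb: float, shrink: float = 0.10) -> int:
    """Count how many fixed-fraction shrink steps are needed to reach the target.

    Each step multiplies the current size by (1 - shrink), as in the
    "shrinking 10% each time" heuristic named on the slide.
    """
    if not 0.0 < shrink < 1.0:
        raise ValueError("shrink must be strictly between 0 and 1")
    size = start_gb
    steps = 0
    while size > target_gb:
        size *= 1.0 - shrink
        steps += 1
    return steps


if __name__ == "__main__":
    before = miss_ratio_percent(99.968)  # about 0.032 %
    after = miss_ratio_percent(99.950)   # about 0.050 %
    print(f"miss ratio: {before:.3f}% -> {after:.3f}%")
    print("10% shrink steps from 188G to 80G:", heuristic_adjustments(188, 80))
```

Under these assumptions the heuristic needs nine separate adjustments to get from 188G to 80G, which illustrates why the deck prefers computing the target size once from the tolerated miss ratio.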
