Data Mining: Practical Machine Learning Tools and Techniques
Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall

Ensemble learning
- Combining multiple models
  - The basic idea
- Bagging
  - Bias-variance decomposition, bagging with costs
- Randomization
  - Rotation forests
- Boosting
  - AdaBoost, the power of boosting
- Additive regression
  - Numeric prediction, additive logistic regression
- Interpretable ensembles
  - Option trees, alternating decision trees, logistic model trees
- Stacking
Combining multiple models
- Basic idea: build different "experts" and let them vote
- Advantage: often improves predictive performance
- Disadvantage: usually produces output that is very hard to analyze
  - but: there are approaches that aim to produce a single comprehensible structure
Bagging
- Combining predictions by voting/averaging
  - Simplest way: each model receives equal weight
- "Idealized" version:
  - Sample several training sets of size n (instead of just having one training set of size n)
  - Build a classifier for each training set
  - Combine the classifiers' predictions
- If the learning scheme is unstable, this almost always improves performance
  - A small change in the training data can make a big change in the model (e.g. decision trees)
Bias-variance decomposition
- Used to analyze how much the selection of any specific training set affects performance
- Assume infinitely many classifiers, built from different training sets of size n
- For any learning scheme:
  - Bias = expected error of the combined classifier on new data
  - Variance = expected error due to the particular training set used
- Total expected error: bias + variance
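This split can be approximated empirically by training many classifiers on separate training sets drawn from the same source and comparing the error of the combined (voted) classifier with the average error of the individual ones. The sketch below is a minimal illustration only, assuming NumPy and scikit-learn; the function name `estimate_bias_variance` and all parameter choices are made up for this example.

```python
# Minimal sketch: empirically approximating the bias/variance split of a classifier.
# Assumes NumPy and scikit-learn; all names and parameters are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def estimate_bias_variance(n_train=300, n_models=50, seed=0):
    rng = np.random.RandomState(seed)
    # One large pool standing in for the true distribution, plus a fixed test set
    X, y = make_classification(n_samples=20000, random_state=seed)
    X_test, y_test, X_pool, y_pool = X[:2000], y[:2000], X[2000:], y[2000:]
    preds = []
    for _ in range(n_models):
        # Each model gets its own training set of size n drawn from the pool
        idx = rng.choice(len(X_pool), size=n_train, replace=False)
        model = DecisionTreeClassifier().fit(X_pool[idx], y_pool[idx])
        preds.append(model.predict(X_test))
    preds = np.array(preds)                             # shape: (n_models, n_test)
    combined = (preds.mean(axis=0) > 0.5).astype(int)   # majority vote of all models
    bias = np.mean(combined != y_test)                  # error of the combined classifier
    variance = np.mean(preds != y_test) - bias          # extra error due to the particular training set
    return bias, variance

print(estimate_bias_variance())
```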
More on bagging
- Bagging works because it reduces variance by voting/averaging
  - Note: in some pathological hypothetical situations the overall error might increase
  - Usually, the more classifiers the better
- Problem: we only have one dataset!
- Solution: generate new ones of size n by sampling from it with replacement
- Can help a lot if the data is noisy
- Can also be applied to numeric prediction
  - Aside: the bias-variance decomposition was originally only known for numeric prediction
Bagging classifiers

Model generation:
- Let n be the number of instances in the training data
- For each of t iterations:
  - Sample n instances from the training set (with replacement)
  - Apply the learning algorithm to the sample
  - Store the resulting model

Classification:
- For each of the t models:
  - Predict the class of the instance using the model
- Return the class that is predicted most often
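As a rough illustration of the pseudocode above, here is a minimal bagging sketch assuming NumPy arrays with integer class labels and a scikit-learn-style base learner; the class name `SimpleBagger` is illustrative, not an existing library class.

```python
# Minimal sketch of the bagging pseudocode above (not a production implementation).
# Assumes NumPy arrays, integer class labels, and a scikit-learn-style base learner.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class SimpleBagger:
    def __init__(self, t=25, base=DecisionTreeClassifier, seed=0):
        self.t, self.base, self.rng = t, base, np.random.RandomState(seed)
        self.models = []

    def fit(self, X, y):
        n = len(X)
        for _ in range(self.t):
            # Sample n instances with replacement and train one model per sample
            idx = self.rng.randint(0, n, size=n)
            self.models.append(self.base().fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        # Each model votes; return the class predicted most often per instance
        votes = np.array([m.predict(X) for m in self.models])
        return np.array([np.bincount(col).argmax() for col in votes.T])
```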
Bagging with costs
- Bagging unpruned decision trees is known to produce good probability estimates
  - Here, instead of voting, the individual classifiers' probability estimates are averaged
  - Note: this can also improve the success rate
- Can use this with the minimum-expected-cost approach for learning problems with costs
- Problem: not interpretable
  - MetaCost re-labels the training data using bagging with costs and then builds a single tree
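A sketch of this idea, assuming each bagged model exposes a scikit-learn-style `predict_proba` and that `cost[i][j]` holds the cost of predicting class j when the true class is i; the function name is illustrative.

```python
# Sketch: average bagged probability estimates, then pick the minimum-expected-cost class.
# cost[i][j] = cost of predicting class j when the true class is i (illustrative assumption).
import numpy as np

def predict_min_expected_cost(models, X, cost):
    # Average the class-probability estimates of all bagged models
    probs = np.mean([m.predict_proba(X) for m in models], axis=0)   # (n_samples, n_classes)
    # Expected cost of predicting class j for instance x: sum_i P(i|x) * cost[i][j]
    expected_cost = probs @ np.asarray(cost)
    return expected_cost.argmin(axis=1)
```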
Randomization
- Can randomize the learning algorithm instead of the input
- Some algorithms already have a random component: e.g. initial weights in a neural net
- Most algorithms can be randomized, e.g. greedy algorithms:
  - Pick from the N best options at random instead of always picking the best option
  - E.g.: attribute selection in decision trees
- More generally applicable than bagging: e.g. random subsets in a nearest-neighbor scheme
- Can be combined with bagging
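A randomized greedy split selection of this kind might look like the following sketch, where `score_attribute` stands in for whatever split criterion (e.g. information gain) the tree learner uses; all names are illustrative.

```python
# Sketch: randomized greedy choice for decision-tree attribute selection.
# score_attribute is a placeholder for the split criterion (e.g. information gain).
import random

def pick_attribute(attributes, score_attribute, n_best=3, rng=None):
    rng = rng or random.Random(0)
    # Rank candidate attributes by their split score, best first
    ranked = sorted(attributes, key=score_attribute, reverse=True)
    # Instead of always taking the best, choose at random among the N best
    return rng.choice(ranked[:n_best])
```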
Rotation forests
- Bagging creates ensembles of accurate classifiers with relatively low diversity
  - Bootstrap sampling creates training sets with a distribution that resembles the original data
- Randomness in the learning algorithm increases diversity but sacrifices the accuracy of individual ensemble members
- Rotation forests have the goal of creating accurate and diverse ensemble members

Rotation forests
- Combine random attribute sets, bagging and principal components to generate an ensemble of decision trees
- An iteration involves:
  - Randomly dividing the input attributes into k disjoint subsets
  - Applying PCA to each of the k subsets in turn
  - Learning a decision tree from the k sets of PCA directions
- Further increases in diversity can be achieved by creating a bootstrap sample in each iteration before applying PCA
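A compressed sketch of one such iteration is shown below, assuming NumPy and scikit-learn's `PCA` and `DecisionTreeClassifier`; the function name and the exact way the subsets are handled are illustrative rather than a faithful reproduction of the published rotation forest algorithm.

```python
# Sketch of a single rotation-forest iteration: split the attributes into k subsets,
# run PCA on each subset, and train a tree on the concatenated PCA projections.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def rotation_forest_iteration(X, y, k=3, seed=0):
    rng = np.random.RandomState(seed)
    attrs = rng.permutation(X.shape[1])
    subsets = np.array_split(attrs, k)            # k disjoint attribute subsets
    pcas, projected = [], []
    for subset in subsets:
        pca = PCA().fit(X[:, subset])             # PCA directions for this subset's columns
        pcas.append((subset, pca))
        projected.append(pca.transform(X[:, subset]))
    X_rot = np.hstack(projected)                  # the "rotated" training data
    tree = DecisionTreeClassifier(random_state=seed).fit(X_rot, y)
    return tree, pcas                             # keep the rotations to transform test data later
```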
Boosting
- Also uses voting/averaging
- Weights models according to their performance
- Iterative: new models are influenced by the performance of previously built ones
  - Encourage the new model to become an "expert" for instances misclassified by earlier models
  - Intuitive justification: models should be experts that complement each other
- Several variants exist
AdaBoost.M1

Model generation:
- Assign equal weight to each training instance
- For t iterations:
  - Apply the learning algorithm to the weighted dataset and store the resulting model
  - Compute the model's error e on the weighted dataset
  - If e = 0 or e ≥ 0.5: terminate model generation
  - For each instance in the dataset:
    - If classified correctly by the model: multiply the instance's weight by e/(1 - e)
  - Normalize the weights of all instances

Classification:
- Assign weight = 0 to all classes
- For each of the t (or fewer) models:
  - For the class this model predicts, add -log(e/(1 - e)) to this class's weight
- Return the class with the highest weight
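A minimal sketch of this pseudocode, assuming a base learner whose `fit` accepts a scikit-learn-style `sample_weight` argument (a decision stump is used here); function and variable names are illustrative.

```python
# Minimal sketch of AdaBoost.M1 as described above.
# Assumes NumPy arrays and a base learner whose fit() accepts sample_weight.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1(X, y, t=20):
    n = len(X)
    w = np.full(n, 1.0 / n)                        # equal weight per training instance
    models, alphas = [], []
    for _ in range(t):
        m = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        wrong = m.predict(X) != y
        e = np.sum(w[wrong])                       # weighted error on the training data
        if e == 0 or e >= 0.5:
            break                                  # terminate model generation
        w[~wrong] *= e / (1 - e)                   # down-weight correctly classified instances
        w /= w.sum()                               # normalize the weights
        models.append(m)
        alphas.append(-np.log(e / (1 - e)))        # model weight used at classification time
    return models, alphas

def adaboost_predict(models, alphas, x):
    votes = {}
    for m, a in zip(models, alphas):
        c = m.predict([x])[0]
        votes[c] = votes.get(c, 0.0) + a           # add the model's weight to its predicted class
    return max(votes, key=votes.get)
```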
More on boosting I
- Boosting needs weights, but:
  - Can adapt the learning algorithm, or
  - Can apply boosting without weights: resample with probability determined by the weights
    - disadvantage: not all instances are used
    - advantage: if the error exceeds 0.5, can resample again
- Stems from computational learning theory
- Theoretical result: training error decreases exponentially
- Also works if the base classifiers are not too complex and their error doesn't become too large too quickly
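The weight-free variant can be sketched as a resampling step in which each instance's chance of being drawn is proportional to its current boosting weight (NumPy assumed; the helper name is illustrative):

```python
# Sketch: boosting without instance weights, by resampling the training data
# with probability proportional to the current weights.
import numpy as np

def weighted_resample(X, y, weights, rng=np.random):
    p = np.asarray(weights, dtype=float)
    p /= p.sum()                                   # turn weights into selection probabilities
    idx = rng.choice(len(X), size=len(X), replace=True, p=p)
    return X[idx], y[idx]                          # train the next base model on this sample
```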
More on boosting II
- Continue boosting after the training error reaches 0?
- Puzzling fact: the generalization error continues to decrease!
  - Seems to contradict Occam's Razor
- Explanation: consider the margin (confidence), not the error
  - Difference between the estimated probability for the true class and the nearest other class (between -1 and 1)
- Boosting works with weak learners: the only condition is that their error doesn't exceed 0.5
- In practice, boosting sometimes overfits (in contrast to bagging)
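Written out explicitly (my notation, not the slides'), the margin referred to here is:

```latex
\operatorname{margin}(x) \;=\; \hat{P}(c_{\mathrm{true}} \mid x) \;-\; \max_{c \neq c_{\mathrm{true}}} \hat{P}(c \mid x),
\qquad \operatorname{margin}(x) \in [-1, 1]
```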
Additive regression I
- It turns out that boosting is a greedy algorithm for fitting additive models
  - More specifically, it implements forward stagewise additive modeling
- Same kind of algorithm for numeric prediction:
  1. Build a standard regression model (e.g. a tree)
  2. Gather the residuals, learn a model predicting the residuals (e.g. a tree), and repeat
- To predict, simply sum up the individual predictions from all models

Additive regression II
- Minimizes the squared error of the ensemble if the base learner minimizes squared error
- Doesn't make sense to use it with standard multiple linear regression, why?
  - Can use it with simple linear regression to build a multiple linear regression model
- Use cross-validation to decide when to stop
- Another trick: shrink the predictions of the base models by multiplying with a positive constant < 1

Additive logistic regression
- ... predict the 1st class if the probability estimate is greater than 0.5, otherwise predict the 2nd class
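A compact sketch of this numeric-prediction procedure, including the shrinkage trick from the second slide, assuming scikit-learn regression trees; the names and the uniform shrinkage factor are illustrative choices.

```python
# Sketch: forward stagewise additive regression with regression trees.
# Each new model is fitted to the residuals of the ensemble built so far.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def additive_regression(X, y, n_models=10, shrinkage=0.5):
    models, residual = [], y.astype(float).copy()
    for _ in range(n_models):
        m = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        models.append(m)
        residual -= shrinkage * m.predict(X)       # what is left for the next model to explain
    return models

def additive_predict(models, X, shrinkage=0.5):
    # The prediction is simply the (shrunken) sum of all individual models' predictions
    return shrinkage * np.sum([m.predict(X) for m in models], axis=0)
```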
Option trees
- Ensembles are not interpretable
- Can we generate a single model?
  - One possibility: "cloning" the ensemble by using lots of artificial data that is labeled by the ensemble
  - Another possibility: generating a single structure that represents the ensemble in compact fashion
- Option tree: a decision tree with option nodes
  - Idea: follow all possible branches at an option node
  - Predictions from different branches are merged using voting or by averaging probability estimates

Example
- Can be learned by modifying the tree learner:
  - Create an option node if there are several equally promising splits (within a user-specified interval)
  - When pruning, the error at an option node is the average error of its options
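To make the merging at option nodes concrete, the toy structure below holds several alternative subtrees at an option node and averages their probability estimates at prediction time; these classes are purely illustrative, not Weka's implementation.

```python
# Toy sketch of prediction in an option tree: at an option node, follow every
# branch and average the probability estimates; elsewhere behave like a normal tree.
import numpy as np

class Leaf:
    def __init__(self, class_probs):
        self.class_probs = np.asarray(class_probs, dtype=float)
    def predict_proba(self, x):
        return self.class_probs

class SplitNode:
    def __init__(self, attr, threshold, left, right):
        self.attr, self.threshold, self.left, self.right = attr, threshold, left, right
    def predict_proba(self, x):
        branch = self.left if x[self.attr] <= self.threshold else self.right
        return branch.predict_proba(x)

class OptionNode:
    def __init__(self, options):
        self.options = options                      # several alternative subtrees
    def predict_proba(self, x):
        # Follow all options and average their probability estimates
        return np.mean([o.predict_proba(x) for o in self.options], axis=0)
```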
Alternating decision trees
- Can also grow an option tree by incrementally adding nodes to it
- The resulting structure is called an alternating decision tree, with splitter nodes and prediction nodes
  - Prediction nodes are leaves if no splitter nodes have been added to them yet
  - The standard alternating tree applies to 2-class problems
- To obtain a prediction, filter the instance down all applicable branches and sum the predictions
  - Predict one class or the other depending on whether the sum is positive or negative

Example (figure-only slide)
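The prediction rule just described (sum the prediction values along every applicable branch, then look at the sign) can be sketched as follows; the node classes and the two-class labels are illustrative.

```python
# Toy sketch of prediction in a two-class alternating decision tree:
# sum the prediction values along all applicable branches; the sign decides the class.
class PredictionNode:
    def __init__(self, value, splitters=None):
        self.value = value                       # contribution added when this node is reached
        self.splitters = splitters or []         # splitter nodes attached below this prediction node

class SplitterNode:
    def __init__(self, test, if_true, if_false):
        self.test = test                         # e.g. lambda x: x["age"] > 30
        self.if_true, self.if_false = if_true, if_false   # child prediction nodes

def adtree_score(node, x):
    total = node.value
    for s in node.splitters:
        child = s.if_true if s.test(x) else s.if_false
        total += adtree_score(child, x)          # recurse down every applicable branch
    return total

def adtree_predict(root, x):
    return "class 1" if adtree_score(root, x) > 0 else "class 2"
```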
Growing alternating trees
- The tree is grown using a boosting algorithm, e.g. LogitBoost (described earlier)
- Assume that the base learner produces a single conjunctive rule in each boosting iteration (note: a rule for regression)
- Each rule could simply be added into the tree, including the numeric prediction obtained from the rule
  - Problem: the tree would grow very large very quickly
  - Solution: the base learner should only consider candidate rules that extend existing branches
    - An extension adds a splitter node and two prediction nodes (assuming binary splits)
- The standard algorithm chooses the best extension among all possible extensions applicable to the tree
  - More efficient heuristics can be employed instead
Logistic model trees
- Option trees may still be difficult to interpret
- Can also use boosting to build decision trees with linear models at the leaves (i.e. trees without options)
- Algorithm for building logistic model trees:
  - Run LogitBoost with simple linear regression as the base learner ...