Data Mining: Concepts and Techniques
— Chapter 6 —
Jiawei Han, Department of Computer Science, University of Illinois at Urbana-Champaign (…/~hanj)
©2006 Jiawei Han and Micheline Kamber. All rights reserved.

Chapter 6. Classification and Prediction
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Rule-based classification
- Classification by backpropagation
- Support Vector Machines (SVM)
- Associative classification
- Lazy learners (or learning from your neighbors)
- Other classification methods
- Prediction
- Accuracy and error measures
- Ensemble methods
- Model selection
- Summary
Classification vs. Prediction
- Classification
  - predicts categorical class labels (discrete or nominal)
  - classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data
- Prediction
  - models continuous-valued functions, i.e., predicts unknown or missing values
- Typical applications: credit approval, target marketing, medical diagnosis, fraud detection
Classification — A Two-Step Process
- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction is the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
- Model usage: classifying future or unknown objects
  - Estimate the accuracy of the model: the known label of each test sample is compared with the model's prediction; the accuracy rate is the percentage of test-set samples correctly classified by the model
  - The test set must be independent of the training set, otherwise over-fitting will occur
  - If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known

Process (1): Model Construction
- (figure: training data → classification algorithm → classifier)
- Example of a learned model: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
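The two-step process can be made concrete in a few lines of Python. This is a minimal sketch: the rule is the one from the slide, but the test tuples are hypothetical stand-ins for the tables that appear as figures in the deck.

```python
# Step 1 (model construction) produced this classifier: a single rule
# learned from the training set.
def classify(rank, years):
    return 'yes' if rank == 'professor' or years > 6 else 'no'

# Step 2a (model usage): estimate accuracy on a labeled test set that is
# independent of the training set. These tuples are made up for illustration.
test_set = [
    ('Tom', 'assistant prof', 2, 'no'),
    ('Merlisa', 'associate prof', 7, 'no'),   # years > 6: the rule gets this wrong
    ('George', 'professor', 5, 'yes'),
    ('Joseph', 'assistant prof', 7, 'yes'),
]
correct = sum(classify(r, y) == label for _, r, y, label in test_set)
print(f'accuracy = {correct}/{len(test_set)}')    # 3/4

# Step 2b: if the accuracy is acceptable, classify unseen data,
# e.g. the slide's (Jeff, Professor, 4).
print(classify('professor', 4))                   # -> 'yes'
```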
Process (2): Using the Model in Prediction
- The classifier is applied first to testing data (to estimate accuracy) and then to unseen data
- E.g., for the unseen tuple (Jeff, Professor, 4): Tenured? → yes

Supervised vs. Unsupervised Learning
- Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  - New data is classified based on the training set
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
(Chapter 6 outline repeated)

Issues: Data Preparation
- Data cleaning: preprocess data in order to reduce noise and handle missing values
- Relevance analysis (feature selection): remove irrelevant or redundant attributes
- Data transformation: generalize and/or normalize data

Issues: Evaluating Classification Methods
- Accuracy
  - classifier accuracy: predicting class labels
  - predictor accuracy: guessing the value of predicted attributes
- Speed: time to construct the model (training time) and time to use the model (classification/prediction time)
- Robustness: handling noise and missing values
- Scalability: efficiency on disk-resident databases
- Interpretability: understanding and insight provided by the model
- Other measures, e.g., goodness of rules, such as decision-tree size or compactness of classification rules

(Chapter 6 outline repeated)

Decision Tree Induction: Training Dataset
- This follows an example of Quinlan's ID3 (Playing Tennis)
- (figure: the 14-tuple buys_computer training table)

Output: A Decision Tree for "buys_computer"
- Root test: age?
  - age <= 30 → student? (no → no; yes → yes)
  - age 31..40 → yes
  - age > 40 → credit_rating? (excellent → no; fair → yes)

Algorithm for Decision Tree Induction
- Basic algorithm (a greedy algorithm)
  - The tree is constructed in a top-down, recursive, divide-and-conquer manner
  - At the start, all the training examples are at the root
  - Attributes are categorical (continuous-valued attributes are discretized in advance)
  - Samples are partitioned recursively based on selected attributes
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Conditions for stopping partitioning
  - All samples for a given node belong to the same class
  - There are no remaining attributes for further partitioning — majority voting is employed for classifying the leaf
  - There are no samples left

Attribute Selection Measure: Information Gain (ID3/C4.5)
- Select the attribute with the highest information gain
- Let pi be the probability that an arbitrary tuple in D belongs to class Ci, estimated by |Ci,D| / |D|
- Expected information (entropy) needed to classify a tuple in D:
  Info(D) = −Σi pi log2(pi)
- Information needed (after using A to split D into v partitions) to classify D:
  InfoA(D) = Σj=1..v (|Dj| / |D|) · Info(Dj)
- Information gained by branching on attribute A:
  Gain(A) = Info(D) − InfoA(D)

Attribute Selection: Information Gain
- Class P: buys_computer = "yes" (9 tuples); class N: buys_computer = "no" (5 tuples)
  Info(D) = I(9,5) = −(9/14)log2(9/14) − (5/14)log2(5/14) = 0.940
- Infoage(D) = (5/14)·I(2,3) + (4/14)·I(4,0) + (5/14)·I(3,2) = 0.694, where the (5/14)·I(2,3) term means "age <= 30" has 5 out of 14 samples, with 2 yes's and 3 no's; hence
  Gain(age) = Info(D) − Infoage(D) = 0.246
- Similarly, Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit_rating) = 0.048 (the sketch below reproduces this arithmetic)
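As a check on the arithmetic above, here is a minimal pure-Python sketch of the entropy and information-gain computation for the age attribute; the counts come from the slide, while the function names are my own:

```python
import math

def info(*counts):
    """Expected information (entropy): Info(D) = -sum(p_i * log2(p_i))."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

# buys_computer: 9 "yes" and 5 "no" tuples overall
info_D = info(9, 5)                      # ~0.940

# age partitions D into <=30 (2 yes, 3 no), 31..40 (4 yes, 0 no), >40 (3 yes, 2 no)
partitions = [(2, 3), (4, 0), (3, 2)]
info_age = sum((sum(p) / 14) * info(*p) for p in partitions)   # ~0.694

gain_age = info_D - info_age  # ~0.247 (the slide's 0.246 comes from rounding Info first)
print(round(info_D, 3), round(info_age, 3), round(gain_age, 3))
```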
Computing Information Gain for Continuous-Valued Attributes
- Let attribute A be a continuous-valued attribute
- We must determine the best split point for A
  - Sort the values of A in increasing order
  - Typically, the midpoint between each pair of adjacent values is considered as a possible split point: (ai + ai+1)/2 is the midpoint between the values of ai and ai+1
  - The point with the minimum expected information requirement for A is selected as the split point for A
- Split: D1 is the set of tuples in D satisfying A <= split-point, and D2 is the set of tuples in D satisfying A > split-point

Gain Ratio for Attribute Selection (C4.5)
- The information gain measure is biased towards attributes with a large number of values
- C4.5 (a successor of ID3) uses gain ratio to overcome the problem (a normalization of information gain):
  SplitInfoA(D) = −Σj=1..v (|Dj| / |D|) · log2(|Dj| / |D|)
  GainRatio(A) = Gain(A) / SplitInfoA(D)
- Ex.: income splits the 14 tuples into partitions of sizes 4, 6, and 4, so
  SplitInfoincome(D) = −(4/14)log2(4/14) − (6/14)log2(6/14) − (4/14)log2(4/14) = 1.557
  and GainRatio(income) = 0.029 / 1.557 ≈ 0.019
- The attribute with the maximum gain ratio is selected as the splitting attribute
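A short sketch of the gain-ratio arithmetic for income under the same 14-tuple example; the partition sizes (4 high, 6 medium, 4 low) come from the slides' training set, and the Gain(income) value is taken from the information-gain slide:

```python
import math

# SplitInfo_income(D): income splits the 14 tuples into partitions of
# sizes 4 (high), 6 (medium), 4 (low).
sizes, total = [4, 6, 4], 14
split_info = -sum(s / total * math.log2(s / total) for s in sizes)  # ~1.557

gain_income = 0.029                        # Gain(income) from the earlier slide
print(round(gain_income / split_info, 3))  # GainRatio(income) ~ 0.019
```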
Gini Index (CART, IBM IntelligentMiner)
- If a data set D contains examples from n classes, the gini index gini(D) is defined as
  gini(D) = 1 − Σj pj²
  where pj is the relative frequency of class j in D
- If a data set D is split on A into two subsets D1 and D2, the gini index of the split is defined as
  giniA(D) = (|D1|/|D|)·gini(D1) + (|D2|/|D|)·gini(D2)
- Reduction in impurity: Δgini(A) = gini(D) − giniA(D)
- The attribute providing the smallest ginisplit(D) (or, equivalently, the largest reduction in impurity) is chosen to split the node (we need to enumerate all the possible splitting points for each attribute)

Gini Index: Example
- Ex.: D has 9 tuples with buys_computer = "yes" and 5 with "no":
  gini(D) = 1 − (9/14)² − (5/14)² = 0.459
- Suppose the attribute income partitions D into 10 tuples in D1: {low, medium} and 4 tuples in D2: {high}:
  giniincome∈{low,medium}(D) = (10/14)·gini(D1) + (4/14)·gini(D2) = 0.443
  The other two binary splits give 0.458 ({low, high} vs. {medium}) and 0.450 ({medium, high} vs. {low}), so the split on {low, medium} is the best since it has the lowest gini index
- All attributes are assumed continuous-valued
- May need other tools, e.g., clustering, to get the possible split values
- Can be modified for categorical attributes
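A small pure-Python sketch of the gini computation above; the per-class counts within each income partition follow from the AVC tables shown later in the deck (high: 2 yes / 2 no, medium: 4 / 2, low: 3 / 1):

```python
def gini(*counts):
    """gini(D) = 1 - sum(p_j^2) over class frequencies."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Overall: 9 yes / 5 no
print(round(gini(9, 5), 3))          # 0.459

# Binary split on income into D1 = {low, medium} (7 yes, 3 no)
# and D2 = {high} (2 yes, 2 no)
n = 14
g_split = (10 / n) * gini(7, 3) + (4 / n) * gini(2, 2)
print(round(g_split, 3))             # 0.443, the lowest of the three candidate splits
```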
Comparing Attribute Selection Measures
- The three measures, in general, return good results, but:
  - Information gain: biased towards multivalued attributes
  - Gain ratio: tends to prefer unbalanced splits in which one partition is much smaller than the others
  - Gini index: biased towards multivalued attributes; has difficulty when the number of classes is large; tends to favor tests that result in equal-sized partitions and purity in both partitions

Other Attribute Selection Measures
- CHAID: a popular decision tree algorithm; measure based on the χ² test for independence
- C-SEP: performs better than information gain and gini index in certain cases
- G-statistic: has a close approximation to the χ² distribution
- MDL (Minimal Description Length) principle (i.e., the simplest solution is preferred): the best tree is the one that requires the fewest bits to both (1) encode the tree and (2) encode the exceptions to the tree
- Multivariate splits (partition based on multiple variable combinations), e.g., CART finds multivariate splits based on a linear combination of attributes
- Which attribute selection measure is the best? Most give good results; none is significantly superior to the others

Overfitting and Tree Pruning
- Overfitting: an induced tree may overfit the training data
  - Too many branches, some of which may reflect anomalies due to noise or outliers
  - Poor accuracy for unseen samples
- Two approaches to avoid overfitting
  - Prepruning: halt tree construction early — do not split a node if this would result in the goodness measure falling below a threshold (it is difficult to choose an appropriate threshold)
  - Postpruning: remove branches from a "fully grown" tree to get a sequence of progressively pruned trees; use a set of data different from the training data to decide which is the "best pruned tree"

Enhancements to Basic Decision Tree Induction
- Allow for continuous-valued attributes: dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals
- Handle missing attribute values: assign the most common value of the attribute, or assign a probability to each of the possible values
- Attribute construction: create new attributes based on existing ones that are sparsely represented; this reduces fragmentation, repetition, and replication

Classification in Large Databases
- Classification is a classical problem extensively studied by statisticians and machine learning researchers
- Scalability: classifying data sets with millions of examples and hundreds of attributes with reasonable speed
- Why decision tree induction in data mining?
  - relatively fast learning speed (compared with other classification methods)
  - convertible to simple and easy-to-understand classification rules
  - can use SQL queries for accessing databases
  - comparable classification accuracy with other methods

Scalable Decision Tree Induction Methods
- SLIQ (EDBT'96 — Mehta et al.): builds an index for each attribute; only the class list and the current attribute list reside in memory
- SPRINT (VLDB'96 — J. Shafer et al.): constructs an attribute-list data structure
- PUBLIC (VLDB'98 — Rastogi & Shim): integrates tree splitting and tree pruning: stop growing the tree earlier
- RainForest (VLDB'98 — Gehrke, Ramakrishnan & Ganti): builds an AVC-list (attribute, value, class label)
- BOAT (PODS'99 — Gehrke, Ganti, Ramakrishnan & Loh): uses bootstrapping to create several small samples

Scalability Framework for RainForest
- Separates the scalability aspects from the criteria that determine the quality of the tree
- Builds an AVC-list: AVC (Attribute, Value, Class_label)
- AVC-set (of an attribute X): projection of the training data set onto attribute X and the class label, where the counts of the individual class labels are aggregated
- AVC-group (of a node n): the set of AVC-sets of all predictor attributes at node n
RainForest: Training Set and Its AVC Sets
- Training examples: the 14-tuple buys_computer training table shown earlier
- AVC-set on age: <=30 → (yes: 2, no: 3); 31..40 → (yes: 4, no: 0); >40 → (yes: 3, no: 2)
- AVC-set on income: high → (yes: 2, no: 2); medium → (yes: 4, no: 2); low → (yes: 3, no: 1)
- AVC-set on student: yes → (yes: 6, no: 1); no → (yes: 3, no: 4)
- AVC-set on credit_rating: fair → (yes: 6, no: 2); excellent → (yes: 3, no: 3)
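A minimal sketch of building an AVC-set by aggregation. The (age, class) pairs below are reconstructed from the aggregated counts in the table above; the Counter-based layout is my illustration, not RainForest's actual on-disk data structure:

```python
from collections import Counter, defaultdict

# (age, class) pairs for the 14-tuple buys_computer training set,
# reconstructed from the aggregated counts above.
tuples = [('<=30', 'yes')] * 2 + [('<=30', 'no')] * 3 + \
         [('31..40', 'yes')] * 4 + \
         [('>40', 'yes')] * 3 + [('>40', 'no')] * 2

# AVC-set on age: project onto (attribute value, class label) and count.
avc = defaultdict(Counter)
for value, label in tuples:
    avc[value][label] += 1

for value, counts in avc.items():
    print(value, dict(counts))
# <=30 {'yes': 2, 'no': 3}  31..40 {'yes': 4}  >40 {'yes': 3, 'no': 2}
```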
Data Cube-Based Decision-Tree Induction
- Integration of generalization with decision-tree induction (Kamber et al. '97)
- Classification at primitive concept levels (e.g., precise temperature, humidity, outlook) gives low-level concepts, scattered classes, bushy classification trees, and semantic interpretation problems
- Cube-based multi-level classification: relevance analysis at multiple levels; information-gain analysis with dimension + level

BOAT (Bootstrapped Optimistic Algorithm for Tree Construction)
- Uses a statistical technique called bootstrapping to create several smaller samples (subsets), each of which fits in memory
- Each subset is used to create a tree, resulting in several trees
- These trees are examined and used to construct a new tree T'; it turns out that T' is very close to the tree that would be generated using the whole data set together
- Adv.: requires only two scans of the DB; an incremental algorithm

Presentation of Classification Results
- (figure slides: classification-result visualizations)

Interactive Visual Mining by Perception-Based Classification (PBC)
- (figure slide)

(Chapter 6 outline repeated)

Bayesian Classification: Why?
- A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities
- Foundation: based on Bayes' theorem
- Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable with decision tree and selected neural network classifiers
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
- Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured

Bayesian Theorem: Basics
- Let X be a data sample ("evidence"): its class label is unknown
- Let H be the hypothesis that X belongs to class C
- Classification is to determine P(H|X), the probability that the hypothesis holds given the observed data sample X
- P(H) (prior probability): the initial probability, e.g., that X will buy a computer, regardless of age, income, …
- P(X): the probability that the sample data is observed
- P(X|H) (likelihood): the probability of observing the sample X given that the hypothesis holds, e.g., given that X will buy a computer, the probability that X is 31..40 with medium income

Bayesian Theorem
- Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:
  P(H|X) = P(X|H) · P(H) / P(X)
- Predicts that X belongs to Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) for all k classes
- Practical difficulty: requires initial knowledge of many probabilities, at significant computational cost
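As a quick numeric illustration of the theorem (all three input probabilities below are made up):

```python
# P(H|X) = P(X|H) * P(H) / P(X), with hypothetical numbers:
p_h = 0.6          # prior: P(buys_computer = yes)
p_x_given_h = 0.2  # likelihood: P(X | buys_computer = yes)
p_x = 0.15         # evidence: P(X)

p_h_given_x = p_x_given_h * p_h / p_x
print(round(p_h_given_x, 3))   # 0.8
```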
Towards a Naïve Bayesian Classifier
- Let D be a training set of tuples and their associated class labels, with each tuple represented by an n-dimensional attribute vector X = (x1, x2, …, xn)
- Suppose there are m classes C1, C2, …, Cm
- Classification is to derive the maximum posteriori, i.e., the maximal P(Ci|X)
- This can be derived from Bayes' theorem:
  P(Ci|X) = P(X|Ci) · P(Ci) / P(X)
- Since P(X) is constant for all classes, only P(X|Ci) · P(Ci) needs to be maximized

Derivation of the Naïve Bayes Classifier
- A simplified assumption: attributes are conditionally independent (i.e., there is no dependence relation between attributes):
  P(X|Ci) = Πk=1..n P(xk|Ci)
- This greatly reduces the computation cost: only the class distribution is counted
- If Ak is categorical, P(xk|Ci) is the number of tuples in Ci having value xk for Ak, divided by |Ci,D| (the number of tuples of Ci in D)
- If Ak is continuous-valued, P(xk|Ci) is usually computed based on a Gaussian distribution with mean μ and standard deviation σ (sketched below):
  g(x, μ, σ) = (1 / (√(2π)·σ)) · exp(−(x − μ)² / (2σ²)),  and P(xk|Ci) = g(xk, μCi, σCi)

Naïve Bayesian Classifier: Training Dataset
- Classes: C1: buys_computer = 'yes'; C2: buys_computer = 'no'
- Data sample to classify: X = (age <= 30, income = medium, student = yes, credit_rating = fair)
- (The training tuples are the 14-tuple buys_computer data set shown earlier.)
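For the continuous-valued case, a one-function sketch of the Gaussian class-conditional density just defined; the mean, standard deviation, and query value are hypothetical:

```python
import math

def gaussian(x, mu, sigma):
    """g(x, mu, sigma) = 1/(sqrt(2*pi)*sigma) * exp(-(x-mu)^2 / (2*sigma^2))"""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# e.g. P(age = 35 | Ci) for a class whose ages have mean 38 and std 12 (made-up numbers)
print(gaussian(35, 38, 12))    # ~0.0322
```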
Naïve Bayesian Classifier: An Example
- P(Ci): P(buys_computer = yes) = 9/14 = 0.643; P(buys_computer = no) = 5/14 = 0.357
- Compute P(X|Ci) for each class, for X = (age <= 30, income = medium, student = yes, credit_rating = fair):
  - P(age <= 30 | yes) = 2/9 = 0.222;  P(age <= 30 | no) = 3/5 = 0.600
  - P(income = medium | yes) = 4/9 = 0.444;  P(income = medium | no) = 2/5 = 0.400
  - P(student = yes | yes) = 6/9 = 0.667;  P(student = yes | no) = 1/5 = 0.200
  - P(credit_rating = fair | yes) = 6/9 = 0.667;  P(credit_rating = fair | no) = 2/5 = 0.400
- P(X|yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044;  P(X|no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
- P(X|Ci) × P(Ci): P(X|yes)·P(yes) = 0.028;  P(X|no)·P(no) = 0.007
- Therefore, X belongs to class buys_computer = "yes"
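A compact sketch that reproduces the computation above from aggregated counts; the counts come from the AVC tables shown earlier, while the dict layout and loop are my own:

```python
# Class counts and per-attribute-value counts from the AVC tables above.
class_counts = {'yes': 9, 'no': 5}
counts = {
    'age<=30':     {'yes': 2, 'no': 3},
    'income=med':  {'yes': 4, 'no': 2},
    'student=yes': {'yes': 6, 'no': 1},
    'credit=fair': {'yes': 6, 'no': 2},
}
X = ['age<=30', 'income=med', 'student=yes', 'credit=fair']

total = sum(class_counts.values())
scores = {}
for c, n_c in class_counts.items():
    p = n_c / total                        # prior P(Ci)
    for feat in X:                         # naive independence: multiply P(xk|Ci)
        p *= counts[feat][c] / n_c
    scores[c] = p

print({c: round(p, 3) for c, p in scores.items()})   # {'yes': 0.028, 'no': 0.007}
print(max(scores, key=scores.get))                    # 'yes'
```

To add the Laplacian correction discussed on the next slide, one would simply use (count + 1) / (n_c + V) in the inner loop, where V is the number of distinct values of the attribute.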
Avoiding the 0-Probability Problem
- Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise the predicted probability will be zero, since
  P(X|Ci) = Πk=1..n P(xk|Ci)
- Ex.: suppose a data set with 1000 tuples has income = low (0), income = medium (990), and income = high (10)
- Use the Laplacian correction (or Laplacian estimator): add 1 to each case:
  Prob(income = low) = 1/1003;  Prob(income = medium) = 991/1003;  Prob(income = high) = 11/1003
- The "corrected" probability estimates are close to their "uncorrected" counterparts

Naïve Bayesian Classifier: Comments
- Advantages: easy to implement; good results obtained in most of the cases
- Disadvantages
  - Assumption of class-conditional independence, and therefore a loss of accuracy
  - Practically, dependencies exist among variables, e.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
  - Dependencies among these cannot be modeled by a naïve Bayesian classifier
- How to deal with these dependencies? Bayesian belief networks

Bayesian Belief Networks
- A Bayesian belief network allows a subset of the variables to be conditionally independent
- A graphical model of causal relationships: represents dependency among the variables and gives a specification of the joint probability distribution
- Nodes: random variables; links: dependency
- E.g., if X and Y are the parents of Z, and Y is the parent of P, there is no dependency between Z and P
- The graph has no loops or cycles

Bayesian Belief Network: An Example
- Nodes: FamilyHistory, Smoker, LungCancer, Cough, PositiveX-Ray, HardToBreath; FamilyHistory and Smoker are the parents of LungCancer
- The conditional probability table (CPT) for the variable LungCancer shows the conditional probability for each possible combination of its parents:
  P(LC | FH, S) = 0.8;  P(LC | FH, ~S) = 0.5;  P(LC | ~FH, S) = 0.7;  P(LC | ~FH, ~S) = 0.1
  (P(~LC) is the complement in each column: 0.2, 0.5, 0.3, 0.9)
- Derivation of the probability of a particular combination of values of X from the CPT:
  P(x1, …, xn) = Πi P(xi | Parents(xi))

Training Bayesian Networks
- Several scenarios:
  - Given both the network structure and all variables observable: learn only the CPTs
  - Network structure known, some hidden variables: gradient descent (greedy hill-climbing) method, similar to neural network learning
  - Network structure unknown, all variables observable: search through the model space to reconstruct the network topology
  - Unknown structure, all hidden variables: no good algorithms are known for this purpose
- Ref.: D. Heckerman, Bayesian networks for data mining

(Chapter 6 outline repeated)

Using IF-THEN Rules for Classification
- Represent the knowledge in the form of IF-THEN rules, e.g.:
  R: IF age = youth AND student = yes THEN buys_computer = yes
- Rule antecedent (precondition) vs. rule consequent
- Assessment of a rule: coverage and accuracy (see the sketch after this slide)
  - ncovers = number of tuples covered by R; ncorrect = number of tuples correctly classified by R
  - coverage(R) = ncovers / |D|   (D: the training data set)
  - accuracy(R) = ncorrect / ncovers
- If more than one rule is triggered, we need conflict resolution:
  - Size ordering: assign the highest priority to the triggering rule that has the "toughest" requirements (i.e., with the most attribute tests)
  - Class-based ordering: decreasing order of prevalence or misclassification cost per class
  - Rule-based ordering (decision list): rules are organized into one long priority list, according to some measure of rule quality or by experts
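A minimal sketch of the coverage and accuracy measures for rule R above; the tuples are an abbreviated, hypothetical slice of the buys_computer training set, keeping only the attributes R tests:

```python
# Each tuple: (age, student, buys_computer). "youth" corresponds to
# age <= 30 in the running example.
D = [('youth', 'no', 'no'), ('youth', 'no', 'no'), ('youth', 'no', 'no'),
     ('youth', 'yes', 'yes'), ('youth', 'yes', 'yes'),
     ('middle', 'no', 'yes'), ('senior', 'yes', 'yes')]   # abbreviated

def covers(t):            # R: IF age = youth AND student = yes ...
    return t[0] == 'youth' and t[1] == 'yes'

covered = [t for t in D if covers(t)]
correct = [t for t in covered if t[2] == 'yes']   # ... THEN buys_computer = yes

print(f'coverage = {len(covered)}/{len(D)}')        # 2/7
print(f'accuracy = {len(correct)}/{len(covered)}')  # 2/2
```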
Rule Extraction from a Decision Tree
- Rules are easier to understand than large trees
- One rule is created for each path from the root to a leaf; each attribute-value pair along a path forms a conjunction, and the leaf holds the class prediction
- Rules are mutually exclusive and exhaustive
- Example: rule extraction from our buys_computer decision tree (shown earlier):
  IF age = young AND student = no THEN buys_computer = no
  IF age = young AND student = yes THEN buys_computer = yes
  IF age = mid-age THEN buys_computer = yes
  IF age = old AND credit_rating = excellent THEN buys_computer = no
  IF age = old AND credit_rating = fair THEN buys_computer = yes

Rule Extraction from the Training Data
- Sequential covering algorithm: extracts rules directly from the training data
- Typical sequential covering algorithms: FOIL, AQ, CN2, RIPPER
- Rules are learned sequentially; each rule for a given class Ci will cover many tuples of Ci but none (or few) of the tuples of other classes
- Steps:
  - Rules are learned one at a time
  - Each time a rule is learned, the tuples covered by the rule are removed
  - The process repeats on the remaining tuples until a termination condition holds, e.g., there are no more training examples, or the quality of a rule returned is below a user-specified threshold
- Contrast with decision-tree induction, which learns a set of rules simultaneously

How to Learn One Rule?
- Start with the most general rule possible: condition = empty
- Add new attribute tests by adopting a greedy depth-first strategy: pick the test that most improves the rule quality
- Rule-quality measures consider both coverage and accuracy. FOIL-gain (in FOIL and RIPPER) assesses the information gained by extending the condition:
  FOIL_Gain = pos' · (log2(pos' / (pos' + neg')) − log2(pos / (pos + neg)))
  It favors rules that have high accuracy and cover many positive tuples
- Rule pruning is based on an independent set of test tuples:
  FOIL_Prune(R) = (pos − neg) / (pos + neg)
  where pos and neg are the numbers of positive and negative tuples covered by R; if FOIL_Prune is higher for the pruned version of R, prune R
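A small sketch of the FOIL-gain computation just defined; the coverage counts are made up for illustration:

```python
import math

def foil_gain(pos, neg, pos_new, neg_new):
    """Information gained by extending a rule's condition:
    FOIL_Gain = pos' * (log2(pos'/(pos'+neg')) - log2(pos/(pos+neg)))."""
    return pos_new * (math.log2(pos_new / (pos_new + neg_new))
                      - math.log2(pos / (pos + neg)))

# Hypothetical: rule R covers 40 pos / 20 neg tuples; adding one more
# attribute test leaves 30 pos / 5 neg covered.
print(round(foil_gain(40, 20, 30, 5), 3))   # ~10.877, a worthwhile extension
```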
(Chapter 6 outline repeated)

Classification: A Mathematical Mapping
- Classification predicts categorical class labels
- E.g., personal homepage classification: xi = (x1, x2, x3, …), yi = +1 or −1, where x1 is the number of occurrences of the word "homepage", x2 the number of occurrences of the word "welcome", …
- Mathematically: x ∈ X, y ∈ Y = {+1, −1}; we want a function f: X → Y

Linear Classification
- Binary classification problem
- (figure: points of class 'x' above a separating line, class 'o' below)
- The data above the red line belongs to class 'x'; the data below the red line belongs to class 'o'
- Examples: SVM, perceptron, probabilistic classifiers

Discriminative Classifiers
- Advantages
  - prediction accuracy is generally high (as compared to Bayesian methods, in general)
  - robust: works when training examples contain errors
  - fast evaluation of the learned target function (Bayesian networks are normally slow)
- Criticism
  - long training time
  - difficult to understand the learned function (weights); Bayesian networks can be used easily for pattern discovery
  - not easy to incorporate domain knowledge (easy in the form of priors on the data or distributions)

Perceptron & Winnow
- Vectors: x, w; scalars: x, y, w
- Input: {(x…
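The slide text is cut off in the source, so as a hedged illustration of what the perceptron does with labeled input pairs, here is a classic perceptron-training sketch; the toy data, learning rate, and bias handling are my choices, not the deck's:

```python
# Perceptron: learn a weight vector w so that sign(w . x) matches y in {+1, -1}.
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Toy linearly separable data: (x, y), with a constant bias feature x[0] = 1.
data = [((1, 2.0, 1.0), +1), ((1, 1.5, 2.5), +1),
        ((1, -1.0, -0.5), -1), ((1, -2.0, 1.0), -1)]

w, eta = [0.0, 0.0, 0.0], 1.0
for _ in range(10):                      # a few passes over the data
    for x, y in data:
        if y * dot(w, x) <= 0:           # misclassified -> update w toward y*x
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]

print(w, [1 if dot(w, x) > 0 else -1 for x, _ in data])
# e.g. [1.0, 2.0, 1.0] and predictions [1, 1, -1, -1]
```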