《Python数据分析与应用》 (Python Data Analysis and Applications) — Experiment 4: Building Models with scikit-learn

Experiment 4: Building Models with scikit-learn (textbook p.196, Exercises 1-4)

1. Exercise 1

# Read the data
import pandas as pd
wine = pd.read_csv(r'D:\桌面\實驗四\data\wine.csv')
winequality = pd.read_csv(r'D:\桌面\實驗四\data\winequality.csv', sep=';')

# Separate the features from the labels
wine_data = wine.iloc[:, 1:]
wine_target = wine['Class']
print('wine dataset features:\n', wine_data)
print('wine dataset labels:\n', wine_target)
winequality_data = winequality.iloc[:, :-1]
winequality_target = winequality['quality']
print('winequality dataset features:\n', winequality_data)
print('winequality dataset labels:\n', winequality_target)

(Output: wine_data is a 178 rows x 13 columns DataFrame and wine_target a Series named Class of length 178, dtype int64; winequality_data is 1599 rows x 11 columns and winequality_target a Series named quality of length 1599, dtype int64.)
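The sep=';' argument matters here: winequality.csv is semicolon-delimited, so pandas' default comma parser would read each row as a single column. A minimal sketch of the difference, using an inline CSV string rather than the course file:

```python
import io
import pandas as pd

# A tiny semicolon-delimited sample in the same shape as winequality.csv.
csv_text = "fixed acidity;volatile acidity;quality\n7.4;0.7;5\n7.8;0.88;5\n"

wrong = pd.read_csv(io.StringIO(csv_text))           # default sep=',' -> one wide column
right = pd.read_csv(io.StringIO(csv_text), sep=';')  # three proper columns

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 3)
```

The same mistake on the real file shows up as a DataFrame with a single, unsplittable column.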

# Split into training and test sets
from sklearn.model_selection import train_test_split
wine_data_train, wine_data_test, wine_target_train, wine_target_test = \
    train_test_split(wine_data, wine_target, test_size=0.1, random_state=6)
winequality_data_train, winequality_data_test, winequality_target_train, winequality_target_test = \
    train_test_split(winequality_data, winequality_target, test_size=0.1, random_state=6)

# Standardize the datasets
from sklearn.preprocessing import StandardScaler
stdScale = StandardScaler().fit(wine_data_train)
wine_trainScaler = stdScale.transform(wine_data_train)
wine_testScaler = stdScale.transform(wine_data_test)
stdScale = StandardScaler().fit(winequality_data_train)
winequality_trainScaler = stdScale.transform(winequality_data_train)
winequality_testScaler = stdScale.transform(winequality_data_test)

# Reduce dimensionality with PCA
from sklearn.decomposition import PCA
pca = PCA(n_components=5).fit(wine_trainScaler)
wine_trainPca = pca.transform(wine_trainScaler)
wine_testPca = pca.transform(wine_testScaler)
pca = PCA(n_components=5).fit(winequality_trainScaler)
winequality_trainPca = pca.transform(winequality_trainScaler)
winequality_testPca = pca.transform(winequality_testScaler)
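One thing worth checking after the PCA step is how much information n_components=5 actually keeps. A minimal sketch, using scikit-learn's built-in copy of the wine data (load_wine) as a stand-in for the course CSV:

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize first, exactly as in the exercise, then fit PCA.
X = load_wine().data
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=5).fit(X_scaled)

# Fraction of total variance retained by the 5 components.
retained = pca.explained_variance_ratio_.sum()
print(round(retained, 2))
```

For the standardized wine data, 5 components keep roughly 80% of the variance, which is the kind of trade-off explained_variance_ratio_ lets you verify before committing to a dimensionality.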

2. Exercise 2

# Using the wine data processed in Exercise 1, build a K-Means model with 3 clusters
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=1).fit(wine_trainScaler)
print('KMeans model:\n', kmeans)

# Compare the true labels with the cluster labels via the FMI
from sklearn.metrics import fowlkes_mallows_score
score = fowlkes_mallows_score(wine_target_train, kmeans.labels_)
print('FMI of the wine dataset: %f' % score)

(Output: FMI of the wine dataset: 0.924119)

# Determine the best number of clusters over k = 2..10
for i in range(2, 11):
    kmeans = KMeans(n_clusters=i, random_state=123).fit(wine_trainScaler)
    score = fowlkes_mallows_score(wine_target_train, kmeans.labels_)
    print('FMI of the wine data for %d clusters: %f' % (i, score))

(Output: the FMI peaks at k = 3 with 0.936567 and falls off for larger k, down to 0.544814 at k = 10.)

# Compute the silhouette coefficient, plot it, and pick the best k
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt
silhouetteScore = []
for i in range(2, 11):
    kmeans = KMeans(n_clusters=i, random_state=1).fit(wine)
    score = silhouette_score(wine, kmeans.labels_)
    silhouetteScore.append(score)
plt.figure(figsize=(10, 6))
plt.plot(range(2, 11), silhouetteScore, linewidth=1.5, linestyle='-')
plt.show()

# Compute the Calinski-Harabasz index to pick the best k
from sklearn.metrics import calinski_harabasz_score
for i in range(2, 11):
    kmeans = KMeans(n_clusters=i, random_state=1).fit(wine)
    score = calinski_harabasz_score(wine, kmeans.labels_)
    print('Calinski-Harabasz index of the wine data for %d clusters: %f' % (i, score))

Note: the original run raised "cannot import name calinski_harabaz_score from sklearn.metrics". The code logic was fine, but the import only succeeds on older scikit-learn installations (a second machine with an older version printed indices that increase with k).
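The import error recorded above is a versioning issue, not a machine problem: scikit-learn renamed calinski_harabaz_score to calinski_harabasz_score and removed the old spelling in version 0.23. A version-tolerant sketch (again using load_wine as a stand-in for the course CSV):

```python
# Try the current name first, fall back to the pre-0.23 spelling.
try:
    from sklearn.metrics import calinski_harabasz_score as ch_score
except ImportError:  # very old scikit-learn
    from sklearn.metrics import calinski_harabaz_score as ch_score

from sklearn.cluster import KMeans
from sklearn.datasets import load_wine

X = load_wine().data
# n_init is given explicitly because its default changed across versions.
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)
print(ch_score(X, labels))  # higher means denser, better-separated clusters
```

Pinning the new name (scikit-learn >= 0.23) and dropping the fallback is the simpler fix if the environment is under your control.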

3. Exercise 3

# Read the wine dataset and separate labels from features
import pandas as pd
wine = pd.read_csv(r'D:\桌面\實驗四\data\wine.csv')
wine_data = wine.iloc[:, 1:]
wine_target = wine['Class']

# Split the wine data into training and test sets
from sklearn.model_selection import train_test_split
wine_data_train, wine_data_test, wine_target_train, wine_target_test = \
    train_test_split(wine_data, wine_target, test_size=0.1, random_state=6)

# Normalize the wine data with min-max (range) scaling
from sklearn.preprocessing import MinMaxScaler
stdScale = MinMaxScaler().fit(wine_data_train)
wine_trainScaler = stdScale.transform(wine_data_train)
wine_testScaler = stdScale.transform(wine_data_test)

# Build an SVM model and predict on the test set
from sklearn.svm import SVC
svm = SVC().fit(wine_trainScaler, wine_target_train)
print('SVM model:\n', svm)
wine_target_pred = svm.predict(wine_testScaler)
print('First 10 predictions:\n', wine_target_pred[:10])

# Print the classification report to evaluate the model
from sklearn.metrics import classification_report
print('Classification report for the SVM on the wine test set:\n',
      classification_report(wine_target_test, wine_target_pred))

(Output: precision, recall and f1-score are all 1.00 for each of the three classes, with accuracy 1.00 on the 18 test samples.)
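When the report needs further processing (say, pulling out one class's recall rather than reading the text table), classification_report also accepts output_dict=True and returns the same numbers as a nested dict. A small self-contained sketch with toy labels:

```python
from sklearn.metrics import classification_report

y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 1, 2, 3, 3, 3]

# Keys are the class labels as strings, plus 'accuracy' and the averages.
report = classification_report(y_true, y_pred, output_dict=True)
print(report['accuracy'])     # 5 of 6 predictions correct
print(report['2']['recall'])  # only one of the two true 2s was recovered
```

This is also a quick sanity check that the perfect 1.00 scores above really mean every test sample was classified correctly, rather than a formatting quirk of the text report.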

4. Exercise 4

# Using the winequality data processed in Exercise 1, build a linear regression model
from sklearn.linear_model import LinearRegression
clf = LinearRegression().fit(winequality_trainPca, winequality_target_train)
y_pred = clf.predict(winequality_testPca)
print('First 10 linear regression predictions:\n', y_pred[:10])

(Output: ten fractional quality predictions, all between roughly 5.2 and 6.4.)

# Using the same data, build a gradient boosting regression model
from sklearn.ensemble import GradientBoostingRegressor
GBR_wine = GradientBoostingRegressor().fit(winequality_trainPca, winequality_target_train)
wine_target_pred = GBR_wine.predict(winequality_testPca)
print('First 10 gradient boosting predictions:\n', wine_target_pred[:10])
print('First 10 true labels:\n', list(winequality_target_test[:10]))

(Output: fractional predictions between roughly 5.2 and 6.7, against integer true labels of 5s and 6s.)

# From the true and predicted scores, compute the mean squared error,
# median absolute error and explained variance, and judge the models from them
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import median_absolute_error
from sklearn.metrics import explained_variance_score
from sklearn.metrics import r2_score
print('Linear regression evaluation:')
print('Mean absolute error on winequality:', mean_absolute_error(winequality_target_test, y_pred))
print('Mean squared error on winequality:', mean_squared_error(winequality_target_test, y_pred))
print('Median absolute error on winequality:', median_absolute_error(winequality_target_test, y_pred))
print('Explained variance on winequality:', explained_variance_score(winequality_target_test, y_pred))
print('R-squared on winequality:', r2_score(winequality_target_test, y_pred))
print('Gradient boosting evaluation:')
print('Mean absolute error on winequality:', mean_absolute_error(winequality_target_test, wine_target_pred))
print('Mean squared error on winequality:', mean_squared_error(winequality_target_test, wine_target_pred))
print('Median absolute error on winequality:', median_absolute_error(winequality_target_test, wine_target_pred))
print('Explained variance on winequality:', explained_variance_score(winequality_target_test, wine_target_pred))
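To make the metric values above easier to interpret, here is a toy sketch on hand-picked numbers where every score can be verified by hand (smaller errors and an R-squared closer to 1 mean a better fit):

```python
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, explained_variance_score,
                             r2_score)

y_true = [5, 6, 7, 5]
y_pred = [5, 7, 7, 4]  # absolute errors: 0, 1, 0, 1

print(mean_absolute_error(y_true, y_pred))    # (0+1+0+1)/4 = 0.5
print(mean_squared_error(y_true, y_pred))     # (0+1+0+1)/4 = 0.5
print(median_absolute_error(y_true, y_pred))  # median of [0, 1, 0, 1] = 0.5
print(explained_variance_score(y_true, y_pred))
print(r2_score(y_true, y_pred))
```

For integer wine-quality scores of 5-8, a mean absolute error well below 1 means the model is typically within one quality grade of the truth, which is the practical yardstick for comparing the linear and gradient boosting models above.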
