Chinese Natural Language Processing: Sentiment Classification of Product Reviews

Uploaded by: 小*** | IP location: Tianjin | Uploaded: 2022-08-13 | Format: DOC | Pages: 3 | Size: 39 KB


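1. Dataset download

The code expects the negative and positive reviews in chinese_data/neg.xls and chinese_data/pos.xls. The imports used throughout are:

    from sklearn.model_selection import train_test_split
    from gensim.models.word2vec import Word2Vec
    import numpy as np
    import pandas as pd
    import jieba
    import joblib  # the original imported joblib from sklearn.externals, which newer scikit-learn releases no longer provide
    from sklearn.svm import SVC

2. Load the data, preprocess (word segmentation), and split into training and test sets

    # Load the data, segment each review into words, and split into training and test sets
    def load_file_and_preprocessing():
        neg = pd.read_excel('chinese_data/neg.xls', header=None, index_col=None)
        pos = pd.read_excel('chinese_data/pos.xls', header=None, index_col=None)
        cw = lambda x: list(jieba.cut(x))
        pos['words'] = pos[0].apply(cw)
        neg['words'] = neg[0].apply(cw)
        # use 1 for positive sentiment, 0 for negative
        y = np.concatenate((np.ones(len(pos)), np.zeros(len(neg))))
        # training set : test set = 8:2
        x_train, x_test, y_train, y_test = train_test_split(
            np.concatenate((pos['words'], neg['words'])), y, test_size=0.2)
        # NumPy provides file functions for storing arrays (.npy stores the data in binary form)
        np.save('pre_data/y_train.npy', y_train)
        np.save('pre_data/y_test.npy', y_test)
        return x_train, x_test

3. Compute a vector for every review in the training and test sets and save them to files

    # Average the word vectors of all words in a sentence to obtain one sentence vector
    def build_sentence_vector(text, size, w2v_model):
        vec = np.zeros(size).reshape((1, size))
        count = 0
        for word in text:
            try:
                # gensim 4 lookup; the original indexed the model directly
                vec += w2v_model.wv[word].reshape((1, size))
                count += 1
            except KeyError:  # the word is not in the vocabulary
                continue
        if count != 0:
            vec /= count
        return vec

    # Compute the word vectors
    def get_train_vecs(x_train, x_test):
        n_dim = 300  # word-vector dimensionality
        # Build the Word2Vec model (gensim 4 names; the original used size= and .iter)
        w2v_model = Word2Vec(vector_size=n_dim, window=5, sg=0, hs=0, negative=5, min_count=10)
        w2v_model.build_vocab(x_train)  # prepare the model vocabulary
        # Train the word vectors on the training reviews
        w2v_model.train(x_train, total_examples=w2v_model.corpus_count, epochs=w2v_model.epochs)
        # Sentence vectors for the training set
        train_vecs = np.concatenate([build_sentence_vector(z, n_dim, w2v_model) for z in x_train])
        np.save('pre_data/train_vecs.npy', train_vecs)  # save the training vectors to a file
        print(train_vecs.shape)  # print the shape of the training vectors
        # Continue training on the test reviews; the vocabulary stays as built from the training set.
        # total_examples is the number of test reviews (the original passed corpus_count here).
        w2v_model.train(x_test, total_examples=len(x_test), epochs=w2v_model.epochs)
        w2v_model.save('pre_data/w2v_model/w2v_model.pkl')
        test_vecs = np.concatenate([build_sentence_vector(z, n_dim, w2v_model) for z in x_test])
        np.save('pre_data/test_vecs.npy', test_vecs)
        print(test_vecs.shape)

4. Load the training vectors and labels, and the test vectors and labels

    # Load the training vectors and labels, and the test vectors and labels
    def get_data():
        train_vecs = np.load('pre_data/train_vecs.npy')
        y_train = np.load('pre_data/y_train.npy')
        test_vecs = np.load('pre_data/test_vecs.npy')
        y_test = np.load('pre_data/y_test.npy')
        return train_vecs, y_train, test_vecs, y_test

5. Train the SVM model

    # Train the SVM model
    def svm_train(train_vecs, y_train, test_vecs, y_test):
        clf = SVC(kernel='rbf', verbose=True)
        clf.fit(train_vecs, y_train)  # fit the SVM model to the given training data
        joblib.dump(clf, 'pre_data/svm_model/model.pkl')  # save the trained SVM model
        print(clf.score(test_vecs, y_test))  # print the mean accuracy on the test data

6. Build the vector of the sentence to be predicted

    # Build the vector of the sentence to be predicted
    def get_predict_vecs(words):
        n_dim = 300
        w2v_model = Word2Vec.load('pre_data/w2v_model/w2v_model.pkl')
        train_vecs = build_sentence_vector(words, n_dim, w2v_model)
        return train_vecs

7. Classify the sentiment of a single sentence

    # Classify the sentiment of a single sentence
    def svm_predict(string):
        words = jieba.lcut(string)
        words_vecs = get_predict_vecs(words)
        clf = joblib.load('pre_data/svm_model/model.pkl')
        result = clf.predict(words_vecs)
        if int(result[0]) == 1:
            print(string, 'positive')
        else:
            print(string, 'negative')

    if __name__ == '__main__':
        # x_train, x_test = load_file_and_preprocessing()
        # get_train_vecs(x_train, x_test)
        # train_vecs, y_train, test_vecs, y_test = get_data()
        # svm_train(train_vecs, y_train, test_v

(The document preview is cut off at this point.)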
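The sentence-vector step (average the word vectors of a sentence and skip words the model does not know) can be tried in isolation, without gensim or the review data. The following is only a minimal sketch under stated assumptions: the function name average_sentence_vector, the dictionary toy_embeddings, and its 2-dimensional values are invented for illustration and stand in for a trained Word2Vec model.

```python
import numpy as np


def average_sentence_vector(tokens, embeddings, size):
    """Return the mean vector of all in-vocabulary tokens, shaped (1, size)."""
    vec = np.zeros((1, size))
    count = 0
    for token in tokens:
        if token in embeddings:  # skip out-of-vocabulary tokens
            vec += np.asarray(embeddings[token], dtype=float).reshape(1, size)
            count += 1
    if count:
        vec /= count  # mean over the tokens that were found
    return vec


# Hypothetical embeddings, made up for this example (not trained vectors).
toy_embeddings = {
    "很": [0.0, 2.0],
    "好": [4.0, 0.0],
}

# "用" is not in toy_embeddings, so only two vectors are averaged.
sentence_vec = average_sentence_vector(["很", "好", "用"], toy_embeddings, size=2)
print(sentence_vec)

# A sentence with no known words yields the zero vector.
empty_vec = average_sentence_vector(["用"], toy_embeddings, size=2)
print(empty_vec)
```

Returning a (1, size) row for each sentence keeps the vectors ready for np.concatenate, which stacks them into the (n_samples, size) matrix that the SVM is trained on.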
