References
[1] Kwak H, Lee C, Park H, et al. What is Twitter, a social network or a news media? [C]// Proceedings of the 19th International Conference on World Wide Web. ACM, 2010: 591-600.
[2] Pang B, Lee L. Opinion mining and sentiment analysis. Now Publishers Inc, 2008.
[3] Pang B, Lee L, Vaithyanathan S. Thumbs up?: Sentiment classification using machine learning techniques [C]// Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA, 2002, 10: 79-86.
[4] Bautin M, Vijayarenu L, Skiena S. International sentiment analysis for news and blogs [C]// Proceedings of the International Conference on Weblogs and Social Media (ICWSM), 2008.
[5] Turney P D. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews [C]// Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2002: 417-424.
[6] Turney P. Measuring praise and criticism: Inference of semantic orientation from association [J]. ACM Transactions on Information Systems, 2003, 21(4): 315-346.
[7] Ghose A, Ipeirotis P G, Sundararajan A. Opinion mining using econometrics: A case study on reputation systems [C]// Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL). Morristown, NJ, USA: Association for Computational Linguistics, 2007: 416-423.
[8] …for topics in Chinese sentence [J]. Journal of Chinese Information Processing, 2007, 21(5): 73-79.
[9] Yu H, Hatzivassiloglou V. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences [C]// Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2003: 129-136.
[10] Riloff E, Wiebe J, Wilson T. Learning subjective nouns using extraction pattern bootstrapping [C]// Proceedings of the 7th Conference on Natural Language Learning, 2003: 25-32.
[11] Sindhwani V, Melville P. Document-word co-regularization for semi-supervised sentiment analysis [C]// Eighth IEEE International Conference on Data Mining, 2008.
[12] Pang B, Lee L. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts [C]// Proceedings of the ACL, 2004: 271-278.
[13] …design: A semantic similarity matching approach [C]// Planning to Learn Workshop (PlanLearn'10) at ECAI, 2010: 27-34.
[14] Guo Z, Li Z, Tu H. Sina Microblog: An information-driven online social network [C]// 2011 International Conference on Cyberworlds (CW). IEEE, 2011: 160-167.
[15] Yu L, Asur S, Huberman B A. What trends in Chinese social media [J]. arXiv preprint arXiv:1107.3522, 2011.
[16] …[C]// 2012 International Conference on Wavelet Active Media Technology and Information Processing (ICWAMTIP). IEEE, 2012: 385-389.
[17] Gao Q, Abel F, Houben G J, et al. A comparative study of users' microblogging behavior on Sina Weibo and Twitter [J]. User Modeling, Adaptation, and Personalization, 2012: 88-101.
[18] …characteristics of microblog users: Take "Sina Weibo" for example [J]. Library and Information Service, 2010, 54(14): 66-70. (in Chinese)
[19] Zi-qiong Z, Yi-jun Li, Qiang Ye, et al. Sentiment classification for Chinese product reviews using an unsupervised internet-based method [C]// 2008 International Conference on Management Science and Engineering (ICMSE 2008), 15th Annual Conference Proceedings. IEEE, 2008: 3-9.
[20] Potena D, Diamantini C. Mining opinions on the basis of their affectivity [C]// 2010 International Symposium on Collaborative Technologies and Systems (CTS). IEEE, 2010: 245-254.
[21] Dasgupta S, Ng V. Mine the easy, classify the hard: A semi-supervised approach to automatic sentiment classification [C]// Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the …
[22] …preprint arXiv:1107.3522, 2011.
[23] Keerthi S S, Shevade S K, Bhattacharyya C, et al. Improvements to Platt's SMO algorithm for SVM classifier design [J]. Neural Computation, 2001, 13(3): 637-649.
[24] Platt J. Sequential minimal optimization: A fast algorithm for training support vector machines [J]. 1998.
[25] Esuli A, Sebastiani F. SentiWordNet: A publicly available lexical resource for opinion mining [C]// Proceedings of LREC, 2006, 6: 417-422.
[26] Platt J C. Fast training of support vector machines using sequential minimal optimization [J]. 1999.
[27] Barbosa L, Feng J. Robust sentiment detection on Twitter from biased and noisy data [C]// Proceedings of the 23rd International Conference on Computational Linguistics: Posters. Association for Computational Linguistics, 2010: 36-44.
Kaplan A, Haenlein M. Users of the world, unite! The challenges and opportunities of social media [J]. Business Horizons, 2010, 53(1): 59-68.
Cook T, Hopkins L. Social media or "How we stopped worrying and learnt to love communication". Your organization and web 2.0 (3rd ed), e-book. Retrieved March 28, 2008, from …
…build your business. Hoboken, NJ: John Wiley & Sons, 2007.

2013 International Conference on Management Science & Engineering (20th), July 17-19, 2013, Harbin

Analysis of Sina Weibo Based on Semantic Sentiment Space

HUANG …

Abstract: With the rapid development of Web 2.0, more and more people publish information and their own opinions on the Internet. Micro-blog applications meet this need by providing a public platform on which people can post and interact in real time. As the number of micro-blog updates grows rapidly, a large volume of information and emotionally complex data is released on this platform. As a result, research on micro-blogs has attracted increasing attention, especially on one persistently hot topic: sentiment analysis of short messages. So far, the study of Chinese micro-blogs still needs much further work. Focusing on sentiment analysis of Sina Weibo, this paper puts forward three methods of micro-blog orientation classification to address the problem of micro-blog sentiment analysis, and it compares the accuracy and performance of each classifier.

Key words: sina weibo, sentiment analysis, machine learning, feature extraction

Introduction

With the proliferation of Web 2.0 applications, the social media revolution has arrived. Besides reading online news, people also want to share their thoughts, present themselves, and make their voices heard in online social life [1-3]. Driven by these common needs, various kinds of forums and social media websites have emerged in China at the right time. According to the latest statistics from the China Internet data platform
(…), the total frequency and duration of visits to micro-blogs far exceed those of SNS websites. There is thus no denying that the micro-blog is a unique platform that combines information publishing with social networking. As a new information resource on the Internet, the "microblog" is a space for everyone to share information. These posts are full of personal emotion and generally express their authors' positive and negative opinions. Although this information may seem complex and useless, it actually holds much potential commercial value. Examples include blog studies [4], forecasting box-office sales, updating products with transaction data [7], and distinguishing attribute and sentiment structure in the Chinese car industry
… contributes to text …, question and answer systems.

Sentiment analysis, or opinion mining as it is sometimes called, is one of many areas of computational study that deal with opinion-oriented natural language processing. We perform sentiment analysis on micro-blogs. A single micro-blog message typically consists of one or two sentences totalling fewer than 140 characters. Supported by this observation, we study granularity at the sentence level and at the word [9] or phrase [10] level. Other granularity levels include the document level [11-12]. This level of detail typically goes into determining the polarity of a message, which is also what this article investigates. A more detailed approach could determine the emotion expressed in addition to the polarity [13, 20]. However, past studies have examined only Western micro-blogs such as Twitter, and little research has focused on sentiment analysis of Chinese micro-blogs. To our knowledge, research on Chinese micro-blogs began with basic introduction and prediction [14-15]. In recent years it has turned more toward analysis [16] and users' micro-blogging behavior [17-18].

In this paper, we aim to explore and compare the performance of sentiment classification for Sina Weibo (Weibo). In particular, we are interested in the sentiment of Weibo posts in which users describe their status. By analyzing the sentiment of Weibo posts, this paper makes a comprehensive comparison of Naive Bayes, LibSVM, and SMO models.

This article takes ordinary Weibo messages along the timeline as its object of study and proceeds as follows:
(1) By recognizing the characteristics of Weibo messages, improve the feature selection approach, find a feature set suited to micro-blogs, and build a micro-blog sentiment space model;
(2) On the basis of machine learning for text classification, add positive and negative sentiment dictionary material and update the text classification model, to obtain a solution that better supports micro-blog sentiment analysis;
(3) Develop a sentiment analysis prototype focused on Chinese micro-blogs that obtains data through the Sina API, and examine the accuracy and viability of the test sample under different sentiment classifiers.

English words sentiment analysis

Sentiment analysis of Chinese text is still immature, but sentiment analysis of English words is a well-developed technique. Topic-independent ("no-centre topic") analysis uses only a single … or sentence to judge sentiment polarity. There are three classes of methods in this area: dictionary-based methods, supervised machine learning methods, and unsupervised machine learning methods. Unsupervised methods start from designated seed sentiment words and score the sentiment phrases extracted from a text. They then compare these scores with those of reference sentiment words to determine the sentiment. Turney's 2002 paper [5] advanced this line of research. He took reviews of automobiles, banks, movies, and travel destinations from Epinions as experimental data. The experiment is a three-step process:
- extract sentiment phrases;
- estimate the semantic orientation of each extracted two-word phrase;
- compute the average semantic orientation of each review.

Supervised machine learning methods deserve a deeper discussion here. They mainly comprise Naive Bayes, SVM, and SMO, an optimized algorithm for training SVMs [12,21-23].

(1) The Naive Bayes method is often used in text classification because of its speed and simplicity. It makes the assumption that words (or k-grams) are
generated independently of word position. For a given set of classes, it estimates the probability of a class c given a document d with terms t_k:

$$P(c \mid d) \propto P(c) \prod_{1 \le k \le n_d} P(t_k \mid c)$$

The classifier then returns the class with the highest probability given the document. In practice, the log probability is used instead:

$$c_{\mathrm{map}} = \arg\max_{c} \Big[ \log P(c) + \sum_{1 \le k \le n_d} \log P(t_k \mid c) \Big]$$

The prior class probability P(c) is given by the fraction of training documents that belong to that class.

(2) Like Naive Bayes, SVM approaches often show very promising results in traditional text categorization [24], and more specifically in subjectivity detection [3,25]. This approach is hence a strong contender. In our experiments we use the same parameter settings as [3,25], who used the default settings. We use the same feature spaces as for Naive Bayes, with tokens, tags, a combination of both, or patterns as features.

(3) Sequential Minimal Optimization (SMO) is an optimized algorithm for training SVMs [26]. SMO is conceptually simple, easy to implement, and often faster. It also has better scaling properties than a standard "chunking" algorithm that uses projected conjugate gradient. At every step, SMO solves the smallest possible optimization problem, rather than the larger quadratic programming subproblems handled by earlier methods.

Next, we look at how these models have been applied to English micro-blog sentiment analysis.

Many studies concern tweets posted on Twitter. They fall into two groups: micro-blog sentiment analysis without a topic, and English micro-blog sentiment analysis with a specific topic. For micro-blog sentiment analysis without a topic, researchers use the hashtags and smileys of tweets as labels to train a supervised KNN classifier. Another article in this area uses the sentiment analysis applications of three websites, Twendz, Twitter Sentiment and TweetFeel, to obtain initial sentiment labels. It then preprocesses the tweets according to pre-established rules, and finally uses the preprocessed, labelled tweets as training data [27].

The first step separates objective from subjective tweets by training a classifier on extracted features. Two classes of features are extracted: meta-information about the words, and syntax features of the tweets. Ranked by degree of influence, positive sentiment polarity matters more than links, followed by strong subjectivity, uppercase, and verbs.

The second step classifies sentiment polarity using the same features, with the word polarities adjusted. In this step, the authors use formulas (2-2) and (2-3) to correct the polarity of sentiment words. For example, the share of a word's occurrences that fall in positive tweets is $\mathit{pos}(w) = \mathit{count}(w, \mathit{pos}) / \mathit{count}(w)$. They again use the features from the first step to train this classifier. In their experiments, they showed that "since our features are able to capture a more abstract representation of tweets, our solution is more effective than previous ones and also more robust regarding biased and noisy data, which is the kind of data provided by these sources." Ranked directly by feature influence, the result is: negative sentiment polarity > positive sentiment polarity > verbs > positive emoticons > uppercase.
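The Naive Bayes rule described in this section multiplies the class prior by per-term likelihoods and compares the classes in log space. The minimal Python sketch below illustrates that rule. It is not the authors' implementation. The multinomial term model, add-one (Laplace) smoothing, the class name `NaiveBayes`, and the toy English tokens (standing in for segmented Chinese words) are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over pre-segmented documents (illustrative sketch).

    Assumptions (not from the paper): multinomial term model and add-one
    (Laplace) smoothing of P(t_k | c).
    """

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # training documents per class
        self.n_docs = len(labels)                  # total training documents
        self.term_counts = defaultdict(Counter)    # class -> term frequencies
        self.vocab = set()
        for terms, c in zip(docs, labels):
            self.term_counts[c].update(terms)
            self.vocab.update(terms)
        # Total term occurrences per class: the denominator of P(t_k | c).
        self.total_terms = {c: sum(cnt.values()) for c, cnt in self.term_counts.items()}
        return self

    def log_score(self, terms, c):
        # log P(c): fraction of training documents labelled c.
        score = math.log(self.class_counts[c] / self.n_docs)
        v = len(self.vocab)
        for t in terms:
            # log P(t_k | c) with add-one smoothing, so an unseen term is not fatal.
            score += math.log((self.term_counts[c][t] + 1) / (self.total_terms[c] + v))
        return score

    def predict(self, terms):
        # c_map = argmax_c [log P(c) + sum_k log P(t_k | c)]
        return max(self.class_counts, key=lambda c: self.log_score(terms, c))


if __name__ == "__main__":
    docs = [["good", "happy"], ["like", "good"], ["sad", "bad"], ["bad", "angry"], ["today", "rain"]]
    labels = ["pos", "pos", "neg", "neg", "neu"]
    nb = NaiveBayes().fit(docs, labels)
    print(nb.predict(["happy", "good"]))  # pos
```

Working in log space, as the formula does, avoids the numerical underflow caused by multiplying many small probabilities. Smoothing keeps a single unseen term from driving a class score to minus infinity.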
So far, much work remains to be done in Chinese micro-blog sentiment analysis. Chinese micro-blogs differ greatly from both English micro-blogs and traditional blogs, which leaves a great deal of room for research. Two questions guide this study:

(1) A Weibo post may contain only 140 characters, yet it carries much more information than a tweet. If a post is processed in different ways, either as one message or as several separate sentences, do the results differ?
(2) In Chinese Weibo messages without a theme, which features are more useful?

Research method

Because of China's culture and characteristics, Sina Weibo includes several functions that Twitter lacks. Some of these features should be considered: the word limit, convenient social feedback, rich media, micro topics, verified accounts, and self-censorship. We therefore removed Weibo posts containing links, to keep the word set precise.

Data collection for this research is not as simple as it may seem at first. Several assumptions and decisions have to be made. There are three differently collected datasets: test data, subjective training data, and objective (neutral) training data. Sina Weibo lets users register a profile. It offers functions such as repost, @usernames, hashtags (#), private instant chat, URL shortening, and direct insertion of images. Users can post their own updates, follow their favorite Weibo accounts, and create events with hashtags. They can also repost messages they care about and interact with others through comments.

The Weibo data in this paper were collected from 17 September to 3 November 2012 through the Application Programming Interfaces (APIs) provided by Weibo. I captured all Weibo posts in timeline order during this period. I also captured the profile of each post's author, e.g. the numbers of followers and followings, and the user's gender and province. The maximum request count per IP is 150 per hour, so requests can be sent at most every 3600/150 = 24 seconds. To keep some requests in reserve for unexpected situations, I chose to send a request every 25 seconds to fetch the newest public timeline. Each request returns the 20 newest Weibo posts. Because of network and Sina server problems, some responses contain null data. In total, 634,359 Weibo posts were collected, together with their details and user information.

As mentioned above, this paper collected a large number of Weibo posts, but each post can be used only after it is manually tagged with its class: positive, neutral, or negative. This is a tough task, and classifying every post is impossible. After preprocessing, this paper uses a random sample of 2,071 manually classified Sina Weibo messages: 603 negative, 287 neutral, 624 positive, and 557 without significant features.
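The collection loop and its rate-limit arithmetic can be sketched as follows. A limit of 150 requests per IP per hour allows at most one request every 3600/150 = 24 s, and the paper rounds this up to 25 s to keep a reserve. `fetch_public_timeline` is a hypothetical stand-in for the Sina Weibo public-timeline call, whose endpoint the paper does not give. De-duplicating posts by an `id` field is also an assumption.

```python
import time

HOURLY_LIMIT = 150    # max requests per IP per hour (from the paper)
CHOSEN_INTERVAL = 25  # seconds between requests (from the paper)


def min_interval(hourly_limit):
    """Smallest whole-second gap that keeps requests within the hourly limit."""
    return -(-3600 // hourly_limit)  # ceiling division; 3600 / 150 = 24


def requests_per_hour(interval):
    """Number of requests issued in one hour at a fixed interval."""
    return 3600 // interval


def collect(fetch_public_timeline, rounds, interval=CHOSEN_INTERVAL, sleep=time.sleep):
    """Poll a timeline source and keep unique, non-null posts in arrival order.

    fetch_public_timeline is a hypothetical stand-in for the Sina Weibo API
    call: it is assumed to return a list of dicts with an "id" key, or
    None/empty when the network or server yields null data.
    """
    posts = {}
    for i in range(rounds):
        batch = fetch_public_timeline()
        for post in batch or []:                # skip null responses
            posts.setdefault(post["id"], post)  # overlapping batches: keep first copy
        if i < rounds - 1:
            sleep(interval)                     # wait before the next request
    return list(posts.values())
```

At 25 s the loop issues 3600 // 25 = 144 requests per hour, which leaves 6 of the 150 in reserve. At 20 posts per request, that caps intake at 2,880 posts per hour, before duplicates between consecutive timeline snapshots are removed. The `sleep` parameter is injectable so the loop can be tested without waiting.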
Tab. 1 lists the numbers of positive, neutral and negative Weibo posts collected. All polarity labels were assigned manually. Essentially, sentiment analysis of Sina Weibo is a classification problem. Inspired by the research of Jiang et al. (2011), this article designed the algorithm prototype shown in Fig. 1. Fig. 1 outlines the general approach:
- preprocess the training sample, which is mainly manual work to label sentiment polarity and also includes some data compression;
- extract topic-independent features;
- train the SVM classifier and classify the sentiment polarity of the test sample.
The output is the sentiment tag.

Fig. 1 Algorithm prototype design flow (training and testing branches)

In the flowchart (Fig. 1), the core of the algorithm is clearly the SVM training method, and feature extraction is another key part. After preprocessing, the texts have already been segmented into words, and each word is tagged with its part of speech. The SVM-based feature calculation prototype computes the feature vector of each test sample through feature extraction. The output of this part is the input vector for classification, as in Fig. 2.
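The feature calculation step turns each segmented post into a weighted feature vector for the SVM. The paper does not say how the weights are computed, so the sketch below assumes TF-IDF (raw term count times log(N/df)) over pre-segmented posts. Each post is written as a sparse `label index:value` line with ascending 1-based indices. That is the convention LibSVM-style tools expect, and it matches the SVM input format shown with Tab. 2 and Fig. 3. The function names and the numeric label coding are hypothetical.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Assign each distinct term a 1-based feature index in first-seen order."""
    vocab = {}
    for terms in docs:
        for t in terms:
            vocab.setdefault(t, len(vocab) + 1)
    return vocab


def tfidf_vector(terms, vocab, doc_sets):
    """Sparse {index: weight} with weight = tf * log(N / df) (assumed scheme).

    doc_sets holds one set of terms per training document; every term in
    vocab must occur in at least one of them.
    """
    n = len(doc_sets)
    vec = {}
    for t, tf in Counter(t for t in terms if t in vocab).items():
        df = sum(1 for d in doc_sets if t in d)
        weight = tf * math.log(n / df)
        if weight > 0:  # zero weights are simply omitted in sparse format
            vec[vocab[t]] = weight
    return vec


def to_svm_line(label, vec):
    """Format one sample as 'label index:value ...' with ascending indices."""
    feats = " ".join(f"{i}:{vec[i]:.6f}" for i in sorted(vec))
    return f"{label} {feats}".rstrip()


if __name__ == "__main__":
    docs = [["good", "happy"], ["bad", "sad"], ["good", "bad"]]
    vocab = build_vocab(docs)
    doc_sets = [set(d) for d in docs]
    print(to_svm_line(1, tfidf_vector(docs[0], vocab, doc_sets)))  # 1 1:0.405465 2:1.098612
```

A term that appears in every training post gets weight log(1) = 0 and is dropped. This is one reason sparse SVM input lines stay short.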
Fig. 2 Feature extraction flow

In the chart, Index denotes a feature and Value denotes its weight. That is, with reference to the feature set, the SVM model transforms the Weibo text set into a set of text vectors and computes the weight of each Weibo post. SVM classification is the key method of the classification prototype, and in this paper I use an open-source SVM classifier. Each SVM input line has the format label index:value, as in Tab. 2. After the training process, each Weibo post gets a feature vector representation, like the sample shown in Fig. 3:

Weibo: 土狗老师你好我又 … ("Hello, Teacher Tugou, I … again")
310:0.359810 351:0.141443 476:0.359810 477:0.282574

Fig. 3 SVM input data

Experiments and Results

The purpose of the experiments is to test and verify the capability of SVM classification in dealing with … The indexes commonly used to evaluate classification performance are accuracy, precision, recall and the F1 measure. Based on Tab. 3, accuracy, precision, recall and F1 are computed as in formulas (5)-(8):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{8}$$

In fact, precision and recall each describe particular aspects of a classifier. Precision measures the exactness of the classifier with respect to each class, and recall measures its completeness with respect to each class. Recall makes it possible to identify the class the classifier has difficulty predicting. This information can then be used to tip the classifier in favor of that class.

[Tab. 3: actual vs. predicted class counts]

Accuracy is the measure by which all the results of the above algorithms were compared. Error is the other way of expressing it: an algorithm with 80% accuracy has 20% error. Below are the results of several classifiers on training and test data. The 1,514 usable messages are split into four folds. In other words, three quarters serve as training data and the remaining quarter as test data. Accuracy is measured on the test data, and the reported figure is for the run that achieved the highest accuracy. Based on the processed feature values of the whole text set, we next analyze the results of each method.

First, we test the accuracy of the three methods on the same dataset. The results are shown in Fig. 4.

Fig. 4 Experiment …

On the whole, Naive Bayes achieves good accuracy on polar words but cannot discern neutral texts, and the same holds for LibSVM. The performance of SMO is relatively stable; in other words, it classifies all three classes well. In average accuracy, SMO is better than both LibSVM and Naive Bayes, as Tab. 4 also shows. Tab. 4 shows that, for the three classifiers, almost every measure achieves a relatively higher score than the baseline. This means the classification models are effective for Weibo.
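Formulas (5)-(8) are written for a binary split. A common reading for the three-class setting, assumed here and not stated in the paper, computes precision, recall and F1 one class at a time. Each class in turn is treated as positive and the other two as negative. Accuracy is then the fraction of all posts classified correctly. The function name `evaluate` is hypothetical. A minimal sketch:

```python
from collections import Counter


def evaluate(y_true, y_pred, classes):
    """Accuracy plus one-vs-rest precision, recall and F1 for each class.

    For class c: TP = predicted c and actually c, FP = predicted c but
    actually another class, FN = actually c but predicted another class.
    """
    pairs = Counter(zip(y_true, y_pred))          # (actual, predicted) -> count
    accuracy = sum(pairs[(c, c)] for c in classes) / len(y_true)
    per_class = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0   # formula (6)
        recall = tp / (tp + fn) if tp + fn else 0.0      # formula (7)
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)             # formula (8)
        per_class[c] = (precision, recall, f1)
    return accuracy, per_class
```

Error, as the text notes, is 1 minus the accuracy. Reporting per-class recall alongside accuracy is what exposes a weakness such as the poor neutral-class performance of Naive Bayes and LibSVM. Accuracy alone can hide it.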
In Tab. 4, we compare the effectiveness of the three methods (Naive Bayes, LibSVM, and SMO) in further detail. In our sample, the Naive Bayes results indicate that this method classifies polar texts effectively, but its performance on neutral texts is worse than that of LibSVM and SMO. The accuracy of LibSVM and SMO is almost the same, but SMO is the better choice. In the experiments, SMO proved fast, fairly accurate, and more stable on both polar and non-polar texts.

Conclusions and future work

This paper sets out to solve a practical problem: sentiment analysis of Sina Weibo posts sorted by timeline. Earlier work classified the sentiment of traditional Chinese reviews; this study instead explores the feasibility of classifying short messages (Weibo). To conclude, the article has shown that texts on the Chinese Weibo platform can be automatically collected and successfully analyzed for their sentiment. The SMO classifier gave the highest accuracy on the topic-free Weibo sample. In our experiment, the highest accuracy for a three-class (negative, positive, neutral) classifier is 90.03%. SMO obtained this score, but the results clearly show that Naive Bayes and LibSVM also perform well on the polar classes. Our experiment suggests that SMO can be applied in practical applications dealing with sentiment analysis of Weibo in general.

This thesis confirms some previous findings and makes three main novel contributions. The confirmations are, first, that for Chinese Weibo texts machine learning techniques outperform keyword-based techniques. Second, the SMO classifier gives better results than other instance representations. The contributions are:
- collection and preprocessing of Sina Weibo data;
- an empirical study of the role of Chinese-word context in sentiment analysis of topic-free Weibo posts sorted by timeline;
- a comparison of sentiment classification models.

All training data (negative, positive, neutral) and test data were collected from Sina Weibo using different APIs. The data fetched from the Weibo website are XML files containing large amounts of Weibo and user information. I wrote a Perl script to extract the experiment data and generate the text files automatically. Building on this theoretical framework, the main part of this thesis is an empirical experiment. It uses Sina Weibo Chinese-word data to evaluate the performance of several sentiment analysis methods. This empirical study indicates that sentiment analysis of
Weibo posts in general can be done independently, without regard to their context. This is the main theoretical contribution of this thesis, since no specific study of Weibo sentiment existed before. Another very important contribution is the evaluation of sentiment analysis models on the Weibo classification problem. A careful review of the sentiment analysis literature showed that no single feature vector is best suited to sentiment analysis. Some sentiment analysis studies have achieved good results on tweets posted on Twitter. These observations show that Naive Bayes, SVM, and SMO can all be used for sentiment analysis. However, little research has applied these models to Weibo, and even less has compared their classification results on Weibo. A comparison of the models' performance is therefore another theoretical contribution of this thesis.

… classification should be considered. We will explore other approaches to word segmentation and phrase extraction to improve accuracy, and look for the best match for analyzing the sentiment of short Weibo texts. Furthermore, a comparison between Chinese reviews and sentiment analysis can be …