
Finance and Economics Discussion Series

Federal Reserve Board, Washington, D.C.

ISSN 1936-2854 (Print)
ISSN 2767-3898 (Online)

Manufacturing Sentiment: Forecasting Industrial Production with Text Analysis

Tomaz Cajner, Leland D. Crane, Christopher Kurz, Norman Morin, Paul E. Soto, Betsy Vrankovich

2024-026

Please cite this paper as:
Cajner, Tomaz, Leland D. Crane, Christopher Kurz, Norman Morin, Paul E. Soto, and Betsy Vrankovich (2024). "Manufacturing Sentiment: Forecasting Industrial Production with Text Analysis," Finance and Economics Discussion Series 2024-026. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2024.026.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Manufacturing Sentiment:
Forecasting Industrial Production with Text Analysis*

Tomaz Cajner    Leland D. Crane    Christopher Kurz
Norman Morin    Paul E. Soto    Betsy Vrankovich

April 2024

Abstract

This paper examines the link between industrial production and the sentiment expressed in natural language survey responses from U.S. manufacturing firms. We compare several natural language processing (NLP) techniques for classifying sentiment, ranging from dictionary-based methods to modern deep learning methods. Using a manually labeled sample as ground truth, we find that deep learning models—partially trained on a human-labeled sample of our data—outperform other methods for classifying the sentiment of survey responses. Further, we capitalize on the panel nature of the data to train models which predict firm-level production using lagged firm-level text. This allows us to leverage a large sample of "naturally occurring" labels with no manual input. We then assess the extent to which each sentiment measure, aggregated to monthly time series, can serve as a useful statistical indicator and forecast industrial production. Our results suggest that the text responses provide information beyond the available numerical data from the same survey and improve out-of-sample forecasting; deep learning methods and the use of naturally occurring labels seem especially useful for forecasting. We also explore what drives the predictions made by the deep learning models, and find that a relatively small number of words—associated with very positive/negative sentiment—account for much of the variation in the aggregate sentiment index.

JEL codes: C1, E17, O14

Keywords: Industrial Production, Natural Language Processing, Machine Learning, Forecasting

*All authors are at the Federal Reserve Board of Governors. We thank the Institute for Supply Management, including Kristina Cahill, Tom Derry, Debbie Fogel-Monnissen, Rose Marie Goupil, Paul Lee, Susan Marty, and Denis Wolowiecki, for access to and help with the manufacturing survey data that underlie the work described by this paper. We are thankful for comments and suggestions from Stephen Hansen, Andreas Joseph, Juri Marcucci, Arthur Turrell, and participants at the Society for Government Economists Annual Conference, the ESCoE Conference on Economic Measurement, the Government Advances in Statistical Programming Conference, the Society for Economic Measurement Conference, and the Nontraditional Data, Machine Learning, and Natural Language Processing in Macroeconomics Conference. The analysis and conclusions set forth here are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors.


1 Introduction

In recent years there has been an explosion of interest in natural language processing (NLP) within finance and macroeconomics. The use of text data to forecast and assist in model estimation is becoming increasingly commonplace. Still, there are many open questions around the use of NLP in empirical work. For example, which of the numerous available methods work best, and work best in specific contexts? Are off-the-shelf tools appropriate, or are there greater returns to specializing models to the data at hand? How useful is text for forecasting real output indicators, such as manufacturing output? What explains the predictions made by complicated NLP models? This paper addresses these questions, using a novel dataset and a variety of NLP methods ranging from traditional dictionaries to fine-tuned transformer neural networks.

Our primary data source is the monthly survey microdata underlying the Institute for Supply Management's (ISM) Manufacturing Report on Business. The survey is taken by purchasing managers at a representative sample of U.S. manufacturing firms. Part of the survey consists of categorical-response questions about aspects of their current operations, including production, inventories, backlogs, employment, and new orders. The answers to these questions are of the form "worse/the same/better than last month", and are aggregated into the widely-reported ISM diffusion indexes. But the survey also includes free-response text boxes, where purchasing managers can provide further comments either in general or about specific aspects of their businesses; these comments are a novel source of signal about the economy and our focus in this paper.[1]

Our first step is to quantify the text into an economically important and interpretable measure. We focus on sentiment, given that waves of optimism and pessimism have historically been linked to business cycle fluctuations (Keynes, 1937). We begin by evaluating various NLP methods in terms of their ability to correctly classify the sentiment expressed in individual comments. Our context is fairly specific: the data are manufacturing-sector purchasing managers opining about the business outlook for their firm, without much discussion of financial conditions. While there are numerous sentiment classification models available, many were developed with other data in mind, such as social media posts (Nielsen, 2011). Even within economics and finance, most work has focused on finance-related language (Araci, 2019; Correa et al., 2021; Huang et al., 2022). The lack of results for manufacturing-specific datasets motivates our assessment of a variety of NLP techniques.

[1] While ISM collects these responses through the survey, this text is confidential and not incorporated into the publicized indexes. A sample of responses is published in the monthly ISM Report on Business (see /supply-management-news-and-reports/reports/ism-report-on-business/).

One common approach is to count the frequency of words within a sentiment dictionary. Economists initially used positive and negative words from the psychology literature, but have since moved on to using domain-specific words (e.g., Correa et al., 2021) and using simple word counts to measure other types of tone, such as uncertainty (see Baker et al., 2016 and Gentzkow et al., 2019). While this method is transparent, it may fail to capture negation and synonyms, and it often requires context-specific dictionaries that may not be available. More recently developed techniques employ deep learning methods that account for the nuances of language. We focus on variants of BERT (see Devlin et al., 2018), a precursor of popular large language models like ChatGPT. These models are pre-trained: the parameters are set by exposing the model to a large corpus of text—such as the entirety of Wikipedia—and attempting to predict missing words or the relationship between sentences. The pre-trained models can be used to classify sentiment directly, or they can be further trained ("fine-tuned") on a specific dataset. The latter approach attempts to get the best of both worlds: a solid ability to parse language from the exposure to a large quantity of training data, plus the context-specific nuance from the fine-tuning data. While deep learning gets enormous attention, it is ex-ante unclear whether it should outperform carefully curated dictionaries in our context.

Comparing the accuracy of these different methods on a sample of hand-coded comments from our dataset, we find that deep learning does have an advantage on our data, in part because the brevity of the comments means that many comments have no overlap with dictionary terms. In addition, we find that there is value in specializing the models to our data: the models fine-tuned on our data have the highest sentiment classification accuracy on a hold-out sample. These results point to the advantages of using pre-trained models, as well as carefully specializing them to the task at hand. Our hope is that these results help guide other economists when deciding between NLP approaches.

The sentiment measures based on free-form textual responses in the ISM data aggregate into indexes that closely mirror both the diffusion index based on the responses to the categorical survey and aggregate manufacturing output, as measured by the manufacturing component of industrial production. We further investigate the relationship between the average sentiment expressed by purchasing managers and manufacturing output econometrically. Our baseline forecasting model asks whether sentiment can help forecast manufacturing output and includes—among other controls—some of the ISM diffusion indexes, so the test is whether the sentiment indexes have additional information beyond the ISM categorical response data. We find that most dictionary-based text variables do not help predict manufacturing output, with the exception of a curated financial stability-specific dictionary. On the other hand, sentiment variables from the deep learning models are predictive of future manufacturing output. Out-of-sample forecasting exercises show that the financial stability dictionary and deep learning techniques significantly reduce the mean squared forecast errors as well. Overall, our results suggest that purchasing managers' survey responses contain useful forward-looking information, and that sentiment-based measures can improve the accuracy of forecasts of manufacturing output.
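To fix ideas, the sketch below sets up this kind of horse race in Python with statsmodels; the file, column names, and one-month lag structure are placeholders rather than the paper's exact specification:

```python
# Does lagged sentiment add predictive power for manufacturing IP growth
# beyond a lagged ISM diffusion index? All names here are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("monthly_series.csv", parse_dates=["date"])  # hypothetical file
df["ip_growth"] = df["ip_mfg"].pct_change() * 100

# Lag regressors so only information available at month t-1 predicts month t.
for col in ["ip_growth", "ism_production", "sentiment"]:
    df[col + "_lag1"] = df[col].shift(1)

base = smf.ols("ip_growth ~ ip_growth_lag1 + ism_production_lag1", data=df).fit()
full = smf.ols("ip_growth ~ ip_growth_lag1 + ism_production_lag1 + sentiment_lag1",
               data=df).fit()

# If the text carries extra signal, sentiment_lag1 should be significant and
# the augmented model should fit better.
print(full.params["sentiment_lag1"], full.pvalues["sentiment_lag1"])
print(base.aic, full.aic)
```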

The exercises described above rely on a manually-labeled sample of the data, both to assess the accuracy of different methods and to help fine-tune some of the deep-learning-based methods. However, the panel microdata allow for a different approach. Since firms are in the survey for multiple months, we can link the text (and other) data from a given month to next month's firm-level production data. Fitting a model to these data lets us forecast firm-level production using firm-level lagged information. This methodology has two advantages. First, it gives us a much larger training sample size as compared to the manually labeled data. Second, it aligns the training data objective very precisely with the aggregate forecasting objective. On this second point, we do our best when manually labeling data to discern whether the comment is indicative of rising or falling industrial production. But there are plenty of ambiguous cases, so there are some clear advantages to letting the data speak, and seeing what text is actually associated with future (firm-level) changes in production. We find that fine-tuning in this way is competitive with using the manual labels, and in some cases preferable.
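As a minimal sketch of this label construction, assuming a hypothetical firm-month panel with firm_id, date, text, and production_response columns:

```python
# "Naturally occurring" labels: pair each firm's text in month t with the
# same firm's production response in month t+1. No manual labeling needed.
import pandas as pd

panel = pd.read_csv("ism_microdata.csv", parse_dates=["date"])  # hypothetical
panel = panel.sort_values(["firm_id", "date"])

# Shift each firm's production response back one month so it lines up as a
# label for the previous month's comment text.
panel["next_month_production"] = (
    panel.groupby("firm_id")["production_response"].shift(-1))

training_data = panel.dropna(subset=["text", "next_month_production"])
print(training_data[["text", "next_month_production"]].head())
```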

Finally, we make progress on the explainability of deep learning models. These models are notoriously opaque, a consequence of their very high parameter count and extremely nonlinear architecture. This can make it difficult to trust the outputs of such models, as it is not initially clear if the seemingly good predictions are based on solid foundations. We use a standard machine learning interpretability method—Shapley decompositions—to score the contribution of each individual word in each comment. Our results point to a sensible interpretation of our deep learning models. First, the score for each word is roughly constant over time: words do not dramatically change their average connotation (though the underlying deep learning model allows for this). Second, there are fat tails to the scores: most words have scores very close to zero (neutral), with a relatively small number of words having extreme sentiment. For example, the most positive words include "brisk", "excellent", "booming", "improve", and "efficient"; among the most negative words are "unstable", "insufficient", "fragile", "inconsistent", and "questionable". The close-to-neutral words contribute very little to aggregate sentiment, even after accounting for the fact that they occur very frequently. Finally, we find that changes in our aggregated sentiment index are largely accounted for by changes in the frequency of the words with the most extreme (positive or negative) sentiment scores, with the vast majority of words playing little role. Thus, while it may be difficult to manually construct a domain-specific dictionary from scratch, it is possible to extract a fairly simple, interpretable dictionary from the deep learning model.
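For intuition, the snippet below shows one common way to obtain such word-level Shapley attributions, using the shap package on a transformers text-classification pipeline; the off-the-shelf checkpoint is a stand-in for the paper's fine-tuned models:

```python
# Word-level Shapley attributions for a sentiment classifier. The public
# SST-2 checkpoint here is illustrative, not the model used in the paper.
import shap
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    top_k=None)  # return scores for every class

explainer = shap.Explainer(classifier)  # shap selects a text masker for pipelines
shap_values = explainer(["Business is booming and new orders are brisk."])

# Tokens and their per-class Shapley contributions; averaging these across
# comments yields word-level sentiment scores like those described above.
print(shap_values[0].data)
print(shap_values[0].values)
```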

Our paper contributes to two strands of literature. First, our comparison of NLP techniques for measuring sentiment adds to the growing body of literature incorporating NLP into economic and financial research. Since the seminal work of Tetlock (2007), many studies have used dictionary-based methods (Baker et al., 2016; Hassan et al., 2019; Young et al., 2021; Cowhey et al., 2022), and refined lexicons for specific contexts have been shown to improve performance in measurement and forecasting (Correa et al., 2021; Gardner et al., 2022; Sharpe et al., 2023). Machine learning techniques have also been used to select word lists (Manela and Moreira, 2017; Soto, 2021). More recent papers incorporate more sophisticated machine learning methods to extract the tense and topic of texts (Angelico et al., 2022; Hanley and Hoberg, 2019; Hansen et al., 2018; Kalamara et al., 2022). Advances in NLP, particularly the use of deep learning techniques, have significantly improved sentiment classification (Heston and Sinha, 2017; Araci, 2019; Huang et al., 2022; Bybee, 2023; Jha et al., 2024).

Second, we contribute to the literature on forecasting industrial production (D'Agostino and Schnatz, 2012; Lahiri and Monokroussos, 2013; Ardia et al., 2019; Cimadomo et al., 2022; Andreou et al., 2017). Our analysis of the relationship between sentiment and industrial production provides new insights into the role of unstructured text data in economic forecasting (Marcucci, 2024). By comparing various NLP techniques, we are able to identify which methods are most effective for classifying sentiment and incorporate them into predictive models of industrial production.

The paper most similar to ours is Shapiro et al. (2022), who find that domain-specific dictionaries can improve predictions of human-rated sentiment. We find broadly similar results using a financial stability (rather than a general purpose) dictionary to measure sentiment, but move one step further by providing a robust comparison to large language models. Our paper differs from theirs in two important ways. First, we focus on creating a sentiment index from firm-level data, rather than beginning the analysis at an aggregate macroeconomic level. Instead of measuring consumer sentiment through newspaper articles, we measure manufacturing sentiment from a panel of survey responses. Our unique micro-level data allow us to understand the value of text beyond categorical responses and naturally occurring labels. Second, Shapiro et al. (2022) compare lexicon-based sentiment approaches only to baseline BERT, which at the time was the most developed transfer-learning-based model. We also consider newer deep learning models based on BERT, particularly those fine-tuned on domain-specific and naturally occurring data. We apply interpretability techniques to these 'black box' models and show that aggregate sentiment indexes derived from deep learning hinge on the frequencies of relatively few words.

The remainder of the paper is structured as follows. Section 2 presents our data. Section 3 reviews how we measure sentiment from the textual survey data and Section 4 overviews the resulting indexes. Section 5 presents the empirical strategy and findings, and Section 6 evaluates the mechanisms through which firm survey responses predict industrial production. Section 7 concludes.

2 Data

The primary data for this study come from the Institute for Supply Management (ISM). Each month, ISM conducts a survey of purchasing managers from a sample of manufacturing firms in the United States.[2] Diffusion indexes based on the responses (described below) are published very rapidly, and are closely watched by markets. As highlighted in Bok et al. (2018), not only does such survey data provide important signal about the state of the economy, but the ISM data in particular provide the "earliest available information for the national economy on any given quarter". In addition, the ISM data have a long time series, which is conducive to time-series modeling.[3] The timeliness and relevance of the data motivate our exploration of the free-response text.

[2] ISM also surveys non-manufacturing firms and hospitals separately.
[3] ISM series extend back to 1948, but most statistical analyses use data that start in 1972.

The ISM survey includes a series of questions about the respondents' operations, including their production levels, new orders, backlog, employment, supplier delivery times, input inventories, exports, and imports. These questions have a categorical response, where the purchasing managers specify whether these metrics have increased, decreased, or stayed the same between last month and the current month. The categorical responses are aggregated into publicly-released diffusion indexes, discussed more below. In addition to the categorical response, purchasing managers can provide further explanation in accompanying text boxes. There are free-response questions accompanying nearly every categorical question, asking for the reason for the response. In addition, there is a "General Remarks" field at the beginning, where the respondent can put any general remarks they wish. Ten to twelve of these text responses are featured in the ISM's data release to provide context for the diffusion indexes, but otherwise they are not released publicly.

The ISM manufacturing survey dates back to the 1930s. The dataset we analyze covers firm-month observations from November 2001 to January 2020. Most recently, the sample covers roughly 350 responses per month. The dark-shaded area of Figure 1 shows the percentage of firms in the sample with text responses over time. The figure illustrates that the majority of respondents provide text in addition to their quantitative survey answers. The black line in Figure 1 presents the average word count over the sample period. The word counts range from 10 to 33 words on average per month. The mean word count appears to fluctuate over the business cycle and jumps dramatically in 2018. The sudden increase in word count in 2018 is mostly due to heightened tensions surrounding trade policy at the time. Indeed, after removing responses that contain the word "tariff," we observe a smoother increase in word counts (see Figure A1 in the appendix for further details).

Table 1 provides a summary of the text responses. Nearly 49 percent of the general remarks sections contain text, while the next most common sections containing text are those related to employment, production, and new orders. The last row shows statistics for all the text fields concatenated together: 69 percent of firm-month observations have any text at all, and the text is about 17 words long on average. The average word count is highest for the General Remarks section, with an average of 8 words used in these responses. When considering only those responses that contain text, the average word count for the General Remarks section increases to 16 words.

Turning from ISM's survey microdata, we use several time series in our forecasting exercises. Our focus is on forecasting the manufacturing industrial production (IP) index. We use real-time data on the right-hand side, reflecting what policymakers knew at the time, and forecast the fully revised series. In addition to the IP series, we use the ISM diffusion indexes as regressors. The diffusion indexes are aggregations of the categorical response questions in the survey. For example, the production diffusion index is a weighted average of the responses to the production question (paraphrasing, "Is production higher/the same/lower than last month?"), with the "Higher" responses getting weight 100, "Same" responses getting weight 50, and "Lower" responses getting weight 0. The formula for the diffusion index in period t, with N_t total firms responding, is shown in equation (1):

$$D_t = \frac{1}{N_t} \sum_{i=1}^{N_t} \Big[ 100 \cdot \mathbf{1}\{\text{Response}_i \text{ is ``Higher''}\} + 50 \cdot \mathbf{1}\{\text{Response}_i \text{ is ``Same''}\} \Big] \tag{1}$$

These diffusion indexes have values between 0 and 100, with 0 indicating that all respondents say things are worse and 100 indicating that all respondents say things are better.[4] ISM publishes indexes for each question, as well as a "PMI Composite", which is an equally-weighted average of the diffusion indexes for new orders, production, employment, supplier deliveries, and inventories.

[4] The responses are "better", "same", or "worse" for the new orders question, production, and new export orders. For employment, inventories, prices, and imports the responses are "higher", "same", and "lower". For backlogs the choices are "greater", "same", and "less".
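A direct implementation of equation (1) is straightforward. The short Python sketch below computes the index for one month of categorical responses:

```python
# Minimal implementation of the diffusion index in equation (1):
# the average of per-response weights (Higher=100, Same=50, Lower=0).
def diffusion_index(responses):
    """responses: iterable of strings in {"Higher", "Same", "Lower"}."""
    weights = {"Higher": 100, "Same": 50, "Lower": 0}
    responses = list(responses)
    return sum(weights[r] for r in responses) / len(responses)

print(diffusion_index(["Higher", "Same", "Lower", "Higher"]))  # 62.5
```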

3 Measuring Sentiment

Our goal is to extract useful information from the ISM survey text responses. We focus on sentiment analysis: measuring the extent to which the purchasing manager's response is positive or negative. Even focusing on sentiment analysis, the wide range of NLP techniques available can make it challenging to choose an appropriate method. In this section we discuss the methods we use, leaving a complete description of the approaches to the Appendix.

3.1 Dictionaries

One of the simplest methods for measuring sentiment is dictionary-based analysis, which involves counting the frequency of a predetermined list of sentiment words in the text. We use common sentiment dictionaries such as the Harvard (Tetlock, 2007) and AFINN (Nielsen, 2011) word lists. However, we also recognize that certain words that may be considered negative in other contexts may not be considered negative in the context of finance, such as "taxing" or "liability". As such, we also apply finance-specific word lists, including the sentiment word list from Loughran and McDonald (2011) (henceforth, "LM") and the financial stability word list from Correa et al. (2021). For all dictionaries, we score comments on a scale of -1 to +1, using the percent of total words in the comment that are positive less the percent of total words that are negative. When we require discrete classifications, as in Figure 2, we classify the comment as positive if the score is greater than zero, negative if it is less than zero, and neutral if it equals zero.
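This scoring rule takes only a few lines of Python. In the sketch below the tiny word lists are placeholders for the full Harvard, AFINN, LM, or financial stability dictionaries (the example words are taken from the extreme-sentiment words reported in the introduction), and tokenization is deliberately naive:

```python
# Dictionary sentiment score: (% positive words) - (% negative words),
# then a discrete class where one is needed.
POSITIVE = {"brisk", "excellent", "booming", "improve", "efficient"}
NEGATIVE = {"unstable", "insufficient", "fragile", "inconsistent", "questionable"}

def dictionary_score(comment):
    words = comment.lower().split()
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words) / len(words)
    neg = sum(w in NEGATIVE for w in words) / len(words)
    return pos - neg  # bounded in [-1, +1]

def classify(comment):
    score = dictionary_score(comment)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(dictionary_score("new orders are brisk"))           # 0.25
print(classify("supply remains fragile and inconsistent"))  # negative
```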

3.2 Deep Learning Models

Another approach to sentiment analysis involves fitting a model to the data. We try several variations on this theme. Unlike the dictionary methods, all of these approaches require labeled data: a sample of observations that have already been classified, which is used to fit the model and classify the remaining observations.

We create a labeled dataset from a randomly selected subsample of 1,000 responses with text from the individual questions.[5] Each response was classified for sentiment by two economists using the following question as a guide: "Is this comment consistent with manufacturing IP rising month over month?" The classifications were either positive, neutral, or negative, where "neutral" includes cases where it is impossible to determine the sentiment. Both economists agreed on the sentiment classification for roughly 700 cases. This subsample is further split into a "training" dataset, used to fit the models, and a "test" dataset, used to assess the relative merits of the models.[6]

[5] Note that the categorical responses can be considered a kind of label for the corresponding text. In Section 4.1 we investigate how well models can predict the categorical response from the associated text.
[6] The test data consist of observations from 2018m1 to 2020m1 and are not used by any of the models during training.

Deep learning models have gained popularity in recent years, driven by their impressive performance on language-related tasks. Much of the progress has occurred within a particular class of deep learning models called transformers (see, e.g., Devlin et al., 2018; Radford et al., 2018; Chung et al., 2022; Ouyang et al., 2022; and Touvron et al., 2023). The defining feature of transformers—relative to other neural network architectures—is a mechanism called attention: a way for the words within a sentence to interact, allowing the context of a particular word to influence its meaning. A full explanation of transformers and the attention mechanism is beyond the scope of this paper, but we do provide a brief summary in the Appendix. The important points are that (unlike dictionaries and bag-of-words approaches) transformers take into account interactions between words, word order, and context-dependent meanings (polysemy).
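To make the attention idea concrete, here is a toy numpy rendering of (single-head) scaled dot-product self-attention; real transformers add learned projection matrices, multiple heads, and many stacked layers:

```python
# Each token's output is a weighted average of all tokens' representations,
# with weights given by a softmax over pairwise similarity. This is how the
# surrounding context gets mixed into each word's representation.
import numpy as np

def self_attention(X):
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # token-token affinities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over tokens
    return weights @ X                              # context-mixed tokens

tokens = np.random.default_rng(0).normal(size=(5, 8))  # 5 tokens, 8-dim (toy)
print(self_attention(tokens).shape)                    # (5, 8)
```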

One notable transformer model is "BERT", or Bidirectional Encoder Representations from Transformers, developed by Devlin et al. (2018). It is important to note that BERT is a pre-trained model: Devlin et al. (2018) specified the architecture and then trained the model on a corpus including the entirety of (English) Wikipedia and a number of books. The model is large by the standards of the economics literature, with over 100 million parameters.
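For readers looking for a starting point, a minimal fine-tuning loop in the HuggingFace ecosystem looks roughly like the sketch below; the base checkpoint, toy labeled data, and hyperparameters are illustrative, not the exact configuration used in this paper:

```python
# Hedged sketch: fine-tune a BERT-family classifier on hand-labeled comments
# (positive/neutral/negative), in the spirit of Section 3.2.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = {"negative": 0, "neutral": 1, "positive": 2}
texts = ["New orders are booming.", "Demand remains unstable."]  # toy sample
labels = [LABELS["positive"], LABELS["negative"]]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class CommentDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in encodings.items()}
        item["labels"] = torch.tensor(labels[i])
        return item

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-bert", num_train_epochs=3),
    train_dataset=CommentDataset(),
)
trainer.train()  # the fine-tuned model can then score held-out comments
```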
