
Assessing Opportunities for LLMs in Software Engineering and Acquisition

Authors: Stephany Bellomo, Shen Zhang, James Ivers, Julie Cohen, Ipek Ozkaya

November 2023

Large language models (LLMs) are generative artificial intelligence (AI) models that have been trained on massive corpuses of text data and can be prompted to generate new, plausible content. LLMs are seeing rapid advances, and they promise to improve productivity in many fields. OpenAI's GPT-4¹ and Google's LaMDA² are the underlying LLMs of services like ChatGPT³, CoPilot⁴, and Bard⁵. These services can perform a range of tasks, including generating human-like text responses to questions, summarizing artifacts, and generating working code. These models and services are the focus of extensive research efforts across industry, government, and academia to improve their capabilities and relevance, and organizations in many domains are rigorously exploring their use to uncover potential applications.

The idea of harnessing LLMs to enhance the efficiency of software engineering and acquisition activities holds special allure for organizations with large software operations, such as the Department of Defense (DoD), because doing so offers the promise of substantial resource optimization. Potential use cases for LLMs are plentiful, but knowing how to assess the benefits and risks associated with their use is nontrivial.

Notably, to gain access to the latest advances, organizations may need to share proprietary data (e.g., source code) with service providers. Understanding such implications is central to intentional and responsible use of LLMs, especially for organizations managing sensitive information.

In this document, we examine how decision makers, such as technical leads and program managers, can assess the fitness of LLMs to address software engineering and acquisition needs [Ozkaya 2023]. We first introduce exemplar scenarios in software engineering and software acquisition and identify common archetypes. We then describe common concerns involving the use of LLMs and enumerate tactics for mitigating those concerns. Using these common concerns and tactics, we demonstrate through two examples how decision makers can assess the fitness of LLMs for their own use cases.

The capabilities of LLMs, the risks concerning their use, and our collective understanding of emerging services and models are evolving rapidly [Brundage et al. 2022]. This document is not meant to comprehensively cover all software engineering and acquisition use cases, their concerns, and mitigation tactics. Instead, it demonstrates an approach that decision makers can use to think through their own LLM use cases as this space evolves.

¹ /research/gpt-4
² https://blog.google/technology/ai/lamda/
⁴ /features/copilot

What Is an LLM?

An LLM is a deep neural network model trained on an extensive corpus of diverse documents (e.g., websites and books) to learn language patterns, grammar rules, facts, and even some reasoning abilities [Wolfram 2023]. LLMs generate responses to inputs ("prompts") by iteratively determining the next word or phrase. Each choice is based on the prompt and on patterns and associations learned from the training corpus, and it is made using probabilistic and randomized selection [White et al. 2023]. This capability allows LLMs to generate human-like text that can be surprisingly coherent and contextually relevant, even though it is not always semantically correct.
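The generation loop described above can be sketched in a few lines of Python. The "model" here is a hypothetical toy table of next-word scores. A real LLM computes such scores with a deep neural network over a large token vocabulary, not over whole words. At each step the scores are turned into probabilities with a temperature-scaled softmax, and one word is drawn at random. That draw is the probabilistic, randomized selection the paragraph refers to.

```python
import math
import random

# Hypothetical toy "model": raw scores (logits) for the words that may follow each word.
# A real LLM computes such scores with a neural network over a large token vocabulary.
TOY_LOGITS = {
    "<start>": {"the": 2.0, "a": 1.0},
    "the": {"code": 2.0, "model": 1.5},
    "a": {"model": 1.0, "test": 1.0},
    "code": {"compiles": 1.5, "<end>": 1.0},
    "model": {"predicts": 1.5, "<end>": 1.0},
    "test": {"passes": 1.0, "<end>": 1.0},
    "compiles": {"<end>": 1.0},
    "predicts": {"<end>": 1.0},
    "passes": {"<end>": 1.0},
}


def softmax(logits, temperature):
    """Turn raw scores into probabilities; temperature must be positive.

    Lower temperatures sharpen the distribution toward the top-scoring word.
    """
    scaled = {w: s / temperature for w, s in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - peak) for w, s in scaled.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def generate(rng, temperature=1.0, max_tokens=10):
    """Repeatedly sample the next word until <end> or the token limit is reached."""
    word, output = "<start>", []
    for _ in range(max_tokens):
        probs = softmax(TOY_LOGITS[word], temperature)
        words = list(probs)
        word = rng.choices(words, weights=[probs[w] for w in words])[0]
        if word == "<end>":
            break
        output.append(word)
    return " ".join(output)


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0.2, 1.0, 2.0):
        print(t, generate(rng, temperature=t))
```

Running it several times shows why identical prompts can yield different outputs. The `temperature` and `max_tokens` parameters mirror the kinds of settings many LLM services expose, although their exact names and ranges vary by provider.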

While LLMs can perform complex tasks using their trained knowledge, they lack true understanding. Rather, they are sophisticated pattern-matching tools. Moreover, because their output is generated probabilistically, they can produce inaccurate results (often referred to as "hallucinations"). Examples include citations to nonexistent references and method calls to nonexistent application programming interfaces (APIs). LLMs can perform analysis and inferencing on new data they have been prompted with, but the data on which they were trained can limit their accuracy. However, the technology is advancing rapidly: New models have increasing complexity and more parameters, and benchmarks have already emerged for comparing their performance [Imsys 2023]. In addition, LLM service providers are working on ways to use more recent data [D'Cruze 2023]. Despite these limitations, there are productive uses of LLMs today.
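One inexpensive guard against hallucinated API calls in generated Python code is to check every `module.attribute` reference against the installed module before running anything. The sketch below is a hypothetical helper, not a standard tool. It catches only direct attribute access on modules brought in with `import x` or `import x as y`. It does not check methods on objects or `from` imports. It complements tests and review rather than replacing them.

```python
import ast
import importlib


def find_missing_attributes(source):
    """Return sorted 'name.attr' references in `source` that the imported module lacks.

    Handles only `import x` and `import x as y` followed by direct `x.attr` access.
    Methods on objects, `from x import y`, and bare dotted imports are ignored, so an
    empty result does not prove the code is correct.
    """
    tree = ast.parse(source)  # parses only; the generated code itself is never executed
    modules, missing = {}, set()

    # Pass 1: import each referenced module (this does run the module's import-time code).
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname is None and "." in alias.name:
                    continue  # e.g. `import os.path`, skipped to keep the sketch simple
                try:
                    modules[alias.asname or alias.name] = importlib.import_module(alias.name)
                except ImportError:
                    missing.add(alias.name)  # the module itself does not exist

    # Pass 2: flag `name.attr` where `name` is an imported module that lacks `attr`.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id in modules
            and not hasattr(modules[node.value.id], node.attr)
        ):
            missing.add(f"{node.value.id}.{node.attr}")
    return sorted(missing)


# Hypothetical LLM output containing two calls to functions that do not exist.
GENERATED = '''
import json
import math as m
data = json.loads('{"a": 1}')
print(json.load_string("{}"))
print(m.sqrt(4), m.cuberoot(8))
'''

print(find_missing_attributes(GENERATED))  # ['json.load_string', 'm.cuberoot']
```

Because the helper imports the modules named in the generated code, it should only be run in an environment where importing them is safe.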

Choosing an LLM

There are already dozens of LLMs and services built using LLMs, and more emerge every day. These models vary in many dimensions, from technical to contractual, and the details of these differences can be difficult to keep straight. The following distinctions are a good starting point when choosing an LLM for use.

Model or Service. ChatGPT is a chatbot service built on OpenAI's GPT family of LLMs [OpenAI 2023]; it is not itself an LLM. The difference is important because services built on LLMs can add capabilities. Examples include specialized chatbot features, specialized training beyond the core LLM, and non-LLM features that can improve results from an LLM. A service like ChatGPT is typically hosted by a service provider. The provider manages the computing resources (and associated costs), and users are typically required to send their prompts (and potentially sensitive data) to the provider to use the service. A model, like Meta's Llama 2⁶, can be fine-tuned with domain- or organization-specific data to improve accuracy, but it typically lacks the added features and resources of a commercially supported service.

⁶ /llama/

General or Specialized. LLMs are pre-trained on a corpus, and the composition of that corpus is a significant factor affecting an LLM's performance. General LLMs are trained on publicly available text sources like Wikipedia. Specialized LLMs are derived from those general models by fine-tuning them with additional training material from specific domains, like healthcare and finance [Zhou et al. 2022; Wu et al. 2023]. LLMs like CodeGen⁷ have been specialized with large corpuses of source code for use in software engineering.

Open Source or Proprietary. Open source LLMs provide a platform for researchers and developers to freely access, use, and even contribute to the model's development. Proprietary LLMs are subject to varying restrictions on use, which makes them less open to experimentation or potential deployment. Some providers (e.g., Meta) use a license that is largely, but not completely, open [Hull 2023]. OpenAI offers a different compromise: While the GPT series of LLMs is not open source, OpenAI does permit fine-tuning (for a fee) as a means of specialization and limited experimentation with its proprietary models.

LLMs are a fast-moving field. The ethics and regulations surrounding their use are also in a state of flux as society grapples with the challenges and opportunities these powerful models present. Keeping apprised of these developments is crucial for taking advantage of the potential offered by LLMs.

⁷ /salesforce/CodeGen

Use Cases

The ability of LLMs to generate plausible content for text and code applications has sparked the imaginations of many. A recent literature review examines 229 research papers written since 2017 on the application of LLMs to software engineering problems [Hou et al. 2023]. Application areas span requirements, design, development, testing, maintenance, and management activities, with development and testing being the most common.

Our team, which works with government organizations daily, took a broader perspective and brainstormed several dozen ideas for using LLMs in common software engineering and acquisition activities (see Table 1 for examples). Two important observations quickly emerged from this activity. First, most use cases represent human-AI partnerships in which an LLM or LLM-based service could help humans complete tasks more quickly, as opposed to replacing them. Second, for those just getting started with LLMs, deciding which use cases would be most feasible, beneficial, or affordable is not a trivial decision.

Table 1: Sample Acquisition and Software Engineering Use Cases

ACQUISITION USE CASES

A1. A new acquisition specialist uses an LLM to generate an overview of relevant federal regulations for an upcoming request for proposal (RFP) review, expecting the summary to save time in background reading.
A2. A chief engineer uses an LLM to generate a comparison of alternatives from multiple proposals, expecting it to use the budget and schedule formulas from previous similar proposal reviews and generate accurate itemized comparisons.
A3. A contract specialist uses an LLM to generate ideas for a request for information (RFI) solicitation given a set of concerns and a vague problem description, expecting it to generate a draft RFI that is at least 75% aligned with their needs.
A4. A CTO uses an LLM to create a report summarizing all uses of digital engineering technologies in the organization based on internal documents, expecting it to quickly produce a clear summary that is at least 90% correct.
A5. A program office lead uses an LLM to evaluate a contractor's code delivery for compliance with required design patterns, expecting that it will identify any instances in which the code fails to use required patterns.
A6. A program manager uses an LLM to summarize a set of historical artifacts from the past six months in preparation for a high-visibility program review and provides specific retrieval criteria (e.g., delivery tempo, status of open defects, and schedule), expecting it to generate an accurate summary of program status that complies with the retrieval criteria.
A7. A program manager uses an LLM to generate a revised draft of a statement of work, given a short starting description and a list of concerns (e.g., cybersecurity, software delivery tempo, and interoperability goals). The program manager expects it to generate a structure that can be quickly refined and that includes topics drawn from best practices they may not think to request explicitly.
A8. A requirements engineer uses an LLM to generate draft requirements statements for a program upgrade based on past similar capabilities, expecting them to be a good starting point.
A9. A contract officer is seeking funding to conduct research on a high-priority topic they are not familiar with. The contract officer uses an LLM to create example project descriptions for their context, expecting it to produce reasonable descriptions.

SOFTWARE ENGINEERING USE CASES

SE1. A developer uses an LLM to find vulnerabilities in existing code, hoping that the exercise will catch additional issues not already found by static analysis tools.
SE2. A developer uses an LLM to generate code that parses structured input files and performs specified numerical analysis on its inputs, expecting it to generate code with the desired capabilities.
SE3. A tester uses an LLM to create functional test cases, expecting it to produce a set of text test cases from a provided requirements document.
SE4. A developer uses an LLM to generate software documentation from code to be maintained, expecting it to summarize its functionality and interface.
SE5. A software engineer who is unfamiliar with SQL uses an LLM to generate an SQL query from a natural language description, expecting it to generate a correct query that can be tested immediately.
SE6. A software architect uses an LLM to validate whether code that is ready for deployment is consistent with the system's architecture, expecting that it will reliably catch deviations from the intended architecture.
SE7. A developer uses an LLM to translate several classes from C++ to Rust, expecting that the translated code will pass the same tests and be more secure and memory safe.
SE8. A developer uses an LLM to generate synthetic test data for a new feature being developed, expecting that it will quickly generate syntactically correct and representative data.
SE9. A developer provides an LLM with code that is failing in production and a description of the failures, expecting it to help the developer diagnose the root cause and propose a fix.
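Use case SE5 expects a generated SQL query "that can be tested immediately." Below is a minimal sketch of such a test. It assumes a hypothetical `employees` table and a query string that stands in for LLM output. The query runs against a small in-memory SQLite fixture whose correct answer is known in advance. Passing one fixture does not prove the query correct in general, but it cheaply catches many plausible-looking mistakes.

```python
import sqlite3

# Hypothetical query returned by an LLM for the request:
# "List the names of engineers hired after 2020, newest first."
GENERATED_SQL = """
SELECT name FROM employees
WHERE role = 'engineer' AND hire_year > 2020
ORDER BY hire_year DESC
"""

# Small hand-built fixture whose correct answer is known in advance.
ROWS = [
    ("Ada", "engineer", 2022),
    ("Grace", "engineer", 2019),
    ("Linus", "manager", 2023),
    ("Barbara", "engineer", 2021),
]
EXPECTED = ["Ada", "Barbara"]


def check_query(sql, rows, expected):
    """Run `sql` against a fresh in-memory fixture; return (matches_expected, actual_names)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE employees (name TEXT, role TEXT, hire_year INTEGER)")
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
        actual = [row[0] for row in conn.execute(sql)]
    finally:
        conn.close()
    return actual == expected, actual


ok, actual = check_query(GENERATED_SQL, ROWS, EXPECTED)
print(ok, actual)  # True ['Ada', 'Barbara']
```

The fixture should include rows that a subtly wrong query would mishandle. Here, Grace (hired too early) and Linus (wrong role) should both be excluded.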

Archetypes

Commonalities among the use cases lend themselves to abstracting the set into a manageable number of archetypes. Two dimensions are helpful in this regard: the nature of the activity an LLM is performing and the nature of the data that the LLM is acting on. Taking the cross-product of these dimensions yields the archetypes depicted in Table 2, into which these use cases fall.

Table 2: Use Case Archetypes

ACTIVITY TYPE / DATA TYPE   Text            Code            Model            Images
Retrieve Information        retrieve-text   retrieve-code   retrieve-model   retrieve-images
Generate Artifact           generate-text   generate-code   generate-model   generate-images
Modify Artifact             modify-text     modify-code     modify-model     modify-images
Analyze Artifact            analyze-text    analyze-code    analyze-model    analyze-images

Matching a specific use case to an archetype helps identify common concerns among similar use cases and known solutions commonly applied to them. Archetypes can be a tool that organizations use to group successes, gaps, and lessons learned in a structured way.
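Because the archetypes are simply the cross-product of the two dimensions, they are easy to enumerate and use as keys for recording lessons learned. The sketch below does this in Python. The assignment of use cases SE1, SE5, and SE7 to archetypes is our own illustrative reading of Table 1, not a classification given by the authors. `group_lessons` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import product

ACTIVITIES = ["retrieve", "generate", "modify", "analyze"]
DATA_TYPES = ["text", "code", "model", "images"]

# Every archetype is one (activity, data type) pair, e.g. "modify-code": 16 in total.
ARCHETYPES = [f"{a}-{d}" for a, d in product(ACTIVITIES, DATA_TYPES)]

# Illustrative classification of a few Table 1 use cases (an assumption, not the authors').
USE_CASE_ARCHETYPE = {
    "SE1": "analyze-code",   # find vulnerabilities in existing code
    "SE5": "generate-code",  # generate an SQL query from a description
    "SE7": "modify-code",    # translate classes from C++ to Rust
}


def group_lessons(lessons):
    """Group (use case, lesson) pairs by archetype so similar use cases share findings."""
    grouped = defaultdict(list)
    for use_case, lesson in lessons:
        archetype = USE_CASE_ARCHETYPE[use_case]
        if archetype not in ARCHETYPES:
            raise ValueError(f"unknown archetype {archetype!r}")
        grouped[archetype].append((use_case, lesson))
    return dict(grouped)


if __name__ == "__main__":
    # Placeholder lesson text; a team would record its own observations here.
    print(group_lessons([("SE7", "lesson 1"), ("SE1", "lesson 2")]))
```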

Activity Type captures differences in the associations that an LLM would need to make to support a use case, with some archetypes asking an LLM to do things that a language model was not designed to do:

• Retrieve Information asks an LLM to construct a response to a question (e.g., what's the Observer pattern?) for which a known answer is likely found in the training corpus, directly or across related elements.
• Generate Artifact asks an LLM to create a new artifact (e.g., a summary of a topic or a Python script that performs a statistical analysis) that likely bears similarity to existing examples in the corpus.
• Modify Artifact asks an LLM to modify an existing artifact to improve it in some way (e.g., translate Python code to Java or remove a described bug) that resembles analogous improvements among artifacts in the training corpus.
• Analyze Artifact asks an LLM to draw a conclusion about provided information (e.g., what vulnerabilities are in this code, or will this architecture scale adequately?) that likely requires semantic reasoning about data.

Data Type captures differences in the kind of data that an LLM operates on or generates, such as the differences in semantic rules that make data (e.g., code) well-formed:

• Text inputs vary widely in formality and structure (e.g., informal chat versus structured text captured in templates).
• Code is text with formal rules for structure and semantics, and a growing number of LLMs are being specialized to take advantage of this structure and semantics.
• Models are abstractions (e.g., from software design or architecture) that often use simple terms (e.g., publisher) that imply deep semantics.
• Images are used to communicate many software artifacts (e.g., class diagrams) and often employ visual conventions that, much like models, imply specific semantics. While LLMs operate on text, multimodal LLMs (e.g., GPT-4) are growing in their ability to ingest and generate image data.

Figure 1 shows an example of using the archetypes to generate ideas for LLM use cases in a particular domain. This example focuses on independent verification and validation (IV&V), a resource-intensive activity within the DoD that involves many different tasks that might benefit from the use of LLMs. More complex use cases for IV&V could also be generated that integrate multiple archetypes into a larger workflow.

Figure 1: Using Archetypes to Help Brainstorm Potential Use Cases
[Figure: the archetype grid of Table 2, annotated with the following example IV&V use cases.]

• An IV&V evaluator uses an LLM to analyze software design documents against a specific set of certification criteria and to generate a certification report, expecting it to describe certification violations that they will review to confirm.
• A developer uses an LLM to create a network view for authorization to operate (ATO) certification from a description of the architecture, expecting it to produce a rough network diagram they can refine.
• A tester uses an LLM to create integration test descriptions from a set of APIs and integration scenarios, expecting it to produce a set of test case descriptions that can be used to implement tests.
• An IV&V evaluator uses an LLM to create a verification checklist from a set of certification regulations and a system description, expecting it to produce a context-sensitive checklist they can tailor.
• A developer uses an LLM to find vulnerabilities in existing code, hoping that the exercise will catch additional issues not already found by static analysis tools.
• A new developer uses an LLM as a pair programmer to write code, expecting it to help create vulnerability-free code.

Figure 2: Two Ways to Look at Concerns with the Generation of Incorrect Results (A: Acquisition Use Cases, SE: Software Engineering Use Cases [Table 1])
[Figure: use cases from Table 1 placed notionally on two axes, one running from "mistakes have small consequences" to "mistakes have large consequences" and the other from "mistakes are easy for users to find" to "mistakes are hard for users to find".]

Concerns and How to Address Them

Recognizing concerns around applications of LLMs to software engineering and acquisition, and deciding how to address each, will help decision makers make more informed choices. There are multiple perspectives one should consider before going forward with an LLM use case. An important reality is that the results generated by LLMs are, in fact, sometimes wrong.

Figure 2 illustrates this perspective based on two questions:

• How significant would it be to act on an incorrect result in a given use case?
• How easy would it be for a user in the use case to recognize that a result from an LLM is incorrect?

This figure shows a notional placement of the use cases from Table 1 (actual placement would depend on refining these use cases). The green quadrant is ideal from this perspective: Mistakes are not particularly consequential and are relatively easy to spot. Use cases in this quadrant can be a great place for organizations to start LLM experimentation. The red quadrant, on the other hand, represents the least favorable cases for LLM use: Mistakes create real problems and are hard for users to recognize.
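The two questions behind Figure 2 can be turned into a simple triage rule. The sketch below is a hypothetical illustration. The 0-to-1 scores for consequence and difficulty of detection, and the 0.5 threshold, are assumptions that an organization would set through its own review of each use case. Only the green and red quadrants are characterized in the text, so the other two are simply labeled "mixed".

```python
from dataclasses import dataclass


@dataclass
class UseCaseRating:
    """Hypothetical ratings for one use case on the two axes of Figure 2."""
    name: str
    consequence: float  # 0 = mistakes have small consequences, 1 = large consequences
    difficulty: float   # 0 = mistakes are easy for users to find, 1 = hard to find


def quadrant(rating, threshold=0.5):
    """Place a use case in a quadrant; the threshold is an assumed cut-off."""
    high_consequence = rating.consequence >= threshold
    hard_to_find = rating.difficulty >= threshold
    if not high_consequence and not hard_to_find:
        return "green: a good place to start experimenting"
    if high_consequence and hard_to_find:
        return "red: least favorable for LLM use"
    return "mixed: one axis is favorable, the other is not"


if __name__ == "__main__":
    # Made-up scores for illustration only, not the placements shown in Figure 2.
    for r in (UseCaseRating("X", 0.2, 0.1), UseCaseRating("Y", 0.9, 0.8)):
        print(r.name, quadrant(r))
```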

Weighing the consequences of mistakes against the ease of spotting them is only one perspective for evaluation. Another perspective is the expected significance of the improvements or efficiencies achievable with LLMs. Among many concerns, we discuss five categories in further detail in this document (correctness, disclosure, usability, performance, and trust) because they are relevant to all use cases.

Correctness: This concern refers to the overall accuracy and precision of output relative to some known truth or expectation. The significance of correctness as a concern depends on factors such as how the results will be used, the safeguards used in workflows, and the expertise of users. Accuracy hinges greatly on whether an LLM was trained or fine-tuned with data that is sufficiently representative to support the specific use case. Even with rich training corpuses, some inaccuracy can be expected [Ouyang et al. 2023]. For example, a recent study on code translation found that GPT-4 performed better than other LLMs, even though more than 80% of translations on a pair of open source projects contained some errors. Advances are likely to reduce, but not eliminate, these error rates [Pan et al. 2023].
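Figures such as "more than 80% of translations contained some errors" come from checking each generated artifact against an oracle, such as a test suite. Below is a minimal sketch of that measurement. The three candidate functions are hypothetical stand-ins for LLM output, and `TESTS` is a small hand-written suite. The sketch counts the share of candidates that pass every test.

```python
def candidate_a(xs):
    """Hypothetical LLM output: a correct arithmetic mean."""
    return sum(xs) / len(xs)


def candidate_b(xs):
    """Hypothetical LLM output: off-by-one error in the divisor."""
    return sum(xs) / (len(xs) - 1) if len(xs) > 1 else float(xs[0])


def candidate_c(xs):
    """Hypothetical LLM output: integer division truncates the result."""
    return sum(xs) // len(xs)


# (input, expected output) pairs acting as the oracle.
TESTS = [([1, 2, 3], 2.0), ([5], 5.0), ([1, 2], 1.5)]


def passes_all(func, tests):
    """True if func matches every expected value; a raised exception counts as a failure."""
    for arg, expected in tests:
        try:
            if abs(func(arg) - expected) > 1e-9:
                return False
        except Exception:
            return False
    return True


def pass_rate(candidates, tests):
    """Fraction of candidates that pass the whole test suite."""
    return sum(passes_all(c, tests) for c in candidates) / len(candidates)


rate = pass_rate([candidate_a, candidate_b, candidate_c], TESTS)
print(f"{rate:.0%} of candidates pass all tests")  # 33% of candidates pass all tests
```

Note that `candidate_c` passes two of the three tests. A weak test suite would overstate correctness, so the quality of the oracle matters as much as the count.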

Disclosure: When users interact with LLMs, some use cases may require disclosing proprietary or sensitive information to a service provider to complete a task (e.g., sharing source code to help debug it). The disclosure concern therefore relates to the amount of proprietary information that must be exposed during use. If users share confidential data, trade secrets, or personal information, there is a risk that such data could be stored, misused, or accessed by unauthorized individuals. Moreover, it might become part of the training data corpus and be disseminated without users having any means to track its origin. For example, GSA CIO IL-23-01 (the U.S. General Services Administration instructional letter Security Policy for Generative Artificial Intelligence [AI] Large Language Models [LLMs]) bans disclosure of federal nonpublic information as inputs in prompts to third-party LLM endpoints [GSA 2023].
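One way to reduce disclosure is to abstract proprietary details before a prompt leaves the organization. Sensitive names and secret-like strings are replaced with placeholders, and the names are restored locally in the response. The sketch below assumes a hypothetical list of internal names and a deliberately simple pattern for credentials. It illustrates the idea only and is no substitute for an approved data-handling policy such as the GSA letter cited above.

```python
import re

# Hypothetical internal names that must not leave the organization.
SENSITIVE_TERMS = ["ProjectFalcon", "falcon-db.internal"]
# Deliberately simple pattern for credential-like assignments, e.g. api_key = "...".
SECRET_PATTERN = re.compile(r'(?i)\b(api_key|password|token)\s*=\s*"[^"]*"')


def redact(text):
    """Return (redacted_text, mapping), where mapping turns placeholders back into names."""
    mapping = {}
    for i, term in enumerate(SENSITIVE_TERMS):
        placeholder = f"<NAME_{i}>"
        if term in text:
            text = text.replace(term, placeholder)
            mapping[placeholder] = term
    # Secret values are dropped, not stored, so they can never be restored or leaked later.
    text = SECRET_PATTERN.sub(lambda m: f'{m.group(1)} = "<REDACTED>"', text)
    return text, mapping


def restore(text, mapping):
    """Put the original names back into a response that uses the placeholders."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


SNIPPET = 'conn = connect("falcon-db.internal")  # ProjectFalcon\napi_key = "abc123"'
safe, mapping = redact(SNIPPET)
print(safe)
```

Only `safe` would be included in a prompt. A keyword list like this misses anything it does not name, which is one reason the disclosure questions in this report still need a human answer.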

Usability: LLM users have vastly different backgrounds, expectations, and technical abilities. Usability captures the ability of LLM users with different levels of expertise to complete tasks. Users may need expertise on both the input side (crafting appropriate prompts) and the output side (judging the correctness of results) of LLM use [Zamfirescu-Pereira et al. 2023]. The significance of usability as a concern depends on how sensitive reaching acceptable results is to the expertise of users. A study of developers' early experiences using CoPilot indicates a shift from writing code to understanding code when LLMs are used on coding tasks [Bird et al. 2023]. This observation points to the need for different usability techniques for interaction mechanisms, as well as the need to account for expertise.
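On the input side, one way to lower the expertise a user needs is to capture a well-crafted prompt once as a reusable template. The sketch below assumes a hypothetical four-part structure (role, context, task, answer format); the section names are our own choice. The resulting string would be sent to whichever LLM or service the organization has approved.

```python
from string import Template

# Hypothetical reusable prompt: an experienced user writes it once, others fill in the fields.
REVIEW_PROMPT = Template(
    "You are acting as $role.\n"
    "Context: $context\n"
    "Task: $task\n"
    "Answer format: $output_format\n"
    "If you are not sure about something, say so instead of guessing."
)


def build_prompt(**fields):
    """Fill every placeholder; Template.substitute raises KeyError if one is missing."""
    return REVIEW_PROMPT.substitute(**fields)


if __name__ == "__main__":
    print(build_prompt(
        role="a senior Python reviewer",
        context="The function below parses structured input files.",
        task="List likely defects, each with the line it occurs on.",
        output_format="a numbered list of at most five items",
    ))
```

Failing loudly on a missing field is a deliberate choice: a half-filled prompt would still produce a plausible-looking answer that a less experienced user might not question.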

Performance: While using an LLM requires much less computing power than training one, responsiveness can still be a factor in LLM use, especially if sophisticated prompting approaches are incorporated into an LLM-based service. For the purposes of concerns related to use cases, performance expresses the time required to arrive at an appropriate response. Model size, underlying compute power, and where the model runs and is accessed from are among the factors that influence responsiveness [Patterson et al. 2022]. Services built on LLMs may introduce additional performance overhead due to the way in which other capabilities are integrated with the LLM.
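Because a service may chain several steps around the model, it helps to time each step rather than only the total. In the sketch below the three stages are hypothetical stand-ins, and the `time.sleep` calls only simulate work. In practice each stage would wrap a real retrieval step, model call, or post-processing step.

```python
import time


def retrieve_context(query):
    time.sleep(0.01)  # simulated document lookup (hypothetical stage)
    return f"context for {query}"


def call_model(prompt):
    time.sleep(0.05)  # simulated model inference (hypothetical stage)
    return f"answer based on {prompt}"


def postprocess(answer):
    time.sleep(0.005)  # simulated formatting and checks (hypothetical stage)
    return answer.strip()


def timed(timings, name, func, arg):
    """Call func(arg), record its wall-clock duration in seconds under `name`."""
    start = time.perf_counter()
    result = func(arg)
    timings[name] = time.perf_counter() - start
    return result


def timed_pipeline(query):
    """Run all stages; 'total' is the sum of the stage timings (helper overhead excluded)."""
    timings = {}
    context = timed(timings, "retrieve", retrieve_context, query)
    answer = timed(timings, "model", call_model, context)
    final = timed(timings, "postprocess", postprocess, answer)
    timings["total"] = sum(timings.values())
    return final, timings


if __name__ == "__main__":
    result, timings = timed_pipeline("status of open defects")
    for stage, seconds in timings.items():
        print(f"{stage:12s} {seconds * 1000:6.1f} ms")
```

A per-stage breakdown like this shows whether the time goes to the model itself or to intermediate steps added by the service.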

Trust: Trust reflects the user's confidence in the output. To employ the technology with the requisite level of trust, users must grasp the limitations of LLMs. Overreliance on an LLM without understanding its potential for error or bias can lead to undesirable consequences [Rastogi et al. 2023]. As a result, several other concerns (e.g., explainability, bias, privacy, security, and ethics) are often considered in relation to trust [Schwartz et al. 2023]. For example, the DoD published ethical AI principles to advance trustworthy AI systems [DoD 2020].

How significant these and other concerns are for each use case will vary by context and use. The questions provided in Table 3 can help organizations assess how relevant each concern is for a specific use case. A starting point could be to categorize the significance of each concern as High, Medium, or Low. This information can help organizations decide whether an LLM is fit for purpose and which concerns need to be mitigated to avoid unacceptable outcomes.

Table 3: Example Questions to Help Determine the Significance of Common Concerns for a Specific Use Case

Correctness
• What is the risk or impact of using an incorrect result in the use case?
• How difficult is it for the expected user to determine whether a result is correct?
• Are there gaps in the data used to train the LLM that could adversely impact results (e.g., the data is not current with recent technology releases or contains little data for an esoteric programming language)?

Disclosure
• Can an LLM be prompted without disclosing proprietary information (e.g., using generic questions or abstracting proprietary details)?
• What is the risk or impact of a third party being able to observe your prompts?
• Are there existing data disclosure constraints that strictly need to be observed?

Usability
• How adept are expected users at prompting an LLM?
• How familiar are expected users with approaches for determining whether results are inaccurate?
• How familiar are expected users with approaches for determining whether results are incomplete?

Performance
• How quickly must a user or machine be able to act on a result?
• Are there significant computing resource limitations?
• Are there intermediate steps in the interaction with the LLM that may affect end-to-end performance?

Trust
• Are your expected users predisposed to accept generated results (automation bias) or to reject them?
• Is the data the LLM was trained on free of bias and ethical concerns?
• Has the LLM been trained on data that is appropriate for use?
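Recording a High, Medium, or Low rating for each concern makes it easy to see which concerns must be mitigated before a use case proceeds. The sketch below is a minimal Python record for that purpose. The function name is hypothetical. The example ratings for use case SE7 are made up for illustration and are not an assessment from this report.

```python
from enum import Enum

CONCERNS = ("correctness", "disclosure", "usability", "performance", "trust")


class Significance(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def concerns_to_mitigate(ratings, at_least=Significance.HIGH):
    """Return concerns rated at or above `at_least`, most significant first.

    Every concern must be rated; ties keep the order of CONCERNS.
    """
    unrated = set(CONCERNS) - set(ratings)
    if unrated:
        raise ValueError(f"unrated concerns: {sorted(unrated)}")
    flagged = [c for c in CONCERNS if ratings[c].value >= at_least.value]
    return sorted(flagged, key=lambda c: -ratings[c].value)


# Hypothetical ratings for SE7 (translating C++ classes to Rust), for illustration only.
SE7_RATINGS = {
    "correctness": Significance.HIGH,
    "disclosure": Significance.MEDIUM,
    "usability": Significance.MEDIUM,
    "performance": Significance.LOW,
    "trust": Significance.HIGH,
}

print(concerns_to_mitigate(SE7_RATINGS))  # ['correctness', 'trust']
```

Lowering `at_least` to `Significance.MEDIUM` widens the list for organizations that want to plan mitigations more conservatively.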

These common concerns, and the questions to determine their significance, enable identification of common tactics for addressing each concern. A tactic is a course of action that can be taken to reduce the occurrence or impact of a concern. Table 4 summarizes a collection of tactics that can help mitigate each concern, along with a rough estimate (High [H], Medium [M], or Low [L]) of the relative potential cost of using each tactic. Typically, the more resources (human and computational) a tactic requires, the higher the cost. For example, prompt engineering and model training both address correctness, but prompt engineering is typically much less expensive. Of note, some tactics (purple rows) focus on technical interventions, others (green rows) focus on human-centered actions, and the rest (gray rows) could employ technical or human-centered interventions.

Table 4: Tactics That Can Be Used to Address Common Concerns with LLM Use

Correctness
• Prompt engineering: Educate users on prompt engineering techniques and patterns to generate better results.
• Validate manually: Dedicate time to allow users to carefully validate interim and final results.
• Adjust settings: Change settings of exposed model parameters like temperature (randomness of the model's output) and the maximum number of tokens.
• Adopt newer model: Use newer models that integrate technical advances or improved training corpuses that can produce better results.
• Fine-tune model: Tailor a pretrained model using organization- or domain-specific data to improve results.
• Train new model: Use a custom training corpus or proprietary data to train a new model.

Disclosure
• Open disclosure policy: Establish a policy that allows users to share as much deta
