Assessing Opportunities for LLMs in Software Engineering and Acquisition

Authors

Stephany Bellomo, Shen Zhang, James Ivers, Julie Cohen, Ipek Ozkaya

NOVEMBER 2023


Large language models (LLMs) are generative artificial intelligence (AI) models that have been trained on massive corpuses of text data and can be prompted to generate new, plausible content. LLMs are seeing rapid advances, and they promise to improve productivity in many fields. OpenAI’s GPT-4¹ and Google’s LaMDA² are the underlying LLMs of services like ChatGPT³, CoPilot⁴, and Bard⁵. These services can perform a range of tasks, including generating human-like text responses to questions, summarizing artifacts, and generating working code. These models and services are the focus of extensive research efforts across industry, government, and academia to improve their capabilities and relevance, and organizations in many domains are rigorously exploring their use to uncover potential applications.

The idea of harnessing LLMs to enhance the efficiency of software engineering and acquisition activities holds special allure for organizations with large software operations, such as the Department of Defense (DoD), as doing so offers the promise of substantial resource optimization. Potential use cases for LLMs are plentiful, but knowing how to assess the benefits and risks associated with their use is nontrivial. Notably, to gain access to the latest advances, organizations may need to share proprietary data (e.g., source code) with service providers. Understanding such implications is central to intentional and responsible use of LLMs, especially for organizations managing sensitive information.

In this document, we examine how decision makers, such as technical leads and program managers, can assess the fitness of LLMs to address software engineering and acquisition needs [Ozkaya 2023]. We first introduce exemplar scenarios in software engineering and software acquisition and identify common archetypes. We describe common concerns involving the use of LLMs and enumerate tactics for mitigating those concerns. Using these common concerns and tactics, we demonstrate how decision makers can assess the fitness of LLMs for their own use cases through two examples.

Capabilities of LLMs, risks concerning their use, and our collective understanding of emerging services and models are evolving rapidly [Brundage et al. 2022]. While this document is not meant to be comprehensive in covering all software engineering and acquisition use cases, their concerns, and mitigation tactics, it demonstrates an approach that decision makers can use to think through their own LLM use cases as this space evolves.

1 /research/gpt-4
2 https://blog.google/technology/ai/lamda/
3
4 /features/copilot
5

What Is an LLM?

An LLM is a deep neural network model trained on an extensive corpus of diverse documents (e.g., websites and books) to learn language patterns, grammar rules, facts, and even some reasoning abilities [Wolfram 2023]. LLMs can generate responses to inputs (“prompts”) by iteratively determining the next word or phrase appearing after others based on the prompt and patterns and associations learned from their training corpus using probabilistic and randomized selection [White et al. 2023]. This capability allows LLMs to generate human-like text that can be surprisingly coherent and contextually relevant, even if they may not always be semantically correct.
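The iterative, randomized next-word selection described above can be illustrated with a small sketch. It is not how any particular LLM is implemented: the score table `TOY_SCORES`, the `<end>` marker, and the function names are all hypothetical stand-ins for a trained model, and the temperature parameter follows the common softmax-with-temperature convention.

```python
import math
import random

def sample_next_token(scores, temperature=1.0, rng=None):
    """Choose one token from a mapping of token -> raw score."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Zero temperature is treated here as greedy selection.
        return max(scores, key=scores.get)
    # Softmax with temperature; subtracting the peak avoids overflow.
    peak = max(scores.values())
    tokens = list(scores)
    weights = [math.exp((scores[t] - peak) / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def generate(prompt_tokens, score_fn, max_tokens=5, temperature=1.0, seed=0):
    """Extend the prompt one token at a time until <end> or the token limit."""
    rng = random.Random(seed)
    output = list(prompt_tokens)
    for _ in range(max_tokens):
        token = sample_next_token(score_fn(output), temperature, rng)
        if token == "<end>":
            break
        output.append(token)
    return output

# Hypothetical toy "model": scores depend only on the previous token.
TOY_SCORES = {
    "the": {"code": 2.0, "test": 1.5},
    "code": {"compiles": 2.0, "fails": 1.0},
    "test": {"passes": 2.0, "fails": 1.0},
}

def toy_score_fn(tokens):
    # Unknown contexts can only end the sequence.
    return TOY_SCORES.get(tokens[-1], {"<end>": 1.0})

print(generate(["the"], toy_score_fn, temperature=0))
print(generate(["the"], toy_score_fn, temperature=1.0, seed=1))
```

With a temperature of 0 the sketch always follows the highest-scoring path. With a positive temperature the same prompt can yield different continuations, which is one reason generated text can be plausible without being correct.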

While LLMs can perform complex tasks using their trained knowledge, they lack true understanding. Rather, they are sophisticated pattern matching tools. Moreover, due to their probabilistic reasoning, they can generate inaccurate results (often referred to as “hallucinations”), such as citations to nonexistent references or method calls to nonexistent application programming interfaces (APIs). While LLMs can perform analysis and inferencing on new data they have been prompted with, data on which LLMs have been trained can limit their accuracy. However, the technology is rapidly advancing with new models having increasing complexity and parameters, and benchmarks have already emerged for comparing their performance [Imsys 2023]. In addition, LLM service providers are working on ways to use more recent data [D’Cruze 2023]. Despite these limitations, there are productive uses of LLMs today.

Choosing an LLM

There are already dozens of LLMs and services built using LLMs, and more emerge every day. These models vary in many dimensions, from technical to contractual, and the details of these differences can be difficult to keep straight. The following distinctions are a good starting point when choosing an LLM for use.

Model or Service. ChatGPT is a chatbot built on OpenAI’s GPT family of LLMs [OpenAI 2023]. The difference is important, as services built on LLMs can add capabilities (e.g., specialized chatbot features, specialized training beyond the core LLM, or non-LLM features that can improve results from an LLM). A service like ChatGPT is typically hosted by a service provider, meaning that the provider manages the computing resources (and associated costs) and that users are typically required to send their prompts (and potentially sensitive data) to the service provider to use the service. A model, like Meta’s Llama 2⁶, can be fine-tuned with domain- or organization-specific data to improve accuracy, but it typically lacks the added features and resources of a commercially supported service.

6 /llama/


General or Specialized. LLMs are pre-trained on a corpus, and the composition of that corpus is a significant factor affecting an LLM’s performance. General LLMs are trained on text sources like Wikipedia that are available to the public. Specialized LLMs fine-tune those models by adding training material from specific domains like healthcare and finance [Zhou et al. 2022; Wu et al. 2023]. LLMs like CodeGen⁷ have been specialized with large corpuses of source code for use in software engineering.

Open Source or Proprietary. Open source LLMs provide a platform for researchers and developers to freely access, use, and even contribute to the model’s development. Proprietary LLMs are subject to varying restrictions on use, making them less open to experimentation or potential deployment. Some providers (e.g., Meta) use a license that is largely, but not completely, open [Hull 2023]. OpenAI offers a different compromise: While the GPT series of LLMs is not open source, OpenAI does permit fine-tuning (for a fee) as a means of specialization and limited experimentation with their proprietary model.

The field of LLMs is a fast-moving space. Moreover, the ethics and regulations surrounding their use are also in a state of flux, as society grapples with the challenges and opportunities these powerful models present. Keeping apprised of these developments is crucial for taking advantage of the potential offered by LLMs.

7 /salesforce/CodeGen

Use Cases

The ability of LLMs to generate plausible content for text and code applications has sparked the imaginations of many. A recent literature review examines 229 research papers written since 2017 on the application of LLMs to software engineering problems [Hou et al. 2023]. Application areas span requirements, design, development, testing, maintenance, and management activities, with development and testing being the most common.

Our team, which works with government organizations daily, took a broader perspective and brainstormed several dozen ideas for using LLMs in common software engineering and acquisition activities (see Table 1 for examples). Two important observations quickly emerged from this activity. First, most use cases represent human-AI partnerships in which an LLM or LLM-based service could be used to help humans (as opposed to replace humans) complete tasks more quickly. Second, deciding which use cases would be most feasible, beneficial, or affordable is not a trivial decision for those just getting started with LLMs.


Table 1: Sample Acquisition and Software Engineering Use Cases

ACQUISITION USE CASES

A1. A new acquisition specialist uses an LLM to generate an overview of relevant federal regulations for an upcoming request for proposal (RFP) review, expecting the summary to save time in background reading.

A2. A chief engineer uses an LLM to generate a comparison of alternatives from multiple proposals, expecting it to use the budget and schedule formulas from previous similar proposal reviews and generate accurate itemized comparisons.

A3. A contract specialist uses an LLM to generate ideas for a request for information (RFI) solicitation given a set of concerns and vague problem description, expecting it to generate a draft RFI that is at least 75% aligned with their needs.

A4. A CTO uses an LLM to create a report summarizing all uses of digital engineering technologies in the organization based on internal documents, expecting it can quickly produce a clear summary that is at least 90% correct.

A5. A program office lead uses an LLM to evaluate a contractor’s code delivery for compliance with required design patterns, expecting that it will identify any instances in which the code fails to use required patterns.

A6. A program manager uses an LLM to summarize a set of historical artifacts from the past six months in preparation for a high-visibility program review and provides specific retrieval criteria (e.g., delivery tempo, status of open defects, and schedule), expecting it to generate an accurate summary of program status that complies with the retrieval criteria.

A7. A program manager uses an LLM to generate a revised draft of a statement of work, given a short starting description and a list of concerns (e.g., cybersecurity, software delivery tempo, and interoperability goals). The program manager expects it to generate a structure that can be quickly refined and that includes topics drawn from best practices they may not think to request explicitly.

A8. A requirements engineer uses an LLM to generate draft requirements statements for a program upgrade based on past similar capabilities, expecting them to be a good starting point.

A9. A contract officer is seeking funding to conduct research on a high-priority topic they are not familiar with. The contract officer uses an LLM to create example project descriptions for their context, expecting it to produce reasonable descriptions.

SOFTWARE ENGINEERING USE CASES

SE1. A developer uses an LLM to find vulnerabilities in existing code, hoping that the exercise will catch additional issues not already found by static analysis tools.

SE2. A developer uses an LLM to generate code that parses structured input files and performs specified numerical analysis on its inputs, expecting it to generate code with the desired capabilities.

SE3. A tester uses an LLM to create functional test cases, expecting it to produce a set of text test cases from a provided requirements document.

SE4. A developer uses an LLM to generate software documentation from code to be maintained, expecting it to summarize its functionality and interface.

SE5. A software engineer who is unfamiliar with SQL uses an LLM to generate an SQL query from a natural language description, expecting it to generate a correct query that can be tested immediately.

SE6. A software architect uses an LLM to validate whether code that is ready for deployment is consistent with the system’s architecture, expecting that it will reliably catch deviations from the intended architecture.

SE7. A developer uses an LLM to translate several classes from C++ to Rust, expecting that the translated code will pass the same tests and be more secure and memory safe.

SE8. A developer uses an LLM to generate synthetic test data for a new feature being developed, expecting that it will quickly generate syntactically correct and representative data.

SE9. A developer provides an LLM with code that is failing in production and a description of the failures, expecting it to help the developer diagnose the root cause and propose a fix.

Archetypes

Commonalities among the use cases lend themselves to abstracting the set into a manageable number of archetypes. Two dimensions are helpful in this regard: the nature of the activity an LLM is performing and the nature of the data that the LLM is acting on. Taking the cross-product of these dimensions, these use cases fall into the archetypes depicted in Table 2.

Table 2: Use Case Archetypes

ACTIVITY TYPE          DATA TYPE
                       Text            Code            Model            Images
Retrieve Information   retrieve-text   retrieve-code   retrieve-model   retrieve-images
Generate Artifact      generate-text   generate-code   generate-model   generate-images
Modify Artifact        modify-text     modify-code     modify-model     modify-images
Analyze Artifact       analyze-text    analyze-code    analyze-model    analyze-images
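As a rough illustration of the cross-product behind Table 2, the sketch below enumerates the sixteen archetype names and groups tagged use cases under them. The tags in `EXAMPLE_TAGS` are our own illustrative reading of three Table 1 use cases, not assignments made in this document, and the function names are hypothetical.

```python
from itertools import product

ACTIVITY_TYPES = ["retrieve", "generate", "modify", "analyze"]
DATA_TYPES = ["text", "code", "model", "images"]

def archetypes():
    """Return every activity-data combination, e.g. 'analyze-code'."""
    return [f"{activity}-{data}" for activity, data in product(ACTIVITY_TYPES, DATA_TYPES)]

def group_by_archetype(tags):
    """Group use-case IDs by archetype, rejecting unknown archetype names."""
    known = set(archetypes())
    groups = {}
    for use_case, archetype in tags.items():
        if archetype not in known:
            raise ValueError(f"unknown archetype: {archetype}")
        groups.setdefault(archetype, []).append(use_case)
    return {name: sorted(ids) for name, ids in groups.items()}

# Illustrative assumption: one plausible tagging of three Table 1 use cases.
EXAMPLE_TAGS = {
    "SE1": "analyze-code",   # find vulnerabilities in existing code
    "SE5": "generate-code",  # produce an SQL query from a description
    "SE7": "modify-code",    # translate classes from C++ to Rust
}

print(len(archetypes()))
print(group_by_archetype(EXAMPLE_TAGS))
```

Keeping the archetype names in one place makes it straightforward for an organization to attach successes, gaps, and lessons learned to the same keys over time.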


Matching a specific use to an archetype helps identify common concerns among similar use cases and known solutions commonly applied for similar use cases. Archetypes can be a tool that organizations use to group successes, gaps, and lessons learned in a structured way.

Activity Type captures differences in associations that an LLM would need to make to support a use case, with some asking an LLM to do things that a language model was not designed to do:

• Retrieve Information asks an LLM to construct a response to a question (e.g., what’s the Observer pattern?) for which a known answer is likely found in the training corpus, directly or across related elements.

• Generate Artifact asks an LLM to create a new artifact (e.g., a summary of a topic or a Python script that performs a statistical analysis) that likely bears similarity with existing examples in the corpus.

• Modify Artifact asks an LLM to modify an existing artifact to improve it in some way (e.g., translate Python code to Java or remove a described bug) that resembles analogous improvements among artifacts in the training corpus.

• Analyze Artifact asks an LLM to draw a conclusion about provided information (e.g., what vulnerabilities are in this code or will this architecture scale adequately?) that likely requires semantic reasoning about data.

Data Type captures differences in the kind of data that an LLM operates on or generates, such as the differences in semantic rules that make data (e.g., code) well-formed:

• Text inputs vary widely in formality and structure (e.g., informal chat versus structured text captured in templates).

• Code is text with formal rules for structure and semantics, and a growing number of LLMs are being specialized to take advantage of this structure and semantics.

• Models are abstractions (e.g., from software design or architecture) that often use simple terms (e.g., publisher) that imply deep semantics.

• Images are used to communicate many software artifacts (e.g., class diagrams) and often employ visual conventions that, much like models, imply specific semantics. While LLMs operate on text, multimodal LLMs (e.g., GPT-4) are growing in their ability to ingest and generate image data.

Figure 1 shows an example of using the archetypes to generate ideas for LLM use cases in a particular domain. This example focuses on independent verification and validation (IV&V), a resource-intensive activity within the DoD that involves many different activities that might benefit from the use of LLMs. More complex use cases for IV&V could also be generated that involve integration of multiple archetypes into a larger workflow.

[Figure: the archetype grid from Table 2, with six example IV&V use cases marked on individual cells.]

• An IV&V evaluator uses an LLM to analyze software design documents against a specific set of certification criteria and to generate a certification report, expecting it to describe certification violations that they will review to confirm.

• A developer uses an LLM to create a network view for authorization to operate (ATO) certification from a description of the architecture, expecting it to produce a rough network diagram they can refine.

• A tester uses an LLM to create integration test descriptions from a set of APIs and integration scenarios, expecting it to produce a set of test case descriptions that can be used to implement tests.

• An IV&V evaluator uses an LLM to create a verification checklist from a set of certification regulations and a system description, expecting it to produce a context-sensitive checklist they can tailor.

• A developer uses an LLM to find vulnerabilities in existing code, hoping that the exercise will catch additional issues not already found by static analysis tools.

• A new developer uses an LLM as a pair programmer to write code, expecting it to help create vulnerability-free code.

Figure 1: Using Archetypes to Help Brainstorm Potential Use Cases

[Figure: a two-axis chart placing the Table 1 use cases. One axis runs from “Mistakes have small consequences” to “Mistakes have large consequences”; the other runs from “Mistakes are hard for users to find” to “Mistakes are easy for users to find.”]

Figure 2: Two Ways to Look at Concerns with the Generation of Incorrect Results (A: Acquisition Use Cases, SE: Software Engineering Use Cases [Table 1])

Concerns and How to Address Them

Recognizing concerns around applications of LLMs to software engineering and acquisition, and deciding how to address each, will help decision makers make more informed choices. There are multiple perspectives one should consider before going forward with an LLM use case. An important reality is that the results generated by LLMs are in fact sometimes wrong. Figure 2 illustrates this perspective based on two questions:

• How significant would it be to act on an incorrect result in a given use case?

• How easy would it be for a user in the use case to recognize that a result from an LLM is incorrect?

This figure shows a notional placement of the use cases from Table 1 (actual placement would be reliant on refinement of these use cases). The green quadrant is ideal from this perspective: Mistakes are not particularly consequential and relatively easy to spot. Use cases in this quadrant can be a great place for organizations to start LLM experimentation. The red quadrant, on the other hand, represents the least favorable cases for LLM use: Mistakes create real problems and are hard for users to recognize.
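The two questions can be turned into a simple triage aid. The sketch below is only one possible encoding: the 0-to-1 scores, the 0.5 threshold, the "mixed" label for the two unnamed quadrants, and the example ratings are all assumptions made for illustration, not placements taken from Figure 2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UseCaseRating:
    name: str
    consequence: float      # 0.0 = small consequences, 1.0 = large consequences
    ease_of_finding: float  # 0.0 = mistakes hard to find, 1.0 = easy to find

def quadrant(rating, threshold=0.5):
    """Map a rating onto the green, red, or mixed region of the chart."""
    large = rating.consequence >= threshold
    easy = rating.ease_of_finding >= threshold
    if not large and easy:
        return "green"  # low stakes and easy to check: good place to start
    if large and not easy:
        return "red"    # high stakes and hard to check: least favorable
    return "mixed"      # one favorable answer and one unfavorable answer

# Hypothetical ratings, for illustration only.
ratings = [
    UseCaseRating("example-draft-generation", consequence=0.2, ease_of_finding=0.8),
    UseCaseRating("example-conformance-check", consequence=0.9, ease_of_finding=0.2),
    UseCaseRating("example-query-generation", consequence=0.7, ease_of_finding=0.9),
]

for r in ratings:
    print(r.name, quadrant(r))
```

In practice the scores would come from discussion among the people who own the use case, and the threshold is a judgment call rather than a fixed rule.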

The consequences of mistakes and the ease of spotting them are only one perspective of evaluation. Another perspective is the expected significance of improvements or efficiencies achievable with LLMs. Among many concerns, we discuss five categories in further detail in this document (correctness, disclosure, usability, performance, and trust), as they are relevant to all use cases.

Correctness: The significance of correctness as a concern depends on factors such as how the results will be used, the safeguards used in workflows, and the expertise of users. Correctness refers to the overall accuracy and precision of output relative to some known truth or expectation. Accuracy hinges greatly on whether an LLM was trained or fine-tuned with data that is sufficiently representative to support the specific use case. Even with rich training corpuses, some inaccuracy can be expected [Ouyang et al. 2023]. For example, a recent study on code translation found GPT-4 to perform better than other LLMs, even though more than 80% of translations on a pair of open source projects contained some errors. Advances are likely to improve, but not eliminate, these numbers [Pan et al. 2023].


Disclosure: When users interact with LLMs, some use cases may require disclosing proprietary or sensitive information to a service provider to complete a task (e.g., sharing source code to help debug it). The disclosure concern is therefore related to the amount of proprietary information that must be exposed during use. If users share confidential data, trade secrets, or personal information, there is a risk that such data could be stored, misused, or accessed by unauthorized individuals. Moreover, it might become part of the training data corpus and disseminated without users having any means to track its origin. For example, GSA CIO IL-23-01 (the U.S. General Services Administration instructional letter Security Policy for Generative Artificial Intelligence [AI] Large Language Models [LLMs]) bans disclosure of federal nonpublic information as inputs in prompts to third-party LLM endpoints [GSA 2023].
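One way to reduce what is exposed is to abstract proprietary names out of a prompt before it leaves the organization. The sketch below shows the idea with plain string substitution. The function names, the `<TERM_n>` placeholder format, and the example terms are hypothetical, and this kind of substitution does not by itself satisfy any disclosure policy, since the surrounding text can still reveal sensitive details.

```python
import re

def abstract_prompt(prompt, sensitive_terms):
    """Replace each sensitive term with a placeholder and return the mapping."""
    mapping = {}
    redacted = prompt
    # Longer terms go first so a short term never clobbers part of a longer one.
    ordered = sorted(sensitive_terms, key=len, reverse=True)
    for index, term in enumerate(ordered, start=1):
        placeholder = f"<TERM_{index}>"
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        if pattern.search(redacted):
            redacted = pattern.sub(placeholder, redacted)
            mapping[placeholder] = term
    return redacted, mapping

def restore(text, mapping):
    """Put the original terms back into a response that uses the placeholders."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text

# Hypothetical project and component names, for illustration only.
original = "Why does ProjectFalcon fail when FalconDB times out?"
redacted, mapping = abstract_prompt(original, ["FalconDB", "ProjectFalcon"])
print(redacted)
print(restore(redacted, mapping))
```

Matching is case-insensitive, so `restore` writes back the spelling given in the term list rather than the casing used in the prompt. A production approach would also need to handle identifiers embedded in code, file paths, and data values.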

Usability: LLM users have vastly different backgrounds, expectations, and technical abilities. Usability captures the ability of LLM users with different expertise to complete tasks. Users may need expertise on both the input (crafting appropriate prompts) and output (judging the correctness of results) sides of LLM use [Zamfirescu-Pereira et al. 2023]. The significance of usability as a concern depends on the degree to which getting to acceptable results is sensitive to the expertise of users. A study of developers’ early experiences using CoPilot found a shift from writing code to understanding code when using LLMs on coding tasks [Bird et al. 2023]. This observation hints at the need for different usability techniques for interaction mechanisms, as well as the need to account for expertise.

Performance: While using an LLM requires much less computing power than training an LLM, responsiveness can still be a factor in LLM use, especially if sophisticated prompting approaches are incorporated into an LLM-based service. For the purposes of concerns related to use cases, performance expresses the time required to arrive at an appropriate response. Model size, underlying compute power, and where the model runs and is accessed from are among the factors that influence responsiveness [Patterson et al. 2022]. Services built on LLMs may introduce additional performance overhead due to the way in which other capabilities are integrated with the LLM.

Trust: To employ the technology with the requisite level of trust, users must grasp the limitations of LLMs. Trust reflects the user’s confidence in the output. Overreliance on an LLM without understanding its potential for error or bias can lead to undesirable consequences [Rastogi et al. 2023]. As a result, several other concerns (e.g., explainability, bias, privacy, security, and ethics) are often considered in relationship to trust [Schwartz et al. 2023]. For example, the DoD published ethical AI principles to advance trustworthy AI systems [DoD 2020].

How significant these and other concerns are for each use case will vary by context and use. The questions provided in Table 3 can help organizations assess how relevant each concern is for a specific use case. A starting point could be to categorize the significance of each concern as High, Medium, or Low. This information can help organizations decide whether an LLM is fit for purpose and what concerns need to be mitigated to avoid unacceptable outcomes.

Table 3: Example Questions to Help Determine the Significance of Common Concerns for a Specific Use Case

CONCERN: SIGNIFICANCE QUESTIONS

Correctness

• What is the risk or impact of using an incorrect result in the use case?

• How difficult is it for the expected user to determine whether a result is correct?

• Are there gaps in the data used to train the LLM that could adversely impact results (e.g., the data is not current with recent technology releases or contains little data for an esoteric programming language)?

Disclosure

• Can an LLM be prompted without disclosing proprietary information (e.g., using generic questions or abstracting proprietary details)?

• What is the risk or impact of a third party being able to observe your prompts?

• Are there existing data disclosure constraints that strictly need to be observed?

Usability

• How adept are expected users at prompting an LLM?

• How familiar are expected users with approaches for determining whether results are inaccurate?

• How familiar are expected users with approaches for determining whether results are incomplete?

Performance

• How quickly must a user or machine be able to act on a result?

• Are there significant computing resource limitations?

• Are there intermediate steps in the interaction with the LLM that may affect end-to-end performance?

Trust

• Are your expected users predisposed to accept generated results (automation bias) or reject them?

• Is the data the LLM was trained on free of bias and ethical concerns?

• Has the LLM been trained on data that is appropriate for use?
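The High, Medium, or Low categorization suggested above can be recorded in a small structure so that the concerns needing mitigation fall out directly. This is a minimal sketch: the `Significance` enum, the function name, the Medium cutoff, and the example ratings are assumptions made for illustration.

```python
from enum import IntEnum

class Significance(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

CONCERNS = ("correctness", "disclosure", "usability", "performance", "trust")

def concerns_to_mitigate(ratings, at_least=Significance.MEDIUM):
    """List concerns rated at or above the cutoff, in the document's order."""
    missing = [c for c in CONCERNS if c not in ratings]
    if missing:
        raise ValueError(f"no rating given for: {', '.join(missing)}")
    return [c for c in CONCERNS if ratings[c] >= at_least]

# Hypothetical assessment of a single use case, for illustration only.
example_ratings = {
    "correctness": Significance.HIGH,
    "disclosure": Significance.HIGH,
    "usability": Significance.LOW,
    "performance": Significance.LOW,
    "trust": Significance.MEDIUM,
}

print(concerns_to_mitigate(example_ratings))
print(concerns_to_mitigate(example_ratings, at_least=Significance.HIGH))
```

Requiring a rating for every concern makes an unanswered question visible instead of silently treating it as Low.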


These common concerns, and questions to determine their significance, enable identification of common tactics for addressing each concern. A tactic is a course of action that can be taken to reduce the occurrence or impact of a concern. Table 4 summarizes a collection of tactics that can help mitigate each concern, along with a rough estimate (High [H], Medium [M], or Low [L]) of the relative potential cost of using each tactic. Typically, the more resources (human and computation) a tactic requires, the higher the cost. For example, prompt engineering and model training both address correctness, but prompt engineering is typically much less expensive. Of note, some tactics (purple rows) focus on technical interventions, others (green rows) focus on human-centered actions, and the rest (gray rows) could employ technical or human-centered interventions.

Table 4: Tactics That Can Be Used to Address Common Concerns with LLM Use

CONCERN: TACTIC, DESCRIPTION, COST

Correctness

• Prompt engineering: Educate users on prompt engineering techniques and patterns to generate better results. Cost: L

• Validate manually: Dedicate time to allow users to carefully validate interim and final results. Cost: M

• Adjust settings: Change settings of exposed model parameters like temperature (randomness of the model’s output) and the maximum number of tokens. Cost: L

• Adopt newer model: Use newer models that integrate technical advances or improved training corpuses that can produce better results. Cost: M

• Fine-tune model: Tailor a pretrained model using organization- or domain-specific data to improve results. Cost: M

• Train new model: Use a custom training corpus or proprietary data to train a new model. Cost: H

Disclosure

• Open disclosure policy: Establish a policy that allows users to share as much deta
