Assessing Opportunities for LLMs in Software Engineering and Acquisition

Authors: Stephany Bellomo, Shen Zhang, James Ivers, Julie Cohen, Ipek Ozkaya

NOVEMBER 2023
LARGE LANGUAGE MODELS (LLMS) ARE GENERATIVE ARTIFICIAL INTELLIGENCE (AI) MODELS that have been trained on massive corpuses of text data and can be prompted to generate new, plausible content. LLMs are seeing rapid advances, and they promise to improve productivity in many fields. OpenAI's GPT-4¹ and Google's LaMDA² are the underlying LLMs of services like ChatGPT³, CoPilot⁴, and Bard⁵. These services can perform a range of tasks, including generating human-like text responses to questions, summarizing artifacts, and generating working code. These models and services are the focus of extensive research efforts across industry, government, and academia to improve their capabilities and relevance, and organizations in many domains are rigorously exploring their use to uncover potential applications.

The idea of harnessing LLMs to enhance the efficiency of software engineering and acquisition activities holds special allure for organizations with large software operations, such as the Department of Defense (DoD), as doing so offers the promise of substantial resource optimization. Potential use cases for LLMs are plentiful, but knowing how to assess the benefits and risks associated with their use is nontrivial. Notably, to gain access to the latest advances, organizations may need to share proprietary data (e.g., source code) with service providers. Understanding such implications is central to intentional and responsible use of LLMs, especially for organizations managing sensitive information.

In this document, we examine how decision makers, such as technical leads and program managers, can assess the fitness of LLMs to address software engineering and acquisition needs [Ozkaya 2023]. We first introduce exemplar scenarios in software engineering and software acquisition and identify common archetypes. We describe common concerns involving the use of LLMs and enumerate tactics for mitigating those concerns. Using these common concerns and tactics, we demonstrate how decision makers can assess the fitness of LLMs for their own use cases through two examples.

Capabilities of LLMs, risks concerning their use, and our collective understanding of emerging services and models are evolving rapidly [Brundage et al. 2022]. While this document is not meant to be comprehensive in covering all software engineering and acquisition use cases, their concerns, and mitigation tactics, it demonstrates an approach that decision makers can use to think through their own LLM use cases as this space evolves.
¹ /research/gpt-4
² https://blog.google/technology/ai/lamda/
⁴ /features/copilot
What Is an LLM?

An LLM is a deep neural network model trained on an extensive corpus of diverse documents (e.g., websites and books) to learn language patterns, grammar rules, facts, and even some reasoning abilities [Wolfram 2023]. LLMs can generate responses to inputs ("prompts") by iteratively determining the next word or phrase appearing after others based on the prompt and patterns and associations learned from their training corpus, using probabilistic and randomized selection [White et al. 2023]. This capability allows LLMs to generate human-like text that can be surprisingly coherent and contextually relevant, even if it may not always be semantically correct.
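This iterative, probabilistic selection can be illustrated with a toy next-token sampler. The token scores below are invented for illustration (a real LLM computes them from billions of parameters), but the temperature-scaled softmax and weighted random choice mirror how decoding actually works:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, seed=None):
    """Pick the next token from raw scores, as an LLM decoding loop does.

    `logits` maps candidate tokens to raw scores (hypothetical values here).
    Lower temperature sharpens the distribution; higher temperature flattens it.
    """
    rng = random.Random(seed)
    # Softmax with temperature: convert raw scores to probabilities.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_s = max(scaled.values())  # subtract the max for numerical stability
    exp = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(exp.values())
    probs = {tok: e / total for tok, e in exp.items()}
    # Randomized selection weighted by probability.
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Toy scores for a prompt like "The capital of France is" (illustrative only).
logits = {"Paris": 9.0, "Lyon": 3.0, "pizza": 0.5}
print(sample_next_token(logits, temperature=0.7, seed=0))  # prints "Paris" with this seed
```

Because the selection is randomized, the same prompt can yield different outputs across runs, which is one reason LLM results must be validated rather than assumed correct.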
While LLMs can perform complex tasks using their trained knowledge, they lack true understanding. Rather, they are sophisticated pattern-matching tools. Moreover, due to their probabilistic reasoning, they can generate inaccurate results (often referred to as "hallucinations"), such as citations to non-existent references or method calls to nonexistent application programming interfaces (APIs). While LLMs can perform analysis and inferencing on new data they have been prompted with, the data on which LLMs have been trained can limit their accuracy. However, the technology is rapidly advancing, with new models having increasing complexity and parameters, and benchmarks have already emerged for comparing their performance [Imsys 2023]. In addition, LLM service providers are working on ways to use more recent data [D'Cruze 2023]. Despite these limitations, there are productive uses of LLMs today.
Choosing an LLM

There are already dozens of LLMs and services built using LLMs, and more emerge every day. These models vary in many dimensions, from technical to contractual, and the details of these differences can be difficult to keep straight. The following distinctions are a good starting point when choosing an LLM for use.

Model or Service. ChatGPT is a chatbot built on OpenAI's GPT family of LLMs [OpenAI 2023]. The difference is important, as services built on LLMs can add additional capabilities (e.g., specialized chatbot features, specialized training beyond the core LLM, or non-LLM features that can improve results from an LLM). A service like ChatGPT is typically hosted by a service provider, meaning that it manages the computing resources (and associated costs) and that users are typically required to send their prompts (and potentially sensitive data) to the service provider to use the service. A model, like Meta's Llama 2⁶, can be fine-tuned with domain- or organization-specific data to improve accuracy, but it typically lacks the added features and resources of a commercially supported service.
⁶ /llama/
General or Specialized. LLMs are pre-trained on a corpus, and the composition of that corpus is a significant factor affecting an LLM's performance. General LLMs are trained on text sources like Wikipedia that are available to the public. Specialized LLMs fine-tune those models by adding training material from specific domains like healthcare and finance [Zhou et al. 2022; Wu et al. 2023]. LLMs like CodeGen⁷ have been specialized with large corpuses of source code for use in software engineering.
Open Source or Proprietary. Open source LLMs provide a platform for researchers and developers to freely access, use, and even contribute to the model's development. Proprietary LLMs are subject to varying restrictions on use, making them less open to experimentation or potential deployment. Some providers (e.g., Meta) use a license that is largely, but not completely, open [Hull 2023]. OpenAI offers a different compromise: While the GPT series of LLMs is not open source, OpenAI does permit fine-tuning (for a fee) as a means of specialization and limited experimentation with their proprietary model.

The field of LLMs is a fast-moving space. Moreover, the ethics and regulations surrounding their use are also in a state of flux, as society grapples with the challenges and opportunities these powerful models present. Keeping apprised of these developments is crucial for taking advantage of the potential offered by LLMs.
⁷ /salesforce/CodeGen
Use Cases

The ability of LLMs to generate plausible content for text and code applications has sparked the imaginations of many. A recent literature review examines 229 research papers written since 2017 on the application of LLMs to software engineering problems [Hou et al. 2023]. Application areas span requirements, design, development, testing, maintenance, and management activities, with development and testing being the most common.

Our team, which works with government organizations daily, took a broader perspective and brainstormed several dozen ideas for using LLMs in common software engineering and acquisition activities (see Table 1 for examples). Two important observations quickly emerged from this activity. First, most use cases represent human-AI partnerships in which an LLM or LLM-based service could be used to help humans (as opposed to replace humans) complete tasks more quickly. Second, deciding which use cases would be most feasible, beneficial, or affordable is not a trivial decision for those just getting started with LLMs.
Table 1: Sample Acquisition and Software Engineering Use Cases

ACQUISITION USE CASES

A1. A new acquisition specialist uses an LLM to generate an overview of relevant federal regulations for an upcoming request for proposal (RFP) review, expecting the summary to save time in background reading.

A2. A chief engineer uses an LLM to generate a comparison of alternatives from multiple proposals, expecting it to use the budget and schedule formulas from previous similar proposal reviews and generate accurate itemized comparisons.

A3. A contract specialist uses an LLM to generate ideas for a request for information (RFI) solicitation given a set of concerns and a vague problem description, expecting it to generate a draft RFI that is at least 75% aligned with their needs.

A4. A CTO uses an LLM to create a report summarizing all uses of digital engineering technologies in the organization based on internal documents, expecting it can quickly produce a clear summary that is at least 90% correct.

A5. A program office lead uses an LLM to evaluate a contractor's code delivery for compliance with required design patterns, expecting that it will identify any instances in which the code fails to use required patterns.

A6. A program manager uses an LLM to summarize a set of historical artifacts from the past six months in preparation for a high-visibility program review and provides specific retrieval criteria (e.g., delivery tempo, status of open defects, and schedule), expecting it to generate an accurate summary of program status that complies with the retrieval criteria.

A7. A program manager uses an LLM to generate a revised draft of a statement of work, given a short starting description and a list of concerns (e.g., cybersecurity, software delivery tempo, and interoperability goals). The program manager expects it to generate a structure that can be quickly refined and that includes topics drawn from best practices they may not think to request explicitly.

A8. A requirements engineer uses an LLM to generate draft requirements statements for a program upgrade based on past similar capabilities, expecting them to be a good starting point.

A9. A contract officer is seeking funding to conduct research on a high-priority topic they are not familiar with. The contract officer uses an LLM to create example project descriptions for their context, expecting it to produce reasonable descriptions.

SOFTWARE ENGINEERING USE CASES

SE1. A developer uses an LLM to find vulnerabilities in existing code, hoping that the exercise will catch additional issues not already found by static analysis tools.

SE2. A developer uses an LLM to generate code that parses structured input files and performs specified numerical analysis on its inputs, expecting it to generate code with the desired capabilities.

SE3. A tester uses an LLM to create functional test cases, expecting it to produce a set of text test cases from a provided requirements document.

SE4. A developer uses an LLM to generate software documentation from code to be maintained, expecting it to summarize its functionality and interface.

SE5. A software engineer who is unfamiliar with SQL uses an LLM to generate an SQL query from a natural language description, expecting it to generate a correct query that can be tested immediately.

SE6. A software architect uses an LLM to validate whether code that is ready for deployment is consistent with the system's architecture, expecting that it will reliably catch deviations from the intended architecture.

SE7. A developer uses an LLM to translate several classes from C++ to Rust, expecting that the translated code will pass the same tests and be more secure and memory safe.

SE8. A developer uses an LLM to generate synthetic test data for a new feature being developed, expecting that it will quickly generate syntactically correct and representative data.

SE9. A developer provides an LLM with code that is failing in production and a description of the failures, expecting it to help the developer diagnose the root cause and propose a fix.
Archetypes

Commonalities among the use cases lend themselves to abstracting the set into a manageable number of archetypes. Two dimensions are helpful in this regard: the nature of the activity an LLM is performing and the nature of the data that the LLM is acting on. Taking the cross-product of these dimensions, these use cases fall into the archetypes depicted in Table 2.
Table 2: Use Case Archetypes

ACTIVITY TYPE \ DATA TYPE | Text          | Code          | Model          | Images
Retrieve Information      | retrieve-text | retrieve-code | retrieve-model | retrieve-images
Generate Artifact         | generate-text | generate-code | generate-model | generate-images
Modify Artifact           | modify-text   | modify-code   | modify-model   | modify-images
Analyze Artifact          | analyze-text  | analyze-code  | analyze-model  | analyze-images
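Because the archetypes are simply the cross-product of the two dimensions, the full grid in Table 2 can be enumerated in a few lines. A minimal sketch, using the activity and data labels from the table:

```python
from itertools import product

ACTIVITY_TYPES = ["retrieve", "generate", "modify", "analyze"]
DATA_TYPES = ["text", "code", "model", "images"]

# The cross-product of the two dimensions yields the 16 archetypes of Table 2.
ARCHETYPES = [f"{activity}-{data}"
              for activity, data in product(ACTIVITY_TYPES, DATA_TYPES)]

print(len(ARCHETYPES))  # 16
print(ARCHETYPES[0])    # retrieve-text
```

An organization extending the scheme (e.g., adding an "audio" data type) only needs to grow one list; every new combination then becomes a candidate archetype to consider.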
Matching a specific use to an archetype helps identify common concerns among similar use cases and known solutions commonly applied for similar use cases. Archetypes can be a tool that organizations use to group successes, gaps, and lessons learned in a structured way.

Activity Type captures differences in associations that an LLM would need to make to support a use case, with some asking an LLM to do things that a language model was not designed to do:

• Retrieve Information asks an LLM to construct a response to a question (e.g., what's the Observer pattern?) for which a known answer is likely found in the training corpus, directly or across related elements.
• Generate Artifact asks an LLM to create a new artifact (e.g., a summary of a topic or a Python script that performs a statistical analysis) that likely bears similarity with existing examples in the corpus.
• Modify Artifact asks an LLM to modify an existing artifact to improve it in some way (e.g., translate Python code to Java or remove a described bug) that resembles analogous improvements among artifacts in the training corpus.
• Analyze Artifact asks an LLM to draw a conclusion about provided information (e.g., what vulnerabilities are in this code, or will this architecture scale adequately?) that likely requires semantic reasoning about data.

Data Type captures differences in the kind of data that an LLM operates on or generates, such as the differences in semantic rules that make data (e.g., code) well-formed:

• Text inputs vary widely in formality and structure (e.g., informal chat versus structured text captured in templates).
• Code is text with formal rules for structure and semantics, and a growing number of LLMs are being specialized to take advantage of this structure and semantics.
• Models are abstractions (e.g., from software design or architecture) that often use simple terms (e.g., publisher) that imply deep semantics.
• Images are used to communicate many software artifacts (e.g., class diagrams) and often employ visual conventions that, much like models, imply specific semantics. While LLMs operate on text, multimodal LLMs (e.g., GPT-4) are growing in their ability to ingest and generate image data.
Figure 1 shows an example of using the archetypes to generate ideas for LLM use cases in a particular domain. This example focuses on independent verification and validation (IV&V), a resource-intensive activity within the DoD that involves many different activities that might benefit from the use of LLMs. More complex use cases for IV&V could also be generated that involve integration of multiple archetypes into a larger workflow.
Figure 1: Using Archetypes to Help Brainstorm Potential Use Cases

The figure places six brainstormed IV&V use cases (numbered callouts) into the archetype grid from Table 2:

• An IV&V evaluator uses an LLM to create a verification checklist from a set of certification regulations and a system description, expecting it to produce a context-sensitive checklist they can tailor.
• A developer uses an LLM to create a network view for authorization to operate (ATO) certification from a description of the architecture, expecting it to produce a rough network diagram they can refine.
• An IV&V evaluator uses an LLM to analyze software design documents against a specific set of certification criteria and to generate a certification report, expecting it to describe certification violations that they will review to confirm.
• A tester uses an LLM to create integration test descriptions from a set of APIs and integration scenarios, expecting it to produce a set of test case descriptions that can be used to implement tests.
• A developer uses an LLM to find vulnerabilities in existing code, hoping that the exercise will catch additional issues not already found by static analysis tools.
• A new developer uses an LLM as a pair programmer to write code, expecting it to help create vulnerability-free code.
Figure 2: Two Ways to Look at Concerns with the Generation of Incorrect Results (A: Acquisition Use Cases, SE: Software Engineering Use Cases [Table 1])

The figure plots the use cases from Table 1 in a quadrant chart. One axis runs from "mistakes have small consequences" to "mistakes have large consequences"; the other runs from "mistakes are easy for users to find" to "mistakes are hard for users to find."
Concerns and How to Address Them

Recognizing concerns around applications of LLMs to software engineering and acquisition, and deciding how to address each, will help decision makers make more informed choices. There are multiple perspectives one should consider before going forward with an LLM use case. An important reality is that the results generated by LLMs are in fact sometimes wrong.

Figure 2 illustrates this perspective based on two questions:

• How significant would it be to act on an incorrect result in a given use case?
• How easy would it be for a user in the use case to recognize that a result from an LLM is incorrect?

This figure shows a notional placement of the use cases from Table 1 (actual placement would be reliant on refinement of these use cases). The green quadrant is ideal from this perspective: Mistakes are not particularly consequential and relatively easy to spot. Use cases in this quadrant can be a great place for organizations to start LLM experimentation. The red quadrant, on the other hand, represents the least favorable cases for LLM use: Mistakes create real problems and are hard for users to recognize.
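The two questions behind this quadrant view can be captured in a small helper that places a use case into a quadrant. The 0-to-1 scales, the 0.5 split, and the example placements below are our own illustrative choices, not values from the original figure:

```python
def quadrant(consequence, detectability):
    """Place a use case in a Figure 2-style quadrant chart.

    consequence: 0.0 (mistakes have small consequences) to 1.0 (large).
    detectability: 0.0 (mistakes are hard for users to find) to 1.0 (easy).
    The 0.5 thresholds are illustrative, not from the source document.
    """
    if consequence < 0.5 and detectability >= 0.5:
        return "green: low consequence, easy to spot (good starting point)"
    if consequence >= 0.5 and detectability < 0.5:
        return "red: high consequence, hard to spot (least favorable)"
    return "mixed: weigh expected benefits against mitigation costs"

# Hypothetical placements for two use cases from Table 1.
print(quadrant(0.2, 0.8))  # e.g., SE5: a wrong SQL query is quickly caught in testing
print(quadrant(0.9, 0.2))  # e.g., A2: an inaccurate cost comparison is consequential and subtle
```

Even a rough scoring exercise like this can make team disagreements explicit: if two reviewers place the same use case in different quadrants, the use case likely needs refinement before proceeding.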
The consequences of mistakes and the ease of spotting them are only one perspective of evaluation. Another perspective is the expected significance of improvements or efficiencies achievable with LLMs. Among many concerns, we discuss five categories in further detail in this document (correctness, disclosure, usability, performance, and trust), as they are relevant to all use cases.

Correctness: The significance of correctness as a concern depends on factors such as how the results will be used, the safeguards used in workflows, and the expertise of users. Correctness refers to the overall accuracy and precision of output relative to some known truth or expectation. Accuracy hinges greatly on whether an LLM was trained or fine-tuned with data that is sufficiently representative to support the specific use case. Even with rich training corpuses, some inaccuracy can be expected [Ouyang et al. 2023]. For example, a recent study on code translation found GPT-4 to perform better than other LLMs, even though more than 80% of translations on a pair of open source projects contained some errors. Advances are likely to improve, but not eliminate, these numbers [Pan et al. 2023].
Disclosure: When users interact with LLMs, some use cases may require disclosing proprietary or sensitive information to a service provider to complete a task (e.g., sharing source code to help debug it). The disclosure concern is therefore related to the amount of proprietary information that must be exposed during use. If users share confidential data, trade secrets, or personal information, there is a risk that such data could be stored, misused, or accessed by unauthorized individuals. Moreover, it might become part of the training data corpus and be disseminated without users having any means to track its origin. For example, GSA CIO IL-23-01 (the U.S. General Services Administration instructional letter Security Policy for Generative Artificial Intelligence [AI] Large Language Models [LLMs]) bans disclosure of federal nonpublic information as inputs in prompts to third-party LLM endpoints [GSA 2023].
Usability: LLM users have vastly different backgrounds, expectations, and technical abilities. Usability captures the ability of LLM users with different expertise to complete tasks. Users may need expertise on both the input (crafting appropriate prompts) and output (judging the correctness of results) sides of LLM use [Zamfirescu-Pereira et al. 2023]. The significance of usability as a concern depends on the degree to which getting to acceptable results is sensitive to the expertise of users. A study of developers' early experiences using CoPilot reflects that there is a shift from writing code to understanding code when using LLMs on coding tasks [Bird et al. 2023]. This observation hints at the need for different usability techniques for interaction mechanisms, as well as the need to account for expertise.
Performance: While using an LLM requires much less computing power than training an LLM, responsiveness can still be a factor in LLM use, especially if sophisticated prompting approaches are incorporated into an LLM-based service. For the purposes of concerns related to use cases, performance expresses the time required to arrive at an appropriate response. Model size, underlying compute power, and where the model runs and is accessed from are among the factors that influence responsiveness [Patterson et al. 2022]. Services built on LLMs may introduce additional performance overhead due to the way in which other capabilities are integrated with the LLM.
Trust: To employ the technology with the requisite level of trust, users must grasp the limitations of LLMs. Trust reflects the user's confidence in the output. Overreliance on an LLM without understanding its potential for error or bias can lead to undesirable consequences [Rastogi et al. 2023]. As a result, several other concerns (e.g., explainability, bias, privacy, security, and ethics) are often considered in relationship to trust [Schwartz et al. 2023]. For example, the DoD published ethical AI principles to advance trustworthy AI systems [DoD 2020].
How significant these and other concerns are for each use case will vary by context and use. The questions provided in Table 3 can help organizations assess how relevant each concern is for a specific use case. A starting point could be to categorize the significance of each concern as High, Medium, or Low. This information can help organizations decide whether an LLM is fit for purpose and what concerns need to be mitigated to avoid unacceptable outcomes.
Table 3: Example Questions to Help Determine the Significance of Common Concerns for a Specific Use Case

Correctness
• What is the risk or impact of using an incorrect result in the use case?
• How difficult is it for the expected user to determine whether a result is correct?
• Are there gaps in the data used to train the LLM that could adversely impact results (e.g., the data is not current with recent technology releases or contains little data for an esoteric programming language)?

Disclosure
• Can an LLM be prompted without disclosing proprietary information (e.g., using generic questions or abstracting proprietary details)?
• What is the risk or impact of a third party being able to observe your prompts?
• Are there existing data disclosure constraints that strictly need to be observed?

Usability
• How adept are expected users at prompting an LLM?
• How familiar are expected users with approaches for determining whether results are inaccurate?
• How familiar are expected users with approaches for determining whether results are incomplete?

Performance
• How quickly must a user or machine be able to act on a result?
• Are there significant computing resource limitations?
• Are there intermediate steps in the interaction with the LLM that may affect end-to-end performance?

Trust
• Are your expected users predisposed to accept generated results (automation bias) or reject them?
• Is the data the LLM was trained on free of bias and ethical concerns?
• Has the LLM been trained on data that is appropriate for use?
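One lightweight way to act on these ratings is to record a High/Medium/Low value per concern and list which concerns demand mitigation before proceeding. The threshold rule and the SE5 ratings below are illustrative assumptions of ours, not prescribed by this document:

```python
CONCERNS = ["correctness", "disclosure", "usability", "performance", "trust"]
LEVELS = {"Low": 0, "Medium": 1, "High": 2}

def concerns_to_mitigate(ratings, threshold="High"):
    """Return the concerns whose rated significance meets the threshold.

    `ratings` maps each concern to "Low", "Medium", or "High" (the Table 3
    questions guide the rating). Treating the threshold as the trigger for
    mitigation is an illustrative policy choice, not from the source.
    """
    cutoff = LEVELS[threshold]
    return [c for c in CONCERNS if LEVELS[ratings.get(c, "Low")] >= cutoff]

# Hypothetical ratings for use case SE5 (natural language to SQL).
se5 = {"correctness": "Medium", "disclosure": "High",
       "usability": "Medium", "performance": "Low", "trust": "Medium"}
print(concerns_to_mitigate(se5))                      # ['disclosure']
print(concerns_to_mitigate(se5, threshold="Medium"))  # every concern except performance
```

Each concern flagged this way can then be paired with one or more of the mitigation tactics in Table 4, trading the tactic's cost against the concern's significance.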
These common concerns, and questions to determine their significance, enable identification of common tactics for addressing each concern. A tactic is a course of action that can be taken to reduce the occurrence or impact of a concern. Table 4 summarizes a collection of tactics that can help mitigate each concern, along with a rough estimate (High [H], Medium [M], or Low [L]) of the relative potential cost of using each tactic. Typically, the more resources (human and computational) a tactic requires, the higher the cost. For example, prompt engineering and model training both address correctness, but prompt engineering is typically much less expensive. Of note, some tactics (purple rows) focus on technical interventions, others (green rows) focus on human-centered actions, and the rest (gray rows) could employ technical or human-centered interventions.
Table 4: Tactics That Can Be Used to Address Common Concerns with LLM Use

Correctness
• Prompt engineering (cost: L): Educate users on prompt engineering techniques and patterns to generate better results.
• Validate manually (cost: M): Dedicate time to allow users to carefully validate interim and final results.
• Adjust settings (cost: L): Change settings of exposed model parameters like temperature (randomness of the model's output) and the maximum number of tokens.
• Adopt newer model (cost: M): Use newer models that integrate technical advances or improved training corpuses that can produce better results.
• Fine-tune model (cost: M): Tailor a pretrained model using organization- or domain-specific data to improve results.
• Train new model (cost: H): Use a custom training corpus or proprietary data to train a new model.

Disclosure
• Open disclosure policy: Establish a policy that allows users to share as much deta