
Journal Pre-proofs

Artificial Intelligence in Pharmaceutical Sciences

Mingkun Lu, Jiayi Yin, Qi Zhu, Gaole Lin, Minjie Mou, Fuyao Liu, Ziqi Pan, Nanxin You, Xichen Lian, Fengcheng Li, Hongning Zhang, Lingyan Zheng, Wei Zhang, Hanyu Zhang, Zihao Shen, Zhen Gu, Honglin Li, Feng Zhu

PII: S2095-8099(23)00164-9
DOI: 10.1016/j.eng.2023.01.014
Reference: ENG1255

To appear in: Engineering

Received Date: 30 September 2022
Revised Date: 11 December 2022
Accepted Date: 6 January 2023

Please cite this article as: M. Lu, J. Yin, Q. Zhu, G. Lin, M. Mou, F. Liu, Z. Pan, N. You, X. Lian, F. Li, H. Zhang, L. Zheng, W. Zhang, H. Zhang, Z. Shen, Z. Gu, H. Li, F. Zhu, Artificial Intelligence in Pharmaceutical Sciences, Engineering (2023), doi: 10.1016/j.eng.2023.01.014

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2023 Published by Elsevier Ltd. on behalf of Chinese Academy of Engineering.


Research

Smart Process Manufacturing – Review

Artificial Intelligence in Pharmaceutical Sciences

Mingkun Lu (a,c), Jiayi Yin (a), Qi Zhu (a), Gaole Lin (a), Minjie Mou (a), Fuyao Liu (a), Ziqi Pan (a), Nanxin You (a), Xichen Lian (a), Fengcheng Li (a), Hongning Zhang (a), Lingyan Zheng (a,c), Wei Zhang (a), Hanyu Zhang (a), Zihao Shen (b,d), Zhen Gu (a), Honglin Li (b,d,e,*), Feng Zhu (a,c,*)

(a) The Second Affiliated Hospital, Zhejiang University School of Medicine & College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
(b) Shanghai Key Laboratory of New Drug Design, East China University of Science and Technology, Shanghai 200237, China
(c) Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba–Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
(d) Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China
(e) Lingang Laboratory, Shanghai 200031, China

* Corresponding authors.
E-mail addresses: hlli@ (H. Li), zhufeng@ (F. Zhu).

ARTICLE INFO

Article history:
Received
Revised
Accepted
Available online

Keywords:
Artificial intelligence
Machine learning
Deep learning
Target identification
Target discovery
Drug design
Drug discovery


ABSTRACT

Drug discovery and development affects various aspects of human health and dramatically impacts the pharmaceutical market. However, investments in a new drug often go unrewarded due to the long and complex process of drug research and development (R&D). With the advancement of experimental technology and computer hardware, artificial intelligence (AI) has recently emerged as a leading tool for analyzing abundant and high-dimensional data. The explosive growth in the size of biomedical data provides advantages for applying AI in all stages of drug R&D. Driven by big data in biomedicine, AI has led to a revolution in drug R&D due to its ability to discover new drugs more efficiently and at lower cost. This review begins with a brief overview of common AI models in the field of drug discovery. It then summarizes and discusses in depth their specific applications in various stages of drug R&D, such as target discovery, drug discovery and design, preclinical research, and automated drug synthesis, as well as their influence on the pharmaceutical market. Finally, the major limitations of AI in drug R&D are discussed in detail and possible solutions are proposed.

1. Introduction

In the past few decades, the pharmaceutical industry has been limited by the extent of cutting-edge research in the pharmaceutical sciences, because the development of new drugs is a long and complex process accompanied by high risks and high costs [1,2]. In other words, the current field of drug research and development (R&D) requires significant productivity improvements to shorten the cycle time and reduce the cost of drug development [3]. Technologies such as network pharmacology, RNA sequencing (RNA-seq), high-throughput screening (HTS), and virtual screening (VS) have all accelerated the discovery of new targets and new drugs to some extent [4–9]. Nevertheless, these technologies have rarely been major contributors to the current process of new drug discovery. Thus, there is an urgent need for new technologies to drive the development of new drugs.

As the computing power of devices has grown, artificial intelligence (AI) has been used in many real-world applications, such as image classification and speech recognition, due to its ability to learn from, process, and make predictions on massive amounts of information [10–12]. At present, after a long period of data accumulation and with the development of high-throughput RNA-seq technology, massive amounts of biomedical data have been collected [13–18]. Biomedical data are highly heterogeneous and complex. They come from a variety of sources, including omics data from different platforms, experimental data from biological or chemical laboratories, data generated by pharmaceutical companies, publicly disclosed textual information, and manually collated data from publicly available databases [19–22]. AI can be used to learn the latent patterns in these vast amounts of biomedical data, thereby bringing new opportunities and challenges to the pharmaceutical sciences and industries.

In the 14th Critical Assessment of Protein Structure Prediction (CASP14) competition, the AI-based AlphaFold2 system outperformed other methods in accurately predicting the three-dimensional (3D) structures of proteins [23]. Similarly, in the Open Graph Benchmark Large-Scale Challenge (OGB-LSC) competition, a graph neural network (GNN) combined with a transformer model won the top rank in predicting molecular properties calculated by means of density functional theory (DFT), which is difficult and highly time-consuming with traditional methods [24]. These competitions demonstrated the strong ability of AI to analyze biological and chemical data. AI can utilize related biomedical data to understand complex biological systems and chemical reaction spaces [25,26]. Because of this capability, it has had a revolutionary impact on all stages of drug R&D. This includes not only research on proteins and small molecules but also the assisted design of clinical trials and post-market surveillance [27]. Furthermore, pharmaceutical companies have adopted many state-of-the-art (SOTA) AI models in diverse pipelines to shorten the R&D cycle time and decrease costs [28–30].

AI techniques in this context mainly involve machine learning (ML) and deep learning (DL). Both ML and DL algorithms are involved in target discovery and validation [31], drug discovery and design [32], and preclinical drug research [33], where they are used to analyze data with different characteristics and in different formats. After a drug candidate enters a clinical trial [34], DL plays a pivotal role in assisting with the design of the trial and in supervising and analyzing data from phase IV clinical studies [33]. Approved drugs have a strong impact on manufacturing [35] and the market economy, and DL can play a part in these areas as well. Therefore, in this review, we present a comprehensive overview of most aspects of the use of AI in the pharmaceutical sciences. We focus on how AI can be used to promote target discovery and drug discovery (as shown in Fig. 1) and reflect on how to further accelerate the development of this field.


Fig. 1. Summary of AI applications in the pharmaceutical sciences. ADMET: absorption, distribution, metabolism, excretion, and toxicity.

2. Basic concepts of AI and its scope of application

AI was first proposed at the Dartmouth Conference in 1956 and was defined as an algorithm that gives machines the ability to reason and perform functions [36]. From perceptrons to support vector machines (SVMs) and artificial neural networks (ANNs), the development of AI has gone through several ups and downs. The field is currently flourishing thanks to the hardware support now available. Both ML and DL fall under the category of AI; strictly speaking, DL is a subcategory of ML. However, in this review, our discussion of ML concentrates only on traditional ML methods, such as random forest (RF) and SVMs.

2.1. The big data era

In the current big data era, enormous amounts of biological and clinical data have laid a foundation for the application of AI in medical and pharmaceutical research. AI has been successfully and effectively applied in multiple aspects of the drug R&D process. Even so, the quantity and quality of medical data have become major obstacles to the development of AI in the pharmaceutical sciences. Thus far, pharmaceutical databases of detailed and structured big data, developed by medicinal researchers worldwide, have played a key role in promoting AI applications in medical and pharmaceutical research.

For example, the Therapeutic Target Database (TTD) includes the most comprehensive information about known and explored therapeutic protein and nucleic acid targets, the targeted diseases, pathway information, and the corresponding drugs directed at each of these targets. It provides detailed knowledge of the functions of targets, as well as their sequences, 3D structures, ligand-binding properties, relevant enzymes, and corresponding drug information [37]. PubChem [17] provides collective information on chemical molecules and their activities in response to biological assays, including molecular structures, identifiers, physicochemical properties, patent information, and molecular toxicity. Other popular databases aimed at various pharmaceutical issues have been proposed and are frequently used; these play significant roles in promoting the application of AI in medical and pharmaceutical research [38–42]. Table 1 [17,18,37,43–62] summarizes popular pharmaceutical databases, categorized into protein-related, gene-related, drug-related, and disease-related databases.

Table 1

Pharmaceutical databases focusing on proteins, genes, drugs/drug targets, and diseases.

| Focus | Database | Description | Refs. |
| --- | --- | --- | --- |
| Proteins | RCSB PDB | PDB contains 3D structural data of large biological molecules, such as proteins and nucleic acids | [43] |
| | PRIDE | PRIDE is a public data repository for proteomics, including protein and peptide identifications, post-translational modifications, and supporting spectral evidence | [44] |
| | UniProt | UniProt is a protein database containing protein sequences, functional information, and an index of research papers | [18] |
| | InterPro | InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites | [45] |
| | VARIDT | VARIDT provides comprehensive data on all aspects of drug transporters' variability | [46,47] |
| Genes | Ensembl | Ensembl provides centralized genomic data and powerful functionalities such as gene annotation and regulatory function predictions | [48] |
| | UCSC Genome | The UCSC Genome Browser offers access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms | [49] |
| | GEO | The GEO is a database repository of high-throughput gene expression data and hybridization arrays, chips, and microarrays | [50] |
| | GenBank | GenBank is an annotated collection of all publicly available DNA sequences | [51] |
| | RefSeq | RefSeq provides separate and linked records for the genomic DNA, gene transcripts, and corresponding proteins for multiple organisms | [52] |
| | EA | EA collects baseline gene expression data for different species and contexts, and contains differential studies reporting expression changes under two different conditions | [53] |
| Drugs/drug targets | TTD | TTD includes the most comprehensive information about known and explored therapeutic protein and nucleic acid targets | [37] |
| | ChEMBL | ChEMBL is a manually curated library of bioactive compounds with drug-like properties | [54] |
| | PubChem | PubChem covers collective information on chemical molecules and their activities in response to biological assays | [17] |
| | DrugBank | DrugBank combines comprehensive drug target information with specific drug data | [55] |
| | DrugMAP | DrugMAP provides a comprehensive list of interacting molecules for drugs/drug candidates, including information on differential expression patterns | [56] |
| | DTC | DTC enables the exploration of bioactivity data, the processing of new bioactivity data, and data curation in order to improve the understanding of DTIs | [57] |
| | PHAROS | PHAROS provides a comprehensive, integrated knowledge base for the druggable genome | [58] |
| Diseases | TCGA | TCGA has over 2.5 petabytes of genomic, epigenomic, transcriptomic, and proteomic data related to the cancer genome | [59] |
| | DisGeNET | DisGeNET contains large, publicly available collections of genes and variants associated with human diseases | [60] |
| | ClinVar | ClinVar is a public archive of reports on relationships among human variations and phenotypes, with supporting evidence | [61] |
| | OMIM | OMIM is an online catalog of human genes and genetic disorders | [62] |

PDB: Protein Data Bank; PRIDE: Proteomics Identifications Database; GEO: Gene Expression Omnibus; EA: Expression Atlas; DTC: Drug Target Commons; DTIs: drug–target interactions; TCGA: The Cancer Genome Atlas; OMIM: Online Mendelian Inheritance in Man.


2.2. ML and DL

Unlike traditional computer programs, ML and DL can learn latent patterns from input data without explicit programming. They are not limited by the format of the input data, which can include text, images, sound, and more (any type of data that can be encoded) [63]. Much as humans learn, ML and DL models gradually recognize different features of the data and infer the patterns within them. They update their model parameters through continuous iterations until a valid model is formed.
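The idea of updating model parameters through continuous iterations can be made concrete with a very small example. The following is a minimal sketch that assumes a one-feature linear model y ≈ w·x + b trained by plain gradient descent on the mean squared error. The function name `fit_linear`, the learning rate, the number of epochs, and the toy data are illustrative assumptions and are not taken from the source.

```python
def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y ≈ w * x + b by batch gradient descent on the mean squared error (MSE)."""
    w, b = 0.0, 0.0  # arbitrary starting parameters
    n = len(xs)
    for _ in range(epochs):
        # Error between the current model's output and the true value for each sample
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error**2) with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Iterative update: step each parameter against its gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_linear(xs, ys)
print(round(w, 3), round(b, 3))  # prints 2.0 1.0
```

Each pass over the data shrinks the error a little. DL frameworks apply the same principle, but they use automatically computed gradients over millions of parameters.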

According to the application scenario, models can be categorized into regression models and classification models. The main difference between the two tasks is whether the output variable is continuous or discrete. Cheng and Ng [64] applied ML approaches to predict the biological activity of per- and polyfluorinated alkyl substances (PFAS). Because the output is a continuous value, this study is a typical regression task. Hong et al. [65] built a DL model to predict whether a protein in a bacterium is a type IV secreted effector (T4SE). Because the output is a discrete value (e.g., 0/1), this study is a typical classification task.

Depending on the type of learning algorithm required to solve the problem, models are conceptualized into three categories: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning is a labeled-data-driven process. It trains a model on the relationship between inputs and their prespecified outputs in order to predict the categories or continuous values of future inputs. In comparison, unsupervised methods identify patterns in unlabeled datasets and explore a dataset's latent structure, which allows the data to be clustered for further analysis. Semi-supervised learning lies part-way between supervised and unsupervised learning. It uses labels for only part of the data to develop a trained model and is a potential solution for problems that lack high-quality labeled data [66]. Reinforcement learning constructs a model through constant interactive learning, relying on penalties for failure or rewards for success.
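Unsupervised clustering of unlabeled data can be illustrated with k-means. This is one common clustering algorithm; the source does not name a specific one. The minimal sketch below assumes 2D points, Euclidean distance, and a deterministic initialization from the first k points. `kmeans` and the toy points are hypothetical and written only for illustration.

```python
import math

def kmeans(points, k, iterations=100):
    """Cluster unlabeled 2D points into k groups with Lloyd's k-means algorithm."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid (no labels are used)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(v) / len(members) for v in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabeled data forming two well-separated groups
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # prints [0, 1, 0, 1, 0, 1]
```

The cluster indices are arbitrary. Only the grouping itself carries meaning, and it is discovered without any prespecified outputs.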

2.3. Introduction to different types of ML/DL-based algorithms

ML and DL methods have been successfully applied to relevant biomedical problems, and the modeling approach adopted varies across problems, and even for the same problem. For example, small molecules used to be characterized by engineered features that were loaded directly into ML methods to predict their properties. More recently, however, GNNs have also been used to describe small molecules for property prediction [67]. Determining the functional annotations of proteins is essential for selecting druggable proteins as potential targets. Maxat et al. [68] used a convolutional neural network (CNN) to predict the gene ontology annotations (GOA) of proteins. Nadav et al. [69] built a recurrent neural network (RNN) for protein function annotation, and Xia et al. [70] combined a CNN and an RNN to predict the gene ontology (GO) labels of proteins.

Rather than a single specific algorithm, ML is a family of approaches that focus on the features of the data and transform them into machine-readable knowledge, thereby providing humans with new insights. Researchers can choose from a variety of common algorithms. The naïve Bayes (NB) algorithm is a probability-based classifier built on Bayes' theorem and the assumption that features are independent of one another; it is a simple and intuitive algorithm [71]. An RF algorithm constructs a set of decorrelated decision trees, each of which is built independently and makes its own prediction for a given input [72]. The final decision is based on the majority vote of the decision trees. Models that make decisions in this way are commonly referred to as ensemble models. eXtreme Gradient Boosting (XGBoost) is a scalable ML algorithm based on gradient boosting and is also an ensemble model [73]. A multi-layer perceptron (MLP) can be viewed as a directed graph consisting of multiple layers of nodes, each fully connected to the next layer, so that it maps a set of input vectors to a set of output vectors. SVM is one of the most widely applied ML algorithms. It classifies samples with an optimal hyperplane, obtained by maximizing the margin between different classes in a feature space whose dimensionality is determined by the number of features [74]. K-nearest neighbor (KNN) is regarded as "lazy learning": it classifies a sample according to only a few neighboring samples when distinguishing between categories [75]. Several other ML methods have also been applied in biomedical data processing [76,77]. These include principal component analysis (PCA), partial least squares (PLS), linear discriminant analysis (LDA), and logistic regression (LR).
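KNN's "lazy learning" behavior, and the majority-vote idea that RF ensembles also rely on, can be sketched in a few lines. This minimal illustration assumes numeric feature vectors, Euclidean distance, and k = 3. `knn_predict` and the toy "active"/"inactive" data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    KNN is 'lazy': there is no training step, and all work happens at prediction time.
    """
    # Rank the stored training samples by Euclidean distance to the query
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    # Majority vote; on a tie, the label that appears first among the neighbors wins
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature samples from two classes
train_x = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.3, 4.8), (4.9, 5.1)]
train_y = ["active", "active", "active", "inactive", "inactive", "inactive"]
print(knn_predict(train_x, train_y, (1.0, 1.0)))  # prints active
print(knn_predict(train_x, train_y, (5.0, 5.0)))  # prints inactive
```

An RF combines its trees' outputs with the same kind of vote. The difference is that each voter is a trained decision tree rather than a stored neighboring sample.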

DL is popular because of its powerful generalization and feature-extraction capabilities, and its learning and prediction process is end-to-end. The traditional ML process often consists of multiple independent modules. In contrast, DL produces the output (output end) directly from the input data (input end) during model training. It then continuously adjusts and optimizes the model based on the error between the output and the true value until the model meets the expected result. A deep neural network (DNN) is a feed-forward neural network consisting of densely connected input, hidden, and output layers. It learns features of the input data by simulating nonlinear transformations between neurons, with each layer consisting of multiple neurons [78]. A CNN is a feed-forward neural network that consists of convolutional (feature extraction) and pooling (dimensionality reduction) layers. These layers help to extract the relevant information in a dataset without consuming excessive time and computational resources [79]. An RNN is a class of ANN in which linked nodes form a directed or undirected graph along a temporal sequence. An RNN includes a feedback component that allows signals from one layer to be fed back to the previous layer. It has an internal memory, which helps to address the difficulty of learning and storing long-term information [80]. A GNN is a connectivity model that captures the dependencies in a graph by means of information transfer between the nodes of the network [81,82]. A GNN updates the state of a node according to the node's neighbors at any depth, and this state can represent the node's information. The neural network architectures of the four networks described above are shown in Fig. 2.
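The convolution (feature extraction) and pooling (dimensionality reduction) steps of a CNN can be illustrated on a 1D signal. This minimal sketch makes several assumptions: a single hand-picked kernel (in a real CNN, kernel weights are learned), stride 1 without padding, a ReLU nonlinearity, and non-overlapping max pooling of width 2. The helper names are hypothetical.

```python
def conv1d(signal, kernel):
    """'Valid' 1D convolution (cross-correlation, as in most DL libraries), stride 1."""
    width = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(width))
            for i in range(len(signal) - width + 1)]

def relu(values):
    """Element-wise nonlinearity applied to the convolution output."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size=2):
    """Non-overlapping max pooling: keep the largest value in each full window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

signal = [0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 4.0, 2.0]
edge_kernel = [-1.0, 1.0]  # hand-picked filter that responds to rising values
features = relu(conv1d(signal, edge_kernel))
pooled = max_pool1d(features)
print(features)  # prints [1.0, 2.0, 0.0, 0.0, 2.0, 2.0, 0.0]
print(pooled)    # prints [2.0, 0.0, 2.0]
```

The eight input values become seven feature responses and then three pooled values. This shows how pooling reduces dimensionality while keeping the strongest local responses.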

An autoencoder (AE), which consists of an encoder and a decoder, is used to learn efficient encodings of input data. The encoder maps the input to an encoding, from which the decoder reconstructs the input. An AE is usually used for data compression and dimensionality reduction through the learned representation (i.e., the encoding) of a set of data [83]. A generative adversarial network (GAN) is composed of two underlying neural networks: a generator and a discriminator. The former generates content, while the latter tries to distinguish the generated content from real data [84]. Models can also be used in combination to solve a wider range of problems. For example, a graph convolutional network (GCN) extends convolutional operations from traditional data (e.g., images) to graph data [85].
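The interplay between the generator and the discriminator is commonly formalized as a two-player minimax game. The value function below is the standard formulation of the original GAN. Here D(x) is the discriminator's estimated probability that x is real data, G(z) is a sample generated from noise z, p_data is the data distribution, and p_z is the noise prior. The source text itself does not state this equation.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator is trained to maximize V, assigning high probability to real data and low probability to generated data. The generator is trained to minimize V, that is, to produce samples that the discriminator misclassifies as real.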

Fig. 2. Schematic network architectures for a DNN, GNN, CNN, and RNN.

When a model fails to learn the underlying patterns in the data features effectively and cannot generalize to new data, the problem is called underfitting [86]. In contrast, overfitting occurs when the model fits the noise in the training data as if it were a representative feature, resulting in poor predictions for new data [87]. Overfitting is more difficult to deal with than underfitting. Models often become overfitted because they are overly complex or because the training data are insufficiently representative. The dataset used for a model is often divided into a training set, a validation set, and a test set, which are used for model training, model adjustment, and model evaluation, respectively. Put simply, a model that performs poorly on both the training and test sets is underfitted. A model that performs well on the training set but poorly on the test set is overfitted. Typical ways to suppress overfitting include regularization, data augmentation [88], dropout [89], early stopping, and ensemble learning, among other methods.
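The training/validation/test protocol and early stopping can be sketched as follows. This minimal example assumes a 70/15/15 split with a fixed random seed and a patience of three epochs. `split_dataset`, `early_stopping_epoch`, and the validation-loss values are hypothetical illustrations, not a prescribed procedure.

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle samples and split them into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]                # used to fit model parameters
    val = shuffled[n_train:n_train + n_val]   # used to adjust the model (e.g., when to stop)
    test = shuffled[n_train + n_val:]         # held out for the final evaluation
    return train, val, test

def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch with the lowest validation loss, scanning epochs in order and
    stopping once the loss has not improved for `patience` consecutive epochs."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # validation loss has stalled: continuing would likely overfit
    return best_epoch

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # prints 70 15 15
# Hypothetical validation losses: they fall, then rise as the model begins to overfit
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.67]
print(early_stopping_epoch(val_losses))  # prints 3
```

Training would keep the parameters saved at epoch 3. The rising validation loss afterwards is the typical signature of a model that still improves on the training set but generalizes worse.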

Researchers encountered underfitting and overfitting problems when they used only a single traditional epidemic model or ML model to predict the long-term trends of the coronavirus disease 2019 (COVID-19) pandemic. To address these issues, Sun et al. [90] proposed a new model called dynamic-susceptible-exposed-infective-quarantined (D-SEIQ). The D-SEIQ model can accurately predict the long-term trends of COVID-19 outbreaks. It does so by appropriately modifying the susceptible-exposed-infective-recovered (SEIR) model and integrating ML-based parameter optimization under reasonable epidemiological constraints.
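For readers unfamiliar with compartmental models, the classical SEIR model that D-SEIQ modifies can be simulated with a simple forward-Euler scheme. This sketch is not the D-SEIQ model of Sun et al. The transmission rate `beta`, incubation rate `sigma`, recovery rate `gamma`, population size, and step size are arbitrary illustrative assumptions, and no ML-based parameter optimization is included.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate the classical SEIR equations with the forward Euler method.

    dS/dt = -beta*S*I/N          dE/dt = beta*S*I/N - sigma*E
    dI/dt = sigma*E - gamma*I    dR/dt = gamma*I
    dt should divide one day evenly. Returns one (S, E, I, R) tuple per day, from day 0.
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r  # total population; constant because the model is closed
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for step in range(1, days * steps_per_day + 1):
        infections = beta * s * i / n * dt  # S -> E: new exposures
        onsets = sigma * e * dt             # E -> I: end of incubation
        removals = gamma * i * dt           # I -> R: recovery or removal
        s -= infections
        e += infections - onsets
        i += onsets - removals
        r += removals
        if step % steps_per_day == 0:
            history.append((s, e, i, r))  # record the state once per simulated day
    return history

# Arbitrary illustrative parameters (assumptions, not fitted to COVID-19 data)
history = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                        s0=9990, e0=0, i0=10, r0=0, days=160)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
print("infectives peak on day", peak_day, "at about", round(history[peak_day][2]))
```

Approaches like D-SEIQ keep this kind of mechanistic structure but adjust the model and let ML-based optimization set its parameters. This is why they are less prone to the failure modes of a single purely mechanistic or purely data-driven model.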

Different models have different evaluation criteria. For regression models, commonly used criteria include the mean squared error (MSE), root MSE (RMSE), and the coefficient of determination (R²). For classification models, commonly used criteria include recall, precision, and the F1 score. The receiver operating characteristic (ROC) curve and the precision–recall curve (PRC) are the most commonly used tools for evaluating classification models. ROC curves take both positive and negative cases into account to assess the overall performance of the model, whereas PRCs focus more on positive cases [91].
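The regression and classification criteria named above have simple closed forms. The following minimal sketch computes them from scratch on hypothetical toy predictions so that each definition is explicit. The area under the ROC curve is computed through its rank interpretation rather than by tracing the curve. In practice, a library implementation would normally be used.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R^2) for continuous predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    mse = ss_res / n
    return mse, math.sqrt(mse), 1.0 - ss_res / ss_tot

def classification_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """Area under the ROC curve: the probability that a random positive sample is
    scored higher than a random negative one (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical regression outputs
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(reg)
# Hypothetical classifier scores, thresholded at 0.5 to obtain hard labels
labels = [1, 0, 1, 1, 0, 1]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.7]
predicted = [int(s >= 0.5) for s in scores]
print(classification_metrics(labels, predicted))
print(roc_auc(labels, scores))
```

Precision, recall, and F1 depend on the chosen threshold (0.5 here). The ROC-based score summarizes ranking quality across all thresholds.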

2.4. A brief description of molecule representation as model input

Over time, the accumulation of data on small molecules and proteins has produced an extremely large data resource. Different organizations have collected and organized databases of molecular sequences, structures, physicochemical properties, and so forth, which contain a great deal of knowledge and information. However, the different sources and formats of the data make it difficult to integrate correlated data from multiple heterogeneous sources. It is therefore particularly important to adopt suitable methods for representing molecules appropriately and for mining the crucial information in molecular data by means of AI [92]. Current AI algorithms are highly dependent on the quality of the data. Thus, when constructing a model, it is necessary to unify the input format of molecules, for example by representing small molecules and proteins as model-readable vectors or matrices.
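As a minimal sketch of turning a molecule into a model-readable matrix, the example below one-hot encodes a SMILES string character by character and pads it to a fixed length. The small vocabulary, the padding length, and the character-level tokenization are simplifying assumptions; real pipelines usually treat multi-character atoms such as Cl and Br as single tokens. `one_hot_smiles` is a hypothetical helper.

```python
def one_hot_smiles(smiles, vocab, max_len):
    """Encode a SMILES string as a max_len x len(vocab) one-hot matrix (list of rows).

    Rows beyond the end of the string are left as all-zero padding.
    """
    if len(smiles) > max_len:
        raise ValueError("SMILES string is longer than max_len")
    index = {ch: k for k, ch in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for row, ch in enumerate(smiles):
        matrix[row][index[ch]] = 1  # raises KeyError for characters outside the vocabulary
    return matrix

vocab = ["C", "O", "N", "(", ")", "=", "1"]  # hypothetical toy vocabulary
ethanol = one_hot_smiles("CCO", vocab, max_len=5)
for row in ethanol:
    print(row)
```

Every molecule encoded this way yields a matrix of the same shape. This uniform input format is exactly what most ML and DL models require.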

At present, small molecules are generally represented using one of four main approaches. The first approach is knowledge-based representation: molecular descriptors and molecular fingerprints based on human a priori knowledge are widely used in various ML and DL algorithms [93]. The second approach is direct, image-based representation. CNNs are now used to learn rules from two-dimensional (2D) digital images, and a 2D chemical digital grid of a molecule can be used directly as input for a CNN model to learn the molecule's properties [94]. The third approach is string-based representation. For example, the canonical simplified molecular-input line-entry system (SMILES) represents small molecules as strings. CNNs and RNNs can then be used to learn molecular embeddings from these string representations of chemical structures [95–97]. The fourth approach is graph-based feature representation. Representation methods based on graph convolution or graph attention have been widely used to explore the feature representation of small molecules. In these methods, atoms and bonds are treated as nodes and edges, respectively, and new molecular representations are obtained by continuously updating the information at individual nodes. Graph-based representations have achieved outstanding performance in a variety of pharmaceutical learning tasks [98,99].
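The graph-based approach can be illustrated with message passing on the heavy-atom graph of ethanol (SMILES CCO), with atoms as nodes and bonds as edges. This is a simplified sketch that assumes one-hot element features, unweighted sum aggregation, and a sum readout. Real graph convolution or attention layers also apply learned weights and nonlinearities at each step, and these are omitted here. The helper names are hypothetical.

```python
def message_passing_step(features, edges):
    """One round of sum aggregation: each node's new state is its own feature
    vector plus the sum of its neighbors' feature vectors."""
    neighbors = {node: [] for node in range(len(features))}
    for a, b in edges:  # bonds are undirected, so record both directions
        neighbors[a].append(b)
        neighbors[b].append(a)
    updated = []
    for node, own in enumerate(features):
        new_state = list(own)
        for nb in neighbors[node]:
            new_state = [x + y for x, y in zip(new_state, features[nb])]
        updated.append(new_state)
    return updated

def readout(features):
    """Sum node states into a single fixed-length molecular vector."""
    return [sum(column) for column in zip(*features)]

# Ethanol heavy atoms C0-C1-O2; each node feature is a one-hot vector over [C, O]
atom_features = [[1, 0], [1, 0], [0, 1]]
bonds = [(0, 1), (1, 2)]
h1 = message_passing_step(atom_features, bonds)
h2 = message_passing_step(h1, bonds)
print(h1)           # prints [[2, 0], [2, 1], [1, 1]]
print(h2[0])        # prints [4, 1]
print(readout(h2))  # prints [12, 5]
```

After two rounds, the terminal carbon's state ([4, 1]) already contains a contribution from the oxygen two bonds away. Repeated updates let information spread across the molecule in this way.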

Protein representation methods can be broadly classified into four categories: representation based on intrinsic properties of sequences, representation based on phy
