
Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts

arXiv:2409.17106v1 [cs.CV] 25 Sep 2024

Mohammad Sadil Khan*†1,2,3, Sankalp Sinha*1,2,3, Talha Uddin Sheikh1,2,3, Didier Stricker1,2,3, Sk Aziz Ali1,4, Muhammad Zeshan Afzal1,2,3

1 DFKI, 2 RPTU Kaiserslautern-Landau, 3 MindGarage, 4 BITS Pilani, Hyderabad

Abstract

Prototyping complex computer-aided design (CAD) models in modern software can be very time-consuming. This is due to the lack of intelligent systems that can quickly generate simpler intermediate parts. We propose Text2CAD, the first AI framework for generating text-to-parametric CAD models using designer-friendly instructions for all skill levels. Furthermore, we introduce a data annotation pipeline for generating text prompts based on natural language instructions for the DeepCAD dataset using Mistral and LLaVA-NeXT. The dataset contains ~170K models and ~660K text annotations, from abstract CAD descriptions (e.g., generate two concentric cylinders) to detailed specifications (e.g., draw two circles with center (x, y) and radius r1, r2, and extrude along the normal by d...). Within the Text2CAD framework, we propose an end-to-end transformer-based auto-regressive network to generate parametric CAD models from input texts. We evaluate the performance of our model through a mixture of metrics, including visual quality, parametric precision, and geometrical accuracy. Our proposed framework shows great potential in AI-aided design applications. The project page is available at https://sadilkhan.github.io/text2cad-project/.

1 Introduction

Computer-Aided Design (CAD) plays a crucial role in industrial design and additive manufacturing (AM), revolutionizing the way products are prototyped. This type of prototyping requires feature-based part modeling, precision measurements, and creative part editing at different design stages [39, 59]. While CAD software saves the final model as a boundary representation (B-Rep), the design process often involves a chain of 2D sketches (e.g., circles, lines, splines) and 3D operations (e.g., extrusion, loft, fillet) [55, 57, 58, 18]. This representation allows the designers to control the design history and iteratively refine the final models.

Despite their capabilities, modern CAD tools lack AI-assisted design integration [36]. In Figure 1, we illustrate how an intelligent system capable of generating parametric CAD models from textual descriptions can be utilized to assemble a complex 3D model. Although tools like FreeCAD, SolidWorks, and Para-Solid offer 3D CAD models from catalogs like McMaster-Carr for the reuse of existing CAD models, no system currently exists that can generate parametric CAD models from textual design descriptions. One primary challenge in developing such a system is defining suitable textual descriptions for parametric CAD generation, which makes it difficult to create deep learning methods that accurately convert these descriptions into precise CAD models.

[Figure 1: Designers can efficiently generate parametric CAD models from text prompts. The prompts can vary from abstract shape descriptions to detailed parametric instructions. Example prompts shown: "A ring shape is created by drawing two concentric circles on the XY plane and scaled by 1 and unit 1.2 respectively, and extruding it along the Z-axis 0.1 unit to form a 3D model with a hollow center."; "A long rectangular shape."; "A simple rectangular base of length 0.05 unit, width 0.05 unit and height 1 unit".]

* Equal contributions.
† Corresponding author (mohammad.khan@dfki.de).
Preprint. Under review.

To address this gap, in this paper we propose Text2CAD as the first AI framework for generating parametric CAD models represented by construction sequences (i.e., parameters for 2D sketches and extrusions) from design-related text prompts. We faced two primary challenges in fulfilling this goal: (1) the unavailability of a dataset and (2) the need for a network that maps texts into CAD construction sequences. Towards this end, we introduce a data annotation pipeline to generate a dataset containing textual descriptions of the CAD models in the DeepCAD dataset. We leverage open-source Large Language Models (LLMs) and Vision Language Models (VLMs) [26] for this task. Our annotated text prompts are multi-level in nature, ranging from highly abstract (e.g., a long rectangular shape, a thin S-shaped object) to more specific with detailed parametric descriptions (e.g., first draw a rectangle from (x1, y1) and (x2, y2), then extrude the sketch along the z-axis...). These prompts are designed for users of all skill levels and can contain arithmetic logic and numerical expressions as part of the design details. Within this framework, we introduce the Text2CAD Transformer, a conditional deep-generative network for generating CAD construction language† from a text prompt in an auto-regressive fashion.

Currently, there are works on text-to-3D generation [24, 33, 50, 31] that have shown significant advancements in creating 3D scenes and shapes from textual descriptions. However, existing text-to-3D methods are not applicable for generating CAD models from text descriptions, as the final output of these models is neither parametric nor human-editable in nature. Very recently, a web API from Zoo developers has introduced a CAD generation app that uses text prompts from users and a programmable scripting language (the KittyCAD Language‡) for designers to edit and modify. However, the generated CAD models are obtained in the form of a solid body and are not decomposed into intermediate sketch-and-extrusion steps as proposed in our Text2CAD. On the other hand, given the raw numerical data of any parametric CAD model, current state-of-the-art large language models (LLMs), such as pre-trained Mistral-50B, GPT-4, and the open-source Llama, may only derive procedural scripting code for other APIs, such as FreeCAD or OpenSCAD, to generate a model. In contrast to our Text2CAD, such an LLM-augmented CAD generation approach will not be designer-friendly, will not be suitable for beginner-level designers, will not automate the development process in easy ways, and will restrict the re-usability of the scripts in case of complex shapes. Alternatively, using state-of-the-art vision language models, such as LLaVA [26] and GPT-4V, to deduce CAD construction sequences performs poorly for two main reasons: (1) no training datasets are available that provide natural language-based design instructions as annotations for raw CAD construction sequences, and (2) most VLMs are trained on categorical description/caption datasets of 3D objects (e.g., LLaVA-NeXT predicts ‘two concentric hollow cylinders’ as toilet paper). We remove the above limitations in our Text2CAD by creating new large-scale annotations for the DeepCAD dataset using responses from LLMs and VLMs to train our multi-modal model. Our contributions can be summarized as follows:

• We propose Text2CAD as the first AI framework for generating parametric 3D CAD models using textual descriptions.

• We introduce a data annotation pipeline that leverages both LLMs and VLMs to generate a dataset that contains text prompts with varying levels of complexity and parametric detail.

• We propose an end-to-end transformer-based autoregressive architecture for generating CAD design history from input text prompts.

• Our experimental analysis demonstrates superior performance over the two-stage baseline method adapted for the task at hand.

The rest of the paper is organized as follows: Section 2 reviews the related work in CAD domains. Section 3 outlines our data annotation pipeline. Section 4 details our proposed Text2CAD transformer architecture. The subsequent sections present our experimental results, discuss the limitations of our current framework, and conclude the paper.

† In this paper, the phrases ‘CAD construction language’, ‘CAD design history’ and ‘CAD construction sequence’ are used interchangeably.
‡ /KittyCAD/modeling-app/tree/main?tab=readme-ov-file

2 Related Work

Datasets and Generative models for CAD: Current datasets and generative models for CAD are limited and often not suited for developing knowledge-based CAD applications. Some datasets focus solely on 2D sketch design [42], and other popular datasets like ABC, Fusion 360 Gallery [54], Thingi10K, and CC3D provide 3D meshes, B-Rep (boundary representation), and other geometry- or topology-related annotations that are suitable for 3D modeling. The DeepCAD [55] dataset, a subset of ABC, and Fusion 360 provide CAD construction sequences in the form of sketch and extrusion to deduce design history. However, CAD models may consist of numerous other types of operations besides extrusion, and construction sequences with such operations are not available in the current datasets. Finally, there is no dataset available that provides textual design descriptions as annotations to create a conversational AI system for CAD modeling.

Current supervised learning methods that fall under sequence-to-sequence Sketch/CAD language modeling [55, 57, 18] filter out unnecessary metadata from lengthy raw design files and represent them as a desired sequence of input/output tokens. For instance, Ganin et al. represent design files as messages in Protocol Buffer format. The Hierarchical Neural Coding (HNC) method [57] represents the desired design sequence as a tree structure of 2D sketch loops, 2D bounding boxes over all loops as profiles, and 3D bounding boxes over all profiles as solids. CAD-SIGNet represents CAD construction language as a sequence composed of 2D sketch and extrusion parameters. In the Text2CAD method, we map the raw design history obtained from DeepCAD metadata into textual descriptions.

CAD Construction Language using Transformers: Transformer-based network architecture is the preferred choice for many deep learning-based applications related to CAD modeling, 3D scan-to-CAD reverse engineering [18], representation learning, and others [38]. CAD as a language describes how 2D sketches can be transformed into a design language by sequencing tokens of 2D parametric curves as message passing units. A mixture of Transformer and Pointer Networks decodes the sketch parameters in an auto-regressive fashion.

Formalizing constrained 2D sketches, i.e., collections of curves (e.g., lines, arcs, circles, and splines) with dimensional and geometric constraints (e.g., coincidence, perpendicularity, co-linearity), as a language for CAD modeling has been studied over the last few years [35, 32, 53, 23]. However, the first proposal for developing a CAD language interface was made decades ago in [40]. Among the recent works in this direction, SketchGen represents 2D sketches as a sequence of tokens for curves and constraints. A decoder-only transformer model has also been used to predict optimal sketches through nucleus sampling of token embedding vectors, focusing on replicating the drawing processes of CAD designers. The PolyGen method also employs a Transformer model to generate detailed 3D polygonal meshes by learning a joint distribution over the vertices and faces of a CAD model. As an extension of earlier work, TurtleGen also proposes a decoder-only transformer model to learn the joint distribution of the vertices and edges that together form sketches and are represented as graphs in CAD models.

3D CAD modeling steps as a language are not directly formulated by any state-of-the-art multi-modal CAD learning method [55, 28, 18, 58, 30, 57, 23]. Khan et al. [18] propose a novel auto-regressive generation of sketch-and-extrusion parameters directly from 3D point clouds as input, whereas DeepCAD, SkexGen, HNC, and MultiCAD adopt a two-stage strategy to generate the output. MultiCAD adopts multi-modal contrastive learning to associate geometry features with features of CAD construction sequences, whereas CAD-SIGNet requires an extra step of user feedback to vote for one of the many generated sketches at the current step in order to predict the next one. Unlike previous approaches, our proposed Text2CAD transformer is the first auto-regressive network that generates CAD construction sequences directly from textual descriptions.

3 Text2CAD Data Annotation

The diagram in Fig. 2 outlines the process of generating textual annotations for the DeepCAD dataset [55] using Large Language Models (LLMs) [16, 34] and Vision Language Models (VLMs) [26, 25]. These annotations describe the corresponding CAD construction workflow in a human-interpretable format. To enrich the DeepCAD dataset with textual annotations, we implement a two-stage description generation pipeline using the capabilities of both LLMs and VLMs. The two stages are: (1) generating abstract shape descriptions using a VLM, and (2) extracting multi-level textual instructions from an LLM based on the shape descriptions and design details provided in the dataset. An example text prompt for the CAD model shown in the top-left of Figure 2 is: ‘The CAD model consists of a cylindrical object with a flat top and bottom connected by a curved surface and slightly tapered towards the bottom. This object is created by first setting up a coordinate system, then sketching two concentric circles and drawing a closed loop with lines and an arc on a shared plane. The sketch is then extruded along the normal direction to form a solid body. The resulting part has a height of approximately 0.0487 units’. In this example, the phrase in violet color is generated by a VLM. An LLM uses this description along with the CAD construction information to generate the prompt.

[Figure 2: Text2CAD Data Annotation Pipeline. Our data annotation pipeline generates multi-level text prompts describing the construction workflow of a CAD model with varying complexities. We use a two-stage method: (Stage 1) shape description generation using a VLM, and (Stage 2) multi-level textual annotation generation using an LLM. Diagram components: DeepCAD Dataset (3D CAD Model, Raw Json), Multi-View Images (MVI) Extractor, Minimal Metadata Generator (Minimal Json), LLaVA-NeXT + Mistral-7B (VLM Prompt), Mistral-50B (MoE) (Minimal JSON, NLI Prompt, K-Shot, NLI Response), and the Multi-Level Natural Language Instruction (NLI) Generation Prompt, which produces Abstract (Level-0), Beginner (Level-1), Intermediate (Level-2), and Expert (Level-3) CAD instructions. The text panels of the figure are reproduced below.]

VLM Prompt (top-right of Figure 2):

    [INST] This is an image of a Computer Aided Design (CAD) model. You are a senior CAD engineer who knows the object name, where and how the CAD model is used. Give an accurate natural language description about the CAD model to a junior CAD designer who can design it from your simple description. Wrap the description in the following tags <OBJECT> and </OBJECT>.
    Following are some bad examples:
    1. CAD model
    2. Metal object
    Abide by the following rules. Rules:
    1. Do not use words like - "blue", "shadow", "transparent", "metal", "plastic", "image", "black", "grey", "CAD model", "abstract", "orange", "purple", "golden", "green"
    2. [/INST]

Natural Language Instruction (NLI) Generation Prompt (bottom-right of Figure 2):

    [INST]
    You are a senior CAD engineer and you are tasked to provide natural language instructions to a junior CAD designer for generating a parametric CAD model.
    Overview information about the CAD assembly JSON:
    1. The CAD assembly json lists the process of constructing a CAD model.
    2. Every CAD model consists of one or multiple intermediate CAD parts.
    3. These intermediate CAD parts are listed in the "parts" key of the CAD assembly JSON.
    4. The first intermediate CAD part is the base part and the subsequent parts build upon the previously constructed parts using the operation defined for that part.
    5. All intermediate parts combine to a final cad model.
    Every intermediate CAD part is generated using the following steps:
    Step 1: Draw a 2D sketch.
    Step 2: Scale the 2D sketch using the sketch_scale scaling parameter.
    Step 3: Transform the scaled 2D sketch into 3D Sketch using the euler angles and translation.
    Step 4: Extrude the 2D sketch to generate the 3D model. [/INST]

Minimal JSON (bottom-left of Figure 2):

    "final_shape": "A cylindrical object with a flat top and bottom"
    "parts": {"part_1": {
        "coordinate_system": {
            "Euler Angles": [0.0, 0.0, 0.0],
            "Translation Vector": [0.1071, 0.1071, 0.0974]},
        "sketch": {
            "face_1": {
                "loop_1": {
                    "circle_1": {
                        "Center": [0.112, 0.112],
                        "Radius": 0.112} ...}, ...}
        "extrusion": {
            "extrude_depth_towards_normal": 0.0,
            "extrude_depth_opposite_normal": 0.0487,
            "sketch_scale": 0.6429,
            "operation": "NewBodyFeatureOperation"}, ...}
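To make the link between the minimal JSON in Figure 2 and the example prompt concrete, the following sketch reads the extrusion block of that excerpt and sums the two extrusion depths, which reproduces the height of roughly 0.0487 units quoted in the prompt. The helper name `part_height` and the interpretation of height as the sum of both depths are assumptions made for illustration; the elided fields of the excerpt are omitted.

```python
# Extrusion block of "part_1" from the Figure 2 excerpt; other fields are omitted.
minimal_json = {
    "final_shape": "A cylindrical object with a flat top and bottom",
    "parts": {
        "part_1": {
            "extrusion": {
                "extrude_depth_towards_normal": 0.0,
                "extrude_depth_opposite_normal": 0.0487,
                "sketch_scale": 0.6429,
                "operation": "NewBodyFeatureOperation",
            },
        }
    },
}


def part_height(part: dict) -> float:
    """Assumed definition: extrusion depth towards plus depth opposite the normal."""
    extrusion = part["extrusion"]
    return (extrusion["extrude_depth_towards_normal"]
            + extrusion["extrude_depth_opposite_normal"])


# Height of every intermediate part listed under the "parts" key.
heights = {name: part_height(part) for name, part in minimal_json["parts"].items()}
print(heights)
```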

Shape Descriptions using VLM: The initial step of our annotation generation pipeline involves generating abstract object-level descriptions of the CAD models using the LLaVA-NeXT model. The objective in this step is to accurately capture the structural descriptions of the 3D shape, such as "a ring-like structure", "a cylinder", or "a hexagon with a cylinder on top". We generate shape descriptions for both the final CAD model and its intermediate parts. We first produce multi-view images from predetermined camera angles for each individual part and the final CAD model. These images are then utilized in a predefined prompt (refer to the top-right of Figure 2) for the LLaVA-NeXT model to generate simplified shape descriptions of all individual parts as well as the complete final shape.
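Since the VLM prompt asks for the description to be wrapped in <OBJECT> and </OBJECT> tags and forbids a list of words, a post-processing step along the following lines could extract and check a response. This is only a sketch: the function names are hypothetical, the paper does not describe its parsing code, and the banned-word check is a plain substring match.

```python
import re
from typing import List, Optional

# Words the VLM prompt in Figure 2 asks the model not to use.
BANNED_WORDS = ["blue", "shadow", "transparent", "metal", "plastic", "image",
                "black", "grey", "CAD model", "abstract", "orange", "purple",
                "golden", "green"]


def extract_object_description(response: str) -> Optional[str]:
    """Return the text between <OBJECT> and </OBJECT>, or None if the tags are missing."""
    match = re.search(r"<OBJECT>(.*?)</OBJECT>", response, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def banned_words_used(description: str) -> List[str]:
    """List the banned words that occur in the description (case-insensitive substring match)."""
    lowered = description.lower()
    return [word for word in BANNED_WORDS if word.lower() in lowered]


reply = "Sure! <OBJECT>A ring-like structure</OBJECT>"  # made-up VLM reply
description = extract_object_description(reply)
print(description, banned_words_used(description))
```

A response that violates a rule or lacks the tags could then be regenerated or discarded, depending on the pipeline.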

Multi-level Design Instructions using LLM: In this stage, multiple textual annotations corresponding to different design details of a CAD model are generated using Mixtral-50B through a series of steps (refer to the middle column in Figure 2). The DeepCAD dataset contains CAD construction sequences in JSON format. We first preprocess the raw CAD construction sequences using a ‘Minimal Metadata Generator’, which replaces random, meaningless keys with more meaningful terms (e.g., "part_1", "loop_1"). This step aims to reduce hallucinations by Mixtral-50B [16]. The minimal metadata is further augmented with the shape descriptions for each part and the final model generated by the VLM. The output of this process is a condensed representation of the shapes and their relational attributes within the CAD design (see bottom-left in Figure 2). With the minimal metadata at hand, we then craft a prompt (refer to the bottom-right in Figure 2) to generate detailed natural language instructions (NLI), ensuring minimal loss of information from the minimal metadata. Afterward, the NLI responses are refined by the LLM using a k-shot "Multi-Level Natural Language Instruction Generation Prompt" to generate multi-level instructions of different specificity and detail. We categorize these levels as:

• Abstract level (L0): Abstract Shape Descriptions of the final CAD model extracted using the VLM in the first stage.

• Beginner level (L1): Simplified Description. Aimed at laypersons or preliminary design stages, this level provides a simplified account of the design steps, eschewing complex measurements and jargon.

• Intermediate level (L2): Generalized Geometric Description. This level abstracts some of the details, providing a generalized description that balances comprehensibility with technical accuracy.

• Expert level (L3): Detailed Geometric Description with Relative Values. Here, the instructions include precise geometric descriptions and relative measurements, catering to users who require an in-depth understanding or are performing the CAD modeling task.

Our annotations consist of the multi-level instructions generated at the final stage. We generated these annotations over the course of 10 days. It is worth noting that one can directly generate the multi-level instructions from the minimal metadata without creating the detailed natural language instructions in the second stage. We observe that this strategy increases the LLM's tendency to hallucinate [13] and yields more inaccurate multi-level instructions. Instead, our method follows a chain-of-thought prompting strategy, which greatly reduces this problem. More details on our annotation pipeline are provided in the supplementary material.
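The two-step flow described in this section (rename opaque keys, generate detailed NLI, then refine the NLI into levels) can be sketched as below. Everything here is a hypothetical illustration: the function names, the prompt wording, the `llm` callable, and the choice of one LLM call per level are assumptions, not the authors' implementation.

```python
from typing import Callable, Dict

LLM_LEVELS = ("beginner", "intermediate", "expert")  # L1-L3; L0 comes from the VLM


def rename_keys(entities: dict, prefix: str) -> dict:
    """Replace opaque keys with ordered names such as part_1, part_2 (insertion order)."""
    return {f"{prefix}_{i}": value
            for i, value in enumerate(entities.values(), start=1)}


def annotate(minimal_metadata: str, abstract_description: str,
             llm: Callable[[str], str]) -> Dict[str, str]:
    """Chain the two LLM steps: detailed NLI first, then one refinement per level."""
    # Step 1: detailed natural language instructions from the minimal metadata.
    nli = llm(f"[INST] Write detailed CAD instructions for:\n{minimal_metadata} [/INST]")
    # Step 2: refine the NLI response into level-specific instructions.
    annotations = {"abstract": abstract_description}
    for level in LLM_LEVELS:
        annotations[level] = llm(
            f"[INST] Rewrite for a {level}-level designer:\n{nli} [/INST]")
    return annotations


def echo_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return f"<reply to {len(prompt)} characters>"


parts = rename_keys({"F7a2kQ": {"kind": "sketch"}, "Zx91Lm": {"kind": "sketch"}}, "part")
print(annotate(str(parts), "a cylinder", echo_llm))
```

Generating the levels from the intermediate NLI response, rather than directly from the metadata, mirrors the chain-of-thought motivation given above.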

4 Text2CAD Transformer

The Text2CAD transformer architecture, as shown in Figure 3, is designed to transform natural language descriptions into 3D CAD models by deducing all intermediate design steps autoregressively. Given an input text prompt $T \in \mathbb{R}^{N_p}$, where $N_p$ is the number of words in the text, our model learns the probability distribution $P(C \mid T)$ defined as

$$P(C \mid T) = \prod_{t=1}^{N_c} P(c_t \mid c_{1:t-1}, T; \theta) \qquad (1)$$

where $C$ is the output CAD sequence, $N_c$ is the number of tokens in $C$, and $\theta$ is the learnable model parameter. We represent $C$ as a sequence of sketch and extrusion tokens as proposed in [18]. Each token $c_t \in C$ is a 2D token that denotes either (1) a 2D coordinate of the primitives in a sketch, (2) one of the extrusion parameters (Euler angles / translation vector / extrusion distances / boolean operation / sketch scale), or (3) one of the end tokens (curve / loop / face / sketch / extrusion / start sequence / end sequence). Following [55, 18], we quantize the 2D coordinates as well as the continuous extrusion parameters in 8 bits, resulting in 256 class labels for each token. An example CAD sequence representation is provided in Figure 3 (in blue table). For more details, please refer to the supplementary material.
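The 8-bit quantization mentioned above can be illustrated with a small sketch that maps a continuous parameter to one of 256 class labels and back. The normalization range [0, 1] and the function names are assumptions for illustration; the paper defers the exact scheme to [55, 18] and the supplementary material.

```python
def quantize(value: float, lo: float = 0.0, hi: float = 1.0, bits: int = 8) -> int:
    """Map a continuous value in [lo, hi] to one of 2**bits class labels."""
    levels = 2 ** bits
    clipped = min(max(value, lo), hi)
    # Uniform bins; the upper bound hi falls into the last bin.
    return min(int((clipped - lo) / (hi - lo) * levels), levels - 1)


def dequantize(label: int, lo: float = 0.0, hi: float = 1.0, bits: int = 8) -> float:
    """Return the centre of the bin that `label` represents."""
    levels = 2 ** bits
    return lo + (label + 0.5) * (hi - lo) / levels


radius = 0.112  # circle radius from the Figure 2 excerpt
label = quantize(radius)
print(label, dequantize(label))
```

With uniform bins over a unit range, the reconstruction error is at most half a bin width, i.e. 1/512.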

Now we elaborate on the various components of the architecture, detailing the processes involved in converting text to CAD representations. Let the input text prompt at timestep $t-1$ be $T \in \mathbb{R}^{N_p}$ and the input CAD subsequence be $C_{1:t-1} \in \mathbb{R}^{N_{t-1} \times 2}$.
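The role of the two inputs at each timestep, the prompt and the CAD subsequence generated so far, can be sketched as a simple decoding loop. The start and end token values and the `next_token` callable are hypothetical placeholders; a trained network would supply the next-token prediction.

```python
from typing import Callable, List, Tuple

Token = Tuple[int, int]  # each CAD token is a pair of class labels (a 2D token)

START: Token = (1, 0)  # hypothetical start-of-sequence token
END: Token = (2, 0)    # hypothetical end-of-sequence token


def generate(text: str,
             next_token: Callable[[str, List[Token]], Token],
             max_len: int = 16) -> List[Token]:
    """Grow the CAD sequence one token at a time, conditioned on the text and the prefix."""
    sequence = [START]
    while len(sequence) < max_len:
        token = next_token(text, sequence)  # stands in for the network's prediction
        sequence.append(token)
        if token == END:
            break
    return sequence


# Toy stand-in for a trained model: replays a fixed script of tokens.
script = [(10, 20), (30, 40), END]


def toy_model(text: str, prefix: List[Token]) -> Token:
    return script[len(prefix) - 1]


print(generate("a ring-like structure", toy_model))
```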

Pretrained BERT Encoder: The initial step in the Text2CAD network involves encoding the textual description provided by the user. This description can range from highly abstract, beginner-friendly instructions to detailed, expert-level commands. To handle this diversity, we use a pre-trained BERT (Bidirectional Encoder Representations from Transformers) model, denoted $\mathrm{BERT}_{\text{pre-trained}}$. The input text $T \in \mathbb{R}^{N_p}$ is tokenized and passed through the BERT model to generate the contextual embedding

$$\mathbf{T} = \mathrm{BERT}_{\text{pre-trained}}(T) \qquad (2)$$

Here, $\mathbf{T} \in \mathbb{R}^{N_p \times d_p}$ represents the sequence of token embedding vectors that capture the semantic meaning of the input text, where $N_p$ is the number of tokens and $d_p$ is the dimension of the embedding.

Adaptive Layer. An adaptive layer consisting of one transformer encoder layer refines the output $\mathbf{T}$ of the BERT encoder to better suit the CAD domain, aligning it with the specific vocabulary and structural requirements of CAD instructions. The adaptive layer outputs the embedding $\mathbf{T}_{adapt} \in \mathbb{R}^{N_p \times d_p}$ using

$$\mathbf{T}_{adapt} = \mathrm{AdaptiveLayer}(\mathbf{T}) \qquad (3)$$

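CAD Sequence Embedder: Each token in the input CAD subsequence $C_{1:t-1}$ is initially represented as a one-hot vector with a dimension of 256, resulting in a one-hot representation $C^{o}_{1:t-1} \in \mathbb{R}^{N_{t-1} \times 2 \times 256}$. For the sake of simplicity, we represent $C^{o}_{1:t-1} = [C^{x}_{1:t-1}; C^{y}_{1:t-1}]$, where $C^{x}_{1:t-1}, C^{y}_{1:t-1} \in \mathbb{R}^{N_{t-1} \times 256}$. The initial CAD sequence embedding $F^{0}_{t-1} \in \mathbb{R}^{N_{t-1} \times d}$ is obtained using Eq. 4:

$$F^{0}_{t-1} = C^{x}_{1:t-1} W^{x} + C^{y}_{1:t-1} W^{y} + P \qquad (4)$$

where $W^{x}, W^{y} \in \mathbb{R}^{256 \times d}$ are learnable weights and $P \in \mathbb{R}^{N_{t-1} \times d}$ is the positional encoding.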
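Eq. 4 can be checked on a toy example: multiplying a one-hot row by a weight matrix selects one row of that matrix, so the embedding of a token is the sum of one row from each weight matrix plus the positional term. The sketch below uses pure Python lists, a vocabulary of 4 instead of 256, and made-up weights; all names are illustrative and the weight matrices are assumed to have one row per class label.

```python
def one_hot(label: int, size: int) -> list:
    return [1.0 if i == label else 0.0 for i in range(size)]


def matmul(a: list, b: list) -> list:
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def embed(tokens: list, w_x: list, w_y: list, pos: list) -> list:
    """Eq. 4 on lists: one-hot x and y labels times their weights, plus positions."""
    size = len(w_x)
    c_x = [one_hot(x, size) for x, _ in tokens]
    c_y = [one_hot(y, size) for _, y in tokens]
    f_x, f_y = matmul(c_x, w_x), matmul(c_y, w_y)
    return [[a + b + p for a, b, p in zip(rx, ry, rp)]
            for rx, ry, rp in zip(f_x, f_y, pos)]


VOCAB, D = 4, 3  # toy sizes; the paper uses 256 class labels
w_x = [[float(i * D + j) for j in range(D)] for i in range(VOCAB)]
w_y = [[float(10 * (i * D + j)) for j in range(D)] for i in range(VOCAB)]
tokens = [(1, 2), (3, 0)]           # two 2D tokens (x label, y label)
pos = [[0.5] * D for _ in tokens]   # placeholder positional encoding
f0 = embed(tokens, w_x, w_y, pos)
print(f0)
```

In practice the one-hot product is implemented as an embedding lookup, which gives the same result without materializing the one-hot matrices.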

[Figure 3: Network architecture. The Text2CAD Transformer takes as input a text prompt $T$ and a CAD subsequence $C_{1:t-1}$ of length $t-1$. The text embedding $\mathbf{T}_{adapt}$ is extracted from $T$ using a pretrained BERT Encoder followed by a trainable Adaptive layer. The resulting embedding $\mathbf{T}_{adapt}$ and the CAD sequence embedding $F^{0}_{t-1}$ are passed through $L$ decoder blocks to generate the full CAD sequence in an auto-regressive way. Diagram components: Pre-trained BERT Encoder, Adaptive Layer, Positional Encoding, Transformer Decoder Block (MHA, Cross-Attention, FFN), Downsampler, MLP, Input CAD Tokens, Output CAD Tokens (Sketch Tokens, Extrusion Tokens), and Reconstructed 3D CAD Model; the legend distinguishes trainable from frozen parameters. Input prompts range from highly abstract, beginner-friendly instructions to more detailed, expert-level instructions; the example prompt shown is "The CAD model features an elegant, curved, hollow design inspired by the stylized letter 'O'."]

Layer-wise Cross Attention. We use a standard transformer decoder with a layer-wise cross-attention mechanism between the CAD and the text embedding within the decoder blocks. The layer-wise cross-attention mechanism facilitates the integration of contextual text features with the CAD embedding, allowing the model to focus on relevant parts of the text during CAD construction. Each decoder block $l$ takes as input the CAD embedding $F^{l-1}_{t-1}$ and the text embedding $\mathbf{T}_{adapt}$, where $F^{l-1}_{t-1}$ is the output of the previous decoder block (for the first decoder block, the input CAD embedding is $F^{0}_{t-1}$). At first, multi-head attention (MHA) is applied to $F^{l-1}_{t-1}$ to generate a CAD embedding in $\mathbb{R}^{N_{t-1} \times d}$.
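As a rough illustration of how cross-attention lets CAD tokens attend to text tokens, the sketch below implements single-head scaled dot-product attention with CAD embeddings as queries and text embeddings as keys and values. It is a simplification under stated assumptions: it omits the learned projections, multiple heads, and residual connections of a real decoder block, and it assumes both embeddings already share the same dimension.

```python
import math


def softmax(scores: list) -> list:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(cad: list, text: list):
    """cad: N_c x d query vectors; text: N_p x d vectors used as keys and values."""
    d = len(cad[0])
    outputs, weights = [], []
    for query in cad:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in text]
        w = softmax(scores)
        weights.append(w)
        # Each output row is a weighted average of the text vectors.
        outputs.append([sum(wi * value[j] for wi, value in zip(w, text))
                        for j in range(d)])
    return outputs, weights


cad = [[1.0, 0.0], [0.0, 1.0]]                # two toy CAD token embeddings
text = [[5.0, 0.0], [0.0, 5.0], [0.0, 0.0]]   # three toy text token embeddings
out, attn = cross_attention(cad, text)
print(attn)
```

Each row of `attn` shows how strongly one CAD token attends to each text token; here the first CAD token attends mostly to the first text token.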
