Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts

arXiv:2409.17106v1 [cs.CV] 25 Sep 2024

Mohammad Sadil Khan*†1,2,3, Sankalp Sinha*1,2,3, Talha Uddin Sheikh1,2,3, Didier Stricker1,2,3, Sk Aziz Ali1,4, Muhammad Zeshan Afzal1,2,3

1 DFKI, 2 RPTU Kaiserslautern-Landau, 3 MindGarage, 4 BITS Pilani, Hyderabad

Abstract

Prototyping complex computer-aided design (CAD) models in modern software can be very time-consuming. This is due to the lack of intelligent systems that can quickly generate simpler intermediate parts. We propose Text2CAD, the first AI framework for generating text-to-parametric CAD models using designer-friendly instructions for all skill levels. Furthermore, we introduce a data annotation pipeline for generating text prompts based on natural language instructions for the DeepCAD dataset using Mistral and LLaVA-NeXT. The dataset contains ~170K models and ~660K text annotations, from abstract CAD descriptions (e.g., generate two concentric cylinders) to detailed specifications (e.g., draw two circles with center (x, y) and radius r1, r2, and extrude along the normal by d...). Within the Text2CAD framework, we propose an end-to-end transformer-based auto-regressive network to generate parametric CAD models from input texts. We evaluate the performance of our model through a mixture of metrics, including visual quality, parametric precision, and geometrical accuracy. Our proposed framework shows great potential in AI-aided design applications. The project page is available at https://sadilkhan.github.io/text2cad-project/.

1 Introduction

Computer-Aided Design (CAD) plays a crucial role in industrial design and additive manufacturing (AM), revolutionizing the way products are prototyped [6]. This type of prototyping requires feature-based part modeling [6], precision measurements [39], and creative part editing [59] at different design stages [39, 59]. While CAD software saves the final model as a boundary representation (B-Rep) [21], the design process often involves a chain of 2D sketches (e.g., circles, lines, splines) and 3D operations (e.g., extrusion, loft, fillet) [55, 57, 58, 18]. This representation allows the designers to control the design history and iteratively refine the final models.

Despite their capabilities, modern CAD tools lack AI-assisted design integration [36]. In Figure 1, we illustrate how an intelligent system capable of generating parametric CAD models from textual descriptions can be utilized to assemble a complex 3D model. Although tools like FreeCAD [1], SolidWorks [45], and Para-Solid [44] offer 3D CAD models from catalogs like McMaster-Carr [2] for the reuse of existing CAD models, no such system currently exists that can generate parametric CAD models from textual design descriptions. One primary challenge for developing such a system is defining suitable textual descriptions for parametric CAD generation, making it difficult to create deep learning methods that accurately convert these descriptions into precise CAD models.

* Equal Contributions.

† Corresponding author (mohammad.khan@dfki.de). Preprint. Under review.

Figure 1: Designers can efficiently generate parametric CAD models from text prompts. The prompts can vary from abstract shape descriptions to detailed parametric instructions. Example prompts shown in the figure: (a) "A ring shape is created by drawing two concentric circles on the XY plane and scaled by 1 and unit 1.2 respectively, and extruding it along the Z-axis 0.1 unit to form a 3D model with a hollow center." (b) "A long rectangular shape." (c) "A simple rectangular base of length 0.05 unit, width 0.05 unit and height 1 unit."

To address this gap, in this paper we propose Text2CAD as the first AI framework for generating parametric CAD models represented by construction sequences (i.e., parameters for 2D sketches and extrusions) from design-related text prompts. We faced two primary challenges in fulfilling this goal: (1) the unavailability of a suitable dataset and (2) the lack of a network that maps texts into CAD construction sequences. Towards this end, we introduce a data annotation pipeline to generate a dataset containing textual descriptions of the CAD models in the DeepCAD [55] dataset. We leverage open-source Large Language Models (LLMs) [15] and Vision Language Models [26, 25] for this task. Our annotated text prompts are multi-level in nature, ranging from highly abstract (e.g., a long rectangular shape, a thin S-shaped object) to more specific with detailed parametric descriptions (e.g., first draw a rectangle from (x1, y1) and (x2, y2) then extrude the sketch along z-axis ...). These prompts are designed for users of all skill levels and can contain arithmetic logic and numerical expressions as part of the design details. Within this framework, we introduce the Text2CAD Transformer [48], a conditional deep-generative network for generating CAD construction language¹ from a text prompt in an auto-regressive fashion.

Currently, there are works on text-to-3D generation [24, 33, 50, 31, 11] that have shown significant advancements in creating 3D scenes and shapes from textual descriptions. However, existing text-to-3D methods are not applicable for generating CAD models from text descriptions, as the final output of these models is neither parametric nor human-editable. Very recently, a web API from Zoo developers [3] introduced a CAD generation app that uses text prompts from users and a programmable scripting language (the KittyCAD Language²) for designers to edit and modify. However, the generated CAD models are obtained in the form of a solid body and are not decomposed into intermediate sketch-and-extrusion steps as proposed in our Text2CAD.

On the other hand, given raw numerical data of any parametric CAD model, current state-of-the-art large language models (LLMs), such as pre-trained Mistral-50B [15], GPT-4 [34], and open-source Llama [46], may only derive procedural scripting code for other APIs, such as FreeCAD [37] or OpenSCAD [29], to generate a model. In contrast to our Text2CAD, such an LLM-augmented CAD generation approach is not designer-friendly, is not suitable for beginner-level designers, does not automate the development process in easy ways, and restricts the re-usability of the scripts in the case of complex shapes. Alternatively, using state-of-the-art vision language models, such as LLaVA [26, 25] or GPT-4V [60], to deduce CAD construction sequences performs poorly for two main reasons: (1) no training datasets are available that provide natural language-based design instructions as annotations for raw CAD construction sequences, and (2) most VLMs are trained on categorical description/caption datasets of 3D objects (e.g., LLaVA-NeXT [25] predicts 'two concentric hollow cylinders' as toilet paper). We remove the above limitations in our Text2CAD by creating new large-scale annotations for the DeepCAD [55] dataset using responses from LLMs and VLMs to train our multi-modal model. Our contributions can be summarized as follows:

• We propose Text2CAD as the first AI framework for generating parametric 3D CAD models using textual descriptions.

• We introduce a data annotation pipeline that leverages both LLMs and VLMs to generate a dataset that contains text prompts with varying levels of complexity and parametric detail.

• We propose an end-to-end transformer-based autoregressive architecture for generating CAD design history from input text prompts.

• Our experimental analysis demonstrates superior performance over the two-stage baseline method adapted for the task at hand.

The rest of the sections are organized as follows: Section 2 reviews the related work in CAD domains. Section 3 outlines our data annotation pipeline. Section 4 details our proposed Text2CAD transformer architecture. Section 5 presents our experimental results. Section 6 discusses the limitations of our current framework, and Section 7 concludes the paper.

¹ In this paper, the phrases 'CAD construction language', 'CAD design history' and 'CAD construction sequence' are used interchangeably.

² /KittyCAD/modeling-app/tree/main?tab=readme-ov-file

2 Related Work

Datasets and Generative models for CAD: Current datasets and generative models for CAD are limited and often not suited for developing knowledge-based CAD applications. Some datasets focus solely on 2D sketch design [42, 9, 43], and other popular datasets like ABC [19], Fusion360 Gallery [54], Thingi10K [61], and CC3D [5, 8] provide 3D meshes, BRep (boundary representation), and other geometry or topology related annotations that are suitable for 3D modeling. The DeepCAD [55] dataset, a subset of ABC, and Fusion360 [54] provide CAD construction sequences in the form of sketch and extrusion to deduce design history. However, CAD models may consist of numerous other types of operations besides extrusion, and such construction sequences with other CAD operations are not available in the current datasets. Finally, there is no dataset available that provides textual design descriptions as annotations to create a conversational AI system for CAD modeling.

Current supervised learning methods that fall under sequence-to-sequence Sketch/CAD language modeling [55, 57, 18, 9] filter out unnecessary metadata from lengthy raw design files and represent them as the desired sequence of input/output tokens. For instance, Ganin et al. [9] represent design files as messages in Protocol Buffer [47] format. The Hierarchical Neural Coding (HNC) method [57] represents the desired design sequence in a tree structure of 2D sketch loops, 2D bounding boxes over all loops as profile, and 3D bounding boxes over all profiles as solid. CAD-SIGNet [18] represents CAD construction language as a sequence composed of 2D sketch and extrusion parameters. In the Text2CAD method, we map the raw design history obtained from DeepCAD metadata into textual descriptions.

CAD Construction Language using Transformers: Transformer-based [48] network architecture is the preferred choice for many deep learning-based applications related to CAD modeling [55], 3D scan-to-CAD reverse engineering [18, 22], representation learning [17] and others [38]. CAD as a language [9] describes how 2D sketches can be transformed into design language by sequencing tokens of 2D parametric curves as message passing units. A mixture of Transformer [48] and Pointer Networks [49] decodes the sketch parameters in an auto-regressive fashion.

Formalizing constrained 2D sketches, i.e., collections of curves (e.g., line, arc, circle and splines) with dimensional and geometric constraints (e.g., coincidence, perpendicularity, co-linearity), as a language for CAD modeling has been studied over the last few years [35, 9, 32, 53, 23]. However, the first proposal for developing a CAD language interface was made decades ago in [40]. Among the recent works in this direction, SketchGen [35] represents 2D sketches as a sequence of tokens for curves and constraints. The decoder-only transformer model in [35] predicts optimal sketches through nucleus sampling [12] of token embedding vectors, focusing on replicating the drawing processes of CAD designers. The Polygen [32] method also employs a Transformer model [48] to generate detailed 3D polygonal meshes by learning a joint distribution on vertices and faces of a CAD model. As an extension of [32], TurtleGen [53] also proposes a decoder-only transformer model to learn the joint distribution of vertices and edges that together form sketches and are represented as graphs in CAD models.

3D CAD modeling steps as a language are not directly formulated by any state-of-the-art multi-modal CAD learning methods [55, 28, 18, 58, 8, 30, 57, 23]. Khan et al. [18] propose a novel auto-regressive generation of sketch-and-extrusion parameters directly from 3D point clouds as input, whereas DeepCAD [55], SkexGen [58], HNC [57] and MultiCAD [28] adopt a two-stage strategy to generate the output. MultiCAD [28] adopts multi-modal contrastive learning to associate geometry features with features of CAD construction sequences, whereas CAD-SIGNet [18] requires an extra step of user feedback to vote for one of the many generated sketches at the current step to predict the next one. Unlike previous approaches, our proposed Text2CAD transformer is the first auto-regressive network that generates CAD construction sequences directly from textual descriptions.

3 Text2CAD Data Annotation

The diagram in Fig. 2 outlines the process of generating textual annotations for the DeepCAD dataset [55] using Large Language Models (LLMs) [16, 34, 46] and Vision Language Models (VLMs) [26, 25]. These annotations describe the corresponding CAD construction workflow in a human-interpretable format. To enrich the DeepCAD [55] dataset with textual annotations, we implement a two-stage description generation pipeline using the capabilities of both LLMs and VLMs. The two stages are (1) generating abstract shape descriptions using a VLM, and (2) extracting multi-level textual instructions from an LLM based on the shape descriptions and design details provided in the dataset. An example text prompt for the CAD model shown in the top-left of Figure 2: 'The CAD model consists of a cylindrical object with a flat top and bottom connected by a curved surface and slightly tapered towards the bottom. This object is created by first setting up a coordinate system, then sketching two concentric circles and drawing a closed loop with lines and an arc on a shared plane. The sketch is then extruded along the normal direction to form a solid body. The resulting part has a height of approximately 0.0487 units'. In this example, the phrase in the violet color is generated by a VLM. An LLM uses this description along with the CAD construction information to generate the prompt.

Figure 2: Text2CAD Data Annotation Pipeline: Our data annotation pipeline generates multi-level text prompts describing the construction workflow of a CAD model with varying complexities. We use a two-stage method: (Stage 1) shape description generation using a VLM, and (Stage 2) multi-level textual annotation generation using an LLM.

[Figure 2 panels: 3D CAD Model; DeepCAD Dataset; Multi-View Images (MVI) Extractor; Raw JSON; Minimal Metadata Generator; Minimal JSON; VLM Prompt; LLaVA-NeXT + Mistral-7B; Natural Language Instruction (NLI) Generation Prompt; Example NLI Prompt; Mistral-50B (MoE); K-Shot; NLI Response; Multi-Level Natural Language Instruction (NLI) Generation Prompt; Abstract Level (Level-0), Beginner Level (Level-1), Intermediate Level (Level-2) and Expert Level (Level-3) CAD Instructions.]

VLM Prompt (as shown in Figure 2):

[INST] This is an image of a Computer Aided Design (CAD) model. You are a senior CAD engineer who knows the object name, where and how the CAD model is used. Give an accurate natural language description about the CAD model to a junior CAD designer who can design it from your simple description. Wrap the description in the following tags <OBJECT> and </OBJECT>.
Following are some bad examples:
1. CAD model
2. Metal object
Abide by the following rules. Rules:
1. Do not use words like - "blue", "shadow", "transparent", "metal", "plastic", "image", "black", "grey", "CAD model", "abstract", "orange", "purple", "golden", "green"
2. [/INST]

NLI Prompt (as shown in Figure 2):

[INST]
You are a senior CAD engineer and you are tasked to provide natural language instructions to a junior CAD designer for generating a parametric CAD model.
Overview information about the CAD assembly JSON:
1. The CAD assembly json lists the process of constructing a CAD model.
2. Every CAD model consists of one or multiple intermediate CAD parts.
3. These intermediate CAD parts are listed in the "parts" key of the CAD assembly JSON.
4. The first intermediate CAD part is the base part and the subsequent parts build upon the previously constructed parts using the operation defined for that part.
5. All intermediate parts combine to a final cad model.
Every intermediate CAD part is generated using the following steps:
Step 1: Draw a 2D sketch.
Step 2: Scale the 2D sketch using the sketch_scale scaling parameter.
Step 3: Transform the scaled 2D sketch into 3D Sketch using the euler angles and translation.
Step 4: Extrude the 2D sketch to generate the 3D model. [/INST]

Minimal JSON (excerpt as shown in Figure 2):

"final_shape": "A cylindrical object with a flat top and bottom"
"parts": {"part_1": {
    "coordinate_system": {
        "EulerAngles": [0.0, 0.0, 0.0],
        "TranslationVector": [0.1071, 0.1071, 0.0974]},
    "sketch": {
        "face_1": {
            "loop_1": {
                "circle_1": {
                    "Center": [0.112, 0.112],
                    "Radius": 0.112} ...}, ...}
    "extrusion": {
        "extrude_depth_towards_normal": 0.0,
        "extrude_depth_opposite_normal": 0.0487,
        "sketch_scale": 0.6429,
        "operation": "NewBodyFeatureOperation"}, ...}
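The example prompt reports a part height of approximately 0.0487 units, which matches the extrusion block of the minimal JSON excerpt in Figure 2. The sketch below shows one plausible way to recover that number, assuming the height is the sum of the two extrusion depths. The helper name `total_extrusion_depth` is hypothetical and is not taken from the paper or its code.

```python
import json

# Extrusion block copied from the minimal JSON excerpt in Figure 2.
part_json = """
{
  "extrusion": {
    "extrude_depth_towards_normal": 0.0,
    "extrude_depth_opposite_normal": 0.0487,
    "sketch_scale": 0.6429,
    "operation": "NewBodyFeatureOperation"
  }
}
"""


def total_extrusion_depth(part: dict) -> float:
    """Hypothetical helper: assume the part height is the sum of both depths."""
    ext = part["extrusion"]
    return ext["extrude_depth_towards_normal"] + ext["extrude_depth_opposite_normal"]


part = json.loads(part_json)
height = total_extrusion_depth(part)
print(f"part height: {height:.4f} units")
```

Under this assumption the computed height agrees with the value quoted in the example prompt, which illustrates how an LLM can ground the text in the JSON parameters.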

Shape Descriptions using VLM: The initial step of our annotation generation pipeline involves generating abstract object-level descriptions of the CAD models using the LLaVA-NeXT [25] model. The objective in this step is to accurately capture the structural descriptions of the 3D shape, such as "a ring-like structure", "a cylinder", or "a hexagon with a cylinder on top". We generate shape descriptions for both the final CAD model and its intermediate parts. We first produce multi-view images from predetermined camera angles for each individual part and the final CAD model. These images are then utilized in a predefined prompt (refer to the top-right of Figure 2) for the LLaVA-NeXT [25] model to generate simplified shape descriptions of all individual parts as well as the complete final shape.
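The VLM prompt asks the model to wrap its answer in <OBJECT> and </OBJECT> tags, so the shape description has to be pulled out of the raw response. The following is a minimal sketch of that post-processing step using Python's standard `re` module. The function name `extract_shape_description` and the sample response are illustrative assumptions, not the authors' code.

```python
import re
from typing import Optional

# Non-greedy match so only the first tagged span is captured;
# DOTALL lets the description span several lines.
OBJECT_TAG = re.compile(r"<OBJECT>(.*?)</OBJECT>", re.DOTALL)


def extract_shape_description(response: str) -> Optional[str]:
    """Return the text inside the first <OBJECT> tag pair, or None if absent."""
    match = OBJECT_TAG.search(response)
    if match is None:
        return None
    # Collapse internal whitespace and line breaks into single spaces.
    return " ".join(match.group(1).split())


# Hypothetical VLM response, written for illustration only.
sample = "Sure. <OBJECT> a ring-like\n structure </OBJECT>"
description = extract_shape_description(sample)
print(description)
```

Returning `None` for untagged responses makes it easy to filter out or re-query samples where the model ignored the formatting rule.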

Multi-level Design Instructions using LLM: In this stage, multiple textual annotations corresponding to different design details of a CAD model are generated using Mixtral-50B [16] through a series of steps (refer to the middle column in Figure 2). The DeepCAD [55] dataset contains CAD construction sequences in JSON format. We first preprocess the raw CAD construction sequences using a 'Minimal Metadata Generator', which replaces random, meaningless keys with more meaningful terms (e.g., "part_1", "loop_1"). This step aims to reduce hallucinations [13] by Mixtral-50B [16]. The minimal metadata is further augmented with the shape descriptions for each part and the final model generated by the VLM. The output of this process is a condensed representation of the shapes and their relational attributes within the CAD design (see bottom-left in Figure 2). With the minimal metadata at hand, we then craft a prompt (refer to the bottom-right in Figure 2) to generate detailed natural language instructions (NLI), ensuring a minimal loss of information from the minimal metadata. Afterward, the NLI responses are refined by the LLM using a k-shot [4] "Multi-Level Natural Language Instruction Generation Prompt" to generate multi-level instructions of different specificity and details. We categorize these levels as:

• Abstract level (L0): Abstract Shape Descriptions of the final CAD model extracted using VLM in the first stage.

• Beginner level (L1): Simplified Description - Aimed at laypersons or preliminary design stages, this level provides a simplified account of the design steps, eschewing complex measurements and jargon.

• Intermediate level (L2): Generalized Geometric Description - This level abstracts some of the details, providing a generalized description that balances comprehensibility with technical accuracy.

• Expert level (L3): Detailed Geometric Description with Relative Values - Here, the instructions include precise geometric descriptions and relative measurements, catering to users who require an in-depth understanding or are performing the CAD modeling task.
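One way to picture the four levels is as a small record that stores one prompt per level for each CAD model. The sketch below is an assumed data layout for illustration. The names `PromptLevel`, `level_tag` and the placeholder strings are hypothetical and do not reflect the released dataset format.

```python
from enum import IntEnum


class PromptLevel(IntEnum):
    """The four annotation levels (L0 to L3) named in the list above."""
    ABSTRACT = 0
    BEGINNER = 1
    INTERMEDIATE = 2
    EXPERT = 3


def level_tag(level: PromptLevel) -> str:
    """Format a level the way the text abbreviates it, e.g. 'L3'."""
    return f"L{int(level)}"


# Hypothetical annotation record for one CAD model; the strings are
# placeholders, not real annotations from the dataset.
annotations = {
    PromptLevel.ABSTRACT: "<shape description from the VLM>",
    PromptLevel.BEGINNER: "<simplified description>",
    PromptLevel.INTERMEDIATE: "<generalized geometric description>",
    PromptLevel.EXPERT: "<detailed geometric description with relative values>",
}

for level in PromptLevel:
    print(level_tag(level), level.name.lower(), "->", annotations[level])
```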

Our annotations consist of the generated multi-level instructions at the final stage. We generate these annotations over the course of 10 days. It is worth noting that one can directly generate the multi-level instructions from the minimal metadata without creating the detailed natural language instructions in the second stage. We observe that this strategy increases the LLM's tendency for hallucinations [13] and generates more inaccurate multi-level instructions. Instead, our method follows the chain-of-thought prompting strategy outlined in [51], which greatly reduces this problem. More details on our annotation pipeline are provided in Sections 10 and 11 of the supplementary material.
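The 'Minimal Metadata Generator' described in this section replaces random, meaningless keys with readable ones such as "part_1" and "loop_1". The sketch below illustrates that idea on a toy dictionary. The raw key names, the nesting and the `rename_keys` helper are all assumptions made for illustration, since the raw DeepCAD JSON layout is not shown in the text.

```python
def rename_keys(raw: dict, prefix: str) -> dict:
    """Replace arbitrary keys with prefix_1, prefix_2, ... in insertion order."""
    return {f"{prefix}_{i}": value for i, value in enumerate(raw.values(), start=1)}


# Hypothetical raw metadata with opaque identifiers as keys.
raw_parts = {
    "Fa3x9": {"Kq71": {"Radius": 0.112}},
    "Zb20c": {"Lm02": {"Radius": 0.05}},
}

# Rename the part level first, then the loop level inside each part.
minimal = {
    part_name: rename_keys(loops, "loop")
    for part_name, loops in rename_keys(raw_parts, "part").items()
}
print(minimal)
```

Readable keys give the LLM a stable vocabulary to refer to ("part_1", "loop_1"), which is the stated motivation for this preprocessing step.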

4 Text2CAD Transformer

The Text2CAD transformer architecture, as shown in Figure 3, is designed to transform natural language descriptions into 3D CAD models by deducing all their intermediate design steps autoregressively. Given an input text prompt $T \in \mathbb{R}^{N_p}$, where $N_p$ is the number of words in the text, our model learns the probability distribution $P(C \mid T)$ defined as

$$P(C \mid T) = \prod_{t=1}^{N_c} P(c_t \mid c_{1:t-1}, T; \theta) \qquad (1)$$

where $C$ is the output CAD sequence, $N_c$ is the number of tokens in $C$, and $\theta$ is the learnable model parameter. We represent $C$ as a sequence of sketch and extrusion tokens as proposed in [18]. Each token $c_t \in C$ is a 2D token that denotes either (1) a 2D coordinate of the primitives in the sketch, (2) one of the extrusion parameters (Euler angles / translation vector / extrusion distances / boolean operation / sketch scale), or (3) one of the end tokens (curve / loop / face / sketch / extrusion / start sequence / end sequence). Following [55, 18], we quantize the 2D coordinates as well as the continuous extrusion parameters in 8 bits, resulting in 256 class labels for each token. An example CAD sequence representation is provided in Figure 3 (in blue table). For more details, please refer to Section 9 of the supplementary material.
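To make the 8-bit quantization concrete, the sketch below maps a continuous value to one of 256 class labels by uniform binning and maps it back. The value range `[low, high]`, the rounding rule and the function names are assumptions for illustration. The actual scheme in [55, 18] may differ, for example by reserving some labels for end tokens.

```python
def quantize(value: float, low: float, high: float, n_bits: int = 8) -> int:
    """Map a continuous value in [low, high] to an integer label in [0, 2**n_bits - 1]."""
    levels = 2 ** n_bits
    clipped = min(max(value, low), high)  # out-of-range values are clamped
    return round((clipped - low) / (high - low) * (levels - 1))


def dequantize(token: int, low: float, high: float, n_bits: int = 8) -> float:
    """Approximate inverse of quantize: recover the value at the label's grid point."""
    levels = 2 ** n_bits
    return low + token / (levels - 1) * (high - low)


# Example: the circle radius from the minimal JSON, assuming a [0, 1] range.
radius = 0.112
token = quantize(radius, 0.0, 1.0)
recovered = dequantize(token, 0.0, 1.0)
print(token, recovered)
```

With 256 levels on a unit range, the round-trip error is at most half a grid step (about 0.002), which bounds the parametric precision the model can express under this assumed scheme.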

Now we elaborate on the various components of the architecture, detailing the processes involved in converting text to CAD representations. Let the input text prompt at timestep $t-1$ be $T \in \mathbb{R}^{N_p}$ and the input CAD subsequence be $C_{1:t-1} \in \mathbb{R}^{N_{t-1} \times 2}$.

Pretrained BERT Encoder: The initial step in the Text2CAD network involves encoding the textual description provided by the user. This description can range from highly abstract, beginner-friendly instructions to detailed, expert-level commands. To handle this diversity, we use a pre-trained BERT (Bidirectional Encoder Representations from Transformers) [7] model, denoted $\mathrm{BERT}_{\text{pre-trained}}$. The input text $T \in \mathbb{R}^{N_p}$ is tokenized and passed through the BERT model to generate the contextual embedding

$$\mathbf{T} = \mathrm{BERT}_{\text{pre-trained}}(T) \qquad (2)$$

Here, $\mathbf{T} \in \mathbb{R}^{N_p \times d_p}$ represents the sequence of token embedding vectors that capture the semantic meaning of the input text, where $N_p$ is the number of tokens and $d_p$ is the dimension of the embedding.

Adaptive Layer: An adaptive layer consisting of 1 transformer encoder layer refines the output $\mathbf{T}$ of the BERT encoder to better suit the CAD domain, aligning it with the specific vocabulary and structural requirements of CAD instructions. The adaptive layer outputs the embedding $T_{adapt} \in \mathbb{R}^{N_p \times d_p}$ using

$$T_{adapt} = \mathrm{AdaptiveLayer}(\mathbf{T}) \qquad (3)$$

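CAD Sequence Embedder: Each token in the input CAD subsequence $C_{1:t-1}$ is initially represented as a one-hot vector with a dimension of 256, resulting in a one-hot representation in $\mathbb{R}^{N_{t-1} \times 2 \times 256}$. For the sake of simplicity, we write this representation as $[C^x_{1:t-1}; C^y_{1:t-1}]$, where $C^x_{1:t-1}, C^y_{1:t-1} \in \mathbb{R}^{N_{t-1} \times 256}$. The initial CAD sequence embedding $F^0_{t-1} \in \mathbb{R}^{N_{t-1} \times d}$ is obtained using Eq. 4,

$$F^0_{t-1} = C^x_{1:t-1} W^x + C^y_{1:t-1} W^y + P \qquad (4)$$

where $W^x, W^y \in \mathbb{R}^{256 \times d}$ are learnable weights and $P \in \mathbb{R}^{N_{t-1} \times d}$ is the positional encoding.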
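Because each row of the one-hot matrices contains a single 1, the products in Eq. 4 amount to picking one row from each weight matrix and adding the positional term. The pure-Python sketch below checks this on two example tokens. The tiny embedding size, the random weights, the all-zero positional encoding and every function name are illustrative assumptions, not the authors' implementation.

```python
import random

NUM_CLASSES = 256  # 8-bit quantization gives 256 class labels per coordinate
D_MODEL = 4        # tiny embedding size, chosen only for illustration


def one_hot(index: int, size: int = NUM_CLASSES) -> list:
    """One-hot row vector with a 1.0 at the given class index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def row_times_matrix(row: list, matrix: list) -> list:
    """Multiply a length-256 row vector by a 256 x d matrix."""
    d = len(matrix[0])
    return [sum(r * matrix[k][j] for k, r in enumerate(row)) for j in range(d)]


def embed(tokens: list, w_x: list, w_y: list, pos: list) -> list:
    """Sum of x-embedding, y-embedding and positional term for each 2D token."""
    out = []
    for t, (x, y) in enumerate(tokens):
        ex = row_times_matrix(one_hot(x), w_x)
        ey = row_times_matrix(one_hot(y), w_y)
        out.append([a + b + p for a, b, p in zip(ex, ey, pos[t])])
    return out


rng = random.Random(0)
w_x = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(NUM_CLASSES)]
w_y = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(NUM_CLASSES)]

tokens = [(139, 139), (139, 44)]           # example quantized 2D tokens
pos = [[0.0] * D_MODEL for _ in tokens]    # placeholder positional encoding
f0 = embed(tokens, w_x, w_y, pos)
print(len(f0), len(f0[0]))
```

In practice this is why such embedders are usually implemented as table lookups rather than explicit one-hot multiplications.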

Layer-wise Cross Attention: We use a standard transformer decoder [48] with a layer-wise cross-attention mechanism between the CAD and the text embedding within the decoder blocks. The layer-wise cross-attention mechanism facilitates the integration of contextual text features with the CAD embedding, allowing the model to focus on relevant parts of the text during CAD construction.

Figure 3: Network architecture: Text2CAD Transformer takes as input a text prompt $T$ and a CAD subsequence $C_{1:t-1}$ of length $t-1$. The text embedding $T_{adapt}$ is extracted from $T$ using a pretrained BERT Encoder [7] followed by a trainable Adaptive layer. The resulting embedding $T_{adapt}$ and the CAD sequence embedding $F^0_{t-1}$ are passed through $L$ decoder blocks to generate the full CAD sequence in an auto-regressive way.

[Figure 3 panels: text prompts ranging from highly abstract, beginner-friendly instructions to more detailed, expert-level instructions (e.g., "The CAD model features an elegant, curved, hollow design inspired by the stylized letter 'O'."); Pre-trained BERT Encoder; Adaptive Layer; Downsampler; Positional Encoding; Input CAD Tokens; Transformer Decoder Block (MHA, Cross-Attention, FFN); MLP; Output CAD Tokens (Sketch Tokens, Extrusion Tokens); Reconstructed 3D CAD Model; legend: Trainable Parameters, Frozen Parameters.]

Each decoder block $l$ takes as input the CAD embedding $F^{l-1}_{t-1}$ and the text embedding $T_{adapt}$, where $F^{l-1}_{t-1}$ is the output of the previous decoder block (for the first decoder block, the input CAD embedding is $F^0_{t-1}$). At first, the CAD embedding $\tilde{F}^{l}_{t-1} \in \mathbb{R}^{N_{t-1} \times d}$ is generated from $F^{l-1}_{t-1}$ using

$$\tilde{F}^{l}_{t-1} = \mathrm{MHA}(F^{l-1}_{t-1})$$
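For intuition about how a CAD embedding can attend to a text embedding, the sketch below implements generic single-head scaled dot-product cross-attention in pure Python, with queries taken from the CAD tokens and keys and values taken from the text tokens. This is the standard mechanism from [48], not the paper's exact layer. It omits learned projections and multiple heads, and it assumes both embeddings already share the same dimension.

```python
import math


def softmax(scores: list) -> list:
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(cad: list, text: list) -> list:
    """Each CAD token (query) forms a weighted average of the text tokens (keys/values)."""
    d = len(cad[0])
    out = []
    for q in cad:
        # Scaled dot-product score between this CAD token and every text token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in text]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, text)) for j in range(d)])
    return out


# Toy embeddings: 2 CAD tokens and 3 text tokens, dimension 2.
cad_emb = [[1.0, 0.0], [0.0, 1.0]]
text_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = cross_attention(cad_emb, text_emb)
print(attended)
```

The output has one row per CAD token, so it can be added back to the CAD embedding and passed to the next sub-layer of a decoder block.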
