Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts
arXiv:2409.17106v1 [cs.CV] 25 Sep 2024
Mohammad Sadil Khan*†1,2,3, Sankalp Sinha*1,2,3, Talha Uddin Sheikh1,2,3, Didier Stricker1,2,3, Sk Aziz Ali1,4, Muhammad Zeshan Afzal1,2,3
1 DFKI, 2 RPTU Kaiserslautern-Landau, 3 MindGarage, 4 BITS Pilani, Hyderabad
Abstract
Prototyping complex computer-aided design (CAD) models in modern software can be very time-consuming. This is due to the lack of intelligent systems that can quickly generate simpler intermediate parts. We propose Text2CAD, the first AI framework for generating text-to-parametric CAD models using designer-friendly instructions for all skill levels. Furthermore, we introduce a data annotation pipeline for generating text prompts based on natural language instructions for the DeepCAD dataset using Mistral and LLaVA-NeXT. The dataset contains ~170K models and ~660K text annotations, from abstract CAD descriptions (e.g., generate two concentric cylinders) to detailed specifications (e.g., draw two circles with center (x, y) and radius r1, r2, and extrude along the normal by d...). Within the Text2CAD framework, we propose an end-to-end transformer-based auto-regressive network to generate parametric CAD models from input texts. We evaluate the performance of our model through a mixture of metrics, including visual quality, parametric precision, and geometrical accuracy. Our proposed framework shows great potential in AI-aided design applications. Project page is available at https://sadilkhan.github.io/text2cad-project/.
1 Introduction
Computer-Aided Design (CAD) plays a crucial role in industrial design and additive manufacturing (AM), revolutionizing the way products are prototyped [6]. This type of prototyping requires feature-based part modeling [6], precision measurements [39], and creative part editing [59] at different design stages [39, 59]. While CAD software saves the final model as a boundary representation (B-Rep) [21], the design process often involves a chain of 2D sketches (e.g., circles, lines, splines) and 3D operations (e.g., extrusion, loft, fillet) [55, 57, 58, 18]. This representation allows the designers to control the design history and iteratively refine the final models.
Despite their capabilities, modern CAD tools lack the AI-assisted design integration [36]. In Figure 1, we illustrate how an intelligent system capable of generating parametric CAD models from textual descriptions can be utilized to assemble a complex 3D model. Although tools like FreeCAD [1], SolidWorks [45], and Para-Solid [44] offer 3D CAD models from catalogs like McMaster-Carr [2] for the reuse of existing CAD models, no such system currently exists that can generate parametric CAD models from textual design descriptions. One primary challenge for developing such a system is defining suitable textual descriptions for parametric CAD generation, making it difficult to create deep learning methods that accurately convert these descriptions into precise CAD models.

[Figure 1 example prompts:
"A ring shape is created by drawing two concentric circles on the XY plane and scaled by 1 and 1.2 unit respectively, and extruding it along the Z-axis 0.1 unit to form a 3D model with a hollow center."
"A long rectangular shape."
"A simple rectangular base of length 0.05 unit, width 0.05 unit and height 1 unit"]
Figure 1: Designers can efficiently generate parametric CAD models from text prompts. The prompts can vary from abstract shape descriptions to detailed parametric instructions.

* Equal Contributions.
† Corresponding author (mohammad.khan@dfki.de)
Preprint. Under review.
To address this gap, in this paper we propose Text2CAD as the first AI framework for generating parametric CAD models represented by construction sequences (i.e., parameters for 2D sketches and extrusions) from design-related text prompts. We faced two primary challenges in fulfilling this goal: (1) the unavailability of the dataset and (2) a network to map the texts into CAD construction sequences. Towards this end, we introduce a data annotation pipeline to generate a dataset containing textual descriptions of the CAD models in the DeepCAD [55] dataset. We leverage the open-source Large Language Models (LLMs) [15] and Vision Language Models [26, 25] for this task. Our annotated text prompts are multi-level in nature, ranging from highly abstract (e.g., a long rectangular shape, a thin S-shaped object) to more specific with detailed parametric descriptions (e.g., first draw a rectangle from (x1, y1) and (x2, y2) then extrude the sketch along z-axis ..). These prompts are designed for users of all skill levels and can contain arithmetic logic and numerical expressions as part of the design details. Within this framework, we introduce Text2CAD Transformer [48], a conditional deep-generative network for generating CAD construction language‡ from text prompts in an auto-regressive fashion.
Currently, there are works on text-to-3D generation [24, 33, 50, 31, 11] that have shown significant advancements in creating 3D scenes and shapes from textual descriptions. But existing text-to-3D methods are not applicable for generating CAD models from text descriptions, as the final output of these models is neither parametric nor human-editable in nature. Very recently, a web API from zoo developers [3] has introduced a CAD generation app using text prompts from users and a programmable scripting language (KittyCAD Language§) for designers to edit and modify. However, the generated CAD models are obtained in the form of a solid body and are not decomposed into intermediate sketch-and-extrusion steps as proposed in our Text2CAD. On the other hand, given raw numerical data of any parametric CAD model, current state-of-the-art large language models (LLMs), such as pre-trained Mistral-50B [15] or GPT-4 [34] and open source Llama [46], may only derive procedural scripting codes for other APIs, such as FreeCAD [37] or OpenSCAD [29], to generate a model. However, in contrast to our Text2CAD, such an LLM-augmented CAD generation approach will not be designer-friendly, will not be suitable for beginner-level designers, will not automate the development process in easy ways, and will restrict the re-usability of the scripts in case of complex shapes. Alternatively, using state-of-the-art vision language models, such as LLaVa [26, 25] and GPT-4V [60], for deducing CAD construction sequences performs poorly for two main reasons: (1) no training datasets are available that provide natural language-based design instructions as annotations for raw CAD construction sequences and (2) most VLMs are trained on categorical description/caption datasets of 3D objects (e.g., LLaVA-NeXT [25] predicts 'two concentric hollow cylinders' as toilet paper). We remove the above limitations in our Text2CAD by creating new large-scale annotations for the DeepCAD [55] dataset using responses from LLMs and VLMs to train our multi-modal model. Our contributions can be summarized as follows:
• We propose Text2CAD as the first AI framework for generating parametric 3D CAD models using textual descriptions.
• We introduce a data annotation pipeline that leverages both LLMs and VLMs to generate a dataset that contains text prompts with varying level of complexities and parametric details.
• We propose an end-to-end transformer-based autoregressive architecture for generating CAD design history from input text prompts.
• Our experimental analysis demonstrates superior performance over the two-stage baseline method adapted for the task at hand.
The rest of the sections are organized as follows: Section 2 reviews the related work in CAD domains. Section 3 outlines our data annotation pipeline. Section 4 details our proposed Text2CAD transformer architecture. Section 5 presents our experimental results. Section 6 discusses the limitations of our current framework, and Section 7 concludes the paper.
‡ In this paper, the phrases 'CAD construction language', 'CAD design history' and 'CAD construction sequence' are used interchangeably.
§ /KittyCAD/modeling-app/tree/main?tab=readme-ov-file
2 Related Work
Datasets and Generative models for CAD: Current datasets and generative models for CAD are limited and often not suited for developing knowledge-based CAD applications. Some datasets focus solely on 2D sketch design [42, 9, 43], and other popular datasets like ABC [19], Fusion360 Gallery [54], Thingi10K [61], and CC3D [5, 8] provide 3D meshes, BRep (boundary representation), and other geometry or topology related annotations that are suitable for 3D modeling. DeepCAD [55] dataset, a subset of ABC, and Fusion360 [54] provide CAD construction sequences in the form of sketch and extrusion to deduce design history. However, CAD models may consist of numerous other types of operations beside extrusion, and such construction sequences with other CAD operations are not available in the current datasets. Finally, there is no dataset available that provides textual design descriptions as annotations to create a conversational AI system for CAD modeling.
Current supervised learning methods that fall under sequence-to-sequence Sketch/CAD language modeling [55, 57, 18, 9] filter out unnecessary metadata from lengthy raw design files and represent them as a desired sequence of input/output tokens. For instance, Ganin et al. [9] represent design files as messages in Protocol Buffer [47] format. The Hierarchical Neural Coding (HNC) method [57] represents the desired design sequence in a tree structure of 2D sketch loops, 2D bounding boxes over all loops as profile, and 3D bounding boxes over all profiles as solid. CAD-SIGNet [18] represents CAD construction language as a sequence composed of 2D sketch and extrusion parameters. In the Text2CAD method, we map the raw design history obtained from DeepCAD metadata into textual descriptions.
CAD Construction Language using Transformers: Transformer-based [48] network architecture is the preferred choice for many deep learning-based applications related to CAD modeling [55], 3D scan-to-CAD reverse engineering [18, 22], representation learning [17] and others [38]. CAD as a language [9] describes how 2D sketches can be transformed into design language by sequencing tokens of 2D parametric curves as message passing units. A mixture of Transformer [48] and Pointer Networks [49] decodes the sketch parameters in auto-regressive fashion.
Formalizing constrained 2D sketches, i.e., collections of curves (e.g., line, arc, circle and splines) with dimensional and geometric constraints (e.g., co-incidence, perpendicular, co-linearity), as a language for CAD modeling has been studied over the last few years [35, 9, 32, 53, 23]. However, the first proposal of developing a CAD language interface was suggested decades ago in [40]. Among the recent works in this direction, SketchGen [35] represents 2D sketches as a sequence of the tokens for curves and constraints. The decoder-only transformer model in [35] predicts optimal sketches through nucleus sampling [12] of token embedding vectors, focusing on replicating drawing processes of CAD designers. The Polygen [32] method also employs a Transformer model [48] to generate detailed 3D polygonal meshes by learning a joint distribution on vertices and faces of a CAD model. As an extension of [32], TurtleGen [53] also proposes a decoder-only transformer model to learn a joint distribution of vertices and edges that together form sketches and are represented as graphs in CAD models.
3D CAD modeling steps as a language is not directly formulated by any state-of-the-art multi-modal CAD learning methods [55, 28, 18, 58, 8, 30, 57, 23]. Khan et al. [18] propose a novel auto-regressive generation of sketch-and-extrusion parameters directly from 3D point clouds as input, whereas DeepCAD [55], SkexGen [58], HNC [57] and MultiCAD [28] adopt a two-stage strategy to generate the output. MultiCAD [28] adopts multi-modal contrastive learning to associate geometry features with features of CAD construction sequences, whereas CAD-SIGNet [18] requires an extra step as user feedback to vote for one of the many generated sketches at the current step to predict the next one. Unlike previous approaches, our proposed Text2CAD transformer is the first auto-regressive network that generates CAD construction sequences directly from textual descriptions.
3 Text2CAD Data Annotation
The diagram in Fig. 2 outlines the process of generating textual annotations for DeepCAD dataset [55] using Large Language Models (LLMs) [16, 34, 46] and Vision Language Models (VLMs) [26, 25]. These annotations describe the corresponding CAD construction workflow in human interpretable format. To enrich the DeepCAD [55] dataset with textual annotations, we implement a two-stage description generation pipeline using the capabilities of both LLMs and VLMs. The two stages are: (1) generating abstract shape descriptions using VLM, and (2) extracting multi-level textual instructions from LLM based on the shape descriptions and design details provided in the dataset. An example text prompt for the CAD model shown in top-left of the Figure 2: 'The CAD model consists of a cylindrical object with a flat top and bottom connected by a curved surface and slightly tapered towards the bottom. This object is created by first setting up a coordinate system, then sketching two concentric circles and drawing a closed loop with lines and an arc on a shared plane. The sketch is then extruded along the normal direction to form a solid body. The resulting part has a height of approximately 0.0487 units'. In this example, the phrase in the violet color is generated by a VLM. An LLM uses this description along with the CAD construction information to generate the prompt.

[Figure 2 panels: DeepCAD Dataset; 3D CAD Model; Raw Json; Multi-View Images (MVI) Extractor; Minimal Metadata Generator; Minimal Json; VLM Prompt; LLaVaNext + Mistral-7B; Natural Language Instruction (NLI) Generation Prompt; Mistral-50B (MoE); NLI Response; K-Shot Multi-Level Natural Language Instruction (NLI) Generation Prompt; outputs: Abstract Level (Level-0), Beginner Level (Level-1), Intermediate Level (Level-2) and Expert Level (Level-3) CAD Instructions.]

VLM Prompt
    [INST] This is an image of a Computer Aided Design (CAD) model. You are a senior CAD engineer who knows the object name, where and how the CAD model is used. Give an accurate natural language description about the CAD model to a junior CAD designer who can design it from your simple description. Wrap the description in the following tags <OBJECT> and </OBJECT>.
    Following are some bad examples:
    1. CAD model
    2. Metal object
    Abide by the following rules. Rules:
    1. Do not use words like - "blue", "shadow", "transparent", "metal", "plastic", "image", "black", "grey", "CAD model", "abstract", "orange", "purple", "golden", "green"
    2. [/INST]

NLI Prompt
    [INST]
    You are a senior CAD engineer and you are tasked to provide natural language instructions to a junior CAD designer for generating a parametric CAD model.
    Overview information about the CAD assembly JSON:
    1. The CAD assembly json lists the process of constructing a CAD model.
    2. Every CAD model consists of one or multiple intermediate CAD parts.
    3. These intermediate CAD parts are listed in the "parts" key of the CAD assembly JSON.
    4. The first intermediate CAD part is the base part and the subsequent parts build upon the previously constructed parts using the operation defined for that part.
    5. All intermediate parts combine to a final cad model.
    Every intermediate CAD part is generated using the following steps:
    Step 1: Draw a 2D sketch.
    Step 2: Scale the 2D sketch using the sketch_scale scaling parameter.
    Step 3: Transform the scaled 2D sketch into 3D Sketch using the euler angles and translation.
    Step 4: Extrude the 2D sketch to generate the 3D model. [/INST]

Minimal JSON
    "final_shape": "A cylindrical object with a flat top and bottom"
    "parts": {"part_1": {
        "coordinate_system": {
            "Euler Angles": [0.0, 0.0, 0.0],
            "Translation Vector": [0.1071, 0.1071, 0.0974]},
        "sketch": {
            "face_1": {
                "loop_1": {
                    "circle_1": {
                        "Center": [0.112, 0.112],
                        "Radius": 0.112} ...}, ...}
        "extrusion": {
            "extrude_depth_towards_normal": 0.0,
            "extrude_depth_opposite_normal": 0.0487,
            "sketch_scale": 0.6429,
            "operation": "NewBodyFeatureOperation"}, ...}

Figure 2: Text2CAD Data Annotation Pipeline: Our data annotation pipeline generates multi-level text prompts describing the construction workflow of a CAD model with varying complexities. We use a two-stage method: (Stage 1) Shape description generation using VLM, (Stage 2) Multi-Level textual annotation generation using LLM.
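The four construction steps named in the NLI prompt (draw, scale, transform, extrude) can be illustrated with a small sketch. This is not the authors' code: the helper names, the sampled-circle representation, the Rz·Ry·Rx Euler convention in degrees, and the choice to use extrusion depths unscaled are all assumptions made for illustration. Only the numeric values are taken from the Minimal JSON example in Figure 2.

```python
import math

def circle_points(center, radius, n=8):
    """Step 1: sample n points on a 2D circle (the sketch primitive)."""
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def scale_sketch(points, sketch_scale):
    """Step 2: scale the 2D sketch by the sketch_scale parameter."""
    return [(x * sketch_scale, y * sketch_scale) for x, y in points]

def rotation_matrix(euler_deg):
    """R = Rz @ Ry @ Rx from Euler angles in degrees (assumed convention)."""
    ax, ay, az = (math.radians(a) for a in euler_deg)
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def to_3d(points, rot, translation):
    """Step 3: rotate the sketch plane (z = 0) and translate it into 3D."""
    placed = []
    for x, y in points:
        p = (x, y, 0.0)
        placed.append(tuple(sum(rot[i][j] * p[j] for j in range(3)) + translation[i]
                            for i in range(3)))
    return placed

def extrude(points3d, normal, towards, opposite):
    """Step 4: sweep the sketch along its normal; returns bottom and top rings."""
    bottom = [tuple(p[i] - opposite * normal[i] for i in range(3)) for p in points3d]
    top = [tuple(p[i] + towards * normal[i] for i in range(3)) for p in points3d]
    return bottom, top

# Values from the Minimal JSON example (part_1).
rot = rotation_matrix([0.0, 0.0, 0.0])
normal = [rot[i][2] for i in range(3)]  # sketch-plane normal = rotated z-axis
sketch = circle_points((0.112, 0.112), 0.112)
scaled = scale_sketch(sketch, 0.6429)
placed = to_3d(scaled, rot, [0.1071, 0.1071, 0.0974])
bottom, top = extrude(placed, normal, towards=0.0, opposite=0.0487)
height = top[0][2] - bottom[0][2]
```

Under these assumptions the computed `height` agrees with the "approximately 0.0487 units" quoted in the example text prompt.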
Shape Descriptions using VLM: The initial step of our annotation generation pipeline involves generating abstract object-level descriptions of the CAD models using the LLaVA-NeXT [25] model. The objective in this step is to accurately capture the structural descriptions of the 3D shape, such as "a ring-like structure", "a cylinder", or "a hexagon with a cylinder on top". We generate shape descriptions for both the final CAD model and its intermediate parts. We first produce multi-view images from predetermined camera angles for each individual part and the final CAD model. These images are then utilized in a predefined prompt (refer to the top-right of Figure 2) for the LLaVA-NeXT [25] model to generate simplified shape descriptions of all individual parts as well as the complete final shape.
Multi-level Design Instructions using LLM: In this stage, multiple textual annotations corresponding to different design details of a CAD model are generated using Mixtral-50B [16] through a series of steps (refer to the middle-column in Figure 2). The DeepCAD [55] dataset contains CAD construction sequences in JSON format. We first preprocess the raw CAD construction sequences using a 'Minimal Metadata Generator' which replaces random, meaningless keys with more meaningful terms (e.g., "part_1", "loop_1"). This step aims to reduce the hallucinations [13] by Mixtral-50B [16]. The minimal metadata is further augmented with the shape descriptions for each part and the final model generated by the VLM. The output of this process is a condensed representation of the shapes and their relational attributes within the CAD design (see bottom-left in Figure 2). With the minimal metadata at hand, we then craft a prompt (refer to the bottom-right in Figure 2) to generate detailed natural language instructions (NLI), ensuring a minimal loss of information from the minimal metadata. Afterward, the NLI responses are refined by the LLM using a k-shot [4] "Multi-Level Natural Language Instruction Generation Prompt" to generate multi-level instructions of different specificity and details. We categorize these levels as:
• Abstract level (L0): Abstract Shape Descriptions of the final CAD model extracted using VLM in the first stage.
• Beginner level (L1): Simplified Description - Aimed at laypersons or preliminary design stages, this level provides a simplified account of the design steps, eschewing complex measurements and jargon.
• Intermediate level (L2): Generalized Geometric Description - This level abstracts some of the details, providing a generalized description that balances comprehensibility with technical accuracy.
• Expert level (L3): Detailed Geometric Description with Relative Values - Here, the instructions include precise geometric descriptions and relative measurements, catering to users who require an in-depth understanding or are performing the CAD modeling task.
Our annotations consist of the generated multi-level instructions at the final stage. We generate these annotations over the course of 10 days. It is worth noting that one can directly generate the multi-level instructions from the minimal metadata without creating the detailed natural language instructions in the second stage. We observe that this strategy increases the LLM's tendency for hallucinations [13] and generates more inaccurate multi-level instructions. Instead, our method follows the chain-of-thought prompting strategy outlined in [51], which greatly reduces this bottleneck. More details on our annotation pipeline are provided in Section 10 and 11 of the supplementary material.
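The key-renaming idea behind the 'Minimal Metadata Generator' can be sketched as follows. The raw layout used here (opaque ids nested under hypothetical "faces" and "loops" keys) is an assumption for illustration and is not the actual DeepCAD JSON schema. The function name `rename_keys` is likewise hypothetical. The point is only that opaque identifiers are replaced by ordered, readable names before the metadata is paired with the VLM shape description.

```python
def rename_keys(raw_parts):
    """Replace opaque ids with ordered names such as part_1 / face_1 / loop_1."""
    minimal = {}
    for p, part in enumerate(raw_parts.values(), start=1):
        faces = {}
        for f, face in enumerate(part["faces"].values(), start=1):
            # Loops keep their geometric content; only their keys change.
            faces[f"face_{f}"] = {
                f"loop_{l}": loop
                for l, loop in enumerate(face["loops"].values(), start=1)
            }
        minimal[f"part_{p}"] = {"sketch": faces, "extrusion": part["extrusion"]}
    return minimal

# Toy raw record with meaningless keys (hypothetical structure).
raw = {
    "FxA91k": {
        "faces": {
            "Jq02": {
                "loops": {
                    "Zp7": {"circle_1": {"Center": [0.112, 0.112], "Radius": 0.112}}
                }
            }
        },
        "extrusion": {"sketch_scale": 0.6429},
    }
}

minimal = rename_keys(raw)

# Augment the minimal metadata with the shape description produced by the VLM.
annotated = {
    "final_shape": "A cylindrical object with a flat top and bottom",
    "parts": minimal,
}
```

The resulting `annotated` dictionary has the same overall shape as the Minimal JSON shown in Figure 2, which is what the LLM prompt consumes.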
4 Text2CAD Transformer
The Text2CAD transformer architecture, as shown in Figure 3, is designed to transform natural language descriptions into 3D CAD models by deducing all its intermediate design steps autoregressively. Given an input text prompt $T \in \mathbb{R}^{N_p}$, where $N_p$ is the number of words in the text, our model learns the probability distribution $P(C \mid T)$ defined as

$$P(C \mid T) = \prod_{t=1}^{N_c} P(c_t \mid c_{1:t-1}, T; \theta) \qquad (1)$$

where $C$ is the output CAD sequence, $N_c$ is the number of tokens in $C$ and $\theta$ is the learnable model parameter. We represent $C$ as a sequence of sketch and extrusion tokens as proposed in [18]. Each token $c_t \in C$ is a 2D token that denotes either (1) a 2D-coordinate of the primitives in sketch, (2) one of the extrusion parameters (euler angles / translation vector / extrusion distances / boolean operation / sketch scale) or (3) one of the end tokens (curve / loop / face / sketch / extrusion / start sequence / end sequence). Following [55, 18], we quantize the 2D coordinates as well as the continuous extrusion parameters in 8 bits, resulting in 256 class labels for each token. An example CAD sequence representation is provided in Figure 3 (in blue table). For more details, please refer to the supplementary section 9.
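As a minimal sketch of the 8-bit quantization mentioned above, the following maps a continuous parameter to one of 256 class labels and back. The uniform binning, the normalization range [0, 1] and the function names are assumptions for illustration. The actual token layout in [55, 18] may differ, for example by reserving some label values for end tokens.

```python
def quantize(value, lo, hi, n_bits=8):
    """Map a continuous value in [lo, hi] to an integer label in [0, 2**n_bits - 1]."""
    levels = 2 ** n_bits
    value = min(max(value, lo), hi)  # clamp out-of-range inputs
    return round((value - lo) / (hi - lo) * (levels - 1))

def dequantize(label, lo, hi, n_bits=8):
    """Map an integer label back to the centre value it represents."""
    levels = 2 ** n_bits
    return lo + label / (levels - 1) * (hi - lo)

# Example: the circle radius from the Minimal JSON, assuming values normalized to [0, 1].
label = quantize(0.112, 0.0, 1.0)
recovered = dequantize(label, 0.0, 1.0)
max_error = 0.5 / 255  # worst-case rounding error for this range
```

With uniform bins over a unit range, the round-trip error is bounded by half a bin width, so a coarser range or more bits trades off directly against parametric precision.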
Now we elaborate on the various components of the architecture, detailing the processes involved in converting text to CAD representations. Let the input text prompt at timestep $t-1$ be $T \in \mathbb{R}^{N_p}$ and the input CAD subsequence be $C_{1:t-1} \in \mathbb{R}^{N_{t-1} \times 2}$.

Pretrained Bert Encoder: The initial step in the Text2CAD network involves encoding the textual description provided by the user. This description can range from highly abstract, beginner-friendly instructions to detailed, expert-level commands. To handle this diversity, we use a pre-trained BERT (Bidirectional Encoder Representations from Transformers) [7] model, denoted $\text{BERT}_{\text{pre-trained}}$. The input text $T \in \mathbb{R}^{N_p}$ is tokenized and passed through the BERT model to generate the contextual embedding:

$$\mathbf{T} = \text{BERT}_{\text{pre-trained}}(T) \qquad (2)$$

Here, $\mathbf{T} \in \mathbb{R}^{N_p \times d_p}$ represents the sequence of token embedding vectors that capture the semantic meaning of the input text, where $N_p$ is the number of tokens and $d_p$ is the dimension of the embedding.

Adaptive Layer. An adaptive layer consisting of 1 transformer encoder layer refines the output $\mathbf{T}$ of the BERT encoder to better suit the CAD domain, aligning with the specific vocabulary and structural requirements of CAD instructions. The adaptive layer outputs the embedding $T_{adapt} \in \mathbb{R}^{N_p \times d_p}$ using

$$T_{adapt} = \text{AdaptiveLayer}(\mathbf{T}) \qquad (3)$$
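CAD Sequence Embedder: Each token in the input CAD subsequence $C_{1:t-1}$ is initially represented as a one-hot vector with a dimension of 256, resulting in a one-hot representation $C^{o}_{1:t-1} \in \mathbb{R}^{N_{t-1} \times 2 \times 256}$. For the sake of simplicity, we represent $C^{o}_{1:t-1} = [C^{x}_{1:t-1}; C^{y}_{1:t-1}]$, where $C^{x}_{1:t-1}, C^{y}_{1:t-1} \in \mathbb{R}^{N_{t-1} \times 256}$. The initial CAD sequence embedding $F^{0}_{t-1} \in \mathbb{R}^{N_{t-1} \times d}$ is obtained using Eq. 4

$$F^{0}_{t-1} = C^{x}_{1:t-1} W^{x} + C^{y}_{1:t-1} W^{y} + P \qquad (4)$$

where $W^{x}, W^{y} \in \mathbb{R}^{256 \times d}$ are learnable weights and $P \in \mathbb{R}^{N_{t-1} \times d}$ is the positional encoding.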
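To make Eq. 4 concrete, the sketch below embeds a short sequence of 2D tokens by multiplying one-hot vectors with two weight matrices and adding a positional encoding. Several things here are assumptions for illustration: the weights are random rather than learned, they are taken to have shape 256 × d so that the matrix products are well defined, the embedding width d = 4 is arbitrary, the sinusoidal form of the positional encoding is one common choice that the text does not specify, and the token values are merely borrowed from the Figure 3 table.

```python
import math
import random

VOCAB, D = 256, 4  # 256 class labels per coordinate; D is a toy embedding width

def one_hot(index, size=VOCAB):
    """One-hot row vector of length `size` with a 1 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def row_times_matrix(row, weight):
    """Row vector (length VOCAB) times weight matrix (VOCAB x D)."""
    return [sum(row[i] * weight[i][j] for i in range(len(row)))
            for j in range(len(weight[0]))]

def positional_encoding(n, d):
    """Sinusoidal positional encoding of shape (n, d); an assumed choice for P."""
    return [[math.sin(pos / 10000 ** (j / d)) if j % 2 == 0
             else math.cos(pos / 10000 ** ((j - 1) / d))
             for j in range(d)] for pos in range(n)]

def embed(tokens, wx, wy):
    """Eq. 4: F = C^x W^x + C^y W^y + P for a list of (x, y) tokens."""
    pe = positional_encoding(len(tokens), len(wx[0]))
    out = []
    for pos, (x, y) in enumerate(tokens):
        ex = row_times_matrix(one_hot(x), wx)
        ey = row_times_matrix(one_hot(y), wy)
        out.append([ex[j] + ey[j] + pe[pos][j] for j in range(len(ex))])
    return out

rng = random.Random(0)
wx = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(VOCAB)]
wy = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(VOCAB)]

tokens = [(139, 139), (139, 44), (5, 0)]
emb = embed(tokens, wx, wy)
pe = positional_encoding(len(tokens), D)
```

Because each input row is one-hot, the matrix product simply selects one row of the weight matrix, which is why such an embedder is typically implemented as a table lookup rather than an explicit multiplication.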
Layer-wise Cross Attention. We use a standard transformer decoder [48] with layer-wise cross-attention mechanism between the CAD and the text embedding within the decoder blocks. The layerwise cross-attention mechanism facilitates the integration of contextual text features with the CAD embedding, allowing the model to focus on relevant parts of the text during CAD construction. Each decoder block $l$ takes as input CAD embedding $F^{l-1}_{t-1}$ and text embedding $T_{adapt}$, where $F^{l-1}_{t-1}$ is the output of the previous decoder block (for the first decoder block, the input CAD embedding is $F^{0}_{t-1}$). At first, the CAD embedding $F^{l'}_{t-1} \in \mathbb{R}^{N_{t-1} \times d}$ is generated from $F^{l-1}_{t-1}$ using

$$F^{l'}_{t-1} = \text{MHA}(F^{l-1}_{t-1})$$

[Figure 3 panels: text prompt T at several levels (Highly Abstract; Beginner CAD Designer Friendly Instructions; More Detailed; For Experts Level Instructions), e.g. "The CAD model features an elegant, curved, hollow design inspired by the stylized letter 'O'."; Pre-trained BeRT Encoder; Adaptive Layer; Input CAD Tokens $C_{1:t-1}$ (Sketch Tokens, Extrusion Tokens); Positional Encoding; Transformer Decoder Block (MHA, Downsampler, Cross-Attention, FFN); MLP; Output CAD Tokens $C_{2:t}$; Reconstructed 3D CAD Model. Legend: Trainable Parameters, Frozen Parameters.]
Figure 3: Network architecture: Text2CAD Transformer takes as input a text prompt $T$ and a CAD subsequence $C_{1:t-1}$ of length $t-1$. The text embedding $T_{adapt}$ is extracted from $T$ using a pretrained BeRT Encoder ([7]) followed by a trainable Adaptive layer. The resulting embedding $T_{adapt}$ and the CAD sequence embedding $F^{0}_{t-1}$ are passed through L decoder blocks to generate the full CAD sequence in auto-regressive way.
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
- 6. 下載文件中如有侵權(quán)或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 2024年度演出合同電影主演與制片方之間的表演委托合同
- 2024標準短期汽車租賃合同模板版B版
- 2024排水溝施工與河道整治綜合服務(wù)合同3篇
- 2024甲方提供技術(shù)支持乙方進行市場拓展的合同
- 2025年智能家居中央空調(diào)設(shè)備銷售與系統(tǒng)集成合同3篇
- 2024年租賃物業(yè)管理合同3篇
- 2024武漢住宅租賃合同(含租賃雙方爭議解決機制)3篇
- 2024年直飲水系統(tǒng)安裝工程項目協(xié)議版B版
- 2024水產(chǎn)養(yǎng)殖綠色生態(tài)養(yǎng)殖技術(shù)合作協(xié)議3篇
- 安全知識培訓(xùn)課件制作
- 超級充電綜合站及配套設(shè)施建設(shè)項目可行性研究報告
- 2023年核心素養(yǎng)下的初中歷史教學(xué)有效性策略
- 眼科學(xué) 眼外傷(課件)
- 索具螺旋扣規(guī)格花籃螺絲
- GB/T 9364.4-2016小型熔斷器第4部分:通用模件熔斷體(UMF)穿孔式和表面貼裝式
- GB/T 21709.1-2008針灸技術(shù)操作規(guī)范第1部分:艾灸
- GB/T 16288-2008塑料制品的標志
- 住院醫(yī)師規(guī)范化培訓(xùn)臨床實踐能力結(jié)業(yè)考核專科技能操作評分表(耳鼻咽喉科)氣管切開術(shù)
- DBJ-T 13-195-2022 燒結(jié)煤矸石實心磚和多孔磚(砌塊) 應(yīng)用技術(shù)標準
- 意大利FM筋膜手法治療量表
- 靜態(tài)爆破專項施工方案
評論
0/150
提交評論