
A KINECT-Based Point Cloud Acquisition System

1. Purpose and Significance of the Research
As 3D laser scanning hardware and software have matured and become affordable, 3D point cloud acquisition has become an increasingly active research topic, widely used in computer animation, medical image processing, cultural heritage preservation, terrain surveying, game development, and digital media production. Point cloud acquisition is a current focus of data-acquisition research because it offers advantages that conventional surveying techniques lack: it is fast, penetrating, non-contact, real-time, dynamic, active, fully digital, high-density, and highly efficient. It is therefore applied in many fields and has broad prospects. In recent years 3D laser scanners have improved greatly in efficiency, accuracy, and price, and 3D point cloud acquisition has accordingly become a research priority. In practice, because light travels in straight lines, a scanner viewing an object of complex shape from a single position has blind spots on some regions or on the back of the object, so a complete model can only be captured from several viewpoints; and because the measurement range of a scanner is limited, large objects or large scenes cannot be captured in one pass and must be measured block by block. The result is therefore typically several noisy point clouds expressed in different coordinate systems, which cannot fully satisfy requirements for model fidelity and real-time use. Research on 3D point cloud acquisition is consequently of real significance.

2. Research Status at Home and Abroad
In China, government and research institutions have begun to pay close attention to big data; the Ministry of Industry and Information Technology's Twelfth Five-Year Plan for the Internet of Things lists information processing as one of four key technology innovation projects. Many uncertain factors introduce noise during 3D data acquisition, and its influence must be reduced as far as possible before the data is used; some researchers address this by filtering. Chen Xiaoxia et al. screen data points in two steps, first applying density-based clustering in pre-processing and then presenting the result through the real-time interactive display capability of VRML. Zhang Yi et al. remove noise from point clouds using k-neighbourhoods combined with kernel functions. In 3D point cloud registration, the most widely used algorithm is ICP. Since Besl and McKay proposed it in 1992, ICP has been applied extensively to automatic registration, but the classical algorithm is not efficient, depends strongly on the initial alignment, and easily falls into local minima, so many researchers have improved it: Hans Martin Kjer et al. use curvature-based sampling to speed up registration; Sun Qian et al. weight correspondences by the inner product of normal vectors, although the manually chosen weights affect the final accuracy and efficiency; He Yongxing et al. propose a registration method based on neighbourhood features; and Jiang Chengcheng et al. improve ICP with Delaunay triangulation.
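As a concrete point of reference for the registration work surveyed above, the following minimal sketch registers two partially overlapping scans with PCL's classical ICP implementation (pcl::IterativeClosestPoint). The file names and parameter values are illustrative assumptions, not settings taken from the cited papers.

```cpp
// Minimal pairwise ICP registration sketch using PCL (illustrative only).
// File names and parameter values are assumptions for demonstration.
#include <iostream>
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/registration/icp.h>

int main()
{
    pcl::PointCloud<pcl::PointXYZ>::Ptr source(new pcl::PointCloud<pcl::PointXYZ>);
    pcl::PointCloud<pcl::PointXYZ>::Ptr target(new pcl::PointCloud<pcl::PointXYZ>);

    // Two partially overlapping scans taken from different viewpoints (hypothetical files).
    if (pcl::io::loadPCDFile<pcl::PointXYZ>("scan_view1.pcd", *source) < 0 ||
        pcl::io::loadPCDFile<pcl::PointXYZ>("scan_view2.pcd", *target) < 0)
    {
        std::cerr << "Could not read input scans." << std::endl;
        return -1;
    }

    pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
    icp.setInputSource(source);
    icp.setInputTarget(target);
    icp.setMaxCorrespondenceDistance(0.05);   // ignore point pairs farther apart than 5 cm
    icp.setMaximumIterations(50);             // classical ICP still needs a decent initial pose
    icp.setTransformationEpsilon(1e-8);

    pcl::PointCloud<pcl::PointXYZ> aligned;
    icp.align(aligned);                       // aligned = source brought into the target frame

    if (icp.hasConverged())
        std::cout << "ICP fitness score: " << icp.getFitnessScore() << "\n"
                  << icp.getFinalTransformation() << std::endl;
    else
        std::cout << "ICP did not converge (poor initial alignment?)" << std::endl;
    return 0;
}
```

The improved variants cited above mainly change how correspondences are sampled or weighted before this iteration loop; the call structure itself stays essentially the same.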

To make the complexity of the data model fit the limited resources of a computer, the model must be simplified. Methods for simplifying 3D point clouds fall into two broad classes: those that partition the data topologically and simplify it using the topological relations, and those that select representative points according to feature information. Curvature-based reduction is a typical representative-point method. Abroad, Martin et al. proposed the uniform-grid method in 1997, but its limitation is that it often deletes feature data by mistake; Chen et al. simplify according to normal vectors, but the method places fairly restrictive requirements on the point cloud; the improvement by Lee et al. preserves surface features to some extent but is computationally expensive. In China, Zhang Liyan et al. also use normal vectors to simplify point clouds; the method preserves object features well, but because some of its parameters are chosen from experience it is not very practical to operate. Zhu Maomao et al. improve the result with a second simplification pass, which yields a more reasonable reduction; the clustering-based simplification of Shi Baoquan et al. preserves point cloud features well but increases the amount of computation; and the hybrid algorithm of Du Xiaohui et al. achieves high simplification efficiency, although its running time suffers slightly.
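For orientation, the sketch below shows grid-style reduction in its simplest PCL form, a voxel-grid downsample; the input file name and the 5 mm leaf size are assumptions for illustration only.

```cpp
// Minimal point cloud simplification sketch: voxel-grid downsampling with PCL.
// The input file name and the 5 mm leaf size are illustrative assumptions.
#include <iostream>
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/filters/voxel_grid.h>

int main()
{
    pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
    pcl::PointCloud<pcl::PointXYZ>::Ptr simplified(new pcl::PointCloud<pcl::PointXYZ>);

    if (pcl::io::loadPCDFile<pcl::PointXYZ>("dense_scan.pcd", *cloud) < 0)
        return -1;

    // Partition space into cubic cells and keep one centroid point per occupied cell.
    pcl::VoxelGrid<pcl::PointXYZ> grid;
    grid.setInputCloud(cloud);
    grid.setLeafSize(0.005f, 0.005f, 0.005f);   // 5 mm cells
    grid.filter(*simplified);

    std::cout << cloud->size() << " points reduced to "
              << simplified->size() << std::endl;
    return 0;
}
```

Like the uniform-grid method discussed above, a plain voxel grid can discard sharp features, which is exactly why the curvature-, normal-, and clustering-based refinements cited here exist.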

3. Proposed Research Approach
(1) Point cloud data acquisition. This thesis proposes a Kinect-centred design for a point cloud acquisition system. Using the generic grabber interface provided by PCL (Point Cloud Library), a modular C++ template library, the system obtains 3D information in real-world coordinates directly and stores it as point cloud data, which raises the acquisition speed. In this design, one computer on the local network is responsible for Kinect point cloud acquisition: exploiting the Kinect's depth-imaging principle, it converts the captured depth-image information into the 3D coordinates of points in real space, and it uses the OpenNI (Open Natural Interaction) framework to grab the point cloud data from the Kinect device; a minimal back-projection sketch follows below.
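The depth-to-coordinate conversion mentioned in step (1) is essentially a pinhole back-projection. The sketch below fills a PCL cloud from one raw Kinect v1 depth frame; the 640 x 480 size matches the sensor's depth resolution, while the intrinsics fx, fy, cx, cy are nominal Kinect v1 values assumed here and would normally come from calibration.

```cpp
// Back-project a Kinect v1 depth frame (millimetres) into a PCL point cloud.
// The intrinsics below are nominal, assumed values; replace them with a real calibration.
#include <cstddef>
#include <cstdint>
#include <vector>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

pcl::PointCloud<pcl::PointXYZ>::Ptr depthToCloud(const std::vector<uint16_t>& depth_mm,
                                                 int width = 640, int height = 480)
{
    const float fx = 585.6f, fy = 585.6f;      // focal lengths in pixels (assumed)
    const float cx = 319.5f, cy = 239.5f;      // principal point (assumed)

    pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
    cloud->reserve(static_cast<std::size_t>(width) * height);

    // depth_mm is assumed to hold width*height row-major depth samples in millimetres.
    for (int v = 0; v < height; ++v)
        for (int u = 0; u < width; ++u)
        {
            uint16_t d = depth_mm[static_cast<std::size_t>(v) * width + u];
            if (d == 0) continue;              // 0 marks "no depth reading"
            float z = d * 0.001f;              // millimetres -> metres
            pcl::PointXYZ p;
            p.x = (u - cx) * z / fx;           // pinhole back-projection
            p.y = (v - cy) * z / fy;
            p.z = z;
            cloud->push_back(p);
        }
    return cloud;
}
```

In the actual system the same cloud can be delivered directly by PCL's OpenNI grabber interface referred to above; the offline version here only makes the geometry explicit.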

(2) Point cloud denoising. Because of human disturbance, illumination, and defects of the scanning device itself, the acquired data is contaminated by noise and must be denoised. Depending on how the noise spreads in different directions, either isotropic or anisotropic algorithms can be applied; a simple isotropic example is sketched below.
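The proposal leaves the concrete filter open. As one common, direction-independent (isotropic) choice available in PCL, the following sketch removes outliers with a statistical neighbourhood test; the file names, neighbourhood size, and threshold are assumed starting values, not results from this project.

```cpp
// Simple denoising sketch: statistical outlier removal in PCL.
// File names, neighbourhood size, and threshold are assumed starting values.
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/filters/statistical_outlier_removal.h>

int main()
{
    pcl::PointCloud<pcl::PointXYZ>::Ptr noisy(new pcl::PointCloud<pcl::PointXYZ>);
    pcl::PointCloud<pcl::PointXYZ>::Ptr clean(new pcl::PointCloud<pcl::PointXYZ>);

    if (pcl::io::loadPCDFile<pcl::PointXYZ>("kinect_frame.pcd", *noisy) < 0)
        return -1;

    // For each point, compare its mean distance to the 50 nearest neighbours with the
    // global distribution; points beyond mean + 1.0 * stddev are treated as noise.
    pcl::StatisticalOutlierRemoval<pcl::PointXYZ> sor;
    sor.setInputCloud(noisy);
    sor.setMeanK(50);
    sor.setStddevMulThresh(1.0);
    sor.filter(*clean);

    pcl::io::savePCDFileBinary("kinect_frame_denoised.pcd", *clean);
    return 0;
}
```

An anisotropic treatment would additionally weight neighbours by surface orientation (bilateral-style smoothing); the sketch covers only the direction-independent case.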

(3) Parametric representation of the point cloud. Triangle-mesh parameterization maps the data points of the original model onto a given parameter domain, establishing a correspondence f: G → Q between the point cloud and a new point set Q on that domain, subject to the requirement that the geometric distortion be minimal in a prescribed sense.
(4) Visualization of the point cloud. Guided by the geometric characteristics of the 3D model, a multi-resolution generation algorithm for complex 3D models based on triangle simplification is proposed, producing a multi-scale 3D point cloud data structure together with a matching multi-resolution texture model. On the basis of this coupling between geometric and texture scales, an R+-tree index is used to store the 3D model in blocks, giving a static level-of-detail (LOD) block data structure. During 3D scene browsing, the server quickly retrieves the appropriate static LOD model data according to the client's view-range culling and the available network transmission capacity, transmits it to the client adaptively in blocks, and the client generates in real time a dynamic LOD model that meets the visual requirements, achieving the best visualization result. A distance-driven LOD prototype is sketched below.
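As a minimal prototype of the static LOD structure in step (4), the sketch below pre-computes three progressively coarser copies of one model block with voxel-grid filtering and picks a level from the viewing distance. The leaf sizes and distance thresholds are assumptions; the R+-tree block index and the texture pyramid are outside its scope.

```cpp
// Prototype of a static LOD pyramid for one model block: three voxel-grid levels
// plus a distance-based level pick. Leaf sizes and thresholds are assumptions.
#include <array>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/filters/voxel_grid.h>

using Cloud = pcl::PointCloud<pcl::PointXYZ>;

struct BlockLod
{
    std::array<Cloud::Ptr, 3> levels;   // 0 = finest, 2 = coarsest
};

BlockLod buildLod(const Cloud::Ptr& block)
{
    const float leaf[3] = {0.005f, 0.02f, 0.08f};   // metres per level (assumed)
    BlockLod lod;
    for (int i = 0; i < 3; ++i)
    {
        lod.levels[i] = Cloud::Ptr(new Cloud);
        pcl::VoxelGrid<pcl::PointXYZ> grid;
        grid.setInputCloud(block);
        grid.setLeafSize(leaf[i], leaf[i], leaf[i]);
        grid.filter(*lod.levels[i]);
    }
    return lod;
}

// Pick the level to transmit/render from the camera-to-block distance (assumed thresholds).
const Cloud::Ptr& selectLevel(const BlockLod& lod, float distance_m)
{
    if (distance_m < 2.0f)  return lod.levels[0];
    if (distance_m < 10.0f) return lod.levels[1];
    return lod.levels[2];
}
```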

4. Literature Review
Kinect is a motion-sensing peripheral developed by Microsoft, first released as an accessory for the XBOX 360 game console and initially used only for gaming. Thanks to its technology and the creativity of the device itself, within two years of release it began to spread into many other fields, and since the release of Kinect for Windows, a version aimed at Windows development, researchers worldwide, in particular artificial-intelligence scientists, human-computer-interaction and motion-sensing engineers, and research groups abroad, have been exploring and developing new applications for the Kinect.
Kinect was originally developed under the codename Project Natal and was later officially renamed Kinect. It grew out of Microsoft's high-end research, a by-product of Microsoft's work on solving complex problems with artificial intelligence. Beyond gaming, where motion-sensing devices are already common, practical and experimental applications of the Kinect are growing rapidly; several areas where applications have already appeared illustrate its range. (1) Virtual applications. The European fashion retailer Topshop installed a new kind of fitting room in its Moscow flagship store. This virtual fitting room combines two advanced technologies, augmented reality (AR) and the Microsoft Kinect, so that shoppers can see a realistic fitting result without actually trying the clothes on.

(2) 3D modelling and sculpture. Experiments by groups abroad have shown that several Kinects can serve as 3D cameras for 3D modelling. In a street experiment by a group called Blablab LAB, three Kinects scanned and modelled passers-by, and a RepRap 3D printer then produced a miniature sculpture of each of them. (3) Remote-controlled robots. With a Kinect mounted as the robot's head, the robot senses its surroundings through the Kinect, builds a 3D model of them, and uses it to guide its actions. Because robots can be applied so widely, low-cost Kinect robots can replace conventional machines for measurement and survey work that does not demand high precision in dangerous areas or in harsh environments such as underground sites or at height. (4) Virtual experiments and medicine. In medicine the Kinect can stand in for the physician during an autopsy: researchers only gesture in the air or give voice commands to zoom, shrink, or rotate the 3D images.
Depending on how the measuring probe works, the surface data of an object can be obtained by contact or non-contact measurement. The representative contact instrument is the coordinate measuring machine (CMM), but because the probe touches the object it inevitably deforms it, so the measurement error is comparatively large. Non-contact methods use optical and laser principles for laser or optical scanning and are free of deformation-induced error. Most practical non-contact instruments rely on structured-light illumination: a projector emits a structured light beam and a receiver captures the light returned by the measured 3D surface. Because the 3D surface shape modulates the structured light spatially or temporally, the 3D surface data can be demodulated from the observed light field by suitable methods.
The data collected by a 3D laser scanner is a large set of 3D point coordinates; because the number of points is so large, the data is aptly called a point cloud. A typical acquisition procedure is as follows. Connect the instrument to power and to a computer, switch it on, and start the data recognition and processing software. Set up fixed reference targets, enable automatic recognition of the scanner coordinate system, and establish the 3D coordinate frame. Within the current frame, digitize the objects inside the acquisition range and build the 3D graphics. After each pass, move the instrument, re-identify the current coordinates from the fixed references, acquire further data, and let the software merge the data automatically in space. Pre-process the scanned point cloud, including segmenting, trimming, translating, rotating, and scaling the model. Finally, convert the model data through open digital interfaces so that it is compatible with, and can be shared by, downstream 3D design and development software.
Point clouds can also be acquired with an ATOS scanner. The ATOS 3D scanner is an optical scanner with two CCD cameras and a central projection unit; the projection unit houses a white projection lamp and a complex grating that can slide in a controlled way. The ATOS sensor is mounted on a tripod and can conveniently be rotated about four axes. During measurement the lamp projects a regularly varying fringe pattern onto the surface of the workpiece; the resulting moiré fringes are recorded by the CCD cameras and sent to the computer, which processes them into two "three-dimensional" photographs, one from each CCD. Because each CCD senses up to 440,000 pixels, a single shot captures about 13,000 valid data points, and the ATOS software can process these 13,000 points almost instantly and determine their 3D coordinates accurately. Errors are unavoidable during ATOS measurement, that is, during point cloud acquisition, but if they accumulate too far the accuracy requirement can no longer be met; the correct measurement order is therefore to start in the middle of the object and expand outwards, which keeps the accumulated error smallest.
Point cloud acquisition has the following characteristics. (1) Multiple patches. Because of the limited measurement range of every acquisition system, the point cloud is captured patch by patch; a single patch usually does not contain all the points needed on the workpiece, so the complete point cloud is obtained by registering the patches from several measurements. Registration may be automatic or manual, and software exists for both. Automatic registration can happen during measurement, for example in the ATOS optical scanning system, which registers adjacent patches through their common reference points, or which registers single patches containing specific reference points against a global reference-point set produced by locating the points with a digital camera and processing them with the TRITOP software; registration can also be performed automatically after measurement using surface features of the workpiece, for example multi-patch registration in Geomagic. (2) Variety of workpieces. The shape and size of the workpiece and the accuracy required for its different regions determine the measurement strategy and the measurement means. The parts that need geometric reverse engineering, such as automotive sheet metal, cast housings, and injection-moulded parts, differ in shape because they differ in function; and because their model materials differ, some being clay models, some plastic, some steel, some glass, and some sponge or rubber, their surface character, function, and complexity all differ as well.

(3) Varying acquisition requirements. Before data acquisition, the purpose of the data must be clear: which parts of the product the data relates to, or which components' corresponding parts serve as its spatial reference, and therefore whether the component is digitized on its own or together with the related components in the assembly. Settling this beforehand avoids unnecessary rework. Accuracy requirements also differ across the workpiece; when digitizing an engine block, for example, the dimensional accuracy of the locating holes must be high. Workpieces made of elastic materials must not be measured by contact and require non-contact measurement instead.

References:
[1] 喬思航, 程志權(quán), 陳寅, 等. 基于三個(gè)Kinect的個(gè)性化人體重建[J]. 系統(tǒng)仿真學(xué)報(bào), 2013, 25(10): 2408-2411.
[2] 李國鎮(zhèn). 基于Kinect的三維重建方法的研究和實(shí)現(xiàn)[D]. 北京: 北京交通大學(xué), 2012.
[3] 韋羽棉, 尚趙偉. 基于Kinect的旋轉(zhuǎn)剛體三維重建方法[J]. 計(jì)算機(jī)與現(xiàn)代化, 2014(5): 89-98.
[4] 羅元, 謝彧, 張毅. 基于Kinect傳感器的智能輪椅手勢控制系統(tǒng)的設(shè)計(jì)與實(shí)現(xiàn)[J]. 機(jī)器人, 2012(01).
[5] 劉鑫, 許華榮, 胡占義. 基于GPU和Kinect的快速物體重建[J]. 自動(dòng)化學(xué)報(bào), 2012, 38(8): 1288-1297.
[6] 周瑾, 潘建江, 童晶, 等. 使用Kinect快速重建三維人體[J]. 計(jì)算機(jī)輔助設(shè)計(jì)與圖形學(xué)學(xué)報(bào), 2013, 25(6): 873-879.
[7] 宋詩超, 禹素萍, 許武軍. 基于Kinect的三維人體掃描、重建及測量技術(shù)的研究[J]. 天津工業(yè)大學(xué)學(xué)報(bào), 2012, 31(5): 34-41.
[8] 孫晶晶, 王金變, 管玉. 基于三維掃描技術(shù)的人體測量[J]. 天津工業(yè)大學(xué)學(xué)報(bào), 2012, 31(5): 30-33.
[9] 宋詩超. 基于Kinect的三維人體建模與測量的研究[D]. 上海: 東華大學(xué), 2013.
[10] 朱德海. 點(diǎn)云庫PCL學(xué)習(xí)教程[M]. 北京: 北京航空航天大學(xué)出版社, 2012.
[11] 余濤. Kinect應(yīng)用開發(fā)實(shí)戰(zhàn): 用最自然的方式與機(jī)器對(duì)話[M]. 北京: 機(jī)械工業(yè)出版社, 2013.
[12] 陶麗君. 基于深度信息的實(shí)時(shí)頭部姿態(tài)估計(jì)[D]. 廈門: 廈門大學(xué), 2013.
[13] Henry P, Krainin M, Herbst E, et al. RGB-D mapping: Using depth cameras for dense 3D modeling of indoor environments. In: Proceedings of the International Symposium on Experimental Robotics (ISER), 2010.
[14] Newcombe R A, Izadi S, Hilliges O, et al. KinectFusion: Real-time dense surface mapping and tracking. In: IEEE ISMAR, IEEE, October 2011.
[15] Izadi S, Kim D, Hilliges O, et al. KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera. In: Symposium on User Interface Software and Technology (UIST), 2011.

5. Translation of Foreign Literature
The Microsoft KINECT: A Novel Tool for Psycholinguistic Research
Rinus G. Verdonschot (1), Héloïse Guillemaud (2), Hobitiana Rabenarivo (2), Katsuo Tamaoka (3)
(1) Waseda Institute for Advanced Study, Waseda University, Tokyo, Japan; (2) Graduate School of Engineering, Nagoya University, Nagoya, Japan; (3) Graduate School of Languages and Cultures, Nagoya University, Nagoya, Japan

Received 29 May 2015; accepted 26 June 2015; published 30 June 2015

Abstract
The Microsoft KINECT is a 3D sensing device originally developed for the XBOX. The Microsoft KINECT opens up many exciting new opportunities for conducting experimental research on human behavior. We investigated some of these possibilities within the field of psycholinguistics (specifically: language production) by creating software, using C#, allowing for the KINECT to be used in a typical psycholinguistic experimental setting. The results of a naming experiment using this software confirmed that the KINECT was able to measure the effects of a robust psycholinguistic variable (word frequency) on naming latencies. However, although the current version of the software is able to measure psycholinguistic variables of interest, we also discuss several points where the software can still stand to be improved. The main aim of this paper is to make the software freely available for assessment and use by the psycholinguistic community and to illustrate the KINECT as a potentially valuable tool for investigating human behavior, especially in the field of psycholinguistics.

Keywords
Language Production, Psycholinguistics, KINECT, Psychological Research Tool

Introduction
The way we interact with technology is rapidly changing. While we were once limited to keyboards and point-and-click devices, we can now interact with technology using our whole body. The rapidly decreasing cost of 3D sensing technologies (such as the Microsoft KINECT) even allows us to interact with technology through facial expressions and voice information. Although this technology offers exciting new opportunities for experimental research on human behavior, the actual implementation of these novel technologies is still in its infancy. This paper highlights a potentially important role for KINECT technology in a particular area concerning the study of human behavior, namely language production (a subfield of psycholinguistics). This paper is structured as follows: First, we provide a brief background on the existing research and theoretical models of language production, and summarize how dependent variables (such as naming latencies and accuracy) are usually obtained. Second, we introduce several important features of the KINECT sensor and review their potential applications within experimental psycholinguistic research. Subsequently, we discuss the C# software developed by our lab (all code freely downloadable), which implements the KINECT device to an experimental paradigm by depicting a characteristic experimental situation found in psycholinguistics. Next, we present experimental data within a genuine experimental setting by testing 34 participants on a word-frequency paradigm by using the KINECT and validate this data by using an established method in the field (i.e., by voice key). Finally, we point out particular shortcomings of the current version of the software and avenues for resolving these shortcomings and implementing the KINECT in future research, both on language production and in general.

1. Short Background on Language Production Research
Although the KINECT offers advancements for behavioral research in many fields, this paper focuses on how the KINECT can benefit research on language production (a part of experimental psycholinguistics). Within the language production literature, there are several theoretical models that describe the way speech is produced: starting from ideas in our head and ending with the actual pronunciation of words (e.g. Dell, 1986; Levelt, Roelofs, & Meyer, 1999). Most of the experimental data supporting these models comes from chronometric research (i.e. measuring reaction time latencies) using basic "triggering devices" such as buttons and voice keys (i.e. electronic circuits initiating a pulse if an input volume crosses a certain threshold). Typical experimental paradigms used in language production research either show a particular stimulus on the screen or present a stimulus auditorily and wait for the participant to name a particular target out loud. The time it takes from seeing (or hearing) the stimulus to naming it out loud is called the reaction time (RT) and serves as the main dependent variable together with the accuracy of the response. However, classic lab equipment such as voice keys only captures RTs for the onset of a single word at a time, and the difference between speech and other (irrelevant) sounds (e.g. coughing) cannot be distinguished without time-consuming post-hoc (or online) manual response checking (although there is freely available software which substantially eases and optimizes this task, such as Check Vocal; Protopapas, 2007). This is because voice key triggering will simply occur if the input volume crosses a certain threshold. Additionally, data will usually be lost if the voice input does not exceed that threshold (e.g. when a participant speaks softly, for instance). Moreover, voice keys have no semantic capabilities, which again instigates a need for manual response checking. Finally, some questions have arisen about the reliability of voice keys. For example, when speaking, even after phonemes are produced it may take the voice key varying amounts of time to detect them, since some sounds take more or less time to initiate (e.g. /z/ versus /p/; see Kessler, Treiman, & Mullennix, 2002; Sakuma, Fushimi, & Tatsumi, 1997). It is therefore reasonable to state that paradigms found in experimental psycholinguistics can be limited by particular aspects of experimental equipment.

2. The Microsoft KINECT Device
In contrast to devices designed to be implemented for scientific use only, the KINECT is a device (costing roughly 200 USD) developed by Microsoft to be used with video games (e.g. on XBOX and Windows). The KINECT enables users to interact with a computer via gestures and voice commands. The KINECT (v1) contains an infrared (IR) emitter and IR depth sensor (640 × 480 pixels) for 3D tracking, an RGB camera (1280 × 960 pixels) to acquire high-quality RGB color video (both the IR depth sensor and the RGB camera operate at 30 fps), and a microphone array, which contains four microphones for capturing sound. The IR emitter emits infrared light in a predetermined "speckle pattern" (which is in fact a set of small dots of infrared light that fall on everything in front of the KINECT camera). The IR depth sensor perceives these patterns and determines depth by looking at the displacement of specific dot patterns (e.g. on objects close to the KINECT the dot pattern will be spread out, but on far objects the dot pattern will be much denser). Additionally, as there are four microphones, it is possible to accurately retrieve the spatial location of the sound source (e.g. a person speaking), as well as being able to record what is spoken. Furthermore, by using an accelerometer it is possible to determine the current orientation of the KINECT, and the integrated tilt motor can be used to track objects or people within the room. For research in language production, one particularly important feature of the KINECT is its ability to track the human face. Microsoft has made a so-called Software Development Kit (SDK; current version for KINECT v1 is 1.8) available which contains numerous programming routines to track a human face in real time. This SDK can measure roughly 100 points (including so-called "hidden points"), resulting in real-time face tracking. Thus, the KINECT is able to build a detailed model of the human face, called a face mesh, using sets of triangles and lines.

3. Opportunities Offered by the KINECT for Research in Psycholinguistics
Naturally, the most important issue for researchers is how the KINECT can contribute to their research. The following list, though incomplete, offers five potential ways we believe the KINECT could advance language production research: 1) The KINECT can track lip movements in real time, allowing researchers to obtain detailed information on the speech planning process even before actual speech sounds are uttered. By focusing on the distances between particular points on the lips and face, in combination with the speech recognition pack (found in the SDK), it is possible to determine the onset and offset of individual words. In this paper we report our preliminary efforts to build a novel program that detects the beginning and end of individual words by tracking lip movements. 2) Another exciting feature is that the KINECT is able to track more than one person over time, which would allow for language experiments to take place in a more natural, conversational setting. 3) The KINECT has the potential to perform basic eye tracking, allowing researchers to assess approximately where participants are looking on a screen. Experimental paradigms may benefit from these additional behavioral measures, which could indicate, for instance, whether participants are engaged in the task at hand, and, if so, which parts of the screen they are mainly fixating on. 4) The KINECT comes with advanced voice recognition (including language packs for many major languages), allowing for automatic post-hoc accuracy checking (see examples in the SDK provided by Microsoft). 5) It has been shown that the KINECT is able to track and interpret body gestures and basic emotions online, allowing for another dimension to be added to the dependent measures in a psycholinguistic experiment.

4. A First Attempt to Implement the KINECT into the Area of Language Production
As far as we know, there is no previous language production literature that utilizes the KINECT. This paper therefore represents the first attempt in this field to integrate the KINECT into the daily practice of a psycholinguistics lab. In this paper we focus on implementing the first of the five abovementioned points, that is, the tracking of lip movements in real time to gather information on the speech planning process. As there are no previous instances for comparison (again, as far as we know), we set out to program a working version of the KINECT software (using C#) to display experimental stimuli and measure a psycholinguistic variable of interest. We aim to keep the code open and freely available for other researchers to use and adapt to their own insights. Obviously, when running the program, an attached KINECT for Windows, including the SDK, is required (and Visual Studio is needed when adapting the code). To accommodate those who do not have this setup we provide a short video demonstrating the program online. Furthermore, the program (executable and source code) is provided. Notice that we provide the complete working directory in this file to have everything available to experienced programmers (for those who simply want to run the program, the executable can be found in /bin/x86/debug/FaceTrackingBasicsWPF.exe). The KINECT SDK v1.8 needs to be installed as well. Although the KINECT is able to track more than one person, in this initial stage of program development, only a single person is tracked during an experiment. The current version of the program is able to: 1) Randomly display a word (taken from an Excel file) to a participant. 2) Use the KINECT to determine the visual on- and offset of the word relative to its initial presentation (i.e. lip/face points) in real time. 3) Use the KINECT to detect the auditory on- and offset of the word relative to its


