Spatiotemporal Representation of Dynamic Scenes

Darius Burschka¹

¹Darius Burschka is with the Faculty of Informatics, Technical University of Munich, Germany. burschka@tum.de

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019.

Abstract— We present a novel representation of dynamic scenes perceived by moving agents. In this type of environment, conventional geometric representations require constant updates of the map information due to the changing relative position of dynamic objects with respect to the static scene. We show that the changed representation of the environment increases the robustness and accuracy of scene perception and significantly simplifies the processing and complexity of the perception module. At the same time, the changed representation allows a better prioritization of attention to the moving objects around the robot, one that takes into account not the Euclidean distance but the time to interaction (TTI), which is the core contribution of this approach. We present the mathematical framework behind the proposed representation and show with examples how this framework simplifies and robustifies the processing in the perception modules of a moving agent.

I. MOTIVATION

Environmental models enable mobile agents to generate action plans and to operate in static and dynamic environments. The original implementations involved single moving agents that operated in static environments, where possible collisions with the static geometry of the environment were caused by the motion of the mobile agent itself. If a system tried to prioritize the sequence in which possible collisions with the environment may occur, the distance to the object was used directly, since it was proportional to the collision time. A relative speed, which was constant for all parts of the static environment, mapped the collision time to a distance to the object. At the same time, the relative positions of the objects in a static scene did not change. This allowed the information about the objects to be stored at their relative geometric positions without any need for updates over the operation of the system. The situation changes drastically when the environment itself is no longer static. Multiple agents in the environment create varying relative motion with respect to the mobile agent and require a constant update of their geometric position in the map.

Currently, robotic perception is undergoing a transformation from the navigation of service robots in static indoor environments to operation in large-scale dynamic environments. This process is rapidly evolving due to the availability of cheap and robust sensors and computing units, and because of the current focus on navigation tasks in automotive and flying applications. It is very tempting to make this transition using tools from the well-established service robotics domain and to apply them to outdoor applications. While this process allows a fast deployment of systems in the new domain, multiple new problems are created. The advantage of being able to rely on well-understood and tested software seems to outweigh the problems that are created through this migration.

Fig. 1. Most current mobile robot applications focus on static representations of the scene and abstract the sensor data to 3D (left branch), while the task definition in dynamic scenes usually depends on dynamic motion parameters of objects (right branch).

Most of the current service robotics research focuses on efficient modeling of static environments, with a focus on abstraction and labeling of objects in the scene (Fig. 1). The main interest in this domain is the identification and transport of objects in the scene, where the changes in the geometric arrangement are caused exclusively by the robot itself.
Any other changes are often treated as outliers and are neglected in further processing, e.g. [4]. Since the relative motion to an object in the scene depends here entirely on the robot's own velocity, indexing the scene by distance gives the correct prioritization of information for planning. The underlying model representation is usually organized based on the Cartesian position of objects in the world, which directly supports this type of query. The perception is usually optimized to match the requirements of the environment representation. Since the tasks are defined in a metric representation of the surrounding geometry, an ideal sensor is a stereo camera or an active 3D sensor for reconstruction. We need to note here that 3D data is often not the native representation of the sensor, and it requires additional calibration parameters, which are sensitive to errors and changes during the calibration and the operation of the system.

New applications for robotic systems require perception in outdoor environments, whose scale is often a lot larger and where the robot is not the only dynamic agent in operation. It is interesting to observe that conventional approaches from indoor service robotics are often directly applied to dynamic large-scale outdoor scenarios for convenience. This simplification makes many of the existing well-tested tools available to the new domain, but the processing in the perception often removes important signal properties that were not important in static environments. Often the task in such environments is not a unique identification of objects but a safe operation under changing dynamic conditions of the environment, with collision avoidance as a very basic but important task in the domain. The old tools often do not provide the correct representation that is necessary for the prioritization of the processing in the task, as depicted in Fig. 2. Here, the ego-motion of the other objects in the scene causes the relative motion to different parts of the environment to change. Close objects are not necessarily the ones with the highest collision probability, as shown in Fig. 2.

Fig. 2. Geometric indexing of the scene representation may result in incorrect prioritization of objects for dynamic analysis. Object B may pass the camera before the object C in front is reached.

A representation scheme using dynamic or temporal properties of the objects would better preserve the information about possible collision relations in the environment, which exists in the original camera information in the form of time-to-collision calculations and epipolar analysis of the optical flow in the image sequence [2]. The closest object in a metric sense does not necessarily pose an immediate danger for the collision avoidance task.

Another reason requiring a modification of the task representation is the limited sensing accuracy. While a 3D sensor can cope correctly with the 3D information in a typical indoor environment, the signal-to-noise ratio makes any 3D reconstruction in large-scale outdoor environments useless. The error due to the detection and matching accuracy of the stereo system is often larger than the displacement information between two stereo pairs used for reconstruction. A dynamic state estimation from changes in the estimated 3D position of vehicles becomes unreliable (Fig. 3). It is apparent that this processing worked for the original close-range applications in indoor scenarios, but it should not be applied at larger distances.

Fig. 3. The relative error in the estimated position for a binocular setup with a distance of 10 cm between the cameras, under the assumption of a matching accuracy of 0.5 pixel.
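To make the argument behind Fig. 3 concrete, the following sketch propagates the stated 0.5-pixel matching accuracy through the standard stereo depth relation z = fb/d. Only the 10 cm baseline and the 0.5 px accuracy come from the text; the focal length and the sample distances are our own illustrative assumptions.

```python
# First-order stereo range error: depth from disparity is z = f*b/d,
# so a disparity error dd propagates to dz = z**2 / (f*b) * dd.
# focal_px and the sample distances below are assumptions for
# illustration, not values from the paper.

def stereo_depth_error(z_m, baseline_m=0.10, focal_px=800.0, disp_err_px=0.5):
    """Depth uncertainty of a binocular setup at range z_m (meters)."""
    return z_m ** 2 / (focal_px * baseline_m) * disp_err_px

if __name__ == "__main__":
    for z in (1.0, 5.0, 10.0, 20.0, 50.0):
        dz = stereo_depth_error(z)
        print(f"z = {z:5.1f} m  ->  error ~ {dz:6.2f} m ({100 * dz / z:5.1f} %)")
```

Because the error grows quadratically with range, at 50 m the uncertainty already exceeds 15 m for this configuration, which illustrates why frame-to-frame 3D displacements of distant vehicles drown in reconstruction noise.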
The reconstruction of static scenes is a well-understood problem, and it can be subdivided into binocular and monocular approaches that use the passive scene illumination or enhance the processing with active illumination projected onto the scene. Especially the latter became very popular with the introduction of the PrimeSense sensor in the Microsoft Kinect camera. The approaches using active illumination of the scene and the binocular approaches suffer from limitations in the achievable range. Monocular approaches avoid this problem by using a flexible distance between the images used for 3D reconstruction. The system can delay the acquisition of the second image based, for example, on the length of the optical flow vectors, adapting to varying velocities. In the case of significant rotation, an additional compensation in the optical flow may be necessary, because only the translational motion of the camera carries information about the structure. The problem with monocular approaches is that they do not recover the scale of the reconstructed information, and they need an estimate of the rotation and translation of the moving camera from image information. This estimate is recovered using typical structure-from-motion approaches like the Essential or Homography matrices [5] or similar approaches. A comparison of the achievable accuracy is hereby strongly dependent on the distribution of the features in the images, as was presented in [8]. The resulting error in the motion estimation deteriorates the consecutive rectification process, requiring a larger search window around the epipolar geometry.

There exist approaches to analyze the independent motion of clusters in the images. Approaches like the Generalized Principal Component Analysis (GPCA) can be used to find the independently moving clusters in images [16]. There are approaches reconstructing the motion of moving objects in the world from multiple images [17], [11]. In the case of planar environments, the plane-parallax approach is applied to analyze the independent motion properties in the scene [6], [7], [14]. Recently, multiple approaches have been published that combine structure from motion and optical flow [20], [15], [19], [9]. The current top method on the KITTI 2012 benchmark [20] calculates the fundamental matrix and computes the epipolar lines of the flow. This computation is limited to static scenes. A similar calculation based on the fundamental matrix and a regularization of the optical flow to align with the epipolar lines can be found in ID31. The independent motion in the scene is detected by referring it to the optical flow of the entire scene. Roussos [10] finds a solution for the depth and motion parameters of moving objects in the scene from batch processing of a sequence of about 30 frames. There have been multiple approaches to motion segmentation of scenes into regions corresponding to independently moving objects by exploiting 3D cues and epipolar motion [1], [18], [12].

This paper addresses the problem of clustering and motion estimation in dynamic scenes using an extension of the time-to-collision approach presented in [13]. It is interesting to see that under some restricted conditions the system is able to reconstruct the depth information entirely based on pixel information in the images, which is not possible in other known structure-from-motion approaches. The remaining paper is structured as follows. In Section II, the new method for the prioritization of moving objects based on time to interaction is presented. In Section III, some experimental results from real-world applications are presented. We finalize with conclusions and future work.

II. APPROACH

The goal of the presented framework is an efficient exchange between the sensor and the planner, with a focus on visual sensors that do not operate in three-dimensional space. While active sensors like laser range finders and time-of-flight sensors are explicitly designed to provide 3D data, a typical passive camera operates in the 2D space of the camera image. A conventional representation of the environment as a 3D model requires an abstraction of the images from multiple camera positions. This step may introduce errors into the perception due to errors in the intrinsic camera parameters, like the focal length, the position of the optical point, and the radial distortion of the lens, and due to errors in the estimation of the relative positions of the images to each other. These extrinsic parameters may be wrong due to errors in the motion estimation of structure-from-motion approaches [5], calibration errors, and changes in the camera rig configuration due to vibrations.

We mentioned already in the motivation that our goal is to identify the essential information that is required to exchange data between the sensor and the planner. In the case of collision avoidance, this appears to be the time. An analysis of the optical flow in monocular images that we presented in [3] shows that the time to collision with a co-planar structure on which the tracked point appears (we will refer to it as a collision plane in the following text) can be estimated for translational motion entirely from the change in the pixel position of the tracked point (Fig. 4).

Fig. 4. A point tracked over a sequence of images defines, together with the relative velocity vector $\vec{v}_g$, a collision plane. The velocity vector $\vec{v}_g$ represents the normal vector of this plane (see [3]).

We see in Fig. 4 that the point which finally collides with the focal point of the camera is always projected onto the same image point E in the camera image. This is the epipole E of the current relative motion between the object and the camera. The position of the epipole can be estimated as the intersection of multiple epipolar lines (lines along which the projected points move) originating from the same object. Details of the estimation of the epipole can be found in [2], [3].
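As an illustration of the intersection idea (the actual estimator is described in [2], [3]), the following sketch recovers the epipole as the least-squares intersection of the flow lines of several tracked points; the function name and the synthetic data are our own assumptions.

```python
import numpy as np

def estimate_epipole(points, flows):
    """Least-squares intersection of flow lines (illustrative only,
    not the estimator from [2], [3]).

    points: (N, 2) pixel positions p_i; flows: (N, 2) flow vectors.
    Each track constrains the epipole E to lie on the line through
    p_i with the direction of its flow: n_i . E = n_i . p_i, where
    n_i is the normal of the flow vector.
    """
    points = np.asarray(points, float)
    flows = np.asarray(flows, float)
    normals = np.stack([-flows[:, 1], flows[:, 0]], axis=1)  # flow rotated 90 deg
    b = np.einsum("ij,ij->i", normals, points)
    E, *_ = np.linalg.lstsq(normals, b, rcond=None)
    return E

# Synthetic check: flow vectors radiate away from a known epipole.
rng = np.random.default_rng(0)
E_true = np.array([320.0, 240.0])
pts = rng.uniform(0, 640, size=(20, 2))
flw = (pts - E_true) * 0.05  # expansion away from the epipole
print(estimate_epipole(pts, flw))  # ~ [320. 240.]
```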
We can derive from the observation angle $\alpha_i$ of each tracked point on the moving object the number of frames $k$ until the collision plane sweeps through the camera focal point:

$$\alpha_i = \arccos\left(\frac{p_i^T E}{\|p_i\|\,\|E\|}\right) = \arccos(c_i), \qquad p_i = (u_i, v_i, f)^T$$

$$\tan\alpha_i = \frac{H}{k\,v_g}, \qquad \tan\alpha_{i+1} = \frac{H}{(k-1)\,v_g}$$

$$k = \frac{\tan\alpha_{i+1}}{\tan\alpha_{i+1} - \tan\alpha_i} = \frac{c_i\sqrt{1-c_{i+1}^2}}{c_i\sqrt{1-c_{i+1}^2} - c_{i+1}\sqrt{1-c_i^2}} \qquad (1)$$

This can be calculated directly from the pixel positions $(u_i, v_i)$ of the observed point $p_i$ in two consecutive camera images with the focal length $f$. We can estimate this value directly from a video stream of a single monocular camera. The value of $k$ defines the time to interaction (TTI) in number of frames, which extends the typical notion of time to collision of the previous approaches to conditions where no direct collision occurs. This allows the tracked points to be sorted according to the time of the closest encounter with the camera, when the corresponding collision plane sweeps through the camera.

Each tracked point in the image can now be parameterized by the radial time distance $k$ of the corresponding collision plane (defined by the relative motion vector $\vec{v}_g$ as its normal), the miss angle $\alpha_{miss}$, and a corresponding orientation angle $\varphi$ of the optical flow segment relative to the horizontal orientation (Fig. 4):

$$\tan\alpha_{miss} = \frac{H}{v_g} = k\,\tan\alpha_i \;\;\text{(from Eq. 1)}, \qquad \varphi = \angle(\overline{p_i E}) \qquad (2)$$

The $\alpha_{miss}$ angle tells us how close the point will pass by the camera while the collision plane sweeps over the focal point. The angle $\varphi$ can be used to define the direction in which the collision avoidance needs to act to prevent the collision. Pure monocular processing does not provide any metric values for the distance to the point, but we will show later that this non-metric value can be used for navigation. We make here the assumption of a static camera on the ego vehicle; that is why the ego velocity gets added with the opposite sign to the velocity of each of the independently moving vehicles, creating the resulting vectors $\vec{v}_g$. An interesting feature of the miss angle is that it is scaled with the absolute velocity value $v_g$. This allows the definition of a safety margin $\alpha_{miss}$ that automatically increases the distance $H$ to the camera with increasing relative velocity $v_g$.
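A minimal numerical sketch of Eqs. (1) and (2) follows, assuming an already-estimated epipole and a calibrated focal length; the function name and the synthetic check are our own, not from the paper.

```python
import numpy as np

def tti_frames(p_i, p_j, epipole, f):
    """Frames k until the collision plane sweeps the focal point (Eq. 1),
    plus the (non-metric) miss angle (Eq. 2). Assumes the epipole has
    already been estimated, e.g. from intersecting flow lines.

    p_i, p_j: pixel positions (u, v) of the same tracked point in two
    consecutive frames; epipole: (u, v) of E; f: focal length in pixels.
    """
    def cos_to_epipole(p):
        ray = np.array([p[0], p[1], f])
        e = np.array([epipole[0], epipole[1], f])
        return ray @ e / (np.linalg.norm(ray) * np.linalg.norm(e))

    c_i, c_j = cos_to_epipole(p_i), cos_to_epipole(p_j)
    tan_i = np.sqrt(1 - c_i**2) / c_i
    tan_j = np.sqrt(1 - c_j**2) / c_j
    k = tan_j / (tan_j - tan_i)   # Eq. (1)
    tan_miss = k * tan_i          # Eq. (2): H / v_g, in frames of travel
    return k, np.arctan(tan_miss)

# Synthetic check: epipole at the image center, a point with miss
# distance H = 2 and depth shrinking 10 -> 9 length units between two
# frames (v_g = 1 per frame), so k should be 10 and tan(a_miss) = 2.
f = 500.0
p1 = (f * 2 / 10, 0.0)  # projection at depth 10
p2 = (f * 2 / 9, 0.0)   # projection at depth 9
k, a_miss = tti_frames(p1, p2, epipole=(0.0, 0.0), f=f)
print(k, np.tan(a_miss))  # ~ 10.0, ~ 2.0
```

Note that $k$ and $\tan\alpha_{miss}$ come out in units of frames and per-frame travel distance respectively, consistent with the purely non-metric monocular processing described above.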
A. Organization of the Map

In contrast to conventional metric maps, our map stores the information using a quasi-polar representation. Each entry is characterized by values extracted solely from the monocular camera perception. It is described by the TTI, the estimated direction of the $\vec{v}_g$ vector, the miss angle $\alpha_{miss}$, and the orientation angle $\varphi$ from Eq. 2. The organization of the map is easiest visualized for a 2D map of a planar motion system. The map is represented as a circular grid that is parameterized by the TTI value in the radial direction, and where the orientation of $\vec{v}_g$, expressed as $\theta_h$, represents the second dimension of the grid (Fig. 5).

Fig. 5. Left: 2D map, where the coordinates are derived from a quantization of the angle $\theta_h$ and the time to interaction (TTI); right: 3D map, where the orientation of $\vec{v}_g$ is also quantized in elevation for flying applications.

Although we use a polar representation of the world in our map, the only entries that change over time in the case of a constant dynamic configuration of all participating objects are the TTI values of the tracked objects. A constant dynamic configuration means that no object changes the direction or the magnitude of its velocity of travel. Since we also assume a moving ego camera, the static scene is also represented as moving, in the central tiles (0° motion orientation, towards the camera). The map scrolls with the motion of the ego camera, and tiles that represent the current collision candidates scroll to the most distant part of the map while their content gets deleted (Fig. 6).

Fig. 6. The content of the map grid stays unmodified for a constant dynamic configuration of the scene. For a moving camera, the gray tiles store information about the static environment, e.g. a parked car C. The ego-motion of the camera appears as a motion component in the opposite direction for each of the objects.

This arrangement does not require any modifications in the map if the dynamic configuration of the environment does not change. It has a significant advantage over a metric Cartesian representation, where the positions of the dynamic objects need to be updated during the operation of the system. The cars B and A appear with a much shorter TTI than the geometrically closer car C, which is not moving.
Our map representation allows an easier prioritization of the attention of the system. The system can access objects based on the order in which it will need to avoid them. This usually does not correspond to the metric distance to the objects.
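To illustrate this TTI-ordered access, here is a minimal sketch of the quasi-polar grid of Fig. 5 (left); the class name, the cell layout, and the scrolling policy are our own assumptions based on the description above.

```python
import math
from collections import defaultdict

class TTIMap:
    """Sketch of the quasi-polar map: cells indexed by the quantized
    orientation theta_h of v_g and by the quantized time to interaction.
    All names and parameters are illustrative assumptions."""

    def __init__(self, n_theta=36, n_tti=64):
        self.n_theta, self.n_tti = n_theta, n_tti
        self.cells = defaultdict(list)  # (theta_bin, tti_bin) -> entries

    def insert(self, tti, theta_h, miss_angle, phi, payload=None):
        t_bin = int(theta_h % (2 * math.pi) / (2 * math.pi) * self.n_theta)
        k_bin = min(int(tti), self.n_tti - 1)
        self.cells[(t_bin, k_bin)].append((miss_angle, phi, payload))

    def advance_one_frame(self):
        """Scroll the grid: every TTI shrinks by one frame; cells that
        reach zero are emptied (their objects have passed the camera)."""
        scrolled = defaultdict(list)
        for (t_bin, k_bin), entries in self.cells.items():
            if k_bin > 0:
                scrolled[(t_bin, k_bin - 1)] = entries
        self.cells = scrolled

    def most_urgent(self):
        """Cells ordered by TTI, not by metric distance."""
        for key in sorted(self.cells, key=lambda tk: tk[1]):
            yield key, self.cells[key]

m = TTIMap()
m.insert(tti=12, theta_h=math.pi, miss_angle=0.1, phi=0.0, payload="car B")
m.insert(tti=40, theta_h=0.0, miss_angle=0.0, phi=0.0, payload="parked car C")
m.advance_one_frame()
for cell, entries in m.most_urgent():
    print(cell, entries)  # car B comes first despite its metric distance
```

Note that, as in the text, a constant dynamic configuration leaves the cell contents untouched; only the radial TTI coordinate scrolls from frame to frame.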
