We describe a method for detecting crossing pedestrians and, in general, any object that is moving perpendicular to the driving direction of the vehicle. This is achieved by combining video snapshots from multiple cameras that are placed in a linear configuration and from multiple time instances. We demonstrate that the proposed array configuration imposes tight constraints on the expected disparity of static objects in a certain image region for a given camera pair. These regions are distinct for different camera pairs. In that manner, static regions can generally be distinguished from moving targets throughout the entire field of view when analyzing enough pairs, requiring only straightforward image processing techniques. On a self-captured dataset with crossing pedestrians, our proposed method reaches an F1 detection score of 83.66% and a mean average precision (MAP) of 84.79% on an overlap test when used stand-alone, running at 59 frames per second without GPU acceleration. When combined with the Yolo V4 object detector in cooperative fusion, the proposed method boosts the maximal F1 scores of this detector on this same dataset from 87.86% to 92.68% and the MAP from 90.85% to 94.30%. Furthermore, combining it with the lower-power Yolo-Tiny V4 detector in the same way yields F1 and MAP increases from 68.57% to 81.16% and from 72.32% to 85.25%, respectively.
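A minimal sketch of the core test, under assumptions not spelled out in the abstract: the camera pair is rectified, images are grayscale NumPy arrays, the expected static-object disparity band [d_min, d_max] for the image region is given (in the paper it follows from the array geometry), and plain block matching stands in for whatever matching the authors use. All function names are illustrative.

```python
import numpy as np

def patch_disparity(left, right, y, x, patch=9, search=32):
    """Estimate the horizontal disparity of the patch centred at (y, x)
    by exhaustive SAD matching along the same scanline (rectified pair)."""
    h = patch // 2
    ref = left[y-h:y+h+1, x-h:x+h+1].astype(np.float32)
    best_d, best_cost = 0, np.inf
    for d in range(search):
        if x - h - d < 0:
            break
        cand = right[y-h:y+h+1, x-h-d:x-h-d+patch].astype(np.float32)
        cost = np.abs(ref - cand).sum()
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

def is_moving(left, right, y, x, d_min, d_max, tol=1.0):
    """Flag a patch as a moving-object candidate when its measured
    disparity falls outside the band expected for static objects."""
    d = patch_disparity(left, right, y, x)
    return d < d_min - tol or d > d_max + tol
```

Running such a test per region and per camera pair, and combining the flags over enough pairs, mirrors the idea that static scenery satisfies the disparity constraints everywhere while crossing objects violate them in at least one pair.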
Understanding the 3D structure of the environment is advantageous for many tasks in the field of robotics and autonomous vehicles. From the robot's point of view, 3D perception is often formulated as a depth image reconstruction problem. In the literature, dense depth images are often recovered deterministically from stereo image disparities. Other systems use an expensive LiDAR sensor to produce accurate but semi-sparse depth images. With the advent of deep learning there have also been attempts to estimate depth using only monocular images. In this paper we combine the best of both worlds, focusing on a combination of monocular images and low-cost LiDAR point clouds. We explore the idea that very sparse depth information accurately captures the global scene structure, while variations in image patches can be used to reconstruct local depth to a high resolution. The main contribution of this paper is a supervised-learning depth reconstruction system based on a deep convolutional neural network. The network is trained on RGB image patches reinforced with sparse depth information, and the output is a depth estimate for each pixel. Using image and point cloud data from the KITTI vision dataset, we are able to learn a correspondence between local RGB information and local depth while preserving the global scene structure. Our results are evaluated on sequences from the KITTI dataset and on our own recordings, using a low-cost camera and LiDAR setup.
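The abstract does not give the network architecture; the following is a minimal PyTorch stand-in for the idea of reinforcing RGB input with a sparse depth channel and supervising only where LiDAR returns exist. The layer sizes and names are invented for illustration.

```python
import torch
import torch.nn as nn

class DepthNet(nn.Module):
    """Toy network mapping an RGB image plus a sparse depth channel
    (zeros where no LiDAR return exists) to a dense depth estimate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )
    def forward(self, rgb, sparse_depth):
        return self.net(torch.cat([rgb, sparse_depth], dim=1))

def train_step(model, opt, rgb, sparse, gt):
    """Supervised step: the L1 loss is evaluated only at pixels with
    valid ground-truth depth (mask of LiDAR returns)."""
    pred = model(rgb, sparse)
    mask = (gt > 0).float()
    loss = (mask * (pred - gt).abs()).sum() / mask.sum().clamp(min=1)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```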
Occlusion is one of the most difficult challenges in visual tracking. We propose an occlusion handling framework to improve the performance of local tracking in a smart camera view in a multicamera network. We formulate an extensible energy function to quantify the quality of a camera's observation of a particular target, taking into account both person-person and object-person occlusion. Using this energy function, a smart camera assesses the quality of its observations over all targets being tracked. When it cannot adequately observe a target, a smart camera estimates the quality of observation of that target from the viewpoints of other assisting cameras. If a camera with a better observation of the target is found, the tracking task for the target is carried out with the assistance of that camera. In our framework, only the positions of the persons being tracked are exchanged between smart cameras, so the communication bandwidth requirement is very low. Performance evaluation of our method on challenging video sequences with frequent and severe occlusions shows that the accuracy of a baseline tracker is considerably improved. We also compare our method with state-of-the-art trackers and show that it outperforms them.
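The energy function itself is not given in the abstract; a toy instance consistent with its description, a weighted sum of person-person and object-person occlusion terms over bounding boxes, could look as follows. The box format and weights are assumptions.

```python
def overlap(a, b):
    """Area of intersection of two axis-aligned boxes (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def observation_energy(target_box, person_boxes, object_boxes,
                       w_person=1.0, w_object=1.0):
    """Toy occlusion energy for one target in one view: the more the
    target's box is covered by other persons or by static objects,
    the higher the energy, i.e. the worse the observation."""
    area = (target_box[2]-target_box[0]) * (target_box[3]-target_box[1])
    e_person = sum(overlap(target_box, b) for b in person_boxes) / area
    e_object = sum(overlap(target_box, b) for b in object_boxes) / area
    return w_person * e_person + w_object * e_object
```

A camera with a high energy for some target would query the energies of assisting cameras and hand the tracking task over to the camera with the lowest one.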
Tracking of a handheld device's three-dimensional (3-D) position and orientation is fundamental to various application domains, including augmented reality (AR), virtual reality, and interaction in smart spaces. Existing systems still offer limited performance in terms of accuracy, robustness, computational cost, and ease of deployment. We present a low-cost, accurate, and robust system for handheld pose tracking using fused vision and inertial data. The integration of measurements from embedded accelerometers reduces the number of unknown parameters in the six-degree-of-freedom pose calculation. The proposed system requires two light-emitting diode (LED) markers to be attached to the device, which are tracked by external cameras through an algorithm that is robust against illumination changes. Three data fusion methods are proposed: a triangulation-based stereo-vision system, a constraint-based stereo-vision system with occlusion handling, and a triangulation-based multivision system. Real-time demonstrations of the proposed system applied to AR and 3-D gaming are also included. The accuracy of the proposed system is assessed by comparison with data generated by the state-of-the-art commercial motion tracking system OptiTrack. Experimental results show that the proposed system achieves an accuracy of a few centimeters in position estimation and a few degrees in orientation estimation.
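Of the three fusion methods, the triangulation-based stereo-vision variant rests on standard two-view triangulation, sketched below in the usual linear (DLT) formulation; the accelerometer measurements that reduce the unknown orientation parameters are omitted here.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3-D point, e.g. an LED marker,
    from two calibrated views. P1, P2: 3x4 projection matrices;
    x1, x2: pixel coordinates (u, v) of the marker in each view."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]   # inhomogeneous 3-D point
```

Triangulating both LED markers yields the device position and its axis; the accelerometer then supplies the remaining rotational degree of freedom.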
Junzhi Guan, Peter Van Hese, Jorge Oswaldo Niño-Castañeda, Nyan Bo Bo, Sebastian Gruenwedel, Dirk Van Haerenborgh, Dimitri Van Cauwelaert, Peter Veelaert, Wilfried Philips
In this paper, we propose a people tracking system composed of multiple calibrated smart cameras and one fusion server which fuses the information from all cameras. Each smart camera estimates the ground-plane positions of people based on the current frame and on feedback from the server from the previous time step. Correlation-coefficient-based template matching, which is invariant to illumination changes, is proposed to estimate the position of people in each smart camera. Only the estimated position and the corresponding correlation coefficient are sent to the server. This minimal amount of information exchange makes the system highly scalable with the number of cameras. The paper focuses on creating and updating a good template for the tracked person using feedback from the server. Additionally, a static background image of the empty room is used to improve the results of template matching. We evaluated the performance of the tracker in scenarios where persons are often occluded by other persons or furniture, and where illumination changes occur frequently, e.g., due to switching the light on or off. For two sequences with frequent illumination changes (one minute each; one with a table in the room, one without), the proposed tracker never loses track of the persons. We compare the performance of our tracking system to a state-of-the-art tracking system; our approach outperforms it in terms of tracking accuracy and people loss.
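A small sketch of the matching step as described: the Pearson correlation coefficient is invariant to gain and offset changes of the intensities, and only the winning position plus its coefficient would be sent to the fusion server. The candidate-generation strategy is not specified in the abstract and is left abstract here; candidates are assumed to lie fully inside the image.

```python
import numpy as np

def ncc(template, patch):
    """Pearson correlation coefficient between a template and an image
    patch; invariant to affine (gain/offset) illumination changes."""
    t = template.astype(np.float64).ravel()
    p = patch.astype(np.float64).ravel()
    t = t - t.mean(); p = p - p.mean()
    denom = np.linalg.norm(t) * np.linalg.norm(p)
    return float(t @ p / denom) if denom > 0 else 0.0

def best_position(image, template, candidates):
    """Return the candidate top-left corner with the highest correlation,
    plus the coefficient itself (both would go to the fusion server)."""
    th, tw = template.shape
    scored = [(ncc(template, image[y:y+th, x:x+tw]), (y, x))
              for (y, x) in candidates]
    coeff, pos = max(scored)
    return pos, coeff
```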
Nyan Bo Bo, Peter Van Hese, Junzhi Guan, Sebastian Gruenwedel, Jorge Niño-Castañeda, Dimitri Van Cauwelaert, Dirk Van Haerenborgh, Peter Veelaert, Wilfried Philips
Many computer vision based applications require reliable tracking of multiple people under unpredictable lighting conditions. Many existing trackers do not handle illumination changes well, especially sudden changes in illumination. This paper presents a system to track multiple people reliably, even under rapid illumination changes, using a network of calibrated smart cameras with overlapping views. Each smart camera extracts foreground features by detecting texture changes between the current image and a static background image. The foreground features belonging to each person are tracked locally on each camera, but these local estimates are sent to a fusion center which combines them to generate more accurate estimates. The final estimates are fed back to all smart cameras, which use them as prior information for tracking in the next frame. The texture-based approach makes our method very robust to illumination changes. We tested the performance of our system on six video sequences, some containing sudden illumination changes and up to four walking persons. The results show that our tracker can track multiple people accurately, with an average tracking error as low as 8 cm, even when the illumination varies rapidly. A performance comparison shows that our method outperforms a state-of-the-art tracking system.
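The abstract specifies texture-change detection against a static background but not the descriptor; image gradients are used below as a simple stand-in, since they are largely insensitive to global illumination shifts. The threshold and feature choice are assumptions.

```python
import numpy as np

def texture_features(img):
    """Simple texture descriptor: vertical and horizontal gradients."""
    gy, gx = np.gradient(img.astype(np.float64))
    return gx, gy

def foreground_mask(frame, background, thresh=10.0):
    """Flag pixels whose local texture differs from the static background.
    Because gradients change little under global illumination shifts,
    the mask stays stable when lights are switched on or off."""
    fx, fy = texture_features(frame)
    bx, by = texture_features(background)
    diff = np.hypot(fx - bx, fy - by)
    return diff > thresh
```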
A new method for segmenting intensity images into smooth surface segments is presented. The main idea is to divide the image into flat, planar, convex, concave, and saddle patches that coincide as well as possible with meaningful object features in the image. Therefore, we propose an adaptive region growing algorithm based on low-degree polynomial fitting. The algorithm uses a new adaptive thresholding technique with the L∞ fitting cost as a segmentation criterion. The polynomial degree and the fitting error are automatically adapted during the region growing process. The main contribution is that the algorithm detects outliers and edges, distinguishes between strong and smooth intensity transitions and finds surface segments that are bent in a certain way. As a result, the surface segments corresponding to meaningful object features and the contours separating the surface segments coincide with real-image object edges. Moreover, the curvature-based surface shape information facilitates many tasks in image analysis, such as object recognition performed on the polynomial representation. The polynomial representation provides good image approximation while preserving all the necessary details of the objects in the reconstructed images. The method outperforms existing techniques when segmenting images of objects with diffuse reflecting surfaces.
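A stripped-down version of the growing loop, with two simplifications worth flagging: an ordinary least-squares fit whose maximum residual approximates the true L∞ (Chebyshev) cost, and a fixed degree and threshold instead of the paper's adaptive ones.

```python
import numpy as np

def poly_design(pts, degree):
    """Design matrix of 2-D monomials x^i * y^j with i + j <= degree."""
    x, y = pts[:, 0], pts[:, 1]
    cols = [x**i * y**j
            for i in range(degree + 1) for j in range(degree + 1 - i)]
    return np.column_stack(cols)

def linf_fit_cost(pts, vals, degree):
    """Least-squares polynomial fit to the intensities, returning the
    maximum absolute residual (a proxy for the L-infinity cost)."""
    A = poly_design(pts, degree)
    coef, *_ = np.linalg.lstsq(A, vals, rcond=None)
    return np.abs(A @ coef - vals).max()

def grow_region(img, seed, degree=2, tau=8.0):
    """Greedy region growing from a (row, col) seed: a 4-neighbour is
    added while the polynomial still fits the enlarged region."""
    region, frontier = {seed}, [seed]
    while frontier:
        y, x = frontier.pop()
        for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
            if (ny, nx) in region or not (0 <= ny < img.shape[0]
                                          and 0 <= nx < img.shape[1]):
                continue
            cand = region | {(ny, nx)}
            pts = np.array([(cx, cy) for cy, cx in cand], float)
            vals = np.array([img[cy, cx] for cy, cx in cand], float)
            # Accept unconditionally until the fit is overdetermined.
            if len(cand) <= 6 or linf_fit_cost(pts, vals, degree) <= tau:
                region.add((ny, nx)); frontier.append((ny, nx))
    return region
```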
In order for organic bulk heterojunction solar cells to compete with traditional inorganic cells, higher power conversion efficiencies are desirable. A characteristic of organic solar cells is their narrow absorption window compared to the absorption band of inorganic semiconductors. A possible way to capture a wider band of the solar spectrum, and thus increase the power conversion efficiency, is to use two or more solar cells with different bandgaps in a row, referred to as a multi-junction solar cell. In this article, we study the theoretical efficiency potential of three organic cells in a row, i.e. a triple-junction. We study the influence of the energy levels of donor and acceptor, as well as different absorption windows of the subcells. We study the light harvesting potential not only of the usual monolithic configuration, but also of a stacked set-up. Ideal material characteristics are obtained from these calculations, giving an idea of how the ideal organic triple-junction cell should look. An interesting result is that it is not necessary to develop photovoltaic organic materials with an absorption window broader than 300 nm for triple-junctions, because hardly any efficiency gain can be achieved with a broader absorption window.
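The monolithic-versus-stacked comparison boils down to how subcell photocurrents combine; the sketch below makes that concrete with a flat toy spectrum and idealized absorption windows (the actual study works from donor/acceptor energy levels and realistic spectra).

```python
import numpy as np

def subcell_current(wavelengths, photon_flux, window, eqe=1.0):
    """Photocurrent (arbitrary units) of a subcell that absorbs every
    photon inside its window [lo, hi] (nm) with efficiency eqe."""
    lo, hi = window
    mask = (wavelengths >= lo) & (wavelengths <= hi)
    return eqe * np.trapz(photon_flux[mask], wavelengths[mask])

# Toy spectrum: flat photon flux between 300 and 1100 nm.
wl = np.linspace(300, 1100, 801)
flux = np.ones_like(wl)

windows = [(300, 600), (600, 850), (850, 1100)]   # three subcells
currents = [subcell_current(wl, flux, w) for w in windows]

# Monolithic (series) stack: the worst subcell limits the current.
# A mechanically stacked set-up wires the cells independently, so
# their contributions simply add.
print("monolithic-limited current:", min(currents))
print("stacked total current:    ", sum(currents))
```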
Real-time tracking of people has many applications in computer vision, for instance in surveillance, domotics, elderly care and video conferencing, and typically requires multiple cameras. However, this problem is very challenging because of the need to deal with frequent occlusions and environmental changes. Another challenge is to develop solutions which scale well with the size of the camera network. Such solutions need to carefully restrict overall communication in the network and often involve distributed processing. In this paper we present a distributed person tracker that addresses the aforementioned issues. Real-time processing is achieved by distributing tasks between the cameras and a fusion node. The latter fuses only high-level data based on low-bandwidth input streams from the cameras: tracking is first performed on the image plane of each camera, after which only metadata is sent to a local fusion node. We designed the proposed system for a low communication load and for robustness. We evaluate the performance of the tracker in meeting scenarios where persons are often occluded by other persons and/or furniture. We present experimental results which show that our tracking approach is accurate even in cases of severe occlusion in some of the views.
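The abstract leaves the fusion rule unspecified; as an illustration of how little data needs to cross the network, a fusion node could combine per-camera ground-plane estimates as simply as a confidence-weighted average. The tuple format below is invented.

```python
import numpy as np

def fuse_positions(estimates):
    """Toy fusion step: combine per-camera ground-plane estimates of one
    person into a single position, weighting each camera by its reported
    confidence. `estimates` is a list of ((x, y), confidence) tuples;
    only this metadata crosses the network, never image data."""
    pos = np.array([p for p, _ in estimates], float)
    w = np.array([c for _, c in estimates], float)
    w = w / w.sum()
    return tuple(w @ pos)

# Example: three cameras report slightly different positions.
print(fuse_positions([((1.0, 2.0), 0.9), ((1.2, 2.1), 0.6),
                      ((0.9, 1.9), 0.8)]))
```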
Most commercially available photovoltaic solar cells are crystalline silicon cells. However, in indoor environments, the efficiency of silicon solar cells is poor. Typically, the light intensity under artificial lighting conditions is less than 10 W/m2, compared to 100-1000 W/m2 under outdoor conditions. Moreover, the spectrum differs from the outdoor solar spectrum and there is more diffuse than direct light. Taking into account the predicted lower production costs of organic solar cells, indoor applications could be a niche market for organic PV. In this article, we study the influence of the narrow absorption window, characteristic of organic solar cells, under different indoor conditions. This comparison is made for typical artificial light sources, i.e. a common incandescent lamp, an LED lamp, and a "warm" and a "cool" fluorescent tube, which are compared to the outdoor AM 1.5 spectrum as reference. The comparisons are done by simulation, based on the quantum efficiencies of the solar cells and the light spectra of the different light sources. A classical silicon solar cell is used as reference. In this way we determine the suitability of organic solar cells for indoor use.
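The simulation principle described, the overlap of a light source's spectrum with a cell's quantum efficiency, can be sketched in a few lines. The spectra and QE curves below are crude toy shapes, not the measured data used in the article.

```python
import numpy as np

def short_circuit_current(wl, spectrum, qe):
    """Relative short-circuit current: overlap integral of a source's
    photon flux with a cell's quantum efficiency, both sampled on the
    wavelength grid wl (nm)."""
    return np.trapz(spectrum * qe, wl)

wl = np.linspace(300, 1200, 901)

# Crude stand-ins: silicon responds up to ~1100 nm; a typical organic
# cell has an absorption window roughly 300 nm wide.
qe_si = np.where(wl < 1100, 0.9, 0.0)
qe_organic = np.where((wl > 450) & (wl < 750), 0.7, 0.0)

# Narrow-band lamp vs. broad outdoor-like spectrum (toy shapes).
lamp = np.exp(-((wl - 550) / 60.0) ** 2)
outdoor = np.ones_like(wl)

for name, src in [("lamp", lamp), ("outdoor", outdoor)]:
    ratio = (short_circuit_current(wl, src, qe_organic)
             / short_circuit_current(wl, src, qe_si))
    print(f"{name}: organic/silicon current ratio = {ratio:.2f}")
```

Even with these toy curves, the ratio improves indoors: a narrow absorption window costs less when the source itself is narrow-band.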
In this work we present the use and properties of a transformation uncertainty polytope for a frequently encountered problem in computer vision: registration in visual inspection. For each feature point in the reference image, a corresponding feature point must be identified in the test image among many candidates. A convex polytope is used to capture the uncertainty of the transformation, which maps the reference feature points to uncertainty regions in the test image in which the candidate matches are to be found. By checking, for pairs of possible matches, whether they are consistent with the transformation uncertainty, we construct a consistency graph. The consistency graph gives us the necessary information to distinguish the good matches from the rest. Based on the best matches, we compute the registration transformation.
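A reduced illustration of the consistency-graph idea: if the transformation uncertainty is simplified to a pure-translation box (the paper uses a general convex polytope over transformations), two candidate matches are consistent when they imply nearly the same translation, and good matches cluster into a densely connected subgraph.

```python
import numpy as np
from itertools import combinations

def consistent(m1, m2, tol=3.0):
    """Two candidate matches (ref_pt, test_pt) are consistent when they
    imply (nearly) the same translation; here the uncertainty polytope
    is simplified to a box around a pure translation."""
    t1 = np.subtract(m1[1], m1[0])
    t2 = np.subtract(m2[1], m2[0])
    return bool(np.all(np.abs(t1 - t2) <= tol))

def best_matches(matches, tol=3.0):
    """Build the consistency graph and greedily keep the match that is
    consistent with the most others, together with its neighbours."""
    n = len(matches)
    adj = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if consistent(matches[i], matches[j], tol):
            adj[i].add(j); adj[j].add(i)
    best = max(range(n), key=lambda i: len(adj[i]))
    return [matches[i] for i in adj[best] | {best}]
```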
Given a set of lines, line grouping considers the problem of deciding which lines are likely to belong to the same object or to a set of similar objects, before any recognition of objects has actually taken place. Vision scientists have suggested a number of factors that may be involved in the grouping process of lines, among which proximity, parallelism and collinearity are the easiest to quantify. These properties have often been measured by empirical estimates. Previous work, however, has shown that it is also possible to follow a more systematic approach based upon the uncertainty of pixel positions. Thus we can give precise definitions regarding the parallelism, collinearity or concurrency of lines whose parameters are only known to lie within given regions in the parameter space of lines. In this work we generalize this framework and show how it can be used during an entire line grouping process.
Several geometric concepts from affine geometry have their counterparts in digital geometry. We define and discuss the digitization of three important concepts: parallelism, collinearity, and concurrency of digital straight lines. Their main characteristic is that in the digital plane these properties become Helly-type theorems, which means that they express a geometric relation holding for an entire collection of geometric objects in terms of simpler geometric relations that must hold for subcollections. For example, in the digital plane we can show that a collection of digital lines is parallel if and only if each of its 2-membered subcollections consists of two digital lines that are parallel. Thus parallelism in the digital plane is more complicated than it is in ordinary affine geometry. Appropriate definitions for digital parallelism and concurrency have many applications in digital image processing. For example, they provide an appropriate setting for verifying whether lines detected in a digital image satisfy the constraints imposed by a perspective projection. Furthermore, the existence of Helly-type properties has important implications from a computational viewpoint. In fact, these theorems ensure that in the digital plane parallelism, collinearity, and concurrency can be detected in polynomial time by standard algorithms developed within the field of computational geometry. We illustrate this with several algorithms, where each algorithm solves a particular geometric problem.
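The Helly-type flavour of digital parallelism can be illustrated in one dimension of the parameter space: if each digital line admits an interval of slopes, pairwise intersection of those intervals already guarantees a common slope for the whole collection. The interval representation is a simplification of the parameter-space regions treated in the paper.

```python
def pairwise_parallel(a, b):
    """Two digital lines, each with an interval of admissible slopes,
    are 'digitally parallel' when some common slope fits both,
    i.e. when their slope intervals intersect."""
    return max(a[0], b[0]) <= min(a[1], b[1])

def all_parallel(slope_intervals):
    """Helly-type property in 1-D slope space: the whole collection is
    parallel exactly when every pair is, which for intervals reduces
    to one common-intersection test."""
    lo = max(s[0] for s in slope_intervals)
    hi = min(s[1] for s in slope_intervals)
    return lo <= hi

lines = [(0.48, 0.55), (0.50, 0.60), (0.47, 0.52)]
print(all_parallel(lines))   # True: slopes 0.50-0.52 fit all three
```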
The goal of segmentation is to partition a digital image or curve into segments such that the points in each segment share a common property. For example, we can partition a curve into connected subsets such that the points of each subset lie on a common straight line, or we can partition an image such that the intensity function is linearly varying when restricted to one part. A region growing algorithm starts from a small seed segment, and then repeatedly tries to add new points to this segment. Each time a point is added to the segment, the algorithm verifies whether the segmentation criterion is still satisfied for the enlarged segment; otherwise a new segment is started. In general, the verification of the segmentation criterion becomes increasingly more difficult when the segment gets larger. We propose new linear-time algorithms for region growing. These algorithms are related to the economical design of mechanical frameworks, where the goal is to make a rigid construction with as few bars as possible. According to this analogy, the region growing algorithm tries to attach each new point as firmly as possible to the existing region with a minimal amount of computation. We illustrate this technique for the segmentation of digital curves into straight or parabolic line segments, and for image segmentation with segments of linearly varying intensity.
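A simple but non-linear-time stand-in for the straight-segment case (the paper's contribution is precisely to avoid this re-verification from scratch): a pixel set lies on a digital straight segment when some line passes within 1/2 of every pixel centre, a feasibility problem solved below with an off-the-shelf LP.

```python
import numpy as np
from scipy.optimize import linprog

def fits_digital_line(points):
    """Feasibility in (a, b): does some line y = a*x + b pass within 1/2
    of every pixel centre (x, y)?"""
    A, rhs = [], []
    for x, y in points:
        A.append([x, 1.0]);   rhs.append(y + 0.5)      # a*x + b <= y + 1/2
        A.append([-x, -1.0]); rhs.append(-(y - 0.5))   # a*x + b >= y - 1/2
    res = linprog(c=[0, 0], A_ub=np.array(A), b_ub=np.array(rhs),
                  bounds=[(None, None)] * 2, method="highs")
    return res.status == 0

def grow_straight_segments(curve):
    """Greedy segmentation: extend the current segment point by point
    while the digital-straightness criterion still holds."""
    segments, seg = [], [curve[0]]
    for p in curve[1:]:
        if fits_digital_line(seg + [p]):
            seg.append(p)
        else:
            segments.append(seg); seg = [p]
    segments.append(seg)
    return segments

curve = ([(x, round(0.4 * x)) for x in range(15)]
         + [(x, 10) for x in range(15, 20)])
print([len(s) for s in grow_straight_segments(curve)])
# splits where the jump breaks straightness
```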
An often reoccurring problem in digital image processing is the application of operators from differential geometry to discrete representations of curves and surfaces. We propose the use of feature detectors to improve the estimation of differentials of discrete functions. To this end we replace a differential operator by a bank of feature detectors and difference operators. The purpose of the feature detectors is first to examine the local behavior of the function. Next, depending on the outcome, the feature detectors select the most appropriate difference operator. For example, if the function behaves locally as a linear function, they select a difference operator that is well suited for linear functions. We show that this technique can be put on a firm mathematical basis. In particular, when designing a bank of feature detectors, we use Groebner bases for the functional decomposition and combination of the detectors. We illustrate the mathematical results with several practical examples.
Some properties of digital surfaces, such as straightness or flatness, can be detected by the combined action of a large set of simple local feature detectors. Our goal is to examine how feature detectors can be used to improve the computation of tangents, curvature, and surface normals for digitized curves and surfaces. For continuous surfaces these standard functions can be computed by differential operators. We propose to replace the differential operators by sets of difference operators and feature detectors. For each part of the surface the feature detectors determine which difference operator yields the best approximation for the differential. We show that both the set of feature detectors and the set of difference operators have a rigid mathematical structure. To examine this structure, i.e. the functional decomposition and combination of multiple feature detectors, we use Groebner bases.
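The detect-then-select idea, divorced from the Groebner-basis machinery of these two papers, can be shown for a 1-D derivative: small difference-based probes classify the local behaviour and the matching stencil is applied. The probes and fallback rule below are hand-picked for illustration.

```python
import numpy as np

def derivative_estimate(f, i, h=1.0, tol=1e-6):
    """Estimate f'(i) on a sampled signal f (NumPy array) by first probing
    the local behaviour with small feature detectors, then selecting the
    difference operator suited to the detected local model."""
    window = f[i-2:i+3].astype(float)
    second = window[:-2] - 2.0 * window[1:-1] + window[2:]
    if np.all(np.abs(second) < tol):
        # Locally linear: the two-point central difference is exact.
        return (f[i+1] - f[i-1]) / (2.0 * h)
    if abs(second[2] - 2*second[1] + second[0]) < tol:
        # Low-degree polynomial behaviour: a five-point stencil is
        # exact up to degree four.
        return (-f[i+2] + 8*f[i+1] - 8*f[i-1] + f[i-2]) / (12.0 * h)
    # A detector fired on irregular behaviour (e.g. a jump): fall back
    # to the one-sided difference on the smoother side of the sample.
    left, right = abs(f[i] - f[i-1]), abs(f[i+1] - f[i])
    return (f[i] - f[i-1]) / h if left <= right else (f[i+1] - f[i]) / h
```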
In this paper we present a more human-like approach towards navigating a mobile robot. We maintain that navigation of a mobile robot always occurs within a certain context, which we call a situation. Typical situations might be 'entering a corridor,' 'passing through a door,' 'seeking a goal,' etc. To approximate the navigation behavior of an intelligent agent in such a situation, we define generic situations as collections of pathways. Each pathway describes a possible path followed by that agent in that situation. We further assume that these pathways can be generated by observing a limited set of beacons associated with each situation. Hence, the robot makes use of relative positions only: the distance and bearing of the beacons with respect to the robot, and the distances between the different beacons. This obviates the need for an absolute coordinate system. To limit the number of pathways that need to be stored to describe a generic situation, we propose a competition and cooperation algorithm. To show how this approach fares in realistic circumstances on a real mobile robot, we include preliminary results with a triangulation-based ultrasonic sensor system.
This paper considers the reconstruction of the boundary of a 2D object from one or more digital representations that have been obtained by successive displacements of a digital image sensor. We explain how the reconstruction can be done for known displacements as well as for unknown random displacements that are uniformly distributed. In the latter case we assume that we can solve a certain correspondence problem. The reconstructed boundary can be characterized by a sliding ruler property of which the well-known Lagrange interpolation is a special case. It is also shown that the reconstruction can be exact under certain circumstances. Although we only consider the reconstruction of functions of one variable, the technique can be extended to functions of two or more variables. The technique can also be applied to measurements generated by other digital sensors such as range sensors.
It has been shown that digital algebraic surfaces can be characterized by inequality conditions that follow from Helly's Theorem on convex sets. As a result, we can recognize digital algebraic surfaces by examining the validity of large collections of inequalities. These inequality conditions can be regarded as a natural extension of the chord property which has been proved by Rosenfeld for digital straight lines. In this paper we show that these inequalities can also be used to measure an absolute value distance. They can be used, for example, to measure how far a digital set is from being digitally straight. Since the collection of measurements that must be performed to measure the absolute value distance can be very large, it makes sense to study the mathematical structure of such a collection. We show that it has the structure of a polynomial ideal. For digital straight lines this ideal is generated by a single polynomial.
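As an intuition pump for "measuring how far a set is from digital straightness", one can score the worst violation of Rosenfeld's chord property directly; this numeric proxy ignores the polynomial-ideal structure that the paper develops.

```python
import numpy as np
from itertools import combinations

def chord_violation(points, samples=20):
    """Worst chessboard distance from any chord between two set points to
    the set itself; the chord property holds (and the set is digitally
    straight) when this stays below 1."""
    pts = np.array(points, float)
    worst = 0.0
    for p, q in combinations(pts, 2):
        for t in np.linspace(0.0, 1.0, samples):
            c = (1 - t) * p + t * q
            d = np.max(np.abs(pts - c), axis=1).min()  # chessboard metric
            worst = max(worst, d)
    return worst

print(chord_violation([(0, 0), (1, 0), (2, 1), (3, 1)]))  # nearly straight
print(chord_violation([(0, 0), (1, 2), (2, 0), (3, 2)]))  # clearly not
```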
Digital polynomial curves and surfaces arise from the digitization of algebraic surfaces such as lines, parabolas, planes, or paraboloids. It is known that the n-th difference of a digital polynomial curve of degree n is periodic for polynomials that have rational coefficients. In this paper we consider the following problem: Suppose we have a digital curve S whose n-th difference is known to be periodic. When is S a digital rational polynomial curve? As a solution to this problem we state a simple criterion that can be checked in linear time. As a first application of this criterion we describe a linear time algorithm for the recognition of digital straight lines. In comparison to other algorithms, the advantages of the new algorithm are its simplicity, and its ability to actually find the coefficients of the rational polynomial representing the line. We then go on to discuss the applicability of this criterion to the recognition of digital curves and surfaces of arbitrary degree.
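The periodicity phenomenon is easy to observe experimentally; the snippet below digitizes a rational-slope line and recovers the period of its first difference. The period-search helper is naive and illustrative only, not the paper's linear-time criterion.

```python
def nth_difference(seq, n=1):
    """n-th finite difference of an integer sequence."""
    for _ in range(n):
        seq = [b - a for a, b in zip(seq, seq[1:])]
    return seq

def smallest_period(seq):
    """Smallest p such that seq repeats with period p over the observed
    window (None if no period is found)."""
    for p in range(1, len(seq)):
        if all(seq[i] == seq[i % p] for i in range(len(seq))):
            return p
    return None

# Digitization of y = round(2x/5): the first difference is periodic,
# consistent with a digital straight line of rational slope 2/5.
ys = [round(2 * x / 5) for x in range(20)]
print(nth_difference(ys, 1))                   # pattern 0,1,0,1,0 repeating
print(smallest_period(nth_difference(ys, 1)))  # 5
```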
KEYWORDS: Vision geometry, Image segmentation, Silicon, Digital electronics, Algorithm development, Detection and tracking algorithms, Algorithms, Electronics, Quantization, Chemical elements
In digital geometry we study the properties of discrete representations of geometrical sets; in general, a discrete representation consists of a set of digital points on a rectangular grid. In this paper we consider discrete representations that can be specified by linear inequalities. For example, a digital straight line, and more generally, a digital hyperplane can be specified by an expression that involves two inequalities. First, we describe an elimination method to solve systems of inequalities; it is based on a theorem on convex sets due to Helly. Next, we discuss how this method can be used to derive properties of digital sets. Finally, we illustrate this approach for digital curves. In particular, we show how the chord property for digital straight lines can be extended to digital curves of arbitrary order.
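For concreteness, the double-inequality specification of a digital straight line, with the standard choice of width, can be enumerated by brute force over a bounded range; the Helly-based elimination method for actually solving such systems of inequalities is beyond this sketch, and the parameter names below are conventional rather than taken from the paper.

```python
def digital_line(a, b, mu, omega, x_range):
    """Pixels of the digital straight line defined by the double
    inequality mu <= a*x - b*y < mu + omega; choosing
    omega = max(|a|, |b|) gives the standard naive digitization."""
    pts = []
    for x in x_range:
        for y in range(-100, 101):       # brute force over a bounded range
            if mu <= a * x - b * y < mu + omega:
                pts.append((x, y))
    return pts

# 0 <= 2x - 5y < 5 picks out y = floor(2x/5): one pixel per column.
print(digital_line(a=2, b=5, mu=0, omega=5, x_range=range(10)))
```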