Large scale multiple robot visual mapping with heterogeneous landmarks in semi-structured terrain

This paper addresses the cooperative localization and visual mapping problem with multiple heterogeneous robots. The approach is designed to deal with the challenging large semi-structured outdoors environments in which aerial / ground ensembles are to evolve. We propose the use of heterogeneous visual landmarks, points and line segments, to achieve e ﬀ ective cooperation in such environments. A large-scale SLAM algorithm is generalized to handle multiple robots, in which a global graph maintains the relative relationships between a series of local sub-maps built by the di ﬀ erent robots. The key issue when dealing with multiple robots is to ﬁnd the link between them, and to integrate these relations to maintain the overall geometric consistency: the events that introduce these links on the global graph are described in detail. Monocular cameras are considered as the primary extereoceptive sensor. In order to achieve undelayed initialization required by the bearing-only observations, the well-known inverse-depth parametrization is adopted to estimate 3D points. Similarly, to estimate 3D line segments, we present a novel parametrization based on anchored Pl¨ucker coordinates, to which extensible endpoints are added. Extensive simulations show the proposed developments, and the overall approach is illustrated using real-data taken with a helicopter and a ground rover.


Introduction
There are several cooperation schemes in which the complementarity of aerial and ground robots can be exploited to enhance the efficiency of autonomous robotics operations.For instance, aerial robots can provide the ground robots with information related to the environment, e.g.traversability maps or landmarks, they can localize the ground robots by perceiving them, or provide them communication links with a remote station.Aerial and ground robots can also cooperate in a more symmetric way to achieve a task, such as exploration, surveillance or target detection and tracking.All these tasks can benefit from the enhanced observation capabilities brought by both kinds of robots.
In this context, the ability to build and share environment models among the robots is an essential prerequisite to the development of cooperation schemes.Be it for exploration, surveillance or intervention missions, environment models are indeed necessary to plan and coordinate paths, but also to determine the utility of vantage points, to assess whether robots will be able to communicate or not, and to localize the robots in a common frame.In particular, 3D information on the environment is required: not only the robots evolve in the three dimensions, but the determination of vantage points calls for visibility computations in the 3D space.Also, vision is here the primary sensor to build environment representations: besides the fact that images carry a lot of information on the environment, vision is passive, it has the main advantage to perceive features that are arbitrarily far away, and it can be embedded on-board any kind of drone, even the smallest ones.
Similarly to the single robot case, one can not take for granted the fact that the robots are at all times perfectly localized.Centimeter accuracy RTK-GPS is often disturbed by signal outages 1 , and no on-board sensors can ensure a robust long term localization.Nevertheless, the maps must be spatially consistent: their spatial uncertainties must be compatible with the actual environment's spatial characteristics, and the process of mapping must tolerate the fact that the robots positions are not always perfectly known.This naturally leads to exploit a SLAM approach, in which uncertainties on the environment features and on the robots poses are explicitly managed over time.
But mapping in a multi-robot system brings forth the following additional issues, that have to be tackled in our aerial/ground context: • The mapping algorithms must be distributed among the robots.They can not rely on a complete and permanent communication infrastructure between the robots, as this would impose strong requirements on communication coverage and bandwidth, which can hardly be satisfied in most field applications.
• The structure of the maps built by the various robots must allow to match, register or fuse them, so that they can be shared among the robots.For these purposes, the information on the environment they exhibit must be common to the various robots.In our context the 3D geometry of the environment is the only intrinsic environment characteristic on which one can rely to tackle data association, as it can cope with the fact that the sensors or viewpoints can be very different among the different robots (see example data in Figure 1).
Approach.To tackle these two issues, we aim at defining a 3D multiple robot mapping framework using vision, that is able to handle any localization means, including matching perceived data with pre-existing 3D maps.The produced environment representation must also allow the incorporation of range data such as those provided by stereo vision or Lidars, these sensors being widely used on ground robots.For these purposes, we propose to use 3D points and 3D line segments to represent the environment from sequences of aerial and ground images.These representations should gather most of the environment geometric structure, and is the basis to register and fuse the data acquired from different vantage points, different sensors, and prior 3D information on the environment.
In the direction towards the achievement of such a mapping framework, the contribution of this paper is twofold: • We propose a mapping approach in which robots build series of sub-maps using a classical EKF-based SLAM paradigm.The overall spatial consistency of the maps among the robots is ensured by an optimization process, that takes into account various inter-robot and absolute localization estimates.This approach takes after the work on hierarchical SLAM proposed by Estrada et al in [1], which relies on a hierarchical representation of the map: the global level is an adjacency graph, where nodes are local maps (or "sub-maps"), and edges are relative locations between local maps.In a multi-robot context, the sub-maps are not necessarily built in a sequential manner, and various events can trigger loop closures and later map merging, namely rendezvous between robots, feature correspondences ("matching") and absolute localizations, provided either by GPS fixes or by matches with an a priori existing map [2].These events produce loop closures: the global graph exhibits a cycle, and thus define constraints that allow the system to refine the estimates of the sub-maps origins.We analyze the impact of these events on the overall map structure and propose a way to handle them in a distributed context.
• We exploit 3D points and 3D line segments detected in the images in order to create a environment representation that is invariant to the vantage points.This environment representation is built using bearing-only information.Bearing-only information calls for a dedicated implementation of the EKF-based SLAM approach, which relies on the undelayed initialization.We use the well-known inverse-depth parametrization (IDP) for 3D points and we propose a new parametrization based on anchored Plücker 3D lines.Plücker coordinates are well known in computer vision for their nice projective properties, similar to what happens with homogeneous points.To improve their behavior within EKF-SLAM, we add an anchor to the parametrization as it is done in IDP.
Outline.Section 2 deals with the overall localization and mapping approach to deal with multiple robots.It presents the principle of hierarchical SLAM by means of a probabilistic graph model, and its extension in the multi-robot case, analyzing in particular the various loop closures that can occur between the robots.It also discusses how the mapping process could be distributed among the robots, and simulation results are depicted.Section 3 is devoted to the line segments detected in monocular imagery.It introduces the way such segments can be used as landmarks in a monocular EKF SLAM approach: the anchored Plücker line parametrization is used, the initialization and update processes are depicted, the map-matching with lines is presented, and simulation results are analyzed.Finally, Section 4 presents results obtained with data gathered by a helicopter and a ground rover in a village-like environment, and a discussion concludes the paper.

Related work
The multi-robot SLAM problem has recently gained a lot of attention.Several approaches have been presented to solve this problem: based on extended Kalman filters [3,4,5], information filters [6,7], maximum likelihood estimators [8,9] and particle filters [10,11].
Early work on cooperative localization and mapping extended the well-studied single robot SLAM problem to multiple robots.This extension is straightforward when the robots start from a known location and the information is centralized.In [8] a maximum likelihood algorithm is applied to find the maps that are maximally consistent with sensor data and odometry.They assume the robots start from a known location and a central mapper improves the map by iteratively combining data from the robots.Ref. [10] also assume known location to fuse maps from different robots combining a maximum likelihood with an a-posteriori pose estimator.The robots poses are propagated using a particle filter representation of their belief functions, while the map is compiled in a distributed manner among robots using laser range data.In [3], the authors do not address implementation issues but the theoretically attainable estimation accuracy, by providing bounds for the steady state covariance of the position estimates.
Further developments have been achieved for the case when the robots do not have prior knowledge about their relative locations.In [6] a sparse extended information filter is presented with no knowledge of initial locations.The alignment of local maps into a single global map is achieved by a tree-based algorithm for searching similar-looking local landmark configurations, paired with a hill climbing algorithm that maximizes the overall likelihood by searching in the space of correspondences.Ref. [12] deals with completely unknown start locations by using an adapted particle filter in combination with a predictive model of indoor environments in order to sequentially determine whether and how the partial maps of two robots overlap.
The authors in [4] proposed an algorithm to align maps built by different vehicles that not necessarily overlap.They used the information provided by the robot-to-robot measurements (rendezvous) to obtain the transformation between their coordinate frames.They used an Extended Kalman Filter (EKF) to estimate the robots and landmarks positions.For maps that do overlap, [5] presents an incremental minimum cycle basis algorithm that combined with hierarchical SLAM [1] manages the global level after a map merging is performed.The work of [13] uses a maximum likelihood approach for merging maps with initial unknown correspondences with manifold operations and scan matching.This work subdivides the multi-robot problem into three sub-problems: incremental localization and mapping, loop closure and island merging.This latter work is related to our work in the sense that we also consider this subdivision, although our approach relies on events such as rendezvous, landmark correspondences or GPS fixes to first trigger a loop closure in global level and at the end if required do the map merging.
The theoretical solution for the centralized multi-robot SLAM algorithm is achieved by appending robots together with landmarks to the same state-space, using the same filter.The practical solution requires a central station where all the information is concentrated [8,10].Some efforts have been done in terms of decentralization, where robots broadcast the gathered information and, independently, each robot computes its own pose and full map.A decentralized SLAM approach was proposed by [7] using channel filters.In order to reduce the communication bandwidth the authors proposed the use of the covariance intersection algorithm.A interesting distributed SLAM approach is presented in [14], where communication is limited to an upper trapezoidal matrix which condenses the entire measurement history on the individual robots.More feasible algorithms can achieve distributed mapping using cooperative localization [15,16].Distributed mapping is accomplished by the fact that multiple robots build their own maps.Our solution is related with cooperative localization in the sense that robots only communicate some past robot poses (origins of local maps) to each other.
Range and bearing observation characterized most of the approaches in the literature for multi-robot SLAM [8,10,12,17].The work in [18] is one of the very few work on multi-robot SLAM using vision.The multi-robot terrain mapping algorithm uses localization information to combine vision-based range estimates with an elevation profile across the terrain.Also, in [11] the proposed approach considers the use of a stereo vision system, where again range and bearing observations are acquired.It relies on a fully centralized Rao-Blackwellized particle filter, and only presents simulation results.

Hierarchical SLAM
Different scalable SLAM approaches in which a single vehicle builds multiple local maps have been proposed, mainly to reduce computational complexity and to delay linearization errors until the map merging [19,20,1,21,22].When the maps are merged into a single one, on the basis of either common landmarks between local maps, or simply the sequential constraint, a fully correlated map of the environment is obtained.Successful fast implementations exploiting the topology of the representation to systematically join and fuse local map have been proposed for the single vehicle case, such as trees [23], or binary trees [21].
We formulate the multiple robot localization and mapping problem using sub-maps, in a similar manner to hierarchical SLAM in [1] or hybrid metric-topological SLAM in [22], where there are two levels: local level (sub-maps), and global level (adjacency graph).We make use of the probabilistic graphical models representation to describe our approach.
The global level represents the relationships s j i between local maps i and j.The local level contains the sub-maps, composed of the set of landmarks m i and the current robot pose x i k (at instant k).At a certain point a new local map is generated with the robot pose acting as the new local reference frame (lrf ).Thus the robot pose x i truly represents the relation between the previous map and the new one, and one can set s i i+1 = x i .Other non-correlative relations s j i may be established between maps as we will see.Based on simple frame compositions, information in the world reference frame (wrf) is also available for the origins S i of each map, and for the map itself (X, M) i if it is required.The Bayesian network in Figure 2 shows the representation of this hierarchical/hybrid SLAM.
Figure 2: Graphical model as a simplified Bayesian network that represents the multiple local sub-maps.Sub-maps constitute the local level (x i , m i ) and the global level on top links the local sub-maps (S i ).In this representation there is no information shared between local sub-maps.The map building here is sequential, corresponding to one robot exploring and not closing loops, having always s i i+1 = x i .Loop closures will establish non-sequential links s Local Level.The local level contains the feature-based locally referred stochastic maps, built with the standard EKF-SLAM.The i-th local map is defined by where x i is the current pose of the robot, and is the set of m mapped landmarks, both with respect to the i-th lrf.EKF-SLAM keeps a Gaussian estimate The maps are built sequentially as mentioned above.Once a threshold is passed, either in number of landmarks or in robot's uncertainty, a new map is created.No information is shared between these sub-maps, thus the new map starts in a lrf with the robot's pose and error covariances equal to zero.Each local map stores information in its own lrf.
We assume the control input of the robot and landmark's observation have been already introduced to obtain the local sub-maps.The joint distribution resulting of the local map SLAM process is p(x i , m i ).Thus the m i conditional probability, as shown in Figure 2, is given by, Global Level.The global level is represented as an adjacency graph in which origins of local maps S i in wrf are nodes, and the links between them are the relative transformations s i i+1 .Let us define the global level as the Gaussian state s ∼ N{ŝ; P s } of relative transformations between sequentially built local maps, namely The global origins of the maps in the wrf are computed as the compounding of the previous global origin with the relative transformation between sub-maps, S i+1 = S i ⊕ s i i+1 (a detailed description of the compounding ⊕ and inversion operations for 2D and 3D can be found in [24]).The current position of the robot in wrf is computed as X k = S i ⊕x i k .Also, the global map can be obtained through, Mean and covariances of the Gaussian estimates are obtained by regular linear-Gaussian propagation using the Jacobians of ⊕ and .The conditional probability distribution of S i , as shown in Figure 2, is given by, Figure 3: Bayesian network of the hierarchical SLAM for two robots.Two disconnected sub-graphs, one for each robot.
Considering the relative transformations between local maps as past robot poses, we note that the global level can be viewed as a sparse delayed-state pose-SLAM [25], where local maps are like landmarks hanging from robot poses in wrf as shown in Figure 2. The main difference is due to the fact that the state-space in our case contains relative poses s i−1 i , instead of absolute poses S i (that will be naturally correlated) as in the pose-SLAM case when the map features are marginalized out.
Loop Closure.At the global level, a loop closure corresponds to a cycle in the graph, that appears for instance when a relative position estimate between non-consecutive sub-maps is established by a map matching process.Such a cycle defines a constraint between a series of relative transformations: Given that h(s) is not linear due to the angular terms, the enforcement of this constraint can be formulated as a nonlinear constrained optimization problem such that, For multiple l loops h(s) is formed by all the constraints h(s) = [h 1 (s) . . .h l (s)] .The maximum a posteriori likelihood solution for instance, could be based on the Iterative EKF as presented in [1].As a consequence, the part of the state involved in the loop closure at global level becomes correlated, resulting in a non-sparse covariance matrix P s .

Multiple robots
A hierarchical/hybrid SLAM approach in the multi-robot case is quite straightforward: each robot manages a set of sub-maps and a global graph of poses.This approach lends itself nicely to the case of multiple robots as shown in the Bayesian network of Figure 3.But the interests of multi-robot mapping arise of course when the robots exchange mapping or position information, which permits the enhancement of the spatial consistency and the construction of a multi-robot global graph of map poses.Indeed, when robots meet or communicate, they can estimate their relative position or match maps that overlap: these kind of events generate connections between the individual global graph.The possible events are the following: • robots rendezvous (Figure 4(a)), • match common information within sub-maps (Figure 4(b)), • receiving external information that provide absolute localizations (e.g. a GPS fix (Figure 4(c)), or feature matches with an existing environment model).
The latter is not exactly a direct multi-robot loop closure, but it provides a link between a lrf and a global georeferenced frame, which in turn establishes a link with any other robot that has already been absolutely localized once.
Whereas in a single robot case a loop closure only occurs when the robot revisits and matches a previously mapped place, in a multi-robot case these events trigger loop closures: any cycle that appears in the overall graph defined by the concatenation of each robot graph (the multi-robot graph) is a loop closure.The compounding of all relative transformations that defines a cycle is equal to zero as in (7), and a nonlinear optimization over the transformations can be performed.Note that to obtain a cycle in the graph defined by the concatenation of two robots global levels, at least two events between these robots are required.
Robot Rendezvous.The event occurs when a robot observes another robot (partial rendezvous) or when both robots observe each other (full rendezvous).We focus on the case when the relative transformation between two robots is fully recovered, from the information obtained through a partial or full rendezvous.That is, whenever this event occurs, it leads to an immediate observation of the relative transformation between locations of the two vehicles.New local maps are created at the instant of the rendezvous, then the current robot poses are "promoted" to the global level, i.e., s i = x i−1 k and s j = x j−1 k .In this way, the observed transformation z naturally produces a link between the maps' origins S i and S j on the global level; z = s i j .Figure 5 shows the Bayesian network representation of this event.Note that in most of the cases, multiple observations of the relative transformation between two robots would be needed to recovered the full pose (position and orientation).This will require a dedicated implementation of a delayed-state filter for instance.

Matching common information.
There are two different ways in the literature to match common information.The first one uses signal information independent from the SLAM estimates, e.g.image's descriptors matching (SIFT, SURF), image indexing techniques [26] or scan matching.A common way to produce a map of poses (Pose SLAM [25]) is to find the rotation and translation between two robot poses using one of these techniques, as opposed to tracking features.The second way to match common information is using the available information in the SLAM maps (landmarks' position and uncertainty in global level using ( 5)), which is triggered usually based on the current position of the robot or robots in the wrf.We refer the first approach as "data-matching" (image-matching in the vision case), and the second as "map-matching".
The image-matching produces a link directly between images, that are associated to certain poses S i and S j , as shown in Figure 6(a).This is a simple, but effective manner to obtain the relationship between two robots, or even to close a loop with a single robot.The observations are independent from the previous mapped information, but the robot poses have to be part of the global graph when this event occurs.As in the rendezvous case, this event produces directly the missing link z = s i j : practically, two current images matching is equivalent to a rendezvous.The map-matching requires the transformation between lrf to be recovered based on previously built maps.It could require both local maps to be transformed to a common frame for data-association, e.g.promoted to the global level as M i and M j using (5) (see Figure 6(b)).The matching process happens in the 3D space, being the wrf or the lrf s.The disadvantage of this method is that in the worst case the absolute position of the two sub-maps must be computed.Moreover, once the maps are matched, they have to be fused into a single one, otherwise it could lead into inconsistencies when merging all the maps.The reason is that the local maps involved turn out not to be conditionally independent2 m i m j | z, as shown in Figure 6(b).
Absolute Localization.In an aerial/ground context, it is reasonable to assume that one or both kind of robots might receive GPS fixes from time to time 3 .The relative transformation provided by a GPS fix for the pose , where G is the geo-referenced frame and new local map is started.Such information provides a link between a lrf and a global geo-referenced frame, and can generate a loop at the graph level for an individual robot.Figure 7 shows a graphical model representation of this event.
Map Merging.Merging sub-maps into a global map is not necessary for the robots to operate, as this is not required to maintain the consistency on the graph.Should one require a global map, the map fusion could be delayed until all possible loop closures are performed, e.g at the end of the mapping process.For that purpose, we exploit an approach similar to the Divide & Conquer algorithm [21].Maps are changed to a common reference frame and merged two by two, and the fusion considers the common map information in covariance form.
Impact on the sub-maps.From the point of view of a hierarchical SLAM formulation, the hierarchical nature of this model manifests itself in the ability to apply a loop-consistency constraint to a subset of local transformations (e.g. a single loop) without taking into account the local sub-maps.Particularly, when no information is shared between sub-maps, which is the case between two sub-maps built by different robots with strictly independent observations but an approximation for the sub-maps with non-independent observations, the origin of the local sub-map is the only state that changes after the constraint is applied.We are interested in formulating this important property in terms of the properties of the graphical model.Let us consider the conditional independence property of the graph in Figure 2. It can be easily shown that , given the relative transformations, the consecutive local sub-maps are independent, such that where ( 3) is directly applied.Note that the global poses S i are d-separated [27] from all possible paths between the pair of sub-maps m i−1 and m i or even M i−1 and M i .Also, in the multi-robot case it can happen that two different events create a link on the same node, i.e., if a mapmatching is established after a rendezvous.To avoid this problem, new sub-maps are started after an event occurs for the robots involved.In the case of receiving GPS fixes once in a while, the fact of starting a new local map at the instant k when the fix is received, removes the dynamics aspects of the internal local maps setting a fix pose at the global level.Note that in practice GPS fixes could be received at high frequencies.In this case not every measurement should create an event.
Similarly, to avoid counting information twice if one eventually wants to merge all the sub-maps, after a mapmatching event both sub-maps should be fused into a single sub-map.This has the disadvantage that the sub-maps must be shared among the two robots, but on the one hand this is a pre-requisite for at least one robot to establish the matches, and on the other hand such events will occur when the robots are within communication range.
The main advantage to exploit a hierarchical map structure in multi-robot mapping is the low communication bandwidth required among the robots: only the individual graphs need to be exchanged to update the multi-robot graph.Most importantly is that in the general case, only marginal distributions of each node have to be communicated, as opposed to the full joint distributions of the graph.

Distributed constraints enforcement
The presented approach is distributed in the sense that each robot builds its own local sub-maps.The only information that is shared between robots is the set of relative transformations between sub-maps s i i+1 (the marginals that form the unconstrained global graph s ∼ N{ŝ, P s }).With this information, each robot builds its own graph of global origins and enforces its own constraints separately.
In the centralized case (e.g. with a central server), the loop constraint can be enforced whenever a cycle in the graph is found.For the distributed case, if a constraint is enforced by one robot and not the others, further global information exchanges will lead to incompatible graphs among robots.Therefore, a loop constraint should only be enforced when it is known by all the robots (see Figure 8).Nevertheless, any constraint can be enforced locally, provided the graph state before the constraint application is memorized, which allows to backtrack to a globally consistent graph when new information is received from other robots.
Several cycles might appear in the graph when an event happens.Our algorithm deals with this issue enforcing only the constraints over the minimal cycle, as shown in Figure 8.This approximation is performed to enforce only one cycle at a time in order to accelerate the incremental online solution.The minimal cycle guarantees less linearization errors and focuses only in the current mapped area.The offline centralized solution should use instead (9) for all cycles in the graph.The smallest cycle between the new connected nodes is searched right after the link is added.Not only the graph, but also a record of all the new links related with any event are sent to the robots.This is done especially for the robots that are not involved in the loop closure.
The sequence of steps of the algorithm is as follows: In the example of Figure 8, three robots are considered.Each robot searches for cycles in the graph until the information of the other two has been communicated.For the sake of clarity the robot n is indicated as the superscript of the global origin i.e., S n i .Note that in Figure 8c) r 1 does not have any cycle yet (step 2(b)), while r 2 and r 3 do, after having received the information of the other two robots previously (step 2(c)).It is not until Figure 8d), when r 1 receives the information from the missing robot that can impose its own constraint, and therefore re-localize itself.
Algorithms complexity.Within the local maps, the EKF-SLAM is O(m 2 ), with m the number of landmarks in each submap.At the global level, finding the minimum cycle is O(log n), with n the number of nodes of the graph, and the optimization of the minimum cycle of g nodes is O(g), without the full computation of P s that is O(n 2 ).

Simulation results
We use simulations to benchmark the algorithms under controlled conditions, which allows us to compare the estimated values against perfect ground truth and therefore to conclude on the consistency of the solution.
Simulation results are analyzed for three settings: rendezvous, map matching, and the distributed full collaboration of three robots with several loop closures triggered by the latter events.
The world has 300 landmarks spread along a 10000 m 2 surface.Aerial and ground robots build fixed size local maps with point landmarks.We show robots' trajectories and consistency plots before the final global map merging.Figure 8: Example of the graph for the distributed algorithm with three robots.The dash-lines represent events between robots.For the sake of simplicity, we also consider dash-lines as communication links.Highlighted areas represent nodes involve in the optimization.The algorithm automatically searches for the minimum cycle between the new connected nodes after an event occurs.
In each sub-map, the poses and their uncertainties are expressed in the associated lrf s: in order to plot the global positions, we compound the poses and the covariances to obtain them in the wrf.
The robots make bearing-only observations: the inverse-depth parametrization (IDP) is used to map 3D point landmarks, with the parameters ρ init = 0.5 m −1 and σ ρ = 0.5 m −1 [28].The bearing-only observation model has a 0.2 • standard deviation, and the initial parameters for the three experiments are shown in Table 1.Robots are controlled by linear u and angular ω velocities on their own planes with the following odometry noise model: The uncertainty of each robot is considered for the 6 DOF pose vector (x, y, z, ψ, θ, φ).
To evaluate the consistency of the approach, we performed a 25-runs Monte Carlo analysis of the normalized estimation error squared (NEES) of the current robot pose in wrf as explained in [29].Figure 11(a) shows the average NEES for the global pose of both robots and the single-sided 95% region for the N * 6 DOF during this setting.The average NEES shows good consistency for the full simulation, thanks to the use of sub-maps.For robot r 2 , the average NEES is small compared with the consistency limits after the loop closure.This is because some uncertainty in the orientation was added by sensing each other (see Figure 9(c)), while the estimation error remain the same.
Map Matching.For this experiment the two robots move clockwise in a circle.Robot r 1 moves in the direction of the initial position of r 2 .Once r 1 reaches that position it is able to close a loop, as seen on Fig. 10(a).Data association is known.One of the main issues of this experiment is to compute the transformation between the two local maps which have common landmarks, because the least-square estimation is very sensitive to points whose localization is not accurate.The problem is even more difficult for bearing-only observations, when newly initialized points can be anywhere on the bearing line.The eigenvalues of each point's covariance matrix are computed, and points whose eigenvalues are below a threshold (e.g.0.1 m 2 ) are selected to estimate the transformation.The relative transformation is recovered using the approach in [30].
The results of a single run experiment are presented on Figure 10.A first loop is closed at 380 s, which increases the precision of the global localization of r 2 .At 520 s the two robots close another loop.Note that those two loops have mostly improved the global localization of r 2 .When r 2 reaches its origin, it closes a loop with its first map, at around 610 s: that map being topologically connected to the maps of r 1 , a slight improvement on the localization of r 1 can be seen through its error covariance (Figure 10(b)).remains pretty much the same, showing consistency for both robots.
Distributed collaboration between three robots.We show here the effects of multiple loop closures between three robots that communicate their unconstrained global maps whenever an event occurs.Two ground robots r 1 and r 2 move along circles in different locations on the environment.They never meet, and their maps never overlap.The third robot r 3 is an aerial robot that moves in a circular trajectory which extends from the area covered by r 1 to the one covered by r 2 .Each robot simulation runs in its own process.
The three robots start at a well known location, then at 240 s, r 1 and r 3 have a rendezvous, later at 610 s and 700 s the robot r 3 detects a loop closure with two maps of r 2 (the video slamnet.mp45shows a run of this simulation).The robots communicate the unconstrained global graph when a local map is created.Perfect communication is assumed.The uncertainties are expressed in the wrf so that the effects of the loop closure can be seen.Notice the robots only have information about their own landmarks.The consistency plots for a single run are shown in Figure 12.The final map and global graph before and after merging (at the end of the simulation) are shown in Figure 13.

Estimating line segments in the monocular EKF-SLAM framework
Most of the visual SLAM approaches rely on landmarks defined by interest points detected in the images, be they Harris, CenSurE, SIFT, SURF or other features.The estimated state of such landmarks is their 3D position, and the resulting maps are sparse collections of 3D points: even though they are spatially consistent and span the world's three dimensions, such maps do not represent useful information to asses volumes, to compute visibilities, or to establish matches either with maps built by another robot or with existing 3D maps now widely available.
To improve the map's representational power, one of the key points here is to use, in addition to sparse 3D points, linear landmarks or line segments.Lines provide an improved semantic over points: they inherently contain the notions of connectivity (they span from one position to another) and boundary (they separate one region from another), which open the door to potential automatic interpretations of the environment, both at the local and global levels.A 3D model based on lines can further allow the possibility to build higher level entities (planes, closed regions, objects).
By building such maps, our goal is to aim at building a meaningful 3D model, on the basis of which one can assess visibilities and match features or maps from disparate viewpoints.These goals, however, are difficult to achieve and lie beyond the scope of the work presented in this paper.As recognized by Smith et.al. [31], one of the difficult features to achieve satisfactory SLAM operation with lines is a good lines extractor and matcher algorithm, which is not one of our goals in this paper.We concentrate only on the estimation of the geometry of 3D lines from moving monocular platforms, which constitutes by its own right a problem that has not been successfully solved yet, as we expose in the following section.Wide baseline map matching based on lines is introduced, and demonstrated only in simulation.Real map matching based on lines and map interpretation are left for future work.The focus is therefore on the ability of building heterogeneous maps with points and lines.

Related work
Most approaches to visual SLAM with segments define the segments based on two endpoints that support them, fixing the segment's length so that partially occluded segments remain shorter than they actually are regardless of the evidence gathered with new observations.Additionally, the majority of these works make use of delayed initialization techniques, limiting the potential performances of monocular or long-range sensing SLAM.These remarks are detailed in the following paragraphs.
Ref. [32] proposes a model-based monocular SLAM system using line segments and the UKF, but the use of direct-distance parametrization for the unknown endpoints depths leads to delayed initialization.Ref. [31] represents segments with their two endpoints.The support endpoints are coded using inverse-depth parameters during the initialization phase, and are converted to Cartesian representations after convergence.The inverse depth parameters are initially estimated by an external EKF until convergence, thus delaying initialization.Ref. [33] makes use of small edge landmarks, named 'edgelets', associated to a 3D point which is initialized in an undelayed manner.Edgelets are typically 15 pixels long, and longer lines must be represented by several edgelets.This intrinsically local definition of edgelets does not possess the connectivity property that we highlighted above (though it does possess local boundary), compromising precisely the representativeness that we are seeking.We look here for representations of arbitrarily long lines with the additional possibility of incrementally updating their endpoints according to new observations.These works have been conceived and demonstrated in indoor setups, often using camera motions that have been purposely chosen to maximize observability (i.e., motions with a significant sideways component looking at objects close by), thus achieving successful operation even with delayed initialization.Our case is radically different as we are dealing with outdoors robots whose trajectories are planned and executed with other priorities in mind, and where objects to be mapped might be at large distances.As we have demonstrated earlier [34,35,36], undelayed initialization allows the robots to use the partial landmark information for localization from the very first observation of a landmark, regardless of the sensor trajectory and the distance to the landmark, greatly helping to constrain the camera localization.In particular, this makes the use of remote landmarks beneficial for long term attitude estimation, minimizing angular drift which is the major cause of inconsistency in EKF-SLAM [29].This constitutes a crucial feature in e.g.aerial vehicles as in our case, especially because, due to the long range sensing requirements and strict energy and payload constraints, only monocular vision seems to be practicable.
Our approach draws on [36], which is in turn based on [37].Ref. [37] performs SLAM with segments using the Euclidean Plücker coordinates (a sub-set of the Plücker coordinates where the direction vector is normalized) to map infinite lines.The major drawback of [37] is a delayed initialization.In [36] we added undelayed operation by using the Plücker coordinates directly, where the Plücker direction vector's norm exhibits inverse-distance behavior.Here we add an anchor to improve linearity, equivalently to what it is shown in [38] for the case of points.
Plücker lines have also been used in many major vision works with straight 3D lines [39,40,41].These works, and other ones referenced therein, are based on Structure From Motion approaches solved offline using non-linear optimization.Online, incremental SLAM methods based on these techniques would have to draw on e.g.[42] (which works for points and edgelets) but they have not yet been reported.

Plücker lines (PL): incorporating the inverse-depth concept to lines
The Plücker coordinates for a line consist of a homogeneous 6-vector in projective space, L ∈ P 5 .In this vector, one can identify two sub-vectors 6 , L = (n : v), with {n, v} ∈ R 3 , with which an intuitive geometrical interpretation of the line in 3D Euclidean space is possible [36] (Fig. 14(a)):  1.The sub-vector n is a vector normal to the plane π supporting the line L and the origin O. 2. The sub-vector v is a director vector of the line.
Pin-hole projection.Given a perspective camera defined by the intrinsic parameters k = (u 0 , v 0 , α u , α v ), a Plücker line in camera frame L C = (n C : v C ) projects into a homogeneous line λ ∈ P 2 in the image plane with the linear expression [39] where K is called the Plücker intrinsic matrix.Its relation to the regular intrinsic matrix K is K ∝ K − .
The two most remarkable properties of the Plücker line are its linear transformation and projection equations and the inverse-depth behavior of the sub-vector v (recall property 3. above), something that allows us to design appropriate undelayed initialization methods for EKF.For further details on the use of Plücker lines in monocular EKF-SLAM see [36].

Anchored Plücker lines (APL): improving on PL linearity
Now, we add an anchor to the parametrization to improve linearity, as it is done for points in the inverse-depth parametrization [28].Anchoring the Plücker line means referring it to a point p 0 in 3D space different from the origin (Fig. 14(b)).The anchor point p 0 is chosen to be the optical center at initialization time.Thanks to this, the Plücker line part (n : v) is correlated to the rest of the map only through the anchor p 0 .As a consequence, on subsequent EKF updates, only the accumulated errors from p 0 to the current camera position T are considered, in contrast with regular Plücker lines where the error accounts for the absolute motion of the sensor from the origin of coordinates.
The anchored Plücker line (APL, Fig. 14(b)) is then the 9-vector: The operations needed to manipulate APL are as follows: Frame transformation.Frame transformation reduces simply to transforming the point p 0 and rotating the vectors n and v.This is accomplished with the affine transformation Un-anchoring.Given an anchored Plücker line Λ = (p 0 , n : v), its corresponding (un-anchored) Plücker line L is obtained by transforming the Plücker part (n : v) from a frame at the anchor (T, R) = (p 0 , I 3 ) to the global frame with (12), and removing the anchor.This reduces to Transformation and projection.Transformation and projection are accomplished with a transformation to the camera frame (the inverse of ( 15)), un-anchoring ( 16), and Plücker projection (13).The three steps can be composed in one single expression with:

Segment endpoints
The line's endpoints in 3D space are maintained out of the filter via two abscissas defined in the local 1D reference frame of the line, whose origin is at the point q = p 0 + (v×n)/(v v), the closest point to the anchor (see Fig. 14(b)).Given the line Λ = (p 0 , n : v) and abscissas {t 1 , t 2 }, the 3D Euclidean endpoints are obtained for i ∈ {1, 2} with 3.5.Back-projection of an APL APL back-projection consists in defining a Plücker line L from a segment observation λ, and anchoring it at the camera position T to obtain an APL Λ.These operations are detailed below.
Back-projection of a Pl ücker line.Here we briefly summarize the development in [36].In the camera frame, the Plücker sub-vector n C resulting from the observation λ is simply the inverse of ( 13), The second sub-vector v C is not measured and must be obtained by injecting prior information.This prior must specify exclusively the 2 DOF that are missing in the observation.It is defined in the 2-dimensional plane and mapped to the plane orthogonal to the observed n C .This plane is spanned by the base matrix E which is obviously orthogonal to n Then, given a prior β ∈ R 2 , the sub-vector v C is obtained linearly with  For convenience, we arbitrarily build E so that the points in the β-plane correspond to lines easy to interpret.With this purpose of intuitiveness we choose the base vectors {e 1 , e 2 } so that β is exactly inverse-distance and e 1 is parallel to the image plane, leading to Anchoring.This step is trivial as we have interest in making the anchor p 0 coincide with the current camera position, which is the origin when we are in camera frame, Back-projection and transformation.The operations above plus the transformation to the global frame (15) can be composed and written as a single-step function of R, T, λ and β, We name this function g() and retain it as it is needed for landmark initialization.In EKF-SLAM, the pose orientation R is often encoded by a quaternion or the Euler angles, which we denote indistinctly by Q.The camera pose C is specified by the vector x = (T, Q) (we neglect here the super-index indicating the local map, x ≡ x i , to make the notation easier to read), in which case the function above becomes which is simply a function of the current state x, the current measurement λ and the provided prior β.

Undelayed APL initialization in EKF-SLAM
To initialize the APL, suppose we have a camera C at location x = (T, Q), with intrinsic Plücker matrix K. x has uncertainties encoded in the map 7 , while K is assumed to be deterministic.A newly detected segment, together with its uncertainty encoded by a covariances matrix, needs to be transformed (via regular covariance propagation based on the Jacobians of the transformation) to a homogeneous line in P 2 .For example, if the segment is detected in the form of two endpoints {u 1 , u 2 }, with noise covariance R u = diag(σ 2 u , σ 2 u ) each, we obtain the homogeneous line's pdf λ ∼ N{ λ; L} by joining the two points λ = u 1 ×u 2 (26) where u i = (u i : 1), R u = diag(R u , 0), and [•] × is the skew-symmetric matrix associated to the cross-product (i.e., [u] × v ≡ u×v), Initialization of the support APL is done with the classical EKF-SLAM method, by linearizing Λ = g(x, λ, β), Eq. (24-25), and providing β as a Gaussian prior.The camera C is part of the i-th local map vector x i m , that we rename here simply x m .Its Gaussian pdf x m ∼ N{x m ; P m } is decomposed in (2), let us rename m ≡ m i the set of existing landmarks in the current (the i-th) local map.
For the prior β we define a Gaussian pdf β ∼ N{ β; B} as shown in Fig. 15.This Gaussian in the β-plane will be conveniently mapped to the 3D space via the transformation matrix E, as we have seen in the previous section.We use the non-isotropic Gaussian shown in Fig. 15(b), described by We further choose d min = 0.75m.We obtain with (24) the landmark's mean and Jacobians with which, being x, λ and β mutually independent, we compute the co-and cross-variances to finally augment the map The endpoints are found by retroprojecting both image endpoints {u 1 , u 2 } onto the Plücker estimate's mean, expressing the solution {p 1 , p 2 } in terms of abscissas {t 1 , t 2 } with (18), and solving for them.Details on these operations and the necessary algebra can be found in [37,36].

Segment EKF correction
The update of the APL is essentially a standard EKF update.The only difficulty, perhaps, is the fact that a proper measure of distance between two lines is not available in the Euclidean sense, and we need to find something practicable to define the innovation.We chose the technique described in [37] which defines the innovation space in polar coordinates (Fig. 16).The observation function h() is defined by composing (17) with the transformation from homogeneous lines λ = (λ 1 , λ 2 , λ 3 ) into polar coordinates (ρ, θ), given by The measured lines are defined by 2 endpoints {u 1 , u 2 } with a fixed covariance R u each.Converting them to polar coordinates is just a matter of composing ( 26) with (35).We remind that an innovation measure y = z − h(x m ) based on a mapping (35) contains some singularities and discontinuities that we need to tackle.See [37] for details.The currently measured segment's endpoints are retro-projected onto the 3D line, and its abscissas {t 1 , t 2 } computed (see Eq. 18).During the line's convergence phase, the segment endpoints abscissas are systematically replaced with the current ones.We use some heuristics to determine when the line estimate is stable enough, and then apply an extending-only policy: each segment's abscissa is updated with the new one only if this lengthens the segment.

Map-matching with 3D lines
Assuming the data-association problem for lines in the 3D Euclidean space is solved, the problem of recovering the transformation between two reference frames, based on a set of 3D lines, is approached using a closed form least-squares solution.
First the APLs are transformed to the Euclidean Plücker coordinates representation using ( 16) and normalized with respect to the line direction as v = v v and n = n v .When removing the anchor, the normal n is defined with respect to the origin of the reference frame.Then ( 12) is used to state the least-squares problem to determine R and T. As in the case of pose estimation from corresponding 3D points, the problem is decoupled to first recover the rotation matrix and latter the translation vector [30].
From corresponding Euclidean PLs the orientation is determined through the following criterion, where N is the number of matched lines between reference frames i and j.The rotation matrix that minimizes (36) is obtained through a singular value decomposition (SVD) algorithm as described in [30] for 3D points.
The translation vector T is obtained by minimizing, thus the explicit translation is recovered as follows,

Simulation results
We simulate a scenario consisting of a wireframe of a house built with 27 segments (Fig. 17 shows the house being reconstructed).A robot with one perspective camera (90 • FOV, 640×480 pix resolution, 0.5 pix error) looking forward approaches from a certain distance at 1.2 m/s, gathering images at 30 fps.In order to observe the scale factor, the robot takes noisy odometry readings with 0.01 m/ √ m and 0.25 • / √ m error.In order to improve consistency, the measurement noise covariance is multiplied by a factor 2, i.e., R u = 2×diag(0.5 2 , 0.5 2 )pix 2 , as suggested in [29].To evaluate the estimation's consistency, a Monte Carlo analysis of the normalized estimation error squared (NEES) of the robot 6D-pose is made (Fig. 18(a)).The averaged NEES after 25 runs shows good consistency up to frame 100 and a riskier behavior from then on.This is in accordance with [29], which concludes that long-term EKF-SLAM is always inconsistent, providing evidence of the necessity of approaches using multiple local maps.Simulation results with multiple local maps are shown Figure 18(b), where the consistency has been clearly improved.
In order to evaluate the map-matching approach presented in Section3.8,two robots are deployed in a simulation setting populated with line segments only (see adjoint video mapMatchingLines.avi8).The robots trajectories do not intersect until the end of the run, however two of their 50 lines sub-maps partially overlap.A map-matching event occurs at frame 377 (data association is known in this setup).The event takes place after 16 anchored Plücker lines from the aerial robot current frame (S 4 ) are matched with a previous map built form the ground robot (S 2 ).Lines with large uncertainties are not taken into account for the least square minimization.The transformation between the lrfs obtained with the proposed approach is ŝ2 4 = (0.73, 14.42, −6.14, −1.178, −0.004, 0.006) in (m,m,m,rad,rad,rad).The real transformation between the reference 4th and 2th frames is s 2 4 = (0.66, 14.56, −6, −1.169, 0, 0) .Note that the orientation and the translation are well-estimated, but most importantly they are consistent.Figure 19 shows the 3D robots location and sub-maps before and after the map-matching event, and figure 20 shows the consistency plots for a single run of this setting.

Setup
Outdoors data acquisitions in a large environment have been conducted to verify the performance of the proposed approach, with the ground robot "Dala" and the helicopter "Ressac" (Figure 21).The environment is an abandoned country village in the south of France, now used as a military training facility -see Figure 1.It is "semi-structured", in the sense that it does not contain as many buildings as an urban area, and the building themselves do not contain many straight lines or perfect planar areas.
The ground robot Dala is an iRobot ATRV platform, equipped with a calibrated stereo-vision bench made of two 1024×768 cameras with a baseline of 0.35m.The helicopter Ressac is controlled by algorithms developed at Onera [43], and is also equipped with a calibrated stereo vision bench made of two 1024×768 cameras, with a 0.9m baseline.
Several thousands of images have been acquired while the helicopter Ressac is automatically achieving a swathing pattern at an altitude of about 40m, and Dala is making loop trajectories in the north-west group of buildings under manual control (Figure 24).

Involved processes
The SLAM algorithms integrate two types of observations from only one camera; image points and image line segments.The mapped points are parametrized as inverse-depth points, and the mapped segments are parametrized as anchored Plücker line segments, as presented Section 3.
Unfortunately, because of engineering issues encountered during the data collection, no inertial or odometric motion estimates are available 9 .As a consequence, we use a visual odometry approach based on stereo vision for the motion prediction steps of the EKF SLAM algorithms -which in turn is the mean trough which the scale is recovered.
At each image acquisition, point observations are firstly processed: the resulting updated motion estimate is exploited by the line segment tracker, and line landmarks observations are then processed.Images are sub-sampled by a factor of 2 before being processed (640 × 480), and a heuristic is used to select the points that will be used as landmarks: the image is regularly partitioned in 3×3 regions, in which one ensures that at least 2 landmarks are being tracked -one or two new interest points are selected as a landmark every time this is not satisfied.As for the lines, only the ones whose length is greater than 60 pixels are retained as landmarks.

Landmark detection, tracking and deletion
Point landmarks are Harris interest points that are matched from one view to the other with the group based matching procedure described in [44]: a first candidate match between two interest points is established using signal information (the two principal curvatures of the auto-correlation function), and confirmed if matches of neighboring points that satisfy geometric constraints on their location in the image are found.
We use different initialization parameters for inverse depth point parametrization with Dala and Ressac.Dala's parameters are ρ init = 0.1 m −1 and σ ρ = 0.2 m −1 , while Ressac's parameters are ρ init = 0.025 m −1 and σ ρ = 0.0125 m −1 (the points are initialized at 40m, which is the helicopter average elevation over the terrain).
To extract and track line segments, we use a model-driven approach, that is very robust to scene and illumination variations, and that does not require the setting of any sensitive parameter (details can be found in [45]).
For the estimation part, the a priori parameters used in the experiment for the APL are β = (0.025, 0), σ β 1 = 0.025 and σ β 2 = 0.0375 for both robots.The prediction of the line segment position in the image, required by the segment tracker, is done using the projection of the 3D line segment into the image frame.
Smart landmark deletion is crucial for maintaining reliable maps consisting of only stable and consistent landmarks.We perform it at the local-map level, based on a test using three counters associated to each landmark: N s , the number of match attempts (searches); N m , the number of matches performed; and N i , the number of validated matches (inliers).A landmark is deleted whenever the condition D  holds true.D() basically ensures sufficient evidence with (N s > 10), and then checks if the landmark appearance OR visibility are unstable, with (N m /N s < 0.5), or if the landmark estimate is inconsistent with the observations, with (N i /N m < 0.5).The three thresholds (10, 0.5, 0.5) could be optimized in some way, however, with these initial values the impact on map quality and estimation consistency is notable.

Local Maps
New local maps are created when 100 landmarks (combining points and line segments) are in the map.Immediately after, the current robot's pose is the new relative transformation in the global graph.Figures 22 and 23 shows examples of local maps built by Dala and Ressac: the maps consist of a set of 3D line segments and 3D points.
Figure 24 shows the estimated trajectories of the robots, superimposed to the aerial image of the area.For this run, no events that link the two robots graphs are considered.

Enforcing loop closures
With this dataset, only very few line segments are mapped by both robots, which precludes the segment map matching.To define loop closing events, we therefore take advantage of the availability of stereovision data, and emulate with 3D point matches two types of events, rendezvous and data-matching (image to image): • The rendezvous is emulated using matches of interest points perceived by the two robots using their current image frames.The 3D coordinates of the points obtained by stereovision yield the possibility to provide an estimate of the relative robot position.The actual trajectory of Dala fits the road, whereas here the trajectory estimate does not, although its origin as been set at the good position over the aerial view.
• The image-matching event recovers the relative transformation between the current robot pose and a past pose from a different robot (origin of a local sub-map, such as in Figure 6(a)), using also matches of interest points between their respective frames.Only key-frames corresponding to the origins of local sub-maps are used for the matching.
Once an event arises, a modified A-star algorithm is used to detect the occurrence of a cycle in the graph, and in particular the minimum cycle on which the loop constraint is applied (the algorithm searches for the minimal length path between the two nodes linked by the event, the existence of this link being ignored for the search).
Figures 25 to 28 show results obtained by the integration of events between Dala and Ressac.Dala starts at the entry of the north-west group of buildings, with no uncertainty in its local map, but also in the wrf : the first Dala sub-map is the origin of the world.Ressac starts above Dala, and heads towards south-east.A first rendezvous event occurs immediately after start, and Ressac is localized in Dala's reference frame.
A second event occurs after Ressac comes back from the south-east village, passing above a place previously visited by Dala.The effects of the image-matching event are shown in Figure 25.The figure also shows the image frames that were evaluated for the matching: new local maps are initiated afterward for both robots.Note that Ressac's uncertainty in height is pretty large, especially before the second event: the visual odometry used as predictions is indeed not very precise in the vertical direction, because Ressac stereo baseline is small with respect to the depth of the perceived points -and the integration of points and lines in the sub-maps does not reduces much the elevation estimates.However, after the data-matching event, the elevation of Ressac and the origins of all the built maps are strongly corrected.No map-merging is done.Figure 26 shows in the same plot the robots' trajectories as estimated by visual odometry, SLAM without any loop closure event integrated, and with the integration of the events.
The final global graph (origins of local maps with associated uncertainties in the wrf), and the robots' trajectory in wrf are shown Figure 27.This figure shows how the origins of Ressac's sub-maps around the same area (sub-map 5 and 27) have a similar uncertainty, as an effect of the loop closing event when it comes back.Note also that Ressac's 3D filtered path is off the trajectory defined by the sub-maps origins (mean of the ellipsoids): the path shown is the result of the EKF at each time instant, that is not corrected by the optimization algorithm.However, the full topology (origins of the local maps) is readjusted, as in a smoothing algorithm.
Finally, Figure 28 presents qualitative results of the proposed approach.Each robot has processed over 1200 images mapping approximately an area of 300 × 300 m.In the second part of the adjoint video airGround.mov10, the robots trajectory in wrf are shown.The south-east group of buildings is easy to spot in the video, it is the place where Ressacc is turning 180 • .Also, the video shows the processed image sequence for Dala and Ressac, along with the 3D landmarks, and local maps' origins in wrf.

Discussion
The contributions proposed in this paper fit well with the objective of deploying cooperative aerial / ground robotic systems: on the one hand, the distributed multi-map approach handles communication losses and can cope with large areas, and on the other hand, maps made of points and lines represent better the environment geometry than only points, and thus yield the possibility to match and fuse data acquired by both kind of robots.
The proposed solution to the localisation and mapping problem is based on a combination of filtering (within the local maps the robots poses are filtered) and smoothing (the origins of the local sub-maps are past robots' poses and whenever an event occurs the past poses are corrected).The mapping problem is therefore relegated within the local sub-maps, and is decorrelated from the global localization problem, making our approach akin to a cooperative localization approach, where robots improve their localization using the information provided by other robots [15].In terms of cooperation, the marginals of graph level is the sole information that must be exchanged between the robots or to a central server.Future work considers the implementation of the distributed approach in a real scenario.The approach is well suited to a multi-robot context, and it can in particular handle all the possible localization means, from odometry to absolute localization with respect to an initial model.The fact that no information is shared between sub-maps leads to a loss of information, as faraway points and lines that are being tracked for long periods of time and between sub-map need to be re-initialized after a new sub-map is created.The conditional independent maps approach (CI-SLAM [46]) palliates this problem in the single robot case.But in the multi-robot case, this solution raises issues that remain to be studied -in particular, sharing information between the robots can not be straightforwardly done at the graph level.
In order to obtain the best localization for each robot in the incremental SLAM solution and to avoid any graph incompatibility between robots, an alternative efficient implementation of a posteriori maximum likelihood solution should be considered instead of the minimal cycle optimization.
The map merging process is not required for the robots to operate, but should one require a global map, it can be done at the end of the mapping process.In our simulations, we applied a fusion in covariance form: fusion in information form [6] has however more interesting properties for decentralized systems, because information in the same reference frame is just added.Moreover, the sparseness of the information matrix's structure can be exploited to reduce communication bandwidth.Combining covariance and information forms might reduce the computation complexity, as shown in [47] which exploits the advantages of both methods.
In order to build landmark maps that are invariant to the vantage point, we have proposed to use 3D points and line segments.An important contribution of this paper is the new line segment parametrization for undelayed initialization, the anchored Plücker line, with which we exploited the two key concepts of inverse-distance and anchoring.The outcome is promising, but not fully satisfactory yet.Ongoing work is a more detailed analysis of different line parametrizations with the aim of further improving line estimation performances, most particularly with regard to filter consistency but also to other important aspects such as endpoints management [48].
The built maps combine inverse-depth points and anchored Plücker line segments.Essentially, as we stated the problem, detected line endpoints do not provide useful information as there is no guaranty that they represent stable 3D points.Therefore, one line in the image conveys the same amount of information as one point: 2DOF.In general, a set of N 3D lines in general configuration provides the same amount of information as a set of the same number of points, also in general configuration.Having both points and lines means that such amount of information can be gathered with, say, N/2 points and N/2 lines.Within a complex scenario, there may be places where mostly points are visible, but other places where mostly lines are visible and trackable (structured scenarios for example).An algorithm that is able to exploit both types of landmarks is therefore in advantage in front of single-type ones.
The use of line segments is very promising, but yields a wireframe model that remains preliminary.Further post-processes can allow the building of a whole surfaces model, e.g. by collapsing point landmarks or by using homographies to verify plane hypotheses generated on the basis of pairs or triplets of coplanar segments.
Finally, much work remain to be done at the control level in order to efficiently deploy these mapping processes among a fleet of heterogeneous robots.For instance, creating a new submap after every detected loop closure becomes totally inefficient when GPS is available, and besides sharing graphs every-time a connection between robots occurs, one must define what additional information has to be transmittede.g. which maps should be associated to trigger a loop closure?Mapping is an active process that must be controlled and supervised: this is even more necessary when it is distributed among several robots.

Figure 1 :
Figure 1: Three aerial images of a village acquired at about 40m altitude (top) and three images of the same area acquired by a ground rover.The red angular sector in the aerial images approximately represents the position and field of view of the ground rover camera.How could these data be registered and fused in a spatially consistent data structure?

ji
that are not present in this figure (see Figs. 5-7 for examples).

Figure 5 :
Figure 5: Bayesian network of rendezvous event.The observation z is the relative transformation between the two nodes.

Figure 6 :
Figure 6: Bayesian network of information matching event.a) Independent information of the map is observed to establish the link between the nodes (e.g.images), without the need to recover the global maps.b) It requires to recover the global maps to perform the data association.

Figure 7 :
Figure 7: Bayesian network of GPS fix event.A link is created to the geo-referenced frame G and the current pose.

1 .
Build local maps 2. When an event occurs (a) Broadcast the unconstrained global graph 4 (b) Append the unconstrained global graph with the new marginals (c) Check if all robots' information have been received (d) Check if there is a cycle • Perform optimization over the minimal cycle.
(a) Event r 1 /r 2 and r 1 /r 3 (b) Event r 2 /r 3 (c) Event r 1 /r 2 and LC r 2 and r 3 (d) Event r 2 /r 3 and LC r 1 and r 3 (e) LC r 1 , r 2 and r 3 (f) LC r 3

r 1 andr 2 Figure 9 :
Figure 9: Rendezvous single run simulation results for one aerial and one ground robots.In a) the odometry is shown in green, real and estimated trajectories are shown in red and blue respectively.3σ ellipsoids are plotted on the basis of each lrf.b) shows the global position errors for each robot and their global 3σ uncertainty bounds.c) shows the global orientation errors for each robot and their global 3σ uncertainty bounds.

r 1 and r 2 Figure 10 :Figure 11 :
Figure 10: Map-matching simulation results for one aerial and one ground robots.See caption of Fig. 9.

3 Figure 12 :
Figure 12: Simulation results for the 3 robots exploration; one aerial and two ground robots.See caption of Fig. 9.

Figure 11 (
Figure 11(b)  shows the average NEES of the robots poses for this setting: after a loop closing the average NEES

Figure 14 :
Figure 14: Geometrical representation of Plücker-based lines, the sub-vectors n and v and the anchor p 0 .(a) The Plücker line L and the origin O define the support plane π.The line's sub-vector n ∈ R 3 is orthogonal to π.The sub-vector v ∈ R 3 is a director vector of the line, and lies on π.This implies n ⊥ v.The closest point to O is Q = (v×n : v v) ∈ P 3 .The distance from L to O is d = n / v , showing that v acts as the homogeneous part of L, thus exhibiting inverse-depth properties.(b) The anchored Plücker line Λ is a Plücker line referred to an anchor p 0 .The closest point of the line to the anchor is q = p 0 + (v×n)/(v v).

3 .
The distance from the line to the origin is given by d = n / v .The basic operations needed to manipulate Plücker lines are detailed below:Frame transformation.Given a (camera) reference frame C specified by a rotation matrix R and a translation vector T, a Plücker line L in global frame is obtained from a line L C in frame C with the linear transformation[39]

(a) Isotropic pdf with line at infin- ity β 1 β 2 β 1
/d min (b) Non-isotropic pdf penalizing lines at the back of the camera.

Figure 15 : 1 , σ 2 β 2 ]
Figure 15: Defining a pdf for the prior β.(a) The Gaussian pdf contains all possible lines at a minimum distance of d min : it is isotropic in orientation, it includes the origin which represents the line at infinity, and d min is at 2σ.For reference, a Gaussian shape is superimposed on the horizontal axis to evaluate the probability values at 2σ and 3σ.(b) An interesting alternative that penalizes lines in the back of the camera is to approximate just the right-hand half of the pdf in (a) (here dashed) by a new Gaussian.A good fit is obtained with β = (1/3d min , 0) and a non-isotropic covariance B = diag([σ 2 β 1 , σ 2β 2 ]) with σ β 1 = 1/3d min and σ β 2 = 1/2d min .

Figure 16 :
Figure 16: The polar coordinates of a line in 2D space

Figure 17 :
Figure 17: The simulated environment consists of a robot approaching a wireframe of a house.The figure shows a) the partially reconstructed house, after approximately 40 frames have been processed.Despite the preliminary state of the map, the structure is already visible.With this fair amount of data, a map made of points (with e.g. one point at each line intersection) would not convey very much information besides localization, because it lacks structural semantics such as connectivity and boundaries.b) Fully reconstructed house with the ground-truth in blue.

r 1 Figure 20 :Figure 21 :
Figure 20: Multiple APL maps with line-matching event single run simulation results for a ground and an aerial robots.In a) the odometry is shown in green, real and estimated trajectories are shown in red and blue respectively.3σ ellipsoids are plotted on the basis of each lrf.b) shows the global position errors for each robot and their global 3σ uncertainty bounds.c) shows the global orientation errors for each robot and their global 3σ uncertainty bounds.

Figure 22 :
Figure 22: Local maps built by the rober Dala (a) and the UAV Ressac (b).

Figure 23 :
Figure 23: Local maps superimposed on a aerial view of the scene: with the rover Dala (a), the UAV Ressac (b), and the combined submaps (c).Yellow ellipsoids represent the endpoints covariances.Note that only line segments with small covariances have been plotted.
Figure 24: SLAM trajectories for Dala and Ressac in open loop, drawn on an aerial view of the experiment area (obtained on www.geoportail.fr).The actual trajectory of Dala fits the road, whereas here the trajectory estimate does not, although its origin as been set at the good position over the aerial view.

Figure 25 :
Figure 25: Top: Image frames from both robots before the event, approximately corresponding to the middle column images of figure 1. Green squares represent interest point currently considered as landmarks, yellow squares represent interest points just initialized as landmarks.The line segments are in blue, with endpoints in red.Yellow ellipses are the uncertainty in the image view.Bottom: Event effect in the global map, the sub maps origins expressed in the wrf are the large ellipsoids -only 3D line segments landmarks are shown here.

Figure 26 :
Figure 26: Comparative trajectory plots: odometry in dash-dot line, open loop run in dashed line and cooperative run for Dala (left) and Ressac (right).

Figure 27 :
Figure 27: Final global graph (the global level) with Dala's trajectory in blue and Ressac's trajectory in red in the wrf.

Figure 28 :
Figure 28: The thick-lines are the estimated trajectories for Dala and Ressac after the integration of two rendezvous/matching events.Ressac's trajectories are shown in red and magenta, and Dala's trajectories are shown in blue and cyan.Note that Dala's SLAM trajectory is now fitted the road.