wiki:soc/2007/UserFriendlyGraphs/Indexing

Version 2 (modified by Andrew Sutton, 15 years ago) ( diff )

--

A Discussion of Vertex Indexing

Since switching the default storage selectors from vecS to listS, I've noticed exactly how few examples there are that actually use this. What's actually troubling about this is the fact that vecS secretly provides a vertex_index property which, as far as I can tell is used by nearly every algorithm in the library. Not surprisingly, listS does not. This can cause a number of misconceptions, namely this using vertex_descriptor as a 0-based index post on the Boost wiki.

Anyhow, the problem is not that using listS as a storage selector really breaks a lot of code (it can), but that its hard to figure out how to create vertex and edge index maps for them. The Dijkstra example has a listS example, but it uses property maps instead of bundled properties. Apparently, the solution is to manually initialize the index map. This is both good and bad. First, it's bad because new user's won't really understand why they have to write the same chunk of code over and over. Maybe it would be nice to provide a default index_vertices() function and an index_edges() functions that do that work for us. On the other hand, It's good because it highlights the instability of indices over descriptors.

I think the real I'm having with this is that so much of the documentation and so many of the examples operate on the notion that a vertex_descriptor is its own index - which it is most definitely not.

Since originally writing this, and continuing to work on the graph libraries, I have come to believe that the indexing required by most algorithms is more of a developmental crutch than a true feature. The sole purpose of requiring an index it mapping from a vertex to a property in constant time. There are certainly other, better ways to do this that don't require an artifical mapping of vertex descriptors to indices, and then to properties. One idea might be to declare external properties as hash_maps that map from vertices to properties (Look Ma! No index!). Of course, this is well beyond the scope of my summer project, so obviously, I won't be trying to fully address this problem.

Instead, I am going to focus on solutions for just my [un]directed classes. The following is a proposal for semi-automated vertex management.

Semi-Automated Index Management

One of the most misleading aspects of the Boost.Graph library is that in many cases we treat vertex descriptors as vertex indices. Many of the examples simply assume that descriptor is an integral type (which in most examples it is) and that the descriptor value magically corresponds to the index into the vector into which the vertex is added (which in most cases it does). Unfortunately, this is quite misleading and can be trivially invalidated using a core algorithm in the following manner:

vertex_descriptor u = add_vertex(g);
vertex_descriptor v = add_vertex(g);
remove_vertex(u, g);
breadth_first_search(g, v, make_bfs_visitor(null_visitor())); // CRASH!

This will probably result in a segmentation fault because of the algorithm's use of exterior property maps. Specifically, this algorithm uses an externior color property declared similarly to:

vector<color> colors(num_vertices(g));

Unfortunately, the only remaining vertex has index 1, which means when the algorithm runs this line of code (for example):

if(colors[get(vertex_index, g, v)] == BLACK) ...

Oops. That index isn't in the bounds of the array, and so this may result in a crashed program. Somewhat ironically, this problem is trivially solved by simple renumbering the vertex indices before calling the algorithm.

The Vertex Indexing Suite

Because of the near-ubiquitous approach of accessing exterior properties via vector indices, it seems logical that the Boost.Graph Library provide facilities for making this easier. This set proposed set of features addresses three common indexing problems:

# The fact that undirected_graphs, directed_graphs, and adjacency_lists using listS as a vertex storage selector do not have vertex indices by default - which means that there are precious few algorithms these classes will work with out of the box. The current solution is to manually define and manage these property maps. # The easy invalidation of indices on destructive graph operations (i.e, removing a vertex). It should be easy to reset the indices of a graph to a continuous integral range after a number of remove_vertex() operations. # The inability to correctly identify bounds on exterior property map for a graph that has had vertices removed (without first renumbering the indices).

The first part of the proposal requires a change to the undirected_graph and directed_graph class. Specifically, that they provide a default property map for vertex indices. This functionality can be propogated to the adjancency_list class (with listS vertex storage) at a later time.

There are also three new graph functions in the proposed suite. These functions all require their graph parameters to be a model of PropertyGraph, with a a property map for the vertex_index as part of the graph itself.

get_vertex_index(v, g)

This is an alias for get(vertex_index, g, v) and provided for convenience rather than added functionality.

max_vertex_index(g)

This returns the total number of indices allocated for the vertices of the graph g. The last vertex added to a graph will always have the index max_vertex_index(g) - 1. The return type for this function is the value type of the vertex_index property map, which is typically an integer type.

renumber_vertex_indices(g)

Renumber the indices in the vertex index map of g such that the index of each vertex is in the range [0, num_vertices(g)). Also, after calling this function, max_vertex_index(g) == num_vertices(g).

Implementing these three functions will allow more flexible algorithms for mutable graphs. Specifically, this model relaxes many of the constraints that vertices have indices in the range [0, num_vertices(g)). Instead, algorithms can require indices in the range \[0, max_vertex_index(g)).

As a downside, this relaxed constraint will result in over-sized vectors whose items may not correspond to vertices within the graph. However, if the algorithms employ vectors as exterior properties, there is probably a good chance that the difference in vector size and capacity will subsume space required by overallocating for missing vertices. Furthermore, correct use of the get_vertex_index(v,g) function will promote more correct accessing of said vectors. Since all valid vertex descriptors have valid and correct indices, there is no possibility of accessing an element in an exterior property that does not correspond to a valid vertex. As an added bonus, the explicit name should improve readability significantly and discourage new (and experienced?) developers from doing this:

vertex_descriptor v = add_vertex(g);
colors[v] = ...

which, for non-trivial programs is in fact, incorrect.

This proposal is implemented in this form for the undirected_graph and directed_graph classes. Comments are welcome.

Additional Functionality

One possible criticism of this approach is the absence of automated index management, which only needs to be implemented for the remove_vertex(g) method. One solution, might be to simply have that function call renumber_vertex_indices(g), but that's probably overkill since it renumbers all vertices. If we think about it long enough, we'll eventually realize that we only need to renumber vertices that are after the removed vertex. So we could implement the remove_vertex(g) method to do that, effectively cutting the average cost of removal in half (assuming that on average removals tend toward the middle of the list). Clearly, worst case is removing the first vertex which runs in theta(n) time. Best case is removal of the back vertex which is simply constant.

The obvious downside of the previous solution is that it hamstrings programs that performa a number of sequential vertex removals without needing to renumber indices - which is a pretty bad side-effect, but is still a useful function. As such, we could introduce an additional function:

remove_vertex_and_renumber_indices(v, g)

Removes the target vertex and renumbers all indices that follow it, in the order in which vertices were added.

The remove_vertex_and_renumber_indices(g) method will run in theta(n) time for setS, multisetS, and hashS selected vertex storage (if implemented). Unlike lists and vectors, the order of insertion is lost with these corresponding data sets, meaning that we can't cut the time in half as with above.

Other Side-Effects of Indexing

It should be probably be noted that any solution - even a fully automated index management suite - will still invalidate all exterior properties defined before the removal of a vertex. Once an index has been invalidated, so too are properties stored at those indices in exterior vectors. Exterior properties will need to be regenerated, which is probably a non-trivial task.

Note: See TracWiki for help on using the wiki.