= A Discussion of Vertex Indexing = Since switching the default storage selectors from `vecS` to `listS`, I've noticed exactly how few examples there are that actually use this. What's actually troubling about this is the fact that `vecS` secretly provides a `vertex_index` property which, as far as I can tell is used by nearly every algorithm in the library. Not surprisingly, `listS` does not. This can cause a number of misconceptions, namely this [http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Documentation_-_Graph_Library using vertex_descriptor as a 0-based index] post on the Boost wiki. Anyhow, the problem is not that using `listS` as a storage selector really breaks a lot of code (it can), but that its hard to figure out how to create vertex and edge index maps for them. The Dijkstra example has a `listS` example, but it uses property maps instead of bundled properties. Apparently, the solution is to manually initialize the index map. This is both good and bad. First, it's bad because new user's won't really understand why they have to write the same chunk of code over and over. Maybe it would be nice to provide a default `index_vertices()` function and an `index_edges()` functions that do that work for us. On the other hand, It's good because it highlights the instability of indices over descriptors. I think the real I'm having with this is that so much of the documentation and so many of the examples operate on the notion that a `vertex_descriptor` is its own index - which it is most definitely not. Since originally writing this, and continuing to work on the graph libraries, I have come to believe that the indexing required by most algorithms is more of a developmental crutch than a true feature. The sole purpose of requiring an index it mapping from a vertex to a property in constant time. There are certainly other, better ways to do this that don't require an artifical mapping of vertex descriptors to indices, and then to properties. One idea might be to declare external properties as `hash_map`s that map from vertices to properties (Look Ma! No index!). Of course, this is well beyond the scope of my summer project, so obviously, I won't be trying to fully address this problem. Instead, I am going to focus on solutions for just my `[un]directed` classes. The following is a proposal for semi-automated vertex management. == Semi-Automated Index Management == One of the most misleading aspects of the Boost.Graph library is that in many cases we treat vertex descriptors as vertex indices. Many of the examples simply assume that descriptor is an integral type (which in most examples it is) and that the descriptor value magically corresponds to the index into the vector into which the vertex is added (which in most cases it does). Unfortunately, this is quite misleading and can be trivially invalidated using a core algorithm in the following manner: {{{ #!cpp vertex_descriptor u = add_vertex(g); vertex_descriptor v = add_vertex(g); remove_vertex(u, g); breadth_first_search(g, v, make_bfs_visitor(null_visitor())); // CRASH! }}} This will probably result in a segmentation fault because of the algorithm's use of exterior property maps. Specifically, this algorithm uses an externior ''color'' property declared similarly to: {{{ #!cpp vector colors(num_vertices(g)); }}} Unfortunately, the only remaining vertex has index 1, which means when the algorithm runs this line of code (for example): {{{ #!cpp if(colors[get(vertex_index, g, v)] == BLACK) ... }}} Oops. That index isn't in the bounds of the array, and so this may result in a crashed program. Somewhat ironically, this problem is trivially solved by simple ''renumbering'' the vertex indices before calling the algorithm. == The Vertex Indexing Suite == Because of the near-ubiquitous approach of accessing exterior properties via vector indices, it seems logical that the Boost.Graph Library provide facilities for making this easier. This set proposed set of features addresses three common indexing problems: # The fact that `undirected_graph`s, `directed_graph`s, and `adjacency_list`s using `listS` as a vertex storage selector do not have vertex indices by default - which means that there are precious few algorithms these classes will work with out of the box. The current solution is to manually define and manage these property maps. # The easy invalidation of indices on destructive graph operations (i.e, removing a vertex). It should be easy to reset the indices of a graph to a continuous integral range after a number of remove_vertex() operations. # The inability to correctly identify bounds on exterior property map for a graph that has had vertices removed (without first renumbering the indices). The first part of the proposal requires a change to the `undirected_graph` and `directed_graph` class. Specifically, that they provide a default property map for vertex indices. This functionality can be propogated to the `adjancency_list` class (with `listS` vertex storage) at a later time. There are also three new graph functions in the proposed suite. These functions all require their graph parameters to be a model of PropertyGraph, with a a property map for the `vertex_index` as part of the graph itself. `get_vertex_index(v, g)` This is an alias for `get(vertex_index, g, v)` and provided for convenience rather than added functionality. `max_vertex_index(g)` This returns the total number of indices allocated for the vertices of the graph ''g''. The last vertex added to a graph will always have the index `max_vertex_index(g) - 1`. The return type for this function is the value type of the `vertex_index` property map, which is typically an integer type. `renumber_vertex_indices(g)` Renumber the indices in the vertex index map of ''g'' such that the index of each vertex is in the range [0, `num_vertices(g)`). Also, after calling this function, `max_vertex_index(g)` == `num_vertices(g)`. Implementing these three functions will allow more flexible algorithms for mutable graphs. Specifically, this model relaxes many of the constraints that vertices have indices in the range [0, num_vertices(g)). Instead, algorithms can require indices in the range \[0, max_vertex_index(g)). As a downside, this relaxed constraint will result in over-sized vectors whose items may not correspond to vertices within the graph. However, if the algorithms employ vectors as exterior properties, there is probably a good chance that the difference in vector size and capacity will subsume space required by overallocating for missing vertices. Furthermore, correct use of the `get_vertex_index(v,g)` function will promote more correct accessing of said vectors. Since all valid vertex descriptors have valid and correct indices, there is no possibility of accessing an element in an exterior property that does not correspond to a valid vertex. As an added bonus, the explicit name should improve readability significantly and discourage new (and experienced?) developers from doing this: {{{ #!cpp vertex_descriptor v = add_vertex(g); colors[v] = ... }}} which, for non-trivial programs is in fact, incorrect. This proposal is implemented in this form for the `undirected_graph` and `directed_graph` classes. Comments are welcome. === Additional Functionality === One possible criticism of this approach is the absence of automated index management, which only needs to be implemented for the `remove_vertex(g)` method. One solution, might be to simply have that function call `renumber_vertex_indices(g)`, but that's probably overkill since it renumbers ''all'' vertices. If we think about it long enough, we'll eventually realize that we only need to renumber vertices that are ''after'' the removed vertex. So we could implement the `remove_vertex(g)` method to do that, effectively cutting the average cost of removal in half (assuming that on average removals tend toward the middle of the list). Clearly, worst case is removing the first vertex which runs in ''theta''(n) time. Best case is removal of the back vertex which is simply constant. The obvious downside of the previous solution is that it hamstrings programs that performa a number of sequential vertex removals without needing to renumber indices - which is a pretty bad side-effect, but is still a useful function. As such, we could introduce an additional function: `remove_vertex_and_renumber_indices(v, g)` Removes the target vertex and renumbers all indices that follow it, in the order in which vertices were added. The `remove_vertex_and_renumber_indices(g)` method will run in ''theta''(n) time for `setS`, `multisetS`, and `hashS` selected vertex storage (if implemented). Unlike lists and vectors, the order of insertion is lost with these corresponding data sets, meaning that we can't cut the time in half as with above. === Other Side-Effects of Indexing === It should be probably be noted that any solution - even a fully automated index management suite - will still invalidate all exterior properties defined before the removal of a vertex. Once an index has been invalidated, so too are properties stored at those indices in exterior vectors. Exterior properties will need to be regenerated, which is probably a non-trivial task.