The k-Colorable Graph Generator Manual

1 Over View

This generator may be used to generate quasi-random graphs with or without hidden k-colorings. These graphs can be generated using various mechanisms which vary combinations of the size and variability of size of the hidden color classes, the average edge density, variability of degree sequence, clique clusters, and girth.

The hope is that these will aid in identifying features that can be used to compare and distinguish various coloring algorithms, as well as providing a means of studying those settings which make for particularly hard classes of graphs to color with a specified coloring.

The program is initiated by

     generator filename
and output is to filename. The program prompts interactively for inputs to specify the graph.

2 Graph Output Format

2.1 File Types

This generator produces output in either the DIMACS standard ASCII format or DIMACS standard binary format. Descriptions of these and translators to/from binary formats can be found in Michael Trick's Graph Coloring Instances .

The user is prompted for an input to specify which file type is desired.

2.2 Cheat

For graphs with a hidden k-coloring, there is an option that produces an output called the cheat. This cheat is in the form of comments added to the preamble and looks like
c cheat 100 10
cx 10 11 1 1 0 1 3 2 6 3
cx 5 11 7 6 9 7 14 1 8 9
  .........
The cheat is a set of comment lines, the first using the keyword "cheat", followed by the number of vertices and the number of colors per line in the following portion. Each cx-line contains a sequence of colors from the range[0..k-1], which if applied to the vertices in order 1..n, will be a legal k-coloring. This may be used when present to verify that indeed a k-coloring exists. (The last cx-line may have fewer colors.)

We use this cheat in our coloring program to compare the coloring(s) we find with the hidden coloring. See for example Lawrence Hubert and Phipps Arabie, Comparing Partitions, Journal of Classification 2, 193-218 (1985).

The cheat option is the last requested input from the generator program.

3 Randomization

The program requires the presence of a (pseudo) random number generator (RNG) provided by a system library or the user. We have not experimented with different RNGs, and it would be interesting to know if some of the observed effects are significantly different using different RNGs. Clearly, systems using different RNGs will yield different graphs.

As its second input, the program prompts for a random number seed.

4 Hidden Color Selection

Several options are available for selecting a hidden coloring. These are:
        1 No hidden coloring
        2 Equi-partitioned
        3 k-colorable
        4 k-colorable(smooth)
        5 k-colorable with delta variation
Alternate graph:
        6 Flatgraph

4.1 No Hidden Coloring

As the name suggests, if this option is chosen then no hidden coloring is used. Use this option if you wish for example to have random graphs, or geometric graphs.

4.2 Equi-partite

If this option is selected, then the hidden independent sets will be as nearly equal in size as possible, with the smallest sets being one less than the largest. For example, on a n=1000 node graph, if k=60, then there will be 20 16-sets and 40 17-sets.

For classes tested so far, and using a suite of various algorithms, it appears that these present the most difficulty in obtaining the specified coloring, or good approximations to it.
(See Exploring the k-colorable Landscape with Iterated Greedy)

4.3 K-Colorable Graphs

This class requires as input an integer variability parameter delta, in the range [0..k-1]. The special case delta=0 generates a uniform k-colorable graph. In this case, vertices are randomly assigned to one of the k partition elements uniformly and independently. When coupled with the IID generation described below, the graphs produced are those used in Jonathan S. Turner Almost all k-colorable graphs are easy to color, Journal of Algorithms 9, 63-82,1988. The variation in size of independent sets seems to make these graphs easier to approximately color well, and possibly easier to color exactly.

For variability greater than 0, the class is also called the highly variable k-colorable graphs. For each vertex in turn, the generator chooses an integer h at random from the range [0,delta]. It then assigns the vertex to the partition number i chosen at random from the range [h,k-1]. (We use colors [0...k-1]).

4.4 k-colorable (smooth)

In this variation, the colors assigned to the vertices are computed using kx(ax+1-a) . Here 0<= a<=1 is a variability parameter and x is a random number from the interval [0,1). For a=0 we have the class of uniform k-colorable graphs, as described above.

4.5 k-colorable with delta variation

This variation takes a parameter delta to control the variability of the independent set sizes. A random permutation of the vertices is generated, and then partitioned into k classes such that the ith class has delta more vertices than the i-1th class.

Classes 3, 4 and 5 seem to be easier to approximately color the larger is the amount of the variation in size that is allowed.

After selecting the color class, the user will be prompted for the number of vertices in the graph, and the number of colors, and if one of the highly variable classes the variation control parameter.

4.6 Flat Graphs

As the format suggests, the last option, the FlatGraph class, is somewhat special. The flat graph generator that is included in this package uses the same generation algorithm as that used in the DIMACS version but is enhanced in that now the hidden coloring can be output using the cheat, and some additional parameters of the graph are listed in the comments.

The original Flat Graph Generator together with a description and discussion can be found here. The hidden coloring used in the flat graphs is equi-partite.

5 Graph Types

The program has several methods of assigning edges, which define the differing graph types.
        1 IID (independent random edge assignment)
        2 Girth and Degree Inhibited
        3 Geometric
        4 Weight Biased Graph (encourages or inhibits cliques)
        5 Clique driven
        6 Cycle driven

5.1 IID Graphs

In these graphs, each pair of vertices is considered in turn and assigned an edge with independent identical probability p, provided the two vertices are not in the same hidden color set. The only input required is the probability of assigning edges p. (As noted in section 2, after the graph is completed the user will need to indicate whether or not the hidden coloring is to be output in the cheat.)

5.2 Girth and Degree Inhibited

These graphs require 4 parameters: edge probability p, girth limit g, a ``soft'' degree limit delta, and a hard degree limit Max delta.

The input g indicates that no cycle is to be created in the graph of size less than g. To this end, whenever an edge (v,w) is about to be added to the graph, every pair of vertices x,y which will have distance less than g after the addition is marked as ``blocked''. (x,y) will then never be selected as a candidate edge. Our algorithm for doing this is not very efficient, and so the cost of generating these graphs goes up sharply with g and n.

p is the probability that a pair selected as a candidate edge will be used as an edge. Since limiting the girth already drastically limits the number of edges in a graph, most of our experiments so far use p=1.0.

Max delta and delta are limits set on the allowed variation in the degree sequence. Max delta is a hard limit on the allowed difference between the average degree over all vertices and the maximum degree of any vertex. delta is a soft limit that may be increased if the program detects it is unable to add any more edges at the current setting. The edge selection terminates if delta > Max Delta .

The graphs are generated using two queues of vertex pairs and a blocking matrix. The blocking matrix B is like an adjacency matrix except when B{ij} = 1 it means that the vertex pair (i,j) cannot be used as an edge in the graph. Initially the blocking matrix is assigned 0 everywhere, then 1's are assigned for every pair of vertices which occur in a same set in the specified coloring.

All pairs of vertices are assigned to one queue initially, and then the queue is randomly shuffled. This is to reduce the biases that would otherwise be present if we were to consider the vertices adjacent to another vertex all at once, as would happen in usual traversal methods. As pairs are removed from the queue, they are simply discarded if the edge is blocked in B. Unblocked pairs are tested to see if adding this edge would violate the soft degree constraint. If so they are added to the second queue. Otherwise, with probability p the edge is added, and if added the blocking graph is updated to prevent small cycles.

When the queue is exhausted, the second queue is randomly shuffled and the roles of the two queues are switched. If a pass through the queue fails to add any edges to the graph, the soft limit is increased. The program terminates when either the soft limit exceeds the hard limit, or the queues become empty.

5.3 Geometric

Johnson et al introduce the class of geometric graphs in Optimization by Simulated Annealing: An Experimental Evaluation; Part {II}, Graph Coloring and Number Partitioning, Operations Research 39:378-406, 1991. These graphs are created by uniformly distributing n pairs of numbers (x,y) in the range 0<=x,y<1. Vertices in the graph U{n,r} correspond to these points in the plane, and are joined by edges exactly when the Euclidean distance between the points is less or equal the input radius r. Their description of these mentions that these graphs have ``inherent structure and clustering''.

In our system, we generalize this class. We allow the dimension to vary from D=1 upwards. Then the user is asked for the distance r which determines whether an edge is to be assigned. As in Johnson et al, there are two complementary versions: In the first a pair of vertices is joined by an edge if the D-dimensional Euclidean distance is less than r and the vertices are in distinct partition elements. In the second, the vertices are joined by an edge if the distance is greater than r (and the vertices are in distinct partition elements).

The program asks for a wrap parameter that can be set if desired. If wrap is set, then the definition of distance used is the minimum of the distance between the vertices and the distance that would result by identifying opposite facets of the hypercube. Otherwise we use the usual definition of distance. The reason for this parameter is to reduce the variance in degree that occurs because some vertices are near corners or edges of the space.

Finally, the program is capable of generating an output file readable by ``gnuplot'', if the dimension is D=2 or D=3. The user is prompted for this option.

5.4 Weight Biased Graphs

Weight Biased graphs are designed to restrict the development of larger cliques while allowing smaller ones, and at the same time to maintain flatness criteria, i.e. to reduce large variations in structure that might aid an algorithm. These graphs appear to be at least as hard, if not harder to color than the Flat Graphs.

The first parameter for this class is the edge probability p. This is used to terminate the selection of edges if and when the number of selected edges exceeds pn(n-1)/2. For many settings this bound is never reached.

The next parameter is an integer weight W. This weight is assigned to all pairs of vertices which are not in the same hidden independent set. Pairs in the same set are assigned a weight of 0. Pairs are selected as edges with probability proportional to their weight.

The weights are adjusted under the control of two further parameters, alpha and gamma. When a new edge is selected by the program, gamma indicates the amount of change applied to every pair of vertices incident to the new edge, if the pair has not yet been selected as an edge. Alpha only applies to those not yet selected pairs incident to the new edge which would form a triangle with the new edge and some other previously selected edge.

The adjustment parameters, alpha and gamma, may either be applied multiplicatively or additively. The method of application is requested before the parameter itself.

The reason for allowing multiplicative adjustments was to take the most advantage of the proportional selection. For example, suppose we set W=256 and alpha=0.5. (Ignore gamma for now). Then when a new edge is added, every pair that could form a triangle with this new edge and some other old edge would have its weight cut in half. Thus, on the next selection these pairs would have only half the likelyhood of being selected. After a pair was decremented 9 times, its weight would be zero, (due to truncation) and so would no longer be eligible as an edge.

Although this would allow cliques of size 10, and no larger (to see this consider the last edge added to a clique), in practice the maximum clique size would be far less than 10. This is because the probability is very high that many different incident pairs of edges will be set before a pair of vertices is selected.

If multiplicative controls greater than one are applied, then the generator will induce clustering and larger cliques. The reader is encouraged to imagine and then test what will happen if one control is set to increase the weights, while the other is set to decrease them.

In practice, using an additive weight of -1 for alpha and additive zero for gamma(which means it has no effect) the graphs are very flat with respect to degree variation. These are the ones we have tested so far and found to be very difficult to color well.

The edge selection process will be terminated if the total weight becomes zero.

The data structure used to store and select the edges by weight is an implicit tree similar to that used in binary heaps. This allows the updating of any pair's weight and the selection of a pair proportional to its weight in logarithmic time. This is important when one considers that the algorithm performs O(n^3) updates and O(n^2) selections over n choose 2 pairs.

Weight biased graphs can require a significant amount of CPU time to complete.

5.5 Clique Driven Graphs

For this class, the generator creates cliques of given size. Each h-clique is generated by randomly selecting h color classes, and then randomly selecting one of the vertices of each of the selected classes and joining every pair by an edge. Each clique is generated independently, so edges will get repeated; that is the total edges assigned will usually be less than the number of cliques times the number of edges in a clique.

The first required input is to indicate whether the color classes are to be selected in proportion to their sizes or not. The remainder of the input is a series of pairs of numbers, the first being a number of cliques to generate, and the second the size. An entry of 0,0 terminates the process.

5.6 Cycle Driven Graphs

This is similar to the clique driven graphs, but instead of cliques it generates a set of cycles of specified length. The algorithm generates an h-cycle by randomly generating a non-self-intersecting path, with each vertex from a different color class than the last. When h-1 vertices have been chosen, the last vertex must be in a different color class than the first and the h-1th.

The selection process is done by randomly selecting a vertex and deciding whether it is available for the current cycle. If not, another probe is made. The first input request asks the user to specify how many probes should be made before ending the attempt to extend the current cycle. This is used because it is possible to specify very long cycles which it may not be possible to complete, or the user may wish not to spend the CPU time for very intensive attempts, being satisfied with long paths.

The rest of the inputs are pairs of numbers as in the clique driven graphs, except the size now indicates the cycle length.

6 Conditions of Use

Copyright by Joseph Culberson. Permission is granted to use, copy or modify this program freely for non-commercial purposes with appropriate attribution. Please send corrections, ideas for improvement, interesting results and random thoughts to joe@cs.ualberta.ca.

DISCLAIMER OF WARRANTY. Because the Software is a research work, it is provided "AS IS" WITHOUT WARRANTY OF ANY KIND AND WITHOUT ANY SUPPORT SERVICES. This program was developed on SUN SPARC workstations running UNIX using the Gnu C compiler. It may or may not be portable to other systems.

Thanks

Programmed by Adam Beacham, Denis Papp and Joe Culberson. Thanks to Jonathan Baldwin for a first version of the Geometric Graph generator code.

Joseph Culberson
June 15, 1995