Running Clairlib on the CLIC machines


Clairlib is installed on the CS accounts under /home/cs6998/clairlib/. Note: this is the cs6998 account, *not* coms6998.

Scripts

The following scripts should work on the CLIC accounts:

/home/cs6998/clairlib/clairlib-fall07/clairlib-core-1.03.1/util/convert_network.pl
/home/cs6998/clairlib/clairlib-fall07/clairlib-core-1.03.1/util/print_network_stats.pl

Before running the scripts, set the following environment variables (to make them permanent, add them to the .profile file in your home directory):

IN CLIC ACCOUNT: export PERL5LIB=/home/cs6998/clairlib/clairlib-fall07/clairlib-core/lib/:/home/cs6998/clairlib/perl-fall07/:/home/cs6998/clairlib/perl-fall07/lib/perl5/site_perl/5.8.5/
IN CLIC ACCOUNT: export PATH=$PATH:~cs6998/clairlib/clairlib-fall07/clairlib-core/util/

The use of the scripts is demonstrated below with two networks: italian_network.net and comic_porgat.net.

Italian Network

Running on CLIC machines

The network can be viewed at: italian_network.net. The network is in Pajek format. Use the convert_network.pl script to convert it to edgelist format. Note: edgelist files have the extension .graph.

IN CLIC ACCOUNT: /home/cs6998/clairlib/clairlib-fall07/clairlib-core-1.03.1/util/convert_network.pl --output-format edgelist --input italian_network.net --input-format pajek --output italian_network.graph

The edgelist format is simply a list of edges, so nodes without edges are dropped (in this case node 1, PUCCI). Since the network is undirected, each edge is stored in both directions (e.g., 11 10 and 10 11).
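
The conversion idea can be sketched in a few lines of Python. This is only an illustration of the edgelist format, not Clairlib's implementation: the function name pajek_to_edgelist and the four-vertex sample network are hypothetical, and the parser assumes a simple Pajek file with a *Vertices section followed by a single *Edges section.

```python
def pajek_to_edgelist(pajek_text):
    """Return 'u v' lines for an undirected Pajek network, one per direction."""
    lines = []
    in_edges = False
    for raw in pajek_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section headers such as *Vertices or *Edges switch the parser state.
            in_edges = line.lower().startswith("*edges")
            continue
        if in_edges:
            u, v = line.split()[:2]   # ignore an optional weight column
            lines.append(f"{u} {v}")
            lines.append(f"{v} {u}")  # undirected: store both directions
    return lines


# Hypothetical sample: vertex 1 has no edges, like PUCCI in the Italian network.
SAMPLE = """*Vertices 4
1 "A"
2 "B"
3 "C"
4 "D"
*Edges
2 3
3 4
"""

edgelist = pajek_to_edgelist(SAMPLE)
print("\n".join(edgelist))
```

Vertex 1 never appears in the result because only the edge section is emitted, and each edge shows up twice, once per direction.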

The resulting edgelist file can be viewed at: italian_network.graph

The edgelist file is the input to print_network_stats.pl. To see the available options, run print_network_stats.pl without any arguments. In the command shown below, the -i option names the input file (italian_network.graph), the -u option indicates that the graph is undirected, and --all displays the results for all options. The output includes basic statistics, shortest paths, connected components, triangles, and clustering. It is broken up below, with a short description of each part.

IN CLIC ACCOUNT: /home/cs6998/clairlib/clairlib-fall07/clairlib-core-1.03.1/util/print_network_stats.pl -i italian_network.graph -u --all

Basic Statistics

Network information:
  nodes: 15
  edges: 19
  diameter: 5
  average degree: 1.27
  largest connected component size: 15
  Degree statistics:
    degree stats:
      power law exponent: 2.52 r-squared: 0.77
      not a power law relationship (p = 1.95)
  Newman power law exponent: 3.54, error: 1.22
  Clustering:
    Watts Strogatz clustering coefficient: 0.1111
  Newman clustering coefficient: 0.1500
  clairlib avg. directed shortest path: 2.53
  Ferrer avg. directed shortest path: 2.36
  harmonic mean geodesic distance: 2.33
  Assortativity: -0.37
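
As a quick sanity check on the figures above, the reported average degree matches the number of edges divided by the number of nodes, rather than the 2E/N convention often used for undirected graphs. The sketch below only reproduces the arithmetic; the interpretation is an inference from the printed numbers, not documented Clairlib behavior.

```python
# Node and edge counts copied from the output above.
nodes, edges = 15, 19

edges_per_node = edges / nodes        # matches the reported 1.27
conventional = 2 * edges / nodes      # the usual undirected average degree

print(f"{edges_per_node:.2f} {conventional:.2f}")  # 1.27 2.53
```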

Shortest Paths (--paths option)

The matrix below lists the length of the shortest path between every pair of nodes; rows and columns are labeled with node IDs. For example, the shortest path from 12 to 16 has length 2: 12 -> 15 -> 16.

Shortest paths:
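   10 11 12 13 14 15 16  2  3  4  5  6  7  8  9
10  0  1  2  1  2  2  3  3  3  2  2  1  1  2  1
11  1  0  3  2  3  3  4  4  4  3  3  2  2  3  2
12  2  3  0  2  3  1  2  3  2  2  3  1  3  4  3
13  1  2  2  0  1  1  2  3  2  3  3  2  2  3  2
14  2  3  3  1  0  2  3  4  3  4  4  3  3  4  3
15  2  3  1  1  2  0  1  2  1  2  3  2  3  4  3
16  3  4  2  2  3  1  0  3  2  3  4  3  4  5  4
 2  3  4  3  3  4  2  3  0  1  1  1  2  2  5  4
 3  3  4  2  2  3  1  2  1  0  1  2  2  3  5  4
 4  2  3  2  3  4  2  3  1  1  0  1  1  2  4  3
 5  2  3  3  3  4  3  4  1  2  1  0  2  1  4  3
 6  1  2  1  2  3  2  3  2  2  1  2  0  2  3  2
 7  1  2  3  2  3  3  4  2  3  2  1  2  0  3  2
 8  2  3  4  3  4  4  5  5  5  4  4  3  3  0  1
 9  1  2  3  2  3  3  4  4  4  3  3  2  2  1  0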
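
A breadth-first search is one standard way to obtain such path lengths on an unweighted graph. The sketch below is an illustration, not Clairlib's code: the edge list is read off the distance-1 entries of the matrix above, and the helper names build_adjacency and shortest_path are hypothetical.

```python
from collections import deque

# Undirected edges: every node pair whose matrix entry is 1.
EDGES = [
    (10, 11), (10, 13), (10, 6), (10, 7), (10, 9),
    (12, 15), (12, 6), (13, 14), (13, 15), (15, 16),
    (15, 3), (2, 3), (2, 4), (2, 5), (3, 4),
    (4, 5), (4, 6), (5, 7), (8, 9),
]


def build_adjacency(edges):
    """Map each node to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path(adj, source, target):
    """Breadth-first search; returns the node list of one shortest path, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:   # walk back through the BFS tree
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


ADJ = build_adjacency(EDGES)
path = shortest_path(ADJ, 12, 16)
print(" -> ".join(map(str, path)), "| length", len(path) - 1)

# The diameter is the longest of all shortest paths.
diameter = max(len(shortest_path(ADJ, a, b)) - 1 for a in ADJ for b in ADJ)
print("diameter:", diameter)
```

The first line printed is the 12 -> 15 -> 16 path of length 2 from the example, and the diameter agrees with the value of 5 reported under Basic Statistics.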

Connected Components (--components option for connected, --scc option for strongly connected, and --wcc option for weakly connected)

Strong and weak connected components are for directed graphs only.

Connected components: 6 12 15 16 13 14 10 7 5 4 3 2 9 8 11

Triangles (-t or --triangles option)

In this case, 2 4 5 is a triangle because nodes 2, 4, and 5 are pairwise connected: 2 -> 4 -> 5 -> 2.

Triangles (2 triangles, 40 triples):
2 4 5
2 3 4
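
The triangle and triple counts can be reproduced with a brute-force sketch. It is an illustration under stated assumptions, not Clairlib's algorithm: the edge list is read off the distance-1 entries of the shortest-path matrix earlier in this section, and a "triple" is taken to mean a pair of edges sharing a center node.

```python
from itertools import combinations

# Undirected edges of the Italian network (matrix entries equal to 1).
EDGES = [
    (10, 11), (10, 13), (10, 6), (10, 7), (10, 9),
    (12, 15), (12, 6), (13, 14), (13, 15), (15, 16),
    (15, 3), (2, 3), (2, 4), (2, 5), (3, 4),
    (4, 5), (4, 6), (5, 7), (8, 9),
]

adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# A triangle is a set of three nodes that are pairwise connected.
triangles = [
    (a, b, c)
    for a, b, c in combinations(sorted(adj), 3)
    if b in adj[a] and c in adj[a] and c in adj[b]
]

# A connected triple is a center node with two of its neighbors: C(degree, 2).
triples = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())

# Transitivity: three times the triangle count over the triple count.
transitivity = 3 * len(triangles) / triples

print(triangles, triples, f"{transitivity:.4f}")
```

Under these assumptions the sketch finds the same 2 triangles and 40 triples, and the ratio 3 * 2 / 40 = 0.15 is consistent with the Newman clustering coefficient reported under Basic Statistics.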

Local Clustering Coefficient (--localcc option)

This can be viewed in Pajek by loading the graph and then choosing Net -> Vector -> Clustering Coefficients. Draw the resulting vector and choose Options -> Mark Vertices Using -> Vector Values.

Local clustering coefficient for each connected vertex:
6 0
11 0
3 0.333333333333333
7 0
9 0
12 0
2 0.666666666666667
15 0
14 0
8 0
4 0.333333333333333
10 0
13 0
16 0
5 0.333333333333333
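
The values above follow the usual definition of the local clustering coefficient: the fraction of a node's neighbor pairs that are themselves connected. The sketch below is a minimal illustration of that definition, not Clairlib's code; the edge list is read off the distance-1 entries of the shortest-path matrix earlier in this section, the function name local_clustering is hypothetical, and nodes with fewer than two neighbors are assigned 0.

```python
from itertools import combinations

# Undirected edges of the Italian network (matrix entries equal to 1).
EDGES = [
    (10, 11), (10, 13), (10, 6), (10, 7), (10, 9),
    (12, 15), (12, 6), (13, 14), (13, 15), (15, 16),
    (15, 3), (2, 3), (2, 4), (2, 5), (3, 4),
    (4, 5), (4, 6), (5, 7), (8, 9),
]

adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def local_clustering(adj, node):
    """Fraction of neighbor pairs of `node` that are linked to each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumption: no neighbor pairs means coefficient 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


local = {node: local_clustering(adj, node) for node in adj}
average = sum(local.values()) / len(local)

for node in (2, 3, 4, 5):
    print(node, round(local[node], 4))
print("average:", round(average, 4))
```

Node 2, for example, has neighbors 3, 4, and 5, and two of the three possible pairs (3-4 and 4-5) are connected, giving 0.6667. The average over all 15 nodes comes to 0.1111, consistent with the Watts Strogatz clustering coefficient reported under Basic Statistics.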

Superheroes Network

Running on CLIC machines

The network can be viewed at: comic_porgat.net. The network is in Pajek format. As with the Italian network, convert the superheroes network to edgelist (.graph) format:

IN CLIC ACCOUNT: /home/cs6998/clairlib/clairlib-fall07/clairlib-core-1.03.1/util/convert_network.pl --input comic_porgat.net --input-format pajek --output comic_porgat.graph --output-format edgelist

When we run print_network_stats.pl on the full network, we cannot obtain much information because of the network's size:

IN CLIC ACCOUNT: /home/cs6998/clairlib/clairlib-fall07/clairlib-core-1.03.1/util/print_network_stats.pl -i comic_porgat.graph -u

Network is too large (12478 > 2000 nodes) (77844003 > 4000000 edges), please use sampling

Since the network is very large, we analyze a sample of it instead. As before, the -i option names the network file, -u indicates that the graph is undirected, and --all runs all options. In addition, the --sample option selects a random sample of the network, in this case 1000 edges. The graph generated for the sample is also saved for future use as comic_porgat_sample1000.graph using the --output option. The command and some of its output are shown below. Since the amount of output is large, it is redirected to a separate file; the entire output can be viewed at comic_porgat.output. NOTE: this may take a few minutes to run. In addition, your output will not be identical, since the sample is chosen randomly.

IN CLIC ACCOUNT: /home/cs6998/clairlib/clairlib-fall07/clairlib-core-1.03.1/util/print_network_stats.pl -i comic_porgat.graph -u --sample 1000 --all --output comic_porgat_sample1000.graph > comic_porgat.output
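
To get a feel for what sampling does to a network, the sketch below draws a uniform random subset of edges from a synthetic graph and counts the nodes they touch. It is an assumption-laden illustration: how Clairlib actually chooses its sample is not described here, the function name sample_edges is hypothetical, and the 50-node complete graph is made-up test data rather than the superheroes network.

```python
import random


def sample_edges(edges, k, seed=None):
    """Pick k distinct edges uniformly at random (assumed sampling scheme)."""
    rng = random.Random(seed)
    return rng.sample(edges, k)


# Synthetic data: every pair among 50 nodes, i.e. 1225 undirected edges.
all_edges = [(i, j) for i in range(50) for j in range(i + 1, 50)]

sample = sample_edges(all_edges, 100, seed=7)
sampled_nodes = {node for edge in sample for node in edge}

print("edges in sample:", len(sample))
print("nodes touched by the sample:", len(sampled_nodes))
```

A sample of k edges touches at most 2k nodes, and when the sample is sparse relative to the full graph, most of its edges do not share endpoints. This is why a sampled network tends to break into many small components.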

Basic Statistics

Network information:
  nodes: 1727
  edges: 1000
  diameter: 5
  average degree: 0.58
  largest connected component size: 16
  Degree statistics:
    degree stats:
      power law exponent: 3.91 r-squared: 0.99
      not a power law relationship (p = 2.00)
      Newman power law exponent: 6.31, error: 0.94
  Clustering:
    Watts Strogatz clustering coefficient: 0.0000
    Newman clustering coefficient: 0.0000
  clairlib avg. directed shortest path: 1.49
  Ferrer avg. directed shortest path: 0.72
  harmonic mean geodesic distance: 1151.30
  Assortativity: -0.10

Other Useful Scripts

All these scripts can be found in /home/cs6998/clairlib/clairlib-fall07/clairlib-core-1.03.1/util/

Useful Links