Friday, March 23, 2012

pipe with rm

I wanted to delete a large number of files in a directory that satisfied a certain criteria in their name.  The command for this is:


find /path/to/dir -name '*criteria*' -exec rm {} \;

or



find /path/to/dir -name '*criteria*'| xargs rm

or


ls | grep criteria | xargs rm


Note: The last command might be slower than the first two.

Sunday, March 11, 2012

Plotting a random geometric graph using Networkx

I wanted to plot the random geometric graph as shown in networkx gallery with a few tweaks. I wanted to have two plots: 1) A plot of 600 nodes with nodes in only one color and 2) A similar plot of 600 nodes with few (75) nodes highlighted with a different color. I thought it would be an easy process if I download the code and update the code with a few additional lines to perform my work. Unfortunately, when I downloaded the code and executed it (without any changes), the code crashed on the second line with the following error codes:


Traceback (most recent call last):
  File "random_geometric_graph.py", line 6, in <module>
    pos=nx.get_node_attributes(G,'pos')
AttributeError: 'module' object has no attribute 'get_node_attributes'

I checked to see if I had the latest versions of NetworkX and Matplotlib and I did. No idea why a piece of code downloaded directly from the library's official website won't run. To be sure something wasn't wrong in the way libraries were linked in my machine, I decided to recheck the code on another machine . Same error and same crash.. No idea. Finally I decided to check the code in the library itself. Everything with get_node_attributes function seemed to be okay so I copied it to my working file and it failed again! Narrowing down, it looked like the code had problem accessing the node elements and thus the function crashed. Fast forward, I decided to copy the graph generation function and return the attributes simultaneously to a different variable. Here is my final code:



# -*- coding: utf-8 -*-
"""
Generators for geometric graphs.
"""
#    Copyright (C) 2004-2011 by 
#    Aric Hagberg <hagberg@lanl.gov>
#    Dan Schult <dschult@colgate.edu>
#    Pieter Swart <swart@lanl.gov>
#    All rights reserved.
#    BSD license.
from __future__ import print_function

__author__ = "\n".join(['Aric Hagberg (hagberg@lanl.gov)',
                        'Dan Schult (dschult@colgate.edu)',
                        'Ben Edwards (BJEdwards@gmail.com)'])

__all__ = ['random_geometric_graph',
           'waxman_graph',
           'geographical_threshold_graph',
           'navigable_small_world_graph']
from bisect import bisect_leftfrom functools import reducefrom itertools import productimport math, random, sysimport networkx as nx
import matplotlib.pyplot as plt
#---------------------------------------------------------------
#  Random Geometric Graphs
#---------------------------------------------------------------
        def random_geometric_graph(n, radius, dim=2, pos=None):
    r"""Return the random geometric graph in the unit cube.

    The random geometric graph model places n nodes uniformly at random 
    in the unit cube  Two nodes `u,v` are connected with an edge if
    `d(u,v)<=r` where `d` is the Euclidean distance and `r` is a radius 
    threshold.

    Parameters
    ----------
    n : int
        Number of nodes
    radius: float
        Distance threshold value  
    dim : int, optional
        Dimension of graph
    pos : dict, optional
        A dictionary keyed by node with node positions as values.

    Returns
    -------
    Graph
      
    Examples
    --------
    >>> G = nx.random_geometric_graph(20,0.1)

    """
    G=nx.Graph()
    G.name="Random Geometric Graph"
    G.add_nodes_from(range(n)) 
    if pos is None:
        # random positions 
        for n in G:
            G.node[n]['pos']=[random.random() for i in range(0,dim)]
    else:
        nx.set_node_attributes(G,'pos',pos)
    
 
    name = 'pos'
    position_data = dict( (n,d[name]) for n,d in G.node.items() if name in d)
    
    # connect nodes within "radius" of each other
    # n^2 algorithm, could use a k-d tree implementation
    nodes = G.nodes(data=True)
    while nodes:
        u,du = nodes.pop()
        pu = du['pos']
        for v,dv in nodes:
            pv = dv['pos']
            d = sum(((a-b)**2 for a,b in zip(pu,pv)))
            if d <= radius**2:
                G.add_edge(u,v)
                
    return G,position_data


n = 600
d = 0.0725


G,pos = random_geometric_graph(n,d)
print(n,'t',G.number_of_edges())
# find node near center (0.5,0.5)
dmin=1
ncenter=0
for n in pos:
    x,y=pos[n]
    d=(x-0.5)**2+(y-0.5)**2
    if d<dmin:
        ncenter=n
        dmin=d

# color by path length from node near center
#p=nx.single_source_shortest_path_length(G,ncenter)

highlighted_nodes = random.sample(xrange(n),75)
#print(highlighted_nodes)
p = {}

base_color = '#0F1C95'
trust_color = '#36D258'
for i in range(n+1):
        p[i] = base_color
        if i in highlighted_nodes:
                p[i] = trust_color

plt.figure(figsize=(9,9))
nx.draw_networkx_edges(G,pos,nodelist=[ncenter],alpha=0.4)
nx.draw_networkx_nodes(G,pos,nodelist=p.keys(),node_size=100,node_color=p.values(),cmap=plt.cm.Reds_r)

plt.xlim(-0.05,1.05)
plt.ylim(-0.05,1.05)
plt.axis('off')
plt.savefig('highlighted_graph.png')

plt.xlim(-0.05,1.05)
plt.ylim(-0.05,1.05)
plt.axis('off')
nx.draw_networkx_nodes(G,pos,nodelist=p.keys(),node_size=100,node_color=base_color,cmap=plt.cm.Reds_r)
plt.savefig('no-highlight_graph.png')



Reading unicode data through Python

Processing files containing unicode based characters requires using the codec library instead of the standard file processing libraries. The relevant code:


import codecs

file = codecs.open("file_with_unicode_data.txt", "r", "utf-8")
print file.readlines()
file.close()

Difference between two folders

Redirect output of difference between two directories, removing the system files (DS_Store, etc..):

diff -qr dir1 dir2 | grep -v -e 'DS_Store' -e 'Thumbs' | sort > diffs_dir1_dir2.txt