• Home
  • RSS
  • Github

R, D3.Js And Sna Course

April 05 2013   

I took the SNA course by Lada Adamic in coursera. It's a super interesting course. In fact, I was using the networks only how a visualization tool, and that is what it make me little bit embarrassing because there are more, a lot of more. You can detect communities, know the more centric nodes and a lot of other information. So, there are a lot of reasons to look the course.

By other hand I like the d3 javascript library. Recently I was learning javascript, so I decided make a very little app to keep learning this library and show differents measures of centrality for each node in a set of 4 toy networks and see these measures by size, color or a label. 

In this respositoy you can find the R code which create the networks, and export the data in json format.

 

What Does It Say About R?

February 18 2013   

The last post I show a way to plot a gexf file in R using the rgexf package and the Sigmajs library. Now we need some data to use that piece of code. So I've decided obtain the tweets about R. For this I've used the twitteR package and search "#rstats", then clean the texts and extract all the hashtags. Then find the associations between following the next simple rule: if a tweet said: "#rstats and #data are my drugs" this two hashtags are related. Then put some graphics attributes like size of the node according the quantity of mentions, and some random to make the graph more attractive.

Finally make te code run and see the result ;).

There are many tweets about #python in the #rstats 's tweets!. It is obvious see many tweets about data (#data, #bi, #datamining, #bigdata, etc). In other hand there are conversations about #sas, #matlab, and #sastip, and so on.


library(twitteR)
library(stringr)
library(plyr)
library(sna)


# Some tweets about R
tweets <- tolower(twListToDF(searchTwitter(searchString="#rstats", n=1500))$text)
head(tweets)
hashtags_remove <- c("#rstats", "#r")

# Cleaning the tweets
for(term in hashtags_remove) tweets <- gsub(term, "", tweets)

# Extract the hastags
hashtags <- unique(unlist(str_extract_all(tolower(tweets), "#\\w+")))
hashtags <- setdiff(hashtags, hashtags_remove)

# Capture the node size according the amount that appear
nodesizes <- laply(hashtags, function(hashtag){
  sum(grepl(hashtag, tweets))
})


# scaling  sizes
nodesizes <-  1 + log(nodesizes, base = 3)


nodes <- data.frame(id = c(1:length(hashtags)), label = hashtags, stringsAsFactors=F)

#
relations <- ldply(hashtags, function(hashtag){
  hashtag_related <- unlist(str_extract_all(tweets[grepl(hashtag, tweets)], "#\\w+"))
  hashtag_related <- setdiff(hashtag_related, hashtag) 
  if(length(hashtag_related)==0){
    return(data.frame())
  }
  data.frame(source = which(hashtags==hashtag),
             target =  which(hashtags %in% hashtag_related))
})

# Is an undirected graph! So remove the duplicates
for(row in 1:nrow(relations)){  
  relations[row,] <- sort(relations[row,])
}

relations <- unique(relations)


# Some colors
nodecolors <- data.frame(r = sample(1:249, size = nrow(nodes), replace=T),
                         g = sample(1:249, size = nrow(nodes), replace=T),
                         b = sample(1:249, size = nrow(nodes), replace=T),
                         a = 1)


links <- matrix(rep(0, length(hashtags)^2), ncol = length(hashtags))
for(edge in 1:nrow(relations)){
      links[(relations[edge,]$target), (relations[edge,]$source)] <- 1
}

positions <- gplot.layout.kamadakawai(links, layout.par=list())
positions <- cbind(positions, 0) # needs a z axis



graph <- write.gexf(nodes=nodes,
                    edges=relations,
                    nodesVizAtt=list(
                      color=nodecolors,
                      size=nodesizes,
                      position=positions))

plot.gexf(graph)

Having Fun With Rgefx Package And Sigmajs In R

February 12 2013   

The las week I knew the r package rgexf made by George Vega Yon. Rgexf is a R library to work with GEXF graph files. This type of files allow represent networks in a xml. So, if you have a list of nodes and a data frame of edges (source-target) you can obtain a gexf file with write.gexf function. Then you have a gexf file which can open with Gephi or use with Sigma.js to show via web. Simple, right?

Now, if you work with your data and want to visualize your gexf object you must open Gephi or create a simple http server to view your chart using SigmaJS (this is because for security a reason, I guess, like this). For not do this extra step we can use the Rook  R library for print gexf objects more quickly and easy.

Basically we need to make a RHttp element with 2 apps. One app for show the plot (using SigmaJS) and another app to access the data made by write.gefx. We need a template for made the first app, I used the example founded here and I modified a little (here, "save link as"), this index.html must be in your working directory. Now the function is:

 plot.gexf <- function(gexf.object){
      library(Rook)
      graph <- gexf.object$graph
      s <- Rhttpd$new()
      s$start(listen='127.0.0.1')
      my.app <- function(env){
            res <- Response$new()
            res$write(paste(readLines("index.html", warn=F), collapse="\n"))
            res$finish()
      }

      s$add(app=my.app, name='plot')

      my.app2 <- function(env){
            res <- Response$new()
            res$write(graph)
            res$finish()
      }

      s$add(app=my.app2, name='data')
      s$browse('plot') 
}

Now we need to create the gexf object.


nNodes <- 100
nRelations <- 200

nodes <- data.frame(id = c(1:nNodes),
                    names = c(1:nNodes))


allrelations <- as.data.frame(t(combn(nNodes, 2)))
relations <- allrelations[sample(1:nrow(allrelations),
                                 size = min(c(nRelations, nrow(allrelations)))),]
names(relations) <- c("target", "source")

nodecolors <- data.frame(r = sample(1:249, size = nrow(nodes), replace=T),
                         g = sample(1:249, size = nrow(nodes), replace=T),
                         b = sample(1:249, size = nrow(nodes), replace=T),
                         a = runif(nrow(nodes), min=.5, max=1))


nodesizes <- sample(50:500, size=nrow(nodes), replace=T)
edgethicks <- sample(50:500, size=nrow(relations), replace=T)

This is all random, the color, sizes, etc. But if the position of each node is random the visualization will not achive its purpouse: find agglomerations or find groups of nodes more closely between them. For this reason we can use the sna package to find an optimal layout to show the nodes depending the links between them (if you know other packages with more algorithms please email me!).

links <- matrix(rep(0, nNodes*nNodes), ncol = nNodes)
for(edge in 1:nRelations){
      links[(relations[edge,]$target), (relations[edge,]$source)] <- 1
}

library(sna)

positions <- gplot.layout.mds(links, layout.par=list())

positions <- cbind(positions, 0) # needs a z axis

Finally we create the graph with the parameters and we plot it!

graph <- write.gexf(nodes=nodes,
                    edges=relations,
                    nodesVizAtt=list(
                      color=nodecolors,
                      size=nodesizes,
                      position=positions
                    ),
                    edgesVizAtt=list(
                      thickness= edgethicks
                    ))
                    
plot.gexf(graph)

And you'll obtain something like this live example. Have fun ;)!

plot.gefx.example

Update:

The are many algorithms to find layout of the network in the sna package

# positions <- gplot.layout.adj(links, layout.par=list())
# positions <- gplot.layout.circle(links, layout.par=list())
# positions <- gplot.layout.circrand(links, layout.par=list())
# positions <- gplot.layout.eigen(links, layout.par=list())
# positions <- gplot.layout.fruchtermanreingold(links, layout.par=list())
# positions <- gplot.layout.geodist(links, layout.par=list())
# positions <- gplot.layout.hall(links, layout.par=list())
# positions <- gplot.layout.kamadakawai(links, layout.par=list())
positions <- gplot.layout.mds(links, layout.par=list())
# positions <- gplot.layout.princoord(links, layout.par=list())
# positions <- gplot.layout.random(links, layout.par=list())
# positions <- gplot.layout.rmds(links, layout.par=list())
# positions <- gplot.layout.segeo(links, layout.par=list())
# positions <- gplot.layout.seham(links, layout.par=list())
# positions <- gplot.layout.spring(links, layout.par=list())
# positions <- gplot.layout.springrepulse(links, layout.par=list())
# positions <- gplot.layout.target(links, layout.par=list())
Page 1 of 3