Tag Archives: D3.js

About 43000 results

November 8, 2015 Tuija

For a few days now, I’ve had my Google search archive with me. In my case, it’s a collection of 38 JSON files containing search strings and timestamps. The oldest file dates back to mid-2006, which acts as a digital marriage certificate between me and the Internet giant.

It took no more than 15 minutes for Google to fulfill my wish to get the archive as a zipped file. For more information on How & Where, see e.g. Google now lets you export your search history.

Now, this whole archive business started when I was led to a very nice blog posting by Lisa Charlotte Rost.

I find it fascinating what you can tell about a person just by looking at her searches. Or rather, what kind of narratives s/he builds upon them; publishing all search strings verbatim is not really an option.

Halfway through the 4-week course Intermediate D3 for Data Visualization, the theme is stacked charts. Maybe I could visualize, on a timeline, as a stacked area chart, some aspects of my search activity. But what aspects? What sort of person am I as a searcher?

Quite dull, I have to admit. No major or controversial hobbies, no burning desire to follow the latest gadgets, only mildly hypochondriac, not much interest at all in self-help advice. Wikipedia is probably my number one landing site. Very often I use Google simply as a text corpus, an evidence-based dictionary: “Has this English word/idiom been used in the UK, or did I just make it up or misspell it?” Unlike Lisa, who tells in episode #61 of the Data Stories podcast that now that she lives in a big city, Berlin, she often searches for directions – I do not. Well, compared to Berlin, Helsinki is indeed small, but we also have a superb web service for guiding us around here, Journey Planner. So instead of searching, I go straight there.

One area of digital life I’ve been increasingly interested in – and which this blog and my job blog reflect, too, I hope – is coding. Note, “coding” not as in building software but as in scripting, mashupping, visualizing. Small-scale, proof-of-concept data wrangling. Learning by doing. Part of it is of course related to my day job at Aalto University. For example, now that we are setting up a CRIS system, I’ve been transforming, with XSLT, legacy publication metadata to XML. It needs to validate against the Elsevier Pure XML Schema before it can be imported.

For a few years now, apart from XSLT, the other languages I have been writing in are R and Perl. Unix command line tools I use on a daily basis. Thanks to the D3 course, I’m also slowly starting to get familiar with JavaScript. Python has been on my list for a longer time, but since the introductory course I took at CSC – IT Center for Science some time ago, I haven’t really touched it.

I’m not the only one who googles while coding. Mostly it’s about a specific problem: I need to accomplish something but cannot remember, or don’t know, how. When you are not a full-time coder, you forget details easily. Or, you get an error message you cannot understand. Whatever.

Are my coding habits visible in the search history? If yes, in what way?

The first thing to do with the JSON files was to merge them into one. For this, I turned to R.

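library(jsonlite)

# list.files() expects a regular expression, not a shell glob
filenames <- list.files("Searches", pattern = "\\.json$", full.names = TRUE)
# Parse every file, then serialize the combined list as one JSON document
jsons.as.list <- lapply(filenames, function(f) fromJSON(txt = f))
alljson <- toJSON(jsons.as.list)
write(alljson, file = "g.json")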

Then, just as Lisa did, I fired up Google Refine, and opened a new project on g.json.

To do:

add Boolean value columns for JavaScript, XSLT (including XPath), Python, Perl and R by filtering the query column with the respective search string
convert Unix timestamps to Date/Time (Epoch time to Date/Time as String). For now, I’m only interested in date, not time of day
export all Boolean columns and Date to CSV

Of the language names, R is the trickiest one to filter because it is just one character. Therefore, I need to build a longish Boolean OR expression for that.
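The to-do items above could be sketched in plain JavaScript roughly as follows. This is only an illustration, not the Google Refine expressions actually used: the patterns in languagePatterns are hypothetical guesses, and toDateString assumes the timestamp is given in seconds since the Unix epoch (if the archive stores micro- or milliseconds, divide accordingly first).

```javascript
// Hypothetical filter patterns; the actual search strings used in Refine are not listed here.
const languagePatterns = {
  javascript: /\b(javascript|d3\.js|jquery)\b/i,
  xslt: /\b(xslt?|xpath)\b/i,
  python: /\bpython\b/i,
  perl: /\bperl\b/i,
  // "R" is a single character, so several alternatives are combined with OR.
  r: /(^r\s|\sr$|\sr\s|\bggplot2?\b|\brstudio\b)/i
};

// Returns one Boolean flag per language for a single search string.
function flagQuery(query) {
  const flags = {};
  for (const [lang, pattern] of Object.entries(languagePatterns)) {
    flags[lang] = pattern.test(query);
  }
  return flags;
}

// Assumption: unixSeconds is seconds since the epoch. Keeps the date, drops time of day (UTC).
function toDateString(unixSeconds) {
  return new Date(unixSeconds * 1000).toISOString().slice(0, 10);
}
```

The standalone-letter alternatives for R are deliberately strict, so a query like "perl regex" is not counted as an R search just because it contains the letter r.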

Here I’m ready with R and Date, and checking the results with a text facet on the column r.

Thanks to a clearly commented template by the D3 course leader, Scott Murray, the stacked area chart was easy to do, but only after I had figured out how to process and aggregate yearly counts by language. Guess what – I googled for a hint, and got it. The trick was, while looping over all rows by language, to define an object to store counts by year. Then, for every key (=year), I could push values to the dataset array.
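The aggregation trick can be sketched like this, assuming rows shaped like the exported CSV: a date string plus one Boolean column per language, with Booleans as the strings "true" and "false". The sample rows and the { x, y } shape of the values are assumptions for illustration, not the course template itself.

```javascript
// Hypothetical rows; the real ones come from the CSV exported from Refine.
const rows = [
  { date: "2012-06-01", r: "true", perl: "false" },
  { date: "2012-09-15", r: "true", perl: "true" },
  { date: "2013-02-03", r: "true", perl: "false" }
];

function countsByYear(rows, languages) {
  return languages.map(function (lang) {
    const counts = {}; // year -> number of searches flagged for this language
    rows.forEach(function (row) {
      const year = row.date.slice(0, 4);
      if (!(year in counts)) counts[year] = 0; // keep years with zero hits, too
      if (row[lang] === "true") counts[year] += 1;
    });
    // For every key (= year), push one value to this language's series.
    const values = [];
    Object.keys(counts).sort().forEach(function (year) {
      values.push({ x: year, y: counts[year] });
    });
    return { language: lang, values: values };
  });
}

const dataset = countsByYear(rows, ["r", "perl"]);
```

Initializing every year to zero for every language keeps all series the same length, which a stacked layout generally expects.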

Do the colors of the chart ring a bell? I’m a Wes Anderson fan, and have waited for an excuse to make use of some of the color palette implementations of his films. This 5-color selection represents The Life Aquatic With Steve Zissou. The blues and browns are perhaps a little too close to each other, especially when used as inline font color, but anyway.

Quite an R mountain there to climb, eh? It all started during the ELAG 2012 conference in Palma, Spain. Back then I was still working at the Aalto University Library. I had read a little about R before, but it was the pre-conference track An Introduction to R, led by Harrison Dekker, that finally convinced me that I needed to learn this. I guess it was the ease of installing packages (always a nightmare with Perl), reading in data, and quick plotting.

So what does the large number of R searches tell? For one thing, it shows my active use of the language. At the same time though, it tells that I’ve needed a lot of help. A lot. I still do.

Logo of Mikkeli as a D3 learning object

September 4, 2015 Tuija

After three weeks of the online course Data Visualization and Infographics by Knight Center I’m still very happy that I decided to participate. Alberto Cairo and Scott Murray make a good duo. It’s refreshing to hear Alberto’s solid, learned opinions, and Scott has this talent of lowering the barriers to learning.

This week, we have mostly been binding data to SVG rect elements. Related to this, as a side project of my own, I’ve practised the topic with the logo of the city of Mikkeli. It’s a delightfully colourful skyline, built from rectangles of the same size, with a color palette size of 8. The logo appears on the web site in two places: a tiny one in the header and a much bigger one in the footer.

First, colors. To get values that’d be at least in the right direction, I took a screenshot of the page, opened it in GIMP, and copied the hex values that the color picker tool returned.

Apologies to the graphic designers of the logo! It may well be that the logo follows the CMYK color model, not RGB at all.

Next, data. I pasted the hex values to a spreadsheet, reflecting the 5 x 32 rectangle structure of the logo. The background of the final web page would be ghostwhite (#f8f8ff), so I chose the same to fill the sky behind the skyline, although Mikkeli has #ffffff on their page. Sorry about that, too.

As an example, in the screenshot below, I’ve highlighted the cells that represent the first greenish block of rectangles, starting from the left.

When I had data in CSV, it was time to figure out how to bind them to SVG elements.

It took me quite some time to realize that loading data with d3.csv(), the way we had done in the course so far, and which seemed legit in my case here too, does not in fact preserve the order of the original spreadsheet columns. The result is an array of objects, and the order of the properties within each object is not guaranteed. Thanks to good advice, I switched to d3.text() for loading, and then parsed the data with d3.csv.parseRows() into an array of arrays – and there is order!
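To show the shape of the result, here is a minimal stand-in for d3.csv.parseRows() in plain JavaScript. It is not D3’s implementation: it assumes simple CSV without quoted fields or embedded commas, and the hex values in csvText are made up.

```javascript
// Minimal stand-in for d3.csv.parseRows(); assumes no quoted fields or embedded commas.
function parseRows(text) {
  return text
    .split(/\r?\n/)
    .filter(function (line) { return line.length > 0; }) // skip the trailing empty line
    .map(function (line) { return line.split(","); });
}

// Hypothetical 2 x 3 excerpt of the spreadsheet; no header row.
const csvText = "#f8f8ff,#6ab04c,#6ab04c\n#f8f8ff,#6ab04c,#22a6b3\n";

// An array of arrays: grid[row][column], in the same order as the spreadsheet.
const grid = parseRows(csvText);
```

Because every row is a plain array, both the row order and the column order of the spreadsheet survive, which is what positioning the rectangles relies on.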

After this step, the rest was relatively easy. Note, relatively. I had reasoned, though, that I had better wrap every row of rects inside a group, so that I could define new coordinates for them in one go. The first row would start from x=0, y=0, the second from x=0, y=[height of one rect], the third from x=0, y=[height of two rects], etc.

The way the horizontal positioning of rectangles is determined within groups, is explained (much better than I could do) by a helpful StackOverflow member like this:

Then, for each group, you can add the rectangles, adjusting the horizontal position only (the vertical position was already set in the containing group). Here, you need to bind the inner array elements to each rectangle.
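The group-per-row idea can be sketched without D3 by building the SVG markup as a string. The 10-pixel rectangle size and the function name skylineSvg are assumptions for illustration; the D3 version would do the same positioning through nested data binding.

```javascript
const rectWidth = 10;  // assumed rectangle size in pixels
const rectHeight = 10;

// grid is an array of rows, each row an array of fill colors.
function skylineSvg(grid) {
  const groups = grid.map(function (row, i) {
    // Inside a group, only the horizontal position changes.
    const rects = row.map(function (color, j) {
      return '<rect x="' + (j * rectWidth) + '" y="0" width="' + rectWidth +
        '" height="' + rectHeight + '" fill="' + color + '"/>';
    });
    // The group sets the vertical position for its whole row in one go.
    return '<g transform="translate(0,' + (i * rectHeight) + ')">' +
      rects.join("") + "</g>";
  });
  const width = (grid.length > 0 ? grid[0].length : 0) * rectWidth;
  const height = grid.length * rectHeight;
  return '<svg xmlns="http://www.w3.org/2000/svg" width="' + width +
    '" height="' + height + '">' + groups.join("") + "</svg>";
}
```

Calling skylineSvg with a 5 x 32 array of hex values would produce five groups of 32 rectangles each.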

Thank you, Mikkeli! Here is my take on your skyline. All the best!

Network once again, now with YQL!

April 3, 2014 Tuija

While fiddling with the Facebook network, GEXF and JSON parsing I remembered Yahoo! and its YQL Web Services. With it, you can get a JSON-formatted result from any, say, XML file out there. GEXF is XML.

The YQL query language isn’t that handy if you are interested only in a selection of nodes; the XPath filter is only for HTML files, curiously enough. I wanted the whole story though, so no problem. Here is how the YQL Console shows the result:

With the REST query shown at the bottom of the console, you can e.g. transfer the JSON result to your local machine. In Unix, for example:

curl 'http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20xml%20where%20url%3D%22http%3A%2F%2Fusers.tkk.fi%2Fsonkkila%2Fnetwork%2Ffbmini.gexf%22&format=json&callback=' > gexf.json

The structure is deeper than in the JSON that the Cytoscape D3.js Exporter returns, but the only bigger change the D3 code needs is to have new references from the links/edges to nodes.

As the documentation of force.start() says,

On start, the layout initializes various attributes on the associated nodes. The index of each node is computed by iterating over the array, starting at zero.

This is fine if the source and target attributes in the edge array refer to these index numbers. Here, they do not. Instead, they reference the id attribute of the respective nodes. So I needed to change that, and excellent help was available.
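One common way to make that change is to build a lookup from node id to array index and rewrite each edge with it. The sketch below uses a hypothetical miniature of the data; the ids and the function name linkByIndex are made up, and this is not necessarily the exact fix suggested in the help I got.

```javascript
// Hypothetical miniature of the parsed GEXF data.
const nodes = [{ id: "a1" }, { id: "b2" }, { id: "c3" }];
const edges = [
  { source: "a1", target: "c3" },
  { source: "b2", target: "a1" }
];

function linkByIndex(nodes, edges) {
  // Map each node id to its position in the nodes array.
  const indexById = new Map();
  nodes.forEach(function (node, i) { indexById.set(node.id, i); });

  return edges
    // Drop edges that point at an id missing from the node list.
    .filter(function (e) { return indexById.has(e.source) && indexById.has(e.target); })
    .map(function (e) {
      return { source: indexById.get(e.source), target: indexById.get(e.target) };
    });
}

const links = linkByIndex(nodes, edges);
```

The resulting links use zero-based positions in the nodes array, which is the form the quoted force.start() documentation describes.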

So far so good, but using index numbers to access attribute values isn’t pretty and needs to be done differently. Maybe next time.

Deconstructing Facebook network

April 2, 2014 Tuija

The other day I noticed this tweet about Cytoscape’s D3.js Exporter.

D3.js Exporter released for Cytoscape 3.1.0. You can export Cytoscape networks and tables as D3 compatible JSON: http://t.co/ebWvOxmWPT

— Cytoscape (@cytoscape) March 27, 2014

Because I am currently learning the basics of D3, this sounded interesting to look at more closely.

Cytoscape is a tool for visualizing networks. While Gephi is well known in this area, Cytoscape is not, at least not to me. The first time I heard of it was while watching Data Literacy and Data Visualization, a great collection of videos I mentioned last time.

A year ago, I wrote in a brief post how I put Facebook friends on a network graph – a common visualization those days. What would the same data look like in SVG?

I didn’t want to repeat the whole process, but to continue from the GEXF file. Cytoscape does not support importing this Gephi markup language. However, another XML-based language, GraphML, is on the list. So, I read the GEXF file back into Gephi, exported it as GraphML, and imported that into Cytoscape.

By default, Cytoscape presents the network as a grid. Following the advice from Ohio, I applied the preferred layout (F5). After installing the D3.js Exporter in App Manager, the data was ready for a JSON export.

Mike Bostock, a central figure behind D3, has an extensive collection of examples in his gallery. One of them is on force-directed graphs, and that was exactly what I was after. All I did to get the first version of my D3.js Facebook network was to change the file name in the d3.json() function that imports the data. That was easy!

In this graph, the node labels are numbers and all nodes are of the same color and size. Time to change these to something more visually interesting, and perhaps more informative.

Gephi’s community detection algorithm had provided numbers for the nodes, and stored them in the Modularity_Class attribute. This is an obvious choice for the script when it’s time to decide in which colour the circles ought to be filled. In my case, the displayed name of the node should come not from name but from label, which holds a tiny version of the full name. What about the size of the nodes? Of all attributes available, I decided to try Betweenness_Centrality. Note that you will not find this and a couple of other attributes in the original GEXF; I added them this time by letting Gephi calculate the respective values.

{
  "nodes" : [ {
    "id" : "10162",
    "SUID" : 10162,
    "In_Degree" : 16,
    "PageRank" : 0.010341513363002573,
    "Weighted_In_Degree" : 16.0,
    "Weighted_Degree" : 32.0,
    "selected" : false,
    "name" : "100003621746564",
    "Clustering_Coefficient" : 0.44166666,
    "shared_name" : "100003621746564",
    "Betweenness_Centrality" : 1434.8632653485495,
    "Eigenvector_Centrality" : 0.18212450755372586,
    "etusuku" : "J K",
    "g" : 184,
    "b" : 47,
    "Out_Degree" : 16,
    "label" : "JK",
    "size" : 52.0,
    "Modularity_Class" : 4,
    "r" : 47,
    "Weighted_Out_Degree" : 16.0,
    "Degree" : 32,
    "Eccentricity" : 7.0,
    "y" : 111.5109,
    "Closeness_Centrality" : 2.925,
    "x" : 412.6945
  }

The new version now shows the modularity classes in different colors, and the label pops up as a tooltip when you hover over the circle.
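The mapping from node attributes to visual properties can be sketched as below. The palette is a hypothetical list of colors and nodeStyle is a made-up helper name; in D3 itself an ordinal color scale would typically play the role of the palette lookup.

```javascript
// Hypothetical palette; any list of distinct colors would do.
const palette = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"];

// Picks a fill color by modularity class and uses label as the tooltip text.
function nodeStyle(node) {
  return {
    fill: palette[node.Modularity_Class % palette.length],
    tooltip: node.label
  };
}

// Using two attributes from the node excerpt above.
const style = nodeStyle({ label: "JK", Modularity_Class: 4 });
```

The modulo keeps the lookup inside the palette if there are more classes than colors, at the cost of reusing colors.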

The proportional size of the node tells which of my friends act as “bridges” more than the others do. The normalization is done with the power scale function d3.scale.sqrt(), thanks to Mike’s advice a while back. Contrary to his words though, I set the lower bound to 2 and also tweaked the data. In some nodes, the value of this attribute is 0.0, and these nodes vanish altogether. Not the best way to deal with the issue, I gather. Perhaps I should have left these nodes out of the exercise altogether?
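What a square-root scale does can be written out in a few lines of plain JavaScript. This is a sketch of the idea, not D3’s code, and the domain [0, 1600] and range [2, 22] are assumed numbers chosen only to make the arithmetic easy to follow.

```javascript
// Maps a value from domain to range after taking square roots,
// so that circle area rather than radius grows with the value.
function sqrtScale(domain, range) {
  const d0 = Math.sqrt(domain[0]);
  const d1 = Math.sqrt(domain[1]);
  return function (value) {
    const t = (Math.sqrt(value) - d0) / (d1 - d0); // 0 at domain start, 1 at domain end
    return range[0] + t * (range[1] - range[0]);
  };
}

// Assumed bounds: centrality from 0 to 1600, radius from 2 to 22 pixels.
const radius = sqrtScale([0, 1600], [2, 22]);
```

With these bounds a centrality of 0 gives a radius of 2, and a centrality of 400 gives 12, halfway along the range although 400 is only a quarter of the way along the domain.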
