On mobile you can pan around with touch, and zoom in and out by pinching. Tap a node to see info about it, and double tap a node to see info about it and the labels of surrounding nodes.
Mobile currently can only access the first grouped level in 2D (2D 1G). Mobile 3D controls are difficult to implement, and there is not enough space to put a "switch level" or "switch to grouped mode" button.
On desktop the controls should pop up on the right.
The graphs include all of the seed accounts, all the people the seed accounts follow, and the connections between them all. To make it interesting, I excluded all accounts with more than 100k followers (sometimes a higher cutoff: 200k or 300k). To speed up the scraping, everyone who follows more than 10k people is filtered out. Some moderate filtering is then applied to get rid of outliers (for example, running the HITS algorithm and removing the lowest x percentile by authority score).
The default graph, Supergraph 68, has the combined ego networks of 68 Twitter accounts. I chose the seed accounts based on my own personal curiosity and some suggestions on Twitter. The network is therefore obviously biased. That said, if one tried to be all-inclusive, one would reach a limit of comprehensibility (and hardware). At 34k nodes and 3m edges, Supergraph 68 is already hard to interpret.
The colors correspond to communities (detected algorithmically). There are many community detection algorithms out there; each will give different results (sometimes drastically). Take it with a grain of salt. Nonetheless, these algorithms apparently give useful results in many cases.
The community detection algorithm only considers who follows whom. It does not look at tweets, profiles, follow dates, names, locations, etc.
The lines are colored with the color of the node the line is pointing to. I.e., the color of the node being followed.
The sizes are proportional to a 'centrality' measure called PageRank. Roughly, consider it the node's 'influence' or 'status' in this network. This is based only on looking at who follows whom in this network. Less roughly, it's defined recursively as something like: the more people of large 'influence' who follow a node, the larger the 'influence' of the node. More specifically, the more selective people of large 'influence' who follow a node, the larger the 'influence'. I.e., if a person has large influence but follows people easily, their follow 'counts' for less.
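To make the "selective follows count for more" idea concrete, here is a minimal power-iteration sketch of PageRank in plain Python. It is an illustration only, not necessarily the implementation used for these graphs; the function name `pagerank`, the damping value 0.85, and the toy account names are assumptions.

```python
def pagerank(follows, damping=0.85, iterations=100):
    """Toy PageRank. `follows` maps each account to the set of accounts it follows."""
    nodes = set(follows)
    for targets in follows.values():
        nodes |= set(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline, then receives shares from its followers.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = follows.get(node, ())
            if targets:
                # A follower splits its rank evenly among everyone it follows,
                # so following many accounts makes each follow worth less.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Accounts that follow nobody spread their rank over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy network: "picky" and "generous" have the same followers,
# but "picky" follows one account while "generous" follows three.
toy_follows = {
    "f1": {"picky", "generous"},
    "f2": {"picky", "generous"},
    "picky": {"a"},
    "generous": {"b", "c", "d"},
}
toy_rank = pagerank(toy_follows)
```

In this toy network, `a` ends up with a higher score than `b`, `c`, or `d`, because the follow from "picky" is not diluted across several accounts.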
In grouped mode, the size of each supernode is proportional to the sum of the PageRanks of its subnodes.
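As a small sketch of that aggregation (the function name `supernode_sizes` and the sample data are hypothetical, and any constant scaling factor is left out):

```python
from collections import defaultdict


def supernode_sizes(pagerank, community):
    """Sum member PageRanks per community.

    `pagerank` maps account -> score; `community` maps account -> supernode id.
    """
    sizes = defaultdict(float)
    for account, score in pagerank.items():
        sizes[community[account]] += score
    return dict(sizes)


# Hypothetical example: two accounts in supernode "x", one in supernode "y".
example_sizes = supernode_sizes(
    {"alice": 0.5, "bob": 0.25, "carol": 0.25},
    {"alice": "x", "bob": "x", "carol": "y"},
)
```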
The nodes are positioned in such a way that the network appears aesthetically pleasing (e.g., minimizing edge overlaps, making edge lengths similar) which hopefully highlights the graph's structure, e.g. communities are close together.
The layout algorithm knows nothing about the colors (communities from the community detection algorithm). They are computed completely separately.
The most common algorithm type is a force-directed layout: the algorithm models the nodes and edges like a physical system. The nodes are like repulsive particles and the edges are like springs; these fight each other until equilibrium is reached.
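The particles-and-springs picture can be sketched in a few lines of plain Python. This is a toy illustration under assumed force laws (repulsion of 1/distance between every pair, attraction of distance squared along each edge) and assumed parameters (`step`, `max_move`, `seed`); it is not the layout algorithm used for these graphs, and it would be far too slow for tens of thousands of nodes.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=500, step=0.05,
                          max_move=0.5, seed=0):
    """Toy 2D layout. `nodes` is a list, `edges` a list of (a, b) pairs."""
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart, like charged particles.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = 1.0 / dist
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction: each edge pulls its endpoints together, like a spring.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist
            force[a][0] -= pull * dx / dist
            force[a][1] -= pull * dy / dist
            force[b][0] += pull * dx / dist
            force[b][1] += pull * dy / dist
        # Move each node a small step along its net force, capped at max_move.
        for n in nodes:
            fx, fy = force[n]
            length = math.hypot(fx, fy)
            scale = step
            if length * step > max_move:
                scale = max_move / length
            pos[n][0] += fx * scale
            pos[n][1] += fy * scale
    return {n: (p[0], p[1]) for n, p in pos.items()}


# Hypothetical path graph a - b - c: a and c share no edge, so they drift apart.
layout = force_directed_layout(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

For the path graph above, the two unconnected ends should settle farther from each other than either is from the middle node, which is the sense in which connected nodes end up "close together".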
In the base graph the lines represent follows between accounts. It's not visible in the visualization, but the follows are directional. Direction is taken into account for the color, size, and position.
For clarity purposes, not every edge is shown. If every edge were included in the visualization it would look like a huge hairball. In fact, most edges are filtered for the visualization. The edges which are included are chosen to be the ones which most highlight the community structure (see the advanced section for details). Although, again, this algorithm is completely independent from the community detection algorithm (it knows nothing about the colors).
In the grouped mode the lines represent aggregate follows from one supernode to another supernode. The strength (weight) of the edge from Node A to Node B is given by the number of follows from A to B divided by the total possible number of follows from A to B. The edges are then filtered based on that weight so as to only show stronger connections. Then, the edges are filtered again in the same way as the base graph.
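A minimal sketch of that weight, assuming "total possible number of follows from A to B" means the number of members of A times the number of members of B (for two distinct supernodes). The function name and sample data are hypothetical.

```python
def supernode_edge_weight(follows, members_a, members_b):
    """Fraction of possible A -> B follows that actually exist.

    `follows` maps each account to the set of accounts it follows;
    `members_a` and `members_b` are the account sets of two distinct supernodes.
    """
    possible = len(members_a) * len(members_b)  # assumed definition of "possible"
    if possible == 0:
        return 0.0
    actual = sum(
        1
        for account in members_a
        for target in follows.get(account, ())
        if target in members_b
    )
    return actual / possible


# Hypothetical example: 3 actual follows out of 2 * 3 = 6 possible.
weight = supernode_edge_weight(
    {"a1": {"b1", "b2"}, "a2": {"b1"}},
    {"a1", "a2"},
    {"b1", "b2", "b3"},
)
```

Note that the weight is directional: the weight from B to A is computed separately and will generally differ.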
Grouped mode allows you to see communities within communities. First, people are put into communities. Then, for the next level, those communities are put into communities. Then, for the next level, those communities are put into communities. And so on.
See this diagram
Each node (well, supernode) in the grouped graph represents an entire community. You can then see how those communities are connected. As you increase the level the supernodes get swallowed up into bigger supernodes.
The color represents the coarsest level of community. The human eye can't easily distinguish many colors, so I've chosen the level of granularity for the color to be <= 36 total colors. By visualizing with supernodes we can work around our inability to distinguish colors.
Notice that any particular person will have the same color no matter whether you're looking at the base graph or any of the grouped levels.
Tl;dr: For a very large graph, 36 colors isn't very much. If you want to see finer community distinctions, look at the grouped levels.
I'm using the word "community" because it's evocative. Properly speaking, they're called "blocks".
The kind of structure where everyone mostly "follows each other" is called assortative structure. The model I'm using which infers these underlying blocks is able to detect general structure, not just assortative. See here and here
Say I drop coders, poets, and journalists onto a desert island. After a month I encode their friendships as a graph. How do you expect their friendships to be structured? I would guess that each group is dense within itself. And, poets and journalists are likely to be friends, and coders aren't particularly likely to be friends with poets or journalists. It's conceivable, for example, that poets wouldn't like other poets (too many artistic differences). Then, poets as a block would be distinguished by: unlikely to be friends with other poets, unlikely to be friends with coders, likely to be friends with journalists.
Consider the situation where there are 3 accounts who each follow 0 people but are followed by most people in the network. In this situation, the 3 "leader" accounts are structurally identical, even though they don't follow each other (or, indeed, anyone). Properly speaking this relationship is 'stochastic': there is high probability any given person in the network will follow one of the 3 leaders and a 0 probability they will follow anyone else.
Generally, members of a given block can have similar stochastic relationship to other blocks without being densely connected within the block.
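To illustrate, here is a toy generator for the desert island example: each pair of people becomes friends with a probability that depends only on their two blocks. The function name, the people, and the probabilities are all made up for illustration (they are set to 0 or 1 so the outcome is easy to read), and this only generates a network from known blocks; it does not do the inference.

```python
import random


def sample_block_graph(blocks, prob, seed=0):
    """Sample undirected friendships from block-to-block probabilities.

    `blocks` maps person -> block name; `prob` maps a pair of block names
    to the probability that two people from those blocks are friends.
    """
    rng = random.Random(seed)
    people = sorted(blocks)
    edges = set()
    for i, u in enumerate(people):
        for v in people[i + 1:]:
            key = (blocks[u], blocks[v])
            # Friendship is symmetric, so accept the pair in either order.
            p = prob.get(key, prob.get(key[::-1], 0.0))
            if rng.random() < p:
                edges.add((u, v))
    return edges


# Hypothetical island: poets avoid other poets but befriend journalists.
island_blocks = {"p1": "poet", "p2": "poet", "j1": "journalist",
                 "j2": "journalist", "c1": "coder", "c2": "coder"}
island_prob = {
    ("poet", "poet"): 0.0,
    ("poet", "journalist"): 1.0,
    ("journalist", "journalist"): 1.0,
    ("coder", "coder"): 1.0,
}
island_edges = sample_block_graph(island_blocks, island_prob)
```

Here the two poets are never friends with each other, yet they are indistinguishable by their relationships to the other blocks, which is what makes them a block.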
Although this stochastic blockmodel concept is proven to be effective, it's not without its limits. There are theoretical results which show that we can generate artificial networks which we know have this type of block structure but which we can't recover by inference.
Anyway, this inference approach tries to find the block configuration which most likely generated the underlying connections in the network. This is an NP-hard optimization problem. There may be many different configurations which have approximately equal probability. See here
Also, the networks I'm considering aren't "complete". I'm just looking locally around a chosen set of seeds. A given person may appear to be periphery for a particular network but appear to be core for another. Blocks are only distinguishable with respect to other blocks.
In-degree = number of people in this network who follow this account
Out-degree = number of people in this network who this account follows
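A short sketch of how both counts fall out of a "who follows whom" mapping (the function name `degrees` and the sample accounts are hypothetical):

```python
from collections import Counter


def degrees(follows):
    """Return {account: (in_degree, out_degree)}.

    `follows` maps each account to the set of accounts it follows.
    """
    out_degree = {account: len(targets) for account, targets in follows.items()}
    in_degree = Counter(t for targets in follows.values() for t in targets)
    accounts = set(follows) | set(in_degree)
    return {a: (in_degree.get(a, 0), out_degree.get(a, 0)) for a in accounts}


# Hypothetical example: "a" follows "b" and "c", "b" follows "c".
example_degrees = degrees({"a": {"b", "c"}, "b": {"c"}, "c": set()})
```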
I made it up. It's supposed to represent the person's influence in this network relative to their overall influence (in all of Twitter). It is the person's PageRank (read: 'influence' in this network) divided by their follower count.
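As a sketch of that ratio (the function name `relative_influence` and the numbers are hypothetical; accounts with no followers are skipped to avoid dividing by zero, which is an assumption on my part):

```python
def relative_influence(pagerank, follower_counts):
    """PageRank in this network divided by total Twitter follower count."""
    return {
        account: pagerank[account] / follower_counts[account]
        for account in pagerank
        if follower_counts.get(account, 0) > 0
    }


# Hypothetical example: same PageRank here, very different overall reach.
scores = relative_influence(
    {"niche": 0.02, "celebrity": 0.02},
    {"niche": 1_000, "celebrity": 100_000},
)
```

With equal PageRank, the account with fewer overall followers scores higher: more of its total audience sits inside this particular network.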
Those are just phrases which appear frequently in the Twitter bios of people in that supernode. It's simply looking at the descriptions, not: tweets, locations, etc. It's pretty dumb: it doesn't distinguish between "art" and "artist" or even "artist" and "artists".
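A rough sketch of this kind of counting, assuming the phrases are single words and adjacent word pairs taken from lowercased bios. The exact tokenization used for the labels is not specified here, so the regular expression and the function name `phrase_counts` are assumptions.

```python
import re
from collections import Counter


def phrase_counts(bios):
    """Count single words and adjacent word pairs across a list of bios."""
    counts = Counter()
    for bio in bios:
        # Lowercase and keep runs of letters/apostrophes; punctuation is dropped,
        # so a word pair may span a comma.
        words = re.findall(r"[a-z']+", bio.lower())
        counts.update(words)
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts


# Hypothetical bios for one supernode.
bio_counts = phrase_counts(["Artist and coder", "artists collective", "artist, poet"])
top_phrases = bio_counts.most_common(3)
```

Because there is no stemming, "artist" and "artists" are tallied as separate entries, which matches the limitation described above.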