Twitterverse Visualization

FAQ

How do I use this? How do I move around?

On mobile you can pan around with touch, and zoom in and out by pinching. Tap a node to see info about it, and double tap a node to see info about it and the labels of surrounding nodes.

Mobile currently can only access the first grouped level in 2D (2D 1G). Mobile 3D controls are difficult to implement, and there is not enough space to put a "switch level" or "switch to grouped mode" button.

On desktop the controls should pop up on the right.

Who is in these graphs?

The graphs include all of the seed accounts, all the people the seed accounts follow, and the connections between them all. To keep it interesting, I excluded all accounts with more than 100k followers (sometimes a higher cutoff: 200k or 300k). To speed up the scraping, everyone who follows more than 10k people is filtered out. Some moderate filtering is then applied to get rid of outliers (for example, running the HITS algorithm and removing the lowest x percentile by authority score).
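For the curious, here is a minimal sketch of that last filtering step in plain Python. It assumes the network is stored as a dict from each account to the set of accounts it follows. The toy data, the function names, and the percentile value are all hypothetical, and this is not necessarily how the real pipeline does it.

```python
# Hypothetical toy data: follows[a] is the set of accounts that a follows.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"carol"},
    "carol": {"alice"},
    "dave": {"carol", "bob"},
    "erin": set(),
}

def hits_authority(follows, iterations=50):
    """Plain power-iteration HITS; returns an authority score per node."""
    nodes = set(follows) | {d for targets in follows.values() for d in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the accounts that follow you.
        auth = {n: 0.0 for n in nodes}
        for src, targets in follows.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub: sum of the authority scores of the accounts you follow.
        hub = {n: sum(auth[d] for d in follows.get(n, ())) for n in nodes}
        # Normalize so the scores stay bounded.
        a_norm = sum(auth.values()) or 1.0
        h_norm = sum(hub.values()) or 1.0
        auth = {n: s / a_norm for n, s in auth.items()}
        hub = {n: s / h_norm for n, s in hub.items()}
    return auth

def drop_lowest_authority(follows, percentile):
    """Remove the lowest `percentile` percent of nodes by authority score."""
    auth = hits_authority(follows)
    ranked = sorted(auth, key=auth.get)  # ascending authority
    n_drop = int(len(ranked) * percentile / 100)
    dropped = set(ranked[:n_drop])
    return {
        src: {d for d in targets if d not in dropped}
        for src, targets in follows.items()
        if src not in dropped
    }

# Nobody follows dave or erin, so their authority is zero and they go first.
filtered = drop_lowest_authority(follows, percentile=40)
```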

The default graph, Supergraph 68, has the combined ego networks of 68 Twitter accounts. I chose the seed accounts based on my own curiosity and some suggestions on Twitter, so the network is obviously biased. That said, an all-inclusive graph would hit the limits of comprehensibility (and hardware). At 34k nodes and 3M edges, Supergraph 68 is already hard to interpret.

What do the colors mean?

The colors correspond to communities (detected algorithmically). There are many community detection algorithms out there, and each gives different results (sometimes drastically different). Take it with a grain of salt. Nonetheless, these algorithms apparently give useful results in many cases.

The community detection algorithm only considers who follows whom. It does not look at tweets, profiles, follow dates, names, locations, etc.

Each line takes the color of the node it points to, i.e., the node being followed.

What does it mean when colors are similar?

Similar colors don't indicate anything. The colors are meant to be unique.

What do the sizes of the nodes mean?

The sizes are proportional to a 'centrality' measure called PageRank. Roughly, consider it the node's 'influence' or 'status' in this network. It is based only on who follows whom in this network. Less roughly, it's defined recursively as something like: the more people of large 'influence' who follow a node, the larger the 'influence' of that node. More specifically, the more selective people of large 'influence' who follow a node, the larger its 'influence'. I.e., if a person has large influence but follows people easily, each of their follows 'counts' for less.

In grouped mode the size of each supernode is proportional to the sum of the PageRanks of its subnodes.
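To make the "selective follows count for more" idea concrete, here is a small power-iteration sketch of PageRank in plain Python. The damping factor of 0.85 is the conventional default and an assumption here, as are the toy accounts. The site's actual numbers come from a graph library, not from this code.

```python
def pagerank(follows, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict of account -> set of followed accounts."""
    nodes = sorted(set(follows) | {d for t in follows.values() for d in t})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for src in nodes:
            targets = follows.get(src, ())
            if targets:
                # A follower splits its rank evenly among everyone it follows.
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Accounts that follow nobody spread their rank over all nodes.
                share = damping * rank[src] / n
                for dst in nodes:
                    new_rank[dst] += share
        rank = new_rank
    return rank

# Hypothetical toy network: "picky" and "generous" have the same followers,
# but picky follows one account while generous follows four.
toy = {
    "fan1": {"picky", "generous"},
    "fan2": {"picky", "generous"},
    "fan3": {"picky", "generous"},
    "picky": {"x"},
    "generous": {"y", "z1", "z2", "z3"},
}
rank = pagerank(toy)
# x ends up with a higher PageRank than y, because picky's single follow
# carries picky's whole weight while generous's is split four ways.
```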

What do the locations of the nodes mean?

The nodes are positioned in such a way that the network appears aesthetically pleasing (e.g., minimizing edge overlaps, making edge lengths similar) which hopefully highlights the graph's structure, e.g. communities are close together.

The layout algorithm knows nothing about the colors (communities from the community detection algorithm). They are computed completely separately.

The most common algorithm type is a force-directed layout: the algorithm models the nodes and edges like a physical system. The nodes are like repulsive particles and the edges are like springs; these fight each other until equilibrium is reached.
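Here is a toy version of that physical picture in plain Python. It is a generic spring-and-repulsion sketch, not ForceAtlas2, and the force laws, step size, and movement cap are all assumptions picked to keep the example short and stable.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, step=0.05, max_move=0.1, seed=0):
    """Toy 2D layout: all pairs of nodes repel, linked pairs attract."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair pushes apart with magnitude 1 / distance^2.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = 1.0 / dist ** 2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction: each edge is a spring pulling with magnitude = distance.
        for a, b in edges:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            force[a][0] += dx
            force[a][1] += dy
            force[b][0] -= dx
            force[b][1] -= dy
        # Move each node a small, capped step along its net force.
        for n in nodes:
            mx, my = step * force[n][0], step * force[n][1]
            length = math.hypot(mx, my)
            if length > max_move:
                mx, my = mx * max_move / length, my * max_move / length
            pos[n][0] += mx
            pos[n][1] += my
    return pos

# Hypothetical example: two tight triangles joined by a single edge.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
positions = force_layout(nodes, edges)
```

With these particular force laws, two linked nodes settle at distance 1, where the spring pull and the repulsion cancel.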

What do the lines mean?

In the base graph the lines represent follows between accounts. It's not visible in the visualization, but the follows are directional. Direction is taken into account for the color, size, and position.

For clarity, not every edge is shown. If every edge were included, the visualization would look like a huge hairball. In fact, most edges are filtered out. The edges that remain are the ones which most highlight the community structure (see the advanced section for details). Again, though, this algorithm is completely independent of the community detection algorithm (it knows nothing about the colors).

In the grouped mode the lines represent aggregate follows from one supernode to another supernode. The strength (weight) of the edge from Node A to Node B is given by the number of follows from A to B divided by the total possible number of follows from A to B. The edges are then filtered based on that weight so as to only show stronger connections. Then, the edges are filtered again in the same way as the base graph.
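A minimal sketch of that weight, assuming "total possible number of follows from A to B" means (number of accounts in A) times (number of accounts in B). That reading, the toy data, and the threshold value are assumptions. The second filtering pass (the sparsification) is not shown.

```python
from collections import Counter

def supernode_edge_weights(follows, block_of):
    """Weight of A -> B = follows from A to B / (size of A * size of B)."""
    sizes = Counter(block_of.values())
    counts = Counter()
    for src, targets in follows.items():
        for dst in targets:
            a, b = block_of[src], block_of[dst]
            if a != b:  # only edges between different supernodes
                counts[(a, b)] += 1
    return {(a, b): c / (sizes[a] * sizes[b]) for (a, b), c in counts.items()}

def filter_by_weight(weights, threshold):
    """Keep only the stronger supernode-to-supernode connections."""
    return {edge: w for edge, w in weights.items() if w >= threshold}

# Hypothetical toy data: five accounts in three supernodes.
block_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
follows = {
    "a1": {"b1", "b2"},
    "a2": {"b1"},
    "b1": {"c1"},
    "c1": {"a1"},
}
weights = supernode_edge_weights(follows, block_of)
strong = filter_by_weight(weights, threshold=0.6)
```

Here A to B has 3 of 4 possible follows (weight 0.75), while B to C and C to A each have 1 of 2 (weight 0.5), so only the A to B edge survives a threshold of 0.6.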

What is grouped mode? What are levels?

Grouped mode allows you to see communities within communities. First, people are put into communities. Then, for the next level, those communities are put into communities. Then, for the next level, those communities are put into communities. And so on.

See this diagram

Each node (well, supernode) in the grouped graph represents an entire community. You can then see how those communities are connected. As you increase the level the supernodes get swallowed up into bigger supernodes.
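One way to picture the levels is as a stack of lookup tables, each mapping the labels of one level to the labels of the next. The sketch below assumes that representation. The account names and community labels are made up.

```python
# Hypothetical hierarchy: levels[0] maps accounts to level-1 communities,
# levels[1] maps level-1 communities to level-2 communities, and so on.
levels = [
    {"ann": "c1", "ben": "c1", "cat": "c2", "dan": "c3", "eve": "c3"},
    {"c1": "d1", "c2": "d1", "c3": "d2"},
    {"d1": "e1", "d2": "e1"},
]

def community_at_level(account, levels, level):
    """Follow the mappings upward; level 0 is the account itself."""
    label = account
    for mapping in levels[:level]:
        label = mapping[label]
    return label

def members(supernode, levels, level):
    """All accounts swallowed by a given supernode at a given level."""
    return sorted(
        account for account in levels[0]
        if community_at_level(account, levels, level) == supernode
    )

# At level 1, ann and ben share supernode c1; at level 2, cat joins them in d1.
level_1 = members("c1", levels, 1)
level_2 = members("d1", levels, 2)
```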

The color represents the coarsest level of community. The human eye can't easily distinguish many colors, so I've chosen the level of granularity for the color to be <= 36 total colors. By visualizing with supernodes we can work around our inability to distinguish colors.

Notice that any particular person will have the same color no matter whether you're looking at the base graph or any of the grouped levels.

Tl;dr: For a very large graph, 36 colors isn't very much. If you want to see finer community distinctions, look at the grouped levels.

I don't know any of these people in my so-called "community". ???

I'm using the word "community" because it's evocative. Properly speaking, they're called "blocks".

The kind of structure where everyone mostly "follows each other" is called assortative structure. The model I'm using which infers these underlying blocks is able to detect general structure, not just assortative. See here and here

Say I drop coders, poets, and journalists onto a desert island. After a month I encode their friendships as a graph. How do you expect their friendships to be structured? I would guess that each group is dense within itself. And, poets and journalists are likely to be friends, and coders aren't particularly likely to be friends with poets or journalists. It's conceivable, for example, that poets wouldn't like other poets (too many artistic differences). Then, poets as a block would be distinguished by: unlikely to be friends with other poets, unlikely to be friends with coders, likely to be friends with journalists.

Consider the situation where there are 3 accounts who each follow 0 people but are followed by most people in the network. In this situation, the 3 "leader" accounts are structurally identical, even though they don't follow each other (or, indeed, anyone). Properly speaking this relationship is 'stochastic': there is high probability any given person in the network will follow one of the 3 leaders and a 0 probability they will follow anyone else.

Generally, members of a given block can have a similar stochastic relationship to other blocks without being densely connected within the block.
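The "leaders" situation can be sketched as a tiny generative model: each possible follow is a coin flip whose bias depends only on the blocks of the two accounts. The account names and the 0.9 probability below are hypothetical. Inference, which is the hard part, runs this idea in reverse and is not shown.

```python
import random

def sample_network(block_of, follow_prob, seed=0):
    """Sample follows where P(src follows dst) depends only on their blocks."""
    rng = random.Random(seed)
    follows = {account: set() for account in block_of}
    for src in block_of:
        for dst in block_of:
            if src == dst:
                continue
            p = follow_prob[block_of[src]][block_of[dst]]
            if rng.random() < p:
                follows[src].add(dst)
    return follows

# Hypothetical blocks: three leaders and five fans.
block_of = {
    "lead1": "leader", "lead2": "leader", "lead3": "leader",
    "fan1": "fan", "fan2": "fan", "fan3": "fan", "fan4": "fan", "fan5": "fan",
}
# Row = block of the follower, column = block of the account being followed.
follow_prob = {
    "leader": {"leader": 0.0, "fan": 0.0},
    "fan": {"leader": 0.9, "fan": 0.0},
}
sampled = sample_network(block_of, follow_prob)
```

The leaders never follow each other in any sample, yet they form a perfectly good block: what they share is how the rest of the network relates to them.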

Although this stochastic blockmodel concept is proven to be effective, it's not without its limits. There are theoretical results which show that we can generate artificial networks which we know have this type of block structure but which we can't recover by inference.

Anyway, this inference approach tries to find the block configuration which most likely generated the underlying connections in the network. This is an NP-hard optimization problem. There may be many different configurations which have approximately equal probability. See here

Also, the networks I'm considering aren't "complete". I'm just looking locally around a chosen set of seeds. A given person may appear peripheral in one network but core in another. Blocks are only distinguishable with respect to other blocks.

What does indegree/outdegree mean?

In-degree = number of people in this network who follow this account

Out-degree = number of people in this network who this account follows
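In code, assuming the network is a dict from each account to the set of accounts it follows (a hypothetical representation), the two counts look like this:

```python
def degrees(follows):
    """Return (in_degree, out_degree) dicts for a follow network."""
    nodes = set(follows) | {d for targets in follows.values() for d in targets}
    # Out-degree: how many accounts in the network this account follows.
    out_degree = {n: len(follows.get(n, ())) for n in nodes}
    # In-degree: how many accounts in the network follow this account.
    in_degree = {n: 0 for n in nodes}
    for targets in follows.values():
        for dst in targets:
            in_degree[dst] += 1
    return in_degree, out_degree

# Hypothetical toy network.
toy_follows = {"ann": {"ben", "cat"}, "ben": {"cat"}, "cat": set()}
in_degree, out_degree = degrees(toy_follows)
```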

What does niche factor mean?

I made it up. It's supposed to represent the person's influence in this network relative to their overall influence (in all of Twitter). It is the person's PageRank (read: 'influence' in this network) divided by their follower count.
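As a formula it is a one-liner. The sketch below uses made-up numbers, and raising an error for accounts with no followers is an assumption of the example, not something the site is known to do.

```python
def niche_factor(pagerank, follower_count):
    """PageRank in this network divided by total Twitter follower count."""
    if follower_count <= 0:
        raise ValueError("follower_count must be positive")
    return pagerank / follower_count

# Two hypothetical accounts with the same PageRank in this network:
small_account = niche_factor(0.002, 1_000)
big_account = niche_factor(0.002, 100_000)
# The smaller account scores higher: more of its overall reach sits in this network.
```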

What are the words above the list of subnodes?

Those are just phrases which appear frequently in the Twitter bios of people in that supernode. It only looks at the bio text, not tweets, locations, etc. It's pretty dumb: it doesn't distinguish between "art" and "artist" or even "artist" and "artists".
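A sketch of that kind of counting, using single words rather than multi-word phrases and counting each word at most once per bio. Both simplifications and the sample bios are assumptions. Like the real thing, it does no stemming, so "art", "artist", and "artists" stay separate.

```python
import re
from collections import Counter

def bio_word_counts(bios):
    """Count how many bios each lowercase word appears in (no stemming)."""
    counts = Counter()
    for bio in bios:
        words = re.findall(r"[a-z']+", bio.lower())
        counts.update(set(words))  # count each word once per bio
    return counts

# Hypothetical bios for the accounts in one supernode.
bios = [
    "Artist and coder",
    "artists welcome",
    "Coder. Art lover",
    "coder by day",
]
counts = bio_word_counts(bios)
top_words = counts.most_common(1)
```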

Can you make a graph for ____?

I'm open to suggestions on Twitter @euxenus. I'd prefer to do interesting combination graphs of, say, 5-10 seed accounts with a similar theme.

Can I see the code?

I'll open source it all soon.

Advanced

What community detection algorithm did you use?

Hierarchical stochastic block model inference as implemented in graph-tool. See here

What edge sparsification algorithm did you use?

I use the sparsification methods implemented in networkit. For the older graphs I used Local Quadrilateral Simmelian (LQS) and for the newer ones I'm using Local Edge Forest Fire. The older grouped-level graphs used local edge weight filtering (see above), then LQS sparsification. I have since switched to the Directed Enhanced Configuration model, with edge weights given by the number of follows between supernodes. See here, here, and here

What layout algorithm did you use?

ForceAtlas2 for 3D and 2D. I tried some others but either I couldn't find implementations in 3D and 2D or the results didn't improve upon ForceAtlas2. (Know of an alternative? Let me know! @euxenus)

What did you make the website with?

React, TypeScript, Webpack, Three.js