You can find the markdown files and data for this exercise in the corresponding Github folder.

Construct the timelines of Twitter users

Build social network of retweets

Calculate assortativity

Permutation tests

Community detection

Remember what we did in the exercise about retrieving Twitter network data. First connect to the Twitter API using your credentials as in previous exercises.

```
library(dplyr)
library(tidygraph)
library(rtweet)
library(ggraph)
```

Then download the file SwissPoliticians.csv and read it as a csv in R. Take into account that separators are tabs.

`#Your code here`

Look up the basic profile information of each user by screenname and delete the filtered users from your dataset. Then convert to lowercase all screen names in your Twitter profiles and in the data from the csv file and merge them in a single dataframe.

`#Your code here`

Now retrieve the last 200 tweets of each user (remember the check=FALSE parameter). It might take a bit to get data. Save the result in a file called “timelines.RData” so you don’t lose any data.

`#Your code here`

Use the graph_assortativity function to calculate the assortativity with respect to party labels. How high is the value?

`#Your code here`

To see if the assortativity value fits your expectations, use ggraph to plot the network coloring each node according to the political party label of the politician. Does the pattern of colors fit the value of assortativity?

`#Your code here`

The above result looks assortative, but how can we test if it could have happened at random and not because of party identity? Here were are going to test it with a permutation test.

First, let’s run a permutation. Perform the same assortativity calculation as above but permuting the party labels of nodes. You can do this very efficiently by using the sample() function when you call the graph_assortativity() function.

`#Your code here`

Is the value much closer to zero? Repeat the calculation with 1000 permutations and plot the histogram of the resulting values. Add a line with the value of the assortativity without permutation. Is it far or close to the permuted values?

`#Your code here`

To be sure, let’s calculate a p-value for the null hypothesis that the assortativity is zero and the alternative hypothesis that it is positive (what we expected):

`#Your code here`

After looking at the above results, do you think it is likely that the assortativity we found in the data was produced by chance?

Let’s test if Twitter communities match political affiliations. Remove nodes with degree zero in the network and run the Louvain community detection algorithm. Visualize the result coloring nodes by community labels (cast the label to a character so you have distinct colors).

`#Your code here`

Run the graph_modularity function with the above community labels. Is it high enough to think that the network has a community structure?

`#Your code here`

Repeat but using the party labels (you might have to cast to factor) instead of the communities detected with Louvain. Is it higher or lower? How far is this modularity from the maximal one found with Louvain?

`#Your code here`

Finally, to understand which parties are represented in each community, build a data frame for nodes with two columns: one with the party label and another one with the community label. Use the table() function to print a contingency table. Can you guess which party or parties compose each community?

`#Your code here`

- How well can you predict the party of a politician from its neighbors in the network? Here you can use the rule of predicting the party as the majority party among its neighbors and evaluate accuracy of this approach.
- What would be the results if we use the network of replies? Do you expect assortativity and modularity to be higher or lower?
- If you retrieved data of follower links, you can repeat the above analysis for undirected following relationships. Do you expect a higher or lower assortativity?