Use Meerkat-ED for social network analysis and discourse analysis

Meerkat-ED is a software tool designed by Dr. Reihaneh Rabbany.

Meerkat-ED can be used to analyze two kinds of networks: the participants’ interaction network and the network of terms they used in their interactions.
I ran a small portion of my data through the tool. Here is the process:
  • Transfer your data into a .txt file. If you don’t use Moodle, refer to samplefile.txt for the format. The online course I am studying was not based on Moodle, so I manually converted a very small portion of my data into the right format.
  • Open the .txt file in Meerkat-ED. Two networks (the interaction network and the term network) are generated automatically.

In the interaction network, you can change the time frame at the bottom to show the network over different periods.


Figure 1. Student interaction network

In the term network, you can click a student’s name in the top-right window to show the terms that student used.


Figure 2. Term network

  • Click analyze > content analysis > topic clustering to show topic communities.

screen-shot-2016-12-01-at-5-45-22-pm

Like KBDeX, Meerkat-ED is a useful tool for analyzing online collaborative learning. One strength of Meerkat-ED is that you can choose among different text analysis methods and different types of words, e.g., nouns or verbs. One way Meerkat-ED differs from KBDeX is that you cannot choose specific words in Meerkat-ED. You should also pay attention to data cleaning before you import your data into Meerkat-ED. You might need to make nouns consistent throughout your data file. For example, I should have changed all “communities” to “community” and all “questions” to “question”. This part can be tricky.
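The noun clean-up mentioned above can be scripted before importing the file. Here is a minimal sketch in base R; the mapping and the sample posts are made up for illustration, and a real dataset would need a longer, hand-checked mapping.

```r
# Hypothetical mapping: variant form -> the form to keep
replacements <- c(communities = "community", questions = "question")

normalize_terms <- function(text, mapping) {
  for (variant in names(mapping)) {
    # \\b limits the match to whole words; ignore.case also catches "Questions"
    pattern <- paste0("\\b", variant, "\\b")
    text <- gsub(pattern, mapping[[variant]], text,
                 ignore.case = TRUE, perl = TRUE)
  }
  text
}

# Made-up sample posts
posts <- c("Online communities raise good questions.",
           "My question is about this community.")
cleaned <- normalize_terms(posts, replacements)
print(cleaned)
```

An explicit mapping is slower to build than automatic stemming, but it keeps you in control of which forms get merged.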
Another creative use of Meerkat-ED and KBDeX is to visualize content or discourse analysis results. For instance, if you have coded discussions with a coding scheme, you have codes for each unit of analysis. You can then use Meerkat-ED or KBDeX to show networks of codes (where links represent the co-occurrence of codes in the same sentence) rather than networks of the words and phrases in the discussion content itself. Content or discourse analysis results are usually presented as “code and count” tables, pie/bar charts, or sequential/path analyses. The term network function in Meerkat-ED offers researchers a new kind of visualization.
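To make the idea of a code network concrete, here is a small sketch in base R that turns coded units into a code co-occurrence matrix. The codes and units are invented, and this only illustrates the underlying computation, not how Meerkat-ED or KBDeX store their data.

```r
# Hypothetical coded data: one row per analysis unit, one column per code
# (1 = the code was applied to that unit)
coded <- matrix(
  c(1, 1, 0,
    0, 1, 1,
    1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("unit1", "unit2", "unit3"),
                  c("question", "explanation", "agreement"))
)

# t(coded) %*% coded counts, for each pair of codes,
# the number of units in which both codes occur
cooccurrence <- crossprod(coded)
diag(cooccurrence) <- 0   # drop self-links
print(cooccurrence)
```

Each non-zero cell is a link in the code network, and the cell value can be used as the link weight.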
Overall, Meerkat-ED is a solid tool for data mining in online collaborative learning.
If you are interested in applying this tool in your research, please refer to the following for more information:
Rabbany, R., Takaffoli, M., & Zaïane, O. R. (2011). Analyzing participation of students in online courses using social network analysis techniques. In Proceedings of educational data mining.
Rabbany, R., Elatia, S., Takaffoli, M., & Zaïane, O. R. (2013). Collaborative learning of students in online discussion forums: A social network analysis perspective. In Educational data mining (pp. 441-466). Springer International Publishing.

Use KBDeX in online collaborative learning research

A Japanese research and software team developed an analysis tool called KBDeX to visualize network structures of discourse based on two-mode data (words × discourse units). It is basically a discourse analysis tool built on the relation between words and discourse units. As a researcher in collaborative online learning, you can select the words you are interested in and show the relations among them, which can help demonstrate students’ knowledge construction/creation. KBDeX shows the relations among words and discourse units as networks, and it can also show how the networks develop over time. This is a big strength of the tool. Note, however, that it is not primarily based on the relations between students; rather, it is based on the co-occurrence of words in discourse units. A student interaction network can still be shown, but it is derived from the common words students have used in the discourse.
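The two-mode idea can be sketched in a few lines of base R. The students, texts, and selected words below are invented, and the projections are a simplified illustration of co-occurrence networks, not KBDeX's actual implementation.

```r
# Hypothetical discourse units with their authors
units <- data.frame(
  student = c("Ann", "Ben", "Ann", "Cal"),
  text = c("knowledge building needs ideas",
           "ideas improve through discourse",
           "discourse supports community",
           "the weather was nice"),
  stringsAsFactors = FALSE
)
selected_words <- c("knowledge", "ideas", "discourse", "community")

# Two-mode matrix: discourse units x selected words (1 = word appears)
unit_word <- sapply(selected_words, function(w) {
  as.integer(grepl(w, units$text, fixed = TRUE))
})
rownames(unit_word) <- paste0("unit", seq_len(nrow(units)))

# Word network: two words are linked when they share a discourse unit
word_net <- crossprod(unit_word)
diag(word_net) <- 0

# Student network: two students are linked by the selected words both used
student_word <- (rowsum(unit_word, group = units$student) > 0) * 1
student_net <- tcrossprod(student_word)
diag(student_net) <- 0

print(word_net)
print(student_net)
```

In this toy data Cal used none of the selected words, so Cal has no links in the student network even though Cal took part in the discussion.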
Here I want to show a demo based on part of my dissertation data. I will probably consider using this tool in my dissertation.
  • First, I took a portion of the online discussion data I collected for my dissertation and converted it to the format of the KBDeX sample dataset. In my data, each discourse unit represents a comment.
  • I added one feature to assign students to two different groups (see Figure 1); a group consists of students who took part in the same discussion in the thread.
  • Then I ran the data of group 2 in the main windows (see Figure 2).
Figure 1. Groups
The KBDeX platform has four windows: (1) the discourse viewer, which shows an overview of the discourse and the selected words (top left window), (2) the network structure of students (top right window), (3) the network structure of discourse units (bottom left window), and (4) the network structure of selected words (bottom right window).

Figure 2. Main windows

An important lesson from this trial is that you need to think carefully about which words you choose from the student discussion content. Words can represent students’ inquiry process, but they cannot represent the whole of it. As researchers, we bring our own understanding of the inquiry process into the choice of words, yet that choice cannot fully represent students’ cognitive inquiry process. For example, in the bottom left window of Figure 2, some comments are isolated from the core cluster, which means that these comments share none of the selected words with other comments. But, of course, students engaged in inquiry in these comments; they just did not use the words we chose. Therefore, it’s important to pay attention to the word selection process and to explain why you chose some words rather than others. It’s also important to acknowledge the inquiry that happens within isolated discourse units.
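Isolated discourse units can also be listed programmatically so that they are not overlooked. This is a minimal sketch in base R with an invented unit-by-word matrix; the comment names and words are assumptions for illustration.

```r
# Hypothetical two-mode matrix: comments x selected words
unit_word <- matrix(
  c(1, 1, 0,
    0, 1, 1,
    0, 0, 0,
    1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("comment", 1:4),
                  c("inquiry", "evidence", "theory"))
)

# Comment network: two comments are linked when they share a selected word
unit_net <- tcrossprod(unit_word)
diag(unit_net) <- 0

# Comments with no links are the isolated ones
isolated <- rownames(unit_net)[rowSums(unit_net) == 0]
print(isolated)
```

A list like this is a reminder to read the isolated comments by hand instead of treating them as empty of inquiry.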

Some things to consider before you start a KBDeX analysis:

  • Clean your data and make sure the words you choose are consistent throughout the dataset.
  • Consider how to use the group function. A group does not need to be a traditional group in the educational sense. In my study, a group consists of the students who interacted with each other in a discussion thread, so a discussion thread is divided into several groups depending on the interaction.
  • Include time in your data, since KBDeX can show the temporality of networks, namely how the networks evolve.

Finally, as with Meerkat-ED, the network of selected words provided by KBDeX can be used to present content/discourse analysis results.

If you are interested in using KBDeX in your online collaborative learning research, here is some seminal work by the research and software development team:

Matsuzawa, Y., Oshima, J., Oshima, R., Niihara, Y., & Sakai, S. (2011). KBDeX: A platform for exploring discourse in collaborative learning. Procedia-Social and Behavioral Sciences, 26, 198-207.

Matsuzawa, Y., Oshima, J., Oshima, R., & Sakai, S. (2012). Learners’ use of SNA-based discourse analysis as a self-assessment tool for collaboration. International Journal of Organisational Design and Engineering, 2(4), 362-379.

Oshima, J., Oshima, R., & Matsuzawa, Y. (2012). Knowledge Building Discourse Explorer: a social network analysis application for knowledge building discourse. Educational Technology Research and Development, 60(5), 903-921.

A social network analysis of my core discussion network

Many researchers have studied people’s core discussion networks: the set of friends and family people turn to when discussing important matters.

I did a simple social network analysis of my core discussion network, which gave me an overview of my social network. I first made a weighted adjacency matrix to encode an egocentric network of the people with whom I have discussed important matters in the last six months. In the weighted adjacency matrix, rows and columns represent nodes, and weights range from 0 to 5: 0 means there is no connection between two nodes, and 5 means the two nodes have exchanged important information. The matrix was saved in a CSV file, which looks like this:

Screen Shot 2015-09-27 at 8.06.04 PM
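If you prefer to build such a matrix in R rather than in a spreadsheet, here is a minimal sketch. The names, weights, and output path are invented for illustration.

```r
# Hypothetical people in the ego network
people <- c("me", "friend_a", "friend_b", "mom")
ego_demo <- matrix(0, nrow = length(people), ncol = length(people),
                   dimnames = list(people, people))

# Helper that keeps the matrix symmetric (undirected ties)
set_tie <- function(m, a, b, weight) {
  m[a, b] <- weight
  m[b, a] <- weight
  m
}

ego_demo <- set_tie(ego_demo, "me", "friend_a", 5)
ego_demo <- set_tie(ego_demo, "me", "friend_b", 3)
ego_demo <- set_tie(ego_demo, "me", "mom", 4)
ego_demo <- set_tie(ego_demo, "friend_a", "friend_b", 2)

# Save as CSV; the first column of the file holds the row names
csv_path <- file.path(tempdir(), "ego_network.csv")
write.csv(ego_demo, csv_path)
print(ego_demo)
```

Setting both cells at once avoids the easy mistake of a tie that exists in one direction only.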

I used two R packages, network and sna, in my analysis.

  1. The first step is to import the data into R. There are two ways to import data: 1) from a Google file, or 2) from a local file.

# load RCurl to download the file contents
library(RCurl)
url <- "the google file URL address"
myCsv <- getURL(url)
myData <- read.csv(textConnection(myCsv))

or

myData <- read.csv("local file path")

2. The second step is to convert the file into a matrix in R. (Note: “row.names=1” uses the first column of the file as the row names of the matrix; the first row becomes the column names because read.csv treats it as the header by default.)

ego <- as.matrix(read.csv("google file URL address", row.names=1))

or

ego <- as.matrix(read.csv("local file path", row.names=1))
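To see what row.names=1 does without needing a file, here is a small self-contained sketch that reads an invented CSV from a string. The names and weights are made up.

```r
# A tiny made-up CSV: first row = header, first column = node names
csv_text <- "name,me,friend_a,mom
me,0,5,4
friend_a,5,0,0
mom,4,0,0"

# row.names = 1 turns the first column into row names;
# the header row supplies the column names
ego_demo <- as.matrix(read.csv(text = csv_text, row.names = 1))

print(rownames(ego_demo))
print(colnames(ego_demo))
print(ego_demo["me", "mom"])
```

With matching row and column names, the result is a square numeric matrix that can be used directly as an adjacency matrix.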

3. The third step is to plot the network. The gplot function from the sna package draws a network plot, for example:

library(sna)
gplot(ego, gmode="graph", displaylabels=TRUE, label.cex=0.8, vertex.col="darkolivegreen")

The plot looks like this:

Screen Shot 2015-09-27 at 8.19.03 PM

4. The last step is to customize the network. I want to show different groups in different colors, and I want the edge width to represent the weight between two nodes. This process is a little complicated.

# build an undirected network and keep the matrix weights
# as an edge attribute named "edgeweight"
library(network)
MyNetwork <- network(ego, directed=FALSE, ignore.eval=FALSE, names.eval="edgeweight")

# set node colors (one color per node, in the same order as the matrix rows)
nodeColors <- c("goldenrod3","steelblue1","steelblue1","steelblue1", "darkgreen","darkgreen", "darkgreen","firebrick","firebrick", "firebrick","firebrick","darkslateblue","darkslateblue", "darkslateblue","blue","blue","blue")

# use function plot to draw the network; edge width follows the "edgeweight" attribute
plot(MyNetwork, displaylabels=TRUE, label.cex=0.5, edge.lwd="edgeweight", vertex.col=nodeColors)
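Typing one color per node by hand is error-prone, so an alternative is to look colors up from group labels. In this sketch the group labels and their pairing with colors are assumptions for illustration; only the color names come from the code above.

```r
# Hypothetical group label for each node, in matrix row order
groups <- c("ego", "friend", "friend", "kin", "coworker", "kin")

# Hypothetical group -> color lookup
palette_by_group <- c(ego = "goldenrod3", friend = "steelblue1",
                      kin = "darkgreen", coworker = "firebrick")

# Indexing by name returns one color per node
nodeColors <- unname(palette_by_group[groups])
print(nodeColors)
```

The resulting vector can be passed to vertex.col in the same way as a hand-typed one.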

Finally, my core discussion network (ego network) looks like this:

Screen Shot 2015-09-27 at 8.25.10 PM

I didn’t record all my important connections in the matrix. Still, the analysis shows that friendship, coworker, and kinship ties stand out in my core discussion network. Alters are closely tied to each other within each of these networks, but there are no relationships between the different networks. For instance, my dad, mom, and sister are closely tied to each other in the kinship network, but they are not tied to my friendship network or my coworker network. The partner network is a special case: its members are not linked to each other at all.
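The observation about ties within and between groups can be checked by counting. This is a sketch in base R with invented alters, groups, and weights, not my actual data.

```r
# Hypothetical alters and their groups
people <- c("dad", "mom", "sister", "friend_a", "friend_b")
group  <- c(dad = "kin", mom = "kin", sister = "kin",
            friend_a = "friend", friend_b = "friend")

# Hypothetical weighted ties among alters
alters <- matrix(0, 5, 5, dimnames = list(people, people))
alters[c("dad", "mom", "sister"), c("dad", "mom", "sister")] <- 4
alters[c("friend_a", "friend_b"), c("friend_a", "friend_b")] <- 3
diag(alters) <- 0

# TRUE where the row person and the column person share a group
same_group <- outer(group[people], group[people], "==")

# Count each undirected tie once by using the upper triangle
upper <- upper.tri(alters)
within_ties  <- sum(alters[upper & same_group] > 0)
between_ties <- sum(alters[upper & !same_group] > 0)
print(c(within = within_ties, between = between_ties))
```

A between-group count of zero corresponds to the separated clusters described above.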

Twitter hashtag analysis #justdoit

I used NodeXL (which only runs on Windows) and Gephi to analyze a Twitter search network: #justdoit. I am not a huge fan of Nike, but my roomie is 🙂

In the dataset, each Twitter account that mentions the hashtag is a node. If a tweet is a reply to another tweet or mentions another tweet, an edge is added between the two accounts. Here is what the data laboratory looks like in Gephi (the original data was generated automatically in NodeXL):

Screen Shot 2015-02-16 at 10.44.23 AM
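The node and edge rule described above can be sketched in base R. The column names and accounts are invented for illustration and are not the exact NodeXL export format.

```r
# Hypothetical tweets: target is NA for a plain tweet
tweets <- data.frame(
  author = c("alice", "bob", "carol", "bob"),
  relationship = c("Tweet", "Mentions", "Replies to", "Mentions"),
  target = c(NA, "alice", "bob", "alice"),
  stringsAsFactors = FALSE
)

# Every account that appears is a node
nodes <- unique(c(tweets$author, tweets$target[!is.na(tweets$target)]))

# Only replies and mentions create edges
edges <- tweets[!is.na(tweets$target), c("author", "target", "relationship")]

# Collapse repeated pairs into a single weighted edge
edge_weights <- aggregate(count ~ author + target,
                          data = cbind(edges, count = 1), FUN = sum)
print(nodes)
print(edge_weights)
```

The summed count is what a weighted degree is later computed from.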

 

 

Here is the first network that shows up once I open the data file. This network is not yet meaningful, so it needs modifications:

Screen Shot 2015-02-17 at 1.03.24 PM

Then, we need to calculate some statistics. (Details can be found on the Gephi wiki: http://wiki.gephi.org/index.php/Category:Measure)

  • Avg. weighted degree: the average, over all nodes, of the sum of the weights of each node’s edges. (By contrast, average degree is the average number of edges per node, ignoring weights.) I think avg. weighted degree is usually the one to use when edges carry weights.
  • Modularity: measures how well a network decomposes into modular communities.
  • Eigenvector centrality: a measure of node importance based on a node’s connections; a node scores high when it is connected to other high-scoring nodes.
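To make these measures concrete, here is a sketch in base R on an invented weighted network. It follows the textbook definitions; Gephi's own implementation may differ in detail (for example in how it iterates or normalizes).

```r
# Hypothetical symmetric weighted adjacency matrix
w <- matrix(
  c(0, 2, 1, 0,
    2, 0, 3, 0,
    1, 3, 0, 1,
    0, 0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(letters[1:4], letters[1:4])
)

degree          <- rowSums(w > 0)  # number of edges per node
weighted_degree <- rowSums(w)      # sum of edge weights per node

avg_degree          <- mean(degree)
avg_weighted_degree <- mean(weighted_degree)

# Eigenvector centrality: the eigenvector of the largest eigenvalue,
# rescaled so that the most central node scores 1
ev <- eigen(w, symmetric = TRUE)
centrality <- abs(ev$vectors[, 1])
centrality <- centrality / max(centrality)
names(centrality) <- rownames(w)

print(c(avg_degree = avg_degree, avg_weighted_degree = avg_weighted_degree))
print(round(centrality, 3))
```

Node d has a single weak tie, so it ends up with the lowest centrality in this toy network.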

Screen Shot 2015-02-17 at 1.13.31 PM

 

Then, we need to adjust the nodes’ min and max size and apply the modularity class.

The next step is very important: the network contains many communities that are not important for us, so we only show the first three communities:

Screen Shot 2015-02-17 at 1.25.18 PM
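The same filtering can be sketched outside Gephi. This base R example assumes that "first three" means the three largest communities, and the modularity classes are invented.

```r
# Hypothetical modularity class per account
modularity_class <- as.character(c(0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4))
names(modularity_class) <- paste0("account", seq_along(modularity_class))

# Community sizes, largest first
sizes <- sort(table(modularity_class), decreasing = TRUE)
top3 <- names(sizes)[1:3]

# Keep only the accounts that belong to the three largest communities
keep <- names(modularity_class)[modularity_class %in% top3]
print(sizes)
print(keep)
```

Accounts outside the kept set would be hidden from the graph rather than deleted from the data.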

Then partition the edges by relationship (tweet, reply, and mention), run a layout algorithm (Force Atlas) to get a better layout, show the node labels, and run the Label Adjust algorithm.

Here is my result for the first three communities using #justdoit:

nike_justdoit

 

 

(Important steps for creating a beautiful network: statistics, nodes, modularity, edges, labels, layout)

From the analysis, we can see that nikejapan, nike, and nikewomen_jp are the first three #justdoit communities. Why do Japanese people enjoy running this much? The first thing that comes to my mind is the book What I Talk About When I Talk About Running by the famous Japanese writer Haruki Murakami. It seems that his book has been a hit!

My Facebook Friends Network

First, I had to get Gephi installed on my MacBook. It’s very easy to install on Windows, but on macOS the process was painful. It took me almost 7 hours to troubleshoot all the problems. Finally, I won. I followed these two posts and tried every suggestion. If you run into the same issue, you can either give up or keep cool and try all the tips in the two posts.
Then, I downloaded my personal friends network from Facebook with the NetGet application, which saves my friends’ network in a GML file. For this step, you’d better use Google Chrome to download the GML file. Safari did not work well for me: when I used it to save the file, the file was automatically saved as a txt file, which can’t be opened in Gephi. For groups or pages, we can get data from the Facebook app netvizz.
Now, I can start playing with Gephi!
1. Open the GML file in Gephi. The network is shown as a grey, meaningless graph.
Screen Shot 2015-02-14 at 2.32.46 PM
2. Layout: choose Force Atlas or Force Atlas 2. You can try different parameters to adjust the layout.
Here is my choice:
Screen Shot 2015-03-10 at 11.48.31 AM
3. Go to Statistics and run Avg. Path Length, then go to Ranking > Nodes and apply Betweenness Centrality. Try different parameters for min/max size and color.
4. Go to Statistics and run Modularity, then go to Partition and apply Modularity Class.

network without labels

5. If you want to show names in the graph, click “show nodes labels” in the Graph window and use “label adjust” in Layout.

network with labels

 6. File > Export your graph.
Save your project.
 Well done! 🙂
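For readers curious about what the Avg. Path Length statistic in step 3 measures, here is a small sketch in base R. It uses an invented five-node network and a plain breadth-first search; it shows the definition, not Gephi's implementation.

```r
# Hypothetical undirected friendship network
adj <- matrix(0, 5, 5, dimnames = list(LETTERS[1:5], LETTERS[1:5]))
edges <- rbind(c("A", "B"), c("B", "C"), c("C", "D"), c("B", "E"))
for (i in seq_len(nrow(edges))) {
  adj[edges[i, 1], edges[i, 2]] <- 1
  adj[edges[i, 2], edges[i, 1]] <- 1
}

# Breadth-first search: shortest hop count from one node to all others
bfs_distances <- function(adj, start) {
  dist <- rep(Inf, nrow(adj))
  names(dist) <- rownames(adj)
  dist[start] <- 0
  queue <- start
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    for (n in names(which(adj[current, ] > 0))) {
      if (is.infinite(dist[n])) {
        dist[n] <- dist[current] + 1
        queue <- c(queue, n)
      }
    }
  }
  dist
}

# Distances between all pairs, then the mean over reachable pairs
all_dist <- t(sapply(rownames(adj), function(s) bfs_distances(adj, s)))
pair_dist <- all_dist[upper.tri(all_dist)]
avg_path_length <- mean(pair_dist[is.finite(pair_dist)])
print(avg_path_length)
```

A small average path length means that most friends can reach each other through only a few intermediaries.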