A brief overview of a recent publication on a social network analysis study of online learning community development

Recently, one of my advisors, Dr. Cassandra Scharber, and I published a social network analysis (SNA) study titled “The influences of an experienced instructor’s discussion design and facilitation on an online learning community development: A social network analysis study” in The Internet and Higher Education. A short audioslide presentation of this overview can be found here. Please also read a brief introduction to SNA methods and my design of a visualization showcase app on my GitHub page. I have also been building a network analytics R package to show the analysis process used in this study; the package is available on my GitHub here.

In this study, we first characterized an effective online learning community as an environment with learners’ reciprocal interaction and continuous participation. Then, based on this philosophy, we argued that social network analysis (SNA) is an appropriate research method for studying online learning communities. We further proposed an integrated network analysis framework (refer to section 4.4 in this article) that uses emerging network methods to analyze both one-mode and two-mode networks in online learning. This framework combined emerging SNA measures, namely Opsahl’s measures (Opsahl, 2009; Opsahl, 2013; Opsahl, 2015; Opsahl, Agneessens, & Skvoretz, 2010; Opsahl & Panzarasa, 2009), with more traditional measures, namely Butts’ measures (Butts, 2008; Butts, 2014; Butts, Hunter, Handcock, Bender-deMoll, & Horner, 2015). The analysis was conducted in R with the relevant packages (i.e., tnet and sna). In addition, Marcos-García et al.’s (2015) DESPRO method, which uses centrality ranges to detect participatory roles, can be combined with Opsahl’s (2015) SNA centralities to make the results more accurate.
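To make the one-mode/two-mode distinction concrete, here is a minimal base-R sketch. It is not the code used in the study. It assumes a hypothetical two-mode matrix of students by discussion threads and projects it onto a one-mode student network, where each tie weight counts the threads two students share. All student and thread names are invented for illustration.

```r
# Hypothetical two-mode (students x threads) incidence matrix:
# 1 means the student posted in the thread. Names are illustrative only.
two_mode <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    0, 1, 1,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3", "s4"), c("t1", "t2", "t3"))
)

# One-mode projection: entry [i, j] counts threads shared by students i and j.
one_mode <- two_mode %*% t(two_mode)
diag(one_mode) <- 0  # drop self-ties

print(one_mode)
```

The projection loses information about which thread produced each tie, which is one reason to analyze the two-mode network directly as well.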

The purpose of this study is twofold: to provide methodological implications for using emerging social network analysis methods in online learning community research, and to provide practical implications for designing and facilitating discussions that can foster online learning communities.

SNA results showed that the students gradually formed an interactive, cohesive, and equally distributed learning community during class-level and group-level discussions. The instructor, overall, played a facilitator role in this community; yet her participatory roles varied across discussions and time frames. Her participatory role evolved from a guide in the first class-level discussion, to varying roles (i.e., a facilitator, an observer, and a collaborator) within different group discussions at the middle stages of the course, and to an observer in the course’s later stages.

Two important implications of this study are as follows. (1) Practical implications for designing and facilitating discussions that can foster online learning communities were proposed. Strategies include: design of a structural interweaving of class-level and group-level discussions; use of base groups at the early stage of an online course; integration of opportunistic collaboration groups with a “fixed” group configuration; and the instructor’s leadership role in the early stage, role changes according to different group situations, and relinquishment of authority in the middle and late stages.

(2) Methodological implications for studying online learning communities were proposed: an integrated social network analysis framework for one-mode and two-mode network analysis, as well as an adapted examination of instructors’ participatory roles. In particular, this framework includes specific suggestions for using SNA measures that stress both interaction and participation in online community research. It is worth mentioning that Opsahl’s network measures, which combine the effect of the number of ties with the effect of tie weights, can offer a more robust method to analyze online learning communities. In addition, the measures of reciprocity, transitivity, and centralization by Butts and his colleagues (2014, 2015) are also important. Basic statistics, such as student-student, student-instructor, and instructor-student interaction frequency, are simple yet useful measures.
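The idea of combining the number of ties with tie weights can be illustrated with the generalized degree centrality of Opsahl, Agneessens, and Skvoretz (2010): k × (s / k)^α, where k is a node's number of ties, s is the sum of its tie weights, and α is a tuning parameter. The sketch below is plain base R on an invented weighted matrix. It is only an illustration, not the study's code, which relied on the tnet package.

```r
# Hypothetical weighted, undirected interaction matrix (invented values).
w <- matrix(
  c(0, 4, 0, 0,
    4, 0, 1, 1,
    0, 1, 0, 0,
    0, 1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D"))
)

# Generalized degree centrality: k * (s / k)^alpha
# alpha = 0 gives the number of ties; alpha = 1 gives the total tie weight.
generalized_degree <- function(w, alpha = 0.5) {
  k <- rowSums(w > 0)   # number of ties
  s <- rowSums(w)       # sum of tie weights (strength)
  ifelse(k == 0, 0, k * (s / k)^alpha)  # isolates get 0
}

print(generalized_degree(w, alpha = 0))
print(generalized_degree(w, alpha = 0.5))
print(generalized_degree(w, alpha = 1))
```

With α between 0 and 1, a node with many weak ties and a node with few strong ties can be compared on one scale, which is the trade-off the measure is designed to expose.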

This study is a part of my dissertation research. For a more fun (and shorter) introduction to my dissertation study, please refer to my 3MT presentation. I hope you will find this SNA study interesting and useful for your own research. Feel free to contact me at ouyan064@umn.edu with further questions or for details.


Butts, C. T. (2008). Social network analysis: A methodological introduction. Asian Journal of Social Psychology, 11(1), 13-41. doi:10.1111/j.1467-839X.2007.00241.x

Butts, C. T. (2014). sna: Tools for social network analysis (version 2.3-2) [R package]. Retrieved from http://CRAN.R-project.org/package=sna

Butts, C. T., Hunter, D., Handcock, M., Bender-deMoll, S., & Horner, J. (2015). network: Classes for relational data (version 1.13.0) [R package]. Retrieved from https://cran.r-project.org/web/packages/network/index.html

Marcos-García, J. A., Martínez-Monés, A., & Dimitriadis, Y. (2015). DESPRO: A method based on roles to provide collaboration analysis support adapted to the participants in CSCL situations. Computers & Education, 82, 335-353. doi:10.1016/j.compedu.2014.10.027

Opsahl, T. (2009). Structure and evolution of weighted networks (Doctoral dissertation, University of London). Retrieved from http://toreopsahl.com/publications/thesis/

Opsahl, T. (2013). Triadic closure in two-mode networks: Redefining the global and local clustering coefficients. Social Networks, 35(2), 159-167. doi:10.1016/j.socnet.2011.07.001

Opsahl, T. (2015). tnet: Software for analysis of weighted, two-mode, and longitudinal networks (version 3.0.14) [R package]. Retrieved from https://cran.r-project.org/web/packages/tnet/index.html

Opsahl, T., Agneessens, F., & Skvoretz, J. (2010). Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks, 32(3), 245-251. doi:10.1016/j.socnet.2010.03.006

Opsahl, T., & Panzarasa, P. (2009). Clustering in weighted networks. Social Networks, 31(2), 155-163. doi:10.1016/j.socnet.2009.02.002


Use Meerkat-ED to do social network analysis and discourse analysis

Meerkat-ED is a software tool designed by Dr. Reihaneh Rabbany.

Meerkat-ED can be used to analyze two kinds of networks: participants’ interaction network and the network of terms they have used in their interactions.
I tried a small portion of my data in this tool. Here is the process:
  • Transfer your data into a .txt file. If you don’t use Moodle, please refer to samplefile.txt for the format. The online course I am studying was not based on Moodle, so I manually transferred a very small portion of my data into the right format.
  • Open the .txt file in Meerkat-ED; two networks (i.e., the interaction network and the term network) are generated automatically.

In the interaction network, you can change the time frame at the bottom to show the network over different periods.


Figure 1. Student interaction network

In the term network, you can click a student’s name in the top-right window to show the terms this student used.


Figure 2. Term network

  • you can click analyze-content analysis-topic clustering to show topic communities.


Like KBDex, Meerkat-ED is a useful tool for analyzing online collaborative learning. One strength of Meerkat-ED is that you can set different text analysis methods and different types of words, e.g., nouns or verbs. One way Meerkat-ED differs from KBDex is that you cannot choose specific words in Meerkat-ED. In addition, you should pay attention to the data cleaning process before you import your data into Meerkat-ED. You might need to make nouns consistent throughout your data file. For example, I should have changed all “communities” to “community”, and “questions” to “question”. This part can be tricky.
Another creative use of Meerkat-ED and KBDex is to visually demonstrate content or discourse analysis results. For instance, if you have coded discussions based on some coding scheme, you have codes for each analysis unit. You can then use Meerkat-ED and KBDex to show the networks of codes (where links represent co-occurrence of codes in the same sentence), rather than the networks of words or phrases from the discussion content itself. This might be an interesting way to demonstrate content or discourse analysis results. Widely used ways to demonstrate such results are “code and count” tables, pie/bar charts, or sequential/path analysis. The term network function supported by Meerkat-ED can create new visualizations for researchers.
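The code network idea above can be sketched in a few lines of base R. The example assumes a hypothetical table in which each row is an analysis unit and each column is a code from an invented coding scheme. It builds the code-by-code co-occurrence counts and turns them into an edge list. How each tool expects such data to be formatted for import is not covered here.

```r
# Hypothetical coded data: each row is one analysis unit (e.g., a sentence),
# each column is a code from an invented coding scheme; 1 = code applied.
coded <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    1, 1, 0,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("unit", 1:4), c("question", "explain", "agree"))
)

# Code-by-code co-occurrence: entry [i, j] counts units carrying both codes.
cooc <- crossprod(coded)   # same as t(coded) %*% coded
diag(cooc) <- 0            # ignore a code co-occurring with itself

# Convert the upper triangle to an edge list (one row per linked code pair).
idx <- which(upper.tri(cooc) & cooc > 0, arr.ind = TRUE)
edges <- data.frame(
  from   = rownames(cooc)[idx[, "row"]],
  to     = colnames(cooc)[idx[, "col"]],
  weight = cooc[idx]
)
print(edges)
```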
Overall, Meerkat-ED is a solid tool to do data mining in online collaborative learning.
If you are interested in applying this tool in your research, please refer to the following for more info:
Rabbany, R., Takaffoli, M., & Zaïane, O. R. (2011). Analyzing participation of students in online courses using social network analysis techniques. In Proceedings of educational data mining.
Rabbany, R., Elatia, S., Takaffoli, M., & Zaïane, O. R. (2013). Collaborative learning of students in online discussion forums: A social network analysis perspective. In Educational data mining (pp. 441-466). Springer International Publishing.

Use an interactive Shiny app to demonstrate your research analysis results

Shiny is an R package created by RStudio. It can be used in creative ways to demonstrate your research results if you use R to analyze your data. A Shiny app includes two files, server.R and ui.R. The R code that you used to analyze your data can be embedded in the server.R file; it may create plots, tables, network graphs, etc. You can design and program your user interface with different input controls, such as checkboxes, buttons, radio buttons, sliders, and text inputs.

Here is an example I played with, based on a recent study my advisor has worked on: a gender analysis of female scholars in six educational technology journals from 2004 to 2015.

  • used navbarPage to demonstrate two pages for including different info
  • used select box for users to select the journal
  • used tabPanel to demonstrate two different formats of barplots (created through ggplot2 in R)
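The three design choices in the list above can be sketched in a single-file Shiny app. This is a rough illustration, not the app described in this post: the data frame, journal names, counts, and input/output IDs are all hypothetical placeholders, and the ui and server parts are kept in one script instead of separate ui.R and server.R files.

```r
library(shiny)
library(ggplot2)

# Hypothetical placeholder data; NOT the research team's data.
authors <- data.frame(
  journal = rep(c("Journal A", "Journal B"), each = 2),
  gender  = rep(c("Female", "Male"), times = 2),
  count   = c(10, 12, 8, 15)
)

# ui part: a navbarPage with two pages, a select box, and two tabs of barplots.
ui <- navbarPage(
  "Demo app",
  tabPanel(
    "Plots",
    selectInput("journal", "Journal:", choices = unique(authors$journal)),
    tabsetPanel(
      tabPanel("Counts", plotOutput("count_plot")),
      tabPanel("Proportions", plotOutput("prop_plot"))
    )
  ),
  tabPanel("About", p("Placeholder description of the study."))
)

# server part: the analysis code that produces the plots.
server <- function(input, output) {
  selected <- reactive(authors[authors$journal == input$journal, ])

  output$count_plot <- renderPlot({
    ggplot(selected(), aes(x = gender, y = count)) + geom_col()
  })
  output$prop_plot <- renderPlot({
    d <- selected()
    d$prop <- d$count / sum(d$count)
    ggplot(d, aes(x = gender, y = prop)) + geom_col()
  })
}

app <- shinyApp(ui = ui, server = server)
# In an interactive session, runApp(app) launches the app.
```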


The copyright of the data demonstrated in this Shiny app belongs to the research team; please do not use the data without permission.


Many fancy examples can be found here in the Shiny gallery and user showcase, where you might find some inspiration. Really, it is not that hard to learn, even if you don’t have any programming background and don’t know any programming languages.

In addition, a Shiny app can be embedded within an R presentation to make the presentation more interactive.

Use KBDex in online collaborative learning research

A Japanese research and software team developed an analytical software tool called KBDeX to visualize network structures of discourse based on a two-mode network of words × discourse units. This software is basically a discourse analysis tool, based on the relations between words and discourse units. As a researcher in the collaborative online learning field, you can select words based on your interest and demonstrate the relations among them. This can help demonstrate students’ knowledge construction/creation. KBDeX presents word and discourse relations in network format; in addition, it can show the temporal development of the networks, which is a big strength of this discourse analysis tool. Yet it should be noted that it is not primarily based on the relations between students; rather, it is based on the co-occurrence of words in discourse units. A student interaction network can still be shown, however, in terms of the common words students have used in the discourse.
Here I want to show a demo using a part of my dissertation data. I will probably consider using this tool in my dissertation.
  • First, I extracted a portion of the online discussion data I collected for my dissertation and converted it to the format shown by the KBDex dataset. In my data, each discourse unit represents a comment.
  • I added one feature to assign students to two different groups (see figure 1); a group consists of students who were involved in the same discussion within a thread.
  • Then I ran the data of group 2 in the main windows (see figure 2).
Figure 1. Groups
The KBDex platform has four windows: (1) the discourse viewer, which shows an overview of the discourse and selected words (top left window), (2) the network structure of students (top right window), (3) the network structure of discourse units (bottom left window), and (4) the network structure of selected words (bottom right window).

Figure 2. Main windows

An important lesson I gained from this trial is that it is very important to consider which words you choose from the student discussion content. Words can represent part of students’ inquiry process, but they cannot represent the whole process. We, as researchers, might impose our own understanding of the inquiry process through the choice of words; yet those words cannot fully represent students’ cognitive inquiry process. For example, in the bottom left window of figure 2, we can see that some comments are isolated from the core cluster, which means that these comments share no selected words with the others. But, of course, students engaged in inquiry in these comments; they just did not use the words we chose. Therefore, it’s important to pay attention to the word selection process and to explain why you chose some words rather than others. It’s also important to acknowledge the inquiry process within isolated discourse units.
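To see how the word choice alone can isolate a comment, here is a small base-R sketch. It assumes three invented comments and a hypothetical list of selected words. It only imitates the idea of linking discourse units through shared words and makes no claim about how KBDex computes its networks internally.

```r
# Hypothetical comments and selected words (invented for illustration).
comments <- c(
  c1 = "The community shares knowledge through discussion",
  c2 = "Discussion helps build knowledge",
  c3 = "I wonder why the deadline moved"
)
selected_words <- c("community", "knowledge", "discussion")

# Word-by-unit incidence: TRUE if the comment contains the word.
incidence <- sapply(comments, function(txt) {
  vapply(selected_words,
         function(w) grepl(w, tolower(txt), fixed = TRUE),
         logical(1))
})

# Unit-by-unit network: two comments are linked if they share a selected word.
unit_net <- crossprod(incidence * 1)
diag(unit_net) <- 0

# Isolated units share no selected word with any other comment.
isolated <- names(which(rowSums(unit_net) == 0))
print(isolated)
```

The third comment is a genuine question from a student, yet it drops out of the network simply because none of the selected words appear in it.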

Some things to consider before you start a KBDex analysis:

  • Clean your data, and make sure the words you chose are consistent throughout the dataset.
  • Consider how to use the group function. A group does not need to be a traditional group as we talk about it in education. In my study, a group consists of students who interacted with each other in a discussion thread, so a discussion thread is divided into several groups depending on the interaction.
  • Consider the time information in your data, since KBDex can show the temporality of networks, namely the evolution of networks.
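For the first point in the list, keeping words consistent, a lookup table of variant forms is one simple option. The sketch below is base R; the function name normalize_words, the lookup table, and the sample sentences are all made up for illustration. Note that it also lowercases the text, which may or may not suit your data.

```r
# Hypothetical lookup table mapping variant forms to one canonical form.
canonical <- c(communities = "community", questions = "question")

# normalize_words is an illustrative helper, not part of any package.
normalize_words <- function(text, lookup) {
  text <- tolower(text)
  for (variant in names(lookup)) {
    # \\b restricts the match to whole words
    pattern <- paste0("\\b", variant, "\\b")
    text <- gsub(pattern, lookup[[variant]], text, perl = TRUE)
  }
  text
}

raw <- c("Online communities raise questions.", "One community, one question.")
cleaned <- normalize_words(raw, canonical)
print(cleaned)
```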

Finally, as with Meerkat-ED, the selected-word network function provided by KBDex can be used to demonstrate content/discourse analysis results.

If you are interested in using KBDex in your online collaborative learning research, here is some seminal work by the research and software development team:

Matsuzawa, Y., Oshima, J., Oshima, R., Niihara, Y., & Sakai, S. (2011). KBDeX: A platform for exploring discourse in collaborative learning. Procedia-Social and Behavioral Sciences, 26, 198-207.

Matsuzawa, Y., Oshima, J., Oshima, R., & Sakai, S. (2012). Learners’ use of SNA-based discourse analysis as a self-assessment tool for collaboration. International Journal of Organisational Design and Engineering, 2(4), 362-379.

Oshima, J., Oshima, R., & Matsuzawa, Y. (2012). Knowledge Building Discourse Explorer: A social network analysis application for knowledge building discourse. Educational Technology Research and Development, 60(5), 903-921.

A social network analysis of my core discussion network

Many researchers have done studies on people’s core discussion network, the set of friends and family people turn to when discussing important matters.

I did a simple social network analysis of my core discussion network, which helped me gain an overall picture of my social network. I first made a weighted adjacency matrix to encode an egocentric network of those with whom I have discussed important matters in the last six months. In the weighted adjacency matrix, rows and columns represent nodes, and weights range from 0 to 5: 0 means there is no connection between two nodes, and 5 means the two nodes have exchanged important information. The matrix was saved in a csv file, which looks like this:

Screen Shot 2015-09-27 at 8.06.04 PM

I used two R packages, network and sna, in my analysis process.

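  1. The first step is to import the data into R. There are two ways to import data: 1) from a Google file, or 2) from a local file.

library(RCurl)  # provides getURL()
url <- "the google file URL address"
myCsv <- getURL(url)
myData <- read.csv(text = myCsv)  # parse the downloaded text


myData <- read.csv("local file path")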

2. The second step is to convert the file into a matrix in R. (Note: “row.names=1” uses the first column of the file as the row names of the matrix.)

ego <- as.matrix(read.csv("google file URL address", row.names=1))


ego <- as.matrix(read.csv("local file path", row.names=1))
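To show what row.names=1 does in practice, here is a self-contained round trip using only base R. The three-person matrix and its weights are invented; it is written to a temporary csv file and read back the same way as in step 2.

```r
# Hypothetical 3-person weighted adjacency matrix (weights 0-5, invented).
people <- c("Me", "Mom", "Friend")
m <- matrix(
  c(0, 5, 3,
    5, 0, 0,
    3, 0, 0),
  nrow = 3, byrow = TRUE, dimnames = list(people, people)
)

# Write it to a temporary csv file; the first column holds the row labels.
path <- tempfile(fileext = ".csv")
write.csv(m, path)

# row.names = 1 uses that first column as row names instead of data.
ego_demo <- as.matrix(read.csv(path, row.names = 1))

print(ego_demo)
# An undirected network should have a symmetric matrix.
print(isSymmetric(ego_demo))
```

Without row.names=1, the label column would be read as data and the result would no longer be a square numeric matrix.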

3. The third step is to plot the network. I can use the gplot function from the sna package to draw a network plot, for example:

library(sna)  # provides gplot(); also loads the network package
gplot(ego, gmode="graph", displaylabels=TRUE, label.cex=0.8, vertex.col="darkolivegreen")

The plot looks like this:

Screen Shot 2015-09-27 at 8.19.03 PM

4. The last step is to customize the network. I want to show different groups in different colors, and to have edge width represent the weight between two nodes. This process is a little complicated.

# create the network and store the matrix values in an edge attribute called edgeweight
# (ignore.eval=FALSE and names.eval are used here so the weights come from the matrix itself)

MyNetwork <- network(ego, directed=FALSE, ignore.eval=FALSE, names.eval="edgeweight")

# check the values stored in the edgeweight attribute

MyNetwork %e% "edgeweight"

# set node colors (one color per node, in the same order as the matrix rows)
nodeColors <- c("goldenrod3","steelblue1","steelblue1","steelblue1", "darkgreen","darkgreen", "darkgreen","firebrick","firebrick", "firebrick","firebrick","darkslateblue","darkslateblue", "darkslateblue","blue","blue","blue")

# use function plot to draw the network
plot(MyNetwork, displaylabels=TRUE, label.cex=0.5, edge.lwd="edgeweight", vertex.col=nodeColors)

Finally, my core discussion network (ego network) looks like this:

Screen Shot 2015-09-27 at 8.25.10 PM

I didn’t record all my important connections in the matrix. But the above analysis shows that friendship, coworker relationships, and kinship are apparent in my core discussion network. Alters in these networks are closely tied to each other within each network, but there are no relationships between the networks. For instance, my dad, mom, and sister are closely tied to each other in the kinship network, but they are not tied to my friendship network or coworker network. The partner network is unique: its members are not linked to each other at all.
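The observation that alters are tied within groups but not between them can also be checked numerically by comparing tie density within and between groups. The sketch below uses base R and a made-up five-person network with the ego removed. The names, group labels, and ties are hypothetical and do not come from my matrix.

```r
# Hypothetical alters (ego removed) in two invented groups.
alters <- c("dad", "mom", "sister", "friend1", "friend2")
group  <- c("kin", "kin", "kin", "friend", "friend")
adj <- matrix(0, 5, 5, dimnames = list(alters, alters))
adj[c("dad", "mom", "sister"), c("dad", "mom", "sister")] <- 1
adj[c("friend1", "friend2"), c("friend1", "friend2")] <- 1
diag(adj) <- 0

# TRUE where both endpoints belong to the same group.
same_group <- outer(group, group, "==")
off_diag   <- row(adj) != col(adj)

# Density = share of possible ties that are present.
within_density  <- mean(adj[same_group & off_diag] > 0)
between_density <- mean(adj[!same_group] > 0)

print(c(within = within_density, between = between_density))
```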

A Basic Social Network Analysis for LA Seminar in Knowledge Forum

In the spring 2015 semester, Dr. Bodong Chen offered the course CI5330, a learning analytics seminar, at UMN. He used Knowledge Forum (KF) as the online learning environment. Knowledge Forum is educational software designed to support knowledge building communities. The instructor can set up scaffolding keywords to help students build knowledge in the online community. Figure 1 shows the home page for CI5330 and figure 2 shows the KF page for week 2.


Figure 1. KF homepage for CI5330


Figure 2. KF page for week 2

My purpose is to analyze students’ basic interactions in CI5330 throughout the whole semester in KF. I want to know whether there are extremely active students and outliers or not. I also want to know whether there are highly connected small groups in the KF or not.

Based on the data generated by students and the instructor on KF, I created two csv files to represent nodes and edges. The node file (figure 3) contains information of the instructor and students; the edge file (figure 4) contains their interaction information. When one person builds on another person’s post, I add a record in the edge file.

Screen Shot 2015-04-25 at 12.16.49 PM

Figure 3. The node file

Screen Shot 2015-04-25 at 8.21.15 AM

Figure 4. The edge file

I used Gephi to do the social network analysis. Here are my results.

From figure 5 we can see that the differences among people’s in-degrees and out-degrees are not large. One or two people’s degrees are relatively lower than the others’. The same conclusion can be drawn from the in-degree and out-degree diagrams, where node size represents the node’s degree. There is only one outlier in the in-degree and out-degree diagrams.
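For readers who prefer R to Gephi, in-degree and out-degree can be counted directly from an edge file. The sketch below is base R with an invented edge list; the column names source and target and the three people are hypothetical and are not the data from this course.

```r
# Hypothetical edge file: one row each time `source` builds on a post by `target`.
edges <- data.frame(
  source = c("amy", "ben", "ben", "cat", "amy"),
  target = c("ben", "amy", "cat", "amy", "cat"),
  stringsAsFactors = FALSE
)
people <- sort(unique(c(edges$source, edges$target)))

# Out-degree: build-ons a person wrote; in-degree: build-ons a person received.
out_degree <- table(factor(edges$source, levels = people))
in_degree  <- table(factor(edges$target, levels = people))

degrees <- data.frame(
  person     = people,
  in_degree  = as.vector(in_degree),
  out_degree = as.vector(out_degree)
)
print(degrees)
```

Because every edge leaves one node and enters another, the in-degree and out-degree columns always sum to the same total, which is a quick sanity check on the edge file.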


Figure 5. In-Degree and Out-Degree table


Figure 6. In-Degree Diagram


Figure 7. Out-Degree Diagram

Figure 8 shows the modularity class generated in Gephi. From this diagram, we can see that there are two groups. Most people in KF are connected to each other in a big group.

Based on this basic social network analysis, I conclude that people in this class are equally involved in this online learning community. My next step is to do a content analysis to get an in-depth picture of the topics people communicated in the KF.


Figure 8. The modularity class

Note: This SNA report is just for my personal purpose, please don’t use any information from this post for any research.

Finally, I wanna share a social network analysis site created by Christiane Reilly and me for a class session in this course.

Text mining my paper with an online text visualization tool

Today, I was introduced to an online text visualization tool by my course colleagues. This tool can turn any text into an interactive network. Very cool!

I used it to analyze one of my conference papers, which focused on online social learning. Here is the link to the text network.

Screen Shot 2015-04-02 at 10.14.45 PM

I tried to embed it, but it doesn’t seem to show up 😦