WORDij 3.0 Basic Features How to use WordLink, QAPNet, VISij, Z-Utilities – a Twitter Example
Assumptions This presentation assumes you have installed WORDij 3.0. If not, see the “How to Install WORDij 3.0” tutorial. The files used in this tutorial (the “twitter” files and the drop list) accompany the program download package. Here we use three files: twitter2008.txt, twitter2009.txt, and droplist.txt. When we say “See Notes” for more information, this means the slide has notes at the bottom, which you cannot see unless you turn off the Slide Show mode and go to View, Normal Mode.
How to use WordLink • Start WORDij 3.0 by going to the stored location and clicking on WORDij.jar. This will open the program. • To do a basic WordLink procedure, click the “Browse…” Source Text File box to locate the sample file: twitter2009.txt • Click “Browse…” for the Drop List File to locate file: droplist.txt (See Notes below for more detail.)
Leave other settings at defaults. See Notes below for further detail. • Click the “Analyze Now” button to execute your WordLink analysis. • “Quit” terminates the program without executing further analyses.
You will see three boxes while the program is running: • “WordLink is working” and showing a progress bar. You can abort the program by pressing the “Stop Now” button. • “Program is finished” and you press “OK” • A log file lists program options and an output file after you click “Close.”
There are 8 output files generated: • These files serve as input for other analyses. See Notes for description of each file.
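To make the idea of word and word-pair frequencies concrete, here is a minimal Python sketch. It is not WordLink's actual procedure: the tokenizing rule, the `window` parameter, and removing drop-list words before pairing are all assumptions made for illustration, and the sample sentence is invented.

```python
from collections import Counter

def count_words_and_pairs(text, drop_words, window=3):
    """Count words and ordered word pairs that occur within `window` positions.

    Illustrative only: the tokenizer and window rule are assumptions,
    not a description of what WordLink does internally.
    """
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    tokens = [t.strip(".,;:!?\"'()[]") for t in text.lower().split()]
    # Remove empty tokens and words on the drop list.
    tokens = [t for t in tokens if t and t not in drop_words]
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, first in enumerate(tokens):
        # Pair each word with the words that follow it inside the window.
        for second in tokens[i + 1 : i + 1 + window]:
            if first != second:
                pair_counts[(first, second)] += 1
    return word_counts, pair_counts

sample = "Twitter users post short updates. Twitter users follow other users."
drop = {"the", "a", "other"}
words, pairs = count_words_and_pairs(sample, drop, window=2)
print(words.most_common(2))
print(pairs[("twitter", "users")])
```

A wider window links words that are farther apart, which yields more pairs and a denser network.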
There are options on how to draw networks. In this presentation we use VISij which is part of the WORDij 3.0 package. • For an alternative, see the tutorial “How to Draw a Semantic Network using UCINET/NetDraw.” This approach has more steps (17) and does not show motion over time like VISij does. • You can also graph in Pajek or in MultiNet/Negopy, or in other programs that accept the Pajek .net file format. The WORDij 3.0 Conversion tab contains utilities that will convert .net to MultiNet node and link files, .csv with required headers. WORDij 3.0 provides a high degree of network program interoperability.
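For readers who want to see what a .net conversion involves, the sketch below parses a small Pajek-style .net text and writes a link list as CSV. It is not the WORDij conversion utility. The sample network is invented, the column names `source`, `target`, and `weight` are hypothetical (the headers MultiNet requires are not given here), and the parser assumes labels are wrapped in double quotes.

```python
import csv
import io
import shlex

def parse_pajek(net_text):
    """Return (labels, links) from the text of a simple Pajek .net file."""
    labels, links, section = {}, [], None
    for line in net_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section markers such as *Vertices, *Arcs, *Edges.
            section = line.split()[0].lower()
            continue
        parts = shlex.split(line)  # keeps a quoted label as one field
        if section == "*vertices":
            labels[int(parts[0])] = parts[1]
        elif section in ("*arcs", "*edges"):
            weight = float(parts[2]) if len(parts) > 2 else 1.0
            links.append((int(parts[0]), int(parts[1]), weight))
    return labels, links

def links_to_csv(labels, links):
    """Write links as CSV text, using labels instead of vertex numbers."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["source", "target", "weight"])  # hypothetical headers
    for src, dst, weight in links:
        writer.writerow([labels[src], labels[dst], weight])
    return out.getvalue()

example = '''*Vertices 3
1 "twitter"
2 "users"
3 "social media"
*Arcs
1 2 12
2 3 4
'''
labels, links = parse_pajek(example)
csv_text = links_to_csv(labels, links)
print(csv_text)
```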
The default is 30 nodes and a minimum link strength of 3, but Zoom needs to be changed for most graphs. The default is no zoom and the resulting graph may look suboptimal. It is the initial graph, which you then tailor to your preferences by changing the number of nodes and links and zooming in and out.
As shown in the previous slide, to graph your network, click on the main tab at the top labeled “VISij.” It does a spring-embedded optimal layout. • Click on ADD to include a .net file(s) for graphing. • The default settings are 30 nodes with Minimum link value of 3. • Click Min Link Value to increase or decrease. • Click on Zoom + or – to move in or out to set an optimal view within the screen. For other options see Notes below.
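The effect of the node and minimum link settings can be pictured with a short Python sketch. How VISij actually chooses which 30 nodes to show is not stated in this tutorial, so ranking nodes by total link weight is an assumption, and the sample links are invented.

```python
def filter_network(links, max_nodes=30, min_link=3):
    """Keep links with weight >= min_link among the max_nodes strongest nodes.

    Assumption: node "strength" is the sum of the weights of its kept links.
    """
    strong = [(a, b, w) for a, b, w in links if w >= min_link]
    strength = {}
    for a, b, w in strong:
        strength[a] = strength.get(a, 0) + w
        strength[b] = strength.get(b, 0) + w
    # Rank by total weight; break ties alphabetically so results are repeatable.
    ranked = sorted(strength, key=lambda n: (-strength[n], n))
    keep = set(ranked[:max_nodes])
    return [(a, b, w) for a, b, w in strong if a in keep and b in keep]

links = [
    ("twitter", "users", 12),
    ("twitter", "news", 5),
    ("news", "media", 2),
    ("users", "post", 3),
]
default_view = filter_network(links)                   # 30 nodes, min link 3
small_view = filter_network(links, max_nodes=3, min_link=3)
print(default_view)
print(small_view)
```

Raising the minimum link value removes weak links first; lowering the node count then trims the least connected words.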
To show the network around the key word “twitter” for 2009, we ran the NodeTric program in the Conversions tab. Steps: • Browse for the 2009 .pr file. • Put in the Focal Word: twitter • Select the default of 5 Link Steps away from this node. • Browse for where to put the output file and what to call it. • Then to run click Start. • When it ends, click Close. • Repeat this process with the 2008 file. • Next we graph these 2008 & 2009 NodeTric .net files in VISij.
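The idea of a node-centric network within a number of link steps can be sketched as a breadth-first search from the focal word. This is an illustration, not NodeTric's code: treating links as undirected is an assumption, and the sample links are invented.

```python
from collections import deque

def ego_network(links, focal, max_steps=5):
    """Return the links among words reachable within max_steps of the focal word."""
    neighbors = {}
    for a, b, _ in links:
        # Assumption: a link can be followed in either direction.
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    distance = {focal: 0}
    queue = deque([focal])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # do not expand beyond the step limit
        for nxt in neighbors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return [(a, b, w) for a, b, w in links if a in distance and b in distance]

links = [
    ("twitter", "users", 12),
    ("users", "post", 3),
    ("post", "updates", 2),
    ("blogs", "readers", 4),
]
two_steps = ego_network(links, "twitter", max_steps=2)
five_steps = ego_network(links, "twitter", max_steps=5)
print(two_steps)
print(five_steps)
```

Pairs that are not connected to the focal word, such as the invented “blogs, readers” link, never appear in the result regardless of the step setting.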
Summary and Next Demonstration: • So far we have demonstrated how to take a text file, run WordLink to get the word & word-pair frequencies, and graph the semantic network with VISij. It allows for graphing a single point in time or over multiple points in time. • Next we will move beyond visualization to the comparison of text files from two different time periods. First, QAP will determine the overall degree of network similarity. Second, the Z-utilities will show us what words and word pairs are new, what is dropping in relative frequency, and what is remaining the same in relative frequencies (proportions). • We will compare Twitter news in early 2008 (n=58) and early 2009 (n=182).
QAP is an overall measure of the similarity of two networks using a correlation coefficient. • It does not matter in which order the two files are. Enter the two .pr files you wish to compare. Here we enter the 2008 and 2009 twitter files. • You may leave the Permutations value at the default of 100 to generate 100 random permutations against which a significance probability is computed. • Click Analyze Now. • After the message that the program is finished, click OK and observe the contents of the log file for the correlation and probability. • Click Close. (If you click Quit it will terminate WORDij.) • The results for the two twitter files show a small correlation. • See Notes file below.
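The logic behind a QAP correlation can be sketched in a few lines of Python. This is a generic textbook-style version, not QAPNet's implementation. It assumes both networks have already been turned into square matrices over the same word list in the same order, and the function names, the `seed` parameter, and the small matrices are invented for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def off_diagonal(matrix, order):
    """Cells of the matrix, rows and columns reordered together, diagonal skipped."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(n) if i != j]

def qap(a, b, permutations=100, seed=0):
    """Return (observed correlation, share of permutations at least as large)."""
    rng = random.Random(seed)
    identity = list(range(len(a)))
    a_cells = off_diagonal(a, identity)
    observed = pearson(a_cells, off_diagonal(b, identity))
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel the words of network b at random
        if pearson(a_cells, off_diagonal(b, order)) >= observed:
            hits += 1
    return observed, hits / permutations

net_2008 = [[0, 5, 1, 0], [5, 0, 2, 1], [1, 2, 0, 4], [0, 1, 4, 0]]
net_2009 = [[0, 4, 0, 1], [4, 0, 3, 0], [0, 3, 0, 6], [1, 0, 6, 0]]
r, p = qap(net_2008, net_2009)
print(round(r, 3), p)
```

Because the correlation is symmetric, swapping the two matrices gives the same coefficient, which matches the point above that file order does not matter.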
Even though the QAPNet correlation of the two twitter files is low, there may be a number of words or word pairs that have higher relative frequency in one file compared to the other, and others with no significant difference. This is revealed with the Z Utilities. • There are two different types of Z comparisons: word proportions and word-pair proportions. There are actually two types of word-pair comparisons: one is for the main .pr file and the other is for the results of the NodeTric .net file pairs, in which only the pairs in the node-centric network around a focal word are compared.
Z-Utilities allow you to compare two text files and determine what the significant differences are for either the words or the word pairs or the pairs from NodeTric .nets.
Enter or browse for the first file and then for the second file. Take note of the order, because the sign of each z-score (negative or positive) depends on which file is entered first. You must browse to create the output file name inside the Select Window. Do not merely type the name in the WORDij screen slot for output file or it will produce no output file. You will get a warning message if you browse to a file name that is one of the input files. The next slide is a screen shot of the z-word pair comparison.
The above screen shot shows the output file name we have chosen. We suggest you open it in Excel and specify Fixed Width format. This will place each column of the output file into a separate Excel column. By default the file is sorted by z-score in ascending order, so that the largest negative values appear first. • Negative values mean there is a higher relative frequency for group 2 (time 2) than for group 1 (time 1). These are word pairs that are relatively new or have significantly increased in frequency. • Or, sort by Chi Square in descending order. All the “NAs,” which appear when there are fewer than 5 counts in a group, will be listed first; after the list of NAs come the highest chi-square values. Chi-square does not distinguish direction of difference with negative or positive signs (all values appear to be positive), so you look at the proportions and counts to see which one is lower or higher to determine the direction of difference, increasing or decreasing.
There are three kinds of z-tests and chi-squares that WORDij 3.0 computes: words, word pairs, and pairs from the selected “ego-centric,” node-centric networks from NodeTric. • Because pairs are the most important elements to identifying networks we illustrate reading them into Excel and interpreting them. The same process would apply to words and word pairs from NodeTric .net file comparisons.
Reading a z-test pair output file into Excel: Open the file and click Finish. Click the upper left corner to select the entire spreadsheet. Then click Format Column, AutoFit Selection. This will adjust your column widths.
Here is how to format the Excel file for optimal viewing. Select columns D, E, F, & G to format. Select Format, Cells, Number, and increase the decimals to 4. Click OK.
This is a snapshot of a portion of the Excel file, sorted by Z-score in ascending order. Negative z-score values mean that group 2 (here time 2) had higher relative frequencies than group 1 (here time 1). These are word pairs that are significantly increasing over time. If the comparison files were not time-based it would show the second file as having higher values.
Word pairs with positive z-scores are “dropping” or lower in relative frequency in group 2 compared to group 1.
Here are word pairs that did not change or were not different in the two files. These are indicated by insignificant z-scores, those between -1.63 and +1.63:
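To show how the sign of a z-score maps onto “increasing,” “dropping,” and “unchanged,” here is a sketch using the standard pooled two-proportion z-test. The exact formula WORDij uses, including its small constant for zero counts, is not given in this tutorial, so treat this as an approximation. The function names and the counts are invented; the 1.63 cutoff is the one quoted above.

```python
import math

def z_two_proportions(count1, total1, count2, total2):
    """Pooled two-proportion z-score; negative means group 2 is proportionally higher."""
    p1, p2 = count1 / total1, count2 / total2
    pooled = (count1 + count2) / (total1 + total2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total1 + 1 / total2))
    return (p1 - p2) / se

def describe(z, cutoff=1.63):
    """Label a z-score using the cutoff quoted in this tutorial."""
    if z <= -cutoff:
        return "higher in group 2"
    if z >= cutoff:
        return "higher in group 1"
    return "no significant difference"

# Invented counts: pair seen 5 times in 1,000 pairs (time 1), 40 in 2,000 (time 2).
z_up = z_two_proportions(5, 1000, 40, 2000)
# Same proportion in both files.
z_flat = z_two_proportions(10, 1000, 20, 2000)
print(round(z_up, 2), describe(z_up))
print(round(z_flat, 2), describe(z_flat))
```

Swapping the order of the two files flips the sign of every z-score but not its size, which is why the file order matters when reading the output.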
Word pairs sorted by chi square in descending order: This shows the value of using both z-scores and chi-square tests because they do not always produce the same results. For example, chi-square is unsigned. One must examine the difference in word pair counts and proportions to judge directionality and/or substantive significance. Chi-square is based on the actual counts, while z-tests are based on proportions.
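The unsigned nature of chi-square and the “NA” rule can be illustrated with a generic Pearson chi-square for a 2x2 table. This is a sketch under stated assumptions, not WORDij's code: the table layout (pair count versus all other pairs in each file), the `min_count` parameter, and the sample counts are invented for illustration.

```python
def chi_square_2x2(count1, total1, count2, total2, min_count=5):
    """Pearson chi-square for a 2x2 table, or None ("NA") if a count is below min_count."""
    if count1 < min_count or count2 < min_count:
        return None  # too few observations for a chi-square test
    observed = [[count1, total1 - count1], [count2, total2 - count2]]
    grand = total1 + total2
    row_totals = [total1, total2]
    col_totals = [count1 + count2, grand - count1 - count2]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / grand
            chi2 += (observed[r][c] - expected) ** 2 / expected
    return chi2

chi_a = chi_square_2x2(5, 1000, 40, 2000)
chi_b = chi_square_2x2(40, 2000, 5, 1000)   # same data, groups swapped
chi_na = chi_square_2x2(3, 1000, 40, 2000)  # fewer than 5 in group 1
print(round(chi_a, 2), round(chi_b, 2), chi_na)
```

Swapping the groups leaves the value unchanged, so the direction of the difference has to be read from the counts and proportions.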
Here is a sample of a screen from the z-tests and chi-square tests for comparing two .net files produced by NodeTric with “twitter” as the focal word. This illustrates word pairs that are higher in relative frequency in group 2 than in group 1. Notice that the Z-tests have significant values because we use a very small constant for zero. But, one should not use chi-square with cell frequencies less than 5, so this shows NA for these tests even though the results appear possibly substantively significant.
Here are the positive NodeTric .net pair changes, which are significantly lower in relative frequency for time 2 (group 2) compared to time 1 (group 1). Notice how “virtual worlds” drops out. Remember that files need not be time-based. With this utility you can compare two files based on any criterion for creation and see which has more or less of some word or word-pair attribute.
These sample screen shots show some of the NodeTric .net word pairs that remained the same in relative frequency from time 1 (group 1) to time 2 (group 2) according to z-scores but not all chi-squares.
How to Create Semantic Network Graphs in UCINET and NetDraw Importing WORDij .net files Converting .net files to system files in UCINET Creating the visualization in NetDraw
Graphing the semantic network with NetDraw is a 17-step process. Breaking it down step by step makes it easy to make a graph: 1. Start UCINET (http://www.analytictech.com/) 2. Click on the Data tab. 3. Move your cursor to the Import option. 4. Move your cursor to Pajek and click it. 5. In the dialog box browse for your twitter.net file and accept the default file names for the remaining slots in the box. 6. Close the Output Log file.
Graphing the semantic network with NetDraw (cont’d): 7. Click the Visualize tab. 8. Select NetDraw 9. Click File. 10. Click Open and select UCINET dataset and click Network. 11. Click on the … box and browse to find your twitter.##h file and click OK.
Graphing the semantic network with NetDraw (cont’d): 12. After the file loads and you see red circles, click the Rels tab in the upper right corner. 13. Go near the bottom of that box to the small boxes containing > and 0. Replace the 0 with 100 which drops word pairs whose frequency is less than 100. 14. Click the top-level tab ISO to remove isolate nodes after frequency pruning.
Graphing the semantic network with NetDraw (cont’d): 15. Click Layout and select Graph Theoretic, Spring Embedding and click OK in the dialog box. 16. Experiment with different frequency prunings to get a graph you want to save and click File, Save Diagram As, Bitmap, and give the file a name. 17. You can insert the twitter.bmp file into documents or slides.
Note: Spring layout produces a different perspective on each run but keeps the same distances between all nodes. So, your graph will look different each time you run Layout, Graph Theoretic Layout, Spring Embedding.