Overview of Help
To find help on a specific
topic click on the index button above.
For a general tutorial and
introduction to UCINET see the online User's Guide which accompanies
this program.
An introduction to the general
form of most help files in UCINET is contained in the Introduction Section
(see link below). Also below are links to the UCINET standard datasets
together with help on the DL file format.
Introduction Section
DL
Standard Datasets
To obtain technical support,
send email to:
support@analytictech.com (for United States users)
M.Everett@wmin.ac.uk (for all other users)
DATA>DESCRIBE>IMPORT LABELS
PURPOSE Import
labels into a UCINET dataset
DESCRIPTION Imports
labels which are in text format into a UCINET dataset. The label file should
be plain text with one label per line (labels separated by a carriage return).
PARAMETERS
Label File
Name of text file
containing the labels
Import into:
Choices are:
Row Labels
Column Labels
Matrix
Labels
LOG FILE None
TIMING N/A
COMMENTS None
REFERENCES None
FILE>DELETE
PURPOSE Delete
a UCINET dataset
DESCRIPTION Both
the header and the data files are deleted. Files should be separated
by a space.
PARAMETERS
File(s) to be deleted
List of files to
be deleted. Data type: any UCINET file.
LOG FILE None
TIMING N/A
COMMENTS None
REFERENCES None
FILE>RENAME UCINET FILE
PURPOSE Rename
a UCINET dataset.
DESCRIPTION Renames
both a header and data file of a UCINET dataset.
PARAMETERS
Original Dataset Name :
Name of file to
be re-named
New Dataset Name:
Name of
new UCINET dataset.
LOG FILE None
TIMING N/A
COMMENTS None
REFERENCES None
FILE>COPY UCINET DATASET
PURPOSE Copy
a UCINET dataset to a new filename or folder.
DESCRIPTION Copies
both a header and data file of a UCINET dataset.
PARAMETERS
Original Dataset Name :
Name of dataset
to be copied. Data type: any UCINET file.
New Dataset Name:
Name of new UCINET
dataset. This can be sent to a new folder. Default is the same folder
as the original file.
LOG FILE None
TIMING N/A
COMMENTS None
REFERENCES None
Introduction
This file gives technical information
about all the routines contained within UCINET.
The manual assumes that users
have certain rudimentary knowledge of the Windows operating system and
of network terminology. Elementary information on UCINET is available
in the accompanying users guide.
Each routine is documented
in a standard way. Once the documentation for routines with which users are familiar
has been thoroughly digested, this should help them to understand some
of the non-standard routines.
Command Format
Each routine is documented
using the following keywords: MENU, PURPOSE, DESCRIPTION, PARAMETERS,
LOG FILE, TIMING, COMMENTS, and REFERENCES. The details of
these are as follows:
MENU This
gives the exact position of the routine within the UCINET menu system.
For example NETWORK>SUBGROUPS>K-PLEX
can be found by first selecting NETWORK on the top level of the
menu and then from the pull down submenu selecting SUBGROUPS
and then finally from this submenu selecting K-PLEX. The
selection of all the options in the MENU list followed by a mouse
click will begin execution of the routine.
PURPOSE This
gives a brief one or two line description of the routine.
DESCRIPTION Gives
a fuller account of what the routine does. This description will
include a brief definition of some of the concepts required to understand
the technique and an outline of the algorithms employed. It should
contain sufficient information for a user to fully comprehend the action
of the routine. An effort has been made to make the descriptions
succinct. Users should read descriptions carefully if they are
unfamiliar with the action of a particular algorithm.
PARAMETERS
This gives a complete list of what information must be supplied by the
user in order to run a routine. It contains a list of all the
information requested on the forms when a routine is executed.
This list is indented in such a way as to make it clear what exactly
appears on the forms.
For each entry
on the form the manual gives the defaults provided by UCINET.
This can be useful in trying to locate files that have been created
by the software, or when re-running a particular routine with different
parameters.
In addition the
manual gives additional information (to the help line on the form) about
how to complete each entry on the form.
If the routine
requires a dataset (as most do) then the manual specifies
precisely which type of data can be analyzed. These are as follows:
Graph - an n×n
symmetric binary adjacency matrix.
Digraph - an n×n
binary adjacency matrix which is not necessarily symmetric.
Valued graph -
an n×n
matrix. The entries are usually reals; sometimes the values are restricted
to integers or the matrix to symmetric.
Square matrix -
an n×n
matrix. The entries are usually reals; sometimes the values are restricted
to integers or probabilities. Valued graph and
square matrix are the same data type; convention alone dictates
usage.
Matrix - an n×m
matrix. The entries are usually reals. These can be restricted
to binary or integer.
Each data type
is contained within the next. So, for example, any routine that
accepts valued graphs will run on digraphs or graphs.
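The nesting of data types can be illustrated with a short Python sketch. The function name `classify` and the use of plain nested lists are assumptions made for this illustration only; they are not part of UCINET.

```python
def classify(matrix):
    """Return the most specific data type (as named in this manual)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("all rows must have the same length")
    if n_rows != n_cols:
        return "Matrix"                      # n x m, the most general type
    binary = all(v in (0, 1) for row in matrix for v in row)
    if not binary:
        return "Valued graph"                # same data type as Square matrix
    symmetric = all(matrix[i][j] == matrix[j][i]
                    for i in range(n_rows) for j in range(i))
    return "Graph" if symmetric else "Digraph"


print(classify([[0, 1], [1, 0]]))        # Graph
print(classify([[0, 1], [0, 0]]))        # Digraph
print(classify([[0, 2.5], [1, 0]]))      # Valued graph
print(classify([[0, 1, 1], [1, 0, 0]]))  # Matrix
```

Because each type is contained within the next, a dataset classified as Graph here would also be acceptable to a routine documented as taking a Digraph, Valued graph or Matrix.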
Some routines contain
options which will run on different data types. In this case the
data type given in the manual is the most general.
Certain options dictated by the parameters may not run with this data
type. It should be apparent from the manual which data types will
be applicable for the selected parameters.
Routines which
take specific action on multirelational data have this indicated in
the data type specification. For example, the routine specified
by
TRANSFORM>SEMIGROUP
has as its data
type Digraph.Multirelational. This indicates that this routine
acts on multirelational data in a particular way. If this data
type is not included and a multirelational data set is submitted for
analysis then UCINET will perform the analysis on each relation separately,
if possible. In some cases such an action would not make network
sense, and in other cases it is simply not technically possible to do
this. In these cases the routine only acts on the first relation.
LOG FILE The
LOG FILE contains output generated by each routine. The contents
of the file are displayed on the screen and the user can browse, edit,
save or print it. For each routine a comprehensive account of
the contents of the file is given.
TIMING The
timing gives the order of the routine related to the longest dimension
of the data matrix, which is called N. Care should be taken on
the interpretation of this value since it only gives the order of the
polynomial (if one exists) which dictates the time. Hence a time
O(N^3) means that for sufficiently large N the time to execute will
increase at the rate of N^3. It is quite possible for the user
to increase N for an O(N^3) routine by a factor of 2 say, and the execution
time to increase by 20-fold instead of the expected 8-fold increase.
This would be because N was not sufficiently large for the highest order
to dominate. Equally well it cannot be used to compare two different
routines.
Whilst caution
is wise for a strict interpretation, it will be true that for an O(N^3)
routine doubling the size of N will probably cause the execution time
to increase by approximately a factor of 8. Timings which are
exponential mean that the user should be aware that small increases
in N may cause very large increases in execution time.
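The effect of lower-order terms on the doubling rule can be seen in a small Python sketch. The cost function below is entirely hypothetical (a cubic term plus a large quadratic term) and does not describe any actual UCINET routine.

```python
def cost(n):
    # Hypothetical running time: cubic term plus a large quadratic term.
    return n**3 + 100 * n**2


def doubling_ratio(n):
    # Factor by which the cost grows when N is doubled.
    return cost(2 * n) / cost(n)


for n in (10, 100, 1000, 10000):
    print(n, round(doubling_ratio(n), 2))
# 10 4.36
# 100 6.0
# 1000 7.64
# 10000 7.96
```

For this made-up cost the ratio falls short of 8 for small N and approaches 8 only as N grows. Other lower-order effects can push the observed ratio the other way, which is why the order should be read as a guide for large N only.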
COMMENTS Additional
comments which may be of help to the user are given in this section.
REFERENCES A
'sample' of useful references which should enable the interested user
to gain more information.
STANDARD DATASETS
UCINET comes with a collection
of network datasets. Multirelational data are stored, where possible,
in a single multirelational data file. Each relation within a multirelational
set is labelled and information about the form of the data is described
for each individual matrix.
BERNARD & KILLWORTH FRATERNITY
BERNARD & KILLWORTH HAM RADIO
BERNARD & KILLWORTH OFFICE
BERNARD & KILLWORTH TECHNICAL
CAMP 92
COUNTRIES TRADE DATA
DAVIS SOUTHERN CLUB WOMEN
FREEMAN'S EIES DATA
GAGNON & MACRAE PRISON
GALASKIEWICZ'S CEO'S AND CLUBS
KAPFERER MINE
KAPFERER TAILOR SHOP
KNOKE BUREAUCRACIES
KRACKHARDT HIGH-TECH MANAGERS
KRACKHARDT OFFICE CSS
NEWCOMB FRATERNITY
PADGETT FLORENTINE FAMILIES
READ HIGHLAND TRIBES
ROETHLISBERGER & DICKSON BANK WIRING ROOM
SAMPSON MONASTERY
SCHWIMMER TARO EXCHANGE
STOKMAN-ZIEGLER CORPORATE INTERLOCKS
THURMAN OFFICE
WOLFE PRIMATES
ZACHARY KARATE CLUB
DATA>EDIT
PURPOSE Edit
or create a UCINET dataset using a spreadsheet style editor.
DESCRIPTION All
UCINET data files store the data as a matrix. Upon execution of
this routine a spreadsheet style editor is invoked. The spreadsheet
layout is very similar to that found on other spreadsheets such as Excel,
and hence should be familiar to most users.
Each element of
the data occupies a cell in the spreadsheet. The data matrix is
displayed exactly in matrix form. The user can move around the
matrix using the arrow keys (up, down, left and right) to move from one cell to an adjacent
cell, and by using the scroll bars to move around the whole dataset.
If there is more than one matrix then the tabs at the bottom can be
used to move between matrices. When the cursor is located
in a particular cell the position of the cursor is recorded on the screen
by highlighting the row and column numbers of the cell.
If the rows and/or
columns are labeled then the labels are displayed at the top and side
of the screen.
If your data is
symmetric, click the Asymmetric mode button before you enter any data;
this will automatically fill in the other half of your data. You need
only enter the non-zero values in the spreadsheet. Once these have been
filled in, click on the button marked Fill; all empty cells will
be given a value of zero up to the dimensions specified. If your data
has more than one relation then add the extra matrices using the edit
button and selecting insert sheet.
The editor allows
some simple analysis and transformations. These are exactly the same
routines as contained in the menu and the user should read the help
files associated with these to obtain relevant information.
PARAMETERS N/A.
LOG FILE None.
TIMING Linear.
REFERENCES None.
DATA>RANDOM>MATRIX
PURPOSE Generate
matrices where the cell values are drawn randomly from a variety of
possible distributions.
DESCRIPTION Generate
a set of m×n matrices whose elements are random numbers drawn from
any of the following distributions - uniform, normal, binomial, Poisson,
gamma or exponential.
PARAMETERS
# of rows: (Default = 10).
The number of rows
in the random matrix to be generated.
# of columns: (Default = 10).
The number of columns
in the random matrix to be generated.
# of levels: (Default = 1).
The number of matrices
to be generated. All matrices will be of the same dimension.
Probability distribution: (Default = Uniform).
The underlying
distribution from which the elements of the matrix are taken.
Choices are:
Uniform
Each cell value
is taken from a [0,1] uniform distribution so that each cell value is
between 0 and 1. The mean is 0.5.
Normal
Each cell value is taken from a normal distribution.
Upon execution
of the routine with this option a new window will appear with the following
parameters:
Mean
of normal distribution
(Default = 0.0)
Standard
deviation of normal distribution
(Default = 1.0).
Binomial
Each cell is filled
with the number of times an event with probability p occurs in n trials.
Upon execution
of the routine with this option a window will appear with the following
parameters:
Event probability: (Default = 0.5)
This gives the
probability p of success, i.e. the probability of an event occurring
during one trial.
# of trials (Default = 1).
This gives the
desired number of repeated trials n. The mean is np.
Poisson
Each cell is filled
with the number of times an event occurred in a unit interval of time
assuming a Poisson process.
Upon execution
of the routine a window will appear with the following parameter:
Average # of occurrences per time period (Default = 1.0).
This gives the
mean of the distribution.
Gamma
Each cell is filled
with the time taken for the kth occurrence of an event to occur assuming
the event follows a Poisson process with an average of one occurrence
per time period.
Upon execution
of the routine a window will appear with the following parameter:
Desired # of occurrences (Default = 1).
The number k of
events which must occur. The value k=1 gives the exponential distribution.
The mean is k.
Exponential
Each cell is filled
with the time taken for the 1st occurrence of an event to occur assuming
the event follows a Poisson process with an average of one occurrence
per time period. The mean is 1.
Include diagonal values: (Default = YES).
NO will give missing
values on the main diagonal.
Generator Seed:
A seed for random
number generator. Use of the same number will create exactly the
same 'random' matrix twice. Any value from 1 to 32000 is permissible.
The default is randomly generated.
Output dataset: (Default = 'Random').
Name of data file
which will contain random matrix.
LOG FILE Generated
random matrix. The cells of the random matrix will be of the following
type:
UNIFORM - real range [0,1].
NORMAL - real range (-∞,∞).
BINOMIAL - integer range [0,∞).
POISSON - integer range [0,∞).
GAMMA - real range (0,∞).
EXPONENTIAL - real
range (0,∞).
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
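The distributions described above can be imitated with Python's standard `random` module. This is a minimal sketch, not UCINET's generator: the function names `draw` and `random_matrix`, the keyword names (`mean`, `sd`, `prob`, `trials`, `rate`, `k`), and the use of `None` for missing diagonal values are all assumptions made for illustration. The same seed will not reproduce UCINET's output.

```python
import math
import random


def draw(rng, dist, **p):
    """Draw one cell value from the named distribution."""
    if dist == "uniform":
        return rng.random()                      # real in [0, 1)
    if dist == "normal":
        return rng.gauss(p.get("mean", 0.0), p.get("sd", 1.0))
    if dist == "binomial":                       # successes in n trials
        return sum(rng.random() < p.get("prob", 0.5)
                   for _ in range(p.get("trials", 1)))
    if dist == "poisson":                        # Knuth's multiplication method
        limit, k, prod = math.exp(-p.get("rate", 1.0)), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        return k
    if dist == "gamma":                          # waiting time for the k-th event
        return rng.gammavariate(p.get("k", 1), 1.0)
    if dist == "exponential":                    # waiting time for the 1st event
        return rng.expovariate(1.0)
    raise ValueError(f"unknown distribution: {dist}")


def random_matrix(rows=10, cols=10, dist="uniform", seed=None,
                  diagonal=True, **params):
    """Build a rows x cols matrix; None marks a missing diagonal value."""
    rng = random.Random(seed)
    return [[None if (not diagonal and i == j) else draw(rng, dist, **params)
             for j in range(cols)] for i in range(rows)]
```

For example, `random_matrix(5, 5, "binomial", seed=42, prob=0.3, trials=10)` returns the same 5×5 integer matrix every time it is called, mirroring the role of the Generator Seed parameter.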
DATA>RANDOM>SOCIOMETRIC
PURPOSE A
random digraph is created in which edges are generated with the constraint
that each vertex has a user specified out-degree.
PARAMETERS
Number of nodes (Default = 10)
The size of digraph
to be constructed.
Number of graphs (Default = 1)
This specifies
the number of relations to be generated.
# of choices per actor (out-degree)
Specifies the out-degree for each actor. A single number will specify the same out-degree for each actor. Alternatively, the degree of each actor can be specified by a list, with the elements separated by spaces or commas. If the list is shorter than the number of nodes then it is extended by repeating from the first element. Values greater than the maximum out-degree are reduced to the maximum value.
The list can be
specified by a UCINET data file. This must be of the form:
<filename>
ROW (or COLUMN) <number>
where filename
is the name of the data file. The command ROW or COLUMN followed by
the appropriate number specifies which row or column of the dataset
is to be used.
Generate self loops (Default = No)
If NO, edges connecting
a node to itself will not be allowed.
Random generator seed:
A seed for the
random number generator. Use of the same number will create exactly
the same 'random' graph. Any value from 1 to 32000 is permissible.
The default is randomly generated.
OUTPUT dataset (Default = 'SociometricRandomGraph')
Name of file which
contains generated digraph.
LOG FILE Table of specified out-degrees.
Randomly generated
digraph which conforms to the specification.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
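The fixed out-degree constraint can be sketched in Python as follows. The function name `sociometric_digraph` and its arguments are hypothetical, and the sampling method (choosing targets uniformly without replacement) is an assumption; the manual does not state how UCINET selects the targets.

```python
import random


def sociometric_digraph(n, out_degrees, self_loops=False, seed=None):
    """Random 0/1 adjacency matrix in which row i has a prescribed sum."""
    rng = random.Random(seed)
    max_deg = n if self_loops else n - 1
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        # Extend a short list by repeating it from the first element,
        # and reduce values above the maximum out-degree.
        degree = min(out_degrees[i % len(out_degrees)], max_deg)
        candidates = [j for j in range(n) if self_loops or j != i]
        for j in rng.sample(candidates, degree):
            matrix[i][j] = 1
    return matrix


g = sociometric_digraph(6, [2, 9], seed=7)
print([sum(row) for row in g])   # [2, 5, 2, 5, 2, 5]
```

The list `[2, 9]` is shorter than the number of nodes, so it is repeated, and the value 9 is reduced to the maximum out-degree of 5.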
DATA>RANDOM>BERNOULLI
PURPOSE Generate
a random network taken from a Bernoulli distribution.
DESCRIPTION A
random network is created in which edges are generated independently
from a Bernoulli distribution.
A random number
between 0 and 1 is generated for each cell in an adjacency matrix.
If this number is less than a user specified probability then an edge
is created. Users can specify a single probability for the whole
matrix, or different probabilities
for each row, column or cell. The whole procedure can be repeated
for a number of trials to create an integer valued network.
PARAMETERS
Number of nodes (Default = 10)
The size of the
graph to be constructed.
Number of graphs (Default = 1)
This specifies
the number of relations to be generated.
Number of trials per cell (Default = 1)
The number of repeated
trials per cell. A value of 1 will give a binary matrix.
Values greater than 1 will give entries which correspond to the number
of successes in the given number of trials.
What probabilities will you supply (Default = Matrix)
Choices are:
Matrix - in which a single probability is
used for the entire matrix.
Row - a set of probabilities, one for each
row is used.
Column - a set of probabilities, one for
each column is used.
Cell
- a complete matrix of
probabilities one for each cell is prescribed.
Once an option
has been selected the routine highlights parameters which are dependent
on the option selected.
MATRIX
option:
Probability of a tie (Default = 0.5)
A single probability
applicable to the whole matrix should be specified.
ROW
option:
Row probabilities dataset:
Name of file which
will contain dataset with row probabilities.
Probabilities are ROW or COLUMN of the dataset (Default = Column)
Row means that
probabilities will be taken from a particular row of the dataset.
Column specifies a column.
Which row/column (Default = 1)
Specifies which
row or column of the dataset is to be used.
COLUMN
option:
Column probabilities dataset:
Name of file which
will contain dataset of column probabilities.
Probabilities are Row or Column of the dataset (Default = Column)
Row means that
probabilities will be taken from a particular row of the dataset.
Column specifies a column.
Which row/column (Default = 1)
Specifies which row or column of the dataset is to be used.
CELL
option:
Cell probabilities dataset:
Name of file which
will contain the matrix of probabilities.
Generate self-loops (Default = No)
No means that nodes cannot be connected to themselves.
Yes means
that self-loops may be generated.
Random generator seed:
A seed for the
random number generator. Use of the same number will create exactly
the same 'random' graph twice. Any value from 1 to 32000 is permissible.
The default is randomly generated.
Output dataset (Default = 'RandomBernoul')
Name of file which
will contain random graph.
LOG FILE Generated
random graph.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
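The procedure in the DESCRIPTION can be sketched in a few lines of Python. The function name `bernoulli_network` and the idea of passing the probabilities as a number or as a function of the cell position are assumptions made for this illustration, standing in for the Matrix, Row, Column and Cell options.

```python
import random


def bernoulli_network(n=10, prob=0.5, trials=1, self_loops=False, seed=None):
    """prob is a single number, or a function (i, j) -> probability."""
    rng = random.Random(seed)
    p_of = prob if callable(prob) else (lambda i, j: prob)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j and not self_loops:
                continue
            # A success occurs when the random number is below the probability;
            # the cell holds the number of successes over all trials.
            matrix[i][j] = sum(rng.random() < p_of(i, j) for _ in range(trials))
    return matrix


row_p = [0.1, 0.5, 0.9, 0.5]                     # one probability per row
net = bernoulli_network(4, prob=lambda i, j: row_p[i], seed=1)
```

With `trials=1` the result is a binary matrix; larger values give integer counts between 0 and the number of trials.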
DATA>RANDOM>MULTINOMIAL
PURPOSE Generate random valued graphs in which
the values are distributed by user assigned probabilities.
DESCRIPTION The
user specifies N, the total number of cases in the simulated "sample".
The algorithm randomly distributes the N cases into the cells of the
adjacency matrix. This distribution can either be uniform, in
which case each cell has the same probability of being assigned one
of the cases, or the distribution can be user specified. In this
case the algorithm randomly assigns each case in proportion to the cell
probabilities. The probabilities can be specified by row, column
or individual cells. The result is a value for each directed arc
in the network.
PARAMETERS
Number of nodes (Default = 10)
Number of nodes
in each valued adjacency matrix to be created.
Number of graphs (Default = 1)
Number of random
matrices to be created.
Total number of cases (sum of values)
Total number of
values to be distributed across all cells in adjacency matrix.
Default is n(n-1) where n is the number of nodes.
What probabilities will you supply (Default = Matrix)
Choices are:
Matrix - a single probability is used for
the entire matrix.
Row - a set of probabilities, one for each
row is used.
Column
- a set of probabilities, one for each column is used.
Row*Column - two sets of probabilities are prescribed,
one for the rows and one for the columns. The probability for
each cell is the product of the probabilities prescribed for its row
and column.
Cell - a complete matrix of probabilities,
one for each cell is prescribed.
Once an option
has been selected the routine highlights parameters which are dependent
on the option selected.
Row
option
Row probabilities dataset:
Name of file which
contains probabilities for each row, it is assumed that the required
probabilities will be contained in a matrix.
Probabilities are Row or Column of this dataset: (Default = Column)
Specify Row or
Column as required.
Which Row/Column (Default = 1)
Number of row or
column required.
Column
option
Column probabilities dataset:
Name of file which
contains probabilities for each column, it is assumed that the required
probabilities will be contained in a matrix.
Probabilities are Row or Column of this dataset: (Default = Column)
Specify Row or
Column as required.
Which Row/Column: (Default = 1)
Number of row or
column required.
Row*Column option
Two datasets are
provided: row probabilities as in the Row option and column probabilities
as in the Column option.
Cell
option
Cell probabilities dataset:
Name of file which
contains matrix of probabilities.
Generate self loops: (Default = No)
If NO then there
will be no ties on the diagonal.
Random number seed:
UCINET generates
a different random number as a default each time it is run. Use
of the same seed will result in the same 'random' graph. The range
is 1 to 32000.
Output dataset (Default = 'MultinomialRandomGraph')
Name of file which
will contain generated random network.
LOG FILE The
log file contains a display of each random matrix.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
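A minimal Python sketch of distributing N cases over the cells is given below. The function name `multinomial_network` and the `weight` callback are hypothetical; the callback stands in for the Row, Column, Row*Column and Cell options, and omitting it gives the uniform case.

```python
import random


def multinomial_network(n=10, cases=None, weight=None, self_loops=False,
                        seed=None):
    """Distribute `cases` units over the cells in proportion to the weights."""
    rng = random.Random(seed)
    if cases is None:
        cases = n * (n - 1)                  # default stated in the manual
    cells = [(i, j) for i in range(n) for j in range(n)
             if self_loops or i != j]
    weights = [1.0 if weight is None else weight(i, j) for i, j in cells]
    matrix = [[0] * n for _ in range(n)]
    # Assign each case to one cell, chosen in proportion to the cell weights.
    for i, j in rng.choices(cells, weights=weights, k=cases):
        matrix[i][j] += 1
    return matrix


row_p, col_p = [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]
net = multinomial_network(3, cases=30, seed=5,
                          weight=lambda i, j: row_p[i] * col_p[j])
print(sum(map(sum, net)))   # 30
```

The example uses the Row*Column rule, in which the weight of each cell is the product of its row and column probabilities. The cell values always sum to the total number of cases.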
DATA>IMPORT>DL
PURPOSE Convert
text (i.e. ASCII) data files in DL format to UCINET format.
DESCRIPTION Imports
ASCII files, that is plain text files which are in DL format, into UCINET.
These files can be created externally or using the UCINET text editor;
more information is contained in the users guide or in the DL help.
PARAMETERS
Input dataset:
Name of DL type
file containing data to be imported. Data type: ASCII or text.
Output data type: (Default = Real)
Choices are:
Byte - whole numbers in the range 0 to 255 inclusive.
Missing values
are not allowed.
Smallint - whole numbers in the range -32000 to 32000.
Missing values
are not allowed.
Real - real numbers in the range -1.E36 to 1.E36.
Missing values
permissible.
Output dataset:
Name of UCINET
data file, this will be set to the same name as the text file by default.
LOG FILE UCINET
data file.
TIMING O(N^2).
COMMENTS None
DATA>IMPORT>MULTIPLE DL FILES
PURPOSE Converts
multiple text (i.e. ASCII) data files in DL format to UCINET format.
DESCRIPTION Imports
multiple ASCII files, that is plain text files which are in DL format,
into UCINET. These files can be created externally or using the UCINET
text editor; more information is contained in the users guide or in
the DL help. Within UCINET these files retain their
plain text filename. Files which are of the same size on the same set
of actors can be imported into a single multirelational dataset. This
dataset can be given a new name in this routine. Imported files are
of real type.
PARAMETERS
Input files:
Names of DL type
files containing data to be imported. Each file should be specified
starting on a new row. Data type: ASCII or text.
Output files
Choices are:
Load all data into a single dataset with multiple relations
Each dl file must
contain the same nodes. It is possible to change the default name from Multiple to a user selected name.
Load each file into a separate dataset
If this is chosen
then each imported file will have the dl filename as its UCINET filename.
LOG FILE UCINET
data files
TIMING O(N^2).
COMMENTS None
DATA>IMPORT>PAJEK
PURPOSE Convert
Pajek data files into UCINET format.
DESCRIPTION Imports
Pajek files for use by UCINET. Both the network, in the form of
an adjacency matrix, and the co-ordinates of the nodes in the plot may
be imported.
PARAMETERS
Input dataset:
Name of file containing
data to be imported. Data type: ASCII file.
Output UCINET Network
Name of UCINET
data file to contain the network details, default is the same name as
the input dataset.
Output Coordinate dataset
Name of UCINET
data file to contain the coordinate details, default is the same name
as the input dataset with Crd added to the name.
LOG FILE A
display of the UCINET data file.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
DATA>IMPORT>KRACKPLOT
PURPOSE Convert
Krackplot data files into UCINET format.
DESCRIPTION Imports
Krackplot files for use by UCINET. Both the network, in the form of an
adjacency matrix, and the co-ordinates of the nodes in the plot may be
imported.
PARAMETERS
Input dataset:
Name of file containing
data to be imported. Data type: ASCII file.
(Output) Network dataset
Name of UCINET
data file to contain the network details, default is the same name as
the input dataset.
(Output) Coordinate dataset (Default = 'Kpcrd')
Name of UCINET
data file to contain the coordinate details, default is the same name
as the input dataset with Crd added to the name.
LOG FILE A
display of the UCINET data file.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
DATA>IMPORT>VNA
PURPOSE Convert
Netdraw vna data into UCINET format.
DESCRIPTION Imports
vna network and attribute data into UCINET. Netdraw vna files can be
pure attribute or network or contain both network and attribute data.
This procedure allows the user to import all of these types into UCINET.
PARAMETERS
Input file:
Name of vna to
be imported.
Output network:
Name of UCINET
data file to contain the network details, default is the same name as
the input dataset followed by -Net. Unchecking the associated box means
this file will not be created.
Output Attributes:
Name of UCINET data file to contain the attribute details, default is the same name as the input dataset followed by -Attr. Unchecking the associated box means this file will not be created.
LOG FILE UCINET
data files.
TIMING O(N^2).
COMMENTS Information
on the VNA format can be found online at www.analytictech.com/netdraw.htm
DATA>IMPORT>RAW
PURPOSE Convert
a text file (that is an ASCII file) containing a matrix into UCINET
for Windows format.
DESCRIPTION Imports
a text file (that is an ASCII file) containing a matrix into UCINET
for Windows format. The data file must be pure text with spaces, commas
or carriage returns between the values.
PARAMETERS
Input dataset:
Name of text file
to be imported.
# of columns
The number of columns
in the data matrix.
# of rows
The number of rows
in the data matrix
Output data type: (Default = Real)
Choices are:
Byte - whole numbers in the range 0 to 255 inclusive.
Missing values
are not allowed.
Smallint - whole numbers in the range -32000 to 32000.
Missing values
are not allowed.
Real - real numbers in the range -1.E36 to 1.E36.
Missing values
permissible.
Output dataset:
Name of UCINET
data file, this will be set to the same name as the input file by default.
LOG FILE UCINET
data file.
TIMING O(N^2).
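The raw layout described above can be read with a few lines of Python. This is a sketch only: the function name `parse_raw` is hypothetical, and it assumes the values are listed row by row, which the manual does not state explicitly.

```python
import re


def parse_raw(text, rows, cols):
    """Split on spaces, commas or line breaks and reshape to rows x cols."""
    tokens = [t for t in re.split(r"[\s,]+", text) if t]
    if len(tokens) != rows * cols:
        raise ValueError(f"expected {rows * cols} values, found {len(tokens)}")
    values = [float(t) for t in tokens]
    # Assumption: values are stored row by row.
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


print(parse_raw("0 1 1\n1,0,0", rows=2, cols=3))
# [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0]]
```

Because line breaks are treated like any other separator, the number of rows and columns must be supplied, as in the PARAMETERS above.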
DATA>IMPORT>EXCEL
PURPOSE Convert
EXCEL files (4.0 or 5.0/95) into UCINET format.
DESCRIPTION Imports
simple EXCEL files (4.0 or 5.0/95) into UCINET format. Note that the
spreadsheet must have no extras such as shading or borders.
PARAMETERS
Input dataset:
Name of EXCEL type
file containing data to be imported.
Output dataset:
Name of UCINET
data file, this will be set to the same name as the input file by default.
LOG FILE UCINET
data file.
TIMING O(N^2).
COMMENTS This
routine is very sensitive to the formatting of the spreadsheet, and many users find it easier to copy and paste from
their spreadsheet into the UCINET spreadsheet. The easiest way is to
copy the data only (i.e. not the labels) and paste it into the UCINET spreadsheet,
first blocking out a region of the same dimensions as the data you wish to import. To import
the labels, save them as text and use the label import feature in DATA>DESCRIBE.
DATA>IMPORT>NEGOPY
PURPOSE Convert
text files formatted for the Negopy program into UCINET datasets.
DESCRIPTION Reads
the .dat and .nam Negopy files and creates a UCINET dataset.
PARAMETERS
Input link file: <*.dat>
Name of file, such
as TRADE71.DAT, containing ties among actors. Format of the file looks
like this:
(2I3,1F5.1,1f3.1)
19 23 156.7 26.2
19 28 162.3 28.9
...
The first line
is a Fortran format statement, required by Negopy but ignored by UCINET.
You can just put a blank line if you like. The second line indicates
a tie from person 19 to person 23, of strength 156.7 on the first relation,
and of strength 26.2 on the second relation.
Input name file: <*.nam>
Name of file, such
as TRADE71.NAM, containing labels of actors. Format looks like this:
(1I2,1X,1A30)
01 Billy-Bob
02 Johnny
...
Number of relations: (Default = 1)
Number of relations
contained in the input link file (i.e., the number of columns of data
after the two actor id numbers).
Output dataset: (Default = 'Imported')
Name of UCINET
dataset to be created.
LOG FILE Data
displayed in matrix form.
TIMING O(N^2).
COMMENTS Negopy
is a program written by Bill Richards and Andy Seary.
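The link file layout shown above can be read with a short Python sketch. The function name `read_negopy_links` is hypothetical, and the sketch assumes the fields on each line are separated by whitespace, as in the example; files that rely on the fixed column widths of the Fortran format without separating spaces would need a different parser.

```python
def read_negopy_links(text, relations=1):
    """Return one {(from_id, to_id): strength} dictionary per relation."""
    lines = text.splitlines()[1:]            # first line: Fortran format, ignored
    ties = [dict() for _ in range(relations)]
    for line in lines:
        fields = line.split()
        if len(fields) < 2 + relations:
            continue                         # skip blank or incomplete lines
        src, dst = int(fields[0]), int(fields[1])
        for r in range(relations):
            ties[r][(src, dst)] = float(fields[2 + r])
    return ties


sample = "(2I3,1F5.1,1f3.1)\n 19 23 156.7 26.2\n 19 28 162.3 28.9\n"
first, second = read_negopy_links(sample, relations=2)
print(first[(19, 23)], second[(19, 23)])   # 156.7 26.2
```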
DATA>EXPORT>DL
PURPOSE Convert
UCINET data files into DL format.
DESCRIPTION Converts
UCINET data files into DL format. For a full description of the DL format
go to the DL help.
PARAMETERS
Input dataset:
Name of file containing
data to be exported. Data type: Matrix.
Output format : (Default = "Full matrix")
Choices are:
Full matrix
A complete N×N
matrix.
Lower half
Gives the lower-triangle
and should only be used for symmetric matrices.
Upper half
Gives the upper-triangle
and should only be used for symmetric matrices.
Nodelist1
This is used on
binary matrices only. Each line of data consists of a row number (call
it i) followed by a list of column numbers (call each one
j) such that x(i,j) = 1.
Nodelist1B
This is used on
binary matrices only. Each line of data corresponds to a matrix row
(call it i). The first number on the line is the number of non-zero
cells in that row. This is followed by a list of column numbers (call
each one j) such that x(i,j) = 1. Note that rows must appear
in numerical order, and none may be skipped (unlike the Nodelist1 format).
Nodelist2
Each line begins
with a row id number followed by a list of column id numbers that are
connected to that row number. For use in 2-mode matrices
Edgelist1
This format is
used on data forming a matrix in which the rows and columns refer to
the same kinds of objects (e.g., an illness-by-illness proximity matrix,
or a person-by-person network). The 1-mode matrix X is built from pairs
of indices (a row and a column indicator). Pairs are typed one to a
line, with indices separated by spaces or commas. The presence of a
pair i,j indicates that there is a link from i to j,
which is to say a non-zero value in x(i,j). Optionally, the pair may
be followed by a value representing an attribute of the link, such as
its strength or quality. If no value is present, it is assumed to be
1.0. If a pair is omitted altogether, it is assigned a value of 0.0.
Edgelist2
This is used on
data forming a matrix in which the rows and columns refer to different
kinds of objects (e.g., illnesses and treatments). The 2-mode matrix
X is built from pairs of indices (a row and a column indicator). Pairs
are one to a line, with indices separated by spaces. The presence of
a pair i,j indicates that there is a link from row i to
column j, which is to say a non-zero value in x(i,j). If the
pair is followed by a value then this is the strength of the tie. If
no value is present, it is assumed to be 1.0. If a pair is omitted altogether,
it is assigned a value of 0.0.
Diagonals present (Default = Present)
If Absent diagonal
values will not be written to file.
(edgelist only) Type
Specify whether
the data is directed or undirected
Decimal places: (Default = 0)
The number of decimal
places required. The default corresponds to the number
of decimal places in the original UCINET data file. A smaller
value results in rounding to the nearest value. A value of
0 gives integer values only.
Field width (Default = Freefield)
Freefield will simply place each row of a matrix on a new line with no attempt to align the columns.
Automatic
will align the rows and columns into a matrix format. The user
can also specify the number of spaces for each field - this number should
be greater than the number of decimal places in the field.
Guaranteed space (Default = Yes)
Yes separates
each number in every row by a space. No prints each number
in a continuous list.
Page width (Default=10000):
The maximum width
of the output page.
Embed row labels(Default=No):
Whether the row labels
should be embedded in the output file.
Embed column labels(Default=No):
Whether the column labels
should be embedded in the output file.
Embed matrix labels(Default=No):
Whether the matrix labels
should be embedded in the output file.
Output dataset:
Name of file to
be created with .txt file extension.
LOG FILE A
text DL data file of type specified.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
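As a rough illustration of the Edgelist1 rules described above, the following Python sketch reads pairs back into a matrix. The function name edgelist1_to_matrix and the use of 1-based integer node numbers are assumptions made for the example; they are not part of UCINET.

```python
def edgelist1_to_matrix(lines, n):
    """Build an n-by-n matrix from Edgelist1-style lines (1-based indices)."""
    x = [[0.0] * n for _ in range(n)]  # omitted pairs keep the value 0.0
    for line in lines:
        # Indices may be separated by spaces or commas.
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # ignore blank lines
        i, j = int(fields[0]), int(fields[1])
        # An optional third field is the tie value; it defaults to 1.0.
        value = float(fields[2]) if len(fields) > 2 else 1.0
        x[i - 1][j - 1] = value
    return x


matrix = edgelist1_to_matrix(["1 2", "2,3 4.5", "3 1"], n=3)
for row in matrix:
    print(row)
```

The pair 1,3 never appears in the input, so x(1,3) stays at 0.0, while the pair 2,3 carries an explicit value of 4.5.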
DATA>EXPORT>KRACKPLOT
PURPOSE Convert
UCINET data files into Krackplot format.
DESCRIPTION Converts
UCINET data files including co-ordinate and attribute files into Krackplot
format.
PARAMETERS
(Input) Network dataset:
Name of file containing
data to be exported. Data type: Matrix.
(Input) Co-ordinate dataset
Name of file containing
co-ordinates of points for the layout of the data. These are as in the
co-ordinate output of MDS. If there are no co-ordinates then this can
be left blank.
Node attributes (if any)
Name of file containing
actor attributes, given as a vector of shared attributes so that (1,2,3,1,2,2)
means that actors 1 and 4 share the same attribute, actors 2, 5 and 6
share the same attribute, and actor 3 has a different attribute from
all the others.
Output data file:
Name of file to
be created.
LOG FILE Krackplot
data file.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
DATA>EXPORT>MAGE
PURPOSE Convert
UCINET data files into Mage format.
DESCRIPTION Converts
UCINET data files including co-ordinate files and attribute files into
Mage format for 3D visualization.
PARAMETERS
(Input) Network dataset:
Name of file containing
network data to be exported. Data type: Digraph
(Input) Co-ordinate dataset
Name of file containing
co-ordinates of points for the layout of the data. These are as in the
co-ordinate output of MDS. If there are no co-ordinates then this can
be left blank.
Node attributes (if any)
Name of file containing
actor attributes, given as a vector of shared attributes so that (1,2,3,1,2,2)
means that actors 1 and 4 share the same attribute, actors 2, 5 and 6
share the same attribute, and actor 3 has a different attribute from
all the others. These attributes can be used in Mage to color the nodes
according to the attribute.
Ball Size (Default = 0.15)
Radius of the nodes
in the image. A value of zero eliminates nodes; typical values are
from 0.05 to 0.5.
Line thickness (Default = 2)
A number from 1
to 5 which specifies the thickness of the lines.
Arrow Size (Default = 0.25)
Size of arrow heads;
typical values are from 0.05 to 0.5.
Arrow Angle (Default = 20)
The angle that
the arrow makes with the edge in degrees.
Font Size (Default = 20)
Size of the font
used on the image to display the node labels.
Output File
Name of file to
be created; normally the file extension should be .kin.
Launch Mage on Exit (Default = 'Yes')
If yes, the exported
file is immediately displayed in Mage.
LOG FILE Mage
data file.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
DATA>EXPORT>PAJEK>NETWORK
PURPOSE Convert
UCINET graph or digraph files into Pajek format together with any categorical
attribute files.
DESCRIPTION Converts
UCINET data files into Pajek format. The conversion can take valued
data and dichotomize it during the export, and can also export associated
categorical attribute files together with co-ordinate files. The conversion
will also automatically delete isolated vertices if required.
PARAMETERS
(Input) Network dataset:
Name of file containing
network data to be exported. Data type: Valued digraph.
Dichotomize vals > than:
For valued data,
a cut-off value used to convert the data to a binary matrix. For binary
data leave blank.
Delete isolates? (Default = 'No')
If yes, isolated
vertices are not included in the exported file.
(Input) Co-ordinate dataset
Name of file containing
co-ordinates of points for the layout of the data. These are as in the
co-ordinate output of MDS. If there are no co-ordinates then this can
be left blank.
(Input) Attribute dataset
Name of file containing
categorical actor attributes, given as a vector of shared attributes
so that (1,2,3,1,2,2) means that actors 1 and 4 share the same attribute,
actors 2, 5 and 6 share the same attribute, and actor 3 has a different
attribute from all the others. If there is more than one attribute these
can be combined into an attribute matrix with the rows representing
the actors and each column corresponding to a different attribute.
Output Attribute file:
Name of Pajek attribute
file to be created. If there is more than one attribute then one file
will be created for each attribute with the same file name but with
the column number added as the last character in the name. Pajek categorical
attribute files have the file extension .clu.
Output Network file:
Name of Pajek file
containing the adjacency matrix of the network. The file has .net as
an extension.
Launch Pajek on exit?
If yes then Pajek
is launched on exit.
LOG FILE Pajek
.net data file.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
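The dichotomize and delete-isolates options can be pictured with a short Python sketch. It assumes that a tie is kept when its value is strictly greater than the cut-off, as the parameter name suggests, and that a vertex counts as isolated when it has neither incoming nor outgoing ties. The function names are hypothetical and this is not the UCINET implementation.

```python
def dichotomize(x, cutoff):
    """Return a 0/1 matrix with 1 wherever the value exceeds the cut-off."""
    return [[1 if v > cutoff else 0 for v in row] for row in x]


def drop_isolates(x):
    """Return (kept node indices, reduced matrix) with isolated vertices removed."""
    n = len(x)
    keep = [
        i for i in range(n)
        if any(x[i][j] != 0 or x[j][i] != 0 for j in range(n) if j != i)
    ]
    return keep, [[x[i][j] for j in keep] for i in keep]


valued = [
    [0, 3, 0],
    [1, 0, 0],
    [0, 0, 0],
]
binary = dichotomize(valued, cutoff=2)
kept, reduced = drop_isolates(binary)
print(binary)   # only the tie with value 3 survives the cut-off
print(kept)     # 0-based indices of the vertices that remain
print(reduced)
```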
DATA>EXPORT>PAJEK>CATEGORICAL ATTRIBUTE
PURPOSE Convert
UCINET categorical attribute files into a Pajek file.
DESCRIPTION Converts
UCINET categorical attribute files into Pajek format, i.e. Pajek .clu files.
The conversion can take a matrix of attributes and create a set of Pajek
.clu files, one for each column of the matrix. These files can be used
in Pajek to color the nodes according to a particular attribute.
PARAMETERS
(Input) Attribute dataset
Name of file containing
categorical actor attributes, given as a vector of shared attributes
so that (1,2,3,1,2,2) means that actors 1 and 4 share the same attribute,
actors 2, 5 and 6 share the same attribute, and actor 3 has a different
attribute from all the others. If there is more than one attribute these
can be combined into an attribute matrix with the rows representing
the actors and each column corresponding to a different attribute.
Output file(s) prefix:
Name of Pajek attribute
file to be created. If there is more than one attribute then one file
will be created for each attribute with the same file name but with
the column number added as the last character in the name. Pajek categorical
attribute files have the file extension .clu.
LOG FILE Lists
the Pajek clu files created
TIMING O(N)
COMMENTS None.
REFERENCES None.
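The one-file-per-column rule can be sketched in Python as follows. The sketch assumes the usual Pajek partition layout (a "*Vertices n" header followed by one class number per line), which should be checked against the Pajek documentation, and it builds the file contents in memory rather than writing them to disk. The function name clu_files and the exact naming scheme are illustrative guesses based on the parameter description above.

```python
def clu_files(attributes, prefix):
    """Map file names to .clu text, one entry per attribute column."""
    n = len(attributes)
    ncols = len(attributes[0])
    files = {}
    for c in range(ncols):
        # With several attributes the column number is appended to the name.
        name = f"{prefix}{c + 1}.clu" if ncols > 1 else f"{prefix}.clu"
        body = [f"*Vertices {n}"] + [str(row[c]) for row in attributes]
        files[name] = "\n".join(body) + "\n"
    return files


# Three actors, two categorical attributes (one per column).
for name, text in clu_files([[1, 7], [2, 7], [1, 8]], "attr").items():
    print(name)
    print(text)
```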
DATA>EXPORT>PAJEK>QUANTITATIVE ATTRIBUTE
PURPOSE Convert
UCINET quantitative attribute files into a Pajek file.
DESCRIPTION Converts
UCINET quantitative attribute files into Pajek format, i.e. Pajek .vec files.
The conversion can take a matrix of attributes and create a set of Pajek
.vec files, one for each column of the matrix. These files can be used
in Pajek to change the sizes of the nodes according to a particular
attribute.
PARAMETERS
(Input) Attribute dataset
Name of file containing
quantitative actor attributes. These may be attributes of the actors, e.g.
age, or possibly network attributes, e.g. centrality. If there is
more than one attribute these can be combined into an attribute matrix
with the rows representing the actors and each column corresponding
to a different attribute.
Output file(s) prefix:
Name of Pajek attribute
file to be created. If there is more than one attribute then one file
will be created for each attribute with the same file name but with
the column number added as the last character in the name. Pajek quantitative
attribute files have the file extension .vec.
LOG FILE Lists
the Pajek vec files created
TIMING O(N)
COMMENTS None.
REFERENCES None.
DATA>EXPORT>METIS
PURPOSE Convert
UCINET network files into Metis files.
DESCRIPTION Converts
UCINET data files, either binary or valued but symmetric only, into data
files for the Metis partitioning software.
PARAMETERS
Input dataset
Name of UCINET
data file containing network. Data Type: Valued symmetric graph
Type of Data
Choices are Binary
or Valued.
Output Dataset
Name of Metis file
to be created. Note that there are no prescribed file extensions.
LOG FILE Metis
file created
TIMING O(N)
COMMENTS None.
REFERENCES None.
DATA>EXPORT>RAW
PURPOSE Convert
UCINET data files into raw format.
DESCRIPTION Converts
UCINET data files into raw format. Raw files are the same as the DL format
but without the headers. For full information on the DL formats,
see the help topic on the DL file format.
PARAMETERS
Input dataset:
Name of file containing
data to be exported. Data type: Matrix.
Output format : (Default = "Full matrix")
Choices are:
Full matrix
A complete N×N
matrix.
Lower half
Gives the lower-triangle
and should only be used for symmetric matrices.
Upper half
Gives the upper-triangle
and should only be used for symmetric matrices.
Nodelist1
This is used on
binary matrices only. Each line of data consists of a row number (call
it i) followed by a list of column numbers (call each one
j) such that x(i,j) = 1.
Nodelist1B
This is used on
binary matrices only. Each line of data corresponds to a matrix row
(call it i). The first number on the line is the number of non-zero
cells in that row. This is followed by a list of column numbers (call
each one j) such that x(i,j) = 1. Note that rows must appear
in numerical order, and none may be skipped (unlike the Nodelist1 format).
Nodelist2
Each line begins
with a row id number followed by a list of column id numbers that are
connected to that row number. For use in 2-mode matrices.
Edgelist1
This format is
used on data forming a matrix in which the rows and columns refer to
the same kinds of objects (e.g., an illness-by-illness proximity matrix,
or a person-by-person network). The 1-mode matrix X is built from pairs
of indices (a row and a column indicator). Pairs are typed one to a
line, with indices separated by spaces or commas. The presence of a
pair i,j indicates that there is a link from i to j,
which is to say a non-zero value in x(i,j). Optionally, the pair may be followed
by a value representing an attribute of the link, such as its strength
or quality. If no value is present, it is assumed to be 1.0. If a pair
is omitted altogether, it is assigned a value of 0.0.
Edgelist2
This is used on
data forming a matrix in which the rows and columns refer to different
kinds of objects (e.g., illnesses and treatments). The 2-mode matrix
X is built from pairs of indices (a row and a column indicator). Pairs
are one to a line, with indices separated by spaces. The presence of
a pair i,j indicates that there is a link from row i to
column j, which is to say a non-zero value in x(i,j). If the
pair is followed by a value then this is the strength of the tie. If
no value is present, it is assumed to be 1.0. If a pair is omitted altogether,
it is assigned a value of 0.0.
Diagonals present (Default = Present)
If Absent, diagonal
values will not be written to the file.
(edgelist only) Type
Specify whether
the data is directed or undirected.
Decimal places: (Default = 0)
The number of decimal
places required. The default corresponds to the number
of decimal places in the original UCINET data file. A smaller
value results in rounding to the nearest value. A value of
0 gives integer values only.
Field width (Default = Freefield)
Freefield will simply place each row of a matrix on a new line with no attempt to align the columns.
Automatic
will align the rows and columns into a matrix format. The user
can also specify the number of spaces for each field - this number should
be greater than the number of decimal places in the field.
Guaranteed space (Default = Yes)
Yes separates
each number in every row by a space. No prints each number
in a continuous list.
Page width (Default=10000):
The maximum width
of the output page.
Embed row labels(Default=No):
Whether the row labels
should be embedded in the output file.
Embed column labels(Default=No):
Whether the column labels
should be embedded in the output file.
Embed matrix labels(Default=No):
Whether the matrix labels
should be embedded in the output file.
Output dataset:
Name of file to
be created with .txt file extension.
LOG FILE A
text data file of type specified.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
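The difference between the Nodelist1 and Nodelist1B layouts is easiest to see side by side. The Python sketch below is only an illustration: the function names are invented, node numbers are 1-based, and it assumes that Nodelist1 simply leaves out rows that have no ties (the text above says that rows may be skipped in that format).

```python
def to_nodelist1(x):
    """Row number followed by the columns j with x(i,j) = 1; empty rows skipped."""
    lines = []
    for i, row in enumerate(x, start=1):
        cols = [j for j, v in enumerate(row, start=1) if v == 1]
        if cols:
            lines.append(" ".join(str(k) for k in [i] + cols))
    return lines


def to_nodelist1b(x):
    """Count of non-zero cells followed by their columns; every row, in order."""
    lines = []
    for row in x:
        cols = [j for j, v in enumerate(row, start=1) if v == 1]
        lines.append(" ".join(str(k) for k in [len(cols)] + cols))
    return lines


x = [
    [0, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]
print(to_nodelist1(x))   # row 2 has no ties and is left out
print(to_nodelist1b(x))  # row 2 appears as a line holding only the count 0
```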
DATA>EXPORT>UCINET 3.0
PURPOSE Convert
UCINET data files into Ucinet 3.0 format.
DESCRIPTION Converts
UCINET data files into Ucinet 3.0 format.
PARAMETERS
Input dataset:
Name of file containing
data to be exported. Data type: Matrix.
Output format:
Choices are:
Lower triangular matrix
Symmetric square matrix
Non-symmetric square matrix
Rectangular matrix
Stacked square matrices
Stacked triangular matrices
Output data type:
Choices are:
Binary
Non-Binary
Decimal places:
The number of decimal
places to include.
Output data file:
Name of file to
be created.
LOG FILE Ucinet
3.0 data file.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
DATA>EXPORT>EXCEL
PURPOSE Export
a UCINET dataset to Excel format.
DESCRIPTION Creates
an Excel spreadsheet file in either Excel 4, Excel 5 and 7, tab-delimited
text or HTML format.
PARAMETERS
Input dataset:
Name of dataset
to be converted. Data type: any UCINET file.
Which version of Excel:
Choices are:
Excel 5 and 7
Excel 4
Tab-Delimited Text File
HTML
LOG FILE None
TIMING N/A
COMMENTS Note
that Tab-delimited and HTML are more flexible and are recommended for
later versions of Excel. The spreadsheet editor can also be used to
save datasets as Excel files.
REFERENCES None
DATA>ATTRIBUTE
PURPOSE Create
a network from attribute data.
DESCRIPTION Convert
a vector of valued attributes to a matrix based upon either exact matches,
differences, absolute differences, squared differences, product or sums
of the values.
PARAMETERS
Dataset containing attribute vector:
Name of data file
containing vector of valued attributes. This vector must be a row or
column of a matrix; it can be the only row or column. Data type: Matrix.
Vector is Row or Column?:
Choose either row
or column
Which Row/Col (Default = 1)
The number of the
row or column that contains the attributes to be converted.
Method: (Default = Absolute Difference).
Choices are:
Exact Matches
Matrix X is formed
by X(i,j) = 1 if vector(i) = vector(j) and 0 otherwise.
Difference
Matrix X is formed
by X(i,j) = vector(i) - vector(j).
Absolute Difference
Matrix X is formed
by X(i,j) = ABS(vector(i) - vector(j)).
Squared Difference
Matrix X is formed
by X(i,j) = (vector(i) - vector(j))^2.
Product
Matrix X is formed
by X(i,j) = vector(i) * vector(j).
Sum
Matrix X is formed
by X(i,j) = vector(i) + vector(j).
Output dataset:
Name of file which
contains constructed matrix.
LOG FILE Constructed
matrix.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
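The six methods listed above all apply one pairwise function to every pair of attribute values. A minimal Python sketch follows; the method keys and the function name attribute_to_matrix are chosen for the example and are not UCINET identifiers.

```python
import operator

# One pairwise function per method listed under "Method".
METHODS = {
    "exact": lambda a, b: 1 if a == b else 0,
    "difference": operator.sub,
    "absdiff": lambda a, b: abs(a - b),
    "sqdiff": lambda a, b: (a - b) ** 2,
    "product": operator.mul,
    "sum": operator.add,
}


def attribute_to_matrix(vector, method="absdiff"):
    """Return X with X(i,j) = f(vector(i), vector(j)) for the chosen method."""
    f = METHODS[method]
    return [[f(a, b) for b in vector] for a in vector]


ages = [30, 25, 30]
print(attribute_to_matrix(ages))            # absolute differences
print(attribute_to_matrix(ages, "exact"))   # 1 where the two ages match
```

Note that Difference is the only method here that gives a non-symmetric matrix, since vector(i) - vector(j) changes sign when i and j are swapped.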
DATA>AFFILIATIONS
PURPOSE Create
a network from affiliation data.
DESCRIPTION Converts
an m×n matrix to an m×m or n×n matrix by forming AA' or A'A
using two different types of binary multiplication. Given a binary
incidence matrix A where the rows represent actors and the columns events, the matrix AA'
gives the number of events that pairs of actors attended together.
Hence AA'(i,j) is the number of events attended
by both actor i and actor j. The matrix A'A gives the number of actors who
attended each pair of events. Hence A'A(i,j)
is the number of actors who attended both event i and event j. If the
data is valued there are two options. The cross-product (or co-occurrence)
method constructs the standard matrix product as in the binary case.
The minimum method takes the minimum of the two values in the
sums instead of the products. Hence if row i was (5,6,0,1) and row j
was (4,2,4,0) then AA'(i,j) is 5*4+6*2+0*4+1*0=32 for the cross product
and min(5,4)+min(6,2)+min(0,4)+min(1,0)=6 for the minimum method. These
produce the same answers for binary data.
The routine also
allows for the final matrix to be normalized to accommodate the different
sizes of the events. Consider two actors i and j and let X be the product
of the number of events they both attended and the number of events they
both did not attend, and let Y be the product of the number of events i attended
and j did not with the number of events j attended and i did not. If
X=Y the normalized entry is 0.5; otherwise it is (X-SQRT(XY))/(X-Y).
PARAMETERS
Input dataset:
Name of file containing
2-mode dataset. Data type: Matrix
Which mode: (Default = Row).
Choices are:
Row
Represents row
by row matrix of overlaps, i.e. forms AA'
Column
Represents column
by column matrix of overlaps, i.e. forms A'A.
Method:
Choices are:
Cross-Products (co-occurrence)
Uses standard matrix
multiplication.
Minimums (for valued data)
Uses matrix multiplication
in which the binary operation of multiplication is replaced by taking
the minimum.
Normalization:
Selecting none
gives the raw products. Bonacich '72 normalization gives the values
as described in the description above.
Output dataset: (Default = 'Affiliations').
Name of file which
contains new matrix.
LOG FILE New
matrix.
TIMING O(N^2).
COMMENTS None.
REFERENCES Bonacich
P. (1972) 'Techniques for analyzing overlapping memberships' Sociological
Methodology 176-185 Jossey-Bass.
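The worked figures in the description (32 for the cross product and 6 for the minimum method) and the normalization rule can be reproduced with a short Python sketch. The function names are hypothetical, the overlaps are computed row by row (the AA' case), and the normalization assumes binary data in which any non-zero entry counts as attendance.

```python
import math


def overlaps(a, method="cross"):
    """Row-by-row overlap matrix: sums of products or sums of minimums."""
    combine = (lambda p, q: p * q) if method == "cross" else min
    return [[sum(combine(p, q) for p, q in zip(ri, rj)) for rj in a] for ri in a]


def bonacich_entry(ri, rj):
    """Normalized overlap of two binary rows, following the rule in the text."""
    both = sum(1 for p, q in zip(ri, rj) if p and q)
    neither = sum(1 for p, q in zip(ri, rj) if not p and not q)
    i_only = sum(1 for p, q in zip(ri, rj) if p and not q)
    j_only = sum(1 for p, q in zip(ri, rj) if q and not p)
    x = both * neither
    y = i_only * j_only
    if x == y:
        return 0.5
    return (x - math.sqrt(x * y)) / (x - y)


a = [
    [5, 6, 0, 1],
    [4, 2, 4, 0],
]
print(overlaps(a, "cross")[0][1])  # 5*4 + 6*2 + 0*4 + 1*0
print(overlaps(a, "min")[0][1])    # min(5,4) + min(6,2) + min(0,4) + min(1,0)
print(bonacich_entry([1, 1, 0, 0], [1, 0, 0, 1]))  # X = Y = 1
```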
DATA>CSS
PURPOSE Combines
a number of different relations or cognitive "slices" of the
same network into a single pooled network. These may either be a number
of views of the whole network or the view of the whole network through
all ego centered networks.
DESCRIPTION The
input is a set of k adjacency matrices, each of the form A(i,j) stacked
into a three-dimensional matrix, A(i,j,k). This form is useful
for cognitive social structures, where k refers to the perceiver of
a relation from i to j. This routine compresses this 3-D matrix
into a two-dimensional matrix, A'(i,j) using one of two methods.
One is to compute the element-wise sum over the k matrices: A'(i,j)
= SUM over k of A(i,j,k). This matrix can be dichotomized around
a threshold to produce a "consensus" structure.
Alternatively,
one can produce a "locally aggregated structure" (LAS) by
setting A'(i,j) = A(i,j,i)+A(i,j,j). In other words, the value
of a given cell in the aggregate matrix is a function only of the perceptions
of the two individuals involved, not the whole group. This matrix
can also be dichotomized.
PARAMETERS
Input dataset:
Name of file containing
any set of matrices representing the same network. Data type: Valued
graph. Multirelational.
Method of Pooling graphs (Default = Slice)
Choices are:
Slice.
Take an individual's view of the network. This simply extracts a single
matrix from the structure.
Row LAS.
Construct a matrix which uses each respondent's row as a row in the data
matrix. The result is that each row of the data corresponds to the respondent's
perception of that row.
Column LAS.
Construct a matrix which uses each respondent's column as a column in
the data matrix. The result is that each column of the data corresponds
to the respondent's perception of that column.
Intersection LAS.
Construct a matrix with a connection between i and j if both i and j
agree that such a connection exists.
Union LAS.
Construct a matrix with a connection between i and j if either i or
j states that such a connection exists.
Median LAS.
Construct a matrix with values A(i,j) which are the median of i's value
of the i,j connection and j's value of the connection.
Consensus.
The consensus takes the sum of all the respondents' matrices and then dichotomises
the sum.
Average.
The average of all the respondents' views of the network.
If the user chooses
either Slice or Consensus then the following parameters will be highlighted.
(For Slice Method) Which informants slice? (Default = 1)
Number of actor
to be the informant
(For consensus method) Threshold value (Default =0.5)
Threshold value
for dichotomising the aggregated matrix.
Output dataset (Default = 'Pooled')
Output file that
will contain the pooled graph.
LOG FILE Pooled
graph adjacency matrix.
TIMING O(N^2)
COMMENTS None.
REFERENCES Krackhardt
D. (1987). 'Cognitive social structures'. Social Networks 9, 104-134.
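To make the locally aggregated structures concrete, here is a small Python sketch of four of the pooling methods. It assumes the data is held as a nested list a[k][i][j], meaning perceiver k's report of the tie from i to j, with 0-based indices and one perceiver per actor; the function names are invented for the example.

```python
def row_las(a):
    """Row i of the result is row i of actor i's own matrix."""
    n = len(a)
    return [[a[i][i][j] for j in range(n)] for i in range(n)]


def column_las(a):
    """Column j of the result is column j of actor j's own matrix."""
    n = len(a)
    return [[a[j][i][j] for j in range(n)] for i in range(n)]


def intersection_las(a):
    """Tie from i to j only if both i and j report it."""
    n = len(a)
    return [[1 if a[i][i][j] and a[j][i][j] else 0 for j in range(n)] for i in range(n)]


def union_las(a):
    """Tie from i to j if either i or j reports it."""
    n = len(a)
    return [[1 if a[i][i][j] or a[j][i][j] else 0 for j in range(n)] for i in range(n)]


# Two actors: actor 0 sees ties in both directions, actor 1 sees only 0 -> 1.
a = [
    [[0, 1], [1, 0]],
    [[0, 1], [0, 0]],
]
print(row_las(a))
print(intersection_las(a))
print(union_las(a))
```

In this example the two actors agree on the tie from 0 to 1 but not on the tie from 1 to 0, so the intersection keeps one tie and the union keeps both.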
DATA>DISPLAY
PURPOSE Display
UCINET datasets on the screen.
DESCRIPTION Allows
display of all or part of any UCINET dataset.
PARAMETERS
Data Set Filename
Name of file to
be displayed. Data type: Matrix.
Width of Fields (Default = Min)
The width of field
gives the size allocated for the width of each cell. The default
value will display the number in each cell separated by a single space.
# of decimals (Default = Min)
Defines the number
of places of decimals to be displayed. The default will give the
number of the original data up to a maximum of 2 places of decimals.
Print zeros as (Default = 0)
Enter blank to
suppress zeros.
Scale factor (Default= 1)
Scales up entries
by multiplying them by the scale factor. Useful for seeing small numbers.
Note the data is left unchanged.
Which rows (Default = All)
Rows are displayed
in the order specified by a row list. Each element
of the list is separated by a comma or space. The keywords TO,
FIRST and LAST are permissible. Hence 3, 7 TO 9, FIRST 2 will
display rows 3, 7, 8, 9, 1 and 2 in that order.
Which cols (Default = All)
Columns are displayed
in the order specified by a column list in the same way
as the rows above.
Row blocking (if any): (Default = None)
To partition the
rows of the displayed matrix into blocks, specify a blocking vector
by giving the dataset name, a dimension and an integer value. For example,
to use the second row of a dataset called ATTRIB, enter "ATTRIB
ROW 2". The program will then read the second row of ATTRIB and
use that information to sort the rows of the matrix. All rows with identical
values on the criterion vector (i.e. the second row of attrib) will
be placed in the same block of the matrix.
Column blocking (if any): (Default = None)
To partition the
columns of the displayed matrix into blocks, specify a blocking vector
by giving the dataset name, a dimension and an integer value. For example,
to use the second row of a dataset called ATTRIB, enter "ATTRIB
ROW 2". The program will then read the second row of ATTRIB and
use that information to sort the columns of the matrix. All columns
with identical values on the criterion vector (i.e. the second row of
attrib) will be placed in the same block of the matrix.
LOG FILE Display
of UCINET dataset, or part of dataset as prescribed.
TIMING Linear.
COMMENTS 'Width
of Field' should be greater than the number of decimal places. If this
is not the case, data is still displayed but with no spaces between cells,
causing the labels to be incorrectly aligned.
REFERENCES None.
DATA>DESCRIBE
PURPOSE Gives
a description of a UCINET dataset and allows the user to import, enter
or edit the labels.
DESCRIPTION Displays
information contained in the UCINET header file. This includes the data
type, number of dimensions, size of matrix, title and labels. The labels
can be edited, entered or imported. To edit an existing label simply
double click on the label and perform the edit. The edits will only
be kept if the file is saved using the 'save as' button. To type in
a new set of labels change the label flag from false to true and double
click in the label box. Proceed as an edit remembering to save the file
when you have finished. You can import labels saved in ASCII by clicking
on the import button and then entering the appropriate file name.
PARAMETERS None
LOG FILE None
TIMING Linear.
COMMENTS None.
REFERENCES None.
DATA>EXTRACT
PURPOSE To
extract parts of a dataset from a UCINET dataset.
DESCRIPTION Extracts
by means of specified lists rows, columns or matrices from UCINET IV
datasets.
PARAMETERS
Input dataset:
Name of file from
which data is to be extracted. Data type: matrix.
Are you going to Keep or Delete (Default = Keep)
User can either
specify which rows, columns or matrices form the new dataset or which
rows, columns or matrices will be deleted to form the new dataset.
Which rows (Default = All (None))
Rows to be kept
or dropped are specified by a list. Each row number is listed separated
by a comma or space. The keywords TO, FIRST and LAST are permissible.
Hence FIRST 3, 5 TO 7, 10, 12 would give row numbers 1, 2, 3, 5, 6,
7, 10 and 12. ALL gives all possible rows, NONE gives no rows. Lists
kept in a UCINET dataset can be used. Enter the filename followed by
ROW (or COLUMN) and a number to specify which row or column of the file
to use. The list must be specified using a binary vector where a 1 in
position k indicates that vertex k is a member of the list, a zero indicates
that k is not a member.
Which columns (Default = All (None))
Same as above but
for columns.
Which matrices(Default = All (None))
In multirelational
data matrices from different levels can be selected using the same list
format as above.
Output dataset: (Default = 'Extract')
Name of UCINET
dataset that will contain edited data.
LOG FILE Newly
created dataset with labeled rows and columns.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
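The list syntax used for rows, columns and matrices can be illustrated with a small parser. This Python sketch is not UCINET's own parser: the function names are invented, n is the number of rows (or columns) available, and "LAST k" is read as the final k items, which is an assumption since the text only shows FIRST in its example.

```python
import re


def parse_list(spec, n):
    """Expand a list such as 'FIRST 3, 5 TO 7, 10, 12' into 1-based numbers."""
    tokens = [t for t in re.split(r"[,\s]+", spec.upper()) if t]
    items = []
    pos = 0
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == "ALL":
            items.extend(range(1, n + 1))
            pos += 1
        elif tok == "NONE":
            pos += 1
        elif tok in ("FIRST", "LAST"):
            k = int(tokens[pos + 1])
            start = 1 if tok == "FIRST" else n - k + 1
            items.extend(range(start, start + k))
            pos += 2
        elif pos + 2 < len(tokens) and tokens[pos + 1] == "TO":
            items.extend(range(int(tok), int(tokens[pos + 2]) + 1))
            pos += 3
        else:
            items.append(int(tok))
            pos += 1
    return items


def from_binary_vector(flags):
    """A 1 in position k means that item k is a member of the list."""
    return [k for k, flag in enumerate(flags, start=1) if flag == 1]


print(parse_list("FIRST 3, 5 TO 7, 10, 12", n=12))
print(from_binary_vector([1, 0, 1, 1]))
```

The order of the items is preserved, which matters for routines such as DISPLAY where the list also fixes the display order.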
DATA>EXTRACT MAIN COMPONENT
PURPOSE Extracts
the largest component from a network.
DESCRIPTION The
main component of a network is the component with the largest number
of vertices. This routine extracts the largest weak component of a directed
network and creates a new UCINET dataset consisting of just that component.
If the data is undirected then this would simply be the largest component
of the network.
PARAMETERS
Input network dataset:
Name of network
file from which the component is to be extracted. Data type: Valued
Graph
Output network:
Name of UCINET
dataset that will contain the main weak component. Its default name
will be the name of the input network dataset with -main appended to
the end.
LOG FILE None
TIMING N/A
COMMENTS None
REFERENCES None
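One way to find the largest weak component is a breadth-first search that ignores tie direction. The Python sketch below illustrates that idea under stated assumptions: the network is a square adjacency matrix held as nested lists, node indices are 0-based, and when two components have the same size the one found first is kept. The function names are hypothetical.

```python
from collections import deque


def main_component(x):
    """Return the sorted node indices of the largest weak component."""
    n = len(x)
    seen = [False] * n
    best = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        component = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                # A tie in either direction joins u and v in a weak component.
                if not seen[v] and (x[u][v] != 0 or x[v][u] != 0):
                    seen[v] = True
                    component.append(v)
                    queue.append(v)
        if len(component) > len(best):
            best = component
    return sorted(best)


def induced(x, nodes):
    """Sub-matrix of x restricted to the given nodes."""
    return [[x[i][j] for j in nodes] for i in nodes]


# Ties: 0 -> 1, 2 -> 1 and 3 -> 4, giving weak components {0, 1, 2} and {3, 4}.
x = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
nodes = main_component(x)
print(nodes)
print(induced(x, nodes))
```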
DATA>SUBGRAPHS FROM PARTITIONS
PURPOSE Create
new UCINET datasets from a network that correspond to a partition indicator
vector.
DESCRIPTION A
partition indicator vector for a network divides all the nodes into
mutually exclusive groups. That is each node is placed in one and only
one group and this group is given by the indicator vector. A value
of k in row i of the vector means that actor i is in group k.
All other actors in the same group should be assigned the same value
k. The number of different k's in the vector gives the number of groups
in the partition. This routine creates separate UCINET datasets consisting
of nodes in the same partition. Different partitions on the same dataset
can be represented by a partition matrix where the columns (or rows)
of the matrix represent different partitions.
PARAMETERS
Input network:
Name of file containing
network from which partitions are to be extracted.
Input Partition:
Name of
UCINET dataset that contains the partition matrix. The name of the
matrix should be entered followed by the word col (or row) and the
col (row) number corresponding to the column of the partition that is
to be extracted. If the matrix is a vector and only has one column this
still needs to be included, e.g. Partition col 1.
Output Prefix
Each class within
a partition creates a new dataset. This will have the name specified
in the output prefix followed by a dash and the number k in the partition
indicator vector. The default is the same as the name of the input file.
Make files for which classes of groups?
A check box list
corresponding to all the groups within the partition. A new datafile
will be created for each one checked. The list gives the size of each
group. These can be checked or unchecked by hand or, if all groups above
a certain size are required, a minimum size can be set in the box on
the right. This will uncheck all groups smaller than the size set.
LOG FILE List
of datasets created
TIMING O(N)
COMMENTS None
REFERENCES None
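The splitting step can be sketched in a few lines of Python. The function name is invented, the partition indicator vector is given as a plain list with one group value per node, and the result is a dictionary keyed by group value rather than a set of files; the real routine writes one dataset per group, named with the output prefix, a dash and the value k.

```python
def subgraphs_from_partition(x, partition):
    """Map each group value k to the sub-matrix induced by the nodes in group k."""
    groups = {}
    for node, k in enumerate(partition):
        groups.setdefault(k, []).append(node)
    return {
        k: [[x[i][j] for j in nodes] for i in nodes]
        for k, nodes in groups.items()
    }


x = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Nodes 1 to 3 are in group 1 and node 4 is in group 2.
for k, sub in subgraphs_from_partition(x, [1, 1, 1, 2]).items():
    print(k, sub)
```

Ties that run between groups, such as the tie between the third and fourth nodes here, do not appear in any of the resulting subgraphs.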
DATA>EGONET
PURPOSE Construct
an ego centered network from the whole network
DESCRIPTION The
neighborhood of an actor is the set of actors they are connected to
together with the actors that are connected to them. An ego-centered
network is the subgraph induced by the set of neighbors, that is, the
network that consists of all the neighbors and the connections between
them. The idea of an ego network can be extended to a group of
actors, in which case the neighborhood is simply the union of the neighborhoods
of the group. This procedure returns the adjacency matrix of the ego
network and provides an option to include or exclude ego(s) from the
network.
PARAMETERS
Input Dataset
Name of file containing
the network from which the egonet is to be constructed.
Focal Nodes
The node or nodes
on whom the neighborhood will be built. Nodes are specified by a list.
Each node is listed separated by a comma or space. The keywords
TO, FIRST and LAST are permissible. Hence FIRST 3, 5 TO 7, 10, 12 would
give nodes 1, 2, 3, 5, 6, 7, 10 and 12. Lists kept in a UCINET
dataset can be used. Enter the filename followed by ROW (or COLUMN)
and a number to specify which row or column of the file to use. The list
must be specified using a binary vector where a 1 in position k indicates
that vertex k is a member of the list, a zero indicates that k is not
a member.
Include focal? (Default = 'Yes')
Whether to include
the focal nodes in the network or not.
Output dataset (Default = 'Neighborhood')
Name of file containing
adjacency matrix of the ego network.
LOG FILE Ego
network adjacency matrix.
TIMING O(N)
COMMENTS None
REFERENCES None
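The construction described above can be sketched in Python as follows. The function name ego_network is hypothetical, node indices are 0-based, and a neighbor is any node with a tie to or from a focal node, as in the description.

```python
def ego_network(x, focal, include_focal=True):
    """Return (node indices, adjacency matrix) of the ego network of the focal nodes."""
    n = len(x)
    focal = set(focal)
    # Union of the neighborhoods of all focal nodes, ignoring tie direction.
    neighbors = {
        v
        for u in focal
        for v in range(n)
        if v != u and (x[u][v] != 0 or x[v][u] != 0)
    }
    nodes = sorted(neighbors | focal) if include_focal else sorted(neighbors - focal)
    return nodes, [[x[i][j] for j in nodes] for i in nodes]


# Ties: 0 -> 1, 1 -> 2, 2 -> 0 and 3 -> 4.
x = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
print(ego_network(x, [0]))                       # ego plus its two neighbors
print(ego_network(x, [0], include_focal=False))  # neighbors only
```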
DATA>REMOVE ISOLATES
PURPOSE Delete
all isolates in a network.
DESCRIPTION An
isolate is a vertex of degree zero. This routine deletes all the isolates
from the network.
PARAMETERS
Input network data
Name of file containing
network data. Data type: Valued Graph
Output network
Name of file containing
network with isolates deleted. The default is the name of the input
network followed by -NoIsolates.
LOG FILE Network
with the isolates deleted.
TIMING O(N^2)
COMMENTS None
REFERENCES None
DATA>REMOVE PENDANTS
PURPOSE Iteratively
delete all pendants in a network.
DESCRIPTION A
pendant vertex in an undirected graph has degree one. This routine deletes
all the pendants from the network. This may create new pendants, and
these are then deleted. This continues iteratively until there are no
more pendants left. For directed data this routine iteratively removes
vertices with out-degree one and in-degree zero.
PARAMETERS
Input network
Name of file containing
network data. Data type: Valued Graph
Output network (Default= 'PendantsRemoved')
Name of file containing
network with pendants deleted.
LOG FILE Network
with the pendants deleted.
TIMING O(N^2)
COMMENTS None
REFERENCES None
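The iterative deletion can be sketched in Python for the undirected case. This is an illustration only: the function name is invented, the matrix is assumed symmetric, degree is counted as the number of non-zero ties to nodes that are still present, and all pendants found in a pass are removed together before degrees are recomputed. The directed rule mentioned above is not covered.

```python
def remove_pendants(x):
    """Repeatedly drop degree-one nodes; return (kept indices, reduced matrix)."""
    keep = list(range(len(x)))
    while True:
        pendants = [
            i for i in keep
            if sum(1 for j in keep if j != i and x[i][j] != 0) == 1
        ]
        if not pendants:
            break
        keep = [i for i in keep if i not in pendants]
    return keep, [[x[i][j] for j in keep] for i in keep]


# A triangle 0-1-2 with a tail 2-3-4 hanging off it.
x = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
kept, reduced = remove_pendants(x)
print(kept)
print(reduced)
```

Node 4 is removed in the first pass, which turns node 3 into a pendant that is removed in the second pass, leaving only the triangle.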
DATA>UNPACK
PURPOSE To
unpack matrices from a UCINET dataset.
DESCRIPTION Unpacks
some or all matrices from a UCINET multirelational dataset. This routine
is similar to extract for matrices except it places each extracted matrix
as a single UCINET dataset. Hence extracting n matrices results in n
different single datasets.
PARAMETERS
Input dataset:
Name of file from
which data is to be unpacked. Data type: matrix multirelational.
Which relations to unpack (Default = ALL)
List of relations
to unpack. Each matrix number is listed separated by a comma or space.
The keywords TO, FIRST and LAST are permissible. Hence FIRST 3, 5 TO
7, 10, 12 would give matrix numbers 1, 2, 3, 5, 6, 7, 10 and 12. ALL
gives all possible matrices.
LOG FILE Lists
the filenames of the unpacked matrices
TIMING Linear
COMMENTS None.
REFERENCES None.
DATA>JOIN
PURPOSE Combine
UCINET data files to form a single data file. Combines sets of single
matrices into a new matrix by merging all rows or all columns.
Also combines sets of single matrices or multi-relational matrices into
one multi-relational matrix.
DESCRIPTION Combines
sets of single matrices, with equal columns, row wise into a larger
matrix. If A1, A2 ... AN are all matrices with R1,
R2, ... RN rows respectively and C columns then these
are merged into the R1 + R2 +...+ RN by C matrix (A1 A2
... AN) transpose.
Also combines sets
of single matrices, with equal rows, column wise into a larger matrix.
If A1, A2, ... AN are all matrices with R rows and C1, C2,
... CN columns respectively then these are merged into the R by (C1 +
C2 + ... + CN) matrix (A1 A2 ... AN).
Certain UCINET
routines permit the analysis of multiple relations on the same set of
actors. This routine can create a single data file which brings
together all the relevant networks or matrices and makes them suitable
for analysis.
PARAMETERS
Files selected:
Names of datasets each containing one or more matrices. The names should be entered in the order required in the merged data set. To enter a file, highlight one or more files in the Possible Files and click on the > button and they will be moved across. Clicking on < moves the files back. All possible files can be moved across by clicking on >> or <<. To select more than one file press Ctrl and then click. The files will be placed in the order they are selected.
Dims to join (Default = Rows)
Defines which method is to be used.
Choices are:
Rows
Matrices combine
row-wise creating extra rows. Each matrix must be a single relation
with an equal number of columns.
Columns
Matrices combine
column-wise creating extra columns. Each matrix must be a single relation
with an equal number of rows.
Matrices
Matrices appended
as additional matrices or relations. Networks must all have the same
dimensions.
Destination filename (Default = 'Joined')
Name of the file
which will contain merged dataset.
LOG FILE The
merged data set with appropriate labels.
If Rows has been
selected and the original matrices do not have row labels then the new
row labels are of the form i-j indicating that the row was formed from
row j of matrix i.
If Columns has
been selected then the new columns are labeled in a similar way to Row
labels described above.
If Matrices has
been selected then each relation is numbered sequentially.
TIMING Linear.
COMMENTS None.
REFERENCES None.
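The Rows and Columns joins, and the default i-j row labels, can be sketched as follows. The function names are illustrative assumptions and matrices are plain lists of lists; this is not the routine's actual implementation.

```python
def join_rows(matrices):
    """Stack matrices with equal column counts on top of each other."""
    width = len(matrices[0][0])
    if any(len(row) != width for m in matrices for row in m):
        raise ValueError("all matrices must have the same number of columns")
    return [list(row) for m in matrices for row in m]


def join_columns(matrices):
    """Place matrices with equal row counts side by side."""
    height = len(matrices[0])
    if any(len(m) != height for m in matrices):
        raise ValueError("all matrices must have the same number of rows")
    return [[x for m in matrices for x in m[r]] for r in range(height)]


def default_row_labels(matrices):
    """Label 'i-j' marks a row that came from row j of matrix i."""
    return [
        f"{i}-{j}"
        for i, m in enumerate(matrices, start=1)
        for j in range(1, len(m) + 1)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6]]
stacked = join_rows([a, b])
labels = default_row_labels([a, b])
```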
DATA>PERMUTE
PURPOSE Re-order
rows, columns or matrices in a dataset according to a user specified
list.
DESCRIPTION Re-ordering
of matrices can be by a list given at the keyboard or from a dataset.
PARAMETERS
Input dataset:
Name of dataset
to be permuted. Data type: Matrix
New order of rows (Default is the natural order)
Rows are ordered
as specified by a list. Each row number is listed separated by
a comma or space. The keywords TO, FIRST and LAST are permissible.
Hence, for a dataset with 11 rows, 5, FIRST 3, 6 TO 8, 4, LAST 2, 9 specifies the order 5, 1, 2,
3, 6, 7, 8, 4, 10, 11, 9.
A UCINET data file
can be specified which contains the order. This must be of the
form
<file name>
ROW (or COLUMN) <number>
where file name
is the name of the data file. The command ROW or COLUMN followed
by the appropriate number specifies which row or column of the dataset
is to be used. The keyword RANDOM is also allowed.
New order of cols (Default is the natural order)
Columns are ordered
by a list using the same convention as for rows.
New order of matrices (Default is the natural order)
Matrices are ordered
by a list using the same convention as for rows.
Output Filename (Default = 'Permuted')
Name of file which
will contain permuted dataset.
LOG FILE Permuted
dataset.
TIMING O(N^2).
COMMENTS There
is a limitation of 255 characters on keyboard entered lists. Lists
longer than 255 characters must be specified in a UCINET dataset.
REFERENCES None.
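Applying a new order to rows and columns can be sketched as below. The function name permute and its arguments are assumptions for illustration; orders are lists of 1-based numbers as in the parameter descriptions, and None stands for the natural order.

```python
def permute(matrix, row_order=None, col_order=None):
    """Re-order rows and columns according to 1-based order lists."""
    rows = row_order or range(1, len(matrix) + 1)
    cols = col_order or range(1, len(matrix[0]) + 1)
    return [[matrix[r - 1][c - 1] for c in cols] for r in rows]


m = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
permuted = permute(m, row_order=[3, 1, 2], col_order=[3, 1, 2])
```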
DATA>SORT
PURPOSE Re-orders
nodes in a network so that they correspond to the monotonic ordering
of a prescribed vector.
DESCRIPTION Arranges the nodes of a network so that they are in the same order as an external vector.
The sort can be
either ascending or descending. Hence if the ASCENDING option
is chosen and the external vector is (V1, V2, ... VN), the nodes would
be ordered so that node i would be before node j if and only if Vi ≤ Vj.
The external vector can be selected from the rows or columns of any
UCINET data matrix.
PARAMETERS
Input dataset
Name of dataset
to be sorted. Data type: Matrix
Dimensions to be arranged:
Choices are:
Both - Both rows and columns are simultaneously sorted
Rows - Just the rows are sorted and the column order is preserved
Columns - Just the columns are sorted and the row order is preserved
Sort order (Default = Ascending)
Choices are:
Ascending
Gives a sort which
corresponds to placing the elements of the prescribed vector
in the order from smallest to largest.
Descending
Gives a sort which
corresponds to placing the elements of the prescribed vector
in the order from largest to smallest.
Criterion vector (sort key) :
Either the name
of the UCINET dataset from which the prescribed vector will be taken
with the row or column specified as follows:
<dataset>
ROW (or COLUMN) <number>
where <dataset>
is the name of the dataset containing the criterion vector. The command
ROW or COLUMN followed by the appropriate number specifies which row
or column of the dataset is to be used.
Alternatively,
a list of values may be entered, one for each row or column being sorted.
Each list entry is separated by a comma or a space. There must be as
many values as rows or columns being sorted.
To sort in ascending
or descending order the dataset itself should be used as the key.
Output dataset: (Default = Sorted)
Name of file which
will contain sorted dataset.
LOG FILE Sorted
dataset.
TIMING O(N*LOG(N)).
COMMENTS Sorting
to a user-prescribed keyboard list is provided by the routine DATA>PERMUTE.
REFERENCES None.
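Sorting a matrix by a criterion vector can be sketched as follows. The names sort_order and sort_matrix are assumptions, as is the handling of ties (Python's sort is stable, so tied nodes keep their original relative order; the help text does not say what UCINET does).

```python
def sort_order(key, descending=False):
    """Return 0-based node positions arranged by the criterion vector."""
    return sorted(range(len(key)), key=lambda i: key[i], reverse=descending)


def sort_matrix(matrix, key, dims="both", descending=False):
    """Arrange rows, columns or both according to the criterion vector."""
    order = sort_order(key, descending)
    rows = order if dims in ("both", "rows") else range(len(matrix))
    cols = order if dims in ("both", "columns") else range(len(matrix[0]))
    return [[matrix[r][c] for c in cols] for r in rows]


m = [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
]
key = [3, 1, 2]
sorted_both = sort_matrix(m, key)
sorted_rows = sort_matrix(m, key, dims="rows")
```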
DATA>TRANSPOSE
PURPOSE Take
the transpose of a matrix.
DESCRIPTION Interchanges
the rows and columns of a matrix. Note that this corresponds to taking
the converse of a directed graph, that is, reversing the direction
of every arc.
PARAMETERS
Output dataset (Default = 'Transpose')
Name of file containing
transposed data.
LOG FILE Transposed
matrix.
TIMING O(N^2).
COMMENTS More
complicated transposes for three-dimensional matrices can be done using
TOOLS>MATRIX>ALGEBRA.
REFERENCES None.
DATA>PARTITION TO SETS
PURPOSE Transforms
a partition indicator vector into a group by actor incidence matrix
and displays the partition by groups.
DESCRIPTION A
partition indicator vector has the form (k1,k2,...,ki...) where ki assigns
vertex i to group ki. So that (1 1 2 1 2) assigns vertices 1,
2 and 4 to block 1; and 3 and 5 to block 2. A group by vertex
incidence matrix has vertices as its columns and the groups as the rows.
A 1 in row i column j indicates that actor j is a member of group i;
the values are zero otherwise.
PARAMETERS
Input dataset:
Partition indicator vector. This can either be entered at the keyboard by specifying the elements of the vector, each number separated by a comma or space or as a UCINET dataset.
For partitions
kept in a UCINET data file enter the filename followed by ROW (or COLUMN)
and a number to specify which row or column of the file to use. Data
type: Partition indicator vector.
Output dataset: (Default = 'PartitionToSets').
Name of file which
will contain group by vertex incidence matrix.
LOG FILE A
list of the groups. Each group is numbered and specified by the
vertices it contains.
TIMING O(N^2)
COMMENTS Partition
indicator vectors entered using the keyboard are restricted to 255 characters.
Longer vectors should be specified using a UCINET dataset.
REFERENCES None.
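The conversion from a partition indicator vector to a group by vertex incidence matrix can be sketched as below. The function names are illustrative assumptions; groups are taken in ascending order of their labels.

```python
def partition_to_sets(partition):
    """Build the group by vertex incidence matrix (groups as rows)."""
    groups = sorted(set(partition))
    return [[1 if k == g else 0 for k in partition] for g in groups]


def group_members(partition):
    """List the 1-based vertex numbers belonging to each group."""
    members = {}
    for vertex, g in enumerate(partition, start=1):
        members.setdefault(g, []).append(vertex)
    return members


incidence = partition_to_sets([1, 1, 2, 1, 2])
members = group_members([1, 1, 2, 1, 2])
```

For the vector (1 1 2 1 2) used in the description, the first row marks vertices 1, 2 and 4 and the second row marks vertices 3 and 5.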
DATA>RESHAPE
PURPOSE Reorganize
the data into a matrix or matrices of a different size.
DESCRIPTION This
routine treats any input data as one long list. The list is formed
row by row and, if applicable, level by level. The new matrix
is then filled up row by row and then level by level from this list.
PARAMETERS
# of rows desired (Default = 0)
Number of rows
in reshaped matrix.
# of columns desired (Default = 0)
Number of columns
in reshaped matrix.
# of matrices desired (Default = 1)
Number of different
matrices required.
Output dataset (Default = 'Reshaped')
Name of file containing
reshaped data.
LOG FILE Reshaped
matrix.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
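The row-by-row, level-by-level filling described above can be sketched as follows. The function name reshape is an assumption, and so is the decision to raise an error when the requested shape does not match the number of values; the help text does not say how the routine handles that case.

```python
def reshape(matrices, n_rows, n_cols, n_mats=1):
    """Flatten the input row by row, level by level, then refill.

    matrices is a list of levels, each a list of rows (assumed layout).
    """
    flat = [x for m in matrices for row in m for x in row]
    if len(flat) != n_rows * n_cols * n_mats:
        raise ValueError("requested shape does not match the number of values")
    values = iter(flat)
    return [
        [[next(values) for _ in range(n_cols)] for _ in range(n_rows)]
        for _ in range(n_mats)
    ]


original = [[[1, 2, 3], [4, 5, 6]]]   # one 2-by-3 matrix
reshaped = reshape(original, n_rows=3, n_cols=2)
```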
DATA>CREATE NODE SETS
PURPOSE To
create a group indicator vector based on comparing two vectors or a
vector and a number.
DESCRIPTION Given
a vector of attributes or values for every actor and a threshold number,
this routine selects actors which have a value which is less
than (or greater than) the threshold. More generally the threshold can
itself be a vector so that actors are selected if they have a value
less than (or greater than) the value in the corresponding cell in the
threshold vector. An example of using two vectors would be the selection
of actors whose closeness centrality is less than their degree centrality.
PARAMETERS
Variable 1:
Name of file
which contains the value or attribute vector; this must be a UCINET data
file. Enter the filename followed by ROW (or COL) and a number to specify
which row or column of the file to use.
Relational Operator
Criterion by which to compare the actor values or attributes.
Choices are:
LT - Less than
LE - Less than or equal to
EQ - Equal to
NEQ - Not equal to
GE - Greater than or equal to
GT - Greater than
Variable 2
The threshold value
or vector. If a single value is required then this can be typed in directly.
Vectors must be specified using a UCINET data file: enter the filename
followed by ROW (or COLUMN) and a number to specify which row or column
of the file to use.
Output dataset (Default = 'SELECTED')
Name of file to contain group indicator matrix. This will be a single column vector with selected actors having a 1 and non-selected actors having a 0.
LOG FILE Displays
the group indicator vector.
TIMING Linear
COMMENTS The
group indicator vector can be used in routines such as Extract.
REFERENCES None.
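The comparison of a value vector with a threshold number or vector can be sketched as below. The function name create_node_set and the OPS table are assumptions for illustration; the operator codes are those listed under Relational Operator.

```python
import operator

# Relational operator codes mapped to Python comparisons.
OPS = {
    "LT": operator.lt,
    "LE": operator.le,
    "EQ": operator.eq,
    "NEQ": operator.ne,
    "GE": operator.ge,
    "GT": operator.gt,
}


def create_node_set(values, op, threshold):
    """Return a 0/1 group indicator vector: 1 where the comparison holds."""
    if not isinstance(threshold, (list, tuple)):
        threshold = [threshold] * len(values)   # single number for everyone
    compare = OPS[op]
    return [1 if compare(v, t) else 0 for v, t in zip(values, threshold)]


by_number = create_node_set([0.2, 0.8, 0.5], "GT", 0.4)
by_vector = create_node_set([1, 5, 3], "LT", [2, 4, 3])
```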
TRANSFORM>BLOCK
PURPOSE Partition
nodes in a data graph into blocks and calculate block densities, sums
or other statistics.
DESCRIPTION The adjacency matrix is partitioned into submatrices. The average, sum, maximum, minimum, standard deviation, or sum of squares of each submatrix is then calculated.
This routine is
virtually identical to the Networks>Properties>Density routine,
except that it provides more options for aggregating cells within a
matrix block.
PARAMETERS
Input dataset:
Name of file containing
matrices to be blocked. Data type: Matrix.
Method: (Default = Average)
Choices are
Average - Arithmetic mean of all cells in each submatrix.
Sum - Simple sum of all cells in each submatrix.
Maximum - Largest value of all cells in each submatrix.
Minimum - Smallest value of all cells in each submatrix.
Std Dev - Standard deviation of all cells in each submatrix.
SSQ - Sum of squares of all cells in each submatrix.
Utilize Diagonal values (Default = No)
Whether diagonals
are to be included in density calculations.
Row partition/blocking (if any):
To partition the
rows of the data matrix into blocks, specify a blocking vector by giving
the dataset name, a dimension and an integer value. For example, to
use the second row of a dataset called ATTRIB, enter "ATTRIB ROW
2". The program will then read the second row of ATTRIB and use
that information to sort the rows of the matrix. All rows with identical
values on the criterion vector (i.e. the second row of attrib) will
be placed in the same block of the matrix. Densities will then
be computed separately for each block. The block partitions can also
be typed directly into the box by typing in a partition indicator vector.
A partition indicator vector has the form (k1,k2,...,ki...) where ki
assigns vertex i to group ki. So that (1 1 2 1 2) assigns vertices
1, 2 and 4 to block 1; and 3 and 5 to block 2.
Column partition/blocking (if any):
To partition the
columns of the data matrix into blocks, specify a blocking vector by
giving the dataset name, a dimension and an integer value. For example,
to use the second row of a dataset called ATTRIB, enter "ATTRIB
ROW 2". The program will then read the second row of ATTRIB and
use that information to sort the columns of the data matrix. All columns
with identical values on the criterion vector (i.e. the second row of
attrib) will be placed in the same block of the matrix. Densities
will then be computed separately for each block. The block partitions
can also be typed directly into the box by typing in a partition indicator
vector. A partition indicator vector has the form (k1,k2,...,ki...)
where ki assigns vertex i to group ki. So that (1 1 2 1 2) assigns
vertices 1, 2 and 4 to block 1; and 3 and 5 to block 2.
(Output) Reduced image dataset (Default = 'Blocked')
Name of dataset
that will contain the reduced block density matrix.
(Output) Pre-image dataset (Default= 'PreImage')
Name of dataset
that will contain the original data with the rows and columns permuted
to form the blocks.
LOG FILE List
of block numbers together with their members. The pre-image matrix, i.e.
the permuted original data matrix. Blocked matrices. A blank in the
matrix indicates that a matrix value (such as the average) was undefined.
TIMING O(N^2)
COMMENTS Users
who wish to produce a binary image matrix from the output of this routine
can obtain one by using Transform>Dichotomize.
REFERENCES None.
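The Average method with a single partition applied to both rows and columns can be sketched as follows. The function name block_means is an assumption, and None stands in for the blank that the log file shows when a block value is undefined.

```python
def block_means(matrix, partition, use_diagonal=False):
    """Average the cells of each block defined by a partition vector.

    The same partition is applied to rows and columns. A block with no
    usable cells yields None.
    """
    groups = sorted(set(partition))
    image = []
    for g in groups:
        image_row = []
        for h in groups:
            cells = [
                matrix[i][j]
                for i, gi in enumerate(partition) if gi == g
                for j, gj in enumerate(partition) if gj == h
                if use_diagonal or i != j
            ]
            image_row.append(sum(cells) / len(cells) if cells else None)
        image.append(image_row)
    return image


m = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
image = block_means(m, [1, 1, 2, 2])
```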
TRANSFORM>COLLAPSE
PURPOSE Combine
one or more rows or columns of a matrix.
DESCRIPTION Combines
rows, columns or both simultaneously to form a new smaller matrix.
The value of the combined cells can either be the average, the sum,
the maximum or the minimum of the set of cells which are to be collapsed.
PARAMETERS
Input dataset
Name of file containing
matrix to be collapsed. Data type: Matrix.
Aggregation operation: (Default = Sum).
Specifies how to
aggregate the cells which are to be collapsed.
Choices are:
Average - The arithmetic mean of all the cells.
Sum - The sum of all the cells.
Maximum - Maximum value of all the cells.
Minimum - Minimum value of all the cells.
Enter instructions for collapsing:
In the window provided the user must provide instructions to the routine which specify which rows or columns must be collapsed.
The following keywords
are used.
ROWS to combine rows.
COLS to combine columns
NODES to combine
rows and columns simultaneously.
Each new
line must commence with one of these keywords. Each keyword is
followed by a list of the rows, columns or nodes which are to be collapsed.
The list has elements separated by spaces or commas; the keyword TO
is permissible. For example:
ROWS 1 3 4
collapses rows 1, 3 and 4 to a single row;
COLS 2 TO 4
COLS 1, 6
collapses columns
2, 3 and 4 to one column and 1 and 6 to another column separately.
(For Square Mats) Include diagonal values? (Default = No).
No excludes diagonal
values from consideration.
OUTPUT dataset: (Default = 'Collapse').
Name of file which
contains labeled collapsed matrix described below.
LOG FILE A
list of assignments of rows and columns to blocks. The blocks
specify the new row and column numbers for each of the old row and column
numbers.
The collapsed matrix.
Each row or column is labeled. Rows or columns that have been
collapsed are labeled by B followed by their block number. Rows
or columns which have not been collapsed retain the label R (for row)
or C (for column) followed by their row or column number.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
TRANSFORM>RECODE
PURPOSE Change
ranges of matrix values to new values.
DESCRIPTION The
routine allows the user to change values or a range of values in a matrix
to a new value. Up to 5 values or ranges can be recoded.
PARAMETERS
Input dataset
Name of dataset
to be recoded. Data type: Matrix.
Rows to recode (Default = All)
Rows to be recoded
are specified by a list. Each row number is listed separated by a comma
or space. The keywords TO, FIRST and LAST are permissible. Hence FIRST
3, 5 TO 7, 10, 12 would give row numbers 1, 2, 3, 5, 6, 7, 10 and 12.
ALL gives all possible rows. Lists kept in a UCINET dataset can be used.
Enter the filename followed by ROW (or COLUMN) and a number to specify
which row or column of the file to use. The list must be specified using
a binary vector where a 1 in position k indicates that vertex k is a
member of the list, a zero indicates that k is not a member.
Cols to recode (Default = All)
Columns to be recoded
are specified by a list. Each column number is listed separated by a
comma or space. The keywords TO, FIRST and LAST are permissible. Hence
FIRST 3, 5 TO 7, 10, 12 would give column numbers 1, 2, 3, 5, 6, 7,
10 and 12. ALL gives all possible columns. Lists kept in a UCINET dataset
can be used. Enter the filename followed by ROW (or COLUMN) and a number
to specify which row or column of the file to use. The list must be
specified using a binary vector where a 1 in position k indicates that
vertex k is a member of the list, a zero indicates that k is not a member.
Mats (levels) to recode (Default = All)
Matrices to be
recoded are specified by a list. Each matrix number is listed separated
by a comma or space. The keywords TO, FIRST and LAST are permissible.
Hence FIRST 3, 5 TO 7, 10, 12 would give matrix numbers 1, 2, 3, 5,
6, 7, 10 and 12. ALL gives all possible matrices. Lists kept in a UCINET
dataset can be used. Enter the filename followed by ROW (or COLUMN)
and a number to specify which row or column of the file to use. The
list must be specified using a binary vector where a 1 in position k
indicates that vertex k is a member of the list, a zero indicates that
k is not a member.
Include diagonal values: (Default = No).
Yes means that diagonal values are recoded.
No ignores the diagonal in the recoding.
Five boxes of the form
values ___ to ___ are recoded as ___
If the values x,
y and z are entered so that the completed line reads
values x to y are
recoded as z
then all values
of the matrix in the range from x to y inclusive are changed to the
value z. To change a single value set both x and y to the value. Note
that the value na can be used for missing values.
Output dataset: (Default = 'Recode').
Name of file which
contains recoded matrix.
LOG FILE Recoded
matrix.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
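The range-based recoding can be sketched as below. The function name recode and the rule tuples are assumptions; in particular, letting the first matching rule win when ranges overlap is a choice made for this sketch, since the help text does not specify it.

```python
def recode(matrix, rules, include_diagonal=False):
    """Replace values falling in inclusive ranges by new values.

    rules is a list of (low, high, new_value) tuples, standing in for the
    'values x to y are recoded as z' boxes.
    """
    out = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, x in enumerate(row):
            if i == j and not include_diagonal:
                continue
            for low, high, new_value in rules:
                if low <= x <= high:
                    out[i][j] = new_value
                    break
    return out


m = [
    [0, 3, 5],
    [2, 0, 7],
    [9, 1, 0],
]
recoded = recode(m, [(1, 3, 1), (4, 9, 2)])
```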
TRANSFORM>REVERSE
PURPOSE Convert
similarity data to distance data, or distance to similarity by a linear
transformation.
DESCRIPTION Subtract
each value of the matrix from the sum of the maximum and minimum entries.
PARAMETERS
Input dataset:
Name of file containing
matrix to be reversed. Data type: Matrix.
Rows to reverse: (Default = 'ALL')
Enter id numbers
of all rows whose values are to be reversed.
Columns to reverse: (Default = 'ALL')
Enter id numbers
of all columns whose values are to be reversed.
Matrices (levels) to reverse: (Default = 'ALL')
Enter id numbers
of all matrices in dataset whose values are to be reversed.
(Sq. matrices only) Include diagonal values (Default = Yes).
Whether diagonals
are to be included in the reversing process.
Output dataset: (Default = 'Reverse').
Name of file that
will contain reversed matrix.
LOG FILE Display
of reversed matrix.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
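The linear transformation in the description, which subtracts each value from the sum of the maximum and minimum entries, can be sketched as follows. The function name reverse is an assumption, and the sketch reverses every cell including the diagonal, which corresponds to the stated defaults.

```python
def reverse(matrix):
    """Replace each value x by (max + min) - x over the whole matrix."""
    values = [x for row in matrix for x in row]
    total = max(values) + min(values)
    return [[total - x for x in row] for row in matrix]


similarities = [
    [1, 4],
    [2, 5],
]
distances = reverse(similarities)
```

The largest value becomes the smallest and vice versa, so applying the function twice returns the original matrix.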
TRANSFORM>DICHOTOMIZE
PURPOSE Form
a binary matrix from a valued matrix.
DESCRIPTION Given
a specified cut-off value, the valued matrix is made binary by comparing
each element with the cut-off value. Comparisons can be strictly
greater than, greater than or equal to, equal to, less than or equal to,
or strictly less than.
PARAMETERS
Input dataset:
Name of matrix
to be dichotomized. Data type: Matrix.
Cut-off value: (Default = 0).
Any user-specified
value. MEAN gives the average value of all the cells in
the input matrix.
Diagonal OK? (Default = No)
Yes means that diagonal elements are considered valid in calculating the mean.
No ignores
diagonal values.
Cut-off operator: (Default = 'GT').
Choices are:
GT - Matrix values replaced by a 1 if they are strictly greater than the cut-off value and 0 otherwise.
GE - Matrix values replaced by a 1 if they are greater than or equal to the cut-off value and 0 otherwise.
EQ - Matrix values replaced by a 1 if they are equal to the cut-off value and 0 otherwise.
LE - Matrix values replaced by a 1 if they are less than or equal to the cut-off value and 0 otherwise.
LT - Matrix values replaced by a 1 if they are strictly less than the cut-off value and 0 otherwise.
Output dataset: (Default = 'Dichotomize').
Name of file which
contains dichotomized matrix.
LOG FILE Dichotomized
matrix.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
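The cut-off comparison can be sketched as below. The function name dichotomize is an assumption. In this sketch the diagonal_ok flag only affects how the MEAN cut-off is computed, following the parameter description, and every cell is then compared with the cut-off.

```python
import operator

OPS = {
    "GT": operator.gt,
    "GE": operator.ge,
    "EQ": operator.eq,
    "LE": operator.le,
    "LT": operator.lt,
}


def dichotomize(matrix, cutoff=0, op="GT", diagonal_ok=False):
    """Replace each cell by 1 if it satisfies the comparison, else 0."""
    if cutoff == "MEAN":
        cells = [
            x
            for i, row in enumerate(matrix)
            for j, x in enumerate(row)
            if diagonal_ok or i != j
        ]
        cutoff = sum(cells) / len(cells)
    compare = OPS[op]
    return [[1 if compare(x, cutoff) else 0 for x in row] for row in matrix]


m = [
    [0, 2, 5],
    [1, 0, 3],
    [4, 0, 0],
]
binary = dichotomize(m)
```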
TRANSFORM>DIAGONAL
PURPOSE Perform
simple operations on the diagonal of a square matrix.
DESCRIPTION Set
the diagonal of a matrix to a new value. Save the diagonal of
a matrix.
PARAMETERS
Input dataset
Name of file on
which to perform the transformations. Data type: Square matrix.
New diagonal value(s): (Default = 0).
A single value
will set all diagonal elements to the value. A list will set the diagonal
to the values in the list; these values can be separated by a
space or comma. Alternatively, give the name of any UCINET dataset consisting
of a square matrix of the same size; the diagonal of the input
dataset will then be set to the diagonal values of the specified
dataset.
(Output) Diagonal Dataset: (Default = 'DiagonalSaveDiag').
Name of file which
contains a square matrix with the diagonal of the input dataset as its
diagonal and zeros elsewhere. This file is not displayed in the LOG
FILE.
(Output) Changed Matrix: (Default = 'DiagonalNewMat').
Name of file which
contains matrix with new diagonal values.
LOG FILE Matrix
with reset diagonal.
TIMING O(N).
COMMENTS None.
REFERENCES None.
TRANSFORM>SYMMETRIZE
PURPOSE Change
an unsymmetric matrix into a symmetric matrix by using one of a variety
of criteria.
DESCRIPTION Produces
a symmetric square matrix by one of the following methods. Replace x_{ij}
and x_{ji} by their maximum, minimum, average,
sum, absolute difference, product or x_{ij}/x_{ji} (provided x_{ji} is non zero), for i < j. Alternatively
make the lower triangle equal the upper triangle or the upper triangle
equal the lower triangle.
The routine also
produces a symmetric matrix with binary values on all off-diagonal cells by
replacing x_{ij} and x_{ji} by 1 if x_{ij} > x_{ji}, and by 0 otherwise, for i < j. The > operation in x_{ij}
> x_{ji} can be replaced by ≥, =, < or ≤.
If the data has
missing values, the Handle missing parameter specifies how they are treated.
PARAMETERS
Input dataset:
Name of file containing
matrix to be symmetrized. Data type: Square matrix.
Symmetrizing method (Default = Maximum).
Choices are:
Maximum - Replace x_{ij} and x_{ji} by max(x_{ij},x_{ji}), i < j.
Minimum - Replace x_{ij} and x_{ji} by min(x_{ij},x_{ji}), i < j.
Average - Replace x_{ij} and x_{ji} by (x_{ij} + x_{ji})/2, i < j.
Sum - Replace x_{ij} and x_{ji} by x_{ij} + x_{ji}, i < j.
Difference - Replace x_{ij} and x_{ji} by abs(x_{ij} - x_{ji}), i < j.
Product - Replace x_{ij} and x_{ji} by x_{ij}x_{ji}, i < j.
Division - Replace x_{ij} and x_{ji} by x_{ij}/x_{ji}, i < j provided x_{ji} is non zero.
Lower Half - Replace x_{ij} by x_{ji}, i < j.
Upper Half - Replace x_{ji} by x_{ij}, i < j.
Upper > Lower - Replace x_{ij} and x_{ji} by 1 if x_{ij} > x_{ji}, by 0 otherwise, i < j.
Upper ≥ Lower - Replace x_{ij} and x_{ji} by 1 if x_{ij} ≥ x_{ji}, by 0 otherwise, i < j.
Upper = Lower - Replace x_{ij} and x_{ji} by 1 if x_{ij} = x_{ji}, by 0 otherwise, i < j.
Upper ≤ Lower - Replace x_{ij} and x_{ji} by 1 if x_{ij} ≤ x_{ji}, by 0 otherwise, i < j.
Upper < Lower - Replace x_{ij} and x_{ji} by 1 if x_{ij} < x_{ji}, by 0 otherwise, i < j.
Handle missing
Specify how to
treat missing data in the symmetrization process. 'Choose the non-missing
value' allows the user to reduce or even eliminate missing
values in the data. 'Both missing' means that if either value is missing
then this is recorded as missing in the symmetrized data.
Output dataset (Default = 'Symmetrize').
Name of file containing
symmetrized data.
LOG FILE Symmetrized
matrix.
TIMING O(N^2).
COMMENTS None.
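The arithmetic symmetrizing methods can be sketched as follows. The function name symmetrize and the lower-case method names are assumptions; missing values and the comparison methods (Upper > Lower and so on) are not covered by this sketch.

```python
def symmetrize(matrix, method="maximum"):
    """Replace x_ij and x_ji (i < j) by a symmetric combination of the two."""
    combine = {
        "maximum": max,
        "minimum": min,
        "average": lambda a, b: (a + b) / 2,
        "sum": lambda a, b: a + b,
        "difference": lambda a, b: abs(a - b),
        "product": lambda a, b: a * b,
    }[method]
    n = len(matrix)
    out = [list(row) for row in matrix]
    for i in range(n):
        for j in range(i + 1, n):
            out[i][j] = out[j][i] = combine(matrix[i][j], matrix[j][i])
    return out


m = [
    [0, 1, 4],
    [3, 0, 0],
    [2, 5, 0],
]
by_max = symmetrize(m)
by_min = symmetrize(m, "minimum")
```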
TRANSFORM>NORMALIZE
PURPOSE Normalize
the values in a matrix.
DESCRIPTION This
routine normalizes using a variety of techniques.
Each technique
can be applied to either the whole matrix or just the rows or columns.
In addition an iterative facility is provided to normalize both rows
and columns simultaneously. These operate on the matrix as follows:
Marginal:
normalizes the sum to be 100. This is achieved by dividing by
the current sum of the rows, columns or matrix and multiplying by 100.
Mean: normalizes
the mean to be zero. This is achieved by subtracting from every
row, column, or matrix element the current mean.
Standard Deviation:
normalizes the standard deviation to be one. This is achieved
by dividing the rows, columns or matrix by the current standard
deviation.
Z-Score:
standardizes the mean to be zero and the standard deviation to be one.
This is achieved by subtracting from every row, column or matrix element
the current mean and then dividing the rows, columns or matrix by the
current standard deviation.
Euclidean:
standardizes the Euclidean norm to be one. This is achieved by
dividing the rows, columns or matrix by the current Euclidean norm.
Maximum:
standardizes the rows, columns or matrix to each have a maximum value
of 100. This is achieved by dividing the matrix or each row or
column by the current maximum and multiplying by 100.
The routine also
allows each of these options to be applied to the rows and columns simultaneously.
This involves an iterative procedure in which the technique is first
applied to the rows and then the columns and then the rows etc.
It is terminated when (and if) there is convergence.
PARAMETERS
Input dataset
Name of file containing
matrix to be standardized. Data type: Matrix.
Which dimension(s) to standardize: (Default = Columns).
Choices are:
Rows - Normalization is applied to the rows of the matrix independently.
Columns - Normalization is applied to the columns of the matrix independently.
Matrix - Normalization is applied to the entire matrix.
Both -
Normalization is applied to the rows, then the columns, then the rows
etc iteratively until convergence.
Standardizing criterion: (Default = Marginal).
Choices are:
Marginal - Forces the sum of elements to be 100. By row, column, matrix
or row and column.
Mean - Forces the mean of elements to be
zero. By row, column, matrix, or row and column.
Std-Dev - Forces the standard deviation to
be one. By row, column, matrix or row and column. If standard
deviation is initially zero then elements of matrix are treated as missing.
Z-Score - Forces the mean of the elements
to be zero and the standard deviation to be 1. By row, column,
matrix or row and column. If standard deviation is initially zero
then elements of matrix are treated as missing.
Euclidean - Forces the Euclidean norm,
to be one. By row, column, matrix or row and column.
Maximum - Forces the maximum of the elements
to be 100. By row, column or row and column. Forces the
maximum element to be one for the whole matrix.
Constant to replace zeros with (Default =0.0)
Zeros can cause
this procedure to crash and this can be overcome by replacing them with
a relatively small value.
(Sq. matrices only) Include diagonal values? (Default = Yes).
Yes includes
diagonals. No treats diagonal values as missing.
(For iterative norm.) Convergence tolerance (Default=0.001)
When Both is selected
the routine iterates to convergence. The routine is deemed to have
converged when the values change by less than the tolerance.
(For iterative norm.) Max # of iterations (Default=100)
When both is selected
the routine iterates to convergence. Convergence will be deemed to have
failed if the tolerance has not been achieved before the maximum number
of iterations has taken place.
Output dataset: (Default = 'Normalize').
Name of file which
contains normalized matrix.
LOG FILE Normalized
matrix.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
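The standardizing criteria applied to a single row or column can be sketched as below. The function names and lower-case criterion names are assumptions. The z-score here uses the population standard deviation (dividing by n), which is also an assumption since the help text does not say which form is used, and the sketch raises a division error where the real routine would treat values as missing.

```python
import math


def normalize_vector(values, criterion="marginal"):
    """Apply one standardizing criterion to a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    if criterion == "marginal":      # sum becomes 100
        return [100 * v / sum(values) for v in values]
    if criterion == "mean":          # mean becomes zero
        return [v - mean for v in values]
    if criterion == "euclidean":     # Euclidean norm becomes one
        norm = math.sqrt(sum(v * v for v in values))
        return [v / norm for v in values]
    if criterion == "maximum":       # maximum becomes 100
        return [100 * v / max(values) for v in values]
    if criterion == "zscore":        # mean zero, standard deviation one
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        return [(v - mean) / sd for v in values]
    raise ValueError(f"unknown criterion: {criterion}")


def normalize_rows(matrix, criterion="marginal"):
    """Normalize each row of the matrix independently."""
    return [normalize_vector(row, criterion) for row in matrix]


row_percentages = normalize_rows([[1, 1, 2], [5, 0, 5]])
```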
TRANSFORM>ARITHMETIC OPERATIONS>WITHIN DATASETS>AGGREGATIONS
PURPOSE Perform
simple arithmetic operations across rows, columns or matrices.
DESCRIPTION Find
the sum, average, standard deviation, minimum or maximum of sets of
numbers which are taken from the complete rows, columns, or matrices
or combinations thereof in one dataset.
PARAMETERS
Input dataset
Name of file containing
dataset on which the operation is to be performed. Data type: Multirelational
Arithmetic Operation
Name of arithmetic
procedure that is to be performed. Options are sum,
average, standard deviation, minimum or maximum.
Break-out results by
Specifies how the
numbers are to be selected from the dataset. Note these can only be
chosen after a dataset has been entered in the 'input dataset' field.
If the data has
more than one level, that is, more than one matrix, then the following
options are presented
nothing - aggregate across rows, cols and matrices to yield one value for entire dataset
rows - aggregate across cols and matrices yielding one value for each row in dataset
cols - aggregate across rows and matrices, yielding one value for each column in dataset
levels - aggregate across rows and columns, yielding one value for each matrix in dataset
rows & cols - aggregate across matrices, yielding one row-by-column matrix summarizing all matrices
rows & levels - aggregate across columns, yielding summary of rows for each matrix
cols & levels - aggregate across rows, yielding summary of columns for each matrix in dataset
If the data is
a single matrix then the following options are presented
nothing - aggregate across rows and columns to yield single value for whole matrix
rows - aggregate across columns to yield a number for each row in data matrix
cols - aggregate across rows to yield a number for each column in data matrix
LOG FILE Matrix
with the results of the arithmetic operation
TIMING Linear
COMMENTS More
complicated operations can be done using matrix algebra
REFERENCES None
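The break-out options for a single matrix can be sketched as follows. The function name aggregate and its arguments are assumptions; op is any function that reduces a list of numbers to one value, such as sum, min or max.

```python
def aggregate(matrix, break_out="nothing", op=sum):
    """Aggregate a single matrix, broken out by rows, cols or nothing."""
    if break_out == "rows":       # one value per row
        return [op(list(row)) for row in matrix]
    if break_out == "cols":       # one value per column
        return [op(list(col)) for col in zip(*matrix)]
    if break_out == "nothing":    # one value for the whole matrix
        return op([x for row in matrix for x in row])
    raise ValueError(f"unknown break-out: {break_out}")


m = [
    [1, 2, 3],
    [4, 5, 6],
]
row_sums = aggregate(m, "rows")
col_sums = aggregate(m, "cols")
grand_total = aggregate(m)
```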
TRANSFORM>ARITHMETIC OPERATIONS>WITHIN DATASETS>CELLWISE TRANSFORMATIONS
PURPOSE Performs
a cellwise transformation to every element of the dataset.
DESCRIPTION Applies
one or more of the following transformations to every element in the data: raise
to a power, take the logarithm, negate, take the absolute value, take the
reciprocal or apply a linear transformation.
PARAMETERS
Input Dataset
Name of file containing
data to be transformed. Data type: Multirelational
Output Dataset
Name of file containing
transformed matrix.
Transformations
Clicking on box
activates the required transformation. Raise to the power, multiply
by a constant and add a constant require the user to type in the constants
in the relevant boxes. The defaults will result in an identity transformation.
The order listed in the tick boxes is the order in which the transformations
will be applied. Hence ticking raise to the power and add a constant
will result in the cells being raised to a power followed by adding
the constant. If the user requires these in a different order the routine
will need to be used more than once applying the required transformation
at each stage.
The user should
also use the radio buttons to select whether the transformation should
apply to the diagonal for square matrices.
LOG FILE The
result of the transformation is displayed.
TIMING Linear
COMMENTS More
complicated transformations can be made using matrix algebra.
REFERENCES None
TRANSFORM>BIPARTITE
PURPOSE Convert
a 2-mode dataset into a 1-mode adjacency matrix
DESCRIPTION Any
2-mode incidence matrix can be thought of as a bipartite graph. If the
two modes are actors and events then the bipartite graph consists of the
union of the actors and events as vertices, with the edges only connecting
actors with events (i.e. no connections between actors or between events).
This routine takes a 2-mode incidence matrix and converts it to a 1-mode
adjacency matrix of a bipartite graph. If the incidence matrix had n
rows and m columns then the resultant adjacency matrix would be a square
matrix of dimension m+n.
PARAMETERS
Input 2-mode dataset:
Name of file containing
incidence matrix.
Value to fill within-mode ties: (Default=0.0)
The incidence matrix
specifies the values of ties from actors to events; the values of the
(non-existent) ties of actors to actors and events to events are not
given. The user can override the default value of zero by specifying
their own within-mode value.
Make result symmetric? (Default = 'No')
If Yes is selected the
matrix is symmetrized by taking the maximum of Xij and Xji.
Output dataset:(Default='bi')
Name of file containing
adjacency matrix of bipartite graph.
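A minimal sketch of the construction, assuming NumPy. The function name bipartite_adjacency and its arguments are hypothetical, and two details are assumptions rather than documented behaviour: the unsymmetrized result is taken to hold the incidence values only in the actor-to-event block, and the within-mode fill value is also written on the diagonal.

```python
import numpy as np

def bipartite_adjacency(incidence, within_fill=0.0, symmetric=False):
    """Turn an n-by-m incidence matrix into an (n+m)-by-(n+m) adjacency matrix."""
    x = np.asarray(incidence, dtype=float)
    n, m = x.shape
    adj = np.zeros((n + m, n + m))
    adj[:n, :n] = within_fill   # actor-to-actor block
    adj[n:, n:] = within_fill   # event-to-event block
    adj[:n, n:] = x             # actor-to-event ties from the incidence matrix
    if symmetric:
        # Symmetrize by taking the maximum of X[i, j] and X[j, i].
        adj = np.maximum(adj, adj.T)
    return adj
```

For a 3-by-2 incidence matrix the result is 5-by-5, with the three actors occupying the first three rows and columns and the two events the last two.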
LOG FILE Adjacency
matrix of bipartite graph.
TIMING Linear.
COMMENTS None.
REFERENCES None.
TRANSFORM > INCIDENCE
PURPOSE Convert
an adjacency matrix to an incidence matrix.
DESCRIPTION An
incidence matrix is a node by edge matrix. The rows represent the nodes
of a graph and the columns the edges. A one in row i column j
indicates that node i is incident to edge j. This representation
is often called the hypergraph representation.
PARAMETERS
Input dataset:
Name of file containing
adjacency matrix. Data type: Digraph
Treat data as directed: (Default=No)
If Yes then reciprocal
ties will occur twice in an incidence matrix.
Include self loops: (Default = No)
If No, self-loop
ties will be ignored.
Output Filename: (Default = 'Incidence')
Name of data file
which will contain incidence matrix. The columns will be labeled
with edge labels.
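The conversion can be sketched as follows, assuming NumPy. The function name incidence_matrix, its arguments, and the "i-j" format of the edge labels are hypothetical. In the undirected case a tie in either direction is treated as one edge, which is an assumption about how non-symmetric input is handled.

```python
import numpy as np

def incidence_matrix(adj, directed=False, self_loops=False):
    """Return a node-by-edge 0/1 matrix and a list of edge labels."""
    a = np.asarray(adj)
    n = a.shape[0]
    edges = []
    for i in range(n):
        for j in range(n):
            if i == j:
                if self_loops and a[i, i] != 0:
                    edges.append((i, j))
            elif directed:
                # Each arc is its own column, so reciprocal ties occur twice.
                if a[i, j] != 0:
                    edges.append((i, j))
            elif i < j and (a[i, j] != 0 or a[j, i] != 0):
                edges.append((i, j))
    inc = np.zeros((n, len(edges)), dtype=int)
    for col, (i, j) in enumerate(edges):
        inc[i, col] = 1   # node i is incident to this edge
        inc[j, col] = 1   # node j is incident to this edge
    labels = [f"{i + 1}-{j + 1}" for i, j in edges]
    return inc, labels
```

A path on three nodes gives two columns when treated as undirected and four when treated as directed.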
LOG FILE Labeled
incidence matrix.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
TRANSFORM > LINEGRAPH
PURPOSE Construct
the line graph of a graph or network.
DESCRIPTION The
line graph of a graph G is the graph obtained by using the edges of
G as vertices, two vertices being adjacent whenever the corresponding
edges are. In a digraph the arcs are the vertices of the line graph and two
vertices are adjacent if the corresponding arcs induce a walk.
PARAMETERS
Input Dataset:
Name of file containing
graph from which to create the line graph. Data type: Digraph.
Include self-loops: (Default = NO)
NO means that self
loops will not generate vertices in the line graph.
Output dataset: (Default = 'Linegraph')
Name of file which
contains constructed linegraph.
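For the digraph case the definition can be sketched as below, assuming NumPy. The function name line_digraph and its arguments are hypothetical. Two arcs are taken to "induce a walk" when the head of the first is the tail of the second, and an arc is never made adjacent to itself; both readings are assumptions.

```python
import numpy as np

def line_digraph(adj, self_loops=False):
    """Return the adjacency matrix of the line digraph and the list of arcs."""
    a = np.asarray(adj)
    n = a.shape[0]
    # Every arc (i, j) of the input becomes a vertex of the line digraph.
    arcs = [(i, j) for i in range(n) for j in range(n)
            if a[i, j] != 0 and (self_loops or i != j)]
    line = np.zeros((len(arcs), len(arcs)), dtype=int)
    for p, (i, j) in enumerate(arcs):
        for q, (k, l) in enumerate(arcs):
            # Arc (i, j) followed by arc (k, l) forms a walk when j == k.
            if p != q and j == k:
                line[p, q] = 1
    return line, arcs
```

For the path 1 to 2 to 3 the line digraph has two vertices, one per arc, joined by a single arc.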
LOG FILE Adjacency
matrix of the line graph, with vertices labeled by the corresponding edges
from the original graph.
TIMING O(N^2).
COMMENTS Note
that multirelational data cannot be converted to line graph format.
Users should do each relation separately.
REFERENCES None.
TRANSFORM > MULTIGRAPH
PURPOSE Convert
a valued graph into a set of binary graphs.
DESCRIPTION A
single binary graph is created for each different value of a valued
graph. All created graphs are stacked in a single dataset.
PARAMETERS
Input dataset:
Name of file containing
valued data. Data type: Valued graph.
Splitting operator (Default = EQ)
Choices are:
GT - Greater than yields M_{ijk} = 1 if x_{ij} > w_{k}
GE - Greater than or equal yields M_{ijk} = 1 if x_{ij} ≥ w_{k}
EQ - Equal yields M_{ijk} = 1 if x_{ij} = w_{k}
LE - Less than or equal to yields M_{ijk} = 1 if x_{ij} ≤ w_{k}
LT - Less than yields M_{ijk} = 1 if x_{ij} < w_{k}
where M_{ijk} is the (i,j) entry of the kth adjacency matrix, x_{ij} is the (i,j) entry of the input data,
and w_{k} are the values of the weights
of the valued data placed in ascending order.
Include self- loops (Default = NO)
If NO then self
loops are ignored.
Count zeros as valid relationships: (Default = NO)
If NO then
no binary graph is created corresponding to the w_{k} value zero. If YES then a binary
graph corresponding to the value zero is included in the multigraph.
Output Dataset (Default = 'Multigraph')
Name of file that
will contain multigraph as a set of binary graphs.
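The splitting rule can be sketched as follows, assuming NumPy. The function name multigraph and its arguments are hypothetical, and it is an assumption that the weights w_{k} are collected from the off-diagonal cells only when self-loops are ignored.

```python
import operator
import numpy as np

OPERATORS = {"GT": operator.gt, "GE": operator.ge, "EQ": operator.eq,
             "LE": operator.le, "LT": operator.lt}

def multigraph(x, op="EQ", self_loops=False, count_zeros=False):
    """Split a valued matrix into one binary matrix per distinct value."""
    x = np.asarray(x, dtype=float)
    mask = np.ones_like(x, dtype=bool)
    if not self_loops:
        np.fill_diagonal(mask, False)      # ignore the diagonal cells
    values = np.unique(x[mask])            # distinct weights w_k, ascending
    if not count_zeros:
        values = values[values != 0]       # no binary graph for the value zero
    # M[k, i, j] = 1 when x[i, j] compares true against w_k.
    layers = [(OPERATORS[op](x, w) & mask).astype(int) for w in values]
    return values, np.array(layers)
```

A matrix with off-diagonal values 1, 2 and 3 yields three stacked binary matrices, or four if zeros are counted as valid relationships.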
LOG FILE Constructed
multigraph.
TIMING O(N^2).
COMMENTS The
number of relations constructed will correspond to the number of different
values. Care should be taken not to enter datasets that will create
a large number of binary graphs.
REFERENCES None.
TRANSFORM>MULTIPLEX
PURPOSE Constructs
a multiplex graph from a multirelational graph.
DESCRIPTION Technically,
let G(V,{R_{i}}) be a multirelational graph with
vertex set V and relations {R_{i}}, i ∈ I. If v and w are two vertices
of G then the bundle of relations connecting v to w, B_{vw}, is defined as B_{vw} = {R_{i}: vR_{i}w}. Let {M_{k}} be the set of all distinct bundles. The multiplex
graph is the valued graph with valued adjacency matrix X_{i,j}
= k where M_{k} is the bundle of relations connecting i to
j. Non-technically, the algorithm determines how many distinct
patterns of relations (the bundles) link any pair of vertices and assigns
each of these a numerical label. The arcs in the output multiplex graph
are then labeled with these identifying numbers.
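The labeling of bundles can be sketched as below, assuming NumPy. The function name multiplex is hypothetical. Numbering the bundles in order of first appearance and coding the empty bundle as 0 are assumptions; the help text does not say how the routine assigns its numbers.

```python
import numpy as np

def multiplex(relations):
    """Label each pair of vertices with the number of its bundle of relations."""
    r = np.asarray(relations) != 0          # shape (relations, n, n)
    n = r.shape[1]
    labels = {}                             # bundle pattern -> numeric label
    out = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            # The bundle records which relations connect i to j.
            bundle = tuple(bool(v) for v in r[:, i, j])
            if not any(bundle):
                continue                    # no relation connects i to j
            if bundle not in labels:
                labels[bundle] = len(labels) + 1
            out[i, j] = labels[bundle]
    return out, labels
```

With N relations there are at most 2^N - 1 non-empty bundles, so the labels stay small unless many relations are combined.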
PARAMETERS
Input dataset:
Name of file that
contains multirelational binary network data. Valued data are automatically
converted to multirelational binary data using a technique identical
to Multigraph. Data type: Digraph. Multirelational.
Include transpose(s) in the multiplexing (Default = No).
For non-symmetric
data the transposes can be automatically added as additional relations.
Convert data to geodesic distances (Default = No)
Option to convert
each relation in dataset to geodesic distances.
Output dataset (Default = 'Multiplex')
Output file that
will contain multiplex graph.
LOG FILE Multiplex
graph adjacency matrix.
TIMING Exponential.
COMMENTS In
the worst case, the timing for the algorithm is exponential. The timing
depends on the number of possible bundles; up to 2 to the power N bundles
can occur when there are N different relations.
REFERENCES None.
TRANSFORM > SEMIGROUP
PURPOSE Construct
the semigroup of a graph, digraph or multirelational graph.
DESCRIPTION The
semigroup of a network is an algebraic representation of all compound
relations.
Given a set of
adjacency matrices R1,R2,...,Rn of a multirelational graph, the
set of all possible Boolean products of pairs of matrices gives all
possible relations of length 2. Any product that repeats a matrix already
found is discarded. We continue with products of length 3,
and so on, until no new matrices are found. The set of all matrices constructed
in this way together with the operation of Boolean matrix multiplication
forms a semigroup.
This routine finds
all members of the semigroup, or members of the semigroup up to a certain
length of product. In addition the semigroup is specified by a
multiplication table.
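The generation procedure described above can be sketched as follows, assuming NumPy. The function names bool_product, semigroup and multiplication_table are hypothetical, and the sketch is not claimed to match the routine's internal ordering of elements. New elements are found by multiplying each newly found matrix on the right by every input relation, which reaches every word up to the chosen length.

```python
import numpy as np

def bool_product(a, b):
    """Boolean matrix product of two 0/1 matrices."""
    return (a.astype(int) @ b.astype(int)) > 0

def semigroup(generators, max_word_length=9):
    """Return the distinct compound relations and a word producing each."""
    gens = [np.asarray(g) != 0 for g in generators]
    elements, words, index, frontier = [], [], {}, []
    for k, g in enumerate(gens, start=1):
        if g.tobytes() not in index:
            index[g.tobytes()] = len(elements)
            elements.append(g)
            words.append((k,))
            frontier.append(len(elements) - 1)
    for _ in range(max_word_length - 1):
        new_frontier = []
        for e in frontier:
            for k, g in enumerate(gens, start=1):
                prod = bool_product(elements[e], g)
                if prod.tobytes() not in index:      # repeated products are discarded
                    index[prod.tobytes()] = len(elements)
                    elements.append(prod)
                    words.append(words[e] + (k,))
                    new_frontier.append(len(elements) - 1)
        frontier = new_frontier
        if not frontier:                             # no new matrices were found
            break
    return elements, words

def multiplication_table(elements):
    """Entry (i, j) is the 1-based number of element i times element j, or 0 if not generated."""
    index = {e.tobytes(): i for i, e in enumerate(elements)}
    size = len(elements)
    table = np.zeros((size, size), dtype=int)
    for i, a in enumerate(elements):
        for j, b in enumerate(elements):
            table[i, j] = index.get(bool_product(a, b).tobytes(), -1) + 1
    return table
```

For a single relation that is a directed 3-cycle, the semigroup has three elements with words 1, 1 1 and 1 1 1, since the fourth power repeats the original relation.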
PARAMETERS
Input dataset:
Name of file containing
adjacency matrix or matrices. Data type: Digraph. Multirelational.
Maximum length of "words": (Default = 9)
The products are
called words. The maximum length of products to be considered is known
as the word length.
Save elements of semigroup ?: (Default = No)
If only the multiplication
table and words are required then it is not necessary to save the matrix
elements. YES causes all generated matrices to be saved in a file specified
below.
Output semigroup: (Default = 'SEMIGROUP')
Name of file which
will contain all compounded relations provided the save elements of
semigroup parameter was set to YES. These are given as a list,
each relation is sequentially numbered. This file does not appear in
the LOG FILE.
Output multiplication table: (Default = 'MULTABLE')
Name of file which
will contain the multiplication table specified below.
LOG FILE Semigroup
multiplication table.
Each row (and column)
is labeled with the compound relation number. The rows also give
the word that accounts for the compound. Hence if row 6 is labeled
1 1 2 1 then relation 6 is the matrix obtained by Boolean matrix multiplication
of the original relations numbered 1 1 2 1 in that order. The
value in row i column j is the result of the Boolean matrix multiplication
of relation i and relation j.
If the word length
is not sufficient to generate all elements of the semigroup then the
right multiplication table of the generated elements is displayed.
This table gives the product of the generated elements with the input
matrices.
TIMING Algorithm
is exponential.
COMMENTS Relatively
small datasets can result in large semigroups.
REFERENCES None.
TOOLS > MDS > METRIC
PURPOSE Metric
multidimensional scaling of a proximity matrix.
DESCRIPTION Given
a matrix of proximities (similarities or dissimilarities) among a set
of items, the program finds a set of points in k-dimensional space such
that the Euclidean distances among these points correspond as closely
as possible to the input proximities.
PARAMETERS
Input dataset
Name of file containing
proximity matrix. Data type: Square symmetric matrix.
No of dimensions: (Default = 2)
Number of dimensions
to use in representing items in Euclidean space.
Similarities or Dissimilarities? (Default = Similarities)
Whether the data
represent similarities or dissimilarities. If similarities, large
values of X(i,j) will draw i and j close together on the MDS map.
If dissimilarities, large values will push i and j apart on the map.
Starting Configuration: (Default = Classic)
How to generate initial location of points in space.
Choices are:
Classic - Performs Gower's classical metric
ordination procedure.
File - Reads starting coordinates from UCINET dataset.
If this option
is chosen then the user must complete the Starting Config Filename parameter.
Random - Locates points randomly in space.
Starting Config Filename
Name of the coordinate
dataset if the file option is taken. This UCINET dataset should consist
of an nxk matrix of values. Each column corresponds to the co-ordinates
in each of the dimensions specified. Hence row i gives the co-ordinates
of the ith point.
Adjust data to nearest Euclidean (Default = Yes)
Iteratively adjusts
the data so that it obeys the triangle inequality.
Output dataset: (Default = 'MetricMdsCoord')
Name of file containing
the co-ordinates of the points in Euclidean space.
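The Classic starting configuration refers to Gower's classical ordination, which can be sketched as below, assuming NumPy. The function names classical_mds and stress are hypothetical. The sketch expects dissimilarities and covers only the starting step, not the iterative fitting performed by the routine. The stress function uses Kruskal's stress formula 1, which is an assumption about the measure UCINET reports.

```python
import numpy as np

def classical_mds(dissimilarities, k=2):
    """Gower's classical scaling: coordinates from a double-centred matrix."""
    d = np.asarray(dissimilarities, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering      # double-centred squared distances
    eigenvalues, eigenvectors = np.linalg.eigh(b)
    order = np.argsort(eigenvalues)[::-1][:k]        # the k largest eigenvalues
    scale = np.sqrt(np.clip(eigenvalues[order], 0.0, None))
    return eigenvectors[:, order] * scale            # row i holds the coordinates of item i

def stress(dissimilarities, coords):
    """Kruskal's stress formula 1 between the data and the map distances."""
    d = np.asarray(dissimilarities, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    fitted = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices_from(d, k=1)
    return np.sqrt(((d[upper] - fitted[upper]) ** 2).sum() / (fitted[upper] ** 2).sum())
```

When the input really is a set of Euclidean distances in k dimensions, the classical solution reproduces them and the stress is zero up to rounding.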
LOG FILE The
output first gives a 2D scatterplot of the first pair of co-ordinates.
The x-axis is the first co-ordinate set and the y-axis is the second.
The scatterplot can be saved or printed. Simple editing can be achieved
using the options button. The labels can be turned on or off and values
can be attached to the points (or removed). The scales can also be changed.
More advanced editing is possible by double clicking in the plot; this
invokes the chart wizard. To find the label attached to a single point
when all the labels are moved, click on a single point; this will highlight
all the points. Then click a second time to highlight one vertex. Now
double click on the vertex and the label will be highlighted in the
chart designer. The save button and the save chart data option
allow the user to save all the chart data into a file which can be reviewed
using Tools>Scatterplot>Review. The chart itself can be
saved as a windows metafile which can then be read into a word processing
or graphics package. Only one chart can be open at one time and
the chart window will be closed if you click on any other UCINET window.
Behind the chart is a numeric display of coordinates of each point
in space together with information about the stress.
TIMING O(N^4)
COMMENTS MDS
solutions are not unique, and they are subject to convergence to local
minima. The first point means that two or more maps can be equally
good (same stress) but place points in radically different locations.
The second point means that it is possible for the algorithm to fail
to find the configuration with least stress. If you suspect this
has happened, run the program several times using random starting configurations.
Stress values below 0.1 are excellent and above 0.2 unacceptable.
This routine only
works if the regional settings are set to UK or USA. If you do not have
these regional settings and do not get a plot then change them in the
settings control panel on your machine.
REFERENCES Gower
TOOLS > MDS > NON-METRIC
PURPOSE Non-metric
multidimensional scaling of a proximity matrix.
DESCRIPTION Given
a matrix of proximities (similarities or dissimilarities) among a set
of items, the program finds a set of points in k-dimensional space such
that the Euclidean distances among these points correspond as closely
as possible to a rank preserving transformation of the input proximities.
The algorithm is based on the MDS(X) MINISSA program.
PARAMETERS
Input dataset
Name of file containing
proximity matrix. Data type: Square symmetric matrix.
No of dimensions: (Default = 2)
Number of dimensions
to use in representing items in Euclidean space.
Similarities or Dissimilarities? (Default = Similarities)
Whether the data
represent similarities or dissimilarities. If similarities, large
values of X(i,j) will draw i and j close together on the MDS map.
If dissimilarities, large values will push i and j apart on the map.
Starting Configuration: (Default = Torsca)
How to generate initial location of points in space.
Choices are:
Metric
- Performs Gower's classical metric ordination procedure.
Torsca - Uses principal components of rank-order
data.
File - Reads starting coordinates from
UCINET dataset.
Random - Locates points randomly in space.
Starting Config Filename
Name of the coordinate
dataset if the file option is chosen. This UCINET dataset should consist
of an nxk matrix of values. Each column corresponds to the co-ordinates
in each of the dimensions specified. Hence row i gives the co-ordinates
of the ith point.
Print Diagnostics (Default = No)
If Yes is selected
then dyads with large discrepancies between the proximity data and the
plot distances will be printed.
Output dataset: (Default = 'NonMetricMdsCoord')
Name of file containing
the co-ordinates of the points in Euclidean space.
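How well a configuration matches a rank preserving transformation of the data can be measured with monotone regression, following Kruskal (1964). The sketch below assumes NumPy and dissimilarity input. The function names monotone_regression and nonmetric_stress are hypothetical, and the MINISSA procedure used by the routine may differ in detail, for example in its treatment of tied values.

```python
import numpy as np

def monotone_regression(y):
    """Pool-adjacent-violators: least squares non-decreasing fit to y."""
    blocks = []                                   # each block is [sum, count]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the block means are out of order.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return np.concatenate([np.full(count, total / count) for total, count in blocks])

def nonmetric_stress(dissimilarities, coords):
    """Stress formula 1 after fitting the best rank preserving transformation."""
    d = np.asarray(dissimilarities, dtype=float)
    coords = np.asarray(coords, dtype=float)
    upper = np.triu_indices_from(d, k=1)
    diff = coords[:, None, :] - coords[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=-1))[upper]
    order = np.argsort(d[upper], kind="stable")   # dyads sorted by dissimilarity
    disparities = np.empty_like(distances)
    disparities[order] = monotone_regression(distances[order])
    return np.sqrt(((distances - disparities) ** 2).sum() / (distances ** 2).sum())
```

Because only the rank order of the data matters, squaring every dissimilarity leaves the stress of a given map unchanged.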
LOG FILE The
output first gives a 2D scatterplot of the first pair of co-ordinates.
The x-axis is the first co-ordinate set and the y-axis is the second.
The scatterplot can be saved or printed. Simple editing can be achieved
using the options button. The labels can be turned on or off and values
can be attached to the points (or removed). The scales can also be changed.
More advanced editing is possible by double clicking in the plot; this
invokes the chart wizard. To find the label attached to a single point
when all the labels are moved, click on a single point; this will highlight
all the points. Then click a second time to highlight one vertex. Now
double click on the vertex and the label will be highlighted in the
chart designer. The save button and the save chart data option
allow the user to save all the chart data into a file which can be reviewed
using Tools>Scatterplot>Review. The chart itself can be
saved as a windows metafile which can then be read into a word processing
or graphics package. Only one chart can be open at one time and
the chart window will be closed if you click on any other UCINET window.
Behind the chart is a numeric display of coordinates of each point
in space together with information about the stress. If the print
diagnostics option has been selected then dyads with large differences between
the proximity data and the distances in the co-ordinate data are listed.
TIMING O(N^4)
COMMENTS MDS
solutions are not unique, and they are subject to convergence to local
minima. The first point means that two or more maps can be equally
good (same stress) but place points in radically different locations.
The second point means that it is possible for the algorithm to fail
to find the configuration with least stress. If you suspect this
has happened, run the program several times using random starting configurations.
Stress values below 0.1 are excellent and above 0.2 unacceptable.
This routine only
works if the regional settings are set to UK or USA. If you do not have
these regional settings and do not get a plot then change them in the
settings control panel on your machine.
REFERENCES Kruskal
J B and Wish M (1978). Multidimensional Scaling, Newbury Park:
Sage Publications.
Kruskal J B (1964).
Multidimensional Scaling by optimizing goodness-of-fit to a non-metric
hypothesis. Psychometrika 29, 1-27.
TOOLS > CLUSTERING > HIERARCHICAL
PURPOSE Perform
Johnson's hierarchical clustering on a proximity matrix.
DESCRIPTION Given
a symmetric n-by-n matrix representing similarities or dissimilarities among
a set of n items, the algorithm finds a series of nested partitions
of the items. The different partitions are ordered according to decreasing
[increasing] levels of similarity [dissimilarity]. The algorithm begins
with the identity partition (in which all items are in different clusters).
It then joins the pair of items most similar (least different), which
are then considered a single entity. The algorithm continues in this
manner until all items have been joined into a single cluster (the complete
partition).
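The joining process can be sketched for dissimilarity data as below, assuming NumPy. The function name hierarchical_clustering is hypothetical, and the three linkage rules correspond to the SINGLE_LINK, COMPLETE_LINK and AVERAGE methods listed under PARAMETERS. Unlike the routine's output, this naive sketch records one join per step even when several joins happen at the same level.

```python
import numpy as np

def hierarchical_clustering(dissimilarities, method="average"):
    """Return a list of (level, partition) pairs, one per join."""
    d = np.asarray(dissimilarities, dtype=float)
    link = {"single": np.min, "complete": np.max, "average": np.mean}[method]
    clusters = [[i] for i in range(d.shape[0])]      # identity partition
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Distance between two clusters under the chosen linkage rule.
                level = link(d[np.ix_(clusters[a], clusters[b])])
                if best is None or level < best[0]:
                    best = (level, a, b)
        level, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])   # join the closest pair
        del clusters[b]
        history.append((float(level), [list(c) for c in clusters]))
    return history
```

For four items placed at 0, 1, 3 and 7 on a line, single link joins at levels 1, 2 and 4, while complete link joins at levels 1, 3 and 7.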
PARAMETERS
Input dataset
Name of file containing
proximity matrix to be clustered. Data type: Square symmetric matrix.
Method: (Default = AVERAGE)
Choices are:
SINGLE_LINK
Also known as the
"minimum" or "connectedness" method. Distance between
two clusters is defined as smallest dissimilarity (largest similarity)
between members.
COMPLETE_LINK
Also known as the
"maximum" or "diameter" method. Distance between
two clusters is defined as largest dissimilarity (smallest similarity)
between members.
AVERAGE
Distance between
clusters defined as average dissimilarity (or similarity) between members.
Similarities or Distances? (Default = Similarities)
Whether items i
and j should be clustered together when X(i,j) is large or when it is
small. If data are Similarities, items i and j are clustered
together if X(i,j) is very large. If data are Dissimilarities,
items i and j are clustered together if X(i,j) is very small.
Compute ultrametric proximity matrix? (Default = NO)
Hierarchical clustering
can be seen as transforming a dissimilarity matrix into an ultrametric
distance matrix. The ultrametric distances correspond monotonically
to the number of iterations (partitions) needed to join a given pair
of items.
Diagram Type: (Default = Dendrogram)
The clustering
can be shown as a dendrogram or a tree
diagram.
Output Partition matrix: (Default = 'Part')
Name of dataset
to contain the partition-by-item indicator matrix. Each column of this
matrix gives the cluster to which each item was assigned in a given
partition. The columns are labeled by the level of the cluster.
A value of k in a column labeled x and row j means that actor j was
in partition k at level x. Actor k is always a member of partition
k and is a representative label for the group. It can be used
by procedures like Transform>Block to obtain density matrices
at any level of blocking. This file is not displayed in the LOG
FILE.
Output Ultrametric matrix (if desired):
Name of dataset
to contain the item-by-item ultrametric proximity matrix, if desired.
LOG FILE Primary
output are cluster diagrams. The first diagram (either a tree diagram
or a dendrogram) re-orders the actors so that they are located close
to other actors in similar clusters. The level at which any pair of
actors are aggregated is the point at which both can be reached by tracing
from the start to the actors from right to left. The scale at the top
gives the level at which they are clustered. The diagram can be printed
or saved. Parts of the diagram can be viewed by moving the mouse to
the split point in a tree diagram or the beginning of a line in the
dendrogram and clicking. The first click will highlight a portion of
the diagram and the second click will display just the highlighted portion.
To return to the original, right click on the mouse. There is also a
simple zoom facility: simply change the values and then press enter.
If the labels need to be edited (particularly the scale labels) then
you should take the partition indicator matrix into the spreadsheet
editor, remove or reduce the labels, and then submit the edited data to
Tools>Dendrogram>Draw. The output also produces a standard
Log file that contains a different cluster diagram which looks like
this:
        A B C D E F G H I J
                          1
Level   1 2 3 4 5 6 7 8 9 0
-----   - - - - - - - - - -
1.000   XXXXX XXX XXX XXXXX
1.422   XXXXX XXX XXXXXXXXX
1.578   XXXXXXXXX XXXXXXXXX
3.287   XXXXXXXXXXXXXXXXXXX
In this example,
the data were distances among 10 items, labeled A through J. The results
are 4 nested partitions, corresponding to rows in the diagram. Within
a given row, an 'X' between two adjacent columns indicates that the
items associated with those columns were assigned to the same cluster
in that partition. For example, in the first partition (level 1.000),
items D and E belong to the same cluster, but C is a member of a different
cluster. In the third partition (level 1.578), items D, E and C all
belong to the same cluster.
The levels indicate
the degree of association (similarity or dissimilarity) among items
within clusters. If, as in the example, the data are distances and the
clustering method is single link, then a level of 1.578 means that every
item within a cluster is no more than 1.578 units distant from at least
one other item in that cluster. If the clustering method is complete
link, a level of 1.578 indicates that every item in a cluster is no more
than 1.578 units distant from every other item in the cluster. For the
average clustering method, a level of 1.578 indicates that the average
distance among items within the cluster is 1.578.
For similarity
data, the meaning of the levels for the single link and complete link
methods is, in a sense, reversed. For the single link method, a level
of 1.578 means that every item in a cluster is at least 1.578 units
similar to at least one other item in the cluster. For the complete
link method, a level of 1.578 means that every item in a cluster is
at least 1.578 units similar to every other item in the cluster.
TIMING O(N^3)
COMMENTS None.
REFERENCES Johnson, S C (1967). 'Hierarchical clustering schemes'. Psychometrika, 32, 241-253.
TOOLS > CLUSTERING > OPTIMISATION
PURPOSE Optimizes
a cost function which measures the total distance or similarity within
classes for a proximity matrix.
DESCRIPTION Given
a partition of a proximity matrix of similarities into clusters,
the average similarity value within each cluster gives a measure of the extent
to which the groups form clusters. A slightly different approach
is required for distance data - in this case the cost is measured by
summing the values for each pair of actors belonging to the same block.
The routine attempts to optimize these measures to try and find the
best fit for a given number of blocks. The cost function can be
changed to give greater weight to relationships between the clusters.
In this case the cost simultaneously reflects a high degree of association
within clusters and a similarity of association between members of different
clusters using a correlation criterion. To do this, the routine correlates the data
with an ideal structure matrix A(i,j) in which the (i,j)th entry is
one if actors i and j are in the same block of the partition and zero otherwise. This
correlation can either be Pearson correlation or a much faster pseudocorrelation
measure. The cost is then either maximized or minimized depending on
whether the proximity matrix contains similarities or distances:
the similarity value needs to be maximized and the distance measure
minimized. The routine uses a tabu search minimization procedure
and therefore, to maximize, multiplies the costs by -1.
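The two documented fit measures can be sketched as below, assuming NumPy. The function names ideal_structure, correlation_fit and density_fit are hypothetical, the tabu search itself is not shown, and the pseudocorrelation measure is omitted because its formula is not given here. The partition is written as an indicator vector such as (1 1 2 1 2).

```python
import numpy as np

def ideal_structure(partition):
    """A(i, j) = 1 if items i and j are in the same block, else 0."""
    p = np.asarray(partition)
    return (p[:, None] == p[None, :]).astype(float)

def correlation_fit(proximity, partition, use_diagonal=False):
    """Pearson correlation between the data and the ideal structure matrix."""
    x = np.asarray(proximity, dtype=float)
    a = ideal_structure(partition)
    mask = np.ones_like(x, dtype=bool)
    if not use_diagonal:
        np.fill_diagonal(mask, False)             # leave diagonal cells out
    return float(np.corrcoef(x[mask], a[mask])[0, 1])

def density_fit(proximity, partition, similarities=True, use_diagonal=False):
    """Average within-block value for similarities, sum over pairs for distances."""
    x = np.asarray(proximity, dtype=float)
    same_block = ideal_structure(partition).astype(bool)
    # Count each within-block pair once by using the upper triangle only.
    within = np.triu(same_block, k=0 if use_diagonal else 1)
    values = x[within]
    return float(values.mean()) if similarities else float(values.sum())
```

A search procedure would evaluate one of these functions for each candidate partition, negating the value when it is to be maximized.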
PARAMETERS
Input dataset:
Name of file containing
proximity matrix to be clustered. Data type: Square symmetric matrix.
Number of clusters: (Default = 2)
Number of clusters
into which the actors must be assigned.
Fit criterion
Density - the average value within clusters for similarity data and the sum for distance data.
PseudoCorrelation - a simple fast correlation measure between the clustered data and the ideal structure matrix.
Correlation - the Pearson correlation measure between
the clustered data and the ideal structure matrix.
Are diagonal values valid? (Default = No)
Whether diagonals
are to be included on the cost function.
Type of Data:
Similarities - causes large values to be clustered together.
Distances - causes small values to be clustered together.
Max # of iterations in a series: (Default = 12)
The algorithm starts
from an arbitrary partition and attempts to decrease the cost by taking
the steepest descent. If the cost cannot be reduced then the algorithm
continues its search in the neighborhood of the current partition.
This search direction is a mildest ascent direction and from there new
search directions are explored. This exploration only continues
for a fixed number of iterations in a series. If no improvement
is made after the fixed number of iterations the algorithm terminates
with the current minimum. Increasing the parameter gives a more
exhaustive and therefore slower search. The recommended default
value is automatically entered on the form once the input data has been
selected.
Length of time in penalty box: (Default = 5)
If the algorithm
makes an ascending step then it is possible that the best possible descending
step is the reverse of the direction just taken. This parameter
prohibits a move along the reverse direction for a set number of steps.
The larger the value the more difficult it will be to come back to a
previously explored local minimum, however it will also be more difficult
to explore the vicinity of that minimum. The default has been
shown experimentally to be the most useful.
Number of random starts: (Default = 3)
The whole procedure
is repeated with a different initial partition each time. The best of these
solutions is then selected as the minimum.
Random Number Seed:
The random number
seed generates the initial partition. UCINET generates a different
random number as default each time it is run. This number should
be changed if the user wishes to repeat the analysis with different
initial configurations. The range is 1 to 32000.
Output Partition Dataset: (Default = 'TabuCluster').
Name of output
file which contains a partition indicator vector. This vector
has the form (k1,k2,...ki...) where ki assigns vertex i to block ki,
so that (1 1 2 1 2) assigns vertices 1, 2 and 4 to block 1, and 3 and
5 to block 2. This vector is not displayed at output.
LOG FILE The value of the cost function.
List of clusters. Each cluster is labeled and is specified by the vertices it contains.
The blocked proximity
matrix. The rows and columns of the original matrix are permuted
into clusters. The proximity matrix is displayed in terms of the
matrix clusters it contains.
TIMING Each
iteration of the tabu search algorithm is O(N^2).
COMMENTS Care should be taken when using this routine.
The algorithm seeks
to find the minimum of the cost function. Even if successful this
result may still have a high value in which case the blocking may not
conform very closely to structural equivalence.
In addition there
may be a number of alternative partitions which also produce the minimum
value; the algorithm does not search for additional solutions.
Finally it is possible that the routine terminates at a local minimum
and does not locate the desired global minimum.
To test the robustness
of the solution the algorithm should be run a number of times from different
starting configurations. If there is good agreement between these
results then this is a sign that there is a clear split of the data
into the reported blocks.
REFERENCES Glover
F (1989). Tabu Search - Part I. ORSA Journal on Computing
1, 190-206.
Glover F (1990).
Tabu Search - Part II. ORSA Journal on Computing 2, 4-32.
TOOLS > 2 MODE > SVD
PURPOSE Perform
a singular value decomposition of a real-valued matrix.
DESCRIPTION Given
an n-by-m matrix X with n ≥ m, SVD finds matrices U, D, and V
such that X = UDV'. The matrix D is an r-by-r diagonal matrix
containing r singular values. The matrix U is an n-by-r matrix
containing the r eigenvectors of XX' and V is an m-by-r matrix containing
the r eigenvectors of X'X. The eigenvectors are sorted in descending
order by eigenvalue. With symmetric data, U and V are identical
(except for sign reversals).
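The decomposition and the rank r reconstruction can be illustrated with NumPy's np.linalg.svd. This is a sketch of the mathematics rather than of the UCINET routine, and the function name truncated_svd is hypothetical.

```python
import numpy as np

def truncated_svd(x, r):
    """Return U (n-by-r), the r singular values, V (m-by-r) and U D V'."""
    x = np.asarray(x, dtype=float)
    # NumPy returns the singular values sorted from largest to smallest.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    u, s, v = u[:, :r], s[:r], vt[:r].T
    reconstruction = u @ np.diag(s) @ v.T     # best rank r least squares fit to X
    return u, s, v, reconstruction

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                   # n = 5 rows, m = 3 columns
u, s, v, recon = truncated_svd(x, 3)
combined = np.vstack([u, v])                  # (n+m)-by-r row and column scores
```

With r equal to the number of columns the reconstruction equals X; with smaller r it is the closest matrix of rank r in the least squares sense. Each column of U is an eigenvector of XX' whose eigenvalue is the square of the matching singular value.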
PARAMETERS
Input dataset:
File containing
matrix X to be decomposed; must have at least as many rows as columns
(otherwise transpose the matrix then resubmit). Data type: Matrix.
How to scale row and column scores: (Default = Axes)
Choices are:
Coordinates -
Eigenvectors are
weighted by their respective eigenvalues.
Loadings
Eigenvectors are
weighted by the square root of the eigenvalues (yields factor loadings
when SVD is applied to correlation matrix).
Axes
No rescaling is
performed.
No of factors to save: (Default = 3)
Maximum value of
r, the number of eigenvectors used to decompose X.
Reconstruct matrix from factors: (Default = No)
If YES, the product
UDV' is computed using r eigenvectors (see 'Number of factors to save',
above). The result is the best possible approximation of X using
matrices of rank r based on a least squares criterion.
(Output) File to contain row scores: (Default = 'RScores')
Name of dataset
to contain U matrix.
(Output) File to contain column scores: (Default = 'CScores')
Name of dataset
to contain V matrix.
(Output) File to contain singular values: (Default = 'Eigen')
Name of dataset
to contain D matrix.
(Output) File to contain reconstructed matrix: (Default = 'Recon')
Name of dataset
to contain the approximation X that is UDV'.
(Output) File to contain combined row/column scores: (Default = 'RCScores')
Name of dataset
to contain concatenated U and V matrices to produce a single (n+m)-by-r
matrix (useful for plotting row and column scores on same map).
LOG FILE The
output first gives a 2D scatterplot of the first two dimensions (eigenvectors).
The scatterplot can be saved or printed. Simple editing can be achieved
using the options button. The labels can be turned on or off and values
can be attached to the points (or removed). The scales can also be changed.
More advanced editing is possible by double clicking in the plot; this
invokes the chart wizard. To find the label attached to a single point
when all the labels are moved, click on a single point; this will highlight
all the points. Then click a second time to highlight one vertex. Now
double click on the vertex and the label will be highlighted in the
chart designer. The save button and the save chart data option
allow the user to save all the chart data into a file which can be reviewed
using Tools>Scatterplot>Review. The chart itself can be
saved as a windows metafile which can then be read into a word processing
or graphics package. Only one chart can be open at one time and
the chart window will be closed if you click on any other UCINET window.
Behind the chart
is a numeric display of coordinates (U and V matrices) of each
point (rows and columns of X) in r-space.
TIMING O(N^3).
COMMENTS This
routine only gives a plot if the regional settings are set to UK or
USA. If you do not have these regional settings and do not get a plot
then change them in the settings control panel on your machine.
REFERENCES Press
W H, Flannery B P, Teukolsky S A and Vetterling W T (1989). Numerical
Recipes in Pascal. New York: Cambridge University Press.
TOOLS > 2 MODE > FACTOR ANALYSIS
PURPOSE Perform a complete factor analysis of a 2-mode matrix.
DESCRIPTION Decomposes
a matrix into factors using either principal components or minimum residuals
methods.
PARAMETERS
Input dataset.
Name of dataset
containing 2-mode matrix to be factored. Data type: Matrix.
Method of factor analysis (Default = Principal Components)
Choices are
Principal Components
Perform a principal
component analysis in which the matrix is factored into a product of
the most dominant eigenvectors.
Minimum Residuals
Factor the matrix
into factors so that the residuals (the sum of squares of the difference
between the original data and the product of the factors) are minimized.
Method of factor rotation: (Default = Varimax)
Choices are
None
No rotation is
performed
Varimax.
Maximizes purity of factors.
Quartimax.
Maximizes purity of variables (minimizes loading on multiple factors).
Factors are rotated after deleting excess factors (see below).
Number of factors: (Default=3)
Number of factors
into which to decompose the matrix. IMPORTANT NOTE: Factors are rotated
after deleting excess factors.
(OUTPUT) Factor Scores: (Default = 'Scores')
Name of file containing
the factor scores for each actor on each factor.
(OUTPUT) Factor Loadings: (Default = 'Loadings')
Name of file containing
the factor loadings for each actor on each factor.
(OUTPUT) Eigenvectors: (Default= 'Eigen')
Name of file containing
eigenvalues corresponding to each eigenvector (factor).
(OUTPUT) Factor score coefficients: (Default='Coefs')
Name of file containing
the factor coefficients for each actor on each factor.
LOG FILE The
log file gives a full set of descriptive statistics of each actor's profile.
These are followed by the eigenvalues placed in descending order of
size and labeled as factors in ascending order. The value of each is
expressed as a percentage of the sum and a cumulative percentage of
all the factors given so far is presented. The final column gives the
ratio of the factor below to the current factor. This is followed by
a matrix of factor loadings, entry X(i,j) is the loading of the jth
factor on actor i.
TIMING O(N^3)
COMMENTS None
REFERENCES None
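The eigenvalue table described under LOG FILE can be reproduced by hand. The following is a minimal sketch, not UCINET code: the function name `eigenvalue_table` and the sample eigenvalues are assumptions made for illustration, and the ratio column is omitted.

```python
def eigenvalue_table(eigenvalues):
    """Return (factor, value, percent, cumulative percent) rows.

    Eigenvalues are placed in descending order and numbered from 1,
    following the layout described for the log file.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rows = []
    cumulative = 0.0
    for factor, value in enumerate(ordered, start=1):
        percent = 100.0 * value / total
        cumulative += percent
        rows.append((factor, value, percent, cumulative))
    return rows


# Hypothetical eigenvalues, chosen only so the arithmetic is easy to follow.
for factor, value, percent, cumulative in eigenvalue_table([1.0, 5.0, 4.0]):
    print(f"{factor:>6} {value:8.3f} {percent:7.1f} {cumulative:7.1f}")
```

The cumulative percentage of the last factor is always 100, which is a quick check that no eigenvalue has been dropped from the table.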
#^{189}$^{190}K^{191}TOOLS
> 2 MODE > CORRESPONDENCE
PURPOSE Perform
a correspondence analysis of a single real-valued matrix.
DESCRIPTION Given
a non-negative, n-by-m matrix with n ≥ m, this routine represents the n rows
and m columns as vectors in a common multidimensional space. The
algorithm essentially performs a singular value decomposition of an
adjusted data matrix in which rows and columns have been separately
normalized to yield more equal marginals.
PARAMETERS
Input dataset:
Name of file containing
matrix to be analyzed, it must have at least as many rows as columns
(otherwise transpose the matrix then resubmit). Data type: Matrix.
How to scale row and column scores: (Default = COORDINATES)
Choices are:
Coordinates - Scores for each point on each dimension
adjusted both for point marginals and dimension weights (eigenvalues).
CGS - According to Carroll-Green-Schaffer,
this transformation makes distance between a row and a column just as
interpretable as distance between a row and a row or a column and a
column.
Optimal - Scores for each point are corrected
for point marginals, but not dimension weights.
Axes - No rescaling is performed.
Number of factors to save: (Default = 3)
Maximum value of
r, the number of eigenvectors used to decompose the matrix.
Reconstruct matrix from factors: (Default = No)
If YES, the row
and column scores are combined to approximate the data matrix with r
eigenvectors (see 'Number of factors to save', above). The result
is the best possible approximation of X using matrices of rank r based
on a least squares criterion.
Keep the trivial first factor: (Default = No)
The Normalization
step prior to singular value decomposition causes first eigenvector
to be constant. If Yes, this factor is retained and eigenvalue
percentages include it. If No, the factor is dropped and
eigenvalue percentages do not include it.
(Output) File to contain row scores: (Default = 'CorrespondenceRScores')
Name of dataset
to contain coordinates of row points.
(Output) File to contain column scores: (Default = 'CorrespondenceCScores')
Name of dataset
to contain coordinates of column points.
(Output) File to contain singular values: (Default = 'CorrespondenceEigen')
Name of dataset
to contain eigenvalue of each dimension.
(Output) File to contain reconstructed matrix: (Default = 'CorrespondenceRecon')
Name of dataset
to contain the approximated data matrix (if any).
(Output) File to contain combined row/column scores: (Default = 'CorrespondenceRscores')
Name of dataset
to contain concatenated row and column scores to produce single (m+n)-by-r
matrix (useful for plotting row and column scores on same map).
LOG FILE The
output first gives a 2D scatterplot of the first two dimensions (eigenvectors).
The scatterplot can be saved or printed. Simple editing can be achieved
using the options button. The labels can be turned on or off and values
can be attached to the points (or removed). The scales can also be changed.
More advanced editing is possible by double-clicking in the plot, which
invokes the chart wizard. To find the label attached to a single point
when all the labels have been moved, click on the point; this highlights
all the points. Click a second time to highlight the single vertex, then
double-click on the vertex and its label will be highlighted in the
chart designer. The save button and the save chart data option
allow the user to save all the chart data into a file which can be reviewed
using Tools>Scatterplot>Review. The chart itself can be
saved as a windows metafile which can then be read into a word processing
or graphics package. Only one chart can be open at one time and
the chart window will be closed if you click on any other UCINET window.
The log file has
a numeric display of coordinates (eigenvectors) of each point in r-space.
TIMING O(N^3).
COMMENTS See
the SVD routine for more information.
This routine only
gives a plot if the regional settings are set to UK or USA. If you do
not have these regional settings and do not get a plot then change them
in the settings control panel on your machine.
REFERENCES None.
#^{192}$^{193}K^{194}TOOLS
> SIMILARITIES
PURPOSE Compute
similarities among rows or columns of a matrix using one of various
measures.
DESCRIPTION Given
a matrix with n rows and m columns, the program computes either an n-by-n
matrix of similarities among the rows, or an m-by-m matrix of similarities
among the columns.
PARAMETERS
Input dataset:
Name of file containing
matrix to be analyzed. Data type: Matrix.
Measure of profile similarity: (Default = CORRELATION)
Choices are:
Correlation - Pearson's product-moment correlation.
Covariance - Mean-centered cross products: Σxy/n - ΣxΣy/n^2
Cross-Products - Sum of products: Σxy
Matches - Proportion of cases i in which x_{i} = y_{i}
Positive Matches - Proportion of cases i in which x_{i} = y_{i}, given that either x_{i} > 0 or y_{i} > 0 or both
Compute similarities among Rows or Cols: (Default = COLUMNS)
If Rows,
an n-by-n similarity matrix representing the similarity between each
pair of rows is computed. If Columns, an m-by-m similarity matrix
is computed representing the similarity between each pair of columns.
(For sq. mats) Diagonal valid (Default = YES)
If No, values along
the main diagonal are treated as though they were missing.
Output dataset: (Default = 'Similarities')
Name of dataset
to contain output similarity matrix.
LOG FILE Similarity
matrix, displayed with 2 decimal places.
TIMING O(N^3).
COMMENTS Missing
values are ignored.
REFERENCES None.
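The measures listed above can be written out for a single pair of profiles x and y. This is an illustrative sketch only: the function names are assumptions, missing values are not handled, and UCINET's own numerical details may differ.

```python
import math


def cross_products(x, y):
    """Sum of products: Σxy."""
    return sum(a * b for a, b in zip(x, y))


def covariance(x, y):
    """Mean-centered cross products: Σxy/n - ΣxΣy/n^2."""
    n = len(x)
    return cross_products(x, y) / n - sum(x) * sum(y) / n ** 2


def correlation(x, y):
    """Pearson's product-moment correlation, built from the covariances."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def matches(x, y):
    """Proportion of positions in which the two profiles are equal."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def positive_matches(x, y):
    """Proportion of equal positions among those where x or y is positive."""
    pairs = [(a, b) for a, b in zip(x, y) if a > 0 or b > 0]
    return sum(a == b for a, b in pairs) / len(pairs)


# Two hypothetical binary profiles (for example, two columns of a matrix).
x = [1, 0, 1, 0]
y = [1, 1, 0, 0]
print(cross_products(x, y), covariance(x, y), matches(x, y), positive_matches(x, y))
```

Applying one of these functions to every pair of columns (or rows) gives the m-by-m (or n-by-n) similarity matrix described above.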
#^{195}$^{196}K^{197}TOOLS
> DISSIMILARITIES
PURPOSE Compute
dissimilarities among rows or columns of a matrix using one of various
measures.
DESCRIPTION Given
a matrix with n rows and m columns, the program computes either an n-by-n
matrix of dissimilarities among the rows, or an m-by-m matrix of dissimilarities
among the columns.
PARAMETERS
Input dataset:
Name of file containing
matrix to be analyzed. Data type: Matrix.
Measure of profile similarity: (Default = 'EUCLIDEAN')
Choices are:
Euclidean
Euclidean distance: SQRT(Σ(x_{i}-y_{i})^2).
When missing values are present, the computed distance is multiplied
by n/m where n is the size of the vectors and m is the number of non-missing
values.
Manhattan
City-block distance: Σ abs(x_{i}-y_{i}).
When missing values are present,
the computed distance is multiplied by n/m where n is the size of the
vectors and m is the number of non-missing values.
Normed SSD
Normed sum of squared
differences: Σ(x_{i}-y_{i})^2 / Σx_{i}^2 Σy_{i}^2
Non-Matches
Proportion of cases
in which x_{i} does not equal y_{i} for all i.
Positive Non-Matches
Proportion of cases
in which x_{i} does not equal y_{i} given that either x_{i} > 0 or y_{i} > 0 or both.
Compute dissimilarities among Rows or Cols (Default = COLUMNS)
If Rows,
an n-by-n dissimilarity matrix representing the dissimilarity between
each pair of rows is computed. If Columns an m-by-m dissimilarity
matrix is computed representing the dissimilarity between each pair
of columns.
(For sq. mats) Diagonal valid (Default = YES)
If No, values along
the main diagonal are treated as though they were missing.
Output dataset: (Default = 'Dissimilarities')
Name of dataset
to contain output dissimilarity matrix.
LOG FILE Dissimilarity
matrix.
TIMING O(N^3).
COMMENTS Missing
values are ignored.
REFERENCES None.
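The n/m adjustment for missing values can be illustrated with a short sketch. It assumes that a missing value is represented by `None` (a hypothetical marker, not UCINET's internal code) and that a position is dropped when either profile is missing there.

```python
import math

MISSING = None  # hypothetical marker for a missing value


def _valid_pairs(x, y):
    """Keep only the positions where both profiles have a value."""
    return [(a, b) for a, b in zip(x, y) if a is not MISSING and b is not MISSING]


def euclidean(x, y):
    """SQRT(Σ(x_i - y_i)^2) over non-missing positions, multiplied by n/m."""
    pairs = _valid_pairs(x, y)
    distance = math.sqrt(sum((a - b) ** 2 for a, b in pairs))
    return distance * len(x) / len(pairs)


def manhattan(x, y):
    """Σ abs(x_i - y_i) over non-missing positions, multiplied by n/m."""
    pairs = _valid_pairs(x, y)
    distance = sum(abs(a - b) for a, b in pairs)
    return distance * len(x) / len(pairs)


def non_matches(x, y):
    """Proportion of non-missing positions in which the profiles differ."""
    pairs = _valid_pairs(x, y)
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical profiles: n = 4 positions, m = 3 of them non-missing.
x = [1, 2, MISSING, 4]
y = [1, 0, 3, 0]
print(euclidean(x, y), manhattan(x, y), non_matches(x, y))
```

Here the raw city-block distance over the three usable positions is 6, and the n/m factor of 4/3 scales it to 8.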
#^{198}$^{199}K^{200}TOOLS
> STATISTICS > UNIVARIATE
PURPOSE Compute
standard univariate statistics on values of a matrix.
DESCRIPTION Procedure
computes mean, standard deviation, variance, Euclidean norm, maximum,
minimum and total number of observations for each row or column of a
matrix, or for the matrix taken as a whole.
PARAMETERS
Input dataset
Name of file containing
matrix to be analyzed. Data
type: Matrix.
Which dimension to analyse: (Default = COLUMNS)
Choices are:
Rows - Statistics are computed separately
for each row in matrix. Result is a matrix whose rows correspond
to the rows of the data matrix and the columns are statistics.
Columns - Statistics are computed separately
for each column in matrix. Result is a matrix whose columns correspond
to the columns of the data matrix and the rows are statistics.
Matrices - Statistics are computed on the matrix
as a whole.
(For square mats) Diagonal valid? (Default = YES)
Whether diagonal
values in square matrices are to be ignored (treated like missing values).
Output Dataset: (Default = 'UnivariateStats')
Name of data set
to contain output statistics.
LOG FILE Matrix
of statistics.
TIMING O(N^2).
COMMENTS Missing
values are ignored.
REFERENCES None.
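The statistics named in the DESCRIPTION can be sketched for the Columns option as follows. The function names are assumptions, and the sketch uses the population form of the variance (division by N); the help text does not say which form UCINET uses.

```python
import math


def univariate(values):
    """Mean, standard deviation, variance, Euclidean norm, min, max and N."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population variance (divide by N, not N - 1).
    variance = sum((v - mean) ** 2 for v in values) / n
    return {
        "mean": mean,
        "std_dev": math.sqrt(variance),
        "variance": variance,
        "euclidean_norm": math.sqrt(sum(v * v for v in values)),
        "minimum": min(values),
        "maximum": max(values),
        "n_obs": n,
    }


def column_stats(matrix):
    """Statistics computed separately for each column of the matrix."""
    return [univariate(list(column)) for column in zip(*matrix)]


# Hypothetical 3-by-2 data matrix.
matrix = [[1, 2], [3, 4], [5, 6]]
for index, stats in enumerate(column_stats(matrix), start=1):
    print(index, stats)
```

Row statistics follow by passing each row to `univariate`, and whole-matrix statistics by passing the flattened list of all cells.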
#^{201}$^{202}K^{203}TOOLS
> STATISTICS > MATRIX (QAP) > QAP-CORRELATION
PURPOSE Compute
correlation and other similarity measures between entries of two square
matrices, and assess the frequency of random measures as large as actually
observed.
DESCRIPTION The
procedure is principally used to test the association between networks.
Often, one network is an observed network while the other is a model
or expected network.
The algorithm proceeds
in two steps. In the first step, it computes Pearson's correlation
coefficient (plus simple matching, Jaccard, Goodman Kruskal Gamma and
Hamming distance) between corresponding cells of the two data matrices.
In the second step, it randomly permutes rows and columns (synchronously)
of one matrix (the observed matrix, if the distinction is relevant)
and recomputes the correlation and other measures.
The second step
is carried out hundreds of times in order to compute the proportion
of times that a random measure is larger than or equal to the observed
measure calculated in step 1. A low proportion (< 0.05) suggests
a strong relationship between the matrices that is unlikely to have
occurred by chance.
PARAMETERS
Data Matrix:
Name of dataset
containing the first matrix (the observed or dependent matrix, if such
distinctions are meaningful). Data type: Square Matrix.
Structure Matrix:
Name of dataset
containing the expected, modelled or independent matrix (if such distinctions
are meaningful). Data type: Square Matrix.
Number of random permutations: (Default = 500)
Number of correlations
to compute between the data matrix and the randomly permuted structure
matrix. The larger the number of permutations, the better the
estimates of standard error and "significance", but the longer
the computation time.
Treat diagonals as valid? (Default = NO)
If YES, the values
along the main diagonals of each matrix are included in the computation
of correlation. Otherwise, they are treated as missing.
Random number seed:
The random number
seed sets off the random permutations. UCINET generates a different
random number as default each time it is run. This number should
be changed if the user wishes to repeat an analysis. The range
is 1 to 32000.
LOG FILE The
output consists of some summary statistics of each of the matrices followed
by the results. The following sample output is generated:
                         Value  Signif     Avg     SD  P(Large)  P(Small)     NPerm
                       -------  ------ ------- ------  --------  --------  --------
Pearson Correlation:     0.120   0.101  -0.002  0.086     0.101     0.943  2500.000
Simple Matching:         0.667   0.101   0.625  0.032     0.101     0.943  2500.000
Jaccard Coefficient:     0.176   0.101   0.120  0.039     0.101     0.943  2500.000
Goodman-Kruskal Gamma:   0.319   0.101  -0.021  0.249     0.101     0.943  2500.000
Hamming Distance:       70.000   0.101  78.738  6.348     0.943     0.101  2500.000
The Value column
indicates the observed value between the two networks, in this
case 0.120 for correlation and 0.176 for Jaccard. The average
random correlation was almost zero with a standard error of 0.086.
The proportion of random correlations that were as large as 0.120 was
0.101, that is 10.1%. Hence of the 2,500 random permutations just over
250 produced a correlation of 0.120 or higher. At a typical 0.05
level, this correlation would not be considered significant since 0.101 >
0.05. The table gives P(Large) as well as P(Small); note that for
the Hamming distance it is P(Small) that needs to be considered, as smaller
values imply more similarity. The Signif column attempts
to identify the correct value from P(Small) and P(Large); however,
when the observed value is close to zero it can get this wrong, since
this selection is based upon whether the observed value is positive
or negative. In this instance the user should consider the measure used
and the type of data.
TIMING O(N^2)
per permutation.
COMMENTS The
program ignores missing values.
REFERENCES None.
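The two-step procedure in the DESCRIPTION can be sketched in a few lines. This is a minimal illustration for the Pearson correlation only, with diagonals treated as missing; the function names, the example matrices and the use of Python's `random` module are assumptions, not UCINET's implementation.

```python
import math
import random


def off_diagonal(matrix):
    """Cells of a square matrix with the main diagonal left out."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson's correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permute(matrix, order):
    """Apply the same permutation to rows and columns (synchronously)."""
    return [[matrix[i][j] for j in order] for i in order]


def qap_correlation(data, structure, permutations=500, seed=1):
    """Return (observed r, P(Large), P(Small)) from a permutation test."""
    rng = random.Random(seed)
    target = off_diagonal(structure)
    observed = pearson(off_diagonal(data), target)  # step 1
    order = list(range(len(data)))
    as_large = as_small = 0
    for _ in range(permutations):                   # step 2, repeated
        rng.shuffle(order)
        r = pearson(off_diagonal(permute(data, order)), target)
        as_large += r >= observed
        as_small += r <= observed
    return observed, as_large / permutations, as_small / permutations


# Two hypothetical 5-actor binary networks.
data = [[0, 1, 1, 0, 0], [1, 0, 1, 0, 0], [1, 1, 0, 1, 0],
        [0, 0, 1, 0, 1], [0, 0, 0, 1, 0]]
structure = [[0, 1, 1, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 1, 1],
             [0, 0, 1, 0, 1], [0, 0, 1, 1, 0]]
print(qap_correlation(data, structure, permutations=500, seed=1))
```

Permuting rows and columns together keeps each actor's row and column attached to the same actor, so the permuted matrix is the same network with the actors relabelled. Fixing the seed makes a run repeatable.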
#^{204}$^{205}K^{206}TOOLS
> STATISTICS > MATRIX (QAP) > QAP-REGRESSION
PURPOSE Regress
a dependent matrix on one or more independent matrices, and assess significance
of the r-square and regression coefficients.
DESCRIPTION The
procedure is principally used to model a social relation (matrix) using
values of other relations.
The algorithm proceeds
in two steps. In the first step, it performs a standard multiple
regression across corresponding cells of the dependent and independent
matrices.
In the second step,
it randomly permutes rows and columns (together) of the dependent matrix
and recomputes the regression, storing resultant values of r-square
and all coefficients. This step is repeated hundreds of times
in order to estimate standard errors for the statistics of interest.
For each coefficient, the program counts the proportion of random permutations
that yielded a coefficient as extreme as the one computed in step 1.
The primary requirement for conducting a multiple regression quadratic
assignment procedure is that all the variables in the regression have
to be one-mode, two-way matrices. That is, they must all be NxN
networks. Person-by-object or Person-by-event matrices can be
converted to NxN matrices using Data>Affiliations.
PARAMETERS
Dependent variable:
Name of dataset
containing the observed or dependent data: the matrix whose values
are to be predicted. Data type: Square Matrix.
Independent variables:
Names of datasets
containing the independent or predictor matrices. To include more than
one dataset using the browse button highlight all required files by
pressing Ctrl and clicking with the mouse. If the file names are typed
they should be separated by commas with no spaces. File names that contain
spaces should be enclosed in quotation marks. Data type: Square
Matrices.
Number of random permutations: (Default = 500)
Number of regressions
to compute between the data matrix and the randomly permuted structure
matrix. The larger the number of permutations, the better the
estimates of standard error and "significance", but the longer
the computation time.
Treat diagonals as valid? (Default = No)
If Yes, the values
along the main diagonals of each matrix are included in the computations.
Otherwise, they are treated as missing.
Random number seed:
The random number
seed sets off the random permutations. UCINET generates a different
random number as default each time it is run. This number should
be changed if the user wishes to repeat an analysis. The range
is 1 to 32000.
LOG FILE Two
tables are output. The first looks like this:
R-Square One-Tailed Probability
0.023 0.618
The table gives
the observed r-square along with the proportion of random trials yielding
an r-square as large or larger than the observed.
The second table
is as follows:
                Unstandardized   Two-Tailed
Independent        Coefficient  Probability
Intercept             0.385965        0.178
R1                   -0.007519        0.866
R2                   -0.150376        0.170
R3                    0.000000        0.838
This table gives
the Unstandardized regression coefficient for each independent variable,
including the intercept, along with the proportion of random trials
yielding a coefficient with an absolute value as large or larger than
the observed. In this example, all the coefficients have non-significant
probabilities, indicating that the observed values are well within the
range of random variation.
TIMING O(N^2).
COMMENTS The
program ignores missing values.
REFERENCES None.
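The permutation logic behind the two output tables can be sketched for the simplest case of a single independent matrix. This is a deliberate simplification of the routine (which accepts several independent matrices); all names are assumptions and diagonals are treated as missing.

```python
import random


def off_diagonal(matrix):
    """Cells of a square matrix with the main diagonal left out."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def permute(matrix, order):
    """Permute rows and columns together."""
    return [[matrix[i][j] for j in order] for i in order]


def simple_ols(y, x):
    """Least-squares fit of y = intercept + slope * x; also returns r-square."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


def qap_regression(dependent, independent, permutations=500, seed=1):
    """Observed fit plus one-tailed (r-square) and two-tailed (slope) proportions."""
    rng = random.Random(seed)
    x = off_diagonal(independent)
    intercept, slope, r_square = simple_ols(off_diagonal(dependent), x)
    order = list(range(len(dependent)))
    r_square_hits = slope_hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute the dependent matrix only
        _, s, r2 = simple_ols(off_diagonal(permute(dependent, order)), x)
        r_square_hits += r2 >= r_square
        slope_hits += abs(s) >= abs(slope)
    return {"intercept": intercept, "slope": slope, "r_square": r_square,
            "p_r_square": r_square_hits / permutations,
            "p_slope": slope_hits / permutations}


# Hypothetical 4-actor valued networks.
advice = [[0, 2, 1, 0], [1, 0, 3, 1], [0, 2, 0, 2], [1, 0, 1, 0]]
friendship = [[0, 1, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(qap_regression(advice, friendship, permutations=500, seed=1))
```

The r-square proportion counts random fits at least as good as the observed one, while the coefficient proportion compares absolute values, matching the one-tailed and two-tailed columns of the sample tables.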
#^{207}$^{208}TOOLS
> STATISTICS > AUTOCORRELATION > CATEGORICAL > JOIN COUNT
PURPOSE Perform
randomization test of autocorrelation for a symmetric adjacency matrix
which is partitioned into two groups.
DESCRIPTION Relates
a dyadic binary variable (an actor-by-actor adjacency matrix) to a monadic
variable (a vector representing an attribute of each actor). For example,
if the dyadic variable consists of who is friends with whom, and the
categorical variable is gender, the procedure tests whether friendship
is patterned by gender (e.g., do boys prefer boys and girls prefer girls?).
The routine is limited to two groups and is based upon counting the
entries within and between the groups and comparing them with a randomized
model.
PARAMETERS
Input Dataset
Name of file containing
matrix to be analyzed. Data type: Graph
Partition Vector:
The name of a
UCINET dataset that contains a partition of the actors into two groups.
To partition the data matrix into groups specify a vector by giving
the dataset name, a dimension (either row or column) and an integer
value. For example, to use the second row of a dataset called ATTRIB,
enter "ATTRIB ROW 2". The program will then read the second
row of ATTRIB and use that information to define the groups. All actors
with identical values on the criterion vector (i.e. the second row of
attrib) will be placed in the same group.
No. of Permutations: (Default = 10000)
The number of random
permutations required in the test.
Treat diagonals as valid? (Default = No)
If Yes, the values
along the main diagonals of each matrix are included in the computations.
Otherwise, they are treated as missing.
Random number seed:
The random number
seed sets off the random permutations. UCINET generates a different
random number as default each time it is run. This number should
be changed if the user wishes to repeat an analysis. The range
is 1 to 32000.
LOG FILE The actor attributes are recoded to 1 and 2, and these are reported.
This is followed by a table of observed and expected counts for the data. The first row gives the counts within group 1, the second the counts between the groups and the third the counts within group 2. The expected column gives the values that would be expected if the ones were randomly distributed within and between the groups. The observed column gives the counts in the data, and the difference column is the observed minus the expected. The P>=Diff and P<=Diff columns give the relative frequency with which a randomly permuted matrix produces a difference as large or larger, and as small or smaller, than the observed. These columns are used to test the significance of the observed data.
TIMING O(N^2)
COMMENTS None
REFERENCES Cliff,
A D and Ord, J K 1973 Spatial Autocorrelation. Pion, London.
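The counting and randomization steps can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, each symmetric tie is counted once, the group labels are shuffled in place of permuting the matrix (an equivalent relabelling of actors), and the expected counts are estimated by the permutation mean, which may not be how UCINET obtains them.

```python
import random


def join_counts(adjacency, groups):
    """Ties within group 1, between the groups, and within group 2."""
    n = len(adjacency)
    counts = {(1, 1): 0, (1, 2): 0, (2, 2): 0}
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            if adjacency[i][j]:
                low, high = sorted((groups[i], groups[j]))
                counts[(low, high)] += 1
    return counts


def join_count_test(adjacency, groups, permutations=10000, seed=1):
    """Observed counts, permutation-mean expected counts and tail proportions."""
    rng = random.Random(seed)
    observed = join_counts(adjacency, groups)
    labels = list(groups)
    total = dict.fromkeys(observed, 0)
    as_large = dict.fromkeys(observed, 0)
    as_small = dict.fromkeys(observed, 0)
    for _ in range(permutations):
        rng.shuffle(labels)
        for key, count in join_counts(adjacency, labels).items():
            total[key] += count
            as_large[key] += count >= observed[key]
            as_small[key] += count <= observed[key]
    return {key: {"observed": observed[key],
                  "expected": total[key] / permutations,
                  "p_ge": as_large[key] / permutations,
                  "p_le": as_small[key] / permutations}
            for key in observed}


# Hypothetical network: two triangles joined by a single tie.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1
groups = [1, 1, 1, 2, 2, 2]

for key, row in join_count_test(adjacency, groups, permutations=1000).items():
    print(key, row)
```

Because the expected count is the same for every permutation, comparing random counts with the observed count is equivalent to comparing the differences reported in the P>=Diff and P<=Diff columns.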
#^{209}$^{210}TOOLS
> STATISTICS > AUTOCORRELATION > CATEGORICAL > RCT ANALYSIS
PURPOSE Perform
randomization test of autocorrelation for a symmetric adjacency matrix
which is partitioned into groups.
DESCRIPTION Relates
a dyadic binary variable (an actor-by-actor adjacency matrix) to a monadic
variable (a vector representing an attribute of each actor). For example,
if the dyadic variable consists of who is friends with whom, and the
categorical variable is gender, the procedure tests whether friendship
is patterned by gender (e.g., do boys prefer boys and girls prefer girls?).
The routine is similar to performing a standard chi squared test except
instead of using the chi squared distribution the underlying distribution
is constructed using a randomization procedure.
PARAMETERS
Input Dataset
Name of file containing
matrix to be analyzed. Data type: Graph
Attribute:
The name of a
UCINET dataset that contains a partition of the actors into two groups.
To partition the data matrix into groups specify a vector by giving
the dataset name, a dimension (either row or column) and an integer
value. For example, to use the second row of a dataset called ATTRIB,
enter "ATTRIB ROW 2". The program will then read the second
row of ATTRIB and use that information to define the groups. All actors
with identical values on the criterion vector (i.e. the second row of
attrib) will be placed in the same group.
No. of Permutations: (Default = 1000)
The number of random
permutations required in the test.
Random number seed:
The random number
seed sets off the random permutations. UCINET generates a different
random number as default each time it is run. This number should
be changed if the user wishes to repeat an analysis. The range
is 1 to 32000.
Output Dataset (Default= 'lltab')
Name of output
dataset that contains the frequencies in the observed data corresponding
to the partition.
LOG FILE The actor attributes are recoded to run from 1, and these are reported.
This is followed by a table of cross-classified frequencies, that is, a contingency table corresponding to the attributes and the input dataset.
Next is a table of the expected frequencies, assuming that the ties are independent and randomly distributed throughout the groups.
The observed values in each cell of the first table divided by the corresponding cell in the second table are then reported. This is followed by the observed chi-square value, i.e. the square of the observed minus the expected, divided by the expected value, summed over all cells.
The average permutation
frequency table gives the mean values of the entries from all the permutation
tests. Each of the generated entries has its value compared
with the observed value, and the significance is the relative frequency
with which the generated value is larger than the observed.
TIMING O(N^2)
COMMENTS None
REFERENCES Cliff,
A D and Ord, J K 1973 Spatial Autocorrelation. Pion, London.
#^{211}$^{212}K^{213}TOOLS
> STATISTICS > AUTOCORRELATION > CATEGORICAL > ANOVA / DENSITY
PURPOSE Perform
randomization test of autocorrelation for a categorical variable.
DESCRIPTION Relates
a dyadic variable (an actor-by-actor matrix) to a monadic variable (a
vector representing an attribute of each actor). For example, if the
dyadic variable consists of who is friends with whom, and the categorical
variable is gender, the procedure tests whether friendship is patterned
by gender (e.g., do boys prefer boys and girls prefer girls?). The test
is based upon the densities within each block and is similar to performing
an analysis of variance. Three different models which have different
patterns of density are possible.
PARAMETERS
Network or Proximity Matrix
Name of file containing
matrix to be analyzed. Data type: Matrix.
Actor Attribute:
Name of file containing
actor attributes, given as a vector of shared attributes so that (1,2,3,1,2,2)
means that actors 1 and 4 share the same attribute, actors 2, 5 and 6
share the same attribute, and actor 3 has a different attribute from
all the others.
Model (Default = Structural Blockmodel)
Choices are:
Constant
Homophily.
Tests hypothesis that actors prefer to interact with members of their
own kind (as defined by the actor attribute), and assumes that all groups
have equal inbreeding tendencies.
Variable
Homophily.
Similar to the constant homophily model, except that it assumes that
each group or class of actors has a different homophilic tendency (different
inbreeding parameter).
Structural
Blockmodel.
Most general model. Just asks whether the different classes have significantly
different interaction patterns. For example, girls might prefer girls
(inbreeding), while boys also prefer girls (outbreeding).
Number of random perms: (Default=1000)
Number of autocorrelations
to compute between the data matrix and the randomly permuted structure
matrix. The larger the number of permutations, the better the
estimates of standard error and "significance", but the longer
the computation time.
Treat diagonals as valid? (Default = No)
If Yes, the values
along the main diagonals of each matrix are included in the computations.
Otherwise, they are treated as missing.
Random number seed:
The random number
seed sets off the random permutations. UCINET generates a different
random number as default each time it is run. This number should
be changed if the user wishes to repeat an analysis. The range
is 1 to 32000.
Output dataset
(Default= 'AUTOSIM')
LOG FILE The actor attributes are recoded so they run from 1 to n, and these are reported.
The between-group and in-group means are reported if either of the homophily models was chosen. For constant homophily the in-group mean is the overall mean of all within-group interactions. For variable homophily each separate within-group mean is reported. For the structural blockmodel option the total sum, the average value and the number of cells within each block are reported. In all cases this is followed by the value of the autocorrelation together with the r-squared value, the root mean square and the sum of squares. Below this is the autocorrelation averaged over all the permutations together with the standard error. Finally, the proportion of random values which are as large as the actual autocorrelation is reported. This gives the significance of the calculated value; for example, if this were below 0.05 we would conclude at the 5% level that the dyadic variable is related to the categorical attribute.
TIMING O(N^2)
COMMENTS None
REFERENCES None
#^{214}$^{215}K^{216}TOOLS>STATISTICS>AUTOCORRELATION>INTERVAL/RATIO
PURPOSE Perform
a randomization test of autocorrelation with an interval or ratio level
attribute variable.
DESCRIPTION Relates
a dyadic variable (an actor-by-actor matrix) to a monadic variable (a
vector representing an interval-scaled attribute of each actor). For
example, if the dyadic variable is who is friends with whom, and the
monadic variable is height, the procedure tests whether friendship is
patterned by height (e.g., children prefer to be friends with children
who are the same height as themselves).
PARAMETERS
Network or Proximity Matrix
Name of file containing
matrix to be analyzed. Data type: Matrix.
Actor Attribute(s)
Name of file containing
actor attributes.
Model (Default = Geary)
Choices are:
Geary.
Geary's C statistic (larger negative values indicate greater positive
autocorrelation).
Moran. Moran's I statistic (larger
positive values indicate greater positive autocorrelation).
Number of random perms: (Default=1000)
Number of autocorrelations
to compute between the data matrix and the randomly permuted structure
matrix. The larger the number of permutations, the better the
estimates of standard error and "significance", but the longer
the computation time.
Treat diagonals as valid? (Default = No)
If Yes, the values
along the main diagonals of each matrix are included in the computations.
Otherwise, they are treated as missing.
Random number seed:
The random number
seed sets off the random permutations. UCINET generates a different
random number as default each time it is run. This number should
be changed if the user wishes to repeat an analysis. The range
is 1 to 32000.
Output dataset
(Default= 'AUTOSIM')
LOG FILE The value of the autocorrelation, followed by the autocorrelation averaged over all the permutations together with the standard error. The proportion of random values which are as large (for Geary) or as small (for Moran) as the actual autocorrelation is then reported; this gives the significance of the calculated value.
TIMING O(N^2)
COMMENTS None
REFERENCES See
Cliff and Ord's classic 1973 book 'Spatial autocorrelation' London:
Pion.
#^{217}$^{218}K^{219}TOOLS
> STATISTICS > VECTOR > REGRESSION
PURPOSE Regress
a dependent vector on one or more independent vectors, and assess significance
of the r-square and regression coefficients.
DESCRIPTION The
procedure is principally used to model a vector using values of other
vectors.
The algorithm proceeds
in two steps. In the first step, it performs a standard multiple
regression across corresponding cells of the dependent and independent
vectors.
In the second step,
it randomly permutes the elements of the dependent vector and recomputes
the regression, storing resultant values of r-square and all coefficients.
This step is repeated hundreds of times in order to estimate standard
errors for the statistics of interest. For each coefficient, the
program counts the proportion of random permutations that yielded a
coefficient as extreme as the one computed in step 1.
PARAMETERS
Dependent dataset:
Name of dataset
containing the observed or dependent data: the vector whose values
are to be predicted. This is given as a column in a matrix. Data type:
Matrix.
Dependent column #: (Default=1)
Specifies which
column of the data matrix contains the dependent vector.
Independent dataset:
Name of dataset
containing the independent vectors. All independent vectors must be
contained in a single matrix. Data type: Matrix.
Independent column #s: (Default=1)
Specifies which
columns of the independent dataset contain the independent vectors.
Columns to be selected are specified by a list. Each column number is
listed separated by a comma or space. The keywords TO, FIRST and LAST
are permissible. Hence FIRST 3, 5 TO 7, 10, 12 would give column numbers
1, 2, 3, 5, 6, 7, 10 and 12. ALL gives all possible columns. Lists kept
in a UCINET dataset can be used. Enter the filename followed by ROW
(or COLUMN) and a number to specify which row or column of the file
to use. The list must be specified using a binary vector where a 1 in
position k indicates that vertex k is a member of the list, a zero indicates
that k is not a member.
Number of random permutations: (Default = 1000)
Number of regressions
to compute between the original data and the randomly permuted data.
The larger the number of permutations, the better the estimates of standard
error and "significance", but the longer the computation time.
Random number seed:
The random number
seed sets off the random permutations. UCINET generates a different
random number as default each time it is run. This number should
be changed if the user wishes to repeat an analysis. The range
is 1 to 32000.
(Output) Regression Coefficients: (Default='Coefs')
Name of file containing
the regression coefficients.
(Output) Correlation Matrix:(Default= 'RegCorr')
Name of file containing
the correlation matrix.
(Output) Inverse of correlation Matrix (Default='RegInv')
Name of file containing
the inverse of the correlation matrix.
(Output) Predicted values and residuals. (Default='PredVals')
Name of file containing
the predicted values and residuals.
LOG FILE The
correlation matrix followed by information on the model fit. This is
followed by a table of regression coefficients. This table gives the
Unstandardized and standardized regression coefficients for each independent
variable, including the intercept, along with the proportion of random
trials yielding a coefficient i) as large or larger, ii) as small or
smaller and iii) as extreme as the observed value. These values
give the significance of the coefficients.
TIMING O(N^2).
COMMENTS The
program ignores missing values.
REFERENCES None.
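The column-list syntax described under "Independent column #s" can be illustrated with a short Python sketch. This is not UCINET's own parser: the function name parse_columns is hypothetical, and reading "LAST k" as "the last k columns" is an assumption made by analogy with "FIRST k".

```python
def parse_columns(spec: str, n_cols: int) -> list[int]:
    """Expand a list such as 'FIRST 3, 5 TO 7, 10, 12' into 1-based column numbers."""
    tokens = spec.upper().replace(",", " ").split()
    cols: list[int] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "ALL":
            cols.extend(range(1, n_cols + 1))
            i += 1
        elif tok in ("FIRST", "LAST"):
            k = int(tokens[i + 1])
            # Assumption: LAST k means the final k columns of the dataset.
            start = 1 if tok == "FIRST" else n_cols - k + 1
            cols.extend(range(start, start + k))
            i += 2
        elif i + 2 < len(tokens) and tokens[i + 1] == "TO":
            cols.extend(range(int(tok), int(tokens[i + 2]) + 1))
            i += 3
        else:
            cols.append(int(tok))
            i += 1
    return cols
```

With a 12-column dataset, parse_columns("FIRST 3, 5 TO 7, 10, 12", 12) reproduces the list given in the parameter description.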
TOOLS > STATISTICS > VECTOR > ANOVA
PURPOSE Performs
an ANOVA with a significance based upon a permutation test.
DESCRIPTION Undertakes
a standard analysis of variance but uses a permutation test to generate
the significance level so that standard assumptions on independence
and random sampling are not required.
PARAMETERS
Dependent (Y) variable:
Name of file containing
the dependent vector, this must be a UCINET data file. Enter the filename
followed by ROW (or COL) and a number to specify which row or column
of the file to use.
Independent (X) variable:
Name of file containing
the independent vector, this must be a UCINET data file. Enter the filename
followed by ROW (or COL) and a number to specify which row or column
of the file to use.
Number of random permutations: (Default = 5000)
The larger the
number of permutations, the better the estimates of standard error and
"significance", but the longer the computation time.
Random number seed:
The random number
seed initializes the random permutations. By default UCINET generates a
different seed each time it is run. To repeat an analysis exactly,
replace the default with the seed used in the earlier run. The range
is 1 to 32000.
LOG FILE A
standard analysis of variance table together with the significance value
derived from the permutation test.
TIMING N/A
COMMENTS None
REFERENCES None
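The idea behind a permutation-based ANOVA can be sketched in Python as follows. This is an illustration under stated assumptions, not UCINET's implementation: the function names are hypothetical, the statistic is the ordinary one-way F ratio, and the significance is taken as the proportion of label shuffles giving an F at least as large as the observed one.

```python
import random

def f_statistic(y, groups):
    """One-way ANOVA F ratio: between-group over within-group mean square."""
    labels = sorted(set(groups))
    grand = sum(y) / len(y)
    ss_between = ss_within = 0.0
    for g in labels:
        vals = [v for v, lab in zip(y, groups) if lab == g]
        mean = sum(vals) / len(vals)
        ss_between += len(vals) * (mean - grand) ** 2
        ss_within += sum((v - mean) ** 2 for v in vals)
    df_between = len(labels) - 1
    df_within = len(y) - len(labels)
    return (ss_between / df_between) / (ss_within / df_within)

def permutation_anova(y, groups, n_perm=5000, seed=1):
    """Return the observed F and the share of shuffles with F >= observed."""
    rng = random.Random(seed)          # fixed seed makes the run repeatable
    observed = f_statistic(y, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break any link between y and group
        if f_statistic(y, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm
```

Calling the function twice with the same seed returns the same significance value, which is the behaviour the random number seed parameter is meant to give.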
TOOLS > STATISTICS > VECTOR > T TEST
PURPOSE Performs
a t-test with a significance based upon a permutation test.
DESCRIPTION Undertakes
a standard t-test to compare the means of two groups but uses a permutation
test to generate the significance level so that standard assumptions
on independence and random sampling are not required.
PARAMETERS
Dependent (Y) variable:
Name of file containing
the dependent vector, this must be a UCINET data file. Enter the filename
followed by ROW (or COL) and a number to specify which row or column
of the file to use.
Independent (X) variable:
Name of file containing
the independent vector, this must be a UCINET data file. Enter the filename
followed by ROW (or COL) and a number to specify which row or column
of the file to use.
Number of random permutations: (Default = 5000)
The larger the
number of permutations, the better the estimates of standard error and
"significance", but the longer the computation time.
Random number seed:
The random number
seed initializes the random permutations. By default UCINET generates a
different seed each time it is run. To repeat an analysis exactly,
replace the default with the seed used in the earlier run. The range
is 1 to 32000.
LOG FILE Gives
standard statistics on each group followed by significance tests. The
difference in means is reported together with the two one tailed tests
assessing whether one mean is greater than the other and the two tailed
test.
TIMING N/A
COMMENTS None
REFERENCES None
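The three significance values described in the log file (two one-tailed tests and one two-tailed test on the difference in means) can be sketched in Python. The function name and the returned keys are hypothetical, and the code is only a minimal illustration of the permutation idea, not UCINET's routine.

```python
import random

def permutation_mean_test(y, groups, n_perm=5000, seed=1):
    """Permutation test on the difference in means of two groups.

    `groups` must contain exactly two distinct labels.
    """
    a, b = sorted(set(groups))

    def diff(labels):
        ga = [v for v, lab in zip(y, labels) if lab == a]
        gb = [v for v, lab in zip(y, labels) if lab == b]
        return sum(ga) / len(ga) - sum(gb) / len(gb)

    observed = diff(groups)
    rng = random.Random(seed)
    labels = list(groups)
    greater = less = extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        d = diff(labels)
        greater += d >= observed            # one-tailed: as large or larger
        less += d <= observed               # one-tailed: as small or smaller
        extreme += abs(d) >= abs(observed)  # two-tailed: as extreme
    return {
        "difference": observed,
        "p_greater": greater / n_perm,
        "p_less": less / n_perm,
        "p_two_tailed": extreme / n_perm,
    }
```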
TOOLS > STATISTICS > COMPARE DENSITIES > PAIRED
PURPOSE Give
a statistical test for the comparison of the densities of two networks
in which the actors are paired.
DESCRIPTION This
routine uses a bootstrap technique to compare the densities of two not
necessarily independent networks with the same actors. This method is
analogous to the classical paired sample t-test for estimating the standard
error of the difference. Its main use would be in comparing the same
relation on the same set of actors at two different time points.
PARAMETERS 1st Network
Name of UCINET
dataset containing one of the networks to be compared. Data type: Valued
graph.
2nd Network
Name of UCINET
dataset containing the same actors (in the same order) as the 1st dataset.
Data type: Valued graph.
Number of Samples
Gives the number
of times sampling with replacement is used to construct the distribution.
LOG FILE The
output gives the density of both matrices together with the difference
and the number of samples taken. This is followed by a classical t-test.
The estimated bootstrap standard errors are then reported together with
the bootstrap standard error of the differences, the bootstrap 95% confidence
intervals and the bootstrap t-statistic assuming independent samples.
The bootstrap standard error, confidence interval, t-statistic and average
value are then reported for the paired samples. Finally, the proportions
of bootstrap differences that are as large in absolute value as, as large as,
and as small as the observed difference are given.
TIMING
COMMENTS
REFERENCES Tom A.B. Snijders and Stephen P. Borgatti (1999) Non-Parametric Standard Errors and Tests for Network Statistics. Connections 22(2): 1-11
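A rough Python sketch of the paired bootstrap follows. It resamples vertices with replacement and evaluates both networks on the same sample, so the pairing is preserved. The function names are hypothetical, and the handling of repeated vertices (pairs made of the same vertex drawn twice are simply skipped) is a simplifying assumption rather than the procedure of the cited paper.

```python
import random

def density(m, nodes):
    """Mean off-diagonal tie value among the sampled nodes.

    Pairs that repeat the same vertex are skipped (a simplification).
    """
    total = 0.0
    count = 0
    for a in range(len(nodes)):
        for b in range(len(nodes)):
            i, j = nodes[a], nodes[b]
            if a != b and i != j:
                total += m[i][j]
                count += 1
    return total / count if count else 0.0

def bootstrap_density_difference(m1, m2, n_samples=5000, seed=1):
    """Return (observed difference, bootstrap standard error, t-statistic)."""
    n = len(m1)
    everyone = list(range(n))
    observed = density(m1, everyone) - density(m2, everyone)
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_samples):
        sample = [rng.randrange(n) for _ in range(n)]  # same vertices for both
        diffs.append(density(m1, sample) - density(m2, sample))
    mean_diff = sum(diffs) / n_samples
    var = sum((d - mean_diff) ** 2 for d in diffs) / (n_samples - 1)
    se = var ** 0.5
    t = observed / se if se else float("inf")
    return observed, se, t
```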
TOOLS > STATISTICS > COMPARE DENSITIES > THEORETICAL PARAMETER
PURPOSE Give a statistical
test for the comparison of the density of a network to a theoretical
value.
DESCRIPTION This
routine uses a bootstrap technique to compare the density of a network
to a specified value. In essence a distribution is built up by sampling
the network with replacement from the vertices. There is an assumption
that vertices are interchangeable.
PARAMETERS 1st Network
Name of UCINET
dataset containing the network to be compared. Data type: Valued graph.
Expected Density
Value of the theoretical
parameter to which the observed value will be compared.
Number of Samples
Gives the number
of times sampling with replacement is used to construct the distribution.
LOG FILE The
output gives the parameter value and the density of the matrix together
with the difference and the number of samples taken. This is followed
by the actual variance and the classical estimate of the standard error.
The number of samples in the bootstrap are then reported together with
the estimated bootstrap standard error, z-score and average density.
Finally, the proportions of bootstrap differences that are as large in absolute
value as, as large as, and as small as the observed difference are given.
TIMING
COMMENTS
REFERENCES Tom
A.B. Snijders and Stephen P. Borgatti (1999) Non-Parametric Standard
Errors and Tests for Network Statistics. Connections 22(2): 1-11
TOOLS > STATISTICS > COMPARE AGGREGATE PROXIMITY MATRICES > PARTITION
PURPOSE Use a permutation test to compare proximity
matrices aggregated from a cognitive social structure into two mutually
exclusive groups.
DESCRIPTION To compare aggregated proximity matrices
from a partition of the respondents into two mutually exclusive groups
(e.g. male and female), we begin by correlating the two matrices (or computing
a dissimilarity measure). This is our observed test statistic. Then
we go back to the individual level data and divide the respondents into
two groups at random. We then aggregate the matrices separately for
each group, obtaining an aggregate proximity matrix for each group.
Next, we correlate these matrices (or compute dissimilarity measure)
and store the result. This process is repeated thousands of times to
generate a distribution of (dis)similarities under the null hypothesis
of independence (i.e., judged proximities are independent of gender).
We then count the proportion of correlations (or dissimilarity measures)
that are as small (or as large) as the observed measure. The proportion
of correlations as small as the observed (or, equally, the proportion
of dissimilarity coefficients as large as the observed) gives the p-value:
the likelihood that the difference we see could be obtained by chance.
Note that the aggregation is simply the mean of the matrices.
PARAMETERS Input Dataset
Name of dataset containing the cognitive social structure.
Data type: Valued
graph, multirelational.
Utilize diagonal values (Default=No)
If YES diagonal
values are included
Data are symmetric
(Default = No)
Partition Vector
The name of a UCINET
dataset. To partition the
matrices of the data matrix into groups, specify a blocking vector by
giving the dataset name, a dimension and an integer value. For example,
to use the second row of a dataset called ATTRIB, enter "ATTRIB
ROW 2". The program will then read the second row of ATTRIB and
use that information to sort the matrices. All matrices with identical
values on the criterion vector (i.e. the second row of attrib) will
be placed in the same group. There should only be two groups and so
the vector should only contain two different values. The partition can
also be typed in directly so that 1 1 2 1 2 2 2 places matrices 1,2
and 4 in one group and matrices 3,5,6 and 7 in the other group.
No. of permutations (Default =2000)
Number of Permutations
used in the permutation test.
Output Dataset (Default = 'agprox')
Name of file that
will contain the mean of the matrices corresponding to each group. Two
files will be produced one for each group and they will be called agprox1
and agprox2. These are not displayed in the logfile.
LOG FILE A listing
of the partitions used in the aggregation procedure, followed by the
sizes of the two groups, the number of observations and the number of
permutations used in the test. The observed correlation and Euclidean
distance are the values calculated between the two aggregated matrices.
This is followed by the average correlation and Euclidean distance over
all the random permutations. Finally, the proportions of permutations in which
the correlation and the Euclidean distance were as high or higher, and as low
or lower, than the observed values are given. These proportions are used to
determine the significance of the observed values.
TIMING O(N^2)
COMMENTS None
REFERENCES Borgatti,
S.P. () A Statistical Method for Comparing Aggregate Data Across a Priori
Groups
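The permutation procedure in the DESCRIPTION above can be sketched in Python. The sketch uses Euclidean distance as the dissimilarity measure and keeps the two group sizes fixed when reassigning respondents; both choices, and all function names, are assumptions made for illustration only.

```python
import random

def mean_matrix(mats):
    """Cell-by-cell mean of a list of equally sized square matrices."""
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]

def euclidean(a, b):
    """Euclidean distance between two matrices treated as vectors."""
    return sum((x - y) ** 2
               for ra, rb in zip(a, b) for x, y in zip(ra, rb)) ** 0.5

def compare_groups(mats, partition, n_perm=2000, seed=1):
    """Return (observed distance, share of random splits at least as distant)."""
    first, second = sorted(set(partition))  # exactly two group labels expected

    def stat(part):
        g1 = [m for m, p in zip(mats, part) if p == first]
        g2 = [m for m, p in zip(mats, part) if p == second]
        return euclidean(mean_matrix(g1), mean_matrix(g2))

    observed = stat(partition)
    rng = random.Random(seed)
    shuffled = list(partition)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)               # random reassignment of respondents
        hits += stat(shuffled) >= observed
    return observed, hits / n_perm
```

Here mats holds one matrix per respondent, and partition is a vector such as 1 1 2 1 2 2 2 in the form accepted by the Partition Vector parameter.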
TOOLS > STATISTICS > COMPARE AGGREGATE PROXIMITY MATRICES > OVERLAPPING GROUPS
PURPOSE Use a permutation test to compare proximity
matrices aggregated from a cognitive social structure into two groups
which may overlap.
DESCRIPTION To compare aggregated proximity matrices
from a division of the respondents into two possibly overlapping groups
(e.g. smokers and drinkers), we begin by correlating the two matrices (or
computing a dissimilarity measure). This is our observed test statistic.
Then we go back to the individual level data and divide the respondents
into two groups at random. We then aggregate the matrices separately
for each group, obtaining an aggregate proximity matrix for each group.
Next, we correlate these matrices (or compute dissimilarity measure)
and store the result. This process is repeated thousands of times to
generate a distribution of (dis)similarities under the null hypothesis
of independence (i.e., judged proximities are independent of group membership).
We then count the proportion of correlations (or dissimilarity measures)
that are as small (or as large) as the observed measure. The proportion
of correlations as small as the observed (or, equally, the proportion
of dissimilarity coefficients as large as the observed) gives the p-value:
the likelihood that the difference we see could be obtained by chance.
Note that the aggregation is simply the mean of the matrices.
PARAMETERS Input Dataset
Name of dataset containing the cognitive social structure.
Data type: Valued
graph, multirelational.
Utilize diagonal values (Default=No)
If YES diagonal
values are included
Data are symmetric
(Default = No)
Group Indicator Matrix
The name of a UCINET
dataset. This dataset must
contain a row for each actor and two columns representing the two groups.
The (i,j)th entry is a 1 if actor i is in group j (j= 1 or 2) and zero
otherwise. The matrix is simply a standard incidence matrix with two
columns.
No. of permutations (Default =2000)
Number of Permutations
used in the permutation test.
Output Dataset (Default = 'agprox')
Name of file that
will contain the mean of the matrices corresponding to each group. Two
files will be produced one for each group and they will be called agprox1
and agprox2. These are not displayed in the logfile.
LOG FILE A listing
of the partitions used in the aggregation procedure, followed by the
sizes of the two groups, the number of observations and the number of
permutations used in the test. The observed correlation and Euclidean
distance are the values calculated between the two aggregated matrices.
This is followed by the average correlation and Euclidean distance over
all the random permutations. Finally, the proportions of permutations in which
the correlation and the Euclidean distance were as high or higher, and as low
or lower, than the observed values are given. These proportions are used to
determine the significance of the observed values.
TIMING O(N^2)
COMMENTS None
REFERENCES Borgatti,
S.P. () A Statistical Method for Comparing Aggregate Data Across a Priori
Groups
TOOLS > STATISTICS > P1
PURPOSE Fits
the Holland and Leinhardt P_{1} model for binary networks.
DESCRIPTION All
dyads (i,j) in a sociometric choice matrix X can be classified as
mutual (x_{ij} = x_{ji} = 1), asymmetric (x_{ij}
not equal to x_{ji}), or null (x_{ij} = x_{ji} = 0). The probabilities of each type
of dyad are modelled as a function of three sets of substantive parameters:
expansiveness of each actor, popularity of each actor, and reciprocity.
The probabilities of mutual, asymmetric and null dyads, denoted m_{ij},
a_{ij}, and n_{ij} respectively, are modeled as follows:
m_{ij} = \lambda_{ij} \exp(\rho + 2\theta + \alpha_{i} + \alpha_{j} + \beta_{i} + \beta_{j})
a_{ij} = \lambda_{ij} \exp(\theta + \alpha_{i} + \beta_{j})
n_{ij} = \lambda_{ij}
In the equations, the \alpha parameters are interpreted as "productivity"
or "expansiveness" measures for each node. The \beta parameters are
interpreted as "attractiveness" or "popularity" measures. The \rho parameter
is interpreted as a general measure of the tendency towards "reciprocity"
or "mutuality" in the network. The \theta parameter is a function of the
density of the network, reflecting the total number of arcs observed. Finally,
the \lambda_{ij} parameters are normalizing constants used to ensure that
the modeled probabilities add to 1 for any given dyad.
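As a worked illustration of the model, the following Python sketch computes the four dyad probabilities from given parameter values. The names theta (density), rho (reciprocity), alpha (expansiveness), beta (popularity) and lam (the normalizing constant) are this sketch's labels for the parameters described above; the function itself is hypothetical and does not fit the model, it only evaluates the equations. The normalizing constant follows from the requirement that the mutual, the two asymmetric and the null probabilities of a dyad sum to 1.

```python
import math

def p1_dyad_probabilities(theta, rho, alpha, beta, i, j):
    """Return (m_ij, a_ij, a_ji, n_ij) for the dyad (i, j).

    alpha and beta are sequences indexed by actor.
    """
    w_ij = math.exp(theta + alpha[i] + beta[j])   # tie i -> j only
    w_ji = math.exp(theta + alpha[j] + beta[i])   # tie j -> i only
    w_m = math.exp(rho + 2 * theta
                   + alpha[i] + alpha[j] + beta[i] + beta[j])  # mutual tie
    lam = 1.0 / (1.0 + w_ij + w_ji + w_m)         # makes the four cases sum to 1
    return lam * w_m, lam * w_ij, lam * w_ji, lam
```

With all parameters set to zero, each of the four outcomes has probability 0.25; raising rho shifts probability towards the mutual outcome.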
PARAMETERS
Input Dataset:
Name of file that
contains network to be analyzed. Data type: Valued graph.
(Output) Parameter dataset (Default = 'Alphabet')
Name of file to
contain alpha and beta parameters.
(Output) Expected values (Default = 'P1Expect')
Name of file to
contain P1 expected values.
(Output) Residual values (Default = 'P1Resid')
Name of file to
contain P1 residuals.
LOG FILE G-squared
negative goodness-of-fit value with degrees of freedom. Probabilities
are not printed because the theoretical distribution governing these
values has not yet been established.
Values of the density parameter \theta and the reciprocity parameter \rho.
Expansiveness (\alpha) and popularity (\beta) parameters for each actor.
An nxn matrix containing
the P1 expected value between each pair of actors.
An nxn matrix of
residuals (observed data minus expected) between each pair of actors.
A single-link hierarchical clustering of the symmetrized residuals.
TIMING O(N^4).
COMMENTS The
model would be more useful if the distribution of G-squared were known:
as it is, we cannot say for certain when the model fits and when it
does not.
REFERENCES Holland
P and Leinhardt J (1981). "An Exponential Family of Probability
Distributions for Directed Graphs." Journal of the American Statistical
Association 76:33-6
TOOLS > MATRIX ALGEBRA
PURPOSE Command-driven
matrix algebra package.
DESCRIPTION Input
and output are UCINET datasets. Capabilities are divided into functions
and procedures, which have different syntax. Further, within functions
we can distinguish three basic types:
Unary Operations. Those that operate on a single dataset and take no
arguments (e.g. ABS, which takes the absolute value of every cell in the
matrix);
Binary Operations. Those that perform algebraic and arithmetic operations
on two or more datasets (e.g. ADD, which adds corresponding cells of two or
more matrices);
Inner Products. Those that perform arithmetic operations on various
dimensions (i.e. rows, columns, matrices) of a single dataset (e.g. TOTAL,
which sums values of a matrix broken out by row, column, level or
combinations of these).
When you choose
Algebra from the menu, a command window will open. You can
close the window by clicking on the close button. Commands are typed
in the command window; you can scroll back to previous commands by using
the up and down arrows.
The difference
in the two kinds of commands is reflected in their syntax.
1.
Functions
Functions have
this basic syntax:
<output matrix> = <function>(<arguments>)
In the documentation
to follow, an item enclosed in angle brackets denotes a name or other
input to be provided by the user. Hence, <output
matrix> refers to the name of a dataset to be supplied by the user.
Items enclosed in square brackets will denote optional arguments.
Anything else, such as an equal sign or parenthesis, is something to
be typed verbatim.
An example of valid
syntax for a function is this:
y = inverse(x)
In the example,
x is a pre-existing dataset in the current folder, inverse
is the name of a function, and y is the name of a yet-to-be-created
dataset to contain the inverse of the matrix in x. Datasets
may be named using their full pathnames, as in:
a:tdavis = transpose(c:\ucinet\data\davis)
Most functions
will have a single argument consisting of the name of an input matrix.
Others will have two or more arguments, again consisting of the names
of datasets. For instance, the syntax for the ADD command
is as follows:
<matrix> = add(<matrix1>,<matrix2>,...)
An example would
be:
mpx = add(business,marriage,friend)
A few functions
take other kinds of arguments. For example, to generate an identity
matrix with 5 rows and columns, you would type:
junk = identity(5)
2.
Procedures
The syntax for
procedures differs from functions in that there is no output matrix:
<procedure> <arguments>
An example is:
display padgett
Another example
is:
svd davis = u d v
This requests a
singular value decomposition of the matrix davis into three matrices
(datasets) to be called u, d, and v.
3.
Expressions
One useful fact
to remember is that whenever the syntax for a function or procedure
calls for the name of a matrix, a function may be substituted instead.
For example, the command
y = inverse(transpose(inf))
requests that the
inverse of the transpose of a matrix inf be calculated and saved
as dataset y. There is no limit to the amount of nesting.
For example, the following command is perfectly valid, though neither
efficient nor very readable:
b = prod(inv(prod(transp(x),x)),prod(transp(x),y))
A less error-prone
alternative would be the following series:
xt = transp(x)
xtx = prod(xt,x)
xty = prod(xt,y)
b = prod(inv(xtx),xty)
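For readers more familiar with a general-purpose language, the same four steps (which compute least-squares regression coefficients from x and y) can be mirrored in Python. The helper functions below are hypothetical stand-ins written for this sketch; they borrow the names transp, prod and inv only to make the correspondence visible and are not UCINET code.

```python
def transp(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def prod(a, b):
    """Ordinary matrix product a * b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def inv(a):
    """Inverse of a square non-singular matrix by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def regress(x, y):
    """Same sequence as the four commands: xt, xtx, xty, then b."""
    xt = transp(x)
    xtx = prod(xt, x)
    xty = prod(xt, y)
    return prod(inv(xtx), xty)
```

Naming the intermediate results, as in the command series, also makes it easy to inspect each step when something goes wrong.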
FURTHER INFORMATION
Unary Operations
Binary Operations
Inner Products
Procedures
TOOLS > SCATTERPLOT > DRAW
PURPOSE Plots
one matrix column against another in the (x,y) plane.
DESCRIPTION Plots
two specified columns of a matrix against each other. The x co-ordinates
(horizontal axis) are the elements of the first column and the y co-ordinates
(vertical axis) are the corresponding elements of the second column.
Points can be labeled using ASCII characters.
PARAMETERS
Input dataset:
Name of file containing
matrix with data to be plotted. Data type: Matrix.
Column to use for horizontal or x-axis: (Default = 1).
Column number for
horizontal axis.
Column to use for vertical or y-axis: (Default = 2).
Column to use for
vertical axis.
File containing point labels, if any:
If blank then points
are labeled by row number. If used, file should be ASCII and contain
the labels. The labels must be specified in a list, each separated
by a comma. The list must contain the same number of labels as there are
rows in the data matrix.
LOG FILE
A scatter plot with the tick marks on the axes. Each point on
the scatter plot is marked by the row of the column vectors or a label
from the label file. If two points have the same coordinates then
the label corresponding to the highest row number is used. The scatterplot
can be saved or printed. Simple editing can be achieved using the options
button. The labels can be turned on or off and values can be attached
to the points (or removed). The scales can also be changed. More advanced
editing is possible by double-clicking in the plot, which invokes the
chart wizard. To find the label attached to a single point when all
the labels are moved, click on a single point; this will highlight all
the points. Then click a second time to highlight one vertex. Now double-click
on the vertex and the label will be highlighted in the chart designer.
The save button and the save chart data option allow the user
to save all the chart data into a file which can be reviewed using
Tools>Scatterplot>Review. The chart itself can be saved as
a windows metafile which can then be read into a word processing or
graphics package. Only one chart can be open at one time and the
chart window will be closed if you click on any other UCINET window.
TIMING Linear
COMMENTS This
routine only works if the regional settings are set to UK or USA. If
you do not have these regional settings and do not get a plot then change
them in the settings control panel on your machine.
REFERENCES None.
TOOLS > SCATTERPLOT > REVIEW
PURPOSE Displays
previously filed scatter plots.
DESCRIPTION Scatter
plots can be saved as files and reviewed directly using this routine.
They are saved with the extension sdf.
PARAMETERS
Input dataset:
Name of scatterplot
file to be displayed.
LOG FILE None, but the scatterplot is displayed.
TIMING N/A
COMMENTS None
REFERENCES None
TOOLS > DENDROGRAM/TREE DIAGRAM > DRAW
PURPOSE Generates
a dendrogram or tree diagram from hierarchically nested partition data.
DESCRIPTION This
routine allows for the creation of the hierarchical cluster diagrams
from a UCINET generated partition matrix. It is also possible
to generate the diagrams from user defined partition matrices.
PARAMETERS
Input dataset
Name of file containing
a partition indicator matrix. A partition indicator matrix has
rows which correspond to different partitions and columns which represent
members of the groups. A value of k in row i and column j means
that actor j is in group k for the partition corresponding to row i.
All other actors in the same group should be assigned the same value
in row i. Each successive row must specify an increasingly finer
(or coarser) partition. The row labels (if specified) correspond
to the levels of the partition.
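The requirement that successive rows form increasingly finer (or coarser) partitions can be checked before submitting a user-defined matrix. The following Python sketch shows one way to do so; the function names are hypothetical and the check is only an illustration of the nesting condition, not part of UCINET.

```python
def is_refinement(fine, coarse):
    """True if every group of `fine` lies inside a single group of `coarse`."""
    parent = {}
    for f, c in zip(fine, coarse):
        # The first actor seen in group f fixes which coarse group it belongs to.
        if parent.setdefault(f, c) != c:
            return False
    return True

def is_nested(rows):
    """True if the rows get steadily coarser or steadily finer."""
    pairs = list(zip(rows, rows[1:]))
    coarser = all(is_refinement(a, b) for a, b in pairs)
    finer = all(is_refinement(b, a) for a, b in pairs)
    return coarser or finer
```

For example, the rows 1 2 3 4, then 1 1 2 2, then 1 1 1 1 are nested, whereas 1 1 2 2 followed by 1 2 2 1 are not.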
LOG FILE A
hierarchical clustering diagram, either a tree diagram or a dendrogram.
The plot re-orders the actors so that they are located close to other
actors in similar clusters. The level at which any pair of actors are
aggregated is the point at which both can be reached by tracing from
the start to the actors from right to left. The scale at the top gives
the level at which they are clustered. The diagram can be printed or
saved. Parts of the diagram can be viewed by moving the mouse to the
split point in a tree diagram or the beginning of a line in the dendrogram
and clicking. The first click will highlight a portion of the diagram
and the second click will display just the highlighted portion. To return
to the original, right-click the mouse. There is also a simple zoom
facility: change the values and then press Enter. If the labels
need to be edited (particularly the scale labels), take
the partition indicator matrix into the spreadsheet editor, remove or
reduce the labels and then submit the edited data.
TIMING Linear
COMMENTS None
REFERENCES None.
TOOLS > DENDROGRAM/TREE DIAGRAM > REVIEW
PURPOSE Displays
previously filed cluster diagrams.
DESCRIPTION Dendrograms
and tree diagrams can be saved as bitmap files and reviewed directly
using this routine. They are saved with the extension bmp.
PARAMETERS
Input bitmap filename:
Name of file
to be displayed.
LOG FILE None
TIMING N/A
COMMENTS None
REFERENCES None
UNARY OPERATIONS
ABSOLUTE - Syntax: abs(<mat>).
Takes the absolute value
of every value in <mat>. May be abbreviated to "ABS".
Example:
junk = abs(a:\atlanta\corrmat)
ARCTAN - Syntax: arc(<mat>).
Takes the arctangent of each value in <mat>. Example:
junk = arc(a:\atlanta\corrmat)
COMMON LOG - Syntax: log10(<mat>).
Takes the base 10 logarithm of each value of the argument. Example:
junk = log10(a:\atlanta\corrmat)
COSINE - Syntax: cos(<mat>).
Takes the cosine of each value in <mat>. Example:
junk = cos(a:\atlanta\corrmat)
EXPONENT - Syntax: exp(<mat>).
Raises e (the base of natural logarithms) to the power given by each
cell of the argument. Example:
junk = exp(a:\atlanta\corrmat)
FILL - Syntax: fill(<mat>,<nr>,<nc>).
Expands the matrix in <mat> to the dimensions given by <nr>
and <nc> by duplicating values. For example, given the matrix
      1 2 3
X =   4 5 6
      7 8 9
the command
y = fill(x,5,6)
yields:
      1 2 3 1 2 3
      4 5 6 4 5 6
Y =   7 8 9 7 8 9
      1 2 3 1 2 3
      4 5 6 4 5 6
GENERALISED INVERSE - Syntax: ginv(<mat>). Given a dataset <mat> containing
a matrix X (with at least as many rows as columns), the function computes
a generalized inverse X^- such that X^-X = I when X has full column rank,
where I is the identity matrix. Example:
junk = ginv(a:\atlanta\corrmat)
IDENTITY - Syntax: id(<n>).
Generates an identity matrix with <n> rows and columns.
Example:
i = id(100)
INVERSE - Syntax: inv(<mat>).
Given a dataset <mat> containing a square non-singular matrix
X, the function computes the inverse X^-1 such that XX^-1 = I, where
I is the identity matrix. If the matrix is not square, or is not of
full rank, use the generalized inverse ginv instead. Example:
junk = inv(a:\atlanta\corrmat)
LOG - See NATURAL LOG or COMMON LOG.
LINEAR - Syntax: lin(<mat>,<real>,<real>).
Given a dataset containing a matrix, the function performs a linear
transformation on every cell value. If a cell value is x, the function
forms <real1>*x + <real2>, where <real1> and <real2> are the first and
second <real> arguments. If the second is omitted, it is assumed to be
zero. Example:
junk = lin(a:\atlanta\corrmat,3.2,4)
creates a new matrix junk
which has each cell transformed by multiplying by 3.2 and adding 4.
MATRIX - Syntax: mat(<real>[,<nr>][,<nc>][,<nl>]).
Converts a number into a matrix, or creates a matrix of constants.
If <nr>, <nc>, and <nl> are not specified, the function
returns a 1-by-1 matrix containing the value <real>. The
parameter <nl> specifies the number of levels/matrices to create.
To specify <nl>, you must specify <nr> and <nc> as
well. Examples:
junk = mat(3.92) {creates 1-by-1 matrix}
junk = mat(4,10,10) {creates 10-by-10 matrix containing only 4s}
junk = mat(4,10,10,2) {creates 2 10-by-10 matrices containing only 4s}
This function is useful
for adding a constant to a matrix. For example,
junk = add(freqs,mat(0.01,8,10))
adds the constant 0.01
to every cell of the 8-by-10 matrix contained in freqs.
NATURAL LOG - Syntax: log(<mat>) or ln(<mat>).
Takes the natural logarithm of each value of the argument. Examples:
junk = log(a:\atlanta\corrmat)
junk = ln(a:\atlanta\corrmat)
NEGATIVE - Syntax: neg(<mat>).
Multiplies each value of <mat> by -1. Example:
revcorr = neg(a:\atlanta\corrmat)
RECIPROCAL - Syntax: rec(<mat>).
Takes the reciprocal (1/x) of each value of the argument. Example:
junk = rec(a:\atlanta\corrmat)
ROUND - Syntax: round(<mat>)
or rnd(<mat>). Rounds each value of <mat> to
the nearest integer. Example:
junk = rnd(a:\atlanta\corrmat)
SINE - Syntax: sin(<mat>).
Computes sine of each value in <mat>. Example:
junk = sin(a:\atlanta\corrmat)
SQUARE - Syntax: sqr(<mat>).
Computes square of each value in <mat>. Example:
junk = sqr(a:\atlanta\corrmat)
SQUARE ROOT - Syntax: sqrt(<mat>).
Computes square root of each value in <mat>. Example:
junk = sqrt(a:\atlanta\corrmat)
TRUNCATE - Syntax: trunc(<mat>) or
trnc(<mat>). Rounds each value of <mat> down to the
largest whole number contained by the value. Example:
junk = trunc(a:\atlanta\corrmat)
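Most of the functions above apply an ordinary arithmetic function to every cell. FILL is the least obvious of them, so a short Python sketch of its tiling behaviour may help. The function below is a hypothetical illustration that reproduces the worked example in the FILL entry; it is not UCINET's code.

```python
def fill(mat, nr, nc):
    """Tile `mat` (a list of rows) to nr rows and nc columns by repetition."""
    rows, cols = len(mat), len(mat[0])
    # Cell (i, j) of the result copies cell (i mod rows, j mod cols) of the input.
    return [[mat[i % rows][j % cols] for j in range(nc)] for i in range(nr)]
```

Applied to the 3-by-3 matrix X of the FILL entry with nr = 5 and nc = 6, the function returns the 5-by-6 matrix Y shown there.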
FURTHER INFORMATION
Binary Operations
Inner Products
Procedures
Matrix Algebra
BINARY OPERATIONS
AVERAGE - Syntax: avg(<mat1>,<mat2>,...).
Takes the average value of corresponding cells across two or more matrices. Example:
c = avg(a,b)
BOOLEAN PRODUCT - Syntax: bprod(<mat1>,<mat2>).
Boolean multiplication of two binary matrices. Example:
junk = bprod(business,marriage)
DIVIDE - Syntax: div(<mat1>,<mat2>).
Divides each cell of <mat1> by the corresponding cell of <mat2>.
Divisions by zero result in missing values. Example:
junk = div(c:\atlanta\corrmat,mcorr)
EQUAL - Syntax: eq(<mat1>,<mat2>,...).
Compares two or more matrices and puts a value of 1 where all matrices
have the same value and a 0 where any are different. For example,
typing
junk = eq(a,b)
gives a new binary matrix
called junk which has 1s in those cells where a and b
have the same value, and has 0s elsewhere.
GREATER THAN - Syntax: gt(<mat1>,<mat2>,...).
Compares two or more matrices, creating a new matrix which is 1 for
all cells where the first matrix is strictly larger than all subsequent
matrices, and 0 elsewhere.
c = gt(a,b)
In the example, the matrix
c will have 1s only in those cells where a dominates b.
GREATER THAN OR EQUAL TO - Syntax: ge(<mat1>,<mat2>,...).
Compares two
or more matrices, creating a new matrix which is 1 for all cells where
the first matrix is larger than or equal to all subsequent matrices,
and 0 elsewhere.
c = ge(a,b)
In the example, the matrix
c will have 1s only in those cells where a is not dominated
by b.
LESS THAN - Syntax: lt(<mat1>,<mat2>,...).
Compares two or more matrices, creating a new matrix which is 1 for
all cells where the first matrix is strictly less than all subsequent
matrices, and 0 elsewhere.
c = lt(a,b)
In the example, the matrix
c will have 1s only in those cells where a is dominated by
b.
LESS THAN OR EQUAL TO - Syntax: le(<mat1>,<mat2>,...).
Compares two
or more matrices, creating a new matrix which is 1 for all cells where
the first matrix is less than or equal to all subsequent matrices, and
0 elsewhere.
c = le(a,b)
In the example, the matrix
c will have 1s only in those cells where a is smaller than
or equal to the value of b.
MAXIMUM - Syntax: max(<mat1>,<mat2>,...).
Takes the largest value of corresponding cells across two or more matrices.
c = max(a,b)
MINIMUM - Syntax: min(<mat1>,<mat2>,...).
Takes the smallest value of corresponding cells across two or more matrices.
c = min(a,b)
MULTIPLY - Syntax: mul(<mat1>,<mat2>,...).
Multiplies the values of corresponding cells across two or more matrices (element-wise multiplication).
c = mul(a,b)
PRODUCT - Syntax: prod(<mat1>,<mat2>,...).
Matrix multiplication of two matrices. This is NOT element-wise
multiplication of corresponding values (see MULTIPLY). Example:
buskin = prod(business,marriage)
In the example, the
business matrix is post-multiplied by marriage.
SQUARED DIFFERENCE - Syntax: sqrdif(<mat1>,<mat2>,...).
Takes the squared difference of corresponding cells across two or more matrices.
c = sqrdif(a,b)
One application of this
function is to compare a data matrix with a predicted matrix, based
on a least squares criterion.
SUBTRACT - Syntax: sub(<mat1>,<mat2>,...).
Subtracts the values of corresponding cells of two or more matrices
from the first matrix mentioned.
c = sub(a,b)
In the example, the values
of b are subtracted from the values of a.
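Because MULTIPLY, PRODUCT and BOOLEAN PRODUCT are easily confused, the following Python sketch contrasts the three on two-matrix inputs. The functions are hypothetical illustrations named after the commands above, not UCINET code.

```python
def mul(a, b):
    """Element-wise multiplication of corresponding cells."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def prod(a, b):
    """Ordinary matrix multiplication a * b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def bprod(a, b):
    """Boolean product: 1 if any intermediate k has a[i][k] and b[k][j] both set."""
    return [[int(any(x and y for x, y in zip(row, col))) for col in zip(*b)]
            for row in a]
```

For binary networks, bprod(a, b) records whether at least one two-step path exists through the two relations, while prod(a, b) counts how many such paths there are.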
FURTHER INFORMATION
Unary Operations
Inner Products
Procedures
Matrix Algebra
INNER PRODUCTS
WAVERAGE - Syntax: wavg(<mat1>,[R|C|L] [R|C|L]). Averages the
values of <mat1>, with optional breakout by one or two dimensions.
Examples:
rowmeans = wavg(davis rows)
colmeans = wavg(davis cols)
density = wavg(davis)
avgtie = wavg(newcomb rows cols)
The last example averages
all matrices contained in the newcomb dataset to get a single
matrix. In other words, it takes a 3-dimensional table (rows,
columns and matrices) and aggregates across matrices to obtain a table
with just rows and columns.
TOTAL - Syntax: tot(<mat1>,[R|C|L] [R|C|L]). Adds
values of <mat1>, with optional breakout by one or two dimensions.
Examples:
rowsums = tot(davis rows)
colsums = total(davis cols)
nties = tot(davis)
allrels = tot(newcomb rows cols)
The last example totals
all matrices contained in the newcomb dataset to get a single
matrix. In other words, it takes a 3-dimensional table (rows,
columns and matrices) and aggregates across matrices to obtain a table
with just rows and columns.
WMAXIMUM - Syntax: wmax(<mat1> [r|c|l] [r|c|l]). Takes
the largest value within a dataset, optionally broken out by one
or more dimensions. Example:
rowmax = wmax(ron1 rows)
matmax = wmax(krack lev)
WMINIMUM - Syntax: wmin(<mat1> [r|c|l] [r|c|l]). Takes
the smallest value within a dataset, optionally broken out by one
or more dimensions. Example:
rowmin = wmin(ron1 rows)
matmin = wmin(krack lev)
TRANSPOSE - Syntax: transp(<mat> [<dim> <dim>]).
Exchanges any two dimensions of a dataset. If no dimensions are
given, rows and columns are assumed. Examples:
tdavis = transp(davis)
cent2 = transp(cent cols levs)
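As a rough illustration of what the breakout dimensions mean, the following plain-Python sketch mimics the totals, averages and transpose described above. The function names and the toy data are assumptions made for this example; they are not UCINET routines or the real davis and newcomb datasets.

```python
def tot(m, by=None):
    """Sum a 2-D matrix: by="rows" gives row sums, by="cols" column sums,
    and None the grand total."""
    if by == "rows":
        return [sum(row) for row in m]
    if by == "cols":
        return [sum(col) for col in zip(*m)]
    return sum(sum(row) for row in m)

def tot_rows_cols(stack):
    """Aggregate a stack of equally sized matrices cell by cell,
    keeping rows and columns (the 'rows cols' breakout)."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*stack)]

def transp(m):
    """Exchange rows and columns of a 2-D matrix."""
    return [list(col) for col in zip(*m)]

toy = [[1, 0, 1],
       [0, 1, 1]]
stack = [[[1, 0], [0, 1]],
         [[1, 1], [0, 0]]]

rowsums = tot(toy, "rows")
colsums = tot(toy, "cols")
nties = tot(toy)
density = nties / (len(toy) * len(toy[0]))      # average over all cells
allrels = tot_rows_cols(stack)                  # sum across matrices
avgtie = [[cell / len(stack) for cell in row] for row in allrels]
ttoy = transp(toy)
```

With the toy stack, `allrels` is [[2, 1], [0, 1]] and `avgtie` is the same table divided by the number of matrices.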
FURTHER INFORMATION
Unary Operations
Binary Operations
Procedures
Matrix Algebra
PROCEDURES
In this section we document
each ALGEBRA procedure individually, giving the syntax and a
brief description for each one. The syntax gives the minimum abbreviation
and any alternate spellings. The procedures are arranged in alphabetical
order by concept.
CHANGE FOLDER - Syntax: cd <drive:\folder>. Change default folder (and/or drive).
Affects where UCINET will look for data and where data will be saved.
cd\ucinet\data
cd a:
DISPLAY - Syntax: disp <mat>
or dsp <mat>. Displays all cells of <mat> to
the screen.
dsp c:\ucinet\data\padgett
dsp ginv(transp(davis))
LET - Syntax: let <function call>.
Technically, the LET command is always implicit before any function
statement. For example, the following two commands are identical:
xtx = prod(transp(x),x)
let xtx = prod(transp(x),x)
The only reason to use
LET is if your output dataset has the same name as an ALGEBRA
procedure, which would confuse the interpreter. For example, the
following command would NOT create a dataset called "DSP":
dsp = inverse(xtx)
Instead, the interpreter
would assume that you wanted to display a matrix called "= inverse(xtx)".
However, the following would work:
let dsp = inverse(xtx)
QUIT - Syntax: quit or exit.
Leave ALGEBRA and close the matrix algebra windows. Usage:
exit
quit
SINGULAR VALUE DECOMPOSITION - Syntax: svd <amat> = <umat> <dmat> <vtmat>,
where <amat> is an m-by-n data matrix of rank r, <umat>
will be an m-by-r output matrix, <dmat> will be a diagonal r-by-r
output matrix, and <vtmat> will be an n-by-r output matrix.
The program requires m ≥ n. Usage:
svd davis = u d vt
The <umat> and <vtmat>
matrices are often referred to as "row scores" and "column
scores" respectively. The <dmat> matrix contains singular
values down the main diagonal and zeros elsewhere.
The singular value decomposition
of a square, symmetric matrix gives row and column scores equal to the
eigenvectors of the matrix, and the singular values are the absolute values of its eigenvalues.
The SVD of any matrix X gives row scores equal to the eigenvectors of
XX' and column scores equal to the eigenvectors of X'X. The singular
values of X are the square roots of the nonzero eigenvalues of both XX' and X'X.
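The link between singular values and the eigenvalues of X'X can be checked by hand on a small case. The sketch below is not how UCINET computes the decomposition; it uses the closed-form eigenvalues of a symmetric 2-by-2 matrix, and the matrix `x` and helper `sym2_eigenvalues` are made up for the illustration.

```python
import math

def sym2_eigenvalues(s):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix [[p, q], [q, r]]."""
    p, q, r = s[0][0], s[0][1], s[1][1]
    mean = (p + r) / 2
    half = math.hypot((p - r) / 2, q)
    return mean + half, mean - half

x = [[3.0, 0.0],
     [4.0, 5.0]]

# X'X: entry (i, j) is the dot product of columns i and j of X.
xtx = [[sum(x[k][i] * x[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]

eigs = sym2_eigenvalues(xtx)                     # eigenvalues of X'X
singular_values = [math.sqrt(e) for e in eigs]   # singular values of X

# Sanity check: the product of the singular values equals |det X|.
det_x = x[0][0] * x[1][1] - x[0][1] * x[1][0]
```

For this matrix X'X is [[25, 20], [20, 25]] with eigenvalues 45 and 5, so the singular values of X are the square roots of 45 and 5, and their product is 15, the absolute determinant of X.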
FURTHER INFORMATION
Unary Operations
Binary Operations
Inner Products
Matrix Algebra
NETWORK > COHESION > DISTANCE
PURPOSE Constructs
a distance or generalized distance matrix between all nodes of a graph.
Allows for transformation of this matrix from distance to nearness.
DESCRIPTION The
length of a path is the number of edges it contains. The distance
between two nodes is the length of the shortest path. The generalized
distance is the length of an optimum path.
This optimum can be any of the following:
The cost of a path
is the sum of all values on the edges of a path. The optimum is
the cheapest cost.
The strength of
a path is the strength of its weakest link. The optimum is the
strongest path.
The probability
of a path is the product of the probabilities of its edges. The
optimum is the most probable path.
If there is more
than one optimum path then the algorithm uses the shortest optimum path.
For a binary adjacency matrix distance and generalized distance will
be equivalent.
The distance matrix
can be converted to a nearness matrix by means of a nearness transformation.
This transformation can be achieved by taking reciprocals, linear transformations,
exponentiation or frequency decays.
PARAMETERS
Input dataset
Name of file containing
dataset to be analyzed. Data type: Valued graph.
Type of Data: (Default = ADJACENCY)
Choices are:
Adjacency - standard binary data, distance corresponds to graph theoretic geodesic.
Strengths - values indicate strengths or capacities of links between nodes. Optimum is strongest path.
Costs - values indicate costs or lengths of links between nodes. Optimum is the cheapest path.
Probabilities - values indicate probability of link
and restricted to [0,1]. Optimum is most probable path.
Nearness transformation: (Default = NONE)
Converts distance matrix to a nearness matrix by a variety of methods.
Choices are:
None
- no transformation is
applied and raw distances are given as output.
Multiplicative - distances between nodes are divided
into the largest possible distance. New values are given by Yij
= (N-1)/Dij.
Additive - distances between nodes are subtracted
from the total number of nodes. New values are given by Yij =
N - Dij.
Linear
- distances between nodes are transformed linearly into [0,1].
New values are given by Yij = 1 - (Dij - 1)/(N-1).
Exponential
- distances between nodes are transformed using exponential decay.
New values are given by Yij = b^Dij. The attenuating factor b
is selected by the user and should satisfy 0 < b < 1.
Freq
Decay - Uses Burt's
1976 frequency decay function. The nearness of i and j is one
minus the proportion of actors that are as close to i as j is.
Attenuation Factor: (Default = 0.5)
Value of the attenuation
factor b
when exponential is chosen. Larger values give slower decay.
Output dataset: (Default = 'GeodesicDistance')
Name of data file
containing distance matrix.
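For binary adjacency data, the distance matrix and the simpler nearness transformations listed above can be sketched as follows. This is an illustrative Python sketch, not the UCINET implementation: the function names are hypothetical, unreachable pairs are marked with None, the frequency decay option is omitted, and the treatment of the diagonal is left out because the text does not specify it.

```python
from collections import deque

def geodesic_distances(adj):
    """All-pairs shortest path lengths by breadth-first search.
    dist[i][j] is None when j cannot be reached from i."""
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[s][v] is None:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist

def nearness(d, n, method, b=0.5):
    """Transform one distance d (between distinct nodes) in a graph of n nodes."""
    if method == "multiplicative":
        return (n - 1) / d
    if method == "additive":
        return n - d
    if method == "linear":
        return 1 - (d - 1) / (n - 1)
    if method == "exponential":
        return b ** d
    raise ValueError(method)

# Hypothetical example: a path 0 - 1 - 2 - 3.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
dist = geodesic_distances(path)
far = dist[0][3]                              # the two end nodes
near_mult = nearness(far, 4, "multiplicative")
near_add = nearness(far, 4, "additive")
near_lin = nearness(far, 4, "linear")
near_exp = nearness(far, 4, "exponential", b=0.5)
```

Adjacent nodes get the largest nearness under every transformation; for example the linear option maps a distance of 1 to 1.0.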
LOG FILE Matrix
of distances between all pairs of nodes.
TIMING O(N^3)
COMMENTS Note that the distances correspond to the number of links and not the optimum values.
Optimum values
are calculated by NETWORK>COHESION>REACHABILITY.
REFERENCES Doreian
P (1974). 'On the connectivity of social networks'. Journal of
Mathematical Sociology, 3, 245-258.
Burt R (1976).
'Positions in networks'. Social Forces, 55, 93-122.
NETWORK > COHESION > NO. OF GEODESICS
PURPOSE Counts
the number of geodesics connecting all pairs of vertices.
DESCRIPTION A
geodesic is a shortest path. There may be more than one shortest path
connecting any two vertices. This procedure gives the number of shortest
paths connecting all pairs of vertices.
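One common way to count shortest paths is to extend breadth-first search so that each node accumulates the counts of its predecessors. The sketch below illustrates that idea in Python; it is an assumption that this resembles what the procedure does, and the function name and the convention of reporting 1 on the diagonal are choices made for the example.

```python
from collections import deque

def count_geodesics(adj):
    """counts[i][j] = number of shortest paths from i to j (0 if unreachable)."""
    n = len(adj)
    counts = []
    for s in range(n):
        dist = {s: 0}
        sigma = [0] * n          # sigma[v]: shortest paths from s to v found so far
        sigma[s] = 1
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if not adj[u][v]:
                    continue
                if v not in dist:                 # first time v is reached
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:        # u lies on a shortest path to v
                    sigma[v] += sigma[u]
        counts.append(sigma)
    return counts

# Hypothetical example: a four-node ring 0 - 1 - 2 - 3 - 0.
ring = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]
geodesic_counts = count_geodesics(ring)
```

In the ring, opposite nodes such as 0 and 2 are joined by two geodesics, one through each neighbour.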
PARAMETERS
Input dataset:
Name of file containing
network data. Data type: Digraph.
Output Filename: (Default = 'GeodesicsCount').
Name of dataset
containing counts of geodesics for every pair of vertices.
LOG FILE An
nxn matrix in which row i column j gives the number of geodesics connecting
i to j.
TIMING O(N^4).
COMMENTS None.
REFERENCES None.
NETWORK > COHESION > REACHABILITY
PURPOSE Constructs
a matrix of reachability values for every pair of nodes.
DESCRIPTION The
reachability for a pair of nodes is the value of an optimum path.
The algorithm produces
a value in row i, col j of a matrix if node j is reachable from node
i and a blank otherwise.
This value can be any of the following:
The length of the shortest path.
The cost of the cheapest path, where the cost is the sum of all the values.
The strength of
the strongest path, where the strength is the value of the weakest link.
The probability of the most 'probable' path, where the probability of a path is the product of the probabilities of its edges.
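The three valued optimum criteria differ only in how edge values are combined along a path and in which path value counts as better. A minimal Python sketch, assuming a Floyd-Warshall style update (the algorithm UCINET actually uses is not stated here) and using None for a missing edge or an unreachable pair:

```python
def optimum_values(w, kind):
    """w[i][j] is the value of the edge from i to j, or None if there is no edge.
    Returns the value of the optimum path for every ordered pair."""
    combine, better = {
        "cost": (lambda a, b: a + b, min),          # sum of values, cheapest wins
        "strength": (min, max),                     # weakest link, strongest wins
        "probability": (lambda a, b: a * b, max),   # product, most probable wins
    }[kind]
    n = len(w)
    best = [row[:] for row in w]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i == j or best[i][k] is None or best[k][j] is None:
                    continue
                via = combine(best[i][k], best[k][j])
                best[i][j] = via if best[i][j] is None else better(best[i][j], via)
    return best

# Hypothetical directed example: 0 -> 1 -> 2 plus a direct edge 0 -> 2.
values = [[None, 2, 10],
          [None, None, 3],
          [None, None, None]]
probs = [[None, 0.9, 0.5],
         [None, None, 0.8],
         [None, None, None]]

cheapest = optimum_values(values, "cost")
strongest = optimum_values(values, "strength")
likeliest = optimum_values(probs, "probability")
```

From node 0 to node 2 the cheapest path goes through node 1 (cost 5), the strongest path is the direct edge (strength 10), and the most probable path goes through node 1 (about 0.72). Node 0 is not reachable from node 2, so that cell stays blank (None).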
PARAMETERS
Input dataset
Name of file containing
dataset to be analyzed. Data type: Valued graph.
Type of Data: (Default = ADJACENCY)
Choices are:
Adjacency - standard binary data, distance corresponds
to graph theoretic geodesic.
Strengths
- values indicate strengths or capacities of links between nodes. Optimum is
strongest path.
Costs - values indicate costs or lengths of links
between nodes. Optimum is the cheapest path.
Probabilities - values indicate probability of link
and restricted to [0,1]. Optimum is most probable path.
Output dataset: (Default = 'Reachability')
Name of data file
containing reachability matrix.
LOG FILE Matrix
of reachability values between all pairs of nodes.
TIMING O(NLOGN)
COMMENTS None, but see the comments on
NETWORK>COHESION>DISTANCE.
REFERENCES Doreian
P (1974). 'On the connectivity of Social Networks'. Journal
of Mathematical Sociology, 3, 245-258.
NETWORK > COHESION > MAX FLOW
PURPOSE Compute
the maximum flow (= the minimum cut) between all pairs of nodes in a
network.
DESCRIPTION In
a valued or binary network the value of each edge (1 or 0 for binary
networks) can represent a capacity. Let c(x) denote the capacity of
each edge of a network N. A flow in N between two nodes s and
t is a function f such that
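0 ≤ f(x) ≤ c(x) for every edge x, and
for every node z ≠ s, t: Σ f(yz) = Σ f(zw), where the first sum runs over
all edges yz entering z and the second over all edges zw leaving z. In other words,
for each node except s and t, the total amount of flow into the
node equals the total flow leaving the node.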
The total flow
leaving s is the same as that going into t; this quantity is called the
value of the flow. The maximum flow is simply the maximum value
possible between two vertices.
This procedure
uses the algorithm due to Gomory and Hu to compute the maximum flow
between all pairs of vertices of a symmetric graph.
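To make the definition concrete, the following Python sketch computes the maximum flow for a single pair of nodes. Note that it is not the Gomory and Hu procedure named above: it uses the simpler augmenting-path method of Edmonds and Karp, chosen here only because it is short. The function name and the example capacities are assumptions.

```python
from collections import deque

def max_flow(cap, s, t):
    """Maximum flow from s to t for an integer capacity matrix cap."""
    n = len(cap)
    residual = [row[:] for row in cap]
    total = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if residual[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:          # no augmenting path left
            return total
        # Find the smallest spare capacity along the path.
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push that amount of flow along the path.
        v = t
        while v != s:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

# Hypothetical symmetric network with integer values.
cap = [[0, 3, 2, 0],
       [3, 0, 1, 2],
       [2, 1, 0, 3],
       [0, 2, 3, 0]]
flow_0_3 = max_flow(cap, 0, 3)
flow_3_0 = max_flow(cap, 3, 0)
```

The edges leaving node 0 have total capacity 5, and a flow of 5 can be routed to node 3, so the maximum flow equals that cut. Because the matrix is symmetric, the value is the same in both directions.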
PARAMETERS
Input dataset
Name of file containing
network to be analyzed. Data type: Valued graph - symmetric matrix only
with integer values.
Output Filename (Default = 'MaxFlow').
Name of data file
containing maximum flows between all pairs of vertices.
LOG FILE The
Input dataset followed by an nxn matrix in which row i column j gives
the value of the maximum flow from vertex i to vertex j (i ≠ j).
TIMING O(N^4).
COMMENTS The
maximum flow in a network is equal to the minimum cut. A cut between
two vertices s and t is a collection of edges which contains an edge
from every s-t path. The value of a cut is the sum of the value
of the edges. A minimum cut is the minimum value of all possible cuts
between two vertices. For a binary network this value is called the
local edge connectivity.
REFERENCES Ford
L R and Fulkerson D R (1956). 'Maximum flow through a network'.
Canadian Journal of Mathematics, 8, 399-404.
Gomory R E and
Hu T C (1964). 'Synthesis of a communication network'. Journal
of SIAM (Appl Math), 12, 348.
NETWORK > COHESION > POINT CONNECTIVITY
PURPOSE Compute
the local point connectivity between all pairs of nodes in a network.
DESCRIPTION The
local (point) connectivity of two non-adjacent vertices is the number
of vertices that need to be deleted so that no path connects them. This
is equal to the maximum number of vertex-disjoint paths connecting them.
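The definition can be applied literally on small graphs by trying ever larger sets of vertices to delete. The Python sketch below does exactly that; it is a brute-force illustration with exponential running time, not the method used by the program, and the function names are hypothetical.

```python
from itertools import combinations

def connected(adj, s, t, removed):
    """True if t can be reached from s without passing through removed vertices."""
    stack, seen = [s], {s}
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for v in range(len(adj)):
            if adj[u][v] and v not in seen and v not in removed:
                seen.add(v)
                stack.append(v)
    return False

def point_connectivity(adj, s, t):
    """Smallest number of other vertices whose deletion leaves no path from s to t.
    Returns None for adjacent vertices, which no vertex deletion can separate."""
    others = [v for v in range(len(adj)) if v not in (s, t)]
    for size in range(len(others) + 1):
        for cut in combinations(others, size):
            if not connected(adj, s, t, set(cut)):
                return size
    return None

# Hypothetical examples: a four-node ring and a three-node path.
ring = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
ring_value = point_connectivity(ring, 0, 2)
path_value = point_connectivity(path, 0, 2)
```

Opposite nodes of the ring are joined by two vertex-disjoint paths, so both neighbours must be deleted to separate them; the end nodes of the path are separated by deleting the middle node.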
PARAMETERS
Input dataset
Name of file containing
network to be analyzed. Data type: Digraph
Output Filename (Default = 'PointConnectivity').
Name of data file
containing the local point connectivities between all pairs of vertices.
LOG FILE An
nxn matrix in which row i column j gives the local point connectivity
from vertex i to vertex j (i ≠ j). This value is precisely the
maximum number of vertex independent paths from i to j.
TIMING O(N^4).
COMMENTS None
REFERENCES None
NETWORK > REGIONS > COMPONENTS > SIMPLE GRAPHS
PURPOSE Identify
the components of an undirected graph, or the weak or strong components
of a directed graph.
DESCRIPTION In
an undirected graph two vertices are members of the same component if
there is a path connecting them. In a directed graph two vertices are
in the same weak component if there is a semi-path connecting them.
Two vertices x and y are in the same strong component if there is a
path connecting x to y and a path connecting y to x.
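The two definitions translate directly into a reachability test. The following Python sketch labels components in the same form as the partition vector described below (a j in position i means node i is in component j); it is a simple quadratic illustration with hypothetical function names, not the program's own routine.

```python
def reachable(adj, s):
    """Set of nodes that can be reached from s by following directed ties."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v, tie in enumerate(adj[u]):
            if tie and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def components(adj, kind="strong"):
    """Partition vector of strong or weak components."""
    n = len(adj)
    if kind == "weak":
        # A semi-path ignores direction, so symmetrize the ties first.
        adj = [[adj[i][j] or adj[j][i] for j in range(n)] for i in range(n)]
    reach = [reachable(adj, s) for s in range(n)]
    part = [0] * n
    label = 0
    for i in range(n):
        if part[i] == 0:
            label += 1
            for j in reach[i]:
                if i in reach[j]:        # paths in both directions
                    part[j] = label
    return part

# Hypothetical digraph: 0 <-> 1, 1 -> 2, and node 3 isolated.
digraph = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
strong = components(digraph, "strong")
weak = components(digraph, "weak")
```

Node 2 can be reached from nodes 0 and 1 but cannot reach them, so it shares their weak component but forms a strong component of its own.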
PARAMETERS
Input dataset:
Name of file containing
network data to be analyzed. Data type: Directed graph.
Minimum Size to save: (Default = 3)
Size of smallest
component which is to be saved in the component by actor incidence matrix
specified below.
Kind of components: (Default = Strong)
For directed data
specify whether Strong or Weak components are required.
For undirected data either choice will yield the components.
Output sets: (Default = 'SubgroupComponentsSets')
Name of file which
will contain a component by actor incidence matrix. A 1 in row
i column j means that node j is in component i. This file is not
displayed in the LOG FILE.
Output Partition: (Default = 'SubgroupComponentsPart')
Name of file which
will contain a partition vector. A j in the ith position means
that node i is a member of component j. This file is not displayed
in the LOG FILE.
LOG FILE Number of components found.
List of all nodes indicating which labeled component each node is in.
List of components
greater than minimum size, labeled - each component is specified by
the vertices it contains.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
NETWORK > REGIONS > COMPONENTS > VALUED GRAPHS
PURPOSE Identify
the weak components corresponding to each cut-off value of a weighted
graph.
DESCRIPTION In
a valued graph, the set of dichotomized graphs corresponding to each
possible weight form a nested sequence of graphs. The weak components
of each of these would also be nested and can be combined to form an
hierarchical clustering of weak components. Once two nodes have been
placed in the same weak component of a dichotomized graph for a particular
cut-off value they remain in the same weak component for all smaller
cut-off values. This procedure produces a hierarchical clustering based
on these facts.
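The nesting can be seen by dichotomizing at each distinct value in turn and recording the weak components each time. The Python sketch below assumes that a tie is kept when its value is greater than or equal to the cut-off, which the text does not state explicitly; the function name and example weights are also assumptions.

```python
def weak_components_by_cutoff(w):
    """Return {cutoff: partition vector} for every distinct positive value in w,
    treating a tie as present when its value is >= cutoff."""
    n = len(w)
    cutoffs = sorted({v for row in w for v in row if v > 0}, reverse=True)
    result = {}
    for c in cutoffs:
        parent = list(range(n))

        def find(x):
            # Follow parent links to the representative of x's group.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for i in range(n):
            for j in range(n):
                if i != j and w[i][j] >= c:
                    parent[find(i)] = find(j)     # merge the two groups
        labels = {}
        result[c] = [labels.setdefault(find(i), len(labels) + 1) for i in range(n)]
    return result

# Hypothetical valued graph: ties 0-1 (3), 1-2 (2), 2-3 (1).
w = [[0, 3, 0, 0],
     [3, 0, 2, 0],
     [0, 2, 0, 1],
     [0, 0, 1, 0]]
levels = weak_components_by_cutoff(w)
```

Reading the result from the highest cut-off to the lowest, groups only ever merge: nodes 0 and 1 join at 3, node 2 joins them at 2, and node 3 joins at 1.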
PARAMETERS Input valued network
Name of file containing
valued digraph. Data type: Valued graph.
Output Dataset (Default = 'hicomp')
Name of dataset
to contain the partition indicator matrix. Each column of this matrix
gives the component to which each actor was assigned in a given level.
The columns are labeled by the corresponding cut-off value. A
value of k in a column labeled x and row j means that actor j was in
component k at cut-off value x.
LOG FILE Hierarchical
clustering diagram of the components. The columns are rearranged and
labeled. A '·' in row label i column label j means
that vertex j was not in a weak component with any other vertex (i.e.
it was an isolate) using a cut-off value of i. An 'X' indicates
that vertex j was in a non-trivial weak component with all vertices
on the same row as j which can be found by tracing across that row without
encountering a space.
TIMING O(N^4),
actually N^2 times number of different values.
COMMENTS None
REFERENCES None
NETWORK > REGIONS > BICOMPONENTS
PURPOSE Finds
all the bi-components or blocks of a graph.
DESCRIPTION A
cutpoint of a graph is a vertex whose removal increases the number of
components. A non-separable graph is a graph that is connected non-trivially
and has no cutpoints. A block or bicomponent of a graph is a maximal
non-separable subgraph. The name bi-component reflects the fact that
it requires the deletion of two vertices to disconnect it. Bi-components
overlap, but a vertex that is in more than one bi-component must be
a cutpoint.
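Blocks and cutpoints are commonly found with a single depth-first search that keeps a stack of edges (the approach usually attributed to Hopcroft and Tarjan). Whether the program uses this method is not stated; the Python sketch below is an illustration under that assumption. In this sketch a single bridging edge counts as a block of two vertices and isolates produce no block.

```python
def bicomponents(adj):
    """Return (blocks, cutpoints) of an undirected graph given as an adjacency matrix."""
    n = len(adj)
    disc, low = [0] * n, [0] * n      # discovery order and lowest reachable order
    blocks, cutpoints, edge_stack = [], set(), []
    counter = [0]

    def visit(u, parent):
        counter[0] += 1
        disc[u] = low[u] = counter[0]
        children = 0
        for v in range(n):
            if not adj[u][v] or v == parent:
                continue
            if disc[v] == 0:                          # tree edge to a new vertex
                children += 1
                edge_stack.append((u, v))
                visit(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:                 # v's subtree cannot bypass u
                    if parent is not None or children > 1:
                        cutpoints.add(u)
                    block = set()
                    while True:
                        a, b = edge_stack.pop()
                        block.update((a, b))
                        if (a, b) == (u, v):
                            break
                    blocks.append(sorted(block))
            elif disc[v] < disc[u]:                   # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for s in range(n):
        if disc[s] == 0:
            visit(s, None)
    return blocks, sorted(cutpoints)

# Hypothetical example: two triangles {0,1,2} and {2,3,4} sharing vertex 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
bowtie = [[0] * 5 for _ in range(5)]
for a, b in edges:
    bowtie[a][b] = bowtie[b][a] = 1
blocks, cutpoints = bicomponents(bowtie)
```

The shared vertex 2 appears in both blocks and is the only cutpoint, as the description above requires.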
PARAMETERS
Input dataset:
Name of file containing
graph to be analyzed. Data type: Graph.
Output dataset (Default = 'Blocks')
Name of file that
will contain a block by actor incidence matrix. A 1 in row i column
j means that node j is in bi-component i. This file is not displayed
in the LOG FILE.
LOG FILE Number of bi-components found.
List of bi-components,
labeled - each bi-component is specified by the vertices it contains.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
NETWORK > REGIONS > K-CORES
PURPOSE List
all k-cores of a graph.
DESCRIPTION A
k-core in an undirected graph is a connected maximal induced subgraph
which has minimum degree greater than or equal to k. This procedure
finds all k-cores for every possible value of k.
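A standard way to obtain k-cores for every k at once is to repeatedly remove a node of smallest remaining degree; the largest degree seen at removal time is that node's core number. The Python sketch below illustrates this peeling idea. It is an assumption that the program works similarly, and the function names are hypothetical. The k-cores in the sense defined above are then the connected pieces of the set of nodes whose core number is at least k.

```python
def core_numbers(adj):
    """core[v] = largest k such that v belongs to a k-core."""
    n = len(adj)
    degree = [sum(1 for v in range(n) if adj[u][v] and v != u) for u in range(n)]
    remaining = set(range(n))
    core = [0] * n
    k = 0
    while remaining:
        u = min(remaining, key=lambda x: degree[x])   # peel a lowest-degree node
        k = max(k, degree[u])
        core[u] = k
        remaining.remove(u)
        for v in remaining:
            if adj[u][v]:
                degree[v] -= 1
    return core

def core_members(core, k):
    """Nodes with minimum degree k inside the subgraph they induce."""
    return [v for v, c in enumerate(core) if c >= k]

# Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to 2.
graph = [[0, 1, 1, 0],
         [1, 0, 1, 0],
         [1, 1, 0, 1],
         [0, 0, 1, 0]]
core = core_numbers(graph)
two_core = core_members(core, 2)
```

The pendant node has only one tie, so it belongs to the 1-core but drops out of the 2-core, which is the triangle.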
PARAMETERS
Input dataset:
Name of
file containing data to be analyzed. Data type: Graph.
Output dataset: (Default = 'Kcores')
Name of file which
will contain a k-core by actor partition matrix. The partition by actor
matrix is defined as follows: a value of k in a column labeled
i and row labeled j means that node j is in partition k for the i-core
partition. From the remarks above it follows that if there is only one
value k in the column labeled i then node j is not a member of any i-core.
Otherwise all other members of j's i-core will have a value of k in
the same column.
LOG FILE A
single link hierarchical clustering dendrogram. The actors are
re-ordered so that they are located close to other actors in similar
k-cores. The level at which any pair of actors are aggregated is the
point at which both can be reached by tracing from the start to the
actors from right to left. The scale at the top gives the level at which
they are clustered. The diagram can be printed or saved. Parts of the
diagram can be viewed by moving the mouse to the beginning of
a line in the dendrogram and clicking. The first click will highlight
a portion of the diagram and the second click will display just the
highlighted portion. To return to the original, right-click the mouse.
There is also a simple zoom facility: simply change the values and then
press enter. If the labels need to be edited (particularly the scale
labels) then you should take the partition indicator matrix into the
spreadsheet editor, remove or reduce the labels, and then submit the edited
data to Tools>Dendrogram>Draw.
In the clustering diagram each level corresponds to a different value
of 'k' in k-core. Behind the dendrogram is a clustering diagram representing
the same thing. Each row is labeled by the possible values of k.
The columns are rearranged and labeled. A
'·' in row i column label j indicates that
vertex j is not in any i-core. An 'X' indicates that vertex j
is in an i-core, all other members of j's i-core are found by tracing
along row i in both directions from column j until a space is encountered
in each direction. The column labels corresponding to an 'X' which
are connected to j's 'X' are all members of j's i-core.
TIMING O(N^3)
COMMENTS K-Cores
are not necessarily cohesive subsets but they do identify areas of the
graph which contain clique like structures.
REFERENCES Seidman
S (1983). 'Network structure and minimum degree'. Social
Networks, 5, 269-287.
NETWORK > SUBGROUPS > CLIQUES
PURPOSE Find
all cliques in a network.
DESCRIPTION A clique is a maximal complete subgraph.
The program implements
the Bron and Kerbosch (1973) algorithm to find all Luce and Perry (1949)
cliques greater than a specified size. The routine will also provide
an analysis of the overlapping structure of the cliques. This
analysis gives information on the number of times each pair of actors
are in the same clique, and gives a hierarchical clustering based
upon this information. It also performs the dual operation by examining
the number of actors a pair of cliques has in common. This too is submitted
to an hierarchical clustering routine.
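The basic form of the Bron and Kerbosch recursion, together with the actor by actor co-membership count, can be sketched in Python as follows. This is the textbook version without pivoting, given as an illustration rather than as the program's code; the function names are hypothetical.

```python
def bron_kerbosch(adj, min_size=3):
    """All maximal complete subgraphs with at least min_size members."""
    n = len(adj)
    nbrs = [{v for v in range(n) if adj[u][v] and v != u} for u in range(n)]
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: vertices already tried
        if not p and not x:
            if len(r) >= min_size:
                cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(n)), set())
    return sorted(cliques)

def comembership(cliques, n):
    """m[i][j] = number of cliques containing both i and j (diagonal: cliques containing i)."""
    m = [[0] * n for _ in range(n)]
    for clique in cliques:
        for i in clique:
            for j in clique:
                m[i][j] += 1
    return m

# Hypothetical example: two triangles {0,1,2} and {2,3,4} sharing vertex 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
bowtie = [[0] * 5 for _ in range(5)]
for a, b in edges:
    bowtie[a][b] = bowtie[b][a] = 1
cliques = bron_kerbosch(bowtie)
overlap = comembership(cliques, 5)
```

Vertex 2 belongs to both cliques, so its diagonal entry in the co-membership matrix is 2, while vertices 0 and 3 never share a clique.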
PARAMETERS
Input dataset
Name of file containing
data to be analyzed. Data type: Graph.
Minimum Size: (Default = 3)
This gives the
smallest group size which is to be considered a clique. The range is
1 to N.
Analyze pattern of overlaps? (Default = YES).
Yes means that an analysis of clique overlap will be performed. This includes the construction of a clique co-membership matrix, and an hierarchical clustering which is saved in a partition indicator matrix as described below. The co-clique matrix is also constructed and this is also submitted to an hierarchical clustering routine.
No restricts
the analysis to identifying cliques only.
Diagram Type: (Default = 'Tree diagram')
When analyzing
the overlap the clustering diagram can either be a Tree Diagram
or a Dendrogram.
(Output) Clique indicator matrix: (Default = 'CliquesSets').
Name of file which
contains a clique by actor incidence matrix. A 1 in row i column
j indicates that actor j is a member of clique i. This matrix
is not displayed in the LOG FILE.
(Output) Co-membership matrix: (Default = 'CliquesOver').
Name of file which
contains clique overlap matrix described in LOG FILE below. Note
that if no analysis of pattern overlaps was chosen then this file is
not created.
(Output) Partition indicator matrix: (Default = 'CliquePart').
Name of file which
contains partition indicator matrix derived from overlap analysis.
The partition indicator matrix corresponds to the hierarchical clustering
displayed in the LOG FILE. A value of k in a column labeled i and row
j means that actor j is in partition k and is in i cliques with every
other member of partition k. Actor k is always a member of partition
k, and is a representative label for the group.
LOG FILE Number of cliques found.
List of cliques,
labeled - each clique is specified by the vertices it contains.
The following output
is also produced if YES was inserted on the form in reply to the question
'Analyze pattern of overlaps?' The first part of the output will be
the tree diagram or dendrogram corresponding to the clustering of the
actor by actor co-membership matrix. In the matrix a value of
k in row i column j means that vertices i and j occurred in the same
clique k times. The ith diagonal entry gives the number of cliques
which contain i.
The tree diagram
(or a dendrogram) re-orders the actors so that they are located close
to other actors in similar clusters. The level at which any pair of
actors are aggregated is the point at which both can be reached by tracing
from the start to the actors from right to left. The scale at the top
gives the level at which they are clustered and corresponds to the number
of overlaps. The diagram can be printed or saved. Parts of the diagram
can be viewed by moving the mouse to the split point in a tree diagram
or the beginning of a line in the dendrogram and clicking. The first
click will highlight a portion of the diagram and the second click will
display just the highlighted portion. To return to the original, right-click
the mouse. There is also a simple zoom facility: simply change
the values and then press enter. If the labels need to be edited (particularly
the scale labels) then you should take the partition indicator matrix
into the spreadsheet editor, remove or reduce the labels, and then submit
the edited data to Tools>Dendrogram>Draw.
Behind the diagram
is a window containing the number of cliques and a list as specified
above. This is followed by a clustering diagram representing the same
clustering as the tree diagram (or dendrogram). The columns are
rearranged and labeled. A '·' in row label i column label j means
that vertex j was not in i cliques with any other vertex. An 'X'
indicates that vertex j was in i cliques with all vertices on the same
row as j which can be found by tracing across that row without encountering
a space.
This is followed
by the clique by clique co-membership matrix. In the matrix a
value of k in row i column j means that cliques i and j contain k actors
in common. The ith diagonal entry gives the number of actors in
clique i. This is followed by a clustering diagram corresponding to
an hierarchical clustering of the clique by clique co-membership matrix.
TIMING Algorithm
is exponential.
COMMENTS None.
REFERENCES Luce
R and Perry A (1949). A method of matrix analysis of group structure.
Psychometrika 14, 95-116.
Bron C and Kerbosch
J (1973). Finding all cliques of an undirected graph. Comm
of the ACM 16, 575-577.
NETWORK > SUBGROUPS > N-CLIQUES
PURPOSE Find
all n-cliques in a network.
DESCRIPTION An
n-clique of an undirected graph is a maximal subgraph in which every
pair of vertices is connected by a path of length n or less. These are
found using an adapted version of the Bron and Kerbosch (1973) algorithm.
The routine will also provide an analysis of the overlapping structure
of the n-cliques. This analysis gives information on the number
of times each pair of actors are in the same n-clique and gives an hierarchical
clustering based upon this information.
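One way to picture the adaptation is to run the clique search on an auxiliary graph in which two vertices are tied whenever their distance in the original graph is n or less. The Python sketch below takes that route; it is an assumption that the program's adapted algorithm is equivalent, and the function name `n_cliques` and parameter `n_val` are hypothetical.

```python
from collections import deque

def n_cliques(adj, n_val=2, min_size=3):
    """Maximal vertex sets whose members are pairwise within distance n_val
    of each other in the whole graph."""
    size = len(adj)
    close = []                        # close[s]: vertices within n_val steps of s
    for s in range(size):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if dist[u] == n_val:      # do not search beyond n_val steps
                continue
            for v in range(size):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        close.append(set(dist) - {s})

    found = []

    def expand(r, p, x):
        # Bron and Kerbosch recursion on the "within n_val steps" relation.
        if not p and not x:
            if len(r) >= min_size:
                found.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & close[v], x & close[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(size)), set())
    return sorted(found)

# Hypothetical example: a six-node ring 0-1-2-3-4-5-0.
ring = [[1 if abs(i - j) in (1, 5) else 0 for j in range(6)] for i in range(6)]
two_cliques = n_cliques(ring, n_val=2)
```

In the ring, only opposite nodes are more than two steps apart, so every choice of one node from each opposite pair is a 2-clique. This includes sets such as {0, 2, 4} whose members are not tied to each other directly.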
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Graph.
Value of N: (Default = 2)
All members of
an n-clique are connected by a path of length n or less. A value
of 1 would give all Luce and Perry cliques; the maximum value
of N-1 would give the components of the graph.
Minimum Size: (Default = 3)
This gives the
smallest group size which is to be considered an n-clique. The range
is 1 to N.
Analyze pattern of overlaps? (Default = YES).
Yes means that an analysis of n-clique overlap will be performed. This includes the construction of an n-clique co-membership matrix, and an hierarchical clustering which is saved in a partition indicator matrix as described below.
No restricts
the analysis to identifying n-cliques only.
Diagram Type: (Default = 'Tree diagram')
When analyzing
the overlap the clustering diagram can either be a Tree Diagram
or a Dendrogram.
(Output) n-clique indicator matrix: (Default = 'NClqSets').
Name of file which
contains an n-clique by actor incidence matrix. A 1 in row i
column j indicates that actor j is a member of n-clique i. This matrix
is not displayed in the LOG FILE.
(Output) Co-membership matrix: (Default = 'NClqOver').
Name of file which
contains n-clique overlap matrix described in LOG FILE below.
Note that if no analysis of pattern overlaps was chosen then this file
is not created.
(Output) Partition indicator matrix: (Default = 'NClqPart').
Name of file which
contains partition indicator matrix derived from overlap analysis.
The partition indicator matrix corresponds to the hierarchical clustering
displayed in the LOG FILE. A value of k in a column labeled i and row
j means that actor j is in partition k and is in i n-cliques with every
other member of partition k. Actor k is always a member of partition
k, and is a representative label for the group.
LOG FILE Number of n-cliques found.
List of n-cliques,
labeled - each n-clique is specified by the vertices it contains.
The following output
is also produced if YES was inserted on the form in reply to the question
'Analyze pattern of overlaps?' The first part of the output will be
the tree diagram or dendrogram corresponding to the single link clustering
of the n-clique overlap matrix. In the n-clique overlap matrix
a value of k in row i column j means that vertices i and j occurred
in the same n-clique k times. The ith diagonal entry gives the
number of n-cliques which contain i.
The tree diagram
(or a dendrogram) re-orders the actors so that they are located close
to other actors in similar clusters. The level at which any pair of
actors are aggregated is the point at which both can be reached by tracing
from the start to the actors from right to left. The scale at the top
gives the level at which they are clustered and corresponds to the number
of overlaps. The diagram can be printed or saved. Parts of the diagram
can be viewed by moving the mouse to the split point in a tree diagram
or the beginning of a line in the dendrogram and clicking. The first
click will highlight a portion of the diagram and the second click will
display just the highlighted portion. To return to the original, right-click
the mouse. There is also a simple zoom facility: simply change
the values and then press enter. If the labels need to be edited (particularly
the scale labels) then you should take the partition indicator matrix
into the spreadsheet editor, remove or reduce the labels, and then submit
the edited data to Tools>Dendrogram>Draw.
Behind the diagram
is a window containing the number of n-cliques and a list as specified
above. This is followed by a clustering diagram representing the same
clustering as the tree diagram (or dendrogram). The columns are
rearranged and labeled. A
'·' in row label i column label j means
that vertex j was not in i n-cliques with any other vertex. An
'X' indicates that vertex j was in i n-cliques with all vertices on
the same row as j which can be found by tracing across that row without
encountering a space.
TIMING Algorithm
is exponential.
COMMENTS Usually
only 2-cliques or 3-cliques are of significance.
REFERENCES Luce
R (1950). Connectivity and generalized cliques in sociometric
group structure. Psychometrika 15, 169-190.
Bron C and Kerbosch
J (1973). Finding all cliques of an undirected graph.
Comm of the ACM 16, 575-577.
NETWORK > SUBGROUPS > N-CLAN
PURPOSE Find
all n-clans in a network.
DESCRIPTION An
n-clan is an n-clique which has diameter less than or equal to n as
an induced subgraph. These are found by using the n-clique routine
and checking the diameter condition.
The routine will
also provide an analysis of the overlapping structure of the n-clans.
This analysis gives information on the number of times each pair of
actors are in the same n-clan and gives an hierarchical clustering based
upon this information.
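The diameter condition amounts to measuring distances inside the subgraph induced by the candidate group, rather than in the whole graph. A minimal Python sketch of that check, with a hypothetical function name and a hand-picked pair of 2-cliques from a six-node ring as input:

```python
from collections import deque

def is_n_clan(adj, members, n_val):
    """True if the subgraph induced by members has diameter n_val or less."""
    members = set(members)
    for s in members:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in members:                 # only ties inside the group count
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(members) or max(dist.values()) > n_val:
            return False
    return True

# Hypothetical example: a six-node ring 0-1-2-3-4-5-0.
ring = [[1 if abs(i - j) in (1, 5) else 0 for j in range(6)] for i in range(6)]

# Both sets are 2-cliques of the ring: all members are within two steps
# of each other when paths may pass through any node.
candidates = [[0, 1, 2], [0, 2, 4]]
two_clans = [c for c in candidates if is_n_clan(ring, c, 2)]
```

The set {0, 1, 2} stays connected through its own members and is a 2-clan. The set {0, 2, 4} relies on outside nodes for every connection, so it fails the diameter condition.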
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Graph.
Value of N: (Default = 2)
All members of
an n-clan are in an n-clique and have the additional property that they
are connected by a path of length n or less in which each vertex is
also a member of the n-clique. A value of 1 would give all Luce
and Perry cliques; the maximum value of N-1 would give the components
of the graph.
Minimum Size: (Default = 3)
This gives the
smallest group size which is to be considered an n-clan. The range is
1 to N.
Analyze pattern of overlaps? (Default = YES).
Yes means that an analysis of n-clan overlap will be performed. This includes the construction of an n-clan co-membership matrix, and an hierarchical clustering which is saved in a partition indicator matrix as described below.
No restricts
the analysis to identifying n-clans only.
Diagram Type: (Default = 'Tree diagram')
When analyzing
the overlap the clustering diagram can either be a Tree Diagram
or a Dendrogram.
(Output) n-clan indicator matrix: (Default = 'NClanSets').
Name of file which
contains an n-clan by actor incidence matrix. A 1 in row i column
j indicates that actor j is a member of n-clan i. This matrix
is not displayed in the LOG FILE.
(Output) Co-membership matrix: (Default = 'NClanOver').
Name of file which
contains n-clan overlap matrix described in LOG FILE below. Note
that if no analysis of pattern overlaps was chosen then this file is
not created.
(Output) Partition indicator matrix: (Default = 'NClanPart').
Name of file which
contains partition indicator matrix derived from overlap analysis.
The partition indicator matrix corresponds to the hierarchical clustering
displayed in the LOG FILE. A value of k in a column labeled i and row
j means that actor j is in partition k and is in i n-clans with every
other member of partition k. Actor k is always a member of partition
k, and is a representative label for the group.
LOG FILE Number of n-clans found.
List of n-clans,
labeled - each n-clan is specified by the vertices it contains.
The following output
is also produced if YES was inserted on the form in reply to the question
'Analyze pattern of overlaps?' The first part of the output will be
the tree diagram or dendrogram corresponding to the single link clustering
of the n-clan overlap matrix. In the n-clan overlap matrix a value
of k in row i column j means that vertices i and j occurred in the same
n-clan k times. The ith diagonal entry gives the number of n-clans
which contain i.
The tree diagram
(or a dendrogram) re-orders the actors so that they are located close
to other actors in similar clusters. The level at which any pair of
actors are aggregated is the point at which both can be reached by tracing
from the start to the actors from right to left. The scale at the top
gives the level at which they are clustered and corresponds to the number
of overlaps. The diagram can be printed or saved. Parts of the diagram
can be viewed by moving the mouse to the split point in a tree diagram
or the beginning of a line in the dendrogram and clicking. The first
click will highlight a portion of the diagram and the second click will
display just the highlighted portion. To return to the original, right
click on the mouse. There is also a simple zoom facility: simply change
the values and then press enter. If the labels need to be edited (particularly
the scale labels) then you should take the partition indicator matrix
into the spreadsheet editor, remove or reduce the labels and then submit
the edited data to Tools>Dendrogram>Draw.
Behind the diagram
is a window containing the number of n-clans and a list as specified
above. This is followed by a clustering diagram representing the same
clustering as the tree diagram (or dendrogram). The columns are
rearranged and labeled. A
'·' in row label i column label j means
that vertex j was not in i n-clans with any other vertex. An 'X'
indicates that vertex j was in i n-clans with all vertices on the same
row as j which can be found by tracing across that row without encountering
a space.
TIMING Algorithm
is exponential.
COMMENTS Usually
only 2-clans or 3-clans are signified.
REFERENCES Mokken
R (1979). Cliques, clubs and clans. Quality and Quantity
13, 161-173.
NETWORK > SUBGROUPS > K-PLEX
PURPOSE Find
all k-plexes in a network.
DESCRIPTION A
k-plex is a maximal subgraph with the following property: each
vertex of the induced subgraph is connected to at least n-k other vertices,
where n is the number of vertices in the induced subgraph. The
basic algorithm is a depth first search.
The routine will
also provide an analysis of the overlapping structure of the k-plexes.
This analysis gives information on the number of times each pair of
actors are in the same k-plex and gives an hierarchical clustering based
upon this information.
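The degree condition in the definition above can be checked directly. The following is a minimal sketch, not the UCINET routine: the function name `is_k_plex` and the adjacency-dict representation are assumptions made for illustration, and the check covers only the degree condition, not maximality.

```python
def is_k_plex(adj, members, k):
    """Return True if every vertex in `members` is adjacent to at least
    len(members) - k other members. Maximality is NOT checked."""
    members = set(members)
    n = len(members)
    return all(len(adj[v] & members) >= n - k for v in members)


# Hypothetical undirected graph: vertex -> set of adjacent vertices.
adj = {
    1: {2, 3, 4},
    2: {1, 3},
    3: {1, 2, 4},
    4: {1, 3},
}

print(is_k_plex(adj, {1, 2, 3, 4}, 2))  # every vertex has at least 4 - 2 ties
print(is_k_plex(adj, {1, 2, 3, 4}, 1))  # fails: vertices 2 and 4 are not adjacent
```

With k = 1 the condition reduces to that of a clique, which is why the second call fails for this graph.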
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Graph.
Value of K: (Default = 2)
The value of k
specifies the relative minimum size of the degree of each vertex compared
with the size of the k-plex. A value of 1 corresponds to a Luce
and Perry clique. Every vertex in a k-plex of size n has degree
at least n-k in the subgraph induced by the k-plex. The range
of k is 1 to N. (A value of N would give the whole graph as the
only k-plex).
Minimum Size: (Default = 3)
This gives the
smallest group size which is to be considered a k-plex. The range is
1 to N; normally this should be at least K+2.
Analyze pattern
of overlaps? (Default = YES).
Yes means
that an analysis of k-plex overlap will be performed. This includes
the construction of a k-plex co-membership matrix, and an hierarchical
clustering which is saved in a partition indicator matrix as described
below.
No restricts
the analysis to identifying k-plexes only.
Diagram Type: (Default = 'Tree diagram')
When analyzing
the overlap the clustering diagram can either be a Tree Diagram
or a Dendrogram.
(Output) k-plex indicator matrix: (Default = 'KPlexSet').
Name of file which
contains a k-plex by actor incidence matrix. A 1 in column i row
j indicates that actor j is a member of k-plex i. This matrix
is not displayed in the LOG FILE.
(Output) Co-membership matrix: (Default = 'KplexOvr').
Name of file which
contains k-plex overlap matrix described in LOG FILE below. Note
that if no analysis of pattern overlaps was chosen then this file is
not created.
(Output) Partition indicator matrix: (Default = 'KplexPrt').
Name of file which
contains partition indicator matrix derived from overlap analysis.
The partition indicator matrix corresponds to the hierarchical clustering
displayed in the LOG FILE. A value of m in a column labeled i and row
j means that actor j is in partition m and is in i k-plexes with every
other member of partition m. Actor m is always a member of partition
m, and is a representative label for the group.
LOG FILE Number of k-plexes found.
List of k-plexes,
labeled - each k-plex is specified by the vertices it contains.
The following output
is also produced if YES was inserted on the form in reply to the question
'Analyze pattern of overlaps?' The first part of the output will be
the tree diagram or dendrogram corresponding to the single link clustering
of the k-plex overlap matrix. In the k-plex overlap matrix a value
of m in row i column j means that vertices i and j occurred in the same
k-plex m times. The ith diagonal entry gives the number of k-plexes
which contain i.
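As an illustration of the overlap matrix just described, the sketch below counts, for each pair of vertices, the number of groups containing both. The function name `co_membership` and the example k-plexes are hypothetical; vertices are numbered from 0 here purely for convenience.

```python
def co_membership(groups, n_actors):
    """Entry (i, j) counts the groups that contain both i and j.
    The diagonal entry (i, i) counts the groups that contain i."""
    overlap = [[0] * n_actors for _ in range(n_actors)]
    for group in groups:
        for i in group:
            for j in group:
                overlap[i][j] += 1
    return overlap


# Hypothetical list of k-plexes over actors 0..4.
kplexes = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}]
overlap = co_membership(kplexes, 5)

for row in overlap:
    print(row)
```

A single link clustering of this matrix would then group actors by the number of k-plexes they share.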
The tree diagram
(or a dendrogram) re-orders the actors so that they are located close
to other actors in similar clusters. The level at which any pair of
actors are aggregated is the point at which both can be reached by tracing
from the start to the actors from right to left. The scale at the top
gives the level at which they are clustered and corresponds to the number
of overlaps. The diagram can be printed or saved. Parts of the diagram
can be viewed by moving the mouse to the split point in a tree diagram
or the beginning of a line in the dendrogram and clicking. The first
click will highlight a portion of the diagram and the second click will
display just the highlighted portion. To return to the original, right
click on the mouse. There is also a simple zoom facility: simply change
the values and then press enter. If the labels need to be edited (particularly
the scale labels) then you should take the partition indicator matrix
into the spreadsheet editor, remove or reduce the labels and then submit
the edited data to Tools>Dendrogram>Draw.
Behind the diagram
is a window containing the number of k-plexes and a list as specified
above. This is followed by a clustering diagram representing the same
clustering as the tree diagram (or dendrogram). The columns are
rearranged and labeled. A
'·' in row label i column label j means
that vertex j was not in i k-plexes with any other vertex. An
'X' indicates that vertex j was in i k-plexes with all vertices on the
same row as j which can be found by tracing across that row without
encountering a space.
TIMING Algorithm
is exponential.
COMMENTS It
is advisable to initially select k and the minimum size n so that k <
(n+2)/2 - in this case the diameter of the k-plex is 2 (or less).
If a k-plex is connected and k ≥ (n+2)/2 then the diameter is always
less than or equal to 2k-n+1; however, it should not be assumed that
the k-plex is connected and this would need to be examined.
REFERENCES Seidman
S and Foster B (1978). A graph theoretic generalization of the
clique concept. Journal of Mathematical Sociology 6, 139-154.
Seidman S and Foster
B (1978). A note on the potential for genuine cross-fertilization
between anthropology and mathematics. Social Networks 1, 65-72.
NETWORK > SUBGROUPS > LAMBDA SETS
PURPOSE List
all lambda sets of a graph.
DESCRIPTION The
edge connectivity of a pair of vertices is the minimum number of edges
which must be deleted so that there is no path connecting them.
A lambda set is
a maximal subset of vertices with the property that the edge connectivity
of any pair of vertices within the subset is strictly greater than the
edge connectivity of any pair of vertices, one of which is in the subset
and one of which is outside.
Hence if λ(a,b)
represents the edge-connectivity of two vertices a and b from a graph
G(V,E) then a subset S is a lambda set if it is the maximal set with
the property that for all a,b,c ∈ S and d ∈ V-S, λ(a,b)
> λ(c,d).
The algorithm employed
first computes the maximum flow (i.e. the connectivity) between all pairs
of vertices (see NETWORKS>COHESION>MAX FLOW) and uses this information to construct
the lambda sets.
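The two stages described above can be sketched for a small binary undirected graph as follows. This is an illustration under stated assumptions, not the UCINET implementation: `edge_connectivity` uses a plain augmenting-path maximum flow with unit capacities, and `lambda_sets` groups vertices whose pairwise connectivity is at least k, for each value k that occurs. Both function names and the example graph are hypothetical.

```python
from collections import deque


def edge_connectivity(adj, s, t):
    """Maximum flow from s to t with unit capacity on every undirected edge."""
    cap = {u: {v: 1 for v in adj[u]} for u in adj}  # residual capacities
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # breadth first search for a path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:  # push one unit along the path found
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


def lambda_sets(adj):
    """Return {k: list of vertex sets whose internal connectivity is at least k}."""
    nodes = sorted(adj)
    lam = {(a, b): edge_connectivity(adj, a, b)
           for a in nodes for b in nodes if a < b}
    levels = {}
    for k in sorted(set(lam.values()) - {0}):
        label = {v: v for v in nodes}

        def find(v):
            while label[v] != v:
                v = label[v]
            return v

        for (a, b), value in lam.items():
            if value >= k:
                label[find(a)] = find(b)
        groups = {}
        for v in nodes:
            groups.setdefault(find(v), set()).add(v)
        levels[k] = [g for g in groups.values() if len(g) > 1]
    return levels


# Hypothetical graph: two triangles joined by the single edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
levels = lambda_sets(adj)
print(levels)
```

In this example the whole graph forms a lambda set at connectivity 1, and each triangle forms a lambda set at connectivity 2.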
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Graph.
(Output) Partition Matrix: (Default = 'LambdaSetsPart')
Name of file which
contains partition indicator matrix which corresponds to the hierarchical
clustering produced in the LOG FILE. A value of k in a column labeled
i and row j means that actor j is in partition k; the other members
of the partition form a lambda set with minimum edge-connectivity i.
Actor k is always a member of partition k, and is a representative label
for the group. This matrix is not displayed in the LOG FILE.
(Output) Lambda Matrix: (Default = 'LambdaSetsFlow')
Name of data file
containing maximum flows between all pairs of vertices.
(Output) Permutation Vector: (Default = 'LambdaSetsPerm')
Name of data file
which contains the permutation of the nodes used in constructing the
single link hierarchical clustering diagram below.
LOG FILE An
hierarchical clustering dendrogram, each level corresponding to a different
degree of minimum internal edge-connectivity. This value characterizes
the lambda set. The level at which any pair of actors are aggregated
is the point at which both can be reached by tracing from the start
to the actors from right to left. The scale at the top gives the level
at which they are clustered. The diagram can be printed or saved. Parts
of the diagram can be viewed by moving the mouse to the beginning
of a line in the dendrogram and clicking. The first click will highlight
a portion of the diagram and the second click will display just the
highlighted portion. To return to the original, right click on the mouse.
There is also a simple zoom facility: simply change the values and then
press enter. If the labels need to be edited (particularly the scale
labels) then you should take the partition indicator matrix into the
spreadsheet editor, remove or reduce the labels and then submit the edited
data to Tools>Dendrogram>Draw.
In the clustering diagram each level corresponds to a different value
of minimum edge-connectivity. Behind the dendrogram is a clustering diagram representing
the same thing. The columns are rearranged and labeled.
A '·' in
row labeled i column label j indicates that vertex j is not in a lambda
set of minimum connectivity i. An 'X' indicates that vertex j
is a member of the lambda set, all other members of j's lambda set are
found by tracing along row labeled i in both directions from column
j until a space is encountered in each direction. The column labels
corresponding to an 'X' which are connected to j's X are all members
of j's lambda set with minimum connectivity i.
The single link
hierarchical diagram is followed by a maximum flow matrix. The
maximum flow between i and j is given by the value in row i column j.
The diagonal is set equal to the number of vertices; theoretically this
value should be infinite.
TIMING O(N^4).
COMMENTS Note
this algorithm works on integer valued graphs by the natural extension
of connectivity to minimum weight cutsets.
REFERENCES Borgatti
S P, Everett M G and Shirey P R (1990). 'LS Sets, Lambda Sets
and other cohesive subsets'. Social Networks 12, 337-357.
NETWORK > SUBGROUPS > FACTIONS
PURPOSE Optimizes
a cost function which measures the degree to which a partition consists
of clique like structures using a tabu search method.
DESCRIPTION Given
a partition of a binary network of adjacencies into n groups, then a
count of the number of missing ties within each group summed with the
ties between the groups gives a measure of the extent to which
the groups form separate clique like structures. The routine uses a
tabu search minimization procedure to optimize this measure to find
the best fit.
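The cost measure described above can be written out as follows. This is a minimal sketch that evaluates the cost of one given partition; it does not perform the tabu search. The function name `faction_cost` is hypothetical, and counting every off-diagonal cell of a directed matrix separately is an assumption.

```python
def faction_cost(matrix, partition):
    """Missing ties within groups plus ties present between groups,
    counted over the off-diagonal cells of a binary adjacency matrix."""
    n = len(matrix)
    cost = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            same_group = partition[i] == partition[j]
            tie = matrix[i][j] == 1
            if same_group != tie:  # missing tie inside, or tie across groups
                cost += 1
    return cost


# Hypothetical 5-actor network with a single tie (actor 1 to actor 3)
# running between the two groups.
matrix = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
]

print(faction_cost(matrix, [1, 1, 2, 1, 2]))  # two near-perfect factions
print(faction_cost(matrix, [1, 1, 1, 1, 1]))  # everyone in a single group
```

A search procedure would compare such costs across candidate partitions and keep the lowest.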
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Digraph.
Number of factions: (Default = 2)
Number of partitions
into which the data needs to be split.
Maximum # of iterations in a series: (Default = 20)
The algorithm starts
from an arbitrary partition and attempts to decrease the cost by taking
the steepest descent. If the cost cannot be reduced then the algorithm
continues its search in the neighborhood of the current partition.
This search direction is a mildest ascent direction and from there new
search directions are explored. This exploration only continues
for a fixed number of iterations in a series. If no improvement
is made after the fixed number of iterations the algorithm terminates
with the current minimum. Increasing the parameter gives a more exhaustive
and therefore slower search.
Length of time in penalty box: (Default = 15)
If the algorithm
makes an ascending step then it is possible that the best possible descending
step is the reverse of the direction just taken. This parameter
prohibits a move along the reverse direction for a set number of steps.
The larger the value the more difficult it will be to come back to a
previously explored local minimum, however it will also be more difficult
to explore the vicinity of that minimum. The default has been
shown experimentally to be the most useful.
Number of random starts: (Default = 10 )
The whole procedure
is repeated this number of times, each with a different initial partition. The best of these
results is then selected as the minimum.
Random Number Seed:
The random number
seed generates the initial partition. UCINET generates a different
random number as default each time it is run. This number should
be changed if the user wishes to repeat the analysis with different
initial configurations. The range is 1 to 32000.
Output partition dataset: (Default = 'FactionsPart')
Name of dataset
which contains a partition indicator vector. This vector has the form
(k1,k2,...ki,...) where ki assigns vertex i to faction ki so that (1
1 2 1 2) assigns vertices 1, 2 and 4 to faction 1 and 3 and 5 to faction
2. This vector is not displayed in the LOG FILE.
Output sets dataset: (Default ='FactionsSets')
Name of dataset
which contains the sets information.
LOG FILE The value of the cost function.
The group assignments. A list of the factions labeled, each faction is specified by the vertices it contains.
A grouped adjacency
matrix. A blocked permuted adjacency matrix where the diagonal
blocks correspond to the factions.
TIMING Each
iteration of the tabu search algorithm is O(N^2). Random tests
with default parameters as specified indicate O(N^2.5).
COMMENTS Care
should be taken when using this routine.
The algorithm seeks
to find the minimum of the cost function. Even if successful this
result may still be a high value in which case the factions may not
represent cohesive subgroups.
In addition there
may be a number of alternative partitions which also produce the minimum
value; the algorithm does not search for additional solutions.
Finally it is possible that the routine terminates at a local minimum
and does not locate the desired global minimum.
To test the robustness
of the solution the algorithm should be run a number of times from different
starting configurations. If there is good agreement between these
results then this is a sign that there is a clear split of the data
into the reported factions.
REFERENCES de
Amorim S G, Barthélemy J P and Ribeiro (1990).
Clustering and Clique Partitioning: Simulated Annealing and Tabu
Search Approaches. Research report from Groupe d'études et de recherche en analyse des
décisions.
Ecole des Hautes Etudes Commerciales, Ecole Polytechnique, Université
McGill.
F Glover (1989).
Tabu Search - Part I. ORSA Journal on Computing 1, 190-206.
F Glover (1990).
Tabu Search - Part II. ORSA Journal on Computing 2, 4-32.
NETWORK > SUBGROUPS > F-GROUPS
PURPOSE Find
mutually exclusive groups based upon weak transitivity.
DESCRIPTION A
triple of vertices x, y and z taken from a valued graph is weakly
transitive if whenever there is a tie connecting x to y and y to z stronger
than some fixed value s then there exists a tie connecting x and z greater
than a smaller value w.
This routine checks
for weak transitivity by taking the largest value in the graph for s
and a user prescribed value for w. The value of s is reduced until a
triple is found that violates the weak transitivity condition. The value
of s is then used to dichotomize the graph. The components (weak components
for digraphs) of this dichotomized graph form the mutually exclusive
F-groups.
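The weak transitivity test at the heart of this procedure can be sketched as below. The function name `violating_triple` is hypothetical, and the use of strict comparisons (a tie counts as strong only if it exceeds s, and as present only if it exceeds w) is an assumption; the routine's exact treatment of boundary values is not stated here.

```python
def violating_triple(x, s, w):
    """Return the first triple (a, b, c) with ties a-b and b-c stronger than s
    but no tie a-c greater than w, or None if the matrix is weakly transitive."""
    n = len(x)
    for a in range(n):
        for b in range(n):
            for c in range(n):
                if len({a, b, c}) < 3:
                    continue
                if x[a][b] > s and x[b][c] > s and not x[a][c] > w:
                    return (a, b, c)
    return None


# Hypothetical symmetric valued graph on four actors.
x = [
    [0, 5, 1, 0],
    [5, 0, 5, 0],
    [1, 5, 0, 2],
    [0, 0, 2, 0],
]

print(violating_triple(x, 4, 0))  # no violation at s = 4
print(violating_triple(x, 1, 0))  # lowering s to 1 exposes a violation
```

Lowering s step by step from the largest value in the matrix and stopping when this function first returns a triple mirrors the reduction described above.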
PARAMETERS
Input dataset
Name of file
containing network to be analyzed. Data type: valued graph.
Cut-off value for absent ties (Default =0.0)
Value below which
ties are ignored. The value w in the description above.
Output dataset (Granovetter)
Name of file which will
contain the strong and weak tie matrix described in the LOG FILE.
Output Frequencies dataset
Name of file that
will contain the number and percentage of absent weak and strong ties.
LOG FILE Value above which ties are considered strong. That is the value s in the above description.
The F-groups with
more than 2 actors in them are then displayed.
A matrix is then
displayed. A value of zero indicates that a tie was below the cut-off
value for absent ties, i.e. below w. A value of 2 indicates a tie above
the value of the strong tie, i.e. above s, and ties that lie between these
values are coded as 1.
A frequency table
that counts the tie values in the new matrix is then presented. This
gives frequencies and the frequencies expressed as a percentage.
TIMING O(N^3)
COMMENTS None
REFERENCES None
NETWORK > EGO NETWORKS > DENSITY
PURPOSE Compute
standard ego network measures for every actor in a network.
DESCRIPTION This
routine systematically constructs the ego network for every actor within
the network and computes a collection of ego network measures. For directed
data both in and out networks can be considered separately or together.
PARAMETERS
Input network:
Name of file which
contains network to be analyzed. Data type: Digraph.
Type of ego neighborhood: (Default = UNDIRECTED)
Choices are:
UNDIRECTED-considers all actors connected to and from ego.
IN-NEIGHBORHOOD-considers only actors with a tie to ego.
OUT-NEIGHBORHOOD-considers only actors with a tie from
ego.
Output dataset (Default = EgoNet)
Name of file containing
ego-by-variable matrix.
LOG FILE A
table of ego network measures. All measures exclude ties involving ego
itself. The measures include the following:
Size. The
number of actors (alters) that ego is directly connected to.
Ties. The
total number of ties in the ego network (not counting ties involving
ego).
Pairs.
The total number of pairs of alters in the ego network -- i.e., potential
ties.
Density.
The number of ties divided by the number of pairs, times 100.
Avgdist.
The average geodesic (graph-theoretic) distance between pairs of alters.
This is only computed for networks in which every alter is reachable
from every other.
Diameter.
The longest geodesic distance within the ego network (unless infinite).
NweakComp.
The number of weak components in the ego network.
PweakComp.
The number of weak components as a percentage of the number of alters.
2StepReach.
The number of alters expressed as a percentage that are within 2 links
of ego.
ReachEffic.
2-step reach as a percentage of the number of alters plus the sum of
their network sizes.
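The first four measures in the list above can be illustrated with a short sketch for the UNDIRECTED neighborhood type. The function name `ego_density` is hypothetical, and counting Pairs as ordered pairs of alters (because ties are directed) is an assumption.

```python
def ego_density(adj, ego):
    """adj maps each actor to the set of actors it sends ties to.
    Alters are all actors with a tie to or from ego."""
    alters = {v for v in adj if v != ego and (v in adj[ego] or ego in adj[v])}
    size = len(alters)
    # Ties among alters only; ties involving ego are excluded.
    ties = sum(1 for a in alters for b in adj[a] if b in alters and b != a)
    pairs = size * (size - 1)  # assumption: ordered pairs of alters
    density = 100.0 * ties / pairs if pairs else None
    return {"Size": size, "Ties": ties, "Pairs": pairs, "Density": density}


# Hypothetical network: A is tied to B, C and D; B and C are tied to each other.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

measures = ego_density(adj, "A")
print(measures)
```

Density is returned as None when ego has fewer than two alters, since no pairs exist in that case.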
TIMING O(N^3)
COMMENTS None
REFERENCES None
NETWORKS > EGO NETWORKS > STRUCTURAL HOLES
PURPOSE Compute
measures of structural holes.
DESCRIPTION Compute
several measures of structural holes, including all of the measures
developed by Ron Burt. The measures are computed for all nodes in the
network, treating each one in turn as ego.
PARAMETERS
Input dataset:
Name of file containing
network to analyze. Data type: Directed Graph.
Output structural holes dataset: (default = 'holes')
Name of actor-by-variable
matrix to hold structural hole measures.
Output dyadic redundancy dataset: (default = 'redund').
Name of actor-by-actor
matrix that indicates the extent to which the column actor (an alter)
is a redundant contact for the row actor (ego).
Output dyadic constraint dataset: (default = 'const').
Name of actor-by-actor
matrix that indicates the extent to which the row actor (ego) is constrained
by each other actor in its ego network.
LOG FILE
Three tables are
output. First is the set of monadic (nodal) structural hole measures
based on redundancy and constraint. The following measures are displayed:
effsize.
Burt's measure of the effective size of ego's network (essentially,
the number of alters minus the average degree of alters within the ego
network, not counting ties to ego).
efficiency.
The effective size divided by the number of alters in ego's network.
constraint.
Burt's constraint measure (equation 2.4, pg. 55 of Burt, 1992). Essentially
a measure of the extent to which ego is invested in people who are invested
in others of ego's alters.
hierarchy.
Burt's adjustment of constraint (equation 2.9, pg 71), indicating the
extent to which constraint on ego is concentrated in a single alter.
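The parenthetical description of effective size given above (number of alters minus the average degree of alters within the ego network, not counting ties to ego) can be sketched for binary undirected data as follows. The function name `effective_size` is hypothetical, and this simplified form is an illustration rather than the routine's general calculation for valued or directed data.

```python
def effective_size(adj, ego):
    """Return (effsize, efficiency) for ego in a binary undirected graph."""
    alters = adj[ego] - {ego}
    n = len(alters)
    if n == 0:
        return 0.0, None
    # Degree of each alter counted only among ego's other alters.
    avg_degree = sum(len(adj[a] & alters) for a in alters) / n
    effsize = n - avg_degree
    return effsize, effsize / n


# Hypothetical network: A is tied to B, C and D; B and C are tied to each other.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

effsize, efficiency = effective_size(adj, "A")
print(effsize, efficiency)
```

Here the tie between B and C makes each partly redundant, so the effective size of A's network falls below its three alters.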
The second table
is the dyadic redundancy matrix. For each ego (rows) it gives the extent
to which each of its alters are tied to all of ego's other alters (i.e.,
the extent to which the alter is redundant).
The third table
is the dyadic constraint matrix. For each ego (rows) it gives the extent
to which it is constrained by each of its alters. Ego is constrained by
alter j if (a) j represents a large proportion of ego's
relational investment, and (b) if ego is heavily invested in other people
who are in turn heavily invested in j. In short, j
constrains Ego if ego is heavily invested in j directly and indirectly.
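The direct and indirect investment described above is commonly written as c_ij = (p_ij + Σ_q p_iq p_qj)^2, where p_ij is the proportion of i's relational investment that goes to j and q ranges over third parties. The sketch below assumes this commonly cited form corresponds to the equation referenced above; the function names are hypothetical.

```python
def proportion(z, a, b):
    """Share of a's total tie strength (sent plus received) involving b."""
    n = len(z)
    total = sum(z[a][k] + z[k][a] for k in range(n) if k != a)
    return (z[a][b] + z[b][a]) / total if total else 0.0


def dyadic_constraint(z, i, j):
    """Constraint of alter j on ego i: (direct + indirect investment) squared."""
    n = len(z)
    indirect = sum(proportion(z, i, q) * proportion(z, q, j)
                   for q in range(n) if q != i and q != j)
    return (proportion(z, i, j) + indirect) ** 2


def constraint(z, i):
    """Sum of dyadic constraint over the alters of ego i."""
    alters = [j for j in range(len(z)) if j != i and (z[i][j] or z[j][i])]
    return sum(dyadic_constraint(z, i, j) for j in alters)


# Hypothetical examples: a closed triangle, and a star with hub 0.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

print(constraint(triangle, 0))
print(constraint(star, 0), constraint(star, 1))
```

The hub of the star is less constrained than a member of the triangle because none of its alters are invested in one another.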
TIMING O(N^3)
REFERENCES Burt,
R.S. 1992. Structural Holes: The social structure of competition.
Cambridge: Harvard University Press.
NETWORK > CENTRALITY > DEGREE
PURPOSE Calculates
the degree and normalized degree centrality of each vertex and gives
the overall network degree centralization.
DESCRIPTION The
number of vertices adjacent to a given vertex in a symmetric graph is
the degree of that vertex. For non-symmetric data the in-degree
of a vertex u is the number of ties received by u and the out-degree
is the number of ties initiated by u. In addition if the data
is valued then the degrees (in and out) will consist of the sums of
the values of the ties. The normalized degree centrality is the
degree divided by the maximum possible degree expressed as a percentage.
The normalized values should only be used for binary data.
For a given binary network with vertices v_{1}, ..., v_{n} and maximum degree centrality c_{max}, the network degree centralization measure is Σ(c_{max} - c(v_{i})) divided by the maximum value possible, where c(v_{i}) is the degree centrality of vertex v_{i}.
The routine calculates
these measures and some descriptive statistics based on these measures.
Directed graphs may be symmetrized and the analysis is performed as
above, or an analysis of the in and out degrees can be performed.
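For symmetric binary data the three quantities above can be sketched as follows. The function name `degree_centrality` is hypothetical. The divisor (n-1)(n-2) used for centralization is the value of the sum for a star graph, which is assumed here to be the "maximum value possible"; at least three vertices are required.

```python
def degree_centrality(adj):
    """Return (degree, normalized degree %, centralization %) for an
    undirected binary graph given as vertex -> set of adjacent vertices."""
    n = len(adj)
    degree = {v: len(adj[v] - {v}) for v in adj}  # self loops ignored
    ndegree = {v: 100.0 * d / (n - 1) for v, d in degree.items()}
    c_max = max(degree.values())
    total = sum(c_max - d for d in degree.values())
    centralization = 100.0 * total / ((n - 1) * (n - 2))
    return degree, ndegree, centralization


# Hypothetical star graph: one hub tied to four otherwise unconnected actors.
star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}

degree, ndegree, centralization = degree_centrality(star)
print(degree, ndegree, centralization)
```

The star is the most centralized configuration under this measure, so its centralization index is the maximum.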
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Valued Graph
Treat data as symmetric: (Default = Yes).
If Yes directed data is automatically converted to undirected by taking the underlying graph.
No gives
a separate analysis for in and out-degrees.
Count reflexive ties (diagonal values)? (Default = No).
No means that self
loops are ignored.
Output dataset:
(Default = 'FreemanDegree').
Name of file which
will contain degree and normalized degree centrality of each vertex.
LOG FILE A table which contains a list of the degree and normalized degree (n Degree) centralities expressed as a percentage for each vertex, together with the share. The share is the centrality measure of the actor divided by the sum of all the actor centralities in the network. These have been ordered so that the actor with the highest centrality appears first. Note the stored UCINET output file retains the original order.
Descriptive statistics
which give the mean, standard deviation, variance, minimum value and
maximum value for each list generated. This is followed by the
degree network centralization index expressed as a percentage.
For directed data
the tables are the same as for undirected except that separate values
are calculated for in and out degrees.
TIMING O(N).
COMMENTS Degree
centrality measures network activity. For valued data the non-normalized
values should be used and the degree centralization should be ignored.
REFERENCES Freeman
L C (1979). 'Centrality in Social Networks: Conceptual clarification',
Social Networks 1, 215-239.
NETWORK > CENTRALITY > CLOSENESS
PURPOSE Calculates
the farness and normalized closeness centrality of each vertex and gives
the overall network closeness centralization.
DESCRIPTION The farness of a vertex is the sum of the lengths of the geodesics to every other vertex. The reciprocal of farness is closeness centrality. The normalized closeness centrality of a vertex is the minimum possible farness divided by the farness, expressed as a percentage. As an alternative to taking the reciprocal after the summation, the reciprocals can be taken before. In this case the closeness is the sum of the reciprocated distances so that infinite distances contribute a value of zero. This can also be normalized by dividing by the maximum value. In addition the routine allows the user to measure distance by the sums of the lengths of all the paths or all the trails. If the data is directed the routine calculates separate measures for in-closeness and out-closeness.
For a given network
with vertices v_{1}, ..., v_{n} and maximum closeness centrality c_{max},
the network closeness centralization measure is Σ(c_{max} - c(v_{i})) divided by the maximum value possible,
where c(v_{i}) is the closeness centrality of vertex
v_{i}.
The routine calculates
centrality, network closeness centralization and some descriptive statistics
based on these measures for symmetric and directed graphs.
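Farness and the two normalized closeness variants for geodesic distances can be sketched as below, assuming a connected undirected binary graph. The function names are hypothetical. The minimum possible farness and the maximum possible sum of reciprocal distances are both taken to be n-1, which is an assumption consistent with every other vertex being at distance 1.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj):
    """Return {vertex: (farness, normalized closeness %, normalized
    reciprocal-distance closeness %)}. Assumes a connected graph."""
    n = len(adj)
    table = {}
    for v in adj:
        others = [d for u, d in bfs_distances(adj, v).items() if u != v]
        farness = sum(others)
        n_closeness = 100.0 * (n - 1) / farness
        n_reciprocal = 100.0 * sum(1.0 / d for d in others) / (n - 1)
        table[v] = (farness, n_closeness, n_reciprocal)
    return table


# Hypothetical star graph: one hub tied to four otherwise unconnected actors.
star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}

table = closeness(star)
print(table["hub"], table["a"])
```

The all paths and all trails options are not covered by this sketch.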
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Digraph
Type:
Choices are:
Freeman (geodesic paths) distances are lengths of geodesic paths, the standard Freeman measure.
Reciprocal Distances distances are the reciprocal of the lengths of the geodesic paths.
All paths distances between actors are the sums of the distances on all paths connecting them.
All
trails distances between
the actors are the sums of the distances on all trails connecting them.
Output Dataset: (Default = 'Closeness')
Name of file which
will contain farness and normalized closeness centrality of each vertex.
LOG FILE A table which contains a list of the farness (or closeness) and normalized closeness centrality expressed as a percentage, for each vertex. These have been ordered so that the actor with the highest centrality appears first. Note the stored UCINET output file retains the original order.
Descriptive statistics
which give the mean, standard deviation, variance, minimum value and
maximum value for both lists. This is followed by the closeness
network centralization index expressed as a percentage. If the data
is directed then separate in and out values are calculated.
TIMING O(N^3)
for Freeman and reciprocal distances, the other two can be exponential.
COMMENTS Closeness centrality can be thought of as an index of the expected time-until-arrival for things flowing through the network via optimal paths.
For the routine
to work any graph should be connected and any digraph strongly connected.
If this is not the case the routine uses a value one higher than the
maximum possible as the distance measure.
REFERENCES Freeman
L C (1979). 'Centrality in Social Networks: Conceptual clarification'.
Social Networks 1, 215-239.
NETWORK > CENTRALITY > BETWEENNESS > NODES
PURPOSE Calculates
the betweenness and normalized betweenness centrality of each vertex
and gives the overall network betweenness centralization.
DESCRIPTION Let
bjk be the proportion of all geodesics linking vertex j and vertex k
which pass through vertex i. The betweenness of vertex i is the
sum of all bjk where i, j and k are distinct. Betweenness is therefore
a measure of the number of times a vertex occurs on a geodesic.
The normalized betweenness centrality is the betweenness divided by
the maximum possible betweenness expressed as a percentage.
For a given network
with vertices v_{1}, ..., v_{n} and maximum betweenness centrality
c_{max}, the network betweenness centralization
measure is Σ(c_{max}
- c(v_{i})) divided by the maximum value possible,
where c(v_{i}) is the betweenness centrality of
vertex v_{i}.
The routine calculates
these measures, and some descriptive statistics based on these measures,
for symmetric and unsymmetric graphs.
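The definition above can be followed literally for a small undirected graph: count the geodesics between each pair j, k and the share of them passing through i. This is a minimal sketch with hypothetical function names, not the routine itself. The maximum possible betweenness is taken to be (n-1)(n-2)/2, the value for the hub of a star, and the centralization divisor is (n-1) times that value; both are assumptions.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Distances from s and the number of geodesics from s to each vertex."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Return (betweenness, normalized %, centralization %), undirected graph."""
    nodes = list(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    score = {v: 0.0 for v in nodes}
    for j, k in combinations(nodes, 2):
        dist_j, sigma_j = info[j]
        dist_k, sigma_k = info[k]
        if k not in dist_j:
            continue  # j and k are not connected
        for i in nodes:
            if i in (j, k) or i not in dist_j:
                continue
            if dist_j[i] + dist_k[i] == dist_j[k]:  # i lies on a geodesic
                score[i] += sigma_j[i] * sigma_k[i] / sigma_j[k]
    n = len(nodes)
    max_b = (n - 1) * (n - 2) / 2
    normalized = {v: 100.0 * b / max_b for v, b in score.items()}
    c_max = max(score.values())
    central = 100.0 * sum(c_max - b for b in score.values()) / ((n - 1) * max_b)
    return score, normalized, central


# Hypothetical examples: a star, and a cycle on four vertices.
star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}

star_score, star_norm, star_central = betweenness(star)
cycle_score, _, cycle_central = betweenness(cycle)
print(star_score, star_central)
print(cycle_score, cycle_central)
```

In the cycle each opposite pair is joined by two geodesics, so each intermediate vertex receives a proportion of one half.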
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Digraph.
Output dataset: (Default = 'FreemanBetweenness').
Name of file which
will contain betweenness and normalized betweenness centrality of each
vertex.
LOG FILE A table which contains a list of the betweenness and normalized betweenness centrality expressed as a percentage for each vertex. These have been ordered so that the actor with the highest centrality appears first. Note the stored UCINET output file retains the original order.
Descriptive statistics
which give the mean, standard deviation, variance, minimum value and
maximum value for both lists. This is followed by the betweenness
network centralization index expressed as a percentage.
TIMING O(N^3).
COMMENTS Betweenness centrality measures information control.
Care should be
taken in interpreting betweenness for directed data.
REFERENCES Freeman
L C (1979). 'Centrality in Social Networks: Conceptual Clarification'.
Social Networks 1, 215-239.
NETWORKS > CENTRALITY > REACH CENTRALITY
PURPOSE Counts
the number of nodes each node can reach in k or fewer steps. For
k = 1, this is equivalent to degree centrality. For directed networks,
both in-reach and out-reach are calculated.
DESCRIPTION The input is a binary network. The output is a node by distance matrix X in which xij indicates the proportion of nodes that node i can reach in j or fewer steps. In a connected network, each row will eventually reach 1 (100%). The routine also calculates the eccentricity of each node. That is the distance of the node in question to the one that is furthest away.
In addition, the
routine calculates some descriptive statistics based on these measures
for symmetric graphs.
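The node by distance matrix and the eccentricity can be sketched for an undirected binary network as follows. The function names are hypothetical. Proportions are taken over the n-1 other nodes, which is an assumption made so that the first column agrees with normalized degree; eccentricity here is measured over reachable nodes only.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def reach_centrality(adj, max_steps):
    """Return (reach, eccentricity). reach[v][j-1] is the proportion of the
    other nodes that v can reach in j or fewer steps."""
    n = len(adj)
    reach, eccentricity = {}, {}
    for v in adj:
        dist = bfs_distances(adj, v)
        reach[v] = [
            sum(1 for u, d in dist.items() if u != v and d <= j) / (n - 1)
            for j in range(1, max_steps + 1)
        ]
        eccentricity[v] = max(dist.values())
    return reach, eccentricity


# Hypothetical path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

reach, eccentricity = reach_centrality(path, 3)
print(reach)
print(eccentricity)
```

Each row rises to 1 once the number of steps reaches that node's eccentricity, as expected for a connected network.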
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Digraph
Output Dataset: (Default = 'ReachCentrality')
Name of file which
will contain reach proportions for each node at each level of distance.
LOG FILE A table that gives the proportion of nodes reached by each node at each level of distance. The proportion is expressed as a value from zero to one. A value of x in row i column j means that 100x% of nodes are reachable from i in a path of length j or less. For directed data, values are reported both for the nodes that can be reached from each node and for the nodes that can reach it. Descriptive statistics which give the mean, standard deviation, variance, minimum value and maximum value for the proportion are given.
Finally the eccentricity
of each node is given, for directed data both in and out eccentricity
are calculated.
TIMING O(N^2).
COMMENTS When
searching for key individuals who are well positioned to reach many
people in a small number of steps, this measure provides a natural metric
for assessing each node.
REFERENCES
#^{326}$^{327}NETWORK
> CENTRALITY > BETWEENNESS >LINES
PURPOSE Calculates
the betweenness centrality of each line.
DESCRIPTION Let
bjk be the proportion of all geodesics linking vertex j and vertex k
which pass through edge i. The betweenness of edge i is the sum
of all bjk where j and k are distinct. Betweenness is therefore
a measure of the number of times an edge occurs on a geodesic.
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Digraph.
Output dataset: (Default = 'EdgeBetweenness').
Name of file which
will contain the betweenness centrality of each edge.
LOG FILE A
matrix in which the i,j th entry gives the edge betweenness of the edge
(i,j).
TIMING O(N^3).
COMMENTS Betweenness
centrality measures information control.
REFERENCES Freeman
L C (1979). 'Centrality in Social Networks: Conceptual Clarification'.
Social Networks 1, 215-239.
#^{328}$^{329}NETWORK
> CENTRALITY > BETWEENNESS >HIERARCHICAL REDUCTION
PURPOSE Produces
a hierarchically nested set of vertices based on betweenness.
DESCRIPTION The
betweenness of each vertex is calculated and those with a score of zero
are deleted. The procedure is then repeated on the reduced graph until
all vertices have been deleted. Initially all vertices are placed in
the hierarchy and then at each level the deleted vertices are removed.
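The reduction loop can be sketched as follows. This is a minimal illustration under stated assumptions rather than the routine's actual code: the function names are hypothetical, betweenness is computed with Brandes' accumulation scheme for unweighted graphs, and the loop stops early if no remaining vertex has zero betweenness (for example on a cycle), a case the description above does not cover.

```python
from collections import deque

def betweenness(adj):
    """Vertex betweenness for an unweighted graph given as
    {node: set of neighbours}. Ordered pairs are counted, so undirected
    scores are doubled; this does not affect which scores are zero."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

def hierarchical_reduction(adj):
    """Return {vertex: iteration at which it was deleted}."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    level, k = {}, 0
    while remaining:
        k += 1
        zero = [v for v, b in betweenness(remaining).items() if b == 0]
        if not zero:
            break  # assumption: stop when nothing can be deleted
        for v in zero:
            level[v] = k
            del remaining[v]
        for v in remaining:
            remaining[v] -= set(zero)
    return level
```

On a five-vertex path the two end vertices are removed first, then their neighbours, then the middle vertex.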
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Digraph.
(Output) (Default = 'hierbet').
Name of file which
will contain the partition vector. The vector consists of a single row
with each column corresponding to a vertex. A value k in column i means
that actor i was deleted after k iterations.
(Output) Partition (Default = 'hierbetpart')
Name of dataset
to contain the partition-by-item incidence matrix. Each column of this
matrix corresponds to a cluster labeled by the level of the cluster.
A value of 1 in a column labeled x and row j means that actor j was
in the cluster at level x.
LOG FILE The partition vector described above. A cluster diagram in which the columns have been re-arranged so that actors in the same cluster at each level are consecutive. A value of 1 in a row labeled x and column labelled j means that actor j was in the cluster at level x.
TIMING O(N^3).
COMMENTS
REFERENCES Freeman
L C (1979). 'Centrality in Social Networks: Conceptual Clarification'.
Social Networks 1, 215-239.
#^{330}$^{331}K^{332}NETWORK
> EGO NETWORKS > BROKERAGE
PURPOSE Calculates
the brokerage measures proposed by Gould & Fernandez (1989).
DESCRIPTION Given
(a) a graph, and (b) a partition of nodes, this procedure calculates
measures of five kinds of brokerage. Brokerage occurs when, in a triad
of nodes A, B and C, A has a tie to B, and B has a tie to C, but A has
no tie to C. That is, A needs B to reach C, and B is therefore a broker.
When A, B, and C may belong to different groups, 5 kinds of brokerage
are possible. The five kinds are named using terminology from social
roles. In the description below, the notation G(x) is used to indicate
the group that node x belongs to. Important: It is assumed
that a-->b-->c. For example, a (the source node) gives information
to b (the broker), who gives information to c (the destination node).
Coordinator. Counts
the number of times b is a broker and G(a) = G(b) = G(c), that is, all
three nodes belong to the same group.
Consultant.
Counts the number of times b is a broker and G(a) = G(c), but G(b) ≠
G(a); that is, the broker belongs to one group, and the other two belong
to a different group.
Gatekeeper.
Counts the number of times b is a broker and G(a) ≠ G(b) and G(b) = G(c); that is, the
source node belongs to a different group.
Representative.
Counts the number of times b is a broker and G(a) = G(b) and G(c) ≠
G(b); that is, the destination node belongs to a different group.
Liaison.
Counts the number of times b is a broker and G(a), G(b) and G(c) are all different; that is, each node belongs to
a different group.
When b is not the
only intermediary between a and c, it is possible to give b only partial
credit. That is, if there are two paths of length two between a and
c, one of which involves b, we can choose to give b only 1/2 point instead
of a full point. This is an option in the program.
The routine calculates
these measures for each node in the network, and also the total of the
five.
The program also
computes the expected values of each brokerage measure given the number
of groups and the size of each group. That is, the expected values under
the assumption that brokerage is independent of the group status of
nodes. A final output divides the observed brokerage values by these
expected scores.
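The five role counts, with and without partial credit, can be illustrated by a short sketch. The function name brokerage and its arguments are hypothetical, and the expected values and normalized scores are not computed here.

```python
def brokerage(adj, group, weighted=False):
    """adj is an n x n list of 0/1 rows where adj[a][b] = 1 means a-->b.
    group[x] is the group label of node x.

    Returns one dict of role scores per node. With weighted=True each
    broker of a pair (a, c) receives 1/(number of brokers of that pair).
    """
    roles = ("coordinator", "consultant", "gatekeeper",
             "representative", "liaison")
    n = len(adj)
    scores = [dict.fromkeys(roles, 0.0) for _ in range(n)]
    for a in range(n):
        for c in range(n):
            if a == c or adj[a][c]:
                continue  # brokerage requires that a has no tie to c
            brokers = [b for b in range(n)
                       if b != a and b != c and adj[a][b] and adj[b][c]]
            credit = 1.0 / len(brokers) if weighted and brokers else 1.0
            for b in brokers:
                ga, gb, gc = group[a], group[b], group[c]
                if ga == gb == gc:
                    role = "coordinator"
                elif ga == gc:
                    role = "consultant"      # broker is the outsider
                elif gb == gc:
                    role = "gatekeeper"      # source is the outsider
                elif ga == gb:
                    role = "representative"  # destination is the outsider
                else:
                    role = "liaison"         # all three groups differ
                scores[b][role] += credit
    return scores
```

Summing the five values in a node's dictionary gives that node's total brokerage score.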
PARAMETERS Input dataset:
Name of file containing
network to be analyzed. Data type: Digraph
Partition vector:
The name of a
UCINET dataset that contains a partition of the actors. To partition
the data matrix into groups specify a vector by giving the dataset name,
a dimension (either row or column) and an integer value. For example,
to use the second row of a dataset called ATTRIB, enter "ATTRIB
ROW 2". The program will then read the second row of ATTRIB and
use that information to define the groups. All actors with identical
values on the criterion vector (i.e. the second row of attrib) will
be placed in the same group.
Method: (default = 'unweighted')
Choices are 'unweighted'
and 'weighted'. Unweighted directs the program to simply count up the
number of times that a given node b is in a brokering position,
regardless of how many other nodes are serving the same function with
the same pair of endpoints a and c. Weighted directs the
program to give partial scores in inverse proportion to the number of
alternatives.
(Output) Un-normalized Brokerage
Name of the file
containing the raw count of scores for each type of brokerage.
(Output) Normalized Brokerage
Name of file containing
brokerage scores divided by the expected values.
LOG FILE 1) A table giving the brokerage scores for each node.
2) A table giving the brokerage scores divided by the expected values.
3) A table giving
the expected values.
TIMING O(n^3).
COMMENTS None
REFERENCE Gould, R. V. and Fernandez, R. M. 1989. Structures of mediation: A formal approach to brokerage in transaction networks. Sociological Methodology 19: 89-126.
#^{333}$^{334}K^{335}NETWORK
> CENTRALITY > FLOW BETWEENNESS
PURPOSE Calculates
the flow betweenness and normalized flow betweenness centrality of each
vertex and gives the overall network betweenness centralization.
DESCRIPTION Let
mjk be the amount of flow between vertex j and vertex k which must pass
through i for any maximum flow. The flow betweenness of vertex
i is the sum of all mjk where i, j and k are distinct and j < k.
The flow betweenness is therefore a measure of the contribution of a
vertex to all possible maximum flows.
The normalized flow
betweenness centrality of a vertex i is the flow betweenness of i divided
by the total flow through all pairs of points where i is not a source
or sink.
For a given binary
network with vertices v_{1}, ..., v_{n} and maximum flow betweenness centrality
c_{max}, the network flow betweenness centralization
measure is Σ_{i} (c_{max} - c(v_{i})) divided by the maximum value possible,
where c(v_{i}) is the flow betweenness centrality
of vertex v_{i}.
The routine calculates
these measures, and some descriptive statistics based on these measures
for symmetric, unsymmetric and valued graphs.
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Valued symmetric graph - integer
values only.
Output dataset: (Default = 'FlowBetweenness').
Name of file which
will contain flow-betweenness and normalized flow betweenness centrality
of each vertex.
LOG FILE The
maximum flow matrix. This gives the maximum flow between all pairs
of vertices - the diagonals give the network size.
A table which contains
a list of the flow-betweenness and normalized flow betweenness (nFlowbet)
centrality expressed as a percentage for each vertex. Descriptive
statistics which give the mean, standard deviation, variance, minimum
value and maximum value for both lists.
This is followed
by the flow betweenness network centralization index expressed as a
percentage.
TIMING O(N^4).
COMMENTS The
measure is based upon the concept of information flow. In valued
data the values should in some way correspond to the capacity for flow,
hence valued data should represent similarity.
REFERENCES Freeman
L C, Borgatti S P and White D R (1991). 'Centrality in valued
graphs: A measure of betweenness based on network flow'. Social
Networks 13, 141-154.
#^{336}$^{337}K^{338}NETWORK
> CENTRALITY > EIGENVECTOR
PURPOSE Calculates
the eigenvector of the largest positive eigenvalue as a measure of centrality.
DESCRIPTION Given
an adjacency matrix A, the centrality of vertex i (denoted c_{i})
is given by c_{i} = a Σ_{j} A_{ij} c_{j}, where a
is a parameter. The centrality of each vertex is therefore determined
by the centrality of the vertices it is connected to. The parameter
a is required to give the equations a non-trivial solution and is therefore
the reciprocal of an eigenvalue. It follows that the centralities
will be the elements of the corresponding eigenvector. The normalized
eigenvector centrality is the scaled eigenvector centrality divided
by the maximum difference possible expressed as a percentage.
For a given binary
network with vertices v_{1}, ..., v_{n} and maximum eigenvector centrality
c_{max}, the network eigenvector centralization
measure is Σ_{i} (c_{max} - c(v_{i})) divided by the maximum value possible,
where c(v_{i}) is the eigenvector centrality of
vertex v_{i}.
This routine calculates
these measures and some descriptive statistics based on these measures.
This routine only handles symmetric data and in these circumstances
the eigenvalues provide a measure of the accuracy of the centrality
measure. To help interpretation the routine calculates all positive
eigenvalues but only gives the eigenvector corresponding to the largest
eigenvalue.
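As an illustration of the idea, the eigenvector of the largest eigenvalue can be approximated by power iteration. This sketch is not the routine's own method, which reports all positive eigenvalues; the function name is hypothetical, the input is assumed to be a symmetric, non-negative matrix with at least one tie, and the iteration is applied to A + I (which has the same eigenvectors as A) so that it also converges on bipartite networks such as stars.

```python
import math

def eigenvector_centrality(A, iterations=1000, tol=1e-12):
    """A is a symmetric n x n list of rows. Returns (c, eigenvalue) where
    c is the unit-length principal eigenvector and eigenvalue is the
    corresponding largest eigenvalue of A."""
    n = len(A)
    c = [1.0] * n
    for _ in range(iterations):
        # Multiply by (A + I): the added c[i] term is the identity shift.
        new = [sum(A[i][j] * c[j] for j in range(n)) + c[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        new = [x / norm for x in new]
        converged = max(abs(x - y) for x, y in zip(new, c)) < tol
        c = new
        if converged:
            break
    # Rayleigh quotient c'Ac for a unit vector c.
    eigenvalue = sum(c[i] * sum(A[i][j] * c[j] for j in range(n))
                     for i in range(n))
    return c, eigenvalue
```

The scores returned here are scaled to unit length; any other scaling, including the normalized percentage form described above, is a constant multiple of them.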
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Valued Graph (Symmetric data only).
Output dataset: (Default = 'BonacichCentrality').
Name of file which
will contain eigenvector centrality measure for each vertex.
LOG FILE A
table of positive eigenvalues. The eigenvalues are placed in descending
order under the heading VALUE. The table gives information on
'how dominant' the largest eigenvalue is. The table gives the
percentage and cumulative percentage of the total eigenvalue sum for
each eigenvalue. The ratio of each eigenvalue to the next largest
is also presented.
This is followed
by a list of vertices which contains the eigenvector and normalized
eigenvector centrality measure for every vertex. These values
should be interpreted in terms of an interval scale.
Finally the network
eigenvector centralization index expressed as a percentage is given.
TIMING O(N^3).
COMMENTS The
ratio of the largest eigenvalue to the next largest should be at least
1.5 and preferably 2.0 or more for the centrality measure to be robust.
If this is not the case then a full factor analysis should be undertaken.
REFERENCES Bonacich
P (1972). Factoring and Weighting Approaches to status scores
and clique identification. Journal of Mathematical Sociology 2,
113-120.
#^{339}$^{340}K^{341}NETWORK
> CENTRALITY > POWER
PURPOSE Compute
Bonacich's power based centrality measure for every vertex and give
an overall network centralization index for this centrality measure.
DESCRIPTION Given
an adjacency matrix A, the centrality of vertex i (denoted c_{i})
is given by c_{i} = Σ_{j} A_{ij} (a + b c_{j}), where a and b are parameters. The centrality
of each vertex is therefore determined by the centrality of the vertices
it is connected to.
The value of a
is used to normalize the measure. The value of b is an attenuation factor which gives
the amount of dependence of each vertex's centrality on the centralities
of the vertices it is adjacent to. The normalization parameter
is automatically selected so that the sum of squares of the vertex centralities
is the size of the network.
The parameter b
is selected by the user. Negative values should be selected if an individual's
power is increased by being connected to vertices with low power, and
positive values if an individual's power is increased by being
connected to vertices with high power.
For a given binary
network with vertices v_{1}, ..., v_{n} and maximum degree centrality c_{max},
the network degree centralization measure is Σ_{i} (c_{max} - c(v_{i})) divided by the maximum value possible,
where c(v_{i}) is the degree centrality of vertex
v_{i}.
The routine calculates
power centrality and some descriptive statistics of the measure.
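The defining equation can be solved by simple fixed-point iteration when the attenuation factor is small enough. The sketch below is an assumption-laden illustration, not the routine's algorithm: the function name is hypothetical, a is provisionally set to 1 and the result is rescaled afterwards (valid because the solution is proportional to a), and convergence is assumed, which requires the absolute value of beta to be below the reciprocal of the largest eigenvalue and the network to have at least one tie.

```python
import math

def bonacich_power(A, beta, iterations=10000, tol=1e-12):
    """A is an n x n list of rows; beta is the attenuation factor b.
    Returns power centralities scaled so their sum of squares equals n."""
    n = len(A)
    c = [0.0] * n
    for _ in range(iterations):
        # c_i = sum_j A_ij (a + b c_j) with a = 1 for now.
        new = [sum(A[i][j] * (1.0 + beta * c[j]) for j in range(n))
               for i in range(n)]
        converged = max(abs(x - y) for x, y in zip(new, c)) < tol
        c = new
        if converged:
            break
    # Choosing a amounts to rescaling: make the sum of squares equal n.
    scale = math.sqrt(n / sum(x * x for x in c))
    return [scale * x for x in c]
```

With beta = 0 the scores are proportional to degree; on a star with three leaves the centre then scores three times as much as each leaf.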
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Valued Graph (Symmetric data only).
Value of attenuation factor (Beta): (Default = 0.0).
A value of
0 gives a centrality measure directly proportional to the degree of
each vertex. Positive values give weight to being connected to
powerful actors, negative values give weight to being connected to low
powered actors. Larger values in modulus give greater weight
to actors further away.
Output dataset: (Default = 'BonacichPower').
Name of file which
contains power centrality measure for every vertex.
LOG FILE A table which contains the power centrality of each actor.
Descriptive statistics
which give the mean, standard deviation, variance, minimum value and
maximum value for the measure.
TIMING O(N^3).
COMMENTS It
is advisable to select b so that its absolute value is less than the
absolute value of the reciprocal of the largest eigenvalue of the adjacency
matrix. An upper bound on the eigenvalues is given by the
largest row (or column) sum of the matrix.
REFERENCES Bonacich
P (1987). Power and Centrality: A family of Measures. American
Journal of Sociology 92, 1170-1182.
#^{342}$^{343}K^{344}NETWORK>CONNECTIONS>HUBBEL/KATZ
(INFLUENCE)
PURPOSE Calculate
the influence measure between every pair of vertices using the models
of Hubbell, Katz or Taylor.
DESCRIPTION Successive
powers of matrices provide measures of influence since they enumerate
the number of possible walks of given length between all pairs of nodes.
Since longer walks are assumed to contribute less in terms of influence,
an attenuation factor is included and the sum of all walks is taken.
Hubbell includes the identity matrix in the series whereas Katz does
not.
For Hubbell the
influence matrix is I + Σ_{i≥1} (bA)^i, which equals the inverse of (I - bA)
provided the series converges. It follows that for Katz the influence matrix
is the inverse of (I - bA), minus I, under the same condition. Taylor's
measure is a normalized version of the Katz measure. For each power
in the series subtract the column marginals from the row marginals and
normalize by the total number of walks of that length.
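The Hubbell and Katz matrices can be illustrated by summing the attenuated series directly instead of inverting (I - bA). This is a sketch under the assumption that the series converges; the function name and arguments are hypothetical, and Taylor's normalization is not attempted.

```python
def influence(A, beta, method="katz", max_terms=10000, tol=1e-12):
    """A is an n x n list of rows, beta the attenuation factor b.
    Returns the sum of (bA)^i for i >= 1 (Katz), plus the identity
    matrix when method == "hubbell"."""
    n = len(A)
    bA = [[beta * A[i][j] for j in range(n)] for i in range(n)]
    term = [row[:] for row in bA]    # current power (bA)^i, starting at i = 1
    total = [row[:] for row in bA]   # running sum of the series
    for _ in range(max_terms):
        term = [[sum(term[i][k] * bA[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
        if max(abs(x) for row in term for x in row) < tol:
            break  # remaining terms are negligible
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j]
    if method == "hubbell":
        for i in range(n):
            total[i][i] += 1.0   # Hubbell includes the identity matrix
    return total
```

Entry (i, j) of the result is read as actor i's influence over actor j, as in the output dataset described below.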
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Valued graph.
Computational Method:
Choices are:
Hubbell - influence matrix defined by inverse
of (I - bA)
where A is the adjacency matrix and b is the attenuation factor.
Katz - influence matrix defined by inverse
of (I - bA)
- I where A is the adjacency matrix and b is the attenuation factor.
Taylor - takes the Katz influence matrix,
subtracts the column marginals from the row marginals and normalizes.
Attenuation Factor (Beta): (Default = 0.5)
The value of the
attenuation factor. This value should be smaller than the reciprocal
of the absolute value of the dominant eigenvalue. This can be
guaranteed by choosing a value smaller than the reciprocal of the
largest row (or column) sum, since no eigenvalue exceeds that sum
in absolute value.
Divide matrix by overall sum: (Default = NO)
Dividing the initial
matrix by the sum of all its elements guarantees that the series will
converge.
Output dataset:(Default = 'Influence')
Name of file which
will contain the influence matrix. Row i column j will give actor
i's influence over actor j.
LOG FILE Influence
matrix.
TIMING O(N^3).
COMMENTS None.
REFERENCES Hubbell
C H (1965). 'An input-output approach to clique identification'.
Sociometry, 28, 377-399.
Katz L (1953).
'A new status index derived from sociometric analysis'. Psychometrika,
18, 39-43.
Taylor M (1969).
'Influence structures'. Sociometry 32, 490-502.
#^{345}$^{346}K^{347}NETWORK
> CENTRALITY > INFORMATION
PURPOSE Calculate
the Stephenson and Zelen information centrality measure for each vertex,
and give an overall network information centralization index.
DESCRIPTION The
weighted function of the set of all paths connecting vertex i to vertex
j is any weighted linear combination of the paths such that the sum
of the weights is unity. Assuming that each link in a path is
independent, and the variance of a single link is unity, it can be concluded
that the variance of a path is simply its length.
The information
measure between two vertices i and j is the inverse of the variance
of the weighted function. The information centrality of a vertex
i is the harmonic mean of all the information measures between i and
all other vertices in the network.
The routine calculates
these measures and some descriptive statistics based on these measures
for symmetric graphs.
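One common way of computing this measure, following the matrix formulation usually attributed to Stephenson and Zelen, is sketched below. It is an assumption that the routine works this way. The function names are hypothetical, the input is assumed to be a connected, symmetric binary network, and self-loops are ignored. B has 1 + degree on the diagonal and 1 - A_{ij} elsewhere; with C the inverse of B, T its trace and R any one of its row sums, the centrality of vertex i is 1 / (C_{ii} + (T - 2R)/n).

```python
def invert(M):
    """Gauss-Jordan inversion with partial pivoting (pure Python)."""
    n = len(M)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def information_centrality(A):
    """A is a symmetric 0/1 n x n list of rows for a connected graph."""
    n = len(A)
    degree = [sum(A[i][k] for k in range(n) if k != i) for i in range(n)]
    B = [[1 + degree[i] if i == j else 1 - A[i][j] for j in range(n)]
         for i in range(n)]
    C = invert(B)
    T = sum(C[i][i] for i in range(n))   # trace of C
    R = sum(C[0])                        # every row of C has the same sum
    return [1.0 / (C[i][i] + (T - 2.0 * R) / n) for i in range(n)]
```

On a three-vertex path this gives 1.0 for each end vertex and 1.5 for the middle vertex, which matches the harmonic mean of the pairwise information values 1 (one step) and 1/2 (two steps).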
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Graph.
Include diagonal in calculations? (Default = NO).
If NO self-loops
are ignored.
Output dataset: (Default = 'Information').
Name of file which
will contain information content and normalized information centrality
of each vertex.
LOG FILE A
table which contains a list of the information content, together with
descriptive statistics which give the mean, standard deviation, variance,
minimum value and maximum value.
TIMING O(N^3).
COMMENTS None
REFERENCES Stephenson
K and Zelen M (1989). 'Rethinking Centrality: Methods and examples'. Social Networks
11, 1-37.
#^{348}$^{349}NETWORKS>CENTRALITY>MULTIPLE
MEASURES
PURPOSE Computes
four normalized centrality measures: degree, closeness, betweenness,
and eigenvector.
DESCRIPTION Only normalized versions of the measures for undirected data are given. There are no descriptive statistics, nor are there any centralization measures.
PARAMETERS Input dataset:
Name of file containing
network to be analyzed. Data type: Graph
Output Dataset: (Default = 'Centrality')
Name of file which
will contain centrality measures for each node.
LOG FILE A
table of centrality measures.
TIMING O(N^2).
COMMENTS
REFERENCES See
individual measures.
#^{350}$^{351}GROUP
> CENTRALITY > DEGREE > FIND
PURPOSE Find
a group with a specified size with the highest group degree centrality.
DESCRIPTION The
group degree centrality of a group of actors is the size of the set
of actors who are directly connected to group members. This routine
uses a simple greedy algorithm to optimize this measure for a fixed
size group. The risk of stopping at a local optimum is reduced by taking a number of different
random starting configurations.
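A greedy search with random restarts can be sketched as follows. The names group_degree and find_best_group are hypothetical, the swap-one-member neighbourhood is an assumption about what "greedy" means here, and group degree is taken to count only actors outside the group, following Everett and Borgatti (1999).

```python
import random

def group_degree(adj, group):
    """Number of non-members adjacent to at least one group member.
    adj maps each actor to the set of actors it is tied to."""
    members = set(group)
    reached = set()
    for v in members:
        reached.update(adj[v])
    return len(reached - members)

def find_best_group(adj, size, starts=100, seed=0):
    """Greedy hill climbing from `starts` random groups of the given size."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    best, best_score = None, -1
    for _ in range(starts):
        group = set(rng.sample(nodes, size))
        score = group_degree(adj, group)
        improved = True
        while improved:
            improved = False
            # Try swapping one member for one non-member; keep the first gain.
            for out in sorted(group):
                for new in nodes:
                    if new in group:
                        continue
                    candidate = (group - {out}) | {new}
                    s = group_degree(adj, candidate)
                    if s > score:
                        group, score, improved = candidate, s, True
                        break
                if improved:
                    break
        if score > best_score:
            best, best_score = group, score
    return best, best_score
```

On a network made of two separate stars, the search settles on the two centres as the best group of size two.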
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Graph.
Desired Group Size (Default = 10).
Specified size
of group.
No. of starts: (Default = 100)
Number of random
starts used to reduce the risk of stopping at a local optimum.
Output dataset: (Default = 'BestDegGroup')
Name of UCINET
dataset containing a group indicator vector. The rows give the actors
and an actor is in the group with the largest group degree centrality
if the entry in the vector is a 1. This vector is not shown in the LOG FILE.
LOG FILE The
fit is the percentage of actors (both within and outside) adjacent to
group members. The starting fit, final fit and the number of actors
together with the final number of actors connected to the group are
reported. This is followed by a list of the members of the group with
the highest group degree centrality.
TIMING O(N^2).
COMMENTS Note
that this routine just finds one group. There could be many others.
REFERENCE Everett,
M.G. and Borgatti, S.P. (1999) The Centrality of Groups and Classes.
Journal of Mathematical Sociology 23 181-202.
#^{352}$^{353}GROUP
> CENTRALITY > DEGREE > TEST
PURPOSE Performs a permutation test to assess
whether a specified group has a high group degree centrality score.
DESCRIPTION The group degree centrality of a group
of actors is the size of the set of actors who are directly connected
to group members. This routine uses a simple sampling procedure to test
whether a specified group has a higher group degree centrality measure
than those produced at random.
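The sampling procedure can be sketched as below. The function names are hypothetical, random groups are assumed to be drawn uniformly without replacement at the same size as the specified group, and group degree is taken to count only actors outside the group.

```python
import random

def group_degree(adj, group):
    """Number of non-members adjacent to at least one group member."""
    members = set(group)
    reached = set()
    for v in members:
        reached.update(adj[v])
    return len(reached - members)

def group_degree_test(adj, group, permutations=5000, seed=0):
    """Returns (observed, mean, sd, p_value), where p_value is the share of
    random groups scoring as high as or higher than the observed group."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    size = len(set(group))
    observed = group_degree(adj, group)
    samples = [group_degree(adj, rng.sample(nodes, size))
               for _ in range(permutations)]
    mean = sum(samples) / permutations
    sd = (sum((s - mean) ** 2 for s in samples) / permutations) ** 0.5
    p_value = sum(1 for s in samples if s >= observed) / permutations
    return observed, mean, sd, p_value
```

A small p-value indicates that few random groups of the same size reach as many actors as the specified group.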
PARAMETERS Input Network:
Name of file containing
network to be analyzed. Data type: Graph.
Central Group
Name of UCINET
file containing a column vector which specifies the actors in the specified
group. A 1 in row j indicates that actor j is in the group and a 0 indicates
that the actor is not a member.
Number of permutations (Default = 5000)
Number of permutations
taken in the random sampling procedure.
LOG FILE The group degree centrality for the
specified group, labelled as the observed # reached, followed by the
mean and standard deviation of the group centrality for the random samples.
Finally the proportion of random samples, expressed as a p-value, that
achieved a group centrality score as high or higher than the specified
group.
TIMING O(N^2).
COMMENTS None
REFERENCE Everett,
M.G. and Borgatti, S.P. (1999) The Centrality of Groups and Classes.
Journal of Mathematical Sociology 23 181-202.
#^{354}$^{355}K^{356}NETWORK
> CORE/PERIPHERY > CONTINUOUS
PURPOSE Fit
a continuous (ratio-level) core/periphery model to a data network, and
estimate the coreness of each actor.
DESCRIPTION Simultaneously
fits a core/periphery model to the data network and estimates the degree
of coreness, or closeness to the core, of each actor. This is done by
finding a vector C such that the product of C and C transpose is as
close as possible to the original data matrix.
In addition, a number
of measures are calculated which assess the degree to which the network falls
into a core/periphery structure for different sizes of core.
Each measure starts by placing the actor with the highest coreness score
in the core and all other actors in the periphery.
The core is then successively enlarged by moving the actor with the
highest coreness score from the periphery into the core. This is continued
until the periphery consists of a single actor.
nDiff is a generalization
of centralization. It sums the differences between the lowest coreness
score in the core and the scores of all actors in the periphery,
and adds to this the sum of the differences between the
highest score in the periphery and the scores of all actors in the core. This
value is then normalized.
Diff is similar but places a weighting on
the size of the core. This weighting is equal to the square root of
the core size, so the measure gives greater value to smaller cores.
The correlation measure correlates the given coreness scores with the
ideal scores of one for every core member and zero for every actor in
the periphery.
Ident is the same as the correlation measure
but uses Euclidean distance in place of correlation.
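The correlation measure for successive core sizes can be illustrated with a short sketch. The function names are hypothetical, the coreness scores are taken as given (fitting the vector C is not shown), and the scores are assumed not to be all equal, since the correlation is undefined in that case.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def core_size_correlations(coreness):
    """Entry k-1 of the result is the correlation between the coreness
    scores and an ideal vector with 1 for the k highest-scoring actors
    and 0 for the rest, for k = 1 .. n-1."""
    n = len(coreness)
    order = sorted(range(n), key=lambda i: -coreness[i])
    result = []
    for k in range(1, n):
        core = set(order[:k])
        ideal = [1.0 if i in core else 0.0 for i in range(n)]
        result.append(pearson(coreness, ideal))
    return result
```

The core size with the highest correlation is one candidate for the recommended core size reported in the LOG FILE.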
PARAMETERS Input dataset:
Name of file containing
network to be analyzed. Data type: Valued Digraph.
Data are Pos or Neg: (Default = POSITIVE)
Use positive to indicate that larger values imply
a stronger relationship. Use negative to indicate that larger values in
the data imply a more distant relationship.
Use Corr or Distance: (Default = CORR)
Which measure of
fit to use. Corr measures the correlation between the
data matrix and the product of C and C transpose. Distance uses Euclidean distance in place of
correlation; in this case C is simply the principal eigenvector. Minres is factor analysis without diagonals.
Prevent Negatives:
It is possible
for the best C to contain negative values. Choosing yes prevents this from happening.
Max # of iterations: (Default = 1000)
The maximum number
of iterations used in the optimization procedure.
Diagonal values valid: (Default = NO)
If NO diagonal
values are ignored.
Output dataset: (Default = 'Coreness')
Name of file containing
coreness values.
LOG FILE The correlation or Euclidean distance between the model and the data at the start and end of the optimization procedure together with the number of iterations required. Minres option just gives the final correlation.
The coreness of each actor, normalized so that the sum of squares is one. This is followed by some descriptive statistics including Gini coefficients and a heterogeneity measure. The Gini coefficient measures how the scores are distributed over the population and so measures the amount of inequality in the data. If everyone had the same score it gives a value of zero; if a single actor had a value of 1 and everyone else had a score of zero it gives a value of 1. The composite score is an adjusted measure which takes account of the fact that we are looking for core-periphery structures. The heterogeneity measure is based on a simple summing of proportions which measures the extent to which the scores are evenly distributed.
This is followed by a table of the four concentration measures which assess the extent to which the data fits a core periphery structure. Each column gives a different measure, the value in row i places the i actors with the highest coreness in the core and the remainder in the periphery.
This is followed by a recommended core size based on the correlation measure. See the comments below.
Finally the expected
values are given, this is C times C transpose and then normalized so
that it has the same mean and standard deviation as the data.
TIMING O(N^3)
COMMENTS The
concentration measures can need careful interpretation. If nDiff has
a clear maximum which is not at 1 or n-1 then this indicates a solid
core periphery structure. Often nDiff has a number of maxima, indicating
that there is a group of actors situated between the core and the periphery.
If the user still wishes to specify a core then the other measures can
be used. Diff is a biased measure and gives more weight to smaller cores,
and again if this has a clear maximum this can indicate a core. If this
does not yield any conclusive results or there is no requirement to
favor smaller cores then it is recommended that the correlation is used
together with nDiff or Diff. The correlation measure can indicate an
area in which to focus and the other measures can be used to fine tune
the measure to identify a core size. Ident should be used in the same
way as correlation but it places more weight on the absolute scores.
REFERENCES Borgatti SP and Everett M G (1999) Models of core/periphery structures. Social Networks 21 375-395
Comrey AL (1962)
The minimum residual method for factor analysis. Psychological Reports
11, 15-18.
#^{357}$^{358}K^{359}NETWORK
> CORE/PERIPHERY > CATEGORICAL
PURPOSE Uses
a genetic algorithm to fit a core/periphery model to the data.
DESCRIPTION Simultaneously
fits a core/periphery model to the data network, and identifies which
actors belong in the core and which belong in the periphery.
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Valued Digraph.
Data are Pos or Neg: (Default = POSITIVE)
Use positive
to indicate that larger values imply a stronger relationship. Use
negative to indicate that larger values in the data imply a more
distant relationship.
Algorithm: (Default = CORR)
Choices are:
CORR
The fit function is the correlation between the permuted data matrix and an ideal structure matrix consisting of ones in the core block interactions and zeros in the peripheral block interactions. This value is maximized.
DENSITY
The fit function is
the density of the core block interactions. This value is maximized.
SXY
The fit function is
the sum of the element-wise product of the permuted data matrix and an ideal structure
matrix consisting of ones in the core block interactions and zeros in
the peripheral block interactions. This value is maximized.
EMPTYPER
The fit function is
the number of entries in the peripheral block interactions. This value
is minimized.
Density of core-to-periphery blocks:
This sets the density
of the core to periphery ties in the ideal structure matrix. If left
blank or the word 'missing' is entered these ties are ignored. Any other
value is entered into every cell in the off diagonal blocks of the ideal
structure matrix.
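The CORR fit function for one candidate partition can be sketched as follows. This shows only the evaluation of a single partition, not the genetic search; the function name and arguments are hypothetical, and ignoring the diagonal cells is an assumption. Passing between=None mimics leaving the core-to-periphery density blank, so those cells are skipped.

```python
import math

def cp_correlation(data, core, between=None):
    """Correlation between the data matrix and the ideal core/periphery
    structure for the given set of core actors.

    data    : n x n list of rows
    core    : iterable of actor indices placed in the core
    between : value used for core-to-periphery cells of the ideal
              matrix, or None to ignore those cells
    """
    core = set(core)
    n = len(data)
    xs, ys = [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: diagonal cells are not used
            if i in core and j in core:
                ideal = 1.0
            elif i not in core and j not in core:
                ideal = 0.0
            elif between is None:
                continue
            else:
                ideal = float(between)
            xs.append(float(data[i][j]))
            ys.append(ideal)
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)
```

A search procedure would call this function on many partitions and keep the one with the largest value.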
Maximum # of iterations: (Default = 200)
Sets the maximum number
of iterations performed.
Population Size: (Default = 100)
Number of genes in
the population.
Output partition: (Default = 'CLUSPART')
Name of output file
which contains a cluster indicator vector. This vector has the
form (k1,k2,...ki...) where ki assigns vertex i to cluster ki where ki is either 1 or 2 where 1
is the core and 2 is the periphery,
so that (1 1 2 1 2) assigns vertices 1, 2 and 4 to the
core, and 3 and 5 to the periphery.
This vector is not displayed at output.
Output cluster indicator matrix: (Default = 'CLUSTERS')
Name of file which
contains a cluster by actor incidence matrix. A 1 in row i column
j indicates that actor j is a member of cluster i,
i = 1 or 2 with 1 representing the core and 2 the periphery. This matrix is not displayed in
the LOG FILE.
LOG FILE The
starting and the final correlation of the ideal structure and the permuted
adjacency matrix (regardless of which option was chosen). A listing
of the members of the core and the periphery. A blocked adjacency matrix
dividing the actors into the core and periphery.
TIMING O(N^2)
per iteration. Correlation is considerably slower than the other options.
COMMENTS Care
should be taken when using this routine.
The algorithm seeks
to find the minimum (maximum) of the cost function. Even if successful
this result may still be a high (low) value in which case the partition
may not represent a core/periphery model.
In addition there
may be a number of alternative partitions which also produce the minimum
(maximum) value; the algorithm does not search for additional
solutions. Finally it is possible that the routine terminates
at a local minimum (maximum) and does not locate the desired global minimum
(maximum).
To test the robustness
of the solution the algorithm should be run a number of times from different
starting configurations. If there is good agreement between these
results then this is a sign that there is a clear split of the data
into a core/periphery structure.
REFERENCES Borgatti SP and Everett M G (1999)
Models of core/periphery structures. Social Networks 21 375-395
#^{360}$^{361}K^{362}NETWORK
> ROLES & POSITIONS > STRUCTURAL > PROFILE
PURPOSE Compute
measures of structural equivalence based upon comparisons of rows and
columns of data matrices and form clusters based upon the results.
DESCRIPTION The
profile of an actor is the row vector corresponding to the actor in
the adjacency matrix. Multiple relations are permissible and the
profile vector is the concatenation of each individual relation profile
vector. This matrix can be real or binary.
Structurally equivalent
actors have the same profile except for the diagonal entries of the
adjacency matrix. This routine compares the profile vectors of
all pairs of actors and hence computes a measure of profile similarity.
Measures of similarity can be made using Euclidean distance, Pearson
correlation, exact matches or matches of positive entries only.
Euclidean distance produces a distance matrix and all the other options
produce a similarity matrix. This matrix is then analyzed by single
link hierarchical clustering.
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Multirelational.
Measure of profile similarity/distance: (Default = EUCLIDEAN DISTANCE).
Choices are:
Euclidean
Distance - The distance
between the vectors in n-dimensional space, i.e. the root of the sum
of squared differences.
Correlation
- Pearson product-moment correlation coefficient of every pair of profiles.
Matches - Proportion of exact matches between
all pairs of profiles.
Positive
Matches - Proportion
of exact matches in which at least one element is positive, between
all pairs of profiles.
Method of handling diagonal values: (Default = RECIPROCAL)
Choices are:
Reciprocal - In considering adjacency matrix
X and comparing profile of actor i with actor j we replace the
comparison of elements x_{ii} with x_{ji} and x_{ij} with x_{jj} by the comparisons x_{ii} with x_{jj} and x_{ij} with x_{ji} respectively.
Ignore - Diagonals are treated as
missing values so that the comparisons of x_{ii} with x_{ji} and x_{ij} with x_{jj} are dropped.
Retain - Profile vectors are compared directly
element by element, including the x_{ii} and x_{jj} elements.
Include transpose in calculations?: (Default = YES).
Including transposes
means that profiles correspond to rows and columns. This is obviously
not necessary for symmetric data.
For binary data: convert to geodesic distances: (Default = NO).
Converts binary
data to geodesic data before performing an analysis.
Diagram Type: (Default = 'Dendrogram')
The clustering
diagram can either be a Tree Diagram or a Dendrogram.
(Output) Equivalence matrix: (Default = 'SE').
Name of data file
containing actor by actor equivalence matrix.
(Output) Partition dataset: (Default = 'SEPart').
Name of data file
containing partition indicator matrices derived from single link hierarchical
clustering. A value of k in row labeled x and column j means that
actor j is in partition k at level x. Actor k is always a member
of partition k, and is a representative label for the group. This matrix
is not displayed in the LOG FILE.
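The three options for handling diagonal values listed under PARAMETERS can be illustrated for a single pair of actors. The sketch below is an assumption-laden example in Python with NumPy (the function name row_pair is hypothetical) and it aligns row profiles only; column profiles would be treated in the same way.

```python
import numpy as np

def row_pair(adj, i, j, diagonal="reciprocal"):
    """Return the row vectors of actors i and j, aligned for comparison."""
    x = np.asarray(adj, dtype=float)
    ri, rj = x[i].copy(), x[j].copy()
    if diagonal == "reciprocal":
        # Swap two cells of row j so that x_ii meets x_jj and x_ij meets x_ji.
        rj[[i, j]] = rj[[j, i]]
    elif diagonal == "ignore":
        # Drop the cells in positions i and j, treating them as missing.
        keep = np.ones(x.shape[0], dtype=bool)
        keep[[i, j]] = False
        ri, rj = ri[keep], rj[keep]
    elif diagonal != "retain":
        raise ValueError(f"unknown option: {diagonal}")
    return ri, rj

# Actors 0 and 1 choose each other and both choose actor 2.
adj = [[0, 1, 1],
       [1, 0, 1],
       [0, 0, 0]]
recip_i, recip_j = row_pair(adj, 0, 1, "reciprocal")
retain_i, retain_j = row_pair(adj, 0, 1, "retain")
```

In this example the Reciprocal option makes the two rows identical, whereas Retain reports two mismatches that come only from the diagonal and the tie between the two actors themselves.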
LOG FILE Single
link hierarchical clustering dendrogram (or tree diagram) of the structural
equivalence matrix. The level at which any pair of actors are aggregated
is the point at which both can be reached by tracing from the start
to the actors from right to left. The diagram can be printed or
saved. Parts of the diagram can be viewed by moving the mouse to the
split point in a tree diagram or the beginning of a line in the dendrogram
and clicking. The first click will highlight a portion of the diagram
and the second click will display just the highlighted portion. To return
to the original, right click on the mouse. There is also a simple zoom
facility: simply change the values and then press enter. If the labels
need to be edited (particularly the scale labels) then you should take
the partition indicator matrix into the spreadsheet editor, remove or
reduce the labels, and then submit the edited data to Tools>Dendrogram>Draw.
Behind the plot
is the actor by actor structural equivalence matrix. This is followed
by an alternative clustering diagram representing the same information
as above. The columns are rearranged and labeled. A '·'
in column label j at level x means that actor j is not in any cluster
at level x. An x indicates that actor j is in a cluster at this
level together with those actors which can be traced across that row
without encountering a space.
TIMING O(N^2).
COMMENTS None.
REFERENCES Burt
R (1976). Positions in Networks. Social Forces, 55, 93-122.
NETWORK > ROLES & POSITIONS > STRUCTURAL EQUIVALENCE > CONCOR
PURPOSE Partitions
network data by splitting blocks based upon the CONvergence of iterated
CORrelations (CONCOR).
DESCRIPTION Given
an adjacency matrix, or a set of adjacency matrices for different relations,
a correlation matrix can be formed by the following procedure.
Form a profile vector for a vertex i by concatenating the ith row in
every adjacency matrix; the i,jth element of the correlation matrix
is the Pearson correlation coefficient of the profile vectors of i and
j. This (square, symmetric) matrix is called the first correlation
matrix.
The procedure can
be performed iteratively on the correlation matrix until convergence.
Each entry is now 1 or -1. This matrix is used to split the data
into two blocks such that members of the same block are positively correlated,
members of different blocks are negatively correlated.
CONCOR uses the
above technique to split the initial data into two blocks.  Successive
splits are then applied to the separate blocks.  At each iteration
all blocks are submitted for analysis; however, blocks containing two
vertices are not split.  Consequently n-partitions of the binary
tree can produce up to 2^n blocks.
Note that any similarity
matrix can be used as input.
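A single CONCOR split as described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the UCINET routine: the function name concor_split is hypothetical, diagonal values are retained, and every profile must have some variation so that the correlations are defined.

```python
import numpy as np

def concor_split(adj, tol=0.2, max_iter=25, include_transpose=True):
    """One CONCOR split: iterate correlations, then divide actors by sign."""
    x = np.asarray(adj, dtype=float)
    prof = np.hstack([x, x.T]) if include_transpose else x
    first = np.corrcoef(prof)            # first correlation matrix
    corr = first.copy()
    for _ in range(max_iter):
        if np.all(np.abs(corr) >= 1.0 - tol):
            break                        # every entry is within TOL of +1 or -1
        corr = np.corrcoef(corr)         # correlate the rows of the previous matrix
    # Actors positively correlated with actor 0 form block 1, the rest block 2.
    block = np.where(corr[0] > 0, 1, 2)
    return first, block

# Two groups of three actors with ties only inside each group.
adj = np.array([[0, 1, 1, 0, 0, 0],
                [1, 0, 1, 0, 0, 0],
                [1, 1, 0, 0, 0, 0],
                [0, 0, 0, 0, 1, 1],
                [0, 0, 0, 1, 0, 1],
                [0, 0, 0, 1, 1, 0]])
first, block = concor_split(adj)
```

Further splits would be obtained by applying the same function to the rows and columns of each resulting block in turn.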
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Multirelational.
Include transpose in calculations?: (Default = YES).
For non-symmetric
data, each vertex's profile would depend on its out-ties only (since
we only consider rows).  The in-ties can be considered by adding
the transpose of the data matrices as additional relations.
Method of handling diagonal values: (Default = RECIPROCAL)
Choices are:
Reciprocal - In considering adjacency matrix
X and comparing profile of actor i with actor j we replace the
comparison of elements x_{ii} with x_{ji} and x_{ij} with x_{jj} by the comparisons x_{ii} with x_{jj} and x_{ij} with x_{ji} respectively.
Ignore - Diagonals are treated as
missing values so that the comparisons of x_{ii} with x_{ji} and x_{ij} with x_{jj} are dropped.
Retain
- Profile vectors are compared directly element by element, including
the x_{ii} and x_{jj} elements.
Max depth of splits (not blocks): (Default = 2).
How far down the
binary tree splits are to be taken.  A value of n can produce up
to 2^n blocks.
Convergence criteria: (Default = 0.2).
In practice iterations
are not taken to convergence but taken to within a tolerance TOL.
Convergence is accepted on values of 1.0 - TOL and -1.0 + TOL.
Smaller values of TOL increase computation time but create more robust
solutions.
Maximum iterations: (Default = 25).
The maximum number
of iterations performed on the correlation matrix before terminating
through lack of convergence.
Input is corr mat: (Default = 'No')
If the input dataset
is already a correlation matrix then set to 'Yes'.
(Output) Partition dataset: (Default = 'ConcorCPart').
Name of file which
contains partition by actor indicator matrix. The indicator matrix has
the same number of rows as specified by the 'Max # of partitions'; the
number of columns equals the size of the network. The value k in row
i column label j means that vertex labeled j is in block k at level
i (that is the ith partition). All other members of block k can
be found by simply locating all column labels which correspond to an
entry of k in the matrix. This matrix is not displayed in the LOG FILE.
(Output) Permuted dataset: (Default = 'ConcorCCPerm').
Name of file which
contains permuted vertex vector. Permuted vector is such that
vertices in the same block are grouped together. This vector is not
displayed in the LOG FILE.
(Output) First correlation matrix: (Default = 'Concor1stCorr').
Name of file which
contains the correlation matrix constructed after the first iteration.
LOG FILE The
correlation matrix constructed during the first iteration.
Blocks represented
in terms of a clustering dendrogram. The blocks are given for each level
specified in 'Max # of partitions'. The level at which any pair of actors
are aggregated is the point at which both can be reached by tracing
from the start to the actors from right to left. Hence to find all members
of vertex i's block at level k simply locate the value of k on the line
connected to i then all actors that can be reached from this point by
tracing to the left are in i's block. The diagram can be printed or
saved. Parts of the diagram can be viewed by moving the mouse to the
split point in a tree diagram or the beginning of a line in the dendrogram
and clicking. The first click will highlight a portion of the diagram
and the second click will display just the highlighted portion. To return
to the original, right click on the mouse. There is also a simple zoom
facility: simply change the values and then press enter. If the labels
need to be edited (particularly the scale labels) then you should take
the partition indicator matrix into the spreadsheet editor, remove or
reduce the labels, and then submit the edited data to Tools>Dendrogram>Draw.
Behind the dendrogram
is the correlation matrix constructed during the first iteration, followed
by an alternative cluster diagram. Members of the same block are connected
by a row of X's.  Hence to find all members of vertex i's block at
level k simply locate the X in column label i at level k and trace along
in both directions until a space is encountered. All column labels
corresponding to the Xs found are members of i's block. A '·'
indicates a singleton block.
A blocked adjacency
matrix. The rows and columns of the original adjacency matrix
are permuted into blocks. The adjacency matrix is displayed in
terms of the matrix blocks it contains.
The correlation
coefficient R-squared of the partitioned data matrix and an ideal structure
matrix. The structure matrix has the same dimension as the data
matrix but each cell in a block is set to the average value of the corresponding
block in the data matrix.
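The fit statistic described above can be sketched as follows in Python with NumPy. The sketch assumes that R-squared means the squared Pearson correlation between the cells of the data matrix and the cells of the structure matrix, and that diagonal cells are left out; both points, and the function name, are assumptions rather than documented behaviour.

```python
import numpy as np

def block_mean_r_squared(adj, partition, use_diagonal=False):
    """Squared correlation between the data and its block-mean structure matrix."""
    x = np.asarray(adj, dtype=float)
    part = np.asarray(partition)
    valid = np.ones(x.shape, dtype=bool)
    if not use_diagonal:
        np.fill_diagonal(valid, False)   # assumed: diagonal cells are excluded
    ideal = np.zeros_like(x)
    for a in np.unique(part):
        for b in np.unique(part):
            cells = np.outer(part == a, part == b) & valid
            if cells.any():
                ideal[cells] = x[cells].mean()   # each cell gets its block average
    r = np.corrcoef(x[valid], ideal[valid])[0, 1]
    return float(r ** 2)

two_groups = np.array([[0, 1, 1, 0, 0, 0],
                       [1, 0, 1, 0, 0, 0],
                       [1, 1, 0, 0, 0, 0],
                       [0, 0, 0, 0, 1, 1],
                       [0, 0, 0, 1, 0, 1],
                       [0, 0, 0, 1, 1, 0]])
noisy = two_groups.copy()
noisy[0, 3] = 1                          # one tie that crosses the two groups
r2_perfect = block_mean_r_squared(two_groups, [1, 1, 1, 2, 2, 2])
r2_noisy = block_mean_r_squared(noisy, [1, 1, 1, 2, 2, 2])
```

A partition whose blocks are perfectly homogeneous gives a value of 1, and the value falls as the blocks become more mixed.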
TIMING Each
iteration is O(N^3).
COMMENTS The
algorithm splits every non-trivial block at every level. The user
may wish to reject a split at some level - since the history of all
splits are given it is a simple matter to recombine clusters if the
user so wishes.
REFERENCES Breiger
R, Boorman S and Arabie P (1975). An algorithm for clustering
relational data, with applications to social network analysis and comparison
with multi-dimensional scaling. Journal of Mathematical Psychology,
12, 328-383.
NETWORKS>ROLES & POSITIONS>STRUCTURAL EQUIVALENCE>OPTIMIZATION>BINARY
PURPOSE Optimizes
a cost function which measures the degree to which a partition forms
structurally equivalent blocks using a tabu search method.
DESCRIPTION A
partition of a network divides the adjacency matrix into matrix blocks. For perfect structural equivalence
each block should consist of all zeros or all ones.  The number of
errors in a block is the least number of changes required to make it either
all zeros or all ones. The
sum of the errors of all the matrix blocks gives a measure
or cost function of the degree of structural equivalence for a given
partition. The routine attempts to optimize this cost function to try
and find the best partition of the vertices into a specified number
of blocks.
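The cost function described above can be written out directly. The following Python sketch (NumPy; the function name block_errors is hypothetical) counts, for each matrix block, the smaller of the number of ones and the number of zeros, and sums these counts. Diagonal cells are skipped by default, which corresponds to the default setting of the diagonal parameter below.

```python
import numpy as np

def block_errors(adj, partition, use_diagonal=False):
    """Total number of changes needed to make every matrix block all 0 or all 1."""
    x = np.asarray(adj).astype(bool)
    part = np.asarray(partition)
    valid = np.ones(x.shape, dtype=bool)
    if not use_diagonal:
        np.fill_diagonal(valid, False)
    total = 0
    for a in np.unique(part):
        for b in np.unique(part):
            cells = np.outer(part == a, part == b) & valid
            ones = int(x[cells].sum())
            zeros = int(cells.sum()) - ones
            total += min(ones, zeros)    # cheapest way to make the block uniform
    return total

adj = np.array([[0, 1, 1, 0, 0, 0],
                [1, 0, 1, 0, 0, 0],
                [1, 1, 0, 0, 0, 0],
                [0, 0, 0, 0, 1, 1],
                [0, 0, 0, 1, 0, 1],
                [0, 0, 0, 1, 1, 0]])
good = block_errors(adj, [1, 1, 1, 2, 2, 2])   # the two groups are recovered
bad = block_errors(adj, [1, 1, 2, 2, 2, 2])    # vertex 3 placed in the wrong block
```

A cost of zero means the partition is a perfect structural equivalence blocking; misplacing a single vertex in this example already raises the cost considerably.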
PARAMETERS
Input dataset:
Name of file containing
network to be analyzed. Data type: Graph.
Number of blocks: (Default = 2).
Number of groups or
blocks into which the vertices are to be assigned. The number
of matrix blocks will be the square of this number.
Output sets dataset: (Default = 'SbmSets').
Name of file which
contains a block by actor incidence matrix. A 1 in row i column
j indicates that actor j is a member of block i. This matrix is
not displayed in the LOG FILE.
Output Partition Dataset: (Default = 'SbmPart').
Name of output file which contains a partition indicator vector. This vector has the form (k1,k2,...ki...) where ki assigns vertex i to block ki, so that (1 1 2 1 2) assigns vertices 1, 2 and 4 to block 1 and 3 and 5 to block 2.
This vector is not
displayed in the LOG FILE.
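The two output formats carry the same information. As an illustration only (the function name is hypothetical and this is not how UCINET stores its files), the partition indicator vector from the example above can be turned into the block by actor incidence matrix in Python with NumPy:

```python
import numpy as np

def partition_to_sets(partition):
    """Convert a partition indicator vector to a block by actor 0/1 matrix."""
    part = np.asarray(partition)
    blocks = np.unique(part)             # block labels in increasing order
    return (part[None, :] == blocks[:, None]).astype(int)

sets = partition_to_sets([1, 1, 2, 1, 2])
# Row 0 (block 1): 1 1 0 1 0
# Row 1 (block 2): 0 0 1 0 1
```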
Additional
Are diagonal values valid? (Default = NO)
Whether diagonals are
to be included in cost function.
Maximum # of iterations in a series: (Default = 50)
The algorithm starts
from an arbitrary partition and attempts to decrease the cost by taking
the steepest descent. If the cost cannot be reduced then the algorithm
continues its search in the neighborhood of the current partition. This
search direction is a mildest ascent direction and from there new search
directions are explored. This exploration only continues for a
fixed number of iterations in a series. If no improvement is made
after the fixed number of iterations the algorithm terminates with the
current minimum. Increasing the parameter gives a more exhaustive and
therefore slower search.
Random Number Seed:
The random number seed
generates the initial partition. UCINET generates a different
random number as default each time it is run. This number should
be changed if the user wishes to repeat the analysis with different
initial configurations. The range is 1 to 32000.
Length of time in penalty box: (Default =25)
If the algorithm makes
an ascending step then it is possible that the best possible descending
step is the reverse of the direction just taken. This parameter
prohibits a move along the reverse direction for a set number of steps.
The larger the value the more difficult it will be to come back to a
previously explored local minimum, however it will also be more difficult
to explore the vicinity of that minimum. The default has been shown
experimentally to be the most useful.
Number of random starts: (Default = 5)
The whole procedure
is repeated with a different initial partition for each start.  The best of these
solutions is then selected as the minimum.
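The search parameters above can be related to one another with a small tabu search sketch in Python. It is a simplified illustration under stated assumptions and not the UCINET algorithm: a move reassigns one vertex to another block, the penalty box forbids moving that vertex back for a fixed number of steps, blocks are numbered from 0, and only a single random start is performed. All function and parameter names are hypothetical.

```python
import random
import numpy as np

def cost(x, part, k):
    """Sum over matrix blocks of min(#ones, #zeros), diagonal excluded."""
    off = ~np.eye(len(part), dtype=bool)
    total = 0
    for a in range(k):
        for b in range(k):
            cells = np.outer(part == a, part == b) & off
            ones = int(x[cells].sum())
            total += min(ones, int(cells.sum()) - ones)
    return total

def tabu_search(adj, k=2, max_series=50, penalty=25, seed=1):
    x = np.asarray(adj).astype(bool)
    n = x.shape[0]
    rng = random.Random(seed)
    part = np.array([rng.randrange(k) for _ in range(n)])  # initial partition
    best, best_cost = part.copy(), cost(x, part, k)
    tabu = {}            # (vertex, block) -> last step at which the move is banned
    step = stale = 0
    while stale < max_series:
        step += 1
        candidates = []
        for v in range(n):
            for b in range(k):
                if b == part[v] or tabu.get((v, b), 0) >= step:
                    continue
                trial = part.copy()
                trial[v] = b
                candidates.append((cost(x, trial, k), v, b))
        if not candidates:
            break
        c, v, b = min(candidates)        # steepest descent, else mildest ascent
        tabu[(v, int(part[v]))] = step + penalty   # forbid reversing this move
        part[v] = b
        if c < best_cost:
            best, best_cost, stale = part.copy(), c, 0
        else:
            stale += 1                   # no improvement in this series yet
    return best, best_cost

adj = np.array([[0, 1, 1, 0, 0, 0],
                [1, 0, 1, 0, 0, 0],
                [1, 1, 0, 0, 0, 0],
                [0, 0, 0, 0, 1, 1],
                [0, 0, 0, 1, 0, 1],
                [0, 0, 0, 1, 1, 0]])
best, best_cost = tabu_search(adj, k=2, seed=1)
```

Calling the function with several different seeds and keeping the lowest cost would play the role of the number of random starts.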
LOG FILE The number of errors and the
R-squared value for the initial partition. The R-squared value is the correlation coefficient
of the partitioned data matrix
and an ideal structure matrix. The structure matrix has the same
dimension as the data matrix but each block is set to all
ones or all zeros, whichever is nearer to the corresponding
block in the data matrix.
The final number
of errors, the R-squared value and the errors in each block after the
optimization.
List of blocks.
Each block is labeled and is specified by the vertices it contains.
The blocked adjacency
matrix. The rows and columns of the original adjacency matrix
are permuted into blocks. The adjacency matrix is displayed in
terms of the matrix blocks it contains.
TIMING Each iteration
of the tabu search algorithm is O(N^2).
COMMENTS Care
should be taken when using this routine.
The algorithm seeks
to find the minimum of the cost function.  Even if successful this
result may still have a high value, in which case the blocking may not
conform very closely to structural equivalence.
In addition there may
be a number of alternative partitions which also produce the minimum
value; the algorithm does not search for additional solutions.
Finally, it is possible that the routine terminates at a local minimum
and does not locate the desired global minimum.
To test the robustness
of the solution the algorithm should be run a number of times from different
starting configurations. If there is good agreement between these
results then this is a sign that there is a clear split of the data
into the reported blocks.
REFERENCES Panning
W (1982). 'Fitting blockmodels to data'. Social Networks
4, 81-101.
Glover F (1989).
Tabu Search - Part I. ORSA Journal on Computing 1, 190-206.
Glover F (1990).
Tabu Search - Part II. ORSA Journal on Computing 2, 4-32.
#