Skip to content

Conversation

@H-Dempsey
Copy link
Contributor

Hello,

Thank you so much for making this python implementation of BCT.

I wanted to check that I could make agreement_weighted give the same results as agreement, by passing weights that are all the same.
I could not accomplish this, and I think the issue is that the input ci needs to be an NxM matrix and not an MxN matrix.
I have made some small changes to the agreement_weighted function.

I have also written a test, so that this can be tracked in the future.
I have provided below MATLAB and Python codes that show the agreement_weighted and agreement outputs are the same, and that these results agree with MATLAB BCT.

Expand MATLAB demo code
% EXAMPLE 1 WITH EQUAL WEIGHTS
ci = [1, 1, 2, 2, 3;
      1, 2, 2, 3, 3;
      1, 1, 2, 3, 3]';
wts = ones(1, size(ci,2));

% Compute agreement matrices
D_agreement = agreement(ci);
D_weighted = agreement_weighted(ci, wts);

% Undo normalization and fill the diagonal with zeros to match D_agreement
D_weighted = D_weighted * size(ci,2);
D_weighted = D_weighted.*~eye(length(D_weighted));

% Print results
D_agreement
D_weighted

% EXAMPLE 2 WITH UNEQUAL WEIGHTS
ci = [1, 1, 2, 2, 3;
      1, 2, 2, 3, 3;
      1, 1, 2, 3, 3]';
wts = [0.2 0.1 0.7];

% Compute agreement matrices
D_agreement = agreement(ci);
D_weighted = agreement_weighted(ci, wts);

% Undo normalization and fill the diagonal with zeros to match D_agreement
D_weighted = D_weighted * size(ci,2);
D_weighted = D_weighted.*~eye(length(D_weighted));

% Print results
D_agreement
D_weighted

function D = agreement(ci,buffsz)
%AGREEMENT      Agreement matrix from clusters
%
%   D = AGREEMENT(CI) takes as input a set of vertex partitions CI of
%   dimensions [vertex x partition]. Each column in CI contains the
%   assignments of each vertex to a class/community/module. This function
%   aggregates the partitions in CI into a square [vertex x vertex]
%   agreement matrix D, whose elements indicate the number of times any two
%   vertices were assigned to the same class.
%
%   In the case that the number of nodes and partitions in CI is large
%   (greater than ~1000 nodes or greater than ~1000 partitions), the script
%   can be made faster by computing D in pieces. The optional input BUFFSZ
%   determines the size of each piece. Trial and error has found that
%   BUFFSZ ~ 150 works well.
%
%   Inputs,     CI,     set of (possibly) degenerate partitions
%               BUFFSZ, optional second argument to set buffer size
%
%   Outputs:    D,      agreement matrix
%
%   Richard Betzel, Indiana University, 2012

%modification history
%09.24.2012 - added loop for big N that makes the function slower but also
% prevents it from maxing out memory.

n = size(ci,2);

if nargin < 2
    buffsz = 1000;
end

if n <= buffsz
    
    ind = dummyvar(ci);
    D = ind*ind';
    
else
    
    a = 1:buffsz:n;
    b = buffsz:buffsz:n;
    
    if length(a) ~= length(b)
        b = [b, n];
    end
    
    x = [a' b'];
    nbuff = size(x,1);
    
    D = zeros(size(ci,1));
    for i = 1:nbuff
       y = ci(:,x(i,1):x(i,2));
       ind = dummyvar(y);
       D = D + ind*ind';
    end
    
end
D = D.*~eye(length(D));
end

function D = agreement_weighted(CI,Wts)
% AGREEMENT_WEIGHTED     Weights agreement matrix
%
%   D = AGREEMENT_WEIGHTED(CI,WTS) is identical to AGREEMENT, with the 
%   exception that each partitions contribution is weighted according to 
%   the corresponding scalar value stored in the vector WTS. As an example,
%   suppose CI contained partitions obtained using some heuristic for 
%   maximizing modularity. A possible choice for WTS might be the Q metric
%   (Newman's modularity score). Such a choice would add more weight to 
%   higher modularity partitions.
%
%   NOTE: Unlike AGREEMENT, this script does not have the input argument
%   BUFFSZ.
%
%   Inputs:     CI,     set of partitions
%               WTS,    relative weight of importance of each paritition
%
%   Outputs:    D,      weighted agreement matrix
%
%   Richard Betzel, Indiana University, 2013

Wts = Wts./sum(Wts);
[N,M] = size(CI);
D = zeros(N);
for i = 1:M
    d = dummyvar(CI(:,i));
    D = D + (d*d')*Wts(i);
end
end
Expand Python demo code
import numpy as np

def dummyvar(cis, return_sparse=False):
    """
    This is an efficient implementation of matlab's "dummyvar" command
    using sparse matrices.

    input: partitions, NxM array-like containing M partitions of N nodes
        into <=N distinct communities

    output: dummyvar, an NxR matrix containing R column variables (indicator
        variables) with N entries, where R is the total number of communities
        summed across each of the M partitions.

        i.e.
        r = sum((max(len(unique(partitions[i]))) for i in range(m)))
    """
    # num_rows is not affected by partition indexes
    n = np.size(cis, axis=0)
    m = np.size(cis, axis=1)
    r = np.sum((np.max(len(np.unique(cis[:, i])))) for i in range(m))
    nnz = np.prod(cis.shape)

    ix = np.argsort(cis, axis=0)
    # s_cis=np.sort(cis,axis=0)
    # FIXME use the sorted indices to sort by row efficiently
    s_cis = cis[ix][:, range(m), range(m)]

    mask = np.hstack((((True,),) * m, (s_cis[:-1, :] != s_cis[1:, :]).T))
    indptr, = np.where(mask.flat)
    indptr = np.append(indptr, nnz)

    import scipy.sparse as sp
    dv = sp.csc_matrix((np.repeat((1,), nnz), ix.T.flat, indptr), shape=(n, r))
    return dv.toarray()

def agreement_weighted(ci, wts):
    """
    D = AGREEMENT_WEIGHTED(CI,WTS) is identical to AGREEMENT, with the
    exception that each partitions contribution is weighted according to
    the corresponding scalar value stored in the vector WTS. As an example,
    suppose CI contained partitions obtained using some heuristic for
    maximizing modularity. A possible choice for WTS might be the Q metric
    (Newman's modularity score). Such a choice would add more weight to
    higher modularity partitions.

    NOTE: Unlike AGREEMENT, this script does not have the input argument
    BUFFSZ.

    Parameters
    ----------
    ci : NxM :obj:`numpy.ndarray` ###
        set of M (possibly degenerate) partitions of N nodes
    wts : Mx1 :obj:`numpy.ndarray`
        relative weight of each partition

    Returns
    -------
    D : NxN :obj:`numpy.ndarray`
        weighted agreement matrix
    """
    ci = np.array(ci)
    n, m = ci.shape ###
    wts = np.array(wts) / np.sum(wts)

    D = np.zeros((n, n))
    for i in range(m):
        d = dummyvar(ci[:, i].reshape(n, 1)) ###
        D += np.dot(d, d.T) * wts[i]
    return D

def agreement(ci, buffsz=1000):
    """
    Takes as input a set of vertex partitions CI of
    dimensions [vertex x partition]. Each column in CI contains the
    assignments of each vertex to a class/community/module. This function
    aggregates the partitions in CI into a square [vertex x vertex]
    agreement matrix D, whose elements indicate the number of times any two
    vertices were assigned to the same class.

    In the case that the number of nodes and partitions in CI is large
    (greater than ~1000 nodes or greater than ~1000 partitions), the script
    can be made faster by computing D in pieces. The optional input BUFFSZ
    determines the size of each piece. Trial and error has found that
    BUFFSZ ~ 150 works well.

    Parameters
    ----------
    ci : NxM :obj:`numpy.ndarray`
        set of M (possibly degenerate) partitions of N nodes
    buffsz : int | None
        sets buffer size. If not specified, defaults to 1000

    Returns
    -------
    D : NxN :obj:`numpy.ndarray`
        agreement matrix
    """
    ci = np.array(ci)
    n_nodes, n_partitions = ci.shape

    if n_partitions <= buffsz:  # Case 1: Use all partitions at once
        ind = dummyvar(ci)
        D = np.dot(ind, ind.T)
    else:  # Case 2: Add together results from subsets of partitions
        a = np.arange(0, n_partitions, buffsz)
        b = np.arange(buffsz, n_partitions, buffsz)
        if len(a) != len(b):
            b = np.append(b, n_partitions)
        D = np.zeros((n_nodes, n_nodes))
        for i, j in zip(a, b):
            y = ci[:, i:j]
            ind = dummyvar(y)
            D += np.dot(ind, ind.T)

    np.fill_diagonal(D, 0)
    return D

# Example 1 with equal weights.
ci = np.array([
    [1, 1, 2, 2, 3],
    [1, 2, 2, 3, 3],
    [1, 1, 2, 3, 3],
]).T
wts = np.ones(ci.shape[1])

D_agreement = agreement(ci)
D_weighted = agreement_weighted(ci, wts)

# Undo the normalisation and fill the diagonal with zeros to get the same result.
D_weighted = D_weighted * ci.shape[1]
np.fill_diagonal(D_weighted, 0)

print("\nAgreement\n")
print(D_agreement)
print("\nAgreement weighted\n")
print(D_weighted)

# Example 2 with unequal weights.
ci = np.array([
    [1, 1, 2, 2, 3],
    [1, 2, 2, 3, 3],
    [1, 1, 2, 3, 3],
]).T
wts = np.array([0.2,0.1,0.7])

D_agreement = agreement(ci)
D_weighted = agreement_weighted(ci, wts)

# Undo the normalisation and fill the diagonal with zeros to get the same result.
D_weighted = D_weighted * ci.shape[1]
np.fill_diagonal(D_weighted, 0)

print("\nAgreement\n")
print(D_agreement)
print("\nAgreement weighted\n")
print(D_weighted)
Expand output comparisons image

Let me know if I should add or change anything.

Harry

This tests whether agreement_weighted gives the same results as agreement when the weights are all the same.
@aestrivex aestrivex merged commit 75823f1 into aestrivex:master Nov 25, 2025
0 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants