Predictive modeling with social networks
Download
1 / 139

Predictive modeling with social networks - PowerPoint PPT Presentation


  • 125 Views
  • Uploaded on

Predictive modeling with social networks. Jennifer Neville & Foster Provost. Tutorial at ACM SIGKDD 2008. Tutorial AAAI 2008. Social network data is everywhere. Call networks Email networks Movie networks Coauthor networks Affiliation networks Friendship networks

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Predictive modeling with social networks' - tadhg


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Predictive modeling with social networks
Predictive modeling with social networks

Jennifer Neville & Foster Provost

Tutorial at ACM SIGKDD 2008

Tutorial

AAAI 2008


Social network data is everywhere
Social network data is everywhere

  • Call networks

  • Email networks

  • Movie networks

  • Coauthor networks

  • Affiliation networks

  • Friendship networks

  • Organizational networks

http://images.businessweek.com/ss/06/09/ceo_socnet/source/1.htm


“…eMarketer projects that worldwide online social network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.”


Modeling network data
Modeling network data network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.”

our focus today

  • Descriptive modeling

    • Social network analysis

    • Group/community detection

  • Predictive modeling

    • Link prediction

    • Attribute prediction


Goal of this tutorial
Goal of this tutorial network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.”

  • Our goal is not to give a comprehensive overview of relational learning algorithms (but we provide a long list of references and resources)

  • Our goal is to present

  • the main ideas that differentiate predictive inference and learning in (social) network data,

  • example techniques that embody these ideas,

  • results, from real applications if possible, and

  • references and resources where you can learn more

In three hours (less the break) we cannot hope to be comprehensive in our coverage of theory, techniques, or applications. We will present the most important concepts, illustrate with example techniques and applications, and provide a long list of additional resources.


The problem: network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.” Attribute Prediction in Networked Data

  • To start, we’ll focus on the following inference problem:

  • For any node i, categorical variable yi, and value c, estimate p(yi = c|DK)

DK is everything known

about the network

?

Macskassy & Provost (JMLR 2007)

provide a broad treatment

for univariate networks


Let s start with a real world example

Let’s start with a real-world example network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.”


Example network targeting hill et al 06
Example: network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.” Network targeting (Hill et al. ‘06)

  • Define “Network Targeting” (NT)

    • cross between viral marketing and traditional targeted marketing

    • from simple to sophisticated…

      • construct variable(s) to represent whether the immediate network neighborhood contains existing customers

      • add social-network variables to targeting models, etc. (we’ll revisit)

    • then:

      • target individuals who are predicted (using the social network) to be the best prospects

      • simplest: target “network neighbors” of existing customers

      • this could expand “virally” through the network without any word-of-mouth advocacy, or could take advantage of it.

  • Example application:

    • Product: new communications service

    • Firm with long experience with targeted marketing

    • Sophisticated segmentation models based on data, experience, and intuition

      • e.g., demographic, geographic, loyalty data

      • e.g., intuition regarding the types of customers known or thought to have affinity for this type of service

Hill, Provost, and Volinsky. “Network-based Marketing: Identifying likely adopters via consumer networks. ” Statistical Science 21 (2) 256–276, 2006.


Sales rates are substantially higher for network neighbors hill et al 06
Sales rates are substantially higher network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.” for network neighbors (Hill et al. ‘06)

Relative Sales Rates for Marketing Segments

(1.35%)

(0.83%)

(0.28%)

(0.11%)


Firms increasingly are collecting data on explicit social networks of consumers
Firms increasingly are collecting network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.” data on explicit social networks of consumers


Other applications
Other applications network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.”

  • Fraud detection

  • Targeted marketing

  • Bibliometrics

  • Firm/industry classification

  • Web-page classification

  • Epidemiology

  • Movie industry predictions

  • Personalization

  • Patent analysis

  • Law enforcement

  • Counterterrorism


Outline of the tutorial part i
Outline of the tutorial: part I network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.”

  • The basics

    • contemporary examples of social network inference in action

    • what’s different about network data?

    • basic analysis framework

    • (simple) predictive inference with univariate networks

      • disjoint inference

      • network linkage can provide substantial power for inference, if techniques can take advantage of relational autocorrelation

    • inductive inference (learning) in network data

      • disjoint learning – models learn correlation among attributes of labeled neighbors in the network

Note on terminology: In this tutorial, we use the term “inference” to refer to the making of predictions for variables’ unknown values, typically using a model of some sort. We use “learning” to denote the building of the model from data (inductive inference). Generally we use the terminology common in statistical machine learning.Note on acronyms: see reference guide at end of tutorial


Outline of the tutorial part ii
Outline of the tutorial: part II network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.”

  • Moving beyond the basics

    • collective inference

      • network structure alone can provide substantial power for inference, if techniques can propagate relational autocorrelation

      • inferred covariates can influence each other

    • collective learning

      • learning using both the labeled and unlabeled parts of the network, requires collective inference

    • social/data network vs. network of statistical dependencies

    • throughout:

      • example learning techniques

      • example inference techniques

      • example applications

  • Additional topics (time permitting)

    • methodology, evaluation, potential pathologies, understanding sources of error, other issues


So what s different about networked data

So, what’s different about network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.” networked data?


Data graph
Data graph network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.”


Unique characteristics of networked data
Unique characteristics of network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.” networked data

  • Single data graph

    • Partially labeled

    • Widely varying link structure

    • Often heterogeneous object and link types

    • From learning perspective: graph contains both training data and application/testing data

  • Attribute dependencies

    • Homophily; autocorrelation among variables of linked entities

    • Correlation among attributes of linked entities

    • Correlations between attribute values and link structure

Suggest key techniques:

guilt-by-association

relational learning

collective inference


Relational autocorrelation
Relational autocorrelation network ad spending will grow from $1.2 billion in 2007 to $2.2 billion in 2008, 82%.”

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

  • Correlation between the values of the same variable on related objects

    • Related instance pairs:

    • Dependence between pairs of values of X:

High autocorrelation

Low autocorrelation



Disjoint inference no learning
Disjoint inference (no learning) inference?

Use links to labeled nodes

(i.e., guilt by association)


Is guilt by association justified theoretically
Is guilt-by-association justified theoretically? inference?

Thanks to (McPherson, et al., 2001)

  • Birds of a feather, flock together

    • – attributed to Robert Burton (1577-1640)

  • (People) love those who are like themselves

    • -- Aristotle, Rhetoric and Nichomachean Ethics

  • Similarity begets friendship

    • -- Plato, Phaedrus

  • Hanging out with a bad crowd will get you into trouble

    • -- Foster’s Mom


Is guilt by association justified theoretically1
Is guilt-by-association justified theoretically? inference?

  • Homophily

    • fundamental concept underlying social theories

      • (e.g., Blau 1977)

    • one of the first features noticed by analysts of social network structure

      • antecedents to SNA research from 1920’s (Freeman 1996)

    • fundamental basis for links of many types in social networks (McPherson, et al., Annu. Rev. Soc. 2001)

      • Patterns of homophily:

      • remarkably robust across widely varying types of relations

      • tend to get stronger as more relationships exist

    • Now being considered in mathematical analysis of networks (“assortativity”, e.g., Newman (2003))

  • Does it apply to non-social networks?


Relational autocorrelation is ubiquitous

Biology inference?

Functions of proteins located in together in cells (Neville & Jensen ‘02)

Tuberculosis infection among people in close contact (Getoor et al ‘01)

Business

Industry categorization of corporations that share common boards members (Neville & Jensen ‘00)

Industry categorization of corporations that co-occur in news stories (Bernstein et al ‘03)

Citation analysis

Topics of coreferent scientific papers (Taskar et al ‘01, Neville & Jensen ‘03)

Marketing

Product/service adoption among communicating customers (Domingos & Richardson ‘01, Hill et al ‘06)

Fraud detection

Fraud status of cellular customers who call common numbers (Fawcett & Provost ‘97, Cortes et al ‘01)

Fraud status of brokers who work at the same branch (Neville & Jensen ‘05)

Movies

Box-office receipts of movies made by the same studio (Jensen & Neville ‘02)

Web

Topics of hyperlinked web pages (Chakrabarti et al ‘98, Taskar et al ‘02)

Relational autocorrelation is ubiquitous



Disjoint inference
Disjoint inference inference?

Relative Sales Rates for Marketing Segments

(1.35%)

(0.83%)

(0.28%)

(0.11%)

(Hill et al. ‘06)




Traditional learning and classification
Traditional learning and classification inference?

Non-relational classif.

  • Logistic regression

  • Neural networks

  • Naïve Bayes

  • Classification trees

  • SVMs

yi

yj

xi

xj

home location, main calling location, min of use, …

NYC,NYC,4350,3,5,yes,no,1,0,0,1,0,2,3,0,1,1,0,0,0,..

NYC,BOS,1320,2,no,no,1,0,0,0,0,1,5,1,7,6,7,0,0,1,…

BOS,BOS,6543,5,no,no,0,1,1,1,0,0,0,0,0,0,4,3,0,4,..

...

  • Methods:


Network learning and classification
Network learning and classification inference?

yi

yj

Relations

xi

xj

  • Methods:

Non-relational classif.

Network classification

  • Structural logistic regression

  • Relational naïve Bayes

  • Relational probability trees

  • Relational SVMs

home location, main calling location, min of use, …

NYC,NYC,4350,3,5,yes,no,1,0,0,1,0,2,3,0,1,1,0,0,0,..

NYC,BOS,1320,2,no,no,1,0,0,0,0,1,5,1,7,6,7,0,0,1,…

BOS,BOS,6543,5,no,no,0,1,1,1,0,0,0,0,0,0,4,3,0,4,..

...


Relational learning
Relational learning inference?

  • Learning where data cannot be represented as a single relation/table of independently distributed entities, without losing important information

  • Data may be represented as:

    • a multi-table relational database, or

    • a heterogeneous, attributed graph, or

    • a first-order logic knowledge base

  • There is a huge literature on relational learning and it would be impossible to do justice to it in the short amount of time we have

  • For additional information, see:

    • Pointers/bibliography on tutorial page

    • International Conference on Inductive Logic Programming

    • Cussens & Kersting’s ICML’04 tutorial: Probabilistic Logic Learning

    • Getoor’s ICML’06/ECML’07 tutorials: Statistical Relational Learning

    • Domingos’s KDD’07/ICML’07 tutorials: Statistical Modeling of Relational Data

    • Literature review in Macskassy & Provost JMLR’07


Disjoint learning part i
Disjoint learning: part I inference?

Create (aggregate) features of labeled neighbors

(Perlich & Provost KDD’03) treat aggregation

and relational learning feature construction


Disjoint learning of relational models
Disjoint learning of relational models inference?

+

+

+

+

+

+

+

+

+

+

+

Y

N

Y

N

Y

N

Create aggregate features of relational information

Learn (adapted) flat model

Consider local relational neighborhood around instances


Social network features can be created for flat models
Social network features can be created for “flat” models

  • where xG is a (vector of) network-based feature(s)

  • Example applications:

  • Fraud detection

    • construct variables representing connection to known fraudulent accounts (Fawcett & Provost ‘97)

    • or the similarity of immediate network to known fraudulent accounts (Cortes et al. ‘01; Hill et al. ‘06b)

  • Marketing(Hill et al. ’06a)

  • Creation of SN features can be (more or less) systematic: (Popescul & Ungar ’03; Perlich & Provost ’03,’06; Karamon et al. ’07,’08; Gallagher & Eliassi-Rad ‘08)

  • Compare with developing kernels for relational data (e.g., Gartner’03)

  • Also: Ideas from hypertext classification extend to SN modeling:

    • hypertext classification has text + graph structure

    • construct variables representing (aggregations of) the classes of linked pages/documents (Chakrabarti et al. ‘98; Lu & Getoor ‘03)

    • formulate as regularization/kernel combination (e.g., Zhang et al. KDD’06)

    • General survey of web page classification: (Qi & Davison, 2008)


Example structural logistic regression popescul et al 03
Example Structural logistic regression(Popescul et al. ‘03)

  • Features

    • Based on boolean first-order logic features used in inductive logic programming

    • Top-down search of refinement graph

    • Includes additional database aggregators that result in scalar values (e.g. count, max)

  • Model

    • Logistic regression

    • Two-phase feature selection process with AIC/BIC


Example relational probability trees neville et al 03
Example Relational probability trees(Neville et al. ‘03)

  • Features

    • Uses set of aggregators to construct features (e.g., Size, Average, Count, Proportion)

    • Exhaustive search within a user-defined space (e.g., <3 links away)

  • Model

    • Decision trees with probability estimates at leaves

    • Pre-pruning based on chi-square feature scores

    • Randomization tests for accurate feature selection (more on this later)



Learning patterns of labeled nodes
Learning patterns of labeled nodes

  • Features can be constructed that represent “guilt” of a node’s neighbors:

  • where xG is a (vector of) network-based feature(s)

  • Example application:

  • Marketing(Hill et al. ’06a)



Lift in sales with network based features
Lift in sales with network-based features

traditional variables

traditional + network


Similar results for predicting customer attrition churn
Similar results for predicting customer attrition/churn

Thanks to KXEN

see also (Dasgupta et al. EDBT’08) & (Birke ’08) on social networks & telecom churn



Disjoint learning part ii
Disjoint learning: part II

  • Use node identifiers in features connections to specific individuals can be telling


Side note not just for networked data ids are important for any data in a multi table rdb

challenge: aggregation over 1-to-n relationships

Side note: not just for “networked data” – IDs are important for any data in a multi-table RDB


Towards a theory of aggregation perlich provost mlj 06 a recursive bayesian perspective
Towards a theory of aggregation (Perlich & Provost MLJ’06): A (recursive) Bayesian perspective

  • Traditional (naïve) Bayesian Classification:

  • P(c|X)=P(X|c)*P(c)/P(X) Bayes’ Rule

  • P(X|c)= P(xi|c) Assuming conditional independence

  • P(xi|c) & P(c) Estimated from the training data

  • Linked Data:

    xi might be an object identifier (e.g. SSN) => P(xi|c) cannot be estimated

    Let WI be a set of k objects linked to xi => P(xi|c) ~ P(linked-to-Wi|c)

    P(Wi|c) ~ P(O|c) Assume O is drawn independently

    P(Wi|c) ~ ( P(oj |c)) Assuming conditional independence


How to incorporate identifiers of related objects in a nutshell
How to incorporate identifiers of related objects (in a nutshell)

  • Estimate from known data:

    • class-conditional distributions of related identifiers (say D+ & D-)

    • can be done, for example, assuming class-conditional independence in analogy to Naïve Bayes

    • save these as “meta-data” for use with particular cases

  • Any particular case C has its own “distribution” of related identifiers (say Dc)

  • Create features

    • d(Dc,D+ ), d(Dc, D- ), (d(Dc, D+ ) – d(Dc, D-))

    • where d is a distance metric between distributions

  • Add these features to target-node description(s) for learning/estimation

  • Main idea:

  • “Is the distribution of nodes to which this case is linked

  • similar to that of a <whatever>?”


Density estimation for aggregation
Density estimation for aggregation nutshell)

1: Class-conditional distributions

2: Case linkage distributions:

?

4: Extended feature vector:

3: L2 distances for C1:

L2(C1, DClass 1) = 1.125

L2(C1, DClass 0) = 0.08


A snippet from an actual network including bad guys
A snippet from an actual network including “ nutshell)bad guys”

Dialed-digit detector (Fawcett & P., 1997)

Communities of Interest (Cortes et al. 2001)

  • nodes are people

  • links are communications

  • red nodes are fraudsters

these two bad guys are well connected


Classify buyers of most common title from a korean e book retailer
Classify buyers of most-common title from a Korean E-Book retailer

Estimate whether or not customer will purchase

the most-popular e-book: Accuracy=0.98 (AUC=0.96)

Class-conditional distributions across identifiers of 10 other popular books

Watch for more results later




Collective inference
Collective inference perform

Use links among unlabeled nodes


Collective inference models
Collective inference models perform

A particularly simple guilt-by-association model is that a value’s probability is the average of its probabilities at the neighboring nodes

?

  • Gaussian random field (Besag 1975; Zhu et al. 2003)

  • “Relational neighbor” classifier - wvRN (Macskassy & P. 2003)


Model partially labeled network with a random field
Model partially-labeled network perform with a random field

  • Treat network as a random field

    • a probability measure over a set of random variables {X1, …, Xn} that gives non-zero probability to any configuration of values for all the variables.

  • Convenient for modeling network data:

    • A Markov random field satisfies

    • where Ni is the set of neighbors of Xi under some definition of neighbor.

    • in other words, the probability of a variable taking on a value depends only on its neighbors

    • probability of a configuration x of values for variables X the normalized product of the “potentials” of the states of the k maximal cliques in the network:

(Dobrushin, 1968; Besag, 1974; Geman and Geman, 1984)


Markov random fields
Markov random fields perform

  • Random fields have a long history for modeling regular grid data

    • in statistical physics, spatial statistics, image analysis

    • see Besag (1974)

  • Besag (1975) applied such methods to what we would call networked data (“non-lattice data”)

  • Some notable contemporary example applications:

    • web-page classification (Chakrabarti et al. 1998)

    • viral marketing (Domingos & Richardson 2001, R&D 2002)

    • eBay auction fraud (Pandit et al. 2007)


Collective inference cartoon
Collective inference cartoon perform

  • relaxation labeling – repeatedly estimate class distributions on all unknowns, based on current estimates

?

?

?

?

?

?

?

?

?

?


Collective inference cartoon1
Collective inference cartoon perform

  • relaxation labeling – repeatedly estimate class distributions on all unknowns, based on current estimates

?

?

?

?

?

?

?

?

?

?


Collective inference cartoon2
Collective inference cartoon perform

  • relaxation labeling – repeatedly estimate class distributions on all unknowns, based on current estimates

?

?

?

?

?

?

?

?

?

?


Collective inference cartoon3
Collective inference cartoon perform

  • relaxation labeling – repeatedly estimate class distributions on all unknowns, based on current estimates

?

?

?

?

?

?

?

?

?

?


Collective inference cartoon4
Collective inference cartoon perform

  • relaxation labeling – repeatedly estimate class distributions on all unknowns, based on current estimates


Various techniques for collective inference see also jensen et al kdd 04
Various techniques for collective inference perform (see also Jensen et al. KDD’04)

  • MCMC, e.g., Gibbs sampling (Geman & Geman 1984)

  • Iterative classification (Besag 1986; …)

  • Relaxation labeling (Rosenfeld et al. 1976; …)

  • Loopy belief propagation (Pearl 1988)

  • Graph-cut methods (Greig et al. 1989; …)

  • Either:

    • estimate the maximum a posteriori joint probability distribution of all free parameters

  • or

    • estimate the marginal distributions of some or all free parameters simultaneously (or some related likelihood-based scoring)

  • or

    • just perform a heuristic procedure to reach a consistent state.



  • Using the relational neighbor classifier and collective inference we can ask

    Using the relational neighbor classifier and collective inference, we can ask:

    How much “information” is in the network structure alone?


    Network classification case study
    Network classification case study inference, we can ask:

    • 12 data sets from 4 domains(previously used in ML research)

      • IMDB (Internet Movie Database) (e.g., Jensen & Neville, 2002)

      • Cora (e.g., Taskar et al., 2001) [McCallum et al., 2000]

      • WebKB [Craven et al., 1998]

        • CS Depts of Texas, Wisconsin, Washington, Cornell

        • multiclass & binary (student page)

        • “cocitation” links

      • Industry Classification [Bernstein et al., 2003]

        • yahoo data, prnewswire data

    • Homogeneous nodes & links

      • one type, different classes/subtypes

    • Univariate classification

      • only information: structure of network and (some) class labels

      • guilt-by-association (wvRN) with collective inference

      • plus several models

      • that “learn” relational patterns

    Macskassy, S. and F. P. "Classification in Networked Data: A toolkit and a univariate case study."Journal of Machine Learning Research 2007.


    Local models to use for collective inference see macskassy provost jmlr 07
    Local models to use for collective inference inference, we can ask:(see Macskassy & Provost JMLR’07)

    • network-only Bayesian classifier nBC

      • inspired by (Charabarti et al. 1998)

      • multinomial naïve Bayes on the neighboring class labels

    • network-only link-based classifier

      • inspired by (Lu & Getoor 2003)

      • logistic regression based on a node’s “distribution” of neighboring class labels, DN(vi) (multinomial over classes)

    • relational-neighbor classifier (weighted voting)

      • (Macskassy & Provost 2003, 2007)

    • relational-neighbor classifier (class distribution)

      • Inspired by (Perlich & Provost 2003)


    How much information is in the network structure
    How much information is in inference, we can ask:the network structure?

    • Labeling 90% of nodes

    • Classifying remaining 10%

    • Averaging over 10 runs


    Machine learning research papers from cora dataset
    Machine learning research papers inference, we can ask:(from CoRA dataset)


    Rbn vs wvrn macskassy provost 07
    RBN vs wvRN inference, we can ask:(Macskassy & Provost ‘07)


    Using identifiers perlich provost 06
    Using identifiers inference, we can ask:(Perlich & Provost ‘06)

    (compare: Hill & P. “The Myth of the Double-Blind Review”, 2003)


    A counter-terrorism application… inference, we can ask:

    Poor concentration for primary-data only (iteration 0)

    High concentration after one secondary-access phase (iteration 1)

    rightmost people are

    completely

    unknown, therefore

    ranking is uniform

    (Macskassy & P., Intl. Conf. on Intel. Analysis 2005)

    5046 is moderately noisy:

    ¼ of “known” bad guys were mislabeled

    most suspicious

    • high concentration of bad guys at “top” of suspicion ranking

    • gets better with increased secondary-data access


    Characteristics of network data
    Characteristics of network data inference, we can ask:

    • Single data graph

      • Partially labeled

      • Widely varying link structure

      • Often heterogeneous object and link types

    • Attribute dependencies

      • Homophily, autocorrelation among class labels

      • Correlation among attributes of related entities

      • Correlations between attribute values and link structure


    Networks graphs
    Networks ≠ graphs? inference, we can ask:

    • Networked data can be much more complex than just sets of (labeled) vertices and edges.

      • Vertices and edges can be heterogeneous

      • Vertices and edges can have various attribute information associated with them

    • Various methods for learning models that take advantage of both autocorrelation and attribute dependencies

      • Probabilistic relational models (RBNs, RMNs, AMNs, RDNs, …)

      • Probabilistic logic models (BLPs, MLNs, …)


    Models of network data2
    Models of network data inference, we can ask:


    Disjoint learning part iii
    Disjoint learning: part III inference, we can ask:

    Assume training data are fully labeled, model dependencies among linked entities


    Relational learning1
    Relational learning inference, we can ask:

    • Let’s consider briefly three approaches

      • Model with inductive logic programming (ILP)

      • Model with probabilistic relational model (graphical model+RDB)

      • Model with probabilistic logic model (ILP+probabilities)


    First order logic modeling
    First-order logic modeling inference, we can ask:

    • The field of Inductive Logic Programming has extensively studied modeling data in first-order logic

    • Although it has been changing, traditionally ILP did not focus on representing uncertainty

    • First-order logic for statistical modeling of network data?

      • a strength is its ability to represent and facilitate the search for complex and deep patterns in the network

      • a weakness is its relative lack of support for aggregations across nodes (beyond existence)

      • more on this in a minute…

    • in the usual use of first-order logic, each ground atom either is true or is not true (cf., a Herbrand interpretation)

    • …one of the reasons for the modern rubric “statistical relational learning”


    Network data in first order logic
    Network data in first-order logic inference, we can ask:

    Bill

    coworkers

    Amit

    married

    Candice

    friends

    • broker(Amit), broker(Bill), broker(Candice), …

    • works_for(Amit, Bigbank), works_for(Bill, E_broker), works_for(Candice, Bigbank), …

    • married(Candice, Bill)

    • smokes(Amit), smokes(Candice), …

    • works_for(X,F) & works_for(Y,F) -> coworkers(X,Y)

    • smokes(X) & smokes(Y) & coworkers(X,Y) -> friends(X,Y)

    What’s the problem with using FOL for our task?


    Probabilistic graphical models
    Probabilistic graphical models inference, we can ask:

    • Probabilistic graphical models (PGMs) are convenient methods for representing probability distributions across a set of variables.

      • Bayesian networks (BNs), Markov networks (MNs), Dependency networks (DNs)

      • See Pearl (1988), Heckerman et al. (2000)

    • Typically BNs, MNs, DNs are used to represent a set of random variables describing independent instances.

      • For example, the probabilistic dependencies among the descriptive features of a consumer—the same for different consumers


    Example a bayesian network modeling consumer reaction to new service
    Example inference, we can ask:A Bayesian network modeling consumer reaction to new service

    Technical

    sophistication

    Positive reaction

    before trying service

    lead user

    characteristics

    income

    Quality

    sensitivity

    Positive reaction

    after trying service

    Amount

    of use


    Probabilistic relational models
    Probabilistic relational models inference, we can ask:

    • The term “relational” recently has been used to distinguish the use of probabilistic graphical models to represent variables across a set of dependent, multivariate instances.

    • These methods model the full joint distribution over the attribute values in a network using a probabilistic graphical model (e.g., BN, MN)

      • For example, the dependencies between the descriptive features of friends in a social network

      • We saw a “relational” Markov network earlier when we discussed Markov random fields for univariate network data

        • although the usage is not consistent, “Markov random field” often is used for a MN over multiple instances of the “same” variable

    • In these probabilistic relational models, there are dependencies within instances and dependencies among instances

    • Key ideas:

      • Learn from a single network by tying parameters across instances of same type

      • Use aggregations to deal with heterogeneous network structure


    Modeling the joint network distribution
    Modeling the joint “network” distribution inference, we can ask:

    • Relational Bayesian networks

      • Extend Bayes nets to network settings (Friedman et al. ‘99, Getoor et al. ‘01)

      • Efficient closed form parameter estimation, but acyclicity constraint limits representation of autocorrelation dependencies and makes application of guilt-by-association techniques difficult

    • Relational Markov networks

      • Extension of Markov networks (Taskar et al ‘02)

      • No acyclicity constraint but feature selection is computationally intensive because parameter estimation requires approximate inference

      • Associative Markov networks are a restricted version designed for guilt-by-association settings, for which there are efficient inference algorithms (Taskar et al. ‘04)

    • Relational dependency networks

      • Extension of dependency networks (Neville & Jensen ‘04)

      • No acyclicity constraint, efficient feature selection, but model is an approximation of the full joint and accuracy depends on size of training set


    Example: inference, we can ask:Can we estimate the likelihood that a stock broker is/will be engaged in activity that violates securities regulations?


    Detecting bad brokers for neville et al kdd 05
    Detecting “bad brokers” for inference, we can ask:(Neville et al. KDD‘05)

    +

    +

    +

    +

    +

    +

    +

    +

    +

    +

    • NASD (now FINRA) is the largest private-sector securities regulator

    • NASD’s mission includes preventing and discovering misconduct among brokers (e.g., fraud)

    • Current approach: Hand-crafted rules that target brokers with a history of misconduct (HRB)

    • Task: Use relational learning techniques to automatically identify brokers likely to engage in misconduct based on network patterns

    Broker

    Disclosure

    Bad* Broker

    Branch

    +

    *”Bad” = having violated securities regulations


    Rpt identified additional brokers to target neville et al kdd 05
    RPT identified additional brokers to target inference, we can ask:(Neville et al. KDD’05)

    • “One broker I was highly confident in ranking as 5…

    • Not only did I have the pleasure of meeting him at a shady warehouse location, I also negotiated his bar from the industry...

    • This person actually used investors' funds to pay for personal expenses including his trip to attend a NASD compliance conference!

    • …If the model predicted this person, it would be right on target.”

    • Informal examiner feedback


    Relational dependency networks neville jensen jmlr 07
    Relational dependency networks inference, we can ask:(Neville & Jensen JMLR‘07)

    CoWorkerCount(IsFraud)>1

    CoWorkerCount(IsFraud)>1

    CoWorkerCount(IsFraud)>1

    CoWorkerCount(IsFraud)>3

    CoWorkerCount(IsFraud)>3

    CoWorkerCount(IsFraud)>3

    DisclosureCount(Yr<1995)>3

    DisclosureCount(Yr<1995)>3

    DisclosureCount(Yr<1995)>3

    +

    +

    Broker3

    Broker2

    Broker1

    +

    Has Business1

    Has Business3

    Has Business2

    +

    Is Fraud2

    Is Fraud3

    Is Fraud1

    +

    DisclosureCount(Yr<2000)>0

    DisclosureCount(Yr<2000)>0

    DisclosureCount(Yr<2000)>0

    CoWorkerCount(IsFraud)>0

    CoWorkerCount(IsFraud)>0

    CoWorkerCount(IsFraud)>0

    DisclosureAvg(Yr)>1997

    DisclosureAvg(Yr)>1997

    DisclosureAvg(Yr)>1997

    On Watch1

    On Watch2

    On Watch3

    Branch1

    +

    Region1

    +

    +

    Area1

    +

    DisclosureMax(Yr)>1996

    DisclosureMax(Yr)>1996

    DisclosureMax(Yr)>1996

    +

    Disclosure2

    Disclosure3

    Disclosure1

    Year1

    Year3

    Year2

    Type1

    Type3

    Type2

    Learn statistical dependencies among variables

    Construct “local” dependency network

    Broker

    Disclosure

    Branch

    Has Business

    Region

    Year

    Is Fraud

    Unroll over particular

    data network for

    (collective) inference

    On Watch

    Area

    Type


    Data on brokers branches disclosures
    Data on brokers, branches, disclosures inference, we can ask:

    Broker

    Disclosure

    Branch

    Has Business

    Region

    Year

    Is Fraud

    On Watch

    Area

    Type


    Learned rdn for broker variables neville jensen jmlr 07
    Learned RDN for broker variables inference, we can ask:(Neville & Jensen JMLR’07)

    Broker

    Disclosure

    Branch

    Has Business

    Region

    Year

    Is Fraud

    On Watch

    Area

    Type

    note: needs to be “unrolled” across network


    Important concept
    Important concept! inference, we can ask:

    • The network of statistical dependencies does not necessarily correspond to the data network

    • Example on next three slides…


    Recall broker dependency network
    Recall: broker dependency network inference, we can ask:

    Broker

    Disclosure

    Branch

    Has Business

    Region

    Year

    Is Fraud

    On Watch

    Area

    Type

    note: this dependency network needs to be “unrolled” across the data network


    Broker data network
    Broker data network inference, we can ask:

    Disclosure

    Statistical dependencies between brokers “jump across” branches; similarly for disclosures

    Broker

    Bad* Broker

    +

    Branch

    +

    +

    *”Bad” = having violated

    securities regulations

    +

    +

    +

    +

    +

    +

    +

    +


    Model unrolled on tiny data network
    Model unrolled on (tiny) data network inference, we can ask:

    Broker1

    Broker2

    Broker3

    Has Business1

    Has Business3

    Has Business2

    Is Fraud1

    Is Fraud2

    Is Fraud3

    On Watch1

    On Watch2

    On Watch3

    Branch1

    Region1

    Area1

    Disclosure1

    Disclosure2

    Disclosure3

    Year3

    Year1

    Year2

    Type3

    Type1

    Type2

    (three brokers, one branch)


    Putting it all together relational dependency networks
    Putting it all together: inference, we can ask:Relational dependency networks

    CoWorkerCount(IsFraud)>1

    CoWorkerCount(IsFraud)>1

    CoWorkerCount(IsFraud)>1

    CoWorkerCount(IsFraud)>3

    CoWorkerCount(IsFraud)>3

    CoWorkerCount(IsFraud)>3

    DisclosureCount(Yr<1995)>3

    DisclosureCount(Yr<1995)>3

    DisclosureCount(Yr<1995)>3

    +

    +

    Broker3

    Broker2

    Broker1

    +

    Has Business1

    Has Business3

    Has Business2

    +

    Is Fraud2

    Is Fraud3

    Is Fraud1

    +

    DisclosureCount(Yr<2000)>0

    DisclosureCount(Yr<2000)>0

    DisclosureCount(Yr<2000)>0

    CoWorkerCount(IsFraud)>0

    CoWorkerCount(IsFraud)>0

    CoWorkerCount(IsFraud)>0

    DisclosureAvg(Yr)>1997

    DisclosureAvg(Yr)>1997

    DisclosureAvg(Yr)>1997

    On Watch1

    On Watch2

    On Watch3

    Branch1

    +

    Region1

    +

    +

    Area1

    +

    DisclosureMax(Yr)>1996

    DisclosureMax(Yr)>1996

    DisclosureMax(Yr)>1996

    +

    Disclosure2

    Disclosure3

    Disclosure1

    Year1

    Year3

    Year2

    Type1

    Type3

    Type2

    Learn statistical dependencies among variables

    Construct “local” dependency network

    Broker

    Disclosure

    Branch

    Has Business

    Region

    Year

    Is Fraud

    Unroll over particular

    data network for

    (collective) inference

    On Watch

    Area

    Type


    Combining first order logic and probabilistic graphical models
    Combining first-order logic and probabilistic graphical models

    • Recently there have been efforts to combine FOL and probabilistic graphical models

      • e.g., Bayesian logic programs (Kersting and de Raedt ‘01), Markov logic networks (Richardson & Domingos MLJ’06)

      • and see discussion & citations in (Richardson & Domingos ‘06)

    • For example: Markov logic networks

      • A template for constructing Markov networks

        • Therefore, a model of the joint distribution over a set of variables

      • A first-order knowledge base with a weight for each formula

    • Advantages:

      • Markov network gives sound probabilistic foundation

      • First-order logic allows compact representation of large networks and a wide variety of domain background knowledge


    Markov logic networks richardson domingos mlj 06
    Markov logic networks models(Richardson & Domingos MLJ’06)

    • A Markov Logic Network (MLN) is a set of pairs (F, w):

      • F is a formula in FOL

      • w is a real number

    • Together with a finite set of constants, it defines a Markov network with:

      • One node for each grounding of each predicate in the MLN

      • One feature for each grounding of each formula F in the MLN, with its corresponding weight w

    See Domingos’ KDD’07 tutorialStatistical Modeling of Relational Data for more details


    Mln details
    MLN details models

    Friends(A,B)

    1.1

    1.1

    1.1

    Friends(A,A)

    Smokes(A)

    Smokes(B)

    Friends(B,B)

    1.1

    1.5

    1.5

    Cancer(A)

    Cancer(B)

    Friends(B,A)

    Two constants:

    Anna (A) and Bob (B)

    wi: weight of formula i

    ni(x): # true groundings of formula i in x


    Why learning collective models improves classification jensen et al kdd 04
    Why learning collective models improves classification models(Jensen et al. KDD’04)

    • Why learn a joint model of class labels?

      • Could use correlation between class labels and observed attributes on related instances instead

      • But modeling correlation among unobserved class labels is a low-variance way of reducing model bias

      • Collective inference achieves a large decrease in bias at the cost of a minimal increase in variance


    Comparing collective inference models xiang neville sna kdd 08
    Comparing collective inference models models(Xiang & Neville SNA-KDD’08)

    Learning helps when autocorrelation is low and there are other attributes dependencies

    Learning helps when linkage is low and labeling is plentiful


    Global vs local autocorrelation
    Global vs. local autocorrelation models

    • MLN/RDN/RMN:

      • exploit global autocorrelation

      • learning implicitly assumes training and test set are disjoint

      • assumes autocorrelation is stationary throughout graph

    • ACORA with identifiers (Perlich & Provost MLJ’06)

      • exploits local autocorrelation

      • relies on overlap between training and test sets

      • need sufficient data locally to estimate

    • What about a combination of the two?

    • (open question)


    Autocorrelation is non stationary
    Autocorrelation is non-stationary models

    Cora: topics in coauthor graph

    IMDb: receipts in codirector graph


    Shrinkage models angin neville sna kdd 08
    Shrinkage models models(Angin & Neville SNA-KDD ‘08)



    For targeting consumers, collective inference gives additional improvement, especially for non-network neighbors(Hill et al. ‘07)

    Predictive Performance

    (Area under ROC curve/

    Mann-Whitney Wilcoxon stat)

    Predictive Performance

    (Area under ROC curve/

    Mann-Whitney Wilcoxon stat)

    * with network sampling and pruning


    Models of network data3
    Models of network data additional improvement, especially for non-network neighbors


    Collective learning
    Collective learning additional improvement, especially for non-network neighbors

    Consider links among unlabeled entities during learning


    Semi supervised learning
    Semi-supervised learning additional improvement, especially for non-network neighbors

    • To date network modeling techniques have focused on

      • exploiting links among unlabeled entities (i.e., collective inference)

      • exploiting links between unlabeled and labeled for inference (e.g., identifiers)

    • Can we take into account links between unlabeled and labeled during learning?

      • Large body of related work on semi-supervised and transductive learning but this has dealt primarily with i.i.d. data

      • Exceptions:

        • PRMs w/EM (Taskar et al. ‘01)

        • Relational Gaussian Processes (Chu et al. ‘06)

      • Open: there has been no systematic comparison of models using different representations for learning and inference


    Recall rbn vs wvrn
    Recall: RBN vs wvRN additional improvement, especially for non-network neighbors

    RBN+EM


    Pseudolikelihood em xiang neville kdd sna 08
    Pseudolikelihood-EM additional improvement, especially for non-network neighbors(Xiang & Neville KDD-SNA ‘08)

    • General approach to learning arbitrary autocorrelation dependencies in within-network domains

    • Combines RDN pseudolikelihood approach with mean-field approximate inference to learn a joint model of labeled and unlabeled instances

    • Algorithm

    • 1. Learn an initial disjoint local classifier (with pseudolikelihood estimation) using only labeled instances

    • 2. For each EM iteration:

      • E-step: apply current local classifier to unlabeled data with collective inference, use current expected values for neighboring labels; obtain new probability estimates for unlabeled instances;

      • M-step: re-train local classifier with updated label probabilities on unlabeled instances.


    Comparison with other network models
    Comparison with other network models additional improvement, especially for non-network neighbors

    Collective learning improves performance when:(1) labeling is moderate, or (2) when labels are clustered in the network


    Or when
    Or when… additional improvement, especially for non-network neighbors

    Learning helps when autocorrelation is low and there are other attributes dependencies

    Learning helps when linkage is low and labeling is plentiful


    Models of network data4
    Models of network data additional improvement, especially for non-network neighbors


    Collective learning disjoint inference
    Collective learning, disjoint inference additional improvement, especially for non-network neighbors

    • Use unlabeled data for learning, but not for inference

      • Open: No current methods do this

      • However, disjoint inference is much more efficient

      • May want to use unlabeled data to learn disjoint models (e.g., infer more labels to improve use of identifiers)


    Recap
    Recap additional improvement, especially for non-network neighbors


    Potential pathologies
    Potential pathologies additional improvement, especially for non-network neighbors

    • Statistical tests assume i.i.d data…

    • Networks have a combination of widely varying linkage and autocorrelation

    • …which can complicate application of conventional statistical tests

      • Naïve hypothesis testing can bias feature selection (Jensen & Neville ICML’02, Jensen et al. ICML’03)

      • Naïve sampling methods can bias evaluation (Jensen & Neville ILP’03)


    Bias in feature selection jensen neville icml 02
    Bias in feature selection additional improvement, especially for non-network neighbors(Jensen & Neville ICML’02)

    • Relational classifiers can be biased toward features on some classes of objects (e.g., movie studios)

    • How?

      • Autocorrelation and linkage reduceeffective sample size

      • Lower effective sample size increases variance of estimated feature scores

      • Higher variance increases likelihood that features will be picked by chance alone

      • Can also affect ordering among features deemed significant because impact varies among features (based on linkage)


    Adjusting for bias randomization tests
    Adjusting for bias: additional improvement, especially for non-network neighborsRandomization tests

    • Randomization tests result in significantly smaller models (Neville et al KDD’03)

      • Attribute values are randomized prior to feature score calculation

      • Empirical sampling distribution approximates the distribution expected under the null hypothesis, given the linkage and autocorrelation


    Metholodogy
    Metholodogy additional improvement, especially for non-network neighbors

    • Within-network classification naturally implies dependent training and test sets

    • How to evaluate models?

      • Macskassy & Provost (JMLR’07) randomly choose labeled sets of varying proportions (e.g., 10%. 20%) and then test on remaining unlabeled nodes

      • Xiang & Neville (KDD-SNA’08) choose nodes to label in various ways (e.g., random, degree, subgraph)

      • See (Gallagher & Eliassi-Rad 2007) for further discussion

    • How to accurately assess performance variance? (Open question)

      • Repeat multiple times to simulate independent trials, but…

        • Repeated training and test sets are dependent, which means that variance estimates could be biased (Dietterich ‘98)

      • Graph structure is constant, which means performance estimates may not apply to different networks


    Understanding model performance neville jensen mlj 08
    Understanding model performance additional improvement, especially for non-network neighbors(Neville & Jensen MLJ’08)

    • Collective inference is a new source of model error

    • Potential sources of error:

      • Approximate inference techniques

      • Availability of test set information

      • Location of test set information

    • Need a framework to analyze model systems

      • Bias/variance analysis for collective inference models

      • Can differentiate errors due to learning and inference processes


    Conventional bias variance analysis
    Conventional bias/variance analysis additional improvement, especially for non-network neighbors

    variance

    _

    Y*

    bias

    Y

    noise

    bias

    variance


    Conventional bias variance analysis1
    Conventional bias/variance analysis additional improvement, especially for non-network neighbors

    M1

    M2

    Model predictions

    Test Set

    M3

    Samples

    Models

    Training

    Set


    Bias variance decomposition for relational data
    Bias/variance decomposition for additional improvement, especially for non-network neighborsrelational data

    learningbias

    inferencebias

    noise

    learning bias

    Y*

    learning variance

    _

    YLI

    inference variance

    bias interaction term

    inference bias

    _

    YL

    Expectation over learning and inference


    Relational bias variance analysis part i
    Relational bias/variance analysis: part I additional improvement, especially for non-network neighbors

    +

    +

    +

    +

    M1

    +

    +

    +

    +

    M2

    +

    Model predictions(learning distribution)

    +

    Individual inference*on test set

    +

    +

    +

    M3

    +

    +

    +

    +

    +

    +

    Samples

    Learn modelsfrom samples

    +

    Training

    Set

    * Inference uses optimal probabilitiesfor neighboring nodes’ class labels


    Relational bias variance analysis part ii
    Relational bias/variance analysis: part II additional improvement, especially for non-network neighbors

    +

    +

    +

    +

    M1

    +

    +

    +

    +

    M2

    +

    Model predictions(total distribution)

    +

    +

    +

    +

    M3

    +

    +

    +

    +

    +

    +

    Samples

    Learn modelsfrom samples

    +

    Collective inferenceon test set

    Training

    Set


    Analysis shows that models exhibit different errors
    Analysis shows that models exhibit different errors additional improvement, especially for non-network neighbors

    RMNs have high inference bias

    RDNs have high inference variance

    Loss

    Bias

    Variance


    Some other issues part i
    Some other issues: part I additional improvement, especially for non-network neighbors

    • Propagating label information farther in the network

      • Leverage other features (Gallagher & Eliassi-Rad SNA-KDD’08)

      • Create “ghost” edges (Gallagher et al. KDD’08)

      • Create “similarity” edges from other features (Macskassy AAAI’07)

      • Leverage graph similarity of nodes (Fouss et al. TKDE’07)

      • Latent group models (Neville & Jensen ICDM’05)

    • Do we know anything about the dynamics of label propagation?

      • e.g., do true labels propagate faster than false ones?

      • see (Galstyan & Cohen ’05a,’05b,’06,’07)

    • What if labeling nodes is costly?

      • Choose nodes that will improve collective inference (Rattigan et al. ‘07, Bilgic & Getoor KDD ’08)

    • What if acquiring link data is costly?

      • Acquire link data “actively” (Macskassy & Provost IA ‘05)


    Some other issues part ii
    Some other issues: part II additional improvement, especially for non-network neighbors

    • What links you use makes a big difference

      • Automatic link selection (Macskassy & Provost JMLR ‘07)

      • Augment data graph w/2-hop paths (Gallagher et al. KDD ‘08)

    • How does propagating information with collective inference relate to using identifiers?

      • open question

    • Can we identify the (causal) reason for the observed network correlation?

      • Reasons might be:

        • Homophily: similar nodes link together

        • Social influence: linked nodes change attributes to similar values

        • External factor: causes both link existence and attribute similarity

      • Manski ‘93; Hill et al. ‘06; Bramoulle ‘07; Burk et al. ‘07; Ostreicher-Singer & Sundararajan ‘08; Anagnostopoulos et al. KDD’08


    Some other issues part iii
    Some other issues: part III additional improvement, especially for non-network neighbors

    • Computation and storage requirements can be prohibitive for data on real social networks -- how can we deal with massive (real) social networks?

      • ignore most of the network (traditional method)

      • use simple models/techniques! (e.g., Hill et al. 2007)

      • reduce size of network via sampling/pruning of links and/or nodes, hopefully without reducing accuracy (much) (e.g., Cortes et al. ‘01; Singh et al. ‘05; Hill et al. ‘06b; Zheng et al. ‘07)

    • What are the effects of partial network data collection?

      • one may not have access to or complete control over collection of nodes and/or links

      • different sampling/pruning methods may induce different effects (e.g., Stumpf et al. '05, Lee et al. '06, Handcock and Gile '02, Borgatti et al. '05)

      • can we improve accuracy by sampling/pruning?

        • irrelevant links/nodes can interfere with modeling (Hill et al. 2007)


    Some other issues part iv
    Some other issues: part IV additional improvement, especially for non-network neighbors

    • How to model networks changing over time?

      • Summarize dynamic graph w/kernel smoothing (Cortes et al. ‘01, Sharan & Neville SNA-KDD’07)

      • Sequential relational Markov models (Geustrin at al. IJCAI’03, Guo et al. ICML’07, Burk et al. ‘07)

    • How to jointly model attributes and link structure?

      • RBNs with link uncertainty (Getoor et al. JMLR‘03)

      • Model underlying group structure with both links and attributes (Kubica et al. AAAI’02, McCallum et al. IJCAI’05, Neville & Jensen ICDM’05)


    Conclusions part i
    Conclusions: part I additional improvement, especially for non-network neighbors

    • Social network data often exhibit autocorrelation, which can provide considerable leverage for inference

    • “Labeled” entities link to “unlabeled” entities

      • Disjoint inference allows direct “guilt-by-association”

      • Disjoint learning can use correlations among attributes of related entities to improve accuracy

    • “Unlabeled” entities link among themselves

      • Inferences about entities can affect each other (e.g., indirect gba)

      • Collective inference can improve accuracy

      • Results show that there is a lot of power for prediction just in the network structure

      • Collective learning can improve accuracy for datasets with a moderate number of labels or when labels are clustered in the graph


    Conclusions part ii
    Conclusions: part II additional improvement, especially for non-network neighbors

    • The social network can be used to create variables that can be used in traditional (“flat”) modeling

    • More sophisticated learning techniques exploit networks correlation in alternative ways

      • Node identifiers capture 2-hop autocorrelation patterns and linkage similarity

      • Models of the joint “network” distribution identify global attribute dependencies

      • These models can learn autocorrelation dependencies

      • Shrinkage models can combine both local and global estimates

    • Learning and evaluating network models is difficult due to complex network structure and attribute interdependencies

    • There are many important methodological issues and open questions


    • By this point, hopefully, you are familiar with: additional improvement, especially for non-network neighbors

      • a wide-range of potential applications for network mining

      • different approaches to network learning and inference

        • from simple to complex

      • Various issues involved with each approach

      • when each approach is likely to perform well

      • potential difficulties for learning accurate network models

      • various methodological issues associated with analyzing network models


    Related network analysis topics
    Related network-analysis topics additional improvement, especially for non-network neighbors

    • Identifying groups in social networks

    • Predicting links

    • Entity resolution

    • Finding (sub)graph patterns

    • Generative graph models

    • Social network analysis (SNA)

    • Preserving the privacy of social networks and SNA

    • Please see tutorial webpage for slides and additional pointers:http://www.cs.purdue.edu/~neville/courses/kdd08-tutorial.html

    see KDD’08 tutorial: Graph Mining and Graph Kernels

    see KDD’08 workshop: Social Network Mining & Analysis


    Thanks to

    Pelin Angin additional improvement, especially for non-network neighbors

    Avi Bernstein

    Scott Clearwater

    Lisa Friedland

    Brian Gallagher

    Henry Goldberg

    Michael Hay

    Shawndra Hill

    David Jensen

    John Komoroske

    Kelly Palmer

    Matthew Rattigan

    Ozgur Simsek

    Sofus Macskassy

    Andrew McCallum

    Claudia Perlich

    Ben Taskar

    Chris Volinsky

    Rongjing Xiang

    Ron Zheng

    Thanks to…


    http://pages.stern.nyu.edu/~fprovost/ additional improvement, especially for non-network neighbors

    http://www.cs.purdue.edu/~neville

    foster provost

    jennifer neville


    Fun mining facebook data associations
    Fun: Mining Facebook data additional improvement, especially for non-network neighbors(associations)

    • Birthday  School year, Status (yawn?)

    • Finance  Conservative

    • Economics  Moderate

    • Premed  Moderate

    • Politics  Moderate, Liberal or Very_Liberal

    • Theatre  Very_Liberal

    • Random_play  Apathetic

    • Marketing  Finance

    • Premed  Psychology

    • Politics  Economics

    • Finance  Interested_in_Women

    • Communications  Interested_in_Men

    • Drama  Like_Harry_Potter

    • Dating  A_Relationship, Interested_in_Men

    • Dating  A_Relationship, Interested_in_Women

    • Interested_in_Men&Women  Very_Liberal


    Acronym guide

    ACORA additional improvement, especially for non-network neighbors: Automatic construction of relational attributes (Perlich & Provost KDD'03)

    AMN: Associative Markov network (Taskar ICML'04)

    BN: Bayesian network

    BLP: Bayesian logic program (Kersting & de Raedt '01)

    DN: Dependency network (Heckerman et al. JMLR'00)

    EM: Expectation maximization

    GRF: Gaussian random field (Zhu et al. ICML'03)

    ILP: Inductive logic programming

    MLN: Markov logic network (Richardson & Domingos MLJ'06)

    MN: Markov network

    NT: Network targeting (Hill et al.'06)

    PGM: Probabilistic graphical models

    PL: Pseudolikelihood

    RBC: Relational Bayes classifier (Neville et al. ICDM'03)

    RBN: Relational Bayesian network (aka probabilistic relational models) (Friedman et al. IJCAI'99)

    RDB: Relational database

    RDN: Relational dependency network (Neville & Jensen ICDM'04)

    RGP: Relational Gaussian process (Chu et al. NIPS'06)

    RMN: Relational Markov network (Taskar et al. UAI'02)

    RPT: Relational probability trees (Neville et al. KDD'03)

    SLR: Structural logistic regression (Popescul et al. ICDM'03)

    wvRN: Weighted-vector relational neighbor (Macskassy & Provost JMLR'07)

    Acronym guide


    ad