Analysis of Politics and Industry Nexus: India • Project Supervisor: Prof. Aaditeshwar Seth • Himanshu Sharma (2010CS50284) • Mayank Srivastava (2010CS10224)
Objectives • Extract information about the politics-industry nexus and the intra-political nexus from newspapers and from structured sources available on the web. • Represent this information as a graph in which nodes are entities and edges are relations between entities. • Analyze the resulting graph, rank the entities, and measure the correlation between the news coverage of different newspapers.
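A minimal sketch of the graph representation follows. It assumes, which the slides do not state, that two entities are related when they are mentioned in the same article, with the edge weight counting such co-occurrences. The function name `build_entity_graph` and the entity labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_entity_graph(articles):
    """Build an undirected, weighted entity graph {entity: {neighbour: weight}}.

    Assumption: an edge joins two entities mentioned in the same article, and
    its weight is the number of articles in which they co-occur. Entities that
    never co-occur with another entity do not appear in the result.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for entities in articles:
        # Deduplicate and sort so each unordered pair is counted once per article.
        for u, v in combinations(sorted(set(entities)), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {node: dict(nbrs) for node, nbrs in graph.items()}


# Toy input: each article is the list of entities extracted from it.
articles = [["A", "B"], ["A", "B", "C"], ["C", "D"]]
graph = build_entity_graph(articles)
print(graph["A"])  # {'B': 2, 'C': 1}
```

An adjacency dictionary keeps the sketch dependency-free; a graph library would serve equally well for larger data.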
Implementation • Structured information was collected from netapedia.in, myneta.info, PPPIndia.com and capitaline.info. • RSS feeds from several newspapers were collected continuously. • The news articles were processed with OpenCalais, an NLP tool for entity extraction. • The extracted information was stored in database tables, filtered, and used to rank the entities.
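The slides do not describe the database layout, so the following is only a sketch of one possible schema, using Python's built-in `sqlite3` module. The `mentions` table, its columns, and the `frequent_entities` filter are hypothetical: one row is stored per entity mention, and entities that appear in too few articles are filtered out before ranking.

```python
import sqlite3

# Hypothetical schema: one row per entity mention extracted from an article.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE mentions (
           newspaper   TEXT NOT NULL,
           article_id  TEXT NOT NULL,
           published   TEXT NOT NULL,  -- ISO date, e.g. '2013-03-01'
           entity      TEXT NOT NULL,
           entity_type TEXT NOT NULL   -- e.g. 'Person', 'Organization'
       )"""
)

# Toy rows for illustration only (not the project's data).
rows = [
    ("The Hindu", "h1", "2013-03-01", "A", "Person"),
    ("The Hindu", "h2", "2013-03-02", "A", "Person"),
    ("Hindustan Times", "t1", "2013-03-01", "B", "Organization"),
    ("Hindustan Times", "t1", "2013-03-01", "A", "Person"),
]
conn.executemany("INSERT INTO mentions VALUES (?, ?, ?, ?, ?)", rows)


def frequent_entities(conn, min_count):
    """Keep entities mentioned in at least `min_count` distinct articles,
    most frequent first (an illustrative noise filter)."""
    query = """SELECT entity,
                      COUNT(DISTINCT newspaper || '/' || article_id) AS n
               FROM mentions
               GROUP BY entity
               HAVING COUNT(DISTINCT newspaper || '/' || article_id) >= ?
               ORDER BY n DESC, entity"""
    return conn.execute(query, (min_count,)).fetchall()


print(frequent_entities(conn, 2))  # [('A', 3)]
```

Counting distinct articles rather than raw rows keeps an entity that OpenCalais reports several times in one article from being over-counted.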
Ranking of Entities • Ranking by an exponential moving average of entity occurrences (called Fame from here on), updated each time an entity occurs in the news: Fame is highly sensitive to changing news, so entities that become important in the news rise while less important ones fall. • Ranking by PageRank, with Fame used as the personalization vector: this ranking is much less sensitive to changing news and reflects the overall influence of an entity in the network.
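Both rankings can be sketched as follows. The slides do not give the exact update rule, so `update_fame` assumes that after every article each known entity's Fame decays by a factor of (1 - alpha) and each mentioned entity gains alpha. The PageRank part is standard weighted power iteration in which the teleport step, and the mass of nodes without edges, follow the normalized Fame vector. The names `update_fame`, `personalized_pagerank`, `alpha` and `damping` are hypothetical.

```python
def update_fame(fame, article_entities, alpha=0.1):
    """Apply one Fame (EMA) update for a single article, in place.

    Assumed rule: fame <- (1 - alpha) * fame + alpha * [entity in article],
    applied to every known entity, so unmentioned entities decay.
    """
    mentioned = set(article_entities)
    for entity in mentioned:
        fame.setdefault(entity, 0.0)  # new entities start with zero Fame
    for entity in fame:
        hit = 1.0 if entity in mentioned else 0.0
        fame[entity] = (1 - alpha) * fame[entity] + alpha * hit
    return fame


def personalized_pagerank(graph, fame, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration on a weighted graph {node: {nbr: weight}},
    teleporting according to the normalized Fame vector."""
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs} | set(fame))
    total = sum(fame.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("Fame vector must have positive total mass")
    p = {n: fame.get(n, 0.0) / total for n in nodes}  # personalization vector
    rank = dict(p)
    for _ in range(max_iter):
        new = {n: (1 - damping) * p[n] for n in nodes}
        dangling = 0.0
        for u in nodes:
            nbrs = graph.get(u, {})
            out_weight = sum(nbrs.values())
            if out_weight == 0:
                dangling += rank[u]  # node without edges: mass redistributed below
                continue
            for v, w in nbrs.items():
                new[v] += damping * rank[u] * w / out_weight
        for n in nodes:
            new[n] += damping * dangling * p[n]
        converged = sum(abs(new[n] - rank[n]) for n in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Usage with toy data: three articles, then a small co-occurrence graph.
fame = {}
for article in [["A", "B"], ["A", "C"], ["A"]]:
    update_fame(fame, article, alpha=0.5)
graph = {"A": {"B": 2, "C": 1}, "B": {"A": 2}, "C": {"A": 1}}
pagerank = personalized_pagerank(graph, fame)
print(sorted(fame, key=fame.get, reverse=True))          # ['A', 'C', 'B']
print(sorted(pagerank, key=pagerank.get, reverse=True))  # ['A', 'B', 'C']
```

In this toy run Fame puts C above B because C was mentioned more recently, while PageRank puts B above C because B is more strongly tied to the dominant entity A. This mirrors the contrast between a news-sensitive ranking and a network-influence ranking described above.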
Correlation Between Newspapers • Used Spearman’s rank correlation coefficient. • Correlation is high when entities are ranked by PageRank values. • Correlation coefficients as of 1st March (with respect to the overall data): • DNA (Business Section): 0.99118 • Hindustan Times: 0.99147 • DNA (Political Section): 0.99290 • The Times of India: 0.99305 • The Hindu: 0.99336
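Spearman's coefficient is the Pearson correlation of the two rank vectors. The sketch below assigns tied scores their average rank. It assumes both score lists are aligned on the same entity list, with an entity absent from a newspaper given a score of 0; that alignment rule, the function names, and the toy scores are illustrative assumptions, not taken from the project.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / math.sqrt(var_x * var_y)


# Toy scores (not the project's data): overall ranking vs one newspaper.
overall = {"A": 0.52, "B": 0.21, "C": 0.27}
paper = {"A": 0.40, "B": 0.35, "C": 0.25}
entities = sorted(overall)
rho = spearman([overall[e] for e in entities], [paper.get(e, 0.0) for e in entities])
print(rho)  # 0.5
```

Under this reading, a coefficient close to 1, as reported above for PageRank, means the newspaper orders entities almost exactly as the overall data does.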
Correlation Between Newspapers • Correlation is low when entities are ranked by Fame values. • Correlation coefficients as of 1st March (with respect to the overall data): • DNA (Business Section): 0.33939 • Hindustan Times: 0.41778 • DNA (Political Section): 0.52837 • The Times of India: 0.54673 • The Hindu: 0.57951 • The low correlation suggests that newspapers are biased in what they choose to cover.
More on Correlation • Plotted week-to-week correlation between newspapers. • Correlation is higher between DNA (Business Section) and DNA (Political Section). • The Hindu shows slightly lower correlation with Hindustan Times and The Times of India, suggesting that it carries some news that the other two do not. • Plotted inter-week correlation coefficients for each newspaper: they mostly vary between 0.2 and 0.4. • Increased the time gap between the compared weeks to measure the longevity of news: for the political newspapers, the correlation approaches an asymptotic value of around 0.15.
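The longevity measurement can be sketched as a lagged rank correlation: each week's ranking for one newspaper is compared with its ranking `lag` weeks later, and the coefficients are averaged. The weekly-score format, treating an entity missing from a week as score 0, and the helper names are assumptions for illustration. `spearman` is a compact re-implementation so that the block runs on its own.

```python
import math
from collections import defaultdict


def spearman(x, y):
    """Spearman's rho via Pearson correlation of tie-averaged ranks."""
    def ranks(values):
        positions = defaultdict(list)
        for pos, v in enumerate(sorted(values), start=1):
            positions[v].append(pos)
        return [sum(positions[v]) / len(positions[v]) for v in values]

    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(weekly_scores, lag):
    """Mean Spearman correlation between week t and week t + lag.

    weekly_scores: list of {entity: score} dicts, one per week (hypothetical format).
    """
    if not 1 <= lag < len(weekly_scores):
        raise ValueError("lag must be between 1 and the number of weeks - 1")
    coefficients = []
    for t in range(len(weekly_scores) - lag):
        a, b = weekly_scores[t], weekly_scores[t + lag]
        entities = sorted(set(a) | set(b))  # an entity absent in a week scores 0
        coefficients.append(spearman([a.get(e, 0.0) for e in entities],
                                     [b.get(e, 0.0) for e in entities]))
    return sum(coefficients) / len(coefficients)


# Toy weekly scores for one newspaper (not the project's data).
weeks = [
    {"A": 4, "B": 3, "C": 2, "D": 1},
    {"A": 3, "B": 4, "C": 2, "D": 1},
    {"A": 3, "B": 4, "C": 1, "D": 2},
]
for lag in (1, 2):
    print(lag, lagged_correlation(weeks, lag))
```

With this toy data the coefficient falls from 0.8 at lag 1 to 0.6 at lag 2. On real data, plotting the value against increasing lag gives the longevity curve; the slides report it levelling off near 0.15 for the political newspapers.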
More on Correlation • For DNA (Business Section), the correlation drops to about 0.05. • DNA (Business Section) has the lowest longevity of news: it frequently switches stories. • Longevity is generally lower for The Hindu and Hindustan Times than for DNA (Political Section) and The Times of India. • DNA (Political Section) and TOI stick to the same stories and repeat them over a prolonged period, while HT and The Hindu prefer to switch stories.
Bias By Newspapers: Examples • In August 2012, TOI gave a lot of emphasis to Nitish Kumar, while The Hindu chose to neglect him. • In mid-March 2013, The Hindu, Hindustan Times and DNA (Political Section) gave a lot of emphasis to Manmohan Singh, but The Times of India gave him less importance. Instead, it carried a number of stories about Xi Jinping, whom the other newspapers ignored.
Timelines Showing Some Important Entities: Hindustan Times
Timelines Showing Some Important Entities: The Times of India
Timelines Showing Some Important Entities: DNA (Political Section)
Timelines Showing Bias Towards Political Parties: Hindustan Times
Timelines Showing Bias Towards Political Parties: The Times of India
Timelines Showing Bias Towards Political Parties: DNA (Political Section)
Conclusions • The most important news is covered almost equally by all newspapers. • Newspapers generally show bias in covering the less important news. • Newspapers are generally biased in their coverage of regional parties. • Janata Dal (United) is given preference by TOI and DNA, while The Hindu ignores it. • Both the Samajwadi Party and Akhilesh Yadav are very clearly avoided by Hindustan Times. • The Hindu closely follows the CPI but avoids the Shiv Sena.
References • www.visualdataweb.org/relfinder.php • www.mpi-inf.mpg.de/yago-naga/yago • www.dbpedia.org • www.opencalais.com • www.wikipedia.org • www.myneta.info • www.netapedia.in • www.semanticproxy.com • Kushal Dave, Rushi Bhatt, Vasudeva Varma, “Identifying Influencers in Social Networks”.