Research Outline

Prasad Talasila edited this page Feb 4, 2018 · 23 revisions

Research

This document summarises the objectives of IRCLogParser:

  1. Survey on usefulness of existing research results to OCC participants

  2. Overlapping communities (see issue #142).

  3. Calculate conversation characteristics separately for new and experienced users (see issue #195)

  4. Calculate conversation characteristics for most active and least active hours (see issue #194).

  5. Use statistical distribution to determine an automatic cutoff for the minimum number of words spoken by a user (see issue #192).

  6. Identification of hubs and experts

    1. Graph (HITS)
    2. Text (see keywords and overlap)
    3. Expertise identification using N-gram text mining analysis
  7. Graph Motifs

  8. MultiChannel Network Analysis

    1. Birth, growth and evolution of multiple channels with common users. A channel gives birth to another channel if a user of a channel starts a new channel.
    2. Multidimensional network analysis of all channels
    3. Merged network analysis of all channels
    4. Analysis and visualization using muxViz (see its GitHub repository and website)
    5. Clustering of evolving networks, see survey
  9. Identifying conversations

    1. Include undirected messages
    2. Identify multiparty conversations
  10. Time Decaying Graphs

  11. Topic/Opinion Propagation

    • Ref: Publication by Prof. Niloy Ganguly
  12. Stream processing - Incremental computation on newly added logs

    1. Graph addition
    2. Predictions on conversations, response times, potential experts.
    3. Customize results based on user-interest queries
  13. Melding of Gource visualization / Logstalgia with time varying analysis

  14. Proper conversation threads built from message logs
    The current work makes two approximations:

    1. Undirected messages are ignored
    2. All directed message exchanges are treated as bilateral communications
  15. Use signal-processing concepts such as convolution, auto-correlation and cross-correlation to identify the time period of recurring phenomena in communities.

  16. Separate users into newcomers and experienced participants, then repeat the analysis separately for the two categories. Are the results any different? The answer to this question would increase or decrease the significance of the questions raised in the research paper.

  17. Emotional classification of text. See the example of the Stack Overflow paper and its GitHub code.

  18. Improve User Profile using topic modeling

    1. Use Latent Dirichlet Allocation (LDA) to extract topics for a single channel or across multiple channels. See Python LDA tutorials - 1, 2
    2. Use online LDA to perform stream computation. If needed, a distributed LDA is available as well.
    3. As of 2017, autoencoders with customized loss functions are the state-of-the-art way to perform topic modeling.
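
As an illustration of item 6 (identifying hubs and experts with HITS), the following is a minimal sketch rather than IRCLogParser's actual implementation. It assumes each directed message has already been reduced to a `(sender, receiver)` pair; the function name `hits`, the nicknames, and the sample edges are hypothetical. Repeated messages between the same pair are collapsed into a single unweighted edge.

```python
from collections import defaultdict
from math import sqrt


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a directed edge list."""
    out_links = defaultdict(set)  # sender -> set of receivers
    in_links = defaultdict(set)   # receiver -> set of senders
    nodes = set()
    for sender, receiver in edges:
        out_links[sender].add(receiver)
        in_links[receiver].add(sender)
        nodes.update((sender, receiver))

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of users who message this user.
        auth = {n: sum(hub[s] for s in in_links[n]) for n in nodes}
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: sum of authority scores of users this user messages.
        hub = {n: sum(auth[r] for r in out_links[n]) for n in nodes}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical directed messages: (sender, receiver)
edges = [
    ("alice", "carol"), ("bob", "carol"), ("dave", "carol"),
    ("alice", "erin"), ("bob", "erin"),
]
hub, auth = hits(edges)
top_authority = max(auth, key=auth.get)  # candidate expert
top_hub = max(hub, key=hub.get)          # candidate hub
```

In this reading, a high authority score marks a user whom many active users address (a candidate expert), while a high hub score marks a user who addresses many such users. Weighting edges by message count would be a natural extension, but is not shown here.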

Code Development

  1. Profiling (memory and time)

Results are generated automatically for builds on the benchmark branch.

  2. Call Graphs

Doxygen can be used here. The graphs should be updated automatically via Travis.

  3. Benchmarking

Results are generated automatically for builds on the benchmark branch.
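
For local experiments with the memory and time profiling mentioned above, a minimal sketch using only the Python standard library (`cProfile`, `pstats`, `tracemalloc`) might look like the following. It does not describe how the benchmark branch produces its results; `profile_call` and `count_words` are hypothetical names, and `count_words` merely stands in for a real log-analysis step.

```python
import cProfile
import io
import pstats
import tracemalloc


def profile_call(func, *args, **kwargs):
    """Run func once; return (result, peak_bytes, timing_report)."""
    profiler = cProfile.Profile()
    tracemalloc.start()
    try:
        result = profiler.runcall(func, *args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries only
    return result, peak_bytes, buffer.getvalue()


def count_words(lines):
    # Hypothetical stand-in for a log-analysis step.
    return sum(len(line.split()) for line in lines)


result, peak_bytes, report = profile_call(count_words, ["hello world"] * 1000)
print(f"peak memory: {peak_bytes} bytes")
print(report)
```

Tracing memory while timing inflates the measured run time, so for comparable timing numbers the two measurements are better taken in separate runs.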