Research Outline

Research

This document summarises the objectives of IRCLogParser:

Survey on usefulness of existing research results to OCC participants
Overlapping communities (see issue #142).
Calculate conversation characteristics separately for new and experienced users (see issue #195).
Calculate conversation characteristics for most active and least active hours (see issue #194).
Use a statistical distribution of per-user word counts to determine an automatic cutoff for the minimum number of words spoken by a user (see issue #192).
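One way such a cutoff could be derived is sketched below. It assumes, purely for illustration, that per-user word counts are roughly log-normal and takes a low quantile of the fitted distribution as the threshold. The function name `word_count_cutoff`, the `quantile` parameter and the sample counts are hypothetical and not part of IRCLogParser.

```python
import math
from statistics import NormalDist, mean, stdev

def word_count_cutoff(word_counts, quantile=0.10):
    """Return a minimum-word cutoff from a log-normal fit to per-user word counts.

    Assumption: word counts are approximately log-normal. `quantile` is a
    hypothetical tuning knob, not a value taken from the project.
    """
    # Work in log space; zero counts cannot be log-transformed, so drop them.
    logs = [math.log(c) for c in word_counts if c > 0]
    if len(logs) < 2:
        raise ValueError("need at least two users with a positive word count")
    mu, sigma = mean(logs), stdev(logs)
    if sigma == 0:
        # All users spoke the same number of words; the fit is degenerate.
        return math.exp(mu)
    # Quantile of the fitted normal in log space, mapped back to a word count.
    return math.exp(NormalDist(mu, sigma).inv_cdf(quantile))

# Made-up word counts for ten users.
counts = [3, 5, 8, 13, 40, 55, 120, 300, 750, 2100]
cutoff = word_count_cutoff(counts)
retained = [c for c in counts if c >= cutoff]
```

Whether a log-normal fit is appropriate would need to be checked against the real logs; a different distribution or a plain empirical percentile may suit the data better.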
Identification of hubs and experts
1. Graph (HITS)
2. Text (see keywords and overlap)
3. Expertise identification using N-gram text mining analysis
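The graph-based option (HITS) can be illustrated with a small power-iteration sketch. It assumes that each edge points from the sender of a directed message to its addressee, so that a high authority score marks a user many others address (a possible expert) and a high hub score marks a user who addresses many such users. The edge list and nicknames are made up, and this is not the project's implementation.

```python
import math

def hits(edges, iterations=50):
    """Compute HITS hub and authority scores by power iteration.

    edges: list of (sender, addressee) pairs; the direction is an assumption.
    """
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of everyone pointing at the node.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub update: sum the authority scores of everyone the node points at.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        hub = new_hub
        # Normalise both score vectors to unit Euclidean length.
        for scores in (hub, auth):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

# Hypothetical directed-message graph.
edges = [("alice", "carol"), ("bob", "carol"), ("dave", "carol"), ("alice", "erin")]
hub, auth = hits(edges)
top_authority = max(auth, key=auth.get)
top_hub = max(hub, key=hub.get)
```

Repeated messages between the same pair appear as repeated edges here and therefore count multiple times; whether that weighting is desirable is a design choice.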
Graph Motifs
MultiChannel Network Analysis
1. Birth, growth and evolution of multiple channels with common users. A channel gives birth to another channel if a user of a channel starts a new channel.
2. Multidimensional network analysis of all channels
3. Merged network analysis of all channels
4. Analysis and visualization using muxViz (GitHub and website)
5. Clustering of evolving networks, see survey
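The channel-birth rule in item 1 can be written down as a small sketch. It assumes that for every channel we know its creation time and founder, and for every channel the time each user was first seen there. These data structures, the function name `channel_births` and the channel names are hypothetical.

```python
def channel_births(creations, first_seen):
    """Return (parent, child) pairs under the birth rule from item 1.

    creations:  {channel: (created_at, founder)}
    first_seen: {channel: {nick: time the nick was first seen in that channel}}
    A parent is any other channel in which the founder was seen before the
    child channel was created.
    """
    births = []
    for child, (created_at, founder) in creations.items():
        for parent, seen in first_seen.items():
            if parent == child:
                continue
            joined = seen.get(founder)
            if joined is not None and joined < created_at:
                births.append((parent, child))
    return sorted(births)

# Made-up example: alice is active in #parent before she starts #child.
creations = {"#parent": (0, "root"), "#child": (100, "alice")}
first_seen = {
    "#parent": {"root": 0, "alice": 10, "bob": 20},
    "#child": {"alice": 100},
}
births = channel_births(creations, first_seen)
```

Under this reading a channel may have several parents, so the result is a directed graph of channels rather than a tree.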
Identifying conversations
1. Include undirected Messages
2. Identify multiparty conversations
Time Decaying Graphs
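A minimal sketch of one way to build a time-decaying graph: every directed message contributes to its edge a weight that halves after a fixed interval. Exponential decay and the `half_life` parameter are assumptions made for illustration; the outline does not prescribe a decay function.

```python
import math
from collections import defaultdict

def decayed_edge_weights(messages, now, half_life=86400.0):
    """Sum exponentially decayed contributions per (sender, receiver) edge.

    messages:  iterable of (timestamp_seconds, sender, receiver)
    half_life: seconds after which a message counts half (hypothetical default)
    """
    rate = math.log(2) / half_life
    weights = defaultdict(float)
    for ts, sender, receiver in messages:
        age = now - ts
        if age < 0:
            continue  # ignore messages stamped later than `now`
        weights[(sender, receiver)] += math.exp(-rate * age)
    return dict(weights)

# Two messages on the same edge: one a full half-life old, one brand new.
messages = [(0, "alice", "bob"), (86400, "alice", "bob"), (90000, "bob", "carol")]
weights = decayed_edge_weights(messages, now=86400)
```

Edges whose weight drops below some small threshold could then be pruned, which keeps the graph focused on recent interaction.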
Topic/Opinion Propagation
- Ref: Publication by Prof. Niloy Ganguly
Stream processing - Incremental computation on newly added logs
1. Graph addition
2. Predictions on conversations, response times, potential experts.
3. Customize results based on User interest query
Melding of Gource visualization / Logstalgia with time varying analysis
Proper conversation threads built from message logs
The current work makes two approximations:
1. Undirected messages are ignored.
2. All directed message exchanges are treated as bilateral communications.
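The effect of the two approximations can be sketched as follows. The example assumes that a directed message starts with the addressee's nick followed by a colon, and it reads "bilateral" as storing each exchange as an undirected pair. Both the addressing convention and the input shape are assumptions, not the parser's actual log format.

```python
from collections import Counter

def build_bilateral_graph(messages, nicks):
    """Count exchanges per unordered user pair.

    messages: iterable of (sender, text)
    nicks:    set of known nicknames in the channel
    """
    edges = Counter()
    for sender, text in messages:
        head, sep, _ = text.partition(":")
        target = head.strip()
        # Approximation 1: messages without a recognised addressee are dropped.
        if not sep or target not in nicks or target == sender:
            continue
        # Approximation 2: the pair is stored without direction.
        edges[frozenset((sender, target))] += 1
    return edges

nicks = {"alice", "bob", "carol"}
messages = [
    ("alice", "bob: have you tried rebooting?"),
    ("bob", "alice: yes, no luck"),
    ("carol", "anyone here?"),          # undirected, ignored
    ("alice", "note: this is not a nick"),  # colon but no known addressee
]
graph = build_bilateral_graph(messages, nicks)
```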
Use signal processing concepts such as convolution, auto-correlation and cross-correlation to identify the time period of recurring phenomena in communities.
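As an illustration of the auto-correlation idea, the sketch below estimates the dominant period of a message-count series by picking the lag with the highest sample auto-correlation. The series is synthetic, and `min_lag` and `max_lag` are hypothetical parameters. Real activity data is noisy, so taking the single best lag is a deliberately naive choice.

```python
def autocorrelation(series, lag):
    """Biased sample auto-correlation of `series` at the given lag."""
    n = len(series)
    mu = sum(series) / n
    var = sum((x - mu) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series has no structure to correlate
    cov = sum((series[i] - mu) * (series[i + lag] - mu) for i in range(n - lag))
    return cov / var

def dominant_period(series, max_lag, min_lag=2):
    """Return the lag in [min_lag, max_lag] with the highest auto-correlation."""
    lags = range(min_lag, max_lag + 1)
    return max(lags, key=lambda lag: autocorrelation(series, lag))

# Synthetic message counts per time slot, repeating every 6 slots.
series = [1, 2, 8, 9, 3, 1] * 10
period = dominant_period(series, max_lag=24)
```

Cross-correlation between the series of two channels could be computed in the same way by replacing the second factor with the other channel's series.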
Separate users into newcomers and experienced participants, then repeat the analysis separately for these two categories. Are the results any different? The answer would strengthen or weaken the significance of the questions raised in the research paper.
Emotional classification of text. See the example in the Stack Overflow paper and the accompanying GitHub code.
Improve User Profile using topic modeling
1. Use Latent Dirichlet Allocation (LDA) to create topics for a single channel or for multiple channels. See Python LDA tutorials 1 and 2.
2. Use online LDA to perform stream computation. If needed, there is a distributed LDA as well.
3. As of 2017, autoencoders with customized loss functions are the state-of-the-art way to perform topic modeling.

Code Development

Profiling ( Memory and Time )

Results come automatically for builds on the benchmark branch

Call Graphs

Doxygen can be used here. The call graphs should be updated automatically via Travis.

Benchmarking

Results come automatically for builds on the benchmark branch
