Research Outline
This document summarises the objectives of IRCLogParser:
- Survey on the usefulness of existing research results to OCC participants
- Overlapping Communities (see issue #142)
- Calculate conversation characteristics separately for new and experienced users (see issue #195)
- Calculate conversation characteristics for the most active and least active hours (see issue #194)
- Use a statistical distribution to determine an automatic cutoff for the minimum number of words spoken by a user (see issue #192)
- Identification of hubs and experts
  - Graph (HITS)
  - Text (see keywords and overlap)
  - Expertise identification using N-gram text mining analysis
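As a rough illustration of the graph-based (HITS) item above, the following is a minimal pure-Python sketch of the HITS power iteration. The edge direction (sender to addressee), the sample users and the function name `hits` are assumptions made for illustration only; they are not taken from IRCLogParser.

```python
from collections import defaultdict
from math import sqrt


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a directed, unweighted graph.

    `edges` is an iterable of (source, target) pairs. Repeated pairs are
    collapsed, so message counts are ignored in this sketch.
    """
    edges = list(edges)
    nodes = {node for edge in edges for node in edge}
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of nodes pointing at n.
        auth = {n: sum(hub[m] for m in in_links[n]) for n in nodes}
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score: sum of the authority scores of nodes that n points at.
        hub = {n: sum(auth[m] for m in out_links[n]) for n in nodes}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical directed messages: (sender, addressee).
messages = [
    ("alice", "carol"), ("bob", "carol"), ("dave", "carol"),
    ("alice", "erin"), ("bob", "erin"), ("carol", "alice"),
]
hub, auth = hits(messages)
top_authority = max(auth, key=auth.get)
print("highest authority score:", top_authority)
```

Under this edge convention, a high authority score marks a user who is addressed by many active users, which is one possible proxy for an expert; reversing the edge direction would swap the roles of hubs and authorities.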
- Graph Motifs
- MultiChannel Network Analysis
  - Birth, growth and evolution of multiple channels with common users. A channel gives birth to another channel if a user of the first channel starts the new one.
  - Multidimensional network analysis of all channels
  - Merged network analysis of all channels
  - Analysis and visualization using muxViz (GitHub and website)
  - Clustering of evolving networks (see survey)
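The channel "birth" rule stated above can be sketched as follows. This is only one possible reading: it assumes that the user who starts a channel is the first user seen in that channel's logs, and that logs are available as `(timestamp, channel, user)` tuples. The function name `channel_births` and the channel names are hypothetical.

```python
def channel_births(events):
    """Map each channel to the set of channels that gave birth to it.

    `events` is an iterable of (timestamp, channel, user) tuples in any order.
    A channel's founder is assumed to be the first user seen in it; its
    parents are the channels the founder was already active in.
    """
    channels_by_user = {}  # user -> channels the user has appeared in so far
    founder = {}           # channel -> first user seen in that channel
    parents = {}           # channel -> channels the founder came from
    for _, channel, user in sorted(events):
        if channel not in founder:
            founder[channel] = user
            parents[channel] = set(channels_by_user.get(user, set()))
        channels_by_user.setdefault(user, set()).add(channel)
    return parents


# Hypothetical activity log.
events = [
    (1, "#ubuntu", "alice"),
    (2, "#ubuntu", "bob"),
    (3, "#kubuntu", "bob"),      # bob, already in #ubuntu, starts #kubuntu
    (4, "#kubuntu", "alice"),
    (5, "#ubuntu-devel", "carol"),
]
births = channel_births(events)
print(births)
```

The resulting parent sets define a directed "birth" graph over channels, which could then feed the multidimensional or merged network analyses listed above.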
- Identifying conversations
  - Include undirected messages
  - Identify multiparty conversations
- Time Decaying Graphs
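One common way to build a time-decaying graph, shown here purely as an assumption about what this item might involve, is to weight each directed message by an exponential function of its age so that old interactions fade. The half-life parameter, the tuple layout and the function name `decayed_edge_weights` are all hypothetical.

```python
import math
from collections import defaultdict


def decayed_edge_weights(messages, now, half_life):
    """Sum exponentially decayed contributions per (sender, receiver) edge.

    `messages` is an iterable of (timestamp, sender, receiver) tuples.
    A message that is `half_life` time units old contributes 0.5.
    """
    decay_rate = math.log(2) / half_life
    weights = defaultdict(float)
    for timestamp, sender, receiver in messages:
        age = now - timestamp
        if age < 0:
            continue  # skip messages dated after the reference time
        weights[(sender, receiver)] += math.exp(-decay_rate * age)
    return dict(weights)


# Hypothetical directed messages with timestamps in arbitrary units.
messages = [
    (0, "alice", "bob"),
    (10, "alice", "bob"),
    (10, "carol", "bob"),
]
weights = decayed_edge_weights(messages, now=10, half_life=10)
print(weights)
```

A shorter half-life emphasises recent activity, while a very long one approaches the plain message-count graph.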
- Topic/Opinion Propagation
  - Ref: Publication by Prof. Niloy Ganguly
- Stream processing: incremental computation on newly added logs
  - Graph addition
  - Predictions on conversations, response times and potential experts
  - Customize results based on a user interest query
- Melding of Gource visualization / Logstalgia with time-varying analysis
- Proper conversation threads built from message logs. The work currently makes two approximations:
  - Undirected messages are ignored
  - All directed message exchanges are treated as bilateral communications
- Use signal processing concepts such as convolution, auto-correlation and cross-correlation to identify the time period of recurring phenomena in communities.
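To illustrate the auto-correlation idea above, here is a minimal sketch that estimates the dominant period of an activity series by picking the lag with the highest autocorrelation. The sample data (daily message counts with a weekly rhythm) and the function names are assumptions for illustration; a real analysis would more likely use NumPy or SciPy on the actual log counts.

```python
def autocorrelation(series, lag):
    """Biased sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a constant series has no recurring structure
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def dominant_period(series, max_lag):
    """Return the lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Hypothetical daily message counts: busy weekdays, quiet weekends, six weeks.
daily_counts = [5, 6, 7, 6, 5, 1, 1] * 6
period = dominant_period(daily_counts, max_lag=14)
print("estimated period (days):", period)
```

Cross-correlation between two channels' series can be computed the same way by pairing values from different series, which would indicate whether activity in one channel tends to lead or follow the other.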
- Separate users into newcomers and experienced participants, then repeat the analysis separately for these two categories. Are the results any different? The answer would strengthen or weaken the significance of the questions raised in the research paper.
- Emotional classification of text. See the example of the Stack Overflow paper and the GitHub code.
- Improve User Profile using topic modeling
  - Use Latent Dirichlet Allocation (LDA) to create topics for a channel or for multiple channels. See Python LDA tutorials 1, 2.
  - Use online LDA to perform stream computation. If needed, there is a distributed LDA as well.
  - As of 2017, autoencoders with customized loss functions are the state-of-the-art way to perform loss modeling.
- Profiling ( Memory and Time )
Results come automatically for builds on the benchmark branch
- Call Graphs
Can use doxygen here. Should be updated automatically via Travis.
- Benchmarking
Results come automatically for builds on the benchmark branch
- Branch History
- Best Practices
- Testing in Python
- Logger Config
- Refactoring Suggestions