Data collection, accuracy, and validation
All comments and posts are streamed and stored in real time. Sentiment analysis is performed on the comments, and mentions of company names, socially known company founders, and stock tickers are extracted from the text. All data shown is current to within 10 minutes of viewing. Quality-assurance algorithms also run on the data so that posts removed by subreddit moderators are removed from the dataset as well, for the express purpose of bot and spam prevention. Duplicate mentions from the same user within a 24-hour span are likewise removed, to prevent individual users from drastically swaying the statistics one way or another. No bots (AutoModerator, VisualMod, etc.) are included in the data. Machine-learning entity recognition is used to build lists of possible company names and stock tickers (with checks for potential 'misses'). Regardless of how a given company or entity is formatted, each explicit mention of a company name, ticker, or relevant CEO is detected. In comment validation, we also record a high correlation (>95%) between the number of post exposures, the number of posts, and the number of comments.
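As a rough illustration of the bot exclusion and 24-hour duplicate rule described above, here is a minimal Python sketch. It is not the production pipeline: the function name `filter_mentions`, the `(author, ticker, timestamp)` tuple layout, and the choice to measure the 24-hour window from the last mention that was kept are all assumptions made for this example. Only the two bot names given in the text are listed.

```python
from datetime import datetime, timedelta

# Bot accounts named in the text; a real list would presumably be longer.
BOT_AUTHORS = {"AutoModerator", "VisualMod"}
DEDUP_WINDOW = timedelta(hours=24)


def filter_mentions(mentions):
    """Drop bot mentions and repeat (author, ticker) mentions inside 24 hours.

    `mentions` is an iterable of (author, ticker, timestamp) tuples
    (a hypothetical layout chosen for this sketch).
    """
    last_kept = {}  # (author, ticker) -> timestamp of the last mention kept
    kept = []
    for author, ticker, ts in sorted(mentions, key=lambda m: m[2]):
        if author in BOT_AUTHORS:
            continue  # bots are never counted
        key = (author, ticker)
        previous = last_kept.get(key)
        if previous is not None and ts - previous < DEDUP_WINDOW:
            continue  # same user, same ticker, still inside the window
        last_kept[key] = ts
        kept.append((author, ticker, ts))
    return kept


t0 = datetime(2021, 1, 1, 9, 0)
sample = [
    ("alice", "GME", t0),
    ("alice", "GME", t0 + timedelta(hours=3)),          # duplicate, dropped
    ("alice", "AMC", t0 + timedelta(hours=3)),          # different ticker, kept
    ("AutoModerator", "GME", t0 + timedelta(hours=4)),  # bot, dropped
    ("alice", "GME", t0 + timedelta(hours=25)),         # outside window, kept
]
result = filter_mentions(sample)
```

With this sample, `result` keeps three of the five mentions. Keying on the (author, ticker) pair means one user can still contribute a mention for each distinct company in a day, which is one plausible reading of "duplicate mentions"; keying on the author alone would be stricter.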