Temporal Interference
This article is the second in the Automated News Analysis series, and covers a problem I overlooked during my first try at news analysis. The subject is taking a dataset of news and price history, and attempting to assign sentiment to the news articles for which we know the subsequent price development. The particular problem we will explore is removing the influence of other news articles that are close in time to our target article.
The general methodology behind analyzing a news corpus is that, in our training data set, we know how the price performed after some news event. The hope is that we can use this information, together with a measure of similarity to future news stories, to retrieve some information about what the price will do in the future.
In doing so, we are assuming that the price move is in some way related to the content of the news article. Intuitively, we would agree that this is the case. If a company receives a piece of good news, it is likely that the price will go up. If it receives bad news, it is likely to go down. Obviously, the system is messier than this, but that is the basic assumption behind this entire project.
We would like to know how a particular news article actually affected the price in our historical data. A human analyst might look at a past news article, reason about it using external domain knowledge about the company, and form an opinion of their own. At this juncture, we are not ready to bring external knowledge into the computation. We merely want to find some purely mathematical model that can approximate the effect a news event has on the price data.
In the past, I had generally worked under the assumption that a good enough approximation would be to ask how the price changed over a certain period of time after the news article. In hindsight, I think this is a very naive approach. The problem stems primarily from two major sources of confounding: temporal interference and spatial interference. Temporal interference refers to interference from news articles close in time to the target news article.
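To make the naive approach concrete, here is a minimal sketch of it. The function name, the fixed horizon, and the price series are all made up for illustration, and the series is hand-written rather than real market data. It labels each article with the price change over a fixed number of steps after it appears.

```python
def naive_labels(prices, event_times, horizon):
    """Label each event with the price change over the next `horizon` steps.

    `prices` is a list of prices indexed by time step, and `event_times`
    holds the time step at which each article appeared (hypothetical setup).
    """
    labels = []
    for t in event_times:
        # Clamp the end of the window to the last available price.
        end = min(t + horizon, len(prices) - 1)
        labels.append(prices[end] - prices[t])
    return labels


# Invented example: articles at t = 0, 1, 2 are "bad", the one at t = 3 is "good".
prices = [100.0, 99.0, 97.0, 94.0, 92.0, 90.0, 89.0, 89.0]
labels = naive_labels(prices, [0, 1, 2, 3], horizon=3)
print(labels)  # [-6.0, -7.0, -7.0, -5.0]
```

In this made-up series the article at t = 3 receives a negative label even though we declared it to be good news, simply because the price is still falling from the earlier articles. That is the temporal interference described above.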
Imagine a corpus with several bad news articles followed immediately by one good news article. In general, we would expect the influence of the bad news articles to outweigh the influence of the good one, and thus we would see a downward price move. Is there any reasonable way to tease apart these two influences, at least on some approximate level, to better determine the sentiment of a particular news article?
To look for a solution to this problem, we will approach it entirely theoretically. We will first define some assumptions for generating test data sets, and then attempt to find an algorithm that can effectively approximate the initial seeds used to generate them. We will start with strong assumptions, and then loosen them to more accurately represent what we think might be the case in the real world.
Our assumptions will be as follows:
A) The influence of events is solely responsible for the movement of price.
B) An event affects the price in a linear fashion, according to its sentiment.
C) An event affects the price over some predetermined constant time frame.
This gives us a nice way of generating test data. The goal of using this test data would be to find an algorithm that looks at a corpus and teases out the influence of each event into some coherent model that matches the random seeds we used to generate the test data.
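As a rough sketch of what such a generator could look like, the code below builds a price series from randomly seeded events. Everything here is an assumption made for illustration: the function name, the parameters, the sentiment range of -1 to 1, and in particular the reading of assumption B as "each event moves the price by a constant amount per time step, proportional to its sentiment, for the whole window".

```python
import random


def generate_test_data(n_steps, n_events, window, seed=0, start_price=100.0):
    """Generate a synthetic price series from randomly seeded events.

    Assumption A: only events move the price.
    Assumption B (one possible reading): an event with sentiment s moves
        the price by s / window at every step of its window.
    Assumption C: every event acts over the same fixed `window` of steps.
    """
    rng = random.Random(seed)

    # The "random seeds" an algorithm would later try to recover.
    times = sorted(rng.randrange(0, n_steps - window) for _ in range(n_events))
    sentiments = [rng.uniform(-1.0, 1.0) for _ in range(n_events)]

    # Sum the per-step contributions of all events active at each step.
    deltas = [0.0] * n_steps
    for t, s in zip(times, sentiments):
        for k in range(t, t + window):
            deltas[k] += s / window

    # Accumulate the per-step changes into a price series.
    prices = [start_price]
    for d in deltas:
        prices.append(prices[-1] + d)

    return times, sentiments, prices


times, sentiments, prices = generate_test_data(n_steps=50, n_events=5, window=10, seed=1)
```

Under these assumptions, overlapping events simply add together, so the total price change over the series equals the sum of the sentiments. An algorithm under test would be given only `times` and `prices` and asked to estimate `sentiments`.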
I have a few ideas in mind for how I will solve this problem, but for now I will get to work on generating some test data, and then report back on progress.




