The Kevin Dolan » theory
http://thekevindolan.com
Putting the Kev in Dolan since 2009!

Perturbation Model of Price Movement
http://thekevindolan.com/2010/02/perturbation-model/
Thu, 04 Feb 2010 05:20:01 +0000, Kevin

[Image: shwayze]

I was sitting in my networks class today, thinking about how to implement an algorithm that uses document similarity to tease apart temporal interference, when I started arriving at a more coherent model of what I’ve been trying to do in general.  This article sets up some early ideas for a model of what’s going on, what we’re trying to accomplish, and possible general procedures for doing so.  It also establishes some terminology.

Essentially, at this stage we have a data history.  This data history is made of two parts: a set of price point information and a set of relevant news articles.

We will call our price point information the Time-Sensitive Response Variable, or TSRV.  Let us explore what we are assuming about the TSRV.

I began to think about making analogies to physics, because that’s something I understand a little better than economics.  I think the way a lot of people approach the stock market for investing is to think about price as position.  This gives rise to the idea that the market may often find itself trending one way or the other.  The idea behind a trend is that the price has a certain velocity, which is resistant to change (inertia), until some outside influence (force/acceleration) causes a reversal or something of that nature.

Having looked at a lot of stock graphs, I am not so sure this is the case.  I understand many successful traders would disagree with me, but for the sake of this project, we are going to think of the TSRV for price as velocity; that is, it is resistant to change without outside influence: it experiences inertia.  Under this concept the price over time is the derivative of some unknown value behind the scenes, which I intuitively feel might exist, and which behaves more like the traditional concept of price.

You might say that the price over time is constantly gyrating madly about, so thinking that the TSRV is resistant to change is ridiculous, but keep in mind I said it was resistant to change when undisturbed.  There is a constant barrage of outside influences coming in to affect the price.  I consider these outside influences analogous to forces in physics, which are proportional to acceleration.

We will call these outside influences perturbations.  Perturbations could of course take many forms, but for simplicity we will think of individual perturbations as discrete chunks of constant force with finite duration.  In physics, we know that the acceleration observed is due to the sum of the forces acting on an object.  So too is the effect of the perturbations additive.
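To make the additivity concrete, here is a minimal sketch of the model as described so far. The names `Perturbation`, `net_influence`, and `tsrv_path` are hypothetical, as are the discrete integer time steps and the starting value of 0; none of them come from the original project.

```python
from dataclasses import dataclass


@dataclass
class Perturbation:
    start: int        # time step at which the force begins to act (assumed integer)
    duration: int     # number of time steps the force lasts
    influence: float  # constant signed force applied while active


def net_influence(perturbations, t):
    """Sum the influences of every perturbation active at time t."""
    return sum(p.influence for p in perturbations
               if p.start <= t < p.start + p.duration)


def tsrv_path(perturbations, steps, initial=0.0):
    """Accumulate the net influence step by step to obtain the TSRV."""
    value = initial
    path = []
    for t in range(steps):
        # With no active perturbation the value does not move (inertia).
        value += net_influence(perturbations, t)
        path.append(value)
    return path


perturbations = [Perturbation(0, 3, 0.5), Perturbation(2, 2, -1.0)]
print(tsrv_path(perturbations, 5))  # [0.5, 1.0, 0.5, -0.5, -0.5]
```

At step 2 both perturbations are active, so their influences sum to -0.5; from step 4 onward nothing is active and the value stays where it is.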

We understand that there are a great number of perturbations affecting any TSRV, some of which we know about, many of which we do not.  Furthermore, we generally know only that some perturbations exist, not any details of their strength, direction, or duration.  The perturbations we do not know about will generally manifest themselves as noise, but note that under this model there is no random TSRV movement, only movement due to unconsidered perturbations.

For our purposes with regard to automated news analysis, we have a set of relevant news articles that we assume affect the movement of the price in this fashion.  Our end goal is to approximate the effect that news articles have.  According to our definition, there are two dimensions to the effect of a perturbation: the strength and direction of the acceleration it causes, the influence, and the length of time it affects the price, the duration.

Given a set of perturbations, we want to find some way to determine their characteristics, so that we can hopefully use some similarity metric to indicate possible future price moves.

It is well understood that the duration will vary from one news article to the next, but determining both influence and duration seems, at this juncture, computationally beyond the scope of what I can hope to accomplish.  Perhaps in the future we can devise more complex algorithms, but for now we will focus on determining the influence of perturbations given some predetermined duration.

Now that we have a more formal understanding of our basic assumptions, we can consider possible means of approximating the characteristics of perturbations, but I’ll save that for next time.

And yes, that’s a picture of Shwayze.

Test Data Sets
http://thekevindolan.com/2010/02/test-data-sets/
Tue, 02 Feb 2010 23:57:04 +0000, Kevin

In my last post, I introduced the idea of creating test data sets for the purpose of finding an algorithm to tease apart the influence of individual news articles.  I have done just that and am posting the data sets for further analysis.

My method for generating these test files is described by the following pseudocode:

-Take 3 parameters: TIME-STEP, TIME-FRAME, and COUNT.
-Create COUNT news articles, each with the following encoded in their summary field:
    -Time-frame equal to TIME-FRAME
    -Influence randomly set between [-1,1]
-For each timestep 0 through (TIME-STEP * COUNT):
    -Find all news articles before current time, within their Time-frame value of now
    -Add the sum of those news articles’ Influence values to the current price
    -Record the current price

Because we defined a constant TIME-FRAME ahead of time, a simpler algorithm could have been used, but I am planning on attempting experiments with variable time-frames at a later date, so this was a sensible solution to save myself some work in the future.
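The pseudocode above can be sketched in Python roughly as follows. This is an approximation under stated assumptions, not the script that produced the posted files: the function name `generate_test_data`, the dictionary fields, the starting price of 0, the `seed` parameter, the placement of article i at time `i * time_step`, and the half-open reading of "within their Time-frame value of now" are all assumptions.

```python
import random


def generate_test_data(time_step, time_frame, count, seed=0):
    """Return (articles, prices) for one synthetic data set."""
    rng = random.Random(seed)  # seeded so a run can be repeated

    # Assumption: article i is published at time i * time_step.
    articles = [
        {
            "time": i * time_step,
            "time_frame": time_frame,  # stored per article to allow variable time-frames later
            "influence": rng.uniform(-1.0, 1.0),
        }
        for i in range(count)
    ]

    price = 0.0  # assumed starting price
    prices = []
    for now in range(time_step * count):
        # Articles already published and still within their own time-frame.
        active = [a for a in articles
                  if a["time"] <= now < a["time"] + a["time_frame"]]
        price += sum(a["influence"] for a in active)
        prices.append(price)
    return articles, prices


# Arbitrary small example; the parameter values here are illustrative only.
articles, prices = generate_test_data(time_step=3, time_frame=17, count=10)
print(len(articles), len(prices))
```

Keeping the time-frame on each article, rather than as a single global, mirrors the stated plan to experiment with variable time-frames later.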

I created 6 data sets, each with 500 data points, as follows:

Data set 0: TIME-STEP 1, TIME-FRAME 1
Data set 1: TIME-STEP 1, TIME-FRAME 2
Data set 2: TIME-STEP 1, TIME-FRAME 5
Data set 3: TIME-STEP 1, TIME-FRAME 10
Data set 4: TIME-STEP 1, TIME-FRAME 50
Data set 5: TIME-STEP 3, TIME-FRAME 17

The motivation for choosing the values for data sets 1-4 is simple: to see the effects of using longer and longer time-frames relative to time-steps.  Data set 5 exists solely to see whether weird offsets cause any problems.  If we see anything unexpected there, further research may be necessary.

I have attached a zip file of the corpus, if you are interested: here.

Temporal Interference
http://thekevindolan.com/2010/02/temporal-interference/
Mon, 01 Feb 2010 21:14:33 +0000, Kevin

This article is the second in the Automated News Analysis series, and concerns a particular problem I overlooked during my first try at news analysis.  The subject is taking a dataset of news and price history, and attempting to assign sentiment to the news articles for which we know the price development.  The problem we will explore in particular is removing the influence of other news articles that are near our target article in time.

The general methodology behind analyzing a new corpus is that, in our training data set, we know how the price performed after some news event.  The hope is that we can use this information, together with the similarity of future news stories to past ones, to retrieve some information about what the price will do in the future.

In doing so, we are assuming that the price move is in some way related to the content of the news article.  Intuitively, we would agree that this is the case.  If a company receives a piece of good news, it is likely that the price will go up.  If it receives bad news, it is likely to go down.  Obviously, the system is messier than this, but that is the basic assumption behind this entire project.

We would like to know how a particular news article actually affected the price in historical data.  A human analyst might look at a news article from the past, reason using external domain knowledge about the company, and form an opinion of his own.  At this juncture, we are not ready to consider external knowledge for computation; we merely want to find some purely mathematical model that can approximate the effect a news event has on the price data.

In the past, I had generally just worked under the assumption that a good enough approximation would be to ask how the price changed over a certain period of time after the news article.  In hindsight, I think this is a very naive approach.  The problem stems primarily from two major sources of confound: temporal interference and spatial interference.  Temporal interference refers to interference from news articles proximal in time to the target news article.

Imagine a corpus having several bad news articles followed immediately by one good news article.  In general, we would expect the influence of these bad news articles to outweigh the influence of the good one, and thus we would see a downward price move.  Would there be any reasonable way to tease apart these two influences, at least on some approximate level, to better determine the sentiment of a particular news article?
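A small numeric sketch shows how badly the naive measurement can go wrong in exactly this scenario. Everything specific here is made up for illustration: the window of 3 steps, the influence values, the starting price of 0, and the helper names `price_series` and `naive_estimate`.

```python
WINDOW = 3  # hypothetical constant duration, in time steps

# (publication time, true influence): three bad articles, then one good one.
articles = [(0, -0.5), (1, -0.5), (2, -0.5), (3, 0.25)]


def price_series(articles, steps, window=WINDOW):
    """Each article pushes the price by its influence on every step of its window."""
    price, series = 0.0, []
    for t in range(steps):
        price += sum(infl for start, infl in articles
                     if start <= t < start + window)
        series.append(price)
    return series


def naive_estimate(series, start, window=WINDOW):
    """Naive approach: total price change over the window following an article."""
    before = series[start - 1] if start > 0 else 0.0
    return series[start + window - 1] - before


series = price_series(articles, steps=8)
good_time, good_influence = articles[-1]

print(naive_estimate(series, good_time))  # -0.75: the good article looks bad
print(good_influence * WINDOW)            # 0.75: its contribution in isolation
```

The good article's own push over its window is +0.75, but the bad articles still active during that window drag the observed change down to -0.75, so the naive measurement gets even the sign wrong.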

To attempt to find a solution to this problem, we will approach it entirely theoretically.  We will first define some assumptions for generating test data sets, and attempt to find an algorithm that can effectively approximate the initial seeds to the problem.  In order to do this, we will first use strong assumptions, and then loosen them to more accurately represent what we think might be the case in the real world.

Our assumptions will be as follows:

A) The influence of events is solely responsible for the movement of price.

B) An event affects the price in a linear fashion, according to its sentiment.

C) An event affects the price over some predetermined constant time frame.

This gives us a nice way of generating test data.  The goal of using this test data would be to find an algorithm that looks at a corpus and teases out the influence of each event into some coherent model that matches the random seeds we used to generate the test data.
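As an illustration of what "teasing out" could look like under the strongest form of assumptions A through C, here is one possible sketch. It is not the approach planned in this series. It further assumes that publication times are known, that at most one article appears per time step, that the window length is known, and that the price series is free of noise. The function names and sample values are hypothetical.

```python
def build_prices(times, influences, window, steps):
    """Generate a noise-free price series under assumptions A-C."""
    price, prices = 0.0, []
    for t in range(steps):
        price += sum(infl for s, infl in zip(times, influences)
                     if s <= t < s + window)
        prices.append(price)
    return prices


def recover_influences(times, prices, window, initial=0.0):
    """Recover each article's influence, processing articles in time order.

    times must be strictly increasing (at most one article per step).
    """
    recovered = []
    for t in times:
        # The one-step price change at t equals the sum of active influences.
        delta = prices[t] - (prices[t - 1] if t > 0 else initial)
        # Subtract earlier articles that are still active at t; zip stops at
        # the articles recovered so far, so only earlier ones are considered.
        carried = sum(infl for s, infl in zip(times, recovered)
                      if s <= t < s + window)
        recovered.append(delta - carried)
    return recovered


times = [0, 1, 2, 3]
true_influences = [-0.5, -0.5, -0.5, 0.25]
prices = build_prices(times, true_influences, window=3, steps=8)

print(recover_influences(times, prices, window=3))  # [-0.5, -0.5, -0.5, 0.25]
```

Because each step's price change is a sum of already-recovered influences plus one unknown, the seeds fall out exactly here. Once the assumptions are loosened (noise, unknown or variable durations, several articles per step), this simple subtraction no longer suffices.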

I have a few ideas in mind for how I will solve this problem, but for now I will get to work on generating some test data, and then report back on progress.
