Oct 31, 2008

Single Factor Analysis

What is Single Factor Analysis (SFA)? I won't go into a scientific definition of the term, because it can get very complicated as the mathematical tools vary. Instead, I am going to explain in layman's terms what SFA means and how we carry out the analysis.

Factor Analysis is an analysis where you take two different variables: an Independent Variable (IV) and a Dependent Variable (DV).

Then you vary the Independent Variable and check the response of the Dependent Variable. The main objective is to prove or disprove that a change in the IV has an effect on the DV. Therefore, in SFA we usually start with a hypothesis that there is a relationship, and the analysis may prove it right or wrong.

Example I

The concentration of CO2 ([CO2]) has an effect on the temperature of the Earth ([T]).

The above statement is the hypothesis. It may sound old, as it is now an established fact, but it wasn't back in 1824. Interestingly, the warming effect of the atmosphere, which lies behind global warming, was first described not by an environmentalist but by a French mathematician, Jean Baptiste Joseph Fourier, because he was looking at the NUMBERS. An analyst in his own way.

So what do you do to prove that the above is true? There are many hypothesis tests available, like the t-test, the z-test and so on, but all of them follow a similar method.

Our hypothesis is still incomplete. We have to say how the variables are related. So I say: a rise in [CO2] increases [T]. Fair enough. After this, we analyse the data on CO2 concentration and temperature.
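The hypothesis steps above can be sketched in code. This is a minimal Python illustration, not the author's method. It assumes paired readings of [CO2] and [T]; the numbers below are made up purely for illustration. It computes the Pearson correlation r and turns it into the standard t statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) for the null hypothesis "no linear relationship". A large positive t supports the directional claim that a rise in [CO2] increases [T]. The cut-off value would come from a t-table with n - 2 degrees of freedom.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_t_statistic(xs, ys):
    """t statistic for H0: no linear relationship (n - 2 degrees of freedom)."""
    r = pearson_r(xs, ys)
    n = len(xs)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical readings, for illustration only (not real climate data).
co2_ppm = [315.0, 325.0, 338.0, 354.0, 369.0, 389.0]
temp_anomaly = [0.02, 0.05, 0.18, 0.30, 0.40, 0.62]

r = pearson_r(co2_ppm, temp_anomaly)
t = correlation_t_statistic(co2_ppm, temp_anomaly)
print(f"r = {r:.3f}, t = {t:.2f} with {len(co2_ppm) - 2} degrees of freedom")
```

A correlation test is used here only because it matches a "one variable drives the other" hypothesis. A t-test on group means or a z-test would follow the same pattern: state the hypothesis, compute a statistic, compare it with a threshold.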

I chose this example because it clearly illustrates the principle of Single Factor Analysis.

Example II

This time let's take a business problem. In the banking sector, many variables are recorded every minute. For the following example, I have used US economic data from www.freelunch.com

Let me explain the variables first:

Disposable Personal Income is personal income less personal tax and nontax payments. It is the income available to persons for spending or saving.

Consumer Installment Credit Revolving is the credit taken from the bank by the consumer. "Revolving" means that the amount of credit is not fixed and that the credit can be replenished. Let's keep it as simple as that.

The other variables are mathematical functions. Among them, the Standard Deviation is the square root of the average squared deviation from the mean, i.e. a measure of how spread out the values are around their mean. I chose these two variables because their correlation coefficient is about 0.8539, which is a fair value to suggest a relationship.
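As a quick check of the definition above, here is a small standard-library Python sketch. It uses the population standard deviation, which divides by n. Some spreadsheet functions divide by n - 1 instead (the sample standard deviation), so small differences from spreadsheet output are expected. The data values are illustrative only.

```python
import math
import statistics

def population_std(values):
    """Square root of the mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    mean_sq_dev = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(mean_sq_dev)

data = [2, 4, 4, 4, 5, 5, 7, 9]     # illustrative values, mean = 5
print(population_std(data))         # 2.0
print(statistics.pstdev(data))      # same definition from the standard library
```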

So, ready? Let's look at the values and see what they say.

Do not be confused by the Rank variable; it is a way of categorising the data. This is also one of the major steps in the analysis process. It can be done with more complex clustering techniques like k-means, or by simple manual categorisation. In this case, the ranking has been done by magnitude, separating the small values from the larger ones.
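The ranking and the per-rank sum, average and standard deviation can be sketched as below. This is an assumption about how such a table might be built, not a reproduction of the author's spreadsheet. It splits the range of the independent variable into equal-width bins (ranks 0 to 9 by default) and summarises the dependent variable within each bin. The function name `rank_summary` and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def rank_summary(iv, dv, n_ranks=10):
    """Group (iv, dv) pairs into equal-width ranks of iv and summarise dv per rank.

    Returns {rank: (count, sum, average, population std)} for non-empty ranks.
    """
    lo, hi = min(iv), max(iv)
    width = (hi - lo) / n_ranks or 1.0   # fall back to 1.0 if all iv values are equal
    groups = defaultdict(list)
    for x, y in zip(iv, dv):
        # Clamp so the maximum value lands in the last rank instead of rank n_ranks.
        rank = min(int((x - lo) / width), n_ranks - 1)
        groups[rank].append(y)

    summary = {}
    for rank in sorted(groups):
        ys = groups[rank]
        total = sum(ys)
        mean = total / len(ys)
        std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys))
        summary[rank] = (len(ys), total, mean, std)
    return summary

income = [0.0, 0.5, 5.0, 5.5, 10.0]     # hypothetical IV values
spending = [1.0, 1.0, 2.0, 6.0, 4.0]    # hypothetical DV values
for rank, (count, total, mean, std) in rank_summary(income, spending, n_ranks=2).items():
    print(rank, count, total, round(mean, 2), round(std, 2))
```

Equal-width bins are only one choice. Equal-count bins (quantiles) or a clustering method such as k-means would also produce ranks.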

So the data shows a rise in value as we move from top to bottom. It is always better to plot a graph. The reason for taking the sum, average and standard deviation will become clear afterwards. Remember that there is no 'time' variable here. We are only looking at two variables and the effect of one on the other. The graph below shows the average value of the independent variable for each rank. It says nothing about the relationship between the DV and the IV.

Graphical Analysis

Let's look at the 'sum' values. The result shows that for smaller values of Rank the sum is very small, and as the rank increases, the sum of the DV also increases. From a business point of view, as people have more savings on hand, they are willing to spend more. Fair enough.

The graph of the average values leads to the same analysis. This is simply because the average and the sum are related: the average is the sum divided by the number of values.

So we have the same analysis.

But what is this? The standard deviation graph has different characteristics. From rank 0 to 6 it is almost zero, it rises at 7 and 8, but not much from 8 to 9. This result tells us more than the graphs of the average and sum above.

Let me explain exactly what we just did. We took the average value and applied a mathematical tool to it, which gave us a new variable with a better perspective on the data. Such is the power of mathematical tools.

So let's analyse. The standard deviation gives us the variation in the Dependent Variable for a change in the Independent Variable within each category.

From Rank 0 to 6 we see no variation, indicating that when the value of the IV is small, variation in it has no effect on the DV. That's fair enough, because when people have little savings they are not willing to spend more, so they use their credit cards less.

But between ranks 6 and 7 the variance increases, indicating that when people have enough savings, small increases in their savings strongly affect their spending habits. In Rank 9, however, there is no change in spending habits: more savings simply mean more spending. So people are most careful about their spending when their savings are between ranks 6 and 8, because they are trying to balance saving and spending during that period. This analysis is not possible by looking at the sum or average values alone.

The above analysis has made the relationship between the revolving credit balance and Disposable Income very clear. It told us not only that they are directly related but also how they are related.

This is SFA. Note also that it was the standard deviation, rather than the sum and average, that gave us a proper understanding. But this may not be the case every time. Also, this time we did not start by stating a hypothesis. You may start with a hypothesis, but that statement is not going to change the method; either way is valid.

So what do you all make of it? Let me hear from all of you.

Mathematical Reality

Why do we use so much mathematics in analytics, and is there any way to justify it? The answer is straight and simple: mathematics is one of the most truly logical languages. No doubt.

In addition, analytics means analysing the facts and logically following the clues in them to reach a practical conclusion. It is this process that exploits the power of mathematics to structure the logic in the most appropriate way. Mathematical processing makes the logic of the physical world more understandable and easier to work with, giving the analyst tremendous power to handle the facts.

If you have time, I would suggest reading "A Mathematician's Apology" by G. H. Hardy. It is the most humble book I have come across. It is written by a mathematician who chose to devote his whole life to mathematics. In it, he discusses the difference between Mathematical Reality and Physical Reality. Let me explain it in my own words.

In the physical world there are many quantities, let's call them variables, which are given specific units. All these variables are continuously changing. One of the most important variables, which changes at every instant, is none other than "time". (Be careful with what I have just said, because it is a very complicated statement, as change itself is related to time.) All these variables depend on one another, making it almost impossible to track them. This makes it impossible to consider every single variable in our analysis. Therefore, it is impossible to model any physical system with 100% accuracy.

But in the mathematical world, everything is ideal. To understand this idealism, let's take some examples: a circle, a line, or any geometric shape.

In the real world, a drawn circle is never a perfect circle, and it also occupies physical space. But in mathematics, a line is considered to be 100% straight and of zero thickness, occupying no space at all.

Further, a circle is described by an algebraic equation (for example, x^2 + y^2 = r^2 for a circle of radius r centred at the origin). This equation is like the DNA of a living thing, describing the shape in every possible way.

Therefore, in the mathematical domain every calculation and prediction is accurate as far as logic is concerned, but when we map them to the physical domain we get errors. These errors are due to the approximations in our analysis and to the undeniable difference between the two domains.

In the graph alongside, the blue line is real data from the physical world, whereas the red line is data from the mathematical world.

If we consider the data from 1 to 100 on the X-axis, we can see that the two lines almost follow each other. By taking this sample and modelling it in the mathematical domain, we conclude that the physical domain must follow the red line's pattern. But NO! Somehow, the real-world data drops. There is a tremendous error. This error can be reduced by taking more of the variables into account.
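To make this concrete, here is a hedged Python sketch with entirely synthetic data; it is not the data behind the graph. A hypothetical "real" process follows a straight line up to x = 100 and then bends downward. A least-squares line fitted only to x = 1..100 matches that range almost perfectly, yet beyond it the model's predictions drift further and further from the real values.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def real_world(x):
    """Hypothetical physical process: linear at first, then dropping after x = 100."""
    return 2.0 * x if x <= 100 else 2.0 * x - 0.05 * (x - 100) ** 2

sample_x = list(range(1, 101))                 # the sample we model from
sample_y = [real_world(x) for x in sample_x]
a, b = fit_line(sample_x, sample_y)

for x in (50, 100, 150, 200):
    model = a + b * x
    # columns: x, real value, model prediction, model error
    print(x, round(real_world(x), 1), round(model, 1), round(model - real_world(x), 1))
```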

The point is that it is impossible to model a physical system as a mathematical system with zero error. Therefore, we talk about approximation, probability and percentages. There is always a loss when variables are transformed from one domain to the other. Then why do we take this step? Because once the problem is in mathematics, it is easier to manipulate the variables and create logical relations, and the thousands of mathematical tools available make this even easier. And of course, the problem cannot be solved in the physical domain; it has to be transformed into the mathematical domain.

Understanding the relationship between a physical variable and a mathematical variable makes a big difference in our analysis process. Let me know what you think about it.


Oct 17, 2008

The Difference

What kind of difference am I talking about? A very important difference between underdeveloped and developed countries. I came across it a long time back during my student years, but found it very interesting when it was discussed in a debate on BBC radio. The difference lies in the perspective of science students in developing and developed countries.

The BBC radio programme made a very interesting point. There were two teams of science students, one from the UK and the other from an Asian country. Both were asked why they chose to study science. Remarkably, their views were very different, and of course this had a lot to do with the economic conditions of their countries.

The students from the underdeveloped country said that they hoped to bring change to their country by pursuing science. Some wanted to be engineers and some wanted to be doctors. In a way, it seems natural for them to think that choosing science would really help their country grow.

On the other hand, the students from the developed country had never thought of it that way. They were studying science simply because they were interested in the subject.

Now here is the big difference, isn't it? This is how young minds on different continents think. So... seriously, does taking up science help you overcome the difficulties of your country? I really want to get to the bottom of this discussion.

Some of my observations go this way. Many science students from underdeveloped countries do their further studies abroad, and very rarely do we find them returning to their homeland. Hmmm. Development! Where did that go? But it doesn't always happen this way, and I am not blaming anyone.

The main problem arises when young minds start getting deeper into their studies. As they go further, they find themselves entangled in the mysteries of their field. They follow the field rather than their vision. In other words, they get overwhelmed by the depth of knowledge, so knowing becomes more important than applying what is known. There is nothing wrong with going deeper into knowledge; just know when to act once you have it.

Therefore, it is not the field you choose that is going to help your country. Instead, it is your vision. What you really want to do with your knowledge is more important than what you want to study. Being practical with your knowledge is what I have been trying to say.

So this difference in beliefs really has something to tell us about how we decide what to study and why. If you really want to help, try being practical.

Missing Value Approximation

What do we do when the data we have contains missing values? There are only a few options, like:

1. Delete the data
2. Make the missing value zero
3. Approximate
4. Ignore and so on

Obviously, options 2 and 4 are not good ways of dealing with missing data.

When we make a missing value ZERO, we are unknowingly creating new data, which may or may not be valid. For example, what if ZERO is one of the possible values of the variable you are dealing with? In that case, we get a false pattern.

Further, we simply cannot ignore the data. Ignoring it is somewhat similar to deleting it.

When we delete data, we decrease our sample size, and hence the power of our statistical tools. As we know, statistics works best with a large amount of data. In addition, there are other difficulties with deleting data, which we will get to later on.

Therefore, it seems best to approximate the value. But we have to be very careful while doing so. This time, we will look at a simple algorithm for approximating certain typical data sets. There is commercial software available for the purpose, like SPSS and SAS. Note that you can use any programming language to approximate. I used Java to read the data from Excel and then approximate the values. It's your choice.

Let me first give a very simple algorithm for approximation:

1. Read the whole data set into an array
2. Move along the array and find a missing value
3. When you find one, take the differences between consecutive data values over a certain range
4. Take the average of those differences
5. Add that average to the data value next to (just before) the missing value
6. The new value is the approximated value
7. Repeat the process for all the missing values
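The steps above can be sketched in Python. The author used Java with Excel input; this is an independent sketch, not that program. `approximate_missing` is a hypothetical name. Missing values are represented as `None`, and `window` plays the role of N, the number of consecutive differences averaged.

Values are filled left to right, so an approximated value can feed the next one. Missing values at the very start have nothing before them and are left unfilled; one option is to run the same function on the reversed series. Repeating the last value when only one earlier value exists is an added assumption, not part of the original steps.

```python
from typing import List, Optional

def approximate_missing(data: List[Optional[float]], window: int = 3) -> List[Optional[float]]:
    """Fill None entries by adding the average recent change to the previous value."""
    filled = list(data)                      # step 1: copy the data into an array
    for i, value in enumerate(filled):       # step 2: walk along the array
        if value is not None:
            continue
        # Collect up to window + 1 known (or already approximated) values just before i.
        prev = []
        j = i - 1
        while j >= 0 and filled[j] is not None and len(prev) < window + 1:
            prev.append(filled[j])
            j -= 1
        prev.reverse()
        if not prev:
            continue                         # nothing before this gap to extrapolate from
        if len(prev) == 1:
            filled[i] = prev[0]              # assumption: no trend available, repeat last value
            continue
        diffs = [b - a for a, b in zip(prev, prev[1:])]   # step 3: consecutive differences
        avg_diff = sum(diffs) / len(diffs)                # step 4: average difference
        filled[i] = prev[-1] + avg_diff                   # steps 5-6: approximated value
    return filled                            # step 7: every gap has been visited

print(approximate_missing([1.0, 2.0, 3.0, None, None], window=2))
```

Because each filled value is reused for the next gap, a long run of missing values is simply a straight-line continuation of the recent trend.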

In mathematical terms:

$$ d_{n+1} = d_n + \frac{1}{N}\sum_{i=0}^{N-1}\left(d_{n-i} - d_{n-i-1}\right) $$

where:
d = data array
n + 1 = location of the missing data
N = total number of differences taken

We can play around with these variables to get the required approximated value. They control the trend of the approximation.

When you are approximating, make sure to plot the data to see how your approximation has turned out. This gives you better control over the result. The reason is that no two data sets have the same trend, so a particular algorithm may work for one but not for another.

Let's take some examples and see where there is room for errors and wrong approximations.

In this approximation, everything went very well. The data from 1 to 928 was missing. The trend looks very simple: the values are rising and not decreasing, so the approximation worked very well. Try it on other data you have, and you may be satisfied with the result. But can we justify our approximation? What if the initial values were peaking and not around zero? So make sure your approximation is also logical. For that, try to understand the nature of the variable you are handling. This way, the margin for error is reduced.

One very important thing to remember: do not try to approximate missing values when they make up about 75 or 80 percent of your data set. This can go very, very wrong.
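A simple guard against this can be checked before approximating. The sketch below is hypothetical: missing values are `None`, and the 0.7 threshold is an assumption taken loosely from the 70 to 80 percent figures in this post, not a standard cut-off.

```python
def missing_fraction(data):
    """Fraction of entries that are missing (represented as None)."""
    return sum(v is None for v in data) / len(data)

def safe_to_approximate(data, max_missing=0.7):
    """Hypothetical guard: refuse to approximate when too much data is missing."""
    return missing_fraction(data) <= max_missing

series = [None] * 8 + [1.0, 2.0]     # 80 percent missing
print(missing_fraction(series), safe_to_approximate(series))
```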

For example:

The above data looks badly damaged. The trend is unclear, and missing values make up more than 70 percent of the data. When we run the same algorithm on it, we get the result below:

What can we say about this result? Very unlikely and unrealistic. The algorithm was the same, but it did not turn out as we expected. In this case, we need a more intelligent approximation that can look at the trend and decide how to approximate, maybe some adaptive algorithm. But we can also question whether such data can really be approximated satisfactorily at all, given how many values are missing.

Another example:

Data:


Wow!! How is that trend possible? You see, the algorithm does not always do the job for us, so we have to be very careful. In this case, our data was very unpredictable: it had many rises and falls. The algorithm samples only the consecutive data before a gap to predict the next values, so it makes wrong predictions. From this we can say that the algorithm does not work with random data. To clarify this point, let's zoom into the next data:

In the above graphs, we can clearly see that the data fluctuates randomly. Remember that the graphs show the same data set at different levels of zoom. In this case, it is virtually impossible to predict the missing values with the above algorithm, so we must come up with a better one.
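One commonly used alternative is linear interpolation; this is offered as a suggestion, not as the author's method. Instead of extrapolating from the values before a gap, it draws a straight line between the known values on both sides. It still cannot recover genuinely random fluctuations, but it cannot run away from the data the way forward extrapolation can. `interpolate_missing` is a hypothetical name, and gaps at either end of the series are left as `None`.

```python
def interpolate_missing(data):
    """Fill interior None gaps by straight-line interpolation between known neighbours."""
    filled = list(data)
    known = [i for i, v in enumerate(filled) if v is not None]
    # For each pair of consecutive known positions, fill the positions strictly between them.
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            frac = (i - left) / gap
            filled[i] = filled[left] + frac * (filled[right] - filled[left])
    return filled

print(interpolate_missing([1.0, None, None, 4.0, None]))
```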

I think I have covered most of the fundamentals of missing values. So if you have any suggestions or observations on my analysis, please feel free to comment.

Oct 13, 2008

Entrepreneurship II

Let's continue discussing the topic of entrepreneurship. With the world moving so fast in innovation and technology, I ask: can we play any role in it? Many of us do not think that we could build something as big as Microsoft, Sony or IBM. But wait a minute, are we purposely demeaning ourselves? Have we tried hard enough before rejecting the idea of building a new technology? Are we even encouraging those who are trying their best to do so? So let's analyse it.

Is money the biggest hurdle in creating a technology? My answer is: very rarely. From my point of view, technology does not mean building an expensive machine; it simply means coming up with an efficient solution. I have not set any boundary for technology. It is an idea, or a faithful solution, whose sole purpose is to raise the standard of the work. No doubt capital is also necessary.

It is a fact that all innovations started with a dream to solve a particular problem. It would be a cliché for me to repeat those achievements. To innovate, you don't have to be a rocket scientist; you simply need to understand the needs of society. If you can identify those areas, the solution will surely knock on your door.

People fear technology as a job snatcher, something that causes unemployment, but I see it as an opportunity to explore ourselves, an opportunity to live a different life. A long time back I heard someone say, "We want to learn to fish and not eat the fish exported to us." How true that statement is. We should really be investing in coming up with our own technology, our own products. We must also be able to compete in the world market. Can we? Am I being irritatingly optimistic? If so, do you see anything wrong in that? Is it wrong for me to believe, hopefully, that we can of course compete and produce a world-class product?

Very often I have asked myself what the right way to achieve my dreams is, and I find that the only way is to have a clear, long-term vision and to lead. In other words, it means being an entrepreneur. It is a fact of our world, and it needs no justification, that the fate of society lies in the hands of one dreamer. Just one. Imagine if there were more than one. Could anyone stop them from innovating and leading? I wonder.

Entrepreneurship I

This time I am asking myself a very difficult and important question. As an Electronic Engineer, I have a strong inclination towards technology and innovation. But when I look around in my country (Nepal), I get disappointed. I rarely see innovation happening.

Then a very important question arises: where are we all heading? What are our priorities, and how are we planning to achieve them, not as individuals but as a country?

Since my childhood, I have been told that the strength of our country lies in its agriculture, water resources and tourism. But are we leaders in those fields? Do we dominate the world in them? The answer is NO. Am I asking for these changes too early? I fear that these may be illusions that keep hope alive. I also used to hear, "Nepal ko dhan Hariyo Ban", which means "Green Forests are the riches of Nepal". But it has been ages since I last heard it said. Somehow, those riches are lost. Are we sure that the few strengths we have left will not disappear in a similar fashion?

Every year, thousands of engineers, scientists and doctors graduate from colleges. Where do they all end up? I myself cannot justify my position. Are we all just job seekers? Are we all machines waiting to serve? I assume not, so why is there so little innovation, and if there is some that I am not aware of, where is it? I have struggled to find any myself. In the bigger picture, we are lagging in innovation. We do not own any ideas or innovations that we can be proud of. I was shocked to find out that we don't even have rights over certain plants that are truly ours. I must say, we are losing the right to claim our identity. Is this where we are heading?

I want to say, somewhat disappointedly, that I am not proud of Everest or the Buddha, as these are not our creations. We are merely lucky to have them. I do not enjoy these jewels that are pinned on us so proudly. I want to be proud of what we can create, what we can deliver. I want to own a technology that is rightfully ours and not someone else's. We need to be proud of what we are doing, not of what we have, nor of what our ancestors did. Do we have the credentials to achieve this? I want to believe that we do. Call me a dreamer or a stupid child if you like, but I want to believe that we can be leaders in technology and innovation.

Oct 11, 2008

Teaching II

One thing I believed while I was teaching was that no class should go by without the students learning something new. Every class must offer the students something new; otherwise it is a waste of their time. Usually, two things contradict each other during a class: one is the exam point of view, or preparation for exams, and the other is teaching new concepts and ideas and their relationships.

In my own way, I always made students practise numerical physics questions in class. This gave both me and them the opportunity to work out the concepts and their applications. In such a class there was nothing new to learn, but there was something to learn about how to apply what the students had already learnt. Confusing, isn't it? To clarify, let me give you an example:

Imagine you are in the cockpit of an airplane and the pilot tells you everything about each of the buttons and controls. Let us also assume that you have memorised everything! Does that make you a better pilot? Hmm. I don't think so. So what makes you a better pilot? Obviously, being able to actually fly the plane and knowing the basics of flying, rather than knowing what every button controls.

Teaching is somewhat similar: first you introduce the controls and let the students play around with them. When they gain confidence in handling the concepts, you can be sure that they know their coursework. But that is not the end. So let's look at a second scenario: just because you can fly a plane does not necessarily mean that you understand how an airplane flies, does it?

So as a student, it is important to balance understanding, knowing and applying the concepts. It is your responsibility to see things from every perspective. It is important to be open-minded and not arrogantly dogmatic about what you think and what others think.

Teaching I

It has been almost 3 or 4 years since I started teaching A Level Physics. I started with the aim of sharing everything I had learnt during my own A Level days. That seemed a convincing reason to me. But I had to manage time for my own studies as well. To an outsider, it looked as if I was walking on two separate roads. But from my perspective, there was one and only one path.

I kept studying my engineering courses and preparing lectures, but not in the usual way. Usually, preparing a lecture means taking a pen and paper and listing the points to be taught. It might sound stupid and odd, but most of my lectures were prepared in my mind. I usually prepared my classes while cycling to campus, sometimes while listening to music, and most of the time while daydreaming. It is an odd thing to say, but this is how I learnt to teach. If I had gone the other way, looking at the books and preparing notes, I would never have been able to tell the students what I really wanted to. I always thought that teachers are there to tell you things that are not written in books.

In my experience of education, I have never reviewed a lecture note once it was written. Somehow the process of writing the note seemed more important to me than going through it afterwards. Therefore, I usually encouraged my students to write their own notes instead of copying what teachers were saying. But for many reasons the technique wasn't working. Most of the time, students rarely made their own notes, which terribly upset me, for the simple reason that they were missing the skill of writing down what they knew and were becoming incapable of sharing their knowledge. Knowledge that isn't shared loses its reason for existence.

During my years of teaching, I always forced myself to be stubborn about not giving notes to the students. But now and then I would receive complaints from them as the exams came nearer and they had nothing to refer to. I would get really frustrated. But then, with patience, I would note down the important points for their sake. Somehow I hated myself for spoon-feeding them, but I had to.

Years have passed, and I have mixed feelings about my students: some successes and some failures. I would think about it a lot and see if there was a point where I had gone wrong. From a teacher's perspective, it is really difficult to see failure in a student, because it somehow reflects failure on me. But I guess that is what I kept learning all these years: failure and success go hand in hand.


Oct 9, 2008

Time Series Data

Time series data is one of the most common types of data that require analysis. The above data is a typical example of time series data (taken from www.freelunch.com). In the graph we can see the variation of the US Consumer Price Index with respect to time. This information would be very interesting from an economic point of view. But how do you go about analysing the data? What calculations or operations need to be performed? Before going into that, I would like to share another example of time series data.

The data below (a photo of an oscilloscope) is taken from one of my previous research projects on a DC motor. The graph shows the variation of voltage across an electric component.

Both of the above examples are time series data. The data is the output of a system into which an independent variable is being fed. This kind of study is very interesting from the economic point of view and from the signal point of view (in the case of electronics). Studying the data reveals the nature of the system and hence allows us to predict or forecast future values. This is one of the main reasons for analysing time series data.

There are mainly two types of analysis: time domain analysis and frequency domain analysis.

The analysis is chosen depending on the nature of the data. For example, time domain analysis is more important for economic analysis than frequency domain analysis. For electric signals both may be useful, whereas for other data only frequency analysis may be fruitful. So basically, it depends.

The reason for transforming data into a different domain is that certain information about the data is more visible in one domain than in the other.

For example, to analyse the delay of a system, the time domain would be more appropriate than the frequency domain. If bandwidth is what matters, however, the frequency domain would be more useful for visualising the data.
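As a small illustration of what the frequency domain shows, the hypothetical sketch below computes a naive discrete Fourier transform of a sampled signal in standard-library Python. It then reports the frequency bin with the largest magnitude. The test signal is made up: a constant offset plus a sine wave completing 5 cycles over 64 samples. A periodic oscillation like the one in the oscilloscope trace would appear as a single strong peak in the spectrum. Real work would normally use an FFT library rather than this O(n^2) loop.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each bin of a naive O(n^2) discrete Fourier transform."""
    n = len(samples)
    mags = []
    for k in range(n):
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(total))
    return mags

def dominant_frequency_bin(samples):
    """Index of the strongest non-DC bin in the first half of the spectrum."""
    mags = dft_magnitudes(samples)
    half = len(samples) // 2
    return max(range(1, half + 1), key=lambda k: mags[k])

n = 64
signal = [1.0 + math.sin(2 * math.pi * 5 * t / n) for t in range(n)]  # offset + 5 cycles
print(dominant_frequency_bin(signal))   # prints 5, i.e. 5 cycles per 64 samples
```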

But I am trying to be unbiased and look at the basics of analytics regardless of the system. So let's look at some of the traits of time series data. That means we will be looking more into the mathematics.

When we analyse time series data, we look at the trend in the data: whether it is increasing or decreasing, and at what rate. We also look at the variation and the rate at which the data varies. This information gives us insight into the nature of the data, for example whether it varies linearly or not.
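A minimal sketch of these measurements, assuming evenly spaced observations, follows. First differences give the rate of change between consecutive points. A least-squares slope over the whole series summarises the overall trend. If the first differences are (nearly) constant, the data is varying linearly. The series and the tolerance value are made up for illustration; they are not the CPI data.

```python
def first_differences(series):
    """Change between each pair of consecutive observations."""
    return [b - a for a, b in zip(series, series[1:])]

def trend_slope(series):
    """Least-squares slope of the series against time index 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def is_roughly_linear(series, tolerance=1e-9):
    """True if every first difference is within `tolerance` of the average difference."""
    diffs = first_differences(series)
    avg = sum(diffs) / len(diffs)
    return all(abs(d - avg) <= tolerance for d in diffs)

series = [10.0, 12.0, 14.0, 16.0, 18.0]   # hypothetical, evenly spaced
print(first_differences(series))          # [2.0, 2.0, 2.0, 2.0]
print(trend_slope(series))                # 2.0
print(is_roughly_linear(series))          # True
```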

In the first figure, we can see that the CPI value is almost constant from 1950 to 1975. Then it suddenly starts to rise from 1975, and around 1995 there is again no distinct variation. This kind of analysis is different from the analysis of the second figure. In the second figure, the data is periodic in nature. At the edge, we can see a distinct variation in the magnitude of the data, which we call oscillation in electric signal terminology. Hence, for different quantities varying with time, the nature of the analysis can be different, though the process is somewhat similar.

Within time series data, there are two different kinds of data:

Depending on the type of data, different mathematical concepts are used during the analysis. Likewise, different mathematical tools are used for different kinds of variation in the data. What I mean is that the data can be continuous and predictable, random or chaotic, or discrete and unrelated.

But no matter what its nature, the fundamentals remain the same; they only take different forms in different situations.
