Nov 30, 2008

Customer Modeling

Customer Modeling is a way of understanding the psychology of people who go to the market. Yes, I have been talking about it for quite some time. This time, let's try to be a bit more realistic and start by questioning ourselves. As a case study, let's take the example of a Shopping Mall.

What are the reasons that we find people in the Shopping Mall?
  1. Just visiting to pass time and hang out, a kind of social gathering
  2. Visiting for the first time
  3. Searching for gifts to buy
  4. Family trip
  5. Tourist visit
  6. Visiting with a particular motive to buy something  
These are some of the reasons why you find people in the Shopping Mall, assuming there is no movie complex in it. :)

The reason for listing these factors is to understand the categories of customers who visit the mall.

Now, among these six groups of people, which one is the most valuable customer, meaning the most likely to spend money? I personally see a few: numbers 4, 6 and 3. These are the likely customers because they are in need of something.

A family on a trip is more likely to spend money than an individual roaming around. That is a very sociological thing. A tourist tends to buy souvenirs but may not spend much. Individuals searching for a gift or with a particular target do spend, but it is difficult to change their minds to buy something else, as their choice is already predetermined.

Friends hanging around may be the least promising of all, due to the fact that they are casual passers-by with a different agenda in mind. People visiting for the first time really do spend, as they don't have a clue what can be found and are not psychologically prepared.

This is the first step in categorising your customers and understanding their value. Next would be an in-depth analysis of the customers who actually bought something. This needs more time and data. A few data points that are necessary to understand an individual customer are:
  1. The financial status (income) of the customer
  2. The loyalty of the customer
  3. The spending of the customer
  4. The categories of things bought by the customer
  5. The age and location of the customer.
These are a few details about the customer that can be gathered through good survey methods. After the data is received, an in-depth analysis has to be carried out, which includes generating a score card for each customer.

A Score Card is a number that can be used to measure the value of a customer. From this analysis, changes in policy or other amendments can then be carried out as the case study recommends.
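To make the idea concrete, here is a minimal Python sketch of a points-based score card built from the survey fields listed above. The class `Customer`, the function `score_card`, and every band and point value are hypothetical, chosen only for illustration; age and location are left out to keep it short. A real score card would derive its points from the collected data rather than fix them by hand.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    monthly_income: float     # financial status
    years_as_customer: float  # loyalty
    monthly_spend: float      # spending
    categories_bought: int    # variety of things bought


def score_card(c: Customer) -> int:
    """Add up points per attribute; a higher total marks a more valuable customer (max 100)."""
    points = 0
    # Financial status: hypothetical income bands.
    if c.monthly_income >= 5000:
        points += 30
    elif c.monthly_income >= 2000:
        points += 15
    else:
        points += 5
    # Loyalty: 5 points per full year, capped at 25.
    points += min(int(c.years_as_customer) * 5, 25)
    # Spending: hypothetical spend bands.
    if c.monthly_spend >= 500:
        points += 30
    elif c.monthly_spend >= 100:
        points += 15
    # Breadth of purchases: 3 points per category, capped at 15.
    points += min(c.categories_bought * 3, 15)
    return points


loyal = Customer(monthly_income=6000, years_as_customer=3, monthly_spend=600, categories_bought=4)
casual = Customer(monthly_income=1000, years_as_customer=0, monthly_spend=50, categories_bought=1)
print(score_card(loyal), score_card(casual))  # 87 8
```

The two example customers land at 87 and 8 points out of 100, the kind of ranking a policy change could then be based on.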
Some links that explain these methods more clearly are:

Nov 29, 2008

Maslow's Hierarchy

Yes, one of the fundamental laws of everything: Maslow's Hierarchy. It hardly needs to be stated, because everyone is aware of it. The needs that must be fulfilled in this world are what make everything possible and keep it running. They are the reason the world is moving forward. Need is the engine of life.

Analysing the trends and everything else might show us the relationships between the variables and their nature. But ultimately, when we ask why, we have to face Maslow's Hierarchy. All we need is need.

Supply depends on demand, and demand depends on need. So it is need in the end. In marketing, if we can just understand the need of the consumer, that is all we need. We call it a need-based approach, and it is the only thing that can work. Example:

As soon as the Internet became popular, people needed a damn good search engine that could simply do the searching for them. Many tried, but Google got it just right. This is one example, and there are more.

But there is one other point that I wanted to make: we know people buy when they need and when they can afford. But how is it possible to make them need something they never thought was necessary? Hmmm. That is where marketing analytics comes into play.

Just how is it possible, and where should we target? Hmmm. Let's think. Remember one thing: people never stop needing.

Nov 27, 2008

Modeling Organisational Behaviour

How often have you been able to predict your friend's behaviour? Hmm. Many times, haven't you? That is because you have spent years and years with your friend, so you know how your friend is going to react to a situation, unless the situation is something very, very new. But there are certain situations when you are amazed by your friend's behaviour and say, "Ohh, I didn't know you thought that way!"

So, again, is it possible to predict the behaviour of an organisation? Very roughly, yes. This is what social scientists try to explore, and it requires very careful analytical observation. You also need to analyse historical data to better understand the organisation. Predicting an organisation is somewhat different from predicting an individual.

An organisation is a group of people connected by certain norms and values. The organisation exists for the sake of its goal and philosophy. It puts its goal first, because that is what it exists for. Therefore, we often hear of sacrifices within a group for the sake of the group's existence. That might sound odd, but it is the core of being in a group. It is an inborn social structure that cannot be broken.

So, with this basic understanding, how a group works and pursues its goal needs to be analysed. There will be lots of data and historical events that are significant for understanding the nature of the organisation.

With careful analysis and better approximation, I feel it is possible to predict the actions of an organisation. Some of the fields that could benefit from it are:

1. Modeling the Consumers of a Market
2. Modeling packs of certain Animal Species
3. Modeling the behaviour of Criminals
4. Modeling the behaviour of gangs, maybe a Terrorist Group, and so on.

In this way, it is a field that can be tremendously helpful in understanding organisations and learning ways to optimise them.

Richard Feynman

A child in a man. That is the best description I can give. Richard Feynman is one of the most celebrated physicists and one of the very few people who have inspired me to Think. His way of life was as simple as any. With his inquisitive nature and his way of tackling a problem and solving it with impeccable accuracy, he was seen as a magician rather than a scientist.

He was also considered to have one of the most unique minds of all time, a mind free from the bindings of society. There is a book called "Surely You're Joking, Mr. Feynman!", a collection of stories from his life. It is very interesting to go into one of these great minds and understand it.

But what is so interesting about his life is the fight he fought against society's way of thinking. That is what I see as the most important aspect of his life. The reason I am writing about him is to introduce this great mind, which must not go unnoticed. Instead of me speaking about him and his theories, I prefer that readers explore them on their own.

Hope you enjoy his stories. They are something that any young mind must go through once.


Nov 26, 2008

Bruce Lee

Yes. Bruce Lee. Remember all the punches and kicks in his all-time hit movies. But why am I talking about him? Am I going to talk about his movies? No, I am not.

I am going to talk about his philosophies, which were projected so many times through his fighting style. By the way, he also wrote a book. Many may not know it, but he had a unique way of thinking about life and his art of fighting. Let me put forward his thinking in my own words.

He believed in efficiency and rejected any redundancy that affected performance. He believed that there is no one correct style. Everyone has their own unique way and must harness those skills rather than waste time doing something that is not efficient at all.

There are many fighting styles carrying their own norms and values. Every style starts by assuming something. But slowly those assumptions are taken as gospel truth, and people turn them into rules that cannot be broken. Following a style blindfolded, without understanding its key meaning, is not right. It's wrong.

Everyone must be able to express themselves in their own way, without any binding. That is what free thinking means: not basing yourself on any one philosophy. Also, people's habit of categorising themselves strains their thinking, which should not be the case.

In fighting, all of us human beings have 2 hands and 2 legs, which leaves only one way of fighting and no other. He believed that in a fight you must be able to use every part of your body as efficiently as you can, and not be restricted by rules or fancy styles. For example, trying to fight like a monkey is not efficient, because you are not a monkey and you have your own way. Once, in an interview, he commented that he could make complicated moves and hit, but those complicated moves add nothing to the effect of the punch and may instead reduce its total potential.

So, what I am trying to say is: forget all the rules and regulations, think from scratch, and be free. If you come to the same conclusion as everyone else, then so be it, but there is a good chance you may come up with something new.

Make yourself like water, free and flowing, taking the shape of any vessel you are put in. Be free in your thinking; that is what I mean.

Follow the link for further understanding of Bruce Lee's philosophies.

Basic Hypothesis Testing

Hypothesis testing is one of the most interesting and important statistical tools for suggesting a decision or reaching a conclusion about an experiment. Where is it used? In all kinds of fields, from scientific studies to economic analyses to business. So what is this testing, anyway?

I will skip the theory, which you will find in any book or website, and try explaining from the point of view of a problem rather than a solution. That way the understanding becomes better, at least that is what I think. OK, so let's start.

Let's suppose an analyst proposed a marketing campaign that was going to increase sales to 300 per month in each outlet. After the campaign was carried out, the result was as follows:

Number of outlet surveyed (n) = 50
Average number of sales (X) = 295 
Standard Deviation (S.D) = 20

Now, how would you react to the analyst? Was he correct or was he wrong? Hmmm. That's funny, because his hypothesis was 300 but the average was 295: very close, but not 300.

So there are two options: either he is correct or he is wrong. But in statistics, and in these kinds of calculations, it is very difficult to prove either. Instead, we ask how likely it would be to see an average this far from 300 if he were right, and we compare that probability with a threshold fixed in advance. We call this threshold the significance level.

There are many types of tests, depending on the characteristics of the variable and the hypothesis: for example, tests of a proportion, of a difference between means, of goodness of fit, etc.

So, do you get the problem? It is about testing the hypothesis and giving a verdict. In the above example the average looks very close to 300, but we must check whether the difference is small enough to be explained by chance before siding with the analyst.

Now, follow the testing method below.

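1. State the Null and Alternative hypotheses

In this example:
        Null Hypothesis (H0): u is equal to 300
        Alternative Hypothesis (H1): u is not equal to 300
  where u is the true mean monthly sales per outlet, and 300 is the analyst's hypothesised value.

      This is known as a two-tailed test, because a difference in either direction counts against the Null Hypothesis.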

2. Choose the significance level

Usually the significance level is chosen as 0.01, 0.05 or 0.10. The smaller the value, the stricter the test, that is, the smaller the chance of rejecting a Null Hypothesis that is actually true.
Let's choose 0.05.

3. Choose the test method

There are many testing methods available in theory. Here we choose the simple one-sample t-test. It measures the difference between the observed and hypothesised mean, in units of the standard error.

4. Run the test either manually or in software

Let's calculate:

Standard Error (S.E) = S.D / sqrt(n) = 20 / sqrt(50) = 2.83
DF (degrees of freedom) = n - 1 = 49
t = (X - u) / S.E = (295 - 300) / 2.83 = -1.77, where X is the observed mean and u = 300 is the hypothesised mean.

5. Calculate the P-Value

Now we calculate the P-value. You can use a t-distribution calculator or a table to find it. The P-value is the probability that a t-score with 49 degrees of freedom is less than -1.77 or greater than 1.77. We take both tails because we are doing a two-tailed test.

P(t<-1.77) = 0.04 and P(t>1.77) = 0.04. 
Thus P value = 0.04 + 0.04 = 0.08

6. Interpret the result  

Since the P-value is greater than our significance level, we cannot reject the Null Hypothesis.

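So the verdict is that the data do not contradict the Analyst: at the 0.05 significance level, an average of 295 is consistent with a true mean of 300 (P-value 0.08). This is not proof that he was right, only that we lack the evidence to say he was wrong.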
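The calculation in steps 3 to 6 can be reproduced in a few lines of Python. This is a sketch using only the standard library: the helper names are hypothetical, and instead of a t-table the two-tailed p-value is obtained by numerically integrating the Student's t density with Simpson's rule, which is accurate enough for illustration. With SciPy installed, `2 * scipy.stats.t.sf(abs(t), df)` would give the same p-value directly.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """P(T < -|t|) + P(T > |t|), via Simpson's rule on the density from 0 to |t|."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # The density is symmetric and integrates to 1, so each tail is 0.5 - area.
    return 1 - 2 * area_zero_to_t


def one_sample_t_test(mean: float, sd: float, n: int, hypothesised_mean: float):
    se = sd / math.sqrt(n)
    t = (mean - hypothesised_mean) / se
    df = n - 1
    return se, t, df, two_tailed_p(t, df)


se, t, df, p = one_sample_t_test(mean=295, sd=20, n=50, hypothesised_mean=300)
alpha = 0.05
print(f"S.E. = {se:.2f}, t = {t:.2f}, df = {df}, p = {p:.3f}")
print("Reject H0" if p < alpha else "Cannot reject H0")
```

Run on the campaign figures, it should report S.E. = 2.83, t = -1.77, df = 49 and a p-value of about 0.08, so H0 is not rejected at 0.05.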

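There are two kinds of error that can happen in this process.

First, we conclude that the hypothesis is right when it is actually wrong.
Second, we conclude that the hypothesis is wrong when it is actually right.

These are called Type II and Type I errors respectively: a Type I error rejects a Null Hypothesis that is true, and a Type II error fails to reject one that is false.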
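The significance level chosen in step 2 is the Type I error rate we agree to live with. The following Python sketch assumes normally distributed outlet sales with the example's standard deviation of 20, and uses the two-tailed 5% critical value 2.01 for 49 degrees of freedom (read from a t-table). It simulates many surveys of 50 outlets and counts how often the test rejects the analyst's claim of 300. The function names and the alternative true mean of 290 are hypothetical.

```python
import math
import random

N = 50                     # outlets surveyed, as in the example
T_CRITICAL = 2.01          # two-tailed 5% critical t for N - 1 = 49 degrees of freedom (t-table)
HYPOTHESISED_MEAN = 300.0


def t_statistic(sample):
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - HYPOTHESISED_MEAN) / (sd / math.sqrt(n))


def rejection_rate(true_mean, sd=20.0, trials=4000, seed=1):
    """Share of simulated surveys in which the test rejects H0: mean = 300."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(N)]
        if abs(t_statistic(sample)) > T_CRITICAL:
            rejections += 1
    return rejections / trials


type_i = rejection_rate(true_mean=300.0)       # H0 true: every rejection is a Type I error
type_ii = 1 - rejection_rate(true_mean=290.0)  # H0 false: every non-rejection is a Type II error
print(f"Type I error rate ~ {type_i:.3f}, Type II error rate ~ {type_ii:.3f}")
```

When the true mean really is 300, the rejection rate should come out close to 0.05. When the true mean is 290, most surveys reject H0, and the small share that do not are Type II errors.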

Well, the above analysis was not meant to teach how to carry out a hypothesis test, but to clarify the point of carrying out the test in the first place.

All the other statistical tests are used so that we have a quantitative view of the problem and make decisions based on empirical values, as this gives us more confidence in our decisions. But, as I always say, we need to use more than just numbers; we also need intuition.

Lastly, what would it mean to test the hypothesis at a zero level of confidence? It would mean that the observed mean must be exactly equal to the hypothesised value for the hypothesis to be accepted, with no other option. It becomes a purely qualitative, yes-or-no judgement.
Hope it was helpful.

Nov 25, 2008

Complexity and Simplicity

In analytics, or any other field, there is always a problem: a particular thing that you want to achieve. Maybe you want your sales to rise by 50%, or by exactly 2.5%, or you may want to build a house on the ocean! Daunting as it looks, that is the nature of the problem.

How do you go about solving it? I mean any problem or analysis. How do you do it? Obviously, you have to design a solution. I take a solution to be a design that has the potential to solve a problem. When you start designing a solution, you start by constructing a foundation on which you are going to build your reasoning. Your whole reasoning will rest on that foundation, that structure. If it is stable, your analysis or the whole system is stable; otherwise it will fall, and that is inevitable.

Also, remember that the solution you provide is valid only for that particular problem and not for others, because as soon as the situation changes, so does the validity of your solution. Remember it.

The other night I had a discussion with my friend on the complexity and simplicity of a design or solution. We both came to the conclusion that a particular design is a special-purpose system and NOT a general-purpose one.

As for the level of complexity and simplicity: the more complex the system, the more rigid it becomes, and it won't be able to handle small disturbances. But when the system is simple, it has the capability to grow and has more flexibility in it.

But this is not proof that the solution must always be simple. The conclusion is that the analyst or designer must keep in mind that complexity carries a built-in error of approximation, which must be considered. Complex mathematical solutions are based mainly on mathematical reality, which may not always hold when imposed on the physical model.



Nov 24, 2008

Rolling Stone Vs Stationary Stone

I am a rolling stone. I roll round and round in search of answers, to satisfy my inner needs and to find the way that is best for me. But, as the old proverb goes, a rolling stone gathers no moss. Yes, it doesn't.

Then I look at all the others who have gathered it. I see no reason for it. Then I tell myself: a rolling stone may not gather any moss, but in the end it will be something beyond perfection. It may not gather moss, but it will transform itself into something else. I believe I will be efficient.

That transformation is what I am searching for, not attachment to anything else. Just a free transformation, like energy, like everything in this universe. A living organism. Yes, a transformation.

Intellectual Value

My friend Bhupendra Khanal has written a very important post regarding Intellectual Value. The main point of the post is that the Internet has degraded the quality of information. Further, it has put down the serious researcher, increased plagiarism, and so forth.

And I say, yes, it has. But I want to argue that it is not due to the Internet alone but to the changing society, which in turn is affected by the Internet and other technology.

Information plays a key role in a changing society. It happened when radio and television were invented, and it will happen with the Internet. How often do you hear quality news on the news channels these days?....... Was it easy to answer, or did you have to think about it? Rarely, isn't it?

So you think the Internet had something to do with it? The whole media is filled with JUNK, and it has been like this for a very long time. Media owners have always used it for their own purposes, and very rarely do we find an unbiased story. Propaganda is spread and every issue is twisted for their own purposes, and it is very hard to go against them. Media has been influencing society's state of mind for a very long time, even before the Internet. At least now we have the right to go against them and say, "YOU ARE WRONG." At least now we are all playing on the same level, even among all the other junk that exists.

Further, let me clarify one other point. If there are two articles on the same topic, one from a PhD graduate and the other from an ordinary blogger, which one do you prefer? The PhD's, isn't it? So was that decision affected by the media? I think it is very normal to believe in the higher standard tagged by a degree. I am not against it, but I think we have a right to argue with it, no matter how long the research took. We still have the right to question, and to have our questions answered by the higher authority.

It is a long battle that started with the very evolution of mankind: laymen versus the intellectual group. Remember Galileo, who said the earth is not the center? His life was tormented by the higher community because he went against it without the credentials to do so. Remember Einstein, who had to struggle with the whole physics establishment to prove that his findings were important. Remember Faraday, a bookbinder who found the connection between electricity and magnetism. And also Mendel, whose findings were accepted only after his death.

Maybe there are many more whose work is still under the desk, suppressed to prevent doubts about the higher society that is supposedly responsible for creating information. But I am not saying that the higher society has not created anything; they have actually contributed a lot to society. On the other hand, they have always resisted findings coming from a minority. It is a long, long battle.

For example, if I discover something in the near future, how am I going to justify it? I can't. I can only explain my findings and present them logically. But it might take years for others to accept them. If it came from the group of intellectuals, however, it would be imposed, and very few could argue against it.

Therefore, I completely respect the value and standard of information. Everyone today can at least question it and put forward their own thoughts on it. Today we can peek into the minds of people from other societies, which makes us more connected. Today our perspective has changed, and it is due to free information. I would be so happy if all researchers and professors started writing their own blogs or knols so that everyone could understand their work better, instead of making it accessible to only a few individuals.

Lastly, information is itself a living entity, which exists as long as it is important. Anything junk will remain junk and slowly perish. It is the age of the Internet, so let's work together to make it better.

Today's world is governed by technology created in universities by researchers. We have the right to understand it and the duty to express it. Imagine every professor and researcher across the whole world sharing their findings: wouldn't education itself become FREE? One such example is MIT's OpenCourseWare.

"With great powers and understanding comes great responsibility." 

If you think something wrong is going on and you feel you can stop it, ACT now.


Nov 22, 2008

RGB of Marketing Analytics

Your analysis report is your product, just like any other commodity available in the market. So you can actually sell it, but how are you going to market it? It is a big question that I have been asking myself for some time. How in God's name would you do it?

In search of an answer, I have been going through some marketing books, and I stumbled upon one called "Everything They Have told you About Marketing is Wrong" by Ron Shevlin.

Well, it is a very unconventional book, hitting the target rather than beating around the bush. It starts by debunking all the rules, beginning with the 4 P's of marketing, which it replaces with 3 P's.

I have my own thoughts, and I call them the RGB™ of Marketing. Why RGB? Well, RGB is the fundamental basis of all colors, so naturally the RGB of Marketing means fundamentals, not complexity.

Well, let me put down what I have finally come up with on how to start marketing your analysis, or making one in the first place. Let me go step by step:

1. Forget about it.

Forget the fact that you have to sell, or that you are making a product in the first place. Why? Trust me.

2. Concentrate on the problem. 

Try to figure out the problem you are solving from top to bottom, and solve it purely for the sake of solving it.

3. Respect it.

Once your product solves the problem, respect it and wrap it properly, but don't overdo it. Keep it simple.

4. Remember it.

Remember it. Remember what?? Good, so you have forgotten it. Well, remember the very thing I told you to forget in the first place. :-) Remember that you need to demonstrate your product in the right market. For that, learn the language people understand.

5. Give your best.

No comment on this step. It is a must for any marketer or anyone. 

So, how did I get there? Yup, it sounds very, very simple, but actually it isn't. There are human factors and other unknown variables that I haven't considered. I don't know if all that is even necessary, because ultimately your product speaks louder than you do. If it can't perform, then it's of no use. So PISS, meaning Put It Simple, Stupid.

So much for my own rules and regulations. If it were so EASY, everyone would have sold by now.

Well, if you happen to sell things by following the above rules, don't forget to mention me, and of course I would be glad if you dropped some part of it in my account. :-) You know what I mean.

By the way, I wasn't marketing the above book, so don't shout at me; you don't have to read it.

Nov 21, 2008

Information 2.0

This is the age of information. No doubt. But with the increasing size of data and its redundancy, it is easy to get lost in the vast ocean of data. How often have you sat in front of your computer to search for a specific topic and found yourself wandering for hours, losing your focus? Very often, isn't it?

Google made it easy to find data by making the search engine efficient. Today, people have become used to search engines. Somehow we have all become too lazy to even remember a site's name, because we have confidence in the search engine. So what is the NEXT step in this information revolution? Analysis??

Well, I am not trying to brag about my profession, but surely when the amount of information is so large, you need someone to analyse it and... and what?? And report it to the end user.

Ultimately, it is defined information that we need. Thus, it is not surprising to find many reporting tools available here and there. But how often do you find a web reporting portal, and one based on the open source philosophy at that? One such tool that I have come across is OpenI.

Reporting tools are based on different database systems or architectures, depending on the nature and size of the data. Common database systems are MySQL, Oracle, PostgreSQL, SQL, etc. But reporting is not done directly from the database server; a few layers are needed between them so that the data is processed into the required form.

In OpenI, this layer is OLAP on top of an RDBMS, deployed on any J2EE server. Technically, OpenI is an integration of powerful Java-based tools such as JPivot, JFreeChart, Mondrian and JasperReports.

It provides an easy way of delivering processed OLAP data to the end user in the form of graphs and charts, with the click of a button and a query processor. It is fun to use, especially since it is web based. Soon information processors will be flooding the market, and this is inevitable.
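As a rough picture of what the processing layer between the database and the report does, the Python sketch below rolls raw sales rows up into totals by whichever dimensions a report asks for. It does not use OpenI or any of its components (in OpenI, Mondrian is the OLAP engine and answers MDX queries); the rows, field names and the `roll_up` function are made up for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows as they might come out of the RDBMS.
SALES = [
    {"region": "East", "month": "2008-10", "product": "Phone", "units": 120},
    {"region": "East", "month": "2008-11", "product": "Phone", "units": 90},
    {"region": "East", "month": "2008-11", "product": "Laptop", "units": 30},
    {"region": "West", "month": "2008-10", "product": "Phone", "units": 70},
    {"region": "West", "month": "2008-11", "product": "Laptop", "units": 45},
]


def roll_up(rows, dimensions, measure="units"):
    """Total the measure for every combination of the given dimensions (one 'view' of a cube)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


print(roll_up(SALES, ["region"]))           # totals per region, ready for a chart
print(roll_up(SALES, ["region", "month"]))  # drill down one level
```

A chart or report then only has to draw these small totals instead of querying every raw row.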

With the demand for intelligent data in the market, it wouldn't be wrong to say that this is the age of Information 2.0, as my friend here has mentioned.

Beauty Of Knowing.

As children, or as conscious beings, we all look in amazement at the life we live. With questions filling our thoughts, we try to answer each one, be it a scientist, an ardent thinker, or any ordinary being. We are all conscious of the fact that we exist, and that it is only for some time.

Within this time, we struggle to understand ourselves and the world around us. We try hard to unravel each mystery. But as we go along, a few stop asking, a few stop ignoring, and a few just become silent.

But during all this questioning and answering, we can't ignore the fact that knowing brings us satisfaction. Knowing things makes us happy and makes us feel good, because suddenly we are out of a state of confusion. This we call enlightenment.

That instant when we discover something, for those very few seconds of our life, we become so clear in our thoughts. We feel blessed. Yes, we do, and such is the beauty of knowing, which can never be replaced.


Nov 19, 2008

Add as a Friend

Social structure is based on the connections between people, enhanced by the technology around them. From saying "hi" to never-ending relationships, the connections between people are amazing. Once a connection is made, it takes ages for it to break, or maybe forever.

Being in a group and having a circle of friends is a unique feature of human beings that is studied in any sociology textbook. The way a group forms is quite interesting to look at. Don't criticize me for being OVER scientific, but mathematicians have somehow managed to come up with certain approximate ways of describing social groups in numbers.

With the rise of online networking groups, it has become easier to analyse groups and make suggestions about them, NUMERICALLY. Also, the rise of REALITY SHOWS like BIGG BOSS and others has given social analysts a platform to analyse social bonding and structure.
Let me narrate a short observation made on the Discovery Channel.

It was about child psychology. A small girl, about one or two years old, is shopping in a mall when she encounters another girl of the same age. They stare for a moment but do not talk to each other.

Soon they get accustomed to each other's existence and start to mingle. After a while they are holding hands and running around the shopping complex as if they were meant to be together.

After a while, another girl of the same age enters the scene with her mother. The first two girls encounter the new girl. They all stare at each other but do not respond. The first girls hold each other's hands and run away playfully. The new girl then clutches her mother's hand and stares at them running away, as if left out. This small social experiment was actually carried out by the Discovery Channel. This small, unconscious behaviour says a lot about us.

This unconscious behaviour is natural not only in children but in all human beings. It is the foundation on which social structures are constructed.

Finally, what matters is the bond and the strength of the social structure. Today many online networks exist, and we still have time to check their strengths and weaknesses.

Working with Regression Analysis

When all the variables are ready to be fed into some software to be analysed, and the formula for the relationships is spat out, it is always tempting to believe that the answer is correct. But is it? We need human intelligence and analysis to come to the right result.

The same thing happens when you build linear models through regression. As I have written earlier, the choice of variables is very important. So how do we ensure that the resulting choices are justified? It is always easy to reach a result but harder to justify it.

Let's look at some examples:

The above graph is the graph of a revolving value (Revolving Balance). This variable is very important for bankers, so it will be interesting for them to know the value of the Revolving Balance beforehand.

There are many variables to be included in the analysis and modeling. By following the rules and methods mentioned in my previous blog posts, I have constructed an initial model and used it to predict the values. Let's look at them below.

Remarkable!! The predicted line, the red one, seems to follow the real line. But this analysis in itself is insufficient to prove the validity of the model. We need to look further down the line. Just for the record, the R-square value for this model was around 0.98!

Graph of Error Versus the Revolving Value

Here, we are trying to look at the relationship between the error and the real value. But why??

Error is by nature defined to be RANDOM, with zero correlation with any variable. Therefore, the error must not be predictable. In this figure, we see that the error oscillates around zero, with a bias toward the negative side. Hence we have drawn a line to best fit the points. Ideally, this line should coincide with the zero line. Here it seems to have deviated to about -30. This must be reduced to get a better result. This analysis of error with time is necessary, as errors are supposed to be RANDOM.

Graph of Error Versus Time

The above graph is almost the same as the previous one, except that the error is related to the real value. It seems the model was not able to predict the higher values accurately. Something must be done here. But what?? Maybe reduce the variables, or validate the raw data to confirm that there is no error in it. We could also ignore this particular value. But BE CAREFUL, it might be significant.

This kind of analysis tells you a lot about the model that you are building.

Just to show that there are many ways to analyse the error, I am posting another graph.

OK, don't worry about the figure. Actually it's quite simple: the linear graph against time has been folded to make a circle.

In an ideal situation, the plot would be a perfect circle lying on the zero line. For a good, realistic model, the figure should still be a circle, but with a certain constant thickness. In this case it is neither.
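The post does not say exactly how the time axis was folded, so here is one possible construction, as a sketch under assumptions: time step t is placed at the angle 2π·(t mod period)/period, and its distance from the centre is base_radius + error, so the circle of radius base_radius stands for the zero line. The function name, the period (for example 12 for monthly data) and base_radius are hypothetical; base_radius should be larger than the most negative error so that no radius goes negative.

```python
import math

# Sketch: the folding scheme, parameter names and defaults are assumptions.
def fold_errors(errors, period, base_radius=1.0):
    """Place a time-ordered error series on a circle.

    Step t goes to angle 2*pi*(t mod period)/period, at distance
    base_radius + error from the centre; the circle of radius base_radius
    therefore plays the role of the zero line.
    """
    points = []
    for t, err in enumerate(errors):
        angle = 2 * math.pi * (t % period) / period
        radius = base_radius + err
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


# Hypothetical errors for two "laps" of a period of 4 time steps.
points = fold_errors([0.0, 0.1, -0.1, 0.0, 0.05, 0.0, -0.05, 0.0], period=4)
```

With perfectly zero errors every point lands on the base circle; with a good model the points form a ring of roughly constant thickness around it, matching the description above.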

One last thing: an important check of the randomness of your error is to plot a histogram of the error. The plot should match the Normal Distribution curve.

The curve above shows that the mean of the error is ZERO, which is good. But the bias toward the negative side still has to be taken care of.
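The histogram check can be backed up with two numbers: the mean of the errors, which should be close to zero, and their skewness, which measures how lopsided the distribution is (zero for a symmetric one such as the Normal curve). A minimal Python sketch using the population formulas; the helper name and sample errors are hypothetical. With errors defined as real minus predicted, a long tail of negative errors gives a negative skewness even when the mean is exactly zero.

```python
# Sketch: helper name and sample errors are hypothetical.
def error_summary(errors):
    """Return the mean and the population skewness of a list of errors."""
    n = len(errors)
    mean = sum(errors) / n
    m2 = sum((e - mean) ** 2 for e in errors) / n  # variance
    m3 = sum((e - mean) ** 3 for e in errors) / n  # third central moment
    # Skewness is 0 for a symmetric distribution; guard against zero variance.
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return mean, skewness


# Mean exactly zero, but one large negative error drags the tail to the left.
mean, skew = error_summary([-9.0, 1.0, 2.0, 3.0, 3.0])
print(mean, skew < 0)  # prints 0.0 True
```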

In this way, we can analyse the model being constructed and keep tinkering until a better one is obtained.

Nov 18, 2008

Information Theory

This is a highly theoretical problem. What do you call "information"? It is what lets you know that you have been informed. But what does that mean? Analysts are always trying to extract information from data, using mathematical functions and probabilistic approaches, and by looking at trends or clusters in the data.

But what is information, and how can you measure its value? I recently posted an article on modeling in which I talked about the Information Value. I may have glossed over it by saying that it involves a mathematically rigorous process and needs an in-depth understanding of Information Theory. Well, let me try to explain it in as few words as possible.

Information is something you don't know. It is something you don't expect to happen. Take your future, for example: if someone tells you your future, that is information. Is that confusing?

Actually, it's quite simple. If someone tells you something you already know, it is not information, because it has no value to you: you already know it.

That is why, when working with a collection of data and running the correlation process, we try to find the best possible set of variables that can predict the outcome variable.

Let's take another example:

Let's say you know that your train is scheduled to leave at 12 midnight, because it says so on the ticket.

Now, if someone tells you that your departure time is midnight... hmm, is that information?

But if someone comes to you and says that your train has been rescheduled because of a delay, then that is information, and useful information. Anything that is not useful is not information (at least in the sense I have explained).

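Mathematically, the information carried by an event grows as the probability of the event shrinks. In Information Theory it is measured as I(E) = log(1/p(E)) = -log p(E), where p(E) is the probability of the event E.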

If p(E) is zero, the event contains INFINITE information.
If p(E) is 1, the largest possible value, it contains ZERO information.
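A minimal Python sketch of this measure, in bits (logarithm base 2); the helper name is hypothetical. It reproduces the two limiting cases above: probability 1 gives zero information, and probability 0 gives infinite information.

```python
import math

# Sketch: the helper name is hypothetical.
def self_information(p):
    """Information content, in bits, of an event with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if p == 0.0:
        return math.inf  # an impossible event: unlimited information
    return math.log2(1.0 / p)


for p in (1.0, 0.5, 0.125):
    print(p, self_information(p))
# prints "1.0 0.0", then "0.5 1.0", then "0.125 3.0"
```

In the train example, the departure time printed on the ticket has a probability close to 1 and so carries almost no information, while an unexpected rescheduling has a low probability and carries much more.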

I hope that was simple enough.

Remember that the information I am talking about is not a collection of data. It is something that has value because it tells you something you did not already know.

Nov 17, 2008

Variable Analysis

When analysing the variables and their relationship with the outcome variable, we can choose either bivariate or multivariate analysis. Both have their own advantages and disadvantages.

Bivariate analysis considers the effect of a single variable on the outcome variable, thus ignoring the effect of the others. Depending on the correlation factor, a variable is either chosen for further analysis or rejected. But rejecting a variable just on the basis of its correlation factor is not a wise decision.

Multivariate analysis, on the other hand, checks the relationship between the outcome variable and all the independent variables together. This gives a clearer picture of how all the variables affect the outcome (dependent) variable. The problem in this case is multidimensional, because more than two variables are involved.

In the process of modeling, it is very important to select the right variables so that the model can predict with high accuracy. Running both analyses step by step will reveal the more appropriate choices.

But the choices should not always be made on the basis of numerical values or percentages. Sometimes it is wiser to include a variable that has a low score but may have an effect in the future or under different conditions.

Methods of choosing the variables

There are many statistical tests available for choosing the relevant variables. A few important ones are as follows:


1. Chi-squared

This method measures the association between a predictive variable and the log of the odds of a bad outcome. It measures the predictive power of the variable, meaning how important the variable can be for building a predictive model of the outcome variable.
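The post does not give the formula it used. One common chi-squared score for a binned predictor is the Pearson chi-squared statistic on a table of counts, with one row per bin of the variable and one column per outcome (good and bad). Below is a minimal sketch of that statistic; the helper name and sample tables are hypothetical, and it is not necessarily the exact variant the original analysis used. Every row and column must have a non-zero total. Larger values mean the outcome depends more strongly on the bin.

```python
# Sketch: helper name and sample tables are hypothetical.
def chi_squared(table):
    """Pearson chi-squared statistic for a table of observed counts.

    Each row is a bin of the predictive variable; each column is an
    outcome class (for example good and bad).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect if the bin had no effect on the outcome.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


print(chi_squared([[10, 10], [20, 20]]))  # same good/bad mix in every bin: 0.0
print(chi_squared([[10, 0], [0, 10]]))    # bins separate the outcomes: 20.0
```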

2. Spearman correlation

This method gives the correlation between the rankings of the predictive variable and the outcome variable, not their actual values. In this analysis, the relationship does not have to be linear; it only has to be monotonic, either increasing or decreasing.
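To make the "ranks, not values" idea concrete, here is a minimal Python sketch of Spearman correlation as the Pearson correlation of the ranks, with tied values given the average of their ranks. The helper names and sample data are hypothetical.

```python
# Sketch: helper names and sample data are hypothetical.
def ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


x = [1, 2, 3, 4]
y = [10, 100, 1000, 10000]  # hypothetical: grows much faster than linearly
print(round(spearman(x, y), 2), round(pearson(x, y), 2))  # prints 1.0 0.82
```

The relationship in the example is perfectly monotonic but far from linear, so the rank correlation is 1 while the ordinary correlation of the raw values is only about 0.82.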

3. Information Value

The Information Value is the most interesting of the three, because it measures the amount of information a variable can contribute to the model. It is calculated from how differently the good and bad outcomes are distributed across the values (or bins) of the variable, and it is based on Information Theory. The score is never negative and has no fixed upper limit.
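Here is a minimal Python sketch of the standard credit-scoring formula, assuming the variable has already been binned: for each bin, the weight of evidence is WoE = ln(%good / %bad), and IV = Σ (%good - %bad) × WoE. The helper name, the bins and the counts are hypothetical, and real implementations often add a small adjustment for bins with zero goods or bads instead of raising an error as this sketch does.

```python
import math

# Sketch: helper name, bins and counts are hypothetical.
def information_value(goods, bads):
    """Information Value of a variable that has already been binned.

    goods[i] and bads[i] are the numbers of good and bad outcomes in bin i.
    Every bin needs at least one of each, or the logarithm is undefined.
    """
    total_good = sum(goods)
    total_bad = sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        if g == 0 or b == 0:
            raise ValueError("each bin needs at least one good and one bad")
        dist_good = g / total_good
        dist_bad = b / total_bad
        woe = math.log(dist_good / dist_bad)  # weight of evidence of the bin
        iv += (dist_good - dist_bad) * woe
    return iv


print(information_value([10, 20], [5, 10]))  # same mix in every bin: 0.0
print(round(information_value([90, 10], [10, 90]), 2))  # strong separation: 3.52
```

Each term is a product of two factors with the same sign, which is why the score can never be negative.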

Example:

The table above is just an example of how the results might look. It shows that Variable 1 has the highest information value, but a lower chi-squared value than Variable 2 and a negative correlation.

Variable 3, on the other hand, has the lowest information value as well as the lowest chi-squared and Spearman correlation, so it can be removed from the analysis, UNLESS you consider it to be of some value from a business perspective.

After the above process is carried out, each variable can be given a new score based on these three scores. All the variables can then be re-ranked using this new score, which gives a new perspective for choosing variables.
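The post does not say how the three scores are combined, so the rule below is purely an assumption for illustration: rank the variables on each score separately (Spearman by absolute value) and order them by the average of their three ranks. The helper names and scores are hypothetical; the scores are invented only to mirror the shape of the example table (Variable 1 has the highest IV, Variable 2 the highest chi-squared, Variable 3 is weakest everywhere).

```python
# Sketch: the averaging rule, helper names and scores are all hypothetical;
# the post does not say how the three scores are combined.
def rank_desc(scores):
    """Rank 1 goes to the largest score; ties keep their original order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, i in enumerate(order):
        ranks[i] = position + 1
    return ranks


def combined_ranking(names, iv, chi2, rho):
    """Order variables by the average of their IV, chi-squared and Spearman ranks.

    Spearman is ranked by absolute value, because a strong negative
    correlation can be as useful as a strong positive one.
    """
    r_iv = rank_desc(iv)
    r_chi2 = rank_desc(chi2)
    r_rho = rank_desc([abs(r) for r in rho])
    average = {name: (a + b + c) / 3
               for name, a, b, c in zip(names, r_iv, r_chi2, r_rho)}
    return sorted(names, key=lambda name: average[name])


print(combined_ranking(["Variable 1", "Variable 2", "Variable 3"],
                       iv=[0.5, 0.3, 0.01],
                       chi2=[10.0, 15.0, 1.0],
                       rho=[-0.4, 0.3, 0.05]))
# prints ['Variable 1', 'Variable 2', 'Variable 3']
```

Averaging ranks rather than raw scores avoids having to put IV, chi-squared and correlation on a common scale; a weighted average would be an equally reasonable choice.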

The whole process can be run iteratively, because the result depends on the sampling algorithm used. Different samples can give different results, so the process can be lengthy and laborious.

Reducing Redundancy in Variables

Why is it necessary to reduce redundancy among the variables? There are many reasons; some of them are as follows:
  • It overcrowds the model with variables that serve no purpose.
  • It reduces the significance of the estimated coefficients of the parameters.
  • It increases the risk of overfitting the model.
  • It can destabilise the estimates.
  • It increases the computation time, which can be crucial when the data runs into millions of records.
How can we identify the redundancy?

To identify redundancy, correlation must be calculated among the predictive variables themselves, not with the outcome variable. The variables are then ranked by their correlation factors, and highly correlated variables form clusters. From each cluster, one variable can be picked using the three scores discussed above.
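A minimal Python sketch of the clustering step, assuming a simple greedy rule: each variable not yet grouped starts a cluster and pulls in every remaining variable whose absolute correlation with it is at least a threshold. The helper names, the 0.9 threshold and the sample data are hypothetical, and commercial tools may cluster differently. One variable per cluster would then be kept using the three scores.

```python
# Sketch: helper names, the 0.9 threshold and the sample data are hypothetical.
def pearson(xs, ys):
    """Pearson correlation; undefined (ZeroDivisionError) for a constant variable."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def redundant_groups(data, threshold=0.9):
    """Greedily cluster variables whose absolute pairwise correlation >= threshold.

    data maps each variable name to its values. Each variable not yet grouped
    starts a new cluster and pulls in every remaining variable strongly
    correlated with it.
    """
    names = list(data)
    assigned = set()
    groups = []
    for name in names:
        if name in assigned:
            continue
        group = [name]
        assigned.add(name)
        for other in names:
            if other not in assigned and abs(pearson(data[name], data[other])) >= threshold:
                group.append(other)
                assigned.add(other)
        groups.append(group)
    return groups


data = {
    "income": [1, 2, 3, 4, 5],
    "income_thousands": [1000, 2000, 3000, 4000, 5000],
    "age": [30, 25, 40, 35, 20],
}
print(redundant_groups(data))  # prints [['income', 'income_thousands'], ['age']]
```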

Many commercial software packages can carry out the above process; the one that comes to mind is SAS. Other statistical software, such as MATLAB, can also do it with some additional programming.

Nov 16, 2008


Just a thought: what really makes you satisfied? I have been thinking about it for a long time. Yesterday, I watched a Family Guy animated movie, and one particular part of it hit me.

The child character in the movie somehow meets his future self. After spending some time with him, he starts to hate him: he finds out that he did not become what he had expected to become. He starts questioning his future self: "How did you turn SO DUMB?"

After a while, he finds out the real reason and decides to go back to the past and put it right.

At that very instant, it really hit me hard and I started to think:

What did I want to be, how did I want to be, and was the path I was walking going to get me there?

I then thought about my childhood and wondered: what if the child I was met me today? Would he be happy? I smiled for a while and thought, yes, he would be.

So I patted myself on the back and said to myself: maybe satisfaction is something you get when you find that you have nothing to regret.

Yes, no regrets. Even if you made a mistake in your life, as long as you learnt from it, that's great.

There you go. Try to remember your childhood dream and see whether you somehow achieved it. Maybe, just maybe, this is how you can tell whether you are on the right track.


Any information in the form of numbers first has to be understood in terms of its physical nature.

For example, take 40 degrees Celsius. What does it mean? Is it cold or hot? You will be amazed how difficult it is to categorise a temperature as HOT or COLD, because we human beings are not so precise in our thought process; in a sense, we are FUZZY about it.

But a computer program is not a human being. If a program were written to answer the above question, it would respond with either HOT or COLD, according to the criteria written by the programmer.


temperature = 40  # degrees Celsius
label = "HOT" if temperature > 50 else "COLD"
print(label)

Fair enough. Code like this is written all the time to categorise variables or to build a decision-making system. This method has its own merits and demerits. It is the classical way of analysing data.

Now allow me to introduce another way of analysing data. It is not really a new method, but it is gaining popularity and is being implemented in many different systems. It is called Fuzzy Logic. Do not be confused by the name: once you understand the method, you will find that it is a simple statistical scoring method, much like a percentile concept. The process is known as FUZZIFICATION.

Let's apply it to the above example.

Now, instead of categorising the quantity as HOT or COLD, we start by defining a completely new set of attributes for the quantity. In this example I would choose:

{Very COLD, COLD, Less COLD, Neutral, less HOT, HOT, Very HOT}

In this way, you can categorise the quantity any way you like, but BE CAREFUL: your choice of categories is going to affect the overall decision system.

So what do you do now? You start by scoring each category for the given quantity. To keep things simple, I am going to use just four categories: COLD, COOL, WARM and HOT.


So for 50 degrees Celsius:
COLD = 25%
COOL = 30%
WARM = 60%
HOT  = 30%

This scoring has to be done precisely, not haphazardly. These percentages mean that 50 degrees Celsius is more WARM than COLD, COOL or HOT. So instead of calling the temperature HOT or COLD, we have described it in terms of four different attributes.

So for 100 degrees Celsius:
COLD = 0%
COOL = 0%
WARM = 25%
HOT  = 100%

Here, 100 degrees Celsius is fully HOT and not at all COLD, so COLD and COOL are given a score of 0%.

The score can be a number between 0 and 1 or a percentage between 0% and 100%, whichever you prefer.

If you look at the graph above, you can see that there are four curves, one for each category.

Each curve has its own characteristics and is known as a Membership Function. I have used non-linear curves, but for simplicity even straight lines can be used. The more abstract you make the curves, the more complicated your calculations will be.
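Here is a minimal Python sketch of fuzzification with the straight-line option mentioned above: each category gets a trapezoidal membership function (rise, plateau at 1, fall), and scoring a temperature means evaluating every function at that point, just like reading the curves vertically. The breakpoints in degrees Celsius, the names MEMBERSHIP, trapezoid and fuzzify are all hypothetical, so the scores will not match the percentages in the tables above, which came from the author's non-linear curves.

```python
import math

# Sketch: straight-line (trapezoidal) membership functions with hypothetical
# breakpoints in degrees Celsius; the post's own curves are non-linear.
MEMBERSHIP = {
    "COLD": (-math.inf, -math.inf, 5, 20),
    "COOL": (5, 20, 20, 40),
    "WARM": (20, 40, 50, 70),
    "HOT": (50, 80, math.inf, math.inf),
}


def trapezoid(x, a, b, c, d):
    """Rise linearly from a to b, stay at 1 from b to c, fall linearly from c to d."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def fuzzify(temperature):
    """Score one temperature against every category (0 = not at all, 1 = fully)."""
    return {name: trapezoid(temperature, *params)
            for name, params in MEMBERSHIP.items()}


for t in (10, 60, 100):
    print(t, fuzzify(t))
```

Neighbouring functions overlap, so a temperature such as 60 degrees belongs partly to WARM and partly to HOT; that partial membership is exactly what distinguishes this from the single HOT/COLD threshold.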

So when you are scoring a variable, you just find its value on the X-axis, move vertically upwards, and note the value of each category's curve. I have drawn three dotted red lines to show some examples; let's start from the left.

The first line, which is below 50, shows a high score for COLD and COOL but zero for the others. The second line, at around 50, has a score for every category, but each score is quite low. The last line, at around 100, has a high score for WARM and HOT but not for the others.

In this way, all the quantities can be scored into different values. 

What is the advantage of going through such a long method? We have increased the number of variables used to analyse a single variable. This lets us look at different attributes of that variable before making a decision, so the decision will be more valuable and more appropriate than before.

The whole process of generating categories and scoring the variable is called FUZZIFICATION. It is a statistical process that can be implemented in any computer program.

Fuzzification is the first step in the whole process of building a decision system. The others are:

1.   Designing Rule-Based System
2.   Defuzzification Process

During the fuzzification process, we can use many types of membership functions, each with its own advantages over the others.

But in very simple words, it is a method of getting good information about a variable, so that we get a clear picture of it.
