Consistency function (statistical)
Hi, I've been wanting to make a function that takes an array of numbers and calculates the consistency, but I don't know how. Here are a few examples that will probably best illustrate what I mean.
Example 1:
1
1
1
1
1
The numbers don't change, so the consistency would be 100%.
Example 2:
2
0
2
0
1
These are relatively consistent, since even though they change they are sort of spread out evenly.
Example 3:
2
2
1
0
0
This is even less consistent since the numbers are all on one end of the array.
Example 4:
3
0
0
0
2
Even worse, since there's a big gap in the middle.
Example 5:
5
0
0
0
0
Worst case, consistency 0% since the sum is all in one spot and the rest is empty.
I kept the sums the same throughout just to illustrate the differences, but the sum needs to be able to change. Also, there won't be any negative numbers.
I'm not a statistician so I don't really know how to go about this. Any help and insight would be welcome.
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
Re: Consistency function (statistical)
It sounds like you might want to calculate the difference between the lowest and highest values in the range as a percentage of the total range. But you may also need standard mean, median and mode calcs to make real sense of the numbers.
(#10850)
Re: Consistency function (statistical)
I need to consider the order as well, like the difference between example 2 and 3 shows. Another example would be
1
0
1
0
1
0
1
0
1
0
versus
1
1
1
1
1
0
0
0
0
0
since the first one is fairly consistent, as the numbers are all spread out, while in the second one the numbers are all in the same area. Think of it as tracking a salesperson's or football player's performance. If they're consistent they will not go long without performing but also won't have huge spikes either; if they're inconsistent they will have hot streaks and cold streaks.
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
Re: Consistency function (statistical)
Again, you could record all the runs of consecutive numbers and get stats on that as well -- again mean, median, mode of the runs for each number. Post some code and people can give you some help.
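The idea of recording runs of consecutive numbers could be sketched roughly as follows. This is only one possible reading of the suggestion; the function names `run_lengths` and `run_stats` are made up for illustration, and the two test arrays are the ones from the previous post.

```python
from itertools import groupby
from statistics import mean, median


def run_lengths(values):
    """Map each distinct value to the lengths of its consecutive runs."""
    runs = {}
    for value, group in groupby(values):
        # groupby yields one group per stretch of equal neighbouring items
        runs.setdefault(value, []).append(sum(1 for _ in group))
    return runs


def run_stats(values):
    """Summarise the run lengths of each distinct value."""
    return {
        value: {
            "mean": mean(lengths),
            "median": median(lengths),
            "longest": max(lengths),
        }
        for value, lengths in run_lengths(values).items()
    }


alternating = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
clustered = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

print(run_stats(alternating))  # every run has length 1
print(run_stats(clustered))    # one run of five 1s, one run of five 0s
```

Unlike the plain mean or range, the run lengths change when the same numbers are reordered, so they do pick up the difference between the spread-out and the clustered array.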
(#10850)
Re: Consistency function (statistical)
I don't have any code since I don't know which direction to go in calculating the consistency. I mean, sure I need a function declaration and a loop, but it would be pointless to just post that.
As for the mean, median, mode, etc, what do I do with those? How do I combine them into a relevant reading?
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
Re: Consistency function (statistical)
Sarke wrote: a loop, but it would be pointless to just post that.
Start with a loop. Loop through your array and sum all the unique values. That's a start.
(#10850)
Re: Consistency function (statistical)
Are you trying to find a statistical predictor of something real and verifiable, for example detecting a stolen credit card by a change in buying patterns? Or are you looking for a formalisation of something that you think is there but aren't sure if it's just a run of good or bad luck, for example "streakiness" in sports competitors? Do you have test data? How is the test data categorised - in the first case you're looking at something like stolen cards vs unstolen cards, but in the second case it's harder to work out how to categorise the data.
A useful stepping stone from intuition to formalisation is graphing the trends in your data. 1, 1, 1: a straight horizontal line; 1, 0, 1, 0: periodic, stays within bounds; 1, 1, 0, 0, 1, 1: also periodic, different frequency; 1, 2, 3, 4, 5... : a straight line going up; 5, 4, 3, 2, 1: another straight line... those are all consistent in their way, but are they Consistent? 5, 0, 0, 4: sags in the middle - apparently Inconsistent
Another thing to graph is the distribution of your data. Use a barchart to keep track of the number of 1's, 2's etc that come up. Does this barchart end up being flat at the top, like the distribution of rolls of a die? Is it a bell curve? Is it symmetric?
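A quick way to look at the distribution without a plotting library is a text bar chart. The sketch below is only an illustration: `text_barchart` is a hypothetical helper, and it assumes a non-empty list of non-negative integers like the arrays posted earlier in the thread.

```python
from collections import Counter


def text_barchart(values):
    """Return one line per integer from min to max, with a '#' per occurrence."""
    counts = Counter(values)
    lines = []
    for value in range(min(counts), max(counts) + 1):
        # Counter returns 0 for values that never occurred
        lines.append(f"{value}: {'#' * counts[value]}")
    return "\n".join(lines)


print(text_barchart([2, 2, 1, 0, 0]))
print()
print(text_barchart([5, 0, 0, 0, 0]))
```

Note that this view throws away the order of the numbers, so it only answers the "what values come up" half of the question, not the "when do they come up" half.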
Re: Consistency function (statistical)
dml wrote: Are you trying to find a statistical predictor of something real and verifiable, for example detecting a stolen credit card by a change in buying patterns? Or are you looking for a formalisation of something that you think is there but aren't sure if it's just a run of good or bad luck, for example "streakiness" in sports competitors? Do you have test data? How is the test data categorised - in the first case you're looking at something like stolen cards vs unstolen cards, but in the second case it's harder to work out how to categorise the data.
"Streakiness" would be a good way of describing it. Basically, how much does it deviate from the norm? A sports player would be the best example. Are they streaky? Do they go stretches without scoring?
Take any player and play him for 5 games. Some players will score in each game, some will not score at all, but some you "know" will at least score a bit. It's that consistency or certainty I want to try and put a number on, be it a percentage or a value.
As for the test data, the arrays I posted above are pretty much what it is. Any array of integers from 0 and up.
dml wrote: A useful stepping stone from intuition to formalisation is graphing the trends in your data. 1, 1, 1: a straight horizontal line; 1, 0, 1, 0: periodic, stays within bounds; 1, 1, 0, 0, 1, 1: also periodic, different frequency; 1, 2, 3, 4, 5... : a straight line going up; 5, 4, 3, 2, 1: another straight line... those are all consistent in their way, but are they Consistent? 5, 0, 0, 4: sags in the middle - apparently Inconsistent
When I talk about consistency I mean a flat line; it's going neither up nor down. Going back to the sports example, a player that keeps improving through the season isn't really that consistent since they're scoring much more at the end of the season than they did at the start. Same with a player whose performance is going down.
dml wrote: Another thing to graph is the distribution of your data. Use a barchart to keep track of the number of 1's, 2's etc that come up. Does this barchart end up being flat at the top, like the distribution of rolls of a die? Is it a bell curve? Is it symmetric?
Interesting... An inconsistent die would be one that works normally (results are "all over the place" from 1 to 6 with no pattern), while a consistent die would be one that always rolled sixes (for example).
What I'd like would be to have the function be able to say "this die is 0% consistent since it hit each number twice on 12 rolls" or "this die is 100% consistent since it hit all 3s on each of the 12 rolls". Or any percentage in between, based on the results of course...
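One measure that happens to hit both of those endpoints is one minus the normalised Shannon entropy of the roll counts. Nobody in the thread proposed this; it is just a sketch of one possible formalisation, and the name `die_consistency` and the `faces` parameter are assumptions made for the example.

```python
from collections import Counter
from math import log


def die_consistency(rolls, faces=6):
    """Return 1.0 when every roll is the same, 0.0 when all faces are equally common."""
    counts = Counter(rolls)
    total = len(rolls)
    # Shannon entropy of the observed face frequencies
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    # log(faces) is the largest entropy a die with that many faces can have
    return 1.0 - entropy / log(faces)


each_face_twice = [1, 2, 3, 4, 5, 6] * 2
all_threes = [3] * 12

print(f"{die_consistency(each_face_twice):.0%}")
print(f"{die_consistency(all_threes):.0%}")
```

Like the bar chart, this only looks at how often each value occurs. It gives the same score no matter what order the rolls came in, so on its own it would not separate a streaky player from a steady one with the same totals.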
Re: Consistency function (statistical)
Sarke wrote: Take any player and play him for 5 games. Some players will score in each game, some will not score at all, but some you "know" they'll at least score a bit. It's that consistency or certainty I want to try and put a number on. Be it a percentage or a value.
Streaky players are _more_ predictable in the sense that their performance in a given game can be predicted by their performance in the preceding games. Two players each end up scoring in half the games in the season. If you know that Player A is streaky, you might think it's a good bet to put money at even odds on them scoring in a given game if they scored in the game before. Player B isn't streaky at all - there's a 50% chance of them scoring in this game regardless of their performance in the preceding games - you might as well bet on a coin toss, yet it might be that Player B is regarded as the cool, steady, predictable one.
But if you don't have enough data, Player B can look streaky even if they're not. This player has a 1/2 chance of scoring in a given game, therefore a 1/32 chance of scoring in five consecutive games - entirely within the realm of possibility.
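The 1/32 figure is (1/2)^5. Over a longer stretch of games the chance of seeing at least one five-game scoring streak grows quickly, which is why a non-streaky player can still look streaky. A rough sketch of that calculation, assuming independent games with a fixed scoring probability (the function name `prob_streak` is made up for the example):

```python
from fractions import Fraction


def prob_streak(games, streak, p):
    """Probability of at least one run of `streak` scoring games in `games` games."""
    # state[j] = probability that no full streak has happened yet
    # and the current run of scoring games has length j
    state = [1] + [0] * (streak - 1)
    for _ in range(games):
        new_state = [0] * streak
        for run, prob in enumerate(state):
            new_state[0] += prob * (1 - p)      # no score: the run resets
            if run + 1 < streak:
                new_state[run + 1] += prob * p  # score: the run grows
            # otherwise the streak is completed and leaves the "no streak" states
        state = new_state
    return 1 - sum(state)


half = Fraction(1, 2)
print(prob_streak(5, 5, half))          # 1/32
print(float(prob_streak(40, 5, half)))  # over 40 games
```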
Sarke wrote: What I'd like would be to have the function be able to say "this die is 0% consistent since it hit each number twice on 12 rolls" or "this die is 100% consistent since it hit all 3s on each of the 12 rolls". Or any percentage in between, based on the results of course...
Start with what arborint recommended - the range between the min and the max. As an extension of that, look at a measure like the interquartile range, where you throw away the top 25% and the bottom 25%, and look at the range of the remaining data. Maybe it makes more sense to only throw away the top and bottom 10%. Try it out and see if the result matches your intuition.
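The range, the interquartile range and the "throw away 10% at each end" variant from this reply could be tried out with something like the sketch below. The function names are made up for illustration, `statistics.quantiles` needs Python 3.8 or later and at least two data points, and its default quartile method is only one of several conventions.

```python
from statistics import quantiles


def full_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Distance between the first and third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def trimmed_range(values, fraction=0.10):
    """Range after dropping `fraction` of the items from each end (fraction < 0.5)."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # number of items to drop per end
    kept = ordered[k:len(ordered) - k]
    return kept[-1] - kept[0]


data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]
print(full_range(data))
print(interquartile_range(data))
print(trimmed_range(data))
```

All three are computed from the sorted values, so they measure spread only and give the same answer for any ordering of the same numbers.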
Re: Consistency function (statistical)
dml wrote: Streaky players are _more_ predictable in the sense that their performance in a given game can be predicted by their performance in the preceding games. Two players each end up scoring in half the games in the season. If you know that Player A is streaky, you might think it's a good bet to put money at even odds on them scoring in a given game if they scored in the game before. Player B isn't streaky at all - there's a 50% chance of them scoring in this game regardless of their performance in the preceding games - you might as well bet on a coin toss, yet it might be that Player B is regarded as the cool, steady, predictable one.
I think you're going the other way with this; you're talking more about trends than consistency, and while both can be used to predict, it's the latter I'm after. That's one of the reasons I put 5 games as the example, not just one game. If Player A has scored in 5 straight games then anyone would be willing to put money on him doing it in the next as well without knowing much about the player. But if you go over a longer span, say 5 or 10 games, then Player A would be a high-risk, high-reward bet because he could score in each of the games, or he could start slumping again and not score at all. Player B, on the other hand, would be the safe bet (the "money") because it is highly unlikely that he will go 10 games without scoring.
dml wrote: But if you don't have enough data, Player B can look streaky even if they're not. This player has a 1/2 chance of scoring in a given game, therefore a 1/32 chance of scoring in five consecutive games - entirely within the realm of possibility.
Sure it's possible, but what I'm more interested in is that (if he's consistent and has been scoring at that 1/2 pace all season) he's very likely to score in 2 or 3 of those games. Take the die example; we know that a regular die is 100% consistent so we could bet that on 6 rolls it will hit a 1 at least once. But if we knew the die was not consistent and would roll 5s or 6s most of the time then that bet is not a very good one.
dml wrote: Start with what arborint recommended - the range between the min and the max. As an extension of that, look at a measure like the interquartile range, where you throw away the top 25% and the bottom 25%, and look at the range of the remaining data. Maybe it makes more sense to only throw away the top and bottom 10%. Try it out and see if the result matches your intuition.
"Interquartile range", eh? Interesting, I'll play around with that, thanks.