Author Topic: Statistics Question  (Read 7113 times)

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
Statistics Question
« on: May 25, 2005, 06:49:30 PM »
Been working on the mutations probability screen, and I need this information to continue:

There is an Event A

Probability of A = (B or C or D or E)

I know how to modify A if one of B,C,D, or E changes.

I want to know how to modify B, C, D, and E if A changes.  That is, if I want A to be 1/5 of its original value, how should I change B, C, D, and E?

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
« Reply #1 on: May 25, 2005, 06:52:08 PM »
That is not statistics, that is basic arithmetic :)

You need to divide B, C, D and E by 5 as well.

I suspect I did not understand your question.
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #2 on: May 25, 2005, 08:13:46 PM »
Technically it's a probability question.

Remember that Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)

So just dividing everything by 5 won't work.
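
To see why, here is a minimal sketch, assuming the events are independent (which is confirmed later in the thread). The two 50% inputs and the helper name `union_independent` are made up for illustration; they are not from the actual program.

```python
def union_independent(probs):
    """P(at least one event occurs), assuming the events are independent."""
    none_happen = 1.0
    for p in probs:
        none_happen *= 1.0 - p
    return 1.0 - none_happen

# Illustrative values only.
original = [0.5, 0.5]
scaled = [p / 5 for p in original]      # divide every input by 5

a_before = union_independent(original)  # 1 - 0.5 * 0.5 = 0.75
a_after = union_independent(scaled)     # 1 - 0.9 * 0.9 = 0.19

print(a_before / 5, a_after)            # the two numbers differ
```

Dividing the inputs by 5 leaves A at 0.19, while the goal was 0.75 / 5 = 0.15.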
« Last Edit: May 25, 2005, 08:14:06 PM by Numsgil »

Offline AZPaul

  • Bot Builder
  • **
  • Posts: 76
« Reply #3 on: May 25, 2005, 09:21:44 PM »
Quote
I want to know how to modify B, C, D, and E if A changes. That is, if I want A to be 1/5 of its original value, how should I change B, C, D, and E?

I'm assuming B,C,D,E are probabilities of other distinct events.

It is the "or" between B,C,D,E that is the problem. If A changes then there are B*C*D*E solutions to solve for A. What are the constraints on the probabilities of B,C,D,E?   Is B reasonably +- 10% of present value while C can go from 0 to 100?

Good luck on this one!

-P

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #4 on: May 25, 2005, 09:24:47 PM »
B, C, D and E are all user defined, ranging basically from 100% to 0%.  Because they are independent, B + C + D + E does not have to add up to 100%, or 0%, or 300%, or any other value.

I would like to scale B,C,D and E such that the beginning and end ratios are all the same.  That is, B/C will be the same before and after.

The more I look at the problem, the more I see that this will probably need some thought.
« Last Edit: May 25, 2005, 09:25:55 PM by Numsgil »

Offline AZPaul

  • Bot Builder
  • **
  • Posts: 76
« Reply #5 on: May 25, 2005, 10:20:05 PM »
Quote
B, C, D and E are all user defined, ranging basically from 100% to 0%. Because they are independent, B + C + D + E does not have to add up to 100%, or 0%, or 300%, or any other value.

So, OK, something is amiss because right now I have:

All values are probabilities between 0-100%

Your original relationship is:

Prob A = (prob B) or (prob C) or (prob D) or (prob E)

A, then, is the value of the highest prob in the list.

Which means that if B=20, C=10, D=50 and E=5 then A=50%, the value of D, yes?

Now you want A = 10%? Yet maintain the ratio balance between BCDE?

So the highest prob (D) must come down to 10%. The ratios with the others are just basic math. Reduce A (and thus D) to 1/5, reduce all values to 1/5. Divide everything by 5. A=10%, thus B=4%, C=2%, D=10%, E=1%.

I'll take the shvarz route here and say that something is missing and I do not understand the question.
« Last Edit: May 25, 2005, 10:20:44 PM by AZPaul »

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #6 on: May 25, 2005, 10:38:28 PM »
Remember that Pr(A or B) = Pr(A) + Pr(B) - Pr(A) * Pr(B)

If you just pick the largest, you're assuming that all the other probabilities are inside each other.

That is, you're assuming that if Pr(B) > Pr(C), then B implies (->) C, which just isn't true.
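
As a quick check with the numbers from the previous post (B=20%, C=10%, D=50%, E=5%), here is a small sketch that compares "pick the largest" with the value for independent events. The variable names are just for illustration.

```python
from math import prod

probs = {"B": 0.20, "C": 0.10, "D": 0.50, "E": 0.05}

# "Pick the largest" is only right if every smaller event sits inside D.
largest = max(probs.values())

# For independent events: 1 minus the chance that none of them happens.
independent_union = 1 - prod(1 - p for p in probs.values())

print(largest, independent_union)   # 0.5 versus roughly 0.658
```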
« Last Edit: May 25, 2005, 10:39:21 PM by Numsgil »

Offline AZPaul

  • Bot Builder
  • **
  • Posts: 76
« Reply #7 on: May 26, 2005, 12:20:16 AM »
Quote
Probability of A = (B or C or D or E)

The original relationship is not as above?

What then is the  formula?

If indeed the above is correct then 'A' MUST equal the highest value in the list.

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #8 on: May 26, 2005, 12:49:44 AM »
A is, by definition, B or C or D or E.

A
= B or C or D or E
= (B or C) or (D or E)
= (B + C - B*C) or (D + E - D * E)
= (B + C -B*C) + (D + E - D*E) - (D + E - D*E)*(B + C -B*C)
= B + C - BC + D + E - DE - DB - DC + DBC - EB - EC + EBC + DEB + DEC - DEBC
= B + C + D + E - BC - DE - BD - CD + BCD - BE - CE + BCE + BDE + CDE - BCDE
= B + C + D + E - BC - BD - BE - CD - CE - DE + BCD + BCE + BDE + CDE - BCDE
(check over my math)
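
One way to check an expansion like this numerically, as a sketch rather than a proof: for independent events the inclusion-exclusion sum (singles added, pairs subtracted, triples added, the four-way product subtracted) should match 1 minus the product of the complements. The function names and test values below are assumptions for illustration.

```python
from itertools import combinations
from math import prod

def union_by_inclusion_exclusion(probs):
    """Alternating sum over all non-empty subsets of independent events."""
    total = 0.0
    for size in range(1, len(probs) + 1):
        sign = 1 if size % 2 == 1 else -1   # + singles, - pairs, + triples, ...
        for subset in combinations(probs, size):
            total += sign * prod(subset)
    return total

def union_by_complement(probs):
    """Same quantity computed as 1 - P(none of the events happens)."""
    return 1 - prod(1 - p for p in probs)

values = [0.20, 0.10, 0.50, 0.05]           # B, C, D, E (illustrative)
print(union_by_inclusion_exclusion(values), union_by_complement(values))
```

Both calls should agree up to rounding, and the complement form works unchanged for any number of events.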

that's when my mind explodes.

Also, I'd like the general case to the problem, not just this specific case with 4 variables.  That is, I want A = OR (I = 1 to N) Xi, where N is variable and OR is the or of the whole sequence.

I know that it kind of follows a binomial coefficients distribution (for N = 4: 1 term with no letters, 4 single letters, 6 products of two letters, 4 products of three letters, 1 product of four letters), which could be useful.

I know in the end you'll get N equations, N unknowns, and probably have to solve it with matrices.
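
For the general case, one possible approach (a sketch, not necessarily what the program ends up using) avoids the N equations: if all ratios must stay fixed, every Xi is multiplied by the same factor k, so there is only one unknown. Since the OR of independent events grows as k grows, k can be found by bisection. `scale_to_target` and the sample values are hypothetical.

```python
from math import prod

def union_independent(probs):
    """P(at least one of several independent events occurs)."""
    return 1 - prod(1 - p for p in probs)

def scale_to_target(probs, target, iterations=100):
    """Scale every probability by one factor k so the union equals target.

    Hypothetical helper: bisection on k, which keeps all ratios p_i/p_j fixed.
    """
    if not probs or min(probs) < 0 or max(probs) <= 0:
        raise ValueError("need non-negative probabilities, at least one positive")
    if not 0 <= target <= 1:
        raise ValueError("target must be a probability")
    lo, hi = 0.0, 1.0 / max(probs)  # beyond hi the largest field would exceed 100%
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if union_independent([mid * p for p in probs]) < target:
            lo = mid   # union too small, k must grow
        else:
            hi = mid   # union large enough, k can shrink
    k = (lo + hi) / 2
    return [k * p for p in probs]

fields = [0.20, 0.10, 0.50, 0.05]            # illustrative B, C, D, E
wanted = union_independent(fields) / 5       # A at 1/5 of its original value
scaled = scale_to_target(fields, wanted)
print(scaled, union_independent(scaled))
```

The design choice here is to trade a closed-form answer for a loop that works for any N.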
« Last Edit: May 26, 2005, 12:59:30 AM by Numsgil »

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
« Reply #9 on: May 26, 2005, 01:17:27 AM »
Is this the mutation frequency calculation I was asking about?  It actually would help if you described exactly the problem you are trying to solve.
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #10 on: May 26, 2005, 01:23:18 AM »
Yeah shvarz, this is the flip side.

Say you want the chance of a mutated youngster to be 1 in 36.  I want you to be able to enter 1 in 36 and have the program scale all the mutation probabilities for you automatically.

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
« Reply #11 on: May 26, 2005, 01:24:00 AM »
What I would need to know is whether B, C, D and E are independent events or if they are exclusive.  The solution will be different in the following two scenarios:
1) You throw a die once and look at the number that comes out.  B is 1, C is 2, D is 3 and E is 4.
2) You throw a die 4 times and B is "6 comes out on first try", C is "6 comes out on second try", D is "6 comes out on third try" and E is "6 comes out on fourth try".

See what I mean?  We need to know what are events B, C, D, E.

I am also assuming that your event A is probability that any of these four events would happen.
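
The two die scenarios above can be worked out exactly. This is a small sketch using fractions, with variable names chosen only for illustration:

```python
from fractions import Fraction

p_face = Fraction(1, 6)   # chance of any one face on a fair die

# Scenario 1: one throw; "1", "2", "3", "4" are mutually exclusive,
# so the probabilities simply add.
exclusive = 4 * p_face

# Scenario 2: four throws; "6 on throw k" events are independent,
# so use 1 minus the chance of no 6 at all.
independent = 1 - (1 - p_face) ** 4

print(exclusive, independent)   # 2/3 and 671/1296
```

In scenario 1, dividing each input by 5 divides the total by 5. In scenario 2 it does not.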
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #12 on: May 26, 2005, 01:28:55 AM »
They are independent.  If they were exclusive, your original answer would work.

I'm solving the case for A = B or C right now.  Maybe I can see a pattern.

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
« Reply #13 on: May 26, 2005, 01:31:26 AM »
Oh, OK.

Here is an easier way to do that:

Say I type in "1 in 36".  This gives you the frequency of any mutation happening in the offspring: 1/36.

Then you have the ratios between different kinds of mutations.  These are either set up by the user or are at some default ratios.  Say Insertion:Deletion:Substitution is 1:1:3
Then you take the total of these ratios: 1+1+3 and normalize all frequencies to that.  
So the frequency of Insertion is 1/36 x 1/5 = 1/180
Deletion is the same: 1/180
Substitution is 1/36 x 3/5=1/60

Would that work?
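
The normalization described above can be sketched as follows. `split_by_ratio` is a made-up name, and the 1:1:3 ratios are the ones from this post:

```python
from fractions import Fraction

def split_by_ratio(total, ratios):
    """Divide a total frequency among mutation types in proportion to ratios."""
    weight_sum = sum(ratios.values())
    return {name: total * Fraction(weight, weight_sum)
            for name, weight in ratios.items()}

rates = split_by_ratio(
    Fraction(1, 36),
    {"insertion": 1, "deletion": 1, "substitution": 3},
)
print(rates)   # insertion 1/180, deletion 1/180, substitution 1/60
```

The three rates sum to exactly 1/36, which is the right total only if the mutation types are treated as mutually exclusive.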
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #14 on: May 26, 2005, 01:39:20 AM »
No, that won't work.

In this case, the difference is .0002155 (or roughly 1 in 4639).  Very tiny and quite negligible.

However, as you increase the number of mutation fields you're editing, this will increase.  Once you hit 16 it becomes quite noticeable.
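
The .0002155 figure can be reproduced exactly from the 1/180, 1/180, 1/60 rates in the previous post. The second helper is a hypothetical illustration of how the gap grows with more fields; it assumes the target is split equally, which the program may not do.

```python
from fractions import Fraction
from math import prod

def union_independent(probs):
    """P(at least one of several independent events occurs)."""
    return 1 - prod(1 - p for p in probs)

target = Fraction(1, 36)
rates = [Fraction(1, 180), Fraction(1, 180), Fraction(1, 60)]

# How far the real chance of "any mutation" falls short of 1 in 36.
shortfall = target - union_independent(rates)
print(float(shortfall))           # about 0.0002155

def shortfall_equal_split(target, n_fields):
    """Shortfall when target is split equally over n independent fields."""
    per_field = target / n_fields
    return target - (1 - (1 - per_field) ** n_fields)

print(float(shortfall_equal_split(target, 16)))
```

The gap also grows quickly with the target itself, so a large overall mutation chance makes it far more visible than 1 in 36 does.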

Here's the solution for a 2 mutation field case (A and B here are the two fields, not the A from before):
Ai = initial A
Bi = initial B
Af = final A, what we're trying to find, = Ai*Bf/Bi
Bf = final B, what we're trying to find

Bf = [ (Ai+Bi)/Bi +- sqrt( ((Ai+Bi)/Bi)^2 - 4*(Ai/Bi)*(New global probability) ) ] / ( 2*(Ai/Bi) )
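
A sketch of that two-field formula in code. The post leaves the +- open; taking the minus root is my assumption, since it is the one that shrinks to 0 as the target goes to 0 and stays at or below 100%. The function name and sample numbers are hypothetical.

```python
from math import sqrt

def scale_two_fields(a_i, b_i, target):
    """Return (a_f, b_f) with a_f/b_f == a_i/b_i and P(A or B) == target.

    Assumes two independent fields with a_i > 0, b_i > 0 and 0 <= target <= 1.
    Solves r*b_f^2 - (r + 1)*b_f + target = 0, where r = a_i / b_i.
    """
    r = a_i / b_i
    s = (a_i + b_i) / b_i                 # same as r + 1
    discriminant = s * s - 4 * r * target
    b_f = (s - sqrt(discriminant)) / (2 * r)   # minus root (assumption)
    a_f = r * b_f                         # keep the original ratio
    return a_f, b_f

a_f, b_f = scale_two_fields(0.20, 0.10, 0.10)
print(a_f, b_f, a_f + b_f - a_f * b_f)    # last value should be about 0.10
```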