Statistics Question
AZPaul:
--- Quote ---B, C, D and E are all user defined, ranging basically from 100% to 0%. Because they are independent, B + C + D + E do not have to add up to 100%, or 0%, or 300%, or any other value.
--- End quote ---
So, OK, something is amiss because right now I have:
All values are probabilities between 0-100%
Your original relationship is:
Prob A = (prob B ) or (prob C) or (prob D) or (prob E)
A, then, is the value of the highest prob in the list.
Which means that if B=20, C=10, D=50 and E=5 then A=50%, the value of D, yes?
Now you want A = 10%? Yet maintain the ratio balance between BCDE?
So the highest prob (D) must come down to 10%. The ratios with the others are just basic math. Reduce A (and thus D) to 1/5, and reduce all values to 1/5: divide everything by 5. A=10%, thus B=4%, C=2%, D=10%, E=1%.
I'll take the shvarz route here and say that something is missing and I do not understand the question.
Numsgil:
Remember that for two independent events, Pr(A or B) = Pr(A) + Pr(B) - Pr(A) * Pr(B)
If you just pick the largest, you're assuming that all the other probabilities are inside each other.
That is, you're assuming if Pr(B) > Pr(C), then C implies (->) B, which just isn't true.
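To put numbers on the difference, here is a minimal Python sketch using the B=20, C=10, D=50, E=5 example from earlier in the thread. It assumes the four events are independent, as the quoted post says; the function name prob_or is just an illustrative helper, not anything from the original posts. For independent events, the chance that at least one occurs is one minus the chance that none occurs.

```python
def prob_or(probabilities):
    """Pr(at least one event occurs), assuming the events are independent."""
    none_occur = 1.0
    for p in probabilities:
        none_occur *= (1.0 - p)  # chance that this event does NOT occur
    return 1.0 - none_occur

# B, C, D, E from the example earlier in the thread, as fractions
values = [0.20, 0.10, 0.50, 0.05]

# Two-event formula for Pr(B or C): 0.20 + 0.10 - 0.20 * 0.10
pair = 0.20 + 0.10 - 0.20 * 0.10

print("Pr(B or C)           =", round(pair, 4))
print("Pr(B or C or D or E) =", round(prob_or(values), 4))
print("largest single value =", max(values))
```

Under the independence assumption the combined value works out to about 0.658, noticeably higher than the 0.50 you get from just picking the largest.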
AZPaul:
--- Quote ---Probability of A = (B or C or D or E)
--- End quote ---
The original relationship is not as above?
What then is the formula?
If indeed the above is correct then 'A' MUST equal the highest value in the list.
Numsgil:
A is by definition B or C or D or E.
A
= B or C or D or E
= (B or C) or (D or E)
= (B + C - B*C) or (D + E - D*E)
= (B + C - B*C) + (D + E - D*E) - (D + E - D*E)*(B + C - B*C)
= B + C - BC + D + E - DE - DB - DC + DBC - EB - EC + EBC + DEB + DEC - DEBC
= B + C + D + E - BC - DE - BD - CD + BCD - BE - CE + BCE + BDE + CDE - BCDE
= B + C + D + E - BC - BD - BE - CD - CE - DE + BCD + BCE + BDE + CDE - BCDE
(check over my math)
that's when my mind explodes.
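One way to check an expansion like this without redoing the algebra by hand is to evaluate it numerically. The sketch below assumes independent events and compares two routes: the inclusion-exclusion sum (single terms added, pairwise products subtracted, triple products added, and so on, alternating) and the shortcut 1 - (1-B)(1-C)(1-D)(1-E). The function names are hypothetical helpers, and math.prod needs Python 3.8 or later.

```python
from itertools import combinations
from math import prod

def or_by_inclusion_exclusion(probabilities):
    """Sum over all non-empty subsets, with alternating sign by subset size."""
    total = 0.0
    for size in range(1, len(probabilities) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0  # + for 1 and 3 factors, - for 2 and 4
        for subset in combinations(probabilities, size):
            total += sign * prod(subset)
    return total

def or_by_complement(probabilities):
    """1 minus the chance that none of the independent events occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)

b, c, d, e = 0.20, 0.10, 0.50, 0.05
long_form = or_by_inclusion_exclusion([b, c, d, e])
short_form = or_by_complement([b, c, d, e])
print(round(long_form, 6), round(short_form, 6))
```

If the two numbers agree for a few different inputs, the signs in the expanded form are very likely right. The complement form is also much easier to carry around than the fifteen-term sum.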
Also, I'd like the general case to the problem, not just this specific case with 4 variables. That is, I want A = OR (I = 1 to N) Xi, where N is variable and OR is the or of the whole sequence.
I know that it kind of follows a binomial coefficients distribution (for 4 variables: 1 term with no letters, 4 terms with a single letter, 6 with letter*letter, 4 with letter*letter*letter, and 1 with letter*letter*letter*letter), which could be useful.
I know in the end you'll get N equations and N unknowns, and probably have to solve it with matrices.
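For the general case with N independent values, the OR collapses to A = 1 - (1-X1)(1-X2)...(1-XN). What follows is only a sketch, and it rests on an assumption about what is being asked: that the goal is to hit a target A while keeping the ratios between the Xi fixed, which is how the question was read earlier in the thread but has not been confirmed. Under that reading there is just one unknown, a common scale factor k, so a simple bisection is enough. The names prob_any and scale_to_target are made up for illustration.

```python
from math import prod

def prob_any(xs):
    """Pr(at least one of N independent events occurs)."""
    return 1.0 - prod(1.0 - x for x in xs)

def scale_to_target(xs, target, iterations=100):
    """Find k so that prob_any(k * x for x in xs) is close to target.

    Assumes independent events, 0 <= target < 1, and at least one x > 0.
    Returns the scaled list, which keeps the original ratios between values.
    """
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    if max(xs) <= 0.0:
        raise ValueError("need at least one positive probability")
    low, high = 0.0, 1.0 / max(xs)  # at k = high the largest value reaches 1
    for _ in range(iterations):     # prob_any grows with k, so bisection works
        mid = (low + high) / 2.0
        if prob_any([mid * x for x in xs]) < target:
            low = mid
        else:
            high = mid
    k = (low + high) / 2.0
    return [k * x for x in xs]

original = [0.20, 0.10, 0.50, 0.05]  # B, C, D, E from the thread's example
scaled = scale_to_target(original, 0.10)
print([round(x, 4) for x in scaled], round(prob_any(scaled), 4))
```

For comparison, dividing everything by 5 (B=4%, C=2%, D=10%, E=1%) gives A of about 16.2% under independence, so the scaled values have to come out somewhat smaller than that to land on 10%.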
shvarz:
Is this the mutation frequency calculation I was asking about? It actually would help if you described exactly the problem you are trying to solve.