Darwinbots Forum
General => Off Topic => Topic started by: Numsgil on May 25, 2005, 06:49:30 PM
-
Been working on the mutations probability screen, and I need this information to continue:
There is an Event A
Probability of A = (B or C or D or E)
I know how to modify A if one of B,C,D, or E changes.
I want to know how to modify B, C, D, and E if A changes. That is, if I want A to be 1/5 of its original value, how should I change B, C, D and E?
-
That is not statistics, that is basic arithmetic :)
You need to divide B, C, D and E by 5 as well.
I suspect I did not understand your question.
-
Technically it's a probability question.
Remember that Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)
So just dividing everything by 5 won't work.
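A minimal sketch of why dividing by 5 falls short, assuming the two events are independent so that Pr(A and B) = Pr(A) * Pr(B). The helper name pr_or and the values 0.5 and 0.4 are made up for illustration:

```python
def pr_or(p, q):
    """Pr(X or Y) for two independent events with probabilities p and q."""
    return p + q - p * q

b, c = 0.5, 0.4                    # made-up example probabilities
a = pr_or(b, c)                    # combined probability, about 0.7
a_scaled = pr_or(b / 5, c / 5)     # combined probability after dividing each by 5

# If plain division worked, these two numbers would be equal.
print(a / 5, a_scaled)
```

The second number comes out larger than the first, because the product term shrinks by a factor of 25 rather than 5.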
-
I want to know how to modify B, C, D, and E if A changes. That is, if I want A to be 1/5 of its original value, how should I change B, C, D and E?
I'm assuming B,C,D,E are probabilities of other distinct events.
It is the "or" between B,C,D,E that is the problem. If A changes then there are B*C*D*E solutions to solve for A. What are the constraints on the probabilities of B,C,D,E? Is B reasonably +- 10% of present value while C can go from 0 to 100?
Good luck on this one!
-P
-
B, C, D and E are all user defined, ranging basically from 100% to 0%. Because they are independent, B + C + D + E do not have to add up to 100%, or 0%, or 300%, or any other value.
I would like to scale B,C,D and E such that the beginning and end ratios are all the same. That is, B/C will be the same before and after.
The more I look at the problem, the more I see that this will probably need some thought.
-
B, C, D and E are all user defined, ranging basically from 100% to 0%. Because they are independent, B + C + D + E do not have to add up to 100%, or 0%, or 300%, or any other value.
So, OK, something is amiss because right now I have:
All values are probabilities between 0-100%
Your original relationship is:
Prob A = (prob B ) or (prob C) or (prob D) or (prob E)
A, then, is the value of the highest prob in the list.
Which means that if B=20, C=10, D=50 and E=5 then A=50%, the value of D, yes?
Now you want A = 10%? Yet maintain the ratio balance between BCDE?
So the highest prob (D) must come down to 10%. The ratios with the others are just basic math. Reduce A (and thus D) to 1/5, reduce all values to 1/5. Divide everything by 5. A=10%, thus B=4%, C=2%, D=10%, E=1%.
I'll take the shvarz route here and say that something is missing and I do not understand the question.
-
Remember that Pr(A or B) = Pr(A) + Pr(B) - Pr(A) * Pr(B)
If you just pick the largest, you're assuming that all the other probabilities are inside each other.
That is, you're assuming if Pr(B) > Pr(C), then B implies (->) C, which just isn't true.
-
Probability of A = (B or C or D or E)
The original relationship is not as above?
What then is the formula?
If indeed the above is correct then 'A' MUST equal the highest value in the list.
-
A by definition is defined as B or C or D or E.
A
= B or C or D or E
= (B or C) or (D or E)
= (B + C - B*C) or (D + E - D * E)
= (B + C -B*C) + (D + E - D*E) - (D + E - D*E)*(B + C -B*C)
= B + C - B*C + D + E - D*E - DB - DC + DBC - EB - EC + EBC + DEB + DEC - DEBC
= B + C + D + E - BC - DE - BD - CD + BCD - BE - CE + BCE + BDE + CDE - BCDE
= B + C + D + E - BC - BD - BE - CD - CE - DE + BCD + BCE + BDE + CDE - BCDE
(check over my math)
that's when my mind explodes.
Also, I'd like the general case to the problem, not just this specific case with 4 variables. That is, I want A = OR (I = 1 to N) Xi, where N is variable and OR is the or of the whole sequence.
I know that it kind of follows a binomial coefficients distribution (for 4, 1 0*letter, 4 letter, 6 letter*letter, 4 letter*letter*letter, 1 letter*letter*letter*letter), which could be useful.
I know in the end you'll get N equations, N unknowns, and probably have to solve it with matrices.
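For checking an expansion like this with any N, here is a rough sketch, assuming all the Xi are independent. In the inclusion-exclusion sum the sign of a term depends only on how many letters it has: plus for an odd count, minus for an even count. The function names are invented for this example, and the values are the B=20, C=10, D=50, E=5 percentages used earlier in the thread, written as fractions:

```python
from itertools import combinations
from math import prod

def pr_any_inclusion_exclusion(probs):
    """Sum over every non-empty subset; sign is + for odd size, - for even size."""
    total = 0.0
    for k in range(1, len(probs) + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(probs, k):
            total += sign * prod(subset)
    return total

def pr_any_complement(probs):
    """1 minus the chance that none of the independent events happens."""
    return 1.0 - prod(1.0 - p for p in probs)

probs = [0.2, 0.1, 0.5, 0.05]   # B, C, D, E as fractions
print(pr_any_inclusion_exclusion(probs), pr_any_complement(probs))
```

Both functions should agree up to rounding. The complement form is the cheaper one, since it needs N multiplications instead of 2^N - 1 terms.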
-
Is this the mutation frequency calculation I was asking about? It actually would help if you described exactly the problem you are trying to solve.
-
Yeah shvarz, this is the flip side.
Say you want the chance of a mutated youngster to be 1 in 36. I want you to be able to enter 1 in 36 and have the program scale all the mutation probabilities for you automatically.
-
What I would need to know is whether B, C, D and E are independent events or if they are exclusive. The solution will be different in the following two scenarios:
1) You throw a die once and look at the number that comes out. B is 1, C is 2, D is 3 and E is 4.
2) You throw a die 4 times and B is "6 comes out on first try", C is "6 comes out on second try", D is "6 comes out on third try" and E is "6 comes out on fourth try".
See what I mean? We need to know what are events B, C, D, E.
I am also assuming that your event A is probability that any of these four events would happen.
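The two scenarios can be worked out exactly. This is only a sketch, assuming a fair six-sided die; the variable names are made up:

```python
from fractions import Fraction

# Scenario 1: one throw; B, C, D, E are "shows 1", "shows 2", "shows 3", "shows 4".
# The events are mutually exclusive, so the probabilities simply add.
p_exclusive = 4 * Fraction(1, 6)

# Scenario 2: four throws; each event is "a 6 on throw i".
# The events are independent, so take 1 minus the chance of no 6 at all.
p_independent = 1 - Fraction(5, 6) ** 4

print(p_exclusive, p_independent)
```

In the exclusive case, scaling every event by the same factor scales A by that factor. In the independent case it does not.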
-
They are independent. If they were exclusive, your original answer would work.
I'm solving the case for A = B or C right now. Maybe I can see a pattern.
-
Oh, OK.
Here is an easier way to do that:
Say I type in "1 in 36". This gives you the frequency of any mutation happening in the offspring: 1/36.
Then you have the ratios between different kinds of mutations. These are either set up by the user or are at some default ratios. Say Insertion:Deletion:Substitution is 1:1:3
Then you take the total of these ratios: 1+1+3 and normalize all frequencies to that.
So the frequency of Insertion is 1/36 x 1/5 = 1/180
Deletion is the same: 1/180
Substitution is 1/36 x 3/5=1/60
Would that work?
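The normalization described above could be sketched like this. The function name split_by_ratio and the dictionary keys are invented for the example, and it treats the overall frequency as a simple sum of the parts:

```python
from fractions import Fraction

def split_by_ratio(total, ratios):
    """Split `total` across categories in proportion to integer `ratios`."""
    weight_sum = sum(ratios.values())
    return {name: total * Fraction(w, weight_sum) for name, w in ratios.items()}

freqs = split_by_ratio(
    Fraction(1, 36),
    {"insertion": 1, "deletion": 1, "substitution": 3},
)
for name, f in freqs.items():
    print(name, f)
```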
-
No, that won't work.
In this case, the difference is .0002155 (or roughly 1 in 4639). Very tiny and quite negligible.
However, as you increase the number of mutation fields you're editing this will increase. Once you hit 16 it becomes quite noticeable.
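The .0002155 figure can be reproduced with a few lines, assuming the three mutation types are independent and use the 1/180, 1/180 and 1/60 frequencies from the previous post. The helper name pr_any is made up:

```python
from math import prod

def pr_any(probs):
    """Chance that at least one of several independent events happens."""
    return 1.0 - prod(1.0 - p for p in probs)

target = 1 / 36
per_type = [1 / 180, 1 / 180, 1 / 60]

# How far the real combined chance falls short of the requested 1 in 36.
shortfall = target - pr_any(per_type)
print(shortfall, 1 / shortfall)
```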
Here's the solution for a 2 mutation field case:
Ai = initial A
Bi = initial B
Af = final A, what we're trying to find, = Ai*Bf/Bi
Bf = final B, what we're trying to find
Bf = ( (Ai+Bi)/Bi +- sqrt( ((Ai+Bi)/Bi)^2 - 4*(Ai/Bi)*(New global probability) ) ) / ( 2*(Ai/Bi) )
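A sketch of that formula in code, assuming both starting probabilities are positive and the new global probability is between 0 and 1. The function name rescale_two and the sample numbers are made up. It takes the minus sign in front of the square root, since that is the root that stays between 0 and 1:

```python
from math import sqrt

def rescale_two(a_i, b_i, target):
    """Return (a_f, b_f) with a_f/b_f == a_i/b_i and a_f + b_f - a_f*b_f == target."""
    r = a_i / b_i                              # ratio to preserve
    disc = (r + 1) ** 2 - 4 * r * target       # discriminant of r*x^2 - (r+1)*x + target = 0
    b_f = ((r + 1) - sqrt(disc)) / (2 * r)     # smaller root lies in [0, 1]
    return r * b_f, b_f

a_i, b_i = 0.2, 0.1                            # made-up starting values
old_global = a_i + b_i - a_i * b_i
a_f, b_f = rescale_two(a_i, b_i, old_global / 5)
print(a_f, b_f, a_f + b_f - a_f * b_f)
```

The larger root of the quadratic is always 1 or more, so it never gives a usable probability.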
-
Hmm, I feel dumb, but I don't get it. I don't see why you have to do this. But even if you must, I am not sure you should.
If you do it my way, it is very easy to understand and saves on processing cycles. I think right now the mutation routine scans through every line of DNA code and decides whether it wants to mutate that line. This requires all those random numbers to be generated and some calculations. But if you do it my way, then the routine would do a quick single check on how many mutations the offspring should get, and you can get that from the "1 in 36" frequency using a Poisson distribution. If the number is 0, then the whole DNA is copied in one step, with no need to run the mutation routine at all. If the answer is 1, then it checks which mutation this will be and applies it to the DNA. If it is 2, then it runs the routine twice. At any realistic mutation frequency the probability of getting three mutations in the same offspring should be so low that it can be ignored completely.
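To put rough numbers on the Poisson idea, here is a sketch. It assumes "1 in 36" means the chance of at least one mutation per offspring, so the Poisson mean is picked to make P(0 mutations) = 35/36; using 1/36 directly as the mean would be another reasonable reading. The names are made up:

```python
from math import exp, factorial, log

def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

p_any = 1 / 36
lam = -log(1 - p_any)   # assumption: chosen so that P(0 mutations) = 1 - p_any

# Chance of 0, 1, 2 and 3 mutations in one offspring.
probs = [poisson_pmf(k, lam) for k in range(4)]
for k, p in enumerate(probs):
    print(k, p)
```

With these numbers the chance of exactly three mutations is a few in a million, which fits the remark that it can be ignored.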
-
We could do it that way, but in the end it's just a matter of how the program runs itself, not a User Interface Problem.
That is, you may still want to do minor adjustments in specific mutation areas.
The way it works right now I just multiply by 1.1 until the chance per offspring reaches a certain value. So as it stands it's not a huge hurdle.
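A guess at what that loop might look like, not the actual Darwinbots code. It assumes independent mutation fields, only handles raising the chance, and caps each probability at 1. The names nudge_up and pr_any and the starting values are invented:

```python
from math import prod

def pr_any(probs):
    """Chance that at least one of several independent events happens."""
    return 1.0 - prod(1.0 - p for p in probs)

def nudge_up(probs, target, factor=1.1):
    """Multiply every probability by `factor` until the combined chance reaches `target`."""
    probs = list(probs)
    # Stop once the target is reached, or when no value can still grow.
    while pr_any(probs) < target and any(0.0 < p < 1.0 for p in probs):
        probs = [min(1.0, p * factor) for p in probs]
    return probs

start = [0.002, 0.001, 0.005]      # made-up starting values
result = nudge_up(start, 1 / 36)
print(result, pr_any(result))
```

The ratios between fields are kept, but the result can overshoot the target by up to about 10%, since the step size is fixed.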
-
A
= B or C or D or E
= (B or C) or (D or E)
= (B + C - B*C) or (D + E - D * E)
= (B + C -B*C) + (D + E - D*E) - (D + E - D*E)*(B + C -B*C)
= B + C - B*C + D + E - D*E - DB - DC + DBC - EB - EC + EBC + DEB + DEC - DEBC
= B + C + D + E - BC - DE - BD - CD + BCD - BE - CE + BCE + BDE + CDE - BCDE
= B + C + D + E - BC - BD - BE - CD - CE - DE + BCD + BCE + BDE + CDE - BCDE
(check over my math)
ick. You're right. The baby's ugly.
I see where you're trying to go. Unfortunately, if I am reading this right, in a general case of n variables composing 'A' there will be n^2 solutions. The scaling requirement may be the key. That puts a bound on the number of solutions. I haven't cracked a matrix mechanics book in decades. I suppose now would be a good time to see if I can even find the &%#@ thing.
I'll try to hunt it down today, but, don't wait up for me.
Good God, the things you get into when you're having fun!
-P
-
Here's my try on the problem.
If B, C, D, E are independent and A=(B||C||D||E), then ~A=~B&~C&~D&~E (note: '~A' means 'not A') and P(A)=1-(1-P(B))(1-P(C))(1-P(D))(1-P(E)). To change P(A) and conserve the relative probabilities of B, C, D, E, you need to solve a quartic equation.
But I think it would be simpler to define a as the expectation value of the number of events. In that case a = P(B)+P(C)+P(D)+P(E) and it's straightforward to rescale the probabilities.
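Rather than solving the quartic by hand, the common scale factor can be found numerically for any number of events. This is a sketch using bisection, assuming independent events, at least one nonzero probability, and a target between 0 and 1. The function names are invented, and the sample values are the 20, 10, 50 and 5 percent figures from earlier in the thread:

```python
from math import prod

def pr_any(probs):
    """P(A) = 1 - (1 - P(B))(1 - P(C))... for independent events."""
    return 1.0 - prod(1.0 - p for p in probs)

def rescale_to_target(probs, target):
    """Find k with pr_any(k * p_i) == target by bisection; all ratios stay fixed."""
    lo, hi = 0.0, 1.0 / max(probs)   # at k = hi the largest probability is 1, so P(A) = 1
    for _ in range(100):             # fixed number of halvings, enough for double precision
        mid = (lo + hi) / 2
        if pr_any([mid * p for p in probs]) < target:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2
    return [k * p for p in probs]

probs = [0.2, 0.1, 0.5, 0.05]        # B, C, D, E
target = pr_any(probs) / 5           # want A at 1/5 of its original value
scaled = rescale_to_target(probs, target)
print(scaled, pr_any(scaled))
print(pr_any([p / 5 for p in probs]))  # plain division by 5, for comparison
```

Bisection works here because P(A) only goes up as the scale factor goes up. With the expectation-value definition no search is needed at all, since the sum scales linearly.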