Question for genetic wizards

Menhir · Feb 16, 2004

Oh - I see

Then you're using a different algorithm for doing the Memo?

I really didn't see the button and thought that you generated the Memo from the grid with another click or whatever.

So, OK, for a program that runs on everyone's own PC, performance isn't THAT important anyway. And looking at the posts, there seem to be people who work with Punnett squares. I missed that because I never used them, so I thought you used them to generate the "Memo".

~ggg~ I hope you understand what I mean.
But the User Interface looks quite nice and easy to use! Mine is embedded in pure HTML and won't be that nice or flexible :-/

Greetings from Germany

Menhir

P.S.: Which language did you use? Will there be only an M$ solution, or will it run under Linux etc.?
I also thought about programming this last summer, but I didn't want to program something just for Windows, and there aren't that many people using Linux, so the language would have had to be Java or something that runs with an interpreter..... but who knows how many people would accept that?

Marcel Poots · Feb 16, 2004

Menhir said:
P.S.: Which language did you use? Will there be only an M$ solution, or will it run under Linux etc.?
I also thought about programming this last summer, but I didn't want to program something just for Windows, and there aren't that many people using Linux, so the language would have had to be Java or something that runs with an interpreter..... but who knows how many people would accept that?

In Delphi (Pascal). Delphi can compile Pascal for Linux, but I made it for Windows. All snake lovers have Windows.

Menhir · Feb 16, 2004

Marcel Poots said:
All snake lovers have Windows

I think this answers the question of why there is only one buggy program for genetics available... ~hehe~

But back to reality: this was one reason why I decided to write my program in PHP with an HTML interface. No matter which OS, everyone can use it without any installation etc.

Perhaps I should have phrased it as a question: how does your algorithm for the "Memo" work (not code, just a short description)?

Marcel Poots · Feb 16, 2004

Basically the same. The user checks the checkboxes, then I sum up all the genes. I have a male and a female, each with a set of genes: a set of dominant genes and a set of het genes. Then, when the user presses the Calc button, I make all possible combinations of the genes and count how many times each result appears in the result list. I then translate that into language the user will understand.
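A minimal Python sketch of the "make all combinations and count" approach described above, assuming a genotype is written as a string of two one-letter alleles per locus (e.g. "AaBb"); the function names are illustrative and not taken from the Delphi program:

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """Every gamete of a genotype such as "AaBb" (assumed format: two
    one-letter alleles per locus). Each combination appears once, so all
    gametes carry equal weight."""
    loci = [genotype[i:i + 2] for i in range(0, len(genotype), 2)]
    return ["".join(choice) for choice in product(*loci)]


def cross(father, mother):
    """Pair every sperm with every egg (the cells of the Punnett square)
    and count how often each offspring genotype occurs."""
    counts = Counter()
    for sperm in gametes(father):
        for egg in gametes(mother):
            # sorted() puts the upper-case (dominant) letter first,
            # so "aA" and "Aa" are counted as the same result.
            child = "".join("".join(sorted(s + e)) for s, e in zip(sperm, egg))
            counts[child] += 1
    return counts


if __name__ == "__main__":
    for genotype, n in sorted(cross("AaBb", "AaBb").items()):
        print(f"{n}/16 {genotype}")
```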

Menhir · Feb 16, 2004

OK, I understand the thing with the list. So, indirectly, you are using the square to generate the Memo.
But I think it is very slow to check the list for all possibilities. My guess is that you take the first "combination" in the list and count all its occurrences (going through the whole rest of the list), and while counting you take the equal ones out of the list. Then the next type is in front of you, etc. etc.
I think that's plenty of comparisons for a 256-cell Punnett square, isn't it?
My algorithm is tree-recursive, but the tree only has as many branches as there are different results from the pairing. And because I only handle the commonly used pairings and don't split the branches much, the tree isn't very branched out either.

But this is only my guess. I'm very interested in the algorithm you are using, because I also thought of doing it with a two-dimensional array. For small problems any algorithm is fast, but for bigger squares the problem grows quickly. My idea was to run through the square once and push every entry into a hash; then I just have to count the buckets, and the number of entries in each bucket gives the ratio.
So I would be really glad to hear how you solved the problem!
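To make the difference concrete, here is a small Python sketch (hypothetical helper names) of the two counting strategies mentioned above: the guessed "take the first entry, scan, remove" loop, which does on the order of n × k comparisons for n cells and k distinct results, and the single pass into a hash (a Python dict), which touches each cell once:

```python
def count_by_scanning(results):
    """The guessed approach: take the first entry, count its copies in the
    rest of the list, remove them, repeat. About n * k comparisons for
    n cells and k distinct results."""
    remaining = list(results)
    counts = {}
    while remaining:
        first = remaining[0]
        counts[first] = sum(1 for r in remaining if r == first)
        remaining = [r for r in remaining if r != first]
    return counts


def count_by_hashing(results):
    """One pass over the square: each cell lands in its hash bucket, and
    the bucket sizes are the ratios."""
    counts = {}
    for r in results:
        counts[r] = counts.get(r, 0) + 1
    return counts
```

Both return the same counts; for a 256-cell square either finishes instantly, but only the hashed version stays linear as the square grows.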

Greetings

paulh · Feb 16, 2004

For the algorithm, check out a recent genetics text for the branching system. A Punnett square is useful only as a teaching tool. The branching system is much better for routine use, especially for two or more loci.

A few years ago I hacked out such a program for 10 loci. The branching system can be computerized using a series of arrays and nested loops. There are no comparisons. My program's human-machine interface was not satisfactory, so I never released the program. But I did use external files for the loci, so merely selecting the correct file changed it from a corn snake genetics program to a king snake program to a mouse program, etc. My program would also do sex linkage, and I'm not aware of any others that can.

BTW, a decent genetics program should be able to handle multiple alleles, not just 2. In some species there may be a dozen or even more alleles known for a single locus. So it is quite possible that there could be four different alleles in one mating.
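Nothing in FOIL depends on there being only two alleles. A Python sketch, assuming each parent's locus is given as a pair of allele names (the names c1 to c4 below are placeholders, not real allele symbols):

```python
from collections import Counter
from fractions import Fraction


def foil(father, mother):
    """FOIL for one locus with any number of allele names in the species.
    father and mother are pairs such as ("c1", "c2"); the result maps each
    unordered genotype to its probability."""
    counts = Counter()
    for f in father:        # First, Outside, Inside, Last:
        for m in mother:    # every father allele meets every mother allele
            counts[tuple(sorted((f, m)))] += 1
    return {genotype: Fraction(n, 4) for genotype, n in counts.items()}


if __name__ == "__main__":
    # Four different alleles in one mating: four results, 1/4 each.
    print(foil(("c1", "c2"), ("c3", "c4")))
```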

I'll try to get back and post more in this thread later.

Serpwidgets · Feb 16, 2004

We're going off topic, but I love the programming aspect of this, too, because there are a million ways to do anything.

If you insist on building an actual Punnett square, here's one way to do it:

For example, use the cross "AABbCcdd X AabbCcDd"

FatherGenotype as a string is "AABbCcdd"
MotherGenotype as a string is "AabbCcDd"

To build the sperm/eggs (the top and left) of the square you can do this for each parent:

Assign a "base genotype" using the homozygous loci (same letters, "AA" or "aa") of that parent. For the father this is "Ad" (every sperm will always carry "Ad").

In pseudo-code:

Code:

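BaseGenotype = ""
NewFatherGenes = ""

for i = 1 to len(FatherGenotype) step 2
     a = mid$(FatherGenotype,i,1)
     b = mid$(FatherGenotype,i+1,1)

     if  a == b {                    // homozygous locus: goes into every sperm
          BaseGenotype += a }
     else {                          // het locus: keep both letters for later
          NewFatherGenes += a + b}

next i

FatherGenotype = NewFatherGenes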

You have the BaseGenotype string which is now "Ad"

You've reduced the father's genotype to remove the "AA" and "dd", so that it's now:
"BbCc"

With the remaining FatherGenotype string, each sperm type is made by starting with the BaseGenotype, then adding the other letters. It now consists of two letters in the "B's place" and two letters in the "C's place." You need to run every possible combination of B and C.

Count from 0 to (2^(len/2)) - 1 ... (2 to the power of half the length of the string, minus one.) With the father, you would count to (2^2) - 1, or "0 to 3."

Since we only have "het" stuff, the letter for a "place" is always either a capital or small letter. The digit in each place (1's, 2's, 4's, 8's, 16's, etc) is always either a 0 or 1. How convenient!

Use AND to check the binary digit. If it's 0, you add the first letter for that place, otherwise you add the second letter for that place.

Your count goes:
00
01
10
11

The 1's digit corresponds to the "B" place, and goes:
0 = "B"
1 = "b"
0 = "B"
1 = "b"

The 2's digit corresponds to the "C" place, so it goes:
0 = "C"
0 = "C"
1 = "c"
1 = "c"

So you end up with the base genotype, plus what you got by "counting" in binary, to get this:
00 = AdBC
01 = AdbC
10 = AdBc
11 = Adbc

In pseudo-code, it would be like this:

Code:

Sperm() as String, Count as Int, Place as Int
NumHet = len(FatherGenotype) / 2      // number of het loci left ("BbCc" -> 2)

For Count = 0 to (2 ^ NumHet) - 1
     Sperm(Count) = BaseGenotype
     For Place = 0 to NumHet - 1
          if (Count AND (2 ^ Place)) = 0 {
               Sperm(Count) +=  mid$(FatherGenotype, (Place * 2) + 1, 1) }   // first letter of this locus
          else {
               Sperm(Count) +=  mid$(FatherGenotype, (Place * 2) + 2, 1) }   // second letter of this locus
     next Place
next Count

You can then sort the letters of each one alphabetically by locus if you want, to get this:
ABCd
AbCd
ABcd
Abcd
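The reduce-then-count-in-binary steps above can be sketched in Python as follows, assuming one-letter alleles and two letters per locus; sperm_types is a hypothetical name:

```python
def sperm_types(genotype):
    """Gametes of one parent, assuming two one-letter alleles per locus.
    Homozygous loci go into a base string shared by every gamete; for the
    heterozygous loci, bit `place` of `count` picks the first (0) or
    second (1) letter of that locus."""
    base = ""
    het = []
    for i in range(0, len(genotype), 2):
        a, b = genotype[i], genotype[i + 1]
        if a == b:
            base += a
        else:
            het.append(a + b)
    types = []
    for count in range(2 ** len(het)):   # 0 .. 2^(number of het loci) - 1
        sperm = base
        for place, pair in enumerate(het):
            sperm += pair[(count >> place) & 1]
        types.append(sperm)
    return types


if __name__ == "__main__":
    print(sperm_types("AABbCcdd"))   # base "Ad" plus B/b and C/c
```

Sorting each result with `"".join(sorted(s, key=str.lower))` reorders the letters by locus, so "AdBC" becomes "ABCd".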

The important thing is: at this point you've got all possible sperm from the dad, which is your row across the top. Do the same for the column down the left to get all possible eggs. Once you've done those, you only have to fill in the Punnett square:

Code:

for x = 0 to SpermCount - 1
for y = 0 to EggCount - 1
     for i = 1 to len(sperm(0))
          Punnett(x,y) += mid$( Sperm(x), i, 1) + mid$( Egg(y), i, 1)
     next i
next y
next x

Since everything is recessive, you can easily say
"" (for "XX")
"Het for trait" (for "Xx" or "xX")
"trait" (for "xx")

At the end, if the "description" is an empty string, change it to "Normal."
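The interpretation step can be sketched like this, under the same everything-is-recessive assumption; the `names` dictionary (mapping a locus letter to a trait name such as "amel") is illustrative only:

```python
def describe(genotype, names=None):
    """Words for one Punnett cell such as "aaBb", assuming every trait is
    simple recessive. `names` (illustrative) maps a locus letter to a trait
    name; without it the lower-case letter is used."""
    names = names or {}
    visual, het = [], []
    for i in range(0, len(genotype), 2):
        pair = genotype[i:i + 2]
        trait = names.get(pair[0].lower(), pair[0].lower())
        if pair.isupper():
            continue                      # "XX": nothing to report
        if pair.islower():
            visual.append(trait)          # "xx": shows the trait
        else:
            het.append("het " + trait)    # "Xx" or "xX": carrier
    words = visual + het
    return " ".join(words) if words else "Normal"
```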

It may or may not execute faster, but it's sure easy to code.

I'm curious about the recursive tree method you mentioned. Is this like the method Hurley explained? If not, can you explain a bit about how it works?

Serpwidgets · Feb 16, 2004

Heh, it's always tough for me to learn anything if it's in "textbook" language.

(So I'm probably saying the same thing in different language...)

I'm guessing that the branching system can be optimized because there can still only be a maximum of four different alleles at any one locus. (Two in the mother, two in the father.)

So, for one locus, if the cross is:
12 X 34

Only these can happen:
(First) 13
(Outside) 14
(Inside) 23
(Last) 24

This means it could be worked from a table, which is a quick and easy calculation for each locus, using "FOIL."

If you do this at each locus, you get either 1, 2, or 4 possible outcomes depending on which alleles are there. (You could always ignore this and just use 4 for each locus, but that could get very inefficient with lots of loci.)

Then you could use a counting system which has variable-base digits (base 1, 2, or 4 for each digit) and just "count" through the results again. That would be "a series of arrays inside nested loops."
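The variable-base counting idea can be sketched as an odometer in Python: each digit has its own base (1, 2 or 4 results for that locus), the lowest digit changes fastest, and overflow carries into the next digit. count_mixed_radix is a hypothetical name:

```python
def count_mixed_radix(bases):
    """Yield every digit tuple of a number whose digit i runs from 0 to
    bases[i] - 1 (1, 2 or 4 results per locus). The lowest digit changes
    fastest and overflow carries into the next digit, like an odometer."""
    digits = [0] * len(bases)
    while True:
        yield tuple(digits)
        i = 0
        while i < len(bases):
            digits[i] += 1
            if digits[i] < bases[i]:
                break
            digits[i] = 0    # this digit overflowed: reset it and carry
            i += 1
        else:
            return           # the highest digit overflowed: all done


if __name__ == "__main__":
    for number in count_mixed_radix([2, 1, 4]):   # 2 * 1 * 4 = 8 outcomes
        print(number)
```

Python's `itertools.product(*(range(b) for b in bases))` visits the same tuples, though with the last digit changing fastest instead of the first.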

It's probably the same concept as what you mentioned, Paul.

Am I on the right track?

I agree that the human/computer interface is the biggest challenge of the project. Once I found an easy way to calculate it in my head, it became tedious trying to make it give an answer I could already get by myself in a few seconds.

Blegh, adding sex linkage might make things messy. I'll have to vegetate on that one, hehe.

paulh · Feb 16, 2004

We are definitely together through FOIL for a single locus. BTW, sex linkage is also FOIL. Here's FOIL for sex linkage in mammals, with a heterozygous female mated to a male:

Aa x aY -->
1/4 Aa
1/4 AY
1/4 aa
1/4 aY

Anything with a Y chromosome is a male.

Here's FOIL for sex linkage in birds and colubrid snakes, with a heterozygous male mated to a female:

Aa x aW -->
1/4 Aa
1/4 AW
1/4 aa
1/4 aW

Anything with a W chromosome is female.
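Both examples are ordinary FOIL with the sex-limited chromosome treated as one more "allele". A Python sketch, assuming the marker is written as the letter Y (mammals) or W (birds, colubrids), so those letters must not also be used as gene symbols; the function names are hypothetical:

```python
from collections import Counter
from fractions import Fraction


def sex_linked_cross(parent1, parent2, marker="Y"):
    """FOIL for one sex-linked locus. Each parent is two characters; the
    chromosome without the locus is written as `marker` ("Y" in mammals,
    "W" in birds and colubrids), e.g. "Aa" x "aY"."""
    counts = Counter()
    for x in parent1:
        for y in parent2:
            # Alleles sorted upper-case first; the marker always goes last.
            child = "".join(sorted(x + y, key=lambda c: (c == marker, c)))
            counts[child] += 1
    return {g: Fraction(n, 4) for g, n in counts.items()}


def sex_of(genotype, marker="Y"):
    """Y-bearing offspring are male (XY); W-bearing offspring are female (ZW)."""
    if marker == "Y":
        return "male" if "Y" in genotype else "female"
    return "female" if "W" in genotype else "male"
```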

I'm pretty sure we are together through the rest. Here's a branching system for Aa Bb x Aa Bb:

a locus: Aa x Aa -->
1/4 AA
1/4 Aa
1/4 aA
1/4 aa
which reduces to 1/4 AA, 2/4 Aa, 1/4 aa

b locus: Bb x Bb --> a result like the a locus, which reduces to 1/4 BB, 2/4 Bb, 1/4 bb.

Put the a locus results in one array and the b locus results in another array. Loop through both arrays with the b loop nested inside the a loop. At each step multiply the fractions. Result:

1/16 AA BB
2/16 AA Bb
1/16 AA bb
2/16 Aa BB
4/16 Aa Bb
2/16 Aa bb
1/16 aa BB
2/16 aa Bb
1/16 aa bb
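The branching system for several loci, sketched in Python with exact fractions: one result table per locus, then nested loops (here `itertools.product` stands in for writing one loop per locus) that multiply the fractions. Genotypes are assumed to be written with two one-letter alleles per locus, and the function names are hypothetical:

```python
from fractions import Fraction
from itertools import product


def locus_results(pair1, pair2):
    """FOIL one locus, e.g. "Aa" x "Aa" -> AA 1/4, Aa 2/4, aa 1/4."""
    results = {}
    for x in pair1:
        for y in pair2:
            genotype = "".join(sorted(x + y))   # "aA" -> "Aa"
            results[genotype] = results.get(genotype, 0) + Fraction(1, 4)
    return results


def branching(father, mother):
    """One result array per locus, then loop through all of them
    (product() replaces one nested loop per locus), multiplying fractions."""
    loci = [locus_results(father[i:i + 2], mother[i:i + 2])
            for i in range(0, len(father), 2)]
    outcome = {}
    for combo in product(*(locus.items() for locus in loci)):
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        outcome[" ".join(g for g, _ in combo)] = prob
    return outcome


if __name__ == "__main__":
    for genotype, p in branching("AaBb", "AaBb").items():
        print(p, genotype)
```

Fractions print in lowest terms, so 2/16 appears as 1/8.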

Getting all the possible combinations is overkill, though, if I mate Aa BB cc Dd Ee Ff x AA Bb Cc dd EE ff and just want to know the expected fraction of babies that are Aa BB cc dd Ee ff.
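For a single target genotype the loops are unnecessary if the loci are unlinked: multiply the per-locus fractions. A Python sketch, assuming space-separated loci as in the post (target_fraction is a hypothetical name):

```python
from fractions import Fraction


def target_fraction(father, mother, target):
    """Expected fraction of one target genotype, assuming unlinked loci
    written space-separated ("Aa BB cc ..."). Each locus is FOILed on
    its own and only the matching results are kept."""
    prob = Fraction(1)
    for f, m, t in zip(father.split(), mother.split(), target.split()):
        hits = sum(1 for x in f for y in m if sorted(x + y) == sorted(t))
        prob *= Fraction(hits, 4)
    return prob


if __name__ == "__main__":
    print(target_fraction("Aa BB cc Dd Ee Ff", "AA Bb Cc dd EE ff",
                          "Aa BB cc dd Ee ff"))
```

For that cross each of the six loci contributes 1/2, so the expected fraction is (1/2)^6 = 1/64.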

Serpwidgets · Feb 16, 2004

Ooh, how about this:

Assign a "gene" to each binary digit of each parent...

That is, make a string array for the precalculations (memory is cheap, these aren't Commodore 64s or Apple IIes anymore, wooowoo!)

Given:
Father = "AABbCcdd"
Mother = "AabbCcDd"

(yay, the "PHP" tag makes pretty colors for my pseudocode!)

PHP:

// variables
NumLoci =  len(father) / 2     // integer
NumPoss( NumLoci )             // highest result index for each locus (number of results - 1)
Poss ( NumLoci, 3 )            // for each locus, each possible result

for i = 1 to NumLoci
    // mid$ is 1-based: locus i sits at positions 2i-1 and 2i
    f1 = mid$(father, (i*2)-1, 1)
    f2 = mid$(father, i*2, 1)
    m1 = mid$(mother, (i*2)-1, 1)
    m2 = mid$(mother, i*2, 1)

    if f1 == f2 { //father is homozygous
        if m1 == m2 { //both parents are homozygous, only one result
            NumPoss(i) = 0
            Poss(i, 0) = f1 + m1 }
        else { //father is homozygous, mother is het, two results
            NumPoss(i) = 1
            Poss(i, 0) = f1 + m1
            Poss(i, 1) = f1 + m2 } }

    else { //father is heterozygous
        if m1 == m2 { // mother is homozygous, two results
            NumPoss(i) = 1
            Poss(i, 0) = f1 + m1
            Poss(i, 1) = f2 + m1 }
        else { // both parents are het, four results
            NumPoss(i) = 3
            Poss(i, 0) = f1 + m1
            Poss(i, 1) = f2 + m1
            Poss(i, 2) = f1 + m2
            Poss(i, 3) = f2 + m2 } }

next i

Of course if you're using more than one letter per allele, you'd have to adjust the code for that.

This loop runs only once per locus, so it executes extremely fast, and it uses only four little strings and one integer per locus, a tiny amount of memory.
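The same precalculation in Python, where each two-result case takes its two results from the heterozygous parent's two letters (possibility_table and total_outcomes are hypothetical names; one-letter alleles assumed):

```python
def possibility_table(father, mother):
    """For each locus, the distinct results of the cross: one if both
    parents are homozygous, two if one is, four if both are het.
    Assumes one-letter alleles, two letters per locus."""
    table = []
    for i in range(0, len(father), 2):
        f1, f2 = father[i], father[i + 1]
        m1, m2 = mother[i], mother[i + 1]
        if f1 == f2 and m1 == m2:
            poss = [f1 + m1]
        elif f1 == f2:               # only the mother is het
            poss = [f1 + m1, f1 + m2]
        elif m1 == m2:               # only the father is het
            poss = [f1 + m1, f2 + m1]
        else:                        # both het: FOIL
            poss = [f1 + m1, f2 + m1, f1 + m2, f2 + m2]
        table.append(poss)
    return table


def total_outcomes(table):
    """Number of combinations to iterate: the product of the table sizes."""
    total = 1
    for poss in table:
        total *= len(poss)
    return total
```

For AABbCcdd X AabbCcDd the loci give 2, 2, 4 and 2 results, so total_outcomes returns 32.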

Then you need to iterate every possible result... the number of different results is:

PHP:

TotalOutcomes = 1
for i = 1 to NumLoci
    TotalOutcomes *= ( Numposs(i) + 1 )
next i

Like Paul said, nested loops and arrays can be used to "count" your way through the outcomes. I'm not sure how anyone else would implement this, but for me it's easiest just to "count" through the combinations.

In short, all you do is:
What genotype is made from the current number?
What's the next number?
Until you've "counted" every number.

PHP:

//variables
Locus, OffNum, k      // integers
Offspring ( TotalOutcomes ) // string
i ( NumLoci )               // integer
//  each i() is one "digit" in the number.

for k = 1 to NumLoci
    i(k) = 0 //initialize to 0
next k

OffNum = -1


do {
    OffNum += 1

    for Locus = 1 to NumLoci 
        Offspring(OffNum) += Poss( Locus, i(Locus) ) // add genotype at this locus
        //notice that i(Locus) changes with the digits in the number we are counting
    next Locus    
    // one offspring's genotype has been created as a string.

    // increment the "number" you are using to count
    i(1) += 1 //standard counting: increase the lowest digit

    for k = 1 to NumLoci - 1 // then "carry" any overages to the next digit...
        if i(k) > NumPoss(k) {
            i(k) = 0      // reset this digit to zero
            i(k+1) += 1 } // increment the next digit
    next k
    
loop until i(NumLoci) > NumPoss(Numloci)
// will loop until the high-order digit passes its limit.
// basically like counting to "99999" with a five-digit (base 10) number.

What you do is "count" in a variable-base-by-digit number system to iterate every possible combination (no more, no less) and for each "number" you slap a genotype together from the "Poss" array that was constructed in the first part.
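In Python the variable-base count does not have to be written by hand: `itertools.product` walks every combination of the per-locus lists (with the last locus changing fastest rather than the first). A sketch that also merges orderings like "cC" and "Cc", using a per-locus table written out by hand for AABbCcdd X AabbCcDd; the names TABLE and tally are hypothetical:

```python
from collections import Counter
from itertools import product

# Per-locus results for "AABbCcdd" x "AabbCcDd", written out by hand.
TABLE = [["AA", "Aa"], ["Bb", "bb"], ["CC", "cC", "Cc", "cc"], ["dD", "dd"]]


def tally(table):
    """Visit every combination of per-locus results (itertools.product does
    the variable-base counting) and count each offspring genotype."""
    counts = Counter()
    for combo in product(*table):
        # sorted() puts the upper-case letter first: "cC" -> "Cc"
        child = "".join("".join(sorted(pair)) for pair in combo)
        counts[child] += 1
    return counts


if __name__ == "__main__":
    for genotype, n in sorted(tally(TABLE).items()):
        print(f"{n}/32 {genotype}")
```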

I'm curious if the "branching" system is more or less efficient, or the same thing.

(Then you need to actually interpret those genotypes into human language. That's the annoying part, hehe. It could be done within the above loop, or after, in another loop.)

Serpwidgets · Feb 16, 2004

paulh said:
We are definitely together through FOIL for a single locus. BTW, sex linkage is also FOIL. Here's FOIL for sex linkage in mammals, with a heterozygous female mated to a male:

Aa x aY -->
1/4 Aa
1/4 AY
1/4 aa
1/4 aY

Anything with a Y chromosome is a male.

I assume that anything paired with the "Y" means there is no corresponding locus on the Y chromosome, so the other one (on the X) is always expressed, right?

Is it the case that everything on a Y chromosome is also on the X chromosome? Or are there also loci on Y which aren't on X?

Menhir · Feb 17, 2004

OK, nice solutions

I planned to use my program just for the corn snake genetics aspects, so I think I can save some branches that are uninteresting for breeders.

First, I used integer arrays. The position in the array tells me the gene (you used A, B, C etc.), so at the end I only do a lookup in an array like ("Anery", "Caramel", ....). (So changing this array will make it usable for ball pythons or whatever else is dominant/recessive.)
So, 1 stands for homozygous, 2 for het, 0 for nothing.

If I programmed it to be genetically exact, there would be some cases with lots of branches.

Look at this: male (2, 0) & female (2, 2), so let's say male normal het Amel, female normal het Amel & Anery.

My tree only has 2 branches. Why? OK, both are het Amel, which means 1/4 of the animals are homozygous (push 1 into the result) and 3/4 are possible het 66% (push 4 into the result).
So I split the tree and weight the branches with 3/4 and 1/4.
Then only het Anery is left, so there is no split, and I push 3 (possible het 50%) into the result.

Because every branch saves its weight, I can unshift it into the result array in the last step and then push the result array into my two-dimensional result array.

Then you just have to print like this

PHP:

for [i ...]
  shift(RESULT[i][0]) // ratio

    for [j ...] // once each for 1, 2, 3 and 4 (homo, het, poss. het 50% & poss. het 66%)

       shift(RESULT[i][j]) // names

Because PHP is a scripting language, it goes through the tree exactly as I tell it to. So if you make your function calls after a split in the right order, you don't have to sort anything at the end: the branches are evaluated sequentially, so they also push their results sequentially into RESULT[][]. This would also be possible in other languages like Scheme etc.

So, what comes out is
(4, 3) => Normal poss. het 66% Amel, poss. het 50% Anery
(1, 3) => Amel poss. het 50% Anery

So, to be genetically exact, I would have 16 branches, but why should I calculate them when I can't tell them apart by eye?
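Menhir's visually-collapsed tree, sketched in Python under his coding (0 nothing, 1 homozygous/visual, 2 het, 3 possible het 50%, 4 possible het 66%) and assuming every trait is simple recessive and each parent is coded 0, 1 or 2. The outcome table, function names and label wording are this sketch's own, not his PHP:

```python
from fractions import Fraction

# Recessive locus, grouped by what a breeder can tell apart. Codes as in the
# post: 0 nothing, 1 visual (homozygous), 2 het, 3 poss. het 50%,
# 4 poss. het 66% (really 2/3). Parents are assumed to be coded 0, 1 or 2.
OUTCOMES = {
    (0, 0): [(Fraction(1), 0)],
    (0, 1): [(Fraction(1), 2)],
    (0, 2): [(Fraction(1), 3)],
    (1, 1): [(Fraction(1), 1)],
    (1, 2): [(Fraction(1, 2), 1), (Fraction(1, 2), 2)],
    (2, 2): [(Fraction(1, 4), 1), (Fraction(3, 4), 4)],
}
LABELS = {1: "{}", 2: "het {}", 3: "poss. het 50% {}", 4: "poss. het 66% {}"}


def tree(male, female, locus=0, weight=Fraction(1), codes=()):
    """One recursion level per locus, one branch per visibly different
    outcome; each branch carries its weight. Returns [(weight, codes)]."""
    if locus == len(male):
        return [(weight, codes)]
    key = tuple(sorted((male[locus], female[locus])))
    branches = []
    for w, code in OUTCOMES[key]:
        branches += tree(male, female, locus + 1, weight * w, codes + (code,))
    return branches


def label(codes, names):
    """Turn a code tuple into words; "Normal" leads when nothing is visual."""
    words = [LABELS[c].format(n) for c, n in zip(codes, names) if c]
    if 1 not in codes:
        words.insert(0, "Normal")
    return " ".join(words)


if __name__ == "__main__":
    for weight, codes in tree((2, 0), (2, 2)):
        print(weight, label(codes, ("Amel", "Anery")))
```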

For bigger problems, e.g. with more alleles, I also thought of doing it OO style, with one object for every possibility, and then broadcasting the next steps to all the objects. I think this would be nice code, but I haven't thought about the speed aspect yet.

Marcel Poots · Feb 17, 2004

Menhir said:
But, this is only my guess, I'm very interested in the algorithm you are using, because I also thought of doing the thing with a 2 dimensional array. For small problems, the algorithms may be fast, but for bigger squares the problem grows fast. My thought was running through the square one time, and pushing every entry into a hash - then I just have to count the buckets and the number of entrys in every bucket for the rate.
So, I would be really glad to hear how you solved the problem!

Greetings

Speed is not really a problem. I am working on the algorithm, and it could probably calculate a 5-trait father X 5-trait mother about 100,000 times per second. I will show you my algorithm as soon as I am finished.

Menhir · Feb 17, 2004

~ggg~ Sometimes I wish someone could translate what I wrote back into understandable German... I bet I often don't say what I meant to say.

I'm not thinking about speed; especially on a Pentium IV workstation, you can really write an algorithm that checks every cell and compares it to every other one without making the user wait. I'm more interested in what happens to the running time when we double the input, or things like that.

Perhaps this is a disease you catch after studying computer science for 2.5 years. Programming is boring; the interesting part is optimizing, or figuring out which algorithm is faster for which problems, or which uses less memory, etc.

So, don't be afraid, I won't copy the algorithm

One thing I would like to know: what do you do with Motley & Stripe? Simply dominant/recessive?
I did it the way pewter has it on his site, and it was really ugly to embed into my algorithm. The source code is now 200% of what it was before :-(

[Edit] ~lol~ I wrote the German word "oder" (meaning "or") throughout the whole post...[/Edit]

Marcel Poots · Feb 17, 2004

Menhir said:
So, don't be afraid, I won't copy the algorithm

LOL, I won't be afraid of that, since I will make the whole program, including the code, available. The algorithm will be something along these lines:

Length of string 'AaBbCC' = 6, so there are three traits. Each one passes on one of its two letters, so we have 2 ^ 3 = 8 possibilities:

0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1

Each zero or one stands for a choice: the first or the second letter of that trait.

So to make the 8 combinations it would be something like:

for MyCounter := 0 to (1 shl (Length(aString) div 2)) - 1 do

(MyCounter shr j) and 1 <- for trait j, use this bit to pick letter 1 or 2

end;

Marcel Poots · Feb 17, 2004

Okay Menhir,

Here is the algorithm for building the sets of the father and the sets of the mother:

Code:

procedure TForm1.Button1Click(Sender: TObject);
var
  MyList : TStringList;
  i, j, tmp : Integer;
  S : String;
begin
  Memo1.Lines.Clear;
  MyList := TStringList.Create;
  try
    for i := 0 to (Length(Edit1.Text) div 2) - 1 do
      MyList.Add(Copy(Edit1.Text, i * 2 + 1, 2)); // This will add 'Aa', 'Bb' as strings to the list
    // One gamete per counter value: 2^(number of loci) in total.
    // Floor and Power come from the Math unit, so it must be in the uses clause.
    for i := 0 to Floor(Power(2, Length(Edit1.Text) div 2)) - 1 do
    begin
      tmp := i;
      S := '';
      for j := 0 to MyList.Count - 1 do // For each 'Aa' .. 'Cc'
        // And here is the trick!! Bit j of tmp is 0 or 1, so +1 picks
        // letter 1 or letter 2 of locus j (Delphi strings are 1-based).
        S := S + MyList.Strings[j][(tmp shr j and 1) + 1];
      Memo1.Lines.Add(S);
    end;
  finally
    MyList.Free;
  end;
end;
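A direct Python port of the trick in the Delphi routine, for comparison. Because it counts over every locus, homozygous ones included, "AaBbCC" gives 8 strings of which only 4 are distinct; the duplicates carry equal weight, so counts stay correct, but Serpwidgets' base-genotype step avoids generating them. marcel_gametes is a hypothetical name:

```python
def marcel_gametes(genotype):
    """Port of the Delphi loop: counter value n gives one gamete, and bit j
    of n picks letter 1 or 2 of locus j, for every locus (homozygous ones
    included). Assumes two one-letter alleles per locus."""
    loci = [genotype[i:i + 2] for i in range(0, len(genotype), 2)]
    return ["".join(pair[(n >> j) & 1] for j, pair in enumerate(loci))
            for n in range(2 ** len(loci))]


if __name__ == "__main__":
    gametes = marcel_gametes("AaBbCC")
    print(len(gametes), sorted(set(gametes)))
```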

cowtownherper · Feb 17, 2004

That wasn't a sonic boom, just my head exploding

paulh · Feb 17, 2004

Mammalian Y chromosome

Serp, as I understand it, the vast majority of the genes on the X chromosome have no corresponding gene on the Y chromosome. In such cases the gene on the X chromosome is expressed.

One or more genes on the Y chromosome cause Y-bearing zygotes to develop into males. This genetic material is not present on the X chromosome. Apparently birds do not have a male-causing gene but depend on the number of Z chromosomes present to determine whether the zygote becomes male or female. Colubrid snakes may have a mechanism similar to birds.

There was a piece on the evolution of the Y chromosome in Scientific American within the past year. You might want to take a look at it.

CAV · Feb 17, 2004

I'm sure there are many "x chromosome" humans that would readily agree that something is definitely missing from "y chromosome" humans. This would explain why we have contact sports, chili cook-offs, NASCAR and bungee jumping.

Menhir · Feb 18, 2004

@Marcel oomph...

I was reading Serp's postings from the beginning of the thread, where he gives the example with the cards.
So, it's possible that I'm too blind to see what you're doing, but does this algorithm only set up the "cards" each individual can pass on to the next generation, or are you already calculating the possibilities at the end? ....ohm, I always hated Pascal, or perhaps I should go back to sleep for another hour or so...
