I'm actually very familiar with this. However, on most modern computers with a math coprocessor, the hardware square root operation is going to be a lot faster than almost anything you can write yourself, including a custom inverse square root routine.
Or that's my experience, anyway. Where this sort of trick mainly matters is when you're building hardware; I'd expect video cards to use something like it.