The mathematics for handling things like collisions in 4D already exists, so clearly the problem is in conveying the information to the observer.
What is available on the screen to denote spatial dimensions? Well, you have x,y, color, and distortion.
Clearly you can devise an arbitrary set of rules for how to interpret 4D, the same way you can create an arbitrary set of rules to interpret 3D on the 2D screen.
Okay, imagine this:
x,y position -> x,y position of graphic
size -> z position
darkness -> w (the 4th dimension variable). Say a bot's color is red. A low w value would result in a deep crimson; a high w value would be pinkish.
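Here's a rough sketch of that mapping, just to make it concrete. Everything specific in it is my own assumption: the name `to_sprite`, the base size and scale factor, the exact crimson and pink RGB values, and the idea that w is clamped to a known range.

```python
def to_sprite(x, y, z, w, w_min=0.0, w_max=1.0):
    """Map a 4D position (x, y, z, w) to 2D drawing parameters.

    Hypothetical helper: the constants below are illustrative only.
    """
    # Assumption: larger z means a larger graphic.
    size = 10.0 + 5.0 * z

    # Normalize w into [0, 1], clamping values outside the assumed range.
    t = (w - w_min) / (w_max - w_min)
    t = min(1.0, max(0.0, t))

    # Low w -> deep crimson, high w -> pinkish (assumed RGB endpoints).
    crimson = (120, 0, 20)
    pink = (255, 180, 190)
    color = tuple(round(lo + (hi - lo) * t) for lo, hi in zip(crimson, pink))

    return {"x": x, "y": y, "size": size, "color": color}


low = to_sprite(3.0, 4.0, 0.0, 0.0)   # smallest size, deepest crimson
high = to_sprite(3.0, 4.0, 2.0, 1.0)  # bigger graphic, pinkish
```

The screen x,y pass straight through, so only size and color carry the extra two dimensions.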
In this case, a collision could be predicted by you, the viewer, if the bots:
1. are near each other on the screen
2. are close in size
3. are close in color
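Those three cues are really just a per-dimension closeness test. A minimal sketch, assuming each bot is an (x, y, z, w) tuple and that a single tolerance `tol` (a made-up parameter) is good enough for all three checks:

```python
import math


def looks_like_collision(a, b, tol=1.0):
    """Return True if bots a and b, given as (x, y, z, w) tuples,
    satisfy all three visual cues. Hypothetical helper; tol is assumed."""
    # Cue 1: near each other on the screen (x, y distance).
    near_on_screen = math.hypot(a[0] - b[0], a[1] - b[1]) <= tol
    # Cue 2: close in size, which stands in for z.
    similar_size = abs(a[2] - b[2]) <= tol
    # Cue 3: close in color, which stands in for w.
    similar_color = abs(a[3] - b[3]) <= tol
    return near_on_screen and similar_size and similar_color


same_spot = looks_like_collision((0, 0, 0, 0), (0.5, 0.5, 0.2, 0.1))
far_in_w = looks_like_collision((0, 0, 0, 0), (0.5, 0.5, 0.2, 5.0))
```

The second pair overlaps on screen and matches in size, but the colors are far apart, so the bots would pass "through" each other without colliding.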