Author Topic: GPGPU acceleration? (Read 16070 times)

Billy · « **on:** April 27, 2017, 10:04:56 AM »

I know it's a pain to get working, but a simulation with many bots seems like an ideal problem for OpenCL or similar. One job for each bot, and perhaps even grouping similar bots into warps for SIMD execution on Nvidia chips. You wouldn't necessarily have to move much data between main memory and GPU memory if you don't need to observe the simulation in real time.

I've only dabbled with GPU programming though (circle detection in images), so maybe it's not as well suited as it seems. I am aware that it would take a lot of effort to port all of the DB simulation code to OpenCL, what with the physics, DNA interpretation, reproduction, mutation... Maybe I'll try making something similar from scratch over the summer, specifically geared towards GPGPU execution.

Numsgil · « **Reply #1 on:** April 27, 2017, 05:16:27 PM »

Executing the DNA isn't really well suited for GPUs, unfortunately, since each bot has to run its own DNA on its own data. Getting the physics to execute on GPUs is possible in principle but rarely done in practice. Probably only shots are really suited for GPU calculations, but we just don't have enough shots in a typical sim to outweigh the upfront CPU cost of transferring things to/from the GPU.
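The "not enough shots to outweigh the transfer cost" point can be put as a rough break-even count. A minimal sketch, assuming a fixed per-batch overhead (transfer plus kernel launch) and constant per-item costs; the function name and all the numbers are hypothetical, not measurements from Darwinbots:

```python
def break_even_items(fixed_overhead, cpu_cost_per_item, gpu_cost_per_item):
    """Smallest item count at which the GPU's total time beats the CPU's.

    Model (an assumption, not a measurement):
        cpu_time(n) = n * cpu_cost_per_item
        gpu_time(n) = fixed_overhead + n * gpu_cost_per_item
    All arguments are integers in the same arbitrary time unit.
    """
    saving = cpu_cost_per_item - gpu_cost_per_item
    if saving <= 0:
        return None  # the GPU never catches up under this model
    return fixed_overhead // saving + 1


# Hypothetical figures: 1000 units of overhead, 5 units per shot on the CPU,
# 1 unit per shot on the GPU.
shots_needed = break_even_items(1000, 5, 1)
never = break_even_items(1000, 1, 5)
```

With those made-up figures a batch needs more than 250 shots before the GPU wins at all, which is the shape of the argument above: a small per-item saving cannot pay for a large fixed cost unless the batch is big.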

Billy · « **Reply #2 on:** April 27, 2017, 06:40:37 PM »

Couldn't kernels read tokenised DNA code from one buffer, memlocs from another, and interpret the DNA? Writing the new memloc values to a third buffer, perhaps. There is a lot of branching, which would reduce the benefit from SIMD processors, but maybe this could be reduced by grouping similar bots. Most bots spend the vast majority of their time doing one thing, from my experience (at least in evosims), like searching or spinning, so this might just work.
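To make the three-buffer idea concrete, here is a minimal CPU-side sketch in Python of what one work item might do. The opcodes are a hypothetical five-instruction stack language, not real Darwinbots DNA, and `run_bot`, `code_offsets` and the buffer layout are assumptions for illustration only:

```python
# Hypothetical opcodes for a tiny stack language (real DNA has far more).
PUSH, LOAD, ADD, STORE, HALT = range(5)


def run_bot(bot, code, code_offsets, mem_in, mem_out, mem_size):
    """One 'work item': interpret one bot's program on its own memory slice.

    code         -- flat buffer of tokenised programs for all bots
    code_offsets -- start index of each bot's program inside `code`
    mem_in       -- flat buffer of memlocs, read only
    mem_out      -- flat buffer that receives the new memloc values
    """
    base = bot * mem_size
    # Unwritten memlocs keep their old values.
    mem_out[base:base + mem_size] = mem_in[base:base + mem_size]
    stack = []
    pc = code_offsets[bot]
    while True:
        op = code[pc]
        if op == PUSH:      # push a literal
            stack.append(code[pc + 1])
            pc += 2
        elif op == LOAD:    # push the value of a memloc
            stack.append(mem_in[base + code[pc + 1]])
            pc += 2
        elif op == ADD:     # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == STORE:   # pop a value into a memloc of the output buffer
            mem_out[base + code[pc + 1]] = stack.pop()
            pc += 2
        elif op == HALT:
            return
        else:
            raise ValueError(f"unknown opcode {op}")


# Two bots with 4 memlocs each.
code = [LOAD, 0, PUSH, 5, ADD, STORE, 1, HALT,   # bot 0: mem[1] = mem[0] + 5
        PUSH, 7, STORE, 2, HALT]                 # bot 1: mem[2] = 7
code_offsets = [0, 8]
mem_in = [10, 0, 0, 0, 1, 2, 3, 4]
mem_out = [0] * len(mem_in)
for bot in range(2):        # on a GPU these iterations would be the work items
    run_bot(bot, code, code_offsets, mem_in, mem_out, 4)
```

The `if`/`elif` chain is where the branching mentioned above shows up: two bots at different opcodes take different arms of the dispatch.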

Numsgil · « **Reply #3 on:** April 28, 2017, 05:58:10 AM »

Problem is GPUs only benefit when they can do the same operation (add, multiply, etc.) in parallel. Even if you put the DNA on the GPUs somehow, you still can't execute more than one DNA program at a time.

It's a problem of SIMD vs. MIMD. GPUs need SIMD to overcome the additional cost of transferring the data to/from the GPU and starting up a processing batch, but most of Darwinbots is either SISD or MIMD. The DNA execution is MIMD, certainly. The physics has passes of MIMD but then everything bottlenecks in places through a SISD section. In that sort of problem, CPUs are still king.
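A toy model of why this hurts, assuming a group of lanes that can only issue one opcode at a time (lanes wanting a different opcode wait their turn). This is a deliberate simplification of real GPU scheduling, and the opcode names are made up:

```python
def lockstep_steps(programs):
    """Issue slots needed to run every program in one lockstep group.

    Each program is a list of opcode names. At each step the group spends
    one slot per distinct opcode among the lanes that are still running.
    """
    steps = 0
    longest = max(len(p) for p in programs)
    for pc in range(longest):
        ops_this_step = {p[pc] for p in programs if pc < len(p)}
        steps += len(ops_this_step)
    return steps


# Four lanes running identical DNA: every step is a single shared opcode.
same_dna = [["load", "add", "store"]] * 4

# Four lanes running different DNA of the same length.
mixed_dna = [
    ["load", "add", "store"],
    ["push", "mul", "store"],
    ["load", "sub", "store"],
    ["push", "add", "jump"],
]

same_cost = lockstep_steps(same_dna)
mixed_cost = lockstep_steps(mixed_dna)
```

Under this model the identical programs finish in 3 slots and the mixed ones need 7, whereas four independent cores (MIMD) would finish either set in 3 steps each.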

Botsareus · « **Reply #4 on:** April 28, 2017, 06:03:33 AM »

We need better designed hardware is what we need.

Billy · « **Reply #5 on:** April 28, 2017, 09:30:25 AM »

Quote from: Numsgil on April 28, 2017, 05:58:10 AM

Problem is GPUs only benefit when they can do the same operation (add, multiply, etc.) in parallel. Even if you put the DNA on the GPUs somehow, you still can't execute more than one DNA program at a time.

It's a problem of SIMD vs. MIMD. GPUs need SIMD to overcome the additional cost of transferring the data to/from the GPU and starting up a processing batch, but most of Darwinbots is either SISD or MIMD. The DNA execution is MIMD, certainly. The physics has passes of MIMD but then everything bottlenecks in places through a SISD section. In that sort of problem, CPUs are still king.

I guess you're probably right. I was thinking that if 90% of the bots have very similar DNA and are doing exactly the same activity, there would be very little divergence and it would effectively be SIMD. I'd still like to make a prototype, maybe with some minimal language rather than DNA, so I can compare the performance with that on a CPU.
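A quick way to sanity-check the "90% doing the same thing" intuition before writing any GPU code. This is a toy model only: the activity labels, the population of 320 bots and the group size of 32 are assumptions chosen for illustration:

```python
def mixed_groups(activities, group_size):
    """Count the groups that contain more than one distinct activity."""
    groups = [activities[i:i + group_size]
              for i in range(0, len(activities), group_size)]
    return sum(1 for g in groups if len(set(g)) > 1)


# Hypothetical population: 90% searching, 10% spinning, spread evenly.
activities = ["spin" if i % 10 == 0 else "search" for i in range(320)]

unsorted_mixed = mixed_groups(activities, 32)        # bots in arbitrary order
sorted_mixed = mixed_groups(sorted(activities), 32)  # bots grouped by activity
```

In this example every one of the 10 groups is mixed when the bots are left in their original order, and none are once they are sorted by activity, so the grouping step matters more than the 90% figure on its own.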

Botsareus: what kind of thing do you think would be better? A chip with lots of simple MIMD cores?

Botsareus · « **Reply #6 on:** April 28, 2017, 11:23:33 AM »

Do not care about what MIMD stands for. In general a lot of small CPU cores compute a lot of small parallel instruction sets really quickly. Then have one huge chip like the new Intel shit super cooled for large instruction sets.

Botsareus · « **Reply #7 on:** April 28, 2017, 11:35:29 AM »

The problem is no one figured out how to place different sized CPUs on the same board.

Billy · « **Reply #8 on:** April 28, 2017, 01:54:54 PM »

Quote from: Botsareus on April 28, 2017, 11:23:33 AM

Do not care about what MIMD stands for. In general a lot of small CPU cores compute a lot of small parallel instruction sets really quickly. Then have one huge chip like the new Intel shit super cooled for large instruction sets.

MIMD just means that each core runs its own program with its own data, so yeah, that's pretty much what I was getting at. I guess the issue with that is expense. Each core having its own control unit would cost more silicon than on a GPU, so it might work out better to just use the larger, faster cores in today's CPUs. Perhaps not though; it would be very handy for massively parallel problems where GPUs can't be used.

By the way, 'instruction set' has a specific meaning that is different, I think, from how you're using the term. A CPU has one instruction set, which is just the set of instructions that it can process (e.g. add, load, store, etc.). When source code is compiled, it is translated into the instruction set of the platform it's targeting (e.g. x86-64 or ARM). Each platform has its own assembly language too, in which you can write a program directly using the instructions from that platform's instruction set.

Botsareus · « **Reply #9 on:** April 28, 2017, 04:51:08 PM »

I get that I am not good with words; it is not my native language.
However, the point still stands that no one has figured it out. It would be a perfect architecture, because each master process could just assign threads or processes or whatever to different-sized CPUs. Or just hack it into a programming language. Something like "start a new n speed thread"

Numsgil · « **Reply #10 on:** April 28, 2017, 08:46:07 PM »

Quote from: Billy on April 28, 2017, 09:30:25 AM

I guess you're probably right. I was thinking that if 90% of the bots have very similar DNA and are doing exactly the same activity, there would be very little divergence and it would effectively be SIMD. I'd still like to make a prototype, maybe with some minimal language rather than DNA, so I can compare the performance with that on a CPU.

I don't think it's impossible with a bit of ingenuity to come up with something, but it's a bit awkward. At their core, bots have 1000 inputs (their memory) and 1000 outputs (their memory after storing stuff into it). GPUs don't really like mapping inputs to outputs on a large scale like that. They tend to be more comfortable mapping a small fixed number of inputs to 1 output over and over in parallel, and doing multiple passes if they need to.

One idea: I could imagine a DNA language built around the idea of smaller DNA programs feeding to each other. Like, imagine a neural network, but instead of a sum and logistic function each node is a small program. If each small program had 16 inputs, say, and a single output that fed on to the next layer of nodes, each node could be executed massively in parallel using the GPUs quite nicely. I don't know if this would be even remotely efficient, but it would sort of map to how the hardware works at least.
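A minimal sketch of that layered layout, assuming each node is just a list of input indices plus a small function. The names `run_layer` and `run_network` are hypothetical, and the example uses a fan-in of 2 rather than 16 to keep it short:

```python
from typing import Callable, List, Sequence, Tuple

# A node is (indices into the previous layer's values, small program).
Node = Tuple[Sequence[int], Callable[[List[float]], float]]


def run_layer(values: List[float], nodes: List[Node]) -> List[float]:
    """Evaluate one layer: a few fixed inputs in, one output per node.

    No node depends on another node in the same layer, so on a GPU each
    node could be its own work item.
    """
    return [program([values[i] for i in indices]) for indices, program in nodes]


def run_network(inputs: List[float], layers: List[List[Node]]) -> List[float]:
    """Run the layers in order; each layer is one parallel pass."""
    values = list(inputs)
    for nodes in layers:
        values = run_layer(values, nodes)
    return values


layers = [
    [((0, 1), sum), ((2, 3), max)],       # layer 1: two nodes
    [((0, 1), lambda v: v[0] * v[1])],    # layer 2: one node
]
result = run_network([1.0, 2.0, 3.0, 4.0], layers)
```

Here layer 1 produces `[3.0, 4.0]` and layer 2 multiplies them. Each layer corresponds to one of the "multiple passes" mentioned above.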

Billy · « **Reply #11 on:** April 29, 2017, 08:20:50 AM »

Quote from: Numsgil on April 28, 2017, 08:46:07 PM

Quote from: Billy on April 28, 2017, 09:30:25 AM
I guess you're probably right. I was thinking that if 90% of the bots have very similar DNA and are doing exactly the same activity, there would be very little divergence and it would effectively be SIMD. I'd still like to make a prototype, maybe with some minimal language rather than DNA, so I can compare the performance with that on a CPU.

I don't think it's impossible with a bit of ingenuity to come up with something, but it's a bit awkward. At their core, bots have 1000 inputs (their memory) and 1000 outputs (their memory after storing stuff into it). GPUs don't really like mapping inputs to outputs on a large scale like that. They tend to be more comfortable mapping a small fixed number of inputs to 1 output over and over in parallel, and doing multiple passes if they need to.

One idea: I could imagine a DNA language built around the idea of smaller DNA programs feeding to each other. Like, imagine a neural network, but instead of a sum and logistic function each node is a small program. If each small program had 16 inputs, say, and a single output that fed on to the next layer of nodes, each node could be executed massively in parallel using the GPUs quite nicely. I don't know if this would be even remotely efficient, but it would sort of map to how the hardware works at least.

So there would be a fixed set of these node functions that, arranged in different tree structures, produce different bot behaviours? With a pre-written kernel for each function?

Billy · « **Reply #12 on:** April 29, 2017, 08:30:07 AM »

Quote from: Botsareus on April 28, 2017, 04:51:08 PM

I get that I am not good with words; it is not my native language.
However, the point still stands that no one has figured it out. It would be a perfect architecture, because each master process could just assign threads or processes or whatever to different-sized CPUs. Or just hack it into a programming language. Something like "start a new n speed thread"

Not sure it's a case of figuring out how (though I don't know much about electronics), rather proving that it's worth doing. Most things can be done well enough on a CPU and/or GPU. Maybe having a few smaller cores is no better than having a single extra full-size core.

Numsgil · « **Reply #13 on:** April 30, 2017, 12:22:01 PM »

Quote from: Billy on April 29, 2017, 08:20:50 AM

So there would be a fixed set of these node functions that, arranged in different tree structures, produce different bot behaviours? With a pre-written kernel for each function?

I was thinking more that the code in a node could change but all bots in a species would share the tree structure and code so you could go wide that way, but using preset nodes works, too.
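A rough sketch of what "going wide" across a species could look like, assuming every bot in the species runs the same short program and only the data differs. The three-address instruction format and the name `run_species` are hypothetical:

```python
import operator

# Hypothetical instruction set for the shared node code.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_species(program, registers):
    """Run one shared program across every bot of a species.

    program   -- list of (op, dst, src_a, src_b) register instructions
    registers -- registers[r][bot] is the value of register r for that bot
    """
    for op, dst, src_a, src_b in program:
        fn = OPS[op]
        # One instruction applied to all bots: the inner loop is the part
        # that maps onto SIMD lanes, since every bot does the same operation.
        registers[dst] = [fn(a, b)
                          for a, b in zip(registers[src_a], registers[src_b])]
    return registers


# Three bots of one species; each column is one bot's data.
registers = [
    [1, 2, 3],      # r0
    [10, 20, 30],   # r1
    [0, 0, 0],      # r2
]
program = [("add", 2, 0, 1),   # r2 = r0 + r1
           ("mul", 2, 2, 0)]   # r2 = r2 * r0
run_species(program, registers)
```

The outer loop walks the instructions once and the inner loop walks the bots, so there is no per-bot dispatch to diverge on as long as the whole species shares the program.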

Billy · « **Reply #14 on:** April 30, 2017, 05:35:51 PM »

Quote from: Numsgil on April 30, 2017, 12:22:01 PM

Quote from: Billy on April 29, 2017, 08:20:50 AM
So there would be a fixed set of these node functions that, arranged in different tree structures, produce different bot behaviours? With a pre-written kernel for each function?

I was thinking more that the code in a node could change but all bots in a species would share the tree structure and code so you could go wide that way, but using preset nodes works, too.

Why would that be easier for a GPU than DNA as it is? If the node code can mutate, wouldn't you run into the same problem of divergence?
