Author Topic: C++ refactoring  (Read 9725 times)

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
C++ refactoring
« on: October 30, 2006, 04:08:11 AM »
I'm thinking of ending my Darwinbots coding sabbatical and I'd like to see the C++ fork smoothed over and finished.  The present code can be seen as a rough draft I think.  Here are the changes (some major) I'd like to discuss:

1.  Documentation is vital.  An automatic documentation system like doxygen should be explored.  Code documentation has always been my weak point, so I'd like thoughts from people more knowledgeable about this than I.

2.  Pure object oriented - I've slowly been won over on the maintainability and architectability of well designed object oriented code.  An example: different shot types are inherited from a pure virtual base class.  I'll have to explore the source before I know for sure what I'd like to do.

3.  Engine DLL - The core engine is constructed to be a DLL that interfaces with a pure virtual GUI handler.  This way, the GUI can be constructed separately from the core engine.  The idea is that I can build a pure console version, a C# GUI interface, and still remain potentially cross platform.

4.  Boost library, specifically threads but maybe other things as well as I explore boost better.  Problem is that it takes like 3 hours to set up the boost libraries on someone's computer, especially if they've never done it before.

5.  An eye towards multithreading.  This is especially important for things like a bot debugger in the program.  Mutexes need to be interspersed properly to get everything working right.

6.  Sysvars no longer read in from an external file, but stored as a hardcoded STL map.  You could output a list of sysvars populated from this map.  The idea is that you only need to distribute a single file.

Just looking for people's thoughts.  Most of the code is fine as is, but the overarching architecture is a little weak.
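
For items 1 and 2, here is a minimal sketch of what shot types derived from a pure virtual base class might look like, with comments written in the style doxygen picks up.  The class names (Shot, EnergyShot, VenomShot), the Target struct and the effect sizes are all invented for illustration; none of them come from the Darwinbots source.

```cpp
#include <memory>
#include <vector>

/// Minimal stand-in for whatever a shot hits (hypothetical, not the real Robot class).
struct Target {
    double energy = 100.0;
    double venom = 0.0;
};

/**
 * @brief Abstract base class for every shot type.
 *
 * Derived classes only have to say what happens on impact.
 */
class Shot {
public:
    virtual ~Shot() = default;

    /// Apply this shot's effect to the target it hit.
    virtual void onHit(Target& target) const = 0;
};

/// Drains a fixed amount of energy from the target (amount is illustrative).
class EnergyShot : public Shot {
public:
    void onHit(Target& target) const override { target.energy -= 10.0; }
};

/// Adds venom to the target (amount is illustrative).
class VenomShot : public Shot {
public:
    void onHit(Target& target) const override { target.venom += 5.0; }
};

/// Resolve a batch of shots without knowing their concrete types.
inline void resolveHits(const std::vector<std::unique_ptr<Shot>>& shots, Target& target) {
    for (const auto& shot : shots) {
        shot->onHit(target);
    }
}
```

Adding a new shot type then means adding one derived class; resolveHits and anything else that works through the Shot interface stays untouched.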

Offline EricL

  • Administrator
  • Bot God
  • *****
  • Posts: 2266
« Reply #1 on: October 30, 2006, 08:03:28 PM »
#5 is top priority from my perspective.  I think we will see 64 core chips in 2 years time available in machines for under $1k.  Think of what we could do with that kind of horsepower...  DB should parallelize extremely well and without the need for fancy threading libraries...

The importance of #2 is to a large extent a function of the number of developers on the project.  Personally, I'm surprised more people aren't hacking away on the VB fork given how easy and approachable VB6 is.  Given this, I doubt we will have many code authors on the C++ fork.  I'm all for strict object orientation and adherence to class inheritance rules, but there is a cost in terms of learning curve and speed of development.  The fewer the authors, the less critical this is I think.

The same can be said for #1.  If no one is going to read your code, why document it?  But if you have a bunch of authors, it becomes critical and a matter of honor to document your work.  I know I would be doing a better job on the VB fork if I thought anyone would ever look at the source...

#3 is not high on my list.  It takes work to maintain portability.  If someone really wants to run on Linux, let them join the club and maintain the portability of the code.  Until then, focus on Windows.

#6 is a good idea.  I may do this in the VB fork...
Many beers....

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #2 on: November 01, 2006, 12:16:10 PM »
This might be a chicken-and-egg issue though.  If the code is more accessible to major changes, maybe more people would work on it.  The VB source is more open to smaller changes (like adding a sysvar) but is (was, I haven't touched it in a while) getting harder and harder to shoehorn major new features into.  This isn't really anyone's fault, it's just a product of the iterative development of the program.

Major changes are going to be easier with major refactoring.  Major refactoring is a lot of work.

Let me see what I can come up with...

Offline Sprotiel

  • Bot Destroyer
  • ***
  • Posts: 135
« Reply #3 on: November 07, 2006, 12:37:14 PM »
Well, I guess the honest answer to this is: I'm glad you saw the light, at last!

More constructively, I agree completely with items #2 and #5. I would even say that #5 is an absolute requirement if we want a stable GUI app.
I also agree with the goal stated by #3, but I don't think it's really useful right now to actually make the core engine a DLL. Just keeping the possibility open by ensuring that it's possible to compile a console application with just the core engine should be enough for now, IMHO.
I agree with #1, but I don't think it's critical. Better than a thoroughly-documented code is a code that's clear enough to not require documentation.

As for #4, I'd say we shouldn't use boost unless we really need it, but it seems to have a lot of handy features, not just threading, so I don't know.

I disagree with #6: I agree that the way the sysvars file works now is rather silly, but the possibility to change the memory location associated with a sysvar would be interesting.

Offline EricL

  • Administrator
  • Bot God
  • *****
  • Posts: 2266
« Reply #4 on: November 07, 2006, 02:38:41 PM »
Quote from: Sprotiel
I disagree with #6: I agree that the way the sysvars file works now is rather silly, but the possibility to change the memory location associated with a sysvar would be interesting.
FYI, as of the next VB version, the sysvars file is no longer needed or used.

The code is not, and from what I can tell never has been, set up to allow for the changing of the memory location associated with a sysvar simply by editing sysvars.txt.  Doing so will certainly break things in current versions.  There are many, many places in the code where the mem locations for specific sysvars are hard wired and changing this would be a major major work item.  Sysvars.txt is not a sysvar mapping table that drives the code.  All it allows (allowed) was the potential for sysvar synonyms, localized sysvar names for example.  If you wanted to call .dn .down instead (or something in Italian perhaps) for your own bots on your own machine, you could change the sysvars.txt file entry or add additional entries and this would work.  That is the only thing we lose when the sysvars go internal to the exe - the exe has to rev to add sysvar synonyms - but the gains in simplicity are well worth it IMHO, particularly once I complete the work for the exe to be self installing.
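
A rough C++ sketch of a built-in sysvar table with synonyms, along the lines of the hardcoded STL map proposed in #6.  The function names are made up for this example and the memory locations are placeholders, not values copied from the real sysvars list.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hardcoded sysvar table. The locations below are placeholders for this
// sketch; they are NOT taken from the real sysvars.txt.
inline const std::map<std::string, int>& sysvarTable() {
    static const std::map<std::string, int> table = {
        {".up", 1},
        {".dn", 2},
        {".down", 2},  // synonym: resolves to the same location as .dn
    };
    return table;
}

// Returns the memory location for a sysvar name, or -1 if the name is unknown.
inline int lookupSysvar(const std::string& name) {
    const auto& table = sysvarTable();
    const auto it = table.find(name);
    return it == table.end() ? -1 : it->second;
}

// Produces a "name location" listing, one entry per line, sorted by name.
inline std::string dumpSysvars() {
    std::ostringstream out;
    for (const auto& entry : sysvarTable()) {
        out << entry.first << ' ' << entry.second << '\n';
    }
    return out.str();
}
```

A synonym is just a second key with the same value, so adding one means editing the table and rebuilding, which is the trade-off described above.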
Many beers....

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #5 on: November 07, 2006, 03:40:13 PM »
Quote from: Sprotiel
Well, I guess the honest answer to this is: I'm glad you saw the light, at last!

I've seen a lot of very poor "OO" code.  A lot of very poor OO code.  I have/had some prejudices.  Of course, I've seen some even worse code used for various research papers written in C or FORTRAN, so there's something to be said about bad programming practices all around I think.

While I'm on the topic, at the moment I'm exploring a data driven design architecture.  Which basically means the data and the algorithms to operate on the data are entirely separated.  So when I say "OO", it might be a little different from what you're used to if you've been doing the more standard approach of data + implementation in the same data object.  This is a good collection of articles and thesises (thesi?) about data driven design.  Basically it allows the data to be shared between the GUI and the engine and the parts of the engine, without various modules needing to know how the others work.

Also, the Engine DLL should be fairly trivial to handle.  The main reason I think it needs to be separate is that I'd like to use ANSI C++ for the main engine to avoid portability issues.  But I'm really enjoying C# for GUI creation.  A WYSIWYG GUI creator and IDE is really about the greatest thing since sliced bread.  I could move the entire engine to C#, but that would mean a lot of rewriting code and abandoning portability.  Having the two separate and working in different languages should allow both to play to their strengths, and helps physically enforce proper abstraction between the two.
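
As an illustration of the engine/GUI split, here is a small sketch of an engine that only ever talks to a pure virtual GUI handler, plus a console implementation of that handler.  GuiHandler, Engine and ConsoleGui are hypothetical names chosen for this example, and the simulation step is left empty.

```cpp
#include <string>
#include <vector>

// Interface the engine talks to. A console front end, a wrapper for a C# GUI
// or a test double would each provide their own implementation.
class GuiHandler {
public:
    virtual ~GuiHandler() = default;
    virtual void onCycleFinished(long cycle, int robotCount) = 0;
};

// Engine core: knows nothing about windows or widgets, only the interface.
class Engine {
public:
    explicit Engine(GuiHandler& gui) : gui_(gui) {}

    // Advance the simulation by one cycle and notify the front end.
    void step() {
        ++cycle_;
        // ... the real simulation update would go here ...
        gui_.onCycleFinished(cycle_, robotCount_);
    }

private:
    GuiHandler& gui_;
    long cycle_ = 0;
    int robotCount_ = 0;
};

// Pure console front end: records one line of text per finished cycle.
class ConsoleGui : public GuiHandler {
public:
    void onCycleFinished(long cycle, int robotCount) override {
        lines.push_back("cycle " + std::to_string(cycle) + ": " +
                        std::to_string(robotCount) + " robots");
    }
    std::vector<std::string> lines;
};
```

A C# front end cannot implement a C++ abstract class directly, so that route would need an extra layer such as a C++/CLI wrapper or a flat C API exported from the DLL; that layer is not shown here.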

I'd be willing to use another library than boost for threading.  Basically I just need a portable threading library, preferably standalone without other cruft.  The more features the threading library has, the better.
« Last Edit: November 07, 2006, 03:41:02 PM by Numsgil »

Offline Sprotiel

  • Bot Destroyer
  • ***
  • Posts: 135
« Reply #6 on: November 08, 2006, 08:17:28 PM »
Quote from: Numsgil
While I'm on the topic, at the moment I'm exploring a data driven design architecture.  Which basically means the data and the algorithms to operate on the data are entirely separated.  So when I say "OO", it might be a little different from what you're used to if you've been doing the more standard approach of data + implementation in the same data object.  This is a good collection of articles and thesises (thesi?) about data driven design.  Basically it allows the data to be shared between the GUI and the engine and the parts of the engine, without various modules needing to know how the others work.
A Google search on "data driven design" reveals that it's mostly understood as a way to pretend doing OOP, while doing anything but it. I'm not sure you're referring to the exact same thing, but I've read the thesis you linked to and I'm not convinced. Modularity is certainly a good thing, but I don't see why the approach described in there would be the only way, or the best one, to achieve it.

Quote
Also, the Engine DLL should be fairly trivial to handle.  The main reason I think it needs to be separate is that I'd like to use ANSI C++ for the main engine to avoid portability issues.  But I'm really enjoying C# for GUI creation.  A WYSIWYG GUI creator and IDE is really about the greatest thing since sliced bread.  I could move the entire engine to C#, but that would mean a lot of rewriting code and abandoning portability.  Having the two separate and working in different languages should allow both to play to their strengths, and helps physically enforce proper abstraction between the two.
I'm not too sure of the wisdom of using two different languages, since it requires proficiency in two languages instead of one. Besides, portability is one thing, but actually being ported is the real goal. We have a much better chance of attracting Linux or Mac users if they have a GUI (almost) up and running than if they have to build it from scratch before using the program. I really think the whole program, not just the engine, should be portable.

Quote
I'd be willing to use another library than boost for threading.  Basically I just need a portable threading library, preferably standalone without other cruft.  The more features the threading library has, the better.
The more I look at boost, the more useful I think it could be (serialization, smart pointers, ...) but having too many dependencies should be avoided and FOX already handles threading.

Anyway, you should take a look at my version of the code. I believe I made significant progress wrt. goals 2, 3 and 5.

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #7 on: November 09, 2006, 12:08:39 AM »
Googling this probably isn't a good idea.  It's a loaded concept that means different things to different people.

For Darwinbots, basically, all the data is physically stored in various singleton managers in the engine.  Data is passed into algorithm singletons that modify and change it.

Pros:
Additional non core modules can easily be worked in (GUI, stats tests, etc.) without core modules worrying about them.  The core engine can be entirely ripped from the GUI, and vice versa, enforcing strictly bound problem domains.

Cons:
If the data representation changes, all of the modules that deal with that data need to be changed.  Different files need to be maintained between C# struct definitions and C++ definitions.  Const correctness is difficult (and in some ways unnecessary).

It's basically identical to more proper OO designs, except you're splitting the implementation and data apart.  The idea is that you can rip out certain modules (say, physics), replace modules (again, say physics), or add modules without modifying other modules.  This isn't as true for OO design, which follows a more hierarchical approach.
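
A small sketch of that split in C++: a singleton manager owns plain data, and separate modules operate on it.  RobotData, RobotManager, EulerPhysics and Stats are invented names, and the fields are placeholders rather than the real robot state.

```cpp
#include <cstddef>
#include <vector>

// Plain data, no behaviour. (Field names are invented for this sketch.)
struct RobotData {
    double x = 0.0, y = 0.0;    // position
    double vx = 0.0, vy = 0.0;  // velocity
};

// Singleton manager that physically owns the data; modules only borrow it.
class RobotManager {
public:
    static RobotManager& instance() {
        static RobotManager manager;
        return manager;
    }
    std::vector<RobotData>& robots() { return robots_; }

private:
    RobotManager() = default;
    std::vector<RobotData> robots_;
};

// One interchangeable algorithm module: a simple Euler position update.
struct EulerPhysics {
    static void update(std::vector<RobotData>& robots, double dt) {
        for (RobotData& r : robots) {
            r.x += r.vx * dt;
            r.y += r.vy * dt;
        }
    }
};

// A non-core module (stats) added later; it reads the same data and never
// needs to know that a physics module exists.
struct Stats {
    static double meanX(const std::vector<RobotData>& robots) {
        if (robots.empty()) return 0.0;
        double sum = 0.0;
        for (const RobotData& r : robots) sum += r.x;
        return sum / static_cast<double>(robots.size());
    }
};
```

Swapping EulerPhysics for another integrator touches nothing else, but adding or renaming a field in RobotData touches every module that reads it, which is the first con listed above.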

Quote
I'm not too sure of the wisdom of using two different languages, since it requires proficiency in two languages instead of one. Besides, portability is one thing, but actually being ported is the real goal. We have a much better chance of attracting Linux or Mac users if they have a GUI (almost) up and running than if they have to build it from scratch before using the program. I really think the whole program, not just the engine, should be portable.

Have you done any programming with .NET?  It's absolutely spectacular.  Between GUI building, database support, internet controls, and other features I'm still discovering, it takes a lot of the head scratching out of some of the features I'd like to add to Darwinbots.

There are also .NET ports to Linux, such as Mono, that will probably let you run Darwinbots on other platforms.  I say probably because I've heard reports that some of the more abstract features are missing.

I could program it all in C#, but I'm more familiar with C++ and there's already a 13K line code base that's written in C++.  Moving it all to C# is a question worth discussing, but I'm not sure the time required to do so would be worth it.  I'm not sure I can think of a good reason not to move it to C# beyond my lack of knowledge, the time it would take, and portability issues.

If someone wants to take the time to refactor the current code into C#, that would be cool.  Or if anyone can think of other valid reasons to maintain the code in C++, I'd like to hear them.  My experience with C# is still intermediate.

Quote
Anyway, you should take a look at my version of the code. I believe I made significant progress wrt. goals 2, 3 and 5.

I did like the idea of loading the DNA and copying it to instances of a species instead of reloading it for every individual.  What really needs to happen is that the Robot class needs to be split up into like 6 different core ideas.  Major refactoring work.  I'm still working on what sorts of splitting need to be performed.
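
A sketch of the load-once, copy-per-robot idea.  The Dna alias, the Species and Robot structs and the loadSpecies function are stand-ins invented for this example; the "load" just builds a tiny vector where the real code would read and parse a DNA file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Dna = std::vector<int>;  // stand-in for the real DNA representation

struct Species {
    std::string name;
    Dna dna;  // parsed once, when the species is loaded
};

struct Robot {
    Dna dna;  // private copy, so each individual is free to mutate
};

// Stand-in for the expensive step (reading and parsing a DNA file).
// loadCount exists only so the sketch can show how often this runs.
inline Species loadSpecies(const std::string& name, int& loadCount) {
    ++loadCount;
    return Species{name, Dna{1, 2, 3}};
}

// Create `count` robots by copying the already-loaded DNA; no file access.
inline std::vector<Robot> spawn(const Species& species, std::size_t count) {
    return std::vector<Robot>(count, Robot{species.dna});
}
```

Spawning any number of robots costs one load plus one in-memory copy each, and mutating one robot's DNA leaves the species template and its siblings unchanged.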
« Last Edit: November 09, 2006, 12:33:33 AM by Numsgil »

Offline frankle

  • Bot Neophyte
  • *
  • Posts: 21
« Reply #8 on: December 18, 2006, 09:52:13 PM »
Ok, after reading this thread I have a few comments.

The idea about making an engine dll is just common sense with a project like this. Having the guts be independent from the GUI should be a core priority. This has more implications than just cross-compatibility. It also makes things like server/client architecture easier for one, and the ability to use the C# GUI development tools is essential as well.

For extensibility and scripting, I'd suggest looking into lua (http://www.lua.org), which is a lightweight programming language developed specifically for extending programs.

I'm curious at this point why the VB fork hasn't been feature frozen? It seems counter-productive to me to be actively extending the feature set of one codebase while rebuilding in another language. I understand reluctance to learn another language, but wouldn't it be more productive to have all hands on deck on the new codebase? Let the VB 6 codebase die with VB 6.

As far as multithreading goes, I think that having 'an eye' to MT is a bit underpowered. Ideally every procedure that can benefit from multithreading should be designed to make use of it. The real trick is knowing if a routine actually benefits from MT.

Lastly, documentation is ABSOLUTELY ESSENTIAL to fostering community involvement. Code without documentation is like coca cola without sugar. It just sucks. I can't emphasize how important documentation is to community support for a project.

Anyway, I'm certainly a newb to this project, and I don't mean to ruffle any feathers. Just my $0.02
« Last Edit: December 18, 2006, 09:56:27 PM by frankle »

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #9 on: December 18, 2006, 10:41:17 PM »
Quote from: frankle
The idea about making an engine dll is just common sense with a project like this. Having the guts be independent from the GUI should be a core priority. This has more implications than just cross-compatibility. It also makes things like server/client architecture easier for one, and the ability to use the C# GUI development tools is essential as well.

At the moment I'm leaning more towards implementing all of it in C# (it's won me over).  This would still allow for a core engine DLL, I think.  I have played around with having multiple solutions for the same project, since custom GUI controls need to be in their own DLL.

Quote
For extensibility and scripting, I'd suggest looking into lua (http://www.lua.org), which is a lightweight programming language developed specifically for extending programs.

I had the same idea.  Lua seems the best choice to me.

Quote
I'm curious at this point why the VB fork hasn't been feature frozen? It seems counter-productive to me to be actively extending the feature set of one codebase while rebuilding in another language. I understand reluctance to learn another language, but wouldn't it be more productive to have all hands on deck on the new codebase? Let the VB 6 codebase die with VB 6.

It was feature frozen for a long time.  But after about the 6 month mark and no shiny new C++ version forthcoming, Eric joined the forum and started tinkering with the old VB source to fix some longstanding problems.  It's sort of rolled from there.

Quote
As far as multithreading goes, I think that having 'an eye' to MT is a bit underpowered. Ideally every procedure that can benefit from multithreading should be designed to make use of it. The real trick is knowing if a routine actually benefits from MT.

I'm pretty well versed in the "theory" of MT, but I haven't done a lot of work on anything as large as Darwinbots.  Which means I'm pretty sure I know when and where to MT, and I know what pitfalls to look for and identify (race conditions, deadlock, etc.) but I'm not as well versed in the best practices to control these issues.
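
For reference, a minimal sketch of the most basic of those practices: guarding shared state with a mutex so that concurrent updates cannot race.  It uses std::thread and std::mutex from C++11, which postdate this thread (at the time the portable option was a library such as boost).  SharedCounter and runWorkers are hypothetical names.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// State shared between threads, e.g. a simulation thread and a debugger thread.
class SharedCounter {
public:
    void add(int amount) {
        std::lock_guard<std::mutex> lock(mutex_);  // unlocked automatically on scope exit
        value_ += amount;
    }
    int get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    mutable std::mutex mutex_;  // mutable so get() can lock in a const method
    int value_ = 0;
};

// Hammer the counter from several threads; with the lock the total is exact.
inline int runWorkers(int threadCount, int addsPerThread) {
    SharedCounter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&counter, addsPerThread] {
            for (int i = 0; i < addsPerThread; ++i) {
                counter.add(1);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return counter.get();
}
```

Without the lock_guard, the increments from different threads could interleave and lose updates.  When code needs two mutexes at once, std::lock acquires both with a deadlock-avoidance algorithm instead of relying on every caller locking in the same order.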

Quote
Lastly, documentation is ABSOLUTELY ESSENTIAL to fostering community involvement. Code without documentation is like coca cola without sugar. It just sucks. I can't emphasize how important documentation is to community support for a project.

This is a lot of why I'm really liking C# at the moment.  A lot of documentation is generated automatically if you provide the appropriate tags.

Quote
Anyway, I'm certainly a newb to this project, and I don't mean to ruffle any feathers. Just my $0.02

Feel free to ruffle away.  Pretty soon I'm going to be placing the VB and C# code into SVNs along with the C++ code (which already is), so all the code will be in the same place.  Then it should be pretty easy to look at changes to any of the code.

Offline frankle

  • Bot Neophyte
  • *
  • Posts: 21
« Reply #10 on: December 18, 2006, 10:45:44 PM »
Quote from: Numsgil
Pretty soon I'm going to be placing the VB and C# code into SVNs along with the C++ code (which already is), so all the code will be in the same place.  Then it should be pretty easy to look at changes to any of the code.

I <3 SVN, I use it for everything. Well, not everything... but you know what I mean.

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #11 on: December 18, 2006, 10:51:01 PM »
"It slices, it dices, it makes julienne fries."

Offline Jez

  • Bot Overlord
  • ****
  • Posts: 788
« Reply #12 on: December 18, 2006, 11:22:00 PM »
Your 2c is always welcome, I'm not a programmer so I hope you don't mind if I ask whether, as it seems, you have relevant skills in this field and whether you are considering playing a part in this project.

Quote from: frankle
I'm curious at this point why the VB fork hasn't been feature frozen?
Lastly, documentation is ABSOLUTELY ESSENTIAL to fostering community involvement.

My little potted history of DB programming:

First created by Carlo, PY took the banner and, teaching himself VB, made many improvements to it. Next came Nums', who has now been sidetracked by the C++ transfer, and finally Eric, who has been a bit of a superstar when it comes to addressing problems and new features in the current program. (My apologies to anyone I've missed, I know other people have taken an active part in fixing the code and I have only watched from the sidelines.)

Firstly, I think the VB fork hasn't been frozen because the C++ idea is a 'figure in the wings'. It would seem wrong to many peeps that a proto-idea, that hasn't been implemented yet, stops the DB program evolving. I totally agree that the continual evolution of the program will create innumerable problems for Nums' but until the C++ project is 'ipso facto' it remains a pipe dream for many of us. All credit to those to whom it's due, but all the programming is done by volunteers when and if they can make the time available.

Secondly, documentation: the Wiki part of the forum is scarcely more than an out-of-date guide to DB. If people, including me, had more time to spend creating documentation for the program, or less important parts of DB to address, then I'm sure it would be done. Instead, as DB is always a work in progress and generally designed/modded by one person at a time, the documentation comes last. Often only added because of questions from someone else.

If you can sketch out a plan for how we might get programmers working in symbiosis, something we have been trying to achieve with the bots as well for quite a while, then I am sure Eric would be happy to consider it, after all it might lighten his workload and allow him more time to play with the program rather than just fix and amend it.

EDIT

and Nums' got there before me...
« Last Edit: December 18, 2006, 11:24:37 PM by Jez »
If you try and take a cat apart to see how it works, the first thing you have in your hands is a non-working cat.
Douglas Adams

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #13 on: December 18, 2006, 11:55:08 PM »
I think getting the VB source into an SVN will allow simultaneous efforts.  Originally, when I first joined, almost all of the code was in a single file.  It was almost impossible for two people to work on it at the same time.

As Jez says, until there's a real and finalized final product for "the next generation", people are going to want smaller bug fixes for the VB source.  And it's almost impossible to fix bugs in DB without adding features.

Offline EricL

  • Administrator
  • Bot God
  • *****
  • Posts: 2266
« Reply #14 on: December 19, 2006, 01:00:28 AM »
Here's my $0.02.

The first 90% of a software project takes 90% of the time.  The last 10% takes the other 90% of the time.  While the current C++ may limp along and almost certainly has many architectural advantages over the VB source, I predict many months of stabilization ahead just to get it to the level of the VB source w.r.t. day to day usability, much less feature set.  Just take a look at the topics in the bug forums for the last 6 months.  2.4 was working pretty much when I came on board.  I've made hundreds of bug fixes since then and we don't even have everyone off of 2.37.6!

In my experience, the way to port code is to take baby steps and change as few things as possible at any point in time, maintaining or at least recapturing stability at each stage before embarking upon the next.  People have to be able to use the code at each stage and you don't dare go too long between stable, usable versions.  In particular, changing programming languages and making major architectural changes at the same time is a sure recipe for instability and discouragement.

Don't get me wrong.  I'm all for porting to C++ or C# (though I make arguments below against many of the reasons put forward for doing so).  Hell, I used to teach C in grad school and most of my own recent work has been in C#, so I'm all over moving to one of those languages.  But IMHO, the proper way to go about it is as follows:

1) Move the current VB source into a source code management system.  I'm happy to have people help or add features if there is a structured way to do it.
2) At some point, agree to freeze at a certain version, where the only changes to the VB version from that point forward are major bug fixes, which must get checked into both forks until the VB version is supplanted and obsoleted.
3) Do a straight across port to C# or whatever with absolutely no architectural changes.  I mean none.  Get it stable first.  We use the same lousy data structures, the same update loops, all in a single thread.  No getting nuts with classes and methods and all that crap.  Straight across.  I'd even suggest we fake out some of the VB methods like Circle() to preserve the investment in code at least for the initial port.  We get it stable and usable and everyone using it before we make ANY major architectural changes.  This includes threading.  The current architecture will be easy enough to thread where it counts when the time comes.
4) We make architectural changes bit by bit, gaining stability back each time we do something major.  This is where Num's work on his C++ fork becomes very useful.  It's a prototype, a proof of concept demonstrating many new architectural concepts and a new set of physics.  We take what we want, bit by bit, but we start from the current VB source instead or I predict we will have a very long climb to get stable.
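
To illustrate the "fake out the VB methods" idea from step 3, here is a sketch of a shim that keeps a VB-style Circle call shape in the ported code.  It is written in C++ to match the existing fork.  The Canvas class and CircleCall struct are invented for this example, the argument order only loosely follows VB6's Circle method (centre, radius, colour), and the shim just records calls where a real port would forward to the new drawing API.

```cpp
#include <vector>

// One recorded draw call. A real port would forward to the new graphics API
// instead of storing the arguments.
struct CircleCall {
    float x, y, radius;
    long color;
};

// Hypothetical drawing surface that keeps the VB-style method name, so ported
// call sites can stay textually close to the VB source.
class Canvas {
public:
    // Centre, radius, colour: loosely mirrors the VB6 Circle argument order.
    void Circle(float x, float y, float radius, long color) {
        calls_.push_back(CircleCall{x, y, radius, color});
    }

    const std::vector<CircleCall>& calls() const { return calls_; }

private:
    std::vector<CircleCall> calls_;
};
```

Once the port is stable, the body of Circle can be swapped for the real renderer in one place, and the call sites can be migrated to a cleaner API later.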

An alternative order would be to back port some or all of Num's new physics into the VB source before freezing and porting.

Now, my $0.02 would not be complete unless I pointed out that a port is a lot of work.  A lot of work, a lot of sideways work.  We have to be very very sure we want to do it and we want to be very very sure of the reasons why we are doing it.  Let me play devil's advocate for a moment.  Why do we want to do it?  So we can have multiple code authors?  We can do that on the VB source by moving it to SVN or some other source code management system.  For performance reasons?  Don't be so sure.  I could probably double the performance of the current VB code with a couple of weeks of focused work without losing stability.  If perf is the main goal, some serious profiling and code reviews would reap more gains for less pain.  For scalability across multiple CPUs?  Yes, VB is single threaded, but DB is processor bound and does basically no I/O.  That one thread is always busy, making maximum use of a single processor.  You think moving to multiple threads will speed things up?  Not on a single processor machine it won't.  All else being equal, the added context switching will slow it down on single proc machines by maybe 10% or so.  Got a dual core box you say?  Use teleporters, run two connected sims and utilize 100% of both processors.  Don't get me wrong, I love threads.  Hell, I even like fibers when used right.  DB would love threads and scale well, but only on machines with the processors to take advantage of a threaded architecture.  Better physics maybe?  Look, code is code.  If you want elastic collisions or bouncy walls, back port the algorithms.  Moving to a different programming language is no panacea.  It's the algorithms that count, not the programming language as far as physics go.  Separating the UI from the engine?  Future client-server versions?  Graphics packages?  All good stuff and admittedly, harder to do from VB.  But we need to be realistic about why we are porting.  Those things are nice, but not near term and not the first reasons many people cite for porting.
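
The point about threads only paying off when the processors are there can be sketched in C++ (the language of the existing fork) as follows.  This assumes C++11's std::thread, which postdates this thread, and per-item updates that do not touch each other's data.  chooseWorkerCount and updateAll are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// One worker per reported core. hardware_concurrency() may return 0 when it
// cannot tell, so treat that case as a single-processor machine.
inline unsigned chooseWorkerCount() {
    const unsigned cores = std::thread::hardware_concurrency();
    return cores == 0 ? 1u : cores;
}

// Apply updateOne to every item, splitting the vector into contiguous slices.
// Each worker touches only its own slice, so no locking is needed here.
template <typename T, typename Fn>
void updateAll(std::vector<T>& items, unsigned workers, Fn updateOne) {
    if (workers <= 1 || items.size() < 2) {
        // Single processor: stay on one thread and avoid the switching cost.
        for (T& item : items) {
            updateOne(item);
        }
        return;
    }
    const std::size_t chunk = (items.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < items.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, items.size());
        pool.emplace_back([&items, begin, end, updateOne]() mutable {
            for (std::size_t i = begin; i < end; ++i) {
                updateOne(items[i]);
            }
        });
    }
    for (std::thread& t : pool) {
        t.join();
    }
}
```

Called as updateAll(robots, chooseWorkerCount(), step), this runs the plain loop on a single-processor machine and fans out on a multi-core one.  Steps that read neighbouring robots, such as collisions, would need more care than this sketch shows.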

Okay, all that said, I'm still all up for porting to a 'better' language or at least a newer version of VB (with threads).  I just want us to do it with our eyes open.  It's a lot of work.

EDIT - Oh, I should point out we will take a perf hit with C#.  Managed code is wonderful, you don't have to worry (as much) about memory leaks and such, but the garbage collection costs you maybe 10% on the client.
« Last Edit: December 19, 2006, 01:06:52 AM by EricL »
Many beers....