[ale] On Programming and Programming Languages

Michael B. Trausch mike at trausch.us
Fri Sep 17 11:07:28 EDT 2010


Given the C# thread and the volumes of comments that I have there---both
addressed and not, because of space---I thought I'd go ahead and just
send an email out with all my thoughts, whether or not this turns into
a thread that anyone has any interest in.  I do not expect that most
will care to read this; it's _quite_ long.  Oh, well.  Take the rambling
for what it is (or is not) worth.  :-)

There are, of course, _many_ languages out there.  And there are a lot
of people out there looking to hire for projects in language X which
have insane requirements like "five years experience in X", which is,
frankly, _stupid_.  It shows that management and HR types really know
next to nothing when it comes to the world of programming, and a lot of
programmers buy into that knowledge of nothing.  Moving on, though...

If you are a _programmer_ (regardless of what other hats you might wear,
such as system administrator) then you hold a lot of programming
knowledge that is independent of the implementation language.  For
example, you know algorithms, and you know where to look them up for the
ones that you've never used, and you're familiar with the costs and
benefits of them.  You know how to craft a hand-written parser for
simple to moderately complex data types.  You know that you have
metaprogramming tools available to you.  You know that a compiler isn't
some special magical black box, but yet another tool in the process of
translation, which is something that all programs ultimately wind up
doing (translating data from one form to another).  You are also fluent
in multiple _programming environments_.  And you are able to "program
into" them.  More on this in a minute.

It's easy to be fluent in a language.  Seriously.  It's more difficult
to be fluent with the entire standard library for a language, though,
whether that is a class library or a function library or a hybrid of
both.

Insofar as "programming into" an environment, it means that you do
creative things with the resources that you have.  For example, I often
see a lot of people say that "C isn't object-oriented".  Bull$#!t!  It
_absolutely_ is.  In fact, even assembler can be object oriented, if you
want it to be.  "Object-oriented"-ness isn't a feature of a programming
language that is depended upon to write programs in an object-oriented
manner.  Languages that support object-orientation natively have
provided a convenient notation for supporting that programming model,
and a type system that is accessible behind it, but nothing more than
that, really.  I mentioned GObject before, and I'll mention it again:
GObject is an object-oriented type system for C.  It is easy to write C
programs that use it, and it even supports refcount assisted memory
management along the lines of what you expect to see from other
languages.  It's more difficult to write classes in GObject, but there
are tools available which take the tedium out of that (for example, GOB
and Vala, both of which take code as input and output C-GObject code).
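
To make the "C is object-oriented if you want it to be" point concrete,
here is a minimal sketch in plain C.  This is _not_ GObject's API; the
Shape/Rect names and the vtable layout are made up purely for
illustration.  It only shows the general idea: a struct of function
pointers standing in for virtual methods, and a "subclass" that embeds
its base struct as its first member.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical "base class": every shape carries a pointer to its vtable. */
typedef struct Shape Shape;

typedef struct {
    double (*area)(const Shape *self);  /* plays the role of a virtual method */
    void   (*destroy)(Shape *self);     /* plays the role of a destructor */
} ShapeClass;

struct Shape {
    const ShapeClass *klass;
};

/* Dispatch through the vtable; the caller never needs the concrete type. */
double shape_area(const Shape *self)
{
    assert(self != NULL);
    return self->klass->area(self);
}

void shape_free(Shape *self)
{
    if (self != NULL)
        self->klass->destroy(self);
}

/* "Subclass": the base struct is the first member, so a pointer to a
 * Rect can be converted to a pointer to its Shape and back again. */
typedef struct {
    Shape  base;
    double width;
    double height;
} Rect;

static double rect_area(const Shape *self)
{
    const Rect *r = (const Rect *)self;
    return r->width * r->height;
}

static void rect_destroy(Shape *self)
{
    free(self);
}

static const ShapeClass rect_class = { rect_area, rect_destroy };

/* Constructor: returns NULL if the allocation fails. */
Shape *rect_new(double width, double height)
{
    Rect *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->base.klass = &rect_class;
    r->width = width;
    r->height = height;
    return &r->base;
}
```

A caller would write something like `Shape *s = rect_new(3.0, 4.0);`,
call `shape_area(s)`, and finish with `shape_free(s)`, without ever
knowing that the object is a Rect.  A real type system such as GObject
adds a great deal on top of this (runtime type registration,
refcounting, and so on), which this sketch leaves out.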

GNOME is written in C.  And GNOME is heavily object-oriented.  And the
GObject type system is really an awesome, amazing work.  It does it all
without the binary mangling problems of C++, without the managed runtime
aspects of Python, the JVM, the CLR, or whatever, and it works well,
given enough time and effort put into being fluent with that
particular programming environment.  And it is portable across POSIX
systems and even Windows to a certain degree (because of the way Windows
manages things like network file descriptors there can be a few caveats
that the programmer needs to be aware of, but it's not that awful a
situation, really).

Now, there is the issue of whether or not it is useful to learn C these
days.  It absolutely is!  C is not a dead language, by any stretch of
the imagination.  Why do I think that?  Because there are a lot of
things about being fluent in C that provide insight to programming
problems in other languages.  If you can sort through the issues of
programming language vs. type system vs. standard library in your head
(they really are all separate things!) then you can gain the insight
that C has to offer from a programming perspective.  There was once a
time when it was necessary to program in C for everything.  Do I think
that we should return to that?  Hell, no.  There are one-off things that
I use HLLs like Python for, and I wouldn't want to write those sorts of
things in C.  But the things that I do write in C are things that are
not one-off programs; that is, things that I deploy in multiple
environments.  And writing cross-platform C code, as I've mentioned
before, isn't as difficult as a lot of people seem to think it is.

I learned a lot of programming languages before I learned C.  BASIC (in
the forms of BASICA, GW-BASIC, QuickBASIC, and Visual Basic), 6502
assembler, some x86 assembler---the BASIC family and PHP were the ones
that I used the most.  And honestly, I think that was detrimental to my
way of thinking.  You see, while BASIC is a general-purpose language
("Beginner's All-purpose Symbolic Instruction Code", wow, I still
remember that?), it does not expose a lot of the system's underlying
functionality in the way that C does.  It cannot, in fact, because BASIC
requires certain built-in statements (PRINT, INPUT, and so forth) that
are part of the core language itself.  This coupling of the language to
things that ought to be explicit function calls in the language's
runtime makes it difficult to use for "all purposes".

After I learned the many variants of BASIC and then PHP, I spent a fair
amount of time avoiding C.  People told me that it was pointless to
learn, that it was arcane, that it was a dead/dying language, that it
was a waste of time.  And so I avoided it.  At some point, I had a
problem, and I don't even remember what that problem was now, but it was
a problem with a program that was written in C.  And I went looking
through it, and because C is so bloody different from BASIC and PHP I
was utterly confused.  Now, I'd been programming in PHP for quite some
time at this point, and I was fully used to the ideas of weak type
systems.  At first, I saw C's relatively strict data typing as a
hindrance to productivity.  How wrong was I!  It's something that I wish
PHP would implement; it would reduce bugs caused by implicit coercion
that I see in every PHP project I am paid to work on.
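
As a rough illustration of the difference, here is a sketch of what
turning text into a number tends to look like in C.  The helper name
parse_int is hypothetical (it is not a standard function); the point is
only that the conversion is an explicit call whose failure you have to
handle, rather than something the language quietly does for you.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert text to an int.
 * Returns 0 on success and stores the result in *out.
 * Returns -1, leaving *out untouched, if the text is not entirely a
 * valid integer or does not fit in an int. */
int parse_int(const char *text, int *out)
{
    char *end;
    long value;

    assert(text != NULL && out != NULL);

    errno = 0;
    value = strtol(text, &end, 10);

    if (end == text || *end != '\0')
        return -1;  /* no digits at all, or trailing junk like "12abc" */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return -1;  /* out of range for an int */

    *out = (int)value;
    return 0;
}
```

With this, `parse_int("42", &n)` succeeds, while `parse_int("12abc",
&n)` is rejected outright instead of being coerced into some number
that the rest of the program then trusts.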

Sorry, I digress a lot.

In any event, I had to learn C, and I was able to fix my problem.  And
then I put it off, and I didn't write anything in it because I had no
need to.  But as time went on, I kept coming back to learn C.  And the
standard library, and so forth.  I learned these so that I could
(mostly) fix problems with things I had, or at least that was my
intention at the time.  But I found that the more that I learned C and
the standard library and all of that, the more I could appreciate its
simplicity and structure.  I realized that I had been thinking about
programming all wrong!  So I started learning more about the whole
process.  I decided that if I'm going to continue to call myself a
programmer, I needed to really understand what the hell was going on.

So I started learning about compilers, and interpreters, in a
lower-level-than-most sort of manner.  What happens when GCC is run over
C code?  All of us know that if you have valid C input, you get object
code as output, and you can link that into an executable.  How do those
executables play nicely together?  What about for other languages, is it
the same?  Turns out that C++ has non-standardized linkage while C has
standardized linkage and a lot of other programming environments take
advantage of that (too bad that PHP does not, though that is probably
because the bulk of PHP code out there would choke and die if it did).
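
A small sketch of what taking advantage of C linkage usually looks like
in practice.  The library name mylib and the function mylib_add are
made up for illustration; the `extern "C"` guard is the common idiom
for letting a C++ compiler treat the declarations as C symbols instead
of applying its own name mangling.

```c
#include <assert.h>

/* --- would normally live in a header, e.g. a hypothetical mylib.h --- */
#ifdef __cplusplus
extern "C" {            /* only seen by C++ compilers: use C linkage */
#endif

int mylib_add(int a, int b);

#ifdef __cplusplus
}
#endif

/* --- would normally live in the matching .c file --- */
int mylib_add(int a, int b)
{
    return a + b;
}
```

Compiled as C, the function is exported under a predictable name
(plain `mylib_add` on typical ELF systems; some platforms add a leading
underscore).  Compiled as C++ without the guard, the exported name
depends on the compiler's mangling scheme, which is exactly why other
environments tend to bind against the C form.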

Jumping topics again, I said in the other thread somewhere that if you
learn C, your abilities as a programmer greatly improve.  Allow me to
correct and clarify on this point somewhat.  There is some indescribable
quality of knowledge that can be obtained from working at a low level.
You get a chance to see all the things that are going on, a chance to
understand what is _really_ happening.  When I allocate some memory in
C, it's pretty straightforward compared to the creation of a new
variable in Python, at least in terms of what's really happening behind
the scenes.  And if you have a problem with the Python VM, you have a
hell of a problem debugging it because you probably (as a Python
programmer) do not care to understand all the internals of the Python
VM, you just want to get things done.  So now you're holding a ball that
you will need to handle---either you find someone to give the ball to,
and depend on them to fix it, or you start learning enough about the
Python VM to be able to troubleshoot what's going on and fix it.  After
all, all software has bugs.

The lower-level your understanding is, though, the more that you can
infer.  And if you know C, you can understand other C-family languages
more easily.  I mentioned before that in this family there are C++, C#,
and Java.  Omitting the standard libraries of each, my statement is
factually correct.  The languages themselves have very similar rules in
terms of syntax and type-strength.  You have nuances that are of course
different, or they wouldn't be different languages.  But they are all
part of the same family.  PHP, OTOH, while it looks like it is a
C-family language at a glance, cannot be described as such.  Some of the
syntax is similar, but it is _very_ different in terms of how you would
write code for it, and the very simple fact that you can just $create a
variable anywhere and it's valid even if it $creates a new $Variable
changes the fundamental understanding that one has to have in order to
properly scrutinize and read a program.

To that end, I think that everyone should at least be familiar with what
assembly language is, and how it works.  I don't think that everyone
could or should program in it, because once you get "lower" than C, you
have the problem that you are now tying yourself to a particular CPU
family, and potentially even just tying yourself to a single CPU model.
While that used to be something that was useful at one point in time, it
has no payoff anymore.  You can get all the performance that you need
from a good C compiler, and if you need to squeeze more than that, you
can pay someone who will write hand-optimized assembler, and you'll pay
through the nose for it (which I suppose will tell you just how much you
probably don't need to do it, since another CPU is less expensive).

But writing code in C is _not_ as expensive as people make it out to be.
There is a *heavy* investment in learning.  *HEAVY*.  And you need to
sit yourself down and you need to find a routine that works for you in
terms of writing code.  There are multiple books on programming best
practices out there that talk about how to do things like ensure that
you're always keeping up on your memory allocations and whatnot.  And
for the most part, it is as simple as adopting a convention and
following it.  And if your convention is well-thought-out, you can
probably even write a small program that will automatically enforce it
for you.  You can use your version control system to run sanity checks
on your code, telling you when you step out of your convention.

The real expense with C and C++ code (and in fact, with nearly any other
language that I can think of) comes not in the writing of it by a
programmer, but in the maintenance of it by a programmer when the code
was originally written by someone who isn't a programmer, but writes
programs.  There's a distinction, I think.  I know many people that can
write programs, but I would not call them programmers.  You say
"algorithm" to them and they say "what's that?"  You say "type system"
to them and they again say, "what's that?"  And many of these are people
that can put "5 years of programming in X" on their résumé!

I'm not saying that "real" programmers are immune to bugs and failures
and mistakes, but as with any field, mistakes are far less likely to
happen if you are aware of what causes them and head them off before
they can form.  If you have a brand new car and you don't want to get
ketchup in the seats, don't eat in the car.  If you are a C programmer
and you don't want to leak memory, remember to have a matching free (or
g_free, or struct_or_object_free) call to match every malloc (or
g_malloc, or struct_or_object_new) call.  You do this the same way that
you match your braces, and you're good to go (for the most part).  And
if you write library code that will be reused, and you make it easy to
do both, chances are that it'll get done and you won't forget.
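
The new/free pairing convention can be sketched like this.  The Person
type and the person_new/person_free names are hypothetical, chosen only
to show the shape of the convention: every allocation made by the _new
function is released by the matching _free function, so the caller has
exactly one pair to remember.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object that owns a heap-allocated copy of a string. */
typedef struct {
    char  *name;
    size_t name_len;
} Person;

/* Allocates a Person and a private copy of name.
 * Returns NULL (with nothing leaked) if either allocation fails. */
Person *person_new(const char *name)
{
    Person *p;

    assert(name != NULL);

    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;

    p->name_len = strlen(name);
    p->name = malloc(p->name_len + 1);
    if (p->name == NULL) {
        free(p);            /* undo the first allocation before bailing out */
        return NULL;
    }
    memcpy(p->name, name, p->name_len + 1);  /* includes the trailing NUL */
    return p;
}

/* Releases everything person_new allocated.  Accepts NULL, like free(). */
void person_free(Person *p)
{
    if (p == NULL)
        return;
    free(p->name);
    free(p);
}
```

In calling code, a `person_new("Mike")` at the top of a block gets a
`person_free(p)` at the bottom of it, the same way an opening brace
gets its closing brace.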

In any case, I've touched on every point I wanted to touch on, for now,
I think.  If this turns into a discussion (which I'd like very much to
see, but it might be too long to keep anyone's interest) then cool.

	--- Mike


