kill -FKOFFDAMMIT 25208

blog entry posted by lalo (Lalo Martins) on 2008-11-09 05:23:00

You know, we (Unix-y people) need a new signal, stronger than SIGKILL. As satisfying as it can be to type “killall -KILL firefox” (we just know that's the actual reason you still love the command line), there are a number of situations where that will still not get rid of the damn process; for example, if it's in the middle of some syscalls, especially NFS (grr!) or swapping (which is precisely when you need to kill it). So I'd like to propose a new signal which, let's say, waits for half a second, and if the process really doesn't respond, then gets rid of it for good, regardless of what else it was doing. In the middle of your quality toilet time, with pants down and all? Who cares, just get out. (If the process does respond, then I suppose do a -TERM... or rather, the other way around; send a -TERM, wait half a second, and if nothing seems to be happening, then bring out the ultraviolence?)
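The “send a -TERM, wait half a second, then escalate” half of that idea can at least be approximated from userspace today. Here is a minimal Python sketch, assuming the target is a child process we started ourselves through subprocess.Popen; the function name term_then_kill and the 0.5 second default are made up for illustration. It does not deliver the new signal: a process stuck in an uninterruptible syscall will sit there until the kernel lets go, even after the SIGKILL.

```python
import subprocess
import sys


def term_then_kill(proc, grace=0.5):
    """Ask politely with SIGTERM, then fall back to SIGKILL.

    `proc` is a subprocess.Popen for a child we own (an assumption of this
    sketch). Returns the name of the last signal that had to be sent.
    """
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace)  # give the child `grace` seconds to exit
        return "TERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()  # reap the child; blocks until the kernel delivers the kill
        return "KILL"


if __name__ == "__main__":
    # Hypothetical demo: a well-behaved child that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(term_then_kill(child))
```

A child that ignores SIGTERM takes the second branch after the grace period; a well-behaved one never sees the SIGKILL.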

Here's a list of suggested names for the new signal.

and my personal favourite:

XML considered harmful, or,

blog entry posted by lalo (Lalo Martins) on 2008-10-25 15:37:00

I have, on a number of occasions, stated that XML is harmful, and should be taken out and shot. So here I am today, to explain why I think that, and offer alternatives.

Not good for humans

The main problem is, of course, that XML was never intended for humans. It's not designed so that we can efficiently write it, read it, understand it at a glance, or maintain it. But many tools that use XML today tend to forget that, leading to hours of wasted time and lots of frustration. (XML for configuration files, anyone? Zope's ZCML and .NET's configs and all those Java frameworks?)

Then, of course, that's not XML's fault; it was never designed to succeed at that task. The fault lies with developers who misuse it. Well, yes and no. The reason people misuse it is that it's overhyped; XML is the new peanut butter (or garlic butter, according to Pete Abrams): adding it to anything makes it taste better and sell more. (I don't even like peanut butter.)

Not good for machines

What it was designed for is communication between programs; a unified, extensible format for data transmission. By having libraries to handle it in most languages and environments, you'd make it easy for developers to deal with it, and as a consequence, to make their programs communicate.

However, after roughly ten years of working with it, it is my informed opinion that XML fails at that, too. I'm not saying it got supplanted by better technology which we invented later. It did, to be fair. But what I'm saying is that it was wrong from the beginning. And if it's not good for us and it's not good for our programs, why are we still using it? (Peanut butter, I know.)

So let's try to break out of the hype and prove that it's bad for our programs.

The perceived problem with XML can be summarised in one sentence: XML is costly to parse. But that's too superficial; let's go deeper, look at the specifics, and the flaws in philosophy/design that lead to this perception.

Parsing XML: layers

I usually tell my co-workers that there are two “layers” to parsing XML. While that is true, it's only true in the context of our data; if I were to make that statement more generic, I'd say: there are always at least two “layers” to parsing XML.

The first, the “bottom” layer if you want, is syntactic parsing. This means reading XML itself: tags, entities, attributes, comments, CDATA, PCDATA, white space, the works. The input to syntactic parsing is a string or stream of bytes; the “output” is an API — SAX, DOM, ElementTree, you name it.

On the opposite end of the stack, the “top” layer so to speak, is semantic parsing, or extracting the data you're actually interested in. The “input” here is a generic API; in the typical case of two layers, the API from syntactic parsing. The “output” is a domain-specific API or, more commonly, a collection of structured data (usually objects, nowadays).
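To make the two layers concrete, here is a small Python sketch using the standard library's ElementTree. The library/book document, the Book class and the two function names are invented for illustration; only the ElementTree calls are real.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical sample document, made up for this sketch.
XML_TEXT = """
<library>
  <book id="1" year="1965"><title>Dune</title></book>
  <book id="2" year="1984"><title>Neuromancer</title></book>
</library>
"""


@dataclass
class Book:
    book_id: int
    year: int
    title: str


def syntactic_parse(text):
    """Bottom layer: bytes or string in, generic element tree out."""
    return ET.fromstring(text.strip())


def semantic_parse(root):
    """Top layer: generic tree in, domain objects out."""
    books = []
    for element in root.findall("book"):
        books.append(
            Book(
                book_id=int(element.get("id")),  # attributes arrive as strings
                year=int(element.get("year")),
                title=element.findtext("title"),
            )
        )
    return books


if __name__ == "__main__":
    for book in semantic_parse(syntactic_parse(XML_TEXT)):
        print(book)
```

Note that everything in semantic_parse is hand-written knowledge about this one document: which values are attributes, which are child elements, and which strings are really numbers.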

An example where you may have more than two layers is when you're using something else built on top of XML; the most common case being feeds. So at the bottom layer something will parse XML, then another chunk of code will parse that as RSS or Atom, and then your semantic layer will actually extract the data. At work, we initially made our data available as RDF; so we had a second, “middle” layer (we actually used a JavaScript RDF library) which would parse the RDF, and then we did our semantic parsing by using the RDF library's API. That made our code a lot simpler, but it also made it a lot slower; so we later switched to ignoring the RDF and simply treating it as XML. (Even later, we switched to a JSON format.)

Syntactic parsing: too much structure

Syntactic parsing is what XML is supposedly “all about”; the point being, you don't see it. In our case, at work, it's done by the browser (which gives us DOM with a touch of XPath). In pretty much any other case, it will still be done by your environment (the browser, in our case; JBoss and .NET are other examples), or by a standard library.

Well, that's great, right?

It is, yeah. But it hides the fact that those libraries (even if it's “hidden” in the environment, it's still at some level done by a library) tend to be huge and ridiculously complex. The XML syntax is designed to cover an enormous universe of cases that your program will concretely never encounter, and yet, you have to pay the complexity cost for them.

Semantic parsing: not enough structure

XML shines on XHTML: a markup language for text, where you have arbitrary streams of text sprinkled with special instructions about it. Some of those “instructions” are really containers, which have more text and instructions. XML does that really well.

It shines a little less on something like SVG, where it represents arbitrary streams of heterogeneous objects. Some of those contain other objects, and XML does help there.

But the truth is that, for representing your program's data? It probably sucks. Its model is very different from the object model of most (all?) popular languages and frameworks today. In the end, we find ourselves designing our data structures as many as three times: once in the language in which we're actually writing it, once in a relational database, and once in XML. The mappings between them are often poor, since the semantics of the three models are so poorly matched.

Sadly, it would be relatively trivial to pick a lowest-common-denominator model that would fit all of today's popular languages. But XML didn't even try.

That's not the whole of my objection, though. Due to the MASSIVE FAIL in the syntactic layer, we get a semantic layer that's only marginally simpler than it would be to parse a DSL (domain-specific language); maybe less simple, if you use a good library for your DSL. There are about half a dozen XML APIs in wide use; smart people are frequently getting annoyed at the ones already there and coming up with a new, better one. And although a modern offering like, say, ElementTree can be light-years ahead of SAX or DOM, it can't help being clumsy and feeling unnatural to the language; at the bottom line, what it's doing is dressing up a rotting corpse.

Conclusion

Here's a better phrasing then, for the problem of XML as I see it:

XML has too much structure where it doesn't help, and not enough where it matters. One of the reasons I love JSON is that it's not designed to mark up text, or to transfer “streams of data”; it's designed to transfer objects (JSON means “JavaScript Object Notation”), which means it maps nicely to my code on both ends, whether that code is JavaScript, Python, C++, or even C. (It maps nicely to Java as well, but who cares.)
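As a rough illustration of that mapping, here is a Python sketch using only the standard json module. The payload, the Book class and the helper names are hypothetical; the point is only that numbers, strings and lists come out as native types without a hand-written semantic layer.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Book:
    book_id: int
    year: int
    title: str
    tags: list


# Hypothetical payload, made up for this sketch.
PAYLOAD = '{"book_id": 1, "year": 1965, "title": "Dune", "tags": ["sf", "classic"]}'


def load_book(text):
    """JSON objects become dicts, so the keys can feed the constructor directly."""
    return Book(**json.loads(text))


def dump_book(book):
    """The reverse direction is just as short."""
    return json.dumps(asdict(book))


if __name__ == "__main__":
    book = load_book(PAYLOAD)
    print(book)
    print(dump_book(book))
```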

Alternatives (existing and ideal)

Right now, for real-life code, most places where you're using (or thinking of using) XML would probably be better served with JSON. A few more complex cases may justify a DSL, but I would hesitate a lot before going down that route.

Ideally, I'd like to propose a new format; an “active” derivative of JSON, inspired by the modern practice of “JSON with callback”. Essentially, I'd like to replace JSON's “flat” object notation ({"attr1": "value", "attr2": "value"}) with something which looks like a Python constructor (MyClass(attr1='value', attr2='value')). The pseudo-classes (or pseudo-functions if you're looking at it from C) would play the role that tag names play in XML elements, which would make it even more straightforward to map this data to actual objects on each end.

This would, of course, lose the benefit that “JSON with callback” can simply be executed in a browser. But then again, “JSON with callback” is not formally correct JSON anyway, so we already sacrificed some portability for that ability. “Real” JSON is usually converted to “JSON with callback” by a simple routine on the server side. A similar transformation could convert the format I'm proposing into JavaScript; the fragment above would become: MyClass({attr1: 'value', attr2: 'value'}).
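The format has no specification, so the following is only one possible reading of it: a Python sketch that borrows the standard ast module to read the constructor-like notation without executing anything, and then emits the JavaScript form described above. The Node class and the parse and to_javascript names are hypothetical, as is the choice to allow lists and nested pseudo-constructors as values.

```python
import ast
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str  # pseudo-class name, playing the role of an XML tag name
    attrs: dict = field(default_factory=dict)


def _convert(node):
    """Turn one syntax-tree node into a Node, a list, or a plain literal."""
    if isinstance(node, ast.Call):
        if not isinstance(node.func, ast.Name) or node.args:
            raise ValueError("only Name(key=value, ...) is allowed")
        attrs = {}
        for keyword in node.keywords:
            if keyword.arg is None:  # rejects Name(**something)
                raise ValueError("keyword unpacking is not allowed")
            attrs[keyword.arg] = _convert(keyword.value)
        return Node(node.func.id, attrs)
    if isinstance(node, ast.List):
        return [_convert(item) for item in node.elts]
    # Anything else must be a plain literal; literal_eval raises otherwise.
    return ast.literal_eval(node)


def parse(text):
    """Parse the constructor-like notation; nothing is ever executed."""
    return _convert(ast.parse(text.strip(), mode="eval").body)


def to_javascript(value):
    """Emit the Name({key: value}) form that a browser could run."""
    if isinstance(value, Node):
        body = ", ".join(f"{k}: {to_javascript(v)}" for k, v in value.attrs.items())
        return f"{value.name}({{{body}}})"
    if isinstance(value, list):
        return "[" + ", ".join(to_javascript(item) for item in value) + "]"
    return json.dumps(value)  # note: strings come out double-quoted


if __name__ == "__main__":
    print(to_javascript(parse("MyClass(attr1='value', attr2='value')")))
```

The emitted strings use double quotes rather than single ones, which is the same thing as far as JavaScript is concerned.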

Billy's old car

blog entry posted by lalo (Lalo Martins) on 2008-02-05 11:01:00

Billy has a broken car. It's broken mostly because he doesn't know how to run it, of course; and more importantly, because it's making Billy rivers of money even though it's broken, so there's no incentive to learn to run it right.

Then Jerry drives by with an incredibly shaggy, shitty car, with a few nice accessories.

Billy immediately thinks: I'll buy Jerry's car, and use the parts to make mine less broken!

Of course, Billy still not knowing what to do with a car, it will remain broken, or best case, be fixed for a short time before he breaks it again. And it will still be hugely profitable. So to be honest, I see nothing wrong with Billy's logic.

Yeah, there's a reasonable chance it will make zero difference to Billy's car in the long run. But it will make people talk about Billy even more, which, of course, puts more money in his pocket. Maybe even more than he's offering.

Spam, beautiful spam, lovely spam!

blog entry posted by lalo (Lalo Martins) on 2007-08-31 02:34:00

Spam is so useful! I keep being surprised every time I clean my spam box!

See; this MegaDik thing supposedly will add 4 inches to my length down there. Did you buy yours yet? All my 7 girlfriends have been complaining I need another 2 inches. Well, I'll get 4 and they'll be happy!

Of course, I can only have the energy to have sex with so many girls because I always buy ciallis from these other nice guys that keep sending me spam. In fact, I GOT one of these girlfriends thanks to those great "win a free ipod" spams -- since I already had 3, I gave the new ipod to a girl I was interested in, and she immediately took off her clothes!

And it's only fair that I spend so much money on spam. After all, my email seems to win some lottery or another twice a week, I don't even have to work anymore... not to speak of all the money I got helping that nice gentleman that had some trouble in Nigeria!

I'm so happy that we have spam. When we started using the internet, we had no idea it would eventually become such a huge factor in our lives.

Better stop now; my eyes are welling up.

On "the boundaries between computers"

blog entry posted by lalo (Lalo Martins) on 2005-09-23 16:39:00

Rough sketch of how I would like to use my laptop and my (more powerful) office computer (let's call it the "station"). For the record, the station has two monitors, as of this writing.

When I say data is "in" a computer, I mean I'm able to use this data when I have this computer and no other, and I'm unable to use this data when I don't have this computer.
