Automagically-translating chat thingy

blog entry posted by lalo (Lalo Martins) on 2008-12-19 20:23

Usually, I have to communicate with the people in the building's management office via Google Translate. It works, but it's awfully painful to be constantly flipping the language drop-downs back and forth. (It's two drop-downs, one for source and one for target language.)

So I wrote a little JavaScript gadget that does the hard work for me, and also keeps a “log” of the conversation. You can peruse it at http://lalomartins.info/transchat.html

(Attention though: this is not a chat app, not in the modern sense. It's “chat” in the old-school sense, of actually talking to a person that's in front of you. It's... an interpreter widget, not a chatbox :-) enjoy and spread if you wish...)

kill -FKOFFDAMMIT 25208

blog entry posted by lalo (Lalo Martins) on 2008-11-09 05:23

You know, we (Unix-y people) need a new signal, stronger than SIGKILL. As satisfying as it can be to type “killall -KILL firefox” (we just know that's the actual reason you still love the command line), there are a number of situations where that will still not get rid of the damn process; for example, if it's in the middle of some syscall, especially on NFS (grr!) or while swapping (which is precisely when you need to kill it). So I'd like to propose a new signal which, let's say, waits for half a second, and if the process really doesn't respond, then gets rid of it for good, regardless of what else it was doing. In the middle of your quality toilet time, with pants down and all? Who cares, just get out. (If the process does respond, then I suppose do a -TERM... or rather, the other way around; send a -TERM, wait half a second, and if nothing seems to be happening, then bring out the ultraviolence?)
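The “send a -TERM, wait half a second, then escalate” part can at least be approximated with the signals we already have. Here's a minimal Python sketch, assuming a POSIX system and a process you started yourself; the names terminate_then_kill and spawn_sleeper are made up for this illustration. Note that it can't do better than SIGKILL: a process stuck in an uninterruptible syscall will still hang around, which is exactly why the new signal is needed.

```python
import subprocess
import sys


def terminate_then_kill(proc, grace=0.5):
    """Ask nicely with SIGTERM, wait `grace` seconds, then fall back to SIGKILL.

    `proc` is a subprocess.Popen object. Returns the name of the last
    signal that was sent.
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        # This wait can still block if the process is stuck inside the
        # kernel; no existing signal can help with that.
        proc.wait()
        return "SIGKILL"


def spawn_sleeper(seconds=60):
    """Start a throwaway child process to try the helper on (hypothetical)."""
    code = f"import time; time.sleep({seconds})"
    return subprocess.Popen([sys.executable, "-c", code])
```

Something like terminate_then_kill(spawn_sleeper()) should report "SIGTERM" for a well-behaved child; a child that ignores SIGTERM would get the SIGKILL treatment after the grace period.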

Here's a list of suggested names for the new signal.

and my personal favourite:

XML considered harmful, or,

blog entry posted by lalo (Lalo Martins) on 2008-10-25 15:37

I have, on a number of occasions, stated that XML is harmful, and should be taken out and shot. So here I am today, to explain why I think that, and offer alternatives.

Not good for humans

The main problem is, of course, that XML was never intended for humans. It's not designed so that we can efficiently write it, read it, understand it at a glance, or maintain it. But many tools that use XML today tend to forget that, leading to hours of wasted time and lots of frustration. (XML for configuration files, anyone? Zope's ZCML and .NET's configs and all those Java frameworks?)

Then, of course, that's not XML's fault; it was never designed to succeed at that task. The fault lies with developers who misuse it. Well, yes and no. The reason people misuse it is that it's overhyped; XML is the new peanut butter (or garlic butter, according to Pete Abrams): adding it to anything makes it taste better and sell more. (I don't even like peanut butter.)

Not good for machines

What it was designed for is communication between programs; a unified, extensible format for data transmission. By having libraries to handle it in most languages and environments, you'd make it easy for developers to deal with it, and as a consequence, to make their programs communicate.

However, after roughly ten years of working with it, it is my informed opinion that XML fails at that, too. I'm not saying it got supplanted by better technology which we invented later. It did, to be fair. But what I'm saying is that it was wrong from the beginning. And if it's not good for us and it's not good for our programs, why are we still using it? (Peanut butter, I know.)

So let's try to break out of the hype and prove that it's bad for our programs.

The perceived problem with XML can be summarised in one sentence: XML is costly to parse. But that's too superficial; let's go deeper, look at the specifics, and the flaws in philosophy/design that lead to this perception.

Parsing XML: layers

I usually tell my co-workers that there are two “layers” to parsing XML. While that is true, it's only true in the context of our data; if I were to make that statement more generic, I'd say: there are always at least two “layers” to parsing XML.

The first, the “bottom” layer if you want, is syntactic parsing. This means reading XML itself: tags, entities, attributes, comments, CDATA, PCDATA, white space, the works. The input to syntactic parsing is a string or stream of bytes; the “output” is an API — SAX, DOM, ElementTree, you name it.

On the opposite end of the stack, the “top” layer so to speak, is semantic parsing, or extracting the data you're actually interested in. The “input” here is a generic API; in the typical case of two layers, the API from syntactic parsing. The “output” is a domain-specific API or, more commonly, a collection of structured data (usually objects, nowadays).

An example where you may have more than two layers is when you're using something else built on top of XML; the most common case being feeds. So at the bottom layer something will parse XML, then another chunk of code will parse that as RSS or Atom, and then your semantic layer will actually extract the data. At work, we initially made our data available as RDF; so we had a second, “middle” layer (we actually used a JavaScript RDF library) which would parse the RDF, and then we did our semantic parsing by using the RDF library's API. That made our code a lot simpler, but it also made it a lot slower; so we later switched to ignoring the RDF and simply treating it as XML. (Even later, we switched to a JSON format.)
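To make the two layers concrete, here's a small Python sketch using the standard library's ElementTree. The feed document, the Entry class, and both function names are invented for this illustration; they are not the data format we used at work.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Made-up sample document, only for illustration.
FEED = """<feed>
  <entry id="1"><title>First post</title><score>3</score></entry>
  <entry id="2"><title>Second post</title><score>5</score></entry>
</feed>"""


@dataclass
class Entry:
    id: int
    title: str
    score: int


def syntactic_parse(text):
    """Bottom layer: a string goes in, a generic tree API comes out."""
    return ET.fromstring(text)


def semantic_parse(root):
    """Top layer: a generic tree goes in, domain objects come out."""
    return [
        Entry(
            id=int(element.get("id")),
            title=element.findtext("title"),
            score=int(element.findtext("score")),
        )
        for element in root.findall("entry")
    ]


entries = semantic_parse(syntactic_parse(FEED))
```

Notice that everything coming out of the bottom layer is a string; all the knowledge about which values are numbers, and which elements matter, lives in the hand-written top layer.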

Syntactic parsing: too much structure

Syntactic parsing is what XML is supposedly “all about”; the point being, you don't see it. In our case, at work, it's done by the browser (which gives us DOM with a touch of XPath). In pretty much any other case, it will still be done by your environment (JBoss and .NET are other examples), or by a standard library.

Well, that's great, right?

It is, yeah. But it hides the fact that those libraries (even if it's “hidden” in the environment, it's still at some level done by a library) tend to be huge and ridiculously complex. The XML syntax is designed to cover an enormous universe of cases that your program will never actually encounter, and yet, you have to pay the complexity cost for them.

Semantic parsing: not enough structure

XML shines on XHTML: a markup language for text, where you have arbitrary streams of text sprinkled with special instructions about it. Some of those “instructions” are really containers, which have more text and instructions. XML does that really well.

It shines a little less on something like SVG, where it represents arbitrary streams of heterogeneous objects. Some of those contain other objects, and XML does help there.

But the truth is that, for representing your program's data? It probably sucks. Its model is very different from the object model of most (all?) popular languages and frameworks today. In the end, we find ourselves designing our data structures as many as three times: once in the language in which we're actually writing the program, once in a relational database, and once in XML. The mappings between them are often poor, since the semantics of the three models are so poorly matched.

Sadly, it would be relatively trivial to pick a lowest-common-denominator model that would fit all of today's popular languages. But XML didn't even try.

That's not the whole of my objection, though. Due to the MASSIVE FAIL in the syntactic layer, we get a semantic layer that's only marginally simpler than it would be to parse a DSL (domain-specific language); maybe less simple, if you use a good library for your DSL. There are about half a dozen XML APIs in wide use; smart people are frequently getting annoyed at the ones already there and coming up with a new, better one. And although a modern offering like, say, ElementTree can be light-years ahead of SAX or DOM, it can't help being clumsy and feeling unnatural to the language; at the bottom line, what it's doing is dressing up a rotting corpse.

Conclusion

Here's a better phrasing then, for the problem of XML as I see it:

XML has too much structure where it doesn't help, and not enough where it matters. One of the reasons I love JSON is that it's not designed to mark up text, or to transfer “streams of data”; it's designed to transfer objects (JSON means “JavaScript Object Notation”), which means it maps nicely to my code on both ends, whether that code is JavaScript, Python, C++, or even C. (It maps nicely to Java as well, but who cares.)
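For comparison, here's roughly what “maps nicely to my code” looks like in Python, using the standard json module. The sample document is made up for this illustration.

```python
import json

# Made-up sample document, only for illustration.
text = '{"title": "First post", "score": 3, "tags": ["xml", "json"]}'

# One call, and the result is already native data: a dict holding
# a string, an integer, and a list. No semantic layer to write by hand.
post = json.loads(text)

# The trip back is just as short.
round_trip = json.loads(json.dumps(post))
```

The number arrives as a number and the list as a list, so there's very little left for a “semantic layer” to do.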

Alternatives (existing and ideal)

Right now, for real-life code, most places where you're using (or thinking of using) XML would probably be better served with JSON. A few more complex cases may justify a DSL, but I would hesitate a lot before going down that route.

Ideally, I'd like to propose a new format; an “active” derivative of JSON, inspired by the modern practise of “JSON with callback”. Essentially, I'd like to replace JSON's “flat” object notation ({'attr1': 'value', 'attr2': 'value'}) with something which looks like a Python constructor (MyClass(attr1='value', attr2='value')). The pseudo-classes (or pseudo-functions if you're looking at it from C) would play the role that tag names play in XML elements, which would make it even more straightforward to map this data to actual objects on each end.
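Just to show the idea is cheap to implement, here's a rough Python sketch that reads this notation by borrowing Python's own parser (the standard ast module). Everything in it is an assumption made for illustration: the function names, the REGISTRY lookup, and the restriction to keyword arguments with plain literal values. Nested constructors are not handled, and this is not a specification of the format.

```python
import ast
from dataclasses import dataclass


def parse_constructor(text):
    """Parse "Name(key=literal, ...)" into a (name, attributes) pair."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected something like Name(key=value, ...)")
    if node.args:
        raise ValueError("only keyword arguments are supported in this sketch")
    attributes = {}
    for keyword in node.keywords:
        if keyword.arg is None:
            raise ValueError("**kwargs is not supported in this sketch")
        # literal_eval only accepts plain literals, so no code is executed.
        attributes[keyword.arg] = ast.literal_eval(keyword.value)
    return node.func.id, attributes


@dataclass
class MyClass:
    attr1: str
    attr2: str


# Hypothetical mapping from pseudo-class names to real classes.
REGISTRY = {"MyClass": MyClass}


def build(text):
    """Turn the notation into an actual object on the receiving end."""
    name, attributes = parse_constructor(text)
    return REGISTRY[name](**attributes)


obj = build("MyClass(attr1='value', attr2='value')")
```

The pseudo-class name does the job a tag name does in XML, except that here it leads straight to a real class.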

This would, of course, lose the benefit that “JSON with callback” can simply be executed in a browser. But then again, “JSON with callback” is not formally correct JSON anyway, so we already sacrificed some portability for that ability. “Real” JSON is usually converted to “JSON with callback” by a simple routine on the server side. A similar transformation could convert the format I'm proposing into JavaScript; the fragment above would become: MyClass({attr1: 'value', attr2: 'value'}).
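That server-side transformation could be as small as the following Python sketch. It assumes the format is restricted to keyword arguments with JSON-compatible literal values, and the function name to_javascript is made up. The output quotes the keys and uses double quotes, which JavaScript treats the same as the fragment above.

```python
import ast
import json


def to_javascript(text):
    """Rewrite "Name(key=literal, ...)" as the JavaScript call Name({...})."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected something like Name(key=value, ...)")
    if node.args or any(keyword.arg is None for keyword in node.keywords):
        raise ValueError("only plain keyword arguments are supported in this sketch")
    # literal_eval only accepts plain literals, so no code is executed.
    attributes = {
        keyword.arg: ast.literal_eval(keyword.value) for keyword in node.keywords
    }
    return f"{node.func.id}({json.dumps(attributes)})"


javascript = to_javascript("MyClass(attr1='value', attr2='value')")
```

The resulting string can be handed to a browser the same way “JSON with callback” is today, with MyClass playing the role of the callback.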

Review: Sanctuary

blog entry posted by lalo (Lalo Martins) on 2008-10-06 23:23

This weekend I watched the first double-episode of Sanctuary, the new series on the Sci-Fi Channel. If you're a self-respecting sci-fi geek, you probably know that Sanctuary was created by one actor, one writer, and one producer of Stargate: SG-1, and that it started off as a web-based series. The double-episode is, in fact, the first “season” of the web series, with the tiniest bit of re-shooting and (dare I say it?) “re-post-production”.

The writing isn't bad, the acting is decent (great in some cases, but unfortunately not the lead), and the special effects are pretty good.

Still, I give it a “FAIL”. Sorry, but it's just not interesting. There's nothing new, there's nothing that happens there to keep me interested. Supernatural creatures living in secret in our world? Yawn, that was cool in the early 90s. What, so the big secret of the “mysterious” Doctor Magnus is that? Sorry, that was already old in the early 90s when the rest of the premise was cool. Also, you just ruined the “mysterious” part by revealing it so soon.

It's also too slow in the first half, with lots of talking heads and little plot progress, while the second half has too much action and little plot progress. In fact, plot progress tends to happen in “bursts”, which is, sorry, not good at all.

Good try, but I won't be coming back for the next one.

Freedom for Whom?

blog entry posted by lalo (Lalo Martins) on 2008-09-24 16:50

I think I first saw this argument in a Slashdot comment, years ago. I've since adopted it, refined it, and used it a lot myself; but now in light of the Android release, I think it's worth mentioning again.

The big problem I see with “Open Source” is that there are, in fact, two groups there. Fortunately the same is not true of Free Software, but even our arguing that it's about freedom still doesn't help... well, read on.

The thing with “Open Source” is: who is it open to?

Arguably, Open Source, as a vague, undefined thing, has existed for decades. But as a conscious, named movement with its own marketing, it spun off from the Free Software movement in the late 1990s, after the “open-sourcing” of Mozilla and the publishing of The Cathedral and the Bazaar. (Or, according to some, it spun off a few weeks later, when RMS noticed those guys were talking about something else and split off from the Open Source Initiative.) Still, in hindsight, one can say things like the BSDs, and even the original Unix, were done more in the spirit of Open Source than of Free Software.

Now Free Software, with all its GNU/FSF writings, has always been very clear about its goals. We're here for the freedoms of the user. If you get a piece of software, you have a bunch of inalienable rights, rights that aren't being respected by most software, and which we intend to uphold and defend. Nice, eh?

Open Source people, on the other hand, seem to be a little confused about this. It's like watching two madmen (or drunks) arguing, each founding an argument on an entirely different premise. Some, perhaps still in touch with the “origins” of Open Source in the 90s, believe it's about being “open” to the users of the software. Others have adopted the belief (from BSD maybe?) that it's all about “openness” to the developers.

(More importantly, some of them don't realise Free Software ≠ Open Source, and mistakenly argue this in even more confusing terms; like the old fallacy that the GPL, and viral licenses in general, are bad for Free Software because they give “less freedom” than BSD-style licenses. They do, if you're thinking of other developers, who will then have the “freedom” to “steal” my software and use it in their own closed software, and not give back to the project in any way. I don't care in the least about those; I'm writing software for the freedom of my users, and those have their freedoms enforced by a viral license. Now are viral licenses bad for Open Source? Honestly, I couldn't care less.)

The Android platform seems to be firmly planted in the latter camp, sadly. (Or maybe not so sadly; I rejoice with every Java-based product that fails.) It's “open”, first and foremost, for handset makers and network operators, and a distant second, to application developers. “Openness” for the end-user doesn't even seem to be a consideration. Now of course, the two things are pretty much incompatible; being “open” to the operators means, really, “open” for them to “close” it in whatever ways they want; so yeah, no VoIP.

Oh well. At least I don't need to be conflicted about whether I want an Android device, whether I can stand Java long enough to actually like the OS. Clearly, that won't be a consideration, and OpenMoko — or, if they fail, someone else, probably using LiMo or FSO stacks — will be the mobile phone for me. Eventually :-)
