
Tuesday, January 25, 2011

Remote debugging with Java


Sometimes an issue occurs only on certain machines or only at a certain time of day. There are a couple of possible methods to investigate such an issue (like adding extra logging), however I would like to add another one: remote debugging through TCP/IP.

To do this, start your Java program with the following JVM parameters:

-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=23334

The meaning of the parameters is as follows:

  • server=y – this application will act as a TCP/IP server (“acceptor”) and wait for incoming connections rather than trying to connect to you
  • suspend=n – the server will not suspend on startup (alternatively you can set it to “y” in which case it will pause and wait for the debugger to connect – useful if you need to debug issues occurring at startup)
  • address=23334 – the port on which the application will listen for the debugger. Keep in mind that only one program can listen on a given port on a machine, and if the given port is not available, the program will not start

After the program has started, open Eclipse, go to Debug Configurations, Remote Java Application, create a new entry and set "Host" to the machine name or IP and "Port" to 23334 (or whatever other port you've set up). Connect to it and off you go. The configuration steps for IntelliJ can be found here (I didn’t check it, but they seem right). A couple of final thoughts:

  • If your sources are not in sync with the remote jars, you will see weird stuff (like breakpoints not triggering, triggering on the “wrong” line, etc), so you should make sure that you have the same sources as the jar does. If you still get into the situation where the sources are different from the classfiles, I found that setting breakpoints on "method entry" works as expected (i.e. it breaks even if the method in the classfile is on a different line)
  • You can "detach" from a certain process and it keeps running (and later on you can re-attach to it)
  • This method is of low bandwidth / overhead, so it can be used to debug servers in remote locations
  • Never, ever do this in production, unless you are absolutely, 100% certain that you know what you are doing. The debug port is not authenticated: anyone who can connect to it can control the JVM.
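
As a safety net for the last point, a program can check at startup whether it was launched with the debug agent enabled. The sketch below is only an illustration: the class and method names are made up for this example, and it assumes the agent is enabled either with -Xrunjdwp (as above) or with the newer -agentlib:jdwp spelling of the same option.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class DebugAgentCheck {
  /** Returns true if any of the given JVM arguments enables the JDWP debug agent. */
  public static boolean isJdwpEnabled(List<String> jvmArgs) {
    for (String arg : jvmArgs) {
      if (arg.startsWith("-Xrunjdwp") || arg.startsWith("-agentlib:jdwp")) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // The arguments passed to the JVM itself (not the program's own arguments)
    List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
    if (isJdwpEnabled(jvmArgs)) {
      System.err.println("WARNING: this JVM accepts remote debugger connections");
    }
  }
}
```

What to do when the check fires (log a warning, refuse to start) depends on the environment.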

Navigating (Searching) Collections


Update: this article has been crossposted to the Transylvania JUG blog.

The Java collections framework includes the concept of NavigableSets / NavigableMaps. The principle behind these interfaces is that, given a SortedSet/SortedMap, you can look up the elements nearest to a given value and work with a range (a subset) of the collection. Some examples:

Given the following set:

public void setUp() {
  set = new TreeSet<Integer>();
  set.addAll(Arrays.asList(1, 2, 3, 4, 6, 7, 8));
}

The following is true:

// Returns the least element in this set greater than or equal to the given element
assertEquals(Integer.valueOf(6), set.ceiling(5)); 
// Returns the greatest element in this set less than or equal to the given element
assertEquals(Integer.valueOf(4), set.floor(5));
// Returns the least element in this set strictly greater than the given element
assertEquals(Integer.valueOf(7), set.higher(6));
// Returns the greatest element in this set strictly less than the given element
assertEquals(Integer.valueOf(3), set.lower(4));

// Returns a view of the portion of this set whose elements are strictly less than toElement.
assertTrue(set.headSet(4).containsAll(Arrays.asList(1, 2, 3)));
assertEquals(3, set.headSet(4).size());
// Returns a view of the portion of this set whose elements are greater than or equal to fromElement.
assertTrue(set.tailSet(4).containsAll(Arrays.asList(4, 6, 7, 8)));
assertEquals(4, set.tailSet(4).size());
// Returns a view of the portion of this set whose elements range from fromElement, inclusive, to toElement, exclusive.
assertTrue(set.subSet(4, 8).containsAll(Arrays.asList(4, 6, 7)));
assertEquals(3, set.subSet(4, 8).size());
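
The same kind of navigation is available on maps through NavigableMap. The following is a minimal sketch using the same keys as the set above; the class name and the "valueN" strings are invented for the illustration.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class NavigableMapDemo {
  /** Builds a map with the keys 1, 2, 3, 4, 6, 7, 8 mapped to "value<key>". */
  public static NavigableMap<Integer, String> sample() {
    NavigableMap<Integer, String> map = new TreeMap<Integer, String>();
    for (int key : new int[] {1, 2, 3, 4, 6, 7, 8}) {
      map.put(key, "value" + key);
    }
    return map;
  }

  public static void main(String[] args) {
    NavigableMap<Integer, String> map = sample();
    System.out.println(map.ceilingKey(5));              // 6
    System.out.println(map.floorEntry(5).getValue());   // value4
    System.out.println(map.headMap(4, true).keySet());  // [1, 2, 3, 4]
    System.out.println(map.descendingMap().firstKey()); // 8
  }
}
```

Note that headMap(4, true) takes an extra flag to make the upper bound inclusive; the single-argument headMap(4) excludes it, like headSet(4) above.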

Also, the subsets / submaps / "views" remain connected to the parent collection, so adding / removing to/from the parent collection updates them:

SortedSet headSet = set.headSet(4);
assertTrue(headSet.containsAll(Arrays.asList(1, 2, 3)));
assertEquals(3, headSet.size());

// subsets remain connected
set.removeAll(Arrays.asList(1, 2));
assertEquals(1, headSet.size());

// subsets remain connected
set.addAll(Arrays.asList(-1, 1, 2, 3, 4, 5));
assertTrue(headSet.containsAll(Arrays.asList(-1, 1, 2, 3)));
assertEquals(4, headSet.size());

Finally, you can manipulate the subsets and the result will be reflected in the original set (however if you try to add an out-of-range element, you will get an exception):

SortedSet headSet = set.headSet(4);
// adding through the view also adds to the parent set
headSet.add(-1);
assertTrue(headSet.containsAll(Arrays.asList(-1, 1, 2, 3)));
assertEquals(4, headSet.size());
assertTrue(set.containsAll(Arrays.asList(-1, 1, 2, 3, 4, 6, 7, 8)));
assertEquals(8, set.size());
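
The out-of-range case mentioned above can be sketched as follows. TreeSet views throw IllegalArgumentException when an element outside the view's range is added; the helper class and method are hypothetical names for this illustration.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class HeadSetRange {
  /**
   * Tries to add value through the headSet(4) view of the given set.
   * Returns false if the view rejects the value as out of range.
   */
  public static boolean addThroughHeadSet(TreeSet<Integer> set, int value) {
    SortedSet<Integer> headSet = set.headSet(4);
    try {
      headSet.add(value);
      return true;
    } catch (IllegalArgumentException e) {
      // the view only accepts elements strictly less than 4
      return false;
    }
  }
}
```

With the set from setUp(), addThroughHeadSet(set, -1) succeeds and the parent set grows to 8 elements, while addThroughHeadSet(set, 5) is rejected and leaves the parent set unchanged.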

The implementation is very memory efficient: there is no copying of elements going on. One thing to consider is that by default these operations are not thread safe! I.e. if you generate two subsets of the same set and process them on two different threads, you must take care to properly synchronize the processing.

The complete source code can be found on Google Code under Public Domain or the BSD license.

How to test for the implementation of toString()


Update: This entry has been crossposted to the transylvania-jug blog.

Problem statement: you have some value objects for which you implemented toString() (for debugging purposes) and now you would like to test using a unit test that these implementations exist.

Possible solutions:

  1. Use reflection to detect the existence of the method:
    // requires: import java.lang.reflect.Method;
    boolean hasToStringViaReflection(Class<?> clazz) {
      Method toString;
      // getDeclaredMethod only finds methods declared directly in clazz
      try { toString = clazz.getDeclaredMethod("toString"); }
      catch (NoSuchMethodException ex) { return false; }
      return String.class.equals(toString.getReturnType());
    }

    Advantage: no third party libraries needed. Also, no instance of the class is needed. Disadvantage: the actual code is not executed, so even trivial errors (like null dereferences) are not caught. Also, code coverage tools will report the lines as not covered.
  2. Compare the string returned by the toString method to the string returned by Object and expect them to be different. This uses ObjectUtils from Apache Commons Lang:
    boolean hasToStringViaInvocation(Object o) {
      // identityToString(o) yields the default "ClassName@hexHash" form
      return !ObjectUtils.identityToString(o).equals(o.toString());
    }

    Advantage: the actual code is executed, so trivial errors are detected. Also the code will be "covered". Disadvantage: it requires an external library (however Commons Lang contains a lot of goodies, so it is sensible to add it most of the time). Also, it requires an instance of the class, so you need to be able to instantiate it.
  3. Don't use hand-coded methods at all, but rather some code-generation / AOP style programming like Project Lombok.
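
For comparison, here is a sketch of the first two approaches using only the standard library. It is a variation, not the code from the repository: the class and method names are invented, the reflection check also accepts a toString() inherited from a superclass other than Object, and the invocation check rebuilds the default "ClassName@hexHash" string by hand instead of calling ObjectUtils.

```java
public class ToStringChecks {
  /** True if clazz, or a superclass other than Object, declares toString(). */
  public static boolean overridesToString(Class<?> clazz) {
    try {
      // getMethod searches the whole hierarchy; we then check where the method was declared
      return clazz.getMethod("toString").getDeclaringClass() != Object.class;
    } catch (NoSuchMethodException e) {
      return false; // no public toString() was found for this type
    }
  }

  /** True if o.toString() differs from the default identity-based representation. */
  public static boolean hasCustomToString(Object o) {
    String identity = o.getClass().getName() + "@"
        + Integer.toHexString(System.identityHashCode(o));
    return !identity.equals(o.toString());
  }
}
```

One caveat of the invocation check: a class which overrides hashCode() but not toString() will usually be reported as having a custom toString(), because the default Object.toString() uses hashCode() rather than the identity hash.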

Again, these methods are to be used for toString methods which have debugging purpose only. In case the output of the method needs to conform to some stricter rule, more checks need to be applied.

The complete source code can be found on Google Code under Public Domain or the BSD license.

Non-buffered processor in Perl


Let's say that you have the following problem: you want to write a script which processes the output of a program and writes the modified output somewhere, with as little buffering as possible. One concrete example (for which I needed the script) is log rotation: you want to save the output of a program (which doesn't support log rotation by itself) to a logfile which gets rotated at midnight (because it includes the date in the name). Another constraint is that you would like to “time-out” the read attempt to do some maintenance work (for example you would like to rotate your logs – create the files with the different dates – even when no data is written to them).

One possibility would have been to use IO::Select, however it doesn't support filehandles on Windows (not that Windows wouldn’t have the API to do so, it’s just that nobody has implemented it in Perl core). Fortunately we can have something very similar to it:

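use strict;
use warnings;
use IO::Handle;
use Time::HiRes qw(sleep); # the built-in sleep only handles whole seconds

binmode STDIN;
binmode STDOUT;

# non-blocking mode: sysread returns undef when no data is available
STDIN->blocking(0);

my $BUFFLEN = 4096;
while (1) {
  my $buffer;
  my $read_count = sysread(STDIN, $buffer, $BUFFLEN);
  if (not defined($read_count)) {
    # nothing to read, pause (maintenance work could be done here)
    sleep 0.1;
    next;
  }
  if (0 == $read_count) {
    # EOF condition
    exit 0;
  }
  syswrite(STDOUT, $buffer);
}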

The magic is done here by STDIN->blocking(0); which sets the filehandle into a non-blocking mode, so that sysread returns “undef” if there is nothing to read. Whenever this happens (i.e. there is no data on the input) it pauses for a brief moment (1/10 of a second) and then retries.

Some other remarks about the code:

  • the input is read and the output is written as binary. This means that no processing is done which could screw up the flow (for example trying to convert data between character sets and screwing up Unicode characters)
  • care is taken to introduce minimal buffering. Output is produced as soon as the input arrives. For more intricacies of Linux buffering see this nice article at pixelbeat.
  • the code is very performant. I’ve measured throughputs up to 1.4 Gb/sec, so it can certainly handle anything the disk can (if we consider it in the context of a log rotator)
  • the code has been tested and works on both Windows (Strawberry Perl 5.12.1) and Linux. It should work mostly anywhere since it uses Core Perl.

Monday, January 24, 2011

Comparative book review


Below is a short comparative review of two books about Java concurrency which I've read in the last couple of months. Disclaimer: the Amazon links are affiliate ones.

Java Concurrency in Practice is an interesting book, which should be a must-read for anyone doing concurrent programming in Java (and these days if you aren’t, you’re missing out on a whole lot of possible performance improvement). While some readers criticize it for the dense style, it is hard to see how one could tackle such a complicated topic in a simpler way (to paraphrase Albert Einstein: one needs to make things as simple as they need to be and no simpler). That said, the book definitely has the topics ordered from simple to more advanced, so even if you find the idea of reading the whole book daunting, you should look at the first couple of chapters at least. I would especially recommend chapter 3 (Sharing Objects) from part I (Fundamentals) which should give a clear motive to everyone why they should be concerned by thread-safety and how they should reason about concurrent programs (I find that many concurrency errors occur because people have a naive and simplistic understanding of the way concurrency works on modern hardware).

Concurrent Programming in Java: Design Principles and Patterns (2nd Edition): The CPiJ book is much older, ancient even by computer age standards (published in 1999, compared to the JCiP book published in 2006). It also describes a much more manual, tedious way of doing things compared to the newer book. Also, it talks about the precursor of the java.util.concurrent package, since the package didn’t exist back then. All in all: if possible, get the JCiP book. If you already have the CPiJ book, it is a good introduction to the topic, however beware that much of the advice is outdated and Java 6 (and even Java 5) contain better and simpler ways to perform the tasks described in the book.

noevir review


noevir is a "direct marketing" company focusing on cosmetics and "* care" (skin, body, etc) type of products. After looking at their site I'm mostly neutral about them. I wouldn't recommend anyone to join such ("direct marketing") organizations, but that's not specific to noevir. It also says "Ginza Tokyo" in the header, which is a big shopping street in Tokyo, but I couldn't find any other connection to Tokyo (nor did I see this brand advertised last summer when I was in Tokyo and I visited Ginza, but then again, I wasn't looking for it). I also can't find it on the BBB site (a worrisome sign), but the contact address coincides with the domain registration (a good sign) and it is a real address, findable on Google Maps. There are also a lot of negative articles on the web, but they are related to the "direct marketing" part of the business (i.e. if you join as a consultant) not to the products. I couldn't find anything negative about the products.

My final verdict is: use a temporary credit card if buying from them (always a good idea when dealing with smaller merchants). If you buy something, buy it for its obvious qualities (like its scent), not for the advertised but hard to quantify qualities (like "cleansing", "protection", etc). If you need help with a skin issue, consult a medic.

Full disclosure: this is a paid review from ReviewMe. Under the terms of the understanding I was not obligated to skew my viewpoint in any way (ie. only post positive facts).

scentsy review


scentsy has an interesting concept for providing different scents in the room: rather than burning different materials (like candles or sticks) it uses a lightbulb to heat the wax. This provides a "smoke-free" way to enjoy your fragrances. Another advantage of the concept is that it keeps the warm glow of the candle. If scents are your thing and you don't like the smoke part, give this a try. Take care however that some people are sensitive to strong scents and they can have adverse reactions (like headache). The electric system can help here also: you can use an electric timer (easy to find in most places) to control the dosage. This is also helpful if you are concerned (as I am) about electric appliances overheating.

A couple of final thoughts:
  • it uses its own checkout system rather than something better known like Google Checkout. I would recommend using a one-time credit card (like the ones offered by PayPal) for added safety (I just don't feel comfortable giving my credit card details to small merchants)
  • it has an A rating on BBB (which is good)
  • it also has some kind of a referral system. I would strongly advise people against participating in such systems, but if the products are good, buy them.
Full disclosure: this is a paid review from ReviewMe. Under the terms of the understanding I was not obligated to skew my viewpoint in any way (ie. only post positive facts).

Monday, January 03, 2011

Processing clipboard data in Perl


The problem: let's say you have a program which generates data to the clipboard (or it is easier to get the data into the clipboard than into a file) and you want to process the data (create a summary for example).

Perl to the rescue!

Get the Clipboard module (if you use Linux, it is as easy as sudo cpan -i Clipboard; sudo apt-get install xclip but the package is also available as an ActivePerl package for example).

Write a script like the following:

use strict;
use warnings;
use Clipboard;

my $clippy = Clipboard->paste();
my ($sum, $cnt) = (0, 0);
while ($clippy =~ /Processed in: (\d+)/g) {
        $sum += $1;
        $cnt += 1;
}

# print the average; skip it if nothing matched to avoid dividing by zero
print $sum/$cnt, "\n" if $cnt > 0;

Profit!!! :-)

Update: you can combine this with syntax highlight for example to obtain nicely formatted source code.

Update: copying stuff to the clipboard doesn't seem to work under Linux (tested under Ubuntu 10.10) because it invokes xclip with the "primary" clipboard but it only seems to work with the "clipboard" clipboard. Unfortunately I didn't find any good material about the distinction between these different clipboard types, but the "monkey patch" below fixes the problem for me (of course I also filed a bug with the package so this should be resolved in a future version).

use strict;
use warnings;
use Clipboard;

if ('Clipboard::Xclip' eq $Clipboard::driver) {
  no warnings 'redefine';
  # try the "clipboard" selection first instead of "primary"
  *Clipboard::Xclip::all_selections = sub {  
    qw(clipboard primary buffer secondary)
  };
}

# ... your code here ...

Why Ubuntu 10.10 is better than Windows XP?


I want to preface this with the following: I don't want to pull a fanboy move here. The only thing I assert is that a recent OS (ie. Ubuntu 10.10) can give a considerable performance improvement (without changing the hardware) compared to an almost 10 year old OS (Windows XP).

Without further ado, compiling a large(ish) Java project on Windows XP:

real    3m16.776s
user    0m2.333s
sys     0m0.796s

And Ubuntu 10.10:

real    1m32.169s
user    2m10.488s
sys     0m12.677s

More than twice as fast! Neat!

Update: a friend just got a newer machine with better processor (Core i5 vs Core Duo) with Windows 7. The new machine with Windows 7 compiles the project in ~1m50s, so still Ubuntu seems to be the better choice.

Sunday, January 02, 2011

Java has some surprising amount of dynamism in it


Not long ago I saw some Java code from Simone Tripodi. It generates synchronization wrappers around arbitrary objects at runtime in a typesafe manner with a couple of easy to understand lines of code. The heavy lifting is done by the dynamic proxy mechanism (java.lang.reflect.Proxy), which has been available since Java 1.3.

The downside is that there seems to be a 30% to 40% performance impact based on some quick benchmark I've done. However one cannot overstate the value of not having to write or maintain code!

There are surprisingly many things one can do in Java in a typesafe manner (including things usually associated with dynamic languages), which helps catch more errors early (at compile time) and lets you take full advantage of the different helper features available in IDEs (such as auto-complete).
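
To give an idea of how such a wrapper can look, here is a minimal sketch based on java.lang.reflect.Proxy. It is not Simone Tripodi's code: the class name, the method name and the choice of a single private lock object are assumptions made for this illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class SynchronizedProxy {
  /** Wraps target so that every call made through iface is serialized on one lock. */
  @SuppressWarnings("unchecked")
  public static <T> T synchronizedProxy(final T target, Class<T> iface) {
    final Object lock = new Object();
    InvocationHandler handler = new InvocationHandler() {
      public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        synchronized (lock) {
          try {
            return method.invoke(target, args);
          } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the exception thrown by the target itself
          }
        }
      }
    };
    return (T) Proxy.newProxyInstance(
        SynchronizedProxy.class.getClassLoader(), new Class<?>[] { iface }, handler);
  }
}
```

A call such as synchronizedProxy("abc", CharSequence.class) returns a CharSequence whose methods all run under the lock. Dynamic proxies only work for interfaces, and every call goes through reflection, which is the most likely source of the overhead mentioned above.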



performance tweaks and tools for linux

Java Date objects can mutate, even when read


Ran into this problem a couple of months ago, when we saw some strange dates in production. So I dug into the Java library sources (thank you Sun for providing those!) and found that Date objects aren't always "normalized". Rather, sometimes a "denormalized" value is stored which is later (lazily) normalized. The normalized value isn't properly synchronized with regards to the Java memory model however, which means that sometimes you can get weird (and incorrect!) results.

To illustrate the problem, I've created a small program. It does the following:

  1. It creates a Date object and sets it to certain values
  2. Schedules multiple Runnable's which examine the value of the object on a threadpool

Everything looks fine and dandy, right? The object isn't changed (apparently) after being handed off to the threadpool, yet sometimes wrong answers still appear (it takes ~30 min on my laptop for such an event). So what are the lessons here?

  • Get your API right! If an operation looks like a read to the user, don't write to the object behind the scenes!
  • You can still do lazy initialization (if you really want to), but be sure to make it thread-correct (volatile, synchronized, etc) or at least document it (even though nobody reads the documentation)
  • Source code FTW! I couldn't have debugged this without source code. Ok, maybe I could (decompiling class files is not that hard), but probably I wouldn't have bothered.
  • Finally, the solution (hack) in this particular situation is to call getTime() after setting the values, which preemptively normalizes the internal representation. Of course the proper solution would be to pass around truly immutable objects (like timestamps or value objects from Joda Time).
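
The last point can be sketched as follows. This is an illustration under assumptions, not the original test program: the class and method names are invented, setMinutes stands in for whichever setters the real code used, and the pool size is arbitrary. The Date is normalized with getTime() on the thread which created it, and only the primitive timestamp is shared with the worker threads.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SafeDateSharing {
  /** Builds a Date with a deprecated setter, then freezes it into a primitive timestamp. */
  @SuppressWarnings("deprecation")
  public static long buildTimestamp() {
    Date date = new Date(0L);
    date.setMinutes(90);   // per the post, setters may leave the Date lazily normalized
    return date.getTime(); // forces normalization on the creating thread
  }

  /** Reads the timestamp from several threads; each task builds its own private Date. */
  public static List<Long> readConcurrently(final long timestamp, int tasks) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<Long>> futures = new ArrayList<Future<Long>>();
      for (int i = 0; i < tasks; i++) {
        futures.add(pool.submit(new Callable<Long>() {
          public Long call() {
            return new Date(timestamp).getTime();
          }
        }));
      }
      List<Long> results = new ArrayList<Long>();
      for (Future<Long> future : futures) {
        results.add(future.get());
      }
      return results;
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

Since a long cannot be mutated by a reader, every task sees the same value no matter how the threads are scheduled.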