Thursday, 15 May 2014

Lua fun

Today the following Lua code floated past my desk with a request for any improvements.

The background is that we need to detect auditable events on an external source, and then assign those events to a pool of workers to be processed. These workers are not 'threads'; they are fully-fledged processes that may be on remote machines, and more than one of them may be free and try to pick up the same event. The queued events will be held in a database: this is a lightweight conversion of the RabbitMQ-based approach currently in use to a comparable implementation that uses the existing database of one of our products, for customers who do not want to maintain RabbitMQ in production.

    
    -- get next row
    id = db:get_next_event()
    
    -- attempt lock, may fail with unique constraint.              
    ok, err = insert_event_lock(id,owner)         

    if ok then 
        return id
    else

        -- analyze error message for constraint violation
        if process_error(err) == "duplicate" then 

            -- retry immediately
            return nil          
                  
        else

            -- mark as error and retry after timeout
            error("row is bad") 
                  
        end

    end

That's ok, but I didn't like the way process_error() relies on detecting the name of the unique index we expect to be violated as load increases, nor the general approach of using a database exception for normal flow control.

Here is my improved version:


    work_id = next_lock_for_owner(self)

    if not work_id then 

        -- get next action with no lock
        next_id = get_next_pending_id()           

        -- can be duplicate
        insert_empty_lock(next_id)                

        -- ‘where clause’ updates all duplicates with same id to self
        update_lock_owner(next_id,self)           

    end

    -- may be nil (get new lock on *next* select to avoid races)
    return work_id                                

We can do even better than that, though. The criterion that implements the get_next_pending_id() function (basically, return an id in the source that does not yet have a lock) can be made implicit in the where clause of the update. We also need to add an update step before the next_lock_for_owner() step to set a 'processing' flag, which makes the implementation of next_lock_for_owner() merely return the top 1 record with the processing flag set. This allows us to run all three steps in separate processes/threads, like so:


Thread 1 (T1):

    -- sets a processing flag
    mark_next_lock_for_owner(self)                

    -- may be nil, signifies no work to do
    return next_lock_for_owner(self)              

Thread 2 (T2):

    -- get next action with no lock
    next_id = get_next_pending_id()
               
    if (next_id ~= nil) then

         -- can be duplicate
         insert_empty_lock(next_id)               

    end

Thread 3 (T3):

    -- ‘where clause’ updates all duplicates with same id to self
    update_lock_owner(next_id,self)               
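The interplay of the three steps can be sketched as an in-memory simulation (Java is used for illustration here; the real steps are SQL statements against the lock table, and every name below is an assumption). It shows why duplicate empty locks are harmless: the owner update claims every lock row with the same id.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory simulation of the lock table (all names assumed).
class LockTable {

    // one lock row: the event id plus its owner (null = empty lock)
    static class LockRow {
        final int eventId;
        String owner;
        LockRow(int eventId) { this.eventId = eventId; }
    }

    private final List<LockRow> rows = new ArrayList<>();

    // T2: insert an empty lock; duplicates are allowed by design
    void insertEmptyLock(int eventId) {
        rows.add(new LockRow(eventId));
    }

    // T3: the 'where clause' updates ALL locks with the same id to one owner,
    // which is what makes duplicate empty locks harmless
    void updateLockOwner(int eventId, String owner) {
        for (LockRow row : rows)
            if (row.eventId == eventId) row.owner = owner;
    }

    // T1: return an id locked by this owner, or null to signify no work
    Integer nextLockForOwner(String owner) {
        for (LockRow row : rows)
            if (owner.equals(row.owner)) return row.eventId;
        return null;
    }
}
```

Running two simulated processes through the T2 -> T3 race leaves every duplicate row owned by whichever update ran last, so the subsequent select is unambiguous.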

Key to understanding this version is that it uses the transactional handling of the database for the heavy lifting. In this version, we are not trying to stop duplicate locks being made, just ensuring that they are harmless when they happen. There are three possible race conditions to deal with:

Notation will be as follows. For now, assume that there are two processes, P1 and P2, with all three of these threads running in each one (in reality, the threads will be distributed across multiple machines for scalability and robustness). This leaves us with a notation like P2T1 -> P1T2/3, which should be read as process 2's thread 1 firing between process 1's thread 2 and thread 3 completing.

  • Another process's T2 triggers in a different process's T2 -> T3 gap. This results in a duplicate empty lock, which is harmless because the T3 update claims all locks with the same id.
  • Another process's T3 triggers in a different process's T3 -> T1 gap. This results in a previously assigned worker having its work 'stolen', something we expect and which is harmless.
  • Another process's T1 triggers between a T2 and a T3. The where clause of T1 returns only fully-assigned locks (i.e. after T3), but its completion update (not shown) updates all locks for the id, whether or not they were assigned to it, hence closing any locks that slipped through the T2 -> T3 gap.

Additionally, the existence of duplicate locks is a clear signal that the system has too many workers for its load.

Friday, 4 April 2014

Lambdas make things better pt1.

Lambdas in Java make a lot of code much prettier.

Take this quick example. Without lambdas, it's not worth the effort. With lambdas, it's pretty enough to do:

import java.util.function.Function;

public class EndpointConnection<T> {

    public static class ConnectionConfig {
        public int maxRetries;
        public int retryDelay;
    }

    private final T endpoint;

    private final int maxRetries;
    private final int retryDelay;

    public EndpointConnection(final T endpoint, final ConnectionConfig config) {
        this.endpoint = expect(endpoint);
        this.maxRetries = expect(config.maxRetries);
        this.retryDelay = expect(config.retryDelay);
    }

    public <R> R attempt(final Function<T, R> fn) {

        int count = 1;
        R result = null;
        while (result == null) {

            if (count > this.maxRetries) throw new RuntimeException("Error: max# of attempts exceeded.");

            result = fn.apply(this.endpoint);

            if (result == null) {

                count++;

                try {
                    Thread.sleep(this.retryDelay * 1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }

            }
        }

        return result;
    }

}

Then combine with the factory method in the class in question:

public class Endpoint {

    ...

    public EndpointConnection<Endpoint> newConnection(final ConnectionConfig config) {
        return new EndpointConnection<>(this,config);
    }

}

and call like so:

return endpointConnection.attempt(Endpoint::someFunctionCall);
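For comparison, the core retry loop can be reduced to a self-contained sketch (the Retry class and its signature are illustrative assumptions, not part of the code above):

```java
import java.util.function.Function;

// Self-contained retry-until-non-null loop (Retry and its signature are
// illustrative assumptions, not the EndpointConnection API above).
class Retry {
    static <T, R> R attempt(final T endpoint, final Function<T, R> fn, final int maxRetries) {
        for (int count = 1; count <= maxRetries; count++) {
            final R result = fn.apply(endpoint); // one attempt against the endpoint
            if (result != null) return result;   // a non-null result means success
        }
        throw new RuntimeException("Error: max# of attempts exceeded.");
    }
}
```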

This is merely an example for the purposes of illustration, however; I don't recommend using this sort of code when you can use something like this instead:

https://github.com/rholder/guava-retrying

Scan for a set of URLs with Jsoup.

Snippet to load a list of URLs, scan them for link tags, and output them to a file, with Jsoup.


public void run(final String in, final String out, final String match) throws IOException {

    final StringJoiner joiner = new StringJoiner("\n");

    try (BufferedReader br = new BufferedReader(new FileReader(in))) {
        for (String line; (line = br.readLine()) != null; ) {
            try {
                System.out.println("Scanning: " + line);
                for (final Element link : Jsoup.connect(line)
                        .timeout(0)
                        .get()
                        .select(match)) {
                    joiner.add(link.attr("href"));
                }
            } catch (Exception e) {
                System.err.println("Error: " + e.getMessage());
            }
        }
    }

    Files.write(Paths.get(out), joiner.toString().getBytes());
}

Where the 'match' arg will be a Jsoup matching pattern such as: a[href^=http://].
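The surrounding collect-and-join skeleton is independent of Jsoup. As a sketch (names are illustrative), with the scraping step abstracted to a function it can be exercised without a network:

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.function.Function;

// The collect-and-join skeleton from the snippet above, with the Jsoup call
// abstracted to a function so it runs offline (names are illustrative).
class LinkScan {
    static String run(final List<String> urls, final Function<String, Iterable<String>> scan) {
        final StringJoiner joiner = new StringJoiner("\n");
        for (String url : urls)
            for (String href : scan.apply(url))
                joiner.add(href); // StringJoiner inserts "\n" only between entries
        return joiner.toString();
    }
}
```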

Thursday, 20 March 2014

JDK8 Initial Conversion Thoughts, pt 1

The 'vertical problem' in Java does appear to be gone with JDK8. Here is a snippet of some common code used in my daily work.

Pre-JDK8


    private BoneCPDataSource addShutdownHook(final BoneCPDataSource pool) {
        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {

            @Override
            public void run() {
                pool.close();
            }

        }));
        return pool;
    }

As a Lambda


    private BoneCPDataSource addShutdownHook(final BoneCPDataSource pool) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            pool.close();
        }));
        return pool;
    }

To method reference


    private BoneCPDataSource addShutdownHook(final BoneCPDataSource pool) {
        Runtime.getRuntime().addShutdownHook(new Thread(pool::close));
        return pool;
    }


That's one line per step: perform a pool-close on shutdown, and return the pool argument back to the caller. Nice.

Friday, 28 February 2014

Errata Security: C programming: you are teaching it wrong

Superb article:


 In other engineering disciplines, you learn failure first. If you
are engineering bridges, you learn why the Tacoma Narrows failed. If you
are architecting skyscrapers, you learn why the World Trade Center
fell. If you are building ships, you learned why the Titanic sank.


Only in the discipline of software engineering is failure completely
ignored. Even after more than a decade of failure, students are still
taught to write software as if failure is unlikely to be a threat they
ever face.

Read more here: Errata Security: C programming: you are teaching it wrong



Saturday, 11 January 2014

In Defence of DTOs: Why the term DTO is not a dirty word.

About:


I am aiming to show here why your project probably should use DTOs.

The code you will see here originated in some work I did whilst working in my own startup a few years ago, and in several older personal projects (some going back almost 10 years!). However, I have updated it recently as part of this 'ghost who codes' blog project; it has therefore, for the most part, never been used in real projects, and it almost certainly has serious bugs I have missed. It is provided primarily for the purposes of illustration; if you are planning to use something from this, please consider it alpha quality only. And if you do find bugs, I would love to see it forked and fixed.

Justification:


I have shared my 'generic' Dto library here (https://github.com/llaith/toolkit/blob/master/toolkit-core/src/main/java/org/llaith/toolkit/core/dto/) which implements just such behavior and is the topic of this post.

It's been said that the DTO is an anti-pattern; that it inherently suffers from the 'Anemic Domain Model' anti-pattern. You can read about that in this excellent article: http://www.martinfowler.com/bliki/AnemicDomainModel.html . Please note that Fowler doesn't himself offer DTOs as an example.

It's simply not a given that a DTO component will necessarily be an anemic domain model, and here's why:


Firstly, a good Dto implementation will have behavior, because there are 'concerns' of the 'service facade' layer that can be addressed efficiently by using smart DTOs with this behavior. These could be around the issues of mediation of version changes, efficient object graph slicing, marshaling and de-marshaling, etc. Note that I am not talking merely about 'presentation object' matters such as getting the usual getDisplayName() because they don't belong there either. I am talking about change and conflict tracking, and perhaps session co-ordination if you have rich GUIs with many instances of these behind more than one view of the data.

Taking the idea of change/conflict tracking: at some point you will try to save/push edited data back into the system you got it from. If this data has changed (on either side), you will need to deal sensibly with those changes. Rejecting the entire edit and reloading the data is not my personal idea of sensible. Sensible, for me, involves merging the current set of edits with the updates to the source, noting whether they conflict, and presenting those conflicts to the client with a confirm button.
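The merge just described can be sketched per field as a classic three-way comparison (this is an illustration of the idea, not the library's API):

```java
import java.util.Objects;

// Per-field three-way merge: combine a local edit and a source update
// relative to the original value, flagging conflicts for the user to
// confirm. An illustration of the idea, not the library's API.
class FieldMerge {

    enum Outcome { UNCHANGED, TOOK_LOCAL, TOOK_SOURCE, CONFLICT }

    static Outcome merge(final Object original, final Object local, final Object source) {
        final boolean localEdit = !Objects.equals(original, local);
        final boolean sourceEdit = !Objects.equals(original, source);
        if (!localEdit && !sourceEdit) return Outcome.UNCHANGED;
        if (localEdit && !sourceEdit) return Outcome.TOOK_LOCAL;
        if (!localEdit) return Outcome.TOOK_SOURCE;
        // both sides changed: only a conflict if they changed to different values
        return Objects.equals(local, source) ? Outcome.TOOK_LOCAL : Outcome.CONFLICT;
    }
}
```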

Secondly, it's only a matter of time in any decent sized 'enterprise' system before you will need to combine and present data that does not originate in your system's domain model.

For example, naively attempting to use your 'transparent domain objects' as the backing objects behind the forms displayed to the users is going to fall apart when you need to start combining and correlating data from more than one source to do so. This will happen sooner rather than later if you take the step of building your applications in a truly modular manner as that will extend to the database level and there will be some necessary pain in treating tables which could be linked with foreign keys as external data from another system.

Note also that (as with Hibernate) passing these domain objects back will usually expose the behind-the-scenes rewriting done to support the object-slicing behavior; you may get back proxies with lazy-loading collections, for example. If you don't get these issues when passing domain models back, it's because it's not supported, and you will end up implementing this manually in some layer.

So, if you don't use DTOs or the domain model itself as the transport objects, you will be using an untyped structure such as a hash-map. This is to be avoided at all costs: what is forgotten here is that the type information is valuable in itself; it is not merely redundant. If we didn't have a DTO with methods like setFirstName(String), and just put a String instance into a Map, we'd need metadata to allow any generic framework to do anything useful with it at all (even validate it!). This type information (think of it as metadata) is the 'secret sauce' that allows static analysis and tooling.
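To see the point in miniature: a generic validator over an untyped map must be handed the type information separately, whereas a typed setter gets it checked by the compiler for free (the names below are illustrative):

```java
import java.util.Map;

// A generic validator over an untyped map needs the expected types passed in
// as separate metadata; a typed setter like setFirstName(String) carries the
// same information in its signature. Names are illustrative.
class MapValidation {
    static boolean validate(final Map<String, ?> dto, final Map<String, Class<?>> metadata) {
        for (Map.Entry<String, Class<?>> entry : metadata.entrySet()) {
            final Object value = dto.get(entry.getKey());
            if (value != null && !entry.getValue().isInstance(value)) return false;
        }
        return true;
    }
}
```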

No matter how you try to mash the concerns of a transport layer and a domain layer together into one set of objects, what you end up with will be objects that cannot simultaneously be suitable for use as a DTO and as a domain model (and, to be honest, possibly not as a persistable object either, as directly persisting domain objects isn't terribly useful for some domain models, e.g. financial projections, but that's a more complex issue).

One of the great tragedies in J(2)EE is seeing a good idea ('use POJOs to reduce the invasive coupling of business logic to the frameworks you use') become 'use the same mashed-together POJO across all layers of your application, in violation of every shred of common sense'. Even more disturbing is that these POJOs are really PINOs (POJOs In Name Only), as can be seen by looking at their 'enhanced' structure via reflection.

There is really only one downside to the use of DTOs like this, and that is the boilerplate required to make custom subclasses of the DTO. It's worth pointing out that my implementation has more than most, and that this isn't strictly necessary to meet the minimum requirements of an effective DTO implementation. For example, there is no absolute need to use a 'recursive' generics declaration (which adds implementation effort); that is a design decision of this particular library, made after some painful lessons learned maintaining these systems in production. The same verbosity that is a pain during development is a blessing when debugging production issues and maintaining the code years later. It can be managed by some level of (model-driven) code generation, but that is a topic for another post.

Implementation:


The DTO library is complex in that it has three different 'customers' (callers) of its methods.

    1) The client-specific (horizontal) code that uses the DTO library to implement a solution to whatever business problem is being solved. For these users, the library places a slight burden on them as there is no better object than a simple POJO for doing that, and to use this library they will have to add a POJO skin over an object that is not a POJO. It's slightly painful and I hold this up as an excellent use-case for code generation as it requires a very small and simple bit of generation that won't be affecting core functionality in magical ways (just getters and setters), and any problems would be immediately apparent to the user.

    2) The technology-specific interfacing (vertical) code that does something specific with these DTO instances, such as sending them over remote WS calls, getting them in and out of databases, or whatever. These are the consumers of the change tracking, conflict resolution, etc. For these users, a POJO is a horrible tool for the job, as writing generic code would require either lots of boilerplate, lots more code generation of code that is complex and could hide issues, or the use of reflection, which will certainly make debugging a great deal more 'fun'.

    3) An SPI, such as the DTO interface itself. Type 1 & 2 clients/consumers will probably not use that interface directly; rather, it will be used by extenders of the library. While the first two users are dealing primarily with essential complexity in either the horizontal or vertical domain, the consumers of the SPI methods are dealing primarily with incidental complexity. I've tried to make that as simple as I could, although some features, such as the cyclic-relationship protection, incurred a relatively (and surprisingly) large chunk of additional code.

Essentially, the design of this library can be summarized as making things easy for the coders of (2) with as small an impact as possible on (1).

So let's see what features (IMHO) a good DTO library should offer:

First, we need to extend the abstract DtoObject into a suitable POJO form. Doing that properly will look something like this:


public class ContactDto extends DtoObject {

    public static interface Fields {
        public static final DtoField<String> ID = DtoField.newReference("id",String.class,true,true,true);
        public static final DtoField<String> NAME = DtoField.newReference("name",String.class,true,true,false);
        public static final DtoField<String> EMAIL = DtoField.newReference("email",String.class,true,false,false);
        public static final DtoField<String> PHONE = DtoField.newReference("phone",String.class,false,false,false);

        public static final DtoField<Boolean> CONTACTABLE = DtoField.newPrimitive("contactable",Primitive.BOOLEAN);
        public static final DtoField<Long> AGE = DtoField.newPrimitive("age",Primitive.LONG);

        public static final DtoField<ContactDto> PARTNER = DtoField.newDtoObject("partner",ContactDto.class,false);
        public static final DtoField<DtoCollection<ContactDto>> RELATIONS = DtoField.newDtoCollection("relations",new TypeToken<DtoCollection<ContactDto>>() {
        });
    }

    public static final List<DtoField<?>> fields = Arrays.asList(
            Fields.ID,
            Fields.NAME,
            Fields.EMAIL,
            Fields.PHONE,
            Fields.CONTACTABLE,
            Fields.AGE,
            Fields.PARTNER,
            Fields.RELATIONS);


    private ContactDto() {
        super(new DtoType<>(ContactDto.class,fields));
    }

    private ContactDto(final String name, final String email, final String phone,
                      final Boolean contact, final Long age,
                      final ContactDto partner, final DtoCollection<ContactDto> relations) {

        super(new DtoType<>(ContactDto.class,fields));

        this.setName(name);
        this.setEmail(email);
        this.setPhone(phone);
        this.setContact(contact);
        this.setAge(age);
        this.setPartner(partner);
        this.setRelations(relations);
    }

    private ContactDto(final boolean accept, final String id,
                      final String name, final String email, final String phone,
                      final Boolean contact, final Long age,
                      final ContactDto partner, final DtoCollection<ContactDto> relations) {

        super(new DtoType<>(ContactDto.class,fields),new FieldInit(accept)
                .initField(Fields.ID,id)
                .initField(Fields.NAME,name)
                .initField(Fields.EMAIL,email)
                .initField(Fields.PHONE,phone)
                .initField(Fields.CONTACTABLE,contact)
                .initField(Fields.AGE,age)
                .initField(Fields.PARTNER,partner)
                .initField(Fields.RELATIONS,relations));
    }

    public String getId() {
        return this.get(Fields.ID);
    }

    public ContactDto setId(final String id) {
        this.set(Fields.ID,id);
        return this;
    }

    public String getName() {
        return this.get(Fields.NAME);
    }

    public ContactDto setName(final String name) {
        this.set(Fields.NAME,name);
        return this;
    }

    public String getEmail() {
        return this.get(Fields.EMAIL);
    }

    public ContactDto setEmail(final String email) {
        this.set(Fields.EMAIL,email);
        return this;
    }

    public String getPhone() {
        return this.get(Fields.PHONE);
    }

    public ContactDto setPhone(final String phone) {
        this.set(Fields.PHONE,phone);
        return this;
    }

    public Boolean getContact() {
        return this.get(Fields.CONTACTABLE);
    }

    public ContactDto setContact(final Boolean contact) {
        this.set(Fields.CONTACTABLE,contact);
        return this;
    }

    public Long getAge() {
        return this.get(Fields.AGE);
    }

    public ContactDto setAge(final Long age) {
        this.set(Fields.AGE,age);
        return this;
    }

    public ContactDto getPartner() {
        return this.get(Fields.PARTNER);
    }

    public ContactDto setPartner(final ContactDto partner) {
        this.set(Fields.PARTNER,partner);
        return this;
    }

    public DtoCollection<ContactDto> getRelations() {
        return this.get(Fields.RELATIONS);
    }

    public ContactDto setRelations(final DtoCollection<ContactDto> relations) {
        this.set(Fields.RELATIONS,relations);
        return this;
    }

    @Override
    public ContactDto getThis() {
        return this;
    }
}
For the record, I recommend model-based code generation for this step. Also, in the examples that follow, the terminology used is accept/cancel rather than commit/rollback, to avoid implying transactional support for interfaces that may not be transactional in implementation.

Now, what can we do with this?

1) Instantiate it and use it as a normal POJO:

        ContactDto contact = new ContactDto();

        contact.setName("Joe Bloggs");
        contact.setAge(25);

        System.out.println("Hello my name is: "+contact.getName());

2) Setting a field flags the dto as dirty:
        // new instances
        assertTrue(newContact("email@address.com")
                .setEmail("newemail@address.com")
                .isDirty());

        // existing instances
        assertTrue(newContact("email@address.com")
                .acceptChanges() // 'saves' instance
                .setEmail("newemail@address.com")
                .isDirty());

3) Setting a field manually back to the old value removes the dirty flag (no annoying false positives):
        final ContactDto target = newContact("email@address.com").acceptChanges();

        assumeTrue(target.setEmail("newemail@address.com").isDirty());

        assertFalse(target.setEmail("email@address.com").isDirty());
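One way to get that 'no false positives' behaviour is to derive dirtiness from a comparison with the last-accepted snapshot rather than from a sticky flag. A minimal sketch (assumed names, not the library's internals):

```java
import java.util.HashMap;
import java.util.Map;

// Dirtiness derived by comparing current values with the last-accepted
// snapshot: setting a field back to its old value clears the flag, so there
// are no false positives. Assumed names, not the library's internals.
class DirtyTracking {
    private final Map<String, Object> accepted = new HashMap<>();
    private final Map<String, Object> current = new HashMap<>();

    DirtyTracking set(final String field, final Object value) {
        current.put(field, value);
        return this;
    }

    DirtyTracking acceptChanges() {
        accepted.clear();
        accepted.putAll(current); // snapshot the current state
        return this;
    }

    boolean isDirty() {
        return !current.equals(accepted); // compare against the snapshot
    }
}
```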

4) 'Resetting' (the change-control counterpart to set) a field drives the stale-checking and leaves the dirty-checking alone:
        assertFalse(newContact().acceptChanges().setEmail("newemail@address.com").isStale());

        assertFalse(newContact().acceptChanges().reset("email","newemail@address.com").isDirty());

        assertTrue(newContact().acceptChanges().reset("email","newemail@address.com").isStale());

5) And resetting to original also removes stale (again no false positives):
        final ContactDto target = newContact().acceptChanges();

        assumeTrue(target.reset("email","newemail@address.com").isStale());

        assertFalse(target.reset("email","email@address.com").isStale());

6) Setting and resetting (updates from both directions) will leave the object conflicted:
        assertTrue(newContact()
                .acceptChanges()
                .set("email","newemail@address.com")
                .reset("email","neweremail@address.com")
                .isConflicted());

7) Which can be resolved by accepting or cancelling them (which pushes all non-conflicted dirty fields):
        // accept
        assertFalse(newContact()
                .acceptChanges()
                .set("email","newemail@address.com")
                .reset("email","neweremail@address.com")
                .acceptChanges()
                .isConflicted());

        // or cancel
        assertFalse(newContact()
                .acceptChanges()
                .set("email","newemail@address.com")
                .reset("email","neweremail@address.com")
                .cancelChanges()
                .isConflicted());

8) Accepting changes keeps the changes, and cancelling throws them away:
        // accept
        assertThat(
                newContact().setEmail("newemail@address.com").acceptChanges().getEmail(),
                is(equalTo("newemail@address.com")));

        // cancel to last accepted changes (will be blank if object is 'new')
        assertThat(
                newContact()
                        .setEmail("newemail@address.com")
                        .acceptChanges()
                        .setEmail("neweremail@address.com")
                        .cancelChanges()
                        .getEmail(),
                is(equalTo("newemail@address.com")));

9) A new instance will be marked as not-new if accepted:
        assumeThat(
                newContact().isNew(),
                is(true));

        assertThat(
                newContact().acceptChanges().isNew(),
                is(false));

10) Cancelled new but dirty instances are cleared:
        final ContactDto target = newUnsavedInstance();

        assumeThat(
                target.isNew(),
                is(true));

        assertThat(
                target.setName("name").getName(),
                is(equalTo("name")));

        assumeThat(
                target.cancelChanges().isNew(),
                is(true));

        assertThat(
                target.getName(),
                is(nullValue()));

More complex uses involve compound objects.

11) They support relations/nesting between objects:
        final ContactDto inner = this.newContact1();
        final ContactDto outer = this.newContact2().setPartner(inner);

        assumeThat(inner.isDirty(),is(true));
        assertThat(outer.isDirty(),is(true));

        outer.acceptChanges();

        assumeThat(inner.isDirty(),is(false));
        assertThat(outer.isDirty(),is(false));

        inner.setEmail("newemail@email.com");

        assumeThat(inner.isDirty(),is(true));
        assertThat(outer.isDirty(),is(true));

12) And importantly, you don't need to worry about cycles (that was harder to implement):
        final ContactDto inner = this.newContact1();
        final ContactDto outer = this.newContact2();

        outer.setPartner(inner);
        inner.setPartner(outer);

        outer.acceptChanges();

        assumeThat(inner.isDirty(),is(false));
        assertThat(outer.isDirty(),is(false));

        inner.setEmail("newemail@email.com");

        assumeThat(inner.isDirty(),is(true));
        assertThat(outer.isDirty(),is(true));

13) They also support change tracking of collections of such instances, including removal:
        final ContactDto one = newChild1().acceptChanges();
        final ContactDto two = newChild2().acceptChanges();

        assumeThat(one.isDirty() && two.isDirty(),is(false));

        final DtoCollection<ContactDto> children = new DtoCollection<>(Arrays.asList(
                one,
                two)).remove(one);

        assertThat(children.isDirty(),is(true));
        assertThat(children.size(),is(1));
        assertThat(children.contains(two),is(true));

        children.acceptChanges();

        assertThat(children.isDirty(),is(false));
        assertThat(children.size(),is(1));
        assertThat(children.contains(two),is(true));

14) supporting undoing on cancel:
        final ContactDto one = newChild1().acceptChanges();
        final ContactDto two = newChild2().acceptChanges();

        assumeThat(one.isDirty() && two.isDirty(),is(false));

        final DtoCollection<ContactDto> children = new DtoCollection<>(Arrays.asList(
                one,
                two)).remove(one);

        assertThat(children.isDirty(),is(true));
        assertThat(children.size(),is(1));
        assertThat(children.contains(two),is(true));

        children.cancelChanges();

        assertThat(children.isDirty(),is(false));
        assertThat(children.size(),is(2));
        assertThat(children.contains(two),is(true));
        assertThat(children.contains(one),is(true));

... and so forth for all additions etc. See the unit tests for more info.

If you want to use change tracking of sets of objects in screen layers, you will want identifiable objects (those with at least one identity field):

15) An identifiable object will not allow you to clear or change its identity fields:
    // clear - expected = IllegalStateException.class
    newContact("email@address.com").acceptChanges().setEmail(null);

    // modify - expected = IllegalStateException.class
    newContact().acceptChanges().setName("New Name");

16) But it's OK while the instance is 'new', which allows you to use the same instance in 'builder' mode if the setters are fluent-style (recommended):
    // clear
    assertThat(
            newContact("email@address.com").setEmail(null).getEmail(),
            is(nullValue()));

    // modify
    assertThat(
            newContact().setName("New Name").getName(),
            is(equalTo("New Name")));

17) Those 'builder' instances can't go into a session:
    // expected = IllegalStateException.class
    this.sessions.addDto(newContact1());

18) But non-builder instances can go into sessions and start pushing changes to each other:
    final ContactDto a = sessions.addDto(newContact1().acceptChanges());
    final ContactDto b = sessions.addDto(newContact1().acceptChanges());

    a.setEmail("none").acceptChanges();

    assertThat(b.getEmail(),is(equalTo("none")));

That's all for now, more information can be seen from the unit tests in the project.