Data persistence and the DAO pattern

Abstract

The DAO pattern is a widely used approach to encapsulate data persistence behind common abstractions.
Still, many implementations of that pattern neither use its full potential nor are they particularly DRY. This article discusses the DAO pattern in general and from an architectural design perspective, and shows how DAOs can be built on top of a generic persistence provider by means of delegation.

 

Layers of data persistence

Data persistence is a fundamental functionality of most modern software systems. After all, programs are data transformations and as such are built around data, and this data is usually not volatile but meant to be used multiple times over a longer period of time. Hence, loading and storing data becomes an essential part of many programs.

Storage and retrieval of data are complex operations that potentially involve many devices and require the combined effort of many software and hardware components of the entire computer system. These components are organized into different layers, the top-level layer usually being a well-designed driver/protocol (JDBC, ODBC, NTFS etc.) that offers access to the high-level operations of its respective storage technology.

Inside an application, one or more of those drivers are used to connect with (multiple) storage technologies and provide data persistence functionality to the rest of the application. Additionally, there is a multitude of libraries and frameworks that provide more sophisticated and application-specific data persistence functionality built on top of those drivers – object-relational mappers being one of the most popular examples.

Why that DAO?

Despite the many layers of abstraction built around data persistence, most of which are ready to be used out of the box, it is still necessary to create some form of design that plugs these technologies into the application. It should leave enough flexibility to allow the data persistence aspect to evolve easily with changing requirements, hide any unnecessary technical complexity and fit in well with the rest of the application. Some people argue that, for example, the JPA EntityManager is already such an abstraction, but in reality it is still a technical mechanism, and some of the ORM-related technical details inevitably surface in the API. For example, the entity lifecycle is not a natural property of domain objects but a technical necessity of ORM. Also, exception handling should be done consistently and in one place instead of in every piece of code that deals with persistent data. Additionally, official specifications are not always precise about what behaviour is required, and a vendor might choose to either throw an exception or return a null value. So each vendor comes with its own peculiarities, strengths and weaknesses, and the ideal of being able to switch from one to the other without any change in code does not hold in reality.

This is why there should be a dedicated, custom data persistence abstraction in any serious software system. Designing such an abstraction is not necessarily difficult and, done right, it provides many benefits. One of those abstractions is the well-known DAO pattern. It is one of the older and admittedly not very exciting patterns, but it is quite solid and not overly complex to implement. Still, even experienced developers get it wrong or at least don't use its full potential. Redundant and semantically inconsistent implementations of basic CRUD and other persistence methods, lack of static typing, DAOs with thousands of lines of code, and scattered access to technology-specific components on multiple layers are only some of the possible flaws. So even using a simple pattern can go wrong, which is why I want to share some of my lessons learned regarding the use of the DAO pattern. I found it surprising how much one can get out of this simple design when some basic principles are followed.

Use a DAO for every persistence operation, don’t work around it

This might seem quite obvious, but it is easily violated when the software process lacks good code reviews, a new developer joins the team, communication between team members doesn't work, etc. If you promote the concept of a special class of objects dedicated to handling the persistence aspect of your application, then only these objects should be in charge of it. Generally speaking, if you decide to use a pattern, use it consistently. This also helps to keep the team aligned with the design concepts of the system.

Persistence APIs are potentially rich and complex in semantics, so it is easy to write code that looks about the same but executes differently under certain conditions. When the calls to the technical frameworks are centralized in a few places, like some abstract DAO, developers are less likely to get their persistence operations wrong and can focus on getting their work done instead of browsing technical APIs. Changes in the object model are more likely to affect only the related DAOs and not dependent services. Workarounds or special features of technology XY can be implemented consistently in one place.

Align your DAOs with your domain model

From my experience, persistence is an essential feature of the application's domain model, i.e. it's the business objects which come with a persistence aspect. Aligning the DAO hierarchy with that of the domain model is a natural way to keep the individual DAOs focused – they will only contain persistence methods dealing with their corresponding business object class(es) and inherit the more general functionality from their superclass. Thus, it is always clear which DAO to use or where to add a new finder method for a special use case, which prevents persistence code from being scattered into random places.

Of course, other objects can be persistent as well, and sometimes it makes sense to treat data and the object model separately. Even for those objects the DAO pattern can be used, but maybe with a slightly different interface (see the first rule). You don't have a domain model? Then be referred to [link:Domain Driven Design articles].

Don’t expose technology specific components

A DAO is a technology-agnostic component for data persistence – its vocabulary consists of "load", "store", "update" etc., regardless of whatever storage technology may be under the hood. Honoring this principle makes it possible to plug in different but similar storage technologies using potentially the same or a similar interface. Whether the DAO loads from a key-value store or from an RDBMS using JDBC or JPA should not be visible from the outside. This encapsulation allows a variety of storage technologies to be used without forcing every developer to learn their details. It will also make storage migrations feasible and reduce maintenance costs. Of course, this approach has limitations: not all storage technologies can be reasonably unified without losing their respective special features, and it has to be carefully decided whether an attempt at unification yields the necessary benefits.
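As a rough illustration of such a technology-agnostic contract (the interface and its method names are my own sketch, not part of any particular library):

import java.io.Serializable;
import java.util.List;

// Sketch of a storage-agnostic DAO contract: nothing in it reveals whether the
// implementation talks to JPA, plain JDBC or a key-value store.
public interface GenericDao<K extends Serializable, E> {

    E findById(K id);

    List<E> findAll();

    // Insert or update, depending on whether the entity already exists.
    E store(E entity);

    void remove(E entity);

    long count();
}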

Test your DAOs

Having dedicated components for data persistence makes it possible to test this whole aspect of the application independently. In fact, basic CRUD methods can be tested without the developer having to implement anything at all. More specialized data persistence code can be tested in dedicated unit tests, and many bugs originating from incorrect transaction handling, invalid data etc. can be identified close to their source.

Daoism – DAO for Java

Besides the many libraries and approaches already out there, I want to present the solution that I have come up with over the years and which I finally published on GitHub (Daoism). My goal was to make writing new DAOs as easy as possible, providing as much persistence functionality as possible through a clear and expressive interface while still leaving room for extension. All of that without much magic like reflective annotation processing behind the scenes – just simple object design using basic OO features.

One cornerstone of the design is that all DAOs work by delegating to a shared, stateless persistence provider. This persistence provider is the actual implementation for a specific persistence technology, such as JPA or JDBC. Concrete DAOs inherit all basic CRUD methods and a generic way of executing arbitrary queries from a common base class, the TypedDao.
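A minimal sketch of that delegation idea (simplified; the real TypedDao in Daoism offers more, and the provider method names below are assumptions on my part, assuming an IPersistenceProvider with matching operations):

import java.io.Serializable;
import java.util.List;

// Simplified sketch: the base class contains no technology-specific code, it only
// forwards every operation to whatever IPersistenceProvider the concrete DAO supplies.
public abstract class TypedDao<K extends Serializable, E> {

    private final Class<K> keyType;
    private final Class<E> entityType;

    protected TypedDao(Class<K> keyType, Class<E> entityType) {
        this.keyType = keyType;
        this.entityType = entityType;
    }

    // Each concrete DAO wires in the shared, stateless provider (JPA, JDBC, ...).
    protected abstract IPersistenceProvider getPersistenceProvider();

    public E findById(K id) {
        return getPersistenceProvider().findById(entityType, id);
    }

    public List<E> findAll() {
        return getPersistenceProvider().findAll(entityType);
    }

    public E persist(E entity) {
        return getPersistenceProvider().persist(entity);
    }

    public void remove(E entity) {
        getPersistenceProvider().remove(entity);
    }
}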

Writing a new DAO is a matter of extending TypedDao. Generic unit tests for all CRUD methods can be created in the same way: just extend the common base class. The design works with persistent domain objects (not generic data records like ActiveRecord or other untyped tuples), so its interfaces are built around object types using Java generics.

 
@Service
public class VServerDao extends TypedDao<String, VServer> {
    @Autowired private DbPersistenceProvider persistenceProvider;
    public VServerDao() { super(String.class, VServer.class); }
    @Override protected IPersistenceProvider getPersistenceProvider() { return persistenceProvider; }
}

public class VServerCrudTest extends SpringAwareCrudTest<String, VServer> {
    @Autowired private VServerDao dao;
    @Override protected VServer createValidEntity() {
        VServer vserver = new VServer();
        vserver.setHost(System.currentTimeMillis() + "");
        return vserver;
    }
    @Override protected void modifyEntity(VServer which) { which.setHost(UUID.randomUUID().toString()); }
    @Override protected ITypedDao<String, VServer> getDao() { return dao; }
    @Override protected void addEntities(List<VServer> entities) {
        entities.add(createValidEntity());
        entities.add(createValidEntity());
        entities.add(createValidEntity());
    }
}

The CRUD test provides generic tests for all standard persistence operations using the entities generated by addEntities() and createValidEntity():

  • findById()
  • findAll()
  • persistSingle()
  • persistAll()
  • count()
  • remove()
  • removeAll()

Getting these tests for free is a great win, especially when frequent changes are made to the persistence model. With all components being Java POJOs, it is very easy to use Daoism in any managed environment like Spring or a J2EE container – in the project I use Spring to wire up my DAOs in the unit tests. Note: I do not show the code for wiring up the DAOs in a specific environment.

The way delegation is used to push the actual work to the lowest layer and then have other classes build a less verbose interface on top of it has proven to be a good design. Even if you don't feel like using Daoism in your project, it might be worth taking half an hour to browse the source.

References

[1] Jakob Jenkov's ideas on using the DAO pattern. He discusses many additional details not mentioned here.

Documentation is like code

After reading parts of this discussion, I felt the urge to organize my own thoughts on source code comments and other types of documentation, because I feel it is a topic that receives far too little attention. Working with legacy code without a decent amount of good documentation is really painful, and a lack of documentation generally slows down development progress. Still, documentation is not treated as a first-class citizen of a software system. Wondering why, I found many reasons, partially inspired by the aforementioned discussion.

 

What is documentation?
 

Documentation appears to have many aspects and may take many forms – and it changes… and it moves. But I think that, in its essence, it is always intended to provide valuable information about something that requires additional explanation because it is not entirely self-explanatory. In other words, everything that is not completely obvious in its function and way of operation but is intended to be used by a certain audience requires some form of documentation. The form of documentation varies wildly and may be anything from a piece of paper with poorly written instructions on it up to a full-fledged owner's manual with fancy video tutorials dubbed by Chuck Norris. In this article I'll refer to each of these forms as a "documentation artefact".

So, I agree that the main point of any documentation is to describe what a thing can do and how it is intended to be used, and that this description has a target audience. In the realm of software development, however, there are more things that require documentation; not all documentation deals with concrete code artefacts and their functions, and the specific needs of the target audience will also vary. As a developer, for example, I am also interested in how and why a thing works, which is generally of no importance to an end user. So, my software engineer's perspective defines:

Documentation consists of one or more documentation artefacts, an artefact being any thing/asset that offers explanation about a given piece of software as to

  • what its capabilities are
  • how it is intended to be used
  • its design principles together with their motivation and implications
  • how it is implemented (code centric)
  • how it is actually used as part of a bigger system (which hopefully is not that different from its intended use)

A documentation artefact materializes in a certain form like

  • API documentation generated from source (javadoc or the like)
  • UML diagrams or more informal drawings of system structures or interactions
  • Wiki pages, Google Docs or other formats of web based software for information management

The guy who happens to know all about the system because he's been around from the beginning is a common but unfortunate form of documentation, because documentation should not be entitled to 28 days of (paid!!!) vacation per year.

Documentation is like code

An interesting observation is that, like code, documentation tends to have different levels of granularity. This is almost inevitable: documentation deals with code and its higher-level concepts, so it naturally tends to reflect their structure. In a well documented system you will find artefacts that describe

  • system metaphors
  • architecture patterns (pipes & filters, transaction strategies)
  • system/subsystem communication schemes
  • API and service contracts
  • component design
  • source code comments for classes, functions, algorithms etc.

So there is documentation for the different levels and parts of the entire software system, and these artefacts take different perspectives on the system – just as a developer does when talking about the request processing pattern in general compared to talking about the implementation of a specific request filter.

Besides this obvious analogy in structure, documentation seems to share even more characteristics with code artefacts. There is scope, or say responsibility, which refers to what a particular documentation artefact is supposed to explain and which is very similar to what is considered the responsibility of a function/module/class. Different documentation artefacts have different stabilities, meaning that they are more or less likely to change. Both documentation and code can suffer from fragmentation/scattering and duplication, and it is severely harmful to both of them. A major problem with documentation is that it can become stale, thus diverging from the code/system it actually describes. Something very similar exists in code: one code artefact is updated and receives new capabilities, but its clients are not updated.

 

Benefits of documentation

As stated before, I think that documentation is a neglected and underestimated part of any software system, because it does add significant value… over time. One of the main benefits I see is that good documentation will save you a lot of time in the future. It is like an investment: on the day of writing it, it is just a cost factor, as it consumes resources. But when the first new developer can start doing work on his own immediately because he can read the documentation to understand the important parts of the system, the first portion of time and money has been saved.

As a developer I have learned that code is written once but read many, many times. Each time I have to read code that is not documented, I have to mentally re-engineer its meaning and design ideas. Each time, the project loses time and money. Also, each mental re-engineering bears the potential for misunderstandings and programming errors. This is why I usually add comments while I read, because the boy scout rule says so and I believe in it.

One cost factor of software systems is code degradation, which slowly introduces more and more technical debt. Providing guidelines and metaphors increases the likelihood that newly developed code fits well into the system, and hence counteracts degradation. It also ensures that existing infrastructure like basic libraries and frameworks is used as intended, and as such the potential for bugs is reduced.

Having clear documentation will also help a lot when you have to make changes to the codebase on a larger scale. If you understand how the code/system is designed and why it is designed that way, then you can safely reason about changes and their implications. In contrast, if you don't know how it works, you are unlikely to change it – and more likely to introduce bugs when you do.

 

Why is documentation neglected so much?

There are definitely many reasons why documentation is neglected, and it is certainly impossible to provide a complete list, but here is my attempt to enumerate the most important ones. I also think that there are two primary actors responsible, as classified in the (blame :) ) model below.

  1. Documentation appears to be generally not considered part of the solution to a specific programming problem. Assuming that (most) developers are interested in designing solutions for complex problems, this means that as soon as the program is written and the solution is found, they want to move on to the next problem
  2. Those who should or could write documentation are usually not part of the group interested in reading it. In other words, the people who benefit from documentation are not the ones who write it, and the ones who have to write it don't get any benefit out of it
  3. Good documentation is actually very hard to write, and by intuition (and probably experience) we know that poor documentation is more harmful than beneficial
  4. Often, when code is written, it appears obvious what the code does, hence it doesn't seem to need documentation. Many developers also seem to think that it is possible to write entirely self-documenting code, which, so says I, is simply not possible –> documentation is not only about code but about the ideas and all the abstract things around it
  5. Time pressure to meet deadlines makes documentation appear superfluous, not adding any business value
  6. Management (and thus the process) is often focused on short-term benefits, whereas documentation shows its value only on a mid- or long-term scale
  7. Documentation becomes stale easily and is not of much value when stale. This generally hints at the lack of good tooling to support durable, high-quality documentation

 

Responsibilities (Blame mode)

  • programmer: (2), (3), (4)
  • management/process: (1), (5), (6)

 

 

How to improve?

First and most importantly, I would say that developers should take more pride in writing good documentation. They need to understand that this in itself is a complex task and that good documentation is a vital part of any elegant solution to a programming problem.

IT managers should understand that they gain nothing if they make their developers constantly neglect documentation in favor of just the next important feature. It's only a matter of time until the slowdown of an undocumented system sets in, and when it shows, it's already quite late to remedy. In that particular aspect, lack of documentation is like a form of technical debt [at least much of that concept's fundamental notion applies].

I would suggest making writing documentation a part of the daily development routine. Documentation should be included in workload estimates and undergo reviews like any other code artefact. Personally, when I have to write more complex algorithms or object collaboration schemes, I start off by creating the empty methods and filling them with comments that represent the "high-level" algorithm. Then I incrementally translate them into code. I think this is an established programming technique (called stepwise refinement), and one of my professors tried to convince us of its value [ and rightfully so ].
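A contrived illustration of that comment-first approach (class, types and steps are made up for this example):

import java.util.List;
import java.util.stream.Collectors;

// Contrived example: the method body starts out as comments that spell out the
// high-level algorithm and is then translated into code step by step; the
// comments that still add value simply stay in place.
public class ReportGenerator {

    public String generate(List<String> entries) {
        // 1. filter out empty entries
        // 2. sort the remaining entries alphabetically
        // 3. join them into a single, line-separated report
        return entries.stream()
                .filter(entry -> !entry.isEmpty())                     // step 1, refined
                .sorted()                                              // step 2, refined
                .collect(Collectors.joining(System.lineSeparator()));  // step 3, refined
    }
}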

Selecting the right tools to store and maintain the available documentation is also vital for its acceptance. Outdated documentation is not of much help – not to say it can be very harmful – and will not be taken seriously by the majority of people. So if a great deal of your documentation tends to be even slightly outdated, chances are high that many people will not read it at all.

 

Quotes about documentation

Finally, I want to share some of the ideas and opinions of other people that I myself find insightful or at least delightful – Enjoy!

"I've repeatedly seen the claim that code can be written in such a way as to make comments unnecessary.  But in almost 30 years, I've never seen such code."
          —- Hardware Guy —-

"Methods names don't provide context and they don't tell you why. These are the most important things when looking at code."
          —- Unknown —-

"You don't comment primarily on what is in the code. You comment on what isn't in the code (the why, the business case it's trying to address, the reason for doing it this way and not another, any pitfalls to watch out for when refactoring, and so on)"
          —- Stephen Jones —-

"You could use meaningful names instead but do you really want controls called
PrintReportToShowHowManyApplicantsFromCountrySpecifiedinCountryOfCurrentResidenceDialogBox
ArrangedInReverseChronoligicalOrderofApplicationDateButton
"
          —- Stephen Jones —-

"[...]inherting a system this well documented, especially if the domain knowledge is complex, is awesome!  It makes life sooo much easier."
          —- Canuck —-

"Documentation is the single most important change I've made to my coding style in the last year"
          —- Zach Holman —-

References

[1] Fifteen Thoughts and Tips on Writing Software Documentation

[2] 5 Keys to Creating Docs that Rock

[3] Discussion on stackoverflow.com about self-documenting code

[4] Readme Driven Development

Talk on actors – A topological perspective on software systems

While playing around with different design solutions for concurrency issues, I inevitably stumbled upon the actor model, which has become quite popular by now. Writing software for a multi-threaded (read: highly concurrent) environment requires handling things like process synchronization, data dependencies and visibility, while avoiding nasty side effects like deadlocks and the unlimited possibilities for race conditions.

Synchronization mechanisms like semaphores and other types of locks are a pretty low-level solution that still leaves much of the thinking to the developer. He has to find all significant variants of program execution and ensure that all threads are always correctly synchronized. This is especially difficult because bugs are not easily reproducible, which also makes it hard to write convincing tests. So there is a strong need for programming models that fit more naturally to the world of parallel program execution. One of those design approaches is actor-based programming.

The benefit of the actor model is that it tries to eliminate shared mutable state (one of the major evils in concurrent programming) by localizing computations and data mutations in single-threaded components, called actors. Each actor is accessed non-concurrently and works on its local copy of the data; the system takes care of spawning actors and delegating work to them. Actors communicate via immutable messages, and communication paths can be set up very flexibly and reordered dynamically. To me, the actor model is a design approach that relies on the very principles of object-oriented design: collaborating objects with well-defined responsibilities which encapsulate state and communicate using immutable messages. Lately, I stumbled upon this talk on infoq.com and it helped me to see actors from a different perspective.
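To make this more concrete, here is a deliberately minimal sketch of the core idea in plain Java (my own illustration, not taken from any actor framework; no supervision, routing or error handling):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal actor sketch: one thread, one mailbox, all mutable state confined to the actor.
// Real actor frameworks add scheduling, supervision and routing on top of this idea.
public abstract class SimpleActor<M> {

    private final BlockingQueue<M> mailbox = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::processLoop);

    public void start() {
        worker.start();
    }

    // Other actors communicate only by dropping (ideally immutable) messages into the mailbox.
    public void tell(M message) {
        mailbox.offer(message);
    }

    private void processLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // Messages are handled strictly one at a time, so onReceive() may
                // mutate the actor's local state without further synchronization.
                onReceive(mailbox.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // The actor's behaviour; implemented by concrete actors.
    protected abstract void onReceive(M message);
}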

Now I understand that by using actors you additionally start to think about your system in terms of its topology – a network of specialized nodes and the communication flow between them. Structurally, an actor corresponds to a node in this topology graph, and different nodes are grouped to form coherent subsystems. From a behavioral view, actors form hierarchies where some actors manage others – which in turn may manage just another set of actors.

Each actor class is a point for extension. It can be replaced, refined or incorporated into a newly created subsystem. Because actors communicate by passing immutable messages, communication can be intercepted and rerouted in a very flexible way. I see several analogies to AOP principles, which are basically about refining existing code with modular extensions (using pointcuts and advice). With actors, pointcuts are actor invocations, and an advice is just an actor that takes over control before some other actor is invoked. Intercepting messages and introducing additional behaviour in the form of new actors is the natural way to go.

Synchronization of actors is usually done via a queueing system, which may become the overall bottleneck for systems with very high transaction rates. This reportedly was one reason for the invention of the Disruptor framework, since the synchronization of actors started to eat up most of the processing time when many transactions were processed in parallel.

Java event bus library comparison

Introduction

Event-based system design is a viable solution for many technical use cases that require decoupling of individual components. Especially in GUI frameworks, components communicate with each other by sending and processing (receiving) events. Modeling system interaction in an event-driven way decouples the different components and makes it possible to introduce new behaviour into a system at runtime. It is not the goal of this article to explain the pattern in detail, nor will it give references to scenarios where eventing is applicable. Instead, in this article I will compare existing eventing solutions with respect to their features and performance.

 

My personal motivation for writing this article is that I created an event bus system which I recently released as an open source project. I created the MBassador library since none of the available solutions offered the functionality I needed, and I felt like giving it a try. Since I was interested in how my solution compares to the other available libraries, I wrote some test scenarios and ran them with each of the listed libraries using a common adapter for each.

 

The Candidates

Google Guava Event Bus

The Google Guava library contains many very useful and powerful general-purpose data structures, most of them with concurrency support built in. I really like the library a lot and have used the cache implementations frequently. Google Guava was my first choice until I discovered that the event bus uses strong references. Additionally, its feature list is quite small.

http://code.google.com/p/guava-libraries/

Tested version: 14.0-rc1
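For illustration, basic usage of the Guava bus looks roughly like this (listener declaration via annotations, synchronous dispatch):

import com.google.common.eventbus.EventBus;
import com.google.common.eventbus.Subscribe;

public class GuavaExample {

    // Listeners are plain objects; handler methods are marked with @Subscribe.
    static class StringListener {
        @Subscribe
        public void onString(String event) {
            System.out.println("received: " + event);
        }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        bus.register(new StringListener()); // the bus keeps a strong reference to the listener
        bus.post("hello");                  // delivered synchronously to all matching handlers
    }
}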

Simple Bus

As the name suggests, a very simple implementation of an event bus. The good thing is that it uses weak references to the subscribed listeners, so one can simply forget about whatever was stuffed in. Event publication is very fast. It is the only bus that supports cancellation of event delivery (vetoing).

http://code.google.com/p/simpleeventbus/

Tested version: 1.2

EventBus

EventBus comes with a lot of features and has been around the longest (I guess). It is used in various projects and I expect it to be quite mature. Due to the number of features it offers it is quite slow.

http://www.eventbus.org

Tested version: 1.4

MBassador

I proudly present my first open source project. MBassador was designed with performance and ease of use in mind. It works well in concurrent environments and generally has a very high throughput. It uses weak references to the subscribed objects and supports customization through custom implementations of subscriptions and different dispatch strategies. Feature-wise it is roughly comparable to EventBus but far exceeds it in performance.

https://github.com/bennidi/mbassador

Tested version: 1.1.1

 

A brief feature comparison

The following table is a (currently incomplete) list of features offered by the different libraries. I only included the main features that I consider fundamental for an event bus solution. More exotic features offered by some of the products were excluded, since the primary intent of this article is to compare the performance of the different products for the most common use cases.

 

Event Bus    | Listener declaration | Sync. dispatch | Async. dispatch            | Filtering | Event type hierarchy | Multimode | Reference type | Handler priorities
Google Guava | Via annotations      | Yes            | Via specialized class only | No        | ?                    | No        | Strong         | No
SimpleBus    | Via annotations      | No             | Yes                        | No        | ?                    | No        | Weak           | No
EventBus     | Via annotations      | Yes            | Yes                        | Static?   | Configurable         | ?         | Both           | Yes
MBassador    | Via annotations      | Yes            | Yes                        | Static    | Yes                  | Yes       | Weak           | Yes

Note: I cannot guarantee that this feature list is completely correct, since I did not inspect all of the solutions in much detail. At the time of writing, the documentation for EventBus was not available online. I am happy to include corrections and additions to this list, so feel free to comment.

 

Listener declaration = How are listeners defined? Is it non-invasive or does it affect the listener's class hierarchy?
Synchronous dispatch = Is synchronous event dispatching supported (the publication method blocks until the event has been received by every handler)? This is the most common mode of operation.
Asynchronous dispatch = Is asynchronous event dispatching supported (the publication method returns immediately and event delivery runs asynchronously in a different thread)?
Filtering = Does a mechanism for event filtering exist, such that handlers are not notified of every event that matches their parameter type?
Event type hierarchy = Is the type hierarchy of the published event considered during event delivery? Implementations that support type hierarchies for events will deliver an event E to all handlers that accept E or one of its supertypes.
Multimode = Is it possible to use the same bus to dispatch events either synchronously or asynchronously?
Reference type = What kind of references are used to store subscribed listeners? Strong references require the client code to take care of proper deregistration of subscribed listeners, or else a memory leak occurs.
Handler priorities = Can the execution order of handlers be influenced by an ordering criterion?

The Test scenarios

For the performance tests I created various scenarios that isolate the different functionalities of the tested libraries. Scenarios are implemented as Runnables that take an implementation of the adapter interface. The adapter is the common interface I designed to make the basic functionality of each library available in a standardized way; it is a very thin wrapper around the method invocations of the actual event bus component. All tests are run single-threaded and multi-threaded – each table shows how many threads are used and the workload per thread. For each single-threaded test, a multi-threaded one with the same overall workload is run to evaluate how concurrent access affects the component's performance.
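Conceptually, the adapter boils down to something like the following (an illustrative sketch with my own method names, not the actual interface from the benchmark project):

// Illustrative sketch of the per-library adapter used by the test scenarios.
public interface EventBusAdapter {

    // Register a listener object with the underlying bus implementation.
    void subscribe(Object listener);

    void unsubscribe(Object listener);

    // Publish an event; for asynchronous buses the adapter additionally waits until
    // delivery has completed, so all implementations are measured on comparable terms.
    void publish(Object event);
}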

Note: Since some of the event bus implementations use weak references and others do not, a strong reference to each listener is created before it is subscribed to the bus. Of course this impacts the overall performance, but it does so equally for all tested implementations. Furthermore, SimpleBus needed some extra treatment: since it publishes events asynchronously, all of its tests need a bit of extra code to wait until event delivery has really finished (i.e. all handlers have actually been invoked).

 

The code of the performance comparison can be found at https://github.com/bennidi/eventbus-performance

Test machine: Samsung X22 laptop (2.2 GHz Intel dual core, 3 GB RAM)

 

Scenario 1: Subscribing/Unsubscribing listeners

This scenario tests the performance of subscribing and unsubscribing new listeners. No messages are published to the listeners. Different classes of listeners with different handler definitions are used. Not every subscribed listener is subsequently unsubscribed, so with each round the number of registered listeners grows. Each loop subscribes three different listeners and unsubscribes some of them under certain conditions.

 

Subscriptions per thread: 6000 | Number of threads: 1

Implementation | Duration (ms)
Google Guava   | 453
SimpleBus      | 1377
EventBus       | 1866
MBassador      | 112

Subscriptions per thread: 18000 | Number of threads: 1

Implementation | Duration (ms)
Google Guava   | 595
SimpleBus      | 6734
EventBus       | 12911
MBassador      | 191

Subscriptions per thread: 300 | Number of threads: 20

Implementation | Avg (ms) | Min (ms) | Max (ms)
Google Guava   | 360      | 251      | 407
SimpleBus      | 1705     | 1559     | 1777
EventBus       | 2166     | 1966     | 2238
MBassador      | 96       | 58       | 118

Subscriptions per thread: 900 | Number of threads: 20

Implementation | Avg (ms) | Min (ms) | Max (ms)
Google Guava   | 601      | 480      | 642
SimpleBus      | 5789     | 4772     | 6140
EventBus       | 13729    | 11897    | 14582
MBassador      | 189      | 118      | 216

 

 

Scenario 2: Publishing events

This scenario tests the performance of event publication. Each test is set up with a number of listeners subscribed to the bus (the setup is not part of the measured time). After the setup, event publications are issued to the bus. Since the message type hierarchy is not considered by all implementations, only the event class that is highest in the hierarchy is used.

 

Events per thread: 2000 | Number of threads: 1 | Number of listeners: 6000

Implementation | Duration (ms)
Google Guava   | 1322
SimpleBus      | 17603
EventBus       | 3078
MBassador      | 228

Events per thread: 100 | Number of threads: 20 | Number of listeners: 6000

Implementation | Avg (ms) | Min (ms) | Max (ms)
Google Guava   | 1038     | 883      | 1168
SimpleBus      | 28334    | 27772    | 37753
EventBus       | 6446     | 5931     | 6727
MBassador      | 261      | 51       | 382

 

 

Scenario 3: Mixed usage

This scenario is designed to resemble what I expect many real-world use cases to look like: listeners are subscribed and/or unsubscribed while events are being published.

 

Events per thread: 4000 | Number of threads: 1 | Subscriptions per thread: 6000

Implementation | Duration (ms)
Google Guava   | 2535
SimpleBus      | 162259
EventBus       | 4893
MBassador      | 622

Events per thread: 200 | Number of threads: 20 | Subscriptions per thread: 300

Implementation | Avg (ms) | Min (ms) | Max (ms)
Google Guava   | 2257     | 2054     | 2387
SimpleBus      | 31367    | 26477    | 33569
EventBus       | 4510     | 4136     | 4670
MBassador      | 761      | 626      | 831

Events per thread: 600 | Number of threads: 20 | Subscriptions per thread: 900

Implementation | Avg (ms) | Min (ms) | Max (ms)
Google Guava   | 12575    | 11905    | 12970
SimpleBus      | 101030   | 87618    | 103493
EventBus       | 34814    | 32038    | 38350
MBassador      | 2497     | 2231     | 2660

Events per thread: 1000 | Number of threads: 50 | Subscriptions per thread: 1500

Implementation | Avg (ms) | Min (ms) | Max (ms)
Google Guava   | 396176   | 359027   | 403041
SimpleBus      | > 15 min.
EventBus       | OutOfMemoryException
MBassador      | 32525    | 22746    | 34087

 

Results

The performance characteristics shown above indicate that

  1. Listener subscription is an expensive operation for all implementations but MBassador and Guava

  2. Concurrent access does generally slow down the bus performance because of higher contention/synchronization.

  3. SimpleBus is by far the slowest implementation

  4. MBassador is by far the fastest implementation in all scenarios. It also offers the best scaling characteristics, meaning that higher concurrency does not slow down the bus as much as it does the other implementations. This is because MBassador relies on a custom data structure with very fast write operations that do not block readers and, at the same time, do not copy the existing data structure (most other implementations use CopyOnWriteArrayList).

To sum it up, MBassador is not only astonishingly fast but also offers a rich set of features that only EventBus can compare with. Google Guava's event bus implementation improved a lot from version 13 to 14 but is still clearly behind MBassador in both performance and features. Especially in high-concurrency/high-throughput scenarios MBassador is the clear winner compared to Guava's. It is also quite light on resources (best memory footprint of all implementations, most likely because existing objects are not copied on inserts). So, use it, file issues, request features!

I will keep working on the code, adding more unit tests and features. I am currently working on an integration component for the Spring environment that supports conditional event dispatch triggered by transaction events (e.g. commit, rollback). For those who have been using EJB3 this will sound familiar, and it is quite a nice feature. A base implementation with proof of concept already exists and will be released soon. That's it for now. Any feedback is more than welcome.

View model or Data Transfer Object

Well, here I am, having to take care of the communication between the presentation and domain layers in a multi-tier and pretty much traditionally layered software system. The software was designed following many ideas of the Domain-Driven Design approach, the backend code was in good shape, and now it was time to connect it to the frontend. But how?

The first idea was to avoid exposing the domain objects to the view so that the two do not get tied together. Thus, there is the need to introduce a new kind of object that represents the parts of the domain model that need to make it to the view. Everybody agreed on using the so-called data transfer object for this purpose. But then you also need to map between the two worlds, and a lot of new code and questions arise. So maybe a DTO is not the right way to go?!

Generally, a DTO is what its name suggests: a simple object without any behaviour that is used to transfer data across system boundaries (read). I also found a post that adds the separation-of-concerns aspect and also points out the maintenance impact of having a lot of DTOs in your codebase. Then there is the so-called view model, which some people (including me) confused with DTOs (see this post) but which should have some kind of behaviour. I think the relation between the two is summed up very clearly in this post, which also includes an implementation proposal that could be useful in some scenarios and definitely reduces the amount of code one needs to write.

Personally, I don't have the experience to discuss the view model/DTO relation, but I do know that I dislike all objects that do not know how to behave. Being a value object is not a good reason for existence (a map would do), especially if I use it with general assembly patterns and some reflective framework technology that does not even need to know my concrete types.

This kept me thinking and I ended up having an idea. I don't know if it classifies as a view model, but it is a model used by the view, although it does not necessarily contain view-specific information. The more important thing about it is that it does not create much overhead in the application. I think there must already be posts about this approach, since I feel it is an obvious one, but my search did not bring up any, so in case I am just repeating others, please have mercy. But before I explain the implementation approach, I want to make a point about why I think you should not expose your domain objects directly to the view.

"It's all about separation of concern – decoupling domain model and presentation layer"

It's as simple as that: decoupling the domain layer from the view layer allows bigger changes to the application frontend to be made without modifications to the service or domain layer. Changes can be applied to GUI components and the view model, so the view layer and the domain model can evolve independently.

Although both models are strongly related to each other (the view model is mainly a custom representation (= view) of the domain model), they do vary in different aspects. Since the design of the view layer aligns with the steps of user interaction required by the use cases, its model tends to have a different granularity than the domain model. Additionally, data types in both models may differ, and the view model might provide access to other functionality like syntactical validation, internationalization and even technology-specific properties.

For me, those are enough good reasons to have some kind of abstraction between view and domain. But I don't want the trade-off a traditional DTO implementation implies. I don't want to code potentially n data transfer objects AND their corresponding transformations for (maybe only) one domain object. And I don't want to put transformation logic into the service layer or pollute my repositories with toDTO(…) and fromDTO(…) methods. Also, I would like to avoid using a generic transformation framework that may require me to define mappings as XML or annotations and would introduce just another model (the mapping model) and more code artifacts into my codebase.

Designing the view model as a thin wrapper around the domain model

So, I asked myself: what if I treat the view model as a thin wrapper around the domain model objects and expose just the properties and functionality that I care about and that are valid for my specific use case? Then I could use the domain objects as delegates and have my IDE generate all the delegate methods I need. It would not be necessary to define any transformations, since the objects managed by the view model can be used directly inside the services and the domain layer. All additional transformation steps like data type mapping, string mutations, formatting etc. can be encapsulated in the view model class itself. In many cases there wouldn't even be any transformation required.
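A simplified, hypothetical sketch of what such a wrapper could look like (Person and its properties are made up for illustration):

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical domain object, normally living in the domain layer.
class Person {
    private String firstName;
    private String lastName;
    private LocalDate birthDate;

    Person(String firstName, String lastName, LocalDate birthDate) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.birthDate = birthDate;
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
    LocalDate getBirthDate() { return birthDate; }
    void setBirthDate(LocalDate birthDate) { this.birthDate = birthDate; }
}

// The view model wraps the domain object as a delegate, exposes only what the use case
// needs and keeps view-specific transformations (formatting etc.) in one place.
class PersonViewModel {

    private static final DateTimeFormatter BIRTHDAY_FORMAT = DateTimeFormatter.ofPattern("dd.MM.yyyy");

    private final Person delegate;

    PersonViewModel(Person delegate) {
        this.delegate = delegate;
    }

    String getDisplayName() {
        return delegate.getLastName() + ", " + delegate.getFirstName();
    }

    String getFormattedBirthday() {
        return delegate.getBirthDate().format(BIRTHDAY_FORMAT);
    }

    // Writes go straight through to the domain object, so the wrapped instance can be
    // handed back to the service layer without any mapping step.
    void setBirthday(LocalDate newDate) {
        delegate.setBirthDate(newDate);
    }

    // The wrapped domain object stays accessible for the service layer.
    Person getDelegate() {
        return delegate;
    }
}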

This approach allows me to initially define the view model analogous to the domain model and then have the view model evolve over time as it gets tailored to the specific use case requirements. There would be no mapping overhead, no additional transformations, and I could reuse some of the functionality of the domain model that is also valid in the context of the view layer, e.g. all the operations that modify the object graph, like adding an address to a person or adding or removing an item from the shopping cart.

The only drawback I can see with this approach emerges when view model objects have to cross system boundaries, i.e. serialization. The delegate approach might result in view model objects that reference a set of potentially large domain objects although only a small portion of each is needed. In the worst case this might result in considerable performance issues. In that case, the view model classes might become producers of their respective DTOs.

But I guess in many scenarios serialization will not be much of an issue. Also, if the addToCart(…) method is available inside the view model, there is no need for a service call and serialization (unless the shopping cart should be persistent, of course).