the blog for developers

Never, never, never use String in Java (or at least less often :-)

Never, never, never use (unwrapped) String or long or int. Why? These primitive types have no semantic meaning. They are hard to understand, hard to maintain, and hard to extend. I’ve been evangelizing this concept for some time; the essay “Object Calisthenics” finally prompted me to write this post. As an example, consider a cinema ticket booking service.

Update: If you just want to drop a comment telling me how revolting you find the idea, well, just don’t. I appreciate your comment, but sit back, think about it for a while and get back to coding. When you read someone else’s code with String id and wonder what on earth the id is, come back and read this post.

Compare

public void bookTicket(
  String name,
  String firstName,
  String film,
  int count,
  String cinema);

with (and I know one would introduce an Order object for real code):

public void bookTicket(
  Name name,
  FirstName firstName,
  Film film,
  Count count,
  Cinema cinema);

The second one is much easier to understand, especially when your IDE shows only generic parameter names during autocompletion: bookTicket(String arg0, String arg1, String arg2, int arg3, String arg4) versus bookTicket(Name arg0, FirstName arg1, Film arg2, Count arg3, Cinema arg4). The second one is also much easier to read. Compare

void book(String orderId);

with

void book(OrderId order);

In the first case the developer reading the code wonders a) where to get an orderId and b) what an orderId really is: "1212", "ABC-123" or "12-34-45-SCHWEINEBACKE". In the second case they can search for usages of the OrderId class, see how it is used, read its Javadoc, and pass only validated, correct order IDs into the application. You might think an orderId is just an orderId and easy to find, but legacy systems change the ID, its naming and its semantics, often inconsistently. I’ve seen systems that name an order ID “orderId”, “AuftragsId” (German for “order ID”), “id” and several other names, all meaning the same thing!

It’s easier to work with a class that carries semantics than with a domain-less String, and developers cannot mess up as easily. If you rely on static typing and use a statically typed language, maximize the benefits and create more classes. In the future an OrderId class can also easily be changed to hold a long instead of an int, or to hold validation or ID generation logic. It’s much harder to extend the initial String-based version.
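A minimal sketch of what such an OrderId class could look like. The ID format (three uppercase letters, a dash and digits, as in "ABC-123") is an assumption made up for this example. The String stays private, so callers only ever see OrderId, and the representation could later change without touching any method signature.

```java
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical value object: the ID format below is an assumption for illustration.
final class OrderId {
    // Assumed format: three uppercase letters, a dash, one or more digits, e.g. "ABC-123"
    private static final Pattern FORMAT = Pattern.compile("[A-Z]{3}-\\d+");

    private final String value; // representation is private and may change later

    OrderId(String value) {
        Objects.requireNonNull(value, "value");
        if (!FORMAT.matcher(value).matches()) {
            throw new IllegalArgumentException("Not a valid order id: " + value);
        }
        this.value = value;
    }

    // Value semantics: two OrderIds are equal if they wrap the same value
    @Override
    public boolean equals(Object o) {
        return o instanceof OrderId && ((OrderId) o).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}
```

With this sketch, new OrderId("ABC-123") works, while new OrderId("12-34-45-SCHWEINEBACKE") is rejected at construction time instead of somewhere deep in the application.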

Implementation with a fluent interface

The classes should be implemented as simple domain classes, sometimes as immutable value objects, which just wrap a String and attach some semantic meaning to it.

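public class Name {
   private final String name;

   public Name(String name) {
      this.name = java.util.Objects.requireNonNull(name, "name");
   }

   // static factory, intended for "import static" so callers can write name("Schmidt")
   public static Name name(String name) {
      return new Name(name);
   }
}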

One might wonder whether this solution is too noisy. Admittedly,

new Customer(new FirstName("Stephan"), new Name("Schmidt"));

is noisier than plain String arguments:

new Customer("Stephan", "Schmidt");

The first is easier to understand, though, and with static factory methods it can be shortened to

new Customer(firstName("Stephan"), name("Schmidt"));

This also solves the problem that, with many arguments, developers reading the code don’t understand what each parameter means, especially in longer parameter lists (refactor to a parameter object!). This is an alternative approach to fluent interface builders.
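A small, self-contained sketch of the static factory idea. The class names come from the post; nesting everything inside a CustomerExample class is only there so the sketch fits in one file. In a real code base the factories would typically live on FirstName and Name and be pulled in with import static.

```java
import java.util.Objects;

// Illustrative only: minimal wrappers plus the firstName()/name() factories from the post.
final class CustomerExample {

    static final class FirstName {
        private final String value;
        FirstName(String value) { this.value = Objects.requireNonNull(value); }
        @Override public String toString() { return value; }
    }

    static final class Name {
        private final String value;
        Name(String value) { this.value = Objects.requireNonNull(value); }
        @Override public String toString() { return value; }
    }

    static final class Customer {
        private final FirstName firstName;
        private final Name name;

        Customer(FirstName firstName, Name name) {
            this.firstName = firstName;
            this.name = name;
        }

        @Override public String toString() { return firstName + " " + name; }
    }

    // Static factories that make the call site read like named parameters
    static FirstName firstName(String value) { return new FirstName(value); }
    static Name name(String value) { return new Name(value); }

    public static void main(String[] args) {
        Customer c = new Customer(firstName("Stephan"), name("Schmidt"));
        System.out.println(c); // prints "Stephan Schmidt"

        // new Customer(name("Schmidt"), firstName("Stephan")); // does not compile: arguments swapped
    }
}
```

Besides readability, the compiler now rejects swapped arguments, which new Customer("Schmidt", "Stephan") would silently accept.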

My last post on how to use generics with immutable objects could also be extended with value objects instead of primitives.

new Point(10,10);

new Point(x(10), y(10));

where x(10) and y(10) create Xpos and Ypos value objects.
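A sketch of the Point example. It assumes that Xpos and Ypos are plain immutable wrappers around int and that x() and y() are static factories, as in the Customer example above. Everything except the names Xpos, Ypos, x and y is illustrative.

```java
// Illustrative sketch: immutable coordinate wrappers so x and y cannot be mixed up.
final class PointExample {

    static final class Xpos {
        final int value;
        Xpos(int value) { this.value = value; }
    }

    static final class Ypos {
        final int value;
        Ypos(int value) { this.value = value; }
    }

    static final class Point {
        final Xpos x;
        final Ypos y;

        Point(Xpos x, Ypos y) {
            this.x = x;
            this.y = y;
        }

        @Override public String toString() { return "(" + x.value + ", " + y.value + ")"; }
    }

    static Xpos x(int value) { return new Xpos(value); }
    static Ypos y(int value) { return new Ypos(value); }

    public static void main(String[] args) {
        Point p = new Point(x(10), y(20));
        System.out.println(p); // prints "(10, 20)"

        // new Point(y(20), x(10)); // does not compile: x and y cannot be swapped by accident
    }
}
```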

Domain objects vs. primitives in interview questions

One of the interview questions I like is to ask people about an interface for a price search. I usually give them

... searchByPrice(...)

and let candidates fill in the missing parts. Some will write

Vector searchByPrice(double start, double end)

which is bad code from several points of view (using double for money, no domain objects, untyped Vector).

Others with more domain based thinking write

List<Product> searchByPriceRange(Price start, Price end)

or even use Fowler’s Range pattern:

List<Product> searchByPriceRange(PriceRange priceToSearch)

The last solution is easy to extend and understand. Answering this question often starts an interesting discussion on interface design, maintainability and domain modelling. Whatever you think about this interview question, remember this once and for all: do not use double for money.
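A sketch of the domain-based answer. It assumes that Price wraps java.math.BigDecimal and that PriceRange follows the spirit of Fowler’s Range pattern. The names, the inclusive bounds and the omission of currency handling are choices made for this example. It also shows why double is a bad fit for money: in double arithmetic, 0.1 + 0.2 evaluates to 0.30000000000000004, not 0.3.

```java
import java.math.BigDecimal;
import java.util.Objects;

// Illustrative Price value object; currency is deliberately left out of this sketch.
final class Price implements Comparable<Price> {
    private final BigDecimal amount;

    // Parsing from a String avoids the binary rounding a double literal would bring in
    Price(String amount) {
        this(new BigDecimal(amount));
    }

    private Price(BigDecimal amount) {
        this.amount = Objects.requireNonNull(amount);
    }

    Price plus(Price other) {
        return new Price(amount.add(other.amount));
    }

    @Override
    public int compareTo(Price other) {
        return amount.compareTo(other.amount);
    }

    // Numeric equality: 0.30 and 0.3 are the same price (BigDecimal.equals would disagree)
    @Override
    public boolean equals(Object o) {
        return o instanceof Price && compareTo((Price) o) == 0;
    }

    @Override
    public int hashCode() {
        return amount.stripTrailingZeros().hashCode(); // consistent with equals above
    }

    @Override
    public String toString() {
        return amount.toPlainString();
    }
}

// Range in the spirit of Fowler's Range pattern, inclusive on both ends
final class PriceRange {
    private final Price start;
    private final Price end;

    PriceRange(Price start, Price end) {
        this.start = Objects.requireNonNull(start);
        this.end = Objects.requireNonNull(end);
        if (start.compareTo(end) > 0) {
            throw new IllegalArgumentException("start must not be after end");
        }
    }

    boolean includes(Price price) {
        return start.compareTo(price) <= 0 && price.compareTo(end) <= 0;
    }
}
```

With this sketch, new Price("0.1").plus(new Price("0.2")) equals new Price("0.3"), and a PriceRange from 10 to 20 includes new Price("10.00") but not new Price("20.01").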

Thanks for listening and don’t use String, int and long (or double for money).

Update:
If you find using classes instead of Strings repulsive, look at another example: the zip code. Most code I’ve seen uses a primitive for the zip code, which creates a lot of problems when going i18n.

class Customer {
   String name;
   String street;
   String city;
   String zip;
}

(some will have used int for the zip code, but they get into trouble even faster than the String users, e.g. because leading zeros are lost)
instead of

class Customer {
   String name;
   Address address;
}

class Address {
   ZipCode code;
}
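A sketch of a ZipCode value object that knows its country. The two validation rules (five digits for Germany, five digits with an optional "-NNNN" suffix for the US) are simplified assumptions; real postal codes need a per-country table. The code stays a String internally, so a German zip code such as 01067 keeps its leading zero, which an int would drop.

```java
import java.util.Locale;
import java.util.Objects;
import java.util.regex.Pattern;

// Illustrative ZipCode; the per-country rules are simplified assumptions.
final class ZipCode {
    private static final Pattern DE = Pattern.compile("\\d{5}");
    private static final Pattern US = Pattern.compile("\\d{5}(-\\d{4})?");
    private static final Pattern ANY = Pattern.compile(".+"); // unknown country: any non-empty code

    private final String country; // ISO 3166 alpha-2 code, e.g. "DE"
    private final String code;

    ZipCode(String country, String code) {
        this.country = Objects.requireNonNull(country).toUpperCase(Locale.ROOT);
        this.code = Objects.requireNonNull(code).trim();
        if (!patternFor(this.country).matcher(this.code).matches()) {
            throw new IllegalArgumentException("Invalid zip code for " + this.country + ": " + code);
        }
    }

    private static Pattern patternFor(String country) {
        switch (country) {
            case "DE": return DE;
            case "US": return US;
            default:   return ANY;
        }
    }

    @Override
    public String toString() {
        return code; // leading zeros survive, unlike with an int
    }
}
```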

Still think Strings are a good idea in your code or “the simplest thing that could possibly work”?

About the author: Stephan Schmidt has more than 15 years of internet technology experience and 10 years experience in agile. He was head of development, consultant and CTO and is a speaker, author and blog writer. He specializes in organizing and optimizing software development helping companies by increasing productivity with lean software development and agile methodologies. Want to know more? All views are only his own.

Comments

Yet another good post that serves to highlight the brittle, naive nature of my code. If only you had been my tutor for Introduction to programming in Java 5 years ago.

Still, learning a better way to do things is what it’s all about.

Thanks again.

[...] at Never, never, never use String in Java (or at least less often :-) at Stephans Blog. Posted in Java. Tags: development, Java, [...]

Full ACK. I can’t say how much I agree with you. Lately I stumbled upon several occurrences of plain String objects in code, and I always had to dig deeper through the sources to get a feeling for their uses. It is so annoying and refactoring/changes are much harder to do.

Next time I’m about to add a String object somewhere, I’ll definitely give it a second thought.

stephan

@David: Thanks. I’ve taught programming for several years and have been using Java for more than 10 years, so sometimes I have good ideas ;-)

(Sometimes I have stupid ideas which look good but don’t work in the end)

@Carsten: “It is so annoying and refactoring/changes are much harder to do.” And it takes a lot of time and takes momentum from your development effort. The evil thing with Strings is that they are easier to code/write/use the first time. But you will pay for that quick implementation in the future.

Cheers, great post!

James

Regarding:

List<Product> searchByPriceRange(PriceRange priceToSearch)

What about considering:

Iterator<Product> searchByPriceRange(PriceRange priceToSearch)

It would be a bit more general, and you would not need to hold all the Products in memory, which adds a bit of flexibility.

Also regarding firstname and “name” (I guess you meant surname), I would keep them as a String in this case because as a concept they are easily understood. Or I would make a person object which would probably be better.

I don’t think there are any hard and fast rules for programming, and saying ban Strings per se is probably going just a bit too far. But the general idea is a good one!

stephan

@James:

Often it’s not a good idea to return collections, because they expose implementation details.

- Returning an Iterator instead of an internal list is usually a good idea, best combined with implementing Iterable

- Instead of returning anything, one can pass an Applyable to applyToSearchResult(applyable)

- This case might be different, because the List does not return an internal representation, but something newly created.

- This interface is easier to use remotely, so if it is a service interface, I’d prefer returning a List

You’re right, of course, that with an Iterator you’re free to fetch products on demand instead of up front.

Yes, I meant surname :-) There is always the problem of choosing between examples which are too easy, with no real-world value, and ones which are too complicated and do not transport the intended message.

“I don’t think there are any hard and fast rules for programming, and saying ban Strings per se is probably going just a bit too far.”

Yes, perhaps. But from my point of view it’s better, when in doubt, not to use a String than the other way round.
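A sketch of the two alternatives from the reply above: a lazily filtered Iterable and an Applyable callback. Applyable is the name used in the reply; its single apply method is an assumption. To keep the example short, the search criteria are a java.util.function.Predicate instead of a PriceRange, and the element type is generic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Name from the comment; the method signature is assumed for this sketch
interface Applyable<T> {
    void apply(T item);
}

final class Search<T> {
    private final List<T> items;

    Search(List<T> items) {
        this.items = new ArrayList<>(items); // internal copy, never exposed directly
    }

    // Alternative 1: lazy Iterable; matches are found while the caller iterates,
    // and the default Iterator.remove() throws, so callers cannot modify the list
    Iterable<T> search(Predicate<T> criteria) {
        return () -> new Iterator<T>() {
            private final Iterator<T> source = items.iterator();
            private T next;
            private boolean hasNext = advance();

            private boolean advance() {
                while (source.hasNext()) {
                    T candidate = source.next();
                    if (criteria.test(candidate)) {
                        next = candidate;
                        return true;
                    }
                }
                return false;
            }

            @Override public boolean hasNext() { return hasNext; }

            @Override public T next() {
                if (!hasNext) throw new NoSuchElementException();
                T result = next;
                hasNext = advance();
                return result;
            }
        };
    }

    // Alternative 2: push each match to a callback instead of returning anything
    void applyToSearchResult(Predicate<T> criteria, Applyable<T> applyable) {
        for (T item : items) {
            if (criteria.test(item)) {
                applyable.apply(item);
            }
        }
    }
}
```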

+1 to “do not use a zillion String arguments, find classes with semantic meaning”. Call it SearchFilter, PriceRange or SearchCriteria if you must.

-1 to Name, FirstName and Count classes, because of YAGNI. Create classes as you see fit, not the other way around.

IMHO.

Dimitris

You have a tendency to write in the imperative voice (“this must / must not”, “never do this”, “always do that”), which sounds rude.

As for your proposal, IMHO it is one of these “stupid ideas which look good but don’t work in the end”. If you wrap each and every primitive type use case in custom objects, you might gain a little convenience (which is debatable) but you will also pay it with tremendous resource overhead.

Seems like it adds a lot of coding overhead, and I’m not convinced it’s outweighed by the benefits. This creates a lot of extra objects that need to be maintained and are not necessarily reusable between different projects.

I’m all in favor of more readable code, but I think it can be better attained through more descriptive naming of variables, functions and parameters.

Hi! I think the examples are highly overengineered. This looks like too much up-front design. In Test Driven Development you always try to create the simplest possible program that does what it should do. This way you get a much simpler program, since you do not have to implement unnecessary abstractions. I only add domain classes during refactoring, and only when the need arises.

stephan

Uh, I sure hit a nerve there. Generally speaking, it depends on your project size, your line count, how long the application will be used, etc. For one-person projects with 100 NOC this is surely overengineered; for bigger applications with 1 kNOC I would go this way every time.

@Ignacio: In my experience this is not YAGNI; this is domain modeling with value objects (see DDD). And I think you are gonna need it when more than 10 developers work on the code for some time. But YMMV.

@Dimitris: Sorry for not being able to express myself better. “[...] but you will also pay it with tremendous resource overhead.” No I guess not, the VM will clean most of those small objects up and inline them.

@Spike: “Seems like it adds a lot of coding overhead, and I’m not convinced it’s outweighed by the benefits.” In my experience with projects ranging from 100 kLOC to 1.5 MLOC, you can’t start planning for maintainability early enough.

@Martin: As said before, it’s difficult to choose meaningful examples which are neither too simple nor too complex. So I guess one can always argue that the example is too simple to mean anything.

“I only add domain classes during refactoring, and only when the need arises.”

As soon as someone needs to think more than 5 min what String orderId looks like and introduces a bug, I’d say there was a need before to introduce an OrderId class.

LOL, why don’t you also make interfaces and their respective implementations for the instances described above, so you have Name name = new NameImpl(“get a life”)

stephan

@m.j: Ah, yes, good idea. posts.select( it.author == "m.j.milicevic" ).reply("I've got enough life beside answering questions to this blog, thanks.")

I think this article is really part of a “use appropriate domain abstractions” design principle. It’s one of those sliding scales you adjust for the project at-hand based on the tradeoff of maintainability versus speed of development. The idea is basically that you should be passing around domain abstractions (domain interfaces/classes as method parameters) and returning domain interfaces/classes for highly maintainable code. It gets into responsibility driven design where instead of “getting” all the pieces of state you need and then processing them, you tell those objects “in the know” to do the processing for you. That’s what Allen Holub’s polemical “Getters and Setters are Evil” article is driving at. It sounds impossible to avoid calling getters and setters but I think the key enabler is using enough domain abstractions (avoiding those Strings and (N|n)umbers except at the lowest level… you have to have them at some point) to do the work for you. Of course, at domain boundaries (user interfaces and persistence layers), you start having to pick apart your objects again to get the job done. Holub says the UI pieces can get around this in other ways too, though it’s not done very often. Holub believes that the slider should only ever be at the full-up domain abstraction end of the scale whereas most people do not (probably many aren’t even aware of the design principle).

Anyway, I just wanted to point out this aspect of it. And as soon as I mention Holub on this topic, I have to brace for flames :-) His design articles are quite polarizing and I often see them criticized but it seems people often misunderstand the underlying idea–which is easy to do with his articles.
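A minimal sketch of the “tell, don’t ask” idea mentioned above, using a hypothetical Account class (not taken from Holub’s article). Instead of asking for the balance and computing in the caller, the caller tells the account what it wants, and the rule lives next to the state it protects.

```java
import java.math.BigDecimal;

// Hypothetical example class; no getters or setters for the balance.
final class Account {
    private BigDecimal balance;

    Account(BigDecimal openingBalance) {
        this.balance = openingBalance;
    }

    // "Tell" style: the caller states its intent, the account enforces its own rules.
    // "Ask" style, for contrast, would put this logic into every caller:
    //   if (account.getBalance().compareTo(amount) >= 0)
    //       account.setBalance(account.getBalance().subtract(amount));
    boolean withdraw(BigDecimal amount) {
        if (amount.signum() <= 0 || balance.compareTo(amount) < 0) {
            return false; // non-positive amount or insufficient funds
        }
        balance = balance.subtract(amount);
        return true;
    }
}
```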

- So, isn’t this just a (poor) replacement for a simple C/C++ typedef?

I do like the named parameters... they communicate very well, and match what the Ruby and Objective-C communities do.

eessbb

Have you ever worked on a real project with deadlines? And what IDE are you using that you’re having to create useless objects just so your IDE will do a better job of helping you understand the code? Get a real IDE and learn how to use common sense when reading code…

stephan

@eessbb: Yes I have, sometimes with very tight deadlines. You’re right with implying that sometimes creating this code isn’t a good idea on a tight deadline. But when you have to live with the code, you need to decide if to refactor now or later. When doing consulting or projects, most companies write fire and forget code and move on to new projects.

My IDE is IntelliJ IDEA and most often helps with my “useless” objects. What “real” IDE would you suggest?

stephan

Don’t forget most often you won’t see name("Stephan") because the data comes from a database or from the user interface. The call in your code will then look just like with strings.

stephan

@Chris: “Getters and Setters are Evil”

Some time ago I thought it impossible to write code without setters and getters while abiding by the Law of Demeter or Tell, don’t ask. Over the years I found that following this guideline results in better code. The only problem I have is when I need other representations of an object, like String or XML. It strikes me as odd to put XML generation code into an object, as it’s an orthogonal concern. I’m not sure how to solve such cases without getters.
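One common way to get an XML representation without getters is to let the object export its state to an interface it defines, in the spirit of the Builder pattern. Exporter and XmlCustomerExporter are hypothetical names, and XML escaping is left out. Strings only appear at this export boundary, and the Customer never knows about XML.

```java
// Hypothetical sketch: Customer pushes its state to an Exporter instead of offering getters.
final class Customer {
    interface Exporter {
        void addFirstName(String firstName);
        void addName(String name);
    }

    private final String firstName;
    private final String name;

    Customer(String firstName, String name) {
        this.firstName = firstName;
        this.name = name;
    }

    void export(Exporter exporter) {
        exporter.addFirstName(firstName);
        exporter.addName(name);
    }
}

// One of possibly many exporters (XML, JSON, UI); escaping omitted to keep the sketch short
final class XmlCustomerExporter implements Customer.Exporter {
    private final StringBuilder xml = new StringBuilder();

    @Override
    public void addFirstName(String firstName) {
        xml.append("<firstName>").append(firstName).append("</firstName>");
    }

    @Override
    public void addName(String name) {
        xml.append("<name>").append(name).append("</name>");
    }

    String toXml() {
        return "<customer>" + xml + "</customer>";
    }
}
```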

joe dev

Dumb idea. Are you a beginner or what? Think before posting dumb stuff that may lead others to a wrong path.

stephan

Beginner? I do hope so, but sadly I’m spoiled by many years of coding.

“The mind of the beginner is needed throughout Zen practice. It is the open mind, the attitude that includes both doubt and possibility, the ability to see things always as fresh and new. It is needed in all aspects of life. “

Stephan, I have my own copy of the Domain Driven Design book and I cannot recall anything like what you propose. I don’t think this is a matter of team or project size, really.

Please consider again some of the comments in your post, some of them strongly opinionated but most of them aligned. What you are describing here is the definition of YAGNI. You are adding classes for your IDE, and that’s wrong.

stephan

@Ignacio: Two things. DDD is a methodology, not a book – it’s like saying I have the Object Oriented Development book here and I cannot recall anything like that. From my version of a book, I recall that there are entities and value objects. In my opinion String is neither one, but Name is a value object.

“I don’t think this is a matter of team or project size, really.”

I do think it’s a matter of team size:
- bigger teams have a higher probability of having below-average developers (if you don’t do hard recruiting)
- the more people, the higher the probability that you will need to read and understand (and fix and refactor) code which you didn’t write. So there is a need for clearer code

“You are adding classes for your IDE, and that’s wrong.”

You’re taking one sentence out of several dozen. But even if you’re right, even if it were the only reason (and it isn’t), I think you should go that way. Sometimes the choice of IDE is limited; sometimes the only company option is Eclipse (a bad choice on its own, but I don’t want to fan the flames in this thread).

Stephan, I was talking about THE book on DDD – the one that appears top in your google results, the first link in the wikipedia page, the 2004 book that existed before the “methodology”, whatever. That book doesn’t promote this kind of approach.

Anyway, you seem to have your mind already set. Good luck with that.

I think I was perhaps generous to cast this approach as “using appropriate domain abstractions”. Things like count don’t strike me as domain objects. But passing around a broken-up “customer” as firstName and name objects, where they are really just String wrappers, doesn’t seem helpful. The customer is the domain object, not the name components. If for some reason you think the system might have to represent the name as something other than a String, that’s when you’d bother with the abstraction.

I’d prefer:


public void bookTicket(
  Customer customer,
  Film film,
  int count,
  Cinema cinema);

or if you’re a real purist :-)

public void bookTicket(
  ICustomer customer,
  IFilm film,
  int count,
  ICinema cinema);

In a domain discussion with end users, Customers, Films and Cinemas would be the typical domain objects. First name and last name are simply attributes of the Customer.

pwilder

Do you know if there are any open source projects that adopt this philosophy? This sounds like something that would be useful to see in action.

The idea is interesting but just looking at your examples here it looks like this would not scale well to large projects.

Hmm…

“Never, never, never use (unwrapped) String or long or int. Why? Those primitive types have no semantic meaning.”

You are laying down rules you expect Java programmers to follow, but you are quite ignorant of the language. String is an object, not a primitive.

The name of the variable, get/set method or the method/constructor parameter name provides the semantics.

“They are hard to understand, hard to maintain, and hard to extend.”

They are easy to understand. You don’t maintain them, ever. You can’t extend them.

I would hate to have to maintain code you have written, because I bet it is full of pointless layers of indirection.

David

Oh my god,

So now I have this API with a billion methods with strange class names… how do I create a damn Price object? What’s that? Is it a String? a Double? a Long? A composite class with lots of fields (price in dollars, pounds, yen… with taxes, without taxes)?

I just don’t get how having to investigate dozens of classes in order to know which primitive types I need is going to help me.

B. Waite

I had to check the date on this entry while I was reading it. I thought I had stumbled across an old April 1st article.

The reason I sit here, boggled, is probably because I’ve never heard of anyone having problems with unwrapped Strings and primitives.

Ever.

Your examples that use Strings and primitives look like every Java method I’ve ever encountered. They’re so conventional they’re boring.

I can’t imagine encountering a method like any of these, and being confused.

On the other hand, if I encountered a method like your second bookTicket example, I *would* be confused. Not for long, but your unconventional approach would take some time to understand.

stephan

@David: “how do I create a damn Price object? What’s that? Is it a String? a Double? a Long?”

The point is, you shouldn’t need to care what it is.

@B.Waite: I consider you lucky for never having come across a method with 4 Strings whose meaning you do not know, with nowhere to go to look them up. I’ve come across a lot of code in the last decade which was written this way and was really hard to understand.

As I’ve written, the example is not a real-world example, because one would use an Order domain object instead of several parameters.

@youneededtohearthis: Considering the semantics of String and comparing it with int, String is a primitive just like int. As I wrote there is no semantic to a String, it could be anything. Compare this to Person, Name or Point which do have a semantic meaning. String doesn’t.

“You don’t maintain them, ever. “

You do have to maintain them. Over a life cycle of several years those “things” often need to change, for example because they need to carry more information. When using a String you’re out of luck and need to refactor every method which uses your String to carry the additional information. Say your String is an ID. After some time your business users decide that IDs can become invalid. With an ID class it’s easy to model a valid state into the ID. With a String, as you’ve said, “You can’t extend them”, so you need to refactor a lot of methods.
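A sketch of the “IDs can become invalid” scenario, using a hypothetical Id class. The validity flag is added inside the class, so every method that already takes an Id keeps its signature; with a String, each of them would need an extra parameter or a lookup.

```java
import java.util.Objects;

// Hypothetical Id that later gained an "invalidated" state without changing its callers.
final class Id {
    private final String value;
    private final boolean valid;

    private Id(String value, boolean valid) {
        this.value = Objects.requireNonNull(value);
        this.valid = valid;
    }

    static Id of(String value) {
        return new Id(value, true);
    }

    // Immutable: invalidating returns a new Id instead of changing this one
    Id invalidate() {
        return new Id(value, false);
    }

    boolean isValid() {
        return valid;
    }

    @Override
    public String toString() {
        return valid ? value : value + " (invalid)";
    }
}
```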

I don’t like this idea and I agree: It’s related to YAGNI. Yes, sometimes I need to rename private String firstName to private FirstName firstName, but does that happen the majority of the time? No. Sometimes I go entire classes without needing that change.

Your point on it being more semantic is valid. But 1) it’s usually wasted effort and 2) the name of the variable is already semantic 3) there is sometimes an advantage to knowing exactly what you’re working with without having to do any research. For example, when I see private String firstName;, I know exactly how to set and get its values.

stephan

@Ignacio: “That book doesn’t promote this kind of approach.”

If it doesn’t promote the use of Name objects, then perhaps I’ve misunderstood the promotion of value objects in that book. From the definition in the book, Name looks very much like a value object to me.

@Clark: You’re right, this isn’t black or white, “(or at least less often :-)”

The Count class is very contrived; I would use an int too, not a Count class. Having an inc() method would be handy, though, and on a higher semantic level than count++ on an int. And as written in the post, an Order or something like that would be best. As would a Customer class.
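A sketch of a Count class with the inc() method mentioned above. Making it immutable (inc() returns a new Count) and rejecting negative values are assumptions of this sketch, not something the post specifies.

```java
// Illustrative Count value object; immutability and the non-negative rule are assumptions.
final class Count {
    private final int value;

    Count(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("count must not be negative: " + value);
        }
        this.value = value;
    }

    // Semantic increment instead of count++ on a bare int
    Count inc() {
        return new Count(value + 1);
    }

    int intValue() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Count && ((Count) o).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }
}
```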

stephan

@Dan: in this case, obviously you don’t need it. One method isn’t complex enough to replace Strings. Considering thousands of methods – which every product grows into – this often needs to change.

And I stand by my opinion: using primitives (and as said before, I consider String a primitive) as method parameters is more often a bad idea than not.

“[...] the name of the variable is already semantic.”

Yes, sometimes it is, sometimes it isn’t. I’m always for well-named parameters and sometimes think about a parameter name for 10 minutes, and then change it again because I’ve found something better. Having an object instead of a String/Long conveys the semantics better, though, and I think it prevents bugs this way.

BUT reading all these comments bashing my post, most likely I’m wrong :-)

(I’d still use a Price class instead of double, and a PriceRange and a Money class, and – when there is semantic to a name – a Name class).

stephan

I wonder how Name works with Qi4j, where a Name class could be configured into your Entity. Hmm. Good night. All further comments need to wait until tomorrow to be moderated; no censorship, just bedtime :-)

B. Waite

@stephan

I think the issue of semantics is a bit of a red herring. If a parameter is poorly named and is undocumented, it’s hardly the fault of the primitive. It’s sloppy coding. I think it’s fair to say that someone who won’t go the extra inch to JavaDoc their code isn’t going to go the extra mile and wrap their parameters.

The main issue here is that the rest of the Java world uses primitives, and avoiding their use makes interacting with this world very difficult.

For example, if I want to modify an ID’s value using the Apache StringUtils class, I can’t just pass the ID object. I need to extract the String value from the ID, pass it to the StringUtils method, and finally construct a new wrapper using the result…

…that, or extend the ID class to include this functionality. Now, multiply this sort of problem by each framework or utility that you use.

My advice (and what I understand to be a best practice) would be to choose the simplest possible solution. Use primitives whenever you can. Like Visa, primitives are accepted everywhere. When you outgrow one, just accept that you’ll need to refactor and move on. ;-)
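One way to soften this interoperability problem is a single bridge method on the wrapper, so String utilities can still be applied without unwrapping and re-wrapping at every call site. CustomerId and map() are hypothetical. Any String-to-String function fits, for example a method reference to a static helper such as Apache Commons Lang’s StringUtils::deleteWhitespace.

```java
import java.util.Objects;
import java.util.function.UnaryOperator;

// Hypothetical wrapper with one generic bridge to String-based utilities.
final class CustomerId {
    private final String value;

    CustomerId(String value) {
        this.value = Objects.requireNonNull(value);
    }

    // Applies a String -> String function and re-wraps the result
    CustomerId map(UnaryOperator<String> f) {
        return new CustomerId(f.apply(value));
    }

    @Override
    public String toString() {
        return value;
    }
}
```

Usage: new CustomerId("  abc-1 ").map(String::trim).map(String::toUpperCase) yields an id whose value is "ABC-1". If CustomerId later gains validation, every mapped result passes through the constructor again.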

I can easily see the benefits of using a Price instead of a double under a lot of situations. On the other hand, using a FirstName class instead of a String is a little iffy. I think it’s LIKELY that a String will satisfy all your needs.

[...] have been reading some articles on the net and I just read an interesting article about Java programming. The author in this article is explaining the reasons why Java [...]

stephan

@B.Waite: Sorry to disagree.

“The main issue here is that the rest of the Java world uses primitives, and avoiding their use makes interacting with this world very difficult.”

If you are using primitives all over, then you’re doing something wrong. Your methods should mostly use value objects and entities from the beginning (see the parameter object pattern and others).

So we’re only talking about 20 percent or less of primitives in method calls which I propose to replace with wrappers.

“My advice (and what I understand to be a best practice) would be to choose the simplest possible solution.”

Ah, but using a wrapper is simpler within your IDE than using String. And it is simpler to understand. You’re against the idea because you need to write a little bit more code once, say 10 lines. But code is written once (most of the time) by one developer and read a thousand times by dozens of developers. I do know that writing less code is currently the hype, see Rails et al. But this is also wrong. It’s not about writing less code, it’s about writing maintainable code. 80% of costs arise in maintenance, not in development.

So if your mindset is that rapid development is the most important thing, we will always disagree.

stephan

@Dan: “[...] think it’s LIKELY that a String will satisfy all your needs.”

Perhaps you are right, but then it really doesn’t matter for FirstName, because you are mostly using the object only to construct a customer object you will use a dozen times.

Look at another example: the zip code. I bet most people in this thread use a primitive for the zip code, which creates a lot of problems when going i18n. And all those primitive defenders will use

class Customer {
   String name;
   String street;
   String city;
   String zip;
}

instead of

class Customer {
   String name;
   Address address;
}

class Address {
   ZipCode code;
}

Are you still on the primitive side?

sean

I have definitely gone this way, consider the following:

public class Address {
   boolean multiline, virtual;
   String address;
   public Address(String add, boolean mul, boolean vir) {
      this.address = add;
      this.multiline = mul;
      this.virtual = vir;
   }
}

I hate such code, as it’s sometimes hard to remember what the booleans mean. So now I use:

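public class Address {
   enum Lines { SINGLE_LINE, MULTILINE }   // SINGLE_LINE is an assumed counterpart
   enum Reality { PHYSICAL, VIRTUAL }      // PHYSICAL is an assumed counterpart
   final String address;
   final Lines lines;
   final Reality reality;
   public Address(String address, Lines lines, Reality reality) {
      this.address = address;
      this.lines = lines;
      this.reality = reality;
   }
}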

This means you can create objects with meaningful names instead of booleans.
E.g. new Address(addr, Lines.MULTILINE, Reality.VIRTUAL);

Let’s compare these:

1. void book(String orderId);
2. void book(OrderId order);

The first one:

- The reader understands immediately that a string is passed, which represents an order ID
- Wherever this order ID is used inside the function, there is no doubt that it’s an order ID

The second one:

- The reader has no idea what is being passed. It could be a string, an integer, or a complicated object.
- In order to fully understand the code, you need to read in multiple places in the source code, which wastes programmer workhours.
- Wherever the order variable is used, the code doesn’t reveal that it contains an order ID.

Therefore, it is quite obvious that the first of these two are the best choice.

Xavier

Well, I perfectly understand your point… On one of our projects, we had a Person entity which of course had a ‘name’ property as a String. When the testers came in, the first thing they tried when creating a Person instance (through our web UI) was to provide weird names (long names, names with forbidden characters)… We had to figure out where to put the validation logic for names, and we came up with 2 possibilities:

1) scatter that logic in the ui AND the entity (Person.setName) or
2) create a value object Name with the validation Logic.

In the end, I chose the second solution as it clearly had 2 advantages over the first one:

1) the validation was defined once and for all in one place (the Name class) and
2) when a wrong name was entered, the error was raised immediately when the Name object was created (in the UI), not when the entity was persisted (which is 2 layers deeper and harder to debug). I think this is related to the ‘Fail fast’ principle: if the name is wrong, don’t wait until you persist the Person to find out.
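A sketch of such a Name value object. The rules (1 to 100 characters after trimming; letters, spaces, apostrophes and hyphens only) are assumptions for illustration. The point is that an invalid name fails fast in the constructor, wherever the Name is created, whether in the UI or in a service.

```java
import java.util.Objects;
import java.util.regex.Pattern;

// Illustrative Name value object; the validation rules are assumptions.
final class Name {
    private static final int MAX_LENGTH = 100;
    // Unicode letters, spaces, apostrophes and hyphens
    private static final Pattern ALLOWED = Pattern.compile("[\\p{L} '\\-]+");

    private final String value;

    Name(String value) {
        Objects.requireNonNull(value, "name");
        String trimmed = value.trim();
        if (trimmed.isEmpty() || trimmed.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("Name must have 1 to " + MAX_LENGTH + " characters");
        }
        if (!ALLOWED.matcher(trimmed).matches()) {
            throw new IllegalArgumentException("Name contains forbidden characters: " + value);
        }
        this.value = trimmed;
    }

    @Override
    public String toString() {
        return value;
    }
}
```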

stephan

@Lars: Let’s compare these:

1. void book(String id);
2. void book(OrderId id);

The second one is much clearer. If your system has an OrderId, the developer writing the book function needs to use the OrderId. He can then choose to poorly name the parameter “id”, and the type still tells you what it is. In the first case, if the developer chooses to name the parameter “id”, all others are lost.

“The reader has no idea what is being passed. It could be a string, an integer, or a complicated object.”

The reader doesn’t need to care. That’s the idea behind object-oriented programming, which I understand is out of fashion these days (think encapsulation).

“In order to fully understand the code, you need to read in multiple places in the source code, which wastes programmer workhours.”

See above, the developer gets an Id from the GUI or a service and doesn’t care how the Id is implemented (again, encapsulation).

Whereas with the String, the user a) knows implementation details of the id and b) builds his application (the book method) against the implementation of the id (String), not the semantic idea (id).

stephan

@Xavier: Thanks, agreement to my post is scarce and I thought I’m living in a different universe ;-)

stephan

@Lars: To make my point about encapsulation clear:

The user would get the OrderId for example by:

1. OrderId id = orderService.getNewId()
2. OrderId id = order.getId()
3. OrderId id = bookingService.book(order)

etc.

No need to know what the id is.

Or, to follow your argumentation, “In order to fully understand the code” one would always be forced to look into String or StringBuffer and understand their internal implementation (as you want to understand the OrderId implementation). No one does that (except, e.g., to find bugs or to learn from the code).

Readability is always important, but don’t expect the reader to immediately understand identifiers that are not from the library.

OOP is very good, except when it makes source code much harder to understand for readers who need to reason about performance or security, or who need to find bugs.

Most source code is a compromise between readability, performance, productivity and safety, and it rarely makes sense to let one of these four parameters dominate all code.

stephan

“[...] find bugs [...]”

Excellent point. An OrderId with validation logic, range checks etc. can easily be tested with unit tests to make it essentially bug free. See how much array bounds checks in Java have helped with security problems, because the logic lives in one place, the array implementation.

A String orderId is impossible to test, you need to test every input and output path of the code, and perhaps every usage to check for correct validation etc. Keeping logic local and encapsulated results in easier to test code I believe.

Good testing is about testing the most frequently used code and the most critical code.

I think it’s more important to save manhours in the process of finding out, why a correct order ID retrieves a wrong order, than to optimize the unit tests for checking that the orders got the correct order ID.

You may construct examples where it makes sense to spend programming hours on encapsulating the data so that range checks, checksum calculations etc. run every time a value is assigned, but there are surely many cases where the costs are bigger than the benefits.

The worst code you can get is code where almost everything is defined elsewhere. You don’t need to add many programmers to a project like that before you end up with huge overhead, low productivity or spaghetti code.

Good code is where a new programmer, who doesn’t know about the project, can add a new feature within an hour.

Fred Swartz

I like your proposal, but understand the resistance to it. Programming languages could make it easier by supporting more convenient type definitions, units, numeric subranges, operator overloading, etc.

Wrapping primitive numbers has advantages but without additional language support, it’s hard to make use of numeric libraries (as was pointed out previously).

I’ll try it on code I’m working on now and see if it flies.

sean

Lars is wrong in my opinion. If you have ever developed large systems, especially those that don’t use dependency injection (passing around 6 primitive/String objects), you will agree with Stephan.

I guess it depends on whether you understand OO.

Lars, here is an example:

String orderId = "2343590";

Where does the validation go? The answer is: EVERY place you need it. Now your developers who want to use orderId need to put checks in everywhere. And your developers need to know how to validate an order id. What was that class called again? Oh, and how are you going to retrieve it? I guess you could pass the order validator in, inject it or have a static service. NOW you understand why the code mess happens in most Java projects!

Compare this to:

// throws OrderInvalidException
OrderId id = new OrderId("2343590");

Now, whenever your developers have an OrderId, it is guaranteed to be valid, especially if it is defined as immutable. Simplicity.
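sean’s OrderId can be sketched as below. The format (1 to 10 decimal digits) is a hypothetical rule chosen to match "2343590", and OrderInvalidException is made unchecked here for brevity; the comment does not say which kind it is.

```java
import java.util.regex.Pattern;

/** Unchecked for brevity; a checked exception would work the same way. */
class OrderInvalidException extends IllegalArgumentException {
    OrderInvalidException(String message) {
        super(message);
    }
}

/** Immutable order id: every instance has passed validation exactly once. */
final class OrderId {
    // Hypothetical format: 1 to 10 decimal digits.
    private static final Pattern FORMAT = Pattern.compile("\\d{1,10}");

    private final String value;

    OrderId(String value) {
        if (value == null || !FORMAT.matcher(value).matches()) {
            throw new OrderInvalidException("not a valid order id: " + value);
        }
        this.value = value;
    }

    String asString() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof OrderId && ((OrderId) o).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}
```

Any method that accepts an OrderId can then rely on it being well-formed without re-checking.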

@Sean: I certainly know large systems (>50 programmer man-years per app), and in projects of this size I don’t recognize your case against using String on account of the need for validation.

We could easily introduce a name verification, checking that first name and last name are both present, start with a capital letter etc. A university in the USA once checked for the presence of first name and last name, and then the son of our Queen wanted to study there, but as a royal he doesn’t have a last name… so the IT system failed to help the university. Validation is good, but it should not prevent the user from getting things done.

Another example: a national person identification number. In Denmark, we use “CPR numbers”. However, if you check these strictly, you will find that foreigners are suddenly excluded from your system, and you might end up being able to use your IT system for only 90% of your work. The rest has to be done on paper, or you have to invent fake numbers. You can then discuss how to define person IDs, but you can be sure that the customer doesn’t provide correct specs on this from the start.

What if a company changes its order ID system? Then orders from before 1990 must comply with xxx-xxx, orders from 1990-2005 must comply with yyyy-xx, and after 2005 they are again xxx-xxx, but restarting from zero. An OrderId class can do some checks, but it cannot say whether the order ID conforms to the current standard for order IDs.

In Denmark, a validity test for names is quite easy: we have a list of approved names, so if the name doesn’t appear in the list, it’s not valid. So I guess a name class would contain thousands of names in its source code? Doesn’t make sense to me. Anyway, under certain rules, a name can be accepted even if it is not on the list. This typically happens when somebody immigrates. In other words, the validity test depends on the context and on external decisions.

Then we could discuss SQL string conformance tests. Well, doesn’t that depend on the database server? One of the big benefits of SQL is that you can create a tool and then use it with many different database systems, even with future database server versions that you haven’t heard of yet. Can you validate SQL if you don’t know the database server’s SQL dialect yet? Of course not; that would be utter rubbish.

I could go on for a long time giving examples where validating strings is the wrong question: it’s not WHETHER to validate or not, it’s WHAT to validate and WHEN.

Basically, all this relates to how you develop software. Do you use the waterfall model, agile software development or something else? Who provides the specs? How big a part of your costs goes to automated testing? If you spend close to 0% or 100% on automated testing, you’re certainly spending your money unwisely. A good project spends something in between, and good projects don’t have unlimited access to money, because that makes them inefficient.

stephan

@Lars: Excellent examples!

“What about if a company changes it’s order ID system. That would mean that orders from before 1990 must comply to xxx-xxx, orders from 1990-2005 must comply to yyyy-xx, and after 2005 they are again xxx-xxx, but restarting from zero.”

No problem with an OrderId interface and different class implementations, depending on the date it was created. A no brainer.

With Strings? How do you validate an Id in a String without any further information? I guess that’s impossible.

“A national person identification number.”

Excellent too: having different classes for an ID, with different GUI frontends and Hibernate mappings, is a no brainer. With a String this is much more difficult to implement (you need to encode the country in the number or something like that).

“So I guess a name class would contain thousands of names in it’s source code? ”

Shouldn’t that come from a NameService for your country which returns Name domain objects? With the validation in the service, and the service as the only way to create names, you always have valid names. Compare that to Strings, which need to be validated in lots of places.

Thanks for the examples, I could not have thought of better ones for my arguments.
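Stephan’s “OrderId interface with different class implementations” could look roughly like this. It is a sketch under stated assumptions: the x and y in Lars’s patterns are read as decimal digits, 2005 is counted as part of the middle period, and all class names are hypothetical. Because the post-2005 series restarts from zero, equality includes the concrete class, so the same text from two periods is not the same order.

```java
import java.time.LocalDate;
import java.util.regex.Pattern;

/** Common type for every order id, whichever numbering scheme issued it. */
interface OrderId {
    String asString();
}

/** Shared plumbing: format check plus equality that includes the concrete class (the period). */
abstract class PatternOrderId implements OrderId {
    private final String value;

    PatternOrderId(String value, Pattern format) {
        if (value == null || !format.matcher(value).matches()) {
            throw new IllegalArgumentException(getClass().getSimpleName() + " rejects " + value);
        }
        this.value = value;
    }

    @Override
    public String asString() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        // "123-456" issued in 1985 and "123-456" issued in 2010 are different orders.
        return o != null && o.getClass() == getClass()
                && ((PatternOrderId) o).value.equals(value);
    }

    @Override
    public int hashCode() {
        return 31 * getClass().hashCode() + value.hashCode();
    }
}

final class Pre1990OrderId extends PatternOrderId {
    private static final Pattern FORMAT = Pattern.compile("\\d{3}-\\d{3}");

    Pre1990OrderId(String value) {
        super(value, FORMAT);
    }
}

final class Era1990To2005OrderId extends PatternOrderId {
    private static final Pattern FORMAT = Pattern.compile("\\d{4}-\\d{2}");

    Era1990To2005OrderId(String value) {
        super(value, FORMAT);
    }
}

final class Post2005OrderId extends PatternOrderId {
    private static final Pattern FORMAT = Pattern.compile("\\d{3}-\\d{3}");

    Post2005OrderId(String value) {
        super(value, FORMAT);
    }
}

final class OrderIds {
    private OrderIds() {
    }

    /** Chooses the implementation from the order's creation date. */
    static OrderId of(String raw, LocalDate created) {
        int year = created.getYear();
        if (year < 1990) {
            return new Pre1990OrderId(raw);
        } else if (year <= 2005) {
            return new Era1990To2005OrderId(raw);
        }
        return new Post2005OrderId(raw);
    }
}
```

Note that the factory needs the creation date to pick an implementation; callers elsewhere only see the OrderId interface.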

“No problem with an OrderId interface and different class implementations, depending on the date it was created. A no brainer.”

Ok. Please show me the code that solves this task:

You receive a tab-separated file like this:

123-456 45
2345-22 78

Now, your code has to import this file. The first column contains order IDs, the second contains some value for this order ID. That could be statistical information or something completely different.

Your code must:
- Read this file
- Write a list of customer ID, order ID and this value

As you can see, it’s quite simple: You call a FetchCustomerIdForOrder (OrderId Id), and then you have the customer ID and can print everything. According to your “no brainer” explanation, that doesn’t work, because you don’t have the date of order creation, so you cannot create an OrderId object.

In other words, your explanation is wrong, and “no brainer” is probably not the best way to describe a faulty method. You can then say that “we create a non-checked OrderId class” (which would be, like, a string?!?) or “we just retrieve the date from the database, too” (which would harm performance), or perhaps something else. But I guess at this point we have already found the basic problem: we’re already way beyond a reasonable level of complexity.

Most apps are database applications, where data is not necessarily created by the application that reads it. In other words, you have one piece of data (a string) and multiple implementations that read and write it, maybe even in various languages (java, php, C# etc.). If you add to this, that you may even have various versions of each application, and many different installations, then you often end up in a situation where it doesn’t make much sense to let the reader validate the data in a very high detail.

Some data may even pass multiple systems. I think that the longest path of data transmission, that I have been involved in, is to transmit very precise and structured data across at least 20 applications, using XML, tab-separated files, SQL and many other techniques, transferring data from the end-user to a national database. You don’t want to update validation code in each of these systems every time the data standard changes a bit.

With regard to the other examples: The same GUI is usually used for all CPR-number registrations (requirement from users) and you wouldn’t want to make a network transmission for every time you assign a name to a new variable in your code.

Speaking of Hibernate mappings:

With all the fragmentation of classes into smaller and smaller units of coherence, as you suggest, mapping would become counterintuitive, filled with @Embedded and @OneToOne relationships.

The idea you propose is good in certain scenarios, no doubt. But the imperative tone of the article, with the strong disclaimer as the title, is a little bold ;)

— FirstName firstName –

From a pragmatic viewpoint this is just painful. There is no good argument in design to break up entities to this level. In fact to support Lars’ argument. All this would do is promote unnecessary validation, creating a restrictive, impractical system.

new Name("John", "Doe"); // can still handle all the validations

new Name(firstName("John"), lastName("Doe")); // is over the top

In fact it reminds me somewhat of Scala: a very academically thought-out language, but far too ugly for the real world.

stephan

@Lars: I’ve lost you, you propose to have no validation at all?

“As you can see, it’s quite simple: You call a FetchCustomerIdForOrder (OrderId Id), and then you have the customer ID and can print everything.”

So you do not have input validation on the file? And put your orders directly into your database without validation?

Then perhaps we have two very different styles of development and discussion on validation is futile.

stephan

@Nick:

“new Name("John", "Doe"); // can still handle all the validations”

So you do support a class for a name or not? I’m not sure.

“In fact to support Lars’ argument.”

So you would use String firstName = "John"; String lastName = "Doe";?

And not “new Name(…)” ?

I’m not clever enough to follow that logic.

stephan

@Lars: Or do you want to write a Customer/Order reporting tool and not store data?

Then I would use something like Crystal Reports or any other tool and not write my own code.

And validation is not needed for reporting. Nor is a domain model.

@stephan:

To your question about validation: We validate a lot, and normally always in the interface where data comes in (GUI or API to external systems).

In the example I gave you, the order ID is validated because if it cannot find an order with the specified order ID, an exception is raised, telling very clearly what the problem is. Checking against a list of existing orders is probably the best validation you can get.

In a database, the same field may require different validation depending on the app that stores it and the current configuration at the time of creating a record. We usually assume that historic data is correct, even if it is created and validated under a data policy that stopped 5 minutes ago. It doesn’t make sense to apply a validation rule that only exists for 1-2 years, if you want to create a system that is in production for 10-20 years.

With regard to principles, the only thing I will dare to really generalize is this:

If you do something 0%, you probably do something wrong.
If you do something 100%, you probably do something wrong, too.

This applies to OOP, UML, unit testing and a lot of other buzzwords/religions.

stephan

Sadly you didn’t address my reporting argument, so I can only repeat it: your requirements look a lot like a reporting problem.

I agree with your other argument, though:

“If you do something 0%, you probably do something wrong.
If you do something 100%, you probably do something wrong, too.”

Because of this the title of the post included

“(or at least less often :-)”

:-)

Peace
-stephan

sean

Lars,

I don’t think you get it. Validation is necessary, but it has to be implemented correctly.

Here is my thought process in code:

try {
    zipCode = new ZipCode("90210");
    insertIntoDb(zipCode); // stupid example
} catch (ValidationException e) {
    showToUser(e); // etc
}

Compared to what I think you are proposing:

Either you are proposing no validation:

String zipCode = "badmonster";
insertIntoDb(zipCode);

Or some, but externalised:

String zipCode = "badmonster";

if (!validate(zipCode)) {
    showToUser(zipCode, "Error message");
}
// somewhere else in the code:
String zipCode = "90450";
if (doesntValidate(zipCode)) {
    showToUserInDialog(zipCode, "Enter zipCode");
}

————————————

If either of the two is what you are proposing (I think it is the second), then my point is proven. By making the zip code understand what good data is, you never risk reimplementing the same code. The object knows whether it’s valid or not.

All your arguments against this are meaningless. I could easily have a ZipCode that determines which type of zip code validation applies (for legacy integration) and uses the correct one. Giving one object this responsibility in no way guarantees failure, far from it.
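sean’s try/catch snippet assumes a ZipCode whose constructor throws a ValidationException. A minimal sketch, assuming a hypothetical US-style rule of exactly five digits (to match "90210") and a checked exception, so the compiler forces callers to handle the invalid case:

```java
import java.util.regex.Pattern;

/** Hypothetical checked exception matching the name used in the comment. */
class ValidationException extends Exception {
    ValidationException(String message) {
        super(message);
    }
}

/** A zip code that can only exist in a valid state. */
final class ZipCode {
    // Hypothetical rule: exactly five digits, like "90210".
    private static final Pattern FIVE_DIGITS = Pattern.compile("\\d{5}");

    private final String value;

    ZipCode(String value) throws ValidationException {
        if (value == null || !FIVE_DIGITS.matcher(value).matches()) {
            throw new ValidationException("invalid zip code: " + value);
        }
        this.value = value;
    }

    String asString() {
        return value;
    }
}
```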

Validation is definitely necessary, and any failed validation must be correctly reported to the user at the time of data entry or import, or to the external application in case of integration data transfer.

However, it doesn’t make sense to validate simple values every time they are assigned to a variable; that’s a waste of CPU power. Imagine that you want to import a million order IDs into an array, sort them and output them again. That’s simple: find some code that can read strings, sort them and output them again.

What Stephan basically says is that this method is not so good, because it doesn’t validate the order IDs. OK, let’s say that we have 3 different order ID systems, and we want really good validation. To solve the “sort the file of order IDs” task, Stephan’s approach would then need to:

- Find a way to get additional data about what kind of order IDs these are, so that we know how to validate them
- Implement a class for each kind of order ID validation
- Create one object for each order ID
etc.

It’s totally overkill. You may regard this as a special case, because your projects don’t do many of these things, but there are huge numbers of applications out there that do these kinds of things very often. Therefore, Stephan’s recommendation may apply to his projects, but it surely doesn’t apply to many huge applications deployed in public administration and big corporations.

Rereading this discussion, I’d say there are several subissues:

1) Type check. It’s good to be sure that the order ID values are always order ID values, and stored the same way. I can agree on that one.

2) Debugging. By using a class for order IDs everywhere, you can build debug code into the class generation, which catches specific bugs.

3) Simple validation: Checking the value’s correctness, only by looking at the value itself. For instance, age should be between 0 and 200. If outside, it’s a bug. Doing this check everywhere will definitely harm performance, possibly in ways that makes the app not comply with user expectations.

4) Full validation: Checking that the value is a real value. We usually do full validation, checking that the value is actually in the database etc. A full validation can take 0.1 seconds. Validating 1000 values can take 100 seconds. It does not make sense to build a validation like that into the constructor of a class that handles a simple string value.

From the way Stephan writes his comments, it seems that he is actually recommending full validation with database lookups, national name service lookups etc. every time you create a new piece of data in your application. Since this is totally unrealistic, I guess that this is either philosophy, not well thought through, or misunderstood by me.

Even simple validation is something we can definitely disagree on, simply because I don’t see the point of introducing new loops inside other loops that are already slow and consume 100% CPU time.
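Lars’s distinction between simple validation (point 3) and full validation (point 4) can be expressed with two separate types: a cheap check in the constructor, and an expensive lookup behind an interface that is called only where data enters the system. Age, OrderLookup and InMemoryOrderLookup are hypothetical names; the 0 to 200 range is the one from the comment, and the in-memory set merely stands in for a database query.

```java
import java.util.Set;

/** Simple validation (point 3): a cheap range check on the value alone. */
final class Age {
    private final int years;

    Age(int years) {
        // Range from the comment: a value outside 0..200 is treated as a bug.
        if (years < 0 || years > 200) {
            throw new IllegalArgumentException("age out of range: " + years);
        }
        this.years = years;
    }

    int years() {
        return years;
    }
}

/** Full validation (point 4): an expensive lookup, called explicitly at the boundary. */
interface OrderLookup {
    boolean exists(String orderId);
}

/** Stand-in for a database-backed lookup; a real one would query the order table. */
final class InMemoryOrderLookup implements OrderLookup {
    private final Set<String> known;

    InMemoryOrderLookup(Set<String> known) {
        this.known = Set.copyOf(known);
    }

    @Override
    public boolean exists(String orderId) {
        return known.contains(orderId);
    }
}
```

Keeping the lookup out of the constructor means sorting a million ids never triggers it, while an import routine can still call it once per record.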

lumpynose

Regarding validation with Stephan’s method: instead of putting validation in the constructor, could you have a separate method that you call when necessary to do the validation? E.g.

public class Zipcode {

public boolean validate() { … }
}

And could you have different constructors, one that does validation and one that doesn’t?

And different validators; for example, validateOldStyle() and validateNewStyle().

Stephan, how do you do the getters when you need to use the primitives? I’m thinking that you’d use the ‘as’ style, for example, firstName.asString().

I would also like to see a more fleshed out example. I’m also curious to see how it would work with Hibernate, and Spring’s MVC with its forms.
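One way to answer these questions, as a sketch under assumptions: Java cannot have two constructors that both take a single String, so named static factories stand in for “one that validates and one that doesn’t”. The old-style (4 digits) and new-style (5 digits) formats are hypothetical.

```java
import java.util.Objects;
import java.util.regex.Pattern;

final class Zipcode {
    // Hypothetical formats for illustration only.
    private static final Pattern OLD_STYLE = Pattern.compile("\\d{4}");
    private static final Pattern NEW_STYLE = Pattern.compile("\\d{5}");

    private final String value;

    private Zipcode(String value) {
        this.value = Objects.requireNonNull(value, "zip code must not be null");
    }

    /** Validating factory: rejects anything that is neither old nor new style. */
    static Zipcode of(String value) {
        Zipcode zip = new Zipcode(value);
        if (!zip.validate()) {
            throw new IllegalArgumentException("invalid zip code: " + value);
        }
        return zip;
    }

    /** Non-validating factory, e.g. for trusted imports; call validate() later if needed. */
    static Zipcode unchecked(String value) {
        return new Zipcode(value);
    }

    boolean validateOldStyle() {
        return OLD_STYLE.matcher(value).matches();
    }

    boolean validateNewStyle() {
        return NEW_STYLE.matcher(value).matches();
    }

    boolean validate() {
        return validateOldStyle() || validateNewStyle();
    }

    /** The "as" style accessor for code that needs the raw String. */
    String asString() {
        return value;
    }
}
```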

This is how we validate zip codes:

When entering data: Depending on the country chosen for the person, and depending on the system policies for the database, we look up the zip code in a table. For some countries, the zip code must be present in the table, and for countries that we don’t know details about, we accept anything that complies with the max length of the database field.

When importing data about a person from other places, we cannot reject zip codes that we consider wrong, so we only check that their length is ok.

In other words, our standard validator for zip codes could be described as:

bool ValidateZipCode (zipcode, country, systempolicy, nationalzipcodelists, maxfieldlength);

I wouldn’t have a clue how to call this validator from the constructor of a zip code object. I’m also not sure this actually belongs in a zip code object, because that would introduce a dependency from the zip code class to the country class, which might be unwanted.
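Lars’s validator signature could be sketched in Java as follows, with the context passed in at the call site instead of living in a zip code constructor. Reading “system policy” as a two-value enum (data entry vs. import) and identifying countries by a String code are assumptions; ZipPolicy and ZipCodeValidator are hypothetical names.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical policy values; the comment only speaks of "system policies". */
enum ZipPolicy { ENTRY, IMPORT }

/** Context-dependent validator kept outside any zip code class, so it has no Country dependency. */
final class ZipCodeValidator {
    private final Map<String, Set<String>> nationalZipLists; // country code -> known zip codes
    private final int maxFieldLength;

    ZipCodeValidator(Map<String, Set<String>> nationalZipLists, int maxFieldLength) {
        this.nationalZipLists = Map.copyOf(nationalZipLists);
        this.maxFieldLength = maxFieldLength;
    }

    boolean validate(String zipCode, String countryCode, ZipPolicy policy) {
        if (zipCode == null || zipCode.length() > maxFieldLength) {
            return false; // the database field length is always enforced
        }
        if (policy == ZipPolicy.IMPORT) {
            return true; // imported data: length check only
        }
        Set<String> known = nationalZipLists.get(countryCode);
        // Countries without a list accept anything that fits the field.
        return known == null || known.contains(zipCode);
    }
}
```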

Oliver

Hi Stephan,

I am so happy to know that I am not alone in this ‘fully domain development’ way of thinking you are describing in your blog entry.

Maybe the people got hung up on the FirstName and Name examples, which, at first, do not sound like good candidates for wrapping Strings into Domain objects.

But we actually had a problem with last names on the project I am currently on. The business wanted last names to be at least 2 characters long, and of course the code was littered with length checks, from the UI level to the database.

A couple of months later, they dropped that requirement, and it took us ages to find all the checks throughout the code and remove them.

Yes, you can argue that we could have created a LastNameValidator, but the situation was not that simple, because of the way the presentation, business and integration tiers handled last names for their purposes.

Also, if you are creating a LastNameValidator, what better place to put it than the LastName domain object, since that’s the place that should know what a last name ‘looks’ like? And once you need to soundex/reverse soundex the names, where does that go?

It’s also no fun to deal with international phone numbers as strings formatted in who knows what way (with/without dashes, braces…). Validation and formatting code all over the place.

Then again, project after project, I’ve seen this emphasis on using Strings everywhere, even in the places where Java already has nice domain representations.

Those developers have no objection to using Strings for email addresses and URLs, and then fix an endless stream of defects logged against their validation and parsing, even though a one-line use of java.net.URL or javax.mail.internet.InternetAddress would have prevented the problems in the first place.

stephan

@Oliver: Heartfelt thanks :-)

@Oliver: If you have the same validation code in many places, something is definitely wrong… and I’m shocked to hear about a solution where you check the minimum length of a name in multiple places.

However, what Stephan is advocating is not to reduce the number of places in your source code where you write validation code, but to:

1) Increase the number of places that the validation code is executed, even to places where the check would reduce the app performance maybe 100 times or more.

2) Co-locate the validation code with a class that is used to contain the data, adding dependencies that may be unwanted

You can achieve much better validation with better performance and better maintainability in other ways.

Oliver

@Lars D:

I welcome you to an average enterprise project, where code is not being written as the books and best practices would have it. 8-)

Jokes aside, I think too much emphasis in this thread has been placed on validation (though it is a large part of business applications), while the original blog post tries to make a point about domain modeling.

The original blog post argued that domain modeling needs to occur at the lowest possible level. At some point, every project starts introducing a domain model.

That concept ‘shields’ developers (in a way) from thinking about low-level implementation, and lets them focus on higher-level constructs and interactions within the system.

For illustration purposes, somewhere in the code, one WILL eventually have classes named Order, Person, Customer, Report (depending on what the application does, of course).

When does one start introducing the domain objects? At what level?

Stephan’s post argues that domain starts at the very bottom, at the lowest possible level. The rest should then build on top of it.

He chose to illustrate his point with the simpler examples of Names, and some comments jumped on that. My comment was that I agree with him, even for those simpler examples, and wanted to support his view with a real life example.

I would say, if there are business rules around the thing you are modeling, it becomes a domain class.

- Date of birth for a person in your application needs to be within certain date range? You just got yourself a DateOfBirth class.

- Password is between 6 and 8 characters, and has to have 2 digits and a special character? Guess what, the model just got a Password class.
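The Password rules above translate fairly directly into a class. This sketch reads “2 digits” as at least two digits and treats any character that is neither a letter nor a digit as “special”; both readings are assumptions.

```java
/** Password whose business rules (as stated in the comment) live in one place. */
final class Password {
    private final String value;

    Password(String value) {
        if (value == null || value.length() < 6 || value.length() > 8) {
            throw new IllegalArgumentException("password must be 6 to 8 characters");
        }
        long digits = value.chars().filter(Character::isDigit).count();
        long specials = value.chars().filter(c -> !Character.isLetterOrDigit(c)).count();
        if (digits < 2) {
            throw new IllegalArgumentException("password needs at least 2 digits");
        }
        if (specials < 1) {
            throw new IllegalArgumentException("password needs a special character");
        }
        this.value = value;
    }

    int length() {
        return value.length();
    }

    @Override
    public String toString() {
        return "Password[****]"; // never leak the value into logs
    }
}
```

If the rules change, this constructor is the one place to edit, which is exactly the problem Oliver describes with the last-name length check.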

Domain classes allow for better immutability, reuse, and yes, they should encapsulate their values, as well as their behaviour. Validation? In most cases, it should probably be a part of the domain class, as it is the place that has the most knowledge on what is being validated.

Is it an overkill to model even the Names in such a way? Maybe in simpler cases. However on larger projects, where the data may come from several databases, mainframes, web services, you as a developer, really need to rely on something telling you what that String/int/long really IS. Domain model lets you do that.

Now, the biggest problem actually is not writing this code properly, but to get the business stakeholders actually agree on the business domain. Is contact address the same as principal residence address? Does it have the same fields and which values should be in those fields? What is a ‘valid’ mailing address? What ‘is’ a Person?

I’ve never worked on a project that had domain constructs at such a low level, but after seeing the issues we are encountering, and having recently read Domain Driven Design and blog entries like this, I really want to try to use such concepts on my next projects.

There is a push back from nearly all developers I’ve worked with, and that’s why I welcomed reading this blog entry. Most developers are arguing against such modeling, but they will happily use java.util.Date (for now, let’s put aside the suckage of that class), instead of a String, or long, or three Strings/ints (day, month, year), or 6 Strings/ints (day, month, year, hour, minute, second).

When you look at it, Date IS a domain model, rising above the primitives to model a real-world concept (with some internal validation and ‘business’ rules).

Finally, to comment on your points:
1) [increasing the number of places that the validation code is executed, impacting application performance]

I am not sure if the domain modeling increases the number of places validation code is executed. As you are accepting the data in your application, you have to validate the data anyway. On some projects, just because you are reading data from the database, does not mean it’s good (valid). Some other application sharing the database might have put the wrong data in.

As for the performance impact, who knows. Profile the code and see where the problem lies. Then fix it. Anything else is just an assumption.

2) [co-locating the validation code with a class that is used to contain the data, adding dependencies that may be unwanted]

I don’t know what dependencies you have in mind, but you will have them in your code anyway. Just because you use a String for a password in your Account class, and have a separate class named PasswordValidator in a separate package, does not mean they don’t depend on one another.

Actually, I am arguing that having such code close to one another actually makes the dependency visible, tangible, easily understandable, exposing business rules and making developers aware of it, stopping them from writing yet another password validation somewhere else.

Huh. Time to go home.

sean

Oliver, great points.

People just don’t seem to get that if their objects need logic, it should be encapsulated in that class.

E.g. instead of:

Collections.sort(users, new Comparator<User>() { /* etc. etc. */ });

it should be:

Collections.sort(users, User.COMPARE_FULL_NAMES);

There are many, many reasons for this, but the primary is that all logic for the object should be as close to the object as possible, at worst in another class in the same package.

I have been stuck with codebases of my own design that used the former. What happens when the customer wants to change the sort order? A code search. We may as well use Ruby in that case.
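The User.COMPARE_FULL_NAMES constant in sean’s second snippet might be defined like this. The User class is hypothetical, and “full name” order is assumed to mean last name first, then first name.

```java
import java.util.Comparator;

/** Hypothetical User whose sort order is defined next to its data. */
final class User {
    // Assumed meaning of "full name" order: last name, then first name.
    static final Comparator<User> COMPARE_FULL_NAMES =
            Comparator.comparing(User::lastName).thenComparing(User::firstName);

    private final String firstName;
    private final String lastName;

    User(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    String firstName() {
        return firstName;
    }

    String lastName() {
        return lastName;
    }
}
```

Changing the sort order then means editing one constant rather than searching for anonymous comparators.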

stephan

“However, what Stephan is advocating, is not to reduce the number of places in your source code, where you write validation code, [...]”

Interesting what you think that I’m saying.

I do think it’s good to have validation on object creation, though this topic is miles away from the point of the post, which was to use classes instead of Strings.

If performance is an issue because of validation, have a Valid interface with an isValid() method. If you have different contexts of validation, change the validation code. If that’s not possible, have a ValidatorService. As the title of the post (which no one seems to read) suggests, do the right thing, don’t follow a dogma.
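A minimal sketch of the Valid interface idea: construction stays cheap, and callers decide when to pay for the check. Description and its 200-character limit are hypothetical.

```java
/** Validation on demand rather than in the constructor. */
interface Valid {
    boolean isValid();
}

/** Hypothetical domain type; the 200-character limit is an assumed rule. */
final class Description implements Valid {
    private static final int MAX_LENGTH = 200;

    private final String text;

    Description(String text) {
        // Construction is cheap: no checks here, so bulk imports stay fast.
        this.text = text == null ? "" : text;
    }

    @Override
    public boolean isValid() {
        return !text.isBlank() && text.length() <= MAX_LENGTH;
    }

    String asString() {
        return text;
    }
}
```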

@Oliver: Don’t be personal, making incorrect assumptions about my field of work. FYI, I’m working on nationwide EA systems.

@Stephan: Yes, your primary message was to make type check stronger. My primary objection to this is, that you say “never, never, never”. When a system gets over a certain size, you need to create tools that handle various kinds of information the same way – so that the code really doesn’t know if the string is a name, a zip code or a product ID.

You mention Qi4j in passing, without much context. And Qi4j is not OOP, but what we call Composite Oriented Programming, and this kind of discussion is mostly meaningless there and at the wrong level of focus.

Classes are Dead, Long live Interfaces.

To give this audience a teaser;

public interface Person extends HasName, HasAddress, HasSpouse
{}

and no more coding is required. No magic, pure Java, clever runtime….

stephan

@Niclas: Thanks for the clarification and the teaser. I’ve mentioned Qi4J before, commented on Rickard’s blog and had some email exchanges with Rickard. So the “in the passing” is purely random.

I’ve mentioned Qi4J in other posts like

http://stephan.reposita.org/archives/2008/01/09/qi4j-the-next-java-forget-scala/

Very instructive article and discussion.

Stephan, could you blog at some suitable time your ideas on rule 8 in Object Calisthenics: “Use First Class Collections”.

I wonder whether it is really worth the extra effort, and what the concrete benefits are.

stephan

I will.

Buddy Casino

I’m not sure it is really worth the effort in Java. Maybe the following language constructs would help:

- Named method arguments, so you could write:
new Customer(firstName="Stephan", name="Schmidt");
This would be great for DI, also.

- Having special constructs for such value objects, like Scala case classes.
Note that accessors, equals, hashCode and toString are generated automatically by the compiler:
case class Name(name: String)

What you propose adds way too much bloat, and Java already has enough of that. Oh, and by the way: you can actually perform validation in setters, you don’t need a separate class for that…
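Java later gained something close to the Scala case class mentioned above: records, finalized in Java 16, well after this discussion. A record gets its constructor, accessor, equals, hashCode and toString generated, and validation can go in a compact constructor. A minimal sketch with a hypothetical non-blank rule:

```java
/** Java 16+ record: accessor, equals, hashCode and toString are generated by the compiler. */
record Name(String value) {
    // Compact constructor: the check still runs on every construction.
    Name {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("name must not be blank");
        }
    }
}
```

Like a case class, a record is immutable, so there are no setters to validate in.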

stephan

@Buddy: Yes, could be. I’ve found named parameters nice in every language where I have used them.

I tend to disagree: the “bloat” is rather small but the benefits are great; 5 years into development it makes development much faster, more productive and less error-prone.

Hi stephan,
but if you create a dedicated class for every attribute, don’t you get an explosion of classes?
Don’t you then lose all the clarity gained (from classes closer to the domain) in far too many classes?
If I create a class Name that is just a simple wrapper around String

public class Name{

}

stephan

@alepuzio: Usually you do not need this for every attribute; Name, Description, Order, Id, Range, Duration and more are reusable across several classes of your domain model.

That aside, yes there will be more classes.

I don’t think it’s too much, I think it helps read code:

(String, String, String) as a signature

versus

(Name, Firstname, Description)

I find the second one more readable and clearer.

There are (as usual) valid cases for both approaches. To me this is a “Silver Bullet” discussion, and if you have not learned this yet, say it after me: there are NO silver bullets.

Having one class per String usage (which is the extreme case) sounds like ‘class explosion’ to me, not doable in large domains. But selective use of ‘domain types’ makes sense to improve readability.

stephan

@Niclas: Sorry to disagree, this is not a silver bullet discussion. What led you to believe this is one? I don’t claim 10x productivity, I don’t claim this will solve all your problems.

“The metaphor of the silver bullet applies to any straightforward solution perceived to have extreme effectiveness.” -Wikipedia

I consider it more of a “Best practice” post.

“Having one class per String usage (which is the extreme case) sounds like ‘class explosion’ to me, not doable in large domains.”

You will notice that there are not as many String types as you think there are. Most of the time the Strings are of the same “type”: Name, Description, Title and so on. That more or less proves my point: developers cannot or do not categorize Strings into types, but perceive all of them as different.

“not that many String-types as you think there are. ”
I doubt that you know all domains, and I just looked at the numbers in my current project. Anyway, as I said, for the string-based domain concepts it indeed makes sense. As you said, there are not too many of them. And they clearly communicate intent and purpose, and add clarity. I think we are in general agreement, but people always see these kinds of “Best Practices” as an “Either/Or” choice. I’m just trying to say, “Balance”.

stephan

“I doubt that you know all domains, [...]”

No, I don’t, you’re right.

“I just try to say, ‘Balance’”

Yes, as the title is “Never, never, never use String in Java (or at least less often :-)” which was my amateur attempt to balance it :-) I totally agree with you, most often it’s not an “Either/Or” choice.

Ralf Schneider

That’s not enough!
Be aware all your classes like Name still contain things like strings, integers, … !!!!
Could you add another layer of indirection, please?

I’m happy people like you exist!

stephan

@Ralf: Nice try, but I’ve read too much Hofstadter to fall for this one.

I came to the same conclusion. Although I believe that following the domain closely is a good reason, my main motivation is to avoid errors leveraging the Java type system.
Slightly modifying your example:
bookTicket(String arg0, String arg1, int arg3)
versus
bookTicket(Name arg0, Film arg2, Count arg3)

In the first case, nothing prevents me from booking a ticket to see the “Gabriel” movie in the name of “Apocalypse Now”… it is very easy to mix up the parameters. To catch that, I have to actually write a unit test for something the compiler gives me for free!
Even if wrapping the parameters takes the same amount of code as the test, I don’t need to run anything to get the result.

I believe that for non trivial projects, having the (type) contract enforced by the compiler is a good thing :)
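A minimal sketch of Gabriel’s point, assuming Java 16+ records. Name, Film and Count are hypothetical types mirroring the example above, and Booking.bookTicket is illustrative, not a real API. Because Name and Film are distinct types, calling bookTicket with them in the wrong order is rejected by the compiler instead of by a unit test.

```java
// Hypothetical wrapper types; both Name and Film hold a String,
// but the compiler treats them as unrelated types.
record Name(String value) {}
record Film(String value) {}

// Count also carries a rule that a bare int cannot express
record Count(int value) {
    Count {
        if (value < 1) {
            throw new IllegalArgumentException("count must be at least 1");
        }
    }
}

class Booking {
    // bookTicket(film, name, count) does not compile: the parameter types differ
    static String bookTicket(Name name, Film film, Count count) {
        return count.value() + " ticket(s) for " + film.value() + " booked for " + name.value();
    }
}
```

The types only catch mix-ups once a value has been given a type; they cannot stop someone from wrapping the wrong string in the first place.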

@Gabriel: You mean like:

bookTicket(new Name("Apocalypse Now"), new Film("Gabriel"), new Count(1));

;-)

The Strings are coming from “somewhere” and they can be messed up along the way, no matter what you come up with…

stephan

@Niclas: Verification has been discussed somewhere in this thread I think.

@Gabriel: Yes, I also think it makes code less error prone.

Dan

Wow, this post generated a lot of controversy, which I missed. I think the most salient points are from Sean (in that case just use Ruby) and Gabriel (compiler benefit).

We are Java developers and we are using a statically typed language. By defining types at compile time we are giving the compiler the information it needs to make certain guarantees. We can not use booleans where Strings are expected. We can not use Strings where doubles are expected. When a method accepts an integer parameter, we know it is receiving one; the compiler makes and keeps that promise.

The question is, how much information do we want to give the compiler to gain this benefit? A lot of people here seem to be arguing that we should only create classes for compound types, and for value types we should just use whatever classes are available, because (I guess) creating classes is too much work.

I haven’t read Domain-Driven Design (though I may, now), but this is something I’ve been thinking about lately. To use an example from my project, a task ID is not a project ID, even though they’re both stored in the database as number(10,0) and represented in Java by an int. I think there is value in creating a ProjectID class and a TaskID class so that the compiler can guarantee, not just that the ID I have received is an integer, but that it is an integer representing the ID of a task, and not the ID of a project.
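Dan’s TaskID/ProjectID idea might look like this in Java 16+ (a sketch; TaskId, ProjectId and TaskRepository are hypothetical names). Both types wrap the same int, yet the compiler treats them as unrelated, and even at runtime a TaskId never equals a ProjectId with the same number, because a record’s generated equals also requires the same record class.

```java
// Both wrap an int (number(10,0) in the database), but they are distinct types
record TaskId(int value) {}
record ProjectId(int value) {}

class TaskRepository {
    // Only a TaskId is accepted here; passing a ProjectId is a compile error
    static String describe(TaskId id) {
        return "task #" + id.value();
    }
}
```

The knowledge that a task ID is an int now lives in one declaration instead of in every method signature that handles task IDs.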

It is true that the IDs are populated somewhere using an integer, but it still greatly reduces the number of places in the code where the knowledge that a ProjectID is implemented by an integer is embedded. Lots of people have been saying “YAGNI”. I don’t see this as a question of building something you ain’t gonna need. That principle, as I understand it, is to avoid building something because you think it might add value later. Creating these ‘domain primitive’ classes adds value today, the moment they are written, in the form of those compiler guarantees. Also in the form of IDE assistance etc., which is definitely nice to have but secondary. Less repetition, easier refactoring, and knowing that you have a ProjectID are also real benefits.

As far as saying that the code is =less= readable when using domain primitives … I find that baffling. How many times do I have to look up TaskID to know it is ‘really’ an integer? In fact, I never have to look that up. I know it is an integer. Everyone working on this system knows it is an integer. And if we have a new developer, he will have to look at the TaskID class once to see it is an integer.

Every time a developer looks at a TaskID, he will know it is an integer. But the converse is not true. I can be looking at an integer. Is it a task ID? I don’t know. Even if the variable is called taskID, I don’t know. You bring up method declarations with many variables of the same type. It is trivially easy to mix up arguments of the same type.

The cost of this approach … well, I will have to write a TaskID class. I expect that to take one minute. Developers will have to look at the TaskID class to comprehend that it is represented by an integer, if they are working on a system boundary where they care how it is implemented. I expect that to take ten seconds. If it saves two minutes of debugging, it’s worth it.

I liken it to using generics. When Java 5 was released, I added generics to parts of our code base and found many bugs that had been lurking there.

Thanks for this post, Stephan. You’ve really given me something to think about. I’m going to try this, so that I can more fully exploit the strict type system that is after all one of the reasons I am using Java.

stephan

@Dan: “I think there is value in creating a ProjectID class and a TaskID class so that the compiler can guarantee, not just that the ID I have received is an integer, but that it is an integer representing the ID of a task, and not the ID of a project.”

I’ve tried this too lately, and I really like it. It makes code more readable than using int/long/String for IDs. As you’ve said, ProjectId is something different than TaskId. And with a growing code base, problems will arise if every Id is String/Int/Long (mostly because developers, if you take enough of them, won’t name every ProjectId the same; some call it ProjectId, some PID, etc., all being Longs. They can’t screw up with a ProjectId type.)

Thanks for your long comment, thanks for the insights.

Nathan

Just wanted to say that less than an hour after I read this I ran into a problem on my project: “what is customerId, and how do I find it?” The answer wasn’t pretty. The code wasn’t reusable. I will definitely be using these ideas on projects I have more control over.

stephan

@Nathan: Thanks for your comment. Yes, I’ve had that problem several times in the past, too.

lumpynose

@Dan: “How many times do I have to look up TaskID to know it is ‘really’ an integer? In fact, I never have to look that up. I know it is an integer. Everyone working on this system knows it is an integer. And if we have a new developer, he will have to look at the TaskID class once to see it is an integer.”

I definitely agree with your overall ideas, but for this particular part I tend to think that you’re not likely to really or often need to know what’s inside of a ProjectId or TaskId class; it could be a single primitive, or a salmagundi of things. And that’s good.

Just want to announce that Qi4j now has its own equivalent of what this blog entry recommends.

public interface SocialSecurityNumber extends Property
{}

which can then have constraint rules added to it centrally:

@Constraints( SocialSecurityNumberConstraint.class )
public interface SocialSecurityNumber extends Property
{}

I am pretty convinced that this technique will make apps a lot more robust in the long run.

Aaron Faanes

I know this post is rather old, but I figured I’d toss my two cents in:

For one, I completely agree with the opening post. I find the mocking replies to this post frankly offensive, on a few levels. The author does not suggest to meaninglessly wrap primitives simply to do so to make life harder. His suggestion is to not use ambiguous types where specific ones are more appropriate. String, double, int, and so forth, have no real context, no rules governing their use or mutability. (Beyond those introduced in the declaration through modifiers and visibility)

Primitives imply no meaning and leave it up to the developer to determine what is in fact valid and what is not. I mean, some replies said that doing this pattern “simply [so] your IDE will do a better job of helping you understand the code.” As though that is some travesty!

To say that the indirection is meaningless seems to assume that all values that a String can hold are also valid for someone’s name. For example, Strings can be null, " ", "", "$@##%%@%#&^$%" or "a". Are these all suitable names? They must be, since using String directly for the name is apparently the trendy thing to do.

After listing those, to use TDD as an example that String is in fact the best data-type is simply absurd. The tests you make must have dreadfully poor coverage.

Of course, the easy solution to my scenario is to test for these cases in the setter and throw IllegalArgumentExceptions whenever these situations occur. Indeed, one can jury-rig all kinds of conditions into this BookTicket class to ensure validity. A previous reply by Xavier explains why this is a bad idea (centralized validation, throwing exceptions at the logical place where the error occurred, and so forth).

After all, it’s highly unlikely that this is the only place in the project where you would have a name; most likely you’d use it in many places, and each would be vulnerable to the same invalid test cases described above. An ‘agile’, YAGNI-inspired solution of ‘using strings because its ez’ would waste time and force the developer and tester to ensure, everywhere a name is used, that it’s correct.
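To make the centralized-validation argument concrete, here is a sketch of a name type that rejects the problem values listed above. The rule used (a name starts with a letter in any script and may then contain letters, spaces, hyphens and apostrophes) is an assumption for illustration only; real rules for names are domain-specific. The hypothetical PersonName.of factory is the single place where the check happens, so every class that holds a PersonName can rely on it. Assumes Java 16+ (pattern-matching instanceof).

```java
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical rule: first character is a letter (\p{L} matches any script, e.g. Chinese),
// followed by letters, spaces, hyphens or apostrophes. Adjust to your own domain.
final class PersonName {
    private static final Pattern VALID = Pattern.compile("\\p{L}[\\p{L} '\\-]*");

    private final String value;

    private PersonName(String value) {
        this.value = value;
    }

    // The only way to obtain a PersonName, so validation cannot be skipped
    static PersonName of(String raw) {
        Objects.requireNonNull(raw, "name must not be null");
        String trimmed = raw.strip();
        if (!VALID.matcher(trimmed).matches()) {
            throw new IllegalArgumentException("not a valid name: \"" + raw + "\"");
        }
        return new PersonName(trimmed);
    }

    String value() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof PersonName other && value.equals(other.value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}
```

With this rule, null, " ", "" and "$@##%%@%#&^$%" are rejected, while "a" is accepted; whether a one-letter name is valid is exactly the kind of decision that now has one home instead of many.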

Beyond these practical concerns, a more pressing one arises. The BookTicket class’s responsibility lies in accurately representing the state of a BookTicket. It should not have to worry about making sure someone’s name consists only of letters. It should not have to worry about whether the orderID is already in use. It’s a simple class, and should be kept simple.

The idea of responsibility-driven design is an old one, and I think one that people often forget. It has become second nature to just pile features and functionality onto one class, using YAGNI as a convenient cover for what is often downright laziness.

I doubt I am alone in finding a few, small classes _easier_ to understand, than one monolith, whose implementation must span the gamut between name validation, orderID validation, and BookTicket representation. This defiles the true intent of the class, and makes maintenance, extensibility, testing, and understanding the code a nightmare.

An analogy to this concept that comes to mind is HTML tables. I remember reading a post about how they’re flagrantly abused to layout pages, when in fact they’re simply intended to (surprise!) represent tabular data. Now, getting a consistent layout can be difficult, so I can understand being lax on that point, but there is *no* excuse for being amateur when it comes to Java.

A String should be used to represent arbitrary text. An integer should be used to represent an arbitrary integer value. If these are sufficient (e.g. can be null, random symbols, Chinese characters, etc. in the case of String), then by all means, but to assume that just because a name consists of letters means that it should be a String is madness.

alberto gori

I wonder how an experienced programmer could end up with such a proposal.

Have you ever worked on big projects, or only on toy programs with ten classes?

For example, if I follow your approach, mapping hundreds of Hibernate entities, I would end up with thousands of new classes.

It’s simply impossible to code, read and maintain.

stephan

@Alberto: Could be that I’ve only worked on “toy programs”, I don’t know what you consider “big”. The largest apps I’ve worked with were >10k classes and 1.5 MLOC.

And I’m not sure either what you consider “experienced” – because you don’t give a definition. I’ve written code for around 30 years in around 20 programming languages.

For more information, see the link labeled “About me” at the top.

@Alberto: You may think so, probably deriving it from the observation that you have approximately 10 strings in every Hibernate class, hence 10 times as many classes. In reality, you will find that the bigger the program, the greater the benefit, i.e. each type of string shows up in many, many places, and with an explicit type you get additional ‘communication of intent’.
I happen to be of the opinion that there are a few exceptions. System.out.println( String message ) is a classic example, but as soon as the string has a format, range, or implicit meaning, i.e. most of the time, then Stephan’s suggestion is very, very good for long-term maintenance.

Fabian

Hi Stephan,

I’ve read your post and I found it _really_ interesting in terms of OO programming. That’s the kind of thing I love to learn and understand. So thanks for this :-)

Still, I wonder what the consequences are for O/R mapping. I haven’t had the time to read all the comments; I just saw some people asking about mapping with Hibernate and didn’t find any answer on that topic. If you define Name and FirstName classes that can be used by several other classes, handling the O/R mapping will be a nightmare, won’t it? (I guess you won’t create a specific table for Name and another one for FirstName, as you would have too many joins.) Do you then have to handle the persistence layer yourself without an O/R mapping framework? (Or do you just use object databases ;-))

Cheers!
Fab

ulf

If only String weren’t final, this could be so easy. And we could use those “semantified primitives” directly with existing APIs (“semantified primitives” is what I call them, even if I have only really used the idea once, due to the obvious overhead).

Of course, a non-final String would be evil in many other ways, so what we’d really need is a language change that made it legal to extend final classes whenever the implementation is not changed:

class Name extends String; //empty implementation!

or, for more complex cases:

class Name extends String implements OneOfMyEmptyInterfaces; //empty implementation!

Guess this will be left to languages like Scala.
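In today’s Java, the closest approximation is to wrap the String and implement java.lang.CharSequence, which many JDK methods (String.join, StringBuilder.append, regular-expression matching) accept in place of a String. A sketch assuming Java 16+ records; it is a workaround, not the language change described above: APIs that require a String still need an explicit toString(), and a Name never equals a plain String.

```java
// String is final, so Name wraps it and implements CharSequence instead
record Name(String value) implements CharSequence {
    @Override
    public int length() {
        return value.length();
    }

    @Override
    public char charAt(int index) {
        return value.charAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return value.subSequence(start, end);
    }

    // CharSequence's contract expects toString() to return the characters themselves,
    // so the record's generated "Name[value=...]" form is overridden
    @Override
    public String toString() {
        return value;
    }
}
```

For example, String.join(", ", new Name("Ann"), new Name("Bob")) yields "Ann, Bob" without any unwrapping at the call site.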

The humorous thing isn’t the post itself, but the rush of people to heartily agree with anything that has been written in a blog. Just hilarious.

cd

When does the headache end?
Never, never, never use (unwrapped) String or long or int?
Give me a break.
The next thing you will be saying is to make objects for objects for objects for objects …

Here is an idea you can use: “KISS”.
(K)eep (I)t (S)imple (S)tupid

Over complicating a simple data holding object is a waste of time, memory, and processing.

cd

ulf,
Create your own String class.
Unzip src.zip in your JDK directory and change the package name of the String class.
Then you’re set.

LuisKarlos

We found a possibly nice solution to the thousands-of-classes problem.
In our case we were using integers as IDs for some objects (after reading your article we thought it was a very interesting point), so we decided to look for a solution… and this was it:

//THE PROBLEM
//Concrete classes
class User
{
    private int id; // plain int, a messy primitive
}

class Role
{
    private int id; // plain int, a messy primitive
}

Instead of using a simple int as ID we created this class:

//THE SOLUTION
public class ID
{
    private int id;

    public ID(int id)
    {
        this.id = id;
    }

    public void setId(int id)
    {
        this.id = id;
    }

    public int getId()
    {
        return id;
    }
}

but now we have:

class User
{
    private ID id; // strongly typed ID
}

class Role
{
    private ID id;
}

This is a really nice enforcement of types. Now we can have:

List<ID> usersID = ….
List<ID> roleID = ….

instead of

List<Integer> usersID = …..
List<Integer> roleID = ….

An even more generic solution:

//THE SOLUTION
public class ID<R>
{
    private R id;

    public ID(R id)
    {
        this.id = id;
    }

    public void setId(R id)
    {
        this.id = id;
    }

    public R getId()
    {
        return id;
    }
}
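One caveat with a single ID class: a List<ID> of user IDs and a List<ID> of role IDs have the same type, so a role ID can still be passed where a user ID is expected. A common variant (a sketch, not LuisKarlos’s code) adds a type parameter that is never stored, a so-called phantom type, so that Id<User> and Id<Role> are different types to the compiler. User and Role are hypothetical placeholder classes, and the sketch assumes Java 16+.

```java
// Hypothetical entity classes, used here only as type tags
final class User {}
final class Role {}

// T is never stored; it only tells the compiler which entity the id belongs to.
// The record is immutable and gets value-based equals/hashCode for free.
record Id<T>(int value) {
    static <E> Id<E> of(int value) {
        return new Id<>(value);
    }
}
```

With this, `List<Id<User>> userIds` cannot be assigned to `List<Id<Role>>`. Because of type erasure the distinction exists only at compile time: at runtime an Id<User> and an Id<Role> with the same value are equal. Unlike the setter-based ID above, the record is immutable, which makes it safe to use in sets or as a map key.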

LuisKarlos

mm.. it removes the generics from my code… :(

Good post. It caught my attention due to your current layout improvement, and the fact that it was one of the few posts I had not read.

But this being such an old post, I was amazed that no one mentioned the “Primitive Obsession” code smell from Martin Fowler’s book. Jeff Atwood comments on this as well: http://www.codinghorror.com/blog/2006/05/code-smells.html.

However, I consider that, as with all rules for good design, knowing them matters more than merely applying them: knowing when to apply them, why to apply them and how to apply them. But I am a TDD/aggressive refactoring/XP fan, so I do not fear old bad design decisions.

On the other hand, primitive obsession is very damaging to DDD, as it defeats the purpose of a ubiquitous language and business-readable code (at least for business rules). And not only in static languages, but in dynamic languages as well (for instance, Fowler kept the code smell in Refactoring: Ruby Edition, co-written with Jay Fields), even though the looser coupling due to the lack of type annotations puts more focus on good names than on good types (and makes code easier to refactor).

“However, I consider that, as with all rules for good design, knowing them matters more than merely applying them: knowing when to apply them, why to apply them and how to apply them.”

I agree with that, and will reread Fowler.

Antonio

Come on, use common sense: this is unnecessary and verbose. It is not convincing even from a theoretical point of view.
In programming the abstraction is never an end in itself but is functional to the problem. The definition of a class “Name” would make sense only if you have to do special processing on the names.

bye
Antonio

@Antonio: I disagree for the reasons mentioned in the post.

Emilien

Hello,

I found this article and the comments very interesting (I have read many of them) and, as a Java beginner, I wondered whether a naming convention for the “String objects”, like NameString (for the BookTicket object), would be a good way to express this way of thinking?

Emilien.

@Emilien: Encoding the (Sub-)Type in the name of classes is not a practice I favor, it leads to tighter – syntactic – coupling.

Peter

Interesting. Not only is this almost entirely your opinion on whether it is a good idea or not, it is also bad practice.

I wonder why the Sun code base doesn’t do this … oh yeah, bloated overhead is also a bad practice.

In truth, proper design and documentation make clear what a method does. I mean really, do you have a sticker on your car keys that says “this is used to start my car”?

If you need this much direction on anything, then maybe we need a batch file that executes java that says “this is how we compile code.bat”.

In the end, things MUST be of a primitive nature, either in code or in persistence (no classes in databases). If you ever took the time to unwrap the Integer class, you’d see that at its base it is an int .. hmmm, maybe you should check out the Java source code.

Oh well, good luck with this strange way of trying to be efficient.

Peter

Lukasz

The author discourages the use of Strings and primitive types for method arguments in favor of descriptive classes because his IDE doesn’t show the names of variables during code completion.

An effective javadoc in an effective IDE can solve the author’s problem without unnecessarily polluting the design.

@Lukasz: Polluting the design? I suggest watching

http://www.infoq.com/presentations/Value-Objects-Dan-Bergh-Johnsson

Stephan,

It really amazes me how often people in our business like to condescend and patronize using straw man arguments and ad hominem attacks while responding to a post with which they disagree. You would think that folks seemingly used to the clean logic of code would be able to have a debate based upon the logic and merits of the argument itself, and not attack the author.

I laud you for your thoughtful ideas and measured responses, when it would be easy to stoop to such a level. I also noticed that you were often the only person in this long long list of responses that would admit shortcomings in your argument and correct yourself in the process – when your opposition would often just keep trying to hammer their point. I am very impressed.

This pattern you present is definitely worth consideration – and I agree – NEVER use double for any kind of money representation. At the very least, use BigDecimal. ;)
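Combining the two ideas, a money type can hide the BigDecimal behind a domain name. A minimal sketch, assuming Java 16+, a single currency and a fixed scale of two decimal places with banker’s rounding (all assumptions; production code would usually also carry a java.util.Currency). For comparison, 0.1 + 0.2 evaluates to 0.30000000000000004 as a double, while the BigDecimal sum below is exactly 0.30.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Objects;

// Hypothetical Money value type: single currency, always two decimal places
record Money(BigDecimal amount) {
    Money {
        Objects.requireNonNull(amount, "amount must not be null");
        // Normalise the scale so that 0.3 and 0.30 are equal under BigDecimal.equals
        amount = amount.setScale(2, RoundingMode.HALF_EVEN);
    }

    // Parsing from a String avoids the binary rounding error that new BigDecimal(0.1) would carry
    static Money of(String amount) {
        return new Money(new BigDecimal(amount));
    }

    Money plus(Money other) {
        return new Money(amount.add(other.amount));
    }
}
```

Normalising the scale in the constructor matters because BigDecimal.equals treats 0.3 and 0.30 as different; with a fixed scale, the record’s generated equals behaves as a money comparison should.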

I really like it for the IDs of domain objects. The interface for almost any ID can be the same for all of them (Identifier), with OrderIdentifier(Impl) having a different implementation than ProductIdentifier(Impl) and SomeOtherIdentifier(Impl). That seems a beautiful and elegant way to keep those implementations hidden; who really cares how to construct the ID, except the data access layer? Everyone else in the stack can just pass that Identifier around, use accessor methods, and call toString for the UI if needed.
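The Identifier idea, sketched in Java 16+. The interface name and the Order/Product split come from the comment; the toDisplayString method, the "ORD-" prefix, the SKU field and the Labels helper are hypothetical details chosen for illustration. Code outside the data access layer depends only on Identifier, while each implementation keeps its own representation.

```java
// Common interface: the rest of the stack passes Identifier around
interface Identifier {
    // Text form for the UI; each implementation decides the format
    String toDisplayString();
}

// Hypothetical: order ids are numeric and shown with a prefix
record OrderIdentifier(long number) implements Identifier {
    @Override
    public String toDisplayString() {
        return "ORD-" + number;
    }
}

// Hypothetical: product ids are SKU strings
record ProductIdentifier(String sku) implements Identifier {
    @Override
    public String toDisplayString() {
        return sku;
    }
}

class Labels {
    // Works for any kind of id without knowing how it is built
    static String label(Identifier id) {
        return "#" + id.toDisplayString();
    }
}
```

A method that needs an order specifically can still ask for OrderIdentifier, so the shared interface does not give up the compile-time distinction between kinds of IDs.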

Nifty! Thanks.

@Mark: Thanks. Most of the time it isn’t easy – and I’ve been through my share of alt.*.advocacy Usenet flame wars.

But in recent years I’ve become more interested in “truth” than in opinion, which is hard to find in our business. The Hacknot essays had a real impact on me.

[...] wrote a good article about evil Strings over at codemonkeyism, so I won’t give another example here. But if you are tempted to add a integer parameter or a [...]

Chad

I can’t believe all the haters on this blog. Take the example and use it as appropriate. My god.

Thanks for taking the time to share this Stephan. What it first hit me as is a way to emulate named parameters in Java. Very useful and thanks again.

@Chad: Thanks, yes because as I said on another post:

“I was using named parameters for the first time in the 80s, I guess, with a programming language called E on the Amiga (or was it the 90s?). I have liked the idea ever since, but Java regrettably doesn’t have them.”

