Conf42: Enterprise Software 2021

Data-Oriented programming



Data-Oriented programming is a paradigm that aims at reducing the complexity of software systems and making the development experience more productive. Data-Oriented programming draws a clear separation between code and data and treats data as a value that is manipulated by general-purpose functions. In this talk, we illustrate the principles of Data-Oriented programming in the context of a Java production system.

After attending this talk, you will be able to apply Data-Oriented programming principles in Java, and reduce the complexity of the systems you build.


This transcript was autogenerated by OpenAI's Whisper. To make changes, submit a PR.

I'm really glad to be here at Conf42 for my talk about data oriented programming in Java.

The purpose of this talk is to give you a couple of insights that will hopefully help you liberate yourself, at least a bit, from the complexity of objects.

A couple of words about myself: I have been a developer since 2001, first in C++, then in Java, and also in JavaScript, Ruby, and Clojure.

And I'm the author of a book named Data-Oriented Programming.

And in this talk, I'm going to share a couple of insights from the book and how to apply the principles of data oriented programming in Java.

And if you find my talk interesting, you might want to purchase the book, and I'll give you a discount coupon at the end of the talk.

And you can follow me either on Twitter or on my blog at blog.



So what is data-oriented programming? Data-oriented programming is a programming paradigm aimed at reducing system complexity by treating data as a first-class citizen.

What do we mean by complexity? If you look for complexity in the dictionary or in Wikipedia, you will first find the definition of computational complexity, which is the amount of machine resources, like CPU or memory, that are required to run a program.

But there is another meaning of complexity, which is the system complexity.

And the system complexity is the amount of human brain resources required to understand a system.

So computational complexity is the time it takes to run a program.

And system complexity is the time it takes to understand a program.

And data oriented programming aims at reducing the system complexity.

In other words, when a system is written according to data-oriented programming principles, the system is easier to understand, to maintain, and to extend with new features.

So let's ask ourselves what usually makes a system complex.

In my book and in this talk, we are going to take a classic example.

Imagine you need to design and implement a library management system as a disciplined object-oriented developer.

So the first thing you do is to think about the design, the classes, the objects, and the relationships between the classes of your system.

And you might come up with a design like the one on the screen, where the main classes are the library, the library has a catalog, and the user management.

And in the catalog, we have books and authors, and books have book items.

And on the user side, we have different kinds of users.

We have librarians that can add books to the library, and we have members that can borrow books from the library, and members have book lendings, and book lendings belong to book items.

And you will probably come up with a design similar to the one that is on the screen right now.

If you are an experienced Java developer, you are probably going to use a couple of smart design patterns that might make the design a bit simpler, smarter, whatever.

But my point here is that the system here is complex in the sense that it's hard to understand.

And if you take a further look at this UML diagram, you might find that one of the sources of the complexity of the system is that we have nodes in the graph with many edges.

Look at the librarian class.

It is connected to one, two, three, four, five, six classes.

It's a big number, six, in terms of relationships between nodes.

Another thing that makes the system complex is that we have many kinds of arrows, of relationships between classes.

We have association, like, for example, between book and author.

We have composition between catalog and book.

We have inheritance between librarian and user, and also between member and user.

And we have a usage relationship between, let's say, librarian and book item.

So it's a burden on our mind.

And this is what I mean by a complex system.

It takes time and energy and effort to understand a classic object-oriented system.

So the first thing that data-oriented programming guides us to do is to separate between code and data.

Usually, in object-oriented programming and in Java, we tend to encapsulate data inside classes.

And we mix together data and behavior inside classes that provide methods that manipulate or modify the state of the object.

And look at what happens if we simply split each class of our system into two classes, where one class is responsible for the code, the behavior, and the other class is responsible for the data.

For example, we take the library class that mixes data and code together, and we split it to a library code class and a library data class.

And in the same way, we take the catalog class and we split it into catalog code and catalog data, and so forth and so forth.

And what happens in terms of system complexity is that instead of one system with many relationships between the nodes, we get two disjoint systems with much simpler relationships between the nodes or the classes of the system.

And this is really a great benefit for our mind.

It makes the system much easier to understand, to reason about, and to maintain.

And the reason is that we have separation of concerns.

We have code on one hand and data on the other hand.

And also we have constraints on the code diagram on the left.

All the methods in our classes on the left are going to be stateless, as we are going to see in a moment.

And the relationships between code classes are only usage relationships.

And the same on the data diagram, we have another set of constraints, which is that the only relationships between data classes are either association or composition.

So putting constraints on our diagram tends to make the overall system less complex, easier to understand.

So instead of the big mess or the complex system that we had on the left, where code is mixed with data, we get a simpler system made of two disjoint systems.

And this is a huge benefit for our brain.

Let's see now, practically, how we do that in Java, how we separate between code and data in Java.

Actually it's quite simple.

We put data in classes that have only members and, of course, getters and setters.

For example, AuthorData will have a first name and a last name.

That's it.

No methods beyond setters and getters.

And for the code, we have classes like AuthorCode with only static methods.

No state, no data.

The data that is to be processed by the method is passed as an explicit argument to the method.

So for example, let's say we have a data object representing Isaac Asimov and we want to calculate the full name of this author.

Instead of what you are probably used to, asimov.fullName(), we call AuthorCode.fullName, which is a static method.

And we pass to it, as an argument, the object with the data that we want to process, and it returns the string "Isaac Asimov".

So that's how we separate between code and data in Java.

We have data classes with members only and code classes with static methods only.
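The slide with the code is not part of the transcript. A minimal sketch of the separation described above, assuming the class names `AuthorData` and `AuthorCode` and standard getter and setter names, might look like this:

```java
// Data class: members plus getters and setters, no other behavior.
class AuthorData {
    private String firstName;
    private String lastName;

    AuthorData(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
    void setFirstName(String firstName) { this.firstName = firstName; }
    void setLastName(String lastName) { this.lastName = lastName; }
}

// Code class: static methods only, no state.
// The data to process is passed as an explicit argument.
class AuthorCode {
    static String fullName(AuthorData author) {
        return author.getFirstName() + " " + author.getLastName();
    }
}

class AuthorDemo {
    public static void main(String[] args) {
        AuthorData asimov = new AuthorData("Isaac", "Asimov");
        System.out.println(AuthorCode.fullName(asimov)); // prints "Isaac Asimov"
    }
}
```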

So that's the first benefit that we gain from data-oriented programming: it makes the system easier to understand.

Now we are going to move forward and see how we can make the code easier to understand.

And for that, we are going to ask ourselves what usually makes code hard to understand.

The first thing that makes code hard to understand in Java is that when we pass an object as an argument to a method, we have to ask ourselves whether the object is passed by reference or by value.

And it's difficult to answer this question clearly. The answer that we usually get in Java tutorials is that in Java, object references are passed by value, which is really confusing.

And to show you an example of this complexity, let's take again our example with Isaac Asimov as a data object and see what kind of complexity we usually have in object-oriented programming.

And let's say that we have a static method in our AuthorCode class that transforms the last name of an author into uppercase.

So here is how we call this method on asimov.

And the method returns another data object where the last name is uppercase.

So the last name of asimov number two is ASIMOV, in uppercase.

Now the question is what happens to the first asimov? Did the method mutate the data object or not? And by looking at the code, you cannot really know. It depends on the implementation of this static method, toUpperLastName.

And if the implementation mutates the object that it receives, the last name of the first asimov is going to be uppercase.

And if it's not the case, it's going to keep its original case, as it was passed.

And the reason for this confusion is that when we pass an object to a method, we pass a reference to the object.

And the method now has access to the object.

And if the method calls the setters of the object that we pass, then it's going to mutate our object.
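To make the ambiguity concrete, here is a sketch with two hypothetical implementations of the uppercase method. The variant names `toUpperLastNameMutating` and `toUpperLastNameCopying` are invented for illustration; the call sites look identical, yet only one of them leaves the caller's object untouched.

```java
import java.util.Locale;

// Mutable data class, as in classic Java code.
class AuthorData {
    private String firstName;
    private String lastName;

    AuthorData(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
    void setLastName(String lastName) { this.lastName = lastName; }
}

class AuthorCode {
    // Variant A: mutates the object it receives through its setter.
    static AuthorData toUpperLastNameMutating(AuthorData author) {
        author.setLastName(author.getLastName().toUpperCase(Locale.ROOT));
        return author;
    }

    // Variant B: leaves the argument alone and returns a new object.
    static AuthorData toUpperLastNameCopying(AuthorData author) {
        return new AuthorData(
            author.getFirstName(),
            author.getLastName().toUpperCase(Locale.ROOT));
    }
}
```

With variant A, the caller's original object ends up with the last name `ASIMOV`; with variant B it keeps `Asimov`. Nothing at the call site tells the two apart.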

And one way we usually protect ourselves is to copy the object before passing it to the method.

We call it a defensive copy.

And this is one thing that makes the code hard to understand or to write.
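A defensive copy could be sketched as follows. The copy constructor and the mutating implementation of `toUpperLastName` are assumptions made for the example, not code from the talk.

```java
import java.util.Locale;

class AuthorData {
    private String firstName;
    private String lastName;

    AuthorData(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    // Copy constructor, used by callers to make defensive copies.
    AuthorData(AuthorData other) {
        this(other.firstName, other.lastName);
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
    void setLastName(String lastName) { this.lastName = lastName; }
}

class AuthorCode {
    // Hypothetical implementation that mutates its argument.
    static AuthorData toUpperLastName(AuthorData author) {
        author.setLastName(author.getLastName().toUpperCase(Locale.ROOT));
        return author;
    }
}

class DefensiveCopyDemo {
    public static void main(String[] args) {
        AuthorData asimov = new AuthorData("Isaac", "Asimov");
        // Pass a copy, so that the method cannot touch our object.
        AuthorData asimov2 = AuthorCode.toUpperLastName(new AuthorData(asimov));
        System.out.println(asimov.getLastName());  // prints "Asimov"
        System.out.println(asimov2.getLastName()); // prints "ASIMOV"
    }
}
```

The caller pays for an extra allocation at every call and has to remember to do it, which is the burden described above.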

Every time we call a method, we need to ask ourselves whether the method is going to change our data or not.

And it's another cause of complexity.

Another thing is that in a multithreaded system, in a multithreaded Java program, we need to be careful when we pass object references to methods.

And let's take a look at this simple example.

Let's say we have a member data object, and a member could be either blocked or not blocked.

And when a member is blocked, the member shouldn't be allowed to borrow books anymore.

So a naive implementation of the borrow function in the member code could be: let's check if the member is blocked by calling the isBlocked method of the member data object.

And if the member is not blocked, then we are going to allow the member to borrow the book, here by printing "The book is yours" to the console.

Can you see why this code is problematic? Can you see why this code is not thread safe? And the reason is that between the line that checks if the member is blocked and the line that does the book borrowing, there could be a context switch.

And in another thread, the member could become blocked.
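The naive implementation described above might look like the sketch below. The names `MemberData`, `MemberCode`, and `borrow` are assumed from the spoken description, and the boolean return value is added only to make the outcome observable.

```java
// Mutable member data: the blocked flag can change at any time.
class MemberData {
    private boolean blocked;

    boolean isBlocked() { return blocked; }
    void setBlocked(boolean blocked) { this.blocked = blocked; }
}

class MemberCode {
    // Check-then-act on mutable data: NOT thread safe.
    static boolean borrow(MemberData member) {
        if (member.isBlocked()) {
            return false;
        }
        // A context switch can happen here: another thread may call
        // member.setBlocked(true) after the check and before the borrowing.
        System.out.println("The book is yours");
        return true;
    }
}
```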

And that's definitely a source of complexity. And how do we protect ourselves from that? By adding a locking mechanism.

And when we add lock mechanisms to our code, it definitely makes the code hard to understand.

And we might get into deadlocks, and we need to think carefully about how to use the lock mechanism so that we make sure we don't have any deadlocks. Lock mechanisms also have a negative impact on performance.
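As an illustration of the lock-based approach, here is one possible sketch using Java's `synchronized` blocks. It assumes that every reader and writer of the member agrees to lock on the same object, and the `block` helper is hypothetical.

```java
class MemberData {
    private boolean blocked;

    boolean isBlocked() { return blocked; }
    void setBlocked(boolean blocked) { this.blocked = blocked; }
}

class MemberCode {
    // The check and the borrowing happen under the same lock,
    // so the member cannot become blocked in between.
    static boolean borrow(MemberData member) {
        synchronized (member) {
            if (member.isBlocked()) {
                return false;
            }
            System.out.println("The book is yours");
            return true;
        }
    }

    // Writers must take the same lock, otherwise the protection is void.
    static void block(MemberData member) {
        synchronized (member) {
            member.setBlocked(true);
        }
    }
}
```

The protection holds only by convention: any code that calls `setBlocked` directly, without taking the lock, brings the race back. That is part of what makes lock-based code hard to reason about.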

So for that, data-oriented programming has a very simple solution.

Do not mutate data.

If you treat the data as a value, it will never change.

And if data is not going to change, we won't have any problem.

When we pass data to a method, no matter if we are in a single-threaded environment or in a multithreaded environment, we have the guarantee that the data is not going to change.

And that's a huge benefit in terms of simplicity.

It makes the code much, much easier to understand.

You can look around and you will find many great articles that explain what are the benefits of immutable data in Java.

And the more important ones are that when you deal with immutable data, you are inherently thread safe, and you have no side effects.

Now the question is how to represent immutable data in Java.

And we have at least two options here.

And as you probably have noticed, any problem in Java could be solved with Java annotations.

And this is how Project Lombok proposes to represent immutable data: simply by adding a @Value annotation to a class.

And when we add a @Value annotation to a class, what we gain from Project Lombok is auto-generation of a public constructor, immutable private fields, getters, toString, hashCode, and equals.

And we are guaranteed that the member fields are not going to change because they are marked as immutable by the code that is auto generated.
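Running the Lombok version requires the library on the classpath, so the sketch below shows the annotated form only in a comment, followed by a hand-written approximation of what `@Value` generates. The exact generated code differs in its details. Note that there are no setters, since the fields are final.

```java
import java.util.Objects;

// With Lombok available, the whole class could be written as:
//
//   @lombok.Value
//   class AuthorData {
//       String firstName;
//       String lastName;
//   }
//
// Hand-written approximation of the generated code:
final class AuthorData {
    private final String firstName;
    private final String lastName;

    public AuthorData(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AuthorData)) return false;
        AuthorData other = (AuthorData) o;
        return Objects.equals(firstName, other.firstName)
            && Objects.equals(lastName, other.lastName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(firstName, lastName);
    }

    @Override
    public String toString() {
        return "AuthorData(firstName=" + firstName + ", lastName=" + lastName + ")";
    }
}
```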

Another option that came up recently in Java is actually available only since Java 14.

So it might take a couple of months or years until it's adopted in production.

But I think it's an interesting one: since Java 14 you have data classes, or records, with a native implementation in the JVM.

And I think that's great because you don't need to rely on third-party libraries and auto-generation of code.

You have a native implementation of, again, a constructor, immutable private fields, getters, toString, hashCode, and equals.

And the guarantee that the data cannot change.
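A sketch of the same author data as a record follows. Records were a preview feature in Java 14 and became a standard feature in Java 16, so this assumes a recent JDK. The accessors are named after the components (`firstName()`, not `getFirstName()`), and the `AuthorCode` methods are illustrative.

```java
import java.util.Locale;

// Immutable data: the compiler generates the constructor, private final
// fields, accessors, equals, hashCode, and toString.
record AuthorData(String firstName, String lastName) {}

class AuthorCode {
    static String fullName(AuthorData author) {
        return author.firstName() + " " + author.lastName();
    }

    // With immutable data, the only option is to return a new value.
    static AuthorData toUpperLastName(AuthorData author) {
        return new AuthorData(
            author.firstName(),
            author.lastName().toUpperCase(Locale.ROOT));
    }
}
```

Here the earlier question of whether the method changed the first object has a single answer: it cannot, because a record has no setters.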

And if you apply this second principle of data-oriented programming, dealing with immutable data only, the benefits that you gain are: no mutations, no surprises, and no need to defensively copy against possible mutation or a possible invalid state of your data.

And the code is inherently thread safe: no race conditions, and you don't need any lock mechanism. The code is definitely easier to understand, to maintain, and to reason about, and it makes our systems simpler.

So before we wrap up this presentation, let me mention other topics that I'm addressing in the book and that make the systems we build in Java even simpler. In the book you will learn how to leverage efficient immutable collections, sometimes called persistent collections, so that even when you have a huge collection of data, you can create a new version of it with a slight modification without having to deeply copy all the data.

You will learn how to represent more and more data using maps, and the book will teach you how to manipulate data with general-purpose functions like map, filter, reduce, group by, merge, etc.
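As a rough illustration of data represented with maps and processed by general-purpose functions, here is a sketch using only the standard Java streams API. The field names (`title`, `author`, `publicationYear`) and the `CatalogCode` class are assumptions for the example; the book's own approach may differ.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CatalogCode {
    // filter + map: titles of the books published after the given year.
    static List<String> titlesAfter(List<Map<String, Object>> books, int year) {
        return books.stream()
            .filter(book -> (Integer) book.get("publicationYear") > year)
            .map(book -> (String) book.get("title"))
            .collect(Collectors.toList());
    }

    // group by: titles grouped by author name.
    static Map<String, List<String>> titlesByAuthor(List<Map<String, Object>> books) {
        return books.stream()
            .collect(Collectors.groupingBy(
                book -> (String) book.get("author"),
                Collectors.mapping(
                    book -> (String) book.get("title"),
                    Collectors.toList())));
    }
}
```

The same functions work on any list of maps that carry these keys, whatever entity they come from. The trade-off is that field access is no longer checked by the compiler, hence the casts.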

You will learn how to achieve polymorphism without inheritance, without the big class hierarchy. There are other ways to achieve polymorphism.

You will also learn in the book how to manage the application state when you represent the whole state of the system as immutable data, and how to get highly scalable concurrent systems with optimistic locking instead of locking mechanisms like mutexes.
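One common way to get optimistic updates over immutable state in plain Java is a compare-and-set loop on an `AtomicReference`. The sketch below is only an illustration under that assumption, not necessarily the mechanism the book uses; `LibraryState` and `StateManager` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable snapshot of a (deliberately tiny) system state.
final class LibraryState {
    final int booksLent;

    LibraryState(int booksLent) {
        this.booksLent = booksLent;
    }

    // Returns a new state instead of mutating this one.
    LibraryState withOneMoreLending() {
        return new LibraryState(booksLent + 1);
    }
}

class StateManager {
    private final AtomicReference<LibraryState> current =
        new AtomicReference<>(new LibraryState(0));

    LibraryState get() {
        return current.get();
    }

    // Optimistic update: compute the next state from a snapshot and commit
    // it only if nobody replaced the snapshot in the meantime; else retry.
    void lendBook() {
        while (true) {
            LibraryState before = current.get();
            LibraryState after = before.withOneMoreLending();
            if (current.compareAndSet(before, after)) {
                return;
            }
        }
    }
}
```

Readers never block: `get()` returns a consistent snapshot that cannot change under them, and writers retry instead of waiting on a mutex.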

You will also learn how to get flexible access to your database, which gives you a lot of freedom and flexibility when retrieving and manipulating data that you fetch from the database and that you want to send over the wire, let's say using JSON serialization.

So that's the book, and let me leave you with this diagram, with this mind map that summarizes the main principles of data-oriented programming in Java.

First you separate between code and data. The code is written with static methods only, in green here; avoid instance methods as much as you can.

And the data is represented with immutable data, either with records, which are available in Java since Java 14, or with third-party libraries like Project Lombok, which provides smart Java annotations like @Value that generate all the code that's necessary to make sure that your data classes are immutable.

So I hope that I motivated you to take a deeper look at data-oriented programming and how you can apply it in Java. I'm quite sure that it will make your systems less complex. And now comes the question: what are you going to do with all the free brain cells that are going to be available when you move from classic object-oriented programming to data-oriented programming?

So please take a look at the book. You can scan the QR code to be redirected to the book at manning.com, and you can enjoy a 50% discount with the coupon that appears here on the screen.

If you are listening as a podcast, the coupon is mlsharvit2, spelled m-l-s-h-a-r-v-i-t-2, and if you Google data-oriented programming, you will get a link to the web page of the book.

It was a real pleasure to be here at Conf42.

Thank you for having me.

Enjoy these insights coming from data-oriented programming, and have fun applying them to your Java programs.



Yehonathan Sharvit

Clojure Wizard @ Cycognito

