Conf42 Enterprise Software 2021 - Online

Data-Oriented programming

Abstract

Data-Oriented programming is a paradigm that aims at reducing the complexity of software systems and making the development experience more productive. Data-Oriented programming draws a clear separation between code and data and treats data as a value that is manipulated by general-purpose functions. In this talk, we illustrate the principles of Data-Oriented programming in the context of a Java production system.

After attending this talk, you will be able to apply Data-Oriented programming principles in Java, and reduce the complexity of the systems you build.

Summary

  • Yehonathan Sharvit talks about data-oriented programming in Java. Data-oriented programming is aimed at reducing system complexity by treating data as a first-class citizen. When a system is written according to data-oriented programming principles, the system is easier to understand, to maintain, and to extend with new features.
  • When we pass an object as an argument to a method, we have to ask ourselves whether the object is passed by reference or by value. This is one thing that makes the code hard to understand or to write. Another is that in a multithreaded system, we need to be careful when we pass object references to methods. Data-oriented programming addresses both by never mutating data; in Java, immutable data can be represented with Lombok annotations or with records.
  • In the book you will learn how to leverage efficient immutable collections, sometimes called efficient persistent collections. You can enjoy a 50% discount with the coupon that appears on the screen. Enjoy the insights coming from data-oriented programming and apply them with fun to your Java programs.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello. My name is Yehonathan Sharvit and I'm really glad to be here at Conf42 for my talk about data-oriented programming in Java. The purpose of this talk is to give you a couple of insights that hopefully are going to help you liberate yourself, at least a bit, from the complexity of objects. A couple of words about myself. I have been a developer since 2001, first in C and C++, then in Java, and also JavaScript, Ruby and Clojure, and I'm the author of a book named Data-Oriented Programming. In this talk I'm going to share a couple of insights from the book and how to apply the principles of data-oriented programming in Java. If you find my talk interesting, you might want to purchase the book, and I'll give you a coupon for a discount at the end of the talk. You can follow me either on Twitter or on my blog at blog.klipse.tech.
So what is data-oriented programming? Data-oriented programming is a programming paradigm aimed at reducing system complexity by treating data as a first-class citizen. What do we mean by complexity? If you look for complexity in the dictionary or in Wikipedia, you will first find the definition of computational complexity, which is the amount of machine resources, like CPU or memory, that are required to run a program. But there is another meaning of complexity, which is system complexity. System complexity is the amount of human brain resources required to understand a system. So computational complexity is the time it takes to run a program, and system complexity is the time it takes to understand a program. Data-oriented programming reduces system complexity. In other words, when a system is written according to data-oriented programming principles, the system is easier to understand, to maintain, and to extend with new features. So let's ask ourselves, what usually makes a system complex? In my book, and in this talk, we are going to take a classic example.
Imagine you need to design and implement a library management system. You are a disciplined object-oriented developer, so the first thing you do is think about the design: the classes, the objects, and the relationships between the classes of your system. You might come up with a design like the one on the screen, where the main class is the library. The library has a catalog and user management. In the catalog we have books and authors, and a book has book items. On the user side we have different kinds of users. We have librarians, who can add books to the library, and we have members, who can borrow books from the library. Members have book lendings, and book lendings belong to book items. You will probably come up with a design similar to the one that is on the screen right now. If you are an experienced Java developer, you are probably going to use a couple of smart design patterns that might make the design a bit simpler or smarter.
But my point here is that this system is complex in the sense that it's hard to understand. If you take a further look at this UML diagram, you might find that one source of the complexity is that the graph has nodes with many edges. Look at the Librarian class: it is connected to one, two, three, four, five, six classes. Six is a big number in terms of relationships between nodes. Another thing that makes the system complex is that we have many kinds of arrows, many kinds of relationships between classes. We have association, for example between book and author. We have composition between catalog and book. We have inheritance between librarian and user, and also between member and user. And we have a usage relationship between, let's say, librarian and book item. So it's a burden on our mind, and this is what I mean by a complex system. It takes time, energy and effort to understand a classic object-oriented system.
So the first thing that data-oriented programming does is to separate code from data. Usually in object-oriented programming and in Java, we tend to encapsulate data inside classes and to mix data and behavior together inside classes that provide methods that manipulate or modify the state of the object. Look at what happens if we simply split each class of our system into two classes, where one class is responsible for the code, the behavior, and the other class is responsible for the data. For example, we take the library class that mixes data and code together, and we split it into a library code class and a library data class. In the same way, we take the catalog class and split it into catalog code and catalog data, and so forth.
What happens in terms of system complexity is that instead of one system with many relationships between the nodes, we get two disjoint systems with much simpler relationships between the nodes, or the classes, of the system. This is really a great benefit for our mind. It makes the system much easier to understand, to reason about, and to maintain. The reason is that we have separation of concerns. We have code on one hand and data on the other hand, and we also have constraints. On the code diagram on the left, all the methods in our classes are going to be stateless, as we're going to see in a moment, and the relationships between code classes are only usage relationships. On the data diagram we have another set of constraints, which is that the only relationships between data classes are either association or composition. Putting constraints on our diagrams tends to make the overall system less complex and easier to understand. So instead of the big mess, the complex system that we had on the left where code is mixed with data, we get a simpler system made of two disjoint systems. This is a huge benefit for our brain.
Let's now see practically how we separate code from data in Java. Actually, it's quite simple. We put data in classes that have only members and, of course, getters and setters. For example, author data will have a first name and a last name. That's it. No methods beyond setters and getters. For the code we have classes like author code with only static methods, no state, no data. The data that is to be processed by the method is passed as an explicit argument to the method. So for example, if we have a data object representing Isaac Asimov and we want to calculate the full name of this author, instead of what you are probably used to, asimov.fullName(), we call AuthorCode.fullName, which is a static method, and we pass to it as an argument the object with the data that we want to process, and it returns a string, Isaac Asimov. So that's how we separate code from data. In Java we have data classes with members only and code classes with static methods only. That's the first benefit that we gain from data-oriented programming: it makes the system easier to understand.
Now we are going to move forward and see how we can make the code easier to understand. For that we are going to ask ourselves, what usually makes code hard to understand? The first thing that makes code hard to understand in Java is that when we pass an object as an argument to a method, we have to ask ourselves whether the object is passed by reference or by value. It's difficult to answer this question clearly. Usually the answer that we get in Java tutorials is that in Java, object references are passed by value, which is really confusing. To show you an example of this complexity, let's take again our example with Isaac Asimov as a data object and see what kind of complexity we usually have in object-oriented programming.
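The separation of code and data described above can be sketched as follows. This is a minimal illustration rather than the speaker's actual slide: the class names AuthorData and AuthorCode follow the talk's "author data" and "author code" wording, while the field names, constructor and the AuthorSeparationDemo class are assumptions made for the example.

```java
// Hypothetical demo class, not from the talk: shows how the two classes are used.
class AuthorSeparationDemo {
    public static void main(String[] args) {
        AuthorData asimov = new AuthorData("Isaac", "Asimov");
        // The data is passed as an explicit argument to a static method.
        System.out.println(AuthorCode.fullName(asimov)); // prints "Isaac Asimov"
    }
}

// Data class: members plus getters and setters, no other behavior.
class AuthorData {
    private String firstName;
    private String lastName;

    AuthorData(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
    void setFirstName(String firstName) { this.firstName = firstName; }
    void setLastName(String lastName) { this.lastName = lastName; }
}

// Code class: static methods only, no state.
class AuthorCode {
    static String fullName(AuthorData author) {
        return author.getFirstName() + " " + author.getLastName();
    }
}
```

Because AuthorCode holds no state, the result of fullName depends only on the argument it receives, which is what keeps the code diagram limited to usage relationships.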
Let's say that we have a method in our author code class, a static method that transforms the last name of an author into uppercase. Here is how we call this method: we pass Asimov, and the method returns another data object where Asimov is uppercase. So the last name of Asimov number two is Asimov in uppercase. Now the question is, what happened to the first Asimov? Did the method mutate the data object or not? By looking at the code you cannot really know. It depends on the implementation of this static method, toUpperLastName. If the implementation mutates the object that it receives, then the last name of the first Asimov is going to be uppercase. If that's not the case, it's going to be lowercase, as it was passed. The reason for this confusion is that when we pass an object to a method, we pass a reference to the object, and the method now has access to the object. If the method calls the setters of the object that we passed, then it's going to mutate our object. One way to protect ourselves is to copy the object before passing it to the method. We call it a defensive copy. This is one thing that makes the code hard to understand or to write. Every time we call a method we need to ask ourselves, is the method going to change my data or not? It's another cause of complexity.
Another thing is that in a multithreaded Java program, we need to be careful when we pass object references to methods. Let's take a look at this simple example. Let's say we have member data, and the member could be either blocked or not blocked. When a member is blocked, the member shouldn't be allowed to borrow books anymore. So a naive implementation of the borrow function in the member code could be: let's check if the member is blocked by calling the isBlocked method of the member data object.
And if the member is not blocked, then we are going to allow the member to borrow the book, here by printing to the console "The book is yours". Can you see why this code is problematic? Can you see why this code is not thread-safe? The reason is that between the line that checks if the member is blocked and the line that does the book borrowing, there could be a context switch, and in another thread the member could become blocked. That's definitely a source of complexity. How do we protect ourselves from that? By adding a locking mechanism. When we add lock mechanisms to our code, it definitely makes the code hard to understand, and we might get into deadlocks. We need to think carefully about how to use the lock mechanism so that we make sure we don't have any deadlock, and lock mechanisms also have a negative impact on performance.
For that, data-oriented programming has a very simple solution: do not mutate data. If you treat the data as a value, it will never change, and if data is not going to change, we won't have any problem. When we pass data to a method, no matter if we are in a single-threaded environment or in a multithreaded environment, we have the guarantee that the data is not going to change. That's a huge benefit in terms of simplicity. It makes the code much, much easier to understand. You can look around and you will find many great articles that explain the benefits of immutable data in Java. The most important ones are that when you deal with immutable data, you are inherently thread-safe and you have no side effects.
Now the question is how to represent immutable data in Java. We have at least two options here. As you have probably noticed, any problem in Java can be solved with Java annotations. This is how Project Lombok proposes to represent immutable data: simply by adding a @Value annotation to a class.
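The check-then-act problem in the borrow example, and the effect of passing an immutable value instead, can be sketched as follows. This is an illustration under stated assumptions: the class names MutableMemberData, MemberData and MemberCode, the returned strings (the talk prints to the console instead), and the "Member is blocked" message are all hypothetical.

```java
// Hypothetical demo class, not from the talk.
class BorrowDemo {
    public static void main(String[] args) {
        MemberData member = new MemberData(false);
        System.out.println(MemberCode.borrow(member)); // prints "The book is yours"
    }
}

// Mutable data: any thread holding a reference can call setBlocked at any time.
class MutableMemberData {
    private boolean blocked;
    boolean isBlocked() { return blocked; }
    void setBlocked(boolean blocked) { this.blocked = blocked; }
}

// Immutable data: the field is final and there is no setter.
final class MemberData {
    private final boolean blocked;
    MemberData(boolean blocked) { this.blocked = blocked; }
    boolean isBlocked() { return blocked; }
}

class MemberCode {
    // Naive check-then-act on mutable data.
    static String borrowNaive(MutableMemberData member) {
        if (!member.isBlocked()) {
            // A context switch here may let another thread block the member,
            // so the check above can be stale when the next line runs.
            return "The book is yours";
        }
        return "Member is blocked";
    }

    // Same logic on an immutable value: the object received cannot change
    // between the check and the action, so no lock is needed inside the method.
    static String borrow(MemberData member) {
        return member.isBlocked() ? "Member is blocked" : "The book is yours";
    }
}
```

With the immutable version, blocking a member means creating a new MemberData value; deciding which version is the current one is a separate concern that belongs to state management.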
When we add the @Value annotation to a class, what we get from Project Lombok is auto-generation of a public constructor, immutable private fields, getters, setters, toString, hashCode and equals. We are guaranteed that the member fields are not going to change, because they are marked as immutable by the code that is auto-generated. Another option came up recently in Java and is available only since Java 14, so it might take a couple of months or years until it's adopted in production. But I think it's an interesting one: since Java 14 you have data classes, or data records, with a native implementation in the JVM. I think that's great, because you don't need to rely on third-party libraries and auto-generation of code. You have a native implementation of, again, constructor, immutable private fields, getters, setters, toString, hashCode and equals, and the guarantee that the data cannot change.
If you apply this second principle of data-oriented programming, dealing with immutable data only, the benefits that you gain are no mutations, no surprises, and no need to make defensive copies against a possible mutation or a possible invalid state of your data. The code is inherently thread-safe: no race conditions, and you don't need any lock mechanism. The code is definitely easier to understand, to maintain, and to reason about, and it makes our systems simpler. So before we wrap up this presentation, let me mention other topics that I'm addressing in the book, and that make the systems that we build in Java even simpler.
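The record option can be sketched as follows, reusing the earlier uppercase last name example. This is a sketch under assumptions, not the speaker's code: the component names are illustrative, and toUpperLastName is written here so that it returns a new value. Two details worth knowing: records were a preview feature in Java 14 and 15 and became a standard feature in Java 16, and a record exposes accessor methods such as lastName() but has no setters.

```java
import java.util.Locale;

// Hypothetical demo class, not from the talk.
class ImmutableAuthorDemo {
    public static void main(String[] args) {
        AuthorData asimov = new AuthorData("Isaac", "Asimov");
        AuthorData asimov2 = AuthorCode.toUpperLastName(asimov);
        System.out.println(asimov.lastName());  // prints "Asimov"
        System.out.println(asimov2.lastName()); // prints "ASIMOV"
    }
}

// Immutable data: the compiler generates the constructor, private final fields,
// accessors, toString, hashCode and equals.
record AuthorData(String firstName, String lastName) {}

// Code class: static methods only.
class AuthorCode {
    // The argument cannot be mutated, so the only way to "change" it
    // is to return a new value.
    static AuthorData toUpperLastName(AuthorData author) {
        return new AuthorData(author.firstName(),
                              author.lastName().toUpperCase(Locale.ROOT));
    }
}
```

With this representation, the question of what happened to the first Asimov has one possible answer: nothing, so no defensive copy is needed before the call.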
In the book you will learn how to leverage efficient immutable collections, sometimes called efficient persistent collections, so that even when you have a huge collection of data, you can create a new version of it with a slight modification without having to deeply copy all the data. You will learn how to represent more and more data using maps, and the book will teach you how to manipulate data with general-purpose functions like map, filter, reduce, group by, merge, et cetera. You will learn how to achieve polymorphism without inheritance, without a big class hierarchy. There are other ways to achieve polymorphism. You will also learn how to manage the application state when you represent the whole state of the system as immutable data, and how to get highly scalable concurrent systems with optimistic locking instead of locking mechanisms like mutexes. You will also learn how to get flexible access to your database, which gives you a lot of freedom when retrieving and manipulating data that you fetch from the database and that you want to send over the wire, let's say using JSON serialization. So that's the book.
Let me leave you with this mind map that summarizes the main principles of data-oriented programming. Separate code from data. The code is written with static methods only, in green here; avoid instance methods as much as you can. The data is represented as immutable data, either with records, which are available in Java since Java 14, or with third-party libraries like Project Lombok, which provides smart Java annotations like @Value that generate all the code necessary to make sure that your data classes are immutable. So I hope that I motivated you to take a deeper look at data-oriented programming and how you can apply it in Java, and I'm quite sure that it will make your systems less complex.
And now comes the question: what are you going to do with all the free brain cells that are going to be available when you move from classic object-oriented programming to data-oriented programming? Please take a look at the book. You can scan the QR code to be redirected to the book at manning.com, and you can enjoy a 50% discount with the coupon that appears here on the screen. If you are listening from a podcast, the coupon is mlsharvit2 (M, L, Sharvit, two), and if you Google data-oriented programming, you will get a link to the web page of the book. It was really a pleasure to be here at Conf42. Thank you for having me. Enjoy the insights coming from data-oriented programming and apply them with fun to your Java programs.

Yehonathan Sharvit

Clojure Wizard @ Cycognito

