Transaction Management and Repository Pattern

Abstract

Most applications work with a database. The repository pattern hides the complexity of working with a database, but most implementations are not written in Go and show only simple examples. I want to demonstrate a complex example with transactions and nested use cases over multi-table entities, and share the approaches used with their pros and cons.

Summary

Go is a young programming language, and there are no ready-made libraries for building an enterprise application. The repository has a simple interface and hides saving, getting, and mapping data from the database, which helps us concentrate on the business domain instead of the database.
We should explicitly pass the transaction to the repository and the publish method. Passing the transaction to the repository complicates the code with additional knowledge. The third wish is support for nested use cases. With great business logic comes great legacy.
The context package gives us the ability to store a transaction inside the context. However, we have a problem with the closure, which is difficult to test. We need to rewrite use cases to add a new closure or replace the current one. Let's try to use a generic decorator.
The solution works only on Go 1.13 and later; Go 1.13 was released in 2019. The drawbacks include passing context everywhere and not being able to do long business transactions. The first benchmark shows that the difference with and without the solution is about 3.4%.
Martin Fowler defines the pattern as maintaining a list of objects affected by a business transaction. The unit of work pattern gives batch changes in the database, which can be significantly faster than sending each change individually. The drawback is complexity. Fortunately, we can simplify it to the interface.

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Hello gophers and all those who are interested in Go. Today I am going to talk about how to concentrate on business logic by hiding transaction management and using the repository and unit of work patterns. A small piece of information: you can find the presentation using the QR code at the bottom left of the slide. All QR codes in the slides are links. Let's continue.
I work at Avito, one of the most popular classifieds in the world. Our main language is Go. We use it for more than a thousand microservices. We use Go almost everywhere: in cloud and network services, command-line interfaces, DevOps, and web development, and we handle about 300 million visits per month. That's cool, but we write applications with business logic, not benchmarks. My team and I work on smartphone reselling. That complex domain includes more than 20 states in which a phone passes from a seller to a buyer through verification, repair, adding a warranty, and delivery. Nevertheless, I will use another domain as an example on the slides: the well-known domain of a shop.
Unfortunately, we cannot directly use approaches from other languages in Go, first of all because of the error handling. A small disclaimer here: most of the error handling on the slides is hidden. Instead of full error processing, I hide the if checks on errors from the slides. Then, Go is not a conventionally object-oriented language. Go has its own design of interfaces, no inheritance, no protected access modifier, and so on. Also, Go is a young programming language, and there are no ready-made libraries for building an enterprise application. Despite these reasons and the differences between Go and other languages, we can use some concepts from them.
Let's look at the repository pattern in Go. The repository has a simple interface and hides saving, getting, and mapping data from the database, which helps us concentrate on the business domain instead of the database.
We don't add validation of usernames, product counts in an order, or other business rules, because the implementation of a concrete repository is not part of the domain. The repository should work with only one model. If we add several models to the repository, we create a super repository, which is difficult to modify, extend, and test. The sequence diagram for any repository looks like this. When we call a repository method, we convert data from the domain view to the database view or vice versa. Then we receive or save data in the database. However, the repository can use the data mapper pattern if we have a complex model; then the repository works with a data mapper instead of calling the database directly.
The description is simple, but what is inside? We slightly modify the repository interface for our model, and it is still simple. Suppose we have a user with a username, a password, and other data, and each user has profile information: avatar, first and last names, et cetera. That knowledge belongs to the user model. However, we store this data in the database as two different tables. The repository includes the machinery to hide the difference and helps us think more about the domain, not the database. On the left side of the slide you can see the domain view of the model, and on the right side the database view. You see that they are different, and the repository helps us hide it.
Let's see an example. This is the get method. I use sqlx to reduce the code of data mapping from the database. After the app receives the data, we have a database view in the user row and the profile row. Then we convert the data into the domain view as a single user model with a nested profile model. In the save method, we convert the domain view into the database view and then save the data. I use an insert and an update in one query to fit it on one slide. Since we use two tables, we need to add a transaction to prevent corrupting data if a problem happens in the gap between the two updates.
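
The conversion between the domain view and the database view can be sketched as follows. The field names and table layout are assumptions for illustration; the db struct tags follow the convention sqlx uses for column names, but no database is touched here.

```go
package main

import "fmt"

// Domain view: a single User with a nested Profile.
type Profile struct {
	FirstName string
	LastName  string
}

type User struct {
	ID       int64
	Username string
	Profile  Profile
}

// Database view: one row type per table (hypothetical layout).
type userRow struct {
	ID       int64  `db:"id"`
	Username string `db:"username"`
}

type profileRow struct {
	UserID    int64  `db:"user_id"`
	FirstName string `db:"first_name"`
	LastName  string `db:"last_name"`
}

// toDomain merges the two rows into one domain model.
func toDomain(u userRow, p profileRow) User {
	return User{
		ID:       u.ID,
		Username: u.Username,
		Profile:  Profile{FirstName: p.FirstName, LastName: p.LastName},
	}
}

// toRows splits the domain model into the two rows to be saved.
func toRows(u User) (userRow, profileRow) {
	return userRow{ID: u.ID, Username: u.Username},
		profileRow{UserID: u.ID, FirstName: u.Profile.FirstName, LastName: u.Profile.LastName}
}

func main() {
	u := User{ID: 1, Username: "gopher", Profile: Profile{FirstName: "Go", LastName: "Pher"}}
	ur, pr := toRows(u)
	fmt.Printf("%+v %+v\n", ur, pr)
	fmt.Println(toDomain(ur, pr) == u) // the round trip keeps the model intact
}
```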
A small disclaimer: I simplify the models to a few fields on the next slides to fit the code on a slide. Now we have a user repository. Next, our business tells us to register a user. We create the Register method, add validation, and save the model with the repository. Then the product team tells us that we must notify the new user about registration. For that purpose, we publish a message to a queue. However, we can hit an issue when the queue goes down: we save the data in the database and lose the message. This situation is unacceptable for us. Let's cover saving data and publishing with a transaction to prevent it. These are just two lines: the first one begins the transaction and the other one commits it. But we have to explicitly pass the transaction to the repository and the publish method.
Passing the transaction to the repository complicates the code with additional knowledge. They say knowledge is power. However, in this case additional knowledge is not about power; it's just complication for complication's sake. We change our repository interface by adding knowledge about the database transaction. That makes our repository non-ideal, but we get atomic registration. It's not a big deal.
We now have registration, but the app can only create a user. The business wants to sell goods, and we already know the way: we easily make a new repository with get and save methods. Then we create a new use case with validation, saving, and message publication in a transaction. And now we see our elegant solution with transactional registration and buying. While the developers are daydreaming, perfectly content with the result of their work, the business comes to them and demands a new scenario to increase the conversion of new users into customers: purchase without completed registration. That is a technique where users can buy things on the site without authorization, just by typing an email or phone number. Let's look at the slides.
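
The registration with an explicitly passed transaction can be sketched roughly like this. DB and Tx are hypothetical in-memory stand-ins for a real handle and transaction such as sqlx's, and the publisher writing through the same transaction is an assumption made so that the example is atomic and runnable without a queue.

```go
package main

import (
	"errors"
	"fmt"
)

// DB and Tx are hypothetical in-memory stand-ins for a database and a transaction.
type DB struct {
	Committed []string
}

type Tx struct {
	db      *DB
	pending []string // writes are buffered until Commit
}

func (db *DB) Begin() *Tx { return &Tx{db: db} }

func (tx *Tx) Exec(query string) { tx.pending = append(tx.pending, query) }

func (tx *Tx) Commit() error {
	tx.db.Committed = append(tx.db.Committed, tx.pending...)
	tx.pending = nil
	return nil
}

func (tx *Tx) Rollback() { tx.pending = nil }

// UserRepo now has to know about the transaction: the extra knowledge
// that makes the repository interface non-ideal.
type UserRepo struct{}

func (UserRepo) Save(tx *Tx, username string) error {
	tx.Exec("INSERT user " + username)
	return nil
}

// Publisher writes the message through the same transaction (assumption);
// Down simulates a broken queue.
type Publisher struct{ Down bool }

func (p Publisher) Publish(tx *Tx, message string) error {
	if p.Down {
		return errors.New("queue is down")
	}
	tx.Exec("INSERT message " + message)
	return nil
}

// Register saves the user and publishes the message atomically.
func Register(db *DB, repo UserRepo, pub Publisher, username string) error {
	tx := db.Begin()
	if err := repo.Save(tx, username); err != nil {
		tx.Rollback()
		return err
	}
	if err := pub.Publish(tx, "registered "+username); err != nil {
		tx.Rollback()
		return err
	}
	return tx.Commit()
}

func main() {
	db := &DB{}
	fmt.Println(Register(db, UserRepo{}, Publisher{}, "ann"), db.Committed)
	fmt.Println(Register(db, UserRepo{}, Publisher{Down: true}, "bob"), db.Committed)
}
```

When the publisher fails, nothing from that registration reaches the committed state, which is the guarantee the transaction is added for.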
An attentive listener may have already noticed the issue in the code. The solution is the same as for the two previous scenarios: we add transaction control and pass the transaction into the use cases. Additionally, we complicate transaction control inside the use cases. The two lines are changed to an if-else that decides whether the flow is at the top level or was called from something else, and that's not good. But as we know, with great business logic comes great legacy.
Let me briefly recap what we have here. First, we have nested transactional use cases, and that's cool. However, the transaction spreads everywhere. That forces us to duplicate the routine code that controls it. Duplication increases the chance of making mistakes. Also, if we want to change the database, we must change the app in many places.
Now I want to rephrase our issues as wishes. The first one is an ideal repository for simple work with the database. Second, the use cases are transactional: I want to save different models in a single scenario without thinking about how to open and close the transaction and roll it back on error. The third is support for nested use cases. Also, I want to hide transaction control, and finally I want to replace the database easily. We already have some wishes fulfilled, such as nested transactional use cases, but we need to get the others.
Let's try to hide transaction control. The simple solution is to use a closure, but where should we place it? We can add the closure to the repository. We call our scenario in the transaction, get the user and the order as a result, and then save them. However, I hope that you are skeptical about this solution.
First, we still have to pass the transaction to a repository, in this case the order repository. Another problem is that for each new scenario we need to code a new closure with a new range of models. And the last problem is that over time our scenarios will become more complex, the user repository will get too smart, and it will know about almost every model in the app. As a result, we would need much more time to extend and test our app's functionality.
However, we can move the closure to a separate function and see the result. At first sight, withTransaction is just code copied from the use case. Take a look at the transaction control in the closure and in the register scenario. Let's simplify our scenario: with the closure, we replace these lines with two and decrease the chance of creating a bug. The code fits entirely on the slide and looks better than it was, and I hope you agree with me. However, our code is still bound to the sqlx.Tx transaction. As a result, we can change the database only by rewriting all use cases, and somebody can still make mistakes when controlling a transaction by hand.
Next, I want to hide passing the transaction to the repository. To do that, we can use a factory method in the repository that enriches the repository with a transaction: the factory method gets the transaction and saves it in the repository, and then we can use the saved transaction in our methods. The code in the get and save methods changes slightly: we replace one if condition with a single line. I am eager to see how our use case looks after the update. We call the withTransaction repository method, and the rest of the use case is unchanged. Is that better than it was? Unfortunately, no. We add a time dependency: we must call repository methods only after the factory method, and I don't think that's cool. Could we use reflection in this case to remove the time dependency? Let's use a function with repositories as arguments in withTransaction.
We pass it to our closure; then reflection retrieves the list of repositories and calls the withTransaction method to enrich them. As a result, we remove the transaction argument from the repository methods. However, reflection is not the Go way, and we still have to pass the transaction into the use cases and the queue. Also, explicit passing spreads knowledge about the database through the application. So where can we store a transaction?
Let's see how other languages solve this issue. In Python, passing depends on the library. In SQLAlchemy, we pass the transaction explicitly as a function argument, and we already have that solution. In Django, transactions are stored in a global variable because Django processes one request in one thread. PHP uses the same approach because it does not have multithreaded processing at all. Unfortunately, that is not our solution, because fortunately Go can work with multiple threads through goroutines. More enterprise languages, such as Java or C#, use thread-local storage, which is similar to a global variable but limited to a single thread.
But where do we store a transaction in Go? Passing it as an argument is not our solution: we want to hide the transaction to avoid high coupling of use cases with the database. We have goroutines, and Go doesn't have built-in goroutine-local storage. Could we create a similar solution using the goroutine ID? The answer is probably no. Go does not allow us to get a goroutine ID directly. Go experts don't recommend using goroutine IDs because it contradicts the Go way, and they want to prevent building applications that tie all computation to a single goroutine. So we would have to use hacks to get an ID. Also, new language updates can break the hacks, and no one can guarantee the stability of the solution. Some libraries do implement goroutine-local storage, but all of them are built on hacks and most are not maintained.
Fortunately, we are not in 2016, and we have the context package, which can be used to store the transaction. However, there are opinions that storing anything more than primitive types in a context is a bad idea. Does that mean we should reinvent the wheel? Fortunately not. According to some experts' articles, there is an exception for values that are scoped to the request and destroyed after the request. That gives us the ability to store a transaction inside the context. Excellent, we have a place for our transaction.
Let's look at our closure. It's not so bad. We check for a transaction in the context. If there is none, we create a new transaction and put it in the context. In the use cases, we replace the transaction with a new context. Then, in the repository, we get the transaction from the context to work with the database. Also, we create a Tr interface to substitute a transaction for the direct database connection and vice versa. Now we have a simple repository interface without the explicit transaction argument, and nested transactional use cases. We can forget about transaction control by using the transaction closure. Additionally, we concentrate the work with the database in the closure and the repository and can replace the database without changing the use cases.
However, we have a problem with the closure: it is difficult to test, because we cannot create a mock or stub for a function, and a global variable holds the database. Also, we need to rewrite use cases to add a new closure or replace the current one. Let's fix it. We must convert the withTransaction closure into a structure with an interface. The structure allows us to create a mock and pass the database as an argument instead of a global variable. Let's name the interface Manager. The Do function calls DoWithSettings with default settings, and DoWithSettings controls a transaction like withTransaction, with additional features such as nested transactions, read-only transactions, timeouts, et cetera.
Then we introduce a general Transaction interface, which can commit, roll back, show the transaction status, and return the actual transaction. We also have an interface to create nested transactions if a database supports them. We also add a Settings interface to store standard configuration for different databases. We implement new settings for each database transaction because different databases have their own configurations and abilities. We cast the settings to a specific interface to configure the transaction in a factory.
It's time to see what we have in the use case. The manager replaces withTransaction, and visually almost nothing changes. However, we can mock the manager or stub it for testing.
In Java, we can hide the transaction from the use case with annotations or XML configuration. We can do the same in Go with reflection, code generation, some tricks with generics, or just a decorator. A simple decorator is on the slide, but we would have to create a new decorator for each use case. It's not hard, but it's so boring. Let's try to use a generic decorator. To get a generic decorator, we use a trick. The trick consists of using a structure for the arguments of the use case and naming the method identically in all use cases. In our case, we use the name Handle. Then we create an interface that matches our use cases. The structure gives us the ability to create a generic interface that can work with any use case. After that, we implement the decorator using the interface and generics. The decorator is not ideally generic, but we can easily implement it and remove the routine code. Using the decorator is simple and is shown on the slide.
Then we can simplify getting a transaction from the context in the repository. Let's create an interface to extract the transaction. The Default method returns a transaction by the default context key, and the ByKey method returns one by a custom key. That is necessary when the repository processes two transactions simultaneously, but be careful with that.
Also, when we create the context manager, we set the default key so that we can change the key without changing code in the repository. However, we have to cast the transaction to the database's transaction structure to work with it, and casting is not safe. We can create an interface for each database to skip casting in the repository. In the first iteration, the save method looks like this. Let's use the SQL context manager in the repository. This simplifies the code a bit and reduces the chance of error.
Let me briefly recap what we have. We kept the repository interface simple by hiding the transaction in the context. We get transactional nested use cases, with the transaction hidden by the transaction manager. We can also migrate to another database without changing use cases. And finally, the solution does not create a problem with testing. But what did it cost? Fortunately, not everything.
The solution works only on Go 1.13 and later; Go 1.13 was released in 2019. It was not so long ago. Nevertheless, Go is updated with minor versions, and I hope you have already updated or it is not a problem for you. Also, there are only a few ready transaction adapters, such as sql, sqlx, GORM, Mongo, and Redis, but you can write a new adapter in about 70 lines of code.
The next disadvantage is the loss of performance. The first benchmark showed that the difference with and without the solution is about 3.4%. The result was impressive, but the reason was sqlmock, the library for mocking SQL requests: sqlmock consumes a lot of resources. Therefore I decided to rewrite the benchmark on SQLite in memory. The result was more natural, and the difference is about 18%. However, most applications I have seen use a database that stores data on disk or runs on another dedicated server. For that reason I wrote a benchmark with MySQL on the same server. The file system added overhead, and the solution takes the same amount of time as the code without the transaction manager. The network would consume significantly more time than the file system, which means the overhead of the solution would be minor.
The other disadvantage is that we have to pass the context everywhere. I hope you already use context in your applications to store a request ID or other data for logging, or to cancel a request if a user closes the connection. Therefore passing context should be acceptable for you. The last and most considerable drawback is that we cannot do a long business transaction, because a database transaction holds a connection and limited database resources. That can happen when we call external services inside a use case. The simple solution is to request all data before the transaction, but then we lose the ability to simply insert one use case into another. Another solution is the unit of work pattern, which I will discuss later.
Now let's repeat the drawbacks. The first is the limitation on the Go version with which the solution works. The next one is that there are only five adapters, while there are more than five database drivers and ORMs in Go; however, only about 70 new lines can solve that. Third, the solution consumes about 17% more than the code without it, or five microseconds. Next is passing context everywhere. And finally, the solution does not support long business transactions, but the unit of work pattern can solve that.
Let's take a look at what was and what is now in the code. We removed the coupling of the database connection with the repository interface by hiding the transaction in the context. In the repository, we replaced checking for an existing transaction with the context manager; the code is shorter than it was, and now it's harder for us to make mistakes. We can also skip a massive block of database infrastructure to focus on business actions in the use case. The chart on the slide shows how the number of additional lines depends on the number of nested scenarios. The red line presents data without the solution.
The green line is for the solution. The crossing point is 1.8, where the raw code starts to lose to the solution in the amount of additional code in a use case. In addition, not only does the amount of code grow when nesting use cases; the chance of making bugs and the cognitive load on developers increase too. Nevertheless, I give you a tool, and you decide whether to use it or not. The link to the library is on the last slide.
Now I want to go back to the long transaction drawback. I mentioned that the unit of work could solve it. The interface of the pattern for Go is on the slide. Let's describe each method. RegisterNew marks a model as new. RegisterDirty marks a model as dirty, or updated. RegisterClean marks a model as clean, that is, fetched from the database without changes. RegisterDelete marks a model as deleted. Then Commit saves the data atomically in the database in one transaction, and finally Rollback resets the state to the initial one or to the previous successful commit.
Okay, we know what each method does, but how and when can we use it? Let's see the definition to get it. Martin Fowler defines the pattern as maintaining a list of objects affected by a business transaction and coordinating the writing out of changes and the resolution of concurrency problems. The first fun fact about the definition: it mentions concurrency problems, which are associated with the database, but they are resolved by pessimistic or optimistic locks, and that is not part of the unit of work. The second one: in the original book, Patterns of Enterprise Application Architecture, where the pattern was described, there is no rollback method.
Let's understand what the pattern gives us. First, it gives batch changes in the database, which can be significantly faster than sending each change individually. We can optimize inserts by using INSERT INTO with multiple rows in VALUES, and other commands by removing the network overhead of sending each command.
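
The multi-row insert mentioned above can be illustrated with a small helper. buildBatchInsert is a hypothetical function written for this example; it assumes "?" placeholders (as in MySQL or SQLite) and trusted table and column names, and it only builds the statement without sending it.

```go
package main

import (
	"fmt"
	"strings"
)

// buildBatchInsert returns one INSERT statement with a VALUES group per row,
// plus the flattened arguments in the matching order.
// Table and column names must come from trusted code, never from user input.
func buildBatchInsert(table string, columns []string, rows [][]any) (string, []any) {
	// One placeholder group such as "(?, ?)" for a two-column table.
	group := "(" + strings.TrimSuffix(strings.Repeat("?, ", len(columns)), ", ") + ")"

	groups := make([]string, len(rows))
	args := make([]any, 0, len(rows)*len(columns))
	for i, row := range rows {
		groups[i] = group
		args = append(args, row...)
	}

	query := fmt.Sprintf("INSERT INTO %s (%s) VALUES %s",
		table, strings.Join(columns, ", "), strings.Join(groups, ", "))
	return query, args
}

func main() {
	query, args := buildBatchInsert("orders", []string{"id", "product"}, [][]any{
		{1, "phone"},
		{2, "case"},
	})
	fmt.Println(query)
	fmt.Println(args...)
}
```

One statement with two value groups replaces two round trips to the database, which is where the batch speed-up comes from.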
Also, we update only the changed data, even if we change the model several times. Second, a business transaction can be long and does not depend on the database transaction. And last, our update is atomic.
What are the disadvantages? First, we can't use a pessimistic lock. Sorry, we can, but we lose the ability to have a long business transaction, because the pessimistic lock uses database transactions. Second is complexity. At first view, the pattern interface is simple, but the implementation is not.
Let's look at the UML class diagram of the pattern. The first block is our pattern. The next block is the identity map pattern, which stores our models after they are registered in some state: new, clean, dirty, or deleted. Further, we have classes to work with the database transaction. I prefer to replace them with an interface to unchain from the database. The interface helps us use the transaction manager we wrote. The next part of the pattern is the mapper registry, which returns a suitable data mapper for a model. And the data mapper is a pattern that maps data from models to the database view and vice versa. Some implementations of the data mapper in other languages use reflection to work with any model, and they support configuration by YAML, XML, annotations, et cetera. Additionally, if you want to implement the pattern, you should remember the problem with auto-increment identifiers: the application can get the identifiers only after saving the data in the database. As a simple solution, you can generate identifiers on the application side. We are finished with the UML class diagram.
Next, let's see an example. Order and product are our models, and the pattern coordinates the saving of their changes. The order function is our business transaction. We check whether the user exists in an external service. Then we create a new order and mark it as new. After that, we get the product and mark it as clean. Then we write off the product from our warehouse and mark it as dirty, or updated.
And finally we commit the changes to save them in the database. On the slide you see the sequence diagram for our use case: we get, mark, change, and save data. However, the most interesting part is hidden in the Commit call, because all the magic happens there. When we call Commit, we create a database transaction. Then we get the data mapper for a model; in our use case it is the order. After that, the code executes the batch changes in the database. Finally, we commit the transaction. I hope I have explained the unit of work pattern and shown its complexity.
Fortunately, we can simplify it down to the interface. On the slide we remove the identity map, mapper registry, and data mapper patterns. The implementation still has a long business transaction and an atomic update, but it loses batch updates because we work with callback functions. Additionally, a model will be updated multiple times instead of once. That means if we add three commands updating the same model, we send three queries to the database, whereas the original implementation optimizes that. Nevertheless, the implementation even fits on two slides. We use callback functions as commands, and the register methods save the commands in a list. The command is the part of the data mapper that we simplified. When our business transaction is finished, we call all the commands in the Commit method. We can also replace the commands with queries or operations of some databases, so we get batch updates back. Then we add the DBRunner interface to send queries to the database as a batch. However, the model will still be updated multiple times instead of just once, and we still have the problem with auto-increment identifiers.
The next step is a complete unit of work pattern with blackjack, an identity map, a fully implemented data mapper, and a mapper registry. Unfortunately, there is no ready library where we only need to add the data mapper for our model. However, the library named Works tries to solve that problem for SQL databases.
You can also help the Go community and implement a universal unit of work pattern.
In conclusion, I want to recall what we have covered. The first is the repository. Use it if your application grows and the application's view of the data differs from the database view. Then we use the transaction manager to have nested transactional use cases and to hide knowledge about transactions from them. And finally, if you want a long business transaction, the unit of work is your solution.
Thank you for your attention. The source code of the examples and the libraries is available via the second link. Please press the like button if the presentation was helpful for you, or write comments if I missed something. I wish you simple writing of business logic. Good luck.

Conf42 Golang 2023 - Online

April 20 2023

Ilia Sergunin

Senior Software Engineer @ Avito
