Conf42 Chaos Engineering 2024 - Online

Unix shell - We can do better now


The Unix shell is powerful, but it's stuck in the telegraph-style communication paradigm. We can do better now. Actually, we could have done better since the 70s. This talk is about how we got here, what's wrong, and how to fix it.


  • Ilya Sher: What's wrong with the shell? Well, the shell is basically two things: a programming language and a user interface. The problems with the language are arcane syntax, error handling that is an afterthought, and the lack of structured data. The user interface is what I'm focusing on today.
  • Semantic recording, not literal recording: if tomorrow another pipeline fails, the replay will be looking at the failed pipeline. Doing this semantic work with objects means understanding the output, a feature pretty much symmetrical to command line completion and roughly the same amount of work.
  • The more your program understands the data that it works with, the more powerful this program can be. All the power comes from semantic understanding of the data. The second big thing that should be in the UI of a shell is capturing.


This transcript was autogenerated. To make changes, submit a PR.
Hi, let's talk about the Unix shell. More specifically, about what we could do better now, and even more specifically, about what we could have done better since the 70s but didn't. I'm Ilya Sher. I'm a longtime bash user, I program, and I do DevOps. In 2013 I was fed up enough with the subpar user experience of bash that I started working on my own shell.

What's wrong with the shell? That would be the first question to a person who started working on his own shell. Well, the shell is basically two things: a programming language and a user interface. Both of them are not very good, I think. The difference is that the first problem, that the programming language is not very good, is understood. I'm saying that judging by other projects: there are several of them, they all agree about this problem with the programming language, and they are actively working on solving it. That's why I assume this problem is widely understood. Basically, the problem is the syntax, arcane syntax that comes from too long ago. Error handling is an afterthought and not very good. And there is the lack of structured data, of course.

The other big issue is the user interface, and that's what I'm focusing on today. This talk is about the user interface; I only mentioned the programming language to leave that aside. The user interface is basically the same as a telegraph. In this paradigm you send text and you receive text. That's how you communicate with the other end. The fact that the communication style of the shell and that of the telegraph are exactly the same is not a coincidence. It's a historical development. So let's go over how we got from the telegraph to the shell of today.

Okay, so we had the telegraph. Then somebody figured out that it's not convenient: let's replace this button with something more practical. They made a keyboard and a printer. That device, called a teleprinter, is basically a keyboard and a printer. To communicate, you need two of these devices. They are cross-connected, which means that whatever you type on your end comes out of the printer at the remote end, and whatever they type on their keyboard comes out of the printer on our end.

Then computers came, and they were using punched cards. It was not very convenient. Somebody figured out: okay, we have the teleprinter, let's connect the teleprinter to the computer. And they did, and it worked. Then came another incremental improvement, the video display unit. It looks like a computer terminal, which we will see in a moment on the next slide, but it was an exact replacement for paper. New text was added at the bottom, and all the other lines were scrolled up a bit. I would like to highlight that none of these devices was a conceptual breakthrough. They were better technologies, of course, but these were incremental improvements. Nobody said, "Okay, hold on a moment, let's rethink the whole thing." This has not happened.

When did we have a technological breakthrough with these devices? This one, the VT52, which was released in '74 or maybe '75, it's unclear, supported cursor movement. That means you can move the cursor to any location on the screen and overwrite the text that's there, or clear it, which is a more specific use case. The reaction to that was as follows. Bill Joy invented a text editor that used this capability and basically brought text editing to computers as we know it today: the text occupies the whole screen, you go with your cursor to the point that you want to edit, you edit the text there, and it's replaced at that point. That is as opposed to previous text editors, which, like the shell today, had a command line interface. You typed commands such as add text, replace text, delete text. You could not edit the text at any point on the screen; you just had these commands.

How did the Unix shell react to this new capability? It didn't, pretty much until this day. So to this day most of the screen in the shell is not actually interactive. It's treated like paper. The text on the screen above the command line is nothing to the shell. The shell doesn't know about it and cannot interact with it. The only interaction that you have in the interactive shell is actually on one line. Sometimes you have completion, so it's a few lines, but basically it's one line. I would like to fix that. I think the screen can be interactive, and we should catch up with this capability from '75 and make this whole part interactive.

What would that look like? Well, the screen will have textual representations of objects, somewhat like links on the web. The shell would track the link between the text on the screen and the objects. And the objects will have a description along the lines of: we are of this type, our unique ID is this, and to be displayed on the screen we need to look like this. So we have this example: a file on the screen and a CI/CD pipeline, in our case AWS CodePipeline (I'm not affiliated).

What would the interaction look like? Let's say we want to interact with the CodePipeline. Since everything is semantic, when you start interacting with such an object on the screen, the shell can ask all the plugins that it has: which one of you is handling objects of type CodePipeline? By the way, it can be more than one. So when we create a menu for that object, the items in the menu come from different plugins, or maybe from one plugin. These plugins can also provide the default action, that is, what happens if you left-click on the object, or navigate to it with the cursor and press Enter.
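
The object and plugin model described above could be sketched roughly as follows. This is a minimal illustration in Python, not the speaker's implementation: the names `ScreenObject`, `MenuItem`, `Plugin`, `Shell`, the type string `"codepipeline"`, and the pipeline name `deploy-api` are all hypothetical, chosen only to show how a shell might ask its plugins who handles a given object type and build a menu from their answers.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class ScreenObject:
    """Semantic object behind a piece of text on the screen (hypothetical model)."""
    type: str     # what kind of object this is, e.g. "codepipeline"
    id: str       # unique ID within that type
    display: str  # how the object is rendered as text on the screen


@dataclass
class MenuItem:
    label: str
    action: Callable[[ScreenObject], str]


@dataclass
class Plugin:
    name: str
    handled_types: set[str]
    items: list[MenuItem] = field(default_factory=list)
    default: Optional[MenuItem] = None  # action for left-click or Enter


class Shell:
    def __init__(self) -> None:
        self.plugins: list[Plugin] = []

    def register(self, plugin: Plugin) -> None:
        self.plugins.append(plugin)

    def handlers(self, obj: ScreenObject) -> list[Plugin]:
        # "Which one of you is handling objects of this type?"
        return [p for p in self.plugins if obj.type in p.handled_types]

    def menu_for(self, obj: ScreenObject) -> list[MenuItem]:
        # Menu items may come from more than one plugin.
        return [item for p in self.handlers(obj) for item in p.items]

    def default_action(self, obj: ScreenObject) -> Optional[MenuItem]:
        # First plugin that offers a default wins (an arbitrary choice for this sketch).
        for p in self.handlers(obj):
            if p.default is not None:
                return p.default
        return None


show_status = MenuItem("Show status", lambda o: f"status of {o.id}")
open_logs = MenuItem("Open logs", lambda o: f"logs of {o.id}")

shell = Shell()
shell.register(Plugin("pipeline-status", {"codepipeline"}, [show_status], default=show_status))
shell.register(Plugin("pipeline-logs", {"codepipeline"}, [open_logs]))

pipeline = ScreenObject(type="codepipeline", id="deploy-api", display="deploy-api [Failed]")
print([item.label for item in shell.menu_for(pipeline)])  # ['Show status', 'Open logs']
print(shell.default_action(pipeline).action(pipeline))    # status of deploy-api
```

Keeping the type, ID, and display text together on the object is what lets the shell map a click on screen text back to something plugins can act on.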
This interaction that we saw on the previous slide should be recorded. The problem with the web interface, and this interface moves in the direction of the web interface, is that you don't have a record of what you did, and that's very bad. No one wants to accept that for serious work. So if you did interact with something on the screen, this interaction should be recorded, and not only recorded but also immediately displayed to the user. And this recording should be at the highest possible semantic level.

What do I mean by that? If you had several pipelines listed and you started to interact with the one that failed, the interaction will be recorded as: you are now interacting with a pipeline that has the status "failed". Why is that important? Because next time, let's say tomorrow, you come to see these pipelines and another pipeline has failed, and the flow that you recorded was actually about looking at the failed one. So if tomorrow another pipeline fails, then when you replay, you will be looking at the failed pipeline, not at the exact same pipeline you were looking at today.

Another example of why semantic recording and not literal recording: let's say you have an instance with ID 123, and you are interacting with that instance. This ID 123 is completely meaningless to the user. You're interacting with that instance not because it has that particular ID, but because it has some interesting property. For example, it has a name tag of something, or some other tag with a particular value, or maybe it resides in the VPC of interest, or maybe it has a certain security group, or some combination of these. You need to record this interaction semantically. Tomorrow, when you have a slightly different situation, you will not have instance 123, because it will be long gone. You will have some other ID for that instance, and you want to interact with that instance, not the one that has ID 123.

What do we need in order to do all of that semantic understanding and semantic work with objects? We need to understand the output. The typical objection to that is, first, that the shell is not supposed to get into semantics, and second, that it's too much work. I want to refute these two arguments immediately by looking at what we already have. Let's look at the exit code of a process. The shell has to understand it in order to do even basic error handling, and the shell has understood exit codes for a long, long time. At some later point, somebody added command line completion. This feature is highly valued, very powerful, and very practical, and everybody uses it. And guess what? It needs semantic understanding of the programs that we are running. It was quite a bit of work, because we needed a kind of plugin for each of these programs. And it's done. We are looking at a pretty much symmetrical feature, which will be roughly the same amount of work, or at least the same order of magnitude. That's why I think it's possible and it should be done.

I would like to summarize what's important in the UI. What should be in the UI, first of all, is semantic understanding. The more your program understands the data that it works with, the more powerful this program can be. If we compare, for example, Notepad in Windows and a JetBrains IDE (not affiliated), you can do way more with the JetBrains IDE. You can edit programs in both of them, right? But the IDE understands way more of the semantics of the text that we are working with. If you take a middle ground, for example vi, it's not a complete IDE. Well, it could be, if configured properly, but let's say it's not. It has, for example, syntax highlighting, so it understands somewhat. And we have language plugins, which understand more. So all the power comes from semantic understanding of the data that you are working with. As for semantics in the shell: we have exit codes, we have command line arguments. I think it's just a logical continuation to get more semantics into the shell and understand the output.

The second big thing that should be in the UI of a shell is capturing. You have to capture the interactions, you have to capture as much as you can, and you have to capture at the highest possible level of semantic understanding of the interaction. That's how your record and replay facility can be powerful and applicable to other situations.

That's it. Thank you, bye.
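
The difference between literal and semantic recording from the pipeline example in the talk can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Pipeline`, `LiteralStep`, and `SemanticStep` classes, the pipeline names, and the simplification that the interesting property is always the status. A real recorder would need some way to decide which property made the object interesting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    id: str
    status: str


@dataclass(frozen=True)
class LiteralStep:
    """Records the exact ID the user interacted with."""
    target_id: str

    def resolve(self, pipelines: list[Pipeline]) -> list[Pipeline]:
        return [p for p in pipelines if p.id == self.target_id]


@dataclass(frozen=True)
class SemanticStep:
    """Records the property that made the object interesting."""
    attribute: str
    value: str

    def resolve(self, pipelines: list[Pipeline]) -> list[Pipeline]:
        return [p for p in pipelines if getattr(p, self.attribute) == self.value]


def record(selected: Pipeline) -> tuple[LiteralStep, SemanticStep]:
    # Assumption for this sketch: the user picked the pipeline because of its status.
    return LiteralStep(selected.id), SemanticStep("status", selected.status)


today = [Pipeline("build-web", "Succeeded"), Pipeline("deploy-api", "Failed")]
tomorrow = [Pipeline("build-web", "Failed"), Pipeline("deploy-api", "Succeeded")]

# Today the user interacts with the failed pipeline.
literal, semantic = record(today[1])

# Replaying tomorrow: the literal step finds the same pipeline, which is no
# longer the failed one; the semantic step finds whichever pipeline failed.
print([p.id for p in literal.resolve(tomorrow)])   # ['deploy-api']
print([p.id for p in semantic.resolve(tomorrow)])  # ['build-web']
```

The same idea applies to the instance example: recording "the instance with this name tag" instead of a specific instance ID keeps the recording usable after the original instance is gone.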

Ilya Sher

CTO @ Beame

