Sorting in Emacs - Susam Pal

steloflute 2023. 8. 13. 23:13

Sorting in Emacs

By Susam Pal on 09 Aug 2023

In this article, we will perform a series of hands-on experiments that demonstrate the various Emacs commands that can be used to sort text in different ways. There is sufficient documentation available for these commands in the Emacs and Elisp manuals. In this article, however, we will take a look at some concrete examples to illustrate how they work.

Sorting Lines

Our first set of experiments demonstrates different ways to sort lines. Follow the steps below to perform these experiments.

  1. First create a buffer that has the following text:

    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
    Bob    100  London  LCY->CDG
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND

    Let us pretend that each line is a record that holds some details about a person. From left to right, we have each person's name, some sort of numerical ID, their current location, and their upcoming travel plan. For example, the first line says that Carol from London is planning to travel from London Heathrow (LHR) to San Francisco (SFO).

  2. Type C-x h to mark the whole buffer and type M-x sort-lines RET to sort lines alphabetically. The buffer looks like this now:

    Alice  10   Paris   CDG->LHR
    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR

  3. Type C-x h followed by C-u M-x sort-lines RET to reverse sort lines alphabetically. The key sequence C-u specifies a prefix argument that indicates that a reverse sort must be performed. The buffer looks like this now:

    Dan    20   Tokyo   HND->LHR
    Carol  200  London  LHR->SFO
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
    Alice  10   Paris   CDG->LHR

  4. Type C-x h followed by M-x sort-fields RET to sort the lines by the first field only. Fields are separated by whitespace. The result looks like this:

    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR

    Note that the result now is slightly different from the result of M-x sort-lines RET presented in point 2 earlier. Here Bob from Paris comes before Bob from London because the sorting was performed by the first field only. The sorting algorithm ignored the rest of each line. However, in point 2 earlier, Bob from London came before Bob from Paris because the sorting was performed by entire lines.

  5. Type C-x h followed by M-2 M-x sort-fields RET to sort the lines alphabetically by the second field. The key sequence M-2 here specifies a numeric argument that identifies the field we want to sort by. Note that 100 comes before 20 because we performed an alphabetical sort, not a numerical sort. The result looks like this:

    Alice  10   Paris   CDG->LHR
    Bob    100  London  LCY->CDG
    Dan    20   Tokyo   HND->LHR
    Carol  200  London  LHR->SFO
    Bob    30   Paris   ORY->HND

  6. Type C-x h followed by M-2 M-x sort-numeric-fields RET to sort the lines numerically by the second field. The result looks like this:

    Alice  10   Paris   CDG->LHR
    Dan    20   Tokyo   HND->LHR
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO

  7. Type C-x h followed by M-3 M-x sort-fields RET to sort the lines alphabetically by the third field containing city names. The result looks like this:

    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Dan    20   Tokyo   HND->LHR

    Note that we cannot supply the prefix argument C-u to this command to perform a reverse sort by a specific field because the prefix argument here is used to identify the field we need to sort by. If we do specify the prefix argument C-u, it would be treated as the numeric argument 4 which would sort the lines by the fourth field. However, there is a little trick to reverse sort lines by a specific field. The next point shows this.

  8. Type C-x h followed by M-x reverse-region RET. This reverses the order of lines in the region. Combined with the previous command, this effectively reverse sorts the lines by city names. The result looks like this:

    Dan    20   Tokyo   HND->LHR
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   CDG->LHR
    Carol  200  London  LHR->SFO
    Bob    100  London  LCY->CDG

  9. Type C-x h followed by M-- M-2 M-x sort-fields RET to sort the lines alphabetically by the second field from the right (third from the left). Note that the first two key combinations are meta+- and meta+2. They specify the negative argument -2 to sort the lines by the second field from the right. The result looks like this:

    Carol  200  London  LHR->SFO
    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   CDG->LHR
    Dan    20   Tokyo   HND->LHR

  10. Type M-< to move the point to the beginning of the buffer. Then type C-s London RET followed by M-b to move the point to the beginning of the word London on the first line. Now type C-SPC to set a mark there. Then type C-4 C-n C-e to move the point to the end of the last line. An active region should be visible in the buffer now. Finally type M-x sort-columns RET to sort the lines by the text in the columns bounded by the column positions of mark and point (i.e., the last two columns). The result looks like this:

    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Dan    20   Tokyo   HND->LHR

  11. Like before, type M-< to move the point to the beginning of the buffer. Then type C-s London RET followed by M-b to move the point to the beginning of the word London on the first line. Now type C-SPC to set a mark there. Again, like before, type C-4 C-n C-e to move the point to the end of the last line. An active region should be visible in the buffer now. Now type C-u M-x sort-columns RET to reverse sort the lines by the last two columns. The result looks like this:

    Dan    20   Tokyo   HND->LHR
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   CDG->LHR
    Carol  200  London  LHR->SFO
    Bob    100  London  LCY->CDG

  12. Warning: This step shows how not to use the sort-regexp-fields command. In most cases you probably do not want to do this. The next point shows a typical usage of this command that is correct in most cases.

    Type C-x h followed by M-x sort-regexp-fields RET [A-Z]*->\(.*\) RET \1 RET to sort by the destination airport. This command first matches the destination airport in each line in a regular expression capturing group (\(.*\)). Then we ask this command to sort the lines by the field matched by this capturing group (\1). The result looks like this:

    Dan    20   Tokyo   LCY->CDG
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   HND->LHR
    Carol  200  London  CDG->LHR
    Bob    100  London  LHR->SFO

    Observe how all our travel records are messed up in this result. Now Dan from Tokyo is travelling from LCY to CDG instead of travelling from HND to LHR. Compare the results in this point with that of the previous point. This command has sorted the destination fields fine and it has maintained the association between the source airport and destination airport fine too. But the association between the other fields (first three columns) and the last field (source and destination airports) is broken. This happened because the regular expression matches only the last column and we sorted by only the destination field of the last column, so the association of the fields in the last column is kept intact but the rest of the association is broken. Only the part of each line that is matched by the regular expression moves around while the sorting is performed; everything else remains unchanged. This behaviour may be useful in some limited situations but in most cases, we want to keep the association between all the fields intact. The next point shows how to do this.

    Now type C-/ (or C-x u) to undo this change and revert the buffer to the previous good state. After doing this, the buffer should look like the result presented in the previous point.

  13. Assuming the state of the buffer is the same as that of the result in point 11, we will now see how to alter the previous step such that when we sort the lines by the destination field, entire lines move along with the destination fields. The trick is to ensure that the regular expression matches entire lines. To do so, we make a minor change in the regular expression. Type C-x h followed by M-x sort-regexp-fields RET .*->\(.*\) RET \1 RET. The result looks like this:

    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Dan    20   Tokyo   HND->LHR
    Alice  10   Paris   CDG->LHR
    Carol  200  London  LHR->SFO

    Now the lines are sorted by the destination field and Dan from Tokyo is still travelling from HND to LHR.

  14. Type C-x h followed by M-- M-x sort-regexp-fields RET .*->\(.*\) RET \1 RET to reverse sort the lines by the destination airport. Note that the first key combination is meta+- here. This key combination specifies a negative argument that results in a reverse sort. The result looks like this:

    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG

  15. Finally, note that we can always invoke shell commands on a region and replace the region with the output of the shell command. To see this in action, first prepare the buffer by typing M-< followed by C-k C-k C-y C-y to duplicate the first line of the buffer. Then type C-x h followed by C-u M-| sort -u RET to sort the lines but remove duplicate lines during the sort operation. The M-| key sequence invokes the command shell-command-on-region which prompts for a shell command, executes it, and usually displays the output in the echo area. If the output cannot fit in the echo area, then it displays the output in a separate buffer. However, if a prefix argument is supplied, say with C-u, then it replaces the region with the output. As a result, the buffer now looks like this:

    Alice  10   Paris   CDG->LHR
    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR

    This particular problem of removing duplicates while sorting can also be accomplished by typing C-x h followed by M-x sort-lines RET and then C-x h followed by M-x delete-duplicate-lines RET. Nevertheless, it is useful to know that we can execute arbitrary shell commands on a region.
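The experiments above differ mainly in what each command treats as the sort key and in how much of each line moves during the sort. The following Python sketch is only a rough model of that idea and not how Emacs implements these commands. The names field, names_and_ids and sort_regexp_fields are made up for this illustration. The model also assumes that the regular expression matches exactly once in every line, and it writes capturing groups as (...) in Python syntax where Emacs uses \(...\).

```python
import re

# The five records from the first step; fields are separated by whitespace.
records = [
    "Carol  200  London  LHR->SFO",
    "Dan    20   Tokyo   HND->LHR",
    "Bob    100  London  LCY->CDG",
    "Alice  10   Paris   CDG->LHR",
    "Bob    30   Paris   ORY->HND",
]


def field(line, n):
    """Return field n of a line, counting from 1; a negative n counts from the right."""
    parts = line.split()
    return parts[n - 1] if n > 0 else parts[n]


def names_and_ids(lines):
    """Display helper: keep only the first two fields of each line."""
    return [" ".join(line.split()[:2]) for line in lines]


# Whole lines are the keys, as in the sort-lines experiments.
whole_line = sorted(records)
reversed_lines = sorted(records, reverse=True)

# Only the first field is the key, as in the sort-fields experiment. Python's
# sort is stable, so the two Bob records keep the order they had before.
by_name = sorted(reversed_lines, key=lambda line: field(line, 1))

# The same field compared as text and then as a number.
by_id_as_text = sorted(records, key=lambda line: field(line, 2))
by_id_as_number = sorted(records, key=lambda line: int(field(line, 2)))


def sort_regexp_fields(lines, record_regexp, key_group=1):
    """Reorder only the text matched by record_regexp, using one group as the key.

    Text outside each match stays on the line where it was.
    Assumes that record_regexp matches somewhere in every line.
    """
    pattern = re.compile(record_regexp)
    matches = [pattern.search(line) for line in lines]
    ordered = sorted(matches, key=lambda m: m.group(key_group))
    return [
        line[:old.start()] + new.group(0) + line[old.end():]
        for line, old, new in zip(lines, matches, ordered)
    ]


# The buffer as it looks after the reverse sort by the last two columns.
before = [
    "Dan    20   Tokyo   HND->LHR",
    "Bob    30   Paris   ORY->HND",
    "Alice  10   Paris   CDG->LHR",
    "Carol  200  London  LHR->SFO",
    "Bob    100  London  LCY->CDG",
]

# The match covers only the last column, so only that column moves.
scrambled = sort_regexp_fields(before, r"[A-Z]*->(.*)")

# The match covers the entire line, so entire lines move.
intact = sort_regexp_fields(before, r".*->(.*)")

print(names_and_ids(by_id_as_text))
print(names_and_ids(by_id_as_number))
print("\n".join(scrambled))
print("\n".join(intact))
```

In this model, the only difference between the broken sort and the correct one is how much of each line the regular expression matches. The sort key is the destination airport in both cases.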

Sorting Paragraphs and Pages

We have covered most of the sorting commands mentioned in the Emacs manual in the previous section. Now we will switch gears and look at a few of the remaining ones. We will no longer sort individual lines but paragraphs and pages instead.

  1. First create a buffer with the content provided below. Note that the text below contains three form feed characters. In Emacs, they are displayed as ^L. Many web browsers generally do not display them. The ^L symbols that we see in the text below have been overlaid with CSS. But there are actual form feed characters next to those overlays. If you are viewing this post with any decent web browser, you can copy the text below into your Emacs and you should be able to see the form feed characters in Emacs. In case you do not, insert them yourself by typing C-q C-l.

    Emacs is an advanced, extensible, customizable, self-documenting
    editor.

    Emacs editing commands operate in terms of characters, words, lines,
    sentences, paragraphs, pages, expressions, comments, etc.
    ^L
    We will use the term frame to mean a graphical window or terminal
    screen occupied by Emacs.

    At the very bottom of the frame is an echo area. The main area of
    the frame, above the echo area, is called the window.
    ^L
    The cursor in the selected window shows the location where most
    editing commands take effect, which is called point.

    If you are editing several files in Emacs, each in its own buffer,
    each buffer has its own value of point.
    ^L

  2. Our text has six paragraphs spread across three pages. Each form feed character represents a page break. Type C-x h followed by M-x sort-pages RET to sort the pages alphabetically. Note how the second page moves to the bottom because it begins with the letter "W". The buffer looks like this now:

    Emacs is an advanced, extensible, customizable, self-documenting
    editor.

    Emacs editing commands operate in terms of characters, words, lines,
    sentences, paragraphs, pages, expressions, comments, etc.
    ^L
    The cursor in the selected window shows the location where most
    editing commands take effect, which is called point.

    If you are editing several files in Emacs, each in its own buffer,
    each buffer has its own value of point.
    ^L
    We will use the term frame to mean a graphical window or terminal
    screen occupied by Emacs.

    At the very bottom of the frame is an echo area. The main area of
    the frame, above the echo area, is called the window.
    ^L

  3. Finally, type C-x h followed by M-x sort-paragraphs RET to sort the paragraphs alphabetically. The buffer looks like this now:

    At the very bottom of the frame is an echo area. The main area of
    the frame, above the echo area, is called the window.

    Emacs editing commands operate in terms of characters, words, lines,
    sentences, paragraphs, pages, expressions, comments, etc.
    ^L
    Emacs is an advanced, extensible, customizable, self-documenting
    editor.

    If you are editing several files in Emacs, each in its own buffer,
    each buffer has its own value of point.
    ^L
    The cursor in the selected window shows the location where most
    editing commands take effect, which is called point.

    We will use the term frame to mean a graphical window or terminal
    screen occupied by Emacs.
    ^L
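The two commands above differ only in what counts as a record: sort-pages moves whole pages around, whereas sort-paragraphs moves individual paragraphs. Here is a rough Python model of that difference. It is a sketch under simplifying assumptions and not how Emacs implements these commands: pages are assumed to be separated by a form feed on a line of its own, paragraphs are assumed to be separated by blank lines or form feed lines, and the names sort_pages, paragraphs and first_words are made up for this illustration.

```python
import re

FORM_FEED = "\f"  # the character that Emacs displays as ^L

# Three pages with two paragraphs each, as in the experiment above.
page_texts = [
    "Emacs is an advanced, extensible, customizable, self-documenting editor.\n\n"
    "Emacs editing commands operate in terms of characters, words, lines, "
    "sentences, paragraphs, pages, expressions, comments, etc.",
    "We will use the term frame to mean a graphical window or terminal screen "
    "occupied by Emacs.\n\n"
    "At the very bottom of the frame is an echo area. The main area of the "
    "frame, above the echo area, is called the window.",
    "The cursor in the selected window shows the location where most editing "
    "commands take effect, which is called point.\n\n"
    "If you are editing several files in Emacs, each in its own buffer, each "
    "buffer has its own value of point.",
]
PAGE_BREAK = "\n" + FORM_FEED + "\n"
buffer_text = PAGE_BREAK.join(page_texts)


def sort_pages(text):
    """Treat each page as one record and sort the pages as plain text."""
    pages = [page.strip("\n") for page in text.split(FORM_FEED)]
    return PAGE_BREAK.join(sorted(pages))


def paragraphs(text):
    """Split text into paragraphs at blank lines and at form feed lines."""
    # \s also matches the form feed, so a form feed line acts as a separator.
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]


def first_words(items):
    """Display helper: the first two words of each item."""
    return [" ".join(item.split()[:2]) for item in items]


pages_sorted = sort_pages(buffer_text)
paragraphs_sorted = sorted(paragraphs(pages_sorted))

print(first_words(pages_sorted.split(FORM_FEED)))
print(first_words(paragraphs_sorted))
```

Sorting by pages keeps the two paragraphs of each page together, so a page is ordered by how its first paragraph begins. Sorting by paragraphs compares all six paragraphs with one another regardless of the page they came from.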

References

To read and learn more about the sorting commands described above, refer to the Sorting nodes of the Emacs and Elisp manuals. Within Emacs, type the following commands to read them:

  • M-: (info "(emacs) Sorting") RET
  • M-: (info "(elisp) Sorting") RET

Further, the documentation strings for these commands have useful information too. Use the key sequence C-h f to look up the documentation strings. For example, type C-h f sort-regexp-fields RET to look up the documentation string for the sort-regexp-fields command.


© 2001–2023 Susam Pal