Text
====

The core text management data structure, which supports efficient
modifications and provides a byte string interface. Text positions
are represented as ``size_t``. Valid addresses are in the range
``[0, text_size(txt)]``. An invalid position is denoted by ``EPOS``. Access to
the non-contiguous pieces is available by means of an iterator interface
or a copy mechanism. Text revisions are tracked in a history graph.

.. note:: The text is assumed to be encoded in `UTF-8 <https://tools.ietf.org/html/rfc3629>`_.

Load
----

.. doxygengroup:: load
   :content-only:

State
-----

.. doxygengroup:: state
   :content-only:

Modify
------

.. doxygengroup:: modify
   :content-only:

Access
------

The individual pieces of the text are not necessarily stored in a
contiguous memory block. These functions copy a range of the text
into such a block.

.. doxygengroup:: access
   :content-only:
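
To make the idea of copying from non-contiguous pieces concrete, here is a
minimal, self-contained sketch. It does not use the library's types: the
``Piece`` struct and the ``pieces_copy`` function are hypothetical names
introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one piece of the text: a read-only byte span. */
typedef struct {
	const char *data;
	size_t len;
} Piece;

/*
 * Copy up to `len` bytes, starting at byte offset `pos` of the logical text
 * (the concatenation of all pieces), into the contiguous buffer `buf`.
 * Returns the number of bytes copied, which is smaller than `len` when the
 * requested range extends past the end of the text.
 */
size_t pieces_copy(const Piece *pieces, size_t count,
                   size_t pos, size_t len, char *buf) {
	assert(pieces != NULL || count == 0);
	size_t copied = 0;
	for (size_t i = 0; i < count && copied < len; i++) {
		if (pos >= pieces[i].len) {
			/* The requested range starts after this piece: skip it. */
			pos -= pieces[i].len;
			continue;
		}
		size_t avail = pieces[i].len - pos;
		size_t want = len - copied;
		size_t n = want < avail ? want : avail;
		memcpy(buf + copied, pieces[i].data + pos, n);
		copied += n;
		pos = 0; /* later pieces are read from their first byte */
	}
	return copied;
}
```

With the pieces ``"Hello"``, ``", "`` and ``"world"``, a request for 6 bytes
at offset 3 crosses all three pieces and yields ``lo, wo`` in one buffer.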

Iterator
--------

An iterator points to a given text position and provides interfaces to
adjust said position or read the underlying byte value. Functions which
take a ``char`` pointer will generally assign the byte value *after*
the iterator has been updated.

.. doxygenstruct:: Iterator

.. doxygengroup:: iterator
   :content-only:

Byte
^^^^

.. note:: A read attempt at EOF (i.e. at position ``text_size``) returns an
   artificial ``NUL`` byte which is not actually part of the file.

.. doxygengroup:: iterator_byte
   :content-only:
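
The two conventions described for iterators (the byte is assigned after the
position was updated, and a read at EOF yields an artificial ``NUL``) can be
sketched on a plain contiguous buffer. ``ByteIter``, ``byte_get`` and
``byte_next`` are hypothetical names for this illustration and are not the
library's interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical iterator over a contiguous buffer of `size` bytes. */
typedef struct {
	const char *data;
	size_t size;
	size_t pos; /* current position, valid in [0, size] */
} ByteIter;

/* Store the byte at the current position; at EOF store an artificial NUL. */
bool byte_get(const ByteIter *it, char *b) {
	assert(it != NULL && b != NULL);
	if (it->pos > it->size)
		return false;
	*b = it->pos < it->size ? it->data[it->pos] : '\0';
	return true;
}

/* Advance by one byte first, then store the byte at the *new* position. */
bool byte_next(ByteIter *it, char *b) {
	assert(it != NULL);
	if (it->pos >= it->size)
		return false; /* already at EOF, nothing to advance to */
	it->pos++;
	if (b)
		byte_get(it, b);
	return true;
}
```

Stepping from the last real byte onto EOF succeeds in this sketch and reports
``NUL``; a further step fails and leaves the position unchanged.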

Codepoint
^^^^^^^^^

These functions advance to the next/previous leading byte of a UTF-8
encoded Unicode codepoint by skipping over all continuation bytes of
the form ``10xxxxxx``.

.. doxygengroup:: iterator_code
   :content-only:
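
Skipping continuation bytes can be illustrated on a plain byte buffer. The
following is a sketch under the assumption of well-formed UTF-8 input; the
function names are hypothetical and operate on offsets rather than on the
library's iterator.

```c
#include <assert.h>
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
int is_continuation(unsigned char b) {
	return (b & 0xC0) == 0x80;
}

/* Offset of the next leading byte after `pos`, or `len` if there is none. */
size_t codepoint_next(const char *s, size_t len, size_t pos) {
	assert(s != NULL);
	if (pos >= len)
		return len;
	pos++; /* step off the current leading byte */
	while (pos < len && is_continuation((unsigned char)s[pos]))
		pos++;
	return pos;
}

/* Offset of the closest leading byte before `pos`, or 0 if there is none. */
size_t codepoint_prev(const char *s, size_t len, size_t pos) {
	assert(s != NULL);
	if (pos > len)
		pos = len;
	if (pos == 0)
		return 0;
	pos--;
	while (pos > 0 && is_continuation((unsigned char)s[pos]))
		pos--;
	return pos;
}
```

For the byte sequence ``61 C3 A9 E2 82 AC 7A`` (``a``, ``é``, ``€``, ``z``)
the leading bytes sit at offsets 0, 1, 3 and 6, so moving forward from
offset 1 lands on 3 and moving backward from 6 lands on 3 as well.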

Grapheme Clusters
^^^^^^^^^^^^^^^^^

These functions advance to the next/previous grapheme cluster.

.. note:: The grapheme cluster boundaries are currently not implemented
   according to `UAX#29 rules <http://unicode.org/reports/tr29>`_.
   Instead a base character followed by arbitrarily many combining
   characters as reported by ``wcwidth(3)`` are skipped.

.. doxygengroup:: iterator_char
   :content-only:
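
The heuristic from the note above (one base character plus any following
zero-width characters) might look roughly like the sketch below. To keep it
independent of the process locale, the width lookup is passed in as a
callback; ``WidthFn``, ``cluster_next`` and ``demo_width`` are hypothetical
names, and ``demo_width`` is only a tiny stand-in for ``wcwidth(3)`` that
knows a single block of combining marks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Width callback in the spirit of wcwidth(3): 0 is treated as "combining". */
typedef int (*WidthFn)(uint32_t codepoint);

/*
 * Given decoded code points, return the index of the first code point after
 * the cluster starting at `pos`: one base character followed by any number
 * of zero-width characters. Returns `len` when the end is reached.
 */
size_t cluster_next(const uint32_t *cps, size_t len, size_t pos, WidthFn width) {
	assert(cps != NULL && width != NULL);
	if (pos >= len)
		return len;
	pos++; /* skip the base character */
	while (pos < len && width(cps[pos]) == 0)
		pos++;
	return pos;
}

/* Stand-in for wcwidth(3): only U+0300..U+036F count as zero width here. */
int demo_width(uint32_t cp) {
	return (cp >= 0x0300 && cp <= 0x036F) ? 0 : 1;
}
```

With the code points ``e``, U+0301, U+0323, ``x``, starting at index 0 skips
the base letter and both combining marks and stops at ``x``.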

Lines
-----

Translate between 1-based line numbers and 0-based byte offsets.

.. doxygengroup:: lines
   :content-only:
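
The translation between the two numbering schemes can be sketched on a plain
buffer with ``\n`` line endings. This is a naive linear scan for illustration
only; ``NO_POS``, ``pos_by_lineno`` and ``lineno_by_pos`` are hypothetical
names, and nothing is implied about how the library computes or caches line
information.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NO_POS ((size_t)-1) /* hypothetical "invalid position" marker */

/* Byte offset of the first byte of 1-based line `lineno`, or NO_POS. */
size_t pos_by_lineno(const char *s, size_t len, size_t lineno) {
	assert(s != NULL);
	if (lineno == 0)
		return NO_POS; /* line numbers start at 1 */
	size_t pos = 0;
	for (size_t line = 1; line < lineno; line++) {
		const char *nl = memchr(s + pos, '\n', len - pos);
		if (!nl)
			return NO_POS; /* fewer lines than requested */
		pos = (size_t)(nl - s) + 1;
	}
	return pos;
}

/* 1-based number of the line containing byte offset `pos`, or 0 if invalid. */
size_t lineno_by_pos(const char *s, size_t len, size_t pos) {
	assert(s != NULL);
	if (pos > len)
		return 0; /* 0 is never a valid 1-based line number */
	size_t line = 1;
	for (size_t i = 0; i < pos; i++)
		if (s[i] == '\n')
			line++;
	return line;
}
```

For the text ``"ab\ncd\n\nef"``, line 2 starts at offset 3 and line 4 at
offset 7; conversely, offset 6 (the empty line) maps back to line 3.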

History
-------

Interfaces to the history graph.

.. doxygengroup:: history
   :content-only:

Marks
-----

A mark keeps track of a text position. Subsequent text changes will update
all marks placed after the modification point. Reverting to an older text
state will hide all affected marks; redoing the changes will restore them.

.. warning:: Due to an optimization, cached modifications (i.e. no ``text_snapshot``
   was performed between setting the mark and issuing the changes) might
   not adjust mark positions accurately.

.. doxygentypedef:: Mark

.. doxygendefine:: EMARK

.. doxygengroup:: mark
   :content-only:
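
The observable effect of a mark following its text can be modelled with plain
offsets, as in the sketch below. This is a conceptual model only and says
nothing about how marks are stored internally. All names are hypothetical,
and two choices are assumptions of the sketch: a mark located exactly at the
insertion point moves with the text that follows it, and a mark inside a
deleted range becomes invalid.

```c
#include <assert.h>
#include <stddef.h>

#define NO_POS ((size_t)-1) /* hypothetical "invalid position" marker */

/* Shift every valid mark at or after `pos` forward by the inserted length. */
void marks_on_insert(size_t *marks, size_t count, size_t pos, size_t len) {
	assert(marks != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		if (marks[i] != NO_POS && marks[i] >= pos)
			marks[i] += len;
}

/* Invalidate marks inside [pos, pos+len) and shift later marks backward. */
void marks_on_delete(size_t *marks, size_t count, size_t pos, size_t len) {
	assert(marks != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (marks[i] == NO_POS || marks[i] < pos)
			continue; /* invalid, or before the modification point */
		if (marks[i] < pos + len)
			marks[i] = NO_POS; /* the marked byte was deleted */
		else
			marks[i] -= len;
	}
}
```

Starting from marks at offsets 2, 5 and 9, inserting 3 bytes at offset 5
leaves the first mark untouched and moves the others to 8 and 12.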

Save
----

.. doxygengroup:: save
   :content-only:

Miscellaneous
-------------

.. doxygengroup:: misc
   :content-only: