Library for handling binary data.
This module contains functions for manipulating byte-oriented binaries. Although the majority of functions could be provided using bit-syntax, the functions in this library are highly optimized and are expected to either execute faster or consume less memory, or both, than a counterpart written in pure Erlang.
The module is provided according to Erlang Enhancement Proposal (EEP) 31.
Note!
The library handles byte-oriented data. For bitstrings that are not binaries (that do not contain whole octets of bits), a badarg exception is raised by any function in this module.
Types
cp()
Opaque data type representing a compiled search pattern. Guaranteed to be a tuple() to allow programs to distinguish it from non-precompiled search patterns.
part() = {Start :: integer() >= 0, Length :: integer()}
A representation of a part (or range) in a binary. Start is a zero-based offset into a binary() and Length is the length of that part. As input to functions in this module, a reverse part specification is allowed. It is constructed with a negative Length, so that the part of the binary begins at Start + Length and is -Length long. This is useful for referencing the last N bytes of a binary as {size(Binary), -N}. The functions in this module always return part()s with positive Length.
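The reverse-part rule above is plain arithmetic. The following minimal sketch makes it explicit. The module part_sketch and its functions normalize/1 and last_n/2 are hypothetical names used only for illustration. normalize/1 rewrites a reverse part into the forward form, and last_n/2 uses the {size(Binary), -N} idiom through binary:part/2.

```erlang
-module(part_sketch).
-export([normalize/1, last_n/2]).

%% Hypothetical helper: turn a possibly reverse part() into the forward
%% form. A negative Length means the part begins at Start + Length and
%% is -Length bytes long.
normalize({Start, Length}) when Length < 0 ->
    {Start + Length, -Length};
normalize({Start, Length}) ->
    {Start, Length}.

%% The last N bytes of Bin, expressed as a reverse part specification.
last_n(Bin, N) ->
    binary:part(Bin, {byte_size(Bin), -N}).
```

For example, normalize({10, -5}) gives {5, 5}, the same range that binary:part/2 selects for {10, -5}.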
Functions
at(Subject, Pos) -> byte()
Subject = binary()
Pos = integer() >= 0
Returns the byte at position Pos (zero-based) in binary Subject as an integer. If Pos >= byte_size(Subject), a badarg exception is raised.
bin_to_list(Subject) -> [byte()]
Subject = binary()
Same as bin_to_list(Subject, {0,byte_size(Subject)}).
bin_to_list(Subject, PosLen) -> [byte()]
Subject = binary()
PosLen = part()
Converts Subject to a list of byte()s, each representing the value of one byte. PosLen, a part(), denotes which part of the binary() to convert.
Example:
1> binary:bin_to_list(<<"erlang">>, {1,3}).
"rla"
%% or [114,108,97] in list notation.
If PosLen in any way references outside the binary, a badarg exception is raised.
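To show what bin_to_list/2 computes, the following sketch builds the same list in pure Erlang. It first extracts the part and then walks it one byte at a time with a binary generator. The module bin_to_list_sketch and the function slice_to_list/2 are hypothetical names. The library function is expected to be at least as efficient.

```erlang
-module(bin_to_list_sketch).
-export([slice_to_list/2]).

%% Hypothetical pure-Erlang counterpart of binary:bin_to_list/2:
%% cut out the requested part(), then emit one integer per byte.
slice_to_list(Subject, PosLen) ->
    [Byte || <<Byte>> <= binary:part(Subject, PosLen)].
```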
bin_to_list(Subject, Pos, Len) -> [byte()]
Subject = binary()
Pos = integer() >= 0
Len = integer()
Same as bin_to_list(Subject, {Pos, Len}).
compile_pattern(Pattern) -> cp()
Pattern = binary() | [binary()]
Builds an internal structure representing a compilation of a search pattern, later to be used in functions match/3, matches/3, split/3, or replace/4.
The cp() returned is guaranteed to be a tuple() to allow programs to distinguish it from non-precompiled search patterns.
When a list of binaries is specified, it denotes a set of alternative binaries to search for. For example, if [<<"functional">>,<<"programming">>] is specified as Pattern, this means either <<"functional">> or <<"programming">>. The pattern is a set of alternatives; when only a single binary is specified, the set has only one element. The order of alternatives in a pattern is not significant.
The list of binaries used for search alternatives must be flat and proper. If Pattern is not a binary or a flat proper list of binaries with length > 0, a badarg exception is raised.
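A compiled pattern is useful when the same alternatives are searched for in many subjects. The following sketch compiles the set of alternatives once and reuses the resulting cp() for every subject. The module cp_sketch and the function count_keywords/1 are hypothetical names, and the keyword list is the one from the paragraph above.

```erlang
-module(cp_sketch).
-export([count_keywords/1]).

%% Compile the alternatives once, then reuse the cp() for each subject
%% instead of passing the raw list (which would be compiled per call).
count_keywords(Subjects) ->
    CP = binary:compile_pattern([<<"functional">>, <<"programming">>]),
    true = is_tuple(CP),  % a cp() is documented to be a tuple()
    [length(binary:matches(S, CP)) || S <- Subjects].
```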
copy(Subject) -> binary()
Subject = binary()
Same as copy(Subject, 1).
copy(Subject, N) -> binary()
Subject = binary()
N = integer() >= 0
Creates a binary with the content of Subject duplicated N times.
This function always creates a new binary, even if N = 1. By using copy/1 on a binary referencing a larger binary, one can free up the larger binary for garbage collection.
Note!
By deliberately copying a single binary to avoid referencing a larger binary, one can, instead of freeing up the larger binary for later garbage collection, create much more binary data than needed. Sharing binary data is usually good. Only in special cases, when small parts reference large binaries and the large binaries are no longer used in any process, deliberate copying can be a good idea.
If N < 0, a badarg exception is raised.
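In terms of content, copy/2 is equivalent to concatenating N copies of Subject. The following sketch spells that out with standard list functions. The module copy_sketch and the function repeat/2 are hypothetical. As the module introduction states, binary:copy/2 itself is expected to be faster or leaner than such a pure-Erlang version.

```erlang
-module(copy_sketch).
-export([repeat/2]).

%% Hypothetical pure-Erlang equivalent (in content) of binary:copy/2.
repeat(Bin, N) when is_binary(Bin), is_integer(N), N >= 0 ->
    iolist_to_binary(lists:duplicate(N, Bin)).
```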
decode_unsigned(Subject) -> Unsigned
Subject = binary()
Unsigned = integer() >= 0
Same as decode_unsigned(Subject, big).
decode_unsigned(Subject, Endianness) -> Unsigned
Subject = binary()
Endianness = big | little
Unsigned = integer() >= 0
Converts the binary digit representation, in big endian or little endian, of a non-negative integer in Subject to an Erlang integer().
Example:
1> binary:decode_unsigned(<<169,138,199>>,big).
11111111
encode_unsigned(Unsigned) -> binary()
Unsigned = integer() >= 0
Same as encode_unsigned(Unsigned, big).
encode_unsigned(Unsigned, Endianness) -> binary()
Unsigned = integer() >= 0
Endianness = big | little
Converts a non-negative integer to the smallest possible representation in a binary digit representation, either big endian or little endian.
Example:
1> binary:encode_unsigned(11111111, big).
<<169,138,199>>
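Both conversions can be expressed with the bit syntax. This helps when reasoning about byte order and about what "smallest possible representation" means. In the following sketch, the module unsigned_sketch and the functions encode/2, decode/2, and byte_count/1 are hypothetical. It only illustrates what the library functions compute and is not how they are implemented.

```erlang
-module(unsigned_sketch).
-export([encode/2, decode/2]).

%% Smallest number of whole bytes that can hold N; 0 still needs one byte.
byte_count(N) when N < 256 -> 1;
byte_count(N) -> 1 + byte_count(N bsr 8).

%% Hypothetical counterpart of binary:encode_unsigned/2.
encode(N, big) when is_integer(N), N >= 0 ->
    Bits = 8 * byte_count(N),
    <<N:Bits/big-unsigned>>;
encode(N, little) when is_integer(N), N >= 0 ->
    Bits = 8 * byte_count(N),
    <<N:Bits/little-unsigned>>.

%% Hypothetical counterpart of binary:decode_unsigned/2.
decode(Bin, big) ->
    Bits = bit_size(Bin),
    <<N:Bits/big-unsigned>> = Bin,
    N;
decode(Bin, little) ->
    Bits = bit_size(Bin),
    <<N:Bits/little-unsigned>> = Bin,
    N.
```

With little endian, the same value 11111111 is laid out with the least significant byte first, as <<199,138,169>>.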
first(Subject) -> byte()
Subject = binary()
Returns the first byte of binary Subject as an integer. If the size of Subject is zero, a badarg exception is raised.
last(Subject) -> byte()
Subject = binary()
Returns the last byte of binary Subject as an integer. If the size of Subject is zero, a badarg exception is raised.
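For comparison, the following sketch writes first/1 and last/1 with bit-syntax matching. The module byte_access_sketch and its functions are hypothetical. One behavioral difference: these versions fail with function_clause rather than badarg when given an empty binary.

```erlang
-module(byte_access_sketch).
-export([first/1, last/1]).

%% Hypothetical bit-syntax counterpart of binary:first/1.
first(<<Byte, _/binary>>) ->
    Byte.

%% Hypothetical bit-syntax counterpart of binary:last/1:
%% skip all bytes except the final one.
last(Bin) when byte_size(Bin) > 0 ->
    Skip = byte_size(Bin) - 1,
    <<_:Skip/binary, Byte>> = Bin,
    Byte.
```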
list_to_bin(ByteList) -> binary()
ByteList = iodata()
Works exactly as erlang:list_to_binary/1, added for completeness.
longest_common_prefix(Binaries) -> integer() >= 0
Binaries = [binary()]
Returns the length of the longest common prefix of the binaries in list Binaries.
Example:
1> binary:longest_common_prefix([<<"erlang">>, <<"ergonomy">>]).
2
2> binary:longest_common_prefix([<<"erlang">>, <<"perl">>]).
0
If Binaries is not a flat list of binaries, a badarg exception is raised.
longest_common_suffix(Binaries) -> integer() >= 0
Binaries = [binary()]
Returns the length of the longest common suffix of the binaries in list Binaries.
Example:
1> binary:longest_common_suffix([<<"erlang">>, <<"fang">>]).
3
2> binary:longest_common_suffix([<<"erlang">>, <<"perl">>]).
0
If Binaries is not a flat list of binaries, a badarg exception is raised.
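Both functions return only a length. The following sketch combines that length with part/3 to obtain the shared bytes themselves. The suffix case uses a reverse part. The module common_affix_sketch and its functions common_prefix/1 and common_suffix/1 are hypothetical, and both assume a non-empty list.

```erlang
-module(common_affix_sketch).
-export([common_prefix/1, common_suffix/1]).

%% Hypothetical helper: the shared leading bytes of all binaries.
common_prefix([First | _] = Bins) ->
    binary:part(First, 0, binary:longest_common_prefix(Bins)).

%% Hypothetical helper: the shared trailing bytes, taken from the
%% end of the first binary with a negative (reverse) length.
common_suffix([First | _] = Bins) ->
    N = binary:longest_common_suffix(Bins),
    binary:part(First, byte_size(First), -N).
```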
match(Subject, Pattern) -> Found | nomatch
Same as match(Subject, Pattern, []).
match(Subject, Pattern, Options) -> Found | nomatch
Subject = binary()
Pattern = binary() | [binary()] | cp()
Found = part()
Options = [Option]
Option = {scope, part()}
part() = {Start :: integer() >= 0, Length :: integer()}
Searches for the first occurrence of Pattern in Subject and returns the position and length.
The function returns {Pos, Length} for the binary in Pattern, starting at the lowest position in Subject.
Example:
1> binary:match(<<"abcde">>, [<<"bcde">>, <<"cd">>],[]).
{1,4}
Even though <<"cd">> ends before <<"bcde">>, <<"bcde">> begins first and is therefore the first match. If two overlapping matches begin at the same position, the longest is returned.
Summary of the options:
{scope, {Start, Length}}
Only the specified part is searched. Return values still have offsets from the beginning of Subject. A negative Length is allowed as described in section Types in this manual.
If none of the strings in Pattern is found, the atom nomatch is returned.
For a description of Pattern, see function compile_pattern/1.
If {scope, {Start,Length}} is specified in the options such that Start > size of Subject, Start + Length < 0 or Start + Length > size of Subject, a badarg exception is raised.
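Because offsets returned under {scope, ...} are still relative to the start of Subject, a loop can resume searching directly at the end of the previous match. The following sketch collects all non-overlapping matches this way, which for these inputs yields the same result as matches/2. The module scan_sketch and the function all_matches/2 are hypothetical, and matches/2 remains the intended tool for this task.

```erlang
-module(scan_sketch).
-export([all_matches/2]).

%% Collect every non-overlapping match by calling binary:match/3
%% repeatedly, each time narrowing the scope to the unsearched tail.
all_matches(Subject, Pattern) ->
    all_matches(Subject, Pattern, 0, []).

all_matches(Subject, Pattern, From, Acc) ->
    Scope = {From, byte_size(Subject) - From},
    case binary:match(Subject, Pattern, [{scope, Scope}]) of
        nomatch ->
            lists:reverse(Acc);
        {Pos, Len} = Found ->
            %% Pos is relative to the start of Subject, not to From,
            %% so the next search simply resumes at Pos + Len.
            all_matches(Subject, Pattern, Pos + Len, [Found | Acc])
    end.
```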
matches(Subject, Pattern) -> Found
Same as matches(Subject, Pattern, []).
matches(Subject, Pattern, Options) -> Found
Subject = binary()
Pattern = binary() | [binary()] | cp()
Found = [part()]
Options = [Option]
Option = {scope, part()}
part() = {Start :: integer() >= 0, Length :: integer()}
As match/2, but Subject is searched until exhausted and a list of all non-overlapping parts matching Pattern is returned (in order).
The first and longest match is preferred to a shorter one, as illustrated by the following example:
1> binary:matches(<<"abcde">>,
[<<"bcde">>,<<"bc">>,<<"de">>],[]).
[{1,4}]
The result shows that <<"bcde">> is selected instead of the shorter match <<"bc">> (which would have given rise to one more match, <<"de">>). This corresponds to the behavior of POSIX regular expressions (and programs like awk), but is not consistent with alternative matches in re (and Perl), where instead lexical ordering in the search pattern selects which string matches.
If none of the strings in a pattern is found, an empty list is returned.
For a description of Pattern, see compile_pattern/1.
For a description of available options, see match/3.
If {scope, {Start,Length}} is specified in the options such that Start > size of Subject, Start + Length < 0 or Start + Length > size of Subject, a badarg exception is raised.
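The following sketch puts the contrast with re side by side. It uses the same subject and the same alternatives, written in the same order for both functions. binary:matches/2 picks the first and longest alternative. PCRE instead takes the first alternative, in written order, that matches at the leftmost position. The module match_order_sketch and the function compare/0 are hypothetical.

```erlang
-module(match_order_sketch).
-export([compare/0]).

compare() ->
    Subject = <<"abcde">>,
    %% binary:matches/2: first, then longest; order of alternatives is irrelevant.
    FromBinary = binary:matches(Subject, [<<"bc">>, <<"bcde">>, <<"de">>]),
    %% re: the first alternative (as written) that matches wins.
    {match, Captures} = re:run(Subject, <<"bc|bcde|de">>,
                               [global, {capture, first, index}]),
    {FromBinary, [Part || [Part] <- Captures]}.
```

Here binary:matches/2 returns [{1,4}], while re reports <<"bc">> at {1,2} and then <<"de">> at {3,2}.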
part(Subject, PosLen) -> binary()
Subject = binary()
PosLen = part()
Extracts the part of binary Subject described by PosLen.
A negative length can be used to extract bytes at the end of a binary:
1> Bin = <<1,2,3,4,5,6,7,8,9,10>>.
2> binary:part(Bin, {byte_size(Bin), -5}).
<<6,7,8,9,10>>
Note!
part/2 and part/3 are also available in the erlang module under the names binary_part/2 and binary_part/3. Those BIFs are allowed in guard tests.
If PosLen in any way references outside the binary, a badarg exception is raised.
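Because binary_part/2 is allowed in guards, a suffix test can be written directly in a clause head. The following sketch relies on standard guard semantics: if binary_part/2 raises badarg, for example when Suffix is longer than Bin, the guard simply fails and the next clause is tried. The module guard_part_sketch and the function has_suffix/2 are hypothetical.

```erlang
-module(guard_part_sketch).
-export([has_suffix/2]).

%% A reverse part {byte_size(Bin), -byte_size(Suffix)} selects the
%% trailing bytes of Bin; an exception inside the guard means "no match".
has_suffix(Bin, Suffix)
  when binary_part(Bin, {byte_size(Bin), -byte_size(Suffix)}) =:= Suffix ->
    true;
has_suffix(_, _) ->
    false.
```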
part(Subject, Pos, Len) -> binary()
Subject = binary()
Pos = integer() >= 0
Len = integer()
Same as part(Subject, {Pos, Len}).
referenced_byte_size(Binary) -> integer() >= 0
Binary = binary()
If a binary references a larger binary (often described as being a subbinary), it can be useful to get the size of the referenced binary. This function can be used in a program to trigger the use of copy/1. By copying a binary, one can dereference the original, possibly large, binary that a smaller binary is a reference to.
Example:
store(Binary, GBSet) ->
    NewBin =
        case binary:referenced_byte_size(Binary) of
            Large when Large > 2 * byte_size(Binary) ->
                binary:copy(Binary);
            _ ->
                Binary
        end,
    gb_sets:insert(NewBin, GBSet).
In this example, we chose to copy the binary content before inserting it in gb_sets:set() if it references a binary more than twice the data size we want to keep. Of course, other programs may need different rules for when to copy.
Binary sharing occurs whenever binaries are taken apart. This is the fundamental reason why binaries are fast: decomposition can always be done with O(1) complexity. In rare circumstances, however, this data sharing is undesirable, which is why this function together with copy/1 can be useful when optimizing for memory use.
Example of binary sharing:
1> A = binary:copy(<<1>>, 100).
<<1,1,1,1,1 ...
2> byte_size(A).
100
3> binary:referenced_byte_size(A).
100
4> <<_:10/binary,B:10/binary,_/binary>> = A.
<<1,1,1,1,1 ...
5> byte_size(B).
10
6> binary:referenced_byte_size(B).
100
Note!
Binary data is shared among processes. If another process still references the larger binary, copying the part this process uses only consumes more memory and does not free up the larger binary for garbage collection. Use such intrusive functions with extreme care, and only if a real problem is detected.
replace(Subject, Pattern, Replacement) -> Result
Subject = binary()
Pattern = binary() | [binary()] | cp()
Replacement = Result = binary()
Same as replace(Subject, Pattern, Replacement, []).
replace(Subject, Pattern, Replacement, Options) -> Result
Subject = binary()
Pattern = binary() | [binary()] | cp()
Replacement = binary()
Options = [Option]
Option = global | {scope, part()} | {insert_replaced, InsPos}
InsPos = OnePos | [OnePos]
OnePos = integer() >= 0
Result = binary()
Constructs a new binary by replacing the parts in Subject matching Pattern with the content of Replacement.
If the matching subpart of Subject giving rise to the replacement is to be inserted in the result, option {insert_replaced, InsPos} inserts the matching part into Replacement at the specified position (or positions) before inserting Replacement into Subject.
Example:
1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>, [{insert_replaced,1}]).
<<"a[b]cde">>
2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,1}]).
<<"a[b]c[d]e">>
3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,[1,1]}]).
<<"a[bb]c[dd]e">>
4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>,[global,{insert_replaced,[1,2]}]).
<<"a[b-b]c[d-d]e">>
If any position specified in InsPos > size of the replacement binary, a badarg exception is raised.
Options global and {scope, part()} work as for split/3.
The return type is always a binary().
For a description of Pattern, see compile_pattern/1.
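Without insert_replaced, a global replacement has the same result as splitting at every match and joining the pieces with Replacement. The following sketch shows that equivalence. The module replace_sketch and the function replace_all/3 are hypothetical, and the sketch does not cover insert_replaced or scope.

```erlang
-module(replace_sketch).
-export([replace_all/3]).

%% Hypothetical equivalent of binary:replace(Subject, Pattern, Replacement, [global]):
%% split at every match, then glue the pieces back together with Replacement.
replace_all(Subject, Pattern, Replacement) ->
    Parts = binary:split(Subject, Pattern, [global]),
    iolist_to_binary(lists:join(Replacement, Parts)).
```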
split(Subject, Pattern) -> Parts
Subject = binary()
Pattern = binary() | [binary()] | cp()
Parts = [binary()]
Same as split(Subject, Pattern, []).
split(Subject, Pattern, Options) -> Parts
Subject = binary()
Pattern = binary() | [binary()] | cp()
Options = [Option]
Option = {scope, part()} | trim | global | trim_all
Parts = [binary()]
Splits Subject into a list of binaries based on Pattern. If option global is not specified, only the first occurrence of Pattern in Subject gives rise to a split.
The parts of Pattern found in Subject are not included in the result.
Example:
1> binary:split(<<1,255,4,0,0,0,2,3>>, [<<0,0,0>>,<<2>>],[]).
[<<1,255,4>>, <<2,3>>]
2> binary:split(<<0,1,0,0,4,255,255,9>>, [<<0,0>>, <<255,255>>],[global]).
[<<0,1>>,<<4>>,<<9>>]
Summary of options:
{scope, part()}
Works as in match/3 and matches/3. Notice that this only defines the scope of the search for matching strings; it does not cut the binary before splitting. The bytes before and after the scope are kept in the result. See the example below.
trim
Removes trailing empty parts of the result (as does trim in re:split/3).
trim_all
Removes all empty parts of the result.
global
Repeats the split until Subject is exhausted. Conceptually, option global makes split work on the positions returned by matches/3, while it normally works on the position returned by match/3.
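The difference between trim and trim_all shows up with leading, repeated, and trailing separators. The following sketch splits the same comma-separated binary with global alone, with trim, and with trim_all. The module split_options_sketch and the function demo/0 are hypothetical.

```erlang
-module(split_options_sketch).
-export([demo/0]).

%% One subject, three option sets: plain global, global + trim,
%% and global + trim_all.
demo() ->
    Bin = <<",a,,b,,">>,
    {binary:split(Bin, <<",">>, [global]),
     binary:split(Bin, <<",">>, [global, trim]),
     binary:split(Bin, <<",">>, [global, trim_all])}.
```

The plain split keeps every empty part, trim drops only the trailing ones, and trim_all leaves just <<"a">> and <<"b">>.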
Example of the difference between a scope and taking the binary apart before splitting:
1> binary:split(<<"banana">>, [<<"a">>],[{scope,{2,3}}]).
[<<"ban">>,<<"na">>]
2> binary:split(binary:part(<<"banana">>,{2,3}), [<<"a">>],[]).
[<<"n">>,<<"n">>]
The return type is always a list of binaries that are all referencing Subject. This means that the data in Subject is not copied to new binaries, and that Subject cannot be garbage collected until the results of the split are no longer referenced.
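When Subject is large and only small, long-lived parts are kept, each part can be copied so that it no longer references Subject. This follows the advice given for copy/1 and referenced_byte_size/1. The following sketch does this for a global split. The module detach_sketch and the function split_detached/2 are hypothetical. As the earlier note warns, copying pays off only when no other process still references Subject.

```erlang
-module(detach_sketch).
-export([split_detached/2]).

%% Split at every match, then copy each part so that the results
%% no longer keep Subject alive.
split_detached(Subject, Pattern) ->
    [binary:copy(Part) || Part <- binary:split(Subject, Pattern, [global])].
```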
For a description of Pattern, see compile_pattern/1.