erl_tar

Unix 'tar' utility for reading and writing tar archives

The erl_tar module archives and extracts files to and from a tar file. erl_tar supports the ustar format (IEEE Std 1003.1 and ISO/IEC 9945-1). All modern tar programs (including GNU tar) can read this format. To ensure that GNU tar produces a tar file that erl_tar can read, give the --format=ustar option to GNU tar.

By convention, the name of a tar file should end in ".tar". To abide by the convention, you need to add ".tar" to the name yourself.

Tar files can be created in one operation using the create/2 or create/3 function.

Alternatively, for more control, the open/2, add/3,4, and close/1 functions can be used.

To extract all files from a tar file, use the extract/1 function. To extract only some files or to be able to specify some more options, use the extract/2 function.

To return a list of the files in a tar file, use either the table/1 or table/2 function. To print a list of files to the Erlang shell, use either the t/1 or tt/1 function.

To convert an error term returned from one of the functions above to a readable message, use the format_error/1 function.
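
The functions named above can be combined into a short round trip. The following is a minimal sketch, not a definitive recipe: the module name tar_roundtrip, the archive name, and the entry names are hypothetical, and the archive is written to the current working directory.

```erlang
-module(tar_roundtrip).
-export([run/0]).

%% Creates a small archive from in-memory data, lists its entries,
%% and reads the entries back as binaries.
run() ->
    TarFile = "tar_roundtrip_demo.tar",              % hypothetical file name
    Files = [{"hello.txt", <<"hello">>},             % {NameInArchive, binary()}
             {"dir/world.txt", <<"world">>}],
    ok = erl_tar:create(TarFile, Files),
    {ok, Names} = erl_tar:table(TarFile),
    {ok, Contents} = erl_tar:extract(TarFile, [memory]),
    ok = file:delete(TarFile),                       % remove the demo archive
    {Names, Contents}.
```

Calling tar_roundtrip:run() returns the list of entry names together with a list of {Name, Binary} tuples.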

UNICODE SUPPORT

If file:native_name_encoding/0 returns utf8, path names will be encoded in UTF-8 when creating tar files and path names will be assumed to be encoded in UTF-8 when extracting tar files.

If file:native_name_encoding/0 returns latin1, no translation of path names will be done.

OTHER STORAGE MEDIA

The erl_tar module normally accesses the tar file on disk using the file module. When other needs arise, you can define your own low-level Erlang functions to perform the reading and writing on the storage media. See init/3 for usage.

An example of this is the sftp support in ssh_sftp:open_tar/3. That function opens a tar file on a remote machine using an sftp channel.

LIMITATIONS

For maximum compatibility, it is safe to archive files with names up to 100 characters in length. Such tar files can generally be extracted by any tar program.

If filenames exceed 100 characters in length, the resulting tar file can only be correctly extracted by a POSIX-compatible tar program (such as Solaris tar), not by GNU tar.

Files with names longer than 256 bytes cannot be stored at all.

The filename of the file that a symbolic link points to is always limited to 100 characters.

Functions


add(TarDescriptor, Filename, Options) -> RetValue

  • TarDescriptor = term()
  • Filename = filename()
  • Options = [Option]
  • Option = dereference|verbose|{chunks,ChunkSize}
  • ChunkSize = positive_integer()
  • RetValue = ok|{error,{Filename,Reason}}
  • Reason = term()

The add/3 function adds a file to a tar file that has been opened for writing by open/2.

dereference

By default, symbolic links will be stored as symbolic links in the tar file. Use the dereference option to override the default and store the file that the symbolic link points to into the tar file.

verbose

Print an informational message about the file being added.

{chunks,ChunkSize}

Read data in parts from the file. This is intended for memory-limited machines that, for example, build a tar file on a remote machine over sftp.
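
As an illustration of add/3 together with the {chunks,ChunkSize} option, the following sketch writes a small source file, adds it to a newly opened tar file, and lists the result. The module name tar_add3, both file names, and the chunk size of 65536 bytes are assumptions chosen for the example. Relative names are used so that the entry is stored under the same name as the source file.

```erlang
-module(tar_add3).
-export([run/0]).

%% Adds one file from disk to a tar file, reading it in chunks.
run() ->
    Source = "tar_add3_demo.txt",                    % hypothetical file names,
    TarFile = "tar_add3_demo.tar",                   % relative to the cwd
    ok = file:write_file(Source, <<"some data to archive">>),
    {ok, Tar} = erl_tar:open(TarFile, [write]),
    %% Read the source in parts of at most 65536 bytes.
    ok = erl_tar:add(Tar, Source, [{chunks, 65536}]),
    ok = erl_tar:close(Tar),
    {ok, Names} = erl_tar:table(TarFile),
    ok = file:delete(Source),                        % clean up the demo files
    ok = file:delete(TarFile),
    Names.
```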

add(TarDescriptor, FilenameOrBin, NameInArchive, Options) -> RetValue

  • TarDescriptor = term()
  • FilenameOrBin = filename()|binary()
  • Filename = filename()
  • NameInArchive = filename()
  • Options = [Option]
  • Option = dereference|verbose|{chunks,ChunkSize}
  • ChunkSize = positive_integer()
  • RetValue = ok|{error,{Filename,Reason}}
  • Reason = term()

The add/4 function adds a file to a tar file that has been opened for writing by open/2. It accepts the same options as add/3.

NameInArchive is the name under which the file is stored in the tar file. It is also the name that the file gets when it is extracted from the tar file.
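
A minimal sketch of add/4, assuming the data to be archived is already available as a binary: the binary is stored under a chosen NameInArchive, and the archive is then read back with the memory option to show the resulting entry. The module name tar_add4, the archive name, and the entry name are hypothetical.

```erlang
-module(tar_add4).
-export([run/0]).

%% Stores an in-memory binary under an explicit name in the archive.
run() ->
    TarFile = "tar_add4_demo.tar",                   % hypothetical file name
    {ok, Tar} = erl_tar:open(TarFile, [write]),
    ok = erl_tar:add(Tar, <<"generated in memory">>, "notes/readme.txt", []),
    ok = erl_tar:close(Tar),
    {ok, Extracted} = erl_tar:extract(TarFile, [memory]),
    ok = file:delete(TarFile),                       % remove the demo archive
    Extracted.
```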

close(TarDescriptor)

  • TarDescriptor = term()

The close/1 function closes a tar file opened by open/2.

create(Name, FileList) -> RetValue

  • Name = filename()
  • FileList = [Filename|{NameInArchive, binary()}|{NameInArchive, Filename}]
  • Filename = filename()
  • NameInArchive = filename()
  • RetValue = ok|{error,{Name,Reason}}
  • Reason = term()

The create/2 function creates a tar file and archives the files whose names are given in FileList into it. The files may either be read from disk or given as binaries.

create(Name, FileList, OptionList) -> RetValue

  • Name = filename()
  • FileList = [Filename|{NameInArchive, binary()}|{NameInArchive, Filename}]
  • Filename = filename()
  • NameInArchive = filename()
  • OptionList = [Option]
  • Option = compressed|cooked|dereference|verbose
  • RetValue = ok|{error,{Name,Reason}}
  • Reason = term()

The create/3 function creates a tar file and archives the files whose names are given in FileList into it. The files may either be read from disk or given as binaries.

The options in OptionList modify the defaults as follows.

compressed

The entire tar file will be compressed, as if it had been run through the gzip program. To abide by the convention that the name of a compressed tar file should end in ".tar.gz" or ".tgz", you need to add the appropriate extension yourself.

cooked

By default, the open/2 function will open the tar file in raw mode, which is faster but does not allow a remote (Erlang) file server to be used. Adding cooked to the mode list will override the default and open the tar file without the raw option.

dereference

By default, symbolic links will be stored as symbolic links in the tar file. Use the dereference option to override the default and store the file that the symbolic link points to into the tar file.

verbose

Print an informational message about each file being added.
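
The compressed option can be sketched as follows. The example creates a gzip-compressed archive from two binaries, checks that the file starts with the gzip magic bytes, and reads the entries back using the compressed and memory options of extract/2. The module name tar_create_gz and all file and entry names are hypothetical.

```erlang
-module(tar_create_gz).
-export([run/0]).

%% Creates a compressed archive and reads it back into memory.
run() ->
    TarFile = "tar_create_gz_demo.tar.gz",           % extension added by hand
    Files = [{"a.txt", <<"alpha">>}, {"b.txt", <<"beta">>}],
    ok = erl_tar:create(TarFile, Files, [compressed]),
    %% A gzip stream starts with the two bytes 16#1f, 16#8b.
    {ok, <<16#1f, 16#8b, _/binary>>} = file:read_file(TarFile),
    {ok, Contents} = erl_tar:extract(TarFile, [compressed, memory]),
    ok = file:delete(TarFile),                       % remove the demo archive
    Contents.
```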

extract(Name) -> RetValue

  • Name = filename() | {binary,Binary} | {file,Fd}
  • Binary = binary()
  • Fd = file_descriptor()
  • RetValue = ok|{error,{Name,Reason}}
  • Reason = term()

The extract/1 function extracts all files from a tar archive.

If the Name argument is given as "{binary,Binary}", the contents of the binary are assumed to be a tar archive.

If the Name argument is given as "{file,Fd}", Fd is assumed to be a file descriptor returned from the file:open/2 function.

Otherwise, Name should be a filename.

extract(Name, OptionList) -> RetValue

  • Name = filename() | {binary,Binary} | {file,Fd}
  • Binary = binary()
  • Fd = file_descriptor()
  • OptionList = [Option]
  • Option = {cwd,Cwd}|{files,FileList}|compressed|cooked|memory|keep_old_files|verbose
  • Cwd = [dirname()]
  • FileList = [filename()]
  • RetValue = ok|MemoryRetValue|{error,{Name,Reason}}
  • MemoryRetValue = {ok, [{NameInArchive,binary()}]}
  • NameInArchive = filename()
  • Reason = term()

The extract/2 function extracts files from a tar archive.

If the Name argument is given as "{binary,Binary}", the contents of the binary are assumed to be a tar archive.

If the Name argument is given as "{file,Fd}", Fd is assumed to be a file descriptor returned from the file:open/2 function.

Otherwise, Name should be a filename.

The following options modify the defaults for the extraction.

{cwd,Cwd}

Files with relative filenames will by default be extracted to the current working directory. Given the {cwd,Cwd} option, the extract/2 function will extract into the directory Cwd instead of to the current working directory.

{files,FileList}

By default, all files will be extracted from the tar file. Given the {files,FileList} option, the extract/2 function will only extract the files whose names are included in FileList.

compressed

Given the compressed option, the extract/2 function will uncompress the file while extracting. If the tar file is not actually compressed, the compressed option will effectively be ignored.

cooked

By default, the open/2 function will open the tar file in raw mode, which is faster but does not allow a remote (Erlang) file server to be used. Adding cooked to the mode list will override the default and open the tar file without the raw option.

memory

Instead of extracting to a directory, the memory option will give the result as a list of tuples {Filename, Binary}, where Binary is a binary containing the extracted data of the file named Filename in the tar file.

keep_old_files

By default, all existing files with the same name as a file in the tar file will be overwritten. Given the keep_old_files option, the extract/2 function will not overwrite any existing files.

verbose

Print an informational message as each file is being extracted.
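
To illustrate how {cwd,Cwd} and {files,FileList} combine, the following sketch creates an archive with two entries and extracts only one of them into a separate directory. The module name tar_extract_some, the directory name, and the entry names are hypothetical. The output directory is created up front rather than relying on extract/2 to create it.

```erlang
-module(tar_extract_some).
-export([run/0]).

%% Extracts a single named entry into a chosen directory.
run() ->
    TarFile = "tar_extract_some_demo.tar",           % hypothetical names
    OutDir = filename:absname("tar_extract_some_out"),
    Kept = filename:join(OutDir, "keep.txt"),
    Skipped = filename:join(OutDir, "skip.txt"),
    ok = erl_tar:create(TarFile, [{"keep.txt", <<"wanted">>},
                                  {"skip.txt", <<"unwanted">>}]),
    ok = filelib:ensure_dir(Kept),                   % makes sure OutDir exists
    ok = erl_tar:extract(TarFile, [{cwd, OutDir}, {files, ["keep.txt"]}]),
    Result = {file:read_file(Kept), filelib:is_file(Skipped)},
    _ = file:delete(Kept),                           % clean up the demo files
    _ = file:del_dir(OutDir),
    ok = file:delete(TarFile),
    Result.
```

The first element of the returned tuple is the result of reading the extracted file; the second tells whether the entry left out of FileList was written to disk.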

format_error(Reason) -> string()

  • Reason = term()

The format_error/1 function converts an error reason term to a human-readable error message string.
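
A small sketch of error reporting with format_error/1. It assumes that the whole term following the error atom, that is {Name,Reason}, can be passed to format_error/1 unchanged. The module name tar_errors and the function describe/1 are hypothetical.

```erlang
-module(tar_errors).
-export([describe/1]).

%% Returns the entry names of TarFile, or a readable message on failure.
describe(TarFile) ->
    case erl_tar:table(TarFile) of
        {ok, Names} ->
            {ok, Names};
        {error, ErrorTerm} ->
            %% ErrorTerm is assumed to have the form {Name, Reason}.
            {error, erl_tar:format_error(ErrorTerm)}
    end.
```

Calling tar_errors:describe/1 with the name of a file that does not exist returns {error, Message}, where Message is a string suitable for printing.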

open(Name, OpenModeList) -> RetValue

  • Name = filename()
  • OpenModeList = [OpenMode]
  • OpenMode = write|compressed|cooked
  • RetValue = {ok,TarDescriptor}|{error,{Name,Reason}}
  • TarDescriptor = term()
  • Reason = term()

The open/2 function creates a tar file for writing. (Any existing file with the same name will be truncated.)

By convention, the name of a tar file should end in ".tar". To abide by the convention, you need to add ".tar" to the name yourself.

In addition to the write atom, the following atoms may be added to OpenModeList:

compressed

The entire tar file will be compressed, as if it had been run through the gzip program. To abide by the convention that the name of a compressed tar file should end in ".tar.gz" or ".tgz", you need to add the appropriate extension yourself.

cooked

By default, the open/2 function will open the tar file in raw mode, which is faster but does not allow a remote (Erlang) file server to be used. Adding cooked to the mode list will override the default and open the tar file without the raw option.

Use the add/3,4 functions to add one file at a time to an opened tar file. When you have finished adding files, use the close/1 function to close the tar file.

Warning!

The TarDescriptor term is not a file descriptor. You should not rely on the specific contents of the TarDescriptor term, as it may change in future versions as more features are added to the erl_tar module.

init(UserPrivate, AccessMode, Fun) -> {ok,TarDescriptor} | {error,Reason}

  • UserPrivate = term()
  • AccessMode = [write] | [read]
  • Fun when AccessMode is [write] = fun(write, {UserPrivate,DataToWrite})->...; (position,{UserPrivate,Position})->...; (close, UserPrivate)->... end
  • Fun when AccessMode is [read] = fun(read2, {UserPrivate,Size})->...; (position,{UserPrivate,Position})->...; (close, UserPrivate)->... end
  • TarDescriptor = term()
  • Reason = term()

The Fun defines how the low-level storage operations are performed when the higher-level tar handling functions (add/3, add/4, close/1, ...) need them.

The Fun is called whenever a tar function needs to perform a low-level operation, such as writing a block to a file. It is called as Fun(Op,{UserPrivate,Parameters...}), where Op is the operation name, UserPrivate is the term passed as the first argument to init/3, and Parameters... is the data that the tar function passes down to the storage handling function.

The parameter UserPrivate is typically the result of opening a low-level structure such as a file descriptor or an sftp channel id. The different Fun clauses operate on that very term.

The parameter lists of the Fun clauses are:

(write, {UserPrivate,DataToWrite})
Write the term DataToWrite using UserPrivate.
(close, UserPrivate)
Close the access.
(read2, {UserPrivate,Size})
Read using UserPrivate, but only Size bytes. Note that there is only an arity-2 read operation, not an arity-1 one.
(position,{UserPrivate,Position})
Set the position of UserPrivate as defined for files in file:position/2.

A complete Fun parameter for reading and writing on files using the file module could be:

	  ExampleFun = 
	     fun(write, {Fd,Data}) ->  file:write(Fd, Data);
	        (position, {Fd,Pos}) -> file:position(Fd, Pos);
	        (read2, {Fd,Size}) -> file:read(Fd,Size);
	        (close, Fd) -> file:close(Fd)
	     end
	

where Fd was given to the init/3 function as:

	  {ok,Fd} = file:open(Name,...).
	  {ok,TarDesc} = erl_tar:init(Fd, [write], ExampleFun),

The TarDesc is then used:

	  erl_tar:add(TarDesc, SomeValueIwantToAdd, FileNameInTarFile, []),
	  ....,
	  erl_tar:close(TarDesc)

When the erl_tar core wants to, for example, write a piece of Data, it calls ExampleFun(write,{UserPrivate,Data}).

Note!

There is no need to use the example above with file module operations directly, since that is in principle what the open/2 function does.

Warning!

The TarDescriptor term is not a file descriptor. You should not rely on the specific contents of the TarDescriptor term, as it may change in future versions as more features are added to the erl_tar module.

table(Name) -> RetValue

  • Name = filename()
  • RetValue = {ok,[string()]}|{error,{Name,Reason}}
  • Reason = term()

The table/1 function retrieves the names of all files in the tar file Name.

table(Name, Options)

  • Name = filename()

The table/2 function retrieves the names of all files in the tar file Name.

t(Name)

  • Name = filename()

The t/1 function prints the names of all files in the tar file Name to the Erlang shell. (Similar to "tar t".)

tt(Name)

  • Name = filename()

The tt/1 function prints names and information about all files in the tar file Name to the Erlang shell. (Similar to "tar tv".)