split and cat

Custom / CUSTOM
March 12, 2010 at 07:24:16
Specs: unix/linux, 2.401 GHz / 2047 MB
I've been testing file splitting with the split command, but it does not seem to work properly. I'll post the exact commands, but it comes down to this:

- splitting a tar.gz file into pieces of just under 2 GB with split
- rebuilding the file with cat, keeping the alphabetical order of the filenames in mind

Both steps complete without complaint, but when I un-gzip the result, gzip reports that the file is corrupt.

Should work, no ?
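
For reference, a minimal sketch of the round trip described above, on a small stand-in file rather than the real archive. The file names, the 100 KiB size and the 30000-byte piece size are made up for the demonstration. On GNU systems, adding -d to split should give numeric suffixes like the .000 names used in this thread.

```shell
#!/bin/bash
# Round-trip sketch: split a binary file into fixed-size pieces, then rebuild it.
# All names and sizes below are hypothetical stand-ins, not the poster's setup.
work=$(mktemp -d)
cd "$work" || exit 1

# 100 KiB of random bytes stands in for the real tar.gz.
dd if=/dev/urandom of=sample.bin bs=1024 count=100 2>/dev/null

# -b splits by byte count (not by lines); -a 3 asks for three-character suffixes.
split -b 30000 -a 3 sample.bin sample.bin.

# The shell expands the glob in sorted order, so the pieces are read in sequence.
cat sample.bin.* > rebuilt.bin

# cmp -s prints nothing and returns 0 only when both files are byte-identical.
if cmp -s sample.bin rebuilt.bin; then
    roundtrip=identical
else
    roundtrip=different
fi
echo "round trip: $roundtrip"
```

If a test like this reports a difference on your machine, the split or cat step is the place to look; if not, the problem is more likely elsewhere in the process.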



#1
March 12, 2010 at 09:33:59
This is what I use to reconstruct the file:

1. Delete env001.tar.gz.

2. Concatenate the pieces (there are 10 files matching this pattern, from .000 to .009):

for f in env001.tar.gz.*
do
    echo "$f"
    cat "$f" >> env001.tar.gz
done

3. gzip -d env001.tar.gz



#2
March 12, 2010 at 10:43:49
tar and zip files are binary files - not ASCII. IMO, I don't think you will have much luck splitting them with the unix/linux split command.


#3
March 12, 2010 at 12:51:06
Of course they are binary, but I don't see the issue ... splitting a file means taking it apart and then putting it back together, no matter what the content is, no?

I'll check the man page for split, but IMO there is no such restriction.

And if there is, how the f___ are you supposed to split a file on linux/unix? By that I mean any file, not only a text file (a tool that can only split text files would be useless in itself, and rather silly).



#4
March 12, 2010 at 13:13:07
Oh wait ... it's the cat command that does not know how to handle binary files!?

Problem is not with the SPLIT command ...



#5
March 14, 2010 at 06:14:50
Check the resulting files; I assume a few \n characters are sneaking in there somehow.
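
One way to check for stray bytes is to compare byte counts: the pieces together should be exactly as large as the rebuilt file, and any added newline would show up as a difference. A minimal sketch, with made-up file names and sizes:

```shell
#!/bin/bash
# Sketch: compare the combined size of the pieces with the size of the rebuilt file.
# demo.gz is only a hypothetical stand-in (random bytes), not a real gzip archive.
work=$(mktemp -d)
cd "$work" || exit 1
dd if=/dev/urandom of=demo.gz bs=1024 count=64 2>/dev/null

split -b 20000 demo.gz demo.gz.
cat demo.gz.* > demo.rebuilt

# wc -c counts bytes; the $(( )) wrapper strips any padding some wc versions print.
pieces_bytes=$(( $(cat demo.gz.* | wc -c) ))
rebuilt_bytes=$(( $(wc -c < demo.rebuilt) ))
original_bytes=$(( $(wc -c < demo.gz) ))

echo "original: $original_bytes  pieces: $pieces_bytes  rebuilt: $rebuilt_bytes"
if [ "$pieces_bytes" -ne "$rebuilt_bytes" ]; then
    echo "size mismatch: something was added or lost while concatenating"
fi
```

Equal sizes do not prove the content is identical, but a mismatch pinpoints the step where bytes were added or dropped.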


#6
March 14, 2010 at 23:13:05
tvc:

As Razor touched on - and as you are finding out - the problem is not splitting a binary file - it's gluing it back together.

I don't have experience with this command, but since you are trying to split tar files, you might want to check out split-tar:

http://www.informatik-vollmer.de/so...




#7
March 15, 2010 at 03:38:46
I may test it, but I prefer to stick with the standard commands ...


#8
March 15, 2010 at 09:26:37
does cpio not work?


#9
March 15, 2010 at 12:14:54
Forgot about that one (because I don't like that tool) ... does it support spanning? (If not, there's no point in using it.)


#10
March 15, 2010 at 13:56:31
I thought it did when I first read the manual, but on review it appears not. The -C option looked like it would chunk the file, but it doesn't seem to work that way. Sorry...


#11
March 15, 2010 at 14:52:27
Doesn't tar have a chunking option? If so, can't you just tar your gzip'd tar ball?

Apparently GNU tar does have such an option; it's -L, --tape-length N, where N is in units of 1024 bytes. Not sure if it sticks a number in the file name somewhere, or if you need an actual removable drive.



#12
March 15, 2010 at 14:57:15
I tried something like that, but tar paused, asked me to change the tape, and then wanted to continue writing the next chunk to the same file name (since it thinks it's a tape, but it is not).


#13
March 15, 2010 at 15:02:13
Could you run tar in the background, and manually rename the file? Sounds like it'd be a PITA, though.


#14
March 15, 2010 at 15:11:47
Yes ... something like that should be possible. I'll test it that way. The only other (completely different) idea I have is to convert to hex and then use cat ... but I haven't found the time yet.


edit: the split option in tar (using --tape-length) does work, but:
- you have to step in manually between volumes, or find a way to script it (something tells me that would mean reinventing the wheel)
- you cannot use the --gzip option, because tar says you cannot combine --gzip and --tape-length (this would be necessary since the splitting may not be done BEFORE the compression ... if you split while using tar, the data MUST be compressed already, otherwise ... aaargh)

And, when recreating, it seems to work with the ... well, forgot the option name ... option.

;)

Manually as well, of course.
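
As far as I know, GNU tar can avoid the tape-change prompt when it is given one -f option per volume together with --multi-volume (-M): it then uses the named files in sequence. The sketch below assumes GNU tar and skips itself otherwise; the file names, the 150 KiB payload and the 100 KiB volume size are hypothetical, and two volumes are enough only because the payload was chosen to fit.

```shell
#!/bin/bash
# Sketch: multi-volume archive with GNU tar, one -f per volume so tar need not prompt.
# Assumes GNU tar; all names and sizes here are made up for the demonstration.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir src out
dd if=/dev/urandom of=src/payload.bin bs=1024 count=150 2>/dev/null

if tar --version 2>/dev/null | grep -q 'GNU tar'; then
    # -M: multi-volume; -L 100: switch to the next volume after 100 x 1024 bytes.
    # stdin comes from /dev/null so an unexpected prompt cannot hang the script.
    tar -c -M -L 100 -f vol1.tar -f vol2.tar -C src payload.bin < /dev/null

    # Extraction also needs -M and the volumes listed in the same order.
    tar -x -M -f vol1.tar -f vol2.tar -C out < /dev/null

    if cmp -s src/payload.bin out/payload.bin; then
        tar_roundtrip=ok
    else
        tar_roundtrip=failed
    fi
else
    tar_roundtrip=skipped   # not GNU tar, nothing was tested
fi
echo "tar multi-volume round trip: $tar_roundtrip"
```

This does not lift the restriction on combining --gzip with multi-volume archives, so the data still has to be compressed before it is handed to tar.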



#15
March 15, 2010 at 19:20:34
It's really bizarre that *nix doesn't have a "native" command to just paste files together. Even DOS can do that!
(copy a + b + c dd.out)
I fished around and did find "paste" (imagine that!), and it might work with the -s -d \0 options, but it still wants to put a single linefeed at the end of the very last line. Tail might remove that.
(edit: "tail" won't, but "head -c -1" will, I think.)
(edit again: I tried it out and it bombed, but I tried it with cat and it seemed to work OK.
I split a .zip file into 4 chunks of 10000 bytes, then "catted" them back together, downloaded the result and ran pkunzip -v, and it did not complain. Something else might be going on in your process...)


#16
March 16, 2010 at 05:19:16
It may depend on the content of those files ... this is what I get:

[root@localhost ~]# /root/pakuiteen
env001.tar.gz.000 - Tue Mar 16 04:12:27 CET 2010
env001.tar.gz.001 - Tue Mar 16 04:14:45 CET 2010
env001.tar.gz.002 - Tue Mar 16 04:17:10 CET 2010
env001.tar.gz.003 - Tue Mar 16 04:19:36 CET 2010
env001.tar.gz.004 - Tue Mar 16 04:22:03 CET 2010
env001.tar.gz.005 - Tue Mar 16 04:24:32 CET 2010
env001.tar.gz.006 - Tue Mar 16 04:27:06 CET 2010
env001.tar.gz.007 - Tue Mar 16 04:29:49 CET 2010
env001.tar.gz.008 - Tue Mar 16 04:32:20 CET 2010
env001.tar.gz.009 - Tue Mar 16 04:34:40 CET 2010
env001.tar.gz.010 - Tue Mar 16 04:37:06 CET 2010
Tue Mar 16 04:37:36 CET 2010

gzip: env001.tar.gz: invalid compressed data--format violated
Tue Mar 16 05:25:03 CET 2010
You have new mail in /var/spool/mail/root
[root@localhost ~]#



#17
March 16, 2010 at 05:20:54
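To recreate the .gz file from the split pieces:

#!/bin/bash

tarfile=env001.tar

# ls fails when the glob matches nothing, i.e. when no pieces are present
if ! ls "${tarfile}".gz.* > /dev/null 2>&1
then
    echo
    echo "ERROR - missing source files"
    echo
    exit 1
fi

if [ -f "${tarfile}.gz" ]
then
    echo
    echo "INFO - The target file (${tarfile}.gz) already exists"
    echo
    exit 1
fi

# The shell expands the glob in sorted order, so the pieces are appended in sequence
for f in "${tarfile}".gz.*
do
    echo "$f - $(date)"
    cat "$f" >> "${tarfile}.gz"
done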



#18
March 16, 2010 at 05:21:26
To extract the .gz file:

#!/bin/bash

tarfile=env001.tar

if [ ! -f "${tarfile}.gz" ]
then
    echo
    echo "ERROR - Missing file ${tarfile}.gz"
    echo
    exit 1
fi

if [ -f "$tarfile" ]
then
    echo
    echo "INFO - Extracted file (${tarfile}) already exists."
    echo
    exit 1
fi

date
gzip -d "${tarfile}.gz"
date
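
A possible shortcut, sketched under the assumption that the pieces sort correctly by name: gzip can test or decompress the concatenated stream straight from a pipe, so the full .gz file never has to be rebuilt on disk. The names and sizes below are made up for the demonstration.

```shell
#!/bin/bash
# Sketch: test and unpack a split tar.gz without first rebuilding it on disk.
# demo.tar.gz and its contents are hypothetical stand-ins for the real archive.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir data out
dd if=/dev/urandom of=data/blob.bin bs=1024 count=80 2>/dev/null

# Build a small tar.gz and split it into 25000-byte pieces.
tar -cf - data | gzip > demo.tar.gz
split -b 25000 -a 3 demo.tar.gz demo.tar.gz.

# gzip -t only verifies the compressed stream; it writes no output file.
if cat demo.tar.gz.* | gzip -t; then
    gzip_check=ok
else
    gzip_check=corrupt
fi
echo "gzip integrity check: $gzip_check"

# Decompress to stdout and let tar read the archive from stdin (-f -).
cat demo.tar.gz.* | gzip -dc | tar -xf - -C out
```

Running the gzip -t step first takes about as long as a real decompression, but it reports corruption without leaving a half-written output file behind.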



#19
March 21, 2010 at 15:46:30
I found what the problems were with the above:

- cat apparently mishandles some characters, so if the file contains them, the rebuilt file is not correct.
- When splitting (so this is NOT a problem with cat, it is a problem with split), you have to make sure the split does not occur in the middle of a line; otherwise the cat command (used as shown above) introduces extra line breaks, which corrupt the content as well. Maybe (not tested) cat can be used in a way that avoids this, but I opted to split the files EXACTLY at the end of a line, not in the middle (that is, at a random place).
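
The mid-line theory can be tested directly with a tiny text file that is deliberately cut in the middle of its lines. As far as I know, both split -b and cat copy bytes unchanged, so the two checksums are expected to match; if they do on your system as well, the corruption probably comes from another step (a missing piece, pieces read in the wrong order, or a damaged transfer). The file names and the 7-byte piece size are made up.

```shell
#!/bin/bash
# Sketch: split a text file in the middle of its lines and check the rebuilt copy.
# All names are hypothetical; the 7-byte piece size is chosen to cut lines apart.
work=$(mktemp -d)
cd "$work" || exit 1

# Three lines, 32 bytes in total.
printf 'first line\nsecond line\nlast one\n' > lines.txt

# Byte-based split: piece boundaries fall wherever the count says, mid-line included.
split -b 7 lines.txt lines.txt.
set -- lines.txt.*
piece_count=$#

cat lines.txt.* > lines.rebuilt

# cksum prints a CRC and a byte count; identical output means identical content
# for all practical purposes.
sum_before=$(cksum < lines.txt)
sum_after=$(cksum < lines.rebuilt)

if [ "$sum_before" = "$sum_after" ]; then
    echo "rebuilt file matches the original ($piece_count pieces)"
else
    echo "rebuilt file differs from the original"
fi
```

The same cksum comparison works on the real pieces: record the checksum of the archive before splitting and compare it with the checksum of the rebuilt file.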

