

Trying to split up a large txt file


Name: work-guy
Date: November 18, 2004 at 12:48:19 Pacific
OS: Misc
CPU/Ram: 1 gb
Comment:

Hello all,

I've got a rather large txt file here at work that we received from another company, and I'm trying to get it into a database.

Each line in the file runs out to 1025 characters, and the last 700 or so are blank. The part that does have data in it (roughly the first 340 characters) is a bit longer than any of my database utilities will handle.

I'm using ssed ver. 3.48 on a Win2K workstation.

I'm thinking I can use:
ssed "s/[ \t]*$//" filename > newfilename
to delete the trailing whitespace. But ssed didn't like that command. So I'm now trying
ssed "s/\s\'//" filename > newfilename
But I'm getting a file with the same whitespace at the end.

Any assistance would be much appreciated!


Working guy




Response Number 1
Name: nails
Date: November 18, 2004 at 13:17:26 Pacific
Reply:

Hi:

First of all, by "ssed", I think you mean "sed"?

Second, what was it that sed didn't like? This command works on my Solaris 7 system:

ssed "s/[ \t]*$//" filename

provided the \t is an actual tab character. In vi you create this by being in insert/add mode, pressing keys control-v, and then hitting the tab key.

With lines as long as you're talking about, you might have overrun the sed buffer. If that's the case, you might try the GNU version of sed, available for download at gnu.org.

Regards,

Nails
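For readers without a working sed, the same substitution can be sketched in Python. This is only an illustration of what s/[ \t]*$// does to each line, not something anyone in the thread ran; the function names are made up, and the latin-1 encoding is an assumption chosen so that any byte in the file decodes without error.

```python
def strip_trailing_blanks(line: str) -> str:
    """Remove trailing spaces and tabs, keeping the line ending (if any)."""
    body = line.rstrip("\r\n")      # text without its CR/LF terminator
    ending = line[len(body):]       # the terminator itself, possibly empty
    return body.rstrip(" \t") + ending


def strip_file(src: str, dst: str) -> None:
    """Copy src to dst, stripping trailing spaces and tabs from every line."""
    # newline="" leaves the original line endings untouched.
    # latin-1 is an assumption: it maps every byte to a character.
    with open(src, encoding="latin-1", newline="") as fin, \
         open(dst, "w", encoding="latin-1", newline="") as fout:
        for line in fin:
            fout.write(strip_trailing_blanks(line))
```

Calling strip_file("filename", "newfilename") would play the role of the redirected sed command. Lines are read one at a time, so line length is not a concern here.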



Response Number 2
Name: work-guy
Date: November 18, 2004 at 13:25:01 Pacific
Reply:

Nails,
Thanks for the reply!

I'm using a windows 2000 OS, no unix. The ssed stands for Super Sed. It's a windows/updated version of sed.

The command ssed "s/[ \t]*$//" filename probably does run correctly in a unix environment, but I don't have one here to work on. A few years ago I did some unix admin on a HPUX box, but have no access to it or linux at this time.

I'll see if I can't find a straight version of sed around that will work in Windows...

The closest command I could find that would work with ssed was:
ssed "s/\s\'//" filename > newfilename

The features I found were:
\s - any whitespace character [space, TAB, VT, FF, \n]
\' - matches the end of the pattern space: same as "$"

So I was hoping it would work.

It seems to create the second file but all the data comes over including the 700 or so Whitespaces.

Thanks again for your input.


Working guy



Response Number 3
Name: Wolfbone
Date: November 18, 2004 at 13:47:28 Pacific
Reply:

ssed appears to use perl regexps but anyway it looks like you only searched for one whitespace character. Did you try this?:

ssed "s/\s+\'//" filename > newfilename
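The difference between matching one whitespace character and matching a run of them is easy to see in a small Python sketch. Python's \Z (end of string) stands in for sed's \' here; that mapping, and the sample line, are assumptions made for illustration only.

```python
import re

# A sample line: data followed by five trailing spaces.
line = "2002" + " " * 5

# \s matches exactly one whitespace character, so only the last one goes.
one = re.sub(r"\s\Z", "", line)

# \s+ matches the whole run of trailing whitespace.
run = re.sub(r"\s+\Z", "", line)

print(repr(one))
print(repr(run))
```

With 700 trailing blanks per line, removing a single one would leave output that looks unchanged, which matches what was reported above.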



Response Number 4
Name: work-guy
Date: November 18, 2004 at 15:47:09 Pacific
Reply:

Wolfbone,

I just tried:

ssed "s/\s+\'//" filename > newfilename

And got the same thing. The last part of the data on every line in the file is 2002.

The first set of data is the same on every line, as is the last set.
Maybe I could try a command to just grab everything from the first set of data and the last set on each line and export it to a new file? An example would be:

ssed "s/\{start,end\}/ err.. I'm not sure what the rest would be.. any help would be great here..

Thanks


Working guy



Response Number 5
Name: Wolfbone
Date: November 18, 2004 at 16:04:18 Pacific
Reply:

I'm not sure. I thought from what you wrote earlier that the rightmost part of each line is just whitespace. If it is "2002", then you'd need to use that instead of "\'"; if not, Nails would know more about perl-style regexps than I do anyway.



Response Number 6
Name: work-guy
Date: November 18, 2004 at 16:13:04 Pacific
Reply:

The right 700 characters are all whitespace. The "2002" starts at 321 and ends at 325.

I was suggesting maybe I could grab everything between the beginning of the line "start" and the end of the data in the line "2002". Does that make sense?

Working guy
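Grabbing everything from the start of the line up to the end of the data is a fixed-width cut, which can be sketched in Python as below. The function name is hypothetical, and the width of 325 is taken from the figure quoted above; whether that figure is 1-based or inclusive is not stated, so it may need adjusting by one.

```python
def keep_data_columns(line: str, width: int = 325) -> str:
    """Keep the first `width` characters of a line, then trim trailing blanks."""
    body = line.rstrip("\r\n")      # text without its CR/LF terminator
    ending = line[len(body):]       # the terminator itself, possibly empty
    # Slicing is safe on short lines: it simply returns what is there.
    return body[:width].rstrip(" \t") + ending


# Example: 321 data characters, "2002", then 700 blanks.
sample = "x" * 321 + "2002" + " " * 700 + "\n"
print(len(keep_data_columns(sample)))
```

This approach does not depend on the trailing characters being plain spaces or tabs, so it would also discard any stray codes sitting past the cut-off column.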



Response Number 7
Name: Wolfbone
Date: November 18, 2004 at 17:51:40 Pacific
Reply:

Well I've just installed ssed and it only uses perl regexps if you tell it to with -R. The following command got rid of all the trailing spaces and tabs in a test file:

ssed 's/[ \t]\+$//' in > out



Response Number 8
Name: work-guy
Date: November 18, 2004 at 23:24:26 Pacific
Reply:

Well I tried it again here at home and still couldn't get it to work.

While doing more research, I found a Windows-based text editor that allowed me to open the file, and found that I might have chopped off the end of the file earlier today without knowing it (the editor I was using at work might not have displayed it properly).

But I also found out that the file has some binary or non-DOS-friendly codes in it. I was able to strip those out with the editor and am in business (well, at least with the smaller subset of data I sent home to test with).

I did try the sed -n p command before attempting to access the file with the editor, but I can't tell whether it worked.

Working guy
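Stripping stray binary codes can also be done outside an editor. The sketch below is one possible approach, not what was actually done in the thread: the thread does not say which codes were in the file, so this assumes the real data is plain ASCII and drops every byte that is not printable ASCII, a tab, a carriage return, or a line feed. The function name is made up for illustration.

```python
def drop_control_bytes(data: bytes) -> bytes:
    """Keep printable ASCII plus tab, CR and LF; drop every other byte."""
    # Assumption: the wanted data is plain ASCII. Bytes >= 0x80 (for example
    # accented letters in an 8-bit code page) would be removed as well.
    keep = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    return bytes(b for b in data if b in keep)


# Example: a NUL, a Ctrl-Z (0x1A) and a 0xFF byte mixed into a line.
print(drop_control_bytes(b"ab\x00c\x1a\r\n\xff"))
```

The file would need to be opened in binary mode ("rb" and "wb") so that Windows does not stop reading at a Ctrl-Z or translate line endings along the way.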

