Solved: How to break large .txt files into small chunks via Python or DOS

January 18, 2012 at 12:03:18
Specs: Windows XP
I have several text files that are too large to manipulate in Word, and I want to break them into smaller chunks. I've tried doing it manually by drag-and-drop, or by using "cut" in Notepad or Word and pasting the clipboard into a new document, but it's painfully slow and tricky: one twitch of the mouse and all the highlighting can disappear, and I don't have the patience.

I have very rudimentary knowledge of DOS. I can use DIR, CD and COPY, but I don't see a way to copy just a specific number of lines. I can combine a lot of small datasets with COPY but can't do the reverse.
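COPY has no switch for copying a set number of lines, but the reverse of combining files takes only a few lines of Python. A minimal sketch, assuming Python 3; split_by_lines is a made-up helper name for illustration, not a library function:

```python
from itertools import islice


def split_by_lines(lines, lines_per_chunk):
    """Yield successive lists holding at most `lines_per_chunk` lines.

    `lines` can be any iterable of strings, such as an open text file,
    so the whole file never has to fit in memory at once.
    """
    iterator = iter(lines)
    while True:
        # islice takes the next `lines_per_chunk` items from the iterator
        chunk = list(islice(iterator, lines_per_chunk))
        if not chunk:
            return  # input exhausted
        yield chunk


# Example use (hypothetical file names):
# with open("bigfile.txt", encoding="latin-1") as infile:
#     for number, chunk in enumerate(split_by_lines(infile, 5000), start=1):
#         with open("part%03d.txt" % number, "w", encoding="latin-1") as out:
#             out.writelines(chunk)
```

Splitting by a fixed line count can cut an e-mail in half, so for files laid out as described below, splitting on the separator line is the better fit.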

I am also trying to teach myself Python. Everything I've worked with so far involves well-defined input datasets, so I can break each line I read into its component data fields and manipulate those fields. I cannot figure out how to read a text file that can have anything at all in each line and test the input for a plus sign in the first position.
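Reading free-form lines needs no field parsing at all: each line is just a string, and str.startswith() checks the first position without failing on an empty string. A minimal sketch, assuming Python 3; count_separators is a hypothetical name, and the latin-1 encoding in the usage comment is an assumption chosen because it can decode any byte, so unexpected characters in an e-mail will not stop the read:

```python
def count_separators(lines):
    """Count the lines whose first character is a plus sign.

    Each line is treated as plain text; nothing about its
    contents is assumed beyond the first character.
    """
    count = 0
    for line in lines:
        # startswith() is safe even on an empty string, where line[0]
        # would raise IndexError
        if line.startswith("+"):
            count += 1
    return count


# Example use (hypothetical file name):
# with open("bigfile.txt", encoding="latin-1") as infile:
#     print(count_separators(infile), "lines start with +")
```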

Each useful segment of the text file begins with a line like this: +------------------------------------------------ (the visually solid line is a string of hyphens), followed by all the information in one e-mail. I want to put 200 of those e-mails into each file I'm going to work with.
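A sketch of the 200-e-mails-per-file split, under these assumptions: Python 3; every e-mail starts with a line beginning "+---"; the names split_emails, split_file and the chunk001.txt naming scheme are made up for illustration. Any text before the first separator stays with the first chunk.

```python
import os

SEPARATOR = "+---"      # assumption: each e-mail starts with a line like this
EMAILS_PER_FILE = 200


def split_emails(lines, per_file=EMAILS_PER_FILE):
    """Yield lists of lines, each holding up to `per_file` e-mails."""
    chunk = []
    emails_in_chunk = 0
    for line in lines:
        if line.startswith(SEPARATOR):
            # a new e-mail begins; start a new chunk if this one is full
            if emails_in_chunk == per_file:
                yield chunk
                chunk = []
                emails_in_chunk = 0
            emails_in_chunk += 1
        chunk.append(line)
    if chunk:
        yield chunk


def split_file(path, prefix="chunk", per_file=EMAILS_PER_FILE):
    """Write chunk001.txt, chunk002.txt, ... in the folder holding `path`."""
    folder = os.path.dirname(os.path.abspath(path))
    # latin-1 decodes any byte, and newline="" keeps line endings untouched
    with open(path, encoding="latin-1", newline="") as infile:
        for number, chunk in enumerate(split_emails(infile, per_file), start=1):
            out_path = os.path.join(folder, "%s%03d.txt" % (prefix, number))
            with open(out_path, "w", encoding="latin-1", newline="") as outfile:
                outfile.writelines(chunk)
```

Running split_file("bigfile.txt") from a script in the same folder would produce the numbered files. Because the file is read line by line, its size does not matter, and opening both files with newline="" keeps the original Windows line endings exactly as they were.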

It ought to be simple, but both Python textbooks I'm trying to use are Greek to me. I was a mainframe programmer for 30 years and I've learned a lot of different software, but none of it is available online for free, and although every young programmer I've encountered says Python is elegant and extremely easy to learn, brains must have undergone a redesign since I had mine installed.








#1
January 18, 2012 at 15:14:28
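I am not a Python guy, but Perl is readily available for most platforms, including Windows. This script splits bigfile.txt into filerec<n>.txt files, starting a new file at every line that begins with +---, so each record lands in its own file (filerec0.txt holds whatever comes before the first separator).

#!/usr/bin/perl

use warnings;
use strict;

my $fn      = 0;            # number of the current output file
my $bname   = "filerec";    # base name for the output files
my $outfile = $bname . $fn . ".txt";

@ARGV = qw(bigfile.txt);    # the input file that <> reads

open(my $of, '>', $outfile) or die "Cannot open $outfile: $!";

# a record starts at any line beginning with "+---";
# the + must be escaped because it is a regex quantifier
my $pattern = qr/^\+---/;

while (<>) {
    chomp;
    if (/$pattern/) {
        # start a new output file at each record separator
        close $of;
        $fn++;
        $outfile = $bname . $fn . ".txt";
        open($of, '>', $outfile) or die "Cannot open $outfile: $!";
    }
    print $of "$_\n";
}
close $of;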




#2
January 18, 2012 at 18:46:27
nails,
I appreciate the thought, but I can't mix Perl in with the other stuff in my head. DOS or Python I know how to execute, etc.


#3
January 19, 2012 at 12:49:11
Perhaps this is simple enough to do with a batch script, if you can accept losing the blank lines within each e-mail? (A batch FOR /F loop skips empty lines.)


Please come back & tell us if your problem is resolved.




#4
January 19, 2012 at 21:18:56
HUH?

What is a "batch script"? How would I use it? Where would I get it? Watcha talkin about, Willis?



#5
January 19, 2012 at 23:04:59
✔ Best Answer
Batch scripting is the creation of command lines to be executed by a command processor, usually Cmd.exe in Windows XP. You may think of it as DOS, but there is no DOS in Win XP. The commands are very similar to MS-DOS commands such as Dir, CD, Copy, etc., and, as in DOS, some commands are internal (built into Cmd.exe) and some external (separate programs).

Many command lines can be saved in one file with the extension .bat and run with or without user intervention, depending on the commands used.

If you needed to run the same 10 commands regularly, it would make sense to put them in a .bat file and then simply start that batch script instead of typing the commands each time.

There are several scripting languages available.






#6
January 20, 2012 at 00:45:01
Where do I find the magic words to put into my script? This is not an answer yet; it feels like a restatement of the problem.

As I explained in my first post, I have done some simple DOS (excuse me if I continue to call it DOS -- if it looks like DOS and acts like DOS...), but when I type "Help" at the command line I don't see anything that looks useful. My understanding was that all the available commands would be listed if I typed "Help".



#7
January 20, 2012 at 12:02:15
A fairly comprehensive list of commands is at http://ss64.com/nt/, but be aware that not all listed commands are available in every Windows NT version. For example, the Choice command is not installed with XP, but it can be downloaded and will then run successfully. Just another bit of Microsoft logic.

To get help on a command, enter the command followed by /?, e.g. Dir /?

Good luck.





