Splitting Log Into Individual Files

February 15, 2013 at 04:37:48
Specs: Windows Vista, 1.6 gb 4gb

I'm stuck with a script again and would appreciate it if anyone can help point me in the right direction. I have a long rolling log which I want to split up into individual files. The following is indicative of the start of entries where each starts on a new line followed by an unpredictable number of lines of text:
F:\Test\0001_01.txt
F:\Test\0002_01.txt
F:\Test\0002_02.txt
F:\Test\0002_03.txt
F:\Test\0003_01.txt
F:\Test\0003_02.txt
F:\Test\0004_01.txt
F:\Test\0005_01.txt
F:\Test\0005_02.txt

I want to split the above into:

0001MSplit.txt
0002MSplit.txt
0003MSplit.txt
0004MSplit.txt
0005MSplit.txt

concatenating the contents into one file when the entry shows the same file name and has a suffix of _01, _02, _03 etc. I have only coded for 6 suffixes but there are likely to be more than that. A single file entry will always be _01.

I had to insert @@@ into the log on the line above every file entry for 2 reasons - 1) as a delimiter for splitting the file for the Array and 2) I need to preserve the line with the file details as I need it as the first line in the split file and as subsequent headings where there is concatenation. (Using the line with the file as a delimiter doesn't work for me as it is removed when loaded into the array).
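
On the point about the delimiter being removed: Split does drop the delimiter text from every element, but a fixed prefix can be put back when each element is used, so an extra marker line is not strictly needed. A minimal sketch, assuming every entry line starts with the literal prefix F:\Test\ (the sample text and variable names are made up for illustration):

```vbscript
Option Explicit

' Hypothetical sample standing in for the real log contents.
Dim strLog, arrParts, i, strEntry
strLog = "F:\Test\0001_01.txt" & vbCrLf & "some text" & vbCrLf & _
         "F:\Test\0002_01.txt" & vbCrLf & "more text" & vbCrLf

' Split removes the delimiter, so element 0 is whatever came before
' the first "F:\Test\" (here an empty string).
arrParts = Split(strLog, "F:\Test\", -1, vbTextCompare)

For i = 1 To UBound(arrParts)
    ' Put the removed prefix back so the header line is intact again.
    strEntry = "F:\Test\" & arrParts(i)
    ' The first four characters of each element are the file number.
    WScript.Echo Left(arrParts(i), 4) & " -> " & Replace(strEntry, vbCrLf, " | ")
Next
```

The first four characters of each element then give the number to use in the output file name.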

I thought I had cracked it but not so. I was also aiming to have split files numbered consecutively but I couldn't achieve that. I would get the first file as 1 and the second one as 8 etc. I can wear the split file numbers not being consecutive if I could get the log split properly.

[Code]
Const ForReading = 1
Const ForAppending = 8

strPath = "F:\Test\"

strFile1 = "Append_Txt.txt"
strFile2 = "MSplit.txt"

strDeLimit = "@@@"

'Read Text File Into String
set fso = CreateObject("Scripting.FileSystemObject")
Set objTextfile = fso.OpenTextFile(strPath & strFile1,ForReading,False)
strContents=objTextFile.ReadAll

'Load String Into Array
OutPutArray = split(strContents,strDeLimit,-1,1)

'Close File
objTextFile.Close

'Check For Sections
n=0001

'Read Each Element of Array Into Memory for OutPut
For each strCurrentLine in outputArray


'Open File For Appending
Set fso = CreateObject("Scripting.FileSystemObject")
Set objfile = fso.OpenTextFile(strPath & n & strFile2,ForAppending,True)

'Scroll through conditions
for n = 1 to 6
if instr(1,strCurrentLine,"_00",1) >0 then
strTotal = strCurrentLine
end if
if instr(1,strCurrentLine,"_01",1) >0 then
strTotal = strTotal + strCurrentLine
end if
if instr(1,strCurrentLine,"_02",1) >0 then
strTotal = strTotal + strCurrentLine
end if
if instr(1,strCurrentLine,"_03",1) >0 then
strTotal = strTotal + strCurrentLine
end if
if instr(1,strCurrentLine,"_04",1) >0 then
strTotal = strTotal + strCurrentLine
end if
if instr(1,strCurrentLine,"_05",1) >0 then
strTotal = strTotal + strCurrentLine
end if
next

'Write Array Element To File
objFile.WriteLine(strCurrentLine)

'Increment Counter
n=n+1

'Close File
objFile.Close

Next

Wscript.Quit

[/Code]





#1
February 15, 2013 at 05:48:54

find "Test\0001" < biglog > 0001MSplit.txt

You can guess the rest.

=====================
M2 Golden-Triangle



#2
February 15, 2013 at 17:52:02

Why can't you split on "F:\Test\"? Then the first 11 bytes of each element would be your target: 000z_yy.txt followed by all the content (in that same element, including crlfs). Then just use a "prev" var to compare first 4 bytes of each successive element until you get a diff at which point you write them out and update "prev" with "diff".

set fso=createobject("scripting.filesystemobject")
big=fso.opentextfile("log").readall
prev=""
st=0
en=0
sp="F:\Test\"
y=split(big,sp)
for i=0 to ubound(y)
diff=left(y(i),4)
'-- if you want to add "F:\test", add it here:
y(i)=sp+y(i)
if diff<>prev then
set z=fso.createtextfile(prev+"MSplit.txt")
for j=st to en
z.write y(j)
next
prev=diff
st=i
en=i
else
en=en+1
end if
next



#3
February 16, 2013 at 01:35:45

Hi nbrane, thank you for your help. Come to think of it, there's no hard reason not to use "F:\Test\" as the delimiter. The only problem would be if the directory name were to differ, but I understand that to be unlikely. I ran the script but it only produced one output file, named MSplit, which is a duplicate of the log file apart from one difference: the very first file reference, F:\Test\0001_01.txt, is preceded by F:\test\. The other file references remain unchanged. I'd also be grateful if you could explain your script at the line "for j=st to en", as both variables were initialised with the value 0. What am I missing here?

Can you also have another look at the output files for me?

Many thanks



#4
February 16, 2013 at 09:10:20

Hi, nbrane, have now managed to redo my own attempt. It's not very polished I admit but it gives me the separate output files with concatenation where appropriate just what I was trying to achieve .... or so I thought!

I have just realised that I've based everything on numerical sequence being maintained and that as long as the first 4 characters of the file entries in the log appear in consecutive order, the MSplit output file name will bear some resemblance to that log entry ie log entries = 0001, 0002, 0003, 0004 corresponding output files = 1Msplit, 2MSplit, 3MSplit and 4MSplit. However, once any old file name appears out of numerical order my script no longer maintains that similarity unless I have pre-sorted the entries into numerical order - something I wouldn't relish having to do.

Consequently, is it possible for the MSplit output file name to mirror the file name which appears in the log entry and even have VBS go as far as appending an output file say 0012_07 which was located 35th in the log file entries to an output file 12MSplit which was created earlier in say 20th position?
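
One way this could work, shown as a hedged sketch rather than a finished solution: key everything on the first four characters of each entry instead of on a running counter, so an entry for 0012 lands in the 0012 group wherever it appears in the log. The function name GroupByPrefix and the sample data are made up for illustration, and the sketch assumes every entry starts with F:\Test\ followed by a four-digit number.

```vbscript
Option Explicit

' Group log entries by the first four characters after the delimiter.
' Returns a Dictionary: key = "0012", value = all entries for that key,
' in the order they appear in the log.
Function GroupByPrefix(strLog, strDelim)
    Dim dict, arrParts, i, strKey
    Set dict = CreateObject("Scripting.Dictionary")
    arrParts = Split(strLog, strDelim, -1, vbTextCompare)
    For i = 1 To UBound(arrParts)      ' element 0 precedes the first tag
        strKey = Left(arrParts(i), 4)
        If dict.Exists(strKey) Then
            dict(strKey) = dict(strKey) & strDelim & arrParts(i)
        Else
            dict.Add strKey, strDelim & arrParts(i)
        End If
    Next
    Set GroupByPrefix = dict
End Function

' Made-up sample: 0012 appears, then 0003, then 0012 again.
Dim strSample, dictGroups, strKey
strSample = "F:\Test\0012_01.txt" & vbCrLf & "a" & vbCrLf & _
            "F:\Test\0003_01.txt" & vbCrLf & "b" & vbCrLf & _
            "F:\Test\0012_07.txt" & vbCrLf & "c" & vbCrLf

Set dictGroups = GroupByPrefix(strSample, "F:\Test\")
For Each strKey In dictGroups.Keys
    WScript.Echo strKey & "MSplit.txt would receive:" & vbCrLf & dictGroups(strKey)
Next
```

Each dictionary value could then be written to its own file in one go. Entries within a group stay in log order, so the suffixes (_01, _07 and so on) are only in sequence if the log had them in sequence.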

If this is not possible I will have to resort to pre-sorting the log or creating some kind of cross index either of which will leave poor old me feeling very cross!

I ought to have spent more time thinking this through but then everyone wants everything yesterday! I don't want to spend more time on this if it is not possible.

I have attached my script so you can see how rubbish I am!

Thank you again for your time

[Code]
Const ForReading = 1
Const ForAppending = 8

strPath = "F:\Test\"

strFile1 = "Append_Txt.txt"
strFile2 = "MSplit.txt"

strRecursive = ""

strDeLimit = "F:\Test\"

'Read Text File Into String
set fso = CreateObject("Scripting.FileSystemObject")
Set objTextfile = fso.OpenTextFile(strPath & strFile1,ForReading,False)
strContents=objTextFile.ReadAll

'Load String Into Array
OutPutArray = split(strContents,strDeLimit,-1,1)

'Close File
objTextFile.Close

'Read Each Element of Array Into Memory for OutPut
For j= 1 to ubound(outputArray)

'Comparison string
strMatch = left(outputarray(j),4)

'Counter
n=n+1

'Scroll through conditions
if strMatch = strRecursive then
n=n-1
end if

'Open File For Appending
Set fso = CreateObject("Scripting.FileSystemObject")
Set objfile = fso.OpenTextFile(strPath & n & strFile2,ForAppending,True)

'Reset For Retro File Name Check
strRecursive = strMatch

'Write Array Element To File
objFile.WriteLine(OutPutArray(j))

'Close File
objFile.Close

'Loop round again
next

'Exit VBS
Wscript.Quit

[/Code]



#5
February 16, 2013 at 11:09:29

Ok, yeh my script seemed to work with limited testing last night, but in the cold light of day many live bugs could be seen scurrying for cover. It did create the files, with data however, except for the last one, so i don't know why it failed at your end. Here's my revised script:
set fso=createobject("scripting.filesystemobject")
big=fso.opentextfile("log").readall
public st,en
'== element 0 is whatever precedes the first tag (normally empty), so start at 1
st=1
en=1
prev=""
sp="F:\Test\"
y=split(big,sp)
for i=1 to ubound(y)
  '== first 4 chars of the element are the file number, e.g. "0002"
  diff=left(y(i),4)
  y(i)=sp+y(i)
  if diff<>prev then
    if prev<>"" then call output()
    prev=diff
  else
    ' ====== Juniper note! here is where "en" is incremented
    en=en+1
  end if
next
'== here is where it "wraps up" the final set of elements
en=ubound(y)
if prev<>"" then call output()

sub output()
  name=prev+"Msplit.txt"
  on error resume next
  set z=fso.opentextfile(name,8)
  '=== this "work-around" handles appending non-sequential data to existing files
  if err.number=53 then
    err.clear
    set z=fso.createtextfile(name)
  end if
  if err.number<>0 then wscript.echo err.number, err.description
  err.clear
  on error goto 0
  for j=st to en
    z.write y(j)
  next
  z.close
  ' ====== Juniper note! and here is where "en" and "st" are updated
  st=i
  en=i
end sub
'===== end vbscript

I had to resort to using the error-trapping to handle existing vs not-yet-existing files, due to vbscript's annoying habit of not allowing a file-create+append in one operation like vbasic does. As you can see, this will handle "out-of-order" elements, at least with my test data (which, if you want a copy of, I'll p-mail it to you so as not to waste space posting it here.)
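
For comparison, OpenTextFile accepts an optional third argument (create), and Juniper's scripts in this thread already pass True together with ForAppending. With that flag the file is created when missing and appended to when present, which may avoid the need for error-trapping here. A minimal sketch, with a hypothetical file name in the current directory:

```vbscript
Option Explicit
Const ForAppending = 8

Dim fso, strName, objOut
Set fso = CreateObject("Scripting.FileSystemObject")

' Hypothetical output name; adjust the path as needed.
strName = "0001MSplit.txt"

' Third argument True = create the file if it does not exist yet;
' if it already exists, writes are added to the end.
Set objOut = fso.OpenTextFile(strName, ForAppending, True)
objOut.WriteLine "F:\Test\0001_01.txt"
objOut.Close

' A second call appends to the file created above.
Set objOut = fso.OpenTextFile(strName, ForAppending, True)
objOut.WriteLine "F:\Test\0001_02.txt"
objOut.Close
```

Because the file is never truncated, running this twice leaves four lines in it, so old output files would need deleting before a fresh run.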
I think your script is not rubbish, we're both working in the same direction. I found that I had to make the write-op as a sub in order to call it the final time (when the end of the array is reached). It took me awhile to figure out why it was leaving off the last elements.
Also, "on error resume next" can really foul you up unless you remember that it's set, because other, real errors don't show up, they just get ignored and your program flops with no complaints.
Another option, as M2 suggested, would be to use batch, but it would require somewhat more code than he provided, as he mentioned.



#6
February 16, 2013 at 11:58:53

Hi, nbrane, thank you for the time you are giving me. I hate to be the bearer of bad news but the script did not output any files just a blank msgbox which, when clicked on, disappeared but nothing else happened. As you say with "on error resume next" there's no indication of what the error is. I made a new vbs script a few times in case I had copied your script incorrectly but it always produced the same result.

Hope you can spot something easily so as not to take up so much of your time

Thank you



#7
February 16, 2013 at 12:23:36

It ran ok on my XP, so I don't know. The script has no display output, so you won't see anything on the screen. I tweaked the error-trapping a tad just in case, but I got no errors with either existing or non-existing output files. Are you running it from the same directory as "log"? I didn't build in any path-handling (being lazy, I usually leave that up to the end-user). You can always disable the error-trapping, and also add "wscript.echo" debugging equipment to examine the variables at critical points if you feel like messing with it. You're welcome to p-mail me all or some of your log-file content and I can run trials on it. Also, you might scan your system to see if the files are appearing somewhere unexpected:
cd \
dir /s /p 00*.txt
That's about all I can contribute at this point since it flies on my system.


#8
February 16, 2013 at 12:52:57

Found in root directory. The log has been copied 8 times into one file by the name of "t". I am using a machine with XP Pro at the moment but normally have Vista Home Premium. When I get home tonight I'll try the script on Vista and see what transpires.

Thank you again



#9
February 16, 2013 at 21:13:29

:: ===== script starts here ===============
:: split to outfiles by ####
:: splitter.bat 2013-02-17 11:51:17.29
@echo off & setLocal enableDELAYedeXpansioN

if exist *MSplit.txt del *MSplit.txt

for /f "tokens=* delims= " %%a in (myfile) do (
set S=%%a
set #=!S:~8,4!
>> !#!MSplit.txt echo.!S!
)
goto :eof
::====== script ends here =================

=====================
M2 Golden-Triangle



#10
February 17, 2013 at 02:39:28

Thank you for your script, Mechanix. Unfortunately, when run it produced 3 files, 0001MSplit.txt, 0002MSplit.txt and 0003MSplit.txt (the correct number and names for my sample), but these files contained no text other than the correct headers (including those where there was to be concatenation within the file). Additionally, there were another 383 MSplit files whose names were prefixed with various fragments of text, including spaces and punctuation, and these files contained a mixture of words and lines of text. While processing, the CMD window showed "incorrect file name, directory or volume", a message which I suspect related to those files whose names would have included an illegal character.

I checked to make sure that I had entered the log name correctly and ran the batch file a few times, but with the same outcome. I have to confess that with batch files I am at something of a loss and struggle to follow the more complex ones. Is there anything I have missed or could have done wrong? The batch file and log were the sole items in the directory. I also tried using the log's full path but that made no difference.

Thank you anyway for your interest



#11
February 17, 2013 at 04:11:23

If you have a file in the directory called MYFILE and it contains the lines below, not obvious why it won't work.

F:\Test\0001_01.txt
F:\Test\0002_01.txt
F:\Test\0002_02.txt
F:\Test\0002_03.txt
F:\Test\0003_01.txt
F:\Test\0003_02.txt
F:\Test\0004_01.txt
F:\Test\0005_01.txt
F:\Test\0005_02.txt

=====================
M2 Golden-Triangle



#12
February 17, 2013 at 11:33:45

Hello, M2. There's "stuff" (random text) between the "tag" lines that needs to be captured as file content. But still, I figure it can be batched as long as the text doesn't contain any knotty-buggers or said buggers are handled with care.


#13
February 18, 2013 at 01:21:35

nbrane

the "tag" lines

What does that mean?

=====================
M2 Golden-Triangle



#14
February 18, 2013 at 05:08:45

Nbrane, I had a go at my own script again and finally got it to do what I wanted. I have to admit it is a bit "Heath Robinson" but it does work. The output files retain the same numerical reference as in the log, and an entry will be added to any previously created file, however much earlier that file was created. I tried to include a sort "on the hoof" but kept running into problems, so I settled on bubble sorting the arrays in the files in the directory on exit. I did think (and may revisit the idea) of identifying which files had been updated and sorting only these. Hopefully, the bubble sort delay in exiting won't be too time consuming (it depends on how big the log grows, I suppose), but at least the output files are in numerical order by suffix.

I have attached a copy of my script, and any critical feedback or suggestions from you will be more than welcome. I haven't built in any error checking as yet (probably will in the future), but can you see anything obvious which might make it fall over? The checkpoints can obviously be removed but I left them in as a fellow newbie may find them helpful. As you have probably noticed I need to cultivate brevity somewhat!

Thank you again for your help and support!

[Code]
Const ForReading = 1
Const ForAppending = 8
Const ForWriting = 2

strPath = "D:\Test\Log\"
strPath2 = "D:\Test\Library\"
strFile1 = "Test.txt"
strFile2 = "MSplit.txt"
strDeLimit = "F:\Test\"

'Open File And Read Text File Into String
set fso = CreateObject("Scripting.FileSystemObject")
Set objTextfile = fso.OpenTextFile(strPath & strFile1,ForReading,False)
strContents=objTextFile.ReadAll

'Load String Into Array
OutPutArray = split(strContents,strDeLimit,-1,1)

'Check Point
WScript.Echo "Array Is Holding:" & vbCRLF & Join( OutPutArray, vbCrLf )

'Close File
objTextFile.Close

'Read Each Element of Array Into Memory for OutPut
For j = 1 to ubound(OutPutArray)


'Comparison string
strMatch = left(OutPutArray(j),4)

'Check Point
wscript.echo strMatch

'Tally Check - Could Use Some Method Of Checking If Sort Update Required
strMatchTally = strMatchTally & strMatch & "*"


'Pass Variable info to num
num = strMatch

'Open File For Appending
Set fso = CreateObject("Scripting.FileSystemObject")
Set objfile = fso.OpenTextFile(strPath2 & num & strFile2,ForAppending,True)

'Write Array Element To File
objFile.WriteLine "F:\Test\" & OutPutArray(j)

'Close File
objFile.Close

next

'Check Point
wscript.echo strMatchTally

'Sort Array Contents Of Directory Starts From Here

' Open Each File In Directory In Turn
Set objFSO = CreateObject("Scripting.FileSystemObject")
strComputer = "."
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
Set colFiles = objWMIService.ExecQuery _
("ASSOCIATORS OF {Win32_Directory.Name='D:\Test\Library'} Where " _
& "ResultClass = CIM_DataFile")
For Each objFile In colFiles

'Identify Name of File Being Processed
strFileName = objFile.Name

'Read In Content Of File
Set objFile = objFSO.OpenTextFile(objFile.Name, ForReading)
strContents = objFile.ReadAll

'Close File
objFile.Close

'Load String Into Array
HoldArray = split(strContents,strDeLimit,-1,1)

' Array Bubble Sort
For i = (UBound(HoldArray) - 1) to 0 Step -1
For j= 0 to i
If UCase(HoldArray(j)) > UCase(HoldArray(j+1)) Then
strHolder = HoldArray(j+1)
HoldArray(j+1) = HoldArray(j)
HoldArray(j) = strHolder
End If
Next
Next

' Check If Sort Worked
WScript.Echo "Sorted? " & Join( HoldArray, vbCrLf )

'Open File For Writing
Set fso = CreateObject("Scripting.FileSystemObject")
Set objfile = fso.OpenTextFile(strFileName,ForWriting,True)

'Read Each Element of Array Into Memory for OutPut
For k= 1 to ubound(HoldArray)

'Check if Variable has Info To Write To File
wscript.echo "Write this: " & holdArray(k)


'Write Array Delimiter And Elements To File
objFile.WriteLine strDelimit & HoldArray(k)
next


'Close File
objFile.Close

'Loop Round To Next File
next

'Exit VBS
Wscript.Quit

[/Code]




#15
February 18, 2013 at 10:20:05

@M2: I used "tags" to refer to the elements used to sequence the file:
F:\Test\0001_01.txt
F:\Test\0002_01.txt
F:\Test\0002_02.txt
F:\Test\0002_03.txt
but at this point I'm confused so I'm probably wrong (see next)
@Juniper
Hmm, I think I had the wrong picture in my feeble mind.
I thought the logfile looked like this:
F:\Test\0001_01.txtdatarandom text
xxxxxx
yyyyyyyyyyyyy
F:\Test\0002_01.txt
more random text
aaaaaaaaa
bbbb
99999
F:\Test\0002_02.txt
additional random text
ggg
hhhhhhhhhhhh
iiiiiii
F:\Test\0001_02.txt
last line of file

but looking at your code, where you sort each file, I must be mistaken, since the sort would be mixing in the "random" data between elements. I also made a bad assumption that the "sub-sequence" numbers (_01, _02) were in sequence even though the main numbers (0001, 0002) might not be, which would be the only reason for sorting, i guess. oh well, "water under bridge" department if your code is flying on the beam. Glad you got it working! :-)

Oh, not that it matters much, but "set fso" only needs done once in the script. I always put the "createobjects" together at the top but that's just my preference.



#16
February 18, 2013 at 12:24:44

@nbrane, thought I would just clarify a few details for you. You were mostly right. The File "tag" is always on a separate line. There is always some random text which can run over from one to a number of lines. There can also be gaps in the range of the "sub-sequence" numbers and these missing ones can appear in a later log as there are not supposed to be any gaps in the sequence.

If I pick you up rightly you are expressing surprise over the array sorting? Originally I thought that the actual "sort" of each element would include all of the "random" text it contained, which would mess things up for me. However, I found that the bubble sort I used was sorting on the basis of the one-line file "tag" only and that all the accompanying text remained unchanged. I don't know if its normal function is to sort every single piece of text or if it has only turned out this way due to some error I've made inadvertently. In any case, it works the way I want it to.
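
This behaviour looks expected rather than accidental: the sort compares and swaps whole array elements, and string comparison runs from the first character, so the tag at the start of each element decides the order while the text that follows travels with it. A small sketch with made-up elements, shaped the way they would be after a Split on "F:\Test\":

```vbscript
Option Explicit

' Made-up elements: each starts with the tag remainder,
' followed by its own text lines.
Dim arrHold(2), i, j, strHolder
arrHold(0) = "0002_03.txt" & vbCrLf & "zebra text" & vbCrLf
arrHold(1) = "0002_01.txt" & vbCrLf & "middle text" & vbCrLf
arrHold(2) = "0002_02.txt" & vbCrLf & "apple text" & vbCrLf

' Same bubble sort shape as in the script above: whole elements are
' compared and swapped, never individual lines.
For i = UBound(arrHold) - 1 To 0 Step -1
    For j = 0 To i
        If UCase(arrHold(j)) > UCase(arrHold(j + 1)) Then
            strHolder = arrHold(j + 1)
            arrHold(j + 1) = arrHold(j)
            arrHold(j) = strHolder
        End If
    Next
Next

' Elements come out in tag order (_01, _02, _03), each with its own text.
WScript.Echo Join(arrHold, "")
```

The text inside an element would only affect the order if two elements began with exactly the same tag line.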

Glad you told me about the "FSO" as I did wonder about that.

Grateful thanks for the feedback



#17
February 18, 2013 at 18:17:52

Ah, (lightbulb, dimly burning). Of course, you're sorting the array elements, not the file lines. Ok, since you built the match-tally, I thought I'd try using it. I also took out a for-next loop by using join. So, from here down, these were my mods:

'Sort Array Contents Of Directory Starts From Here

' Open Each Updated File based on var. StrMatchtally
'pause
n=wscript.stdin.readline

x=split(strmatchtally,"*")

for h=0 to ubound(x)-1
strFileName = x(h)&strfile2

wscript.echo "sorting: "&strfilename

'Read In Content Of File
strContents=FSO.OpenTextFile(strfileName, ForReading).readall

'Load String Into Array
HoldArray = split(strContents,strDeLimit,-1,1)

' Array Bubble Sort
For i = (UBound(HoldArray) - 1) to 0 Step -1
For j= 0 to i
If UCase(HoldArray(j)) > UCase(HoldArray(j+1)) Then
strHolder = HoldArray(j+1)
HoldArray(j+1) = HoldArray(j)
HoldArray(j) = strHolder
End If
Next 'j
Next 'i

' Check If Sort Worked
WScript.Echo strfilename&" Sorted? " & Join( HoldArray, vbCrLf )

'Open File For Writing
Set objfile = fso.OpenTextFile(strFileName,ForWriting,True)
objfile.write(join(holdarray,strdelimit))
objfile.close

'Loop Round To Next File
next 'h

'Exit VBS
Wscript.Quit

which eliminates sorting every file, as you intended. Anyway, at this point it's a wrap regardless, since your code is working fine.



#18
February 19, 2013 at 00:15:52

If you want to cat the files in the list:


:: ===== script starts here ===============
:: split to outfiles by ####
:: splitt2.bat 15:14 19 February 2013
@echo off & setLocal enableDELAYedeXpansioN

if exist *MSplit.txt del *MSplit.txt

for /f "tokens=* delims= " %%a in (myfile) do (
set S=%%a
set #=!S:~8,4!
>> !#!MSplit.txt type !S!
)
goto :eof
::====== script ends here =================

=====================
M2 Golden-Triangle



#19
February 19, 2013 at 08:10:26

@nbrane, the script with your modification produces "the handle is invalid" (error 80070006) on the line n = wscript.stdin.readline. I believe that this type of error would normally indicate that the script expects to be run under cscript instead of wscript. However, given your commented-out "pause" on the line above, I wondered if there was a bit of the script missing, e.g. a pause to ask whether the update should continue or not? Just wondering.

Thank you



#20
February 19, 2013 at 10:19:37

No, it's just the equivalent of a "pause" in batch, but it doesn't work under wscript because wscript doesn't have access to the console. (I'd forgotten about that, since I almost always use cscript.) You can just take that line out. Sorry about the mixup.
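
If a pause is wanted under either host, one option is to check which host is running before touching the console. This is only a sketch: the helper name IsConsoleHost is made up, and it assumes the host executable keeps its standard name (cscript.exe or wscript.exe).

```vbscript
Option Explicit

' True when the script is hosted by cscript.exe (console),
' False under wscript.exe. WScript.FullName is the host's full path.
Function IsConsoleHost()
    IsConsoleHost = (InStr(1, WScript.FullName, "cscript.exe", vbTextCompare) > 0)
End Function

If IsConsoleHost() Then
    ' The console streams are only usable under cscript.
    WScript.StdOut.Write "Press Enter to continue..."
    WScript.StdIn.ReadLine
Else
    ' No console under wscript; a message box gives the same pause.
    WScript.Echo "Click OK to continue"
End If
```

Running the script as "cscript //nologo script.vbs" from a command prompt takes the first branch.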
