Solved Remove spaces from text within quotes

October 26, 2019 at 09:09:23
Specs: Windows 7, 1.6 gb 4gb
Does anyone know of a vbscript which will remove just the leading and trailing spaces of text within quotes? I couldn't find anything online which used vbscript to do this, though there is plenty about just removing the quotes etc. I have tried several ways of doing this without success. I couldn't find any way of distinguishing between opening quotes and closing ones, as the pattern is the same (space, quote, space).

I have been asked to clean up some text but have no idea how such an extraneous style of spacing came to be used. It seems that when the text was formatted, a space was inserted immediately before and after the text between quotes, and a space precedes a colon, semi-colon, exclamation mark and question mark. Full stops and commas are not affected. A simple sample follows, but obviously where punctuation gets in the way it becomes awkward.

What I have got is: now is the " time for all good men " to come " to the aid " of the party

What I need is: now is the "time for all good men" to come "to the aid" of the party

I am not conversant with PowerShell (although I ought to learn it), so I have been falling back on vbscript. Some of the files contain nearly 1 MB of text, so it would be a soul-destroying exercise to do it manually, or even to use "find and replace", as the latter would not cover all instances. If anyone can help I would be grateful.

Thank you




#1
October 26, 2019 at 12:27:15
By any chance is this a Word Document that uses Smart Quotes?

You can differentiate between left and right smart quotes,
so it should be possible to make the corrections you want.

MIKE

http://www.skeptic.com/

message edited by mmcconaghy



#2
October 26, 2019 at 20:32:51
Assuming the worst case, i.e. "dumb quotes", and assuming the initial "word" is outside quotes, then this just does "every-other" space-stripping. It ALSO assumes there aren't two quoted fields with nothing betwixt, like "a""b "
You could possibly do a replace of "" with " " to fix the latter...
This is a very klunky first attempt/prototype and there are probably a million better ways to do it.

'begin vbscript
testfile="testfile"
Set fso = CreateObject("Scripting.FileSystemObject")
arritems = split(fso.OpenTextFile(testfile, 1).readall,chr(34))
for i=0 to ubound(arritems)
if i mod 2 = 1 then wscript.stdout.write chr(34)ltrim(rtrim(arritems(i)))&chr(34) else wscript.stdout.write arritems(i)
next

message edited by nbrane



#3
October 27, 2019 at 06:07:03
Hi Mike,
These are definitely just plain text files using ASCII 34 type quotes, but thanks for your interest.





#4
October 27, 2019 at 06:18:48
Hi nbrane,
Thank you for your interest. I've been grateful for your help in the past although it is quite a long time ago! Unfortunately I couldn't get your script to run, I added in "End if" but other than that I wasn't sure about line 6 as "<rim" looks like a possible typo. Can you explain?

Thank you



#5
October 27, 2019 at 19:20:50
Yeah, I saw the typo and thought I had fixed it in my post. Sorry! The "end if", although appearing to be needed, is not, because the statement is all on one line. "<rim" was supposed to be "Ltrim". I'm just trimming all left and right spaces from the content of every other quoted item (assuming that the 2nd, 4th, 6th etc. are all inside quotes). My bad for not "reflecting" my post back for local testing!
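
For anyone following along, the every-other-item idea described above can be sketched as a small function. This is a minimal illustration rather than nbrane's posted script: TrimQuoted is a made-up name, it uses Trim in place of the nested LTrim/RTrim, and it assumes the quotes are balanced ASCII 34 characters and that the text does not begin inside a quoted run.

```vbscript
Option Explicit

' Hypothetical helper (not from the thread): trims the text between each
' pair of ASCII 34 quotes. Assumes the quotes are balanced.
Function TrimQuoted(text)
    Dim parts, i
    ' Cut the text at every quote character.
    parts = Split(text, Chr(34))
    For i = 0 To UBound(parts)
        ' Pieces at even indexes (0, 2, 4, ...) were outside quotes and
        ' pieces at odd indexes (1, 3, 5, ...) were inside them.
        ' i Mod 2 is the remainder of i divided by 2, so it equals 1
        ' exactly for the quoted pieces.
        If i Mod 2 = 1 Then parts(i) = Trim(parts(i))
    Next
    ' Put the quote characters back between the pieces.
    TrimQuoted = Join(parts, Chr(34))
End Function

' prints: now is the "time for all good men" to come "to the aid" of the party
WScript.Echo TrimQuoted("now is the "" time for all good men "" to come "" to the aid "" of the party")
```

Because i Mod 2 alternates 0, 1, 0, 1 as the loop walks the pieces, it selects every other item. If a quote is missing its partner, every piece after it lands on the wrong side of the test, so this only suits text where the quotes pair up.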


#6
October 28, 2019 at 02:04:11
Hi, nbrane, sorry to have to come back to you but the script still doesn't run. It throws up an error message for 6.49 (which is ltrim) - expected end of statement.


#7
October 28, 2019 at 18:30:04
No need to be sorry, it's my fault entirely. I keep missing typos, so I'll post the script "fresh from the testing-ground". (Last round, I left out the ampersand on line 6.)
Set fso = CreateObject("Scripting.FileSystemObject")
testfile="testfile"
arritems = split(fso.OpenTextFile(testfile, 1).readall,chr(34))
for i=1 to ubound(arritems)+1
if i mod 2 = 0 then wscript.stdout.write chr(34)<rim(rtrim(arritems(i-1)))&chr(34) else wscript.stdout.write arritems(i-1)
next

Note I also changed a couple of other minor items, but this one is, as advertised, straight from the test-track. fwiw here's the junk I used as "testfile":
a " this test " b " ok " "does this even work?" not a prob
"line two ",next line " testing" out of quotes " goodbye and end of test "

Output:
a "this test" b "ok" "does this even work?" not a prob
"line two",next line "testing" out of quotes "goodbye and end of test"

Thanks for your patience. I just keep missing things - my eyesight and concentration are both sinking fast!

message edited by nbrane



#8
October 29, 2019 at 01:42:42
nbrane, it's me again! The script still doesn't run. You've got "<rim" back in again but even when I change that to ltrim it rejects the ltrim with "expected end of statement". When you correct it, I would be very grateful if you would explain how the modulus coding you used works.

Thank you



#9
October 29, 2019 at 05:06:31
You can give this code a try; it uses a regex:

Data = ReadFile("C:\Test\content.txt")
wscript.echo Data
wscript.echo Search_Replace(Data)
'-----------------------------------------------
Function Search_Replace(Data)
Dim strPattern, strReplace, strResult,oRegExp
strPattern = "\x22\s([^\x22]+)\s\x22"
strReplace = chr(34) & "$1" & chr(34)
Set oRegExp = New RegExp
oRegExp.Global = True 
oRegExp.IgnoreCase = True 
oRegExp.Pattern = strPattern
strResult = oRegExp.Replace(Data,strReplace)
Search_Replace = strResult
End Function
'-----------------------------------------------
Function ReadFile(path)
    Const ForReading = 1
    Dim objFSO,objFile
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFile = objFSO.OpenTextFile(path,ForReading)
    ReadFile = objFile.ReadAll
    objFile.Close
End Function
'----------------------------------------------

message edited by Hackoo



#10
October 29, 2019 at 05:53:41
Hi, Hackoo, glad you're still around as you have also helped me out in the past! I had thought of using regex as it is frequently more about text position rather than the text character involved. Sadly I am seriously remiss in my knowledge of regex apart from something dead simple. The sample I have provided below represents the kind of text formatting which some "dipstick" has used and I am trying to modify. I ran your script against it and it converted the first three instances ok. It didn't work for the other four situations, of whose existence you might not have been aware. I would be grateful if you could work your magic on these also!

Thank you

now is the " time "
now is the " time ",
now is the " time ".
now is the " time " ;
now is the " time " :
now is the " time " !
now is the " time " ?



#11
October 29, 2019 at 06:12:28
OK, tell me if this is the result you want.

Here is the modified code:

Data = ReadFile("C:\Test\content.txt")
wscript.echo Data
wscript.echo Search_Replace(Data)
'-----------------------------------------------
Function Search_Replace(Data)
Dim strPattern, strReplace, strResult,oRegExp
strPattern = "(\x22\s+|\x22)\b([^\x22]+)\b(\s+\x22|\x22)"
strReplace = chr(34) & "$2" & chr(34)
Set oRegExp = New RegExp
oRegExp.Global = True 
oRegExp.IgnoreCase = True 
oRegExp.Pattern = strPattern
strResult = oRegExp.Replace(Data,strReplace)
Search_Replace = strResult
End Function
'-----------------------------------------------
Function ReadFile(path)
    Const ForReading = 1
    Dim objFSO,objFile
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFile = objFSO.OpenTextFile(path,ForReading)
    ReadFile = objFile.ReadAll
    objFile.Close
End Function
'----------------------------------------------



#12
October 29, 2019 at 07:06:20
Hi, Hackoo, your modification still leaves a space between the closing quote and the punctuation in the last four of the samples, i.e. it should be:

......";
not ....." ; etc.

Thank you



#13
October 29, 2019 at 08:10:23
@JuniperGreen
Can you upload a real example of your input file and the output file that you want to get?


#14
October 29, 2019 at 09:02:44
Hi, Hackoo,

I can't find out how to upload the actual plain text files. The edit facility does not seem to support doing so. If it helps, here are the contents.

Input file TestFile2.txt - Text to be corrected
now is the " time for all "
now is the " time for all ",
now is the " time for all ".
now is the " time for all " ;
now is the " time for all " :
now is the " time for all " !
now is the " time for all " ?

Output file TestFile3.txt - Corrected text
now is the "time for all"
now is the "time for all",
now is the "time for all".
now is the "time for all";
now is the "time for all":
now is the "time for all"!
now is the "time for all"?

Thank you



#15
October 29, 2019 at 09:43:38
✔ Best Answer
Give this modified code a try:

Option Explicit
Dim Data
Call ForceCScriptExecution()
Data = ReadFile("C:\Test\content.txt")
wscript.echo Data
wscript.echo String(50,"-")
wscript.echo Search_Replace(Data)
wscript.sleep 20000
'-----------------------------------------------
Function Search_Replace(Data)
Dim oRegExp,strPattern1,strPattern2
Dim strReplace1,strReplace2,strResult1,strResult2
' Strip spaces/tabs before punctuation; [ \t]+ cannot match a line break or a literal "+"
strPattern1 = "[ \t]+([,;:!?.])"
strReplace1 = "$1"
strPattern2 = "(\x22\s+|\x22)\b([^\x22]+)\b(\s+\x22|\x22)"
strReplace2 = chr(34) & "$2" & chr(34)
Set oRegExp = New RegExp
oRegExp.Global = True 
oRegExp.IgnoreCase = True 
oRegExp.Pattern = strPattern1
strResult1 = oRegExp.Replace(Data,strReplace1)
oRegExp.Pattern = strPattern2
strResult2 = oRegExp.Replace(strResult1,strReplace2)
Search_Replace = strResult2
End Function
'-----------------------------------------------
Function ReadFile(path)
    Const ForReading = 1
    Dim objFSO,objFile
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFile = objFSO.OpenTextFile(path,ForReading)
    ReadFile = objFile.ReadAll
    objFile.Close
End Function
'----------------------------------------------
Sub ForceCScriptExecution()
    Dim Arg, Str, cmd, Title
	Title = "Search and Replace using RegExp by Hackoo 2019"
	cmd = "CMD /C Title " & Title &" & color 0A & Mode 80,30 & "
    If Not LCase( Right( WScript.FullName, 12 ) ) = "\cscript.exe" Then
        For Each Arg In WScript.Arguments
            If InStr( Arg, " " ) Then Arg = """" & Arg & """"
            Str = Str & " " & Arg
        Next
        CreateObject( "WScript.Shell" ).Run _
           cmd & "cscript //nologo """ & _
            WScript.ScriptFullName & _
            """ " & Str
        WScript.Quit
    End If
End Sub
'-----------------------------------------------

message edited by Hackoo



#16
October 29, 2019 at 09:59:50
Hi, Hackoo,

You've cracked it, so very many grateful thanks to you for your superb help again!

Thank you



#17
October 29, 2019 at 10:02:03
Hi, nbrane,

As you will see, Hackoo has provided a solution, but thank you anyway for your interest and for your help up to the point reached.

Kind regards



#18
October 29, 2019 at 19:50:03
Just for the record, and my own sanity, ampersandLT gets converted to the "less-than" symbol by the forum, even inside pre tags. I thought I was going batty. Inside pre it should not do that imo.
Good work Hackoo, regex is not an easy medium.
I did not know that there were unwanted spaces in the "mid-quotes", so my effort fails anyhow without some regex. Glad the forum came through for the Juniper.
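
For the record, the two ideas in this thread can be combined. The following is a rough sketch, not anyone's posted code: it pairs the quotes by position (the Split approach) and then uses one small regex for the spaces before ; : ! and ?. CleanText is a made-up name, and the sketch assumes the quotes are balanced ASCII 34 characters and that only spaces or tabs sit in front of the punctuation.

```vbscript
Option Explicit

' Hypothetical helper (not from the thread): trims quoted text by
' position, then removes spaces/tabs in front of ; : ! ?
Function CleanText(text)
    Dim parts, i, re
    parts = Split(text, Chr(34))
    ' Odd-numbered pieces sit between an opening and a closing quote.
    For i = 1 To UBound(parts) Step 2
        parts(i) = Trim(parts(i))
    Next
    Set re = New RegExp
    re.Global = True
    ' One or more spaces/tabs followed by one of the four marks;
    ' the mark itself is captured and kept as $1.
    re.Pattern = "[ \t]+([;:!?])"
    CleanText = re.Replace(Join(parts, Chr(34)), "$1")
End Function

' prints: now is the "time for all";
WScript.Echo CleanText("now is the "" time for all "" ;")
```

To run it over a file, the text returned by the ReadFile function posted earlier in the thread could be passed to CleanText. Pairing by position means the result does not depend on which characters the quoted text starts or ends with, but a single stray quote will throw off every pair after it.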

message edited by nbrane

