Solved Highlight Repeated Paragraphs Within The Same Text File

May 30, 2020 at 08:13:24
Specs: Windows 7, 1.6 gb 4gb
When writing long reports that call for a lot of consideration, time and patience, I regularly catch myself thinking about how to phrase some part of it while sitting on the bus or when not really engaged in work. I will then quickly set down my thoughts on my Psion 5mx (yes, you did read that correctly!). It still works perfectly after all this time and never lets me down. Once back at my report I send my "thoughts/inspirations" from the Psion 5mx to my PC and insert them at some appropriate location in the report. I may then draw on them, or not, as the case may be.

However, I find I am prone to forgetting to delete these additional "thoughts" and some of them remain buried in the body of the report - somewhere! What I am looking for is a utility which will filter the text of my report and highlight every duplicate paragraph it finds. I have hunted everywhere on the net but have been unable to find one which will do this. I'm not sure if it's OK to mention software names so I won't, but there are many which will highlight duplicate phrases, sentences or words and perform many other kinds of fancy text analysis, including checks for plagiarism and use of cliches. There are a goodly number of comparison utilities and text editors around but they are based on the comparison of two files, whereas I need the comparison to be within the same file. So unfortunately none of these provide the simple identification of duplicate paragraphs! I would use VBA, and there are many examples of its use (how successful they are I don't know), but I no longer have MS Office so that option isn't open to me. I had a look under Linux utilities and maybe it could be done using Awk or Sed but I couldn't find out how - again it is all about identifying sentences or phrases. There was nothing under my old faithful VBScript (probably beyond its capabilities) or even its replacement, PowerShell.

If anyone knows of a utility or a script which can meet my need I would be grateful if they would let me know. I don't care how slowly it runs; it would still be better than laboriously reading through a report trying to identify duplicates. However many times you read a report, it is easy to miss something.

Apologies for being so long-winded but I needed to explain my problem properly.

Thank you

message edited by JuniperGreen




#1
May 30, 2020 at 10:07:29
You can search for a string using PowerShell, but the string has to match exactly:

Get-ChildItem -Path "Enter-Folder-Path-Here" | Select-String "Search for a sentence here" -List | Select Filename, Path, LineNumber, Line | Format-List

This doesn't work with .docx, .xlsx etc. I didn't test other file types, but .txt works great.

Edit:
Maybe I misunderstood when I read the post. If you just want to search the same .txt file for repeated content you could just use Notepad: press CTRL+F and enter the string you want to search for. I would recommend Notepad++; it will count the number of matches and highlight them for you.

message edited by Kilavila



#2
May 30, 2020 at 10:25:17
Another thing you can try with powershell:

Get-Content -Path "folder\file.txt" | Sort | Get-Unique | Out-File "folder\new-file.txt"

This will get the unique lines from a .txt file and write them out to a new file. The downside of this one-liner is that the content comes out in sorted order rather than the original order: Sort rearranges the lines first, and Get-Unique then drops the repeats that end up next to each other.

Edit:
Did a bit more testing and found this:

$Hash = @{}
Get-Content -Path "folder\file.txt" | % { if ($Hash.$_ -eq $Null) { $_ }; $Hash.$_ = 1 } | Out-File "folder\new-file.txt"

This keeps the content in the original order. I don't know how or why it works yet, but it works.
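
For anyone curious about why the hashtable version keeps the order: a line is passed through only the first time it is seen, and the hashtable is just the memory of what has been seen so far. Here is a minimal Python sketch of the same idea. The function name `unique_in_order` is made up for illustration. Like `Get-Content`, it works line by line rather than paragraph by paragraph, so the blank lines between paragraphs count as repeats of one another and only the first one survives.

```python
def unique_in_order(lines):
    """Return the lines with later repeats dropped, keeping first-seen order."""
    seen = set()
    kept = []
    for line in lines:
        if line not in seen:   # first time this exact line appears
            kept.append(line)
        seen.add(line)         # remember it, much like $Hash.$_ = 1
    return kept


sample = ["alpha", "", "beta", "", "alpha", "", "gamma"]
print(unique_in_order(sample))
```

In the sample, the second "alpha" is dropped, but so are the second and third blank lines, which may be one reason a line-based filter loses the spacing between paragraphs.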

message edited by Kilavila



#3
May 30, 2020 at 12:54:50
Hi Kilavila, thank you for such a prompt response. Firstly, one problem, as you obviously appreciated, was that I couldn't do a "key" search as I would have no way of knowing what paragraphs to search for. Secondly, I also note that peculiarly the output doesn't always leave a blank line between paragraphs. I came across the same problem when messing around with the DOS and the Linux "sort". The downside is that with a long report this calls for substantial editing which is something I'm trying to avoid.

Lastly, your final post seems to be the one that identifies the paragraphs perfectly, though I need to play around with it to see that it works every time. The only snag, and I hate to say this, is that I am only looking for ALL identical paragraphs to be highlighted, without any deletions being made. Can I trade on your goodwill and ask if you can amend your code so that it just highlights the identical paragraphs?

Thank you for your time and support!


#4
May 30, 2020 at 13:32:42
No problem, but I'm not that advanced at PowerShell yet.
I haven't done this type of thing before, and I'm sure there's a somewhat simple solution to this.

I will have a look at it and share any solution I find, but it could take time as I'm still learning.
Let me know if you find a solution.



#5
May 30, 2020 at 18:49:20
For "highlighting", maybe use uppercase (since no one usually writes in all caps unless they're very mad about something). This would make it easy to find both by text search and visually (although annoying to read!).
As for acquiring the targets, I'm kind of unsure. If your paragraphs are, as per tradition, delimited by new-line + 5 spaces, then they could be split out into subfiles or just long string variables. For each, first do a size match against the template or target, if there is one. If the size matches (or is at least very close), then do the comparison. If you want ALL duplicate paragraphs highlighted, regardless of content, then you would need more sophisticated coding, but VBScript could probably still handle it. Split the file into an array of string variables based on the NL + 5 spaces. Starting from [1], for each, "compare" and highlight if it matches... I'm not very sure about the final product you want, so I'll leave it at this. My assumption was that you were working on one "thought", or paragraph, at a time, which would be the target. That would be best, but if you want "all duplicates of any paragraph", then it hits the hard stuff: how to deal with it, i.e. how you want it presented for editing purposes.
For example, targeted to bbbb:
aaa
BBBB
ccccc
aaa
dd
BBBB

Or overall:
AAA
BBBB
ccccc
AAA
dd
BBBB


I'm thinking in terms of VBScript mostly, but PowerShell probably has the most horsepower.
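
The "overall" variant above can be sketched roughly as follows. This is only an illustration in Python, not a finished tool. It assumes paragraphs are separated by one or more blank lines (not by new-line + 5 spaces), and the function name `uppercase_duplicates` is made up.

```python
import re
from collections import Counter


def uppercase_duplicates(text):
    """Upper-case every paragraph that occurs more than once in text."""
    # Assumption: paragraphs are separated by one or more blank lines.
    # The capture group makes re.split keep the separators (odd slots).
    parts = re.split(r"((?:\r?\n){2,})", text)
    counts = Counter(parts[0::2])          # even slots hold the paragraphs
    for k in range(0, len(parts), 2):
        if parts[k] and counts[parts[k]] > 1:
            parts[k] = parts[k].upper()    # "highlight" the repeated paragraph
    return "".join(parts)


sample = "aaa\n\nbbbb\n\nccccc\n\naaa\n\ndd\n\nbbbb"
print(uppercase_duplicates(sample))
```

Only exact matches are caught: a paragraph that differs by a single character or a trailing space is treated as unique.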

message edited by nbrane



#6
May 31, 2020 at 02:46:47
I'm not bothered about "horsepower" or how long I have to wait until vbscript runs through the text and I would agree that uppercasing all the duplicated paragraphs will be just as effective as highlighting them. My own thoughts were as yours that perhaps on second thoughts vbscript might not be a bad idea after all as I agree the text file could be read into an array and each paragraph (element) Instr'd against the whole text. That latter bit I couldn't figure out how to do - how to check each successive paragraph.

By the way, Kilavila, if you're still learning PowerShell you certainly have acquired a lot of skill so far!

Thank you

message edited by JuniperGreen



#7
May 31, 2020 at 05:05:43
Thanks!

I was thinking of making a foreach loop that checks each line in the file against the other lines, and if any of them match it will either log the duplicate with its line number or place [Duplicate] in front of the paragraph, etc.

Not sure how I'm going to go about this though. I've already tried a few things but only got errors so far.



#8
May 31, 2020 at 05:32:46
Have been trying to work with Instr to search the text using each array subscript. The trouble is that this will always result in one instance being found as the subscript is included in the whole text so that doesn't help me!


#9
May 31, 2020 at 07:11:33
I apologize if I am missing something obvious to point out
something even more obvious, but you should only have to
search forward from the point where the test string appears.
No need to compare the string to the whole text, because
you have already compared it to everything earlier.
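
As a rough illustration of searching forward only, here is a small Python sketch. The name `repeats_after` is made up, and offsets are 0-based, unlike VBScript's InStr which counts from 1. It locates the first occurrence of a paragraph and then looks only past the end of it, so the paragraph never matches itself.

```python
def repeats_after(text, paragraph):
    """Return start offsets of the paragraph's repeats, searching forward only."""
    first = text.find(paragraph)
    if first < 0 or not paragraph:
        return []
    hits = []
    # Start just past the first copy so it is not counted as its own match.
    pos = text.find(paragraph, first + len(paragraph))
    while pos >= 0:
        hits.append(pos)
        pos = text.find(paragraph, pos + len(paragraph))
    return hits


sample = "one\n\ntwo\n\none\n\nthree\n\none"
print(repeats_after(sample, "one"))
```

Because this is a plain substring search, a short paragraph could also match inside a longer one, so comparing whole paragraphs is the safer test.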

Also a question: Why do you expect entire paragraphs to be
exact duplicates? When I add in text like that, I almost always
end up changing the wording at least slightly.

Also, my paragraphs saying the same thing are almost always
very close to each other, not randomly scattered throughout
the document. I'm a bit surprised that you wouldn't know at
least roughly where in the document you would have placed
a new thought.

Both the questions and answers in this thread are really good!

-- Jeff, in Minneapolis



#10
May 31, 2020 at 07:53:51
Here is a link that might be of interest.
Several suggestions including a batch option.

https://www.raymond.cc/blog/remove-...

MIKE

http://www.skeptic.com/



#11
May 31, 2020 at 08:18:51
Here is an interesting AWK script:

"To remove the duplicate lines while preserving their order in the file, use:"

awk '!visited[$0]++' your_file > deduplicated_file

See this link for an explanation of how it works:

https://opensource.com/article/19/1...

MIKE

http://www.skeptic.com/



#12
May 31, 2020 at 19:03:37
I'm kind of frustrated at this point. The problem is solvable, but only if the layout or format of the solution is defined, and the approach is defined. All "duplicate paragraphs" could be pegged, but without some form of organization the output is meaningless: a list of indexes such as 3 5 7 8 22 24 does not tell us that 3, 7, 22 are twins nor that 5, 8, 24 are twins. What I'm getting at is that you need a collection of "thoughts" before you start the analysis. THEN, for each "thought", you build a list or an output that shows where it is duplicated. You also need a delimiter, which I'm still assuming is 5 spaces starting a new line of text. Here's some junk pseudocode that illustrates the above:

pseudocode:
z = split(readall(file), vbCrLf & Space(5))
for i = 0 to ubound(z)
    content = z(i)
    for j = 0 to ubound(z)
        ' pair up i and j when j is a different paragraph, not yet listed, with the same text
        if j <> i and not in_list(j) and z(j) = content then add_list i, j
    next
next
----------
and you get just the meaningless list as described above, although I did add the identifier.
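
One way to keep the "twins" together, instead of a flat list of indexes, is to key a dictionary on the paragraph text. The Python sketch below is only an illustration under the assumption that paragraphs are separated by blank lines. The name `group_twins` is made up.

```python
import re
from collections import defaultdict


def group_twins(text):
    """Return one list of 1-based paragraph numbers per repeated paragraph."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"(?:\r?\n){2,}", text.strip()) if p]
    where = defaultdict(list)
    for number, para in enumerate(paragraphs, start=1):
        where[para].append(number)     # identical text lands in the same list
    # Keep only paragraphs seen more than once; dicts preserve insertion order.
    return [numbers for numbers in where.values() if len(numbers) > 1]


sample = "a\n\nb\n\nc\n\nb\n\na\n\nd\n\nc"
print(group_twins(sample))
```

Each inner list is one family of identical paragraphs, so a result such as [3, 7, 22] and [5, 8, 24] would show directly which paragraphs belong together.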

message edited by nbrane



#13
June 1, 2020 at 08:35:47
Here is my feeble attempt so far. It's cobbled together more like a Heath Robinson effort than vbscript but I was just having a go following my thinking rather than vbscript syntax. Nbrane, in a previous post, I think you once commented on my lack of proper indentation so maybe you could wink at this effort? I couldn't get it to work, but I forgot my laptop was not plugged in and the battery went flat and the screen black. I was just about to save the script at that moment but when I got up and running again the bit I had been working on was missing. Unlike me as I constantly save my work. So I was left trying to remember what I had. But when I was up and running and ran the script it worked (with limitations) yet I don't know what I had changed to make it run. So far so good but I need your help to finalise it as I'm stuck again.

Apart from a revised version of my spaghetti coding I need additional coding to achieve the following:

1. Display only those paras with more than one match (all paras currently have at least one match and are displayed)

2. Incorporate the "Pos" data for each replicated para (in this example those paras would be at Pos = 552, 1736 and 3522)

3. If there are two or more different paras which have been replicated, this script will only work on the highest indexed one

4. It has to have a text file output as well since there is too much to fit on the screen and duplicates could be widely spaced.
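
The four points above could be covered along the following lines. This is a sketch in Python rather than VBScript, under the assumption that paragraphs are separated by blank lines. The function names and the report wording are made up. Positions are 1-based, as with InStr, and count the line-break characters exactly as they appear in the string passed in.

```python
import re


def duplicate_report(text):
    """Return one report line per paragraph that occurs more than once."""
    parts = re.split(r"((?:\r?\n){2,})", text)   # paragraphs sit in even slots
    found = {}                                   # paragraph -> [(number, position)]
    offset, number = 0, 0
    for k, chunk in enumerate(parts):
        if k % 2 == 0 and chunk:
            number += 1
            found.setdefault(chunk, []).append((number, offset + 1))
        offset += len(chunk)                     # advance past paragraph or separator
    lines = []
    for para, hits in found.items():
        if len(hits) > 1:                        # point 1: repeated paragraphs only
            nums = ", ".join(str(n) for n, _ in hits)
            poss = ", ".join(str(p) for _, p in hits)
            lines.append(f"Paragraphs {nums} at positions {poss}: {para[:40]}")
    return lines


def write_report(text, path):
    """Write the report to a text file; the path is chosen by the caller."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(duplicate_report(text)) + "\n")
```

Every repeated paragraph gets its own line, not just the highest indexed one. Note that Python's default text mode turns CR+LF into a single line feed when reading a file, so positions will only agree with VBScript's if the file is opened with newline="".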

Hope this is not against the rules but here is the script with the sample text. (UCDuprParas.vbs, Ipsum.txt). I hunted around the site but could find no instructions on how to attach files to a posting


Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam facilisis, mauris sit amet gravida commodo, diam massa malesuada purus, et lacinia magna libero at eros. Maecenas fermentum leo quis lacus tincidunt, sed pretium ligula fermentum. Donec vel tellus sed magna dignissim elementum. Sed a magna eros. Phasellus sed semper massa, vitae luctus odio. Etiam convallis est eget aliquet gravida. Nam a venenatis turpis. Ut massa leo, porta id sapien eget, iaculis euismod est. Vivamus placerat consequat lectus sed aliquet. Suspendisse potenti.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.

Phasellus ac massa mattis, gravida erat sed, dignissim leo. Duis in maximus diam. Donec ut laoreet quam. Sed finibus id neque et vestibulum. Interdum et malesuada fames ac ante ipsum primis in faucibus. Praesent a efficitur libero. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Sed porttitor auctor enim, tincidunt faucibus mauris viverra eu. Ut ex nisi, convallis quis dolor nec, porttitor tempor neque.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.

Ut dapibus vitae nisl nec dictum. Phasellus sagittis, magna id semper tempus, sem lectus posuere odio, dapibus commodo sem ipsum et augue. Ut vel libero quam. Etiam nec interdum sapien. Vestibulum convallis vitae dui non ultrices. Curabitur arcu massa, tincidunt at egestas quis, venenatis eget mi. Vivamus vehicula vitae nisi vel iaculis. Fusce porttitor elit non nulla varius semper. Fusce elementum turpis dignissim nulla laoreet, nec porttitor ante laoreet. Nulla tempor sit amet lectus sed posuere.

Nulla at enim eu nibh gravida ultrices ut vel eros. Cras sit amet nisi tempor, euismod risus at, mollis purus. Aenean quis turpis orci. Ut pharetra turpis ex, non imperdiet mi porttitor in. Nam sollicitudin neque ac libero facilisis ornare. Quisque accumsan semper justo, vel eleifend nibh venenatis et. Quisque semper tempor ante in blandit. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Morbi ex augue, pellentesque eu vestibulum sed, aliquet id massa. Proin suscipit vestibulum erat in molestie.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.


Const ForReading = 1
Const ForWriting = 2

Set objFSO=CreateObject("Scripting.FileSystemObject")
Set objFileToRead = CreateObject("Scripting.FileSystemObject").OpenTextFile("D:\DownloadsD\Ipsum.txt",1)
strFileText = objFileToRead.ReadAll()

'wscript.echo strFileText

OutPutFile = "D:\DownloadsD\UCOutPutFile.txt"

a = Split(strFileText, vbCrLf & vbCrLf)
arrCount = uBound(a) + 1

for n = 0 to arrCount-1
start = 1

Do

pos = InStr(start, strFileText, a(n), vbTextCompare)
If pos > 0 Then

start = pos + Len(a(n))
WScript.Echo "Match Position " & pos& ":" & " " & Mid(strFileText, pos, Len(a(n)))

End If
k = a(n)

call CountSubString(strFileText,k)

Loop While pos > 0

next

Function CountSubstring(strFileText,k)
CountSubstring = 0
For i = 1 To Len(strFileText)
If Len(strFileText) >= Len(k) Then
If InStr(i,strFileText,k) Then
CountSubstring = CountSubstring + 1
i = InStr(i,strFileText,k) + Len(k) - 1
End If
Else
Exit For
End If
Next
End Function
wscript.echo CountSubstring(strFileText,k) & vbCrLf
if CountSubstring(strFileText,k) > 1 then
wscript.echo "Paragraph Replicated " & CountSubstring(strFileText,k) & " Times" & vbCrLf & vbCrLf & k
end if

strFileText2 = replace(strFileText,k,ucase(K))
wscript.echo "Replicated Paragraphs In Uppercase" & vbCrLf & vbCrLf & strFileText2

Set objFile = objFSO.CreateTextFile(OutPutFile,2,True)
objFile.Write "Paragraphs Replicated " & CountSubstring(strFileText,k) & " Times" & " Are In UpperCase" & vbCrLf & vbCrLf & strFileText2
objFile.Close

objFileToRead.Close
Set objFileToRead = Nothing




#14
June 1, 2020 at 12:49:52
Sorry folks seems like I've messed up with the cut and paste so will have to review the last posting. Better still can someone tell me how to attach the files to the posting?

message edited by JuniperGreen



#15
June 1, 2020 at 13:15:03
Yeah, the posting messed up the formatting of the text file. The script should run OK now. The 6th line of the script \Ipsum.txt",1) belongs with the 5th line but unfortunately the site formatting has carried it forward to the 6th line

Ipsum.txt

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam facilisis, mauris sit amet gravida commodo, diam massa malesuada purus, et lacinia magna libero at eros. Maecenas fermentum leo quis lacus tincidunt, sed pretium ligula fermentum. Donec vel tellus sed magna dignissim elementum. Sed a magna eros. Phasellus sed semper massa, vitae luctus odio. Etiam convallis est eget aliquet gravida. Nam a venenatis turpis. Ut massa leo, porta id sapien eget, iaculis euismod est. Vivamus placerat consequat lectus sed aliquet. Suspendisse potenti.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.

Phasellus ac massa mattis, gravida erat sed, dignissim leo. Duis in maximus diam. Donec ut laoreet quam. Sed finibus id neque et vestibulum. Interdum et malesuada fames ac ante ipsum primis in faucibus. Praesent a efficitur libero. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Sed porttitor auctor enim, tincidunt faucibus mauris viverra eu. Ut ex nisi, convallis quis dolor nec, porttitor tempor neque.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.

Ut dapibus vitae nisl nec dictum. Phasellus sagittis, magna id semper tempus, sem lectus posuere odio, dapibus commodo sem ipsum et augue. Ut vel libero quam. Etiam nec interdum sapien. Vestibulum convallis vitae dui non ultrices. Curabitur arcu massa, tincidunt at egestas quis, venenatis eget mi. Vivamus vehicula vitae nisi vel iaculis. Fusce porttitor elit non nulla varius semper. Fusce elementum turpis dignissim nulla laoreet, nec porttitor ante laoreet. Nulla tempor sit amet lectus sed posuere.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.

Nulla at enim eu nibh gravida ultrices ut vel eros. Cras sit amet nisi tempor, euismod risus at, mollis purus. Aenean quis turpis orci. Ut pharetra turpis ex, non imperdiet mi porttitor in. Nam sollicitudin neque ac libero facilisis ornare. Quisque accumsan semper justo, vel eleifend nibh venenatis et. Quisque semper tempor ante in blandit. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Morbi ex augue, pellentesque eu vestibulum sed, aliquet id massa. Proin suscipit vestibulum erat in molestie.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.




#16
June 1, 2020 at 21:29:25
Hi Juniper. I have to work tomorrow. If not solved, I'll try a solve. Now I know you have "blank line" between paragraphs, that will help.
I think this works for pasting into the forum: put <PRE> ahead of the text and </PRE> at the end of the text. This should preserve the format you wish to present.
I'll see what happens tomorrow and try to help. I think you're almost there, if not already. You've given me plenty to work with!


message edited by nbrane



#17
June 2, 2020 at 02:55:32
nbrane, you were right, using the "pre" doesn't preserve the formatting on the site page but when that is copied to notepad or something similar the formatting seems to be OK there.

Script

Const ForReading = 1
Const ForWriting = 2

Set objFSO=CreateObject("Scripting.FileSystemObject")
Set objFileToRead = CreateObject("Scripting.FileSystemObject").OpenTextFile("D:\DownloadsD\Ipsum.txt",1)
strFileText = objFileToRead.ReadAll()

'wscript.echo strFileText

OutPutFile = "D:\DownloadsD\UCOutPutFile.txt"

a = Split(strFileText, vbCrLf & vbCrLf)
arrCount = uBound(a) + 1

for n = 0 to arrCount-1
	start = 1

	Do

		pos = InStr(start, strFileText, a(n), vbTextCompare)
		If pos > 0 Then

			start = pos + Len(a(n))
			WScript.Echo "Match Position " & pos & ":" & "  " & Mid(strFileText, pos, Len(a(n)))

		End If
		k = a(n)

		call CountSubString(strFileText,k)

	Loop While pos > 0

next

Function CountSubstring(strFileText,k)
	CountSubstring = 0
	For i = 1 To Len(strFileText)
		If Len(strFileText) >= Len(k) Then
			If InStr(i,strFileText,k) Then
				CountSubstring = CountSubstring + 1
				i = InStr(i,strFileText,k) + Len(k) - 1
			End If
		Else
			Exit For
		End If
	Next
End Function
wscript.echo CountSubstring(strFileText,k) & vbCrLf
 if CountSubstring(strFileText,k) > 1 then 
wscript.echo "Paragraph Replicated " & CountSubstring(strFileText,k) & " Times" & vbCrLf & vbCrLf & k
end if

strFileText2 = replace(strFileText,k,ucase(K))
wscript.echo "Replicated Paragraphs In Uppercase" & vbCrLf & vbCrLf & strFileText2

Set objFile = objFSO.CreateTextFile(OutPutFile,2,True)
objFile.Write "Paragraphs Replicated " & CountSubstring(strFileText,k) & " Times" & " Are In UpperCase" & vbCrLf & vbCrLf & strFileText2
objFile.Close

objFileToRead.Close
Set objFileToRead = Nothing




message edited by JuniperGreen



#18
June 3, 2020 at 05:36:35
nbrane, in case you are wondering why I switched from a(n) (para being searched for) to k - it was because a(n) was not acceptable as part of the function name. When I switched that to k there was no problem so I just carried on using that.


#19
June 6, 2020 at 22:04:18
✔ Best Answer
Here's a prototype that seems to work. The "presentation" and followup are up for grabs, but this delivers the gist:
Set objFSO=CreateObject("Scripting.FileSystemObject")
'Set objFileToRead = CreateObject("Scripting.FileSystemObject").OpenTextFile("D:\DownloadsD\Ipsum.txt",1)
set objFileToRead = CreateObject("Scripting.FileSystemObject").OpenTextFile("ju",1)
strFileText = objFileToRead.ReadAll()

wscript.echo strFileText

OutPutFile = "D:\DownloadsD\UCOutPutFile.txt"

a = Split(strFileText, vbCrLf & vbCrLf)

arrCount = uBound(a)

p=0
alrdon=""

for i=0 to arrCount
  yy=instr(strFileText,a(i))
  pp=yy+len(a(i))+1
  cc=1

  list=i+1
  for j=i+1 to arrCount

    if a(j)=a(i) then
      q=instr(alrdon," "&j&" ")
      if q>0 then exit for
      p=instr(pp,strFileText,a(j))
      pp=p+1
      'add into list of 'already found' to be skipped hence
      alrdon=alrdon&" "&i&" "&j&" "
      list=list&" "&(j+1)
      cc=cc+1
      yy=yy&" "&p
    end if
  next


  if cc>1 then
    wscript.echo "Paragraphs: "&list
    wscript.echo "Text positions: "&yy
    u=wscript.stdin.readline
  end if
next

I would think that each distinct item needs to be handled in a "session" in itself. Another, much simpler way would be to take the first few bytes of each "thought" and use "find" on the text, which will give you the positions, but not the paragraph ID.
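
The "find" idea above can also be sketched in VBScript itself with InStr. This is only an illustration under assumptions: FindAllPositions and the sample text are made up and are not part of the script above, and the 5-character prefix length is arbitrary.

```vbscript
' Returns a space-separated list of every 1-based position at which
' strNeedle occurs in strText (an empty string if there is no match).
' The comparison is case-sensitive, which is InStr's default.
Function FindAllPositions(strText, strNeedle)
    Dim pos, result
    result = ""
    If Len(strNeedle) = 0 Then
        FindAllPositions = result
        Exit Function
    End If
    pos = InStr(1, strText, strNeedle)
    Do While pos > 0
        result = result & pos & " "
        ' continue searching after the end of the current match
        pos = InStr(pos + Len(strNeedle), strText, strNeedle)
    Loop
    FindAllPositions = Trim(result)
End Function

' Hypothetical usage: look for the first 5 characters of a paragraph.
Dim sample
sample = "alpha beta" & vbCrLf & vbCrLf & "gamma" & vbCrLf & vbCrLf & "alpha beta"
WScript.Echo FindAllPositions(sample, Left("alpha beta", 5))   ' echoes: 1 24
```

As noted above, this yields text positions only; mapping a position back to a paragraph number would still need the Split array.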

message edited by nbrane



#20
June 11, 2020 at 09:27:39
Hi, nbrane, thank you for persevering in a quest for a solution. However, when I ran the script it came up with an error message at line 42.1 (u=wscript.stdin.readline) - "the handle is invalid". Any idea what has caused that? Although the string "u" is in that line it doesn't appear anywhere else in the script.

I did think about using search, but what could I enter in the search field, as I would have no idea after a considerable lapse of time what my "thought" had been about weeks earlier! I had hoped that where there was replication of more than one paragraph (i.e. paragraphs with different text) their identification could be achieved, but the limitations of vbscript seem to dictate otherwise. Difficult enough with multi replication of one paragraph. In the version I had a go at, I had hoped to do that but indicate the different text paragraphs in different colored text. Silly me, vbscript can't output text in color!



#21
June 11, 2020 at 20:07:19
Juniper! OK, that line is a "throwaway" that only serves to suspend output pending an input from the keyboard. "u" is nothing, just a receptor for the input, which is itself nothing but an ENTER key. In fact, on further tests "u" was not even needed: wscript.stdin.readline on its own will halt the script until ENTER is pressed. I don't know why your script threw an error while mine did not, as we're both running Windows 7. TAKE THAT LINE OUT, and redirect script output to a temp file instead; that should give you the goods. As for presentation and handling, I'm still somewhat dodgy.
Might try this approach: put each unique-duplicated (I know, sounds oxymoronic but it's not!) paragraph into a file using this vbscript. Then apply findstr using /G:file against the text. So, for example, you have "AAA" that has two dupes: you put content AAA into one file, x-1. You have "BBBBB" that has one dupe: put it into x-2. So any paragraph that is duplicated any number of times has one file as its leader. For each of these "base" files, run a findstr /G:file /O and you'll get your offset for that base file.
At the very least, the vbscript gives you a list of duplicate-content paragraphs that can be worked from.
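
On the "handle is invalid" error: WScript.StdIn is only available when a script is run by the console host cscript.exe. Under wscript.exe, which is what normally runs a .vbs file that is double-clicked, reading from it fails with that message. Whether that is what happened on JuniperGreen's machine is an assumption. A minimal sketch of a pause that works under either host, with RunningUnderCScript and PauseForEnter as made-up helper names:

```vbscript
' True when the script is hosted by cscript.exe (the console host).
' WScript.FullName holds the full path of the host executable.
Function RunningUnderCScript()
    RunningUnderCScript = (LCase(Right(WScript.FullName, 11)) = "cscript.exe")
End Function

' Waits for the user: ENTER in a console, or an OK button otherwise.
Sub PauseForEnter()
    If RunningUnderCScript() Then
        WScript.StdOut.Write "Press ENTER to continue..."
        WScript.StdIn.ReadLine
    Else
        ' StdIn is not usable under wscript.exe, so fall back to a dialog
        MsgBox "Click OK to continue"
    End If
End Sub
```

Calling PauseForEnter in place of the wscript.stdin.readline line would keep the pause without depending on how the script was started.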

message edited by nbrane



#22
June 16, 2020 at 05:12:29
Hi, nbrane, have done some more work on this and I think that what I've got will probably do. I have some misgivings about my coding so I would be grateful if you could cast your eye over it to improve the coding and let me know if there is a better way to code it. I feel it is one of my "Heath Robinson" efforts rather than the correct way to do it! Again as you'll see I've made no attempt at indenting being more concerned to get the script to work.

One problem which I can't resolve is that there doesn't seem to be any way to force Notepad to open the text file consistently in "word wrap" mode. The "advice" given seems to be that you should close Notepad with the settings you would like it to have by default, and on closure these automatically become the default. Is that your understanding too, or is there a way to enforce it?
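
On the word-wrap question: classic Notepad is generally understood to keep this setting in the registry, in a DWORD value named fWrap under HKCU\Software\Microsoft\Notepad, which it reads at start-up. Assuming that holds on this Windows 7 machine, a sketch that sets the value just before launching Notepad might look like this (OpenInNotepadWrapped and NotepadCommand are made-up helper names):

```vbscript
' Builds the command line, quoting the path so spaces are handled.
Function NotepadCommand(strPath)
    NotepadCommand = "notepad.exe " & Chr(34) & strPath & Chr(34)
End Function

' Assumption: Notepad reads word wrap from the DWORD value fWrap
' under HKCU\Software\Microsoft\Notepad (1 = on, 0 = off).
Sub OpenInNotepadWrapped(strPath)
    Dim sh
    Set sh = CreateObject("WScript.Shell")
    sh.RegWrite "HKCU\Software\Microsoft\Notepad\fWrap", 1, "REG_DWORD"
    ' 1 = normal window, True = wait until Notepad is closed
    sh.Run NotepadCommand(strPath), 1, True
End Sub
```

Notepad rewrites its settings when it closes, so the value may change again if word wrap is switched off by hand during a session.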

The other snag which is more concerning is that the script will not work unless there are 3 CR/LFs (enter hit 3 times) after the last character in the text file. Any thoughts on getting round that?
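
On the trailing CR/LF snag: one plausible cause, offered as an assumption rather than a diagnosis, is that the last paragraph keeps a stray line ending after the Split and so no longer compares equal to its earlier copies. A minimal sketch that strips every CR/LF from the end of the text before splitting, so that the number of times Enter was hit no longer matters (TrimTrailingNewlines is a made-up helper name):

```vbscript
' Removes any run of CR and LF characters from the end of strText.
Function TrimTrailingNewlines(strText)
    Dim n
    n = Len(strText)
    Do While n > 0
        ' stop at the first character from the end that is not CR or LF
        If InStr(vbCrLf, Mid(strText, n, 1)) = 0 Then Exit Do
        n = n - 1
    Loop
    TrimTrailingNewlines = Left(strText, n)
End Function

' Hypothetical usage: the text ends with a single CR/LF.
Dim t, parts
t = "one" & vbCrLf & vbCrLf & "two" & vbCrLf & vbCrLf & "one" & vbCrLf
parts = Split(TrimTrailingNewlines(t), vbCrLf & vbCrLf)
WScript.Echo (UBound(parts) + 1) & " paragraphs; first equals last: " & (parts(0) = parts(2))
```

Because only the tail of the text is removed, the character positions reported by InStr for the remaining paragraphs are unchanged. Extra blank lines between paragraphs are not handled by this sketch.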

As the script is for my own use I can live with these shortcomings being aware of them but it remains a pity that I can't sort them out.

Finally, very many thanks for your help as I could not have done this without it, my contribution merely being bolted on to yours.

 
Ipsum.txt

Lorem ipsum dolor sit amet, consectetur adipiscing elit. nullam facilisis, mauris sit amet gravida commodo, diam massa malesuada purus, et lacinia magna libero at eros. Maecenas fermentum leo quis lacus tincidunt, sed pretium ligula fermentum. Donec vel tellus sed magna dignissim elementum. sed a magna eros. Phasellus sed semper massa, vitae luctus odio. Etiam convallis est eget aliquet gravida. Nam a venenatis turpis. Ut massa leo, porta id sapien eget, iaculis euismod est. Vivamus placerat consequat lectus sed aliquet. Suspendisse potenti.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.

O sibile, si ergo. Fortibus es in ero. O nobili, deis trux! Vates enim? Causa an dux!

Phasellus ac massa mattis, gravida erat sed, dignissim leo. Duis in maximus diam. Donec ut laoreet quam. Sed finibus id neque et vestibulum. Interdum et malesuada fames ac ante ipsum primis in faucibus. Praesent a efficitur libero. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Sed porttitor auctor enim, tincidunt faucibus mauris viverra eu. Ut ex nisi, convallis quis dolor nec, porttitor tempor neque.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.

O sibile, si ergo. Fortibus es in ero. O nobili, deis trux! Vates enim? Causa an dux!

Ut dapibus vitae nisl nec dictum. Phasellus sagittis, magna id semper tempus, sem lectus posuere odio, dapibus commodo sem ipsum et augue. Ut vel libero quam. Etiam nec interdum sapien. Vestibulum convallis vitae dui non ultrices. Curabitur arcu massa, tincidunt at egestas quis, venenatis eget mi. Vivamus vehicula vitae nisi vel iaculis. Fusce porttitor elit non nulla varius semper. Fusce elementum turpis dignissim nulla laoreet, nec porttitor ante laoreet. Nulla tempor sit amet lectus sed posuere.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.

Nulla at enim eu nibh gravida ultrices ut vel eros. Cras sit amet nisi tempor, euismod risus at, mollis purus. Aenean quis turpis orci. Ut pharetra turpis ex, non imperdiet mi porttitor in. Nam sollicitudin neque ac libero facilisis ornare. Quisque accumsan semper justo, vel eleifend nibh venenatis et. Quisque semper tempor ante in blandit. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Morbi ex augue, pellentesque eu vestibulum sed, aliquet id massa. Proin suscipit vestibulum erat in molestie.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.

Nulla at enim eu nibh gravida ultrices ut vel eros. Cras sit amet nisi tempor, euismod risus at, mollis purus. Aenean quis turpis orci. Ut pharetra turpis ex, non imperdiet mi porttitor in. Nam sollicitudin neque ac libero facilisis ornare. Quisque accumsan semper justo, vel eleifend nibh venenatis et. Quisque semper tempor ante in blandit. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Morbi ex augue, pellentesque eu vestibulum sed, aliquet id massa. Proin suscipit vestibulum erat in molestie.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
nbraneR5.vbs

Dim oShell, oExec
Set oShell = WScript.CreateObject ("WScript.Shell")
Set FSO=CreateObject("Scripting.FileSystemObject")
Set objFSO=CreateObject("Scripting.FileSystemObject")

strFileName = InputBox(vbCrLf & vbCrLf & "Enter Full Path Including File Name And Extension","Data Input")

'Read In Text File
Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strFileName,1)
strFileText = objFile.ReadAll()

a = Split(strFileText, vbCrLf & vbCrLf)

arrCount = uBound(a)

p=0
alrdon=""

for i=0 to arrCount
yy=instr(strFileText,a(i))
pp=yy+len(a(i))+1
cc=1

list=i+1
for j=i+1 to arrCount

if a(j)=a(i) then
q=instr(alrdon," "&j&" ")
if q>0 then exit for
p=instr(pp,strFileText,a(j))
pp=p+1
'add into list of 'already found' to be skipped hence
alrdon=alrdon&" "&i&" "&j&" "
list=list&" "&(j+1)
cc=cc+1
yy=yy & " "&p

end if
next

if cc>1 then

strArr = Split(yy, " ")
subTotal = Ubound(strArr) + 1
'wscript.echo subTotal
for n = 0 to subTotal-1
Tally = Tally & strArr(n) & "-" & strArr(n)+len(a(i)) & ", "
next
Tally = mid(Tally,1,len(Tally)-2)

wscript.echo "The Following Paragraph Has Been Replicated At Paragraphs:  " &list & vbCrLf & vbCrLf &  a(i) & vbCrLf & vbCrLf & "The Corresponding Locations Of These Paragraphs Are:  " &  Tally
Tally = ""
objFile.Close
NewstrFileText = replace(strFileText,a(i),Ucase(a(I)))

'write strtFileText to file
Set ObjText = FSO.OpenTextFile("D:\DownloadsD\Ipsum.txt", 2,True)
ObjText.Write NewstrFileText
ObjText.Close                                                                  

'Open Text File In NotePad
Set oExec = oShell.Exec("Notepad.exe D:\DownloadsD\Ipsum.txt")

msgBox "              Click OK To Close NotePad" & vbCrLf & vbCrLf &    "                                  And" & vbCrLf & vbCrLf & "     Screen The  Up Next Set Of Matches", vbSystemModal

If oExec.Status = 0 Then oExec.Terminate()

end if 
next

'write strtextfile back as it was originally
Set ObjText = FSO.OpenTextFile("D:\DownloadsD\Ipsum.txt", 2,True)
ObjText.Write strFileText
ObjText.Close 

Set oShell = Nothing
         
wscript.echo "        No Further Matches" & vbCrLf & vbCrLf & "          Click OK To Close"




#23
June 17, 2020 at 22:42:31
Thanks for the full text, that will help. Getting the right results:
Paragraphs: 2 5 8 10
Text positions: 561 1849 3189 4483
Paragraphs: 3 6
Text positions: 1298 2586
Paragraphs: 9 11
Text positions: 3926 5220

test: findstr /o /I /b /c:"sed nibh" ipsum.txt
test: findstr /o /I /b /c:"o sibile" ipsum.txt
test: findstr /o /I /b /c:"nulla at enim" ipsum.txt

I'll get back tomorrow with more details as to output.



#24
June 18, 2020 at 23:34:07
Here's round 9, and counting. This seemed to work as you intended. I left out some of your code that I didn't understand the reason for, but mostly it's yours, with tweaks:
Dim oShell, oExec
Set oShell = WScript.CreateObject ("WScript.Shell")
Set FSO=CreateObject("Scripting.FileSystemObject")
' Same thing as above: Set objFSO=CreateObject("Scripting.FileSystemObject")

strFileName = InputBox(vbCrLf & vbCrLf & "Enter Full Path Including File Name And Extension","Data Input")

'Read In Text File Set objFile = CreateObject("Scripting.FileSystemObject").
'Had to add a concluding delimiter to get thing right in the array
strFileText=FSO.OpenTextFile(strFileName,1).ReadAll()&vbcrlf&vbcrlf

a = Split(strFileText, vbCrLf & vbCrLf)

arrCount = uBound(a)

p=0
alrdon=""
xx=0
pp=1
for i=0 to arrCount
cc=0
list=""
yy=""
pp=1
for j=i to arrCount
if a(j)=a(i) then
  q=instr(alrdon," "&j&" ")
  if q>0 then exit for
  'add into list of 'already found' to be skipped hence
  alrdon=alrdon&" "&j&" "
  p=instr(pp,strFileText,a(j))
  pp=p+len(a(j))+1
  list=list&" "&(j+1)   'report 1-based paragraph numbers, as in the earlier versions
  cc=cc+1
  yy=yy&p&" "
end if
next

if cc>1 then
wscript.echo "Paragraphs: "&list
wscript.echo "Text positions: "&yy
wscript.echo "examining the text..."
'-----------------------

strArr = Split(yy, " ")
'subtract one due to trailing space ie phantom null concluding element cause error
 subTotal = Ubound(strArr)-1
Tally=""
for n = 0 to subTotal
 Tally = Tally & strArr(n) & "-" & (strArr(n)+len(a(i)))& ", "
next
Tally = mid(Tally,1,len(Tally)-2)

wscript.echo "The Following Paragraph Has Been Replicated At Paragraphs:  "&list & vbCrLf & vbCrLf &  a(i) & vbCrLf & vbCrLf & "The Corresponding Locations Of These Paragraphs Are:  " &  Tally
Tally = ""
NewstrFileText = replace(strFileText,a(i),Ucase(a(i)))
'write strtFileText to file
' Here, saw no reason to write then re-write. So put it into temp
 Set ObjText = FSO.OpenTextFile("temp.txt", 2,True)
 ObjText.Write NewstrFileText
 ObjText.Close

'Open Text File In NotePad
'Switched oShell.exe to Run to allow for the "wait till done" option
'Set oExec = oShell.Exec("Notepad.exe temp.txt")
' use "3" for maximized window, use "1" for standard window. you can manually
' change the notepad window regardless...
' the 'True' causes the vbscript to wait till notepad is closed before proceeding. Just use std closing alt-F4 etc to close notepad.
oShell.run "Notepad.exe temp.txt",1,True

'msgBox "              Click OK To Close NotePad" & vbCrLf & vbCrLf &    "
'And" & vbCrLf & vbCrLf & "     Screen The  Up Next Set Of Matches",vbSystemModal

'If oExec.Status = 0 Then oExec.Terminate()
end if
next

'Zapped all this, didn't see the reason, but I might have missed something
'write strtextfile back as it was originally
' Set ObjText = FSO.OpenTextFile("D:\DownloadsD\Ipsum.txt", 2,True)
' ObjText.Write strFileText
'ObjText.Close

'Set oShell = Nothing

'wscript.echo "        No Further Matches" & vbCrLf & vbCrLf & "Click OK To Close"
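
For comparison, the grouping step could also be done with a Scripting.Dictionary keyed on the paragraph text, in place of the alrdon string. This is a swapped-in technique, not what the script above does. GroupDuplicates and the "|"-separated sample are made up for illustration, and paragraph numbers are reported 1-based.

```vbscript
' Groups 1-based paragraph numbers by identical paragraph text.
' Returns a Dictionary mapping each text to a list such as "2 5 8 10".
Function GroupDuplicates(arrParas)
    Dim dict, i, key
    Set dict = CreateObject("Scripting.Dictionary")
    For i = 0 To UBound(arrParas)
        key = arrParas(i)
        If Len(key) > 0 Then                 ' ignore empty elements
            If dict.Exists(key) Then
                dict(key) = dict(key) & " " & (i + 1)
            Else
                dict.Add key, CStr(i + 1)
            End If
        End If
    Next
    Set GroupDuplicates = dict
End Function

' Hypothetical usage with single letters standing in for paragraphs.
Dim paras, groups, k
paras = Split("A|B|A|C|B|A", "|")
Set groups = GroupDuplicates(paras)
For Each k In groups.Keys
    ' a space in the list means the text occurred more than once
    If InStr(groups(k), " ") > 0 Then WScript.Echo k & ": " & groups(k)
Next
' for the sample this reports A at 1 3 6 and B at 2 5
```

Each paragraph is visited once, and no "already found" bookkeeping is needed because the dictionary key does that job.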

message edited by nbrane



#25
June 20, 2020 at 03:15:16
nbrane, not sure what's gone wrong here but it is not identifying the paras or their location correctly. Notepad opens but instead of displaying the text of "ipsum" with the replications in uppercase, it presents a blank page.

The reason why there was a lot of writing and reading of "ipsum" was that reverting from uppercase to lowercase before identifying the next paragraph match left the paragraph all in lowercase, not the sentence case it should have been in. However, thinking about it, I realised that I would be silly to risk using what would be the original document if something went badly wrong or got corrupted. Copying the original document and using the copy means that it would not really matter if the paragraph was all lowercase.
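
The working-copy idea described above could be sketched as follows, assuming FileSystemObject is available as in the scripts earlier in the thread. MakeWorkingCopy and the "_work" suffix are made up for illustration and are not taken from any of the posted scripts.

```vbscript
' Copies strSource to "<name>_work.<ext>" in the same folder and
' returns the path of the copy, so the original is never written to.
Function MakeWorkingCopy(strSource)
    Dim fso, ext, strCopy
    Set fso = CreateObject("Scripting.FileSystemObject")
    ext = fso.GetExtensionName(strSource)
    If Len(ext) > 0 Then ext = "." & ext
    strCopy = fso.BuildPath(fso.GetParentFolderName(strSource), _
                            fso.GetBaseName(strSource) & "_work" & ext)
    fso.CopyFile strSource, strCopy, True   ' True = overwrite an old copy
    MakeWorkingCopy = strCopy
End Function
```

The script would then read, uppercase and display the file returned by MakeWorkingCopy(strFileName), leaving the report itself untouched.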

