Author Topic: Computer people: I think what I am after is "regular expressions"..  (Read 1243 times)


Offline jasc15

  • Posts: 5026
  • Gender: Male
  • TTAL: Yeti welcome
I have a large text file (~250,000 lines) and I want to extract all lines that begin with a specific string of characters.  I am using Notepad++ and the search/replace dialog has many options that are probably useful for this purpose, but I am not knowledgeable about them.

Offline Tomislav95

  • Posts: 6309
  • Gender: Male
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #1 on: July 05, 2016, 01:58:48 PM »
Try with ^string.*$\r\n (replace it with nothing)
EDIT: And if you need your source file, don't forget to save the new file under a different name.
...the years just pass like trains
I wave but they don't slow down...
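For readers following along outside Notepad++, here is a minimal Python sketch of what the "replace it with nothing" suggestion above does: it deletes every line that begins with a given prefix. The function name and sample text are hypothetical. Python's `re` module handles line endings differently from Notepad++, so the pattern ends in `\r?\n` rather than the literal `$\r\n` from the post.

```python
import re

def drop_lines_starting_with(text, prefix):
    """Return text with every line that begins with prefix removed."""
    # re.MULTILINE makes ^ match at the start of each line, not just the text.
    # re.escape treats the prefix as literal characters, not regex syntax.
    # The trailing \r?\n removes the line break together with the line.
    pattern = re.compile(r"^" + re.escape(prefix) + r".*\r?\n", re.MULTILINE)
    return pattern.sub("", text)

# Hypothetical sample; "string" stands in for the real prefix.
sample = "string one\nkeep me\nstring two\nalso keep\n"
print(drop_lines_starting_with(sample, "string"))
```

A final line with no trailing line break would be left in place by this sketch.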

Offline jasc15

  • Posts: 5026
  • Gender: Male
  • TTAL: Yeti welcome
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #2 on: July 05, 2016, 02:03:19 PM »
Sorry, by "extract" I didn't mean "remove from file".  I want to either copy them to the clipboard, or otherwise isolate them.  There are 72 lines of interest, and I need to do this to 2 files.  They aren't hard to find, but ctrl+F, copy, paste, and repeat 144 times is a bit tedious.

Offline Tomislav95

  • Posts: 6309
  • Gender: Male
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #3 on: July 05, 2016, 02:06:09 PM »
Quote from: jasc15 on July 05, 2016, 02:03:19 PM
> Sorry, by "extract" I didn't mean "remove from file".  I want to either copy them to the clipboard, or otherwise isolate them.

Why don't you just copy it into a new tab and then revert the changes in the source file? Or am I not getting something? :lol

Offline jasc15

  • Posts: 5026
  • Gender: Male
  • TTAL: Yeti welcome
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #4 on: July 05, 2016, 02:48:49 PM »
What would I copy into the new tab?  I'm not concerned with altering the source file.  I can regenerate it pretty easily.

Offline Sacul

  • Spinettapilled
  • DTF.org Alumni
  • ****
  • Posts: 12156
  • Gender: Male
  • ¿De qué sirvió haber cruzado a nado la mar?
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #5 on: July 05, 2016, 03:14:04 PM »
Maybe just delete the lines that don't start with those characters and copy what's left into another file :P

Offline Tomislav95

  • Posts: 6309
  • Gender: Male
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #6 on: July 06, 2016, 12:42:47 AM »
Ok, I don't know why I said to replace it when you need the opposite :lol
Just use ^string.* in the Find tab, with the search mode set to Regular expression. Select Find All in All Opened Documents (if you want to search both files). A results tab will then open, and from there you can easily select all and copy.
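The same search can be sketched in Python for anyone who wants the matching lines as a list rather than in a results tab. This is only an illustration of the `^string.*` pattern; the function name and sample text are made up.

```python
import re

def lines_starting_with(text, prefix):
    """Return every line of text that begins with prefix."""
    # ^ anchors the match to the start of a line (because of re.MULTILINE);
    # .* then takes the rest of that line.
    pattern = re.compile(r"^" + re.escape(prefix) + r".*", re.MULTILINE)
    # "." also matches a carriage return, so strip it for Windows-style files.
    return [match.rstrip("\r") for match in pattern.findall(text)]

# Hypothetical sample; "string" stands in for the real prefix.
sample = "string first\nsomething else\nstring second\nnot a string line\n"
for line in lines_starting_with(sample, "string"):
    print(line)
```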

Offline jasc15

  • Posts: 5026
  • Gender: Male
  • TTAL: Yeti welcome
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #7 on: July 06, 2016, 06:34:35 AM »
Thanks for the tip.  It looks like the exact syntax is ^.*[string].*$

This isolates the lines in a separate results view in Notepad++ where I can just copy and paste elsewhere.

:tup


Offline Tomislav95

  • Posts: 6309
  • Gender: Male
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #8 on: July 06, 2016, 07:50:20 AM »
Quote from: jasc15 on July 06, 2016, 06:34:35 AM
> Thanks for the tip.  It looks like the exact syntax is ^.*[string].*$
>
> This isolates the lines in a separate results view in Notepad++ where I can just copy and paste elsewhere.
>
> :tup

That will find every line containing the string, not only the lines that begin with it, because .* means any character, zero or more times.
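A small Python sketch makes the difference between the patterns in this thread concrete. The sample lines are invented. The third pattern assumes the square brackets are typed literally, in which case `[string]` is a character class that matches any single one of the letters s, t, r, i, n, g.

```python
import re

# Hypothetical sample lines.
lines = ["string at start", "has string inside", "sun", "blue"]

anchored = re.compile(r"^string.*")        # line must begin with "string"
anywhere = re.compile(r"^.*string.*$")     # "string" anywhere in the line
char_class = re.compile(r"^.*[string].*$") # any ONE of the letters s,t,r,i,n,g

def matching(pattern, candidates):
    """Return the candidate lines that the pattern matches."""
    return [line for line in candidates if pattern.search(line)]

print(matching(anchored, lines))    # ['string at start']
print(matching(anywhere, lines))    # ['string at start', 'has string inside']
print(matching(char_class, lines))  # ['string at start', 'has string inside', 'sun']
```

Only the anchored pattern restricts the result to lines that begin with the text, which is what the original question asked for.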

Offline rumborak

  • DT.net Veteran
  • ****
  • Posts: 26664
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #9 on: July 06, 2016, 03:08:29 PM »
On Linux this would be as easy as

egrep "^thestring" file.txt

In Notepad++, not sure.
"I liked when Myung looked like a women's figure skating champion."

Offline Stadler

  • DTF.org Alumni
  • ****
  • Posts: 43408
  • Gender: Male
  • Pointing out the "unfunny" since 2014!
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #10 on: July 06, 2016, 04:20:48 PM »
Nerds.

Offline jasc15

  • Posts: 5026
  • Gender: Male
  • TTAL: Yeti welcome
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #11 on: July 06, 2016, 07:06:55 PM »
Now I feel like I know something.


*I don't really know regular expressions.  Only this ad hoc solution for my problem...

Offline Orbert

  • Recovering Musician
  • EZBoard Elder
  • *****
  • Posts: 19267
  • Gender: Male
  • In and around the lake
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #12 on: July 07, 2016, 12:49:05 PM »
I love that comic.

I'm such a nerd.  I saw "Computer people" and the question, and my first thought was to just write a program to do it.  If Notepad++ can do it, cool, but by the time I learned the syntax, I'd have the program done.

Offline jasc15

  • Posts: 5026
  • Gender: Male
  • TTAL: Yeti welcome
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #13 on: July 08, 2016, 12:20:28 PM »
I need to learn programming for this reason.  I come across this sort of thing often.  My job is stress analysis, but it seems most of the time I am doing more bookkeeping than analysis.  I've become more efficient, but learning how to overcome things like this would really put me on another level, and allow me to focus more of my time on the analysis, which is really what matters.

When you say you would write a program to do this, where would you actually "write" it, and how would it be compiled/executed?  What language (which I suppose is a very open question)?

Offline Orbert

  • Recovering Musician
  • EZBoard Elder
  • *****
  • Posts: 19267
  • Gender: Male
  • In and around the lake
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #14 on: July 08, 2016, 01:07:56 PM »
I'm a programmer by profession, so I sit at a computer all day anyway.  I currently use SAS, which is perfect for this kind of thing.  The language was specifically designed to handle large amounts of data quickly.  There's no looping; every statement is assumed to be executed on every line of input.

1 Open input file.
2 Open output file.
3 Read in a line from the input file (implied - you don't even have to write this).  If the first x characters match the pattern, write it to the output file.

If I understand the OP correctly, that's all you're asking for, right?  I would literally have the program done in less than a minute.
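The three steps above can be sketched in a few lines of Python. This is not the SAS program described in the post, just an illustration of the same idea in another language, so unlike SAS the loop over lines is written out. The function name, the file names in the usage note, and the UTF-8 encoding are all assumptions.

```python
def extract_lines(input_path, output_path, prefix):
    """Copy lines that start with prefix from input_path to output_path.

    Returns the number of lines written.
    """
    count = 0
    # Steps 1 and 2: open the input and output files (UTF-8 is an assumption).
    with open(input_path, "r", encoding="utf-8") as src, \
         open(output_path, "w", encoding="utf-8") as dst:
        # Step 3: read one line at a time and keep it if it starts with prefix.
        for line in src:
            if line.startswith(prefix):
                dst.write(line)
                count += 1
    return count
```

With hypothetical file names, a call such as `extract_lines("results.txt", "matches.txt", "thestring")` would write the matching lines to `matches.txt`. It reads one line at a time, so a 250,000-line file is no problem.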

Offline Tomislav95

  • Posts: 6309
  • Gender: Male
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #15 on: July 08, 2016, 01:18:22 PM »
I don't know if you're familiar with Linux, but if you have it you could easily learn how to use bash in the Linux terminal. It can be used for executing scripts but also as an interactive program.

Offline jasc15

  • Posts: 5026
  • Gender: Male
  • TTAL: Yeti welcome
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #16 on: July 08, 2016, 07:31:56 PM »
Quote from: Orbert on July 08, 2016, 01:07:56 PM
> I'm a programmer by profession, so I sit at a computer all day anyway.  I currently use SAS, which is perfect for this kind of thing.  The language was specifically designed to handle large amounts of data quickly.  There's no looping; every statement is assumed to be executed on every line of input.
>
> 1 Open input file.
> 2 Open output file.
> 3 Read in a line from the input file (implied - you don't even have to write this).  If the first x characters match the pattern, write it to the output file.
>
> If I understand the OP correctly, that's all you're asking for, right?  I would literally have the program done in less than a minute.

That is what I am asking in this particular instance, and using regular expressions in Notepad++ worked quite well once I knew the syntax.  I'm a mechanical engineer and I deal with large data files generated by a finite element analysis program, but have no training or experience in programming.  The closest thing I've done is MATLAB, which uses a high-level language and relatively intuitive syntax once you learn it.  With regard to my finite element analysis models, I often reconfigure them to investigate different design configurations and this generates a lot of data very quickly.  The workflow can be pretty tedious with every iteration, and my main analysis tool is Excel, into which I enter the data from these large files I mentioned.

Also, I love xkcd too, and learning a programming language would be worth it solely for understanding more of Randall Munroe's jokes.
« Last Edit: July 08, 2016, 07:37:44 PM by jasc15 »

Offline Orbert

  • Recovering Musician
  • EZBoard Elder
  • *****
  • Posts: 19267
  • Gender: Male
  • In and around the lake
Re: Computer people: I think what I am after is "regular expressions"..
« Reply #17 on: July 10, 2016, 07:17:02 PM »
It's weird.  Since I'm a programmer and have been for a while, it's hard to remember what it was like before.  When I was teaching, it seemed perfectly natural to just write something in Pascal -- which I was teaching at the time -- to handle all the grades and generate notices and status reports and stuff.  It's what I do.  Problem solving via data manipulation.  Nowadays I'd probably just do it in SAS.  Not really the perfect language for it, but it's what I "think in" right now.

Working with regular expressions, to me, is similar to programming.  There's obviously a syntax to it, a set of codified rules, but it's finite, therefore it can be learned and mastered.

I'm not a regular xkcd reader, but I've seen a lot of their stuff (it's impossible not to) and it's usually pretty funny.  And I can see what you mean about the inside programmer jokes.  Fun stuff.