Preferred Method of Exploring Large Datasets
by countdown1945 - Tuesday April 30, 2024 at 03:27 AM
#1
What is everyone's preferred method of exploring large datasets and dumps? So far I've had luck using the Linux terminal (grepping, awk'ing, etc.).
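For example, the kind of thing I mean (file name and column layout here are just made up):

# stream a big CSV through grep so nothing huge has to load into memory
grep -i "example.com" dump.csv | head -n 20

# print only the first and third comma-separated fields of matching lines
awk -F',' '/example\.com/ {print $1 "," $3}' dump.csv > matches.csv

# rough count of lines per domain, assuming field 1 is an email address
cut -d',' -f1 dump.csv | awk -F'@' '{print $2}' | sort | uniq -c | sort -rn | head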

Loading really huge dumps will inevitably make my shit freeze.

What's your preferred method of exploring datasets?
#2
For medium-sized data, a simple Python pandas solution (or even Excel) is enough, but for large data you can't process it on a single computer; a cluster is needed.
#3
I like to first analyze what type of data it contains and, once I've figured that out, put it in a local database. For me it's simple and easy to dump all the data into a database/table (with the proper format) and run SELECTs or whatever you need to read the data. It does depend on the size of the dump: usually I can get by with grep/awk/CLI commands to extract the data, but if the dump is large, the best option is throwing it into a database.
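A minimal sketch of what I mean, assuming the dump is a plain CSV whose first line holds the column names (the file, table, and column names here are made up), using SQLite as the quick local database:

# import the CSV into a local SQLite database and query it
sqlite3 dump.db <<'EOF'
.mode csv
.import /path/to/dump.csv accounts
-- index the column you search most; assumes a column literally named "email"
CREATE INDEX idx_email ON accounts(email);
SELECT * FROM accounts WHERE email LIKE '%@example.com' LIMIT 20;
EOF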
#4
This thread is really interesting and useful to me because I'm just trying to figure things out. So far my solution was to buy more RAM and open a large (15 GB?) .sql file in Notepad++. It worked, but I'm sure there are better methods.

I've been looking at Postgres but I don't understand the process of importing a .sql file into it and then running queries. I learn best by hands-on, but it's been rough having no real direction.
#5
(04-30-2024, 03:37 PM)user512 Wrote: This thread is really interesting and useful to me because I'm just trying to figure things out. So far my solution was to buy more RAM and open a large (15 GB?) .sql file in Notepad++. It worked, but I'm sure there are better methods.

I've been looking at Postgres but I don't understand the process of importing a .sql file into it and then running queries. I learn best by hands-on, but it's been rough having no real direction.

So as far as I'm aware, the process of importing an SQL dump into PostgreSQL is fairly straightforward:

1. Create a database
2. Connect to the database
3. Use: \i /path/to/your/dump_file.sql
4. Use queries to find the relevant data
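Put concretely (the database, table, and file names below are just placeholders):

# in a shell, with the PostgreSQL client tools installed
createdb dump_db          # 1. create a database
psql dump_db              # 2. connect to it

-- then at the psql prompt:
-- 3. import the dump (this can take a while on big files)
\i /path/to/your/dump_file.sql
-- list the tables the dump created, then 4. query them
\dt
SELECT * FROM some_table WHERE email LIKE '%@example.com' LIMIT 20;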
#6
(05-01-2024, 12:21 AM)countdown1945 Wrote:

So as far as I'm aware, the process of importing an SQL dump into PostgreSQL is fairly straightforward:

1. Create a database
2. Connect to the database
3. Use: \i /path/to/your/dump_file.sql
4. Use queries to find the relevant data

Amazing, thank you. I'll start trying to play around with that. 

Generally, what's considered a large dataset? I expected 15 GB to be pretty huge, but I was able to navigate it reasonably well with Notepad++.
#7
EmEditor to the rescue. That's what's mostly used by people here.
#8
For Windows I use Select-String in PowerShell, and for Linux, grep and cut.
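For example (file name and field layout made up, and the PowerShell line is only a rough equivalent):

# Linux: stream the file with grep, then keep only the first colon-separated field
grep -i "example.com" dump.txt > hits.txt
cut -d':' -f1 hits.txt | sort -u > emails.txt

# Windows rough equivalent of the grep step:
# Select-String -Path dump.txt -Pattern "example.com" | ForEach-Object { $_.Line } > hits.txt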

