How to view a very huge CSV data file?
by curiousbeast - Friday August 16, 2024 at 06:24 PM
#31
I find that visualizing 10GB of data in a GUI is mostly useless. I agree with masedan, and use the command line. The examples that are provided are excellent.

Another idea is to throw it all into an Elasticsearch instance or PostgreSQL database and do searches on those. If you need help, PM me, I can help you get some scripts to push the data to either of those.

Another way is to split the file up based on the common search criteria. For example, if you have data that contains a "State" field, you could write a script to read each line and place it in a CSV named for each state. That way, when you are searching, rather than searching the entire dataset, you will search only the needed state, which should make things a lot faster.
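
A minimal sketch of that split in Python, assuming the file has a header row and a column literally named "State". The function name, the output directory and the filename cleanup are my own choices for illustration, not something from a particular tool:

```python
import csv
import os


def split_csv_by_field(src_path, out_dir, field="State"):
    """Stream src_path row by row and write each row to out_dir/<value>.csv."""
    os.makedirs(out_dir, exist_ok=True)
    writers = {}   # field value -> csv.writer for that value's file
    handles = []   # open output files, closed at the end
    try:
        with open(src_path, newline="", encoding="utf-8") as src:
            reader = csv.reader(src)
            header = next(reader)
            col = header.index(field)
            for row in reader:
                raw = row[col] if col < len(row) else ""
                # Keep only filename-safe characters from the value
                key = "".join(c for c in raw if c.isalnum() or c in "-_") or "unknown"
                if key not in writers:
                    path = os.path.join(out_dir, key + ".csv")
                    fh = open(path, "w", newline="", encoding="utf-8")
                    handles.append(fh)
                    writers[key] = csv.writer(fh)
                    writers[key].writerow(header)  # each piece keeps the header
                writers[key].writerow(row)
    finally:
        for fh in handles:
            fh.close()
    return sorted(writers)
```

This keeps one output file open per distinct value, which is fine for something like states but would hit the open-file limit on a field with thousands of distinct values.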

Yet another way is to create an index for common search terms. For example, if there is an email address field, you can MD5 that field and store the hash along with the offset into the larger file where that record starts. Then store the MD5+offset records in sorted order in a file. That way, when you search for an email address, you can binary-search the index to check whether it exists very quickly, in O(log N) time, and retrieve the exact offset. Then a quick fseek to that offset, read until end of line from that location, and there you go.
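
Here is a rough sketch of that index in Python. The fixed 24-byte record layout (16-byte MD5 digest plus 8-byte offset), the function names, and lowercasing the value before hashing are assumptions I picked for the example. It also assumes one record per line (no quoted fields containing newlines) and sorts the index in memory, so a really big file would need an external sort instead:

```python
import csv
import hashlib
import struct

# One index record: 16-byte MD5 digest + 8-byte big-endian file offset
RECORD = struct.Struct(">16sQ")


def _key(value):
    # Normalising case and whitespace is an assumption, not a requirement
    return hashlib.md5(value.strip().lower().encode("utf-8")).digest()


def build_index(csv_path, index_path, column):
    """Hash column number `column` of every line and store sorted (md5, offset)."""
    entries = []
    with open(csv_path, "rb") as f:
        f.readline()  # skip the header line
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            row = next(csv.reader([line.decode("utf-8", errors="replace")]), [])
            if column < len(row):
                entries.append((_key(row[column]), offset))
    entries.sort()
    with open(index_path, "wb") as out:
        for digest, offset in entries:
            out.write(RECORD.pack(digest, offset))


def lookup(csv_path, index_path, value):
    """Binary-search the index file, then seek into the CSV for each match."""
    target = _key(value)
    offsets = []
    with open(index_path, "rb") as idx:
        idx.seek(0, 2)
        lo, hi = 0, idx.tell() // RECORD.size
        while lo < hi:  # find the first record with digest >= target
            mid = (lo + hi) // 2
            idx.seek(mid * RECORD.size)
            digest, _ = RECORD.unpack(idx.read(RECORD.size))
            if digest < target:
                lo = mid + 1
            else:
                hi = mid
        idx.seek(lo * RECORD.size)
        while True:  # collect every record sharing the target digest
            chunk = idx.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break
            digest, offset = RECORD.unpack(chunk)
            if digest != target:
                break
            offsets.append(offset)
    lines = []
    with open(csv_path, "rb") as f:
        for offset in offsets:
            f.seek(offset)
            lines.append(f.readline().decode("utf-8", errors="replace").rstrip("\r\n"))
    return lines
```

You run build_index once per searchable column; after that each lookup only touches a few dozen index records plus the matching lines, no matter how big the main file is.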

All this to say that there are a ton of ways to solve the issue, but looking at this in a GUI is rarely useful.

My two cents.
#32
Use grep -riE "NAME1.*NAME2" /Desktop/user/...

This is for searching in high-volume files. If you want to organize big files, you could convert the .csv to a .db with a simple Python script; then you get all the data in columns with a clean presentation.
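
Something like this would do the .csv to .db conversion, assuming ".db" means a SQLite file and the CSV has a header row with unique, non-empty column names. The function name, table name and batch size are placeholders I made up:

```python
import csv
import sqlite3


def csv_to_sqlite(csv_path, db_path, table="records", batch_size=50_000):
    """Load a CSV into a SQLite table in batches; returns the number of rows."""
    def quote(name):
        # Quote an identifier for SQLite, doubling any embedded quotes
        return '"' + name.replace('"', '""') + '"'

    total = 0
    with open(csv_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        cols = ", ".join(quote(h) for h in header)
        marks = ", ".join("?" for _ in header)
        insert = "INSERT INTO {} VALUES ({})".format(quote(table), marks)
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("CREATE TABLE IF NOT EXISTS {} ({})".format(quote(table), cols))
            batch = []
            for row in reader:
                # Pad or trim ragged rows so they match the header width
                batch.append((row + [""] * len(header))[:len(header)])
                if len(batch) >= batch_size:
                    conn.executemany(insert, batch)
                    total += len(batch)
                    batch = []
            if batch:
                conn.executemany(insert, batch)
                total += len(batch)
            conn.commit()
        finally:
            conn.close()
    return total
```

After that you can open the .db with the sqlite3 command line tool and add a CREATE INDEX on whatever column you search most, which is what makes the lookups fast.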