Handling/managing 10TB of data
by Rebenga - Friday March 7, 2025 at 11:36 PM
#1
"I do not own 10 TB of data"

But if I were to own it, how could I manage it locally, e.g. build a search engine over it?
What's the best practice for this, which frameworks should I use, etc.?
What specs will I need?


Note:
I need something good: as fast as MongoDB but using less RAM. It's OK if it's a bit slower, as long as it relies mainly on the NVMe SSDs.
We don't do anything illegal whatsoever :)
#2
If you have a small machine, just stick to grep and patience.
Otherwise MongoDB or Cassandra.
Ideally Redis for lightning-fast responses, but it requires a shitload of RAM.
#3
(03-08-2025, 10:52 AM)eVee Wrote: If you have a small machine, just stick to grep and patience.
Otherwise MongoDB or Cassandra.
Ideally Redis for lightning-fast responses, but it requires a shitload of RAM.

Yeah man, those just need too much RAM; my budget can't handle it.
I'm trying to make this work on 64 GB of RAM and a 4 TB SSD, if I can somehow compress/index the data.
#4
(03-08-2025, 06:53 PM)Rebenga Wrote:
(03-08-2025, 10:52 AM)eVee Wrote: If you have a small machine, just stick to grep and patience.
Otherwise MongoDB or Cassandra.
Ideally Redis for lightning-fast responses, but it requires a shitload of RAM.

Yeah man, those just need too much RAM; my budget can't handle it.
I'm trying to make this work on 64 GB of RAM and a 4 TB SSD, if I can somehow compress/index the data.

Any form of compression will slow it down.
To make it work, you could look at Cassandra with LZ4 compression enabled. Not sure 4 TB will be quite enough, though: I think LZ4 will only compress the 10 TB down to about 6 or 7 TB.
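A quick sanity check on those numbers, as a rough sketch only. The 10 TB and 4 TB figures come from this thread; the 6 to 7 TB compressed size is just the guess above, not a measurement, and the function names are made up for illustration.

```python
def required_ratio(raw_tb: float, disk_tb: float) -> float:
    """Compression ratio needed for raw_tb of data to fit on disk_tb."""
    return raw_tb / disk_tb


def achieved_ratio(raw_tb: float, compressed_tb: float) -> float:
    """Compression ratio implied by a given compressed size."""
    return raw_tb / compressed_tb


RAW_TB = 10.0   # data size from the thread
DISK_TB = 4.0   # SSD size from the thread

print(f"needed: {required_ratio(RAW_TB, DISK_TB):.2f}x")

# 6 and 7 TB are the guessed LZ4 results, not measured values.
for compressed_tb in (6.0, 7.0):
    ratio = achieved_ratio(RAW_TB, compressed_tb)
    verdict = "fits" if compressed_tb <= DISK_TB else "does not fit"
    print(f"{compressed_tb:.0f} TB compressed = {ratio:.2f}x -> {verdict}")
```

So the data would have to shrink by 2.5x to fit, while the guess above works out to roughly 1.4x to 1.7x, and that is before counting any index or free-space overhead. The real ratio depends entirely on the data, so compress a sample first.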
#5
Don't forget redundancy: if you mirror the data (RAID 1 or RAID 10), storing 10 TB takes 20 TB of raw storage. Parity-based levels such as RAID 5 or RAID 6 need less raw capacity but have their own trade-offs. Look into the RAID levels to see which one is the most appropriate for your use case. :)
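To put rough numbers on it, here is a small sketch of the raw capacity needed for 10 TB usable under a few common RAID layouts. The 2 TB drive size is an assumption picked for round numbers, the helper names are hypothetical, and it ignores filesystem overhead and hot spares.

```python
import math

# Extra drives consumed by parity, and minimum array size, per level.
PARITY_DRIVES = {"raid5": 1, "raid6": 2}
MIN_DRIVES = {"raid5": 3, "raid6": 4}


def drives_needed(usable_tb: float, drive_tb: float, level: str) -> int:
    """Number of equal-size drives needed for usable_tb of capacity."""
    data_drives = math.ceil(usable_tb / drive_tb)
    if level == "mirror":  # RAID 1 / RAID 10: every drive has a copy
        return 2 * data_drives
    return max(MIN_DRIVES[level], data_drives + PARITY_DRIVES[level])


def raw_tb(usable_tb: float, drive_tb: float, level: str) -> float:
    """Total raw capacity bought for the given layout."""
    return drives_needed(usable_tb, drive_tb, level) * drive_tb


USABLE_TB = 10  # from the thread
DRIVE_TB = 2    # assumed drive size

for level in ("mirror", "raid5", "raid6"):
    n = drives_needed(USABLE_TB, DRIVE_TB, level)
    print(f"{level}: {n} drives, {raw_tb(USABLE_TB, DRIVE_TB, level)} TB raw")
```

Mirroring is where the 20 TB figure comes from; with these assumed drives, single parity comes to 12 TB raw and double parity to 14 TB raw.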