File chunker
This message was discovered on microsoft.public.dotnet.general.

shiva (VIP)
Does anyone know of any component or samples showing how to chunk a big file
(approx. 20 MB) into smaller files to keep memory use manageable? The file is
a fixed-length text file and doesn't have any segment separators.

Morten Wennevik
Hi shiva,

This piece of code should split a file into smaller files based on how many bytes the smaller files should contain.
Note that if the file is Unicode, you would probably need to use a BinaryReader/BinaryWriter to read and write whole characters, since splitting on a byte count can cut a multi-byte character in half.

using System.IO;

FileStream fsR = File.Open("MyFile", FileMode.Open);

int size = 1000000; // maximum size in bytes of each smaller file
int count = 0; // bytes written to the current output file
int i = 0; // the byte just read, or -1 at end of file
int n = 0; // suffix that gives each output file a unique name

string filename = "c:\\test.fl"; // base filename

FileStream fsW = File.Create(filename + n); // initial file

while((i = fsR.ReadByte()) != -1) // while there are bytes
{
    // once count has reached size, it is time to create a new file
    if(count >= size)
    {
        n++;
        fsW.Close();
        fsW = File.Create(filename + n);
        count = 0;
    }
    // write the byte that was taken from the original file
    fsW.WriteByte((byte)i);
    count++;
}

fsW.Close();
fsR.Close();

--
Happy coding!
Morten Wennevik [C# MVP]
shiva (VIP)
Thanks much Morten :)

My next challenge, and the one I was worried about, is how to get complete
records out of this chunking. Once I chunk the file, I need to map the data to XML.

Below is my sample data. After the chunking, each smaller file
should contain the data for an entire family.

Sample data:
ABCDE DIAGNOSTICS Stevens Teresa
ABCD Lakeview Drive
Noblesville, IN 44444 3333333333177762444
F030819580108200412021996 N 01082004
EC
555555555Stevens Michael W

01082004 01082004M00484690306041985
C N
ABCDE DIAGNOSTICS Gabriel Jason
MMMMMM Echo Trail
Indianapolis, IN 55555 6666666663178262999
M093019700101200410292001 N 01012004
FA
055608717Gabriel Stacy L

01012004 01012004F29064573305171972
S N
055608717Gabriel Taylor A

01012004 01012004F31421914109131999
C N
055608717Gabriel Ashley M

01012004 01012004F30525265307102001
C N

"Morten Wennevik" wrote:

[Original message clipped]

Morten Wennevik
On Thu, 9 Sep 2004 08:55:02 -0700, shiva wrote:

[Original message clipped]

Then you need to read enough data to know when you have an entire family, dump that to a file, and read in the next family. Find some marker that indicates the beginning or the end of a family, then read the data in a chunk at a time.

The pseudocode would be something like this:

while(end of file not found)
{

    do
    {
        read an array of bytes,
        search for family marker
        add this chunk to other chunks stored in memory        
    }
    while(family marker not found && end of file not found)

    dump the family to file, keeping the extra bytes from the last chunk not belonging to this family

    if(file size is above or nearing the limit)
        create new file
}

--
Happy coding!
Morten Wennevik [C# MVP]
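
The pseudocode above can be sketched in C# roughly as follows. This is a minimal sketch under assumptions the thread does not confirm: that records are separated by line breaks as in the sample data, and that every family begins with a line starting with a known marker (the sample suggests "ABCDE DIAGNOSTICS"). It reads with StreamReader.ReadLine rather than searching raw byte arrays, which is a simpler technique swapped in for the chunk search in the pseudocode. FamilySplitter, Split and the parameter names are hypothetical.

```csharp
using System;
using System.IO;

static class FamilySplitter
{
    // Copies inputPath into numbered files (outputBase + "0", "1", ...),
    // starting a new file only at a family boundary once the current file
    // holds at least maxChars characters. Returns the number of files written.
    public static int Split(string inputPath, string outputBase, string familyMarker, long maxChars)
    {
        int fileIndex = 0;
        long written = 0; // characters written to the current output file

        using (StreamReader reader = new StreamReader(inputPath))
        {
            StreamWriter writer = new StreamWriter(outputBase + fileIndex);
            try
            {
                string line;
                // Only one line is held in memory at a time.
                while ((line = reader.ReadLine()) != null)
                {
                    // Assumption: a line starting with familyMarker opens a new family.
                    bool startsFamily = line.StartsWith(familyMarker, StringComparison.Ordinal);
                    if (startsFamily && written >= maxChars)
                    {
                        writer.Dispose();
                        fileIndex++;
                        writer = new StreamWriter(outputBase + fileIndex);
                        written = 0;
                    }
                    writer.WriteLine(line);
                    written += line.Length + Environment.NewLine.Length;
                }
            }
            finally
            {
                writer.Dispose();
            }
        }
        return fileIndex + 1;
    }
}
```

A call such as FamilySplitter.Split("MyFile", "c:\\test.fl", "ABCDE DIAGNOSTICS", 1000000) would keep each family whole, so an output file can exceed the limit by up to one family.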
shiva (VIP)
Thanks very much again! That was an excellent idea :)

In reality, since I am reading the big file for the first time, it runs out of
memory before I even split it up.

Now I am thinking that I can't empty a well full of water in one shot; I
have to take it out bucket by bucket.

Is there any API with which I can read only 1 MB at a time, for example,
instead of the entire file?

Thank you,
Shiva

"Morten Wennevik" wrote:

[Original message clipped]
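
On the question of reading only 1 MB at a time: FileStream.Read fills a caller-supplied buffer and returns how many bytes it actually read, so only that buffer needs to be in memory. A minimal sketch, assuming the caller supplies the buffer size and a callback to process each block; BufferedChunkReader, ReadInChunks and handleChunk are hypothetical names, not part of the framework.

```csharp
using System;
using System.IO;

static class BufferedChunkReader
{
    // Reads the file at path in blocks of at most bufferSize bytes and
    // passes each block to handleChunk together with its valid byte count.
    // Returns the total number of bytes read.
    public static long ReadInChunks(string path, int bufferSize, Action<byte[], int> handleChunk)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;

        using (FileStream input = File.OpenRead(path))
        {
            int bytesRead;
            // Read returns 0 only at end of file; before that it may
            // return fewer bytes than requested.
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Only the first bytesRead bytes of buffer are valid here.
                handleChunk(buffer, bytesRead);
                total += bytesRead;
            }
        }
        return total;
    }
}
```

For 1 MB blocks the call would look like BufferedChunkReader.ReadInChunks("MyFile", 1024 * 1024, (buf, len) => { /* process buf[0..len) */ }). The same buffer is reused for every block, so the callback should copy any bytes it needs to keep.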
