Split and reassemble almost doubles the file size

Post by charu vasudev » Tue Mar 09, 2010 8:55 am

Hi,

I am trying to split one multipage PDF file into single pages and then merge those split PDFs back into a single PDF file, using the methods below. I am using the latest version.

Original file details: Size = 3.50 MB, Pages: 80 (please find the attachment)

Extract/split observation: process time: 2 sec, size of all 80 files = 5.02 MB
Merge observation: process time: 1 sec, size = 8.63 MB

Need assistance on:

1. Is this the correct method for split/merge?
2. If the extracted pages total 5.02 MB, why is the merged PDF file larger than that?
3. How can I rebuild the file after extracting so that it is close to its original size?


Private Sub ExtractPages()

    Dim f As New OpenFileDialog
    f.Filter = "PDF Files|*.pdf"
    If f.ShowDialog = Windows.Forms.DialogResult.OK Then
        'Label6.Text = Now
        'Label6.Update()
        'Dim fl As New FileInfo(f.FileName)
        'Label20.Text = Math.Round(fl.Length / 1024, 3)
        'Label20.Update()

        ' Load the selected PDF, then write each page to its own file.
        v.Load(f.FileName, "")
        For i As Int32 = 1 To v.pageCount
            v.extractPagesToFile(Application.StartupPath & "\Extracted\Extract_Sample" & i.ToString & ".pdf", i)
            ''v.Save(Application.StartupPath & "\Extract_Sample" & i.ToString & ".pdf")
        Next
        'Label5.Text = Now
        'Label5.Update()
        'Label4.Text = DateDiff(DateInterval.Second, CType(Label6.Text, Date), CType(Label5.Text, Date))
        ''
        MsgBox("Done")
    End If

End Sub

Private Sub MergePages()

    Dim d As New DirectoryInfo(Application.StartupPath & "\Extracted")
    Dim f() As FileInfo
    ' Note: GetFiles does not guarantee the order of the returned files,
    ' so the page order of the merged file may differ from the original.
    f = d.GetFiles()
    ' Insert the single page of every extracted file into the loaded document.
    For i As Int32 = 0 To f.Length - 1
        v.insertPagesFromPDF(f(i).FullName, "", 1, VSPDFEditorX.TxVSPDFInsertPagesPosition.VSPDF_INSERTPAGES_AFTER_CURRENTPAGE)
    Next
    v.Save(Application.StartupPath & "\Merge\test.pdf")


    ' Label8.Text = Now
    ' Label8.Update()
    ' Label7.Text = DateDiff(DateInterval.Second, CType(Label9.Text, Date), CType(Label8.Text, Date))

    ' Dim fl As New FileInfo(Application.StartupPath & "\Merge\test.pdf")
    'Label19.Text = Math.Round(fl.Length / 1024, 3)
    'Label19.Update()

    MsgBox("done")

End Sub

Thanks n Rgds
Charu
Attachments
orignal_baroda.pdf
(3.68 MiB) Downloaded 25 times
charu vasudev
 
Posts: 108
Joined: Thu Oct 04, 2007 12:46 pm

Re: Split and reassemble almost doubles the file size

Post by nick.visagesoft » Tue Mar 09, 2010 10:30 am

Hi Charu,

The merged file is larger because of the PDF objects used in the original. Objects such as fonts and images are shared between pages in the original document, but when merging, each object is created again.
Let's say the original document uses 3 fonts and each font takes about 2M. When a page is extracted, the font used by that page is extracted as well, so we end up with a PDF file of, let's say, 3M (2M for the font and 1M for other objects).
The merge process takes all objects from each imported document and inserts them into the destination document, which results in larger files. In other words, the 2M font is duplicated once for every imported PDF that uses it.
The merge process has no way to identify objects that are exactly the same and leave out the duplicates.
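
To put rough numbers on this, here is a minimal sketch of the arithmetic only. It does not use the control at all, and the page count, page size and font size are made-up figures for illustration, not measurements of the attached file.

```vbnet
Imports System

Module SharedObjectSizeSketch
    ' Size of a document in which every page references one shared copy of the resources.
    Function SharedSize(pageCount As Integer, pageBytes As Long, sharedBytes As Long) As Long
        Return sharedBytes + pageCount * pageBytes
    End Function

    ' Size after a merge that copies the resources in again with every imported page.
    Function DuplicatedSize(pageCount As Integer, pageBytes As Long, sharedBytes As Long) As Long
        Return pageCount * (sharedBytes + pageBytes)
    End Function

    Sub Main()
        ' Hypothetical figures, chosen only to show the effect.
        Const pages As Integer = 80
        Const pageBytes As Long = 40000
        Const fontBytes As Long = 50000

        Console.WriteLine("Shared:     " & SharedSize(pages, pageBytes, fontBytes))
        Console.WriteLine("Duplicated: " & DuplicatedSize(pages, pageBytes, fontBytes))
    End Sub
End Module
```

With these assumed figures the shared layout comes to 3,250,000 bytes and the duplicated one to 7,200,000 bytes, so a single shared font of modest size is enough to roughly double the result once it is copied in with every page.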

Hope this helps
nick.visagesoft
 
Posts: 507
Joined: Fri Jul 27, 2007 1:33 pm

Re: Split and reassemble almost doubles the file size

Post by charu vasudev » Wed Mar 10, 2010 9:26 am

Hi Nick,

Thank you for your quick response. As a workaround, we tried deleting pages from the original PDF file. However, the size of the PDF does not change even if we remove all but one page from it. In other words, my 80-page file is almost the same size as the new file that has just one page in it. Can you help me with this?

Rgds
Charu

Re: Split and reassemble almost doubles the file size

Post by nick.visagesoft » Wed Mar 10, 2010 9:31 am

Deleting pages will also increase the size :-(
Updates to a PDF document are written as incremental updates. In other words, when saving after a change you will always end up with a larger file. This is the way the PDF document structure handles updates.

I could implement a new save method that saves all PDF document objects without using incremental updates. If this is of any help, let me know and I'll see what I can do. The new save method will be much slower than incremental updates, since all objects need to be parsed and saved back to disk.
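
The difference between the two kinds of save can be sketched with a toy model. This is not how the control is implemented; the StoredObject class, the object sizes and the 500-byte update section are all assumptions made up for the example.

```vbnet
Imports System
Imports System.Collections.Generic

Module IncrementalSaveSketch
    ' Toy stand-in for an object in the file: its size and whether anything still references it.
    Class StoredObject
        Public Size As Long
        Public Live As Boolean
    End Class

    ' Incremental update: nothing already written is removed, an update section is appended.
    Function IncrementalSize(objects As List(Of StoredObject), updateBytes As Long) As Long
        Dim total As Long = updateBytes
        For Each o As StoredObject In objects
            total += o.Size
        Next
        Return total
    End Function

    ' Full save: only the objects that are still referenced are written out again.
    Function FullSaveSize(objects As List(Of StoredObject)) As Long
        Dim total As Long = 0
        For Each o As StoredObject In objects
            If o.Live Then total += o.Size
        Next
        Return total
    End Function

    Sub Main()
        ' Hypothetical 80-page file in which every page but the first has been deleted.
        Dim objects As New List(Of StoredObject)
        For i As Integer = 1 To 80
            objects.Add(New StoredObject With {.Size = 40000, .Live = (i = 1)})
        Next

        Console.WriteLine("Incremental save: " & IncrementalSize(objects, 500))
        Console.WriteLine("Full save:        " & FullSaveSize(objects))
    End Sub
End Module
```

In this model the incremental save can never be smaller than the file it started from, which matches the behaviour seen after deleting pages, while the full save has to visit every object, which is where the extra time goes.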

Re: Split and reassemble almost doubles the file size

Post by charu vasudev » Thu Mar 11, 2010 9:33 am

Yes, if that is possible, we would like to have the new save method ASAP.

Thanks n Rgds
Charu

Re: Split and reassemble almost doubles the file size

Post by nick.visagesoft » Fri Mar 12, 2010 10:06 am

Charu,

Please download the new build from
http://www.visagesoft.com/downloads/get ... ditorx.zip

The new build implements a new method named SaveEx(filename, mode), where mode is an enum type with the values
VSPDF_SAVE_UPDATES (for incremental updates)
VSPDF_SAVE_REASSEBLE (for full save)

You will need a call like saveex('my.pdf', VSPDF_SAVE_REASSEBLE)

Run some tests and let me know if you run into problems.

