class MsCsv
When Microsoft Excel exports a file as CSV, hardly a convention goes
unviolated. Fields that contain newlines are exported in quotes, and the
newlines are not escaped: each one is written as a bare "\x0a" (LF). The
records themselves, however, are still separated by MS newlines, that is
"\x0d\x0a" (CR LF).
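The consequence is easiest to see on a small hand-made sample, which is an assumption and not real Excel output. Splitting on every line break tears the quoted field apart. Splitting only on CR LF keeps each record whole, as long as no quoted field itself contains a CR LF. This only illustrates the format; MsCsv does not split on CR LF but tracks quotes while parsing, as its each_record source shows.

```ruby
# Hand-made sample in the style of an Excel export (assumed, not real output):
# the quoted note contains a bare LF, while records end with CR LF.
sample = "id;note\r\n1;\"first line\nsecond line\"\r\n2;plain\r\n"

# Splitting on any line break cuts record 1 into two pieces.
naive = sample.split(/\r?\n/)

# Splitting only on CR LF keeps the embedded LF inside its record.
records = sample.split("\r\n")

puts naive.length    # prints 4
puts records.length  # prints 3
```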
Constants
- QUOTE
- SEP
- VERSION
Attributes
Public Class Methods
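The sep and quote parameters default to
";" and "\"" (the constants SEP and QUOTE).
More than one character may be specified; every character is then accepted
as a separator or quote, and a quoted field must be closed by the same
character that opened it.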
    # File lib/mscsv.rb, line 50
    def initialize file, sep = nil, quote = nil
      @file = file
      @sep, @quote = sep||SEP, quote||QUOTE
    end
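Because each_record tests every character with @sep.include? c, a multi-character sep means "any of these characters". The helper below is a hypothetical, quote-unaware sketch of just that membership test; split_on_any is not part of MsCsv.

```ruby
# Hypothetical, quote-unaware helper: every character of sep separates
# fields, the same membership test (sep.include? c) that each_record uses.
def split_on_any(line, sep = ";")
  fields, field = [], +""
  line.each_char do |c|
    if sep.include?(c)
      fields.push(field)
      field = +""
    else
      field << c
    end
  end
  fields.push(field)   # the last field; an empty line still yields one field
end

p split_on_any("a;b,c", ";,")  # prints ["a", "b", "c"]
p split_on_any("a;b,c")        # prints ["a", "b,c"]
```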
Public Instance Methods
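Yield each record with its fields converted to the types given in recdef,
either a string of one-letter type codes (whitespace is ignored) or an array
of such codes. Records consisting only of nil fields are skipped, and nil
fields stay nil. If any field in a record cannot be converted, the second
block argument holds a string of the form "position: message" and the fields
are yielded unconverted. At present, the following types may be requested: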
    s => string
    S => stripped string, nil if empty
    n => integer
    c => currency   # class Currency must respond to parse (*)
    d => date       # class Date must respond to strptime (*)
    t => time       # class Time must respond to parse (*)
    b => boolean
    (*) choose the appropriate require yourself.
Example:
    MsCsv.open "somefile.csv" do |f|
      f.each_as "ndsscb" do |r,d|
        puts r.length.inspect + " " + r.inspect unless d
      end
    end
    # File lib/mscsv.rb, line 240
    def each_as recdef
      case recdef
      when String then recdef = recdef.scan(/\S/)
      end
      d = recdef.map { |f| FORMATS[f] }
      each_notempty do |r|
        begin
          i = 0
          r = (d.zip r).map do |(fmt,fld)|
            i += 1
            if fld then
              f = fmt.new fld
              f.val
            end
          end
        rescue
          r = r.map { |x| x.notempty? }
          err = "#{i}: #$!"
        end
        yield r, err
      end
    end
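each_as only assumes that FORMATS maps a type code to a class whose new takes the raw field and whose val returns the converted value. The FORMATS table itself is not shown here, so the converter classes below (SketchFormats, Str, Stripped, Int, Day) and the day.month.year date format are hypothetical; convert is a stand-alone sketch of the per-record logic, not MsCsv's code.

```ruby
require "date"

# Hypothetical converters following the protocol each_as relies on:
# new(raw_field) does the conversion, val returns the result.
module SketchFormats
  class Str
    attr_reader :val
    def initialize(str)
      @val = str
    end
  end

  class Stripped
    attr_reader :val
    def initialize(str)
      s = str.strip
      @val = s.empty? ? nil : s
    end
  end

  class Int
    attr_reader :val
    def initialize(str)
      @val = Integer(str.strip, 10)  # raises ArgumentError on junk
    end
  end

  class Day
    attr_reader :val
    def initialize(str)
      # The day.month.year format is an assumption of this sketch.
      @val = Date.strptime(str.strip, "%d.%m.%Y")
    end
  end

  TABLE = { "s" => Str, "S" => Stripped, "n" => Int, "d" => Day }.freeze
end

# Convert one record as each_as does: pair type codes with fields, leave nil
# fields alone, and on failure return the raw fields plus "position: message".
def convert(record, recdef)
  fmts = recdef.scan(/\S/).map { |code| SketchFormats::TABLE[code] }
  i = 0
  converted = fmts.zip(record).map do |fmt, fld|
    i += 1
    fmt.new(fld).val if fld
  end
  [converted, nil]
rescue StandardError => e
  [record, "#{i}: #{e.message}"]
end

row, err = convert(["42", " x ", "24.12.2023"], "n S d")
p row, err
```

An unknown type code leaves fmt as nil, so calling new on it raises and the record is reported as failed, just as in each_as.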
Same as each_record except that records containing only
nil fields will be skipped.
    # File lib/mscsv.rb, line 105
    def each_notempty
      each_record { |r| yield r unless r.compact.empty? }
    end
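The skip test is r.compact.empty?: compact drops the nil fields, so a record is skipped only when nothing else is left. A field holding an empty string is not nil, so such a record is kept. The helper name blank_record? and the sample rows are hypothetical.

```ruby
# The skip test each_notempty applies: a record is dropped only if every
# field is nil. compact removes the nils; an empty result means "blank".
def blank_record?(record)
  record.compact.empty?
end

rows = [["a", "b"], [nil], [nil, nil], ["", nil]]
kept = rows.reject { |r| blank_record?(r) }
p kept  # prints [["a", "b"], ["", nil]]
```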
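Iterate through the CSV file, yielding one record at a time as an array.
Each line is converted to UTF-8 before it is parsed. The fields will be
returned as strings or nil. A line consists of at least one field, so an
empty line will yield a one-element array containing nil. A quoted field
may span several lines, and a doubled quote character inside it stands for
a single literal quote.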
    # File lib/mscsv.rb, line 62
    def each_record
      unless defined? Encoding then
        require "iconv"
        @iconv = Iconv.new "utf-8", @encoding||"ms-ansi"
      end
      while l = read_line_utf8 do
        record, field = [], ""
        until l =~ /^$/ do
          c = l.eat 1
          if @sep.include? c then
            record.push field
            field = ""
          elsif @quote.include? c then
            q = c
            while l.notempty? or (l = read_line_utf8) do
              c = l.eat 1
              if c == q then
                d = l.head 1
                if q == d then
                  field << q
                  l.eat 1
                else
                  break
                end
              else
                field << c
              end
            end
          else
            field << c
          end
        end
        record.push field
        yield record
      end
    end
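The quote handling above can be sketched in core Ruby alone. each_record relies on String helpers (eat, head, notempty?) from an extension not shown in this section, so the hypothetical each_ms_record below scans by index instead. Unlike MsCsv, it also strips each physical line's ending with String#chomp and re-inserts a bare LF where a quoted field continues on the next line; treat both as assumptions of the sketch.

```ruby
require "stringio"

# Hypothetical core-Ruby sketch of the each_record state machine.
def each_ms_record(io, sep = ";", quote = "\"")
  while (line = io.gets)
    line = line.chomp
    record, field = [], +""
    i = 0
    while i < line.length
      c = line[i]
      i += 1
      if sep.include?(c)              # any separator character ends the field
        record.push(field)
        field = +""
      elsif quote.include?(c)         # opening quote; the same char closes it
        q = c
        loop do
          if i >= line.length         # quoted field continues on the next line
            nxt = io.gets or break
            field << "\n"
            line = nxt.chomp
            i = 0
            next
          end
          c = line[i]
          i += 1
          if c != q
            field << c
          elsif line[i] == q          # doubled quote stands for one quote
            field << q
            i += 1
          else
            break                     # closing quote
          end
        end
      else
        field << c
      end
    end
    record.push(field)
    yield record
  end
end

io = StringIO.new("a;\"x\ny\";\"he said \"\"hi\"\"\"\r\n;\r\n")
each_ms_record(io) { |r| p r }
```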
Open the file name and yield an MsCsv reader for it; the file is closed
when the block returns. Files can only be read, as I refuse to write such
weird formats. The sep and quote parameters have the same meaning as for
new.
    # File lib/mscsv.rb, line 29
    def open name, sep = nil, quote = nil
      File.open name do |f|
        i = new f, sep, quote
        yield i
      end
    end
Private Instance Methods
    # File lib/mscsv.rb, line 265
    def read_line_utf8
      l = @file.readline
      if @iconv then
        l = @iconv.iconv l
        l.gsub!(/\xc2\xa0/, " ")
      else
        l.force_encoding @encoding||Encoding::Windows_1252
        l.encode! Encoding::UTF_8
        l.gsub! "\u00a0", " "
      end
      l
    rescue EOFError
    end
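The Encoding branch of read_line_utf8 first reinterprets the raw bytes as Windows-1252, then transcodes them to UTF-8 and finally replaces no-break spaces (U+00A0) with plain spaces. The stand-alone helper ms_line_to_utf8 and the hand-made byte sample are hypothetical and only show those three steps.

```ruby
# Hypothetical helper mirroring the Encoding branch of read_line_utf8.
def ms_line_to_utf8(raw, encoding = Encoding::Windows_1252)
  line = raw.dup.force_encoding(encoding)  # reinterpret the bytes, no conversion yet
  line = line.encode(Encoding::UTF_8)      # transcode to UTF-8
  line.gsub("\u00a0", " ")                 # no-break spaces become plain spaces
end

# Hand-made Windows-1252 bytes: 0xA0 no-break space, 0x80 euro sign, 0xE4 "ä".
raw = "Preis:\xA0100\x80 f\xE4llig\r\n".b
puts ms_line_to_utf8(raw)
```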