Clojure China

How do I transform this data?

#1

Input file rules:

  1. @sheet is a keyword, followed by the sheet name. Sheet names are unique.
  2. Lines without the @sheet keyword are split on commas.

An example input file:

@sheet user-info
北京,张三,男,海淀
北京,张三,男,海淀,王五

@sheet income-detail
北京,张三,男,海淀,java,golang
北京,张四,女,海淀,java,nodejs

@sheet category
品类一,业务,分区一
品类一,业务,分区一

The expected output for it is:

[
{:name "user-info"
:myrow  [
["北京" "张三" "男" "海淀"]
["北京" "张三" "男" "海淀" "王五"]
]}

{:name "income-detail"
:myrow [["北京" "张三" "男" "海淀" "java" "golang"]
["北京" "张四" "女" "海淀" "java" "nodejs"]
]}

{:name "category"
:myrow [
["品类一" "业务" "分区一"]
["品类一" "业务" "分区一"]
]}
]

The output is a vector of asheet records,
where asheet is defined as:

    (defrecord asheet [name myrow])

and myrow is a two-dimensional vector, for example: [["北京" "张三" "男" "海淀"] ["北京" "张三" "男" "海淀" "王五"]]
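
For readers less familiar with records, here is a minimal sketch of what one element of the output vector looks like and how a row gets appended to it. The var names `sheet` and `sheet2` are only illustrative and are not part of the question.

```clojure
(defrecord asheet [name myrow])

;; ->asheet is the positional constructor that defrecord generates.
(def sheet (->asheet "user-info" [["北京" "张三" "男" "海淀"]]))

;; Records support keyword lookup, just like maps.
(:name sheet)
(:myrow sheet)

;; update returns a new record with the row appended; `sheet` is unchanged.
(def sheet2
  (update sheet :myrow conj ["北京" "张三" "男" "海淀" "王五"]))

(count (:myrow sheet2))
```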

How can this transformation be implemented in Clojure?

#2

Roughly like this. With something like instaparse it could be a lot shorter.

(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defrecord asheet [name myrow])

(defn head-line? [line]
  (str/starts-with? line "@"))

(defn add-item [acc name]
  (conj acc (map->asheet {:name name :myrow []})))

(defn add-myrow [acc new-row]
  (letfn [(append-row [item] (update item :myrow conj new-row))]
    (update acc (dec (count acc)) append-row)))

(defn parse-line [acc line]
  (cond
    (head-line? line)
    (let [name (second (str/split line #" "))]
      (add-item acc name))

    :else
    (let [new-row (str/split line #",")]
      (add-myrow acc new-row))))

(defn parse [filename]
  (->> filename
       io/file
       io/reader
       line-seq
       (remove #(str/blank? (str/trim %)))
       (reduce parse-line [])))

;;; Usage:
(comment
  (parse "resources/input.txt"))
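
One possible variation, as a sketch only: separating the line-folding logic from the file I/O makes it testable against an in-memory string, and `with-open` closes the reader once the eager `reduce` has consumed every line. The names `parse-lines`, `parse-file` and `sample` are hypothetical, and the sketch assumes (as the code above does) that the first non-blank line is an `@sheet` line.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defrecord asheet [name myrow])

(defn parse-lines
  "Pure core: folds a sequence of lines into a vector of asheet records.
  Assumes the first non-blank line starts with @sheet."
  [lines]
  (reduce (fn [acc line]
            (if (str/starts-with? line "@sheet")
              ;; New sheet: take the token after the keyword as its name.
              (conj acc (->asheet (second (str/split line #"\s+")) []))
              ;; Data row: append to the most recently added sheet.
              (update-in acc [(dec (count acc)) :myrow]
                         conj (str/split line #","))))
          []
          (remove str/blank? lines)))

(defn parse-file
  "Reads the file and closes the reader afterwards. reduce is eager, so
  all lines are consumed before with-open closes the reader."
  [filename]
  (with-open [rdr (io/reader filename)]
    (parse-lines (line-seq rdr))))

;; Trying the pure part without touching the file system:
(def sample
  "@sheet user-info\n北京,张三,男,海淀\n\n@sheet category\n品类一,业务,分区一\n")

(parse-lines (str/split-lines sample))
```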
#3

Thank you very much. This solution shows off Clojure's pure-function style, and the functions have no side effects.

#4

With update-in the code can be simplified a little more.

(defn add-myrow [acc new-row]
  (letfn [(append-row [item] (update item :myrow conj new-row))]
    (update acc (dec (count acc)) append-row)))

can be changed to:

(defn add-myrow [acc new-row]
  (update-in acc [(dec (count acc)) :myrow] conj new-row))
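
A small REPL-style check, using made-up sample values for `acc` and `row`, suggests the two forms are interchangeable: `update-in` walks the path `[index :myrow]` and applies `conj` at the end of it, which is what the nested `update` calls do by hand.

```clojure
;; Sample accumulator with one sheet and no rows yet (illustrative data).
(def acc [{:name "user-info" :myrow []}])
(def row ["北京" "张三" "男" "海淀"])

;; Nested update: first select the last sheet, then its :myrow.
(def via-update
  (update acc (dec (count acc))
          (fn [item] (update item :myrow conj row))))

;; update-in: the same two steps expressed as a path.
(def via-update-in
  (update-in acc [(dec (count acc)) :myrow] conj row))

(= via-update via-update-in) ;; => true
```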
#5

Here is an instaparse version. It doesn't save much code, mainly because the mergeObject function can't be moved into insta/transform.

(require '[clojure.string :as str] 
         '[instaparse.core :as insta])
  
(def log-parser
  (insta/parser
    "S = ((sheet / <blankrows> / rows) <nextline>)* (sheet / <blankrows> / rows)?
     sheet = <sheetname> <whitespace> word <#'.*'>
     sheetname= '@sheet'
     blankrows = #'( )+'   
     nextline = #'(\n)+'
     rows = #'.*'   
     word = #'\\S+'    
     whitespace = #'\\s+'"))

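(defn parseResult [filename]
  ;; Drop the leading :S tag so only sheet maps and row vectors remain.
  (rest (insta/transform
          {:rows  (fn [line] (str/split line #","))
           ;; Start :myrow as a vector: conj on a missing key would build
           ;; a list and prepend to it, reversing the row order.
           :sheet (fn [[_ sheetname]] {:name sheetname :myrow []})}
          (log-parser (slurp filename)))))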

(defn mergeObject [acc element]
  (cond
    (map? element)
     (conj acc element)
    (vector? element)
     (update-in acc [(dec (count acc)) :myrow] conj element)
    :else
     acc))

;;; Usage:
(comment 
  (reduce mergeObject [] (parseResult "/home/ping/temp/source2.txt")))
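
A detail worth checking in any reducer built on `update-in` and `conj`, such as mergeObject: when the target key is absent, `conj` receives nil and produces a list, and lists grow at the front. The sketch below uses illustrative data and the hypothetical names `without-init` and `with-fnil`; `fnil` is one way to supply an empty vector as the default.

```clojure
;; :myrow is absent, so the first conj sees nil and returns a list.
(def without-init
  (-> [{:name "category"}]
      (update-in [0 :myrow] conj ["a"])
      (update-in [0 :myrow] conj ["b"])))

;; The rows end up in a list, in reverse insertion order.
(:myrow (first without-init)) ;; => (["b"] ["a"])

;; (fnil conj []) substitutes [] for nil, so rows stay in a vector
;; and keep their insertion order.
(def with-fnil
  (-> [{:name "category"}]
      (update-in [0 :myrow] (fnil conj []) ["a"])
      (update-in [0 :myrow] (fnil conj []) ["b"])))

(:myrow (first with-fnil)) ;; => [["a"] ["b"]]
```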