DB bulk validation and upload
I am designing an application that involves bulk uploads of records into a Postgres DB (let's call its schema db-1). Uploads are done every week, and their size ranges from a few million to a billion records. The data being uploaded needs to be validated/cleansed first, since it must conform to the constraints and format of db-1. I am thinking of adopting the following approach:
- Every time a new upload needs to be done, a new schema is created (let's call it db-2, a staging area) that is identical to db-1 but with more lenient constraints, to make sure the data actually gets loaded into db-2 to start with.
- Run a validation process on the data. I was originally thinking of a middleware process, but when I realized the amount of data to be processed, I started leaning toward coding the validation/cleansing layer in the DB itself: a set of stored procedures that run on db-2, check the data, and generate a report of records that do not conform to the rules (i.e. the constraints present in db-1, data format, etc.).
- After this, the data needs to be corrected again at the source and step 1 repeated; if everything looks OK, a SELECT from db-2 into db-1 shifts the valid data to its final destination.
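The staging → validate → promote flow above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: it uses SQLite via Python's sqlite3 module in place of Postgres, and the table and column names (`db1_records`, `db2_records`, `email`, `amount`) are hypothetical stand-ins for the real schema. In Postgres the validation query would live in a stored procedure and the promotion step would be an `INSERT ... SELECT` between schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# db-1 stand-in: final destination with strict constraints
cur.execute("""
    CREATE TABLE db1_records (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL CHECK (email LIKE '%@%'),
        amount REAL NOT NULL CHECK (amount >= 0)
    )""")

# db-2 stand-in: staging copy of the same table with constraints relaxed,
# so every incoming row loads successfully
cur.execute("""
    CREATE TABLE db2_records (
        id INTEGER,
        email TEXT,
        amount REAL
    )""")

# Step 1: bulk-load everything into staging, good rows and bad alike
rows = [
    (1, "a@example.com", 10.0),
    (2, "not-an-email", 5.0),    # would fail db-1's email check
    (3, "b@example.com", -3.0),  # would fail db-1's amount check
]
cur.executemany("INSERT INTO db2_records VALUES (?, ?, ?)", rows)

# Step 2: validation pass over staging - flag rows that would violate
# db-1's rules and report them back for correction at the source
bad = cur.execute("""
    SELECT id, email, amount FROM db2_records
    WHERE email IS NULL OR email NOT LIKE '%@%'
       OR amount IS NULL OR amount < 0
""").fetchall()
print("records needing correction at source:", bad)

# Step 3: promote only the conforming rows into the final schema
cur.execute("""
    INSERT INTO db1_records (id, email, amount)
    SELECT id, email, amount FROM db2_records
    WHERE email IS NOT NULL AND email LIKE '%@%'
      AND amount IS NOT NULL AND amount >= 0
""")
promoted = cur.execute("SELECT COUNT(*) FROM db1_records").fetchone()[0]
print("rows promoted:", promoted)
```

Keeping the validation predicate and the promotion predicate as mirror images of each other (or deriving both from one place) matters; if they drift apart, the final INSERT ... SELECT can fail on a row the report never flagged.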
What is your opinion of the above process? Any obvious or hidden issues you see here? Suggestions to make it better are welcome.
thanks
j