|
|
Our Hot Pick: Rising Antivirus 2006 - Certified by TUV & Checkmark! Get 10% discount by entering this coupon code: ONDISCOUNT10
I'm inheriting a SharePoint-based system that has thousands of InfoPath documents scattered across hundreds of WSS sites. Business users are now starting to complain that they want to be able to do some ad-hoc reports on the data contained in all of these documents. SharePoint essentially stores these documents in a big blob of text, far from being an efficient structure to query. What approaches have people used to report on this data? I can think of a couple ways:
1. Promote all of the fields in the document so that SharePoint will store them in them separately in the database. Then I can do some ETL to load this data into a data warehouse that users can query.
2. Add logic to the InfoPath forms to write the data to a normalized database in addition to saving the file. This way I'll have my data in an easy format to extract to a DW.
3. Keep doing things the way I am now but create a custom ETL process that parses the XML documents out of the SharePoint database and loads it into a DW.
Any thoughts? Thanks in advance.
|
|
Depending on the complexity of reporting you're looking for, you may want to go with either of the routes you mentioned: 1) Simplest scenarios - promote a few properties. This will help you get to the flat fields and do some aggregation. 2) Moderate scenarios - save stuff necessary for reporting into a relational database, and do reporting from there. More complex, requires writing web services. 3) Very complex scenarios - save entire XML forms as blobs to SQL server 2005 instead of SharePoint. SQL Server 2005 supports XML data type, which allows you to go digging through the data using XQuery (very powerful schema-aware search and reporting).
"Bill" wrote:
[Quoted Text] > I'm inheriting a SharePoint-based system that has thousands of InfoPath > documents scattered across hundreds of WSS sites. Business users are now > starting to complain that they want to be able to do some ad-hoc reports on > the data contained in all of these documents. SharePoint essentially stores > these documents in a big blob of text, far from being an efficient structure > to query. What approaches have people used to report on this data? I can > think of a couple ways: > > 1. Promote all of the fields in the document so that SharePoint will store > them in them separately in the database. Then I can do some ETL to load this > data into a data warehouse that users can query. > > 2. Add logic to the InfoPath forms to write the data to a normalized > database in addition to saving the file. This way I'll have my data in an > easy format to extract to a DW. > > 3. Keep doing things the way I am now but create a custom ETL process that > parses the XML documents out of the SharePoint database and loads it into a > DW. > > Any thoughts? Thanks in advance.
|
|
Thank you Alex. I'll probably go with the first approach, but I'm interested in learning more about the third approach. What's a good way to get SQL Server 2005 to store these XML documents as an XML data type with the least amount of effort? Should I just take the binary data in the SharePoint tables and move it into another table with an XML data type field? I guess I could set up a trigger to do the work for me automatically. Is anyone doing this now?
"[MSFT] AlexWein" wrote:
[Quoted Text] > Depending on the complexity of reporting you're looking for, you may want to > go with either of the routes you mentioned: > 1) Simplest scenarios - promote a few properties. This will help you get to > the flat fields and do some aggregation. > 2) Moderate scenarios - save stuff necessary for reporting into a relational > database, and do reporting from there. More complex, requires writing web > services. > 3) Very complex scenarios - save entire XML forms as blobs to SQL server > 2005 instead of SharePoint. SQL Server 2005 supports XML data type, which > allows you to go digging through the data using XQuery (very powerful > schema-aware search and reporting). > > "Bill" wrote: > > > I'm inheriting a SharePoint-based system that has thousands of InfoPath > > documents scattered across hundreds of WSS sites. Business users are now > > starting to complain that they want to be able to do some ad-hoc reports on > > the data contained in all of these documents. SharePoint essentially stores > > these documents in a big blob of text, far from being an efficient structure > > to query. What approaches have people used to report on this data? I can > > think of a couple ways: > > > > 1. Promote all of the fields in the document so that SharePoint will store > > them in them separately in the database. Then I can do some ETL to load this > > data into a data warehouse that users can query. > > > > 2. Add logic to the InfoPath forms to write the data to a normalized > > database in addition to saving the file. This way I'll have my data in an > > easy format to extract to a DW. > > > > 3. Keep doing things the way I am now but create a custom ETL process that > > parses the XML documents out of the SharePoint database and loads it into a > > DW. > > > > Any thoughts? Thanks in advance.
|
|
For the third approach - implementation is eally not super-difficult; you just need to some custom code (in a web service or inside your form) to store the entire XML of the form in an InfoPath table.
Basically, you need to: 1) Define a schema for your data and create a sample filled out form. 2) Create a SQL table with the following structure: ID autonumber, XML myform [note that SQL Server 2005 natively supports XML columns] 3) Save the sample filled out form to the table you just created. 4) Setup some logic to retrieve and submit XML blobs. In the demo that I built, I'm using a web service with two methods - getForm(formid) and setForm(xml, formid). 5) Hook up an InfoPath form to the logic you built in step 4. In my demo, since I'm using a web service, I just started a new form based on that web service. 6) Configure your form to submit and query for data properly 7) Setup logic to create new records - for me it was just a third web service method.
This method is very new, and I don't know of anyone presently using this in a production environment; I have this setup end-to-end in a demo - we've shown it at a couple of developer conferences this year. Feel free to follow up with questions!
Also, et me talk a bit more about what you can achieve if you implement this. 1) No need to shred the XML data into relational structures - this means that when your schema changes, you don't need to re-create tables and worry about migrating data to a new relational schema. 2) Go into your forms' data with XQuery. You could search the filled out forms using a context-aware search - "show me all status reports that have Joe as the author" (much better than a full-text search for Joe for obvious reasons) 3) Taking this a step further - imagine doing joins between data you get from forms using XQuery and your other relational tables. Very, very powerful. 4) If you're worried about performance of #3 (and you should be :-)) - you could "promote" certain frequently used XPaths from your XMLs into relational columns, set up indexes on those columns, and do the joins. This would mean that you need to write a trigger to do the promotion, though - or have this done as a part of your submit logic.
Thanks, Alex
Bill wrote:
[Quoted Text] > Thank you Alex. I'll probably go with the first approach, but I'm interested > in learning more about the third approach. What's a good way to get SQL > Server 2005 to store these XML documents as an XML data type with the least > amount of effort? Should I just take the binary data in the SharePoint tables > and move it into another table with an XML data type field? I guess I could > set up a trigger to do the work for me automatically. Is anyone doing this > now? >
|
|
Can you provide a link to a demo of this? I am thinking about trying this out and would like to see it come together in a demo to see if it will fit.
alex94040[ at ]gmail.com wrote:
[Quoted Text] > For the third approach - implementation is eally not super-difficult; > you just need to some custom code (in a web service or inside your > form) to store the entire XML of the form in an InfoPath table. > > Basically, you need to: > 1) Define a schema for your data and create a sample filled out form. > 2) Create a SQL table with the following structure: > ID autonumber, > XML myform [note that SQL Server 2005 natively supports XML columns] > 3) Save the sample filled out form to the table you just created. > 4) Setup some logic to retrieve and submit XML blobs. In the demo that > I built, I'm using a web service with two methods - getForm(formid) and > setForm(xml, formid). > 5) Hook up an InfoPath form to the logic you built in step 4. In my > demo, since I'm using a web service, I just started a new form based on > that web service. > 6) Configure your form to submit and query for data properly > 7) Setup logic to create new records - for me it was just a third web > service method. > > This method is very new, and I don't know of anyone presently using > this in a production environment; I have this setup end-to-end in a > demo - we've shown it at a couple of developer conferences this year. > Feel free to follow up with questions! > > Also, et me talk a bit more about what you can achieve if you implement > this. > 1) No need to shred the XML data into relational structures - this > means that when your schema changes, you don't need to re-create tables > and worry about migrating data to a new relational schema. > 2) Go into your forms' data with XQuery. You could search the filled > out forms using a context-aware search - "show me all status reports > that have Joe as the author" (much better than a full-text search for > Joe for obvious reasons) > 3) Taking this a step further - imagine doing joins between data you > get from forms using XQuery and your other relational tables. Very, > very powerful. > 4) If you're worried about performance of #3 (and you should be :-)) - > you could "promote" certain frequently used XPaths from your XMLs into > relational columns, set up indexes on those columns, and do the joins. > This would mean that you need to write a trigger to do the promotion, > though - or have this done as a part of your submit logic. > > Thanks, > Alex > > > Bill wrote: > > Thank you Alex. I'll probably go with the first approach, but I'm interested > > in learning more about the third approach. What's a good way to get SQL > > Server 2005 to store these XML documents as an XML data type with the least > > amount of effort? Should I just take the binary data in the SharePoint tables > > and move it into another table with an XML data type field? I guess I could > > set up a trigger to do the work for me automatically. Is anyone doing this > > now? > >
|
|
|