Thursday, 30 May 2013

Assuring Scraping Success with Proxy Data Scraping

Have you ever heard of "Data Scraping?" Data Scraping is the process of collecting useful data that has been placed in the public domain of the internet (private areas too if conditions are met) and storing it in databases or spreadsheets for later use in various applications. Data Scraping technology is not new and many a successful businessman has made his fortune by taking advantage of data scraping technology.

Sometimes website owners may not derive much pleasure from automated harvesting of their data. Webmasters have learned to deny web scrapers access to their websites by using tools or methods that block certain IP addresses from retrieving website content. Data scrapers are left with the choice to either target a different website, or to move the harvesting script from computer to computer, using a different IP address each time and extracting as much data as possible until all of the scraper's computers are eventually blocked.

Thankfully there is a modern solution to this problem. Proxy Data Scraping technology solves the problem by using proxy IP addresses. Every time your data scraping program executes an extraction from a website, the website thinks it is coming from a different IP address. To the website owner, proxy data scraping simply looks like a short period of increased traffic from all around the world. They have very limited and tedious ways of blocking such a script but more importantly -- most of the time, they simply won't know they are being scraped.
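The rotation idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the proxy addresses are hypothetical placeholders from a documentation-only address range, and the helper names (proxy_cycle, build_opener_for, fetch_all) are invented for this example. It uses the standard library's urllib.request.ProxyHandler and simply picks the next proxy in round-robin order for each request.

```python
import itertools
import urllib.request

# Hypothetical proxy addresses (documentation-only range); replace them with
# proxies you actually rent or operate.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


def proxy_cycle(proxies):
    """Return an iterator that yields the proxies in round-robin order, forever."""
    return itertools.cycle(proxies)


def build_opener_for(proxy_url):
    """Return an opener that routes HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_all(urls, proxies, timeout=10):
    """Fetch each URL through the next proxy in the rotation."""
    rotation = proxy_cycle(proxies)
    pages = {}
    for url in urls:
        proxy = next(rotation)
        opener = build_opener_for(proxy)
        try:
            with opener.open(url, timeout=timeout) as response:
                pages[url] = response.read()
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            pages[url] = None
            print(f"{url} via {proxy} failed: {exc}")
    return pages
```

Calling fetch_all(list_of_urls, PROXIES) would send the first request through the first proxy, the second through the next, and so on, wrapping around when the list is exhausted. A real setup would likely also drop proxies that keep failing, which this sketch does not attempt.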

You may now be asking yourself, "Where can I get Proxy Data Scraping Technology for my project?" The "do-it-yourself" solution is, rather unfortunately, not simple at all. Setting up a proxy data scraping network takes a lot of time and requires that you own a bunch of IP addresses and suitable servers to be used as proxies, not to mention the IT guru you need to get everything configured properly. You could consider renting proxy servers from select hosting providers; that option tends to be quite pricey, but it is arguably better than the alternative: dangerous and unreliable (but free) public proxy servers.

There are thousands of free proxy servers located around the globe that are simple enough to use. The trick, however, is finding them. Many sites list hundreds of servers, but locating one that is working, open, and supports the protocols you need can be a lesson in persistence, trial, and error. Even if you do succeed in discovering a pool of working public proxies, there are still inherent dangers in using them. First off, you don't know who the server belongs to or what activities are going on elsewhere on the server. Sending sensitive requests or data through a public proxy is a bad idea. It is fairly easy for a proxy server to capture any information you send through it or that it sends back to you. If you choose the public proxy method, make sure you never send through any transaction that might compromise you or anyone else should disreputable people get hold of the data.

A less risky scenario for proxy data scraping is to rent a rotating proxy connection that cycles through a large number of private IP addresses. Several companies offer this service and claim to delete all web traffic logs, which allows you to harvest the web anonymously with minimal threat of reprisal. Companies such as http://www.Anonymizer.com offer large-scale anonymous proxy solutions, but these often carry a fairly hefty setup fee to get you going.

The other advantage is that companies who own such networks can often help you design and implement a custom proxy data scraping program, instead of leaving you to work with a generic scraping bot. After performing a simple Google search, I quickly found one company (www.ScrapeGoat.com) that provides anonymous proxy server access for data scraping purposes. Or, according to their website, if you want to make your life even easier, ScrapeGoat can extract the data for you and deliver it in a variety of different formats, often before you could even finish configuring your off-the-shelf data scraping program.

Whichever path you choose for your proxy data scraping needs, don't let a few simple tricks thwart you from accessing all the wonderful information stored on the world wide web!


Source: http://ezinearticles.com/?Assuring-Scraping-Success-with-Proxy-Data-Scraping&id=248993

Monday, 27 May 2013

Required knowledge: Data Scraping, CMS

The work will be as follows:


Automatically and continuously scrape all coming-soon videos from the site http://www.moviefone.com/dvd/coming-soon and upload them to myrentalstation.com, on the homepage under the Coming Soon section.




Insert all new video info from the site : title, image, release date


Get the information(all content) for each of these videos and update it to site.


Add features to allow users to vote until the day of the release date, and make voting IP-protected so that there is only one vote per IP. [They need not sign in to vote; this improves the number of votes.]




Voting options:


These options are to be available in the CMS: Display, Hide


Will rent


Will not rent


Will rent on DVD,


Will rent on Blu-ray,


Will not rent


Watched in theater and did not like it,


Watched in theater and loved it but will not rent


Watched in theater and loved it & will rent to watch again.




Add a feature to make these options visible or hidden from the CMS admin panel


Show number of votes to all users and update after every vote.


Allow admin to add votes if required for a particular movie and type


Always show/sort by closest new release dates first on home page
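The voting rules listed above could be prototyped roughly as follows. This is a minimal in-memory sketch, not the requested CMS integration: the class name MoviePoll and the function upcoming_first are hypothetical, storage is a plain Python set and Counter rather than a database, and it assumes that "until day of release date" means votes are accepted up to and including the release date.

```python
from collections import Counter
from datetime import date


class MoviePoll:
    """In-memory poll for one movie: one vote per IP, closed after release day."""

    def __init__(self, title, release_date, options):
        self.title = title
        self.release_date = release_date
        self.options = list(options)   # options currently displayed (not hidden)
        self.counts = Counter()        # option -> number of votes
        self.voted_ips = set()         # IPs that have already voted

    def vote(self, ip, option, today=None):
        """Record a vote and return True, or return False if it is rejected."""
        today = today or date.today()
        if today > self.release_date:   # assumption: release day itself still counts
            return False
        if option not in self.options:  # hidden or unknown option
            return False
        if ip in self.voted_ips:        # only one vote per IP
            return False
        self.voted_ips.add(ip)
        self.counts[option] += 1
        return True

    def add_votes(self, option, amount):
        """Admin override: add votes to an option without an IP check."""
        self.counts[option] += amount


def upcoming_first(polls, today=None):
    """Return polls not yet past release, closest release date first."""
    today = today or date.today()
    upcoming = [p for p in polls if p.release_date >= today]
    return sorted(upcoming, key=lambda p: p.release_date)
```

After each accepted vote, the page would re-read poll.counts to show the updated totals to all users. Note that one vote per IP also blocks separate visitors who share an address, for example behind an office network, which is a trade-off of not requiring sign-in.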


Source: http://allforfreelancers.com/scriptlance/2010/12/Website-Scripting-scrapping-autoupdate-Data-Scraping-CMS-1291575970.html

Friday, 24 May 2013

Database builder faces web-scraping lawsuit

A US company faces a copyright, trespass and trade secrets lawsuit because it 'scraped' the website of a rival on behalf of a client. The case underlines the legal uncertainty surrounding the practice.

Website 'scraping' is the practice of automatically taking information from a website and can be used to retrieve the contents of entire back-end databases from other websites.

The legality of scraping is unclear in the UK and the US. Uncertainty still surrounds the degree to which it is copyright infringement, hacking, a violation of database rights or a breach of other laws.

Snap-on Business Solutions hopes that an Ohio court agrees with it that scraping is a violation of several laws. It has lodged a claim against O'Neil Associates over activity surrounding Mitsubishi's moving of outsourced work from Snap-on to O'Neil.

Snap-on built a parts database for Mitsubishi so that dealers could access spare parts. Mitsubishi later moved the work to O'Neil and asked Snap-on for the database, which Mitsubishi saw as its property.

Snap-on, though, said that Mitsubishi would have to pay an extra fee to be given a copy of the database it had built.

O'Neil told Mitsubishi that it could 'scrape' the website to retrieve all the elements of the database. Mitsubishi gave it login details so that this could happen. Snap-on claims that this constituted an unlawful access to its database and unlawful copying of it.

Snap-on and O'Neil are competitors in the business of building spare parts databases for manufacturers and Snap-on had constructed the database from paper catalogues, manuals and photographs supplied by Mitsubishi.

When Mitsubishi was discussing O'Neil taking over the database work, O'Neil said that it could scrape the Snap-on site. Mitsubishi gave O'Neil 30 logon details to reduce the chance of detection of the scraping, an employee said in testimony to the Ohio court.

That scraping process caused the Snap-on site to crash, though, and the company, realising what had happened, sued O'Neil.

Snap-on claims that O'Neil broke parts of the Computer Fraud and Abuse Act when it "caused damage" by unlawfully accessing its computers. O'Neil said that it was authorised by Mitsubishi to access Snap-on's computers.

The Ohio court was being asked to issue a summary judgment on the case and it said that there was enough doubt about whether or not Mitsubishi had the right to authorise that access that the case must proceed to a full trial.

Snap-on also said that O'Neil broke contract law by violating the terms of an end user agreement that came into effect each time it accessed the website. O'Neil disputed that it was party to any agreement, and the judge said that a trial would be necessary on this issue, as well as the issues of whether the activity broke trespass and copyright laws.

If the activities had happened in the UK, a court would also have to consider whether or not they broke the database rights created by EU law. This right gives protection separate to copyright to databases and was brought into force to protect the investment companies would have to make to compile a database.

A ruling from the European Court of Justice in 2004 cast doubt on exactly how much protection database law gives to the back end of websites. Bookmaker William Hill successfully argued at the European Court of Justice that the British Horseracing Board (BHB) could not protect its database of races using the database right.

The Court agreed with the gambling company that the database was a by-product of the BHB's main activity, which was organising horse races. Making the database, then, did not demand the kind of effort that required the law's protection, it successfully argued.

However, a ruling from the European Court of Justice in 2008, on a case involving an anthology of German poetry, and a ruling last year in the English High Court, suggest that web scraping could still fall foul of the database right.

Struan Robertson, a technology lawyer at Pinsent Masons, the law firm behind OUT-LAW.COM, said that scraping can be challenged in the UK on a number of grounds.

"The database right may be the strongest argument against web scraping, but there can be other arguments, including copyright infringement, breach of contract and breach of the Computer Misuse Act," he said. "Some businesses will scrape other sites until they're told to stop, because they gamble that nobody will mind, and if they do, they probably won't sue, provided the demand to stop is fulfilled. To carry on in spite of such a demand, though, is to take a pretty big risk."


Source: http://www.out-law.com/page-10975

Thursday, 16 May 2013

Android Apk Giveaway Moviefone - Movies & Showtimes 1.9.21.1

Moviefone - Movies & Showtimes is one of the most interesting applications among Android apps. It is very useful for your Android device and provides some interesting features that will satisfy you. Moviefone - Movies & Showtimes 1.9.21.1 is also one of the most recently developed applications, designed to fulfill your need for a sophisticated tool in your daily activities. You can get the Moviefone - Movies & Showtimes 1.9.21.1 application in the Google Play Store: visit the Google Play Store from your Android device, then download Moviefone - Movies & Showtimes and install it on your device.

Moviefone - Movies & Showtimes 1.9.21.1’s Interesting Features

Moviefone is the best app for movie showtimes, trailers, reviews, exclusive movie clips and news.

Love movies? Trying to find movie showtimes? Moviefone is for you. Find movies and showtimes near you, view movie trailers and get movie info. Ready to go to the movies? Get maps and directions to theaters across the U.S (tap the red location bar to change zip codes). Buy tickets to the latest flicks - just look for the red "ticket" next to the theater and purchase your tickets through Fandango without leaving the app.


• Buy tickets on the go to any of our participating theaters (now including AMC)
• Watch high quality trailers and movie clips
• Enjoy Moviefone's celebrity bios, photos and filmographies
• Install Moviefone on your SD Card so that you can optimize your device's storage space
• Get to the theater easily with our maps and directions feature
• Movies database is updated daily
• Search for movies by actor or title

Note: If you have Facebook 1.5 installed, Moviefone will honor your Facebook login using the new Facebook single sign-on feature.

Movie showtime and theater data currently available for U.S. locations only.
Newest Features:

Bug fixes!

Moviefone - Movies & Showtimes, as one of the latest applications for Android, has various features that will be very useful on your Android device, whether it is an Android smartphone or an Android tablet PC. These are the main features of Moviefone - Movies & Showtimes.

Get Moviefone - Movies & Showtimes 1.9.21.1 Now

Get these features by installing Moviefone - Movies & Showtimes 1.9.21.1 on your Android device by following this link. You can access Google Play right from your Android device, or you can download it from this site and install it on your device.

Source: http://www.androidapk.info/2013/01/android-apk-giveaway-moviefone-movies.html

Monday, 6 May 2013

Interesting Database Scraping Case Survives Summary Judgment--Snap-On Business Solutions v. O'Neil

Snap-on Business Solutions Inc. v. O'Neil & Assocs., Inc. (N.D. Ohio April 16, 2010) [scribd]

Snap-on is one of those cases that's great because the court canvasses the various claims that come into play in the increasingly common scenario when someone accesses a computer or network to extract data following termination of (or outside of) a contractual relationship. (The practice of extracting data from a website is commonly known as 'scraping'.) The court punts based on the existence of factual disputes, but the court's order is well worth a read just because it lays out the issues and theories.

The background facts are straightforward. Mitsubishi hired Snap-on to build a database of parts data which Mitsubishi dealers could then access online. Mitsubishi provided the underlying documents and images (parts information) to Snap-on, who converted them and built a "searchable database with linked data and images." At some point, Mitsubishi decided to move the parts database over to O'Neil, instead of Snap-on. When Mitsubishi asked for a copy of the database, Snap-on predictably declined. Snap-on told Mitsubishi that Mitsubishi could have the database, but would have to pay an extra fee. Meanwhile, O'Neil, Mitsubishi's new vendor, suggested that it could extract the data from Snap-on's servers using O'Neil's "scraper tool." O'Neil ran the scraping program, and used log-ins provided by Mitsubishi in the process of gathering the data. According to testimony from Snap-on, O'Neil's access of Snap-on's website caused Snap-on's website to "crash" in at least one instance.

Snap-on sued O'Neil (and interestingly not Mitsubishi), alleging violations of the Computer Fraud and Abuse Act, trespass to chattels, unjust enrichment, breach of contract, copyright infringement, and misappropriation of trade secrets.

Computer Fraud and Abuse Act: The key question on the Computer Fraud and Abuse Act claim was whether O'Neil's access of the website was "without authorization." The court held that the underlying agreement between Mitsubishi and Snap-on did not clearly resolve the question of whether Mitsubishi could authorize O'Neil to access Snap-on's website and servers and whether, even assuming Mitsubishi had this ability, Mitsubishi somehow lost it.

I think the court came to the correct conclusion on whether the access was without authorization. There's a split of authority in the employment context as to whether an employee's access to the employer's servers for the employee's own purposes constitutes "unauthorized access," but this case doesn't implicate that scenario. (Jeff Neuburger covers the 9th Circuit's recent ruling in LVRC Holdings, LLC v. Brekka, which acknowledges this split.) Here, the parties had an agreement, and the only viable argument by O'Neil on the unauthorized access issue was that Mitsubishi had authorized O'Neil to access Snap-on's computers and servers. (Since you had to log-in to access the website, O'Neil could not argue that Snap-on impliedly authorized everyone (including search engines) to access its site.) The terms of the agreement between the parties would resolve this issue and the agreement didn't provide a definitive answer, at least at the summary judgment stage.

Trespass to Chattels: Snap-on also asserted a trespass claim based on damage or temporary deprivation of the ability to use its servers. The court also declined to resolve this issue on summary judgment, finding that Snap-on presented sufficient evidence to find that O'Neil's unauthorized access caused Snap-on's servers to crash and "deprived Snap-on of their use for a substantial time."

O'Neil argued that copyright law preempts Snap-on's trespass claim. The court summarily (and in a conclusory fashion) rejects this argument, finding that Snap-on's argument seeks to protect the integrity of its computer servers, rather than its "possessory interest in the [software] or accompanying database."

Unjust Enrichment: The court finds that Snap-on's unjust enrichment claims were preempted by the Copyright Act since Snap-on failed to provide any evidence as to how the unjust enrichment claims were based on rights distinct from Snap-on's rights as a copyright owner.

Breach of Contract: Snap-on also asserted a claim for a breach of its end user license agreement. The court declined to dismiss this claim based on the existence of a factual dispute as to whether the parties entered into the EULA and whether O'Neil breached it. Surprisingly, Snap-on's website required a log-in but only contained a statement that "[the] use of and access to the information on [Snap-on's] site is subject to the terms and conditions set out in [Snap-on's] legal statement." Snap-on did not require users to check a box acknowledging that they had read and agreed to the end user terms.

Copyright Infringement: Snap-on knew it had an uphill battle on the copyright claim for a few reasons. First, much of the material (such as the images) is owned by Mitsubishi to begin with. Second, it's tough for anyone to argue that pricing and parts information is copyrightable. With this in mind, Snap-on argued that the "database structure" is entitled to copyright protection and Snap-on owned the copyrights in the structure.

The court went through the Feist analysis. In Feist, the court held that a "factual compilation is eligible for copyright if it features an original selection or arrangement of facts, but the copyright is limited to the particular selection or arrangement. In no event may copyright extend to the facts themselves." Lower courts have applied Feist and found that databases containing facts may be copyrightable. O'Neil argued that the "arrangement" or the database structure was obvious and was thus not entitled to copyright protection. The court again agrees with Snap-on that factual disputes preclude summary judgment on copyrightability and ownership.

The court's conclusion on the copyright issue seemed the most problematic. Even if Snap-on owned some part of the underlying arrangement or database structure, did O'Neil "copy" the structure, or otherwise exercise any rights exclusive to the copyright owner? This is a tough sell. Also, on the ownership issue, I would think Mitsubishi would have a colorable argument that even if it didn't own the copyright, it should be treated as a joint owner along with Snap-on?

Trade Secrets: Finally, the court also declines to grant summary judgment on Snap-on's trade secrets claims. I'm not sure what trade secrets Snap-on is using to support its claim, and I'm skeptical that any trade secrets exist here that O'Neil misappropriated. However, given that the court declined to grant summary judgment on the other claims, it wasn't the end of the world for the court to let this claim go to the jury as well.
__

Apart from canvassing the various legal theories that come into play in this type of a factual scenario, the case also offers a few practice pointers.

First, if someone is hosting or storing data for you, it makes sense to have a provision in the agreement that allows you to get access to the data at the termination of the relationship, regardless of any contractual dispute that may arise between the parties. The party with physical access to the data will have leverage as a practical matter, and this is the type of thing contractual language should address. As a last resort, the party who may be in a position to extract the data should have an unbridled ongoing right to extract the data during the course of the relationship. The agreement should also have a notice and breach provision that would prevent the summary denial or revocation of authorization.

Second, I'm surprised there wasn't a clear ownership clause in the agreement that said Mitsubishi owns the underlying data, the database structure, and any copyrightable elements in the database. A determination by the court that Snap-on owns some copyrights in the database structure could cause problems down the road for Mitsubishi. There are many reasons why it made sense for Mitsubishi to own the data, and Snap-on doesn't have much of a business justification for owning the data because it can't use it at the termination of the relationship. As a last resort, Mitsubishi should have had a broad license to the data.

Third, the agreement should contain terms allowing Mitsubishi to authorize third parties the right to access Snap-on's servers and any copyrighted material, at least for back-up and archiving purposes.

Fourth, if you are a website that is looking to prevent scraping, ownership of the underlying data and restrictions on access (such as a log-in) help significantly. Professor Goldman's comments below highlight that scraping is problematic from a legal standpoint. However, two things that bolstered Snap-on’s claims are its ownership of the data and the fact that O’Neil accessed the site through a log-in which it wasn’t clearly authorized to use. This, coupled with the fact that Snap-on was in physical possession of the data at the termination of the relationship, pretty much put it in the driver’s seat.

Finally, Snap-on's contract formation process could have been cleaner. Where you have a situation involving access of a website for a business purpose (where the person is accessing data that they need), there's much less risk of people declining to access your website based on additional hurdles in the form of click-throughs or check boxes. In the consumer setting, websites often weigh certainty of contract formation against customer conversion, but this trade-off isn't really present in Snap-on's case. I guess what I'm saying in a long-winded way is that Snap-on should have implemented a mandatory, non-leaky clickthrough, as discussed in Professor Goldman's post covering Scherillo v. Dun & Bradstreet.

Source: http://blog.ericgoldman.org/archives/2010/04/court_denies_su_1.htm

Thursday, 2 May 2013

Web Data Extraction Services India- Web Scraping, Web Site Data Extraction

Web Data Extraction is the process of extracting data from targeted web pages and websites and structuring the information according to the client's business needs.

Some of the most commonly requested types of data nowadays are as follows:

    Financial information data extraction
    Real estate data extraction
    Sales leads
    Data extraction of auction websites
    Data extraction of e commerce websites
    Email, contact information data extraction
    Data extraction from job portals and websites
    Web credit card data extractor
    Site map extraction
    Scraping IT information
    Scraping images from websites
    Image scraping

Other information primarily extracted from websites includes URLs, meta descriptions, product descriptions, service descriptions, phone numbers, zip codes, email IDs, fax numbers, etc.
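As a rough illustration of how fields such as email addresses, phone numbers, zip codes and URLs might be pulled out of page text, here is a small Python sketch using regular expressions. The patterns are deliberately simplified assumptions (US-style phone numbers and zip codes only), the function name extract_contacts is hypothetical, and real-world pages would need more robust parsing.

```python
import re

# Simplified patterns; each is an assumption about the format, not a full validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}")  # e.g. (555) 123-4567
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")               # e.g. 62704 or 62704-1234
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def unique(items):
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def extract_contacts(text):
    """Return the emails, phones, zip codes and URLs found in a block of text."""
    return {
        "emails": unique(EMAIL_RE.findall(text)),
        "phones": unique(PHONE_RE.findall(text)),
        "zips": unique(ZIP_RE.findall(text)),
        "urls": unique(URL_RE.findall(text)),
    }
```

Passing the raw HTML or visible text of a page to extract_contacts returns a dictionary of lists, one per field type, ready to be stored or exported.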

Once the web data extraction company has extracted the relevant data from the targeted websites, the data is converted to structured formats such as Microsoft Access databases, HTML, Excel, XML, MySQL, etc. This improves the clients' business prospects and gives them an up-to-date database, which in turn helps them make timely and accurate marketing and business decisions.
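The conversion step can be sketched with Python's standard library. This example covers only two of the formats mentioned, CSV (which Excel can open) and XML, and makes several assumptions: the function names to_csv and to_xml are hypothetical, the records are plain dictionaries, and the dictionary keys are valid XML tag names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_xml(records, root_tag="records", item_tag="record"):
    """Serialize a list of dicts to an XML string, one element per field."""
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for key, value in record.items():
            # Assumes each key is a valid XML tag name (no spaces, not starting with a digit).
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

The returned strings can be written to .csv or .xml files, or handed to a database loader, depending on what the client needs.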

Data Extraction Services is one of the industry leaders in data extraction and is well known as a provider of quality web data extraction and web scraping outsourcing services at economical prices to companies in countries such as the USA, UK, Australia, New Zealand, Canada, France, Russia and many more.

To get quality web scraping and web data extraction services, outsource your web scraping project requirements to an expert web data extraction company in India. Contact us at http://www.dataextractionservices.com/contact-us.php

Source: http://dataextractionservicesindia.blogspot.in/2011/07/web-data-extraction-services-india-web.html

Note:

Delta Ray is an experienced web scraping consultant and writes articles on web data scraping, website data scraping, data scraping services, web scraping services, website scraping, eBay product scraping, Forms Data Entry, etc.